Wes Peters writes:
> Amancio Hasty wrote:
> >
> > > 200 Mb/s = 25 MB/s, which seems a little low, but is within the realm of
> > > what I would expect.
> >
> > I think the system should be able to support at least 70MB/s at least I do over here
> > with a bt848 video capture board capturing 640x480x4 at 30 frames per second
> > and then displaying the frames on video display card 8)
>
> An article in IEEE Computer magazine last summer reported achieving
> 320 Mb/s throughput with Myricom Myrinet boards on FreeBSD. I've
> seen this number batted around industry publications like Network
> World a number of times also. That would seem to require only a 10
> Mhz clock with a 32-bit bus bandwidth; is there really this much
> overhead in the PCI transactions?

It's possible to do far better with Myrinet hardware.

I haven't read the article in question, but I suspect that they're
using the Myricom-supplied firmware. If so, the overhead is not in the
Myrinet PCI adaptor, nor in the PCI bus, nor in FreeBSD; rather, it's
in the firmware running on the card. The Myricom MyriApi firmware is
overly complex and quite slow. Their API also forces one to do many
memory-mapped reads from the adaptor. As you can imagine, doing reads
across the PCI bus is painfully slow.

Using much more efficient firmware (the Duke Trapeze MCP), we're able
to get 660Mb/s between two 450MHz PIIs (Asus P2B) using a standard
FreeBSD-4.0 IP stack and a very large MTU (57k):

<9:55am>muffin/gallatin:api>netperf -Hgrits-my
TCP STREAM TEST to grits-my : histogram
Recv     Send     Send
Socket   Socket   Message  Elapsed
Size     Size     Size     Time     Throughput
bytes    bytes    bytes    secs.    10^6bits/sec
1048576  1048576  1048576  10.01    660.97
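
To put the 57k MTU in perspective, here is a rough sketch of the packet
rate implied by the throughput above. It assumes "57k" means 57 * 1024
bytes, ignores all header overhead, and uses a 1500-byte Ethernet-sized
MTU purely as a comparison point; the function name is made up for
illustration.

```python
# Rough packet-rate estimate from a netperf throughput figure.
# Assumptions (not from the measurement itself): "57k" = 57 * 1024 bytes,
# every packet is a full MTU, and header overhead is ignored.

def packets_per_sec(mbits_per_sec, mtu_bytes):
    """Packets per second needed to carry the given throughput."""
    bytes_per_sec = mbits_per_sec * 1e6 / 8.0
    return bytes_per_sec / mtu_bytes

THROUGHPUT_MBITS = 660.97          # from the netperf run above

large_mtu_pps = packets_per_sec(THROUGHPUT_MBITS, 57 * 1024)
ether_mtu_pps = packets_per_sec(THROUGHPUT_MBITS, 1500)

print("57k MTU:   %8.0f packets/sec" % large_mtu_pps)
print("1500 MTU:  %8.0f packets/sec" % ether_mtu_pps)
print("ratio:     %8.1fx" % (ether_mtu_pps / large_mtu_pps))
```

The same throughput at a 1500-byte MTU would need roughly 39 times as
many packets per second.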

Using local zero-copy modifications on both the send and receive
sides, we see better than 800Mb/s:

<9:57am>muffin/gallatin:api>netperf -Hgrits-my
TCP STREAM TEST to grits-my : histogram
Recv     Send     Send
Socket   Socket   Message  Elapsed
Size     Size     Size     Time     Throughput
bytes    bytes    bytes    secs.    10^6bits/sec
524288   524288   524288   10.01    808.61

This 101MB/sec is still far below the measured DMA bandwidth of the
LANai4 in a 440BX motherboard (over 130MB/sec for both reads and
writes). Most of the difference between the theoretical 132MB/sec
maximum bandwidth of a 32-bit, 33MHz PCI bus and our roughly 100MB/sec
is due to the fact that the LANai4 has a slow CPU and terrible memory
bandwidth. The new LANai7s will have a much faster CPU and much better
memory bandwidth (as well as a DMA engine which can do IP checksum
offloading). We expect to see much better performance from these
boards.
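
For anyone checking the arithmetic, here is a rough sketch of the unit
conversions behind those figures. It assumes decimal units (1 Mb = 10^6
bits, 1 MB = 10^6 bytes) and a 32-bit, 33MHz PCI bus; the function
names are made up for illustration.

```python
# Back-of-the-envelope conversions for the bandwidth figures in this mail.
# Assumptions: decimal megabits/megabytes, and a 32-bit PCI bus clocked
# at a nominal 33 MHz that moves one bus-width word per clock at peak.

def mbits_to_mbytes(mbits_per_sec):
    """Convert 10^6 bits/sec to 10^6 bytes/sec."""
    return mbits_per_sec / 8.0

def pci_peak_mbytes(bus_bits=32, clock_mhz=33.0):
    """Theoretical peak bus bandwidth in 10^6 bytes/sec."""
    return (bus_bits / 8.0) * clock_mhz

stock_stack = mbits_to_mbytes(660.97)   # standard IP stack run
zero_copy = mbits_to_mbytes(808.61)     # zero-copy run
pci_peak = pci_peak_mbytes()

print("stock stack: %6.1f MB/sec" % stock_stack)
print("zero-copy:   %6.1f MB/sec" % zero_copy)
print("PCI peak:    %6.1f MB/sec" % pci_peak)
print("zero-copy reaches %.0f%% of PCI peak" % (100.0 * zero_copy / pci_peak))
```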

Given that the Tigon-II adaptors have two MIPS R4000 CPUs and can do
checksum offloading, I expect wonderful things from them as well.
I've been playing with the latest revision of Bill's Tigon driver
(where he's found some chip settings which optimize DMA performance)
and have seen UDP xmit performance of 850Mb/s.

Cheers,
Drew
------------------------------------------------------------------------------
Andrew Gallatin, Sr Systems Programmer http://www.cs.duke.edu/~gallatin
Duke University Email: gallatin@cs.duke.edu
Department of Computer Science Phone: (919) 660-6590