Over in the "Single Board Computer" section of the Element14 community forum, we've been measuring the Ethernet throughput of as many ARM Linux boards as we can lay our hands on, and we've compiled a pretty interesting table of the results. We're using a single (and very well respected) measuring utility in all cases, in order to guarantee consistency. The table is maintained in the leading article of the thread, which also explains how to do the measurements.
When we get our own Parallella boards we'll of course put them under test ourselves, but in the meantime, since many of you already have the board, perhaps a few of you would like to follow the instructions and give us an early set of measurements to add to the table? I'll gladly record in the table any results posted either here in this thread or on the Element14 forum.
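For anyone curious about what such a measurement actually involves, here's a minimal sketch in Python of a raw TCP payload throughput test: one side listens and times the incoming stream, the other connects and sends a fixed amount of data. It's an illustration only, not the utility used for the table (please use the tool described in the thread for any results you want recorded, so the numbers stay comparable); the port number 5201 and the 1 GiB transfer size are arbitrary choices for this sketch.

import socket
import sys
import time

TOTAL_BYTES = 1 << 30   # send 1 GiB of payload in total (arbitrary choice)
CHUNK = 1 << 16         # 64 KiB per send/recv call

def receiver(host, port):
    # Listen, accept one connection, and time how fast payload arrives.
    # The receiver's figure is the meaningful one: the sender only measures
    # how fast it can hand data to its own kernel buffers.
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind((host, port))
    srv.listen(1)
    conn, _peer = srv.accept()
    total = 0
    start = None
    while True:
        data = conn.recv(CHUNK)
        if not data:
            break
        if start is None:
            start = time.monotonic()   # start the clock at the first byte
        total += len(data)
    elapsed = time.monotonic() - start
    conn.close()
    srv.close()
    print("received %d bytes in %.2f s = %.1f Mbps payload"
          % (total, elapsed, total * 8 / elapsed / 1e6))

def sender(host, port):
    # Connect and push TOTAL_BYTES of zeros as fast as the link allows.
    buf = bytes(CHUNK)
    conn = socket.create_connection((host, port))
    start = time.monotonic()
    sent = 0
    while sent < TOTAL_BYTES:
        conn.sendall(buf)
        sent += CHUNK
    elapsed = time.monotonic() - start
    conn.close()
    print("sent %d bytes in %.2f s = %.1f Mbps payload"
          % (sent, elapsed, sent * 8 / elapsed / 1e6))

if __name__ == "__main__":
    # usage:  python3 tcp_throughput.py recv 0.0.0.0 5201
    #         python3 tcp_throughput.py send <receiver-address> 5201
    mode, host, port = sys.argv[1], sys.argv[2], int(sys.argv[3])
    receiver(host, port) if mode == "recv" else sender(host, port)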
A lot of applications use the common architectural pattern of feeding data to a machine over Ethernet, passing it to an on-board computation engine or accelerator for processing (CPUs, GPUs, FPGAs, or in our case here, Epiphany), and then passing it back out over Ethernet again. To make the most of the available computational resources, you need to know where the I/O bottlenecks lie, so that you can avoid hitting them and maximize the time spent computing, and that means measuring those bottlenecks first. This matters for any board, but especially for one like Parallella that is designed as a strong computational engine.
The maximum read and write throughputs to the host over Ethernet are only two of several limiting I/O parameters to be measured, and we'll have to quantify the others in due course: direct memory copy between host and Epiphany, DMA throughput, and Epiphany inter-core throughput will very often be of interest as well.
For now, if anyone wishes to run the measurements described in the link above, it'll be very interesting to read your results. Note that the ARM boards with gigabit Ethernet measured so far have had trouble achieving high utilization of the link, rarely reaching half of the maximum theoretical 941.482 Mbps TCP payload throughput with TCP timestamps enabled and no jumbo frames. In contrast, x86 machines frequently achieve very close to the maximum theoretical throughput. The Zynq is an unknown quantity to us at present, but hopefully it'll do better than the cheaper SoCs tested so far. Whether it turns out better or worse, its limiting throughput needs to be known either way.
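For reference, that 941.482 Mbps ceiling comes straight from the framing overhead: with the standard 1500-byte MTU, every full-size frame occupies 1538 bytes on the wire (1500 + 14-byte Ethernet header + 4-byte FCS + 8-byte preamble + 12-byte inter-frame gap) and carries 1448 bytes of TCP payload once the 20-byte IP header, 20-byte TCP header and 12-byte timestamp option are subtracted, so the best possible payload rate is 1000 Mbps x 1448 / 1538, which is approximately 941.482 Mbps.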
Happy measuring!
Morgaine.