Thanks for analyzing the system bandwidth and pointing out the gap in the documentation. We will beef up the section on the link in the datasheet. Getting data transfer right takes careful design to ensure that there are no bottlenecks in the system or the program. We are still working on optimizing the FPGA logic and software architecture to boost performance.
In the meantime, here are some pointers about the Epiphany link hardware:
-The elink has an "automatic" burst mode that only kicks in for streams of sequential 64-bit transactions with an 8-byte stride, e.g. 0x0, 0x8, 0x10, 0x18, etc.
-In this burst mode the elink transfer stream becomes: 32-bit address, 64-bit data, 64-bit data, 64-bit data, ... This gets us very close to the peak theoretical bandwidth for large buffers.
-In all other transfer cases (reads, byte/short/word accesses, non-sequential addresses), each transfer is 104 bits, of which only 8-32 bits are "useful" link bandwidth.
-To maximize bandwidth, the cores should access off-chip resources through the link in an orderly fashion (not randomly). This is similar to the DRAM access constraints one would usually observe to avoid page thrashing. Still, we wish we had put something in the link to make this burst mode more automatic; that will have to wait for the next version of the chip.
This will be documented in the datasheet.
Andreas