Memory transfer benchmark

Any technical questions about the Epiphany chip and Parallella HW Platform.

Moderator: aolofsson

Re: Memory transfer benchmark

Postby ysapir » Thu Jun 20, 2013 5:50 am

shodruk wrote: Is this understanding of the eLink correct?

The eLink packet size is always 104 bits
(data: 32, src_address: 32, dst_address: 32, control: 8).

Writing 32 bits costs 104 bits of link bandwidth.

Reading 32 bits costs 208 bits of link bandwidth
(104 bits for the request, 104 bits for the response).


Not exactly. The 104 bits together constitute a single 32-bit transaction. They all move in the same clock cycle, but only the actual data payload is counted toward the bandwidth figure.

Re: Memory transfer benchmark

Postby shodruk » Thu Jun 20, 2013 6:18 am

Isn't that about the eMesh? I'm talking about the eLink.
As far as I can see, the eLink has only two 8-bit serial links (one in, one out) and no address bus...

Re: Memory transfer benchmark

Postby aolofsson » Thu Jun 20, 2013 4:11 pm

Is this what you are looking for? (data from Yaniv)

Using memcpy() at 600 MHz:

Core -> SRAM: write speed = 504.09 MBps, clocks = 9299
Core <- SRAM: read speed = 115.65 MBps, clocks = 40531
Core -> ERAM: write speed = 142.99 MBps, clocks = 32782
Core <- ERAM: read speed = 4.19 MBps, clocks = 1119132

Using dma_copy():

Core -> SRAM: write speed = 1949.88 MBps, clocks = 2404
Core <- SRAM: read speed = 480.82 MBps, clocks = 9749
Core -> ERAM: write speed = 493.21 MBps, clocks = 9504
Core <- ERAM: read speed = 154.52 MBps, clocks = 30336

Re: Memory transfer benchmark

Postby shodruk » Fri Jun 21, 2013 9:00 am

Significant improvement! (especially the ERAM write speed) :o
But now I'm a little more confused.
What caused the difference?
What I want to know is the formula for the theoretical data bandwidth of the eLink.
I want to know the exact specifications of the Epiphany, because without them we can't determine where and how to optimize for it.
I have read the reference manual, the datasheet, and the HDL source code, but the description of the eLink did not feel sufficient.

I have some ideas for memory optimization, so I need to know the details of the eLink
(using a designated core as a memory-management unit, complex gather/scatter, prefetching, assigning/feeding data to another core, etc.).

Re: Memory transfer benchmark

Postby aolofsson » Fri Jun 21, 2013 1:15 pm

Thanks for analyzing the system bandwidth and pointing out the documentation deficiency. We will beef up the section on the link in the datasheet.

Getting data transfer right takes careful design to ensure that there are no bottlenecks in the system and/or program. We are still working on optimizing the FPGA logic and software architecture to boost performance.

In the meantime, some pointers about the Epiphany link hardware.

-The elink has an "automatic" burst mode that kicks in only for streams of sequential 64-bit transactions with an 8-byte stride, e.g. addresses 0x0, 0x8, 0x10, 0x18, etc.
-In this burst mode the elink transfer stream becomes: 32-bit address, 64-bit data, 64-bit data, 64-bit data, ... getting us very close to the peak theoretical bandwidth for large buffers.
-In all other cases (reads; byte, short, or word writes; non-sequential addresses) each transfer is 104 bits, of which only 8-32 bits are "useful" link bandwidth.
-To maximize bandwidth, the cores should access off-chip resources through the link in an orderly fashion (not randomly). This is similar to the DRAM access constraints one would usually employ to avoid page thrashing. Still, we wish we had put something in the link to make this burst mode more automatic... next version of the chip ;)

This will be documented in the datasheet...

Andreas

Re: Memory transfer benchmark

Postby shodruk » Sat Jun 22, 2013 5:31 am

Thank you very much for the detailed explanation.

aolofsson wrote: In this burst mode the elink transfer stream becomes: 32-bit address, 64-bit data, 64-bit data, 64-bit data, ... getting us very close to the peak theoretical bandwidth for large buffers.


I'm glad to hear that!
That's just what I wanted! :D

Now I have learned these things:

    We should use 64-bit transfers as much as possible to maximize transfer speed.

    The theoretical data write bandwidth is
    [Epiphany clock] bytes/sec
    (about 636 MB/s at 667 MHz).

I suspect the burst mode can be disrupted by arbitrary external memory accesses from multiple cores, so my idea (using ONE core as a memory-management core that serializes the other cores' memory accesses) may be suitable for such a case.

Now I want to know which core has the minimal latency and hop count for external memory access.

At the moment reads are slower than writes, but this may also be overcome with these methods:

    An eCore sends a user-defined block-read command to the host
    (the command carries a source address, a destination address, and a transfer size).

    The host stores these commands in a command queue (to prevent blocking).

    The host reads the queue, then sends the block data to the Epiphany using burst transfer mode.

Re: Memory transfer benchmark

Postby ticso » Wed Jul 03, 2013 8:18 am

What options are available to copy from the host into Epiphany memory without the Epiphany issuing read requests?

So far I've seen memcpy() on the ARM, which was benchmarked in this thread.
Was that generic ARM memcpy or NEON-enhanced code?

I've heard about DMA. Does the ARM or the current FPGA design have a generic-transfer DMA?
Another thread, however, makes me believe there is no such thing right now and that it would need to be implemented in the FPGA.

Re: Memory transfer benchmark

Postby shodruk » Wed Jul 03, 2013 9:44 am

Maybe this page can help you, though it doesn't look very easy:

http://www.wiki.xilinx.com/Zynq+Linux+pl330+DMA

Re: Memory transfer benchmark

Postby tnt » Wed Jul 03, 2013 9:55 am

The bad performance when writing data from the ARM to the Epiphany is most likely not due to the ARM itself, but to the fact that this datapath goes through the GP AXI slave interface, which is really not meant for high-performance transfers. I'm not sure that using the DMA would help all that much there.

The best option would be a DMA engine on an HP AXI port that reads data from DDR and writes it directly to the e-link, skipping a lot of layers of the interconnect.

Re: Memory transfer benchmark

Postby shodruk » Wed Jul 03, 2013 12:21 pm

Is it possible for the host or an eCore to kick off a DMA transfer from ERAM to SRAM?
