Re: Memory transfer benchmark

PostPosted: Wed Jul 03, 2013 12:44 pm
by tnt
shodruk wrote:Is it possible the host or the eCore kicks the DMA from ERAM to SRAM?


Huh ? Can you rephrase the question ?

Re: Memory transfer benchmark

PostPosted: Wed Jul 03, 2013 2:00 pm
by shodruk
I'm sorry, English is difficult for me... :)
again,
Is it possible to let the host (or the eCore) kick off the DMA from ERAM to SRAM?

Re: Memory transfer benchmark

PostPosted: Sat Jul 20, 2013 6:06 am
by mipt98
Any chance you could share your transfer bandwidth benchmark code?
-Ivan

Re: Memory transfer benchmark

PostPosted: Thu Oct 24, 2013 9:39 am
by jimmystone
Could you upload your test code? Also, what about memory access latency?
ysapir wrote:Here's the output of my memory access speed test, for E64G4:

Code: Select all
Testing SRAM speed.
Host -> SRAM: Write speed =   17.12 MBps
Host <- SRAM: Read speed  =   20.93 MBps

Testing ERAM speed.
Host -> ERAM: Write speed =  100.83 MBps
Host <- ERAM: Read speed  =  136.66 MBps

Testing chip speed (@ 600MHz)
Core -> SRAM: Write speed = 1949.88 MBps   clocks = 2404
Core <- SRAM: Read speed  =  480.82 MBps   clocks = 9749
Core -> ERAM: Write speed =  304.05 MBps   clocks = 15417
Core <- ERAM: Read speed  =  153.31 MBps   clocks = 30576



and here's for E16G3:

Code: Select all
Testing SRAM speed.
Host -> SRAM: Write speed =   14.62 MBps
Host <- SRAM: Read speed  =   17.85 MBps

Testing ERAM speed.
Host -> ERAM: Write speed =  100.71 MBps
Host <- ERAM: Read speed  =  135.42 MBps

Testing chip speed (@ 600MHz)
Core -> SRAM: Write speed = 1286.01 MBps   clocks = 3645
Core <- SRAM: Read speed  =  406.80 MBps   clocks = 11523
Core -> ERAM: Write speed =  235.88 MBps   clocks = 19872
Core <- ERAM: Read speed  =   85.99 MBps   clocks = 54514

Re: Memory transfer benchmark

PostPosted: Thu Oct 24, 2013 9:06 pm
by mhonman
I'd imagine those results are from e_dma_copy, copying about 6KB of data (DMA doesn't need to read instructions, so makes the best use of available memory bandwidth).

Have you had a look through the Adapteva and Embecosm repositories on Github? It's a goldmine! I haven't specifically seen this example there, but you may be in luck.

Regarding latency, there are effectively 3 memory tiers - internal SRAM, other cores' SRAM, and external DRAM. Other than accesses to internal SRAM, memory reads and writes are routed via the on-chip mesh network, and external memory accesses go via an off-chip interface, through the FPGA, to the DRAM chip.

Internal RAM is IIRC single-cycle for read and write, but for off-core accesses the latency increases with the number of hops across the mesh - see the documentation for details. External memory latency is going to be affected by a combination of mesh latency, DRAM speed (plus effects of contention with the host program), and the speed of the interface between Epiphany and FPGA. Given the number of variables, if you wanted to know you'd have to measure it!* But the consensus seems to be that external RAM reads are a major bottleneck.

I'm not a hardware guy so may have got the wrong end of the stick here, but if you study the documentation I think you'll get most of the answers you're looking for.

* (possible measurement approach: start a single-word DMA transfer and count the number of cycles until the completion interrupt. There is a DMA setup overhead but this can be factored out by measuring the time taken for a word to be read from an adjacent core).
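The subtraction described in that footnote is easy to get wrong in units, so here's a minimal sketch of the arithmetic. All cycle counts below are hypothetical placeholders for illustration, not measurements from a board:

```python
def dma_payload_cycles(total_cycles: int, setup_cycles: int) -> int:
    """Cycles attributable to the transfer itself, once the fixed DMA
    setup overhead (measured separately, e.g. via an adjacent-core
    single-word read) is subtracted."""
    return total_cycles - setup_cycles

def cycles_to_ns(cycles: float, clk_hz: float = 600e6) -> float:
    """Convert a cycle count to nanoseconds at the given core clock."""
    return cycles / clk_hz * 1e9

# Hypothetical example: suppose a single-word DMA read from an adjacent
# core completes in 120 cycles and the same read from ERAM in 450.
# Treating the adjacent-core figure as (setup + near-minimal mesh
# latency), the extra cost of going off-chip is roughly:
eram_extra = dma_payload_cycles(450, 120)   # 330 cycles (illustrative)
eram_extra_ns = cycles_to_ns(eram_extra)    # 550.0 ns at 600 MHz
```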

Re: Memory transfer benchmark

PostPosted: Wed Dec 03, 2014 8:23 pm
by grzeskob
I would like to refresh the topic and ask a question about Core -> SRAM speed with DMA.

Testing chip speed (@ 600MHz)
Core -> SRAM: Write speed = 1286.01 MBps clocks = 3645

Why do we get 1.29 GBps if the max sustained data transfer rate for DMA is 8 GBps?
Epiphany Architecture Reference REV 14.03.11
The DMA engine works at the same clock frequency as the CPU and can
transfer one 64-bit double word per clock cycle, enabling a sustained data transfer rate of
8GB/sec.


cMesh: Used for write transactions destined for an on-chip mesh node. The cMesh network
connects a mesh node to all four of its neighbors and has a maximum bidirectional
throughput of 8 bytes/cycle in each of the four routing directions.


Please correct me if I am wrong - the DMA transfer will be limited by the cMesh (max one-direction throughput will be 4 bytes/cycle). But even with this constraint I would still expect something around (600 MHz * 4 bytes/cycle) = 2.4 GBps?
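One piece of the puzzle: the quoted 8 GB/s figure corresponds to 8 bytes/cycle at a 1 GHz clock, so at 600 MHz the same engine tops out at 4.8 GB/s. The posted numbers are also mutually consistent if the benchmark moved about 8 KiB per test and "MBps" means MiB/s - but note the 8 KiB buffer size is my assumption, it isn't stated anywhere in the thread. A quick back-of-envelope check:

```python
CLK_HZ = 600_000_000  # 600 MHz core/DMA clock

def mibps(nbytes: int, clocks: int, clk_hz: int = CLK_HZ) -> float:
    """Effective bandwidth in MiB/s for a transfer of `nbytes`
    that took `clocks` cycles at `clk_hz`."""
    return nbytes * clk_hz / clocks / 2**20

# Assuming an 8 KiB test buffer reproduces the posted numbers closely:
print(round(mibps(8192, 3645), 2))  # E16G3 Core -> SRAM write, ~1286 MiB/s
print(round(mibps(8192, 2404), 2))  # E64G4 Core -> SRAM write, ~1950 MiB/s

# Theoretical ceilings at 600 MHz:
print(CLK_HZ * 8 / 1e9)  # 4.8 GB/s at 8 bytes/cycle (cMesh write, per spec)
print(CLK_HZ * 4 / 1e9)  # 2.4 GB/s at 4 bytes/cycle (the figure asked about)

# Effective payload per clock actually achieved:
print(round(8192 / 3645, 2))  # ~2.25 bytes/clock on E16G3
```

So even under the 4 bytes/cycle reading, the measured transfer is only achieving a bit over 2 bytes/clock; the remaining gap is presumably setup overhead, mesh contention, or per-burst stalls rather than the raw link rate.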