Memory transfer benchmark

Re: Memory transfer benchmark

Post by tnt » Wed Jul 03, 2013 12:44 pm

shodruk wrote: Is it possible the host or the eCore kicks the DMA from ERAM to SRAM?


Huh? Can you rephrase the question?

Re: Memory transfer benchmark

Post by shodruk » Wed Jul 03, 2013 2:00 pm

I'm sorry, English is difficult for me... :)
Again:
Is it possible to have the host (or the eCore) kick off a DMA transfer from ERAM to SRAM?
Shodruky

Re: Memory transfer benchmark

Post by mipt98 » Sat Jul 20, 2013 6:06 am

Any chance you could share your transfer bandwidth benchmark code?
-Ivan

Re: Memory transfer benchmark

Post by jimmystone » Thu Oct 24, 2013 9:39 am

Could you upload your test code? Also, what about memory access latency?
ysapir wrote: Here's the output of my memory access speed test, for E64G4:

Testing SRAM speed.
Host -> SRAM: Write speed =   17.12 MBps
Host <- SRAM: Read speed  =   20.93 MBps

Testing ERAM speed.
Host -> ERAM: Write speed =  100.83 MBps
Host <- ERAM: Read speed  =  136.66 MBps

Testing chip speed (@ 600Mz)
Core -> SRAM: Write speed = 1949.88 MBps   clocks = 2404
Core <- SRAM: Read speed  =  480.82 MBps   clocks = 9749
Core -> ERAM: Write speed =  304.05 MBps   clocks = 15417
Core <- ERAM: Read speed  =  153.31 MBps   clocks = 30576



and here's for E16G3:

Testing SRAM speed.
Host -> SRAM: Write speed =   14.62 MBps
Host <- SRAM: Read speed  =   17.85 MBps

Testing ERAM speed.
Host -> ERAM: Write speed =  100.71 MBps
Host <- ERAM: Read speed  =  135.42 MBps

Testing chip speed (@ 600Mz)
Core -> SRAM: Write speed = 1286.01 MBps   clocks = 3645
Core <- SRAM: Read speed  =  406.80 MBps   clocks = 11523
Core -> ERAM: Write speed =  235.88 MBps   clocks = 19872
Core <- ERAM: Read speed  =   85.99 MBps   clocks = 54514

Re: Memory transfer benchmark

Post by mhonman » Thu Oct 24, 2013 9:06 pm

I'd imagine those results are from e_dma_copy, copying about 6KB of data (DMA doesn't need to read instructions, so makes the best use of available memory bandwidth).
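
As a rough cross-check, the transfer size can be back-calculated from the quoted output. The sketch below assumes that "MBps" in the output means 2^20 bytes per second and that the clock counts were taken at the stated 600 MHz; neither is confirmed in this thread, and the function name is made up for illustration.

```python
CLOCK_HZ = 600e6          # core clock stated in the benchmark output
BYTES_PER_MB = 2 ** 20    # assumption: "MBps" uses binary megabytes

def transfer_size_bytes(mbps, clocks, clock_hz=CLOCK_HZ):
    """Bytes moved = rate (bytes/s) * elapsed time (s)."""
    elapsed_s = clocks / clock_hz
    return mbps * BYTES_PER_MB * elapsed_s

# (speed in MBps, clock count) pairs copied from the quoted output
samples = {
    "E64G4 Core -> SRAM": (1949.88, 2404),
    "E64G4 Core <- SRAM": (480.82, 9749),
    "E64G4 Core -> ERAM": (304.05, 15417),
    "E64G4 Core <- ERAM": (153.31, 30576),
    "E16G3 Core -> SRAM": (1286.01, 3645),
    "E16G3 Core <- SRAM": (406.80, 11523),
    "E16G3 Core -> ERAM": (235.88, 19872),
    "E16G3 Core <- ERAM": (85.99, 54514),
}

for name, (mbps, clocks) in samples.items():
    print(f"{name}: {transfer_size_bytes(mbps, clocks):.0f} bytes")
```

Under those assumptions every row works out to roughly 8192 bytes, which would point to an 8 KB transfer per test.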

Have you had a look through the Adapteva and Embecosm repositories on GitHub? They're a goldmine! I haven't specifically seen this example there, but you may be in luck.

Regarding latency, there are effectively three memory tiers: internal SRAM, other cores' SRAM, and external DRAM. Apart from accesses to internal SRAM, memory reads and writes are routed via the on-chip mesh network, and external memory accesses go via an off-chip interface, through the FPGA, to the DRAM chip.

Internal RAM is, IIRC, single-cycle for read and write, but for off-core accesses the latency increases with the number of hops across the mesh; see the documentation for details. External memory latency is going to be affected by a combination of mesh latency, DRAM speed (plus the effects of contention with the host program), and the speed of the interface between the Epiphany and the FPGA. Given the number of variables, if you want to know you'll have to measure it!* But the consensus seems to be that external RAM reads are a major bottleneck.
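
To make the "latency grows with hops" point concrete, here is a minimal sketch. It assumes transactions travel only along rows and columns of the 2D mesh with no detours, so the hop count is the Manhattan distance between core coordinates. `base_cycles` and `cycles_per_hop` are placeholder parameters, not documented figures; they would have to come from the documentation or from measurement.

```python
def mesh_hops(src, dst):
    """Hop count between two (row, col) core coordinates (Manhattan distance)."""
    (r0, c0), (r1, c1) = src, dst
    return abs(r1 - r0) + abs(c1 - c0)

def estimated_latency_cycles(src, dst, base_cycles, cycles_per_hop):
    """Simple linear model: fixed cost plus a per-hop cost (both placeholders)."""
    return base_cycles + cycles_per_hop * mesh_hops(src, dst)

# Corner-to-corner distances on a 4x4 and an 8x8 grid of cores
print(mesh_hops((0, 0), (3, 3)))  # 6 hops
print(mesh_hops((0, 0), (7, 7)))  # 14 hops
```

The linear model ignores contention on the mesh, so it is at best a lower bound on what a real access would see.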

I'm not a hardware guy so may have got the wrong end of the stick here, but if you study the documentation I think you'll get most of the answers you're looking for.

* (possible measurement approach: start a single-word DMA transfer and count the number of cycles until the completion interrupt. There is a DMA setup overhead but this can be factored out by measuring the time taken for a word to be read from an adjacent core).
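
The subtraction in that measurement approach could look like the sketch below. The function name and the cycle counts are hypothetical placeholders, not measured values; the median is used only to damp run-to-run noise.

```python
from statistics import median

def extra_latency_cycles(target_cycles, neighbour_cycles):
    """Cycles attributable to the target memory beyond an adjacent-core read.

    Both arguments are lists of cycle counts for single-word DMA transfers.
    The adjacent-core runs serve as the baseline that contains the DMA
    setup overhead, so subtracting them factors that overhead out.
    """
    return median(target_cycles) - median(neighbour_cycles)

# Placeholder numbers purely to show the call; replace with real measurements.
eram_runs = [100, 102, 101]
neighbour_runs = [40, 41, 39]
print(extra_latency_cycles(eram_runs, neighbour_runs))
```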

Re: Memory transfer benchmark

Post by grzeskob » Wed Dec 03, 2014 8:23 pm

I would like to revive this topic and ask a question about Core -> SRAM speed with DMA.

Testing chip speed (@ 600Mz)
Core -> SRAM: Write speed = 1286.01 MBps clocks = 3645

Why do we get only 1.29 GBps if the maximum sustained data transfer rate for DMA is 8 GBps?

Epiphany Architecture Reference, REV 14.03.11:

"The DMA engine works at the same clock frequency as the CPU and can transfer one 64-bit double word per clock cycle, enabling a sustained data transfer rate of 8GB/sec."

"cMesh: Used for write transactions destined for an on-chip mesh node. The cMesh network connects a mesh node to all four of its neighbors and has a maximum bidirectional throughput of 8 bytes/cycle in each of the four routing directions."


Please correct me if I am wrong: the DMA transfer will be limited by the cMesh (the maximum one-direction throughput will be 4 bytes/cycle). But even with this constraint, shouldn't I still get something around 600 MHz * 4 bytes/cycle = 2.4 GBps?
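
The arithmetic can be laid out as a short sketch. One point worth noting is that 8 bytes per cycle only reaches 8 GB/s at a 1 GHz clock; at the 600 MHz used in the test, the same width gives 4.8 GB/s. The helper names are made up for illustration, and the sketch assumes "MBps" means 10^6 bytes per second (with 2^20 the measured figure is slightly higher).

```python
CLOCK_HZ = 600e6  # clock stated in the benchmark output

def peak_bytes_per_s(bytes_per_cycle, clock_hz=CLOCK_HZ):
    """Theoretical peak rate for a given transfer width per clock cycle."""
    return bytes_per_cycle * clock_hz

def measured_bytes_per_cycle(mbps, clock_hz=CLOCK_HZ, bytes_per_mb=1e6):
    """Convert a measured MBps figure into bytes moved per clock cycle."""
    return mbps * bytes_per_mb / clock_hz

print(peak_bytes_per_s(8) / 1e9)         # 8 bytes/cycle at 600 MHz, in GB/s
print(peak_bytes_per_s(4) / 1e9)         # 4 bytes/cycle at 600 MHz, in GB/s
print(measured_bytes_per_cycle(1286.01)) # E16G3 Core -> SRAM write
print(measured_bytes_per_cycle(1949.88)) # E64G4 Core -> SRAM write
```

Expressed this way, the measured E16G3 write comes out a little above 2 bytes per cycle and the E64G4 write a little above 3, so the gap to the 4 bytes/cycle estimate is smaller than the comparison with 8 GB/s suggests.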
