
Memory transfer benchmark

PostPosted: Thu May 02, 2013 6:43 pm
by tnt
Has anyone done benchmarks of core to/from external DRAM transfers?

I thought I had seen some a few months ago but can't find them ...

Cheers,

Sylvain

Re: Memory transfer benchmark

PostPosted: Thu May 02, 2013 8:19 pm
by ysapir
Here's the output of my memory access speed test, for E64G4:

Code:
Testing SRAM speed.
Host -> SRAM: Write speed =   17.12 MBps
Host <- SRAM: Read speed  =   20.93 MBps

Testing ERAM speed.
Host -> ERAM: Write speed =  100.83 MBps
Host <- ERAM: Read speed  =  136.66 MBps

Testing chip speed (@ 600MHz)
Core -> SRAM: Write speed = 1949.88 MBps   clocks = 2404
Core <- SRAM: Read speed  =  480.82 MBps   clocks = 9749
Core -> ERAM: Write speed =  304.05 MBps   clocks = 15417
Core <- ERAM: Read speed  =  153.31 MBps   clocks = 30576



and here's for E16G3:

Code:
Testing SRAM speed.
Host -> SRAM: Write speed =   14.62 MBps
Host <- SRAM: Read speed  =   17.85 MBps

Testing ERAM speed.
Host -> ERAM: Write speed =  100.71 MBps
Host <- ERAM: Read speed  =  135.42 MBps

Testing chip speed (@ 600MHz)
Core -> SRAM: Write speed = 1286.01 MBps   clocks = 3645
Core <- SRAM: Read speed  =  406.80 MBps   clocks = 11523
Core -> ERAM: Write speed =  235.88 MBps   clocks = 19872
Core <- ERAM: Read speed  =   85.99 MBps   clocks = 54514

Re: Memory transfer benchmark

PostPosted: Thu May 02, 2013 9:30 pm
by tnt
Thanks, that's consistent with what I get (87 MB/s read, 234 MB/s write).

But that's pretty low; the interface peak is supposed to be something like 900 MB/s, right?

Re: Memory transfer benchmark

PostPosted: Thu May 02, 2013 10:02 pm
by ysapir
Please note that the host transfer speeds were measured using memcpy() calls (which is how the e_read() and e_write() APIs are implemented). You can probably get better performance using DMA.

Re: Memory transfer benchmark

PostPosted: Thu May 02, 2013 10:05 pm
by tnt
I assumed that these:

Core -> ERAM: Write speed = 235.88 MBps clocks = 19872
Core <- ERAM: Read speed = 85.99 MBps clocks = 54514


are done with the DMA ?

At least they match the speed I get when doing DMA (assuming MBps means 'megabytes per second' and not 'megabits per second').

Re: Memory transfer benchmark

PostPosted: Thu May 02, 2013 11:10 pm
by ysapir
Yes.

Re: Memory transfer benchmark

PostPosted: Fri May 03, 2013 1:57 pm
by shodruk
Hmm... Host <-> ERAM bandwidth is strangely slow.

What is the Zynq's DDR configuration (operating frequency, DRAM bus width) ?

Re: Memory transfer benchmark

PostPosted: Fri May 03, 2013 8:56 pm
by ysapir
I added a section to the test, measuring the memcpy() speed within application space (virtual memory). It turns out that memcpy() between buffers within the host application (i.e., virtual-to-virtual space, inside the O/S DRAM segment) achieves speeds of 240 MBps.

This is about 2x the speed of DRAM to/from ERAM (reminder: ERAM here is the segment of the board's DRAM dedicated to the Epiphany and not seen by Linux). I am open to explanations of *why* the two operations differ so much in speed.

Looking at the memcpy() disassembly code, it looks like the copy is done via reg read/write and not DMA.

Regarding the ZedBoard's spec - according to Roman, the default ZedBoard configuration is a 533MHz operating frequency with a 32-bit effective DRAM bus width, which means ~2 GBps. However, we will look further into the documentation to see if our actual settings are different.

Re: Memory transfer benchmark

PostPosted: Fri May 03, 2013 9:17 pm
by tnt
ysapir wrote:This is about 2x the speed of DRAM to/from ERAM (reminder: ERAM here is the segment of the board's DRAM dedicated to the Epiphany and not seen by Linux). I am open to explanations of *why* the two operations differ so much in speed.


The ERAM zone is most likely mapped as non-cacheable, with no prefetch, no write combining, or any of those things that optimize data access. But it's also those same things being disabled that make it "easy" and mean you don't have any cache coherency issues when talking to the Epiphany :)


ysapir wrote:Looking at the memcpy() disassembly code, it looks like the copy is done via reg read/write and not DMA.


Yes, userspace wouldn't have any way to control a DMA peripheral anyway, and the libc would have to know about the hardware specifics ...

Cheers,

Sylvain

Re: Memory transfer benchmark

PostPosted: Thu Jun 20, 2013 4:57 am
by shodruk
Is this understanding about eLink correct?

eLink packet size is always 104 bits.
(data:32, src_address:32, dst_address:32, control:8)

Writing 32 bits costs 104 bits of bandwidth.

Reading 32 bits costs 208 bits of bandwidth.
(104 bits for the request, 104 bits for the response)