Parallella Memory benchmark

Moderator: aolofsson

Postby grzeskob » Tue Dec 09, 2014 8:59 pm

I would like to ask a question about the Core -> SRAM write speed with DMA.

Testing chip speed (@ 600Mz)
Core -> SRAM: Write speed = 1286.01 MBps clocks = 3645

Why do I get 1.29 GBps if the maximum sustained data transfer rate for DMA is 8 GBps?
Epiphany Architecture Reference REV 14.03.11

The DMA engine works at the same clock frequency as the CPU and can
transfer one 64-bit double word per clock cycle, enabling a sustained data transfer rate of

cMesh: Used for write transactions destined for an on-chip mesh node. The cMesh network
connects a mesh node to all four of its neighbors and has a maximum bidirectional
throughput of 8 bytes/cycle in each of the four routing directions.

Please correct me if I am wrong: the DMA transfer will be limited by the cMesh (the maximum one-direction throughput will be 4 bytes/cycle). But even with this constraint, should I not still get 600 MHz * 4 bytes/cycle = 2.4 GBps?

I have also found Errata #1 in the e16g301_datasheet:
The DMA engine bandwidth per channel is stuck at 50% throttle, meaning that each DMA channel can
transfer at most 1 double word every two clock cycles.

But that should still give me 1 double word (64 bits = 8 bytes) per 2 cycles, which means 4 bytes per cycle and leads to 600 MHz x 4 bytes/cycle = 2.4 GBps?
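
The arithmetic above can be checked with a short Python sketch. It only restates the figures quoted in this post (600 MHz clock, one 64-bit double word per cycle, the 50% throttle from Errata #1); the function and variable names are made up for illustration, and 1 GB is assumed to mean 1e9 bytes.

```python
CLOCK_HZ = 600e6        # core clock reported by the benchmark
DOUBLE_WORD_BYTES = 8   # one 64-bit double word


def peak_gb_per_s(bytes_per_cycle, clock_hz=CLOCK_HZ):
    """Theoretical bandwidth in GB/s, assuming 1 GB = 1e9 bytes."""
    return bytes_per_cycle * clock_hz / 1e9


# One double word every cycle (the unthrottled DMA rate at this clock).
unthrottled = peak_gb_per_s(DOUBLE_WORD_BYTES)

# Errata #1 as quoted: at most one double word every two clock cycles.
throttled = peak_gb_per_s(DOUBLE_WORD_BYTES / 2)

print(f"unthrottled: {unthrottled:.1f} GB/s")
print(f"throttled:   {throttled:.1f} GB/s")
```

Under these assumptions the single 50% throttle alone would still allow 2.4 GB/s, which is why the measured 1.29 GBps looks too low.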

Thank you for any hints on this topic.
Posts: 12
Joined: Mon Nov 17, 2014 8:36 pm

Re: Parallella Memory benchmark

Postby grzeskob » Mon Dec 15, 2014 1:13 pm

Could someone give me some advice?
I am stuck on this problem.
I have spent a long time searching the documentation and trying different changes to the benchmark app:

- Is my calculation wrong, so that I cannot expect 2.4 GBps?
- Or is there a SW bug in the benchmark that I can fix to get 2.4 GBps?
Posts: 12
Joined: Mon Nov 17, 2014 8:36 pm

Re: Parallella Memory benchmark

Postby aolofsson » Mon Dec 15, 2014 1:20 pm

Sorry for the slow reply!

The cMesh can transfer 8 bytes/cycle at each node in each direction. At 600 MHz this implies a peak bandwidth of 4.8 GB/s. However, due to two errata items, the DMA bandwidth out of one core is limited to ~25% of this. This is documented in the datasheets of the E16G301 and E64G401 processors. As regrettable as this is, we have found the existing on-chip bandwidth to be the least of our problems (see the FFT and matmul benchmarks on GitHub for examples showing effective on-chip communication patterns). The 1.2 GB/s is still much higher than the off-chip bandwidth.
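
As a rough cross-check of the ~25% figure, the measured 1286.01 MBps from the first post can be compared against the 4.8 GB/s peak. This is only a sketch: the thread does not say whether the benchmark's "MB" means 1e6 or 2^20 bytes, so both readings are computed, and the names below are illustrative.

```python
CLOCK_HZ = 600e6            # core clock from the benchmark output
PEAK_BYTES_PER_CYCLE = 8    # cMesh width per direction, as stated above
REPORTED_MBPS = 1286.01     # "Write speed" from the first post


def fraction_of_peak(measured_bytes_per_s):
    """Measured bandwidth as a fraction of the 8 bytes/cycle peak."""
    peak_bytes_per_s = PEAK_BYTES_PER_CYCLE * CLOCK_HZ
    return measured_bytes_per_s / peak_bytes_per_s


# Assumption: the unit of "MBps" is not stated, so try both conventions.
decimal_reading = fraction_of_peak(REPORTED_MBPS * 1e6)    # 1 MB = 1e6 bytes
binary_reading = fraction_of_peak(REPORTED_MBPS * 2**20)   # 1 MB = 2^20 bytes

print(f"decimal MB: {decimal_reading:.1%} of peak")
print(f"binary MB:  {binary_reading:.1%} of peak")
```

With either reading the measurement comes out between roughly 26% and 29% of peak, which is in line with the "~25%" limit described above.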

What are you trying to test?

Posts: 1005
Joined: Tue Dec 11, 2012 6:59 pm
Location: Lexington, Massachusetts,USA

Re: Parallella Memory benchmark

Postby grzeskob » Mon Dec 15, 2014 2:42 pm

Hi Andreas,

Thank you for your answer. It has helped me a lot.
I am doing my thesis on the Parallella board. The first step is to measure and validate the peak performance between the different memory blocks on the board. I have already seen the posts about ERAM<->SRAM bandwidth problems. Later on I want to find the bottlenecks and possible congestion points and try to optimize apps with this knowledge.

Posts: 12
Joined: Mon Nov 17, 2014 8:36 pm
