dot product

Re: dot product

Postby smatthews » Wed Aug 26, 2015 2:32 pm

Sorry, you're right. Each e-core has 32KB of local memory (not 8KB).

Thus for dot product, we could theoretically place two 16KB arrays on each e-core. With 4 bytes per integer, that is 4,096 elements per array on each core. Over 16 cores, we could support arrays up to 65,536 in length without requiring a fetch to main memory. The DMA cost I was talking about was the DMA channel between the ARM and the Epiphany chip, not the elinks between cores. If we want to exceed array sizes of 65,536, we would have to write our program in such a way to use the DMA channel to fetch new portions of the array from the 2GB main memory bank on the ARM chip. As the array gets large, I imagine this performance cost will get prohibitively high.

Back to the issue with arrays longer than 4096 elements. The sum of products (SOP) of two integer arrays that each hold the values i = 0 ... n-1, for n = 4096, is 22,898,104,320. This exceeds the capacity of a 4-byte int or long. There is a long long type that is 8 bytes, which should hold this value. However, I ran into trouble when I tried to change the type of the sop variable to unsigned long long. That's why I left it as an open problem.
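
For reference, the overflow itself can be reproduced on any host with plain C. This is only a sketch: it assumes both arrays hold the values 0 ... n-1, so the sum of products is a sum of squares, and the function names are made up for illustration. It does not explain why the unsigned long long change failed on the Epiphany side.

```c
#include <stdint.h>

/* Hypothetical helper: sum of products of two arrays that both hold 0..n-1,
   accumulated in 64 bits so the result does not wrap for n = 4096. */
uint64_t sop_u64(uint32_t n)
{
    uint64_t sum = 0;
    for (uint32_t i = 0; i < n; i++)
        sum += (uint64_t)i * i;   /* widen before multiplying */
    return sum;
}

/* Same loop with a 32-bit accumulator: unsigned arithmetic wraps modulo 2^32,
   so the result is silently truncated once the true sum passes UINT32_MAX. */
uint32_t sop_u32(uint32_t n)
{
    uint32_t sum = 0;
    for (uint32_t i = 0; i < n; i++)
        sum += i * i;
    return sum;
}
```

With n = 4096, sop_u64 returns 22,898,104,320, while sop_u32 returns only the low 32 bits of that value.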

-Suzanne
smatthews
 
Posts: 13
Joined: Fri Mar 13, 2015 7:04 pm

Re: dot product

Postby sebraa » Wed Aug 26, 2015 7:44 pm

smatthews wrote:If we want to exceed array sizes of 65,536, we would have to write our program in such a way to use the DMA channel to fetch new portions of the array from the 2GB main memory bank on the ARM chip. As the array gets large, I imagine this performance cost will get prohibitively high.
The ARM chip only has 1 GB of memory, and officially you can't access it from the Epiphany anyway. You would want to create some kind of ring buffer in shared memory (which is 32 MB in size, but only half of it is freely usable). The ARM processor then would feed this buffer, while the Epiphany would simultaneously read from it, using the DMA engine.
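
A minimal sketch of such a ring buffer in plain C, for illustration only. The struct layout, the names and the slot count are assumptions, not eSDK API. On real hardware the struct would live in the shared memory window, the producer would be the ARM host and the consumer an e-core (possibly pulling the data with its DMA engine), and memory ordering between the two chips would need more care than shown here.

```c
#include <stdint.h>

#define RING_SLOTS 8u   /* assumed power of two so indices can wrap by masking */

/* Single-producer, single-consumer ring. head is written only by the
   producer and tail only by the consumer, so this sketch needs no lock. */
typedef struct {
    volatile uint32_t head;             /* total items ever written */
    volatile uint32_t tail;             /* total items ever read    */
    int32_t           slot[RING_SLOTS];
} ring_t;

/* Producer side: returns 1 on success, 0 if the ring is full. */
int ring_put(ring_t *r, int32_t value)
{
    uint32_t head = r->head;
    if (head - r->tail == RING_SLOTS)
        return 0;
    r->slot[head & (RING_SLOTS - 1u)] = value;
    r->head = head + 1u;                /* publish after the data is in place */
    return 1;
}

/* Consumer side: returns 1 and stores the item in *out, 0 if the ring is empty. */
int ring_get(ring_t *r, int32_t *out)
{
    uint32_t tail = r->tail;
    if (tail == r->head)
        return 0;
    *out = r->slot[tail & (RING_SLOTS - 1u)];
    r->tail = tail + 1u;                /* free the slot for the producer */
    return 1;
}
```

The counters only ever increase and are reduced to a slot index by masking, so "full" and "empty" can be told apart without wasting a slot.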
sebraa
 
Posts: 495
Joined: Mon Jul 21, 2014 7:54 pm

Re: dot product

Postby dobkeratops » Thu Aug 27, 2015 11:38 am

sebraa wrote:You would want to create some kind of ring buffer in shared memory (which is 32 MB in size, but only half of it is freely usable). The ARM processor then would feed this buffer, while the Epiphany would simultaneously read from it, using the DMA engine.


Was there something in progress to allow ARM <-> Epiphany communication without going through DDR, with the buffering happening in FPGA memory? That should be lower latency, right (and use less DDR bandwidth)? 1 hop between physical chips rather than 3: Zynq.arm->DDR.shared->Zynq.fpga_dma->epiphany.

Was it already possible for the FPGA to DMA from 'anywhere in DDR to epiphany', just not the other way round?
dobkeratops
 
Posts: 189
Joined: Fri Jun 05, 2015 6:42 pm
Location: uk

Re: dot product

Postby smatthews » Fri Aug 28, 2015 12:17 pm

Interesting. I remember this line of code from an earlier version of the hello world example:

Code:
char outbuf[128] SECTION("shared_dram");


I had thought that the "shared dram" could refer to external shared memory of the device, and assumed it meant the memory on the ARM chip.

I see in the manual that the Parallella is configured by default to use a pool of 32 MB DRAM device memory. I must have missed that. Is this diagrammed anywhere? Most of the illustrations I see describing the Epiphany chip refer to the 32KB banks on each e-core, and show how the 32KB address space is mapped out.

Thanks for the clarification!
smatthews
 
Posts: 13
Joined: Fri Mar 13, 2015 7:04 pm

Re: dot product

Postby sebraa » Fri Aug 28, 2015 4:36 pm

smatthews wrote:I had thought that the "shared dram" could refer to external shared memory of the device, and assumed it meant the memory on the ARM chip.
That is true.

smatthews wrote:I see in the manual that the Parallella is configured by default to use a pool of 32 MB DRAM device memory. I must have missed that. Is this diagrammed anywhere? Most of the illustrations I see describing the Epiphany chip refer to the 32KB banks on each e-core, and show how the 32KB address space is mapped out.
Each e-core uses 1 MB of address space, of which 32 KB is memory. But since you don't have 4096 e-cores, quite a lot of that address space is unused. Inside the FPGA, a 32 MB block of the ARM DDR memory is mapped into this address space on the Epiphany (at addresses 0x8e000000 to 0x8fffffff). The Linux kernel running on the ARM is instructed to not even know about this memory, since it is manually managed (by the eSDK).

The upper half of that memory (put into the Epiphany ELF section "shared_dram") is more or less(*) available, while the lower half is used for some parts of the C library, depending on the linker script chosen.
(*) This memory is also designated as a per-core heap, but for some reason that didn't work correctly when I tested it; dynamic memory is not the best choice for Epiphany applications anyway.
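
The address arithmetic above can be sketched in C. The 12-bit core ID in the upper address bits (6 bits of row, 6 bits of column) follows from 4096 possible cores at 1 MB each, and the window bounds are the ones quoted above. The function and macro names are made up for this sketch, and the (32, 8) origin used in the note below is an assumption about the default layout of the 16-core Parallella.

```c
#include <stdint.h>

#define CORE_SPACE_BITS 20u            /* 1 MB of address space per e-core   */
#define SHARED_BASE     0x8e000000u    /* first byte of the shared window    */
#define SHARED_LAST     0x8fffffffu    /* last byte of the shared window     */

/* Global address of a byte in a core's local memory: the core ID
   (row in the upper 6 bits, column in the lower 6) sits above the
   20-bit local offset. */
uint32_t core_global_addr(uint32_t row, uint32_t col, uint32_t offset)
{
    uint32_t core_id = ((row & 0x3fu) << 6) | (col & 0x3fu);
    return (core_id << CORE_SPACE_BITS) | (offset & 0xfffffu);
}

/* Nonzero if addr falls inside the window mapped to ARM DDR memory. */
int in_shared_window(uint32_t addr)
{
    return addr >= SHARED_BASE && addr <= SHARED_LAST;
}

/* Size of the shared window in bytes. */
uint32_t shared_window_bytes(void)
{
    return SHARED_LAST - SHARED_BASE + 1u;
}
```

Here core_global_addr(32, 8, 0) gives 0x80800000 and shared_window_bytes() gives 33,554,432 bytes, i.e. 32 MB, so the upper half of the window starts at 0x8f000000.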
sebraa
 
Posts: 495
Joined: Mon Jul 21, 2014 7:54 pm

Re: dot product

Postby smatthews » Mon Aug 31, 2015 12:55 pm

Does the 32MB also apply to the 16-core Epiphany chips, or just the 64-core chips? The manuals sometimes do not make the distinction clear in their descriptions.

Is there a figure anywhere in the manuals that diagrams this memory block? I haven't seen one yet, and I think it would really add to people's understanding to see it pictorially.
smatthews
 
Posts: 13
Joined: Fri Mar 13, 2015 7:04 pm

Re: dot product

Postby sebraa » Tue Sep 01, 2015 12:06 pm

smatthews wrote:Does the 32MB also apply to the 16-core Epiphany chips, or just the 64-core chips? The manuals sometimes do not make the distinction clear in their descriptions.
The 32 MB limit is not an Epiphany limit, but a Parallella limit, so it applies to both the 16- and the 64-core Parallella versions. The Epiphany chip itself has no such restriction, but you need to make sure that the address range you want to access is routed to the correct eLink interface.

smatthews wrote:Is there a figure anywhere in the manuals that diagrams this memory block? I haven't seen one yet, and I think it would really add to people's understanding to see it pictorially.
I don't know. I took a look at the linker script to see how memory is mapped. Apparently, the FPGA logic on the Parallella even does address translation (mapping the 0x8e000000-0x8fffffff range seen from the Epiphany to some other address in the ARM address space; however, the address spaces seem to overlap for other addresses). Since my FPGA experience is limited, I didn't look at the details there.
sebraa
 
Posts: 495
Joined: Mon Jul 21, 2014 7:54 pm
