Epiphany vs Arm performance

Postby Nerded » Mon Jul 10, 2017 5:19 pm

Hello,

I am currently running the omp4-epiphany examples. In the pi_kernel example, I am getting an extremely poor benchmark for the Epiphany compared to the Arm. With 16 Epiphany kernels, I complete the calculations in 0.96 seconds. With 1 Arm kernel, I complete the calculations in 0.16 seconds, and with 2 Arm kernels, just 0.085 seconds. Is this performance discrepancy expected? If so, what exactly is the point of the Epiphany processor? Thank you.

The examples are located here: https://github.com/parallella/parallella-examples/blob/master/omp4-epiphany/pi_kernel/pi.c
Nerded
 
Posts: 19
Joined: Tue Jun 06, 2017 8:30 pm

Re: Epiphany vs Arm performance

Postby jar » Mon Jul 10, 2017 7:17 pm

A couple things...

1) There is no 64-bit double precision support on E3, so software emulation will have difficulty outperforming ARM cores which have 64-bit DP support. So this is expected.
2) The OpenMP programming model is designed for symmetric multiprocessing (SMP), whereas Epiphany should primarily be thought of as a networked cluster of distributed memory where partitioned global address space (PGAS) models are used. Although OpenMP may use the global shared memory on Parallella, it lacks semantics within the model for addressing Epiphany's local core memory.
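
To get a feel for point 1, here is a minimal Python sketch. It assumes the pi kernel is the usual midpoint-rule integration of 4/(1+x^2) over [0, 1] (an assumption; check pi.c before relying on it), and it emulates single precision on the host by rounding every intermediate result to float32. The names `to_f32` and `pi_midpoint` are made up for illustration. It says nothing about Epiphany timing; it only gives a rough idea of the accuracy left over if such a kernel were written in single precision instead of emulated double precision.

```python
import math
import struct


def to_f32(x):
    """Round a Python float (64-bit) to the nearest 32-bit float."""
    return struct.unpack("f", struct.pack("f", x))[0]


def pi_midpoint(n, single=False):
    """Approximate pi by integrating 4/(1+x^2) over [0, 1] with n midpoint steps.

    With single=True every intermediate result is rounded to float32,
    which emulates a single-precision kernel (illustrative only).
    """
    rnd = to_f32 if single else (lambda v: v)
    h = rnd(1.0 / n)
    total = 0.0
    for i in range(n):
        x = rnd((i + 0.5) * h)
        total = rnd(total + rnd(4.0 / rnd(1.0 + rnd(x * x))))
    return rnd(total * h)


if __name__ == "__main__":
    n = 1000
    for label, single in (("double", False), ("single", True)):
        approx = pi_midpoint(n, single)
        print(f"{label}: {approx:.10f}  error={abs(approx - math.pi):.2e}")
```

Whether the single-precision error is acceptable depends on the application; the sketch only makes the trade-off visible.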

You had asked earlier whether it was possible, not if it was a good idea. Exploring the OpenMP + MPI programming paradigm is useful for education, but not ideal for a Parallella cluster (with few exceptions).
jar
 
Posts: 295
Joined: Mon Dec 17, 2012 3:27 am

Re: Epiphany vs Arm performance

Postby Nerded » Tue Jul 11, 2017 11:38 am

Well, as far as MPI + X goes on the Epiphany, what would be my best option for X in the case of performance? Like I said before, I am working with a cluster of these Parallella boards.

Also, should I expect better performance for the Epiphany than the Arm in general? (given the proper implementation)

Re: Epiphany vs Arm performance

Postby jar » Tue Jul 11, 2017 1:07 pm

For "X", I use the COPRTHR SDK for offload and OpenSHMEM for on-chip inter-core.

As far as performance goes, it depends on the application. Applications with high arithmetic intensity will do well, but Epiphany on the Parallella board has limited off-chip bandwidth (less than 300 MB/s write). There is a non-zero offload overhead time, but if the computational kernel is called repeatedly, those secondary calls can be fast.

On the Parallella board, there are two ARM Cortex-A9 cores at 667 MHz with 4 SP FLOPS/cycle (NEON SIMD instructions) for a peak performance of 2*4*0.667 = 5.336 GFLOPS. Epiphany-III has 16 cores at 600 MHz with 2 SP FLOPS/cycle (fused multiply-add) for a peak performance of 16*2*0.6 = 19.2 GFLOPS. I find the Epiphany scalar code easier than vector code, particularly when writing optimized assembly.
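
The peak figures above are just cores × FLOPS per cycle × clock. A small Python sketch of that arithmetic follows, with a simple roofline-style bound added on top. The roofline part is an added assumption, not a measurement from this thread: it treats the 300 MB/s off-chip figure as an upper bound on bandwidth and caps throughput at intensity × bandwidth. The function names are made up for illustration.

```python
def peak_gflops(cores, flops_per_cycle, clock_ghz):
    """Theoretical peak in GFLOPS: cores * FLOPS/cycle * clock (GHz)."""
    return cores * flops_per_cycle * clock_ghz


def attainable_gflops(peak, bandwidth_gbs, flops_per_byte):
    """Roofline-style bound: limited by compute or by off-chip bandwidth."""
    return min(peak, bandwidth_gbs * flops_per_byte)


ARM_PEAK = peak_gflops(2, 4, 0.667)        # two Cortex-A9 cores with NEON
EPIPHANY_PEAK = peak_gflops(16, 2, 0.600)  # sixteen Epiphany-III cores

# Assumption: 0.3 GB/s is an upper bound taken from "less than 300 MB/s write".
OFFCHIP_GBS = 0.3

if __name__ == "__main__":
    print(f"ARM peak:      {ARM_PEAK:.3f} GFLOPS")
    print(f"Epiphany peak: {EPIPHANY_PEAK:.3f} GFLOPS")
    # Arithmetic intensity (FLOPs per off-chip byte) needed to reach peak.
    print(f"Break-even intensity: {EPIPHANY_PEAK / OFFCHIP_GBS:.0f} FLOPs/byte")
    for intensity in (1, 8, 64, 256):
        bound = attainable_gflops(EPIPHANY_PEAK, OFFCHIP_GBS, intensity)
        print(f"{intensity:>4} FLOPs/byte -> at most {bound:.1f} GFLOPS")
```

Under these assumptions a kernel needs roughly 64 FLOPs per byte moved off-chip before the Epiphany peak is even reachable, which is one way to read "high arithmetic intensity".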

Good luck!

Re: Epiphany vs Arm performance

Postby Nerded » Tue Jul 11, 2017 2:30 pm

Awesome! This is some great information. So, seeing as COPRTHR is for offloading, and OpenMP is currently doing the offloading, would MPI + (OpenSHMEM + OpenMP) be possible in this case? I sincerely appreciate your help!

Re: Epiphany vs Arm performance

Postby jar » Tue Jul 11, 2017 3:46 pm

I have never tried (OpenSHMEM + OpenMP) and I have some doubts that it will work. Each has a different memory model, but I'm not familiar enough with the OpenMP implementation to say with certainty that they're incompatible.

Re: Epiphany vs Arm performance

Postby Nerded » Tue Jul 11, 2017 3:59 pm

Okay, do you know of any examples of MPI + a threaded OpenSHMEM implementation? Or, I guess, just a threaded OpenSHMEM implementation.

Re: Epiphany vs Arm performance

Postby jar » Tue Jul 11, 2017 8:50 pm

Start with this one:
https://github.com/USArmyResearchLab/op ... le/c_nbody

Here are a few more:
https://github.com/USArmyResearchLab/op ... r/example/

The examples include just the device-level parallelism (threaded OpenSHMEM). If you find you like OpenSHMEM, you could use that instead of MPI for your node-level parallelism. So you would effectively have an OpenSHMEM + OpenSHMEM code.

Re: Epiphany vs Arm performance

Postby Nerded » Wed Jul 12, 2017 12:55 am

Interesting! I was definitely unfamiliar with the scope of OpenSHMEM. Do you believe this is a more efficient method of doing cross-node communication?

Also, a beginner SHMEM question, but does it effectively pool the memory of the 16 e cores? As in, I am working with 512 KB instead of 32 KB × 16?

Re: Epiphany vs Arm performance

Postby jar » Wed Jul 12, 2017 1:33 am

SHMEM was originally an API developed by Cray back in the 1990s. It has more recently been standardized, so there are now many implementations that follow a common API. In my opinion, it's a cleaner, more intuitive API than MPI 1 and 2 for communication primitives and one-sided communication. MPI-2 introduced one-sided communication, and MPI version 3 substantially extended it. MPI also has a lot of other routines, such as parallel file I/O, that don't really apply to Epiphany. OpenSHMEM focuses on moving data in a one-sided, asynchronous manner. There are a bunch of test codes so you can see how you might use specific routines.

You can also read the OpenSHMEM 1.3 specification.

Nerded wrote: Also, a beginner SHMEM question, but does it effectively pool the memory of the 16 e cores? As in, I am working with 512 KB instead of 32 KB × 16?

No, OpenSHMEM efficiency is gained with partitioned memory. But we are working on something else for the lazy :-)
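
A toy Python model of what "partitioned" means here. It is not the OpenSHMEM API and does not run on Epiphany; the class `ToyPGAS` and its methods are hypothetical names. It assumes 16 processing elements (PEs), each owning a fixed 32 KB local buffer, with data moved between them by an explicit one-sided put rather than through one flat 512 KB pool.

```python
class ToyPGAS:
    """Toy partitioned global address space: one private buffer per PE."""

    def __init__(self, n_pes=16, local_bytes=32 * 1024):
        self.n_pes = n_pes
        self.local_bytes = local_bytes
        self.mem = [bytearray(local_bytes) for _ in range(n_pes)]

    def owner(self, global_offset):
        """Map a flat offset to (pe, local_offset) under a block layout."""
        if not 0 <= global_offset < self.n_pes * self.local_bytes:
            raise IndexError("offset outside the partitioned space")
        return divmod(global_offset, self.local_bytes)

    def put(self, dest_pe, local_offset, data):
        """One-sided write: the caller copies data into dest_pe's buffer."""
        end = local_offset + len(data)
        if end > self.local_bytes:
            raise ValueError("a single put cannot span two PEs' memories")
        self.mem[dest_pe][local_offset:end] = data


if __name__ == "__main__":
    space = ToyPGAS()
    print(space.owner(0))            # first byte lives on PE 0
    print(space.owner(40 * 1024))    # 40 KB in: PE 1, 8 KB into its buffer
    space.put(3, 0, b"halo")         # write into PE 3 without PE 3's help
    print(bytes(space.mem[3][:4]))
```

In this model the program decides which PE owns which block and moves data explicitly, which is the kind of locality the reply above attributes OpenSHMEM's efficiency to.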
