
MPI "lite" proof-of-concept

Posted: Thu Jan 03, 2013 10:29 am
by dar
Here is a simple proof-of-concept of MPI "lite" for Parallella. (Not just an idea, but actual code I just compiled and ran on a board, so technically this might be the first MPI code run on Parallella.)

The MPI program below, a simple example for calculating pi pulled from the internet, was left unchanged with two exceptions. First, double was replaced with float. Second, the mpi.h header was replaced with stdcl_mpi_compat.h. No other changes were made. The code was compiled and executed on a Parallella prototype using 17 cores (1 ARM + 16 Epiphany). The workflow was identical to that of standard MPI (with the "cl" prefix added to the commands hinting at what's under the hood):

clmpicc pical.c -o pical.x
clmpirun -n 17 pical.x

Here MPI "lite" is built on top of STDCL and the tools provided in the COPRTHR SDK. This is just a proof of concept for now, but building out a more complete implementation of MPI "lite", however that may be defined, would just take some time.

The result exactly matched that produced by a real MPI run on a multi-core x86 machine.

One suggested model for Parallella would be a manager/worker model, with an ARM core serving as the manager with MPI rank (process ID) 0. That is what was tested here.

The MPI source code compiled and executed on Parallella is shown below.


Code:

/*  example from MPICH  */
#include "stdcl_mpi_compat.h"
#include <stdio.h>
#include <math.h>

float f(float);

/* Integrand: the integral of 4/(1+a^2) over [0,1] equals pi */
float f(float a)
{
   return (4.0 / (1.0 + a*a));
}

int main(int argc,char **argv)
{
   int done = 0, n, myid, numprocs, i;
   float PI25DT = 3.141592653589793238462643;
   float mypi, pi, h, sum, x;
   float startwtime = 0.0, endwtime;
   int  namelen;
   char processor_name[MPI_MAX_PROCESSOR_NAME];

   /* Standard MPI setup: process count, own rank, and host name */
   MPI_Init(&argc,&argv);
   MPI_Comm_size(MPI_COMM_WORLD,&numprocs);
   MPI_Comm_rank(MPI_COMM_WORLD,&myid);
   MPI_Get_processor_name(processor_name,&namelen);

   fprintf(stderr,"Process %d of %d on %s\n",
      myid, numprocs, processor_name);

   n = 0;
   while (!done) {

      /* Rank 0 picks n: 10000 on the first pass, 0 (stop) on the second */
      if (myid == 0) {
         if (n==0) n=10000; else n=0;
         startwtime = MPI_Wtime();
      }

      MPI_Bcast(&n, 1, MPI_INT, 0, MPI_COMM_WORLD);

      if (n == 0)
         done = 1;

      else {

         h   = 1.0 / (float) n;
         sum = 0.0;

         /* A slightly better approach starts from large i and works back */
         for (i = myid + 1; i <= n; i += numprocs) {
            x = h * ((float)i - 0.5);
            sum += f(x);
         }

         mypi = h * sum;

         /* Sum the partial results from all ranks onto rank 0 */
         MPI_Reduce(&mypi, &pi, 1, MPI_FLOAT, MPI_SUM, 0, MPI_COMM_WORLD);

         if (myid == 0) {

            printf("pi is approximately %.16f, Error is %.16f\n",
               pi, fabs(pi - PI25DT));

            endwtime = MPI_Wtime();

            printf("wall clock time = %f\n", endwtime-startwtime);

            fflush( stdout );

         }
      }
   }

   MPI_Finalize();
   return 0;
}


Re: MPI "lite" proof-of-concept

Posted: Thu Jan 03, 2013 1:05 pm
by aolofsson
WOW, that is incredible! Most people make simple tasks complicated. You make complex tasks look easy.:-)
I talked to a bunch of people about this in the past and received all kinds of reasons why MPI wasn't a good fit for Epiphany. You actually went ahead and implemented a basic MPI programming model in less than a week. Mind boggling! I thank my lucky star that I typed in "GPL OpenCL" in a Google search a year ago to find Brown Deer Technology.:-)


Re: MPI "lite" proof-of-concept

Posted: Sat Jan 05, 2013 5:59 pm
by ed2k
Can you give more detailed instructions: how to build the Brown Deer COPRTHR SDK, where to download the source, etc.?

Re: MPI "lite" proof-of-concept

Posted: Sat Jan 05, 2013 6:07 pm
by ubii

You can download the COPRTHR SDK from the link below and the installation instructions can be found in the COPRTHR Primer.

Re: MPI "lite" proof-of-concept

Posted: Sun Jan 06, 2013 1:31 am
by ed2k
I visited the website but was unable to download the Epiphany package.
I cloned the git repository and switched to the current branch. It is really hard to get ./configure to pass on my Ubuntu system; lots of paths have to be set manually (libelf, libevent, libconfig). Has anyone tried it on a 64-bit Ubuntu system, or do I need to use a 32-bit system?

Re: MPI "lite" proof-of-concept

Posted: Sun Jan 06, 2013 4:27 am
by ubii

I was able to get the COPRTHR SDK to install on 64-bit Linux Mint Debian Edition, but it required installing a different version of libelf, as seen below.

In the README.txt, it states the following:

"Please take note that libelf 1.x branch found on most Linux distributions is not a valid substitute for libelf-0.8.13 since they lack the required features and exhibit undocumented broken behavior."

The README.txt file has a link to this version of libelf, which you can download and install following the steps listed in the libelf INSTALL.txt file. This will install a different version of libelf, without stepping on your existing version.

After doing this, you should be able to configure, make, and install the COPRTHR SDK, doing the following:

./configure --with-libelf=/usr/local
make
sudo make install

Re: MPI "lite" proof-of-concept

Posted: Sun Jan 06, 2013 5:38 am
by ed2k
I got the compilation and installation working, with lots of manual symbolic links, since configure always assumes the libraries are in $THE_PATH/lib and the headers are in $THE_PATH/include.
Now I will look into the internals. I am interested to know how COPRTHR achieves code deployment without keeping 16 versions of the code.
It seems libelf is the key.

Re: MPI "lite" proof-of-concept

Posted: Sun Jan 13, 2013 3:21 am
by dar
Sorry for the slow reply. I guess I need to figure out how to set up email notification correctly.

Right now, Parallella support is provided in a pre-compiled package available for download from (GitHub disabled downloads, so we moved the binary downloads here). These builds are being tested on a prototype board and should be considered beta in advance of the next official package release. The source code can be found on GitHub under the current branch. Note that the default branch of the GitHub project is still 1.4-stable; for Parallella you do not want that branch. Unless you have one of the early boards, it's not clear how easily one can test these builds. We are looking at a way to allow building and testing for Parallella without a board, to expand accessibility for developers who want to experiment with Parallella programming before they have a board.

If you are interested in this MPI "lite" code, note that it is still experimental and has not been merged into current. Sorry.

If you have trouble with building COPRTHR SDK please send questions to

The build is tested on a range of platforms and should not require many symbolic links, etc. However, it's difficult to catch every platform configuration. We always appreciate feedback to improve the build configuration.


Re: MPI "lite" proof-of-concept

Posted: Fri Mar 15, 2013 1:33 pm
by eleitl
Do you think it would be feasible to implement a subset of MPI in the spare FPGA space of the new Zynq?
It has enough spare I/O FPGA pins to drive a 6-link 3D grid/torus, and probably can almost reach 1 GByte/s throughput per link, especially if you can make a cut-through router so that you don't have to touch internal memory bandwidth for through traffic.