The MPI program below, a simple example for calculating pi pulled from the internet, was left unchanged with two exceptions: double was replaced with float, and the mpi.h header was replaced with stdcl_mpi_compat.h. The code was compiled and executed on a Parallella prototype using 17 cores (1 ARM + 16 Epiphany). The workflow was identical to that of standard MPI, with a "cl" prefix added to the commands to hint at what's under the hood:

clmpicc pical.c -o pical.x

clmpirun -n 17 pical.x

Here MPI "lite" is built on top of STDCL and the tools provided in the COPRTHR SDK. This is just a proof of concept for now, but building out a more complete implementation of MPI "lite", however that may be defined, should only be a matter of time.

The result exactly matched that produced by real MPI running on a multi-core x86.

One suggested model for Parallella would be a manager/worker model, with an ARM core serving as the manager with MPI procid 0. That is what was tested here.

The MPI source code compiled and executed on Parallella is shown below.

-DAR


/* example from MPICH */
#include "stdcl_mpi_compat.h"
#include <stdio.h>
#include <math.h>

float f(float);

float f(float a)
{
    return (4.0 / (1.0 + a*a));
}

int main(int argc,char **argv)
{
    int done = 0, n, myid, numprocs, i;
    float PI25DT = 3.141592653589793238462643;
    float mypi, pi, h, sum, x;
    float startwtime = 0.0, endwtime;
    int namelen;
    char processor_name[MPI_MAX_PROCESSOR_NAME];

    MPI_Init(&argc,&argv);
    MPI_Comm_size(MPI_COMM_WORLD,&numprocs);
    MPI_Comm_rank(MPI_COMM_WORLD,&myid);
    MPI_Get_processor_name(processor_name,&namelen);

    fprintf(stderr,"Process %d of %d on %s\n",
            myid, numprocs, processor_name);

    n = 0;
    while (!done) {
        if (myid == 0) {
            if (n==0) n=10000; else n=0;
            startwtime = MPI_Wtime();
        }
        MPI_Bcast(&n, 1, MPI_INT, 0, MPI_COMM_WORLD);
        if (n == 0)
            done = 1;
        else {
            h = 1.0 / (float) n;
            sum = 0.0;
            /* A slightly better approach starts from large i and works back */
            for (i = myid + 1; i <= n; i += numprocs) {
                x = h * ((float)i - 0.5);
                sum += f(x);
            }
            mypi = h * sum;

            MPI_Reduce(&mypi, &pi, 1, MPI_FLOAT, MPI_SUM, 0, MPI_COMM_WORLD);

            if (myid == 0) {
                printf("pi is approximately %.16f, Error is %.16f\n",
                       pi, fabs(pi - PI25DT));
                endwtime = MPI_Wtime();
                printf("wall clock time = %f\n", endwtime-startwtime);
                fflush( stdout );
            }
        }
    }
    MPI_Finalize();
    return 0;
}