Parallella Community

Posted: **Sat May 16, 2015 5:33 pm**

Hi everyone,

I'm finishing up my MEng project, which involved porting an artificial developmental system (ADS). The original software implementation represents an organism as a 2D array of cells that develop sequentially, through nested loops running over the x and y axes. The model is based on an artificial gene regulatory network (GRN) and is basically a more complex version of a cellular automaton. The idea was that by porting this ADS onto hardware, where a single core represents a single cell, the x and y loops could be discarded and a performance boost should follow.

After successfully porting the model I discovered that the base time, i.e. the time taken for development to occur internally within a cell, is ever so slightly longer than in the original software implementation. If I then enable the core-to-core data transfers (to exchange chemicals and perform gene regulation), more time is added on.

My question is this: the eCores should run at around 600 to 700 MHz, correct? If the Parallella version takes approximately the same time (ever so slightly longer) as the software implementation, which ran on a 3.2 GHz CPU, then that means each eCore demonstrated a performance of only 200 MHz (3.2 GHz / 16 cores). Would anyone know why this is? The developmental model is complex, in that a lot goes on: it revolves around modulo operations, divisions, additions, subtractions and moving data about, but nothing out of the ordinary that would explain the 200 MHz performance. Have I missed something or done something wrong, such as using inadequate command-line options while compiling? I used the matmul-16 template and simply copied and pasted the compiler command lines from there. Or is my comparison inaccurate?

Any hints as to why this is would help me finalise the results section of my report.

Posted: **Sun May 17, 2015 10:33 am**

The Epiphany chip has no instruction for dividing numbers.
A division algorithm takes about 10 times more effort than a simple multiplication.
So if your software requires dividing numbers, make sure you optimize the divisions out.
For example, dividing by an integer might be slow, but multiplying by a constant float is only a matter of converting the numbers, setting up the float operations, and doing the actual multiplication. That's about 5 assembly instructions instead of 50...
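
As a rough illustration of the suggestion above, here is a minimal C sketch that replaces a repeated integer division by a fixed divisor with a multiplication by a precomputed float reciprocal. The helper names `recip_of` and `div_by_recip` are hypothetical and not part of any Epiphany SDK. The sketch assumes both operands fit in 16 bits, so the float estimate stays within one of the true quotient, and a single integer correction step makes the result exact. Whether it actually beats the compiler's software division on an eCore would need to be measured.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: compute 1/d once, outside the hot loop.
   This is the only place a real division is paid for. */
static inline float recip_of(uint32_t d)
{
    assert(d > 0 && d <= 65535u);   /* assumed operand range for this sketch */
    return 1.0f / (float)d;
}

/* Quotient x / d using a multiply by the precomputed reciprocal.
   Assumes x <= 65535 and inv_d == recip_of(d). */
static inline uint32_t div_by_recip(uint32_t x, uint32_t d, float inv_d)
{
    /* Float estimate of the quotient, truncated toward zero. */
    uint32_t q = (uint32_t)((float)x * inv_d);

    /* Rounding in the reciprocal can leave the estimate one off,
       so correct it with integer arithmetic. */
    if (q * d > x) {
        q -= 1;                     /* estimate was one too high */
    } else if (x - q * d >= d) {
        q += 1;                     /* estimate was one too low */
    }
    return q;
}
```

Typical use would be to call `recip_of(d)` once per divisor and then call `div_by_recip(x, d, inv_d)` inside the loop. The remainder, if needed, is `x - q * d`.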

Posted: **Mon May 18, 2015 11:49 am**

Thanks piotr5, that is definitely the reason. I have modulo operations everywhere!!!! lol. Thanks again!!
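
The thread does not say what the modulo operations are used for. Assuming, purely for illustration, that many of them wrap neighbour indices around a grid edge or reduce a value by a power-of-two modulus, a sketch of two division-free replacements in C might look like this. The names `wrap_index` and `mod_pow2` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Wrap an index that has stepped at most one grid width out of range.
   Valid only for -n <= i < 2n; replaces ((i % n) + n) % n. */
static inline int32_t wrap_index(int32_t i, int32_t n)
{
    if (i >= n) {
        i -= n;
    } else if (i < 0) {
        i += n;
    }
    return i;
}

/* x mod n for a power-of-two n, done with a single bitwise AND.
   The assert documents the precondition; build with -DNDEBUG to drop it. */
static inline uint32_t mod_pow2(uint32_t x, uint32_t n)
{
    assert(n != 0 && (n & (n - 1)) == 0);
    return x & (n - 1);
}
```

Neither helper covers a general modulus on an arbitrary value. For that case the remainder can be derived from a quotient as `x - (x / n) * n`, so any cheaper division also gives a cheaper modulo.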

MEng Project - cores only demonstrate 200MHz performance
