Parallella for Genetic Algorithms

Postby Weps » Thu Oct 10, 2013 4:41 pm

I am working on a project that uses a genetic algorithm to generate design permutations that are then analyzed by a separate program and the results returned. This is a perfect candidate for parallel computing, as each design permutation can be analyzed completely independently. I'm hoping the Parallella can adapt to the problem, as my current design space is 1,000*N^2!

I would really like to order the cluster, but I have some feasibility questions:

- Everything is written in C or Fortran, so compiling within Ubuntu should be straightforward. Can the Parallella be used as its own design environment, or will it just be a headless cluster?

- The analysis program is about 2 MB in size. Ideally, once the design stack is generated, every available core would take a design, analyze it, output the results, and fetch the next waiting design. Is this even possible with Parallella? Is the analysis program too big, i.e. only executable by the ARM?

- And most importantly: To leverage the Epiphany cores, would I need to rewrite all my code? If I don't, will it just execute sequentially?

I'm really excited about the possibilities Parallella offers, as the more cores the better for these types of problems. Unfortunately, the only examples I've seen so far involve simple math/matrix operations, and I can't determine just what is possible for larger problems. I just hope my problem fits!
Weps
 
Posts: 2
Joined: Thu Oct 10, 2013 4:06 pm

Re: Parallella for Genetic Algorithms

Postby LamsonNguyen » Thu Oct 10, 2013 6:05 pm

Weps wrote:- Everything is written in C or Fortran, so compiling within Ubuntu should be straightforward. Can the Parallella be used as its own design environment, or will it just be a headless cluster?

The Parallella can be used as its own design environment.
Weps wrote:- The analysis program is about 2 MB in size. Ideally, once the design stack is generated, every available core would take a design, analyze it, output the results, and fetch the next waiting design. Is this even possible with Parallella? Is the analysis program too big, i.e. only executable by the ARM?

Each core on the Epiphany chip has only 32 KB of local memory. See the architecture reference for more details.
Weps wrote:- And most importantly: To leverage the Epiphany cores, would I need to rewrite all my code? If I don't, will it just execute sequentially?

You will need to rewrite your code to take advantage of the Epiphany, otherwise it would just run on the ARM.
LamsonNguyen
 
Posts: 138
Joined: Sun Dec 16, 2012 7:09 pm

Re: Parallella for Genetic Algorithms

Postby mhonman » Thu Oct 10, 2013 8:13 pm

LamsonNguyen wrote:You will need to rewrite your code to take advantage of the Epiphany, otherwise it would just run on the ARM.


You could also think of it as "refactor" or "restructure". Making a parallel version of a program usually involves identifying the parts that are both computationally intensive and at least partially decoupled, then writing some glue logic that farms the work out to processes on the parallel processing engine and gathers and merges the results. You will most likely find that the compute-intensive part of the code will fit in the internal RAM of an Epiphany core - it will need its own glue layer to receive work and pass back results, but *if* the program is well-structured, much of the original logic can be preserved*.

The devil will be in the details, particularly the scan for data dependencies. And it may be that the minimum-size work unit cannot coexist with the parallelisable core code in Epiphany internal RAM.

* The good news with Parallella is that each core can have up to 1MB of code - running code from external RAM is incredibly slow, but at least it reduces worries about partitioning the code to make it fit in internal RAM - as long as that external-RAM code is hardly ever invoked.
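
To make the "glue layer" idea concrete, here is a very rough sketch of what a per-core work loop could look like. Everything in it is illustrative rather than canonical: the flag and buffer addresses are made up, and analyze_design() is just a stand-in for the ported compute kernel - the real layout would be whatever the core program and the host-side dispatcher agree on.

Code: Select all
#include <stdint.h>

/* Illustrative local-RAM layout agreed with the host (not canonical). */
#define GO_FLAG     ((volatile uint32_t *)0x7000)  /* host sets when a design is ready */
#define DONE_FLAG   ((volatile uint32_t *)0x7004)  /* core sets when results are out */
#define WORK_BUF    ((volatile float *)0x7100)     /* input: one design's parameters */
#define RESULT_BUF  ((volatile float *)0x7200)     /* output: fitness / analysis results */

#define N_PARAMS 32

/* Stand-in for the ported compute-intensive analysis kernel. */
static float analyze_design(const volatile float *design)
{
    float fitness = 0.0f;
    for (int i = 0; i < N_PARAMS; i++)
        fitness += design[i] * design[i];
    return fitness;
}

int main(void)
{
    for (;;) {
        while (*GO_FLAG == 0)          /* wait for the host to hand over work */
            ;
        *GO_FLAG = 0;
        RESULT_BUF[0] = analyze_design(WORK_BUF);
        *DONE_FLAG = 1;                /* host collects and sends the next design */
    }
}

The host's job is then the mirror image: write a design into the work buffer, set the "go" flag, and poll the "done" flag before collecting the result and dispatching the next design.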
mhonman
 
Posts: 112
Joined: Thu Apr 25, 2013 2:22 pm

Re: Parallella for Genetic Algorithms

Postby CIB » Fri Oct 11, 2013 9:20 am

mhonman wrote:You will most likely find that the compute-intensive part of the code will fit in the internal RAM of an Epiphany core -


The code, yes (if you avoid certain dependencies like the STL or Boost); the memory you'll be working on, not so much. And if your code ends up doing mostly random access on a 2 MB chunk of memory, you will see a notable to massive loss of speed.

Actually, I have to qualify even the "it'll fit on the core easily" part. Just try compiling a random C program for the Epiphany and check the ASM output: even fitting the standard C library onto the core can become a challenge. Optimizing for the 32 KB limit while using programs and tools designed around having far more L1 cache than that - never mind L2, L3, and main RAM - can be a problem.
CIB
 
Posts: 108
Joined: Sat Jul 13, 2013 1:57 pm

Re: Parallella for Genetic Algorithms

Postby mhonman » Tue Oct 15, 2013 1:39 pm

CIB wrote:Actually, I have to qualify even the "it'll fit on the core easily" part. Just try compiling a random C program for the Epiphany and check the ASM output: even fitting the standard C library onto the core can become a challenge.


Thankfully it is not necessary to have the whole of the C runtime library in internal RAM (or we would really be in trouble!).

Baseline internal RAM use (i.e. a tiny program that does nothing) is about 700 bytes with fast.ldf and 2200 bytes with internal.ldf. This is a good example of how locating infrequently used code or data in external RAM can save internal memory space.

The external RAM is really useful during the porting process because it is not necessary to decide how to partition the program at the start of the port. The flip-side is that it is really slow to execute code in external RAM (even more so when several cores are attempting to do so simultaneously), so there is a need to tweak the LDF file to move the often-used library routines into internal RAM. But on the other hand, profiling the program on a host system will give a good starting point in the form of a list of the library routines in which most CPU time is spent.

P.S. There is a handy way of seeing how much of the internal RAM will be used: use e-objdump -s to dump out the contents of the .srec file that will be loaded on the core. This shows you what will be written to each memory area.

I suspect that anything to do with strings will be bad news for the Epiphany core (in speed, space, and extra library routines) & I've been wondering whether there is some non-ugly way of assigning each core a host thread to do "string and file nonsense" on its behalf. No brilliant ideas yet, I'm afraid...
mhonman
 
Posts: 112
Joined: Thu Apr 25, 2013 2:22 pm

Re: Parallella for Genetic Algorithms

Postby Weps » Wed Oct 16, 2013 9:20 pm

mhonman wrote:I suspect that anything to do with strings will be bad news for the Epiphany core (in speed, space, and extra library routines) & I've been wondering whether there is some non-ugly way of assigning each core a host thread to do "string and file nonsense" on its behalf. No brilliant ideas yet, I'm afraid...


I'm definitely with you here! Stepping back, there is a lot of plotting/graphing capability that could be stripped out of my code. I have not tried the Parallella IDE yet, but a simple "Hello, World!" in Code::Blocks went from 260 KB using iostream to 5.4 KB using stdio. I'm now willing to bet the math would fit no problem with some "program unrolled" thinking. That does leave the problem of I/O though. I suppose the ideal tool chain might be something like the following (a rough host-side sketch follows the list):

1. Open coordinate file
2. Load coordinates to array(s) <= In external RAM?
3. Read coordinates. Do math. <= Inside Epiphany using Job Scheduler?!
4. Write results to output array(s) <= In external RAM?
5. Save output to file
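
Something like the following is what I have in mind for the host side. I am guessing at the e-hal calls from the bundled examples, and e_analysis.srec, the buffer offsets, and the sizes are all made up - plus error checking is omitted, and a real program would poll a completion flag rather than sleep:

Code: Select all
#include <stdio.h>
#include <unistd.h>
#include <e-hal.h>

#define N_COORDS 1024

int main(void)
{
    float coords[N_COORDS], results[N_COORDS];
    e_platform_t platform;
    e_epiphany_t dev;
    e_mem_t emem;
    FILE *f;

    /* 1. Open coordinate file; 2. load coordinates into a host array */
    f = fopen("coords.dat", "rb");
    fread(coords, sizeof(float), N_COORDS, f);
    fclose(f);

    e_init(NULL);
    e_reset_system();
    e_get_platform_info(&platform);
    e_open(&dev, 0, 0, platform.rows, platform.cols);

    /* ...and stage them in the external shared RAM segment */
    e_alloc(&emem, 0x00000000, 2 * N_COORDS * sizeof(float));
    e_write(&emem, 0, 0, 0, coords, N_COORDS * sizeof(float));

    /* 3. Let the Epiphany cores do the math */
    e_load_group("e_analysis.srec", &dev, 0, 0,
                 platform.rows, platform.cols, E_TRUE);
    sleep(1);   /* stand-in for polling a "done" flag */

    /* 4. Read the output array back out of external RAM */
    e_read(&emem, 0, 0, N_COORDS * sizeof(float),
           results, N_COORDS * sizeof(float));

    /* 5. Save output to file */
    f = fopen("results.dat", "wb");
    fwrite(results, sizeof(float), N_COORDS, f);
    fclose(f);

    e_free(&emem);
    e_close(&dev);
    e_finalize();
    return 0;
}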

This is obviously a whole new ball of code at this point, but it might just work. At least each core would read from external RAM once and write to external RAM once per cycle. Times 16 cores at once though?... :|

P.S. Thanks to everybody replying in this thread! I'm just a lowly engineer who thought being able to program in C was pretty cool, so thanks for the help on all this crazy parallel programming/hardware level code thinking!
Weps
 
Posts: 2
Joined: Thu Oct 10, 2013 4:06 pm

Re: Parallella for Genetic Algorithms

Postby mhonman » Thu Oct 17, 2013 8:22 pm

Weps, you're on the right track there. As you've spotted, it will be a killer to operate on data while it is in external RAM, but you can manage it by treating the internal RAM as an explicitly managed cache.

In other words, the core program fetches a chunk of the "problem" from external RAM, processes it, and shuffles it back to external RAM (the Adapteva matrix multiplication demo is an example of this technique, even if the C code is a bit mind-bending to read).

Since the Epiphany cores have two DMA channels, it is feasible (if the chunks are sufficiently small) to have fresh input data being read and results being written while calculation is in progress. There will probably be practical pitfalls - the approach takes one further & further away from the original code.
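
A skeleton of that fetch-process-write loop, as seen from one core, might look something like this. The external-RAM addresses and the chunk size are illustrative only, and I have used the blocking e_dma_copy() for clarity - a double-buffered version would start transfers on the two DMA channels and overlap them with the computation:

Code: Select all
#include <e_lib.h>

#define CHUNK    512                        /* floats per chunk: fits easily in 32 KB */
#define NCHUNKS  16
#define EXT_IN   ((float *)0x8e000000)      /* staged input in external RAM (illustrative) */
#define EXT_OUT  ((float *)0x8e100000)      /* result area in external RAM (illustrative) */

static float in_buf[CHUNK], out_buf[CHUNK]; /* local-RAM working buffers */

int main(void)
{
    for (int c = 0; c < NCHUNKS; c++) {
        /* fetch a chunk of the "problem" into fast local RAM */
        e_dma_copy(in_buf, EXT_IN + c * CHUNK, sizeof(in_buf));

        /* process entirely in local RAM (stand-in computation) */
        for (int i = 0; i < CHUNK; i++)
            out_buf[i] = in_buf[i] * in_buf[i];

        /* shuffle the results back out to external RAM */
        e_dma_copy(EXT_OUT + c * CHUNK, out_buf, sizeof(out_buf));
    }
    return 0;
}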

The ideal is for each core to have enough internal RAM that back-and-forth shuffling is not needed, but I suspect that is not a realistic expectation. 20+ years ago, 256KB was a tight fit for an SPMD-parallelised 3D Navier-Stokes CFD code (half code + buffers, half data). On Parallella, much of that code could go into the external RAM, but that leaves 128KB of data to find a home for. Increasing the amount of internal RAM and/or increasing the speed and parallelism of external RAM would make Epiphany systems more generally useful.
mhonman
 
Posts: 112
Joined: Thu Apr 25, 2013 2:22 pm

Re: Parallella for Genetic Algorithms

Postby timpart » Fri Oct 18, 2013 6:57 am

You really want to avoid executing code from external RAM. It is very slow. I understand Embecosm (the compiler people) are looking into an overlay system for code.

Tim
timpart
 
Posts: 302
Joined: Mon Dec 17, 2012 3:25 am
Location: UK

Re: Parallella for Genetic Algorithms

Postby davidl » Sat Feb 28, 2015 7:11 pm

Greetings to everyone,
I'm currently working on my master's thesis, and I'm trying to implement a genetic algorithm that runs on the Epiphany processor. I would like to know whether anyone has continued with this project, and what type of parallel algorithm was implemented (master-slave, coarse-grained). Any information would be really appreciated.

Respectfully,

David
davidl
 
Posts: 5
Joined: Tue Feb 17, 2015 7:16 am

