Is there anything like e_load() which loads from memory?

Postby rowan194 » Wed Jan 28, 2015 9:06 pm

I'm planning on using a variation of linear genetic programming which will directly modify Epiphany machine code. The ARM host will prepare the next generation while the cores are evaluating the fitness of the current.

The trouble is that e_load() wants a filename, but it's going to be very inefficient to have to create a new file each time, especially since it's not just a flat bin file. Imagine having to write out 500 SREC files each generation?

Can I roll my own to load (from the host memory) and execute a program (on the Epiphany), using something like e_write then e_start? Or does e_load do a lot more?

(One option may be to e_load a 'blank' program, then in a loop use e_read to save the core, including code/data/IVT, modify machine code in host memory, then e_write back?)
rowan194
 
Posts: 17
Joined: Wed Jan 14, 2015 1:02 pm

Re: Is there anything like e_load() which loads from memory?

Postby sebraa » Wed Jan 28, 2015 11:19 pm

You could have a small stub program in the first, write-protected half-bank of each core. Then, instead of pushing new code to the cores, you could have that stub program fetch it from host memory, where the host has put it. That code would have to be linked at a different address, probably requiring a different linker script. Running it would then just involve creating a function pointer to the entry point of the loaded binary.

That way, you can get around the SREC files. Since you'll probably call some kind of linker, you might still have intermediate files - it might be worth keeping them on a ram disk on the host. :-)
sebraa
 
Posts: 495
Joined: Mon Jul 21, 2014 7:54 pm

Re: Is there anything like e_load() which loads from memory?

Postby rowan194 » Thu Jan 29, 2015 7:02 am

Don't think there will be a need for a linker or intermediate files - the host will be manipulating machine code directly, including calculating branch addresses. If the cores had more memory, I could keep the entire set of candidate programs in the core memory, and just modify that from the host (each generation generally changes a single opcode - one single write!), but unfortunately I need to spend a bit of time copying the Epiphany code back and forth. The other option is to execute from the host's memory, but if I understand correctly that is unbelievably slow, and would be especially painful with all 16 cores fighting for access. In that instance, it would probably be faster to execute on the ARM. :(
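
A minimal sketch of how the host could calculate a short branch opcode itself. The bit layout is an assumption inferred from the two branch opcodes in the working demo later in this thread (b 0x40 at address 0x0, and b loop), not taken from the datasheet, and the name encode_b16 is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encode a 16-bit unconditional Epiphany branch.
 * Assumed layout (inferred from the opcodes in this thread):
 *   bits  3..0 = 0000
 *   bits  7..4 = condition code, 0xE = always
 *   bits 15..8 = signed offset in halfwords, relative to the branch itself
 * Returns 0 on success, -1 if the target cannot be reached by the short form. */
int encode_b16(uint32_t branch_addr, uint32_t target_addr, uint16_t *opcode)
{
    int32_t delta = (int32_t)(target_addr - branch_addr);
    int32_t halfwords;

    assert(opcode != NULL);

    if (delta % 2 != 0)                 /* targets must be halfword aligned */
        return -1;

    halfwords = delta / 2;
    if (halfwords < -128 || halfwords > 127)
        return -1;                      /* would need the 32-bit branch form */

    *opcode = (uint16_t)((((uint32_t)halfwords & 0xFFu) << 8) | 0x00E0u);
    return 0;
}
```

The returned value is the numeric opcode; it still has to be stored in the core's memory in little-endian byte order.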
rowan194
 
Posts: 17
Joined: Wed Jan 14, 2015 1:02 pm

Re: Is there anything like e_load() which loads from memory?

Postby piotr5 » Thu Jan 29, 2015 10:07 am

and you'd still need to read from host either way.
why don't you just sacrifice one eCore and let the program run solely on epiphany?
let only 15 cores fight it out, and a write-protected 16th core serves the changes?
then the only data on host will be just intermediate results and maybe some rng-seeds...
piotr5
 
Posts: 230
Joined: Sun Dec 23, 2012 2:48 pm

Re: Is there anything like e_load() which loads from memory?

Postby rowan194 » Thu Jan 29, 2015 10:37 am

There isn't enough memory space in the Epiphany. There could be 500 or 1000 candidate programs in a generation, and each could have up to 1000 instructions. As well as the program code itself, there is also training and validation data, plus fitness calculation code (these remain static so only need to be transferred in once). It may be possible to squeeze 2 or 3 candidates into a single core each load/run round, in order to improve performance.
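
The memory argument can be checked with some rough arithmetic. This is only a sketch: it assumes 32 KB of local memory per core on the 16-core chip and 2 or 4 bytes per instruction, and the amount reserved for static data and fitness code is a made-up parameter.

```c
#include <assert.h>
#include <stddef.h>

#define CORE_SRAM_BYTES ((size_t)32 * 1024)  /* assumed local memory per core */
#define NUM_CORES       ((size_t)16)         /* assumed 16-core Epiphany */

/* Worst-case code size of a whole generation, in bytes. */
size_t generation_bytes(size_t candidates, size_t max_instr, size_t bytes_per_instr)
{
    assert(bytes_per_instr == 2 || bytes_per_instr == 4);
    return candidates * max_instr * bytes_per_instr;
}

/* Total local memory across all cores, in bytes. */
size_t total_sram_bytes(void)
{
    return NUM_CORES * CORE_SRAM_BYTES;
}

/* Whole candidates that fit in one core after reserving space for the
 * static training data and fitness code (reserved_bytes is hypothetical). */
size_t candidates_per_core(size_t reserved_bytes, size_t max_instr, size_t bytes_per_instr)
{
    assert(max_instr > 0);
    assert(bytes_per_instr == 2 || bytes_per_instr == 4);
    if (reserved_bytes >= CORE_SRAM_BYTES)
        return 0;
    return (CORE_SRAM_BYTES - reserved_bytes) / (max_instr * bytes_per_instr);
}
```

Under these assumptions, 500 candidates of 1000 two-byte instructions already need about 1 MB, roughly double the 512 KB available across all cores, and with 20 KB reserved per core only 3 candidates of 1000 four-byte instructions fit at once.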

I should make it clear that each candidate program is executed multiple times (one for each item of training/validation data), and the multiple runs can all be done on-core, independent of the host. So it's more like transfer from host, run, run, run, run, run, run (etc) rather than transfer, run, transfer, run. The latter would be very inefficient as the Parallella would spend half the time just shifting data between the two chips.

I think I may have a good read of the Epiphany datasheet, and a peek at the SDK internals, to see if I can roll my own from first principles. Even a 'blank' program int main() {}; ends up as a 6428 byte SREC file. Because the cores are running a simple program that reads input data, does some calculations and conditional branches, then exits, there's no need to use any C library functions.

Anyway, sorry for rambling, it's really just thinking aloud. I'd be curious to know if anyone has bypassed the SREC/e_load() system.
rowan194
 
Posts: 17
Joined: Wed Jan 14, 2015 1:02 pm

Re: Is there anything like e_load() which loads from memory?

Postby sebraa » Thu Jan 29, 2015 2:07 pm

Then sacrifice one core and have it serve the changes from shared memory, where the host prepared the changes earlier. That core could do some advanced caching, and also take care of the results. That way, you limit the fighting for shared memory, while having effectively infinite amounts of memory.

It should be possible to roll your own startup code without the newlib. This reduces code size a lot. Oh, and your test main() function should contain an endless loop: On embedded systems, main() should never end.
sebraa
 
Posts: 495
Joined: Mon Jul 21, 2014 7:54 pm

Re: Is there anything like e_load() which loads from memory?

Postby rowan194 » Thu Jan 29, 2015 10:29 pm

Here's what I've come up with so far... but it's not working. I've had two hours sleep so I'm probably missing something obvious, or have made a silly mistake.

The idea is to stuff code directly into the Epiphany by using e_write, rather than e_load. Epiphany code simply increments r1 in an endless loop; host code reports back contents of r1 for each core, once per second.

Is this the correct way to point to the start of code? 0x0 is a vector?

edit: I think I've made some fundamental mistake with reading r1 from the core, because using e_write to set r1, then reading it back, doesn't return the value I set.

<code deleted - see next msg>
rowan194
 
Posts: 17
Joined: Wed Jan 14, 2015 1:02 pm

Re: Is there anything like e_load() which loads from memory?

Postby rowan194 » Fri Jan 30, 2015 12:49 pm

Got the code working. Two major mistakes I made:

1) The Epiphany begins code execution at address 0x0 (the first IVT), so this needs to contain a branch opcode, NOT the address of the start of code (0x0 is not a vector).

2) Endianness. The way I've stuffed machine code here with 16 bit byte swapping is pretty clumsy - this will fall apart when I have to mix 16 and 32 bit opcodes - but it's just a quick n' dirty demo.
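
One way around the byte swapping is to keep opcodes as plain numeric values and write them out byte by byte, low byte first. This is a rough sketch rather than anything from the SDK: codebuf_t, emit16 and emit32 are made-up names, and it assumes a 32-bit opcode is stored as an ordinary little-endian word (low halfword first). With this scheme the branch written as __bswap_16(0xe020) would be given as 0x20e0.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Byte-oriented code buffer, so 16-bit and 32-bit opcodes can be mixed. */
typedef struct {
    uint8_t *bytes;  /* backing storage supplied by the caller */
    size_t   cap;    /* capacity of the storage in bytes */
    size_t   len;    /* bytes emitted so far */
} codebuf_t;

/* Append a 16-bit opcode in little-endian order, whatever the host's endianness. */
void emit16(codebuf_t *cb, uint16_t op)
{
    assert(cb->len + 2 <= cb->cap);
    cb->bytes[cb->len++] = (uint8_t)(op & 0xFFu);
    cb->bytes[cb->len++] = (uint8_t)(op >> 8);
}

/* Append a 32-bit opcode, low halfword first (assumed layout). */
void emit32(codebuf_t *cb, uint32_t op)
{
    emit16(cb, (uint16_t)(op & 0xFFFFu));
    emit16(cb, (uint16_t)(op >> 16));
}
```

The resulting byte buffer could then be passed to e_write() with cb.len as the size.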

Behaviour of the sample Epiphany code has changed slightly; this version increments r1, then saves it to 0x80, which the host reads.

No e-gcc, no SREC conversion, no e_load. :)

Code:
// gcc codestuff.c -I /opt/adapteva/esdk/tools/host/include -L /opt/adapteva/esdk/tools/host/lib -le-hal -le-loader

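#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <strings.h>   // bzero()
#include <unistd.h>    // sleep()
#include <byteswap.h>

#include <e-hal.h>
#include <e-loader.h>  // e_load(), only used when ELOAD is defined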

//#define ELOAD 1   // define this to load Epiphany executable code the 'conventional' way, via e_load()

int main() {

  e_platform_t platform;
  e_epiphany_t dev;

  unsigned int ivt[16];
  unsigned short codebuf[256];
  unsigned int reg;

  int      cp = 0;
  int      i, j;

  e_init(NULL);
  e_reset_system();
  e_get_platform_info(&platform);
  e_open(&dev, 0, 0, platform.rows, platform.cols);


#ifndef ELOAD

  bzero(&ivt, sizeof(ivt));
  ivt[0] = __bswap_16(0xe020);  // IVT 0 (sync):   b 0x40

  codebuf[cp++] = __bswap_16(0x0310);  //  mov r0,0x80
  codebuf[cp++] = __bswap_16(0x0320);  //  mov r1,0
// loop:
  codebuf[cp++] = __bswap_16(0x9324);  //  add r1,r1,1
  codebuf[cp++] = __bswap_16(0x5420);  //  str r1,[r0]
  codebuf[cp++] = __bswap_16(0xe0fe);  //  b loop

#endif

  for (i = 0; i < platform.rows; i++) {
    for (j = 0; j < platform.cols; j++) {
#ifdef ELOAD
      if ( E_OK != e_load("loop.srec", &dev, i, j, E_TRUE) ) {
        fprintf(stderr, "Failed to load loop.srec\n");
        return EXIT_FAILURE;
      }
#else
      e_write(&dev, i, j, 0x0, &ivt, sizeof(ivt));  // write IVT
      e_write(&dev, i, j, 0x40, &codebuf, sizeof(codebuf[0]) * cp);  // write code
#endif
      if ( E_OK != e_start(&dev, i, j) ) {
        fprintf(stderr, "Failed to start execution on core %d,%d\n", i, j);
        return EXIT_FAILURE;
      }
    }
  }

  while(1) {
    for (i = 0; i < platform.rows; i++) {
      for (j = 0; j < platform.cols; j++) {
        e_read(&dev, i, j, 0x80, &reg, sizeof(reg));  // read 0x80 from Epiphany core
        printf("%d,%d loopcount=%u\n", i, j, reg);  // reg is unsigned
      }
    }
    printf("\n");
    sleep(1);
  }
  e_close(&dev);   // ....never executed
  e_finalize();    // ....never executed
}
rowan194
 
Posts: 17
Joined: Wed Jan 14, 2015 1:02 pm

Re: Is there anything like e_load() which loads from memory?

Postby rowan194 » Sun Feb 22, 2015 12:08 pm

I've done some benchmarking and have run into a further issue - the time it takes for the host to load and execute a program on the Epiphany.

I wrote a simple program that measures the time taken to e_write() the program from host memory to the core, then begin execution.

Results:

8k program: 0.539ms
16k program: 1.099ms
32k program: 2.146ms

The scaling is fairly linear so it seems most of the time is spent transferring data between the host memory and core.

Because my application needs to execute (literally) millions of different programs in rapid succession, the time taken to load each one becomes a painfully significant part of the total... 1 to 2 milliseconds to load the program, but only around 0.1 milliseconds to execute it.
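
The figures above can be turned into an effective transfer rate and a load overhead with a few lines of arithmetic. This is only a sketch; it assumes "8k" means 8192 bytes and uses 1 MB = 1,000,000 bytes, and the function names are made up.

```c
#include <assert.h>

/* Effective host-to-core throughput in MB/s (1 MB = 1e6 bytes). */
double throughput_mb_s(double bytes, double load_ms)
{
    assert(load_ms > 0.0);
    return bytes / (load_ms * 1000.0);
}

/* Fraction of each load+run cycle that is spent loading. */
double load_fraction(double load_ms, double exec_ms)
{
    assert(load_ms >= 0.0 && exec_ms >= 0.0 && load_ms + exec_ms > 0.0);
    return load_ms / (load_ms + exec_ms);
}
```

With the measured times, all three program sizes work out to roughly 15 MB/s, and with a 0.1 ms run time the load accounts for about 84% of the cycle for the 8k program and about 96% for the 32k program.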

Unless e_write() is horribly inefficient, and can be greatly improved with a hand rolled version, I don't really see any way to work around this. :(
rowan194
 
Posts: 17
Joined: Wed Jan 14, 2015 1:02 pm

Re: Is there anything like e_load() which loads from memory?

Postby aolofsson » Sun Feb 22, 2015 6:57 pm

rowan194,
Writing directly to the cores is always going to be very slow. The current memcpy() implementation inside e_write() is just using a store in a loop from the registers. It would be faster to program the Epiphany DMA to copy in a block of data from shared memory (or to use the Zynq DMA to move the data). My apologies that we haven't gotten this done yet...
Andreas
User avatar
aolofsson
 
Posts: 1005
Joined: Tue Dec 11, 2012 6:59 pm
Location: Lexington, Massachusetts,USA
