Iterating host code: Parallella restarts

Post by gordon » Fri Jun 16, 2017 7:13 am

Hi, I have written code for the eigenvalue decomposition of a matrix using the eSDK. I get correct results for the first iteration, but I need to run the code 20 times, so I have placed the host code inside a for loop. The loop includes starting the Epiphany cores and shutting them down. I've used the following commands:

Starting the epiphany:
e_init(NULL);
e_reset_system(); //reset Epiphany
e_get_platform_info(&platform);
e_open(&dev, 0, 0, platform.rows, platform.cols); //open all cores


Closing the epiphany:
e_close(&dev);
e_finalize();


All of this is inside the loop, along with some e_write, e_read and e_load calls. The output for the first iteration is printed, but the second iteration gives no output and the Parallella restarts.

Also in the device program after raising the flag I've put the cores into idle state using:

__asm__ __volatile__("idle");


Can you please tell me where I am going wrong and what to change so that the code iterates 20 times in a for loop? Thanks in advance :)
gordon
 
Posts: 24
Joined: Mon May 29, 2017 10:37 am

Re: Iterating host code: Parallella restarts

Post by sebraa » Fri Jun 16, 2017 10:51 am

Open the Epiphany once before the first iteration, and close it after the last one. Opening/Closing the Epiphany repeatedly may not work (or may stop working after a while). It is better to structure your code so that it can do multiple runs without a full system reset.
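
A minimal sketch of that structure, not the original poster's code. `device_open`, `device_close` and `run_iteration` are hypothetical stand-ins: in real host code they would wrap the `e_init` / `e_reset_system` / `e_get_platform_info` / `e_open` sequence, the `e_close` / `e_finalize` sequence, and the per-iteration `e_write` / `e_load` / `e_read` calls from the first post. The counters exist only to make the call pattern visible.

```c
#include <stdio.h>

/* Hypothetical stand-ins for the eSDK calls, so the call pattern can be
 * checked on any machine. They only count how often they are called. */
static int open_calls = 0, close_calls = 0, iterations_run = 0;

/* Would wrap: e_init(NULL); e_reset_system();
 *             e_get_platform_info(&platform); e_open(&dev, ...); */
static int device_open(void)  { open_calls++;  return 0; }

/* Would wrap: e_close(&dev); e_finalize(); */
static int device_close(void) { close_calls++; return 0; }

/* Would wrap the per-iteration e_write / e_load / e_read calls. */
static int run_iteration(int q) { (void)q; iterations_run++; return 0; }

/* Open once, iterate, close once. Returns 0 on success, -1 on failure. */
static int run_all(int n_iterations)
{
    if (device_open() != 0)
        return -1;

    int status = 0;
    for (int q = 0; q < n_iterations; q++) {
        if (run_iteration(q) != 0) {
            fprintf(stderr, "iteration %d failed\n", q);
            status = -1;
            break;
        }
    }

    /* Close exactly once, even if an iteration failed. */
    if (device_close() != 0)
        status = -1;
    return status;
}
```

Calling `run_all(20)` performs 20 iterations with a single open and a single close, which is the opposite of opening and closing inside the loop.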
sebraa
 
Posts: 495
Joined: Mon Jul 21, 2014 7:54 pm

Re: Iterating host code: Parallella restarts

Post by gordon » Fri Jun 16, 2017 11:40 am

I've done as you said. Now I'm facing another issue. I have two device programs running. In the second iteration, the first one runs on the diagonal cores and works fine. In the second one, only 3 of the 12 cores return their flag, so the program never stops because the remaining 9 cores do not return a flag on completion. I've used this to check the flag status:

Code:
while (1) {
   all_done_1 = 0;

   /* Row 0, non-diagonal cores */
   e_read(&dev, 0, 1, 0x7000, &done[0*platform.cols+1], sizeof(float));
   all_done_1 += done[0*platform.cols+1];

   e_read(&dev, 0, 2, 0x7000, &done[0*platform.cols+2], sizeof(float));
   all_done_1 += done[0*platform.cols+2];

   e_read(&dev, 0, 3, 0x7000, &done[0*platform.cols+3], sizeof(float));
   all_done_1 += done[0*platform.cols+3];

   /* Row 1, non-diagonal cores */
   e_read(&dev, 1, 0, 0x7000, &done[1*platform.cols+0], sizeof(float));
   all_done_1 += done[1*platform.cols+0];

   e_read(&dev, 1, 2, 0x7000, &done[1*platform.cols+2], sizeof(float));
   all_done_1 += done[1*platform.cols+2];

   e_read(&dev, 1, 3, 0x7000, &done[1*platform.cols+3], sizeof(float));
   all_done_1 += done[1*platform.cols+3];

   /* Row 2, non-diagonal cores */
   e_read(&dev, 2, 0, 0x7000, &done[2*platform.cols+0], sizeof(float));
   all_done_1 += done[2*platform.cols+0];

   e_read(&dev, 2, 1, 0x7000, &done[2*platform.cols+1], sizeof(float));
   all_done_1 += done[2*platform.cols+1];

   e_read(&dev, 2, 3, 0x7000, &done[2*platform.cols+3], sizeof(float));
   all_done_1 += done[2*platform.cols+3];

   /* Row 3, non-diagonal cores */
   e_read(&dev, 3, 0, 0x7000, &done[3*platform.cols+0], sizeof(float));
   all_done_1 += done[3*platform.cols+0];

   e_read(&dev, 3, 1, 0x7000, &done[3*platform.cols+1], sizeof(float));
   all_done_1 += done[3*platform.cols+1];

   e_read(&dev, 3, 2, 0x7000, &done[3*platform.cols+2], sizeof(float));
   all_done_1 += done[3*platform.cols+2];

   printf("AALLL DONEEEE!!!:%d Q:  %d\n", all_done_1, q);

   /* Leave the polling loop once all 12 non-diagonal cores have reported. */
   if (all_done_1 == 12) {
      break;
   }
}


And it gives the following output when printed out:

AALLL DONEEEE!!!:3 Q: 1



Here Q is my iteration number. All four diagonal cores successfully return flags, but only 3 of the non-diagonal ones do. I printed the status after every step and found that only (1,0), (2,0) and (3,0), i.e. the cores in the first column, return flags and the others don't. I can't understand why. Can you please help me out?



My full code is available at: https://gist.github.com/dhanu-mamidi/1e ... d1112aee51
gordon
 
Posts: 24
Joined: Mon May 29, 2017 10:37 am

Re: Iterating host code: Parallella restarts

Post by sebraa » Mon Jun 19, 2017 9:58 am

gordon wrote:I've done as you said. Now I'm facing another issue. I have two device programs running. In the second iteration, the first one runs on the diagonal cores and works fine. In the second one, only 3 of the 12 cores return their flag, so the program never stops because the remaining 9 cores do not return a flag on completion.
I didn't see any obvious problem with your code, although it is written with a lot of copy & paste (loops are useful). I usually wrap the "idle" instruction in an endless loop, to definitely prevent main() from ever exiting.
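
To illustrate the remark about loops, here is a hedged sketch of the flag check written as a nested loop. `read_flag_fn` is a hypothetical callback that, in real host code, would wrap an `e_read` of the flag at 0x7000 for the given core; it is a callback here only so the logic can be exercised without the hardware. Unlike the posted code, which sums the flag values, this version counts the cores whose flag is non-zero.

```c
#include <stddef.h>

/* Hypothetical callback type: reads the completion flag of core (row, col).
 * In real host code this would wrap
 *   e_read(&dev, row, col, 0x7000, &flag, sizeof(flag));
 * and return the flag value. */
typedef int (*read_flag_fn)(unsigned row, unsigned col, void *ctx);

/* Counts how many non-diagonal cores report a non-zero flag. */
static unsigned count_done_offdiag(read_flag_fn read_flag, void *ctx,
                                   unsigned rows, unsigned cols)
{
    unsigned done = 0;
    for (unsigned row = 0; row < rows; row++) {
        for (unsigned col = 0; col < cols; col++) {
            if (row == col)
                continue;          /* diagonal cores run the other program */
            if (read_flag(row, col, ctx) != 0)
                done++;
        }
    }
    return done;
}

/* Test double: only the cores in column 0 report completion, which is
 * the symptom described in the quoted post. */
static int fake_column0_only(unsigned row, unsigned col, void *ctx)
{
    (void)row; (void)ctx;
    return col == 0;
}
```

With the test double on a 4x4 grid, `count_done_offdiag(fake_column0_only, NULL, 4, 4)` counts cores (1,0), (2,0) and (3,0). The host would keep polling until the count reaches `rows * cols - rows`, i.e. 12 on a 4x4 grid.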

I can imagine that some cores crash (throw an exception) or for some other reason (e.g. being in idle) never start computing. Your host code would never notice.

Again, I would recommend putting the loop into the cores themselves, and not reloading/restarting the cores after every iteration. In my code, the cores would wait for a flag to be set, run their computation, reset the flag, and start over. The host code would only set the flags and wait for them to be reset.
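
The handshake described above can be sketched as a small host-only simulation. Everything here is an assumption made for illustration: the `mailbox_t` layout, the function names, and the doubling used as a placeholder computation. On real hardware the mailbox would sit at a fixed core-local address (the posts above use 0x7000), the host side would use `e_write` and `e_read`, and `core_poll` would be the body of an endless loop in the device program.

```c
#include <stdint.h>

/* Hypothetical mailbox layout shared between host and one core. */
typedef struct {
    volatile uint32_t go;     /* host sets to 1, core resets to 0 when done */
    volatile float    data;   /* input written by host, result written by core */
} mailbox_t;

/* One pass of the core's loop body. On the device this would be wrapped
 * in while (1) { ... } so that main() never returns. */
static void core_poll(mailbox_t *mb)
{
    if (mb->go == 0)
        return;                   /* nothing to do yet */
    mb->data = mb->data * 2.0f;   /* placeholder for the real computation */
    mb->go = 0;                   /* signal completion by resetting the flag */
}

/* Host side: write the input first, then raise the flag. */
static void host_start(mailbox_t *mb, float input)
{
    mb->data = input;
    mb->go = 1;
}

/* Host side: has the core reset the flag yet? */
static int host_is_done(const mailbox_t *mb)
{
    return mb->go == 0;
}
```

The cores are loaded and started once; each further iteration costs the host one write of the input, one write of the flag, and polling reads until the flag is cleared.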
sebraa
 
Posts: 495
Joined: Mon Jul 21, 2014 7:54 pm

Re: Iterating host code: Parallella restarts

Post by gordon » Tue Jun 20, 2017 5:02 am

I can't put the loop into the cores themselves because in my case there are two different device programs to be executed on two different sets of cores, i.e. the diagonal and the non-diagonal cores. Until the diagonal-core program finishes, I can't start the other program, because it takes in the values returned by the first program. So for the second iteration to start on the non-diagonal cores, I first need to run the second iteration on the diagonal cores, then send the values to the non-diagonal cores via the host, and only then begin execution. Can you help me in this direction?
gordon
 
Posts: 24
Joined: Mon May 29, 2017 10:37 am

Re: Iterating host code: Parallella restarts

Post by sebraa » Tue Jun 20, 2017 7:54 am

You have a two-step algorithm:
- step 1: diagonal cores
- step 2: non-diagonal cores

Why do you use the host program to synchronize between the steps? The cores can do that themselves (self-organizing system); the eSDK contains barrier functions for that.

So, the diagonal cores use a structure like this:
Code:
int main() {
  e_barrier_init();
  e_barrier();

  while(1) {
    /* computation */
    e_barrier();
  }
}


The non-diagonal cores use a structure like this:
Code:
int main() {
  e_barrier_init();
  e_barrier();

  while(1) {
    e_barrier();
    /* computation */
  }
}


When all cores are ready, the diagonal cores execute step 1 while the non-diagonal cores wait at another barrier. When they are done, the diagonal cores execute step 1 of the second iteration while the non-diagonal cores execute step 2 of the first iteration, so the two overlap. If you can't overlap them, you can put another barrier into both programs to prevent them from running at the same time.
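
The overlap can be written down as a small schedule function. This is only a model of the two loop structures above, not device code, and the names `interval_t` and `work_after_barrier` are made up for the illustration.

```c
/* Work done by each core group in the interval that follows the k-th
 * barrier (k = 0 is the initial barrier both programs execute before
 * entering their loops). A value of 0 means the group is only waiting. */
typedef struct {
    int diagonal_iteration;      /* iteration whose step 1 is running */
    int offdiagonal_iteration;   /* iteration whose step 2 is running */
} interval_t;

static interval_t work_after_barrier(int k)
{
    interval_t w;
    /* Diagonal cores compute first and then hit the barrier, so after
     * barrier k they are working on iteration k + 1. */
    w.diagonal_iteration = k + 1;
    /* Non-diagonal cores hit the barrier first and then compute, so they
     * run one iteration behind and only wait during the first interval. */
    w.offdiagonal_iteration = k;
    return w;
}
```

After barrier 0 the diagonal cores run iteration 1 and the non-diagonal cores wait; after barrier 1 the diagonal cores run iteration 2 while the non-diagonal cores run iteration 1, and so on.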

Also, since your kernels are very small, you can use *one* program for both sets of cores, which - as part of its initialization - retrieves its own coreid first and then decides what to do. I don't know if the eSDK barriers are guaranteed to work with different kernels.
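
A minimal sketch of that role decision, assuming the core already knows its row and column within the workgroup. On the device these would presumably be derived from the coreid (e-lib offers `e_get_coreid` and `e_coords_from_coreid` for this); that part is not shown, and `core_role_t`, `role_for_core` and `count_role` are hypothetical names.

```c
typedef enum { ROLE_DIAGONAL, ROLE_OFFDIAGONAL } core_role_t;

/* Decide which step a core executes from its position in the grid. */
static core_role_t role_for_core(unsigned row, unsigned col)
{
    return (row == col) ? ROLE_DIAGONAL : ROLE_OFFDIAGONAL;
}

/* Helper for checking the split: number of cores with the given role
 * in a rows x cols grid. */
static unsigned count_role(core_role_t role, unsigned rows, unsigned cols)
{
    unsigned n = 0;
    for (unsigned r = 0; r < rows; r++)
        for (unsigned c = 0; c < cols; c++)
            if (role_for_core(r, c) == role)
                n++;
    return n;
}
```

On a 4x4 grid this gives 4 diagonal and 12 non-diagonal cores, matching the split used in this thread. The single program would branch on the role once and then enter the matching loop.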
sebraa
 
Posts: 495
Joined: Mon Jul 21, 2014 7:54 pm

Re: Iterating host code: Parallella restarts

Post by gordon » Tue Jun 20, 2017 10:50 am

Actually, after one iteration the host code assembles all results into a single 8x8 matrix and then swaps rows and columns. The new matrix is split again and written to the 16 cores. So, taking a diagonal core as an example, the result of its first iteration can't be reused directly for the second iteration, because some of its elements are needed by one core and some by another.
gordon
 
Posts: 24
Joined: Mon May 29, 2017 10:37 am

Re: Iterating host code: Parallella restarts

Post by sebraa » Thu Jun 22, 2017 3:12 pm

Still, the barriers are a useful synchronization primitive.
- Step 1: diagonal cores do math
- Step 2: swapping phase
- Step 3: non-diagonal cores do math
- Step 4: return result to host, let host send new data
- Repeat.

In any case, you do not want to restart the Epiphany system for every invocation of your program, but instead have the cores handle multiple iterations. For good performance, you want to avoid any host-communication anyway.
sebraa
 
Posts: 495
Joined: Mon Jul 21, 2014 7:54 pm

Re: Iterating host code: Parallella restarts

Post by gordon » Fri Jun 23, 2017 6:35 am

sebraa wrote:Still, the barriers are a useful synchronization primitive.
- Step 1: diagonal cores do math
- Step 2: swapping phase
- Step 3: non-diagonal cores do math
- Step 4: return result to host, let host send new data
- Repeat.

In any case, you do not want to restart the Epiphany system for every invocation of your program, but instead have the cores handle multiple iterations. For good performance, you want to avoid any host-communication anyway.



Thank you very much for the help, sebraa. My code is finally working :) To measure the efficiency, I recorded the time taken by a single e_read for a range of byte counts and calculated the throughput, which comes to around 15 MB/s. But according to the Parallella documentation, the throughput between host and Epiphany is in the range of GB/s. Why is this?
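
For reference, the throughput figure can be computed as in the sketch below. `throughput_mb_per_s` is a hypothetical helper; it assumes the elapsed time was measured on the host around the `e_read` calls (for example with a monotonic clock) and that 1 MB means 10^6 bytes. Averaging over many transfers reduces the influence of timer resolution.

```c
#include <stddef.h>

/* Average throughput in MB/s (1 MB = 10^6 bytes) for `transfers`
 * transfers of `bytes_each` bytes taking `seconds` in total.
 * Returns 0.0 for a non-positive duration. */
static double throughput_mb_per_s(size_t bytes_each, size_t transfers,
                                  double seconds)
{
    if (seconds <= 0.0)
        return 0.0;
    return ((double)bytes_each * (double)transfers) / seconds / 1e6;
}
```

For example, 1000 reads of 7500 bytes that take 0.5 s in total correspond to 15 MB/s.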
gordon
 
Posts: 24
Joined: Mon May 29, 2017 10:37 am

Re: Iterating host code: Parallella restarts

Post by sebraa » Fri Jun 23, 2017 10:34 am

Well, as I've told you, access to the host is very slow, especially if your access pattern is not optimal.

- Use consecutive addresses (burst transfers): otherwise you lose 75% throughput.
- Use 64-bit accesses: otherwise you lose 50% throughput.
- Use writes: reads incur high latency; read requests travel at only 12.5% speed; reads do not allow bursts.
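
The three points above can be combined into a copy routine like the following sketch, intended for the case where a core pushes its results out instead of the host reading them. `write_u64_consecutive` is a hypothetical helper, and both pointers are assumed to be 8-byte aligned. C does not guarantee that the compiler emits one 64-bit store per word or that the hardware merges the stores into bursts, so the generated code and the measured throughput should be checked.

```c
#include <stddef.h>
#include <stdint.h>

/* Copies n_words 64-bit words to consecutive destination addresses using
 * only writes. Both pointers are assumed to be 8-byte aligned. */
static void write_u64_consecutive(volatile uint64_t *dst,
                                  const uint64_t *src, size_t n_words)
{
    for (size_t i = 0; i < n_words; i++)
        dst[i] = src[i];   /* one 64-bit store per word, ascending addresses */
}
```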

The theoretical maximum throughput on the Parallella is 600 MB/s (with the Epiphany running at 600 MHz), but I don't know if anyone has ever achieved this. 150 MB/s is a more realistic estimate. As far as my experience goes, the internal communication speed was never an issue.
sebraa
 
Posts: 495
Joined: Mon Jul 21, 2014 7:54 pm
