How to properly restart a single core without side effects

Postby ralisi » Sun Dec 14, 2014 3:41 pm

Hello everybody,

I would like to know how to properly reset a single Epiphany core. If I use the ARM-cores function e_load_core, this also changes the shared memory section.

I then tried to let one Epiphany core reset another one. To do so, I do the following:

  1. write '1' to the E_REG_CORE_RESET register
  2. copy all memory (0x0000 to 0x8000) from another core
  3. set core_row and core_col in e_group_config to the values of the core to be reset
  4. write '0' to the E_REG_CORE_RESET register
  5. trigger the E_SYNC interrupt

Unfortunately, this procedure does not always work properly. As I am injecting errors before the reset, it is safe to assume that unaligned memory accesses and invalid operations have occurred prior to the reset. When I launch the debugger to see where the core stopped, it has stopped one instruction after accessing the COREID register or accessing any shared memory. When I check the register contents, everything is set to zero (obviously, because the core was reset), but still, it does not resume normal operation. I also tried to forcibly set the program counter during my reset procedure, but this had no effect.

Any hints on that? Did I forget some reset steps?
ralisi
 
Posts: 15
Joined: Fri Apr 11, 2014 12:00 pm

Re: How to properly restart a single core without side effects

Postby aolofsson » Sun Dec 14, 2014 3:57 pm

Are you talking about the register that is called "RESETCORE" in the architecture manual? What exact addresses are you writing to?
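For readers who want to check the exact addresses, here is a minimal sketch of how a global register address is composed: the upper 12 bits are the core ID (6 bits of row, 6 bits of column) and the lower 20 bits are the offset inside the core. The helper name `global_addr` is hypothetical and not part of e-lib, and the two register offsets are assumptions that should be verified against the architecture manual or the SDK's e_regs.h. Unlike e_get_global_address, which takes coordinates relative to the workgroup, this sketch takes absolute mesh coordinates.

```c
#include <assert.h>
#include <stdint.h>

/* Register offsets inside a core's 1 MiB address window.
 * ASSUMPTION: values quoted from memory of the architecture reference;
 * verify them against your SDK's e_regs.h before relying on them. */
#define REG_DEBUGCMD  0xF0448u
#define REG_RESETCORE 0xF070Cu

/* Hypothetical helper (not an e-lib function): build the 32-bit global
 * address from absolute mesh coordinates and a core-local offset. */
static uint32_t global_addr(unsigned row, unsigned col, uint32_t offset)
{
    assert(row < 64 && col < 64);   /* row and column are 6 bits each */
    assert(offset < (1u << 20));    /* local offset is 20 bits */

    uint32_t coreid = ((uint32_t)row << 6) | col;  /* 12-bit core ID */
    return (coreid << 20) | offset;
}
```

Assuming the mesh origin of a 16-core board is at absolute coordinates (32, 8), `global_addr(32, 8, REG_RESETCORE)` evaluates to 0x808F070C, which can be compared with the pointer value that e_get_global_address returns on the device.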

You have to be very careful with using the reset register this way. If any transaction related to that core is ongoing at the time the soft reset happens, "bad" things could happen. A safer method would be to first "debughalt" the neighboring core, make sure nobody is communicating with the core, make sure all the DMAs in that core are stopped, and then assert the reset sequence.


What code are you running? There are more variables in the initial structures needed than just the core row/col id if you use the e_read/e_write calls iirc.

Can you try running the simplest possible program that does not use elib to make sure that your concept works? (it should...)

Andreas
aolofsson
 
Posts: 1005
Joined: Tue Dec 11, 2012 6:59 pm
Location: Lexington, Massachusetts,USA

Re: How to properly restart a single core without side effects

Postby ralisi » Sun Dec 14, 2014 4:26 pm

Thank you for your quick answer.

Yes, I am using that register.

I am using the following code to reset a core:
Code:
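#include <string.h>   // memcpy
#include <e-lib.h>    // e_get_global_address, e_irq_set, E_REG_*, E_SYNC

extern e_group_config_t const e_group_config;

// Reload core (row, col) with the local memory image of core (fromRow, fromCol).
void reload_core(unsigned row, unsigned col, unsigned fromRow, unsigned fromCol) {
    char *backupcore = e_get_global_address(fromRow, fromCol, (void *) 0);
    char *resetcore  = e_get_global_address(row, col, (void *) 0);

    // memory-mapped registers of the target core; volatile so the compiler keeps every store
    volatile unsigned *resetAddr = (volatile unsigned *) e_get_global_address(row, col, (void *) E_REG_CORE_RESET);
    volatile unsigned *dbgAddr = (volatile unsigned *) e_get_global_address(row, col, (void *) E_REG_DEBUGCMD);

    *dbgAddr = 1;    // halt the target core
    *resetAddr = 1;  // assert reset

    // copy all the data (32 KB of local memory)
    memcpy(resetcore, backupcore, (size_t) 0x8000);

    // adjust the coreID config
    e_group_config_t* resetGroupConfig = e_get_global_address(row, col, (void *) &e_group_config);
    resetGroupConfig->core_row = row;
    resetGroupConfig->core_col = col;

    *resetAddr = 0;  // release reset
    e_irq_set(row, col, E_SYNC);  // trigger the SYNC interrupt on the target core
}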

In the beginning, I loaded the same .srec file to the two cores involved here: the one being reset and the fromCore that I copy the data from. The third core, which copies the data, runs different code.

I am not using any DMA at all right now (and hope that the injected errors did not set it up by accident)

When I fiddle with my error injection, I can see that the reset works properly most of the time. But unfortunately, working only most of the time is not an option.

Providing a minimal example of the problem will be rather tricky, as the FreeRTOS port is involved as well. I am still hoping to solve it without that.

//raphael
ralisi
 
Posts: 15
Joined: Fri Apr 11, 2014 12:00 pm

Re: How to properly restart a single core without side effects

Postby aolofsson » Sun Dec 14, 2014 7:28 pm

raphael,

Try the following:

Code:
*resetAddr = 1;
*resetAddr = 0;
//followed by the code memcpy..


(you can't talk to the core (including memory) while the core is in reset)
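
To illustrate why the order matters, here is a toy model in plain C. It is not e-lib code and it does not describe real hardware: `model_core_t`, `model_write` and both reload functions are hypothetical names, and the model simply assumes that a write arriving while reset is asserted is lost. What actually happens on the chip in that situation may be different (and worse).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SRAM_SIZE 0x8000u   /* 32 KB of local memory, as in the thread */

/* Toy stand-in for one core; NOT a real e-lib or hardware type. */
typedef struct {
    int reset_asserted;
    unsigned char sram[SRAM_SIZE];
} model_core_t;

/* ASSUMPTION for this model only: a write that arrives while reset is
 * asserted is silently lost. Returns the number of bytes stored. */
static size_t model_write(model_core_t *c, size_t off, const void *src, size_t len)
{
    assert(off <= SRAM_SIZE && len <= SRAM_SIZE - off);
    if (c->reset_asserted)
        return 0;
    memcpy(c->sram + off, src, len);
    return len;
}

/* Order used in the posted reload_core: hold reset, copy, release. */
static size_t reload_while_held(model_core_t *c, const void *image, size_t len)
{
    c->reset_asserted = 1;
    size_t stored = model_write(c, 0, image, len);
    c->reset_asserted = 0;
    return stored;
}

/* Suggested order: pulse reset first, then copy the image. */
static size_t reload_after_pulse(model_core_t *c, const void *image, size_t len)
{
    c->reset_asserted = 1;
    c->reset_asserted = 0;
    return model_write(c, 0, image, len);
}
```

In this model, `reload_while_held` stores nothing, while `reload_after_pulse` stores the whole image. On the device the same reordering means moving `*resetAddr = 0;` directly after `*resetAddr = 1;` in reload_core, before the memcpy.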
aolofsson
 
Posts: 1005
Joined: Tue Dec 11, 2012 6:59 pm
Location: Lexington, Massachusetts,USA

Re: How to properly restart a single core without side effects

Postby ralisi » Sun Dec 14, 2014 11:50 pm

Hello Andreas,

I stopped all cores (by writing 1 to E_REG_DEBUGCMD), then executed the reset, as you suggested, followed by reloading the memory, then resumed everything. Unfortunately, the cores keep getting stuck at the same location, after some time.

//raphael
ralisi
 
Posts: 15
Joined: Fri Apr 11, 2014 12:00 pm

Re: How to properly restart a single core without side effects

Postby sebraa » Mon Dec 15, 2014 11:41 am

ralisi wrote: If I use the ARM-cores function e_load_core, this also changes the shared memory section.

The e_load_core function only touches addresses mentioned in the SREC or ELF file. If you remove the offending sections, shared memory will not be changed. You can do this by running "e-objcopy -R .shared_dram --output-target srec --srec-forceS3" to create a SREC file.
sebraa
 
Posts: 495
Joined: Mon Jul 21, 2014 7:54 pm

Re: How to properly restart a single core without side effects

Postby ralisi » Mon Dec 15, 2014 1:08 pm

Hi sebraa,

thank you very much for your hint.

I removed all reload functionality from the Epiphany code and ran the following on the ARM core:


Code:
        e_halt(&dev, 0, 0);
        e_halt(&dev, 0, 1);
        e_halt(&dev, 0, 2);
        e_halt(&dev, 1, 0);
        e_halt(&dev, 1, 1);
        e_halt(&dev, 1, 2);
        e_halt(&dev, 2, 0);
        e_halt(&dev, 2, 1);
        e_halt(&dev, 2, 2);
        e_halt(&dev, 3, 0);
        e_halt(&dev, 3, 1);
        e_halt(&dev, 3, 2);

        ee_reset_core(&dev, 1, 0);
        ee_reset_core(&dev, 1, 1);
        ee_reset_core(&dev, 2, 0);
        ee_reset_core(&dev, 2, 1);
        ee_reset_core(&dev, 3, 0);
        e_load("worker.srec", &dev, 1, 0, E_FALSE);
        e_load("worker.srec", &dev, 1, 1, E_FALSE);
        e_load("worker.srec", &dev, 2, 0, E_FALSE);
        e_load("worker.srec", &dev, 2, 1, E_FALSE);
        e_load("worker.srec", &dev, 3, 0, E_FALSE);
        e_start(&dev, 1, 0);
        e_start(&dev, 1, 1);
        e_start(&dev, 2, 0);
        e_start(&dev, 2, 1);
        e_start(&dev, 3, 0);

        e_resume(&dev, 0, 0);
        e_resume(&dev, 0, 1);
        e_resume(&dev, 0, 2);
        e_resume(&dev, 1, 0);
        e_resume(&dev, 1, 1);
        e_resume(&dev, 1, 2);
        e_resume(&dev, 2, 0);
        e_resume(&dev, 2, 1);
        e_resume(&dev, 2, 2);
        e_resume(&dev, 3, 1);
        e_resume(&dev, 3, 2);


While this seems to work for a while, the process acquires more and more virtual memory, and when it reaches 2070368 kB (around 2 GB), I get the following messages:

Code:
armcode: e_alloc(): mmap failure.
armcode:
ERROR: Can't allocate external memory buffer!


armcode: e_alloc(): mmap failure.
armcode:
ERROR: Can't allocate external memory buffer!


armcode: e_alloc(): mmap failure.
armcode:
ERROR: Can't allocate external memory buffer!


armcode: e_alloc(): mmap failure.
armcode:


e_load seems to be the culprit for this. I would assume that there is a memory leak in there.

//raphael
ralisi
 
Posts: 15
Joined: Fri Apr 11, 2014 12:00 pm
