Cores stall (or crash) on e_dma_wait

Moderator: dar

Postby nickoppen » Mon May 15, 2017 12:04 pm

I thought I had my DMA example working and was running some tests on images of different sizes when it became clear that I have no idea what's going on.

I'm writing an Epiphany-based program to do histogram equalisation on grayscale images. My two test images (one of 76Kb and one of 4Mb) work fine. When I take the larger one and scale it down, the scaled versions result in most of the cores not returning from the first DMA transfer. Mostly cores 0 and 8 do not stall, but the data that they end up with is not from the data stream.

I transfer the data into two buffers on the core, each a little less than the memory remaining (I leave 512Kb for stack space). The size of the buffers is divisible by 8, as is the source buffer. The data type is uint8_t and the DMA is transferring DWORDS.

My first thought was that the buffers were overrunning and writing over the stack. Reducing the buffer size to 4Kb (out of a possible 22Kb) does not help.
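A minimal sketch of the sizing arithmetic described above, assuming two equal buffers are carved out of whatever memory is left after a stack reserve. The function name and both parameters are hypothetical and are not taken from the linked code; the rounding to a multiple of 8 reflects the double-word (8-byte) transfer size mentioned in the post.

```c
#include <stddef.h>

/* Hypothetical helper: size in bytes of each of two equal buffers carved
 * out of free_bytes after setting aside stack_reserve bytes. The result is
 * rounded down to a multiple of 8 so that a double-word (8-byte) transfer
 * never has to move a partial word. */
size_t dma_buffer_size(size_t free_bytes, size_t stack_reserve)
{
    if (free_bytes <= stack_reserve)
        return 0;                              /* nothing left for buffers */

    size_t per_buffer = (free_bytes - stack_reserve) / 2;
    return per_buffer & ~(size_t)7;            /* round down to multiple of 8 */
}
```

Checking the result against the linker map or the stack pointer at run time is still needed; the arithmetic only guarantees alignment, not that the stack stays clear of the buffers.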

I've tried looking at coprcc-db and the output is as follows:

Code:
(copr-db) state
0(0,0) run_state=0x4000000b debug_state=0x0 info=0x820d (33293)
1(0,1) run_state=0xb debug_state=0x0 info=0x44 (68)
2(0,2) run_state=0xb debug_state=0x0 info=0x44 (68)
3(0,3) run_state=0xb debug_state=0x0 info=0x44 (68)
4(1,0) run_state=0xb debug_state=0x0 info=0x44 (68)
5(1,1) run_state=0xb debug_state=0x0 info=0x44 (68)
6(1,2) run_state=0xb debug_state=0x0 info=0x44 (68)
7(1,3) run_state=0xb debug_state=0x0 info=0x44 (68)
8(2,0) run_state=0xb debug_state=0x0 info=0x44 (68)
9(2,1) run_state=0xb debug_state=0x0 info=0x44 (68)
10(2,2) run_state=0xb debug_state=0x0 info=0x45 (69)
11(2,3) run_state=0xb debug_state=0x0 info=0x45 (69)
12(3,0) run_state=0xb debug_state=0x0 info=0x45 (69)
13(3,1) run_state=0xb debug_state=0x0 info=0x45 (69)
14(3,2) run_state=0xb debug_state=0x0 info=0x45 (69)
15(3,3) run_state=0xb debug_state=0x0 info=0x45 (69)
(copr-db) sym
__p_elf 0x1e560
'(null)' number of symbols: 144
                    _impure_data        0x8e002348      1096
                     ___mem_free            0x18c0      4
                   _core_timer_0             0x3d8      4
                     _sys_thread             0x3a8      8
                  _e_emem_config              0x50      8
                 ___coprthr_proc             0x370      40
                    ___dma1_desc            0x18f0      24
                 _e_group_config              0x28      40
        ___coprthr_barrier_state             0x358      16
                          _bebug            0x18ec      4
                       _sys_proc             0x3b0      40
          __dma_copy_descriptor_            0x1920      24
                    __impure_ptr        0x8e002340      4
               ___coprthr_thread             0x368      8
                    ___dma0_desc            0x1908      24
              _sys_barrier_state             0x398      16
                  _dma_data_size            0x18c8      32
             __global_impure_ptr        0x8e00203c      4
(copr-db) sym _bebug
__p_elf 0x1e560
'_bebug' number of symbols: 144
0(0,0): _bebug(    0x18ec)=0x48e8(18664)
1(0,1): _bebug(    0x18ec)=0x0(0)
2(0,2): _bebug(    0x18ec)=0x0(0)
3(0,3): _bebug(    0x18ec)=0x0(0)
4(1,0): _bebug(    0x18ec)=0x0(0)
5(1,1): _bebug(    0x18ec)=0x0(0)
6(1,2): _bebug(    0x18ec)=0x0(0)
7(1,3): _bebug(    0x18ec)=0x0(0)
8(2,0): _bebug(    0x18ec)=0x0(0)
9(2,1): _bebug(    0x18ec)=0x0(0)
10(2,2): _bebug(    0x18ec)=0x0(0)
11(2,3): _bebug(    0x18ec)=0x0(0)
12(3,0): _bebug(    0x18ec)=0x0(0)
13(3,1): _bebug(    0x18ec)=0x0(0)
14(3,2): _bebug(    0x18ec)=0x0(0)
15(3,3): _bebug(    0x18ec)=0x0(0)
(copr-db) mem 0x48e8, 0x48ff
core local memory: 0 (0,0):
48e8: 8e82898a 8a898e8e 8e959594 92928f8d 8e909293 00919492
core local memory: 1 (0,1):
48e8: 00000000 00000000 00000000 00000000 00000000 00000000
core local memory: 2 (0,2):
48e8: 00000000 00000000 00000000 00000000 00000000 00000000
core local memory: 3 (0,3):
48e8: 00000000 00000000 00000000 00000000 00000000 00000000
core local memory: 4 (1,0):
48e8: 00000000 00000000 00000000 00000000 00000000 00000000
core local memory: 5 (1,1):
48e8: 00000000 00000000 00000000 00000000 00000000 00000000
core local memory: 6 (1,2):
48e8: 00000000 00000000 00000000 00000000 00000000 00000000
core local memory: 7 (1,3):
48e8: 00000000 00000000 00000000 00000000 00000000 00000000
core local memory: 8 (2,0):
48e8: 00000000 00000000 00000000 00000000 00000000 00000000
core local memory: 9 (2,1):
48e8: 00000000 00000000 00000000 00000000 00000000 00000000
core local memory: 10 (2,2):
48e8: 00000000 00000000 00000000 00000000 00000000 00000000
core local memory: 11 (2,3):
48e8: 00000000 00000000 00000000 00000000 00000000 00000000
core local memory: 12 (3,0):
48e8: 00000000 00000000 00000000 00000000 00000000 00000000
core local memory: 13 (3,1):
48e8: 00000000 00000000 00000000 00000000 00000000 00000000
core local memory: 14 (3,2):
48e8: 00000000 00000000 00000000 00000000 00000000 00000000
core local memory: 15 (3,3):
48e8: 00000000 00000000 00000000 00000000 00000000 00000000
(copr-db)


On this run it appears that core 0 has survived but the data stream in the buffer (pointed at by the global symbol _bebug) does not appear in the input data stream.
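To compare a dump like the one above with the input file, the printed words have to be expanded back into individual pixel bytes. The sketch below assumes that the debugger prints each group as a 32-bit value and that memory is little-endian, so the lowest byte of each word sits at the lowest address. Both function names are hypothetical helpers for host-side checking.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Expand 32-bit words, as printed in a memory dump, into bytes in
 * ascending address order (assumes little-endian storage). */
void words_to_bytes(const uint32_t *words, size_t n_words, uint8_t *out)
{
    for (size_t i = 0; i < n_words; i++) {
        out[4 * i + 0] = (uint8_t)(words[i] & 0xff);
        out[4 * i + 1] = (uint8_t)((words[i] >> 8) & 0xff);
        out[4 * i + 2] = (uint8_t)((words[i] >> 16) & 0xff);
        out[4 * i + 3] = (uint8_t)((words[i] >> 24) & 0xff);
    }
}

/* Return the index of the first occurrence of needle in hay, or -1. */
long find_bytes(const uint8_t *hay, size_t hay_len,
                const uint8_t *needle, size_t needle_len)
{
    if (needle_len == 0 || needle_len > hay_len)
        return -1;
    for (size_t i = 0; i + needle_len <= hay_len; i++)
        if (memcmp(hay + i, needle, needle_len) == 0)
            return (long)i;
    return -1;
}
```

With the pixel values from the CSV loaded into one array and the expanded dump in another, `find_bytes` reports whether the buffer contents occur anywhere in the input, and at which offset.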

Can anyone help me with the meaning of the processor states and the info value if that is relevant, please? I think I've done my arithmetic correctly given that some images work. I've just got no idea what is going on.

My code is here: https://github.com/nickoppen/egDMA/blob/master/e_egdmaScan.c if you would like to have a look.

Thanks,

nick
Sharing is what makes the internet Great!
nickoppen
 
Posts: 263
Joined: Mon Dec 17, 2012 3:21 am
Location: Sydney NSW, Australia

Re: Cores stall (or crash) on e_dma_wait

Postby jar » Mon May 15, 2017 5:08 pm

I have not run the code, but it looks like your "tailEnds" (cores 0 and 15, presumably) do not wait for the DMA engine to complete before using the DMA again (in e_dma_copy).

Another thing developers often don't realize is that just because the DMA engine has completed (after e_dma_wait) doesn't mean your data is where you think it should be. The DMA completing just means that the last command to move data across the network or e-link interface has been issued. You must perform a non-trivial check that the data has finished moving. If multiple cores are reading or writing to that location, then you must also introduce synchronization. In the OpenSHMEM API, the coherence checking is handled implicitly by shmem_quiet after the call to a non-blocking shmem_put*_nbi or shmem_get*_nbi operation. The e-lib library provides no mechanism for this, so it is left to the developer.
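One possible shape for such a check, as a sketch only: the producer places a non-zero tag after the payload, the consumer clears its copy of the tag before the transfer and then polls until the expected tag shows up. This is not an e-lib facility. `tagged_buffer`, `arm_buffer`, `data_arrived` and `fake_transfer` are hypothetical names, `fake_transfer` stands in for the real DMA with a plain memcpy, and the scheme assumes the tag is the last thing to arrive at the destination.

```c
#include <stdint.h>
#include <string.h>

#define PAYLOAD_BYTES 64

/* Payload followed by a tag. A tag value of 0 is reserved for
 * "nothing has arrived yet". */
typedef struct {
    uint8_t           payload[PAYLOAD_BYTES];
    volatile uint32_t tag;
} tagged_buffer;

/* Consumer side: clear the tag before the transfer is started. */
void arm_buffer(tagged_buffer *dst)
{
    dst->tag = 0;
}

/* Consumer side: true once the expected (non-zero) tag is visible.
 * On real hardware this would be called in a polling loop. */
int data_arrived(const tagged_buffer *dst, uint32_t expected_tag)
{
    return dst->tag == expected_tag;
}

/* Stand-in for the real transfer (assumption: payload first, tag last). */
void fake_transfer(tagged_buffer *dst, const tagged_buffer *src)
{
    memcpy(dst->payload, src->payload, PAYLOAD_BYTES);
    dst->tag = src->tag;
}
```

Using a fresh tag value for each transfer (for example a sequence counter that skips 0) avoids mistaking a stale buffer for a new one.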

Also, I have experienced some DMA weirdness that I was never able to pin down. I know that isn't helpful. In general, I avoid DMA with OpenSHMEM calls since synchronous copies typically beat it on Epiphany-III. Asynchronous/non-blocking code is also more complicated. The painstakingly optimized shmemx_memcpy routine is the fastest way to write contiguous blocks of aligned memory with Epiphany-III (but it also handles misalignment).
jar
 
Posts: 288
Joined: Mon Dec 17, 2012 3:27 am

Re: Cores stall (or crash) on e_dma_wait

Postby nickoppen » Tue May 16, 2017 10:52 am

Hi Jar,

The variable name "tailEnds" is perhaps not a good one. It is the amount of data for the core modulo the buffer size, i.e. what is left over once the full buffers have been transferred. I'll make sure that it waits before sending the results back.
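The split into full buffers plus a remainder can be sketched as below. The struct and function names are hypothetical and do not come from the linked source; the alignment flag is included because a remainder that is not a multiple of 8 cannot be moved with double-word transfers alone.

```c
#include <stddef.h>

/* Hypothetical description of how one core's share of the data is moved. */
typedef struct {
    size_t full_transfers;  /* number of transfers that fill a whole buffer */
    size_t tail_bytes;      /* bytes left over for a final, shorter transfer */
    int    tail_aligned;    /* 1 if tail_bytes is a multiple of 8 */
} transfer_plan;

transfer_plan plan_transfers(size_t bytes_for_core, size_t buffer_bytes)
{
    transfer_plan p = {0, 0, 1};

    if (buffer_bytes == 0)
        return p;                               /* no buffer, nothing to plan */

    p.full_transfers = bytes_for_core / buffer_bytes;
    p.tail_bytes     = bytes_for_core % buffer_bytes;
    p.tail_aligned   = (p.tail_bytes % 8 == 0);
    return p;
}
```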

Do the processor states in the debug session shed any light on what is happening? I looked through the architecture reference and there is a lot of discussion about processor states but no table that relates the mnemonic to the value.

I'm getting into DMA in the belief that it can be used to shift one lot of data around while the core is processing another. If the algorithm is non-trivial or needs to be run many times (e.g. neural network training) then there is a net gain, even if the data transfer is not as quick as it could be. Am I on the right track here?
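The overlap described above is usually arranged as two "ping-pong" buffers: while one is being processed, the next chunk is fetched into the other. A minimal host-side sketch follows, with the assumption that `start_fetch` and `wait_fetch` are hypothetical stand-ins for starting a DMA and waiting for it. Here the copy is a synchronous memcpy, so the code only shows the control flow, not a real speed-up.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CHUNK 8   /* bytes per buffer; tiny so the example is easy to follow */

/* Stand-in for starting an asynchronous transfer (here: plain memcpy). */
static void start_fetch(uint8_t *dst, const uint8_t *src, size_t n)
{
    memcpy(dst, src, n);
}

/* Stand-in for waiting on that transfer; nothing to do in this simulation. */
static void wait_fetch(void)
{
}

/* Build a 256-bin histogram of src[0..n) using two alternating buffers. */
void histogram_double_buffered(const uint8_t *src, size_t n, uint32_t hist[256])
{
    uint8_t buf[2][CHUNK];
    size_t  offset  = 0;
    size_t  cur_len = n < CHUNK ? n : CHUNK;
    int     cur     = 0;

    memset(hist, 0, 256 * sizeof hist[0]);
    if (n == 0)
        return;

    start_fetch(buf[cur], src, cur_len);        /* prime the first buffer */
    wait_fetch();

    while (cur_len > 0) {
        size_t next_off = offset + cur_len;
        size_t left     = n - next_off;
        size_t next_len = left < CHUNK ? left : CHUNK;

        if (next_len > 0)                       /* fetch next chunk ... */
            start_fetch(buf[cur ^ 1], src + next_off, next_len);

        for (size_t i = 0; i < cur_len; i++)    /* ... while processing this one */
            hist[buf[cur][i]]++;

        if (next_len > 0)
            wait_fetch();                       /* next buffer must be complete */

        offset  = next_off;
        cur_len = next_len;
        cur    ^= 1;                            /* swap buffers */
    }
}
```

Whether this pays off depends on the processing time per chunk being at least comparable to the transfer time; otherwise the core simply waits in `wait_fetch`.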

nick

Re: Cores stall (or crash) on e_dma_wait

Postby jar » Tue May 16, 2017 1:31 pm

I ran your code last night and I wasn't sure what you were seeing that was crashing/stalling. How can I reproduce your issue?

Those error codes never made sense to me and it would be nice if there were some way to decode them. I'll ask dar sometime (or you could).

Re: Cores stall (or crash) on e_dma_wait

Postby nickoppen » Wed May 17, 2017 12:35 am

The code on GitHub was set to use memcpy, and the test file was one that worked anyway.

I've updated the repository with a DMA version and an input file (bridge5.csv) that always fails for me. The file bridge0.csv works as does gray.csv.

I've left in some host_printf calls that shed some light on what is going on. There is also a global symbol (_bebug) that points to the location of the most recently transferred data.

Thanks again for helping me with this.

nick

Re: Cores stall (or crash) on e_dma_wait

Postby nickoppen » Wed May 24, 2017 12:46 am

I've been digging through the Architecture Reference and I think I've found candidates for the columns displayed by the state command in the debugger.

My guess is that "run_state" refers to the STATUS register and "debug_state" refers to the DEBUGSTATUS register. I've no idea what "info" could refer to.

However, this does not help me much. The value 0x4000000b in the STATUS register says:

- The core is active
- Interrupts are globally disabled (bit 1, the GID bit, is set)
- The WAND bit is set (which has something to do with barriers but is marked as LABS which should be regarded as "experimental")
- Bit 30 is also set for core 0 but that is reserved so I've no idea what this means.
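The bit arithmetic behind a reading like this can be checked mechanically. The sketch below lists which bits are set in a 32-bit value. The three masks are assumptions based on my reading of the STATUS register layout in the Architecture Reference (ACTIVE in bit 0, GID in bit 1, WAND in bit 3), and `set_bits` is a hypothetical helper.

```c
#include <stdint.h>

/* Assumed STATUS register bit positions (check against the reference). */
#define STATUS_ACTIVE (UINT32_C(1) << 0)
#define STATUS_GID    (UINT32_C(1) << 1)
#define STATUS_WAND   (UINT32_C(1) << 3)

/* Store the indices of all set bits in value, lowest first.
 * Returns how many were found. */
int set_bits(uint32_t value, int bits[32])
{
    int n = 0;
    for (int i = 0; i < 32; i++)
        if (value & (UINT32_C(1) << i))
            bits[n++] = i;
    return n;
}
```

For 0x4000000b this yields bits 0, 1, 3 and 30; for the 0xb shown by the other cores it yields bits 0, 1 and 3.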

The value of the debug_state seems to indicate that everything is fine.

So interesting but not useful.