Face detection on Parallella

Re: Face detection on Parallella

Post by leonfg » Mon May 09, 2016 12:56 pm

olajep wrote:Hi,

I created a pull request here:
https://github.com/leonfg/face_detect/pull/1

Tested 100x iterations w/ 16 cores. No issues AFAICT. Without looking at the rest of the code my best guess is that some other part of the program overwrites 0x4000 at runtime.


e_mutex_init() was changed to be a no-op in ESDK 2016.3.
https://github.com/adapteva/epiphany-li ... init.c#L30
... but programs that use hard-coded pointer addresses will still work without modification since the loader now clears core SRAM *before* starting the program.
The change was made because initializing a mutex at runtime while multiple threads are running creates a race condition. It only worked before by luck, and because the mutex was usually placed on core (0,0), which is the first core the loader starts, so the call to e_mutex_init() (usually at the beginning of main()) had completed before the rest of the cores were started.
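
To make the race concrete, here is a minimal single-threaded sketch that replays one bad interleaving step by step. It is not Epiphany code: sim_mutex_t, sim_mutex_init() and sim_mutex_trylock() are hypothetical stand-ins invented for this illustration, and the mutex is modeled as a plain int (0 = free, 1 = held).

```c
#include <stdbool.h>

/* Hypothetical stand-in for a mutex word: 0 = free, 1 = held. */
typedef int sim_mutex_t;

/* Runtime init: unconditionally marks the mutex as free. */
void sim_mutex_init(sim_mutex_t *m)
{
    *m = 0;
}

/* Non-blocking acquire: returns true if the caller got the lock. */
bool sim_mutex_trylock(sim_mutex_t *m)
{
    if (*m != 0)
        return false;
    *m = 1;
    return true;
}

/* Replays the problematic ordering and returns how many "cores"
 * believe they hold the lock at the same time. */
int holders_with_runtime_init(void)
{
    sim_mutex_t m;
    int holders = 0;

    sim_mutex_init(&m);            /* core A starts first and initializes */
    if (sim_mutex_trylock(&m))     /* core A enters its critical section */
        holders++;

    sim_mutex_init(&m);            /* core B starts late and re-initializes */
    if (sim_mutex_trylock(&m))     /* core B gets in while A still holds it */
        holders++;

    return holders;
}

/* Same ordering, but the mutex is initialized once, before any core runs. */
int holders_with_static_init(void)
{
    sim_mutex_t m = 0;             /* initialized up front, never reset */
    int holders = 0;

    if (sim_mutex_trylock(&m))     /* core A takes the lock */
        holders++;
    if (sim_mutex_trylock(&m))     /* core B is refused */
        holders++;

    return holders;
}
```

In this model the runtime-init variant ends with two holders and the up-front variant with one, which is the difference between the old behaviour and the no-op e_mutex_init().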

For SIMD applications (one ELF file) it is highly recommended to compile-time initialize mutexes like below and remove any call to e_mutex_init() since it's not needed.
Code:
/* Global mutex */
e_mutex_t global_mutex = MUTEX_NULL;
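
The same idea can be tried on a host PC. The sketch below is only an analogy using C11 atomics, not e-lib: counter_lock, counter_add() and counter_get() are names made up for this example. The point is that the lock is already in its unlocked state before any thread runs, so there is no init call that could race.

```c
#include <stdatomic.h>

/* The lock is initialized at compile time, so no thread ever has to
 * (re)initialize it at run time. */
static atomic_flag counter_lock = ATOMIC_FLAG_INIT;
static long counter = 0;

/* Spin until the flag was previously clear, i.e. we now own the lock. */
static void lock_counter(void)
{
    while (atomic_flag_test_and_set_explicit(&counter_lock,
                                             memory_order_acquire))
        ; /* busy-wait */
}

static void unlock_counter(void)
{
    atomic_flag_clear_explicit(&counter_lock, memory_order_release);
}

/* Adds n to the shared counter under the lock. */
void counter_add(long n)
{
    lock_counter();
    counter += n;
    unlock_counter();
}

/* Reads the shared counter under the lock. */
long counter_get(void)
{
    long value;

    lock_counter();
    value = counter;
    unlock_counter();
    return value;
}
```

Any thread can call counter_add() as its very first action without coordinating with the others about who initializes the lock.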


For MIMD applications (more than one ELF file) no change is needed; static initialization cannot be used here. Right now the e_mutex programming model is "semi-broken" for MIMD code, but existing code will still work without modification since calls to e_load()/e_load_group() clear core SRAM.

Thank you very much! I merged your changes and it now runs perfectly on both ESDK 15.1 and 16.3. The problem was simply that I was using the mutex incorrectly.

Re: Face detection on Parallella

Post by leonfg » Wed May 11, 2016 7:42 am

I ran some performance tests on the current code and found that the speedup does not match the number of cores used. For a 1920*1080 image, the classification time on the Epiphany with 1, 2, 4, 8 and 16 cores is 12.6s, 5.5s, 2.6s, 1.3s and 0.6s respectively. That is a 20X+ speedup for 16 cores over 1 core!
I can't figure out why the speedup is larger than the number of cores. Before testing, I expected it to be slightly lower than the number of cores.
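
For reference, here is a small sketch of the arithmetic behind those numbers. The timings are the ones measured above; the function names speedup(), efficiency() and print_scaling() are just made up for this example. Efficiency is speedup divided by core count, so anything above 1.0 means better-than-linear scaling.

```c
#include <stdio.h>

/* Speedup of an n-core run relative to the 1-core run. */
double speedup(double t_one_core, double t_n_cores)
{
    return t_one_core / t_n_cores;
}

/* Parallel efficiency: speedup divided by the number of cores.
 * 1.0 is ideal linear scaling; above 1.0 is super-linear. */
double efficiency(double t_one_core, double t_n_cores, unsigned cores)
{
    return speedup(t_one_core, t_n_cores) / (double)cores;
}

/* Prints speedup and efficiency for the timings reported in this post. */
void print_scaling(void)
{
    const unsigned cores[]   = { 1, 2, 4, 8, 16 };
    const double   seconds[] = { 12.6, 5.5, 2.6, 1.3, 0.6 };
    const size_t   count     = sizeof cores / sizeof cores[0];

    for (size_t i = 0; i < count; i++) {
        printf("%2u cores: %5.1fs  speedup %5.2f  efficiency %4.2f\n",
               cores[i], seconds[i],
               speedup(seconds[0], seconds[i]),
               efficiency(seconds[0], seconds[i], cores[i]));
    }
}
```

With these timings the 16-core run comes out at 12.6 / 0.6 = 21x, an efficiency of about 1.31, and every multi-core run has an efficiency above 1.0.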
