Face detection on Parallella

Postby leonfg » Thu May 05, 2016 6:07 am

The face detection program now works well. You can get the latest version from https://github.com/leonfg/face_detect.
==========================================================================================================================
I modified the old face detection project (http://www.adapteva.com/white-papers/fa ... -processor) to make it run on Parallella and the current eSDK. The program runs correctly but still has some problems; please have a look at the code and help me fix them.
Code addr: https://github.com/leonfg/face_detect

Parallella board model: Embedded Platform
Image: ubuntu-14.04-headless-z7020-20150130
eSDK: 15.1

Remaining problems:
Working principle: the input image is resized to several different sizes, and each resized image is divided into many subimages with a fixed-size window. The cascade classifier then performs feature-matching calculations to decide whether each subimage is a human face. We use the Epiphany cores to do the feature matching in parallel to achieve a speedup. The current code has some problems:
1. Core count: when I use most of the cores (e.g. 16 or 15), the device program always hangs. I have to use fewer cores to prevent this.
2. Number of loop iterations: the device program has two main "for" loops; one DMA-copies the subimages from shared memory, and the other runs the classifier calculation. When a core has too many DMA-copy loop iterations, the program hangs too, so I have to use more cores to reduce the loop iterations per core.
The hangs happen randomly, but the workflow and logic are definitely correct. I cannot figure out the cause, so I have to use neither too few nor too many cores to balance core count against loop iterations; with 4 to 12 cores the program succeeds most of the time.
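To make the core-count versus loop-iteration trade-off concrete, here is a minimal sketch of how the sliding windows of an image pyramid might be counted and split across cores. The 24x24 window, 2-pixel stride, 1.25 scale factor and the function names are assumptions for illustration, not values taken from the face_detect code:

```c
#include <assert.h>

/* Hypothetical detector parameters; the face_detect repo may use others. */
#define WIN   24   /* square detection window, pixels */
#define STEP   2   /* sliding-window stride, pixels */

/* Number of window positions that fit in a w x h image. */
static long windows_at(int w, int h) {
    if (w < WIN || h < WIN) return 0;
    return (long)((w - WIN) / STEP + 1) * ((h - WIN) / STEP + 1);
}

/* Sum over an image pyramid where each level is 4/5 the size of the
 * previous one (scale factor 1.25), stopping once the image is smaller
 * than the window. */
static long total_windows(int w, int h) {
    long total = 0;
    while (w >= WIN && h >= WIN) {
        total += windows_at(w, h);
        w = w * 4 / 5;
        h = h * 4 / 5;
    }
    return total;
}

/* Contiguous block partition: core k of ncores handles windows
 * [*first, *first + *count). The first (total % ncores) cores get one
 * extra window each. */
static void partition(long total, int ncores, int k, long *first, long *count) {
    assert(ncores > 0 && k >= 0 && k < ncores);
    long base = total / ncores, rem = total % ncores;
    *count = base + (k < rem ? 1 : 0);
    *first = k * base + (k < rem ? k : rem);
}
```

For example, `total_windows(30, 30)` counts 16 windows at full size plus 1 at 24x24, i.e. 17. Each core's share, and therefore the length of its DMA-copy loop, is roughly `total / ncores`, so going from 4 to 16 cores cuts it to about a quarter.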
Attachments: t1.jpg (283.83 KiB), t2.jpg (698.61 KiB)
Last edited by leonfg on Mon May 09, 2016 12:58 pm, edited 2 times in total.
leonfg
 
Posts: 18
Joined: Mon Nov 24, 2014 8:31 am

Re: Face detection on Parallella

Postby aolofsson » Thu May 05, 2016 1:55 pm

Before looking at code...
-Which board model?
-What software image/sdk version?
aolofsson
 
Posts: 1005
Joined: Tue Dec 11, 2012 6:59 pm
Location: Lexington, Massachusetts,USA

Re: Face detection on Parallella

Postby leonfg » Thu May 05, 2016 7:28 pm

aolofsson wrote:Before looking at code...
-Which board model?
-What software image/sdk version?

Thanks for your concern!
Parallella board model: Embedded Platform
Image: ubuntu-14.04-headless-z7020-20150130
eSDK: 15.1

Re: Face detection on Parallella

Postby aolofsson » Thu May 05, 2016 9:42 pm

There were some reports of instability with heavily multithreaded code making heavy use of DMAs on the old elink (but we had a very hard time reproducing the bug). The redesigned elink "should" solve these kinds of issues. If the problem persists, we will take a look at the code.

Can you try running with the latest image for the 7020? (headless 15.04)

https://www.parallella.org/create-sdcard/

Re: Face detection on Parallella

Postby leonfg » Fri May 06, 2016 2:45 am

aolofsson wrote:Can you try running with the latest image for the 7020? (headless 15.04)

I built an SD card with the new image. The problem persists...
BTW: in eSDK 16.3, e-gcc throws a warning when compiling: call to ‘e_mutex_init’ declared with attribute warning: e_mutex_init() is on probation and is currently a no-op. For correctness, ensure that mutex is statically zero-initialized.

Re: Face detection on Parallella

Postby aolofsson » Fri May 06, 2016 3:24 pm

Is your mutex initialized to zero, as the message says?

Re: Face detection on Parallella

Postby leonfg » Fri May 06, 2016 3:59 pm

aolofsson wrote:Is your mutex initialized to zero per message?

I defined a mutex pointer and pointed it to 0x4000, the same as in the Epiphany mutex example.
see https://github.com/leonfg/face_detect/b ... detector.c

Re: Face detection on Parallella

Postby aolofsson » Fri May 06, 2016 6:18 pm

I would really recommend updating the code to something more modern like COPRTHR 2.0 (or MPI, or openMP, or anything else).
Writing your own mutexes is just a bad idea (we did it b/c we didn't have anything better back then)

viewtopic.php?f=13&t=3661

Instinct tells me it's going to be faster to refactor the code and lift the math routines instead of chasing synchronization ghosts. The code was done as a demo 4 years ago for a different platform and has not been brought up to date for Parallella. A LOT has changed since then.

Some suspect things in the old code:
-I believe the "common go" approach used here is insufficient (you may need to zero out the mutex/wait variables explicitly from the host?)
-initialization of variables like the mutex (Ola might have an update on the initialization of variables/mutexes)
-the wait loop around the DMA transfer (should not be needed with the new code)
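A host-side sketch of the fix suggested in the first item, assuming "common go" means a single shared start flag that every core polls before doing its work (an interpretation, not confirmed in this thread). POSIX threads and C11 atomics stand in for Epiphany cores and shared memory, and the `control` struct, `run_once()` and the flag names are hypothetical. The point is that the host clears every shared flag itself before starting any worker, instead of relying on whatever a previous run left behind:

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdint.h>

#define MAX_WORKERS 16

/* Hypothetical control block standing in for host/device shared memory. */
struct control {
    atomic_int go;                 /* 0 = wait, 1 = start (the "common go" flag) */
    atomic_int done[MAX_WORKERS];  /* per-worker completion flags */
};

static struct control ctl;

static void *worker(void *arg) {
    int id = (int)(intptr_t)arg;
    /* Spin until the host raises the common go flag. */
    while (atomic_load_explicit(&ctl.go, memory_order_acquire) == 0)
        ;
    /* ... per-core work (DMA copies, classifier) would go here ... */
    atomic_store_explicit(&ctl.done[id], 1, memory_order_release);
    return NULL;
}

/* One run with n workers; returns how many workers reported completion. */
static int run_once(int n) {
    pthread_t tid[MAX_WORKERS];
    assert(n > 0 && n <= MAX_WORKERS);

    /* Host explicitly clears every shared flag BEFORE any worker starts,
     * so a value left over from an earlier run cannot release them early. */
    atomic_store(&ctl.go, 0);
    for (int i = 0; i < MAX_WORKERS; i++)
        atomic_store(&ctl.done[i], 0);

    for (int i = 0; i < n; i++) {
        int rc = pthread_create(&tid[i], NULL, worker, (void *)(intptr_t)i);
        assert(rc == 0);
        (void)rc;
    }

    /* Release all workers at once. */
    atomic_store_explicit(&ctl.go, 1, memory_order_release);

    int finished = 0;
    for (int i = 0; i < n; i++) {
        pthread_join(tid[i], NULL);
        finished += atomic_load(&ctl.done[i]);
    }
    return finished;
}
```

Calling `run_once()` repeatedly is safe because each run resets `go` and the `done` flags before any worker exists; without that reset, a second run would start with `go` already set to 1.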

Sorry that I can't help more...

Re: Face detection on Parallella

Postby leonfg » Fri May 06, 2016 6:50 pm

aolofsson wrote:I would really recommend updating the code to something more modern like COPRTHR 2.0 (or MPI, or openMP, or anything else).

Thanks anyway!
I'm just curious about why running with fewer cores works but running with more cores fails. I also thought programming with the eSDK would give the best performance. I did other work with COPRTHR 1.6.x before and the performance was not good; do you mean 2.0 will be better? Coding with COPRTHR is more comfortable than with the eSDK, so I will study it.

Re: Face detection on Parallella

Postby olajep » Mon May 09, 2016 10:27 am

Hi,

I created a pull request here:
https://github.com/leonfg/face_detect/pull/1

Tested 100x iterations w/ 16 cores. No issues AFAICT. Without looking at the rest of the code my best guess is that some other part of the program overwrites 0x4000 at runtime.


e_mutex_init() was changed to be a no-op in ESDK 2016.3.
https://github.com/adapteva/epiphany-li ... init.c#L30
... but programs that use hard-coded pointer addresses will still work without modification since the loader now clears core SRAM *before* starting the program.
The change was made because runtime initialization of a mutex while multiple threads are running creates a race condition. The only reason this worked before was luck, plus the fact that the mutex was usually placed at core (0,0), which is the first core the loader starts, so the call to e_mutex_init() (usually at the beginning of main()) would have completed before the rest of the cores were started.

For SIMD applications (one ELF file), it is highly recommended to initialize mutexes at compile time as below and to remove any call to e_mutex_init(), since it is not needed.
#include "e_lib.h"

/* Global mutex: zero-initialized at compile time, so no e_mutex_init() call is needed */
e_mutex_t global_mutex = MUTEX_NULL;
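The reasoning above (a lock word that is already valid once memory is zeroed, so no thread ever has to initialize it while others may be running) can be illustrated on a host CPU. This is an analogy only: it uses C11 atomics and POSIX threads rather than the Epiphany e-lib, and `spin_lock()`, `run_workers()` and the other names are hypothetical:

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

#define MAX_THREADS 16

/* 0 = unlocked. Static storage is zero-initialized before any code runs,
 * so the lock is valid without a runtime init call that could race. */
static atomic_int lock_word = 0;
static long shared_counter = 0;   /* protected by lock_word */

static void spin_lock(atomic_int *l) {
    /* atomic_exchange returns the old value; 0 means we took the lock. */
    while (atomic_exchange_explicit(l, 1, memory_order_acquire) != 0)
        ;  /* busy-wait, test-and-set style */
}

static void spin_unlock(atomic_int *l) {
    atomic_store_explicit(l, 0, memory_order_release);
}

static void *worker(void *arg) {
    int iters = *(const int *)arg;
    for (int i = 0; i < iters; i++) {
        spin_lock(&lock_word);
        shared_counter++;
        spin_unlock(&lock_word);
    }
    return NULL;
}

/* Runs nthreads workers that each add iters to the counter; returns the total. */
static long run_workers(int nthreads, int iters) {
    pthread_t tid[MAX_THREADS];
    assert(nthreads > 0 && nthreads <= MAX_THREADS);
    shared_counter = 0;  /* safe: no workers are running yet */
    for (int t = 0; t < nthreads; t++) {
        int rc = pthread_create(&tid[t], NULL, worker, &iters);
        assert(rc == 0);
        (void)rc;
    }
    for (int t = 0; t < nthreads; t++)
        pthread_join(tid[t], NULL);
    return shared_counter;
}
```

`run_workers(8, 10000)` returns 80000: no increment is lost, and the lock never needed an init call because zero already means "unlocked".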


For MIMD applications (more than one ELF file), no change should be made, because static initialization cannot be used there. Right now the e_mutex programming model is "semi-broken" for MIMD code, but existing code will still work without modification, since calls to e_load()/e_load_group() clear core SRAM.
olajep
 
Posts: 139
Joined: Mon Dec 17, 2012 3:24 am
Location: Sweden
