Could not get enough CPU speed

Postby naito » Mon Jun 30, 2014 2:24 am

Hi,

I tried to calculate pi by the Monte Carlo method using OpenMP.
However, I could not get enough speed. (An Intel i3 CPU is faster than the Epiphany.)
In particular, the for loop section is slow at the moment.

I set the environment variable below.

setenv OMP_NUM_THREADS 16

And the for loop section of the C program is below.

srandom( time( NULL ) );
#pragma omp parallel for
for ( l = 0; l < times; l++ ) {
    double x = ( double )random() / RAND_MAX;
    double y = ( double )random() / RAND_MAX;

    if (x * x + y * y < 1.0) {
        counter++;
    }
}

Does anybody know what the cause might be?

Sincerely,
naito
 
Posts: 2
Joined: Thu Jun 26, 2014 9:14 am

Re: Could not get enough CPU speed

Postby xilman » Mon Jun 30, 2014 8:44 am

Try replacing "double" with "float" and re-run the comparison ...
xilman
 
Posts: 80
Joined: Sat May 10, 2014 8:10 pm
Location: UK

Re: Could not get enough CPU speed

Postby Bikeman » Mon Jun 30, 2014 9:50 am

Does that code even work correctly? I'm not an expert on OpenMP but I thought you would have to tell OpenMP how to handle the counter increment, either as an atomic operation or as a sum reduction. Or is there a guarantee that the counter++ is indeed an atomic operation even across several cores?
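
A minimal sketch of the sum-reduction variant mentioned above, assuming a C compiler with OpenMP support. The helper name estimate_pi, its seed parameter, and the xorshift32 generator are illustrative assumptions and not part of the original program; the generator is swapped in for random() so that each thread keeps its own state instead of sharing one.

```c
#include <assert.h>
#include <stdint.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Stand-in generator (xorshift32), used here instead of random() so that
   every thread owns its generator state. State must be nonzero. */
static uint32_t xorshift32(uint32_t *state)
{
    uint32_t x = *state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    *state = x;
    return x;
}

/* Hypothetical helper: estimates pi from `times` random points. */
double estimate_pi(long times, uint32_t seed)
{
    long counter = 0;

    assert(times > 0);

    /* reduction(+:counter) gives each thread a private counter starting
       at 0 and adds them together when the parallel region ends. */
    #pragma omp parallel reduction(+:counter)
    {
        int tid = 0;
#ifdef _OPENMP
        tid = omp_get_thread_num();
#endif
        /* Different nonzero seed for each thread. */
        uint32_t state = (seed + 0x9E3779B9u * (uint32_t)(tid + 1)) | 1u;
        long l;

        #pragma omp for
        for (l = 0; l < times; l++) {
            double x = xorshift32(&state) / 4294967296.0;
            double y = xorshift32(&state) / 4294967296.0;

            if (x * x + y * y < 1.0) {
                counter++;
            }
        }
    }
    return 4.0 * (double)counter / (double)times;
}
```

With GCC this would be built with -fopenmp; without that flag the pragmas are ignored and the same code runs on a single thread. An atomic increment would also be correct, but it serializes every update of the shared counter, so the reduction is usually the cheaper choice.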

EDIT: Which OpenMP implementation are you using? As far as I know, the preinstalled OpenMP support is JUST for the dual core ARM, NOT using the Epiphany at all. There is some experimental stuff for the Epiphany in the works tho, see the other thread in this sub-forum.

If you are just using the dual core ARM CPU, the performance will be pathetic compared to the Intel CPU of course.


Cheers
HB
Bikeman
 
Posts: 52
Joined: Wed Sep 11, 2013 8:55 pm

Re: Could not get enough CPU speed

Postby over9000 » Mon Jun 30, 2014 2:09 pm

Definitely looks like the OP is omitting the parts that do the MPI initialisation and sending. Also that the code is almost definitely running on ARM, not Epiphany. As for 'counter', with MPI this will be per-thread. You can do an MPI reduce operation to sum up all the values later, which is probably another bit of code that's omitted here.

I know it's only a simple test, but if you want accuracy and speed from this, consider:

* There's no need for floating point. You can scale everything according to your max int value. You could scale the circle to sqrt(max_int), but then you only get half the number of significant bits, or scale to max_int and do a "long" multiply on the high/low max_int/2 bit values to avoid r squared and other products overflowing. Even if you do long multiplication, you'll probably get around an order of magnitude faster results on the Epiphany.
* The standard random number generator probably isn't good for statistical uses, so you'd probably need a higher quality one. Use the same idea as above and avoid unnecessary floating point operations (save for the final one that calculates the reduced ratio). If RAND_MAX is less than max_int, consider scaling to RAND_MAX instead.
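
The first option in the list above (scaling the circle to roughly sqrt(max_int)) can be sketched as follows. This is only an illustration under stated assumptions: 32-bit unsigned integers, a radius of 2^15 so that x*x + y*y cannot overflow, and xorshift32 as a stand-in generator rather than a vetted high-quality one. The names count_hits and pi_from_hits are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in generator (xorshift32); state must be nonzero. */
static uint32_t xorshift32(uint32_t *state)
{
    uint32_t x = *state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    *state = x;
    return x;
}

/* Hypothetical helper: counts points inside the quarter circle of
   radius 2^15 using integer arithmetic only. */
uint32_t count_hits(uint32_t samples, uint32_t seed)
{
    const uint32_t r2 = 1u << 30;   /* (2^15)^2 */
    uint32_t state = seed | 1u;     /* force a nonzero state */
    uint32_t hits = 0;
    uint32_t i;

    for (i = 0; i < samples; i++) {
        /* Top 15 bits of each draw: coordinates in 0..32767, so the
           sum of squares stays below 2^32. */
        uint32_t x = xorshift32(&state) >> 17;
        uint32_t y = xorshift32(&state) >> 17;

        if (x * x + y * y < r2) {
            hits++;
        }
    }
    return hits;
}

/* The single floating point step: turn the hit ratio into an estimate. */
double pi_from_hits(uint32_t hits, uint32_t samples)
{
    assert(samples > 0);
    return 4.0 * (double)hits / (double)samples;
}
```

As the list notes, this keeps only 15 bits per coordinate; the max_int scaling with a long multiply would keep more precision at the cost of wider arithmetic.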

I haven't had time to play with my Parallella much yet, and this is one of the very simple test projects I had in mind. It should be interesting to pit the Epiphany against other platforms like the Raspberry Pi's GPU, ARM NEON, x86 SSE and so on. I know doing this via MPI isn't immediately useful, but it becomes more so if you can do arbitrary-precision stuff. Also, without cluster cables, MPI is probably the best choice for having multiple Parallella boards communicate with each other...
over9000
 
Posts: 98
Joined: Tue Aug 06, 2013 1:49 am

Re: Could not get enough CPU speed

Postby Bikeman » Mon Jun 30, 2014 3:22 pm

over9000 wrote:Definitely looks like the OP is omitting the parts that do the MPI


MPI?? This is about OpenMP, MPI is a different sub-forum :-)

Cheers
HB
Bikeman
 
Posts: 52
Joined: Wed Sep 11, 2013 8:55 pm

Re: Could not get enough CPU speed

Postby over9000 » Mon Jun 30, 2014 4:47 pm

Bikeman wrote:
over9000 wrote:Definitely looks like the OP is omitting the parts that do the MPI


MPI?? This is about OpenMP, MPI is a different sub-forum :-)

Cheers
HB


Ah, reading comprehension fail on my part. I just look at recent threads rather than trawl all the forums, so I didn't know where this was posted. Still, the points about using ints instead of floats and needing a good quality RNG should still be valid, I think. The counter should also be local to the thread.
over9000
 
Posts: 98
Joined: Tue Aug 06, 2013 1:49 am

Re: Could not get enough CPU speed

Postby naito » Wed Jul 02, 2014 4:41 am

Hi,

Thank you for the advice, xilman. I replaced "double" with "float" and re-ran the program.
The run took 1791 seconds, which is faster than before.
However, it's still slower than the Intel i3 CPU. :cry:

Sincerely,
naito
 
Posts: 2
Joined: Thu Jun 26, 2014 9:14 am

Re: Could not get enough CPU speed

Postby shodruk » Wed Jul 02, 2014 9:28 am

naito wrote: However, it's still slower than the Intel i3 CPU. :cry:


That's normal. A 667 MHz ARM Cortex-A9 is slower than a Core i3.
Shodruky
shodruk
 
Posts: 464
Joined: Mon Apr 08, 2013 7:03 pm

