BOINC projects with open source code?

Re: BOINC projects with open source code?

Postby keithsloan52 » Sat May 24, 2014 8:16 am

Bikeman wrote:The bet is that with a Raspi you can outperform the Parallella on specific FFTs : N=3*2^22, real-to-complex, single precision. In-place or out-of-place, whatever is fastest on each platform.


Okay and you are doing the Pi development. Who is doing the work on the Parallella? Andreas?
Or are you doing both with a vested interest that the Pi will win?
keithsloan52
 
Posts: 17
Joined: Fri Mar 07, 2014 9:22 am

Re: BOINC projects with open source code?

Postby shodruk » Sat May 24, 2014 10:27 am

HBE,
Why do you think the Pi beats the Parallella (technically speaking)?
Shodruky
shodruk
 
Posts: 464
Joined: Mon Apr 08, 2013 7:03 pm

Re: BOINC projects with open source code?

Postby Bikeman » Sat May 24, 2014 12:10 pm

keithsloan52 wrote:Okay and you are doing the Pi development. Who is doing the work on the Parallella? Andreas?
Or are you doing both with a vested interest that the Pi will win?


For the Pi, the foundation is already there because there is now an FFT implementation on the Raspi's GPU. There's "just" some more work to be done to combine smaller FFTs into longer ones, because the GPU implementation can't handle the target FFT size of this challenge. I plan to do this, yes.
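
To illustrate what "combining smaller FFTs into longer ones" can look like, here is a minimal sketch of the generic Cooley-Tukey decomposition for a length N = n1*n2 transform. This is not the Raspi GPU code and not necessarily the scheme that will be used for the challenge. The function names `dft` and `fft_from_smaller` are made up for this example, and the naive `dft` merely stands in for whatever fast small-size FFT a platform provides.

```python
import cmath


def dft(x):
    """Naive O(n^2) DFT, used here only as a stand-in for a fast small FFT."""
    n = len(x)
    return [
        sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
        for k in range(n)
    ]


def fft_from_smaller(x, n1, n2, small_fft=dft):
    """Length n1*n2 DFT built from length-n1 and length-n2 transforms.

    Hypothetical helper for illustration: input index n = n2*a + c,
    output index k = k1 + n1*k2.
    """
    n = n1 * n2
    if len(x) != n:
        raise ValueError("len(x) must equal n1 * n2")

    # Step 1: n2 transforms of length n1 over the strided sub-sequences.
    cols = [small_fft(x[c::n2]) for c in range(n2)]  # cols[c][k1]

    # Step 2: multiply by the twiddle factors exp(-2*pi*i*c*k1/n).
    for c in range(n2):
        for k1 in range(n1):
            cols[c][k1] *= cmath.exp(-2j * cmath.pi * c * k1 / n)

    # Step 3: n1 transforms of length n2 across the sub-sequences.
    out = [0j] * n
    for k1 in range(n1):
        row = small_fft([cols[c][k1] for c in range(n2)])
        for k2 in range(n2):
            out[k1 + n1 * k2] = row[k2]
    return out


if __name__ == "__main__":
    signal = [complex(i % 5, 0.25 * i) for i in range(12)]
    direct = dft(signal)
    combined = fft_from_smaller(signal, 3, 4)
    print(max(abs(a - b) for a, b in zip(direct, combined)))
```

The same idea applies recursively, so a transform of length 3*2^22 could in principle be assembled from sizes the hardware handles natively, at the cost of the extra twiddle multiplications and strided memory passes.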

Andreas will see that work is done on the Parallella side so he gets beer in time for his birthday :-), but this is free for all, and if someone else comes up with a faster FFT, I'm quite sure Andreas would also be happy about this. FFT is a nice showcase algorithm, and the higher the performance, the better.

keithsloan52 wrote:with a vested interest that the Pi will win?

Not at all. I paid $99 for the Parallella like all other backers, and I want to see the maximum performance squeezed from it.

shodruk wrote:HBE,
Why do you think the Pi beats the Parallella (technically speaking)?


I am not at all sure the Pi beats the Epiphany on shorter transforms, but for N = 3*2^22 (about 12.6 million points, i.e. roughly 48 MB each for the input and the output in single precision), I think it's an interesting race. The combined per-core memory of the Epiphany16 can only hold a tiny fraction of this, so I would guess that a lot of data has to be shuffled between the Epiphany16 and the host RAM, which will degrade performance compared to the theoretical peak performance of the E16.
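
For anyone who wants to check the sizes, here is a small back-of-envelope calculation. The transform length and single precision come from the challenge. The figure of 32 KiB of local memory per Epiphany core is an assumption added for this example, not something stated in this thread.

```python
# Back-of-envelope sizes for the challenge transform (single precision).
N = 3 * 2**22              # transform length from the challenge
BYTES_PER_FLOAT = 4        # IEEE 754 single precision

# Real-to-complex: N real inputs, N/2 + 1 complex outputs.
input_bytes = N * BYTES_PER_FLOAT
output_bytes = (N // 2 + 1) * 2 * BYTES_PER_FLOAT

# ASSUMPTION (not from the thread): 16 cores with 32 KiB local memory each.
E16_CORES = 16
E16_LOCAL_BYTES_PER_CORE = 32 * 1024
e16_local_bytes = E16_CORES * E16_LOCAL_BYTES_PER_CORE

# Share of the input that fits into the combined on-chip memory.
fraction_on_chip = e16_local_bytes / input_bytes

print(f"N                = {N}")
print(f"input            = {input_bytes / 2**20:.2f} MiB")
print(f"output           = {output_bytes / 2**20:.2f} MiB")
print(f"E16 local memory = {e16_local_bytes / 2**10:.0f} KiB")
print(f"fraction on chip = {fraction_on_chip:.4f}")
```

Under that assumption the combined local memory covers about 1/96 of the input alone, which is why the data has to be streamed through the chip in many passes.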

The Raspi's GPU (VideoCore IV) is a quite respectable, but more specialized, piece of hardware! Its architecture doesn't scale nearly as well as Epiphany (e.g. you can't cluster them together) and it has a lower clock rate (only 250 MHz), but in raw computing power it should not be far from the Epiphany16 (it has vector units to compensate for the lower clock rate). I've seen 24 GFLOPS mentioned as the theoretical peak performance of the VideoCore IV (almost the same as for the E16 @ 666 MHz), but you will not see this in real life on either platform, I guess. The VideoCore IV is definitely capable of handling video at Full HD, so it can't be that bad :-). We will see.
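
As a rough sanity check of those peak numbers, the sketch below multiplies out units, lanes, flops per cycle and clock. Only the clock rates and the 24 GFLOPS figure come from this post. The unit counts, lane counts and flops per cycle are assumptions added for the example, and `peak_gflops` is a made-up helper.

```python
def peak_gflops(units, lanes_per_unit, flops_per_lane_per_cycle, clock_mhz):
    """Theoretical peak in GFLOPS: every lane busy on every cycle."""
    return units * lanes_per_unit * flops_per_lane_per_cycle * clock_mhz / 1000.0


# ASSUMPTION: 12 vector units, 4 physical lanes each, one add plus one
# multiply per lane per cycle, at the 250 MHz mentioned in the post.
videocore_iv = peak_gflops(units=12, lanes_per_unit=4,
                           flops_per_lane_per_cycle=2, clock_mhz=250)

# ASSUMPTION: 16 scalar cores, one fused multiply-add (2 flops) per cycle,
# at the 666 MHz mentioned in the post.
epiphany16 = peak_gflops(units=16, lanes_per_unit=1,
                         flops_per_lane_per_cycle=2, clock_mhz=666)

print(f"VideoCore IV peak: {videocore_iv:.1f} GFLOPS")
print(f"Epiphany16 peak:   {epiphany16:.1f} GFLOPS")
```

These are upper bounds that ignore memory traffic entirely, which is exactly the part that is expected to dominate for a transform this long.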

HB
Bikeman
 
Posts: 52
Joined: Wed Sep 11, 2013 8:55 pm

Re: BOINC projects with open source code?

Postby theover » Sat May 24, 2014 1:56 pm

The race sounds like an interesting one, and I would certainly like to be able to plug an FFTW function that makes use of the Parallella into the Linux LADSPA/JACK "JAMin" mastering effect!

Maybe it should be considered that for an FFT as long as 2^22, intermediate result accuracy may well become a problem in single precision. Also, a thought might be that the NEON in the ARM hopefully is able to accelerate FFTs quite well.

T.
theover
 
Posts: 174
Joined: Mon Dec 17, 2012 4:50 pm

Re: BOINC projects with open source code?

Postby Bikeman » Sat May 24, 2014 2:26 pm

theover wrote:Maybe it should be considered that for an FFT as long as 2^22, intermediate result accuracy may well become a problem in single precision.

Yes, but on both platforms, RPi and Parallella.

The choice of the transform length comes from its use in the Einstein@Home project, where it is used to search for pulsars (spinning neutron stars that emit radio beams that appear as pulses to a radio telescope). The input data from the radio telescope actually has just a few bits of dynamic range IIRC and the E@H project found that single precision is quite ok in this context.

theover wrote: Also a thought might be that the NEON in the ARM hopefully is able to accelerate FFTs quite well.

T.


Yup, a bit. The app that you will get when using the Parallella on Einstein@Home uses FFTW 3.3.2 at the time of writing, which is already NEON capable. I'm currently testing FFTW 3.3.4 to see whether it's faster. The Raspi will, of course, get a different version since it doesn't support NEON. So the comparison between Raspi and Parallella runtimes of E@H using just the ARM CPUs, which I posted earlier (a factor of roughly 1.3 per single core), already takes this into account and shows the advantage the Parallella has over the Raspi on the ARM CPUs.

HBE
Bikeman
 
Posts: 52
Joined: Wed Sep 11, 2013 8:55 pm

Re: BOINC projects with open source code?

Postby theover » Sat May 24, 2014 5:28 pm

Hi, well I played a bit with @home things a while ago when it was new, and didn't read through the whole thread, sorry to say. If I'm not mistaken, the observed improvement from a Pi core to a Zynq core is only 1:1.3?! I hope that is due to poorly compiled and/or optimized software, and without NEON, because NEON should make quite a difference, I would think?

Well, there are (for those who know a bit about the Xilinx or other brand programmable logic) IP "blocks" for the FPGA too, which compute FFTs, usually not too many bits, but the FPGA structure and the DSP mul/add power should make a difference too.

It may just boil down to which savvy programmer and architecture buff sees the time and effort as worthwhile to squeeze some real juice out of modern processor architectures!

T.
theover
 
Posts: 174
Joined: Mon Dec 17, 2012 4:50 pm

Re: BOINC projects with open source code?

Postby Bikeman » Sat May 24, 2014 6:31 pm

theover wrote:Hi, well I played a bit with @home things a while ago when it was new, and didn't read through the whole thread, sorry to say.


@home: It's fun and on ARM, it won't be as noticeable on the electrical power bill as it is on desktop GPUs :-)


theover wrote: If I'm not mistaken it is observed that the Pi core to Zynq core improvement is only 1:1.3 ?!


Yes, but that is single core performance, with the Raspi overclocked (which can be done via configuration without the need for a fan or heatsink, and everybody does it, so it's only fair ...). Since the Raspi is single core and the Zynq dual core, the actual speedup is more like 2.6x.

theover wrote: That I hope is due to poorly compiled and/or optimized software, and without NEON, because NEON should make quite a difference I would think ?


No, as I wrote, the Parallella measurement was using NEON, the Raspi (of course) didn't. I think this is pretty much in line with benchmarks done earlier by Adapteva themselves, e.g. see http://www.adapteva.com/white-papers/be ... arallella/ . Single-threaded applications should run quite a bit less than 2x faster, and multithreaded ones up to 3 times faster. Don't forget that the ARM Cortex-A9 is not the latest and fastest ARM core around.

HB
Bikeman
 
Posts: 52
Joined: Wed Sep 11, 2013 8:55 pm

Re: BOINC projects with open source code?

Postby theover » Sat May 24, 2014 7:19 pm

Well, it surprises me that the NEON speedup is so low for FFTW. Considering it has 16 units, I'd expect good bandwidth to the data registers, and I'd hope for more speedup, but maybe that isn't realistic. I couldn't quickly find a FLOPS rating for the NEON and non-NEON ARM cores, just that it appears the Zynq ARM does have double precision NEON.

T.
theover
 
Posts: 174
Joined: Mon Dec 17, 2012 4:50 pm

Re: BOINC projects with open source code?

Postby Bikeman » Sat May 24, 2014 8:09 pm

theover wrote:Well, it surprises me that the NEON speedup is so low for FFTW. Considering it has 16 units, I'd expect good bandwidth to the data registers, and I'd hope for more speedup, but maybe that isn't realistic. I couldn't quickly find a FLOPS rating for the NEON and non-NEON ARM cores, just that it appears the Zynq ARM does have double precision NEON.

T.


E@H only uses single precision NEON.

The NEON instruction set can indeed encode 16 operations in one instruction, BUT those 16 will be operations on 8-bit data! NEON in the Cortex-A9 is 128 bits wide, which means only 4 single precision (32-bit) floating point ops per instruction (like SSE in Intel CPUs).

And being able to encode 4 ops in one instruction doesn't mean that the instruction executes with a throughput of one per clock. Internally, the CPU can break the 128-bit vector instruction into two 64-bit vector instructions that are executed in sequence. This would lead to a speed-up of just 2, and I think that's what's happening on the A9.
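
The lane arithmetic can be written down in a few lines. This is only a toy model of the reasoning above, assuming one pass through the datapath per clock. The helper names `lanes` and `effective_speedup` are made up for this example, and the 64-bit datapath is the guess from this post rather than a measured fact.

```python
def lanes(vector_bits, element_bits):
    """Number of elements that fit into one vector register."""
    return vector_bits // element_bits


def effective_speedup(vector_bits, element_bits, datapath_bits):
    """Elements processed per clock if each pass through the datapath takes
    one clock and a wide instruction is split into datapath-sized pieces."""
    passes = -(-vector_bits // datapath_bits)  # ceiling division
    return lanes(vector_bits, element_bits) / passes


NEON_BITS = 128

print(lanes(NEON_BITS, 8))                   # 8-bit integer lanes
print(lanes(NEON_BITS, 32))                  # single precision float lanes
print(effective_speedup(NEON_BITS, 32, 64))  # split into two 64-bit halves
```

So the "16" only applies to 8-bit data, single precision gets 4 lanes, and with a 64-bit datapath the best case over scalar code drops to a factor of 2.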

Cheers
HB
Bikeman
 
Posts: 52
Joined: Wed Sep 11, 2013 8:55 pm

Re: BOINC projects with open source code?

Postby theover » Sat May 24, 2014 8:28 pm

It appears that our good open-source friends at FFmpeg achieve a more realistic speedup on NEON. I didn't dig through the whole thing deeply, but maybe this is interesting:

http://gsoc2010-fftw-neon.blogspot.nl/

T.
theover
 
Posts: 174
Joined: Mon Dec 17, 2012 4:50 pm
