SU2

Post by wgvanveen » Mon Nov 24, 2014 4:45 pm

Hey everybody,

I just discovered the Parallella and I am thinking of buying one. I am a student in the field of aerodynamics. Most of the time I work with SU2, and I was wondering whether it is possible to compile SU2 for the Parallella. I read on the SU2 page that it runs on the Intel Xeon Phi coprocessor, and I thought that the step to the Parallella might not be so big. My main concern is memory usage: typical simulations running on 4-8 cores take about 3 GB of RAM per core (8 million cells), which is not available on the Parallella. Is there a workaround, or is this board simply not meant for such computations?
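
(For scale, a back-of-the-envelope comparison. The 3 GB per core and 4-8 core figures are taken from the question above; the Parallella numbers are the standard board's published specs, i.e. 1 GB of shared DDR3 and 32 KB of local SRAM per Epiphany core.)

```c
#include <stdio.h>

int main(void)
{
    /* Figures from the question above */
    const double gb_per_core = 3.0;  /* RAM per core in a typical SU2 run */
    const int    cores       = 8;    /* upper end of the 4-8 core range   */

    /* Standard Parallella board (Epiphany-16) */
    const double board_dram_gb = 1.0;   /* shared DDR3 on the board       */
    const double core_sram_kb  = 32.0;  /* local SRAM per Epiphany core   */

    printf("SU2 case (8M cells): ~%.0f GB total\n", gb_per_core * cores);
    printf("Parallella: %.0f GB DRAM, %.0f KB local SRAM per core\n",
           board_dram_gb, core_sram_kb);
    return 0;
}
```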

Thank you for your help!

Wouter

Re: SU2

Post by aolofsson » Mon Nov 24, 2014 6:09 pm

Thanks for the request. It would be interesting, but my gut feeling is that it would be a tough porting process...
-SU2 is written in C++ (the code is likely too big for the Epiphany cores)
-SU2 needs double-precision floating point
Andreas

Re: SU2

Post by wgvanveen » Mon Nov 24, 2014 6:23 pm

Dear Andreas,

Thank you for your fast reply! It is a pity to hear that SU2 is probably not going to run on the Parallella. Is it possible to run CFD software on the Parallella at all, either existing packages or home-made code? I am not very familiar with parallel computing other than the standard MPI stuff. I can imagine that it would be valuable to have a high-powered card running CFD software.

Thank you for your reply!

Wouter

Re: SU2

Post by sebraa » Tue Nov 25, 2014 4:53 pm

I did my master's thesis on CFD on the Epiphany (a Lattice Boltzmann algorithm), but it is not published yet; it should be out in a few weeks at most. I did not even try double precision or lattices larger than local memory (i.e. a maximum lattice size of 104x104 on the E16). The scalability was very good and the computing speed was okay, but the combination of "too small local memory" and "too little shared memory bandwidth" killed it.
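
(For a rough idea of why the lattice tops out around 104x104, here is a sketch of the per-core memory budget. It assumes a D2Q9 model in single precision with one set of distribution functions per cell on a 4x4 core grid; the actual storage scheme in the thesis may differ.)

```c
#include <stdio.h>

int main(void)
{
    /* Assumed: D2Q9, single precision, one distribution array per cell */
    const int  lattice  = 104;              /* maximum lattice size on the E16 */
    const int  cores    = 4;                /* cores per row/column (4x4 grid) */
    const int  per_core = lattice / cores;  /* 26x26 cells per core            */
    const int  q        = 9;                /* D2Q9 velocities                 */
    const long bytes    = (long)per_core * per_core * q * sizeof(float);

    printf("%dx%d cells per core -> %ld B (~%.1f KB) of the 32 KB local SRAM\n",
           per_core, per_core, bytes, bytes / 1024.0);
    return 0;
}
```

Under these assumptions each core holds just under 24 KB of lattice data, which matches the per-core figure mentioned further down in this thread.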

Re: SU2

Post by aolofsson » Tue Nov 25, 2014 5:52 pm

Nice! I look forward to seeing the results! Can you elaborate on the bottleneck with respect to memory size and memory bandwidth?
-what if you had 128KB of memory per core?
-what if you had thousands of cores on a card?
-what kind of interprocessor communication were you running?
Andreas

Re: SU2

Post by wgvanveen » Tue Nov 25, 2014 7:16 pm

I am looking forward to reading your thesis! Is your code available after publication, or is it private? Do you know if "traditional" finite volume methods have also been tried on the Parallella?

For the memory part we use a rule of thumb of around 4 GB of memory per core. I don't know how to scale this properly to the Parallella.

Re: SU2

Post by sebraa » Wed Nov 26, 2014 11:13 am

aolofsson wrote:Nice! I look forward to seeing the results! Can you elaborate on the bottleneck with respect to memory size and memory bandwidth?
It's a master's thesis at HH, so it will appear on DiVA when it's ready. :-)

I achieved 85 MB/s writing to shared memory, but I did not investigate or optimize this further. Even at the theoretical maximum of 600 MB/s, that bandwidth is already needed for copying results back to the host (every few iterations), which limits the iteration speed. Using shared memory as "swap space" to simulate larger lattices is not feasible, since I need to touch the whole lattice once per iteration. This becomes even more of a problem as the number of cores increases.
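
(To put those numbers in perspective, a purely illustrative transfer-time estimate. Only the 85 MB/s and 600 MB/s figures come from the post; the readback size is an assumption: three macroscopic fields stored as floats for a 104x104 lattice.)

```c
#include <stdio.h>

int main(void)
{
    /* Assumed readback: density + 2 velocity components per cell, as floats */
    const long cells = 104L * 104L;
    const long bytes = cells * 3 * sizeof(float);   /* ~127 KB per readback */

    printf("readback: %ld B\n", bytes);
    printf("at  85 MB/s: %.2f ms\n", bytes / 85e6  * 1e3);
    printf("at 600 MB/s: %.2f ms\n", bytes / 600e6 * 1e3);
    return 0;
}
```

Whether the transfer dominates depends on the per-iteration compute time, which the thesis presumably quantifies.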

aolofsson wrote:-what if you had 128KB of memory per core?
I could increase the kernel size from the current 8 KB to 16-32 KB, enabling more low-level optimizations (loop unrolling etc.). The 8 KB I chose is already too small to fit the 3D kernel comfortably, and even then I only have storage for a 7x6x7 block per core. Also, larger blocks are handled more efficiently (although each one takes longer to process).
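
(A similar sketch for the 3D case, assuming D3Q19 in single precision and one distribution array per cell; again, the actual layout may differ.)

```c
#include <stdio.h>

int main(void)
{
    /* Assumed: D3Q19, single precision, one distribution array per cell */
    const long block_bytes  = 7L * 6 * 7 * 19 * sizeof(float);  /* 7x6x7 block  */
    const long kernel_bytes = 8L * 1024;                        /* ~8 KB kernel */

    printf("block %ld B + kernel %ld B = %.1f KB of the 32 KB local SRAM\n",
           block_bytes, kernel_bytes, (block_bytes + kernel_bytes) / 1024.0);
    return 0;
}
```

Under these assumptions roughly 30 KB of the 32 KB local SRAM is already spoken for, consistent with the "too small to fit the 3D kernel comfortably" remark.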

aolofsson wrote:-what kind of interprocessor communication were you running?
Inter-core communication is nearest-neighbor only, using both remote writes and remote reads (to conserve data memory). Shared-memory access uses memcpy(), with the cores copying in parallel.
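
(To make the nearest-neighbour write pattern concrete, here is a minimal device-side sketch, not the thesis code: each core pushes its top interior row into the halo of the core above it by writing straight into that core's local SRAM, which on Epiphany is visible at a global address. The block size, D2Q9 layout, and halo scheme are assumptions, and the e-lib names (e_lib.h, e_group_config, e_get_global_address) are quoted from memory, so check them against the SDK headers.)

```c
#include <string.h>
#include "e_lib.h"            /* Epiphany device-side library (name assumed) */

#define NX 26                 /* interior cells per side per core (assumed)  */
#define Q   9                 /* D2Q9 velocities (assumed)                   */

/* Local block with a one-cell halo ring around the interior. */
float lattice[NX + 2][NX + 2][Q];

/* Push this core's top interior row into the bottom halo row of the
   core directly above it (a nearest-neighbour remote write). */
void push_top_halo(void)
{
    unsigned row = e_group_config.core_row;
    unsigned col = e_group_config.core_col;

    if (row == 0)
        return;               /* no neighbour above this core */

    /* Translate the neighbour-local halo address into a global address,
       then write to it directly with a plain memcpy(). */
    float *remote = e_get_global_address(row - 1, col,
                                         &lattice[NX + 1][0][0]);
    memcpy(remote, &lattice[1][0][0], sizeof(float) * (NX + 2) * Q);
}
```

Remote reads of a neighbour's rows work the same way in the other direction, which is presumably what "both writing and reading" refers to.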

aolofsson wrote:-what if you had thousands of cores on a card?
Computation scales excellently, but getting data into and out of the system is the problem.

wgvanveen wrote:Is your code available after publication, or is it private?
I can share it when everything is ready.

wgvanveen wrote:Do you know if "traditional" finite volume methods have also been tried on the Parallella?
I don't know.

wgvanveen wrote:For the memory part we use a rule of thumb of around 4 GB of memory per core. I don't know how to scale this properly to the Parallella.
I used only slightly less than 24 KB per core. The models required, however, are on the order of hundreds of gigabytes, which can neither be stored on the Epiphany nor streamed through the chip efficiently.

Re: SU2

Post by wgvanveen » Thu Nov 27, 2014 11:40 am

Thank you for your detailed answer! I think that, for now, it might not be the best option to run CFD out of the box. However, I think I will try one to see if some FVM code can be developed for it!

