On the Use of a Many-core Processor for Computational Fluid Dynamics Simulations
Posted: Mon Jul 27, 2015 2:04 pm by sebraa
Title: On the Use of a Many-core Processor for Computational Fluid Dynamics Simulations
Link: http://www.sciencedirect.com/science/ar ... 0915011564
Author: Sebastian Raase, Tomas Nordström
Publication: International Conference On Computational Science, ICCS 2015
Source: Yes, see attachment. (Note: this is the code from my master's thesis; it has not changed substantially for the paper.)
Keywords: Many-core; Epiphany; Computational Fluid Dynamics; Lattice Boltzmann
Posted: Mon Jul 27, 2015 3:34 pm by aolofsson
Very nice work!!
Have you looked into scaling the work to more cores? How many cores, and how much memory per core, would you need to make the off-chip bandwidth issue "go away"?
Andreas
Posted: Mon Jul 27, 2015 7:07 pm by sebraa
The external memory bandwidth limits how many lattices (simulation states) I can copy out to the host. As I was told at the conference presentation, one often doesn't need the whole lattice, only selected variables (e.g. density, velocity) for each node, which reduces the required bandwidth from 19 to 2 or 3 floats per node (in 3D).
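A minimal sketch of that reduction, assuming a D3Q19 lattice stored as 19 consecutive single-precision floats per node. The direction ordering in `E` and the names `node_moments` and `pack_moments` are hypothetical, not taken from the thesis code. This version exports density plus the three velocity components, i.e. 4 floats per node instead of 19.

```c
#include <assert.h>
#include <stddef.h>

/* D3Q19 velocity set: rest vector, 6 face and 12 edge directions.
 * This ordering is an assumption for illustration only. */
static const int E[19][3] = {
    { 0, 0, 0},
    { 1, 0, 0}, {-1, 0, 0}, { 0, 1, 0}, { 0,-1, 0}, { 0, 0, 1}, { 0, 0,-1},
    { 1, 1, 0}, {-1,-1, 0}, { 1,-1, 0}, {-1, 1, 0},
    { 1, 0, 1}, {-1, 0,-1}, { 1, 0,-1}, {-1, 0, 1},
    { 0, 1, 1}, { 0,-1,-1}, { 0, 1,-1}, { 0,-1, 1}
};

/* Density is the sum of the distributions; velocity is the momentum
 * (sum of f_i * e_i) divided by the density. */
void node_moments(const float f[19], float *rho, float u[3])
{
    float r = 0.0f, m[3] = {0.0f, 0.0f, 0.0f};
    for (int i = 0; i < 19; i++) {
        r += f[i];
        for (int d = 0; d < 3; d++)
            m[d] += f[i] * (float)E[i][d];
    }
    assert(r > 0.0f); /* physical nodes have positive density */
    *rho = r;
    for (int d = 0; d < 3; d++)
        u[d] = m[d] / r;
}

/* Pack n nodes (19 floats each) into out[4*n] as rho, ux, uy, uz:
 * only this reduced data would be copied out to the host. */
void pack_moments(const float *lattice, size_t n, float *out)
{
    for (size_t k = 0; k < n; k++)
        node_moments(&lattice[19 * k], &out[4 * k], &out[4 * k + 1]);
}
```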
However, this assumes that the whole lattice fits on the Epiphany, so the maximum simulation size is limited by the number of cores and the amount of memory per core. Real simulations need lattices about 4 to 5 orders of magnitude larger (1 to 10 GB, maybe more), so keeping the whole lattice in local memory is plainly infeasible. The lattice then has to live in external memory, and the required external memory bandwidth is dictated by the processing speed.
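The on-chip size limit can be estimated with a short calculation. A minimal sketch, assuming single-precision distributions, two lattice copies per core (separate source and destination lattices, a common LBM layout but not confirmed for this code) and a hypothetical amount of local memory reserved for code and stack. The function names are illustrative only.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes needed for `nodes` lattice nodes with q distributions each,
 * stored `copies` times (e.g. 2 for separate source/destination lattices). */
uint64_t lattice_bytes(uint64_t nodes, unsigned q, unsigned copies)
{
    return nodes * q * copies * (uint64_t)sizeof(float);
}

/* Upper bound on the number of nodes that fit entirely in on-chip memory:
 * each core keeps whatever is left after `reserved` bytes of code/stack. */
uint64_t max_onchip_nodes(unsigned cores, uint32_t mem_per_core,
                          uint32_t reserved, unsigned q, unsigned copies)
{
    assert(reserved < mem_per_core);
    uint64_t per_node = lattice_bytes(1, q, copies);
    return (uint64_t)cores * ((mem_per_core - reserved) / per_node);
}
```

For example, with 16 cores of 32 KB each, an assumed 8 KB reserved per core, D3Q19 and two copies, `max_onchip_nodes(16, 32768, 8192, 19, 2)` gives 2576 nodes (161 per core), i.e. well under a megabyte of lattice data.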
With my current code, each core processes about 2.8 million nodes per second (2D) or 0.34 million nodes per second (3D), which translates to about 100 MB/s (2D) or 26 MB/s (3D) of lattice data per core. The 3D case in particular could probably be optimized further (I couldn't test it with -O3, which helped the 2D case immensely).
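These figures are consistent with each node's distributions crossing the external memory link once per update in single precision, assuming D2Q9 in 2D (9 floats per node) and D3Q19 in 3D (19 floats per node): 2.8 million x 9 x 4 bytes is about 100.8 MB/s, and 0.34 million x 19 x 4 bytes is about 25.8 MB/s. A minimal sketch of that arithmetic; the 2D lattice type is an assumption and the function names are hypothetical.

```c
#include <assert.h>

/* Bytes per second one core moves to or from external memory when every
 * node's q single-precision distributions cross the link once per update.
 * Counting both the read and the write-back would double this figure. */
double core_bandwidth(double nodes_per_s, unsigned q)
{
    assert(nodes_per_s >= 0.0 && q > 0);
    return nodes_per_s * q * (double)sizeof(float);
}

/* Aggregate demand of `cores` cores all working on an off-chip lattice. */
double chip_bandwidth(double nodes_per_s, unsigned q, unsigned cores)
{
    return core_bandwidth(nodes_per_s, q) * cores;
}
```

Under these assumptions, 16 cores in 2D would together need roughly 1.6 GB/s (`chip_bandwidth(2.8e6, 9, 16)`) from the external memory interface.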
On a side note: it scaled perfectly on the 64-core Epiphany as well.
Posted: Mon Jul 27, 2015 7:17 pm by aolofsson
Based on experience with Epiphany in many other domains, it's much more interesting to remove the DRAM from the equation (getting rid of the training wheels:-)).
How would your code perform on a system with 64 x 64 x 64 3D torus of Epiphany-III cores?
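For scale: 64 x 64 x 64 = 262,144 cores, and at 32 KB of local memory per Epiphany-III core that is 8 GiB of on-chip memory in total, inside the 1 to 10 GB range mentioned above for real simulations. A minimal sketch of that arithmetic plus the wrap-around neighbor indexing such a torus implies; the x-fastest core numbering and the function names are assumptions for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define TORUS_DIM 64u           /* cores per axis, as in the question */
#define LOCAL_MEM_BYTES 32768u  /* 32 KB of local memory per Epiphany-III core */

/* Linear core index for torus coordinates; x-fastest order is an assumption. */
uint32_t core_id(uint32_t x, uint32_t y, uint32_t z)
{
    return x + TORUS_DIM * (y + TORUS_DIM * z);
}

/* Wrap a coordinate into [0, TORUS_DIM): in a torus the last core on an
 * axis is linked back to the first, so every core has six neighbors. */
static uint32_t wrap(int c)
{
    int n = (int)TORUS_DIM;
    return (uint32_t)(((c % n) + n) % n);
}

/* Index of the neighbor at offset (dx, dy, dz) from core (x, y, z). */
uint32_t neighbor_id(int x, int y, int z, int dx, int dy, int dz)
{
    assert(x >= 0 && y >= 0 && z >= 0);
    return core_id(wrap(x + dx), wrap(y + dy), wrap(z + dz));
}

/* Total on-chip memory of the whole torus, in bytes. */
uint64_t total_local_memory(void)
{
    return (uint64_t)TORUS_DIM * TORUS_DIM * TORUS_DIM * LOCAL_MEM_BYTES;
}
```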
Posted: Mon Jul 27, 2015 7:34 pm by sebraa
Posted: Mon Jul 27, 2015 8:13 pm by jar
The Lattice Boltzmann method does a constant amount of work per node per iteration (O(n) per iteration for n nodes), so no amount of core or data scaling will reduce the off-chip bandwidth overhead for this implementation. The key point of this method is that it is iterative and converges to a steady-state solution. So a lot of work can be done on the device without ever copying a result to shared memory (DRAM), provided the cores share boundary data using inter-core communication. I enjoyed reading the paper, but I think the lack of inter-core communication was one of its weaker points. After each lattice update, each core should write its edge node data to the appropriate neighboring core's memory rather than to shared memory.
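A minimal sketch of the push-style exchange suggested here. To keep it short it assumes a periodic 1D strip of cores, one float per node and one ghost column on each side of a core's block; a real D2Q9/D3Q19 code would only exchange the distributions that cross the edge. All names are hypothetical, and plain array writes stand in for the remote stores into a neighboring core's local memory.

```c
#include <assert.h>
#include <string.h>

#define NCORES 4      /* hypothetical strip of cores along x */
#define NX 8          /* interior columns per core */
#define NY 4          /* rows */
#define W (NX + 2)    /* interior plus one ghost column on each side */

/* Simulated local memory of one core. Columns are contiguous so an edge
 * column can be sent with one copy; columns 0 and W-1 are ghosts. */
typedef struct { float cell[W][NY]; } core_mem;

/* Push exchange after a lattice update: each core stores its own edge
 * columns (1 and NX) into the ghost columns of its left and right
 * neighbors. Only ghost columns are written and only interior columns
 * are read, so the loop order does not matter. On real hardware all
 * cores would then synchronize (barrier) before the next update. */
void push_halos(core_mem *mem, int ncores)
{
    assert(mem != NULL && ncores > 0);
    for (int c = 0; c < ncores; c++) {
        int left = (c + ncores - 1) % ncores;
        int right = (c + 1) % ncores;
        memcpy(mem[left].cell[W - 1], mem[c].cell[1], sizeof mem[c].cell[1]);
        memcpy(mem[right].cell[0], mem[c].cell[NX], sizeof mem[c].cell[NX]);
    }
}
```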
Posted: Mon Jul 27, 2015 8:31 pm by sebraa
Instead of writing the boundary data to the neighboring core (which would require additional on-core memory), the code reads the boundary data directly from the neighboring core's memory through the shared address space. Nothing needs to be written to shared memory (although then, obviously, the host never gets any results or status updates).
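A minimal sketch of this pull variant, for contrast with pushing edge data into a neighbor's halo. It uses a hypothetical toy 1D lattice with only two moving populations per node instead of D2Q9/D3Q19, a periodic strip of cores and invented names; the thesis code's actual data layout is not shown in this thread.

```c
#include <assert.h>

#define NCORES 4  /* hypothetical strip of cores along x */
#define NX 8      /* nodes per core, no ghost cells */

/* Toy 1D lattice with two moving populations per node:
 * f_pos travels in +x, f_neg in -x. */
typedef struct { float f_pos[NX]; float f_neg[NX]; } core_lat;

/* Pull-style streaming for core c: each value is fetched from the node it
 * comes from. At the block edges that node lives on the neighboring core,
 * so the value is read straight out of that core's memory (a remote load
 * on the Epiphany, a plain array access here); no halo buffer is kept.
 * src and dst are separate lattices, so no core sees half-updated data. */
void stream_pull(const core_lat *src, core_lat *dst, int ncores, int c)
{
    assert(ncores > 0 && c >= 0 && c < ncores);
    const core_lat *left = &src[(c + ncores - 1) % ncores];
    const core_lat *right = &src[(c + 1) % ncores];
    for (int x = 0; x < NX; x++) {
        dst[c].f_pos[x] = (x == 0) ? left->f_pos[NX - 1] : src[c].f_pos[x - 1];
        dst[c].f_neg[x] = (x == NX - 1) ? right->f_neg[0] : src[c].f_neg[x + 1];
    }
}
```

The trade-off is the one named in the post: pulling saves the extra on-core memory for halo buffers, at the cost of every edge access being a load from another core's memory.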
Posted: Mon Jul 27, 2015 9:05 pm by aolofsson
Posted: Mon Jul 27, 2015 9:09 pm by aolofsson
Posted: Mon Jul 27, 2015 9:15 pm by jar