by sebraa » Mon Jul 27, 2015 7:07 pm
The external memory bandwidth limits the number of lattices I can copy out to the host. As I was told at the conference presentation, one often doesn't need the whole lattice, but only selected variables (e.g. density, velocity) for each node, which basically reduces the required bandwidth from 19 floats to 2-3 floats per node (in 3D).
However, this assumes that the Epiphany holds the whole lattice, so the maximum simulation size is limited by the number of cores and the amount of memory per core. Real simulations need lattices about 4-5 orders of magnitude larger (1..10 GB, maybe more), so keeping the lattice fully in local memory is plainly infeasible. In that case, the required external memory bandwidth is dictated by the processing speed.
With my current code, each core processes about 2.8 million nodes (2D) or 0.34 million nodes (3D) per second, which translates to about 100 MB/s (2D) or 26 MB/s (3D). The 3D case in particular could probably be optimized further (I couldn't test with -O3, which helped the 2D case immensely).
On a side note: it scaled perfectly on the 64-core Epiphany as well.