brainstorming - distinguish fine/coarse grain?

Forum for PAL Users and Developers


Postby dobkeratops » Sun Jul 05, 2015 10:07 am


On many systems there is a distinction between coarse and fine grains of parallelism, specifically cores x SIMD (but also ILP visible to the programmer through unrolling, hyper-threading, and nested caches).

Would it make sense to further divide the parallel functions, by naming convention, into _coarse and _fine versions that hint you definitely mean one or the other? Unhinted functions would just do whatever is most sensible.

If architecting for a conventional CPU, you would parallelise large, outer tasks into worker threads, within which fine-grain tasks are parallelised by SIMD; conversely, on a manycore chip like Parallella you might prefer to spawn yet more subtasks.

This could also relate to data locality: a '_fine()' hint could also mean that the data set in question is already in local memory and doesn't require DMA streaming.

On some platforms the _fine() version *might* spawn more tasks; however, you know it *never* does on a SIMD machine. Similarly, '_coarse()' would be a hint that you definitely mean 'use more cores/threads'. You would save code size and branches compared with making a dynamic decision every time.
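To make the naming idea concrete, here is a rough sketch of the three flavours for a simple sum. The sketch_ names, the part count and the threshold are all made up for illustration and are not PAL's API. The "coarse" chunks run one after another here as a stand-in for worker threads or cores.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_COARSE_PARTS 4        /* assumed number of workers */
#define SKETCH_COARSE_THRESHOLD 1024 /* assumed cut-over size for the unhinted call */

/* _fine: one flow of control, never spawns anything. The plain loop is
   left for the compiler (or a hand-written SIMD version) to vectorise. */
uint32_t sketch_sum_u32_fine(const uint32_t *a, size_t n)
{
    uint32_t s = 0;
    assert(a != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        s += a[i];
    return s;
}

/* _coarse: always splits the input into SKETCH_COARSE_PARTS chunks, with no
   size check. Each chunk would go to a worker on a real platform; here the
   chunks are processed sequentially. */
uint32_t sketch_sum_u32_coarse(const uint32_t *a, size_t n)
{
    size_t chunk = (n + SKETCH_COARSE_PARTS - 1) / SKETCH_COARSE_PARTS;
    uint32_t total = 0;
    for (size_t p = 0; p < SKETCH_COARSE_PARTS; p++) {
        size_t lo = p * chunk < n ? p * chunk : n;
        size_t hi = lo + chunk < n ? lo + chunk : n;
        total += sketch_sum_u32_fine(a + lo, hi - lo);
    }
    return total;
}

/* Unhinted: the only version that pays for a dynamic decision. */
uint32_t sketch_sum_u32(const uint32_t *a, size_t n)
{
    return n >= SKETCH_COARSE_THRESHOLD ? sketch_sum_u32_coarse(a, n)
                                        : sketch_sum_u32_fine(a, n);
}
```

All three return the same result; the postfix only removes the branch when the caller already knows which grain applies.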

The distinction might help cross-platform implementations, e.g. a 'p_sort_u32(..)' called from the main task could fan out and spawn smaller tasks that use p_sort_u32_fine() on subsets, then the outer task merges the results; the fine-grain sort is also available as a useful component in its own right, for implementors of other PAL functions.
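A minimal sketch of that fan-out/merge shape, assuming four subsets and an insertion sort as the fine-grain component. The sketch_ names and the part count are hypothetical, this is not how PAL implements p_sort_u32, and the subsets are sorted sequentially here where a real port would hand them to subtasks.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define SKETCH_SORT_PARTS 4 /* assumed number of subtasks */

/* Fine-grain sort: in place, single flow of control, never spawns tasks. */
void sketch_sort_u32_fine(uint32_t *a, size_t n)
{
    assert(a != NULL || n == 0);
    for (size_t i = 1; i < n; i++) {
        uint32_t key = a[i];
        size_t j = i;
        while (j > 0 && a[j - 1] > key) {
            a[j] = a[j - 1];
            j--;
        }
        a[j] = key;
    }
}

/* Merge the adjacent sorted runs a[lo..mid) and a[mid..hi) through tmp. */
static void sketch_merge_runs(uint32_t *a, uint32_t *tmp,
                              size_t lo, size_t mid, size_t hi)
{
    size_t i = lo, j = mid, k = lo;
    while (i < mid && j < hi)
        tmp[k++] = (a[j] < a[i]) ? a[j++] : a[i++];
    while (i < mid)
        tmp[k++] = a[i++];
    while (j < hi)
        tmp[k++] = a[j++];
    memcpy(a + lo, tmp + lo, (hi - lo) * sizeof *a);
}

/* Outer sort: split into subsets, fine-sort each one, then merge pairwise.
   Returns 0 on success, -1 if the scratch buffer cannot be allocated. */
int sketch_sort_u32(uint32_t *a, size_t n)
{
    size_t bound[SKETCH_SORT_PARTS + 1];
    uint32_t *tmp;

    if (n < 2)
        return 0;
    tmp = malloc(n * sizeof *tmp);
    if (tmp == NULL)
        return -1;

    for (size_t p = 0; p <= SKETCH_SORT_PARTS; p++)
        bound[p] = n * p / SKETCH_SORT_PARTS;

    /* Each of these calls is what a spawned subtask would run. */
    for (size_t p = 0; p < SKETCH_SORT_PARTS; p++)
        sketch_sort_u32_fine(a + bound[p], bound[p + 1] - bound[p]);

    /* The outer task merges neighbouring runs until one run remains. */
    for (size_t width = 1; width < SKETCH_SORT_PARTS; width *= 2) {
        for (size_t p = 0; p + width < SKETCH_SORT_PARTS; p += 2 * width) {
            size_t end = p + 2 * width;
            if (end > SKETCH_SORT_PARTS)
                end = SKETCH_SORT_PARTS;
            sketch_merge_runs(a, tmp, bound[p], bound[p + width], bound[end]);
        }
    }

    free(tmp);
    return 0;
}
```

The fine version stays usable on its own, which is the point about it being a building block for other functions.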

I would envisage that a set of postfixes can be tweaked by a user to supply more information, but in a way that does not change the actual behaviour or result.

I realise that manycore may suit a more general approach; however, the intent of this library appears to be to make a good compromise for moving a single codebase between manycore, GPGPU, SMP x SIMD, and even FPGA. You could start out with a good-enough assumption across the board, e.g. 'coarse means 4 threads', 'fine means 4-way SIMD', which might still be better than no distinction when given 8- or 16-way SIMD.
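One way those across-the-board assumptions could be written down is as compile-time defaults that a port can override. The macro and function names below are invented for this sketch; only the two values of 4 come from the suggestion above.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed defaults; a platform port could override them with -D. */
#ifndef SKETCH_COARSE_WAYS
#define SKETCH_COARSE_WAYS 4 /* 'coarse means 4 threads' */
#endif
#ifndef SKETCH_FINE_LANES
#define SKETCH_FINE_LANES 4  /* 'fine means 4-way SIMD' */
#endif

/* Elements handed to each coarse task: n split SKETCH_COARSE_WAYS ways,
   rounded up to a whole number of SIMD vectors. */
size_t sketch_chunk_len(size_t n)
{
    size_t per_task;
    /* Guards against a bad override of the defaults. */
    assert(SKETCH_COARSE_WAYS > 0 && SKETCH_FINE_LANES > 0);
    per_task = (n + SKETCH_COARSE_WAYS - 1) / SKETCH_COARSE_WAYS;
    return (per_task + SKETCH_FINE_LANES - 1) / SKETCH_FINE_LANES
           * SKETCH_FINE_LANES;
}
```

With the defaults, 100 elements give 25 per task, rounded up to 28 so each task works on whole 4-lane vectors.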


version | use DMA (e.g. on CELL, single/multicore DSP) | use SIMD (x86-sse, ARM-neon, ppc-altivec) | use cores on CELL, SMP | use DMA+cores on manycore (e.g. epiphany)
_coarse | yes | no (subtasks will) | yes | yes

version | Where is the data? | other notes
_coarse | external memory (maybe cached) | might use temporal writes (cache eviction hints)?
default | anywhere | on epiphany, check addresses to see if DMA should be done?
_fine | explicitly in local memory (or possibly L1 cache) | debug will warn on epiphany if it's external?
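
The address check for the default version and the debug warning for _fine could look roughly like this. Everything here is hypothetical: the window struct, the function names and the idea that local memory is one contiguous range supplied by the platform port. No real Epiphany addresses are assumed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical description of a core's local memory, filled in per platform. */
typedef struct {
    uintptr_t base;
    size_t size;
} sketch_local_window;

typedef enum { SKETCH_PATH_DIRECT, SKETCH_PATH_DMA } sketch_path;

/* True if the whole range [p, p + bytes) lies inside the local window. */
int sketch_is_local(const sketch_local_window *w, const void *p, size_t bytes)
{
    uintptr_t a = (uintptr_t)p;
    assert(w != NULL);
    return a >= w->base && bytes <= w->size && a - w->base <= w->size - bytes;
}

/* Default (unhinted) version: inspect the address, then pick a path. */
sketch_path sketch_choose_path(const sketch_local_window *w,
                               const void *p, size_t bytes)
{
    return sketch_is_local(w, p, bytes) ? SKETCH_PATH_DIRECT : SKETCH_PATH_DMA;
}

/* _fine version: the caller has promised local data, so release builds do
   nothing extra; debug builds print a warning if the promise is broken. */
int sketch_fine_check(const sketch_local_window *w, const void *p, size_t bytes)
{
    int ok = sketch_is_local(w, p, bytes);
#ifndef NDEBUG
    if (!ok)
        fprintf(stderr, "warning: _fine called on non-local data\n");
#endif
    return ok;
}
```

The _coarse version would skip the check entirely and always stream, which is where the saved branch comes from.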
