by dar » Mon Nov 11, 2013 5:06 am
At present the mutex extensions target the older eSDK. We are waiting to receive the same Parallella board that will be shipping so that we can update the implementation to eSDK-5. We currently have a prototype board, and significant changes in the eSDK mean that things like mutex locks may no longer work as they once did, depending on which recent software/hardware combination you have.
In general, I believe the posts above are correct on two observations. First, the mutex must be stored in core-local memory, not global DRAM. Second, if this is not done carefully, each core can end up targeting its own private, redundant mutex, which defeats the purpose of the mutex.
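To make the second point concrete, here is a minimal sketch of how the same core-local offset resolves to a different global address on every core. It assumes the Epiphany convention of a 6-bit row and 6-bit column in the upper 12 address bits over a 20-bit local offset. The helper names, the owner coordinates (32, 8) and the offset 0x4000 are hypothetical choices for illustration, not part of the eSDK or of our mutex extensions.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed owner of the single shared mutex (hypothetical choice). */
#define OWNER_ROW 32u
#define OWNER_COL 8u

/* Hypothetical helper: combine mesh coordinates and a core-local offset
 * into a 32-bit global address (6-bit row, 6-bit column, 20-bit offset). */
static uint32_t global_addr(unsigned row, unsigned col, uint32_t local_off)
{
    assert(row < 64u && col < 64u);   /* coordinates must fit in 6 bits   */
    assert(local_off <= 0xFFFFFu);    /* offset must fit in 20 bits       */
    uint32_t core_id = ((uint32_t)row << 6) | (uint32_t)col;
    return (core_id << 20) | local_off;
}

/* Pitfall: each core resolves the mutex relative to itself, so every
 * core ends up with its own private copy. */
static uint32_t private_mutex_addr(unsigned my_row, unsigned my_col,
                                   uint32_t local_off)
{
    return global_addr(my_row, my_col, local_off);
}

/* Intended: every core targets the copy held by one agreed owner core. */
static uint32_t shared_mutex_addr(uint32_t local_off)
{
    return global_addr(OWNER_ROW, OWNER_COL, local_off);
}
```

With these assumptions, two neighbouring cores calling private_mutex_addr() with the same offset get two different addresses, whereas shared_mutex_addr() gives every core the same one.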
As for the usefulness of a mutex, or an atomic operation for that matter, these can be killers on a GPU where thousands of threads are "in flight", and the approach of doing a late reduction is a good one there. However, Epiphany is not a GPU: it is not hardware multithreaded, and its core count is far lower than the number of threads a GPU keeps in flight, so locking and synchronization with a mutex actually becomes a useful tool for the programmer. For this reason we aim to get this updated as soon as we can.
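For readers unfamiliar with the two approaches, here is a rough host-side sketch contrasting a mutex-guarded shared counter with a late reduction over per-worker partial sums. It uses POSIX threads purely as a stand-in for cores; it is not Epiphany or OpenCL code, and all names in it are made up for illustration.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define MAX_WORKERS 16

typedef struct {
    pthread_mutex_t *lock;   /* one lock shared by all workers      */
    long *shared_total;      /* counter protected by the lock       */
    long *partial;           /* this worker's private result slot   */
    long iterations;
} worker_arg;

/* Variant 1: every increment takes the shared mutex. */
static void *locked_worker(void *p)
{
    worker_arg *a = p;
    for (long i = 0; i < a->iterations; ++i) {
        pthread_mutex_lock(a->lock);
        *a->shared_total += 1;
        pthread_mutex_unlock(a->lock);
    }
    return NULL;
}

/* Variant 2: accumulate privately, publish once at the end. */
static void *partial_worker(void *p)
{
    worker_arg *a = p;
    long local = 0;
    for (long i = 0; i < a->iterations; ++i)
        local += 1;
    *a->partial = local;
    return NULL;
}

static long run(int workers, long iterations, void *(*fn)(void *))
{
    pthread_t tid[MAX_WORKERS];
    worker_arg arg[MAX_WORKERS];
    long partial[MAX_WORKERS] = {0};
    pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
    long total = 0;
    int started = 0;

    assert(workers >= 0 && workers <= MAX_WORKERS);
    for (int i = 0; i < workers; ++i) {
        arg[i] = (worker_arg){ &lock, &total, &partial[i], iterations };
        if (pthread_create(&tid[i], NULL, fn, &arg[i]) != 0)
            break;
        ++started;
    }
    for (int i = 0; i < started; ++i)
        pthread_join(tid[i], NULL);

    /* Late reduction: the slots stay zero in the locked variant. */
    for (int i = 0; i < started; ++i)
        total += partial[i];
    return total;
}

long sum_with_mutex(int workers, long n)          { return run(workers, n, locked_worker); }
long sum_with_late_reduction(int workers, long n) { return run(workers, n, partial_worker); }
```

Both functions return the same total. The difference is only in how often workers contend for the lock, which is what hurts with thousands of threads and matters much less with a handful of cores.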
The entire machinery used to implement OpenCL has been substantially refactored to provide a much cleaner direct API, which we plan to announce very soon for programmers who might use it directly or use it to implement other high-level APIs. I am only mentioning this for now and will defer the discussion, since the rationale and benefits are better covered once the API can be presented in more detail.
As for STDCL, it is not the API being referred to here. STDCL is a very high-level API that has remained essentially unchanged since 2009 and that happens to use OpenCL to provide portable hardware support. We really should separate it out as a distinct API to avoid confusion. There is documentation for it online for those who are interested. Everyone likes their favorite API. STDCL is there for programmers who want what it provides, since it is simpler than OpenCL and includes features OpenCL lacks.