Very interesting, but strange that no one mentioned this
https://github.com/adapteva/epiphany-sd ... re-caching
It seems there is an attempt to provide flat addressing of program code - a small runtime doing the job of an i-cache.
I wonder if this would be sufficient for some of my use cases.
It sounds like this works on a function-by-function basis.
I think an ideal scheme would cluster functions into pages and distinguish intra-page from inter-page calls. Then you'd only need the cache-management overhead for calls between pages.
You could also replicate some commonly used leaf functions in multiple pages to increase locality and decrease management overhead. (A page, or 'precalculated cache', could be described as a set of references into the main program.)
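To make the intra/inter page distinction concrete, here is a rough sketch of how one might score a clustering. Everything in it is hypothetical (the `CallEdge` struct, the `page_of` table, the function name); it is not how the linked runtime works, just a way to count which calls would have to go through the cache manager for a given page assignment.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model: functions are numbered 0..n-1, page_of[f] is the page
// a function was clustered into, and each CallEdge is one static call site.
struct CallEdge {
    std::size_t caller;
    std::size_t callee;
};

// Count the call edges that cross a page boundary. Only these would need the
// cache-management overhead; intra-page calls can stay plain direct calls.
std::size_t count_inter_page_calls(const std::vector<int>& page_of,
                                   const std::vector<CallEdge>& calls) {
    std::size_t crossing = 0;
    for (const CallEdge& e : calls) {
        assert(e.caller < page_of.size() && e.callee < page_of.size());
        if (page_of[e.caller] != page_of[e.callee]) {
            ++crossing;
        }
    }
    return crossing;
}
```

A clustering pass could try to minimise this count. Replicating a leaf function into a second page would show up here as an extra function index with the same body but a different `page_of` entry.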
This still wouldn't solve the problem of code split between epiphany & ARM though.
It would be sufficient if your machine only had a small ARM for 'compatibility', the bulk of the silicon was devoted to Epiphany cores, and those cores could address all of DDR memory.
The main case that interests me is 'start this core, given this global function pointer'.
It might be possible to trace the call graph at a task entry point: if it fits within 16k, assemble a precalculated page; otherwise use the software-managed cache.
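The 'trace the call graph and see if it fits' check could look roughly like the sketch below. The `Function` record, its `code_bytes` field and the `kPageBudget` constant are assumptions for illustration; a real tool would have to get sizes and call edges from the linker or compiler, and calls through function pointers would not be visible to a static trace like this.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical per-function record, e.g. extracted from a link map.
struct Function {
    std::size_t code_bytes;            // size of the compiled function body
    std::vector<std::size_t> callees;  // indices of functions it may call
};

// The 16k code budget mentioned above (an assumption, not a hardware constant).
constexpr std::size_t kPageBudget = 16 * 1024;

// Sum the code size of everything reachable from `entry` (iterative DFS),
// counting each function once even if it is called from several places.
std::size_t reachable_code_bytes(const std::vector<Function>& prog,
                                 std::size_t entry) {
    assert(entry < prog.size());
    std::vector<bool> seen(prog.size(), false);
    std::vector<std::size_t> stack{entry};
    seen[entry] = true;
    std::size_t total = 0;
    while (!stack.empty()) {
        const std::size_t f = stack.back();
        stack.pop_back();
        total += prog[f].code_bytes;
        for (std::size_t c : prog[f].callees) {
            assert(c < prog.size());
            if (!seen[c]) {
                seen[c] = true;
                stack.push_back(c);
            }
        }
    }
    return total;
}

// True: assemble a precalculated page. False: fall back to the software cache.
bool fits_in_one_page(const std::vector<Function>& prog, std::size_t entry) {
    return reachable_code_bytes(prog, entry) <= kPageBudget;
}
```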
Most performance-critical stuff should be small enough to fit in 16k of code, but obviously having the program dynamic would make development so much easier. The key thing is eliminating the manual allocation and enabling template-generated code to straddle multiple cores (e.g. a template generates management code on the 'current' core and the worker-'thread' stub code on subordinate worker cores; that could be nested, for multiple levels of fork-join parallelism).
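Here is a host-side sketch of what I mean by a template generating both halves. `start_core` is a made-up stand-in for 'start this core, given this global function pointer' and simply runs the stub inline, so nothing here is parallel or Epiphany-specific; the point is only that one template instantiation yields the manager loop and a plain function-pointer worker stub.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// The kind of entry point a worker core would be started with.
using WorkerEntry = void (*)(void* arg);

// Hypothetical stand-in for launching a core; here it just calls the stub.
inline void start_core(std::size_t /*core*/, WorkerEntry entry, void* arg) {
    entry(arg);
}

// Work description handed to one worker: the loop body and its index range.
template <typename Body>
struct Slice {
    const Body* body;
    std::size_t begin;
    std::size_t end;
};

// Worker-side stub: generated per Body type, callable via a raw pointer.
template <typename Body>
void worker_stub(void* arg) {
    const auto* s = static_cast<const Slice<Body>*>(arg);
    for (std::size_t i = s->begin; i < s->end; ++i) {
        (*s->body)(i);
    }
}

// Manager-side code: splits [0, n) into slices and forks one stub per worker.
template <typename Body>
void parallel_for(std::size_t n, std::size_t workers, const Body& body) {
    assert(workers > 0);
    std::vector<Slice<Body>> slices(workers);
    for (std::size_t w = 0; w < workers; ++w) {
        slices[w] = Slice<Body>{&body, n * w / workers, n * (w + 1) / workers};
        start_core(w, &worker_stub<Body>, &slices[w]);
    }
    // Join point: on real hardware the manager would wait for completion here.
}
```

Since the body can itself call `parallel_for`, nesting gives the multiple levels of fork-join. On real hardware the `worker_stub<Body>` instantiation is exactly the code that would need to be reachable from the worker core, which is where the flat addressing comes in.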
I think this would be worth putting time into.