Parallella Community

Posted: **Tue Aug 18, 2015 2:20 pm**

the topic is templates generating epiphany code, so let's look at an actual example: the cout object and its friends.

if the part you want compiled for epiphany contains cout, what should happen? ideally linux would create a new terminal for each epiphany core the code is running on, and the cout object, in the context of parallelized loops or whatever, would send instructions to the arm saying which message, with which parameters, should go to the newly created console.

of course this means you'd need to abandon the stdlib implementation of cout; you can't use a global object for that. on the other hand, object creation on the epiphany would then need to trigger activity in the linux kernel (creation of a new device).
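To make the idea concrete, here is a minimal sketch of a per-core replacement for cout. It is only an illustration under stated assumptions: `CoreOut`, `HostMailbox` and `CoreMessage` are hypothetical names, the mailbox is simulated with a plain vector rather than real shared memory, and `std::ostringstream` is used for brevity even though it would likely be too heavy for a core's local memory.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// One message as it might be forwarded to the ARM side:
// which core wrote it, and the text to show on that core's console.
struct CoreMessage {
    unsigned core_id;
    std::string text;
};

// Hypothetical stand-in for the transport to the host.
// On real hardware this would be some mailbox in shared memory.
struct HostMailbox {
    std::vector<CoreMessage> messages;
    void post(unsigned core_id, const std::string& text) {
        messages.push_back(CoreMessage{core_id, text});
    }
};

// Per-core output object: a local object, not a global like std::cout.
class CoreOut {
public:
    CoreOut(unsigned core_id, HostMailbox* mailbox)
        : core_id_(core_id), mailbox_(mailbox) {
        assert(mailbox_ != nullptr);
    }

    // Format into a local buffer; nothing is sent yet.
    template <typename T>
    CoreOut& operator<<(const T& value) {
        buffer_ << value;
        return *this;
    }

    // Send the buffered text to the host, tagged with this core's id,
    // then reset the buffer.
    void flush() {
        mailbox_->post(core_id_, buffer_.str());
        buffer_.str("");
        buffer_.clear();
    }

private:
    unsigned core_id_;
    HostMailbox* mailbox_;
    std::ostringstream buffer_;
};
```

Each core would construct its own `CoreOut` with its id; the host side could then route every `CoreMessage` to the console belonging to `core_id`.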

another lesson from this example: apart from local and shared memory, a parallel architecture also has hidden memory to cope with. you said triangles are stored on the GPU in games; that is a good example of such hidden memory. the gpu has its own address space, and it is accessed either through a mirrored virtual address space or by sending messages to the device. either way, this memory should be treated as neither readable nor writable from the cpu side: no read or write access to it is performed, instead its memory address is just passed to the device for moving around or transforming or whatever. it's a bit like playing chess through a keyhole.

so judging from your description this kind of memory has a name; what is it called? what do you call pointers to memory locations inside a gpu? what do you call pointers stored inside the gpu that point into the cpu's address space? imho the first object we need is exactly this kind of pointer, implemented in c++...
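For what it's worth, CUDA calls this kind of memory device memory and the pointers into it device pointers. A minimal sketch of such a pointer type in C++ might look like the following. Everything here is an assumption for illustration: `DevicePtr`, `FakeDevice`, `upload` and `download` are hypothetical names, and the device's address space is simulated with a byte vector. The point is that the handle is typed but has no `operator*` or `operator->`, so the host can only pass it around or ask the device to copy.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <type_traits>
#include <vector>

// Hypothetical device with its own address space, simulated by a byte vector.
class FakeDevice {
public:
    explicit FakeDevice(std::size_t bytes) : memory_(bytes) {}

    void write(std::uint32_t addr, const void* src, std::size_t n) {
        assert(addr + n <= memory_.size());
        std::memcpy(memory_.data() + addr, src, n);
    }
    void read(std::uint32_t addr, void* dst, std::size_t n) const {
        assert(addr + n <= memory_.size());
        std::memcpy(dst, memory_.data() + addr, n);
    }

private:
    std::vector<unsigned char> memory_;
};

// Typed handle to a location in the device's address space.
// Deliberately not dereferenceable on the host.
template <typename T>
class DevicePtr {
    static_assert(std::is_trivially_copyable<T>::value,
                  "only plain data can be copied to the device byte-wise");
public:
    explicit DevicePtr(std::uint32_t addr) : addr_(addr) {}

    std::uint32_t address() const { return addr_; }

    // Pointer arithmetic in units of T, as with ordinary pointers.
    DevicePtr operator+(std::size_t i) const {
        return DevicePtr(addr_ + static_cast<std::uint32_t>(i * sizeof(T)));
    }

private:
    std::uint32_t addr_;
};

// The only way to touch device memory: explicit transfers.
template <typename T>
void upload(FakeDevice& dev, DevicePtr<T> dst, const T& value) {
    dev.write(dst.address(), &value, sizeof(T));
}

template <typename T>
T download(const FakeDevice& dev, DevicePtr<T> src) {
    T value;
    dev.read(src.address(), &value, sizeof(T));
    return value;
}
```

The reverse case (a pointer stored on the device that refers to host memory) could use the same pattern with a second handle type, so the two address spaces cannot be mixed up by accident.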

Posted: **Wed Aug 19, 2015 4:54 pm**

Posted: **Thu Aug 20, 2015 12:55 pm**

Posted: **Thu Aug 20, 2015 1:30 pm**

Posted: **Thu Aug 20, 2015 9:08 pm**

Very interesting, but strange that no one mentioned this:

https://github.com/adapteva/epiphany-sd ... re-caching

It seems there is an attempt to provide flat addressing of program code - a small runtime doing the job of an i-cache.
I wonder if this would be sufficient for some of my use cases.

It sounds like this works on a function-by-function basis.

I think an ideal scheme would cluster functions into pages and distinguish intra-page from inter-page calls. Then you'd only need the cache-management overhead for calls between pages;
maybe replicate some commonly used leaf functions in multiple pages to increase locality and decrease management overhead. (A page or 'precalculated cache' could be described as references into the main program.)

This still wouldn't solve the problem of code split between epiphany & ARM though.
It would be sufficient if your machine only had a small ARM for 'compatibility', the bulk of the silicon was devoted to epiphany cores, and those cores could address all of DDR memory.

The main case that interests me is 'start this core, given this global function pointer'.

It might be possible to trace the call graph at a task entry point: if it fits within 16k, assemble a precalculated page; otherwise use the software-managed cache.
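A rough sketch of that check, assuming the toolchain could hand over a call graph with per-function code sizes (it is not clear that it can today). `CallGraph`, `FunctionInfo` and the function names are hypothetical; the 16k budget is the figure from this post.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical description of one function: its code size and whom it calls.
struct FunctionInfo {
    std::size_t code_bytes;
    std::vector<std::string> callees;
};

using CallGraph = std::map<std::string, FunctionInfo>;

// Sum the code size of every function reachable from `entry`,
// counting each function once even if the graph has cycles.
std::size_t reachable_code_bytes(const CallGraph& graph, const std::string& entry) {
    std::set<std::string> visited;
    std::vector<std::string> pending{entry};
    std::size_t total = 0;
    while (!pending.empty()) {
        const std::string name = pending.back();
        pending.pop_back();
        if (!visited.insert(name).second) continue;  // already counted
        const auto it = graph.find(name);
        assert(it != graph.end());  // every callee must be described
        total += it->second.code_bytes;
        for (const auto& callee : it->second.callees) pending.push_back(callee);
    }
    return total;
}

// Budget taken from the post: 16k of code per task.
constexpr std::size_t kPageBudgetBytes = 16 * 1024;

// True: assemble a precalculated page. False: fall back to the software-managed cache.
bool fits_in_precalculated_page(const CallGraph& graph, const std::string& entry) {
    return reachable_code_bytes(graph, entry) <= kPageBudgetBytes;
}
```

Indirect calls through function pointers would not show up in a static graph like this, so a real implementation would need a conservative fallback for them.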

Most performance-critical stuff should be small enough to fit in 16k of code, but obviously having the program dynamic would make development so much easier. The key thing is eliminating the manual allocation and enabling template-generated code to straddle multiple cores (e.g. the template generates management code on the 'current' core and the worker-'thread' stub code on subordinate worker cores, and that could be nested for multiple levels of fork-join parallelism).
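The shape of such a template could look roughly like this. It is a sketch only: `fork_join_for_each` and `worker_stub` are hypothetical names, and the direct function call stands in for starting a worker core, so everything here actually runs sequentially on one processor.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Worker stub: the code a subordinate core would run on its slice of the data.
template <typename T, typename Fn>
void worker_stub(T* begin, T* end, Fn fn) {
    for (T* p = begin; p != end; ++p) fn(*p);
}

// Management code: would run on the 'current' core. It splits the range
// into one contiguous slice per worker, forks a stub for each, then joins.
template <typename T, typename Fn>
void fork_join_for_each(std::vector<T>& data, std::size_t workers, Fn fn) {
    assert(workers > 0);
    const std::size_t chunk = (data.size() + workers - 1) / workers;  // ceiling division
    for (std::size_t w = 0; w < workers; ++w) {
        const std::size_t lo = std::min(w * chunk, data.size());
        const std::size_t hi = std::min(lo + chunk, data.size());
        // Assumption: this plain call stands in for 'start worker core w
        // with this function pointer and this slice'.
        worker_stub(data.data() + lo, data.data() + hi, fn);
    }
    // A real version would wait for all workers here; the sequential
    // stand-in has nothing left to join.
}
```

Because `fn` can itself call `fork_join_for_each`, the same template nests for multiple levels of fork-join; the compiler instantiates both the manager and the stub for each element type and function.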

I think this would be worth putting time into.

Posted: **Fri Aug 21, 2015 8:13 pm**

Posted: **Fri Aug 21, 2015 9:31 pm**

Posted: **Sat Aug 22, 2015 12:15 am**

one more important point.

a filename is a terrible way to identify something, because it carries no type information.

BAD:
"spawn this file to run on this data" - possible incompatibility error. How do you know which files are valid?

GOOD:
"run this function across this data" - the compiler can tell you if it's right, or, in the case of templates & overloading, it can even generate or pick the right function for you.

Type information is good: it makes things easier to express, provides compile-time error checks, and communicates more from one programmer to another.

You should not have to lose this great benefit to make something parallel.
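A small sketch of the GOOD case, with hypothetical names (`run_across`, `Halve`): the element type of the data selects the overload, and a mismatch is a compile error rather than a runtime surprise.

```cpp
#include <cassert>
#include <string>
#include <vector>

// A kernel with one overload per supported element type.
struct Halve {
    void operator()(int& x) const { x /= 2; }
    void operator()(double& x) const { x *= 0.5; }
};

// "Run this function across this data": T is deduced from the data,
// so the compiler picks the matching overload of fn.
template <typename T, typename Fn>
void run_across(std::vector<T>& data, Fn fn) {
    for (T& item : data) fn(item);
}

inline void example_usage() {
    std::vector<int> counts{4, 7};
    run_across(counts, Halve{});  // uses Halve::operator()(int&)
    assert(counts[0] == 2 && counts[1] == 3);

    // std::vector<std::string> names{"a"};
    // run_across(names, Halve{});  // would not compile: no overload takes std::string&
}
```

With a filename-based spawn the equivalent mistake (wrong binary for the data layout) can only show up at run time, if it shows up at all.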

Posted: **Sat Aug 22, 2015 11:03 am**


generating epiphany code from templates, possible?
