generating epiphany code from templates, possible?

Discussion about Parallella (and Epiphany) Software Development

Moderators: amylaar, jeremybennett, simoncook

Re: generating epiphany code from templates, possible?

Postby piotr5 » Sun Aug 23, 2015 9:32 am

designing a programming language is a very difficult task nowadays. first you ask yourself how to improve the currently existing languages. well, your improvement could be syntactic sugar. but in which way is it an improvement? suppose putting a function-body after a function-call to express that this body should be added as the last parameter. what is it good for? what's the difference between putting a closing bracket before or after it? the next problem you face with syntactic sugar is consistency. if the bracket goes with the function-call and not with the function-body parameter, shouldn't functions also get their closing bracket resp. end-function statement moved in front of a sub-function that gets tail-called? i.e. you said {stage1; stage2; stage3;} and logically in rust this should read seq {stage1}{stage2}{stage3} to express that these are 3 functions with the same parameters executed in sequence, or in parallel if possible.

another design element is program flow. if you want, you could create a language where commands are executed from end to beginning. again consistency might play a role. for example in C the program flow usually is from beginning to end unless you call a function, as then parameters will be evaluated before the actual call, effectively going from end to beginning. other languages solve that inconsistency by putting parameters before the function-name. because of brackets it really doesn't matter in which direction your program flows, you'll always be inconsistent. so maybe get rid of brackets and put their contents into variables to be calculated in front too? quite a dead end if you ask me.

c++ goes in a completely different direction than beginning-to-end or end-to-beginning. here execution starts at a special function, the constructor. more exactly, the constructors of all sub-objects that got pre-initialized (based on your constructor's parameters) are called pretty much in parallel, and then, sequentially after all these executions have ended, your constructor body is executed too. so the program flows first into a parallel split and then into another function executed when the split is in sync again. and then the same thing happens in reverse for the destructor. for consistency's sake, also in function-bodies there is no guarantee of execution in sequence, only as far as backwards-compatibility with C goes.

also an interesting issue with program-flow is the notion of loops. the new idea in sather was to see loops as loop-body and loop-head running in parallel. the loop-head runs until certain instructions which either continue the loop-body or end the loop. the loop-body does the same. when the loop-head hands over to the loop-body it also passes parameters. i.e. a loop is nothing else than 2 coroutines with dataflow from the first to the 2nd. and as we know from oop, a lump of nested conditionals is nothing else than a hierarchical object with several overloads.
similarly, in OOP 2 functions with dataflow from the first to the 2nd are actually an object containing another object. this way all loops and conditionals can be expressed in OOP without ever writing any corresponding c-alike statement, if the language has oop with coroutines as a data-type. neither do you need to mark functions as parallel if the language has constructors and destructors. unfortunately such a language would be inconsistent, since no such abstraction exists for mathematical stuff like plus and times and such. I'm not sure this language is a dead end though. it might be difficult to comprehend the program-flow, it's far from our natural language. but our natural language is, as you called it, arbitrary! different planets have different languages, so to say. maybe in future we'll all speak c++ in voice?
dobkeratops wrote:
programmers are willing to put stuff into completely incomprehensible languages, far far away from the natural human languages,


I think here you're just talking about 'badly designed languages'.

yes, badly designed languages like c++ or the informal language of mathematics and all programming-languages that are designed with it as a role-model. just because you speak it fluently doesn't make it into a natural language for all people! I'm not saying all those "badly designed languages" need to be changed, truth is we all learned them eventually and you can easily assume they are a natural language of some sort. but in designing a new language you shouldn't be limited by them. make an easy transition, mark for future generations which parts of the language are just backwards-compatibility. or better yet, don't design any new language at all, work with what you have, instead introduce new concepts that allow people to use existing languages in completely new ways!


jonathan blow has a point in saying that for game-development small functions are bad, for performance reasons. but we're talking of lib-development here, not game-development. a game works on a meta-level, calling all your functions; for the sake of the game these functions had better be small, otherwise the game-developers need to implement the smaller versions on their own, which goes against code-reuse. there is a reason why in school they teach small is good, it has to do with decades of programming experience of many individuals. jonathan blow is merely an individual in comparison, not really an authority I can accept. I accept only the authority of the masses; of all other people, including myself, I think badly, I believe they're all stupid...

imho if you refuse to do the additional task of inventing names for your functions, then you'll miss out on the benefits of such an endeavour: a deeper understanding of what your function actually does and the ability to explain it to others. according to DDD, code which has good names for its functions can even be understood well without help from the programmer, without any comments! I agree that naming single-instruction functions is a bit redundant, but imho it's the fault of the language-designers that such functions don't have a name yet in the first place...
piotr5
 
Posts: 230
Joined: Sun Dec 23, 2012 2:48 pm

Re: generating epiphany code from templates, possible?

Postby dobkeratops » Sun Aug 23, 2015 12:44 pm

jonathan blow has a point in saying that for game-development small functions are bad, for programming-performance reasons. but we're talking of lib-development here, not game-development.


I'm talking about what it would take to release the Epiphany's potential for the type of code in game engines. Whether or not this ever actually happens, there will be overlap to other fields. GPU's have become mass produced general purpose tools, as a spinoff from (primarily) serving game-industry needs.

To fully utilise it - you wouldn't use it just for separate black-box library code. You'd use it for the meat of your program: every entity's update, and how the entities interact. Eventually the 'host cpu' should wither away, just there for legacy code. I'd really like to see a system which has NO host, just e-cores - it's just that our software base and language tools aren't up to that yet.

games do have a lot of parallelism, because they deal with spatial tasks.

a deeper understanding of what your function actually does do and the ability to explain it to others


For small functions (or fragments of a piece of large code), the names can become meaningless noise. They're open to opinion; they're a place where people can disagree. A function name is further from its parameters; operators connect their operands more closely. You're trying to come up with a verbal tag for a graph.

This is why lambda functions are such an amazing tool.
imho if you refuse to do the additional task of inventing names for your functions, then you'll miss out on the benefits of such an endevour


And even where this is true, it's orthogonal to parallelising. Parallelising is down to concrete, measurable analysis of data flow. Names are a matter of opinion. A good name does not accelerate a program, and good names can be retrofitted (method extraction, duplicate-code detectors, search-replace).

jonathan blow is merely an individual in comparison, not really an authority I can accept. I accept only the authority of the masses


His videos raise a lot of good points about the pitfalls of C++. I like his points about (i) making cache optimisation easier and (ii) streamlining syntax so that changes are easier.
It's so important, because programming is an exploratory process.

I see no authority behind C++ , it's just the most common, least bad option.
The value is in the infrastructure around it: e.g. IDEs, applications written in it, GCC & LLVM targeting many architectures, and the number of people who know it.

the language itself is very clunky - so much scope to clean it up.

Anyway I've raised a lot of tangents here :-


- alternate languages
- they would help, but with lambdas I can express what I want in C++ reasonably well now.

- alternate future hardware designs.
-(i-caches, single shared cache, special case read-only cache).
- i just saw how 'no i-cache' shifts the problem of 'referring to functions in large programs' into software
- i could wait for a similar i-cached design..
- but if the software appears to solve this first, then great!
- it should be the compiler's burden, not the programmer's
- given an AMP-like approach, it could tell you if the function is too big, or it could warn you and fall back to CPU.

- the best use of the epiphany board (what to do with the FPGA memory.. )

The main question in this thread relates to how to write complex programs on the epiphany and portably with other architectures.
It's the same challenge we had with xbox360 vs CELL - I've seen how much of a disaster that was with these tools.
and it's the same challenge you have now with Parallella vs (other ARM boards with quad core, SIMD, GPU).

C++AMP hits the nail on the head. With the lambda/high-order-function approach, I can write portable code that can be compiled for (a) single-threaded, (b) multi-core, and (c) now even GPGPU.

Epiphany is the odd one out - currently requiring that you re-arrange everything - and all it would take to fix that is: the ability to compile & reference e-core code from within your main program, just like C++AMP, by moving individual lambdas or functions onto the accelerator, **without physically moving them out of your program**.

Then you can just add an epiphany version to your high-order-function definition, alongside the others, tucked away in header libraries where they won't confuse the main body of your program. You'll be able to experiment more and refine it better, swapping in different load-balancing schemes... whatever, and you'll be able to re-use the successful scheme everywhere, by plugging different lambdas in.

Of course you design the high-order function itself to map equally well to the different models, by representing a bit of data flow. ("apply this function to all the graph edges, reduce the result onto its vertices"... whatever... there would be dozens of variations).

From one machine to the next, the split between CPU, GPU, (e-core), SIMD would be different.

You say 'delusion' - but with SO much change and variety in hardware - you need to keep your options open. The abstraction is essential.
You don't know which platforms will take off - so if you can hedge your bets, you're much better off. So you build an abstraction that has sufficient information for each of those contexts.

What if 'epiphany cores' became available as a consumer PCI card for PC gamers (another attempt at a 'physics accelerator', like Ageia PhysX)?
Your program would need to adapt to each hardware permutation: big CPU+GPU, SLI, little CPU+ecore+GPU, APU. And we already see Knights Landing ("predicated SIMD") on the horizon as another option.

Similarly, right now, if I wanted to dabble with embedded machine vision or whatever: I want to hedge my bets between Parallella and traditional ARM boards.
The fact the 16-core epiphany is 65nm seems to be a downer... you can get a quad-core ARM x 4-way SIMD (that's 16 fmacs) + a vastly superior GPU - because of 28nm, 22nm tech. Why? so much more code for the latter, so much more demand, so they're cheaper to mass produce.

Someone else was talking about 100,000 cores.. hehe. Show me 64-256 that I can buy, first :) When software routinely saturates that, I might start to consider 1000+

You need to pepper your code with tests & asserts too, and you usually want to run on a CPU first because the debugging is so much easier (e.g. drawing force lines & bounding boxes from within a physics simulation). The tests/'debug-code' might be too big to fit on the accelerator, so if you can just transparently change a flag to compile for either, so much the better.

(e.g. When I worked on CELL, I actually found it was faster to develop using my DMA emulation abstractions on the xbox360 first... simply because the compile/debug cycle was so much faster; Microsoft's toolchain was superior)

I'm not saying all those "badly designed languages" need to be changed


They need to be tweaked and improved. There's no need to apologize for the world the way it is, with something as fluid as software.

C++ got 'auto' and 'lambdas' about 10 years too late - not because these were hard to implement, but because (IMO) of a priestly cult around the language.
For everyone that wanted them, there's another OOP fanatic/apologist saying, "oh, it's just syntactic sugar for making a class.... you don't need it... you're a bad programmer if you think you do... it's not OOP".

Rubbish. There's nothing 'good' or 'intelligent' about writing 10-20 lines where 1-2 lines can do the same job. It's just stockholm syndrome, and (worse) looking for ways to criticize people (destructive personality rather than constructive).

To this day there are people claiming "for (std::vector<int>::iterator it=something.begin(); it!=something.end(); ++it) {..}" is somehow 'good' 'because you express your intent clearer'. BS. auto and the range-based for-each are vastly superior. And for years C++ programmers could only look with envy at this superior feature in other languages.

FPUs used to be optional coprocessors. Luckily C always had the 'float' datatype; it was just horrendously slow on integer-only machines (I've worked through the int->float transition). Thanks to the float type, use of FPUs (once they became integrated) was ubiquitous.

I see a progression: GPUs (and e-cores) accelerating throughput tasks should be as seamless from the main body of your code as using 'float' numbers was all along, taking minimal effort, and with a fallback where not supported. (your code still works, but then you can tweak it when you have specific information and/or time to do so)


for example in C the program flow usually is from beginning... C++ quite different..constructor


To me this perception is slightly too literal. In C or C++ we express an intent. The compiler is then at liberty to trace dependencies and re-order the actual flow - inline, interleave or factor out repeated calculations however it sees fit. If you put C source (compiled with optimisation) alongside what is compiled and single-step through the compiled machine code, the 'current position' jumps around all over the place :), and you sometimes can't put a breakpoint on a specific line because it no longer exists.

I just see C++ as a very good code-generator for C. patterns like 'constructors/destructors' just mimic things people did manually before (BlahBlah_create(), BlahBlah_dispose()..). overloading gives you 'compile time search', leveraging types as part of the name. templates are a much better replacement for macros. Even vtables are just one of many possible schemes people implemented manually with function pointers.

There was just a progression from assembly -> C -> C++ -> (languages like haskell) which let you express your intent more precisely, giving the compiler more information to work with. And this progression should continue.

the C++ AMP syntax is pretty interesting: []() restrict(amp){...} - it's like "restrict" is telling you the function is actually threadsafe (similar to the long talked-about 'pure' specifier). I approve of their reuse of the word 'restrict', which already means 'non-overlapping'.

Imagine if that got standardized, and wherever the compiler sees that postfix 'restrict', it can (a) throw an error where it's trivially provable as non-threadsafe, (b) warn you unless it's trivially provable as threadsafe, (c) otherwise be at liberty to make a best guess for accelerator use, when compiled with a specific flag.

If the epiphany compiler got that added.. that would be an amazing starting point
dobkeratops
 
Posts: 189
Joined: Fri Jun 05, 2015 6:42 pm
Location: uk

Re: generating epiphany code from templates, possible?

Postby piotr5 » Tue Aug 25, 2015 10:32 am

in c++ expressing intent is a bit over-enthusiastic. intent you could already express in c; the problem is the compilers are too stupid to understand it. if you want to express something in c++, it had better be the desired assembler-code generated underneath. again, the same can be done in c. the difference is only that in c++ you additionally are capable of expressing the compiler's behaviour when encountering some rare (or hopefully rare) cases of misunderstanding between programmers. unfortunately, I am not aware of any programming language capable of expressing intent, there's always quite a lot of bureaucrazyness to overcome first. in a simple case, you may express your intent that some turtle walks a few steps forward and turns left and so on, always leaving a trace behind. but after that walk there will appear some beautiful star on the screen which you couldn't have generated by simply saying "draw me a star". this is a problem inherent in what we call "intent": we'd like the computer to make decisions for us, and if we're unhappy we'd like to tell it to try again and specify our intent a bit more clearly. this actually is a searching problem, solvable by loads of supercomputers attempting to decode everything surveillance has gathered about human beings. when you say "draw me a star", there are many ways to do that; the computer would need to know them all, either by collecting that info or, better yet, by comprehending what it is that makes a shape into a star-shape and then searching for all such shapes. once that's done, a random shape could be chosen. stating intent basically has little to do with programming; intent is what google is after, while programmers tend to just collect a list of possibilities on their own and choose among such a limited list.

I know it's a bit off-topic now, but let's use it as an example: the "Moveable" concept is a bit fragile. suppose you have a lib for numbers: are numbers moveable? in principle a number could be moved, it does not depend on memory-position. however, if there is a central cache containing all numbers so they are stored only once and never again, would such a cache then also store all positions where such numbers are referenced from? silly idea, but it destroys the moveable property. generally, if you depend on 3rd-party libs, the underlying objects could easily change in a way that breaks moveability, without changing the interface.

another example: usually c-programs either pass by reference and thereby introduce non-reentrant code (to store the created object in a static variable), or they create copies of lots of objects, cluttering memory so that garbage collection is needed. c++ recently introduced R-values to solve that. so, you have some sort of pick-casting to prepare a value for that kind of function; then you must write a pick-function to move another object's data-ownership over to yours, and call a setPicked function for the other object so that future access to its data generates runtime-errors. if you do such a change with existing objects, then you'd better remove the copy-constructor and rename it into a 2-argument constructor. additionally the single-equal-sign operator must perform the picking and therefore isn't allowed to pass a reference to this object on output; it returns void. then compile your sources, and whenever copy-constructor or assignment is used, you must make a decision based on context: should a picking or a deep copy be performed? the original author did not express his/her intent on that topic; you must add more details to what the intent really looks like. and now suppose you have a co-worker who is unable to comprehend the notion of picking and deep copy. now you must change sources again, this time removing the expression of intent again. well, in that case c++ offers some possibilities to just change definitions, so it compiles with intent inside even though it won't get used. but if the co-worker doesn't continue that statement of intent, then your code has mixed depth of exactness, it is inconsistent, your stating of intent becomes useless. so you need tools to mark all changes done by that co-worker for reviewing and add this increased intent yourself.

what I claim here is: with parallelization you'll face all the same problems again! what we really need is more code-refactoring tools. the whole argumentation with programmers are lazy, small code is beautiful, easiness over changeability - all that is really a moot point in the face of what nowadays an IDE can do for you. the for-loop in old style you presented - nobody wrote such for-loops actually, they just told the IDE to create such a loop for them! imho the old iterator loops are to be preferred over the new foreach loops: there you did get the actual iterator in hand and not the value it is pointing to. you can do a lot of stuff if only you have the iterator!

my suggestion is: dig out some good IDE, and improve it for easier parallel programming. I'd choose to use qt-creator, although my favourite is U++. but if you wish, you can as well use clion. or preferably, write code for code-refactoring, portable among all 3 IDEs, code that works on linux-kde c++ as well as on windows java...

I must emphasize: using templates or whatever pre-compiler programming will increase build-time and steal memory-resources during compilation. if it isn't necessary, because a good IDE can do all that much cheaper, why not use that option? why not code-generation instead of compile-time execution?
piotr5
 
Posts: 230
Joined: Sun Dec 23, 2012 2:48 pm

Re: generating epiphany code from templates, possible?

Postby dobkeratops » Tue Aug 25, 2015 12:17 pm

piotr5 wrote:in c++ expressing intent is a bit over-enthusiastic. intent you could already express in c, the problem is the compilers are too stupid to understand it. if you want to express something in c++, it better be the desired assembler-code generated underneath. again, same can be done in c. difference is only that in c++ you additionally are capable of expressing the compiler's behaviour when encountering some rare or hopefully rare cases of misunderstanding between programmers. unfortunately, I am not aware of any programming language capable of expressing intent, there's always quite a lot of bureaucrazyness to overcome first


The evolution of programming languages has been to allow expressing your intent more precisely through abstractions, giving a compiler more information for algorithms to find the most efficient way to map it to hardware. There are many ways of getting the same result, and the best varies between machines and even based on the size of the data.
e.g. a pointer is just an address. Then you can add more information, e.g. 'const' to say "it's a *read-only source* address". Then yet more: std::unique_ptr<T>, "it's a pointer *owning a piece of allocated memory*", or T& vs T*, "a reference to something owned somewhere else", etc. The compiler can use this information to generate code for you (in the case of all the RAII and operators describing what unique_ptr<T> does), or optimise (references might not even be pointers at all - they might be a register index if the variable happened to be a local still in the stack).

e.g. we started out programming in Assembly, mapping intent to hardware manually, but in the process not really encoding the intent: this makes the programs error-prone, hard to modify and read - basically a 'write-only' process.

Over ASM, C gives you the 'intent' of a variable - which the compiler is at liberty to move dynamically between stack and registers. In ASM, your intent for a local variable might be that it goes in a register, but as you change your code, you might not swap it in and out optimally. Compiler register allocation is a massively useful step forward.

And so it continues. 'for loops' are easy to write, but from 'functional programming' there are ideas like 'map'/'reduce'/'zip'/'filter' etc. - names for processes which are the 'intent' behind many loops in C. There are other common names - 'scatter'/'gather' for the intent behind indexing. (intel now has SIMD gather instructions)

The process is not complete, evolution continues.

Haskell today is the most advanced 'expression of intent'. it's more declarative, not imperative: you describe a data dependency, and the compiler is at liberty to re-order it for you, with lazy evaluation. Purity is also better at expressing exactly what you want the algorithm to do.
It's ruined for my domain by Garbage Collection, but just as Simula+ML gave the C world inspiration to produce C++, Haskell gives us inspiration for where we should be headed. What I'm trying to do here is largely data-flow programming in C++.



. in a simple case, you may express your intent that whatever turtle walks a few steps forward and turns left and so on, always leaving a trace behind. but after that walk there will appear some beautiful star on the screen which you couldn't have generated by simply saying "draw me a star". this is a problem inherent in what we call "intent", we'd like the computer to make decisions for us, and if we're unhappy we'd like to tell it to try again and specify our intent a bit more clearly. this actually is a searching problem, solveable by loads of supercomputers attempting to decode everything surveillance has gathered about human beings. when you say draw me a star, there are many ways to do that, the computer would need to know them all, either by collecting that info or better yet by comprehending what it is that makes a shape into a star-shape and then searching for all such shapes. once that's done, a random shape could be chosen. stating intent basically has little to do with programming, intent is what google is after while programmers tend to just collect a list of possibilities on their own and choose among such a limited list.


it's a gradual progression. we started out with assembly, and we take steps up the abstraction ladder.

We're nowhere near "draw me a star", but there are useful rungs we've discovered above ASM and C. "draw 5 arrowheads with rotational symmetry" is still closer to the intent than "draw this line; draw this line; draw this line;..."

Regarding the Searching concept - sure - I am interested in something called "Shape Analysis" which uses a database of example code and better replacements, which could allow programmers to express an intent intuitively (in a way that is easiest to read & write), then replace it with something equivalent but more optimal. (optimised code is much harder to read or write)

I know it's a bit off-topic now, but let's use it as an example: the "Moveable" concept is a bit fragile. suppose you have a lib for numbers, are numbers moveable? principially a number could be moved, it does not depend on memory-position. however, if there is a central cache, containing all numbers so they are stored only once and never again, would such a cache then also store all positions where such numbers are referenced from? silly idea, but it destroys the moveable property. generally if you depend on 3rd party libs, the underlying objects could easily change in a way that breaks moveability, without changing the interface.


sure, C++ has had all these concepts like "moveable" retrofitted, and there is awkwardness/fragility there. This is why I've looked into alternatives (Rust), why I'm following Jonathan Blow's attempt, and why I started my own.

C++ has the benefit of existing IDEs and libraries - it's the 'most widespread, least bad' option - but I do believe it's possible to improve upon.

another example: usually c-programs either pass by reference and thereby introduce non-reentrant code (to store the created object in a static variable), or they create copies of lots of objects, cluttering memory so that garbage collection is needed.


I know you can get garbage collectors for C and C++, but C/C++ are usually used specifically where garbage collection isn't possible (i.e. embedded/high performance).
What really happens is bugs... memory leaks. And of course this is why other languages, e.g. C#, are so popular.

I do think the 'moveable' R-value reference ideas are a better solution, but I know there are still hazards.

c++ recently introduced R-value to solve that. so, you have some sort of pick-casting to prepare a value for that kind of function, then you must write a pick-function to move over another objects data-ownership to yours, and call a setPicked function for the other object so that future access to its data generates runtime-errors. if you do such a change with existing objects, then you'd better remove copy-constructor and rename it into a 2-argument constructor. additionally the single equal-sign operator must perform the picking and therefore isn't allowed to pass a reference to this object on output, it returns void. then compile your sources and whenever copy-constructor or assignment is used, you must then make a decision based on context: should a picking or a deep-copy be performed? the original author did not express his/her intent on that topic, you must add more details to how the intent really looks like. and now suppose you have a co-worker who is unable to comprehend the notion of picking and deep copy. now you must change sources again, this time removing expression of intent again. well, in that case c++ offers some possibilities to just change definitions, so it compiles with intent inside even though it wont get used. but if the coworker doesn't continue that statement of intent, then your code has mixed depth of exactness, it is inconsistent, your stating of intent becomes useless. so you need tools to mark all changes done by that co-worker for reviewing and add this increased intent yourself.


Take a look at Rust.

it started out with 'move-semantics' as its backbone (everything C++ retrofitted with R-value references).
'move' is its default; the compiler gives you errors when you try to use an invalid moved value, and it has a 'freezing' concept when you take references (the original cannot be modified while references exist). You must explicitly tell it to make a copy (which C++ assumes is the default). It also adds a concept of pointer 'lifetimes'; e.g. if you pass references to 2 values and a 3rd to compare, and return a reference, you can mark that the return value has the same lifetime as the first 2.

So basically they have the most efficient option ('move the value') as the default, with the 'expensive' deep copy something you must manually ask for, and compile-time checks to catch more mistakes... the compiler knows more of your intent and can leverage that information.

what I claim here is, with parallelization you'll face all the same problems again! what we really need is more code-refactoring tools. the whole argumentation with programmers are lazy, small code is beautiful, easiness over changeability, all that is really a moot point in face of what nowadays IDE can do for you. the for-loop in old-style you presented, nobody wrote such for-loops actually, they just told the IDE to create such a loop for them!


Compact code that expresses intent clearly is usually superior to larger code with the intent hidden. Less to read or change. It's a better encoding.

IDEs cutting and pasting code is a terrible workaround IMO for language failures.

The real value of an IDE to me is code search: "dot-completion", "jump to definition" (type-aware), and debugger integration.

I still want language improvements. If I had a choice between C++03 with a great IDE, or C++1y (with its polymorphic lambdas), I'd choose the latter. Obviously we can have both :)

imho the old iterator loops are to be preferred over the new foreach loops; there you got the actual iterator in hand and not just the value it points to. you can do a lot of stuff if only you have the iterator!


I still like the old-style for loops; it might just be habit, but so often I want the index, e.g. when data is split between 2 sources. I know you can create abstracted iterators that do that for you too... sometimes it's more awkward to find the right abstraction; it depends, there's no magic bullet. But I definitely prefer for (auto& x: data){} over for (auto it=data.begin(); it!=data.end(); ++it) though. it's a godsend.

But basically, between the older "C" approach and the newer 'functional' approaches (replacing loops altogether with map/reduce/filter taking lambdas), I see the C++03 coding style as a messy 'worst of both worlds'. It's a mess we ended up with because C++ didn't have lambdas.

iterators have the same potential for bugs as pointers; the functional approach expresses more intent, which helps parallelize, and can often be more compact. (Haskell has some remarkably concise ways of doing things thanks to currying (a shortcut for writing lambdas) and the ability to write pipelines easily; often 1 line of Haskell can do the work of 5 lines of C++.)

I'm mostly convinced that higher-order functions are the way to go - the rival would be shape analysis or other new constructs (figuring out where a traditional C for loop can be done in parallel... looking at how you indexed to create "scatter"/"gather" stages, with a 'map' in the middle).

OpenMP does look good to me. No refactoring needed, just '#pragma omp'...
And after seeing C++AMP (restrict(amp) on functions & lambdas), I would suggest adding "for (....) restrict { .... }" as an official hint that the loop body can run in parallel; restrict means 'non-overlapping'.

But the real benefit of higher-order functions is being able to write data-flow - which is exactly what Epiphany needs - pass several lambdas for different stages into one construct, then the implementation in header libraries can manage the DMA transfers between stages. Write more header libraries to handle more cases, and re-use them throughout your codebase or between projects.

(e.g. the common 'list-comprehension' map+filter could be implemented as 2 groups of cores: the first does 'map' and sends the result to the second; the second does 'filter' and deals with allocating the results, with whatever buffering/partitioning scheme it takes to do that efficiently in parallel.)

my suggestion is, dig out some good IDE and improve it for easier parallel programming. I'd choose qt-creator, although my favourite is U++. but if you wish, you can just as well use clion. or preferably, write code-refactoring code that is portable among all 3 IDEs, code that works on linux-kde c++ as well as on windows java...


Well, I'm open to suggestions, but I have no reason to doubt the higher-order-function approach; it works very well.

And even going down the shape-analysis route, that would just be a 'shortcut' (an advanced way of searching your higher-order function library). (at some point we have to write the libraries of code & replacements.)

The lambda approach works great on SMP, and we're just talking about explicitly parallelising the bits that you lose on a little core. An Intel chip parallelizes your for loops at runtime, and you exploit that knowledge to optimise a single thread.

I know some people like graphical programming though, creating data flow by connecting function boxes with wires.

But I'd much rather see language improvements than hacking over it with an IDE.
graphical data-flow could be done by parsing & re-writing a language. C++ is awkward for this because it's SO hard to parse (ambiguous grammar).

e.g. Rust has a 'context-free' grammar... just a few more keywords like "let" and "fn", and ":" for types, make the syntax easier to parse.

in C++
Code:
a<b>c()
can mean several different things depending on how a, b, c are defined: "<" and ">" can be overloaded operators, or that might be a templated variable with a constructor. even
Code:
a*b;
might be calling a multiply operator (which might have side effects, so the compiler has to check), or declaring a variable 'b' with type 'pointer to a'. Here I'm not talking about 'intent', just the outright ambiguity of symbols. There's no benefit to 'a*b;' meaning both multiply and 'declare b as pointer to type a' in the same expression syntax, whilst there's plenty of benefit to having '*' overloaded (matrix maths, whatever).

In C++ the compiler must know the whole program context, through headers, just to figure out the basic syntax (AST).


In Rust you have to use "let" to declare something, and <T> usually only appears in type signatures, thanks to better inference (you occasionally pay the price of having to write ::<T> in the few places where you write it in expressions, but the trade-off is well worth it). Rust is usually two-thirds the size of equivalent C++ due to the streamlined syntax, even though some parts are more verbose in isolation ("let" for variables, and "fn" preceding a function definition) - the point is those explicit keywords simplify the syntax elsewhere.

That's part of why C++ is SO slow to compile. The compilation itself has so many more context-sensitive dependencies, due to a crappy syntax, due to it retrofitting complex ideas onto C in an awkward way.

Context-free languages would be much easier to compile in parallel - a different processor could compile each function or block of code... in C++ you have to step through more in serial just to figure out where the functions are.

I must emphasize: using templates or whatever pre-compiler programming will increase build-time and steal memory resources during compilation. if it isn't necessary, because a good IDE can do all that much more cheaply, why not use that option? why not code-generation instead of compile-time execution?


Yes, I know the long-compile-time hazard, and this is why I do drop back to C-like techniques sometimes.
It's why I don't like C++ iterators, which are just wrappers for pointers most of the time.

But we do this because the compile-time processing is so much more versatile. You don't need to use full optimisation level every time. C++ compile times are SO bad because of header explosion, and because the language is so difficult to parse.
What's amazingly useful is the ability to compile with extra debug information & asserts. A debug build can have bounds-checking, whilst the release version is efficient.

If you listen to Jonathan Blow - he puts the most common collection & data types into the compiler - an inbuilt [] means 'pointer+size'. Most C++ programmers think that's horrible, preferring the versatility of templated collections, but he says "this gives you fast compile times". Either is still vastly superior to having the IDE cut/paste "pointer+size" for you, when your 'intent' is really one entity. It's so common, so why not make it inbuilt.

Rust also used to have this - "[T]" (what C++ programmers call "std::vector<T>") and "~T" for "unique_ptr<T>" - but they changed it to work more like C++. At the time I objected in their community, for the reasons given above, but the C++ way was more popular. I wish they had kept it and just generalised it later ([T] could have been a shortcut to something user-defined, if you chose).

----------------------------------------------------

Anyway I've actually started trying to implement my scheme now as a tool manipulating LLVM IR.
In IR i can find the references to specific worker functions;
I can try to split the file into IR for the host & coprocessor.
The IR has had all the template parameters matched and so on, producing a sequence of simple operations, but still giving you the structure of functions & calls.

Maybe I can emulate it on a network ('client=host', 'server=coprocessor', 'network=DMA'), or just compile a separate exe to spawn as a process.

I'll basically be re-inventing a lightweight RPC protocol I guess, or something like 'map-reduce' .
I have kept looking around, but every existing framework I've seen is tailored to the network/cluster use-case, which assumes transfer is very general (arbitrary serialization) and much slower than in-memory manipulation.
I'll be writing it as something "intended" to be DMA, with networking emulating that.

I'll still have the sticking point that there's no LLVM support for Epiphany yet, but this is likely to appear eventually from the rest of the community; there are many other reasons to want it.
Failing that I might take a look at GCC IR, and another option would be LLVM IR->C, which has many potential uses.

There are many ways to achieve what I want; what would be ideal is finding an approach that overlaps with what others need.

The easier this is to do... the more people will have a reason to buy a Parallella board (or some future Epiphany PCI card).

The point with my approach is that people already parallelise this way for multicore, so it would make it easier to adapt existing C++ projects to run on this machine.

This is only awkward because of the split between different ISAs (host/client), which only exists (I suspect) because the ARM ISA must be licensed and Adapteva wanted to create new IP. You could just as easily have implemented an ARM core on a chip with no cache, or made a new host processor using the Epiphany ISA plus an MMU & cache to run a full OS and traditional code. I've seen systems with MIPS on scratchpads, even though MIPS usually has a cache.
dobkeratops
 
Posts: 189
Joined: Fri Jun 05, 2015 6:42 pm
Location: uk

Re: generating epiphany code from templates, possible?

Postby Silexica » Sun Oct 18, 2015 7:13 pm

Very interesting posts with some great ideas. I want to present our current solution, as it includes some bits and pieces from this thread as well. It is commercial though, so hopefully not off-topic in this thread/forum.

You can specify your application in a data-flow style, with each process node being regular C (C++ soon) code and FIFO communication between them. From there, our solution will analyse the application, perform performance estimation for the available processor cores and communication means, and distribute it automatically across a given homo- or heterogeneous target system in space and time. Using such a generated mapping (or a manually specified one) for communication and computation, we generate ready-to-run source code for the individual cores, which can then be compiled by native C compilers and executed on the board. This is enabled by our own compiler, which uses clang AST transformations to go from one AST to another. Supported targets are x86 pthread, ARM, TI OMAP, TI Keystone, and now the Parallella board as well, including performance estimation and code generation for both processor core types.

Looking forward to your thoughts or questions.

Cheers,
Max


PS: If you want to know more, have a look at the SLX Mapper: http://silexica.com/products/

PS2: there are lots of thoughts on future developments, including a free version for academic/personal use, open-sourcing the compiler, etc. so happy to hear your ideas
Silexica
 
Posts: 2
Joined: Sun Oct 18, 2015 6:56 pm

Re: generating epiphany code from templates, possible?

Postby dobkeratops » Fri Oct 30, 2015 4:32 am

PS: If you want to know more, have a look at the SLX Mapper: http://silexica.com/products/


thanks for the link. looks interesting
dobkeratops
 
