Would it be possible to get an approach like this working on the Epiphany with 'Boost.Compute'? (It seems to use tricks like Boost.Lambda to create OpenCL kernel code; basically it uses template magic to roll 'function objects' and compile them as OpenCL kernels.)
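For reference, this is roughly what the standard Boost.Compute pattern looks like on the host side (a minimal sketch from memory of the Boost.Compute docs; the lambda placeholder `_1` is what gets turned into OpenCL C source and compiled at runtime):

```cpp
// Minimal Boost.Compute sketch: the expression built from _1 is compiled
// into an OpenCL kernel at runtime and run on whatever device is selected.
#include <vector>
#include <boost/compute/system.hpp>
#include <boost/compute/algorithm/copy.hpp>
#include <boost/compute/algorithm/transform.hpp>
#include <boost/compute/container/vector.hpp>
#include <boost/compute/lambda.hpp>

namespace compute = boost::compute;

int main()
{
    using compute::lambda::_1;

    compute::device device = compute::system::default_device();
    compute::context ctx(device);
    compute::command_queue queue(ctx, device);

    std::vector<float> host = {1.0f, 2.0f, 3.0f, 4.0f};

    // copy input data to the device
    compute::vector<float> dev(host.size(), ctx);
    compute::copy(host.begin(), host.end(), dev.begin(), queue);

    // the 'function object' _1 * 2.0f + 1.0f becomes an OpenCL kernel
    compute::transform(dev.begin(), dev.end(), dev.begin(),
                       _1 * 2.0f + 1.0f, queue);

    // copy the result back to the host
    compute::copy(dev.begin(), dev.end(), host.begin(), queue);
    return 0;
}
```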
The appeal would be writing portable code (CPU, GPU, e-cores...); you could write iterators that work on both the GPU and the Epiphany, and on the Epiphany they'd do something smarter with the dataflow where possible.
It's not as nice as real lambda functions, but with enough 'iterators' it might cover a lot of useful cases.
(e.g. one could write a 'convolution iterator' and pass in a function to apply to the result, like the activation function for a neural net; or rely on some expression-template magic to combine 'convolve' & 'map' operations. See the sketch below.)
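Here's a sketch of the "pass a function into the pipeline" half of that idea, using only real Boost.Compute pieces: `BOOST_COMPUTE_FUNCTION` generates OpenCL source for a user-supplied function and `compute::transform` applies it on the device. The 'convolution iterator' itself is hypothetical (it doesn't exist in Boost.Compute today); it would slot in where `conv_out.begin()` is, so that convolve + activate fuse into one kernel.

```cpp
// Sketch: user-defined activation function applied on the device.
// A hypothetical convolution iterator would replace conv_out.begin(),
// fusing the convolution with the map into a single generated kernel.
#include <boost/compute/system.hpp>
#include <boost/compute/function.hpp>
#include <boost/compute/algorithm/transform.hpp>
#include <boost/compute/container/vector.hpp>

namespace compute = boost::compute;

// activation function for the net, compiled to OpenCL C at runtime
BOOST_COMPUTE_FUNCTION(float, relu, (float x),
{
    return x > 0.0f ? x : 0.0f;
});

void activate(compute::vector<float>& conv_out, compute::command_queue& queue)
{
    // apply the activation to the (already computed) convolution output;
    // an expression-template front end could fuse this with the convolution
    compute::transform(conv_out.begin(), conv_out.end(), conv_out.begin(),
                       relu, queue);
}
```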
I think 'Boost.Compute' still relies on the ability to invoke the OpenCL compiler from within an application.
I envisage the framework rolling stub code in the Epiphany kernels for synchronization/DMA, and inlining the code generated by the lambda function objects (see the hypothetical sketch below).
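A purely hypothetical illustration of what such a generated per-core stub might look like, using the `e_dma_copy` / `e_barrier` calls from the Epiphany e-lib (names from memory; the header name, buffer addresses and TILE size are placeholders). The inner loop stands in for the code the lambda/function-object would have produced:

```cpp
/* Hypothetical generated e-core stub: DMA a tile in, run the inlined user
 * expression (here x * 2 + 1, i.e. what the lambda _1 * 2 + 1 would emit),
 * DMA the result out, then synchronize with the other cores. */
#include <e-lib.h>   /* Epiphany e-lib; header name may vary by SDK version */

#define TILE 256

static float local_in[TILE];
static float local_out[TILE];

static e_barrier_t  bar[16];
static e_barrier_t *tgt_bar[16];

int main(void)
{
    /* placeholder addresses for shared-DRAM staging buffers */
    float *src = (float *)0x8f000000;
    float *dst = (float *)0x8f100000;

    e_barrier_init(bar, tgt_bar);

    /* stub: DMA the input tile into local memory */
    e_dma_copy(local_in, src, TILE * sizeof(float));

    /* inlined user code (what the lambda/function-object expands to) */
    for (int i = 0; i < TILE; i++)
        local_out[i] = local_in[i] * 2.0f + 1.0f;

    /* stub: DMA the result out, then sync with the other cores */
    e_dma_copy(dst, local_out, TILE * sizeof(float));
    e_barrier(bar, tgt_bar);

    return 0;
}
```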
(Other inspiration along these lines is Microsoft's C++ AMP. I had some more complex ideas in mind originally: writing an LLVM preprocessor to extract functions during the build process, but I'm not sure there's LLVM support for the Epiphany.)