Multi-task execution on Parallella


Postby vanchiramani » Wed Feb 08, 2017 9:54 am

Hello Forum members.

I have an application that employs pipeline parallelism. Suppose this application consists of tasks T0, T1, ..., T15, each executing a different function. My objective is to map these tasks to different Epiphany cores using the COPRTHR-2 and OpenSHMEM libraries.

Q1. Since each thread executes a different function, the program cannot be written like:
Code: Select all
if(tid == 0)
    call_fn0();
else if(tid == 1)
    call_fn1();
:
else
    call_fn15();

as it increases the SPM space requirement. Hence, I want a different kernel code for each thread. I searched but could not find any documentation on how to launch different threads on different cores. Is there any documentation on this?

Q2. Suppose a task group T0-T4 in this application wants to synchronize on a barrier or share a lock variable. How can this be implemented?

Any help is really appreciated. Thanks a lot in advance.

Best regards
V Vanchinathan
vanchiramani
 
Posts: 13
Joined: Tue Mar 29, 2016 8:41 am

Re: Multi-task execution on Parallella

Postby jar » Wed Feb 08, 2017 4:32 pm

V,

This is a tricky request, but I'm gonna take care of you :-)

Understand that the OpenSHMEM programming model is Single Program, Multiple Data streams (SPMD) in Flynn's taxonomy, but it seems like the natural paradigm you're requesting is Multiple Programs, Multiple Data streams (MPMD). We may still be able to make it work, however!

OpenSHMEM uses a Partitioned Global Address Space (PGAS) memory model and requires that each processing element (PE) or core have a symmetric memory footprint. This enables implicit remote address calculation when moving data from one PE to a remote PE. So all of your allocations for each subroutine will need to occur upfront. I hope they're not wildly different for each function and you can re-use allocations between the subroutines.

A1:
The COPRTHR-2 SDK enables a dynamic call feature. You can read about it here: https://arxiv.org/abs/1604.04207

Each of your call_fnN subroutines should be marked with __dynamic_call. Below is an example of how you would do it.

The OpenSHMEM standard says that the calls to shmem_malloc() must be symmetric. I haven't currently forced that within the implementation, but undefined behavior will occur if you break that promise.

A2:

RTFM on shmem_barrier (http://openshmem.org/site/sites/default ... 1.3.pdf#54). You'll see in the code how to do it. I tested it, and it works. For the lock half of your question, the OpenSHMEM spec also defines shmem_set_lock()/shmem_clear_lock() on a symmetric long variable.

Good luck!

test.c:
Code: Select all
#include <coprthr.h>
#include <shmem.h>

long pSync[SHMEM_BARRIER_SYNC_SIZE];

#define FNX(X) \
void __dynamic_call call_fn##X(void) \
{ host_printf("fn%d\n", X); }

#define FNXB(X,START,SIZE) \
void __dynamic_call call_fn##X(void) \
{ \
   shmem_barrier(START,0,SIZE,pSync); \
   host_printf("fn%d barrier\n", X); \
}

FNXB(0,0,5) FNXB(1,0,5) FNXB(2,0,5) FNXB(3,0,5) FNXB(4,0,5)
FNX(5) FNX(6) FNX(7) FNX(8) FNX(9) FNX(10)
FNX(11) FNX(12) FNX(13) FNX(14) FNX(15)

void (*pfn[])(void) = {
   &call_fn0,  &call_fn1,  &call_fn2,  &call_fn3,
   &call_fn4,  &call_fn5,  &call_fn6,  &call_fn7,
   &call_fn8,  &call_fn9,  &call_fn10, &call_fn11,
   &call_fn12, &call_fn13, &call_fn14, &call_fn15
};

int main(void)
{
   for (int i = 0; i < SHMEM_BARRIER_SYNC_SIZE; i++) pSync[i] = SHMEM_SYNC_VALUE;
   shmem_init();
   pfn[shmem_my_pe()]();
   shmem_finalize();
}


Compile with:
Code: Select all
coprcc -fhost -fdynamic-calls -std=c99 -I. -I/usr/local/browndeer/coprthr2/include -I../src/ -L/usr/local/browndeer/coprthr2/lib -lcoprthr2_dev -lcoprthr_hostcall -lesyscall -le-lib -L$PATH_TO_OPENSHMEM/src/ -lshmem test.c -o test.x


Run with:
Code: Select all
coprsh -np 16 ./test.x


Output:
Code: Select all
COPRTHR-2.0.1 (Anthem) build 20170131.1051
fn5
fn6
fn7
fn8
fn9
fn10
fn11
fn12
fn13
fn14
fn15
fn0 barrier
fn1 barrier
fn2 barrier
fn3 barrier
fn4 barrier


Edited for clarity and added a function call table to avoid using a switch or if/else if clause.
jar
 
Posts: 179
Joined: Mon Dec 17, 2012 3:27 am

Re: Multi-task execution on Parallella

Postby vanchiramani » Thu Feb 09, 2017 6:49 am

Dear Jar

Thanks a lot for your reply. I can successfully run the sample code that was provided. However, I have some difficulty in understanding certain concepts.

I have been programming the Parallella board using the traditional COPRTHR-2 stack, in which the host ARM processor generates data, allocates variables using coprthr_mem, copies the generated data, launches threads, and finally obtains the output and verifies that it is correct. Inside the device kernel, I use SHMEM for inter-thread communication. However, in the sample provided, every function is run directly on the Epiphany cores.
Q1. I would like to know how to initialize values and verify the output. The manual that was pointed out mentions that host functions for file I/O can be called directly from Epiphany cores. Is there an example of how to do this?

Since each core has a 32KB SPM, the instructions for each thread can fit in its local memory. Traditional COPRTHR-2 loads these functions directly into local memory before kernel launch. Hence, I was hoping we could achieve something similar, where each thread has its functions loaded in local memory. When I converted an existing void func() to __dynamic_call void func(), the execution time increased from 30 ms to 1200 ms.
Q2. If the above is not possible, must all functions called within each thread be declared using __dynamic_call?
Q3. There are some published numbers on the overhead of __dynamic_call functions. Will the overhead be high when many functions are called within a thread?
Q4. Currently, coprthr_dexec(dd,num_thr,kernel,(void*)&args_mem, 0); allows us to launch num_thr threads, starting from core 0. Even if the number of threads is less than 16, we cannot make use of the remaining cores. Is there a way to launch different e32 kernels on particular core IDs?

Thanks again!
V Vanchinathan

Re: Multi-task execution on Parallella

Postby jar » Thu Feb 09, 2017 3:28 pm

A1:
Keep using your off-chip code to initialize data and do whatever you have to do. The above example was meant to be a demonstration, not your particular solution.

The stdio routines supported are defined in host_stdio.h. As far as I know, they should be functionally equivalent to those in stdio.h. You just attach the prefix 'host_' or 'phost_': 'host_' allows an individual core to call the routine, while 'phost_' implies a parallel operation.

For example, to print:
Code: Select all
host_printf("hello, world\n");

Or an ordered, symmetric, parallel operation (this requires all cores to call it at the same time):
Code: Select all
phost_printf("hello, world\n");


Use the same method for host_fopen, etc.

A2/A3:
If you want true MPMD, you can't also use OpenSHMEM to communicate between the cores. You would need some method to discover the remote address of the resource you want to communicate with, whether it's statically defined or passed at runtime. OpenSHMEM and MPMD are incompatible. But you could use pieces of the library, like the high-performance memcpy (and non-blocking memcpy).

The increase in execution time is a direct result of the routines being copied dynamically at runtime rather than upfront. After the first call dynamically loads a routine, subsequent calls run at full speed.

You don't have to define all of your functions as __dynamic_call. Performance depends on how many dynamic calls there are and how large they are.

A4:
This is something dar is aware of, but it's probably lower on the list of priorities. You can email him directly (drichie + browndeer . com).

I haven't tested the OpenSHMEM library in a configuration like this either. Having two separate parallel workloads probably won't work right now. I would only consider supporting this after COPRTHR-2 does.

Re: Multi-task execution on Parallella

Postby vanchiramani » Fri Feb 10, 2017 8:38 am

Dear Jar

Thanks a lot for patiently answering the questions so far.

For dynamic calls, I run the functions once before measuring time with the timers. However, the execution time is still very high compared to non-dynamic calls. This led to my previous question.

Following is the device code (memory_device.c):
Code: Select all
#include <coprthr.h>
#include <host_stdio.h>
#include "e_lib.h"

typedef struct {
        int n; int* cc;
} my_args_t;

e_ctimer_config_t event_list[] =
{
        E_CTIMER_CLK, E_CTIMER_IDLE, E_CTIMER_IALU_INST, E_CTIMER_FPU_INST,     E_CTIMER_DUAL_INST,
        E_CTIMER_E1_STALLS, E_CTIMER_RA_STALLS, E_CTIMER_EXT_FETCH_STALLS, E_CTIMER_EXT_LOAD_STALLS
};

#pragma GCC push_options
#pragma GCC optimize("O0")
unsigned int mystart(int t)
{
        e_ctimer_set(E_CTIMER_1,  E_CTIMER_MAX);
        e_ctimer_start(E_CTIMER_1, event_list[t]);

        return e_ctimer_get(E_CTIMER_1);
}
unsigned int myend()
{
        return e_ctimer_get(E_CTIMER_1);
}
#pragma GCC pop_options

void __dynamic_call func(int* buf_c, int offset, int m)
{
        int i;
        int j = 111;
        // c = a + b over this thread's m-element local chunk
        // (index the local buffer from 0, not from offset, so we
        // stay within the sz bytes allocated below)
        for(i=0; i<m; i++)
                buf_c[i] += (offset+i)+j;
}

void __entry my_thread( void* p )
{
        int i, j;
        unsigned int t0[9], t1[9], elap[9] = {0};

        int tid = coprthr_get_thread_id();
        my_args_t* pargs = (my_args_t*)p;

        int n = pargs->n;
        int* cc = pargs->cc;

        int m = n/16;
        int offset = m*tid;
        int sz = m*sizeof(int);

        void* memfree = coprthr_tls_sbrk(0);
        // allocate local buffers of size n/16
        int* buf_c = (int*)coprthr_tls_sbrk(sz);

        for(j = 0; j < 100; j++)
                func(buf_c, offset, m);

        t0[0] = mystart(0);
        for(j = 0; j < n; j++)
        {
                func(buf_c, offset, m);
                // copy the local chunk back to its slice of the shared
                // array; cc is an int*, so cc + offset is already scaled
                coprthr_memcopy_align(cc + offset, buf_c, sz, COPRTHR2_M_DMA_0);
        }
        t1[0] = myend();
        elap[0] += t0[0] - t1[0]; // ctimer counts down, so start - end = elapsed

        host_printf("id = %d cycles = %d\n", tid, elap[0]);
        // clean up
        coprthr_tls_brk(memfree);
}


Following is the host code (memory_host.c):
Code: Select all
#include <stdio.h>
#include <stdlib.h>
#include <sys/time.h> /* gettimeofday */

#include "coprthr.h"
#include "coprthr_cc.h"
#include "coprthr_thread.h"

#define SIZE 2048

struct timeval t_startTime, t_stopTime;
unsigned long long elapsedTime;
struct my_args { int n; int* cc; };

int main(int argc, char* argv[])
{
        int i;
        int n = SIZE;

        /* open device for threads */
        int dd = coprthr_dopen(COPRTHR_DEVICE_E32,COPRTHR_O_THREAD);

        /* compile thread function */
        coprthr_program_t prg = coprthr_cc_read_bin("./memory_device.e32",0);
        coprthr_sym_t thr = coprthr_getsym(prg,"my_thread");

        /* allocate memory shared with coprocessor device */
        coprthr_mem_t cc_mem = coprthr_dmalloc(dd,n*sizeof(int),0);
        int* cc = (int*)coprthr_memptr(cc_mem,0);

        /* set args to pass to thread on coprocessor device */
        coprthr_mem_t args_mem = coprthr_dmalloc(dd,sizeof(struct my_args),0);
        struct my_args* pargs = (struct my_args*)coprthr_memptr(args_mem,0);
        pargs->n = n;
        pargs->cc = cc;

        /* initialize A, B, and C arrays */
        for (i=0; i<n; i++) {
                cc[i] = 3;
        }

        gettimeofday(&t_startTime, NULL);

        // Execute kernel on coprocessor device
        coprthr_dexec(dd,16,thr,(void*)&args_mem, 0 );
        coprthr_dwait(dd);

        gettimeofday(&t_stopTime, NULL);
        elapsedTime = (t_stopTime.tv_sec - t_startTime.tv_sec) * 1000000LL + t_stopTime.tv_usec - t_startTime.tv_usec;
        printf("%lld us\n", elapsedTime);

        /* clean up */
        coprthr_dfree(dd,args_mem);
        coprthr_dfree(dd,cc_mem);

        coprthr_dclose(dd);
}


Following is the compilation command:
Code: Select all
cc -O2  -g -I. -I/usr/local/browndeer/coprthr2/include  -c memory_host.c
cc -rdynamic -o memory.x memory_host.o -L/usr/local/browndeer/coprthr2/lib -lcoprthr -lcoprthrcc -lm -ldl
coprcc --info      memory_device.c \
       -o memory_device.e32


Additionally, is the available method for running MPMD on Epiphany to use e_load / e_load_group and remote calls?

Re: Multi-task execution on Parallella

Postby jar » Fri Feb 10, 2017 3:28 pm

Make sure you're compiling with '-fdynamic-calls' as in the example I showed. I think your code is executing the routines out of global DRAM, resulting in slow performance. It's not enough to just mark up your code. I should have been more explicit.

Re: Multi-task execution on Parallella

Postby vanchiramani » Sat Feb 11, 2017 8:43 am

Dear Jar

Thanks a lot for your help. It works now.

Best regards
V Vanchinathan

