Some questions about OpenCL programming and epiphany

Post by elysium5290 » Tue Oct 13, 2015 1:59 pm

I am new to parallel programming.

I have some questions and hope somebody can help me.

First:

I read this topic:
http://parallella.org/forums/viewtopic.php?f=18&t=28&start=10&sid=22948c1cce44ac4ea8f25c7806830b5d

and found this picture:
http://imgur.com/Q3xefb8

It seems that the Epiphany 16-core co-processor has one compute unit.
It has no local memory or constant memory,
and it uses main memory as global memory.

If I have something wrong, please correct me.

Second:

I ran the coprthr-1.6.2/examples/bdt_nbody example.
I only changed the device type to CL_DEVICE_TYPE_ACCELERATOR. This is the output:
parallella@parallella:~/coprthr-1.6.2/examples/bdt_nbody$ ./bdt_nbody.x
coprthr-1.6.2 (Freewill)
[3656] clmesg info: cmdsched.c(86): cmdqx1: run
[3656] clmesg WARNING: command_queue.c(39): __do_create_command_queue_1: cmdq exists
[3656] clmesg WARNING: command_queue.c(39): __do_create_command_queue_1: cmdq exists

Running bdt_nbody - a simple GPU-accelerated NBody Simulation.
Copyright (c) 2008-2009 Brown Deer Technology, LLC.  All Rights Reserved.
This program is free software distributed under GPLv3.

nstep=0 nburst=2 nparticle=16384 gdt=1.000000e-04
device 0: CL_DEVICE_TYPE= ACCELERATOR
CL_DEVICE_VENDOR_ID=0
CL_DEVICE_MAX_COMPUTE_UNITS=16
CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS=3
CL_DEVICE_MAX_WORK_ITEM_SIZES= 1 (symmetric)
CL_DEVICE_MAX_WORK_GROUP_SIZE=16
CL_DEVICE_PREFERRED_VECTOR_WIDTH_CHAR=4
CL_DEVICE_PREFERRED_VECTOR_WIDTH_SHORT=2
CL_DEVICE_PREFERRED_VECTOR_WIDTH_INT=1
CL_DEVICE_PREFERRED_VECTOR_WIDTH_LONG=1
CL_DEVICE_PREFERRED_VECTOR_WIDTH_FLOAT=1
CL_DEVICE_PREFERRED_VECTOR_WIDTH_DOUBLE=1
CL_DEVICE_MAX_CLOCK_FREQUENCY=0
CL_DEVICE_ADDRESS_BITS=0x0
CL_DEVICE_IMAGE_SUPPORT=false
CL_DEVICE_MAX_PARAMETER_SIZE=256
CL_DEVICE_MEM_BASE_ADDRESS_ALIGN=32
CL_DEVICE_NAME=E16G Needham
CL_DEVICE_VENDOR=Adapteva, Inc.
CL_DEVICE_VERSION=unknown
CL_DRIVER_VERSION=unknown
CL_DEVICE_LOCAL_MEM_SIZE=0
[3656] clmesg info: cmdsched.c(86): cmdqx1: run
nbody_init 16384
[3656] checksum=2.961046e+04
[3656] clmesg info: cmdsched.c(181): cmdqx1: shutdown
parallella@parallella:~/coprthr-1.6.2/examples/bdt_nbody$


It seems that the co-processor has 16 compute units.
This result differs from the architecture image above.

Please help me.

eric

Re: Some questions about OpenCL programming and epiphany

Post by jar » Tue Oct 13, 2015 6:11 pm

I made that image quite a while ago, and you're correct to point out that the device has 16 compute units, each with one processing element (the image is incorrect). It was generated before I understood the specifics of the COPRTHR OpenCL implementation.

You can use OpenCL to program the Epiphany cores. You should create 16 work groups with 1 thread per work group. Creating more than 16 work groups or more than one thread per work group will oversubscribe the hardware and result in worse performance.
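To make the 16 x 1 launch shape concrete, here is a minimal Python sketch. It is not COPRTHR or OpenCL API code. It enumerates a one-dimensional range of 16 work groups with one work item each and splits the 16384 particles from the bdt_nbody run above evenly among them. The helper name `partition` and the contiguous-chunk split are assumptions made for illustration; the real example may distribute its work differently.

```python
def partition(n_items, n_groups=16, group_size=1):
    """Hypothetical helper: map each work item to a contiguous slice of n_items.

    Returns a list of (group_id, local_id, start, end) tuples, end exclusive.
    """
    global_size = n_groups * group_size      # total number of work items
    chunk, rem = divmod(n_items, global_size)
    slices = []
    start = 0
    for gid in range(global_size):
        # The first `rem` work items take one extra element each.
        size = chunk + (1 if gid < rem else 0)
        group_id, local_id = divmod(gid, group_size)
        slices.append((group_id, local_id, start, start + size))
        start += size
    return slices


if __name__ == "__main__":
    for group_id, local_id, lo, hi in partition(16384):
        print(f"group {group_id:2d} local {local_id}: particles [{lo}, {hi})")
```

With 16 groups of size 1, every work item has local id 0 and handles 1024 particles, so each Epiphany core gets exactly one thread.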

OpenCL, as the standard exists, lacks a mechanism for inter-core communication, which is just one of the many failures of the Apple/Khronos API design. The model was designed with GPUs from 2008 in mind because CUDA was winning. The OpenCL C language specifies memory locality, but not accessibility. Accessibility is implicitly determined by the locality within the standard. Thus, the standard fails to account for architectures like the Epiphany. Either you break the OpenCL standard by introducing non-standard communication mechanisms, resulting in non-portable code, or you keep to the standard and accept the poor performance achieved by global memory synchronization (a weak point of the current Parallella/Epiphany design).

If you're writing OpenCL applications for the Epiphany cores, you will probably be reading and writing global memory rather than reading and writing a neighboring core's local memory. OpenCL private memory and local (shared) memory are the same thing on the Epiphany. But since you have a work group size of 1, shared memory is a silly concept (shared with one core).

The OpenCL concept of constant memory does not exist in hardware on the Epiphany cores, but each core has access to 32 KB of core-local memory, so small constant data structures can be replicated on every core.

Hope that helps you. Sorry for the misunderstanding with the old figure.

Re: Some questions about OpenCL programming and epiphany

Post by elysium5290 » Mon Oct 19, 2015 9:14 pm

Hi, jar

Thank you very much for your answer.

To summarize what you said:

Programming the Parallella with "standard" OpenCL is a very stupid idea, because the Parallella board must access variables in global memory, right?
And you think that using the Epiphany SDK makes inter-core communication possible and gives more performance.

If I have misunderstood what you said, please let me know.
Thank you again.

Eric

Re: Some questions about OpenCL programming and epiphany

Post by jar » Tue Oct 20, 2015 9:50 pm

I think you got it so far, but there's a little more to it...

If your application has a very high arithmetic intensity (>100 ops/byte), then you can certainly use OpenCL with off-chip (global) memory access. Most applications do not fall into this category. I'll explain...

Global reads with the current firmware run at around 80 or 90 MB/s, and writes are about 3x that. Peak performance is around 19,200 MFLOP/s. Let's say you write excellent code and it achieves 50% of peak performance. You'll need 50% * 19,200 MFLOP/s / 90 MB/s ≈ 106 FLOPs/byte for the application not to become bandwidth-bound. Since a single-precision floating-point value is 4 bytes, that corresponds to about 424 floating-point operations per floating-point value.
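The estimate above can be reproduced in a few lines of Python. This is only a sketch of the arithmetic: the function name is made up for illustration, and the inputs are the figures quoted in the post (19,200 MFLOP/s peak, 50% efficiency, 90 MB/s global read bandwidth, 4-byte floats). Without rounding, the result is about 106.7 FLOPs per byte, or about 427 FLOPs per float; the post truncates to 106 before multiplying by 4, which gives 424.

```python
def required_intensity(peak_mflops, efficiency, bandwidth_mb_s):
    """FLOPs per byte needed so that compute, not global bandwidth, is the limit.

    MFLOP/s divided by MB/s gives FLOP/byte, so no unit conversion is needed.
    """
    return peak_mflops * efficiency / bandwidth_mb_s


flops_per_byte = required_intensity(19200.0, 0.5, 90.0)
flops_per_float = flops_per_byte * 4   # one single-precision float is 4 bytes

print(f"{flops_per_byte:.1f} FLOPs/byte, "
      f"{flops_per_float:.1f} FLOPs per 4-byte float")
```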

This is why on-chip data re-use is such a hard requirement. You're already fighting against the tyranny of the global bandwidth. And to further pile on, the Khronos OpenCL specification does not address inter-core memory access.
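One way to see the effect of data re-use is a simple roofline-style bound, which the post does not spell out and which is added here as an assumption: attainable performance is the smaller of the peak rate and the arithmetic intensity multiplied by the global bandwidth. The sketch below uses the 19,200 MFLOP/s and 90 MB/s figures from the post; the function name and the sample intensities are illustrative only.

```python
def attainable_mflops(flops_per_byte, peak_mflops=19200.0, bandwidth_mb_s=90.0):
    """Roofline-style upper bound on performance, in MFLOP/s.

    Below the ridge point the kernel is limited by global memory bandwidth;
    above it, by the peak compute rate.
    """
    return min(peak_mflops, flops_per_byte * bandwidth_mb_s)


if __name__ == "__main__":
    # Sample intensities in FLOPs per byte fetched from global memory.
    for intensity in (0.25, 1.0, 10.0, 100.0, 250.0):
        print(f"{intensity:7.2f} FLOPs/byte -> "
              f"{attainable_mflops(intensity):8.1f} MFLOP/s")
```

Under this model, re-using each byte fetched from global memory for more operations on-chip is the only way to move a kernel up toward the compute limit.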

Because Adapteva and Brown Deer Technology are pretty open about things, you have access to the ESDK routines if you would like to extend your OpenCL code with inter-core communication optimizations.

Re: Some questions about OpenCL programming and epiphany

Post by elysium5290 » Wed Oct 21, 2015 4:03 am

Hi, jar

Thanks for your explanation.

I understand it clearly now.


Eric

