Parallella Community Supercomputing for Everyone 2014-09-26T17:19:52+00:00 https://parallella.org/forums/feed.php?f=18&t=1703 2014-09-26T17:19:52+00:00 2014-09-26T17:19:52+00:00 https://parallella.org/forums/viewtopic.php?t=1703&p=10982#p10982 <![CDATA[Re: Sobel is ~35 times slower using OpenCL]]> From OpenCL you might be able to use some of the calls, but it's going to be non-standard anyway. I thought
about this and the issue will be allocating a shared mutex - this really goes outside of OpenCL. I have been
working with the coprthr API just this morning. If I can, I will try to follow up with some guidance and sample
code to show how to do it. The level will be that of pthreads which is really not different from OpenCL.
And for portability, you will essentially have a pthreads implementation, I think.

Statistics: Posted by dar — Fri Sep 26, 2014 5:19 pm


]]>
2014-09-26T16:50:17+00:00 2014-09-26T16:50:17+00:00 https://parallella.org/forums/viewtopic.php?t=1703&p=10981#p10981 <![CDATA[Re: Sobel is ~35 times slower using OpenCL]]>

Thanks for all the help, everyone. If there is any other way to make the OpenCL version faster, I would love to hear it. The eSDK is quite a bit more tedious to work with, so I would rather stay at a higher level if possible.

Statistics: Posted by stevenc — Fri Sep 26, 2014 4:50 pm


]]>
2014-09-26T16:23:09+00:00 2014-09-26T16:23:09+00:00 https://parallella.org/forums/viewtopic.php?t=1703&p=10980#p10980 <![CDATA[Re: Sobel is ~35 times slower using OpenCL]]>
eSDK version:


@bytefx: I am using pkg-config to locate those two packages. First of all, OpenCV needs to be installed (I built and installed it from source, though the apt-get version *may* work too). The COPRTHR library is installed by default, but its pkg-config file is not. The two files I am using for pkg-config are here:



and they need to be put into `/usr/local/lib/pkgconfig/`. You may also have to adjust the files depending on where your libraries are installed.

Statistics: Posted by stevenc — Fri Sep 26, 2014 4:23 pm


]]>
2014-09-25T15:26:38+00:00 2014-09-25T15:26:38+00:00 https://parallella.org/forums/viewtopic.php?t=1703&p=10967#p10967 <![CDATA[Re: Sobel is ~35 times slower using OpenCL]]>
I've been following this thread with much interest.

I tried compiling your code at .../local_memory on GitHub.

There seems to be a problem locating the stdcl.h include and lib. I downloaded and compiled the COPRTHR SDK and installed it.

Your code complains about missing packages:
a. coprthr
b. opencv

I would like to benchmark and study the current framework; it sounds efficient for a Sobel edge detector. It would be good to compare a Canny edge algorithm against plain vanilla gradient detection. From experience, generating the Gaussian is computationally expensive in Canny edge detection, and I was wondering what other optimisations exist.

Could you please add a README file on how to compile and run your code, either on GitHub or in this thread? My apologies for the distraction.

Statistics: Posted by bytefx — Thu Sep 25, 2014 3:26 pm


]]>
2014-09-18T10:42:06+00:00 2014-09-18T10:42:06+00:00 https://parallella.org/forums/viewtopic.php?t=1703&p=10813#p10813 <![CDATA[Re: Sobel is ~35 times slower using OpenCL]]>
As far as making the kernel persistent, let me look into this. The issue is that you might not have access to the necessary API via OpenCL or STDCL at the moment, but only through the "low-level" coprthr API, which provides pthreads extended to co-processors. The best way to do it is basically to use calls provided by that API. In the end, the distinctions here are not so dramatic, but more practical. So let me look before making further recommendations. You could introduce a hack like some of the "Ping-Pong" code you might have seen for Epiphany, where mailboxes are used, etc. The idea is that you want the kernel to wait until signaled by the host; then it does the transform, signals the host, and goes back to waiting. And of course you need a way to tell it you are done and it should exit. It's basically pthread programming. This would keep a persistent kernel.
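A minimal sketch of that wait / transform / signal loop, written with ordinary host-side pthreads rather than the coprthr API (whose calls are not shown in this thread). Everything here is an assumption made for illustration: `mailbox_t`, `worker_main`, `submit_and_wait`, `shutdown_worker` and `run_demo` are hypothetical names, and `transform` is a stand-in that simply doubles each value instead of running a Sobel filter.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical mailbox shared by the "host" and the persistent worker. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    bool work_ready;   /* host -> worker: a buffer is waiting        */
    bool work_done;    /* worker -> host: the buffer was processed   */
    bool quit;         /* host -> worker: leave the loop and exit    */
    int *data;
    size_t n;
} mailbox_t;

/* Stand-in for the real transform: doubles every element in place. */
void transform(int *data, size_t n)
{
    for (size_t i = 0; i < n; i++)
        data[i] *= 2;
}

/* Persistent worker: wait for a signal, transform, signal back, repeat. */
void *worker_main(void *arg)
{
    mailbox_t *mb = arg;
    pthread_mutex_lock(&mb->lock);
    for (;;) {
        while (!mb->work_ready && !mb->quit)
            pthread_cond_wait(&mb->cond, &mb->lock);
        if (mb->quit)
            break;
        transform(mb->data, mb->n);
        mb->work_ready = false;
        mb->work_done = true;
        pthread_cond_broadcast(&mb->cond);
    }
    pthread_mutex_unlock(&mb->lock);
    return NULL;
}

/* Host side: hand over one buffer and block until it has been processed. */
void submit_and_wait(mailbox_t *mb, int *data, size_t n)
{
    pthread_mutex_lock(&mb->lock);
    mb->data = data;
    mb->n = n;
    mb->work_done = false;
    mb->work_ready = true;
    pthread_cond_broadcast(&mb->cond);
    while (!mb->work_done)
        pthread_cond_wait(&mb->cond, &mb->lock);
    pthread_mutex_unlock(&mb->lock);
}

/* Host side: tell the worker to exit, then wait for the thread to finish. */
void shutdown_worker(mailbox_t *mb, pthread_t tid)
{
    pthread_mutex_lock(&mb->lock);
    mb->quit = true;
    pthread_cond_broadcast(&mb->cond);
    pthread_mutex_unlock(&mb->lock);
    pthread_join(tid, NULL);
}

/* Demo: start the worker once, push the same buffer through it three
 * times, then shut it down. Returns the sum of the processed values,
 * or -1 if the thread could not be created. */
int run_demo(void)
{
    mailbox_t mb = { .work_ready = false, .work_done = false,
                     .quit = false, .data = NULL, .n = 0 };
    pthread_t tid;
    int frame[4] = { 1, 2, 3, 4 };

    pthread_mutex_init(&mb.lock, NULL);
    pthread_cond_init(&mb.cond, NULL);
    if (pthread_create(&tid, NULL, worker_main, &mb) != 0)
        return -1;
    for (int i = 0; i < 3; i++)
        submit_and_wait(&mb, frame, 4);
    shutdown_worker(&mb, tid);
    pthread_cond_destroy(&mb.cond);
    pthread_mutex_destroy(&mb.lock);
    return frame[0] + frame[1] + frame[2] + frame[3];
}
```

The thread is created once and reused for every buffer, so the start-up cost is paid a single time instead of once per frame. On Epiphany the same shape would need a mutex or mailbox that both host and cores can see, which is the part that falls outside plain OpenCL.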

Statistics: Posted by dar — Thu Sep 18, 2014 10:42 am


]]>
2014-09-26T16:27:15+00:00 2014-09-17T18:48:25+00:00 https://parallella.org/forums/viewtopic.php?t=1703&p=10798#p10798 <![CDATA[Re: Sobel is ~35 times slower using OpenCL]]>
I have made a new version of the kernel that uses local memory for the data. It follows dar's pseudo-code, except that the bb array is global to the tiles, though local to the kernel. Whether bb was local or global to the tiles made no difference to the speed. These changes have dropped the time down to ~0.42 seconds per frame, which is getting closer to the serial speed: it's only about 3.5 times slower now, and about 14 times faster than where I started.
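For readers without the kernel source at hand, here is a plain-C sketch of the general idea of staging data in a small local buffer before filtering. It is not the kernel from this thread: `sobel_row_local`, `TILE_W` and the three-row staging buffer are assumptions chosen for illustration, and the gradient magnitude is approximated as |gx| + |gy| clamped to 255.

```c
#include <stdlib.h>
#include <string.h>

#define TILE_W 64  /* assumed maximum row width that fits the local buffer */

/* Compute one output row (y) of a Sobel filter on an 8-bit grayscale image.
 * The three input rows needed are first copied from `src` (standing in for
 * slow shared memory) into a small local array, and all arithmetic then
 * reads from that local copy. Border pixels of the row are set to 0.
 * Returns 0 on success, -1 if the arguments are out of range. */
int sobel_row_local(const unsigned char *src, unsigned char *dst,
                    int width, int height, int y)
{
    unsigned char local[3][TILE_W];

    if (width < 3 || width > TILE_W || y < 1 || y >= height - 1)
        return -1;

    /* Stage rows y-1, y and y+1 with one bulk copy each. */
    for (int r = 0; r < 3; r++)
        memcpy(local[r], src + (size_t)(y - 1 + r) * width, (size_t)width);

    dst[(size_t)y * width] = 0;
    dst[(size_t)y * width + width - 1] = 0;

    for (int x = 1; x < width - 1; x++) {
        /* Horizontal and vertical Sobel responses. */
        int gx = (local[0][x + 1] + 2 * local[1][x + 1] + local[2][x + 1])
               - (local[0][x - 1] + 2 * local[1][x - 1] + local[2][x - 1]);
        int gy = (local[2][x - 1] + 2 * local[2][x] + local[2][x + 1])
               - (local[0][x - 1] + 2 * local[0][x] + local[0][x + 1]);
        int mag = abs(gx) + abs(gy);
        dst[(size_t)y * width + x] = (unsigned char)(mag > 255 ? 255 : mag);
    }
    return 0;
}
```

Each pixel of the 3x3 neighbourhood is read several times, so reading it from a local copy instead of shared memory is where the saving would come from.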

Better yet, if I continue to assume there is about 0.38 seconds of overhead (see post on 9-4), then the kernel computation only takes around 0.04 seconds per frame (0.42 - 0.38), which is very good. It seems the remaining problem is just the kernel overhead? Perhaps the persistent kernel idea would help with this?

This version of the kernel:

Statistics: Posted by stevenc — Wed Sep 17, 2014 6:48 pm


]]>
2014-09-15T23:04:03+00:00 2014-09-15T23:04:03+00:00 https://parallella.org/forums/viewtopic.php?t=1703&p=10765#p10765 <![CDATA[Re: Sobel is ~35 times slower using OpenCL]]> Statistics: Posted by dar — Mon Sep 15, 2014 11:04 pm


]]>
2014-09-15T12:16:40+00:00 2014-09-15T12:16:40+00:00 https://parallella.org/forums/viewtopic.php?t=1703&p=10750#p10750 <![CDATA[Re: Sobel is ~35 times slower using OpenCL]]> Statistics: Posted by sebraa — Mon Sep 15, 2014 12:16 pm


]]>
2014-09-12T14:52:42+00:00 2014-09-12T14:52:42+00:00 https://parallella.org/forums/viewtopic.php?t=1703&p=10714#p10714 <![CDATA[Re: Sobel is ~35 times slower using OpenCL]]> Statistics: Posted by stevenc — Fri Sep 12, 2014 2:52 pm


]]>
2014-09-12T01:27:03+00:00 2014-09-12T01:27:03+00:00 https://parallella.org/forums/viewtopic.php?t=1703&p=10706#p10706 <![CDATA[Re: Sobel is ~35 times slower using OpenCL]]> Statistics: Posted by dar — Fri Sep 12, 2014 1:27 am


]]>
2014-09-11T13:45:47+00:00 2014-09-11T13:45:47+00:00 https://parallella.org/forums/viewtopic.php?t=1703&p=10699#p10699 <![CDATA[Re: Sobel is ~35 times slower using OpenCL]]> extremely drags down performance. In one instance for me, enabling 15 additional cores (executing code from shared memory, with local data) meant a difference between "one iteration per second" and "no iterations after 20 minutes yet" of the algorithm I use.

Statistics: Posted by sebraa — Thu Sep 11, 2014 1:45 pm


]]>
2014-09-10T16:46:27+00:00 2014-09-10T16:46:27+00:00 https://parallella.org/forums/viewtopic.php?t=1703&p=10687#p10687 <![CDATA[Re: Sobel is ~35 times slower using OpenCL]]>
Making the kernel persistent does seem like a good idea, but a concrete example would help. I read through the blog post, but I'm not sure which part would make the kernel persistent or really how to go about doing this.

I have changed the code to use the shared memory as much as possible. So the grayscale image and the output image are placed directly in shared memory without needing to copy. I also switched the commenting on clflush and clwait. I didn't notice any difference in behavior with/without either of these calls, but I suppose wait is a good one to use anyway.

With this the time is down to ~0.67 seconds per frame, so it did help a little.

Thank you for taking the time to look into this.

Statistics: Posted by stevenc — Wed Sep 10, 2014 4:46 pm


]]>
2014-09-10T14:52:53+00:00 2014-09-10T14:52:53+00:00 https://parallella.org/forums/viewtopic.php?t=1703&p=10684#p10684 <![CDATA[Re: Sobel is ~35 times slower using OpenCL]]>
I noticed you must make a copy of the image to an allocation obtained with clmalloc() from one that I presume is stored within some OpenCV object. Ultimately it would be better to just use a clmalloc'd allocation of shareable device memory directly. The problem people run into is that, as in this case, they do not actually control the allocation. If you can provide OpenCV with an allocator, you can do what we do with std::vector<> and boost::multi_array where we do just this - provide a clmalloc() based allocator and then like magic our memory is shareable. (And for an even faster design you could use UVA which is not officially supported, but we had a switch for this in COPRTHR that effectively enabled a unified address space - for absolute speed that would eliminate all offload copies.)
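The allocator idea can be sketched in plain C as follows. This is only an analogy under stated assumptions: `image_t`, `image_create` and `image_destroy` are hypothetical, and `shared_alloc` / `shared_free` are stand-ins built on `malloc` / `free` at the point where a real program would call `clmalloc()` and its matching free routine. It does not show OpenCV's actual allocator interface.

```c
#include <stdlib.h>
#include <stddef.h>

/* Allocator hooks that a container accepts instead of calling malloc itself. */
typedef void *(*alloc_fn)(size_t bytes);
typedef void  (*free_fn)(void *ptr);

/* Hypothetical image container that remembers how to release its pixels. */
typedef struct {
    int width, height;
    unsigned char *pixels;
    free_fn release;
} image_t;

/* Counts how many buffers came from the "shareable" allocator. */
int shared_allocs = 0;

/* Stand-in for a device-shareable allocator (assumption: a real program
 * would obtain the memory from clmalloc() here). */
void *shared_alloc(size_t bytes)
{
    shared_allocs++;
    return malloc(bytes);
}

/* Stand-in for the matching release call. */
void shared_free(void *ptr)
{
    free(ptr);
}

/* Create an image whose pixel storage comes from the caller's allocator.
 * Returns 0 on success, -1 on bad dimensions or allocation failure. */
int image_create(image_t *img, int width, int height,
                 alloc_fn alloc, free_fn release)
{
    if (width <= 0 || height <= 0)
        return -1;
    img->pixels = alloc((size_t)width * (size_t)height);
    if (img->pixels == NULL)
        return -1;
    img->width = width;
    img->height = height;
    img->release = release;
    return 0;
}

/* Release the pixel storage through the allocator that produced it. */
void image_destroy(image_t *img)
{
    img->release(img->pixels);
    img->pixels = NULL;
}
```

Because the container never allocates on its own, the pixel buffer is already in the memory the device can see, and the extra copy into a separate device allocation disappears.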

Small point, I noticed you commented out clwait() but kept clflush() - you probably want to reverse this - clflush() was used to help along older GPU SDKs that were not designed to execute kernels as soon as they were enqueued. COPRTHR design for Epiphany will take up anything in the queue as soon as it shows up. However, since your calls use the NOWAIT flag you must wait on completion before using the results.

It's useful that your code is posted on GitHub - let me try to take a look and see if any immediate suggestions come to mind as a start.

-DAR

Statistics: Posted by dar — Wed Sep 10, 2014 2:52 pm


]]>
2014-09-04T17:36:54+00:00 2014-09-04T17:36:54+00:00 https://parallella.org/forums/viewtopic.php?t=1703&p=10596#p10596 <![CDATA[Re: Sobel is ~35 times slower using OpenCL]]>
As for where the memory is placed: what I tried to do is place the entire frame in a single buffer in the device shared memory, so the frame would be transferred once and each Epiphany core could just read the part it needs (there is no core-to-core communication needed). This was what I intended to do, but perhaps I misunderstand the architecture.
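One way to express "each core reads the part it needs" is a row partition over the single shared frame. The sketch below is an assumption about how such a split could look, not the code from this thread; `core_row_range` is a hypothetical helper, and it ignores the one-row overlap a 3x3 filter would need at band edges.

```c
/* Compute the half-open row range [*first, *last) handled by core `core`
 * out of `ncores`, for a frame with `height` rows. When the rows do not
 * divide evenly, the lowest-numbered cores each take one extra row.
 * Returns 0 on success, -1 if the arguments are out of range. */
int core_row_range(int height, int ncores, int core, int *first, int *last)
{
    if (height < 0 || ncores <= 0 || core < 0 || core >= ncores)
        return -1;

    int base  = height / ncores;   /* rows every core gets          */
    int extra = height % ncores;   /* leftover rows to hand out     */

    *first = core * base + (core < extra ? core : extra);
    *last  = *first + base + (core < extra ? 1 : 0);
    return 0;
}
```

With a row-major 8-bit frame, a core would start reading at byte offset `first * width` in the shared buffer, so no data has to pass between cores.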

Statistics: Posted by stevenc — Thu Sep 04, 2014 5:36 pm


]]>
2014-09-04T16:20:53+00:00 2014-09-04T16:20:53+00:00 https://parallella.org/forums/viewtopic.php?t=1703&p=10592#p10592 <![CDATA[Re: Sobel is ~35 times slower using OpenCL]]>

viewtopic.php?f=8&t=256&hilit=alexander

Also, check out the Epiphany benchmark by openwall on 'bcrypt':

http://www.openwall.com/presentations/P ... Slides.pdf

Statistics: Posted by aolofsson — Thu Sep 04, 2014 4:20 pm


]]>