Parallella Sorting


Postby Megamax667 » Sun Jan 11, 2015 8:01 pm

So, this all began when I was looking through the forum and didn't see any examples of sorting simple data sets on the Epiphany chip, so I made one. The reason I didn't put this under the examples forum is that it seems to be more than just an example. I also wasn't sure this was the right place for it, because this forum mostly has generic representations of algorithms, but I think it fits best here.

It takes arguments to specify the data set that will be sorted. The data set is first sorted and timed on the ARM chip, then sorted and timed on the Epiphany chip. The data set is randomly generated every run within the min/max values given by the arguments; these bounds are INCLUSIVE. You can also select the number of values to generate, the columns of cores to use, and the rows of cores to use. Right now input is not validated, so make sure your input is correctly formatted when you run it. To run the program, simply execute the run.sh shell script and give it the arguments in the following order:
    rows = rows of cores to use
    cols = columns of cores to use
    data_size = The number of values to sort
    min = The minimum value allowed in the data set
    max = The maximum value allowed in the data set
    debug = Whether to print out debug statements or not. A 't' or 'f'


The limits on max and min are what can be stored in a 32-bit integer, so -2147483648 for min and +2147483647 for max. I've estimated that each core can store about 7166 integers for sorting. FYI, I have observed that after generating the array of random values and splitting it, the largest single array is usually just over N/CORES, where N is the data size and CORES is the total number of cores used. That said, the largest data size I've tried is 50000.
Ex.

Code: Select all
sh run.sh 2 4 20000 -112 235 f


In this case it will use 8 total cores, comprised of 2 rows and 4 columns of cores. It will generate 20000 values between -112 and 235, inclusive. It will not print debug statements.

Here is a URL to the source code posted on GitHub. There is a compiled build of the project in the repository, as well as a build script.

For sorting, the program currently uses bubble sort. I know that's slow, but the code is structured so that adding other sorting algorithms should hopefully be easy. Bubble sort is also a nice in-place sort that requires no memory beyond what the list already occupies, and it's easy to write and understand.

I'm low on free time right now however I plan to add several features to this code. A few of which are listed below.
    Double check input
    Better debug statements
    Double check lists are sorted correctly
    Possibly recombining the sorted lists from the cores into a single list for output
    Adding other types of sorting: Merge sort, heap sort, etc.

So I hope you all enjoyed reading and glossing over everything. Again, I hope to make additions in the coming weeks and I also hope perhaps this helps some of you see the potential in parallel computing if you didn't see it already. Thanks again for reading!

P.S. I'm not the best coder; I've only been programming for a couple of years, and in C specifically for about a year, so my code and conventions may not be perfect. That said, I am open to any suggestions, ideas, or criticism of my code. There is always room for improvement, and that's what I plan on doing. :)
Megamax667
 
Posts: 7
Joined: Fri Jul 11, 2014 4:44 pm

Re: Parallella Sorting

Postby over9000 » Tue Jan 13, 2015 5:17 am

You should check out merge sort.

I'd ship the data to the cores in chunks, then use heap sort on the data while it's there (it can work in place, though it isn't a "stable" sort), and use other spare cores or the CPU to merge the sorted lists. Use the main memory available to the CPU as a form of backing storage, and ship the merged lists out to disk if that runs out.

edit: I didn't actually notice that you mentioned both heap and merge sort already. I'll just add that both add challenges and opportunities...

* an effective merge sort will depend on how you lay out the programs running on each core, but will ultimately be limited by CPU<->Epiphany bandwidth
* the actual merge step takes very little time, so it might be worth combining two heap sorts and a merge in each core
* heap sort has nice, (fairly) predictable behaviour for adding new elements and removing the minimum (or maximum); you can use its guaranteed worst-case behaviour to help with scheduling
* it's also more efficient to add multiple data points to a heap in one go rather than adding them one at a time
* you could use multi-buffering to feed multiple heaps from the bottom (supplying one full level at a time) so that you interleave data transfer (DMA) with computation as much as possible
over9000
 
Posts: 98
Joined: Tue Aug 06, 2013 1:49 am

