
Introduction to parallel programming with 'foreach'

PostPosted: Sun Sep 01, 2013 12:23 pm
by David.S
This blog post may be interesting to some of you: it's an introduction to parallel computation with the 'foreach' package.
It looks like going parallel in R could be pretty straightforward once we have Epiphany support.

Re: Introduction to parallel programming with 'foreach'

PostPosted: Tue Sep 03, 2013 2:34 pm
by censix
You are right, 'foreach' is certainly one possibility to parallelize R functions.

Also keep in mind that, I think as of R version 2.14 or so, and certainly since 2.15, the standard R install contains the 'parallel' package and functions such as 'mclapply', a parallel version of 'lapply' that enables concurrent function evaluation. The tricky bit, however, is in the details: out of the box, 'mclapply' can only run as many tasks concurrently as there are processor cores, e.g. 4.
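For anyone who hasn't tried it, a minimal sketch of what that looks like (the function and data here are illustrative, not from any real workload):

```r
library(parallel)

# stand-in for a real, expensive computation
slow_square <- function(x) {
  Sys.sleep(0.01)
  x^2
}

# mclapply forks the current R process (so this path is not available on
# Windows); mc.cores caps concurrency, typically at the physical core count
results <- mclapply(1:100, slow_square, mc.cores = 4)
```

The result is an ordinary list, just as `lapply` would return, which is what makes it such a drop-in replacement.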

Making this work on an E16 would be great!

Re: Introduction to parallel programming with 'foreach'

PostPosted: Tue Sep 10, 2013 7:42 pm
by cozymandias
The simplicity of being able to use foreach and plug in a couple of Parallellas whenever I want more power was a big motivator in my decision to purchase some boards. The best part about foreach is that it scales very easily, since the same code runs on a single core or on multiple cores. Which parallels well (pun intended) with the scalability that Parallella allows.

As censix mentions, R has done a good bit to build/integrate core parallel functions for more flexibility and use in tailored applications, so there are several options.

I haven't received my boards yet, but I'm hoping they come in soon and Epiphany support isn't too far behind.

Re: Introduction to parallel programming with 'foreach'

PostPosted: Wed Sep 11, 2013 6:59 am
by 9600
cozymandias wrote: I haven't received my boards yet, but I'm hoping they come in soon and Epiphany support isn't too far behind.

Let me know if you (or anyone else for that matter) would like to try this out on a prototype, as I can easily provide network access to one.



Re: Introduction to parallel programming with 'foreach'

PostPosted: Fri Sep 13, 2013 11:59 am
by censix

I agree that 'foreach' is a nice wrapper for parallelization; however, the package itself is just that, a wrapper for parallel functionality that has to be implemented elsewhere: either by the native 'parallel' package, the 'doSNOW' package, or the 'doMC' package.

So what is needed is an R package, maybe called 'doEpiphany' or similar, that provides the parallelization, i.e. instantiates the workers ...
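That hypothetical 'doEpiphany' package would register itself with foreach the same way the existing backends do. A sketch of the registration pattern, using the real 'doParallel' backend (cluster size here is illustrative):

```r
library(foreach)
library(doParallel)

cl <- makeCluster(4)     # spin up 4 local worker processes
registerDoParallel(cl)   # tell foreach to dispatch %dopar% work to them

# the loop body is backend-agnostic; swapping backends needs no code changes here
squares <- foreach(i = 1:8, .combine = c) %dopar% i^2

stopCluster(cl)
```

The nice part of this design is that the loop itself never mentions the backend, so an Epiphany backend could slot in behind unchanged user code.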

Re: Introduction to parallel programming with 'foreach'

PostPosted: Tue Sep 24, 2013 10:21 pm
by gtg302v
I've been using foreach for the last few days, and on the surface it's very easy, but I've run into a few cautions that I'll share for anyone interested:

All the examples I've seen compute a single quantity within the foreach loop that can be combined with a built-in .combine argument (such as rbind). In some analysis I'm running now, what needs to happen in the loop is more complex and requires storing several variables of different data types. This is still possible, either by writing your own combine function, or by returning all the objects you want to preserve as a list in the last line of code within the foreach loop, e.g.:

result <- foreach(iterator, .combine, parameters, etc.) %dopar% {

  a <- some code
  b <- some code
  c <- some code

  list(a, b, c)  # the last expression returns everything you want to keep
}

Then result will be a list of lists that you can iterate over and recombine as you desire.

The bigger sticking point for me was memory management. Without some tweaking, the parallel backend packages basically duplicate everything the code inside the foreach loop needs onto each worker. In my case I have two lists with training and validation data, each containing about 30 million records. My original iterator was just an index on a data frame with an id from the test set and an id from the validation set (the objective is to return the probability that the data in each set is from the same source). The workspace consumes about 6 GB of memory just with all the objects loaded to start with, so the first time I did this it blew up pretty quickly. The 'iterators' package helps alleviate this.
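A sketch of the iterator trick, assuming the 'iterators' package (the data frame and grouping here are illustrative): isplit() feeds each task only its own slice, instead of shipping the full object to every worker.

```r
library(foreach)
library(iterators)

# illustrative stand-in for a large data set
big_df <- data.frame(grp = rep(1:10, each = 100), value = rnorm(1000))

# isplit() yields one group at a time as list(value = ..., key = ...);
# %do% keeps this runnable without a registered parallel backend
grp_sums <- foreach(s = isplit(big_df$value, big_df$grp), .combine = c) %do% {
  sum(s$value)
}
```

With a parallel backend registered, the same loop runs under %dopar%, and each worker only ever sees one group's worth of data.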

So my conclusion: foreach is definitely a nice wrapper that makes parallel processing accessible, but for some applications it does require a bit of digging, and probably restructuring your code, to make it perform well. As with pretty much all packages :)


Re: Introduction to parallel programming with 'foreach'

PostPosted: Thu Sep 26, 2013 6:13 pm
by censix
Useful to know, thanks for those insights.
How long does it take when you run this over the 30 million test + 30 million validation records? (And with how many processor cores?)
I suppose you have at least 16 GB of RAM?

Re: Introduction to parallel programming with 'foreach'

PostPosted: Fri Sep 27, 2013 3:18 pm
by gtg302v
Through a series of unfortunate late-night Amazon sorties I have 24 GB of memory on my machine :)

The performance gain i got using foreach is about a factor of 3 with 4 processors supporting 4 workers.

In each iteration of the loop I'm pulling a sample out of the test set that's five columns and 300 rows, and a set of data out of the training set that's five columns and anywhere from a few thousand to many hundreds of thousands of rows, and computing several quantities. Both sets are stored in lists where each item in the list is either a data set for a specified device in the training data or a test sequence id in the test data. Indexing this way, or some other way, maybe through a database connection, is essential, since subsetting a huge data frame is slow.

I compute the times between samples of the test sequence, pull the precomputed times between samples for the training device data and compare their distributions with a KS test, compute the correlation of three of the columns in both data sets and their differences, compute the difference in means for one column in each, and compute the difference in mode sampling rate between the two sets.

Then I pull a precomputed glm object from a list (built solely on the training data) and get predictions from those computed quantities.

The actual length of the iteration is only 90,024 (each iteration gets a chunk from the 30 million + 30 million data sets), and it takes about 2.5 hours in parallel with 4 workers on my machine (AMD A8 @ 3.0 GHz). Sorry if that was misleading in my first post.

The first time I tried to do it, I was just subsetting the data frames with 30-million-ish rows in each, and a rough guess is that it would have taken 3 or 4 days to run subsetting that way. So I have a pre-processing script that just subsets the data frames and stores them in a list. This takes an hour or so to run, but then subsetting is quick. There are probably better ways to do it, but this is my first time really messing with large data sets and it's the way I got it to work :)
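For anyone facing the same issue, the pre-chunking idea can be sketched with base R's split() (the column names and sizes here are made up for illustration): pay the subsetting cost once, then do cheap list lookups inside the loop.

```r
# illustrative stand-in for a large training set keyed by device
training <- data.frame(device_id = rep(1:100, each = 1000),
                       value     = rnorm(100000))

# one-time cost: split into a list with one element per device
by_device <- split(training, training$device_id)

# inside the loop this is a fast list lookup ...
chunk <- by_device[["42"]]
# ... versus scanning all 100,000 rows on every iteration:
# training[training$device_id == 42, ]
```

The speed-up comes from replacing a full logical scan of the big data frame on every iteration with a single hashed name lookup.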


Re: Introduction to parallel programming with 'foreach'

PostPosted: Mon Oct 14, 2013 3:32 pm
by cozymandias
Sorry, I quit getting updates on this thread for some reason and just now thought to check back. Hopefully you've gotten things to work the way you need them to by now, gtg302v, but the duplication you noted is something to look out for with these sorts of tasks. It sounds like you handled it by pre-chunking, which makes sense. You might consider adjusting the size of your chunks a little more. I would expect a speed-up of more than 3x when moving from 1 to 16 workers (if I've understood correctly), and even with 24 GB of RAM, memory could still be an issue with 16 workers.

@censix, agreed. My intention was just to show some support for foreach - I've been using it a bit lately and I appreciate its simplicity. I think a "doEpiphany" package is a great way to frame things. Let's talk about what it would take to develop such an approach; I would be very glad to help. I'm still waiting on my boards, but maybe there's something that can be done before they get here.

Re: Introduction to parallel programming with 'foreach'

PostPosted: Tue Oct 15, 2013 1:55 am
by gtg302v
Just 4 workers for the time being (four cores on one CPU), so a speed-up of a factor of three with four workers sounds about right, based on other examples I've seen. The key to handling the memory issue for me was changing the iterator from a typical (i in 1:90024), using i to index the lists, to using the list itself as the iterator (this needs the 'iterators' package).
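A small sketch of that change (the list contents here are toy data, and %do% is used so it runs without a registered backend; under %dopar% each task would receive only its own element):

```r
library(foreach)

# pretend these are the pre-chunked data sets
chunks <- split(1:20, rep(1:4, each = 5))

# index-based version (each task needs access to the whole 'chunks' list):
#   foreach(i = 1:4, .combine = c) %do% sum(chunks[[i]])
# list-as-iterator version (each task gets just one element):
totals <- foreach(ch = chunks, .combine = c) %do% sum(ch)
```

foreach itself can consume a plain list this way; the 'iterators' package adds richer iterators (isplit, iter, idiv) on top of the same mechanism.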