Parallella and #BigData

Postby loretoparisi » Thu Oct 02, 2014 11:37 pm

When you think of BigData analysis, you have in mind platforms like Hadoop, Mahout, Apache Spark and Storm, and last but not least Yahoo! SAMOA.

All of those distributed computation and machine-learning (ML) platforms run on clusters of distributed PCs or on cloud infrastructure like Amazon EC2 or Windows Azure.

You always have the option to run them on your own super-expensive cluster anyway, BUT what about a Parallella cluster?
Can an Epiphany cluster tackle BigData and act as a distributed system?

Parallella is an amazing low-power parallel computation machine: 5 W instead of 400 W. A Parallella cluster can do much more than we think when it comes to crunching numbers.

Most of these big distributed systems from the ASF are both CPU-bound and I/O-bound, but they are memory-bound as well!
They scale by spreading the load across a distributed cloud architecture.

So, if we take the Parallella board and build our first Parallella cluster to crunch numbers and do #BigData analysis, we should keep in mind that running Hadoop on Parallella may not be the right solution.

We should consider a new class of software optimized for these small server clusters, since they have remarkable computational power but tight memory constraints.
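To make this concrete, here is a minimal sketch (plain C, self-contained; every name and size is an illustrative assumption, not an existing Parallella API) of the memory-bounded streaming style such software would need: input is pulled through a small fixed buffer into a fixed-size table, so the working set stays constant no matter how large the dataset grows.

    /* Sketch: memory-bounded streaming aggregation. The whole
     * working set (~2 KiB) is fixed at compile time, the kind of
     * budget a tiny per-core local memory imposes.              */
    #include <stdint.h>
    #include <stdio.h>

    #define CHUNK_SIZE 1024   /* input buffer: 1 KiB per pass    */
    #define NBINS      256    /* fixed-size result table         */

    static uint8_t  chunk[CHUNK_SIZE];
    static uint32_t bins[NBINS];

    int main(void)
    {
        size_t n;
        /* Stream an arbitrarily large input through the small
         * buffer; memory use never grows with the dataset.      */
        while ((n = fread(chunk, 1, CHUNK_SIZE, stdin)) > 0)
            for (size_t i = 0; i < n; i++)
                bins[chunk[i]]++;   /* byte-value histogram      */

        for (int b = 0; b < NBINS; b++)
            if (bins[b])
                printf("%3d: %u\n", b, (unsigned)bins[b]);
        return 0;
    }

Hadoop assumes it can buffer, sort and spill gigabytes per node; this constant-memory, single-pass style is closer to what a machine with a few KiB of fast memory per core can actually run.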

So how do we start with Parallella on #BigData? What is the right software architecture?
That is the question.
loretoparisi
 
Posts: 1
Joined: Thu Oct 02, 2014 11:02 pm

Re: Parallella and #BigData

Postby sebraa » Mon Oct 06, 2014 12:50 pm

Parallel programs are not CPU-, I/O- and memory-bound simultaneously, unless you have been able to tailor them perfectly to your hardware. Also, please remember that your BigData experience is limited to 512 KiB (minus 16 times the code size) on a 16-core Epiphany.
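To put a number on that (assuming the 16-core Epiphany-III with 32 KiB of local SRAM per core, and a hypothetical 8 KiB kernel image replicated on every core):

    total local memory : 16 cores x 32 KiB = 512 KiB
    code, replicated   : 16 cores x  8 KiB = 128 KiB
    data budget        : 512 KiB - 128 KiB = 384 KiB (24 KiB per core)

So in this example, a quarter of the chip's fast memory is already gone before any data arrives.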
sebraa
 
Posts: 495
Joined: Mon Jul 21, 2014 7:54 pm

