by mhonman » Thu Aug 01, 2013 8:33 pm
Apologies in advance, this has become a mini-essay (or sermon?) and is loaded with opinion, possibly outdated views, and a fair amount of Parallel Processing 101 material that should be obvious to most who post here.
But, to the original question "is our mindset wrong", my CFD-tinged view is "yes and no".
Yes, because we are so accustomed to breaking down the solution to a problem into a sequence of steps that it is hard to recover the inherent parallelism at a later stage. So IMO the dreamt-of parallelizing compilers are a wrong-end-of-the-telescope approach.
No, because problem domains (in CFD at least) are continuous, and are described by continuous equations. The solution in every part of the problem domain is coupled to every other part. No matter how you slice and dice the parallelism in the problem, a naive solution demands a phenomenal amount of communication.
So in a nutshell there is no substitute for a smart person (or team) with an excellent understanding of both the problem domain and the limitations of the computing device at hand. Efficiencies differ by orders of magnitude between naive solutions and ones where simplifying assumptions have been introduced - so a clever solution may not need a parallel system in the first place.
The ideal is of course a clever solution which retains the inherent parallelism of the problem domain. That requires the engineers or scientists to "think parallel" - but once they do, the solution can be mapped onto any sufficiently flexible parallel computing platform.
Aside: There *are* other ways of solving these problems, one that used to be popular is analog CFD, often known as a wind tunnel. Other than the millions that it costs to build and operate a transonic wind-tunnel, and the time needed to produce a series of precision CNC-machined models, it works quite well.
Jokes aside, there are programming languages that make it easier to express the solution to a problem in concurrent terms - some of these have their own sub-forums here and they would be worth investigating before attempting to write parallel programs in C.
But I digress...
In order to really reap the benefits of parallel processing both the system and the software must be scalable, i.e. overheads remain constant when the problem size and parallelism increase in proportion to each other.
That in turn means that a good architecture should be one in which you get more of everything when the number of processing elements is increased - more memory size, memory bandwidth, and communication bandwidth - and the processing elements are functionally independent of each other. That allows a big problem to be broken down into a multiplicity of localised partial solutions, without the processing elements having to contend for shared resources (and therefore having performance limited by the speed of those resources).
As an architecture, Epiphany does all of this really well. In particular communication is low-latency and fast in comparison to its computational power, and non-localised communication is not prohibitively expensive.
And we have to remember this is not an HPC cluster we are discussing, it is a single inexpensive chip - so yes, it has its limitations (per-core RAM in particular).
However all of these really neat features are of no use unless we are able to "think parallel" in devising algorithms that solve our problems. For example, to think in terms of what happens at a cell in solution space and how it interacts with adjacent cells (cellular automata are a classic example), rather than thinking in terms of a series of operations that are performed on arrays or matrices - scalable parallelisation of which is usually predicated on an infinitely fast global communication fabric.
Once a parallel algorithm has been devised, it should usually be possible to adapt it for implementation on whatever computing device is to hand - though IMO a MIMD architecture like Epiphany is relatively easy to target because the programmer is not restricted to having all the processing elements operating in lock-step, and is thus not forced to think parallel at all times.
That said, 20 years ago people were successfully implementing some pretty sophisticated CFD methods on the 65536-core SIMD Connection Machine.