Neural Network

Re: Neural Network

Postby censix » Fri Sep 20, 2013 11:24 am

Well, I would say that whatever Nick, yourself and others have in mind for implementing 1-layer MLP training on
the Epiphany architecture, if it works, it would already be a HUGE success! And it would be very useful indeed.
censix
 
Posts: 49
Joined: Sun Dec 16, 2012 7:54 pm
Location: europe

Re: Neural Network

Postby roberto » Fri Sep 20, 2013 2:44 pm

Thinking about it more, there is an alternative to the back-propagation algorithm for training a neural network: use a genetic algorithm to evolve a population of weights until it reaches an element that satisfies the NN. This is an interesting path to explore, because it has been demonstrated that it can greatly speed up training, and in that case a Parallella would be overkill, since a GA only needs a simple classic CPU to run. I played with GAs in the past and they can give extraordinary results (a few minutes of computation) where a classic "brute force" algorithm would need centuries to produce a result. In this sense, the back-propagation algorithm is conceptually a brute-force system, no matter whether the NN is multi-layer with few neurons per layer or single-layer with many neurons.
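As a rough illustration of the idea (not any particular published scheme), here is a minimal sketch of evolving a flat vector of network weights with a GA instead of back-propagation. The tiny 2-2-1 network and the toy fitness task are assumptions made purely for the example:

```python
import random

def forward(weights, x):
    # Tiny 2-2-1 network; weights flattened: 4 hidden + 2 output = 6 values.
    h0 = max(0.0, weights[0] * x[0] + weights[1] * x[1])  # ReLU hidden unit 1
    h1 = max(0.0, weights[2] * x[0] + weights[3] * x[1])  # ReLU hidden unit 2
    return weights[4] * h0 + weights[5] * h1

def fitness(weights, data):
    # Lower is better: sum of squared errors over the training set.
    return sum((forward(weights, x) - y) ** 2 for x, y in data)

def evolve(data, pop_size=50, generations=200):
    pop = [[random.uniform(-2, 2) for _ in range(6)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda w: fitness(w, data))
        survivors = pop[: pop_size // 2]          # keep the fitter half
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = random.sample(survivors, 2)
            cut = random.randrange(1, 6)          # one-point crossover
            child = a[:cut] + b[cut:]
            child[random.randrange(6)] += random.gauss(0, 0.5)  # mutate one gene
            children.append(child)
        pop = survivors + children
    return min(pop, key=lambda w: fitness(w, data))

# Toy task: learn y = x0 + x1 from four samples.
random.seed(1)
data = [((0, 1), 1), ((1, 0), 1), ((1, 1), 2), ((2, 1), 3)]
best = evolve(data)
```

No gradients are computed anywhere; selection pressure on the fitness value does all the work, which is exactly the appeal (and the cost) of the approach.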
roberto
 
Posts: 39
Joined: Sat Mar 09, 2013 2:01 pm

Re: Neural Network

Postby roberto » Sun Sep 22, 2013 2:48 pm

censix wrote:Well, I would say that whatever Nick, yourself and others have in mind for implementing 1-layer MLP training on
the Epiphany architecture, if it works, it would already be a HUGE success! And it would be very useful indeed.


Censix,
I found the PDF where I got the info; here is the translation:

There is a mathematical result (Hornik K., Stinchcombe M. and White H. (1989), "Multilayer feedforward networks are universal approximators", Neural Networks, 2, pp. 359-366) demonstrating that a 3-layer network (1st layer = input, 2nd layer = hidden, 3rd layer = output) can approximate any result to any degree of precision, if you choose the right number of neurons in the hidden layer with the right values for the weights.
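The flavour of that result can be checked numerically: with enough tanh units in a single hidden layer, a feedforward network approximates a smooth function closely. A minimal sketch (illustrative setup: random hidden weights and a least-squares fit of the output layer, which is a shortcut for demonstration, not the construction used in the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 3, 200)
target = np.sin(x)                        # the function to approximate

n_hidden = 100
w = rng.uniform(-4, 4, n_hidden)          # random input weights
b = rng.uniform(-4, 4, n_hidden)          # random biases
H = np.tanh(np.outer(x, w) + b)           # hidden-layer activations, shape (200, 100)

# Fit only the output-layer weights by least squares.
v, *_ = np.linalg.lstsq(H, target, rcond=None)
approx = H @ v
max_err = np.max(np.abs(approx - target))
```

With 100 hidden units the worst-case error on this interval is already tiny; the theorem says the error can be driven arbitrarily low by adding more units, though it gives no recipe for how many are needed.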

I don't know more; I can only redirect you to that paper.
Hope it is helpful.
roberto
 
Posts: 39
Joined: Sat Mar 09, 2013 2:01 pm

Re: Neural Network

Postby over9000 » Mon Sep 23, 2013 1:31 am

roberto wrote:Thinking about it more, there is an alternative to the back-propagation algorithm for training a neural network: use a genetic algorithm to evolve a population of weights until it reaches an element that satisfies the NN. This is an interesting path to explore, because it has been demonstrated that it can greatly speed up training, and in that case a Parallella would be overkill, since a GA only needs a simple classic CPU to run. I played with GAs in the past and they can give extraordinary results (a few minutes of computation) where a classic "brute force" algorithm would need centuries to produce a result. In this sense, the back-propagation algorithm is conceptually a brute-force system, no matter whether the NN is multi-layer with few neurons per layer or single-layer with many neurons.

I've highlighted the main thing in your post I wanted to reply to. Actually, I think the Epiphany could be quite well suited to running a genetic algorithm. There are a couple of points in its favour. First, there are plenty of papers out there (well, a few I've come across, anyway) describing how to use GPUs to speed up the most time-critical part of the GA: the fitness function. Second, let's not forget that GAs include a crossover step, and that each member of the population effectively acts independently as a potential solution. The potential for using a multi-core system should be fairly obvious: provided each core can store a decent number of potential solutions, each core can evolve separately and periodically broadcast the fittest solutions it has found. Each other node can then check the fitness level against its own local fittest and either incorporate the new (and better) solution into its own population, or discard it.
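That island-style scheme could look roughly like this (a toy simulation in plain Python, with cores represented as lists and a deliberately trivial fitness function; all names and parameters here are illustrative):

```python
import random

def fitness(x):
    # Toy objective: minimise |x|; each "solution" is just a float here.
    return abs(x)

def step(island):
    # One generation on one island: mutate everyone, keep the better half.
    mutated = [x + random.gauss(0, 0.3) for x in island]
    merged = sorted(island + mutated, key=fitness)
    return merged[: len(island)]

def migrate(islands):
    # Broadcast the best solution found anywhere; an island replaces its
    # worst member with the incoming one only if that is an improvement.
    bests = [min(isl, key=fitness) for isl in islands]
    incoming = min(bests, key=fitness)
    for isl in islands:
        worst = max(isl, key=fitness)
        if fitness(incoming) < fitness(worst):
            isl[isl.index(worst)] = incoming
    return islands

random.seed(0)
islands = [[random.uniform(-10, 10) for _ in range(8)] for _ in range(4)]
for gen in range(100):
    islands = [step(isl) for isl in islands]
    if gen % 10 == 9:                  # periodic broadcast, as described above
        islands = migrate(islands)
best = min((x for isl in islands for x in isl), key=fitness)
```

On real hardware the migration step would be the only inter-core traffic, which is what makes the scheme attractive: each island evolves with purely local memory accesses between broadcasts.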
Of course, the devil is in the details. Even if you can use a GA to train the NN, you may find that porting the GA to the Epiphany isn't a good fit (what with all the weights that may have to be evolved). It's also possible to get side-tracked and end up writing a GA that effectively just implements the back-propagation rule in another guise (or at least does no better than it).
I'm not too sure what to make of the second bit that I've highlighted. You do know the difference between P and NP, right? It's relatively easy for a non-deterministic algorithm (in NP) to find a good solution, but it generally doesn't guarantee that you'll get the optimal solution.
over9000
 
Posts: 98
Joined: Tue Aug 06, 2013 1:49 am

Re: Neural Network

Postby roberto » Wed Sep 25, 2013 10:50 am

over9000 wrote:
roberto wrote:Thinking about it more, there is an alternative to the back-propagation algorithm for training a neural network: use a genetic algorithm to evolve a population of weights until it reaches an element that satisfies the NN. This is an interesting path to explore, because it has been demonstrated that it can greatly speed up training, and in that case a Parallella would be overkill, since a GA only needs a simple classic CPU to run. I played with GAs in the past and they can give extraordinary results (a few minutes of computation) where a classic "brute force" algorithm would need centuries to produce a result. In this sense, the back-propagation algorithm is conceptually a brute-force system, no matter whether the NN is multi-layer with few neurons per layer or single-layer with many neurons.

I've highlighted the main thing in your post I wanted to reply to. Actually, I think the Epiphany could be quite well suited to running a genetic algorithm. There are a couple of points in its favour. First, there are plenty of papers out there (well, a few I've come across, anyway) describing how to use GPUs to speed up the most time-critical part of the GA: the fitness function. Second, let's not forget that GAs include a crossover step, and that each member of the population effectively acts independently as a potential solution. The potential for using a multi-core system should be fairly obvious: provided each core can store a decent number of potential solutions, each core can evolve separately and periodically broadcast the fittest solutions it has found. Each other node can then check the fitness level against its own local fittest and either incorporate the new (and better) solution into its own population, or discard it.
Of course, the devil is in the details. Even if you can use a GA to train the NN, you may find that porting the GA to the Epiphany isn't a good fit (what with all the weights that may have to be evolved). It's also possible to get side-tracked and end up writing a GA that effectively just implements the back-propagation rule in another guise (or at least does no better than it).
I'm not too sure what to make of the second bit that I've highlighted. You do know the difference between P and NP, right? It's relatively easy for a non-deterministic algorithm (in NP) to find a good solution, but it generally doesn't guarantee that you'll get the optimal solution.


Yes, I know the difference between P and NP.

I've bolded the point that is difficult for me to understand.
Do you mean that (let's say) in a population of 64 members you can assign (on a Parallella-64) one individual to one core? If so, I think that is not the right way, because at every generation part of the population dies and is replaced by new members coming from crossover and/or mutation, so maybe it is better to dedicate the E64 to computing the fitness function, in case that is heavy to evaluate.

Second point, about "good" and "optimal".
It is only a matter of the words we use: if we say that 100 is the target and 5% is the admitted tolerance, then a result of 98 can be considered optimal. All in all, that is why we develop NNs and GAs: to produce acceptable solutions that can be considered optimal.

I agree that the fitness function is the critical point of a GA; actually, I have never tried to implement a complex fitness function. Anyway, my demo software with a GA was able to find THE BEST solution. OK, it was a very easy problem to solve and the fitness function was naive, but I got the best solution possible (100%).
The problem is the following: a population of 64 elements, each element an array of 512 bytes. Initially the elements are generated randomly, then the GA makes the population evolve (random crossover, one mutation on one byte of a new-born in each generation); the fitness goal is to find a particular element (a given array of 512 bytes).
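As I understand that description, the benchmark looks roughly like this (a scaled-down sketch: the array length is shortened to keep it fast, and the selection details are guesses from the description):

```python
import random

LENGTH = 32   # roberto used 512-byte arrays; shortened here to keep the demo quick
POP = 64

def fitness(candidate, target):
    # Number of bytes differing from the target; 0 means the element was found.
    return sum(1 for a, b in zip(candidate, target) if a != b)

def generation(pop, target):
    pop.sort(key=lambda c: fitness(c, target))
    survivors = pop[: POP // 2]                   # the fitter half breeds
    children = []
    while len(survivors) + len(children) < POP:
        a, b = random.sample(survivors, 2)
        cut = random.randrange(1, LENGTH)         # random one-point crossover
        child = a[:cut] + b[cut:]
        child[random.randrange(LENGTH)] = random.randrange(256)  # one-byte mutation
        children.append(child)
    return survivors + children

random.seed(0)
target = [random.randrange(256) for _ in range(LENGTH)]
pop = [[random.randrange(256) for _ in range(LENGTH)] for _ in range(POP)]
for _ in range(3000):
    pop = generation(pop, target)
    if fitness(pop[0], target) == 0:              # exact match found
        break
best_fitness = fitness(pop[0], target)
```

Almost all of the time here goes into the fitness evaluations and the sort, which is why parallelising the fitness step (rather than one-individual-per-core) is the natural split.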
I ran the software while typing this post, and it finished running just now: fitness = 0 (the best), time: 211 seconds; the program is single-threaded on an x86 at 1833 MHz.
Even if the E64 could solve the same problem in half the time (and I don't think so), as long as the waiting time is this low it is not necessary to move to parallel hardware such as the E64 or a GPU.

Next time I will move to a serious topic: trying to train the weights of my 2-4-4-1 neural network using a GA, because back-propagation ran for days without finding a solution.
roberto
 
Posts: 39
Joined: Sat Mar 09, 2013 2:01 pm

Re: Neural Network

Postby over9000 » Wed Sep 25, 2013 2:51 pm

roberto wrote:Yes, I know the difference between P and NP.

Second point, about "good" and "optimal".
It is only a matter of the words we use: if we say that 100 is the target and 5% is the admitted tolerance, then a result of 98 can be considered optimal. All in all, that is why we develop NNs and GAs: to produce acceptable solutions that can be considered optimal.


Hi again, Roberto. First off, I'm sorry if I'm coming across as being over-critical of you. It seems that all of my posts in this thread have been contradicting yours, but I don't really mean to be so negative. I think you've got some great ideas, and it's great that you're working on them.

What I was getting at in the above was that while GAs (and NNs) may have excellent applications in some problem areas, and can find good solutions very quickly where other deterministic algorithms would take a long time to complete, they still have their limitations. As regards "good" versus "optimal", I'm coming at it from the point of view of an optimisation problem, where optimal really is the best solution. If you're writing a GA to train an NN, then I guess that degrees of optimality are allowed, so coming within a percentage of the best solution is definitely "good enough".

roberto wrote:
provided each core can store a decent number of potential solutions, each core can evolve separately

I've bolded the point that is difficult for me to understand.
Do you mean that (let's say) in a population of 64 members you can assign (on a Parallella-64) one individual to one core? If so, I think that is not the right way, because at every generation part of the population dies and is replaced by new members coming from crossover and/or mutation, so maybe it is better to dedicate the E64 to computing the fitness function, in case that is heavy to evaluate.

No, I mean having one core store several solutions. This goes back to the discussion at the start of this thread about whether a single core should simulate one neuron or several. One neuron per node (or one GA solution per node) is conceptually simpler, but it's pretty limited. With a GA in particular, it would also be pretty wasteful, since you'd be spending a lot, if not most, of the time on communications. What I was suggesting is to have each core responsible for keeping several solutions. Then, after a certain number of generations (or even after each generation), it could send its best solution(s) to the other nodes. Or you could have a more complex cross-over scheme between nodes, using weighted probabilities (higher fitness = higher chance of "breeding"), just like you would for solutions in the same local population. That sort of scheme would require more traffic and would be trickier to implement, though, so the simpler method of just sending the best neighbouring solution (replacing the worst one locally) would probably end up being better (or at least good enough :))
Actually, it just occurs to me that there are some versions of NN training that don't do the back-prop stage immediately after presenting each element of the training set, but instead only do it periodically (e.g. after the full training set is complete, or after a certain number of elements have been presented). This scheme (or a variation of it) could also be used to good effect on the Epiphany cores. If you present a different subset of the full training set to each of the cores, then you can have a map-reduce step that averages out the combined back-prop deltas and applies them to each of the local nets at once. You won't get a 16-fold improvement in speed (thanks to Amdahl's law), but it could certainly drastically reduce the time needed for the back-prop-based training...
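That map-reduce step works because gradients are additive over the training set: the average of the per-core batch gradients equals the full-batch gradient. A sketch using a linear model for simplicity (the same identity holds for multi-layer back-prop deltas; the 16-core split is simulated here):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(160, 4))             # 160 training samples, 4 inputs
y = X @ np.array([1.0, -2.0, 0.5, 3.0])   # targets from a known linear rule
w = np.zeros(4)                           # current weights

def grad(Xs, ys, w):
    # Gradient of mean squared error for a linear model on one shard.
    return 2 * Xs.T @ (Xs @ w - ys) / len(ys)

# "Map": each of 16 simulated cores computes the gradient of its own shard.
shards = np.array_split(np.arange(160), 16)
core_grads = [grad(X[idx], y[idx], w) for idx in shards]

# "Reduce": average the per-core gradients.
avg_grad = np.mean(core_grads, axis=0)
full_grad = grad(X, y, w)                 # same as computing on the full set
```

Because the shards are equal-sized, the reduce step reproduces the full-batch gradient exactly; with unequal shards you would weight each core's gradient by its shard size.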
Anyway, good luck with your experimentation!
over9000
 
Posts: 98
Joined: Tue Aug 06, 2013 1:49 am

Re: Neural Network

Postby censix » Wed Sep 25, 2013 6:06 pm

There is a mathematical result (Hornik K., Stinchcombe M. and White H. (1989), "Multilayer feedforward networks are universal approximators", Neural Networks, 2, pp. 359-366)


@roberto
thank you for this reference. Once I find it, I hope that in the proof of this result the authors are able to somehow give an upper bound on the number of neurons necessary for the hidden layer. Otherwise their certainly nice result would (maybe) be impracticable to implement. I shall see.

@over9000 and roberto
With regard to training MLPs through 'alternative' methods, there is an interesting article that I read a couple of weeks ago which uses neither backprop nor a GA, but instead something the authors call Cat Swarm Optimization (CSO). The first time I read the title, I had to laugh. :) After I read the article, I was curious about it. Unfortunately, I don't have the time to implement it, and the authors don't make any source code available. The reason I am mentioning this is that one may be able to use a single E16/64 core to simulate a single cat, or better, a group of cats. Since the 'cat' behaviour is reasonably complex, I think it would suit the MIMD capabilities of the E16/64 very well.

So for those curious enough, have a look at the article:

I.J. Intelligent Systems and Applications, 2013, 01, 69-80
Published Online December 2012 in MECS (http://www.mecs-press.org/)
DOI: 10.5815/ijisa.2013.01.07
Optimizing Artificial Neural Networks using Cat Swarm Optimization Algorithm
http://www.mecs-press.org/ijisa/ijisa-v5-n1/IJISA-V5-N1-7.pdf
censix
 
Posts: 49
Joined: Sun Dec 16, 2012 7:54 pm
Location: europe

Re: Neural Network

Postby CIB » Thu Sep 26, 2013 5:42 pm

censix wrote:
There is a mathematical result (Hornik K., Stinchcombe M. and White H. (1989), "Multilayer feedforward networks are universal approximators", Neural Networks, 2, pp. 359-366)


@roberto
thank you for this reference. Once I find it, I hope that in the proof of this result the authors are able to somehow give an upper bound on the number of neurons necessary for the hidden layer. Otherwise their certainly nice result would (maybe) be impracticable to implement. I shall see.


Mathematicians and theoretical CS researchers *love* doing proofs of equivalence between representations, no matter how impractical the converted structure may be. Just look at the "NP in PSPACE" proof. So yeah, I wouldn't put too much hope into the idea of a practical upper bound. =)
CIB
 
Posts: 108
Joined: Sat Jul 13, 2013 1:57 pm

Re: Neural Network

Postby over9000 » Fri Sep 27, 2013 12:56 am

censix wrote:something the author calls Cat Swarm Optimization (CSO). The first time I read the title, I had to laugh. :)

I had a quick scan over it. It seems that it's like a combination of cats + toxoplasmosis (at least if it caused brain damage in the cat itself). Not the most endearing image. Nor, come to think of it, is the idea of herding cats :)
More seriously (perhaps), it did give me an idea for an alternative (and probably more or less equivalent) scheme. I started with the idea of Optimal Brain Damage, that a network is over-specified for a particular task, and can be pruned without loss of capability. Then, since I've had these things on my mind recently, I considered replacing a fixed activation weight with a "fuzzy" value that would fire pseudo-randomly within a given range (with the range becoming smaller as training progressed). The second thing I considered was to convert the uniformly-distributed fuzzy weight to have more structure. Specifically, it would be organised like a Huffman/range/arithmetic encoding tree. That was my idea of how to efficiently (and randomly, since we can generate many trees with the same mean probability distribution) come up with an over-specific way of representing first, the fuzzy range and second, the original axon weight. Then it occurred to me that evolving the OBD phase is akin to doing a basic knapsack algorithm. In the original paper, they look to cull the neurons that have the least influence, but if you use a knapsack-based GA, your phenotypes would code the presence or absence of each subtree (deep structure of the axon's weights) within the optimised OBD output stage.
In the paper, it's individual neurons that atrophy and are removed to achieve "optimal brain damage"; in this scheme, each neuron has deep structure at the start, which would gradually simplify to the point of becoming first simple fuzzy variables, and then, as the range narrows, simple non-fuzzy values.
OK, so even if this worked, it would be pretty unconventional. At least it might have some advantage over traditional multi-layer approaches in that the internal tree structure can have arbitrary levels of complexity. For this to work, though, it probably needs to have multiple thresholds that are mapped to the (quantised) output activations.
All in all, maybe an interesting concept, but it's pretty half-baked. I'm not even sure if it would work...
over9000
 
Posts: 98
Joined: Tue Aug 06, 2013 1:49 am

Re: Neural Network

Postby nickoppen » Tue Oct 01, 2013 12:11 am

Hi Guys,

Here is the second part of the design - Training. It is a little more complicated than the feed-forward pass, so please give it a read and let me know if I've missed anything.

http://nicksparallellaideas.blogspot.com/2013/10/training-in-parallel.html

Thanks,

nick
Sharing is what makes the internet Great!
nickoppen
 
Posts: 263
Joined: Mon Dec 17, 2012 3:21 am
Location: Sydney NSW, Australia
