Re: Multi-core Neural Net Engine
Posted: Sat Sep 30, 2017 9:41 am by dobkeratops
Re: pipelining across network depth, I gather that can be problematic for training (backprop needs the evaluated state of the whole network...).
And I was just thinking: for robot control, wouldn't you want to minimize latency? So you'd always want to parallelize within each layer as much as possible first?
(I was going to say, extending the post above, that there'd be the option of splitting network layers across different cores... but that would prioritise throughput over latency?)
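One rough way to put numbers on the latency vs throughput question is a toy cost model. The sketch below is idealised and everything in it is an assumption rather than a measurement: each layer's work splits perfectly across cores, and every layer boundary costs a fixed `overhead` for moving activations between cores. `layer_parallel` has all cores cooperate on each layer in turn; `layer_pipelined` puts one layer on each core and streams inputs through.

```python
"""Idealised latency/throughput model for two ways of mapping a feed-forward
net onto several cores. Assumptions (not measured): each layer's work splits
perfectly across cores, and every layer boundary costs a fixed overhead for
moving activations between cores."""


def layer_parallel(layer_times, n_cores, overhead):
    """All cores cooperate on each layer in turn; one input in flight at a time."""
    latency = sum(t / n_cores + overhead for t in layer_times)
    throughput = 1.0 / latency  # the next input can only start when this one finishes
    return latency, throughput


def layer_pipelined(layer_times, n_cores, overhead):
    """One layer per core; new inputs enter while earlier ones are still in flight."""
    if len(layer_times) > n_cores:
        raise ValueError("this sketch needs at least one core per layer")
    stage = max(t + overhead for t in layer_times)  # slowest stage sets the pace
    latency = len(layer_times) * stage  # synchronous pipeline: every stage waits for the slowest
    throughput = 1.0 / stage
    return latency, throughput


if __name__ == "__main__":
    layers = [4.0, 4.0, 4.0, 4.0]  # hypothetical per-layer compute time in ms
    print("layer-parallel:", layer_parallel(layers, 4, overhead=1.0))
    print("pipelined:     ", layer_pipelined(layers, 4, overhead=1.0))
```

With these made-up numbers, splitting within each layer gives 8 ms latency and 0.125 inputs/ms, while pipelining gives 20 ms latency but 0.2 inputs/ms. With `overhead=0` both reach the same throughput, so in this model pipelining only pays off when splitting a layer across cores costs more than handing a whole layer's output to the next core.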
Re: Multi-core Neural Net Engine
Posted: Thu Oct 05, 2017 4:31 pm by claudio4parallella
THE MODEL
Hi! I'm building a configurable NN per core, and I'd like to get suggestions from you experts about the possible "models".
I'm imagining and testing different setups, which I'll try to describe here in words.
MODEL0: the initial convolution of a big matrix (an image, for example) is done in parallel across the cores to make it faster (only the starting convolution phase is parallelized);
MODEL1: one NN per core: each Core-NN is trained and performs its task independently. Multicore = multitask;
MODEL2: every Core-NN is trained for the same task, and they work in parallel, each one dedicated to a part of the input (for example, an image is divided into 16 rectangles and each Core-NN looks for a face in its own rectangle);
MODEL3: each Core-NN is trained for a different task on the same input (an image, for example), so in parallel the first detects faces, the second cats, the third dogs, ... and the last one detects characters;
MODEL4: only the matrix algebra for big matrices is parallelized among the cores, to make training or detection faster;
MODEL5: each core is used in parallel to test a different combination of NN layers, with the same inputs and expected outputs, to find the configuration that trains most efficiently.
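For MODEL0 and MODEL2, the main detail is that neighbouring rectangles must overlap on the input side. Below is a minimal sketch, assuming a 4x4 grid for 16 cores, a k x k "valid" convolution, and plain Python lists standing in for each core's local buffer (all names here are hypothetical). Each tile is convolved on its own, as a separate core would do it, using its output rectangle plus a halo of k-1 extra input rows and columns, and the pieces reassemble into exactly the full-image result.

```python
"""Sketch for MODEL0/MODEL2: split an image into a grid of tiles, one per core.
Assumptions: a 4x4 grid for 16 cores, a k x k 'valid' convolution, and plain
Python lists standing in for each core's local buffer."""


def tile_bounds(size, parts):
    """Split range(size) into `parts` contiguous chunks of near-equal length."""
    return [(i * size // parts, (i + 1) * size // parts) for i in range(parts)]


def conv2d_valid(img, kernel):
    """Direct 'valid' 2-D convolution as CNN layers compute it (kernel not
    flipped, no padding): the output shrinks by k-1 in each dimension."""
    k = len(kernel)
    h, w = len(img) - k + 1, len(img[0]) - k + 1
    return [[sum(img[y + i][x + j] * kernel[i][j]
                 for i in range(k) for j in range(k))
             for x in range(w)] for y in range(h)]


def conv2d_tiled(img, kernel, grid=4):
    """Convolve each tile independently (as separate cores would), then reassemble.

    Each core needs its output tile plus a halo of k-1 extra input rows/columns,
    so the tiles overlap on the input side but not on the output side.
    """
    k = len(kernel)
    out_h, out_w = len(img) - k + 1, len(img[0]) - k + 1
    out = [[0] * out_w for _ in range(out_h)]
    for y0, y1 in tile_bounds(out_h, grid):
        for x0, x1 in tile_bounds(out_w, grid):
            if y1 == y0 or x1 == x0:
                continue  # image too small for this many tiles
            # input patch = output tile extended by the halo
            patch = [row[x0:x1 + k - 1] for row in img[y0:y1 + k - 1]]
            part = conv2d_valid(patch, kernel)  # this is the per-core work
            for dy, row in enumerate(part):
                out[y0 + dy][x0:x1] = row
    return out


if __name__ == "__main__":
    img = [[(3 * y + 5 * x) % 7 for x in range(20)] for y in range(20)]
    kernel = [[0, 1, 0], [1, -4, 1], [0, 1, 0]]  # a Laplacian, just as an example
    assert conv2d_tiled(img, kernel) == conv2d_valid(img, kernel)
    print("16 tiles reproduce the full convolution")
```

The same reasoning applies to the face search in MODEL2: a face that straddles two rectangles can be missed unless the rectangles overlap by roughly the size of the detector's window.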
Which of these models do you think could be of any value on such a multicore board, considering that the processing is sequential, layer by layer from input to output, repeated in a continuous loop to reduce the error?
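Sequential layers do not rule out MODEL4: inside each layer, the matrix-vector product can still be split by output neurons. A minimal sketch follows, assuming sigmoid layers given as `(weights, bias)` pairs and using a `ThreadPoolExecutor` as a stand-in for the cores (an assumption; in CPython the threads won't give a real speed-up, the point is the partitioning). Each core computes a contiguous block of the layer's outputs, and all blocks are gathered before the next layer starts, which acts as a barrier between layers.

```python
"""Sketch for MODEL4: layers still run one after another, but inside each layer
the matrix-vector product is split by output neurons across cores. Threads in a
ThreadPoolExecutor stand in for the cores here (an assumption; in CPython they
won't give a real speed-up, the point is the data partitioning)."""

import math
import random
from concurrent.futures import ThreadPoolExecutor


def row_blocks(n_rows, n_cores):
    """Assign each core a contiguous block of output neurons (possibly empty)."""
    return [(c * n_rows // n_cores, (c + 1) * n_rows // n_cores) for c in range(n_cores)]


def partial_layer(weights, bias, x, lo, hi):
    """One core's share: outputs lo..hi-1 of sigmoid(W x + b)."""
    return [1.0 / (1.0 + math.exp(-(sum(w * xi for w, xi in zip(weights[r], x)) + bias[r])))
            for r in range(lo, hi)]


def forward(layers, x, n_cores=16):
    """Layer-by-layer forward pass; each layer waits for all cores (a barrier)."""
    with ThreadPoolExecutor(max_workers=n_cores) as pool:
        for weights, bias in layers:
            jobs = [pool.submit(partial_layer, weights, bias, x, lo, hi)
                    for lo, hi in row_blocks(len(weights), n_cores)]
            # gather in order: the next layer needs the complete activation vector
            x = [v for job in jobs for v in job.result()]
    return x


def forward_serial(layers, x):
    """Single-core reference: the same computation without splitting."""
    for weights, bias in layers:
        x = partial_layer(weights, bias, x, 0, len(weights))
    return x


if __name__ == "__main__":
    random.seed(0)
    sizes = [64, 32, 10]  # hypothetical layer widths
    layers = [([[random.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_out)],
               [0.0] * n_out)
              for n_in, n_out in zip(sizes, sizes[1:])]
    x = [random.random() for _ in range(sizes[0])]
    assert forward(layers, x, n_cores=16) == forward_serial(layers, x)
    print("split forward pass matches the serial one")
```

For training, the same split can be reused in the backward pass: the weight gradient for row r (the error term for neuron r times the layer's input) stays on the core that owns row r, whereas propagating the error back through the transposed weights needs a sum across all cores before the previous layer can continue.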
Thanks for your thoughts!