https://drive.google.com/file/d/0Bx4haf ... xtcEk/view
.. so similarities: 28MB of on-chip 'software-managed' (scratchpad?) memory;
differences: it seems to focus exclusively on 8-bit multiplies, for inference only, and looks far more focussed on one algorithm. I would much rather have the versatility of Epiphany-style RISC cores.
So the TPU seems to be some sort of huge 8-bit matrix multiplier with one giant scratchpad? Would the Epiphany's memory architecture still offer an advantage for convolutional neural networks (keeping filters in distinct scratchpads, closer to the individual ALUs) - plus multi-chip scalability? Maybe they can still keep some coefficients in the matrix array (I haven't quite read the details there..)
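To make "keeping filters in distinct scratchpads" concrete, here's a rough per-core sketch - purely illustrative; the names, tile sizes and the idea of the filter sitting resident in local memory are my own assumptions, not anything from the paper or the Epiphany SDK:

#include <stdint.h>

#define K    3    /* filter size (illustrative) */
#define TILE 16   /* input tile held in a core's local memory (illustrative) */

/* Filter coefficients kept resident in the core's local scratchpad for the
 * whole layer; only input/output tiles get streamed in and out. */
static int8_t local_filter[K][K];

/* 8-bit convolution of one tile against the resident filter,
 * accumulating into 32 bits to avoid overflow. */
void conv_tile(const int8_t in[TILE][TILE],
               int32_t out[TILE - K + 1][TILE - K + 1])
{
    for (int y = 0; y <= TILE - K; y++) {
        for (int x = 0; x <= TILE - K; x++) {
            int32_t acc = 0;
            for (int fy = 0; fy < K; fy++)
                for (int fx = 0; fx < K; fx++)
                    acc += (int32_t)in[y + fy][x + fx] * local_filter[fy][fx];
            out[y][x] = acc;
        }
    }
}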
It also seems to be driven by the host, with big CISC instructions (each covering an entire matrix-multiply op?).
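If I've read it right, the instruction set is just a handful of coarse ops (read host memory, read weights, matrix multiply / convolve, activate, write host memory). From the host side I imagine it looks roughly like the descriptor below - the names and fields are made up for illustration:

#include <stdint.h>

/* Illustrative only: a guess at what one 'big CISC instruction' might carry.
 * One descriptor covers an entire matrix multiply, so the host issues a few
 * of these and the accelerator grinds away for thousands of cycles, instead
 * of fine-grained RISC instructions being issued per element. */
typedef enum {
    OP_READ_HOST_MEMORY,
    OP_READ_WEIGHTS,
    OP_MATRIX_MULTIPLY,
    OP_ACTIVATE,
    OP_WRITE_HOST_MEMORY
} coarse_op_t;

typedef struct {
    coarse_op_t op;
    uint32_t    src;        /* source buffer address (on-chip or host) */
    uint32_t    dst;        /* destination buffer address              */
    uint16_t    rows, cols; /* whole-matrix dimensions in one command  */
} coarse_cmd_t;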
Epiphany would be equally useful for training, I would guess, which the TPU isn't.
Any details yet on what they meant by deep-learning instructions in the E5? I would have guessed that means some low-precision support. I would personally be happy even if it just had popcount (useful for cryptography too?), because there are 1-bit neural-net techniques out there. (Does 'communications' also refer to low-precision support, i.e. encoding of signals?)
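For example, the usual XNOR-net style trick - nothing E5-specific, and __builtin_popcountll below is just the GCC/Clang builtin standing in for a hardware popcount instruction:

#include <stdint.h>

/* Dot product of two 64-element {-1,+1} vectors, each packed as one sign
 * bit per element. XNOR marks matching signs; every match contributes +1
 * and every mismatch -1, so dot = matches - (64 - matches). */
static inline int binary_dot64(uint64_t a, uint64_t b)
{
    int matches = __builtin_popcountll(~(a ^ b));
    return 2 * matches - 64;
}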
I suppose this also confirms that a relatively simple software library can be useful; they claim the back end is <1500 lines of code.
This hardware seems limited to a few functions.
Would the TPU have been a simpler chip to design and implement (e.g. 'more dependent on a host to drive it')? Whilst I can imagine it would outperform the Epiphany in 8-bit matrix performance, I still hope you could take an Epiphany-style design and skew it for different workloads - varying the number of functional units and adding custom instructions, whilst still having programmability (e.g. like the many variations of ARM out there: some with FP, some without, some with NEON, some without..). I hope the right kind of SIMD can close the gap ('each instruction doing a load of multiply-accumulates').
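i.e. something that turns the scalar reference loop below into one issue per bundle of 8-bit MACs - the wide instruction mentioned in the comment is hypothetical, not a real Epiphany opcode:

#include <stddef.h>
#include <stdint.h>

/* Scalar reference: one multiply-accumulate per iteration. A wide MAC
 * instruction (hypothetical, in the spirit of NEON/AVX dot-product ops)
 * would retire 8 or 16 of these per issue, which is where the gap to a
 * dedicated 8-bit matrix unit could start to close. */
int32_t dot_i8(const int8_t *a, const int8_t *b, size_t n)
{
    int32_t acc = 0;
    for (size_t i = 0; i < n; i++)
        acc += (int32_t)a[i] * (int32_t)b[i];
    return acc;
}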