e-cores, complexity tradeoff
Posted: Tue Mar 07, 2017 3:14 am
How 'big' could you make individual e-cores before the design ethos is wasted?
Or, to put it another way:
where is the 'point of diminishing returns' in adding transistors to a single core, with respect to throughput?
(e.g. how much superscalarity, deep pipelining, etc. can you add before adding more cores would have been better for overall throughput?)
(I seem to remember it's already a dual-issue design.)
I note the Epiphany-V has 64-bit registers and hence has gone to SIMD for 32-bit throughput (and I don't know, but would guess 'deep-learning extensions' might involve 16- and 8-bit packed datatypes too).
Was this a sweet spot? It doesn't place extra program complexity on double-precision scientific code, whilst the SIMD idea might be unavoidable for maximum efficiency with low-precision datatypes (for video/AI). Was there ever any consideration of moving to a CELL-like setup with 128-bit registers (which might have scared people off, given the dual complexity of many-core *and* SIMD)? I know packed SIMD is largely seen as inflexible.
There have been machines with 128-bit registers and component-oriented ISAs (e.g. dot- and cross-product instructions, broadcasts/swizzles on multiplies). CELL itself was a bit crazy in *only* having 128-bit loads/stores, and in some pathological cases people were advised to pad smaller types up to 128 bits.