So as discussed in viewtopic.php?f=9&t=515, memory can be a real bottleneck for the Epiphany, possibly much more so than raw processing power (CPU cores). The on-chip memory is really more like a cache for the most important program code and data; most real applications can't even fit their current working set there (e.g. a 500KB block of a bzip2-compressed file).
With the Parallella and a 16-core Epiphany, one can sometimes work around this by streamlining reads from SDRAM as much as possible. But what if we have 64 cores, or multiple Epiphanies? Then each core's share of the SDRAM bandwidth gets even smaller, and it becomes ever harder to give all those cores something to do while waiting for memory fetches.
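For reference, the usual workaround looks roughly like the double-buffering sketch below. It's plain C; dma_fetch()/dma_wait() are hypothetical stand-ins for the Epiphany DMA API (not the actual e-lib signatures), just to illustrate how fetching ahead hides SDRAM latency, which only works as long as the compute time per block exceeds the fetch time:

```c
/* Double-buffered streaming from SDRAM into core-local memory.
 * dma_fetch()/dma_wait() are hypothetical stand-ins for the real
 * Epiphany DMA calls; process_block() is whatever the kernel does.
 */
#include <stddef.h>
#include <stdint.h>

#define BLOCK 2048  /* bytes per fetch; two buffers must fit in 32KB local memory */

extern void dma_fetch(void *dst, const void *src, size_t n); /* start async copy */
extern void dma_wait(void);                                  /* wait for last copy */
extern void process_block(const uint8_t *buf, size_t n);

void stream_from_sdram(const uint8_t *sdram, size_t total)
{
    static uint8_t buf[2][BLOCK];  /* one buffer in flight, one being computed on */
    size_t nblocks = total / BLOCK;
    if (nblocks == 0)
        return;

    dma_fetch(buf[0], sdram, BLOCK);        /* prime the pipeline */
    for (size_t i = 0; i < nblocks; i++) {
        dma_wait();                         /* block i is now resident locally */
        if (i + 1 < nblocks)                /* fetch block i+1 while computing on i */
            dma_fetch(buf[(i + 1) & 1], sdram + (i + 1) * BLOCK, BLOCK);
        process_block(buf[i & 1], BLOCK);
    }
}
```

The trouble is that this only helps for sequential, predictable access patterns; with random accesses there is nothing to prefetch, and the cores just stall.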
Would it be possible to attach a pool of RAM directly to the Epiphany, slower than the on-chip SRAM but with a bit more bandwidth than 8GB/s? If each Epiphany chip had, say, 10-100MB of extra RAM with maybe 2 bytes per cycle per core of bandwidth, that should make it much, much easier to make full use of all those cores, even for applications that need to do a lot of random memory accesses. It would also help scaling when putting many Epiphany chips on the same board.
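To put rough numbers on that (assuming the 600MHz clock of the Parallella's E16G301): 2 bytes/cycle/core works out to 16 × 2B × 600MHz ≈ 19.2GB/s aggregate for 16 cores, and 64 × 2B × 600MHz ≈ 76.8GB/s for 64 cores, i.e. roughly an order of magnitude above the 8GB/s figure.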
So, from my amateurish perspective, such a strategy seems like a pure win. Why wasn't it done? Are there technical obstacles? Would designing something like this be prohibitively expensive?