1) Why does it get so hot? I can't seem to use passive cooling on the Parallella... Is it the SRAM? And will that hurt the chip's ability to scale up?
2) Why does an MIMD processor use less power than an SIMD one?
3) From most of the benchmarks I've seen here, reading from and writing to shared memory runs at about 150 MB/s, but the off-chip bandwidth is supposed to be 8 GB/s. Is the slow speed because of the FPGA? Could the board be redesigned to have some dedicated shared memory closer to the chip (something like an L2 cache)? My SUMMA implementation is scalable, so the memory transfer to/from shared DRAM is really what's killing the speed.
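To put rough numbers on that gap, here is a back-of-the-envelope sketch in Python. The two bandwidth figures are the ones quoted above; the matrix size N, the 4-byte floats, and the assumption that each matrix crosses the link exactly once are illustrative guesses, not my actual benchmark setup, so treat the output as a lower bound on transfer time only.

```python
# Rough transfer-time estimate for moving matrix data through shared DRAM.
# Bandwidth figures come from the post; N and the element size are
# illustrative assumptions, not measured values.

def transfer_seconds(num_bytes, bandwidth_bytes_per_s):
    """Time to move num_bytes at a sustained bandwidth."""
    return num_bytes / bandwidth_bytes_per_s

N = 512                  # hypothetical square matrix dimension
BYTES_PER_FLOAT = 4      # single precision (assumption)
MEASURED_BW = 150e6      # ~150 MB/s seen in the benchmarks
PEAK_BW = 8e9            # 8 GB/s quoted off-chip bandwidth

# C = A * B: read A and B, write C, so three N x N matrices in total
# (assumes each matrix crosses the link only once).
total_bytes = 3 * N * N * BYTES_PER_FLOAT

t_measured = transfer_seconds(total_bytes, MEASURED_BW)
t_peak = transfer_seconds(total_bytes, PEAK_BW)

print(f"data moved:  {total_bytes / 1e6:.2f} MB")
print(f"at 150 MB/s: {t_measured * 1e3:.2f} ms")
print(f"at 8 GB/s:   {t_peak * 1e3:.2f} ms")
print(f"slowdown vs quoted peak: {t_measured / t_peak:.1f}x")
```

Whatever N is, the ratio between the two cases stays the same, since it depends only on the two bandwidth figures.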
FYI, these are basically the results I'm getting for the paper... I'd imagine they could be improved quite a bit, but they're good enough for a term paper for a single class.
