After adding a 12x12 DMA chained with the image window load, the assembly version runs in under 1s and the C version in just over 2s.
Yep, 1 Epiphany cpu is now beating 1 ARM core, even if it took hand-written assembly to do it ...
I also tried a bunch of other stuff like double-buffering and shaved a bit more off it, but it's getting a bit academic and none of the numbers are really verified.
FWIW I tried aborting the read-ahead DMA, despite your warning on the other thread. I write 0 to the DMA config register, immediately followed by the new descriptor. They're both writing to the same buffer. It appears to work and shaves another 0.06s off the running time, but as it's very tricky to validate I haven't verified the total calculation; it may just be luck that it appears to work with my test case.
Statistics: Posted by notzed, Wed Aug 28, 2013 12:15 pm