Any news on this?
I also got some runs showing just 'degraded' performance rather than 0 (in fact, it's not running more slowly; IIRC it just crashed after some iterations). I think it's timing dependent, and maybe the new driver does something slightly differently that triggers this.
But in any case, this isn't normal ... and I really need it to work. To deal with the limited 32k, I juggle data in and out manually, but that's done in small blocks and has to be controlled by the core rather than the ARM. Not using the DMA and using memcpy on the core instead is just _way_ too slow; I might as well run everything on the host ...
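
For illustration, the block-juggling pattern looks roughly like the sketch below. This is plain C and not the real code: `BLOCK_WORDS`, `block_copy` and `sum_streamed` are made-up names, and `memcpy` only stands in for the DMA transfer, which on the core would be started before the compute step and waited on afterwards so the two overlap.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical block size: small enough that two buffers fit in local memory. */
#define BLOCK_WORDS 256

/* Stand-in for a DMA transfer (hypothetical helper, not an SDK call).
   A real core-side version would start the DMA here and wait for it later. */
static void block_copy(int *dst, const int *src, size_t words)
{
    memcpy(dst, src, words * sizeof *dst);
}

/* Sum `total` words from external memory `ext`, streaming them through
   two small local buffers (ping-pong / double buffering). */
long sum_streamed(const int *ext, size_t total)
{
    static int local[2][BLOCK_WORDS];
    size_t done = 0;   /* words already consumed */
    int cur = 0;       /* index of the buffer being processed */
    long sum = 0;

    /* Prefetch the first block. */
    size_t n = total < BLOCK_WORDS ? total : BLOCK_WORDS;
    block_copy(local[cur], ext, n);

    while (n > 0) {
        size_t next_off = done + n;
        size_t remaining = total - next_off;
        size_t next_n = remaining < BLOCK_WORDS ? remaining : BLOCK_WORDS;
        assert(next_n <= BLOCK_WORDS);

        /* Fetch the next block into the other buffer. With a real DMA
           this transfer would run while the loop below computes. */
        if (next_n > 0)
            block_copy(local[cur ^ 1], ext + next_off, next_n);

        /* Process the current block. */
        for (size_t i = 0; i < n; i++)
            sum += local[cur][i];

        done = next_off;
        n = next_n;
        cur ^= 1;      /* swap buffers */
    }
    return sum;
}
```

The point is that the core itself decides when each small transfer starts, which is why the host cannot drive it, and why replacing the transfer with a blocking copy serializes everything.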