A small update.
I coded what's necessary to convert the raw CSI packets into a proper RGB video stream that I can feed to the AXI Video DMA core.
This works nicely and it streams 30 fps video right into the upper 32M of the RAM.
Unfortunately, having the ARM read it from there is a bit of an issue.
1) Using /dev/mem to map those 32M yields horrible performance because caching and things like that are all disabled for this zone.
2) Even copying from normal memory to normal memory performs pretty badly. It takes 32 ms to copy 8 Mbytes on my rev1; on another Zynq board it takes 25 ms. So first, something is wrong in the DDR config of the Parallella, since it should be faster. And second, even once that first issue is fixed, the copy is still pretty slow: 25 ms just to copy a full HD frame in RAW once means you really can't do much at that rate.
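To put numbers on point 2: 8 Mbytes in 32 ms is about 250 MB/s, and in 25 ms about 320 MB/s. Below is a minimal sketch of how such a copy could be timed. It is not the code used for the figures above; the function names `copy_time_ms` and `throughput_mib_s` are made up for illustration, and the empty `asm` barrier assumes GCC or Clang.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Average time in milliseconds for one memcpy() of `size` bytes between
 * two ordinary heap buffers. Returns a negative value on failure.
 * (Hypothetical helper, not from the original post.) */
double copy_time_ms(size_t size, int iterations)
{
    if (size == 0 || iterations <= 0)
        return -1.0;

    uint8_t *src = malloc(size);
    uint8_t *dst = malloc(size);
    if (!src || !dst) {
        free(src);
        free(dst);
        return -1.0;
    }

    /* Touch both buffers first so page faults are not part of the timing. */
    memset(src, 0xA5, size);
    memset(dst, 0, size);

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < iterations; i++) {
        memcpy(dst, src, size);
        /* Compiler barrier (GCC/Clang) so the copy is not optimized away. */
        __asm__ volatile("" : : "r"(dst) : "memory");
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double ms = (t1.tv_sec - t0.tv_sec) * 1e3 +
                (t1.tv_nsec - t0.tv_nsec) / 1e6;
    free(src);
    free(dst);
    return ms / iterations;
}

/* Convert a copy of `size` bytes taking `ms` milliseconds to MiB/s. */
double throughput_mib_s(size_t size, double ms)
{
    return (size / (1024.0 * 1024.0)) / (ms / 1e3);
}
```

Calling `copy_time_ms(8u << 20, 10)` gives the average time for an 8 MiB copy, and `throughput_mib_s(8u << 20, 32.0)` turns the 32 ms figure into a rate. At 30 fps there are only about 33 ms per frame, so a single 25 ms copy already eats roughly three quarters of the budget.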
All in all, I don't think it's possible to get a faster frame rate purely from userspace ... it will need a kernel driver that properly controls the DMA.
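For reference, the pure-userspace approach from point 1 looks roughly like the sketch below: open the device node and mmap() the frame buffer region read-only. This is an illustration only, not the code actually used; `map_region` and `unmap_region` are hypothetical names, and the path and offset are parameters because the exact physical address is not given here.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Map `len` bytes starting at byte offset `offset` of `path`, read-only.
 * With path "/dev/mem" the offset is a physical address.
 * Returns NULL on failure. (Hypothetical helper for illustration.) */
void *map_region(const char *path, off_t offset, size_t len)
{
    long page = sysconf(_SC_PAGESIZE);

    /* mmap() requires a page-aligned offset and a non-zero length. */
    if (page <= 0 || len == 0 || offset % page != 0)
        return NULL;

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return NULL;

    void *p = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, offset);
    close(fd); /* the mapping stays valid after the descriptor is closed */

    return p == MAP_FAILED ? NULL : p;
}

/* Release a mapping obtained from map_region(). */
void unmap_region(void *p, size_t len)
{
    if (p)
        munmap(p, len);
}
```

A call would look like `map_region("/dev/mem", 0x3E000000, 32u << 20)`, where 0x3E000000 is an assumed base (the top 32M of 1 GiB of RAM starting at physical address 0) that has to be checked against the real memory map. It needs root, and every read through the returned pointer hits the uncached mapping, which is where the poor performance in point 1 comes from.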
Here's a small video where it's captured at about 2 fps:
http://people.osmocom.org/~tnt/parallel ... 150621.mp4
The tearing effect is because I can't even re-read one frame buffer fast enough before the DMA starts writing to it again N frames later ...
I should get a Porcupine next week. Once I've ported my stuff to it and tested that it works, I'll publish what I have ATM.