32K-point FFT on Parallella

I want to implement a 32K-point FFT on the Parallella.
Input: 16-bit samples at 200 Msps
Processing speed: latency < 300 ns
Please suggest a suitable structure and algorithm.