There has been a little discussion about this subject here:
http://forums.parallella.org/viewtopic.php?f=8&t=314
Essentially you have a number of options, depending on what type of FFT you want to fit into the limited chip real estate. I was looking for a replacement for the PC-class performance of the open-source FFTW3 library, which compiles on modern Intel CPUs and is also available for CUDA (on NVIDIA GPUs).
The ARM cores can compute FFTs themselves, possibly accelerated by NEON SIMD processing on one or both cores (I don't know if both cores have it). You could use the FPGA fabric directly, or Xilinx's FPGA IP blocks, which implement various efficient FFTs and are also available with the free WebPACK design tools. And of course, the 16 or 64 Epiphany cores on the Parallella should be usable to lightly parallelize FFT computations of various dimensionality, speed and size.
T.V.