Please note that JavaScript and style sheet are used in this website,
Due to unadaptability of the style sheet with the browser used in your computer, pages may not look as original.
Even in such a case, however, the contents can be used safely.
User's Guide: FFT & PARFFTIntroductionFFT is a library which computes Fast Fourier Transforms. The FFT is an algorithm for calculating Discrete Fourier Transforms (DFT). The DFT of the complex numbers is the complex numbers , given by
In this document, . The reverse transform is given by
In the above equations, there is no scaling on the forward transform, and the reverse transform is scaled by . An alternative convention is to scale the forward transform by and have no scaling on the reverse transform, or to scale both transforms by . User InterfaceSoftware packages for FFTs differ in functionality, and there is no standard user interface. The SX MathKeisan FFTs contain the same functionality as Cray® LibSci™ 3.1 and HP MLIB VECLIB [3]. The interfaces are the same except that the size of work arrays in MathKeisan LibSci™ differ from those in Cray's LibSci™. The FFT Man Pages have information on work array sizes. MathKeisan FFT supports these types of transforms:
They are all available for these data type mappings:
User interface information is available from several sources:
PARFFT
PARFFT is an OpenMP parallel version of FFT. It has the same
user interface as FFT, so any code that is linked to FFT can alternatively
be linked to PARFFT. If the environment variable
A Brief Sketch of the FFT AlgorithmIf we let
then the DFT is given by the matrixvector multiplication:
The floatingpoint operation count for the DFT is if the calculation of the powers of is ignored. In 1965 Cooley and Tukey [1] developed the FFT algorithm to calculate the DFT. When is a power of 2, the FFT algorithm has the greatly reduced operation count of . For general , the operational count is . Unknown to Cooley and Tukey at the time, the FFT algorithm was proposed earlier by Gauss in 1805, and the work of Gauss was published posthumously in 1866 [2]. A derivation of the CooleyTukey algorithm is shown in Van Loan [4]. To get an idea of the algorithm, consider the case where is a power of 2. Start by substituting into the DFT equation to get:
Now set . For , split into even components and odd components . Substituting into the equation above gives:
Use the symmetry property to get:
Next, split into the first entries and the second entries as follows:
For the equation use the symmetry properties and to get:
We have taken the original DFT of size and an operation count and replaced it with 2 DFTs of size and operation count . This can be continued for steps to replace the original DFT of length by DFTs of length 1. This is called a radix2 FFT. The algorithm can be extended beyond powers of 2, to general . The splitting process above corresponds to the Decimation In Time (DIT) CooleyTukey algorithm. There are a total of 8 splittings. The splitting used in the MathKeisan FFT library is the Decimation In Frequency (DIF) Stockham algorithm. A derivation of this algorithm is shown in Van Loan [4]. The DIF Stockham algorithm has an operation count equal to that of CooleyTukey, but the order in which the operations are performed is different. Multiple 1d FFTs and Vector LengthThe FFT calculation proceeds as a series of steps of radix , where the are generally small prime numbers from the factorization . The vector length during the calculation of the radix step will be ; at the radix step it will be the product ; at the radix step it will be:
If is small, then the vector lengths in the FFT calculation will be small. Some applications allow for the calculation of a number of 1d FFTs of the same length simultaneously. If the length of the FFTs is , and the number of FFTs is , then the vector length at the radix step of the calculation will be:
This larger vector length gives performance advantages for multiple 1d over 1d FFTs. The multiple 1d FFT can be written:
Here it is understood that and are 2dimensional arrays of complex numbers. 2d and 3d FFTsThe 2d FFT of the array of complex numbers , for and is the array of complex numbers given by
Now write:
Then is given by
Here the 2d FFT is replaced by the two multiple 1d FFTs of the equations for and above. By similar methods the 3d FFT can be replaced by three multiple 1d FFTs. Internally in MathKeisan subroutines 2d and 3d FFTs are calculated by calls to multiple 1d computational kernels. This is known as the rowcolumn algorithm. RealtoComplex and ComplextoReal FFTsThe realtocomplex FFT has symmetry properties which give about a two times improvement in operation count and memory usage over the complextocomplex FFT. To show the symmetry properties, write the DFT in terms of the trigonometric functions and to get:
If is real, then is complex conjugate symmetric about . This allows for storing only half of the values of . The value is refered to as the DC frequency. If is even, then the value is refered to as the Nyquist frequency. By the identities and it can be seen that and are real. For the realtocomplex FFT with real input vector of size , to obtain the speed and memory advantages, the output vector should be complex with size . For the complex to real FFTs if the input vector is complex with size , the output vector should be real with size . The reason for the size rather than is because the redundant zero parts of the DC and Nyquist frequencies are stored. For interface information on the size requirements for the real and complex input and output vectors, see the man pages. Effect of Leading DimensionsThe multiple 1d, 2d and 3d FFT subroutines have data stored in two or three dimensional arrays. If the leading dimension of an array is a high powers of 2, bank conflicts will result in poor performance. It is best to set the leading dimension of the arrays to an odd numbers to avoid bank conflicts. FFT Routine List
Further Reading
