FFT is a library which performs Fast Fourier Transforms. The FFT is an algorithm for calculating Discrete Fourier
Transforms (DFT). The DFT of the
complex
numbers
is the
complex numbers
, given by
In this document,
.
The reverse transform is given by
In the above equations, there is no scaling on the forward transform,
and the reverse transform is scaled by
.
An alternative and equally valid convention is to scale the forward transform by
and have no scaling on the reverse transform, or to scale both transforms by
.
The FFT in MathKeisan for Itanium® Processor Family (IPF) are from the
Intel® Math Kernel Library (MKL). They have the MKL FFT and DFT
interfaces. For more information, see the
MKL Manual. The default location is
/opt/MathKeisan/MKL/doc/mklman.pdf,
or the relative path
../../../MKL/doc/mklman.pdf.
The remainder of this chapter only pertains to the FFTs in MathKeisan for SX.
Software packages for FFTs differ greatly in functionality, and there is no standard user interface. The SX MathKeisan FFTs contain the same functionality as Cray® LibSci™ 3.1 and HP MLIB VECLIB [3]. The interfaces are the same except that the size of work arrays in MathKeisan LibSci™ differ from those in Cray's LibSci™. The FFT Man Pages tell how to compute work array sizes.
MathKeisan FFT supports these types of transforms:
They are all available for these data type mappings:
User interface information is available from several sources:
ZFFTMLT from the
command-line, type man zfftmlt. If you have difficulty viewing man
pages, please see the Man Pages Chapter.
PARFFT is an OpenMP parallel version of FFT. It has the same
user interface as FFT, so any code that is linked to FFT can alternatively
be linked to PARFFT. If the environment variable
OMP_NUM_THREADS is set to np, then
PARFFT will run on np threads. If
OMP_NUM_THREADS is unset, PARFFT will run on
mp threads, where mp is
the maximum number of processes in the resource group or node.
If we let
then the DFT is given by the matrix-vector multiplication:
If the calculation of the powers of
is ignored, then the floating-point operation count for the DFT is
.
In 1965 Cooley and Tukey
[1] developed the FFT
algorithm to calculate the DFT. When
is a power of 2, the FFT algorithm has the greatly reduced operation count of
. For general
, the operational count is
.
Unknown to Cooley and Tukey at the time, the FFT algorithm was proposed
earlier by Gauss in 1805, and
published posthumously in 1866
[2].
A derivation of the Cooley-Tukey algorithm is shown in Van
Loan [4]. To get an idea of the algorithm,
consider the case where
is a power of 2.
Start by substituting
into the DFT
equation to get:
Now set
. For
, split
into even components
and odd components
.
Substituting into the equation above gives:
Use the symmetry property
to get:
Next, split
into the first
entries and the second
entries as follows:
For the
equation use the symmetry properties
and
to get:
We have taken the original DFT of size
and an operation count
and replaced
it with 2 DFTs of size
and operation
count
. This can be continued for
steps to replace the original DFT of
length
by
DFTs of length 1. This is called a radix-2 FFT. The algorithm can be
extended beyond powers of 2, to general
.
The splitting process above corresponds to the Decimation In Time (DIT)
Cooley-Tukey algorithm. There are a total of 8 splittings. The splitting used
in the MathKeisan FFT library is the Decimation In Frequency (DIF) Stockham
algorithm. A derivation of this algorithm is shown in Van
Loan [4]. The DIF Stockham algorithm has an
operation count equal to that of Cooley-Tukey, but the order in which the
operations are performed is different.
The FFT calculation proceeds as a series of steps of radix
. The
are generally small prime numbers from the factorization
.
The vector length during the calculation of the radix
step will be
;
at the radix
step it will be the product
; at
the radix
step it will be:
If
is small, then the vector lengths in the FFT calculation will be small.
Some applications require a number of 1-d FFTs of the same length.
If the length of the FFTs is
,
and the number of FFTs is
,
then the multiple 1-d FFT can be written:
Here it is understood that
and
are 2-dimensional arrays of complex numbers. In a multiple 1-d FFT, the vector
length at the radix
step of the calculation will be:
This larger vector length gives performance advantages for multiple 1-d over 1-d FFTs.
The 2-d FFT of the array of complex numbers
, for
and
is
the array of complex numbers
given by
Now write:
Then
is given by
Here the 2-d FFT is replaced by the two multiple 1-d FFTs of the
equations for
and
above.
By similar methods the 3-d FFT can be replaced by three multiple 1-d FFTs. Internally in MathKeisan subroutines 2-d and 3-d FFTs are calculated by calls to multiple 1-d computational kernels. This is known as the row-column algorithm.
The real-to-complex FFT has symmetry properties which give
about a two times improvement in operation count and memory
usage over the complex-to-complex FFT. To show the symmetry properties, write the DFT in
terms of the trigonometric functions
and
to get:
If
is real, then
is complex conjugate symmetric about
. This allows for storing only
half of the values of
. The value
is refered to as the DC
frequency. If
is even, then the
value
is refered to as the
Nyquist frequency. By the identities
and
it can be seen that
and
are real.
For the real-to-complex FFT with real input vector of size
,
to obtain the speed and memory advantages, the output vector should be complex
with size
. For the complex to real
FFTs if the input vector is complex with size
, the output vector should be real
with size
. The reason for the size
rather than
is because the redundant zero parts
of the DC and Nyquist frequencies are stored. Some of the LibSci™ and VECLIB
interfaces take advantage of space saving, and some do not. For interface
information, see the man pages.
The multiple 1-d, 2-d and 3-d FFT subroutines have data stored in two or three dimensional arrays. If the leading dimension of an array is a high powers of 2, bank conflicts will result in poor performance. It is best to set the leading dimension of the arrays to an odd numbers to avoid bank conflicts.
| Name† | Prefixes | Description | |
|---|---|---|---|
| LibSci | ?FFT
| CC ZZ
| 1-d Complex-Complex FFT. |
?FFT
| C Z
| 1-d Complex-Complex FFT. | |
?FFT2
| C Z
| 1-d Complex-Complex FFT. | |
?FFT
| SC DZ
| 1-d Real-Complex FFT. | |
?FFT
| CS ZD
| 1-d Complex-Real FFT. | |
?FFT2
| RC DZ
| 1-d Real-Complex FFT. | |
?FFT2
| CR ZD
| 1-d Complex-Real FFT. | |
?FFT2D
| CC ZZ
| 2-d Complex-Complex FFT. | |
?FFT2D
| C Z
| 2-d Complex-Complex FFT. | |
?FFT2D
| SC DZ
| 2-d Real-Complex FFT. | |
?FFT2D
| CS ZD
| 2-d Complex-Real FFT. | |
?FFT3D
| CC ZZ
| 3-d Complex-Complex FFT. | |
?FFT3D
| C Z
| 3-d Complex-Complex FFT. | |
?FFT3D
| SC DZ
| 3-d Real-Complex FFT. | |
?FFT3D
| CS ZD
| 3-d Complex-Real FFT. | |
?FFTM
| CC ZZ
| Multiple 1-d Complex-Complex FFT. | |
M?FFT
| C Z
| Multiple 1-d Complex-Complex FFT. | |
?FFTMLT
| C Z
| Multiple 1-d Complex-Complex FFT. | |
?FFTM
| SC DZ
| Multiple 1-d Real-Complex FFT. | |
?FFTM
| CS ZD
| Multiple 1-d Complex-Real FFT. | |
?FFTMLT
| R D
| Multiple 1-d Real-Complex, Complex-Real FFT. | |
?FTFAX
| F D
| Generate FFT factors and Calculate Trig tables, Real. | |
?FTFAX
| C Z
| Generate FFT factors and Calculate Trig tables, Complex. | |
XERFFT
| FFT error handler. | ||
| VECLIB | ?1DFFT
| C Z
| 1-d Complex-Complex FFT, input/output Complex. |
?1DFFT
| S D
| 1-d Complex-Complex FFT, input/output Real. | |
?RC1FT
| C Z
| 1-d Real-Complex, Complex-Real FFT, input/output Complex. | |
?RC1FT
| S D
| 1-d Real-Complex, Complex-Real FFT, input/output Real. | |
?2DFFT
| C Z
| 2-d Complex-Complex FFT, input/output Complex. | |
?2DFFT
| S D
| 2-d Complex-Complex FFT, input/output Real. | |
?RC2FT
| C Z
| 2-d Real-Complex, Complex-Real FFT, input/output Complex. | |
?RC2FT
| S D
| 2-d Real-Complex, Complex-Real FFT, input/output Real. | |
?3DFFT
| C Z
| 3-d Complex-Complex FFT, input/output Complex. | |
?3DFFT
| S D
| 3-d Complex-Complex FFT, input/output Real. | |
?RC3FT
| C Z
| 3-d Real-Complex, Complex-Real FFT, input/output Complex. | |
?RC3FT
| S D
| 3-d Real-Complex, Complex-Real FFT, input/output Real. | |
?FFTS
| C Z
| Multiple 1-d Complex-Complex FFT, input/output Complex. | |
?FFTS
| S D
| Multiple 1-d Complex-Complex FFT, input/output Real. | |
?RCFTS
| C Z
| Multiple 1-d Real-Complex, Complex-Real FFT, input/output Complex. | |
?RCFTS
| S D
| Multiple 1-d Real-Complex, Complex-Real FFT, input/output Real. |