Given an 1D vector v∈Rm, the 1D DFT is F⋅v where F∈Rm×m is defined as
Fj,k=ωj×k,0≤j,k≤m−1
where ω is a complex number whose m's power ωm=1
ω=e2πmi=cos(m2π)+isin(m2π)
2D DFT of an m×m matrix V is FVF, which is equivalent to do 1D DFT on all columns independently, and then all the rows.
Applications
FFT is often used in frequency filtering (convolution theorem), compressions (same idea), and interestingly, solving Poisson's equations.
The Poisson's problems can be written as solving L1X+XL1=B and note that 2D FFT works similar to Eigen-decomposition, actually we have that L1=FΛF−1 being a eigen decomposition where
Fj,k=m+12sin(πm+1jk),Λj=2(1−cos(πm+1j))
we can substitute L1=FΛFT and transform the RHS B by B=FB′FT, and then we can solve the whole thing as
Each Veven and Vodd has degree 2m−1, and we note that (ωj+2m)2=ω2jωm=(ωj)2, hence we are only evaluating m/2 different points.
Therefore, we have the algorithm
FFT_DnC
"""Assume that m is power of 2"""defFFT(v,omega,m):ifm==1:return[v[0]]v_even=FFT(v[0::2],omega**2,m/2)v_odd=FFT(v[1::2],omega**2,m/2)omega_precomp=[omega**iforiinrange(m/2)]returnconcat(v_even+omega_precomp*v_odd,v_even-omega_precomp*v_odd,)
The time complexity, by Master's theorem, is T(m)=2T(m/2)+O(m)∈O(mlogm).
Parallel FFT
Consider the stack tree of the recursive calls, it has m leaves and logm depth. Therefore, instead of doing top-down recursion, we can do bottom-up computations, which gives a natural way for parallelizing the algorithm.
Then, consider the recursive call, note that the divide is done by odd/even, or the last digit of the binary. Now, consider m=24, the stack tree looks like
Note that from left to right, the leaves are ordered by reversed bit order. Therefore, what we can do is to compute FFT at each level, rewrite the elements vi=(F⋅v)bitreverse(i).
Then, for p processors, the computation takes 2mlogm/p, for communications, we need log(p) stages and mlog(p)/p words.