TWO IMPORTANT ISSUES FOR FFT SPECTROMETERS
June 28, 2003
Carl Heiles
ABSTRACT
We discuss two important issues for power spectra derived from FFTing a time
series. These are weighting, which eliminates spectral leakage, and "overlapping the
chunks", which eliminates the unnecessary increase in variance that results from
dividing the time series into chunks.
1. INTRODUCTION
Power spectra can be derived from a time series using the Fourier transform. For a time series
consisting of $N$ complex samples taken at intervals $t_{\rm smpl}$, the total time interval is $T = N\,t_{\rm smpl}$.
The power spectrum contains $N$ frequency channels covering total bandwidth ${\rm BW} = t_{\rm smpl}^{-1}$, with
the frequency separation between channels $\Delta f = {\rm BW}/N = T^{-1}$. With $N$ input points and $N$ output
points, the sampling is exactly Nyquist, i.e. the output transform contains exactly the number of
channels required to completely specify the power spectrum of the input data.
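As a quick numerical check of these relations, here is a minimal sketch in Python/NumPy (not part of the original memo; the sample interval and variable names are only illustrative):

    import numpy as np

    # Illustrative parameters (not from the memo).
    N = 1024                 # number of complex samples per transform
    t_smpl = 1.0e-6          # sample interval [s]
    T = N * t_smpl           # total time interval
    BW = 1.0 / t_smpl        # total bandwidth for complex sampling
    delta_f = BW / N         # channel separation, equal to 1/T

    # A complex time series: noise plus a weak monochromatic signal.
    rng = np.random.default_rng(0)
    x = rng.normal(size=N) + 1j * rng.normal(size=N)
    x += 0.5 * np.exp(2j * np.pi * 100 * delta_f * np.arange(N) * t_smpl)

    # N input points give N output channels: exactly Nyquist sampling of the spectrum.
    power = np.abs(np.fft.fft(x))**2
    freqs = np.fft.fftfreq(N, d=t_smpl)   # channel frequencies, spaced by 1/T
    assert np.isclose(freqs[1] - freqs[0], 1.0 / T)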
2. RESPONSE TO A FREQUENCY SPIKE
If we specify the frequency resolution in the traditional way that radio astronomers do,
namely as $f_{\rm res} =$ the FWHM of the response to a monochromatic signal (hereafter, a spike), then
the resolution $f_{\rm res}$ is less than the channel separation; specifically, we have
$\left( \frac{f_{\rm res}}{\Delta f} \right) = 0.89$.
Consider now the response to a spike. If the spike is exactly centered on a channel, then all the
power appears in a single channel and the spike is maximally recognizable; this is the optimum
case. On the other hand, in the worst case, when the spike lies halfway between two channels, the
response to the spike lies in more than one channel: most of the response lies in the two channels
surrounding the spike, but a nontrivial fraction of the power lies in more distant channels. When
signal/noise is low, this makes a spike hard to recognize in the presence of noise.
This spreading of the power across the spectrum is known as "leakage". The leakage power
is $\propto K \left( \frac{\sin x}{x} \right)^2$, where $x = \pi\,\delta\!f\,T$; here $\delta\!f$ is the offset frequency from the spike and $K$ is a
constant which depends on the exact location of the spike with respect to the computed channels
in the power spectrum. For example, if the spike lies exactly on a computed frequency channel
then there is no leakage whatsoever and $K = 0$; if the spike is halfway between two channels then the leakage is
maximal and symmetric with respect to the spike frequency, with $K = 1$; at other locations it is
nonsymmetric with respect to the spike.
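The leakage behavior is easy to reproduce numerically. The sketch below (my own illustration, using uniform weighting and an arbitrary $N$ and channel number) places a unit spike exactly on a channel and then halfway between two channels; in the latter case the two surrounding channels each receive only about 40% of the power, and roughly 6% leaks into more distant channels:

    import numpy as np

    N = 1024
    n = np.arange(N)

    def spike_power(channel_frac):
        """Power spectrum of a unit complex spike at (100 + channel_frac) channels,
        with uniform weighting, normalized so an on-channel spike gives 1.0."""
        x = np.exp(2j * np.pi * (100 + channel_frac) * n / N)
        return np.abs(np.fft.fft(x))**2 / N**2

    on_channel = spike_power(0.0)    # K = 0: all power in channel 100, no leakage
    half_way   = spike_power(0.5)    # K = 1: the worst case

    print(on_channel[100])                 # ~1.0
    print(half_way[100], half_way[101])    # ~0.405 each (the two surrounding channels)
    print(1.0 - half_way[97:104].sum())    # ~0.06: power leaked into distant channels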
The leakage is undesirable, particularly when there is interference. It can be reduced by
applying a window function $W_n$ to the $N$ points of the time series. Here we will compare two
window functions: uniform weighting, which weights all points equally, and the Welch window
function, for which the weighting is a parabola that falls to zero at the edges and is maximum in
the middle, i.e.

$$ W_n = 1 - \left( \frac{n - N/2}{N/2} \right)^2 \qquad (1) $$
Applying the weighting has the following effects:
1. Windowing can greatly reduce the leakage. Figure 1 shows the leakage for a spike halfway
between two channels, both for uniform weighting (solid line) and Welch weighting (dashed
line). The leakage is negligible for Welch weighting.
2. Windowing broadens the frequency resolution. For the Welch weighting, $\left( \frac{f_{\rm res}}{\Delta f} \right) = 1.17$,
which is about 1.3 times broader than for uniform weighting. This broadened resolution,
combined with the near absence of leakage, raises the power level of the two channels
surrounding the line. This makes it much easier to recognize a weak narrow line in the
presence of noise.
3. In effect, windowing reduces the number of points that contribute to the power spectrum,
because those time series points that are multiplied by small $W_n$ contribute little to the
power spectrum. This increases the variance. This can be fixed as discussed in the next
section.
4. A related effect is that the windowing makes all computed powers too small; one must
compensate by multiplying the power spectrum by the correction factor $1/\overline{W_n^2}$, the
reciprocal of the average of $W_n^2$ over the window. A short numerical sketch of the window
and of this normalization follows the list.
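The following is a minimal sketch (Python/NumPy, my own illustration) of the Welch window of equation (1) and of the normalization in item 4, assuming the correction factor is the reciprocal of the mean of $W_n^2$ over the window:

    import numpy as np

    def welch_window(N):
        """Welch window of equation (1): a parabola, zero at the edges, one in the middle."""
        n = np.arange(N)
        return 1.0 - ((n - N / 2) / (N / 2))**2

    def windowed_power_spectrum(x):
        """Power spectrum of one chunk, Welch-weighted and renormalized.

        Dividing by the mean of W**2 compensates for the power suppressed by the
        small window values near the chunk edges (item 4 above)."""
        W = welch_window(len(x))
        return np.abs(np.fft.fft(W * x))**2 / np.mean(W**2)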
3. VARIANCE IN THE POWER SPECTRUM
As explained above, windowing increases the variance of the power spectrum because, in
essence, fewer points contribute to the Fourier transform. Similarly, the frequency resolution gets
broader because the effective length of the time series is shorter. With fewer points, the variance
increases; with broader resolution, it decreases. The two effects work on the variance in opposite
directions, and they compensate.
Fig. 1. Response of the FFT power spectrum to a spike located halfway between two frequency
channels. The relative power scaling for the two cases of uniform and Welch weighting is exact.
The two plots are identical except for the vertical scale.

In real life, we do not deal with a single $N$-point time series, but rather we average many
of them together. The above discussion carries over to this case directly if one does the naively
obvious, as follows. We have $M$ consecutive chunks of $N$-point time series, so that we have a
continuous time series of length $NM$ with no gaps. We divide the $NM$ points into $M$ chunks.
The first chunk consists of $n$ running from 0 to $(N-1)$, the second from $N$ to $(2N-1)$, etc.
Each chunk is completely independent of the others. When we average the $M$ chunks together we
decrease the variance by this same factor $M$.

However, because of the weighting, some of the points in the time series don't contribute
much. Their contribution can be recovered by the following procedure. Instead of cutting the time
series of $NM$ points into $M$ chunks, cut it into $2M$ chunks. The first chunk consists of $n$ running
from 0 to $(N-1)$, the second from $N/2$ to $(3N/2 - 1)$, etc. Each chunk is not completely independent
of the others, so when we average the $2M$ chunks together we don't decrease the variance by the
full factor $2M$. However, we do decrease it by more than the factor $M$, so we gain with respect to
the naively obvious scheme. We can do even better by extending this procedure and cutting the
$NM$ points into $4M$ chunks. We define the term overlap to mean the number of overlaps: with
$2M$ chunks the overlap factor $O$ is 2, with $4M$ chunks it is 4, etc.
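The following is a minimal sketch of this overlapped averaging (Python/NumPy, my own illustration; the function name and defaults are not from the memo). Setting O = 1 reproduces the naively obvious nonoverlapping scheme:

    import numpy as np

    def averaged_power_spectrum(x, N, O=8):
        """Average Welch-windowed N-point power spectra of chunks overlapped by the factor O.

        Successive chunks start every N//O samples, so O = 1 gives independent,
        nonoverlapping chunks and O = 2 gives the half-chunk offsets described above."""
        n = np.arange(N)
        W = 1.0 - ((n - N / 2) / (N / 2))**2          # Welch window, equation (1)
        starts = range(0, len(x) - N + 1, N // O)
        acc = np.zeros(N)
        for s in starts:
            acc += np.abs(np.fft.fft(W * x[s:s + N]))**2
        # Normalize by the number of chunks averaged and by the mean squared window.
        return acc / (len(starts) * np.mean(W**2))

For a gap-free time series of length $NM$ this averages $M$ chunks when $O = 1$ and roughly $OM$ partially dependent chunks for larger $O$, which drives the variance reduction plotted in Figure 2.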
It is clear that overlapping helps when we apply a Welch window, or any other nonuniform
weighting. What is not so clear is that it also helps with uniform weighting. This occurs because,
with independent nonoverlapping chunks, not all sample pairs taken within an arbitrarily placed
time interval $T$ are able to contribute their systematic pairwise contribution to the Fourier sum:
pairs that straddle a chunk boundary never appear together in the same transform.
Fig. 2. The variance versus $O$, the number of overlaps. All points are normalized to the $M$-chunk
case with uniform weighting.
Figure 2 plots the variance versus the overlap factor $O$, both for uniform and Welch weighting.
We made this figure with a numerical simulation, so the plotted points have uncertainties, but
these are negligibly small. At $O = 1$, the curves show identical variance for the two weighting
schemes, showing that the effects of "fewer points" and broader resolution cancel exactly. At
$O = 8$, the curves are quite flat, so the case $O = 8$ is sufficiently close to perfection.
Compared to the $O = 1$ variances, the $O = 8$ variances are smaller by factors of 0.68 and
0.61 for the uniform and Welch weightings, respectively. The difference between these two factors
reflects the difference between the spectral resolutions: the broader resolution of the Welch window
gives the smaller variance. If one were to use $O = 1$ instead of performing the extra computations to achieve
$O = 8$, that would correspond to wasting integration time by these factors, because the
variance is inversely proportional to integration time: for the Welch case, matching the $O = 8$
variance at $O = 1$ would require about $1/0.61 \approx 1.6$ times as much integration time.
4. THE RECOMMENDED TECHNIQUE
It is absolutely essential to use weighting, because this reduces the spectral leakage to an
acceptably small degree. The Welch window function is a good choice and is widely used. I
personally haven't done any serious work on comparing different ones. The Berkeley FPGA effort
will use a Welch window.
Unless one is willing to accept a noise level that corresponds to less integration time than
was actually spent, it is essential to overlap the chunks. An overlap factor $O = 8$ reduces
the variance to nearly its minimum value. Smaller overlap factors, e.g. 4 and 2, are progressively less
acceptable. In my opinion, 4 is the minimum that should be tolerated.
This work was supported in part by NSF grant AST-0097417.