Astronomical Data Analysis Software and Systems VI
ASP Conference Series, Vol. 125, 1997
Gareth Hunt and H. E. Payne, eds.

Error and Bias in the STSDAS fitting Package
I. C. Busko
Space Telescope Science Institute, Baltimore, MD 21218, E-mail: busko@stsci.edu

Abstract. The fitting package in STSDAS (Space Telescope Science Data Analysis System) relies on two basic techniques (one linear and one non-linear) for fitting functions to data. In this work, the statistical properties of both the fitted function coefficients and their errors are examined using a Monte Carlo approach. Results show that both methods may generate biased coefficients (at the few percent level) and over- or underestimate error bars (at the 10 percent level), in particular from low signal-to-noise data.

1. Introduction

The two basic techniques used in the STSDAS fitting package are:

(i) Linear functions (Legendre and Chebyshev polynomials, cubic splines) are fitted by minimizing $\chi^2$, solving the normal equations by the Cholesky method. Function coefficient errors are computed directly from the covariance matrix. This technique is provided by the IRAF (Image Reduction and Analysis Facility) library curfit.

(ii) Any function, linear or non-linear in its coefficients, can be fitted by minimizing $\chi^2$ using the downhill simplex method, also known as amoeba. This method is unable by design to compute coefficient errors. The package relies on an independent technique (bootstrap resampling) to estimate these errors.

This work aims at assessing the reliability of coefficient estimates and error bars generated by both methods.

2. Method

Artificial data sets were generated from two functions: a 3rd degree power-series polynomial and a sum of two Planck functions. Each function was replicated 30 times, and in each replication an independent noise realization was added at a fixed signal-to-noise level. Noise types included pure Gaussian, pure Poisson, and a mixture of Gaussian plus Poisson. A separate, 30-element data set was created for each signal-to-noise level studied. Signal-to-noise spanned the range from 1000 down to 1 in logarithmic steps. Each individual data set was fitted by the appropriate fitting task and results were output to tables. The power-series polynomial was fitted by both linear and non-linear methods.

From the 30 measurements of each function coefficient $c_i$ and its estimated error $e_i$, $i = 1, 2, \ldots, 30$, three statistics were computed:

· The average of coefficient error estimates, $\bar{e} = \frac{1}{30} \sum_i e_i$
· The standard deviation of coefficient measurements, $\sigma_c = \sqrt{\sum_i (c_i - \bar{c})^2 / 29}$
· The average "residual" coefficient (measured minus true), $c_{bias} = \frac{1}{30} \sum_i (c_i - c_{true})$

© Copyright 1997 Astronomical Society of the Pacific. All rights reserved.
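
The three statistics above, applied to the linear technique described in the Introduction, can be illustrated with a minimal sketch. It is not the STSDAS or curfit code: it is a plain-Python stand-in that fits a power-series polynomial by solving the weighted normal equations with a Cholesky factorization and takes coefficient errors from the diagonal of the covariance matrix. The function names, the coefficient values, the abscissa grid, and the definition of signal-to-noise as mean signal over Gaussian noise sigma are all assumptions made for illustration.

```python
import math
import random


def cholesky(a):
    """Lower-triangular L with L * L^T = a (a symmetric positive definite)."""
    n = len(a)
    l = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(l[i][k] * l[j][k] for k in range(j))
            l[i][j] = math.sqrt(s) if i == j else s / l[j][j]
    return l


def cholesky_solve(l, b):
    """Solve (L * L^T) x = b by forward then back substitution."""
    n = len(b)
    z = [0.0] * n
    for i in range(n):
        z[i] = (b[i] - sum(l[i][k] * z[k] for k in range(i))) / l[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (z[i] - sum(l[k][i] * x[k] for k in range(i + 1, n))) / l[i][i]
    return x


def fit_polynomial(x, y, sigma, degree):
    """Weighted least squares; returns (coefficients, 1-sigma errors)."""
    m = degree + 1
    rows = [[xi ** k for k in range(m)] for xi in x]
    w = [1.0 / s ** 2 for s in sigma]
    # Normal equations: (A^T W A) c = A^T W y
    ata = [[sum(wi * r[j] * r[k] for wi, r in zip(w, rows))
            for k in range(m)] for j in range(m)]
    aty = [sum(wi * r[j] * yi for wi, r, yi in zip(w, rows, y))
           for j in range(m)]
    l = cholesky(ata)
    coeffs = cholesky_solve(l, aty)
    # Covariance matrix is the inverse of A^T W A; only its diagonal is needed.
    errors = []
    for j in range(m):
        unit = [1.0 if k == j else 0.0 for k in range(m)]
        errors.append(math.sqrt(cholesky_solve(l, unit)[j]))
    return coeffs, errors


def monte_carlo(true_coeffs, snr, n_rep=30, n_points=50, seed=0):
    """Return e_mean, sigma_c and c_bias for each coefficient."""
    rng = random.Random(seed)
    degree = len(true_coeffs) - 1
    x = [-1.0 + 2.0 * i / (n_points - 1) for i in range(n_points)]
    signal = [sum(c * xi ** k for k, c in enumerate(true_coeffs)) for xi in x]
    # Assumed S/N definition: mean signal divided by the Gaussian noise sigma.
    noise_sigma = (sum(signal) / n_points) / snr
    sigma = [noise_sigma] * n_points
    fits = []
    for _ in range(n_rep):
        y = [s + rng.gauss(0.0, noise_sigma) for s in signal]
        fits.append(fit_polynomial(x, y, sigma, degree))
    stats = []
    for j in range(degree + 1):
        c = [f[0][j] for f in fits]
        e = [f[1][j] for f in fits]
        c_mean = sum(c) / n_rep
        stats.append({
            "e_mean": sum(e) / n_rep,
            "sigma_c": math.sqrt(sum((ci - c_mean) ** 2 for ci in c) / (n_rep - 1)),
            "c_bias": sum(ci - true_coeffs[j] for ci in c) / n_rep,
        })
    return stats


if __name__ == "__main__":
    # Hypothetical 3rd degree polynomial, Gaussian noise only.
    for row in monte_carlo([10.0, 2.0, -1.0, 0.5], snr=10.0):
        print(row)
```

Comparing `e_mean` with `sigma_c`, and `c_bias` with zero, reproduces the kind of check described here for a single signal-to-noise level. Poisson or mixed noise and the non-linear fits are outside the scope of this sketch.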

If the coefficient computation is unbiased, $c_{bias}$ should be distributed around zero; any bias will shift the average of the $c_{bias}$ distribution away from zero. On the same grounds, if the coefficient error estimates are unbiased, that is, if they reflect the true population standard deviation, the difference $\bar{e} - \sigma_c$ should also be distributed around zero.

3. Results

Results are summarized in Figures 1 and 2. When applied to a power-series polynomial with Gaussian noise, both linear and non-linear techniques generated coefficients that were unbiased at the level of < 1% for any tested S/N ratio. Poisson noise introduced underestimation bias at a level of a few percent for low S/N (2–5) data. The largest bias was seen in the polynomial's zero-order term, induced perhaps by the non-symmetric nature of the Poisson distribution.

A different behavior was seen when fitting the strongly non-linear sum of two Planck functions. The dominant (in intensity) black-body had its temperature determined with almost no bias down to S/N of about 2. The weaker component, however, showed significant overestimation bias at low S/N. The dominant black-body's amplitude showed large (8–10%) underestimation bias, and the weaker amplitude did not give significant results. This can be interpreted as the result of both amplitudes being confused into a single one by the fitting algorithm, thus resulting in a biased estimate for one of them. The noise model seems to play no role in these results.

When fitted by the non-linear algorithm, the power-series polynomial errors showed a systematic underestimation of 10–20 percent, seemingly independent of noise type. The linear algorithm, on the other hand, delivers errors that differ from the "true" ones by amounts that depend on noise type, and might also depend on the coefficient values themselves. The double black-body function fit showed errors that lie close to the true ones in the Gaussian noise case, and a systematic overestimation of 20 percent in the Poisson noise case.

As a general rule, one might say that bias at the few percent level should be expected when fitting either non-linear functions or linear functions with Poisson noise, in particular with low S/N data. Also, error bars generated by both linear and non-linear methods are prone to under- or overestimate the "true" errors by as much as 10–20 percent, even with high S/N data. The details, though, seem to depend on the functional form and noise type.
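
The bootstrap resampling that supplies error bars for the non-linear fits can be sketched as follows. This is only an illustration of the general idea, under several assumptions: it resamples (x, y) pairs with replacement, which may differ from the resampling scheme the package actually uses, and it swaps in a closed-form straight-line fit as a stand-in for the downhill simplex minimizer so that the example stays short and dependency-free. The names `fit_line` and `bootstrap_errors` are hypothetical.

```python
import math
import random


def fit_line(x, y):
    """Closed-form unweighted least-squares line y = a + b*x; returns [a, b].

    Stand-in for a general minimizer: any function that maps (x, y) to a
    list of coefficients could be passed to bootstrap_errors instead.
    """
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return [my - b * mx, b]


def bootstrap_errors(x, y, fit, n_boot=200, seed=0):
    """Estimate coefficient errors as the scatter of refits to resampled data."""
    rng = random.Random(seed)
    n = len(x)
    samples = []
    for _ in range(n_boot):
        # Draw n data points with replacement (pair resampling, an assumption).
        idx = [rng.randrange(n) for _ in range(n)]
        samples.append(fit([x[i] for i in idx], [y[i] for i in idx]))
    errors = []
    for j in range(len(samples[0])):
        vals = [s[j] for s in samples]
        mean = sum(vals) / n_boot
        errors.append(math.sqrt(sum((v - mean) ** 2 for v in vals) / (n_boot - 1)))
    return errors


if __name__ == "__main__":
    rng = random.Random(1)
    x = [i / 49 for i in range(50)]
    y = [1.0 + 2.0 * xi + rng.gauss(0.0, 0.1) for xi in x]
    print("coefficients:", fit_line(x, y))
    print("bootstrap errors:", bootstrap_errors(x, y, fit_line))
```

Because the error estimate comes from the scatter of refits rather than from a covariance matrix, it can be attached to any fitting routine, which is why it suits a minimizer that provides no errors of its own.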



Figure 1. Power-series polynomial bias and errors. Ordinate is "estimated - true" coefficient residual (upper panel) and error bar (lower panel), on a relative (percent) scale, for the three lowest-order polynomial coefficients. Abscissa is signal-to-noise. Each point depicts the average of 30 measurements; the error bar depicts the standard deviation of the same 30 measurements. Solid symbols: non-linear algorithm. Open symbols: linear algorithm. Squares: Poisson noise. Triangles: Gaussian noise. Points below the zero line mean that the coefficient (or its error) is systematically underestimated; above the line it is overestimated.



Figure 2. Double Planck function bias and errors. The two temperatures and the largest black-body amplitude are depicted. See caption for Figure 1.