
Astronomical Data Analysis Software and Systems VI
ASP Conference Series, Vol. 125, 1997
Gareth Hunt and H. E. Payne, eds.

Heuristic Estimates of Weighted Binomial Statistics for Use in Detecting Rare Point Source Transients
James Theiler and Jeff Bloch
Astrophysics and Radiation Measurements Group, MS-D436, Los Alamos National Laboratory, Los Alamos, NM 87545
e-mail: jt@lanl.gov, jbloch@lanl.gov

Abstract. The ALEXIS¹ (Array of Low Energy X-ray Imaging Sensors) satellite (Priedhorsky et al. 1989) scans nearly half the sky every fifty seconds, and downlinks time-tagged photon data twice a day. The standard science quicklook processing produces over a dozen sky maps at each downlink, and these maps are automatically searched for potential transient point sources. We are interested only in highly significant point source detections, and, based on earlier Monte-Carlo studies (Roussel-Dupré et al. 1996), only consider $p < 10^{-7}$, which is about 5.2 "sigmas." Our algorithms are therefore required to operate on the far tail of the distribution, where many traditional approximations break down. Although an exact solution is available for the case of unweighted counts (Lampton 1994), the problem is more difficult in the case of weighted counts. We have found that a heuristic modification of a formula derived by Li & Ma (1983) provides reasonably accurate estimates of p-values for point source detections, even for very low p-value detections.

1. Introduction

We test the null hypothesis of no point source (assuming a spatially uniform background) at a given location by enclosing that location with a source kernel (whose area $A_{\rm src}$ is generally matched to the point-spread function of the telescope) and then enclosing the source kernel with a relatively large background annulus (area $A_{\rm bak}$). Given $N_{\rm src}$ photons in the source kernel, and $N_{\rm bak}$ photons in the background annulus, the problem is to determine whether the number of source photons is significantly larger than expected under the null.

More sensitive point source detection is obtained by weighting the photons to match the point-spread function of the telescope more precisely. Further enhancements are obtained for ALEXIS data by weighting also according to instantaneous scalar background rate, pulse height, and position on the detector. In this case, we ask whether the weighted sum of photons in the source region is significantly larger than expected under the null.

¹ http://nis-www.lanl.gov/nis-projects/alexis/


© Copyright 1997 Astronomical Society of the Pacific. All rights reserved.

2. Unweighted Counts

If counts are unweighted (i.e., all weights are equal), then it is possible to write down an exact, explicit expression for the probability of seeing $N_{\rm src}$ or more photons in the source kernel, assuming $N_{\rm total} = N_{\rm src} + N_{\rm bak}$ is fixed. This is a binomial distribution, and Lampton (1994) showed that the p-value associated with this observation can be expressed in terms of the incomplete beta function: $p = I_f(N_{\rm src}, N_{\rm bak} + 1)$, where $f = A_{\rm src}/(A_{\rm src} + A_{\rm bak})$. See also Alexandreas et al. (1993) for an alternative derivation of an equivalent expression (the assumption that $N_{\rm total}$ is fixed is replaced by a Bayesian argument).

If the count rate is high (or the exposure long), so that $N_{\rm src}$ and $N_{\rm bak}$ are large, then an appropriate Gaussian approximation can be used. In general, this involves finding a "signal" and dividing it by the square root of its variance.

Case 1u. The most straightforward approach uses the signal $N_{\rm src} - \alpha N_{\rm bak}$, where $\alpha = A_{\rm src}/A_{\rm bak}$. Under the null hypothesis, this signal has an expected value of zero and, if $N_{\rm src}$ and $N_{\rm bak}$ are treated as independent Poisson sources, a variance of $N_{\rm src} + \alpha^2 N_{\rm bak}$. To get a p-value, use

$$p = S\left(\frac{N_{\rm src} - \alpha N_{\rm bak}}{\sqrt{N_{\rm src} + \alpha^2 N_{\rm bak}}}\right), \qquad (1)$$

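where $S(s) = \frac{1}{2}\left(1 - \mathrm{erf}(s/\sqrt{2})\right) = \frac{1}{2}\,\mathrm{erfc}(s/\sqrt{2})$ converts "sigmas" of significance into a one-tailed p-value.

Case 2u. An alternative approach, suggested by Li & Ma (1983), treats the sum $N_{\rm total} = N_{\rm src} + N_{\rm bak}$ as fixed, so that $N_{\rm src}$ and $N_{\rm bak}$ are binomially distributed. In particular, choose the signal $N_{\rm src} - f N_{\rm total}$, and note that the variance of $N_{\rm src}$ is given by $f(1 - f)N_{\rm total}$, while the variance of $N_{\rm total}$ is by definition zero. In that case, with $\alpha = A_{\rm src}/A_{\rm bak} = f/(1-f)$,

$$p = S\left(\frac{N_{\rm src} - f N_{\rm total}}{\sqrt{f(1 - f)N_{\rm total}}}\right) = S\left(\frac{N_{\rm src} - \alpha N_{\rm bak}}{\sqrt{\alpha\,(N_{\rm src} + N_{\rm bak})}}\right). \qquad (2)$$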
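The exact binomial tail and the two Gaussian estimates above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, $S(s)$ is taken to be the standard one-tailed Gaussian tail $\frac{1}{2}\mathrm{erfc}(s/\sqrt{2})$, and the incomplete beta function $I_f(N_{\rm src}, N_{\rm bak}+1)$ is evaluated through its equivalent binomial tail sum, which is practical only for modest counts.

```python
import math

def sigma_to_p(s):
    """One-tailed Gaussian p-value S(s) = erfc(s / sqrt(2)) / 2."""
    return 0.5 * math.erfc(s / math.sqrt(2.0))

def p_exact_binomial(n_src, n_bak, f):
    """Exact tail P(X >= n_src) for X ~ Binomial(n_src + n_bak, f).

    This sum equals the regularized incomplete beta function
    I_f(n_src, n_bak + 1). Intended for modest counts only: very large
    binomial coefficients cannot be converted to floats.
    """
    n_total = n_src + n_bak
    return sum(math.comb(n_total, k) * f**k * (1.0 - f)**(n_total - k)
               for k in range(n_src, n_total + 1))

def p_case_1u(n_src, n_bak, alpha):
    """Eq. (1): source and background treated as independent Poisson counts."""
    s = (n_src - alpha * n_bak) / math.sqrt(n_src + alpha**2 * n_bak)
    return sigma_to_p(s)

def p_case_2u(n_src, n_bak, alpha):
    """Eq. (2): total count treated as fixed (binomial variance)."""
    s = (n_src - alpha * n_bak) / math.sqrt(alpha * (n_src + n_bak))
    return sigma_to_p(s)

if __name__ == "__main__":
    # Hypothetical example: f = 0.1, so alpha = f / (1 - f) = 1/9.
    f = 0.1
    alpha = f / (1.0 - f)
    for name, p in [("exact", p_exact_binomial(25, 75, f)),
                    ("case 1u", p_case_1u(25, 75, alpha)),
                    ("case 2u", p_case_2u(25, 75, alpha))]:
        print(f"{name:8s} p = {p:.3e}")
```

Both Gaussian estimates require at least one photon in total; with zero counts the denominators vanish.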

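Case 3u. By looking at a ratio of Poisson likelihoods, Li & Ma (1983) also derived a more complicated equation

$$p = S\left(\sqrt{2\left[N_{\rm src}\ln\left(N_{\rm src}/\hat N_{\rm src}\right) + N_{\rm bak}\ln\left(N_{\rm bak}/\hat N_{\rm bak}\right)\right]}\right), \qquad (3)$$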

where $\hat N_{\rm src} = f N_{\rm total}$ and $\hat N_{\rm bak} = (1 - f)N_{\rm total}$. This is considerably more accurate than Eqs. (1,2) when $N_{\rm src}$ and $N_{\rm bak}$ are not large, but is still just an approximation to Lampton's exact formula. Abramowitz & Stegun (1972) provide several approximations to the incomplete beta function, one of which (25.5.19) is an asymptotic series whose first term looks very much like the Li & Ma formula. The left panel of Figure 1 compares these cases, along with the Lampton (1994) formula, using a Monte-Carlo simulation.

3. Weighted Counts

Define $W_{\rm src} = \sum_{i\in{\rm src}} w_i$ and $Q_{\rm src} = \sum_{i\in{\rm src}} w_i^2$, where $w_i$ is the weight of the $i$-th photon. Notice that when all the weights are equal to one, we have $Q_{\rm src} = W_{\rm src} = N_{\rm src}$ and $Q_{\rm bak} = W_{\rm bak} = N_{\rm bak}$. Note also that $W_{\rm src}/N_{\rm src} = \langle w_i \rangle_{i\in{\rm src}}$, and that $Q_{\rm src}/W_{\rm src} = \langle w_i^2 \rangle / \langle w_i \rangle$. We do not make any assumptions about weights averaging or summing to unity. (We define $W_{\rm bak}$ and $Q_{\rm bak}$ similarly.)

[Figure 1: two panels, "Unweighted" (left; curves for Case 1u, Case 2u, Case 3u, and Lampton) and "Weighted" (right; curves for Case 1w, Case 2w, and Case 3w). Vertical axis: overoccurrence, logarithmic, 0.01 to 100. Horizontal axis: significance, $-\log_{10} p$, 0 to 7.]

Figure 1. Results of Monte-Carlo experiments with $N = 100$ photons, with $f = 0.1$, and with $T = 10^7$ trials. For the weighted experiment, $N$ weights were uniformly chosen from zero to one, and assigned to the $N$ photons. The photons were randomly assigned to the source kernel or background annulus with probabilities $f$ and $1 - f$ respectively. Values of $W_{\rm src}$, $W_{\rm bak}$, $Q_{\rm src}$, and $Q_{\rm bak}$ were computed, and a p-value was computed using the formulas for the three cases. As the p-values were computed, a cumulative histogram $H(p)$ was built indicating the number of times a p-value less than $p$ was observed. Since we expect $H(p) = pT$, we plotted $H(p)/pT$ as the frequency of "overoccurrence" of that p-value. The plot shows this overoccurrence as a function of "significance," defined by $-\log_{10} p$.

Generalizing Case 1u, we define the signal as $W_{\rm src} - \alpha W_{\rm bak}$; then, treating source and background as independent, we can write the variance as $Q_{\rm src} + \alpha^2 Q_{\rm bak}$. We can similarly generalize Case 2u and obtain:

Case 1w:
$$p = S\left(\frac{W_{\rm src} - \alpha W_{\rm bak}}{\sqrt{Q_{\rm src} + \alpha^2 Q_{\rm bak}}}\right). \qquad (4)$$

Case 2w:
$$p = S\left(\frac{W_{\rm src} - \alpha W_{\rm bak}}{\sqrt{\alpha\,(Q_{\rm src} + Q_{\rm bak})}}\right). \qquad (5)$$
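As a rough illustration of Eqs. (4) and (5), the sketch below computes the weighted sums $W$ and $Q$ from lists of per-photon weights and then forms the two Gaussian estimates. The function names and the list-of-weights interface are assumptions made for this example; $S(s)$ is taken to be the standard one-tailed Gaussian tail.

```python
import math

def sigma_to_p(s):
    """One-tailed Gaussian p-value S(s) = erfc(s / sqrt(2)) / 2."""
    return 0.5 * math.erfc(s / math.sqrt(2.0))

def weighted_sums(weights):
    """Return (W, Q): the sum of the weights and the sum of their squares."""
    return sum(weights), sum(w * w for w in weights)

def p_case_1w(src_weights, bak_weights, alpha):
    """Eq. (4): source and background treated as independent."""
    w_src, q_src = weighted_sums(src_weights)
    w_bak, q_bak = weighted_sums(bak_weights)
    s = (w_src - alpha * w_bak) / math.sqrt(q_src + alpha**2 * q_bak)
    return sigma_to_p(s)

def p_case_2w(src_weights, bak_weights, alpha):
    """Eq. (5): weighted generalization of the fixed-total (binomial) case."""
    w_src, q_src = weighted_sums(src_weights)
    w_bak, q_bak = weighted_sums(bak_weights)
    s = (w_src - alpha * w_bak) / math.sqrt(alpha * (q_src + q_bak))
    return sigma_to_p(s)

if __name__ == "__main__":
    # Hypothetical weights; alpha = A_src / A_bak is set to 1/9 for illustration.
    src = [0.9, 0.8, 0.95, 0.7, 0.85]
    bak = [0.2, 0.5, 0.4, 0.3, 0.6, 0.1, 0.35, 0.45]
    print(p_case_1w(src, bak, 1.0 / 9.0), p_case_2w(src, bak, 1.0 / 9.0))
```

With all weights equal to one, $W = Q = N$ and both functions reduce to the unweighted Eqs. (1) and (2), which gives a convenient sanity check.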

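Case 3w: It is not as straightforward to generalize Eq. (3), but we have tried the following heuristic:

$$p = S\left(\sqrt{\frac{2\,W_{\rm total}}{Q_{\rm total}}\left[W_{\rm src}\ln\left(W_{\rm src}/\hat W_{\rm src}\right) + W_{\rm bak}\ln\left(W_{\rm bak}/\hat W_{\rm bak}\right)\right]}\right), \qquad (6)$$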

where $\hat W_{\rm src} = f W_{\rm total}$ and $\hat W_{\rm bak} = (1 - f)W_{\rm total}$. The Monte-Carlo results shown in Figure 1 indicate that this heuristic provides reasonably accurate p-values even for very small values of $p$.
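The heuristic of Eq. (6) and a scaled-down version of the overoccurrence experiment described in the Figure 1 caption can be sketched as follows. Several choices here are assumptions rather than statements from the text: $W_{\rm total} = W_{\rm src} + W_{\rm bak}$ and $Q_{\rm total} = Q_{\rm src} + Q_{\rm bak}$ (by analogy with $N_{\rm total}$), a negative significance is assigned when the source sum falls below its expectation, $p = 1$ is returned when there is no data, and the trial count is far smaller than the $10^7$ used for the figure. All function names are hypothetical.

```python
import math
import random

def sigma_to_p(s):
    """One-tailed Gaussian p-value S(s) = erfc(s / sqrt(2)) / 2."""
    return 0.5 * math.erfc(s / math.sqrt(2.0))

def p_case_3w(w_src, q_src, w_bak, q_bak, f):
    """Heuristic of Eq. (6), given the weighted sums W and Q."""
    w_tot = w_src + w_bak          # assumption: totals are simple sums
    q_tot = q_src + q_bak
    if w_tot <= 0.0 or q_tot <= 0.0:
        return 1.0                 # assumption: no data, no evidence
    w_src_hat = f * w_tot
    w_bak_hat = (1.0 - f) * w_tot

    def x_log_ratio(x, x_hat):
        # x * ln(x / x_hat), with the limiting value 0 at x = 0
        return x * math.log(x / x_hat) if x > 0.0 else 0.0

    bracket = x_log_ratio(w_src, w_src_hat) + x_log_ratio(w_bak, w_bak_hat)
    s = math.sqrt(max(0.0, 2.0 * (w_tot / q_tot) * bracket))
    if w_src < w_src_hat:
        s = -s                     # assumption: deficits get negative sigma
    return sigma_to_p(s)

def overoccurrence(p_threshold, n_photons=100, f=0.1, trials=20000, seed=0):
    """Estimate H(p) / (p T) for Case 3w under the null hypothesis."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        w_src = q_src = w_bak = q_bak = 0.0
        for _ in range(n_photons):
            w = rng.random()               # weight uniform on [0, 1)
            if rng.random() < f:           # photon lands in source kernel
                w_src += w
                q_src += w * w
            else:                          # photon lands in background
                w_bak += w
                q_bak += w * w
        if p_case_3w(w_src, q_src, w_bak, q_bak, f) < p_threshold:
            hits += 1
    return hits / (p_threshold * trials)

if __name__ == "__main__":
    for p in (0.1, 0.01):
        print(p, overoccurrence(p))
```

A value near one indicates that the reported p-values occur about as often as they should under the null; probing the far tail near $p \sim 10^{-7}$ would need many more trials than this sketch uses.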

4. Limit of Precisely Known Background

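An interesting limit occurs as the background annulus becomes large. Here, $A_{\rm bak} \to \infty$, and the expected backgrounds $\hat N_{\rm src}$, $\hat W_{\rm src}$, etc. are all precisely known. For the unweighted counts, the exact p-value can be expressed in terms of the incomplete gamma function: $p = 1 - \Gamma(N_{\rm src}, \hat N_{\rm src})/\Gamma(N_{\rm src})$. The Gaussian estimate of significance is straightforward both for the unweighted case,² $p = S\left((N_{\rm src} - \hat N_{\rm src})/\sqrt{\hat N_{\rm src}}\right)$, and for the weighted case: $p = S\left((W_{\rm src} - \hat W_{\rm src})/\sqrt{\hat Q_{\rm src}}\right)$. In this limit, Eq. (6) becomes

$$p = S\left(\sqrt{\frac{2\,\hat W_{\rm src}}{\hat Q_{\rm src}}\left[W_{\rm src}\ln\left(W_{\rm src}/\hat W_{\rm src}\right) - \left(W_{\rm src} - \hat W_{\rm src}\right)\right]}\right). \qquad (7)$$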
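The known-background limit lends itself to a compact sketch. Below, the exact unweighted p-value is evaluated as a Poisson tail, $P(X \ge N_{\rm src})$ for $X \sim \mathrm{Poisson}(\hat N_{\rm src})$, which equals $1 - \Gamma(N_{\rm src}, \hat N_{\rm src})/\Gamma(N_{\rm src})$ for integer $N_{\rm src}$; the Gaussian estimate and Eq. (7) follow. Function names are hypothetical, the expected background is assumed to be strictly positive, and the negative sign for deficits is an assumption of this example rather than something stated in the text.

```python
import math

def sigma_to_p(s):
    """One-tailed Gaussian p-value S(s) = erfc(s / sqrt(2)) / 2."""
    return 0.5 * math.erfc(s / math.sqrt(2.0))

def poisson_pmf(k, mu):
    """Poisson probability of exactly k counts, evaluated in log space."""
    return math.exp(k * math.log(mu) - mu - math.lgamma(k + 1))

def p_exact_known_background(n_src, n_hat):
    """P(X >= n_src) for X ~ Poisson(n_hat), with integer n_src and n_hat > 0."""
    if n_src <= 0:
        return 1.0
    if n_src <= n_hat:
        # Bulk of the distribution: use the complement of the lower sum.
        return 1.0 - sum(poisson_pmf(k, n_hat) for k in range(n_src))
    # Upper tail: terms decrease monotonically, so sum until negligible.
    total, term, k = 0.0, poisson_pmf(n_src, n_hat), n_src
    while term > 1e-17 * total:
        total += term
        k += 1
        term *= n_hat / k
    return total

def p_gauss_known_background(w_src, w_hat, q_hat):
    """Gaussian estimate; use w = q = n for unweighted counts."""
    return sigma_to_p((w_src - w_hat) / math.sqrt(q_hat))

def p_eq7(w_src, w_hat, q_hat):
    """Eq. (7): the Case 3w heuristic in the known-background limit."""
    log_term = w_src * math.log(w_src / w_hat) if w_src > 0.0 else 0.0
    bracket = log_term - (w_src - w_hat)
    s = math.sqrt(max(0.0, 2.0 * (w_hat / q_hat) * bracket))
    if w_src < w_hat:
        s = -s  # assumption: deficits map to negative significance
    return sigma_to_p(s)

if __name__ == "__main__":
    # Hypothetical unweighted example: 25 counts on an expected 10.
    print(p_exact_known_background(25, 10.0))
    print(p_gauss_known_background(25.0, 10.0, 10.0))
    print(p_eq7(25.0, 10.0, 10.0))
```

Summing the upper tail directly, rather than subtracting a lower sum from one, avoids cancellation when the p-value is very small, which is the regime of interest here.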

Marshall (1994) has suggested an empirical formula

$$p = S\left(\frac{W_{\rm src} - \hat W_{\rm src} + \beta}{\sqrt{\hat Q_{\rm src} + \beta}}\right),$$

where $\beta = 0.7\,\hat Q_{\rm src}/\hat W_{\rm src}$, which produced reasonable results in his simulations, but does not appear well suited for p-values at the far tail of the distribution.

Acknowledgments. This work was supported by the United States Department of Energy.

References

Abramowitz, M., & Stegun, I. A. 1972, Handbook of Mathematical Functions (Dover, New York), 945
Alexandreas, D. E., et al. 1993, Nucl. Instr. Meth. Phys. Res. A328, 570
Babu, G. J., & Feigelson, E. D. 1996, Astrostatistics (Chapman & Hall, London), 113
Lampton, M. 1994, ApJ, 436, 784
Li, T.-P., & Ma, Y.-Q. 1983, ApJ, 272, 317
Marshall, H. L. 1994, in ASP Conf. Ser., Vol. 61, Astronomical Data Analysis Software and Systems III, ed. D. R. Crabtree, R. J. Hanisch & J. Barnes (San Francisco: ASP), 403
Priedhorsky, W. C., Bloch, J. J., Cordova, F., Smith, B. W., Ulibarri, M., Chavez, J., Evans, E., Siegmund, O. H. W., Marshall, H., & Vallerga, J. 1989, in Berkeley Colloquium on Extreme Ultraviolet Astronomy, Berkeley, CA, vol 2873, 464
Roussel-Dupré, D., Bloch, J. J., Theiler, J., Pfafman, T., & Beauchesne, B. 1996, in ASP Conf. Ser., Vol. 101, Astronomical Data Analysis Software and Systems V, ed. G. H. Jacoby & J. Barnes (San Francisco: ASP), 112

² Babu & Feigelson (1996) incorrectly suggest $p = S\left((N_{\rm src} - \hat N_{\rm src})/\sqrt{N_{\rm src} + \hat N_{\rm src}}\right)$.