Statistics for Astronomy III: Non-parametric statistics, Error Analysis and Confidence Intervals
B. Nikolic b.nikolic@mrao.cam.ac.uk
Astrophysics Group, Cavendish Laboratory, University of Cambridge http://www.mrao.cam.ac.uk/~bn204/lecture/astrostats.html

October 2008


Goals for this Lecture
- Non-parametric statistics
  - Kolmogorov-Smirnov test
  - U-test (Mann-Whitney-Wilcoxon)
- Likelihood versus posterior probability
- Basic error analysis
- Astronomical Example: CCD signal to noise
- Confidence intervals
- Maximum Likelihood


Outline
- Non-parametric statistics
  - Kolmogorov-Smirnov test
  - U-test (Mann-Whitney-Wilcoxon)
- Likelihood versus posterior probability
- Basic error analysis
- Astronomical Example: CCD signal to noise
- Confidence intervals
- Maximum Likelihood


Kolmogorov-Smirnov test


- Compare the observed cumulative distribution with either a theoretical distribution or another observed distribution
- Tests an observation against any distribution (i.e., not just the usual Normal, Poisson, etc.)
- No assumption of normally distributed errors
- Can test an observation against a control observation
- Cannot be used if you need to fit parameters from the observations

How to apply:
- Compute the two cumulative functions C1(x) and C2(x)
- Find the maximum difference between them: D = max |C1(x) - C2(x)|
- The value of D together with n1 and n2 gives the significance level of the test -- see tables
- scipy.stats.ks_2samp does it all for you (see the sketch below)

The Cramér-von Mises test is a variation on the theme that calculates the integrated squared difference between the CDFs.
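A minimal sketch of the two-sample test with scipy.stats.ks_2samp; the samples below are synthetic stand-ins (e.g., for central-pixel and off-centre-pixel fluxes), not data from the lecture:

```python
# Two-sample Kolmogorov-Smirnov test with scipy.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
sample_1 = rng.normal(loc=0.0, scale=1.0, size=500)
sample_2 = rng.normal(loc=0.2, scale=1.0, size=500)

# Returns D = max |C1(x) - C2(x)| and the p-value of the null
# hypothesis that both samples are drawn from the same distribution.
d_stat, p_value = stats.ks_2samp(sample_1, sample_2)
print(f"D = {d_stat:.3f}, p = {p_value:.3g}")
```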


SDSS+FIRST example

[Figure: "mean" and "median" stacked images; axes RA/px and DEC/px (1.8 arcsec/px), colour scale in Jy/Beam]


SDSS+FIRST example

Left: comparison between the observed distributions for the central pixel and an off-centre pixel. It is obvious they are different; K-S: (1 - p) ~ 10^-22.

[Figure: cumulative distributions C against F (mJy)]


SDSS+FIRST example: side-lobes?

Can we detect the side-lobes that are seen in the median image?

[Figure: "mean" and "median" stacked images; axes RA/px and DEC/px (1.8 arcsec/px), colour scale in Jy/Beam]


SDSS+FIRST example: side-lobes?

Can we detect the side-lobes that are seen in the median image?

Left: comparison between the observed distributions for a pixel off centre but on the vertical of the image and a completely off-centre pixel.

[Figure: cumulative distributions C against F (mJy)]

K-S: (1 - p) ~ 0.1, i.e., reject with 90% confidence.


SDSS+FIRST example: side-lobes? Check

OK, maybe all pixels are different? Compare two pixels both on the vertical:

Left: comparison between the observed distributions for a pixel off centre but on the vertical.

[Figure: cumulative distributions C against F (mJy)]

K-S: (1 - p) ~ 0.7. Could only reject with 30% confidence, which is not at all significant.


U-test (Mann-Whitney-Wilcoxon)

Non-parametric test that allows testing of direction:
- E.g.: does observation set a_i have significantly higher values than observation set b_i?
- Principle of operation: combine a_i and b_i into a single set of observations and compute its CDF. Figure out where the samples originally from a_i come in this CDF compared to b_i.

How to apply (see the sketch below):
- a_i has m observations and b_i has n
- Construct the rank of a_i and b_i together, keeping track of where the samples from a_i ended up in the total ranking: x_j
- Calculate:
  U = mn + m(m + 1)/2 - Σ_j x_j   (1)
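In practice scipy provides this test directly; a minimal sketch on synthetic data (sample values are illustrative):

```python
# One-sided Mann-Whitney U test: are the values in sample_a
# significantly higher than those in sample_b?
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
sample_a = rng.normal(loc=0.5, scale=1.0, size=40)   # m observations
sample_b = rng.normal(loc=0.0, scale=1.0, size=35)   # n observations

u_stat, p_value = stats.mannwhitneyu(sample_a, sample_b,
                                     alternative="greater")
print(f"U = {u_stat:.1f}, p = {p_value:.3g}")
```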


U-test illustration
Red dots are from the sample of an off-centre pixel, blue dots from the sample of an on-centre pixel (many fewer galaxies used so the points don't overlap).

[Figure: combined CDF C against F (mJy), points coloured by originating sample]


U-test use

- For n, m less than 20, use tables
- Otherwise U approximately follows:
  P(U) ~ N(µ, σ)   (2)
  µ = mn/2   (3)
  σ² = mn(m + n + 1)/12   (4)
- So we can use the standard normal distribution CDF to compute confidence levels (a sketch follows below)
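A sketch of the large-sample approximation of eqs. (2)-(4); the helper name is mine, not from the lecture:

```python
# Normal approximation to the distribution of U for m, n > ~20.
import math
from scipy.stats import norm

def u_test_significance(u, m, n):
    """One-sided tail probability of U under the null hypothesis."""
    mu = m * n / 2.0
    sigma = math.sqrt(m * n * (m + n + 1) / 12.0)
    return norm.sf((u - mu) / sigma)
```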


Outline
- Non-parametric statistics
  - Kolmogorov-Smirnov test
  - U-test (Mann-Whitney-Wilcoxon)
- Likelihood versus posterior probability
- Basic error analysis
- Astronomical Example: CCD signal to noise
- Confidence intervals
- Maximum Likelihood


Intro

Likelihood: the probability of the data given the model (parameter), e.g., P({x_i}|θ). Example: if the experiment is described by a Poisson distribution with λT = 10, what is the probability of measuring 11 photons?

Posterior PDF: the probability of the model given the data, and any prior information, P(θ|I, {x_i}). Example: if we measure 11 photons, and know that the governing distribution is Poisson, what is the probability distribution of λ?

Bayes theorem: the connection between these two concepts:
  P(θ|I, {x_i}) P({x_i}) = P({x_i}|I, θ) P(θ)   (5)
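The likelihood in the example above is just the Poisson probability mass function; a quick check with scipy:

```python
# Likelihood of measuring 11 photons when the expected count is
# lambda*T = 10 for a Poisson process.
from scipy.stats import poisson

print(poisson.pmf(11, mu=10))  # ~0.114
```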


Elementary error analysis

Trying to determine a model parameter θ given some observed set {x_i}:

- Probability theory and the model give: P({x_i}|θ)
- Bayes theorem and priors (I) give P(θ|I, {x_i}) - this is what we want
- But for now let's assume P(θ|{x_i}) ∝ P({x_i}|θ)
- Given {x_i} we can compute any statistics f_1({x_i}), f_2({x_i}), ..., hence P(f_1|θ), etc.
- In this lecture we examine P(f_k|θ) for some common probability distributions and statistics, and think about what we know about P(θ|f_k)


Outline
- Non-parametric statistics
  - Kolmogorov-Smirnov test
  - U-test (Mann-Whitney-Wilcoxon)
- Likelihood versus posterior probability
- Basic error analysis
- Astronomical Example: CCD signal to noise
- Confidence intervals
- Maximum Likelihood


Normal Distribution and error on the mean

- X is a normally distributed random variable with mean µ and standard deviation σ; {x_i} are n samples from X
- Easy to show (e.g., via the characteristic function) that the sample mean
  m_1 = (1/n) Σ_{i=1}^{n} x_i
  is distributed as N(µ, σ/√n)   (6)
- The error of m_1 as an estimate of µ is ±σ/√n
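A quick numerical sanity check of eq. (6); the values of µ, σ and n are illustrative:

```python
# The mean of n draws from N(mu, sigma) scatters as sigma/sqrt(n).
import numpy as np

rng = np.random.default_rng(1)
mu, sigma, n = 5.0, 2.0, 100
means = rng.normal(mu, sigma, size=(10_000, n)).mean(axis=1)
print(means.std(), sigma / np.sqrt(n))  # both ~0.2
```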


Outline
- Non-parametric statistics
  - Kolmogorov-Smirnov test
  - U-test (Mann-Whitney-Wilcoxon)
- Likelihood versus posterior probability
- Basic error analysis
- Astronomical Example: CCD signal to noise
- Confidence intervals
- Maximum Likelihood


CCD signal to noise calculation I

Model description:
- s: we are observing a source of strength s
- b: there are also `background' photons arriving at rate b; for an optical system this background is not stray light, but sky and/or zodiacal scattered light coming from the same direction as the source
- r: there is noise associated with the readout of the CCD, which is independent of exposure time
- t: exposure time


CCD signal to noise calculation II

- For a photon-counting device at short wavelengths, photon noise follows Poisson statistics
  - At longer wavelengths photons are not independent and so Poisson statistics do not apply
- Typically when imaging at >X-ray wavelengths there are sufficiently many photons that the normal distribution approximation applies, with variance
  µ_2 = (s + b)t   (7)
  leading to a quoted error of √((s + b)t)
  - Note that spectroscopy disperses the incoming light onto a large number of pixels - one can fairly easily get into the few-photons-per-pixel regime
- Assume the readout noise is approximately normally distributed too, with µ_2 = r²


CCD signal to noise calculation III

The simple calculation for the per-pixel signal to noise leads to:

  s/σ_s = st / √((s + b)t + r²)   (8)
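Equation (8) is easy to wrap as a function; a sketch with illustrative units (s, b in photons per second, r in RMS electrons, t in seconds) and a function name of my choosing:

```python
# Per-pixel CCD signal-to-noise, eq. (8).
import math

def ccd_snr(s, b, r, t):
    """S/N for source rate s, background rate b, readout noise r,
    exposure time t."""
    return s * t / math.sqrt((s + b) * t + r**2)

print(ccd_snr(s=0.5, b=50.0, r=5.0, t=600.0))  # ~1.7: background limited
```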


CCD regimes I

- Background limited: the limiting factor of performance is the intrinsic background, bt >> r²
  - b is proportional to the solid angle the pixel subtends on the sky; r is independent of pixel size
  - Optics are usually arranged so that observations are background limited:
    s/σ_s = st / √((s + b)t)   (9)
- Bright object: dominated by the uncertainty in the source flux:
  s/σ_s ≈ √(st)   (10)
- Faint object: dominated by the uncertainty due to the background:
  s/σ_s ≈ s √(t/b)   (11)


CCD regimes II

Some notes:
- When dominated by readout, S/N improves as t, but in the usual (background-limited) regime only as √t
- For faint sources S/N goes as s, but only as √s for bright sources
- For faint point sources the most profitable course is to improve the realised (i.e., after accounting for seeing) angular resolution so as to reduce the background




Error propagation: linearised regime

- You've made estimates of X and Y and now want to compute a derived quantity Z = f(X, Y); what is its error?
- Estimate the error σ_Z from:
  σ_Z² ≈ (∂f/∂X)² σ_X² + (∂f/∂Y)² σ_Y²   (12)
- Assumptions:
  - X and Y are normally distributed
  - X and Y are independent of each other
  - σ_X and σ_Y are small
  - f is continuous in the neighbourhood of µ_X, µ_Y, ...
- Normally you are better off estimating σ_Z directly (a sketch of eq. (12) follows below)
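A minimal sketch of eq. (12) for the special case of a ratio Z = X/Y (the case used in the next example); the function name is mine:

```python
# Linearised error propagation, eq. (12), for Z = X / Y with
# independent errors: dZ/dX = 1/Y and dZ/dY = -X/Y^2.
import math

def ratio_error(x, sx, y, sy):
    z = x / y
    sz = math.sqrt((sx / y) ** 2 + (x * sy / y**2) ** 2)
    return z, sz
```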


Example: Radio emission of late vs early type galaxies I

- Want to find by how much a typical late-type galaxy is brighter in the radio than an early-type
- Mean and standard deviation are not good estimators here
- Use medians: F_late = 0.11 mJy, F_early = 0.065 mJy
- Convert the inter-quartile range (0.2 mJy) to sigma on the assumption of a normal distribution (obviously wrong in this case!):
  - Easily shown that the inter-quartile range of a normal distribution is 1.35σ, hence σ ≈ 0.15 mJy
  - Have about 700 images in each stack; assume √n behaviour, so σ_m ≈ 0.006 mJy
  - Same error on both measurements
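A quick check of the numbers on this slide (IQR ≈ 1.35σ for a normal distribution, then √n scaling):

```python
import math

iqr, n = 0.2, 700                 # mJy; images per stack
sigma = iqr / 1.35                # ~0.15 mJy
sigma_m = sigma / math.sqrt(n)    # ~0.006 mJy
print(sigma, sigma_m)
```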


Example: Radio emission of late vs early type galaxies II

- Apply the linearisation formula to get:
  R = F_late / F_early = 1.70   (13)
  σ_R / R ≈ √((σ_m/F_late)² + (σ_m/F_early)²) ≈ 20%   (14)
- Managed to `rescue' a normal-distribution-like estimate from a clearly non-normally distributed problem
- Note the huge number of assumptions, some clearly wrong
- Answering a specific question about quiescent early and late type galaxies; in general the above is not correct
- Also note the importance of visualisation to get anywhere in this case


Outline
- Non-parametric statistics
  - Kolmogorov-Smirnov test
  - U-test (Mann-Whitney-Wilcoxon)
- Likelihood versus posterior probability
- Basic error analysis
- Astronomical Example: CCD signal to noise
- Confidence intervals
- Maximum Likelihood


Confidence intervals I

- Normally derived by integrating the posterior PDF up to the required probability:
  p = ∫_{θ_0}^{θ_1} P(θ|I, {x_i}) dθ   (15)
  Θ = (θ_0, θ_1) is the interval of θ with p confidence   (16)
- Here we are assuming the likelihood is proportional to the posterior, so, for example:
  p = ∫_{θ_0}^{θ_1} P({x_i}|θ) dθ   (17)
  Θ = (θ_0, θ_1) is the interval of θ with p confidence   (18)
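As a concrete sketch of eq. (17), here is a grid integration of the Poisson likelihood from the earlier example (11 observed photons) to obtain a central 68% interval for λ; the grid limits and the implicit flat prior are my assumptions, not from the lecture:

```python
# Central 68% interval for a Poisson rate lambda given k = 11 counts,
# treating the normalised likelihood as the posterior, eq. (17).
import numpy as np
from scipy.stats import poisson

k = 11
lam = np.linspace(0.01, 40.0, 4000)
dlam = lam[1] - lam[0]
like = poisson.pmf(k, mu=lam)
post = like / (like.sum() * dlam)        # normalise on the grid
cdf = np.cumsum(post) * dlam
lo, hi = np.interp([0.16, 0.84], cdf, lam)
print(f"lambda in ({lo:.1f}, {hi:.1f}) at p = 0.68")
```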


Confidence intervals II

- In astronomy typical confidence intervals are at the p = 0.997 level, i.e., 3σ assuming Normal distributions. It is this high to compensate (at least in part) for:
  - the inevitable bias of the scientist toward `getting a result'
  - too-optimistic noise estimates
- If you've approximated every distribution as normal, then confidence intervals are entirely equivalent to quoting the error as x ± σ
- Some non-normal analytical distributions for which confidence intervals are significantly different:
  - χ², F-distribution, t-distribution
  - distributions of non-parametric statistics
- Beyond this you are usually in the realm of Monte-Carlo simulations


Outline
- Non-parametric statistics
  - Kolmogorov-Smirnov test
  - U-test (Mann-Whitney-Wilcoxon)
- Likelihood versus posterior probability
- Basic error analysis
- Astronomical Example: CCD signal to noise
- Confidence intervals
- Maximum Likelihood


Maximum likelihood

- The above examples consisted of estimating means only, and then using those to estimate a derived function (Z = f(X, Y))
  - This is neither efficient nor possible in general
- Given some general model for both the noise and the effect we are measuring, we can compute the likelihood function P({x_i}|θ) for some model parameters θ
- Maximum Likelihood: estimate θ as the value which maximises P({x_i}|θ)


ML Example: Fitting 2d Gaussian I

- Have a two-dimensional image with n pixel values a_i and the respective positions of the pixel centres (x_i, y_i)

[Figure: simulated image with S/N = 5 per pixel]

ML Example: Fitting 2d Gaussian II

- Assume all pixels are independent
- Have a model for the Gaussian, G(x_i, y_i; A, x_0, y_0, ρ, σ_x, σ_y):
  G(···) = A exp[ -(x - x_0)²/(2σ_x²) - ρ(x - x_0)(y - y_0)/(σ_x σ_y) - (y - y_0)²/(2σ_y²) ]   (19)
- Model for the noise: ε_i ~ N(0, σ), i.e., σ is the per-pixel error
- Model for the observation:
  a_i = b_i + ε_i, with b_i = G(x_i, y_i; A, x_0, y_0, ρ, σ_x, σ_y)   (20)
  a_i - b_i ~ N(0, σ)   (21)
  P(a_i|···) = (1/√(2πσ²)) exp[ -(a_i - b_i)²/(2σ²) ]   (22)


ML Example: Fitting 2d Gaussian III

- Since the observations are independent:
  L = Π_i P(a_i | A, x_0, y_0, ρ, σ_x, σ_y, σ)   (23)
- Re-gridding, rotation and registration almost always introduce correlations between pixels ⇒ best to analyse the rawest data that can be practically processed

ML Example: Fitting 2d Gaussian IV

- Maximising L is the same as maximising log L:
  log L = Σ_i log P(a_i | A, x_0, y_0, ρ, σ_x, σ_y, σ)   (24)
        = c + Σ_i [ -(a_i - b_i)²/(2σ²) ]   (25)
- This transformation is almost universally applied in practice: it turns the product into a sum and simplifies normally distributed likelihoods
- Since we are looking for the maximum of the function, c is not important and is usually dropped
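A minimal end-to-end sketch of maximising log L for this model (equivalently, minimising -log L); the simulated data, starting values and optimiser choice are all illustrative assumptions, not from the lecture:

```python
# ML fit of the 2D Gaussian of eq. (19) by minimising -log L, eq. (25).
import numpy as np
from scipy.optimize import minimize

def gauss2d(x, y, A, x0, y0, rho, sx, sy):
    dx, dy = x - x0, y - y0
    return A * np.exp(-dx**2 / (2 * sx**2)
                      - rho * dx * dy / (sx * sy)
                      - dy**2 / (2 * sy**2))

def neg_log_like(theta, x, y, a, sigma):
    # -log L up to the constant c of eq. (25); sigma is the per-pixel noise.
    b = gauss2d(x, y, *theta)
    return np.sum((a - b) ** 2) / (2 * sigma**2)

# Simulate an image (S/N = 5 at the peak, as in the example figure).
rng = np.random.default_rng(2)
x, y = np.meshgrid(np.arange(32.0), np.arange(32.0))
true = (5.0, 16.0, 16.0, 0.1, 3.0, 4.0)      # A, x0, y0, rho, sx, sy
sigma = 1.0
a = gauss2d(x, y, *true) + rng.normal(0.0, sigma, x.shape)

res = minimize(neg_log_like, x0=(4.0, 15.0, 15.0, 0.0, 2.0, 3.0),
               args=(x, y, a, sigma), method="Nelder-Mead")
print(res.x)  # should be close to `true`
```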


Relationship with χ² I

- Imagine, for some reason, you expect the parameters {A, x_0, y_0, ρ, σ_x, σ_y} to take particular values
- You can then compute the value:
  X = Σ_i (a_i - b_i)²/σ²   (26)
    = Σ_i [a_i - G(x_i, y_i; A, x_0, y_0, ρ, σ_x, σ_y)]²/σ²   (27)
- If we correctly guessed the parameters and we correctly guessed the noise, then:
  - X ought to be distributed as a χ²
  - X too large? ⇒ reject the model and/or the parameters
  - X too small? ⇒ reject the noise model
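A sketch of turning X into tail probabilities with scipy; with no fitted parameters the number of degrees of freedom is simply the number of pixels n (the helper name and example values are mine):

```python
# Tail probabilities for the statistic X of eqs. (26)-(27).
from scipy.stats import chi2

def chi2_tails(X, n):
    """Return P(chi2 >= X) and P(chi2 <= X) for n degrees of freedom."""
    return chi2.sf(X, df=n), chi2.cdf(X, df=n)

print(chi2_tails(X=1300.0, n=1024))  # X too large -> suspect the model
```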






Relationship with χ² II

- The above does not apply if you fit for the parameters {A, x_0, y_0, ρ, σ_x, σ_y}
- χ² is primarily useful (like much of classical statistics) for rejecting hypotheses, not for model comparison
- In this case X, i.e., χ², has a similar form to log L, but this is not usually the case
- In this case minimising χ² ⇔ maximising the likelihood
  - This leads to the occasional interchangeable use of the two phrases in the literature