Statistics for Astronomy I: Introduction and Probability Theory
B. Nikolic
Astrophysics Group, Cavendish Laboratory, University of Cambridge

20 October 2008


`Astronomers cannot avoid statistics, and there are several reasons for this unfortunate situation.' [Wall(1979)]




You should aim to make the best use of available data: a thorough understanding of statistics is the key to that
Statistical inference is relevant both to interpreting observations and to interpreting simulations
Simply applying formulae is often not sufficient:

Need to understand the theory, its limitations, etc.
Computer-based processing is essential


Course goals



Review of essential statistics


Some topics that everybody should know
I expect there is a range of backgrounds here, so for some this may be all very familiar
Some classic applications
A survey rather than a thorough tutorial



Basic applications of statistics in astronomy




Introduction to advanced statistics



Goals for this Lecture
Course introduction
Probability theory
Moments
Characteristic functions
The central limit theorem
Well-known probability distributions
Normal distribution
Binomial distribution
Poisson distribution
χ² distribution


Reference materials
J. V. Wall and C. R. Jenkins, Practical Statistics for Astronomers (CUP)
D. J. C. MacKay, Information Theory, Inference, and Learning Algorithms (CUP)
Mike Irwin's lectures: http://www.ast.cam.ac.uk/~mike/lect.html
Penn State University Center for Astrostatistics: http://astrostatistics.psu.edu/
My lectures and supporting materials: http://www.mrao.cam.ac.uk/~bn204/lecture/
J. V. Wall's papers: 1979QJRAS..20..138W and 1996QJRAS..37..519W, http://adsabs.harvard.edu/abstract_service.html


Probability

Dual use of `probability':


Quantify with what frequency a `random variable' is expected to take its possible values
A measure of the degree of belief that a hypothesis is true




Random variables




Random variable: the outcome of an experiment that we cannot determine in advance
The cause of apparent randomness is often simply that we do not know the initial conditions of the experiment






e.g., flipping a coin or a roll of the roulette wheel are both easily predictable given fairly rudimentary measurements of the initial conditions when the coin/ball is launched. [Yes, this has been exploited in practice.]
The output of a computer random number generator is exactly predictable if you know the internal state of the generator (usually from 8 to 128 bits long) and almost completely unpredictable if you do not (see the sketch below)
`Randomness' of an experiment is therefore subjective and a function of prior knowledge about the experiment



Random variables can be continuous or discrete
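To illustrate the point about the internal state of a generator, here is a minimal Python sketch (using NumPy's default generator; the seed and sample size are arbitrary choices of mine): two generators started from the same internal state produce identical `random' sequences.

```python
import numpy as np

# Two generators initialised with the same internal state (seed) produce
# exactly the same "random" coin flips; without knowledge of the seed the
# sequence looks unpredictable.
a = np.random.default_rng(seed=42).integers(0, 2, size=10)
b = np.random.default_rng(seed=42).integers(0, 2, size=10)
print(a)
print(np.array_equal(a, b))   # True: the outcome was fully determined
```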


PDF
X is a continuous random variable

Probability Density Function (PDF)
$P(x)\,dx$ is the probability of $X$ being in the range $x$ to $x + dx$

$P(x)$ is non-negative:
$$P(x) \geq 0 \quad \forall x \qquad (1)$$

Area under $P(x)$ is unity:
$$\int_{-\infty}^{\infty} P(x)\,dx = 1 \qquad (2)$$


CDF
Cumulative Distribution Function (CDF)
$C(x)$ is the probability that $X$ is less than $x$:

$$C(x) = \int_{-\infty}^{x} P(x')\,dx' \qquad (3)$$



Cumulative functions are easier to estimate from observations:




They can be visualised more faithfully (i.e., with fewer assumptions); see the sketch below
They form the basis of a number of important statistical tests



Mathematically CDFs are a slightly more general way of describing probabilities than PDFs
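As an illustration of how directly a CDF can be estimated from observations, here is a minimal Python sketch (the function name and the choice of a normal sample are mine, not from the lecture): the empirical CDF at $x$ is simply the fraction of samples below $x$, with no binning or smoothing assumptions.

```python
import numpy as np
import matplotlib.pyplot as plt

def empirical_cdf(samples):
    """Return sorted sample values and the empirical CDF evaluated at them."""
    x = np.sort(samples)
    c = np.arange(1, len(x) + 1) / len(x)   # fraction of samples <= x[i]
    return x, c

# Example: empirical CDF of 200 draws from a standard normal distribution
rng = np.random.default_rng(0)
x, c = empirical_cdf(rng.normal(loc=0.0, scale=1.0, size=200))
plt.step(x, c, where="post", label="empirical CDF (200 samples)")
plt.xlabel("x")
plt.ylabel("C(x)")
plt.legend()
plt.show()
```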


Discrete random variables



Most results retain equivalent forms, with integrals turned to sums, etc.
The probability density function is usually renamed the `Probability Mass Function' (PMF)
The possible values of the random variable need not have a defined ordering






e.g., X = H or X = T for heads or tails outcomes
no ordering ⟹ no cumulative distribution


Moments of probability distributions
$\mu_n(r)$, the n-th moment around value $r$:
$$\mu_n(r) = \int (x - r)^n P(x)\,dx \qquad (4)$$

The mean, $\mu$, is the first moment around $r = 0$:
$$\mu = \int x P(x)\,dx \qquad (5)$$

The n-th central moment is the n-th moment around the mean:
$$\mu_n(r = \mu) = \int (x - \mu)^n P(x)\,dx \qquad (6)$$

Note that moments do not necessarily exist even for some common theoretical distributions.


Moments II

$\mu_2$: second central moment = variance = $\sigma^2$
$\mu_3 / \sigma^3$: skew, a measure of the asymmetry of the distribution
$\mu_4 / \sigma^4$: kurtosis, a measure of peakiness / fatness
Conversion between central moments ($\mu_n$) and moments about the origin ($\mu'_n$):
$$\mu_n = \sum_{j=0}^{n} \binom{n}{j} (-1)^{n-j}\, \mu'_j\, \mu^{n-j} \qquad (7)$$

E.g. (checked numerically below):
$$\mu_2 = \mu'_2 - \mu^2 \qquad (8)$$
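A quick numerical check of eq. (8), as a sketch in Python (the exponential sample is an arbitrary choice of mine): the second central moment estimated directly agrees with $\mu'_2 - \mu^2$.

```python
import numpy as np

rng = np.random.default_rng(4)
x = rng.exponential(scale=2.0, size=200_000)

mu = np.mean(x)                        # first moment about the origin (mean)
mu2_prime = np.mean(x ** 2)            # second moment about the origin
mu2_central = np.mean((x - mu) ** 2)   # second central moment (variance)

# Eq. (8): the variance equals mu'_2 - mu^2
print(mu2_central, mu2_prime - mu ** 2)
```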


Moments III: why all the fuss?





As we will see shortly, moments are a way of expanding a probability distribution into coefficients, quite similar to, say, the Taylor expansion of a function
Can quantitatively compare to the normal distribution, e.g.:

Expect skew = 0 for a normal distribution
Expect kurtosis = 3 for a normal distribution



Central-limit theorem


Characteristic function
Characteristic function of a probability distribution:
$$\phi(t) = \int \exp(itx)\, P(x)\,dx \qquad (9)$$



Note the sign in the exponent means that $\phi(t)$ is the inverse Fourier transform of the probability density function
$\phi(0) = 1$
Moments of the PDF are closely related to the Taylor expansion of the characteristic function:
$$\mu'_n = i^{-n} \left. \frac{d^n \phi(t)}{dt^n} \right|_{t=0} \qquad (10)$$

One reason for fussing about the moments!
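Eq. (10) can be checked symbolically; the sketch below (Python with SymPy, a tool choice of mine) differentiates the characteristic function of a normal distribution, which appears later as eq. (18), and recovers its moments about the origin.

```python
import sympy as sp

t, mu = sp.symbols("t mu", real=True)
sigma = sp.symbols("sigma", positive=True)

# Characteristic function of the normal distribution (eq. 18 later in the lecture)
phi = sp.exp(sp.I * t * mu - t**2 * sigma**2 / 2)

# Eq. (10): the n-th moment about the origin from the n-th derivative at t = 0
for n in range(1, 4):
    moment = sp.expand(sp.simplify(sp.I**(-n) * sp.diff(phi, t, n).subs(t, 0)))
    print(n, moment)
# prints: 1 mu;  2 mu**2 + sigma**2;  3 mu**3 + 3*mu*sigma**2
```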


Characteristic function II

Expand the characteristic function explicitly in terms of moments:
$$\phi(t) = 1 + it\mu - \frac{t^2}{2}\mu'_2 - i\frac{t^3}{3!}\mu'_3 + \frac{t^4}{4!}\mu'_4 + \cdots \qquad (11)$$


The central limit


$\phi_X(t)$ is the characteristic function of $P_X(x)$
If $Y = X_1 + X_2$, what is $\phi_Y$, the characteristic function of $Y$?


$$\phi_Y(t) = \phi_{X_1}(t) \times \phi_{X_2}(t)$$
Think of this in terms of the convolution theorem in Fourier analysis

If $Y = \sum_k a_k X_k$?
$$\phi_Y(t) = \prod_k \phi_{X_k}(a_k t)$$

Sums of independent random variables
The analysis above shows that for sums of many independent random variables, the characteristic function must satisfy:
$$\phi_Y(t) \to 0 \quad \text{when} \quad |t| \gg 1 \qquad (12)$$

almost regardless of the distributions of the component variables
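A minimal numerical illustration of this behaviour in the PDF domain (the choice of uniform variables and of k = 12 is mine): summing even a modest number of independent, decidedly non-normal variables already gives a distribution that closely follows the normal curve.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
k, n = 12, 100_000                  # variables per sum, number of sums
# Each sum of k uniform(0, 1) variables has mean k/2 and variance k/12
y = rng.uniform(0.0, 1.0, size=(n, k)).sum(axis=1)

mu, sigma = k / 2.0, np.sqrt(k / 12.0)
x = np.linspace(y.min(), y.max(), 300)
normal = np.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / np.sqrt(2 * np.pi * sigma ** 2)

plt.hist(y, bins=100, density=True, alpha=0.5, label="sum of 12 uniform variables")
plt.plot(x, normal, label="normal with the same mean and variance")
plt.legend()
plt.show()
```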



The central limit II
Look for a function which satisfies:
$$\phi(0) = 1 \qquad (13)$$
$$\phi(t) \to 0 \ \text{quickly when } |t| \gg 1 \qquad (14)$$
What about:
$$\phi(t) = \exp\left(-\frac{t^2 \sigma^2}{2}\right) \qquad (15)$$

The PDF that corresponds to this characteristic function:
$$N(x; \sigma) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{x^2}{2\sigma^2}\right) \qquad (16)$$


The Normal Distribution



Arises naturally where a large number of independent variables are additively combined
$$N(x; \mu, \sigma) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right) \qquad (17)$$
with characteristic function
$$\phi(t; \mu, \sigma) = \exp\left(it\mu - \frac{t^2\sigma^2}{2}\right) \qquad (18)$$


The Normal Distribution Plot
[Plot: the PDF N(x; µ = 0, σ = 1), peaking at 0.399, and its CDF, for x from -3 to 3.]


Ubiquity of the Normal Distribution


Voltage fluctuations across a resistor at finite temperature: $\sigma^2 = 4 k_B T R$
Limiting form of a number of other distributions
Analytically tractable

But it is clearly not always applicable:

Experiments involving human intervention are almost never normally distributed: `possibility of outliers'
Non-linear processing algorithms:
Object detection / de-blending
De-convolution
Electronics: 1/f noise and drift
Sometimes observations are `pre-processed' to get the errors closer to normally distributed (inevitably this leads to a loss of information)


Binomial distribution

If p is the probability of `success' in one trial, the binomial distribution gives the probability of j successes in n independent trials:
$$P(j) = \binom{n}{j} p^j (1 - p)^{n-j} \qquad (19)$$

(Easily derived through combinatorial arguments)
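A short Python sketch of that combinatorial form (the values n = 10 and p = 0.3 are arbitrary choices of mine), checked against SciPy's binomial PMF:

```python
from math import comb
from scipy import stats

n, p = 10, 0.3   # number of trials and probability of success per trial

for j in range(n + 1):
    direct = comb(n, j) * p**j * (1 - p)**(n - j)          # eq. (19) directly
    assert abs(direct - stats.binom.pmf(j, n, p)) < 1e-12  # agrees with scipy
    print(j, direct)
```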


Poisson distribution
Probability that n events will occur in a time interval of length T given that the underlying rate is λ per unit time:
$$P(n; T, \lambda) = \frac{(\lambda T)^n \exp(-\lambda T)}{n!} \qquad (20)$$

Derived by generalising the binomial and then multinomial distributions
A discrete distribution: n is a discrete random variable
Moments:
$$\mu = \lambda T \quad \text{(mean)} \qquad (21)$$
$$\mu_2 = \lambda T \quad \text{(variance)} \qquad (22)$$
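Eqs. (21)-(22) can be confirmed by drawing Poisson samples; a minimal Python sketch (the rate and interval values are my own choice):

```python
import numpy as np

rate, T = 3.0, 2.0                     # events per unit time, interval length
rng = np.random.default_rng(5)
n = rng.poisson(lam=rate * T, size=1_000_000)

# The sample mean and variance should both be close to lambda * T = 6
print(n.mean(), n.var())
```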




Poisson Distribution Plots
[Plots: the Poisson PMF and CDF for λT = 2, 5, 10, and 100.]


Poisson → Normal distribution



As seen above, the Poisson distribution quickly approaches the normal distribution as $\lambda T \gg 1$. The parameters of the limiting normal distribution:
$$\mu = \lambda T \qquad (23)$$
$$\sigma = \sqrt{\lambda T} \qquad (24)$$
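A quick numerical comparison (λT = 100 is an arbitrary choice of mine) of the Poisson PMF with the limiting normal density given by eqs. (23)-(24):

```python
import numpy as np
from scipy import stats

lam = 100.0                               # lambda * T
n = np.arange(50, 151)
poisson_pmf = stats.poisson.pmf(n, lam)
normal_pdf = stats.norm.pdf(n, loc=lam, scale=np.sqrt(lam))

# The largest difference between the two is already small for lambda*T = 100
print(np.max(np.abs(poisson_pmf - normal_pdf)))
```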




The χ² distribution

$$Y = \sum_{i=1}^{n} X_i^2, \qquad X_i \sim N(\mu = 0, \sigma = 1) \qquad (25)$$
$$\Rightarrow \quad Y \sim \chi^2_n \qquad (26)$$
$$P_{\chi^2}(y; n) = \frac{y^{(n/2)-1} \exp(-y/2)}{2^{n/2}\, \Gamma(n/2)} \qquad (27)$$



Key use of the χ² distribution is in model testing
If $f_i$ is the model for the i-th random variable (= observable):
$$Y = \sum_{i=1}^{n} \left( \frac{X_i - f_i}{\sigma_i} \right)^2 \qquad (28)$$
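A hedged illustration of eq. (28) in model testing (the straight-line model, noise level, and data sizes below are invented for the example): if the model is correct and no parameters are fitted, the statistic Y follows a χ² distribution with n degrees of freedom, which yields a p-value for the fit.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)

# Hypothetical example: a fixed straight-line model f_i with known Gaussian errors
x = np.linspace(0.0, 10.0, 20)
f = 2.0 * x + 1.0                 # model prediction for each observable
sigma = np.full_like(x, 0.5)      # known measurement uncertainties

# Simulated observations consistent with the model
X = f + rng.normal(0.0, sigma)

# Chi-squared statistic of eq. (28)
chi2 = np.sum(((X - f) / sigma) ** 2)

# No parameters were fitted, so Y should follow chi^2 with n = len(x) degrees of freedom
p_value = stats.chi2.sf(chi2, df=len(x))
print(f"chi2 = {chi2:.1f} for {len(x)} points, p-value = {p_value:.2f}")
```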


2 plots
0.9
2 1 (x ) 3 .9 6 9 5 3 x 2 1 (x )d x

0.9
2 2 (x ) 0 .4 9 7 5 0 6

x

2 2 (x )d x



0.7

0.7

0.5

0.5

0.3

0.3

0.1 -6 -4 -2 0 x 0.9
2 3 (x ) 0 .2 4 1 9 7 1 x

0.1 2 4 6 -6 -4 -2 0 x 0.9
2 5 (x ) 0 .1 5 4 1 8 x

2

4

6

2 3 (x )d x



2 5 (x )d x



0.7

0.7

0.5

0.5

0.3

0.3

0.1 -6 -4 -2 0 x 2 4 6 -6 -4 -2

0.1 0 x 2 4 6


χ² → Normal distribution

[Plots: the χ² PDF and CDF for n = 1, 5, 10, and 30 degrees of freedom, illustrating the approach to the normal distribution.]


χ² → Normal distribution II

For $\chi^2_n$:

$\mu_1 = n$
$\mu_2 = 2n$
Kurtosis $= 3 + 12/n$: converges to Normal


F-distribution

$$X_1 \sim \chi^2_j, \qquad X_2 \sim \chi^2_k \qquad (29)$$
$$\frac{X_1/j}{X_2/k} \sim F_{j,k} \qquad (30)$$

Key use of the F-distribution is in the testing of variances
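A minimal sketch of such a variance test in Python (the sample sizes and scales are invented): the ratio of the two sample variances is compared against the F distribution with (j, k) degrees of freedom.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)

# Two hypothetical samples whose variances we want to compare
a = rng.normal(0.0, 1.0, size=30)
b = rng.normal(0.0, 1.5, size=40)

# If the true variances are equal, the ratio of sample variances follows
# an F distribution with (len(a) - 1, len(b) - 1) degrees of freedom
F = np.var(a, ddof=1) / np.var(b, ddof=1)
j, k = len(a) - 1, len(b) - 1

# Two-sided p-value for the null hypothesis of equal variances
p = 2 * min(stats.f.cdf(F, j, k), stats.f.sf(F, j, k))
print(f"F = {F:.2f}, p-value = {p:.3f}")
```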


Bibliography

Wall J. V., 1979, QJRAS, 20, 138