Statistics for Astronomy VI: Bayesian Examples
B. Nikolic b.nikolic@mrao.cam.ac.uk
Astrophysics Group, Cavendish Laboratory, University of Cambridge http://www.mrao.cam.ac.uk/~bn204/lecture/astrostats.html

10 November 2008


Goals for this Lecture

Bayesian estimate of mean from a normal distribution

Markov Chain Monte Carlo

Measuring pressure-broadened lines


What to do with posterior distribution

You've got P(θ|{x_i}); what do you do with it?


- Plot it (only works well up to 2 parameters ...)
- Marginalise all combinations of parameters and then plot
- Find the most likely value, the mean, etc.
- Estimate confidence intervals, errors
- Publish electronically


Outline

Bayesian estimate of mean from a normal distribution

Markov Chain Monte Carlo

Measuring pressure-broadened lines


Normal distribution

Again consider X ~ N(μ, σ): given some {x_i}, what can we say about μ (or about μ and σ if neither is known)?

Stages of computation:
1. Calculate the likelihood, P({x_i}|μ, σ)
2. Think about what priors are appropriate, P(μ, σ)
3. Calculate the evidence, \int \mathrm{d}\mu'\,\mathrm{d}\sigma'\; P(\{x_i\}|\mu',\sigma')\,P(\mu',\sigma')
4. Use Bayes' theorem to calculate P(μ, σ|{x_i})


Normal distribution likelihood
We've considered this already. If {x_i} are n observations drawn independently from N(μ, σ):

P(\{x_i\}\mid\mu,\sigma) = \prod_i \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left[-\frac{(x_i-\mu)^2}{2\sigma^2}\right]    (1)
                         = \frac{1}{(2\pi\sigma^2)^{n/2}} \exp\left[-\sum_i \frac{(x_i-\mu)^2}{2\sigma^2}\right]    (2)

That is the answer, but it is convenient to factor μ out of the sum in the exponent:

P(\{x_i\}\mid\mu,\sigma) = \frac{1}{(2\pi\sigma^2)^{n/2}} \exp\left[-\frac{n\left(\overline{x^2}-\bar{x}^2\right)}{2\sigma^2}\right] \exp\left[-\frac{n(\mu-\bar{x})^2}{2\sigma^2}\right]    (3)
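As a quick numerical sanity check of this factorisation (a minimal sketch; the sample size, trial mean and variable names are my own, not from the lecture), the direct form of Eq. (2) and the factored form of Eq. (3) can be compared:

```python
import numpy as np

rng = np.random.default_rng(0)
mu_true, sigma = 2.0, 1.5
x = rng.normal(mu_true, sigma, size=50)
n = len(x)

mu = 1.8  # an arbitrary trial value of the mean

# Direct form, Eq. (2): sum over (x_i - mu)^2 in the exponent
logL_direct = -0.5 * n * np.log(2 * np.pi * sigma**2) \
              - np.sum((x - mu) ** 2) / (2 * sigma**2)

# Factored form, Eq. (3): uses only the sample moments <x> and <x^2>
xbar, x2bar = x.mean(), (x**2).mean()
logL_factored = -0.5 * n * np.log(2 * np.pi * sigma**2) \
                - n * (x2bar - xbar**2) / (2 * sigma**2) \
                - n * (mu - xbar) ** 2 / (2 * sigma**2)

print(np.isclose(logL_direct, logL_factored))  # True
```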


Priors
In principle we need P(μ, σ) but, as these are independent, write P(μ, σ) = P(μ) P(σ).

1. μ: we know nothing a priori, but expect translational invariance:

   P(\mu) \propto \mathrm{const}    (4)

2. σ: must be positive. Demand scale invariance:

   P(\sigma) \propto \frac{1}{\sigma}    (5)

Proper vs Improper priors
Neither of these priors converges when integrated out to infinity:

\int_{-\infty}^{\infty} \mathrm{d}\mu \to \infty, \qquad \int_{0}^{\infty} \frac{\mathrm{d}\sigma}{\sigma} \to \infty    (6)

In analytical calculations this can be a problem! In non-analytical situations it is usually not a problem, since in practice the priors are bounded.


Evidence

P(\{x_i\}) = \int \mathrm{d}\mu'\,\mathrm{d}\sigma'\; P(\mu')\,P(\sigma')\,P(\{x_i\}|\mu',\sigma')    (7)
           = \int \mathrm{d}\mu'\,\mathrm{d}\sigma'\; \frac{C}{\sigma'^{\,n+1}}\,\frac{1}{(2\pi)^{n/2}} \exp\left[-\frac{n\left(\overline{x^2}-\bar{x}^2\right)}{2\sigma'^2}\right] \exp\left[-\frac{n(\mu'-\bar{x})^2}{2\sigma'^2}\right]    (8)-(9)

This leads to a bit of algebra...


Result

P(\mu|\{x_i\}) = \frac{\Gamma(n/2)}{\Gamma\!\left(\frac{n-1}{2}\right)\sqrt{\pi\left(\overline{x^2}-\bar{x}^2\right)}} \left[1 + \frac{(\mu-\bar{x})^2}{\overline{x^2}-\bar{x}^2}\right]^{-n/2}    (10)

which is just the Student's t distribution with n − 1 degrees of freedom in the variable

t = \frac{\mu-\bar{x}}{\sqrt{\left(\overline{x^2}-\bar{x}^2\right)/(n-1)}}    (11)

Conclusion
If you know the observations are governed by a Normal distribution, but do not know μ and σ a priori, then your knowledge of μ after you have observed the data is described by the Student's t distribution.
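A small numerical check of this result (a sketch, assuming NumPy and SciPy are available; the data, grid ranges and variable names are my own choices): marginalise the likelihood over σ on a grid, using the flat prior on μ and the 1/σ prior on σ, and compare with the Student's t density for (μ − x̄)/(s/√n) with n − 1 degrees of freedom.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
x = rng.normal(5.0, 2.0, size=12)
n, xbar, x2bar = len(x), x.mean(), (x**2).mean()
s = x.std(ddof=1)                          # sample standard deviation

mu = np.linspace(xbar - 5.0, xbar + 5.0, 400)
sig = np.linspace(0.3, 20.0, 2000)
M, S = np.meshgrid(mu, sig, indexing="ij")

# Unnormalised posterior: likelihood (factored form, Eq. 3) x flat prior x 1/sigma prior
chi2 = n * (x2bar - xbar**2) + n * (M - xbar) ** 2
log_post = -n * np.log(S) - chi2 / (2.0 * S**2) - np.log(S)
post = np.exp(log_post - log_post.max())

# Marginalise over sigma and normalise on the mu grid
p_mu = post.sum(axis=1)
p_mu /= p_mu.sum() * (mu[1] - mu[0])

# Analytic result: (mu - xbar)/(s/sqrt(n)) follows Student's t with n-1 dof
scale = s / np.sqrt(n)
p_t = stats.t.pdf((mu - xbar) / scale, df=n - 1) / scale
print(np.max(np.abs(p_mu - p_t)))          # small; limited only by the grids
```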


Outline

Bayesian estimate of mean from a normal distribution

Markov Chain Monte Carlo

Measuring pressure-broadened lines


2d Source Problem recap
Recall we have:


- Some measurements {a_i} at positions {x_i, y_i}
- A model G(x_i, y_i; θ) with parameters

  \theta = \{A, x_0, y_0, \sigma_x, \sigma_y, \rho\}    (12)

- What we want to calculate is the posterior distribution of the parameters given the data, P(θ|{a_i})
- Noise model: ε_i ~ N(0, σ)


Bayesian approach
Ingredients:


- Likelihood; we have calculated that already:

  \log L(\theta) = \sum_i \log P(a_i \mid \theta, \sigma)    (13)
                 = c - \sum_i \frac{\left[a_i - G(x_i, y_i; \theta)\right]^2}{2\sigma^2}    (14)

- Priors: we don't have useful prior information, but we have lots of data. Make all of the priors flat, i.e.,

  P(A) = c_1, \quad P(x_0) = c_2, \quad \cdots, \quad P(\rho) = c_6
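For concreteness, a minimal sketch of the unnormalised log posterior under flat priors (the elliptical-Gaussian parametrisation of G below is my own assumption for illustration; the lecture's exact form of G may differ):

```python
import numpy as np

def model_G(x, y, theta):
    """Elliptical Gaussian source: an illustrative stand-in for G(x, y; theta).
    theta = (A, x0, y0, sx, sy, rot)."""
    A, x0, y0, sx, sy, rot = theta
    # Rotate offsets into the principal axes of the ellipse
    dx, dy = x - x0, y - y0
    u = dx * np.cos(rot) + dy * np.sin(rot)
    v = -dx * np.sin(rot) + dy * np.cos(rot)
    return A * np.exp(-0.5 * (u**2 / sx**2 + v**2 / sy**2))

def log_likelihood(theta, x, y, a, sigma_noise):
    """Eqs. (13)-(14): Gaussian noise of known standard deviation."""
    resid = a - model_G(x, y, theta)
    return -0.5 * np.sum(resid**2) / sigma_noise**2   # additive constant c dropped

def log_posterior(theta, x, y, a, sigma_noise):
    """Flat priors, so the log posterior equals the log likelihood up to a constant."""
    return log_likelihood(theta, x, y, a, sigma_noise)
```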


What next?
P(\theta|\{a_i\}) = \frac{P(\{a_i\}|\theta)\,P(\theta)}{P(\{a_i\})}    (15)

Have:
- Likelihood: P({a_i}|θ)
- Priors: P(θ)

Left to do:
- Compute the evidence (i.e., the normalising constant):

  P(\{a_i\}) = \int \mathrm{d}^n\theta\; P(\{a_i\}|\theta)\,P(\theta)    (16)

- Locate the region of maximum posterior, i.e., where is P({a_i}|θ)P(θ) large?
- Compute marginalised posteriors:

  P(\theta_j|\{a_i\}) = \int \mathrm{d}^{n-1}\theta\; P(\theta|\{a_i\})
                      = \frac{1}{\int \mathrm{d}^{n}\theta\; P(\{a_i\}|\theta)\,P(\theta)} \int \mathrm{d}^{n-1}\theta\; P(\{a_i\}|\theta)\,P(\theta)    (17)


Direct space search/integration



- Divide the space spanned by θ into a regular grid of elements
- Evaluate P({a_i}|θ)P(θ) in each element
- Approximate the integrals using the equivalent of Gaussian quadrature



Problem: this approach is infeasibly slow even for the simple problem with six parameters. For example: if each grid cell spans 10% of the range of each parameter, then we need 10^6 evaluations of the likelihood, and we will know θ to 10% accuracy at best.


Practical methods
Note we are trying to evaluate integrals of the form:

I_F = \int \mathrm{d}^n\theta\; F(\theta)\,P(\{a_i\}|\theta)\,P(\theta)    (19)
    = \int \mathrm{d}^n\theta\; F(\theta)\,\rho(\theta; \{a_i\})    (20)

Key concept:
- Do not generate a regular grid
- Instead, generate points {x_j} with density proportional to ρ(θ; {a_i}); then

  \int \mathrm{d}^n\theta\; F(\theta)\,\rho(\theta;\{a_i\}) \approx \frac{1}{r}\sum_j F(x_j)    (21)

Illustration
I've reduced the 2-d Gaussian to two parameters for this illustration

Direct search (log): [figure: log-likelihood evaluated on a regular grid over Amplitude and x-position; 100 pixels in each direction, i.e. 10^4 evaluations of the likelihood]
Posterior sampling: [figure]


Illustration
I've reduced the 2-d Gaussian to two parameters for this illustration

Direct search: [figure: likelihood (linear scale) on the same 100×100 grid over Amplitude and x-position, i.e. 10^4 evaluations of the likelihood]
Posterior sampling: [figure]


Illustration
I've reduced the 2-d Gaussian to two parameters for this illustration

Direct search: [figure: likelihood on the 100×100 grid over Amplitude and x-position, i.e. 10^4 evaluations of the likelihood]
Posterior sampling: 10 samples [figure: samples in the Amplitude / x-position plane]


Illustration
I've reduced the 2-d Gaussian to two parameters for this illustration

Direct search: [figure: likelihood on the 100×100 grid over Amplitude and x-position, i.e. 10^4 evaluations of the likelihood]
Posterior sampling: 30 samples [figure: samples in the Amplitude / x-position plane]


Illustration
I've reduced the 2-d Gaussian to two parameters for this illustration

Direct search: [figure: likelihood on the 100×100 grid over Amplitude and x-position, i.e. 10^4 evaluations of the likelihood]
Posterior sampling: 100 samples [figure: samples in the Amplitude / x-position plane]


Illustration
I've reduced the 2-d Gaussian to two parameters for this illustration

Direct search: [figure: likelihood on the 100×100 grid over Amplitude and x-position, i.e. 10^4 evaluations of the likelihood]
Posterior sampling: 1000 samples [figure: samples in the Amplitude / x-position plane]


Illustration
I've reduced the 2-d Gaussian to two parameters for this illustration

Direct search: [figure: likelihood on the 100×100 grid over Amplitude and x-position, i.e. 10^4 evaluations of the likelihood]
Posterior sampling: 1000 samples [figure: binned density of the samples in the Amplitude / x-position plane]


Illustration
I've reduced the 2-d Gaussian to two parameters for this illustration

Direct search: [figure: likelihood on the 100×100 grid over Amplitude and x-position, i.e. 10^4 evaluations of the likelihood]
Posterior sampling: 10000 samples [figure: binned density of the samples in the Amplitude / x-position plane]


Markov Chain Monte Carlo
Need a way of generating {x_i} which tend to ρ (are `ergodic' with respect to ρ).

Markov Chain Monte Carlo


- Based on the current point x_i, generate the next point in the chain, x_j
- After some number of steps, the chain will lose information about its starting point (this is the `burn-in' stage)
- Subsequent points (without regard to their order) tend to the target distribution
- Various algorithms exist: Metropolis-Hastings, Gibbs, etc.







Allows computation of:

I_F = \frac{\int \mathrm{d}^n\theta\; F(\theta)\,P(\{a_i\}|\theta)\,P(\theta)}{\int \mathrm{d}^n\theta\; P(\{a_i\}|\theta)\,P(\theta)}    (22)

Doesn't directly allow computation of evidence.
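A minimal random-walk Metropolis sketch (one of the algorithms listed above, not necessarily the one used to make the figures in this lecture; the step size, chain length and burn-in below are arbitrary choices of mine):

```python
import numpy as np

def metropolis(log_post, theta0, step, n_samples, rng):
    """Random-walk Metropolis: a minimal MCMC sketch.
    `step` is the per-parameter width of the symmetric Gaussian proposal."""
    theta = np.asarray(theta0, dtype=float)
    lp = log_post(theta)
    chain = np.empty((n_samples, theta.size))
    for i in range(n_samples):
        proposal = theta + step * rng.standard_normal(theta.size)
        lp_new = log_post(proposal)
        # Accept with probability min(1, posterior ratio)
        if np.log(rng.uniform()) < lp_new - lp:
            theta, lp = proposal, lp_new
        chain[i] = theta
    return chain

# Hypothetical usage with the toy log_posterior sketched earlier (names are my own):
# chain = metropolis(lambda th: log_posterior(th, x, y, a, sigma_noise),
#                    theta0=[1, 30, 30, 60, 60, 0.5], step=0.05,
#                    n_samples=20_000, rng=np.random.default_rng(3))
# chain = chain[2000:]          # discard the burn-in stage
```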


Marginalised posteriors
[Figure: marginalised posteriors P(A|{a_i}), P(x_0|{a_i}), P(σ_x|{a_i}) and P(ρ|{a_i}), plotted against Amplitude, Position, Width and Rotation respectively.]
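With MCMC samples in hand, the marginalised posteriors of Eq. (17) reduce to histograms of individual chain columns (a sketch, assuming matplotlib is available; the stand-in chain below should be replaced by the output of the Metropolis sketch above, and the parameter values are arbitrary):

```python
import numpy as np
import matplotlib.pyplot as plt

# Stand-in for the MCMC output: replace with the chain from the Metropolis sketch
rng = np.random.default_rng(4)
chain = rng.normal([1.0, 32.0, 31.0, 60.0, 0.5, 0.45], 0.05, size=(5000, 6))

labels = ["Amplitude", "x-position", "y-position", "width", "squint", "rotation"]
fig, axes = plt.subplots(2, 3, figsize=(10, 6))
for ax, col, name in zip(axes.ravel(), chain.T, labels):
    ax.hist(col, bins=50, density=True)   # marginalised posterior of one parameter
    ax.set_xlabel(name)
fig.tight_layout()
plt.show()
```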


Contrast with Monte-Carlo simulations
[Figure: left, histogram N of maximum-likelihood Amplitude estimates from many simulated data sets (ML Monte Carlo); right, the Bayesian posterior over Amplitude from the single observed data set.]

ML Monte Carlo:
- Analyse multiple simulated data sets
- Generate a single parameter estimate for each
- Investigate the population of those parameter estimates

Bayesian posterior:
- Analyse the single observed data set
- Consider the population of parameters that fit this observed data set








Correlations
[Figure: corner plots of the posterior samples, showing the pairwise correlations between Amplitude, x-position, y-position, width, squint and rotation, with the marginalised one-dimensional posteriors along the diagonal.]


Solving the bias of ML solutions
Monte-Carlo simulation of the Bayesian analysis: write a for one observed data set {a_i}. Now generate N such sets a_j, compute the posterior distribution P(θ|a_j) for each, and plot the combined posterior distribution from all the experiments:
[Figure: the combined posterior distribution (counts N) plotted against Amplitude.]
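A toy sketch of that experiment for a single amplitude parameter (my own simplified stand-in: each simulated data set is m noisy measurements of the amplitude with known noise, rather than the full six-parameter source model):

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(5)
A_true, sigma, m, N = 2.0, 0.5, 25, 500      # truth, noise, points per set, number of sets
grid = np.linspace(1.4, 2.6, 400)

combined = np.zeros_like(grid)
for _ in range(N):
    a = rng.normal(A_true, sigma, size=m)    # one simulated data set
    # Posterior of A for this set: flat prior, known sigma -> Gaussian about the sample mean
    post = np.exp(-0.5 * ((grid - a.mean()) / (sigma / np.sqrt(m))) ** 2)
    combined += post / (post.sum() * (grid[1] - grid[0]))

plt.plot(grid, combined / N)                  # combined posterior from all experiments
plt.xlabel("Amplitude")
plt.show()
```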


Outline

Bayesian estimate of mean from a normal distribution

Markov Chain Monte Carlo

Measuring pressure-broadened lines


Water Vapour cm/mm/sub-mm lines
1 mm water vapour, P = 600 mB, T = 270 K
[Figure: brightness temperature T_b (K) versus frequency (GHz), 0-1000 GHz.]


The 183 GHz Water Vapour Line
Blue rectangles are the WVR prototype filters
[Figure: brightness temperature T_b (K) versus frequency (GHz) across the 183 GHz line, 175-190 GHz, with the filter passbands marked.]


Statistics from four data



- Have four data: a measurement from each of the filters
- Have three fundamental unknowns:
  - Amount of water
  - Pressure
  - Temperature
- Further possible parameters:
  - Thickness of the water layer
  - Temperature change across the water layer
  - Temperature and pressure at the ground
- Further possible data:
  - Historical distributions from radiosondes (priors!)



Pressure broadening effect
Brightness temperature of a pressure broadened line with pressure from 400mB to 750mB in steps of 50mB.
[Figure: brightness temperature T_b (K) versus frequency (GHz), 175-190 GHz, one curve per pressure.]
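The qualitative behaviour in this figure can be reproduced with a simple Lorentzian whose half-width scales linearly with pressure (a toy sketch: the scaling constant and line strength below are arbitrary, not the real 183 GHz line parameters):

```python
import numpy as np
import matplotlib.pyplot as plt

nu = np.linspace(175.0, 190.0, 1000)          # frequency grid (GHz)
nu0 = 183.31                                  # water vapour line centre (GHz)

for P in np.arange(400.0, 800.0, 50.0):       # pressure in mB, 400 to 750
    gamma = 4.0e-3 * P                        # toy half-width, proportional to pressure
    line = (gamma / np.pi) / ((nu - nu0) ** 2 + gamma**2)   # Lorentzian profile
    plt.plot(nu, line / line.max(), label=f"{P:.0f} mB")

plt.xlabel("Frequency (GHz)")
plt.ylabel("Normalised line profile")
plt.legend()
plt.show()
```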


Temperature effect
Temperature from 230 K to 300 K in 10 K steps.
[Figure: brightness temperature T_b (K) versus frequency (GHz), 175-190 GHz, one curve per temperature.]


PWV column effect
Column from 0.6 to 1.3 mm in 0.1 mm steps:
[Figure: brightness temperature T_b (K) versus frequency (GHz), 175-190 GHz, one curve per column.]


What we want from the data

What do we know about:
1. The total amount of water
2. The index-of-refraction change due to the water:

   n - 1 \propto \frac{P_w}{T} + \frac{P_w}{T^2}    (23)