Statistics for Astronomy IV: Maximum Likelihood
B. Nikolic b.nikolic@mrao.cam.ac.uk
Astrophysics Group, Cavendish Laboratory, University of Cambridge http://www.mrao.cam.ac.uk/~bn204/lecture/astrostats.html

3 November 2008


Goals for this Lecture

- Maximum Likelihood: example and recap
- Alternatives to ML
- How to find the Maximum Likelihood
- Error estimation in Maximum Likelihood
- Monte Carlo simulation
- Problems




Maximum Likelihood Estimation of 2d Gaussian
[Figure: noisy image of a 2d Gaussian source, S/N = 5 per pixel]

- Noisy image of a 2d Gaussian source
- Noise is normally distributed and uncorrelated between pixels
- Infer the best guesses for the parameters of the model (assume "best guess" = Maximum Likelihood for now)

\[
G(x, y; \theta) = A \exp\!\left[ -\frac{(x - x_0)^2}{2\sigma_x^2}
- \frac{\rho\,(x - x_0)(y - y_0)}{\sigma_x \sigma_y}
- \frac{(y - y_0)^2}{2\sigma_y^2} \right]
\tag{1}
\]
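(Not part of the original slides.) A minimal Python sketch of how one might generate such a simulated image; the grid size, parameter values, and random seed are illustrative assumptions, with the noise rms chosen so that S/N = A/σ = 5 per pixel:

```python
import numpy as np

def gauss2d(x, y, A, x0, y0, sx, sy, rho):
    """2d Gaussian model of Eq. (1)."""
    return A * np.exp(-(x - x0)**2 / (2 * sx**2)
                      - rho * (x - x0) * (y - y0) / (sx * sy)
                      - (y - y0)**2 / (2 * sy**2))

# Illustrative values (not from the lecture); sigma chosen so S/N = A/sigma = 5
truth = dict(A=5.0, x0=128.0, y0=140.0, sx=20.0, sy=12.0, rho=0.3)
sigma = 1.0

ny, nx = 256, 256
y, x = np.mgrid[0:ny, 0:nx]                  # pixel coordinate grids
rng = np.random.default_rng(42)
data = gauss2d(x, y, **truth) + rng.normal(0.0, sigma, size=(ny, nx))
```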


What are we trying to find out?

We are trying to:

- Estimate the parameters of the distribution (model)
  \[
  \theta = \{A, x_0, y_0, \sigma_x, \sigma_y, \rho\} \tag{2}
  \]
  from the observed pixel data {a_i} and a model for the noise
- Do this by maximising the likelihood: P({a_i} | θ)
- Also estimate errors on the parameter estimates

Not (yet) trying to:

- Compute the posterior distribution: P(θ | {a_i})
- Test if some particular values {A, x_0, y_0, σ_x, σ_y, ρ} are a good fit for the data
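For reference (a standard result, not spelled out on the slide): with normally distributed, uncorrelated noise of rms σ the likelihood is

\[
L(\theta) = \prod_i \frac{1}{\sqrt{2\pi\sigma^2}}
\exp\!\left[-\frac{\left(a_i - G(x_i, y_i; \theta)\right)^2}{2\sigma^2}\right],
\qquad
\log L(\theta) = \mathrm{const} - \sum_i \frac{\left(a_i - G(x_i, y_i; \theta)\right)^2}{2\sigma^2},
\]

so maximising L(θ) is the same as minimising the sum of squared residuals (least squares).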





Alternatives to ML I

There are alternatives to Maximum-Likelihood analysis


- Recall a_i is the observed signal in pixel (x_i, y_i)
- Could estimate the position (x_0, y_0) of the centre of the Gaussian:
  - Find max_i[a_i] and estimate x_0, y_0 ≈ x_i, y_i
  - Flux-weighted mean of the x and y positions of the pixels:

\[
x_0 \approx \frac{1}{\sum_i a_i} \sum_i x_i a_i \tag{3}
\]
\[
y_0 \approx \frac{1}{\sum_i a_i} \sum_i y_i a_i \tag{4}
\]
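(Not in the slides.) A minimal sketch of both estimators, reusing the x, y, data arrays from the simulated-image sketch above:

```python
# Peak-pixel estimate of the centre
iy, ix = np.unravel_index(np.argmax(data), data.shape)
x0_peak, y0_peak = float(ix), float(iy)

# Flux-weighted centroid, Eqs. (3)-(4); in practice one would usually restrict
# the sums to a window around the source so that noise far from the source
# does not dominate the weights
x0_fw = np.sum(x * data) / np.sum(data)
y0_fw = np.sum(y * data) / np.sum(data)
```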


Alternatives to ML II
[Figure: x-projection and y-projection of the image (pixel values summed along each axis) versus pixel position]

- x_0, y_0 from the peaks
- σ_x, σ_y from the widths
- Cannot constrain the rotation parameter ρ


Alternatives to ML III

Advantages:

- Non-iterative
- Faster
- Less storage
- Less programming (and so fewer mistakes)

Disadvantages:

- Never optimal unless identical to a ML statistic
- Poor or no error estimates
- Information loss
- Ad hoc: no fixed rule to come up with such statistics






How to find the Maximum Likelihood
Goal is to find θ that maximises L(θ) = P({a_i} | θ), or equivalently the log-likelihood log L(θ).

Local:

- [Steepest descent] (∇f)
- Conjugate gradient (∇f)
- Broyden-Fletcher-Goldfarb-Shanno (∇²f)
- Levenberg-Marquardt (∇²f)

Global:

- Direct search / plain Monte Carlo
- Downhill simplex
- Simulated annealing
- Markov-Chain Monte-Carlo (MCMC)





First derivatives

- You can supply a function which computes the derivatives of L at an arbitrary point (faster, more accurate)
- Or you can ask the algorithm to estimate the derivatives itself using finite differences (easier to write)




The Levenberg-Marquardt Method
The Levenberg-Marquardt method is essentially the first-choice algorithm for problems that:

- Involve minimisation of sums of squares of residuals
- Do not have local minima

It is a combination of the steepest descent and quasi-Newton methods, transitioning to the latter when close to the extremal point.

Public implementations
Among others:


- MINPACK (F77, but very tried and tested)
- C++ wrappers of the above (e.g., bnmin1 on my web-pages)
- http://www.ics.forth.gr/~lourakis/levmar/
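(Not from the slides.) A minimal sketch of a Levenberg-Marquardt least-squares fit of the 2d Gaussian with scipy.optimize.least_squares; method='lm' calls the MINPACK implementation mentioned above. It assumes the gauss2d model and the x, y, data arrays from the earlier sketch, and the starting guess is arbitrary:

```python
from scipy.optimize import least_squares

def residuals(p, x, y, data):
    A, x0, y0, sx, sy, rho = p
    return (data - gauss2d(x, y, A, x0, y0, sx, sy, rho)).ravel()

p0 = [3.0, 120.0, 150.0, 15.0, 15.0, 0.0]          # rough starting guess
fit = least_squares(residuals, p0, args=(x, y, data), method='lm')
print(fit.x)                                       # best-fit {A, x0, y0, sx, sy, rho}
```

An analytic Jacobian can be passed via the jac argument if the default finite-difference estimate (cf. the "First derivatives" slide) is too slow or too noisy.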


Advice on maximisation


- Visualisation: if you are working on real observed data, it is essential to visualise the residuals and the path the maximisation routine takes in parameter space. You are looking for:
  - Obvious problems in your model
  - Local maxima / other problems for the minimisation routine
- If there is a possibility of local maxima, start the minimisation from a number of different positions and check that they converge to the same value (see the sketch after this list)
- Do not waste time copying from Numerical Recipes; use existing implementations:
  - GNU Scientific Library: http://www.gnu.org/software/gsl/
  - The scipy library in Python
  - Search the web ...
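(Not from the slides.) A minimal multi-start sketch, reusing residuals, p0, x, y, data from the fitting sketch above; the perturbation scales and number of trials are arbitrary:

```python
import numpy as np
from scipy.optimize import least_squares

# Start from several perturbed initial guesses and check that all runs reach
# (essentially) the same minimum of the cost function.
rng = np.random.default_rng(1)
best = None
for trial in range(10):
    start = p0 + rng.normal(0.0, [1.0, 10.0, 10.0, 5.0, 5.0, 0.1])
    res = least_squares(residuals, start, args=(x, y, data), method='lm')
    print(trial, res.cost, res.x)
    if best is None or res.cost < best.cost:
        best = res
```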




Error estimates for ML I

- Back to our example of fitting a Gaussian
- Have θ̂; what is the error on the parameters of the Gaussian?
- Relabel {A, x_0, y_0, σ_x, σ_y, ρ} as {θ_1, ..., θ_6}

\[
\frac{\partial \log L}{\partial \theta_j} = 0 \quad \text{at the maximum likelihood} \tag{5}
\]
\[
\frac{\partial \log L}{\partial \theta_j}
= \frac{\partial}{\partial \theta_j}\left( -\sum_i \frac{\left[a_i - G(x_i, y_i; \theta)\right]^2}{2\sigma^2} \right) \tag{6}
\]
\[
= \sum_i \frac{a_i - G(x_i, y_i; \theta)}{\sigma^2}\, \frac{\partial G(x_i, y_i; \theta)}{\partial \theta_j} \tag{7}
\]


Error estimates for ML II

Recall:

- a_i: observed data
- G(x_i, y_i; θ): model

If θ is very close to the true value, the noise ε_i is not very large, and G is fully differentiable, then we can Taylor expand around a_i:

\[
a_i - G(x_i, y_i; \theta) \approx \epsilon_i
- \sum_j \frac{\partial G(x_i, y_i; \theta)}{\partial \theta_j}\, \mathrm{d}\theta_j \tag{8}
\]

Error estimates for ML III

This gives:

\[
\sum_i \left[ \epsilon_i - \sum_j \frac{\partial G(x_i, y_i; \theta)}{\partial \theta_j}\, \mathrm{d}\theta_j \right]
\frac{\partial G(x_i, y_i; \theta)}{\partial \theta_k} = 0 \tag{9}
\]
\[
\sum_i \epsilon_i\, \frac{\partial G(x_i, y_i; \theta)}{\partial \theta_k}
= \sum_i \sum_j \frac{\partial G(x_i, y_i; \theta)}{\partial \theta_j}\,
\frac{\partial G(x_i, y_i; \theta)}{\partial \theta_k}\, \mathrm{d}\theta_j \tag{10}
\]

Define:

\[
X_{ij} = \frac{\partial G(x_i, y_i; \theta)}{\partial \theta_j} \tag{11}
\]

Error estimates for ML IV

Then:

\[
X^{\mathsf T} \epsilon = X^{\mathsf T} X\, \mathrm{d}\theta \tag{12}
\]
\[
\mathrm{d}\theta = \left( X^{\mathsf T} X \right)^{-1} X^{\mathsf T} \epsilon \tag{13}
\]

(Or the equivalent numerical procedure without directly computing the inverse.)

The above would be the answer if we knew the ε_i, but of course all we know is that they are random variables, normally distributed with zero mean and rms σ!
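(Not from the slides.) A sketch of that "equivalent numerical procedure": solve the linear system as a least-squares problem rather than forming the inverse explicitly. X and eps are placeholders here (the n-by-m Jacobian of Eq. (11) and an n-vector of noise values):

```python
import numpy as np

# Solve X dtheta ~= eps without explicitly forming and inverting (X^T X)
dtheta, *_ = np.linalg.lstsq(X, eps, rcond=None)

# Equivalent, but numerically less stable, explicit form:
# dtheta = np.linalg.inv(X.T @ X) @ X.T @ eps
```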


Error estimates for ML V

\[
T \equiv E\!\left[\,\mathrm{d}\theta\,\mathrm{d}\theta^{\mathsf T}\right] \tag{14}
\]
\[
= E\!\left[\left(X^{\mathsf T} X\right)^{-1} X^{\mathsf T}\, \epsilon\,\epsilon^{\mathsf T}\, X \left(X^{\mathsf T} X\right)^{-1}\right] \tag{15}
\]
\[
= \left(X^{\mathsf T} X\right)^{-1} X^{\mathsf T}\, \mathbf{E}\, X \left(X^{\mathsf T} X\right)^{-1},
\qquad \mathbf{E} \equiv E\!\left[\epsilon\,\epsilon^{\mathsf T}\right] \tag{16}
\]

Diagonal elements of T are the variances of each parameter of the maximum-likelihood model. Off-diagonal elements are the covariances of the parameters; in general these are not zero, reflecting the fact that the errors on the parameters are not independent.

If **E** = σ² I (that is, the observations are all independent with the same error), then:

\[
T = \sigma^2 \left(X^{\mathsf T} X\right)^{-1} \tag{17}
\]


Error estimates for ML VI

This is a great simplification, because recall:

\[
\left(X^{\mathsf T} X\right)_{jk}
= \sum_i \frac{\partial G(x_i, y_i; \theta)}{\partial \theta_j}\,
\frac{\partial G(x_i, y_i; \theta)}{\partial \theta_k} \tag{18}
\]
\[
\equiv D_{jk} \tag{19}
\]

Hence we only have to invert an m × m matrix, where m is the number of parameters, rather than handle X, which is an n × m matrix where n is the number of observations.

\[
T = \sigma^2 D^{-1} \tag{20}
\]

with D^{-1} often referred to as the "error matrix". (21)

Error estimates for ML VII

Define the matrix:

\[
D_{jk} = \sum_i \frac{\partial G(x_i, y_i; \theta)}{\partial \theta_j}\,
\frac{\partial G(x_i, y_i; \theta)}{\partial \theta_k} \tag{22}
\]

- It defines how small changes in the observations (a_i) and in the parameters {θ_j} need to be related so that we remain at the maximum likelihood
- The inverse, σ² D^{-1}, gives the errors on the fitted parameters (diagonal terms, (D^{-1})_{ii}) and the covariances of the errors on the parameters (off-diagonal terms)
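(Not part of the lecture.) A minimal numerical sketch of Eq. (22): build the Jacobian X by finite differences around the best-fit point and invert D to get the parameter covariance σ²D^{-1}. It assumes gauss2d, fit, x, y, and the noise rms sigma from the earlier sketches:

```python
import numpy as np

def jacobian(p, x, y, h=1e-6):
    """Finite-difference Jacobian X_ij = dG(x_i, y_i; theta)/dtheta_j."""
    base = gauss2d(x, y, *p).ravel()
    cols = []
    for j in range(len(p)):
        dp = np.array(p, dtype=float)
        step = h * max(1.0, abs(dp[j]))
        dp[j] += step
        cols.append((gauss2d(x, y, *dp).ravel() - base) / step)
    return np.column_stack(cols)

X = jacobian(fit.x, x, y)
D = X.T @ X                          # Eq. (22)
cov = sigma**2 * np.linalg.inv(D)    # sigma^2 D^-1: the error matrix
errors = np.sqrt(np.diag(cov))       # 1-sigma errors on {A, x0, y0, sx, sy, rho}
```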


Error estimates for ML VIII

- Back to the Gaussian example: what is D? [Condon(1997)]
- How to evaluate the elements D_jk? In principle they are given by the sum, but for reasonably small pixels (h << σ_x, h << σ_y, where h is the pixel size) we can turn the sums into integrals. The integrals evaluate using the usual closed-form solutions. Find:

\[
D = \frac{\pi A}{2 h^2}
\begin{pmatrix}
\dfrac{2\sigma_x \sigma_y}{A} & 0 & 0 & \sigma_y & \sigma_x & 0 \\[4pt]
0 & \dfrac{A \sigma_y}{\sigma_x} & 0 & 0 & 0 & 0 \\[4pt]
0 & 0 & \dfrac{A \sigma_x}{\sigma_y} & 0 & 0 & 0 \\[4pt]
\sigma_y & 0 & 0 & \dfrac{3 A \sigma_y}{2 \sigma_x} & \dfrac{A}{2} & 0 \\[4pt]
\sigma_x & 0 & 0 & \dfrac{A}{2} & \dfrac{3 A \sigma_x}{2 \sigma_y} & 0 \\[4pt]
0 & 0 & 0 & 0 & 0 & \dfrac{A \sigma_x \sigma_y}{2}
\end{pmatrix}
\tag{23}
\]

(parameter order {A, x_0, y_0, σ_x, σ_y, ρ}, evaluated at ρ = 0)


Error estimates for ML IX

The inverse of D:

\[
D^{-1} = \frac{2 h^2}{\pi A}
\begin{pmatrix}
\dfrac{A}{\sigma_x \sigma_y} & 0 & 0 & -\dfrac{1}{2\sigma_y} & -\dfrac{1}{2\sigma_x} & 0 \\[4pt]
0 & \dfrac{\sigma_x}{A \sigma_y} & 0 & 0 & 0 & 0 \\[4pt]
0 & 0 & \dfrac{\sigma_y}{A \sigma_x} & 0 & 0 & 0 \\[4pt]
-\dfrac{1}{2\sigma_y} & 0 & 0 & \dfrac{\sigma_x}{A \sigma_y} & 0 & 0 \\[4pt]
-\dfrac{1}{2\sigma_x} & 0 & 0 & 0 & \dfrac{\sigma_y}{A \sigma_x} & 0 \\[4pt]
0 & 0 & 0 & 0 & 0 & \dfrac{2}{A \sigma_x \sigma_y}
\end{pmatrix}
\tag{24}
\]


Error estimates for ML X

What does this tell us?

- Peak flux:
\[
\mathrm{Var}(A) = \sigma^2 \left(D^{-1}\right)_{11}
= \sigma^2\, \frac{2 h^2}{\pi \sigma_x \sigma_y} \tag{25}
\]

- Position:
\[
\mathrm{Var}(x_0) = \sigma^2 \left(D^{-1}\right)_{22}
= \sigma^2\, \frac{2 h^2 \sigma_x}{\pi A^2 \sigma_y} \tag{26}
\]

⇒ the error (in units of pixels) on the position is approximately equal to the noise-to-signal ratio σ/A


Error estimates for ML XI

Off-diagonal elements, e.g. (D^{-1})_{14}:

\[
\left(D^{-1}\right)_{14} = -\frac{h^2}{\pi A \sigma_y} \tag{27}
\]
\[
\frac{\left(D^{-1}\right)_{14}}{\sqrt{\left(D^{-1}\right)_{11} \left(D^{-1}\right)_{44}}} = -\frac{1}{2} \tag{28}
\]

⇒ Strong negative correlation between the errors on the inferred amplitude and width


Fisher Information Matrix
The Fisher information formalism is a generalisation of the above discussion:

\[
J_{ij} = E\!\left[ \frac{\partial \log L(x|\theta)}{\partial \theta_i}\,
\frac{\partial \log L(x|\theta)}{\partial \theta_j} \right] \tag{29}
\]
\[
= \int \frac{\partial \log L(x|\theta)}{\partial \theta_i}\,
\frac{\partial \log L(x|\theta)}{\partial \theta_j}\, L(x|\theta)\, \mathrm{d}x \tag{30}
\]

The important result is that for any form of likelihood (the Cramér-Rao bound):

\[
\mathrm{Var}\left(\theta_i\right) \ge \left(J^{-1}\right)_{ii} \tag{31}
\]
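(Not spelled out on the slide, but a quick check connecting this to the earlier result.) For the Gaussian-noise likelihood used above,

\[
\frac{\partial \log L}{\partial \theta_i}
= \frac{1}{\sigma^2} \sum_k \epsilon_k\, \frac{\partial G(x_k, y_k; \theta)}{\partial \theta_i}
\quad\Longrightarrow\quad
J_{ij} = \frac{1}{\sigma^2} \sum_k
\frac{\partial G(x_k, y_k; \theta)}{\partial \theta_i}\,
\frac{\partial G(x_k, y_k; \theta)}{\partial \theta_j}
= \frac{D_{ij}}{\sigma^2},
\]

so the Cramér-Rao bound (J^{-1})_{ii} = σ² (D^{-1})_{ii} reproduces the error matrix found above.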


Some conclusions on ML

Advantages:

- A well-defined approach that can be applied every time you have a model
- The estimate has the minimum variance

Disadvantages:

- Results are often biased ⇒ have to Monte Carlo anyway
- Computationally expensive
- Requires a good model for the errors








Monte Carlo techniques



- The brute-force approach to characterising a statistic and/or calculating a likelihood:
  - Generate a large number of simulated data sets using a model
  - Calculate the proposed statistic from each data set
  - This results in an empirical distribution of the statistic, P(f(x)|θ), which you compare to the model parameters
- Still need to have a good model for the errors:
  - In many realistic problems there are no good theoretical models for the sources of errors
  - If you can't estimate the errors at all, consider the bootstrap technique
- Can take into account the complete processing of the data


Monte Carlo example
Estimate the centre of the model Gaussian by simply picking the highest-value pixel.
[Figure: histogram (N versus position) of the recovered x and y centre estimates over many noise realisations]
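(Not from the slides.) A minimal sketch of this Monte Carlo experiment, reusing gauss2d, the illustrative true parameters, and the coordinate grids from the earlier sketches; the number of realisations is arbitrary:

```python
import numpy as np

# Many noise realisations of the same model; record the highest-value pixel
# as the centre estimate each time.
n_sim = 1000
x0_est = np.empty(n_sim)
y0_est = np.empty(n_sim)
rng = np.random.default_rng(0)
model = gauss2d(x, y, **truth)
for k in range(n_sim):
    sim = model + rng.normal(0.0, sigma, size=model.shape)
    iy, ix = np.unravel_index(np.argmax(sim), sim.shape)
    x0_est[k], y0_est[k] = ix, iy

print(x0_est.std(), y0_est.std())    # empirical scatter of the statistic
```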




Some problems to try I

1. In the situation when the background is time-variable, one often-used technique is the on-off observation, where the telescope observes empty sky for half of the observing time and this measurement is subtracted from the on-source measurement. What is the resulting loss in sensitivity? What is the full S/N expression?


Bibliography

Condon J. J., 1997, PASP, 109, 166