Original document: http://hea-www.harvard.edu/AstroStat/HEAD2008/talk_dvandyk.pdf
Hypothesis Testing Mathematical Computations Numerical Computations Further Reading

Hypothesis Testing
David A. van Dyk
Department of Statistics, University of California, Irvine

2008 HEAD Meetings

David A. van Dyk

Statistics: Handle with Care



Outline

1. Hypothesis Testing
   Basic Framework
   Test Statistics
2. Mathematical Computations
   Asymptotics
   Assumptions
3. Numerical Computations
   Monte Carlo
   Bootstrap and Posterior Predictive P-values



Hypothesis Testing

The Null Hypothesis
H0: The supposed interesting feature does not exist in the data.

The Alternative Hypothesis
HA: The supposed interesting feature does exist in the data.

Example (testing for an emission line):
H0: No emission line.
HA: Emission line.

The null is a special case of the alternative: the line intensity equals zero.

Test Statistics

Test statistics are used to measure the evidence for the null and alternative hypotheses. Assuming the null hypothesis is true, how likely are we to see a value of the test statistic as extreme as or more extreme than the observed value?

1. The distribution of the test statistic must be known under the null hypothesis.
2. The test statistic must behave differently under the alternative hypothesis.
3. For example, large values of the test statistic may give evidence for the alternative and against the null hypothesis.

How large must the test statistic be?

P-values

Assuming the null hypothesis is true, how likely are we to see a value of the test statistic as extreme as or more extreme than the observed value?

$$\Pr(T \geq t_{\rm obs} \mid H_0) = \text{p-value}$$

Unfortunately, these probability calculations are intractable in all but the simplest situations.

Solution: "Large sample" approximations.
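For contrast, here is a minimal sketch of one of the simplest situations, where the p-value can be computed exactly. It assumes, purely for illustration, that the test statistic T is a single photon count that is Poisson distributed with a known mean `mu0` under H0. The function name and the numbers are hypothetical and do not come from the talk.

```python
import math

def poisson_tail_p_value(t_obs: int, mu0: float) -> float:
    """Return Pr(T >= t_obs | H0) when T ~ Poisson(mu0) under H0."""
    if t_obs <= 0:
        return 1.0
    # Sum Pr(T = k) for k = 0 .. t_obs - 1, then take the complement.
    term = math.exp(-mu0)          # Pr(T = 0)
    cdf_below = term
    for k in range(1, t_obs):
        term *= mu0 / k            # Pr(T = k) obtained from Pr(T = k - 1)
        cdf_below += term
    return max(0.0, 1.0 - cdf_below)

# Hypothetical numbers: 8 counts observed where H0 predicts 3 on average.
p = poisson_tail_p_value(t_obs=8, mu0=3.0)
print(f"p-value = {p:.4f}")
```

Once the model has several fitted parameters, no closed form like this is available, which is what motivates the approximations.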



Likelihood Ratio Test Statistics

$$R = \frac{\sup_{\theta \in \Theta_0} L(\theta \mid Y)}{\sup_{\theta \in \Theta} L(\theta \mid Y)},$$

where
1. $\Theta$ is the parameter space under the alternative (dim = $d$).
2. $\Theta_0$ is the parameter space under the null (dim = $d_0$).
3. $L$ is the likelihood.

Fit the model with and without the line and compare the best fits.

Under certain assumptions, the distribution of $-2\log(R)$ under H0 approaches $\chi^2_{(d - d_0)}$ as the sample size (or counts) increases.
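As a hedged illustration of the statistic in a case where the large-sample result is expected to apply, the sketch below compares two Poisson counts. The model is an assumption made for this example, not the spectral-line setting: under H0 the two counts share one rate (d0 = 1), under HA each has its own rate (d = 2), so the nominal reference distribution is chi-square with 1 degree of freedom. All names and counts are hypothetical.

```python
import math

def poisson_loglik(counts, rates):
    """Poisson log-likelihood, dropping the log(y!) terms that cancel in R."""
    total = 0.0
    for y, lam in zip(counts, rates):
        total += -lam + (y * math.log(lam) if y > 0 else 0.0)
    return total

def lrt_two_counts(y1: int, y2: int):
    """Return (-2 log R, nominal p-value) for equal rates vs. separate rates."""
    pooled = (y1 + y2) / 2.0                       # MLE of the common rate under H0
    ll_null = poisson_loglik([y1, y2], [pooled, pooled])
    ll_alt = poisson_loglik([y1, y2], [y1, y2])    # under HA the MLEs are the counts
    stat = -2.0 * (ll_null - ll_alt)
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_nominal = math.erfc(math.sqrt(max(stat, 0.0) / 2.0))
    return stat, p_nominal

stat, p_nominal = lrt_two_counts(30, 50)
print(f"-2 log R = {stat:.3f}, nominal p-value = {p_nominal:.4f}")
```

Here the null value lies inside the alternative parameter space, which is why the chi-square reference is a reasonable approximation for moderately large counts.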

BUT... Assumptions include:
1. The null hypothesis must be a special case of the alternative hypothesis: $\Theta_0 \subset \Theta$.
2. The null hypothesis must be in the interior of the alternative hypothesis; more precisely, $\Theta_0$ must be in the interior of $\Theta$.

The second assumption fails when testing for a spectral line:
1. When there is no line, the line intensity is zero; it may not be negative.
2. Further, the location and width of the line do not exist when there is no line. They have no values.

The F-test is similarly inappropriate for testing for a line.
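The effect of a null value on the boundary can be seen in a small simulation. The sketch below uses a deliberately simplified stand-in for the line-intensity test, not a spectral model: Gaussian data with known unit variance, H0: mu = 0, and an alternative restricted to mu >= 0. The nominal reference would be chi-square with 1 degree of freedom, whose 5% cutoff is about 3.84. The model, function names, seed, and sample sizes are all assumptions chosen for illustration.

```python
import random

CHI2_1_CUTOFF_5PCT = 3.841  # 95th percentile of chi-square with 1 d.o.f.

def boundary_lrt_stat(sample):
    """-2 log R for H0: mu = 0 vs HA: mu >= 0 with unit-variance Gaussian data."""
    n = len(sample)
    mean = sum(sample) / n
    mu_hat = max(0.0, mean)        # the MLE cannot go below the boundary at zero
    return n * mu_hat ** 2

def simulate_false_positive_rate(n_obs=10, n_sims=20000, seed=1):
    """Simulate under H0; return (false positive rate, fraction of zero statistics)."""
    rng = random.Random(seed)
    stats = [boundary_lrt_stat([rng.gauss(0.0, 1.0) for _ in range(n_obs)])
             for _ in range(n_sims)]
    rate = sum(s > CHI2_1_CUTOFF_5PCT for s in stats) / n_sims
    frac_zero = sum(s == 0.0 for s in stats) / n_sims
    return rate, frac_zero

rate, frac_zero = simulate_false_positive_rate()
print(f"false positive rate at nominal 5% cutoff: {rate:.3f}")
print(f"fraction of statistics exactly zero:      {frac_zero:.3f}")
```

In this toy setting about half of the simulated statistics are exactly zero, so the nominal cutoff gives a false positive rate close to 2.5% rather than 5%. The nominal distribution is simply the wrong reference.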


[Figure: three panels (a), (b), (c), each plotting the p.d.f. against the LRT statistic. False positive rates printed in the panels: 2.6%, 1.5%, 31.5%.]

The actual distribution of the LRT statistic (histogram) is compared with its nominal distribution (line). Three cases: fitting a narrow line (fixed location), fitting a wide line (fitted location), and testing for an absorption line. The nominal cutoff for 5% false positives is shown along with the simulated false positive rates.


Monte Carlo Calibration

1. We do not know the true (sampling) distribution of the test statistic.
2. We can evaluate the distribution numerically using Monte Carlo simulation.
3. Simulate L data sets under H0 and compute the test statistic for each of the L data sets.
4. A histogram of the simulated test statistics approximates the sampling distribution of the test statistic.
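The steps above can be sketched as follows, under strong simplifying assumptions that are not from the talk: the null model is fully known (a Poisson background with known mean in one bin), and the alternative adds a non-negative line intensity to that bin. The Poisson sampler, the function names, and the default numbers are illustrative choices.

```python
import math
import random

def sample_poisson(rng, mean):
    """Draw one Poisson variate with Knuth's method (adequate for small means)."""
    limit = math.exp(-mean)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def line_lrt_stat(count, background):
    """-2 log R for a non-negative line intensity on top of a known background."""
    if count <= background:
        return 0.0  # best-fit line intensity is zero, so both fits coincide
    return 2.0 * (count * math.log(count / background) - (count - background))

def simulate_null_statistics(background=5.0, n_sims=5000, seed=0):
    """Simulate n_sims data sets under H0 and return their test statistics."""
    rng = random.Random(seed)
    return [line_lrt_stat(sample_poisson(rng, background), background)
            for _ in range(n_sims)]

def histogram_counts(values, bin_width=1.0, n_bins=10):
    """Bin values from zero upward; anything past the last edge joins the last bin."""
    counts = [0] * n_bins
    for v in values:
        counts[min(int(v // bin_width), n_bins - 1)] += 1
    return counts

null_stats = simulate_null_statistics()
print(histogram_counts(null_stats))
```

The list of bin counts plays the role of the histogram: it approximates the sampling distribution of the statistic under H0 without relying on the chi-square approximation.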


[Figure: histogram of the simulated LRT statistic (p.d.f. versus LRT statistic), annotated with 1.6%.]

Computing the p-value: $\Pr(T \geq t_{\rm obs} \mid H_0)$ = the proportion of simulated test statistics as large as or larger than $t_{\rm obs}$.
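A minimal sketch of this calculation, assuming the simulated null statistics are already available as a list. The function name and the example values are hypothetical.

```python
def monte_carlo_p_value(simulated_stats, t_obs):
    """Proportion of simulated null statistics at least as large as t_obs."""
    if not simulated_stats:
        raise ValueError("need at least one simulated statistic")
    exceed = sum(1 for t in simulated_stats if t >= t_obs)
    return exceed / len(simulated_stats)

# Made-up simulated statistics, only to show the call.
example_stats = [0.0, 0.0, 0.4, 1.1, 2.7, 0.0, 5.2, 0.9, 0.0, 3.3]
print(monte_carlo_p_value(example_stats, t_obs=3.0))
```

The resolution of the estimate is limited by the number of simulations: with L simulated data sets, the smallest nonzero p-value this can report is 1/L.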

Bootstrap and Bayesian Posterior Predictive Sampling

A complication: If there are unknown parameters in the null model, we cannot directly simulate data.

Solutions:
1. Fit the real data under the null model. Compute fitted parameters and error bars.
2. Parametric bootstrap suggests resampling data sets with the unknown parameters set in a way that accounts for these error bars.
3. Bayesian posterior predictive modeling simulates the unknown parameters from their posterior distribution; these are in turn used to simulate data sets.
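A hedged sketch of the posterior predictive variant, under assumptions that are not from the talk: the null model is a flat Poisson background with one unknown rate across all bins, the alternative adds a non-negative intensity in one known bin, and the prior on the rate is flat, which gives a Gamma posterior. The counts, names, and settings are hypothetical.

```python
import math
import random

def sample_poisson(rng, mean):
    """Draw one Poisson variate with Knuth's method (adequate for small means)."""
    limit = math.exp(-mean)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def poisson_loglik(counts, rate):
    """Poisson log-likelihood at a common rate, without the log(y!) terms."""
    return sum(-rate + (y * math.log(rate) if y > 0 else 0.0) for y in counts)

def line_lrt_stat(counts, line_bin):
    """-2 log R: flat background (H0) vs. extra non-negative intensity in one bin."""
    others = counts[:line_bin] + counts[line_bin + 1:]
    rate_others = sum(others) / len(others)
    if counts[line_bin] <= rate_others:
        return 0.0                       # fitted line intensity is zero
    rate_null = sum(counts) / len(counts)
    ll_null = poisson_loglik(counts, rate_null)
    ll_alt = (poisson_loglik(others, rate_others)
              + poisson_loglik([counts[line_bin]], counts[line_bin]))
    return 2.0 * (ll_alt - ll_null)

def posterior_predictive_p_value(counts, line_bin, n_sims=2000, seed=0):
    """Return (observed statistic, posterior predictive p-value) under H0."""
    rng = random.Random(seed)
    t_obs = line_lrt_stat(counts, line_bin)
    n_bins, total = len(counts), sum(counts)
    exceed = 0
    for _ in range(n_sims):
        # Assumed flat prior: the posterior of the rate is Gamma with shape
        # total + 1 and scale 1 / n_bins (gammavariate takes shape and scale).
        rate = rng.gammavariate(total + 1, 1.0 / n_bins)
        replicate = [sample_poisson(rng, rate) for _ in range(n_bins)]
        if line_lrt_stat(replicate, line_bin) >= t_obs:
            exceed += 1
    return t_obs, exceed / n_sims

counts = [4, 6, 5, 3, 7, 5, 14, 4, 6, 5]   # made-up spectrum; bin 6 is the candidate line
t_obs, p_value = posterior_predictive_p_value(counts, line_bin=6)
print(f"t_obs = {t_obs:.3f}, posterior predictive p-value = {p_value:.4f}")
```

A parametric bootstrap version would keep the same loop but draw `rate` around the fitted null value in a way that reflects its error bar, instead of drawing it from a posterior distribution.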


For Further Reading

van Dyk, D. A., Connors, A., Kashyap, V., and Siemiginowska, A. (2001). Analysis of energy spectra with low photon counts via Bayesian posterior simulation. The Astrophysical Journal 548, 224-243.

Protassov, R., van Dyk, D. A., Connors, A., Kashyap, V., and Siemiginowska, A. (2002). Statistics: Handle with care - detecting multiple model components with the likelihood ratio test. The Astrophysical Journal 571, 545-559.

van Dyk, D. A. and Kang, H. (2004). Highly structured models for spectral analysis in high energy astrophysics. Statistical Science 19, 275-293.

van Dyk, D. A., Connors, A., Esch, D. N., Freeman, P., Kang, H., Karovska, M., and Kashyap, V. (2006). Deconvolution in High Energy Astrophysics: Science, Instrumentation, and Methods (with discussion). Bayesian Analysis 1, 189-236.