Astro 193 : 2015 Mar 9
Follow-up
· MCMC: proposal distribution
· MCMC: parameter estimates and variances
· Bayesian Gaussian Aperture Photometry -- updated notes
· Model Selection

Model Selection
· p-values and Hypothesis Tests
· Type I, Type II, and other errors
· Analysis Design
· Upper Limits
· Hypothesis Tests, Likelihood Ratio Tests, and posterior predictive p-values
· Odds ratios, Bayes Factors
· AIC/BIC/DIC/WAIC/HQIC/MDL

p-values
· p = Pr(T > T(D) | H0)
· What p-values are: the probability that a statistical fluctuation in the baseline results in a deviation at least as large as that observed
· What p-values are not:
  · a measure of the truth of the baseline hypothesis
  · p(D|H0)
  · p(H0|D)
  · proof of the complex hypothesis
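As a concrete illustration (my addition, not from the slides), here is a minimal Python sketch of estimating p = Pr(T >= T(D) | H0) by Monte Carlo when the null distribution of the statistic is not known analytically. The flat-background toy model, the max-deviation statistic, and the function names are all illustrative assumptions.

    import numpy as np

    rng = np.random.default_rng(42)

    def pvalue_mc(t_obs, simulate_null, statistic, nsim=10000):
        # Monte Carlo estimate of p = Pr(T >= T(D) | H0):
        # simulate_null() draws one fake dataset under H0,
        # statistic() maps a dataset to the test statistic T.
        t_null = np.array([statistic(simulate_null()) for _ in range(nsim)])
        return np.mean(t_null >= t_obs)

    # Toy H0: a flat background of 100 counts per bin in 20 bins;
    # T is the largest absolute bin-wise deviation from the mean level.
    mu, nbins = 100.0, 20
    statistic = lambda d: np.max(np.abs(d - mu))
    observed = rng.poisson(mu, nbins)   # stand-in for real data
    p = pvalue_mc(statistic(observed), lambda: rng.poisson(mu, nbins), statistic)
    print("Monte Carlo p-value:", p)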

Hypothesis Tests
· Comparing a simpler null hypothesis H0 to a more complex alternative hypothesis H1
· The classical Neyman-Pearson hypothesis test uses a false-positive threshold, balanced against false negatives, such that when H0 is rejected, it implies H1 should be accepted.
· Set the sample size based on the expected effect size or noise; set a threshold based on false positives; and accept or reject the null based on a p-value test.
· The rules for testing (choice of statistic, the threshold probability of false positives, and controlling for false negatives) are laid down before the data are collected; the analysis presents stark decisions with known rates of error.
· The most powerful hypothesis tests are those that use the ratio of likelihoods (the Neyman-Pearson lemma).
· When multiple hypothesis tests are carried out, watch out for false positives (a quick simulation of this is sketched below).
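
A small simulation (my addition, with assumed numbers: 1000 tests, 30 samples each, alpha = 0.05) of how false positives accumulate when many tests are run on data that contain no signal at all:

    import numpy as np

    rng = np.random.default_rng(0)

    # Generate data with no signal (H0 true) and run many independent tests
    # at alpha = 0.05; a fixed fraction of them will "detect" something.
    alpha, ntests, nsamples = 0.05, 1000, 30
    false_pos = 0
    for _ in range(ntests):
        x = rng.normal(0.0, 1.0, nsamples)                   # pure noise, mean 0
        z = x.mean() / (x.std(ddof=1) / np.sqrt(nsamples))   # ~2-sigma test
        if abs(z) > 1.96:
            false_pos += 1

    print(false_pos, "false positives out of", ntests, "tests;",
          "expected about", int(alpha * ntests))
    # Chance of at least one false positive somewhere in the batch:
    print("family-wise error rate ~", round(1 - (1 - alpha) ** ntests, 3))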


Characterizing errors of hypothesis tests

· Type I: probability of false positives, seeing things where nothing is
· Type II: probability of false negatives, not seeing things that really are there (the complement of statistical power)
· FDR (False Discovery Rate): the ratio of false positives to all claimed positives
· Type S: probability of finding the wrong sign
· Type M: probability of finding the wrong magnitude
· Exaggeration factor: ratio of measured signal to expected effect (aka Eddington Bias); a small simulation of Type S and Type M errors is sketched below
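
A minimal simulation (my addition, using assumed numbers: true effect D = 2 in noise sigma = 10) showing how conditioning on statistical significance produces sign and magnitude errors:

    import numpy as np

    rng = np.random.default_rng(1)

    # Assumed numbers: a true effect D = 2 buried in noise sigma = 10.
    # Condition on "significant" measurements (|d| > 2 sigma) and look at
    # the Type S (sign) and Type M (magnitude / Eddington bias) errors.
    D, sigma, nsim = 2.0, 10.0, 200000
    d = rng.normal(D, sigma, nsim)          # many noisy measurements of D
    detected = np.abs(d) > 2 * sigma        # survive a ~2-sigma cut

    power = detected.mean()
    type_s = np.mean(d[detected] < 0)                    # wrong sign
    exaggeration = np.mean(np.abs(d[detected])) / D      # measured / true

    print("power ~", round(power, 3),
          " Type S ~", round(type_s, 3),
          " exaggeration ~", round(exaggeration, 1), "x")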


p-values: use with care
· Started out as a way to avoid fooling ourselves, but nowadays has become a prime tool for trapping the unwary.
· When the null hypothesis H0 is true, p-values are distributed uniformly (unless the statistic is discrete, or H0 is composite, or uncertainty in a parameter is included [e.g., ppp]); a quick check of this is sketched below.
· Never use distributions of p-values to do any statistical test.
· Never compare p-values from different analyses.
· Always check against the magnitude of the effect. If you find a highly significant p-value when testing for a weak signal, something has gone wrong.
· If you find p_obs = 0.01 when the threshold is 0.05, report it as p_obs < 0.05; otherwise you are mischaracterizing the error rate of the experiment.
· p-values don't get error bars, but they do have uncertainty. There is nothing magical about the p = 0.05 threshold. Look at the distribution of the statistic used to get the p-value to understand how robust it is.
· "The difference between significant and not significant is not significant." -- Andrew Gelman
· "researchers are disproving always-false null hypotheses and taking this disproof as near proof that their theory is correct"
  http://andrewgelman.com/2014/03/27/beyond-valley-trolls/#comment-156007

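A quick check of the uniformity claim above (my addition, using an assumed toy setup: a z-test on Gaussian data with known sigma):

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(2)

    # Draw many datasets under a true, simple H0 (Gaussian with known sigma),
    # compute a two-sided z-test p-value for each, and check the histogram:
    # under a simple, continuous H0 the p-values are ~Uniform(0, 1).
    nsim, nsamples = 5000, 50
    pvals = np.empty(nsim)
    for i in range(nsim):
        x = rng.normal(0.0, 1.0, nsamples)
        z = x.mean() * np.sqrt(nsamples)        # sigma = 1 is known here
        pvals[i] = 2.0 * stats.norm.sf(abs(z))

    print(np.histogram(pvals, bins=10, range=(0.0, 1.0))[0])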

p-values: use with care
· xkcd 1478: P-Values
· xkcd 882: Significant


Analysis Design
See Gelman & Carlin, "Beyond power calculations": a broader design analysis, prospective or retrospective, using external information.
"The usual thinking is that if you happen to obtain statistical significance with low power, then you have achieved a particularly impressive feat, obtaining scientific success under difficult conditions. But that is incorrect, if the goal is scientific understanding rather than (say) publication in a top journal. In fact, statistically significant results in a noisy setting are highly likely to be in the wrong direction and can grossly overestimate the absolute values of any actual effect sizes."

http://www.stat.columbia.edu/~gelman/research/unpublished/retropower.pdf

1. First estimate the effect size, D, from external sources (this is like figuring out an informative prior).
2. Then determine the effect size d ± σ from the data, and compute the power, Type S error, and exaggeration factor (a sketch of these calculations follows this list):
   · power for a Gaussian: P(|d| > nσ) = P(d > nσ) + P(d < -nσ)
   · Type S error for a Gaussian: P(d < -nσ) / (P(d > nσ) + P(d < -nσ))
   · exaggeration factor for a Gaussian: E[|d| given |d| > nσ] / D
3. Use the above calculations to evaluate whether the p-value is a reliable test to reject H0.
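
A minimal Python sketch of these Gaussian design-analysis formulas (my addition, in the spirit of Gelman & Carlin; the function name and the Monte Carlo step for the conditional expectation are my own choices, not from the slides):

    import numpy as np
    from scipy import stats

    def design_analysis(D, sigma, n_sigma=2.0, nsim=500000, seed=3):
        # D      : externally estimated true effect size
        # sigma  : standard error of the measurement
        # n_sigma: detection threshold in units of sigma
        thresh = n_sigma * sigma
        p_hi = stats.norm.sf(thresh, loc=D, scale=sigma)     # P(d > +n*sigma)
        p_lo = stats.norm.cdf(-thresh, loc=D, scale=sigma)   # P(d < -n*sigma)
        power = p_hi + p_lo
        type_s = (p_lo if D > 0 else p_hi) / power           # wrong-sign fraction

        # Exaggeration factor E[|d| given |d| > n*sigma] / |D|, by Monte Carlo.
        d = np.random.default_rng(seed).normal(D, sigma, nsim)
        kept = np.abs(d) > thresh
        exaggeration = np.mean(np.abs(d[kept])) / abs(D)
        return power, type_s, exaggeration

    print(design_analysis(D=-5.0, sigma=10.0))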


Analysis Design: Example
You have a continuum spectrum and seek to measure an absorption line. The continuum is at 100 counts, and you measure 80 counts. This is significant at the p ~ 0.05 (~2σ) level.

                 D=-1    D=-5    D=-10    D=+1    D=+5    D=+10
    power          4%      6%      13%      4%      7%      16%
    sign error    45%     10%       1%     33%      7%     0.8%
    exaggeration  24x      5x     2.5x     24x      5x     2.6x
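
As a rough cross-check (my addition, reusing the illustrative design_analysis helper sketched on the Analysis Design slide, with an assumed noise of sigma ~ sqrt(100) = 10 counts at a ~2σ threshold), a pure Gaussian calculation lands in the same ballpark as the table above, though not exactly on it:

    # Ballpark cross-check of the table, column by column.
    for D in (-1.0, -5.0, -10.0, +1.0, +5.0, +10.0):
        power, type_s, exagg = design_analysis(D, sigma=10.0, n_sigma=2.0)
        print("D = %+5.0f : power %5.1f%%, sign error %5.1f%%, exaggeration %4.1fx"
              % (D, 100 * power, 100 * type_s, exagg))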