Original document: http://hea-www.harvard.edu/AstroStat/SolStat2012/statTutorial_awblocker.pdf
Statistical Tools

Statistical Tools and Techniques for Solar Astronomers
Alexander W Blocker, Nathan Stein

SolarStat 2012


Outline

1. Introduction & Objectives
2. Statistical issues with astronomical data
3. Example: filter / hardness ratios
4. Standard approach
5. Building a statistical model
6. Using the model


Statistical Tools Introduction & Objectives

Introductions

- Alex Blocker, Harvard Statistics
  - Model-based stacking for low-count observations
  - Event detection for massive light curve databases
- Nathan Stein, Harvard Statistics
  - Analysis of CMDs in stellar clusters
  - Robust clustering methods for astronomical data


Objectives for the day

- Disclaimer: will not make statisticians in two hours
- Goals:
  - Awareness of statistical issues and concepts
  - Understanding of the probability modeling approach
  - Familiarity with computational tools
- Basically, want informed statistical consumers


Background

- Assuming little statistical background
- However, should have basic understanding of probability
- Not assuming knowledge of Bayesian modeling, MCMC, etc.


Statistical Tools Statistical issues with astronomical data

Sources of error

- Raw data typically consist of photon counts
- Measurement noise and intrinsic variation
- Contamination from background
- Inhomogeneous instrumental sensitivities


Forms of data

Starts with images, but there are many ways to extract numerical data (in increasing order of structure and complexity):
- Point and area measurements (predefined)
- Light curves
- Spatiotemporal on predefined regions (local)
- Global spatiotemporal patterns


Focus

- Focusing today on the simplest structure: measurements on predefined regions
- Core modeling is shared across settings
- Computational strategies are similar, but greater sophistication is needed for more complex settings
- Analyses of light curves, whole images, etc. add layers of structure


Statistical Tools Example: filter / hardness ratios

Problem definition

- Interested in the relative flux of a source or region between two energies
  - Filter ratios in solar
  - Hardness ratios in high-energy
- DEMs (differential emission measures) preferred to filter ratios for solar (e.g. Weber et al. 2005)
- However, ratios provide a straightforward setting to work with; DEM analysis is an extension


Definitions

Denoting the fluxes in the hard and soft passbands as $\lambda_H$ and $\lambda_S$:
- Simple ratio: $R = \lambda_S / \lambda_H$
- Color: $C = \log_{10}(\lambda_S / \lambda_H)$


Data

- Observations
  - Photon counts from the region of interest in the hard ($H$) and soft ($S$) passbands, extracted from images
  - Similar counts from an area with background only in each passband ($B_H$ and $B_S$)
- Calibration
  - Sensitivity of the instrument to each band, $e_H$ and $e_S$ (effective area)
  - Relative effective area for the background region, $r$


Statistical Tools Standard approach

Simple case

Without background or other corrections, the standard approach just substitutes counts for fluxes:

$R = \frac{S}{H}$

$C = \log_{10}\left(\frac{S}{H}\right)$


Corrections

Adjusting for background, the standard approach would use:

$R = \frac{S - B_S/r}{H - B_H/r}$

$C = \log_{10}\left(\frac{S - B_S/r}{H - B_H/r}\right)$
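
As a minimal sketch of the background-corrected estimates (Python; the function name and the example counts are illustrative and do not come from the slides), the calculation could look like this. It refuses to return a value when either net count is non-positive, since the color is then undefined.

```python
import math

def corrected_ratio_and_color(S, H, B_S, B_H, r):
    """Background-subtracted simple ratio R and color C from raw counts.

    S, H     : counts in the source region (soft, hard)
    B_S, B_H : counts in the background-only region (soft, hard)
    r        : relative effective area of the background region
    """
    net_S = S - B_S / r  # net soft counts
    net_H = H - B_H / r  # net hard counts
    if net_S <= 0 or net_H <= 0:
        raise ValueError("net counts must be positive for R and C to be usable")
    R = net_S / net_H
    C = math.log10(R)
    return R, C

# Hypothetical counts, chosen only for illustration.
R, C = corrected_ratio_and_color(S=120, H=60, B_S=200, B_H=100, r=10.0)
print(R, C)
```

With low counts the subtraction can easily produce zero or negative net counts, which is one practical symptom of the problems with this approach.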


Error estimates

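Standard errors of these are usually obtained by propagating errors under a Gaussian (linear) approximation:

$\sigma_R = \frac{S - B_S/r}{H - B_H/r} \sqrt{\frac{\sigma_S^2 + \sigma_{B_S}^2/r^2}{(S - B_S/r)^2} + \frac{\sigma_H^2 + \sigma_{B_H}^2/r^2}{(H - B_H/r)^2}}$

$\sigma_C = \frac{1}{\ln(10)} \sqrt{\frac{\sigma_S^2 + \sigma_{B_S}^2/r^2}{(S - B_S/r)^2} + \frac{\sigma_H^2 + \sigma_{B_H}^2/r^2}{(H - B_H/r)^2}}$

where $\sigma_S$, $\sigma_H$, $\sigma_{B_S}$, and $\sigma_{B_H}$ are typically approximated with the Gehrels prescription (Gehrels 1986):

$\sigma_X \approx \sqrt{X + 0.75} + 1$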
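
The propagation can be sketched in Python as follows. This is only an illustration of the linear-approximation formulas for the ratio and color with the Gehrels approximation plugged in for each count; the function names and example counts are assumptions made for the sketch.

```python
import math

def gehrels_sigma(x):
    """Gehrels (1986) approximation to the error on a Poisson count x."""
    return math.sqrt(x + 0.75) + 1.0

def propagated_errors(S, H, B_S, B_H, r):
    """Linear-approximation standard errors for the ratio R and color C."""
    net_S = S - B_S / r
    net_H = H - B_H / r
    # Variance of each net count: source-region term plus scaled background term.
    var_S = gehrels_sigma(S) ** 2 + gehrels_sigma(B_S) ** 2 / r ** 2
    var_H = gehrels_sigma(H) ** 2 + gehrels_sigma(B_H) ** 2 / r ** 2
    # Relative errors of the two net counts, added in quadrature.
    rel = math.sqrt(var_S / net_S ** 2 + var_H / net_H ** 2)
    sigma_R = (net_S / net_H) * rel
    sigma_C = rel / math.log(10)
    return sigma_R, sigma_C

# Hypothetical counts, chosen only for illustration.
sigma_R, sigma_C = propagated_errors(S=120, H=60, B_S=200, B_H=100, r=10.0)
print(sigma_R, sigma_C)
```

Both errors share the same quadrature term; they differ only in the prefactor, R for the ratio and 1/ln(10) for the color.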


From sigma to intervals

- Typically not interested in $\sigma$ for its own sake
- Want to summarize uncertainty about $R$ or $C$
- Standard statistical approach is to use intervals
  - Often constructed as $\hat{\theta} \pm k \cdot \sigma$, where $\hat{\theta}$ is the estimate
  - Confidence interpretation: want the interval to include the true value at least as often as stated
  - e.g., for Gaussian data, $\bar{X} \pm \sigma$ is a 68% interval; $\bar{X} \pm 1.96\sigma$ is 95%
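
The confidence interpretation can be checked by simulation. The sketch below (Python; the sample size, number of trials, and seed are arbitrary choices, and sigma in the interval is taken to be the standard error of the sample mean) draws Gaussian data repeatedly and counts how often the interval around the sample mean contains the true mean.

```python
import math
import random

def coverage(k, n=10, sigma=1.0, mu=0.0, trials=5000, seed=0):
    """Fraction of simulated intervals xbar +/- k*se that contain the true mean mu."""
    rng = random.Random(seed)
    se = sigma / math.sqrt(n)  # standard error of the sample mean
    hits = 0
    for _ in range(trials):
        xbar = sum(rng.gauss(mu, sigma) for _ in range(n)) / n
        if abs(xbar - mu) <= k * se:
            hits += 1
    return hits / trials

cov_1 = coverage(1.0)     # should be close to 0.68
cov_196 = coverage(1.96)  # should be close to 0.95
print(cov_1, cov_196)
```

The same kind of check, run on low-count Poisson data instead of Gaussian data, is one way to see whether a proposed interval actually achieves its stated coverage.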


Flaws

- Sigma does not summarize errors on R or C; actual uncertainty can be highly asymmetric
- Gaussian assumption is flawed for low-count observations; intervals are not valid
- Background subtraction leads to bias and inefficiency (van Dyk et al. 2001)
- Does not account effectively for differences in detector sensitivity


Statistical Tools Building a statistical model

Models vs. procedures

- Classical approach is a set of procedures; not derived from a deeper framework
- Model-based approach starts from a description of the data-generating process
- Model is a realistic (though not necessarily physical) description of the underlying mechanisms
- Why model? Efficient use of information, consistency, and incorporation of complex error structure


Parameters vs. observations

- Parameters regulate the underlying processes (source and observation)
  - Ideally invariant to details of the observation structure (e.g. flux, not expected counts)
  - Target of inference
  - For hardness ratios, the parameters are the source fluxes ($\lambda_H$ and $\lambda_S$) and the background fluxes ($\xi_H$ and $\xi_S$)
- Observations are noisy outputs of parameters
  - $S$, $H$, $B_S$, and $B_H$ in the ratio problem
  - Input for, not target of, inference


Distributions as connections

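- Connect parameters to observations through distributions
- Background counts depend only on the background flux ($\xi$) and exposure:

  $B_S \sim \mathrm{Poisson}(r \cdot e_S \cdot \xi_S)$

  $B_H \sim \mathrm{Poisson}(r \cdot e_H \cdot \xi_H)$

  where $e$ is the effective area for the source region
- Source counts depend on both the source ($\lambda$) and background ($\xi$) fluxes:

  $S \sim \mathrm{Poisson}(e_S \cdot (\lambda_S + \xi_S))$

  $H \sim \mathrm{Poisson}(e_H \cdot (\lambda_H + \xi_H))$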
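
One way to get a feel for this model is to simulate from it. The sketch below uses only the Python standard library; the names lam_* (source fluxes) and xi_* (background fluxes), the parameter values, and the simple Poisson sampler (Knuth's multiplication method, adequate only for modest means) are all choices made for this illustration.

```python
import math
import random

def poisson_draw(mean, rng):
    """Knuth's multiplication method; suitable for modest means only."""
    limit = math.exp(-mean)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def simulate_counts(lam_S, lam_H, xi_S, xi_H, e_S, e_H, r, rng):
    """Draw one set of observed counts (S, H, B_S, B_H) from the Poisson model."""
    return {
        "B_S": poisson_draw(r * e_S * xi_S, rng),      # background region, soft
        "B_H": poisson_draw(r * e_H * xi_H, rng),      # background region, hard
        "S": poisson_draw(e_S * (lam_S + xi_S), rng),  # source region, soft
        "H": poisson_draw(e_H * (lam_H + xi_H), rng),  # source region, hard
    }

rng = random.Random(1)
# Hypothetical fluxes and calibration values, for illustration only.
example = simulate_counts(lam_S=10.0, lam_H=5.0, xi_S=2.0, xi_H=1.0,
                          e_S=1.0, e_H=1.0, r=10.0, rng=rng)
print(example)
```

Repeating the draw many times and applying an estimator to each simulated data set is a direct way to study its bias and interval coverage at low counts.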


Augmentation
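- Sometimes useful to expand the model by expanding the observations
- Looks like adding complication, but can simplify computation and help with interpretation
- Usually ask "what observations would make this problem easy?"
- For the ratio case, it would be easy if we knew which parts of $S$ and $H$ came from source vs. background
- So, augment with the unobserved source contributions ($\eta$) and background contributions ($\beta$) to the source-region counts:

  $\eta_S \sim \mathrm{Poisson}(e_S \cdot \lambda_S)$ and $\eta_H \sim \mathrm{Poisson}(e_H \cdot \lambda_H)$,

  $\beta_S \sim \mathrm{Poisson}(e_S \cdot \xi_S)$ and $\beta_H \sim \mathrm{Poisson}(e_H \cdot \xi_H)$,

  $S = \eta_S + \beta_S$ and $H = \eta_H + \beta_H$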
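
A standard property of independent Poisson counts makes this augmentation easy to work with: given the observed total, the background part is binomial with success probability equal to the background share of the expected counts. A minimal Python sketch of that split follows; lam and xi stand for the source and background fluxes, and the function name and numbers are illustrative assumptions.

```python
import random

def split_counts(total, lam, xi, rng):
    """Split an observed total into (source part, background part).

    Given total = source + background with independent Poisson parts whose
    means are proportional to lam and xi, the background part is
    Binomial(total, xi / (lam + xi)). Assumes lam + xi > 0.
    """
    p_bkg = xi / (lam + xi)
    bkg = sum(1 for _ in range(total) if rng.random() < p_bkg)
    return total - bkg, bkg

rng = random.Random(0)
# Hypothetical values: 50 observed counts, source flux 8, background flux 2.
src_part, bkg_part = split_counts(50, lam=8.0, xi=2.0, rng=rng)
print(src_part, bkg_part)
```

The effective area cancels in the probability because both parts are observed through the same source-region sensitivity.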


Statistical Tools Using the model

Likelihood
- Likelihood is at the core of model-based inference
- Definition: Likelihood is the probability of observing your (fixed) sample, as a function of the parameters. If your sample is $Y$ and the parameters are $\theta$, the likelihood is $L(\theta) \propto \Pr(Y \mid \theta)$.
- Likelihood is not the probability of your parameters taking on a particular value
- Higher values of the likelihood indicate more support from the data for a given parameter value
- The likelihood function contains all information for inference with a given model


Likelihood, in particular

For independent observations, the likelihood is just the product of their probabilities. So, for the ratio problem, the likelihood is:

$L(\lambda_S, \lambda_H, \xi_S, \xi_H) \propto P(B_S \mid \xi_S) \cdot P(S \mid \lambda_S, \xi_S) \cdot P(B_H \mid \xi_H) \cdot P(H \mid \lambda_H, \xi_H)$

Here, all of these probabilities take the form of the Poisson PMF.
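
On the log scale the product becomes a sum of four Poisson log-PMF terms. The following is a minimal Python sketch, assuming lam_* denote source fluxes and xi_* background fluxes; the function names, default calibration values, and example counts are illustrative.

```python
import math

def poisson_logpmf(k, mean):
    """Log of the Poisson PMF at count k for a given mean."""
    if mean < 0:
        return -math.inf
    if mean == 0:
        return 0.0 if k == 0 else -math.inf
    return k * math.log(mean) - mean - math.lgamma(k + 1)

def log_likelihood(lam_S, lam_H, xi_S, xi_H, data, e_S=1.0, e_H=1.0, r=10.0):
    """Log-likelihood of the ratio model for data = (S, H, B_S, B_H)."""
    S, H, B_S, B_H = data
    return (poisson_logpmf(B_S, r * e_S * xi_S)
            + poisson_logpmf(S, e_S * (lam_S + xi_S))
            + poisson_logpmf(B_H, r * e_H * xi_H)
            + poisson_logpmf(H, e_H * (lam_H + xi_H)))

data = (12, 6, 30, 15)  # hypothetical counts (S, H, B_S, B_H)
ll_good = log_likelihood(9.0, 4.5, 3.0, 1.5, data)  # parameters matching the counts
ll_bad = log_likelihood(1.0, 1.0, 1.0, 1.0, data)   # parameters far from the counts
print(ll_good, ll_bad)
```

Parameter values that reproduce the observed counts on average give a higher log-likelihood than values that do not, which is the sense in which the likelihood measures support from the data.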


MLE

- One way to use the likelihood is to find the parameter values that maximize it; known as maximum likelihood estimation
- Resembles $\chi^2$ fitting, but error measures need not be squared
- Has some desirable properties in large samples (efficiency, known approximate errors, etc.)
- Requires numerical maximization for most realistic models
- Can be badly misleading for small samples and settings with highly asymmetric uncertainty
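
To illustrate numerical maximization, the sketch below does a brute-force grid search over the soft-band parameters of the ratio model, with lam the source flux and xi the background flux. The grid search, the grid limits, and the counts are assumptions made for the example; a real analysis would normally use a proper optimizer. In this simple two-parameter case the maximum can also be found by hand when the net counts are positive, which gives something to check the search against.

```python
import math

def poisson_logpmf(k, mean):
    """Log of the Poisson PMF at count k; assumes mean > 0."""
    return k * math.log(mean) - mean - math.lgamma(k + 1)

def soft_band_loglik(lam, xi, S, B_S, e_S=1.0, r=10.0):
    """Soft-band part of the log-likelihood: background term plus source term."""
    return (poisson_logpmf(B_S, r * e_S * xi)
            + poisson_logpmf(S, e_S * (lam + xi)))

def grid_mle(S, B_S, e_S=1.0, r=10.0, step=0.1, lam_max=20.0, xi_max=10.0):
    """Maximize the log-likelihood over a regular grid of (lam, xi) values."""
    best_ll, best_lam, best_xi = -math.inf, None, None
    n_lam = int(round(lam_max / step))
    n_xi = int(round(xi_max / step))
    for i in range(n_lam + 1):
        lam = i * step
        for j in range(1, n_xi + 1):  # xi > 0 keeps both Poisson means positive
            xi = j * step
            ll = soft_band_loglik(lam, xi, S, B_S, e_S, r)
            if ll > best_ll:
                best_ll, best_lam, best_xi = ll, lam, xi
    return best_lam, best_xi

# Hypothetical counts: S = 12 in the source region, B_S = 30 in the background region.
lam_hat, xi_hat = grid_mle(S=12, B_S=30)
# Hand-derived maximum for positive net counts: xi = B_S / (r e_S), lam = S / e_S - xi.
xi_closed = 30 / 10.0
lam_closed = 12 / 1.0 - xi_closed
print(lam_hat, xi_hat, lam_closed, xi_closed)
```

The point estimate agrees with background subtraction here, but the likelihood surface around it also shows how uncertain and asymmetric the estimate is, which a single sigma does not.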


Bayesian inference
- Uses Bayes' theorem to quantify uncertainty and perform estimation:

  $P(\theta \mid Y) = \frac{P(Y \mid \theta) \, P(\theta)}{P(Y)}$

- $P(\theta \mid Y)$ is the posterior distribution of the parameters
  - Estimates from the mean, median, etc. of this distribution
  - Intervals, standard errors, etc. from its quantiles and spread
- Can derive or simulate the posterior of any function of $\theta$ using this posterior
- Drawback: need a prior $P(\theta)$
  - Typically aim to choose a prior that has little effect on results; check through sensitivity analysis
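
As a rough sketch of simulating a posterior for the ratio problem, the code below runs a small Gibbs sampler that alternates between splitting the source-region counts into source and background parts and drawing the fluxes given that split. Several things here are assumptions made for the illustration and are not taken from the slides: the independent Gamma(shape a, rate b) priors on each flux, the default values of a and b, the counts, the chain length, and all function names. With those priors the conditional distributions are binomial and gamma, so only the Python standard library is needed.

```python
import random

def gibbs_band(counts, bkg_counts, e, r, n_iter, rng, a=1.0, b=0.01):
    """Posterior draws of the source flux for one passband.

    counts     : counts in the source region
    bkg_counts : counts in the background-only region
    e, r       : effective area and relative background area
    a, b       : assumed Gamma(shape, rate) prior on both fluxes
    """
    lam, xi = 1.0, 1.0  # arbitrary positive starting values
    lam_draws = []
    for _ in range(n_iter):
        # Split the source-region counts: background part is binomial.
        p_bkg = xi / (lam + xi)
        beta = sum(1 for _ in range(counts) if rng.random() < p_bkg)
        # Gamma conditionals; random.gammavariate takes (shape, scale).
        lam = rng.gammavariate(a + counts - beta, 1.0 / (b + e))
        xi = rng.gammavariate(a + beta + bkg_counts, 1.0 / (b + e * (1.0 + r)))
        lam_draws.append(lam)
    return lam_draws

def quantile(sorted_vals, q):
    """Simple empirical quantile of an already sorted list."""
    return sorted_vals[int(q * (len(sorted_vals) - 1))]

rng = random.Random(42)
n_iter, burn = 3000, 500
# Hypothetical counts: S = 120, B_S = 200, H = 60, B_H = 100, with r = 10.
lam_S = gibbs_band(120, 200, e=1.0, r=10.0, n_iter=n_iter, rng=rng)[burn:]
lam_H = gibbs_band(60, 100, e=1.0, r=10.0, n_iter=n_iter, rng=rng)[burn:]

# Posterior of a function of the parameters: the ratio of soft to hard flux.
R_draws = sorted(s / h for s, h in zip(lam_S, lam_H))
R_median = quantile(R_draws, 0.5)
R_lo, R_hi = quantile(R_draws, 0.16), quantile(R_draws, 0.84)
print(R_median, R_lo, R_hi)
```

The interval from the 16% and 84% quantiles need not be symmetric about the median, and rerunning with different values of a and b is a simple form of the sensitivity analysis mentioned above.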


Implementation

On to Nathan for computation!