Presentations |
Alex Blocker (Harvard U) 6 Sep 2011 |
- A taste of astrostatistics: problems, opportunities, & connections
- Abstract:
Astrostatistics is a vibrant, tight-knit field with more open
problems than statisticians to tackle them. These range from the very
applied, such as understanding the workings of space telescopes, to
fundamental questions of statistical inference; sophisticated
computation is the order of the day, particularly as new instruments
generate huge volumes of data.
I will present two projects from astrostatistics:
inferring the brightness of faint galaxies using the Chandra space
telescope, and finding unusual events within millions of astronomical
time series. These presented major inferential challenges in
radically different ways. Addressing them took a combination of
statistical modeling, scientific knowledge, and computational
finesse.
Finally, I will share some surprising connections between astrostatistics
and my work in biology. Biology appeared to lead astronomy in data
analysis for many years, but the fields are now coming full circle.
The newest forms of biological data share many features with modern
astronomical data; there is great potential for "methodological
arbitrage" for graduate students willing to dive into
astrostatistics.
- Slides [.pdf]
-
|
Astro Projects for Statistics 20 Sep 2011 |
- Projects, problems, and demos
- Doubt: How Do I Know if that is a Real Feature in My Image? (Alanna C)
- Timing analysis of grating data (Vinay K)
- Real time feature detection and classification (Pavlos P)
- Issues in modeling the X-ray data (Aneta S)
- Quasar clustering project (Brandon K)
- Simplicity: Bayesian Energy Quantiles, or Quick Non-parametric way(s) to incorporate Higher Dimensional Data (Alanna C)
- Source detection in 4D (Vinay K)
- Physics demos: Poisson, atomic lines, dispersion spectra (Alanna C)
-
|
Group 11-13 Oct 2011 |
- Projects
- Tuesday 11 Oct
- 9:30a - 10:15a: pyBLoCXS (at SciCen 706)
- 10:15a - 11:15a: proposal
- 11:30a - 12:15p: new projects
- 12:15p - 1:00p: Bayes Factors
- 2:00p - 3:00p: Full Bayes Calibration Uncertainties
- 3:00p - 3:30p: 2D Cal Uncertainties and SCA
- Wednesday 12 Oct
- 10:00a - 10:30a: SolarStat (at CfA Fishbowl)
- 10:30a - 11:15a: Sunspot Classification
- 11:15a - 11:45a: Sunspot Cycles
- 1:00p - 2:00p: Timing analysis with grating data (at CfA M-240)
- 2:00p - 2:45p: Solar DEM features
- 2:45p - 4:00p: computing
- Thursday 13 Oct
- 10:00a - 11:00a: pySALC (at CfA M-240)
- 11:00a - 2:30p: proposal
-
|
Brandon Kelly (CfA/UCSB) 25 Oct 2011 |
- Investigating Star Formation through Hierarchical Bayesian Modeling of Emission from Astronomical Dust
- Abstract: Astronomical dust plays an important role
in the formation of stars and planets. Recently launched observatories,
such as Herschel and Planck, are providing observations that place
important constraints on the properties of astronomical dust.
However, the traditional least-squares analysis used by astronomers
is highly inefficient for this problem, and leads to biases and
incorrect conclusions. In this talk I will discuss a hierarchical
Bayesian approach to deriving the physical parameters of astronomical
dust, as well as the distribution of these parameters. I will also
discuss an ancillarity-sufficiency interweaving strategy for boosting
the efficiency of the MCMC sampler. Finally, I will present results
from our model as applied to a nearby star-forming region. The
Bayesian and least-squares analyses lead to opposite scientific
conclusions: the Bayesian results are consistent with astrophysical
theories of dust formation, while the least-squares results are not.
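- A schematic of the kind of hierarchical structure described, written here
as a modified-blackbody dust model (this parameterization and notation are
assumptions for illustration, not necessarily the form used in the talk):

    S_{\nu,i} = N_i \, (\nu/\nu_0)^{\beta_i} \, B_\nu(T_i), \qquad
    \theta_i = (\log N_i, \beta_i, T_i) \sim \mathcal{N}(\mu, \Sigma), \qquad
    y_{\nu,i} \mid \theta_i \sim \mathcal{N}\!\left(S_{\nu,i}, \sigma_{\nu,i}^2\right),

  with a hyperprior on (\mu, \Sigma). Least squares fits each source i in
  isolation; the hierarchy pools information across sources and propagates
  measurement error into the inferred population distribution of (\beta, T),
  which is where the two analyses can diverge.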
- Slides: [.pdf] | [.ppt]
-
|
Raffaele D'Abrusco (CfA) 1 Nov 2011 |
- Knowledge Discovery workflows for exploration of complex multi-wavelength
astronomical datasets. Application to CSC+, a sample of AGNs built on the Chandra Source Catalog
- Abstract:
A complete understanding of astronomical sources requires a global
multi-wavelength approach; at the same time, the availability of
large surveys of the sky in different spectral regions has propelled the
aggregation of massive and complex datasets. The traditional approach to
data analysis, which involves well-informed testing of different models,
cannot do justice to the richness of these new datasets or, in some
sense, to the intrinsically peculiar type of knowledge they contain.
Knowledge Discovery (KD) techniques, while relatively new to astronomy,
have been successfully used in several other disciplines, from finance to
genomics, to uncover patterns in large datasets, whether complex or
simple but as yet unseen.
In this talk I shall describe CLaSPS, a method for the characterization of
multi-dimensional astronomical sources based on unsupervised KD
clustering algorithms, which are used to determine the spontaneous
aggregations of sources in the high-dimensional space generated by their
observables. A data-driven criterion is then applied to pick the most
interesting clusterings in terms of the astronomical properties of the sample.
I will discuss the application of this method to a sample of optically
selected AGNs with X-ray observations in the Chandra Source Catalog and
other multi-wavelength data, which is representative of the VO-powered,
inhomogeneous astronomical datasets that will become more and more common
in the future. The goals of this project are to test known correlations,
possibly determine new patterns, and establish diagnostics for an improved
classification of X-ray selected AGNs with multi-wavelength observations.
As an example of such previously unknown low-dimensional patterns, I will
also briefly discuss a recent result on blazars that is a by-product of
the application of CLaSPS to a sample of AGNs with multi-wavelength data.
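- A toy sketch of the two-stage workflow described above (cluster, then
score clusterings by a data-driven criterion). The score used here, how
unevenly an external label distributes across clusters, is an illustrative
assumption, not the actual CLaSPS criterion:

    import numpy as np
    from sklearn.cluster import KMeans

    rng = np.random.default_rng(4)

    # Stand-ins for source observables (colors, fluxes, ...) and one external
    # astronomical property (e.g. a class label) not used in the clustering.
    X = rng.normal(size=(1000, 6))
    label = (X[:, 0] + 0.5 * rng.normal(size=1000)) > 0

    def score(assign, label):
        # How far cluster-wise label fractions deviate from the global
        # fraction: large values flag clusterings that separate the property.
        base = label.mean()
        return max(abs(label[assign == c].mean() - base)
                   for c in np.unique(assign))

    for k in (2, 4, 8, 16):
        assign = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
        print(k, round(score(assign, label), 3))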
- Slides [.pdf]
-
|
Ed Turner (Princeton) 15 Nov 2011 |
- A Bayesian Analysis of the Astrobiological Implications of the Rapid
Emergence of Life on the Early Earth
- Abstract:
Life arose on Earth sometime in the first few hundred million years
after the young planet had cooled to the point that it could support
water-based organisms on its surface. The early emergence of life on
Earth has been taken as evidence that the probability of abiogenesis
is high, if starting from young-Earth-like conditions. This argument is
revisited quantitatively in a Bayesian statistical framework. Using
a simple model of the probability of abiogenesis, a Bayesian estimate of
its posterior probability is derived based on the datum that life emerged
fairly early in Earth's history and that, billions of years later,
sentient creatures noted this fact and considered its implications.
Given only this very limited empirical information, the choice of
Bayesian prior for the abiogenesis probability parameter has a very
strong influence on the computed posterior probability. In particular,
although life began on the Earth quite soon after it became habitable, that
fact is statistically consistent with an arbitrarily low intrinsic
probability of abiogenesis for plausible uninformative priors and,
therefore, with life being arbitrarily rare in the Universe. The
presentation will emphasize generic statistical properties of problems of
this general character, which occur in cosmology and many other areas
of science, as well as in the context of abiogenesis.
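- As a worked illustration of the prior sensitivity (the exponential-rate
model below is an assumption chosen for concreteness): if abiogenesis is a
Poisson process with rate lambda, the probability that life emerges within
a window Delta t after the planet becomes habitable is

    P(\text{life by } \Delta t \mid \lambda) = 1 - e^{-\lambda \Delta t},
    \qquad
    p(\lambda \mid \text{data}) \propto \left(1 - e^{-\lambda \Delta t}\right) p(\lambda).

  The likelihood saturates at 1 for large lambda and falls off only linearly
  (as lambda * Delta t) for small lambda, so a prior spread over many decades
  of log(lambda) leaves non-negligible posterior mass at arbitrarily small
  rates, while a prior uniform in lambda itself does not; this is the sense
  in which the choice of prior dominates the conclusion.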
- Slides [.pdf]
-
|
Group 29 Nov 2011 |
- 20 Questions
- Wherein stats grad students ask questions of astronomers who,
if they can't answer, get to ask a statistics question in return.
Also, demos.
-
|
Xu Jin (UC Irvine) 7 Feb 2012 |
- New Results of Fully Bayesian
- Slides [.pdf]
-
|
Tom Loredo (Cornell) 15 Feb 2012 3:15pm - 4:30pm Pratt Conference Room at CfA |
- Adaptive scheduling of exoplanet observations via Bayesian adaptive exploration
- Abstract:
I will describe ongoing work by a collaboration of astronomers and
statisticians developing a suite of Bayesian tools for analysis and adaptive
scheduling of exoplanet host star reflex motion observations. In this
presentation I will focus on the most distinctive aspect of our work: adaptive
scheduling of observations using the principles of Bayesian experimental
design in a sequential data analysis setting. The idea is to iterate an
observation-inference-design cycle so as to gain information about an
exoplanet system more quickly than is possible with random or ad hoc
scheduling. I will introduce the core ideas---decision theory and
information measures---and highlight some of the computational challenges
that arise when implementing Bayesian design with nonlinear models.
Specializing to parameter estimation cases (e.g., measuring the orbit of
a planet known to be present), there is an important simplification that
enables relatively straightforward calculation of greedy designs via maximum
entropy sampling. We implement MaxEnt sampling using population-based MCMC
to provide posterior samples used in a nested Monte Carlo integration
algorithm. I will demonstrate the approach with a toy problem, and with a
re-analysis of existing exoplanet data supplemented by simulated optimal
data points.
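- A minimal sketch of the greedy design step: when the predictive
distribution is treated as Gaussian, its entropy is monotone in its
variance, so maximum entropy sampling reduces to observing where the
posterior predictive variance is largest. The circular-orbit velocity model
and all numbers below are illustrative assumptions:

    import numpy as np

    rng = np.random.default_rng(0)

    # Pretend these are posterior samples of a circular-orbit RV model
    # v(t) = K sin(2*pi*t/P + phi), obtained from a previous inference step.
    n_samp = 2000
    P   = rng.normal(12.0, 0.3, n_samp)    # period [days]
    K   = rng.normal(55.0, 5.0, n_samp)    # semi-amplitude [m/s]
    phi = rng.normal(1.0, 0.2, n_samp)     # phase [rad]

    sigma_noise = 3.0                            # measurement noise [m/s]
    t_candidates = np.linspace(0.0, 30.0, 301)   # allowed observing times

    # Predictive variance at each candidate time = variance of the model
    # over posterior samples + noise variance; greedy MaxEnt design picks
    # the time where this is largest.
    V = np.array([K * np.sin(2 * np.pi * t / P + phi) for t in t_candidates])
    pred_var = V.var(axis=1) + sigma_noise**2

    t_next = t_candidates[np.argmax(pred_var)]
    print(f"next observation at t = {t_next:.2f} d")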
- Presentation slides [.pdf]
-
|
Group 16-17 Feb 2012 |
- Solar-Statistics mini Workshop
- Thursday, Feb 16 (@ Pratt)
- 2:00pm - 3:45pm: Stats Tutorial
- 4:15pm - 6:00pm: Solar Tutorial
- Friday, Feb 17 (@ Phillips)
- 9:00am - 10:30am: Feature Recognition
- 11:00am - 12:30pm: Thermal Structure
- 2:00pm - 3:30pm: Multi-D Joint Analysis
- 4:00pm - 5:30pm: Massive Data Streams
-
|
Alex Blocker (Harvard) 21 Feb 2012 |
- Discussion of Maximal Information Coefficient
- Abstract: The publication of Reshef et
al.'s work on the maximal information coefficient (MIC) in late
2011 created a great deal of buzz across many disciplines. Their
goal of identifying novel relationships in massive datasets, together
with their low-assumption approach, resonated with many researchers, and
the method's publication in Science amplified its impact substantially.
However, this work has been less warmly received by the statistical
community, where many consider it lacking compared to existing
approaches. I will summarize the theory and application of MIC as
presented by Reshef et al. for scientists and statisticians, then
provide a statistical review of their approach. The broader issues
and lessons raised by this episode will also be discussed.
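- A stripped-down sketch of the normalized-mutual-information score behind
MIC. Note that the real estimator searches over grid placements; this
version uses equal-count bins only, so it is an approximation for intuition,
not Reshef et al.'s algorithm:

    import numpy as np

    def grid_mi(x, y, nx, ny):
        # Mutual information of (x, y) discretized on an nx-by-ny grid
        # built from equal-count (quantile) bins.
        qx = np.quantile(x, np.linspace(0, 1, nx + 1))
        qy = np.quantile(y, np.linspace(0, 1, ny + 1))
        h, _, _ = np.histogram2d(x, y, bins=[qx, qy])
        p = h / h.sum()
        px, py = p.sum(axis=1), p.sum(axis=0)
        nz = p > 0
        return np.sum(p[nz] * np.log(p[nz] / np.outer(px, py)[nz]))

    def mic_like(x, y):
        n, best = len(x), 0.0
        B = n ** 0.6                    # grid-size budget from the MIC paper
        for nx in range(2, int(B) + 1):
            for ny in range(2, int(B // nx) + 1):
                best = max(best, grid_mi(x, y, nx, ny) / np.log(min(nx, ny)))
        return best

    x = np.random.default_rng(0).uniform(-1, 1, 500)
    print(mic_like(x, x**2))    # high score for a noiseless functional relation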
- Presentation slides: [.pdf]
- References and code for the talk from AB: [ab_20120221/]
- Supplement to the Reshef et al paper (especially for the statisticians),
linked from thoughts-on-mic-reshef-et-al-2011
-
|
Paul Baines (UC Davis) 6 Mar 2012 [via Skype] |
- LogN-LogS: Model Selection and Model Checking
- The study of astrophysical source populations is often conducted
using the cumulative distribution of the number of sources detected
at a given sensitivity. The resulting log N(>S) - log S relationship
can be used to compare and evaluate theoretical models for source
populations and their evolution. In practice, however, inferring
properties of source populations from observational data is complicated
by detector-induced uncertainties, background contamination and
missing data.
By investigating the connection between probabilistic and
theoretical assumptions in commonly used logN-logS methods, we
propose a new class of models with a more realistic physical
interpretation. Our Bayesian approach leads to efficient inference
for physical model parameters and the corrected log N(>S) - log S
distribution for source populations. Our method extends existing
work in allowing for both non-ignorable missing data and an unknown
number of unobserved sources. In this talk we will focus on model
selection issues and multivariate strategies for Bayesian model
checking.
This is joint work with Andreas Zezas, Vinay Kashyap and Irina Udaltsova.
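- For orientation, the simplest parameterization of the quantity at issue
(an illustrative assumption, not necessarily the model class proposed in
the talk) is a single power law,

    N(>S) = K \left( S / S_0 \right)^{-\alpha}
    \quad\Longleftrightarrow\quad
    \log N(>S) = \log K - \alpha \left( \log S - \log S_0 \right),

  a straight line in log-log space; model selection then asks, for example,
  whether the data support a single or a broken power law once detector
  incompleteness and background contamination are accounted for.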
- Presentation slides [.pdf]
-
|
Andreas Zezas (Crete) 20 Mar 2012 9am PDT / Noon EDT / 4pm GMT / 6pm EET [via Skype] |
- Adaptive Smoothing powwow
- Presentation slides [.pdf]
- The goal is to derive the ideal tool for quick astronomical analysis: a statistically principled, adaptively smoothing, flux-conserving, semi-parametric tool that works in 2-D on Poisson data and runs reasonably quickly; a minimal illustrative sketch follows the reading list below. Some useful papers to read up on:
-- ASMOOTH: A simple and efficient algorithm for adaptive kernel smoothing of two-dimensional imaging data, Ebeling, H., White, D.A., & Rangarajan, F.V.N., 2006, MNRAS, 368, 65 [astro-ph/0601306]
-- csmooth, CIAO ahelp page, cxc/ciao/ahelp/csmooth
-- Multiple Testing of Local Maxima for Detection of Unimodal Peaks in 1D, Schwartzman, A., Gavrilov, Y., & Adler, R.J., 2011 [.pdf]
-- Multiple Testing of Local Maxima for Detection of Peaks in ChIP-Seq Data, Schwartzman, A., Jaffe, A., Gavrilov, Y., & Meyer, C.A., 2011, HU Biostatistics Working Paper Series, 133 [.pdf]
-- A Wavelet-Based Algorithm for the Spatial Analysis of Poisson Data, Freeman, P.E., Kashyap, V., Rosner, R., & Lamb, D.Q., 2002, ApJS, 138, 185 [.pdf]
-- Low Assumptions, High Dimensions, Wasserman, L., 2011, RMM v2, 201, in Statistical Science and Philosophy of Science [.pdf]
-- Multiscale Poisson Intensity and Density Estimation, Willett, R.M., and Nowak, R.D., 2007, IEEE Trans. on Inform. Theory, 53, 9 [.pdf]
-- Multiscale Photon-limited Spectral Image Reconstruction, Krishnamurthy, K., Raginsky, M., and Willett, R., 2009, SIIMS [.pdf]
-- Poisson Noise Reduction with Non-Local PCA, Salmon, J., Deledalle, C.A., Willett, R., and Harmany, Z., 2012, ICASSP [.pdf]
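- A minimal sketch, in the spirit of csmooth/asmooth, of per-pixel adaptive
top-hat smoothing of a Poisson counts image (function names and thresholds
are illustrative; the implementation is deliberately simple and slow):

    import numpy as np

    def adaptive_smooth(img, min_counts=25, r_max=20):
        # At each pixel, grow a circular top-hat kernel until it encloses
        # at least min_counts counts, then record the mean rate inside it.
        ny, nx = img.shape
        out = np.zeros((ny, nx))
        yy, xx = np.mgrid[0:ny, 0:nx]
        for j in range(ny):
            for i in range(nx):
                for r in range(1, r_max + 1):
                    mask = (yy - j)**2 + (xx - i)**2 <= r * r
                    counts = img[mask].sum()
                    if counts >= min_counts or r == r_max:
                        out[j, i] = counts / mask.sum()
                        break
        return out

    rng = np.random.default_rng(2)
    truth = np.full((64, 64), 0.2)
    truth[28:36, 28:36] = 5.0             # bright source on a flat background
    smoothed = adaptive_smooth(rng.poisson(truth))

  Note that this simple version is not flux-conserving and its significance
  criterion is naive, which are precisely the shortcomings the powwow aims
  to address.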
-
|
Min Shandong & Xu Jin (UCI) 03 Apr 2012 |
- Bayes Factors (Shandong)
- Presentation slides [.pdf]
- Calibration (Jin)
- Presentation slides [.pdf]
-
|
Omiros Papaspiliopoulos (U Pompeu Fabra) 10 Apr 2012 |
- SMC2: an efficient algorithm for sequential analysis of
state-space models
- Nicolas Chopin, Pierre E. Jacob, Omiros Papaspiliopoulos
-
Abstract: We consider the generic problem of performing sequential Bayesian inference
in a state-space model with observation process y, state process x and
fixed parameter theta. An idealized approach would be to apply the iterated
batch importance sampling (IBIS) algorithm of Chopin (2002). This is a
sequential Monte Carlo algorithm in the theta-dimension that samples
values of theta, iteratively reweights these values using the likelihood
increments p(y_t|y_1:t-1, theta), and rejuvenates the theta-particles
through a resampling step and an MCMC update step. In state-space models
these likelihood increments are intractable in most cases, but they may be
unbiasedly estimated by a particle filter in the x-dimension, for any fixed
theta. This motivates the SMC^2 algorithm proposed in this article: a
sequential Monte Carlo algorithm, defined in the theta-dimension, which
propagates and resamples many particle filters in the x-dimension. The
filters in the x-dimension are an example of the random weight particle
filter as in Fearnhead et al. (2010). On the other hand, the particle
Markov chain Monte Carlo (PMCMC) framework developed in Andrieu et al.
(2010) allows us to design appropriate MCMC rejuvenation steps. Thus, the
theta-particles target the correct posterior distribution at each iteration
t, despite the intractability of the likelihood increments. We explore the
applicability of our algorithm in both sequential and non-sequential
applications and consider various degrees of freedom, such as
dynamically increasing the number of x-particles. We contrast our approach
to various competing methods, both conceptually and empirically through a
detailed simulation study, included here and in a supplement, and based on
particularly challenging examples.
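- A minimal sketch of the core SMC^2 recursion on a toy AR(1) state-space
model (the model, the sizes, and the omission of the degeneracy-triggered
PMCMC rejuvenation step are simplifications for illustration):

    import numpy as np

    rng = np.random.default_rng(1)

    # Toy model: x_t = theta * x_{t-1} + N(0,1),  y_t = x_t + N(0,1).
    T, N_theta, N_x = 50, 100, 64
    theta_true, x = 0.8, 0.0
    ys = []
    for _ in range(T):
        x = theta_true * x + rng.normal()
        ys.append(x + rng.normal())

    thetas = rng.uniform(-1.0, 1.0, N_theta)   # theta-particles from the prior
    W = np.full(N_theta, 1.0 / N_theta)        # theta-weights
    X = rng.normal(size=(N_theta, N_x))        # one x-particle filter per theta

    for y in ys:
        for i, th in enumerate(thetas):
            X[i] = th * X[i] + rng.normal(size=N_x)           # propagate x-particles
            w = np.exp(-0.5 * (y - X[i]) ** 2) / np.sqrt(2 * np.pi)
            W[i] *= w.mean()             # unbiased estimate of p(y_t|y_1:t-1, theta)
            X[i] = X[i][rng.choice(N_x, N_x, p=w / w.sum())]  # resample x-particles
        W /= W.sum()
        # A full implementation monitors the ESS of W and, when it degenerates,
        # resamples the theta-particles and rejuvenates them with a PMCMC move.

    print("posterior mean of theta:", np.sum(W * thetas))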
- paper available from arxiv.org/abs/1101.1528
- Presentation slides [.pdf]
-
|
Lazhi Wang (Harvard) 15 May 2012 |
- Luminosity Functions
- Abstract: The goal of source detection is often to obtain the
luminosity function, which specifies the relative number of sources
at each luminosity for a population. In this talk, I will first
explain a hierarchical Bayesian approach to infer the distribution
of intensities (luminosities) of all the sources in a population,
given the background-contaminated photon counts at the locations
of the sources. The distribution of intensities is modeled as a
zero-inflated gamma distribution. The zero-inflated component, which
is a completely new idea in astronomical problems, models the
proportion of dark sources (sources which do not emit any photons).
Then, I will present some simulation results, including the joint
posterior distributions of the parameters, the best fit of the
zero-inflated gamma and the associated uncertainty. Finally, I will
discuss different choices of priors for the hyper-parameters and
the coverage percentages of the Bayesian model under different
simulation studies and with different priors.
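- A minimal generative sketch of the model as described (the parameter
values and the handling of background are illustrative assumptions):

    import numpy as np

    rng = np.random.default_rng(3)

    n_src   = 500               # number of candidate source locations
    pi_dark = 0.3               # zero-inflation: fraction of dark sources
    shape, scale = 2.0, 5.0     # gamma parameters for the luminous sources
    bkg = 1.5                   # expected background counts per source region

    # Zero-inflated gamma intensities: dark sources have intensity exactly 0.
    dark = rng.random(n_src) < pi_dark
    intensity = np.where(dark, 0.0, rng.gamma(shape, scale, n_src))

    # Observed photon counts are background-contaminated Poisson draws.
    counts = rng.poisson(intensity + bkg)

    # The hierarchical Bayesian analysis would infer (pi_dark, shape, scale)
    # and each source's intensity from `counts` alone; here we only simulate.
    print(counts[:10], " dark fraction:", dark.mean())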
- Presentation slides [.pdf]
-
|
Tanmoy Laskar (CfA) 29 May 2012 |
- Quantifying the Non-Existent - Radio, X-ray and Optical Model Fitting with Non-Detects
- Abstract: Non-detects are as important as "detections" in hypothesis
testing and model fitting. While several statistical
tools have been developed in the bio-medical and environmental
sciences for incorporating non-detects into robust analyses,
percolation of these methods into astronomy has been slow. To bridge
this gap, I will discuss a project that involves simultaneously
modeling multi-wavelength light curves, from the radio through the
X-rays, in the context of Gamma-Ray Burst afterglows, and seek to
identify the best statistical method for quantifying and incorporating
non-detects into the analysis.
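- A minimal sketch of the standard survival-analysis treatment of
non-detects in model fitting (the Gaussian noise model and all names are
assumptions; which such treatment is best is exactly the talk's question):
detections contribute a density term to the log-likelihood, while upper
limits contribute a CDF term, the probability of falling below the limit.

    import numpy as np
    from scipy.stats import norm

    def loglike(model_flux, obs_flux, sigma, detected):
        # Censored Gaussian log-likelihood: detections use the density;
        # for non-detects, obs_flux holds the upper limit and we use the
        # CDF evaluated at that limit.
        det = norm.logpdf(obs_flux[detected], loc=model_flux[detected],
                          scale=sigma[detected])
        lim = norm.logcdf(obs_flux[~detected], loc=model_flux[~detected],
                          scale=sigma[~detected])
        return det.sum() + lim.sum()

    # Toy use: a flat light-curve model against three detections and
    # one upper limit.
    model = np.array([1.0, 1.0, 1.0, 1.0])
    obs   = np.array([1.2, 0.9, 1.1, 0.5])    # last entry is an upper limit
    sig   = np.array([0.2, 0.2, 0.2, 0.17])
    detd  = np.array([True, True, True, False])
    print(loglike(model, obs, sig, detd))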
- Presentation slides: [.pdf] ; [.odp]
-
|