The AstroStat Slog

Archive for the ‘Stat’ Category.

The Flip Test

May 1st, 2008| 02:00 pm | Posted by vlk

Why is it that detection of emission lines is more reliable than that of absorption lines?

That was one of the questions that came up during the recent AstroStat Special Session at HEAD2008. Look at the iconic Figure 1 from Protassov et al. (2002), which shows the null distribution of the Likelihood Ratio Test (LRT) statistic and how it holds up when testing for the existence of emission and absorption lines. The thin vertical lines are the nominal F-test cutoffs for a 5% false positive rate. The nominal F-test is too conservative in the former case (figures a and b; i.e., lines that actually exist will not be recognized as such), and is too anti-conservative in the latter case (figure c; i.e., non-existent lines will be flagged as real). Continue reading ‘The Flip Test’ »
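The boundary effect behind this kind of miscalibration can be illustrated with a toy problem far simpler than the spectral-line setting of Protassov et al. The sketch below is a stand-in under stated assumptions, not their analysis: it tests for a nonnegative Gaussian mean (a crude analogue of a line strength that cannot go negative) and counts how often the LRT statistic exceeds the nominal 5% cutoff of a χ² distribution with one degree of freedom when the null is true. The function name and all settings are hypothetical.

```python
import numpy as np

# Nominal 5% cutoff of a chi-square distribution with 1 degree of freedom.
CHI2_1_CUTOFF_5PCT = 3.841

def boundary_lrt_false_positive_rate(n_obs=50, n_sims=20000, seed=0):
    """Simulate the LRT for H0: mu = 0 vs H1: mu >= 0 with unit-variance
    Gaussian data and return the fraction of null simulations that exceed
    the nominal chi-square(1) cutoff."""
    rng = np.random.default_rng(seed)
    data = rng.normal(loc=0.0, scale=1.0, size=(n_sims, n_obs))
    xbar = data.mean(axis=1)
    # The constrained MLE is max(xbar, 0), so the statistic is zero
    # whenever the sample mean falls on the wrong side of the boundary.
    lrt = n_obs * np.maximum(xbar, 0.0) ** 2
    return float(np.mean(lrt > CHI2_1_CUTOFF_5PCT))

rate = boundary_lrt_false_positive_rate()
print(f"nominal rate: 0.05, simulated rate: {rate:.3f}")
```

In this toy case half of the null statistics are exactly zero, so the simulated rate comes out near half the nominal one, which is the conservative direction described for the emission-line panels.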

Tags: absorption line, emission line, F-test, LRT, Protassov, question for statisticians
Category: Spectral, Stat, Uncertainty | Comment

tests of fit for the Poisson distribution

Apr 29th, 2008| 02:24 am | Posted by hlee

In almost a year of skimming arXiv:astro-ph abstracts, I have never come across a case where the fit of the Poisson distribution was tested in different ways; instead, it is taken for granted by plugging the data and the (source) model into a (modified) χ² function. If any doubts about the Poisson distribution arise, the following paper might be useful: Continue reading ‘tests of fit for the Poisson distribution’ »
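One simple check, which is not necessarily among the tests discussed in the paper, is Fisher's index of dispersion: under a Poisson model the mean and variance are equal, so (n − 1)s²/x̄ is approximately χ² with n − 1 degrees of freedom. A minimal sketch follows, assuming simulated counts and a large-sample normal approximation to the χ² distribution; the function name is hypothetical.

```python
import numpy as np

def dispersion_z_score(counts):
    """Return an approximate z-score for Fisher's index of dispersion.

    Under a Poisson model, D = (n - 1) * s^2 / mean is approximately
    chi-square with n - 1 degrees of freedom; for large n this is close
    to a normal with mean n - 1 and variance 2 * (n - 1).
    """
    counts = np.asarray(counts, dtype=float)
    n = counts.size
    dispersion = (n - 1) * counts.var(ddof=1) / counts.mean()
    return (dispersion - (n - 1)) / np.sqrt(2.0 * (n - 1))

rng = np.random.default_rng(1)
poisson_counts = rng.poisson(lam=5.0, size=2000)
# Negative binomial counts with the same mean (5) but twice the variance.
overdispersed_counts = rng.negative_binomial(n=5, p=0.5, size=2000)

print("Poisson z-score:      ", dispersion_z_score(poisson_counts))
print("overdispersed z-score:", dispersion_z_score(overdispersed_counts))
```

A z-score far from zero suggests the counts are over- or under-dispersed relative to a Poisson model; a value near zero is consistent with it but does not prove it.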

Tags: Cramer-von Mises test, Goodness of fit, most powerful test, Poisson, Power
Category: Methods, Misc | 1 Comment

[ArXiv] 4th week, Apr. 2008

Apr 27th, 2008| 11:29 am | Posted by hlee

The last paper in the list discusses MCMC for time series analysis, applied to sunspot data. There are six additional papers about statistics and data analysis from the week. Continue reading ‘[ArXiv] 4th week, Apr. 2008’ »

Tags: clusters, CMB, GALEX, gravitational waves, lensing, LF, LMC, machine learning, maximum likelihood, priors, probability, SDSS, stellar populations, sunspot, time series
Category: arXiv, MCMC | Comment

The LRT is worthless for …

Apr 25th, 2008| 01:48 am | Posted by hlee

One of the speakers in the Google talk series illustrated model-based clustering and mentioned the likelihood ratio test (LRT) for determining the number of clusters. Since I have seen examples of improperly applied LRTs in astronomical journals, such as testing two clusters vs. three or a higher number of components, I could not resist pointing out that the LRT appeared to be misused in his illustration. He replied that the work he cited for the LRT was different from the one behind his plot, and that the test had been carried out for one component vs. two, which closely observes the regularity conditions. I was relieved not to find another example of the ill-used LRT. Continue reading ‘The LRT is worthless for …’ »

Tags: LRT, mixture
Category: arXiv, Bad AstroStat, CHASC, Frequentist, Stat | Comment

[ArXiv] Ripley’s K-function

Apr 21st, 2008| 11:56 pm | Posted by hlee

Because of the extensive work by Prof. Peebles and many (observational) cosmologists (I almost always find Prof. Peebles’s book cited in the cosmology literature), the 2 (or 3) point correlation function is far more dominant than any other mathematical or statistical method for understanding the structure of the universe. Unusually, this week brings an astro-ph paper written by a statistics professor that uses the K-function to explore the mystery of the universe.

[astro-ph:0804.3044] J.M. Loh
Estimating Third-Order Moments for an Absorber Catalog

Continue reading ‘[ArXiv] Ripley’s K-function’ »

Tags: BATSE, catalog, correlation function, cosmology, K-function, Point Process, spatial statistics
Category: arXiv, Methods, Stat | Comment

[ArXiv] 3rd week, Apr. 2008

Apr 20th, 2008| 09:05 pm | Posted by hlee

The dichotomy of outliers: detecting outliers to be discarded or to be investigated; statistics that are robust enough not to be influenced by outliers or sensitive enough to flag an anomaly in the data distribution. Although it is not directly related, one paper about outliers made me dwell on what outliers are. This week’s topics are diverse. Continue reading ‘[ArXiv] 3rd week, Apr. 2008’ »

Tags: background, bootstrap, calibration errors, Cash statistics, clusters, CMB, corona, edge detection, FFT, gravitational lens, maximum likelihood, multiscale, neural network, outlier, SDSS, sunspot, systematic errors, topology, WMAP, XMM-Newton
Category: arXiv, High-Energy, MCMC | Comment

PCA

Apr 18th, 2008| 01:38 pm | Posted by hlee

Prof. Speed writes a column for the IMS Bulletin, and the April 2008 issue has Terence’s Stuff: PCA (p. 9). Here are quotes with minor paraphrasing:

Although a quintessentially statistical notion, my impression is that PCA has always been more popular with non-statisticians. Of course we love to prove its optimality properties in our courses, and at one time the distribution theory of sample covariance matrices was heavily studied.

…but who could not feel suspicious when observing the explosive growth in the use of PCA in the biological and physical sciences and engineering, not to mention economics?…it became the analysis tool of choice of the hordes of former physicists, chemists and mathematicians who unwittingly found themselves having to be statisticians in the computer age.

My initial theory for its popularity was simply that they were in love with the prefix eigen-, and felt that anything involving it acquired the cachet of quantum mechanics, where, you will recall, everything important has that prefix.

He gave the following eigen-’s: eigengenes, eigenarrays, eigenexpression, eigenproteins, eigenprofiles, eigenpathways, eigenSNPs, eigenimages, eigenfaces, eigenpatterns, eigenresult, and even eigenGoogle.

How many miracles must one witness before becoming a convert?…Well, I’ve seen my three miracles of exploratory data analysis, examples where I found I had a problem, and could do something about it using PCA, so now I’m a believer.

Needless to say, astronomers explore data with PCA and use eigenvalues and eigenvectors to transform raw data into more interpretable forms.
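For readers who have not seen the mechanics, here is a minimal sketch of PCA via an eigendecomposition of the sample covariance matrix. The data are synthetic and the function name is hypothetical; it is an illustration, not a description of any particular astronomical pipeline.

```python
import numpy as np

def pca_eig(data):
    """PCA via eigendecomposition of the sample covariance matrix.

    Returns eigenvalues (descending), the matching eigenvectors as
    columns, and the data projected onto those eigenvectors.
    """
    centered = data - data.mean(axis=0)
    cov = np.cov(centered, rowvar=False)
    # eigh is meant for symmetric matrices and returns ascending eigenvalues.
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    scores = centered @ eigvecs
    return eigvals, eigvecs, scores

rng = np.random.default_rng(2)
# Synthetic, correlated 3-column data set (assumed for illustration only).
latent = rng.normal(size=(500, 3))
mixing = np.array([[2.0, 0.0, 0.0],
                   [1.5, 0.5, 0.0],
                   [0.3, 0.2, 0.1]])
data = latent @ mixing

eigvals, eigvecs, scores = pca_eig(data)
print("fraction of variance per component:", eigvals / eigvals.sum())
```

The projected columns are uncorrelated, and each has a sample variance equal to the corresponding eigenvalue, which is what makes the leading components a compact summary of the raw columns.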

Tags: IMS bulletin, PCA, Terry Speed
Category: Algorithms, Quotes, Stat | Comment

[ArXiv] 2nd week, Apr. 2008

Apr 11th, 2008| 02:21 am | Posted by hlee

Markov chain Monte Carlo has become the most frequent and stable statistical application in astronomy. It would be useful to collect tutorials from both professions. Continue reading ‘[ArXiv] 2nd week, Apr. 2008’ »

Tags: Classification, GRB, Hubble constant, K-S test, kurtosis, mask, maximum likelihood, SDSS, skewness, Solar Oscillation, Vicent Martinez
Category: arXiv, Bayesian, MCMC, Methods, Stat | 3 Comments

[ArXiv] use of the median

Apr 8th, 2008| 07:49 pm | Posted by hlee

The breakdown point of the mean is asymptotically zero, whereas the breakdown point of the median is 1/2. The breakdown point measures the robustness of an estimator, and its value can be at most 1/2. In the presence of outliers, the mean cannot be a good measure of the central location of the data distribution, whereas the median is likely to locate the center. Common plug-in estimators like the mean and the root mean square error may not provide the best fits and uncertainties because of this zero breakdown point of the mean. The efficiency of the mean estimator does not guarantee its unbiasedness; therefore, a bit of care is needed before plugging the data into these estimators to get the best fit and uncertainty. There was a preprint on [arXiv] last week about the use of the median. Continue reading ‘[ArXiv] use of the median’ »
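The contrast in breakdown points can be seen in a small numerical experiment. The sketch below uses synthetic data and an arbitrary contamination value, both chosen only for illustration: a growing fraction of a standard-normal sample is replaced with a gross outlier, and the mean and median are recomputed.

```python
import numpy as np

def contaminate(sample, fraction, outlier_value=1.0e6):
    """Return a copy of `sample` with the first `fraction` of its entries
    replaced by a single gross outlier value."""
    corrupted = np.array(sample, dtype=float)
    n_bad = int(fraction * corrupted.size)
    corrupted[:n_bad] = outlier_value
    return corrupted

rng = np.random.default_rng(3)
clean = rng.normal(loc=0.0, scale=1.0, size=1001)

for fraction in (0.0, 0.01, 0.25, 0.49):
    corrupted = contaminate(clean, fraction)
    print(f"contamination {fraction:4.0%}: "
          f"mean = {np.mean(corrupted):12.3f}, "
          f"median = {np.median(corrupted):7.3f}")
```

A single percent of contamination is enough to drag the mean far from the center, while the median stays within the range of the clean data until nearly half of the sample is corrupted.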

Tags: breakdown point, mean, median, quantile
Category: arXiv, Frequentist, Stat, Uncertainty | Comment

[ArXiv] 1st week, Apr. 2008

Apr 6th, 2008| 11:10 am | Posted by hlee

I’m very curious how astronomers began to use Monte Carlo Markov Chain instead of Markov chain Monte Carlo. The more popular the method becomes, the more frequently Monte Carlo Markov Chain appears. Anyway, this week I added non-astrostatistical papers to the list: a tutorial, big bang, and biblical theology. Continue reading ‘[ArXiv] 1st week, Apr. 2008’ »

Tags: Bible, big bang, FFT, IMF, microlensing, misnomer, model, NGC 602, power law, Stellar association, wavelet
Category: arXiv, Jargon, MCMC, Misc | Comment

[ArXiv] Pareto Distribution

Apr 3rd, 2008| 04:55 pm | Posted by hlee

Astronomy is ruled by the Gaussian distribution, with a duchy for the Poisson distribution. From time to time, ranks are awarded to other distributions, though without territories of their own to govern independently. Among these distributions, the Pareto deserves a high rank. There is a preprint this week on the Pareto distribution: Continue reading ‘[ArXiv] Pareto Distribution’ »

Tags: asteroid, citation, IMF, nebula, Pareto distribution, survival function, truncated
Category: arXiv, Cross-Cultural, Fitting, Stars, Stat | 4 Comments

Quote of the Date

Apr 1st, 2008| 12:46 pm | Posted by vlk

Really, there is no point in extracting a sentence here and there; go read the whole thing:

“Why I don’t like Bayesian Statistics”

- Andrew Gelman

Oh, alright, here’s one:

I can’t keep track of what all those Bayesians are doing nowadays–unfortunately, all sorts of people are being seduced by the promises of automatic inference through the “magic of MCMC”–but I wish they would all just stop already and get back to doing statistics the way it should be done, back in the old days when a p-value stood for something, when a confidence interval meant what it said, and statistical bias was something to eliminate, not something to embrace.

Continue reading ‘Quote of the Date’ »

Tags: Andrew Gelman, April, Bayesian, quote
Category: Bayesian, Quotes | Comment

Statistics is the study of uncertainty

Mar 30th, 2008| 11:16 pm | Posted by hlee

I began to study statistics with the notion that statistics is the study of information (retrieval), and that a part of information is uncertainty, which is taken for granted in our random world. Probably it is the other way around: information is a part of uncertainty. Could this be the difference between Bayesian and frequentist?

The statistician’s task is to articulate the scientist’s uncertainties in the language of probability, and then to compute with the numbers found: cited from Continue reading ‘Statistics is the study of uncertainty’ »

Tags: client, Lindley, modeling, probability
Category: arXiv, Bayesian, Frequentist, Quotes, Uncertainty | 1 Comment

[ArXiv] 3rd week, Mar. 2007

Mar 21st, 2008| 06:20 pm | Posted by hlee

Markov chain Monte Carlo (MCMC) has not missed a week on astro-ph recently. A book titled MCMC in Astronomy would be a best seller. In addition, there are very interesting non-MCMC preprints. Continue reading ‘[ArXiv] 3rd week, Mar. 2007’ »

Tags: chi-sq, Fourier Analysis, GREAT08, lensing, likelihood, misnomer, Poisson noisy image, sparse
Category: arXiv, Cross-Cultural, Jargon, MCMC | Comment

[ArXiv] 2nd week, Mar. 2008

Mar 14th, 2008| 03:44 pm | Posted by hlee

Warning! The list is long this week but diverse. Some papers are of obvious interest to CHASC. Continue reading ‘[ArXiv] 2nd week, Mar. 2008’ »

Tags: ANN, autocorrelation, Classification, cross-correlation, Estimation, Fisher information, lensing, LF, Model Selection, Pareto, signal processing, tessellation
Category: arXiv, MCMC | Comment