Survival Analysis: A Primer
Astronomers confront various kinds of censored and truncated data. Often the resulting effects are named after the famous scientists who characterized them, as with the Eddington bias. When censored or truncated data become the subject of study in statistics, statisticians do not name them but try to model them so that the uncertainty can be quantified. This area is called survival analysis. If your library subscribes to The American Statistician and you are an astronomer who handles censored or truncated data sets, this primer would be useful for quickly getting a grasp of the statistical jargon of survival analysis and for characterizing the uncertainties residing in your data.
Survival Analysis: A Primer by David A. Freedman
The American Statistician, May 2008, Vol. 62, No.2, pp. 110-119
This article explains the basics of survival analysis and criticizes previously conducted studies. Since the examples are from medical studies, astronomers may not be interested in reading the whole article. Nonetheless, Freedman defines the key terms of survival analysis, such as the survival function, the hazard rate, the Kaplan-Meier estimator, and the proportional hazards model, with clarity and conciseness. For example, if τ (a positive random variable indicating the waiting time to failure) is Weibull, the hazard rate takes exactly the form of the power law celebrated in astronomy. (I think modifying pdfs to reflect censoring and truncation may lead to more robust results than fitting power laws, unless the parameters of the power laws have astrophysical implications and survival analysis approaches cannot provide the same parametrization.)
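To make the Weibull remark concrete, here is a short derivation. The parametrization (shape k, scale λ) is my own assumption and may differ from the one used in the article; the hazard rate is defined as the density divided by the survival function.

```latex
% Assumed notation (not taken from the article): shape k > 0, scale \lambda > 0.
\begin{align*}
S(t) &= P(\tau > t) = \exp\!\left[-(t/\lambda)^{k}\right], \\
f(t) &= -\frac{dS}{dt}
      = \frac{k}{\lambda}\left(\frac{t}{\lambda}\right)^{k-1}
        \exp\!\left[-(t/\lambda)^{k}\right], \\
h(t) &= \frac{f(t)}{S(t)}
      = \frac{k}{\lambda}\left(\frac{t}{\lambda}\right)^{k-1}
      \;\propto\; t^{\,k-1}.
\end{align*}
```

Under this parametrization the hazard is a pure power law in t with index k - 1; the special case k = 1 gives a constant hazard, i.e. the exponential distribution.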
The commonality between power laws and Pareto distributions, together with the frequent appearance of power laws in astronomical journals, would lead one to expect frequent applications of survival analysis to astronomical data; on the contrary, there are not many.
Though there are more, here are a few references relevant to survival analysis that used examples from astronomy or appeared in astronomical journals:
- Nonparametric Methods for Doubly Truncated Data by B Efron and V Petrosian (subscription required)
  Journal of the American Statistical Association, Vol. 94, pp. 824-834 (1999)
- Survival Analysis of the Gamma-Ray Burst Data by B Efron and V Petrosian (subscription required)
  Journal of the American Statistical Association, Vol. 89, pp. 452-464 (1994)
- A simple test of independence for truncated data with applications to redshift surveys by B Efron and V Petrosian
  ApJ, Vol. 399, pp. 345-352 (1992)
- Statistical methods for astronomical data with upper limits. I – Univariate distributions by Feigelson and Nelson
  ApJ, Vol. 293, pp. 192-206 (1985)
- Nonparametric Estimation of the Slope of a Truncated Regression by Bhattacharya, Chernoff, and Yang (subscription required)
  The Annals of Statistics, Vol. 11(2), pp. 505-514 (1983)
Note that these papers addressed only particular statistical interests, with a general introduction to survival analysis and definitions of estimators, and were based on relatively small data sets. Facing massive survey data with truncation and heterogeneous measurement errors, astronomy could open a new era of survival analysis.
Lastly, there are studies regarding the Pareto distribution, some of which are presented in the slog. (Use “search” with Pareto. More statistical papers on survival analysis in astronomy are welcome additions; please let me know.)
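For readers who have not met such estimators, below is a minimal sketch of the Kaplan-Meier (product-limit) estimator of the survival function for right-censored data. It is only an illustration under stated assumptions, not the implementation used in any of the papers above: the function name `kaplan_meier` and the toy data are hypothetical, and ties are handled with the usual convention that objects censored at time t are still counted as at risk at t.

```python
from collections import Counter


def kaplan_meier(times, observed):
    """Kaplan-Meier estimate of S(t) for right-censored data.

    times    : measured values (event time or censoring time)
    observed : True if the event was seen, False if right-censored
    Returns a list of (t, S(t)) pairs, one per distinct event time.
    """
    events = Counter(t for t, seen in zip(times, observed) if seen)
    leaving = Counter(times)      # events plus censorings at each time
    at_risk = len(times)          # number still under observation
    survival = 1.0
    curve = []
    for t in sorted(leaving):
        d = events.get(t, 0)
        if d > 0:
            # Multiply by the conditional probability of surviving past t.
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        # Everyone with this time (event or censored) leaves the risk set.
        at_risk -= leaving[t]
    return curve


# Hypothetical toy sample: six objects, two of them right-censored.
toy_times = [1, 2, 2, 3, 4, 5]
toy_observed = [True, True, False, True, False, True]
for t, s in kaplan_meier(toy_times, toy_observed):
    print(f"t = {t}: S(t) = {s:.4f}")
```

Astronomical upper limits are left-censored rather than right-censored; one common workaround, as far as I understand it, is to flip the sign of the data (or subtract them from a constant) so that a right-censoring estimator like this sketch can be applied.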
brianISU:
My Reliability professor considered this an extremely strong primer of life data analysis. I have been wanting to read it since my thesis work is on lifetime data.
07-08-2008, 8:43 pm
hlee:
Thanks for your comment. I’m glad to know the primer is highly regarded. It was an accidental discovery from skimming The American Statistician. There were quite a few tutorial-like articles.
Apart from the thank-you note, I must mention the Schechter function, which is used for describing incompleteness and basically takes the form of a Gamma distribution. It’s not always true that astronomers only name errors (both uncertainty and bias) instead of modeling them.
07-08-2008, 11:36 pm
vlk:
A clarification — the Eddington bias does not deal with censored data, if by that you mean a dataset that contains measurements and censoring. It does, however, describe how the distribution of a measured quantity is altered in the presence of detection thresholds, which in turn is the cause of censoring in a dataset.
07-09-2008, 10:46 am
planet facts:
The student I study with now deals with survival analysis and he is writing a book… I don’t have many details, but I’m going to send him this post… very nice, btw.
11-10-2008, 3:45 am