What is so special about chi square in astronomy?
Since I started reading arXiv/astro-ph abstracts and a few relevant papers about a month ago, I have very often seen chi-square used as an optimization or statistical inference tool. Chi-square function, chi-square statistic, and chi-square goodness-of-fit test are terms that serve different data analysis purposes under the same prefix. As a newbie to statistics, I learned the chi-square distribution and the chi-square test, but statistics based on chi-square are considered somewhat obsolete in terms of robust application to modern data. They are introduced as one of many distributions and statistical tests. Nothing special. In astronomy, however, chi-square has become almost the only method for statistical data analysis. I wonder how such a strong bond between chi-square tactics and astronomers’ keen minds for data analysis came about.
Beyond this historical question, one more thing bothers me: the mixing of the chi-square function with the chi-square distribution. The former is not necessarily chi-square distributed, yet in practice, once a chi-square function is written down, the parameters in the function are automatically given confidence intervals according to the chi-square distribution with the corresponding degrees of freedom. There is no procedure for checking the regularity conditions.
Statistically and astronomically, answers to my question will help correct my knowledge and erase my prejudice. Vinay wrote about chi-square fitting, which certainly gives a better account of my question. Numerical Recipes also shows how chi-square methods are used. I welcome all kinds of lessons, advice, and references that would extend my knowledge and give me a better perspective on what chi-square means to astronomers.
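To make the distinction concrete, here is a rough sketch of the two objects being mixed. The notation is assumed here for illustration and not taken from any particular paper: data $y_i$ with errors $\sigma_i$, model $m_i(\theta)$, $N$ data points, and $p$ fitted parameters.

```latex
% The chi-square FUNCTION: a definition, valid for any data and model
\chi^2(\theta) = \sum_{i=1}^{N} \frac{\bigl(y_i - m_i(\theta)\bigr)^2}{\sigma_i^2},
\qquad
\hat{\theta} = \arg\min_{\theta} \chi^2(\theta)

% The chi-square DISTRIBUTION: a claim that needs conditions
\chi^2(\hat{\theta}) \sim \chi^2_{N-p},
\qquad
\Delta\chi^2(\theta) = \chi^2(\theta) - \chi^2(\hat{\theta}) \le 1
\quad \text{(68.3\% interval for one parameter)}
```

The first line is only a definition. The second line holds exactly when the errors are independent Gaussian with known $\sigma_i$ and the model is linear in $\theta$, and only approximately otherwise, which is why the conditions deserve a check.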
vlk:
In response, perhaps I should ask, what is wrong with good old chi-square? As you point out, it is just a statistic, and it has a meaning that is easily comprehensible even to statistically illiterate astronomers (to wit, the deviations of the model from the data, relative to the inherent error, squared and added). It is a very nice quantity to carry around in and of itself, without reference to its standard form distribution. That the distribution is also valid in a majority of cases is a plus. With the advent of high-speed computers, even when the chi-square function is not distributed as a chi-square distribution, it is fairly easy to calibrate it on a case-by-case basis. The ease of comprehension, the ease of computation, and the wide applicability are significant advantages and outweigh the few cases where it is inapplicable. Admittedly, with the advent of telescopes like Chandra, those “few cases” are increasing. I agree that astronomers should pay attention to why they are using this particular function, but I should also say that what they are doing is usually not such a bad thing.
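As a minimal sketch of what "calibrating on a case-by-case basis" could look like, the Python below simulates Poisson counts from a fixed model and builds the empirical null distribution of the statistic. Everything in it is an assumption for illustration: the function names, the flat low-count model, the choice of weighting by the observed counts (floored at 1), and the simplification that the model is held fixed rather than refit to each simulated data set.

```python
import numpy as np

def simulate_null(model, n_sims=5000, seed=0):
    """Simulate the chi-square statistic under the model (hypothetical helper).

    Assumes Poisson counts and sigma_i^2 = max(counts_i, 1), i.e. errors
    estimated from the data. The model is NOT refit to each simulation.
    """
    rng = np.random.default_rng(seed)
    model = np.asarray(model, dtype=float)
    sims = rng.poisson(model, size=(n_sims, model.size))
    sigma2 = np.maximum(sims, 1)  # avoid dividing by zero in empty bins
    return np.sum((sims - model) ** 2 / sigma2, axis=1)

def calibrated_p_value(observed_stat, null_stats):
    """Fraction of simulated statistics at least as large as the observed one."""
    null_stats = np.asarray(null_stats)
    return (1 + np.sum(null_stats >= observed_stat)) / (1 + null_stats.size)

if __name__ == "__main__":
    model = np.full(20, 2.0)  # 20 bins, 2 expected counts each (low-count regime)
    null = simulate_null(model)
    # Compare the simulated mean with the nominal value (20) for this case
    print("simulated mean of statistic:", null.mean())
    print("calibrated p-value for statistic = 30:", calibrated_p_value(30.0, null))
```

The calibrated p-value uses the simulated distribution in place of the tabulated chi-square distribution, so it does not rely on the Gaussian approximation holding in each bin. A fuller treatment would refit the model to every simulated data set before computing the statistic.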
07-14-2007, 9:49 am
hlee:
I never implied chi-square is wrong. Among so many statistics, I wonder how chi-square became “almost” the only one to catch astronomers’ eyes. I wanted to get rid of my prejudice by learning what makes chi-square special to astronomers.
07-16-2007, 11:50 am
hlee:
All my doubts originated from my interest in the multi-modality of globular clusters. Astronomers used luminosity functions (LF), generally represented by histograms because of binning, and visually identified multi-modality to explain multiple epochs in the star formation history of a galaxy. Later, Ashman, Bird and Zepf (1994) introduced the likelihood approach to resolve or prove the bimodality problem statistically. However, as Protassov et al. (2002) pointed out, there are regularity conditions such as finite expectation and identifiability. I have seen papers on globular clusters that use likelihood ratio tests (LRT) to test, say, 2 generations of clusters versus 3 generations of clusters with Gaussian mixture models, where these regularity conditions are violated and the LRT cannot be applied.
No statistic is robust until someone has proved its robustness, case by case. What annoys me about chi-square is that, all of a sudden, researchers say, “due to chi-square…” What if the conditions behind those chi-square methods do not fit the nature of the data set, just as some astronomers used the LRT where it cannot be applied to their data sets, because it had already been applied to other data sets of the same object type that did satisfy the mathematical conditions? Because of its fame and convenience, it seems, to me at least, that (most?) people tend to use chi-square blindly.
07-16-2007, 2:06 pm