chi-square distribution [Eqn]
The χ² distribution plays an incredibly important role in astronomical data analysis, but it is pretty much a black box to most astronomers. How many people know, for instance, that its form is exactly the same as that of the γ distribution? The probability density of a χ² distribution with ν degrees of freedom is
p(z|ν) = (1/Γ(ν/2)) (1/2)^(ν/2) z^(ν/2−1) e^(−z/2) ≡ γ(z; ν/2, 1/2) , where z = χ².
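As a quick sanity check of this identity, here is a minimal sketch using scipy.stats; the choice of ν and of the evaluation grid is arbitrary.

```python
# Check that the chi-square pdf with nu degrees of freedom equals the
# gamma pdf with shape nu/2 and rate 1/2 (i.e., scale parameter 2).
import numpy as np
from scipy import stats

nu = 3                                                 # arbitrary degrees of freedom
z = np.linspace(0.01, 20.0, 500)

chisq_pdf = stats.chi2.pdf(z, df=nu)
gamma_pdf = stats.gamma.pdf(z, a=nu / 2, scale=2.0)    # rate 1/2 -> scale 2

print(np.allclose(chisq_pdf, gamma_pdf))               # True
```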
Its more familiar usage is in the cumulative form, which is just the (regularized) incomplete gamma function. This is where you count off how much area is enclosed in [0,χ²) to tell at what point the 68%, 95%, etc., thresholds are met. For example, for ν=1,
∫₀^Z dχ² p(χ²|ν=1) = 0.68 when Z = 1.
This is the origin of the Δχ²=1 method to determine error bars on best-fit parameters.
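To put numbers on the cumulative form, here is a small sketch (again with scipy) that evaluates the ν=1 case and inverts it to recover the familiar Δχ² thresholds; the confidence levels listed are just the usual 1σ, 90%, 2σ, and 3σ choices.

```python
# Cumulative chi-square probability for nu = 1; chi2.cdf is the
# regularized incomplete gamma function P(nu/2, z/2).
from scipy import stats
from scipy.special import gammainc

nu = 1
print(stats.chi2.cdf(1.0, df=nu))        # ~0.683: area enclosed in [0, 1)
print(gammainc(nu / 2, 1.0 / 2))         # same number via the incomplete gamma

# Inverting the CDF gives the Delta-chi^2 thresholds for one parameter
for p in (0.683, 0.90, 0.954, 0.997):
    print(p, stats.chi2.ppf(p, df=nu))   # ~1.0, 2.7, 4.0, 8.8
```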
aneta:
I guess in the typical analysis we call chi2 a “random variable” that follows the chi2 distribution:
chi2 = Sum (D(i)-M(i))/var(i)
where D(i) is the observed data, M(i) is the model-predicted data, and we "silently" assume that D(i) is normally distributed and that each measurement i is independent. We minimize this random variable when searching for the best model parameters that fit the data, but we rarely think about probabilities. However, the assumptions are not valid for many X-ray observations, as the number of observed counts follows the Poisson distribution. Different weightings (choices of var) in this expression are used to overcome the problem when the collected data have a low number of counts. The properties of the chi2 distribution are well understood, and this is why we are still using it in our analysis even in the low-counts case.
07-20-2008, 6:16 pm
aneta:
of course the chi2 equation needs the power of 2, so
chi2 = Sum (D(i)-M(i))^2 / var(i)
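To make the statistic concrete, here is a minimal numerical illustration of the corrected expression; the constant model, the assumed-known variances, and the Gaussian draws are all invented for the sketch.

```python
# Sketch of the fit statistic Sum (D(i)-M(i))^2 / var(i) for made-up data.
import numpy as np

rng = np.random.default_rng(0)
model = np.full(50, 20.0)                  # hypothetical constant model M(i)
var = model.copy()                         # variance estimate, here var = M
data = rng.normal(model, np.sqrt(var))     # D(i), assumed Gaussian for the sketch

chisq = np.sum((data - model) ** 2 / var)
print(chisq)    # fluctuates around 50, i.e., about one per data point
```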
07-20-2008, 6:52 pm
vlk:
Thanks, Aneta, very useful comment. I would only add that we can of course minimize that statistic, Sum{(D-M)^2/var}, to get best fits (modulo biases) regardless of whether the D are normally distributed, but to then follow on and relate the change in the statistic to error bars on the fitted parameters does require that the statistic be chisq distributed, with all the attendant baggage.
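As a sketch of that distinction, the fit below minimizes the statistic for a one-parameter constant model and then reads off the interval where the statistic rises by 1 from its minimum; interpreting that interval as a 1σ error bar is the step that assumes the statistic really is χ²-distributed. The model, data, and variances are again invented for illustration.

```python
# Minimize Sum (D-M)^2/var for a one-parameter constant model, then find
# the parameter values where the statistic exceeds its minimum by 1.
import numpy as np
from scipy.optimize import brentq, minimize_scalar

rng = np.random.default_rng(1)
truth = 20.0
var = np.full(50, truth)                     # assumed-known variances
data = rng.normal(truth, np.sqrt(var))       # hypothetical Gaussian data

def chisq(mu):
    return np.sum((data - mu) ** 2 / var)

fit = minimize_scalar(chisq, bounds=(10.0, 30.0), method="bounded")
mu_best, stat_min = fit.x, fit.fun

# Delta-chi^2 = 1 crossings on either side of the best fit
lo = brentq(lambda mu: chisq(mu) - stat_min - 1.0, mu_best - 5.0, mu_best)
hi = brentq(lambda mu: chisq(mu) - stat_min - 1.0, mu_best, mu_best + 5.0)
print(mu_best, mu_best - lo, hi - mu_best)   # best fit and the Delta-chi^2=1 bounds
```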
07-20-2008, 11:28 pm
Alex:
It should also be noted that chi-square error bars and other second-order statistics (i.e., anything relating to squared quantities) are considerably less robust to non-normality than first-order statistics (i.e., error bars on means). For example, while normal approximations for the distribution of the mean are usually quite good for n>30 (and certainly for n>100), chi-square and F statistics are often not distributed anywhere close to their nominal distributions for such sample sizes if the data are non-normal.
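A quick Monte Carlo along these lines (with exponential draws standing in for "non-normal data", n = 30, and a nominal 5% tail) illustrates the contrast; the specific numbers are of course particular to this toy setup.

```python
# Toy check: for skewed (exponential) data with n = 30, the standardized mean
# stays reasonably close to its nominal normal behaviour, while (n-1)s^2/sigma^2
# strays far from its nominal chi-square(n-1) distribution.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n, trials = 30, 20000
x = rng.exponential(scale=1.0, size=(trials, n))    # true mean 1, true variance 1

zbar = (x.mean(axis=1) - 1.0) * np.sqrt(n)          # standardized sample mean
vstat = (n - 1) * x.var(axis=1, ddof=1)             # nominally chi-square(n-1)

# Fraction of trials beyond the nominal 5% cutoffs
print((np.abs(zbar) > stats.norm.ppf(0.975)).mean())      # close to 0.05
print((vstat > stats.chi2.ppf(0.95, df=n - 1)).mean())    # well above 0.05
```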
07-26-2008, 9:52 pm