[ArXiv] Matching Sources, July 11, 2007
From arxiv/astro-ph: 0707.1611 Probabilistic Cross-Identification of Astronomical Sources by Budavari and Szalay
As multi-wavelength studies become more popular, various source-matching methodologies have been discussed. One such method, built on Bayesian ideas, was introduced by Budavari and Szalay, motivated by a demand for symmetric algorithms in a unified framework.
First, the paper explains astrometric precision, which varies with instrumental effects and with the nature of objects across bands, and the Bayes factor for testing the hypothesis that multiple observations from various catalogs come from the same source. Then it presents the formula for the Bayes factor, derived from the spherical normal distribution. However, matching is not simply a matter of evaluating this analytic Bayes factor, particularly when the goal is to find new objects with unknown spectral energy distributions, which is where physics squeezes in. Before summarizing, the paper covers practical issues such as fast and efficient computation on multiple catalogs (sequentially adding a catalog to the current n catalogs) and recursion formulas for evaluating the weight of evidence.
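To make the Bayes factor concrete, here is a minimal sketch of the two-catalog case as I recall it from the paper, in the limit of small astrometric errors: B = 2 / (σ1² + σ2²) · exp(−ψ² / (2(σ1² + σ2²))), with the separation ψ and the errors σ1, σ2 in radians. The function name, the arcsecond inputs, and the example error values are my own assumptions for illustration; please check the formula against the paper before relying on it.

```python
import math

# Conversion factor from arcseconds to radians.
ARCSEC_TO_RAD = math.pi / (180.0 * 3600.0)

def bayes_factor_two_catalogs(psi_arcsec, sigma1_arcsec, sigma2_arcsec):
    """Two-catalog Bayes factor for 'same source' vs 'separate sources'.

    Assumes circular Gaussian astrometric errors that are tiny compared
    with the whole sky (hypothetical helper, formula recalled from the paper).
    """
    psi = psi_arcsec * ARCSEC_TO_RAD
    var_sum = (sigma1_arcsec * ARCSEC_TO_RAD) ** 2 + (sigma2_arcsec * ARCSEC_TO_RAD) ** 2
    return (2.0 / var_sum) * math.exp(-psi ** 2 / (2.0 * var_sum))

if __name__ == "__main__":
    # Illustrative error values only: 0.1" and 0.5" per catalog.
    for psi in (0.0, 0.5, 2.0, 5.0):
        print(psi, bayes_factor_two_catalogs(psi, 0.1, 0.5))
```

With these illustrative errors, the factor is enormous at zero separation and falls below one by a few arcseconds, which is the behavior one would hope for from a matching statistic.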
I want to quote a few sentences of my liking from the paper:
- Often Bayesian analysis is referred to as the calculus of belief; however, it should rather be thought of as the calculus of observational evidence.
- The Bayesian analysis is inherently recursive. As soon as we obtain new measurements, and compute the posterior probability, that becomes the prior for subsequent studies.
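The recursion in the second quote can be sketched with the odds form of Bayes' rule: posterior odds = Bayes factor × prior odds. The snippet below is only an illustration under the assumption that successive data sets are conditionally independent given each hypothesis; the function name and the numbers are hypothetical, not taken from the paper.

```python
import math

def update_probability(prior_prob, bayes_factor):
    """Return the posterior probability of H given a prior and a Bayes factor.

    Uses posterior odds = bayes_factor * prior odds (hypothetical helper).
    """
    numerator = bayes_factor * prior_prob
    return numerator / (numerator + (1.0 - prior_prob))

if __name__ == "__main__":
    prior = 1e-6           # illustrative prior probability of a true match
    b_first, b_second = 5e4, 2e3   # illustrative Bayes factors from two data sets

    # Yesterday's posterior becomes today's prior.
    step1 = update_probability(prior, b_first)
    step2 = update_probability(step1, b_second)

    # Under the independence assumption this equals one update with the product.
    one_shot = update_probability(prior, b_first * b_second)
    print(step1, step2, one_shot)
    print(math.isclose(step2, one_shot, rel_tol=1e-9))
```

The same conversion shows why a Bayes factor alone does not settle a match: a tiny prior can leave the posterior small even when the factor is large.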
Also, one sentence about which I have some questions:
… it penalizes complicated hypotheses (with smaller prior probability [hlee: is this a generally true statement? A complicated model occupies a larger parameter space because of its dimensions, although they mention that the hypothesis of separate sources occupies a more restricted parameter space. Why does a complicated model occupy this restricted parameter space?]) over simpler ones.
In model selection, BIC penalizes a complicated model more heavily, in proportion to the number of parameters (and to the log of the sample size), but this comes from the Laplace approximation, not from the idea that a complicated model has a small prior probability.
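For reference, the standard form of BIC and the Laplace-approximation statement behind it can be written as below. The symbols are my own notation, not the paper's: k is the number of free parameters, n the sample size, \hat{L} the maximized likelihood, and p(D | M) the marginal likelihood of model M. The approximation holds up to terms that do not grow with n.

```latex
\mathrm{BIC} = -2\ln\hat{L} + k\ln n,
\qquad
\ln p(D \mid M) \approx \ln\hat{L} - \frac{k}{2}\ln n = -\tfrac{1}{2}\,\mathrm{BIC}
```

The penalty term k ln n thus arises from integrating the likelihood over the parameters, without assigning a smaller prior probability to the more complicated model itself.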
In addition, I want to raise a question: under the spherical normal distribution, what is the difference between using the Bayes factor and classical hypothesis testing to test whether multiple observations at different wavelengths come from the same source? To my knowledge, numerous studies on multiple hypothesis testing for biological data (our counterpart could be testing millions of sources across catalogs) have become available in recent years, and such frequentist approaches seem applicable to the cross-matching problem.
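As one possible classical counterpart, here is a rough sketch of a significance test for two catalogs. It assumes a flat-sky approximation and independent circular Gaussian errors, so that under the null hypothesis of a common source the quantity ψ² / (σ1² + σ2²) follows a chi-square distribution with 2 degrees of freedom, whose tail probability is exp(−x/2). The function name and the numbers are hypothetical; this is not a procedure from the paper.

```python
import math

def separation_p_value(psi, sigma1, sigma2):
    """p-value for H0: both detections come from one source.

    psi, sigma1, sigma2 must share the same angular unit. Assumes a flat
    sky and circular Gaussian errors (chi-square with 2 dof under H0).
    """
    var_sum = sigma1 ** 2 + sigma2 ** 2
    return math.exp(-psi ** 2 / (2.0 * var_sum))

if __name__ == "__main__":
    # Illustrative values in arcseconds.
    for psi in (0.0, 0.5, 1.0, 2.0):
        print(psi, separation_p_value(psi, 0.1, 0.5))
```

Unlike the Bayes factor, this tail probability says nothing about how well the alternative (separate sources) explains the data, and with millions of candidate pairs the p-values would still need a multiple-testing correction.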
Finally, I wish to add some references on Cross-Matching/Coincidence Assessment with the VO, given by Tom Loredo at one of the SAMSI Surveys and Population Studies working group meetings. (Please note that his talk on the subject and the other journal papers linked at the SAMSI AstroStat websites are password protected. SAMSI stands for Statistical and Applied Mathematical Sciences Institute.)
hlee:
I was told about the draft update a while ago, but I’ve been lazy about reviewing both my posting and the update. Although I cannot comment on multiple testing for source matching or on what actually changed from the first draft to the latest one, I can say that my doubt about penalizing complicated hypotheses over simpler ones is cleared. Yet another question remains, on the curse of dimensionality, independent of the paper. Not knowing cosmology (mass distribution) or the computation for the two hypotheses (H: all matching vs. K: at least one is not matching), I wonder how the Bayes factor will be used in practice (I saw a statement as follows: BF(H,K) larger than 10 strongly favors H; 2-10 mildly favors H, etc.). Are there any power-related studies on source matching statistics? Although I don’t comprehend the history of astrometric source matching, I see that the work by Budavari and Szalay will lay a cornerstone for developing feasible and practical source matching statistics.
02-18-2008, 11:51 pm