The AstroStat Slog

Spurious Sources

Sep 19th, 2007| 02:21 pm | Posted by vlk

[arXiv:0709.2358] Cleaning the USNO-B Catalog through automatic detection of optical artifacts, by Barron et al.

Statistically speaking, “false sources” are generally in the domain of ~~Type II~~ Type I errors, defined by the probability of detecting a signal where there is none. But what if there is a clear signal, but it is not real?

In astronomical analysis, sources are generally defined with reference to the existing background, as point-fluctuations that exceed some significance threshold defined by the estimated background “in the vicinity”. The threshold is usually set such that we can tolerate “a few” false positives at borderline significance. But that ignores the effect of systematic deviations that can be caused by various instrumental features. Such things are common in X-ray images — window support structures, chip gaps, bad CCD columns, cosmic-ray hits, etc. Optical data are generally cleaner, but by no means immune to the problem. Barron et al. here describe how they have gone through the USNO-B catalog and have modeled and eliminated artifacts coming from diffraction spikes and telescope reflection halos of bright stars.

The bad news? More than 2.3% of the sources are flagged as spurious. Compare that to the false-positive rate implied by the typical statistical significance at which detection thresholds are set (usually >3σ).
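
To put that comparison in numbers, here is a minimal sketch. It assumes the detection threshold can be read as a one-sided tail of a Gaussian background distribution, which is a simplification and not something taken from Barron et al. The function name `gaussian_tail` is just an illustrative choice. Note also that the 2.3% is a fraction of catalog entries, while the tail probability is a per-test false-alarm rate, so the two are only loosely comparable.

```python
import math

def gaussian_tail(k_sigma: float) -> float:
    """One-sided probability that a standard normal variate exceeds k_sigma."""
    return 0.5 * math.erfc(k_sigma / math.sqrt(2.0))

# Fraction of USNO-B sources flagged as spurious, as quoted in the post.
flagged_fraction = 0.023

for k in (3.0, 5.0):
    p = gaussian_tail(k)
    # Ratio of the flagged fraction to the nominal false-alarm probability
    # (assumption: Gaussian background, one-sided threshold).
    print(f"{k:.0f} sigma: tail probability = {p:.3e}, "
          f"flagged fraction / tail = {flagged_fraction / p:.1f}")
```

Under this assumption a 3σ threshold corresponds to a tail probability of roughly 0.13%, so the artifact fraction is more than an order of magnitude larger than the nominal statistical false-alarm rate.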

Tags: arXiv, catalog, diffraction spikes, false sources, instrumental features, Stars, USNO
Category: arXiv, Astro, Data Processing, Imaging, Optical, Stars, Uncertainty

2 Comments

hlee:

Type II error is claiming no signal when there is signal (failing to reject the null hypothesis when the alternative is true). Type I error is rejecting the null hypothesis when the null is true, i.e. detecting signal under no signal. The null hypothesis is a subset of combined hypotheses (union of null and alternative). I think no signal should be the null hypothesis and the existence of signal is the alternative. The other way around, the existence of signal is null and no signal is alternative, is an improper statement for hypothesis testing.

[Response: Thanks for the catch. That was a pyto. -vlk]

Setting 3σ or 5σ thresholds becomes important when you study the power of the test, defined as one minus the probability of a Type II error, once you reject the null hypothesis, i.e., once you say the signal is significant. Among many other factors, power depends on the sample size. With the same rejection region for the null hypothesis, whether based on a 3σ or a 5σ threshold, the power with a larger sample is greater than the power with a smaller sample; in other words, the Type II error is smaller with a larger sample. Setting a higher σ threshold helps to reduce the Type I error (false positives), but at a fixed sample size it increases the Type II error (false negatives, or the chance of saying there is no signal when there is one). Unfortunately, other factors also determine the power of the test, so a larger σ threshold is not an optimal choice for a reliable source-detection rule, not to mention the cost of collecting a large sample and the effect of systematic errors.
09-20-2007, 12:42 am
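
The dependence of power on sample size and threshold discussed in the comment above can be sketched with a toy calculation. This assumes a one-sided z-test with known unit variance. The function names and the standardized per-observation signal strength `effect_size` are hypothetical, chosen only for illustration, and are not taken from the post or the paper.

```python
import math

def normal_cdf(x: float) -> float:
    """Cumulative distribution function of the standard normal."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

def power(threshold_sigma: float, effect_size: float, n: int) -> float:
    """Power of a one-sided z-test: P(reject null | signal present).

    Under the alternative, the test statistic is normal with mean
    effect_size * sqrt(n) and unit variance (toy assumption).
    """
    return 1.0 - normal_cdf(threshold_sigma - effect_size * math.sqrt(n))

effect_size = 0.5  # hypothetical signal strength per observation

for n in (25, 100):
    for k in (3.0, 5.0):
        print(f"n = {n:3d}, threshold = {k:.0f} sigma: "
              f"power = {power(k, effect_size, n):.4f}")
```

In this toy model, power rises with the sample size at a fixed threshold and falls when the threshold is raised at a fixed sample size.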
hlee:

I have seen many students, and some clients from a consulting class, who were confused about how to set the null and alternative hypotheses and how to define Type I and II errors accordingly. Hypothesis testing looks very arbitrary, and most often appears to be a method for rejecting the null hypothesis by collecting data. I was not sure whether it was a "pyto" of "typo" or a confusion between null and alternative (or Type I and II), which led me to write about it.
09-21-2007, 1:03 pm
