BAYESIAN ESTIMATION OF HARDNESS RATIOS: MODELING AND COMPUTATIONS
Taeyoung Park, 1
Vinay L. Kashyap, 2
Aneta Siemiginowska, 2
David A. van Dyk, 3
Andreas Zezas, 2
Craig Heinke, 4
and Bradford J. Wargelin 2
Received 2005 December 24; accepted 2006 June 14
ABSTRACT
A commonly used measure to summarize the nature of a photon spectrum is the so-called hardness ratio, which
compares the numbers of counts observed in different passbands. The hardness ratio is especially useful to distinguish
between and categorize weak sources as a proxy for detailed spectral fitting. However, in this regime classical
methods of error propagation fail, and the estimates of spectral hardness become unreliable. Here we develop a
rigorous statistical treatment of hardness ratios that properly deals with detected photons as independent Poisson random
variables and correctly deals with the non-Gaussian nature of the error propagation. The method is Bayesian in nature
and thus can be generalized to carry out a multitude of source-population-based analyses. We verify our method with
simulation studies and compare it with the classical method. We apply this method to real-world examples, such as the
identification of candidate quiescent low-mass X-ray binaries in globular clusters and tracking the time evolution of
a flare on a low-mass star.
Subject headings: methods: statistical --- stars: flare --- X-rays: binaries
Online material: color figures
1. INTRODUCTION
The shape of the spectrum of an astronomical source is highly
informative as to the physical processes at the source. However,
often detailed spectral fitting is not feasible due to various constraints,
such as the need to analyze a large and homogeneous
sample for population studies, or there being insufficient counts
to carry out a useful fit. In such cases, a hardness ratio, which
requires the measurement of accumulated counts in two or more
broad passbands, becomes a useful measure to quantify and
characterize the source spectrum. 5 A hardness ratio is defined as either
the ratio of the counts in two bands called the ``soft'' and ``hard''
bands or a monotonic function of this ratio. We consider three
types of hardness ratios,
\[
\text{simple ratio:}\quad \mathcal{R} = \frac{S}{H};\qquad
\text{color:}\quad C = \log_{10}\!\left(\frac{S}{H}\right);\qquad
\text{fractional difference:}\quad \mathrm{HR} = \frac{H - S}{H + S};
\tag{1}
\]
where S and H are the source counts in the two bands, called the
soft and hard passbands. The simple formulae above are modified
for the existence of background counts and instrumental
effective areas.
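To make the definitions concrete, the three ratios of equation (1) can be sketched in a few lines of Python (the helper name is ours, for illustration):

```python
import math

def hardness_ratios(S, H):
    """Illustrative helper: the three hardness ratios of eq. (1)
    from raw soft (S) and hard (H) band counts."""
    R = S / H                      # simple ratio
    C = math.log10(S / H)          # color
    HR = (H - S) / (H + S)         # fractional difference
    return R, C, HR

# Equal counts in both bands give R = 1, C = 0, HR = 0.
print(hardness_ratios(30, 30))
```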
Spectral colors in optical astronomy, defined by the standard
optical filters (e.g., UBVRIJK; U − B, R − I, etc.), are well known
and constitute the main observables in astronomy. They have been
widely applied to characterize populations of sources and their
evolution. The hardness ratio concept was adopted in X-ray
astronomy for early X-ray detectors (mainly proportional counter
detectors), which had only a limited energy resolution. The first
application of the X-ray hardness ratio was presented in the X-ray
variability studies with SAS 3 observations by Bradt et al. (1976).
Zamorani et al. (1981) investigated the X-ray hardness ratio for a
sample of quasars observed with the Einstein X-ray observatory.
They calculated each source intensity in two energy bands (1.2–3
and 0.5–1.2 keV) and discussed a possible evolution of the hardness
ratio (the ratio of both intensities) with redshift for their
X-ray sample of 27 quasars. Similar ratios have been used in
surveys of stars (Vaiana et al. 1981), galaxies (e.g., Kim et al.
1992), and X-ray binaries (Tuohy et al. 1978) studied with Einstein
and earlier observatories. In the case of X-ray binaries in particular,
they were used to define two different classes of sources
(Z-track and atoll sources; Hasinger & van der Klis 1989)
depending on their time evolution on HR versus HR diagrams.
Since then the concept of the X-ray hardness ratio has been
developed and broadly applied in a number of cases, most recently
in the analysis of large data samples from the Chandra X-Ray
Observatory (Weisskopf et al. 2000) and XMM-Newton (Jansen
et al. 2001).
Advanced X-ray missions such as Chandra and XMM-Newton
allow for the detection of many very faint sources in deep X-ray
surveys (see review by Brandt & Hasinger 2005). For example,
in a typical observation of a galaxy, several tens and in some
cases even hundreds of sources are detected, most of which have
fewer than 50 counts, and many have only a few counts. Similar
types of sources are detected in the ChaMP (Chandra Multiwavelength
Project) serendipitous survey (Kim et al. 2006). They
provide the best quality data to date for studying source populations
and their evolution. The most interesting sources are the
1 Department of Statistics, Harvard University, One Oxford Street, Cambridge,
MA 02138; tpark@stat.harvard.edu.
2 Smithsonian Astrophysical Observatory, 60 Garden Street, Cambridge, MA
02138; kashyap@head.cfa.harvard.edu, aneta@head.cfa.harvard.edu,
azezas@head.cfa.harvard.edu, bwargelin@head.cfa.harvard.edu.
3 Department of Statistics, University of California, 364 ICS Building One,
Irvine, CA 92697-1250; dvd@ics.uci.edu.
4 Northwestern University, 2131 Tech Drive, Evanston, IL 60208;
cheinke@northwestern.edu.
5 In the context of Chandra data, the passbands often used are called ``soft''
(0.3–0.9 keV), ``medium'' (0.9–2.5 keV), and ``hard'' (2.5–8.0 keV) bands.
Sometimes the soft and medium bands are merged into a single band, also called the soft
band (e.g., 0.5–2 keV). In all cases, note that the energies refer to the nominal
detector PHA or PI channel boundaries and not to the intrinsic photon energies. The
choice of energy or PI channel range is flexible and situational.
The Astrophysical Journal, 652:610Y628, 2006 November 20
# 2006. The American Astronomical Society. All rights reserved. Printed in U.S.A.

sources with the smallest numbers of detected counts, because
they have never been observed before.
Furthermore, the number of faint sources increases with
improved sensitivity limits; i.e., there are more faint sources in deeper
observations. Because these faint sources have only a few counts,
hardness ratios are commonly used to study properties of their
X-ray emission and absorption (e.g., Alexander et al. 2001; Brandt
et al. 2001a, 2001b; Giacconi et al. 2001; Silverman et al. 2005).
With long observations, the background counts increase, so the
background contribution becomes significant and background
subtraction fails to work. Still, the background contribution must
be taken into account to evaluate the source counts.
The sensitivity of X-ray telescopes is usually energy dependent,
and thus the classical hardness ratio method can only be
applied to observations made with the same instrument. Usually
the measurements carry both detection errors and background
contamination. In a typical hardness ratio calculation the
background is subtracted from the data, and only net counts are used.
This background subtraction is not a good solution, especially in
the low-counts regime (see van Dyk et al. 2001). In general, the
Gaussian assumption present in the classical method (see §§ 1.1
and 3) is not appropriate for faint sources in the presence of a
significant background. Therefore, the classical approach to
calculating the hardness ratio and the errors is inappropriate for low
counts. Instead, adopting a Poisson distribution as we do here
(see below), hardness ratios can be reliably computed for both
low- and high-counts cases. The Bayesian approach allows us to
include the information about the background, difference in
collecting area, effective areas, and exposure times between the
source and background regions.
Our Bayesian model-based approach follows a pattern of ever
more sophisticated statistical methods that are being developed
and applied to solve outstanding quantitative challenges in
empirical astronomy and astrophysics. Bayesian model-based methods
for high-energy, high-resolution spectral analysis can be found,
for example, in Kashyap & Drake (1998), van Dyk et al. (2001),
van Dyk & Hans (2002), Protassov et al. (2002), van Dyk &
Kang (2004), Gillessen & Harney (2005), and Park et al. (2006).
More generally, the monographs on statistical challenges in modern
astronomy (Feigelson & Babu 1992, 2003; Babu & Feigelson
1997) and the special issue of Statistical Science devoted to
astronomy (May 2004) illustrate the wide breadth of statistical
methods applied to an equal diversity of problems in astronomy
and astrophysics.
Here, we discuss the classical method and present the fully
Bayesian method for calculating the hardness ratio for counts
data. 6
1.1. The Classical Method
A conventional hardness ratio is calculated as set out in
equations (1), where S and H are the ``soft'' and ``hard'' counts
accumulated in two nonoverlapping passbands. In general, an
observation will be contaminated by background counts, and this
must be accounted for in the estimate of the hardness ratios. The
background is usually estimated from an annular region surrounding
the source of interest, or from a suitable representative region
on the detector that is reliably devoid of sources. The differences
in the exposure time and aperture area of source and background
observations are summarized by a known constant r, for which
the expected background counts in the source exposure area are
adjusted. With the background counts in the soft band (B_S) and
the hard band (B_H) collected in an area of r times the source
region, the hardness ratio is generalized to
\[
\mathcal{R} = \frac{S - B_S/r}{H - B_H/r};\qquad
C = \log_{10}\!\left(\frac{S - B_S/r}{H - B_H/r}\right);\qquad
\mathrm{HR} = \frac{(H - B_H/r) - (S - B_S/r)}{(H - B_H/r) + (S - B_S/r)}.
\tag{2}
\]
The adjusted counts in the background exposure area are
directly subtracted from those in the source exposure area. The
above equations can be further modified to account for variations
of the detector effective areas by including them in the constant r,
in which case the constants for the two bands will be different. 7
Errors are propagated assuming a Gaussian regime, i.e.,
\[
\sigma_{\mathcal{R}} = \frac{S - B_S/r}{H - B_H/r}
\sqrt{\frac{\sigma_S^2 + \sigma_{B_S}^2/r^2}{(S - B_S/r)^2}
    + \frac{\sigma_H^2 + \sigma_{B_H}^2/r^2}{(H - B_H/r)^2}};
\]
\[
\sigma_C = \frac{1}{\ln 10}
\sqrt{\frac{\sigma_S^2 + \sigma_{B_S}^2/r^2}{(S - B_S/r)^2}
    + \frac{\sigma_H^2 + \sigma_{B_H}^2/r^2}{(H - B_H/r)^2}};
\]
\[
\sigma_{\mathrm{HR}} = 2
\left[(H - B_H/r)^2\left(\sigma_S^2 + \sigma_{B_S}^2/r^2\right)
    + (S - B_S/r)^2\left(\sigma_H^2 + \sigma_{B_H}^2/r^2\right)\right]^{1/2}
\left[(H - B_H/r) + (S - B_S/r)\right]^{-2};
\tag{3}
\]
where σ_S, σ_H, σ_{B_S}, and σ_{B_H} are typically approximated with the
Gehrels prescription (Gehrels 1986),
\[
\sigma_X = \sqrt{X + 0.75} + 1,
\tag{4}
\]
where X is the observed counts and the deviation from √X is
due to identifying the 68% Gaussian (1 σ) deviation with the
16th–84th percentile range of the Poisson distribution. In addition
to the approximation of a Poisson with a faux Gaussian
distribution, classical error propagation also fails on a fundamental
level to properly describe the variance in the hardness ratios.
Because R is strictly positive, its probability distribution is
skewed, and its width cannot be described simply by σ_R. The
fractional difference HR is better behaved, but the range limits
of [−1, 1] cause the coverage rates (see § 3) to be incorrect. The
color C is the best behaved, but the error range will be asymmetrical
when the errors are large. Moreover, it has been demonstrated
(van Dyk et al. 2001) that using background-subtracted
source counts (which is the case in the method outlined above)
gives a biased estimate of the source intensity, especially for very
faint sources. Furthermore, although we can account for gross
differences in the detector sensitivity between observations, this
is not done in a statistically correct manner, but instead simply by
rescaling the background-subtracted counts in each band using
additional precomputed correction factors. Finally, the classical
hardness ratio method cannot be applied when a source is not
6 A Fortran- and C-based program that calculates the hardness ratios
following the methods described in this paper is available for download from
http://hea-www.harvard.edu/AstroStat/BEHR/.
7 Gross differences in detector sensitivity, e.g., between different observations
or for sources at different off-axis angles, can be accounted for with additional
multiplicative factors modifying the background-subtracted counts. This is,
however, not statistically meaningful and can lead to unreliable estimates and error
bars in the small counts regime. Instead, we apply such scaling factors directly to
the source intensities (see eq. [5]).

detected in one of the two bands (which is quite common, since
CCD detectors are more sensitive in the soft band).
An alternative method based on the median (or some quantile)
of the distribution of the energies of the detected counts in a
source was suggested by Hong et al. (2004). This method can be
very powerful for classifying faint sources, but is not suited for
comparisons of spectral properties between observations with
different instruments. In this method, because the spectral parameter
grids are highly nonlinear and multivalued in quantile-quantile
space, it is also necessary that the spectral shape of the source
being analyzed be known in order to interpret the numerical values.
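For reference, the classical prescription of § 1.1 (eqs. [2]-[4]) can be sketched in a few lines of Python; the function names are ours, and only the color C is shown:

```python
import math

def gehrels(X):
    """Gehrels (1986) approximation to the 1-sigma Poisson error, eq. (4)."""
    return math.sqrt(X + 0.75) + 1.0

def classical_color(S, H, BS, BH, r):
    """Illustrative classical estimate of the color C and its propagated
    error (eqs. [2]-[3]); BS, BH are background counts collected in an
    area r times the source region."""
    s = S - BS / r                         # net soft counts
    h = H - BH / r                         # net hard counts
    C = math.log10(s / h)
    rel_var = ((gehrels(S) ** 2 + gehrels(BS) ** 2 / r ** 2) / s ** 2 +
               (gehrels(H) ** 2 + gehrels(BH) ** 2 / r ** 2) / h ** 2)
    sigma_C = math.sqrt(rel_var) / math.log(10.0)
    return C, sigma_C
```

Note that the sketch inherits all the failure modes discussed above: it breaks down whenever a net count is zero or negative.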
We have therefore developed fully Bayesian approaches to
appropriately compute hardness ratios and their errors, by properly
accounting for the Poissonian nature of the observations.
The value of our method is heightened in several astrophysically
meaningful cases. In § 2 we describe the model structure we adopt.
In § 3 we carry out simulations to compare our method with the
classical method. In § 4 we outline various applications of our
method; we demonstrate its use in identifying candidate quiescent
low-mass X-ray binary sources in a globular cluster and in
studying the evolution of a stellar X-ray flare. In § 5 we discuss
some nuances of our method, such as the adopted prior distributions,
as well as the advantages and limitations. A summary of
the method and its advantages is presented in § 6. A detailed
mathematical derivation of the posterior probability distributions
of the hardness ratios and the computational methods we use to
calculate them are described in Appendix A, and a detailed
comparison of the effects of priors is described in Appendix B.
2. MODELING THE HARDNESS RATIOS
2.1. The Poisson Case
Hardness ratios are often used to describe faint sources with
very few counts; it is not uncommon for either or both of the soft
and hard counts to be small enough that the source is undetected
in that band. The classical method (§ 1.1) generally fails to
produce reliable estimates (or upper and lower limits) in such cases,
and indeed, it is sometimes not even possible to apply it, such as
when the background is large and simple subtraction results in a
negative estimate for the source counts. Thus, instead of using
Gaussian assumptions on the detected counts, we directly model
them as an inhomogeneous Poisson process.
Since the observed counts are the sum of counts due to the
source and the background, and because the sum of two independent
Poisson random variables is a Poisson random variable with
the sum of the two Poisson intensities, we may write 8
\[
S \sim \mathrm{Poisson}\!\left(e_S(\lambda_S + \xi_S)\right);\qquad
H \sim \mathrm{Poisson}\!\left(e_H(\lambda_H + \xi_H)\right),
\tag{5}
\]
where S and H are independent data points, λ_S and λ_H are the
expected soft and hard source counts intensities, ξ_S and ξ_H are
the expected soft and hard background counts intensities, and e_S
and e_H are correction factors that take into account variations in
effective areas, exposure times, and other instrumental effects. 9
The observed background counts are also modeled as independent
Poisson random variables,
\[
B_S \sim \mathrm{Poisson}(r e_S \xi_S);\qquad
B_H \sim \mathrm{Poisson}(r e_H \xi_H),
\tag{6}
\]
where ξ is scaled by a known correction factor r, to account for
differences in source and background areas and sensitivities.
We implicitly assume throughout that the observed data are
discrete photon events, or counts. The intensities λ and ξ may have
other units, determined by how the instrument efficiency is
supplied. Thus, if e_S is given in counts s cm^2 photon^{-1}, then λ_S will
have units of photons s^{-1} cm^{-2}. Alternately, e_S could describe
the exposure time in seconds, in which case λ_S will have units of
counts s^{-1}. In the case in which λ_S and λ_H themselves have units
of counts, e_S and e_H are dimensionless and usually taken to be
unity. This allows us to compute hardness ratios of fluxes and
count rates as well as counts. We place no restriction on the units
of the model intensities (other than that λ and ξ have the same
units); they are entirely determined by the problem at hand. Thus,
our analysis is valid for any instrumental configuration or
combination, and the only requirement is that the observed data be
described by a Poisson distribution.
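The data model of equations (5) and (6) is straightforward to simulate. The sketch below, with illustrative names, draws the four observed counts for given intensities; a simple Knuth-style Poisson sampler keeps the example self-contained:

```python
import math
import random

def simulate_counts(lam_S, lam_H, xi_S, xi_H, e_S=1.0, e_H=1.0,
                    r=100.0, rng=None):
    """Sketch of the data model of eqs. (5)-(6): source-region counts
    S, H and background-region counts B_S, B_H as Poisson draws.
    All names are illustrative, not from the paper's code."""
    rng = rng or random.Random(0)

    def poisson(mu):
        # Knuth's multiplication method; adequate for small intensities
        L, k, p = math.exp(-mu), 0, 1.0
        while True:
            p *= rng.random()
            if p <= L:
                return k
            k += 1

    S = poisson(e_S * (lam_S + xi_S))
    H = poisson(e_H * (lam_H + xi_H))
    B_S = poisson(r * e_S * xi_S)
    B_H = poisson(r * e_H * xi_H)
    return S, H, B_S, B_H
```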
Given the expected source counts intensities (λ_S and λ_H), it is
legitimate for the hardness ratio to be rewritten as
\[
\text{simple ratio:}\quad \mathcal{R} = \frac{\lambda_S}{\lambda_H};\qquad
\text{color:}\quad C = \log_{10}\!\left(\frac{\lambda_S}{\lambda_H}\right);\qquad
\text{fractional difference:}\quad \mathrm{HR} = \frac{\lambda_H - \lambda_S}{\lambda_H + \lambda_S}.
\tag{7}
\]
These quantities are characteristics of the spectrum in question,
rather than of the observed data, like the quantities in equation (2).
As such, the rewritten hardness ratio is of more direct
scientific interest. Note also that while these quantities are defined
entirely in terms of expected source intensities in the different
passbands, the background is implicitly taken into account
via direct modeling, as in equations (5) and (6).
2.2. Bayesian Approach
As a new method of evaluating hardness ratios, we adopt
a Bayesian approach. Bayesian analysis is explained in detail
in numerous articles in the astronomical literature (e.g., Gregory
& Loredo 1992; Kashyap & Drake 1998; van Dyk et al. 2001;
Kashyap et al. 2002; Protassov et al. 2002), and here we describe
it only in rudimentary terms. In this approach, the prior knowledge
for a parameter [the ``prior probability distribution,'' e.g.,
p(λ_S, ξ_S)] is combined with the information in the observed data
[the ``likelihood,'' e.g., p(S, B_S | λ_S, ξ_S)] to derive the probability
distribution of the model parameters [the ``posterior distribution,''
e.g., p(λ_S, ξ_S | S, B_S)] via Bayes's theorem, 10 i.e.,
\[
p(\lambda_S, \xi_S \,|\, S, B_S)
= \frac{p(\lambda_S, \xi_S)\, p(S, B_S \,|\, \lambda_S, \xi_S)}
       {\iint p(\lambda_S, \xi_S)\, p(S, B_S \,|\, \lambda_S, \xi_S)\, d\xi_S\, d\lambda_S},
\tag{8}
\]
8 Note that throughout this paper, Greek letters indicate unobserved quantities
and Roman letters denote observed data values (except for R, C, and HR, which
are used to represent the hardness ratio model parameters in the Poisson case).
9 Note that in general the effective area changes within an adopted passband,
so the chosen values of the correction factor are averages that are calculated after
making some prior assumptions about the underlying source spectrum. Their
values may change depending on the analysis results.
10 The notation p(A|B) is to be read as ``the probability that A is true given that
B is true.'' Thus, except for prior distributions, all the probability distributions
that we use are conditional distributions.

and similarly for p(λ_H, ξ_H | H, B_H). Because the posterior
distribution in equation (8) needs only to be computed up to a
normalizing constant, we often deal with an unnormalized posterior
density, i.e., p(λ_S, ξ_S | S, B_S) ∝ p(λ_S, ξ_S) p(S, B_S | λ_S, ξ_S). Bayesian
statistical inferences for a parameter are based on the posterior
distribution, so we consider its various summary statistics; see
§ 2.2.1 for details.
Because the underlying likelihood functions are Poisson
distributions, we assign so-called conjugate γ-prior distributions for
both source and background intensities (van Dyk et al. 2001); a
conjugate prior distribution ensures that a posterior distribution
follows the same parametric form as the prior distribution, e.g., a
γ-distribution is a conjugate prior for a Poisson likelihood. 11 That
is, we assign independent γ-prior distributions for the source and
background intensities λ and ξ,
\[
\lambda_S \sim \gamma(\psi_{S1}, \psi_{S2});\qquad
\lambda_H \sim \gamma(\psi_{H1}, \psi_{H2});
\tag{9}
\]
\[
\xi_S \sim \gamma(\psi_{S3}, \psi_{S4});\qquad
\xi_H \sim \gamma(\psi_{H3}, \psi_{H4});
\tag{10}
\]
where the values of ψ are calibrated according to our prior
knowledge, or lack thereof, about the parameters; see § 5.1 for
our discussion on choosing a prior distribution.
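The conjugacy property invoked here can be checked numerically: with X ~ Poisson(λ) and a prior λ ~ γ(α, β), the posterior is γ(α + X, β + 1). The sketch below (our own illustration, not the paper's code) compares this closed form against the normalized product of prior and likelihood on a grid:

```python
import math

def gamma_pdf(x, a, b):
    """gamma(a, b) density: b**a / Gamma(a) * x**(a-1) * exp(-b*x)."""
    return b ** a / math.gamma(a) * x ** (a - 1) * math.exp(-b * x)

# For X ~ Poisson(lam) with prior lam ~ gamma(a, b), conjugacy gives
# the posterior lam | X ~ gamma(a + X, b + 1).  Check on a grid:
a, b, X = 1.0, 0.05, 7
step = 0.01
grid = [step * i for i in range(1, 3001)]                  # lam in (0, 30]
unnorm = [gamma_pdf(x, a, b) * x ** X * math.exp(-x) for x in grid]
norm = sum(unnorm) * step                                  # crude quadrature
posterior = [u / norm for u in unnorm]
closed_form = [gamma_pdf(x, a + X, b + 1.0) for x in grid]
max_err = max(abs(p - c) for p, c in zip(posterior, closed_form))
```

The maximum pointwise discrepancy is limited only by the crude Riemann-sum normalization.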
We describe the application of Bayes's theorem and the
derivation of the posterior probability distribution functions for the
interesting parameters in Appendix A. Here, we note that since
λ_S and λ_H are independent of each other, their joint posterior
probability distribution is the product of the individual marginal
posterior distributions, 12 written as (see eq. [A8])
\[
p(\lambda_S, \lambda_H \,|\, S, H, B_S, B_H)
= p(\lambda_S \,|\, S, B_S) \times p(\lambda_H \,|\, H, B_H).
\tag{11}
\]
Then, the posterior distribution of each type of hardness ratio
can be computed by transforming λ_S and λ_H to the appropriate
variables and marginalizing over the resulting nuisance variable,
as follows:
\[
p(\mathcal{R} \,|\, S, H, B_S, B_H)\, d\mathcal{R}
= d\mathcal{R} \int d\lambda_H\, \lambda_H\,
  p(\mathcal{R}\lambda_H, \lambda_H \,|\, S, H, B_S, B_H);
\tag{12}
\]
\[
p(C \,|\, S, H, B_S, B_H)\, dC
= dC \int d\lambda_H\, \lambda_H\, 10^C \ln(10)\,
  p(10^C \lambda_H, \lambda_H \,|\, S, H, B_S, B_H);
\tag{13}
\]
\[
p(\mathrm{HR} \,|\, S, H, B_S, B_H)\, d\mathrm{HR}
= d\mathrm{HR} \int d\omega\, \frac{\omega}{2}\,
  p\!\left(\frac{(1 - \mathrm{HR})\,\omega}{2},
           \frac{(1 + \mathrm{HR})\,\omega}{2}
           \,\Big|\, S, H, B_S, B_H\right),
\tag{14}
\]
with ω = λ_S + λ_H. The integrals can be computed using either
Monte Carlo integration or Gaussian quadrature (throughout this
paper, we use the term ``quadrature'' to refer exclusively to the
latter). We use both methods of integration, because neither has
a clear advantage over the other in our case (see § 5.4 and
Appendix A).
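As a simplified illustration of the Monte Carlo route through equations (11)-(14), the sketch below ignores the background entirely, so that each λ has the closed-form posterior γ(a + counts, b + 1) under a γ(a, b) prior, and transforms joint draws of (λ_S, λ_H) into draws of R, C, and HR. All names are ours, and this is a toy version of the full method, which marginalizes the background properly:

```python
import math
import random

def hr_posterior_draws(S, H, n=10000, a=1.0, b=0.0, seed=1):
    """Illustrative Monte Carlo version of eqs. (11)-(14) with the
    background ignored: lambda | counts ~ gamma(a + counts, b + 1).
    Returns posterior draws of (R, C, HR)."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n):
        # random.gammavariate takes (shape, scale); rate b + 1 -> scale 1/(b + 1)
        lam_S = rng.gammavariate(a + S, 1.0 / (b + 1.0))
        lam_H = rng.gammavariate(a + H, 1.0 / (b + 1.0))
        R = lam_S / lam_H
        HR = (lam_H - lam_S) / (lam_H + lam_S)
        draws.append((R, math.log10(R), HR))
    return draws
```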
2.2.1. Finding Point Estimates and Error Bars
Bayesian statistical inferences are made in terms of probability
statements, and thus we consider various summary statistics
for a posterior distribution. Here we briefly describe the standard
summaries of location and methods of computing error bars; a
fuller description along with detailed examples can be found in
Park et al. (2006). Commonly used summaries of location are the
mean, median, and mode(s) of the distribution: the posterior
mean is the posterior expectation of the parameter, the posterior
median is the point that divides the posterior probability evenly
such that 50% of the probability resides below and above its
value, and the posterior mode is the most likely value of the
parameter given the data.
When a posterior simulation of size N is obtained with a
Monte Carlo sampler (see § A1), the posterior mean is the simple
average of the sample draws, and the posterior median is the
[N/2]th draw after sorting the draws in increasing order. To
compute the posterior mode, we repeatedly bisect the Monte Carlo
draws and choose the half with more draws until the range of the
chosen half is sufficiently narrow. The midpoint of the resulting
narrow bin approximates the posterior mode.
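One reading of this bisection scheme, in illustrative Python (the function name and tolerance are ours):

```python
def mode_by_bisection(draws, tol=1e-3):
    """Approximate the posterior mode of a Monte Carlo sample:
    repeatedly bisect the current range and keep the half containing
    more draws; the midpoint of the final narrow bin is returned."""
    lo, hi = min(draws), max(draws)
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        n_lo = sum(1 for x in draws if lo <= x <= mid)
        n_hi = sum(1 for x in draws if mid < x <= hi)
        if n_lo >= n_hi:
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2.0
```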
With quadrature (see § A2), we can obtain equally spaced
abscissas and the corresponding posterior probabilities. Thus,
the posterior mean is computed as the sum of the product of the
abscissas with their probabilities (i.e., the dot product of the
vector of the abscissas with the corresponding probability vector).
The posterior median is computed by summing the probabilities
corresponding to the ordered abscissas one by one until a
cumulative probability of 50% is reached. The posterior mode is
simply the point among the abscissas with the largest probability.
Unlike with the point estimates above, there is no unique or
preferred way to summarize the variation of a parameter. Any
interval that encompasses a suitable fraction of the area under
the probability distribution qualifies as an estimate of the variation.
Of these, two provide useful measures of the uncertainty. The
equal-tail posterior interval, which is the central interval that
corresponds to the range of values above and below which lies a
fraction of exactly α/2 of the posterior probability, is a good measure
of the width of the posterior distribution. The highest posterior
density (HPD) interval, which is the range of values that contain a
fraction 1 − α of the posterior probability and within which the
probability density is never lower than that outside the interval,
always contains the mode and thus serves as an error bar on it.
For a symmetric, unimodal posterior distribution, these two
posterior intervals are identical. The equal-tail interval is invariant
to one-to-one transformations and is usually easier to compute.
However, the HPD interval always guarantees the interval with
the smallest length among the intervals with the same posterior
probability. Neither of these intervals is necessarily symmetric
around the point estimate; i.e., the upper and lower bounds may
be of different lengths. For consistency, we refer to such intervals
as ``posterior intervals''; others also refer to them as confidence
intervals 13 or credible intervals.
11 Formally, these are semiconjugate prior distributions in that they are
conjugate to a particular conditional distribution. For example, the γ-prior
distribution on λ_S is conjugate to the Poisson distribution of the source counts among
the observed soft counts (see § A1). The γ(α, β) distribution is a
continuous distribution on the positive real line with probability density
function p(X) = β^α X^{α−1} e^{−βX} / Γ(α), and has a mean of α/β and a variance of α/β²
for α, β > 0.
12 The marginal posterior distribution of interest is computed by integrating
the so-called nuisance parameters out of their joint posterior distributions. For
example, p(λ_S | S, B_S) = ∫ p(λ_S, ξ_S | S, B_S) dξ_S.
13 Technically, confidence intervals are defined in terms of how frequently
they contain the ``true'' value of the estimand, rather than in terms of a posterior
probability distribution.

For a Monte Carlo simulation of size N, we compute either the
equal-tail interval or an interval that approximates the HPD interval.
(Unless otherwise stated, here we always quote the equal-tail
interval for the Monte Carlo method and the HPD interval for
the quadrature.) The equal-tail posterior interval in this case is
computed by choosing the [(α/2)N]th and the [(1 − α/2)N]th
draws as the boundaries. An approximate HPD interval is derived
by comparing all intervals that consist of the [X]th and the
[X + (1 − α)N]th draws and choosing that X which gives the shortest
length among them.
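These two interval constructions can be sketched as follows (names are ours; the shortest-interval search implements the approximate-HPD comparison described above):

```python
def equal_tail_interval(draws, alpha=0.05):
    """Central 100(1 - alpha)% posterior interval from Monte Carlo draws."""
    s = sorted(draws)
    N = len(s)
    return s[int(alpha / 2 * N)], s[int((1 - alpha / 2) * N) - 1]

def approx_hpd_interval(draws, alpha=0.05):
    """Shortest interval containing a fraction 1 - alpha of the draws,
    which approximates the HPD interval."""
    s = sorted(draws)
    N = len(s)
    k = int((1 - alpha) * N)      # number of draws the interval must span
    best = min(range(N - k), key=lambda i: s[i + k] - s[i])
    return s[best], s[best + k]
```

For a skewed sample the shortest interval hugs the high-density end, while the equal-tail interval does not.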
When the posterior density is computed by the quadrature,
we split the parameter space into a number of bins and evaluate the
posterior probability at the midpoint of each bin. In this case, a
100(1 − α)% HPD interval can be computed by beginning with
the bin with the largest posterior probability and adding additional
bins down to a given value of the probability density until
the resulting region contains at least a fraction 1 − α of the
posterior probability.
3. VERIFICATION: SIMULATION STUDY
In order to compare the classical method with our Bayesian
method, we carried out a simulation study to calculate coverage
rates of the classical and posterior intervals. Given prespecified
values of the parameters, source and background counts were
generated and then used to construct 95% classical and posterior
intervals of each hardness ratio using the methods discussed in
this article. From the simulated data we calculated the proportion
of the computed intervals that contain the true value of the
corresponding hardness ratio. (This is the coverage rate of the classical
and probability intervals.) In the ideal case, 95% of the simulated
intervals would contain the true value of each hardness ratio.
Besides the coverage rate, the average length of the intervals and
the mean square error of point estimates were also computed and
compared. The mean square error of the point estimate \hat\theta of \theta is
defined as the sum of the variance and squared bias for an estimator,
i.e., MSE(\hat\theta) = E[(\hat\theta − \theta)^2] = Var(\hat\theta) + [E(\hat\theta) − \theta]^2. A method
Fig. 1.--- Comparison of the shape of the posterior distribution for hardness ratios R, C, and HR with the classical Gaussian approximation for high counts (case I; left
column) and for low counts (case II; right column). The solid lines represent the posterior distributions of the hardness ratios, and the dashed lines a Gaussian distribution
with the mean and standard deviation equal to the classical estimates. A prior distribution that is flat on the real line (see § 5.1) is adopted for the Bayesian
calculations. Note that as the number of counts increases, the two distributions approach each other. At low counts the two distributions differ radically, with the classical
distributions exhibiting generally undesirable properties (broad and extending into unphysical regimes).

that constructs shorter intervals with the same coverage rate and
produces a point estimate with a lower mean square error is
generally preferred. The entire simulation was repeated with different
magnitudes of the source intensities, λ_S and λ_H. Intrinsically,
we are interested in the following two prototypical cases:
case I, hardness ratios for high counts, i.e.,
\[
\lambda_S = \lambda_H = 30;\qquad \xi_S = \xi_H = 0.1;
\tag{15a}
\]
case II, hardness ratios for low counts, i.e.,
\[
\lambda_S = \lambda_H = 3;\qquad \xi_S = \xi_H = 0.1.
\tag{15b}
\]
In both cases we adopt a background area to source area ratio
of r = 100 (see eq. [6]); i.e., we take the observed counts in
the background region to be 10. Note that these are written with
reference to the counts observed in the ``source region,'' i.e., the
units of λ_S, λ_H, ξ_S, and ξ_H are all counts (source area)^{-1}, and that
we have set e_S = e_H = 1 here. The actual extent of the source
area is irrelevant to this calculation.
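A stripped-down version of such a coverage experiment, for the classical HR interval only and with the background neglected for brevity (so this is an illustration of the procedure, not a reproduction of Table 1), might read:

```python
import math
import random

rng = random.Random(42)

def poisson(mu):
    """Knuth-style Poisson sampler (adequate for small mu)."""
    L, k, p = math.exp(-mu), 0, 1.0
    while True:
        p *= rng.random()
        if p <= L:
            return k
        k += 1

def classical_hr_interval(S, H):
    """Classical 95% interval for HR: eq. (3) without background terms,
    with Gehrels errors (eq. [4])."""
    sS = math.sqrt(S + 0.75) + 1.0
    sH = math.sqrt(H + 0.75) + 1.0
    HR = (H - S) / (H + S)
    sHR = 2.0 * math.sqrt(H ** 2 * sS ** 2 + S ** 2 * sH ** 2) / (H + S) ** 2
    return HR - 1.96 * sHR, HR + 1.96 * sHR

# Case II intensities (lam_S = lam_H = 3); the true HR is therefore 0.
hits = trials = 0
for _ in range(2000):
    S, H = poisson(3.0), poisson(3.0)
    if S + H == 0:
        continue                    # HR is undefined with no counts at all
    trials += 1
    lo, hi = classical_hr_interval(S, H)
    hits += (lo <= 0.0 <= hi)
coverage = hits / trials
```

The inflated Gehrels errors make these intervals very wide, so the empirical coverage comes out well above the nominal 95%, consistent with the overcoverage reported for the classical method below.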
This simulation study illustrates two typical cases, i.e., high-counts
and low-counts sources: case I represents high-counts
sources for which Poisson assumptions tend to agree with Gaussian
assumptions, and case II represents low-counts sources, where
the Gaussian assumptions are inappropriate. In case I, the posterior
distributions of the hardness ratios agree with the corresponding
Gaussian approximation of the classical method, but in
case II, the Gaussian assumptions made in the classical method
fail. This is illustrated in Figure 1, where we compare the two
methods for a specific and particular simulation: we assume S =
H = 30 in case I and S = H = 3 in case II, compute the resulting
posterior distributions of hardness ratios using the Bayesian
method, and compare them to a Gaussian distribution with a
mean and standard deviation equal to the classical estimates. The
right panels in Figure 1 show that there is a clear discrepancy
between the two methods.
In order to compare the statistical properties of estimates based on the Bayesian and classical methods, we simulate test data under both case I and case II. Table 1 presents the results of 200 simulated sources for each of the two cases. The data are used to compute point estimates and 95% error bars using the classical method, the Monte Carlo sampler, and numerical integration by quadrature. The Bayesian methods use noninformative prior distributions on the source and background intensities λ and ξ; in particular, λ ~ γ(1, 0) and ξ ~ γ(0.5, 0) (see § 5.1). The results of case I indicate that all three methods are comparable, although the quadrature yields slightly shorter intervals and point estimates with a smaller mean square error. Figure 2 illustrates that the resulting intervals from the methods are almost identical. On the other hand, the results of case II appear in Figure 3 and confirm that the Bayesian methods outperform the classical method: the mean lengths of the classical intervals are wider than those of the posterior intervals, and the classical estimates exhibit a much larger mean square error. For example, the hardness ratio HR is by definition limited to the range [−1, 1], so that the maximum length of an interval should be 2. However, the mean length of the simulated classical intervals is 3.7, which is clearly larger than needed. Since quadrature tends to yield estimates with smaller mean square errors and shorter posterior intervals while maintaining high coverage rates, it is generally preferred to the Monte Carlo method. However, this comes at a price: because the summation inside the posterior density is over the detected source counts, quadrature becomes computationally more expensive as the source counts increase; the Monte Carlo method, on the other hand, is much faster regardless of the number of source counts. Thus, we recommend using quadrature with relatively low-counts data (e.g., S < 20 or H < 20) and the Monte Carlo method with higher counts.
Table 1
Statistical Properties of Bayesian and Classical Methods

Case                   Method               Ratio   Coverage Rate (%)   Mean Length   MSE of Mode(a)   MSE of Mean(a)
---------------------------------------------------------------------------------------------------------------------
Case I (high counts)   Classical method     R       95.0                1.24          0.065            ...
                                            C       99.0                0.54          0.013            ...
                                            HR      98.5                0.61          0.016            ...
                       Monte Carlo method   R       96.0                1.07          0.056*           0.069
                                            C       96.0                0.44          0.012*           0.012
                                            HR      96.0                0.49          0.016            0.015*
                       Quadrature           R       94.5                1.03          0.057*           0.069
                                            C       96.0                0.43          0.012*           0.012
                                            HR      94.5                0.49          0.016            0.015*
Case II (low counts)   Classical method     R       97.5                192.44        93.27            ...
                                            C       100.0               6.02          0.27             ...
                                            HR      100.0               3.70          0.21             ...
                       Monte Carlo method   R       98.0                15.77         0.328*           85.482
                                            C       98.0                1.54          0.078*           0.113
                                            HR      98.0                1.26          0.181            0.083*
                       Quadrature           R       97.0                8.18          0.394*           20.338
                                            C       99.5                1.51          0.074*           0.112
                                            HR      95.0                1.23          0.187            0.083*

Note a. — The quantity with the lower mean square error is marked with an asterisk; this shows that the mode is the robust estimator for the color and the simple ratio, while the mean is preferred for the fractional difference. The classical method has a single point estimate and hence a single mean square error.

HARDNESS RATIOS IN POISSON REGIME 615
No. 1, 2006

Because the Bayesian inference for each hardness ratio is based on its posterior distribution, we consider two point estimates: the posterior mode and the posterior mean. Table 1 compares the two estimates in terms of their mean square errors. In the case of HR, the posterior mean is more robust to the shape of the posterior distribution than the posterior mode, because both the left and right limits of HR are bounded. For R and C, on the other hand, the posterior mode appears to perform better. Thus, the posterior mode is preferable
for both R and C, and the posterior mean, for HR, as shown in
Table 1.
4. APPLICATIONS
4.1. Quiescent Low-Mass X-Ray Binaries

Low-mass X-ray binaries (LMXBs) are binary systems composed of a neutron star or black hole and a low-mass (<1–2 M⊙) donor. These binaries are generally transient sources and spend most of their time in quiescence (qLMXBs). However, even when not active they emit X-rays due to thermal emission from the surface of the neutron star, and sometimes due to an additional hard component associated with residual low-level accretion (e.g., Campana et al. 1998; Brown et al. 1998). Identification and study of these quiescent systems is very important, since they provide a direct census of the overall LMXB population (and therefore the population of neutron stars) and also constrain their duty cycles. Although only the identification of their optical counterparts can unambiguously classify an X-ray source as a qLMXB, soft X-ray spectra or hardness ratios consistent with those expected for thermal emission by neutron stars are very useful for identifying candidate qLMXBs (Heinke et al. 2003).
Fig. 2.—Simulated range of hardness ratios calculated using three different methods in the high-counts case (case I; see eq. [15a]): the classical method (left), the Monte Carlo method (middle), and quadrature (right), for the three types of hardness ratios, R (top), C (middle), and HR (bottom). The horizontal lines are the 95% intervals computed for each simulated set of counts, and the vertical white line in each panel represents the true value of the hardness ratio. Note that all the methods exhibit similar performance in this case.

PARK ET AL.
616 Vol. 652

We apply the methods described in § 2 to a long (40 ks) Chandra ACIS-S (Advanced CCD Imaging Spectrometer) observation of the globular cluster Terzan 5 (ObsID 3798; PI: R. Wijnands). A total of 49 sources were detected within the half-mass radius of the cluster (for details on the observation, reduction, and analysis, see Heinke et al. 2006). We performed the source photometry in the 0.5–2.0 keV (soft) and 2.0–6.0 keV (hard) bands. In Figure 4 we present a color-luminosity diagram of all the detected sources (note that the color here is scaled by a factor of 2.5 to be similar to the definition of an optical color such as B − V). The posterior modes of the color for all detected sources, computed here using the Jeffreys prior (ψ = 1/2), along with the 95% highest posterior density intervals, are shown. Those sources that are not amenable to analysis via the classical method (weak sources with background-subtracted source counts in at least one band being <5) are marked with open circles, while the remainder are marked with filled circles.
Fig. 3.—Same as Fig. 2, but for a simulated range of hardness ratios in the low-counts case (case II; eq. [15b]). In this low-counts case, the Bayesian methods dramatically outperform the classical method.

Thermally emitting neutron stars are expected to have X-ray colors in the range [0.5, 2.5] (indicated by dashed lines in the figure). The softer limit (dashed curve to the right; color ~2) corresponds to a purely thermal emission spectrum, and the harder limit (dashed curve to the left; color ~0.5) corresponds to a 20% contribution from a power law of index 1.5. From this figure it is clear that the Bayesian method, apart from providing more accurate confidence intervals for the source colors, allows us to estimate hardness ratios for sources that were undetected in one of the two bands. We find 15 sources whose error bars overlap this region and are thus candidate qLMXBs. Note that eight of these candidates could not have been so classified using the classical method, because there are too few source counts. Conversely, many sources that lie at the edge of the region of interest would have been mistakenly classified as
candidate qLMXBs with the classical method if the overly large
error bars it produces had been used (see discussion of coverage
rates in Table 1).
4.2. Evolution of a Flare
In the previous section (§ 4.1) we demonstrated the usefulness of the full Bayesian machinery, relative to the classical method, in classifying weak sources. Here, we go further and show that new avenues of analysis and interpretation open up when we take advantage of the full probabilistic description available to us. Such analysis methods are difficult to implement correctly and generally cannot be justified under the classical method. As a specific illustrative example, we consider the time evolution of a large X-ray flare on a low-mass dwarf, Ross 154 (Wargelin et al. 2006).
Stellar coronae consist of magnetically confined hot plasma (T ≈ 1–10 MK) in the outer atmospheres of late-type stars and emit optically thin thermal emission composed of bremsstrahlung continuum and atomic line emission components. Detailed analysis of these line-rich spectra yields valuable clues regarding the temperature and emission structure on the star, its composition, and the processes that drive coronal heating (see, e.g., Rosner et al. 1985). Of special importance are flares, which result from impulsive energy input into the corona by the reconnection of highly strained magnetic flux tubes and provide additional information on the dynamics of the corona. Analysis of flare spectra during decay yields constraints on the physical site and size of the flare loops and on the heating process (see, e.g., van den Oord & Mewe 1989; Pallavicini et al. 1990; Serio et al. 1991; Schmitt & Favata 1999).
A useful technique developed explicitly to analyze hydrodynamically evolving loops is to track their evolution in density-temperature space (Reale et al. 1993, 1997, 2004). This makes possible the modeling of flares to derive physically meaningful parameters such as the length of the loop and the heating function. Unfortunately, a comprehensive analysis along these lines requires that complex model spectra be fit to the data at each stage of the flare evolution, and because fitting requires a large number of counts (generally >500) to produce reliable results, the resolution at which the flare evolution can be studied is strongly limited. This is especially so during the rising phase of flares, where the physically most interesting processes are of short duration and the observed counts are also small. However, hardness ratios can be constructed at relatively finer time binning resolution and can be used as a proxy for the spectral fit parameters. In the following, we apply the Bayesian method described above (see § 2) to Chandra data of a stellar X-ray flare and demonstrate its usefulness.
A large flare was observed on Ross 154, a nearby (2.93 pc) active M dwarf (M4.5 Ve), during a 57 ks observation with the Chandra ACIS-S detector (ObsID 2561; PI: B. Wargelin). During the flare, the counting rate increased by over a factor of 100.
Fig. 4.—X-ray "color-magnitude" diagram for globular cluster Terzan 5. For each of the detected sources, a form of color, C = log10(λ_S/λ_H), scaled by a factor of 2.5 for similarity with optical colors, is plotted along the abscissa, and log10[luminosity (ergs s^−1)] in the 0.5–6 keV band along the ordinate. The modes of the posterior probability functions (computed using quadrature) for all sources are shown, along with the 95% confidence intervals (the error bars on the luminosity are not plotted). The sources that would result in unreliable estimates if the color were computed with the classical method (background-subtracted source counts <5 in at least one band) are marked with open circles, and the remainder are marked with filled circles. The dashed lines represent the color-luminosity tracks expected for neutron stars with blackbody spectra of different temperatures [the open triangles correspond to temperatures of log10(T [K]) = 6.3, 6.2, 6.1, and 6.0 from top to bottom] and varying contributions from a power-law component (slope 1.5; pure blackbody for the rightmost curve, and 10% and 20% contributions from a power-law component in the 0.5–6 keV band luminosity for the middle and left curves, respectively). For more details on these spectral models, see Heinke et al. (2006, in preparation). [See the electronic edition of the Journal for a color version of this figure.]

Because of strong pileup in the core of the PSF (point-spread function), only those data within an annular region surrounding
the center can be used. (Details of the reduction and analysis are given in Wargelin et al. 2006.) The light curve of the source during the flare is shown in Figure 5, along with light curves for smaller passbands (soft, 0.25–1 keV; medium, 1–2 keV; and hard, 2–8 keV). Note that the hard-band light curve peaks earlier than those of the softer bands, an impression that is borne out by the evolution of the color, C (Fig. 5, bottom).
The same data can also be represented in a color-intensity diagram, as in Figure 6, which shows scatter plots of the color and intensity at each time bin along with their computed errors (connected by a thin line to indicate the time sequence). The advantage of doing so is that color is inversely correlated with plasma temperature, and the intensity approximately tracks the plasma density (see footnote 14). A hysteresis-like evolution is discernible in the color-intensity track, even at a time resolution as small as 40 s. However, the error bars on the data points are large enough to obscure this structure at small time bin sizes, whereas at larger bin sizes the number of data bins is smaller and it becomes difficult to discern the totality of the evolutionary track, even though the pattern persists as bin size and starting times are varied. Crucially, the choices of the bin size and the phase (i.e., the starting point) are completely arbitrary, and no single choice can be properly justified. However, Bayesian analysis provides a simple way to work around this difficulty: treat the bin size as a nuisance parameter of the problem and marginalize over it.
Thus, in order to ameliorate the effects of choosing a single time bin size and a starting time, we have carried out a series of calculations using the Monte Carlo method: we obtain different light curves by varying the phase of the binning (i.e., the starting time), and for each light curve thus obtained, we draw 2000 samples of (λ_S, λ_H) for each data point. We hold the time bin size fixed during this process. After cycling through all the phases, the samples of (λ_S, λ_H) are all combined to form a single image (the shaded images in Fig. 6). This procedure, also called "cycle spinning" (Esch 2003), allows us to discern the pattern of the evolutionary track clearly. The longer the source spends at any part of the color-intensity diagram, the darker the shade in the images, and the statistical error in the calculation is reflected by the gradient of the shading. Furthermore, all the cycle-spun images thus constructed, for bin sizes ranging from 40 to 400 s, were then averaged to produce a single coherent image, thus effectively marginalizing over the bin sizes; such an image includes all the errors associated with the analysis (Fig. 7).
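The cycle-spinning procedure described above can be sketched as follows. This is an illustrative reconstruction, not the paper's code: the event times, bands, binning grid, and the simplification of ignoring background are all our own assumptions. For each binning phase, the counts per bin are converted into posterior samples of (λ_S, λ_H), which are accumulated into a single two-dimensional histogram.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical event lists: photon arrival times (s) in the soft and hard bands.
soft_times = rng.uniform(0, 2000, size=600)
hard_times = rng.uniform(0, 2000, size=400)

def cycle_spun_image(soft, hard, bin_size=150.0, n_phases=10,
                     n_samp=200, psi=0.5, bins=50):
    """Accumulate posterior samples of (color, intensity) over binning phases."""
    c_all, r_all = [], []
    t_max = max(soft.max(), hard.max())
    for phase in np.linspace(0.0, bin_size, n_phases, endpoint=False):
        edges = np.arange(-phase, t_max + bin_size, bin_size)
        s_cnt, _ = np.histogram(soft, bins=edges)
        h_cnt, _ = np.histogram(hard, bins=edges)
        for S, H in zip(s_cnt, h_cnt):
            # Gamma posterior for each intensity under a gamma(psi, 0) prior,
            # ignoring background for simplicity.
            lam_s = rng.gamma(S + psi, 1.0, size=n_samp) / bin_size
            lam_h = rng.gamma(H + psi, 1.0, size=n_samp) / bin_size
            c_all.append(np.log10(lam_s / lam_h))   # color C = log10(lam_S/lam_H)
            r_all.append(lam_s + lam_h)             # total rate as intensity proxy
    c_all = np.concatenate(c_all)
    r_all = np.concatenate(r_all)
    img, _, _ = np.histogram2d(c_all, r_all, bins=bins)
    return img

img = cycle_spun_image(soft_times, hard_times)
```

Averaging such images over a range of bin sizes then marginalizes over the bin size, as described in the text.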
Note that this type of analysis is not possible with the classical method. For instance, in the Bayesian case the samples are drawn from the posterior probability distributions of λ_S and λ_H. No such sampling technique is available in the classical case (see footnote 15), and naively using the Gaussian approximations as in equation (4) will overestimate the errors and wash out the signal (see § 3).
Fig. 5.—Top: Light curves of Ross 154. The observed count rate of the source is shown as a function of time since the beginning of the observation, for counts in the soft (dotted histogram), medium (dashed histogram), and hard (dash-dotted histogram) bands and the total (solid histogram). A bin size of 150 s was chosen for convenience, and the sequence of bins is marked along the light curve for future reference. Bottom: Hardness ratio evolution of Ross 154. The color, C = log10(S/H), is plotted for bin sizes of Δt = 150 s (solid histogram) and Δt = 40 s (dotted histogram), along with the associated error bars. Larger values of C indicate a softer spectrum, and vice versa. [See the electronic edition of the Journal for a color version of this figure.]
14. X-ray luminosity L_X ∝ EM × P(T), where EM = n_e² V is the emission measure of coronal plasma with an electron density n_e occupying a volume V, and P(T) is the power emitted by the plasma at temperature T. The emissivity P(T) is a weak function of T over the temperature range 6–20 MK expected in this source, and assuming that the volume of emission remains constant, the observed intensity is proportional to the square of the electron density. For a more detailed modeling of the L_X–T conversion, see Wargelin et al. (2006).
15. New counts realizations can be obtained by bootstrapping from the observed counts, but such sampling will be quantized and will render the shaded images unusable at lower count intensities.
Furthermore, the concept of marginalizing over the nuisance parameter of time bin sizes has no analog in the classical system and cannot be justified there, and cycle spinning would lead to correlations between the estimates at different bins, since the statistical independence of the data would be lost.
A sharp initial hardening of the spectrum, coincident with the onset of a small optical flare (Wargelin et al. 2006), is visible at the beginning of the time sequence (points 1–3; see Figs. 5 and 6). A point worth noting is that the Bayesian analysis prevents overinterpretation of the results: at time point 3, the standard analysis indicates that the spectrum is at its hardest, suggesting either a nonthermal origin to the emission or significant transient absorption, possibly due to a coronal mass ejection, but the shaded image demonstrates that the statistical errors are sufficiently large that a more mundane low-density thermal expansion is quite plausible. This stage is followed by a rapid softening (points 3–8), interpreted as the thermalization of the nonthermal hard X-ray flare. Then the star, which lies in the lower central portion of the color-intensity diagram before the flare, moves to the upper left as the flare is set off (the spectrum hardens as intensity increases; points 8–11), turns right at the flare peak (softens as the deposited energy cascades to lower temperatures; points 11–12), and then moves down the right flank back to its original state (decays in temperature and intensity; points 12–19). The physical consequences of this analysis are discussed in detail by Wargelin et al. (2006).
5. DISCUSSION
5.1. Informative and Noninformative Prior Distributions
Fig. 6.—Color-intensity evolutionary tracks. Spectral hardness increases to the left, and source brightness increases toward the top. The paired color and intensity points (shown separately in Fig. 5) are plotted for bin sizes of 40 s (left) and 150 s (right). The error bars for each quantity are also plotted as horizontal and vertical bars. The points are connected by a thin solid line to aid in visualization of the temporal sequence (see also Fig. 7). Samples of (λ_S, λ_H) obtained from their posterior probability distributions for each bin and for various starting time phases are displayed as the shaded image underneath the evolutionary tracks. [See the electronic edition of the Journal for a color version of this figure.]

Fig. 7.—Ross 154 color-intensity track, same as Fig. 6, but for an image obtained by averaging cycle-spun images at various bin sizes ranging from 40 to 400 s. The persistence of the evolutionary track of the flare on Ross 154 is quite clear in this visualization. The points corresponding to each bin of the light curve obtained at a bin size of 150 s are superimposed on the image, with a white solid line connecting them in temporal sequence. A clear pattern of the flaring plasma evolution can be seen: first rising in temperature and intensity, softening at the peak, and then decaying in temperature and intensity. [See the electronic edition of the Journal for a color version of this figure.]

A major component of Bayesian analysis is the specification of a prior distribution, the probability distribution assigned a priori to the parameters defining the model. This is often deemed controversial because of the apparent subjectivity in the assignment: if different people assign different priors and obtain different results, how is it possible to evaluate them? The answer
to this conundrum is to recognize that every analysis includes a bias brought to it by the researcher, whether in the choice of statistic or even in the choice of analysis method; the specification of a prior distribution codifies this bias and makes it explicit. Furthermore, if prior information about the value or the range of a parameter is indeed available, the prior provides a simple and principled way to include this information in the analysis. If no prior information is available, a so-called noninformative prior distribution must be chosen so as not to bias the result.
In our case, if there is a strong belief as to the value or range of the hardness ratio, we can incorporate it via an informative prior distribution, encoded as a γ distribution (see van Dyk et al. 2001). In addition, the analysis of previously acquired data will produce a posterior probability distribution that can be used as an informative prior for future observations of the same source.
When no prior information is available, we normally use a noninformative prior distribution. Since a Poisson intensity is necessarily positive, three types of noninformative prior distributions immediately present themselves when X | λ ~ Poisson(λ):

1. a noninformative prior distribution on the original scale,

   p(λ) ∝ 1;   (16a)

2. a Jeffreys noninformative prior distribution,

   p(λ) ∝ I_λ^(1/2);   (16b)

3. a noninformative prior distribution under a log transformation,

   p(log λ) ∝ 1, or equivalently, p(λ) ∝ 1/λ;   (16c)

where I_λ = E[−∂² log p(X | λ)/∂λ² | λ] is the expected Fisher information (Casella & Berger 2002). The first noninformative prior distribution (eq. [16a]) is flat from 0 to ∞, the second (eq. [16b]) is proportional to the square root of the Fisher information, and the third (eq. [16c]) is flat on the whole real line under a log transformation. The functional forms of these prior distributions can be generalized to p(λ) ∝ λ^(ψ−1), i.e., λ ~ γ(ψ, 0), where ψ is an index that varies from 0 to 1: the first corresponds to ψ = 1, the second to ψ = 1/2, and the third to ψ = 0 (where this notation indicates that ψ > 0 but arbitrarily close to zero, for reasons described below). We note that these noninformative prior distributions are "improper," i.e., are not integrable. If an improper prior distribution causes the posterior distribution to be improper as well, then no inferences can be made using such nonintegrable distributions. In our case, as long as ψ is strictly positive, the posterior distribution remains "proper," i.e., integrable. Hence, in the third case, we adopt values of ψ that are strictly positive but close to 0, e.g., ψ = 0.01 or ψ = 0.1.
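Under a γ(ψ, 0) prior, the Poisson likelihood is conjugate: observing X counts (with unit exposure and no background) gives the posterior λ | X ~ Gamma(X + ψ, 1). The sketch below is an illustration with invented counts, not the paper's code; it shows how the choice of ψ shifts the posterior noticeably at low counts but barely matters at high counts.

```python
import numpy as np

rng = np.random.default_rng(7)

def posterior_samples(X, psi, n=200_000):
    # Conjugacy: p(lambda) ∝ lambda^(psi - 1) times Poisson(X | lambda)
    # yields lambda | X ~ Gamma(X + psi, 1) (shape-rate form, unit exposure).
    return rng.gamma(X + psi, 1.0, size=n)

for X in (2, 100):                        # a faint and a bright source
    for psi in (0.01, 0.5, 1.0):          # the three noninformative indices
        lam = posterior_samples(X, psi)
        print(f"X={X:3d}, psi={psi:4.2f}: posterior mean {lam.mean():.2f}")
```

The posterior mean is X + ψ, so the prior index acts like ψ additional counts, which is why the choice matters only in the low-counts regime.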
Note that while these prior distributions are noninformative, in the sense that in most cases they do not affect the calculated values of the hardness ratios (see Appendix B), they do codify some specific assumptions regarding the range of values that is believed possible. For instance, if a large dynamic range is expected in the observations, a flat prior distribution on the log scale is more appropriate than a flat prior distribution on the original scale. The choice of the prior distribution is dictated by the analysis and must be reported along with the results.
5.2. R versus C versus HR
At low counts, the posterior distribution of the counts ratio, R, tends to be skewed because of the Poissonian nature of the data; R takes only positive values. The color, C = log10 R, is a log transformation of R, which makes the skewed distribution more symmetric. The fractional difference hardness ratio, HR = (1 − R)/(1 + R), is a monotonically decreasing transformation of R, such that HR → 1 as R → 0 (i.e., a source gets harder) and HR → −1 as R → ∞ (i.e., a source gets softer). The monotonic transformation results in a bounded range of [−1, 1] for HR, thereby reducing the asymmetry of the skewed posterior distribution. The values of R and HR are bounded on one or both sides, while C is unbounded owing to the log transformation. The posterior distribution of any hardness ratio becomes more symmetric as both the soft and hard source intensities increase. Regardless of the size of the intensities, however, the color has the most symmetric posterior distribution among the popular definitions of a hardness ratio. Figure 8 illustrates the effect of the magnitude of the source intensities on the symmetry of the posterior distribution of each hardness ratio; the posterior distribution of C is confirmed to be the most symmetric. In the figure, we fix R = 2, and the soft and hard intensities are determined by beginning with λ_S = 2 and λ_H = 1 [in units of counts (source area)^−1] and increasing the intensities by a factor of 5 in each subsequent column. We assume no background contamination in each simulation.
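The symmetry comparison in Figure 8 can be checked numerically with a small sketch (ours, not the paper's code): we fix the counts at their expected values for the three intensity scales, assume no background and a flat (ψ = 1) prior, and use sample skewness as a crude symmetry measure.

```python
import numpy as np

rng = np.random.default_rng(11)

def skewness(x):
    # Sample skewness: zero for a perfectly symmetric distribution.
    d = x - x.mean()
    return float((d ** 3).mean() / (d ** 2).mean() ** 1.5)

for scale in (1, 5, 25):             # columns of Fig. 8: intensities grow by 5x
    S, H = 2 * scale, 1 * scale      # counts fixed at their expected values
    # Posterior draws under a flat (psi = 1) prior, no background.
    lam_s = rng.gamma(S + 1, 1.0, size=200_000)
    lam_h = rng.gamma(H + 1, 1.0, size=200_000)
    R = lam_s / lam_h
    C = np.log10(R)
    HR = (1 - R) / (1 + R)
    print(f"scale {scale:2d}: skew(R)={skewness(R):+.2f}, "
          f"skew(C)={skewness(C):+.2f}, skew(HR)={skewness(HR):+.2f}")
```

The skewness of C stays close to zero at every scale, while R is strongly right-skewed at low counts, in line with the discussion above.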
Figure 8 also illustrates the effect of using different indices for the noninformative prior distribution on the resulting posterior distribution of each hardness ratio. In the case of R, the posterior distribution becomes diffuse and more skewed toward 0 as the value of ψ decreases; as expected, this is also the case for C. In the case of HR, the posterior distribution has more mass near the boundaries (i.e., ±1) as the value of ψ decreases, thereby exhibiting a U-shaped density in the case of very faint sources. As the intensities increase, however, the choice of a prior distribution has less effect on the posterior distribution of any of the hardness ratios.
Finally, as pointed out in § 3, for R and C the mode of the posterior probability distribution is a robust estimator, whereas for HR the mean of the distribution is a better choice. (Notice that the posterior mode of HR in the low-counts scenario is −1 when ψ = 0 in Fig. 8.)
5.3. Advantages
A significant improvement that our method provides over the classical way of computing hardness ratios is that we use the correct Poisson distribution throughout and do not make any limiting (high-counts) Gaussian assumptions. Thus, while the classical method works well only with high-counts data and fails to give reliable results at low counts, our method is valid in all regimes. Indeed, because the observed counts are nonnegative integers, it is not appropriate to model low counts with Gaussian distributions, which are defined on all the real numbers.
Because our methods are based on Poisson assumptions in a fully model-based statistical approach, we need not rely on plug-in estimates of the hardness ratio. Instead, we can evaluate the full posterior probability distribution of the hardness ratio, which provides reliable estimates and correct confidence limits even when either or both of the soft and hard counts are very low. In particular, our method is not limited to "detectable" counts and can properly calculate upper and lower bounds in cases in which the source is detected in only one passband. For high counts, our method gives more precise error bars for the hardness ratios than the classical method (as defined by the coverage; see Table 1 and
Appendix A), although both methods yield similar results.
Further, a priori information can be embedded into our model
and can be updated by data, thereby producing more accurate
results as we collect more data.
5.4. Limitations
Unlike the quantile-width method (Hong et al. 2004), the calculation of hardness ratios is limited by necessity to preselected nonoverlapping passbands. In general, we must use different band definitions in order to achieve the maximum discriminating power for soft and hard sources. A priori, with no knowledge of the character of the source populations, there is no way to ensure that the chosen passbands are the most efficient for splitting the spectrum and obtaining maximum leverage to distinguish spectral model parameters. However, this restriction allows for a more uniform analysis and comparison of results across different instruments and telescopes.
Because the Bayesian method does not allow for a simple analytic solution similar to standard error propagation in the Gaussian case, the computational methods used to determine the probability distributions become important. We have implemented a Monte Carlo integration method (the Gibbs sampler; see § A1) and a method based on numerical integration (quadrature; see § A2). The Gibbs sampler is based on Monte Carlo simulation and hence causes our estimates to have simulation errors in addition to their true variability; to reduce the level of simulation error, the Markov chain must be run for many iterations. On the other hand, quadrature computes the posterior distribution precisely as long as the number of bins is large enough; however, its computation becomes expensive as the counts become large, because of the binomial expansion in equation (A5). In general, the Gibbs sampler is much faster than the method based on quadrature, but care must be taken to ensure that the number of iterations is sufficient to determine the posterior mode and the required posterior interval with good precision.
Fig. 8.—Posterior distributions of hardness ratios calculated for R (top row), C (middle row), and HR (bottom row), with different source intensities (low at left and high at right) and prior distributions. The different curves in each figure represent the posterior probability distributions computed with different noninformative prior distribution indices, i.e., ψ = 0 (solid curve), ψ = 1/2 (dashed curve), and ψ = 1 (dotted curve). At higher counts (large Poisson intensities; right column), the posterior distributions tend to be symmetric and the effect of the prior distributions is minimal; i.e., the posterior distributions with different indices are nearly the same. At small counts (small Poisson intensities; left column), the nonsymmetric shape of the posterior distribution for each hardness ratio becomes clear, as does the effect of the choice of a noninformative prior distribution. [See the electronic edition of the Journal for a color version of this figure.]
We recommend using the Gibbs sampler for high counts and the
quadrature for low counts.
6. SUMMARY
We have developed a method to compute hardness ratios of
counts in nonoverlapping passbands that is fully Bayesian and
properly takes into account
1. the Poissonian nature of the data,
2. background counts, allowing for differences in collecting
areas, effective areas, and exposure times between the source and
background regions, and
3. both informative and noninformative prior information.
The algorithm is implemented in two forms: a Monte Carlo-based Gibbs sampler that samples from the posterior probability distributions of the counts intensities, and a second form that numerically integrates the joint probability distribution of the counts intensities using quadrature.
We carry out comparison tests between our method and the classical method and show that the Bayesian method works even in the low-counts regime, where the classical method fails, and matches the results of the classical method at high counts, where a Gaussian approximation of the errors is appropriate.
We apply the method to real-world test cases of

1. determining candidate quiescent low-mass X-ray binaries in the Terzan 5 cluster, where we show that it can be applied even in cases where the classical method is entirely inapplicable, such as those where the source is detected in only one passband, and

2. tracking the time evolution of a flare on the dMe star Ross 154, where we demonstrate the flexibility of the method in working around the conflicting limitations imposed by time bin sizes and time resolution.
Additional applications of these methods include spectral classification of the X-ray sources detected in deep surveys and other galaxies, and spectral variability of AGNs (active galactic nuclei) and X-ray binaries. The physical properties of the source influence the source intensities λ_S and λ_H; given a source spectral model, the two quantities are related. Thus, variations in the hardness ratios reflect changes in the source properties (e.g., sources with different temperatures have different colors). A relation between source properties, physical model parameters, and the hardness ratio is a natural extension of this work and will be presented in the future.
We thank Yue Wu for assistance with code development and Dong-Woo Kim and Peter Edmonds for useful discussions. The authors gratefully acknowledge funding for this project partially provided by NSF grants DMS-0104129, DMS-0438240, and DMS-0406085 and by NASA contracts NAS8-39073 and NAS8-03060 (CXC) and NASA grant NAG5-13056. This work is a product of joint work with the California-Harvard astrostatistics collaboration (CHASC), whose members are T. P., V. K., A. S., D. v. D., and A. Z., and J. Chiang, A. Connors, D. Esch, P. Freeman, H. Kang, X.-L. Meng, J. Scargle, E. Sourlas, and Y. Yu.
APPENDIX A
COMPUTING THE HARDNESS RATIOS
A1. MONTE CARLO INTEGRATION: GIBBS SAMPLER
Monte Carlo methods can be used in Bayesian statistical inference if representative values of the model parameters can be simulated
from the posterior distribution. Given a sufficient simulation size, these simulated values can be used to summarize and describe the
posterior distribution and hence to carry out Bayesian statistical inference. For example, the posterior mean can be computed by averaging the
simulated values, and posterior intervals are computed using quantiles of the simulated values. Markov chain Monte Carlo (MCMC)
methods (Tanner & Wong 1987) accomplish Monte Carlo simulation by constructing a Markov chain with a stationary distribution
equal to the target posterior distribution. The Markov chain is then run and, upon convergence, delivers the desired Monte Carlo
simulation of the posterior distribution. The Gibbs sampler (Geman & Geman 1984) is a special case of MCMC that constructs the
Markov chain through alternating conditional simulation of subsets of the model parameters, each conditional on the others. A more detailed
description of these methods can be found in van Dyk (2003) and the Appendix of van Dyk et al. (2001).
In order to construct a Gibbs sampler, we embed the statistical model described in § 2.1 into a larger model. Since the observed counts
are the sum of the source counts (η) and the background counts (β), we write S = η_S + β_S and H = η_H + β_H. That is, the source counts
are modeled as independent Poisson random variables,

    η_S ∼ Poisson(e_S λ_S),   η_H ∼ Poisson(e_H λ_H),                (A1)

and the background counts in the source exposure area as independent Poisson random variables,

    β_S ∼ Poisson(e_S ξ_S),   β_H ∼ Poisson(e_H ξ_H),                (A2)

where λ and ξ denote the expected source and background counts in the source region. This implies equation (5), where the detected
photons in the source region are a convolution of the source and background counts, i.e., S = η_S + β_S and H = η_H + β_H.
Using the method of data augmentation, we treat the background counts in the source exposure area (β_S and β_H) as missing data; the
source counts (η_S and η_H) are fully determined once β_S and β_H are known. Intuitively, it is straightforward to estimate the Poisson
intensities if the detected photons are split into the source and background counts. Based on this setting, the Gibbs sampler produces Monte
Carlo draws of λ_S and λ_H, as well as stochastically imputing the missing data (β_S and β_H). The conditional independence of λ_S and λ_H
implies that a Monte Carlo simulation of each hardness ratio can be obtained by simply transforming that of λ_S and λ_H. Due to the
HARDNESS RATIOS IN POISSON REGIME 623
No. 1, 2006
conditional independence, both the soft and hard bands also have exactly the same sampling steps, and thus we illustrate the Gibbs sampler
only for the soft band. First, the joint posterior distribution of λ_S, ξ_S, and β_S is given by

    p(λ_S, ξ_S, β_S | S, B_S) ∝ p(S | λ_S, β_S) p(B_S | ξ_S) p(β_S | ξ_S) p(λ_S) p(ξ_S)
                              ∝ [1 / ((S − β_S)! β_S!)] λ_S^{S−β_S+ψ_1−1} ξ_S^{B_S+β_S+ψ_3−1} e^{−(e_S+ψ_2)λ_S − (e_S+re_S+ψ_4)ξ_S}.    (A3)

That is, conditional on the total soft counts (S), the unobserved background counts in the source exposure area (β_S) follow a binomial
distribution: given the current iterates of the parameters, λ_S^(t) and ξ_S^(t), step 1 is given by

1. draw β_S^(t+1) from β_S | (λ_S^(t), ξ_S^(t), S, B_S) ∼ Binomial(S, ξ_S^(t) / (λ_S^(t) + ξ_S^(t))),

for t = 1, ..., T. Next, steps 2 and 3 draw the source and background intensities from gamma distributions. In particular, steps 2
and 3 find the next iterates of the intensities using

2. draw λ_S^(t+1) from λ_S | (ξ_S^(t), β_S^(t+1), S, B_S) ∼ Gamma(S − β_S^(t+1) + ψ_1, e_S + ψ_2), and
3. draw ξ_S^(t+1) from ξ_S | (λ_S^(t+1), β_S^(t+1), S, B_S) ∼ Gamma(B_S + β_S^(t+1) + ψ_3, e_S + re_S + ψ_4),

for t = 1, ..., T. To accomplish one Gibbs iteration, these steps are implemented for the soft and hard bands. After iterating the
Gibbs sampler T times, we collect a posterior sample {λ_S^(t), λ_H^(t) : t = t_0 + 1, ..., T} for a sufficiently long burn-in period^16 t_0. The
analytical calculation to determine burn-in is far from computationally feasible in most situations. However, visual inspection of
plots of the Monte Carlo output is commonly used for determining burn-in. More formal tools for determining t_0, called convergence
diagnostics, have been proposed; for a review, see Cowles & Carlin (1996). Under a monotone transformation (see
eqs. [12], [13], and [14]) of the posterior samples, T − t_0 Monte Carlo draws for each hardness ratio are obtained, which enables us
to find its point estimates and the corresponding error bar (see § 2.2.1 for details).
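The three conditional draws above are straightforward to implement. The following is a minimal, stdlib-only Python sketch of the sampler for both bands, not the paper's own code: the function name, starting values, and the default prior indices (ψ_1 = ψ_3 = 1/2, ψ_2 = ψ_4 = 0, a Jeffreys-type choice) are illustrative assumptions.

```python
import random

def gibbs_hardness(S, H, B_S, B_H, e_S=1.0, e_H=1.0, r=1.0,
                   psi=(0.5, 0.0, 0.5, 0.0), T=5000, burn=1000, seed=1):
    """Gibbs sampler for the band intensities lam_S, lam_H (steps 1-3)."""
    rng = random.Random(seed)
    p1, p2, p3, p4 = psi

    def binom(n, p):
        # stdlib-only binomial draw as a sum of Bernoulli trials
        return sum(rng.random() < p for _ in range(n))

    def one_band(N, B, e):
        lam, xi = 1.0, 1.0                       # arbitrary starting values
        draws = []
        for t in range(T):
            # step 1: impute the background counts among the N detected photons
            beta = binom(N, xi / (lam + xi))
            # step 2: source intensity given the imputed split
            lam = rng.gammavariate(N - beta + p1, 1.0 / (e + p2))
            # step 3: background intensity given the imputed split
            xi = rng.gammavariate(B + beta + p3, 1.0 / (e + r * e + p4))
            if t >= burn:
                draws.append(lam)
        return draws

    lam_S = one_band(S, B_S, e_S)
    lam_H = one_band(H, B_H, e_H)
    # a monotone transformation of the draws gives draws of the hardness ratio
    HR = [(h - s) / (h + s) for s, h in zip(lam_S, lam_H)]
    return lam_S, lam_H, HR
```

With, say, S = 40, H = 80, B_S = B_H = 10, and r = 1, the posterior means of λ_S and λ_H land roughly near 30 and 70, and point estimates and error bars for HR follow from the empirical quantiles of the transformed draws.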
A2. NUMERICAL INTEGRATION: GAUSSIAN QUADRATURE
Instead of introducing the missing data, Bayes's theorem can be used to compute the joint posterior distribution of λ_S and λ_H analytically,
from which we obtain the posterior distribution of each hardness ratio through quadrature. Because the models for the hard and soft
bands are independent and symmetric, we illustrate the computation only for the soft intensity λ_S. First, the joint posterior distribution of
λ_S and ξ_S in equation (8) is written out as

    p(λ_S, ξ_S | S, B_S) = p(λ_S) p(ξ_S) p(S | λ_S, ξ_S) p(B_S | ξ_S)
                           / ∫_0^∞ ∫_0^∞ p(λ_S) p(ξ_S) p(S | λ_S, ξ_S) p(B_S | ξ_S) dξ_S dλ_S
                         = (λ_S + ξ_S)^S λ_S^{ψ_1−1} ξ_S^{B_S+ψ_3−1} e^{−(e_S+ψ_2)λ_S} e^{−(e_S+re_S+ψ_4)ξ_S}
                           / ∫_0^∞ ∫_0^∞ (λ_S + ξ_S)^S λ_S^{ψ_1−1} ξ_S^{B_S+ψ_3−1} e^{−(e_S+ψ_2)λ_S} e^{−(e_S+re_S+ψ_4)ξ_S} dξ_S dλ_S.    (A4)
Then, applying the binomial expansion to (λ_S + ξ_S)^S, i.e.,

    (λ_S + ξ_S)^S = Σ_{j=0}^{S} [Γ(S + 1) / (Γ(j + 1) Γ(S − j + 1))] λ_S^j ξ_S^{S−j},    (A5)

enables us to analytically integrate ξ_S out of the joint distribution in equation (A4) and obtain an analytical solution for the marginal
posterior distribution of λ_S. Therefore, using the binomial expansion, equation (A4) becomes
    p(λ_S, ξ_S | S, B_S) = [ Σ_{j=0}^{S} (Γ(S + 1) / (Γ(j + 1) Γ(S − j + 1)))
                             (Γ(S − j + B_S + ψ_3) / (e_S + re_S + ψ_4)^{S−j+B_S+ψ_3})
                             (Γ(j + ψ_1) / (e_S + ψ_2)^{j+ψ_1}) ]^{−1}
                           × Σ_{j=0}^{S} (Γ(S + 1) / (Γ(j + 1) Γ(S − j + 1)))
                             λ_S^{j+ψ_1−1} ξ_S^{S−j+B_S+ψ_3−1} e^{−(e_S+ψ_2)λ_S} e^{−(e_S+re_S+ψ_4)ξ_S},    (A6)

and then ξ_S is integrated out of equation (A6), i.e.,

    p(λ_S | S, B_S) = [ Σ_{j=0}^{S} (1 / (Γ(j + 1) Γ(S − j + 1)))
                        (Γ(S − j + B_S + ψ_3) / (e_S + re_S + ψ_4)^{S−j+B_S+ψ_3})
                        (Γ(j + ψ_1) / (e_S + ψ_2)^{j+ψ_1}) ]^{−1}
                      × Σ_{j=0}^{S} (1 / (Γ(j + 1) Γ(S − j + 1)))
                        (Γ(S − j + B_S + ψ_3) / (e_S + re_S + ψ_4)^{S−j+B_S+ψ_3})
                        λ_S^{j+ψ_1−1} e^{−(e_S+ψ_2)λ_S}.    (A7)
^16 The term "burn-in" refers to the number of iterations necessary for the Markov chain to converge and begin to sample from the posterior probability distribution.
PARK ET AL.
624 Vol. 652
Here the prior independence of λ_S and λ_H implies that the joint posterior distribution of these two intensities is decomposed into a
product of their marginal posterior distributions, i.e.,

    p(λ_S, λ_H | S, H, B_S, B_H) = p(λ_S | S, B_S) p(λ_H | H, B_H).    (A8)

This assumption of independence between λ_S and λ_H can be relaxed, and hierarchical modeling for sources to determine clustering
properties and spectral parameters can be devised (in preparation).
Using the joint posterior distribution in equation (A8), we compute the analytical solution to the posterior distribution of each
hardness ratio as follows:

1. the posterior distribution of R is obtained after integrating λ_H out of

    p(R, λ_H | S, H, B_S, B_H) dR dλ_H = p(λ_S, λ_H | S, H, B_S, B_H) |∂(λ_S, λ_H)/∂(R, λ_H)| dλ_S dλ_H
                                       = p(Rλ_H, λ_H | S, H, B_S, B_H) λ_H dR dλ_H;

2. the posterior distribution of C is obtained after integrating λ_H out of

    p(C, λ_H | S, H, B_S, B_H) dC dλ_H = p(λ_S, λ_H | S, H, B_S, B_H) |∂(λ_S, λ_H)/∂(C, λ_H)| dλ_S dλ_H
                                       = p(10^C λ_H, λ_H | S, H, B_S, B_H) 10^C ln(10) λ_H dC dλ_H;
TABLE 2
Coverage of the X-Ray Color (C) Using the Bayesian Method with Different Noninformative Prior Distribution Indices

Columns: λ_H = 0.5, 1.0, 2.0, 4.0, 8.0, 16.0, 32.0, and 64.0 counts (source area)^−1, left to right; each column gives the pair CR^a (%) and AL^b.
Rows: λ_S [counts (source area)^−1], with one line per prior index ψ = 0+, 1/2, and 1.
0.5................................. 0 + 100 23.99 100 22.67 100 21.19 100 19.02 100 14.07 100 11.32 100 11.14 100 11.11
1/2 100 5.10 100 4.88 99.9 4.51 99.9 3.95 100 3.35 99.9 3.09 99.7 3.02 100 3.03
1 100 2.87 100 2.71 99.0 2.50 98.4 2.25 98.4 2.02 97.2 1.90 97.0 1.82 98.4 1.83
1.0................................. 0 + 100 22.95 100 21.59 100 19.61 100 17.40 100 12.24 100 10.03 100 9.84 100 9.51
1/2 100 4.91 100 4.69 99.7 4.31 100 3.77 99.9 3.04 99.8 2.80 99.7 2.74 99.8 2.71
1 100 2.73 100 2.58 99.2 2.37 98.9 2.14 98.5 1.85 97.3 1.73 97.7 1.67 98.0 1.65
2.0................................. 0 + 100 21.42 100 19.57 99.8 17.52 99.7 15.04 99.6 10.54 99.3 8.45 99.9 7.95 99.8 7.82
1/2 99.9 4.52 99.9 4.30 99.7 3.89 99.6 3.27 99.5 2.56 99.2 2.33 99.0 2.21 99.4 2.21
1 99.1 2.51 99.5 2.37 99.7 2.14 98.8 1.87 97.8 1.60 98.1 1.49 97.3 1.40 97.7 1.39
4.0................................. 0 + 100 18.60 100 16.80 99.9 15.18 99.2 13.11 97.9 8.87 97.9 6.06 97.0 5.85 96.8 5.63
1/2 100 3.92 100 3.68 99.6 3.30 99.2 2.64 98.5 1.89 98.0 1.60 97.9 1.53 96.6 1.49
1 98.4 2.24 99.2 2.09 99.0 1.87 99.1 1.59 97.7 1.29 97.3 1.14 97.1 1.08 97.4 1.04
8.0................................. 0 + 100 13.66 100 12.28 99.3 10.93 98.3 8.74 99.2 5.13 98.2 2.50 97.8 2.26 97.4 2.14
1/2 99.9 3.29 100 3.04 99.7 2.64 97.8 1.86 98.8 1.25 97.6 0.96 97.2 0.86 97.4 0.83
1 97.7 1.99 98.7 1.85 98.7 1.63 97.5 1.28 98.1 1.01 97.3 0.83 96.3 0.75 97.1 0.71
16.0............................... 0 + 100 11.37 100 9.96 99.8 8.49 97.7 6.11 98.6 2.41 97.9 0.73 96.7 0.60 95.5 0.54
1/2 99.9 3.10 99.8 2.79 99.2 2.34 97.8 1.60 98.0 0.95 96.6 0.65 95.6 0.55 94.4 0.50
1 97.5 1.91 98.0 1.73 98.2 1.49 97.2 1.14 96.7 0.83 96.6 0.63 95.3 0.54 93.7 0.49
32.0............................... 0 + 100 10.82 100 9.56 99.3 8.00 97.7 5.74 97.3 2.33 95.7 0.59 97.1 0.43 93.8 0.37
1/2 99.9 2.98 99.6 2.71 99.0 2.21 97.8 1.51 96.5 0.88 94.4 0.55 97.2 0.43 94.3 0.36
1 98.2 1.80 97.4 1.65 96.0 1.40 97.1 1.06 96.0 0.75 94.5 0.54 97.2 0.42 94.5 0.36
64.0............................... 0 + 100 10.99 100 9.52 99.7 7.84 97.7 5.69 97.7 2.11 96.3 0.53 94.6 0.36 96.5 0.29
1/2 99.9 3.02 99.6 2.72 99.0 2.23 98.1 1.52 97.7 0.83 94.8 0.50 94.8 0.36 96.5 0.29
1 98.5 1.82 97.4 1.66 97.1 1.40 97.8 1.05 96.6 0.71 94.2 0.49 94.9 0.36 96.5 0.28
a Coverage rate (CR) of the corresponding interval.
b Average length (AL) of the corresponding interval.
TABLE 3
Legend Key for Table 2

Hard-band source intensity, λ_H [counts (source area)^−1]
Soft-band source intensity, λ_S [counts (source area)^−1]:
    Coverage rate of intervals with ψ = 0      Average length of intervals with ψ = 0
    Coverage rate of intervals with ψ = 1/2    Average length of intervals with ψ = 1/2
    Coverage rate of intervals with ψ = 1      Average length of intervals with ψ = 1
Fig. 9.---Graphical representation of the data in Table 2. The symbols are located on a grid corresponding to the appropriate λ_S (abscissa) and λ_H (ordinate), for
ψ = 0 (top), ψ = 1/2 (middle), and ψ = 1 (bottom). The shading of each symbol represents the coverage, with 100% being lightest and the shading progressively
getting darker as the coverage percentage decreases. The sizes of the symbols are scaled as (length of interval)^{1/2}. Note that for small values of ψ, which correspond
to a prior expectation of a large dynamic range in the source colors, the intervals are large when the counts are small (i.e., the choice of prior distribution has a large
effect) and decrease to values similar to those at larger ψ when there are more counts. [See the electronic edition of the Journal for a color version of this figure.]
3. the posterior distribution of HR is obtained after integrating ω = λ_S + λ_H out of

    p(HR, ω | S, H, B_S, B_H) dHR dω = p(λ_S, λ_H | S, H, B_S, B_H) |∂(λ_S, λ_H)/∂(HR, ω)| dλ_S dλ_H
                                     = p((1 − HR)ω/2, (1 + HR)ω/2 | S, H, B_S, B_H) (ω/2) dHR dω.
To numerically integrate out a parameter in these analytical solutions, we employ quadrature, which precisely evaluates the
integral of a real-valued function (and has nothing to do with Gaussian assumptions on the counts statistics); refer to Wichura (1989)
for details of the algorithm. Because of the complexity of the expressions, the probability density for each hardness ratio is approximated at
equally spaced abscissae over a finite range of the hardness ratio. We then summarize the approximate posterior density in order
to calculate the inferences, as detailed in § 2.2.1.
APPENDIX B
EFFECT OF NONINFORMATIVE PRIORS
In order to compare the effects of these noninformative prior distributions, we have carried out simulations for low-counts and high-counts
cases to test the computed coverage against the simulation: Table 2 presents the coverage rate and mean length of the 95%
intervals for the color, C. In order to simplify the comparison, we assume that there is no background contamination and simulate 1000
realizations of source counts for each case. The tables are laid out as a grid of soft and hard source counts per source area (i.e., λ_S vs. λ_H;
see eq. [A1]).
For each (λ_S, λ_H) pair, we report the percentage of the simulations that result in 95% posterior intervals that actually contain (λ_S, λ_H),
along with the mean length of the 95% posterior intervals from the simulations. This calculation is carried out for three choices of the
prior index, ψ = 0, 1/2, and 1, corresponding to the top, middle, and bottom elements in each section of the table. In these simulations,
we theoretically expect the computed 95% posterior intervals to contain the actual value with a probability of 95%. Owing to Monte
Carlo simulation errors, we expect most coverage rates to lie between 0.93 and 0.97, which are three standard deviations away from
0.95; the standard deviation of the coverage probability for 95% posterior intervals is given by [(0.95)(0.05)/1000]^{1/2} ≈ 0.0069 under a
binomial model for the Monte Carlo simulation.
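This coverage calculation is easy to reproduce for a single intensity. The sketch below is a stdlib-only Python version of the simplest setting described above: no background and unit exposure, where the posterior for λ is Gamma(S + ψ_1, rate 1 + ψ_2), and the 95% interval is taken from empirical quantiles of posterior draws. The function names and simulation sizes are illustrative, not the paper's code.

```python
import math
import random

def draw_poisson(rng, lam):
    # Knuth's multiplication method; adequate for the modest intensities here
    L = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= L:
            return k
        k += 1

def coverage_rate(lam_true, psi1=0.5, psi2=0.0, e=1.0,
                  n_sim=1000, n_draw=1000, level=0.95, seed=7):
    """Fraction of simulated 95% posterior intervals that contain lam_true.
    No background: the posterior is Gamma(S + psi1, rate e + psi2), and the
    equal-tail interval comes from empirical quantiles of posterior draws."""
    rng = random.Random(seed)
    tail = (1.0 - level) / 2.0
    hits = 0
    for _ in range(n_sim):
        S = draw_poisson(rng, e * lam_true)                 # simulated data
        draws = sorted(rng.gammavariate(S + psi1, 1.0 / (e + psi2))
                       for _ in range(n_draw))
        lo = draws[int(tail * n_draw)]
        hi = draws[int((1.0 - tail) * n_draw) - 1]
        hits += (lo <= lam_true <= hi)
    return hits / n_sim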
Fig. 10.---Empirical distributions of coverage rates with different prior distribution indices. The first three columns of panels are constructed under different types
of noninformative prior distributions: ψ = 0 (first column), ψ = 1/2 (second column), and ψ = 1 (third column). The fourth column depicts results from the
classical method and is presented for reference. (In calculating the coverage and the length of the intervals, we exclude simulations in which S = 0 or H = 0, since
C is undefined in such cases.) The top panels correspond to the low-count scenarios λ_S ∈ (0.5, 1, 2, 4) or λ_H ∈ (0.5, 1, 2, 4), and the bottom panels correspond to the
remaining cases. The coverage rate is the fraction of simulated intervals that contain the parameter under which the data were simulated. In this case, we compute the
coverage rate based on the 95% posterior intervals for C with different combinations of values for λ_S and λ_H. The histograms represent the empirical distribution of
the coverage rate. The dotted vertical lines represent the expected coverage rate of 95%.
Table 2 presents the coverage rate and average length of posterior intervals for small and large magnitudes of the source intensities. The
key is given in Table 3. For each (λ_S, λ_H) pair, the posterior intervals are simulated using different prior distribution indices (ψ = 0, 1/2,
and 1), and the summary statistics---the coverage rate and the mean lengths of the intervals---are displayed from top (ψ = 0) to bottom
(ψ = 1) within each section. The same information is shown in graphical form in Figure 9. The use of ψ = 0 tends to yield very wide
posterior intervals and to undercover the true X-ray color when the source intensities are low. On the other hand, the other two noninformative
prior distribution indices produce much shorter posterior intervals and maintain high coverage rates. With high counts,
however, the choice of a noninformative prior distribution does not have a noticeable effect on the statistical properties of the posterior
intervals. The same results are summarized in graphical form in Figure 10, where the 95% coverage rates are shown for various cases.
The empirical distributions of the coverage rate computed under either low-count (λ_S = 0.5, 1, 2, or 4, or λ_H = 0.5, 1, 2, or 4) or high-count
(all of the remaining) scenarios are shown, along with a comparison with the results from the classical method (last column). The
distinction between the Bayesian and classical methods is very clear in this figure. Note that the posterior intervals overcover relative to
our theoretical expectation, but to a lesser degree than the classical intervals. Overcoverage is conservative and preferred to undercoverage,
but should be minimized when possible. Considering the low-count and high-count scenarios, Figure 10 suggests using the
Jeffreys noninformative prior distribution (i.e., ψ = 1/2) in general.
At low counts, the shape (and consequently the coverage) of the posterior distribution is dominated by our prior assumptions, as
encoded in the parameter ψ. The coverage rate will also be improved when informative priors are used.
REFERENCES
Alexander, D. M., Brandt, W. N., Hornschemeier, A. E., Garmire, G. P.,
Schneider, D. P., Bauer, F. E., & Griffiths, R. E. 2001, AJ, 122, 2156
Babu, G. J., & Feigelson, E. D. 1997, Statistical Challenges in Modern Astronomy II (Berlin: Springer)
Bradt, H., et al. 1976, ApJ, 204, L67
Brandt, W. N., & Hasinger, G. 2005, ARA&A, 43, 827
Brandt, W. N., et al. 2001a, AJ, 122, 1
---------. 2001b, AJ, 122, 2810
Brown, E. F., Bildsten, L., & Rutledge, R. E. 1998, ApJ, 504, L95
Campana, S., Colpi, M., Mereghetti, S., Stella, L., & Tavani, M. 1998, A&A
Rev., 8, 279
Casella, G., & Berger, R. L. 2002, Statistical Inference (2nd ed.; Pacific Grove:
Duxbury)
Cowles, M. K., & Carlin, B. P. 1996, J. Am. Stat. Assoc., 91, 883
Esch, D. N. 2003, Ph.D. thesis, Harvard Univ.
Feigelson, E. D., & Babu, G. J. 1992, Statistical Challenges in Modern Astronomy (Berlin: Springer)
---------. 2003, Statistical Challenges in Modern Astronomy III (Berlin: Springer)
Gehrels, N. 1986, ApJ, 303, 336
Geman, S., & Geman, D. 1984, IEEE Trans. Pattern Anal. Machine Intelligence, 6, 721
Giacconi, R., et al. 2001, ApJ, 551, 624
Gregory, P. C., & Loredo, T. J. 1992, ApJ, 398, 146
Hasinger, G., & van der Klis, M. 1989, A&A, 225, 79
Heinke, C. O., Grindlay, J. E., Edmonds, P. D., Lloyd, D. A., Murray, S. S.,
Cohn, H. N., & Lugger, P. M. 2003, ApJ, 598, 501
Heinke, C. O., Wijnands, R., Cohn, H. N., Lugger, P. M., Grindlay, J. E.,
Pooley, D., & Lewin, W. H. G. 2006, ApJ, submitted (astro-ph/0606253)
Hong, J. S., Schlegel, E. M., & Grindlay, J. E. 2004, ApJ, 614, 508
Gillessen, S., & Harney, H. L. 2005, A&A, 430, 355
Jansen, F., et al. 2001, A&A, 365, L1
Kashyap, V. L., & Drake, J. J. 1998, ApJ, 503, 450
Kashyap, V. L., Drake, J. J., Güdel, M., & Audard, M. 2002, ApJ, 580, 1118
Kim, D.-W., Fabbiano, G., & Trinchieri, G. 1992, ApJ, 393, 134
Kim, D.-W., et al. 2006, ApJ, 644, 829
Pallavicini, R., Tagliaferri, G., & Stella, L. 1990, A&A, 228, 403
Park, T., et al. 2006, ApJ, submitted
Protassov, R., van Dyk, D. A., Connors, A., Kashyap, V. L., & Siemiginowska, A.
2002, ApJ, 571, 545
Reale, F., Betta, R., Peres, G., Serio, S., & McTiernan, J. 1997, A&A, 325, 782
Reale, F., Güdel, M., Peres, G., & Audard, M. 2004, A&A, 416, 733
Reale, F., Serio, S., & Peres, G. 1993, A&A, 272, 486
Rosner, R., Golub, L., & Vaiana, G. S. 1985, ARA&A, 23, 413
Schmitt, J. H. M. M., & Favata, F. 1999, Nature, 401, 44
Serio, S., Reale, F., Jakimiec, J., Sylwester, B., & Sylwester, J. 1991, A&A,
241, 197
Silverman, J. D., et al. 2005, ApJ, 618, 123
Tanner, M. A., & Wong, W. H. 1987, J. Am. Stat. Assoc., 82, 528
Tuohy, I. R., Garmire, G. P., Lamb, F. K., & Mason, K. O. 1978, ApJ, 226, L17
Vaiana, G. S., et al. 1981, ApJ, 245, 163
van den Oord, G. H. J., & Mewe, R. 1989, A&A, 213, 245
van Dyk, D. A. 2003, in Statistical Challenges in Modern Astronomy III, ed.
E. D. Feigelson & G. J. Babu ( Berlin: Springer), 41
van Dyk, D. A., Connors, A., Kashyap, V. L., & Siemiginowska, A. 2001, ApJ,
548, 224
van Dyk, D. A., & Hans, C. M. 2002, in Spatial Cluster Modelling, ed.
D. Denison & A. Lawson ( London: CRC ), 175
van Dyk, D. A., & Kang, H. 2004, Stat. Sci., 19, 275
Wargelin, B. J., Kashyap, V. L., Drake, J. J., García-Alvarez, D., & Ratzlaff, P. W. 2006, ApJ, submitted
Weisskopf, M. C., Tananbaum, H. D., Van Speybroeck, L. P., & O'Dell, S. L.
2000, Proc. SPIE, 4012, 2
Wichura, M. J. 1989, Dept. Stat. Tech. Rep. 257 (Chicago: Univ. Chicago)
Zamorani, G., et al. 1981, ApJ, 245, 357