Äîêóìåíò âçÿò èç êýøà ïîèñêîâîé ìàøèíû. Àäðåñ îðèãèíàëüíîãî äîêóìåíòà : http://hea-www.harvard.edu/AstroStat/etc/calerr_Xu+2014_ApJ_794_97.pdf
Äàòà èçìåíåíèÿ: Fri Oct 24 21:37:26 2014
Äàòà èíäåêñèðîâàíèÿ: Sun Apr 10 12:58:09 2016
Êîäèðîâêà:

Ïîèñêîâûå ñëîâà: http hea-www.harvard.edu jdrake
The Astrophysical Journal, 794:97 (21pp), 2014 October 20
C

doi:10.1088/0004-637X/794/2/97

2014. The American Astronomical Society. All rights reserved. Printed in the U.S.A.

A FULLY BAYESIAN METHOD FOR JOINTLY FITTING INSTRUMENTAL CALIBRATION AND X-RAY SPECTRAL MODELS
Jin Xu1 , David A. van Dyk2 , Vinay L. Kashyap3 , Aneta Siemiginowska3 , Alanna Connors5 , Jeremy Drake3 , Xiao-Li Meng4 , Pete Ratzlaff3 , and Yaming Yu1

2

1 Department of Statistics, University of California, Irvine, Irvine, CA 92697-1250, USA; jinx@uci.edu, yamingy@ics.uci.edu Statistics Section, Imperial College London, Huxley Building, South Kensington Campus, London SW7 2AZ, UK; dvandyk@imperial.ac.uk 3 Smithsonian Astrophysical Observatory, 60 Garden Street, Cambridge, MA 02138, USA; vkashyap@cfa.harvard.edu, asiemiginowska@cfa.harvard.edu, jdrake@cfa.harvard.edu, pratzlaff@cfa.harvard.edu 4 Department of Statistics, Harvard University, 1 Oxford Street, Cambridge, MA 02138, USA; meng@stat.harvard.edu Received 2014 June 9; accepted 2014 July 16; published 2014 September 26

ABSTRACT Owing to a lack of robust principled methods, systematic instrumental uncertainties have generally been ignored in astrophysical data analysis despite wide recognition of the importance of including them. Ignoring calibration uncertainty can cause bias in the estimation of source model parameters and can lead to underestimation of the variance of these estimates. We previously introduced a pragmatic Bayesian method to address this problem. The method is "pragmatic" in that it introduced an ad hoc technique that simplified computation by neglecting the potential information in the data for narrowing the uncertainty for the calibration product. Following that work, we use a principal component analysis to efficiently represent the uncertainty of the effective area of an X-ray (or -ray) telescope. Here, however, we leverage this representation to enable a principled, fully Bayesian method that coherently accounts for the calibration uncertainty in high-energy spectral analysis. In this setting, the method is compared with standard analysis techniques and the pragmatic Bayesian method. The advantage of the fully Bayesian method is that it allows the data to provide information not only for estimation of the source parameters but also for the calibration product--here the effective area, conditional on the adopted spectral model. In this way, it can yield more accurate and efficient estimates of the source parameters along with valid estimates of their uncertainty. Provided that the source spectrum can be accurately described by a parameterized model, this method allows rigorous inference about the effective area by quantifying which possible curves are most consistent with the data. Key words: methods: data analysis ­ methods: statistical ­ standards ­ techniques: miscellaneous ­ X-rays: general Online-only material: color figures 1. BACKGROUND 1.1. Calibration Uncertainty in High-energy Astrophysics Observed data are always the result of the interaction of the incident spectrum with an instrument such as a telescope and detector assembly. These are described by instrument calibration products such as effective-area curves, energy redistribution matrices, and point-spread functions. The careful specification of these calibration products is critical both for parameter fitting and for properly accounting for the statistical errors of these fits. It is only through instrument calibration that we can transform measured signals into physically meaningful quantities and have any hope of interpreting data analyses in a useful manner. Misspecification of calibration products can lead to serious bias in the fitted parameters, unreliable statistical errors, and uninterpretable results. In practice, it is well known that instrumental properties (e.g., the quantum efficiency of a CCD detector, point-spread functions) are measured with error. Unfortunately, typical analyses only account for nominal estimates of calibration products without regard for their errors and/or their possible misspecification. This can seriously degrade fitted parameters and their error bars. In spectral analysis, for example, Drake et al. (2006) demonstrated that ignoring calibration uncertainty with good-quality, high signal-to-noise ratio data can result in error bars that are underestimated by a factor of as much as five; see their Figure 5. We show that ignoring these errors not only is detrimental to er5

Deceased, Formerly of Eureka Scientific.

ror bars but more importantly can seriously bias the fitted values themselves. Efforts have been made to develop methods that account for calibration uncertainty in high-energy astrophysics, and such methods exist both in other areas of astrophysics and in related fields such as particle physics (Heinrich & Lyons 2007) and observational cosmology (Bridle et al. 2002); see Lee et al. (2011) for a review. The nature of the errors in high-energy calibration products includes their complex correlations for which existing methods generally do not provide reliable results. Modern instruments such as Chandra are calibrated using data from particularly well understood astronomical sources or from labs and comparing these data with theoretical predictions. These measurements are not typically used directly; rather, they are used to tune sophisticated physics-based computer codes that model the instrument as a whole. These codes can be used to derive both nominal estimates of calibration products and measures of their uncertainty. The calibration products are high-dimensional and exhibit complex and large-scale correlation structures among their components. Accounting for this complex uncertainty is further complicated in high-energy astrophysics by the nonGaussian nature of both the source photon counts and the instrument response. The non-Gaussian data along with the complex correlations in the calibration uncertainty mean that existing methods are not by and large applicable in this setting. The general method of combining measurement and calibration errors in quadrature (e.g., Bevington & Robinson 1992), for example, assumes uncorrelated Gaussian errors and a one-to-one relationship between calibration errors and data points. This is not 1


The Astrophysical Journal, 794:97 (21pp), 2014 October 20

Xu et al.

appropriate in the context of the complex correlations exhibited by the wide variety of calibration products computed and recommended by calibration scientists. However, a given product, taken to be error-free, can introduce serious biases into the final fitted values. Thus, calibration uncertainty must be folded into analyses in order to provide statistically principled and scientifically sound results. As we shall see, doing so can allow the data to inform the choice among the possible calibration products. To address these complex correlations, Drake et al. (2006) suggested a bootstrap-like method that relies on the availability of a large representative sample of possible calibration products capturing various calibration uncertainties; we refer to this sample as the calibration library. In particular, they generated a replicate data set for each calibration product in the library and fit it in the usual way. The variability among the resulting fitted model parameters then estimates the effect of calibration uncertainty on the fitted parameters. While this is a useful method to demonstrate the scale of the effect of calibration uncertainty on error bars for the model parameters, generating replicated data sets requires knowledge of typically unknown model parameters. From a practical point of view, the method's reliance on a large calibration library is also problematic, especially considering that calibration products of space-based detectors degrade over time, and hence different calibration libraries are required for different observations. Lee et al. (2011, henceforth Paper I6 ) developed this approach further by, first, replacing the large calibration library with a lowdimensional model for the calibration uncertainty and, second, embedding the model for calibration uncertainty into a Bayesian procedure that simultaneously fits the model parameters and accounts for calibration uncertainty. As suggested by Drake et al. (2006), Paper I modeled the calibration library using a principal component analysis (PCA), but did so in a way that avoided the need for observation-specific calibration libraries. This strategy effectively embeds the instrument-modeling code via the calibration library as an integral part of the statistical computing techniques. This approach, however, supposes purely for simplicity that the observed photon counts and the calibration product are conditionally independent, that is, that the data provide no information for narrowing the calibration uncertainty. An advantage of this independence assumption is that it significantly simplifies the computing, making the algorithm easy to implement. For this reason in Paper I, we call this approach a pragmatic Bayesian method. The independence assumption of the pragmatic Bayesian method also implies that the choice of calibration product is determined by calibration scientists, calibration experiments, and calibration simulations, rather than the data from a particular observation. This may be viewed as an advantage by some researchers, particularly if they do not trust the post-observation model assumptions. The primary objective of this article is to remove the questionable independence assumption of the pragmatic Bayesian approach and allow the data to narrow the calibration uncertainty. This is a more principled approach from a statistical perspective. If a subset of the calibration products are plausible before seeing the data but inconsistent with the data once it is observed, this subset should not play a role in the final analysis.
6

We call this a fully Bayesian method because it fully follows the principles of Bayesian analysis. This approach has other advantages. For example, in addition to the calibration uncertainty quantified through idealized experiments, calibration products are subject to errors stemming from differences between these idealized settings and the variety of actual settings in which the products are used. Indeed, suspected systematic errors cannot be fully understood without taking into account the actual data in any particular observation and/or cross-instrument comparisons that may be made (e.g., Nevalainen et al. 2010). Our fully Bayesian method allows the data to inform our choice of possible calibration products. In practice, we find that relatively large data sets ( 104 counts) are needed to obtain appreciable power in narrowing calibration uncertainty. Like the pragmatic Bayesian method, the fully Bayesian approach embeds a model derived from a PCA of the calibration library into a larger statistical model. Unlike the pragmatic Bayesian method, however, it then marginalizes over calibration uncertainty while conditioning on the observed data, whereas the pragmatic method did so without conditioning on the data. In this regard, the fully Bayesian method is in line with methods proposed by Bridle et al. (2002) and Heinrich & Lyons (2007)for handling systematic errors in cosmology and particle physics, respectively. These proposals, however, use a parameterized form for the systematics under which marginalization can be achieved analytically. While these specific proposals are not applicable in our setting, they share our emphasis on the general principle of building a joint model that incorporates all sources of uncertainty and then marginalizing over nuisance parameters while conditioning on the observed data. The statistical framework we present is quite general, but in this paper we focus on X-ray spectral analysis with uncertainty in the effective-area curve. We begin by outlining the necessary background on Bayesian spectral analysis, the PCA-based calibration model proposed in Paper I, and the pragmatic Bayesian method. Section 2 lays out our fully Bayesian method and uses a simple numerical example to illustrate its advantage over both the typical strategy of ignoring calibration uncertainty and the pragmatic Bayesian method. Section 3 validates our proposal using a set of simulation studies that include a comprehensive frequency evaluation. We find striking improvement in the statistical properties of estimates and error bars, when large-count spectra ( 104 counts) are generated with an effective-area curve consistent with the prespecified calibration uncertainty but the spectra are fit with a misspecified default curve. Section 4 illustrates how the fully Bayesian method works for several data from the Chandra telescope: a collection of QSOs observed near the aimpoint of ACIS-S and described with absorbed power-law models; a bright O star system at a large off-axis location on ACIS-S2 and modeled as absorbed multithermal spectra; and coadded long-duration grating observation of an isolated neutron star modeled as a blackbody spectrum. Finally, we discuss the consequences and future directions of our method in Section 5. An appendix details the Markov Chain Monte Carlo (MCMC) algorithms we designed to implement our method; they rely on the pyBLoCXS module (Siemiginowska et al. 2011) in Sherpa and existing algorithms for the pragmatic Bayesian method.7 A glossary of the symbols we use appears in Table 1.
7 A version of our software that implements the pragmatic Bayesian fitting of Paper I is currently available within CIAO/Sherpa. A newer version that includes both the pragmatic and fully Bayesian algorithms will soon be made publicly available at http://github.com/astrostat/pyblocxs/ and eventually made available within CIAO/Sherpa.

We refer to Lee et al. (2011)as Paper I simply for easy reference; both the current paper and Lee et al. rely on earlier work that characterize calibration uncertainty (Drake et al. 2006) and lay out the framework of MCMC-based Bayesian spectral analysis (van Dyk et al. 2001). In contrast to van Dyk et al., who use Gibbs-based MCMC, both Paper I and this work adopt Metropolis and Metropolis­Hastings (MH) type MCMC for the sake of flexibility in astrophysical model specification.

2


The Astrophysical Journal, 794:97 (21pp), 2014 October 20 Table 1 Glossary of Symbols Used in the Text Symbol A A0 A 0 Al ¯ A Aprop (t +1) (t +1) ApB , AfB A CI E E e eprop (t +1) (t +1) efB , epB g I i I J j K KpyB L l L M p pstd , ppB , pfB R 2 rj T t
(t +...)

Xu et al.

Description An effective-area (ARF) curve The default effective-area curve on which the calibration library is based. The observation-specific effective-area curve. Effective-area curve l in the calibration library The average of the effective-area curves in the calibration library A proposed effective-area curve in an MH sampler An effective-area curve simulated with the pragmatic and fully Bayesian sampler A set of effective areas, the calibration library A confidence interval Energy of incident photon Detector channel at which the detector registers the incident photon The low-dimensional PCA representation of A; see Equation (5) Value of e proposed in an MH sampler Value of e simulated with the pragmatic and fully Bayesian samplers Generic proposal distribution in a MH sampler Number of inner iterations in MCMC loop, typically 10 Inner iteration number or index Information obtained prior to data acquisition, for example, by calibration scientists Number of components used in PCA analysis, here 8 Principal component number or index An MCMC kernel The MCMC kernel used in pyBLoCXS Number of replicate effective-area curves in calibration library Replicate effective-area number or index The likelihood function Number of replicated draws of per draw of A in Iterated MH within PCG sampler A generic prior or posterior distribution Standard, pragmatic Bayesian, and fully Bayesian posterior distributions Energy redistribution matrix (RMF) Eigenvalue or PC coefficient of component l in the PCA representation Number of MCMC iterations Main MCMC iteration number or index The superscript indicates the running index of random draws A uniformly distributed random number between zero and one Eigen- or feature-vector for component l in the PCA representation Data, typically used here as counts spectra in detector PI bins Correlation used to validate the choice of I in the MH within PCG sampler The acceptance probability in an MH sampler Spectral model parameters Estimate of A value of proposed in an MH sampler Value of simulated with standard, pragmatic Bayesian, and fully Bayesian samplers Generic notation for unknown quantities in analysis, viz., = or = (A, ) A value of proposed in an MH sampler Error bars under the standard, pragmatic, and fully Bayesian methods

u vj Y ^ ^ prop (t +...) (t +...) std , fB , prop std , pB , fB ^ ^ ^

(t +...) pB

1.2. Bayesian Spectral Analysis To fix ideas, we focus attention on a general spectral analysis problem in which the observed photon count in channel E is modeled according to the Poisson distribution,8 Y (E ) Poisson
E

(E ; )A(E )R (E ; E )+ B (E ) ,

(1) where (E ; ) is the source spectral intensity at energy E, is the (vector) spectral parameter, A(E ) is the effective-area curve at energy E, R (E ; E ) is the energy redistribution matrix of the detector, and B (E ) is the background intensity in channel
8 The notation Y Distribution is a standard way in statistics literature to indicate that a variable (e.g., the observed counts) is modeled using the specified distribution (e.g., Poisson).

E . The photon counts in each channel, Y (E ), are independent Poisson variables. For notational simplicity, we represent the collection of observed photon counts by Y = {Y (E )}, the effective area by A = {A(E )}, and the photon redistribution matrix by R = {R (E ; E )}. We focus on methods that treat A as an unknown quantity and allow its uncertainty to affect both the fit and the error bars of . Although our methods can be employed to account for uncertainty in R or in both A and R,we do not pursue these possibilities in this article. To fit the spectral parameters, , given the observed photon counts, Y, while accounting for calibration uncertainty, we adopt a Bayesian framework. In particular, we quantify our state of knowledge before having seen the data using a so-called prior distribution and that after having seen the data using a posterior distribution. Bayes's theorem allows us to transform the prior distribution into the posterior distribution by conditioning on the observed counts. In particular, suppose that represents 3


The Astrophysical Journal, 794:97 (21pp), 2014 October 20

Xu et al.

the quantities that are to be estimated and I is the information we have before seeing the data, including that used to estimate the calibration products and/or their uncertainty. We compare methods that treat only as unknown with others that treat both and A as unknown, so either = or = (, A), while R (and sometimes A) is part of I . In this setting Bayes's theorem states that the posterior distribution of given Y and I is p( |Y, I ) =
L(Y |, I ) p ( |I ) , p (Y | I )

(2)

where p( |I ) is the prior distribution of , L(Y |, I ) is the likelihood of Y given , and p(Y |I ) = p(Y |, I )p( |I )d is the normalizing constant that ensures that p( |Y, I )integrates to 1. We include I here to emphasize that analyses always rely on some information external to Y. To simplify notation, however, we assume that all probability distributions are conditional on I and omit it from our notation for the remainder of the article. We also implicitly assume that the adopted spectral model is appropriate. Its misspecification can lead to biases in addition to those caused by misspecification of the calibration products. Substituting = (, A) into Equation (2) and assuming that the prior distributions for and A are independent, we can write the posterior distribution as p(, A|Y ) L(Y |, A) p( ) p(A), (3)

where we have omitted the denominator of Equation (2) because it is determined by the numerator. We typically use a diffuse prior distribution on representing prior ignorance and an informative prior distribution on A representing the information obtained from calibration studies. Whereas our primary goal is to consider methods for joint inference for and A using Equation (3), we also compare such methods with the standard approach that treats A as fixed and known. For clarity we refer to this approach as the standard method. For example, in a Bayesian analysis (e.g., van Dyk et al. 2001), the standard method involves estimating using its posterior distribution given the observation, Y, and the nominal effective-area curve associated with this observation, A , that 0 is, using pstd ( | Y, A0 ) L(Y | , A0 ) p( ). (4) Because this approach assumes that A = A , it does not account 0 for calibration uncertainty. Paper I illustrates that this can lead to misleading estimates of and can significantly underestimate the error bars associated with these estimates. Nevertheless, because this is the standard approach in practice, we treat it as a baseline in our numerical comparisons. 1.3. Quantifying Calibration Uncertainty The specification of the posterior distribution in Equation (3) requires that we formulate a prior distribution on A that encapsulates the calibration uncertainty. Although they were not working in a Bayesian setting, Drake et al. (2006) generated a library of ACIS effective-area curves. This was accomplished by explicitly including uncertainties in each of the subsystems of the telescope (UV/ion shield transmittance, CCD quantum efficiency, and the telescope mirror reflectivity) using truncated Gaussian distributions for the parameters of different instrument models and by modifying a spline correction curve that multiplies a default curve. More recently, we compiled a second calibration library to represent uncertainty in the LETGS+HRC-S grating/detector system (J. Drake et al., in preparation). 4

This includes corrections applied to the telescope mirror reflectivity, grating obscuration and efficiency, UV/ion shield transmittance, micro-channel plate quantum efficiency and uniformity, etc. Additionally, spline knots were set at all prominent spectral edges due to materials that were used in the construction of the telescope, grating, and detector. In our numerical studies, we illustrate how either of these libraries can be incorporated into a fully Bayesian analysis. Both consist of L = 1000 simulated effective-area curves, each of length 1078 in the ACIS library and each of length 16,384 in the LETGS+HRC-S library. The former is used in Sections 4.1 and 4.2, and the latter in Section 4.3. In our general notation, we represent a calibration ¯ library by A = {A1 ,A2 ,...,AL }, define A to be the arithmetic mean curve of the calibration library, and let A0 denote the de¯ fault effective-area curve associated with the library; A and A0 are similar but not necessarily equal. In practice, the calibration library must be large to fully represent the uncertainty in high-dimensional calibration products. To summarize this sample into a concise and usable form, Paper I implemented a PCA on the mean-subtracted calibra¯ ¯ tion sample, {A1 - A,...,AL - A}. PCA is a mathematical procedure that uses orthogonal transformations to convert a set of possibly correlated variables into a set of linearly uncorrelated variables called principal components. Approximately 8 (20) principal components (out of 1000) account for 97% (99%) of the variability in the ACIS calibration library. As in Paper I, we conduct a Bayesian analysis that treats A and as unknown. We use the PCA summary of the calibration library to formulate the prior distribution for A, p(A). In particular, we suppose that under p(A), ¯ A(e) = A +(A - A0 )+ 0
J

ej rj vj ,
j =1

(5)

where A is the user-generated observation-specific effective0 2 area curve, rj and vj are the principal component eigenvalues and eigenvectors, and ej are independent standard normal ¯ deviations.9 Since A A0 , we can view Equation (5) as starting with the user-specified effective area, A , and adding the random 0 ¯ term J=1 ej rj vj to account for uncertainty; A - A0 adjusts j for the necessary mean subtraction of A when conducting PCA. To simulate replicate effective-area curves under the prior distribution given in Equation (5), we only need to draw J independent standard normal deviations, (e1 ,...,eJ ), and evaluate Equation (5). We treat A(e) as the generic notation for the effective-area curve and continue to simply write A when its explicit dependence on e is not pertinent. Using Equation (5) to summarize the calibration library involves several assumptions. First, we assume that the uncertainty in A can be described by a multivariate normal distribution. The similarity of the effective-area curves in A means that most of the correlations among the components of this distribution are very strong (i.e., near 1). Equation (5) stipulates that the distributions associated with calibration uncertainty for observationspecific effective-area curves differ only in their means and that they have the same variance. This means that we can use the
An additional residual term, = L=J +1 rj vj , may also be included in j Equation (5). Adding eJ +1 can help to account for the full range of calibration uncertainty when J is small, or for components that contribute significantly over small energy ranges, yet make up a small fraction of the overall variance of A.
9


The Astrophysical Journal, 794:97 (21pp), 2014 October 20

Xu et al.

A [cm2]

A [cm2]

400

0.3

1

10

-40

0

0 0.3

40

1

10

A [cm2]

A [cm2]

400

0.3

1

10

-40

0

0 0.3

40

1

10

A [cm2]

A [cm2]

400

0.3

1

10

-40

0

0 0.3

40

1

10

A [cm2]

A [cm2]

400

0.3

1
energy [keV]

10

-40

0

0 0.3

40

1
energy [keV]

10

Figure 1. PCA representation of several effective-area curves from the calibration library. The left column plots four randomly selected Al A, one in each row, along with their PCA representation, Al (e), for several values of J. The original curves are plotted in dot­dashed black, and the PCA representations are plotted as dashed blue and solid red for J = 1 and 8, respectively. The right column is constructed in the same manner but plots the error in our approximation, Al = Al (e) - Al . Although Al (e) deviates from Al , even with J = 8, the right column shows that the scale of this deviation is quite small and that overall using J = 8 concisely captures the structure of each of the effective-area curves. (A color version of this figure is available in the online journal.)

variance of the calibration library of Drake et al. (2006) to represent the variance of any observation-specific effective area. In practice, this procedure avoids generating a calibration library for each observation while still allowing us to account for calibration uncertainty in a practical manner. Figure 1 illustrates the performance of PCA in summarizing the structure of individual effective-area curves in A. It shows that with J = 8, the reconstructed effective-area curve nicely captures the structure of the original A A. This means that we can capture the vector A of length 1078 with the vector e of length 8, by using A(e). We use J = 8 in all of our numerical studies. This PCA-based representation, A(e), is critical for both the pragmatic Bayesian method of Paper I and the fully Bayesian method described here. It not only provides a simple way to quantify calibration uncertainty but also allows us to evaluate both p(A, | Y ) and p(A) using Equations (3) and (5). The evaluation of both of these distributions is necessary for our MCMC sampler. 1.4. A Pragmatic Bayesian Method As noted above (see Equation (4)), standard analyses assume that the effective area is fixed. That is, the parameters are estimated conditional on A . Here, we aim to eliminate this 0 conditioning. Mathematically, this involves treating A as unknown, rather than conditioning on its value, and expressing Equation (3)as pfB (, A | Y ) = p( | A, Y ) p(A | Y ); (6) 5

here the subscript fB indicates that this is the fully Bayesian posterior distribution. Paper I made the "pragmatic assumption" that p(A | Y ) = p(A). This assumption says that the observed data and calibration are independent, that is, the data provide no information for narrowing the uncertainty in the choice of effective-area curve. Under this assumption, the posterior distribution in Equation (6) can be written as ppB (, A | Y ) = p( | A, Y ) p(A), A 0 (7) where p( | A, Y ) is given in Equation (4) with replaced by the generic A. We use the subscript pB in Equation (7) to emphasize that this is the posterior distribution under the pragmatic assumption of Paper I. Under the model in Equation (7), inference for is based on its marginal posterior distribution, ppB ( | Y ) = p( | A, Y ) p(A) dA. (8)

The pragmatic Bayesian method accounts for calibration uncertainty in a conservative manner. The assumption that p(A | Y ) = p(A) ignores information in the data that may reduce uncertainty in A and hence in . We now consider methods that allow Y to narrow the uncertainty of A. 2. A FULLY BAYESIAN SOLUTION 2.1. Motivation and Theory Under the fully Bayesian posterior distribution, the marginal distribution of given in Equation (8) is replaced with pfB ( | Y ) = pfB (, A | Y ) dA, (9)


The Astrophysical Journal, 794:97 (21pp), 2014 October 20

Xu et al.

Standard 1.15 1.15

Pragmatic Bayes 1.15 0.09 0.11 0.13
H

Fully Bayes





1.05

1.05

0.95

0.95

0.09

0.11

N

0.13
H

0.15

N

0.15

0.95 0.09

1.05



0.11

N

0.13
H

0.15

Figure 2. Comparison of the standard (left), pragmatic Bayesian (middle), and fully Bayesian (right) methods. Each panel compares a complete Monte Carlo posterior sample of values of = (,NH ) with its true value, marked with a red square; NH is shown in units of 1022 cm-2 . When fit using an incorrect effective-area curve, the standard method can result in misleading estimates of (see left panel). The pragmatic Bayesian method, on the other hand, averages over all a priori possible effective-area curves and significantly enlarges the posterior variance for (see middle panel). Although the centers of the posterior distributions under the standard and pragmatic Bayesian methods are similar, the larger error bars computed with the latter allow them to include the true values of . Finally, the posterior distribution under the fully Bayesian method shifts toward the true value of , allowing it to cover the true value while avoiding the large error bars of the pragmatic Bayesian method (see right panel). (A color version of this figure is available in the online journal.)

where pfB (, A | Y ) = p( | A, Y ) p(A | Y ). The fully Bayesian posterior distribution of can be viewed as a weighted version of that under the pragmatic posterior: pfB ( | Y ) = 1 T p( | A, Y )
T

p (A | Y ) p(A) dA p(A) p(A(t ) | Y ) , p(A(t ) ) (10)

p( |A(t ) ,Y )
t =1

where A(t ) is drawn from p(A). In principle, we could implement the fully Bayesian method by obtaining a sample {A(t ) ,t = 1,...,T } from p(A) and evaluating Equation (10). (This technique is known as importance sampling.) Unfortunately, evaluating p(A | Y ) = p(, A | Y ) d is extremely challenging in this setting. Alternative computational techniques therefore are needed; we outline our strategy in Section 2.3 and give details in the Appendix. The posterior distribution of A under the pragmatic method, ppB (A | Y ) = = ppB (, A | Y ) d p( | A, Y ) p(A) d = p(A),

is simply equal to the prior distribution of A. The fully Bayesian posterior for A, pfB (A | Y ) = = pfB (, A | Y ) d p( | A, Y ) p(A | Y ) d = p(A | Y ),

however, can be used to learn what effective-area curves are more or less consistent with the observed data. We illustrate how this is done in Section 3. 2.2. The Advantage of a Fully Bayesian Analysis To illustrate the advantage of the fully Bayesian method over the pragmatic Bayesian method, we compared their performance in a simulation study. In Section 3 we reproduce part of 6

the simulation study of Paper I but this time including the fully Bayesian method. Here we give detailed results under the simulation setting called Simulation II in Section 3. In particular, we simulate an absorbed power-law source model with three parameters (power-law index , absorption column density NH , and normalization) using the fake_pha routine in Sherpa (Freeman et al. 2001). The data set was simulated without background contamination, using the XSPEC model wabs*powerlaw and a default photon redistribution matrix (RMF) for ACIS-S. We consider the energy range from 0.3 keV to 7 keV, which is divided evenly into approximately 1000 bins. Here we give detailed results for Simulation II, which set = 1, NH = 1021 cm-2 , generated 105 counts, and used an extremal effective-area curve from the calibration library of Drake et al. (2006). (We use the same extremal effective-area curve as in Paper I and label it Aext ; it is the curve with the highest value of the effective-area in the calibration library.) Figure 2 plots a Monte Carlo sample from the posterior distribution under the standard method, pstd ( | A0 ,Y ), the pragmatic Bayesian method, ppB ( |Y ), and the fully Bayesian method, pfB ( | Y ), where A0 is the default under the calibration library of Drake et al. (2006) and here = (,NH ). The red square in each panel gives the true value of . The MCMC samplers used in Figure 2 are described in the Appendix. Although the error bars computed with the standard method are the smallest, in this simulation the method misses the true value of . The results are precise but inaccurate. This is not unusual when the default effective-area curve, A0 , is misspecified, as it is in this case because the data were generated under a different curve--one that is nonetheless plausible given the calibration uncertainty. The pragmatic method accounts for calibration uncertainty by averaging over all a priori possible effectivearea curves, resulting in much larger error bars that capture the true value of --the method is imprecise but accurate. As pointed out in Paper I, this is a clear advantage over the standard method. Finally, the fully Bayesian method accounts for calibration uncertainty by averaging over those effective-area curves that are consistent with the observed data. The resulting error bars are only slightly larger than those produced with a fixed effective-area curve, but the fitted values for have shifted enough that the error bars still capture the true value. This example clearly illustrates the benefits of the fully Bayesian


The Astrophysical Journal, 794:97 (21pp), 2014 October 20 Table 2 The Four Simulations Used to Compare the Standard, Pragmatic Bayesian, and Fully Bayesian Methods Nominal Counts 10 Simulation Simulation Simulation Simulation I II III IV
5

Xu et al.

Spectral Model Hard X X
a

10

4

Softb

and and can compute A(e). Thus, we write our approximation as g (, e). Specifically, our sampler for the fully Bayesian model is as follows. Pragmatic Proposal Sampler For t = 0, 1,... ,T , ¯ Step 1: Draw ( prop ,eprop ) g (, e), set Aprop = A0 +(A - prop J A0 )+ j =1 ej rj vj , and compute = pfB ( p
fB prop (t (t ,Aprop | Y )g fB) ,efB) prop

X X X X

X X

Notes. a An absorbed power law with = 2, N = 1023 cm-2 . H b An absorbed power law with = 1, N = 1021 cm-2 . H

(t fB) ,A(t ) | Y g ( fB

,e

prop

)

.

(11)

Step 2: Let u be a uniformly distributed random number between 0 and 1 and set
(t (t fB+1) ,efB+1) ,A(t +1) = fB

method: we can characterize the performance of the standard method as "precise but inaccurate," of the pragmatic method as "imprecise but accurate," and of the fully Bayesian method as both "precise and accurate." 2.3. Fitting the Fully Bayesian Model We use a Metropolis­Hastings (MH) MCMC sampler to (t obtain a correlated sample, {(fB) ,A(t ) ),t = 1,...,T }, from fB pfB (, A | Y ). An introduction to MCMC, MH samplers, and the other computational methods that we employ appears in Appendix A.1. We use a type of MH sampler that is known as an independence sampler. It uses an approximation, g, to (t pfB (, A | Y ) that does not depend on (fB) ,A(t ) ) as its proposal fB (t +1) (t +1) distribution when sampling (fB ,AfB ). For such a sampler to work efficiently, any range of values that has appreciable probability under pfB (, A | Y ) must also have appreciable probability under g. This is because the only values that can be simulated with an independence sampler are those values that can be simulated with g. If there is a range of values that has negligible probability under g, values in this range are unlikely to be simulated. To be sure that no important values of (, A) are missed, it is critical that g be an overdispersed approximation to pfB (, A | Y ). That is, g must be a reasonable approximation to pfB (, A | Y ), but with more proclivity to extreme values. Fortunately, the pragmatic posterior distribution, ppB (, A | Y ), provides just such an overdispersed approximation to pfB (, A | Y ). This is illustrated numerically in Figure 2, in which the posterior distribution under the pragmatic Bayesian model (middle panel) is clearly an overdispersed approximation to that under the fully Bayesian model (right panel). More generally, the pragmatic Bayesian approach includes all values of A that are possible under p(A), whereas the fully Bayesian approach more heavily weighs those values that are more consistent with Y. Thus, the fully Bayesian posterior distribution focuses on a narrow range of A. This in turn leads to a narrower range of values of that are plausible under pfB (, A | Y ) than under ppB (, A | Y ). Unfortunately, although ppB (, A | Y ) is an ideally suited overdispersed approximation to pfB (, A | Y ), it cannot be used directly as the proposal distribution, g, because it is difficult to evaluate. (We need to evaluate g in order to compute given below in Equation (11).) In Appendix A.3 we discuss this difficulty and how we construct a simple approximation to ppB (, A | Y ) that is a suitable choice for g. Because g is based on the pragmatic posterior distribution, we call it a pragmatic proposal. Using Equation (5), A is completely represented by the low-dimensional e, so we need only sample e 7

( prop ,eprop ,Aprop ) if u < . (t (t fB) ,efB) ,A(t ) otherwise fB

In our numerical studies, we make one final adaptation. At each iteration we either use the update described in the Pragmatic Proposal Sampler with g set to the approximation to ppB (, A | Y ) described in Appendix A.3, or we conduct a random-walk update to (, e) by replacing Step 1 of the sampler with the following
( Alternative Step 1: Draw ej = ejt ) + N (0, ) for j = 1,...,J and prop g ( | eprop ), set Aprop = prop ¯ A0 +(A - A0 )+ J=1 ej rj vj , and compute j prop

=

pfB ( p
fB

prop

(t (t ,Aprop | Y )g fB) | efB) prop

(t fB) ,A(t ) | Y g ( fB

|e

prop

)

.

The choice between the two versions of Step 1 is made randomly at each iteration of the sampler. Mixing proposal distributions in this way tends to improve the convergence of MCMC samplers. We use this Mixed Pragmatic Proposal Sampler in all of our numerical studies. 3. SIMULATION STUDIES In this section we use a series of simulation studies to demonstrate the circumstances under which the fully Bayesian method is advantageous. In Section 4 we illustrate the method in real data analyses. 3.1. Replicating the Simulation Study of Paper I We begin by replicating four of the eight simulation studies of Paper I, but this time using the fully Bayesian method. The four simulations are conducted exactly as the one in Section 2.2, but data were generated with different spectral models and different nominal counts (see Table 2). Specifically, the simulation represents a 2 â 2 design, with the two factors being 1. either 105 or 104 photon counts are simulated, and 2. data are generated under either a hard spectral power-law model ( = 2,NH = 1023 cm-2 ) or a soft spectral powerlaw model ( = 1,NH = 1021 cm-2 ). Following Paper with the extremal library of Drake from the default I, all four simulated data sets were generated effective-area curve, Aext , from the calibration et al. (2006). This curve differs substantially curve used in the standard method and is


The Astrophysical Journal, 794:97 (21pp), 2014 October 20
Simulation I 105 counts Simulation II 105 counts

Xu et al.

Fully Bayes Pragmatic Bayes Standard

Fully Bayes Pragmatic Bayes Standard

1.6

1.8

2.0



2.2

2.4

2.6

0.8

0.9

1.0



1.1

1.2

1.3

Simulation III 104 counts

Simulation IV 104 counts

Fully Bayes Pragmatic Bayes Standard

Fully Bayes Pragmatic Bayes Standard

1.6

1.8

2.0



2.2

2.4

2.6

0.8

0.9

1.0



1.1

1.2

1.3

Figure 3. Results for Simulations I­IV. The panels show the posterior distributions (curves), fitted values (â), and 1 error bars (horizontal bars) for the spectral power-law index, . Results for the standard, pragmatic Bayesian, and fully Bayesian methods are plotted in dotted black, dashed blue, and solid red, respectively. (The posterior distribution under the standard method is omitted to enhance the comparison of the other two methods. To smooth the plotted pragmatic Bayesian posterior distributions, we use the MH within PCG Sampler of Paper I, see Appendix A.2.) Thetruevalueof is given by the red broken vertical line. In these simulations, we consider the situation in which the default effective-area curve is misspecified to a degree that is consistent with the variability of the calibration library. Because the standard method uses this misspecified curve, it performs poorly. Both the pragmatic and the fully Bayesian methods avoid assuming that A is known without error, allowing them to perform better. Of these two, the fully Bayesian method provides both estimates of that are closer to its true values and narrower error bars. (A color version of this figure is available in the online journal.)

extreme within the calibration library, but it is nonetheless consistent with the uncertainty described by the library. We also examined four additional simulations in Paper I where the simulated data were generated using the default effective-area curve. We exclude discussion of these simulations here because, first, it is unrealistic to assume that the true effective area is known and, second, if such an assumption is made, it only shows, as expected, the standard method performs the best; see Xu (2014) for details. In practice, the effective-area curve is specified with uncertainty, and it is not feasible to correctly specify its value. The four simulations we conduct (Simulations I­IV) are numbered Simulations 5­8 in Paper I. We fit an absorbed power-law model to each of the four simulated data sets, using the standard, pragmatic Bayesian, and fully Bayesian methods. In all cases the standard method is run using the (misspecified) default effective area. This enables us to investigate the effect of misspecification of A0 , where the degree of misspecification is consistent with the range of variability of the calibration library. Posterior distributions, intervals, and fitted values for , computed with each of three methods, run on each simulation, appear in Figure 3. (We use posterior means and central 68% posterior intervals for fitted values and error bars throughout the paper.) The joint posterior distribution of and NH under Simulation II for each of the three methods is discussed in Section 2.2 and appears in Figure 2. In all four simulations, the standard method produces significantly narrower error bars than the other methods, especially in the large-count Simulations I and II. Unfortunately, however, as expected its intervals miss the true value of by a large margin. 8

The pragmatic Bayesian method, by contrast, exhibits similar fitted values but much wider error bars that reflect the variability in the fits resulting from different choices of A in the range of A. Finally, the fully Bayesian method uses the data to exclude some A in the range of A that are inconsistent with the observed spectra. The result is optimal in that the fitted values shift toward the true value of and the widths of the error bars narrow relative to those produced with the pragmatic Bayesian method. Thus, like the pragmatic Bayesian method, the fully Bayesian method provides error bars that contain the true value at nearly the correct statistical rate (68.2% here), but these error bars are narrower than those provided by the pragmatic Bayesian method--and can be much narrower with large counts. We conduct a large-scale simulation study in Section 3.2 to better quantify these trends. The results of the simulations can be understood by considering the statistical errors (due to Poisson fluctuations in the counts) and the systematic errors (due to misspecification of the effective-area curve). The standard method ignores the latter sources of error, and thus, not surprisingly, it provides misleadingly narrow errors and can exhibit significant bias if the misspecification of A is substantial. Because it only considers statistical errors, the standard method underestimates the total error for large-count spectra. The pragmatic Bayesian method, on the other hand, incorporates both statistical and systematic errors, resulting in significantly larger error bars. Because systematic errors do not dissipate as the sample size grows, the error bars produced by the pragmatic Bayesian method are not particularly sensitive to the number of counts. In Paper I, we


The Astrophysical Journal, 794:97 (21pp), 2014 October 20

Xu et al.

A [cm2]

0 0.3

200

400

600

800

1.0

10.0

energy [keV]
20

A - Aext [cm2]

-80 0.3

-40

0

1.0

10.0

energy [keV]
Figure 4. Estimating the range of effective-area curves that are consistent with data. The plots summarize pfB (A | Y ) for Simulation II. The true (extremal) and nominal (default) effective-area curves are plotted in dot­dashed red and dotted black. The pointwise posterior distribution of A is plotted in blue, where the dark (light) blue area corresponds to the central 68.3% (90%) region of pfB (A | Y ) and the solid blue curve plots the posterior mean. The top panel is on the effective-area scale, and the bottom panel subtracts off the true effective-area curve, Aext , to highlight differences. The true curve is contained entirely in the blue region. The posterior mean (solid blue curve) shifts from the prior mean (dotted black) toward the true effective-area curve (dot­dashed red). (A color version of this figure is available in the online journal.)

illustrate that as the photon counts grow and the statistical errors become negligible, the error bars produced by the pragmatic methods will be entirely due to calibration uncertainty. The power of the fully Bayesian method is that it actually uses the data to measure the systematic error. Put another way, it handles systematic error in the same way one would handle statistical error. Thus, its error bars are wider than those provided by the standard method because they incorporate both sources of error, but their width decreases as the number of counts grows because the data are able to narrow the calibration uncertainty. This in turn allows us to go one step further and actually estimate A using its posterior distribution, pfB (A | Y ). We illustrate this in Figure 4, which shows that A0 is inconsistent with the pfB (A | Y ) fit under Simulation II. The shaded light blue area corresponds to a 90% pointwise posterior region10 for A and contains the true effective-area curve, Aext (plotted in red). The posterior region, however, is shifted toward A0 (plotted in black), which serves as the prior mean for A. Thus, the prior distribution on A has a clear influence on its posterior distribution. A similar pattern can be seen in Figure 3. The fitted values of are pulled from the true value toward the best value assuming A = A0 as computed with the standard method. We emphasize, however, that the prior distribution on A used by the fully Bayesian method is in fact making a much weaker assumption than what is commonly made in practice: the assumption that A is exactly equal to A0 . The standard method makes this assumption, and as we demonstrate in Section 3.2, its fitted values exhibit significant bias when A0 is misspecified.
10

3.2. A Frequency Evaluation of the Methods The 2 â 2 simulation study described in Section 3.1 generated only one data set for each of the four simulation studies. In this section we generate 50 spectra for each of the four simulation settings described in Table 2 and fit each of the resulting 50 â 4 simulated spectra with the standard, pragmatic Bayesian, and fully Bayesian methods. The fitted values for and their 68.2% error bars computed using the first ten spectra generated in each simulation appear in Figure 5. Numerical summaries of the entire simulation study appear in Table 3. The frequency analyses confirm the trends we highlighted in Section 3.1. 1. In all four simulations, the standard method on average exhibits the narrowest error bars (mean error bars), the pragmatic Bayesian method has the widest, and those of the fully Bayesian method are in between. (This also holds when A is known with certainty; see Xu 2014.) 2. When A is correctly specified, all three methods perform well, but the standard method performs best because it uses precise, but in practice unattainable, knowledge of A. (These results are not shown; see Xu 2014.) 3. In the more realistic situation in which there is uncertainty in A (i.e., Simulations I­IV), both the standard and pragmatic Bayesian methods exhibit substantial bias.11 The fully Bayesian method reduces this bias, resulting in the overall smallest mean square error.
11

In this case a 90% pointwise region is composed of intervals for each energy, all of which have 90% (central) posterior probability.

By "bias" we mean statistical bias, which is defined as the difference between the expected value of a fitted parameter upon repeated replication of the data and the true value of that parameter.

9


The Astrophysical Journal, 794:97 (21pp), 2014 October 20

Xu et al.

Sim I

Sim I

Sim I

1.7

1.9

2.1

2.3

1.7

1.9

2.1

2.3

1.7

1.9

2.1

2.3

Sim II

Sim II

Sim II

0.90

1.00

1.10

1.20

0.90

1.00

1.10

1.20

0.90

1.00

1.10

1.20

Sim III

Sim III

Sim III

1.7

1.9

2.1

2.3

1.7

1.9

2.1

2.3

1.7

1.9

2.1

2.3

Sim IV

Sim IV

Sim IV

0.90

1.00



1.10
std

1.20

0.90

1.00

pB

1.10

1.20

0.90

1.00

fB

1.10

1.20

Figure 5. Frequency analysis for Simulations I­IV. Each panel gives the fitted values and 1 error bars for resulting from 10 replicate simulations. The red broken vertical line in each panel indicates the true value of . The four rows correspond to the four simulation settings, and the three columns correspond to the standard, pragmatic Bayesian, and fully Bayesian methods, respectively. Owing to the misspecification of A0 , the standard and pragmatic Bayesian methods both exhibit significant bias. The pragmatic Bayesian method adjusts for this bias with wider error bars, while the fully Bayesian method reduces the bias. Overall the fully Bayesian method is able to cover the true value of more often with narrow error bars than either of the other methods, especially with large-count spectra (i.e., Simulations I and II). (A color version of this figure is available in the online journal.) Table 3 Statistical Summaries of the Frequency Simulation Study Standard Coveragea Sim Sim Sim Sim If I If III IV 0.00 0.00 0.40 0.12 Mean Error Barsb 0.030 0.010 0.076 0.026 Std Error Root msee 0.097 0.056 0.125 0.063 Coverage 0.62 0.80 0.68 0.48 Pragmatic Bayesian Mean Error Bars 0.094 0.042 0.123 0.049 Std Error 0.030 0.008 0.079 0.028 Root mse 0.088 0.054 0.113 0.059 Coverage 0.76 0.78 0.66 0.56 Fully Bayesian Mean Error Bars 0.065 0.025 0.084 0.040 Std Error 0.040 0.017 0.078 0.036 root mse 0.052 0.016 0.103 0.053

Biasc 0.092 0.056 0.099 0.057

d

Bias 0.082 0.053 0.083 0.053

Bias 0.035 0.002 0.068 0.040

0.029 0.007 0.078 0.027

Notes. a The proportion of 68% (1 ) intervals that contain the true value of . b The mean length of the 1 error bars. c The mean of the fitted values of minus its true value. d The standard deviation of the fitted values of . e The square root of the mean of the squared deviations between the fitted value and the true value of . f Results for Simulations I and II are highlighted in bold because they show the largest gain in performance for the fully Bayesian method--specifically, when the effective-area curve is misspecified with high-count spectra.

4. The advantage of the fully Bayesian method is most striking when A is misspecified and the data set is large (Simulations I and II). In this case the estimates produced with the fully Bayesian method have much lower bias and root mean square (rms) error than those of the other two methods. Fully Bayesian intervals are much more likely to include the true value of (coverage) than intervals based on a fixed effective-area fit, and the fully Bayesian error 10

bars can be much narrower than the pragmatic Bayesian error bars. These effects dissipate with smaller data sets because substantial data are required to narrow the calibration uncertainty. 4. DATA ANALYSES Here we apply the three methods to real data to demonstrate the practical effects of the fully Bayesian method. We consider


The Astrophysical Journal, 794:97 (21pp), 2014 October 20

Xu et al.

power-law and multithermal sources observed with ACIS-S and a nominal blackbody source observed with Chandra HRC-S/ LETG. 4.1. A Sample of Radio-loud Quasars In radio-loud quasars, X-ray emission originates in close vicinity to a supermassive black hole and is believed to be caused by either an accretion disk or a relativistic jet. This emission can be modeled with a Compton scattering process and the X-ray spectrum described using an absorbed power law: (E ; ) = NE
- - (E ) NH

e

photons cm-2 s

-1

keV-1 ,

(12)

where (E ) is the absorption cross section and the three model parameters are = (N, ,NH ), with N the normalization at 1 keV, the photon index of the power law, and NH the absorption column. We consider a small sample of radioloud quasars whose spectra were observed with Chandra in 2002 (Siemiginowska et al. 2008; Lee et al. 2011). Standard data processing including source extraction and calibration was performed using the CIAO software (Chandra Interactive Analysis of Observations). The number of counts in the observed spectra varies between 8 and 5500. As in Paper I, we excluded two of the spectra (ObsID 3099, which had 8 counts, and ObsID 836, which is better described by a weak thermal spectrum) and reanalyzed the remaining 15 using the standard, pragmatic Bayesian, and fully Bayesian methods. We account for background contamination using a background spectrum extracted over large annuli surrounding each source and a highly structured background model that was originally fit to the blank-sky data provided by the Chandra X-ray Center (see Paper I for details). Only the normalization of the background model was fit in the individual spectral analyses. This approach was used for all but the two lowest-count spectra (<45, both with short 5 ks exposures), for which background was ignored. The sample of quasars was originally analyzed by Siemiginowska et al. (2008), who did not account for calibration uncertainty. A follow-up analysis accounted for calibration uncertainty using the pragmatic Bayesian method and resulted in substantially larger error bars for the high-count data sets (Paper I). As illustrated in Section 3, systematic errors due to calibration uncertainty swamp statistical errors for large data sets. For small data sets, however, the statistical errors may be much larger and the relative effect of calibration uncertainty is therefore less important. Here we reanalyze the same spectra with the fully Bayesian method and illustrate how it is able to deliver low-bias parameter estimates with smaller error bars than in the pragmatic Bayesian method. We fit each spectrum in three ways: 1. with the standard method, 2. with the pragmatic Bayesian method using the Iterated MH within partially collapsed Gibbs (PCG) Sampler (see Appendix A.2), and 3. with the fully Bayesian method using the Mixed Pragmatic Proposal Sampler. With each of the three methods, we use the 15 observationspecific default effective-area curves, A , corresponding to each 0 spectra. For the two Bayesian methods, we use J = 8 in the PCA summary of the calibration library along with the A 0 in Equation (5). When running the Iterated MH within PCG Sampler, we set I = 10 and M = 10; that is, at each iteration 11

we run pyBLoCXS M + I - 1 times, discarding the output of the first I - 1 runs and keeping the output of the final M runs. In the Mixed Pragmatic Proposal Sampler, the two proposal rules were used in equal proportion, and with the random-walk proposal we set = 0.1. Results appear in Figures 6 and 7. Error bars for computed ^ under the pragmatic Bayesian (pB ) and fully Bayesian (fB ) ^ methods are compared with those computed under the standard method (std ) in Figure 6. The left panel replicates results ^ reported in Paper I and shows that for large data sets, for which std is small, the pragmatic Bayesian method produces much ^ larger error bars than the standard method; pB accounts for ^ systematic and statistical errors, whereas std only accounts for ^ statistical errors. For the largest data sets pB is twice as big as ^ std . The right panel of Figure 6 shows that the fully Bayesian ^ method produces error bars more in line with std ; although for ^ the largest data sets fB is bigger than std , it is not as big as pB . ^ ^ ^ Figure 7 compares the 1 intervals for produced by the three methods. Consider an interval computed under the pragmatic ^ ^ Bayesian method: CIpB () = { ± pB ()}, where is the ^ estimate of under this method. To compare this interval with ^ that computed under the standard method, we subtract std from CIpB () and divide by std (). The result is an interval that ^ extends from -1 to 1 if CIpB () and CIstd () are identical, is ^ wider if pB > std , and shifts to the left or right if pB and ^ ^ ^ std differ. These adjusted intervals are plotted in the left panel of Figure 7. There is little shifting to the left or right because the pragmatic Bayesian and standard methods produce similar estimates. For smaller-count data sets the adjusted intervals are nearly {-1, 1}, but for larger-count data sets the adjusted intervals are as much as twice as wide. The right panel of Figure 7 illustrates intervals computed under the fully Bayesian method, CIfB (), that are adjusted in the same manner. For the smaller data sets CIfB () and CIstd () are similar, but for larger data sets they differ: the adjusted intervals tend to shift to the left or right but are not generally much more than two units wide. This means that the fully Bayesian method tends to adjust the fitted value and to increase error bars only moderately. In two cases (ObsID 3097 and 866) the fully Bayesian method shifts the fitted value of by more than std . This constitutes a ^ significant shift in the scientific inference for these observations when calibration uncertainty is accounted for in a principled Bayesian manner. Guidelines as to how many counts are needed for the fully Bayesian fitted values and error bars to differ from those computed with the standard method would be very useful. In practice, however, this depends on a number of key factors, including the instrument in question (i.e., the calibration library), the energy range over which fits are carried out, the fitted spectral model, and the true underlying source spectrum. Thus, there are no useful generic guidelines, and the establishment of such guidelines in specific situations is an important area of future research. This example provides a single illustration of this possibility, where we see that observations with more than about 5000 counts produced noticeably different inference under the two methods; see Figure 7. 4.2. Fitting a Multithermal Spectral Model Our analyses thus far have been carried out using a simple power-law spectral model (in Paper I, in the simulations in Section 3, and for the observed data in Section 4.1). Here we illustrate the pragmatic and fully Bayesian methods using a more


The Astrophysical Journal, 794:97 (21pp), 2014 October 20

Xu et al.

Figure 6. Comparing the error bars for computed under the standard (std ), pragmatic Bayesian (pB ), and fully Bayesian (fB ) methods using spectra from each ^ ^ ^ ^ ^ of 15 radio-loud quasars. Smaller values of std correspond to data sets with more counts. For high-count spectra, pB tends to be substantially larger than std while ^ fB is only moderately larger than std . Thus, the fully Bayesian method is able to provide a principled accounting for calibration uncertainty with only a moderate ^ ^ increase in the final error bars. (A color version of this figure is available in the online journal.)

Figure 7. Standardized pragmatic (left) and fully Bayesian (right) intervals. Using spectra from the 15 quasars, the left panel plots 1 confidence intervals computed ^ with the pragmatic Bayesian method, but re-centered and rescaled using the estimate and error bars of computed under the standard method: (CIpB - std )/std (). ^ If the pragmatic and standard methods return the same estimates and error bars, the plotted intervals would equal the interval (-1, 1). In fact, the plotted intervals are as much as twice this wide for large-count data sets, indicating that pB can be substantially larger than std . The right panel plots 1 confidence intervals computed ^ ^ with the fully Bayesian method, re-centered and rescaled in the same manner. The plotted intervals shift right and left because the fitted values under the standard and fully Bayesian methods differ. The widths of the fully Bayesian intervals, however, are only moderately larger than the standard intervals. (A color version of this figure is available in the online journal.)

complex multicomponent thermal model. To do this, we analyze one of the strongest sources in the Chandra Source Catalog, Ori, a young (<12 Myr) binary system composed of an X-raybright O9 supergiant and a weaker B0 subgiant with about a 3 separation. The source is observed (ObsID 1878) at 15 off-axis, situated on the ACIS-S2 chip, and is detected with a count rate of 1.33 counts s-1 , with >105 counts in 75.46 ks. Because of the large off-axis location, the point-spread function is broad, and the binary cannot be spatially resolved. Furthermore, the source is spread out over more than 20,000 pixels, with maximum fluence at <0.0017 counts s-1 pixel-1 so that CCD pileup effects may be ignored. 12

Our objective is not to model the spectrum in detail, but rather to consider the effect calibration uncertainty has on spectral fitting. (The X-ray emission is thermal, generated from shocked plasma deep in the wind; see Waldron & Cassinelli 2001; Pollock 2007; Raassen et al. 2008; Herve et al. 2013, for various models designed to account for X-ray emission from massive stars.) We construct a variable-abundance absorbed two-temperature APEC spectral model and fit it to the data. This roughly mimics previous attempts to model spectra of Ori obtained with other telescopes such as ASCA (Yamauchi et al. 2000) and XMM-Newton (Raassen et al. 2008). For reference, Yamauchi et al. find two temperature components


The Astrophysical Journal, 794:97 (21pp), 2014 October 20

Xu et al.

Standard

Pragmatic Bayes

Fully Bayes

0.49

0.49

T2

T2 0.47

0.45

0.45

0.170 0.190

0.180

0.190

0.170 0.190

0.180

0.190

0.45 0.170 0.190

T2 0.47

0.47

0.49

0.180

0.190

T1

T1

T1

0.180

T1 0.180

0.170

0.170

0

1000

2000 Iterations

3000

0

1000

2000 Iterations

3000

0.170

T1 0.180
0

T

1

10000

25000 Iterations

Figure 8. Standard, pragmatic Bayesian, and fully Bayesian fits of the T1 and T2 parameters in the spectrum of Ori. The first row plots a complete Monte Carlo simulation from each of the three posterior distributions; the solid red lines correspond to the posterior means of T1 and T2 under each fit. The fitted values (posterior means) for the two parameters are indistinguishable under the standard and pragmatic Bayesian methods but shift noticeably under the fully Bayesian method. Error bars are quantified by the spread of the simulated parameter values under each of the methods. The error bars under the pragmatic Bayesian method are noticeably larger than those computed with the other two methods. Thus, the fully Bayesian method again is able to account for calibration uncertainty by shifting the fitted values rather than by increasing their error bars. The second row presents time-series plots of the Markov chains for T1 used to generate the three Monte Carlo simulations. While all three chains are fairly well behaved, the fully Bayesian chain occasionally "sticks" at the same parameter value for a number of iterations. This is an indication that the pragmatic proposal distribution attributes relatively little probability to some regions of the parameter space with appreciable probability under pfB (, A | Y ). Nonetheless, the algorithm performs well enough for valid inference. (Here we only plot every 10th iteration of the fully Bayesian sampler.) (A color version of this figure is available in the online journal.) Table 4 Fitted Parameters for Ori Model Parameters Standard Analysis Pragmatic Bayesian Fully Bayesian Analysis Analysis 15.0 0.179 0.474 0.23 0.48 0.41 6.4 1.07 ± ± ± ± ± ± ± ± 0.92 0.0022 0.0069 0.024 0.033 0.032 0.78 0.079 15.1 0.181 0.471 0.21 0.47 0.40 5.9 1.00 ± ± ± ± ± ± ± ± 0.87 0.0019 0.0067 0.026 0.034 0.031 0.56 0.059

NH (1020 cm-2 ) 14.8 ± 0.90 T1 (keV) 0.179 ± 0.0016 T2 (keV) 0.474 ± 0.0063 [C, N, O]a 0.23 ± 0.018 [Ne]a 0.48 ± 0.031 [Ni, Mg, Si, Ca, Fe]a 0.41 ± 0.028 Norm1 (1012 cm-5 ) 6.3 ± 0.65 Norm2 (1012 cm-5 ) 1.05 ± 0.057

Note. a Abundances relative to solar (Anders & Grevesse 1989).

at T1 = 0.2, T2 = 0.6 keV, at an absorption column fixed at NH = 2.61020 cm-2 ; Raassen et al. find three temperature components at T1 = 0.55, T2 = 0.2, T3 = 0.07 keV, NH = 51020 cm-2 , with abundances of C, N, and O, being close to solar photospheric (represented by the compilation of Anders & Grevesse 1989), and those of Ne, Mg, Si, and Fe being elevated. The results of applying the standard, pragmatic Bayesian, and fully Bayesian methods to the data are given in Table 4. We find significantly higher absorption columns and lower temperatures and abundances, with our estimated abundance consistent with the low metallicities measured for ´ the nearby NGC 2023 star cluster (Lopez-Garc´ et al. 2013). ia 13

These results are stable with respect to calibration uncertainty, with all three methods producing similar best-fit values. The fits are also summarized in Figures 8 and 9. They show that relative to the standard method, the pragmatic Bayesian method delivers similar fitted values for T1 and T2 but accounts for calibration uncertainty by inflating their error bars. The fully Bayesian method shifts the fitted values by a small amount and generally requires a smaller increase in the error bars. In this case, the posterior correlation of T1 and T2 decreases under the fully Bayesian method; see Figure 9. The second row of Figure 8 shows time-series plots of the Markov chains of T1 used to simulate the three posterior distributions. (The samplers were run with J = 8,I = 10,M = 10, and = 0.1.) While all three converge reasonably well, the fully Bayesian chain occasionally "sticks" at a particular value of the parameter for a number of iterations. This indicates that the pragmatic proposal distribution may attribute relatively little probability to some regions of the parameter space with appreciable probability under pfB (, A | Y ). This can also be seen in Figure 9, where the 90% contours of (normal approximations to) the pragmatic and fully Bayesian posterior distributions are plotted in green and blue, respectively. The fact that the fully Bayesian contour extends outside the pragmatic Bayesian contour indicates that we may have trouble exploring parts of pfB (, A | Y ) using the pragmatic proposal distribution. (Recall that we require the jumping rule of the independence sampler to be an overdispersed approximation to pfB (, A | Y ).


The Astrophysical Journal, 794:97 (21pp), 2014 October 20
0.49

Xu et al.

pB
std

fB

0.170

0.175

0.180

0.185

0.190

T

1

Figure 9. Plot superimposing the 90% contours of the three posterior distributions simulated in the first row of Figure 8. The contours are computed using a normal approximation to the posterior distribution computed under the standard (red), pragmatic Bayesian (green), and fully Bayesian (blue) methods. The posterior means under the three methods are plotted, respectively, as a red cross, a green plus sign, and a filled blue square. Relative to the standard method, the pragmatic Bayesian method delivers similar fitted values and larger error bars, while the fully Bayesian method shifts the fitted value but only moderately inflates the error bars. In this example the correlation of the two parameters decreases under the fully Bayesian method. (A color version of this figure is available in the online journal.)

The Mixed Pragmatic Proposal Sampler, however, mixes the pragmatic proposal with a random-walk update. This second component allows simulation of parameter values with relatively low probability under the pragmatic proposal rule. In this case the sampler was computationally costly. Owing to its "sticking," we ran the fully Bayesian sampler for 30,000 iterations. (The pragmatic sampler was run for 3000.) This, combined with complexity of the multithermal spectral model fit with its eight parameters, resulted in a larger computational burden ( 20â) than the other analyses carried out here (see Sections 4.1 and 4.3; those runs typically took 60­90 minutes on a 64-bit 2.2 GHz CPU with 16 GB of RAM). In the fully Bayesian run, the data prove to be informative about the effective areas. The subset of A that is consistent with the data and the adopted model suggests that the nominal effective area is underestimated; see Figure 10. This is not surprising, since effective areas at large off-axis angles are not as well calibrated as those near the aimpoint. Naturally, this result is conditional on the validity of the adopted spectral model of an absorbed two-temperature thermal emission with variable abundances. Our analysis suggests that at off-axis angles >15 the Chandra effective areas should be increased by 10% over the 0.5­2 keV range and by 5% at high energies >6 keV. 4.3. Fitting a Blackbody Model to a Grating Spectrum As an additional example of the versatility of our approach, we consider data from an entirely different detector, fitted with a

0.45

0.46

0.47

T

2

0.48

A [cm2]

0 0.3

100

300

500

1.0

10.0

energy [keV]
60

A - A0 [cm2]

0 0.3

20

40

1.0

10.0

energy [keV]
Figure 10. Estimating the range of effective-area curves that are consistent with the spectrum of Ori. The plots summarize pfB (A | Y ). The pointwise posterior distribution of A is plotted in blue, where the dark (light) blue area corresponds to the central 68.3% (90%) region of pfB (A | Y ). The pointwise posterior mean of A and its default value, A0 , are plotted as solid blue and dotted black lines, respectively. The top panel shows the full effective areas, and the bottom panel subtracts off the default effective-area curve, A0 , to highlight the differences. (We cannot subtract off the true curve as in Figure 4 because it is unknown.) The data suggest that A0 underestimates the true effective area by 10% over the 0.5­2 keV range and by 5% at high energies >6 keV, and the vignetting correction must be reduced for large off-axis angles for Chandra imaging observations. (A color version of this figure is available in the online journal.)

14


The Astrophysical Journal, 794:97 (21pp), 2014 October 20

Xu et al.

Figure 11. Top five principal components of the Chandra LETGS+HRC-S effective-area calibration library. The first (blue), second (red), third (green), fourth (brown), and fifth (pink) components, weighted by the square root of their eigenvalues, are shown as colored regions. Together, they account for 95.5% of the variance in the LETGS+HRC-S calibration library. The sum of the similarly weighted contributions from the remaining components is shown in gray. The vertical dashed lines indicate the wavelength range over which the data analysis is carried out. (A color version of this figure is available in the online journal.)

blackbody spectral model. We analyze a high-resolution grating spectrum of an isolated neutron star, RX J1856.5-3754 (RX J1856). This source has been observed with the LETGS+HRC-S grating/detector combination numerous times over the Chandra mission and has accumulated 617.735 ks of exposure. RX J1856 is an intrinsically interesting object: it was originally classified as an isolated neutron star but was later suspected to be a quark star (Drake et al. 2002). The X-ray data were fit as a blackbody spectrum with a temperature of T = 61.1 ± 0.3 eV by Drake et al., resulting in a radius estimation of 4­8 km. The optical data are inconsistent with the X-ray predictions and require fitting by a more complex magnetic hydrogen atmosphere model, with a temperature of T 37 eV and a radius R 17 km consistent with a conventional neutron star core (Ho et al. 2007). In the X-ray regime itself, these two models cannot be statistically distinguished (Ho et al.). Here, for the sake of simplicity, we adopt an absorbed blackbody spectrum model to fit the soft X-ray data. This spectral model was fit using exactly the same methods and algorithms as in Section 4.1 (J = 8,I = 10,M = 10, = 0.1.), except that the background was modeled as a fixed eighth-degree polynomial whose coefficients were determined via a standard fit to the background spectrum, and was then incorporated into the source model with a variable normalization. As was done for the Chandra/ACIS-S, we generate a calibration library based on constrained spline curve modifications of known subsystem uncertainties in the LETGS+HRC-S system (J. Drake et al., in preparation). Figure 11 shows the top five principal components (in color), which together account for >95% of the variance in the library. Also shown, in gray, are the summed contributions of the remaining components, which account for <5% of the variance. Notice that there are wavelength regions where this residual could be a significant factor. The two wavelength ranges over which RX J1856 data are informative are shown with vertical dashed lines; the residual components 15

do not affect the analysis over these ranges. Note that in all of our analyses, we use J = 8 principal components. We limit our analysis to the wavelength ranges +[25:59.5, 68:80] å, with the gap centered on the HRC-S chip gap. The chip gap is excluded because, even though the nominal effective area A0 includes the effect of dither and corrects for the drop across the gap, it is subject to additional systematic errors due to deformations in the locations of the active regions on the chip, and these are not included in the calibration library. We co-add the spectra from all positive dispersion data sets of RX J1856. We exclude the negative-order dispersion data for the sake of simplicity. It has a chip gap in a different location, and the uncertainty in its effective area likely exhibits qualitatively different characteristics compared to the positive order, and thus using it would make interpretation of the analysis results difficult. There are 129 kcount in the spectrum over the chosen wavelength range, of which 43.8 kcount are estimated to be due to the background. Although the large number of net counts (85 kcount) puts this data set in the range where calibration uncertainty will likely affect the total error bars, the large fraction of expected background counts makes this an inefficient data set to place constraints on the calibration library. As expected, the application of the pragmatic Bayesian method increases the error bars on the best-fit model parameters; see Table 5. For instance, the temperature estimate remains stable at T = 62.4 eV, but the uncertainty increases from ±0.6 eV for standard analysis to ±1.05 eV for pragmatic Bayesian and decreases slightly to ±0.93 eV for fully Bayesian analysis; see also Figure 12. Unlike the case with Ori (Section 4.2), there is no significant effect on the range of effective-area curves in the calibration library that are consistent with the spectrum of RX J1856 (see Figure 13). We attribute this lack of sensitivity partly to the high background that contaminates the data set and also to the short wavelength range over which the data set is informative. As we


The Astrophysical Journal, 794:97 (21pp), 2014 October 20

Xu et al.

Standard

Pragmatic Bayesian

Fully Bayesian

Temperature [keV] 0.060 0.064

Temperature [keV] 0.060 0.064

0.007 0.009 0.011 Absorption Column [cm-2]

0.007 0.009 0.011 Absorption Column [cm-2]

0

1000 2000 Iterations

3000

0

1000 2000 Iterations

3000

Temperature [keV] 0.060 0.064 0

Temperature [keV] 0.060 0.064

Temperature [keV] 0.060 0.064

Temperature [keV] 0.060 0.064 0.007 0.009 0.011 Absorption Column [cm-2]

1000 2000 Iterations

3000

Figure 12. Standard, pragmatic Bayesian, and fully Bayesian fits of the temperature and absorption column in the spectrum of RX J1856.5-3754. The plots are layered as in Figure 8. In this case, however, there is only a small shift of the fitted values (posterior means given by the solid red lines). There is a noticeably less joint uncertainty in the two parameters under the fully Bayesian fit than under the pragmatic fit. There is no noticeable sticking for the any of the MCMC samplers; compare the second row with that of Figure 8. (A color version of this figure is available in the online journal.) Table 5 Fit Parameters for RX J1856.5-3754 Model Parameters NH (1020 cm-2 ) T (eV) Norm ( L36 a erg s-1 cm 2
D10

Standard Analysis 0.91 ± 0.043 62.4 ± 0.58 0.311 ± 0.0056 66 ± 1.5

Pragmatic Bayesian Analysis 0.91 ± 0.058 62 ± 1.1 0.31 ± 0.023 66 ± 1.8

Fully Bayesian Analysis 0.93 ± 0.056 62 ± 0.9 0.32 ± 0.022 65 ± 1.7

-2

)

Background scale

Note. a L36 is the source luminosity in units of 1036 erg s-1 and D10 is the distance in units of 10 kpc.

see from the principal components displayed in Figure 11, there are long-range correlations present in the library that will be selected for when a source with a suitably long wavelength range is analyzed. In the current analysis, there is a suggestive small curvature bias of 1%­2% in the HRC-S/LETG effective area over the 25­80 å range. This bias, however, is fully contained within the nominal 1 range of the effective-area curves. 5. DISCUSSION In this article we demonstrate the advantage of a fully Bayesian accounting of calibration uncertainty. Relative to the pragmatic Bayesian method, the fully Bayesian method delivers estimates with smaller bias and smaller error bars. As with the pragmatic method, the fully Bayesian method requires largecount data sets to deliver significant gains over the standard method. In low-count data sets, uncertainty stemming from random fluctuations in the counts swamps that due to calibration. The advantage of the fully Bayesian method stems from its ability to handle systematic errors due to calibration uncertainty 16

in the same way one would handle statistical errors. In this way it accounts for calibration uncertainty by shifting the fitted values of spectral parameters. From a scientific perspective, this is preferable to increasing their reported error bars, the mechanism by which the pragmatic Bayesian method accounts for calibration uncertainty. Fitting a spectral model under the fully Bayesian method poses significant computational challenges. We illustrate how we deal with this problem by leveraging the pragmatic Bayesian fit to deliver parameter values simulated under this fully Bayesian posterior distribution. This strategy allows us to simultaneously fit the effective-area curve and the spectral parameters. Thus, we are able to use information in large-count observed spectra to narrow the uncertainty for the calibration product. We focus on the application of the fully Bayesian methods to account for uncertainty in the effective-area curve in X-ray spectral analysis. The general techniques we employ, however, have broad applicability in handling systematic errors. Obvious extensions include accounting for uncertainty in other


The Astrophysical Journal, 794:97 (21pp), 2014 October 20

Xu et al.

A [cm-2]

0

2

4

6

8

10

12

30 1.0

40

50

60

70

80

wavelength [A]

A - A0 [cm-2]

-1.0

-0.5

0.0

0.5

30

40

50

60

70

80

wavelength [A]
Figure 13. Estimating the range of effective-area curves that are consistent with the spectrum of RX J1856.5-3754. The plots summarize pfB (A | Y ). The pointwise posterior distribution of A is plotted in blue, where the dark (light) blue area corresponds to the central 68.3% (90%) region of pfB (A | Y ). The pointwise posterior mean of A and its default value, A0 , are plotted as solid blue and dotted black lines, respectively. The top panel displays the effective areas, and the bottom panel shows the same data with the default effective area, A0 , subtracted off to highlight the differences. (We cannot subtract off the true curve as in Figure 4 because it is unknown.) Here the fitted (posterior mean) and default (dotted black curve) are very similar. (A color version of this figure is available in the online journal.)

calibration products, such as photon redistribution matrices (RMF), exposure maps, and point-spread functions. Because all of these calibration products exhibit more complex structure than an effective-area curve--they are represented by matrices rather than vectors--more sophisticated methods will be needed to summarize their calibration libraries into concise and useable forms. Less obviously, our methods can be used to account for other sources of systematic errors. In our fit of the radio-loud quasars in Section 4.1 and of the background-dominated isolated neutron star RX J1856 in Section 4.3, for example, we used highly structured background models that were originally fit to the blank-sky data provided by the Chandra X-ray Center or to the locally measured background from a spatially offset region. Only the normalization of this background model was fit to the individual sources. Just like an effective-area curve, this background model is a vector that can only be specified with uncertainty. Ignoring this uncertainly can lead to biases and systematic errors. Similarly, the comprehensive atomic line emissivity database, AtomDB (Foster et al. 2012), is often used to specify a spectral model for X-ray data. While this database has been compiled by carefully combining empirical observations with theoretical calculations, its entries are not known exactly. Like a calibration product, its errors exhibit complex high-dimensional correlations that cannot be summarized with simple error bars for each entry. A better strategy is to compile an "AtomDB library" akin to a calibration library that can then be modeled to fully integrate uncertainty in AtomDB into individual spectral analyses. In principle, we can image fitting 17

models that simultaneously account for uncertainly both in multiple calibration products and in multiple model components. In practice, this will involve significant modeling challenges such as accounting for correlations between calibration products and/ or model components. Sophisticated computation methods will also be required, and large data sets will be needed to narrow uncertainty on multiple sources of systematic error. Although such work will involve substantial effort, it is likely to pay significant dividends in reducing bias stemming from calibration and model misspecification, at least when large-count data sets are available. We dedicate this paper to the memory of Alanna Connors. Her commitment to disseminating principled statistical methods among astronomers and to demonstrating their practical benefits in real problems influenced not only this work but the burgeoning discipline of astrostatistics as a whole. She was also an enthusiastic educator of statisticians, helping them to grapple with the subtleties of astrophysical data and models and to become actively involved in solving statistical challenges in astronomy. Although the impact of her efforts remains clearly felt, she is dearly missed. This work was supported by NASA contract NAS8-03060 to the Chandra X-ray Center (V.L.K., A.S., J.D., P.R.), NSF grants DMS 09-07522, DMS-09-07185, DMS-12-08791, and DMS-12-09232 (D.v.D., X.L.M., Y.Y., A.C., J.X.). In addition, D.v.D. was supported in part by a British Royal Society Wolfson Research Merit Award, by a European Commission Marie-Curie Career Integration Grant, and by the STFC (UK).


The Astrophysical Journal, 794:97 (21pp), 2014 October 20

Xu et al.

APPENDIX STATISTICAL COMPUTING In this appendix we outline our computational strategy for fitting the fully Bayesian model. We begin in Appendix A.1 with a brief review of how Bayesian models can be fit using MCMC. As we discuss in Section 2.3, our fully Bayesian samplers are built upon a pragmatic Bayesian sampler. Thus, in Appendix A.2 we describe how we can improve the efficiency of the computational method presented in Paper I for fitting the pragmatic Bayesian model. Appendix A.3 describes the simple approximation to ppB (, A | Y ) that we use in our fully Bayesian sampler. Additional details of our algorithms, samplers, and numerical methods can be found in Xu (2014). A.1. Fitting Bayesian Models Using MCMC To compute fitted parameters and their error bars in a Bayesian analysis, we use quantities such as the mean, variance, and percentiles of the posterior distribution. Although in some simple cases the posterior distribution is a well-known distribution and these quantities can be derived analytically, numerical methods are typically used in practice. One very popular numerical method is Monte Carlo, which involves simulating replicate values, { (1) ,..., (T ) }, from the posterior distribution and using sample means, variances, and percentiles of the replicate values to compute fitted parameters and error bars. The most common Monte Carlo methods involve MCMC. (Brooks et al. (2011) offers a general reference for MCMC, while van Dyk et al. (2001) and Park et al. (2008) review how it can be used in X-ray spectral analysis.) A Markov chain is constructed as an ordered sequence { (t ) ,t = 1, 2,...}, each value of which only depends on the previous value in the chain. Specifically, suppose we were able to obtain a simulated value, (0) , from the posterior distribution, and using it, we could randomly generate another value:
(1)

The partially collapsed Gibbs (PCG) sampler reduces the conditioning in one or more steps of a Gibbs sampler while maintaining the validity of the resulting algorithm (van Dyk & Park 2008). For example, if we draw A in Step 1 of the two-step Gibbs sampler without conditioning on , i.e., A(t +1) p(A | Y ), the resulting sampler would be a PCG sampler. PCG samplers typically converge faster than their parent Gibbs samplers. The MH algorithm draws from a proposal distribution, g ( | (t ) ), that is different from the posterior distribution, but it can be corrected using a rejection step. Specifically, MH Algorithm: For t = 0, 1, 2,... , T , Step 1: Draw
prop

g ( | (t ) ) and compute

=

p( prop | Y )g ( (t ) | prop ) . p( (t ) | Y )g ( prop | (t ) )

Step 2: Let u be a uniformly distributed random number between 0 and 1 and set
(t +1)

=



prop (t )

if u < . otherwise

K( | (0) ),

(A1)

so that (1) is also a simulated value from the posterior distribution. In Equation (A1), K is called the kernel of the MCMC sampler; it describes a random distribution that depends only on the previous value (t -1) and that we use to simulate a next value (t ) . Iterating Equation (A1) in this way results in a chain of values that is called a Markov chain and that delivers a correlated sample from the posterior distribution. In practice, we must start the chain at a typically arbitrary value, (0) , and let it run until it converges to the posterior distribution. Convergence can be checked by running multiple chains and waiting until they appear to sample from the same distribution (e.g., van Dyk et al. 2001; Gelman & Shirley 2011). Deriving kernels that are easy to use and result in MCMC samplers that are fast to converge is key in practice. Two popular methods for deriving MCMC samplers are the Gibbs sampler (e.g., Casella & George 1992) and the MH algorithm (e.g., Hastings 1970). The Gibbs sampler proceeds by sampling each of several components of for their conditional posterior distributions given the other components of . Specifically, again letting = (, A), Two-step Gibbs Sampler: For t = 0, 1, 2,... ,T , Step 1: Draw A(t +1) p(A | (t ) ,Y ). Step 2: Draw (t +1) p( | A(t +1) ,Y ). 18

For a given (t ) , the Gibbs sampler and the MH algorithm both provide a mechanism to randomly generate (t +1) . We refer to this random mechanism as the kernel, denoted by K. The Sherpa module, pyBLoCXS (Bayesian Low Count X-ray Spectral analysis in Python) is adopted from the method of van Dyk et al. (2001) and uses MH algorithms for Bayesian spectral analysis in the Sherpa environment. pyBLoCXS provides a convenient and efficient MCMC sampler for drawing the spectral parameter, , from its posterior distribution given A, that is, from p( | Y, A); see Siemiginowska et al. (2011) and http://hea-www.harvard.edu/astrostat/pyblocxs/ for details of pyBLoCXS. In this article, we use Gibbs-type samplers to draw and A from their joint posterior distribution. We accomplish Step 2 of the Gibbs sampler using pyBLoCXS. That is, pyBLoCXS is incorporated as a component of an overall algorithm that simulates from the full posterior distribution, p(, A|Y ). To clarify its role in the overall algorithm, we refer to the pyBLoCXS kernel as KpyB . Using an MH algorithm for one or more of the component steps of a (partially collapsed) Gibbs sampler is a common hybrid strategy known as MH within (partially collapsed) Gibbs sampling (e.g., van Dyk & Jiao 2014). Our algorithm for implementing the fully Bayesian method builds on the algorithm for the pragmatic Bayesian method. For computational efficiency, we have modified and improved the latter compared with that given in Paper I. Here, we first review the algorithm from Paper I and then detail our improvements to it. A.2. Fitting the Pragmatic Bayesian Model Fitting the pragmatic Bayesian model involves simulating a sequence of effective-area curves from p(A) and running pyBLoCXS for each simulated curve. Specifically, in Section 4.2.2 of Paper I, we proposed the following. MH within PCG Sampler of Paper I For t = 0, 1, 2,... , T , Step 1: Draw ej N (0, 1) for j = 1,...,J and set


The Astrophysical Journal, 794:97 (21pp), 2014 October 20
(t +1) ( epB = (e1t +1) ( ,...,eJt +1)

Xu et al.

¯ ) and A(t +1) = A0 +(A - A0 )+ pB
(t +i/I )

(t +1) J rj vj j =1 ej

.

Step 2 (Inner Loop): For i = 1,...,I , simulate pB KpyB ( | (t +(i -1)/I ) ; Y, A(t +1) ).

The iteration of pyBLoCXS in Step 2 aims to eliminate the (t +1) (t ) dependence of its final output, pB , on its starting value, pB .
(t +1) Only pB is retained; the intermediate draws from the inner

,...,pB , are discarded. (Notice that the loop, pB superscripts on the discarded draws of are fractional; we only retain the draws of with integer superscripts.) This is necessary because A(t +1) is simulated from its marginal distribution in Step pB
(t ) (t +1) 1 and is independent of pB . The simulated value pB should be

(t +1/I )

(t +(I -1)/I )

correlated with A(t +1) , insofar as A is informative for and thus pB A and are correlated under ppB (, A | Y ). Likewise, because (t ) (t +1) A(t ) and A(t +1) are independent, pB and pB should also be pB pB independent. The number of inner iterations, I, is determined by examining the empirical autocorrelation function of KpyB for a (t ) given data analysis. Its value should be set to ensure that pB
(t +1) and pB are approximately independent; see Paper I and van Dyk & Jiao (2014) for discussion. While this MH within PCG sampler effectively delivers fitted values and error bars that account for calibration uncertainty under ppB (, A | Y ), it can be very slow in terms of its required computational time. This is owing to the inner workings of pyBLoCXS which approximates p( | A, Y ) by fitting the spectral model via maximum likelihood (i.e., using the Cash statistic) in Sherpa. This is of little consequence when pyBLoCXS is used with fixed A, because the Sherpa fit only needs to be performed once. In the context of the MH within PCG Sampler, however, pyBLoCXS must be reinitiated with a new Sherpa fit at every iteration because A(t +1) is updated at every iteration. pB Because each Sherpa fit requires about 6­8 s of CPU time, a run of 3000 MCMC iterations requires 5­7 hr on Sherpa fits. Below, we describe an alternate implementation of the MH within PCG sampler that is dramatically faster. In particular, it fully leverages each of the effective-area specific Sherpa fits by (t ) continuing to iterate as in Step 2 after the dependence on pB has worn off in order to obtain several simulated values of per iteration. Specifically, we propose the following. Iterated MH within PCG Sampler For t = 0, 1,... ,T , ( Step 1: Draw ejtM (tM epB +1) ( = (e1tM +1) +1)

results in M times more simulated values of than of A. The advantage is that the extra T (M - 1) simulations of require relatively little computational time. The disadvantage is that these simulated values are correlated whereas the simulated values of the MH within PCG sampler of Paper I are essentially independent. We illustrate this trade-off using the simulation study described in Section 2.2. The left and right columns of Figure 14 show the performance of the MH within PCG Sampler of Paper I and our Iterated Sampler (run with M = 10), respectively. The first row is a time-series plot of the first 300 consecutive draws of (t ) obtained with the two samplers. The blocky nature of the draws obtained with the Iterated Sampler is a result of its infrequent updating of A; each block is drawn with a common effective-area curve. The second row presents the same time-series plots, but as a function of CPU time (in seconds) rather than iteration number. The relative speed of the Iterated Sampler is apparent. The final row shows scatterplots ( of the simulated values of (NHt ) , (t ) ) obtained in about 1 hr. Again the advantage of the Iterated Sampler is clear, despite the blocky nature of its simulated values. Running the Iterated Sampler requires deciding on the values of both I and M, that is, the number of initial simulated values of to discard (I - 1) at each iteration and the number of additional extra simulated values to keep (M). The initial iteration in Step 2 is designed to mitigate the dependence of the simulated values on the input value of at each iteration, e.g., the dependence of (Mt +1) on (Mt ) . To measure this effect, we can compute the correlation of (Mt ) and (Mt +1) , ^
T t =1

(

(Mt )

^ - pB )( (
(Mt )

(Mt +1) 2

^ - pB )

T t =1

^ - pB )

,

(A2)

N (0, 1) for j = 1,...,J and set
+1)

( ,...,eJtM

) and A(tM pB

+1)

¯ = A0 +(A -
(Mt +i/I )

A0 )+

(tM +1) J rj vj j =1 ej

.

Step 2 (Inner Loop): For i = 1,...,I , simulate pB KpyB ( | (t +(i -1)/I ) ; Y, A(t +1) ).
(Mt Step 3: For i = 2,...,M , simulate pB (Mt +i -1) (t +1) ; Y, A ). (tM At iteration t, we only retain epB +1) +i )

(t ) ^ where pB = (1/T M ) TM pB . A statistical hypothesis test of t =1 the correlation being 0 can be rejected if T 2 /(1 -^ 2 ) is greater ^ than about 2 (for large T, e.g., >100). If this threshold is met, the sampler should be rerun with a larger value of I. Van Dyk and Jiao (2014) discuss other methods for choosing and validating I in the general MH within PCG sampler. (The choice of I is less important when implementing the fully Bayesian method; see Appendix A.3.) An optimal value of M can be obtained by minimizing the Monte Carlo variance of the estimated posterior expectation of ^ . This quantity is approximated with pB and is a common choice for the fitted value of . Xu (2014) shows how the ^ Monte Carlo variance of pB depends on M and how it can be approximately minimized on the fly within the Iterated Sampler. Xu (2014) also discusses various choices of error bars ^ for pB that may be less affected by the blocky nature of the chain generated by the Iterated Sampler. We do not purse this topic here, however, because the Iterated Sampler is only an intermediate step toward our goal of an algorithm for fitting the fully Bayesian model.

K

pyB

( |

A.3. Approximating the Pragmatic Posterior Distribution As discussed in Section 2.3, although ppB (, A | Y )isideally suited as an overdispersed approximation to pfB (, A | Y ), it cannot be used as the proposal distribution, g, in the Pragmatic Proposal Sampler for the simple reason that we cannot directly evaluate ppB (, A | Y ). (We need to evaluate g in order to compute given by Equation (11).) The problem becomes 19

, A(tM pB

+1)

, and

(tM (Mt pB +1) ,...,pB +M ) and discard the intermediate draws of obtained internally within the inner loop of Step 2. That is, we retain draws of with integer-valued superscripts and discard those with fractional superscripts. Notice that this strategy


The Astrophysical Journal, 794:97 (21pp), 2014 October 20

Xu et al.



1.15

1.05



0.95

0

50

100

150

200

250

300

0.95 0

1.05

1.15

(a)

(b)

50

100

150

200

250

300

iterations
1.15 1.15
(c) (d)

iterations




0 50 100 150

0.95

1.05

0.95 0

1.05

50

100

150

CPU time(seconds)
1.15 1.15
(e) (f)

CPU time(seconds)

1.05


0.10 0.11 0.12 0.13
H



0.95

N

0.14

0.15

0.95 0.10

1.05

0.11

0.12

N

0.13
H

0.14

0.15

Figure 14. Improved speed of the Iterated MH within PCG sampler. Using the simulation study described in Section 2.2, panels (a) and (b) plot (t ) as a function of iteration number using the sampler of Paper I and the Iterated MH within PCG sampler, respectively. Panels (c) and (d) plot the same, but as a function of CPU (t ) time rather than of iteration. The relative speed of the iterated method is apparent. Panels (e) and (f) are scatterplots of (NH , (t ) ) obtained in a run of about 1 hr for each of the two samplers. The red lines in panels (a)­(d) and squares in panels (e) and (f) indicate the true parameter values. Compare panels (e) and (f): the Iterated Sampler obtains far more draws per unit time. Thus, despite the blocky nature of the chains in panels (b) and (d), the advantage of the Iterated Sampler in terms of computational speed is clear. (A color version of this figure is available in the online journal.)

evident if we write, recalling that and A are a priori independent under the pragmatic Bayesian assumption, ppB (, A | Y ) = p( | A, Y ) (A) =
L(Y | , A) ( ) (A). p(Y | A)

Evaluating the denominator, p(Y | A) = p(Y | , A) ( )d , would require a possibly high-dimensional numerical integration at each iteration of the Pragmatic Proposal Sampler. To avoid this difficulty, we propose a simple approximation to ppB (, A | Y ) that serves equally well as an overdispersed approximation of pfB (, A | Y ). Although ppB (, A | Y ) is difficult to evaluate, it has a relatively simple form. In particular, its marginal distribution for A can be represented by a J â 1 vector, e, the components of which are independent N (0, 1) variables. (Recall that under the pragmatic Bayesian model, A is simulated according to its prior distribution; see Equation (5)). Thus, we only need to approximate ppB ( | A, Y ). To do this, we run the Iterated Sampler described in Appendix A.2 to obtain (tM (tM (tM {epB +1) ,pB +1) ,...,pB +M ) , for t = 0,...,T } and regress
(tM (tM (tM the M replicates of (i.e., pB +1) ,...,pB +M ) ) on each epB +1) using multivariate Gaussian regression (details are given in Xu 2014). This results in a multivariate normal approximation to ppB ( | A, Y ) that we combine with a multivariate standard

normal distribution for e to form the g (, e) used in the Pragmatic Proposal Sampler. In particular, this choice of g serves as the needed overdispersed approximation to pfB (, A | Y ). In this context we run the Iterated MH within PCG Sampler only to obtain the approximation, g. Because g does not need to exactly match ppB (, A | Y ), the choice of I in this sampler is less critical. The choice of M, on the other hand, controls how quickly the sampler runs, and thus remains important for efficient computation. REFERENCES
Anders, E., & Grevesse, N. 1989, GeCoA, 53, 197 Bevington, P., & Robinson, D. 1992, Data Reduction and Error Analysis for the Physical Sciences (New York: McGraw-Hill), 194 Bridle, S., Crittenden, R., Melchiorri, A., et al. 2002, MNRAS, 335, 1193 Brooks, S., Gelman, A., Jones, G. J., & Meng, X.-L. 2011, Handbook of Markov Chain Monte Carlo (London: Chapman & Hall/CRC) Casella, G., & George, E. I. 1992, Am. Stat., 46, 167 Drake, J. J., Marshall, H. L., Dreizler, S., et al. 2002, ApJ, 572, 996 Drake, J. J., Ratzlaff, P., Kashyap, V., et al. 2006, Proc. SPIE, 6270, 62701I Foster, A. R., Ji, L., Smith, R. K., & Brickhouse, N. S. 2012, ApJ, 756, 128 Freeman, P., Doe, S., & Siemiginowska, A. 2001, Proc. SPIE, 4477, 76 Gelman, A., & Shirley, K. 2011, in Handbook of Markov Chain Monte Carlo, ed. S. Brooks, A. Gelman, G. Jones, & X.-L. Meng, (London: Chapman & Hall/CRC Press), 163 Hastings, W. K. 1970, Biometrika, 57, 97 Heinrich, J., & Lyons, L. 2007, ARNPS, 57, 145

20


The Astrophysical Journal, 794:97 (21pp), 2014 October 20 Herve, A., Rauw, G., & Naze, Y. 2013, Massive Stars: From to , held 10­14 June 2013 (available online at http://orbi.ulg.ac.bc/2268/154522) Ho, W. C., Kaplan, D. L., Chang, P., van Adelsberg, M., & Potekhin, A. Y. 2007, MNRAS, 375, 821 Lee, H., Kashyap, V. L., van Dyk, D. A., et al. 2011, ApJ, 731, 126 (Paper I) ´ ´ ´ Lopez-Garc´ M. A., Lopez-Santiago, J., Albacete-Colombo, J. F., Perezia, ´ Gonzalez, P. G., & de Castro, E. 2013, MNRAS, 429, 775 Nevalainen, J., David, L., & Guainazzi, M. 2010, arXiv:1008.2102 Park, T., van Dyk, D. A., & Siemiginowska, A. 2008, ApJ, 688, 807 Pollock, A. 2007, A&A, 463, 1111 Raassen, A. J. J., van der Hucht, K. A., Miller, N. A., & Cassinelli, J. P. 2008, A&A, 478, 513

Xu et al. Siemiginowska, A., Kashyap, V., Refsdal, B., et al. 2011, in ASP Conf. Ser. 442, Astronomical Data Analysis Software and Systems XX, Vol. 442, ed. I. N. Evans, A. Accomazzi, D. J. Mink, & A. H. Rots (San Francisco, CA: ASP), 439 Siemiginowska, A., LaMassa, S., Aldcroft, T. L., Bechtold, J., & Elvis, M. 2008, ApJ, 684, 811 van Dyk, D. A., Connors, A., Kashyap, V., & Siemiginowska, A. 2001, ApJ, 548, 224 van Dyk, D. A., & Jiao, X. 2014, J. Comput. Graphical Stat., in press van Dyk, D. A., & Park, T. 2008, J. Am. Stat. Assoc., 103, 790 Waldron, W. L., & Cassinelli, J. P. 2001, ApJL, 548, L45 Xu, J. 2014, PhD thesis, Department of Statistics, Univ. of California, Irvine Yamauchi, S., Kamimura, R., & Koyama, K. 2000, PASJ, 52, 1087

21