CMB map making using Maximum Entropy Methods
M.P. Hobson & A.W. Jones
Mullard Radio Astronomy Observatory (http://www.mrao.cam.ac.uk/), Cavendish Laboratory, Madingley Road, Cambridge, CB3 0HE, UK
A maximum entropy method (MEM) is presented for separating the emission due to different foreground components from simulated satellite observations of the cosmic microwave background radiation (CMBR). In Jones & Hobson (1998, these proceedings; http://www.mrao.cam.ac.uk/ppeuc/astronomy/papers/aled/aled.html), the method is applied to simulated observations by the proposed Planck Surveyor satellite. The method is also compared with traditional Wiener filtering.
1 Introduction
The importance of making accurate measurements of the fluctuations in the CMBR is now widely
appreciated. Indeed, by making maps of these fluctuations and by measuring their power spectrum,
it is hoped that tight constraints may be placed on fundamental cosmological parameters and that
we may distinguish between competing theories of structure formation in the early Universe such as
inflation and topological defects.
Several ground-based and balloon-borne experiments are planned over the next few years, and these should provide accurate images of the CMBR fluctuations and lead to a significant improvement in the measurement of the CMBR power spectrum. Nevertheless, these experiments are unlikely to be able to achieve the accuracy required to resolve numerous degeneracies that exist in the parameter set of, for example, the standard inflationary CDM model. As a result, a new generation of CMBR satellites is currently in the final stages of design, and it is hoped that these experiments will provide definitive measurements of the CMBR power spectrum as well as detailed all-sky maps of the fluctuations.
According to current estimates, the NASA MAP satellite is due to be launched in 2000, followed by the ESA Planck Surveyor mission in 2005. Both experiments aim to make high-resolution, low-noise maps of the whole sky at several observing frequencies. As with any CMBR experiment, however, the maps produced will contain contributions from various foreground components. The main foreground components are expected to be Galactic dust, free-free and synchrotron emission, as well as the kinetic and thermal SZ effects from galaxy clusters. In addition, significant contamination from extragalactic point sources is also likely.
Aside from extragalactic point sources, the other physical components affecting any CMB experiment have reasonably well-defined spectral characteristics, and we may use this information, together with multifrequency observations, to distinguish between the various foregrounds. Several linear methods have been suggested to perform this separation, many of which are based on Wiener filtering (e.g. Bouchet et al. (1996); Tegmark & Efstathiou (1996); Bouchet et al. (1997)). In this paper, however, we investigate the use of a non-linear maximum entropy method (MEM) for separating out the emission due to different physical components and compare its performance with the Wiener filter approach. We apply these methods to simulated observations from the Planck Surveyor satellite but, of course, the same algorithms can be used to analyse data from the MAP satellite. The application of the MEM technique to simulated interferometer observations of the CMBR is discussed in Maisinger, Hobson & Lasenby (1997), and the method has also been applied to the analysis of ground-based switched-beam observations by Jones et al. (1998).
2 Component separation methods
As a first step in discussing component separation methods, let us consider in more detail how the
simulated data are made. At any given frequency ν, the total rms temperature fluctuation on the sky
in a direction x̂ is given by the superposition of n_c physical components (n_c = 6 in our simulations). It is convenient to factorise the contribution of each process into a spatial template s_p(x̂) at a reference frequency ν_0 and a frequency dependence f_p(ν), so that

\Delta T(\hat{x}, \nu) = \sum_{p=1}^{n_c} \Delta T_p(\hat{x}, \nu) = \sum_{p=1}^{n_c} f_p(\nu)\, s_p(\hat{x}).

In this paper we normalise the frequency dependencies to the reference frequency ν_0, such that f_p(ν_0) = 1.
If we observe the sky at n_f observing frequencies then, in any given direction x̂, we obtain an n_f-component data vector that contains the observed temperature fluctuation in this direction at each observing frequency plus instrumental noise. In order to relate this data vector to the emission from each physical component it is useful to introduce the n_f × n_c frequency response matrix with components defined by

F_{\nu p} = \int_0^\infty t_\nu(\nu')\, f_p(\nu')\, d\nu' ,   (1)

where t_ν(ν′) is the frequency response (or transmission) of the νth frequency channel. Assuming that the satellite observing beam in each channel is spatially invariant, we may write the beam-smoothing as a convolution and, in discretised form, the νth component of the data vector in the direction x̂ is then given by
d_\nu(\hat{x}) = \sum_{j=1}^{N_p} P_\nu(|\hat{x} - \hat{x}_j|) \sum_{p=1}^{n_c} F_{\nu p}\, s_p(\hat{x}_j) + \epsilon_\nu(\hat{x}),   (2)

where P_ν(x̂) is the beam profile for the νth frequency channel, and the index j labels the N_p pixels in each of the simulated input maps; the ε_ν(x̂) term represents the instrumental noise in the νth channel in the direction x̂.
Since in each channel the beam profile is assumed spatially invariant and the noise statistically homogeneous (both reasonable assumptions for small fields), it is more convenient to work in Fourier space, where the convolution in (2) becomes a simple multiplication, and we obtain

\tilde{d}_\nu(k) = \sum_{p=1}^{n_c} R_{\nu p}(k)\, \tilde{s}_p(k) + \tilde{\epsilon}_\nu(k),   (3)

where R_{νp}(k) = P̃_ν(k) F_{νp} are the components of the response matrix for the observations. It is important to note that (3) is satisfied at each Fourier mode k independently. Thus, in matrix notation, at each mode we have

d = R s + \epsilon,   (4)

where d, s and ε are column vectors containing n_f, n_c and n_f complex components respectively, and the response matrix R has dimensions n_f × n_c. Although the column vectors in (4) refer to quantities defined in the Fourier domain, for later convenience they are not written with a tilde.
The significant simplification that results from working in the Fourier domain is clear, since the dimensions of the matrices in (4) are rather small (n_c = 6 and n_f = 10 in our simulations). Thus, the situation reduces to solving a small-scale linear inversion problem at each Fourier mode separately. Once this inversion has been performed for all the measured modes, the spatial templates for the sky emission due to each physical component at the reference frequency ν_0 are then obtained by an inverse Fourier transformation. Owing to the presence of instrumental noise, however, the inverse R^{-1} of the response matrix at each Fourier mode does not exist, and the linear inversion problem in each case is degenerate. The approximate inversion of (4) must therefore be performed using a statistical technique in which the inversion is regularised in some way. This naturally leads us to consider a Bayesian approach.
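As a concrete illustration, the per-mode model (4) can be simulated in a few lines of Python. This is a minimal sketch only: the dimensions n_f = 10 and n_c = 6 are those quoted above, but the response matrix, signal vector and noise level are purely illustrative stand-ins, not the simulated Planck Surveyor quantities.

    import numpy as np

    rng = np.random.default_rng(0)
    n_f, n_c = 10, 6     # observing frequencies and physical components

    # Illustrative stand-ins: in the real analysis R(k) combines the channel
    # beam transforms with the frequency-response integrals of equation (1).
    R = rng.standard_normal((n_f, n_c))                              # response matrix at one mode k
    s = rng.standard_normal(n_c) + 1j * rng.standard_normal(n_c)     # true signal vector
    sigma = 0.1                                                      # assumed per-channel noise level
    eps = sigma * (rng.standard_normal(n_f) + 1j * rng.standard_normal(n_f))

    d = R @ s + eps      # the data vector of equation (4) at this Fourier mode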

2.1 Bayes' theorem
Bayes' theorem states that, given a hypothesis H and some data D, the posterior probability Pr(H|D) is the product of the likelihood Pr(D|H) and the prior probability Pr(H), normalised by the evidence Pr(D),

Pr(H|D) = \frac{Pr(H)\, Pr(D|H)}{Pr(D)}.

In our application, we consider each Fourier mode k separately and, from (4), we see that the data consist of the n_f complex numbers in the data vector d; we take the `hypothesis' to consist of the n_c complex numbers in the signal vector s. We then choose as our estimator ŝ of the signal vector that which maximises the posterior probability Pr(s|d). Since the evidence in Bayes' theorem is merely a normalisation constant, we must therefore maximise with respect to s the quantity

Pr(s|d) \propto Pr(d|s)\, Pr(s),   (5)

which is the product of the likelihood Pr(d|s) and the prior Pr(s).
Let us first consider the form of the likelihood. If the instrumental noise on each frequency channel is Gaussian-distributed, then at each Fourier mode the probability distribution of the n_f-component noise vector ε is described by an n_f-dimensional multivariate Gaussian. Assuming the expectation value of the noise to be zero at each observing frequency, the likelihood is therefore given by

Pr(d|s) \propto \exp(-\epsilon^\dagger N^{-1} \epsilon) \propto \exp[-(d - Rs)^\dagger N^{-1} (d - Rs)],   (6)

where the dagger denotes the Hermitian conjugate and in the second expression we have used (4). We note that no factor of 1/2 appears in the exponent in (6), since it refers to the multivariate Gaussian distribution of a set of complex random variables. The noise covariance matrix N has dimensions n_f × n_f and at any given Fourier mode k it is defined by

N(k) = \langle \epsilon(k)\, \epsilon^\dagger(k) \rangle,   (7)

i.e. its elements are given by N_{νν′}(k) = ⟨ε̃_ν(k) ε̃*_{ν′}(k)⟩, where the asterisk denotes complex conjugation. Thus, at a given Fourier mode, the νth diagonal element of N contains the value at that mode of the ensemble-averaged power spectrum of the instrumental noise on the νth frequency channel. If the noise is uncorrelated between channels then the off-diagonal elements are zero for all k.
We note that the expression in square brackets in (6) is simply the χ² misfit statistic. Since, for a given set of observations, the data vector d, the response matrix R and the noise covariance matrix N are all fixed, we may consider the misfit statistic as a function only of the signal vector s,

\chi^2(s) = (d - Rs)^\dagger N^{-1} (d - Rs),   (8)

so that the likelihood can be written as

Pr(d|s) \propto \exp[-\chi^2(s)].   (9)

Having calculated the form of the likelihood, we must now turn our attention to the form of the prior probability Pr(s).
2.2 The Gaussian prior
If we assume that the emission due to each of the physical components is well approximated by a Gaussian random field, then it is straightforward to derive an appropriate form for the prior Pr(s). In this case, the probability distribution of the sky emission is described by a multivariate Gaussian distribution, characterised by a given sky covariance matrix. Thus, at each mode k in Fourier space, the probability distribution of the signal vector s is also described by a multivariate Gaussian of dimension n_c, where n_c is the number of distinct physical components (n_c = 6 in our simulations). The prior therefore has the form

Pr(s) \propto \exp(-s^\dagger C^{-1} s),   (10)

where the signal covariance matrix C is real with dimensions n_c × n_c and is given by

C(k) = \langle s(k)\, s^\dagger(k) \rangle,   (11)

i.e. it has elements C_{pp′}(k) = ⟨s̃_p(k) s̃*_{p′}(k)⟩. Thus, at each Fourier mode, the pth diagonal element of C contains the value of the ensemble-averaged power spectrum of the pth physical component at the reference frequency ν_0; the off-diagonal terms describe cross-power spectra between the components.
Strictly speaking, the use of this prior requires advance knowledge of the full covariance structure of
the processes that we are trying to reconstruct. Nevertheless, it is anticipated that some information
concerning the power spectra of the various components, and correlations between them, will be
available either from pre-existing observations or by performing an initial approximate separation
using, for example, singular value decomposition (see Bouchet et al. (1997)). This information can
then be used to construct an approximate signal covariance matrix for use in Pr(s).
Substituting (9) and (10) into (5), the posterior probability is then given by

Pr(s|d) \propto \exp[-\chi^2(s) - s^\dagger C^{-1} s],   (12)

where χ²(s) is given by (8). Completing the square for s in the exponential (see Zaroubi et al. (1995)), it is straightforward to show that the posterior probability is also a multivariate Gaussian of the form

Pr(s|d) \propto \exp[-(s - \hat{s})^\dagger E^{-1} (s - \hat{s})],   (13)

which has its maximum value at the estimate ŝ of the signal vector, and where E is the covariance matrix of the reconstruction errors.
The estimate ŝ of the signal vector is found to be

\hat{s} = \left(C^{-1} + R^\dagger N^{-1} R\right)^{-1} R^\dagger N^{-1} d \equiv W d,   (14)
where we have identified the Wiener matrix W. Thus, we find that by assuming a Gaussian prior of the form (10) in Bayes' theorem, we recover the standard Wiener filter. This optimal linear filter is usually derived by choosing the elements of W such that they minimise the variances of the resulting reconstruction errors. From (14) we see that, at a given Fourier mode, we may calculate the estimator ŝ that maximises the posterior probability simply by multiplying the data vector d by the Wiener matrix W. Equation (14) can also be derived straightforwardly by differentiating (12) with respect to s and equating the result to zero (see Appendix A).
As is well known, the assignment of errors on the Wiener filter reconstruction is straightforward, and the covariance matrix of the reconstruction errors E in (14) is given by

E \equiv \langle (s - \hat{s})(s - \hat{s})^\dagger \rangle = \left(C^{-1} + R^\dagger N^{-1} R\right)^{-1}.   (15)

Since the posterior probability (13) is Gaussian, this matrix is simply the inverse Hessian matrix of (minus) the exponent in (13), evaluated at ŝ (see Appendix A).
It should be noted that the linear nature of the Wiener filter and the simple propagation of errors are both direct consequences of assuming that the spatial templates we wish to reconstruct are well described by Gaussian random fields with known covariance structure.
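The per-mode Wiener solution (14) and its error covariance (15) translate directly into a few lines of linear algebra. The following is a minimal sketch only; explicit matrix inverses are used for transparency, whereas a production code would use linear solves instead.

    import numpy as np

    def wiener_filter(d, R, N, C):
        """Wiener estimate (14) and error covariance (15) at one Fourier mode.

        d : (n_f,) complex data vector
        R : (n_f, n_c) response matrix
        N : (n_f, n_f) noise covariance
        C : (n_c, n_c) signal covariance
        """
        Ninv = np.linalg.inv(N)
        E = np.linalg.inv(np.linalg.inv(C) + R.conj().T @ Ninv @ R)   # equation (15)
        s_hat = E @ (R.conj().T @ Ninv @ d)                           # equation (14)
        return s_hat, E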
2.3 The entropic prior
The emission due to several of the underlying physical processes may be far from Gaussian. This is particularly pronounced for the kinetic and thermal SZ effects, but Galactic dust and free-free emission also appear quite non-Gaussian. Ideally, one might like to assign priors for the various physical components by measuring empirically the probability distribution of temperature fluctuations from numerous realisations of each component. This is not feasible in practice, however, and instead we consider here the use of the entropic prior, which is based on information-theoretic considerations alone.

Let us consider a discretised image h_j consisting of L cells, so that j = 1, …, L; we may consider the h_j as the components of an image vector h. Using very general notions of subset independence, coordinate invariance and system independence, it may be shown (Skilling (1989)) that the prior probability assigned to the values of the components in this vector should take the form

Pr(h) \propto \exp[\alpha S(h, m)],

where the dimensional constant α depends on the scaling of the problem and may be considered as a regularising parameter, and m is a model vector to which h defaults in the absence of any data. The function S(h, m) is the cross entropy of h and m. In standard applications of the maximum entropy method, the image h is taken to be a positive additive distribution (PAD). Nevertheless, the MEM approach can be extended to images that take both positive and negative values by considering them to be the difference of two PADs, so that

h = u - v,

where u and v are the positive and negative parts of h respectively. In this case, the cross entropy is given by (Gull & Skilling (1990); Maisinger, Hobson & Lasenby (1997))

S(h; m_u, m_v) = \sum_{j=1}^{L} \left\{ \psi_j - m_{uj} - m_{vj} - h_j \ln\left[\frac{\psi_j + h_j}{2 m_{uj}}\right] \right\},   (16)

where \psi_j = [h_j^2 + 4 m_{uj} m_{vj}]^{1/2} and m_u and m_v are separate models for each PAD. The global maximum of the cross entropy occurs at h = m_u - m_v.
In our application, we might initially suppose that at each Fourier mode we should take the `image' to be the n_c components of the signal vector s. However, this results in two additional complications. First, the components of the signal vector are, in general, complex, but the cross entropy given in (16) is defined only if the image h is real. Nevertheless, the MEM technique can be straightforwardly extended to the reconstruction of a complex image by making a slight modification to the above discussion. If the image h is complex, then the models m_u and m_v are also taken to be complex. In this case, the real and imaginary parts of m_u are the models for the positive portions of the real and imaginary parts of h respectively. Similarly, the real and imaginary parts of m_v are the models for the negative portions of the real and imaginary parts of the image. The total cross entropy is then obtained by evaluating the sum (16) using first the real parts and then the imaginary parts of h, m_u and m_v, and adding the results. Thus the total cross entropy for the complex image h is given by

S(\Re(h); \Re(m_u), \Re(m_v)) + S(\Im(h); \Im(m_u), \Im(m_v)),   (17)

where ℜ and ℑ denote the real and imaginary parts of each vector. For simplicity we denote the sum (17) by S_c(h; m_u, m_v), where the subscript c indicates that it is the entropy of a complex image.
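For concreteness, (16) and (17) can be sketched in a few lines of Python; the function names are our own, and the models are assumed to have strictly positive real and imaginary parts.

    import numpy as np

    def cross_entropy_real(h, mu, mv):
        """Positive/negative cross entropy (16) for a real image h."""
        psi = np.sqrt(h**2 + 4.0 * mu * mv)
        return np.sum(psi - mu - mv - h * np.log((psi + h) / (2.0 * mu)))

    def cross_entropy_complex(h, mu, mv):
        """Total entropy (17): real and imaginary parts treated as separate images."""
        return (cross_entropy_real(h.real, mu.real, mv.real)
                + cross_entropy_real(h.imag, mu.imag, mv.imag))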
The second complication mentioned above is more subtle and results from the fact that one of the fundamental axioms of the MEM is that it should not itself introduce correlations between individual elements of the image. However, as discussed in the previous subsection, the elements of the signal vector s at each Fourier mode may well be correlated, this correlation being described by the signal covariance matrix C defined in (11). Moreover, if prior information is available concerning these correlations, we would wish to include it in our analysis. We are therefore led to consider the introduction of an intrinsic correlation function (ICF) into the MEM framework (Gull & Skilling (1990)).
The inclusion of an ICF is most easily achieved by assuming that, at each Fourier mode, the `image' does not consist of the components of the signal vector s, but that instead h consists of the components of a vector of hidden variables that are related to the signal vector by

s = L h.   (18)

The n_c × n_c lower triangular matrix L in (18) is that obtained by performing a Cholesky decomposition of the signal covariance matrix, i.e. C = L L^T. We note that since C is real then so is L. Thus, if the components of h are a priori uncorrelated (thereby satisfying the MEM axiom) and of unit variance,

so that ⟨h_p h*_{p′}⟩ = δ_{pp′}, we find that, as required, the a priori covariance structure of the signal vector is given by

\langle s s^\dagger \rangle = \langle L h h^\dagger L^T \rangle = L \langle h h^\dagger \rangle L^T = L L^T = C.

Moreover, using this construction, the expected rms level of the real or imaginary part of each element of h is simply equal to 1/√2. Therefore, at each Fourier mode, we assign the real and imaginary parts of every component in the model vectors m_u and m_v to be equal to m = 1/√2.
Substituting (18) into (8), χ² can also be written in terms of h and is given by

\chi^2(h) = (d - RLh)^\dagger N^{-1} (d - RLh).   (19)
Thus, using an entropic prior, the posterior probability becomes

Pr(h|d) \propto \exp[-\chi^2(h) + \alpha S_c(h; m_u, m_v)],   (20)

where the cross entropy S_c(h; m_u, m_v) is given by (17) and (16).
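A minimal sketch of the negative log-posterior implied by (20), together with the hidden-space setup of (18) and the model assignment m = 1/√2, follows. It reuses cross_entropy_complex from the sketch above; the signal covariance C here is a random illustrative stand-in rather than an estimate from real foreground spectra.

    import numpy as np

    def neg_log_posterior(h, d, R, L, Ninv, mu, mv, alpha):
        """chi^2(h) - alpha * S_c(h; m_u, m_v), cf. equations (19) and (20)."""
        r = d - R @ (L @ h)                    # residual entering chi^2(h)
        chi2 = np.real(r.conj() @ Ninv @ r)    # chi^2 is real by construction
        return chi2 - alpha * cross_entropy_complex(h, mu, mv)

    # Hidden-space setup: L is the Cholesky factor of an illustrative signal
    # covariance, and the models take m = 1/sqrt(2) in both the real and
    # imaginary parts, matching the unit-variance construction of h.
    rng = np.random.default_rng(3)
    n_c = 6
    A = rng.standard_normal((n_c, n_c))
    C = A @ A.T                                # stand-in for the covariance (11)
    L = np.linalg.cholesky(C)
    m = 1.0 / np.sqrt(2.0)
    mu = mv = np.full(n_c, m * (1.0 + 1.0j))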
2.4 Maximising the posterior probability
As discussed in Section 2.1, we choose our estimate ŝ of the signal vector at each Fourier mode as that which maximises the posterior probability Pr(s|d) with respect to s.
For the Gaussian prior, we found in Section 2.2 that the posterior probability is also a Gaussian and that the estimate ŝ is given directly by the linear relation (14). Nevertheless, we also note that, in terms of h defined in (18), the quadratic form in the exponent of the Gaussian prior (10) has the particularly simple form

s^\dagger C^{-1} s = h^\dagger L^T (L L^T)^{-1} L h = h^\dagger L^T (L^T)^{-1} L^{-1} L h = h^\dagger h,

i.e. it is equal to the inner product of h with itself. Thus, using a Gaussian prior, the posterior probability can be written in terms of h as

Pr(h|d) \propto \exp[-\chi^2(h) - h^\dagger h],   (21)

where χ²(h) is given by (19). Therefore, in addition to using the linear relation (14), the Wiener filter estimate ŝ can also be found by first minimising the function

\Phi_{WF}(h) = \chi^2(h) + h^\dagger h,   (22)

to obtain the estimate ĥ of the corresponding hidden vector, and then using (18) to give ŝ = L ĥ.
We have developed an algorithm (which will be presented in a forthcoming paper) for minimising the function Φ_WF with respect to h. Indeed, this algorithm calculates the reconstruction ĥ in slightly less CPU time than the matrix inversions and multiplications required to evaluate the linear relation (14). The minimiser requires only the first derivatives of the function, and these are given in Appendix A.
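Our own minimiser will be described elsewhere; purely as an illustration of the procedure, the sketch below minimises Φ_WF of (22) with a generic library minimiser by packing the real and imaginary parts of h into a real vector. All quantities are illustrative stand-ins, and the gradient is obtained here by finite differences rather than the analytic forms of Appendix A.

    import numpy as np
    from scipy.optimize import minimize

    rng = np.random.default_rng(1)
    n_f, n_c = 10, 6
    R = rng.standard_normal((n_f, n_c))
    A = rng.standard_normal((n_c, n_c))
    C = A @ A.T                              # illustrative signal covariance
    L = np.linalg.cholesky(C)
    Ninv = np.eye(n_f) / 0.1**2              # white noise, sigma = 0.1

    h_true = (rng.standard_normal(n_c) + 1j * rng.standard_normal(n_c)) / np.sqrt(2.0)
    noise = 0.1 * (rng.standard_normal(n_f) + 1j * rng.standard_normal(n_f))
    d = R @ (L @ h_true) + noise

    def phi_wf(x):
        """Phi_WF of (22) on the real-packed vector x = [Re(h), Im(h)]."""
        h = x[:n_c] + 1j * x[n_c:]
        r = d - R @ (L @ h)
        return np.real(r.conj() @ Ninv @ r) + np.real(h.conj() @ h)

    res = minimize(phi_wf, np.zeros(2 * n_c), method="L-BFGS-B")
    h_hat = res.x[:n_c] + 1j * res.x[n_c:]
    s_hat = L @ h_hat                        # Wiener estimate via s = L h, equation (18)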
Let us now consider the MEM solution. From (20), we see that maximising the posterior probability when assuming an entropic prior is equivalent to minimising the function

\Phi_{MEM}(h) = \chi^2(h) - \alpha S_c(h; m_u, m_v).   (23)

The minimisation of this 2n_c-dimensional function may also be performed using the minimisation algorithm mentioned above, and the required first derivatives in this case are also given in Appendix A.
It is important to note that, since we are using the same minimiser to obtain both the Wiener filter (WF) and MEM reconstructions, and the evaluation of each function and its derivatives requires similar amounts of computation, the two methods require approximately the same CPU time. Thus, at least in this application, any criticism of MEM based on its greater computational complexity, as compared to the WF, is no longer valid. For both the WF and the MEM, the reconstruction of the six 400 × 400 maps of the input components requires only about two minutes on a Sparc Ultra workstation.

2.5 The small fluctuation limit
Despite the formal differences between (22) and (23), the WF and MEM approaches are closely related. Indeed, the WF can be viewed as a quadratic approximation to the MEM, and is commonly referred to as such in the literature. This approximation is most easily verified by considering the small fluctuation limit, in which the real and imaginary parts of h are small compared to the corresponding models.
Following the discussion at the end of Section 2.3, we begin by setting the real and imaginary parts of all the components of the model vectors m_u and m_v equal to m. Then, expanding the sum in (16) as a power series in h_j and using (17), we find that for small h_j the total cross entropy is approximated by

S_c(h; m_u, m_v) \approx -\sum_{j=1}^{n_c} \frac{\Re(h_j)^2 + \Im(h_j)^2}{4m} = -\frac{h^\dagger h}{4m}.   (24)
Thus, in the small fluctuation limit, the posterior probability assuming an entropic prior becomes Gaussian and is given by

Pr(h|d) \propto \exp\left[-\chi^2(h) - \alpha \frac{h^\dagger h}{4m}\right].   (25)

In fact, this approximation is reasonably accurate provided the magnitudes of the real and imaginary parts of each element of h are less than about 3m. Since m is set equal to the expected rms level of these parameters, we would therefore expect that for a Gaussian process this approximation should remain valid. In this case, the posterior probability (25) becomes identical to that for the WF solution, provided we set α = 4m.
We note, however, that for highly non-Gaussian processes, the magnitudes of the real and imaginary parts of the elements of h can easily exceed 3m, and in this case the shapes of the posterior probability for the WF and MEM approaches become increasingly different.
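This limiting behaviour is easy to verify numerically. The sketch below (reusing cross_entropy_complex from Section 2.3) compares the exact entropy with the quadratic form (24) for small fluctuations:

    import numpy as np

    rng = np.random.default_rng(2)
    n_c = 6
    m = 1.0 / np.sqrt(2.0)
    mu = mv = np.full(n_c, m * (1.0 + 1.0j))
    h = 0.1 * (rng.standard_normal(n_c) + 1j * rng.standard_normal(n_c))   # |h| << m

    exact = cross_entropy_complex(h, mu, mv)           # sketch from Section 2.3
    quadratic = -np.real(h.conj() @ h) / (4.0 * m)     # approximation (24)
    print(exact, quadratic)                            # the two agree closely here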
2.6 The regularisation constant α
A common criticism of MEM has been the arbitrary choice of the regularisation constant α, which is often considered merely as a Lagrange multiplier. In early applications of MEM, α was chosen so that the misfit statistic χ² equalled its expectation value, i.e. the number of data points to be fitted. This choice is usually referred to as historic MEM.
In the reconstruction of Fourier modes presented here, the situation is eased somewhat, since the choice α = 4m is at least guaranteed to reproduce the results of the Wiener filter when applied to Gaussian processes. In fact, when applied to the simulations presented in the companion paper, this choice of α does indeed bring χ² into its expected statistical range n_f ± √(2 n_f), where n_f is the number of (complex) values in the data vector d at each Fourier mode.
Nevertheless, it is possible to determine the appropriate value of α in a fully Bayesian manner (Skilling (1989); Gull & Skilling (1990)) by simply treating it as another parameter in our hypothesis space. It may be shown (see Appendix B) that α must be a solution of

-\alpha S_c(\hat{h}; m_u, m_v) = n_c - \alpha\, \mathrm{Tr}(M^{-1}),   (26)

where ĥ is the hidden vector that maximises the posterior probability for this value of α. The n_c × n_c matrix M is given by

M = G^{-1/2} H_{MEM} G^{-1/2},

where H_MEM is the Hessian matrix of the function Φ_MEM at the point ĥ and G is the metric on image-space at this point.
It should be noted that both the reconstruction ĥ and the matrix M depend on α, and so (26) must be solved numerically using an iterative technique such as linear interpolation or the Newton-Raphson method. We take α = 4m as our initial estimate, in order to coincide with the Wiener filter in the small fluctuation limit. For any particular value of α, the corresponding reconstruction ĥ(α) is obtained by minimising Φ_MEM as given in (23), and the Hessian of the posterior probability at this point is then calculated (see Appendix A). This in turn allows the evaluation of S_c(ĥ; m_u, m_v) and Tr(M^{-1}). Typically, fewer than ten iterations are needed in order to converge on a solution α̂ that satisfies (26).
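Schematically, one step of such an iteration might look as follows. This is a sketch under the stated assumptions, reusing cross_entropy_complex from Section 2.3; the rearrangement of (26) into a fixed-point update, holding M fixed within the step, is our own illustrative choice, and after each update ĥ must be recomputed by minimising Φ_MEM before the next step.

    import numpy as np

    def metric_G(h, mu, mv):
        """Diagonal image-space metric G = -grad_h grad_{h*} S_c, from (37)."""
        psi_re = np.sqrt(h.real**2 + 4.0 * mu.real * mv.real)
        psi_im = np.sqrt(h.imag**2 + 4.0 * mu.imag * mv.imag)
        return np.diag(0.25 * (1.0 / psi_re + 1.0 / psi_im))

    def alpha_update(alpha, h_hat, mu, mv, L, R, Ninv):
        """One fixed-point step towards the solution of (26)."""
        n_c = h_hat.size
        G = metric_G(h_hat, mu, mv)
        G_mhalf = np.diag(1.0 / np.sqrt(np.diag(G)))            # G^{-1/2}
        H = L.T @ R.conj().T @ Ninv @ R @ L + alpha * G         # Hessian (38)
        M = G_mhalf @ H @ G_mhalf                               # matrix M of (26)
        S = cross_entropy_complex(h_hat, mu, mv)
        # rearrange -alpha S = n_c - alpha Tr(M^{-1}) for alpha, holding M fixed
        return n_c / (np.real(np.trace(np.linalg.inv(M))) - S)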

2.7 Updating the ICF and models
In the MEM approach, after the Bayesian value α̂ for the regularisation constant has been found, the corresponding posterior probability distribution is maximised to obtain the reconstruction ĥ(α̂), from which the estimate of the signal vector ŝ may be straightforwardly derived. Once this has been performed for each Fourier mode, the reconstruction of the sky emission due to each physical component is then found by performing an inverse Fourier transform.
We could, of course, end our analysis at this point and use the maps obtained as our final reconstructions. However, we find that the results can be further improved by using the current reconstruction to update the ICF matrix L and the models m_u and m_v, and then repeating the entire MEM analysis discussed above. At each Fourier mode, the updated models are taken directly from the current reconstruction, and the updated ICF matrix is obtained by calculating a new signal covariance matrix C from the current reconstruction and performing a Cholesky decomposition. These quantities are then used in the next iteration of the MEM, and the process is repeated until it converges on a final reconstruction. Usually, fewer than ten such iterations are required in order to achieve convergence.
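The update step might be sketched as follows. The layout of the reconstruction as an array of per-mode signal vectors, the single global covariance average (in practice C(k) would be estimated in bins of |k|), and the splitting of the hidden-space reconstruction into positive and negative parts for the new models are all illustrative assumptions of this sketch rather than details given in the text.

    import numpy as np

    def update_icf_and_models(s_hat_modes):
        """Recompute C, L and the MEM models from the current reconstruction.

        s_hat_modes : (n_modes, n_c) complex array holding the reconstructed
        signal vector at every measured Fourier mode (an assumed layout).
        """
        n_modes = s_hat_modes.shape[0]
        # new signal covariance from the current reconstruction (cf. equation 11)
        C_new = np.real(np.einsum('kp,kq->pq', s_hat_modes, s_hat_modes.conj())) / n_modes
        L_new = np.linalg.cholesky(C_new)
        # hidden-space reconstruction under the updated ICF: solve L h = s per mode
        h_modes = np.linalg.solve(L_new, s_hat_modes.T).T
        # updated models: positive/negative parts of the current reconstruction,
        # with a small floor so the entropy's logarithms remain defined
        floor = 1e-6
        mu_new = (np.maximum(h_modes.real, 0.0) + floor
                  + 1j * (np.maximum(h_modes.imag, 0.0) + floor))
        mv_new = (np.maximum(-h_modes.real, 0.0) + floor
                  + 1j * (np.maximum(-h_modes.imag, 0.0) + floor))
        return C_new, L_new, mu_new, mv_new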
We might expect that a similar method could be used in the WF case, by repeatedly calculating an updated signal covariance matrix from the current reconstruction and using it in the subsequent iteration of the WF analysis. It is well known, however, that since the WF tends to suppress power at higher Fourier modes, the solution gradually tends to zero as more iterations are performed. This behaviour was indeed confirmed by experiment.
2.8 Estimating errors on the reconstruction
Once the final reconstruction has been obtained, it is important to be able to characterise the errors associated with it. In the case of the Wiener filter, the reconstructed signal vector ŝ at each Fourier mode is obtained in a linear manner from the observed data vector using (14). Thus the propagation of errors is straightforward, and the covariance matrix of the reconstruction errors at each Fourier mode is given by (15).
As mentioned in Section 2.2, however, this simple propagation of errors is entirely a result of the assumption of a Gaussian prior, which, together with the assumption of Gaussian noise, leads to a Gaussian posterior probability distribution. In terms of the vector of hidden variables h, the posterior probability for the WF is given by

Pr(h|d) \propto \exp[-\Phi_{WF}(h)] = \exp[-(h - \hat{h})^\dagger H_{WF} (h - \hat{h})],

where the Hessian matrix H_WF = \nabla_h \nabla_{h^*} \Phi_{WF} is evaluated at the peak ĥ of the distribution, and the function Φ_WF is given by (22). Thus, the covariance matrix of the errors on the reconstructed hidden vector is given exactly by the inverse of this matrix, i.e.

\langle (h - \hat{h})(h - \hat{h})^\dagger \rangle = H_{WF}^{-1}.

From (18), the error covariance matrix for the reconstructed signal vector is then given by

\langle (s - \hat{s})(s - \hat{s})^\dagger \rangle = \langle L (h - \hat{h})(h - \hat{h})^\dagger L^T \rangle = L H_{WF}^{-1} L^T.   (27)

Using the expression for the Hessian matrix given in (35), and remembering that s = Lh and C = LL^T, the expression (27) is easily shown to be identical to the result (15).
For the entropic prior, the posterior probability distribution is not strictly Gaussian in shape. Nevertheless, we may still approximate the shape of this distribution by a Gaussian at its maximum and, recalling the discussion of Section 2.5, we might expect this approximation to be reasonably accurate, particularly in the reconstruction of Gaussian processes. Thus, near the point ĥ, we make the approximation

Pr(h|d) \propto \exp[-\Phi_{MEM}(h)] \approx \exp[-(h - \hat{h})^\dagger H_{MEM} (h - \hat{h})],

where H_MEM = \nabla_h \nabla_{h^*} \Phi_{MEM} is evaluated at ĥ, and Φ_MEM is given by (23). The covariance matrix of the errors on the reconstructed hidden vector is then given approximately by the inverse of this matrix, and so

\langle (s - \hat{s})(s - \hat{s})^\dagger \rangle \approx L H_{MEM}^{-1} L^T.

In both the WF and MEM cases, the reconstructed maps of the sky emission due to each physical component are obtained by inverse Fourier transformation of the signal vectors at each Fourier mode. Since this operation is linear, the errors on these maps may be deduced straightforwardly from the above error covariance matrices.
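A sketch of the corresponding per-mode computation, reusing metric_G from the Section 2.6 sketch, might read:

    import numpy as np

    def reconstruction_errors(h_hat, mu, mv, L, R, Ninv, alpha):
        """Approximate per-mode error covariance L H_MEM^{-1} L^T.

        For the WF the same formula applies with the Hessian of (35),
        H = L^T R^dagger N^{-1} R L + I, in place of H_MEM.
        """
        G = metric_G(h_hat, mu, mv)
        H = L.T @ R.conj().T @ Ninv @ R @ L + alpha * G    # H_MEM, equation (38)
        return L @ np.linalg.inv(H) @ L.T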
3 Discussion and conclusions
In this paper we adopt a Bayesian approach to the separation of foreground components from CMBR
emission for satellite observations. We find that by assuming a suitable Gaussian prior in Bayes'
theorem for the sky emission, we recover the standard Wiener filter (WF) approach. Alternatively,
we may assume an entropic prior, based on information-theoretic considerations alone, from which we
derive a maximum entropy method (MEM). We apply these two methods to the problem of separating
the different physical components of sky emission.
3.1 Variations on standard Wiener filtering
By assuming a Gaussian prior in Bayes' theorem, we derived the standard form of the Wiener filter. This approach is optimal in the sense that it is the linear filter for which the variance of the reconstruction residuals is minimised. This is true both in the Fourier domain and in the map domain. Nevertheless, it is straightforward to show that this algorithm leads to maps with power spectra that are biased compared to the true spectra, and this leads us to consider variants of the standard Wiener filter.
At a given value of k, the estimator Ĉ_p(k) of the azimuthally averaged power spectrum for the pth physical component is obtained simply by calculating the average value of |ŝ_p(k)|² over those modes for which |k| = k, i.e.

\hat{C}_p(k) = \frac{1}{N(k)} \sum_{|k_i| = k} \hat{s}_p(k_i)\, \hat{s}^*_p(k_i),   (28)
where N(k) is the number of measured Fourier modes satisfying |k_i| = k.
The bias in the power spectrum of the standard WF map reconstruction may be quantified by introducing, for each physical component, a quality factor Q_p(k) at each Fourier mode k (Bouchet et al. (1997)). This factor is given by

Q_p(k) = \sum_{\nu} W_{p\nu}(k)\, R_{\nu p}(k),

where R(k) is the response matrix of the observations at the Fourier mode k, as defined in equation (3), and W(k) is the corresponding Wiener matrix given in equation (14). The quality factor varies between unity (in the absence of noise) and zero. If ŝ_p(k) is the WF estimate of the pth component of the signal vector at k and s_p(k) is the actual value, then it is straightforward to show that

\langle |\hat{s}_p(k)|^2 \rangle = Q_p(k)\, \langle |s_p(k)|^2 \rangle.

Thus, in a similar way, the expectation value of the naive power spectrum estimator defined in (28) is given by ⟨Ĉ_p(k)⟩ = Q̄_p(k) ⟨C_p(k)⟩, where Q̄_p(k) is the average of the quality factors over the Fourier modes satisfying |k| = k; thus the estimator in equation (28) is biased and should be replaced by Ĉ_p(k)/Q̄_p(k).
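A minimal sketch of the quality factors and the debiased estimator follows; the per-ring array layout is an assumption of this illustration, and W is the Wiener matrix of (14).

    import numpy as np

    def quality_factors(W, R):
        """Q_p = sum_nu W_{p nu}(k) R_{nu p}(k) at one Fourier mode."""
        return np.real(np.einsum('pn,np->p', W, R))

    def debiased_power(s_hat_ring, Q_ring):
        """Debiased band power for the modes on one ring |k_i| = k.

        s_hat_ring : (n_modes, n_c) WF estimates on the ring;
        Q_ring     : (n_modes, n_c) matching quality factors.
        """
        C_hat = np.mean(np.abs(s_hat_ring)**2, axis=0)   # naive estimator (28)
        return C_hat / np.mean(Q_ring, axis=0)           # divide out the bias Q_p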
It is clearly unsatisfactory, however, to produce reconstructed maps with biased power spectra and, from the above discussion, we might consider using the matrix with elements W_{pν}/Q_p^{1/2} to perform the reconstructions. Bouchet et al. (1997) show that this leads to reconstructed maps that do indeed possess unbiased power spectra and, moreover, the method is less sensitive to the assumed input power spectra. However, one finds in this case that the variance of the reconstruction residuals is increased by a factor 2(1 - Q_p^{1/2})/(1 - Q_p) compared to those obtained with the standard WF, and so the reconstructed maps appear somewhat noisier.
Another variant of the Wiener filter technique has been proposed by Tegmark & Efstathiou (1996) and Bouchet et al. (1997), and uses the matrix with elements W_{pν}/Q_p to perform the reconstructions. This approach has the advantage that the reconstruction of the pth physical component is independent of its assumed input power spectrum. Nevertheless, for this technique the variance of the reconstruction residuals is found to be increased by the factor (1/Q_p - 1)/(1 - Q_p) as compared to the standard WF, which results in even noisier reconstructed maps.
As a final variant, Tegmark (1997) suggests including in the WF algorithm a parameter η that scales the assumed input power spectra of the components. This parameter can be included in all of the versions of the WF discussed above and is equivalent to assuming in Bayes' theorem a Gaussian prior of the form

Pr(s) \propto \exp(-\eta\, s^\dagger C^{-1} s).

In the use of this variant for the analysis of real data, η is varied in order to obtain some desired signal-to-noise ratio in the reconstructed maps, by artificially suppressing or enhancing the assumed power in the physical components as compared to the noise. Clearly, η plays a similar role in the WF analysis to the parameter α in the MEM. Thus, by making the appropriate changes to the calculation of the Bayesian value of α in Appendix B, we may obtain an expression analogous to (26) that defines a Bayesian value for η. Indeed, with the inclusion of the parameter η, the WF method is simply the quadratic approximation to the MEM, as discussed in Section 2.5. However, even with the inclusion of the η factor, we find that the corresponding reconstructions of non-Gaussian components are still somewhat poorer than for the MEM.
Acknowledgements
MPH acknowledges Trinity Hall, Cambridge, for support in the form of Research Fellowships.
References
Bouchet F.R., Gispert R., Puget J.L., 1996, in Dwek E., ed., Proc. AIP Conf. 348, The mm/sub-mm foregrounds and future CMB space missions. AIP Press, New York, p. 255
Bouchet F.R., Gispert R., Boulanger F., Puget J.L., 1997, in Bouchet F.R., Gispert R., Guiderdoni B., Tran Thanh Van J., eds, Proc. 36th Moriond Astrophysics Meeting, Microwave Anisotropies. Editions Frontière, Gif-sur-Yvette, p. 481
Gull S. F., Skilling J., 1990, The MEMSYS5 User's Manual. Maximum Entropy Data Consultants
Ltd, Royston
Hobson M.P., Lasenby A.N., 1998, MNRAS, submitted
Jones A. W., Hancock S., Lasenby A. N., Davies R. D., Gutiérrez C. M., Rocha G., Watson R. A., Rebolo R., 1998, MNRAS, in press
Maisinger K., Hobson M.P., Lasenby A.N., 1997, MNRAS, 290, 313
Skilling J., 1989, in Skilling J., ed., Maximum Entropy and Bayesian Methods. Kluwer, Dordrecht,
p. 45
Tegmark M., 1997, ApJ, 480, L87
Tegmark M., Efstathiou G., 1996, MNRAS, 281, 1297
Zaroubi S., Hoffman Y., Fisher K. B., Lahav O., 1995, ApJ, 449, 446
A Calculation of derivatives
As discussed in Section 2.4, maximising the posterior probability for the WF and MEM cases is equivalent to minimising respectively the functions Φ_WF and Φ_MEM, which are given by (22) and (23) as

\Phi_{WF}(h) = \chi^2(h) + h^\dagger h,   (29)
\Phi_{MEM}(h) = \chi^2(h) - \alpha S_c(h; m_u, m_v).   (30)

From (19), in each case the standard χ² misfit statistic may be written in terms of the hidden vector h = L^{-1} s as

\chi^2(h) = (d - RLh)^\dagger N^{-1} (d - RLh).   (31)

The cross entropy S_c(h; m_u, m_v) for this complex image is given by (16) and (17).
Since h is a complex vector, we may consider Φ_WF and Φ_MEM to be functions of the real and imaginary parts of the elements of h. Alternatively, we may consider these functions to depend upon the complex elements of h, together with their complex conjugates. While it is clear that the former approach is required in order to use standard numerical minimisers, a simpler mathematical derivation is provided by adopting the latter approach. In any case, derivatives with respect to the real and imaginary parts of h may easily be found using the relations

\nabla_{\Re(h)} \equiv \nabla_h + \nabla_{h^*},
\nabla_{\Im(h)} \equiv i\,(\nabla_h - \nabla_{h^*}).

Differentiating (31) with respect to h and h*, we find that the gradient of χ² is given by

\nabla_{h^*} \chi^2 = [\nabla_h \chi^2]^* = -L^T R^\dagger N^{-1} (d - RLh),   (32)

and upon differentiating once more we find that the Hessian (curvature) matrix of χ² has the form

\nabla_h \nabla_{h^*} \chi^2 = L^T R^\dagger N^{-1} R L.   (33)
Using (32) and (33), the gradient of Φ_WF in (29) is simply given by

\nabla_{h^*} \Phi_{WF} = [\nabla_h \Phi_{WF}]^* = -L^T R^\dagger N^{-1} (d - RLh) + h,   (34)

and its Hessian matrix reads

H_{WF} = \nabla_h \nabla_{h^*} \Phi_{WF} = L^T R^\dagger N^{-1} R L + I,   (35)

where I is the unit matrix of appropriate dimensions. We note that the Hessian matrix for Φ_WF is independent of h.
By setting the right-hand side of (34) equal to zero, and remembering that s = Lh and C = LL^T, it is straightforward to obtain the linear relation (14) for the WF solution. Moreover, from (27), the error covariance matrix for the reconstructed signal vector is given by E = L H_{WF}^{-1} L^T, and using (35) this is easily shown to be identical to the result (15).
In a similar way, we may calculate the derivatives of Φ_MEM defined in (30). Unfortunately, the form of the cross entropy given in (16) precludes us from writing its gradient or curvature as a simple matrix multiplication, and we must instead express them in component form. From (16) and (17), we find that the components of the gradient vector of the cross entropy are given by

\frac{\delta S_c}{\delta h_j^*} = \left(\frac{\delta S_c}{\delta h_j}\right)^* = -\frac{1}{2} \ln\left[\frac{\Re(\psi_j + h_j)}{2\,\Re(m_{uj})}\right] - \frac{i}{2} \ln\left[\frac{\Im(\psi_j + h_j)}{2\,\Im(m_{uj})}\right],   (36)
where ℜ(ψ_j) = [ℜ(h_j)² + 4ℜ(m_{uj})ℜ(m_{vj})]^{1/2}, with ℜ(ψ_j + h_j) ≡ ℜ(ψ_j) + ℜ(h_j), and a similar expression exists for ℑ(ψ_j). Differentiating once more, we find the components of the Hessian of the cross entropy to be given by

\frac{\delta^2 S_c}{\delta h_j\, \delta h_k^*} = -\frac{1}{4}\left[\frac{1}{\Re(\psi_j)} + \frac{1}{\Im(\psi_j)}\right] \quad \text{if } j = k,   (37)

and equal to zero otherwise. We note that these components may be used to define the (diagonal) metric on the space of images, which is given simply by G(h) = -\nabla_h \nabla_{h^*} S_c (Skilling (1989); Hobson & Lasenby (1998)).
Using (36) and (37), the gradient and Hessian of Φ_MEM are then easily calculated. In particular, we find that the Hessian matrix is given by

H_{MEM} = \nabla_h \nabla_{h^*} (\chi^2 - \alpha S_c) = L^T R^\dagger N^{-1} R L + \alpha G,   (38)

where G is the image-space metric and we have used the expression for the curvature of χ² given in (33). In contrast to (35), we see that the Hessian matrix of Φ_MEM depends on h through the metric G.
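For completeness, the analytic derivatives (32), (34) and (36) translate directly into code. The sketch below follows the conventions above: gradients are taken with respect to h*, and for a real-packed minimiser the derivatives with respect to ℜ(h) and ℑ(h) are twice the real and imaginary parts of these gradients.

    import numpy as np

    def grad_chi2_hstar(h, d, R, L, Ninv):
        """grad_{h*} chi^2, equation (32)."""
        return -L.T @ R.conj().T @ Ninv @ (d - R @ (L @ h))

    def grad_entropy_hstar(h, mu, mv):
        """grad_{h*} S_c, componentwise from equation (36)."""
        psi_re = np.sqrt(h.real**2 + 4.0 * mu.real * mv.real)
        psi_im = np.sqrt(h.imag**2 + 4.0 * mu.imag * mv.imag)
        return (-0.5 * np.log((psi_re + h.real) / (2.0 * mu.real))
                - 0.5j * np.log((psi_im + h.imag) / (2.0 * mu.imag)))

    def grad_phi_wf(h, d, R, L, Ninv):
        """grad_{h*} Phi_WF, equation (34)."""
        return grad_chi2_hstar(h, d, R, L, Ninv) + h

    def grad_phi_mem(h, d, R, L, Ninv, mu, mv, alpha):
        """grad_{h*} Phi_MEM = grad_{h*} chi^2 - alpha * grad_{h*} S_c."""
        return grad_chi2_hstar(h, d, R, L, Ninv) - alpha * grad_entropy_hstar(h, mu, mv)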

B Bayesian value for α
A Bayesian value for α may be found simply by treating it as another parameter in our hypothesis space. This procedure is outlined for the case of real images in Skilling (1989) and Gull & Skilling (1990), and we modify their treatment here in order to accommodate complex images h.
After including α in our hypothesis space, the full joint probability distribution can be expanded as

Pr(h, d, \alpha) = Pr(\alpha)\, Pr(h|\alpha)\, Pr(d|h, \alpha) = Pr(\alpha)\, Pr(h|\alpha)\, Pr(d|h),   (39)

where in the last factor we can drop the conditioning on α, since it is h alone that induces the data d. We then recognise this factor as the likelihood. Furthermore, the second factor Pr(h|α) can be identified as the entropic prior, and so (39) becomes
Pr(h, d, \alpha) = Pr(\alpha)\, \frac{e^{\alpha S_c(h)}}{Z_S(\alpha)}\, \frac{e^{-\chi^2(h)}}{Z_L} = Pr(\alpha)\, \frac{e^{\alpha S_c(h) - \chi^2(h)}}{Z_S(\alpha)\, Z_L},   (40)

where Z_S(α) and Z_L are respectively the normalisation constants for the entropic prior and the likelihood, such that the total probability density function in each case integrates to unity. For convenience we have dropped the explicit dependence of the cross entropy S_c on the models m_u and m_v.
Since we have assumed the instrumental noise on the data to be Gaussian, the likelihood function is also Gaussian and so the normalisation factor Z_L is easily found. Evaluating the appropriate Gaussian integral gives

Z_L = \pi^{n_f} |N|,

where n_f is the dimension of the complex data vector d and is equal to the number of observing frequencies that make up the Planck Surveyor data set; |N| is the determinant of the noise covariance matrix defined in (7).
The normalisation factor Z_S(α) for the entropic prior is more difficult to calculate, since this prior is not Gaussian in shape. Nevertheless, we find that a reasonable approximation to Z_S(α) for all α may be obtained by making a Gaussian approximation to the prior at its maximum, which occurs at h_m = m_u - m_v. As discussed in Appendix A, the Hessian matrix of the entropy at this point is given by \nabla_h \nabla_{h^*} S_c = -G, where G is the metric on image space evaluated at the maximum of the prior h_m; the metric matrix is real and diagonal. Remembering that S_c(h_m) = 0 and using the Gaussian approximation, Z_S(α) is then given by

Z_S(\alpha) = \int e^{\alpha S_c(h)}\, |G|\, d^{n_c}h\, d^{n_c}h^* \approx \int e^{-\alpha (h - h_m)^\dagger G (h - h_m)}\, |G|\, d^{n_c}h\, d^{n_c}h^* = \pi^{n_c} |\alpha I|^{-1} = (\pi/\alpha)^{n_c},   (41)

where n_c is the dimension of the complex (hidden) image vector h and is equal to the number of physical components present in the simulations.
Now, returning to (40), in order to investigate more closely the role of α, we begin by considering the joint probability distribution Pr(d, α), which may be obtained by integrating out h in (40):

Pr(d, \alpha) = \int Pr(h, d, \alpha)\, |G|\, d^{n_c}h\, d^{n_c}h^* = \frac{Pr(\alpha)}{Z_S(\alpha)\, Z_L} \int e^{\alpha S_c(h) - \chi^2(h)}\, |G|\, d^{n_c}h\, d^{n_c}h^* \equiv Pr(\alpha)\, \frac{Z_\Phi(\alpha)}{Z_S(\alpha)\, Z_L},   (42)

where we have defined the normalisation integral Z_Φ(α). In order to calculate Z_Φ(α), we follow a similar approach to that used to calculate Z_S(α) and make a Gaussian approximation to exp[αS_c(h) - χ²(h)] about its maximum at ĥ. The required Hessian matrix H_MEM is given by (38) evaluated at ĥ. Let us, however, define a new matrix M given by

M \equiv G^{-1/2} H_{MEM} G^{-1/2} = G^{-1/2} L^T R^\dagger N^{-1} R L\, G^{-1/2} + \alpha I.   (43)

Writing H_MEM = G^{1/2} M G^{1/2}, the integral Z_Φ(α) is then approximated by

Z_\Phi(\alpha) \approx e^{\alpha S_c(\hat{h}) - \chi^2(\hat{h})} \int e^{-(h - \hat{h})^\dagger H_{MEM} (h - \hat{h})}\, |G|\, d^{n_c}h\, d^{n_c}h^* = e^{\alpha S_c(\hat{h}) - \chi^2(\hat{h})}\, \pi^{n_c} |M|^{-1}.   (44)
Thus, substituting into (42) the expressions for Z_S(α) and Z_Φ(α) given by (41) and (44) respectively, we find that in the Gaussian approximation the joint probability distribution Pr(d, α) has the form

Pr(d, \alpha) = Pr(\alpha)\, Pr(d|\alpha) \approx Pr(\alpha)\, Z_L^{-1}\, e^{\alpha S_c(\hat{h}) - \chi^2(\hat{h})}\, \alpha^{n_c} |M|^{-1}.

Now, in order to obtain a Bayesian estimate for α, we should choose an appropriate form for the prior Pr(α). Nevertheless, for realistically large data sets, the distribution Pr(d|α) is so strongly peaked that it overwhelms any reasonable prior on α, and so we assign the Bayesian value α̂ of the regularisation constant to be that which maximises Pr(d|α). Taking logarithms, we obtain

\ln Pr(d|\alpha) = \text{constant} + \alpha S_c(\hat{h}) - \chi^2(\hat{h}) + n_c \ln\alpha - \ln|M|.
Differentiating with respect to α, and noting that the ĥ-derivatives cancel, we find

\frac{d}{d\alpha} \ln Pr(d|\alpha) = S_c(\hat{h}) + \frac{n_c}{\alpha} - \mathrm{Tr}\left(M^{-1} \frac{dM}{d\alpha}\right),   (45)

where we have used the identity

\frac{d}{d\alpha} \ln|M| \equiv \mathrm{Tr}\left(M^{-1} \frac{dM}{d\alpha}\right),

which is valid for any non-singular matrix M(α). From (43), however, we see that dM/dα = I. Substituting this relation into (45) and equating the result to zero, we find that, in order to maximise Pr(d|α), the parameter α must satisfy

-\alpha S_c(\hat{h}) = n_c - \alpha\, \mathrm{Tr}(M^{-1}).   (46)