Quantified Maximum Entropy MemSys5 Users' Manual

This manual describes the MemSys5 package of Fortran 77 subroutines designed to find and use maximum entropy reconstructions of images, power spectra, and other such additive distributions from arbitrary sets of data.

S.F. Gull and J. Skilling
Maximum Entropy Data Consultants Ltd.
South Hill, 42 Southgate Street
Bury St. Edmunds, Suffolk, IP33 2AZ, U.K.

Version 1.2, September 8, 1999. Copyright © MEDC 1990–1999


[Frontispiece: schematic of the MemSys5 structure, connecting hidden space, visible space, and data space. The ICF operator (transpose TRICF) maps hidden space to visible space, and OPUS (transpose TROPUS) maps visible space to data space. MEM5 carries the truth f through OPUS to the data D, applies the entropy, and returns multipliers w and the inference Pr(f). MOVIE5 uses the multipliers w to draw a hidden sample h from Pr(h), carried by the ICF to a visible sample f. MASK5 carries a visible mask through TRICF to a hidden mask and uses the multipliers w for quantification.]


Contents
Introduction                                                          7

I  Theory                                                             9

1  Probability theory and Bayes' Theorem                             11
   1.1  Bayesian calculus for scientific inference                   11
   1.2  Entropic prior for a distribution                            13
   1.3  Gaussian likelihood                                          15

2  Classic maximum entropy                                           17
   2.1  Classic maximum entropy                                      17
   2.2  Inferences about the reconstruction                          24
   2.3  Basic iterative algorithm                                    26
   2.4  The intrinsic correlation function                           29
   2.5  Predicted data                                               29
   2.6  Inferences about the noise level and other variables         30

3  Extensions and generalisations                                    33
   3.1  Positive/negative distributions                              33
   3.2  Fermi-Dirac distributions                                    34
   3.3  Quadratic entropy                                            34
   3.4  Fixed total                                                  34

4  Algorithmic details                                               35
   4.1  Iterative algorithm in hidden space                          35
   4.2  Vector coding                                                36
   4.3  Data-space algorithm                                         38

II  Practice                                                         41

5  The MemSys5 software package                                      43

6  Storage areas                                                     45
   6.1  Area structure                                               45
   6.2  Initialisation: MEINIT                                       47
   6.3  Area handling and overlaying                                 47

7  The vector library                                                49
   7.1  Area access routines VFETCH and VSTORE                       50
   7.2  Disc-handling routines UFETCH and USTORE                     52
   7.3  Arithmetical vector routines                                 53

8  Transform routines                                                55
   8.1  Transform subroutines OPUS, TROPUS, and UMEMX                55
   8.2  Intrinsic correlation function routines ICF and TRICF        57

9  Diagnostics and utilities                                         59
   9.1  Diagnostics routine UINFO                                    59
   9.2  Transform checking subroutine MEMTRQ                         60
   9.3  Save and restore utility: MESAVE and MEREST                  61
   9.4  Error messages                                               61
   9.5  Miscellaneous subroutines                                    63

10 Specifications of the major subroutines                           65
   10.1 MEMSET arguments                                             65
   10.2 MEM5 arguments                                               66
   10.3 MOVIE5 arguments                                             69
   10.4 MASK5 arguments                                              71

11 Example of simple use                                             73
   11.1 Example of classic maximum entropy                           75
   11.2 Postscript                                                   84

12 Helpful hints                                                     87
   12.1 Multiple maxima of Pr(α | D)                                 87
   12.2 Tolerance variable UTOL                                      87
   12.3 Internal iteration limits                                    88
   12.4 Arithmetic accuracy                                          88
   12.5 Combination of transforms                                    88

A  User-supplied routines                                            91

B  Historic maximum entropy                                          93

C  Reserved names                                                    95

D  Changes from MemSys2 to MemSys3                                   97

E  Changes from MemSys3 to MemSys5                                   99


List of Figures
2.1  Maximum entropy trajectory.                                     20
2.2  Maximum entropy cloud.                                          21
2.3  Maximum entropy cloud.                                          21
2.4  Maximum entropy cloud.                                          22
2.5  Maximum entropy cloud.                                          22
2.6  Maximum entropy cloud.                                          23
2.7  Maximum entropy cloud.                                          23
2.8  Integrated maximum entropy cloud.                               24
11.1 Noisy blurred data (gauss.dat).                                 74
11.2 Classic MaxEnt reconstruction f.                                80
11.3 Classic MaxEnt reconstruction h.                                80
11.4 Sample from posterior cloud.                                    81
11.5 Sample from posterior cloud.                                    81
11.6 Sample from posterior cloud.                                    82
11.7 Sample from posterior cloud.                                    82
11.8 The true simulation f.                                          84
B.1  Historic maximum entropy.                                       94


List of Tables
6.1  MemSys5 area usage.                                             46
9.1  Coding of INFO calls with MLEVEL.                               59
11.1 MASK5 quantification results.                                   83
11.2 Comparison of estimated and true values for the simulation
     of Figure 11.2.                                                 85


Introduction
The origins of maximum entropy analysis can be traced back to the 18th century, to the works of Bernoulli and Laplace, and brought to our own day through the works of Jeffreys and Jaynes. Today, it is widely recognised as a fundamental tool of scientific inference. The MemSys5 package is the most advanced software available for maximum entropy analysis, and has been developed by Maximum Entropy Data Consultants Ltd. as a result of more than 10 years' research and development. Over this period, extensive experience has been gained in the application of these techniques to practical problems in commercial, academic, and government organisations around the world.

Problems involving the reconstruction of distributions occur in many branches of science and engineering, and many techniques have been developed over the years to try to deal with them. One of the reasons for the enormous success of maximum entropy in an ever-increasing number of disciplines is that it is not an ad hoc technique, but is founded instead on the bedrock of probability theory. Thus the principles involved are universally applicable, and, if properly applied, allow the most efficient possible use of the data in the inference process.

One of the major benefits of an analysis based on probability theory is that it provides a mechanism for handling `noisy' and sparse data in an entirely rational and consistent manner. The result of the analysis is not just a single `best estimate' of the quantity of interest, but includes a precise statement of the confidence which can be placed in it. The MemSys5 package provides two tools for investigating and quantifying the accuracy of these estimates. Firstly, it provides a general-purpose routine which can be used to give a dynamic, visual illustration of the way in which the uncertainties in the data allow uncertainties in the results. Secondly, it provides a routine for quantifying how accurately specific features of the reconstructed distributions are determined.

The MemSys5 package is written in Fortran 77, but a machine-translated version of the source code in C can be supplied instead. This manual describes the theory and practice of maximum entropy data analysis as implemented in the MemSys5 package. Part I summarises the derivation of the maximum entropy formalism from the basic laws of probability theory, and describes the algorithms used in the package. Part II contains a detailed description of the software itself. The MemSys5 package is a maximum entropy `inference engine'; to apply this `engine' to a specific problem, a set of interface routines is required to model the relationship between the data and the distribution of interest. The specifications for these interface routines are given, together with examples. The package is distributed with a complete, working "TOY" deconvolution program as a simple illustration of its use.




Part I

Theory




Chapter 1

Probability theory and Bayes' Theorem
1.1 Bayesian calculus for scientific inference

Bayesian probability calculus is the natural, and indeed the only, language for drawing inferences in a consistent fashion. Each of the hypotheses A, B, C, ... which might be assessed in an experiment corresponds to a logical proposition of the form "X is true", where X stands for any of A, B, C, .... It follows that scientific data analysis is a special application of our methods for dealing with logical propositions. Let us suppose that there is a general calculus for dealing with such propositions. If there is, then obviously it must apply in special cases. Cox (1946) proved that any calculus which is consistent with the ordinary rules of Boolean algebra obeyed by propositions must be equivalent to Bayesian probability theory. Each proposition P carries a numerical code Pr(P), obeying

    Pr(P, Q) = Pr(P) Pr(Q | P),
    Pr(P) + Pr(¬P) = 1,

where "," means "and", "|" means "given", and "¬" means "not". All such codes lie between 0 and 1, with the particular propositions "true" and "false" having the extreme values

    Pr(false) = 0,   Pr(true) = 1.

These are the standard rules of probability calculus, and accordingly the codes are called "probabilities". The only freedom allowed is to rescale the Bayesian probabilities by some arbitrary monotonic function, as percentages are obtained by multiplying by 100. Whatever codes were first assigned to the propositions, these could be mapped uniquely back to the Bayesian probability values, so that we may adopt this standard convention without any loss of generality. This definition of "probability" is, of course, consistent with the commonly used definition as a limiting frequency ratio of "successes/trials". Frequency experiments are a compelling simple case for which any general calculus ought to give the obvious rules derived from multiplication and addition of (large) integers. Bayesian calculus passes this test. Moreover, the Cox definition of probability is more general than the frequentist definition, because it refers to arbitrary propositions without having to invent a "many-worlds" scenario involving all the events which might have happened but did not.



Conceivably, there may be no general language at all. One can imagine the existence of some other compelling simple case for which Bayesian calculus might give a demonstrably "wrong" result. We cannot prove that such counter-examples do not exist. However, it is difficult to see how one might be constructed. Despite much effort, nobody has yet found one, and so we shall adopt a rigorous Bayesian approach. In scientific data analysis, we wish to use our data D to make inferences about various possible hypotheses A, B, C, .... We have no rational alternative but to code this inference in terms of conditional probabilities

    Pr(A | D),   Pr(B | D),   Pr(C | D),   ....

Using "h" to represent any particular hypothesis A, B, C, ..., we need the probability density Pr(h | D) as a function of h. The data D do not give us this directly. Instead, the data give us the likelihood Pr(D | h) as a function of h. The particular algebraic form of the likelihood may involve delta functions for exact data, or Gaussian factors if the data have normally-distributed noise, and may include various correlations and other peculiarities of the observing system which produced the data. Just about the simplest example would be a single measurement

    D = h ± σ

of a single number h, which (assuming normally distributed error) is just a convenient shorthand for

    Pr(D | h) = (2πσ²)^(-1/2) exp(−(D − h)²/2σ²).

Another common example concerns data which are integer counts with mean h, subject to Poisson statistics. In that case,

    Pr(D | h) = e^(−h) h^D / D!.

In order to reverse the conditioning from Pr(D | h) to the required Pr(h | D) we use Bayes' theorem. A useful trick for navigating through the morass of possible algebraic manipulations when dealing with multiple propositions is to start with the joint probability density of everything relevant, and expand it in various ways. Here we have

    Pr(h, D) = Pr(h) Pr(D | h)        (Prior × Likelihood)
             = Pr(D) Pr(h | D)        (Evidence × Inference)



which is the basic tool of Bayesian data analysis. Basically, we seek the inference

    Pr(h | D) = Pr(h) Pr(D | h) / Pr(D)

about h (Bayes' Theorem), but the evidence

    Pr(D) = Σ_h Pr(h, D)

is hardly less important. In order to calculate these quantities, we need the "prior probability" density Pr(h) as well as the experimental likelihood. This prior codifies our prior expectations about possible results h before acquiring the new data D. Probability theory tells us how to use data to modulate this prior expectation into a posterior inference, but it does not tell us how to assign the prior expectation in the first place. Actually, this is quite reasonable: a language should not impose on what one wishes to say, and it leaves open the question of assigning the prior. Different analysts may legitimately disagree on their choice of prior. However, the evidence lets us discriminate objectively between them. Each choice X, Y, ... of prior leads to a numerical evidence Pr(D), more properly written now as Pr(D | X), Pr(D | Y), .... Unless we have particular grounds for favouring one choice over another, our inferences Pr(X | D), Pr(Y | D), ... about the available choices will follow the evidence values in proportion. Indeed, the evidence discriminates between priors in exactly the same way that the likelihood discriminates between hypotheses. In principle, one perhaps ought to average over the different choices of prior, but in practice the discrimination is usually good enough that we may simply choose the "best" available prior, on the basis of the evidence, and ignore the others.
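To make these formulae concrete, here is a toy Fortran 77 program (not part of the MemSys5 package, whose actual interfaces appear in Part II). It assumes a flat prior on a grid of hypotheses h and the Gaussian single-measurement likelihood above, accumulates the evidence Pr(D) as the prior-weighted sum of the likelihood, and then applies Bayes' theorem cell by cell; all numerical values in it are illustrative.

      PROGRAM BAYES1
C Toy illustration of Bayes' theorem (not a MemSys5 routine):
C flat prior Pr(h) on 0 < h < 10, Gaussian likelihood of a
C single measurement D = 5.3 +/- 1.2, evidence Pr(D) as the
C prior-weighted integral of the likelihood, and the posterior
C Pr(h|D) = Pr(h) Pr(D|h) / Pr(D) on a grid of NH cells.
      INTEGER NH, I
      PARAMETER (NH = 100)
      REAL H(NH), PRIOR(NH), RLIKE(NH), POST(NH)
      REAL D, SIGMA, DH, EVID, PI
      PARAMETER (PI = 3.1415927)
      D = 5.3
      SIGMA = 1.2
      DH = 10.0 / NH
      EVID = 0.0
      DO 10 I = 1, NH
         H(I) = (I - 0.5) * DH
         PRIOR(I) = 0.1
         RLIKE(I) = EXP(-(D - H(I))**2 / (2.0 * SIGMA**2))
     &            / SQRT(2.0 * PI * SIGMA**2)
         EVID = EVID + PRIOR(I) * RLIKE(I) * DH
 10   CONTINUE
C Bayes' theorem: posterior = prior * likelihood / evidence
      DO 20 I = 1, NH
         POST(I) = PRIOR(I) * RLIKE(I) / EVID
 20   CONTINUE
      PRINT *, 'Evidence Pr(D) =', EVID
      END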

1.2  Entropic prior for a distribution

Many distributions h of interest in science are positive and additive. For example, the intensity of light across an image is positive (obviously) and additive (meaning that the flux received across two non-overlapping patches is the sum of the individual fluxes). Likewise, a power spectrum is positive (obviously) and additive (meaning that the net power in two non-overlapping bands is the sum of the powers in the individual bands). We call a positive and additive distribution a PAD. We seek the prior density Pr(h), where each individual h now takes the form of a positive additive function h of position x (defined in one or more dimensions):

    h(x) = density at position x,

so that

    ∫ h(x) dx = quantity within the range of integration.

The corresponding discrete form on L cells i = 1, 2, ..., L is Pr(h), where

    h_i = quantity in cell i,

so that

    Σ h_i = quantity within the range of summation,

tends to the continuous form as the cells shrink and L → ∞.

As a step towards finding Pr(h), we consider the easier problem of assigning a "best" h. Later, we shall identify this with the most probable h, and thus gain information about Pr(h) itself. Following the Cox mode of reasoning, we suppose that there is a general prior Pr(h), and hence a generally valid rule for assigning the most probable h. If there is, it must be valid for special types of data, and in particular the most probable selection must also be valid for such data. We shall use the "kangaroo" argument (Gull and Skilling 1984a): If the proportion of some entity which has a given property is known to be p, then the most probable estimate of the proportion in some subclass which has that property is (in the absence of any information connecting the subclass with the property) the same number p. For example, if 30% of kangaroos are left-handed, then the most probable estimate of the proportion of kangaroos in Queensland which are left-handed is also 30% (unless more is known about the handedness of Queensland kangaroos). Remarkably, the consequence of this apparently weak requirement (Shore and Johnson 1980, Tikochinsky, Tishby and Levine 1984, Gull and Skilling 1984a,b) is that the "best" set of proportions p_i (i = 1, 2, ..., L) on L a priori equivalent cells must be obtained by maximising the entropy

    S(p) = −Σ_{i=1}^{L} p_i log p_i.

No other function will always give the required uncorrelated form of "best" proportions. This result is slightly restrictive in that proportions must sum to 1, whereas more general positive additive distributions need not. In fact, the only acceptable generalisation of this (to PADs h which need not add to 1) is to select h by maximising the entropy

    S(h) = Σ_{i=1}^{L} (h_i − m_i − h_i log(h_i/m_i))        (1.1)

where m_i is the measure assigned to cell i (Skilling 1988). In the continuum limit the entropy becomes

    S(h) = ∫ dx (h(x) − m(x) − h(x) log(h(x)/m(x))).

The global maximum of S is at h = m (where S = 0), so that the (negative) value of S measures the deviation of h from the assigned measure m. Often, m may be set to a constant, perhaps justified by an appeal to spatial invariance. More generally, m(x) can be used as an initial or "default" model for the distribution h, to which h will default in the absence of data. Identifying the "best" h with the most probable, this result shows that the most probable h under definitive "testable" constraints (Jaynes 1978) is to be found by maximising the entropy of h. Applied directly to the assignment of probability densities, this prescription is the Principle of Maximum Entropy (Jaynes 1978), here separated from its usual derivation in terms of counting large numbers of states. However, our problem is more difficult: we do not wish merely to assign an h; we have to determine a full prior probability density Pr(h). If the most probable h is always to be found by maximising S, then Pr(h) can only be some as yet unknown but monotonically increasing function

    Pr(h) = Φ(S(h)).
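For concreteness, the standard entropy (1.1) is a simple cell-by-cell sum. A minimal Fortran 77 sketch, assuming h and m are held in plain arrays (MemSys5 itself evaluates S internally from its own storage areas):

      REAL FUNCTION ENTRPY(N, H, M)
C Entropy S(h) of equation (1.1), relative to the model m:
C S = sum over cells of ( h - m - h*log(h/m) ), with S <= 0
C and S = 0 only at h = m. Illustrative sketch only.
      INTEGER N, I
      REAL H(N), M(N), S
      S = 0.0
      DO 10 I = 1, N
         S = S + H(I) - M(I) - H(I) * LOG(H(I) / M(I))
 10   CONTINUE
      ENTRPY = S
      END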



To find Φ we need some quantified special case which is consistent with the selection requirement above and for which Pr(h) is known. We use the following "monkey" argument (Gull and Daniell 1978): If a team of monkeys throws a very large number N of quanta randomly at the L a priori equivalent cells of a distribution, then the probability of obtaining a particular set (n₁, n₂, ..., n_L) of occupation numbers shall be proportional to the degeneracy N!/(n₁! n₂! ... n_L!). Of course, we do not suppose that distributions of interest have to be formed in this way; we merely remark that we would like to obtain the right answer in that special case. The consequence of this argument (Skilling and Gull 1989, Skilling 1989) is that Φ must be of exponential form

    Φ(S) ∝ exp(αS),

where α is some constant. A full treatment yields also the diagonal metric tensor [g] = −∇∇S and the corresponding invariant volume element (det[g])^(1/2) d^L h which should be assigned to the space of distributions h. Here, and henceforward, we use square brackets [ ] to denote a diagonal matrix. For later convenience, we rewrite the metric tensor in contravariant form as

    [μ] = [g]^(-1) = (−∇∇S)^(-1),

whence the invariant volume element becomes d^L h / (det[μ])^(1/2). The final result is that

    Pr(h ∈ V | α) = (1/Z_S(α)) ∫_V exp(αS(h)) d^L h / (det[μ])^(1/2),

where V is a domain of PADs h, and

    Z_S(α) = ∫ exp(αS(h)) d^L h / (det[μ])^(1/2)

is the scaling constant required to ensure that Pr(h) is properly normalised. We have now determined the prior completely, apart from the single number α which remains unknown. Because S is dimensional (having the units of total h) and the exponential must have dimensionless argument, it follows that α too must be dimensional, having the inverse units to total h. Accordingly, α cannot be determined a priori, and the prior analysis can go no further. If there is a general prior, it must be the one derived here. Conceivably, there may be no general prior at all. One can imagine the existence of some other compelling simple case for which this prior might be demonstrably "wrong". However, such other cases as have been investigated, such as the geometrical arguments of Levine (1986) and Rodríguez (1989), serve to confirm this functional form, and might indeed be developed into alternative derivations of it. The coefficient α is known as the "regularisation constant" by analogy with statistical regularisation theory.

1.3  Gaussian likelihood

Noise in experimental apparatus is often adequately described by normal statistics. Specifically, we then take the probability density for errors on a series of N measurements D_k (for k = 1, 2, ..., N) to be Gaussian, with inverse covariance matrix [σ^(-2)], so that the likelihood is

    Pr(D | h) = e^(−L(h)) / Z_L,

where

    Z_L = ∫ e^(−L(h)) d^N D

and

    L(h) = ½ (D − F(h))^T [σ^(-2)] (D − F(h)) = ½ χ²(h),        (1.2)

and F(h) are the data predicted from the hidden-space distribution h, so that

    Z_L = (2π)^(N/2) / det[σ^(-1)].

We take the covariance matrix to be diagonal, as from independent errors. Although these assumptions are not essential, they facilitate our analysis.
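With independent errors the misfit (1.2) likewise reduces to a weighted sum of squares. A minimal Fortran 77 sketch under the same plain-array assumption, with F holding the mock data F(h):

      REAL FUNCTION CHISQL(N, D, F, SIGMA)
C Misfit L(h) of equation (1.2) for independent Gaussian
C errors: L = (1/2) * sum over data of (D - F)**2 / sigma**2,
C i.e. half the chi-squared statistic. Illustrative sketch
C only; D holds the data and SIGMA the standard deviations.
      INTEGER N, K
      REAL D(N), F(N), SIGMA(N), SUM
      SUM = 0.0
      DO 10 K = 1, N
         SUM = SUM + ((D(K) - F(K)) / SIGMA(K))**2
 10   CONTINUE
      CHISQL = 0.5 * SUM
      END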


Chapter 2

Classic maximum entropy
2.1 Classic maximum entropy

Classic maximum entropy uses the Bayesian formulation just described to produce the posterior probability cloud Pr(h | D). The full joint probability density can be expanded as

    Pr(h, α, D) = Pr(α) Pr(h | α) Pr(D | h, α).

Taking these factors in reverse order, we can drop the conditioning on α in Pr(D | h, α) because it is h itself which induces the data. Thus

    Pr(D | h, α) = Pr(D | h) = exp(−L(h)) / Z_L

for Gaussian errors. Next, the factor Pr(h | α) is the entropic prior exp(αS(h))/Z_S(α). We use the notation S(h) and L(h) to indicate that the maximum entropy formalism can be extended to accommodate forms other than (1.1) and (1.2) on pages 14 and 16 for S(h) and L(h). Alternative entropy forms must, however, like the standard form (1.1), be concave functions of h with a global maximum value of zero. This will be illustrated in Chapter 3. Hence

    Pr(h, α, D) = Pr(α) Pr(h | α) Pr(D | h) = Pr(α) exp(αS(h) − L(h)) / (Z_S(α) Z_L).        (2.1)

If L(h) is a quadratic function of h, then for each α there is necessarily (because αS(h) − L(h) is then concave in h) a single maximum of (2.1) over h at ĥ(α), where ĥ obeys

    α ∂S/∂h = ∂L/∂h.

Gaussian approximation
Further analysis using the exact form (2.1) appears to be algebraically and numerically intractable, and so we form a Gaussian approximation about ĥ. The Hessian matrix of αS − L is

    −∇∇(αS(h) − L(h)) = −α∇∇S + ∇∇L
                      = [μ^(-1/2)] (αI + [μ^(1/2)] ∇∇L [μ^(1/2)]) [μ^(-1/2)]
                      = [μ^(-1/2)] αB [μ^(-1/2)],



where

    B = I + A/α,   I = identity matrix,   and   A = [μ^(1/2)] ∇∇L [μ^(1/2)].

Hence the Gaussian approximation to (2.1) is

    Pr(h, α, D) = Pr(α) (exp(αS(ĥ) − L(ĥ)) / (Z_S Z_L))
                  × exp(−½ (h − ĥ)^T [μ^(-1/2)] αB [μ^(-1/2)] (h − ĥ)).        (2.2)

The regularisation constant
In order to investigate the rôle of α more closely, we integrate h out of (2.2), obtaining

    Pr(α, D) = ∫ Pr(h, α, D) d^L h / (det[μ])^(1/2)
             = Pr(α) Z_L^(-1) exp(αS(ĥ) − L(ĥ)) (det B)^(-1/2).        (2.3)

We now need to assign Pr(α). Because α is a scale parameter, complete ignorance demands the improper Jeffreys prior Pr(α) ∝ α^(-1). However, impropriety carries a large penalty, in that the evidence Pr(D) = ∫ dα Pr(α, D) can become arbitrarily small. In practice, users of Jeffreys priors cut them off outside some largish range, but even then there is a large penalty relative to an analyst who already knew the order of magnitude of α. It seems to us that this cost is too high. With theoretical objectivity being impractical, we propose to recover objectivity by convention. In practice, one almost always knows the order of magnitude of what one is going to observe (to which α is inversely related). Accordingly, we recommend reducing both tails of the Jeffreys prior by single powers of α, to give the fully convergent form

    Pr(α | α₀) = 2α₀ / (π (α² + α₀²))        (α > 0).

Although the purist would require α₀ to be fixed in advance, the pragmatist may observe that the experiment is likely to have been set up so that the inferred range of α will not be too far from α₀. Hence we recommend choosing α₀ to maximise the evidence Pr(D | α₀). This involves evaluating Pr(α, D) for a sufficient range of α, and performing the integrals over α numerically. In this way, adequate objectivity can be retained, so that (for example) maximum entropy results could be compared with different Bayesian calculations (e.g., Skilling 1991). With realistically large datasets, though, it is seldom necessary to attend to such subtleties. In most practical cases, the evidence Pr(D | α) is so strongly peaked that it overwhelms any plausible prior on α. We then just select this "best" value α̂ and proceed on the assumption that α is then fixed. Differentiation of (2.3) shows that the evidence is extremal over α when

    −2αS(ĥ) = G,   where   G = trace((αB)^(-1) A),

and maximal when d(−2αS/G)/dα > 0 at the extremum. Using the eigenvalues λ of A to write

    G = Σ λ/(λ + α)



gives G a natural interpretation as the number of "good" measurements. Each eigenvalue λ larger than α represents an informatively accurate independent measurement, which contributes roughly 1 to G. Conversely, each eigenvalue λ less than α represents an inaccurate measurement, which contributes roughly 0 to G. The most probable value of α occurs when the number of good measurements equals the dimensionless amount −2αS(ĥ) of structure in the distribution ĥ (remember S is negative). When α is small, G and S go to fixed limits, so that −2αS/G < 1. Conversely, when α is large, it is usually the case that the data are sufficiently informative to ensure −2αS/G > 1. In that case, some intermediate value of α will be the most probable. Although mildly unusual, it is perfectly possible for there to be more than one local maximum of Pr(α | D), in which case the largest ought to be found and selected. It is also possible for the data to be sufficiently in accord with the assumed default model m that α = ∞ is the most probable value, in which case the cloud for h collapses because there is no adequate evidence for any change from the model. With this assignment of α, the corresponding probability cloud

    Pr(h | D) = const × exp(−½ (h − ĥ)^T [μ^(-1/2)] αB [μ^(-1/2)] (h − ĥ))        (2.4)

is the maximum entropy result. It represents the inference about h, conditional in particular on the default model m. Accompanying it is the evidence

    Pr(D) = Z_L^(-1) exp(αS(ĥ) − L(ĥ)) (det B)^(-1/2)
          = (2π)^(-N/2) det[σ^(-1)] exp(αS(ĥ) − L(ĥ)) (det B)^(-1/2)        (2.5)

for Gaussian errors. The term det[σ^(-1)] in (2.5) confirms that Pr(D) has the dimensions of [data]^(-N).

The maximum entropy trajectory.
The solutions ĥ(α) for various values of α define the "maximum entropy trajectory", starting from ĥ = m at α = ∞ and ending at minimum L when α = 0. As a matter of practical computation, it is advantageous to start from ĥ = m and iterate down the trajectory until the stopping criterion is satisfied. We call the value of α at which this occurs the "stopping" value. It is usually time-consuming to recover from a PAD h which lies far from the trajectory. A general feature of maximum entropy trajectories is that structure in ĥ for which there is good evidence in the data is picked up first. Only later does the trajectory bend around to acquire structure for which there is progressively weaker evidence. This qualitative behaviour can be seen in Figure 2.1, which illustrates maximum entropy applied to finding a two-cell PAD h = (h₁, h₂) from the noisy data

    0.56 h₁ + 0.83 h₂ = 5.32 ± 0.48,
    0.83 h₁ − 0.56 h₂ = 4.24 ± 2.00.

Solid contours are of L,

    L(h) = ½ (0.56 h₁ + 0.83 h₂ − 5.32)²/0.48² + ½ (0.83 h₁ − 0.56 h₂ − 4.24)²/2.00².

Dashed contours are of entropy,

    S(h) = h₁ + h₂ − m₁ − m₂ − h₁ log(h₁/m₁) − h₂ log(h₂/m₂),

with m₁ = m₂ = 1. The line of stars marks the trajectory of maxima of entropy over different values of L, starting at the global maximum of entropy at h = m and ending at the L minimum where the data are fitted as closely as possible (exactly, in this example). Because S is a strictly concave function of h, and L is convex, each such maximum is necessarily unique.
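For this two-cell example the stationarity condition α ∂S/∂h = ∂L/∂h can be solved numerically in a few lines. The Fortran 77 sketch below uses a naive damped fixed-point iteration on h_i = m_i exp(−(∂L/∂h_i)/α), which is adequate only for a problem this tiny (MemSys5 itself uses the conjugate-gradient machinery of Chapter 4); the value α = 1 and the damping factor are illustrative choices.

      PROGRAM TRAJ2
C Two-cell example of Figure 2.1: find the maximum of
C alpha*S(h) - L(h) at fixed alpha by damped fixed-point
C iteration on the stationarity condition
C     h(i) = m(i) * exp( -(1/alpha) * dL/dh(i) ),
C i.e. alpha * dS/dh = dL/dh. Naive sketch only.
      INTEGER ITER
      REAL H1, H2, M1, M2, ALPHA, OMEGA, R1, R2, G1, G2
      M1 = 1.0
      M2 = 1.0
      ALPHA = 1.0
      OMEGA = 0.05
      H1 = M1
      H2 = M2
      DO 10 ITER = 1, 2000
C Weighted residuals of the two data
         R1 = (0.56*H1 + 0.83*H2 - 5.32) / 0.48**2
         R2 = (0.83*H1 - 0.56*H2 - 4.24) / 2.00**2
C Gradient of L
         G1 = 0.56*R1 + 0.83*R2
         G2 = 0.83*R1 - 0.56*R2
C Damped update in log(h), towards m*exp(-G/alpha)
         H1 = EXP((1.0-OMEGA)*LOG(H1)
     &          + OMEGA*(LOG(M1) - G1/ALPHA))
         H2 = EXP((1.0-OMEGA)*LOG(H2)
     &          + OMEGA*(LOG(M2) - G2/ALPHA))
 10   CONTINUE
      PRINT *, 'alpha =', ALPHA, '  h =', H1, H2
      END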



Figure 2.1: Maximum entropy trajectory. [Plot in the (h₁, h₂) plane, axes from 0, with the model point (m₁, m₂) marked.]

Maximum entropy clouds.
Figures 2.2 to 2.7 illustrate classic maximum entropy clouds for the numerical example of Figure 2.1. For any given value of α, the posterior probability cloud of h takes a roughly Gaussian form centred at the corresponding point ĥ(α) of the maximum entropy trajectory. Each cloud is plotted as a scatter diagram of points randomly picked from the corresponding probability density exp(αS − L), and scaled for display to a constant number of samples. Initially, at large α (Figure 2.2), the cloud is small, being tightly constrained by the entropy. Later, at smaller α, the cloud expands until (by about Figure 2.5) the accurate data have been fitted and the entropy exerts relatively little effect in such directions. At yet smaller values of α, less accurate data become fitted until ultimately (at α = 0) the cloud fits all the data (in so far as this is consistent with positivity of h) and is defined purely by L (Figure 2.7). The most probable cloud is close to that in Figure 2.6, so that there happens to be rather little entropic regularisation in this small-scale problem.



Figure 2.2: Maximum entropy cloud.
Figure 2.3: Maximum entropy cloud.



Figure 2.4: Maximum entropy cloud.
Figure 2.5: Maximum entropy cloud.



Figure 2.6: Maximum entropy cloud.
Figure 2.7: Maximum entropy cloud.



Integrated maximum entropy cloud.
The clouds in Figures 2.2 to 2.7 were plotted with fixed numbers of points, in order to produce clear pictures. Had they been plotted in accordance with the posterior probability of their values of α, the initial clouds far from the "best" Figure 2.6 would have been invisibly faint. Figure 2.8 shows the full posterior cloud, correctly integrated over the permissible values of α. Even for this very small-scale two-cell problem, the integrated cloud is visually indistinguishable from the "best" individual cloud (Figure 2.6) obtained by fixing α at its most probable value.

Figure 2.8: Integrated maximum entropy cloud.

2.2  Inferences about the reconstruction

The posterior probability cloud Pr(h | D) has the interesting and entirely correct property that as the number of cells L → ∞ the quantity h_i in any particular cell goes to zero as O(L^(-1)), whereas the corresponding standard deviation goes to zero only as O(L^(-1/2)). This means that the proportional error on individual components of h becomes arbitrarily large in the continuum limit. This is correct, because one should not expect macroscopic data of integral form

    D_k = ∫ dx h(x) R_k(x) ± noise

to be informative about the microscopic structure of h. However, macroscopic data are informative about macroscopic structure, and integrals of the form

    Φ = ∫ dx h(x) p(x),

where p is a "mask" function of position, are well determined. Technically, we may note at this point that the positive additive distribution h really is a distribution, and not a function. It is defined through data integrals over position and used through other integrals over position, so that it is a member of the dual vector space to that of functions of position. Members of this space are formally known as distributions, and their pointwise values become undefined in the continuum limit. Digitising to

    Φ = Σ_{i=1}^{L} h_i p_i,

the probability cloud for h integrates to give a Gaussian posterior density Pr(Φ | D) for Φ, with

    Φ = Φ̂ ± δΦ,

where (as might be expected)

    Φ̂ = Σ ĥ_i p_i = ĥ^T p

and

    (δΦ)² = p^T [μ^(1/2)] B^(-1) [μ^(1/2)] p / α.

For a given mask p(x), this posterior density of Φ is well defined in the continuum limit L → ∞. p(x) might, for example, be set to 1 in a certain domain and 0 elsewhere, in which case Φ would estimate the amount of h in that domain. To investigate a difference between two regions, p might be set to +1 in one region, −1 in the other, and 0 elsewhere. Clearly there are many possibilities. Note that the probability density of Φ, like that of h, is conditional in particular on the assignment of the default model m.

Were p to be one of the experimental response functions R_k, the corresponding mock datum Φ̂ would not reproduce the actual datum D_k exactly: prior probability factors almost always destroy exact fits to data. Neither would the error σ_k be reproduced exactly. In fact the estimated error δΦ would be less than σ_k, possibly considerably less. Indeed, if the original model m was sufficiently close to the data that there was no evidence for any difference, the entropy formalism would return ĥ = m, and would claim all estimates to be perfectly accurate with zero error. That does not mean that the estimates become magically accurate; it just means that there is no evidence for any deviation from the model. Presumably the user is somewhat uncertain about what the reconstruction should be, otherwise there would be little reason to run maximum entropy. That means that the user will also be uncertain about the model m, because otherwise he would simply assign "m = the assumed truth". If this assignment of m was correct, the data ought to agree reasonably well, so that the formalism would be entirely correct to return ĥ = m with no error, thus ignoring such discrepancies as could plausibly be explained as noise. Logically, such uncertainties in the model should always be taken into account, and will induce their own variations in subsidiary estimates of derived quantities. In the special case of the entropy formalism returning α = ∞ and ĥ = m, the entire onus of such variations falls upon the model. Indeed, advanced use of maximum entropy relies on suitably sophisticated treatment of the model m (Gull 1989). In practice, though, assignment of a simple, uniform model often gives good reconstructions with sensible error bars.

One warning is that the statistics of Φ are derived from a Gaussian approximation to Pr(h | D). It can happen that a computed range Φ̂ ± δΦ extends into negative values (e.g., Φ̂ ± δΦ = 0.1 ± 0.3) even when Φ is a positive functional of the positive distribution h. Such results signal a local breakdown of the Gaussian approximation, which we are presently unable to quantify.


26

CHAPTER 2. CLASSIC MAXIMUM ENTROPY

Although the above discussion concerned inference about a linear functional of the distribution h, this restriction is by no means necessary. Any functional Φ = Φ(h) has a probability density

    Pr(Φ | D) = ∫ δ(Φ − Φ(h)) Pr(h | D) d^L h / (det[μ])^(1/2)

derived from that of h itself. The median location in a one-dimensional distribution is a good example of a nonlinear functional. The location of a maximum is another example. Here, though, one has to be more careful, because pointwise values of a distribution become undefined in the continuum limit, and the location of the maximum becomes correspondingly uncertain. That means that one should blur h to some finite resolution before attempting to determine a maximum. Whatever functional Φ is required, one can obtain independent samples from its distribution simply by taking independent samples h_t (t = 1, 2, ..., T) from the posterior probability density Pr(h | D) of h, and evaluating Φ(h_t) for each. Then the mean is estimated as

    Φ̄ = T^(-1) Σ_t Φ(h_t),

the variance as

    (δΦ)² = T^(-1) Σ_t (Φ(h_t) − Φ̄)²,

and so on. Other facets of the (non-Gaussian) probability density of Φ can also be estimated, provided one takes sufficient samples. A particularly useful form of presentation is to display the samples h_t as a "movie" with t representing time. This is even more effective if the successive samples are allowed to be correlated, because one then sees a sample h_t moving ergodically and quasi-continuously through its probability distribution. The reliability of features in h can easily be assessed by eye from movie displays: all one does is watch the feature to see how frequently it is present. All this extra generality, to nonlinear functionals and movie displays, relies on being able to compute samples from the posterior probability density of h. It is one of the strengths of MemSys5 that this is now possible.
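A sketch of this sample-based estimation, assuming the values Φ(h_t) have already been evaluated on T posterior samples and stored in an array (a hypothetical helper, not a MemSys5 routine):

      SUBROUTINE PHSTAT(T, PHIS, PBAR, PDEV)
C Mean and standard deviation of a functional PHI over T
C posterior samples, as in Section 2.2. PHIS(t) holds the
C value of PHI evaluated on the t-th sample h_t.
C Hypothetical helper, for illustration only.
      INTEGER T, I
      REAL PHIS(T), PBAR, PDEV, VAR
      PBAR = 0.0
      DO 10 I = 1, T
         PBAR = PBAR + PHIS(I)
 10   CONTINUE
      PBAR = PBAR / T
      VAR = 0.0
      DO 20 I = 1, T
         VAR = VAR + (PHIS(I) - PBAR)**2
 20   CONTINUE
      PDEV = SQRT(VAR / T)
      END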

2.3  Basic iterative algorithm

The most efficient way of computing the required central reconstruction ĥ(α) appears to be to iterate down the maximum entropy trajectory, decreasing α from its initial value of ∞ (where ĥ = m) until the stopping value is reached. Having selected a value of α, the first need is for a procedure to iterate h towards ĥ(α) at that value. This is defined by the maximum of

    Q(h) = αS(h) − L(h).

Locally, the quadratic approximation to Q is

    Q(h + δh) = Q(h) + δh^T ∇Q + ½ δh^T ∇∇Q δh,

where

    ∇∇Q = −[μ^(-1/2)] (αI + A) [μ^(-1/2)],



and this may be presumed to be sufficiently accurate within some trust region near h, defined using the entropy metric [μ] by

    |δr|² = δh^T [μ]^(-1) δh ≤ r₀².

The distance limit r₀ defining the trust region radius should, on dimensional grounds, be set to some value of the order of (trace[μ])^(1/2). Maximisation of Q within the trust region is effected by

    δh = [μ^(1/2)] (βI + A)^(-1) [μ^(1/2)] ∇Q,

where β is set just large enough to ensure that the trust region constraint is obeyed. The second need is for α to be changed (usually downwards) towards the stopping value. For safety, this should be done in steps sufficiently small that the corresponding change in ĥ should not violate the current trust region constraint. This can be done alongside the iteration of h itself. The third need is to converge to the correct stopping value α̂. This can always be defined by an expression of the form

    Ω(α) = 1   with   dΩ/dα < 0.

For classic maximum entropy, Ω = G/(−2αS), although other criteria can be defined. Derivative information about dΩ/dα is generally difficult to compute, so that the stopping value is best approached by extrapolating or interpolating a table of values of (α, Ω) remembered from previous iterates. Finally, a termination criterion is needed to determine when the computed probability cloud is sufficiently close to correct that the algorithm should be stopped. The correct cloud, centred at the correct trajectory point ĥ, is given by (2.4) on page 19:

    Pr₀(h | D) = const × exp(−½ (h − ĥ)^T [μ^(-1/2)] αB [μ^(-1/2)] (h − ĥ)).

If a slightly different estimate h̃ were produced, it would be taken to represent the slightly different cloud

    Pr₁(h | D) = const × exp(−½ (h − h̃)^T [μ^(-1/2)] αB [μ^(-1/2)] (h − h̃)).

The mismatch between these clouds should be measured by their (positive) cross-entropy

    H = ∫ Pr₁ log(Pr₁/Pr₀) d^L h / (det[μ])^(1/2),

which evaluates to

    H = ½ (h̃ − ĥ)^T [μ^(-1/2)] αB [μ^(-1/2)] (h̃ − ĥ)
      = ½ g^T [μ^(1/2)] (αB)^(-1) [μ^(1/2)] g,

where

    g = ∇Q = α∇S + (∂F/∂h)^T [σ^(-2)] (D − F)

for Gaussian errors. Because of noise on the data, we expect random fluctuations ±σ on each datum, so that our knowledge of g is limited by

    δg = (∂F/∂h)^T [σ^(-1)] r,



where r is a random vector of unit normal components. Accordingly, the expectation mismatch between clouds due to noise is

    ⟨H⟩ = ½ ⟨ ((∂F/∂h)^T [σ^(-1)] r)^T [μ^(1/2)] (αB)^(-1) [μ^(1/2)] ((∂F/∂h)^T [σ^(-1)] r) ⟩
        = ½ trace((αB)^(-1) A)
        = G/2.

This result can be understood by noticing that "bad" degrees of freedom in h are controlled by the entropy, and do not fluctuate with the noise, so only the good degrees of freedom contribute to the mismatch. We see that if H is less than some fraction of G, then the cloud represented by the current estimate h̃ overlaps the true cloud to within the corresponding fraction of the noise. This mathematical guarantee is the termination criterion. The basic iterative algorithm can be summarised as follows:

(1) Scalars:

    S(h) = Σ (h − m − h log(h/m))

    and

    L(h) = ½ (D − F(h))^T [σ^(-2)] (D − F(h)).

(2) Iteration:

    δh = [μ^(1/2)] (βI + A)^(-1) [μ^(1/2)] g   with   |δr|² = δh^T [μ^(-1)] δh ≤ r₀²

    and

    H = ½ g^T [μ^(1/2)] (αB)^(-1) [μ^(1/2)] g,

    where g = α∇S − ∇L.

(3) Probabilities:

    Pr(D | α) = Z_L^(-1) exp(αS(ĥ) − L(ĥ)) (det B)^(-1/2)

    and

    G = trace((αB)^(-1) A),

    where A = [μ^(1/2)] ∇∇L [μ^(1/2)] and B = I + A/α. G gives the stopping criterion, and Pr(D | α) enables variables to be inferred.

(4) Samples from the posterior cloud Pr(h | D):

    h = ĥ + [μ^(1/2)] B^(-1/2) r / α^(1/2).

(5) Linear features of h:

    Φ̂ = Σ ĥ_i p_i = ĥ^T p   and   (δΦ)² = p^T [μ^(1/2)] B^(-1) [μ^(1/2)] p / α.



2.4  The intrinsic correlation function

One of the fundamental axioms on which maximum entropy is based (Skilling 1989) is that entropy maximisation should not itself introduce correlations between individual elements of a PAD. This was illustrated in the "kangaroo argument" of Section 1.2. Nonetheless, in many (if not all) cases the object of interest is known a priori to contain correlations, typically between neighbouring elements. The source and scale of the correlations depends on the physical nature of the object or phenomenon under investigation. The recommended technique for encoding this correlation is to derive the quantity of interest, such as an image or spectrum, henceforth designated f(y), or f in the discrete case with M cells, as a `blurred' version of an underlying `hidden variables' PAD h(x) (Gull 1989, Charter 1990). The `blurring' is accomplished with an operator C(y, x), through an integral of the form

    f(y) = ∫ C(y, x) h(x) dx

or, in the discrete case,

    f_j = Σ_{i=1}^{L} C_{ji} h_i        (j = 1, 2, ..., M),

so that f = C h. This operator is known as the intrinsic correlation function (ICF). By construction, all the prior expectation of correlation is assigned to the ICF, so that h itself is a priori uncorrelated. Accordingly, it is legitimate to use an entropic prior on h, and all the corresponding analysis remains valid. Without the ICF construction, the MaxEnt analysis would simply fall apart. Often, the intrinsic correlation may be modelled by convolution with a Gaussian (say) of some given width. More sophisticated usage is possible, however, and the explicit incorporation of intrinsic correlation functions is a major advance in the maximum entropy formalism. Depending on the nature of the ICF, one may wish to digitise h and f at different resolutions, so that h and f may be of different sizes, and so should formally be considered to belong to two different spaces. The space to which h belongs is known as hidden space, and that to which f belongs is known as visible space. We let L be the number of cells in hidden space (as before), and set M to the number of cells in visible space. It was pointed out in Section 2.2 that the proportional error on individual components of h becomes arbitrarily large in the continuum limit. Since f(y) is defined in terms of an integral over h(x), however, pointwise errors on f are well-behaved. It should be noted, though, that a series of pointwise errors on f may well be highly correlated, and should therefore be interpreted with caution.
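A minimal discrete ICF might look as follows, blurring h with a kernel W of half-width NW on a shared grid (so L = M here), and simply truncating the kernel at the edges. In the package this operation belongs to the user-supplied ICF routine, with TRICF its transpose; the sketch is standalone and illustrative:

      SUBROUTINE ICFBLR(N, NW, W, H, F)
C Minimal discrete intrinsic correlation function f = C h:
C C blurs the hidden distribution h with a kernel W(-NW:NW),
C truncated at the edges. Sketch only; in MemSys5 the ICF is
C applied by the user-supplied ICF and TRICF routines.
      INTEGER N, NW, J, I
      REAL W(-NW:NW), H(N), F(N)
      DO 20 J = 1, N
         F(J) = 0.0
         DO 10 I = MAX(1, J-NW), MIN(N, J+NW)
            F(J) = F(J) + W(J-I) * H(I)
 10      CONTINUE
 20   CONTINUE
      END

A kernel W(k) proportional to exp(−k²/2a²), normalised to unit sum, reproduces the common choice of a Gaussian ICF of width a.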

2.5  Predicted data

Although nonlinear variants are allowable, the predicted data F are generally considered to be related to the visible distribution f via an N × M matrix R of response functions, so that

    F_k = Σ_{j=1}^{M} R_{kj} f_j = Σ_{j=1}^{M} R_{kj} Σ_{i=1}^{L} C_{ji} h_i        (k = 1, 2, ..., N),

and hence

    F = R C h   and   ∂F/∂h = R C.

Note again the dimensions of the three spaces:

    L = dimension of hidden space,
    M = dimension of visible space,
    N = dimension of data space.
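The forward transform is thus just the composition of the ICF with the response matrix. A dense-matrix Fortran 77 sketch, practical only when R and C are small enough to store as arrays (the package instead applies them to vectors through the user-supplied ICF and OPUS routines):

      SUBROUTINE FORWRD(N, M, L, R, C, H, F, FVIS)
C Forward (mock-data) transform F = R C h of Section 2.5:
C first the ICF, fvis = C h, then the response, F = R fvis.
C Dense-matrix sketch for small problems only.
      INTEGER N, M, L, K, J, I
      REAL R(N,M), C(M,L), H(L), F(N), FVIS(M)
      DO 20 J = 1, M
         FVIS(J) = 0.0
         DO 10 I = 1, L
            FVIS(J) = FVIS(J) + C(J,I) * H(I)
 10      CONTINUE
 20   CONTINUE
      DO 40 K = 1, N
         F(K) = 0.0
         DO 30 J = 1, M
            F(K) = F(K) + R(K,J) * FVIS(J)
 30      CONTINUE
 40   CONTINUE
      END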

2.6  Inferences about the noise level and other variables

The classic maximum entropy analysis carries with it the overall probability of the data from (2.5) on page 19:

    Pr(D) = ∫ dα Pr(α, D) ≈ Pr(α̂, D) = (2π)^(-N/2) det[σ^(-1)] exp(αS(ĥ) − L(ĥ)) (det B)^(-1/2).

This expression becomes useful when it is realised that, like any other probabilistic expression, it is conditional on the underlying assumptions. In fact all the probabilities derived in the classic maximum entropy analysis have been conditional upon a choice of model m, experimental variables defining the response functions, noise amplitudes σ, and so on, so that Pr(D) is really a shorthand for Pr(D | m, R, σ, ...). If such variables are imperfectly known, these conditional probability values can be used to refine our knowledge of them, by using Bayes' theorem in the form

    Pr(variables | D) = constant × Pr(variables) Pr(D | variables).

Ideally, one would set up a full prior for unknown variables and integrate them out in order to determine Pr(h | D). In practice, though, with large datasets it usually suffices to select the single "best" values of the variables, just as was the case for the regularisation constant α. A common special case of this concerns experimental data in which the overall noise level is uncertain, so that all the standard deviations σ in Pr(D) above should be scaled with some coefficient c. Rescaling α to α/c² for convenience gives

    Pr(D | α, c) = (2πc²)^(-N/2) det[σ^(-1)] exp((αS(ĥ) − L(ĥ))/c²) (det B)^(-1/2).

For this case, the maximum entropy trajectory itself is unaltered and parameterised by the same values of α, though the probability clouds for h are of different overall size. The evidence is maximised over c when

    c² = 2 (L(ĥ) − αS(ĥ)) / N.

At this scaling, we note that (for linear data) the χ² misfit statistic

    χ² = (D − F)^T [c^(-2) σ^(-2)] (D − F)

evaluates to χ² = 2L(ĥ)/c². Thus, when the stopping criterion is satisfied at the most probable α,

    χ² + G = N   (= number of data).

We see that the N data separate into the G "good" measurements which are sufficiently informative to be fitted and induce structure in the central reconstruction ĥ, and the remaining number "χ²" which are not informative about h and which are "bad" measurements serving merely to specify the noise scaling. A corollary of this is that χ² at ĥ is always less than the number of data. Redressing the balance, the average χ² over the whole posterior cloud is just N. With large datasets, the noise scaling is determined to high accuracy, enough to overwhelm all but the most rigid prior knowledge of it, essentially because all the many "bad" measurements are being used to determine this single number. Accordingly, automatic noise scaling is usually recommended.
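The rescaling itself is a one-line computation once S(ĥ) and L(ĥ) are known. A hedged Fortran 77 sketch (argument names are illustrative; the package applies the scaling internally when the automatic noise-scaling option of Chapter 4 is selected):

      REAL FUNCTION CSQ(N, ALPHA, S, RL)
C Most probable noise scaling of Section 2.6:
C c**2 = 2*(L(hhat) - alpha*S(hhat))/N, where S is the
C entropy and RL the misfit at the trajectory point.
C Illustrative sketch only.
      INTEGER N
      REAL ALPHA, S, RL
      CSQ = 2.0 * (RL - ALPHA * S) / N
      END

At the stopping point the relation χ² + G = N then provides a useful consistency check.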



Chapter 3

Extensions and generalisations
3.1 Positive/negative distributions

The maximum entropy method can be extended to reconstruct distributions h which can be either positive or negative. It may be appropriate to describe such a distribution as the difference between two subsidiary positive distributions u and v: thus

    h = u − v.

The total entropy, relative to a common model m, is

    S(u, v) = Σ_{i=1}^{L} (u_i − m_i − u_i log(u_i/m_i)) + Σ_{i=1}^{L} (v_i − m_i − v_i log(v_i/m_i)).

If we wished to find both u and v separately, we would incorporate the linear mapping (u, v) → h into the ICF, and proceed with the standard analysis. However, we are not usually interested in this detail, wishing to estimate h only, with its corresponding visible distribution f. In that case, we note that under any constraint on h, the gradients with respect to u and v must be equal and opposite,

    ∂S/∂u = −∂S/∂v,

so that the maximising u and v always obey

    u v = m².

The entropy can now be rewritten as a functional of h alone:

    S(h) = Σ_{i=1}^{L} (ψ_i − 2m_i − h_i log((ψ_i + h_i)/2m_i)),   where   ψ_i = (h_i² + 4m_i²)^(1/2).

All the original maximum entropy algebra can be performed with this revised form, with appropriate modifications to the gradient and curvature. The underlying PADs u and v disappear from the subsequent formulae, and we can work directly with h, which may now be of either sign.
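A minimal Fortran 77 sketch of this positive/negative entropy (illustrative only; the auxiliary quantity ψ above becomes the local variable PSI):

      REAL FUNCTION SPOSNG(N, H, M)
C Positive/negative entropy of Section 3.1:
C S = sum of ( psi - 2m - h*log((psi + h)/(2m)) ), where
C psi = sqrt(h**2 + 4m**2), obtained from the two-PAD form
C S(u,v) after eliminating u and v via u*v = m**2.
C Illustrative sketch only; valid for h of either sign.
      INTEGER N, I
      REAL H(N), M(N), S, PSI
      S = 0.0
      DO 10 I = 1, N
         PSI = SQRT(H(I)**2 + 4.0*M(I)**2)
         S = S + PSI - 2.0*M(I)
     &         - H(I)*LOG((PSI + H(I))/(2.0*M(I)))
 10   CONTINUE
      SPOSNG = S
      END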



3.2  Fermi-Dirac distributions

Another extension is to positive distributions which have an upper limit, such as reflective albedo, or electron density in a semiconductor band. Here we may think of the PAD h being accompanied by a second PAD k, such that h_i + k_i = 1 in each cell. (Setting the upper limit to 1 is a mere convention; any other value can be scaled to 1 by appropriately re-scaling h and k.) Because both h and k are positive, we have 0 ≤ h_i ≤ 1 in each cell.

The total entropy, relative to a model m_i for h_i (and 1 − m_i for k_i), is

    S(h) = Σ_{i=1}^{L} ( −h_i log(h_i/m_i) − (1 − h_i) log((1 − h_i)/(1 − m_i)) ).

Again, all the original maximum entropy algebra can be performed with this revised form, with appropriate modifications. The extra PAD k disappears from the subsequent formulae, and we can work directly with the now-limited h .

3.3  Quadratic entropy

Sometimes, one desires neither an upper nor a lower limit on the values of h_i. In that case we define

    S(h) = −Σ_{i=1}^{L} h_i²/2m_i.

Maximum entropy then reduces to least squares, scaled proportionally to the model variance m. The simplest "derivation" of this assignment is to note that in the large-α limit h → m, the standard entropy approaches

    S(h) ≈ −Σ_{i=1}^{L} (h_i − m_i)²/2m_i,

whence an offset of the origin of h gives the assigned quadratic form.

3.4  Fixed total

One does not always require the full generality of the standard entropy formula, and it is possible to restrict this in various ways. One such restriction is to require a fixed total

    Σ_{i=1}^{L} h_i = Σ_{i=1}^{L} m_i = fixed constant.

Either one may know this total of h accurately, from a particularly precise measurement, or one may impose it as part of the structure of the problem, as when h represents a set of proportions, summing to one by definition. This restriction on h removes one degree of freedom from the possible inferences. The calculations are performed as usual, but with that degree of freedom explicitly projected out.


Chapter 4

Algorithmic details
4.1 Iterative algorithm in hidden space

With the various extensions and generalisations just described, the stopping criterion Ω takes one of the following forms:

    Option 1: Classic maximum entropy:                Ω = G/(−2αS)
    Option 2: Classic automatic, with noise scaling:  Ω = Gc²/(−2αS), with c² = 2(L − αS)/N
    Option 3: Ad hoc "α fixed":                       Ω = 1/α
    Option 4: Historic maximum entropy:               Ω = N/(2L)   (see Appendix B)

The iterative hidden-space algorithm can then be summarised as follows:

(1) Scalars:

    S(h) = Σ (h − m − h log(h/m))   or variant

    and

    L(h) = ½ (D − R C h)^T [σ^(-2)] (D − R C h).

(2) Iteration:

    δh = [μ^(1/2)] (βI + A)^(-1) [μ^(1/2)] g   with   |δr|² = δh^T [μ^(-1)] δh ≤ r₀²

    and

    H = ½ g^T [μ^(1/2)] (αB)^(-1) [μ^(1/2)] g,

    where

    g = α∇S + C^T R^T [σ^(-2)] (D − R C h).

(3) Probabilities:

    Pr(D | α, c) = (2πc²)^(-N/2) det[σ^(-1)] exp((αS(ĥ) − L(ĥ))/c²) (det B)^(-1/2)

    and

    G = trace((αB)^(-1) A),

    where A = [μ^(1/2)] C^T R^T [σ^(-2)] R C [μ^(1/2)] and B = I + A/α. For the classic options, G gives the stopping criterion, and Pr(D) enables variables to be inferred.

(4) Samples from the posterior cloud Pr(h | D):

    h = ĥ + (c²/α)^(1/2) [μ^(1/2)] B^(-1/2) r.

(5) Linear features of h:

    Φ̂ = Σ ĥ_i p_i = ĥ^T p   and   (δΦ)² = (c²/α) p^T [μ^(1/2)] B^(-1) [μ^(1/2)] p.

4.2  Vector coding

The matrix operations which are needed in various parts of the algorithm are

    B^(-1) b,   b^T B^(-1) b,   B^(-1/2) b,   trace(B^(-1) A),   det(B),

where B = I + A/α and A = [μ^(1/2)] C^T R^T [σ^(-2)] R C [μ^(1/2)]. Direct inversion of B and the evaluation of its determinant each require O(L³) operations. This would be impractical for large datasets, even if the O(L²) memory locations could be found for the list of elements. A practical large-scale algorithm can store only vector quantities, and its use of the transform routines C, C^T, R and R^T must be restricted to applying them (perhaps several times) to vectors. In order to evaluate B^(-1) b, the obvious, indeed only, vector operand is b itself, so that one becomes restricted to some fairly small n-dimensional subspace spanned by b, A b, A² b, ..., A^(n-1) b. Within this subspace, a vector y_n can be found which maximises

    Y = 2 y_n^T b − y_n^T B y_n.

As n increases, the subspace expands and y becomes less constrained, so that Y increases monotonically. Eventually, when n reaches L or perhaps before, Y reaches its greatest possible value

    Y₀ ≤ Y₁ ≤ Y₂ ≤ ··· ≤ Y_max = b^T B^(-1) b   at   y = B^(-1) b.

By construction, the lower limits on Y_max are optimal in terms of the information available in the given subspace. Although the conjugate gradient algorithm organises this calculation efficiently, we need to know when to stop it. For this, we introduce another vector z_n in the same subspace which maximises

    Z = 2 z^T A b − z^T B A z.

This calculation too can be organised by conjugate gradients, and we reach another sequence of optimal lower limits

    Z₀ ≤ Z₁ ≤ Z₂ ≤ ··· ≤ Z_max = b^T B^(-1) A b.



Now

    Y_max + Z_max/α = b^T B^(-1) (I + A/α) b = b^T b,

which is known on entering the calculations. Hence, just as Y₀, Y₁, Y₂, ... give successively improved optimal lower limits to Y_max, so do Z₀, Z₁, Z₂, ... lead to successively improved optimal upper limits. At any stage, we can write

    b^T B^(-1) b ∈ [Y₋, Y₊]   where   Y₋ = Y_n,   Y₊ = b^T b − Z_n/α.

We terminate the conjugate gradient calculations when n is large enough that the numerical uncertainty is sufficiently small:

    Y₊ − Y₋ ≤ ε Y₋,

where ε is whatever tolerance (say 0.01) may be desired. As a technicality, there is no need to perform the calculations of Y and Z independently. Both use the same base vectors, and they can be merged together. In fact, accuracy is best preserved by slaving both of them to an underlying "master" conjugate gradient maximisation of

    X = 2 x^T b − x^T A x.

B^(-1/2) b can be obtained similarly. The trace and determinant calculations can be carried out with the aid of one or more random normal vectors r, whose L components are correlated according to

    ⟨r r^T⟩ = I.

Consider

    Z = r^T B^(-1) A r.

This can be evaluated as above, using r as the input vector and obtaining Z ∈ [Z₋, Z₊], where

    Z₋ = Z_n,   Z₊ = α (r^T r − Y_n).

The expectation over r is

    ⟨Z⟩ = ⟨r^T B^(-1) A r⟩ = trace(B^(-1) A),

which is one of the quantities (αG) that we need. Moreover, the variance of this estimate is also available:

    (dev(Z))² = 2 trace(B^(-2) A²) ≈ 2 r^T B^(-2) A² r ≈ 2 z_n^T A² z_n.

Thus we can write

    αG = [Z₋, Z₊] ± dev(Z).

The random variation is additional to the numerical uncertainty caused by terminating the conjugate gradient calculation, and is part of the price we must pay for avoiding the direct, expensive calculation of the trace.


38

CHAPTER 4. ALGORITHMIC DETAILS

Although the error thus induced depends on the random vector r , it will be systematic if the same r is used repeatedly. Thus, provided the same vector is re-used each iterate, it will not affect the actual convergence of the maximum entropy calculation, though it will affect the precise position on the tra jectory to which the calculation converges. This may well not matter much. G is the dimensionless number of good measurements, and 1 the standard deviation of its estimate turns out to be only about (in fact, less than) G /2 . For large datasets, with many good measurements, this variation will be relatively small, and unimportant. In effect, the integrations over the complete cloud which are needed to find G are being performed by a Monte Carlo exploration, but the space is so large that, paradoxically, a single exploration is usually sufficient. In any case, the variation can always be reduced, in proportion to the square root, by averaging the results over several random vectors r . An extension of this procedure gives the determinant det(B ) in parallel with the evaluation of the trace. The determinant too has its numerical and random uncertainties, so that the evidence too is computed in the form Evidence = [P- , P+ ] ± dev(P ).

4.3

Data-space algorithm

The hidden-space algorithm outlined in Section 4.1 involves computations with L-dimensional vectors and matrices, ideally with L large to give high resolution. However, any central reconstruction h maximises Q = S - L so that Q = 0 . Hence for standard entropy h must be of the form (exponentiating componentwise) h = m exp(C T RT [
-1

] w)

for some N -dimensional w , actually equal to -1 [ -1 ] (D - R C h ). Similar expressions can be found for the other variants. By using w to define h , memory space is saved when computing at high resolution, and accuracy improves because h automatically lies within the N -dimensional subspace of possible solutions. The data-space algorithm, finding and using w , is obtained by translating the hidden-space steps directly into data space. (0) (1) Preliminary: h = m exp(C T RT [ Scalars: S (w ) = and L(w ) = 1 (D - R C h )T [ 2 (2) Iteration: w = ( I + A )-1 g and H = 1 g TB 2 with | r |2 = w T A w r where
2 0 -1 -2 -1

] w)

or variant. or variant

(h - m - h log(h /m ))

] (D - R C h ).

Ag


4.3. DATA-SPACE ALGORITHM

39

g = -w + [ and A = [ and
-1

-1

] (D - R C h )
-1

] R C [ µ ] C T RT [

]

B = I + A /. (3) Probabilities: Pr(D | , c) = (2 c2 )- and G = trace((B )-1 A ). For the classic options, G gives the stopping criterion, and Pr(D ) enables variables to be inferred. Samples from the posterior cloud Pr(h | D ): h =h+ (5) Linear features of h : = and
2 1 ()2 = c p T [µ /2 ] B -1
N / 2

det[

-1

] exp (S (h ) - L(h ))/c2 (det B )-

1 / 2

(4)

c2

1 / 2

[µ /2 ] B

1

-1 2 /

r.

h i pi = h T p [µ /2 ] p .
1



Part I I

Practice

41



Chapter 5

The MemSys5 software package
The MemSys5 package (mnemonic Maximum entropy method System, 5th generation) of Fortran 77 subroutines is designed to converge to the posterior probability cloud Pr(h | D ), for a wide variety of types of data D . Options are available for constant or variable Gaussian noise or Poisson errors, as well as for different stopping criteria. The purpose of the ma jor subroutine MEM5 is to perform one iterate of the data-space maximum entropy algorithm, initialising a new run if necessary. It updates the regularisation constant and the Lagrange multipliers w , finds the corresponding central reconstruction h = m exp(C T RT [ -1 ] w ) or the appropriate variant, and optionally calculates the overall evidence Pr(D | ) and the number of good measurements. It also controls progress down the maximum entropy tra jectory towards the desired stopping value. The purpose of subroutine MOVIE5 is to obtain samples from the posterior probability cloud in hidden space, optionally correlated with previous samples. From this, the user can display a movie in which successive frames show a sample reconstruction moving ergodically through the posterior density (Skilling, Robinson and Gull 1991). This important addition to the MemSys package allows the user to see whatever variability is present in the results. It also allows quantification of nonlinear functionals such as a maximum or a median (Charter 1991), simply by accumulating the values of such quantities over several independent frames. The purpose of the quantification subroutine MASK5 is to take a user-defined mask p and evaluate = h T p , returning its mean = h T p and its standard deviation . In this way linear features can be estimated without having to average over MOVIE5 samples. The hidden distribution h , visible distribution f , data D , multipliers w , and related quantities used in the algorithm, are stored in areas whose contents are manipulated by the MemSys5 package. Henceforward in this document, angle brackets denote an area. MemSys5 itself uses at most 14 areas, but up to 40 can be defined, to allow for extensions and development within the MemSys framework. The internal arithmetic is carried out to REAL precision throughout. 43


44

CHAPTER 5. THE MEMSYS5 SOFTWARE PACKAGE

There is, of course, more detail to be understood, but the basic way of using the system is as follows. Call MEINIT to initialise the system. Set up storage areas and data D . Optionally CALL MEMTRQ to check consistency of transforms. Repeat . . . CALL MEM5 . . . until converged to h , with f = C h . Repeat . . . CALL MOVIE5 . . . to generate ergodic samples h from Pr(h | D ). Repeat . . . CALL MASK5 . . . to quantify any desired features = h T p .


Chapter 6

Storage areas
6.1 Area structure

The package works with up to 40 areas, each of which may be held in memory or externally on disc or elsewhere at the user's discretion. By convention, areas 1 to 10 are hidden-space areas, each capable of holding a complete hidden distribution of L cells. Similarly, areas 11 to 20 are visible-space areas, each capable of holding a complete visible distribution of M cells. Areas 21 to 30 are data-space areas, each capable of holding a complete set of N data. The uses of the areas are summarised in Table 6.1. Any unused area is available for the user's own purposes, if required. The MemSys5 package uses the working space ST to perform vector operations on the areas. This working space ST is supplied by the user as an argument to the main routines of the package. The package requests access to an area by calling the user-supplied subroutine VFETCH before the operation, and relinquishes access to it by calling the user-supplied subroutine VSTORE when the operation is complete. There are no restrictions on the storage device or location in which the user holds an area when it is not being accessed by the package, and the address in ST at which it is made available to the package may be different on each occasion it is requested. The user can set the number of elements of an area supplied to the package at any one time (the block size) through VFETCH.

45


46

CHAPTER 6. STORAGE AREAS

Table 6.1: MemSys5 area usage. Area 1 2 3 4 11 Space H H H H V Use Current h Hidden sample v = h + h Working space for transforms Default model m Mask p 1 1 Hidden offset h = - /2 c [µ /2 ] B Working space for transforms Current f = C h Visible sample C v Data D Accuracies [ -1 ] Multipliers w Input/output O O I I
-1 2 /

Notes MEM5 MOVIE5 unless DEF > 0 MASK5 MOVIE5 MEM5 MOVIE5 MEM5 unless ACC > 0 Not needed as input for initial call to MEM5 if METHOD(2) = 1 if METHOD(2) = 2

r

O O O I I I

21 22 23

D D D

O

24 25 26 27 28 29

D D D D D D

Normalised residuals [ -1 ] (D - F ) 1 (D - F )/(D + 1 ) /2 Working space for transforms Working space Working space Working space Differential responsivity

O O

Needed only if response is nonlinear


6.2. INITIALISATION: MEINIT

47

6.2

Initialisation: MEINIT

Before using any of the routines of the package, the user must CALL MEINIT This is a compulsory first call, which initialises the system to a defined state. It may also be called to re-initialise the package completely, erasing the set-up and access to results from the previous run.

6.3

Area handling and overlaying

The package's storage handling has been designed to leave as much freedom as possible in the way that areas may be stored when not being accessed. All the areas must be kept separate, except for areas 2 , 11 , and 25 . Within the package, all the transforms C and C T between hidden space and visible space are between areas 2 and 11 , and the transforms R and RT between visible space and data space are between 11 and 25 . To the extent that the user's transforms can be done in place, these three areas may be overlaid. The maximum number of areas to which the package will request simultaneous access is four, in which case one of the areas will always be the transform area 25 . Hence the absolute minimum required in ST is workspace for four blocks, one for part of each of the accessed areas. If storage is limited to this, the user must transform directly between his external areas, although the whole of ST would be available as working space. For convenience, though, transforms are usually performed in memory within ST wherever possible. Assuming that all the other areas are stored elsewhere, ST must then be long enough to hold the transform areas, plus workspace for three more blocks. It is, of course, generally more convenient to store all the areas directly in ST if this can be done. It is possible to `switch out' one or more observations from the data vector D , by setting the corresponding element(s) of the accuracies in area 22 to zero. The system will then run as though these elements of D were absent. Since area 22 contains reciprocal standard deviations, setting an element 1/k to zero effectively sets the standard deviation of Dk to infinity, and so Dk has no influence on the calculation. One or more elements of h can be `switched out', i.e., set to zero, by setting the corresponding elements of the default model m in area 3 to zero.



Chapter 7

The vector library
MemSys5 is supplied as two source-code modules, memsys5 with its INCLUDE file, and vector. The central module memsys5, hereafter referred to as the `system', performs all scalar calculations, all high-level manipulation of the storage areas and the production of all diagnostics. This module is independent of any hardware considerations. All the actual vector arithmetic, and the addressing needed to control it, is handled by the subroutines in the vector module, comprising the vector library. This means that it is relatively straightforward to transfer MemSys5 to array processors, parallel systems, or other hardware architecture. The system allows areas to be accessed piecemeal in blocks, largely under the user's control so that best use can be made of a possibly limited amount of memory. Sequences of calls to the vector library always form a chain, in which one or more vector operations are performed on successive blocks of up to four areas at a time (in which extreme case one of the areas is always the transform area 25 ). The following example illustrates a typical sequence of calls. In it, the contents of 23 and 24 are being used to calculate new contents for 24 and 27 . Enter vector chain: Repeat . . . CALL fetch next CALL fetch next CALL fetch next begin at the start of each area. block of 24 (input values needed) block of 27 block of 23 (input values needed)

CALL first vector operation on block of 23 , 24 , 27 CALL second vector operation on block of 23 , 24 , 27 . . . CALL last vector operation on block of 23 , 24 , 27 of 23 of 24 (output values to be preserved) of 27 (output values to be preserved) exhausted: exit vector chain. CALL store block CALL store block CALL store block . . . until the areas are

The system will never mix calls between the different spaces (hidden, visible and data), so that all the areas in a chain should be of the same length. Every "fetched" area will subsequently be "stored", and conversely every area to be "stored" will first be "fetched". As illustrated, however, the calls to "fetch" the areas may be in any order, though the calls to "store" them will be in serial order. 49


50

CHAPTER 7. THE VECTOR LIBRARY

7.1

Area access routines VFETCH and VSTORE

VFETCH and VSTORE control the addressing, and are the only subroutines which require access to the user's pointer block COMMON /VPOINT/ which describes the user's storage structure. Hence, if the format of /VPOINT/ becomes inappropriate or inadequate, the only subroutines which need to be changed are VFETCH and VSTORE. As supplied, VFETCH and VSTORE have a rather complicated form which encodes dynamic control of blocked input/output to external storage outside the ST array. Dynamic control is in the interests of efficient I/O, but is not essential for proper operation of the system. Should the user wish to alter VFETCH and VSTORE, he must adhere to their specifications: SUBROUTINE VFETCH(ST,J,IOFF,ACTION, KCORE,LENGTH, MORE) REAL ST(0:*) arithmetic workspace array INTEGER J Input: area number INTEGER IOFF Input: first element of current block LOGICAL ACTION Input: contents to be read? INTEGER KCORE Output: start address of current block INTEGER LENGTH Output: number of elements in current block INTEGER MORE Input: start of new chain? SUBROUTINE VSTORE(ST,J,IOFF,ACTION, KCORE,LENGTH, MORE) REAL ST(0:*) arithmetic workspace array INTEGER J Input: area number INTEGER IOFF Input: first element of current block LOGICAL ACTION Input: contents to be preserved? INTEGER KCORE Input: start address of current block INTEGER LENGTH Input: number of elements in current block INTEGER MORE Output: any more elements? VFETCH enables access to a single block of area J , starting at element IOFF. This block may comprise either all or just part of the area. IOFF is controlled by the system. It will be zero for the first element of J , and will thereafter be incremented by LENGTH until all the elements of J have been accessed. VFETCH must return a storage address KCORE pointing to where the block may thereafter be found in ST. VFETCH must also return LENGTH, being the number of elements in the block. Thus on return from VFETCH, elements IOFF, . . . , IOFF + LENGTH - 1 of J are to be found in ST(KCORE), . . . , ST(KCORE + LENGTH - 1). Within a particular loop of a vector chain, the system will keep supplying the same offset IOFF, and VFETCH must keep returning the same LENGTH. If ACTION is .TRUE. then the system needs the current block of J for input, but if ACTION is .FALSE. the current values values are not needed. MORE is a status flag. On entry to each vector chain, the system will supply MORE = 0, but (sub ject to a restriction on its value as an output from VSTORE) the user is then free to use this flag for his own purposes. VSTORE relinquishes access to a previously "fetched" block of ST. For each area number J, the offset IOFF, start address KCORE and length LENGTH will all be the same as on return from the previous VFETCH call. If ACTION is .TRUE., then the contents of the block have been altered and must be preserved, but if ACTION is .FALSE. any new values need not be preserved. The system will, however, assume that any old values which were unchanged by the vector calls will remain in place. A call to VSTORE with ACTION = .FALSE. does not mean that the old values of the block may be altered. On exit from the final VSTORE in a loop, the system will interrogate the value of


7.1. AREA ACCESS ROUTINES VFETCH AND VSTORE

51

MORE. A zero value of MORE signals the end of a chain to the system, and VSTORE must provide this only at the correct time. After the system has relinquished access to a block with VSTORE, it will not attempt to access it again without first enabling it with VFETCH, so the relevant part of ST may be used for other purposes if J is not being permanently stored there. Although it is possible to include considerable sophistication in VFETCH and VSTORE if hardware considerations demand it, the routines do not have to be complicated. If all the areas are held permanently at fixed locations in ST, the following pair suffices. SUBROUTINE VFETCH(ST,J,IOFF,ACTION, KCORE,LENGTH, MORE) REAL ST(0:*) LOGICAL ACTION COMMON /VPOINT/ KL(40),KB(40) Area lengths KL and base addresses KB KCORE=KB(J) LENGTH=KL(J) END SUBROUTINE VSTORE(ST,J,IOFF,ACTION, KCORE,LENGTH, MORE) REAL ST(0:*) LOGICAL ACTION END For the implementations of VFETCH and VSTORE supplied in vector, the COMMON block /VPOINT/ is as follows: COMMON /VPOINT/ KL(40),KB(40),KA(40),KWORK,LWORK,LENREC In this scheme, the array KA is used to indicate whether area J is held in the ST array in memory (KA(J) = 0) or on disc (KA(J) = 1). The array KB again holds start addresses for the areas. If area J is held in ST, then KB(J) holds the start address in ST, as above. If area J is held on disc, then KB(J) holds the start address on a direct-access file (or some equivalent) with a record length of LENREC REALs. In this case, the start address is encoded in KB(J) as an element of a (suitably large) imaginary array, with every area starting at the beginning of a new record. If any areas are stored on disc, then contiguous workspace is required in ST for disc buffers, in which the package can access the requested portions of these areas. The start address of this workspace must be supplied in KWORK, and its length in LWORK. The minimum amount of workspace required in ST is discussed in Section 6.3 on page 47. The following code fragment illustrates how /VPOINT/ might be set up: COMMON /VPOINT/ KL(40),KB(40),KA(40),KWORK,LWORK,LENREC . . . LENREC = record length (in REALs) for disc I/O, typically 1024 elements DO 1 for each area J used IF ( J held in ST ) THEN KA(J) = 0 KB(J) = start address within ST ELSEIF ( J held on disc ) THEN KA(J) = 1 KB(J) = start address on disc


52

CHAPTER 7. THE VECTOR LIBRARY

ENDIF KL(J) = length of J 1 CONTINUE KWORK = start of disc buffer area in ST LWORK = length of disc buffer area in ST OPEN( UNIT = lun, ACCESS = 'DIRECT', . . . ) if any areas are on disc . . . Proceed to MemSys5 calls, etc.

7.2

Disc-handling routines UFETCH and USTORE

The implementations of VFETCH and VSTORE in vector call the user-supplied routines UFETCH and USTORE to perform disc I/O. SUBROUTINE UFETCH(ST,KCORE,KDISC,LENGTH) REAL ST(0:*) and SUBROUTINE USTORE(ST,KCORE,KDISC,LENGTH) REAL ST(0:*) should read or write a block of LENGTH REALs, starting at address KCORE in ST, from or to disc, starting at disc address KDISC. The disc address KDISC will always correspond to the start of a disc record. The number of elements LENGTH to be transferred will not necessarily be an integer multiple of the record length LENREC, but incomplete records will be read or written only at the end of a storage area. Examples of these routines are as follows: SUBROUTINE UFETCH(ST,KCORE,KDISC,LENGTH) REAL ST(0:*) COMMON /VPOINT/ KL(40),KB(40),KA(40),KWORK,LWORK,LENREC DO 1 I=0,(LENGTH-1)/LENREC READ (UNIT = lun, REC = 1+KDISC/LENREC+I) * (ST(KCORE+K),K=I*LENREC,MIN((I+1)*LENREC-1,LENGTH)) 1 CONTINUE END and SUBROUTINE USTORE(ST,KCORE,KDISC,LENGTH) REAL ST(0:*) COMMON /VPOINT/ KL(40),KB(40),KA(40),KWORK,LWORK,LENREC DO 1 I=0,(LENGTH-1)/LENREC WRITE (UNIT = lun, REC = 1+KDISC/LENREC+I) * (ST(KCORE+K),K=I*LENREC,MIN((I+1)*LENREC-1,LENGTH)) 1 CONTINUE END


7.3. ARITHMETICAL VECTOR ROUTINES

53

7.3

Arithmetical vector routines

All the vector arithmetic controlled by the package, apart from the transforms, is performed by the arithmetic routines in the vector library. Only these routines have to be replaced if the arithmetic is to be performed somewhere other than in the array ST in memory, on an array processor or such hardware. Several of these arithmetical routines are simple, and commonly available in external libraries supplied with array processing hardware. VFILL fills a block with a scalar. VMOV moves a block to another location. VSWAP interchanges two blocks. VMUL multiplies two blocks. VDIV divides two blocks. No denominator will be zero. VLOG returns the logarithm of a strictly positive block. VSQRT returns the square root of a non-negative block. VSIGN returns the signs (+1.0 or -1.0) of a block. VSADD adds a scalar to a block. VSMUL multiplies a block by a scalar. VSMULA multiplies a block by a scalar, then adds a second block. VSUM returns the sum of the elements of a block. As supplied, VSUM uses a butterfly pattern of addition to preserve full arithmetical accuracy. Whilst this can be useful on occasion, especially with large-scale problems where rounding errors might otherwise accumulate, it is seldom essential. VDOT returns the scalar product of two blocks. As supplied, VDOT, like VSUM, uses a butterfly pattern of addition to preserve full arithmetical accuracy. The `VRAND' group of routines is more specific to MemSys5, and might need special coding in terms of external library routines. Such coding might involve extra "dummy" buffers to hold blocks of intermediate results. Such buffers could safely overlay areas 27 and 28 . VRAND fills a block with samples from the unit normal distribution. VRAND0 initialises the generator with an integer seed. VRAND1 saves the generator state. VRAND2 restores the generator state. As supplied, the generator state is stored in COMMON /VRANDC/ P1,P2,P3,P4,P5,P6


54

CHAPTER 7. THE VECTOR LIBRARY

Finally, the last arithmetical routine VMEMX sets any nonlinear corrections to the responsivity transform (METHOD(3) = 2). As supplied, this routine calls the user's SUBROUTINE UMEMX for each element of the area. For efficient vectorisation, the code for UMEMX should be pulled into an appropriately recoded version of VMEMX. VMEMX sets the nonlinear option.


Chapter 8

Transform routines
8.1 Transform subroutines OPUS, TROPUS, and UMEMX

The user must supply two subroutines OPUS and TROPUS which transform between visible space and data space. SUBROUTINE OPUS(ST,JM,JN) should apply R to the visible-space distribution from area JM to calculate the corresponding data-space quantities
M

Fk =
j =1

Rkj fj

or

F = Rf

and write them to data area JN . Areas JM and JN can be, and usually are, of different length. SUBROUTINE TROPUS(ST,JN,JM) should apply RT to data-space quantities from area JN to calculate the corresponding visible-space quantities
N

fj =
k=1

Rk j F

k

or

f = RT F

and write them to area JM . In these formulae, R is the response function of the apparatus producing the data, which is assumed constant, as for a linear experiment giving D = Rf ± The package always calls OPUS and TROPUS with JM = 11 and JN = 25, but the explicit arguments remain in place in order to preserve flexibility and compatibility with earlier maximum entropy programs. Clearly it will usually be convenient to have both area JM and area JN in ST, and possibly overlaid if the transform coding allows it, so that a simple OPUS routine could be constructed as follows. SUBROUTINE OPUS(ST,JM,JN) REAL ST(0:*) 55


56

CHAPTER 8. TRANSFORM ROUTINES

COMMON /VPOINT/ KL(40),KB(40) CALL VOPUS(ST,KB(JM),KB(JN),KL(JM),KL(JN)) END SUBROUTINE VOPUS(ST,JU,JV,M,N) REAL ST(0:*) Matrix multiply DO 2 K = 0,N-1 X = 0.0 DO 1 J = 0,M-1 X = X + TransformMatrix(K,J) * ST(JU+J) 1 CONTINUE ST(JV+K) = X 2 CONTINUE END The routine TROPUS would be similar SUBROUTINE TROPUS(ST,JN,JM) REAL ST(0:*) COMMON /VPOINT/ KL(40),KB(40) CALL VTROP(ST,KB(JN),KB(JM),KL(JN),KL(JM)) END SUBROUTINE VTROP(ST,JV,JU,N,M) REAL ST(0:*) Transpose matrix multiply DO 1 J = 0,M-1 X = 0.0 DO 2 K = 0,N-1 X = X + TransformMatrix(K,J) * ST(JV+K) 1 CONTINUE ST(JU+J) = X 2 CONTINUE END These examples assume a storage scheme similar to the one suggested in Section 7.1 on page 51, in which the lengths and start addresses of areas in memory are held in KL and KB, and the elements are stored sequentially. Any quantities other than lengths and start addresses must be passed to the transform routines in a separate COMMON block. Nonlinearities of the type Dk = (Fk ) ±
k

where is some known and reasonably well-behaved function can be tolerated by the algorithm, provided that area 29 is defined, and that a subroutine UMEMX is supplied which calculates the values of and its derivative for a given value of Fk . SUBROUTINE UMEMX(F,PHI,DPHI) REAL F,PHI,DPHI should set PHI = (F ) and DPHI = d(F )/dF . The package always calls UMEMX with its three arguments at different locations.


8.2. INTRINSIC CORRELATION FUNCTION ROUTINES ICF AND TRICF

57

8.2

Intrinsic correlation function routines ICF and TRICF

The user is also required to supply two routines ICF and TRICF which are analogous to OPUS and TROPUS. SUBROUTINE ICF(ST,JL,JM) should apply the ICF operator C to the hidden-space distribution from area JL to calculate the corresponding visible-space quantities
L

fj =
i=1

Cj i h

i

or

f =Ch

and write them to visible area JM . Likewise, SUBROUTINE TRICF(ST,JM,JL) should apply C T to visible-space quantities from area JM to calculate the corresponding hiddenspace quantities
M

hi =
j =1

Cj i fj

or

h = CTf

and place the result in hidden-space area JL . There is no need for JL and JM to be of equal length for these routines. Within the MemSys5 package, ICF and TRICF are always called with JL = 2 and JM = 11. Examples of ICF and TRICF are supplied with the MemSys5 package. For the user's convenience, these can operate in-place.



Chapter 9

Diagnostics and utilities
9.1 Diagnostics routine UINFO

The MemSys5 package itself performs no I/O, but instead directs all its diagnostic output through calls to the user-supplied subroutine UINFO, which should be as follows: SUBROUTINE UINFO(STRING,MLEVEL) INTEGER MLEVEL CHARACTER*(*) STRING The character string STRING contains the diagnostic message, and MLEVEL is the diagnostic type, as shown in Table 9.1. Diagnostics in the internal box (MLEVEL = 10) are generally found to be the most useful. Table 9.1: Coding of INFO calls with MLEVEL. Type Progress MLEVEL 1 2 Numerical 10 20 Interpretation Names of main routines as they are called Flowchart of area input/output Descriptor with triple equals sign === Internal technical

An internally documented example of UINFO is supplied with the MemSys5 package. When UINFO is called with MLEVEL = 1, STRING will contain the name of one of the main internal routines of the package, together with any area numbers which that routine takes as arguments. For example, MeCopy(25,21) indicates that MECOPY has been called to copy the contents of area 25 to 21 . 59


60

CHAPTER 9. DIAGNOSTICS AND UTILITIES

When UINFO is called with MLEVEL = 2, STRING will contain one line of the flowchart showing area usage by the package in the vector operation just finished. It is often helpful to read this in conjunction with the preceding MLEVEL = 1 call, which indicates the subroutine involved. For example, the flowchart string following the above call was: .......... .......... W...R..... ..........

This represents the status of each of the package's 40 areas, arranged in the four groups of ten: 1 to 10 , 11 to 20 , 21 to 30 and 31 to 40 . The area is represented by a dot `.' if no operation has occurred. If the area has been read, then `R' appears. In the above example, MECOPY has read the contents of area 25 . If the area has been written to, then `W' appears. In this example, MECOPY has written to area 21 . If the contents of the area have been read and written in the same vector operation, then `U' (Update) appears. Finally, a semicolon `:' would denote an area used as temporary workspace. UINFO calls with MLEVEL = 10 contain the main scalars calculated by MEM5, for example Entropy === -3.3419E+03 Test === .0004 Chisq === 3.3359E+01

These values are returned also in the output arguments of MEM5, in this case S, TEST and CHISQ. UINFO calls with MLEVEL = 20 contain detailed technical diagnostic information about the package's internal functioning. This will generally be of little concern to the user.

9.2

Transform checking subroutine MEMTRQ

A common error in programming transform routines is inconsistency between ICF and TRICF, or OPUS and TROPUS, which arises if these routines do not represent each other's transpose. Matrices Q and T are each other's transpose if and only if uT T v = vT Q u for all vectors u , v .

Subroutine MEMTRQ (acronym MEM TRopus Query) enables this relationship to be checked for individual pairs of vectors u , v in areas 1 and 26 respectively. CALL MEMTRQ(ST,ISEED,ERR) If the INTEGER ISEED is greater than zero, then MEMTRQ will use the package's internal random generator to set up areas 1 and 26 . If ISEED is less than or equal to zero, MEMTRQ will use 1 and 26 as supplied by the user. The routine will return the dimensionless REAL ERR: ERR = where x = Opus ( ICF (u ) ) and y = TrICF ( Tropus (v ) ) . If OPUS/TROPUS and ICF/TRICF have been consistently coded, the value of ERR should be appropriate to the rounding errors expected in the transforms. Most inconsistencies show up as considerably larger errors. Sometimes, addressing errors show up only after a second call to a routine, so that a repeated call to MEMTRQ is recommended. Of course, MEMTRQ cannot detect all errors. Subroutines can be consistent, but still wrong. | uT y - vT x | (|u | |y | |v | |x |)
1 / 2


9.3. SAVE AND RESTORE UTILITY: MESAVE AND MEREST

61

9.3

Save and restore utility: MESAVE and MEREST

This facility allows the user to stop a MemSys5 run after any iterate, and preserve outside the package the information necessary to restart the run from exactly the stage at which it was halted. It can also be used, for example, to store the information from a converged MEM5 run so that one of the other routines, such as MOVIE5 or MASK5, can be run from a separate program. To save the information from a run, the user should CALL MESAVE This will cause a call to the user-supplied routine USAVE which should be as follows: SUBROUTINE USAVE(INTS,NINTS,REALS,NREALS) INTEGER NINTS,NREALS,INTS(NINTS) REAL REALS(NREALS) The user is then required to save the contents of the arrays INTS, and REALS externally, perhaps on disc. The values of the input arguments NINTS and NREALS will be approximately 9 and 33 respectively, depending on the MemSys5 version being used. The vector storage areas 3 (if DEF 0), 21 , 22 (if Poisson or ACC 0), 23 (and 29 if using nonlinear response) must also be preserved at this stage. The user will also need to save his storage addressing information in /VPOINT/, as well as any other variables passed in COMMON blocks to the transform routines. A call to MESAVE during a run has no effect on that run, which may be continued if desired. In order to restart the package, the user should CALL MEREST This will result in a call to the user-supplied routine SUBROUTINE UREST(INTS,NINTS,REALS,NREALS) INTEGER NINTS,NREALS,INTS(NINTS) REAL REALS(NREALS) in which NINTS and NREALS are input REALS the values supplied by USAVE. also be restored, along with /VPOINT/ Examples of USAVE and UREST are arguments. The user should restore to the arrays INTS and The contents of the vector storage areas listed above must and any other COMMON blocks. supplied with the package.

9.4

Error messages

All the errors detected within the MemSys5 package are deemed to be fatal, and so if an error is encountered the package will STOP with one of the following error messages: Illegal MEMRUN value MEMRUN is an INTEGER switch for initialisation or continuation of a MEM5 run, and must be set to a value between 1 and 4 inclusive when calling MEM5. See page 67. Illegal METHOD(0) value METHOD(0) is an INTEGER which determines the stopping criterion, and must be set to a value between 1 and 4 inclusive when calling MEMSET. See page 65.


62

CHAPTER 9. DIAGNOSTICS AND UTILITIES

Illegal METHOD(1) value METHOD(1) is an INTEGER which determines the type of entropy to be used, and must be set to a value between 1 and 5 inclusive when calling MEMSET. See page 65. Illegal METHOD(2) value METHOD(2) is an INTEGER which determines the type of likelihood (Gaussian or Poisson) to be used, and must be set to either 1 or 2 before calling MEMSET. See page 65. Illegal METHOD(3) value METHOD(3) is an INTEGER which determines the type of response (linear or nonlinear) to be used, and must be set to either 1 or 2 before calling MEMSET. See page 66. Method incompatible with Poisson statistics Poisson likelihood cannot be used with automatic noise scaling. Illegal NRAND value NRAND is an INTEGER which sets the number of random vectors used to calculate G and Evidence for the classic maximum entropy stopping criteria. If one of these criteria is used, NRAND must be set greater than zero before calling MEMSET. See page 66. Illegal RATE value RATE is a REAL distance limit for a MEM5 iterate. It must be set greater than 0.0 before calling MEMSET. See page 66. Illegal AIM value AIM is the required REAL value of the stopping criterion. It must be set greater than or equal to 0.0 before calling MEMSET. See page 66. Illegal UTOL value UTOL is the dimensionless REAL tolerance demanded of various computations. It must be set between 0.0 and 1.0 inclusive before calling MEMSET. See page 66. Illegal area number MemSys storage areas are numbered from 1 to 40 inclusive. See page 45. TOLerance determination failed - see MEINIT source code The package attempts to estimate the computer's arithmetic tolerance TOL, i.e., the smallest positive number for which (1.0+TOL) .GT. 1.0 (or (1.0D0+TOL) .GT. 1.0D0 for DOUBLE PRECISION) is .TRUE., in MEINIT. If this calculation fails, then TOL may be set explicitly by editing the source code in MEINIT, for example TOL = 1.0E-7 for REAL or TOL = 1.0D-16 for DOUBLE PRECISION. The exact value is not critical. MEVRND needs separate output and work areas When generating correlated random vectors, MEVRND requires an area as workspace. This area must not be the same one as the output area. This error will not arise from the MemSys5 package's use of the routine. In addition the following errors are detected by the storage-handling routines:


9.5. MISCELLANEOUS SUBROUTINES

63

Inconsistent lengths All the areas involved in a vector operation must be of the same length. Not enough space for buffers If areas are stored outside the workspace array ST, then LWORK must be large enough to allow temporary buffers to be set up in ST. See page 51. Stack overflow The maximum number of buffers in the workspace array ST which can be used simultaneously is four, and requests to VFETCH for further buffers will cause this error. It will not arise from the MemSys5 package's use of this routine. See page 51. Stack underflow This indicates an error in the use of the storage-handling routines, such as a call to VSTORE which was not preceded by a corresponding call to VFETCH. This error will not arise from the MemSys5 package's use of these routines. See page 51.

9.5

Miscellaneous subroutines

MECOPY and MESWAP
The user may find the following routines useful for copying data from one area to another and interchanging the contents of two areas. CALL MECOPY(ST,J,K) will copy the contents of area J to K . If J = K, then no operation occurs. A common use of this facility is for loading an externally-stored area. The values may easily be read into a transform area held in memory in ST, and then copied to a less accessible destination area with MECOPY. Thus the data file could be read into 25 , before copying it to 21 which might be stored on disc. CALL MESWAP(ST,J,K) will interchange the contents of areas J and K . If J = K, then no operation occurs. MESWAP can be used for diagnostic purposes, swapping a disc area into memory for inspection, before returning it non-destructively to disc. The MemSys5 package itself, however, makes no use of any SWAP routines.

MEMICF and MTRICF
CALL MEMICF(ST,JL,JM) will apply the ICF operator C to JL , putting the result in JM . The routine uses the standard transform areas 2 and 11 as workspace. This enables a hidden-space area to be viewed in visible space. Conversely CALL MTRICF(ST,JM,JL) will apply C
T

to JM , putting the result in JL , again using 2 and 11 as workspace.



Chapter 10

Sp ecifications of the ma jor subroutines
10.1 MEMSET arguments

The MemSys control variables are set by CALL MEMSET(METHOD,NRAND,ISEED,AIM,RATE,DEF,ACC,UTOL) which must be called before MEM5. The package's internal random generator is also re-initialised by every MEMSET call. All the arguments are input variables. In accordance with Fortran typing, METHOD, NRAND and ISEED are INTEGERs, and the other arguments are REALs. Input variables should be set as follows: METHOD is an INTEGER array for setting options, encoded as follows: METHOD(0) sets the stopping criterion: 1 Classic maximum entropy 2 Classic automatic, with noise scaling 3 Ad hoc , fixed and specified by AIM 4 Historic `2 = N ' maximum entropy METHOD(1) sets the type of entropy: 1 Standard entropy 2 Positive/negative entropy 3 Fermi­Dirac entropy 4 Quadratic `entropy' 5 Standard entropy with METHOD(2) sets the type of noise: 1 Gaussian statistics 2 Poisson statistics
i

INTEGER

hi =

i

mi .

65


66

CHAPTER 10. SPECIFICATIONS OF THE MAJOR SUBROUTINES

METHOD(3) allows some nonlinearity 1 response is linear in h 2 response is nonlinear, given by SUBROUTINE UMEMX NRAND is the number of random vectors to be used to calculate the Evidence. NRAND is needed for the "classic" options METHOD(0) = 1 and METHOD(0) = 2, and is normally set to 1. is the seed for the internal pseudo-random number generator, which is re-initialised by every MEMSET call. ISEED is needed for the "classic" options METHOD(0) = 1 and METHOD(0) = 2, and is normally not changed between iterates. is the required value of the stopping criterion 1/. For "classic" and "historic" options METHOD(0) = 1, 2, and 4, AIM ought to be set to 1.0. When setting explicitly ("ad hoc " METHOD(0) = 3), AIM is the required value of . In all cases, setting AIM to a smaller value forces the algorithm to proceed further down the tra jectory towards fitting the data more closely. is a dimensionless distance limit. RATE gives a conservative upper bound, normally set to a value of 1 order 1.0, to the trust region radius r0 = RATE â (trace[ µ ]) /2 . gives the default level. If DEF > 0.0, then mi = DEF (or mi /(1 - mi ) = DEF for Fermi­Dirac entropy) for i = 1, 2, . . . , L. Otherwise, m (or m /(1 - m ) for Fermi­Dirac) is assumed to be set up in area 3 . Individual values of mi must be positive or zero. gives the accuracies, for Gaussian likelihood only. If ACC > 0.0, then 1/k = ACC for k = 1, 2, . . . , N . Otherwise, -1 is assumed to be set up in area 22 . Individual values of 1/k must be positive or zero. is a dimensionless tolerance. UTOL defines the accuracy demanded of the intermediate conjugate gradient results, and of the overlap between the computed and true probability clouds. UTOL is normally set about 0.1, or less for higher precision. INTEGER, > 0 if used INTEGER

ISEED

AIM

REAL 0.0

RATE

REAL > 0.0

DEF

REAL

ACC

REAL

UTOL

REAL between 0.0 and 1.0

The current values of the variables may be recovered by CALL MEMGET(METHOD,NRAND,ISEED,AIM,RATE,DEF,ACC,UTOL) where the types and dimensions of each of the arguments are as for MEMSET.

10.2

MEM5 arguments

The main maximum entropy routine MEM5 is called by


10.2. MEM5 ARGUMENTS

67

CALL MEM5(ST,MEMRUN, * S,TEST,CHISQ,SCALE,PLOW,PHIGH,PDEV, * GLOW,GHIGH,GDEV,OMEGA,ALPHA,ISTAT,NTRANS) The first two arguments are input variables, and the rest are output diagnostics (none of which needs to be preserved between iterates). In accordance with Fortran typing, MEMRUN, ISTAT and NTRANS are INTEGERs, and the other arguments are REALs. Each call may involve up to a few tens of transforms, depending on the difficulty of the problem. The input arguments are as follows: ST MEMRUN is the vector arithmetic workspace. is a switch for initialisation or continuation: 1 initialises a new run, starting from h = m . Enter with contents of 21 . 2 continues the previous run. Enter with contents of 21 , 23 . 3 restarts the previous run with different METHOD. Enter with contents of 21 , 23 . 4 finds diagnostics only. The arguments S, TEST, CHISQ, SCALE, OMEGA (and PLOW, PHIGH, PDEV, GLOW, GHIGH, GDEV if appropriate) are returned for the current w . The corresponding h and f are regenerated in areas 1 and 11 respectively. The argument ALPHA will be set to its current value, and the components code2 (distance limit), code3 (change in ALPHA) and code5 (cloud mismatch accuracy) of ISTAT will be set to zero. The remaining components of ISTAT will be set according to the diagnostics calculated. The total number of transforms applied since the last call with MEMRUN = 1 will be returned in NTRANS. Enter with contents of 21 , 23 . In all cases, area 3 must also be set up by the user if DEF 0. For Gaussian errors, area 22 must be set up by the user if ACC 0. For Poisson likelihood, 22 must be provided but its contents will be supplied and manipulated by the package. For nonlinear response, 29 must be provided, but its contents will be supplied and manipulated by the package. Areas 3 , 22 and 29 must be preserved between iterates if they are in use. Output arguments are interpreted as follows: S is the entropy of the input h . REAL 0.0 REAL INTEGER


68

CHAPTER 10. SPECIFICATIONS OF THE MAJOR SUBROUTINES

TEST

is 1 - cos for the input h . is the angle between the gradients of entropy and 2 . TEST is 0 on the tra jectory and less than 1 whenever the angle is acute. is 2L (equal to 2 for Gaussian errors) for the input h . If the noise is being rescaled (METHOD(0) = 2), CHISQ incorporates this re-scaling, and at the end of the run will equal the number of "bad" measurements. is c, the scaling factor for the noise (classic automatic only). SCALE multiplies the initial estimates of the noise standard deviations by c, and will be 1.0 if the noise is not being rescaled (METHOD(0) = 2). is the numerical lower limit on loge Pr(D | ) for the input h (classic options only). This value incorporates any rescaling of the noise. is the numerical upper limit on loge Pr(D | ) for for the input h (classic options only). This value incorporates any rescaling of the noise. is the random vector standard deviation of loge Pr(D | ) for the input h (classic options only). This value incorporates any rescaling of the noise. is the numerical lower limit on G, the number of "good" measurements for the input h (classic options only). is the numerical upper limit on G, the number of "good" measurements for the input h (classic options only). is the random vector standard devation of G, the number of "good" measurements for the input h (classic options only). is â AIM, the rescaled stopping criterion for the input h . (Note the distinction between the MEM5 argument OMEGA and the algebraic quantity .) OMEGA should rise close to 1.0 at the end of the run. is , the regularisation constant used for the new w .

REAL between 0.0 and 2.0 REAL 0.0

CHISQ

SCALE

REAL > 0.0

PLOW

REAL

PHIGH

REAL

PDEV

REAL 0.0 REAL 0.0 REAL 0.0 REAL 0.0 REAL 0.0

GLOW

GHIGH

GDEV

OMEGA

ALPHA

REAL > 0.0


10.3. MOVIE5 ARGUMENTS

69

ISTAT

is a return code. ISTAT = 0 indicates that the run has successfully ended to within the desired tolerance UTOL. Exceptionally, this occurs immediately if OMEGA 1.0 on setup, when the routine will return with h = m , and the following diagnostics defined: S, TEST, CHISQ, SCALE, OMEGA (and PLOW, PHIGH, PDEV, GLOW, GHIGH, GDEV if appropriate) for h = m . The argument ALPHA will be undefined. Otherwise, ISTAT = 64 â code6 + 32 â code 8 â code3 + 4 â code code0 .
5 2

INTEGER 0

+ 16 â code + 2 â code

4 1

+ +

For each of the binary components code0 to code6 a value of zero denotes `acceptable' or `no change', and a value of one denotes the opposite. Values of zero correspond to the following for each of the individual components: co co co co co co co de de de de de de de
0 1 2 3 4 5 6

H OMEGA ALPHA TEST H G

cloud mismatch is less than UTOL â G is within UTOL of unity distance limit not invoked for ALPHA not changed 1 cloud mismatch was found to within UTOL â H `good' degrees of freedom found to within UTOL â G INTEGER >0

NTRANS

is the number of transforms (calls to ICF/OPUS or TRICF/TROPUS) performed within the package since the last call to MEM5 with MEMRUN = 1.

On each return from MEM5 the new hidden and visible distributions h and f will be in areas 1 and 11 respectively. Note that the probabilistic quantities PLOW, PHIGH, PDEV, GLOW, GHIGH, GDEV, SCALE, and the return code code6 , are calculated only for the "classic" options METHOD(0) = 1 and METHOD(0) = 2. Otherwise, they are returned with standard harmless values PLOW = PHIGH = PDEV = 0.0, GLOW = GHIGH = GDEV = 0.0, and SCALE = 1.0, code6 = 0.

10.3

MOVIE5 arguments
CALL MOVIE5(ST,NCORR,KSTAT,NTRANS)

The random-sample routine MOVIE5 is called by

In accordance with Fortran typing the arguments NCORR, KSTAT and NTRANS are INTEGERs. MOVIE5 is designed to be called after MEM5 has converged, and each call is likely to be about as expensive as an iterate of MEM5. If NCORR 0 the routine will use whatever vector r is supplied in area 4 as input, otherwise the routine itself will set up a random vector r , with optional correlation with previous such vectors. The multipliers w in area 23 must be retained from the preceding MEM5 run, together with 3


70

CHAPTER 10. SPECIFICATIONS OF THE MAJOR SUBROUTINES

for the model (if DEF 0.0), 22 for the accuracies (if ACC 0.0 or METHOD(2) = 2), and the nonlinear area 29 (if METHOD(3) = 2). On return, 4 will contain a vector h = c 1 / 2 ]B 1 [µ /2
-1 2 /

r,

1 will contain v = h + h and 11 will contain C v . If r was a random vector, v will be a random sample from the hidden posterior probability cloud and C v will be the corresponding sample from the visible posterior cloud. The variables are interpreted as follows: ST NCORR is the vector arithmetic workspace. determines the random vector r used as input. If NCORR 0 the routine will use the user-supplied vector in area 4 as input, otherwise the routine itself will set up r , with optional correlation with previous random vectors. If NCORR = 1, r will be uncorrelated with previous vectors, but if NCORR > 1, then r will be correlated with them as follows: during several successive calls to the routine with the same value of NCORR (greater than zero), (m) (m ) let the mth and m th vectors be ri (i = 0, . . . , L - 1) and ri (i = 0, . . . , L - 1). All values are individually from N (0, 1), but have covariance structure r
(m) (m ) ri i

REAL INTEGER

=

max(1 - |m - m |/NCORR, 0), if i = i ; 0, if i = i .

For example, NCORR = 3 gives (for each i = i ) covariance


r

(m) (m )

r

=1 3

3 2 1 0 0

2 3 2 1 0

1 2 3 2 1

0 1 2 3 2

0 0 1 2 3



between the first five samples. NCORR = 1 would have given the uncorrelated identity. KSTAT is the return code of evaluation of the offset vector h . On writing KSTAT = 2 â code1 + code0 , then for each of the binary components code0 and code1 a value of zero denotes `acceptable', and a value of one denotes the opposite. Values of zero correspond to the following for each of the individual components: code code
0 1

INTEGER 0

conjugate gradient converged successfully no inconsistency detected in calculation of h .


10.4. MASK5 ARGUMENTS

71

NTRANS

is the number of transforms (calls to ICF/OPUS or TRICF/TROPUS) performed within the package since the last call to MEM5 with MEMRUN = 1.

INTEGER >0

10.4

MASK5 arguments

The inference routine MASK5 is called by CALL MASK5(ST,AMEAN,ERRLO,ERRHI,JSTAT,NTRANS) The arguments (apart from ST) are all output variables. In accordance with Fortran typing, JSTAT and NTRANS are INTEGERs, and the other arguments are REALs. MASK5 is designed to be called after MEM5 has converged, and each call is likely to be somewhat less expensive than an iterate of MEM5. A hidden mask p must be set up in area 4 : this will be overwritten by the routine. Commonly, this will be a visible mask, transformed into the hidden area 4 by a call to MTRICF. The multipliers w in area 23 must be preserved from MEM5, together with 3 for the model (if DEF 0.0), 22 for the accuracies (if ACC 0.0 or METHOD(2) = 2), and the nonlinear area 29 (if METHOD(3) = 2). The arguments are interpreted as follows: ST AMEAN ERRLO is the vector arithmetic workspace. is , the value of the required integral h T p . is the numerical lower limit on , the standard deviation of . is the numerical upper limit on , the standard deviation of . is the return code of evaluation of : 0 Standard deviation was estimated accurately 1 Standard deviation may not have been estimated accurately NTRANS is the number of transforms (calls to ICF/OPUS or TRICF/TROPUS) performed within the package since the last call to MEM5 with MEMRUN = 1. INTEGER >0 REAL REAL REAL 0.0 REAL 0.0 INTEGER 0

ERRHI

JSTAT

For some masks p , the estimation of can be intrinsically very ill-conditioned, and prone to serious rounding error. Thus the value of the return code JSTAT should be checked. If MASK5 returns with JSTAT = 1 and relaxing UTOL (by a call to MEMSET) does not solve the problem, the only palliative may be full DOUBLE PRECISION operation (see `Arithmetic accuracy' on page 88).



Chapter 11

Example of simple use
A self-documented "TOY" deconvolution simulation is supplied with copies of MemSys5. Schematically, it operates as follows, using the basic steps common to maximum entropy driving programs. 1. Call MEINIT to initialise the package. 2. Request user input (here METHOD, NRAND, AIM, RATE, UTOL and ICF width for flexible testing). 3. Initialise the storage area management. ("TOY" has 64-cell areas, broken into multiple blocks for illustration.) 4. Call MEMTRQ to check that ICF/TRICF and OPUS/TROPUS are consistent (this step is optional). 5. Read the data into area 21 from the appropriate disc file, set up the default m (or m /(1 -m ) for Fermi­Dirac entropy) in DEF or 3 , and, for Gaussian likelihood, the accuracies [ -1 ] in ACC or 22 . The data are from a simulation in which a 64-cell visible-space distribution is convolved with a square point-spread function of width five cells. Three files of data are provided: gauss.dat in which Gaussian noise with standard deviation 10 units is added to produce the dataset shown in Figure 11.1 (for METHOD(2) = 1), poiss.dat in which Poisson noise is added (for METHOD(2) = 2), and fermi.dat in which the data are scaled suitably for Fermi­Dirac entropy (METHOD(1) = 3). 6. CALL MEMSET(METHOD,...) MEMRUN=1 Repeat . . . CALL MEM5(ST,MEMRUN,...) MEMRUN=2 . . . until (ISTAT.EQ.0) to set up the control variables for MEM5 to make MEM5 start a new run to update w and maybe find Pr(D | ) and G to make MEM5 continue the run repeat until MEM5 has converged

7. Interrogate and/or output the results. 8. Repeat . . . CALL MOVIE5(..) . . . until (finish) Repeat . . . Set mask CALL MTRICF(..) CALL MASK5(..) . . . until (finish) to obtain a sample from Pr(f | D ) repeat for as many samples as desired to define a linear feature of f to generate hidden-space mask to estimate the feature, with error bar repeat for as many features as desired 73

9.


74

CHAPTER 11. EXAMPLE OF SIMPLE USE

800

600 Intensity 400

200

0 20 Cell number Figure 11.1: Noisy blurred data (gauss.dat). 40 60


11.1. EXAMPLE OF CLASSIC MAXIMUM ENTROPY

75

11.1

Example of classic maximum entropy

For the MEM5 run to be discussed, the arguments of MEMSET were as follows: METHOD(0) METHOD(1) METHOD(2) METHOD(3) AIM RATE UTOL NRAND = = = = = = = = 2 1 1 1 1.0 1.0 0.01 1 classic with automatic noise scaling standard entropy Gaussian likelihood linear response stop at = 1.0 dimensionless distance limit dimensionless tolerance number of random vectors

The MEM5 diagnostics, reported in UINFO calls with MLEVEL = 10, were as follows, ending with statements of Good = G and Evidence = 10 log10 (Pr(D | )), and a numerical printout of the central reconstruction f . Note: The exact values produced by TOY may differ between computers, though they should always converge to similar results. Fractional OPUS/TROPUS inconsistency Fractional OPUS/TROPUS inconsistency Fractional OPUS/TROPUS inconsistency Iteration 1 Entropy === 0.0000E+00 LogProb === Omega === Ntrans === -3.3657E+02 .213183 7 = = = 1.6015E-08 4.5856E-09 1.0006E-08

Test === 1.0000 Code === dist === 0 .6809

Chisq Scale Good Alpha Code

=== === === === ===

6.4000E+01 4.6527E+00 0.0000E+00 3.7872E-01 001010

Iteration 2 Entropy === -1.2811E+03 LogProb === Omega === Ntrans === -3.3540E+02 .084627 21

   Test ===  .4970   Code === 0   dist === .2615
   Chisq === 1.3494E+01   Scale === 4.3833E+00   Good === 4.2742E+00   Alpha === 2.2674E-01   Code === 001011
Iteration  3   Entropy === -8.5499E+02   LogProb === -3.1733E+02   Omega === .144363   Ntrans ===  37
   Test ===  .0106   Code === 0   dist === .1967
   Chisq === 2.7713E+01   Scale === 3.2688E+00   Good === 5.2385E+00   Alpha === 1.2807E-01   Code === 001011
Iteration  4   Entropy === -1.1799E+03   LogProb === -3.0820E+02   Omega === .172493   Ntrans ===  54
   Test ===  .0021   Code === 0   dist === .1697
   Chisq === 2.4730E+01   Scale === 2.7741E+00   Good === 6.7738E+00   Alpha === 7.7915E-02   Code === 001011
Iteration  5   Entropy === -1.4613E+03   LogProb === -2.9980E+02   Omega === .201272   Ntrans ===  75
   Test ===  .0032   Code === 0   dist === .1696
   Chisq === 2.3842E+01   Scale === 2.3813E+00   Good === 8.0827E+00   Alpha === 4.9607E-02   Code === 001011
Iteration  6   Entropy === -1.7225E+03   LogProb === -2.9223E+02   Omega === .232325   Ntrans ===  99
   Test ===  .0022   Code === 0   dist === .1621
   Chisq === 2.4083E+01   Scale === 2.0691E+00   Good === 9.2738E+00   Alpha === 3.2603E-02   Code === 001011
Iteration  7   Entropy === -1.9524E+03   LogProb === -2.8542E+02   Omega === .266607   Ntrans === 125
   Test ===  .0015   Code === 0   dist === .1628
   Chisq === 2.5585E+01   Scale === 1.8205E+00   Good === 1.0242E+01   Alpha === 2.1939E-02   Code === 001011
Iteration  8   Entropy === -2.1635E+03   LogProb === -2.7944E+02   Omega === .306788   Ntrans === 154
   Test ===  .0008   Code === 0   dist === .1600
   Chisq === 2.7936E+01   Scale === 1.6225E+00   Good === 1.1064E+01   Alpha === 1.5025E-02   Code === 001011
Iteration  9   Entropy === -2.3606E+03   LogProb === -2.7432E+02   Omega === .356566   Ntrans === 188
   Test ===  .0009   Code === 0   dist === .1639
   Chisq === 3.0943E+01   Scale === 1.4649E+00   Good === 1.1787E+01   Alpha === 1.0433E-02   Code === 001011
Iteration 10   Entropy === -2.5434E+03   LogProb === -2.7005E+02   Omega === .416849   Ntrans === 225
   Test ===  .0008   Code === 0   dist === .1642
   Chisq === 3.4437E+01   Scale === 1.3398E+00   Good === 1.2323E+01   Alpha === 7.3241E-03   Code === 001011
Iteration 11   Entropy === -2.7154E+03   LogProb === -2.6664E+02   Omega === .495818   Ntrans === 266
   Test ===  .0007   Code === 0   dist === .1637
   Chisq === 3.8179E+01   Scale === 1.2412E+00   Good === 1.2802E+01   Alpha === 5.1881E-03   Code === 001011
Iteration 12   Entropy === -2.8785E+03   LogProb === -2.6402E+02   Omega === .597425   Ntrans === 310
   Test ===  .0006   Code === 0   dist === .1598
   Chisq === 4.1946E+01   Scale === 1.1638E+00   Good === 1.3176E+01   Alpha === 3.7034E-03   Code === 001011
Iteration 13   Entropy === -3.0312E+03   LogProb === -2.6212E+02   Omega === .732356   Ntrans === 362
   Test ===  .0006   Code === 0   dist === .1532
   Chisq === 4.5560E+01   Scale === 1.1034E+00   Good === 1.3505E+01   Alpha === 2.6618E-03   Code === 001010
Iteration 14   Entropy === -3.1699E+03   LogProb === -2.6091E+02   Omega === .910269   Ntrans === 419
   Test ===  .0005   Code === 0   dist === .0662
   Chisq === 4.8891E+01   Scale === 1.0568E+00   Good === 1.3754E+01   Alpha === 2.2873E-03   Code === 001010
Iteration 15   Entropy === -3.2234E+03   LogProb === -2.6027E+02   Omega === 1.005892   Ntrans === 469
   Test ===  .0000   Code === 0   dist === .0056
   Chisq === 5.0331E+01   Scale === 1.0386E+00   Good === 1.3749E+01   Alpha === 2.2873E-03   Code === 000000
Good     =     13.7 = [     13.7 ,     13.8 ] +-  4.7
Evidence =  -1130.4 = [  -1130.4 ,  -1130.3 ] +- 34.2 decibels

(Numerical listing of the final results omitted.)

The first diagnostic is the consistency check, verifying that OPUS/ICF and TRICF/TROPUS are apparently each other's transpose to REAL precision accuracy. The check is done three times, not only to use different seed vectors, but also to ensure that ICF, TRICF, OPUS and TROPUS are at least reasonably non-destructive. This simple deconvolution problem was solved to within the desired 1% fractional tolerance in 15 iterates and 469 transforms of OPUS/ICF or TRICF/TROPUS.

The maximum entropy diagnostics, flagged by their triple equals sign ===, generally correspond to the appropriate MEM5 arguments, apart from the following:

   Code (single-digit)  =  code6
   Code (six-digit)     =  code5 to code0 (reading from left to right)
   LogProb              =  (PLOW + PHIGH)/2 = log_e Pr(D | α)
   Good                 =  (GLOW + GHIGH)/2 = G
   dist                 =  RATE × |r|/r0
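The principle behind the transpose check can be illustrated in isolation. The following free-standing sketch is not the package's own checking code; APPLYA and APPLYT are hypothetical stand-ins for a user's forward/transpose pair. For random x and y, the inner products y·(Ax) and x·(Aᵀy) must agree to REAL rounding accuracy if the two routines really are transposes:

      PROGRAM TCHECK
C Minimal sketch (not the package's internal code): test that a
C forward operator APPLYA and its claimed transpose APPLYT agree,
C by comparing the inner products y.(A x) and x.(At y).
      INTEGER N
      PARAMETER (N = 64)
      REAL X(N), Y(N), AX(N), ATY(N), S1, S2
      INTEGER I, ISEED
      ISEED = 12345
C Fill x and y with crude pseudo-random numbers in (-0.5, 0.5)
      DO 10 I = 1, N
        ISEED = MOD(ISEED * 171, 30269)
        X(I) = REAL(ISEED) / 30269.0 - 0.5
        ISEED = MOD(ISEED * 171, 30269)
        Y(I) = REAL(ISEED) / 30269.0 - 0.5
   10 CONTINUE
      CALL APPLYA(N, X, AX)
      CALL APPLYT(N, Y, ATY)
C The two inner products should agree to REAL rounding accuracy
      S1 = 0.0
      S2 = 0.0
      DO 20 I = 1, N
        S1 = S1 + Y(I) * AX(I)
        S2 = S2 + X(I) * ATY(I)
   20 CONTINUE
      WRITE (*, '(A, E12.4)') ' Transpose test relative error =',
     *    ABS(S1 - S2) / MAX(ABS(S1), 1.0E-30)
      END

      SUBROUTINE APPLYA(N, X, AX)
C Forward operator: a simple asymmetric two-point smoothing
      INTEGER N, I
      REAL X(N), AX(N)
      AX(1) = X(1)
      DO 10 I = 2, N
        AX(I) = X(I) + 0.5 * X(I-1)
   10 CONTINUE
      END

      SUBROUTINE APPLYT(N, Y, ATY)
C Transpose of APPLYA, derived element by element
      INTEGER N, I
      REAL Y(N), ATY(N)
      DO 10 I = 1, N - 1
        ATY(I) = Y(I) + 0.5 * Y(I+1)
   10 CONTINUE
      ATY(N) = Y(N)
      END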

The Entropy, zero on entry, falls away from this global maximum to a value of -3223. Alpha, which as printed is the value of α on exit, used to calculate the new multipliers w, likewise falls steadily to its final value of 0.0023. (There might have been an occasional pause in α, flagged by code3 = 0, in a problem of this difficulty, though this run happened to proceed continuously.) Test, after the first couple of iterates, remains comfortably small throughout the run. Omega rises steadily to 1.006 (tolerably close to 1). The return Codes in ISTAT give overall information on each iterate:

   code6 = 0   records that G was always found to acceptable accuracy,
   code5 = 0   records that the cloud mismatch H was always found to acceptable accuracy,
   code4 = 0   records that TEST never exceeded 1.0,
   code3 = 1   until the end, records steady decrease in α,
   code2 = 0   records that the distance limit was never invoked for α,
   code1 = 1   until the end, records the stopping condition OMEGA = 1.0 not being reached,
   code0 = 1   until the end, records the cloud overlap criterion not being met.
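For programmatic use of ISTAT, the six-digit part of the code can be unpacked into its individual digits. A minimal sketch, assuming for illustration only that the code is held as an ordinary decimal integer (so that the printout 001011 corresponds to the integer value 1011):

      PROGRAM DECODE
C Illustration only: unpack a six-digit diagnostic code, assumed
C here to be held as a decimal integer, into its digits
C code5..code0 (most significant first).
      INTEGER ICODE, IDIG(0:5), K
      ICODE = 1011
      DO 10 K = 0, 5
        IDIG(K) = MOD(ICODE, 10)
        ICODE = ICODE / 10
   10 CONTINUE
      WRITE (*, '(A, 6I2)') ' code5..code0: ',
     *    (IDIG(K), K = 5, 0, -1)
      END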

At the end of the run, Good and Evidence are reported. Their numerical lower and upper limits are given in the form [13.7, 13.8]. The width of this interval depends on the convergence criterion for the conjugate gradient iteration, which is controlled by UTOL. The random-vector standard deviation is also given, in the form +-4.7. This standard deviation may be decreased by increasing NRAND. The Evidence is expressed in decibels (dB), so that 10 × (-260.27)/log_e 10 = -1130.4 dB.
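Written out as a formula (a restatement of the conversion just quoted):

    \mathrm{Evidence\ (dB)} \;=\; 10\log_{10}\Pr(D\mid\alpha)
        \;=\; \frac{10\,\log_e\Pr(D\mid\alpha)}{\log_e 10}
        \;=\; \frac{10\times(-260.27)}{2.3026} \;\approx\; -1130.4 .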


Points to note are that LogProb does indeed rise to a maximum value at the end of the run, as it should, because the final value of α ought to be the most probable. With automatic noise scaling in operation, χ² on entry rescales to the number of data (here 64), and when the stopping criterion is satisfied it measures the number of "bad" measurements (here 50). Conversely, the number of "good" measurements rises from 0 to 14, verifying the prediction (page 31) that when the stopping criterion is satisfied χ² + G = N.

The diagnostic Scale is the rescaling coefficient SCALE = c. The final value c = 1.0386 suggests that the noise, which was initially assumed to have standard deviation 10.0, should more properly be assigned a standard deviation of 10.386. As it happens, the noise was drawn from a distribution with standard deviation 10.000. However, the prediction of 10.386 is unusually accurate for such a small-scale problem, because with only 50 "bad" measurements to define c, a proportional error of order 50^(-1/2) ≈ 14% might be expected.

The reconstruction f at the centre of the posterior cloud for the classic maximum entropy analysis of the data shown in Figure 11.1 is shown in Figure 11.2, and the hidden-space distribution h from which it was derived is shown in Figure 11.3. The ICF was in this case a three-cell triangular point-spread function (1/4, 1/2, 1/4), and comparison of Figures 11.2 and 11.3 shows that its effect is to make f = C h a 'smoother' version of h, by introducing local correlations between the elements of f.

Inspection of f in Figure 11.2 shows a large signal around cell 33, some more structure around cell 49, a certain amount of signal between these two positions, and not much outside. Comparing this with the data in Figure 11.1, and noting that the noise standard deviation was 10 units, these features of the reconstruction are entirely reasonable. Nonetheless, inspection of f alone gives no indication of the reliability of the structures seen. To gain an impression of which features are likely to be 'real', we turn to the four uncorrelated MOVIE5 samples from the posterior cloud Pr(f | D), shown in Figures 11.4 to 11.7. From these we see immediately that the major peak in cells 32 to 34 is present in similar form in all four samples, which suggests that it is probably reliable. Indeed, the proportion of samples in which a feature appears is a direct estimate of the probability of that feature being present. There is also a secondary peak around cells 48 to 50 which, while slightly less clearly defined than the peak in cells 32 to 34, is also consistently present. There is also some structure present in all four samples between these two peaks, which varies from sample to sample. Outside this region, there appears to be little consistent structure. All four MOVIE5 samples show some negative values. As explained on page 25, such values signal a local breakdown of the Gaussian approximation, which we are presently unable to quantify.
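The visible distribution f = C h produced by such an ICF can be sketched as a short convolution loop. This is a free-standing illustration, not the TOY program's actual ICF routine (whose interface is described on page 57), and the edge treatment shown is an assumption:

      SUBROUTINE TRIICF(N, H, F)
C Minimal sketch of the three-cell triangular ICF (1/4, 1/2, 1/4):
C each visible cell is a weighted average of a hidden cell and its
C two neighbours, so f = C h is a locally correlated, smoother
C version of h.  (Not the TOY program's actual ICF routine.)
      INTEGER N, J
      REAL H(N), F(N)
      DO 10 J = 2, N - 1
        F(J) = 0.25 * H(J-1) + 0.5 * H(J) + 0.25 * H(J+1)
   10 CONTINUE
C Edge cells: truncate the point-spread function at the boundary
      F(1) = 0.5 * H(1) + 0.25 * H(2)
      F(N) = 0.25 * H(N-1) + 0.5 * H(N)
      END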


Figure 11.2: Classic maximum entropy reconstruction f from the data of Figure 11.1. (Intensity against cell number.)

Figure 11.3: Hidden-space distribution h from which f in Figure 11.2 was constructed. (Intensity against cell number.)


Figure 11.4: MOVIE5 sample from the posterior cloud in visible space.

Figure 11.5: MOVIE5 sample from the posterior cloud in visible space.


Figure 11.6: MOVIE5 sample from the posterior cloud in visible space.

Figure 11.7: MOVIE5 sample from the posterior cloud in visible space.



Finally, the use of MASK5 is illustrated to enable specific linear features of the reconstruction to be quantified. Some results are given in Table 11.1. Cells 32 to 34 contain the major peak, estimated as 891 ± 37. A little more accuracy is obtained by incorporating the nearest neighbours (956 ± 30). Although the central reconstruction shows some structure to the left of this peak, the MOVIE5 samples indicate that it is not well defined. This is confirmed by MASK5, which demonstrates that it is scarcely significant (112 ± 53). Localised structure to the left, as in cells 6 to 8, is even less significant (23 ± 26). Of course, this relatively large quoted error does not imply that the total quantity in these three cells might be negative: it represents the curvature at the maximum of the probability cloud, and the cloud necessarily cuts off at zero intensity. The second peak, in cells 48 to 50, has much the same absolute error as the first (187 ± 35 alone, 202 ± 30 with neighbours), and again there is no significant evidence for structure beyond it (29 ± 34). Between the two major peaks, there is evidence (219 ± 42) of some broad structure, although the precise form varies from one sample to another. There might be a barely significant peak (62 ± 35) next to an insignificant valley (0 ± 1). Combining these two yields (62 ± 35), with almost the same quoted error (actually fractionally smaller) as on the peak alone: estimates of fluxes in different domains are correlated. All these estimates seem entirely reasonable in the light of the data (Figure 11.1), and serve to confirm the initial visual assessment of the analysis gained from f and the MOVIE5 samples.

These examples of quantification are only a simple illustration of the use of MASK5. More sophisticated masks can be constructed to estimate uncertainties in the position or width of specific features. For features which cannot be expressed or approximated by simple masks, such as medians or maxima, marginal posterior probability densities can be obtained using MOVIE5 to generate samples for a Monte Carlo analysis, as sketched below. Of course, it will generally be quicker to use MASK5 to estimate the uncertainty of any specified linear feature.
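The Monte Carlo route just mentioned can be sketched in a few lines. The fragment below is illustrative only (MCMASK is not a package routine): given NS visible-space samples generated by MOVIE5 and stored column-wise in F, it estimates the flux in cells LEFT to IRIGHT and its sample standard deviation. With only four samples, as in this example, the error estimate is of course crude; MASK5 computes the same quantity directly and more cheaply.

      SUBROUTINE MCMASK(N, NS, F, LEFT, IRIGHT, EST, ERR)
C Illustrative Monte Carlo quantification (not a package routine):
C estimate the total flux in cells LEFT..IRIGHT from NS posterior
C samples F(1:N,1:NS), returning the sample mean EST and the
C sample standard deviation ERR.
      INTEGER N, NS, LEFT, IRIGHT, J, K
      REAL F(N,NS), EST, ERR, SUM, SSQ, T
      SUM = 0.0
      SSQ = 0.0
      DO 20 K = 1, NS
        T = 0.0
        DO 10 J = LEFT, IRIGHT
          T = T + F(J,K)
   10   CONTINUE
        SUM = SUM + T
        SSQ = SSQ + T * T
   20 CONTINUE
      EST = SUM / REAL(NS)
      ERR = SQRT(MAX(SSQ / REAL(NS) - EST * EST, 0.0))
      END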

Table 11.1: Results of MASK5 quantification of features of f in Figure 11.2.

   Left cell   Right cell   Estimate          Description
      32          34        890.77 ±  37.46   First major peak
      31          35        956.26 ±  30.44   First major peak neighbourhood
       1          30        112.25 ±  53.37   Left of first major peak
       6           8         23.16 ±  25.77   Localised structure
      48          50        186.54 ±  34.50   Second major peak
      47          51        202.25 ±  29.74   Second major peak neighbourhood
      52          64         29.29 ±  33.80   Right of second major peak
      36          46        219.10 ±  41.65   Broad structure
      44          46         62.46 ±  34.72   Minor peak
      43          43          0.01 ±   1.28   Minor valley
      43          46         62.48 ±  34.69   Minor peak and valley



11.2 Postscript

The TOY simulation was actually produced from the distribution shown in Figure 11.8, but it might not have been! When attempting to assess the performance of an algorithm by its recovery of a simulation, beware the fixed-point theorem, which implies that, under very weak conditions, any reconstruction algorithm, no matter how bizarre, will work perfectly for something. The MaxEnt quantified results are compared with the "truth" in Table 11.2. The MaxEnt error bars are seen to be quite reasonable, with the exception of the penultimate value, 0.01 ± 1.28, for which the "true" value was 20. This overly small quoted error, on a low value of f, must reflect a local failure of the Gaussian approximation to Pr(f).

Figure 11.8: The true simulation f for Figures 11.1, 11.2 and 11.3. (Intensity against cell number.)



Table 11.2: Comparison of estimated and true values for the simulation of Figure 11.2.

   Left cell   Right cell   Estimate          True simulation
      32          34        890.77 ±  37.46        920
      31          35        956.26 ±  30.44        940
       1          30        112.25 ±  53.37          0
       6           8         23.16 ±  25.77          0
      48          50        186.54 ±  34.50        220
      47          51        202.25 ±  29.74        240
      52          64         29.29 ±  33.80          0
      36          46        219.10 ±  41.65        220
      44          46         62.46 ±  34.72         60
      43          43          0.01 ±   1.28         20
      43          46         62.48 ±  34.69         80



Chapter 12

Helpful hints
As with other sophisticated systems, a "folklore" of useful advice continues to build up around the MemSys programs.

12.1 Multiple maxima of α

Perhaps most important is that the "best" (i.e., most probable) α may not be unique. As α decreases along the maximum entropy trajectory, there may be more than one local maximum of Pr(D | α), the natural logarithm of which is printed as the diagnostic LogProb. Subroutine MEM5 is designed to give a zero return code at the first such maximum to be encountered. Indeed, MEM5 will give a return code of zero right at the beginning if the initial default distribution m is already sufficiently close to the data (Ω ≥ 1.0). However, there may be a yet more probable value of α further down the trajectory. The only way of finding out is to force MEM5 to iterate further down by imposing a suitably low temporary target value AIM until LogProb starts to increase again (if it does). Then AIM can be reset to the desired value (usually 1.0) and MEM5 should iterate towards the next local maximum.

12.2 Tolerance variable UTOL

It can be tempting to try to attain "more accuracy" by reducing the tolerance variable UTOL. However, a value of, say, 0.01 ensures that when the stopping criterion is satisfied, the probability cloud Pr(h | D) estimated by the program overlaps the "true" cloud to within a cross-entropy H of less than 0.01. Actually, the overlap is likely to be considerably more accurate than this, because each successive iterate of MEM5 attempts to reduce still further the difference between the estimated and "true" clouds. Even in the worst case, an integral hᵀp inferred from the computed cloud should be in error by only about (2H)^(1/2) ≈ 0.14 of its quoted standard deviation σ. Any greater precision than this merely involves refining the estimates to within a fraction of a standard deviation. Indeed, making UTOL unduly small is actually harmful, because it imposes unnecessarily tight requirements on the conjugate gradient algorithms within the package. These will require more (expensive) transforms, possibly to the extent that the internal iteration limits are reached. If that happens, one or more of the return codes code0, code1, code5, code6 will be non-zero, and the user may be misled into thinking that the program has failed.

The exception to this, when "more accuracy" is truly needed, is when values of Evidence for different variable settings are being used as Pr(D | variable) in order to infer the most probable value of the variable in question. Here, one requires Evidence to be a smooth function of the variable, in order to locate the maximum. Even here, though, the efficient way of proceeding is to start off with NRAND = 1 and UTOL reasonably large (perhaps even up to the allowed limit of 1.0), and then do a few more iterations at the end with a smaller value of UTOL, and perhaps a larger NRAND, until the value of Evidence settles down.
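In symbols (writing µ for the inferred integral, a label introduced here for clarity), the worst-case statement above reads:

    \frac{|\delta\mu|}{\sigma} \;\lesssim\; \sqrt{2H}
        \;=\; \sqrt{2\times 0.01} \;\approx\; 0.14 .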

12.3 Internal iteration limits

Unusually difficult problems may hit internal iteration limits even when UTOL is relatively large. If this problem persists and causes difficulty, the PARAMETER LMAX in the MemSys INCLUDE file may be increased from its installation value (usually 30). This is not generally recommended because of the extra expense which may be incurred, and because accumulated rounding errors may detract from the potential extra accuracy.
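The change itself is a one-line edit in the INCLUDE file; schematically (the value 50 here is illustrative, not a recommendation):

      INTEGER LMAX
C Raise the internal conjugate-gradient iteration limit
      PARAMETER (LMAX = 50)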

12.4 Arithmetic accuracy

The accuracy of all calculations within MemSys5 is commensurate with the arithmetic precision of the transform operations, so that ordinary REAL precision is adequate for almost all practical data. In exceptional cases, with data of extreme accuracy, it may be necessary to go to double precision. This is effected by editing " REAL " to " DOUBLE PRECISION " throughout all the source code, including the transforms and the INCLUDE file, with explicit spaces so that the variable name NREALS is preserved. A particularly strict compiler might also require floating-point formats to change from "E" to "D". Such formats are found in "WRITE (INFO, '( ... E ... )') ..." statements. At the same time, the PARAMETER LMAX in the MemSys INCLUDE file should be increased, say to 50.
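As a generic illustration of the format change (this is not a line taken from the package source):

      PROGRAM FMTDEM
C Generic illustration of the "E" to "D" edit-descriptor change
C required by some strict compilers after conversion of the source
C to DOUBLE PRECISION.  (Not a line taken from the package.)
      INTEGER INFO
      DOUBLE PRECISION ALPHA
      INFO = 6
      ALPHA = 2.2873D-03
C A REAL-precision build would write, e.g.:
C     WRITE (INFO, '('' Alpha ='', E12.4)') ALPHA
C After conversion, "E12.4" becomes "D12.4":
      WRITE (INFO, '('' Alpha ='', D12.4)') ALPHA
      END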

12.5 Combination of transforms

Within the MemSys5 package, the application of C to a vector by ICF is almost always followed immediately by an application by OPUS of R to the result, and no use is made of the intermediate visible-space vector. The only exceptions to this are the construction of f from h at the end of each MEM5 iterate, and the construction of the visible sample from the hidden sample at the end of each MOVIE5 call. Similarly, the application of Rᵀ to a vector is always followed immediately by the application of Cᵀ to the result, and the intermediate vector is not used. In some cases, it may be computationally cheaper to apply the combined R C transform in one operation than to apply the individual C and R transforms separately. Similarly for the transpose operation, it may be cheaper to apply Cᵀ Rᵀ than to apply Rᵀ and Cᵀ separately. If this is so, the user should supply the combined forward operator R C as either ICF or OPUS, and an identity operator as the other. Likewise, the combined transpose operator Cᵀ Rᵀ should be supplied as either TRICF or TROPUS, and an identity operator as the other. It may be convenient to overlay areas 2 and 11 (if OPUS applies R C) or areas 11 and 25 (if ICF applies R C) so that no actual operation is required to apply the identity. If the transforms are combined in this way, however, then the vectors supplied in area 11 on return from MEM5 and MOVIE5 will not be the correct visible-space quantities, and the recovery of the correct visible-space quantities from the hidden-space distributions becomes the user's responsibility.
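If OPUS is given the combined operator R C, the ICF routine degenerates to a copy. A minimal sketch of such an identity routine follows; the argument conventions are simplified here (the package's true ICF interface, which works through the ST workspace, is described on page 57):

      SUBROUTINE IDICF(N, HIN, FOUT)
C Minimal sketch of an identity ICF, for use when OPUS already
C applies the combined operator R C.  Argument conventions are
C simplified for illustration; the real ICF interface passes
C areas through the ST workspace instead.
      INTEGER N, J
      REAL HIN(N), FOUT(N)
      DO 10 J = 1, N
        FOUT(J) = HIN(J)
   10 CONTINUE
      END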


Bibliography

[1] M. K. Charter. Drug absorption in man, and its measurement by MaxEnt. In Paul F. Fougère, editor, Maximum Entropy and Bayesian Methods, Dartmouth College 1989, pages 325-339, Dordrecht, 1990. Kluwer. ISBN 0-7923-0928-6.

[2] M. K. Charter. Quantifying drug absorption. In Grandy and Schick [4], pages 245-252. ISBN 0-7923-1140-X.

[3] R. T. Cox. Probability, frequency and reasonable expectation. Am. J. Physics, 14(1):1-13, 1946.

[4] W. T. Grandy, Jr. and L. H. Schick, editors. Maximum Entropy and Bayesian Methods, Laramie, Wyoming, 1990, Dordrecht, 1991. Kluwer. ISBN 0-7923-1140-X.

[5] S. F. Gull. Developments in maximum entropy data analysis. In Skilling [15], pages 53-71. ISBN 0-7923-0224-9.

[6] S. F. Gull and G. J. Daniell. Image reconstruction from incomplete and noisy data. Nature, 272:686-690, April 1978.

[7] S. F. Gull and J. Skilling. The maximum entropy method. In J. A. Roberts, editor, Indirect Imaging. Cambridge Univ. Press, 1984.

[8] S. F. Gull and J. Skilling. Maximum entropy method in image processing. IEE Proc., 131(F):646-659, 1984.

[9] E. T. Jaynes. Where do we stand on maximum entropy? In R. D. Rosenkrantz, editor, E. T. Jaynes: Papers on Probability, Statistics and Statistical Physics, chapter 10, pages 210-314. D. Reidel, Dordrecht, 1983. Originally published in 1978 in the Proceedings of a Symposium on 'The Maximum-Entropy Formalism' held at M.I.T. in May 1978.

[10] H. Jeffreys. Theory of Probability. Oxford Univ. Press, Oxford, third edition, 1985. ISBN 0-19-853-193-1.

[11] R. D. Levine. Geometry in classical statistical thermodynamics. J. Chem. Phys., 84:910-916, 1986.

[12] C. C. Rodríguez. The metrics induced by the Kullback number. In Skilling [15], pages 415-422. ISBN 0-7923-0224-9.

[13] J. E. Shore and R. W. Johnson. Axiomatic derivation of the principle of maximum entropy and the principle of minimum cross-entropy. IEEE Trans. Info. Theory, IT-26(1):26-37, January 1980.

[14] J. Skilling. Classic maximum entropy. In Maximum Entropy and Bayesian Methods, Cambridge 1988 [15], pages 45-52. ISBN 0-7923-0224-9.

[15] J. Skilling, editor. Maximum Entropy and Bayesian Methods, Cambridge 1988, Dordrecht, 1989. Kluwer. ISBN 0-7923-0224-9.

[16] J. Skilling. On parameter estimation and quantified MaxEnt. In Grandy and Schick [4], pages 267-273. ISBN 0-7923-1140-X.

[17] J. Skilling and S. F. Gull. Bayesian maximum entropy. In A. Possolo, editor, Proc. AMS-IMS-SIAM Conference on Spatial Statistics and Imaging, Bowdoin College, Maine, 1988, 1989.

[18] J. Skilling, D. R. T. Robinson, and S. F. Gull. Probabilistic displays. In Grandy and Schick [4], pages 365-368. ISBN 0-7923-1140-X.

[19] Y. Tikochinsky, N. Z. Tishby, and R. D. Levine. Consistent inference of probabilities for reproducible experiments. Phys. Rev. Lett., 52:1357-1360, 1984.


Appendix A

User-supplied routines

The following subroutines, which are called by the MemSys5 package, must be supplied by the user. Examples of them are provided in the TOY demonstration program.

UINFO    Handle diagnostic messages from the package. See page 59.
USAVE    Preserve arrays of INTEGERs and REALs containing the package's internal variables. See page 61.
UREST    Retrieve the arrays of INTEGERs and REALs preserved by USAVE. See page 61.
UFETCH   Read a block of an area from a direct-access disc file. See page 52.
USTORE   Write a block of an area to a direct-access disc file. See page 52.
ICF      Transform from hidden space to visible space, giving correlations in the visible-space distribution. See page 57.
TRICF    Transform from visible to hidden space, applying the transpose of the ICF transform. See page 57.
OPUS     Transform from visible space to data space, giving the differential response. See page 55.
TROPUS   Transform from data space to visible space, applying the transpose of the OPUS transform. See page 55.
UMEMX    Calculate the nonlinearity in the differential response (called only if METHOD(3) = 1). See page 56.




Appendix B

Historic maximum entropy
It has been shown in Part I of this manual that the maximum entropy analysis implemented in the MemSys5 package follows inevitably from the basic laws of probability theory as applied to positive additive distributions, and the whole process should be considered as an application of Bayesian inference. Nonetheless, maximum entropy can be used as an ad hoc regularisation technique, in which the experimental data are forced into a definitive "testable" form which rejects outright all but a "feasible" set of h. The Principle of Maximum Entropy is then applied directly by selecting that surviving feasible h which has greatest entropy S. The misfit between the observed data D and the predicted data F(h) may be quantified in the usual way with a χ² statistic:

    \chi^2(h) \;=\; \bigl(D - F(h)\bigr)^{\mathrm{T}}\,[\sigma^{-2}]\,\bigl(D - F(h)\bigr).

On average, χ² might be expected to take a value not much greater than its formal expectation over possible data, namely χ²(h) ≃ N, which corresponds to the data being, on average, one standard deviation away from the reconstruction. This "discrepancy principle" can be used as a testable constraint to define the feasible distributions h. In the event that h is positive and additive, entropic arguments apply, and the "best" feasible h can be found by maximising S(h) subject to the constraint χ²(h) = N. This is illustrated in Figure B.1. The "historic" maximum entropy reconstruction is that distribution h = (4.36, 3.00) which occurs where the trajectory cuts into the shaded region χ²(h) ≤ N.

We call the "χ² = N" technique "historic maximum entropy" after its use by Frieden (1972), Gull and Daniell (1978), Gull and Skilling (1984b) and others. Although such reconstructions h are often much clearer and more informative than those obtained with cruder techniques, historic maximum entropy is not Bayesian, and hence imperfect. The constraint χ²(h) = N is only an approximation to the true likelihood Pr(D | h), no single selected h can fully represent the posterior probability Pr(h | D) which theory demands, and it is difficult to define the number N of fully independent data in a suitably invariant manner. A further difficulty arises with Poisson data, for which the formal expectation over possible data of the log-likelihood analogue of χ² is not constant, but varies with the PAD h.

Finally, it is well known that the discrepancy principle gives systematically underfitting reconstructions. The reason is that the χ² statistic between D and F(h) will indeed average to N if h is the "truth". The truth, however, is unattainable, and the computed h will necessarily be biassed towards the data, so that the misfit is reduced. Accordingly, "χ² = N" is too pessimistic. The classic Bayesian analysis confirms this and quantifies the expected misfit in terms of good and bad degrees of freedom. Nevertheless, for reasons of compatibility, historic maximum entropy is retained within MemSys5 as the METHOD(0) = 4 option.

Figure B.1: Historic maximum entropy. (Axes h1 and h2, with the default model at (m1, m2).)


Appendix C

Reserved names

The MemSys5 package contains the following globally-visible symbols, listed here in alphabetical order:

MADD, MASK5, MCHECK, MCLEAR, MDIV, MDOT, MECOPY, MEFLAG, MEINFO, MEINIT,
MEM5, MEMCGD, MEMCGH, MEMCHI, MEMCOM, MEMDET, MEMDOT, MEMENT, MEMGET,
MEMICF, MEMLA, MEMLA0, MEMLAR, MEMLAW, MEMLB, MEMOP, MEMPOI, MEMSET,
MEMSMA, MEMSMD, MEMTR, MEMTRQ, MENT1, MENT2, MENT3, MENT4, MENT51,
MENT52, MEPRB0, MEPROB, MEPROJ, MEREST, MESAVE, MESCAL, MESMUL, MESURE,
MESWAP, METEST, MEVMUL, MEVRND, MEXP, MEZERO, MFETCH, MFILL, MLOG,
MMEMX, MMOV, MMUL, MOVIE5, MRAND, MRECIP, MSADD, MSIGN, MSMUL, MSMULA,
MSQRT, MSUB, MSUM, MSWAP, MTRICF, MWHILE

and from the library of storage-handling and vector routines:

VADD, VDIV, VDOT, VEXP, VFETCH, VFILL, VLOG, VMEMX, VMOV, VMUL, VPOINT,
VRAND, VRAND0, VRAND1, VRAND2, VRANDC, VRECIP, VSADD, VSIGN, VSMUL,
VSMULA, VSQRT, VSTACK, VSTORE, VSUB, VSUM, VSWAP

These reserved names should not be used for other subroutines, functions, or COMMON blocks.



Appendix D

Changes from MemSys2 to MemSys3

Apart from the obvious improvements in quantification and algorithmic power, the principal changes from the earlier MemSys2 maximum entropy program are:

(a) Incorporation of positive/negative distributions.

(b) Incorporation of Poisson statistics.

(c) Withdrawal of the Kramers-von Mises "exact-error-fitting" option as inconsistent with correct theory.

(d) Withdrawal of support for sequential-access disc files (because of the difficulty of updating in place).

(e) Withdrawal of user-defined weights in the individual cells, which are unnecessary, given that hj is defined as the quantity in cell j.

(f) Accuracies defined as 1/σ, not 2/σ².

(g) Altered format of COMMON block MECOMP and addition of a control block MECOML.

(h) All variables within MemSys3 are explicitly typed, and the COMMON blocks specified in the INCLUDE files must adhere to this convention.

(i) Storage limits, such as in MECOMS, can be declared as PARAMETERs, so that potential array bound violations can be checked at runtime.

(j) Only the lowest, vector library, routines need be aware of the physical storage locations, because all pointers are passed explicitly as integers: this aids portability.



Appendix E

Changes from MemSys3 to MemSys5

The principal changes from the earlier MemSys3 package are further algorithmic power, and:

(a) Generation of random samples from the posterior probability density Pr(h | D).

(b) Estimates of the numerical uncertainties in G and log Pr(D) are provided.

(c) Different usage of the scalars DEF and ACC for setting 'flat' defaults and constant accuracies.

(d) Output scalars such as ALPHA and SCALE no longer need to be preserved by the user between calls to the package.

(e) Intrinsic correlation functions (ICFs) are supported explicitly.

(f) Fermi-Dirac and quadratic entropy have been added to the existing standard and positive/negative options.

(g) The vector arithmetic workspace ST is supplied as an argument to the major subroutines of the package, and also to the user-supplied transform routines.

(h) Revised storage management, so that the externally-visible COMMON blocks have been withdrawn.

(i) A separate subroutine is used to set the MemSys5 control variables METHOD, AIM, RATE, etc.

(j) All diagnostic output is directed through calls to a user-supplied diagnostic handler, so that LEVEL and IOUT are no longer required inside the package.

(k) A 'save/restore' facility has been added, to save the internal state of the package in external storage (perhaps on disc), whence it can later be restored to regenerate the internal state exactly.

(l) The area-copying routines MEMJ and MEMK have been withdrawn, and replaced with a single routine MECOPY.

(m) The user-supplied subroutine MEMEX for specifying nonlinear response has been renamed UMEMX.

(n) The compiled object size is smaller.






What do you think?
We would like your comments on the MemSys5 package, the TOY demonstration program and the Manual. Criticism is welcome, and constructive criticism with suggestions for improvements even more so. The following questions are a guide only, so please comment on other aspects if you wish.

The MemSys5 package
1. How easy do you think the package is to use?

2. What features make it easy to use?

3. What features make it difficult to use? How do you think these could be improved?

4. What other facilities would you like to see in the package?

5. Do you write your own transform routines (ICF, TRICF, OPUS, TROPUS)? If so, do you find this easy? If not, what do you find difficult?

The TOY demonstration program
1. If you ran the demonstration program, how helpful was it in illustrating how to use MemSys5?

2. What was particularly helpful about it?


3. What was particularly unhelpful about it? How do you think this could be improved?

4. What parts (if any) of the demonstration program would you like to see removed, and why?

5. What other features (if any) of MemSys5 would you like to see illustrated in the demonstration program?

The MemSys5 Users' manual
1. How helpful do you find this manual?

2. Which parts do you find most helpful, and why?

3. Which parts do you find least helpful, and why?

4. How could they be improved?

5. How do you find the mathematical content of the manual?

6. What other things (if any) would you like to see included in the manual?

Please return your comments to Maximum Entropy Data Consultants Ltd., South Hill, 42 Southgate Street, Bury St. Edmunds, Suffolk IP33 2AZ, United Kingdom.