STATISTICAL ESTIMATION OF CHRONOLOGICAL NEARNESS OF HISTORICAL TEXTS

V. V. Fedorov and A. T. Fomenko

Translated from Problemy Ustoichivosti Stokhasticheskikh Modelei -- Trudy Seminara, pp. 101-107, 1983. 0090-4104/86/3206-0668$12.50 © 1986 Plenum Publishing Corporation.

1. The "principle of correlation of maxima" of the plots of the volume of historical texts was first formulated and tested in [1] for the case of a uniform distribution (see also [2]). This principle and the related method of dating of events described in historical texts (with a time scale) were found to be necessary in various chronological investigations carried out in [1-3, 10-16]. The importance of the results obtained in these papers, and especially with the use of the principle of correlation of maxima (see the corresponding formulation below or in [1, 2, 10, 16]), shows the utility of testing the stability of this principle and of the corresponding method of dating of events with respect to other procedures of statistical processing of the volume functions of texts. All the volume functions of historical texts used in this paper have been calculated in [1, 10].

2. Let us recall the principle of correlation of maxima and the method of dating of events based on it. Suppose that a historical period from the year A to the year B in the history of a region G (i.e., a state, a city, etc.) has been described in a fairly comprehensive year-by-year text (chronicles, annals, etc.), i.e., the text X is split into sections or "chapters" X(T), each of which describes the events of one year T. We calculate the volume H(X, T) of each such section X(T) measured, for example, by the number of lines (or words, or symbols, or pages, etc.) (see Fig. 1). For another year-by-year text Y that describes this same time interval (A, B) and this same region G, the corresponding plot of H(Y, T) will in general have a different form, since the distribution of the text volume is considerably affected by the personal interests of the chroniclers (the authors of the texts). For example, the chronicle X of the history of art and the military chronicle Y will be written with an entirely different emphasis. To what extent are these differences essential, i.e., do there exist characteristics of the volume functions that are determined only by the time interval (A, B) and the region G and that unambiguously specify all (or almost all) the texts describing this interval of time?

Fig. 1. Volume plots for Livy and Sergeev.
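To make the volume function concrete, the following minimal Python sketch computes H(X, T) for a toy year-by-year text and picks out the years in which the plot has a strict local peak. The word count as the volume measure, the function names, the toy chronicle, and the convention that a year absent from the text has volume 0 are all assumptions made for illustration; they are not taken from [1, 10].

```python
from typing import Dict, List


def volume_function(chapters: Dict[int, str]) -> Dict[int, int]:
    """Return H(X, T): the number of words in the chapter for each year T."""
    return {year: len(text.split()) for year, text in chapters.items()}


def local_maxima_years(volumes: Dict[int, int]) -> List[int]:
    """Years whose volume is strictly larger than in both neighbouring years.

    Assumption: a year that is missing from the text counts as volume 0.
    """
    peaks = []
    for year in sorted(volumes):
        here = volumes[year]
        if here > volumes.get(year - 1, 0) and here > volumes.get(year + 1, 0):
            peaks.append(year)
    return peaks


# Hypothetical toy chronicle: year -> text of that year's "chapter".
chronicle = {
    850: "short entry",
    851: "a much longer entry describing a war in detail",
    852: "brief note",
    853: "two words",
    854: "another long entry about the founding of a city",
    855: "end",
}

H = volume_function(chronicle)
peaks = local_maxima_years(H)
print(peaks)
```

Any other volume measure mentioned in the text (lines, symbols, pages) could be substituted for the word count without changing the peak-finding step.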

It is found that an important characteristic of such a plot consists of the years in which the plot has local maxima (see [1, 10]). For simplicity we shall assume that the latter are nondegenerate, i.e., reached locally at one point. The local maxima of the function H(X, T) indicate the "years described in detail" in the time interval (A, B). Let C(T) be the volume of all the texts written about the year T by contemporaries (i.e., persons living at that time). The plot of C(T) is not known to us, since the texts are lost over the years, and the information vanishes. The following model of loss of information is constructed in [1, 10]: For the years about which a very large number of texts have been written, the number of texts that have been preserved will also be larger than usual. In such a form it is difficult to test the model, since we do not know the plot of C(T). However, it is possible to test one of the consequences of this model: since later chroniclers X and Y, who describe this same period (A, B), are no longer contemporaries of these ancient events, they must rely on more or less the same collection of texts passed over to them, so that they must ("on the average") describe in more detail the years for which more texts have been preserved, and in less detail the years about which little is known (a small number of texts are available).

The principle of correlation of maxima of the volume functions of texts X and Y has been formulated in [1, 2, 10] as follows: The plots of the volume of "chapters" for correlated texts X and Y (i.e., which describe the same period of time (A, B) and the same region G) must reach simultaneously local maxima in the interval (A, B), i.e., the years described in detail in X and the years described in detail in Y must be either close to one another, or they must coincide. In contrast, if the texts X and Y are independent, i.e., they describe either quite different historical periods (A, B) and (C, D) of the same length, or different regions, then the plots of the volume functions H(X, T) and H(Y, T) will reach local maxima at different points [provided that we let the segments (A, B) and (C, D) overlap].

This principle of correlation of maxima can be substantiated if for the majority of pairs of actual correlated historical texts X and Y, i.e., which describe practically the same events, the volume functions of the "chapters" for X and Y reach their maxima in roughly the same years. Here the values of the maxima may differ considerably. In contrast, for actual independent texts there must be no correlation whatsoever of the points of the maxima. In actual fact, in [1, 2, 10] the comparison was carried out not just with two texts, but with two groups of texts, and the averaged plot of the volume was calculated for each group.

3. It is evident that for actual plots of volumes of correlated texts the simultaneity of their peaks will occur only approximately. For estimating the degree of simultaneity with which two volume functions reach their maxima, it is necessary to introduce a natural measure that makes it possible to estimate numerically the mismatch of the points of the maxima. Such a measure can be introduced by different methods. It is required that it should distinguish reliably between pairs of dependent (correlated) texts and pairs of independent texts. It is found that such measures exist (which is not self-evident). The first method has been proposed in [1, 2, 10]. Let us briefly describe this measure.

The points at which the function H(X, T) reaches its maxima divide the segment (A, B) into smaller parts. By measuring their lengths in years, we obtain a sequence of integers $(a_1, a_2, \ldots, a_p)$ that specifies an integer-valued vector a(X) in a Euclidean space $R^p$ of dimension p. For a text Y that describes a period of the same length, we obtain in general another vector $a(Y) = (b_1, b_2, \ldots, b_q)$, where the number q can differ from the number p (a different number of maxima). It can nevertheless be assumed that the number of maxima is the same. Let $p > q$; then we regard some of the maxima of the function H(Y, T) as multiple, i.e., some maxima coalesce at a single point. This means that we are adjoining $p - q$ maxima to the plot of H(Y, T). It is evident that the procedure of introducing such multiplicities is not single-valued. Let v be a version of such a procedure. Thus, it can be assumed that

$$\sum_{i=1}^{p} a_i = \sum_{i=1}^{p} b_i = B - A,$$

i.e., the ends of the two vectors lie in the same simplex $\sigma$ of dimension $p - 1$, which is defined in the space $R^p$ by the equation

$$\sum_{i=1}^{p} x_i = B - A$$

(see Fig. 2).

Fig. 2

Let l be the length of the vector a(Y) - a(X). Let us write

$$p_v(X, Y) = \frac{\mathrm{vol}\, D_\sigma}{\mathrm{vol}\, \sigma},$$

where D is a ball in $R^p$ centered at the point a(X) with a radius l, and $D_\sigma$ is its intersection with the simplex $\sigma$. If C is a $(p-1)$-dimensional measurable subset in $\sigma$, then vol C will be either the Euclidean $(p-1)$-dimensional volume of C (the continuous case), or the number of integer points, i.e., of points with integer coordinates, in C (the discrete case). Finally, we shall write

$$p(X, Y) = k(X, Y) + k(Y, X),$$

where $k(X, Y) = \min_v p_v(X, Y)$, i.e., the minimum is taken over all the faces of the simplex $\sigma$ in the case that the original number of maxima was different. In the continuous case it is easy to verify that

$$p_v(X, Y) \le \left(\frac{l}{B - A}\right)^{p-1} \frac{\pi^{(p-1)/2}\,(p-1)!}{\Gamma\!\left(\frac{p+1}{2}\right)\sqrt{p}}.$$

Together with the functions H(X, T) and H(Y, T), their smoothing is also considered in [1, 2, 10]. The coefficient p(X, Y) was calculated each time, and its minimum value was taken as the final value. By assuming that a random vector is uniformly distributed on the simplex, we can interpret the number $p_v(X, Y)$ as the probability of the random event that this vector is at a distance from the vector a(X) that does not exceed the observed distance $l = |a(X) - a(Y)|$. If the volume functions for the texts X and Y reach their maxima simultaneously, then p(X, Y) = 0. This procedure has been sharpened mathematically in different ways in [1, 2, 10, 16].

If the coefficient p(X, Y) is "small," then the texts X and Y are dependent, but if the coefficient p(X, Y) is "large," then the texts are independent, i.e., they tell of different events.
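The probabilistic reading of the coefficient given above (a point distributed uniformly on the simplex falling within distance l of a(X)) can be illustrated with a small Monte Carlo sketch. It covers only the continuous case with an equal number of maxima in both texts; it does not reproduce the minimization over versions v or the "cubic layers" technique of [1, 2, 10]. The function names, the sample size, and the sampling method (normalized exponential variables, a standard way to draw uniformly from a simplex) are assumptions of this sketch.

```python
import math
import random
from typing import List, Sequence


def interval_vector(maxima_years: Sequence[int], start: int, end: int) -> List[int]:
    """Lengths (in years) of the parts into which the maxima split (start, end)."""
    cuts = [start] + sorted(maxima_years) + [end]
    return [b - a for a, b in zip(cuts, cuts[1:])]


def uniform_on_simplex(p: int, total: float, rng: random.Random) -> List[float]:
    """Uniform point of {x_i >= 0, sum x_i = total} via normalized exponentials."""
    e = [rng.expovariate(1.0) for _ in range(p)]
    s = sum(e)
    return [total * v / s for v in e]


def estimate_p_v(a_x: Sequence[float], a_y: Sequence[float],
                 samples: int = 20000, seed: int = 0) -> float:
    """Estimate the share of the simplex lying within distance l of a(X),
    where l = |a(X) - a(Y)| (continuous case, equal dimensions assumed)."""
    if len(a_x) != len(a_y):
        raise ValueError("this sketch assumes an equal number of maxima")
    rng = random.Random(seed)
    total = float(sum(a_x))          # equals B - A
    l = math.dist(a_x, a_y)          # observed distance between the vectors
    hits = 0
    for _ in range(samples):
        point = uniform_on_simplex(len(a_x), total, rng)
        if math.dist(point, a_x) <= l:
            hits += 1
    return hits / samples


# Hypothetical maxima years of two texts on the interval (850, 860).
a_x = interval_vector([853, 857], 850, 860)
a_y = interval_vector([854, 857], 850, 860)
print(a_x, a_y, estimate_p_v(a_x, a_y))
```

For the very small values reported in the paper a plain Monte Carlo estimate of this kind would need an impractical number of samples, which is one reason to treat the sketch as illustrative only.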

4. The approach described in the previous section has been illustrated in [1, 3, 10] with the aid of a comprehensive computational experiment involving calculation of the numbers p(X, Y) for different pairs X and Y of actual historical texts. Let us present here the principal result of this experiment. It was found that the coefficient p(X, Y) distinguishes clearly between dependent and independent pairs of historical texts. For all the pairs of texts X, Y studied in [10] which describe quite different events (different epochs or different regions), i.e., for independent texts, the number p(X, Y) fluctuates between 1 and 1/100 with a number of maxima ranging from 10 to 15. On the other hand, if the texts X and Y are dependent, i.e., they describe the same events (this being known by a preliminary historiographical analysis), then the coefficient p(X, Y) does not exceed $10^{-8}$ for this same number of maxima.

In Fig. 1 we plotted a typical example of two dependent texts, i.e., X, which denotes V. S. Sergeev's monograph "Topics in the History of Ancient Rome" [4], and Y, which denotes the "History of Rome" by Titus Livius (Livy) [5]. Here (A, B) represents the years 757-287 B.C., and $p(X, Y) = 2 \cdot 10^{-12}$. Both texts describe the same period of Roman history.

Another example of dependent texts is X, which denotes the Kholmogory Annals [7], and Y, which denotes the Tale of Past Years [8]. In this case (A, B) represents the years 850-1000 A.D., and $p(X, Y) = 10^{-15}$.

A similar example of dependent texts is X, denoting the Nikiforov Annals, and Y, the Suprasl' Annals [7]. In this case (A, B) represents the years 850-1255 A.D., and $p(X, Y) = 10^{-21}$.

Yet another example of dependent texts which have been detected as a result of a numerical experiment carried out in [1, 3, 10] is given by X, representing part of the "History of Mediaeval Rome" by F. Gregorovius [9], which covers the period from the year 300 to 745 A.D., whereas the second text, Y, is the "History of Rome" by Titus Livius [5], which covers the period from the 1st to the 459th year counted from the foundation of Rome, i.e., the years 753-294 B.C. (ancient Rome). The volume functions of these two dependent texts are plotted in Fig. 3. Here $p(X, Y) = 6 \cdot 10^{-10}$. Let us recall that for quite independent texts the coefficient p(X, Y) is not smaller than 1/100 (see [1, 2, 10]).

In analyzing quite independent texts, we calculated a lower bound for the number p(X, Y) by approximating a multidimensional region by "cubic layers." In this way it was possible to compare (see [1, 2, 10]): a) ancient texts with ancient texts, b) ancient with contemporary texts, and c) contemporary with contemporary texts.

Fig. 3. Volume plots for Gregorovius and Livy.

Instead of the volume functions H(X, T) of "chapters," we compared also other quantitative characteristics of texts such as plots of the number of names mentioned (in each year T), plots of the number of times a certain year is mentioned in the text, plots of the frequency of referring to any other text, etc. (for details see [1, 3, 10]). It was found that all these characteristics are governed by the same statistical laws, i.e., the plots of dependent texts reach their local maxima practically simultaneously, whereas for independent texts the peaks of the plots are not at all correlated.

The following procedure of dating of texts has been proposed and tested in [1, 3, 10]. Let Y be a text that describes events unknown to us whose absolute dating has been lost, with the years T being counted in the text Y on the basis of an event of local importance such as the founding of a city, or the day on which a ruler has been crowned, etc., the absolute dating of such an event having been lost. How shall we date the events described in Y? For the text Y we shall calculate its volume function H(Y, T) of "chapters" and compare it with the volume functions of other texts for which the absolute dating of events described in them is known to us. If among these texts we can find a text X for which the number p(X, Y) is small, i.e., it has the same order of magnitude as for pairs of dependent texts (i.e., it does not exceed $10^{-8}$ in the case of a number of maxima ranging from 10 to 15), then with a fairly high probability [the higher, the smaller the number p(X, Y)] we can conclude that the events described in these texts either coincide, or are very near in time.

In [1, 10] this procedure has been tested on mediaeval texts with a priori known dating. A typical example is Y, which represents the Dvinsk Chronicle (short edition) that describes the events taking place over a period of 327 years [6]. In going through the list of chronicles in the "Complete Collection of Russian Chronicles," we can find a text X whose function H(X, T) has maxima in practically the same years as the function H(Y, T) (after letting the time intervals described in them overlap). A calculation yields $p(X, Y) = 2 \cdot 10^{-25}$. It is found that X is a verbose edition of this same Dvinsk Chronicle (see [6]). Here (A, B) represents the years 1390-1717 A.D. The dating of the text Y obtained in [10] coincides with its standard dating.

As another example let Y denote the Academic Chronicle [7]. Following the technique described in [10], we find that the text X is part of the Suprasl' Annals [7], with (A, B) representing the years 1336-1374 A.D. In this case, $p(X, Y) = 10^{-14}$. Other examples and tables can be found in [1, 10].

5. The coefficient p(X, Y) described above is based on the concept of spherical neighborhood of a point, and this makes it difficult to process by computer the experimental material, i.e., the plots of the text volume. For analyzing the stability of the principle of correlation of maxima described above and in [1, 10], and for utilizing a computer, it is possible to resort to the following method of statistical estimation of the chronological nearness of sequences of points of maxima of the volume functions of texts. Just as in Sec. 3, let a(X) be a vector that describes the instants of time at which certain events are mentioned most in the source X. It is natural to assume that: a) this vector is closely related to the vector of actual events a(H), where H is the historical period that is being described; b) the instants of occurrence of certain events that deserve to be mentioned in a certain text constitute a random point process. If we have two texts X and Y, then the simplest relationships between them can be described by the following schemes:

$$a(X) \leftarrow a(H) \rightarrow a(Y)$$

and

$$a(H) \rightarrow a(X) \rightarrow a(Y).$$

In either case it is necessary that a(X) and a(Y) should be realizations of certain random processes that are close to one another. As a measure of proximity between a(X) and a(Y) we shall use the following quantity:

$$R(X, Y) = \sum_{i=1}^{N} \min_{j} |a_i(X) - a_j(Y)| + \sum_{j=1}^{M} \min_{i} |a_i(X) - a_j(Y)|,$$

where the subscripts $i = 1, \ldots, N$ and $j = 1, \ldots, M$ mark the components of the vectors a(X) and a(Y). For brevity we shall henceforth say that R(X, Y) is the distance between the texts X and Y.

In other words, we fix a maximum of one text, and then find the nearest maximum of the other text. We calculate the distance between them. After that we find the sum of these distances with respect to all the maxima of the first text. Then we repeat this procedure by letting the two texts change place. As a result we obtain the above number. With such an approach the two texts that are being compared can have a different number of maxima, and we are not obliged to equate their number by introducing multiple maxima. Let us note that such a choice of the measure of proximity is mainly due to simplicity of calculation. It is certainly possible to use also other measures of proximity which are found to be (on the basis of experiment) reliable in distinguishing between pairs of dependent and pairs of independent texts. The reader can see that the following analysis does not in fact use the form of the function R(X, Y).
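The two-sided nearest-maximum sum described above translates directly into a few lines of Python. The sketch below is a plain quadratic-time transcription of the formula; the function names and the sample years are hypothetical.

```python
from typing import Sequence


def one_sided_sum(a: Sequence[float], b: Sequence[float]) -> float:
    """For every maximum in a, the distance to the nearest maximum in b, summed.

    Both sequences must be non-empty (min() of an empty sequence raises).
    """
    return sum(min(abs(x - y) for y in b) for x in a)


def distance_R(a_x: Sequence[float], a_y: Sequence[float]) -> float:
    """R(X, Y): the one-sided sum taken from X to Y plus the one from Y to X."""
    return one_sided_sum(a_x, a_y) + one_sided_sum(a_y, a_x)


# Hypothetical years of maxima for two texts with different numbers of maxima.
print(distance_R([1, 5], [2, 5, 9]))  # 1 + 0 from X to Y, 1 + 0 + 4 from Y to X
```

Because each text is scanned against the other, the two vectors may have different lengths, which is the property the text singles out as the advantage over the simplex-based coefficient.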
Now let us utilize a technique which is fairly common in statistics, i.e., we calculate the distribution function $F_0(R)$ of the random variable R(X, Y) for a set of hypotheses which necessarily contains also the hypothesis that the vectors a(X) and a(Y) are independent. After that we find the distance R(X, Y) between the specific texts X and Y of interest to us. If the probability of occurrence of such a distance or of a smaller distance is small, then it is natural to discard the hypothesis that the texts X and Y are independent, and assume that they are correlated.

In this paper the function $F_0(R)$ has been calculated by the Monte Carlo method under the following assumptions:

$$R_n = R[a(X), a_n(Y)], \quad n = 1, 2, \ldots,$$

where n is the number of the trial, and a(X) signifies that the values of the components of the vector a(X) have been calculated on the basis of the text X, with

$$a_{1n}(Y) = \gamma_1, \quad a_{in}(Y) = a_{i-1,n}(Y) + \gamma_i, \quad i = 2, \ldots, M, \quad \gamma_i \in F(A, S),$$

where F is a distribution with a mean

$$A = \frac{1}{N-1} \sum_{i=1}^{N-1} [a_{i+1}(X) - a_i(X)]$$

and a variance

$$S = \frac{1}{N-1} \sum_{i=1}^{N-1} [a_{i+1}(X) - a_i(X) - A]^2.$$


TABLE 1

Texts compared:

1. Tale of Past Years (850-1110), N = 61
2. Nikiforov Annals (850-1430), N = 83
3. Suprasl' Annals (850-1446), N = 132
4. Academic Chronicle (1336-1446), N = 33
5. Dvinsk Chronicle (complete) (1390-1717), N = 52
6. Dvinsk Chronicle (short) (1390-1717), N = 47
7. Nikiforov Annals (850-1255), N = 31
8. Suprasl' Annals (850-1255), N = 30
9. Titus Livius, "History of Rome" (757-287 B.C.), N = 15
10. Gregorovius, "History of Rome" (A.D.), N = 15
11. Suprasl' Annals (1336-1374), N = 15
12. Academic Chronicle (1336-1374), N = 15

N is the number of maxima, the first number in the corresponding row is the probability in the case of a normal distribution, and the second number is the probability in the case of a Poisson distribution.

Cell values as extracted (row and column positions not recoverable): 0.569 0.515 0.305 0.422; 0 0.550 0.497; 0.840 0.999 0 0.003 0.003 0 0 0 0; 0.660 0.993 0.001 0.004 0.313 0.929 0.375 0.887; 0; 0.01 0.03; 0.001 0.002; 0.155 0.699; 0.013 0.012 0; 0 0; 0.006 0.008 0.006; 0.005; 0 0 0 0 0.002 0.108 0.003 0 0.130 0; 0 0 0.001 0.111; 0.003 0.58 0 0.


The simulation was performed for the cases that F(A, S) is a truncated normal distribution ($\gamma_i \ge 0$) or an exponential distribution. The simulation results are listed in Table 1. As the basic vectors used for specifying A and S, we used, in turn, the vectors a(X) and a(Y).
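The Monte Carlo scheme described in this section can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' program: the instants in a(X) are taken to be measured from the start of the period so that they are comparable with simulated sequences starting from zero; the truncated normal is sampled by rejecting negative draws; the exponential case uses only the mean A; and all names, the sample data, and the number of trials are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple


def distance_R(a_x: Sequence[float], a_y: Sequence[float]) -> float:
    """Two-sided sum of nearest-maximum distances between the two vectors."""
    forward = sum(min(abs(x - y) for y in a_y) for x in a_x)
    backward = sum(min(abs(x - y) for x in a_x) for y in a_y)
    return forward + backward


def gap_mean_and_variance(a_x: Sequence[float]) -> Tuple[float, float]:
    """A and S: mean and variance of the gaps between successive maxima of X."""
    gaps = [b - a for a, b in zip(a_x, a_x[1:])]
    mean = sum(gaps) / len(gaps)
    var = sum((g - mean) ** 2 for g in gaps) / len(gaps)
    return mean, var


def draw_gap(mean: float, var: float, kind: str, rng: random.Random) -> float:
    """One increment gamma_i from the chosen distribution."""
    if kind == "exponential":
        return rng.expovariate(1.0 / mean)      # mean A; S is not used here
    if kind == "normal":
        while True:                              # truncation by rejection
            g = rng.gauss(mean, math.sqrt(var))
            if g >= 0:
                return g
    raise ValueError("kind must be 'normal' or 'exponential'")


def simulate_maxima(m: int, mean: float, var: float, kind: str,
                    rng: random.Random) -> List[float]:
    """a_n(Y): cumulative sums of m independent increments."""
    points, t = [], 0.0
    for _ in range(m):
        t += draw_gap(mean, var, kind, rng)
        points.append(t)
    return points


def prob_R_not_exceeding(a_x: Sequence[float], m: int, r_obs: float,
                         kind: str = "exponential", trials: int = 2000,
                         seed: int = 0) -> float:
    """Empirical F_0(r_obs): share of trials with R_n <= r_obs."""
    rng = random.Random(seed)
    mean, var = gap_mean_and_variance(a_x)
    hits = 0
    for _ in range(trials):
        if distance_R(a_x, simulate_maxima(m, mean, var, kind, rng)) <= r_obs:
            hits += 1
    return hits / trials


# Hypothetical instants of maxima, in years from the start of the period.
a_x = [3, 10, 14, 25, 31]
a_y = [4, 10, 15, 24, 31]
r_obs = distance_R(a_x, a_y)
print(r_obs, prob_R_not_exceeding(a_x, len(a_y), r_obs, kind="normal"))
```

A small value of the returned probability corresponds to rejecting the hypothesis of independence; swapping the roles of a_x and a_y gives the second estimate, in line with using both vectors in turn to specify A and S.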

It is easy to see that our approach can be satisfactorily utilized in the case of vectors a(X) and a(Y) of roughly the same length. It follows from Table 1 that the procedures used in this section and in Sec. 4 above yield basically the same qualitative results, so that we can hope that our original assumption concerning the representativeness of information about the peaks of the volume functions of historical texts is correct.

It is of interest to study other measures that make it possible to distinguish between pairs of dependent texts and pairs of independent texts. This would enable us to compare the results obtained by using different techniques, and to reach meaningful chronological conclusions.

The authors express their gratitude to I. S. Shiganov for his assistance in the calculations.

LITERATURE CITED

1. A. T. Fomenko, "Some statistical regularities in the distribution of the density of information in texts with a scale," Semiotika Inf., No. 15, 99-124, VINITI Press, Moscow (1980).
2. A. T. Fomenko, "Informative functions and corresponding statistical regularities," Abstracts of Reports of the Third International Vilnius Conference on Probability Theory and Mathematical Statistics, Vol. 2, 211-212, Institute of Mathematics and Cybernetics of the Academy of Sciences of the Lith. SSR, Vilnius (1981).
3. A. T. Fomenko, "A technique of recognition of duplicates and some applications," Dokl. Akad. Nauk SSSR, 258, No. 6, 1326-1330 (1981).
4. V. S. Sergeev, Topics in the History of Ancient Rome [in Russian], Vols. 1-2, Moscow (1938).
5. Titus Livius, History of Rome [Russian translation], Vols. 1-6, Moscow (1897-1899).
6. Complete Collection of Russian Chronicles [in Russian], Vol. 33, Leningrad (1977).
7. Complete Collection of Russian Chronicles [in Russian], Vol. 35, Moscow (1980).
8. Tale of Past Years. Literary Monuments of Ancient Russia [in Russian], Khud. Lit. Press, Moscow (1978).
9. F. Gregorovius, History of Mediaeval Rome, SPB (Collection of Works) [in Russian], Vols. 1-5 (1902-1912).
10. A. T. Fomenko, "New statistical experimental techniques of dating of ancient events and applications to the global chronology of the ancient and mediaeval world," Preprint No. B07201, Nov. 9, 1981, State Committee for Television and Broadcasting, Moscow (1981).
11. A. T. Fomenko, "A new empirical statistical technique of ordering of texts with applications to dating problems," Dokl. Akad. Nauk SSSR, 268, No. 6, 1322-1327 (1983).
12. A. T. Fomenko, "Calculation of the second derivative of the Moon's elongation and statistical regularities in the distribution of certain astronomical data," Operations Research and Control Systems [in Russian], Vyshcha Shkola, No. 20, Kiev (1982), pp. 98-113.
13. A. T. Fomenko, "The jump of the second derivative of the moon's elongation," Celestial Mech., 29, 33-40 (1981).
14. A. T. Fomenko, "On the properties of the second derivative of the moon's elongation and related statistical regularities," Problems of Computational and Applied Mathematics, No. 63, 136-150, Tashkent (1981).
15. A. T. Fomenko, "The author's invariant of Russian literary texts," Methods of Quantitative Analysis of Texts of Narrative Sources, 86-109, Inst. of History of the USSR, AN SSSR, Moscow (1983).
16. A. T. Fomenko, "On the geometry of distribution of integer points in hyperregions," Proceedings Seminar on Vector and Tensor Analysis, No. 21, Moscow State Univ. (1983), pp. 106-152.