
NEW TECHNIQUES FOR COMPARING THE VOLUME FUNCTIONS OF HISTORICAL TEXTS

V. V. Kalashnikov, S. T. Rachev, and A. T. Fomenko

We propose new techniques for estimating the degree of dependence of historical texts, such as annals, chronicles, etc. We consider texts "parametrized by time." This means that the text can be divided into a union of disjoint fragments, each describing the events of one year (or one decade, etc.). We also assume that the texts describe events over time intervals of the same length (say, a period of a few decades or centuries).

Following [1], two texts X and Y are called dependent if they describe events over the same time interval and in the history of the same region, or have a common prototype. Dependent texts may have the same origin, rely on the same body of archival data, or be versions of the same prototype. Texts are said to be independent if they describe events in essentially different time intervals (i.e., time intervals that intersect over not more than half their combined length) or describe events in different regions. It is therefore of interest to develop techniques for estimating the degree of dependence of a pair of texts.

Consider a text X that describes events over the time interval from A to B (in some system of chronology). Let the parameter t run over the years from A to B. Represent the text X as the union of fragments X(t), where X(t) describes the events of the year t. Measure the volume of the fragment X(t), e.g., in lines (or in pages, etc.). The result is a graph f(X, t) = vol X(t). Similarly construct the graph f(Y, t) for the text Y, which is also assumed to be given on the interval [A, B]. Identify the splash points (i.e., the points of local maxima) of the volume function of the text X on the interval [A, B]. The following maximum correlation principle has been formulated and experimentally tested by Fomenko [1-3]:

1) If the texts X and Y are dependent, then the splashes in their volume functions occur virtually at the same time, i.e., the points of local maxima of the volume functions vol X(t) and vol Y(t) are correlated.

Translated from Problemy Ustoichivosti Stokhasticheskikh Modelei, Trudy Seminara, pp. 33-45, 1986.


0090-4104/89/4701-2302$12.50

© 1989 Plenum Publishing Corporation


2) If the texts X and Y are independent, then the points of local maxima of their volume functions are uncorrelated (assuming that the time intervals of equal length described in both texts overlap).

For a discussion of the maximum correlation principle and its experimental verification, see [1-4]. That principle deals with "pointed information," i.e., it tracks the maximum points and ignores the magnitude of the splashes. The maximum correlation principle has been successfully applied by historians, and, together with the frequency decay principle, also formulated in [2, 3], it has been used in [5] to analyze the dependence of particular historical texts.

In this paper, we advance and test the following hypothesis:
1) for two texts X and Y known to be dependent, the volume functions (and not only the points of local maxima of the volume functions) should be "correlated" (assuming that the problem has been properly posed);
2) for two texts that are known to be independent, no reasonable correlation of the volume functions should be observed.
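To make the construction of f(X, t) and of the splash points concrete, here is a minimal Python sketch. It assumes that a text has already been split into yearly chapters with known line counts (the dictionary below is hypothetical), assigns volume 0 to years without a chapter, and, as our own simplification, counts only strict local maxima at interior years as splashes; plateaus and the endpoints A and B are ignored.

```python
from typing import Dict, List


def volume_function(line_counts: Dict[int, int], a: int, b: int) -> List[int]:
    """f(X, t) for t = a..b; a year without a chapter gets volume 0."""
    return [line_counts.get(t, 0) for t in range(a, b + 1)]


def splash_points(volumes: List[int], a: int) -> List[int]:
    """Years t (interior years only) where f(X, t) is a strict local maximum."""
    return [
        a + i
        for i in range(1, len(volumes) - 1)
        if volumes[i] > volumes[i - 1] and volumes[i] > volumes[i + 1]
    ]


# Hypothetical line counts of a text X covering the years 850..860.
x_counts = {851: 3, 852: 12, 853: 2, 856: 7, 857: 9, 858: 1}
f_x = volume_function(x_counts, 850, 860)
print(f_x)                      # [0, 3, 12, 2, 0, 0, 7, 9, 1, 0, 0]
print(splash_points(f_x, 850))  # [852, 857]
```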

This hypothesis, of course, is more complicated than the maximum correlation principle described above. It incorporates "more information": both the location of the splash points and the magnitude of the splashes. The original maximum correlation principle [1-3] relied on the fact that different chroniclers describing the same period in the history of the same region draw mainly upon the same "store of preserved information" (ancient texts); as a result, they tend to describe in greater detail those years for which a larger number of texts have survived and in less detail the years with only a few surviving texts. Now that we want to take into account also the amplitude of the volume function, we have to allow for the obvious fact that although the chronicler "makes a splash" in describing a particular year, the magnitude of this splash may depend on a variety of intractable factors, such as personal sympathies and antipathies with regard to the events being described.

In our research we used several long chronicles describing the events in Russian history in the 9th through 17th centuries. Each of these texts contains a clear division by years (introduced by the original chroniclers). The chronicler states the year (reckoning from the creation of the world) and then enumerates the events which (in his opinion) occurred in that year.

a) As the first pair of dependent texts, we chose the Nikiforovskaya chronicle (X) and the Suprasl'skaya chronicle (Y), both from Complete Collection of Russian Chronicles, Vol. 35, Moscow (1980). As the interval [A, B] described in both texts we chose the period of 406 years from 850 A.D. to 1256 A.D. The choice of this particular interval can be justified as follows. The brief introduction at the beginning of the Nikiforovskaya chronicle covers a long historical period from Adam to the Flood and then up to the year 6362 from the creation of the world. This introductory part contains no detailed chronological markers (no dates) and is extremely brief (less than half a page), all of which suggested that we should omit the description of the period from deep antiquity until the year 6362 from the creation of the world. It is only starting with this year that the text is divided into "chapters" describing different years. For example: "In the summer of 6362. The beginning of the land of Rus," etc. The key events described in the chronicle include legends about the beginning of Rus, Ryurik, the brothers Kii, Shchek, and Khoriv, the baptism of the Bulgars, Oleg, Igor, the campaign against the Greeks, Greeks and Russia, Vladimir (in detail), Yaroslav, Novgorod, Suzdal, Smolensk, the invasion of Mamai, the history of Vitovt (Vytautas), the war against the Tartars, and Lithuania. The text ends in 1430 A.D. However, starting with 1112 A.D. large lacunae appear in the chronicle. We therefore decided to end the sample period in 1256, where a particularly large 50-yr lacuna begins. The second text is the Suprasl'skaya chronicle. In both chronicles, the volume of the fragments X(t) was determined by line count. The two chronicles describe roughly the same epoch in the history of Russia and some adjacent regions. Their dependence is particularly remarkable in that the two chronicles are definitely not identical, although both possibly have common sources.
The chronicles differ substantially in style and in emphasis in the assessment of events. Thus, the author of the Nikiforovskaya chronicle devotes 36 lines to the year 970, while the author of the Suprasl'skaya chronicle devotes only 7 lines to this year. On the other hand, the Nikiforovskaya chronicler had nothing to report about the events of the year 977, while the Suprasl'skaya chronicler devoted 4 lines to this year. Despite all this, the correlation of the maximum points is quite pronounced [1-3]. In addition to the different distribution of fragment volumes, different events are sometimes described in the same years. For instance, the Suprasl'skaya chronicle reports in 1233 the wedding of Aleksandr Nevskii, while the Nikiforovskaya chronicle does not mention this event. Both chroniclers thus increase (or decrease) the degree of detail of their description, sometimes by describing different events.

b) Another pair of dependent texts included the Kholmogorskaya chronicle (X) and the Tale of Bygone Years (known in English as the Russian Primary Chronicle) (Y), from Complete Collection of Russian Chronicles, Vol. 33, Leningrad (1977) and the series Literary Monuments of Old Russia, Moscow (1950). Here A = 850 A.D., B = 1000 A.D. These chronicles differ essentially from each other in degree of detail. Nevertheless, the maximum correlation principle points to pronounced dependence of the two chronicles [1-3].

c) A third pair of dependent texts included the Dvinskii chronicle (short edition) (X) and the Dvinskii chronicle (complete, extended edition) (Y), both from Complete Collection of Russian Chronicles, Vol. 33, Leningrad (1977). Here A = 1390 A.D., B = 1717 A.D.

d) The fourth pair of dependent texts included the Akademicheskaya chronicle (X) [see Complete Collection of Russian Chronicles, Vol. 35, Moscow (1980)] and the part of the Suprasl'skaya chronicle describing the events from 1336 to 1374 A.D. (Y).

Independent pairs of texts are generated quite simply, e.g., by the following formal technique. Take some text X and, as an independent text Y, take the same text X "read backward," i.e., with the sequence of years reversed (the last year becomes the first, and so on).
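A sketch of how the "read backward" construction can serve as a control. It assumes that splash points are strict interior local maxima and that two splashes "coincide" when they are at most delta years apart; delta is an assumed tolerance, and the function names and volume series are hypothetical, not taken from the studies above.

```python
from typing import List


def splash_points(volumes: List[int]) -> List[int]:
    """Indices of strict interior local maxima of a yearly volume series."""
    return [i for i in range(1, len(volumes) - 1)
            if volumes[i] > volumes[i - 1] and volumes[i] > volumes[i + 1]]


def inverted(volumes: List[int]) -> List[int]:
    """Read the text backward: the last year becomes the first, and so on."""
    return volumes[::-1]


def coincident_fraction(x: List[int], y: List[int], delta: int = 1) -> float:
    """Share of the splashes of x that have a splash of y at most delta years away."""
    px, py = splash_points(x), splash_points(y)
    if not px:
        return 0.0
    hits = sum(1 for i in px if any(abs(i - j) <= delta for j in py))
    return hits / len(px)


x = [0, 9, 2, 1, 6, 0, 0, 0, 0, 0, 0]  # hypothetical volumes of X
y = [1, 7, 3, 2, 3, 5, 1, 0, 0, 0, 1]  # hypothetical "dependent" Y (one splash off by a year)
print(coincident_fraction(x, y))            # 1.0
print(coincident_fraction(x, inverted(x)))  # 0.0
```

The reversed series keeps exactly the same multiset of yearly volumes, so any difference it produces comes from the placement of the splashes in time.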

It is sometimes helpful to treat the graph of the volume function of the text X as the result of observations of some stochastic process. Such a stochastic process is the sequence of events in the history of the given region (over the given time interval). Each chronicler is a "black box" processing this sequence and producing on the "output" his own chronicle, which in particular determines the volume of description of each year. In this way, different chroniclers may generate texts that have roughly the same or substantially different degrees of detail. The most stable results (in statistical terms) are of course obtained when we compare texts of "equal order" (i.e., "poor" with "poor" or "rich" with "rich"). The comparison of texts of different order (i.e., "poor" with "rich") should be approached more carefully.

Let us formulate some useful text manipulation rules:
1) Volume graphs should not be treated as "ideally exact"; they should be regarded as "fuzzy" information. If two chroniclers "made splashes" close to each other (e.g., one of them "erred" by 1 year in dating a particular event), then these splashes should be treated as "approximately coincident," since an error of this kind is quite natural when describing events removed by many tens or hundreds of years into the past.
2) It is helpful to "smooth" the volume graphs and to repeat the comparison each time, taking the least value of the proximity coefficient of the graphs.
3) It is useful to focus only on the "largest" splashes, ignoring "small ripples" on the volume graph.

Let us briefly summarize the findings of our research.
a) The proposed statistical methods confidently discriminate between pairs of texts that are known to be dependent and those that are known to be independent (allowing for the amplitudes of the volume functions!).
b) The sharpness of this discrimination differs between techniques (see below).
c) Pairs of dependent texts (of equal degree of detail) are confidently discriminated (by all procedures) from pairs of independent texts.
d) Pairs of dependent texts of different degree of detail (poor and rich) are still discriminated from pairs of independent texts, but (for some techniques) with a lower degree of confidence.

1. TECHNIQUES TREATING THE VOLUME FUNCTION AS A PROBABILITY DISTRIBUTION

1.1. Let us consider a modification of Fomenko's comparison methods [1-4], based on Kantorovich's multidimensional theorem of displacement of masses [6]. Let f(t) = f(X, t) be the volume function of the text X on the interval [A, B], and let

$$V = V(X) = \int_A^B f(X, t)\,dt$$

be the full volume of the text X. Construct the function

$$S(t) = S(X, t) = \frac{1}{V}\int_A^t f(u)\,du.$$

Clearly, $0 \le S(t) \le 1$ on the interval [A, B] and S(t) is a nondecreasing function. S(·) will be called for brevity the "accumulated sum" of the text X.
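When the text is divided into yearly chapters, the integrals above reduce to sums over years. A minimal sketch under that assumption (integer line counts per year, at least one of them nonzero; the series is hypothetical); exact fractions are used so that S reaches exactly 1 in the last year.

```python
from fractions import Fraction
from typing import List


def accumulated_sum(volumes: List[int]) -> List[Fraction]:
    """Discrete S(X, t): share of the full volume V(X) written about years A..t."""
    total = sum(volumes)
    if total == 0:
        raise ValueError("the text has zero full volume")
    result, running = [], 0
    for v in volumes:
        running += v
        result.append(Fraction(running, total))
    return result


S = accumulated_sum([0, 3, 12, 2, 0, 0, 7, 9, 1, 0, 0])  # hypothetical yearly volumes
assert all(0 <= s <= t <= 1 for s, t in zip(S, S[1:]))  # nondecreasing, within [0, 1]
print(S[2], S[-1])  # 15/34 1
```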

Consider two texts X and Y. Let us estimate their dependence or independence by comparing the accumulated sums S(X, t) and S(Y, t) on the interval [A, A + T], where T = B − A is the length of the time period described in the texts. The accumulated sums "smooth out" small fluctuations of the volume graphs, and it is therefore natural to apply them to text dependence analysis. We use the functions f(X, ·), f(Y, ·) to define the probability measures $P_X(\cdot)$ and $P_Y(\cdot)$, treating S(X, ·) and S(Y, ·) as the distribution functions of these measures. The measures $P_X$ and $P_Y$ obviously have the same support, the interval D = [A, B]. The measure $P_X$ is called the (normalized) mass of the text X. Following the terminology of the problem of displacement of masses (see [6]), we call a comparison plan of the texts X and Y any probability measure P on the direct product D × D with the projections $P_X(\cdot) = P(\cdot \times D)$, $P_Y(\cdot) = P(D \times \cdot)$. For any intervals $I_1$ and $I_2$ ($I_j \subset D$, j = 1, 2), $P(I_1 \times I_2)$ is the fraction of the mass of the text X on the interval $I_1$ which is identified with the mass of the text Y from the interval $I_2$ under the plan P (see Fig. 1). We have the obvious equalities

$$P(I_1 \times D) = P_X(I_1), \qquad P(D \times I_2) = P_Y(I_2).$$
This identification is essentially interpreted as an identification of the dates of the events in the two texts: the event in the text X dated by the year $t_1$ is identified with the event in the text Y dated by the year $t_2$. Such time shifts are obviously undesirable, and we assess the damage caused by this redating with the aid of a nonnegative function $c(t_1, t_2)$, $t_1, t_2 \in D$, where $c(t, t) = 0$ for all $t \in D$. It is sometimes convenient to define the function c in the form

$$c(t_1, t_2) = H(|t_1 - t_2|), \qquad t_1, t_2 \in D, \tag{1}$$

where H is some nondecreasing convex function. We will consider the more general case, i.e., we assume only that c is a 2-antitone function, i.e., that it satisfies the inequality

$$c(t_1 + \Delta_1, t_2 + \Delta_2) - c(t_1, t_2 + \Delta_2) - c(t_1 + \Delta_1, t_2) + c(t_1, t_2) \le 0$$

for all $\Delta_1 > 0$, $\Delta_2 > 0$, $t_1, t_2, t_1 + \Delta_1, t_2 + \Delta_2 \in D$. If c has the representation (1), then c is clearly a 2-antitone function.
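The statement that every cost of the form (1) is 2-antitone can be spot-checked numerically. The sketch below (function names are ours) tests the inequality on a small grid of years for H(x) = x and H(x) = x², and, for contrast, for the concave H(x) = √x, which falls outside the class in (1) and violates the inequality. A grid check is only a sanity test, not a proof.

```python
import math
from typing import Callable, Iterable

Cost = Callable[[float, float], float]


def cost_from(h: Callable[[float], float]) -> Cost:
    """Cost of the form (1): c(t1, t2) = H(|t1 - t2|)."""
    return lambda t1, t2: h(abs(t1 - t2))


def is_2_antitone(c: Cost, points: Iterable[float], tol: float = 1e-12) -> bool:
    """Check c(s1, s2) - c(t1, s2) - c(s1, t2) + c(t1, t2) <= 0 for all grid
    points t1 < s1 and t2 < s2 (i.e. s1 = t1 + D1, s2 = t2 + D2 with D1, D2 > 0)."""
    pts = sorted(points)
    for i, t1 in enumerate(pts):
        for s1 in pts[i + 1:]:
            for j, t2 in enumerate(pts):
                for s2 in pts[j + 1:]:
                    if c(s1, s2) - c(t1, s2) - c(s1, t2) + c(t1, t2) > tol:
                        return False
    return True


years = range(6)
print(is_2_antitone(cost_from(lambda d: d), years))      # True
print(is_2_antitone(cost_from(lambda d: d * d), years))  # True
print(is_2_antitone(cost_from(math.sqrt), years))        # False: sqrt is concave
```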

We denote the collection of comparison plans by $\mathcal{P} = \mathcal{P}(X, Y)$. The total cost associated with the realization of each plan $P \in \mathcal{P}$ is naturally evaluated by the integral

$$\mathrm{Cost}(P) = \iint_{D \times D} c(t_1, t_2)\, P(dt_1, dt_2). \tag{2}$$

Therefore, the sought optimal comparison is characterized by the measure $P^* \in \mathcal{P}$ on which the minimum

$$\min\{\mathrm{Cost}(P) : P \in \mathcal{P}\} = \mathrm{Cost}(P^*) \tag{3}$$

is attained.

We denote the left-hand side of equality (3) by κ(X, Y; c). The number κ(X, Y; c) is naturally considered as a measure of difference of the texts X and Y with cost function c relative to the characteristics f(X, ·) and f(Y, ·). The measure P* is called the optimal comparison plan. Let us now give the explicit formula for κ(X, Y; c). Let

$$S^{-1}(u) = S^{-1}(X, u) = \sup\{t : S(X, t) < u\}, \qquad 0 < u < 1. \tag{4}$$

Now, if c is a 2-antitone function, then

$$\kappa(X, Y; c) = \int_0^1 c\big(S^{-1}(X, u), S^{-1}(Y, u)\big)\,du, \tag{5}$$

and the optimal comparison plan P* is given by the equality

$$P^*([A, t_1] \times [A, t_2]) = \min\{S(X, t_1), S(Y, t_2)\}, \qquad t_1, t_2 \in D. \tag{6}$$


[Figs. 1-3: graphs on [A, A + T]; Fig. 3 shows the accumulated sum S(X, t).]

(On formulas (5) and (6), see the survey [6].) If c(t, s) = |t − s|, then the difference measure κ(X, Y; c) of the texts X and Y is called the Kantorovich metric, and

$$\kappa(X, Y; c) = \kappa(X, Y) = \int_A^B |S(X, t) - S(Y, t)|\,dt \tag{7}$$

(see Fig. 2).
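For texts divided into yearly chapters, S(X, t) is a step function, and both (7) and the quantile formula (5) with c(t, s) = |t − s| can be evaluated exactly. The sketch below assumes integer yearly volumes on a common interval and takes S^{-1}(u) to be the first year in which S reaches u. It computes κ both ways with exact fractions; the quantile version also walks through the mass that the plan (6) assigns to each pair of years. Function names and data are hypothetical.

```python
from bisect import bisect_left
from fractions import Fraction
from typing import List


def accumulated_sum(volumes: List[int]) -> List[Fraction]:
    total = sum(volumes)
    out, running = [], 0
    for v in volumes:
        running += v
        out.append(Fraction(running, total))
    return out


def kappa_cdf(x: List[int], y: List[int]) -> Fraction:
    """Formula (7): S is constant on each year [t, t + 1), so the integral of
    |S(X, t) - S(Y, t)| over [A, B] is a sum over the years A..B-1."""
    sx, sy = accumulated_sum(x), accumulated_sum(y)
    return sum((abs(a - b) for a, b in zip(sx[:-1], sy[:-1])), Fraction(0))


def kappa_quantile(x: List[int], y: List[int]) -> Fraction:
    """Formula (5) with c(t, s) = |t - s|, where S^-1(u) is the first year with S >= u."""
    sx, sy = accumulated_sum(x), accumulated_sum(y)
    breaks = sorted(set(sx) | set(sy) | {Fraction(0)})
    total = Fraction(0)
    for lo, hi in zip(breaks, breaks[1:]):
        u = (lo + hi) / 2  # both inverse functions are constant for u in (lo, hi]
        tx, ty = bisect_left(sx, u), bisect_left(sy, u)
        # The plan (6) puts mass hi - lo on the pair of years (tx, ty).
        total += (hi - lo) * abs(tx - ty)
    return total


x = [0, 3, 12, 2, 0, 0, 7, 9, 1, 0, 0]  # hypothetical volumes, years A..B
y = [1, 2, 8, 1, 0, 1, 2, 6, 2, 0, 1]
assert kappa_cdf(x, y) == kappa_quantile(x, y)
print(float(kappa_cdf(x, y)))
```

Shifting a whole text by one year moves every unit of mass by one year, so κ equals exactly 1 in that case; this is a convenient check on any implementation.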

The constructions described above can be extended also to the comparison of N > 2 texts (see [6]), but so far this possibility has not been tested empirically. Optimal comparison plans are particularly important for the needs of chronology, because they ensure the best overlapping of the two texts. Another problem is choosing the bounds a and b so that for κ = κ(X, Y; c)

$$\mathrm{Cost}_L(P) = \max\Big\{\sup_{t \in D}\inf_{s \in D}\max\{|t - s|,\ P([A, t] \times (s, B])\},\ \sup_{t \in D}\inf_{s \in D}\max\{|t - s|,\ P((s, B] \times [A, t])\}\Big\} \tag{8}$$

(see [8, 9]). The optimal plan P* among all the plans $P \in \mathcal{P}(X, Y)$ is defined by the equality

$$\mathrm{Cost}_L(P^*) = \min\{\mathrm{Cost}_L(P) : P \in \mathcal{P}(X, Y)\}. \tag{9}$$

The optimal comparison plan P* exists and is defined by formula (6) (see [8, 9]), and the total cost associated with the realization of P* is determined by Lévy's metric between the accumulated sums S(X, ·), S(Y, ·), i.e.,

$$\mathrm{Cost}_L(P^*) = L(S(X, \cdot), S(Y, \cdot)) = L(X, Y) \tag{10}$$

(see [8, 9]): this is the side length of the largest square that can be inscribed between the completed graphs of S(X, ·) and S(Y, ·) (see Fig. 3). Lévy's metric L(X, Y) may be treated as a measure of deviation of the volume functions f(X, ·), f(Y, ·) in the spirit of the problem of displacement of masses.

1.3. The tests presented in Secs. 1.1-1.2 were applied to two texts that were known to be dependent, the Nikiforovskaya (X) and Suprasl'skaya (Y) chronicles (850-950 A.D.). We obtained κ(X, Y) = 1.5 and L(X, Y) = 0.01. Application of the same tests to a pair of texts known to be independent (X is the Nikiforovskaya chronicle for 850-950 A.D., Y is the Nikiforovskaya chronicle for 950-1050 A.D.) produced the following values: κ(X, Y) = 8.4, L(X, Y) = 0.08.

We see that the proposed tests distinguish between dependent and independent texts. Further computational work is needed in order to justify the applicability of the proposed tests.
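Lévy's metric between two distribution functions F and G is the infimum of ε > 0 such that F(t − ε) − ε ≤ G(t) ≤ F(t + ε) + ε for all t. The sketch below approximates it for two accumulated sums by bisection on ε, checking the condition only on a finite grid of times (step 0.01 year by default), so its output is an approximation. The grid, the number of bisection steps and the sample volumes are assumptions made for illustration.

```python
import math
from typing import Callable, List


def step_cdf(volumes: List[int], a: int) -> Callable[[float], float]:
    """Right-continuous accumulated sum S(X, t) for real t (volumes[i] is year a + i)."""
    total = sum(volumes)
    cum, running = [], 0
    for v in volumes:
        running += v
        cum.append(running / total)

    def s(t: float) -> float:
        k = math.floor(t) - a  # index of the last year already reached at time t
        return 0.0 if k < 0 else cum[min(k, len(cum) - 1)]

    return s


def levy_distance(f: Callable[[float], float], g: Callable[[float], float],
                  t_lo: float, t_hi: float, step: float = 0.01, iters: int = 40) -> float:
    """Approximate Levy metric: the smallest eps with f(t - eps) - eps <= g(t) <= f(t + eps) + eps,
    where the condition is checked only at grid points t_lo, t_lo + step, ..., t_hi."""
    n = int(round((t_hi - t_lo) / step))
    grid = [t_lo + i * step for i in range(n + 1)]

    def ok(eps: float) -> bool:
        return all(f(t - eps) - eps <= g(t) <= f(t + eps) + eps for t in grid)

    lo, hi = 0.0, 1.0  # eps = 1 always works, since 0 <= f, g <= 1
    for _ in range(iters):  # ok(eps) is monotone in eps, so bisection applies
        mid = (lo + hi) / 2
        if ok(mid):
            hi = mid
        else:
            lo = mid
    return hi


x = [0, 3, 12, 2, 0, 0, 7, 9, 1, 0, 0]  # hypothetical volumes for the years 850..860
y = [1, 2, 8, 1, 0, 1, 2, 6, 2, 0, 1]
sx, sy = step_cdf(x, 850), step_cdf(y, 850)
print(round(levy_distance(sx, sy, 849.0, 862.0), 3))
```

Because time is measured in years while S takes values in [0, 1], a unit year shift and a full unit of vertical discrepancy weigh the same in this metric; any rescaling of the time axis would change the numbers.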



[Figure: Tale of Bygone Years (850-950); horizontal axis: years 850 to 950.]

Fig. 4

1.4. The procedures described in Secs. 1.1 and 1.2 are effective when dealing with texts of roughly the same volume. If we compare, say, a "poor" and a "rich" text (see Fig. 4), then the two volume curves are substantially different in the metrics κ and L. This suggests that we need a special cost function c that will bring rich and poor texts closer to each other. This problem has not been solved so far, and therefore for comparison of texts of different order ("poor" with "rich") we propose the "sum of jumps" test. This test can be described as follows. Take a sufficiently small number ε > 0 (the approximation level). Let ΔS(X, t) [resp., ΔS(Y, t)] be the magnitude of the jump of the graph S(X, t) [resp., S(Y, t)] in year t. Compute the sum of the jumps ΔS(X, t') by the following rule:

$$\Delta(X \to Y) = \sum_{t'} \chi(t')\,\Delta S(X, t'),$$

where t' runs over the years with ΔS(X, t') > ε; χ(t') = 1 if the δ-neighborhood of the year t' does not contain a year t'' in which the function S(Y, t'') makes a jump greater than ε, and χ(t') = 0 otherwise. Here δ > 0 is a fixed number characterizing the admissible time uncertainty of the dating in the text. We similarly evaluate Δ(Y → X) by interchanging the texts X and Y. As the resultant measure of distance between the texts X and Y we take Δ(X, Y) = (1/2)(Δ(X → Y) + Δ(Y → X)). Thus, by computing the number Δ(X → Y), say, we compute, roughly speaking, the sum of those maxima of the function vol(X, t) that occur "against the background" of approximate constancy of the function vol(Y, t). In other words, we compute a measure of distance between the two accumulated sums S(X, t), S(Y, t).
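A sketch of the "sum of jumps" statistic Δ(X, Y). It assumes that both texts are given on intervals of equal length with one volume per year, and that the jump of S(X, ·) in year t is the share of the full volume devoted to that year. The defaults ε = 0.1 and δ = 2 mirror the values used in the demonstration of this procedure in the text; the function names and volume series are hypothetical.

```python
from typing import List


def jumps(volumes: List[int]) -> List[float]:
    """Jump of S(X, .) in each year: the share of the full volume about that year."""
    total = sum(volumes)
    return [v / total for v in volumes]


def one_sided(jx: List[float], jy: List[float], eps: float, delta: int) -> float:
    """Delta(X -> Y): sum of the X-jumps above eps whose delta-neighborhood
    contains no Y-jump above eps (these are exactly the years with chi = 1)."""
    big_y = [j for j, d in enumerate(jy) if d > eps]
    return sum(d for i, d in enumerate(jx)
               if d > eps and not any(abs(i - j) <= delta for j in big_y))


def sum_of_jumps(x: List[int], y: List[int], eps: float = 0.1, delta: int = 2) -> float:
    """Delta(X, Y) = (Delta(X -> Y) + Delta(Y -> X)) / 2 for series of equal length."""
    jx, jy = jumps(x), jumps(y)
    return 0.5 * (one_sided(jx, jy, eps, delta) + one_sided(jy, jx, eps, delta))


x = [0, 9, 2, 1, 6, 0, 0, 0, 0, 0, 0]  # hypothetical yearly volumes
y = [1, 7, 3, 2, 3, 5, 1, 0, 0, 0, 1]
print(sum_of_jumps(x, y))                  # 0.0
print(round(sum_of_jumps(x, x[::-1]), 4))  # 0.6111
```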

Here we again use "pointed information," which, however, also allows for the magnitude of the jumps. Let us demonstrate the application of this procedure for ε = 0.1, δ = 2 (i.e., jumps in the graphs S not exceeding 0.1 were "filtered" and time "inaccuracies" of up to 4 years were allowed). For the Nikiforovskaya (X) and the Suprasl'skaya (Y) chronicles, 850-950, we obtained Δ(X, Y) = 0, i.e., ideal dependence. For the two parts of the Nikiforovskaya chronicle (X is the part for 850-950, Y the part for 950-1050), we obtained Δ(X, Y) = 488. Comparison of the Nikiforovskaya chronicle (X) with the Tale of Bygone Years (Y) on the interval 850-950 gave Δ(X, Y) = 11. This test detects dependence and independence with a fair degree of confidence.

1.5. Yet another test can be devised by observing the dynamics of convergence of the graphs S(X, ·) and S(Y, ·) as we gradually "deplete" the texts. For example, assume that the texts X and Y are compared by the metric κ(X, Y) [see (7)]. Let κ(X, Y; τ_1, τ_2, ..., τ_i), i ≥ 1, be the metric (7) between the texts X and Y after deletion of the chapters relating to the years τ_1, τ_2, ..., τ_i (these are naturally different years). Let

$$\kappa_i(X, Y) = \min_{\tau_1, \ldots, \tau_i} \kappa(X, Y; \tau_1, \ldots, \tau_i).$$

For dependent texts, we naturally expect to observe not only smaller values of κ_i (i ≥ 1) than for independent texts, but also a relatively faster reduction of κ_i as i grows (for independent texts, the reduction of κ_i will be slower). Figure 5 illustrates this situation.
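The minimum over all choices of τ_1, ..., τ_i can be found by brute force when i is small. The sketch below assumes (our interpretation, not stated in the text) that deleting the chapters for a year removes that year from both volume series before the accumulated sums are recomputed; it enumerates all i-element subsets of years, so it is only practical for short intervals or small i. Names and data are hypothetical.

```python
from fractions import Fraction
from itertools import combinations
from typing import List, Optional


def kappa(x: List[int], y: List[int]) -> Fraction:
    """Formula (7) for yearly step functions: sum over years of |S(X, t) - S(Y, t)|."""
    tx, ty = sum(x), sum(y)
    cx = cy = 0
    total = Fraction(0)
    for vx, vy in zip(x[:-1], y[:-1]):
        cx += vx
        cy += vy
        total += abs(Fraction(cx, tx) - Fraction(cy, ty))
    return total


def depleted_kappa(x: List[int], y: List[int], i: int) -> Optional[Fraction]:
    """kappa_i(X, Y): the minimum of kappa over all choices of i deleted years.
    Assumption: deleting a year removes its chapter from both volume series."""
    best = None
    for removed in combinations(range(len(x)), i):
        kept = [t for t in range(len(x)) if t not in removed]
        xs, ys = [x[t] for t in kept], [y[t] for t in kept]
        if sum(xs) == 0 or sum(ys) == 0:
            continue  # nothing left to normalize
        k = kappa(xs, ys)
        if best is None or k < best:
            best = k
    return best


x = [2, 5, 0, 1]  # hypothetical volumes: the 2nd and 3rd years disagree
y = [2, 0, 5, 1]
print([depleted_kappa(x, y, i) for i in range(3)])
```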



[Fig. 5. κ_i(X, Y) versus the number of years omitted from observations (0 to 8). Curve labels: X = Suprasl'skaya 25-100, Y = Nikiforovskaya 500-575; Y = Nikiforovskaya 200-275.]

2. A COMPARISON TECHNIQUE TREATING THE TEXT VOLUMES AS RANDOM SAMPLES

2.1. We start with a review of some well-known results of probability theory [10]. Let ξ and η be two nonnegative random variables (r.v.s) with joint distribution function (d.f.) $H(x, y) = P(\xi \le x, \eta \le y)$ and marginal d.f.s

$$F(x) = P(\xi \le x) = \lim_{y \to \infty} H(x, y), \qquad G(y) = P(\eta \le y) = \lim_{x \to \infty} H(x, y).$$

We measure the deviation of the r.v.s ξ and η by $E|\xi - \eta|$. We have the equality

$$\min E|\xi - \eta| = \int_0^\infty |F(x) - G(x)|\,dx = l(F, G), \tag{1}$$

where the minimum is over all possible distributions H with given marginal distributions F and G. It is known that the equality in (1) is attained for the function $H(x, y) = \min(F(x), G(y))$. This form of H corresponds to the case of "strong dependence" between the r.v.s ξ and η. If the r.v.s ξ and η are independent, then

$$E|\xi - \eta| = \int_0^\infty \big(F(x) + G(x) - 2F(x)G(x)\big)\,dx = m(F, G). \tag{2}$$

Therefore, judging by the degree of proximity of $E|\xi - \eta|$ to $l(F, G)$ or to $m(F, G)$, we say that the r.v.s ξ and η are strongly dependent or "almost independent." Incidentally, note that $l(F, G)$ is the minimal possible value of $E|\xi - \eta|$ [see (1)]. At the same time, $m(F, G)$ is not the maximal possible value of $E|\xi - \eta|$: it corresponds to the case of independent ξ and η, and even higher values of $E|\xi - \eta|$ are obtained for negatively correlated ξ and η.
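For empirical d.f.s the two integrals (1) and (2) have simple sample counterparts. With equal sample sizes, l(F, G) is the mean absolute difference of the sorted values (the "strongly dependent" pairing H = min(F, G)), and m(F, G) is the mean of |x − y| over all pairs (independent sampling). The sketch below evaluates both integrals exactly for nonnegative integer samples and checks them against these counterparts; the function names and sample values are hypothetical.

```python
from fractions import Fraction
from typing import Callable, List


def ecdf(sample: List[int]) -> Callable[[int], Fraction]:
    n = len(sample)
    return lambda t: Fraction(sum(1 for s in sample if s <= t), n)


def integrate_step(h: Callable[[int], Fraction], knots: List[int]) -> Fraction:
    """Exact integral over [0, max(knots)] of a right-continuous step function h
    whose jumps occur only at the knots (beyond the last knot h vanishes here)."""
    grid = sorted(set([0] + knots))
    return sum(((b - a) * h(a) for a, b in zip(grid, grid[1:])), Fraction(0))


def l_metric(xs: List[int], ys: List[int]) -> Fraction:
    F, G = ecdf(xs), ecdf(ys)
    return integrate_step(lambda t: abs(F(t) - G(t)), xs + ys)


def m_metric(xs: List[int], ys: List[int]) -> Fraction:
    F, G = ecdf(xs), ecdf(ys)
    return integrate_step(lambda t: F(t) + G(t) - 2 * F(t) * G(t), xs + ys)


xs = [3, 12, 2, 7, 9, 1]  # hypothetical nonnegative samples of equal size
ys = [2, 8, 1, 6, 2, 1]
n = len(xs)
# l: sorted values paired with each other (H = min(F, G), strong dependence).
assert l_metric(xs, ys) == Fraction(sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))), n)
# m: every x paired with every y (independent sampling).
assert m_metric(xs, ys) == Fraction(sum(abs(a - b) for a in xs for b in ys), n * n)
print(l_metric(xs, ys), m_metric(xs, ys))
```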

2.2. We treat the sequence of volumes f(X, t), A ≤ t ≤ B, as a sample of independent r.v.s with d.f. $F_X(x) = P\{f(X, t) \le x\}$. This approach is justified by the unpredictability of real historical events, the nondeterminism of the personal traits of the chronicler, and also the effect of purely conjunctural and personal factors. We similarly treat the sequence f(Y, t), A ≤ t ≤ B, as a sample of independent r.v.s with d.f. $F_Y(x) = P\{f(Y, t) \le x\}$. Since we have no other information apart from the texts X and Y, we naturally take $F_X$ and $F_Y$ as the empirical distribution functions,

$$F_X(x) = \frac{1}{B - A + 1}\,\#\{t : f(X, t) \le x\}, \tag{3}$$

$$F_Y(x) = \frac{1}{B - A + 1}\,\#\{t : f(Y, t) \le x\}. \tag{4}$$

An analogue of the mean $E|\xi - \eta|$ in this case is the empirical average

$$M(X, Y) = \frac{1}{B - A + 1}\sum_{t=A}^{B} |\mathrm{vol}(X, t) - \mathrm{vol}(Y, t)|. \tag{5}$$

This interpretation of the texts X and Y suggests dependence if M(X, Y) is close to $l(F_X, F_Y)$ and independence if M(X, Y) is close to $m(F_X, F_Y)$. As a more sophisticated test of dependence of the texts X and Y we can suggest comparison of the sample d.f. $H_{XY}(x, y) = \frac{1}{B - A + 1}\#\{t : \mathrm{vol}(X, t) \le x,\ \mathrm{vol}(Y, t) \le y\}$ …


TABLE 1

| Chronicles compared | M(X, Y) | l(F_X, F_Y) | m(F_X, F_Y) | (M − l)/(m − l) |
|---|---|---|---|---|
| Suprasl'skaya (X) (850-1256 A.D.) / Nikiforovskaya (Y) | 0.88 | 0.62 | 2.96 | 0.11 |
| Suprasl'skaya (X) (850-1256 A.D.) / Inverted Nikiforovskaya (Y) | 2.67 | 0.62 | 2.96 | 0.87 |
| Dvinskaya complete (X) (1390-1717 A.D.) / Dvinskaya short (Y) | 2.89 | 2.55 | 5.63 | 0.11 |
| Rachinskii's (X) (1400-1550 A.D.) / Evreinovskaya (Y) | 2.32 | 0.92 | 5.51 | 0.3 |
| Inverted Rachinskii's (X) (1400-1550 A.D.) / Evreinovskaya (Y) | 5.97 | 0.92 | 5.61 | 1.1 |
| Vladimirskaya (X) (830-1241 A.D.) / Volynskaya (Y) | 0.79 | 0.42 | 0.83 | 0.9 |
| Inverted Vladimirskaya (X) (830-1241 A.D.) / Volynskaya (Y) | 0.80 | 0.42 | 0.83 | 0.92 |
| Suprasl'skaya (X) (850-1110 A.D.) / Tale of Bygone Years (Y) | 19.68 | 18.81 | 20.14 | 0.65 |
| Suprasl'skaya (X) (850-1110 A.D.) / Inverted Tale of Bygone Years (Y) | 20.13 | 18.81 | 20.14 | 0.99 |
| Nikiforovskaya (X) (850-1110 A.D.) / Tale of Bygone Years (Y) | 19.91 | 18.95 | 20.08 | 0.85 |
| Nikiforovskaya (X) (850-1110 A.D.) / Inverted Tale of Bygone Years (Y) | 20.04 | 18.95 | 20.08 | 0.98 |

Our assumptions of independent and identically distributed vol(X, t), A ≤ t ≤ B, are not fulfilled in practice, strictly speaking. Nevertheless, it seems that these assumptions are not decisive. This can be demonstrated by the following idealized example. Let the quantity $\omega_t$ (A ≤ t ≤ B) characterize the volume of the actual events in year t. Assume that the chronicler X processes these events as follows:

$$\mathrm{vol}(X, t) = f_X(\omega_t), \tag{6}$$

where $f_X(x)$ is a positive monotone increasing function. In other words, the chronicler writes more in years that are richer in events. Similarly, the chronicler Y processes the events with the corresponding (also increasing) function

$$\mathrm{vol}(Y, t) = f_Y(\omega_t).$$

It is easy to see that in this case the following equality holds without any randomness assumptions:

$$M(X, Y) = l(F_X, F_Y). \tag{7}$$

Indeed, both sequences are then increasing functions of the same $\omega_t$, so ordering the years by vol(X, t) also orders them by vol(Y, t); pairing the volumes year by year therefore coincides with the "strongly dependent" pairing of sorted values that attains the minimum in (1).
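The idealized example can be checked directly. The sketch below (names and numbers are hypothetical) computes M(X, Y) from (5) together with l and m for empirical d.f.s of equal size, and the normalized quantity (M − l)/(m − l) reported in Table 1. With two increasing functions of a common ω_t it confirms equality (7), so the normalized value is 0.

```python
from fractions import Fraction
from typing import List


def mean_abs_diff(vx: List[int], vy: List[int]) -> Fraction:
    """Formula (5): M(X, Y) = average over years of |vol(X, t) - vol(Y, t)|."""
    return Fraction(sum(abs(a - b) for a, b in zip(vx, vy)), len(vx))


def l_emp(vx: List[int], vy: List[int]) -> Fraction:
    """l(F_X, F_Y) for empirical d.f.s of equal size: pair the sorted values."""
    return Fraction(sum(abs(a - b) for a, b in zip(sorted(vx), sorted(vy))), len(vx))


def m_emp(vx: List[int], vy: List[int]) -> Fraction:
    """m(F_X, F_Y): average |a - b| over all pairs (independent sampling)."""
    return Fraction(sum(abs(a - b) for a in vx for b in vy), len(vx) * len(vy))


def normalized(vx: List[int], vy: List[int]) -> Fraction:
    """(M - l) / (m - l): near 0 suggests dependence, near 1 independence."""
    M, l, m = mean_abs_diff(vx, vy), l_emp(vx, vy), m_emp(vx, vy)
    return (M - l) / (m - l)


# Idealized model (6): both chroniclers apply increasing functions to the same omega_t.
omega = [4, 1, 9, 3, 7, 2, 8, 5]    # hypothetical "volume of actual events" per year
vol_x = [2 * w + 1 for w in omega]  # f_X(w) = 2w + 1
vol_y = [w * w for w in omega]      # f_Y(w) = w^2, increasing for w >= 0
assert mean_abs_diff(vol_x, vol_y) == l_emp(vol_x, vol_y)  # equality (7)
print(normalized(vol_x, vol_y))               # 0
print(float(normalized(vol_x, vol_x[::-1])))  # inverted text as a control
```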

It is useful to note the following fact. If the texts X and Y are such that vol(X, t) ≤ vol(Y, t) for all A ≤ t ≤ B …

TABLE 2

| Chronicles compared (850-1000 A.D.) | Time shift | M(X, Y) | l(F_X, F_Y) | m(F_X, F_Y) | (M − l)/(m − l) | Total shift |
|---|---|---|---|---|---|---|
| Tale of Bygone Years (X) / Kholmogorskaya (Y) | Without time shift | 13.07 | 10.97 | 20.42 | 0.22 | 0 |
| | With time shift | 11.05 | 10.98 | 20.60 | 0.0072 | 8 |
| Tale of Bygone Years (X) / Suprasl'skaya (Y) | Without time shift | 16.73 | 15.28 | 17.58 | 0.63 | 0 |
| | With time shift | 15.6 | 15.28 | 17.56 | 0.12 | 12 |
| Tale of Bygone Years (X) / Nikiforovskaya (Y) | Without time shift | 17.16 | 15.46 | 17.45 | 0.85 | 0 |
| | With time shift | 15.64 | 15.46 | 17.48 | 0.09 | 10 |

… chronicle (which, in its turn, is strongly dependent on the Suprasl'skaya chronicle) suggests that the two texts are independent. We will return to the case of texts of different size in Sec. 2.4.

2.3. When the sequences of values f(X, t), f(Y, t) are treated as random samples, a natural measure of dependence is provided by the coefficient of correlation

$$r = r(X, Y) = \frac{\sum_{t=A}^{B} \big(f(X, t) - EX\big)\big(f(Y, t) - EY\big)}{\sqrt{\sum_{t=A}^{B} \big(f(X, t) - EX\big)^2 \sum_{t=A}^{B} \big(f(Y, t) - EY\big)^2}},$$

where EX and EY are the average yearly volumes of the texts X and Y.
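A direct transcription of the formula for r, with EX and EY taken as the yearly average volumes. This is a plain sketch with hypothetical data; Python 3.10 and later also provide statistics.correlation, which computes the same Pearson coefficient.

```python
import math
from typing import Sequence


def corr(vx: Sequence[float], vy: Sequence[float]) -> float:
    """Sample correlation coefficient r(X, Y) of two yearly volume series."""
    n = len(vx)
    ex, ey = sum(vx) / n, sum(vy) / n  # EX, EY: average yearly volumes
    sxy = sum((a - ex) * (b - ey) for a, b in zip(vx, vy))
    sxx = sum((a - ex) ** 2 for a in vx)
    syy = sum((b - ey) ** 2 for b in vy)
    return sxy / math.sqrt(sxx * syy)


x = [0, 3, 12, 2, 0, 0, 7, 9, 1, 0, 0]  # hypothetical volumes
y = [1, 2, 8, 1, 0, 1, 2, 6, 2, 0, 1]
print(round(corr(x, y), 3))
print(round(corr(x, x[::-1]), 3))  # the same text read backward
```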

The possible values of r are contained in the interval [−1, 1]. Closeness of r to zero suggests that the texts X and Y are independent, and its closeness to 1 suggests that they show (positive) dependence. Calculations using this technique produced the following results. Comparison of the texts of Sergeev (X) [12] and Levy (Y) [15], both dealing with ancient Rome, gave a correlation coefficient r(X, Y) = 0.48, i.e., detected noticeable dependence of these texts, which is not surprising, because both are based on the same events. Comparison of the texts of Bemont and Monod (X) [14] and Kohlrausch (Y) [13], both describing medieval Rome, gave r(X, Y) = 0.77, which also points to dependence. Comparison of the texts of Levy (X) [15] and Gregorovius (Y) [16], describing ancient Rome and medieval Rome, respectively, gave r(X, Y) = 0.528. On the other hand, comparison of the text of Sergeev [12] with the same text read in backward order gave r = −0.046, which indicates independence. Application of the proposed procedure to texts of different size (X, the Tale of Bygone Years; Y, the Suprasl'skaya chronicle) gave r = 0.125. In this case, the proposed technique fails to detect dependence of the texts.

2.4. The results presented in Tables 1 and 2 show that the proposed procedure fairly confidently detects dependence of texts of similar volume (e.g., the Suprasl'skaya and Nikiforovskaya chronicles). At the same time, dependence of "poor" and "rich" texts (e.g., the Nikiforovskaya chronicle and the Tale of Bygone Years) is not detected, although the "rich" text lies almost always above the "poor" one. This is attributable to the small number of splashes (4-5 cases) in the "poor" text that are located at a distance of about one year from the corresponding splashes in the "rich" text. Requiring an accuracy of one year is of course excessive for problems of this kind. Therefore, we should try to construct a dependence test combining two of the techniques described above: 1) the method based on proximity of maxima (see [1-4]); 2) the procedure that treats text volumes as r.v.s. Under this approach, we first match the close splashes in the volume of the compared texts (and determine the required shifts) and then apply the test to compare the resulting (distorted) texts. This technique is widely used in functional analysis; see, e.g., the Skorokhod distance in the space D[0, 1] [11]. The resulting test allows for both components.
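The combined procedure is described only in outline, so the sketch below fixes one concrete, hypothetical matching rule: every splash of the poor text that has no splash of the rich text in the same year, but has one in an adjacent year, is moved together with its whole chapter by one year onto that splash. The number of such moves plays the role of the total shift reported in Table 2, and (M − l)/(m − l) is then recomputed on the shifted series. The rule, names and data are our assumptions, not the authors' exact procedure.

```python
from fractions import Fraction
from typing import List, Set, Tuple


def splashes(v: List[int]) -> Set[int]:
    """Indices of strict interior local maxima."""
    return {i for i in range(1, len(v) - 1) if v[i] > v[i - 1] and v[i] > v[i + 1]}


def align_poor_to_rich(poor: List[int], rich: List[int]) -> Tuple[List[int], int]:
    """Hypothetical matching rule: move a splash chapter of the poor text by one year
    when the rich text has its splash in the adjacent year. Returns (volumes, shifts)."""
    out, shifts = list(poor), 0
    rich_peaks = splashes(rich)
    for t in sorted(splashes(poor)):
        if t in rich_peaks:
            continue
        for s in (t - 1, t + 1):
            if s in rich_peaks:
                out[s] += out[t]  # the whole chapter of year t is redated to year s
                out[t] = 0
                shifts += 1
                break
    return out, shifts


def ratio(vx: List[int], vy: List[int]) -> Fraction:
    """(M - l) / (m - l) for two series of equal length."""
    n = len(vx)
    M = Fraction(sum(abs(a - b) for a, b in zip(vx, vy)), n)
    l = Fraction(sum(abs(a - b) for a, b in zip(sorted(vx), sorted(vy))), n)
    m = Fraction(sum(abs(a - b) for a in vx for b in vy), n * n)
    return (M - l) / (m - l)


rich = [2, 10, 3, 4, 12, 5, 3, 9, 2, 4]  # hypothetical volumes
poor = [0, 0, 4, 0, 0, 5, 0, 3, 0, 0]
aligned, shifts = align_poor_to_rich(poor, rich)
print(ratio(poor, rich), ratio(aligned, rich), shifts)  # 1/2 0 2
```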



Table 2 summarizes the results obtained by applying this procedure to pairs of texts of different size. The table gives not only the value of the test statistic, but also the total number of years by which the chapters were shifted (only the "poor" texts were shifted, and each chapter was shifted by not more than one year). For example, in the Kholmogorskaya chronicle, only 8 of the 150 chapters had to be shifted by one year. The results show that these time shifts sharply reduce the value of (M − l)/(m − l) for dependent texts. For independent texts, such a reduction requires a substantially greater number of shifts. Our results are encouraging for the possibilities of detecting dependence between texts. So far, however, the question of combining the computed distances and total shifts into a single test remains open. Here, as with the other techniques, the solution of the problem will follow once more extensive computational material has been accumulated.

3. CONCLUSIONS

The techniques described in this paper are essentially experimental. So far, they have been tried on a limited volume of empirical material, and a final verdict of applicability should await more detailed checks and calibration. Yet even preliminary results suggest that it is indeed possible to develop tests for classifying texts into dependent and independent with allowance for their volume.

We are grateful to N. Ya. Rives for his considerable interest in this research and for his willing assistance with computer work. His expert help has enabled us to test a number of hypotheses and to advance new ones.

LITERATURE CITED

1. A. T. Fomenko, "Some statistical regularities in the distribution of information density in texts with a scale," in: Semiotics and Informatics [in Russian], No. 15, VINITI, Moscow (1980), pp. 99-124.
2. A. T. Fomenko, "Information functions and associated statistical regularities," in: Abstracts of Papers at 3rd International Vilnius Conf. on Probability Theory and Mathem. Statistics [in Russian], Vol. 2, Inst. Mat. i Kibernet. AN LitSSR, Vilnius (1981), pp. 211-212.
3. A. T. Fomenko, New Empirical-Statistical Procedures for Dating of Ancient Events and Application to the Global Chronology of the Ancient and Medieval World [in Russian], Preprint, Gos. Kom. Telev. Radioveshch., order 3672 (9 Sept. 1981), No. B7201, Moscow (1981).
4. V. V. Fedorov and A. T. Fomenko, "Statistical estimation of chronological proximity of historical texts," in: Stability Problems of Stochastic Models, Proc. of a Seminar [in Russian], VNIISI, Moscow (1983), pp. 101-107.
5. L. E. Morozova, "Quantitative methods in the analysis of so-called Filaret manuscripts, a record of 'Troubled Times,'" in: Mathematical Methods and Computers in Historical Research [in Russian], Nauka, Moscow (1985), pp. 182-203.
6. S. T. Rachev, "The Monge-Kantorovich problem of displacement of masses and its application in stochastic theory," Teor. Veroyatn. Primen., 29, No. 4, 625-653 (1984).
7. F. Hausdorff, Set Theory [Russian translation], ONTI, Moscow (1937).
8. S. T. Rachev, "On minimal metrics in the space of real random variables," Dokl. AN SSSR, 257, No. 5, 1057-1070 (1981).
9. S. T. Rachev, "Minimal metrics in the real valued random variable space," Lect. Notes Math., 982, 172-180 (1983).
10. V. M. Zolotarev, "Metric distances in spaces of random variables and their distributions," Mat. Sb., 101(143), No. 3(11), 416-454 (1976).
11. P. Billingsley, Convergence of Probability Measures [Russian translation], Nauka, Moscow (1977).
12. V. S. Sergeev, Essays in the History of Ancient Rome [in Russian], Moscow State Univ. (1938).
13. Kohlrausch, German History [Russian translation], Vols. 1, 2, Moscow (1860).
14. C. Bemont and G. Monod, History of Europe in the Middle Ages [Russian translation], Petrograd (1915).
15. T. Levy, History of Rome [Russian translation], Moscow (1897-1899).
16. F. Gregorovius, The History of the City of Rome in the Middle Ages [in Russian], St. Petersburg (1902-1912).
