Automation and Remote Control, Vol. 63, No. 3, 2002, pp. 468–474. Translated from Avtomatika i Telemekhanika, No. 3, 2002, pp. 134–140. Original Russian Text Copyright © 2002 by Darkhovskii, Kaplan, Shishkin.

CONTROL IN BIOLOGICAL SYSTEMS AND MEDICINE

On an Approach to the Estimation of the Complexity of Curves (Using as an Example an Electroencephalogram of a Human Being)
B. S. Darkhovskii, A. Ya. Kaplan, and S. L. Shishkin


Institute for Systems Analysis, Russian Academy of Sciences, Moscow, Russia
Lomonosov State University, Moscow, Russia

Received September 27, 2001

Abstract--A new approach to estimating the "complexity" of a continuous curve on an interval is suggested. The basic idea of the approach is that the complexity of a curve should be estimated by the relative share of information necessary for its recovery, with a specified accuracy, from its values at a finite number of points by means of a prescribed set of methods. The suggested approach is illustrated by estimating the complexity of various fragments of a human electroencephalogram (EEG) and can be used, in particular, for macrostructural and microstructural analysis of the EEG.

1. BASIC IDEA

The notion of the "complexity" (or "simplicity") of an object belongs to the class of fundamental scientific notions. The problem has been taken up by many prominent scientists, and there have been numerous attempts to employ "complexity" approaches in applied problems, including the analysis of the human EEG. Let us briefly recall the basic ideas associated with the notion of complexity.

The quantitative approach to the notion of a "complex physical system" apparently first appeared in statistical physics together with the notion of "entropy." In equilibrium statistical physics, the entropy is the coefficient in the asymptotics of the logarithm of the number of configurations displaying certain properties as the number of degrees of freedom of the system tends to infinity. The higher the entropy, the more complex the system.

In Shannon's information theory, created in the 1940s, the notion of the entropy of a probability distribution is introduced. It is established that the entropy is a measure of the "degree of uncertainty" inherent in a given probability distribution, and that under natural conditions this measure is unique. It can be shown that the Shannon entropy is the coefficient in the asymptotics of the logarithm of the number of "typical" sequences of independent random variables with a common probability distribution as the length of the sequence increases (typical sequences are those for which the deviation of the frequencies of events from their probabilities does not exceed a specified small number). Again, a probability distribution is the more "complex," the higher its entropy (from this viewpoint, the most complex distribution is the uniform one, because it has the highest possible uncertainty).

The theory of dynamic systems developed by Kolmogorov and Sinai (see, for example, [1]) leads to the notion of the entropy of such systems. In essence, this is an extension of the Shannon entropy to dynamic systems. The entropy of a dynamic system is the coefficient in the asymptotics of the logarithm of the number of distinct types of trajectories of the system as time tends to infinity. Here, again, the entropy of a dynamic system can serve as a measure of its complexity: the more complex the system, the greater the variety of its trajectories.

The notion of the entropy of a dynamic system (and such related "nonlinear characteristics" as the correlation integral, the fractal dimension, and the Lyapunov exponents; see, for example, [2]) is used in so-called nonlinear, or "chaotic," dynamics. One of the problems of this science is the recovery of the description of a dynamic system from its realizations. Here again, the complexity of a system is identified with the complexity of its description, i.e., in the final analysis, with the entropy.

In the early 1980s, Kolmogorov [3] suggested a new, algorithmic approach to the notion of the complexity of objects. Its basic idea is that a "complex" object requires a large body of information for its description, and a "simple" one a small body of information. This idea is formalized in the language of the theory of algorithms, and the complexity, broadly speaking, is measured by the length of a program that separates the given object from a certain set. It is this idea of Kolmogorov's that is closest to the one we intend to outline here. But before we proceed, let us consider how "complexity" approaches have been used in EEG analysis.

As is known, the EEG is among the most complex physical signals. This is due, first of all, to its high (and fundamental) nonstationarity [4]. Because of this nonstationarity, methods based on fitting various models to a fixed EEG record are unsuitable in principle for describing its new sections. This phenomenon led to the need for a preliminary segmentation of the EEG into (quasi-)stationary segments, so that a separate mathematical model can be selected for each segment. For the purposes of segmentation, attempts were made to employ the notion of complexity in order to separate, within the EEG signal, fragments that are homogeneous in "complexity." The notions of complexity used in the EEG-analysis literature rely on the ideas presented above, primarily on the entropy (or the related "nonlinear characteristics") of dynamic systems and on the Shannon entropy. Let us note the main drawbacks of such approaches.

The basic assumption of nonlinear dynamics (the theory of dynamic systems, the theory of deterministic chaos) is that the dynamic system under study is stationary, i.e., does not change its properties with time. It is this assumption that underlies the modern, rather well-developed mathematical theory with its numerous applications in physics and other fields. When this condition is violated, the above-mentioned approaches become inapplicable, yet nonstationarity is perhaps the most salient feature of the EEG signal. We note that even in the stationary case, the problem of recovering the description of a dynamic system from its observed trajectories is rather complex and remains far from a practical solution (we refer the reader to the interesting and informative paper [2], where the practical aspects of this problem were, probably for the first time, treated at a rather rigorous mathematical level). Thus, an attempt to use entropy-based notions of complexity as a means of describing the dynamic system generating the EEG signal is, in our opinion, inconsistent a priori.

On the other hand, the use of the classical Shannon entropy for characterizing an EEG signal also does not, in our opinion, fully conform to the purposes that could be stated in dealing with such a signal. The point is that the Shannon entropy specifies the complexity of a probability distribution or, as noted above, the abundance of "typical" trajectories of a sequence of independent random variables. Even setting aside the nonstationarity problem, such a characteristic can be used only indirectly for measuring the complexity of the description of the signal itself; yet it is precisely the description complexity that is, in our opinion, the property by which various EEG segments could be distinguished.


We now pass directly to the idea of our approach to the notion of EEG complexity and, more generally, to the notion of the "complexity of a continuous curve."

Let a continuous function $x(t)$ (a curve) be given on the interval $[0, T]$. We wish to assign to this curve a number that characterizes its complexity. We proceed from Kolmogorov's idea that a complex object requires a large amount of information for its description, and a simple object a small amount.

Choose a number $h > 0$ and consider the subdivision of the interval $[0, T]$ by points $\{t_i\}$ into $n = [T/h]$ equal parts (here $[a]$ denotes the integer part of the number $a$). Suppose that we have measured our curve only at the subdivision points $t_i$. With what accuracy can we recover the entire curve from this information? Let us prescribe a certain set $F$ of methods for approximating a curve from a finite set of points (ideally, this is the set of all known methods for recovering a curve from its values at a finite number of points). Consider the approximation error
$$\varepsilon(h) = \inf_{F} \|x(t) - \tilde{x}(t)\|,$$
where $\tilde{x}(t)$ is an approximating curve constructed from the $n$ given points by one of the admissible approximation methods, $\|\cdot\|$ is a certain integral norm in the function space, and the infimum is taken over the whole set of prescribed approximation methods.

It is clear that if $x(t) \equiv \mathrm{const}$, then a single point suffices for its recovery; in other words, the constant is the simplest curve, which agrees well with intuition. It is therefore expedient to exclude constants from the set of curves of interest, for which purpose $x(t)$ must be centered beforehand. Next, to make the approximation errors of different curves comparable with one another, we need to consider all of them on the same scale, i.e., to carry out a preliminary normalization. To this end, after the centering, we consider the normed curve $z(t) = x(t)/\|x(t)\|$ (obviously, $\|z(t)\| = 1$). After the centering and normalization, the (integral) approximation error will not exceed 1, and the maximum error occurs when the discretization interval is $h = T$ (this means that we must approximate by a single constant; by symmetry, the best constant is zero, and the error then equals the norm of the initial curve, i.e., 1).

Let us now follow the graph of the function $\varepsilon(h)$ as $h$ increases (i.e., as the current $n$ decreases). Clearly, with the growth of $h$ the function $\varepsilon(h)$ must steadily increase (indeed, an increase in the discretization step means that we retain less and less information about the function, and hence, for a fixed set of approximation methods, its approximation becomes worse and worse). In view of these remarks, the error $\varepsilon(h)$ is a monotone positive function of its argument, reaching its maximum, equal to 1, at $h = T$.

If we now prescribe a certain "acceptable" level $\varepsilon < 1$ of the approximation error (from the viewpoint of a specific user), we can determine what body of information can be discarded without an appreciable loss in the quality of the curve recovery. The complexity (more exactly, the $(\varepsilon, F, \|\cdot\|)$-complexity) of the given curve $x(t)$ can now be defined as the quantity $1 - h_\varepsilon/T$, where $h_\varepsilon = \min\{h \le T : \varepsilon(h) = \varepsilon\}$.

In other words, this is the relative share of information that is necessary for the recovery of the given curve with the specified accuracy by the prescribed set of methods. Let us note that the suggested complexity measure is an individual characteristic of a particular curve, rather than of a certain set of curves generated by some mechanism, as is the case with the entropy of dynamic systems. Further, this measure is in no way related to possible mechanisms of generation of the curve (for example, to whether the curve is a piece of a realization of some random process or a trajectory of some dynamic system). This circumstance appears to be important, in particular, as applied to the EEG, in view of the above-mentioned considerations on the specific features of this signal. Generally speaking, the suggested complexity measure is "adjusted" to the perception of the user, i.e., it directly reflects whether or not it is easy to work with the discrete data on the curve.
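
For a sampled curve, this definition translates into a few lines of code. The sketch below is only an illustration of the definition under stated assumptions, not the authors' implementation: it assumes a uniform grid (so that $h/T = 1/m$ when $m$ points are kept), the $\ell_1$ norm, and a set $F$ consisting of a single step-function recovery method; the names `step_recovery` and `eps_complexity` are ours.

```python
import numpy as np

def step_recovery(z, idx):
    """One admissible recovery method: a step function that holds the
    value of the last retained sample (positions idx are retained)."""
    out = np.empty_like(z)
    for k, start in enumerate(idx):
        stop = idx[k + 1] if k + 1 < len(idx) else len(z)
        out[start:stop] = z[start]
    out[: idx[0]] = z[idx[0]]           # before the first retained sample
    return out

def eps_complexity(x, eps, methods=(step_recovery,)):
    """Sketch of the (eps, F, ||.||)-complexity 1 - h_eps/T for a curve
    sampled on a uniform grid, where h/T = 1/m when m points are kept."""
    z = x - x.mean()                    # centering: exclude the constant
    z = z / np.abs(z).sum()             # l1 normalization: ||z||_1 = 1
    n = len(z)
    for m in range(1, n + 1):           # h = T/m decreases as m grows
        idx = np.linspace(0, n - 1, m).astype(int)
        err = min(np.abs(z - f(z, idx)).sum() for f in methods)
        if err <= eps:                  # eps(h) is monotone, so the first
            return 1.0 - 1.0 / m        # admissible m corresponds to h_eps
    return 1.0
```

Enlarging $F$, say with a piecewise-linear recovery method, amounts to appending another function to `methods`.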

Note 1. One could attempt to specify the complexity of a curve by its Shannon entropy: namely, break the axis of values of the curve into a finite number of intervals, count the frequencies with which the values of the curve fall into each interval, and then estimate the entropy of the obtained frequency distribution by the Shannon formula. But it is easy to construct curves (for example, periodic ones) for which this entropy (at a fixed number of intervals in the partition of the value axis) practically does not change as the number of discrete samples decreases (as long as fairly many samples remain), so that the dependence of such a measure on the share of the retained information is very weak. The approximation error, by contrast, must be sufficiently sensitive to a decrease in the accessible information, especially for complex curves.

Note 2. The suggested complexity measure obviously depends on the adopted functional norm, on the class $F$, and on the level $\varepsilon$. It may well happen that, with a change of these parameters, the complexity relation for one and the same pair of curves is reversed. In our opinion, however, this is not a deficiency, because the functional norm and the level $\varepsilon$ are the criteria of approximation quality that a particular user has chosen, while the class $F$ describes the algorithmic capabilities of this user. Moreover, for the analysis of curves as complex as EEGs, what matters is not the particular numerical value of the complexity but the dynamics of its changes in passing from one portion of the curve to another. In principle, it is this dynamics that can offer the possibility of detecting important changes in an EEG, which may signify functional rearrangements. From this viewpoint, the most acceptable complexity index for EEG analysis is one that detects changes in passing from one segment of the EEG to another; the numerical value of the index on a specific segment is inessential, and what matters is only the degree of its sensitivity to a change in the character of the curve.

Note 3. The possibility of calculating the complexity of a curve on a segment of arbitrary length enables the researcher to select the time scale on which to analyze the entire observed realization. It thus becomes possible to investigate the dynamics of the complexity of an EEG at various time scales and thereby to analyze various structures of this curve, which may reflect various functional states of the brain.
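
The claim in Note 1 is easy to probe numerically. A minimal sketch (the 16-bin partition, the non-integer frequency, and the decimation factors are illustrative choices of ours): it computes the Shannon entropy of the binned values of a periodic curve as samples are discarded, with the partition of the value axis held fixed.

```python
import numpy as np

def binned_entropy(x, bins=16):
    """Shannon entropy of the frequency distribution of the curve's values
    over a fixed partition of the value axis (the construction of Note 1)."""
    counts, _ = np.histogram(x, bins=bins, range=(-1.0, 1.0))
    p = counts[counts > 0] / counts.sum()
    return float(-(p * np.log2(p)).sum())

t = np.linspace(0.0, 1.0, 1000)
x = np.sin(2.0 * np.pi * 4.7 * t)       # a periodic curve on [0, 1]
for k in (1, 2, 4, 8):                  # keep every k-th sample only
    print(f"1/{k} of samples kept: H = {binned_entropy(x[::k]):.2f}")
# The printed entropy changes little even when about 87% of the samples
# are discarded, while, as argued in the text, the approximation error
# reacts to the loss of information much more strongly.
```
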
2. ALGORITHM

In the practical implementation of the idea outlined above, we need to take into account that a realization of a curve is always given in discrete time, i.e., the "curve" is a finite set of samples (a finite-dimensional vector). In addition, for simplicity of implementation, not too large a class $F$ should be chosen for the first experiments. Further, as noted above, all analyzed curves must be brought to one scale.

The discreteness of the initial curve means that the dependence of the approximation error on the number of discarded points ceases to be monotone when few points remain. It is therefore desirable to have a fairly high sampling frequency, so that even after the removal of a large percentage of the data (in our experiments, this removal reaches 70–90%), at which a satisfactory approximation is still possible, the discreteness of the recording of the curve has little effect on the monotonicity of the behavior of the error.

To bring the curves to one scale, we used centering followed by normalization (in the $\ell_1$ metric) of the finite-dimensional vector (curve), so that the resulting vector lies on the corresponding unit sphere. Thus, if the curve was given by the vector $x = (x_1, x_2, \ldots, x_n)$, then the first operation of the algorithm was the transition to the vector $y = x - \bar{x}$, where $\bar{x} = n^{-1}\sum_{i=1}^{n} x_i$, and the second operation was the transition to the vector $z = y/\|y\|_1$, where $\|y\|_1 = \sum_{i=1}^{n} |y_i|$.

We measured the error of the approximation of the vector $z$ by a vector $\tilde{z}$ in the same $\ell_1$ metric, i.e., $\|z - \tilde{z}\| = \sum_{i=1}^{n} |z_i - \tilde{z}_i|$.
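
In code, the two operations and the error metric are a direct transcription of these formulas (a small sketch; the function names are ours):

```python
import numpy as np

def to_unit_sphere(x):
    """Bring a discretized curve to the common scale: centering followed
    by l1 normalization, as in the text (y = x - mean, z = y / ||y||_1)."""
    y = x - x.mean()                 # first operation: centering
    z = y / np.abs(y).sum()          # second operation: l1 normalization
    return z

def l1_error(z, z_tilde):
    """Approximation error in the same l1 metric: sum_i |z_i - z~_i|."""
    return np.abs(z - z_tilde).sum()
```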

The discreteness of the recording of the initial continuous curve leads to a small modification of the formula for the complexity. First, for a sufficiently large number $n' < n$ of points left after the rejection, the ratio $h/T$ can be replaced by the quantity $1/n'$. Second, it makes sense to agree that if $n' = n$ (i.e., not a single point can be discarded from the initial recording without a substantial loss of recovery quality), then the complexity of the curve is equal to 1. In view of these remarks, we used for the estimation of the complexity $s$ of the curve the formula
$$s = \frac{1 - 1/n'}{1 - 1/n},$$
where, we recall, $n'$ is the number of points left after the discarding at which the curve is still recovered with the acceptable accuracy, and $n$ is the number of points in the initial recording of the curve.

3. EXPERIMENTAL RESULTS

The class $F$ in the first version of our algorithm contained merely one method: the approximation of the curve by a step function. In other words, upon the removal of some points from the initial set, we replaced the values of the curve at the removed points by its value at the preceding retained point. In view of the discreteness, the search for the best approximation requires an exhaustive search over the value at the initial point, despite the fact that only one approximation method is used. This exhaustive search is carried out within the "window of thinning" used at the given instant (i.e., at the prescribed share of discarded points). Later on, it is planned to include in the class $F$ piecewise-polynomial functions and approximation by segments of Fourier series with respect to various orthonormal systems of functions.
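
A minimal sketch of this first version, under our reading of the procedure (thinning by a factor $k$ with an exhaustive search over the $k$ possible starting offsets within the thinning window; the function names are ours, and the default error level 0.32 is the one used in the experiments below):

```python
import numpy as np

def thinned_step_error(z, k, offset):
    """l1 error of recovering z by a step function when only every k-th
    sample, starting from `offset`, is retained; each removed sample is
    replaced by the value at the preceding retained point."""
    kept = np.arange(offset, len(z), k)
    z_tilde = np.empty_like(z)
    z_tilde[: kept[0]] = z[kept[0]]            # left tail: first kept value
    for i, start in enumerate(kept):
        stop = kept[i + 1] if i + 1 < len(kept) else len(z)
        z_tilde[start:stop] = z[start]         # hold the last retained value
    return np.abs(z - z_tilde).sum()

def complexity_s(x, eps=0.32):
    """Estimate s = (1 - 1/n') / (1 - 1/n), where n' is the smallest number
    of retained points at which the step approximation, with the best phase
    offset, still recovers the centered, l1-normalized curve within eps."""
    z = x - x.mean()                           # centering and l1
    z = z / np.abs(z).sum()                    # normalization, as above
    n = len(z)
    for k in range(n, 0, -1):                  # coarsest thinning first
        if min(thinned_step_error(z, k, off) for off in range(k)) <= eps:
            n_prime = len(range(0, n, k))      # points kept at this thinning
            return (1.0 - 1.0 / n_prime) / (1.0 - 1.0 / n)
    return 1.0
```

For a 2 s window sampled at 100 Hz ($n = 200$), discarding 70–90% of the points, as in the experiments below, corresponds to $n'$ between roughly 20 and 60.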

The testing was performed on an EEG recorded from a human subject during sleep. Typical of such an EEG is a wide range of patterns corresponding to various states of the neurons of the cortex of the large cerebral hemispheres [5]. The EEG was recorded in the right occipital lead at a sampling frequency of 100 Hz. In the course of the survey, 25 artifact-free fragments of length from 4 to 16 s were selected (median 9 s), each of which appeared visually rather homogeneous but differed in pattern from the remaining ones. The complexity index was estimated in sequential "windows," i.e., sections of time length $T = 2$ s. The approximation error level was set at $\varepsilon = 0.32$.

The variance of the complexity estimates over the analysis "windows" within the homogeneous fragments of the EEG proved to be markedly lower than the variance of the estimates averaged over each of these fragments: the median of the "intrafragment" standard deviation was 0.0019, whereas the standard deviation (over all fragments) of the values averaged over a fragment was 0.0083. The complexity estimates differed rather strongly between the 6 fragments with the most distinct signs of light or paradoxical sleep (mean 0.993, standard deviation 0.0004) and the four fragments with the most clearly expressed, powerful delta activity typical of the third and fourth stages of sleep (mean 0.973, standard deviation 0.009).



[Figure. Fragments of an EEG, (a)–(h), in order of decreasing complexity estimate, and the dynamics of this estimate. The EEG is on the left; the dynamics of the complexity estimate is on the right (the vertical scales of the panels range from about 0.95 to 0.995). The boundaries of the windows in which the complexity was estimated are marked by points for convenience of comparing the EEG with the dynamics of the complexity. The time scale, in seconds (0 to 6), is at the bottom.]



The relation between the EEG pattern and the complexity estimates that showed up in these results found additional confirmation when we arranged the EEG fragments in order of decreasing mean complexity estimate (see the Figure, where, to save room, only every third fragment is shown, and only the first three windows, each of length 2 s). The mean complexity estimate (over all windows) for the fragments presented in the Figure, denoted by the letters (a)–(h), was 0.9934, 0.9926, 0.9917, 0.9911, 0.9875, 0.9830, 0.9794, and 0.9635, respectively. It was found that the deepening of sleep occurs in approximately this order, too, judging by the EEG signs (see the left part of the Figure). The dynamics of the individual complexity estimates (see the right part of the Figure), as a rule, also followed the changes in the EEG.

The suggested complexity measure, described here for the first time, proved, even in the simplest version of the algorithm implementation, to be quite a workable tool sensitive to the EEG pattern. As one might expect, the complexity estimates fell in parallel with the shift of the EEG toward a typically sleep-like pattern. Let us note once again that the new complexity measure is an individual characteristic of a particular curve and is in no way related to possible mechanisms of its origin, which is especially important in the case of the EEG, for which no generally accepted model of generation exists at present. The possibility of calculating the complexity of a curve on a segment of arbitrary length enables the researcher to study the dynamics of this index of an EEG at various time scales and, hence, to analyze the time hierarchy of various functional states of the brain.

We express our gratitude to Professor J. Roschke (the University of Mainz, Germany) for the sleep EEG recordings provided. We are grateful to Yu.S. Popkov for his attention to this work.

REFERENCES
1. Sinai, Ya.G., Vvedenie v ergodicheskuyu teoriyu (Introduction to Ergodic Theory), Moscow: Fazis, 1996.
2. Demidov, E.E., Darevskaya, Yu.V., Morenkov, O.A., et al., Nonlinear Correlation Analysis, Obozrenie prikladnoi i promyshlennoi matematiki (Review of Applied and Industrial Mathematics), Moscow: TVP, 1999, vol. 6, issue 1, pp. 4–57.
3. Kolmogorov, A.N., Combinatorial Bases of the Theory of Information and Probability Calculus, Usp. Mat. Nauk, 1983, vol. 38, issue 4, pp. 27–36.
4. Kaplan, A.Ya., Nonstationarity of the EEG: Methodological and Experimental Analysis, Usp. Fiziol. Nauk, 1998, vol. 29, no. 3, pp. 35–55.
5. Steriade, M., Coherent Oscillations and Short-Term Plasticity in Corticothalamic Networks, Trends Neurosci., 1999, vol. 22, no. 8, pp. 337–345.

This paper was recommended for publication by Yu.S. Popkov, a member of the Editorial Board.
