
TO ZIPF OR NOT TO ZIPF, OR WHY ARE THERE SO FEW SCIENTISTS SUPPOSING THERE
ARE NO GENERA IN THE NATURE AT ALL
I.Ja.Pavlinov, Yu.G.Puzachenko, A.Yu.Puzachenko, G.Yu.Lyubarsky. J. General
Biol. 1995. 56 (1): 152-158.

To talk casually
About an iris flower
Is one of the pleasures
Of the wandering journey

Basho

A possible ontological interpretation of the hollow rank distribution (HRD)
in taxonomy, usually referred to as the Zipf-Mandelbrot model, is
discussed. In many instances the empirical distribution of species among
genera differs significantly from that model. The possibility of
elaborating a non-formal model of biological evolution that might produce
this kind of distribution is considered. To understand the causal
mechanisms of HRDs, both the central trends predicted by the Zipf-Mandelbrot
model and the deviations from them must be analyzed simultaneously.

TO ZIPF OR NOT TO ZIPF, OR WHY ARE THERE SO FEW TAXONOMISTS WHO BELIEVE
THAT THERE ARE NO GENERA IN NATURE AT ALL?

I.Ya.Pavlinov, Yu.G.Puzachenko, A.Yu.Puzachenko, G.Yu.Lyubarsky. Zhurn.
obshchei biol. 1995. 56 (1): 152-158.

Some methodological issues in interpreting the Zipf-Mandelbrot model (ZMM)
in systematics are considered. When robust statistics are used, hollow
(concave) rank distributions (HRD) in real taxa often differ significantly
from those predicted by this model. The view is defended that HRDs have an
ontological interpretation, and two argumentation schemes are used to
support it. 1. A taxonomic system (TS) can be interpreted as a text written
in a specific technical language for which natural language is the
metalanguage. The structure of the latter is adequate to the structure of
empirical reality; consequently, the structure of the TS is also adequate
to this reality, which allows HRD in the TS to be regarded as a reflection
of HRD in natural biological diversity. 2. On the basis of general
principles of non-equilibrium thermodynamics and the theory of fractals, a
qualitative model is presented that generates the hierarchical organization
of natural biological diversity, with the attributes of HRD, without
explicit reference to the taxonomic hierarchy. A special problem is posed
by the interpretation of taxa as individuals, within which the ZMM is,
strictly speaking, not applicable at all. Systematics cannot be
"independent" of the conceptual substantive models in which the properties
of the diversity being classified are defined. Criteria for the optimality
of classifications are also developed within these models. It is this that
makes biological classifications meaningful texts, distinct from the
"monkey texts" that are optimal from the standpoint of the ZMM.

The shape of the taxonomic system (classification) has long been one of
the most critical problems in taxonomy. Should the system be hierarchical,
and if so, why? Are there regularities in the distribution of subtaxa among
their inclusive taxa (species among genera, genera among families, etc.),
and if there are, what might cause them? Given that modern systematics has
several classificatory methodologies, how are we to evaluate which of them
produces the optimal shape of the taxonomic system, and what might the
criteria of this optimality be?
One of the focal points within this very wide problem is the famous
hollow-curve (concave) distribution of taxic diversity, usually referred to
as the Zipf Law or, more correctly, the Zipf-Mandelbrot model. According to
it, there are few large (more diverse) and many small (less diverse) taxa
at each particular hierarchical level. Distributions of this kind are known
in linguistics, economics, biogeography, and systematics. As far as the
latter is concerned, the model evidently presumes positive answers to the
first two pairs of the above questions, at least at the phenomenological
level. The very high generality of such a distribution prompts taxonomists
to search for the mechanisms generating it, and publications on the matter
are numerous. In most of them, this distribution is assumed to be a
property of biological diversity itself. However, there is a position
according to which it has no ontological meaning, being just a specific
result of the classificatory imagination of taxonomists. Among the most
vigorous proponents of the second viewpoint are Kafanov and Sukhanov, who
published an intriguing paper in the present issue of this journal.
The principal conclusion of Kafanov and Sukhanov is quite impressive: a
taxonomic system is just a text (unfortunately, they omitted the important
paper of Brooks (1981) on the linguistic interpretation of classification)
construed for communication only and having no essential relation (worth
analyzing) to the real world. And this, they believe, is the sole
explanation of the "Zipf Law". Their proposition evidently implies that
there are no genera (families, etc.) in Nature at all. Such nominalism, a
very straightforward one (which is rather uncommon in recent theoretical
taxonomic literature), provides a good occasion to consider briefly what
the "zipfity" of taxonomic systems is, if it exists at all.
Generally speaking, the Zipf-Mandelbrot model is a particular case of a
specific class of rank (not frequency, as is sometimes incorrectly
asserted) distributions (Arapov et al., 1975; Arapov, Shreider, 1977). In
its original formulation, it describes the distribution of words among
sentences, paragraphs, etc. in fairly long texts written in a natural
language. However, it was shown later that a monkey hitting typewriter keys
at random also produces texts with similar properties, which makes the
model in question true for any text, even one generated by a stochastic
procedure (Schroeder, 1991).
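The "monkey text" effect is easy to reproduce. The following minimal Python
sketch is an illustration only and is not taken from Schroeder (1991): the
five-letter alphabet, the probability of hitting the space bar, and the
function names monkey_text and rank_frequencies are all assumptions made
for the example. It types characters at random, splits the result into
"words", and lists the relative word frequencies by rank.

```python
import random
from collections import Counter


def monkey_text(n_chars, alphabet="abcde", space_prob=0.2, seed=0):
    """Type n_chars random characters; a space appears with space_prob."""
    rng = random.Random(seed)
    chars = []
    for _ in range(n_chars):
        if rng.random() < space_prob:
            chars.append(" ")
        else:
            chars.append(rng.choice(alphabet))
    return "".join(chars)


def rank_frequencies(text):
    """Relative frequencies of words, ordered from most to least frequent."""
    counts = Counter(text.split())
    total = sum(counts.values())
    return [count / total for _, count in counts.most_common()]


freqs = rank_frequencies(monkey_text(200_000))
for rank in (1, 10, 100, 1000):
    if rank <= len(freqs):
        print(rank, round(freqs[rank - 1], 5))
```

Under these assumptions a few short "words" are very frequent and a long
tail of longer ones is rare, so the ranked frequencies form a hollow curve
although the text carries no meaning.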
In discussing the methodological ideas of Kafanov and Sukhanov, we should
like to stress one important question first. As mentioned above, there are
several theoretical distribution models. Because they predict quite similar
patterns of rank distribution, it is not easy to decide unambiguously which
one applies in each particular empirical case. For this reason, the routine
methods usually employed to show correspondence between two distributions,
such as the F statistic, seem inappropriate for the purpose. A more robust
approach is based on the analysis of residuals. According to it, an
empirical distribution can be considered to fit a theoretical one if the
difference between them (expressed by the residuals) can be interpreted as
"white noise", that is, if it does not contain any nonrandom component.
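One standard check of this kind is the turning-point test. The sketch below
is a minimal illustration under stated assumptions, not the procedure of
the MESOSAUR program: the function name turning_point_test is hypothetical,
the normal approximation is used for the p-value, and both residual series
are synthetic. For n independent random values the expected number of
turning points is 2(n - 2)/3 with variance (16n - 29)/90; a series with a
slow nonrandom component has far fewer.

```python
import math
import random


def turning_point_test(x):
    """Return (turning points, z score, two-sided p) for a series x."""
    n = len(x)
    if n < 3:
        raise ValueError("need at least three values")
    # A turning point is a local maximum or minimum of the series.
    t = sum(
        1
        for i in range(1, n - 1)
        if (x[i] - x[i - 1]) * (x[i + 1] - x[i]) < 0
    )
    mean = 2 * (n - 2) / 3
    var = (16 * n - 29) / 90
    z = (t - mean) / math.sqrt(var)
    p = math.erfc(abs(z) / math.sqrt(2))  # normal approximation
    return t, z, p


rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in range(500)]           # synthetic
slow = [math.sin(2 * math.pi * i / 100) for i in range(500)]  # synthetic

t_noise, z_noise, p_noise = turning_point_test(noise)
t_slow, z_slow, p_slow = turning_point_test(slow)
print("noise:", t_noise, round(z_noise, 2))
print("slow :", t_slow, round(z_slow, 2))
```

A small p-value for a residual series means it cannot be treated as white
noise, however small the residuals themselves are.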
To show how this high-resolution method works, we evaluated the rank
distribution of species among genera of Recent mammals of the world
(original data are from Sokolov, 1973-1979; the computer program MESOSAUR
was employed for calculations). It is approximated by the Zipf-Mandelbrot
model with the following numerical parameters:
$p(r) = 0.0830 \, (r + 2.3169)^{-0.8187}$,
where r is the rank of a genus determined by its number of species. The
coefficient of determination $R^2$ is 0.981; the standard deviations are
0.0021, 0.0947, and 0.0063 for the first, second, and third constants of
the model, respectively; the standard deviation of all residuals is 0.0003,
and the maximal absolute residual is 0.0047. These figures, if considered
alone, seem to be very strong arguments for acknowledging the "zipfity" of
the distribution analyzed. But other tests (analyses of the number of
turning points and of the signs of differences) indicate that the residuals
do not form "white noise" after the last approximation (p < 0.0001), which
means there is a low-frequency stationary nonrandom component in the
empirical distribution. Besides, a significant (p < 0.00001) high-order
autoregression is revealed in the same residuals. All this demonstrates
clearly that the Zipf-Mandelbrot model cannot be applied here. Such a
conclusion, however, is relevant only to the formal model proper and does
not mean there is no hollow rank distribution at all. Thus, we have to
search either for another, more realistic model that incorporates both the
central trend and the nonrandom deviations, or for a causal explanation of
the significant residuals.
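For orientation, the fitted curve can be evaluated directly. The sketch
below assumes that the third constant of the model is the (negative)
exponent, as the Zipf-Mandelbrot form requires, and that p(r) is the share
of species predicted for the genus of rank r; the function name
zipf_mandelbrot is hypothetical. It uses only the three constants reported
above, not the original mammal data.

```python
# Constants of the fitted model as reported in the text.
A, B, C = 0.0830, 2.3169, 0.8187


def zipf_mandelbrot(rank, a=A, b=B, c=C):
    """Predicted value p(r) = a * (r + b) ** (-c) for a genus of given rank."""
    return a * (rank + b) ** (-c)


for r in (1, 2, 5, 10, 50, 100):
    print(r, round(zipf_mandelbrot(r), 4))
```

The values fall quickly over the first ranks and then flatten, which is the
hollow shape the central trend describes; the tests discussed above concern
the deviations of the data from this curve, not the curve itself.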
An interesting result was obtained by comparing rank distributions in real
taxa with null models generated by computer by means of several random
diversification processes (Dial, Marzluff, 1989). It indicates a specific
nonrandom feature of diversity in real taxa, namely overdominance of the
largest taxon within a real clade: it contains more species than the null
models predict. If the latter are considered analogous to "monkey texts",
then distributions in real taxa can hardly be interpreted as expressions of
the Zipf-Mandelbrot model in its strict (numerical) sense.
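The logic of such a comparison can be sketched as a small Monte Carlo test.
The example is hypothetical throughout: the null model used here (each new
species joins the genus of a randomly chosen existing species) is a simple
stand-in and is not claimed to be one of the processes used by Dial and
Marzluff, the "observed" genus sizes are invented, and the function names
are assumptions.

```python
import random


def simulate_genus_sizes(n_genera, n_species, rng):
    """Null model: genera gain species in proportion to their current size."""
    sizes = [1] * n_genera
    for _ in range(n_species - n_genera):
        genus = rng.choices(range(n_genera), weights=sizes)[0]
        sizes[genus] += 1
    return sorted(sizes, reverse=True)


def dominance(sizes):
    """Share of all species that belongs to the largest genus."""
    return max(sizes) / sum(sizes)


def null_p_value(observed_sizes, n_sim=1000, seed=0):
    """Fraction of null clades at least as dominated as the observed one."""
    rng = random.Random(seed)
    n_genera, n_species = len(observed_sizes), sum(observed_sizes)
    observed = dominance(observed_sizes)
    hits = sum(
        dominance(simulate_genus_sizes(n_genera, n_species, rng)) >= observed
        for _ in range(n_sim)
    )
    return (hits + 1) / (n_sim + 1)


observed = [60, 8, 5, 3, 2, 1, 1]  # invented clade, for illustration only
p_value = null_p_value(observed)
print(round(dominance(observed), 3), round(p_value, 4))
```

A small p-value would mean that the largest genus holds a larger share of
species than this particular null process readily produces.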
Taking all the above into account, it seems preferable to refer not to any
formal named model or "law" but to the hollow rank distribution (HRD) in
its most general sense. Such a non-rigorous position seems to make the
entire problem more operational, as it allows us to ask why there is not a
particular Zipf Law but just an HRD as a qualitative property of taxic
diversity.
Next is the question of the causation of HRD in taxonomy: what kind of
processes generate taxonomic systems of this kind? Are there any grounds to
suppose that our classifications reflect a hierarchical pattern in Nature?
Or, alternatively, do their shapes originate from the properties of our
means of communication, that is, by convention?
To tell the truth, we must acknowledge that nominalists have reasons to
believe that our classifications, with all their properties, reflect only
ourselves and not the world around us. First, according to Gestalt
psychology, human thinking is organized in hierarchical patterns of unequal
size, which makes their totality a kind of HRD. Second, modern taxonomy is
based on the Linnaean classificatory paradigm, which borrowed its method
from the purely logical "Porphyrian tree" algorithm. Its peculiar feature
is that it cannot produce anything but a very asymmetrical hierarchical
pattern, that is, an HRD again. And if our classifications are necessarily
hierarchical and asymmetrical just because of the methodology employed, how
could we know that there is in fact any hierarchy in Nature (Heywood,
1988), with the pattern described by HRD? So, could it not be true that the
methodology itself is the ultimate cause of the HRD, as nominalists assert?
We believe it could not. There are (at least) two argumentation schemes
that lead to an answer compatible with our general perception. One is to
start with the taxonomic system proper and to demonstrate that its shape
does bear a certain relation to real biological diversity. The other is to
develop a deductive rational model of hierarchy in Nature without any
explicit reference to hierarchy in the taxonomic system, in order to break
the circularity.
The first of the above schemes is based on the assumption that our
classifications are specific results of our cognitive activity.
Furthermore, being evolutionists, we presume that human cognitive
capacities are inherited from nonhuman predecessors. We also postulate that
the latter directed their cognitive activity at the surrounding world, not
at themselves (we suspect it is a human privilege to ask self-addressed
questions). It is obvious that the more adequate to that world the results
of that activity were, the more successful our cognizing ancestors were.
One of those results was the means of communication that gradually evolved
into human natural languages. So, the structure of the languages used by
human beings to produce texts certainly corresponds to the structure of the
real world (to the degree allowed by the cognitive abilities themselves),
simply because it originated, at least in part, as a reflection of that
world. This is most probably true for any technical language as well
(including those used in taxonomy), since natural languages are
metalanguages for them (see Gadamer on that matter).
Within this argumentation scheme, considering classifications as texts
written in a specific language (Brooks, 1981; Kafanov, Sukhanov, this
volume) seems to be the correct position. This is because any
representation of information by a set of symbols is a text. Furthermore,
from the hermeneutical viewpoint the entire world, both empirical and
imaginary, as well as any of its parts, is a text. From this standpoint,
all we are doing is translating different texts, written in different
languages, from one into another. Of course, there should always be a
certain correspondence between such texts, provided that we operate with
nonrandom translators. Only pure "monkey texts" have a negligibly low
probability of any correspondence to regular ones. So, any classification
that appeared as a result of the investigation of real objects (that is,
nonrandomly) has a good chance of being a text that describes these objects
more or less adequately.
It follows from the above that the shape of taxonomic systems, HRD in
particular, is related, and not by accident, to a fundamental property of
objectively existing biological diversity. This gives us hope that the
classifications produced by generations of practical taxonomists, who had
heard nothing about Zipf, models, etc., do reflect at least some
fundamental properties of biological matter.
Turning to the second approach to "justifying" the HRD, we should like to
remark that, from the hermeneutical standpoint, the correct answer depends
on the correct question. So, to elaborate the model, we should first ask
Nature if and why there are genera at all. Next, if and why they are of
unequal size. And last, if and why there are few large and many small
genera and not vice versa. Needless to say, it would be desirable to obtain
the answers in as general a form as possible. Indeed, various explanations
of HRD appealing to the "evolutionary success" of some genera over others,
etc. are answers to the last pair of questions, given as if we had already
answered the preceding ones positively. But have we?
To do so, let us assume, first of all, that the existing biological
diversity is a result of the historical development of the biota. Then we
can assume that the biota, at least in some of its properties, behaves as a
self-developing dissipative system (Brooks, Wiley, 1986). So, we may
extrapolate the behavioral pattern of the latter, which involves two
aspects, growth and differentiation, onto biotic evolution.
As to growth, it is impossible for such a developing system to enlarge its
size infinitely. At a certain step, the growing system loses its integrity
and splits into pieces. Most importantly, it is highly improbable for such
splitting to produce pieces of equal size, a regularity shown both
theoretically (Traibus, 1970) and by computer simulations employing
stochastic diversification models (Slowinski, Guyer, 1989; Dial, Marzluff,
1989). Turning back to the biota, we may conclude that its historical
development a) produces "pieces" (taxa) which b) are initially of different
size by the very nature of this development.
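How improbable equal pieces are can be illustrated with the simplest random
splitting rule. The sketch assumes a uniform "broken stick" (a unit whole
cut at independent uniformly distributed points), which is an assumption
made for the example and not the specific model of any of the works cited;
the function names and the 10% tolerance are likewise hypothetical.

```python
import random


def broken_stick(n_pieces, rng):
    """Cut a unit whole at n_pieces - 1 random points; return ranked sizes."""
    cuts = sorted(rng.random() for _ in range(n_pieces - 1))
    edges = [0.0] + cuts + [1.0]
    return sorted((b - a for a, b in zip(edges, edges[1:])), reverse=True)


def share_near_equal(n_pieces, tolerance=0.1, n_trials=10_000, seed=0):
    """Fraction of trials in which every piece is within the given relative
    tolerance of the equal share 1 / n_pieces."""
    rng = random.Random(seed)
    equal = 1.0 / n_pieces
    hits = 0
    for _ in range(n_trials):
        pieces = broken_stick(n_pieces, rng)
        if all(abs(p - equal) <= tolerance * equal for p in pieces):
            hits += 1
    return hits / n_trials


print([round(p, 3) for p in broken_stick(5, random.Random(0))])
for k in (2, 3, 5):
    print(k, share_near_equal(k))
```

Under this rule even two pieces are close to equal in only about one trial
in ten, and the share drops sharply as the number of pieces grows.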
Differentiation of the systems in question necessarily makes them
hierarchically structured (Brooks, Wiley, 1986). As far as the biota is
concerned, at least two global inclusive hierarchies are recognized,
ecological and phylogenetic (Eldredge, 1992). The first corresponds to the
syntaxonomic system, the second to the taxonomic system. Thus, we may
conclude that, within the second hierarchy, the parts (taxa) into which the
biota is split by growth become arranged hierarchically by differentiation.
In other words, there are taxa with subtaxa in Nature: genera with species,
families with genera, etc.
A more formal model describing the origin of a taxic hierarchy with the
above properties can be elaborated if the phylogenetically developing biota
is treated as a fractal (Burlando, 1990). Theory asserts that being a
fractal is a fundamental property of any hyperspace formed by open
non-intersecting sets (Turbin, Pratsevityi, 1992). In taxonomic terms, the
hyperspace corresponds to the entire biota and the sets correspond to its
taxa. The hierarchy of the fractal is produced by its sequential splitting
into subsets following certain rules (algorithms). It is evident that the
initial conditions for elaborating such rules and, hence, the rules
themselves may differ according to the particular background model. Among
such models, Prigoginian bifurcations resulting from non-linear relations
between the respective sets (Druzhinin et al., 1989; Schroeder, 1991)
deserve special attention, as the development of natural self-developing
dissipative systems is also representable in terms of Prigoginian
thermodynamics. Fractals, by definition, are self-similar (Feder, 1988),
which means that the characteristics true for inclusive entities are also
true for their respective subentities. So, the model under consideration
implies that a nonsymmetrical hierarchical arrangement should be
anticipated at each level of the biotic hierarchy.
Now we have a process model according to which biotic evolution produces
hierarchically structured taxa whose subtaxa are initially of unequal size.
This brings us directly to the third pair of questions, which constitutes
the very essence of our subject. To get a satisfactory answer, we have to
explore more closely the interaction between the evolving taxon and its
environment. Indeed, self-developing systems are open ones, that is, they
utilize resources in order to sustain themselves. There are two principal
strategies of resource utilization, specialization and generalization. Each
of them results from the joint action of two kinds of mechanisms, extrinsic
(properties of the environment inhabited by taxa) and intrinsic (properties
of the taxa themselves). It seems reasonable to suppose that this joint
action is involved in producing the specific HRD pattern.
As to the extrinsic forces, the partitioning of resource space should be
mentioned first. A highly structured environment promotes specialization,
thus leading to the appearance of (in the terms adopted here) small genera.
This is because each of them develops an ad hoc adaptation of its own to
utilize a specific, and necessarily narrow, resource. This makes it highly
improbable for such a genus to split into many species. By contrast, a less
structured environment favors "generalists" that are able to utilize
several resources. This promotes a high rate of speciation with only slight
modifications of the given morphogenetic organization, thus producing one
multi-species genus. The hierarchical pattern of natural ecosystems
provides both possibilities, but "narrow specialists" make it possible to
pack the community more efficiently (O'Neill et al., 1986). It is this kind
of packing that is responsible for the unequal probability of the two
strategies and, hence, for the numerical predominance of small genera over
larger ones. This general trend is reinforced by specific intrinsic
factors, among which the developmental mechanisms of evolving phyla are of
special importance. Inertial forces promote specialization rather than its
opposite at the steady (non-catastrophic) stage of ecosystem development.
Because of this, the probability that a great number of specialized and
less diverse taxa will appear becomes even higher.
To sum up, we may conclude that it is possible to elaborate a non-formal
model of the evolving biota within which such a property of taxic diversity
as HRD can be deduced without explicit reference to the logical
genus-species relation. Rough as it may look, the model works
satisfactorily, at least as a first approximation.
Our observations allow us to stress one important point concerning the use
of models in taxonomy. Insofar as taxonomy claims to be a science, its
ultimate aim is a "true" classification, that is, one corresponding best to
the structure of the diversity being classified. To correspond better means
to reflect more properties of that diversity. However, any part of the
world possesses an infinitely large number of properties, while technical
languages containing rigid definitions are finite. So are the texts written
in them. Thus, in order to obtain a "true" classification, one has first to
resolve the problem of making empirical reality more operational by
"narrowing" it. This is achieved by recognizing those properties that
should be most adequately reflected in a particular classification, the
others being discarded. For this, a finite non-formal model is to be
construed in which both the object and those of its properties to be
reflected in the classification are explicitly defined. These basic
statements, along with deduction rules (classificatory algorithms),
constitute a technical taxonomic language. As there are many possible
background models, each defining the objects being classified in its own
way, there are as many possible taxonomic languages. According to the
Zipf-Mandelbrot model, all of them will produce classifications displaying
HRDs with similar patterns. But it is the background models that allow us
to believe that the resulting classifications are not just "monkey texts"
accidentally related to the structure of Nature. This means that no sound
(within the given branch of natural science) classification can be
independent of a particular natural science theory.
From this viewpoint, it is the "true" classification (in the above sense)
that should be taken as the optimal one. Such a position presumes that
optimality criteria can be formulated only in the framework of the
background model adopted. In this connection, appealing to any formal
criteria not interpreted biologically seems unsound. In particular, we mean
Kafanov and Sukhanov's declaration that "zipfity" could be employed as an
optimality criterion in taxonomy, since it supposedly corresponds to
"equilibrium" situations. Setting aside "equilibrium", which is a
non-operational notion without a proper non-circular definition, we just
recall that the Zipf-Mandelbrot model applies equally to both regular and
random texts. So, it looks rather strange to use a criterion according to
which a classification generated at random may appear no less "optimal"
than one produced by an experienced taxonomist.
There is one more point concerning the problem discussed here that is worth
considering briefly. Many statistical models, including the one discussed
here, are true only if the objects they interpret behave as sets, that is,
if their elements can be considered statistically independent (say, words
and spaces in a "monkey text"). However, subtaxa within a taxon are not
independent from the evolutionary viewpoint, because they are
interconnected by kinship relations. That is why phylogenetically defined
taxa are now thought of as individuals rather than sets (Wiley, 1981).
Since existing taxic diversity is a product of the phylogenetic process,
the similarity pattern by which real taxa are operationally identified and
arranged into a classification depends much on the pattern of kinship
relations. This makes the application of many statistical methods in
taxonomy, including those based on the Zipf-Mandelbrot model, highly
questionable (Purvis et al., 1994). Of course, anybody has the right to
reject this position. But, to us, rejecting the holistic nature of
taxonomic units does not differ much from rejecting the evolutionary origin
of taxic diversity.
It is to be stressed in this connection that within such a conceptual
framework HRD becomes no less (maybe more) enigmatic than in the case of
taxonomy based on set theory. To free it from "mystery", more effort is
needed. We should like to note that in future investigations attention
should be paid not only to "good" cases adequately described by some formal
model, but to "exceptions" as well. We think that both situations are of
interest to evolutionary taxonomists. On the one hand, they may actually
reflect various states of taxonomic knowledge. On the other hand, however,
they may depend on different biological properties of the respective taxa,
be it different evolutionary states, predominant ecological strategies,
etc.
To uncover these dependencies, we must first reject the position Kafanov
and Sukhanov offer when they ascribe both "good" and "bad" HRDs to the
state of mind of classifiers. We do hope that we do not create artifacts
but explore Nature, with more or less success. It is this standpoint that
makes taxonomy a science uncovering natural phenomena and not creating
artifacts. That is why the "genus" of formal classifiers is both small and
rare (which, by the way, makes the "family" of taxonomists clearly
"non-zipfian").

This publication was supported in part by the Russian Foundation for Basic
Research (project No 320/31).

Literature cited

Arapov M.V., Efimov E.N., Shreider Yu.A. O smysle rangovyx raspredelenyi
[On the meaning of rank distributions] // Nauch.-Tekhn. Inform. Obshch.
vopr. Ser.2. 1975. No 1. P. 9-20. In Russ.
Arapov M.V., Shreider Yu.A. Klassifikatsii i rangovyie raspredeleniya
[Classifications and rank distributions] // Nauch.-Tekhn. Inform. Obshch.
vopr. Ser.2. 1977. No 11-12. P. 15-21. In Russ.
Brooks D.R. Classifications as languages of empirical comparative biology
//Advances in cladistics. N.Y.: New York Botanical Garden, 1981. P. 61-
70.
Brooks D.R, Wiley E.O. Evolution as entropy: toward a unified theory of
biology. Chicago: Univ.Chicago Press,1986. 335 p.
Burlando B. The fractal dimension of taxonomic system//J. Theor. Biol.
1990. V. 146. No 1. P. 99-114.
Dial K.P., Marzluff J.M. Nonrandom diversification within taxonomic
assemblages // Syst. Zool. 1989. V. 38. No 1. P. 26-37.
Druzhinin V.V., Kontorov D.S., Kontorov M.D. Vvedenie v teoriyu konflikta
[Introduction to the theory of conflict]. M.: Radio i Sviaz. 1989. 288
p. In Russ.
Eldredge N. Where the twain meet: causal intersections between the
genealogical and ecological realms// Ed. N. Eldredge. Systematics,
Ecology and the Biodiversity Crisis. N.Y.: Columbia Univ. Press, 1992.
P. 1-14.
Feder J. Fractals. N.Y.: Plenum Press, 1988. 348 p.
O'Neill R.V., DeAngelis D.L., Waide J.B., Allen T.F.H. A hierarchical
concept of ecosystems. Princeton: Princeton Univ. Press: 1986. 253 p.
Purvis A., Gittleman J.L., Luh H.-K. Truth or consequences: effects of
phylogenetic accuracy on two comparative methods // J. Theor. Biol.
1994. V. 167. No 3. P. 293-300.
Schroeder M. Fractals, chaos, power laws: minutes from an infinite
paradise. N.Y.: W.H. Freeman and Company, 1991. P. 1-410.
Slowinski J.B., Guyer C. Testing null models in questions of evolutionary
success//Syst. Zool. 1989. V. 38. No 2. P. 189-191.
Sokolov V.E. Sistematika mlekopitayushchikh [Systematics of Mammals]. M.:
Vysshaia Shkola, 1973. Pt. 1. 430 p.; 1977. Pt. 2. 494 p.; 1979. Pt. 3. 528
p. In Russ.
Traibus M. Termostatika i termodinamika [Thermostatics and thermodynamics].
M.: Energia. 1978. 540 p. In Russ.
Turbin A.F., Pratsevityi H.F. Fraktal'nye mnozhestva, funktsyi,
raspredelenyia [Fractal sets, functions, distributions]. Kiev: Naukova
Dumka. 1992. 205 p. In Russ.
Wiley E.O. Phylogenetics. The theory and practice of phylogenetic
systematics. N.Y.: John Wiley & sons. 1981. 439 p.