Document retrieved from a search engine cache. Address of the original document: http://www.philol.msu.ru/~otipl/SpeechGroup/publications/2004/z-5.doc
Last modified: Thu Mar 10 15:18:40 2005
Indexed: Sat Dec 22 20:38:20 2007

Types of information for the multimedia dictionary
of Russian discourse markers


I.M. Kobozeva, L.M. Zakharov

(Russia, Moscow State Lomonosov University)

The thorough study of discourse markers (DMs), i.e. words and phrases
bearing mainly discourse-pragmatic information (sentence adverbs,
particles, interjections and the like), began in the 1970s and
continues today. In semantics, pragmatics and discourse analysis,
detailed descriptions of many DMs, based on different theories, have been
proposed. Special dictionaries of DMs have been published, with word lists
ranging from dozens to hundreds of items. Dictionaries of Russian DMs give
information on their phonetic, grammatical, syntactic and pragmatic
properties. However, this information is insufficient for such purposes as
learning and teaching Russian as a foreign language or automated text and
speech processing. In general, these dictionaries do not provide enough of
the phonetic (prosodic), syntactic and paralinguistic information needed
for correct speech production (synthesis) and speech understanding
(analysis).

We argue that the most natural way to proceed is to build a computer-based
multimedia dictionary that supplies information on every aspect of a DM
that has to be taken into account. We propose a format for such a
dictionary and discuss the relevant types of information, concentrating on
those that are poorly represented in existing paper dictionaries.

General and specific information on DM
A DM is identified in the dictionary by its standard graphic form, which
serves as the entry point to a hierarchically organized lexical entry
covering the various uses of that form. Such uses differ significantly in
their phonetic, grammatical and semantic properties. The diversity of these
uses raises what A. I. Smirnitsky called «the problem of the word identity»:
are these uses contextual modifications of one and the same lexical unit, or
are they homonyms? We believe (together with many other researchers) that
this problem is irrelevant for DMs, because the existing criteria for
distinguishing polysemy from homonymy are inapplicable to words of this
kind. On the one hand, the criteria that determine the lexico-grammatical
category, or part of speech, of a function word are far from clear (at least
for Russian). This is reflected in lexicographic practice, where there are
numerous cases of conflicting category assignments[1] for one and the same
DM. On the other hand, a purely semantic criterion (the degree of semantic
similarity) is ineffective too, because the semantic 'atoms' into which the
meanings of DMs are decomposed are rather abstract and are all similar in a
way, since they belong to the same semantic domain, which comprises the
elements of the communicative situation and the relations between them. In
these circumstances it seems rational to treat the diverse functional uses
of the same word form as variants of one and the same word and to describe
them in one complex entry, as is done in [Baranov, Plungian, Rachilina
1993]. The number and nature of these variants, as well as the relations
between them, are presented at the beginning of an entry in the form of a
synopsis, as proposed by Ju. D. Apresian for lexical entries of the
integrated linguistic description (see Apresian 1990, 1992, 1995: 485-537).
In the synopsis each use of a DM is characterized by its grammatical
category label (or labels, if specialists disagree), a simplified
formulation of its meaning that expresses the main idea underlying this kind
of use, and a short typical example. The synopsis for the DM вообще is given
in (1):
(1) ВООБЩЕ
1. 'in general; ignoring individual characteristics': перевод вообще и
художественный перевод в особенности "translation in general and literary
translation in particular"
2. 'marker of a general statement in the presence of a particular
deviation': Я, вообще, занят, но если надо, я приеду. "As a matter of fact
I am busy, but I will come if needed."
3. 'marker introducing a generalization after some particular case(s) have
been mentioned': Он бросил учебу и вообще ведет себя как-то странно. "He
gave up his studies and on the whole behaves rather strangely."
4. 'in any circumstances': По выходным они вообще не выключают телевизор.
"On weekends they do not turn off the TV at all."
5. 'marker of the ultimate degree in some hierarchy': Река от дома недалеко,
а пруд вообще в пяти минутах ходу. "The river is not far from the house, and
the pond is even closer: a five-minute walk away."
6. 'expression of an emotional reaction (positive or negative) to an
observed or discussed situation that the speaker considers to be the
extreme case of its kind': Вообще! "That beats me!"

In spite of numerous phonetic, grammatical and semantic distinctions, the
different variants of the same DM form a unity that is reflected in their
common properties. To capture this unity, general information, which
remains constant across all uses of the word, is distinguished from specific
information, which relates only to one or more variants. That is why
information on the same aspect of a DM (e.g. phonetic or semantic) may
appear on two levels: general and specific.


Structure of the lexical entry

On both levels, general and specific, various properties of a DM are
presented. The information about these properties is divided into several
zones according to the linguistic aspect it belongs to. At present the data
on a DM are grouped into the following zones:
(2)
1. Graphic information
   1.1. Spelling
   1.2. Punctuation
2. Phonetic information
   2.1. Transcription
   2.2. Prosodic information
      2.2.1. Word prosody
      2.2.2. Phrasal prosody
   2.3. Sound files with visualizations confirming the given phonetic
        characteristics
3. Syntactic information
   3.1. Linear position (illustrated by well-formed and ungrammatical
        sentences)
   3.2. Possibility of independent use
   3.3. Argument structure with restrictions on arguments
   3.4. Regular co-occurrence with other DMs:
      3.4.1. free, e.g. ведь + же (in the entries of both DMs)
      3.4.2. idiomatic, e.g. вот еще, едва ли (in the entry of the first
             elementary DM)
4. Semantic information
   4.1. Description (definition) of meaning
   4.2. Paradigmatic relations
      4.2.1. Synonyms and analogs (e.g. как таковой for вообще 1, вообще-то
             for вообще 2 and совсем for вообще 4)
      4.2.2. Correlative expressions (e.g. в особенности, в частности for
             вообще 1)
      4.2.3. Antonyms (e.g. именно for вообще 1)
5. Communicative information
   5.1. Relation to the topic/focus opposition
   5.2. Relation to the given/new opposition
   5.3. Relation to contrast, emphasis etc.
6. Pragmatic information
   6.1. Stylistic markers, e.g. colloquial, bookish etc. (neutral by
        default)
   6.2. Restrictions on the illocutionary force of the utterance
   6.3. Meaning modifications within specific kinds of illocutions
7. Paralinguistic information
   7.1. Accompanying facial expressions
   7.2. Accompanying gestures
8. Derivational information (words derived from the given DM are presented
   here, e.g. the "diminutive" forms Аюшки! and Аиньки! for one variant of
   A, or the verb поддакивать for the confirmative variant of да)
9. References to selected bibliography
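The zone inventory in (2) and the synopsis in (1) can be pictured as a small data model. The following is only a minimal sketch under our own assumptions: the names `Zone`, `Use` and `LexicalEntry` are hypothetical and are not part of the proposed dictionary format; the sample data are taken from (1) and from the semantic zone in (2).

```python
from dataclasses import dataclass, field
from enum import Enum


class Zone(Enum):
    """Top-level information zones, following the outline in (2)."""
    GRAPHIC = "graphic"
    PHONETIC = "phonetic"
    SYNTACTIC = "syntactic"
    SEMANTIC = "semantic"
    COMMUNICATIVE = "communicative"
    PRAGMATIC = "pragmatic"
    PARALINGUISTIC = "paralinguistic"
    DERIVATIONAL = "derivational"
    BIBLIOGRAPHY = "bibliography"


@dataclass
class Use:
    """One variant (use) of a DM, as numbered in the synopsis."""
    number: int
    gloss: str
    example: str
    zones: dict = field(default_factory=dict)  # Zone -> use-specific data


@dataclass
class LexicalEntry:
    """A complex entry: general zones plus a list of uses."""
    headword: str
    general: dict = field(default_factory=dict)  # Zone -> data shared by all uses
    uses: list = field(default_factory=list)

    def synopsis(self):
        """Render one synopsis line per use: number, gloss, typical example."""
        return [f"{u.number}. '{u.gloss}': {u.example}" for u in self.uses]


# Hypothetical partial entry for вообще (only uses 1 and 4 are filled in).
voobshche = LexicalEntry(
    headword="вообще",
    uses=[
        Use(1, "in general; ignoring individual characteristics",
            "перевод вообще и художественный перевод в особенности",
            zones={Zone.SEMANTIC: {"synonyms": ["как таковой"],
                                   "antonyms": ["именно"]}}),
        Use(4, "in any circumstances",
            "По выходным они вообще не выключают телевизор.",
            zones={Zone.SEMANTIC: {"synonyms": ["совсем"]}}),
    ],
)

for line in voobshche.synopsis():
    print(line)
```

Keeping the zones as an open enumeration is one way to let the entry structure grow when new kinds of properties are added.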
The structure of the entry can be expanded as new kinds of relevant
properties emerge in the study of the linguistic behavior of DMs. In what
follows we address the information concerning the surface form of the DM,
because this kind of information is under-specified in existing
dictionaries (cf. Apresian 1990).
Information on the surface form of the DM ("signifiant")

The dictionary should supply data about all graphic and phonetic forms of a
given DM for each meaning listed in its synopsis.
1. Graphic information
The zone of graphic information consists of spelling and punctuation.
1. Spelling. If the spelling of a DM is constant, it belongs to the general
   information zone of its entry; if it is also the only spelling, it
   serves as the entry point and is not repeated in the graphic zone.
   Often, however, a DM has spelling variants specific to one or more of
   its functions. In that case the standard ("input") variant is repeated
   in the general graphic zone if it is possible for all semantic functions
   of the DM, or otherwise in the specific graphic zones of those functions
   for which it is appropriate, while function-specific spelling variants
   are given in the graphic zones of the corresponding functions. E.g. the
   DM A has a standard variant appropriate for all its meanings and two
   function-specific variants. The standard variant is given as general
   information of the entry, while the variants A-a and A-a-a are assigned
   only to such functions as 'reaction of understanding' and 'exclamation
   at seeing (and being able to seize) what one was after'.
2. Punctuation. This zone is relevant for DMs that are associated with
   particular punctuation in all or some of their uses. Here general or
   specific punctuation norms are stated and illustrated. In the future we
   hope to arrive at a punctuation classification of DMs, to be given in
   the introductory part of the dictionary; it will then suffice to mark
   the class of the item in question. Thus да in all its uses as a positive
   response to a number of speech acts obeys the general punctuation rules
   for main sentences, i.e. it must either be followed by a sentence-final
   punctuation mark (full stop, exclamation mark) or be separated from the
   rest of the utterance by a comma (cf. Да. Да! Да, я готов). Some other
   DMs (or their variants) that may function as independent utterances
   (e.g. нет, вот, так, а etc.) have the same punctuation pattern. All such
   DMs can be assigned to one punctuation class, coded as 1 or STATEMENT,
   and marked as such in the dictionary.
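The rule stated for the STATEMENT class can be turned into a simple check. This is only an illustrative sketch: the function name `follows_statement_class` is hypothetical, the check looks only at an utterance-initial DM, and the set of sentence-final marks is limited to the two named in the text.

```python
import re


def follows_statement_class(dm: str, utterance: str) -> bool:
    """Return True if the utterance opens with `dm` punctuated as class STATEMENT.

    The DM must be followed either by a sentence-final mark (full stop or
    exclamation mark) or by a comma separating it from the rest.
    """
    pattern = rf"{re.escape(dm)}(?:[.!]|,)"
    return re.match(pattern, utterance.strip(), flags=re.IGNORECASE) is not None


for sample in ["Да.", "Да!", "Да, я готов", "Да я ненадолго"]:
    print(sample, follows_statement_class("да", sample))
```

The last sample, where да is an unstressed initial particle rather than a response, does not fit the pattern, which illustrates why the punctuation class has to be assigned per use and not per word form.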
2. Phonetic information
Phonetic specifications include the transcription of all standard
pronunciation variants of the word and prosodic information.
2.1. Phonetic transcription. For many DMs (e.g. ведь [v'et'], даже
[dazhe], вовсе [vovs'&&]) the transcription belongs to the general
information section of the entry. But some DMs have pronunciation variants
specific to one or more of their meanings. Thus the DM A used as an initial
particle opening a (generally reactive) turn in a dialogue is normally
pronounced [a], whereas as a response it is normally pronounced [a:]. The
DM вообще has a substandard variant [va&&&] used only with meanings 5 and 6
of the synopsis. Transcription data are therefore distributed between the
general and specific sections accordingly.
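One way to read this distribution is as a lookup that prefers use-specific data and falls back to the general section. The sketch below assumes a plain dictionary layout and hypothetical use labels ("turn-initial particle", "response"); only the transcriptions themselves come from the text.

```python
# Hypothetical storage layout: each entry has a general section and
# a mapping from use labels to use-specific data.
ENTRIES = {
    "ведь": {"general": {"transcription": "[v'et']"}, "uses": {}},
    "а": {
        "general": {},
        "uses": {
            "turn-initial particle": {"transcription": "[a]"},
            "response": {"transcription": "[a:]"},
        },
    },
}


def transcription(dm, use=None):
    """Return the use-specific transcription if there is one, else the general one."""
    entry = ENTRIES[dm]
    specific = entry["uses"].get(use, {})
    return specific.get("transcription", entry["general"].get("transcription"))


print(transcription("ведь"))
print(transcription("а", "response"))
```

With this layout a DM with a single pronunciation stores it once, while a DM such as A stores nothing at the general level and one transcription per use.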
2.2. Prosodic information. It is well established in theoretical
linguistics that Russian DMs have a special relationship with prosody. Words
belonging to the major lexical categories, such as nouns, verbs, adjectives
and adverbs, may or may not bear phrasal stress, depending on their role in
the logical form and information structure of the utterance, and if they do
bear phrasal stress they may be pronounced with a rising, falling or even
tone, as the examples in (3) show:
(3) a. Осел(/) | увидел соловья.
donkey saw nightingale
"A donkey saw a nightingale."
b. Поперек дороги | лежал осел(\).
across road lay donkey
"There was a donkey lying across the road."
c. Пошел осел(-) дальше.
moved donkey further
"The donkey moved on."
The noun osel "donkey" in (3a) is the theme and carries new information,
so, like any thematic NP (or "the beginning" in the terms of
[Paducheva &&&]), it bears syntagmatic stress and has a rising tone. In
(3b) the same noun with the same meaning is the rheme, so it bears
sentential stress and has a falling tone. In (3c) the noun is again the
semantic theme of the sentence, but this time it carries given information
and so is unstressed at the sentence level (while of course being stressed
at the word level). Throughout these intonational variations the lexical
meaning of this noun, as of any other, remains the same; what changes is
its logical and informational status. The prosodic variations exemplified
in (3a-c) are generated by general rules based on the logical form and
information structure of the sentence and as such need not be mentioned in
the lexicon.
Unlike words belonging to lexical categories, DMs generally have fixed
prosodic characteristics for each of their meanings (functions, uses).
The laws underlying the correlations between the meanings and the prosodic
variants of one and the same DM are still to be discovered. Naturally, such
idiosyncratic correlations have to be stated in the lexicon.
2.2.1. Word prosody
Here we give those prosodic properties that are fixed for a DM as a lexeme
or for one of its lexico-semantic variants. First of all, this is the
presence of word stress and sometimes its quality. Although stress in
Russian is traditionally part of a word's transcription, we repeat this
information in this zone.
There are DMs that bear stress in one of their meanings and are unstressed
in others, where they behave as clitics. E.g. да as an initial particle
marking an utterance that implies some fault on the part of the interlocutor
is unstressed (cf. Куда ты собрался так поздно? - Да [da] я ненадолго
"Where are you going so late? - I'll be back soon", with the answer
implying 'Why make such a fuss?'), while да as an initial particle marking
the information conveyed by the utterance as something the speaker has just
remembered is always stressed (cf. Вот и все, что она сказала. Да [da&&],
она еще просила позвонить ей сегодня вечером. "That is all she said. Oh
yes, she also asked to be called tonight.")
Variants of some DMs differ not only in the presence or absence of word
stress but also in its tone. Thus the variants of nu from the semantic
class of "overcoming difficulties" are pronounced with an even or rising
tone and are relatively long (see Baranov, Kobozeva 1988), while nu as a
command to begin an action has a falling tone. In a multimedia dictionary,
for every DM in a given function a canonical tone pattern is to be
represented graphically in a special notation and illustrated by spoken
utterances and their intonograms.
2.2.2. Phrasal prosody

DMs as a whole, or their semantic variants, are generally associated with
one or more intonation patterns. T. M. Nikolajeva [1985] pointed out that
for DMs such as particles two accentual-prosodic oppositions are relevant:
particles themselves may or may not be accented, and they may require
accentuation of the element with which they are syntactically connected.
Ju. D. Apresian [&&] showed the need to include lexicographically relevant
prosodic information in a dictionary, using mainly DMs as examples. Of all
the existing dictionaries, only [Shimchuk, Schur 1999] gives such
information: if a particle in a given meaning bears an obligatory or
optional phrasal accent, this is mentioned as a characteristic property of
the particle's syntactics. The corresponding accent mark is placed only
above the particle, and the rest of the accentual structure is not
represented. In addition, syntagmatic prosodic information about a DM is
given when the DM determines the prosodic characteristics of its
syntagmatic partners. Thus accented and accenting particles are marked as
such in this dictionary. Including data on the relations between DMs and
phrasal prosody in a dictionary is an indisputable achievement of its
authors, but in many cases this information does not give a sufficient
account of the prosodic aspect of a DM. The reason is that semantic and
semantic-syntactic variants of a DM can differ from one another not only in
word stress and/or in the necessity, possibility or impossibility of a
phrasal accent on the DM itself or on its syntagmatic partner, but also in
other prosodic parameters. The following parameters have been mentioned in
the literature: the type of phrasal accent (syntagmatic, main, contrastive,
emphatic) [Apresian 1990]; the type of intonation pattern (ИК) realized
either on the accented DM or on the intonational center of the phrase
syntactically connected with the DM [Baranov, Kobozeva 1988; Apresian
1990]; the necessity, possibility or impossibility of a pause after the DM
[Baranov, Kobozeva 1988; Kodzasov 1993]; rising vs. falling tone; normal
vs. enlarged amplitude of tonal change; realization of the accent in a high
or low register; overall reduction of pronunciation, and some others
[Kodzasov 1993].
In the multimedia dictionary all these relevant kinds of prosodic
properties should be represented in a special notation and exemplified by
recordings of standard pronunciations of DMs in the context of typical
utterances. In addition, such a dictionary could visualize the
corresponding acoustic properties, which is important for research
purposes.
Our experience in describing Russian DMs within the proposed dictionary
format immediately showed that attention to the phonetic characteristics of
a DM, besides being plainly necessary for a full picture of its linguistic
behavior, also helps to achieve a deeper understanding of the complex
interrelations between the semantic and the formal surface properties of
these polyfunctional words. Here we confine ourselves to just one
characteristic example, the DM вообще, which has been the object of fine
semantic analysis in [&&&] and [&&&]. Two attempts to relate the semantic
and prosodic variation of this DM, by Ju. D. Apresian and by S. V.
Kodzasov, gave incompatible results. It should be noted that the prosodic
metalanguages of the two papers differ significantly. Ju. D. Apresian uses
an integral parameter, the type of phrasal accent, which has four values
(syntagmatic, main, contrastive and emphatic), while S. V. Kodzasov
operates in terms of more elementary and concrete acoustic parameters. This
makes the comparison more difficult, but still not impossible. Analyzing
the two accounts, we found that they disagree in their treatment of
examples of type (4), subsumed under meaning 3 of the synopsis given in
(1):
(4) Она пообещала заходить к нему и вообще заботиться о нем.
"She promised to visit him and on the whole take care of him"
According to Apresian, in cases like (4) вообще can either have a
syntagmatic accent (which in syntagm-initial position should be a rising
one) or have no phrasal accent at all. According to Kodzasov, in cases like
(4) вообще always bears a rising phrasal accent of large amplitude, as
shown in (4'):
(4') Она пообещала заходить к нему и 'вообще(//) 'заботиться(\) о нем.
Even without instrumental acoustic analysis it is easy to establish that
pronouncing (4) without any phrasal accent on вообще sounds strange and
unnatural. Instrumental analysis of recorded samples containing standard
pronunciations of this DM showed that the accent in (4') is only one of the
two possible ways of accenting вообще in contexts of type (4). The second
possibility is a falling contrastive accent on вообще, as shown in (4''):
(4'') Она пообещала заходить к нему и 'вообще(\con) 'заботиться(\) о нем.
It is worth noting that the second prosodic possibility is excluded for
вообще in contexts like (5), which are formally very close to (4) and are
treated in [&&&] as prosodically identical to (4):
(5) a. Сам он пил дорогое вино, но вообще(//) проявлял страшную
скаредность.
"He himself drank expensive wine, but in general he was terribly stingy."
b. *Сам он пил дорогое вино, но вообще(\con) проявлял страшную
скаредность.
This difference in the prosodic form of вообще between the two types of
context argues against unifying the two corresponding semantic variants of
this DM in one lexeme with a single semantic definition containing
alternating components, as was proposed in [&]:
(6) P и вообще <но вообще> Q = 'P is the case and Q is the case and P is a
special case of Q'
On the contrary, our prosodic data confirm the purely semantic analysis of
вообще given in [&&&], according to which the use of this DM in contexts
like (4) is an exponent of the semantic scenario "From particular to
general", while its use in contexts like (5) is associated with a different
scenario, "Rules and exceptions".
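The contrast between (4) and (5) can be stated as a small table of accents permitted on вообще in each scenario, checked against annotated examples. This is only a sketch under our own assumptions: the scenario keys, the names `PERMITTED_ON_VOOBSHCHE`, `accent_on` and `is_well_formed`, and the reading of the marks `(//)`, `(\con)` and `(\)` as "rising with large amplitude", "falling contrastive" and "falling" are inferred from the examples above and are not an established notation.

```python
import re

# Accents on вообще reported as possible in each type of context.
# The keys are hypothetical identifiers for the two semantic scenarios.
PERMITTED_ON_VOOBSHCHE = {
    "from_particular_to_general": {"//", "\\con"},  # contexts like (4)
    "rules_and_exceptions": {"//"},                 # contexts like (5)
}

# A word immediately followed by an accent mark in parentheses,
# e.g. вообще(//) or заботиться(\).
ANNOTATION = re.compile(r"(\w+)\((//|\\con|\\)\)")


def accent_on(word, annotated):
    """Return the accent mark attached to `word` in an annotated example, or None."""
    for found, mark in ANNOTATION.findall(annotated):
        if found == word:
            return mark
    return None


def is_well_formed(scenario, annotated):
    """Check the accent on вообще against the table for the given scenario."""
    return accent_on("вообще", annotated) in PERMITTED_ON_VOOBSHCHE[scenario]


ex4_contrastive = r"Она пообещала заходить к нему и 'вообще(\con) 'заботиться(\) о нем."
ex5_contrastive = r"Сам он пил дорогое вино, но вообще(\con) проявлял страшную скаредность."

print(is_well_formed("from_particular_to_general", ex4_contrastive))
print(is_well_formed("rules_and_exceptions", ex5_contrastive))
```

A table of this kind keeps the two uses apart in the lexicon, in line with the argument against a single definition with alternating components.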
Thus the detailed account of the different types of information on DMs made
possible by computer technology is needed not only to store this
information in the convenient form of an electronic dictionary, but also as
an instrument for further investigation of this complex phenomenon.
References
Апресян Ю. Д. Типы лексикографической информации об означающем лексемы
// Типология и грамматика. М., 1990. (См. также Апресян Ю. Д. Избранные
труды. Т. II. Интегральное описание языка и системная лексикография. М.
1995, с. 178-197.)
Баранов А. Н., Кобозева И. М. Модальные частицы в ответах на вопрос //
Прагматика и проблемы интенсиональности. М. 1988, 45-69.
Баранов А.Н., Плунгян В. А., Рахилина Е. В. Путеводитель по
дискурсивным словам русского языка. М. 1993.
Богуславский И. М. Сфера действия лексических единиц. М, 1996.
Дискурсивные слова русского языка: опыт контекстно-семантического
описания / Под ред. К. Киселёвой и Д. Пайара. М. 1998.
Кодзасов 1993 // Баранов, Плунгян, Рахилина 1993.
Кодзасов С. В. Семантико-фонетическое расщепление русских частиц и
просодическая информация в словаре // Словарь. Грамматика. Текст. М. 1996,
97-112.
Словарь структурных слов русского языка / Под науч. ред. В. В.
Морковкина. М. 1997.
Шимчук Э., Щур М. Словарь русских частиц / Berliner Slawistische
Arbeiten. B. 9. Frankfurt am Main, 1999.
Янко Т. Е. Коммуникативные стратегии русской речи. М., 2001.
-----------------------
[1] See e.g. the DM вообще. In MAS and [Efremova 2001] all its uses are
classified as adverbs; in [Ozhegov, Shvedova] one of them is characterized
as a parenthetical word and a particle at the same time; and in [Shimchuk,
Schur 1999] two of its uses are treated as particles. One and the same use
of the DM да is marked as a particle in [Shimchuk, Schur 1999] and as an
interjection in [Efremova 2001].