Automatic Transcriber of Russian Texts: Problems, Structure and Application

Olga F. Krivnova, Leonid M. Zakharov, Grigory S. Strokin

Philological Faculty, Lomonosov Moscow State University, Vorobyovi Gori,
1st Building of the Humanities, Moscow, Russia, 119899

Abstract. This paper describes an automatic transcriber of Russian texts
that converts written input into a sequence of phonetic symbols organized
as phrases (or syntagmas) with special prosodic markers (rhythmical,
accentual and intonational). The transcriber is part of the text
processing module of the TTS system for Russian being developed by the
Speech Group of the Philological Faculty (Lomonosov Moscow University),
but it can also be used as an independent multifunctional program.

INTRODUCTION

Many tasks in speech processing require the automatic conversion of
written texts, or of words in their spelling, into phonetic
representations. It is well known that in natural languages the spelling of
words corresponds to their pronunciation in a rather intricate way, so
automatic text phonetization is a difficult problem. For many Western
European languages there exist pronunciation dictionaries (including
machine-readable ones) in which the transcriptions of isolated base forms
or of the roots (stems) of words can be found. These dictionaries,
together with other necessary resources, are usually used in speech
technologies (TTS in particular) for the automatic construction of phonetic
transcriptions. The situation for Russian is different. In ordinary
Russian dictionaries, which as a rule are printed, a lexical entry is
represented by the base word form given in spelling. At best the spelling
contains an additional symbol marking the place of the lexical stress, as
is done in the most representative dictionaries. Information on the
pronunciation of many Russian words can be found in special, so-called
orthoepic dictionaries. In such dictionaries the choice of lexical entries
and the use of phonetic labels have a number of peculiarities. The
following categories of words have priority for inclusion: 1) words whose
pronunciation cannot be derived unambiguously from their spelling by
standard phonetic rules (1; 2); 2) words that have stress shifts in their
grammatical forms (1); 3) words with some grammatical forms built in
non-standard ways (1). Traditionally, Russian orthoepic dictionaries
present information about word pronunciation according to the following
plan. The introductory or concluding pages give a brief description of the
sound system of Russian and describe the standard (regular) phonetic rules
governing the pronunciation (reading) of all words. This general part also
includes information about pronunciation features of words and morphemes
that cannot be explained by strictly phonological and morphophonological
factors. These features show up as pronunciation variability of particular
parts of words under the same contextual conditions (both phonological and
morphological) and are regarded by experts as orthoepic pronunciation
norms. Pronunciation variation is also reflected in the main body of
orthoepic dictionaries by means of special phonetic marks given with all
words that can be realized differently by native speakers of Russian. For
example, the entry for the word скучный (tedious) looks like this:
ску́чный [шн], which means that the letters чн in this word and its
derivatives should be pronounced as [шн]. The loan word аббат (abbot) is
characterized as абба́т [б и бб], which means that the letter sequence бб
can equally be pronounced as a single consonant or as a geminate. The word
идеализм (idealism) is described as идеали́зм ! идеали[з']м, where the
notation after the exclamation mark corresponds to an incorrect
pronunciation of the word. As a rule, pronunciation variants allowed by
orthoepic norms are not reflected in orthography and in many cases can be
treated as exceptions to the standard reading rules.
It is generally assumed that the special phonetic marks, together with the
standard phonetic rules described in the dictionaries, provide all the
information needed to predict the pronunciation not only of base words but
also of their grammatical forms and derivatives. It should be noted that
Russian orthoepic dictionaries focus first of all on the careful
pronunciation of isolated words and are addressed to human readers. The
most representative printed orthoepic dictionary of modern Russian (1)
includes about 65 thousand words.
The organization of phonetic information in Russian dictionaries briefly
discussed above reflects the fact that in Russian the correspondence
between spelling and pronunciation is not as complicated as, for example,
in English or French. It also implicitly suggests that rule-based automatic
phonetization is the most adequate approach for Russian, since most of the
phonological knowledge of native speakers can be expressed by a set of
rather simple letter-to-phoneme (sound) rules. At the same time it is
clear that the place of stress in Russian words is a lexical feature of
base forms, and to assign it to grammatical forms one has to take into
account additional information, namely the so-called accentual
inflectional type of each word. So any system implementing automatic
phonetization of Russian written texts must include a lexicon whose
entries are supplied with the non-phonetic information necessary for word
stress assignment. Fortunately, there exists (in both printed and
electronic versions) the grammatical dictionary of Russian created by A.A.
Zaliznjak (3), which contains about 100 000 base words (in spelling) with
all the features needed for the analysis and synthesis of their grammatical
forms with proper stress placement. Most morphological processors
developed for Russian rely largely on this dictionary, and ours is no
exception (for some details see below). But no dictionary can solve the
"stress" problem for unknown or novel Russian words; other methods have to
be found for processing them. The problem of homographs, which in most
cases differ in stress placement, cannot be solved even with the help of
the most robust morphological analyzer (or lexicon) and requires looking
beyond the bounds of a separate word.
Setting aside the very essential problem of word stress, one might think
that automatic phonetization is a straightforward task for Russian texts
and can easily be implemented with the help of the knowledge contained in
dictionaries and other fundamental descriptions of Russian phonetics. This
is not really so, for several reasons. Let us name the most important ones.
Pronunciation regularities in phonetic treatises are described in the form
of verbal statements and therefore require formalization and integration
into a unified transcription system suited for machine implementation. In
the absence of machine-readable pronunciation dictionaries and of large
databases of sentences and/or texts with appropriate transcriptions, made
or verified by phoneticians, such a system can be developed only manually
(at least to begin with). Many phonetic regularities (including standard
rules) require knowledge of the morphemic structure of a word and of its
grammatical features, so a reliable morphological analyzer, or an
electronic list of grammatical forms specially marked with internal morph
boundaries, is needed. Orthoepic (nonstandard) pronunciations can be
handled in different ways, but in many cases one needs to establish an
exhaustive list of the words or morphemes to which they apply. Since the
resources for this task exist only in a rather complicated printed form,
it can be carried out only by labor-intensive manual work. Besides, the
orthoepic norms are mobile and statistical in nature, so the completeness
of these lists and their adequacy to present-day Russian speech is always
disputable, and special research is often required. The analysis of the
phonetic literature shows that researchers interpret the realization of
some sounds in identical phonological positions differently. Therefore
additional phonetic experiments are also necessary to specify some
standard rules, both within a word and, in particular, at word boundaries
within larger prosodic constituents of a sentence. Many developers of TTS
systems consider the detection of the prosodic structure of a sentence and
its automatic transcription to be an independent task, distinct from
automatic phonetization, which deals only with "letter-to-sound"
correspondence. We think that this is just a matter of accepted
terminology. It is clear that the phonetic representation of any sentence
includes two components, prosodic (suprasegmental) and sound (segmental),
and also that the latter in many cases depends on the former. So if the
function of an automatic transcriber is to convert written texts into their
phonetic representations, both components should be generated. As follows
from what was said above, this task requires solving many problems, often
lying outside the field of pure phonetics.
Below we describe how these problems are handled in the automatic
transcriber that we have developed for Russian. It should be noted that
it reflects the norms of formal literary pronunciation, is under active
development and is constantly being improved. The authors' publications on
this topic are available at http://isabase.philol.msu.ru/SpeechGroup.

THE FUNCTIONS AND STRUCTURE OF THE AUTOMATIC RUSSIAN TRANSCRIBER

The first automatic transcriber for Russian known to us was developed at
the end of the 1960s. It was rather simplified, but it worked and was used
to count the frequency of occurrence of different sound sequences in
phonetized written texts (4). Much has changed since then. First of all,
the power of computers and the range of their applications have increased
enormously. It is now impossible to imagine the development of speech
technologies without the use of some kind of automatic transcriber.
Certain changes have also taken place in the Russian language, including
its pronunciation norms.
The transcriber to which this paper is devoted was developed as a
functional component of a TTS system for Russian, but we also use it as an
independent multifunctional tool. Its basic function is to convert any
printed and normalized Russian text into a phonetic transcription that can
serve as a reliable input for the prosodic parameterization and speech
generation modules. The overall structure of our TTS system is described
elsewhere (5) and is not discussed here. The transcription module takes as
its input a text in the form of a sequence of ordinary orthographic words
with lexical stress marks, separated by white spaces and permitted
punctuation marks. This text form can conventionally be called
"normalized". In general, text normalization calls for the processing of
periods that do not end a sentence, numeric expressions, abbreviations and
so on. We have analyzed some of these problems but have no machine
implementation yet. From the linguistic point of view, the more important
tasks of the normalization stage are word stress assignment and the
replacement of the letter "е" with "ё" in those cases where this is needed
for correct word pronunciation. The latter problem arises because in many
Russian texts the letter "ё" is not used at all. Its recovery is based on
lexical and morphological knowledge. In our TTS system both tasks are
solved by an independent module, which cooperates with the transcriber but
does not belong to it. We have two possible methods at our disposal: an
on-line morphological processor based on (3), and a complete list of
Russian grammatical word forms with stresses and the restored letter "ё",
generated off-line from the list of base word forms given in (3). In the
current version of the system we use the latter method since,
unfortunately, our morphological processor makes errors rather frequently
and improving it (or developing a new one) would demand significant effort.
Our list of grammatical forms contains about 2 million forms with, as
already said, stress marks and "ё" restored. In addition, for compound
words and their forms with a secondary stress the stem boundaries are
marked, and homographs are provided with information on their frequency of
occurrence. For homographs this information is used as a partial solution
to the problem of their correct pronunciation. For unknown words two
strategies can be used: pronunciation without stress, or the creation of an
additional dictionary by the user. Both of them are, of course, temporary.
We are now working on the grammatical analysis of the forms in our list and
hope that this information will help us to solve many (regrettably, not
all) problems in the text phonetization process.
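
The list-based normalization step can be pictured with a minimal sketch.
Everything in it is an assumption made for illustration and not the
authors' implementation: the names FORMS and normalize_word, the use of
"+" after the stressed vowel as the stress mark (borrowed from Fig. 1),
and the homograph frequencies, which are invented.

```python
# Toy stand-in for the off-line list of word forms with stress and "ё" restored.
# Keys are spellings as they occur in running text (no stress, "е" for "ё").
# Frequencies are invented for illustration only.
FORMS = {
    "елка": [("ё+лка", 1.0)],
    "замок": [("за+мок", 0.6), ("замо+к", 0.4)],  # homograph: castle / lock
}


def normalize_word(word, user_dict=None):
    """Return the word with a stress mark and restored "ё" if it is known."""
    key = word.lower().replace("ё", "е")
    candidates = FORMS.get(key)
    if candidates:
        # Homographs: fall back to the most frequent variant (a partial solution).
        return max(candidates, key=lambda item: item[1])[0]
    if user_dict and key in user_dict:
        # Strategy 2 for unknown words: an additional user dictionary.
        return user_dict[key]
    # Strategy 1 for unknown words: leave the word without a stress mark.
    return word
```

For example, normalize_word("замок") picks the more frequent variant from
the toy list, while an unknown word is returned unchanged unless the
caller supplies a user dictionary entry for it.
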
So, the transcription is derived from the normalized written text. The
transcription module itself consists of two basic submodules: an
accent-intonational (suprasegmental) one and a segmental one, which
performs the "letter-phoneme-phone" conversion. The accent-intonational
transcriber (AITR) generates marks specifying the most probable
intonational phrasing of a sentence, selects the pause type after each
intonational phrase (out of three possibilities), assigns to the phrase an
intonational model (out of six possibilities) and determines its intonation
center. By default the intonation center is on the last content word of a
phrase and coincides with its main (sentential) stress. Focalization on
other words can be realized acoustically, but the focus center has to be
marked manually in the transcribed phrase. The global intonation parameters
(voice range, speaking rate and loudness) can also be adjusted manually,
but the most neutral variants are used by default. The AITR rules also
perform the prosodic grouping of words within a phrase. This is done with
the help of a special feature, "degree of prosodic break", which has three
values: 0 after or before full clitics; 1 after or before function words
that are not full clitics; 2 between two content words. One of these
values is assigned to each white space in a phrase. The space is also used
technically as the place where the orthographic form of the preceding word
is kept until the whole transcription module has finished its work. The
functions of AITR also include the generation of the rhythmical pattern of
an intonational phrase, which is realized as a distribution of degrees of
prominence assigned by rule to each vowel. The rhythmization and prosodic
word grouping operations use special lists of function words
(prepositions, particles, conjunctions, pronouns etc.). The results of
AITR can be inspected as a conventional letter-prosodic representation even
before the segmental transcriber begins to work. Users can select the
degree of detail in this prosodic transcription according to their tasks.
Fig. 1a gives an example of a sentence with the most detailed prosodic
transcription, as output by AITR.
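
The assignment of the "degree of prosodic break" to each white space can
be sketched as follows. The word lists and the function name are
hypothetical; the real system uses its own lists of function words. The
sketch also assumes that a space between a content word and a following
proclitic gets the value 2, a case the description above does not spell
out.

```python
# Hypothetical word lists, for illustration only.
PROCLITICS = {"с", "в", "к", "на", "не"}   # full clitics attached to the next word
ENCLITICS = {"же", "ли", "бы"}             # full clitics attached to the previous word
FUNCTION_WORDS = {"он", "она", "но", "или", "когда"}  # function words, not full clitics


def break_degrees(words):
    """Return one break degree (0, 1 or 2) for each space between adjacent words."""
    degrees = []
    for left, right in zip(words, words[1:]):
        left, right = left.lower(), right.lower()
        if left in PROCLITICS or right in ENCLITICS:
            degrees.append(0)   # after a proclitic or before an enclitic
        elif left in FUNCTION_WORDS or right in FUNCTION_WORDS:
            degrees.append(1)   # next to a function word that is not a full clitic
        else:
            degrees.append(2)   # between content words (assumed default)
    return degrees
```

Applied to the six words of the sentence in Fig. 1, the sketch yields the
value 0 only for the space after the preposition "с", which agrees with
the clitic marker shown in the figure.
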
The segmental transcriber (PHTR) works on the output of AITR, within
the limits of a single intonational phrase, from left to right. The rules
of this module are organized into several functional submodules, which
work in turn and whose outputs can be inspected separately. The
"letter-to-phoneme" submodule includes the rules of Russian graphics and
such operations as the elimination of spelling fictions (traditional
spellings that conflict with the modern state of the Russian language and
with the main, phonemic principle of its orthography). The result is an
abstract (deep) phonemic transcription, which is approximately in line
with the principles of the Moscow phonological school. Then rules of the
"phoneme-to-phoneme" type are applied, which formalize the so-called
automatic phonemic alternations. The result is a surface phonemic
transcription close to the principles of the Petersburg phonological
school. An example of a phrase transcription at this stage of the
phonetization process is given in Fig. 1b for the same sentence as in
Fig. 1a. From that point on, the phonetic realization rules of the
"phoneme-to-phone" type are applied (vowel reduction first of all). The
number of such rules and their complexity can vary and depend on the task
in which the final phonetic transcription is to be used. The inventory of
sound types (allophones) used in our TTS system is small and includes 56
units (without the distinction between phonetically short and long
consonants). It differs only slightly from the phonemic inventory of
Russian, and its use yields a rather broad phonetic transcription, an
example of which is given in Fig. 1c. It is easy to increase the degree of
phonetic detail in the final transcription up to the 1200 different
acoustic units that we use in the latest version of the system, but such a
transcription would be difficult for a human to read. Fig. 1d shows a more
traditional phonetic transcription, which reflects the phonetic changes of
vowels under the influence of neighboring palatalized consonants. Its
generation required adding only one rule to the PHTR version used in our
TTS system.
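
The staged organization of the segmental transcriber can be illustrated
with a toy pipeline. It is only a sketch under strong assumptions: the
function names are invented, the rules cover just words with hard
consonants, and each stage contains one or two textbook rules
(unstressed "О" to "А", word-final devoicing, reduction of "А" and "Ы"
outside the stressed, immediately pretonic and word-initial positions).
The real rule set is far larger.

```python
VOWELS = set("АЭИОУЫ")
FINAL_DEVOICING = {"Б": "П", "В": "Ф", "Г": "К", "Д": "Т", "Ж": "Ш", "З": "С"}


def letter_to_phoneme(word):
    """Stage 1 (toy): for the words handled here, letters map to phonemes 1:1."""
    return word


def phoneme_to_phoneme(deep):
    """Stage 2 (toy): automatic alternations, giving a surface phonemic form."""
    out = []
    for i, ch in enumerate(deep):
        stressed = i + 1 < len(deep) and deep[i + 1] == "+"
        if ch == "О" and not stressed:
            ch = "А"                      # unstressed O is replaced by A
        out.append(ch)
    if out and out[-1] in FINAL_DEVOICING:
        out[-1] = FINAL_DEVOICING[out[-1]]  # word-final devoicing
    return "".join(out)


def phoneme_to_phone(surface):
    """Stage 3 (toy): vowel reduction; the word must contain a stress mark "+"."""
    vowel_idx = [i for i, ch in enumerate(surface) if ch in VOWELS]
    stressed = surface.index("+") - 1
    keep = {stressed, 0}                  # stressed vowel and a word-initial vowel
    k = vowel_idx.index(stressed)
    if k > 0:
        keep.add(vowel_idx[k - 1])        # immediately pretonic vowel
    return "".join(
        "Ъ" if ch in "АЫ" and i not in keep else ch
        for i, ch in enumerate(surface)
    )


def transcribe(word):
    """Return the output of every stage so each can be inspected separately."""
    deep = letter_to_phoneme(word)
    surface = phoneme_to_phoneme(deep)
    return [deep, surface, phoneme_to_phone(surface)]
```

Calling transcribe("ОДНА+ЖДЫ") returns the three intermediate forms in
order, which makes it possible to check each submodule on its own, in the
spirit of the separately inspectable outputs mentioned above.
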

a. [СВИ2РЕ+3ПЫ1Й#ТИ+4^ГР] Sil2
[О2ДНА+3ЖДЫ1#ПО1ВСТРЕ2ЧА+3ЛСЯ1#С-ЛИ2СИ+4^ЦЕ1Й] <\> Sil4
b. [СВ'ИР'Э+ПЫЙ'#Т'И+^ГР] Sil2
[АДНА+ЖДЫ#ПАФСТР'ИЧ'А+ЛС'А#С-Л'ИС'И+^ЦЫЙ'] <\> Sil4
c. [СВ'ИР'Э+ПЪЙ'#Т'И+^ГР] Sil2
[АДНА+ЖДЪ#ПЪФСТР'ИЧ'А+ЛС'Ъ#СЛ'ИС'И+^ЦЪЙ'] <\> Sil4
d. [С В' *И* Р' *Э+ П Ъ* Й' # Т' *И+^ Г Р] Sil2
[А Д Н А+ Ж Д Ъ # П Ъ Ф С Т Р' *И* Ч' *А+ Л С' *Ъ # С Л' *И* С' *И+^
Ц Ъ* Й'] <\> Sil4
e. [sv'ir'"ep@J'#t'"i^gr] Sil2
[adn"aZd@#p@fstr'itS'"als'@#s-l'is'"i^ts@J'] <\> Sil4

FIGURE 1. Different types of transcriptions for the sentence "Свирепый тигр
однажды повстречался с лисицей" ("A fierce tiger once met a fox").
Conventional signs: boundary markers [ ] for intonational phrases, # for
phonological words, - for clitics; + lexical stress, ^ intonation center,
<> intonation model mark, Sil - pause type.
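
The digits in line (a) of Fig. 1 appear to encode the degrees of
prominence of the rhythmical pattern. Judging from that single example
only, the stressed vowel gets 3 (4 if it carries the intonation center),
the immediately pretonic vowel gets 2 and every other vowel gets 1. The
sketch below reproduces just this inferred pattern; the function name and
the rule itself are assumptions, not the rule set of AITR.

```python
ORTHO_VOWELS = set("АЕЁИОУЫЭЮЯ")


def add_prominence(word):
    """Annotate each vowel of a phonological word with a prominence digit.

    Input uses "+" after the stressed vowel and an optional "^" for the
    intonation center, e.g. "ЛИСИ+^ЦЕЙ".
    """
    vowel_idx = [i for i, ch in enumerate(word) if ch in ORTHO_VOWELS]
    plus = word.find("+")
    stressed = plus - 1 if plus > 0 else None
    is_center = "^" in word
    pretonic = None
    if stressed in vowel_idx:
        k = vowel_idx.index(stressed)
        if k > 0:
            pretonic = vowel_idx[k - 1]
    out = []
    for i, ch in enumerate(word):
        if ch == "+":
            continue                      # re-emitted together with its vowel
        if ch not in ORTHO_VOWELS:
            out.append(ch)
        elif i == stressed:
            out.append(f"{ch}+{4 if is_center else 3}")
        elif i == pretonic:
            out.append(ch + "2")
        else:
            out.append(ch + "1")
    return "".join(out)
```

On the words of the example sentence this gives the same strings as line
(a) of the figure, which is the only evidence the sketch relies on.
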

PHTR takes into account not only the standard rules of pronunciation but
also models orthoepic regularities that apply to groups of words and even
to individual words. The current version follows one of the variants
recommended by modern orthoepic dictionaries. Orthoepic rules function as
rewriting rules with special lexical and/or morphological conditions, which
refer to the relevant lists of items. Currently there are 54 such exception
lists in the system.
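
A rewriting rule with a lexical condition can be sketched as a pattern, a
replacement and a reference to an exception list. The class name, the
function name and the tiny word list below are hypothetical; the list
merely collects a few well-known words in which "чн" is pronounced [шн]
and is not one of the 54 lists of the system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OrthoepicRule:
    pattern: str           # letter sequence to rewrite
    replacement: str       # what it is rewritten to
    word_list: frozenset   # lexical condition: words the rule applies to


# Illustrative exception list, not taken from the system.
CHN_AS_SHN = frozenset({"скучный", "скучно", "конечно", "нарочно", "яичница"})

RULES = [OrthoepicRule("чн", "шн", CHN_AS_SHN)]


def apply_orthoepic_rules(word, rules=RULES):
    """Apply every rule whose exception list contains the word."""
    result = word
    for rule in rules:
        if word in rule.word_list:
            result = result.replace(rule.pattern, rule.replacement)
    return result
```

Words outside the list, such as "точный", pass through unchanged and are
left to the standard reading rules.
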
All kinds of automatic transcriptions produced by the described
transcriber are based on the Russian alphabet, following the tradition of
Russian phonetics. There is also an additional facility for converting them
into IPA or SAMPA style (see for example Fig. 1e).
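
A conversion of this kind can be sketched as a symbol table plus one
reordering step. The table below is an assumption inferred by comparing
lines (c) and (e) of Fig. 1 and covers only the symbols that occur there;
the function name is invented. The reordering step moves the stress mark,
which follows the vowel in the Cyrillic notation, in front of the vowel as
a double quote.

```python
# Partial symbol table inferred from Fig. 1c and Fig. 1e (an assumption).
CYR_TO_SAMPA = {
    "А": "a", "Э": "e", "И": "i", "Ъ": "@",
    "П": "p", "Т": "t", "Д": "d", "Г": "g", "Ф": "f", "В": "v",
    "С": "s", "Ж": "Z", "Ц": "ts", "Ч": "tS",
    "Н": "n", "Л": "l", "Р": "r", "Й": "J",
}


def to_sampa(phrase):
    """Convert a Cyrillic-based transcription string to a SAMPA-style string."""
    out = []
    for ch in phrase:
        if ch == "+":
            if not out:
                raise ValueError("stress mark without a preceding vowel")
            out[-1] = '"' + out[-1]       # stress goes before the vowel in SAMPA
        else:
            # Unknown symbols (', #, ^, -) are passed through unchanged.
            out.append(CYR_TO_SAMPA.get(ch, ch))
    return "".join(out)
```

Boundary and palatalization marks are passed through as they are, so the
output keeps the same phrase structure as the input.
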
Though the output of the transcription module looks like a sequence of
sound symbols and prosodic markers corresponding to a sentence, the
transcribers use a variety of phonetic information: segmental and prosodic
features, positional and boundary characteristics of phonetic constituents
and so on. This makes it possible to represent the phonetic structure of a
sentence as a hierarchical graph of its constituents and also to record in
a special code for each segment all the phonetic factors that can influence
its acoustic realization. Such representations are important for many
speech applications.

APPLICATION

As was said above, we developed the transcription module for use in a
TTS system for Russian. It is based on expert rewrite rules formalized in
a standard form that is convenient for linguists and allows a new rule to
be added to the module easily and verified through synthesis. For machine
implementation the rewrite rules are compiled into an internal data
structure and then executed by the system software on the Win32 platform.
Practice has shown that the module can have various applications. With its
help we developed an electronic dictionary of Russian literary
pronunciation for the base word forms in (3). Automatic transcriptions of
large electronic texts and sets of sentences generated by the transcribers
were used in developing acoustic-phonetic databases for training automatic
speech recognition systems, and also for research and educational
purposes. Each application was very useful for revealing transcription
errors, but on the whole the experts who worked with our transcriptions
have rated them highly.

ACKNOWLEDGEMENT

The paper was supported by RFFI, grant No. 00-06-80091, and partly by
INTAS, No. 99-00795.

REFERENCES

1. Orthoepic Dictionary of Russian Language. Avanesov, R.I. (ed.),
Moscow: Russkiy Yazyk, 1997.
2. Kalentchuk, M.L., Kasatkina, R.F. Dictionary of Difficulties of
Russian Pronunciation. Moscow: Russkiy Yazyk, 1997.
3. Zaliznjak, A.A. Grammatical Dictionary of Russian Language. Moscow:
Russkiy Yazyk, 1977.
4. Zlatoustova, L.V., Kodzasov, S.V., Krivnova, O.F., Frolova, I.G.
Algorithms for conversion Russian orthographic texts into phonetic
transcriptions. Moscow: MSU, 1970.
5. Krivnova, O.F. "Main Principles and Overall Structure of TTS system
for Russian Language", Proceedings of the Workshop SPECOM'99. Moscow,
Russia, 1999.