On automatic correction of incorrect pronunciation of foreign words
Dmitry Babin, Ivan Mazurenko, Pavel Aliseichick
Faculty of Mechanics and Mathematics,
Department of Mathematical Theory of Intelligent Systems (MaTIS).
The problem of automatically correcting incorrect pronunciation
of foreign words is formulated. A system that constructs exercises
with the help of a teacher is described. Algorithms are proposed for
segmenting sound data into phonemes and for analyzing the correctness
of a student's pronunciation. The need for voice tuning is explained.
Some characteristics of the training system are given.
Recently a large number of systems for automated foreign-language
training have appeared that can present video and audio material
and accept the student's speech as input.
The weakest point of such training systems is that they neither assess
the correctness of pronunciation nor localize pronunciation errors.
The problem is hard because of the wide variety of equally correct
pronunciations by different speakers, the differing recording conditions,
and the many types of pronunciation errors, ranging from incorrect stress
and intonation to wrong pronunciation of individual sounds, which is
usually caused by the absence of most sounds of the foreign language
from the student's native language. Solving this problem therefore
requires working with speech at a level that does not depend on the
speaker and that tolerates inexact pronunciation of spoken phrases.
The authors present a working version of a system that allows
a student to assess objectively how correct his pronunciation is,
to classify errors, and to listen interactively to the difference
between pronunciations of the incorrect sounds.
The system works as follows.
To make the algorithm independent of the voices of the teacher and the student, the system determines objective parameters
of the user's speech during a natural spoken dialogue in the user's
native language. These parameters are later used to transform the characteristics of the sound signal into a special voice-invariant form.
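The text does not specify which speech parameters are measured or how the voice-invariant form is computed. As a purely illustrative sketch, the idea can be shown with a simple per-speaker mean and variance normalization of feature frames. The function names, the frame representation, and the choice of normalization are all assumptions, not the authors' method.

```python
from statistics import mean, pstdev

def estimate_voice_parameters(native_frames):
    """Estimate per-speaker statistics from frames of native-language speech.

    Each frame is a list of feature values (hypothetical features).
    Returns one mean and one standard deviation per feature.
    """
    columns = list(zip(*native_frames))
    means = [mean(column) for column in columns]
    # Fall back to 1.0 for a constant feature to avoid division by zero.
    stds = [pstdev(column) or 1.0 for column in columns]
    return means, stds

def to_voice_invariant(frames, voice_parameters):
    """Normalize frames with the speaker's own statistics (assumed transform)."""
    means, stds = voice_parameters
    return [
        [(value - m) / s for value, m, s in zip(frame, means, stds)]
        for frame in frames
    ]

# Hypothetical data: the student's features are a scaled and shifted
# copy of the teacher's, standing in for a difference between voices.
teacher_calibration = [[1.0, 2.0], [3.0, 6.0], [5.0, 4.0]]
student_calibration = [[2 * v + 3 for v in f] for f in teacher_calibration]
teacher_phrase = [[2.0, 5.0], [4.0, 3.0]]
student_phrase = [[2 * v + 3 for v in f] for f in teacher_phrase]

teacher_form = to_voice_invariant(
    teacher_phrase, estimate_voice_parameters(teacher_calibration))
student_form = to_voice_invariant(
    student_phrase, estimate_voice_parameters(student_calibration))
```

Under this assumed model, the two normalized phrases coincide up to rounding, which is the property a voice-invariant form is meant to provide.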
Preparing an exercise requires the teacher to process the sound
images of words in a special way. The teacher divides a sound signal
into fragments corresponding to the phonemes of the language. The
system then calculates special parameters of these fragments (loudness,
intonation, rate and correctness of pronunciation), and the teacher
defines the allowable deviations of all parameters. For example,
setting the allowable deviations of all parameters to infinity for
every fragment except one makes the system check only one phoneme
in the phrase.
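The effect of infinite allowable deviations can be sketched as follows. This is a minimal illustration that assumes each parameter is reduced to a single number per fragment (the system itself works with curves). The names `find_errors` and `supervise_only` are hypothetical.

```python
import math

PARAMETERS = ("loudness", "intonation", "rate", "correctness")

def find_errors(teacher, student, tolerances):
    """Return (fragment index, parameter) pairs that exceed the tolerance."""
    errors = []
    for index, (t, s, tol) in enumerate(zip(teacher, student, tolerances)):
        for name in PARAMETERS:
            if abs(s[name] - t[name]) > tol[name]:
                errors.append((index, name))
    return errors

def supervise_only(fragment_index, n_fragments, tolerance):
    """Build tolerances that check one fragment and ignore all the others."""
    ignore = {name: math.inf for name in PARAMETERS}
    return [
        dict(tolerance) if i == fragment_index else dict(ignore)
        for i in range(n_fragments)
    ]

# Hypothetical values for a phrase of three fragments.
reference = {"loudness": 1.0, "intonation": 1.0, "rate": 1.0, "correctness": 1.0}
teacher = [dict(reference) for _ in range(3)]
student = [dict(reference) for _ in range(3)]
student[0]["loudness"] = 5.0      # large deviation in fragment 0
student[1]["correctness"] = 0.2   # large deviation in fragment 1

strict = {name: 0.5 for name in PARAMETERS}
all_checked = find_errors(teacher, student, [dict(strict)] * 3)
only_second = find_errors(teacher, student, supervise_only(1, 3, strict))
```

With every fragment checked, both deviations are reported. With infinite tolerances on fragments 0 and 2, only the deviation in fragment 1 remains, because no finite difference exceeds `math.inf`.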
The dialogue between the system and a student runs automatically.
After listening to a phrase recorded by the teacher,
the student repeats it. The system calculates special voice-invariant
characteristics of the student's recorded phrase and automatically
divides it into sound fragments corresponding to the phonemes of the
foreign language, using a well-known dynamic programming method.
The student can then assess the correctness of pronunciation
of each fragment both by ear and visually. An inadmissible
deviation of any of the student's curves describing the loudness,
intonation, rate or correctness of pronunciation of an individual sound
fragment from the corresponding curve of the teacher means that an error occurred in the pronunciation of that fragment.
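The text names only "a well-known method of dynamic programming" for the segmentation step. One common technique of this kind is dynamic time warping (DTW), used in the sketch below as an assumption. It aligns the student's feature sequence with the teacher's and then carries the teacher's fragment boundaries over to the student's recording. The one-dimensional features and the function names are hypothetical.

```python
def dtw_path(teacher, student):
    """Align two 1-D feature sequences; return (teacher_idx, student_idx) pairs."""
    n, m = len(teacher), len(student)
    inf = float("inf")
    # cost[i][j] is the cheapest alignment of the first i teacher frames
    # with the first j student frames.
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            distance = abs(teacher[i - 1] - student[j - 1])
            cost[i][j] = distance + min(
                cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    # Backtrack from the end, always moving to the cheapest predecessor.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min(
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        )
    path.reverse()
    return path

def transfer_boundaries(path, teacher_starts):
    """Map the first teacher frame of each fragment to a student frame."""
    return [
        min(j for i, j in path if i >= start)
        for start in teacher_starts
    ]

# Hypothetical feature values: three fragments, spoken more slowly by the student.
teacher_features = [0, 0, 5, 5, 9, 9]
teacher_starts = [0, 2, 4]
student_features = [0, 0, 0, 5, 5, 5, 5, 9, 9]

path = dtw_path(teacher_features, student_features)
student_starts = transfer_boundaries(path, teacher_starts)
```

In this toy case the student's fragments start at frames 0, 3 and 7, so the boundaries follow the slower speech. Real speech would need multidimensional voice-invariant features and a suitable frame distance in place of the absolute difference.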
The system was tested by training more than 10 Russian "students"
in English on words and phrases containing the following English
sounds, which have no close analogues in Russian: th (as in this),
th (as in think), ng (as in ring), r (as in rest), h (as in hello),
t (as in ten), d (as in desk), w (as in wall). The probability of
an error in dividing a student's words into fragments averaged 0.05,
the probability of reporting a false error on a correct pronunciation
averaged 0.04, and the probability of failing to detect an obvious
pronunciation error averaged 0.15.
The work was carried out at the Department of Mathematical Theory of
Intelligent Systems of the Faculty of Mechanics and Mathematics of
Lomonosov Moscow State University.