
Experience in applying additional sources of information to the problem of identifying and recognizing a spoken response in noisy conditions

D.N.Babin, I.L.Mazurenko, A.V.Urantsev
Faculty of Mechanics and Mathematics,
Department of Mathematical Theory of Intelligent Systems (MaTIS).

Traditional methods that rely on a single microphone to feed the speech signal into a computer do not provide highly reliable recognition. At a signal-to-noise ratio (SNR) of 0-6 dB, the probability of an error even in merely identifying speech activity with a single microphone is typically no less than 10%, as both experiments and published data show.
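To give a feel for the 0-6 dB range mentioned above, the following minimal sketch converts between decibels and linear power ratios using the standard definition SNR = 10 log10(P_signal / P_noise). The function names are illustrative and are not taken from the original work.

```python
import math

def snr_db(signal_power: float, noise_power: float) -> float:
    """Signal-to-noise ratio in decibels for two average powers."""
    return 10.0 * math.log10(signal_power / noise_power)

def power_ratio(snr_in_db: float) -> float:
    """Inverse conversion: linear signal/noise power ratio for an SNR in dB."""
    return 10.0 ** (snr_in_db / 10.0)

# 0 dB: speech and noise have equal power.
print(snr_db(1.0, 1.0))
# 6 dB: speech power is roughly four times the noise power.
print(round(power_ratio(6.0), 2))
```

In other words, the conditions considered here range from noise as strong as the speech itself to noise carrying about a quarter of the speech power.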

We considered the problem of identifying a spoken response in noisy conditions. To solve it, we tested several additional sources of information: additional microphones, a photo-sensor, wind-sensors, and a laryngophone (throat microphone).

At an SNR of 0-6 dB, a spoken response can be reliably separated from typical situations that cause recognition errors, such as strong impulse noise and even external speech, by using an additional closely positioned microphone and two parameters: the energy of the difference between the signals of the first and second microphones, and the difference between the energies of the two microphone signals. A far-positioned (third) microphone can be used to estimate the average level of external noise. With a more complex microphone array it is possible to model a directional microphone with a variable direction and thereby increase the SNR.

We also used headsets with a microphone on which an infrared light-emitting diode and a photodiode were mounted in order to measure the coefficient of reflection of light from the speaker's lips. The signal from the photo-sensor (corresponding to the degree of mouth opening) and the derivative of this signal (corresponding to the speed of lip movement) were used as the key characteristics for identification. It also proved possible to distinguish lip movements typical of speech production from incidental lip movements. The photo-sensor proved to be an important source of information that is practically independent of the level of external acoustic and light noise.
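The two microphone parameters named above can be sketched as follows. This is only an illustration under stated assumptions: the frame-based computation, the function names, and the sample values are hypothetical, and the example assumes that a distant noise source reaches both closely spaced microphones almost identically while the speaker's voice is noticeably stronger in the nearer one.

```python
from typing import Sequence, Tuple

def frame_energy(frame: Sequence[float]) -> float:
    """Sum of squared samples over one frame."""
    return sum(x * x for x in frame)

def two_mic_features(mic1: Sequence[float], mic2: Sequence[float]) -> Tuple[float, float]:
    """Return (energy of the difference signal, difference of the energies)."""
    if len(mic1) != len(mic2):
        raise ValueError("frames must have the same length")
    difference_signal = [a - b for a, b in zip(mic1, mic2)]
    energy_of_difference = frame_energy(difference_signal)
    difference_of_energies = frame_energy(mic1) - frame_energy(mic2)
    return energy_of_difference, difference_of_energies

# Hypothetical frames (assumed values, not measured data).
noise = [0.2, -0.1, 0.3, -0.2]    # distant source, identical in both microphones
speech = [1.0, -1.0, 1.0, -1.0]   # speaker close to the first microphone

noise_only = two_mic_features(noise, noise)
with_speech = two_mic_features(
    [n + s for n, s in zip(noise, speech)],        # first microphone
    [n + 0.5 * s for n, s in zip(noise, speech)],  # second microphone, weaker speech
)
print(noise_only)   # both features vanish for the distant source
print(with_speech)  # both features are clearly positive when the speaker talks
```

Under these assumptions, noise that is common to both microphones cancels in both features, so only the nearby speaker produces large values. How the authors thresholded or combined the two parameters is not stated in the text.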

A low-frequency microphone and a temperature-based wind-sensor were used as wind-sensors. Sensors of this type identify the fact of breathing. Moreover, ordinary breathing can be separated from the breathing that accompanies speech. The typical noise for this kind of sensor is infrasonic noise caused by external air currents. The wind-sensor also makes it possible to identify the break syllables in a spoken phrase, which can increase the reliability of speech recognition.

The signal from the laryngophone is practically independent of external acoustic noise and resembles the speech signal after a low-pass filter with a cutoff frequency of 1-2 kHz has been applied. As a result, essentially only the vowels are retained in the speech signal.
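The effect of low-pass filtering at 1-2 kHz can be illustrated with a minimal sketch. It uses a simple first-order (one-pole) filter as a stand-in, since the text does not specify the actual frequency response of the laryngophone. The 1500 Hz cutoff, the 16 kHz sampling rate, and the two test tones are assumptions chosen only for illustration.

```python
import math

def one_pole_lowpass(samples, cutoff_hz, sample_rate_hz):
    """First-order IIR low-pass filter: y[n] = y[n-1] + alpha * (x[n] - y[n-1])."""
    dt = 1.0 / sample_rate_hz
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    alpha = dt / (rc + dt)
    out, y = [], 0.0
    for x in samples:
        y += alpha * (x - y)
        out.append(y)
    return out

def rms(samples):
    """Root-mean-square level of a sequence of samples."""
    return math.sqrt(sum(x * x for x in samples) / len(samples))

def tone_gain(freq_hz, cutoff_hz=1500.0, sample_rate_hz=16000.0, n=1600):
    """Output/input RMS ratio for a pure tone, skipping the start-up transient."""
    tone = [math.sin(2.0 * math.pi * freq_hz * i / sample_rate_hz) for i in range(n)]
    filtered = one_pole_lowpass(tone, cutoff_hz, sample_rate_hz)
    half = n // 2
    return rms(filtered[half:]) / rms(tone[half:])

low = tone_gain(300.0)    # low-frequency component, typical of voiced vowel energy
high = tone_gain(6000.0)  # high-frequency component, typical of fricative noise
print(f"300 Hz gain: {low:.2f}, 6000 Hz gain: {high:.2f}")
```

The low tone passes almost unchanged while the high tone is strongly attenuated, which is why a signal limited in this way carries mainly vowel information. A real laryngophone would have a steeper and less regular response than this first-order model.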

Each of these sources of information can be used on its own to identify a spoken response, each with its own reliability.

However, the reliability of identification increases dramatically when several sensors of different kinds are used together. This is because the typical noise for sensors of different types has a different nature, so the probability that such noises appear simultaneously is very low.
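The argument above can be made concrete with a small calculation. The sketch below assumes that the sensors are fooled independently and that speech is declared only when all of them agree. Both the independence and the agreement rule are idealizations introduced here, and the per-sensor error rate of 0.1 is a hypothetical figure, not a measurement from the experiments.

```python
from math import prod

def joint_false_alarm(per_sensor_probabilities):
    """Probability that every sensor is fooled at once, assuming independence."""
    return prod(per_sensor_probabilities)

# Hypothetical example: three sensors of different kinds, each fooled 10% of the time.
single = 0.1
combined = joint_false_alarm([single] * 3)
print(f"one sensor: {single}, three sensors together: {combined:.4f}")
```

With these assumed figures the combined false-alarm probability is about one in a thousand. Real sensors are never perfectly independent, and requiring agreement also affects how often genuine speech is missed, so the calculation only shows why noises of a different nature rarely coincide.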

In the authors' experiments the probability of an identification error was reduced to 10^-3; that is, the reliability of identifying a spoken response in noisy conditions increased by a factor of 10 to 100.

The work was carried out at the Department of Mathematical Theory of Intelligent Systems, Faculty of Mechanics and Mathematics, Moscow State University.