Method for recognizing speech patterns and device for implementing the method

FIELD: analysis and recognition of speech signals; can be used for recognition of speech patterns.

SUBSTANCE: the device for implementing the speech phoneme recognition method comprises a computing system, which includes a clock generator, a controller, random-access memory, a central microprocessor unit intended for forming bispectral features and recognizing speech phonemes from them, a digital-to-analog converter, a long-term storage device, a video controller and an analog-to-digital converter, as well as a keyboard, a display, headphones and a microphone.

EFFECT: increased precision of speech pattern recognition. Phoneme features are formed by bispectral analysis: the series of digital codes corresponding to the speech signal is transformed into the bispectral domain, which characterizes the interaction between Fourier components at different frequencies within the speech spectrum. This extracts additional, substantially new information from the speech signal and thereby increases the precision of phoneme recognition.

2 cl, 5 dwg
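
As an illustration of the bispectral transformation named in the abstract, the following minimal Python sketch estimates the bispectrum B(f1, f2) = E[X(f1) X(f2) X*(f1 + f2)] by averaging over frames, then reports the position and amplitude of its largest modulus as one possible bispectral sign (feature). The frame length, the Hann window, the frame averaging and all function names are assumptions made for this example; the abstract does not specify them.

```python
import numpy as np

def bispectrum(signal, frame_len=64):
    """Direct FFT-based bispectrum estimate, averaged over non-overlapping frames.

    frame_len and the Hann window are illustrative assumptions.
    """
    signal = np.asarray(signal, dtype=float)
    n_frames = len(signal) // frame_len
    half = frame_len // 2
    k = np.arange(half)
    idx_sum = (k[:, None] + k[None, :]) % frame_len  # bin index of f1 + f2
    window = np.hanning(frame_len)
    acc = np.zeros((half, half), dtype=complex)
    for i in range(n_frames):
        frame = signal[i * frame_len:(i + 1) * frame_len]
        spec = np.fft.fft((frame - frame.mean()) * window)
        # Triple product X(f1) * X(f2) * conj(X(f1 + f2))
        acc += spec[k][:, None] * spec[k][None, :] * np.conj(spec[idx_sum])
    return acc / max(n_frames, 1)

def bispectral_features(signal, frame_len=64):
    """Position (bin pair) and amplitude of the maximum of the bispectrum modulus."""
    modulus = np.abs(bispectrum(signal, frame_len))
    f1, f2 = np.unravel_index(np.argmax(modulus), modulus.shape)
    return (int(f1), int(f2)), float(modulus[f1, f2])
```

For a signal with three components whose frequencies satisfy f3 = f1 + f2, the modulus peaks at the bin pair (f1, f2), which is the kind of interaction between Fourier components that an ordinary power spectrum does not show.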

 



 

Similar patents:

FIELD: automatic voice recognition technologies.

SUBSTANCE: an acoustic signal observed at a point on the person's body that is unknown to unauthorized persons is input to a computing device. Values of the parameters of the acoustic signal are determined, estimates of the statistical characteristics of these parameters are computed, and standards are formed from them. The degree of difference between the acoustic signal and the standards is determined, and on the basis of this degree a decision is made as to whether the acoustic signal belongs to the person whose statistical characteristics were used to form the standards.

EFFECT: higher resistance to interference, higher efficiency, higher trustworthiness.

6 dwg
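
The enrol-then-compare flow described above can be sketched as follows. This is a minimal illustration under stated assumptions: the standard is taken to be the per-parameter mean and standard deviation, the degree of difference is a normalized root-mean-square distance, and the threshold of 1.0 is arbitrary. None of these choices, nor the function names, come from the patent.

```python
import numpy as np

def build_standard(param_frames):
    """Form a standard from enrolment data: per-parameter mean and spread (assumed statistics)."""
    x = np.asarray(param_frames, dtype=float)
    return x.mean(axis=0), x.std(axis=0) + 1e-9  # small constant avoids division by zero

def difference_grade(param_frames, standard):
    """Degree of difference between a test signal's parameters and the standard."""
    mean, std = standard
    test_mean = np.asarray(param_frames, dtype=float).mean(axis=0)
    return float(np.sqrt(np.mean(((test_mean - mean) / std) ** 2)))

def belongs_to_person(param_frames, standard, threshold=1.0):
    """Accept the signal if its difference grade does not exceed the (assumed) threshold."""
    return difference_grade(param_frames, standard) <= threshold
```

Each row of `param_frames` holds the parameter values measured for one frame of the acoustic signal; which parameters are measured is left open here, as it is in the abstract.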

The invention relates to information processing and can be used in telecommunication systems.


FIELD: analysis and recognition of speech signals.

SUBSTANCE: during training of the recognition system, standard bispectral features of phonemes are generated, namely the positions and the amplitudes of the maxima of the bispectrum modulus of the sound signal. Standard features of words are also generated, represented by sets of averaged time intervals from the beginning of the word to the end of each phoneme and pause in the word. During recognition, the speech signal corresponding to a word interval is divided into segments. For each segment, bispectral features are formed (the positions and amplitudes of the maxima of the bispectrum modulus) and evaluated against a first and a second decision criterion. The phoneme decisions made on all segments form two series of decisions, from which the most frequent decisions (letter codes of phonemes) are selected to form the set of phoneme letter codes of the word being recognized. This set is compared with the sets of phoneme letter codes of all words in the dictionary, taking all standard word features into account. The comparison yields an array of recognition coefficients, each equal to the number of coinciding phoneme letter codes and pause codes. The word is recognized as the dictionary word that produced the maximum recognition coefficient.

EFFECT: increased precision of recognition of spoken words.

8 dwg
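
The last two stages of this method, selecting the most frequent phoneme decision and scoring dictionary words by the number of coinciding codes, can be sketched in Python as below. The sketch assumes the segment decisions are already grouped by phoneme interval and compares codes position by position; it leaves out the two decision criteria and the timing-based word features. The function names and data layout are hypothetical.

```python
from collections import Counter

def vote(segment_decisions):
    """Most frequent phoneme letter code among the decisions for one interval."""
    return Counter(segment_decisions).most_common(1)[0][0]

def recognition_coefficient(codes, reference):
    """Number of positions at which the recognized codes coincide with a dictionary entry."""
    return sum(a == b for a, b in zip(codes, reference))

def recognise_word(intervals, dictionary):
    """Return the dictionary word with the maximum recognition coefficient, plus all scores."""
    codes = [vote(decisions) for decisions in intervals]
    scores = {word: recognition_coefficient(codes, ref) for word, ref in dictionary.items()}
    best = max(scores, key=scores.get)
    return best, scores
```

A pause can be represented by its own code (for example "_") so that it takes part in the coincidence count in the same way as a phoneme code.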

FIELD: physics.

SUBSTANCE: the method involves capturing sound represented by a numbered frame in a set of numbered frames. A class is then calculated for the frame, the class being either voiced or unvoiced. If the frame belongs to the voiced class, a pitch (903) is calculated for it. If the frame has an even number and belongs to the voiced class, a codeword of a first length is calculated by absolute quantisation of the frame pitch (910). If the frame has an odd number and belongs to the voiced class, and a reliable frame is present, a codeword of a second length is calculated by differential quantisation of the frame pitch (905). If no reliable frame is present, a codeword of the second length is calculated by absolute quantisation of the frame pitch.

EFFECT: compact representation of class and pitch information, maintaining a low transmission bit rate without sacrificing fidelity or robustness to link errors.

24 cl, 12 dwg
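
The branching on frame number and class described above can be sketched as follows. All numeric values are assumptions for illustration only: the pitch range, the two key word (codeword) lengths of 7 and 5 bits, and the range of the pitch difference are not given in the abstract. The reliable frame is represented simply by its pitch, or by None when no reliable frame is available.

```python
PITCH_MIN, PITCH_MAX = 50.0, 400.0  # Hz, assumed pitch range
FIRST_BITS, SECOND_BITS = 7, 5      # assumed lengths of the two codewords
MAX_DELTA = 40.0                    # Hz, assumed range for differential coding

def quantise(value, lo, hi, bits):
    """Uniform quantisation of value on [lo, hi] to an integer index of the given width."""
    levels = (1 << bits) - 1
    clipped = min(max(value, lo), hi)
    return round((clipped - lo) / (hi - lo) * levels)

def encode_pitch(frame_number, voiced, pitch, reliable_pitch=None):
    """Return (mode, bits, index) for a voiced frame, or None for an unvoiced frame."""
    if not voiced:
        return None
    if frame_number % 2 == 0:
        # Even frame: absolute quantisation with the first length
        return ("abs", FIRST_BITS, quantise(pitch, PITCH_MIN, PITCH_MAX, FIRST_BITS))
    if reliable_pitch is not None:
        # Odd frame with a reliable reference: differential quantisation, second length
        delta = pitch - reliable_pitch
        return ("diff", SECOND_BITS, quantise(delta, -MAX_DELTA, MAX_DELTA, SECOND_BITS))
    # Odd frame without a reliable reference: absolute quantisation, second length
    return ("abs", SECOND_BITS, quantise(pitch, PITCH_MIN, PITCH_MAX, SECOND_BITS))
```

Falling back to absolute quantisation when no reliable frame exists means a lost or corrupted reference does not propagate into later frames, at the cost of coarser resolution in the shorter codeword.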

FIELD: physics.

SUBSTANCE: the speech analyser has a speech signal input unit, a frequency conversion unit, an autocorrelation unit and a fundamental frequency detecting unit. The frequency conversion unit converts a speech signal received by the speech signal input unit into a frequency spectrum. The autocorrelation unit calculates an autocorrelation waveform while shifting the frequency spectrum along the frequency axis. The fundamental frequency detecting unit calculates the fundamental frequency from the local interval between crests or troughs of the autocorrelation waveform.

EFFECT: more accurate and reliable detection of voice frequency and more accurate evaluation of emotions.

9 cl, 5 dwg
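
A minimal Python sketch of this idea follows: the magnitude spectrum of a voiced frame has peaks at multiples of the fundamental, so autocorrelating the spectrum along the frequency axis gives crests spaced by the fundamental frequency. The Hann window, the mean removal, the restriction to the lower half of the lags and the use of the median crest spacing are assumptions of this example, not details taken from the patent.

```python
import numpy as np

def fundamental_from_spectrum(signal, sample_rate):
    """Estimate the fundamental frequency from crest spacing in the spectrum's autocorrelation."""
    x = np.asarray(signal, dtype=float)
    spectrum = np.abs(np.fft.rfft(x * np.hanning(len(x))))
    spectrum = spectrum - spectrum.mean()
    full = np.correlate(spectrum, spectrum, mode="full")
    ac = full[len(spectrum) - 1:]  # autocorrelation at frequency shifts 0, 1, 2, ... bins
    max_lag = len(ac) // 2
    # Crests: positive local maxima; lag 0 is always a crest
    crests = [0] + [k for k in range(1, max_lag)
                    if ac[k] > 0 and ac[k - 1] < ac[k] >= ac[k + 1]]
    if len(crests) < 2:
        return None
    spacing_bins = float(np.median(np.diff(crests)))
    return spacing_bins * sample_rate / len(x)  # convert bin spacing to Hz
```

Because the estimate relies on the spacing between harmonics rather than on the lowest spectral peak, it can still work when the fundamental component itself is weak or missing.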

FIELD: physics.

SUBSTANCE: autocorrelation values are determined as a basis for estimating the pitch period in an audio signal segment. A first delay range analysed for the autocorrelation calculations is divided into a first set of sections, and first autocorrelation values are determined for delays in multiple sections of that first set. A second delay range analysed for the autocorrelation calculations is divided into a second set of sections such that sections of the first set and sections of the second set overlap. Second autocorrelation values are determined for delays in multiple sections of that second set.

EFFECT: efficient estimation of the pitch of an audio signal.

31 cl, 6 dwg

FIELD: information technologies.

SUBSTANCE: electronic translation of a coded text form (KTF) in the input language into a KTF in the output language is organised in two stages. At the first stage the input language KTF is converted into an intermediate language KTF. At the second stage the intermediate language KTF is converted into the output language KTF. Text-to-text translation is organised as a machine search for individual word combinations, phrases or groups of phrases in a database of translations previously made by professional translators and stored in a mobile network, for instance the Internet. The database of translations is continuously supplemented: text forms that are requested for translation but are not available in the database are published in open access and offered to professional translators for translation, for instance on a commercial basis. In addition, before the communication terminal is first used, the speech of the terminal owner is verified once on the terminal against its written form, which is structured in software by the terminal.

EFFECT: increased accuracy of converting a voice audio signal entering the transmitting terminal into a coded text form in the input language, given that only a relatively small amount of memory is available in this terminal.

6 cl, 4 dwg
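
The two-stage lookup described above can be sketched as a pair of dictionary searches with a queue for phrases that have no stored translation. This is a minimal illustration only: the in-memory dictionaries stand in for the networked translation database, and the function name, the `pending` list and the stage labels are hypothetical.

```python
def translate_via_pivot(phrase, to_pivot, from_pivot, pending):
    """Translate a phrase through an intermediate language using stored translations.

    to_pivot:   input-language phrase -> intermediate-language phrase
    from_pivot: intermediate-language phrase -> output-language phrase
    pending:    list collecting phrases that still need a human translation
    """
    pivot = to_pivot.get(phrase)
    if pivot is None:
        pending.append(("input", phrase))   # stage 1 miss: offer to translators
        return None
    result = from_pivot.get(pivot)
    if result is None:
        pending.append(("pivot", pivot))    # stage 2 miss: offer to translators
        return None
    return result
```

Routing every language pair through one intermediate language means that N languages need about 2N translation tables rather than one table for each of the N(N-1) ordered pairs.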

FIELD: information technologies.

SUBSTANCE: the method includes receiving, by means of a processor, a sound response from a call destination and processing the sound response with a voice recogniser that has a language model, in order to convert the sound response into an output specifying the recognised speech in text form. This output is then processed by a statistical classifier tuned with verbal phrases typically used by real people and by automatic systems, together with non-verbal features associated with the sound response, to produce a result specifying whether the call destination is a real person or a telephone answering machine. The classifier is separate from the language model. Processing is based on a statistical analysis of the recognised speech in text form together with the non-verbal features. The statistical analysis examines the content of the recognised speech and, based on this examination, determines whether it is statistically more consistent with verbal phrases typically used by real people or with those used by automatic systems.

EFFECT: improved accuracy of telephone answering machine detection.

18 cl, 6 dwg

FIELD: physics.

SUBSTANCE: an acoustic space model is trained on training speech attribute data using deep neural networks, which determine the interdependence factors between the speech attributes in the training data. The deep neural network creates a single continuous acoustic space model based on the interdependence factors. The acoustic space model thus takes many interdependent speech attributes into account and makes it possible to model a continuous spectrum of interdependent speech attributes. A text is then received, together with a selection of one or more speech attributes, each of which has a selected attribute weight. The text is converted into synthesized speech using the acoustic space model, so that the synthesized speech has the selected speech attribute. The synthesized speech is output as audio having the selected speech attribute.

EFFECT: more natural-sounding human voice in synthesized speech.

14 cl, 4 dwg

FIELD: radio engineering, communication.

SUBSTANCE: spoken language is translated from one language into another using a device made in the form of two modules: a unit for processing signals from microphones, and an electronic device containing a computer with appropriate software. The two modules can be connected to each other via wired or wireless links. The microphone signal processing unit performs part of the operations for processing the signals from the microphone outputs and controls the operation of the signal emitters. At least two microphones and signal emitters can be connected to it. It is made in the form of two microphone signal processing channels with switching, interconnected so that the electrical signals generated by the microphones can be switched automatically and alternately to one common microphone output of the microphone signal processing unit.

EFFECT: increasing the accuracy and speed of translation of spoken language from one language to another.

18 cl, 2 dwg
