Method of making lists in programs by recording voice messages with a special device, followed by recognition of the messages as text

FIELD: information technology.

SUBSTANCE: in the method, list items are received as short voice messages through a special device containing a microphone, a microcontroller, memory, a wireless module, a battery and one or more activation buttons, the normal state of which is off. When one of the activation buttons is pressed, the special device is activated and the voice message, digitized by the microcontroller, is recorded into the memory. The wireless module connects to a communication device and sends the digitized voice message over the Internet to the system server, where it is written to a database with reference to the specific instance of the special device. The system server, over the Internet and with the help of an external service, converts the voice message into text, which is recorded in the server database. The text is then sent over the Internet to the user device, into the list program linked to that instance of the special device, where the program user can see the list items received through the special device.

EFFECT: more efficient use of list-making applications, reduced effect of device inertia, minimal time to enter a list item, and a minimal number of input actions, each simplified to an elementary one.

1 dwg
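
The flow described in the abstract above (button press, recording, upload, recognition, delivery to the list) can be illustrated with a minimal in-memory sketch. Everything named here is hypothetical: the `Device` and `Server` classes, the `dev-001` identifier and the `fake_recognizer` function are assumptions made for illustration, and the "audio" is plain UTF-8 text so that decoding stands in for the external recognition service.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Server:
    """Hypothetical in-memory stand-in for the system server and its database."""
    recognize: Callable[[bytes], str]  # stand-in for the external recognition service
    audio_db: Dict[str, List[bytes]] = field(default_factory=dict)
    lists: Dict[str, List[str]] = field(default_factory=dict)

    def receive(self, device_id: str, audio: bytes) -> str:
        # Store the digitized message with reference to the device instance.
        self.audio_db.setdefault(device_id, []).append(audio)
        # Convert the message to text and append it to the list bound to that device.
        text = self.recognize(audio)
        self.lists.setdefault(device_id, []).append(text)
        return text


@dataclass
class Device:
    """Hypothetical model of the special device: idle until a button is pressed."""
    device_id: str
    memory: List[bytes] = field(default_factory=list)

    def press_button(self, spoken: bytes, server: Server) -> str:
        self.memory.append(spoken)   # record the digitized message into memory
        message = self.memory.pop()  # hand it over to the wireless module
        return server.receive(self.device_id, message)


def fake_recognizer(audio: bytes) -> str:
    # Assumption: the "audio" is UTF-8 text, so decoding plays the role of recognition.
    return audio.decode("utf-8")


server = Server(recognize=fake_recognizer)
device = Device(device_id="dev-001")
device.press_button(b"milk", server)
device.press_button(b"bread", server)
print(server.lists["dev-001"])
```

In this sketch the list is keyed by the device identifier, which mirrors the abstract's requirement that both the stored audio and the resulting list be tied to a specific instance of the device.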

Similar patents:

FIELD: physics, acoustics.

SUBSTANCE: invention relates to speech analysis systems and can be used in speech recognition and synthesis means. The method is based on generating a likelihood ratio composite function logarithm using an input speech signal, finding the absolute maximum of the likelihood ratio composite function logarithm and calculating the argument of the absolute maximum of the likelihood ratio composite function logarithm.

EFFECT: high accuracy of estimating the fundamental frequency of a speech signal.

1 dwg

FIELD: physics.

SUBSTANCE: method comprises measuring signal and noise octave levels at a selected control point; determining the radius of the optimum area of arranging vibroacoustic signal sensors; calculating the optimum number of vibroacoustic signal sensors capable of intercepting speech over technical information leakage channels (TILC); calculating the maximum formant speech intelligibility from the estimated TILC; based on the values of maximum formant speech intelligibility obtained via separate TILC, using a developed relationship which takes into account the mutual "weight" of the TILC, determining coordinates of the optimum point of arranging IAS in a facility; determining formant speech intelligibility for the controlled TILC for optimum arrangement and orientation of IAS in a facility; calculating maximum formant speech intelligibility from the set of estimated TILC, which is converted to an output factor - an integral value of verbal speech intelligibility intercepted from the facility; comparing the output factor with a standard value, from which the conformity of evaluation results with voice information security requirements is determined.

EFFECT: high reliability of evaluating voice information security.

FIELD: information technology.

SUBSTANCE: speech segments are extracted and a vector of acoustic MFCC features is calculated. Each speech segment is projected onto the eigenvoice (EV) space of dimension 10, yielding a set of Y vectors. Clustering centres C1 and C2 of the Y vectors are determined. Discriminative clustering is performed by calculating the parameters of planes H1 and H2 and approximately locating the concentration areas of Y vectors that are homogeneous with respect to speaker information. The data obtained for the speech segments are used to initialise VB diarisation based on variational Bayesian analysis. Speaker labels for the segments over the whole utterance are obtained and used to correct the clustering centres C1 and C2; discriminative clustering, variational Bayesian analysis and correction of the clustering centres are performed in sequence over several EV-VB iteration stages. At each iteration stage the complete speaker segmentation is analysed, and iteration stops when the segmentation no longer changes; the final segmentation, a table mapping the speech segments of the input signal to speaker indices, is then obtained by Viterbi resegmentation.

EFFECT: more accurate speaker detection for a dialogue in a telephone channel.

4 dwg, 1 tbl

FIELD: information technologies.

SUBSTANCE: the speech recognition system and method receive a speech signal at the input of a reception unit and process it in an information processing unit. Processing includes analogue-to-digital conversion at a preset digitisation frequency, separation into segments, spectral analysis of the speech signal segments and normalisation of the spectrum at high frequencies; pauses, noise and speech signals are identified in the normalised spectrum. Then, on the basis of the initial speech signal and the normalised spectrum, the presence or absence of acoustic criteria of the speech signal is identified in each segment. Combinatory sets of these criteria are compared with preset parameters of phoneme groups stored in the memory unit, and from the comparison results a sequence of symbols is generated that indicates the phoneme groups corresponding to the combinatory sets of acoustic criteria of each segment. This sequence is converted into a cohesive text by serial decoding of the combination of phoneme group symbols on the basis of a dictionary marked with phoneme group symbols.

EFFECT: reduced duration and high accuracy of speech recognition.

19 cl, 5 dwg, 3 tbl

FIELD: information technologies.

SUBSTANCE: the system and method for developing a Language Model of symbolic circuits are designed for use in a voice recognition application. The method includes: generating an n-gram Language Model comprising a specified large set of symbols, the n-gram Language Model comprising at least one symbol from that set; building a new lexeme of the Language Model (LM) for each such symbol; extracting pronunciations for each such symbol from a specified dictionary of pronunciations to obtain a representation of the symbol's pronunciation; developing at least one alternative pronunciation for each such symbol from that representation to create an alternative dictionary of pronunciations; and compiling the n-gram Language Model for use in the voice recognition application, where the compilation corresponds to the new lexeme of the Language Model and to the alternative dictionary of pronunciations.

EFFECT: invention provides for higher probability of voice recognition.

14 cl, 4 dwg

FIELD: information technology.

SUBSTANCE: speech signal from the output of an electroacoustic transducer is summed up with a new frequency- and amplitude-stable signal. The obtained sum of signals is amplified, amplitude-limited and converted by multiplying with a copy of the primary speech signal into a new signal which is compared with a set threshold, and presence of a pause in the speech signal is indicated by the amplitude of the obtained signal being greater than the set threshold value.

EFFECT: low volume of computational operations during digital processing of speech signals.

2 cl, 3 dwg

FIELD: information technology.

SUBSTANCE: parameters of the input speech signal of the speaker, in the form of a pass phrase, are compared with given accuracy ε against stored standard parameters of input speech signals in the form of the same pass phrase uttered by speakers known in advance, followed by authentication. Said parameters are the low-frequency part of the wavelet transform of the normalised distribution function of special points along the audio file corresponding to the input speech signal of the speaker in the form of a pass phrase. A special point is selected by comparing the reading at that point in the audio file with the preceding and following readings through generalised coefficients of linear prediction and a threshold T. Normalisation of the distribution function amounts to reducing it to a standard length Len, obtained when calculating the standard parameters of input speech signals in the form of a pass phrase uttered by known speakers.

EFFECT: high reliability of speaker recognition when using a pass phrase with a limited length.

1 dwg

FIELD: information technology.

SUBSTANCE: the input speech signal of a speaker undergoes segment-by-segment comparison with stored standard parameters of standard phrases uttered by speakers known in advance: parametric descriptions of successive segments of the input speech signal are compared with parametric descriptions of successive segments of the standard selected for comparison, with subsequent authentication of the speaker. The parametric description used is a transition matrix. To build it, a sequence of special points is constructed, each selected by comparing a reading in the segment with its surroundings, determined through generalised coefficients of linear prediction and a threshold T. The sequences of special points are then merged into blocks of length L. A transition matrix similar to the transition matrix of a Markov chain is constructed from the number of special points in each block, the obtained matrix is compared with the model of the standard matrix with given accuracy ε, and a decision is made on correct authentication of the speaker.

EFFECT: high reliability of speaker recognition when using a pass phrase with a limited length.

1 dwg
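
The block-and-transition-matrix step in the abstract above can be sketched as follows. This is an illustration under stated assumptions, not the patented procedure: the special points are taken as a ready-made 0/1 sequence (their selection through linear prediction coefficients and the threshold T is not modelled), the state of a block is assumed to be its count of special points, and the comparison "with given accuracy ε" is assumed to be an element-wise tolerance check. The function names are hypothetical.

```python
from typing import List


def block_counts(flags: List[int], block_len: int) -> List[int]:
    """Count special points (1s) in each full block of length block_len."""
    n_blocks = len(flags) // block_len
    return [sum(flags[i * block_len:(i + 1) * block_len]) for i in range(n_blocks)]


def transition_matrix(states: List[int], n_states: int) -> List[List[float]]:
    """Row-normalized counts of transitions between consecutive block states."""
    counts = [[0] * n_states for _ in range(n_states)]
    for current, following in zip(states, states[1:]):
        counts[current][following] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        # A state that is never left keeps an all-zero row.
        matrix.append([c / total if total else 0.0 for c in row])
    return matrix


def matches(m1: List[List[float]], m2: List[List[float]], eps: float) -> bool:
    """Assumed comparison rule: every element differs by at most eps."""
    return all(abs(x - y) <= eps for r1, r2 in zip(m1, m2) for x, y in zip(r1, r2))


L = 2                                    # block length
flags = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1]   # 1 marks a special point (assumed input)
states = block_counts(flags, L)          # counts per block: [1, 2, 0, 1, 2]
matrix = transition_matrix(states, L + 1)
print(matrix)
print(matches(matrix, matrix, eps=0.0))
```

With a block length of L, a block can hold between 0 and L special points, which is why the sketch uses L + 1 states.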

FIELD: information technology.

SUBSTANCE: identification of a speaker from arbitrary speech phonograms is carried out by evaluating the similarity between a first phonogram of the speaker and a second, reference phonogram. For this evaluation, reference fragments of speech signals that contain formant tracks are selected on the first and second phonograms. Reference fragments containing at least three formants are compared when the values of at least two of their formant frequencies coincide. The similarity of the compared reference fragments is evaluated from the coincidence of the values of the remaining formant frequencies, and the similarity of the phonograms as a whole is determined from the overall evaluation of similarity of all compared reference fragments.

EFFECT: reliable identification of a speaker for long and short phonograms, phonograms recorded in different channels with high level of interference and distortions, as well as phonograms with arbitrary speech of speakers in different psychophysiological states, speaking in different languages.

6 cl, 8 dwg

FIELD: physics.

SUBSTANCE: the system has a module for input, identification and conversion of the speech signal; a module for analysis and accumulation of amplitude-frequency characteristics (AFC) of the speech signal; a module for identification of deviations of spectra of the current speech signal; an electronic database of standard templates; a viewing module; and a discrimination module. The module for identification of deviations of spectra of the current speech signal, the discrimination module and the viewing module are connected in series, and a module for psycho-emotional correction is connected in series to the viewing module. The module for analysis and accumulation of AFC of the speech signal can determine temporal fluctuations of high/low frequency spectra of the speech signal. The module for identification of deviations of spectra of the current speech signal can determine the deviation of said temporal fluctuations from standard templates. The discrimination module can generate a control signal for time interruption and transmit it to the module for input, identification and conversion. The module for psycho-emotional correction can play a relaxing musical and/or speech track or transmit back a fragment of the voice communication.

EFFECT: reduced activity and elimination of undesirable speech signals.

4 cl, 1 dwg

FIELD: information technology.

SUBSTANCE: a classifier voice interface of a user terminal may receive a query, parse the query to identify an attribute, and process the query to select a first domain-specific voice interface from a plurality of domain-specific voice interfaces based on the attribute, wherein each of the domain-specific voice interfaces contains information for processing queries of a different type. The classifier voice interface may further instruct the first domain-specific voice interface to process the query and may output, in voice form, the response of the first domain-specific voice interface to said query.

EFFECT: providing faster access to information and solving the task, efficient processing of user preferences and context.

27 cl, 8 dwg
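
The routing idea in the abstract above (identify an attribute of the query, then hand the query to a matching domain-specific interface) can be illustrated with a minimal sketch. The two domain handlers, the keyword table and the rule of treating the first matching keyword as the attribute are all assumptions made for illustration; the abstract does not say how the attribute is identified, and voice input and output are replaced by plain strings.

```python
from typing import Callable, Dict


def weather_interface(query: str) -> str:
    # Hypothetical domain-specific interface for weather queries.
    return "weather domain handled: " + query


def music_interface(query: str) -> str:
    # Hypothetical domain-specific interface for music queries.
    return "music domain handled: " + query


# Assumed attribute table: a keyword in the query selects the domain.
DOMAINS: Dict[str, Callable[[str], str]] = {
    "weather": weather_interface,
    "forecast": weather_interface,
    "play": music_interface,
    "song": music_interface,
}


def classify_and_dispatch(query: str) -> str:
    """Parse the query for an attribute and forward it to the matching domain."""
    for word in query.lower().split():
        handler = DOMAINS.get(word)
        if handler is not None:
            return handler(query)
    return "no matching domain"


print(classify_and_dispatch("Play some jazz"))
print(classify_and_dispatch("What is the forecast for Monday"))
print(classify_and_dispatch("Tell me a joke"))
```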

Text input method // 2377664

FIELD: physics; computer engineering.

SUBSTANCE: the invention relates to a method of entering text into a device. The first character is entered into the device by pressing and holding a key indicating the first character of the text input. A vocalisation of the text input is then received. The probable candidate word for the first vocalised word is then identified based on the first entered character and analysis of the vocalisation. Finally, the probable candidate word is displayed to the user.

EFFECT: more accurate speech recognition during vocalised input of text into a device without increasing computational power of the device.

39 cl, 6 dwg
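
The candidate-selection step in the abstract above can be illustrated with a short sketch: the typed first character narrows the recognizer's hypotheses before the best one is chosen. The `pick_candidate` function, the hypothesis list and its scores are hypothetical; the abstract does not specify how the vocalisation is analysed or scored.

```python
from typing import List, Optional, Tuple


def pick_candidate(first_char: str, hypotheses: List[Tuple[str, float]]) -> Optional[str]:
    """Return the best-scoring hypothesis that starts with the typed character."""
    wanted = first_char.lower()
    matching = [(word, score) for word, score in hypotheses if word.lower().startswith(wanted)]
    if not matching:
        return None
    return max(matching, key=lambda pair: pair[1])[0]


# Assumed recognizer output: acoustically similar words with confidence scores.
hypotheses = [("fair", 0.41), ("pair", 0.38), ("pear", 0.21)]

print(pick_candidate("p", hypotheses))  # "fair" scores highest overall but is ruled out
print(pick_candidate("z", hypotheses))
```

Filtering by the first character is cheap, which fits the stated effect of improving accuracy without adding computational power to the device.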

FIELD: physics, communication.

SUBSTANCE: the invention relates to access and reproduction in a computer system, specifically to sequential multimodal input for mobile or cell phones. The method of interaction in a client/server architecture uses a 2.5G telephone that has a data channel for transmitting data and a voice channel for transmitting speech. A Web page is received from the Web server of the appropriate application through the data channel and reproduced on the 2.5G telephone; the reproduction includes processing the Web page in response to voice input. Speech received from the user corresponds to at least one data field on the Web page. A call is established from the 2.5G telephone to the telephone communication server via the voice channel. The telephone communication server is remote from the 2.5G telephone and is capable of speech processing. The telephone communication server receives from the Web server a speech-enabled Web page corresponding to the Web page issued to the 2.5G telephone. Speech is transmitted from the 2.5G telephone to the telephone communication server and processed in accordance with the speech-enabled Web page to obtain text data corresponding to the speech. The text data is transferred to the Web server. A new Web page containing the text data is received on the 2.5G telephone via the data channel and reproduced.

EFFECT: voice input for fields of Web pages and effective voice interaction for a 2.5G telephone with limited capabilities.

12 cl, 10 dwg

FIELD: physics, communication.

SUBSTANCE: the invention relates to access and reproduction in a computer system, specifically to sequential multimodal input for mobile or cell phones. The method of interaction in a client/server architecture uses a second-generation mobile phone (2G telephone) that has a data channel for transmitting data and a voice channel for transmitting speech. A Web page is received from the Web server of the appropriate application through the data channel and reproduced on the 2G telephone. Speech is received from the user for at least one data field of that Web page. A call is established from the 2G telephone to the telephone communication server via the voice channel. The telephone communication server is remote from the 2G telephone and is capable of speech processing. The telephone communication server receives from the Web server a speech-enabled Web page corresponding to the Web page provided to the 2G telephone. Speech is transferred from the 2G telephone to the telephone communication server and processed in accordance with the speech-enabled Web page to obtain text data, which is transmitted to the Web server. The 2G telephone receives a new Web page containing the text data through the data channel and reproduces it.

EFFECT: effective voice interaction for a 2G telephone with limited capabilities.

20 cl, 10 dwg

FIELD: engineering of information systems with a speech interaction system.

SUBSTANCE: the interaction system is connected to an information system and to a recognition system. The user interacts with the information system through user phrases processed by the recognition system. To achieve the result, the system contains an application sphere module in which phrase settings are defined. These settings are built on classes of objects, classes of attributes and classes of actions that are common to the aforementioned systems of the subject application area, and they are matched with specific types of objects, types of actions, types of attributes and their instances generated by the information system, in order to construct the grammar of phrases entered by the user.

EFFECT: the user is able to interact with the information system.

8 cl, 1 dwg

The invention relates to speech recognition and, in particular, to the management of software resources of the computer using spoken commands
