Method for recognizing spoken control commands

FIELD: technology for analyzing speech under unfavorable environmental conditions.

SUBSTANCE: while a spoken command is being converted, a first ring buffer is continuously filled with the digitized signal; a comb of recursive filters is applied to the repeatedly decimated signal, and the resulting spectral components fill a second ring buffer; the boundaries of the speech fragment within that buffer are determined on the basis of an adaptive estimate of the noise environment; the spectral components of the speech fragment are transferred to a linear analysis buffer, where a reduced feature space is obtained; and the resulting spectral components are compared with the template vectors of the command database.

EFFECT: stable recognition of commands in high-noise conditions, for example in a moving vehicle or in machine-building production, independent of the speaker's pronunciation; reduced memory requirements.

7 cl, 2 dwg

 

The invention relates to the analysis of speech in adverse environments, for example in a moving vehicle or in machine-building production with a high noise level.

A method of dialog speech recognition is known in which recognition is performed sequentially using acoustic descriptions of each of the words stored in a database, with comparison based on probabilistic models (U.S. patent No. 4866778, cl. G 10 L 5/06, published 1990). It is not highly efficient, however, because it is not adapted to the noise environment or to the articulatory characteristics of the speaker's pronunciation.

Also known is a method for recognizing isolated words, based on determining and storing, for each word in the dictionary, a phonetic model composed of a sequence of phonetic symbols corresponding to the phonemes of the word; determining and storing characteristic parameters that describe the energy and spectral composition of each phoneme; sampling the audio signal corresponding to each isolated word spoken by the speaker; comparing the values of the characteristic parameters against all models in the dictionary; and selecting a small number of candidate models by means of a dynamic programming algorithm (patent CIS countries No. 0420825, cl. G 10 L 5/06, published 1991).

Also known is a method implemented in a speech recognition system, based on sampling the speech signal, analyzing the energy and spectral data over the range of samples, using dictionaries and word templates, analyzing the degree of similarity between the speech signals and the templates, and sorting the words by degree of similarity (patent CIS countries No. 0431890, cl. G 10 L 5/06, published 1990).

However, these methods are not adapted to the noise environment or to the articulatory characteristics of the speaker, and they require considerable time and hardware resources to process the words being recognized.

The closest to the present invention is the method according to Russian Federation patent No. 2047912, cl. 6 G 10 L 5/06, 7/06, published 11.10.1995, which is based on converting voice signals into digital form by a codec, digitally processing the signal to obtain characteristic features, forming a database of command templates, calculating the distances between the command templates and the realization of the command being recognized, and deciding either to refuse recognition or to recognize the command and output the recognized command to the controlled object.

The disadvantage of this method is its low noise immunity; in addition, its implementation requires a high-quality speech input channel and high computing power.

The aim of the invention is to provide stable recognition of commands in a high-noise environment, independent of the speakers' pronunciation, and to reduce the amount of memory needed to implement the method.

This aim is achieved in that, in the course of converting the voice command, a first ring buffer is continuously filled with the digitized signal; the spectrum of the signal on a quasi-logarithmic scale is obtained by applying combs of recursive filters to the repeatedly decimated signal, and a second ring buffer is filled with this spectrum; the presence and boundaries of a speech fragment in the second ring buffer are determined on the basis of an adaptive estimate of the noise environment; the spectral components of the speech fragment are transferred to a linear analysis buffer; the spectral components undergo secondary processing there to obtain a reduced feature space; and the resulting spectral components are compared with the template vectors of the command database.

To simplify signal processing, obtaining the signal spectrum on a quasi-logarithmic scale involves: determining the position of the next analysis window from the maximum of the envelope within the range of time intervals characteristic of fundamental-tone (pitch) periods, corresponding to frequencies from 80 to 300 Hz; differentiating the signal in the selected analysis window; extracting the high-frequency components of the signal spectrum using a comb of second-order recursive filters; converting the signal into a primary intermediate signal by decimating the original by a factor of two; extracting the mid-frequency components of the spectrum from this intermediate signal using the same comb of second-order recursive filters; converting it into a secondary intermediate signal by decimating the primary one by a factor of two; extracting the low-frequency components of the spectrum of this signal using the same comb of second-order recursive filters; smoothing the high-frequency, mid-frequency and low-frequency spectral components in time with a second-order low-pass filter; and continuously filling the second ring buffer with the resulting vectors of spectral components.

To adapt to the noise environment, the presence and boundaries of a speech fragment in the second ring buffer are determined by analyzing the excess of the spectral components over an adaptive noise threshold and by checking that the temporal and energy bounds match those of the command templates in the database; whenever the analysis yields the value "not speech", the noise threshold is updated using a first-order low-pass filter.

To reduce the memory required to implement the method, the secondary processing of the spectral components of the speech fragment in the linear analysis buffer comprises: multiplying the spectral components by a vector of weighting coefficients obtained by optimizing command recognition on the database; nonlinear sigmoidal processing of the spectral components to reduce the bit depth of the signal description; smoothing the spectral components in time with a second-order low-pass filter; pairing neighboring spectral components to obtain secondary spectral features; and refining the boundaries of the spoken command.

Stable recognition of commands regardless of the speakers' pronunciation is provided by forming a speaker-independent database. In the course of creating the database, the set of control commands is spoken by 20 male and 20 female speakers, and clustering is performed: for each utterance $x_i$ from the set $\{X(0), X(1), \ldots, X(N)\}$, the number of utterances $x_j$ satisfying the condition $D(x_i, x_j) < Thr$ is determined, where

$0 < j < N$,

$Thr$ is a certain threshold value,

$D(x_i, x_j)$ is a measure of similarity between utterances. The utterance that has the largest number of nearest neighbours is selected as the cluster centre; the cluster centre and the utterances belonging to this cluster are then excluded from the set, and the clustering process is repeated over the remaining utterances. The cluster centres of each command form the database of reference descriptions of the recognizable speech commands.
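The clustering procedure described above can be illustrated with a minimal sketch. It assumes a generic `distance` callable and toy scalar "utterances"; in the method itself each utterance is a sequence of feature vectors and the similarity measure is the time-aligned distance. The names `cluster_centres`, `items` and `thr` are hypothetical.

```python
def cluster_centres(items, distance, thr):
    """Greedy clustering: repeatedly pick the item with the most neighbours
    closer than thr, make it a cluster centre, and drop its whole cluster."""
    remaining = list(range(len(items)))
    centres = []
    while remaining:
        def neighbours(i):
            # Items still in the set that lie within the threshold of item i.
            return [j for j in remaining
                    if j != i and distance(items[i], items[j]) < thr]

        centre = max(remaining, key=lambda i: len(neighbours(i)))
        members = set(neighbours(centre)) | {centre}
        centres.append(items[centre])
        remaining = [j for j in remaining if j not in members]
    return centres


# Toy data: three groups of scalar "utterances" (illustrative only).
toy = [1.0, 1.1, 1.2, 5.0, 5.1, 9.0]
centres = cluster_centres(toy, lambda a, b: abs(a - b), thr=0.15)
```

With a per-command set of utterances, the returned centres would play the role of the templates stored for that command.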

To simplify the decision procedure, comparing the spectral components of the speech fragment with the template vectors of the command database involves comparing the lengths of the speech fragment and of the command templates and calculating the distance from the fragment to the command templates by the method of nonlinear time alignment using a weighted Euclidean metric. In calculating the distance, the path of maximum similarity (measure $D(x_i, x_j)$) is determined by finding a time-alignment function

$F = C_0 C_1 \ldots C_k \ldots C_K$,

which maps the time axis of one pattern onto the time axis of the other and is a sequence of states $C_k$ determined by the difference between the vectors of the two patterns, such that the path from state $C_K$ to state $C_0$ is optimal. Here

$C_0$ is the initial state,

$C_K$ is the final state,

one pattern is described by the sequence of vectors

$X = \{x_0, x_1, \ldots, x_i, \ldots, x_M\}$, the speech fragment, and the other by

$Y = \{y_0, y_1, \ldots, y_j, \ldots, y_N\}$, the command template.

In finding the optimal path from state $C_K$ to state $C_0$, a matrix of distances between the sequences of vectors X and Y is calculated using the basic formula of dynamic programming

$$D(x_i, y_j) = \min \begin{cases} D(x_i, y_{j-1}) + d(x_i, y_j), \\ D(x_{i-1}, y_{j-1}) + d(x_i, y_j), \\ D(x_{i-1}, y_j) + d(x_i, y_j), \end{cases}$$

where

$0 \le i < M$,

$0 \le j < N$,

$d(x_i, y_j) = W \cdot (x_k - y_k)^2$ is the weighted Euclidean metric.

Here $x_k$ and $y_k$ are vectors belonging to the compared patterns. Not all elements of the distance matrix are calculated, but only those located in a corridor along the diagonal of the matrix, denoted in figure 1 by the parameter WID. A set of distances is computed between the sequence of vectors X describing the speech fragment and the sequences of vectors $Y_i$ describing the command templates in the database; these distances are analyzed and the three nearest templates in the database are found. If the shortest distance satisfies a certain threshold value, the decision is made that the command is recognized; if the distance exceeds the threshold, the decision is made to refuse recognition and return to the initial stage of the recognition process (the signal in the first ring buffer).
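The dynamic-programming recurrence with a diagonal corridor can be sketched as follows. This is an illustration under assumptions, not the patented implementation: the way the corridor is centred on the diagonal for sequences of different length, the function names, and the per-component weight vector `w` are all choices made for the example.

```python
import math


def weighted_sq_dist(x, y, w):
    """Weighted squared Euclidean distance between two feature vectors."""
    return sum(wk * (xk - yk) ** 2 for wk, xk, yk in zip(w, x, y))


def dtw_distance(X, Y, w, wid):
    """Accumulated distance between sequences X and Y, evaluated only
    inside a corridor of half-width wid around the matrix diagonal."""
    M, N = len(X), len(Y)
    INF = math.inf
    D = [[INF] * N for _ in range(M)]
    for i in range(M):
        # Assumed corridor centre: the straight line from (0, 0) to (M-1, N-1).
        centre = i * (N - 1) / max(M - 1, 1)
        j_lo = max(0, math.ceil(centre - wid))
        j_hi = min(N - 1, math.floor(centre + wid))
        for j in range(j_lo, j_hi + 1):
            d = weighted_sq_dist(X[i], Y[j], w)
            if i == 0 and j == 0:
                D[i][j] = d
                continue
            best = min(
                D[i][j - 1] if j > 0 else INF,
                D[i - 1][j - 1] if i > 0 and j > 0 else INF,
                D[i - 1][j] if i > 0 else INF,
            )
            D[i][j] = best + d
    return D[M - 1][N - 1]


# The second sequence repeats one frame; alignment absorbs the repetition.
same_shape = dtw_distance([[0], [1], [2]], [[0], [1], [1], [2]], w=[1], wid=1)
offset = dtw_distance([[0], [0]], [[1], [1]], w=[2], wid=1)
```

Cells outside the corridor stay at infinity, so they never contribute to the minimum; a narrower corridor reduces the number of local distances computed at the cost of tolerating less time warping.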

Figure 2 shows a functional block diagram of a device that implements the proposed method for recognizing spoken commands. The device operates as follows.

Signals of any type, speech and ambient noise alike, arrive continuously from the microphone at the codec. The codec converts the analog microphone signal to digital form and feeds it to the 1st ring buffer. The 1st ring buffer is in standby mode and is continuously filled with the digitized signal received from the codec.
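A continuously filled ring buffer of this kind can be sketched in a few lines. The class name `RingBuffer`, its capacity, and the `latest` accessor are illustrative assumptions; the source does not specify buffer sizes or an interface.

```python
class RingBuffer:
    """Fixed-capacity circular buffer that overwrites its oldest samples."""

    def __init__(self, capacity):
        self.data = [0] * capacity
        self.capacity = capacity
        self.write_pos = 0  # index of the next slot to overwrite
        self.count = 0      # number of valid samples stored so far

    def push(self, sample):
        self.data[self.write_pos] = sample
        self.write_pos = (self.write_pos + 1) % self.capacity
        self.count = min(self.count + 1, self.capacity)

    def latest(self, n):
        """Return the n most recent samples, oldest first."""
        n = min(n, self.count)
        start = (self.write_pos - n) % self.capacity
        return [self.data[(start + i) % self.capacity] for i in range(n)]


buf = RingBuffer(capacity=4)
for s in [1, 2, 3, 4, 5, 6]:
    buf.push(s)  # samples 1 and 2 are overwritten by 5 and 6
```

Because writes never block and old samples are simply overwritten, memory use stays constant however long the device waits for a command.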

The analysis window position detector (OPOA) continuously analyzes the contents of the 1st ring buffer and, when a maximum of the signal envelope appears in it, determines the beginning and end of the analysis window of the digitized signal. The signal sample corresponding to the analysis window passes from the OPOA to the spectral processing block, which consists of a system of IIR filters, low-pass filters (LPF) and decimators, as shown in figure 2.
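One way to read the window-positioning step is as a search for the next envelope maximum at a lag between one pitch period at 300 Hz and one pitch period at 80 Hz after the previous window. The sketch below follows that reading; the 8000 Hz sampling rate, the function name and the use of a precomputed envelope list are assumptions made for the example.

```python
def next_window_position(envelope, prev_pos, sample_rate,
                         f_min=80.0, f_max=300.0):
    """Index of the largest envelope value whose distance from prev_pos
    lies between the pitch periods of f_max and f_min, or None if the
    search range falls outside the available data."""
    min_lag = int(sample_rate / f_max)  # shortest pitch period in samples
    max_lag = int(sample_rate / f_min)  # longest pitch period in samples
    lo = prev_pos + min_lag
    hi = min(prev_pos + max_lag, len(envelope) - 1)
    if lo > hi:
        return None
    return max(range(lo, hi + 1), key=lambda i: envelope[i])


# Synthetic envelope with two peaks (illustrative values only).
env = [0.0] * 200
env[60] = 5.0
env[150] = 9.0
first = next_window_position(env, 0, 8000)
second = next_window_position(env, first, 8000)
```

At the assumed 8000 Hz rate the search range is 26 to 100 samples after the previous position, so consecutive windows follow the pitch pulses of voiced speech.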

The sample is fed simultaneously to the 1st LPF and to the 1st comb of four second-order IIR filters. The first comb of IIR filters extracts the high-frequency spectral components, which are written to the 2nd ring buffer. From the output of the 1st LPF the signal passes to the 1st decimator, which lowers the sampling rate of the original signal. The LPF ahead of the decimator prevents aliasing in the signal after decimation.

The decimated output signal of the 1st decimator is fed simultaneously to the 2nd LPF and to the 2nd comb of four second-order IIR filters, which extracts the mid-frequency spectral components of the signal; these are written to the 2nd ring buffer. The output signal of the 2nd LPF is decimated by the 2nd decimator, and from its output passes to the 3rd comb of four second-order IIR filters and to an additional comb of two IIR filters, which extract the low-frequency spectral components of the signal and write them to the 2nd ring buffer.

Thus, the outputs of the IIR filters of the spectral processing block fill the 2nd ring buffer with all the spectral components on a quasi-logarithmic scale, which is produced by a special choice of the coefficients of the IIR filters.
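The structure of the spectral processing block (the same comb of second-order recursive filters reused after each halving of the sampling rate) can be sketched as below. The filter coefficients follow the widely used biquad "cookbook" formulas; the relative centre frequencies, the Q values, the low-pass cut-off, and the use of mean output energy in place of the second-order smoothing filter are all assumptions for illustration, and the additional comb of two filters is omitted.

```python
import math


def biquad(x, b, a):
    """Direct-form I second-order recursive filter; a[0] is assumed to be 1."""
    y = []
    x1 = x2 = y1 = y2 = 0.0
    for xn in x:
        yn = b[0] * xn + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        x2, x1 = x1, xn
        y2, y1 = y1, yn
        y.append(yn)
    return y


def bandpass(rel_f, q=4.0):
    """Band-pass coefficients; rel_f is a fraction of the sampling rate."""
    w0 = 2 * math.pi * rel_f
    alpha = math.sin(w0) / (2 * q)
    a0 = 1 + alpha
    return ([alpha / a0, 0.0, -alpha / a0],
            [1.0, -2 * math.cos(w0) / a0, (1 - alpha) / a0])


def lowpass(rel_f, q=0.7071):
    """Low-pass coefficients used ahead of each decimator."""
    w0 = 2 * math.pi * rel_f
    c = math.cos(w0)
    alpha = math.sin(w0) / (2 * q)
    a0 = 1 + alpha
    return ([(1 - c) / (2 * a0), (1 - c) / a0, (1 - c) / (2 * a0)],
            [1.0, -2 * c / a0, (1 - alpha) / a0])


REL_CENTRES = (0.25, 0.30, 0.36, 0.43)  # assumed values, not from the source


def spectrum(x):
    """12 band energies: high, mid and low bands from three octave stages."""
    features = []
    for stage in range(3):
        for rel_f in REL_CENTRES:
            y = biquad(x, *bandpass(rel_f))
            features.append(sum(v * v for v in y) / len(y))
        if stage < 2:
            # Anti-alias low-pass, then keep every second sample.
            x = biquad(x, *lowpass(0.2))[::2]
    return features


def tone(freq_hz, fs=8000, n=4000):
    return [math.sin(2 * math.pi * freq_hz * k / fs) for k in range(n)]


high = spectrum(tone(2400))  # falls in the first (undecimated) stage
low = spectrum(tone(600))    # falls in the third stage, after two halvings
```

Because the centre frequencies are expressed relative to the current sampling rate, one set of coefficients serves all three stages, and each halving shifts the comb down by an octave, which gives the quasi-logarithmic spacing.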

The analysis window position detector (OPOA) then looks for the next peak of the signal envelope in the 1st ring buffer and determines the position of the second analysis window. The signal of the second window is processed by the spectral processing block in the manner described above. In this way the 2nd ring buffer is continuously filled with the processed signal from each analysis window of the 1st ring buffer.

The start/end-of-fragment detection block continuously analyzes the contents of the 2nd ring buffer for the excess of the spectral component values in the selected analysis window over the adaptive noise threshold.

After processing several consecutive analysis windows, the start/end detection block decides whether the sample is speech or noise; in the second case the block adapts the noise threshold. In the first case the speech sample, which is a sequence of the spectral components of several analysis windows, is written to the linear buffer, from which it is read by the secondary signal processing block, which performs the feature-space reduction procedure.
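The speech/noise decision with an adaptive threshold can be sketched as follows. The sketch works on one energy value per analysis window and updates the noise estimate with a first-order low-pass filter only on windows judged to be noise; the smoothing factor `alpha`, the `margin` multiplier and the function names are assumptions, not values from the source.

```python
def detect_speech(window_energies, alpha=0.1, margin=3.0, initial_noise=1.0):
    """Flag each window as speech when its energy exceeds margin * noise.
    The noise estimate adapts only on windows classified as not speech."""
    noise = initial_noise
    flags = []
    for e in window_energies:
        is_speech = e > margin * noise
        if not is_speech:
            noise += alpha * (e - noise)  # first-order low-pass update
        flags.append(is_speech)
    return flags, noise


def first_segment(flags):
    """(start, end) indices of the first run of speech windows, or None."""
    if True not in flags:
        return None
    start = flags.index(True)
    end = start
    while end + 1 < len(flags) and flags[end + 1]:
        end += 1
    return start, end


flags, noise = detect_speech([1, 1, 1, 10, 12, 11, 1, 1])
bounds = first_segment(flags)
_, adapted = detect_speech([2.0] * 50)  # steady noise above the initial estimate
```

Freezing the update during speech keeps loud commands from raising the threshold, while steady background noise pulls the estimate toward its own level.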

Each spectral component is converted from 16 bits to 8 bits by means of a tabulated sigmoid function. In addition, neighboring spectral components are paired. The sigmoid function performs quasi-logarithmic compression of spectral components with large amplitudes without changing the ratio between spectral components with small and medium values. Processing the spectral components of each analysis window yields a vector of secondary features. The secondary processing block thus generates a sequence of secondary feature vectors for the sample.
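A table-driven 16-bit to 8-bit compression of this kind might look like the sketch below. The source does not give the sigmoid's formula or say how neighboring components are combined, so the hyperbolic tangent curve, the `SCALE` constant and averaging of adjacent pairs are assumptions chosen only to make the example concrete.

```python
import math

SCALE = 16384.0                     # assumed knee of the curve
_NORM = math.tanh(65535 / SCALE)    # so that the largest input maps to 255

# Lookup table: one 8-bit output for every unsigned 16-bit input.
TABLE = [round(255 * math.tanh(i / SCALE) / _NORM) for i in range(65536)]


def compress(components):
    """Map 16-bit spectral components to 8 bits by table lookup."""
    return [TABLE[c] for c in components]


def pair_neighbours(values):
    """Combine adjacent components pairwise (here: integer average)."""
    return [(values[i] + values[i + 1]) // 2
            for i in range(0, len(values) - 1, 2)]


compressed = compress([0, 1000, 2000, 40000, 65535])
paired = pair_neighbours([10, 20, 30, 50])
```

The curve is close to linear for small inputs and flattens for large ones, which matches the described behaviour of compressing large amplitudes while leaving the ratios of small and medium values nearly unchanged. The table costs 64 KiB of memory but replaces any run-time evaluation of the function with a single lookup.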

The database stores the command templates in the form of sequences of secondary feature vectors obtained in the manner described above.

The decision block sequentially compares the selected speech fragment with the command templates in the database. During the comparison, a set of distances is calculated from the sequence of vectors of the selected fragment to the sequence of vectors of each command template. From this set, the three smallest distances are selected and checked against a certain threshold value. Depending on whether the threshold is exceeded, a decision is made either to refuse recognition or to recognize the corresponding command, which is output to the controlled object in a convenient form.
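The decision step can be sketched as a nearest-template search with rejection. The source does not say how the second and third nearest templates influence the outcome, so this sketch only returns them alongside the decision; the function name, the data layout and the threshold value are assumptions.

```python
def decide(template_distances, threshold, n_best=3):
    """template_distances: list of (command, distance) pairs, one per template.
    Returns (recognized command or None, the n_best nearest templates)."""
    nearest = sorted(template_distances, key=lambda pair: pair[1])[:n_best]
    if nearest and nearest[0][1] <= threshold:
        return nearest[0][0], nearest
    return None, nearest  # refusal: the caller returns to the first ring buffer


# Illustrative distances for a database with several templates per command.
scores = [("stop", 4.0), ("start", 1.5), ("left", 9.0), ("start", 2.0)]
accepted, best_three = decide(scores, threshold=3.0)
rejected, _ = decide(scores, threshold=1.0)
```

Returning `None` corresponds to the refusal of recognition, after which the device resumes waiting on the signal in the first ring buffer.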

The positive effect, namely the possibility of implementing a robust command recognition method on inexpensive low-speed general-purpose processors, for example of the Hyperstone E1-16XT type (80 MHz), is achieved by three-stage processing of the digitized signal (in the two ring buffers and in the linear analysis buffer) to obtain the characteristic features in the form of secondary spectral components with a reduced feature space.

1. A method for recognizing spoken control commands, comprising converting the voice command into a digital signal by a codec, digitally processing the signal to obtain characteristic features, forming a speaker-independent database of command templates, calculating the distances between the command templates and the realization of the command being recognized, and deciding either to refuse recognition or to recognize the command and output the recognized command to the controlled object, characterized in that, in the course of converting the voice command, a first ring buffer is continuously filled with the digitized signal; the spectrum of this signal on a quasi-logarithmic scale is obtained by applying combs of recursive filters to the repeatedly decimated signal, and a second ring buffer is filled with the spectral components; the presence and boundaries of a speech fragment in the second ring buffer are determined on the basis of an adaptive estimate of the noise environment; the spectral components of the speech fragment are transferred to a linear analysis buffer; the spectral components of the speech fragment undergo secondary processing there to obtain a reduced feature space; and the resulting spectral components are compared with the template vectors of the command database.

2. The method according to claim 1, characterized in that obtaining the signal spectrum on a quasi-logarithmic scale involves: determining the position of the next analysis window from the maximum of the envelope within the range of time intervals characteristic of fundamental-tone (pitch) periods, corresponding to frequencies from 80 to 300 Hz; differentiating the signal in the selected analysis window; extracting the high-frequency components of the signal spectrum using a comb of second-order recursive filters; converting the signal into a primary intermediate signal by decimating the original by a factor of two; extracting the mid-frequency components of the spectrum from this intermediate signal using the same comb of second-order recursive filters; converting it into a secondary intermediate signal by decimating the primary one by a factor of two; extracting the low-frequency components of the spectrum of this signal using the same comb of second-order recursive filters; smoothing the high-frequency, mid-frequency and low-frequency spectral components in time with a second-order low-pass filter; and continuously filling the second ring buffer with the resulting vectors of spectral components.

3. The method according to claim 1, characterized in that the presence and boundaries of a speech fragment in the second ring buffer are determined by analyzing the excess of the spectral components over an adaptive noise threshold and by checking that the temporal and energy bounds match those of the command templates in the database, and whenever the analysis yields the value "not speech", the noise threshold is updated using a first-order low-pass filter.

4. The method according to claim 1, characterized in that the secondary processing of the spectral components of the speech fragment in the linear analysis buffer comprises: multiplying the spectral components by a vector of weighting coefficients obtained by optimizing command recognition on the database; nonlinear sigmoidal processing of the spectral components to reduce the bit depth of the signal description; smoothing the spectral components in time with a second-order low-pass filter; pairing neighboring spectral components to obtain secondary spectral features; and refining the boundaries of the spoken command.

5. The method according to claim 1, characterized in that, in forming the database, the set of control commands is spoken by 20 male and 20 female speakers and clustering is performed: for each utterance $x_i$ from the set $\{X(0), X(1), \ldots, X(N)\}$, the number of utterances $x_j$ satisfying the condition $D(x_i, x_j) < Thr$ is determined, where $0 < j < N$, $Thr$ is a certain threshold value, and $D(x_i, x_j)$ is a measure of similarity between utterances; the utterance that has the largest number of nearest neighbours is selected as the cluster centre; the cluster centre and the utterances belonging to this cluster are then excluded from the set, and the clustering process is repeated over the remaining utterances; and the cluster centres of each command form the database of reference descriptions of the recognizable speech commands.

6. The method according to claim 1, characterized in that comparing the spectral components of the speech fragment with the template vectors of the command database involves comparing the lengths of the speech fragment and of the command templates and calculating the distance from the fragment to the command templates by the method of nonlinear time alignment using a weighted Euclidean metric; in calculating the distance, the path of maximum similarity is determined by finding a time-alignment function $F = C_0 C_1 \ldots C_k \ldots C_K$ that maps the time axis of one pattern onto the time axis of the other and is a sequence of states $C_k$ determined by the difference between the vectors of the two patterns, such that the path from state $C_K$ to state $C_0$ is optimal, where $C_0$ is the initial state and $C_K$ is the final state; one pattern is described by the sequence of vectors $X = \{x_0, x_1, \ldots, x_i, \ldots, x_M\}$ and the other by $Y = \{y_0, y_1, \ldots, y_j, \ldots, y_N\}$; and in finding the optimal path from state $C_K$ to state $C_0$, a matrix of distances between the sequences of vectors X and Y is calculated using the basic formula of dynamic programming

$$D(x_i, y_j) = \min \begin{cases} D(x_i, y_{j-1}) + d(x_i, y_j), \\ D(x_{i-1}, y_{j-1}) + d(x_i, y_j), \\ D(x_{i-1}, y_j) + d(x_i, y_j), \end{cases}$$

where $0 \le i < M$, $0 \le j < N$;

$d(x_i, y_j) = W \cdot (x_k - y_k)^2$ is the weighted Euclidean metric, where $x_k$ and $y_k$ are vectors belonging to the compared patterns,

and not all elements of the distance matrix are calculated, but only those located in a corridor along the diagonal of the matrix.

7. The method according to claim 1, characterized in that, in deciding on the recognition of the command, the distances between the sequence of vectors X describing the selected speech fragment and the sequences of vectors $Y_i$ describing the command templates in the database are analyzed and the three nearest templates in the database are found; if the shortest distance satisfies a certain threshold value, the decision is made that the command is recognized, and if the distance exceeds the threshold, the decision is made to refuse recognition and return to the initial stage of the recognition process, to the signal in the first ring buffer.



 

Same patents:

The invention relates to the transmission of speech

The invention relates to suppression of noise in digital communication systems, based on the transmission frame, and relates, in particular, the method of suppressing noise in such systems based on the subtraction of spectra

FIELD: radio engineering.

SUBSTANCE: device has block for determining beginning and end of command, first memory block, block for syllable segmentation, block for time normalization of command, standard commands block, commands likeness calculator, while output of block for determining beginning and end of command is connected to first inputs of first memory block and syllable segmentation block, output of first memory block is connected to first output of command time normalization block, second output of which is connected to output of syllable segmentation block. Device additionally has supporting noise input, second memory block, block for time normalization of noise, first and second blocks for level normalization, signals mixer, while input of speech command is connected to output of block for determining beginning and end of command and to second inputs of syllable segmentation block and first memory block, bearing noise input is connected to first input of second memory block, to second input of which output of block for determining beginning and end of command is connected, output of second memory block is connected to first input of block for time normalization of noise, output of syllable segmentation block is connected to second inputs of block for time normalization of noise, of first and second level normalization block, standard commands block, and to third inputs of first and second memory block, output of block for time normalization of noise is connected to first input of signals mixer, to second input of which output of standard commands block is connected, first input of which is connected to first output of commands likeness calculator, output of signals mixer is connected to first input of second level normalization block, output of which is connected to second input of commands likeness calculator, to first input of which output of first level normalization block is connected, first input of which is connected to output of block for time normalization of command.

EFFECT: higher probability of correct command recognition during effect from noises.

6 dwg

The invention relates to the transmission of speech

The invention relates to a communication system and is used to perform encoding with linear prediction, excited by the ID variable speed

FIELD: radio engineering.

SUBSTANCE: device has block for determining beginning and end of command, first memory block, block for syllable segmentation, block for time normalization of command, standard commands block, commands likeness calculator, while output of block for determining beginning and end of command is connected to first inputs of first memory block and syllable segmentation block, output of first memory block is connected to first output of command time normalization block, second output of which is connected to output of syllable segmentation block. Device additionally has supporting noise input, second memory block, block for time normalization of noise, first and second blocks for level normalization, signals mixer, while input of speech command is connected to output of block for determining beginning and end of command and to second inputs of syllable segmentation block and first memory block, bearing noise input is connected to first input of second memory block, to second input of which output of block for determining beginning and end of command is connected, output of second memory block is connected to first input of block for time normalization of noise, output of syllable segmentation block is connected to second inputs of block for time normalization of noise, of first and second level normalization block, standard commands block, and to third inputs of first and second memory block, output of block for time normalization of noise is connected to first input of signals mixer, to second input of which output of standard commands block is connected, first input of which is connected to first output of commands likeness calculator, output of signals mixer is connected to first input of second level normalization block, output of which is connected to second input of commands likeness calculator, to first input of which output of first level normalization block is connected, first input of which is connected to output of block for time normalization of command.

EFFECT: higher probability of correct command recognition during effect from noises.

6 dwg

FIELD: technology for analyzing speech under unfavorable environmental conditions.

SUBSTANCE: during transformation of spoken command first circular buffer is continuously filled with digitized signal, comb of recursive filters is applied to multiply loosened signal and spectral components are utilized to fill second circular buffer, limits of speech fragment are determined within it on basis of adaptive estimate of noise environment, spectral components of speech fragment are transferred to linear analysis buffer, shortened sign space is received from aforementioned buffer and produced spectral components are compared to standard vectors of database commands.

EFFECT: utilization of device under conditions of, for example, moving vehicle or mechanical industry with high noise pollution level provides for stable recognition of commands independently on particularities of narrators pronunciation, decreased memory volume.

7 cl, 2 dwg

FIELD: physics.

SUBSTANCE: invention relates to noise evaluation, particularly to evaluation of noise in signals used for identifying images. The method and device evaluate additive noise in a noisy signal using step-by-step Bayesian analysis. Prior distribution of time-varying noise is allowed for, and hyperparametres (average value and dispersion) are recursively corrected using approximation for posterior noise, calculated at the previous step. Additive noise in the time domain is presented in the region of logarithmic spectrum or cepstrum before step-by-step Bayesian analysis. Results of both evaluations of average value and dispersion for noise for each separate frame are used for extension of speech signals in the same region of logarithmic spectrum or cepstrum.

EFFECT: more efficient evaluation of noise in signals when identifying images.

20 cl, 4 dwg

FIELD: information technology.

SUBSTANCE: alternative sensor signal is generated, wherein the alternative sensor is less sensitive to ambient noise than the microphone which is based on the principle of air conduction. A signal of the air conduction based microphone is generated. The signal of the alternative sensor and the signal of the air conduction based microphone are used to estimate the likelihood L(St) of the speech status St by estimating the separate likelihood component for each of the set of frequency components and merging the separate likelihood components to form an estimate of the likelihood of the speech status. The likelihood the speech status is used to estimate the value of reduced noise, which models the value of the reduced noise for the given speech status. The likelihood of the speech status is used together with the signal of the alternative sensor and the air conduction based microphone in order to estimate the value of clean speech for the clean speech signal.

EFFECT: generation of a high-quality speech signal.

13 cl, 6 dwg

FIELD: physics.

SUBSTANCE: method is performed by analysing metadata to determine, whether or not the metadata actually are or include profile metadata indicating the target profile, wherein the profile metadata are suitable for performing, at least, one of volume control, volume normalization, or dynamic range control of audio data in accordance with the target profile. The target profile determines the target loudness and/or, at least, one target characteristic of the dynamic range subjected to the rendering of the audio data version for playback by the audio playback device from the group of audio playback devices.

EFFECT: ensuring the reception of bit streams.

19 cl, 17 dwg

FIELD: technology for analyzing speech under unfavorable environmental conditions.

SUBSTANCE: during transformation of spoken command first circular buffer is continuously filled with digitized signal, comb of recursive filters is applied to multiply loosened signal and spectral components are utilized to fill second circular buffer, limits of speech fragment are determined within it on basis of adaptive estimate of noise environment, spectral components of speech fragment are transferred to linear analysis buffer, shortened sign space is received from aforementioned buffer and produced spectral components are compared to standard vectors of database commands.

EFFECT: utilization of device under conditions of, for example, moving vehicle or mechanical industry with high noise pollution level provides for stable recognition of commands independently on particularities of narrators pronunciation, decreased memory volume.

7 cl, 2 dwg

FIELD: acoustics.

SUBSTANCE: invention pertains to the method and device for subsequent processing of a decoded sound signal. The decoded signal is divided into a set of signals at frequency sub-ranges. Subsequent processing is done to at least, one of the signals in the frequency sub-ranges. After processing of at least one signal from the frequency sub-ranges, the signals from the frequency sub-ranges are summed up to form an output decoded sound signal, subject to the next processing. In that way, processing is localised in the necessary sub-range or sub-ranges, leaving the other sub-ranges practically unchanged.

EFFECT: increased perceptible quality of the decoded sound signal.

54 cl, 14 dwg

FIELD: physics.

SUBSTANCE: said utility invention relates to voice signal enhancement technique for enhancement of communication in the presence of background noise. In one invention version, a method for suppressing noise in the voice signal is proposed, which, for a voice signal having a spectral representation in the form of a set of frequency bins, involves the determination of a transmission scale factor for at least some of the said frequency bins, and the calculation of smoothed transmission scale factors. The calculation of smoothed transmission scale factors involves, at least for some of the frequency bins, combining the current value of the transmission scale factor and the smoothed transmission scale factor determined previously. The other invention version involves the separation of the frequency bin set into the first set of adjacent frequency bins and the second set of adjacent frequency bins with a border frequency between them, which separates the areas of application of various noise suppression methods, the change in the border frequency value being a function of the spectral structure of the voice signal.

EFFECT: efficient noise suppression by decreasing background noise level in voice signal.

79 cl, 4 dwg
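
The smoothing step in the entry above, which combines the current scale factor of each frequency bin with the previously smoothed one, can be sketched as follows. The entry does not say how the two values are combined; the first-order recursion, the fixed weight `alpha`, and the function names are assumptions chosen for illustration.

```python
def smooth_gains(current, previous_smoothed, alpha=0.8):
    """Combine current per-bin scale factors with the previous smoothed ones.

    Assumption: smoothed = alpha * previous + (1 - alpha) * current,
    with the same hypothetical weight alpha for every bin.
    """
    if len(current) != len(previous_smoothed):
        raise ValueError("gain vectors must have the same number of bins")
    return [alpha * p + (1.0 - alpha) * c
            for c, p in zip(current, previous_smoothed)]


def smooth_over_frames(frames, alpha=0.8):
    """Run the recursion over a sequence of per-frame scale-factor vectors."""
    smoothed = list(frames[0])   # assumption: start from the first frame
    history = [smoothed]
    for gains in frames[1:]:
        smoothed = smooth_gains(gains, smoothed, alpha)
        history.append(smoothed)
    return history
```

A larger `alpha` makes the scale factors change more slowly from frame to frame; a smaller one lets them follow the current frame more closely.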

FIELD: physics; acoustics.

SUBSTANCE: invention relates to digital processing of an audio signal. The sound reproduction system comprises a digital audio signal input device (1), a digital audio signal processor (2, DSP) and a digital audio signal output device (3). The processor (2, DSP) comprises a high-pass filter (21) with a pass frequency (f) between a first and a second frequency, for example between 300 Hz and 2 kHz, a compressing amplifier (22) for compressing and amplifying the signal to the required amplitude limits, a limiter for clipping the signal where it exceeds the limiting level, and preferably a low-pass filter (23) for filtering the signal supplied by the compressing amplifier and providing the output signal. The pass frequency (f) of the low-pass filter lies in the range from 2 kHz to Fs/2, where Fs is the sampling frequency; Fs/2 can be, for example, 4 kHz. The parameters of the various devices, for example the cut-off frequency, the order of the band-pass filter, the amplification, etc., preferably depend on the measured noise level.

EFFECT: a sound reproduction system and method with improved intelligibility.

23 cl, 14 dwg, 1 tbl
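
The processing chain in the entry above (high-pass filter, amplifier with limiter, low-pass filter) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: first-order filters, a fixed gain followed by a hard limiter in place of the compressing amplifier, and example values of 8 kHz sampling, 1 kHz and 3 kHz cut-offs, gain 4 and limit 1. None of these choices, nor the function names, come from the patent.

```python
import math


def high_pass(x, cutoff_hz, fs_hz):
    """First-order high-pass filter (assumed order, for illustration)."""
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    dt = 1.0 / fs_hz
    a = rc / (rc + dt)
    y = []
    prev_x = prev_y = 0.0
    for i, xi in enumerate(x):
        yi = xi if i == 0 else a * (prev_y + xi - prev_x)
        y.append(yi)
        prev_x, prev_y = xi, yi
    return y


def low_pass(x, cutoff_hz, fs_hz):
    """First-order low-pass filter (assumed order, for illustration)."""
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    dt = 1.0 / fs_hz
    b = dt / (rc + dt)
    y = []
    prev_y = 0.0
    for xi in x:
        prev_y = prev_y + b * (xi - prev_y)
        y.append(prev_y)
    return y


def amplify_and_limit(x, gain, limit):
    """Amplify, then clip whatever exceeds the limiting level."""
    return [max(-limit, min(limit, gain * xi)) for xi in x]


def enhance(x, fs_hz=8000.0, hp_cutoff_hz=1000.0, lp_cutoff_hz=3000.0,
            gain=4.0, limit=1.0):
    """High-pass -> amplify and limit -> low-pass, with assumed parameters."""
    boosted = amplify_and_limit(high_pass(x, hp_cutoff_hz, fs_hz), gain, limit)
    return low_pass(boosted, lp_cutoff_hz, fs_hz)
```

In a noise-adaptive variant, the cut-off frequencies and the gain would be chosen from a measured noise level rather than fixed as they are here.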

FIELD: physics; acoustics.

SUBSTANCE: invention relates to suppressing noise in speech signals. The method and system estimate clean speech values using an alternative sensor signal received from a sensor other than an air conduction microphone. The estimation uses the alternative sensor signal either alone or together with the air conduction microphone signal. The clean speech value is estimated without using a model trained on noisy training data collected from an air conduction microphone. In one embodiment, correction vectors are added to a vector formed from the alternative sensor signal to form a filter, which is applied to the air conduction microphone signal to produce the clean speech estimate. In other embodiments, the pitch of the speech signal is determined from the alternative sensor signal and is used to decompose the air conduction microphone signal. The decomposed signal is then used to determine the clean signal estimate.

EFFECT: optimum estimation of clean speech values under conditions where the alternative sensor signal differs from the air conduction microphone signal.

15 cl, 11 dwg

FIELD: physics; communications.

SUBSTANCE: invention relates to removal of noise from speech signals received by hand-held mobile devices. The hand-held mobile device with multi-sensory speech enhancement comprises an air conduction microphone, which converts acoustic waves into an electrical microphone signal representing a speech frame; at least one alternative sensor, which uses bone conduction and outputs an electrical alternative sensor signal representing the same speech frame; and a processor, which uses the microphone signal and the alternative sensor signal to estimate a clean speech value for the speech frame. The device can also include a proximity sensor, separate from the air conduction microphone, which indicates the distance from the mobile device to an object, and a clean signal estimator, which uses the microphone signal, the alternative sensor signal and the proximity sensor signal to remove noise from the microphone signal and thereby obtain an enhanced clean speech signal.

EFFECT: removal of noise from speech signals received by hand-held mobile devices, and generation of sound that takes the noise level into account to provide clean speech.

29 cl, 16 dwg

FIELD: physics; acoustics.

SUBSTANCE: invention relates to a method for synthesising a monophonic sound signal based on an existing encoded multichannel sound signal. The encoded multichannel sound signal contains separate parameter values for each channel of the multichannel sound signal for at least the upper frequency band, and the parameter values of several channels are combined in the parameter domain. For at least one parameter, the combination of parameter values is controlled based on information on the corresponding activity in the said several channels. The combined parameter values are then used to synthesise the monophonic sound signal. The invention also relates to the corresponding sound decoder and the corresponding encoding system.

EFFECT: reduced computing load necessary for synthesising a monophonic sound signal based on an encoded multichannel sound signal.

18 cl, 9 dwg

FIELD: physics; acoustics.

SUBSTANCE: invention relates to processing broadband voice signals. According to one embodiment, the broadband voice encoder includes a narrow-band encoder and a high frequency band encoder. The narrow-band encoder encodes the narrow-band part of the broadband voice signal as a set of filter parameters and a corresponding encoded excitation signal. The high frequency band encoder encodes the high frequency band part of the broadband voice signal in accordance with a high frequency band excitation signal to obtain a set of filter parameters. The high frequency band encoder generates the high frequency band excitation signal by applying a nonlinear function to a signal based on the encoded narrow-band excitation signal, which produces a spectrally extended signal.

EFFECT: extension of a narrow-band voice signal to support transmission and/or storage of broadband voice signals when transmission capacity increases.

40 cl, 45 dwg
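
The idea in the entry above, that a nonlinear function applied to a narrow-band signal produces a spectrally extended signal, can be checked with a small Python sketch. The absolute value is used here only as one example of a memoryless nonlinearity; the entry does not name the function, and `spectrally_extend` and `dft_magnitude` are hypothetical helper names.

```python
import cmath
import math


def dft_magnitude(x, k):
    """Magnitude of DFT bin k of the sequence x (direct summation)."""
    n = len(x)
    return abs(sum(x[i] * cmath.exp(-2j * math.pi * k * i / n)
                   for i in range(n)))


def spectrally_extend(signal):
    """Apply a memoryless nonlinearity (assumption: absolute value)."""
    return [abs(s) for s in signal]


# A pure tone in bin 4 of a 64-point frame has no energy in bin 8 ...
N, K0 = 64, 4
narrow = [math.sin(2.0 * math.pi * K0 * i / N) for i in range(N)]
# ... but its rectified version does, at twice the original frequency.
extended = spectrally_extend(narrow)

if __name__ == "__main__":
    print("bin 8 before:", dft_magnitude(narrow, 2 * K0))
    print("bin 8 after: ", dft_magnitude(extended, 2 * K0))
```

The rectified tone carries energy at harmonics that the original tone lacks, which is the property a high frequency band excitation generator relies on.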

FIELD: information technologies.

SUBSTANCE: a wideband speech coder according to one embodiment includes a filter bank with a low-band processing path and a high-band processing path. The processing paths have overlapping frequency responses. A narrowband speech coder is configured to code the speech signal produced by the low-band processing path according to a first coding methodology. A high-band speech coder is configured to code the speech signal produced by the high-band processing path according to a second coding methodology, which differs from the first.

EFFECT: improved quality of wideband speech signal coding.

33 cl, 58 dwg

FIELD: physics.

SUBSTANCE: a method and device for estimating clean speech values determine the channel response of an alternative sensor using the alternative sensor signal and an air conduction microphone signal. The channel response is then used to estimate the clean speech value from at least part of the alternative sensor signal.

EFFECT: optimum estimation of speech signal value when noise conditions of test signals are matched with noise conditions of training signals.

26 cl, 7 dwg
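
The two steps in the entry above, estimating the channel response of the alternative sensor and then using it to estimate clean speech, can be sketched for a single frequency bin. The entry does not state how the response is computed; the least-squares fit over noise-free frames and the simple division used below are assumptions, as are the function names.

```python
def estimate_channel_response(alt_frames, mic_frames):
    """Least-squares fit of H in alt = H * mic for one frequency bin.

    Assumption: the frames are complex spectral values taken while the
    air conduction microphone signal is essentially noise-free.
    """
    num = sum(b * y.conjugate() for b, y in zip(alt_frames, mic_frames))
    den = sum(abs(y) ** 2 for y in mic_frames)
    if den == 0:
        raise ValueError("microphone frames carry no energy in this bin")
    return num / den


def estimate_clean_speech(alt_value, h):
    """Undo the channel: clean speech estimate from the alternative sensor."""
    if h == 0:
        raise ValueError("channel response is zero in this bin")
    return alt_value / h
```

With a response `h` fitted on quiet frames, `estimate_clean_speech` can then be applied to alternative sensor values from noisy frames, where the microphone signal alone would be unreliable.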
