RussianPatents.com

Method and apparatus for processing audio signal for speech enhancement using required feature extraction function. RU patent 2507608.

IPC classes for Russian patent RU 2507608:

G10L21/02 - Speech enhancement, e.g. noise reduction or echo cancellation (reducing echo effects in line transmission systems H04B0003200000; echo suppression in hand-free telephones H04M0009080000)
Other patents in the same IPC classes:
Apparatus and method for generating synthesis audio signal and for encoding audio signal / 2501097
Apparatus for generating a synthesis audio signal includes a first converter for converting an audio signal on a time interval into a spectral representation; a spectral domain patch generator for performing a plurality of different spectral domain patching algorithms, wherein each patching algorithm generates a modified spectral representation comprising spectral components in an upper frequency band derived from corresponding spectral components in a core frequency band of the audio signal, and for selecting a first patching algorithm for a first time interval and a second patching algorithm for a second, different time interval in accordance with a patching control signal; a high-frequency reconstruction manipulator for manipulating the modified spectral representation to obtain a bandwidth extended signal; and a combiner for combining the audio signal having spectral components in the core frequency band with the bandwidth extended signal to obtain the synthesis audio signal.
Speech encoder, speech decoder, speech encoding method, speech decoding method, speech encoding program and speech decoding program / 2498422
A linear prediction coefficient of a signal represented in the frequency domain is obtained by performing frequency analysis with linear prediction using a covariance method or an autocorrelation method. Once the filter power of the obtained linear prediction coefficient is corrected, the signal is frequency filtered using the corrected coefficient, thereby forming the time envelope of the signal.
Speech encoder, speech decoder, speech encoding method, speech decoding method, speech encoding program and speech decoding program / 2498421
A linear prediction coefficient of a signal represented in the frequency domain is obtained by performing frequency analysis with linear prediction using a covariance method or an autocorrelation method. Once the filter power of the obtained linear prediction coefficient is corrected, the signal is frequency filtered using the corrected coefficient, thereby forming the time envelope of the signal.
Speech encoder, speech decoder, speech encoding method, speech decoding method, speech encoding program and speech decoding program / 2498420
A linear prediction coefficient of a signal represented in the frequency domain is obtained by performing frequency analysis with linear prediction using a covariance method or an autocorrelation method. Once the filter power of the obtained linear prediction coefficient is corrected, the signal is frequency filtered using the corrected coefficient, thereby forming the time envelope of the signal.
Apparatus and method of calculating control parameters of echo suppression filter and apparatus and method of calculating delay value / 2495506
Apparatus (200) for calculating control parameters of a noise filter (210), designed to filter a second audio signal in order to eliminate an echo signal based on a first audio signal, includes a computer (220) having a value determiner (230) for calculating at least one energy factor for a band-pass signal of at least two time-consecutive data units of at least one signal from a group of signals. The computer (220) also includes an average value determiner (250) for determining at least one average value of at least one calculated energy factor for the band-pass signal. The computer (220) also includes a modifier (260) for correcting at least one energy factor for the band-pass signal based on the calculated average value for the band-pass signal. The computer (220) also includes a device for calculating control parameters (270) for the suppression filter (210) based on at least one corrected energy factor.
Cross product-enhanced harmonic transformation / 2495505
Described is a system and a method of generating a high-frequency signal component from a low-frequency signal component. The system includes an analysing filter unit which forms a set of signals of the analysed subbands of the low-frequency signal component. The system also includes a linear processing unit for generating a signal of the synthesised subband with the synthesised frequency by modifying the phase of the first and second signals of the analysed subbands from the set of signals of the analysed subbands and combining the signals of the analysed subbands with the modified phase. Ultimately, the system includes a synthesis filter unit for generating a high-frequency signal component from a signal of the synthesised subband.
Oversampling in combined transposer filter bank / 2494478
System comprises an analysis filter bank (501) comprising an analysis transformation unit (601) having a frequency resolution of Δf; and an analysis window (611) having a duration of DA; the analysis filter bank (501) being configured to provide a set of analysis subband signals from the low frequency component of the signal; a nonlinear processing unit (502, 650) configured to determine a set of synthesis subband signals based on a portion of the set of analysis subband signals, wherein the portion of the set of analysis subband signals is phase shifted by a transposition order T; and a synthesis filter bank (504) comprising a synthesis transformation unit (602) having a frequency resolution of QΔf; and a synthesis window (612) having a duration of Ds ; the synthesis filter bank (504) being configured to generate the high frequency component of the signal from the set of synthesis subband signals; wherein Q is a frequency resolution factor with Q≥1 and smaller than the transposition order T; and wherein the value of the product of the frequency resolution Δf and the duration DA of the analysis filter bank is selected based on the frequency resolution factor Q.
Apparatus and method of generating bandwidth extension output data / 2494477
Apparatus (100) for generating bandwidth extension output data (102) for an audio signal (105) has noise floor measuring device (110), a signal energy characteristic (120) and a processor (130). The audio signal (105) has components in a first frequency band (105a) and components in a second frequency band (105b), the bandwidth extension output data (102) are adapted to control a synthesis of the components in the second frequency band (105b). The noise floor measuring device (110) measures noise floor data (115) of the second frequency band (105b) for a time portion (T) of the audio signal (105). The signal energy characteristic (120) derives energy distribution data (125), the energy distribution data (125) characterising an energy distribution in a spectrum of the time portion (T) of the audio signal (105). The processor (130) combines the noise floor data (115) and the energy distribution data (125) to obtain the bandwidth extension output data (102).
Audio signal synthesiser and audio signal encoder / 2491658
Audio signal synthesiser generates a synthesis audio signal having a first frequency band and a second synthesised frequency band derived from the first frequency band, and includes: a patch generator, a spectral converter, a primary signal processor and a combiner. The patch generator performs at least two different patching algorithms, wherein each patching algorithm generates a primary signal having signal components in the second synthesised frequency band using an audio signal having signal components in the first frequency band. The patch generator is adapted to select one of the at least two different patching algorithms depending on control information on a first time portion and a second patching algorithm depending on the control information on a second time portion different from the first time portion to obtain the primary signal for the first and the second time portion. The spectral converter converts the primary signal into a primary signal spectral representation. The primary signal processor generates the corresponding primary signal spectral representation depending on spectral domain spectral band replication parameters. The combiner combines an audio signal having signal components in the first group or a signal derived from the audio signal with the primary signal spectral representation or with a further signal derived from the primary signal spectral representation to obtain the synthesis audio signal.
Apparatus and method for calculating number of spectral envelopes / 2487428
Apparatus for calculating the number of spectral envelopes includes: a quantisation threshold calculator; a detector for detecting violation of the threshold value using the quantisation threshold; a processor for determining a first envelope border between the pair of neighbouring time portions; a processor for determining a second envelope border between a different pair of neighbouring time portions; a number processor for establishing the number of spectral envelopes having the first envelope border and the second envelope border; a switching decision unit formed to provide a decision switching signal; the decision switching signal yields a speech-like audio signal and a common sound-like audio signal, where the detector is capable of lowering the threshold value for speech-like audio signals. The method describes operation of said apparatus.
Method for recognizing spoken control commands / 2271578
During transformation of a spoken command, a first circular buffer is continuously filled with the digitized signal, a comb of recursive filters is applied to the multiply attenuated signal and the resulting spectral components are used to fill a second circular buffer, the limits of the speech fragment are determined within it on the basis of an adaptive estimate of the noise environment, the spectral components of the speech fragment are transferred to a linear analysis buffer, a reduced feature space is obtained from that buffer, and the resulting spectral components are compared with the reference vectors of the command database.
Method and device for frequency-selective pitch extraction of synthetic speech / 2327230
The invention pertains to a method and device for post-processing a decoded sound signal. The decoded signal is divided into a set of frequency sub-band signals. Post-processing is applied to at least one of the frequency sub-band signals. After at least one of the sub-band signals has been processed, the sub-band signals are summed to form an output decoded sound signal, which is passed on for further processing. In this way, processing is localised in the required sub-band or sub-bands, leaving the other sub-bands practically unchanged.
Method and device for enhancement of voice signal in presence of background noise / 2329550
Said utility invention relates to voice signal enhancement technique for enhancement of communication in the presence of background noise. In one invention version, a method for suppressing noise in the voice signal is proposed, which, for a voice signal having a spectral representation in the form of a set of frequency bins, involves the determination of a transmission scale factor for at least some of the said frequency bins, and the calculation of smoothed transmission scale factors. The calculation of smoothed transmission scale factors involves, at least for some of the frequency bins, combining the current value of the transmission scale factor and the smoothed transmission scale factor determined previously. The other invention version involves the separation of the frequency bin set into the first set of adjacent frequency bins and the second set of adjacent frequency bins with a border frequency between them, which separates the areas of application of various noise suppression methods, the change in the border frequency value being a function of the spectral structure of the voice signal.
System and method of sound signal processing / 2347282
The invention relates to digital processing of an audio signal. The sound reproduction system comprises a digital audio signal input device (1), a digital audio signal processor (2, DSP) and a digital audio signal output device (3). The processor (2, DSP) comprises a high-pass filter (21) with a pass band (f) between first and second frequencies, for example between 300 Hz and 2 kHz, a compressing amplifier (22) for compressing and amplifying the signal to the required amplitude limits, a limiter for limiting a signal that exceeds the limiting level and, preferably, a low-pass filter (23) for filtering the signal delivered by the compressing amplifier and providing the output signal. The pass band (f) of the low-pass filter lies within 2 kHz to Fs/2, where Fs is the sampling frequency and Fs/2 can be, for example, 4 kHz. Parameters of the various devices, for example the cut-off frequency, the order of the band-pass filter and the gain, preferably depend on the measured noise level.
Method and device for increasing speech intelligibility using several sensors / 2373584
The invention relates to suppressing noise in speech signals. The method and system estimate a clean speech value using an alternative sensor signal received from a sensor other than an air conduction microphone. During estimation, the alternative sensor signal is used either alone or together with the air conduction microphone signal. The clean speech value is estimated without using a model trained on noisy training data collected from the air conduction microphone. In one embodiment, correction vectors are added to a vector formed from the alternative sensor signal to form a filter, which is applied to the air conduction microphone signal to produce the clean speech estimate. In other embodiments, the fundamental pitch of the speech signal is determined from the alternative sensor signal and is used to decompose the air conduction microphone signal. The decomposed signal is then used to determine the clean speech estimate.
Method for multi-sensory speech enhancement on mobile hand-held device and mobile hand-held device / 2376722
Invention relates to removal of noise from speech signals received by hand-held mobile devices. The mobile hand-held device with multi-sensory speech enhancement comprises an air conduction microphone which converts acoustic waves to a microphone electrical signal which indicates the speech frame, at least one alternative sensor which uses bone conduction and gives out an electrical signal of the alternative sensor, indicating the said speech frame, and a processor which uses the microphone signal and the signal of the alternative sensor to evaluate value of clear speech for the speech frame. The mobile hand-held device can also include a proximity sensor, separate from the air conduction microphone, which indicates distance from the mobile device to the object, and a unit for evaluating a clear signal which uses the microphone signal, signal of the said alternative sensor and proximity sensor to remove noise from the microphone signal and thereby obtaining an amplified clear speech signal.
Synthesisation of monophonic sound signal based on encoded multichannel sound signal / 2381571
The invention relates to a method for synthesising a monophonic sound signal based on an existing encoded multichannel sound signal. The encoded multichannel sound signal contains separate parameter values for each channel of the multichannel sound signal for at least the upper frequency band, where parameter values of several channels are combined in the parameter domain. The combination of parameter values is controlled for at least one parameter based on information on the corresponding activity in the said several channels. After that, the combined parameter values are used to synthesise a monophonic sound signal. The invention also relates to the corresponding sound decoder and the corresponding encoding system.
Systems, methods and device for broadband voice encoding / 2381572
The invention relates to processing broadband voice signals. According to one embodiment, the broadband voice encoder includes a narrow-band encoder and a high frequency band encoder. The narrow-band encoder encodes the narrow-band part of the broadband voice signal as a set of filter parameters and the corresponding encoded driving signal. The high frequency band encoder encodes part of the high frequency band of the broadband voice signal in accordance with the high frequency band signal to obtain a set of filter parameters. The high frequency band encoder generates a high frequency band signal by applying a nonlinear function to the signal based on the encoded narrow-band driving signal to generate a spectrally spread signal.
Method and device for coding of voice signals with strip splitting / 2386179
A wideband speech coder, according to one embodiment, includes a filter bank having a low-frequency band processing path and a high-frequency band processing path. The processing paths have overlapping frequency responses. A narrowband speech coder is configured to encode the speech signal produced by the low-frequency band processing path according to a first coding methodology. The wideband speech coder is configured to encode the speech signal produced by the high-frequency band processing path according to a second coding methodology, which differs from the first.
Method and device for enhancing speech using several sensors / 2389086
Method and device for estimating speech signal values determine channel response of an alternative sensor using an alternative sensor signal and an air conduction microphone signal. The channel response is then used to estimate the clean speech value using at least part of the alternative sensor signal.

FIELD: information technology.

SUBSTANCE: apparatus for processing an audio signal to obtain control information for a speech enhancement filter (12) comprises a feature extractor (14) for extracting at least one feature in the frequency band of a plurality of frequency bands of a short-time spectral representation of a plurality of short-time spectral representations, where the at least one feature represents a spectral shape of the short-time spectral representation in the frequency band. The apparatus further comprises a feature combiner (15) for combining the at least one feature for each frequency band using combination parameters to obtain the control information for the speech enhancement filter for a time portion of the audio signal. The feature combiner can use a neural network regression method, which is based on combination parameters determined in a training phase for the neural network.

EFFECT: speech enhancement.

17 cl, 10 dwg

 

Field of the invention

The present invention relates to the field of audio signal processing and, in particular, to speech enhancement of audio signals, so that the processed signal has speech content with improved objective or subjective speech intelligibility.

Background of the invention and prior art

Speech enhancement is used in various applications. A prominent application is the use of digital signal processing in hearing aids. Digital signal processing in hearing aids offers new, effective means for the rehabilitation of hearing impairment. Apart from higher acoustic signal quality, digital hearing aids allow specific speech processing strategies to be implemented. For many of these strategies, an estimate of the speech-to-noise ratio (SNR) of the acoustic environment is desirable. In particular, applications are considered in which complex speech processing algorithms are optimized for a specific acoustic environment, but such algorithms may fail in situations that do not meet the specific assumptions. This holds especially for noise reduction schemes, which may introduce processing artifacts in quiet environments or in situations where the SNR is below a certain threshold. The optimal choice of parameters for compression algorithms and amplification may depend on the speech-to-noise ratio, so that adapting the parameter set to the SNR estimates helps in realizing the benefit. In addition, SNR estimates can be used directly as control parameters for noise reduction schemes such as Wiener filtering or spectral subtraction.

Other applications are in the field of speech enhancement of movie sound. It has been found that many people have difficulty understanding the speech content of a movie, for example because of hearing impairments. To follow the plot of a movie, it is important to understand the relevant speech of the audio track, for example monologues, dialogues, announcements and narration. People who are hard of hearing often find that background sounds, such as ambient noise and music, are presented at too high a level relative to the speech. In this case it is desirable to raise the level of the speech signals and attenuate the background sounds or, more generally, to raise the level of the speech signal relative to the overall level.

A well-known approach to speech enhancement is spectral weighting, also known as short-term spectral attenuation, illustrated in figure 3. The output signal y[k] is computed by attenuating the sub-band signals X(ω) of the input signal x[k] depending on the noise energy within the sub-band signals.

In the following, the input signal x[k] is assumed to be an additive mixture of the desired speech signal s[k] and background noise b[k].

$$x[k] = s[k] + b[k] \qquad (1)$$

Speech enhancement is the improvement of the objective intelligibility and/or the subjective quality of speech.

A frequency-domain representation of the input signal is computed using the short-term Fourier transform (STFT), another time-frequency transform or a filter bank, as indicated at 30. The input signal is then filtered in the frequency domain according to equation 2, where the frequency response G(ω) of the filter is computed so that the noise energy is reduced. The output signal is computed by the inverse of the time-frequency transform or filter bank, respectively.

$$Y(\omega) = G(\omega)\,X(\omega) \qquad (2)$$
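The chain of transform, weighting and inverse transform described above can be sketched for a single frame as follows. This is only an illustrative sketch: the function names `dft`, `idft` and `spectral_weighting` are hypothetical, a naive discrete Fourier transform stands in for the STFT or filter bank, and windowing and overlap-add between frames are left out.

```python
import cmath
import math


def dft(frame):
    """Naive discrete Fourier transform of one real-valued frame."""
    n = len(frame)
    return [
        sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def idft(spectrum):
    """Inverse of dft(); returns the real part of the time-domain frame."""
    n = len(spectrum)
    return [
        (sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n).real
        for t in range(n)
    ]


def spectral_weighting(frame, gains):
    """Apply Y(w) = G(w) X(w) to one frame and return the time-domain result."""
    spectrum = dft(frame)
    weighted = [g * x for g, x in zip(gains, spectrum)]
    return idft(weighted)


# Assumed toy input: one 16-sample frame holding two periods of a sine wave.
frame = [math.sin(2 * math.pi * 2 * t / 16) for t in range(16)]
unchanged = spectral_weighting(frame, [1.0] * 16)  # unit gains leave the frame as is
halved = spectral_weighting(frame, [0.5] * 16)     # uniform gain 0.5 halves the amplitude
```

In practice the gains differ from bin to bin, which is what attenuates noisy sub-bands more strongly than the others.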

Appropriate spectral weights G(ω) are computed at 31 for each spectral value using the input signal spectrum X(ω) and an estimate of the noise spectrum $\hat{B}(\omega)$ or, equivalently, using an estimate of the linear sub-band SNR $\hat{R}(\omega) = \hat{S}(\omega) / \hat{B}(\omega)$. The weighted spectral values are transformed back to the time domain at 32. Prominent examples of noise suppression rules are spectral subtraction [S.Boll, "Suppression of acoustic noise in speech using spectral subtraction", IEEE Trans. on Acoustics, Speech, and Signal Processing, vol.27, no.2, pp.113-120, 1979] and Wiener filtering. Assuming that the input signal is an additive mixture of the speech and noise signals and that speech and noise are uncorrelated, the gain values for the spectral subtraction method are given in equation 3.

$$G(\omega) = 1 - \frac{|\hat{B}(\omega)|^{2}}{|X(\omega)|^{2}} \qquad (3)$$

Similar weights are derived from estimates of the linear sub-band SNR $\hat{R}(\omega)$ according to equation 4.

$$G(\omega) = \frac{\hat{R}(\omega)}{\hat{R}(\omega) + 1} \qquad (4)$$
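As a small numerical illustration, the two gain rules can be evaluated side by side in the form in which equations 3 and 4 are given above. The sketch assumes that the SNR is taken as a ratio of powers and that the input power is the sum of speech and noise power (uncorrelated signals); the function names and the flooring of the gain at zero are illustrative additions, not part of the source.

```python
def spectral_subtraction_gain(noise_power, input_power):
    """Equation 3: G = 1 - |B^|^2 / |X|^2, floored at zero (assumed safeguard)."""
    if input_power <= 0.0:
        return 0.0
    return max(0.0, 1.0 - noise_power / input_power)


def snr_gain(snr):
    """Equation 4: G = R^ / (R^ + 1) for a linear sub-band SNR R^ >= 0."""
    return snr / (snr + 1.0)


# Assumed example values for one sub-band.
speech_power = 3.0
noise_power = 1.0
input_power = speech_power + noise_power  # additive, uncorrelated mixture

g3 = spectral_subtraction_gain(noise_power, input_power)
g4 = snr_gain(speech_power / noise_power)
```

Under these assumptions both rules yield the same gain, which is why the text calls the weights similar: a sub-band with high SNR is passed almost unchanged, while a noise-dominated sub-band is strongly attenuated.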

Various extensions of spectral subtraction have been proposed in the past, namely the use of an over-subtraction factor and a spectral floor parameter [M.Berouti, R.Schwartz, J.Makhoul, "Enhancement of speech corrupted by acoustic noise", Proc. of the IEEE Int. Conf. on Acoustics, Speech, and Signal Processing, ICASSP, 1979], generalized forms [J.Lim, A.Oppenheim, "Enhancement and bandwidth compression of noisy speech", Proc. of the IEEE, vol.67, no.12, pp.1586-1604, 1979], the use of perceptual criteria (for example, N.Virag, "Single channel speech enhancement based on masking properties of the human auditory system", IEEE Trans. Speech and Audio Proc., vol.7, no.2, pp.126-137, 1999) and multi-band spectral subtraction (for example, S.Kamath, P.Loizou, "A multi-band spectral subtraction method for enhancing speech corrupted by colored noise", Proc. of the IEEE Int. Conf. Acoust. Speech Signal Processing, 2002). However, the crucial part of a spectral weighting method is the estimation of the instantaneous noise spectrum or of the sub-band SNR, which is prone to errors, especially if the noise is non-stationary. Estimation errors result in residual noise, distortion of the speech components or musical noise (an artifact described as having a melody-like tonal quality [P.Loizou, Speech Enhancement: Theory and Practice, CRC Press, 2007]). A simple approach to noise estimation is to measure and average the noise spectrum during speech pauses. This approach does not yield satisfactory results if the noise spectrum changes over time during speech activity and if the detection of speech pauses fails. Methods for estimating the noise spectrum even during speech activity have been proposed in the past and can be classified, according to P.Loizou, Speech Enhancement: Theory and Practice, CRC Press, 2007, as

- Minimum tracking algorithms.

- Time-recursive averaging algorithms.

- Histogram-based algorithms.

Estimation of the noise spectrum using minimum statistics was proposed in R.Martin, "Spectral subtraction based on minimum statistics", Proc. of EUSIPCO, Edinburgh, UK, 1994. The method is based on tracking the local minima of the signal energy in each sub-band. A nonlinear update rule for the noise estimate, with faster updating, was proposed in G.Doblinger, "Computationally Efficient Speech Enhancement By Spectral Minima Tracking In Subbands", Proc. of Eurospeech, Madrid, Spain, 1995.
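The idea of tracking local minima of the sub-band energy can be sketched as below. This is a deliberately simplified illustration, not the algorithm of either cited paper: the function name, the window length and the smoothing constant are assumptions, and refinements such as bias compensation are omitted.

```python
from collections import deque


def track_noise_minimum(band_powers, window=5, smoothing=0.8):
    """Estimate the noise power in one sub-band as the running minimum of the
    recursively smoothed signal power over the last `window` frames."""
    recent = deque(maxlen=window)  # smoothed powers of the most recent frames
    smoothed = None
    estimates = []
    for power in band_powers:
        if smoothed is None:
            smoothed = power
        else:
            smoothed = smoothing * smoothed + (1.0 - smoothing) * power
        recent.append(smoothed)
        estimates.append(min(recent))
    return estimates


# Assumed toy data: noise power 1.0 with a two-frame speech burst of power 9.0.
powers = [1.0, 1.0, 9.0, 9.0, 1.0, 1.0]
noise = track_noise_minimum(powers, window=4, smoothing=0.0)
```

Because the window is longer than the speech burst, the minimum stays at the noise level throughout; a window that is too short would let the estimate rise during speech.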

Time-recursive averaging algorithms estimate and update the noise spectrum whenever the estimated SNR in a particular frequency band is very low. This is done by recursively computing a weighted average of the previous noise estimate and the current spectrum. The weights are determined as a function of the probability that speech is present or as a function of the estimated SNR in the particular frequency band, for example as described in I.Cohen, "Noise estimation by minima controlled recursive averaging for robust speech enhancement", IEEE Signal Proc. Letters, vol.9, no.1, pp.12-15, 2002, and in L.Lin, W.Holmes, E.Ambikairajah, "Adaptive noise estimation algorithm for speech enhancement", Electronic Letters, vol.39, no.9, pp.754-755, 2003.
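A minimal sketch of such a recursive update for one sub-band is shown below. It uses a hard threshold on a crude SNR proxy (current frame power divided by the previous noise estimate) instead of the continuous weighting functions of the cited papers; the function name, the threshold and the averaging constant `alpha` are assumptions made for illustration.

```python
def update_noise_estimate(noise_prev, frame_power, snr_threshold=2.0, alpha=0.9):
    """Return the updated noise power for one sub-band.

    The estimate is a weighted average of the previous estimate and the
    current frame power, and is only updated while the SNR proxy is low.
    """
    snr = frame_power / noise_prev if noise_prev > 0.0 else float("inf")
    if snr >= snr_threshold:
        return noise_prev  # speech is likely present: keep the old estimate
    return alpha * noise_prev + (1.0 - alpha) * frame_power


# Assumed toy sequence: a noise frame, a loud speech frame, another noise frame.
noise = 1.0
for power in [1.2, 10.0, 0.8]:
    noise = update_noise_estimate(noise, power, snr_threshold=2.0, alpha=0.5)
```

The loud middle frame leaves the estimate untouched, while the two quiet frames pull it towards the observed power.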

Histogram-based methods rely on the assumption that the histogram of the sub-band energy is very often bimodal. A large low-energy mode accumulates the energy values of segments without speech or of low-energy speech segments. The high-energy mode accumulates the energy values of segments with speech and noise. The noise energy in a particular sub-band is determined from the low-energy mode [H.Hirsch, C.Ehrlicher, "Noise estimation techniques for robust speech recognition", Proc. of the IEEE Int. Conf. on Acoustics, Speech, and Signal Processing, ICASSP, Detroit, USA, 1995]. For a comprehensive recent review, see P.Loizou, Speech Enhancement: Theory and Practice, CRC Press, 2007.
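The following sketch illustrates reading the noise level from the low-energy mode of a sub-band energy histogram. It is a simplified stand-in rather than the cited method: the function name, the bin width and the rule of searching only the lower half of the observed energy range are assumptions.

```python
def noise_from_histogram(band_energies_db, bin_width=2.0):
    """Estimate the noise level of one sub-band as the centre of the most
    populated histogram bin in the lower half of the observed energy range."""
    lo, hi = min(band_energies_db), max(band_energies_db)
    if hi - lo < bin_width:
        return lo  # no clear second mode: fall back to the lowest observed energy
    midpoint = (lo + hi) / 2.0
    counts = {}
    for energy in band_energies_db:
        if energy <= midpoint:  # only the low-energy side is searched
            index = int((energy - lo) // bin_width)
            counts[index] = counts.get(index, 0) + 1
    best = max(counts, key=counts.get)
    return lo + (best + 0.5) * bin_width


# Assumed toy data in dB: a cluster near 10 dB (noise) and one near 30 dB (speech).
energies = [10.0, 10.5, 11.0, 10.2, 30.0, 31.0, 29.5, 10.8, 12.5]
noise_level = noise_from_histogram(energies)
```

The result is the centre of the bin that collects most of the low-energy frames, so isolated speech frames do not bias the estimate.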

Methods for estimating the sub-band SNR based on supervised learning using amplitude modulation features are described in J.Tchorz, B.Kollmeier, "SNR Estimation based on amplitude modulation analysis with applications to noise suppression", IEEE Trans. On Speech and Audio Processing, vol.11, no.3, pp.184-192, 2003, and in M.Kleinschmidt, V.Hohmann, "Sub-band SNR estimation using auditory feature processing", Speech Communication: Special Issue on Speech Processing for Hearing Aids, vol.39, pp.47-64, 2003.

Other approaches to speech enhancement are pitch-synchronous filtering (for example, R.Frazier, S.Samsam, L.Braida, A.Oppenheim, "Enhancement of speech by adaptive filtering", Proc. of the IEEE Int. Conf. on Acoustics, Speech, and Signal Processing, ICASSP, Philadelphia, USA, 1976), filtering of spectro-temporal modulations (STM) (for example, N.Mesgarani, S.Shamma, "Speech enhancement based on filtering the spectro-temporal modulations", Proc. of the IEEE Int. Conf. on Acoustics, Speech, and Signal Processing, ICASSP, Philadelphia, USA, 2005) and filtering based on a sinusoidal model representation of the input signal (for example, J.Jensen, J.Hansen, "Speech enhancement using a constrained iterative sinusoidal model", IEEE Trans. on Speech and Audio Processing, vol.9, no.7, pp.731-740, 2001).

The methods for estimating the sub-band SNR based on supervised learning using amplitude modulation features, reported in J.Tchorz, B.Kollmeier, "SNR Estimation based on amplitude modulation analysis with applications to noise suppression", IEEE Trans. On Speech and Audio Processing, vol.11, no.3, pp.184-192, 2003, and in M.Kleinschmidt, V.Hohmann, "Sub-band SNR estimation using auditory feature processing", Speech Communication: Special Issue on Speech Processing for Hearing Aids, vol.39, pp.47-64, 2003, have the disadvantage that two spectrogram processing steps are required. The first step generates a time/frequency spectrogram of the time-domain audio signal. Then, in order to generate the modulation spectrogram, a further time/frequency transform is required, which converts the spectral information from the spectral domain into the modulation domain. Owing to the inherent systematic delay and the time/frequency resolution trade-off inherent in any transform algorithm, this additional transform operation entails a number of problems.

A further consequence of this procedure is that the noise estimate is not very accurate in conditions where the noise is non-stationary and where various noise signals may occur.

Summary of the invention

The object of the present invention is to provide an improved concept for speech enhancement.

In accordance with a first aspect, this object is achieved by an apparatus for processing an audio signal to obtain control information for a speech enhancement filter, comprising: a feature extractor for obtaining a time sequence of short-time spectral representations of the audio signal and for extracting at least one feature in each frequency band of a plurality of frequency bands for a plurality of short-time spectral representations, the at least one feature representing the spectral shape of a short-time spectral representation in a frequency band of the plurality of frequency bands; and a feature combiner for combining the at least one feature for each frequency band using combination parameters to obtain the control information for the speech enhancement filter for a time portion of the audio signal.

In accordance with a second aspect, this object is achieved by a method of processing an audio signal to obtain control information for a speech enhancement filter, comprising: obtaining a time sequence of short-time spectral representations of the audio signal; extracting at least one feature in each frequency band of a plurality of frequency bands for a plurality of short-time spectral representations, the at least one feature representing the spectral shape of a short-time spectral representation in a frequency band of the plurality of frequency bands; and combining the at least one feature for each frequency band using combination parameters to obtain the control information for the speech enhancement filter for a time portion of the audio signal.

In accordance with a third aspect, this object is achieved by an apparatus for speech enhancement in an audio signal, comprising: an apparatus for processing the audio signal to obtain filter control information for a plurality of bands representing a time portion of the audio signal; and a controllable filter, the filter being controllable so that a band of the audio signal is variably attenuated with respect to a different band on the basis of the control information.

In accordance with a fourth aspect, this object is achieved by a method of speech enhancement in an audio signal, comprising: a method of processing the audio signal to obtain filter control information for a plurality of bands representing a time portion of the audio signal; and controlling a filter so that a band of the audio signal is variably attenuated with respect to a different band on the basis of the control information.

In accordance with a sixth aspect, this object is achieved by a method of training a feature combiner for determining combination parameters of the feature combiner, comprising: obtaining a time sequence of short-time spectral representations of a training audio signal for which the control information for a speech enhancement filter per frequency band is known; extracting at least one feature in each frequency band of the plurality of frequency bands for a plurality of short-time spectral representations, the at least one feature representing the spectral shape of a short-time spectral representation in a frequency band of the plurality of frequency bands; feeding the feature combiner with the at least one feature for each frequency band; calculating the control information using intermediate combination parameters; varying the intermediate combination parameters; comparing the varied control information with the known control information; and updating the intermediate combination parameters when the varied intermediate combination parameters result in control information that better matches the known control information.

In accordance with a seventh aspect, this object is achieved by a computer program for performing, when running on a computer, any one of the inventive methods.

The present invention is based on the finding that band-wise information on the spectral shape of the audio signal within a specific band is a very useful parameter for determining control information for a speech enhancement filter. Specifically, a band-wise spectral shape feature, determined for a plurality of bands and for a plurality of subsequent short-time spectral representations, provides a useful description of the audio signal for speech enhancement processing. In particular, a set of spectral shape features, where each spectral shape feature is associated with one band of a plurality of spectral bands, such as Bark bands or, generally, bands whose bandwidth varies over the frequency range, already provides a useful feature set for determining a signal/noise ratio for each band. To this end, the spectral shape features for the plurality of bands are processed by a feature combiner, which combines these features using combination parameters to obtain the control information for the speech enhancement filter for a time portion of the audio signal for each band. Preferably, the feature combiner includes a neural network controlled by many combination parameters, where these combination parameters are determined in a training phase that is performed before the actual speech enhancement filtering. Specifically, the neural network performs a neural network regression method.

A particular advantage is that the combination parameters can be determined in a training phase using audio material that may differ from the audio material actually to be enhanced, so that the training phase has to be performed only once. After this training phase, the combination parameters are fixed and can be applied to any unknown audio signal containing speech that is comparable to the speech characteristic of the training signals. Such a speech characteristic may be, for example, a language or a group of languages, such as European languages versus Asian languages, etc.

Preferably, the inventive concept estimates the noise by learning the characteristics of speech using feature extraction and neural networks, where the extracted features are straightforward low-level spectral features that can be extracted efficiently and easily and, importantly, without a large system-inherent delay. The inventive concept is therefore particularly useful for providing an accurate noise or SNR estimate, even in situations where the noise is non-stationary and where various noise signals occur.

Brief description of drawings

Preferred embodiments of the invention are subsequently discussed in more detail with reference to the attached drawings, in which:

Fig. 1 is a block diagram of a preferred apparatus or method for processing an audio signal;

Fig. 2 is a block diagram of an apparatus or method for training a feature combiner in accordance with a preferred embodiment of the present invention;

Fig. 3 is a block diagram illustrating a speech enhancement apparatus and method in accordance with a preferred embodiment of the present invention;

Fig. 8 is a flow chart illustrating a preferred implementation of calculating the gain values per frequency value and the subsequent calculation of the speech-enhanced audio signal portion;

Fig. 9 is an example of spectral weighting, showing the input time signal, the estimated sub-band SNR, the estimated SNR in frequency bins after interpolation, the spectral weights, and the processed time signal; and

Fig. 10 is a block diagram of a preferred implementation of the feature combiner using a multi-layer neural network.

Detailed description of the preferred embodiments

Fig. 1 illustrates a preferred apparatus for processing an audio signal 10 to obtain control information 11 for a speech enhancement filter 12. The speech enhancement filter can be implemented in many ways, for example as a controllable filter that filters the audio signal 10 using the control information per frequency band for each of the plurality of frequency bands to obtain a speech-enhanced audio output signal 13. As shown later, the controllable filter can also be implemented as a time/frequency conversion, where individually calculated gain factors are applied to the spectral values or spectral bands, followed by a frequency/time conversion.

The apparatus of Fig. 1 comprises a feature extractor 14 for obtaining a time sequence of short-time spectral representations of the audio signal and for extracting at least one feature in each frequency band of a plurality of frequency bands for a plurality of short-time spectral representations, where the at least one feature represents the spectral shape of a short-time spectral representation in a frequency band of the plurality of frequency bands. In addition, the feature extractor 14 may be implemented to extract other features apart from spectral shape features. At the output of the feature extractor 14, several features per short-time audio spectrum are available, and these features include at least one spectral shape feature for each frequency band of a plurality of at least 10 or, preferably, more frequency bands, for example 20 to 30. These features can be used as they are, or they can be processed by averaging or any other processing, such as the geometric mean, the arithmetic mean, the median, or other statistical moments (e.g. variance, skewness, ...), to obtain a raw or averaged feature for each band, so that all these raw and/or averaged features are input into the feature combiner 15. The feature combiner 15 combines the plurality of spectral shape features and, preferably, additional features using combination parameters, which can be provided via a combination parameter input 16 or which are hard-wired or hard-programmed within the feature combiner 15, so that the combination parameter input 16 is not required. At the output of the feature combiner, the control information for the speech enhancement filter is obtained for each frequency band (or "sub-band") of the plurality of frequency bands for a time portion of the audio signal.

Preferably, the feature combiner 15 is implemented as a neural network regression circuit, but the feature combiner can also be implemented as any other numerically or statistically controlled feature combiner that applies a combination operation to the features output by the feature extractor 14, so that the required control information, such as a band-wise SNR value or a band-wise gain factor, results. In the preferred embodiment using a neural network, a training phase is needed ("training phase" means a phase in which learning from examples is performed). In this training phase, an apparatus for training the feature combiner 15, as shown in Fig. 2, is used. Specifically, Fig. 2 illustrates this apparatus for training the feature combiner 15 in order to determine the combination parameters of the feature combiner. To this end, the apparatus of Fig. 2 comprises the feature extractor 14, which is preferably implemented in the same way as the feature extractor 14 of Fig. 1. Furthermore, the feature combiner 15 is also implemented in the same way as the feature combiner 15 of Fig. 1.

In addition to the elements of Fig. 1, the apparatus of Fig. 2 comprises an optimization controller 20, which receives as an input the control information for a training audio signal, as indicated at 21. The training phase is performed on the basis of known training audio signals, which have a known speech-to-noise ratio in each band. The speech portion and the noise portion are, for example, provided separately from each other, and the actual SNR per band is measured "on the fly", i.e. during the learning operation. Specifically, the optimization controller 20 controls the feature combiner so that the feature combiner is fed with the features from the feature extractor 14. Based on these features and on intermediate combination parameters from a preceding iteration, the feature combiner 15 then calculates control information 11. This control information 11 is forwarded to the optimization controller 20, where it is compared with the control information 21 for the training audio signal. The intermediate combination parameters are varied in response to an instruction from the optimization controller 20, and a further set of control information is calculated by the feature combiner 15 using these varied parameters. When the further control information better matches the control information 21 for the training audio signal, the optimization controller 20 updates the combination parameters and sends the updated combination parameters 16 to the feature combiner for use as intermediate combination parameters in the next run. Alternatively or additionally, the updated combination parameters can be stored in a memory for later use.
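The vary/compare/update cycle described above can be sketched as follows. This is only an illustration under stated assumptions: the combiner is a hypothetical linear map rather than a neural network, the parameters are varied by random perturbation, the match is measured by a squared error, and the feature and reference values are made up. None of these choices is taken from the description.

```python
import random

def combine(features, params):
    """Hypothetical linear combiner: one weighted sum of the features per band."""
    return [sum(w * f for w, f in zip(row, features)) for row in params]

def squared_error(estimate, reference):
    """Mismatch between calculated and known control information."""
    return sum((e - r) ** 2 for e, r in zip(estimate, reference))

def train(features, reference, n_iter=2000, step=0.1, seed=0):
    rng = random.Random(seed)
    # Intermediate combination parameters, one row per frequency band.
    params = [[0.0] * len(features) for _ in reference]
    best = squared_error(combine(features, params), reference)
    for _ in range(n_iter):
        # Vary the intermediate parameters ...
        trial = [[w + rng.gauss(0.0, step) for w in row] for row in params]
        err = squared_error(combine(features, trial), reference)
        # ... and keep the variation only if it matches the reference better.
        if err < best:
            params, best = trial, err
    return params, best

features = [1.0, 0.5, -0.3]   # made-up feature values for one frame
reference = [0.2, -0.4]       # made-up known control information for two bands
start_error = squared_error(combine(features, [[0.0] * 3] * 2), reference)
params, final_error = train(features, reference)
```

Because a variation is accepted only when it reduces the mismatch, `final_error` can never exceed `start_error`; a practical trainer would use a gradient-based optimizer instead of random perturbation.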

Fig. 4 gives an overview of the spectral weighting process using feature extraction and the neural network regression method. The parameters w of the neural network are computed from the reference sub-band SNR values R_t and the features x_t[k] of the training items during the training phase, which is shown on the left-hand side of Fig. 4. The noise estimation and the speech enhancement filtering are shown on the right-hand side of Fig. 4.

The proposed concept follows the approach of spectral weighting and uses a novel method for computing the spectral weights. The noise estimation is based on a supervised learning method and uses an inventive feature set. The features aim at discriminating between tonal and noisy signal components. In addition, the proposed features take into account the evolution of the signal properties on a larger time scale.

The noise estimation method presented here is able to deal with a variety of non-stationary background sounds. A robust SNR estimate in non-stationary background noise is obtained by means of feature extraction and the neural network regression method, as shown in Fig. 4. The real-valued weights are computed from estimates of the SNR in frequency bands whose spacing approximates the Bark scale. The spectral resolution of the SNR estimate is rather coarse, in order to allow the spectral shape to be measured within a band.

The left-hand side of Fig. 4 corresponds to the training phase, which in principle has to be performed only once. The procedure on the left-hand side of Fig. 4, labelled training 41, includes a reference SNR computation block 21, which generates the control information 21 for the training audio signal that is input into the optimization controller 20 of Fig. 2. The feature extractor 14 on the training side of Fig. 4 corresponds to the feature extractor 14 of Fig. 2. In particular, Fig. 2 has been shown as receiving a training audio signal, which consists of a speech portion and a background portion. To allow a useful reference to be computed, the background portion b_t and the speech portion s_t are available separately and are added by an adder 43 before being input into the feature extractor 14. Thus, the output of the adder 43 corresponds to the training audio signal input into the feature extractor 14 of Fig. 2.

The neural network training device labelled 15, 20 corresponds to blocks 15 and 20 and their connections as shown in Fig. 2, or as implemented via other similar connections, and results in a set of combination parameters w, which can be stored in the memory 40. These combination parameters are then used in the neural network regression device 15, which corresponds to the feature combiner 15 of Fig. 1, when the inventive concept is applied, as indicated by application 42 in Fig. 4. The spectral weighting device in Fig. 4 corresponds to the controllable filter 12 of Fig. 1, and the feature extractor 14 on the right-hand side of Fig. 4 corresponds to the feature extractor 14 of Fig. 1.

To identify the best feature set for estimating the sub-band SNR, a set of 21 different features was investigated. These features were combined in various configurations and evaluated by means of objective measures and informal listening. The feature selection process resulted in a feature set comprising the spectral energy, the spectral flux, the spectral flatness, the spectral skewness, the linear prediction coefficients (LPC), and the RASTA-PLP coefficients. The spectral energy, flux, flatness and skewness features are computed from the spectral coefficients corresponding to the critical band scale.

The features are described in detail with reference to Fig. 6. Additional features are the delta feature of the spectral energy and the delta-delta feature of the low-pass filtered spectral energy and of the spectral flux.

The structure of the neural network used in blocks 15, 20 or 15 of Fig. 4, or preferably used in the feature combiner 15 of Fig. 1 or Fig. 2, is discussed with reference to Fig. 10. In particular, the preferred neural network includes a layer of input neurons 100. In general, n input neurons can be used, i.e. one neuron per input feature. Preferably, the neural network has 220 input neurons, corresponding to the number of features. The neural network furthermore includes a hidden layer 102 with p hidden neurons. Generally, p is smaller than n, and in the preferred embodiment the hidden layer has 50 neurons. On the output side, the neural network includes an output layer 104 with q output neurons. The number of output neurons is equal to the number of frequency bands, so that each output neuron provides the control information for one frequency band, such as the SNR (speech-to-noise ratio) for that band. If, for example, there are 25 different frequency bands, preferably with a bandwidth that increases from low to high frequencies, the number q of output neurons is equal to 25. Thus, the neural network is used to estimate the sub-band SNR from the computed low-level features. Preferably, the hidden neurons have the hyperbolic tangent as activation function, and the activation function of the output neurons is the identity.

Generally, each neuron of layer 102 or 104 receives all corresponding inputs, which are, for layer 102, the outputs of all input neurons. Each neuron of layer 102 or 104 then performs a weighted addition, where the weights correspond to the combination parameters. The hidden layer may include bias values in addition to the weights; in that case, the bias values also belong to the combination parameters. In particular, each input is weighted by its corresponding combination parameter. The output of the weighting operation, indicated by an exemplary box 106 in Fig. 10, is input into an adder 108 within each neuron. The output of the adder, or the input into a neuron, may be passed through a non-linear function 110, which can be placed at the output and/or the input of a neuron, for example in the hidden layer, as the case may be.
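The forward computation of such a network can be sketched as follows, using the sizes stated above (220 inputs, 50 hidden neurons with hyperbolic tangent activation, 25 output neurons with identity activation). The random weights are untrained placeholders, and the output bias terms are an assumption of this sketch; the description only mentions bias values for the hidden layer.

```python
import math
import random

def layer(inputs, weights, biases, activation):
    """Each neuron: weighted sum of all inputs plus a bias, then the activation."""
    return [activation(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]

def forward(features, w_hidden, b_hidden, w_out, b_out):
    hidden = layer(features, w_hidden, b_hidden, math.tanh)   # hidden layer 102
    return layer(hidden, w_out, b_out, lambda v: v)           # output layer 104

N_IN, N_HIDDEN, N_OUT = 220, 50, 25
rng = random.Random(0)

def random_matrix(rows, cols):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

# Untrained placeholder parameters; real values would come from the training phase.
w_hidden, b_hidden = random_matrix(N_HIDDEN, N_IN), [0.0] * N_HIDDEN
w_out, b_out = random_matrix(N_OUT, N_HIDDEN), [0.0] * N_OUT

features = [rng.uniform(-1.0, 1.0) for _ in range(N_IN)]      # made-up feature vector
band_estimates = forward(features, w_hidden, b_hidden, w_out, b_out)
```

With trained parameters, `band_estimates` would hold one control value (for example an SNR estimate) per frequency band.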

The weights of the neural network are trained on mixtures of clean speech signals and background noise, for which the reference SNR is computed using the separate signals. The training process is illustrated on the left-hand side of Fig. 4. Speech and noise are mixed at an SNR of 3 dB per item and fed into the feature extractor. This SNR is constant over time and is a broadband SNR value. The data set consists of 2304 combinations of 48 speech signals and 48 noise signals, each 2.5 seconds long. The speech signals come from different speakers in 7 languages. The noise signals are recordings of traffic noise, crowd noise, and various natural atmospheres.
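Mixing a speech item and a noise item at a fixed broadband SNR can be sketched as follows. The function names and the synthetic stand-in signals are assumptions made for illustration; only the target of 3 dB is taken from the text.

```python
import math

def mean_power(signal):
    return sum(v * v for v in signal) / len(signal)

def mix_at_snr(speech, noise, snr_db):
    """Scale the noise so that speech power / noise power equals snr_db, then add."""
    target_noise_power = mean_power(speech) / (10.0 ** (snr_db / 10.0))
    gain = math.sqrt(target_noise_power / mean_power(noise))
    scaled_noise = [gain * n for n in noise]
    mixture = [s + n for s, n in zip(speech, scaled_noise)]
    return mixture, scaled_noise

# Made-up stand-ins for a speech item and a noise item of equal length.
speech = [math.sin(0.05 * n) for n in range(1000)]
noise = [math.sin(1.7 * n) * math.cos(0.31 * n) for n in range(1000)]
mixture, scaled_noise = mix_at_snr(speech, noise, 3.0)
achieved_snr_db = 10.0 * math.log10(mean_power(speech) / mean_power(scaled_noise))
```

Since the scaled noise is returned separately, the per-band reference SNR can afterwards be measured from `speech` and `scaled_noise`.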

For a given spectral weighting rule, there are two suitable definitions of the output of the neural network: the network can be trained using the reference values of the time-varying sub-band SNR R(ω), or using the spectral weights G(ω) derived from the SNR values. Simulations with the sub-band SNR as reference values gave better objective results and better ratings in informal listening than networks trained with spectral weights. The neural network is trained using 100 iteration cycles. The training algorithm used in this work is based on scaled conjugate gradients.

Preferred embodiments of the spectral weighting operation 12 are discussed next. The estimated sub-band SNR values are linearly interpolated to the frequency resolution of the input spectra and converted into linear ratios R̂. The linear sub-band SNR is smoothed along time and along frequency using IIR low-pass filtering, to reduce artifacts that may result from estimation errors. The low-pass filtering along frequency is additionally needed to reduce the effect of circular convolution, which occurs if the impulse response of the spectral weighting exceeds the length of the DFT frames. It is performed twice, with the second filtering carried out in reverse order (starting with the last sample), so that the resulting filter has zero phase.
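The interpolation and the forward-backward smoothing along frequency can be sketched as follows. The assumptions here are that the estimates are given in dB, that each band is represented by a centre bin, and that a first-order IIR filter with coefficient 0.5 is used; none of these details is specified in the text. Smoothing along time is omitted.

```python
def interpolate(band_centers, band_values, n_bins):
    """Piecewise-linear interpolation of per-band values onto bins 0..n_bins-1."""
    out = []
    for k in range(n_bins):
        if k <= band_centers[0]:
            out.append(band_values[0])
        elif k >= band_centers[-1]:
            out.append(band_values[-1])
        else:
            i = next(j for j in range(1, len(band_centers)) if band_centers[j] >= k)
            c0, c1 = band_centers[i - 1], band_centers[i]
            t = (k - c0) / (c1 - c0)
            out.append((1.0 - t) * band_values[i - 1] + t * band_values[i])
    return out

def one_pole_lowpass(x, coeff):
    """First-order IIR low-pass: y[n] = coeff * y[n-1] + (1 - coeff) * x[n]."""
    y, state = [], x[0]
    for v in x:
        state = coeff * state + (1.0 - coeff) * v
        y.append(state)
    return y

def zero_phase_lowpass(x, coeff=0.5):
    """Filter forward, then filter the reversed result and reverse it back."""
    forward = one_pole_lowpass(x, coeff)
    backward = one_pole_lowpass(forward[::-1], coeff)
    return backward[::-1]

def smooth_snr_along_frequency(band_centers, band_snr_db, n_bins, coeff=0.5):
    snr_db = interpolate(band_centers, band_snr_db, n_bins)
    snr_linear = [10.0 ** (v / 10.0) for v in snr_db]   # dB to linear ratio
    return zero_phase_lowpass(snr_linear, coeff)
```

Running the filter a second time over the reversed sequence cancels the phase shift of the first pass, which is why a peak in the input stays at the same bin after smoothing.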

Fig. 5 illustrates the gain factor as a function of the SNR. The applied gain (solid line) is compared with the spectral subtraction gain (dotted line) and the Wiener filter (dashed line).

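The spectral weights are computed according to the modified spectral subtraction rule in Equation 5 and limited to -18 dB.

$$
G(\omega) =
\begin{cases}
\dfrac{\hat{R}(\omega)^{\alpha}}{\hat{R}(\omega)^{\alpha} + 1}, & \hat{R}(\omega) \le 1 \\[1.5ex]
\dfrac{\hat{R}(\omega)^{\beta}}{\hat{R}(\omega)^{\beta} + 1}, & \hat{R}(\omega) > 1
\end{cases}
\tag{5}
$$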

The parameters α = 3.5 and β = 1 were determined experimentally. This particular attenuation above 0 dB SNR was chosen to avoid distortion of the speech signal at the expense of residual noise. The attenuation curve as a function of the SNR is shown in Fig. 5.
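Equation (5) with the stated parameters can be written as the short function below. Reading the -18 dB limit as a lower bound on an amplitude gain (20·log10) is an assumption of this sketch, as is the function name.

```python
ALPHA, BETA = 3.5, 1.0
FLOOR = 10.0 ** (-18.0 / 20.0)   # -18 dB, interpreted here as an amplitude gain

def spectral_weight(snr_linear):
    """Gain for one frequency bin from its estimated linear SNR, per equation (5)."""
    exponent = ALPHA if snr_linear <= 1.0 else BETA
    powered = snr_linear ** exponent
    gain = powered / (powered + 1.0)
    return max(gain, FLOOR)       # limit the attenuation

weights = [spectral_weight(r) for r in (0.0, 0.5, 1.0, 3.0)]
```

Both branches give a gain of 0.5 at a linear SNR of 1 (0 dB), so the curve is continuous there; below 0 dB the larger exponent makes the attenuation fall off more steeply.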

Fig. 9 shows an example of the input and output signals, the estimated sub-band SNR, and the spectral weights. Specifically, Fig. 9 gives an example of spectral weighting: the input time signal, the estimated sub-band SNR, the estimated SNR in frequency bins after interpolation, the spectral weights, and the processed time signal.

Fig. 6 gives an overview of the preferred features extracted by the feature extractor 14. For each low-resolution frequency band, i.e. for each of the 25 frequency bands for which an SNR or gain value is required, the feature extractor extracts a feature representing the spectral shape of the short-time spectral representation in that band. The spectral shape in a band represents the distribution of energy within the band and can be computed using several different calculation rules.

A preferred spectral shape feature is the spectral flatness measure (SFM), which is the geometric mean of the spectral values divided by the arithmetic mean of the spectral values. In the geometric mean/arithmetic mean definition, a power can be applied to each spectral value in the band before the n-th root operation or the averaging operation is performed.

Generally, a spectral flatness measure can also be calculated when the power applied to each spectral value in the numerator of the SFM formula is higher than the power used in the denominator. In that case, both the denominator and the numerator may use an arithmetic mean calculation formula. For example, the power in the numerator is 2 and the power in the denominator is 1. In general, the power used in the numerator only has to be larger than the power used in the denominator to obtain a generalized spectral flatness measure.

It follows from this calculation that the generalized measure for a band in which the energy is evenly distributed over the whole frequency band is smaller than 1 and, for many frequency lines, approaches small values close to 0, whereas in the case where the energy is concentrated in a single spectral value within the band, for example, the value is 1. Thus, a high value of this measure indicates a band in which the energy is concentrated at a certain position within the band, while a small value indicates that the energy is evenly distributed within the band.
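The behaviour just described can be checked numerically for one possible generalized measure with power 2 in the numerator and power 1 in the denominator. The exact normalization used below, sum of squares divided by the squared sum, is an assumption chosen because it reproduces the stated limits; the text does not give the formula.

```python
def generalized_flatness(magnitudes, p_num=2.0, p_den=1.0):
    """Assumed form: sum(|X|^p_num) / (sum(|X|^p_den))^(p_num / p_den)."""
    numerator = sum(v ** p_num for v in magnitudes)
    denominator = sum(v ** p_den for v in magnitudes) ** (p_num / p_den)
    return numerator / denominator

flat_band = [1.0] * 10                 # energy evenly distributed over 10 lines
peaky_band = [0.0] * 9 + [1.0]         # energy concentrated in one line
flat_value = generalized_flatness(flat_band)
peaky_value = generalized_flatness(peaky_band)
```

For L equal lines this form yields 1/L, which approaches 0 as the number of lines grows, and it yields 1 when a single line carries all the energy.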

Other spectral shape features include the spectral skewness, which measures the asymmetry of the distribution around its centroid. There are further features related to the spectral shape of a short-time frequency representation within a certain frequency band.

While the spectral shape is calculated for a frequency band, there are other features that are also calculated per frequency band, as shown in Fig. 6 and discussed in detail below. There are also additional features that do not have to be calculated for a frequency band but are calculated for the full bandwidth.

The spectral flux SF is defined as the dissimilarity between the spectra of successive frames [20] and is often implemented by means of a distance function. In this work, the spectral flux is computed using the Euclidean distance according to Equation 6, with spectral coefficients X(m, k), time frame index m, sub-band index r, and lower and upper boundaries of the frequency band l_r and u_r, respectively.

$$
SF(m, r) = \sqrt{\sum_{q = l_r}^{u_r} \bigl( |X(m, q)| - |X(m-1, q)| \bigr)^{2}} \tag{6}
$$
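Read as a Euclidean distance between the magnitude spectra of two successive frames within one band, equation (6) can be sketched as follows. The function name and the inclusive treatment of the band boundaries are assumptions.

```python
import math

def spectral_flux(prev_frame, cur_frame, lower, upper):
    """Euclidean distance between band magnitudes of frame m-1 and frame m.

    lower and upper are the inclusive bin indices l_r and u_r of the band.
    """
    return math.sqrt(sum((abs(cur_frame[q]) - abs(prev_frame[q])) ** 2
                         for q in range(lower, upper + 1)))

previous = [0j, 0j, 1 + 0j]          # made-up complex spectra for two frames
current = [3 + 0j, 4j, 1 + 0j]
flux_band = spectral_flux(previous, current, 0, 1)
```

Only magnitudes enter the computation, so a pure phase change between frames contributes nothing to the flux.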

Spectral flatness measure

There are various definitions for computing the flatness of a vector or the tonality of a spectrum (which is inversely related to the flatness of the spectrum). The spectral flatness measure SFM used here is computed as the ratio of the geometric mean to the arithmetic mean of the L spectral coefficients of the sub-band signal, as shown in Equation 7.

$$
SFM(m, r) = \frac{\exp\!\left( \dfrac{1}{L} \displaystyle\sum_{q = l_r}^{u_r} \log |X(m, q)| \right)}{\dfrac{1}{L} \displaystyle\sum_{q = l_r}^{u_r} |X(m, q)|} \tag{7}
$$
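Equation (7), the ratio of the geometric mean to the arithmetic mean of the band magnitudes, can be sketched as follows. The small constant `eps`, added to avoid taking the logarithm of zero, is an assumption of this sketch and is not part of the equation.

```python
import math

def spectral_flatness(frame, lower, upper, eps=1e-12):
    """Geometric mean over arithmetic mean of |X(m, q)| for q = l_r .. u_r."""
    mags = [abs(frame[q]) + eps for q in range(lower, upper + 1)]
    count = len(mags)                                   # L in equation (7)
    geometric = math.exp(sum(math.log(v) for v in mags) / count)
    arithmetic = sum(mags) / count
    return geometric / arithmetic

flat_value = spectral_flatness([1.0] * 8, 0, 7)
peaky_value = spectral_flatness([1.0] + [0.0] * 7, 0, 7)
```

For this geometric/arithmetic ratio, equal magnitudes give a value of 1 and a band dominated by a single line gives a value near 0. Passing a wider `lower`..`upper` range lets the same function include bins from adjacent bands.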

Spectral skewness

The skewness measures the asymmetry of the distribution around its centroid and is defined as the third central moment of a random variable divided by the cube of its standard deviation.
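One way to apply this definition to a band is to treat the normalized magnitudes as a distribution over the bin index, as sketched below. This interpretation, and the use of magnitudes rather than powers, are assumptions; the text only gives the general definition. The band is assumed not to be silent.

```python
def spectral_skewness(magnitudes):
    """Third central moment over std^3 of the bin-index distribution in a band."""
    total = sum(magnitudes)
    weights = [v / total for v in magnitudes]
    centroid = sum(k * w for k, w in enumerate(weights))
    variance = sum((k - centroid) ** 2 * w for k, w in enumerate(weights))
    third = sum((k - centroid) ** 3 * w for k, w in enumerate(weights))
    if variance == 0.0:              # all energy in a single bin
        return 0.0
    return third / variance ** 1.5

symmetric = spectral_skewness([1.0, 2.0, 3.0, 2.0, 1.0])
low_heavy = spectral_skewness([5.0, 3.0, 1.0, 0.5, 0.1])
```

A band that is symmetric around its centroid gives a skewness of about zero, while a band whose energy sits at the low bins with a tail towards high bins gives a positive value.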

Linear prediction coefficients

The linear prediction coefficients (LPC) are the coefficients of an all-pole filter that predicts the actual value x(k) of a time series from the preceding values such that the squared error

$$
E = \sum_{k} \left( \hat{x}_k - x_k \right)^{2}
$$

is minimized.

$$
\hat{x}(k) = -\sum_{j=1}^{p} a_j \, x_{k-j} \tag{8}
$$

The LPC are computed by means of the autocorrelation method.
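A minimal sketch of the autocorrelation method is given below, using the Levinson-Durbin recursion to solve the normal equations and the sign convention of equation (8). The choice of Levinson-Durbin, the absence of windowing, and the test signal are assumptions; the input is assumed not to be silent.

```python
def autocorrelation(x, max_lag):
    return [sum(x[n] * x[n - lag] for n in range(lag, len(x)))
            for lag in range(max_lag + 1)]

def lpc(x, order):
    """Return (a_1..a_p, residual energy) with x_hat(k) = -sum_j a_j * x(k-j)."""
    r = autocorrelation(x, order)
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err                      # reflection coefficient
        updated = a[:]
        for j in range(1, i):
            updated[j] = a[j] + k * a[i - j]
        updated[i] = k
        a = updated
        err *= (1.0 - k * k)                # remaining prediction error energy
    return a[1:], err

# Made-up test signal: a decaying exponential, x(k) = 0.9 * x(k-1).
signal = [0.9 ** n for n in range(200)]
coefficients, residual = lpc(signal, 2)
```

For this signal the first coefficient comes out close to -0.9 and the second close to 0, which matches the one-step relation x(k) = 0.9·x(k-1) under the minus sign of equation (8). The returned residual energy corresponds to the error value that the text mentions as an alternative or additional feature.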

Mel-frequency cepstral coefficients

The power spectra are warped according to the mel scale using triangular weighting functions with unit weight for each frequency band. The mel-frequency cepstral coefficients (MFCC) are computed by taking the logarithm and computing the discrete cosine transform.

Relative spectral perceptual linear prediction (RASTA-PLP) coefficients

The RASTA-PLP coefficients [H. Hermansky, N. Morgan, "RASTA processing of speech", IEEE Trans. on Speech and Audio Processing, vol. 2, no. 4, pp. 578-589, 1994] are computed from the power spectra in the following steps:

1. Magnitude compression of the spectral coefficients

2. Band-pass filtering of the sub-band energy over time

3. Magnitude expansion, which is the inverse of the compression in step 1

4. Multiplication by weights that correspond to an equal-loudness curve

5. Simulation of loudness perception by raising the coefficients to the power of 0.33

6. Computation of an all-pole model of the resulting spectrum by means of the autocorrelation method.

Perceptual linear prediction (PLP) coefficients

The PLP values are computed in the same way as the RASTA-PLP coefficients but without applying steps 1 to 3 [H. Hermansky, "Perceptual linear predictive (PLP) analysis of speech", J. Acoust. Soc. Am., vol. 87, no. 4, pp. 1738-1752, 1990].

Delta features

Delta features have been applied successfully in the past in automatic speech recognition and audio content classification. There are various ways to compute them. Here, they are computed by convolving the time sequence of a feature with a linear slope with a length of 9 samples (the sampling rate of the feature time series equals the frame rate of the STFT). Delta-delta features are obtained by applying the delta operation to the delta features.
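The delta operation can be sketched as a correlation of the feature time series with a linear slope running from -4 to 4 (9 samples). Two details are assumptions of this sketch and are not stated in the text: the normalization by the sum of squared slope values, which makes a unit ramp produce a delta of 1, and the replication of edge samples at the boundaries.

```python
def delta(series, half_width=4):
    """Slope of a feature time series, estimated over 2 * half_width + 1 frames."""
    norm = sum(k * k for k in range(-half_width, half_width + 1))
    last = len(series) - 1
    out = []
    for t in range(len(series)):
        acc = 0.0
        for k in range(-half_width, half_width + 1):
            idx = min(max(t + k, 0), last)      # replicate edge samples
            acc += k * series[idx]
        out.append(acc / norm)
    return out

ramp = [2.0 * t for t in range(20)]             # made-up feature track, slope 2
delta_ramp = delta(ramp)
delta_delta_ramp = delta(delta_ramp)            # delta-delta: delta of the delta
```

Away from the edges, a linear ramp yields a constant delta equal to its slope and a delta-delta of zero.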

As noted above, it is preferred to use a separation into low-resolution frequency bands that is similar to the perceptual behaviour of the human auditory system. A logarithmic band separation or a Bark-like band separation is therefore preferred. This means that bands with a low centre frequency are narrower than bands with a high centre frequency. In the calculation of the spectral flatness measure, for example, the summation runs from the value l_r, which is normally the lowest frequency value in the band, up to the value u_r, which is the highest spectral value within the predefined band. To obtain a better spectral flatness measure, it is preferred to use, in the lower bands, at least some or all of the spectral values of the lower and/or upper adjacent frequency band. This means, for example, that the spectral flatness measure for the second band is calculated using the spectral values of the second band and, in addition, the spectral values of the first band and/or the third band. In the preferred embodiment, the spectral values of both the first band and the third band are used in addition to those of the second band. When the SFM for the second band is calculated, q in Equation (7) therefore runs from l_r, equal to the first (lowest) spectral value of the first band, to u_r, equal to the highest spectral value of the third band. In this way, a spectral shape feature based on a larger number of spectral values can be calculated up to a certain bandwidth, above which the number of spectral values within the band itself is sufficient, so that l_r and u_r indicate spectral values from the same low-resolution frequency band.

With regard to the linear prediction coefficients extracted by the feature extractor, it is preferred to use either the LPC a_j of Equation (8), or the residual/error values remaining after the optimization, or any combination of the coefficients and the error values, such as a multiplication or an addition with a normalization factor, so that both the coefficients and the squared error values influence the LPC feature extracted by the feature extractor.

An advantage of the spectral shape feature is that it is a low-dimensional feature. When, for example, a frequency band with 10 complex or real spectral values is considered, using all 10 of these values would not be useful and would waste computing resources. Therefore a spectral shape feature is extracted whose dimension is lower than the dimension of the raw data. When, for example, the energy is considered, the raw data have a dimension of 10, because there are 10 squared spectral values. To obtain a spectral shape feature that can be used efficiently, a feature is extracted whose dimension is smaller than the dimension of the raw data and is preferably 1 or 2. A similar dimension reduction with respect to the raw data can be achieved when, for example, a low-order polynomial is fitted to the spectral envelope of a frequency band. When, for example, only two or three parameters are fitted, the spectral shape feature consists of these two or three parameters of the polynomial or of any other parameterization system. In general, all parameters are useful that indicate the distribution of energy within a frequency band and that have a low dimension, namely less than 5%, or at least less than 50% or less than 30%, of the dimension of the raw data.
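As an illustration of the polynomial variant mentioned above, the following minimal sketch fits a first-order polynomial to the envelope of one band, reducing 10 raw values to 2 parameters. Fitting in decibels, the normalized frequency axis and all names and values are assumptions made for the example and are not taken from the patent.

```python
import numpy as np

def polynomial_shape_feature(band_power, order=1):
    """Fit a low-order polynomial to the band envelope in dB.

    Returns order + 1 coefficients, highest power first (numpy.polyfit order).
    """
    power_db = 10.0 * np.log10(np.asarray(band_power, dtype=float) + 1e-12)
    x = np.linspace(-1.0, 1.0, len(power_db))  # normalized position in the band
    return np.polyfit(x, power_db, order)

# Ten squared spectral values of one hypothetical band, falling with frequency.
band_power = np.array([100.0, 80.0, 63.0, 50.0, 40.0, 32.0, 25.0, 20.0, 16.0, 12.5])
shape_feature = polynomial_shape_feature(band_power, order=1)  # slope and offset
```

Here the feature has dimension 2, which is 20% of the dimension of the raw data.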

It was found that using the spectral shape feature alone already results in advantageous behavior of the apparatus for processing an audio signal, but it is preferable to use at least one additional band-wise feature. It was also shown that an additional band-wise feature that is useful for improving the results is the spectral energy per band, which is calculated for each time frame and frequency band and normalized by the total energy of the time frame. This feature may or may not be low-pass filtered. In addition, it was found that adding the spectral flux feature advantageously improves the performance of the inventive apparatus, so that an efficient procedure with good performance is obtained when the spectral shape feature per band is used together with the spectral energy feature per band and the spectral flux feature per band. Adding further features improves the performance of the inventive apparatus again.
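The two additional band-wise features named above can be sketched as follows. The normalization by the total frame energy follows the text. The spectral flux definition used here (sum of squared magnitude differences between successive frames) is a common one and is an assumption, as are the function names and the example values.

```python
import numpy as np

def normalized_band_energy(power, band_edges):
    """Energy of each band divided by the total energy of the frame."""
    power = np.asarray(power, dtype=float)
    total = power.sum() + 1e-12  # small offset avoids division by zero
    band_sums = [power[lo:hi].sum() for lo, hi in zip(band_edges[:-1], band_edges[1:])]
    return np.array(band_sums) / total

def band_spectral_flux(prev_mag, curr_mag, band_edges):
    """Per-band sum of squared magnitude differences between two frames."""
    diff = np.asarray(curr_mag, dtype=float) - np.asarray(prev_mag, dtype=float)
    return np.array([np.sum(diff[lo:hi] ** 2)
                     for lo, hi in zip(band_edges[:-1], band_edges[1:])])

edges = [0, 2, 6]  # two hypothetical bands covering six bins
energy = normalized_band_energy([1.0, 1.0, 2.0, 2.0, 2.0, 2.0], edges)
flux = band_spectral_flux([1.0, 1.0, 1.0, 1.0, 1.0, 1.0],
                          [1.0, 2.0, 1.0, 1.0, 3.0, 1.0], edges)
```

When the bands cover every bin of the frame, the normalized band energies sum to 1.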

Fig. 7 and Fig. 8 will now be discussed in order to describe a preferred implementation of the feature extractor 14 shown in Fig. 1, Fig. 2 or Fig. 4. In a first step, the audio signal is windowed in order to provide a block of audio sample values, as indicated at 70. An overlap is applied. This means that one and the same audio sample occurs in two successive frames because of the overlap range, where an overlap of 50% with respect to the audio sample values is preferred. In step 71, a time/frequency conversion of a block of windowed audio sample values is performed in order to obtain a frequency representation with a first resolution, which is a high resolution. For this purpose, a short-time Fourier transform (STFT) implemented with an efficient FFT (fast Fourier transform) is used. When step 71 is applied several times to temporally successive blocks of audio sample values, a spectrogram is obtained, as known in the art. In step 72, the high-resolution spectral information, i.e. the high-resolution spectral values, is grouped into low-resolution frequency bands. When, for example, an FFT with 1024 or 2048 input values is applied, 1024 or 2048 spectral values exist, but such a high resolution is neither required nor intended. Instead, the grouping step 72 divides the high-resolution spectrum into a small number of bands, such as bands with varying bandwidth as known from Bark bands or from a logarithmic band division. Then, following the grouping step 72, a calculation 73 of the spectral shape feature and preferably of other features is performed for each low-resolution band. Although this is not indicated in Fig. 7, additional features relating to the entire frequency range can be calculated using the data obtained in step 70, since these full-bandwidth features do not require any of the spectral separations obtained in step 71 or 72.
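Steps 70 to 72 can be sketched as follows. This is a minimal illustration under stated assumptions: a Hann window, a frame length of 1024 samples, a logarithmic band layout built with `numpy.geomspace`, and a 1 kHz test tone at 16 kHz sampling rate are all choices made for the example and are not specified in the text. Because the band edges are rounded to whole bins, the number of bands actually produced can be smaller than requested.

```python
import numpy as np

def frame_signal(x, frame_len, hop):
    """Step 70: cut the signal into overlapping blocks of sample values."""
    n_frames = 1 + (len(x) - frame_len) // hop
    return np.stack([x[i * hop:i * hop + frame_len] for i in range(n_frames)])

def stft_power(x, frame_len=1024):
    """Steps 70 and 71: 50% overlap, windowing, FFT, squared magnitudes."""
    hop = frame_len // 2
    frames = frame_signal(x, frame_len, hop) * np.hanning(frame_len)
    return np.abs(np.fft.rfft(frames, axis=1)) ** 2

def log_band_edges(n_bins, n_bands):
    """Logarithmically spaced band edges in bins (assumed layout)."""
    edges = np.unique(np.round(np.geomspace(1, n_bins, n_bands + 1)).astype(int))
    edges[0] = 0  # let the first band start at bin 0
    return edges

def group_into_bands(power, edges):
    """Step 72: sum the high-resolution values inside each low-resolution band."""
    return np.stack([power[:, lo:hi].sum(axis=1)
                     for lo, hi in zip(edges[:-1], edges[1:])], axis=1)

fs = 16000
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * 1000.0 * t)        # one second of a 1 kHz tone

power = stft_power(x)                     # spectrogram, frames x bins
edges = log_band_edges(power.shape[1], 25)
band_power = group_into_bands(power, edges)
strongest_band = int(np.argmax(band_power[0]))
```

The grouped representation keeps the total energy of every frame while reducing 513 bins to a few bands whose width grows with frequency.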

Step 73 results in spectral shape features that have m dimensions, where m is smaller than n and is preferably equal to 1 or 2 per frequency band. This means that the information for a frequency band that is present after step 72 is compressed by the operation of the feature extractor into the low-dimensional information present after step 73.

As indicated in Fig. 7 next to step 71 and step 72, the time/frequency conversion and grouping steps can be replaced by different operations. The output of step 70 can be filtered with a low-resolution filter bank that is implemented, for example, so that 25 sub-band signals are obtained at its output. A high-resolution analysis of each sub-band can then be performed to obtain the raw data for the calculation of the spectral shape feature. This can be done, for example, by an FFT analysis of a sub-band signal or by any other analysis of a sub-band signal, for example by means of additional cascaded filter banks.

Fig. 8 illustrates the preferred procedure for implementing the controllable filter 12 of Fig. 1, or the spectral weighting shown in Fig. 3 or indicated at 12 in Fig. 4. After the step of determining the low-resolution band-wise control information, such as the sub-band signal-to-noise ratio (SNR) values output by the neural network regression block 15 of Fig. 4, as indicated at step 80, a linear interpolation to the high resolution is performed in step 81. The final aim is to obtain a weighting factor for each spectral value obtained by the short-time Fourier transform performed in step 30 of Fig. 3 or in step 71, or by the alternative procedure indicated to the right of steps 71 and 72. Step 81 results in an SNR value for each spectral value. However, this SNR value is still in the logarithmic domain, and step 82 transforms it from the logarithmic domain into the linear domain for each high-resolution spectral value.
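Steps 81 and 82 can be sketched as follows. The sketch assumes that the sub-band SNR values are given in decibels as power ratios and that each value is attached to the center bin of its band. Both are assumptions made for illustration, as are the function names and the example values.

```python
import numpy as np

def interpolate_band_snr_db(band_snr_db, band_edges, n_bins):
    """Step 81: linear interpolation of band-wise SNR (dB) to every bin.

    Each band value is placed at the center bin of its band. Outside the
    first and last center, numpy.interp holds the end values constant.
    """
    edges = np.asarray(band_edges, dtype=float)
    centers = 0.5 * (edges[:-1] + edges[1:] - 1.0)
    return np.interp(np.arange(n_bins), centers, band_snr_db)

def db_to_linear(snr_db):
    """Step 82: logarithmic domain (dB, power ratio) to linear domain."""
    return 10.0 ** (np.asarray(snr_db, dtype=float) / 10.0)

edges = [0, 2, 6, 10]                  # three hypothetical bands, ten bins
band_snr_db = [0.0, 10.0, 20.0]        # hypothetical regression output
snr_bins_db = interpolate_band_snr_db(band_snr_db, edges, n_bins=10)
snr_bins_linear = db_to_linear(snr_bins_db)
```

After these two steps there is one linear SNR value per high-resolution spectral value, which is the input expected by the subsequent smoothing.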

In step 83, the linear SNR values for each spectral value, i.e. at the high resolution, are smoothed over time and frequency, for example using IIR low-pass filters or, alternatively, FIR low-pass filters; for example, any moving-average operation can be used. In step 84, the spectral weights for each high-resolution frequency value are calculated based on the smoothed linear SNR values. This calculation relies on the function shown in Fig. 5, although the function in that figure is given in logarithmic terms, while the spectral weights for each high-resolution frequency value in step 84 are calculated in the linear domain.
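Steps 83 and 84 can be sketched as follows. The smoothing uses a moving average over frequency and a first-order recursive (IIR) low-pass over time, both with hypothetical parameters. The function of Fig. 5 is not reproduced in this section, so `placeholder_weights` uses a Wiener-style gain purely as a stand-in and should not be read as the weighting rule of the patent.

```python
import numpy as np

def smooth_over_frequency(snr_lin, width=3):
    """Moving average along the frequency axis (odd width, edges repeated)."""
    kernel = np.ones(width) / width
    half = width // 2
    padded = np.pad(snr_lin, ((0, 0), (half, half)), mode="edge")
    return np.stack([np.convolve(row, kernel, mode="valid") for row in padded])

def smooth_over_time(snr_lin, alpha=0.5):
    """First-order recursive low-pass along the time (frame) axis."""
    out = np.empty_like(snr_lin)
    out[0] = snr_lin[0]
    for t in range(1, len(snr_lin)):
        out[t] = alpha * out[t - 1] + (1.0 - alpha) * snr_lin[t]
    return out

def placeholder_weights(snr_lin):
    """Stand-in gain rule (Wiener-style), NOT the function of Fig. 5."""
    return snr_lin / (1.0 + snr_lin)

# Two frames with four bins each, linear SNR values (hypothetical).
snr = np.array([[1.0, 1.0, 1.0, 1.0],
                [3.0, 3.0, 3.0, 3.0]])
smoothed = smooth_over_time(smooth_over_frequency(snr))
weights = placeholder_weights(smoothed)
```

The resulting weights lie between 0 and 1 and have the same shape as the spectrogram, one weight per high-resolution spectral value and frame.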

The result of step 86 is a block of audio sample values with improved speech quality, i.e. the speech can be perceived better than in the corresponding input audio signal, on which no speech enhancement has been performed.

Depending on the specific implementation requirements, the inventive methods can be implemented in hardware or in software. The implementation can be carried out using a digital storage medium, in particular a DVD or CD, on which electronically readable control signals are stored that cooperate with a programmable computer system so that the inventive methods are performed. In general, the present invention is therefore a computer program product with program code stored on a machine-readable carrier, the program code being operative to perform the inventive methods when the computer program product runs on a computer. In other words, the inventive methods are a computer program having program code for performing at least one of the inventive methods when the computer program runs on a computer.

The embodiments described above merely illustrate the principles of the present invention. It is understood that modifications and variations of the arrangements and details described herein will be apparent to others skilled in the art. The intent, therefore, is to be limited only by the scope of the appended patent claims and not by the specific details presented here by way of description and explanation of the embodiments.

1. Apparatus for processing an audio signal to obtain control information for a speech enhancement filter, comprising: a feature extractor for obtaining a time sequence of short-time spectral representations of the audio signal and for extracting at least one feature in each frequency band of a plurality of frequency bands for a plurality of short-time spectral representations, the at least one feature representing a spectral shape of a short-time spectral representation in a frequency band of the plurality of frequency bands; and a feature combiner for combining the at least one feature for each frequency band using combination parameters to obtain the control information for the speech enhancement filter for a time portion of the audio signal.

2. The apparatus of claim 1, wherein the feature extractor is operative to extract at least one additional feature representing a characteristic of a short-time spectral representation other than the spectral shape, and wherein the feature combiner is operative to combine the at least one additional feature and the at least one feature for each frequency band using the combination parameters.

3. The apparatus of claim 1, wherein the feature extractor is operative to apply a frequency conversion operation in which a sequence of spectral representations is obtained for a sequence of time instants, the spectral representations having frequency bands with non-uniform bandwidths, the bandwidth becoming larger with increasing center frequency of a frequency band.

4. The apparatus of claim 1, wherein the feature extractor is operative to calculate, as a first feature, a spectral flatness measure per band representing a distribution of energy within the band, or, as a second feature, a measure of a normalized energy per band, the normalization being based on the total energy of the signal frame from which the spectral representation is derived, and wherein the feature combiner is operative to use the spectral flatness measure for a band or the normalized energy per band.

5. The apparatus of claim 1, wherein the feature extractor is operative to additionally extract, for each band, a spectral flux measure representing a similarity or a dissimilarity between time-successive spectral representations, or a spectral skewness measure, the spectral skewness measure representing an asymmetry around the centroid.

6. The apparatus of claim 1, wherein the feature extractor is operative to additionally extract linear prediction coding (LPC) features, the LPC features including an LPC error signal, linear prediction coefficients up to a predefined order, or a combination of the LPC error signal and the linear prediction coefficients, or wherein the feature extractor is operative to additionally extract PLP coefficients or RASTA-PLP coefficients or mel-frequency cepstral coefficients or delta features.

8. The apparatus of claim 1, wherein the feature extractor is operative to calculate the shape of the spectrum in a frequency band using only the spectral information of one or two immediately adjacent frequency bands and the spectral information of the frequency band itself.

9. The apparatus of claim 1, wherein the feature extractor is operative to extract raw feature information for each feature per block of audio samples and to combine the sequence of raw feature information in a frequency band to obtain the at least one feature for the frequency band.

10. The apparatus of claim 1, wherein the feature extractor is operative to calculate, for each frequency band, a number of spectral values and to combine the number of spectral values to obtain the at least one feature representing the spectral shape, so that the at least one feature has a dimension that is smaller than the number of spectral values in the frequency band.

11. Method of processing an audio signal to obtain control information for a speech enhancement filter, comprising: obtaining a time sequence of short-time spectral representations of the audio signal; extracting at least one feature in each frequency band of a plurality of frequency bands for a plurality of short-time spectral representations, the at least one feature representing a spectral shape of a short-time spectral representation in a frequency band of the plurality of frequency bands; and combining the at least one feature for each frequency band using combination parameters to obtain the control information for the speech enhancement filter for a time portion of the audio signal.

12. Apparatus for speech enhancement in an audio signal, comprising: an apparatus for processing the audio signal in accordance with claim 1 to obtain filter control information for a plurality of bands representing a time portion of the audio signal; and a controllable filter, the filter being controllable so that a band of the audio signal is variably attenuated with respect to a different band on the basis of the control information.

13. The apparatus of claim 12, wherein the apparatus for processing includes a time/frequency converter providing spectral information having a higher resolution than the spectral resolution for which the control information is provided, and wherein the apparatus additionally comprises a control information post-processor for interpolating the control information to the high resolution and for smoothing the interpolated control information to obtain post-processed control information, on the basis of which the parameters of the controllable filter are set.

14. Method of speech enhancement in an audio signal, comprising: the method of processing the audio signal in accordance with claim 11 to obtain filter control information for a plurality of bands representing a time portion of the audio signal; and controlling a filter so that a band of the audio signal is variably attenuated with respect to a different band on the basis of the control information.

15. Apparatus for training a feature combiner to determine combination parameters of the feature combiner, comprising: a feature extractor for obtaining a time sequence of short-time spectral representations of a training audio signal for which control information for a speech enhancement filter per frequency band is known, and for extracting at least one feature in each frequency band of a plurality of frequency bands for a plurality of short-time spectral representations, the at least one feature representing a spectral shape of a short-time spectral representation in a frequency band of the plurality of frequency bands; and an optimization controller for feeding the feature combiner with the at least one feature for each frequency band, for calculating the control information using intermediate combination parameters, for varying the intermediate combination parameters, for comparing the varied control information to the known control information, and for updating the intermediate combination parameters when the varied intermediate combination parameters result in control information that better matches the known control information.

16. Method of training a feature combiner to determine combination parameters of the feature combiner, comprising: obtaining a time sequence of short-time spectral representations of a training audio signal for which control information for a speech enhancement filter per frequency band is known; extracting at least one feature in each frequency band of a plurality of frequency bands for a plurality of short-time spectral representations, the at least one feature representing a spectral shape of a short-time spectral representation in a frequency band of the plurality of frequency bands; feeding the feature combiner with the at least one feature for each frequency band; calculating the control information using intermediate combination parameters; varying the intermediate combination parameters; comparing the varied control information to the known control information; and updating the intermediate combination parameters when the varied intermediate combination parameters result in control information that better matches the known control information.

17. Machine-readable information medium having code for performing the method of claim 11 when the code runs on a computer or processor.

 
