Method and device for efficient frame erasure concealment in linear prediction based speech codecs

FIELD: speech processing.

SUBSTANCE: the invention relates to a method and device for improving the concealment of frames of an encoded sound signal erased during transmission from an encoder to a decoder, and for accelerating recovery at the decoder once non-erased frames of the encoded sound signal have been received. Concealment/recovery parameters are determined at the encoder and transmitted to the decoder, where concealment of the erased frames and recovery are performed in accordance with the concealment/recovery parameters. The concealment/recovery parameters may be selected from the group consisting of: a signal classification parameter, an energy information parameter and a phase information parameter. Determination of the concealment/recovery parameters comprises classifying successive frames of the encoded sound signal as an unvoiced frame, an unvoiced transition, a voiced transition, a voiced frame or an onset frame, this classification being determined on the basis of at least a part of the following parameters: a normalized correlation parameter, a spectral tilt parameter, a signal-to-noise ratio parameter, a pitch stability parameter, a relative frame energy parameter and a zero-crossing parameter.

EFFECT: improved concealment of frames of the encoded sound signal erased during transmission from the encoder to the decoder, and accelerated recovery at the decoder.

177 cl, 7 dwg, 5 tbl

 

Technical field

The present invention relates to a technique for digitally encoding a sound signal, in particular but not exclusively a speech signal, in view of transmitting and/or synthesizing this sound signal. More specifically, the present invention relates to robust encoding and decoding of sound signals to maintain good performance in case of erased frame(s) due, for example, to channel errors in wireless systems or lost packets in voice over packet network applications.

Background art

In various applications such as teleconferencing, multimedia and wireless communication, there is an increasing demand for efficient digital narrowband and wideband speech coding techniques with a good trade-off between subjective quality and bit rate. Until recently, speech coding applications mainly used a telephone bandwidth limited to the range 200-3400 Hz. However, wideband speech applications provide increased intelligibility and naturalness of communication compared to the conventional telephone bandwidth. A bandwidth in the range 50-7000 Hz has been found sufficient for delivering a good quality giving an impression of face-to-face communication. For general audio signals, this bandwidth gives an acceptable subjective quality, but it is still lower than the quality of FM radio or CD, which operate over the ranges 20-16000 Hz and 20-20000 Hz, respectively.

A speech encoder converts a speech signal into a digital bit stream which is transmitted over a communication channel or stored in a storage medium. The speech signal is first digitized, that is, sampled and quantized, usually with 16 bits per sample. The speech encoder has the role of representing these digital samples with a smaller number of bits while maintaining a good subjective speech quality. The speech decoder or synthesizer operates on the transmitted or stored bit stream and converts it back into a sound signal.

Code-excited linear prediction (CELP) coding is one of the best available techniques for achieving a good compromise between subjective quality and bit rate. This coding technique forms the basis of several speech coding standards in both wireless and wireline applications. In CELP coding, the sampled speech signal is processed in successive blocks of L samples, usually called frames, where L is a predetermined number corresponding typically to 10-30 ms. A linear prediction (LP) filter is computed and transmitted every frame. The computation of the LP filter typically requires a look-ahead, a 5-15 ms speech segment from the subsequent frame. The L-sample frame is divided into smaller blocks called subframes. Usually the number of subframes is three or four, resulting in 4-10 ms subframes. In each subframe, an excitation signal is usually obtained from two components: the past excitation and an innovative, fixed-codebook excitation. The component formed from the past excitation is often referred to as the adaptive codebook or pitch excitation. The parameters characterizing the excitation signal are coded and transmitted to the decoder, where the reconstructed excitation signal is used as the input of the LP filter.

As the main applications of low-bit-rate speech coding are wireless mobile communication systems and voice over packet networks, improving the robustness of speech codecs in case of frame erasures becomes of significant importance. In wireless cellular systems, the energy of the received signal can exhibit frequent and severe fades, resulting in high bit error rates, which becomes more evident at the cell boundaries. In this case the channel decoder fails to correct the errors in the received frame and, as a consequence, the error detector usually used after the channel decoder declares the frame as erased. In voice over packet network applications, the speech signal is packetized, where usually each packet corresponds to a 20 ms frame. In packet-switched communications, a packet dropping can occur at a router if the number of packets becomes very large, or the packet can reach the receiver after a long delay, and it has to be declared as lost if its delay is longer than the length of the jitter buffer at the receiver side. In these systems, the codec is subjected to frame erasure rates of typically 3 to 5%. Furthermore, the use of wideband speech coding is an important asset for these systems, as it allows them to compete with the traditional public switched telephone network (PSTN), which uses traditional narrowband speech signals.

The adaptive codebook, or the pitch predictor, plays an important role in CELP in maintaining high speech quality at low bit rates. However, since the content of the adaptive codebook is based on the signal from past frames, this makes the codec model sensitive to frame loss. In case of erased or lost frames, the content of the adaptive codebook at the decoder becomes different from its content at the encoder. Thus, after a lost frame is concealed and consequent good frames are received, the synthesized signal in the good frames differs from the intended synthesis signal, since the adaptive codebook contribution has been changed. The impact of a lost frame depends on the nature of the speech segment in which the erasure occurred. If the erasure occurs in a stationary segment of the signal, then an efficient frame erasure concealment can be performed and the impact on consequent good frames can be minimized. On the other hand, if the erasure occurs in a speech onset or in a transition region, the effect of the erasure can propagate through several frames. For instance, if the beginning of a voiced segment is lost, then the first pitch period will be missing from the adaptive codebook content. This will have a severe effect on the pitch predictor in consequent good frames, resulting in a long time before the synthesis signal converges to the intended signal at the encoder.

Summary of the invention

The present invention relates to a method for improving the concealment of frame erasure caused by frames of an encoded sound signal erased during transmission from an encoder to a decoder, and for accelerating recovery of the decoder after non-erased frames of the encoded sound signal have been received, the method comprising:

determining, in the encoder, concealment/recovery parameters;

transmitting to the decoder the concealment/recovery parameters determined in the encoder; and

in the decoder, conducting erased frame concealment and decoder recovery in response to the received concealment/recovery parameters.

The present invention also relates to a method for the concealment of frame erasure caused by frames erased during transmission of a sound signal, encoded under the form of signal-encoding parameters, from an encoder to a decoder, and for accelerating recovery of the decoder after non-erased frames of the encoded sound signal have been received, the method comprising:

determining, in the decoder, concealment/recovery parameters from the signal-encoding parameters;

in the decoder, conducting erased frame concealment and decoder recovery in response to the determined concealment/recovery parameters.

According to the present invention, there is also provided a device for improving the concealment of frame erasure caused by frames of an encoded sound signal erased during transmission from an encoder to a decoder, and for accelerating recovery of the decoder after non-erased frames of the encoded sound signal have been received, the device comprising:

means for determining, in the encoder, concealment/recovery parameters;

means for transmitting to the decoder the concealment/recovery parameters determined in the encoder; and

in the decoder, means for conducting erased frame concealment and decoder recovery in response to the received concealment/recovery parameters.

According to the invention, there is further provided a device for the concealment of frame erasure caused by frames erased during transmission of a sound signal, encoded under the form of signal-encoding parameters, from an encoder to a decoder, and for accelerating recovery of the decoder after non-erased frames of the encoded sound signal have been received, the device comprising:

means for determining, in the decoder, concealment/recovery parameters from the signal-encoding parameters;

in the decoder, means for conducting erased frame concealment and decoder recovery in response to the determined concealment/recovery parameters.

The present invention is also concerned with a system for encoding and decoding a sound signal, and with a sound signal decoder, using the above-defined devices for improving the concealment of frame erasure caused by frames of the encoded sound signal erased during transmission from the encoder to the decoder, and for accelerating recovery of the decoder after non-erased frames of the encoded sound signal have been received.

The foregoing and other objects, advantages and features of the present invention will become more apparent upon reading the following non-restrictive description of illustrative embodiments thereof, given by way of example only with reference to the accompanying drawings.

Brief description of the drawings:

figure 1 is a block diagram of a voice communication system illustrating an application of speech encoding and decoding devices according to the present invention;

figure 2 is a block diagram of an example of a wideband encoding device (AMR-WB encoder);

figure 3 is a block diagram of an example of a wideband decoding device (AMR-WB decoder);

figure 4 is a simplified block diagram of the AMR-WB encoder of figure 2, wherein the down-sampler module, the high-pass filter module and the pre-emphasis filter module have been grouped in a single preprocessing module, and wherein the closed-loop pitch search module, the zero-input response calculator module, the impulse response generator module, the innovative excitation search module and the memory update module have been grouped in a single closed-loop pitch and innovation codebook search module;

figure 5 is an extension of the block diagram of figure 4, in which the modules related to an illustrative embodiment of the present invention have been added;

figure 6 is a block diagram explaining a situation in which an artificial onset is constructed; and

figure 7 is a schematic diagram showing an illustrative embodiment of a frame classification state machine for erasure concealment.

Detailed description of the illustrative embodiments

Although the illustrative embodiments of the present invention are described in the following in relation to a speech signal, it should be kept in mind that the concepts of the present invention apply equally to signals of other types, in particular, but not exclusively, to sound signals of other types.

Figure 1 illustrates a voice communication system 100 using speech encoding and decoding in the context of the present invention. The voice communication system 100 of figure 1 supports the transmission of a speech signal across a communication channel 101. Although it may comprise, for example, a wire, an optical link or a fiber link, the communication channel 101 typically comprises at least in part a radio frequency (RF) link. The RF link often supports multiple simultaneous speech communications requiring shared bandwidth resources, such as may be found in cellular telephony systems. Although not shown, the communication channel 101 may be replaced by a storage device in a single-device embodiment of the system 100 that records and stores the encoded speech signal for later playback.

In the voice communication system 100 of figure 1, a microphone 102 produces an analog speech signal 103 that is supplied to an analog-to-digital (A/D) converter 104 for converting it into a digital speech signal 105. A speech encoder 106 encodes the digital speech signal 105, producing a set of signal-encoding parameters 107 that are coded into binary form and delivered to a channel encoder 108. The optional channel encoder 108 adds redundancy to the binary representation of the signal-encoding parameters 107 before transmitting them over the communication channel 101.

In the receiver, a channel decoder 109 uses the said redundant information in the received bit stream 111 to detect and correct channel errors that occurred during the transmission. A speech decoder 110 converts the bit stream 112 received from the channel decoder 109 back into a set of signal-encoding parameters and creates from the recovered signal-encoding parameters a digital synthesized speech signal 113. The digital synthesized speech signal 113 reconstructed at the speech decoder 110 is converted to an analog form 114 by a digital-to-analog (D/A) converter 115 and played back through a loudspeaker unit 116.

The illustrative embodiment of an efficient frame erasure concealment method disclosed in the present specification can be used with either narrowband or wideband linear-prediction-based codecs. This illustrative embodiment is disclosed in relation to a wideband speech codec that has been standardized by the International Telecommunication Union (ITU) as Recommendation G.722.2 and is known as the AMR-WB codec (adaptive multi-rate wideband codec) [ITU-T Recommendation G.722.2 "Wideband coding of speech at around 16 kbit/s using Adaptive Multi-Rate Wideband (AMR-WB)", Geneva, 2002]. This codec has also been selected by the Third Generation Partnership Project (3GPP) for wideband telephony in third generation wireless systems [3GPP TS 26.190, "AMR Wideband Speech Codec: Transcoding Functions", 3GPP Technical Specification]. AMR-WB can operate at nine bit rates ranging from 6.6 to 23.85 kbit/s. For illustrative purposes, the bit rate of 12.65 kbit/s is used in the present description.

It should be understood that this illustrative embodiment of efficient frame erasure concealment can be applied to codecs of other types.

In the following sections, an overview of the AMR-WB encoder and the AMR-WB decoder is first given. Then, an illustrative embodiment of the novel approach for improving the robustness of the codec is disclosed.

Overview of the AMR-WB encoder

The sampled speech signal is encoded on a block-by-block basis by the encoding device 200 of figure 2, which is broken down into eleven modules numbered from 201 to 211.

Thus, the input speech signal 212 is processed on a block-by-block basis, that is, in the above-mentioned L-sample blocks called frames.

Referring to figure 2, the input speech signal 212 is down-sampled in a down-sampler module 201. The signal is down-sampled from 16 kHz to 12.8 kHz, using techniques well known to those of ordinary skill in the art. Down-sampling increases the coding efficiency, since a smaller frequency bandwidth is encoded. It also reduces the algorithmic complexity, since the number of samples in a frame is decreased. After down-sampling, the 320-sample frame of 20 ms is reduced to a 256-sample frame (down-sampling ratio of 4/5).
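As a rough illustration of this step, the following sketch resamples one 20 ms frame by the 4/5 ratio with a generic polyphase resampler; the anti-aliasing filter chosen by scipy is only a stand-in for the codec's own down-sampling filter, which is not specified here.

```python
import numpy as np
from scipy.signal import resample_poly

fs_in, fs_out = 16000, 12800                 # external and internal sampling rates
frame_16k = np.random.randn(320)             # one 20 ms frame at 16 kHz (placeholder data)
frame_12k8 = resample_poly(frame_16k, 4, 5)  # up-sample by 4, down-sample by 5
assert frame_12k8.shape[0] == 256            # 20 ms now spans 256 samples
```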

The input frame is then supplied to the optional preprocessing module 202. The preprocessing module 202 may consist of a high-pass filter with a 50 Hz cut-off frequency. This high-pass filter 202 removes the unwanted sound components below 50 Hz.

The down-sampled, preprocessed signal is denoted by s_p(n), n = 0, 1, 2, ..., L−1, where L is the length of the frame (256 at a sampling frequency of 12.8 kHz). In an illustrative embodiment of the pre-emphasis filter 203, the signal s_p(n) is pre-emphasized using a filter having the following transfer function:

P(z) = 1 − μz^-1,

where μ is a pre-emphasis factor with a value located between 0 and 1 (a typical value is μ = 0.7). The function of the pre-emphasis filter 203 is to enhance the high-frequency content of the input signal. It also reduces the dynamic range of the input speech signal, which renders it more suitable for fixed-point implementation. Pre-emphasis also plays an important role in achieving a proper overall perceptual weighting of the quantization error, which contributes to an improved sound quality. This will be explained in more detail below.
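A minimal sketch of the pre-emphasis step follows, using the μ = 0.7 value given above. Carrying the filter state between frames (the zi/zf mechanism) is an implementation detail assumed here so that consecutive frames are filtered seamlessly.

```python
import numpy as np
from scipy.signal import lfilter

mu = 0.7                                  # pre-emphasis factor from the text
s_p = np.random.randn(256)                # one down-sampled frame (placeholder data)
# y(n) = s_p(n) - mu*s_p(n-1), i.e. filtering through P(z) = 1 - mu*z^-1
state = np.zeros(1)                       # filter memory carried between frames
s_pre, state = lfilter([1.0, -mu], [1.0], s_p, zi=state)
```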

The output of the pre-emphasis filter 203 is denoted s(n). This signal is used for performing LP analysis in module 204. LP analysis is a technique well known to those of ordinary skill in the art. In this illustrative implementation, the autocorrelation approach is used. In the autocorrelation approach, the signal s(n) is first windowed, using typically a Hamming window having a length of the order of 30-40 ms. The autocorrelations are computed from the windowed signal, and Levinson-Durbin recursion is used to compute the LP filter coefficients a_j, where j = 1, ..., p, and where p is the LP order, which is typically 16 in wideband coding. The parameters a_j are the coefficients of the transfer function A(z) of the LP filter, which is given by the following relation:

A(z) = 1 + a_1 z^-1 + a_2 z^-2 + ... + a_p z^-p.

The LP analysis is performed in module 204, which also performs the quantization and interpolation of the LP filter coefficients. The LP filter coefficients are first transformed into another equivalent domain more suitable for quantization and interpolation purposes. The line spectral pair (LSP) and immittance spectral pair (ISP) domains are two domains in which quantization and interpolation can be efficiently performed. The 16 LP filter coefficients a_j can be quantized with on the order of 30 to 50 bits using split or multi-stage quantization, or a combination thereof. The purpose of the interpolation is to enable updating of the LP filter coefficients every subframe while transmitting them once every frame, which improves the encoder performance without increasing the bit rate. Since the quantization and interpolation of the LP filter coefficients are believed to be well known to those of ordinary skill in the art, they will not be further described in the present specification.

The following paragraphs describe the rest of the coding operations, which are performed on a subframe basis. In this illustrative implementation, the input frame is divided into 4 subframes of 5 ms (64 samples at the 12.8 kHz sampling frequency). In the following description, the filter A(z) denotes the unquantized interpolated LP filter of the subframe, and the filter Â(z) denotes the quantized interpolated LP filter of the subframe. The filter Â(z) is supplied every subframe to the multiplexer 213 for transmission over the communication channel.

In coders "analysis by synthesis" search parameters optimal pitch and newly introduced parameters is performed by minimizing the mean square error between the input speech signal 212 and the synthesized voice signal in the perceptual weighted area. The weighted signal sw(n) is calculated in the perceptual weighted filter 205 in accordance with the signal s(n) from predisease filter 203. Used perceptual weighted filter 205 with a fixed denominator, suitable for wideband signals. An example of a transfer function for the perceptual weighted filter 205 is defined by the following ratio:

W(z)=A(z/y1)/(1-y2z-1), where 0<y2<y1.
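The sketch below applies such a weighting filter to one subframe. The γ values used (0.92 and 0.68) are typical choices and are an assumption here; the text itself only requires 0 < γ2 < γ1.

```python
import numpy as np
from scipy.signal import lfilter

def perceptual_weighting(s, a, gamma1=0.92, gamma2=0.68):
    """Filter s through W(z) = A(z/gamma1) / (1 - gamma2*z^-1), where
    `a` holds the LP coefficients [1, a_1, ..., a_p] of A(z)."""
    num = a * gamma1 ** np.arange(len(a))  # A(z/gamma1): a_i -> a_i * gamma1^i
    den = np.array([1.0, -gamma2])         # fixed denominator 1 - gamma2*z^-1
    return lfilter(num, den, s)

s_w = perceptual_weighting(np.random.randn(64), np.array([1.0, -0.9, 0.2]))
```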

In order to simplify the pitch analysis, an open-loop pitch lag T_OL is first estimated in an open-loop pitch search module 206 based on the weighted speech signal s_w(n). Then the closed-loop pitch analysis, which is performed in a closed-loop pitch search module 207 on a subframe basis, is restricted to a neighborhood of the open-loop pitch lag T_OL, which significantly reduces the complexity of searching the LTP parameters T (pitch lag) and b (pitch gain). The open-loop pitch analysis is usually performed in module 206 once every 10 ms (two subframes), using techniques well known to those of ordinary skill in the art.

The target vector x for LTP (long-term prediction) analysis is first computed. This is usually done by subtracting the zero-input response s_0 of the weighted synthesis filter W(z)/Â(z) from the weighted speech signal s_w(n). This zero-input response s_0 is calculated by a zero-input response calculator 208 in response to the quantized interpolated LP filter Â(z) from the LP analysis, quantization and interpolation module 204, and to the initial states of the weighted synthesis filter W(z)/Â(z) stored in a memory update module 211 in response to the LP filters A(z) and Â(z) and the excitation vector u. This operation is well known to those of ordinary skill in the art and, accordingly, will not be further described.

An N-dimensional impulse response vector h of the weighted synthesis filter W(z)/Â(z) is computed in an impulse response generator 209 using the coefficients of the LP filters A(z) and Â(z) from module 204. This operation is well known to those of ordinary skill in the art and, accordingly, will not be further described.

The closed-loop pitch (or pitch codebook) parameters b, T and j are computed in the closed-loop pitch search module 207, which uses the target vector x, the impulse response vector h and the open-loop pitch lag T_OL as inputs.

The pitch search consists of finding the best pitch lag T and gain b that minimize a mean squared weighted pitch prediction error, for example

e^(j) = ||x − b^(j) y^(j)||², where j = 1, 2, ..., k,

between the target vector x and a scaled filtered version of the past excitation.

More specifically, in this illustrative implementation, the pitch (pitch codebook) search is composed of three stages.

In the first stage, the open-loop pitch lag T_OL is estimated in the open-loop pitch search module 206 in response to the weighted speech signal s_w(n). As indicated above, this open-loop pitch analysis is usually performed once every 10 ms (two subframes), using techniques well known to those of ordinary skill in the art.

In the second stage, a search criterion C is searched in the closed-loop pitch search module 207 for integer pitch lags around the estimated open-loop pitch lag T_OL (usually ±5), which significantly simplifies the search procedure. A simple procedure is used for updating the filtered codevector y_T (this vector is defined in the following description) without the need to compute the convolution for every pitch lag. An example of a search criterion C is given by

C = (x^t y_T)² / (y_T^t y_T),

where t denotes vector transpose.

Once an optimum integer pitch lag is found in the second stage, a third stage of the search (module 207) tests, by means of the search criterion C, the fractions around that optimum integer pitch lag. For example, the AMR-WB standard uses 1/4 and 1/2 subsample resolution.
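A compact sketch of the second search stage is given below. It recomputes y_T by a full convolution for every candidate lag instead of the recursive update mentioned above, ignores fractional lags and assumes the lag exceeds the subframe length; all of these simplifications are for illustration only.

```python
import numpy as np

def closed_loop_pitch(x, exc, pos, h, t_ol, delta=5):
    """Integer pitch search around the open-loop lag t_ol, maximizing
    C = (x^T y_T)^2 / (y_T^T y_T).  `exc` is the excitation buffer and
    exc[pos] is the first sample of the current subframe."""
    N = len(x)
    best = (-np.inf, t_ol, None)
    for T in range(t_ol - delta, t_ol + delta + 1):
        v_T = exc[pos - T:pos - T + N]        # past excitation at lag T (T >= N assumed)
        y_T = np.convolve(v_T, h)[:N]         # filtered pitch codevector
        C = np.dot(x, y_T) ** 2 / (np.dot(y_T, y_T) + 1e-12)
        if C > best[0]:
            best = (C, T, y_T)
    C, T, y_T = best
    b = np.dot(x, y_T) / (np.dot(y_T, y_T) + 1e-12)   # pitch gain for the best lag
    return T, b, y_T

# toy usage: 264-sample excitation buffer, 64-sample subframe target
exc = np.random.randn(264); x = np.random.randn(64)
h = np.zeros(64); h[0] = 1.0                 # trivial impulse response for the demo
T, b, y_T = closed_loop_pitch(x, exc, pos=200, h=h, t_ol=90)
```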

In wideband signals, the harmonic structure exists only up to a certain frequency, depending on the speech segment. Thus, in order to achieve an efficient representation of the pitch contribution in voiced segments of a wideband speech signal, flexibility is needed to vary the amount of periodicity over the wideband spectrum. This is achieved by processing the pitch codevector through a plurality of frequency-shaping filters (for example low-pass or band-pass filters). The frequency-shaping filter that minimizes the mean squared weighted error e^(j) is then selected. The selected frequency-shaping filter is identified by an index j.

The pitch codebook index T is encoded and transmitted to the multiplexer 213 for transmission over the communication channel. The pitch gain b is quantized and transmitted to the multiplexer 213. An extra bit is used to encode the index j, and this extra bit is also supplied to the multiplexer 213.

Once the pitch, or LTP (long-term prediction), parameters b, T and j are determined, the next step consists of searching for the optimum innovative excitation by means of the innovative excitation search module 210 of figure 2. First, the target vector x is updated by subtracting the LTP contribution:

x' = x − b·y_T,

where b is the pitch gain and y_T is the filtered pitch codebook vector (the past excitation at delay T filtered with the selected frequency-shaping filter of index j and convolved with the impulse response h).

The innovative excitation search is performed in an innovation codebook to find the optimum excitation codevector c_k and gain g which minimize the mean squared error E between the target vector x' and a scaled filtered version of the codevector c_k, for example:

E = ||x' − g·H·c_k||²,

where H is a lower triangular convolution matrix derived from the impulse response vector h. The index k of the innovation codebook corresponding to the found optimum codevector c_k, and the gain g, are supplied to the multiplexer 213 for transmission over the communication channel.
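For one candidate codevector, this scoring reduces to a ratio of two inner products, as the sketch below shows: setting the derivative of E with respect to g to zero gives g = (x'^t z)/(z^t z) with z = H·c_k, and the search maximizes (x'^t z)²/(z^t z).

```python
import numpy as np

def innovation_score(x2, c_k, h):
    """Score one candidate innovation codevector c_k against the updated
    target x' (here x2).  Since H is lower-triangular Toeplitz, H*c_k is a
    plain convolution truncated to the subframe length."""
    z = np.convolve(c_k, h)[:len(x2)]
    num = np.dot(x2, z)
    den = np.dot(z, z) + 1e-12
    return num * num / den, num / den          # (search criterion, optimal gain g)

# toy usage: pick the better of two sparse, algebraic-style codevectors
x2 = np.random.randn(64); h = np.exp(-0.1 * np.arange(64))
c1 = np.zeros(64); c1[[3, 20, 41, 60]] = 1.0
c2 = np.zeros(64); c2[[7, 15, 33, 55]] = [1, -1, 1, -1]
best = max((innovation_score(x2, c, h), i) for i, c in enumerate((c1, c2), 1))
```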

It should be noted that the innovation codebook used is a dynamic codebook consisting of an algebraic codebook followed by an adaptive pre-filter F(z) that enhances particular spectral components in order to improve the synthesized speech quality, according to U.S. patent No. 5,444,816 granted to Adoul et al. on August 22, 1995. In this illustrative implementation, the innovation codebook search is performed in module 210 by means of an algebraic codebook as described in U.S. patents Nos. 5,444,816 (Adoul et al.) issued on August 22, 1995; 5,699,482 granted to Adoul et al. on December 17, 1997; 5,754,976 granted to Adoul et al. on May 19, 1998; and 5,701,392 (Adoul et al.) dated December 23, 1997.

Overview of the AMR-WB decoder

The speech decoder 300 of figure 3 illustrates the various steps carried out between the digital input 322 (input bit stream to the demultiplexer 317) and the output sampled speech signal 323 (output of the adder 321).

The demultiplexer 317 extracts the synthesis model parameters from the binary information (input bit stream 322) received from a digital input channel. From each received binary frame, the extracted parameters are:

the quantized, interpolated LP coefficients Â(z), also called short-term prediction (STP) parameters, produced once per frame;

the long-term prediction (LTP) parameters T, b and j (for each subframe); and

the innovation codebook index k and gain g (for each subframe).

The current speech signal is synthesized based on these parameters, as will be explained below.

The innovation codebook 318 responds to the index k to produce the innovation codevector c_k, which is scaled by the decoded gain g through an amplifier 324. In this illustrative implementation, an innovation codebook as described in the above-mentioned U.S. patents Nos. 5,444,816, 5,699,482, 5,754,976 and 5,701,392 is used to produce the innovation codevector c_k.

The scaled codevector produced at the output of the amplifier 324 is processed by a frequency-dependent pitch enhancer 305.

Enhancing the periodicity of the excitation signal u improves the quality of voiced segments. The periodicity enhancement is achieved by filtering the innovation codevector c_k from the innovation (fixed) codebook through an innovation filter F(z) (pitch enhancer 305) whose frequency response emphasizes the higher frequencies more than the lower frequencies. The coefficients of the innovation filter F(z) are related to the amount of periodicity in the excitation signal u.

An efficient illustrative way to derive the coefficients of the innovation filter F(z) is to relate them to the amount of pitch contribution in the total excitation signal u. This results in a frequency response depending on the subframe periodicity, where higher frequencies are more strongly emphasized (stronger overall slope) for higher pitch gains. The innovation filter 305 has the effect of lowering the energy of the innovation codevector c_k at lower frequencies when the excitation signal u is more periodic, which enhances the periodicity of the excitation signal u at lower frequencies more than at higher frequencies. A proposed form for the innovation filter 305 is

F(z) = −αz + 1 − αz^-1,

where α is a periodicity factor derived from the level of periodicity of the excitation signal u. The periodicity factor α is computed in the voicing factor generator 304. First, the voicing factor generator 304 computes the voicing factor r_v as

r_v = (E_v − E_c) / (E_v + E_c),

where E_v is the energy of the scaled pitch codevector b·v_T and E_c is the energy of the scaled innovation codevector g·c_k, that is

E_v = b² Σ v_T²(n)

and

E_c = g² Σ c_k²(n),

the sums being taken over the subframe, n = 0, 1, ..., N−1.

Note that the value of r_v lies between −1 and 1 (1 corresponds to purely voiced signals and −1 corresponds to purely unvoiced signals).

The above-mentioned scaled pitch codevector b·v_T is produced by applying a pitch delay T to a pitch codebook 301 to produce a pitch codevector. The pitch codevector is then processed through a low-pass filter 302, whose cut-off frequency is selected in relation to the index j from the demultiplexer 317, to produce the filtered pitch codevector v_T. Then, the filtered pitch codevector v_T is amplified by the pitch gain b by an amplifier 326 to produce the scaled pitch codevector b·v_T.

In this illustrative implementation, the voicing factor generator 304 then computes the factor α as

α = 0.125 (1 + r_v),

which corresponds to a value of 0 for purely unvoiced signals and a value of 0.25 for purely voiced signals.

The enhanced signal c_f is therefore computed by filtering the scaled innovation codevector g·c_k through the innovation filter 305 (F(z)).

The enhanced excitation signal u' is computed by the adder 320 as: u' = c_f + b·v_T.

It should be noted that this process is not performed at the encoder 200. Thus, it is essential to update the content of the pitch codebook 301 using the past value of the excitation signal u, without enhancement, stored in memory 303, to keep synchronism between the encoder 200 and the decoder 300. Therefore, the excitation signal u is used to update the memory 303 of the pitch codebook 301, and the enhanced excitation signal u' is used at the input of the LP synthesis filter 306.
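The decoder-side enhancement path described above can be summarized in the following sketch. The 3-tap filter form F(z) = −αz + 1 − αz^-1 and the mapping α = 0.125(1 + r_v) reproduce the values stated in the text (α = 0 for purely unvoiced, 0.25 for purely voiced) but should be read as assumptions; the zero-padded edges stand in for proper inter-subframe filter memory.

```python
import numpy as np

def enhance_excitation(b, v_T, g, c_k):
    E_v = b * b * np.sum(v_T ** 2)            # energy of scaled pitch codevector
    E_c = g * g * np.sum(c_k ** 2)            # energy of scaled innovation codevector
    r_v = (E_v - E_c) / (E_v + E_c + 1e-12)   # voicing factor in [-1, 1]
    alpha = 0.125 * (1.0 + r_v)               # assumed mapping matching the endpoints
    gc = g * c_k
    padded = np.pad(gc, 1)                    # zero edges instead of filter memory
    # c_f(n) = gc(n) - alpha*[gc(n+1) + gc(n-1)], the assumed F(z) applied to g*c_k
    c_f = gc - alpha * (padded[2:] + padded[:-2])
    u = b * v_T + gc                          # un-enhanced excitation: updates memory 303
    u_enh = b * v_T + c_f                     # enhanced excitation u', fed to 1/A^(z)
    return u, u_enh
```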

The synthesized signal s' is computed by filtering the enhanced excitation signal u' through the LP synthesis filter 306, which has the form 1/Â(z), where Â(z) is the quantized, interpolated LP filter of the current subframe. As can be seen in figure 3, the quantized, interpolated LP coefficients Â(z) on line 325 from the demultiplexer 317 are supplied to the LP synthesis filter 306 to adjust its parameters accordingly. The de-emphasis filter 307 is the inverse of the pre-emphasis filter 203 of figure 2. The transfer function of the de-emphasis filter 307 is given by

D(z) = 1 / (1 − μz^-1),

where μ is a pre-emphasis factor with a value located between 0 and 1 (a typical value is μ = 0.7). A higher-order filter could also be used.

The vector s' is filtered through the de-emphasis filter D(z) 307 to obtain the vector s_d, which is processed through the high-pass filter 308 to remove the unwanted frequencies below 50 Hz and further obtain s_h.

The up-sampler 309 implements the inverse of the down-sampling process of module 201 of figure 2. In this illustrative embodiment, up-sampling converts the 12.8 kHz sampling rate back to the original 16 kHz sampling rate, using techniques well known to those of ordinary skill in the art. The up-sampled synthesis signal is denoted ŝ; this signal is also called the synthesized wideband intermediate signal.

The up-sampled synthesis signal ŝ does not contain the higher-frequency components that were lost during the down-sampling process (module 201 of figure 2) at the encoder 200. This gives a low-pass perception to the synthesized speech signal. To restore the full band of the original signal, a high-frequency generation procedure is performed in module 310, which requires input from the voicing factor generator 304 (figure 3).

The resulting band-pass filtered noise sequence z from the high-frequency generation module 310 is added by the adder 321 to the up-sampled synthesized speech signal ŝ to obtain the final reconstructed output speech signal s_out at the output 323. An example of a high-frequency regeneration process is described in the International PCT patent application published under No. WO 00/25305 on May 4, 2000.

The bit allocation of the AMR-WB codec at 12.65 kbit/s is shown in Table 1.

Table 1

Bit allocation in the 12.65 kbit/s mode

Parameter             Bits/Frame
LP parameters         46
Pitch delay           30 = 9 + 6 + 9 + 6
Pitch filtering       4 = 1 + 1 + 1 + 1
Gains                 28 = 7 + 7 + 7 + 7
Algebraic codebook    144 = 36 + 36 + 36 + 36
Mode bit              1
Total                 253 bits = 12.65 kbit/s

Robust frame erasure concealment

The erasure of frames has a major effect on the synthesized speech quality in digital speech communication, especially when operating in wireless environments and packet-switched networks. In wireless cellular systems, the energy of the received signal can exhibit frequent severe fades, resulting in high bit error rates, which becomes more evident at the cell boundaries. In this case, the channel decoder fails to correct the errors in the received frame and, as a consequence, the error detector usually used after the channel decoder declares the frame as erased. In voice over packet network applications, such as voice over Internet protocol (VoIP), the speech signal is packetized, where usually each packet corresponds to a 20 ms frame. In packet-switched communications, a packet dropping can occur at a router if the number of packets becomes too large, or the packet can arrive at the receiver after a long delay, and it must be declared as lost if its delay exceeds the length of the jitter buffer at the receiver side. In these systems, the codec is typically subjected to frame erasure rates of 3 to 5%.

The problem of frame erasure (FER) processing is essentially twofold. First, when an erased frame indication arrives, the missing frame must be generated by using the information sent in the previous frame and by estimating the evolution of the signal in the missing frame. The success of this estimation depends not only on the concealment strategy, but also on the place in the speech signal where the erasure occurred. Secondly, a smooth transition must be assured when normal operation is recovered, i.e. when the first good frame arrives after a block of erased frames (one or more). This is not a trivial task, as the true synthesis and the estimated synthesis can evolve differently. When the first good frame arrives, the decoder is hence desynchronized from the encoder. The main reason is that low-bit-rate encoders rely on pitch prediction, and during erased frames the memory of the pitch predictor is no longer the same as the memory at the encoder. The problem is amplified when many consecutive frames are erased. As for the concealment, the difficulty of the normal-processing recovery depends on the type of speech signal in which the erasure occurred.

The negative effect of frame erasures can be significantly reduced by adapting the concealment and the recovery of normal processing (hereinafter, recovery) to the type of the speech signal in which the erasure occurs. For this purpose, it is necessary to classify each speech frame. This classification can be done at the encoder and transmitted. Alternatively, it can be estimated at the decoder.

For the best concealment and recovery, there are a few critical characteristics of the speech signal that must be carefully controlled. These critical characteristics are the signal energy or amplitude, the amount of periodicity, the spectral envelope and the pitch period. In case of speech signal recovery, further improvement can be achieved by phase control. With a slight increase in bit rate, a few supplementary parameters can be quantized and transmitted for better control. If no additional bandwidth is available, the parameters can be estimated at the decoder. With these parameters controlled, the frame erasure concealment and recovery can be significantly improved, especially by improving the convergence of the decoded signal to the actual signal at the encoder and by alleviating the effect of mismatch between the encoder and the decoder when normal processing is recovered.

In this illustrative embodiment of the present invention, methods for efficient frame erasure concealment are disclosed, as well as methods for extracting and transmitting parameters that improve the performance and convergence at the decoder in the frames following an erased frame. These parameters include two or more of the following: frame classification, energy, voicing information and phase information. Further, methods for extracting such parameters at the decoder, if the transmission of the extra bits is not possible, are disclosed. Finally, methods for improving the decoder convergence in good frames following an erased frame are also disclosed.

The frame erasure concealment techniques according to the present illustrative embodiment have been applied to the AMR-WB codec described above. This codec will serve as an example framework for the implementation of the FER concealment methods in the following description. As explained above, the input speech signal 212 to the codec has a 16 kHz sampling frequency, but it is down-sampled to a 12.8 kHz sampling frequency before further processing. In the present illustrative embodiment, the FER processing is done on the down-sampled signal.

Figure 4 gives a simplified block diagram of the AMR-WB encoder 400. In this simplified block diagram, the down-sampler 201, the high-pass filter 202 and the pre-emphasis filter 203 are grouped together in a preprocessing module 401. Also, the closed-loop pitch search module 207, the zero-input response calculator 208, the impulse response generator 209, the innovative excitation search module 210 and the memory update module 211 are grouped in a closed-loop pitch and innovation codebook search module 402. This grouping is done to simplify the introduction of the new modules related to the illustrative embodiment of the present invention.

Figure 5 is an extension of the block diagram of figure 4, in which the modules related to the illustrative embodiment of the present invention are added. In these added modules 500 to 507, additional parameters are computed, quantized and transmitted with the aim of improving the FER concealment and the convergence and recovery of the decoder after erased frames. In this illustrative embodiment, these parameters include the signal classification, energy and phase information (the estimated position, within the frame, of the first glottal pulse).

In the following sections, the computation and quantization of these additional parameters are given in detail, and these operations are explained with reference to figure 5. Among these parameters, the signal classification is treated in more detail. In the subsequent sections, efficient FER concealment using these additional parameters to improve the convergence is explained.

Signal classification for FER concealment and recovery

The basic idea behind using a classification of the speech for signal reconstruction in the presence of erased frames is that the ideal concealment strategy is different for quasi-stationary speech segments and for speech segments with rapidly changing characteristics. While the best processing for erased frames in non-stationary speech segments can be summarized as a rapid convergence of the speech coding parameters to the characteristics of the ambient noise, in the case of a quasi-stationary signal the speech coding parameters do not vary dramatically and can be kept practically constant during several adjacent erased frames before being damped. Also, the optimal method for recovering the signal following an erased block of frames varies with the classification of the speech signal.

The speech signal can be roughly classified as voiced, unvoiced and pauses. Voiced speech contains a significant amount of periodic components and can be further divided into the following categories: voiced onsets, voiced segments, voiced transitions and voiced offsets. A voiced onset is defined as the beginning of a voiced speech segment after a pause or an unvoiced segment. During voiced segments, the speech signal parameters (spectral envelope, pitch period, ratio of periodic to non-periodic components, energy) vary slowly from frame to frame. A voiced transition is characterized by rapid variations of voiced speech, such as a transition between vowels. Voiced offsets are characterized by a gradual decrease of energy and voicing at the end of voiced segments.

The unvoiced parts of the signal are characterized by the absence of a periodic component and can be further divided into unstable frames, where energy and spectrum change rapidly, and stable frames, where these characteristics remain relatively stable. The remaining frames are classified as silence. Silence frames comprise all frames without active speech, including noise-only frames if background noise is present.

Not all of the above classes require separate processing. Hence, for the purposes of error concealment techniques, some of the signal classes are grouped together.

Classification at the encoder

If there is enough bandwidth available in the bit stream to include the classification information, the classification can be done at the encoder. This has several advantages. The most important one is that speech encoders often include a look-ahead. The look-ahead makes it possible to estimate the evolution of the signal in the following frame, and consequently the classification can be done by taking into account the future signal behavior. Generally, the longer the look-ahead, the better the classification can be. A further advantage is a complexity reduction, as most of the signal processing necessary for frame erasure concealment is needed anyway for the speech encoding. Finally, there is also the advantage of working with the original signal instead of the synthesized signal.

The frame classification is done with the consideration of the concealment and recovery strategy in mind. In other words, each frame is classified in such a way that the concealment can be optimal if the following frame is missing, or that the recovery can be optimal if the previous frame was lost. Some of the classes used for FER processing need not be transmitted, as they can be deduced without ambiguity at the decoder. In this illustrative embodiment, five (5) distinct classes are used, which are defined below:

The UNVOICED class comprises all unvoiced speech frames and all frames without active speech. A voiced offset frame can also be classified as UNVOICED if its end tends to be unvoiced and the concealment designed for unvoiced frames can be used for the following frame in case it is lost.

The UNVOICED TRANSITION class comprises unvoiced frames with a possible voiced onset at the end. The onset is however still too short or not built well enough to use the concealment designed for voiced frames. The UNVOICED TRANSITION class can only follow a frame classified as UNVOICED or UNVOICED TRANSITION.

The VOICED TRANSITION class comprises voiced frames with relatively weak voiced characteristics. These are typically voiced frames with rapidly changing characteristics (transitions between vowels) or voiced offsets lasting the whole frame. The VOICED TRANSITION class can only follow a frame classified as VOICED TRANSITION, VOICED or ONSET.

The VOICED class comprises voiced frames with stable characteristics. This class can only follow a frame classified as VOICED TRANSITION, VOICED or ONSET.

The ONSET class comprises all voiced frames with stable characteristics following a frame classified as UNVOICED or UNVOICED TRANSITION. Frames classified as ONSET correspond to voiced onset frames where the onset is already sufficiently well built to use the concealment designed for lost voiced frames. The concealment techniques used for an erasure following the ONSET class are the same as those following the VOICED class. The difference is in the recovery strategy. If an ONSET class frame is lost (i.e. a good VOICED frame arrives after an erasure, but the last good frame before the erasure was UNVOICED), a special technique can be used to artificially reconstruct the lost onset. This scenario can be seen in figure 6. The artificial onset reconstruction techniques are described in more detail below. On the other hand, if a good ONSET frame arrives after an erasure and the last good frame before the erasure was UNVOICED, this special processing is not needed, as the onset has not been lost (has not been in the lost frame).

The classification state diagram is outlined in figure 7. If the available bandwidth is sufficient, the classification is done at the encoder and transmitted using 2 bits. As can be seen from figure 7, UNVOICED TRANSITION and VOICED TRANSITION can be grouped together, as they can be unambiguously differentiated at the decoder (UNVOICED TRANSITION can only follow UNVOICED or UNVOICED TRANSITION frames, VOICED TRANSITION can only follow ONSET, VOICED or VOICED TRANSITION frames). The following parameters are used for the classification: a normalized correlation r̄_x, a spectral tilt measure ē_t, a signal-to-noise ratio snr, a pitch stability counter pc, a relative frame energy of the signal at the end of the current frame E_s, and a zero-crossing counter zc. As can be seen in the following detailed analysis, the computation of these parameters uses the available look-ahead as much as possible in order to take into account the behavior of the speech signal also in the following frame.

The normalized correlation r̄_x is computed as part of the open-loop pitch search module 206 of figure 5. This module 206 usually outputs the open-loop pitch estimate every 10 ms (twice per frame). Here, it is also used to output the normalized correlation measures. These normalized correlations are computed on the current weighted speech signal s_w(n) and the past weighted speech signal at the open-loop pitch delay. In order to reduce the complexity, the weighted speech signal s_w(n) is down-sampled by a factor of 2 prior to the open-loop pitch analysis, down to the sampling frequency of 6400 Hz [3GPP TS 26.190, "AMR Wideband Speech Codec: Transcoding Functions", 3GPP Technical Specification]. The average correlation r̄_x is defined as

r̄_x = 0.5 (r_x(1) + r_x(2)),

where r_x(1) and r_x(2) are, respectively, the normalized correlation of the second half of the current frame and the normalized correlation of the look-ahead. In this illustrative embodiment, a look-ahead of 13 ms is used, unlike the AMR-WB standard, where 5 ms is used. The normalized correlation r_x(k) is computed as follows:

r_x(k) = Σ s_w(t_k + i)·s_w(t_k + i − p_k) / sqrt(Σ s_w²(t_k + i) · Σ s_w²(t_k + i − p_k)), the sums being taken over i = 0, ..., L_k − 1.

The correlations r_x(k) are computed using the weighted speech signal s_w(n). The instants t_k are related to the beginning of the current frame and are equal to 64 and 128 samples, respectively, at the 6.4 kHz sampling rate (10 and 20 ms). The values p_k = T_OL are the selected open-loop pitch estimates. The length of the autocorrelation computation L_k depends on the pitch period. The values of L_k are summarized below (for the 6.4 kHz sampling rate):

L_k = 40 samples for p_k ≤ 31 samples

L_k = 62 samples for 31 < p_k ≤ 61 samples

L_k = 115 samples for p_k > 61 samples.

These values of L_k ensure that the length of the correlated vector comprises at least one pitch period, which helps to obtain a robust open-loop pitch detection. For long pitch periods (p_1 > 61 samples), r_x(1) and r_x(2) are identical, i.e. only one correlation is computed, since the correlated vectors are long enough that the analysis on the look-ahead is no longer necessary.
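A sketch of this measurement, with the L_k selection rule above, might look as follows; the offset added to t_k merely keeps the toy buffer indices valid and is not part of the algorithm.

```python
import numpy as np

def norm_corr(s_w, t_k, p_k):
    """Normalized correlation r_x(k) between the weighted speech starting at
    t_k and the same signal one open-loop pitch lag p_k earlier, over the
    pitch-dependent length L_k from the text (6.4 kHz domain)."""
    if p_k <= 31:
        L_k = 40
    elif p_k <= 61:
        L_k = 62
    else:
        L_k = 115
    a = s_w[t_k:t_k + L_k]
    b = s_w[t_k - p_k:t_k - p_k + L_k]
    return np.dot(a, b) / np.sqrt(np.dot(a, a) * np.dot(b, b) + 1e-12)

# average of the two analyses, with t_1 = 64 and t_2 = 128 samples
s_w = np.random.randn(512)
offset = 200                                    # keeps t_k - p_k >= 0 in this toy buffer
r_x_avg = 0.5 * (norm_corr(s_w, 64 + offset, 50) + norm_corr(s_w, 128 + offset, 50))
```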

The spectral tilt parameter ē_t contains information about the frequency distribution of energy. In the present illustrative embodiment, the spectral tilt is estimated as the ratio of the energy concentrated at low frequencies to the energy concentrated at high frequencies. However, it can also be estimated in other ways, such as the ratio of the first two autocorrelation coefficients of the speech signal.

The discrete Fourier transform is used to perform the spectral analysis in the spectral analysis and spectrum energy estimation module 500 of figure 5. The frequency analysis and the tilt computation are done twice per frame. A 256-point fast Fourier transform (FFT) with a 50 percent overlap is used. The analysis windows are placed so that all the look-ahead is exploited. In this illustrative embodiment, the beginning of the first window is placed 24 samples after the beginning of the current frame. The second window is placed 128 samples further. Different windows can be used to weight the input signal for the frequency analysis. In the present illustrative embodiment, a square root of a Hamming window (which is equivalent to a sine window) has been used. This window is particularly well suited for overlap-add methods. Hence, this particular spectral analysis can be reused in a possible noise suppression algorithm based on spectral subtraction and overlap-add analysis/synthesis.

The energies at high frequencies and at low frequencies are computed in module 500 of figure 5 following the perceptual critical bands. In the present illustrative embodiment, each critical band is considered up to the following upper limit [J. D. Johnston, "Transform Coding of Audio Signals Using Perceptual Noise Criteria", IEEE Journal on Selected Areas in Communications, vol. 6, no. 2, pp. 314-323]:

Critical bands = {100.0, 200.0, 300.0, 400.0, 510.0, 630.0, 770.0, 920.0, 1080.0, 1270.0, 1480.0, 1720.0, 2000.0, 2320.0, 2700.0, 3150.0, 3700.0, 4400.0, 5300.0, 6350.0} Hz.

The energy at higher frequencies is computed in module 500 as the average of the energies of the last two critical bands:

Ē_h = 0.5 (e(18) + e(19)),

where the critical band energies e(i) are computed as the sum of the bin energies within the critical band, averaged by the number of the bins.

The energy at lower frequencies is computed as the average of the energies in the first 10 critical bands. The middle critical bands have been excluded from the computation to improve the discrimination between frames with a high energy concentration at low frequencies (generally voiced) and frames with a high energy concentration at high frequencies (generally unvoiced). In between, the energy content is not characteristic of either class and would increase the confusion of the decision.
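The band-energy computation can be sketched as below. The mapping of FFT bins to bands (strict lower edge, inclusive upper edge) and the use of a bare periodogram are assumptions; the text specifies only the square-root Hamming window, the 256-point FFT and the per-band bin averaging.

```python
import numpy as np

# Upper band edges as listed in the text; a 256-point analysis at 12.8 kHz
# gives a 50 Hz bin spacing.
EDGES = [0, 100, 200, 300, 400, 510, 630, 770, 920, 1080, 1270, 1480,
         1720, 2000, 2320, 2700, 3150, 3700, 4400, 5300, 6350]

def band_energies(frame, fs=12800, nfft=256):
    win = np.sqrt(np.hamming(nfft))            # square-root Hamming window
    spec = np.abs(np.fft.rfft(win * frame[:nfft], nfft)) ** 2
    freq = np.arange(spec.size) * fs / nfft
    e = np.empty(20)
    for i in range(20):                        # mean bin energy per critical band
        sel = (freq > EDGES[i]) & (freq <= EDGES[i + 1])
        e[i] = spec[sel].mean()
    return e

e = band_energies(np.random.randn(256))
E_h_avg = 0.5 * (e[18] + e[19])                # average of the last two bands
E_l_avg = e[:10].mean()                        # average of the first ten bands
```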

In module 500, the energy at low frequencies is computed differently for long pitch periods and for short pitch periods. For voiced female speech segments, the harmonic structure of the spectrum can be exploited to improve the discrimination between voiced and unvoiced segments. Thus, for short pitch periods, Ē_l is computed bin-wise, and only frequency bins sufficiently close to the speech harmonics are taken into account in the summation, that is

Ē_l = (1/cnt) Σ e_b(i),

where e_b(i) are the bin energies in the first 25 frequency bins (the DC component not being considered). Note that these 25 bins correspond to the first 10 critical bands. In the above summation, only the terms related to bins closer to the nearest harmonic than a certain frequency threshold are non-zero. The counter cnt equals the number of those non-zero terms. The threshold for a bin to be included in the sum has been fixed to 50 Hz, i.e. only bins closer than 50 Hz to the nearest harmonic are taken into account. Hence, if the structure is harmonic at low frequencies, only high-energy terms will be included in the sum. On the other hand, if the structure is not harmonic, the selection of the terms will be random and the sum will be smaller. Thus even unvoiced sounds with a high energy content at low frequencies can be detected. This processing cannot be done for longer pitch periods, as the frequency resolution is not sufficient. The pitch threshold value is 128 samples, corresponding to 100 Hz. This means that for pitch periods longer than 128 samples, and also for a priori unvoiced sounds (i.e. when the noise-corrected normalized correlation is low), the low-frequency energy is estimated per critical band and is computed as

Ē_l = (1/10) Σ e(i), the sum being taken over the first 10 critical bands, i = 0, ..., 9.
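The harmonic-selective summation for short pitch periods can be sketched as follows; the nearest-harmonic test via rounding f/f0 is an assumed formalization of "closer than 50 Hz to the nearest harmonic".

```python
import numpy as np

def harmonic_low_energy(e_b, pitch_lag, fs=12800):
    """Harmonic-selective low-frequency energy for short pitch lags.
    e_b holds the energies of the first 25 non-DC bins (bin j sits at
    (j+1)*50 Hz); only bins within 50 Hz of a pitch harmonic contribute,
    and the sum is divided by the count of contributing bins."""
    f0 = fs / pitch_lag                        # fundamental frequency in Hz
    total, cnt = 0.0, 0
    for j in range(25):
        f = (j + 1) * 50.0
        k = max(1, round(f / f0))              # index of the nearest harmonic
        if abs(f - k * f0) < 50.0:             # within the 50 Hz threshold
            total += e_b[j]
            cnt += 1
    return total / max(cnt, 1)

E_l_short = harmonic_low_energy(np.random.rand(25), pitch_lag=60)
```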

The value r_e, computed in the noise estimation and normalized correlation correction module 501, is a correction which is added to the normalized correlation in the presence of background noise, for the following reason. In the presence of background noise, the average normalized correlation decreases. However, for the purpose of signal classification, this decrease should not affect the decision to assign a given segment to the voiced or the unvoiced class. It has been found that the dependence between this correction r_e and the total background noise energy in dB is approximately exponential in nature, and r_e can be expressed by an experimentally fitted exponential function of N_dB, where

N_dB = 10 log10((1/20) Σ n(i)), the sum being taken over the 20 critical bands, i = 0, ..., 19,

n(i) being the noise energy estimate for each critical band, normalized in the same way as e(i), and g_dB, which also enters the fitted relation, being the maximum noise suppression level in dB allowed for the noise reduction routine. The value of r_e is not allowed to be negative. It should be noted that when a good noise reduction algorithm is used and g_dB is sufficiently high, r_e is practically equal to zero. It is relevant only when the noise reduction is disabled or when the background noise level is significantly higher than the maximum allowed suppression. The influence of r_e can be tuned by multiplying this term by a constant.

Finally, the resulting low- and high-frequency energies are obtained by subtracting an estimate of the noise energy from the values Ē_l and Ē_h computed above. That is

E_h = Ē_h − f_c·N_h,

E_l = Ē_l − f_c·N_l,

where N_h and N_l are the average noise energies in the last two (2) critical bands and in the first ten (10) critical bands, respectively, computed using equations similar to equations (3) and (5), and f_c is a correction factor chosen so that these measures remain close to constant with varying background noise level. In this illustrative embodiment, the value of f_c has been fixed to 3.

The spectral tilt e_t is computed in the spectral tilt estimation module 503 as the ratio

e_t = E_l / E_h
and is averaged in the dB domain over the two (2) spectral analyses performed on each frame:

The signal-to-noise ratio (SNR) measure exploits the fact that, for a waveform-matching coder, the SNR is much higher for voiced sounds. The snr parameter estimation must be performed at the end of the encoder subframe loop, and it is computed in the SNR computation module 504 as

snr = E_sw / E_e,

where E_sw is the energy of the weighted speech signal s_w(n) of the current frame, produced by the perceptual weighting filter 205, and E_e is the energy of the error between this weighted speech signal and the weighted synthesis signal of the current frame produced by the perceptual weighting filter 205'.
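Minimal sketches of the spectral tilt and SNR measures follow. Since the dB-averaging formula for the tilt is omitted above, the plain mean of the two analyses is an assumption, and all function names are illustrative.

import numpy as np

def spectral_tilt(E_low, E_high, eps=1e-12):
    """Tilt of one spectral analysis: low- over high-frequency energy."""
    return max(E_low, eps) / max(E_high, eps)

def frame_tilt_db(tilt_first, tilt_second):
    """Per-frame tilt: mean of the two analyses in the dB domain (assumed)."""
    return 0.5 * (10.0 * np.log10(tilt_first) + 10.0 * np.log10(tilt_second))

def snr(E_sw, E_e, eps=1e-12):
    """Ratio of weighted-speech energy to weighted-error energy."""
    return E_sw / max(E_e, eps)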

The pitch stability indicator pc assesses the variation of the pitch period. It is computed in the signal classification module 505 from the open-loop pitch estimates as follows:

The values p0, p1 and p2 correspond to the open-loop pitch estimates computed by the open-loop pitch search module 206 for the first half of the current frame, the second half of the current frame and the look-ahead, respectively.
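The exact equation is omitted in the source; a common realization consistent with the description is the sum of absolute differences between consecutive open-loop estimates, sketched here under that assumption.

def pitch_stability(p0, p1, p2):
    """Pitch stability indicator from the open-loop pitch estimates of the
    first half-frame, second half-frame and look-ahead (assumed form)."""
    return abs(p1 - p0) + abs(p2 - p1)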

The relative frame energy E_s is computed by module 500 as the difference between the current frame energy in dB and its long-term average:

E_s = E_f − E_lt,
where the frame energy E_f is obtained as the sum of the critical band energies, averaged over the results of the two spectral analyses performed on each frame:

The long-term averaged energy E_lt is updated on active speech frames using the following relation:

.
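A sketch of the relative energy computation; since the update equation itself is omitted above, the AR(1) smoothing constant below is an assumption.

def update_long_term_energy(E_lt, E_f, beta=0.99):
    """Long-term average energy update on active speech frames (beta assumed)."""
    return beta * E_lt + (1.0 - beta) * E_f

def relative_energy(E_f, E_lt):
    """Relative frame energy: frame energy in dB minus its long-term average."""
    return E_f - E_lt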

The last parameter is the zero-crossing parameter zc, computed by the zero-crossing computation module 508 on one frame of the speech signal. This frame starts in the middle of the current frame and uses two (2) subframes of look-ahead. In this illustrative embodiment, the zero-crossing counter zc counts the number of times the signal sign changes from positive to negative during that interval.

To make the classification more robust, the classification parameters are considered together, forming a merit function fm. For this purpose, the classification parameters are first scaled between 0 and 1, so that a parameter value typical of an unvoiced signal maps to 0 and a parameter value typical of a voiced signal maps to 1; a linear function is used in between. For a parameter p_x, its scaled version is obtained as

p_x^s = k_p · p_x + c_p

and clipped to the range between 0 and 1. The function coefficients k_p and c_p have been found experimentally for each parameter so that the signal distortion due to the masking and recovery methods used in the presence of FERs is minimized. The values used in this illustrative implementation are summarized in Table 2:

Table 2

Signal classification parameters and the coefficients of their respective scaling functions

Parameter                         k_p        c_p
r_x   Normalized correlation      2.857      -1.286
e_t   Spectral tilt               0.04167    0
snr   Signal-to-noise ratio       0.1111     -0.3333
pc    Pitch stability             -0.07143   1.857
E_s   Relative frame energy       0.05       0.45
zc    Zero-crossing count         -0.04      2.4

The merit function is defined as

where the superscript s indicates the scaled version of the parameters.

The classification is then performed using the merit function fm and the rules summarized in Table 3:

Table 3

Signal classification rules at the encoder

Class of previous frame            Rule                  Class of current frame
ONSET, VOICED, VOICED TRANSITION   fm ≥ 0.66             VOICED
                                   0.66 > fm ≥ 0.49      VOICED TRANSITION
                                   fm < 0.49             UNVOICED
UNVOICED TRANSITION, UNVOICED      fm > 0.63             ONSET
                                   0.63 ≥ fm > 0.585     UNVOICED TRANSITION
                                   fm ≤ 0.585            UNVOICED
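The decision logic of Tables 2 and 3 can be sketched as follows. Because the merit function equation is omitted above, the sketch assumes it is the average of the six scaled parameters with the normalized correlation counted twice; that weighting is an assumption.

SCALE = {                      # (k_p, c_p) pairs from Table 2
    "rx": (2.857, -1.286), "et": (0.04167, 0.0), "snr": (0.1111, -0.3333),
    "pc": (-0.07143, 1.857), "Es": (0.05, 0.45), "zc": (-0.04, 2.4),
}

def scaled(name, value):
    """Linear scaling of a classification parameter, clipped to [0, 1]."""
    k, c = SCALE[name]
    return min(1.0, max(0.0, k * value + c))

def merit(params):
    """Assumed merit function: mean of scaled parameters, r_x weighted twice."""
    s = {name: scaled(name, value) for name, value in params.items()}
    return (2 * s["rx"] + s["et"] + s["snr"] + s["pc"] + s["Es"] + s["zc"]) / 7.0

def classify_encoder(prev_class, fm):
    """Decision rules of Table 3."""
    if prev_class in ("ONSET", "VOICED", "VOICED TRANSITION"):
        if fm >= 0.66:
            return "VOICED"
        return "VOICED TRANSITION" if fm >= 0.49 else "UNVOICED"
    if fm > 0.63:
        return "ONSET"
    return "UNVOICED TRANSITION" if fm > 0.585 else "UNVOICED"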

In the case of a source-controlled variable bit rate (VBR) coder, signal classification is inherent in the operation of the codec. The codec operates at several bit rates, and a rate selection module determines the bit rate used to encode each speech frame based on its nature (for example, voiced frames, unvoiced frames, transition frames and background noise frames are each encoded with a dedicated encoding algorithm). The information about the encoding mode, and hence about the speech class, is thus an implicit part of the bit stream and does not need to be transmitted explicitly for FER processing. This class information can then be used to override the classification described above.

In the application to the AMR-WB codec, the only source-controlled rate mechanism is voice activity detection (VAD). The VAD flag equals 1 for active speech and 0 for pauses. This parameter is useful for classification, as it directly indicates that no further classification is needed when its value is 0 (i.e. the frame is directly classified as UNVOICED). The parameter is an output of the voice activity detection (VAD) module 402. Other VAD algorithms exist in the literature, and any of them can be used for the purposes of the present invention. For example, the VAD algorithm that is part of the standard G.722.2 can be used [ITU-T Recommendation G.722.2 "Wideband coding of speech at around 16 kbit/s using Adaptive Multi-Rate Wideband (AMR-WB)", Geneva, 2002]. Here, the VAD algorithm operates on the output of the spectral analysis module 500 (based on the signal-to-noise ratio per critical band). The VAD used for classification differs from the VAD used for encoding with respect to the hangover. In speech coders using comfort noise generation (CNG) for segments without active speech (pauses or noise only), a hangover is often added after speech bursts (e.g. the CNG of the AMR-WB standard [3GPP TS 26.192, "AMR Wideband Speech Codec; Comfort Noise Aspects", 3GPP Technical Specification]). During the hangover the speech encoder continues to be used, and the system switches to CNG only after the hangover period elapses. For the purposes of classification for FER masking, such a high degree of protection is not needed. Consequently, the VAD flag used for classification is set to 0 also during the hangover period.

In this illustrative embodiment, the classification is performed in module 505 on the basis of the parameters described above: the normalized correlation (or voicing information) r_x, the spectral tilt e_t, snr, the pitch stability indicator pc, the relative frame energy E_s, the zero-crossing rate zc, and the VAD flag.

Classification in the decoder

If the application does not allow transmission of the class information (no extra bits can be transported), the classification can still be performed in the decoder. As already mentioned, the main disadvantage here is that speech decoders usually have no look-ahead available. Also, the decoder complexity often needs to be kept limited.

A simple classification can be performed by estimating the voicing of the synthesized signal. In the case of a CELP-type coder, the voicing estimate r_v computed by equation (1) can be used. That is,

r_v = (E_v − E_c) / (E_v + E_c),

where E_v is the energy of the scaled pitch code-vector b·v_T and E_c is the energy of the scaled innovation code-vector g·c_k. Theoretically, r_v = 1 for a purely voiced signal and r_v = −1 for a purely unvoiced signal. The actual classification is done by averaging the r_v values over each set of four subframes. The resulting factor f_rv (the mean of r_v over the four subframes) is used as follows.

Table 4

Signal classification rules at the decoder

Class of previous frame            Rule                   Class of current frame
ONSET, VOICED, VOICED TRANSITION   f_rv > −0.1            VOICED
                                   −0.1 ≥ f_rv ≥ −0.5     VOICED TRANSITION
                                   f_rv < −0.5            UNVOICED
UNVOICED TRANSITION, UNVOICED      f_rv > −0.1            ONSET
                                   −0.1 ≥ f_rv ≥ −0.5     UNVOICED TRANSITION
                                   f_rv < −0.5            UNVOICED
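A sketch of the decoder-side classification; the voicing values are assumed to be computed per subframe from the relation reconstructed above.

def classify_decoder(prev_class, rv_subframes):
    """Decision rules of Table 4, applied to the mean voicing factor f_rv
    over the four subframes of the frame."""
    f_rv = sum(rv_subframes) / len(rv_subframes)
    if prev_class in ("ONSET", "VOICED", "VOICED TRANSITION"):
        if f_rv > -0.1:
            return "VOICED"
        return "VOICED TRANSITION" if f_rv >= -0.5 else "UNVOICED"
    if f_rv > -0.1:
        return "ONSET"
    return "UNVOICED TRANSITION" if f_rv >= -0.5 else "UNVOICED"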

As with the classification in the encoder, other parameters can be used in the decoder to aid the classification, such as the LP filter parameters or the pitch stability.

If a source-controlled variable bit rate coder is used, the encoding mode information is already part of the bit stream. Thus, if a purely unvoiced encoding mode is used, the frame can automatically be classified as UNVOICED. Similarly, if a purely voiced encoding mode is used, the frame is classified as VOICED.

Speech parameters for FER processing

There are a few critical parameters that must be carefully controlled to avoid annoying artifacts when FERs occur. If a small number of extra bits can be transmitted, these parameters can be estimated in the encoder, quantized and transmitted. Otherwise, some of them can be estimated in the decoder. These parameters include the signal classification, the energy information, the phase information and the voicing information. The most important is precise control of the speech energy. The phase and the periodicity of the speech can also be controlled to further improve FER masking and recovery.

The importance of energy control becomes evident when normal operation resumes after a block of erased frames. Since most speech coders rely on prediction, the correct energy cannot be properly estimated in the decoder. In voiced speech segments, an incorrect energy value can persist over several consecutive frames, which is very annoying, especially when this incorrect energy increases.

Although energy control matters most for voiced speech because of the long-term prediction (pitch prediction), it is also important for unvoiced speech. The reason is that CELP-type coders commonly use a predictive quantizer for the innovation gain. An incorrect energy value during unvoiced segments can cause an annoying high-frequency fluctuation.

Phase control can be achieved in several ways, depending largely on the available bandwidth. In this implementation, a simple phase control is achieved for lost voiced onsets by searching for and transmitting the approximate position of the first glottal pulse.

Thus, in addition to the signal classification information discussed in the previous section, the most important information to send is the signal energy and the position within the frame of the first glottal pulse (phase information). If sufficient bandwidth is available, the voicing information can also be sent.

Energy information

The energy information can be estimated and sent either in the LP residual domain or in the speech signal domain. Sending the information in the residual domain has the disadvantage of not taking into account the influence of the LP synthesis filter. This can be particularly unreliable when recovering voiced speech after several lost voiced frames (when an FER occurs during a voiced speech segment). When an FER arrives after a voiced frame, the excitation of the last good frame is typically used during masking, with some attenuation strategy. When a new LP synthesis filter arrives with the first good frame after the erasure, a mismatch may appear between the excitation energy and the gain of the LP synthesis filter. The new synthesis filter may produce a synthesized signal whose energy differs both from the energy of the last synthesized erased frame and from the energy of the original signal. For this reason, the energy is computed and quantized in the signal domain.

The energy E_q is computed and quantized in the energy estimation and quantization module 506. It was found that 6 bits are sufficient for transmitting the energy; however, this number of bits can be reduced without much consequence if not enough bits are available. In this preferred embodiment, a 6-bit uniform quantizer is used, covering the range from −15 dB to 83 dB with a step of 1.8 dB. The quantization index is set to the integer part of:

where E is the maximum signal energy for frames classified as VOICED or ONSET, and the average energy per sample for the other frames. For VOICED or ONSET frames, the maximum signal energy is computed pitch-synchronously at the end of the frame as follows:

E = max{ s²(i) },  i = L − t_E, …, L − 1,

where L is the frame length and the signal s(i) denotes the speech signal (the noise-suppressed speech signal if noise reduction is used). In this illustrative embodiment, s(i) denotes the input signal after downsampling to 12.8 kHz and preprocessing. If the pitch lag is greater than 63 samples, t_E is set equal to the closed-loop pitch lag of the last subframe. If the pitch lag is less than 64 samples, t_E is set equal to twice the closed-loop pitch lag of the last subframe.

For the other classes, E is the average energy per sample over the second half of the current frame, i.e. t_E is set equal to L/2 and E is computed as

E = (1/t_E) · Σ s²(i),  i = L − t_E, …, L − 1.
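A sketch of the 6-bit energy quantizer; the source omits the index formula, so mapping the dB value onto the uniform −15..83 dB grid as below, including the small offset that guards log10(0), is an assumption.

import math

def quantize_energy(E, e_min=-15.0, step=1.8, n_bits=6):
    """Uniform 6-bit quantization of the energy in dB (assumed index mapping)."""
    e_db = 10.0 * math.log10(E + 1e-3)
    idx = int((e_db - e_min) / step)
    return max(0, min((1 << n_bits) - 1, idx))

def dequantize_energy(idx, e_min=-15.0, step=1.8):
    """Inverse mapping of the quantization index back to a dB value."""
    return e_min + idx * step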
Phase control information

Phase control is particularly important when recovering after a lost segment of voiced speech, for the reasons described in the previous section. After a block of erased frames, the decoder memories become desynchronized from the encoder memories. Some phase information can be sent to help resynchronize the decoder, depending on the available bandwidth. In the illustrative implementation described here, the approximate position within the frame of the first glottal pulse is sent. This information is then used for the recovery of lost voiced onsets, as described below.

Let T_0 denote the rounded closed-loop pitch lag for the first subframe. The first glottal pulse search and quantization module 507 finds the position τ of the first glottal pulse among the first T_0 samples of the frame by looking for the sample with the maximum amplitude. Best results are obtained when the position of the first glottal pulse is measured on the low-pass filtered residual signal.

The position of the first glottal pulse is encoded with 6 bits as follows. The precision used to encode this position depends on the closed-loop pitch lag T_0 of the first subframe. This is possible because T_0 is known to both the encoder and the decoder and is not subject to error propagation after the loss of one or more frames. When T_0 is less than 64, the position of the first glottal pulse relative to the beginning of the frame is encoded directly with one-sample precision. When 64 ≤ T_0 < 128, the position is encoded with two-sample precision by using simple integer division, i.e. τ/2. When T_0 ≥ 128, the position is encoded with four-sample precision by further dividing τ by 2. The decoder performs the inverse procedure. If T_0 < 64, the received quantized position is used as is. If 64 ≤ T_0 < 128, the received quantized position is multiplied by 2 and incremented by 1. If T_0 ≥ 128, it is multiplied by 4 and incremented by 2 (the increment of 2 yields a uniformly distributed quantization error).
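The position coding just described translates directly into code; a minimal sketch:

def encode_glottal_position(tau, T0):
    """Encode the first glottal pulse position with precision depending on
    the first-subframe pitch lag T0 (1, 2 or 4 samples)."""
    if T0 < 64:
        return tau
    if T0 < 128:
        return tau // 2
    return tau // 4

def decode_glottal_position(q, T0):
    """Inverse procedure performed at the decoder."""
    if T0 < 64:
        return q
    if T0 < 128:
        return 2 * q + 1
    return 4 * q + 2   # +2 gives a uniformly distributed quantization error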

According to another variant of the invention, in which the shape of the first glottal pulse is encoded, the position of the first glottal pulse is determined by a correlation analysis between the residual signal and the possible pulse shapes, signs (positive or negative) and positions. The pulse shape can be taken from a codebook of pulse shapes known to both the encoder and the decoder, a method known to persons skilled in the art as vector quantization. The shape, sign and amplitude of the first glottal pulse are then encoded and transmitted to the decoder.

Periodicity information

If the bandwidth permits, periodicity information, or voicing information, can be computed, transmitted and used in the decoder to improve frame erasure masking. The voicing information is estimated based on the normalized correlation. It can be encoded quite accurately with 4 bits, although 3 or even 2 bits would probably suffice if necessary. The voicing information is usually needed only for frames with some periodic components, and a higher voicing resolution is needed for highly voiced frames. The normalized correlation is given by equation (2) and is used as the voicing information indicator. It is quantized in the first glottal pulse search and quantization module 507. In this illustrative embodiment, a piecewise-linear quantizer was used to encode the voicing information as follows:

,

.

Here again, the integer part of i is encoded and transmitted. The correlation r_x(2) has the same meaning as in equation (1). In equation (18) the voicing is linearly quantized between 0.65 and 0.89 with a step of 0.03. In equation (19) the voicing is linearly quantized between 0.92 and 0.98 with a step of 0.01.

If a wider quantization range is needed, the following linear quantization can be used:

.

This equation quantizes the voicing in the range from 0.4 to 1 with a step of 0.04. The correlation used here is defined in equation (2A).

Equations (18) and (19), or equation (20), are then used in the decoder to reconstruct r_x(2) or the correlation of equation (2A). The quantized normalized correlation is denoted r_q. If the voicing cannot be transmitted, it can be estimated from the voicing factor of equation (2A) by mapping it into the range from 0 to 1.
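A sketch of the piecewise-linear voicing quantizer. The ranges and steps follow the text above; the offset joining the two pieces into a single 4-bit index is an assumption, since equations (18) and (19) themselves are omitted.

def quantize_voicing(rx):
    """4-bit piecewise-linear voicing index (joining offset assumed)."""
    if rx < 0.92:
        return int((min(max(rx, 0.65), 0.89) - 0.65) / 0.03)   # indices 0..8
    return 9 + int((min(rx, 0.98) - 0.92) / 0.01)              # indices 9..15

def quantize_voicing_wide(rx):
    """Wider-range alternative of equation (20): 0.4..1.0, step 0.04."""
    return int((min(max(rx, 0.4), 1.0) - 0.4) / 0.04)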

Processing of erased frames

The FER masking techniques in this illustrative embodiment are demonstrated on an ACELP-type coder; however, they can readily be applied to any speech codec in which the synthesized signal is generated by filtering an excitation signal through an LP synthesis filter. The masking strategy can be summarized as a convergence of the signal energy and the spectral envelope towards the estimated parameters of the background noise. The periodicity of the signal converges to zero. The convergence rate depends on the class of the last correctly received frame and on the number of consecutive erased frames, and is controlled by the attenuation factor α. For UNVOICED frames, the factor α additionally depends on the stability of the LP filter. Generally, the convergence is slow if the last correctly received frame lies in a stable segment and fast if it lies in a transition segment. The values of α are summarized in Table 5.

Table 5

Values of the attenuation factor α for FER masking

Last correctly received frame   Number of consecutive erased frames   α
ARTIFICIAL ONSET                                                      0.6
ONSET, VOICED                   ≤ 3                                   1.0
                                > 3                                   0.4
VOICED TRANSITION                                                     0.4
UNVOICED TRANSITION                                                   0.8
UNVOICED                        = 1                                   0.6 θ + 0.4
                                > 1                                   0.4

The stability factor θ is computed from a distance measure between adjacent LP filters. Here, θ relates to an immittance spectral frequency (ISF) distance measure and is bounded by 0 ≤ θ ≤ 1, with larger values of θ corresponding to more stable signals. This reduces fluctuations of the energy and of the spectral envelope when an isolated erased frame occurs inside a stable unvoiced segment.
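Table 5 maps directly to a small selection routine:

def attenuation_factor(last_class, n_erased, theta):
    """Attenuation factor alpha per Table 5; theta (0..1) is the LP filter
    stability factor and n_erased counts consecutive erased frames."""
    if last_class == "ARTIFICIAL ONSET":
        return 0.6
    if last_class in ("ONSET", "VOICED"):
        return 1.0 if n_erased <= 3 else 0.4
    if last_class == "VOICED TRANSITION":
        return 0.4
    if last_class == "UNVOICED TRANSITION":
        return 0.8
    return 0.6 * theta + 0.4 if n_erased == 1 else 0.4   # UNVOICED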

The signal class remains unchanged while erased frames are processed, i.e. the class remains that of the last correctly received frame.

Construction of the periodic part of the excitation

For masking of erased frames following a correctly received UNVOICED frame, no periodic part of the excitation signal is generated. For masking of erased frames following a correctly received frame other than UNVOICED, the periodic part of the excitation signal is constructed by repeating the last pitch period of the previous frame. If this is the first erased frame after a good frame, the pitch pulse is first low-pass filtered. The filter used is a simple 3-tap linear-phase finite impulse response (FIR) filter with coefficients equal to 0.18, 0.64 and 0.18. If voicing information is available, the filter can also be selected dynamically, with a cutoff frequency depending on the voicing.

The pitch period T_c used to select the last pitch pulse, and hence used during masking, is defined so that pitch multiples or submultiples are avoided or reduced. The following logic is used to determine the pitch period T_c:

if ((T3 < 1.8 Ts) AND (T3 > 0.6 Ts)) OR (Tcnt ≥ 30), then Tc = T3; otherwise Tc = Ts.

Here, T3 is the rounded pitch period of the 4th subframe of the last correctly received frame, and Ts is the rounded pitch period of the 4th subframe of the last correctly received stable voiced frame with coherent pitch estimates. A stable voiced frame is defined here as a VOICED frame preceded by a frame of voiced type (VOICED TRANSITION, VOICED, ONSET). Pitch coherence is verified in this implementation by checking whether the closed-loop pitch estimates are reasonably close, i.e. whether the ratios between the pitch of the last subframe, the pitch of the second subframe and the pitch of the last subframe of the previous frame lie within the interval (0.7, 1.4).

This determination of the pitch period T_c means that, if the pitch at the end of the last good frame and the pitch of the last stable frame are close to each other, the pitch of the last good frame is used. Otherwise, that pitch is considered unreliable and the pitch of the last stable frame is used instead, to avoid the impact of erroneous pitch estimates at voiced onsets. This logic, however, makes sense only if the last stable segment is not too far in the past. Hence a counter Tcnt is defined, which limits the scope of influence of the last stable segment. If Tcnt is greater than or equal to 30, i.e. at least 30 frames have passed since the last Ts update, the pitch of the last good frame is used systematically. Tcnt is reset to 0 every time a stable segment is detected and Ts is updated. The period T_c is then kept constant during the masking of the whole erased block.
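The pitch selection logic and the construction of the periodic part can be sketched as follows; the buffer handling is simplified for illustration.

import numpy as np

def concealment_pitch(T3, Ts, Tcnt):
    """Select the pitch period used during masking (logic quoted above)."""
    if (0.6 * Ts < T3 < 1.8 * Ts) or (Tcnt >= 30):
        return T3
    return Ts

def periodic_part(past_exc, Tc, frame_len, first_erased):
    """Repeat the last pitch period; low-pass filter it with the 3-tap FIR
    (0.18, 0.64, 0.18) on the first erased frame after a good frame."""
    period = np.asarray(past_exc[-Tc:], dtype=float)
    if first_erased:
        period = np.convolve(period, [0.18, 0.64, 0.18], mode="same")
    reps = -(-frame_len // Tc)                  # ceiling division
    return np.tile(period, reps)[:frame_len]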

Since the periodic part of the excitation is built from the last pitch pulse of the previous frame, its gain is approximately correct at the beginning of the masked frame and can be set to 1. The gain then decays linearly throughout the frame, sample by sample, reaching the value α at the end of the frame.

The values of α correspond to Table 5, except that for erasures following VOICED and ONSET frames they are modified to account for the energy evolution of voiced segments. This evolution can be extrapolated to some extent by using the pitch excitation gains of each subframe of the last good frame. In general, if these gains are greater than 1 the signal energy is increasing, and if they are less than 1 it is decreasing. α is therefore multiplied by a correction factor f_b computed as follows:

where b(0), b(1), b(2) and b(3) are the pitch gains of the four subframes of the last correctly received frame. The value of f_b is clipped between 0.98 and 0.85 before being used to scale the periodic part of the excitation. In this way, strong energy increases and decreases are avoided.
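Since the f_b equation is omitted above, the sketch below assumes a weighted average of the four subframe pitch gains that favours the most recent subframes; the weights are an assumption.

def corrected_attenuation(alpha, b):
    """Scale alpha by the pitch-gain correction f_b (assumed weighting),
    clipping f_b between 0.85 and 0.98 as described above."""
    fb = 0.1 * b[0] + 0.2 * b[1] + 0.3 * b[2] + 0.4 * b[3]
    fb = min(0.98, max(0.85, fb))
    return alpha * fb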

For erased frames following a correctly received frame other than UNVOICED, the excitation buffer is updated with this periodic part of the excitation only. This update will be used to construct the pitch codebook excitation in the next frame.

Construction of the random part of the excitation

The innovation (non-periodic) part of the excitation signal is generated randomly. It can be generated as random noise, or by using the CELP innovation codebook with randomly generated vector indices. In the present illustrative embodiment, a simple random generator with an approximately uniform distribution was used. Before adjusting the innovation gain, the randomly generated innovation is scaled to some reference value, fixed here at the unit energy per sample.

At the beginning of an erased block, the innovation gain g_s is initialized from the innovation excitation gains of each subframe of the last good frame:

where g(0), g(1), g(2) and g(3) are the fixed codebook, or innovation, gains of the four (4) subframes of the last correctly received frame. The attenuation strategy for the random part of the excitation differs somewhat from the attenuation of the pitch excitation. The reason is that the pitch excitation (and hence the excitation periodicity) converges to 0, whereas the random excitation converges to the excitation energy of comfort noise generation (CNG). The innovation gain attenuation is performed as

g_s^1 = α · g_s^0 + (1 − α) · g_n,

where g_s^1 is the innovation gain at the beginning of the next frame, g_s^0 is the innovation gain at the beginning of the current frame, g_n is the gain of the excitation used during comfort noise generation, and α is as defined in Table 5. As with the attenuation of the periodic excitation, the gain is attenuated linearly across the frame, sample by sample, starting at g_s^0 and reaching the value g_s^1 at the beginning of the next frame.
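A sketch of the innovation gain handling; the initialization weights are an assumption (the source equation is omitted), while the ramp follows the attenuation relation given above.

import numpy as np

def init_innovation_gain(g):
    """Initial gain g_s from the four subframe innovation gains (weights assumed)."""
    return 0.1 * g[0] + 0.2 * g[1] + 0.3 * g[2] + 0.4 * g[3]

def innovation_gain_ramp(gs0, gn, alpha, frame_len):
    """Per-sample ramp from g_s^0 towards g_s^1 = alpha*g_s^0 + (1-alpha)*g_n,
    which is reached at the beginning of the next frame."""
    gs1 = alpha * gs0 + (1.0 - alpha) * gn
    return np.linspace(gs0, gs1, frame_len, endpoint=False)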

Finally, if the last good (correctly received or reconstructed) frame is different from UNVOICED, the innovation excitation is filtered through a linear-phase FIR high-pass filter with coefficients −0.0125, −0.109, 0.7813, −0.109, −0.0125. To decrease the amount of noisy components during voiced segments, these filter coefficients are multiplied by a correction factor equal to (0.75 − 0.25 r_v), where r_v is the voicing factor defined in equation (1). The random part of the excitation is then added to the adaptive excitation to form the total excitation signal.

If the last good frame is of class UNVOICED, only the innovation excitation is used, and it is further attenuated by a factor of 0.8. In this case, the past excitation buffer is updated with the innovation excitation, since no periodic part of the excitation is available.

Spectral envelope masking, synthesis and updates

To synthesize the decoded speech, the LP filter parameters must be obtained. The spectral envelope is gradually moved towards the estimated envelope of the ambient noise. Here, the ISF representation of the LP parameters is used:

I_1(j) = α · I_0(j) + (1 − α) · I_n(j),  j = 0, …, p − 1,

where, in this equation (25), I_1(j) is the value of the j-th ISF of the current frame, I_0(j) is the j-th ISF of the previous frame, I_n(j) is the j-th ISF of the estimated comfort noise envelope, and p is the order of the LP filter.
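Equation (25) is a one-line interpolation; a sketch:

def conceal_isf(isf_prev, isf_cng, alpha):
    """Move the spectral envelope towards the comfort noise envelope by
    interpolating the ISF vectors as in equation (25)."""
    return [alpha * i0 + (1.0 - alpha) * i_n
            for i0, i_n in zip(isf_prev, isf_cng)]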

The synthesized speech is obtained by filtering the excitation signal through the LP synthesis filter. The filter coefficients are computed from the ISF representation and are interpolated for each subframe (four (4) times per frame), as in normal encoder operation.

Since both the innovation gain quantizer and the ISF quantizer use prediction, their memories will not be up to date when normal operation resumes. To mitigate this effect, the memories of these quantizers are estimated and updated at the end of each erased frame.

Recovery of normal operation after an erasure

The problem of recovery after an erased block of frames is fundamentally due to the strong prediction used in virtually all modern speech coders. In particular, CELP-type speech coders achieve a high signal-to-noise ratio for voiced speech because they use the past excitation signal to encode the excitation of the present frame (long-term prediction, or pitch prediction). Prediction is also used in most quantizers (LP quantizers, gain quantizers).

Artificial onset construction

The most complicated situation related to the use of long-term prediction in CELP coders is the loss of a voiced onset. A lost onset means that the onset of voiced speech occurred somewhere within the erased block. In that case, the last good received frame was unvoiced, so no periodic excitation is found in the excitation buffer. The first good frame after the erased block, however, is voiced: the excitation buffer in the encoder is highly periodic, and the adaptive excitation has been encoded using that periodic excitation. Since this periodic part of the excitation is completely missing in the decoder, it can take several frames to recover from the loss.

If an ONSET frame is lost (i.e. a good VOICED frame arrives after an erasure but the last good frame before the erasure was UNVOICED, as shown in Fig. 6), a special technique is used to artificially reconstruct the lost onset and trigger the voiced synthesis. At the beginning of the first good frame after the lost onset, the periodic part of the excitation is constructed artificially as a low-pass filtered periodic train of pulses separated by a pitch period. In the present illustrative embodiment, the low-pass filter is a simple linear-phase FIR filter with the impulse response h_low = {−0.0125, 0.109, 0.7813, 0.109, −0.0125}; however, the filter could also be selected dynamically, with a cutoff frequency corresponding to the voicing information, if such information is available. The innovation part of the excitation is produced by normal CELP decoding. The innovation codebook entries may also be chosen randomly (or the innovation itself may be generated randomly), since synchronization with the original signal has been lost anyway.

In practice, the length of the artificial onset is limited so that at least one full pitch period is constructed by this method, and the construction continues up to the end of the current subframe; regular ACELP processing then resumes. The pitch period used is the rounded average of the decoded pitch periods of all subframes over which the artificial onset reconstruction applies. The low-pass filtered pulse train is realized by placing copies of the impulse response of the low-pass filter into the adaptive excitation buffer (previously initialized to zero). The first impulse response is centered on the quantized position τ (transmitted in the bit stream) relative to the beginning of the frame, and the remaining pulses are placed at intervals of the averaged pitch up to the end of the last subframe affected by the artificial onset reconstruction. If the available bandwidth is insufficient to transmit the first glottal pulse position, the first impulse response can instead be placed arbitrarily around half a pitch period after the beginning of the current frame.

For example, with a subframe length of 64 samples, suppose the pitch periods of the first and second subframes are p(0) = 70.75 and p(1) = 71. Since these exceed the subframe size of 64, the artificial onset is constructed during the first two subframes, and the pitch period is set to the average pitch over these two subframes, rounded to the nearest integer, i.e. 71. The last two subframes are processed by the regular CELP decoder.
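A sketch of the artificial onset construction; the impulse placement follows the description above, with the buffer handling simplified for illustration.

import numpy as np

H_LOW = np.array([-0.0125, 0.109, 0.7813, 0.109, -0.0125])

def artificial_onset(frame_len, tau, pitch):
    """Place low-pass filter impulse responses spaced by the rounded averaged
    pitch into a zero-initialized excitation buffer; the first response is
    centered on the transmitted quantized position tau."""
    exc = np.zeros(frame_len)
    pos = tau
    while pos < frame_len:
        for k, h in enumerate(H_LOW):
            i = pos + k - len(H_LOW) // 2
            if 0 <= i < frame_len:
                exc[i] += h
        pos += pitch
    return exc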

The energy of the periodic part of the artificial onset excitation is then scaled by the gain corresponding to the quantized and transmitted energy for FER masking (as defined by equations (16) and (17)) and divided by the gain of the LP synthesis filter. The gain of the LP synthesis filter is computed as

g_LP = sqrt( Σ h²(i) ),

where h(i) is the impulse response of the LP synthesis filter. Finally, the artificial onset gain is reduced by multiplying the periodic part by 0.96; alternatively, this value could correspond to the voicing, if bandwidth is available to transmit the voicing information. In an alternative embodiment, without departing from the essence of the present invention, the artificial onset can also be constructed in the past excitation buffer before entering the decoder subframe loop. This would have the advantage of avoiding special processing for constructing the periodic part of the artificial onset, since regular CELP decoding could be used instead.
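A sketch of the LP synthesis filter gain, computed from the energy of a truncated impulse response as reconstructed above; the truncation length is an assumption.

import numpy as np
from scipy.signal import lfilter

def lp_synthesis_gain(a, n=64):
    """Gain of 1/A(z) from the energy of its first n impulse-response samples."""
    impulse = np.zeros(n)
    impulse[0] = 1.0
    h = lfilter([1.0], a, impulse)
    return float(np.sqrt(np.sum(h * h)))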

The LP filter for the output speech synthesis is not interpolated when an artificial onset is constructed. Instead, the received LP parameters are used for the synthesis of the whole frame.

Energy control

The most important task when recovering after an erased block of frames is to properly control the energy of the synthesized speech signal. Controlling the synthesis energy is necessary because of the strong prediction typically used in modern speech coders. Energy control is most important when a block of erased frames occurs during a voiced segment. When a frame erasure arrives after a voiced frame, the excitation of the last good frame is typically used during masking, with some attenuation strategy. When a new LP filter arrives with the first good frame after the erasure, there may be a mismatch between the excitation energy and the gain of the new LP synthesis filter. The new synthesis filter may produce a synthesized signal whose energy differs both from the energy of the last synthesized erased frame and from the energy of the original signal.

Energy control during the first good frame after an erased frame can be summarized as follows: the synthesized signal is scaled so that, at the beginning of the first good frame, its energy matches the energy of the synthesized speech signal at the end of the last erased frame, and then converges towards the transmitted energy value towards the end of the frame, while preventing the energy from increasing excessively.

Energy control is performed in the synthesized speech signal domain. Even though the energy is controlled in the speech domain, it is the excitation signal that must be scaled, since it serves as the long-term prediction memory for the following frames. The synthesis is then repeated to smooth the transition. Let g_0 denote the gain used to scale the first sample of the current frame and g_1 the gain used at the end of the frame. The excitation signal is then scaled as follows:

u_s(i) = g_AGC(i) · u(i),  i = 0, …, L − 1,

where u_s(i) is the scaled excitation, u(i) is the excitation before scaling, L is the frame length, and g_AGC(i) is a gain that starts from g_0 and converges exponentially to g_1:

g_AGC(i) = f_AGC · g_AGC(i − 1) + (1 − f_AGC) · g_1,

with the initialization g_AGC(−1) = g_0, where f_AGC is an attenuation coefficient set in this implementation to 0.98. This value was found experimentally as a compromise between obtaining a smooth transition from the previous (erased) frame, on the one hand, and scaling the last pitch period of the current frame as much as possible towards the correct (transmitted) value, on the other hand. This matters because the transmitted energy value is estimated pitch-synchronously at the end of the frame. The gains g_0 and g_1 are defined as

g_0 = sqrt( E_−1 / E_0 ),  g_1 = sqrt( E_q / E_1 ),

where E_−1 is the energy computed at the end of the previous (erased) frame, E_0 is the energy at the beginning of the current (recovered) frame, E_1 is the energy at the end of the current frame, and E_q is the quantized transmitted energy at the end of the current frame, computed in the encoder according to equations (16) and (17). E_−1 and E_1 are computed in the same way, except that they operate on the synthesized speech signal s'. E_−1 is computed pitch-synchronously using the masking pitch period T_c, and E_1 uses the rounded pitch period T_3 of the last subframe. E_0 is computed similarly, using the rounded pitch value T_0 of the first subframe, with equations (16) and (17) modified to the form

for VOICED and ONSET frames, with t_E equal to the rounded pitch lag, or twice that value if the lag is shorter than 64 samples. For the other frames,

with t_E equal to half the frame length. The gains g_0 and g_1 are further limited to a maximum allowed value, to avoid a strong energy increase; in this illustrative implementation this value was set to 1.2.
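The per-sample gain ramp and the gain definitions reconstructed above combine into the following sketch:

import numpy as np

def agc_scale_excitation(u, E_prev, E0, E1, Eq, f_agc=0.98, g_max=1.2):
    """Scale the excitation from g0 = sqrt(E_-1/E_0) towards g1 = sqrt(E_q/E_1),
    with an exponential per-sample approach and gains capped at g_max."""
    g0 = min(np.sqrt(E_prev / max(E0, 1e-12)), g_max)
    g1 = min(np.sqrt(Eq / max(E1, 1e-12)), g_max)
    us = np.empty(len(u))
    g = g0                                   # g_AGC(-1) = g0
    for i, x in enumerate(u):
        g = f_agc * g + (1.0 - f_agc) * g1   # exponential convergence to g1
        us[i] = g * x
    return us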

When performing frame erasure masking and recovery in the decoder, if the gain of the LP filter of the first non-erased frame received after the erasure is greater than the gain of the LP filter of the last frame erased during that erasure, the energy of the LP filter excitation signal produced in the decoder during the first received non-erased frame is adjusted to the gain of the LP filter of that frame, using the relationship given below.

If E_q cannot be transmitted, E_q is set equal to E_1. However, if the erasure occurred during a voiced speech segment (i.e. the last good frame before the erasure and the first good frame after it are classified as VOICED TRANSITION, VOICED or ONSET), further precautions must be taken because of the potential mismatch between the excitation signal energy and the LP filter gain mentioned above. A particularly dangerous situation arises when the gain of the LP filter of the first non-erased frame received after the erasure is greater than the gain of the LP filter of the last frame erased within the erased block. In that particular case, the energy of the LP filter excitation signal produced in the decoder during the first received non-erased frame is adjusted to the gain of the LP filter of that frame using the following relationship:

where E_LP0 is the energy of the impulse response of the LP filter of the last good frame before the erasure, and E_LP1 is the energy of the LP filter of the first good frame after the erasure. The LP filters of the last subframe in a frame are used in this implementation. Finally, the value of E_q is limited to E_−1 in this case (erasure of a voiced segment without transmitted information about E_q).
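The relationship itself is omitted in the source; the sketch below assumes the target energy is scaled by the ratio of the two impulse-response energies, which is one plausible form of the adjustment.

import numpy as np

def adjust_target_energy(E1, E_prev, h_lp0, h_lp1):
    """Limit E_q when the LP gain grows across the erasure (assumed form:
    E_q = E_1 * E_LP0 / E_LP1), then cap it at the end-of-erasure energy E_-1."""
    E_lp0 = float(np.sum(np.square(h_lp0)))
    E_lp1 = float(np.sum(np.square(h_lp1)))
    Eq = E1 * E_lp0 / E_lp1 if E_lp1 > E_lp0 else E1
    return min(Eq, E_prev)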

The following exceptions, all related to transitions in the speech signal, further override the value of g_0. If an artificial onset is used in the current frame, g_0 is set to 0.5 g_1 to allow the onset energy to increase gradually.

If the first good frame after an erasure is classified as ONSET, the gain g_0 is prevented from exceeding g_1. This precaution is taken to keep the gain adjustment at the beginning of the frame (which is probably at least partly unvoiced) from amplifying the voiced onset (at the end of the frame).

Finally, during a transition from a voiced frame to an unvoiced frame (i.e. the last good frame is classified as VOICED TRANSITION, VOICED or ONSET, and the current frame is classified as UNVOICED), or during a transition from an inactive speech period to an active speech period (the last good received frame is encoded as comfort noise and the current frame is encoded as active speech), g_0 is set equal to g_1.

In the case of an erasure within a voiced segment, the incorrect energy problem can also manifest itself in the frames following the first good frame after the erasure. This can happen even when the energy of the first good frame is adjusted as described above. To attenuate this problem, the energy control may continue up to the end of the voiced segment.

Although the present invention has been described above with reference to an illustrative embodiment, this embodiment can be modified within the scope of the appended claims without departing from the scope and spirit of the invention.

1. A method for masking frames of an encoded sound signal erased during transmission from an encoder to a decoder, the method comprising:

determining, in the encoder, masking/restoration parameters;

transmitting to the decoder the masking/restoration parameters determined in the encoder; and performing erased frame masking and recovery in the decoder in accordance with the masking/restoration parameters.

2. The method according to claim 1, further comprising quantizing, in the encoder, the masking/restoration parameters before transmitting the masking/restoration parameters to the decoder.

3. The method according to claim 1, wherein the masking/restoration parameters are selected from the group consisting of a signal classification parameter, an energy information parameter and a phase information parameter.

4. The method according to claim 3, wherein determining the phase information parameter comprises determining the position of a first glottal pulse in a frame of the encoded sound signal.

5. The method according to claim 1, wherein performing erased frame masking and recovery in the decoder comprises performing recovery in the decoder, after at least one lost voiced onset frame, in accordance with a determined position of the first glottal pulse.

6. The method according to claim 1, wherein performing erased frame masking and recovery in the decoder comprises, when at least one onset frame is lost, constructing a periodic part of the excitation signal as a low-pass filtered periodic train of pulses separated by a pitch period.

7. The method according to claim 6, comprising quantizing the position of the first glottal pulse before transmitting said position to the decoder;

wherein constructing the periodic part of the excitation comprises realizing the low-pass filtered periodic train of pulses by centering a first impulse response of a low-pass filter on the quantized position of the first glottal pulse relative to the beginning of the frame; and

placing the remaining impulse responses of the low-pass filter at distances corresponding to the averaged pitch value from the preceding impulse response, up to the end of the last subframe affected by the construction of the periodic part of the excitation.

8. The method according to claim 4, wherein determining the phase information parameter further comprises encoding, in the encoder, the shape, sign and amplitude of the first glottal pulse, and transmitting the encoded shape, sign and amplitude from the encoder to the decoder.

9. The method according to claim 4, wherein determining the position of the first glottal pulse comprises:

measuring the first glottal pulse as the sample of maximum amplitude within a pitch period; and

quantizing the position of the sample of maximum amplitude within the pitch period.

10. The method according to claim 1, wherein:

the sound signal is a speech signal; and

determining, in the encoder, masking/restoration parameters comprises classifying successive frames of the encoded sound signal as unvoiced frame, unvoiced transition frame, voiced transition frame, voiced frame or onset frame.

11. The method according to claim 10, wherein classifying the successive frames comprises classifying as unvoiced every frame that is an unvoiced frame, every frame without active speech, and every voiced offset frame whose end tends towards an unvoiced state.

12. The method according to claim 10, wherein classifying the successive frames comprises classifying as unvoiced transition every unvoiced frame having an end with a possible voiced onset that is too short or not built well enough to be processed as a voiced frame.

13. The method according to claim 10, wherein classifying the successive frames comprises classifying as voiced transition every voiced frame with relatively weak voiced characteristics, including voiced frames with rapidly changing characteristics and voiced offsets lasting the whole frame, a frame being classified as voiced transition only if it follows a frame classified as voiced transition, voiced frame or onset frame.

14. The method according to claim 10, wherein classifying the successive frames comprises classifying as voiced every voiced frame with stable characteristics, a frame being classified as voiced only if it follows a frame classified as voiced transition, voiced frame or onset frame.

15. The method according to claim 10, wherein classifying the successive frames comprises classifying as onset frame every voiced frame with stable characteristics following a frame classified as unvoiced frame or unvoiced transition.

16. The method according to claim 10, comprising determining the classification of the successive frames of the encoded sound signal on the basis of at least a part of the following parameters: a normalized correlation parameter, a spectral tilt parameter, a signal-to-noise ratio parameter, a pitch stability parameter, a relative frame energy parameter and a zero-crossing parameter.

17. The method according to claim 10, wherein determining the classification of the successive frames comprises:

computing a merit function on the basis of a normalized correlation parameter, a spectral tilt parameter, a signal-to-noise ratio parameter, a pitch stability parameter, a relative frame energy parameter and a zero-crossing parameter; and

comparing the merit function with threshold values to determine the classification.

18. The method according to claim 16, comprising computing the normalized correlation parameter on the basis of a current weighted version of the speech signal and a past weighted version of said speech signal.

19. The method according to claim 16, comprising estimating the spectral tilt parameter as the ratio of the energy concentrated at low frequencies to the energy concentrated at high frequencies.

20. The method according to claim 16, comprising estimating the signal-to-noise ratio parameter as the ratio of the energy of a weighted version of the speech signal of the current frame to the energy of the error between said weighted version of the speech signal of the current frame and a weighted version of the synthesized speech signal of the current frame.

21. The method according to claim 16, comprising computing the pitch stability parameter in accordance with the open-loop pitch estimates for the first half of the current frame, the second half of the current frame and the look-ahead.

22. The method according to claim 16, comprising computing the relative frame energy parameter as the difference between the energy of the current frame and the long-term average energy of active speech frames.

23. The method according to claim 16, comprising determining the zero-crossing parameter as the number of times the sign of the speech signal changes from a first polarity to a second polarity.

24. The method according to claim 16, comprising computing at least one of the normalized correlation parameter, the spectral tilt parameter, the signal-to-noise ratio parameter, the pitch stability parameter, the relative frame energy parameter and the zero-crossing parameter using the available look-ahead, so as to take into account the behaviour of the speech signal in the following frame.

25. The method according to claim 16, further comprising determining the classification of the successive frames of the encoded sound signal additionally on the basis of a voice activity detection flag.

26. The method according to claim 3, wherein:

the sound signal is a speech signal;

determining, in the encoder, masking/restoration parameters comprises classifying successive frames of the encoded sound signal as unvoiced frame, unvoiced transition, voiced transition, voiced frame or onset frame; and

determining the masking/restoration parameters comprises computing the energy information parameter in relation to the maximum signal energy for frames classified as voiced frames or onset frames, and in relation to an average energy per sample for the other frames.

27. The method according to claim 1, wherein determining, in the encoder, masking/restoration parameters comprises computing a voicing information parameter.

28. The method according to claim 27, wherein:

the sound signal is a speech signal;

determining, in the encoder, masking/restoration parameters comprises classifying successive frames of the encoded sound signal;

the method comprises determining the classification of the successive frames of the encoded sound signal on the basis of a normalized correlation; and

computing the voicing information parameter comprises estimating said voicing information on the basis of the normalized correlation.

29. The method according to claim 1, wherein performing erased frame masking and recovery in the decoder comprises:

upon reception of a non-erased unvoiced frame after frame erasure, generating a non-periodic part of an LP filter excitation signal; and

upon reception, after frame erasure, of a non-erased frame other than unvoiced, constructing a periodic part of the LP filter excitation signal by repeating the last pitch period of the previous frame.

30. The method according to claim 29, wherein constructing the periodic part of the LP filter excitation signal comprises filtering the repeated last pitch period of the previous frame through a low-pass filter.

31. The method according to claim 30, wherein:

determining the masking/restoration parameters comprises computing a voicing information parameter;

the low-pass filter has a cutoff frequency; and

constructing the periodic part of the excitation signal comprises dynamically adjusting the cutoff frequency in relation to the voicing information parameter.

32. The method according to claim 1, wherein performing erased frame masking and recovery in the decoder comprises randomly generating a non-periodic, innovation part of an LP filter excitation signal.

33. The method according to claim 32, wherein randomly generating the non-periodic, innovation part of the LP filter excitation signal comprises generating random noise.

34. The method according to claim 32, wherein randomly generating the non-periodic, innovation part of the LP filter excitation signal comprises randomly generating the vector indices of an innovation codebook.

35. The method according to claim 32, wherein:

the sound signal is a speech signal;

determining the masking/restoration parameters comprises classifying successive frames of the encoded sound signal as unvoiced frame, unvoiced transition, voiced transition, voiced frame or onset frame; and

randomly generating the non-periodic, innovation part of the LP filter excitation signal further comprises:

filtering the innovation part of the excitation signal through a high-pass filter if the last correctly received frame is other than unvoiced; and

using only the innovation part of the excitation signal if the last correctly received frame is unvoiced.

36. The method according to claim 1, wherein:

the sound signal is a speech signal;

determining, in the encoder, masking/restoration parameters comprises classifying successive frames of the encoded sound signal as unvoiced frame, unvoiced transition, voiced transition, voiced frame or onset frame; and

performing erased frame masking and recovery in the decoder comprises, when an onset frame is lost, which is indicated by the arrival of a voiced frame after the frame erasure while the frame before the frame erasure was unvoiced, reconstructing the lost onset frame by constructing a periodic part of the excitation signal as a low-pass filtered periodic train of pulses separated by a pitch period.

37. The method according to claim 36, wherein performing erased frame masking and recovery in the decoder further comprises constructing an innovation part of the excitation signal by normal decoding.

38. The method according to claim 37, wherein constructing the innovation part of the excitation signal comprises randomly choosing the entries of an innovation codebook.

39. The method according to claim 36, wherein reconstructing the lost onset frame comprises limiting the length of the reconstructed onset frame so that at least one entire pitch period is constructed by the onset frame reconstruction, said reconstruction being continued to the end of the current subframe.

40. The method according to claim 39, wherein performing erased frame masking and recovery in the decoder further comprises resuming regular CELP processing after the lost onset frame has been reconstructed, the pitch period being the rounded average of the decoded pitch periods of all subframes in which the onset frame reconstruction is used.

41. The method according to claim 3, wherein performing erased frame masking and recovery in the decoder comprises:

controlling the energy of the synthesized sound signal produced by the decoder, wherein controlling the energy of the synthesized sound signal comprises scaling the synthesized sound signal so as to render its energy at the beginning of the first non-erased frame received after frame erasure similar to the energy of the synthesized signal at the end of the last frame erased during said frame erasure; and

converging the energy of the synthesized sound signal in the received first non-erased frame to the energy corresponding to the received energy information parameter towards the end of said received first non-erased frame, while limiting any increase in energy.

42. The method according to claim 3, wherein:

the energy information parameter is not transmitted from the encoder to the decoder; and

performing erased frame masking and recovery in the decoder comprises, when the gain of the LP filter of the first non-erased frame received after frame erasure is greater than the gain of the LP filter of the last frame erased during said frame erasure, adjusting the energy of the LP filter excitation signal produced in the decoder during the received first non-erased frame to the gain of the LP filter of said received first non-erased frame.

43. The method according to claim 42, wherein:

adjusting the energy of the LP filter excitation signal produced in the decoder during the received first non-erased frame to the gain of the LP filter of said received first non-erased frame comprises using the following relationship:

where E_1 is the energy at the end of the current frame, E_LP0 is the energy of the impulse response of the LP filter of the last non-erased frame received before the frame erasure, and E_LP1 is the energy of the impulse response of the LP filter of the received first non-erased frame following the frame erasure.

44. The method according to claim 41, wherein:

the sound signal is a speech signal;

determining, in the encoder, masking/restoration parameters comprises classifying successive frames of the encoded sound signal as unvoiced frame, unvoiced transition, voiced transition, voiced frame or onset frame; and

when the first non-erased frame received after frame erasure is classified as an onset frame, performing erased frame masking and recovery in the decoder comprises limiting to a given value the gain used for scaling the synthesized sound signal.

45. The method according to claim 41, wherein:

the sound signal is a speech signal;

determining, in the encoder, masking/restoration parameters comprises classifying successive frames of the encoded sound signal as unvoiced frame, unvoiced transition, voiced transition, voiced frame or onset frame; and

the method comprises making the gain used for scaling the synthesized sound signal at the beginning of the first non-erased frame received after frame erasure equal to the gain used at the end of said received first non-erased frame:

during a transition from a voiced frame to an unvoiced frame, in the case where the last non-erased frame received before the frame erasure is classified as voiced transition, voiced frame or onset frame and the first non-erased frame received after the frame erasure is classified as unvoiced frame; and

during a transition from an inactive speech period to an active speech period, in the case where the last non-erased frame received before the frame erasure is encoded as comfort noise and the first non-erased frame received after the frame erasure is encoded as active speech.

46. Method of masking erased frames of the encoded sound signal erased during transmission from the encoder to the decoder, moreover, the method includes

the definition in the encoder settings mask/restore; and transmitting to the decoder parameters mask/restore defined in the encoder.

47. The method according to item 46, optionally containing quantization of the coder parameters mask/restore before passing the specified parameters mask/restore in the decoder.

48. The method according to claim 46, in which the concealment/recovery parameters are selected from the group consisting of a signal classification parameter, an energy information parameter and a phase information parameter.

49. The method according to claim 48, in which determining the phase information parameter comprises determining the position of a first glottal pulse in a frame of the encoded sound signal.

50. The method according to claim 49, in which determining the phase information parameter further comprises encoding, in the encoder, the shape, sign and amplitude of the first glottal pulse, and transmitting the encoded shape, sign and amplitude from the encoder to the decoder.

51. The method according to claim 49, in which determining the position of the first glottal pulse comprises:

measuring the first glottal pulse as the sample of maximum amplitude within a pitch period; and

quantizing the position of the sample of maximum amplitude within the pitch period.
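
An illustrative sketch of claim 51 (not part of the claimed subject matter), assuming the pulse search runs on the LP residual and that the position is quantized uniformly over the pitch period; both choices are assumptions.

    import numpy as np

    def first_glottal_pulse_position(residual, pitch, bits=6):
        """Locate and quantize the first glottal pulse (cf. claims 49 and 51)."""
        cycle = np.asarray(residual[:pitch], dtype=float)
        pos = int(np.argmax(np.abs(cycle)))             # sample of maximum amplitude
        levels = 1 << bits
        index = min(pos * levels // pitch, levels - 1)  # uniform position index
        return pos, index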

52. The method according to claim 46, in which:

the sound signal is a speech signal; and

determining, in the encoder, concealment/recovery parameters comprises classifying successive frames of the encoded sound signal as unvoiced frame, unvoiced transition, voiced transition, voiced frame or onset frame.

53. The method according to claim 52, in which classifying the successive frames comprises classifying as unvoiced every frame which is an unvoiced frame, every frame without active speech, and every voiced offset frame whose end tends to be unvoiced.

54. The method according to claim 52, in which classifying the successive frames comprises classifying as unvoiced transition every unvoiced frame whose end contains a possible voiced onset which is too short or not built well enough to be processed as a voiced frame.

55. The method according to claim 52, in which classifying the successive frames comprises classifying as voiced transition every voiced frame with relatively weak voiced characteristics, including voiced frames with rapidly changing characteristics and voiced offsets lasting the whole frame, a frame being classified as voiced transition only if it follows a frame classified as voiced transition, voiced frame or onset frame.

56. The method according to claim 52, in which classifying the successive frames comprises classifying as voiced every voiced frame with stable characteristics, a frame being classified as voiced only if it follows a frame classified as voiced transition, voiced frame or onset frame.

57. The method according to claim 52, in which classifying the successive frames comprises classifying as onset frame every voiced frame with stable characteristics that follows a frame classified as unvoiced frame or unvoiced transition.

58. The method according to claim 52, comprising determining the classification of the successive frames of the encoded sound signal on the basis of at least a part of the following parameters: a normalized correlation parameter, a spectral tilt parameter, a signal-to-noise ratio parameter, a pitch stability parameter, a relative frame energy parameter and a zero-crossing parameter.

59. The method according to claim 58, in which determining the classification of the successive frames comprises:

computing a figure of merit on the basis of the normalized correlation parameter, the spectral tilt parameter, the signal-to-noise ratio parameter, the pitch stability parameter, the relative frame energy parameter and the zero-crossing parameter; and

comparing the figure of merit with threshold values to determine the classification.
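
The figure-of-merit decision of claim 59 can be sketched as follows (illustration only): every scaling, weight and threshold below is an assumption, since the claim requires only that some figure of merit be computed from the six parameters and compared with thresholds. No onset class is produced here because, per claim 57, onset is derived from the class of the preceding frame.

    def classify_frame(norm_corr, tilt_db, snr_db, pitch_dev, rel_energy_db, zc):
        """Map the six parameters of claim 58 to a class via a figure of merit
        compared with thresholds (cf. claim 59). All constants are illustrative."""
        scaled = [
            min(max(norm_corr, 0.0), 1.0),                  # high for voiced speech
            min(max(tilt_db / 30.0, 0.0), 1.0),             # low-band dominance
            min(max(snr_db / 30.0, 0.0), 1.0),              # good waveform match
            1.0 - min(pitch_dev, 1.0),                      # stable pitch -> high merit
            min(max((rel_energy_db + 14.0) / 28.0, 0.0), 1.0),
            1.0 - min(zc / 80.0, 1.0),                      # many crossings -> unvoiced
        ]
        merit = sum(scaled) / len(scaled)
        if merit >= 0.66:
            return "VOICED"
        if merit >= 0.49:
            return "VOICED TRANSITION"
        if merit >= 0.31:
            return "UNVOICED TRANSITION"
        return "UNVOICED"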

60. The method according to claim 58, comprising computing the normalized correlation parameter on the basis of a current weighted version of the speech signal and a past weighted version of the speech signal.

61. The method according to claim 58, comprising estimating the spectral tilt parameter as the ratio of the energy concentrated at low frequencies to the energy concentrated at high frequencies.

62. The method according to claim 58, comprising estimating the signal-to-noise ratio parameter as the ratio of the energy of the weighted version of the speech signal of the current frame to the energy of the error between the weighted version of the speech signal of the current frame and the weighted version of the synthesized speech signal of the current frame.

63. The method according to claim 58, comprising computing the pitch stability parameter on the basis of open-loop pitch estimates for the first half of the current frame, the second half of the current frame and the lookahead.

64. The method according to claim 58, comprising computing the relative frame energy parameter as the difference between the energy of the current frame and the long-term average energy of active speech frames.

65. The method according to claim 58, comprising determining the zero-crossing parameter as the number of times the sign of the speech signal changes from a first polarity to a second polarity.

66. The method according to claim 58, comprising computing at least one of the normalized correlation parameter, the spectral tilt parameter, the signal-to-noise ratio parameter, the pitch stability parameter, the relative frame energy parameter and the zero-crossing parameter using the available lookahead, so as to take the behavior of the speech signal in the following frame into account.
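
For illustration (not part of the claims), several of the parameters of claims 60 to 66 could be computed per frame as below; the half-spectrum split used for the tilt, the dB form of the relative energy and the handling of the correlation lag are assumptions.

    import numpy as np

    def classification_parameters(sw, sw_past, pitch_lag, lt_avg_db):
        """Some parameters of claims 60-66 for one frame.

        sw      -- weighted speech of the current frame
        sw_past -- weighted speech preceding the frame (>= pitch_lag samples)
        """
        # Normalized correlation at the pitch lag (claim 60)
        full = np.concatenate([sw_past, sw])
        y = full[len(sw_past) - pitch_lag : len(full) - pitch_lag]
        norm_corr = np.dot(sw, y) / np.sqrt(np.dot(sw, sw) * np.dot(y, y) + 1e-12)
        # Spectral tilt as low-band over high-band energy (claim 61)
        spec = np.abs(np.fft.rfft(sw)) ** 2
        half = len(spec) // 2
        tilt = np.sum(spec[:half]) / (np.sum(spec[half:]) + 1e-12)
        # Zero crossings: sign changes of the signal (claim 65)
        zc = int(np.count_nonzero(sw[:-1] * sw[1:] < 0))
        # Relative frame energy versus the long-term active-speech average (claim 64)
        rel_energy = 10.0 * np.log10(np.mean(sw ** 2) + 1e-12) - lt_avg_db
        return norm_corr, tilt, zc, rel_energy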

67. The method according to claim 58, further comprising determining the classification of the successive frames of the encoded sound signal also on the basis of a voice activity detection flag.

68. The method according to claim 48, in which:

the sound signal is a speech signal;

determining, in the encoder, concealment/recovery parameters comprises classifying successive frames of the encoded sound signal as unvoiced frame, unvoiced transition, voiced transition, voiced frame or onset frame; and

determining the concealment/recovery parameters comprises computing the energy information parameter in relation to the maximum of the signal energy for frames classified as voiced frame or onset frame, and computing the energy information parameter in relation to the average energy per sample for the other frames.

69. The method according to claim 46, in which determining, in the encoder, concealment/recovery parameters comprises computing a voicing information parameter.

70. The method according to claim 69, in which:

the sound signal is a speech signal;

determining, in the encoder, concealment/recovery parameters comprises classifying successive frames of the encoded sound signal;

the method comprises determining the classification of the successive frames of the encoded sound signal on the basis of a normalized correlation; and

computing the voicing information parameter comprises estimating said voicing information parameter on the basis of the normalized correlation.

71. A method for the concealment of frames of a sound signal erased during transmission of the sound signal, in the form of signal-encoding parameters, from an encoder to a decoder, the method comprising:

determining, in the decoder, concealment/recovery parameters from the signal-encoding parameters;

performing erased frame concealment and recovery in the decoder in accordance with the concealment/recovery parameters determined in the decoder.

72. The method according to claim 71, in which the concealment/recovery parameters are selected from the group consisting of a signal classification parameter, an energy information parameter and a phase information parameter.

73. The method according to claim 71, in which:

the sound signal is a speech signal; and

determining, in the decoder, concealment/recovery parameters comprises classifying successive frames of the encoded sound signal as unvoiced frame, unvoiced transition, voiced transition, voiced frame or onset frame.

74. The method according to claim 71, in which determining, in the decoder, concealment/recovery parameters comprises computing a voicing information parameter.

75. The method according to claim 73, in which performing erased frame concealment and recovery in the decoder comprises:

upon reception of a non-erased unvoiced frame after a frame erasure, generating a non-periodic part of the LP filter excitation signal; and

upon reception, after a frame erasure, of a non-erased frame other than unvoiced, constructing a periodic part of the LP filter excitation signal by repeating the last pitch period of the previous frame.

76. The method according to claim 75, in which constructing the periodic part of the excitation signal comprises filtering the repeated last pitch period of the previous frame through a low-pass filter.

77. The method according to claim 76, in which:

determining, in the decoder, concealment/recovery parameters comprises computing a voicing information parameter;

the low-pass filter has a cut-off frequency; and

constructing the periodic part of the LP filter excitation signal comprises dynamically adjusting the cut-off frequency in relation to the voicing information parameter.
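
A sketch of the periodic-part construction of claims 75 to 77 (illustrative only): the last pitch cycle of the past excitation is tiled over the frame and low-pass filtered, the cut-off rising with a voicing measure in [0, 1]. The FIR realization, the 31-tap order and the linear voicing-to-cut-off mapping are assumptions.

    import numpy as np
    from scipy.signal import firwin, lfilter

    def periodic_part(past_exc, pitch, frame_len, voicing, fs=12800):
        """Repeat the last pitch cycle and low-pass it (cf. claims 75-77)."""
        cycle = np.asarray(past_exc[-pitch:], dtype=float)
        reps = -(-frame_len // pitch)                     # ceiling division
        periodic = np.tile(cycle, reps)[:frame_len]
        # Weakly voiced -> strong low-pass; fully voiced -> nearly all-pass.
        cutoff = 500.0 + voicing * (0.95 * fs / 2.0 - 500.0)
        taps = firwin(31, cutoff, fs=fs)
        return lfilter(taps, [1.0], periodic)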

78. The method according to claim 71, in which performing erased frame concealment and recovery in the decoder comprises generating, at random, a non-periodic innovation part of the LP filter excitation signal.

79. The method according to claim 78, in which generating, at random, the non-periodic innovation part of the LP filter excitation signal comprises generating random noise.

80. The method according to claim 78, in which generating, at random, the non-periodic innovation part of the LP filter excitation signal comprises generating, at random, an index of a vector of an innovation codebook.
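
The two random-innovation alternatives of claims 79 and 80 can be sketched as follows (illustration only); the Gaussian noise source and the codebook layout, one vector per row, are assumptions.

    import numpy as np

    def random_innovation(frame_len, codebook=None, rng=None):
        """Random non-periodic innovation part (cf. claims 78-80)."""
        if rng is None:
            rng = np.random.default_rng()
        if codebook is None:
            return rng.standard_normal(frame_len)     # claim 79: random noise
        index = int(rng.integers(len(codebook)))      # claim 80: random codebook index
        return np.asarray(codebook[index], dtype=float)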

81. The method according to claim 78, in which:

the sound signal is a speech signal;

determining, in the decoder, concealment/recovery parameters comprises classifying successive frames of the encoded sound signal as unvoiced frame, unvoiced transition, voiced transition, voiced frame or onset frame;

generating, at random, the non-periodic innovation part of the LP filter excitation signal further comprises filtering the innovation part of the LP filter excitation signal through a high-pass filter if the last non-erased frame received is different from unvoiced; and

using only the innovation part of the LP filter excitation signal if the last non-erased frame received is unvoiced.

82. The method according to claim 71, in which:

the sound signal is a speech signal;

determining, in the decoder, concealment/recovery parameters comprises classifying successive frames of the encoded sound signal as unvoiced frame, unvoiced transition, voiced transition, voiced frame or onset frame; and

performing erased frame concealment and recovery in the decoder comprises, when an onset frame is lost, which is indicated by the arrival of a voiced frame after the erasure while the frame before the erasure was unvoiced, reconstructing the lost onset frame by constructing a periodic part of the excitation signal as a low-pass filtered periodic train of pulses separated by a pitch period.

83. The method according to claim 82, in which performing erased frame concealment and recovery in the decoder further comprises constructing an innovation part of the LP filter excitation signal by standard decoding.

84. The method according to claim 83, in which constructing the innovation part of the LP filter excitation signal comprises randomly selecting entries of an innovation codebook.

85. The method according to claim 82, in which reconstructing the lost onset frame comprises limiting the length of the reconstructed onset frame so that at least one whole pitch period is constructed by the onset frame reconstruction, the reconstruction being continued to the end of the current subframe.

86. The method according to claim 85, in which performing erased frame concealment and recovery in the decoder further comprises, after reconstruction of the lost onset frame, resuming regular CELP processing, wherein the pitch period is the rounded average of the decoded pitch periods of all the subframes in which onset frame reconstruction was used.

87. The method according to claim 72, in which:

the energy information parameter is not transmitted from the encoder to the decoder; and

performing erased frame concealment and recovery in the decoder comprises, if the gain of the LP filter of a first non-erased frame received after frame erasure is higher than the gain of the LP filter of the last frame erased during the frame erasure, adjusting the energy of the LP filter excitation signal produced in the decoder during the received first non-erased frame to the gain of the LP filter of the received first non-erased frame, using the following relation:

Eq = E1 · (ELP0 / ELP1),

where E1 is the energy at the end of the current frame, ELP0 is the energy of the impulse response of the LP filter of the last non-erased frame received before the frame erasure, and ELP1 is the energy of the impulse response of the LP filter of the received first non-erased frame following the frame erasure.

88. A device for performing the concealment of frames of an encoded sound signal erased during transmission from an encoder to a decoder, the device comprising:

means for determining, in the encoder, concealment/recovery parameters;

means for transmitting to the decoder the concealment/recovery parameters determined in the encoder; and

means for performing erased frame concealment and recovery in the decoder in accordance with the concealment/recovery parameters determined by the determining means.

89. The device according to claim 88, further comprising means for quantizing, in the encoder, the concealment/recovery parameters before transmission of the concealment/recovery parameters to the decoder.

90. The device according to claim 88, in which the concealment/recovery parameters are selected from the group consisting of a signal classification parameter, an energy information parameter and a phase information parameter.

91. The device according to claim 90, in which the means for determining the phase information parameter comprises means for determining the position of a first glottal pulse in a frame of the encoded sound signal.

92. The device according to claim 91, in which the means for performing erased frame concealment and recovery in the decoder comprises means for performing recovery in the decoder, in accordance with the determined position of the first glottal pulse, after at least one lost voiced onset frame.

93. The device according to claim 92, in which the means for performing erased frame concealment and recovery in the decoder comprises means for constructing, when at least one onset frame is lost, a periodic excitation part as a low-pass filtered periodic train of pulses separated by a pitch period.

94. The device according to claim 93, comprising means for quantizing the position of the first glottal pulse before transmission of said position to the decoder;

wherein the means for constructing the periodic excitation part comprises means for realizing the low-pass filtered periodic train of pulses by:

centering the first impulse response of the low-pass filter on the quantized position of the first glottal pulse, counted from the beginning of the frame; and

placing each remaining impulse response of the low-pass filter at a distance corresponding to the averaged pitch value from the preceding impulse response, up to the end of the last subframe affected by the construction of the periodic part.

95. The device according to claim 91, in which the means for determining the phase information parameter further comprises means for encoding, in the encoder, the shape, sign and amplitude of the first glottal pulse, and means for transmitting the encoded shape, sign and amplitude from the encoder to the decoder.

96. The device according to claim 91, in which the means for determining the position of the first glottal pulse comprises:

means for measuring the first glottal pulse as the sample of maximum amplitude within a pitch period; and

means for quantizing the position of the sample of maximum amplitude within the pitch period.

97. The device according to claim 88, in which:

the sound signal is a speech signal; and

the means for determining, in the encoder, concealment/recovery parameters comprises means for classifying successive frames of the encoded sound signal as unvoiced frame, unvoiced transition, voiced transition, voiced frame or onset frame.

98. The device according to claim 97, in which the means for classifying the successive frames comprises means for classifying as unvoiced every frame which is an unvoiced frame, every frame without active speech, and every voiced offset frame whose end tends to be unvoiced.

99. The device according to claim 97, in which the means for classifying the successive frames comprises means for classifying as unvoiced transition every unvoiced frame whose end contains a possible voiced onset which is too short or not built well enough to be processed as a voiced frame.

100. The device according to claim 97, in which the means for classifying the successive frames comprises means for classifying as voiced transition every voiced frame with relatively weak voiced characteristics, including voiced frames with rapidly changing characteristics and voiced offsets lasting the whole frame, a frame being classified as voiced transition only if it follows a frame classified as voiced transition, voiced frame or onset frame.

101. The device according to claim 97, in which the means for classifying the successive frames comprises means for classifying as voiced every voiced frame with stable characteristics, a frame being classified as voiced only if it follows a frame classified as voiced transition, voiced frame or onset frame.

102. The device according to claim 97, in which the means for classifying the successive frames comprises means for classifying as onset frame every voiced frame with stable characteristics that follows a frame classified as unvoiced frame or unvoiced transition.

103. The device according to claim 97, comprising means for determining the classification of the successive frames of the encoded sound signal on the basis of at least a part of the following parameters: a normalized correlation parameter, a spectral tilt parameter, a signal-to-noise ratio parameter, a pitch stability parameter, a relative frame energy parameter and a zero-crossing parameter.

104. The device according to claim 103, in which the means for determining the classification of the successive frames comprises:

means for computing a figure of merit on the basis of the normalized correlation parameter, the spectral tilt parameter, the signal-to-noise ratio parameter, the pitch stability parameter, the relative frame energy parameter and the zero-crossing parameter; and

means for comparing the figure of merit with threshold values to determine the classification.

105. The device according to claim 103, comprising means for computing the normalized correlation parameter on the basis of a current weighted version of the speech signal and a past weighted version of the speech signal.

106. The device according to claim 103, comprising means for estimating the spectral tilt parameter as the ratio of the energy concentrated at low frequencies to the energy concentrated at high frequencies.

107. The device according to claim 103, comprising means for estimating the signal-to-noise ratio parameter as the ratio of the energy of the weighted version of the speech signal of the current frame to the energy of the error between the weighted version of the speech signal of the current frame and the weighted version of the synthesized speech signal of the current frame.

108. The device according to claim 103, comprising means for computing the pitch stability parameter on the basis of open-loop pitch estimates for the first half of the current frame, the second half of the current frame and the lookahead.

109. The device according to claim 103, comprising means for computing the relative frame energy parameter as the difference between the energy of the current frame and the long-term average energy of active speech frames.

110. The device according to claim 103, comprising means for determining the zero-crossing parameter as the number of times the sign of the speech signal changes from a first polarity to a second polarity.

111. The device according to claim 103, comprising means for computing at least one of the normalized correlation parameter, the spectral tilt parameter, the signal-to-noise ratio parameter, the pitch stability parameter, the relative frame energy parameter and the zero-crossing parameter using the available lookahead, so as to take the behavior of the speech signal in the following frame into account.

112. The device according to claim 103, further comprising means for determining the classification of the successive frames of the encoded sound signal also on the basis of a voice activity detection flag.

113. The device according to claim 90, in which:

the sound signal is a speech signal;

the means for determining, in the encoder, concealment/recovery parameters comprises means for classifying successive frames of the encoded sound signal as unvoiced frame, unvoiced transition, voiced transition, voiced frame or onset frame; and

the means for determining the concealment/recovery parameters comprises means for computing the energy information parameter in relation to the maximum of the signal energy for frames classified as voiced frame or onset frame, and means for computing the energy information parameter in relation to the average energy per sample for the other frames.

114. The device according to claim 88, in which the means for determining, in the encoder, concealment/recovery parameters comprises means for computing a voicing information parameter.

115. The device according to claim 114, in which:

the sound signal is a speech signal;

the means for determining, in the encoder, concealment/recovery parameters comprises means for classifying successive frames of the encoded sound signal;

the device comprises means for determining the classification of the successive frames of the encoded sound signal on the basis of a normalized correlation; and

the means for computing the voicing information parameter comprises means for estimating said voicing information parameter on the basis of the normalized correlation.

116. The device according to claim 97, in which the means for performing erased frame concealment and recovery in the decoder comprises:

means for generating, upon reception of a non-erased unvoiced frame after a frame erasure, a non-periodic part of the LP filter excitation signal; and

means for constructing, upon reception, after a frame erasure, of a non-erased frame other than unvoiced, a periodic part of the LP filter excitation signal by repeating the last pitch period of the previous frame.

117. The device according to claim 116, in which the means for constructing the periodic part of the LP filter excitation signal comprises a low-pass filter for filtering the repeated last pitch period of the previous frame.

118. The device according to claim 117, in which:

the means for determining the concealment/recovery parameters comprises means for computing a voicing information parameter;

the low-pass filter has a cut-off frequency; and

the means for constructing the periodic part of the excitation signal comprises means for dynamically adjusting the cut-off frequency in relation to the voicing information parameter.

119. The device according to claim 88, in which the means for performing erased frame concealment and recovery in the decoder comprises means for generating, at random, a non-periodic innovation part of the LP filter excitation signal.

120. The device according to claim 119, in which the means for generating, at random, the non-periodic innovation part of the LP filter excitation signal comprises means for generating random noise.

121. The device according to claim 119, in which the means for generating, at random, the non-periodic innovation part of the LP filter excitation signal comprises means for generating, at random, an index of a vector of an innovation codebook.

122. The device according to claim 119, in which:

the sound signal is a speech signal;

the means for determining the concealment/recovery parameters comprises means for classifying successive frames of the encoded sound signal as unvoiced frame, unvoiced transition, voiced transition, voiced frame or onset frame; and

the means for generating, at random, the non-periodic innovation part of the LP filter excitation signal further comprises:

a high-pass filter for filtering the innovation part of the excitation signal if the last non-erased frame received is different from unvoiced; and

means for using only the innovation part of the excitation signal if the last correctly received frame is unvoiced.

123. The device according to claim 88, in which:

the sound signal is a speech signal;

the means for determining, in the encoder, concealment/recovery parameters comprises means for classifying successive frames of the encoded sound signal as unvoiced frame, unvoiced transition, voiced transition, voiced frame or onset frame; and

the means for performing erased frame concealment and recovery in the decoder comprises means for reconstructing a lost onset frame, when an onset frame is lost, which is indicated by the arrival of a voiced frame after the erasure while the frame before the erasure was unvoiced, by constructing a periodic part of the excitation signal as a low-pass filtered periodic train of pulses separated by a pitch period.

124. The device according to claim 123, in which the means for performing erased frame concealment and recovery in the decoder further comprises means for constructing an innovation part of the LP filter excitation signal by standard decoding.

125. The device according to claim 124, in which the means for constructing the innovation part of the LP filter excitation signal comprises means for randomly selecting entries of an innovation codebook.

126. The device according to claim 123, in which the means for reconstructing the lost onset frame comprises means for limiting the length of the reconstructed onset frame so that at least one whole pitch period is constructed by the onset frame reconstruction, the reconstruction being continued to the end of the current subframe.

127. The device according to claim 126, in which the means for performing erased frame concealment and recovery in the decoder further comprises means for resuming regular CELP processing after reconstruction of the lost onset frame, wherein the pitch period is the rounded average of the decoded pitch periods of all the subframes in which onset frame reconstruction was used.

128. The device according to claim 88, in which the means for performing erased frame concealment and recovery in the decoder comprises:

means for controlling the energy of the synthesized sound signal produced by the decoder, the means for controlling the energy of the synthesized sound signal comprising means for scaling the synthesized sound signal so that the energy of said synthesized sound signal at the beginning of the first non-erased frame received after frame erasure is similar to the energy of the synthesized signal at the end of the last frame erased during the frame erasure; and

means for converging the energy of the synthesized sound signal in the received first non-erased frame to the energy corresponding to the received energy information parameter toward the end of that frame, while limiting any increase in energy.

129. The device according to claim 88, in which:

the energy information parameter is not transmitted from the encoder to the decoder; and

the means for performing erased frame concealment and recovery in the decoder comprises means for adjusting the energy of the LP filter excitation signal produced in the decoder during the received first non-erased frame to the gain of the LP filter of the received first non-erased frame, if the gain of the LP filter of the first non-erased frame received after frame erasure is higher than the gain of the LP filter of the last erased frame.

130. The device according to claim 129, in which the means for adjusting the energy of the LP filter excitation signal produced in the decoder during the received first non-erased frame to the gain of the LP filter of the received first non-erased frame comprises means for using the following relation:

Eq = E1 · (ELP0 / ELP1),

where E1 is the energy at the end of the current frame, ELP0 is the energy of the impulse response of the LP filter of the last non-erased frame received before the frame erasure, and ELP1 is the energy of the impulse response of the LP filter of the received first non-erased frame following the frame erasure.

131. The device according to claim 128, in which:

the sound signal is a speech signal;

the means for determining, in the encoder, concealment/recovery parameters comprises means for classifying successive frames of the encoded sound signal as unvoiced frame, unvoiced transition, voiced transition, voiced frame or onset frame; and

when the first non-erased frame received after frame erasure is classified as an onset frame, the means for performing erased frame concealment and recovery in the decoder comprises means for limiting, to a given value, the gain used for scaling the synthesized sound signal.

132. The device according to claim 128, in which:

the sound signal is a speech signal;

the means for determining, in the encoder, concealment/recovery parameters comprises means for classifying successive frames of the encoded sound signal as unvoiced frame, unvoiced transition, voiced transition, voiced frame or onset frame; and

the device comprises means for making the gain used for scaling the synthesized sound signal at the beginning of the first non-erased frame received after frame erasure equal to the gain used at the end of that received first non-erased frame:

during a transition from a voiced frame to an unvoiced frame, when the last non-erased frame received before the frame erasure is classified as voiced transition, voiced frame or onset frame and the first non-erased frame received after the frame erasure is classified as an unvoiced frame; and

during a transition from a period of inactive speech to a period of active speech, when the last non-erased frame received before the frame erasure is encoded as comfort noise and the first non-erased frame received after the frame erasure is encoded as active speech.

133. A device for performing the concealment of frames of an encoded sound signal erased during transmission from an encoder to a decoder, comprising:

means for determining, in the encoder, concealment/recovery parameters; and

means for transmitting to the decoder the concealment/recovery parameters determined in the encoder.

134. The device according to claim 133, further comprising means for quantizing, in the encoder, the concealment/recovery parameters before transmission of said concealment/recovery parameters to the decoder.

135. The device according to claim 133, in which the concealment/recovery parameters are selected from the group consisting of a signal classification parameter, an energy information parameter and a phase information parameter.

136. The device according to claim 135, in which the means for determining the phase information parameter comprises means for determining the position of a first glottal pulse in a frame of the encoded sound signal.

137. The device according to claim 136, in which the means for determining the phase information parameter further comprises means for encoding, in the encoder, the shape, sign and amplitude of the first glottal pulse, and means for transmitting the encoded shape, sign and amplitude from the encoder to the decoder.

138. The device according to claim 136, in which the means for determining the position of the first glottal pulse comprises:

means for measuring the first glottal pulse as the sample of maximum amplitude within a pitch period; and

means for quantizing the position of the sample of maximum amplitude within the pitch period.

139. The device according to claim 133, in which the sound signal is a speech signal, and the means for determining, in the encoder, concealment/recovery parameters comprises means for classifying successive frames of the encoded sound signal as unvoiced frame, unvoiced transition, voiced transition, voiced frame or onset frame.

140. The device according to claim 139, in which the means for classifying the successive frames comprises means for classifying as unvoiced every frame which is an unvoiced frame, every frame without active speech, and every voiced offset frame whose end tends to be unvoiced.

141. The device according to claim 139, in which the means for classifying the successive frames comprises means for classifying as unvoiced transition every unvoiced frame whose end contains a possible voiced onset which is too short or not built well enough to be processed as a voiced frame.

142. The device according to claim 139, in which the means for classifying the successive frames comprises means for classifying as voiced transition every voiced frame with relatively weak voiced characteristics, including voiced frames with rapidly changing characteristics and voiced offsets lasting the whole frame, a frame being classified as voiced transition only if it follows a frame classified as voiced transition, voiced frame or onset frame.

143. The device according to claim 139, in which the means for classifying the successive frames comprises means for classifying as voiced frame every voiced frame with stable characteristics, a frame being classified as voiced only if it follows a frame classified as voiced transition, voiced frame or onset frame.

144. The device according to claim 139, in which the means for classifying the successive frames comprises means for classifying as onset frame every voiced frame with stable characteristics that follows a frame classified as unvoiced frame or unvoiced transition.

145. The device according to claim 139, comprising means for determining the classification of the successive frames of the encoded sound signal on the basis of at least a part of the following parameters: a normalized correlation parameter, a spectral tilt parameter, a signal-to-noise ratio parameter, a pitch stability parameter, a relative frame energy parameter and a zero-crossing parameter.

146. The device according to claim 145, in which the means for determining the classification of the successive frames comprises:

means for computing a figure of merit on the basis of the normalized correlation parameter, the spectral tilt parameter, the signal-to-noise ratio parameter, the pitch stability parameter, the relative frame energy parameter and the zero-crossing parameter; and

means for comparing the figure of merit with threshold values to determine the classification.

147. The device according to claim 145, comprising means for computing the normalized correlation parameter on the basis of a current weighted version of the speech signal and a past weighted version of the speech signal.

148. The device according to claim 145, comprising means for estimating the spectral tilt parameter as the ratio of the energy concentrated at low frequencies to the energy concentrated at high frequencies.

149. The device according to claim 145, comprising means for estimating the signal-to-noise ratio parameter as the ratio of the energy of the weighted version of the speech signal of the current frame to the energy of the error between the weighted version of the speech signal of the current frame and the weighted version of the synthesized speech signal of the current frame.

150. The device according to claim 145, comprising means for computing the pitch stability parameter on the basis of open-loop pitch estimates for the first half of the current frame, the second half of the current frame and the lookahead.

151. The device according to claim 145, comprising means for computing the relative frame energy parameter as the difference between the energy of the current frame and the long-term average energy of active speech frames.

152. The device according to claim 145, comprising means for determining the zero-crossing parameter as the number of times the sign of the speech signal changes from a first polarity to a second polarity.

153. The device according to claim 145, comprising means for computing at least one of the normalized correlation parameter, the spectral tilt parameter, the signal-to-noise ratio parameter, the pitch stability parameter, the relative frame energy parameter and the zero-crossing parameter using the available lookahead, so as to take the behavior of the speech signal in the following frame into account.

154. The device according to claim 145, further comprising means for determining the classification of the successive frames of the encoded sound signal also on the basis of a voice activity detection flag.

155. The device according to claim 135, in which:

the sound signal is a speech signal;

the means for determining, in the encoder, concealment/recovery parameters comprises means for classifying successive frames of the encoded sound signal as unvoiced frame, unvoiced transition, voiced transition, voiced frame or onset frame; and

the means for determining the concealment/recovery parameters comprises means for computing the energy information parameter in relation to the maximum of the signal energy for frames classified as voiced frame or onset frame, and means for computing the energy information parameter in relation to the average energy per sample for the other frames.

156. The device according to claim 133, in which the means for determining, in the encoder, concealment/recovery parameters comprises means for computing a voicing information parameter.

157. The device according to claim 156, in which:

the sound signal is a speech signal;

the means for determining, in the encoder, concealment/recovery parameters comprises means for classifying successive frames of the encoded sound signal;

the device comprises means for determining the classification of the successive frames of the encoded sound signal on the basis of a normalized correlation; and

the means for computing the voicing information parameter comprises means for estimating said voicing information parameter on the basis of the normalized correlation.

158. A device for the concealment of frames of a sound signal erased during transmission of the sound signal, in the form of signal-encoding parameters, from an encoder to a decoder, the device comprising:

means for determining, in the decoder, concealment/recovery parameters from the signal-encoding parameters; and

in the decoder, means for performing erased frame concealment and recovery in accordance with the concealment/recovery parameters determined by the determining means.

159. The device according to claim 158, in which the concealment/recovery parameters are selected from the group consisting of a signal classification parameter, an energy information parameter and a phase information parameter.

160. The device according to claim 158, in which the sound signal is a speech signal, and the means for determining, in the decoder, concealment/recovery parameters comprises means for classifying successive frames of the encoded sound signal as unvoiced frame, unvoiced transition, voiced transition, voiced frame or onset frame.

161. The device according to claim 158, in which the means for determining, in the decoder, concealment/recovery parameters comprises means for computing a voicing information parameter.

162. The device according to claim 160, in which the means for performing erased frame concealment and recovery in the decoder comprises:

means for generating, upon reception of a non-erased unvoiced frame after a frame erasure, a non-periodic part of the LP filter excitation signal; and

means for constructing, upon reception, after a frame erasure, of a non-erased frame other than unvoiced, a periodic part of the LP filter excitation signal by repeating the last pitch period of the previous frame.

163. The device according to claim 162, in which the means for constructing the periodic part of the excitation signal comprises a low-pass filter for filtering the repeated last pitch period of the previous frame.

164. The device according to claim 163, in which:

the means for determining, in the decoder, concealment/recovery parameters comprises means for computing a voicing information parameter;

the low-pass filter has a cut-off frequency; and

the means for constructing the periodic part of the LP filter excitation signal comprises means for dynamically adjusting the cut-off frequency in relation to the voicing information parameter.

165. The device according to claim 158, in which the means for performing erased frame concealment and recovery in the decoder comprises means for generating, at random, a non-periodic innovation part of the LP filter excitation signal.

166. The device according to claim 165, in which the means for generating, at random, the non-periodic innovation part of the LP filter excitation signal comprises means for generating random noise.

167. The device according to claim 165, in which the means for generating, at random, the non-periodic innovation part of the LP filter excitation signal comprises means for generating, at random, an index of a vector of an innovation codebook.

168. The device according to claim 165, in which:

the sound signal is a speech signal;

the means for determining, in the decoder, concealment/recovery parameters comprises means for classifying successive frames of the encoded sound signal as unvoiced frame, unvoiced transition, voiced transition, voiced frame or onset frame; and

the means for generating, at random, the non-periodic innovation part of the LP filter excitation signal further comprises:

a high-pass filter for filtering the innovation part of the LP filter excitation signal if the last non-erased frame received is different from unvoiced; and

means for using only the innovation part of the LP filter excitation signal if the last non-erased frame received is unvoiced.

169. The device according to claim 158, in which:

the sound signal is a speech signal;

the means for determining, in the decoder, concealment/recovery parameters comprises means for classifying successive frames of the encoded sound signal as unvoiced frame, unvoiced transition, voiced transition, voiced frame or onset frame; and

the means for performing erased frame concealment and recovery in the decoder comprises means for reconstructing a lost onset frame, when an onset frame is lost, which is indicated by the arrival of a voiced frame after the erasure while the frame before the erasure was unvoiced, by constructing a periodic part of the excitation signal as a low-pass filtered periodic train of pulses separated by a pitch period.

170. The device according to claim 169, in which the means for performing erased frame concealment and recovery in the decoder further comprises means for constructing an innovation part of the LP filter excitation signal by standard decoding.

171. The device according to claim 170, in which the means for constructing the innovation part of the LP filter excitation signal comprises means for randomly selecting entries of an innovation codebook.

172. The device according to claim 169, in which the means for reconstructing the lost onset frame comprises means for limiting the length of the reconstructed onset frame so that at least one whole pitch period is constructed by the onset frame reconstruction, the reconstruction being continued to the end of the current subframe.

173. The device according to claim 172, in which the means for performing erased frame concealment and recovery in the decoder further comprises means for resuming regular CELP processing after reconstruction of the lost onset frame, wherein the pitch period is the rounded average of the decoded pitch periods of all the subframes in which onset frame reconstruction was used.

174. The device according to claim 159, in which:

the energy information parameter is not transmitted from the encoder to the decoder; and

the means for performing erased frame concealment and recovery in the decoder comprises means for adjusting the energy of the LP filter excitation signal produced in the decoder during the received first non-erased frame, if the gain of the LP filter of the first non-erased frame received after frame erasure is higher than the gain of the LP filter of the last frame erased during the frame erasure, to the gain of the LP filter of the received first non-erased frame, using the following relation:

Eq = E1 · (ELP0 / ELP1),

where E1 is the energy at the end of the current frame, ELP0 is the energy of the impulse response of the LP filter of the last non-erased frame received before the frame erasure, and ELP1 is the energy of the impulse response of the LP filter of the received first non-erased frame following the frame erasure.

175. A system for encoding and decoding a sound signal, comprising:

a sound signal encoder, responsive to the sound signal, for producing a set of signal-encoding parameters;

means for transmitting the signal-encoding parameters to a decoder;

a decoder for synthesizing the sound signal in accordance with the signal-encoding parameters; and

a device according to any one of claims 88 to 132 for concealing frames of the encoded sound signal erased during transmission from the encoder to the decoder.

176. A decoder for decoding an encoded sound signal, comprising:

means, responsive to the encoded sound signal, for recovering from the encoded sound signal a set of signal-encoding parameters;

means for synthesizing the sound signal in accordance with the signal-encoding parameters; and

a device according to any one of claims 158 to 174 for concealing frames of the encoded sound signal erased during transmission from the encoder to the decoder.

177. An encoder for encoding a sound signal, comprising: means, responsive to the sound signal, for producing a set of signal-encoding parameters;

means for transmitting the set of signal-encoding parameters to a decoder, which recovers the sound signal in accordance with the signal-encoding parameters; and a device according to any one of claims 133 to 157 for concealing frames erased during transmission of the signal-encoding parameters from the encoder to the decoder.



 

Same patents:

FIELD: method for transmitting audio signals between transmitter and at least one receiver using priority pixel transmission method.

SUBSTANCE: in accordance to the invention, an audio signal is separated onto certain number n of spectral components, separated audio signal is stored in two-dimensional matrix with a set of fields with frequency and time as sizes and amplitude as corresponding value recorded in the field, then each separate field and at least two adjacent fields groups are formed and priority is assigned to certain groups, where priority of one group is selected the higher, the higher are amplitudes of group values and/or the higher are amplitude differences of values of one group and/or the closer the group is connected actual time, and groups are transmitted to receiver in the order of their priority.

EFFECT: ensured transmission of audio signals without losses even when the width of transmission band is low.

7 cl, 1 dwg

FIELD: audio signal encoding technology.

SUBSTANCE: in accordance to the method, at least a part of an audio signal is encoded to produce encoded signal, where the encoding includes encoding with prediction relatively to aforementioned at least a part of audio signal to produce prediction coefficients which represent time characteristics, such as time envelope curve, of aforementioned at least a part of audio signal, transformation of prediction coefficients to a set of times, which represent prediction coefficients, and inclusion of aforementioned set of times into encoded signal. For analysis/synthesis of overlapping frames relatively to time envelope curve, excessiveness in representation of linear spectrum for overlapping area may be used.

EFFECT: improved method for encoding at least a part of an audio signal.

2 cl, 7 dwg

FIELD: method for encoding a signal, in particular, sound signal.

SUBSTANCE: in accordance to the method, first set of values is provided, which is related to serial spans of time in first time interval of signal; second set of values is provided, which is related to successive periods of time in second time interval of signal; where first time interval has certain overlapping with second time interval; aforementioned overlapping contains at least two successive time periods of second interval; where at least one of values of second set, which are related to at least two successive time periods in aforementioned overlapping, is encoded relatively to the value of first set, which is closer in time to at least one value of second set, than any other value in second set.

EFFECT: increased efficiency of signal encoding.

9 cl, 4 dwg

FIELD: digital processing of speech signals.

SUBSTANCE: in accordance to the invention, during encoding and decoding of input stream frames, at compression and decompression sections, an algorithm for processing digital counts is used, which is based on computing coefficients of linear prediction with usage of scalar operations.

EFFECT: reduced entropy of signal being transmitted due to encoding of input stream frames.

2 cl, 2 dwg

FIELD: speech encoding.

SUBSTANCE: method and device for quantizing amplification for realization in the method for encoding digitized sound signal, processed during encoding in serial frames from L selections, where each frame is divided onto a certain number of sub-frames and each sub-frame contains a certain number of N selections, where N<L. In the method and device for quantizing amplification the original amplification of main tone is computed on basis of a certain number f of sub-frames, a part of code book of amplification quantization is selected relatively to original amplification of main tone, and amplifications of main tone and fixed code book are quantized together. Aforementioned combined quantization of main tone and fixed code book amplifications contains for a certain number f of sub-frames the search in amplification quantization code book in connection to a search criterion. Search in code book is limited to selected part of amplification quantization code book and to index of selected part of amplification quantization code book, which best corresponds to found search criterion.

EFFECT: increased traffic capacity of system.

8 cl, 3 tbl, 4 dwg

FIELD: analysis of sound signal quality, possible use for estimating quality of speech transferred through radio communication channels.

SUBSTANCE: in accordance to the method for machine estimation of sound signal quality, the signal is divided onto critical bands and spectral energy values are computed for critical bands, values of spectral likeness of active phase of fragments are determined, and quality of tested sound signal is determined by means of weighted linear combination of aforementioned quality values for each phase. The difference of the method is that selected fragments of active and inactive phase of both signals are synchronized, inactive phase spectrums are determined for each fragment, resulting spectrums of active and inactive phase of fragments are divided onto additional sets of bands, for each one of which spectral energy values are computed, resulting spectral energies of active and inactive fragment phases are compared in couples, to determine spectral likeness coefficients, resulting likeness coefficient for each phase is determined as an average value of likeness coefficients for all sets of bands, which is the estimate of quality of each phase.

EFFECT: ensured universality and optimized quality of estimation process depending on purposes of estimation.

5 cl, 13 dwg, 6 tbl

FIELD: encoding of audio-signals, in particular, encoding of multi-channel audio signals.

SUBSTANCE: in accordance to the invention, polyphonic signals are used for creation of main signal, typically, a signal and a collateral signal. A row of encoding schemes of collateral signal (xside) is implemented, each encoding scheme is characterized by a set of sub-frames of varying length, while total length of sub-frames corresponds to encoding frame length of encoding scheme. Encoding scheme for collateral signal (xside) is selected on basis of current content of polyphonic signals, and collateral remainder signal is created as a difference between collateral signal and main signal, scaled with usage of balancing coefficient, which is selected for minimization of collateral remainder signal. Optimized collateral remainder signal and balancing coefficient are encoded and implemented as encoding parameters, representing the collateral signal.

EFFECT: increased quality of perception of multi-channel sound signals.

5 cl, 15 dwg

FIELD: systems/methods for filtering signals.

SUBSTANCE: in accordance to invention, filtration of input signal is performed for generation of first filtered signal; first filtered signal is combined with aforementioned input signal for production of difference signal, while stage of filtering of input signal for producing first filtered signal contains: stage of production of at least one delayed, amplified and filtered signal, and production stage contains: storage of signal, related to aforementioned input signal in a buffer; extraction of delayed signal from buffer, filtration of signal for forming at least one second filtered signal, while filtration is stable and causative; amplification of at least one signal by amplification coefficient, while method also contains production of aforementioned first filtered signal, basing on at least one aforementioned delayed, amplified and filtered signal.

EFFECT: development of method for filtering signal with delay cycle.

10 cl, 10 dwg

FIELD: analysis and synthesis of speech information outputted from computer, possible use in synthesizer-informers in mass transit means, communications, measuring and technological complexes and during foreign language studies.

SUBSTANCE: method includes: analog-digital conversion of speech signal; segmentation of transformed signal onto elementary speech fragments; determining of vocalization of each fragment; determining, for each vocalized elementary speech segment, of main tone frequency and spectrum parameters; analysis and changing of spectrum parameters; and synthesis of speech sequence. Technical result is achieved because before synthesis, in vocalized segments periods of main tone of each such segment are adapted to zero starting phase by means of transferring digitization start moment in each period of main tone beyond the point of intersection of contouring line with zero amplitude, distortions appearing at joining lines of main tone periods are smoothed out and, during transformation of additional count in the end of modified period of main tone, re-digitization of such period is performed while preserving its original length.

EFFECT: improved quality of produced modulated signal, allowing more trustworthy reproduction of sounds during synthesis of speech signal.

2 cl, 8 dwg
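
Two of the steps above, zero-phase alignment of a pitch period and re-sampling a modified period back to its original length, can be illustrated roughly as follows; the zero-crossing search and linear interpolation are stand-ins chosen for brevity:

    import numpy as np

    def zero_phase_start(period):
        # Rotate one pitch period so sampling starts at its first zero
        # crossing (an illustrative "zero starting phase").
        period = np.asarray(period, dtype=float)
        signs = np.signbit(period).astype(int)
        crossings = np.where(np.diff(signs) != 0)[0]
        start = int(crossings[0]) + 1 if len(crossings) else 0
        return np.concatenate([period[start:], period[:start]])

    def resample_to_length(period, original_len):
        # Re-digitize a modified pitch period (e.g. after an extra sample
        # was appended at its end) back to the original length.
        period = np.asarray(period, dtype=float)
        old_t = np.linspace(0.0, 1.0, len(period))
        new_t = np.linspace(0.0, 1.0, original_len)
        return np.interp(new_t, old_t, period)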

FIELD: speech activity transmission systems in distributed system of voice recognition.

SUBSTANCE: the distributed voice recognition system has a local voice recognition (VR) engine in the user unit and a VR server engine in the server. The local VR engine has a feature selection (FS) module, which selects features from the voice signal. A voice activity detector (VAD) module detects voice activity in the voice signal. The voice activity indication is transmitted from the user unit to the server ahead of the features.

EFFECT: reduced channel congestion; reduced delay and increased efficiency of voice recognition.

3 cl, 8 dwg, 2 tbl
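
The ordering constraint, VAD indication before features, is the essential point; the sketch below pairs a crude energy-based activity decision, which is only a placeholder for the actual detector, with that transmit order:

    import numpy as np

    def voice_active(frame, threshold=1e-3):
        # Placeholder energy-based VAD decision.
        return float(np.mean(np.square(frame))) > threshold

    def transmit(frame, features, send):
        # Send the VAD indication ahead of the features, so the server
        # can skip or deprioritize inactive frames.
        active = voice_active(frame)
        send(("VAD", active))
        if active:
            send(("FEATURES", features))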

FIELD: electrical communication, namely systems for transmitting data over digital communication lines.

SUBSTANCE: the method comprises the steps of: preliminarily forming, at both the receiving and transmitting sides, R matrices of permitted vectors, each matrix of dimension m2 x m1 with unit and zero elements; then forming an initial matrix of N x N elements from the one-dimensional analog speech signal; converting the resulting matrix to digital form; forming, from the row elements of the permitted vectors, rectangular matrices of dimensions N x m and m x N that are a digital representation of the initial matrix; transmitting the elements of these rectangular matrices over a digital communication channel; correcting errors at the receiving side by testing whether groups of elements of the received rectangular matrices match the row elements of the preliminarily formed matrices of permitted vectors; and then performing the inverse operations to decompress the speech messages. The method is especially suitable for telephone calls over digital communication systems at rates of 6-16 kbit/s.

EFFECT: makes it possible to correct errors arising in the transmitted digital streams under unstable parameters of the communication systems and to carry out telephone calls over low-speed digital communication lines.

5 cl, 20 dwg
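
The matching test can be read as snapping each received group to the nearest permitted vector; a minimal Hamming-distance version of that reading, with hypothetical names:

    import numpy as np

    def correct_group(received, permitted):
        # Replace a received binary group with the permitted vector at
        # minimum Hamming distance from it.
        received = np.asarray(received)
        permitted = np.asarray(permitted)
        dists = np.count_nonzero(permitted != received, axis=1)
        return permitted[np.argmin(dists)]

    # e.g. correct_group([1, 0, 1, 1], [[1, 0, 1, 0], [0, 1, 1, 1]])
    # -> array([1, 0, 1, 0])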

FIELD: communication systems.

SUBSTANCE: in the method and system for decreasing the prediction error, an averaging device for calculating the transfer coefficient is used together with a pulse detector, a signal classifier, a decision-making means and a transfer coefficient compensation device; the compensated transfer coefficient of the quantizer sample is determined during encoding/decoding of the data transmitted in the speech signal band, using a vector linear non-adaptive prediction algorithm.

EFFECT: higher efficiency.

4 cl, 4 dwg
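
The abstract gives no formula for the compensation, so the following is only a speculative shape of an averaging-plus-compensation step, with both the smoothing window and the correction rule invented for illustration:

    import numpy as np

    def compensated_coefficient(coeffs, errors, window=8):
        # Averaging device: smooth recent transfer coefficients, then
        # apply a correction driven by the mean prediction error.
        # Both steps are hypothetical stand-ins for the patented rule.
        avg_coeff = float(np.mean(coeffs[-window:]))
        avg_error = float(np.mean(errors[-window:]))
        return avg_coeff * (1.0 - avg_error)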

FIELD: technologies for encoding audio signals.

SUBSTANCE: the method for generating a high-frequency reconstructed version of a low-frequency-range input signal by high-frequency spectral reconstruction using a digital filter-bank system is based on splitting the low-frequency-range input signal with an analysis filter bank to produce complex sub-band signals in the channels, obtaining a series of consecutive complex sub-band signals in the channels of the reconstruction range, adjusting the envelope to produce a predetermined spectral envelope in the reconstruction range, and combining the series of signals by means of a synthesis filter bank.

EFFECT: higher efficiency.

4 cl, 5 dwg
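
Between analysis and synthesis, the core operation is placing sub-band signals into the reconstruction range and shaping them to a target envelope; an SBR-style sketch of that middle step only (the filter banks themselves are omitted, and the function name is illustrative):

    import numpy as np

    def shape_reconstruction_band(subbands, target_envelope, eps=1e-12):
        # subbands: complex array (channels, frames) patched into the
        # reconstruction range; scale each channel so its mean energy
        # matches the predetermined spectral envelope.
        energy = np.mean(np.abs(subbands) ** 2, axis=1)
        scale = np.sqrt(target_envelope / (energy + eps))
        return subbands * scale[:, None]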

FIELD: speech recording/reproducing devices.

SUBSTANCE: during encoding, speech signals are divided into frames and the divided signals are encoded on a frame basis to output encoding parameters such as line spectral pair parameters, pitch, voiced/unvoiced decisions or spectral amplitudes. When calculating modified encoding parameters, the encoding parameters are interpolated to obtain modified encoding parameters associated with time instants based on the frames. During decoding, harmonic waves and noise are synthesized on the basis of the modified encoding parameters, and the synthesized speech signals are output.

EFFECT: broader functional capabilities, higher efficiency.

3 cl, 24 dwg
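
Interpolating frame-based parameters to new time instants, the step that enables time-scale modification here, might look as follows, assuming linear interpolation of parameter vectors (the abstract does not name the interpolation rule):

    import numpy as np

    def interpolate_parameters(frame_params, frame_times, query_times):
        # Linearly interpolate per-frame parameter vectors (e.g. LSPs or
        # spectral amplitudes) to arbitrary time instants.
        frame_params = np.asarray(frame_params, dtype=float)  # (frames, dims)
        out = np.empty((len(query_times), frame_params.shape[1]))
        for d in range(frame_params.shape[1]):
            out[:, d] = np.interp(query_times, frame_times, frame_params[:, d])
        return out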

FIELD: digital speech encoding.

SUBSTANCE: the speech compression system encodes a speech signal into a bit stream for later decoding to generate synthesized speech; it contains a full-rate codec, a half-rate codec, a quarter-rate codec and an eighth-rate codec, which are selectively activated on the basis of a rate selection. The full-rate and half-rate codecs are additionally activated selectively on the basis of a type classification. Each codec is selectively activated to encode and decode the speech signal at different bit rates, emphasizing different aspects of the speech signal so as to increase the overall quality of the synthesized speech signal.

EFFECT: optimized bandwidth required for the bit stream by balancing the preferred average bit rate against the perceptual quality of the reconstructed speech.

11 cl, 12 dwg, 9 tbl
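
The activation logic can be pictured as a two-stage dispatch (rate selection, then type classification for the two higher rates); the policy below is entirely hypothetical, as the abstract does not disclose the actual rules:

    def classify_type(energy, voiced):
        # Hypothetical type classification: silence / unvoiced / voiced.
        if energy < 1e-4:
            return "silence"
        return "voiced" if voiced else "unvoiced"

    def select_codec(frame_type):
        # Hypothetical rate selection mapping frame type to one of the
        # four codecs; full and half rate are reserved for speech frames.
        return {"voiced": "full_rate",
                "unvoiced": "half_rate",
                "silence": "eighth_rate"}.get(frame_type, "quarter_rate")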

FIELD: medicine.

SUBSTANCE: the method involves analog-to-digital conversion of an input signal representing a word, division of the spectrum of the converted signal into odd and even frequency bands, summation of the odd bands, digital-to-analog conversion of the resulting summed signal, and training of its perception by preliminary familiarization with the word presented for listening, followed by testing. The spectrum division is based on the tonotopic law of frequency distribution along the cochlear axis. The odd-numbered frequency bands are spaced at equal distances along the basilar membrane in agreement with the normal tonotopic law of frequency distribution along the cochlear axis. At least three odd spectrum bands are summed. Training is carried out by repeated presentation of the word until it is unambiguously associated with the known word meaning given during the preliminary familiarization. The same words are presented in testing and training.

EFFECT: partially retained speech spectrum.
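
The odd-band summation can be sketched as an FFT-domain mask that keeps only the odd-numbered bands; here the band edges are taken as given, whereas the method derives them from the tonotopic law:

    import numpy as np

    def keep_odd_bands(signal, band_edges_hz, fs):
        # Keep (sum) the odd-numbered frequency bands, zeroing the even
        # ones, then return to the time domain.
        signal = np.asarray(signal, dtype=float)
        spectrum = np.fft.rfft(signal)
        freqs = np.fft.rfftfreq(len(signal), 1.0 / fs)
        mask = np.zeros(len(freqs), dtype=bool)
        for number, (lo, hi) in enumerate(band_edges_hz, start=1):
            if number % 2 == 1:  # odd band numbers are retained
                mask |= (freqs >= lo) & (freqs < hi)
        return np.fft.irfft(spectrum * mask, n=len(signal))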

FIELD: methods and devices for efficient compression of an audio signal into an MPEG-1 Layer III acoustic signal at a low bit rate.

SUBSTANCE: according to the audio signal encoding method, harmonic components are extracted using information from the fast Fourier transform, obtained by applying psychoacoustic model 2 to the received pulse-code-modulated audio data. The extracted harmonic components are then removed from the received pulse-code-modulated audio data. After that, the audio data from which the extracted harmonic components have been removed are subjected to the modified discrete cosine transform (MDCT) and quantization.

EFFECT: provides efficient compression of the signal at a low rate by compressing only the varying part of the signal by means of the modified discrete cosine transform.

5 cl, 11 dwg
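
The extract-then-remove step might be approximated as picking the strongest FFT peaks and subtracting their contribution, leaving a residual for the MDCT stage; plain magnitude sorting below stands in for psychoacoustic model 2:

    import numpy as np

    def split_harmonics(frame, k=5):
        # Keep the k strongest spectral bins as the "harmonic" part and
        # subtract them from the frame; the residual goes to MDCT coding.
        frame = np.asarray(frame, dtype=float)
        spectrum = np.fft.rfft(frame)
        peaks = np.argsort(np.abs(spectrum))[-k:]
        harmonic_spec = np.zeros_like(spectrum)
        harmonic_spec[peaks] = spectrum[peaks]
        harmonic = np.fft.irfft(harmonic_spec, n=len(frame))
        return harmonic, frame - harmonic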
