Variable frame length encoding optimized for fidelity

FIELD: encoding of audio signals, in particular encoding of multi-channel audio signals.

SUBSTANCE: according to the invention, polyphonic signals are used to create a main signal, typically a mono signal, and a side signal. A number of encoding schemes for the side signal (xside) are provided; each encoding scheme is characterized by a set of sub-frames of different lengths, where the total length of the sub-frames corresponds to the encoding frame length of the encoding scheme. The encoding scheme for the side signal (xside) is selected based on the current content of the polyphonic signals. A side residual signal is created as the difference between the side signal and the main signal scaled by a balance factor, which is selected to minimize the side residual signal. The optimized side residual signal and the balance factor are encoded and provided as encoding parameters representing the side signal.

EFFECT: improved perceptual quality of multi-channel audio signals.

5 cl, 15 dwg

 

Technical field

The present invention relates to the encoding of audio signals, and in particular to the encoding of multi-channel audio signals.

Prior art

There is a great market need to transmit and store audio signals at low bit rates while maintaining high audio quality. In particular, where transmission resources or storage are limited, low bit rate operation is an essential cost factor. This is typically the case, for example, in streaming and messaging applications in mobile communication systems such as GSM, UMTS or CDMA.

At present there are no standardized codecs that provide high stereophonic audio quality at bit rates that are economically interesting for use in mobile communication systems. The available codecs make monophonic transmission of audio signals possible. Stereophonic transmission is also available to some extent. However, bit rate limitations usually require a rather drastic restriction of the stereo representation.

The simplest way of stereophonic or multi-channel coding of audio signals is to encode the signals of the different channels separately, as individual and independent signals. Another basic method, used in stereo FM radio transmission and ensuring compatibility with legacy mono radio receivers, is to transmit a sum signal and a difference signal of the two channels involved.

State-of-the-art audio codecs, such as MPEG-1/2 Layer III and MPEG-2/4 AAC, make use of so-called joint stereo coding. According to this technique, the signals of the different channels are processed jointly rather than separately and individually. The two most commonly used joint stereo coding techniques are known as Mid/Side (M/S) stereo coding and intensity stereo coding, which are usually applied to sub-bands of the stereo or multi-channel signals to be encoded.

M/S stereo coding is similar to the procedure described for stereo FM radio in the sense that it encodes and transmits the sum and difference signals of the channel sub-bands and thereby exploits the redundancy between the channel sub-bands. The structure and operation of an encoder based on M/S stereo coding is described, for example, in U.S. patent No. 5285498 in the name of J.D. Johnston.

Intensity stereo coding, on the other hand, is able to exploit stereo irrelevancy. It transmits the joint intensity of the channels (of the different sub-bands) together with some location information indicating how the intensity is distributed among the channels. Intensity stereo coding provides only spectral magnitude information of the channels; phase information is not conveyed. For this reason, and because temporal inter-channel information (more specifically, the inter-channel time difference) is of major psychoacoustic relevance, particularly at lower frequencies, intensity stereo coding can only be used at high frequencies, above about 2 kHz, for example. An intensity stereo coding method is described, for example, in European patent No. 0497413 in the name of R. Veldhuis et al.

A recently developed stereo coding method is described, for example, in the conference paper by C. Faller et al., "Binaural cue coding applied to stereo and multi-channel audio compression", 112th AES Convention, May 2002, Munich, Germany. This method is a parametric multi-channel audio coding method. The basic principle is that, at the encoding side, the input signals from N channels C1, C2, ..., CN are combined into one mono signal m. The mono signal is audio encoded using any conventional monophonic audio codec. In parallel, parameters that describe the multi-channel image are derived from the channel signals. The parameters are encoded and transmitted to the decoder together with the audio bit stream. The decoder first decodes the mono signal m' and then regenerates the channel signals C1', C2', ..., CN' based on the parametric description of the multi-channel image.

The principle of the binaural cue coding (BCC) method is that it transmits the encoded mono signal and so-called BCC parameters. The BCC parameters comprise coded inter-channel level differences and inter-channel time differences for sub-bands of the original multi-channel input signal. The decoder regenerates the different channel signals by applying sub-band-wise level and phase adjustments to the mono signal based on the BCC parameters. The advantage over M/S or intensity stereo coding is that stereo information comprising temporal inter-channel information is transmitted at much lower bit rates. However, this technique requires computationally demanding time-frequency transforms on each of the channels, both at the encoder and at the decoder.

Moreover, BCC does not handle the fact that a large part of the stereo information, especially at low frequencies, is diffuse, i.e. it does not come from any specific direction. Diffuse sound fields exist in both channels of a stereo recording, but they are to a great extent out of phase with respect to each other. If an algorithm such as BCC is applied to recordings with a large amount of diffuse sound fields, the reproduced stereo image will be distorted, jumping from left to right, because the BCC algorithm can only pan the signal of specific frequency bands to the left or to the right.

A possible means of encoding the stereo signal while ensuring good reproduction of diffuse sound fields is to use an encoding scheme similar to the technique used in FM stereo radio, namely to encode the mono (Left+Right) and the difference (Left-Right) signals separately.

The method described in U.S. patent No. 5434948 in the name of C.E. Holt et al. uses a technique similar to BCC to encode a mono signal and side information. In this case, the side information consists of predictor filters and, optionally, a residual signal. The predictor filters, estimated by a least-squares algorithm, allow the multi-channel audio signals to be predicted when applied to the mono signal. With this method, multi-channel audio sources can be encoded at very low bit rates, but at the expense of a loss in quality, as discussed further below.

Finally, for completeness, a technique used in three-dimensional audio should be mentioned. This technique synthesizes the right and left channel signals by filtering sound source signals with so-called head-related filters. However, this technique requires the signals of the different sound sources to be separated and thus cannot generally be applied to stereo or multi-channel coding.

The invention

A problem with existing encoding schemes based on encoding frames of signals, in particular a main signal and one or more side signals, is that the division of audio information into frames may introduce unattractive perceptual artifacts. Dividing the information into frames of relatively long duration generally reduces the average required transmission bit rate. This can be beneficial, for example, for music with a large amount of diffuse sound. However, for transient-rich music or speech, the fast temporal variations are smeared out over the frame duration, giving rise to ghost-like sounds or even pre-echo problems. Encoding short frames, in contrast, gives a more accurate representation of the sound, minimizing the energy, but requires a higher transmission bit rate and more computational resources. The coding efficiency as such may also decrease with very short frame lengths. The introduction of more frame boundaries may also introduce discontinuities in the encoding parameters, which may appear as perceptual artifacts.

A further problem with schemes based on encoding a main signal and one or more side signals is that they require relatively large computational resources. In particular, when short frames are used, handling parameter discontinuities from one frame to another is a complex task.

When long frames are used, estimation errors for transient sounds may cause very large side signals, which in turn increases the transmission rate demand.

An object of the present invention is therefore to provide an encoding method and device that improve the perceptual quality of multi-channel audio signals, in particular by avoiding artifacts such as pre-echo, ghost-like sounds or frame discontinuity artifacts. A further object of the present invention is to provide an encoding method and device that require less processing power and have more constant transmission bit rate requirements.

The above objects are achieved by methods and devices according to the invention. In general, polyphonic signals are used to create a main signal, typically a mono signal, and a side signal. The main signal is encoded according to known encoding principles. A number of encoding schemes for the side signal are provided. Each encoding scheme is characterized by a set of sub-frames of different lengths. The total length of the sub-frames corresponds to the length of the encoding frame of the encoding scheme. Each set of sub-frames comprises at least one sub-frame. The encoding scheme to be used for the side signal is selected at least partly based on the current content of the polyphonic signals.

In one embodiment, the selection is made before the encoding, based on an analysis of the signal characteristics. In another embodiment, the side signal is encoded by each of the encoding schemes, and the best encoding scheme is selected based on a measure of the encoding quality.

In a preferred embodiment, a side residual signal is created as the difference between the side signal and the main signal scaled by a balance factor. The balance factor is selected to minimize the side residual signal. The optimized side residual signal and the balance factor are encoded and provided as parameters representing the side signal. At the decoder side, the balance factor, the side residual signal and the main signal are used to recover the side signal.

In a further preferred embodiment, the encoding of the side signal comprises energy contour scaling in order to avoid pre-echo effects. Moreover, the different encoding schemes may comprise different encoding procedures in the separate sub-frames.

The main advantage of the present invention is that the preservation of the perceptual quality of the audio signals is improved. Moreover, the present invention still allows multi-channel signals to be transmitted at very low bit rates.

Brief description of drawings

The invention, together with its further objects and advantages, is explained in the following description with reference to the drawings, in which:

Fig. 1 is a block diagram of a system for transmitting polyphonic signals;

Fig. 2a is a block diagram of an encoder in a transmitter;

Fig. 2b is a block diagram of a decoder in a receiver;

Fig. 3a is a diagram illustrating encoding frames of different lengths;

Fig. 3b and 3c are block diagrams of embodiments of side signal encoder units according to the present invention;

Fig. 4 is a block diagram of an embodiment of an encoder using balance factor encoding of the side signal;

Fig. 5 is a block diagram of an embodiment of an encoder for multi-signal systems;

Fig. 6 is a block diagram of an embodiment of a decoder suitable for decoding signals from the device shown in Fig. 5;

Fig. 7a and 7b are diagrams illustrating a pre-echo artifact;

Fig. 8 is a block diagram of an embodiment of a side signal encoder unit according to the present invention, using different encoding principles in different sub-frames;

Fig. 9 illustrates the use of different encoding principles in different frequency sub-bands;

Fig. 10 is a flow diagram of the main steps of an embodiment of an encoding method according to the present invention; and

Fig. 11 is a flow diagram of the main steps of an embodiment of a decoding method according to the present invention.

Detailed description of the invention

Fig. 1 illustrates a typical system 1 in which the present invention can advantageously be used. A transmitter 10 comprises an antenna 12, including associated hardware and software, that is able to transmit radio signals 5 to a receiver 20. The transmitter 10 comprises, among other parts, a multi-channel encoder 14, which transforms the signals of a number of input channels 16 into output signals suitable for radio transmission. Examples of suitable multi-channel encoders 14 are described in more detail below. The signals of the input channels 16 can be provided, for example, from an audio signal storage 18, such as a data file of digital representations of audio recordings, or recordings on magnetic tape or vinyl disc. The signals of the input channels 16 can also be provided "live", for example from a set of microphones 19. The audio signals are digitized, if they are not already in digital form, before entering the multi-channel encoder 14.

At the receiver 20 side, an antenna 22 with associated hardware and software handles the actual reception of the radio signals 5 representing polyphonic audio signals. Standard functionality, such as error correction, is performed here. A decoder 24 decodes the received radio signals 5 and transforms the audio data carried by them into signals of a number of output channels 26. The output signals can be provided, for example, to loudspeakers 29 for immediate presentation, or can be stored in an audio signal storage 28 of any kind.

The system 1 can be, for example, a telephone conference system, a system for supplying audio services, or another audio application. In some systems, such as a telephone conference system, the communication has to be of a duplex type, whereas, for example, the distribution of music from a service provider to a subscriber can be essentially of a one-way type. The transmission of signals from the transmitter 10 to the receiver 20 can also be performed by any other means, for example by other kinds of electromagnetic waves, cables or fibers, and combinations thereof.

Fig. 2a illustrates an embodiment of an encoder according to the present invention. In this embodiment, the polyphonic signal is a stereo signal comprising two channels a and b, received at inputs 16A and 16B, respectively. The signals of channels a and b are provided to a pre-processing unit 32, where different signal conditioning procedures may be performed. The (possibly modified) signals from the output of the pre-processing unit 32 are summed in an addition unit 34. The addition unit 34 also divides the sum by two. The signal xmono produced in this way is the main signal of the stereo signals, since it essentially comprises all data from both channels. In this embodiment, the main signal thus represents a pure mono signal. The main signal xmono is provided to a main signal encoder unit 38, which encodes the main signal according to any suitable encoding principles. Such principles are known from the prior art and are therefore not discussed further. The main signal encoder unit 38 produces an output signal pmono, consisting of encoding parameters representing the main signal.

In a subtraction unit 36, the difference (divided by two) of the channel signals is provided as a side signal xside. In this embodiment, the side signal represents the difference between the two channels of the stereo signal. The side signal xside is provided to a side signal encoder unit 30. Preferred embodiments of the side signal encoder unit 30 are discussed further below. According to a side signal encoding procedure, which is described in more detail below, the side signal xside is converted into encoding parameters pside representing the side signal xside. In some embodiments, this encoding also uses information from the main signal xmono. The arrow 42 indicates such a provision, where the original, unencoded main signal xmono is used. In other embodiments, the main signal information used in the side signal encoder unit 30 can be deduced from the encoding parameters pmono representing the main signal, as indicated by the broken line 44.

The encoding parameters pmono representing the main signal xmono are a first output signal, and the encoding parameters pside representing the side signal xside are a second output signal. In a typical case, these two output signals pmono and pside, together representing the full stereo sound, are multiplexed into one transmission signal 52 in a multiplexer unit 40. In other embodiments, the first and second output signals pmono, pside may be transmitted separately.

In Fig. 2b, an embodiment of a decoder 24 according to the present invention is illustrated as a block diagram. The received signal 54, comprising encoding parameters representing the main and side signal information, is provided to a demultiplexer unit 56, which separates a first and a second input signal, respectively. The first input signal, corresponding to the encoding parameters pmono of the main signal, is provided to a main signal decoder unit 64. In a conventional manner, the encoding parameters pmono representing the main signal are used to generate a decoded main signal x"mono that is as similar as possible to the original main signal xmono of the encoder 14 (Fig. 2a).

Similarly, the second input signal, corresponding to the side signal, is provided to a side signal decoder unit 60. Here, the encoding parameters pside representing the side signal are used to recover a decoded side signal x"side. In some embodiments, the decoding procedure uses information about the main signal x"mono, as indicated by the arrow 65.

The decoded main and side signals x"mono, x"side are provided to an addition unit 70, which produces an output signal that is a representation of the original signal of channel a. Similarly, the difference provided by a subtraction unit 68 is an output signal that is a representation of the original signal of channel b. These channel signals may be post-processed in a post-processor unit 74 according to prior-art signal processing procedures. Finally, the channel signals a and b are provided at the outputs 26A and 26B of the decoder.
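The sum and difference structure of Fig. 2a and 2b can be sketched in a few lines. This is a minimal illustration only, assuming ideal (lossless) coding of both signals and no pre- or post-processing; the function names ms_encode and ms_decode are hypothetical and do not appear in the source.

```python
def ms_encode(a, b):
    """Form the main (mono) and side signals from two channel signals.

    Both the sum and the difference are divided by two, as in units 34 and 36.
    """
    x_mono = [0.5 * (ai + bi) for ai, bi in zip(a, b)]
    x_side = [0.5 * (ai - bi) for ai, bi in zip(a, b)]
    return x_mono, x_side


def ms_decode(x_mono, x_side):
    """Recover the channels: a = mono + side (unit 70), b = mono - side (unit 68)."""
    a = [m + s for m, s in zip(x_mono, x_side)]
    b = [m - s for m, s in zip(x_mono, x_side)]
    return a, b


# Toy channel signals (assumed values, for illustration only).
a = [0.5, -0.25, 1.0, 0.0]
b = [0.25, 0.75, -1.0, 0.5]
x_mono, x_side = ms_encode(a, b)
a_rec, b_rec = ms_decode(x_mono, x_side)
```

With real encoders in between, a_rec and b_rec would only approximate a and b; here they match because nothing is quantized.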

As mentioned in the description of the invention above, encoding is typically performed one frame at a time. A frame comprises the audio samples within a predefined time period. In the bottom part of Fig. 3a, a frame SF2 of duration L is illustrated. The audio samples within the unshaded portion are to be encoded together. The preceding samples and the subsequent samples are encoded in other frames. The division of the samples into frames will in any case introduce some discontinuities at the frame borders. Shifting sounds give shifting encoding parameters, which change essentially at each frame border. This gives rise to perceptible errors. One way to compensate somewhat for this is to base the encoding not only on the samples that are to be encoded, but also on samples in the immediate vicinity of the frame, as indicated by the shaded portions. This gives a smoother transition between the different frames. As an alternative or complement, interpolation techniques are sometimes also used to reduce perceptual artifacts caused by frame borders. However, all such procedures require large additional computational resources, and for certain specific encoding techniques it may also be difficult to provide such resources.

From this point of view, it is beneficial to use frames that are as long as possible, since the number of frame borders will then be small. The coding efficiency also typically becomes high, and the necessary transmission bit rate is minimized. However, long frames give problems with pre-echo artifacts and ghost-like sounds.

If shorter frames are used instead, for example SF1 or even SF0 with lengths of L/2 and L/4, respectively, anyone skilled in the art realizes that the coding efficiency may decrease, the transmission bit rate may have to be higher, and the problems with frame border artifacts will increase. However, shorter frames suffer less from other perceptual artifacts, such as ghost-like sounds and pre-echo. In order to minimize the coding error as far as possible, the shortest possible frame length should be used.

According to the present invention, the perception of the audio signals is improved by using a frame length for encoding the side signal that depends on the current signal content. Since the influence of different frame lengths on the audio perception differs depending on the nature of the sound to be encoded, an improvement can be obtained by letting the nature of the signal itself affect the frame length that is used. The encoding of the main signal is not the object of the present invention and is therefore not described in detail. However, the frame lengths used for the main signal may or may not be equal to the frame lengths used for the side signal.

Due to small temporal variations, it may in some cases be beneficial to encode the side signal using relatively long frames. This may be the case for recordings with a large amount of diffuse sound field, such as live recordings. In other cases, such as stereo speech conversation, short frames are preferable. The decision about which frame length to prefer can be made in two basic ways.

One embodiment of a side signal encoder unit 30 according to the present invention, in which a closed-loop decision is used, is illustrated in Fig. 3b. A basic encoding frame of length L is used. A number of encoding schemes 81 are provided, each characterized by a separate set 80 of sub-frames 90. Each set 80 of sub-frames 90 comprises one or more sub-frames 90 of equal or differing lengths. The total length of the set 80 of sub-frames 90 is, however, always equal to the basic encoding frame length L. With reference to Fig. 3b, the top encoding scheme is characterized by a set of sub-frames comprising only one sub-frame of length L. The next set of sub-frames comprises two sub-frames of length L/2. The third set comprises two sub-frames of length L/4 followed by a sub-frame of length L/2.

The signal xside provided to the side signal encoder unit 30 is encoded by all the encoding schemes 81. In the top encoding scheme, the entire basic encoding frame is encoded in one piece. In the other encoding schemes, however, the signal xside is encoded in each sub-frame separately from the others. The result from each encoding scheme is provided to a selector 85. A fidelity measurement means 83 determines a fidelity measure for each of the encoded signals. The fidelity measure is an objective quality value, preferably a signal-to-noise ratio or a weighted signal-to-noise ratio. The fidelity measures associated with each encoding scheme are compared, and the result controls a switching means 87 to select the encoding parameters representing the side signal from the encoding scheme giving the best fidelity measure as the output signal pside of the side signal encoder unit 30.
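The closed-loop decision can be illustrated with a small sketch. It assumes a deliberately crude stand-in coder (toy_encode, a uniform quantizer whose step follows the sub-frame peak) in place of a real sub-frame encoder, and a plain signal-to-noise ratio as the fidelity measure; all function names and the test signal are hypothetical.

```python
import math


def toy_encode(subframe, levels=8):
    """Hypothetical stand-in for a sub-frame coder: uniform quantization with
    a step scaled to the sub-frame peak. Returns the reconstructed samples."""
    peak = max(abs(x) for x in subframe)
    if peak == 0.0:
        return [0.0] * len(subframe)
    step = 2.0 * peak / levels
    return [round(x / step) * step for x in subframe]


def encode_with_scheme(frame, scheme):
    """Code each sub-frame of the scheme separately and concatenate the results."""
    decoded, start = [], 0
    for length in scheme:
        decoded.extend(toy_encode(frame[start:start + length]))
        start += length
    return decoded


def snr_db(original, decoded):
    """Signal-to-noise ratio in dB, used here as the fidelity measure."""
    signal = sum(x * x for x in original)
    noise = sum((x - y) ** 2 for x, y in zip(original, decoded))
    if noise == 0.0:
        return math.inf
    if signal == 0.0:
        return -math.inf
    return 10.0 * math.log10(signal / noise)


def select_closed_loop(frame, schemes):
    """Encode with every scheme and return (best_snr, best_scheme)."""
    scored = [(snr_db(frame, encode_with_scheme(frame, s)), s) for s in schemes]
    return max(scored, key=lambda item: item[0])


# Quiet first half followed by a loud second half (a crude transient).
frame = [0.01 * math.sin(0.9 * n) for n in range(8)]
frame += [math.sin(0.9 * n) for n in range(8, 16)]
schemes = [(16,), (8, 8), (4, 4, 8)]  # sub-frame sets with total length 16
best_snr, best_scheme = select_closed_loop(frame, schemes)
```

For this signal the single long sub-frame is not selected, because its quantization step is set by the loud half and the quiet half is lost entirely. A practical selector would presumably also account for the bits each scheme spends, which this sketch ignores.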

Preferably, all possible combinations of frame lengths are tested, and the set of sub-frames that gives the best objective quality, for example the best signal-to-noise ratio, is selected.

In the present embodiment, the lengths of the sub-frames are selected according to:

$$l_{sf} = \frac{l_f}{2^n}$$

where $l_{sf}$ is the sub-frame length, $l_f$ is the encoding frame length and $n$ is an integer. In the present embodiment, $n$ is selected between 0 and 3. However, it is possible to use any frame lengths, as long as the total length of the set is kept constant.
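To get a feel for how many sub-frame sets such a rule produces, the candidate sets can be enumerated. The sketch below assumes that sub-frame lengths are the frame length divided by a power of two (as the examples L, L/2 and L/4 suggest), that the order of sub-frames matters, and that sub-frames need not be aligned to any grid; the function name subframe_sets and the frame length of 160 samples are hypothetical.

```python
def subframe_sets(frame_len, max_n=3):
    """Yield every ordered set of sub-frame lengths frame_len / 2**n
    (0 <= n <= max_n) whose total length equals frame_len."""
    if frame_len % (2 ** max_n) != 0:
        raise ValueError("frame_len must be divisible by 2**max_n")
    allowed = [frame_len // 2 ** n for n in range(max_n + 1)]

    def extend(prefix, remaining):
        if remaining == 0:
            yield tuple(prefix)
            return
        for length in allowed:
            if length <= remaining:
                yield from extend(prefix + [length], remaining - length)

    yield from extend([], frame_len)


# Assumed frame length of 160 samples; allowed sub-frames are 160, 80, 40, 20.
sets = list(subframe_sets(frame_len=160, max_n=3))
```

Under these assumptions there are 56 candidate sets for n up to 3, which indicates why testing every combination in a closed loop is only practical when each individual encoding is cheap.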

Fig. 3c illustrates another embodiment of a side signal encoder unit 30. Here, the frame length decision is an open-loop decision, based on the statistics of the signal. In other words, the spectral characteristics of the side signal are used as a basis for deciding which encoding scheme to use. As before, different encoding schemes characterized by different sets of sub-frames are available. In this embodiment, however, the selector 85 is placed before the actual encoding. The input side signal xside enters the selector 85 and a signal analyzing unit 84. The result of the analysis is the input to a switch 86, so that only one of the encoding schemes 81 is used. The output of that encoding scheme is also the output signal pside of the side signal encoder unit 30.
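An open-loop decision can be sketched as an analysis step that picks a scheme before any encoding takes place. The rule below is an assumption made purely for illustration: it uses block energies in the time domain rather than the spectral statistics mentioned in the text, and the threshold value and function names are hypothetical.

```python
def block_energies(frame, n_blocks):
    """Energy of each of n_blocks equally long blocks of the frame."""
    size = len(frame) // n_blocks
    return [sum(x * x for x in frame[i * size:(i + 1) * size])
            for i in range(n_blocks)]


def select_open_loop(frame, threshold=8.0):
    """Pick a sub-frame set from signal analysis alone (no trial encoding)."""
    n = len(frame)
    quarters = block_energies(frame, 4)

    def varies(blocks):
        floor = 1e-12  # avoids division by zero for silent blocks
        return (max(blocks) + floor) / (min(blocks) + floor) > threshold

    if not varies(quarters):
        return (n,)                      # stationary: one long sub-frame
    if not varies(quarters[:2]) and not varies(quarters[2:]):
        return (n // 2, n // 2)          # energy changes only between the halves
    return (n // 4,) * 4                 # change inside a half: short sub-frames


stationary = [1.0, -1.0] * 8
step_at_middle = [0.01, -0.01] * 4 + [1.0, -1.0] * 4
late_onset = [0.01, -0.01] * 6 + [1.0, -1.0] * 2

choice_stationary = select_open_loop(stationary)    # (16,)
choice_middle = select_open_loop(step_at_middle)    # (8, 8)
choice_late = select_open_loop(late_onset)          # (4, 4, 4, 4)
```

Only the selected scheme is then run, which is the computational saving of the open-loop approach; the price is that the rule has to be tuned to the behavior of the actual encoders.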

The advantage of an open-loop decision is that only one actual encoding has to be performed. The disadvantage is that the analysis of the signal characteristics may be very complicated, and it may be difficult to predict possible behaviors in advance in order to make an appropriate choice in the switch 86. A large amount of statistical analysis of sound has to be performed and included in the signal analyzing unit 84. Any small change in the encoding schemes may radically change the statistical behavior.

With closed-loop selection (Fig. 3b), encoding schemes may be exchanged without making any changes in the rest of the unit. On the other hand, if many encoding schemes are to be investigated, the computational requirements will be high.

The benefit of such variable frame length coding of the side signal is that one can select between fine temporal resolution and coarse frequency resolution on the one hand, and coarse temporal resolution and fine frequency resolution on the other. The above embodiments preserve the stereo image in the best possible manner.

There are also some requirements on the actual encoding used in the different encoding schemes. In particular, when closed-loop selection is used, the computational resources needed to perform a number of more or less simultaneous encodings must be large. The more complicated the encoding process, the more computational power is needed. Furthermore, a low transmission bit rate is also preferable.

The method presented in U.S. patent No. 5434948 uses a filtered version of the mono (main) signal to resemble the side or difference signal. The filter parameters are optimized and allowed to vary in time. The filter parameters are then transmitted as the representation of the encoded side signal. In one embodiment, a residual side signal is also transmitted. In many cases, such an approach could be used as the side signal encoding method within the scope of the present invention. This approach has, however, some disadvantages. The quantization of the filter coefficients and of any residual side signal often requires a relatively high transmission bit rate, since the filter order has to be high to provide an accurate estimate of the side signal. The estimation of the filter itself may be problematic, especially in cases of transient-rich music. Estimation errors give a modified side signal that is sometimes larger in magnitude than the unmodified signal. This leads to higher bit rate demands. Moreover, if a new set of filter coefficients is computed every N samples, the filter coefficients need to be interpolated to yield a smooth transition from one set of filter coefficients to another, as discussed above. Interpolation of filter coefficients is a complex task, and errors in the interpolation manifest themselves as large side error signals, leading to higher bit rates needed for the difference error signal encoder.

One way to avoid the need for interpolation is to update the filter coefficients on a sample-by-sample basis and rely on backward-adaptive analysis. For this to work well, the bit rate of the residual encoder has to be fairly high. It is therefore not a good alternative for low bit rate stereo coding.

There are cases, quite common in music, for example, where the mono and difference signals are almost uncorrelated. The filter estimation then becomes very difficult, with the added risk of making things worse for the difference error signal encoder.

The solution according to U.S. patent No. 5434948 can work quite well in cases where the filter coefficients vary very slowly in time, for example in telephone conference systems. In the case of music signals, this approach does not work very well, since the filters need to change very fast to track the stereo image. This means that sub-frame lengths of very differing magnitude have to be used, i.e. the number of combinations to test increases rapidly. This in turn means that the requirements for computing all possible encoding schemes become impractically high.

Therefore, in a preferred embodiment, the encoding of the side signal is based on the idea of reducing the redundancy between the mono and side signals by using a simple balance factor instead of a complex predictor filter with high bit rate demands. The residual of this operation is then encoded. The magnitude of such a residual is relatively small and does not require a very high bit rate. This idea is very well suited to being combined with the variable frame set approach described above, since the computational complexity is low.

Using a balance factor in combination with the variable frame length approach removes the need for complex interpolation and the associated problems that interpolation may cause. Moreover, using a simple balance factor instead of a complex filter gives fewer estimation problems, since possible estimation errors for the balance factor have less impact. The preferred solution is able to reproduce both panned signals and diffuse sound fields with good quality and with limited requirements on bit rate and computational resources.

Fig. 4 illustrates a preferred embodiment of a stereo encoder according to the present invention. This embodiment is in many respects similar to the one shown in Fig. 2a, but with the details of the side signal encoder unit 30 revealed. The encoder 14 of this embodiment has no pre-processing unit, and the input signals are provided directly to the addition and subtraction units 34, 36. The mono signal xmono is multiplied by a certain balance factor gsm in a multiplier 33. In a subtraction unit 35, the multiplied mono signal is subtracted from the side signal xside, i.e. essentially the difference between the two channels, to produce a side residual signal. The balance factor gsm is determined by an optimizer 37, based on the content of the mono and side signals, so as to minimize the side residual signal according to a quality criterion. The quality criterion is preferably a least-squares criterion. The side residual signal is encoded in a side residual encoder 39 according to any encoding procedure. Preferably, the side residual encoder 39 is a low bit rate transform encoder or a code-excited linear prediction (CELP) encoder. The encoding parameters pside representing the side signal then comprise the encoding parameters pside residual representing the side residual signal and the optimized balance factor 49.
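The balance factor path of Fig. 4 and the corresponding recovery at the decoder can be sketched as follows. This is a minimal illustration under stated assumptions: the least-squares criterion is used, the balance factor and the residual are passed on without quantization (no residual encoder 39), and the function names and sample values are hypothetical.

```python
def balance_factor(x_side, x_mono):
    """Least-squares balance factor for one block (sub-frame) of samples."""
    energy = sum(m * m for m in x_mono)
    if energy == 0.0:
        return 0.0  # silent mono block: nothing can be removed from the side signal
    return sum(s * m for s, m in zip(x_side, x_mono)) / energy


def encode_side(x_side, x_mono):
    """Return the balance factor and the side residual signal for one block."""
    g = balance_factor(x_side, x_mono)
    residual = [s - g * m for s, m in zip(x_side, x_mono)]
    return g, residual


def decode_side(g, residual, x_mono):
    """Recover the side signal from balance factor, residual and mono signal."""
    return [r + g * m for r, m in zip(residual, x_mono)]


# Toy block (assumed values): the side signal is roughly half the mono signal.
x_mono = [1.0, -2.0, 3.0, 0.5]
x_side = [0.6, -1.0, 1.4, 0.25]
g, residual = encode_side(x_side, x_mono)
x_side_rec = decode_side(g, residual, x_mono)
```

The residual produced this way is orthogonal to the mono block and never has more energy than the side signal itself, which is why it can be expected to need fewer bits.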

In the embodiment of Figure 4, the mono signal 42 used for synthesizing the side signal is the target signal x_mono for the mono encoder 38. As mentioned above (in connection with figa), the locally synthesized signal of the mono encoder 38 can be used instead. In the latter case, the total encoder delay may increase and the computational complexity of the side signal encoding may increase. On the other hand, the quality may be better, since encoding errors introduced by the mono encoder can then be compensated.

From a mathematical point of view, the basic coding scheme can be described as follows. Denote the two channel signals a and b, which may be the left and right channels of a stereo pair. The channel signals are combined into a mono signal by addition and into a side signal by subtraction. In equation form, the operations are described as follows:

It is useful to scale the signals x_mono and x_side down by a factor of two. This implies that other ways of creating x_mono and x_side exist. One can, for example, use:
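A form consistent with the description above, and with the later assumption γ = 0.5, could be written as below. This is only a sketch: the sample index n and the permitted range of the weighting parameter γ are assumptions rather than quotations from the source.

```latex
% Sum and difference of the two channels, scaled down by a factor of two
x_{mono}(n) = \tfrac{1}{2}\bigl(a(n) + b(n)\bigr), \qquad
x_{side}(n) = \tfrac{1}{2}\bigl(a(n) - b(n)\bigr)

% A more general weighting, which reduces to the form above for \gamma = 0.5
x_{mono}(n) = \gamma\, a(n) + (1-\gamma)\, b(n), \qquad
x_{side}(n) = \gamma\, a(n) - (1-\gamma)\, b(n), \qquad 0 \le \gamma \le 1
```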

On blocks of the input signals, the modified, or residual, side signal is calculated according to:

where f(x_mono, x_side) is a balance factor function which, based on a block of N samples (i.e. a sub-frame) of the side and mono signals, strives to remove as much as possible from the side signal. In other words, the balance factor is used to minimize the side residual signal. In the special case where the minimization is in the least-squares sense, this is equivalent to minimizing the energy of the side residual signal x_side residual.

In this special case, f(x_mono, x_side) is described as:

where x_side is the side signal and x_mono is the mono signal. Note that the function is based on a block that starts at "frame start" and ends at "frame end".
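The least-squares case described above can be sketched in a few lines. Minimizing the residual energy over one block gives a balance factor equal to the cross term between side and mono divided by the mono energy. The function names, the zero-energy fallback and the sample values below are illustrative assumptions, not taken from the source.

```python
from typing import List, Sequence


def balance_factor(x_side: Sequence[float], x_mono: Sequence[float]) -> float:
    """Least-squares balance factor for one block (sub-frame) of samples.

    Minimises sum((x_side[n] - g * x_mono[n]) ** 2) over g, which gives
    g = sum(x_side * x_mono) / sum(x_mono * x_mono).
    """
    mono_energy = sum(m * m for m in x_mono)
    if mono_energy == 0.0:
        # Assumption: with a silent mono block nothing can be removed.
        return 0.0
    return sum(s * m for s, m in zip(x_side, x_mono)) / mono_energy


def side_residual(x_side: Sequence[float], x_mono: Sequence[float], g: float) -> List[float]:
    """Residual side signal: the side signal minus the scaled mono signal."""
    return [s - g * m for s, m in zip(x_side, x_mono)]


def energy(x: Sequence[float]) -> float:
    """Sum of squared samples."""
    return sum(v * v for v in x)


# Example block of N = 4 samples (made-up values).
x_mono = [1.0, -2.0, 3.0, 0.5]
x_side = [0.6, -0.9, 1.4, 0.1]
g = balance_factor(x_side, x_mono)
residual = side_residual(x_side, x_mono, g)
```

With the least-squares factor, the residual is orthogonal to the mono block, so any other factor would leave more energy in the residual.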

Weighting in the frequency domain can be added to the calculation of the balance factor. This is done by convolving the signals x_side and x_mono with the impulse response of a weighting filter. The estimation error can then be moved to a frequency range where it is less audible. This is referred to as perceptual weighting.

A quantized version of the balance factor value given by the function f(x_mono, x_side) is transmitted to the decoder. It is preferable to take the quantization into account already when forming the modified side signal. The following expression is then obtained:

where Q_g(..) is a quantization function applied to the balance factor given by f(x_mono, x_side). The balance factor is transmitted over the transmission channel. For signals panned normally between left and right, the balance factor is limited to the interval [-1.0, 1.0]. If, on the other hand, the channels are out of phase with each other, the balance factor may exceed these limits.

As an additional means of stabilizing the stereo image, the balance factor can also be limited if the normalized cross-correlation between the mono and side signals is low, as defined by the following equation:

where

Such situations occur quite frequently with, for example, classical music or studio music with a large amount of diffuse sound, where in some cases the channels a and b may almost cancel each other when the mono signal is created. The effect on the balance factor is that it may change rapidly, causing a disturbed stereo image. The above limitation alleviates this problem.
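The limiting rule can be illustrated as follows. The sketch uses the textbook definition of normalized cross-correlation (cross term divided by the geometric mean of the two energies), which is assumed here to match the intended equation. The threshold `min_correlation` and the bound `max_abs` are hypothetical tuning values, not figures from the source.

```python
import math
from typing import Sequence


def normalized_cross_correlation(x_side: Sequence[float], x_mono: Sequence[float]) -> float:
    """Cross term divided by the square root of the product of the energies."""
    cross = sum(s * m for s, m in zip(x_side, x_mono))
    denom = math.sqrt(sum(s * s for s in x_side) * sum(m * m for m in x_mono))
    # Assumption: treat a silent block as uncorrelated.
    return cross / denom if denom > 0.0 else 0.0


def limit_balance_factor(g: float, correlation: float,
                         min_correlation: float = 0.3,
                         max_abs: float = 1.0) -> float:
    """Clamp g to [-max_abs, max_abs] only when the correlation is low.

    Both min_correlation and max_abs are hypothetical values chosen
    for illustration.
    """
    if abs(correlation) < min_correlation:
        return max(-max_abs, min(max_abs, g))
    return g
```

When the mono signal nearly cancels, the least-squares factor can become large and jump between frames; clamping it in the low-correlation case keeps the factor within a bounded range at the cost of a slightly larger residual.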

The filter-based approach of U.S. patent 5434948 has a similar problem, but in that case the solution is not so simple.

If E_s is the coding function (for example, a transform encoder) of the side residual signal and E_m is the coding function of the mono signal, the decoded signals a" and b" in the decoder can be described as follows (assuming γ = 0.5):
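Under the sum/difference construction with γ = 0.5, the decoder relations follow directly: the decoded side signal is the decoded residual plus the quantized balance factor times the decoded mono signal, and the channels are the sum and difference of decoded mono and side. The sketch below works through one block. The coding functions are passed in as callables, and `identity` and `quantize_eighths` are hypothetical stand-ins for real codecs and a real quantizer.

```python
from typing import Callable, List, Sequence, Tuple

Coder = Callable[[List[float]], List[float]]


def encode_decode_block(a: Sequence[float], b: Sequence[float],
                        quantize_g: Callable[[float], float],
                        code_mono: Coder,
                        code_residual: Coder) -> Tuple[List[float], List[float]]:
    """Encode one block with a balance factor and reconstruct both channels."""
    # Encoder side: mono and side signals for gamma = 0.5.
    x_mono = [0.5 * (p + q) for p, q in zip(a, b)]
    x_side = [0.5 * (p - q) for p, q in zip(a, b)]
    mono_energy = sum(m * m for m in x_mono)
    g = (sum(s * m for s, m in zip(x_side, x_mono)) / mono_energy
         if mono_energy > 0.0 else 0.0)
    g_q = quantize_g(g)  # quantized factor, as seen by the decoder
    residual = [s - g_q * m for s, m in zip(x_side, x_mono)]

    # Coding functions E_m and E_s (encode followed by decode).
    mono_dec = code_mono(x_mono)
    residual_dec = code_residual(residual)

    # Decoder side: a" = (1 + g_q) * mono" + residual",
    #               b" = (1 - g_q) * mono" - residual".
    a_dec = [(1.0 + g_q) * m + r for m, r in zip(mono_dec, residual_dec)]
    b_dec = [(1.0 - g_q) * m - r for m, r in zip(mono_dec, residual_dec)]
    return a_dec, b_dec


def identity(x: List[float]) -> List[float]:
    """Hypothetical lossless coder, used only to check the algebra."""
    return list(x)


def quantize_eighths(g: float) -> float:
    """Hypothetical quantizer: round the factor to a grid of 1/8."""
    return round(g * 8.0) / 8.0


a = [0.9, -0.4, 0.3, 0.8]
b = [0.1, -0.6, 0.5, 0.2]
a_dec, b_dec = encode_decode_block(a, b, quantize_eighths, identity, identity)
```

Because the residual is formed with the quantized factor, quantizing the factor by itself introduces no reconstruction error; with lossless coders the two channels are recovered up to floating-point rounding.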

An important advantage of calculating the balance factor for each frame is that interpolation can be avoided. Instead, as described above, frames are typically processed with overlap.

The principle of coding with balance factors works particularly well for music signals, where fast changes are typically needed to track the stereo image.

Multi-channel coding has recently become popular. One example is 5.1-channel surround sound in DVD movies. The channels are arranged as follows: front left, front centre, front right, rear left, rear right and a separate subwoofer. Figure 5 shows an embodiment of an encoder that encodes the three front channels of such a configuration, exploiting inter-channel redundancy according to the present invention.

Three channel signals L, C, R are provided on three inputs 16A-C, and the mono signal x_mono is created by summing all three signals. A centre signal encoding device 130 is added, which receives the centre signal x_centre. The mono signal 42 in this embodiment is the encoded and decoded mono signal x"_mono, which is multiplied by a balance factor g_Q in the multiplier 133. In the subtraction device 135, the multiplied mono signal is subtracted from the centre signal x_centre to obtain a centre residual signal. The balance factor g_Q is determined by the optimizer 137 from the content of the mono and centre signals, so as to minimize the centre residual signal according to a quality criterion. The centre residual signal is encoded in the centre residual encoder 139 according to any coding procedure. Preferably, the centre residual encoder 139 is a low bit rate transform encoder or a CELP encoder. The encoding parameters p_centre representing the centre signal then comprise the encoding parameters p_centre residual representing the centre residual signal and the optimized balance factor 149. The centre residual signal and the scaled mono signal are added in the addition device 235, creating a modified centre signal 142 that is compensated for encoding errors.

The side signal x_side, i.e. the difference between the left L and right R channels, is provided to the side signal encoding device 30, as in the previous embodiments. Here, however, the optimizer 37 also depends on the modified centre signal 142 provided by the centre signal encoding device 130. The side residual signal is therefore formed in the subtraction device 35 as an optimum linear combination of the mono signal 42, the modified centre signal 142 and the side signal.

The variable frame length concept described above can be applied to either the side signal or the centre signal, or to both.

Figure 6 illustrates a decoder device suitable for receiving encoded audio signals from the encoder shown in Figure 5. The received signal 54 is divided into encoding parameters p_mono representing the main signal, encoding parameters p_centre representing the centre signal, and encoding parameters p_side representing the side signal. In the decoder 64, the encoding parameters p_mono representing the main signal are used to generate the main signal x"_mono. In the decoder 160, the encoding parameters p_centre representing the centre signal are used to generate the centre signal x"_centre, based on the main signal x"_mono. In the decoder 60, the encoding parameters p_side representing the side signal are decoded to obtain the side signal x"_side, based on the main signal x"_mono and the centre signal x"_centre.

This procedure can be mathematically expressed as follows:

The input signals x_left, x_right and x_centre are combined into a mono channel according to:

α, β and χ are set to 1.0 in the remainder of this section for simplicity, but they can be set to arbitrary values. The values of α, β and χ can be either constant or dependent on the signal content, in order to emphasize one or two channels and achieve optimum quality.
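With the three weights named above, the combination might be written as follows. The sample index n is an assumption, and the assignment of α, β and χ to the left, right and centre channels in that order is a guess.

```latex
x_{mono}(n) = \alpha\, x_{left}(n) + \beta\, x_{right}(n) + \chi\, x_{centre}(n)
```

With α = β = χ = 1.0 this reduces to the plain sum of the three channels, which matches the description of Figure 5.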

The normalized cross-correlation between the mono and centre signals is calculated as follows:

where

where x_centre is the centre signal and x_mono is the mono signal. The mono signal is taken from the mono target signal, but the local synthesis of the mono encoder can also be used.

The centre residual signal to be encoded is expressed as follows:

where Q_g(..) is a quantization function applied to the balance factor. The balance factor is transmitted over the transmission channel.

If E_c is the coding function (for example, a transform encoder) for the centre signal and E_m is the coding function of the mono signal, the decoded signal x"_centre in the decoder can be described as follows:

The side residual signal to be encoded is expressed as follows:

where g_Qsm and g_Qsc are quantized values of the parameters g_sm and g_sc that minimize the expression:

The exponent in this expression may, for example, be equal to 2, which gives a least-squares minimization of the error. The parameters g_sm and g_sc can be quantized separately or jointly.
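For the least-squares case, the two factors can be found by solving a 2x2 system of normal equations. The sketch below assumes the side residual has the form x_side - g_sm * x_mono - g_sc * x_centre, as the description of Figure 5 suggests. The function name, the determinant tolerance and the mono-only fallback are illustrative assumptions.

```python
from typing import Sequence, Tuple


def two_balance_factors(x_side: Sequence[float],
                        x_mono: Sequence[float],
                        x_centre: Sequence[float]) -> Tuple[float, float]:
    """Return (g_sm, g_sc) minimising the energy of
    x_side - g_sm * x_mono - g_sc * x_centre over one block."""
    r_mm = sum(m * m for m in x_mono)
    r_cc = sum(c * c for c in x_centre)
    r_mc = sum(m * c for m, c in zip(x_mono, x_centre))
    r_sm = sum(s * m for s, m in zip(x_side, x_mono))
    r_sc = sum(s * c for s, c in zip(x_side, x_centre))

    det = r_mm * r_cc - r_mc * r_mc
    if abs(det) < 1e-12:
        # Assumption: if mono and centre are (nearly) collinear or silent,
        # fall back to predicting from the mono signal only.
        return (r_sm / r_mm if r_mm > 0.0 else 0.0), 0.0

    # Cramer's rule on the normal equations.
    g_sm = (r_sm * r_cc - r_sc * r_mc) / det
    g_sc = (r_sc * r_mm - r_sm * r_mc) / det
    return g_sm, g_sc


# Made-up block where the side signal is an exact combination of the others.
x_mono = [1.0, 2.0, -1.0, 0.5]
x_centre = [0.5, -1.0, 2.0, 1.0]
x_side = [0.3 * m - 0.2 * c for m, c in zip(x_mono, x_centre)]
g_sm, g_sc = two_balance_factors(x_side, x_mono, x_centre)
```

In a complete encoder the two factors would then be quantized, separately or jointly, and the residual would be formed with the quantized values.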

If E_s is the coding function of the side residual signal, the decoded channel signals x"_left and x"_right are given as follows:

One of the most annoying perceptual artefacts is the pre-echo effect. The diagrams in figa-b illustrate this artefact. Let a signal component vary over time as shown by curve 100. At the beginning, from time t0, the signal component is not present in the audio signal. At a time t between t1 and t2, the signal component suddenly appears. When the signal component is encoded using a frame length of t2-t1, the onset of the signal component is "smeared out" over the entire frame, as shown by curve 101. When curve 101 is decoded, the signal component appears a time Δt before its intended onset, and a "pre-echo" is perceived.

Pre-echo artefacts become more accentuated when long coding frames are used. With shorter frames, the artefact is suppressed to some extent. Another way to deal with the pre-echo problems described above is to use the fact that the mono signal is available at both the encoder and the decoder. This makes it possible to scale the side signal according to the energy contour of the mono signal. The decoder performs the inverse scaling, and some of the pre-echo problems can thereby be alleviated.

The energy contour of the mono signal is calculated over the frame as follows:

where w(n) is a window function. The simplest window function is a rectangular window, but other window types, such as a Hamming window, may be more desirable.

The side residual signal is then scaled as follows:

,

In a more general form, the above equation can be written as:

,

where f(.) is a monotonic continuous function. In the decoder, the energy contour is calculated from the decoded mono signal and applied to the decoded side signal as follows:

,
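The scaling and its inverse can be sketched as below. The contour is taken here as a windowed sum of squared mono samples centred on each sample, and the monotonic function is the square root mentioned in the claims. The exact summation limits, the handling of block edges and the small `floor` value that avoids division by zero are assumptions made for this illustration.

```python
import math
from typing import List, Optional, Sequence


def energy_contour(x_mono: Sequence[float], half_width: int,
                   window: Optional[Sequence[float]] = None,
                   floor: float = 1e-9) -> List[float]:
    """Windowed energy of the mono signal around each sample m.

    E_c(m) = sum over k in [-half_width, half_width] of
             window[k + half_width] * x_mono[m + k] ** 2,
    skipping indices that fall outside the block.
    """
    n = len(x_mono)
    if window is None:
        window = [1.0] * (2 * half_width + 1)  # rectangular window
    contour = []
    for m in range(n):
        acc = 0.0
        for k in range(-half_width, half_width + 1):
            idx = m + k
            if 0 <= idx < n:
                acc += window[k + half_width] * x_mono[idx] ** 2
        contour.append(max(acc, floor))  # floor is a hypothetical safeguard
    return contour


def scale_by_contour(residual: Sequence[float], contour: Sequence[float]) -> List[float]:
    """Encoder side: divide the side residual by sqrt(E_c)."""
    return [r / math.sqrt(e) for r, e in zip(residual, contour)]


def unscale_by_contour(scaled: Sequence[float], contour: Sequence[float]) -> List[float]:
    """Decoder side: multiply the decoded residual by sqrt(E_c)."""
    return [s * math.sqrt(e) for s, e in zip(scaled, contour)]


# Made-up block with a sudden onset in the mono signal.
x_mono = [0.0, 0.0, 0.0, 4.0, 4.0, 4.0]
residual = [0.001, -0.002, 0.5, -1.0, 0.25, 2.0]
contour = energy_contour(x_mono, half_width=1)
scaled = scale_by_contour(residual, contour)
restored = unscale_by_contour(scaled, contour)
```

Where the mono signal is quiet the contour is small, so the inverse scaling in the decoder shrinks whatever coding noise falls before the onset. In a real codec the decoder would compute the contour from the decoded mono signal rather than from the original.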

Since this energy contour scaling is in some sense an alternative to using shorter frames, the concept is particularly well suited for combination with the variable frame length concept described above. By having some encoding schemes that apply energy contour scaling, some that do not, and some that apply it only during certain sub-frames, a more flexible set of encoding schemes can be provided. An accompanying figure illustrates an embodiment of the encoding device 30 according to the present invention. Here, the different encoding schemes 81 comprise hatched sub-frames 91, representing encoding that applies energy contour scaling, and un-hatched sub-frames 92, representing encoding that does not. In this way, combinations not only of sub-frames of differing lengths but also of sub-frames with differing coding principles are available. In this illustrative example, the application of energy contour scaling differs between encoding schemes. In the more general case, any coding principles can be combined with the variable length concept in an analogous manner.

The set of encoding schemes shown in the figure comprises schemes that handle, for example, pre-echo artefacts in different ways. Some schemes use longer sub-frames with pre-echo minimization according to the energy contour principle. Other schemes use shorter sub-frames without energy contour scaling. Depending on the signal content, one or the other alternative may be more advantageous. For cases with very severe pre-echo, encoding schemes that use short sub-frames with energy contour scaling may be necessary.

The proposed solution can be used over the full frequency range or in one or more separate sub-bands. Sub-bands can be applied to both the main and side signals, or to either of them separately. A preferred embodiment splits the side signal into several frequency bands. The reason is simply that redundancy is easier to remove in an isolated frequency band than over the entire frequency range. This is particularly important when encoding music signals with rich spectral content.

One possible use is to encode the frequency range below a predetermined threshold with the above method. The predetermined threshold may preferably be 2 kHz, or even more preferably 1 kHz. For the remainder of the frequency range of interest, one can either encode an additional frequency band with the method described above or use a completely different method.

One motive for preferring the above method at low frequencies is that diffuse sound fields generally have little energy at high frequencies. The natural reason is that sound absorption typically increases with frequency. In addition, the diffuse sound field components appear to play a less important role for the human auditory system at higher frequencies. It is therefore beneficial to use this solution at low frequencies (below 1 or 2 kHz) and to rely on other, even more bit-efficient encoding schemes at higher frequencies. Applying the scheme only at low frequencies gives a large saving in bit rate, since the bit rate required by the proposed method is proportional to the bandwidth covered. In most cases the mono encoder can encode the entire frequency range, whereas the proposed side signal encoding is intended to be performed only in the lower part of the frequency range, as schematically illustrated in Fig. 9. Reference 301 denotes side signal encoding according to the present invention, reference 302 any other side signal encoding scheme, and reference 303 the encoding of the mono signal.

It is also possible to use the proposed method for several distinct frequency bands.

Figure 10 illustrates the main steps of an embodiment of an encoding method according to the present invention as a flow diagram. The procedure starts in step 200. In step 210, a main signal derived from the polyphonic signals is encoded. In step 212, encoding schemes are provided, which comprise sub-frames of differing lengths and/or order. In step 214, a side signal derived from the polyphonic signals is encoded with an encoding scheme selected at least partly on the basis of the actual content of the present polyphonic signals. The procedure ends in step 299.
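The selection in step 214 can be sketched as a closed-loop search: encode the frame with every candidate scheme, measure a fidelity value such as the signal-to-noise ratio, and keep the best. In the sketch below, each scheme is a tuple of sub-frame lengths that sum to the frame length. The toy sub-frame coder, its step size and the candidate list are hypothetical and stand in for a real residual codec.

```python
import math
from typing import List, Sequence, Tuple


def snr_db(target: Sequence[float], decoded: Sequence[float]) -> float:
    """Signal-to-noise ratio in dB between target and decoded signals."""
    signal = sum(t * t for t in target)
    noise = sum((t - d) ** 2 for t, d in zip(target, decoded))
    if noise == 0.0:
        return math.inf
    if signal == 0.0:
        return -math.inf
    return 10.0 * math.log10(signal / noise)


def code_subframe(x_side: Sequence[float], x_mono: Sequence[float],
                  step: float = 1.0) -> List[float]:
    """Toy sub-frame coder: least-squares balance factor plus a residual
    rounded to a coarse grid (a stand-in for a real residual encoder)."""
    mono_energy = sum(m * m for m in x_mono)
    g = (sum(s * m for s, m in zip(x_side, x_mono)) / mono_energy
         if mono_energy > 0.0 else 0.0)
    residual_q = [round((s - g * m) / step) * step for s, m in zip(x_side, x_mono)]
    return [r + g * m for r, m in zip(residual_q, x_mono)]


def select_scheme(x_side: Sequence[float], x_mono: Sequence[float],
                  schemes: Sequence[Tuple[int, ...]]) -> Tuple[Tuple[int, ...], float]:
    """Encode the frame with every scheme and return (lengths, SNR) of the best.

    Ties are resolved in favour of the scheme listed first.
    """
    best_lengths, best_snr = None, -math.inf
    for lengths in schemes:
        if sum(lengths) != len(x_side):
            raise ValueError("sub-frame lengths must sum to the frame length")
        decoded: List[float] = []
        start = 0
        for length in lengths:
            stop = start + length
            decoded.extend(code_subframe(x_side[start:stop], x_mono[start:stop]))
            start = stop
        score = snr_db(x_side, decoded)
        if best_lengths is None or score > best_snr:
            best_lengths, best_snr = lengths, score
    return best_lengths, best_snr


# Made-up frame of 8 samples whose stereo image flips halfway through.
x_mono = [1.0, 2.0, 3.0, 4.0, 4.0, 3.0, 2.0, 1.0]
x_side = [0.5, 1.0, 1.5, 2.0, -2.0, -1.5, -1.0, -0.5]
schemes = [(8,), (4, 4), (4, 2, 2), (2, 2, 4), (2, 4, 2), (2, 2, 2, 2)]
best_lengths, best_snr = select_scheme(x_side, x_mono, schemes)
```

In this example a single long frame yields a balance factor of zero because the two halves cancel, while two sub-frames of four samples each track the change exactly. The toy coder ignores the bit cost of sending more balance factors, which a real selection criterion would have to weigh.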

Figure 11 illustrates the main steps of an embodiment of a decoding method according to the present invention as a flow diagram. The procedure starts in step 200. In step 220, a received encoded main signal is decoded. In step 222, encoding schemes are provided, which comprise sub-frames of differing lengths and/or order. In step 224, a received side signal is decoded with a selected encoding scheme. In step 226, the decoded main and side signals are combined into a polyphonic signal. The procedure ends in step 299.

The embodiments described above are to be understood as illustrative examples of the present invention. It will be understood by those skilled in the art that various modifications, combinations and changes may be made to the embodiments without departing from the scope of the present invention. In particular, solutions for different parts in the different embodiments can be combined in other configurations where technically possible. The scope of the present invention is defined by the claims.

References

EP 0497413

U.S. patent 5285498

U.S. patent

C. Faller et al., "Binaural cue coding applied to stereo and multi-channel audio compression", 112th AES Convention, May 2002, Munich, Germany.

1. A method of encoding polyphonic signals, comprising the steps of: generating a first output signal representing encoding parameters that characterize a main signal, based on signals of at least a first and a second channel; and generating a second output signal representing encoding parameters that characterize a side signal, based on the signals of at least the first and second channels, within an encoding frame; characterized in that it further comprises the step of providing at least two encoding schemes, each of the at least two encoding schemes being characterized by a respective set of sub-frames that together constitute the encoding frame, the sum of the lengths of the sub-frames in each encoding scheme being equal to the length of the encoding frame; each set of sub-frames comprises at least one sub-frame; and the step of generating the second output signal comprises the steps of: selecting an encoding scheme at least partly depending on the content of the present side signal; and encoding the second output signal in each of the sub-frames of the selected set of sub-frames separately.

2. The method according to claim 1, characterized in that the step of generating the second output signal comprises the steps of: generating encoding parameters representing the side signal, which is a first linear combination of the signals of at least the first and second channels, within all sub-frames of each of the at least two sets of sub-frames separately; calculating a total fidelity measure for each of the at least two encoding schemes, the fidelity measure being an objective quality value of the encoded signal; and selecting the encoded signal from the encoding scheme having the best fidelity measure as the encoding parameters representing the side signal.

3. The method according to claim 2, characterized in that the fidelity measure is based on a signal-to-noise measure.

4. The method according to claim 1, characterized in that the sub-frames have lengths l_sf according to

,

where l_f is the length of the encoding frame and n is an integer.

5. The method according to claim 4, characterized in that n is less than a predetermined value.

6. The method according to claim 5, characterized in that the at least two encoding schemes comprise all permutations of the sub-frame lengths.

7. The method according to any one of claims 1 to 6, characterized in that the step of generating encoding parameters representing the main signal comprises the steps of: creating the main signal as a second linear combination of the signals of at least the first and second channels; and encoding the main signal into the encoding parameters representing the main signal; and the step of encoding the side signal comprises the steps of: creating a side residual signal as the difference between the side signal and the main signal scaled by a balance factor, the balance factor being determined as the factor that minimizes the side residual signal according to a quality criterion; and encoding the side residual signal and the balance factor into the encoding parameters representing the side signal.

8. The method according to claim 7, characterized in that the quality criterion is based on a least-squares measure.

9. The method according to any one of claims 1 to 6, characterized in that the step of encoding the side signal further comprises the step of scaling the side signal to the energy contour of the main signal.

10. The method according to claim 9, characterized in that the scaling of the side signal is a division by a factor that is a monotonic continuous function of the energy contour of the main signal.

11. The method according to claim 10, characterized in that the monotonic continuous function is a square root function.

12. The method according to claim 10, characterized in that the energy contour E_c of the main signal x_mono is calculated over a sub-frame according to

window start ≤ m ≤ window end,

where L is an arbitrary factor, n is a summation index, m is the sample within the sub-frame and w(n) is a window function.

13. The method according to claim 12, characterized in that the window function is a rectangular window function.

14. The method according to claim 12, characterized in that the window function is a Hamming window function.

15. The method according to any one of claims 1 to 6, characterized in that the at least two encoding schemes comprise different side signal encoding principles.

16. The method according to claim 15, characterized in that at least a first encoding scheme of the at least two encoding schemes comprises a first side signal encoding principle for all sub-frames, and at least a second encoding scheme of the at least two encoding schemes comprises a second side signal encoding principle for all sub-frames.

17. The method according to claim 15, characterized in that at least one encoding scheme of the at least two encoding schemes comprises the first side signal encoding principle for one sub-frame and the second side signal encoding principle for another sub-frame.

18. The method according to claim 1, characterized in that the step of generating the second output signal in turn comprises the steps of: analysing the spectral characteristics of the side signal, which is a first linear combination of the signals of at least the first and second channels; selecting a set of sub-frames based on the analysed spectral characteristics; and encoding the side signal within all sub-frames of the selected set of sub-frames separately.

19. The method according to any one of claims 1 to 6, characterized in that the step of generating the second output signal is used in a limited frequency range.

20. The method according to claim 19, wherein the step of generating the second output signal is used only for frequencies below 2 kHz.

21. The method according to claim 20, wherein the step of generating the second output signal is used only for frequencies below 1 kHz.

22. The method according to any one of claims 1 to 6, characterized in that the polyphonic signals are music signals.

23. A method of decoding polyphonic signals, comprising the steps of: decoding encoding parameters representing a main signal; decoding encoding parameters representing a side signal within an encoding frame; and combining at least the decoded main signal and the decoded side signal into signals of at least a first and a second channel; characterized in that it comprises the step of providing at least two encoding schemes, each of the at least two encoding schemes being characterized by a set of sub-frames that together constitute the encoding frame, the sum of the lengths of the sub-frames in each encoding scheme being equal to the length of the encoding frame, each set of sub-frames comprising at least one sub-frame; and the step of decoding the encoding parameters representing the side signal comprises the step of decoding the encoding parameters representing the side signal separately in the sub-frames of one of the at least two encoding schemes.

24. An encoder device for polyphonic signals, comprising: input means for polyphonic signals comprising at least a first and a second channel; means for generating a first output signal representing encoding parameters that characterize a main signal, based on the signals of at least the first and second channels; means for generating a second output signal representing encoding parameters that characterize a side signal, based on the signals of at least the first and second channels, within an encoding frame; and output means; characterized in that it comprises means for providing at least two encoding schemes, each of the at least two encoding schemes being characterized by a respective set of sub-frames that together constitute the encoding frame, the sum of the lengths of the sub-frames in each encoding scheme being equal to the length of the encoding frame; each set of sub-frames comprises at least one sub-frame; and the means for generating the second output signal comprises: means for selecting an encoding scheme at least partly based on the content of the present side signal; and means for encoding the side signal in each of the sub-frames of the selected encoding scheme separately.

25. A decoder device for polyphonic signals, comprising: input means for encoding parameters representing a main signal and encoding parameters representing a side signal; means for decoding the encoding parameters representing the main signal; means for decoding the encoding parameters representing the side signal within an encoding frame; means for combining at least the decoded main signal and the decoded side signal into signals of at least a first and a second channel; and output means; characterized in that the means for decoding the encoding parameters representing the side signal comprises: means for providing at least two encoding schemes, each of the at least two encoding schemes being characterized by a respective set of sub-frames that together constitute the encoding frame, the sum of the lengths of the sub-frames in each encoding scheme being equal to the length of the encoding frame; each set of sub-frames comprises at least one sub-frame; and means for decoding the encoding parameters representing the side signal separately in the sub-frames of one of the at least two encoding schemes.

26. An audio system comprising at least one of: an encoder device for polyphonic signals according to claim 24 and a decoder device for polyphonic signals according to claim 25.



 

Same patents:

The invention relates to the field of radio broadcasting, in particular to methods of reception and transmission of stereo signals

The invention relates to the field of radio broadcasting and audio stereo signals

FIELD: systems/methods for filtering signals.

SUBSTANCE: in accordance to invention, filtration of input signal is performed for generation of first filtered signal; first filtered signal is combined with aforementioned input signal for production of difference signal, while stage of filtering of input signal for producing first filtered signal contains: stage of production of at least one delayed, amplified and filtered signal, and production stage contains: storage of signal, related to aforementioned input signal in a buffer; extraction of delayed signal from buffer, filtration of signal for forming at least one second filtered signal, while filtration is stable and causative; amplification of at least one signal by amplification coefficient, while method also contains production of aforementioned first filtered signal, basing on at least one aforementioned delayed, amplified and filtered signal.

EFFECT: development of method for filtering signal with delay cycle.

10 cl, 10 dwg

FIELD: analysis and synthesis of speech information outputted from computer, possible use in synthesizer-informers in mass transit means, communications, measuring and technological complexes and during foreign language studies.

SUBSTANCE: method includes: analog-digital conversion of speech signal; segmentation of transformed signal onto elementary speech fragments; determining of vocalization of each fragment; determining, for each vocalized elementary speech segment, of main tone frequency and spectrum parameters; analysis and changing of spectrum parameters; and synthesis of speech sequence. Technical result is achieved because before synthesis, in vocalized segments periods of main tone of each such segment are adapted to zero starting phase by means of transferring digitization start moment in each period of main tone beyond the point of intersection of contouring line with zero amplitude, distortions appearing at joining lines of main tone periods are smoothed out and, during transformation of additional count in the end of modified period of main tone, re-digitization of such period is performed while preserving its original length.

EFFECT: improved quality of produced modulated signal, allowing more trustworthy reproduction of sounds during synthesis of speech signal.

2 cl, 8 dwg

FIELD: speech activity transmission systems in distributed system of voice recognition.

SUBSTANCE: distributed system of voice recognition has voice recognition (VR) local mechanism in user unit and VR server mechanism in server. VR local mechanism has module for selection of features (FS), which selects features from voice signals. Voice activity detector (VAD) module detects voice activity invoice signal. Indication of voice activity is transmitted before features from user unit to server.

EFFECT: reduction in overloading of circuit; reduced delay and increased efficiency of voice recognition.

3 cl, 8 dwg, 2 tbl

FIELD: method and device for efficiency compression of audio signal to acoustic signal of level III of MPEG-1 standard with low information transfer speed.

SUBSTANCE: in accordance to audio signal encoding method, harmonic components are extracted with usage of information resulting from fast Fourier transformation, which is received with usage of psycho-acoustic model 2 to received audio data of impulse-code modulation. Then, extracted harmonic components are removed from received audio data of impulse-code modulation. After that audio data, from which extracted harmonic components have been removed, are subjected to modified discontinuous cosine transformation and quantization.

EFFECT: provision of efficient compression of signal at low speed by compressing changing part of signal only by means of modified discontinuous cosine transformation.

5 cl, 11 dwg

FIELD: medicine.

SUBSTANCE: method involves applying analog-to-digital input signal transformation expressed as word, dividing transformed signal spectrum into odd and even frequency bands, summing odd bands, carrying out digital-to-analog transformation of resulting summed signal and training its perception by preliminarily getting familiar with the word shown for listening and following testing. Spectrum division is based on tonotopic frequency distribution law over cochlea axis. Frequency bands having odd numbers are arranged in equal distances along basilar membrane length in agreement with normal tonotopic frequency distribution law over cochlea axis. At least three odd spectrum bands are summed up. Training is carried out by multiple repetition of the word shown for listening until unambiguous correlation to the known word meaning given in preliminary acquaintance takes place. The same words are to be shown in testing and training.

EFFECT: partially retained speech spectrum.

FIELD: digital speech encoding.

SUBSTANCE: speech compression system provides encoding of speech signal into bits flow for later decoding for generation of synthesized speech, which contains full speed codec, half speed codec, one quarter speed codec and one eighth speed codec, which are selectively activated on basis of speed selection. Also, codecs of full and half speed are selectively activated on basis of type classification. Each codec is activated selectively for encoding and decoding speech signal for various speeds of transfer in bits, to accent different aspects of speech signal to increase total quality of synthesized speech signal.

EFFECT: optimized width of band, required for bits flow, by balancing between preferred average speed of transfer in bits and perception quality of restored speech.

11 cl, 12 dwg, 9 tbl

FIELD: speech recording/reproducing devices.

SUBSTANCE: during encoding speech signals are separated on frames and separated signals are encoded on frame basis for output of encoding parameters like parameters of linear spectral couple, tone height, vocalized/non-vocalized signals or spectral amplitude. During calculation of altered parameters of encoding, encoding parameters are interpolated for calculation of altered encoding parameters, connected to temporal periods based on frames. During decoding harmonic waves and noise are synthesized on basis of altered encoding parameters and synthesized speech signals are selected.

EFFECT: broader functional capabilities, higher efficiency.

3 cl, 24 dwg
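
The interpolation step in this abstract can be illustrated with a small sketch. It assumes that the altered parameters correspond to a changed time axis and that plain linear interpolation between neighbouring frames is acceptable; neither detail is stated in the abstract, and the function name `interpolate_params` and the `speed` argument are hypothetical.

```python
def interpolate_params(frames, speed):
    """Resample per-frame parameter vectors onto a new time axis.

    frames: list of equal-length lists of floats, one list per original frame.
    speed:  positive factor; 0.5 produces about twice as many output frames.
    Linear interpolation is an assumption made for this sketch.
    """
    if not frames or speed <= 0:
        raise ValueError("need at least one frame and a positive speed")
    last = len(frames) - 1
    n_out = max(1, round(len(frames) / speed))
    out = []
    for k in range(n_out):
        pos = min(k * speed, last)   # position on the original frame axis
        i = int(pos)                 # frame at or before that position
        j = min(i + 1, last)         # next frame, clamped at the end
        w = pos - i                  # interpolation weight in [0, 1)
        out.append([(1 - w) * a + w * b for a, b in zip(frames[i], frames[j])])
    return out
```

A binary parameter such as the voiced/unvoiced decision would need a different rule (for instance, taking the nearest frame) rather than a weighted average.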

FIELD: technologies for encoding audio signals.

SUBSTANCE: a method for generating a high-frequency reconstructed version of a low-band input signal by high-frequency spectral reconstruction using a digital filter bank system. The method is based on splitting the low-band input signal with an analysis filter bank to produce complex subband signals in channels, obtaining a series of consecutive complex subband signals in channels of the reconstruction range, adjusting the envelope to produce a predetermined spectral envelope in the reconstruction range, and combining said series of signals with a synthesis filter bank.

EFFECT: higher efficiency.

4 cl, 5 dwg

FIELD: communication systems.

SUBSTANCE: the method and system for decreasing prediction error use an averaging device for calculating the transfer coefficient, a pulse detector, a signal classifier, decision-making means and a transfer coefficient compensation device. The compensated transfer coefficient of the quantizer count is determined during coding/decoding of data transferred in the speech signal band, using a vector linear non-adaptive predictive algorithm.

EFFECT: higher efficiency.

4 cl, 4 dwg

FIELD: electric communication, namely systems for data transmitting by means of digital communication lines.

SUBSTANCE: the method comprises the steps of forming in advance, at both the receiving and transmitting sides, R matrices of allowed vectors, each of dimension m2 x m1 and consisting of one and zero elements; forming an initial matrix of N x N elements from the one-dimensional analog speech signal; converting the resulting matrix to digital form; forming rectangular matrices of dimensions N x m and m x N, which are a digital representation of the initial matrix, from elements of rows of the allowed vectors; transmitting the elements of these rectangular matrices over a digital communication channel; correcting errors at the transmission side by checking whether groups of elements of the received rectangular matrices match row elements of the previously formed matrices of allowed vectors; and then performing the inverse operations to decompress the speech messages. The method is especially suitable for telephone calls over digital communication systems at rates of 6 - 16 kbit/s.

EFFECT: possibility of correcting errors that arise in transmitted digital sequences because of unstable parameters of the communication system, and of carrying telephone calls over low-speed digital communication lines.

5 cl, 20 dwg
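
The error-correction idea in this abstract, checking received element groups against rows of the allowed-vector matrices, can be sketched as below. The abstract does not say how a mismatch is resolved; choosing the allowed vector at the smallest Hamming distance is an assumption made here for illustration, and the function name `correct_row` is hypothetical.

```python
def correct_row(received, allowed):
    """Return the allowed vector closest to `received` in Hamming distance.

    received: tuple of 0/1 elements taken from a received rectangular matrix.
    allowed:  list of 0/1 tuples, the rows of an allowed-vector matrix.
    If `received` is itself an allowed vector, it is returned unchanged.
    Ties are resolved in favour of the earliest row in `allowed`.
    """
    def distance(vector):
        return sum(a != b for a, b in zip(vector, received))

    return min(allowed, key=distance)
```

With `allowed = [(0, 0, 0, 0), (1, 1, 1, 1)]`, the corrupted group `(1, 0, 1, 1)` is mapped back to `(1, 1, 1, 1)`.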

FIELD: method and device for efficient compression of an audio signal into an MPEG-1 Layer III audio signal at a low bit rate.

SUBSTANCE: in accordance with the audio signal encoding method, harmonic components are extracted using fast Fourier transform information obtained by applying psychoacoustic model 2 to the received pulse-code modulation audio data. The extracted harmonic components are then removed from the received pulse-code modulation audio data. After that, the audio data from which the extracted harmonic components have been removed are subjected to the modified discrete cosine transform and quantization.

EFFECT: efficient compression of the signal at a low bit rate, achieved by compressing only the changing part of the signal by means of the modified discrete cosine transform.

5 cl, 11 dwg

FIELD: systems for transmitting voice activity in a distributed voice recognition system.

SUBSTANCE: the distributed voice recognition system has a local voice recognition (VR) engine in the user unit and a server VR engine in the server. The local VR engine has a feature selection (FS) module, which extracts features from the voice signal. A voice activity detector (VAD) module detects voice activity in the voice signal. The indication of voice activity is transmitted from the user unit to the server ahead of the features.

EFFECT: reduced overloading of the circuit; reduced delay and increased efficiency of voice recognition.

3 cl, 8 dwg, 2 tbl
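
The ordering described in this abstract, with the voice activity indication sent ahead of the features, can be sketched as follows. The energy threshold detector, the single-value feature vector, the message tuples and the choice to send features only for active frames are all assumptions made for this illustration; the abstract does not specify the VAD algorithm, the feature set or the message format.

```python
def frame_energy(samples):
    """Mean squared amplitude of one frame of samples."""
    return sum(s * s for s in samples) / len(samples)


def build_messages(frames, threshold=0.01):
    """Yield (kind, payload) messages for the server, frame by frame.

    The "vad" message for a frame is always emitted before that frame's
    "features" message. The energy threshold is a stand-in for a real
    voice activity detector, and frame energy is a stand-in feature.
    """
    for frame in frames:
        energy = frame_energy(frame)
        active = energy > threshold
        yield ("vad", active)
        if active:
            # Assumption: features are sent only when speech is detected.
            yield ("features", [energy])
```

For a silent frame followed by a loud one, `list(build_messages([[0.0, 0.0], [0.5, -0.5]]))` gives a `vad` message with `False`, then a `vad` message with `True`, then the features of the second frame.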
