Bandwidth expansion device

FIELD: radio engineering, communication.

SUBSTANCE: the invention relates to bandwidth extension devices. An excitation signal is generated from an audio signal that comprises a plurality of frequency components. A feature vector is extracted from the audio signal; the feature vector comprises at least one frequency-domain feature and at least one time-domain feature. At least one spectral shape parameter is determined from the feature vector; the parameter corresponds to a sub-band signal comprising frequency components that belong to a further plurality of frequency components. The sub-band signal is generated by filtering the excitation signal through a filter bank and weighting the filtered excitation signal with the at least one spectral shape parameter.

EFFECT: improved perceived quality of the bandwidth-extended audio signal.

21 cl, 10 dwg

 

TECHNICAL FIELD

The present invention relates to an apparatus and method that are designed to improve the quality of the audio signal. In particular, the present invention relates to an apparatus and method that are designed to extend the frequency band of the audio signal.

BACKGROUND

Audio signals, such as speech or music, can be coded to allow for efficient transmission or storage.

The frequency band of an audio signal may be limited, and its width is usually determined by the available bandwidth of the transmission system or the capacity of the storage medium. In some cases, however, it is desirable to perceive the decoded audio signal over a wider frequency band than the one used when the audio signal was originally encoded. In such cases the decoder can apply artificial bandwidth extension, which widens the frequency band of the decoded audio signal using information derived solely from the decoded signal itself.

One area in which such artificial bandwidth extension is applied is mobile communications. Typically, in a mobile communication system such as the Global System for Mobile Communications (GSM), the speech signal is limited to a frequency band below 4 kHz; in other words, it is a narrowband speech signal. Natural speech, however, may contain significant frequency components up to 10 kHz. The additional high-frequency components can improve the overall quality and intelligibility of the speech signal, producing a clearer and more pleasant sound than the equivalent narrowband signal.

Existing methods of improving the quality and intelligibility of a narrowband speech signal by artificial bandwidth extension may use a codebook to generate the additional high-frequency components. The codebook may contain frequency vectors with different spectral characteristics, which together cover the frequency range of interest. The frequency range can be extended frame by frame by selecting the optimal vector and adding it to the spectral components decoded from the received signal.

In addition, artificial bandwidth extension methods may use upsampling to create aliased copies of the received signal components at high frequencies. The energy levels of the aliased frequency components can then be adjusted so that they represent the high frequencies of the speech signal.

However, existing methods of artificial extension of the bandwidth can be characterized by poor quality and inefficiency.

For example, some artificial bandwidth extension methods classify incoming speech frames according to their phonetic content in order to determine the envelope of the upper band. The envelope can then be used to shape a frequency spectrum generated from the lower band.

However, upper bands generated with this approach may not always sound natural. This is partly because transitions between phonemes in natural speech are smooth, whereas a phoneme classification system can introduce discontinuities at its decision boundaries.

In addition, other factors can cause an unnatural sound when the above approach to artificial bandwidth extension is used, for example incorrect classification of incoming speech frames and inaccurate estimation of the spectral shape of the upper band.

BRIEF DESCRIPTION OF CERTAIN EMBODIMENTS OF THE INVENTION

The present invention proceeds from the consideration that existing artificial bandwidth extension schemes can lead to an overall deterioration in the perceived naturalness of the extended audio signal. This deterioration can be particularly noticeable for sibilant (hissing) sounds.

EMBODIMENTS AIMED AT SOLVING THE AFOREMENTIONED PROBLEM

In accordance with a first aspect of some embodiments of the present invention, there is provided a method comprising: generating an excitation signal from an audio signal, wherein the audio signal comprises a plurality of frequency components; extracting a feature vector from the audio signal, wherein the feature vector comprises at least one frequency-domain feature and at least one time-domain feature; determining at least one spectral shape parameter from the feature vector, wherein the at least one spectral shape parameter corresponds to a sub-band signal comprising frequency components that belong to a further plurality of frequency components; and generating the sub-band signal by filtering the excitation signal through a filter bank and weighting the filtered excitation signal with the at least one spectral shape parameter.

In an embodiment of the method, generating the excitation signal may comprise: generating a residual signal by filtering the audio signal with an inverse linear prediction filter; filtering the residual signal with a post-filter stage comprising an autoregressive moving-average filter based on the linear prediction filter; and generating the excitation signal by upsampling and spectrally folding the output of the post-filter stage.

The post-filter stage may also comprise a spectral tilt filter and a harmonic filter.
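By way of illustration only, the excitation generation described above can be sketched as follows. The sketch covers the inverse linear prediction filter and the upsampling with spectral folding; the post-filter stage (autoregressive moving-average, spectral tilt and harmonic filters) is omitted. The function names, the autocorrelation method and the Levinson-Durbin recursion are assumptions made for this example and are not taken from the present description.

```python
def autocorr(x, max_lag):
    """Autocorrelation r[0..max_lag] of one frame."""
    return [sum(x[n] * x[n - k] for n in range(k, len(x)))
            for k in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Coefficients of the inverse filter A(z) = 1 + a1*z^-1 + ... + ap*z^-p."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:
            break  # silent or degenerate frame: keep the coefficients found so far
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err  # reflection coefficient
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= 1.0 - k * k
    return a


def lpc_residual(x, a):
    """Inverse linear prediction filtering: e[n] = sum_j a[j] * x[n - j]."""
    return [sum(a[j] * x[n - j] for j in range(len(a)) if n - j >= 0)
            for n in range(len(x))]


def upsample_by_folding(x, factor=2):
    """Zero insertion: the baseband spectrum reappears mirrored above the old Nyquist frequency."""
    out = [0.0] * (len(x) * factor)
    out[::factor] = x
    return out


# Example: a decaying exponential is well predicted by a first-order model.
frame = [0.9 ** n for n in range(50)]
coeffs = levinson_durbin(autocorr(frame, 1), 1)
excitation = upsample_by_folding(lpc_residual(frame, coeffs))
```

For a narrowband input sampled at 8 kHz, zero insertion by a factor of two yields a 16 kHz signal whose 4 to 8 kHz region is a mirror image of the 0 to 4 kHz residual spectrum, which is why the operation is referred to as spectral folding.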

The frequency components of the sub-band signal may be distributed according to a psychoacoustic scale divided into a plurality of overlapping bands, and the frequency characteristics of the filter bank may correspond to the distribution of the frequency components of the sub-band signal.

The overlapping bands may be distributed according to the mel scale, and the sub-band signal may be masked using a triangular masking function.

Alternatively, the overlapping bands may be distributed according to the mel scale, and the sub-band signal may be masked using a trapezoidal masking function.
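As a non-limiting sketch, overlapping mel-spaced bands with triangular weighting could be laid out as shown below. The widely used conversion mel = 2595 log10(1 + f/700) is assumed; the band limits of 4 kHz and 8 kHz, the number of bands and all function names are hypothetical choices made for this example only.

```python
import math


def hz_to_mel(f_hz):
    """Common mel-scale approximation (assumed here, not specified above)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_band_edges(f_low, f_high, n_bands):
    """Return n_bands + 2 edge frequencies equally spaced on the mel scale."""
    m_lo, m_hi = hz_to_mel(f_low), hz_to_mel(f_high)
    step = (m_hi - m_lo) / (n_bands + 1)
    return [mel_to_hz(m_lo + i * step) for i in range(n_bands + 2)]


def triangular_weight(f_hz, left, centre, right):
    """Triangular window: 0 at the outer edges, 1 at the centre frequency."""
    if f_hz <= left or f_hz >= right:
        return 0.0
    if f_hz <= centre:
        return (f_hz - left) / (centre - left)
    return (right - f_hz) / (right - centre)


# Four overlapping bands covering a hypothetical 4-8 kHz upper band.
# Band b uses edges[b] as left edge, edges[b + 1] as centre, edges[b + 2] as right edge.
edges = mel_band_edges(4000.0, 8000.0, 4)
```

With this layout each band overlaps its neighbours by half, and between two adjacent centre frequencies the weights of the two bands sum to one. A trapezoidal variant would hold the weight at one over a plateau around the centre instead of at a single point.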

Determining the at least one spectral shape parameter from the feature vector may comprise using a neural network to determine the at least one spectral shape parameter from the feature vector, wherein the feature vector extracted from the audio signal may form an input target vector for the neural network, and the neural network may be trained to provide the sub-band spectral shape parameter for the input target vector.

The spectral shape parameter may be a sub-band energy level value.

The spectral shape parameter may be a sub-band gain factor based on the sub-band energy level value.

The sub-band energy level value may be attenuated when the power of the audio signal approaches an estimate of the noise level in the audio signal.
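The relationship between a sub-band energy level value, a gain factor and the attenuation near the noise level can be illustrated by the following sketch. The formulas are assumptions chosen for illustration: the gain is taken as the square root of the ratio of the target energy to the energy of the filtered excitation, and the attenuation rule with its `margin` and `floor` parameters is hypothetical and is not defined in the present description.

```python
import math


def subband_gain(target_energy, filtered_energy, eps=1e-12):
    """Gain that scales the filtered excitation to the target sub-band energy (assumed rule)."""
    return math.sqrt(target_energy / (filtered_energy + eps))


def attenuate_near_noise(target_energy, signal_power, noise_power,
                         margin=2.0, floor=0.1):
    """Hypothetical rule: reduce the target energy as the signal power falls towards the noise estimate."""
    if signal_power >= margin * noise_power:
        return target_energy  # well above the noise: no attenuation
    if signal_power <= noise_power:
        return floor * target_energy  # at or below the noise: maximum attenuation
    # Linear interpolation between the two cases above.
    t = (signal_power - noise_power) / ((margin - 1.0) * noise_power)
    return (floor + (1.0 - floor) * t) * target_energy


def weight_subband(filtered_excitation, gain):
    """Apply the gain factor to one filter-bank output."""
    return [gain * v for v in filtered_excitation]


# Example: the energy estimated for one sub-band is reduced because the input is close to the noise level.
energy = attenuate_near_noise(target_energy=1.0, signal_power=1.5, noise_power=1.0)
scaled = weight_subband([0.5, -0.5], subband_gain(energy, filtered_energy=0.5))
```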

The at least one frequency-domain feature of the feature vector may comprise at least one of: a group of energy levels of the audio signal, wherein each energy level corresponds to the energy of an overlapping band of the audio signal; a value representing the spectral centroid of the audio signal in the frequency domain; and a value representing the degree of spectral flatness in the frequency domain.
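For illustration, two of the frequency-domain features named above could be computed from a power spectrum as follows. The definitions used (power-weighted mean frequency for the centroid, ratio of geometric to arithmetic mean for the flatness) are the common textbook ones and are assumed here; the function names and the `bin_hz` parameter are hypothetical.

```python
import math


def spectral_centroid(power, bin_hz):
    """Power-weighted mean frequency of a spectrum whose bin i lies at i * bin_hz."""
    total = sum(power)
    if total == 0.0:
        return 0.0
    return sum(i * bin_hz * p for i, p in enumerate(power)) / total


def spectral_flatness(power, eps=1e-12):
    """Geometric mean divided by arithmetic mean: near 1 for flat spectra, near 0 for peaky ones."""
    n = len(power)
    log_mean = sum(math.log(p + eps) for p in power) / n
    arith_mean = sum(power) / n
    return math.exp(log_mean) / (arith_mean + eps)


# Example: a noise-like (flat) spectrum and a tone-like (single peak) spectrum.
flat = [1.0, 1.0, 1.0, 1.0]
peaky = [0.0, 0.0, 4.0, 0.0]
features = [spectral_centroid(flat, 100.0), spectral_flatness(flat),
            spectral_centroid(peaky, 100.0), spectral_flatness(peaky)]
```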

The at least one time-domain feature of the feature vector may comprise at least one of: a gradient index based on the sum of the gradients at points in the audio signal at which the waveform changes direction; the ratio of the energy of a frame of the audio signal to the energy of the previous frame of the audio signal; and a classification of the audio signal as active or inactive by a voice activity detector.
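A minimal sketch of the first two time-domain features follows. The gradient index sums the gradient magnitudes at the samples where the waveform changes direction; the normalisation by the square root of the frame energy is an assumption borrowed from common practice and is not stated above. The function names are hypothetical.

```python
import math


def gradient_index(frame):
    """Sum of gradient magnitudes at direction changes, divided by sqrt(frame energy)."""
    energy = sum(x * x for x in frame)
    if energy == 0.0:
        return 0.0
    total = 0.0
    prev_sign = 0
    for k in range(1, len(frame)):
        grad = frame[k] - frame[k - 1]
        sign = (grad > 0) - (grad < 0)
        # Count the gradient only where the waveform turns around.
        if prev_sign != 0 and sign != 0 and sign != prev_sign:
            total += abs(grad)
        if sign != 0:
            prev_sign = sign
    return total / math.sqrt(energy)


def frame_energy_ratio(frame, prev_frame, eps=1e-12):
    """Energy of the current frame relative to the previous frame."""
    e_now = sum(x * x for x in frame)
    e_prev = sum(x * x for x in prev_frame)
    return e_now / (e_prev + eps)


# Example: a rapidly alternating frame gives a high index, a monotonic ramp gives zero.
gi_alternating = gradient_index([0.0, 1.0, 0.0, 1.0, 0.0])
gi_ramp = gradient_index([0.0, 1.0, 2.0, 3.0])
```

A high gradient index is typical of noise-like sounds such as sibilants, whereas voiced sounds with a smooth waveform yield low values.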

The method may also comprise combining the sub-band signal with the audio signal to produce a bandwidth-extended audio signal.

In accordance with a second aspect of some embodiments of the present invention, there is provided an apparatus comprising at least one processor and at least one memory storing computer code, the at least one memory and the computer code being configured, with the at least one processor, to cause the apparatus at least to perform: generating an excitation signal from an audio signal, wherein the audio signal comprises a plurality of frequency components; extracting a feature vector from the audio signal, wherein the feature vector comprises at least one frequency-domain feature and at least one time-domain feature; determining at least one spectral shape parameter from the feature vector, wherein the at least one spectral shape parameter corresponds to a sub-band signal comprising frequency components that belong to a further plurality of frequency components; and generating the sub-band signal by filtering the excitation signal through a filter bank and weighting the filtered excitation signal with the at least one spectral shape parameter.

In an embodiment of the apparatus, the at least one memory and the computer code configured, with the at least one processor, to cause the apparatus to perform generating the excitation signal may be further configured to perform: generating a residual signal by filtering the audio signal with an inverse linear prediction filter; filtering the residual signal with a post-filter stage comprising an autoregressive moving-average filter based on the linear prediction filter; and generating the excitation signal by upsampling and spectrally folding the output of the post-filter stage.

The post-filter stage may also comprise a spectral tilt filter and a harmonic filter.

The frequency components of the sub-band signal may be distributed according to a psychoacoustic scale divided into a plurality of overlapping bands, and the frequency characteristics of the filter bank may correspond to the distribution of the frequency components of the sub-band signal.

The overlapping bands may be distributed according to the mel scale, and the sub-band signal may be masked using a triangular masking function.

Alternatively, the overlapping bands may be distributed according to the mel scale, and the sub-band signal may be masked using a trapezoidal masking function.

The at least one memory and the computer code configured, with the at least one processor, to cause the apparatus to perform determining the at least one spectral shape parameter from the feature vector may be further configured to perform: using a neural network to determine the at least one spectral shape parameter from the feature vector, wherein the feature vector extracted from the audio signal forms an input target vector for the neural network, and the neural network is trained to provide the sub-band spectral shape parameter for the input target vector.

The spectral shape parameter may be a sub-band energy level value.

The spectral shape parameter may be a sub-band gain factor based on the sub-band energy level value.

The sub-band energy level value may be attenuated when the power of the audio signal approaches an estimate of the noise level in the audio signal.

The at least one frequency-domain feature of the feature vector may comprise at least one of: a group of energy levels of the audio signal, wherein each energy level corresponds to the energy of an overlapping band of the audio signal; a value representing the spectral centroid of the audio signal in the frequency domain; and a value representing the degree of spectral flatness in the frequency domain.

The at least one time-domain feature may comprise at least one of: a gradient index based on the sum of the gradients at points in the audio signal at which the waveform changes direction; the ratio of the energy of a frame of the audio signal to the energy of the previous frame of the audio signal; and a classification of the audio signal as active or inactive by a voice activity detector.

The at least one memory and the computer code may be further configured to perform combining the sub-band signal with the audio signal to produce a bandwidth-extended audio signal.

In accordance with a third aspect of some embodiments of the present invention, there is provided a computer program product in which program code is stored on a computer-readable medium, the program code performing the following operations when executed by a processor: generating an excitation signal from an audio signal, wherein the audio signal comprises a plurality of frequency components; extracting a feature vector from the audio signal, wherein the feature vector comprises at least one frequency-domain feature and at least one time-domain feature; determining at least one spectral shape parameter from the feature vector, wherein the at least one spectral shape parameter corresponds to a sub-band signal comprising frequency components that belong to a further plurality of frequency components; and generating the sub-band signal by filtering the excitation signal through a filter bank and weighting the filtered excitation signal with the at least one spectral shape parameter.

In an embodiment, where the program code of the computer program product, when executed by the processor, performs generating the excitation signal, the code may also perform: generating a residual signal by filtering the audio signal with an inverse linear prediction filter; filtering the residual signal with a post-filter stage comprising an autoregressive moving-average filter based on the linear prediction filter; and generating the excitation signal by upsampling and spectrally folding the output of the post-filter stage.

The post-filter stage may also comprise a spectral tilt filter and a harmonic filter.

The frequency components of the sub-band signal may be distributed according to a psychoacoustic scale divided into a plurality of overlapping bands, and the frequency characteristics of the filter bank may correspond to the distribution of the frequency components of the sub-band signal.

The overlapping bands may be distributed according to the mel scale, and the sub-band signal may be masked using a triangular masking function.

Alternatively, the overlapping bands may be distributed according to the mel scale, and the sub-band signal may be masked using a trapezoidal masking function.

The code which, when executed by the processor, performs determining the at least one spectral shape parameter from the feature vector may also perform: using a neural network to determine the at least one spectral shape parameter from the feature vector, wherein the feature vector extracted from the audio signal may form an input target vector for the neural network, and the neural network may be trained to provide the sub-band spectral shape parameter for the input target vector.

The spectral shape parameter may be a sub-band energy level value.

The spectral shape parameter may be a sub-band gain factor based on the sub-band energy level value.

The sub-band energy level value may be attenuated when the power of the audio signal approaches an estimate of the noise level in the audio signal.

The at least one frequency-domain feature of the feature vector may comprise at least one of: a group of energy levels of the audio signal, wherein each energy level corresponds to the energy of an overlapping band of the audio signal; a value representing the spectral centroid of the audio signal in the frequency domain; and a value representing the degree of spectral flatness in the frequency domain.

The at least one time-domain feature may comprise at least one of: a gradient index based on the sum of the gradients at points in the audio signal at which the waveform changes direction; the ratio of the energy of a frame of the audio signal to the energy of the previous frame of the audio signal; and a classification of the audio signal as active or inactive by a voice activity detector.

The code may also perform combining the sub-band signal with the audio signal to produce a bandwidth-extended audio signal.

In accordance with a fourth aspect of some embodiments of the present invention, there is provided an apparatus comprising: an excitation signal generator configured to generate an excitation signal from an audio signal, wherein the audio signal comprises a plurality of frequency components; a feature extractor configured to extract a feature vector from the audio signal, wherein the feature vector comprises at least one frequency-domain feature and at least one time-domain feature; a spectral parameter determiner configured to determine at least one spectral shape parameter from the feature vector, wherein the at least one spectral shape parameter corresponds to a sub-band signal comprising frequency components that belong to a further plurality of frequency components; and a filter bank configured to generate the sub-band signal by filtering the excitation signal and weighting the filtered excitation signal with the at least one spectral shape parameter.

The excitation signal generator may comprise: an inverse linear prediction filter configured to generate a residual signal by filtering the audio signal; a post-filter stage comprising an autoregressive moving-average filter configured to filter the residual signal, wherein the autoregressive moving-average filter depends on the inverse linear prediction filter; and an upsampler configured to generate the excitation signal by upsampling and spectrally folding the output of the post-filter stage.

The post-filter stage may also comprise a spectral tilt filter and a harmonic filter.

The frequency components of the sub-band signal may be distributed according to a psychoacoustic scale divided into a plurality of overlapping bands, and the frequency characteristics of the filter bank correspond to the distribution of the frequency components of the sub-band signal.

The overlapping bands may be distributed according to the mel scale, and the sub-band signal may be masked using a triangular and/or trapezoidal masking function.

The spectral parameter determiner may comprise a neural network configured to determine the at least one spectral shape parameter from the feature vector, wherein the feature vector extracted from the audio signal forms an input target vector for the neural network, and the neural network is trained to provide the sub-band spectral shape parameter for the input target vector.

The spectral shape parameter may be a sub-band energy level value.

The spectral shape parameter may be a sub-band gain factor based on the sub-band energy level value.

The filter bank may comprise an attenuator configured to attenuate the sub-band energy level when the power of the audio signal approaches an estimate of the noise level in the audio signal.

The at least one frequency-domain feature of the feature vector may comprise at least one of: a group of energy levels of the audio signal, wherein each energy level corresponds to the energy of an overlapping sub-band of the audio signal; a value representing the spectral centroid of the audio signal in the frequency domain; and a value representing the degree of spectral flatness in the frequency domain.

The at least one time-domain feature of the feature vector may comprise at least one of: a gradient index based on the sum of the gradients at points in the audio signal at which the waveform changes direction; the ratio of the energy of a frame of the audio signal to the energy of the previous frame of the audio signal; and a classification of the audio signal as active or inactive by a voice activity detector.

The apparatus may also comprise a signal combiner configured to combine the sub-band signal with the audio signal to form a bandwidth-extended audio signal.

An electronic device may comprise the apparatus described above.

A chipset may comprise the apparatus described above.

BRIEF DESCRIPTION OF THE DRAWINGS

For a better understanding of the present invention, reference is made below, by way of example, to the accompanying drawings, in which:

Fig.1 schematically shows an electronic device that uses embodiments of the present invention;

Fig.2 schematically shows a system decoder that uses embodiments of the present invention;

Fig.3 schematically shows a decoder implementing a first embodiment of the present invention;

Fig.4 schematically shows a bandwidth extension unit in accordance with some embodiments of the present invention;

Fig.5 illustrates the advantage of applying critical bands and auditory masking to the input audio signal of the bandwidth extension device in order to simplify feature extraction;

Fig.6 illustrates the advantage of using critical bands to simplify generation of the artificially bandwidth-extended signal;

Fig.7 illustrates the advantage of using a filter bank whose sub-bands are defined by the critical bands;

Fig.8 shows a flow chart illustrating the operation of the bandwidth extension device in accordance with some embodiments of the present invention;

Fig.9 shows a flow chart illustrating in more detail some of the operations performed by the bandwidth extension device in the embodiment shown in Fig.4; and

Fig.10 shows a flow chart illustrating in more detail further operations performed by the device in the embodiment shown in Fig.4.

SOME EMBODIMENTS OF THE INVENTION

The following describes in more detail possible mechanisms for artificially extending the bandwidth of a decoded audio signal. First, with reference to Fig.1, a block diagram is shown of an exemplary electronic device 10 that may incorporate a codec according to an embodiment of the present invention.

The electronic device 10 may be, for example, a mobile terminal or user equipment of a wireless communication system. In other embodiments of the present invention the device 10 may be any suitable audio component or audio subsystem of an electronic device, such as an audio player (also known as an MP3 player) or a media player (also known as an MP4 player).

The electronic device 10 comprises a microphone 11, which is connected via an analogue-to-digital converter (ADC) 14 to a processor 21. The processor 21 is in turn connected via a digital-to-analogue converter (DAC) 32 to a loudspeaker (or loudspeakers) 33. The processor 21 is also connected to a transceiver (RX/TX) 13, a user interface (UI) 15, and a memory 22.

The processor 21 may be configured to execute various program codes. The executable program codes 23 may include code for decoding an audio or speech signal. The executable program codes 23 may be stored, for example, in the memory 22, from which the processor 21 reads them when needed. The memory 22 may also provide a section 24 for storing data, for example data encoded in accordance with embodiments of the present invention.

The decoding code in embodiments of the present invention may be implemented in hardware or firmware.

The user interface 15 enables a user to input commands to the electronic device 10, for example via a keypad, and/or to obtain information from the electronic device 10, for example via a display. The transceiver 13 enables communication with other electronic devices, for example via a wireless communication network.

It should be understood that the structure of the electronic device 10 may be modified in various ways.

A user of the electronic device 10 may use the microphone 11 to input a speech signal that is to be transmitted to another electronic device or stored in the data section 24 of the memory 22. To this end, the user activates a corresponding application via the user interface 15. This application, which may be run by the processor 21, causes the processor 21 to execute code stored in the memory 22.

The analogue-to-digital converter 14 converts the input analogue audio signal into a digital signal and supplies it to the processor 21.

The electronic device 10 may receive, via the transceiver 13, a bit stream containing correspondingly encoded data from another electronic device. Alternatively, the encoded data may be stored in the data section 24 of the memory 22, for example for later presentation by the same electronic device 10. In both cases the processor 21 may execute the decoding program code stored in the memory 22. The processor 21 decodes the received data, for example in the manner described with reference to Fig.3 and 4, and supplies the decoded data to the digital-to-analogue converter 32. The digital-to-analogue converter 32 converts the digital decoded data into analogue audio data and outputs it through the loudspeakers 33. Execution of the decoding program code may be triggered by an application that the user invokes via the user interface 15.

Instead of being output immediately via the loudspeakers 33, the received encoded data may also be stored in the data section 24 of the memory 22, for example for later presentation or for forwarding to another electronic device.

It should be appreciated that the schematic structures shown in Fig.3 and 4 and the method steps shown in Fig.8, 9 and 10 represent only part of the operation of the bandwidth extension device as implemented, by way of example, in the electronic device shown in Fig.1.

The general operation of speech and audio codecs is known in the art, so features of such codecs that do not help in understanding embodiments of the present invention are not described in detail.

The following describes embodiments of the present invention with reference to Fig.2-10.

The general operation of the speech and audio decoder according to embodiments of the present invention is shown in Fig.2. Fig.2 schematically shows a general decoding system 102. The system 102 may comprise a storage or media channel 106 (also called a communication channel) and a decoder 108.

The decoder 108 decompresses the bit stream 112 and produces an output audio signal 114. The bit rate of the bit stream 112 and the quality of the output audio signal 114 relative to the input signal 110 are the main characteristics that define the performance of the coding system 102.

Fig.3 schematically shows the decoder 108 in accordance with some embodiments of the present invention. The decoder 108 comprises an input 302 at which the encoded stream 112 may be received via the media channel 106. In some embodiments the input 302 is connected to an audio decoder 301. In such embodiments the audio decoder 301 is configured to receive encoded data from the media or communication channel, so that the received data can be stored and retrieved. The audio decoder 301 is also configured to decode the encoded data received from the media channel 106 to form an output stream 304 of audio samples. The output audio stream of the audio decoder 301 may be fed to the input of an artificial bandwidth extension device 303. In some embodiments of the present invention the bandwidth extension device 303 may be configured to extend the bandwidth of the incoming audio stream 304 to form a bandwidth-extended output audio signal 306.

In some embodiments of the present invention, the bandwidth-extended audio signal 306 may form the output audio signal 114 of the decoder 108.

It should be appreciated that the audio decoder 301 may be specifically adapted to decode the encoded data supplied from the input 302. In other words, the audio decoding technology applied by the audio decoder 301 may be determined by the audio coding technology used to produce the encoded data.

It should also be appreciated that in some embodiments of the present invention the audio decoder 301 may be configured to decode either audio or speech encoded data.

For example, in some embodiments of the present invention the audio decoder 301 may be configured to decode a speech signal encoded in accordance with the Adaptive Multi-Rate (AMR) speech coding standard.

A detailed description of the AMR codec can be found, for example, in the 3GPP technical specification TS 26.090.

The bandwidth extension device 303 for the audio signal in accordance with some embodiments of the present invention is described further below with reference to Fig.4.

The artificial bandwidth extension device 303 has an input 401, which may be configured to receive the output stream 304 of audio samples from the audio decoder 301.

It should be appreciated that the decoded audio stream entering the bandwidth extension device 303 can be regarded as a lower-band signal. In some embodiments of the present invention the bandwidth extension device 303 may then analyse the lower-band signal to identify particular features. In such embodiments the identified features can then be used to generate an upper-band audio signal, which can in turn be combined with the lower-band audio signal to form the bandwidth-extended audio signal 306.

It should also be appreciated that in embodiments of the present invention the upper band of the bandwidth-extended audio signal may be formed without any additional side information from the encoder.

In some embodiments of the present invention the lower-band input signal may have a telephone bandwidth of 300 to 3400 Hz and a sampling frequency of 8 kHz. In these embodiments the bandwidth extension device 303 may convert the input audio signal into a wideband audio signal with a sampling frequency of 16 kHz and a frequency range that may exceed that of the input signal.

It should be appreciated that the term "upper band" here may denote the extended frequency components generated by the bandwidth extension device 303.

In order to better understand the invention, hereinafter described in more detail the operation of the device 303 to expand the frequency band with reference to the algorithm shown in Fig.8.

In some embodiments, the bandwidth extension device 303 comprises a frame collector 403.

In some embodiments, the input 401 is connected to the frame collector 403, by which the input audio signal (also called a stream of audio signal samples) is divided and grouped into a continuous sequence of audio frames.

In some embodiments of the present invention, the number of audio signal samples grouped into a frame may depend on the sampling rate of the input audio signal.

For example, in some embodiments of the present invention, the sampling rate of the input audio signal 304 may be 8 kHz. In such embodiments, the frame collector 403 may be configured to divide the input audio signal into multiple audio frames, each of which spans a time interval of 12 ms. In other words, in this embodiment each audio frame contains 96 samples of the audio signal at a sampling rate of 8 kHz.

In addition, in some embodiments of the present invention the frame collector 403 can be configured to use overlapping frames, so that the frame update interval is shorter than the length of an audio frame.

For example, in some embodiments of the present invention the audio frame may be updated by the frame collector 403 every 10 ms (80 samples), so that consecutive frames overlap by 16 samples.
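By way of a non-limiting illustration, the framing described above (96-sample frames updated every 80 samples at 8 kHz) could be sketched as follows. The function name `split_into_frames` and the handling of the trailing samples that do not fill a whole frame are assumptions made for the example and are not taken from the described embodiments.

```python
def split_into_frames(samples, frame_len=96, hop=80):
    """Split a sample stream into overlapping frames.

    frame_len=96 and hop=80 correspond to 12 ms frames updated every
    10 ms at 8 kHz, so consecutive frames share 96 - 80 = 16 samples.
    Trailing samples that do not fill a whole frame are dropped
    (an assumption of this sketch).
    """
    frames = []
    start = 0
    while start + frame_len <= len(samples):
        frames.append(samples[start:start + frame_len])
        start += hop
    return frames


# Usage: one second of dummy samples at 8 kHz.
stream = list(range(8000))
frames = split_into_frames(stream)
overlap = frames[0][-16:]  # the last 16 samples of one frame ...
assert overlap == frames[1][:16]  # ... are the first 16 of the next
```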

It should be understood that in some embodiments the frame collector 403 may operate at other sampling rates and with other frame sizes, and that the operation of the bandwidth extension device 303 is not limited to the examples given in the description of some embodiments of the present invention.

The step of grouping the input audio signal samples into the audio frame 404, performed by the frame collector 403, is shown in Fig.8 as processing step 801.

In some embodiments, the artificial bandwidth extension device 303 comprises a time-to-frequency domain converter 405.

In some embodiments of the present invention, the output signal of the frame collector 403 may be passed to the time-to-frequency domain converter 405, where the time-domain audio frame 404 may undergo an orthogonal transform on a frame-by-frame basis.

In some embodiments of the present invention, the orthogonal transform can be implemented as a fast Fourier transform (FFT), so that the time-domain audio frame 404 consisting of 96 samples is converted into the frequency domain using a 128-point FFT. In these embodiments, the 128-point FFT can be applied by padding the audio frame 404 with additional zero-valued samples.
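The zero-padding step can be illustrated with the short sketch below. It pads a 96-sample frame to 128 points and computes the power spectrum of the non-negative frequency bins. For the sake of a dependency-free example a direct discrete Fourier transform is used in place of an FFT; it yields the same coefficients, only more slowly. The function names are hypothetical.

```python
import cmath
import math


def zero_pad(frame, size=128):
    """Append zero-valued samples so that the frame has `size` points."""
    return list(frame) + [0.0] * (size - len(frame))


def dft(x):
    """Direct DFT (same result as an FFT, O(n^2) instead of O(n log n))."""
    n = len(x)
    return [
        sum(x[k] * cmath.exp(-2j * cmath.pi * i * k / n) for k in range(n))
        for i in range(n)
    ]


def power_spectrum(frame, size=128):
    """Squared magnitudes of bins 0 .. size/2 of the zero-padded frame."""
    spectrum = dft(zero_pad(frame, size))
    return [abs(c) ** 2 for c in spectrum[: size // 2 + 1]]


# Usage: a 1 kHz tone sampled at 8 kHz; bins are 8000 / 128 = 62.5 Hz apart.
frame = [math.sin(2 * math.pi * 1000 * n / 8000) for n in range(96)]
power = power_spectrum(frame)
peak_bin = max(range(len(power)), key=lambda i: power[i])
peak_hz = peak_bin * 8000 / 128
```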

It should be understood that in some embodiments of the present invention converting the audio frame 404 into frequency coefficients simplifies feature extraction in the frequency domain.

It should also be understood that in some embodiments of the present invention the frequency coefficients generated for the audio frame 404 can be regarded as a frequency-domain representation of the lower-band audio signal.

The step of converting the audio frame 404 into a frequency-domain representation containing the frequency coefficients is shown in Fig.8 as processing step 803.

In some embodiments, the artificial bandwidth extension device 303 comprises a feature extraction unit 407.

In these embodiments, the frequency-domain coefficients of the audio frame 404 can be passed to the input of the feature extraction unit 407.

In some embodiments, the feature extraction unit 407 can also be configured to receive an additional input from the frame collector 403. This additional input can be used to pass the audio frame 404 directly from the frame collector 403 to the feature extraction unit 407, bypassing the time-to-frequency domain converter 405.

As shown in Fig.4, in these embodiments the time-domain audio frame 404 may be passed over the connection 440 between the frame collector 403 and the feature extraction unit 407.

In some embodiments, the feature extraction unit 407 may be used to extract features both from the audio frame and from the audio frame converted into the frequency domain. In some embodiments, the features extracted by the feature extraction unit 407 may be used to generate the extended frequency region of the audio frame.

It should be understood that in this context the extended frequency region of the audio frame may be the upper-band signal.

It should also be understood that in this context the audio frame converted into the frequency domain may, in some embodiments of the present invention, be called the frequency-domain signal.

In some embodiments of the present invention, a nine-dimensional feature vector containing both frequency-domain and time-domain components can be extracted for each frame of the input audio signal.

In some other embodiments of the present invention, a feature vector with ten or some other number of dimensions, containing both frequency-domain and time-domain components, can be extracted for each frame.

In some embodiments of the present invention, a first set of frequency-domain feature components can be obtained by dividing the frequency-domain signal into a number of overlapping sub-bands and then determining the energy of each sub-band. In such embodiments, the energy value of each sub-band may then form one frequency-domain component of the feature vector.

In some embodiments of the present invention, the energy of each sub-band can be determined by squaring the amplitude of each frequency-domain coefficient within the sub-band. In other words, in these embodiments the frequency-domain features can be extracted, at least in part, by determining the power spectral density of the frequency coefficients of the input signal.

In some embodiments of the present invention, the frequency-domain signal can be divided into a number of overlapping sub-bands that have equal bandwidth on the psychoacoustically derived mel scale.

For example, in some embodiments of the present invention in which the input audio signal is supplied to the bandwidth extension device 303 at a sampling rate of 8 kHz, the lower-band audio signal may correspond to an effective frequency range from 250 to 3500 Hz. In these embodiments, the frequency-domain signal can be divided into five sub-bands that have equal bandwidth on the psychoacoustically derived mel scale.

In some embodiments of the present invention, the conversion of frequency components expressed in Hz into mel-scale units can be written as follows:

$$m = 2595 \log_{10}\left(1 + \frac{f}{700}\right),$$

where f is the frequency in Hz and m is the corresponding value on the mel scale.
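As an illustration only, the conversion above and its inverse can be written as the following sketch, together with one possible way of placing five sub-band centers uniformly on the mel scale between 250 and 3500 Hz. The helper `mel_band_centers` and the choice of placing the centers on n + 2 equally spaced mel points (so that neighboring bands overlap by half) are assumptions of the example rather than details taken from the described embodiments.

```python
import math


def hz_to_mel(f_hz):
    """m = 2595 * log10(1 + f / 700)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_band_centers(f_low, f_high, n_bands):
    """Center frequencies (Hz) of n_bands bands equally spaced in mel.

    Assumption: the outer band edges sit at f_low and f_high and the
    centers are the interior points of n_bands + 2 equally spaced
    mel values.
    """
    m_low, m_high = hz_to_mel(f_low), hz_to_mel(f_high)
    step = (m_high - m_low) / (n_bands + 1)
    return [mel_to_hz(m_low + step * (b + 1)) for b in range(n_bands)]


# Usage: five bands covering the effective range 250 .. 3500 Hz.
centers = mel_band_centers(250.0, 3500.0, 5)
```

Because the mel scale is compressive, the spacing between successive centers grows with frequency when expressed in Hz.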

In these embodiments of the present invention, each of the overlapping sub-bands of equal (mel-scale) width may be filtered by a band filter with a triangular passband. In other words, a triangular mask can be applied to the frequency-domain components of each sub-band in order to obtain the energy value of the sub-band.

In some embodiments of the present invention, the use of triangular masks makes it possible to model the frequency masking characteristics of human hearing within a single critical band.

In other embodiments of the present invention, each of the overlapping sub-bands of equal width may be filtered by a band filter with a trapezoidal passband.

It should be understood that in some embodiments of the present invention the masking filters with a triangular or trapezoidal passband can be implemented so that the filtering covers a range wider than a critical band of the human auditory system.

It should be noted that in some embodiments of the present invention a filter can be applied in turn to each sub-band in the frequency domain, thereby modeling the frequency resolution of the human auditory system across the entire range of the input audio signal. This advantage is illustrated in Fig.5, which shows filters with a triangular frequency response applied to the components of the frequency-domain signal.

Fig.5 also shows that in some embodiments of the present invention the auditory filters at low frequencies may have a narrower bandwidth than the auditory filters at higher frequencies. It can also be seen that in some embodiments the bandwidth of each successive auditory filter increases in accordance with the mel scale.

It should be understood that in some embodiments of the present invention the power spectral density values for the frame of the input audio signal may be filtered using sub-band filters laid out on the mel scale. In other words, the power spectral density values can be filtered using a sequence of perceptually motivated sub-band filters, as shown in Fig.5.

It should also be understood that in some embodiments of the present invention an advantage of the filtering step described above is that the power spectral density representation of the input audio frame is divided into a number of sub-bands that are uniformly spaced on the mel scale.

After the frame of the input audio signal has been filtered into a number of sub-bands, the energy of each sub-band may, in accordance with these embodiments of the present invention, be determined by summing the filtered power spectral density values within the sub-band.

In general, it should be noted that in some embodiments of the present invention the energy level of a sub-band may be determined by first computing the frequency spectrum of the signal, from which the power spectrum is obtained by squaring the spectral amplitudes. Then, for each sub-band, the spectral power components belonging to that sub-band can be weighted (or shaped) using an auditory filter, such as the triangular window mentioned above. The energy level of each sub-band is then obtained as the sum of the weighted spectral power components in that sub-band.
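The general procedure just described (power spectrum, triangular weighting, summation per sub-band) might be sketched as follows. The exact placement of the triangle edges is not specified in the text; the sketch assumes that each triangle spans from the center of the previous band to the center of the next band on a uniform mel grid between 250 and 3500 Hz. All function names and default values are hypothetical.

```python
import math


def hz_to_mel(f_hz):
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def triangular_weights(freqs_hz, f_left, f_center, f_right):
    """Weights rising linearly from f_left to f_center, then falling to f_right."""
    weights = []
    for f in freqs_hz:
        if f_left < f <= f_center:
            weights.append((f - f_left) / (f_center - f_left))
        elif f_center < f < f_right:
            weights.append((f_right - f) / (f_right - f_center))
        else:
            weights.append(0.0)
    return weights


def subband_energies(power, sample_rate=8000.0, f_low=250.0,
                     f_high=3500.0, n_bands=5):
    """Sum of triangularly weighted power values for each mel sub-band.

    `power` holds N/2 + 1 power-spectrum values for bins 0 .. N/2.
    """
    n_bins = len(power)
    freqs = [i * (sample_rate / 2.0) / (n_bins - 1) for i in range(n_bins)]
    m_low, m_high = hz_to_mel(f_low), hz_to_mel(f_high)
    step = (m_high - m_low) / (n_bands + 1)
    # Assumed layout: n_bands + 2 equally spaced mel points as band edges.
    edges = [mel_to_hz(m_low + step * k) for k in range(n_bands + 2)]
    energies = []
    for b in range(n_bands):
        w = triangular_weights(freqs, edges[b], edges[b + 1], edges[b + 2])
        energies.append(sum(wi * p for wi, p in zip(w, power)))
    return energies


# Usage: all power concentrated in the 1 kHz bin of a 65-bin spectrum.
power = [0.0] * 65
power[16] = 1.0  # bin 16 * 62.5 Hz = 1000 Hz
energies = subband_energies(power)
```

With this layout, a single spectral component is shared between at most two neighboring sub-bands, and the two weights add up to one.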

In some embodiments of the present invention, five sub-band energy values can be used, each corresponding to one of five sub-bands. However, it should be understood that in some other embodiments of the present invention more or fewer sub-band energy values can be determined.

It should be understood that the sub-band energy values provide a compact description of the spectral shape and power level of the audio frame 404.

In addition, it should be understood that in some embodiments of the present invention the sub-band energies corresponding to the first five sub-bands can form the first five components of the feature vector obtained for each audio frame.

In some embodiments of the present invention, the sub-band energies corresponding to the first five sub-bands may be converted to a decibel scale.

In some embodiments of the present invention, the feature extraction unit 407 may also extract additional frequency-domain features from the frequency-domain signal. These additional frequency-domain features can be based on the centroid, also called the "center of gravity", of the frequency spectrum of the signal.

In some embodiments of the present invention, the centroid of the frequency spectrum of the signal can be determined using the squared amplitude of the frequency spectrum calculated by the time-to-frequency domain converter 405.

In accordance with some embodiments of the present invention, the centroid of the frequency spectrum of a signal consisting of N samples may be determined as follows:

$$C = \frac{\sum_{i=0}^{N/2} f(i)\,P(i)}{\left(\frac{N}{2} + 1\right) \sum_{i=0}^{N/2} P(i)},$$

where i is the index of a frequency component within the lower frequency band of the audio signal, P(i) denotes the squared amplitude of frequency component i, and f(i) denotes the frequency corresponding to index i.
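A minimal sketch of the centroid computation is given below, under the assumption that the power spectrum is supplied as a list of N/2 + 1 values and that f(i) is supplied as a parallel list. The function name and the return value for an all-zero spectrum are hypothetical choices.

```python
def spectral_centroid(power, freqs):
    """C = sum(f(i) * P(i)) / ((N/2 + 1) * sum(P(i))) over bins 0 .. N/2.

    `power` and `freqs` each hold N/2 + 1 values.  An all-zero spectrum
    returns 0.0 (an assumption of this sketch).
    """
    n_bins = len(power)  # equals N/2 + 1
    total = sum(power)
    if total == 0.0:
        return 0.0
    weighted = sum(f * p for f, p in zip(freqs, power))
    return weighted / (n_bins * total)


# Usage: 65 bins spaced 62.5 Hz apart, all power in the 1 kHz bin.
freqs = [62.5 * i for i in range(65)]
power = [0.0] * 65
power[16] = 1.0
c = spectral_centroid(power, freqs)  # 1000 / 65
```

Because of the factor N/2 + 1 in the denominator, the result is a scaled centroid; if f(i) is taken to be the bin index i, the value always lies between 0 and 1.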

It should be understood that in some embodiments of the present invention the centroid of the frequency spectrum of the signal can form the sixth component of the extracted feature vector.

In some embodiments of the present invention, a seventh, frequency-domain feature can be obtained by determining the spectral flatness of the frame of the input audio signal. This feature can be used to indicate the tonality of the frame of the input audio signal.

In these embodiments, the spectral flatness of the signal can be obtained as the ratio of the geometric mean to the arithmetic mean of the power spectrum of the signal.

In some embodiments of the present invention, the spectral flatness measure can be calculated according to the following formula:

$$x_{sf} = \log_{10}\left(\frac{\sqrt[N_{sf}]{\prod_{i=N_l}^{N_h} P(i)}}{\frac{1}{N_{sf}} \sum_{i=N_l}^{N_h} P(i)}\right),$$

where P(i) denotes the value of the power spectrum at frequency index i, N_l and N_h denote the indices of the first and last frequency components over which the spectral flatness measure is determined, and N_sf denotes the number of components in this range.
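The spectral flatness measure can be illustrated with the sketch below, which evaluates the logarithm of the ratio of the geometric mean to the arithmetic mean. The geometric mean is computed in the log domain to avoid numerical underflow of the product, and a small constant `eps` is added to each power value to guard against log(0); both are implementation choices of the example, not features of the described embodiments.

```python
import math


def spectral_flatness(power, n_low, n_high, eps=1e-12):
    """log10(geometric mean / arithmetic mean) of P(n_low) .. P(n_high)."""
    band = [p + eps for p in power[n_low:n_high + 1]]
    n_sf = len(band)
    # log10 of the geometric mean equals the mean of the log10 values.
    log_geo_mean = sum(math.log10(p) for p in band) / n_sf
    arith_mean = sum(band) / n_sf
    return log_geo_mean - math.log10(arith_mean)


# Usage: a flat (noise-like) spectrum and a peaky (tonal) spectrum.
flat = spectral_flatness([1.0] * 65, 5, 54)
peaky_power = [1e-6] * 65
peaky_power[16] = 1.0
peaky = spectral_flatness(peaky_power, 5, 54)
```

A flat spectrum gives a value close to 0, while a spectrum dominated by a few components gives a clearly negative value, which is why the measure can indicate tonality.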

In some embodiments of the present invention, the spectral flatness can be measured over the frequency range from 300 Hz to 3.4 kHz.

As indicated above, in some embodiments the feature extraction unit 407 may also extract time-domain features from the audio frame 404 by processing the time-domain signal passed over the connection 440.

In some embodiments of the present invention, the first time-domain feature extracted by the feature extraction unit 407 can be a gradient index based on the sum of the gradient magnitudes of the speech signal in the time domain.

It should be understood that in such embodiments the gradient may be determined at any point of the speech signal waveform. However, in these embodiments the gradient index can be determined for those points of the speech signal waveform at which the sign of the gradient changes. In other words, in some embodiments of the present invention the gradient index can be based on the sum of the gradient magnitudes at those points of the speech signal at which the signal changes direction.

In some embodiments of the present invention, the gradient index x_gi can be determined as follows:

$$x_{gi} = \frac{\sum_{n=1}^{N_T-1} \Delta\Psi(n)\,\left|s(n) - s(n-1)\right|}{\sqrt{\sum_{n=0}^{N_T-1} \left(s(n)\right)^2}},$$

where s(n) denotes the sample of the speech signal at time n, N_T denotes the number of speech samples in the audio frame 404, and ΔΨ(n) indicates a change in the sign of the gradient at time n, which can be determined as follows:

$$\Delta\Psi(n) = \frac{1}{2}\left|\Psi(n) - \Psi(n-1)\right|,$$

where Ψ(n) denotes the sign of the gradient s(n)-s(n-1) and can be calculated by the following formula:

$$\Psi(n) = \frac{s(n) - s(n-1)}{\left|s(n) - s(n-1)\right|}.$$
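For illustration, the gradient index might be computed as in the sketch below. Several details are assumptions of the example: the denominator is taken as the square root of the frame energy, as in the usual definition of the gradient index; the sign Ψ is set to 0 where two consecutive samples are equal; and ΔΨ(1) is set to 0, because Ψ(0) would require a sample before the start of the frame.

```python
import math


def sign(x):
    """Sign of x: +1.0, -1.0, or 0.0 for x == 0 (assumption of this sketch)."""
    if x > 0:
        return 1.0
    if x < 0:
        return -1.0
    return 0.0


def gradient_index(s):
    """Sum of gradient magnitudes at direction changes, normalized by
    the square root of the frame energy."""
    energy = sum(x * x for x in s)
    if energy == 0.0:
        return 0.0  # silent frame (assumption of this sketch)
    total = 0.0
    for n in range(1, len(s)):
        psi_n = sign(s[n] - s[n - 1])
        # Psi(0) is undefined within the frame, so no change is counted at n = 1.
        psi_prev = sign(s[n - 1] - s[n - 2]) if n >= 2 else psi_n
        delta_psi = 0.5 * abs(psi_n - psi_prev)
        total += delta_psi * abs(s[n] - s[n - 1])
    return total / math.sqrt(energy)


# Usage: a smooth ramp has no direction changes; an alternating
# sequence changes direction at every sample.
ramp = gradient_index([1.0, 2.0, 3.0, 4.0, 5.0])
alternating = gradient_index([1.0, -1.0, 1.0, -1.0])
```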

In some embodiments of the present invention, the gradient index x_gi takes low values during voiced sounds and high values during unvoiced sounds.

In some embodiments of the present invention, a second time-domain feature can also be extracted, which may depend on the ratio of the energies of audio frames.

In these embodiments, the feature may be determined by calculating the ratio of the energy of the current audio frame 404 to the energy of the previous audio frame. In some embodiments of the present invention, the resulting value can then be converted to a decibel scale.
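The energy-ratio feature can be sketched as follows. The small constant `eps`, which avoids division by zero and log(0) for silent frames, and the function names are assumptions of the example.

```python
import math


def frame_energy(frame):
    """Sum of squared samples of one frame."""
    return sum(x * x for x in frame)


def energy_ratio_db(current, previous, eps=1e-12):
    """Energy of the current frame relative to the previous frame, in dB."""
    ratio = (frame_energy(current) + eps) / (frame_energy(previous) + eps)
    return 10.0 * math.log10(ratio)


# Usage: doubling the amplitude quadruples the energy (about +6 dB).
previous = [1.0] * 96
current = [2.0] * 96
ratio_db = energy_ratio_db(current, previous)
```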

Some embodiments of the present invention allow the above feature to be used to distinguish unvoiced stop consonants from other unvoiced speech sounds.

In some embodiments of the present invention, a third time-domain feature can be obtained for the audio frame by determining whether the signal is in an active or an inactive region.

In these embodiments, the audio frame 404 may be processed by a voice activity detector (VAD) to classify the signal as active or inactive.

In some embodiments of the present invention, the VAD may be implemented by first converting the time-domain signal (in other words, the audio frame 404) into the frequency domain by means of a suitable orthogonal transform, such as an FFT. After the input signal of the VAD has been converted into the frequency domain, the signal can be grouped into a number of sub-bands. Typically, in some embodiments of the present invention, the grouping can be performed on a non-linear scale, in which a larger number of frequency components is assigned to the lower sub-bands, which are perceptually more important. The signal-to-noise ratio (SNR) for each sub-band can then be calculated from the signal energy and the background noise energy within each sub-band. The VAD decision can then be made by comparing the sum of the sub-band SNR values with an adaptive threshold value.

Typically, in some embodiments of the present invention, the noise energy of each sub-band can be adapted during noise input frames using an autoregressive scheme.

In some embodiments of the present invention, various techniques can be used to prevent incorrect VAD decisions. For example, in some embodiments a "hangover period" may be applied, during which the VAD decision to switch from the active to the inactive state is delayed in order to avoid an incorrect decision when the signal has unvoiced characteristics. In some embodiments of the present invention, other techniques may include measuring the variation between frames in order to raise the SNR threshold of the VAD decision for signals with a high level of fluctuation.
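Purely as a toy illustration of the steps outlined above (sum of sub-band SNR values compared with a threshold, first-order recursive noise adaptation, and a hangover period), a detector could be sketched as follows. It is not the detector of any standard. The class name, every parameter value, and the use of a fixed rather than adaptive threshold are simplifying assumptions of the example.

```python
class SimpleVad:
    """Toy sub-band SNR voice activity detector (illustrative only)."""

    def __init__(self, n_bands, threshold=6.0, noise_alpha=0.9,
                 hangover_frames=5, initial_noise=1.0):
        self.threshold = threshold          # fixed here; adaptive in the text
        self.noise_alpha = noise_alpha      # smoothing factor of noise update
        self.hangover_frames = hangover_frames
        self.noise = [initial_noise] * n_bands
        self.hangover_left = 0

    def process(self, band_energies):
        """Return True if the frame is classified as active."""
        eps = 1e-12
        snr_sum = sum(e / (n + eps)
                      for e, n in zip(band_energies, self.noise))
        if snr_sum > self.threshold:
            # Active frame: re-arm the hangover counter.
            self.hangover_left = self.hangover_frames
            return True
        # Frame judged to be noise: adapt the per-band noise estimate
        # with a first-order recursive (autoregressive) update.
        a = self.noise_alpha
        self.noise = [a * n + (1.0 - a) * e
                      for n, e in zip(self.noise, band_energies)]
        if self.hangover_left > 0:
            # Hangover: keep reporting "active" for a few more frames.
            self.hangover_left -= 1
            return True
        return False


# Usage: one loud frame surrounded by noise-level frames.
vad = SimpleVad(n_bands=2, threshold=6.0, hangover_frames=2)
inputs = [[1.0, 1.0], [10.0, 10.0], [1.0, 1.0], [1.0, 1.0], [1.0, 1.0]]
decisions = [vad.process(e) for e in inputs]
```

In the usage example the two frames following the loud frame are still reported as active because of the hangover period.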

In some embodiments of the present invention, voice activity detection methods can be used such as those defined by the 3rd Generation Partnership Project (3GPP) in 3GPP TS 26.090 for the Adaptive Multi-Rate (AMR) speech codec.

It should be understood that in some embodiments of the present invention the three time-domain features described above can form additional components extracted by the feature extraction unit 407. In other words, in some embodiments the gradient index, the energy ratio and the binary VAD output may form three additional components of the feature vector generated by the feature extraction unit 407.

It should also be understood that in some embodiments of the present invention the feature vector determined by the feature extraction unit 407 can be determined on a frame-by-frame basis from the input audio signal 401.

The step of processing the audio frame 404 in both the time domain and the frequency domain in order to extract the feature vector is shown in Fig.8 as processing step 805.

In some embodiments, the artificial bandwidth extension device 303 comprises a neural network processor 409.

In some embodiments of the present invention, the feature vector determined by the feature extraction unit 407 is passed to the neural network processor 409.

In some embodiments, the neural network processor 409 may be used to generate, in part, the spectral shape of the artificially generated upper-band signal 431.

In some embodiments of the present invention, the neural network processor 409 may include a neural network that can be trained on a variety of data so that it performs well under various conditions, for example with various noise types and levels and with different languages.

In some embodiments of the present invention, a neuroevolution method based on genetic algorithms can be applied to evolve the neural network. Such evolved neural networks can be recurrent; in other words, they can accumulate and use historical information about the evolution of the process, so that the parameters of these networks are not limited to the features of the input vector supplied by the feature extraction unit 407.

In some embodiments of the present invention, a neuroevolution method based on augmenting topologies of neural networks can be used. The algorithm of this method usually begins with a minimal network topology, which can then grow incrementally through the addition of nodes and connections together with modification of the weights associated with the network nodes.

Typically, in some embodiments of the present invention, a neural network based on neuroevolution of augmenting topologies (NEAT) may start to evolve as a feed-forward perceptron network containing only input and output neurons. As evolution proceeds in discrete steps, the complexity of the network topology can be increased either by inserting a new neuron into a connection path or by creating a new connection between (previously unconnected) neurons.

In some embodiments of the present invention, the NEAT neural network can be trained offline using a training database comprising a plurality of audio signal samples.

In some other embodiments of the present invention, the classification and pattern recognition operation can be performed by any device or recognition algorithm, for example using artificial neural networks, self-organizing maps or self-organizing feature maps, Bayesian networks, and so on.

In some embodiments, the audio signal samples of the training database may first be high-pass filtered to simulate the input frequency response of a mobile station. In some embodiments of the present invention, the filtering can be performed by a mobile station input (MSIN) filter as defined in the standard G.191 of the International Telecommunication Union (ITU).

In some embodiments of the present invention, the feature vectors for each audio signal sample in the training database may be extracted as described above, for use in training the NEAT neural network.

In addition, in some embodiments of the present invention a set of target outputs can be generated for the neural network, with one target output of the neural network corresponding to each audio signal sample contained in the training database. These target outputs can then be used to determine the operating parameters of the neural network in the training phase. In other words, the output of the neural network for each audio signal sample of the training database can be compared with the corresponding target output in order to determine the operating parameters of the neural network.

In some embodiments of the present invention, the target output of the neural network can be generated by determining the parameters associated with the spectral shape of the artificially generated upper-band signal for each respective audio signal sample of the training database.

In addition, it should be understood that training the neural network described above may require a target output to be generated for each sample of the training database, and that each training audio signal sample may contain a wideband audio signal.

In some embodiments of the present invention, the target output associated with each training audio signal sample may be generated by first determining the upper-band component of each wideband training audio signal sample and then generating the spectral shape associated with each determined upper-band component.

It should be understood that in some embodiments of the present invention each set of spectral shape parameters can form a target output of the neural network, and that in these embodiments each target output may be associated with a particular training audio signal sample contained in the training database.

In accordance with some embodiments of the present invention, the training process of the above neural network can be performed as follows: each wideband training sample can be divided into several frames, the length of each of which is determined by the working frame length of the bandwidth extension device 303; the upper-band component of each frame can then be determined; and next, for each upper-band component, the spectral shape can be calculated, represented as the energy levels of each sub-band (of the upper-band component).

It should be understood that the energy levels of each of the sub-bands of the upper-band component form the target values for the neural network.

It should also be noted that the above-mentioned upper-band signal is analogous to the artificially generated upper-band signal 431. In other words, the upper-band signal is a representation of the artificially generated upper-band signal 431 that is formed for the purpose of training the neural network of the neural network processor 409.

In some embodiments of the present invention, the spectral shape of the artificially generated upper band can be represented as a set of energy levels, each of which may correspond to one of a plurality of sub-bands. In other words, in such embodiments the set of spectral shape parameters of the artificially generated upper band may be formed in the same manner as the set of energy levels described above.

In some embodiments of the present invention, the spectral shape of the artificially generated upper band may be described by the energy levels of four partially overlapping sub-bands derived from the psychoacoustic mel scale. In other words, the frequency components of the wideband signal sampled at 16 kHz can be modeled as four sub-bands that are evenly spaced on a logarithmic scale in the range from 4 kHz to 8 kHz.

In some embodiments, the band-pass filter associated with each sub-band may be implemented in the frequency domain using a triangular window, and the energy level of each sub-band can then be determined by calculating the power spectrum of the frequency components located in the sub-band.

In some embodiments of the present invention, the energy of each sub-band can be determined by summing the squared amplitudes of the filtered frequency components in the sub-band.

The advantage of applying triangular window functions to the upper-band signal can be seen in Fig.6, which shows the distribution of the sub-bands of the artificially generated upper-band signal 431 in the frequency domain.

In addition, Fig.6 shows that the base of each band-pass filter, in other words of each triangular window function, may extend approximately between the center frequencies of the two adjacent sub-bands.

Thus, it should be understood that the above-described process of determining the energy levels of each of the overlapping sub-bands (also called the spectral shape parameters) can be performed in turn for each sample of the training database.

It should also be noted that in some embodiments of the present invention these energy levels of the overlapping sub-bands can form the target outputs of the neural network during the offline training phase. In other words, each set of energy levels of the overlapping sub-bands associated with the upper band of each wideband training sample of the database forms a target output for the NEAT neural network.

It should be understood that in some embodiments in which the NEAT neural network operates in an "online" mode, the evolved genomes of the neural network can then be used to process each feature vector from the feature extraction unit 407. This, in turn, can be used by the neural network processor 409 to generate the spectral shape for the upper-band signal 431. In other words, the feature vector extracted from the (lower-band) audio signal can be used by the neural network processor 409 to generate an appropriate set of spectral shape parameters for the artificially generated upper-band signal 431.

The appropriate set of spectral shape parameters can be generated for the audio signal on a frame-by-frame basis.

It should also be understood that in some embodiments of the present invention, when operating in the "online" mode, the NEAT neural network processor 409 can output the energy levels of four sub-bands corresponding to the four overlapping mel-scale sub-bands described above.

It should be understood that in some embodiments of the present invention the spectral shape parameters, in other words the energy levels of each of the sub-bands, may be determined using features extracted exclusively from the (lower-band) audio frame 404.

The step in which the neural network processor 409 determines the spectral shape parameters is shown in Fig.8 as processing step 807.

In some embodiments of the present invention, the artificial bandwidth extension device 303 comprises a band energy smoothing unit 411. The output of the neural network processor 409 can be connected to the input of the band energy smoothing unit 411.

In some embodiments of the present invention, the band energy smoothing unit 411 may filter the energy level of each sub-band on the basis of its current and previous values. This has the advantage of counteracting the negative effect of sub-band energy levels selected by the neural network processor 409 that, in some embodiments, may be too high. In other words, filtering each sub-band energy level smooths any rapidly changing levels.

In some embodiments, the band energy smoothing unit 411 can pass the energy level of each sub-band through a first-order autoregressive filter. In other words, a weighted average may be calculated for each sub-band energy level from the current sub-band energy level and the previously filtered sub-band energy level.

In some embodiments of the present invention, the autoregressive filter applied to each energy level may be expressed as follows:

Ef(n)=�E(n)+γE f(n-1),

where the values E(n) and Ef(n) represent, respectively, the energy level of the filtered subband and the level of energy sub-band in the instance n of the frame; φ denotes the weighting factor applicable to the current level of the energy E(n); and γ denotes the weighting factor applicable to the previous filtered energy Ef(n-1) sub-band.

In some embodiments of the present invention, the autoregressive filter described above may be applied only to sub-band energy levels that exceed the previously filtered energy level. In other words, in such embodiments the filter is applied only when E(n) > E_f(n-1).

It should be noted that in these embodiments the autoregressive filter described above can be applied to the energy level of each sub-band in turn.

It should also be noted that the filtering process described above can be performed for each frame n.

In a first group of embodiments, φ and γ can take the values 0.25 and 0.75, respectively.

It should be noted that in some other embodiments of the present invention φ and γ can take values other than those specified above. For example, in some embodiments other values of φ and γ can be used, such as values chosen to satisfy the equality φ+γ=1.
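The smoothing described above can be illustrated with a minimal sketch. The function name and the list-based interface are hypothetical, and passing a non-rising energy through unchanged is an assumption, since the text only states when the filter is applied.

```python
def smooth_band_energies(current, previous_filtered, phi=0.25, gamma=0.75):
    """Smooth one frame of sub-band energies with a first-order AR filter.

    current[k] is E(n) for sub-band k; previous_filtered[k] is E_f(n-1).
    """
    smoothed = []
    for e_now, e_prev in zip(current, previous_filtered):
        if e_now > e_prev:
            # Rising energy: E_f(n) = phi * E(n) + gamma * E_f(n-1).
            smoothed.append(phi * e_now + gamma * e_prev)
        else:
            # Assumption: a falling or equal energy is passed through unfiltered.
            smoothed.append(e_now)
    return smoothed
```

With the default weights, a jump from 0 to 4 in one sub-band is limited to 1 in the first frame, while a drop from 2 to 1 takes effect immediately.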

In some embodiments of the present invention, the band energy smoothing block 411 can perform an additional processing step in which the high-band signal is attenuated if the power of the input audio signal 404 (in other words, the low-band or telephone-band signal) is close to an adaptively estimated noise floor.

To perform this additional processing step, the energy of the input audio signal 404 may be calculated for each frame. In some embodiments of the present invention, this computation may be performed by the frame collector 403.

The noise floor of the input audio signal may, in some embodiments of the present invention, be determined by filtering the energy contour across the frames of the input audio signal. The filtering may, for example, be performed using a first-order recursive filter.

In some embodiments of the present invention, the first-order recursive filter can use coefficients that vary according to the direction of change of the energy contour. For example, in some embodiments in which the energy contour is changing in the upward direction, the first-order recursive filter can apply a coefficient whose value differs from the coefficient used when the energy contour is changing in the downward direction.

The values of the filter coefficients may, in some embodiments of the present invention, be selected so that the estimated noise floor rises gradually during periods of speech activity and falls rapidly toward the minimum during pauses in the audio signal 404.

The energy levels associated with the current frame of the artificially generated high-band signal 431 may, in some embodiments of the present invention, be attenuated according to the difference between the energy level of the current frame of the audio signal and the noise floor estimate, using a piecewise-linear mapping.

In such embodiments, the adaptive attenuation method described above makes it possible to reduce the perceived noise in the artificially generated high-band signal 431.
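The noise floor tracking and attenuation described above might be sketched as follows. The text gives no coefficient values, breakpoints, or units, so every number below, the use of decibels, and both function names are assumptions chosen only for illustration.

```python
def update_noise_floor(floor_db, frame_energy_db, up_coeff=0.99, down_coeff=0.5):
    """Track the noise floor with a direction-dependent first-order recursion.

    A coefficient near 1 makes the floor rise slowly while the energy is above
    it (speech); a smaller coefficient lets it fall quickly in pauses.
    """
    coeff = up_coeff if frame_energy_db > floor_db else down_coeff
    return coeff * floor_db + (1.0 - coeff) * frame_energy_db


def attenuation_db(frame_energy_db, floor_db, lo=3.0, hi=12.0, max_att=20.0):
    """Piecewise-linear mapping from (energy - floor) to attenuation in dB.

    Hypothetical breakpoints: full attenuation at or below `lo` dB above the
    floor, none at or above `hi` dB, linear in between.
    """
    diff = frame_energy_db - floor_db
    if diff <= lo:
        return max_att
    if diff >= hi:
        return 0.0
    return max_att * (hi - diff) / (hi - lo)
```

The returned attenuation would then be applied to the sub-band energy levels of the current frame before they are converted to gains.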

The step of filtering the energy levels associated with each sub-band of the artificially generated high-band signal 431 is shown in Fig. 8 as processing step 809.

In some embodiments of the present invention, the artificial bandwidth extension device 303 contains an excitation signal generator 417, an upsampler 419, a filter bank 421, and a band weighting and summation processor 415.

In such embodiments, the artificially generated high-band signal 431 may be formed, at least in part, by feeding time-domain frames to the input of the excitation signal generator 417, upsampling the output of the excitation signal generator 417 in the upsampler 419, filtering the upsampled excitation signal in the filter bank 421, and then weighting each sub-band signal with a gain derived from the corresponding mel-scale band energy levels. In other words, each sub-band at the output of the filter bank 421 may in some embodiments be weighted individually using the corresponding sub-band gain. In some embodiments, the gain can be derived from the energy level associated with the particular sub-band and the energy levels associated with adjacent sub-bands. In the same embodiments, the artificially generated high-band signal 431 may be formed by summing the weighted sub-band signals in the band weighting and summation processor 415.

In some embodiments of the present invention, the gain of each sub-band of the filter bank 421 may be determined by the energy-to-gain converter 413, so that the energy level associated with a specific sub-band of the filter bank is converted into a suitable gain.

It should be noted that in some embodiments of the present invention the frequency band over which the neural network processor 409 determines each energy level may correspond to the frequency band of the corresponding sub-band of the subsequent filter bank. In other words, the subsequent filter bank can use the same overlapping sub-bands as those used by the neural network processor 409 to determine the high-band energy levels.

In some embodiments of the present invention, the filter bank can generate four sub-bands, which can be equivalent to the four sub-bands used to obtain the high-band energy levels. However, in some other embodiments of the present invention the number of sub-bands used to obtain the high-band energy levels can be greater or less than four.

An example of the frequency distribution of each sub-band of the filter bank 421, as applied in the first group of embodiments of the present invention, is shown in Fig. 7.

By comparing the sub-band frequency distributions shown in Fig. 7 and Fig. 6, it can be seen that the bandwidths and frequency distribution of the four sub-bands of the filter bank are equivalent to those of the four sub-bands used to obtain the high-band energy levels in the neural network processor 409. In other words, the center frequency and bandwidth of each sub-band are equivalent in both filter banks.

Fig. 4 shows that the input of the energy-to-gain converter 413 may in some embodiments be connected to the output of the band energy smoothing block 411. In these configurations, the energy level associated with each sub-band can be passed from the band energy smoothing block 411 to the energy-to-gain converter 413.

As noted above, the energy-to-gain converter 413 may in some embodiments of the present invention be used to determine the gain of each sub-band of the filter bank.

To aid understanding of the operation of the system, in some embodiments of the present invention the sub-band energy level E is hereinafter expressed as a function of the sub-band index k.

In some embodiments of the present invention, an iterative method can be used to determine the gain g(k) for each sub-band k of the filter bank 421.

For a better understanding of the invention, the step of determining the gain of each sub-band of the filter bank 421 is described below with reference to the algorithm shown in Fig. 9.

The step of supplying the sub-band energies from the output of the band energy smoothing block 411 is shown in Fig. 9 as processing step 901.

It should be noted that in some embodiments of the present invention the psychoacoustically derived window function can be a triangular window function corresponding to the mel scale, as described above.

In addition, it should be noted that in these embodiments the psychoacoustically derived sub-band structure for the artificially generated high-band signal 431 may contain a number of overlapping sub-bands, so that the energy of one sub-band can contribute to the energy of its neighboring sub-bands. An example of overlapping sub-bands is shown in Fig. 7, which shows that the energy of the second sub-band contributes to the energy of the neighboring first and third sub-bands.

In a first example, an initial gain g_0(k) may be determined for each sub-band by estimating the gain that yields the sub-band energy E(k) without considering neighboring sub-bands.

In some embodiments of the present invention, this initial gain g_0(k) for sub-band k may be calculated as follows:

$$g_0(k) = \sqrt{\frac{E(k)}{c_k}},$$

where E(k) is the energy level of sub-band k, and c_k is a pre-computed constant representing the energy of the k-th synthesized band.

The step of determining the initial gain g_0(k) for sub-band k is shown in Fig. 9 as processing step 903.

After the initial gain g_0(k) of a specific sub-band has been determined, a new estimate g_1(k) of the gain can be computed by weighting the initial gain for that sub-band k. In some embodiments of the present invention, the new estimate g_1(k) of the gain for sub-band k may be regarded as the first iteration of the algorithm for determining the sub-band gain g(k). In these embodiments, the initial gain may be weighted according to the ratio of the energy level E(k) for sub-band k to the energy level of sub-band k that results when the contributions of adjacent bands are taken into account. In the first iteration of the gain determination process, this resulting sub-band energy level for sub-band k may be denoted E_0(k). The weighting factor in such embodiments may then be determined by taking the square root of the ratio of the energies.

It should be noted that the energy value E(k) of sub-band k may, in some embodiments of the present invention, be the sub-band energy value determined at the output of the band energy smoothing block 411 during processing step 809.

The operation of determining the weighting factor is shown in Fig. 9 as processing steps 905 and 907.

In accordance with some embodiments of the present invention, the new estimate of the gain for sub-band k in the first iteration can be calculated as follows:

$$g_1(k) = g_0(k)\sqrt{\frac{E(k)}{E_0(k)}}.$$

In the general case, iteration i of the algorithm yields the following gain for sub-band k:

$$g_i(k) = g_{i-1}(k)\sqrt{\frac{E(k)}{E_{i-1}(k)}},$$

where g_i(k) denotes the gain corresponding to the i-th iteration, g_{i-1}(k) denotes the sub-band gain corresponding to the previous iteration (i-1), and E_{i-1}(k) is the corresponding energy level of sub-band k. In some embodiments of the present invention, E_{i-1}(k) may be determined as a weighted sum of the squared gain g_{i-1}(k) and the products of adjacent gains of the neighboring sub-bands, i.e., g_{i-1}(k-1)*g_{i-1}(k) and g_{i-1}(k)*g_{i-1}(k+1).

The advantage of these embodiments of the present invention is that the energies of neighboring sub-bands are taken into account when determining the values of E_{i-1}(k).

In some embodiments of the present invention, the calculation of E_{i-1}(k) described above may also involve weighting the squared gains and the products of adjacent gains by weighting factors. The weighting factors can be defined such that frequencies above the center point of the uppermost sub-band filter of the filter bank 421 lie in a unity-gain region, and frequencies below the center point of the lowermost sub-band filter of the filter bank 421 also lie in a unity-gain region.

The step of weighting the gain from the previous iteration to form a new gain value is shown in Fig. 9 as processing step 909.

The gain determination algorithm can be run for several iterations until a termination condition is met.

The step of checking the termination condition is shown in Fig. 9 as processing step 911, and the step of repeating the process for the next iteration if the termination condition is not met is shown in Fig. 9 as processing step 913.

For example, in some embodiments of the present invention it has been determined that two iterations of the algorithm are sufficient for estimating the sub-band gain. This value was determined experimentally to produce an effective result.
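The iterative gain estimation described above can be sketched as follows. The weights are not given in the text, so `c_self` (the energy of each band synthesized with unit gain) and `c_cross` (a hypothetical overlap weight between each pair of adjacent bands) are illustrative assumptions, as is the function name. All bands are updated from the gains of the previous iteration.

```python
import math


def estimate_gains(energy, c_self, c_cross, iterations=2):
    """Iteratively convert target sub-band energies into sub-band gains.

    energy[k]  : target energy E(k) of sub-band k
    c_self[k]  : energy of band k when synthesized with unit gain (assumed known)
    c_cross[k] : hypothetical overlap weight between bands k and k+1
    """
    n = len(energy)
    # Initial gain ignores neighbours: g_0(k) = sqrt(E(k) / c_k).
    gains = [math.sqrt(e / c) for e, c in zip(energy, c_self)]
    for _ in range(iterations):
        updated = []
        for k in range(n):
            # Energy that the previous gains would actually produce in band k.
            produced = c_self[k] * gains[k] ** 2
            if k > 0:
                produced += c_cross[k - 1] * gains[k - 1] * gains[k]
            if k < n - 1:
                produced += c_cross[k] * gains[k] * gains[k + 1]
            # g_i(k) = g_{i-1}(k) * sqrt(E(k) / E_{i-1}(k)).
            scale = math.sqrt(energy[k] / produced) if produced > 0.0 else 0.0
            updated.append(gains[k] * scale)
        gains = updated
    return gains
```

When the overlap weights are zero the initial gains are already exact; with overlap, each gain is reduced to compensate for the energy contributed by its neighbours.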

The step of determining that the gain for a particular sub-band has been determined in the current iteration is shown in Fig. 9 as processing step 915.

It should be noted that in some embodiments of the present invention the gain determination process described above may be repeated for each overlapping sub-band of the artificially generated high-band signal.

For example, in some embodiments of the present invention the gain determination process described above can be performed simultaneously for every sub-band so as to take into account the influence of neighboring sub-bands.

It should also be noted that in some embodiments of the present invention the sub-band gain determination process described above can be performed for each frame of the audio signal.

The step of determining the gain of each sub-band of the filter bank 421 is shown in Fig. 8 as processing step 811.

The sub-band gains can then be passed to the band weighting and summation processor 415 over the connection from the energy-to-gain converter 413.

As stated above, the artificially generated high-band signal can be formed by passing a signal through the filter bank 421 and then weighting each sub-band output signal by the corresponding sub-band gain.

It should be noted that in some embodiments of the present invention the process of filtering the excitation signal with the filter bank and then weighting each resulting sub-band signal with the corresponding sub-band gain may be regarded as shaping the high-band spectrum of the artificially generated high-band signal 431.

The excitation signal in some embodiments of the present invention may be generated from the input (narrowband) audio signal supplied to the artificial bandwidth extension device 303, in other words, the signal 401.

To simplify generation of the excitation signal for the filter bank, the output of the frame collector 403 may in some embodiments also be connected to the excitation signal generator 417. In such embodiments, the frame 404 of the input audio signal may then be filtered using linear prediction (LP) analysis in order to generate an excitation signal with a flat spectrum.

In some embodiments of the present invention, the LP analysis filtering can be performed on a frame-by-frame basis, so that the LP analysis filter coefficients can be calculated for each frame 404 of the audio signal.

For a better understanding of the excitation signal generation process, the operation of the excitation signal generator 417 is described below with reference to the algorithm shown in Fig. 10.

To determine the LP analysis filter coefficients, the excitation signal generator 417 may, in some embodiments, analyze the short-term correlation of the frame 404 of the audio signal supplied by the frame collector 403.

In some embodiments of the present invention, the short-term correlation of a frame of the audio signal can be analyzed by means of linear predictive coding (LPC). This method is based on calculating either the covariance or the autocorrelation of the frame of the input audio signal over a range of different delays, where the range of delays may be determined by the order of the filter.

In some embodiments of the present invention, the LPC analysis can be performed using the autocorrelation method, in which the autocorrelation values calculated over the range of delays (defined by the order of the filter) may be arranged in a symmetric square matrix known as a Toeplitz matrix. A Toeplitz matrix is symmetric about the main diagonal, and all elements on any given diagonal are equal. To determine the LPC filter coefficients, the matrix can in some embodiments be inverted using the Levinson-Durbin algorithm.

In some embodiments of the present invention, the LPC analysis may be performed using the covariance method.

When the covariance method is used, covariances can be determined over a range of different delays within the audio frame in order to form a covariance matrix. The size of the matrix is determined by the range of delays over which the covariance values are calculated.

As noted above, the range of delays over which the covariance values can be calculated is determined by the number of LPC coefficients and, therefore, by the order of the subsequent LP analysis filter.

In some embodiments of the present invention, the covariance matrix is symmetric about the main diagonal. However, unlike in a Toeplitz matrix, the values on a given diagonal are not necessarily equal. In these embodiments, in order to obtain the LPC filter coefficients, the matrix can be inverted using Cholesky decomposition.

It should be noted that in these embodiments the covariance method does not require the frame of the audio signal to be scaled by an appropriate window function before LPC analysis. Thus, in such embodiments, the windowing function in the frame collector 403 need not be performed.

The step of determining the LPC coefficients of the input frame 404 of the audio signal is shown in Fig. 10 as processing step 1001.
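For the autocorrelation method described above, the coefficient computation might be sketched as follows. The function names are hypothetical, the frame is assumed to be already windowed, and the returned coefficients follow the sign convention A(z) = 1 + sum of a_i z^{-i}.

```python
def autocorrelation(frame, order):
    """Autocorrelation of the frame for lags 0..order."""
    return [sum(frame[n] * frame[n - lag] for n in range(lag, len(frame)))
            for lag in range(order + 1)]


def levinson_durbin(r, order):
    """Solve the Toeplitz normal equations by the Levinson-Durbin recursion.

    Returns (a, err): a[0] == 1.0 and a[1..order] are the analysis filter
    coefficients; err is the final prediction error energy.
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:
            break  # Degenerate (e.g. all-zero) frame: keep remaining taps at 0.
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err  # Reflection coefficient for this stage.
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return a, err
```

The recursion exploits the Toeplitz structure, so the system is solved in a number of operations proportional to the square of the order rather than by a general matrix inversion.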

After the LPC filter coefficients have been determined in the excitation signal generator 417, the input frame 404 of the audio signal can, in some embodiments of the present invention, be filtered with the LP analysis filter to form the LP residual signal.

In some embodiments of the present invention, the LP analysis filter can be represented by the following expression:

$$A(z) = 1 + \sum_{i=1}^{M} \alpha_i z^{-i},$$

where α_i are the LPC filter coefficients, z^{-i} denotes a delay of i sampling intervals, and M is the order of the LPC filter.

In some embodiments of the present invention, the LPC order M can be set to ten. This value was determined experimentally to produce an effective result.

The step of filtering the frame 404 of the audio signal with the LPC analysis filter is shown in Fig. 10 as processing step 1003.

The LP residual signal can then be filtered with an autoregressive moving average (ARMA) filter formed from the LPC filter coefficients calculated for the current frame of the audio signal.

It should also be noted that LP analysis filtering may, in some embodiments, amplify the spectral valleys of the signal so that the overall spectral shape becomes flat. However, spectral valleys are usually associated with regions of low signal-to-noise ratio in the decoded audio signal. Therefore, in some embodiments, LP analysis filtering can have the negative effect of amplifying noise in the LP residual signal.

To counteract some of the effects described above, the ARMA filter may in some embodiments be applied to the LP residual signal. The advantage of applying the ARMA filter in some embodiments of the present invention is that the formants are slightly emphasized while the spectral valleys are slightly attenuated. This approach also has the advantage of reducing the noise level of the LP residual signal.

The form of the ARMA filter may in some embodiments be similar to that of the postfilter used in many codecs, such as the AMR codec specified in the 3rd Generation Partnership Project (3GPP) technical specification TS 26.090.

The ARMA filter can be represented by the following expression:

$$H_{ff}(z) = \frac{A(z/\beta)}{A(z/\alpha)} = \frac{1 + \sum_{i=1}^{M} \alpha_i \beta^{i} z^{-i}}{1 + \sum_{i=1}^{M} \alpha_i \alpha^{i} z^{-i}},$$

where the coefficients α and β can be regarded as weighting factors whose values may lie within the range 0<β<α<1. The factor α has the effect of pulling the poles of the ARMA filter toward the center of the unit circle and, similarly, the coefficient β has the effect of pulling the zeros toward the center of the unit circle.

In some embodiments of the present invention, the weighting factors α and β can be set to 0.9 and 0.5, respectively. These values were determined experimentally to produce an effective result.
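A minimal sketch of the ARMA postfilter follows. The helper names are hypothetical, the coefficient list is assumed to start with the leading 1 of A(z), and the filter memory is reset for every call, which is a simplification rather than something the text specifies.

```python
def bandwidth_expand(lpc, factor):
    """Coefficients of A(z/factor): the i-th coefficient is scaled by factor**i."""
    return [c * factor ** i for i, c in enumerate(lpc)]


def pole_zero_filter(b, a, x):
    """Direct-form filter: y[n] = sum(b_i x[n-i]) - sum(a_i y[n-i], i >= 1).

    Assumes a[0] == 1.0 and zero initial state.
    """
    y = []
    for n in range(len(x)):
        acc = sum(b[i] * x[n - i] for i in range(len(b)) if n - i >= 0)
        acc -= sum(a[i] * y[n - i] for i in range(1, len(a)) if n - i >= 0)
        y.append(acc)
    return y


def arma_postfilter(lpc, residual, alpha=0.9, beta=0.5):
    """Apply H_ff(z) = A(z/beta) / A(z/alpha) to the LP residual."""
    numerator = bandwidth_expand(lpc, beta)
    denominator = bandwidth_expand(lpc, alpha)
    return pole_zero_filter(numerator, denominator, residual)
```

Because both polynomials are derived from the same A(z), the filter needs no additional analysis beyond the LPC coefficients already computed for the frame.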

It should be noted that in other embodiments of the present invention the weighting factors of the ARMA filter can take values that differ from those used in the first group of embodiments.

The step of postfiltering the residual signal generated by the LPC analysis filter is shown in Fig. 10 as processing step 1005.

In some embodiments of the present invention, in order to improve the quality of the residual signal, the ARMA filter described above can be followed by an additional processing step that consists of applying a spectral tilt filter.

It should be noted that in these embodiments the ARMA filter described above may introduce a spectral tilt into the filtered LP residual signal. To counteract this negative effect, the spectral tilt filter may in some embodiments be applied to the ARMA-filtered LP residual signal, which in turn may boost the attenuated frequency components so as to restore a substantially flat spectrum in the LP residual signal.

In some embodiments of the present invention, the spectral tilt filter described above may take the form of a first-order pole-zero filter, which can be defined by the following expression:

$$H_t(z) = \frac{1 - \mu z^{-1}}{1 + \mu z^{-1}},$$

where the gain factor μ is proportional to the first reflection coefficient of the ARMA filter H_ff described above and can be determined as follows:

$$\mu = k_t \frac{R(1)}{R(0)},$$

where R(0) and R(1) are, respectively, the zeroth and first autocorrelation coefficients of the truncated impulse response of the ARMA filter H_ff, and k_t is a constant that controls the amount of spectral tilt applied by the filter.

In some embodiments of the present invention, the constant k_t can have a value of 0.6. This value was determined experimentally to produce an effective result.
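The tilt compensation can be sketched as below, assuming the truncated impulse response of the ARMA filter has already been computed and is passed in as a list. The function names and the zero initial filter state are assumptions.

```python
def tilt_factor(impulse_response, k_t=0.6):
    """mu = k_t * R(1) / R(0) from the truncated impulse response of H_ff."""
    r0 = sum(h * h for h in impulse_response)
    r1 = sum(impulse_response[n] * impulse_response[n - 1]
             for n in range(1, len(impulse_response)))
    return k_t * r1 / r0 if r0 > 0.0 else 0.0


def tilt_compensate(x, mu):
    """Apply H_t(z) = (1 - mu z^-1) / (1 + mu z^-1) with zero initial state.

    Difference equation: y[n] = x[n] - mu * x[n-1] - mu * y[n-1].
    """
    y = []
    x_prev = 0.0
    y_prev = 0.0
    for sample in x:
        out = sample - mu * x_prev - mu * y_prev
        y.append(out)
        x_prev, y_prev = sample, out
    return y
```

A positive μ corresponds to a low-pass tilt in the ARMA filter response, which this filter offsets by emphasizing the higher frequencies.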

The step of applying spectral tilt compensation to the output of the ARMA postfilter stage is shown in Fig. 10 as processing step 1007.

In some embodiments of the present invention, an additional processing step may be performed that attenuates the harmonics of the LP residual signal. This additional processing step may in particular be used in cases where the low-band input signal has strong harmonic components. For example, some female speakers may have particularly strong voiced segments, which manifest as an unnatural metallic ringing in the extended signal.

To counteract this effect, in some embodiments of the present invention an additional harmonic filter of the following form can be applied to the LP residual signal:

$$H_{pf}(z) = 1 - k_{pf}\, g\, z^{-M},$$

where M is the pitch period (or lag) of the LP residual signal, and g is the corresponding optimal pitch gain. In some embodiments of the present invention, the coefficient k_pf is used to control the amount of attenuation applied at each pitch period. In other words, the coefficient k_pf can be used to control the harmonic components of the LP residual signal.

In some embodiments of the present invention, the coefficient k_pf can have a value of 0.65. This value was determined experimentally to produce an effective result.

In some embodiments of the present invention, the pitch period (or lag) M and the corresponding optimal pitch gain g can be determined using open-loop pitch lag estimation, in which the correlation of the frame of the audio signal is calculated over a number of different pitch delays. In such embodiments, the pitch period M and the corresponding optimal pitch gain g can then be determined as the delay and gain values that maximize the correlation of the frame of the audio signal.

In some other embodiments of the present invention, the pitch period and the optimal pitch gain can be determined by maximizing the correlation of the LP residual signal rather than of the input frame of the audio signal.

An example of an algorithm for determining a suitable pitch, which can be used in the harmonic filtering process, is given in the 3GPP technical specification TS 26.090 for the AMR codec.

It should be noted that the harmonic filter structure described above may correspond to a comb filter.
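The open-loop pitch search and comb filter might look like the following sketch. It is not the TS 26.090 procedure: the normalized-correlation criterion, the clamping of the gain to at most 1, and both function names are assumptions made for illustration.

```python
def open_loop_pitch(frame, min_lag, max_lag):
    """Pick the lag with the largest normalized correlation and its gain."""
    best_lag, best_gain, best_score = min_lag, 0.0, 0.0
    for lag in range(min_lag, max_lag + 1):
        corr = sum(frame[n] * frame[n - lag] for n in range(lag, len(frame)))
        energy = sum(frame[n - lag] ** 2 for n in range(lag, len(frame)))
        if corr <= 0.0 or energy <= 0.0:
            continue  # No positive correlation at this lag.
        score = corr * corr / energy
        if score > best_score:
            best_lag, best_score = lag, score
            best_gain = min(corr / energy, 1.0)  # Assumption: clamp gain to 1.
    return best_lag, best_gain


def comb_filter(x, lag, gain, k_pf=0.65):
    """Apply H_pf(z) = 1 - k_pf * g * z^-M: y[n] = x[n] - k_pf*g*x[n-M]."""
    return [x[n] - k_pf * gain * x[n - lag] if n >= lag else x[n]
            for n in range(len(x))]
```

For a strongly periodic residual the gain approaches 1, so each pulse is reduced by roughly the fraction k_pf of the pulse one pitch period earlier.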

The operation of filtering the harmonic components of the LP residual signal is shown in Fig. 10 as processing step 1009.

It should also be noted that the output of the comb filter may, in some embodiments, form the excitation signal.

The operation of generating the excitation signal in the excitation signal generator 417 is shown in Fig. 8 as processing step 813.

The output signal of the excitation signal generator 417 may, in some embodiments of the present invention, be supplied to the input of the upsampler 419.

In some embodiments, the upsampler 419 can upsample the incoming LP residual signal by a specified factor.

In these embodiments, the upsampling can be performed by inserting zero-valued samples between the samples of the LP residual signal. Overlap-and-add processing can be used to create a continuous signal in the time domain.

It should be noted that low-pass filtering may be omitted in the upsampler 419 described above so as to allow spectral aliasing (imaging) of the LP residual signal spectrum. This makes it possible to generate a signal that extends over the entire band.

In some embodiments of the present invention, the LP residual signal may be upsampled by a factor of two. In other words, the LP residual signal may be upsampled from 8 kHz to 16 kHz by inserting a zero-valued sample between each pair of samples.
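Zero insertion without a subsequent low-pass filter can be sketched in a few lines. The function name is hypothetical.

```python
def upsample_zero_insert(x, factor=2):
    """Insert (factor - 1) zeros after every sample; no anti-imaging filter."""
    out = [0.0] * (len(x) * factor)
    out[::factor] = x  # Original samples land on every factor-th position.
    return out
```

For a factor of two at 8 kHz input, the 0 to 4 kHz spectrum is mirrored into 4 to 8 kHz at the new 16 kHz rate, which is what provides excitation energy in the high band.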

The operation of upsampling the excitation signal for the filter bank is shown in Fig. 8 as processing step 815.

The upsampled LP residual signal may, in some embodiments of the present invention, form the upsampled excitation signal for the filter bank 421.

As indicated above, the filter bank 421 may, in some embodiments, have frequency characteristics similar to those used to determine the sub-band energy levels in the neural network processor 409. In other words, the filter bank 421 can in such embodiments be formed as a plurality of overlapping sub-bands on the same psychoacoustically derived mel scale as that used to determine the energy levels for the spectrum of the artificially generated high-band signal 431.

Thus, it should be noted that the distribution of sub-bands in the filter bank 421 may, in some embodiments of the present invention, approximate the critical bands of hearing.

In some embodiments of the present invention, each sub-band of the filter bank can be implemented individually as a linear-phase finite impulse response (FIR) filter.

In some embodiments of the present invention, the filter bank 421 may comprise four sub-bands, each implemented as a FIR filter with 128 taps.

Each sub-band signal may in some embodiments be formed by filtering the excitation signal with the corresponding FIR filter.

Fig. 7 shows the distribution of sub-bands in the filter bank 421 in accordance with the first group of embodiments of the present invention.

The operation of generating a plurality of sub-band signals by feeding the excitation signal to the input of the filter bank 421 is shown in Fig. 8 as processing step 817.

The sub-band signals from the output of the filter bank 421 can then be supplied to the input of the band weighting and summation processor 415.

The band weighting and summation processor 415 can then, in some embodiments, weight each sub-band signal individually using the corresponding gain.

As indicated above, the sub-band gains can be determined for each sub-band by the energy-to-gain converter 413. The sub-band gains can be passed from the energy-to-gain converter 413 to the band weighting and summation processor 415 via an auxiliary input.

After each sub-band signal has been individually weighted with the corresponding sub-band gain, the weighted sub-band signals may in some embodiments be summed to form the artificially generated high-band signal 431.
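The filtering, weighting, and summation chain can be sketched as follows. The tap values would come from the filter bank design, which the text does not give, so `band_taps` is simply an input here and the function names are hypothetical.

```python
def fir_filter(taps, x):
    """Causal FIR filtering with zero initial state."""
    return [sum(taps[i] * x[n - i] for i in range(len(taps)) if n - i >= 0)
            for n in range(len(x))]


def synthesize_high_band(excitation, band_taps, gains):
    """Filter the excitation into sub-bands, weight each one, and sum them."""
    out = [0.0] * len(excitation)
    for taps, gain in zip(band_taps, gains):
        band_signal = fir_filter(taps, excitation)
        for n, value in enumerate(band_signal):
            out[n] += gain * value
    return out
```

In the embodiments described above there would be four entries in `band_taps`, each with 128 taps, and one gain per sub-band per frame.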

The operation of weighting each sub-band signal with the corresponding weighting factor is shown in Fig. 8 as processing step 823.

In some embodiments of the present invention, the sub-band gains may change gradually from frame to frame in each sub-band. In other words, the gain of a specific sub-band can be calculated by interpolating between the sub-band gains of the current and subsequent frames.

The interpolation of the sub-band gains across successive frames may, in some embodiments of the present invention, be performed using a sinusoidal ramp.
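One way to realize a sinusoidal ramp is a half-cosine crossfade between two gain values. The exact curve is not specified in the text, so this shape and the function name are assumptions.

```python
import math


def interpolate_gain(gain_start, gain_end, num_samples):
    """Per-sample gains moving from gain_start toward gain_end on a half cosine."""
    ramp = []
    for n in range(num_samples):
        # Weight rises smoothly from 0 toward 1 across the frame.
        weight = 0.5 * (1.0 - math.cos(math.pi * n / num_samples))
        ramp.append(gain_start + (gain_end - gain_start) * weight)
    return ramp
```

The ramp starts exactly at the first gain and has zero slope at both ends, which avoids audible discontinuities at frame boundaries.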

It should be noted that in some embodiments the sampling rate of the artificially generated high-band signal 431 is related to the equivalent Nyquist bandwidth of the bandwidth-extended audio signal 435.

For example, if the Nyquist bandwidth of the artificially generated high-band signal 431 is equivalent to the Nyquist bandwidth of the input audio signal 401, the sampling frequency of the artificially generated high-band signal 431 may be double that of the input audio signal 401. In other words, the sampling rate of the artificially generated high-band signal 431 may be twice the sampling frequency of the input audio signal 401 in order to accommodate the additional frequency components generated by the artificial bandwidth extension process.

In addition, it should be noted that, in general, the sampling frequency of the audio signal 435 with artificially extended bandwidth may in some embodiments also coincide with the sampling rate of the artificially generated high-band signal 431.

In some embodiments, the Nyquist bandwidth of the input frame 404 of the audio signal may be 4 kHz. In such embodiments, the artificial bandwidth extension process creates an artificially generated high-band signal that occupies the frequency range from 4 kHz to 8 kHz at a sampling rate of 16 kHz.

The artificially generated high-band signal 431 may then, in some embodiments, be passed to an input of the adder 427, where it is combined with the upsampled input audio signal 433 to form the bandwidth-extended signal 435.

It should be appreciated that, in some embodiments of the present invention, the sampling rate of the input audio signal 433 may be the same as the sampling rate of the artificially generated high-band signal 431.

To bring the input audio signal up to the higher sampling rate, the input audio signal 401 may, in some embodiments of the present invention, also be passed to the input of an additional upsampler 423. The additional upsampler 423 may, in such embodiments, upsample the input audio signal 401 by the same factor as that used by the upsampler 419 in the residual signal path.

It should be appreciated that the additional upsampler 423 may be implemented by inserting zeros between the samples of the input audio signal 401 and then low-pass filtering the resulting signal to remove the unwanted image components.

In some embodiments, the additional upsampler 423 can upsample the input audio signal 401 by a factor of two. In these embodiments, the sampling rate of the input audio signal 401 may be increased from 8 kHz to 16 kHz.
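The zero-insertion approach described above can be illustrated with the following sketch, which upsamples by a factor of two and removes the spectral image with a low-pass FIR filter. The filter design (a 31-tap Hamming-windowed sinc with its cutoff at one quarter of the new sampling rate) and the names `lowpass_taps` and `upsample_by_two` are assumptions chosen for the example; the description does not specify the filter.

```python
import math
from typing import List, Sequence


def lowpass_taps(num_taps: int, cutoff: float) -> List[float]:
    """Hamming-windowed sinc low-pass filter.

    `cutoff` is a fraction of the sampling rate (0 < cutoff < 0.5);
    `num_taps` must be odd so that the filter has a centre tap.
    """
    if num_taps < 3 or num_taps % 2 == 0:
        raise ValueError("num_taps must be an odd number >= 3")
    mid = (num_taps - 1) // 2
    taps = []
    for k in range(num_taps):
        t = k - mid
        if t == 0:
            sinc = 2.0 * cutoff
        else:
            sinc = math.sin(2.0 * math.pi * cutoff * t) / (math.pi * t)
        window = 0.54 - 0.46 * math.cos(2.0 * math.pi * k / (num_taps - 1))
        taps.append(sinc * window)
    return taps


def upsample_by_two(x: Sequence[float], num_taps: int = 31) -> List[float]:
    """Double the sampling rate: insert zeros, then low-pass filter."""
    # Step 1: insert one zero after every input sample.
    stuffed = [0.0] * (2 * len(x))
    stuffed[::2] = [float(v) for v in x]

    # Step 2: low-pass at a quarter of the new rate (the old Nyquist
    # frequency); the factor 2 restores the amplitude lost to the zeros.
    taps = [2.0 * h for h in lowpass_taps(num_taps, 0.25)]
    out = []
    for n in range(len(stuffed)):
        acc = 0.0
        for k, h in enumerate(taps):
            if 0 <= n - k < len(stuffed):
                acc += h * stuffed[n - k]
        out.append(acc)
    return out


if __name__ == "__main__":
    y = upsample_by_two([0.0, 1.0, 0.5, -0.25] * 5)
    print(len(y))  # twice the input length
```

With an odd-length filter of this type the output is delayed by (num_taps - 1) / 2 samples at the new rate, that is, 15 samples for the assumed 31 taps. Any such delay has to be taken into account when the upsampled signal is later aligned with the high-band signal.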

The operation of upsampling the input audio signal 401, so that its sampling rate matches that of the artificially generated high-band signal 431, is shown in Fig. 8 as processing step 819.

The output of the upsampler 423 may, in some embodiments, be connected to the input of the signal delay device 425. The signal delay device 425 may, in such embodiments, be configured to delay the upsampled input audio signal in time.

In some embodiments, the signal delay device 425 may delay the upsampled input audio signal 401 so that it is time-aligned with the artificially generated high-band signal 431.

The operation of delaying the upsampled input audio signal is shown in Fig. 8 as processing step 821.

In such embodiments of the present invention, the delayed, upsampled input audio signal forms the input signal 433 to the adder 427, where it is combined with the artificially generated high-band signal 431 to form the bandwidth-extended signal 435, as described above.

The operation of forming the bandwidth-extended signal 435 is shown in Fig. 8 as processing step 825.
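The delay and summation steps can be sketched as follows, assuming that both signals are already at the same sampling rate and that the required delay is known as a whole number of samples. The function name `delay_and_add`, the zero-padding of the delayed signal, and the truncation of the result to the shorter input are choices made for this illustration only.

```python
from typing import List, Sequence


def delay_and_add(low_band: Sequence[float],
                  high_band: Sequence[float],
                  delay: int) -> List[float]:
    """Delay the upsampled input signal and add the high-band signal to it.

    `delay` is the (assumed known) number of samples by which the low band
    must be delayed so that it lines up with the generated high band.
    """
    if delay < 0:
        raise ValueError("delay must be non-negative")
    delayed = [0.0] * delay + [float(v) for v in low_band]
    length = min(len(delayed), len(high_band))
    return [delayed[i] + high_band[i] for i in range(length)]


if __name__ == "__main__":
    print(delay_and_add([1.0, 2.0, 3.0], [10.0, 20.0, 30.0, 40.0, 50.0], 2))
```

In practice the delay would be chosen to match the total latency of the high-band generation path; that value depends on the filters used and is not given here.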

The bandwidth-extended signal 435 can then be passed to the output 306 of the artificial bandwidth extension device 303.

Thus, in general, in at least one embodiment of the present invention, a method comprises: generating an excitation signal based on an audio signal, wherein the audio signal comprises a plurality of frequency components; extracting a feature vector from the audio signal, wherein the feature vector comprises at least one frequency-domain component feature and at least one time-domain component feature; determining at least one spectral shape parameter based on the feature vector, wherein the at least one spectral shape parameter corresponds to a sub-band signal comprising frequency components that belong to a further plurality of frequency components; and generating the sub-band signal by filtering the excitation signal through a filter bank and weighting the filtered excitation signal using the at least one spectral shape parameter.

Although the above examples describe embodiments of the present invention operating within a codec of the electronic device or apparatus 10, it should be appreciated that the invention, as described below, may be implemented as part of any audio signal decoding process. For example, embodiments of the present invention can be implemented in an audio signal decoder that decodes audio signals transmitted over fixed or wired communication links.

Consequently, user equipment may comprise a bandwidth extension device such as those described in the above embodiments of the present invention.

It should be noted that the term user equipment covers any suitable type of wireless user equipment, such as mobile telephones, portable data processing devices or portable web browsers.

In addition, elements of a public land mobile network (PLMN) may also comprise audio codecs as described above.

In general, the various embodiments of the present invention may be implemented in hardware or special-purpose circuits, software, logic, or any combination thereof. For example, some aspects may be implemented in hardware, while other aspects may be implemented in firmware or software that may be executed by a controller, microprocessor or other computing device, although the invention is not limited thereto. While various aspects of the present invention may be illustrated and described as block diagrams, flow charts, or using some other pictorial representation, it is well understood that the blocks, devices, systems, techniques or methods described herein may be implemented in, as non-limiting examples, hardware, software, firmware, special-purpose circuits or logic, general-purpose hardware or a controller or other computing devices, or some combination thereof.

Embodiments of this invention may be implemented by computer software executable by a data processor of the mobile device, such as in the processor entity, or by hardware, or by a combination of software and hardware. Furthermore, in this regard it should be noted that any of the logic blocks shown in the drawings may represent program steps, or interconnected logic circuits, blocks and functions, or a combination of program steps and logic circuits, blocks and functions.

Thus, briefly stated, in at least one embodiment of the present invention, a device is configured to perform the following operations: generating an excitation signal based on an audio signal, wherein the audio signal comprises a plurality of frequency components; extracting a feature vector from the audio signal, wherein the feature vector comprises at least one frequency-domain component feature and at least one time-domain component feature; determining at least one spectral shape parameter based on the feature vector, wherein the at least one spectral shape parameter corresponds to a sub-band signal comprising frequency components that belong to a further plurality of frequency components; and generating the sub-band signal by filtering the excitation signal through a filter bank and weighting the filtered excitation signal using the at least one spectral shape parameter.

The memory may be of any type suitable to the local technical environment and may be implemented using any suitable data storage technology, such as semiconductor-based memory devices, magnetic memory devices and systems, optical memory devices and systems, read-only memory and removable memory. The data processors may be of any type suitable to the local technical environment, and may include, as non-limiting examples, one or more general-purpose computers, special-purpose computers, microprocessors, digital signal processors (DSPs) and processors based on a multi-core processor architecture.

Embodiments of the present invention may be practiced in various components such as integrated circuit modules. The design of integrated circuits is, by and large, a highly automated process. Complex and powerful software tools are available for converting a logic-level design into a semiconductor circuit design ready to be etched and formed on a semiconductor substrate.

Programs, such as those provided by Synopsys, Inc. of Mountain View, California and Cadence Design, of San Jose, California, automatically route conductors and locate components on a semiconductor chip using well-established design rules as well as libraries of pre-stored design modules. Once the design of a semiconductor circuit has been completed, the resulting design, in a standardized electronic format (e.g., Opus, GDSII, or the like), may be transmitted to a semiconductor fabrication facility for fabrication.

The foregoing description has provided, by way of exemplary and non-limiting examples, a full and informative description of embodiments of the present invention. However, various modifications and adaptations may become apparent to those skilled in the relevant art in view of the foregoing description, when read in conjunction with the accompanying drawings and the appended claims. Nevertheless, all such and similar modifications of the teachings of this invention will still fall within the scope of the present invention as defined in the appended claims.

1. A method for extending the frequency band of an audio signal, comprising:
generating an excitation signal from the audio signal, wherein the audio signal has a frequency band and comprises a plurality of frequency components;
extracting a feature vector from the audio signal, wherein the feature vector comprises at least one frequency-domain component feature and at least one time-domain component feature;
determining at least one spectral shape parameter based on the feature vector, wherein the at least one spectral shape parameter corresponds to a sub-band signal comprising frequency components that belong to a further plurality of frequency components extending the frequency band of the audio signal; and
generating the sub-band signal by filtering the excitation signal through a filter bank and weighting the filtered excitation signal using said at least one spectral shape parameter,
wherein the spectral shape parameter is a sub-band energy value, and this value is reduced if the power of the audio signal reaches an estimate of the noise level in the audio signal.

2. The method according to claim 1, characterized in that generating the excitation signal comprises:
generating a residual signal by filtering the audio signal with an inverse linear prediction filter;
filtering the residual signal with a post-filter stage comprising an autoregressive moving-average filter based on the linear prediction filter; and
generating the excitation signal by upsampling and spectrally folding the output signal of the post-filter stage.

3. The method according to claim 2, characterized in that the post-filter stage further comprises a spectral tilt filter and a harmonic filter.

4. The method according to claim 1 or 2, characterized in that the frequency components of the sub-band signal are distributed according to a psychoacoustic scale comprising a plurality of overlapping bands, and the frequency characteristics of the filter bank correspond to the distribution of the frequency components of the sub-band signal.

5. The method according to claim 4, characterized in that the overlapping bands are distributed according to the mel scale, wherein the sub-band signal is masked using at least one of the following:
a triangular masking function; and
a trapezoidal masking function.

6. The method according to claim 1 or 2, characterized in that determining the at least one spectral shape parameter based on the feature vector comprises:
using a neural network to determine the at least one spectral shape parameter from the feature vector, wherein the feature vector extracted from the audio signal forms an input target vector for the neural network, and the neural network is trained to provide a sub-band spectral shape parameter for the input target vector.

7. The method according to claim 1, characterized in that the spectral shape parameter represents a sub-band gain based on the sub-band energy value.

8. The method according to claim 1 or 2, characterized in that the at least one frequency-domain component feature comprises at least one of the following:
a group of a plurality of energy levels of the audio signal, wherein each of the plurality of energy levels corresponds to the energy of an overlapping band of the audio signal;
a value representing the centroid of the frequency spectrum of the audio signal; and
a value representing the degree of flatness of the frequency spectrum.

9. The method according to claim 1 or 2, characterized in that the at least one time-domain component feature comprises at least one of the following:
a gradient index based on the sum of the gradients at points in the audio signal at which the waveform of the audio signal changes direction;
a ratio of the energy of a frame of the audio signal to the energy of a previous frame of the audio signal; and
a classification of the audio signal as active or inactive by a voice activity detector.

10. The method according to claim 1 or 2, further comprising combining the sub-band signal with the audio signal to produce a bandwidth-extended audio signal.

11. A device for extending the frequency band of an audio signal, comprising at least one processor and at least one memory storing computer code, wherein the at least one memory and the computer code are configured, with the at least one processor, to cause the device to perform at least the following operations:
generating an excitation signal based on the audio signal, wherein the audio signal has a frequency band and comprises a plurality of frequency components;
extracting a feature vector from the audio signal, wherein the feature vector comprises at least one frequency-domain component feature and at least one time-domain component feature;
determining at least one spectral shape parameter based on the feature vector, wherein the at least one spectral shape parameter corresponds to a sub-band signal comprising frequency components that belong to a further plurality of frequency components extending the frequency band of the audio signal; and
generating the sub-band signal by filtering the excitation signal through a filter bank and weighting the filtered excitation signal using the at least one spectral shape parameter,
wherein the spectral shape parameter is a sub-band energy value, and this value is reduced if the power of the audio signal reaches an estimate of the noise level in the audio signal.

12. The device according to claim 11, characterized in that the at least one memory and the computer code, configured with the at least one processor to cause the device to perform at least the generating of the excitation signal, are further configured to perform the following operations:
generating a residual signal by filtering the audio signal with an inverse linear prediction filter;
filtering the residual signal with a post-filter stage comprising an autoregressive moving-average filter based on the linear prediction filter; and
generating the excitation signal by upsampling and spectrally folding the output signal of the post-filter stage.

13. The device according to claim 12, characterized in that the post-filter stage further comprises a spectral tilt filter and a harmonic filter.

14. The device according to any one of claims 11-13, characterized in that the frequency components of the sub-band signal are distributed according to a psychoacoustic scale divided into a plurality of overlapping bands, and the frequency characteristics of the filter bank correspond to the distribution of the frequency components of the sub-band signal.

15. The device according to claim 14, characterized in that the overlapping bands are distributed according to the mel scale, wherein the sub-band signal is masked using a triangular masking function and/or a trapezoidal masking function.

16. The device according to any one of claims 11-13, characterized in that the at least one memory and the computer code, configured with the at least one processor to cause the device to perform at least the determining of the at least one spectral shape parameter based on the feature vector, are further configured to perform the following:
using a neural network to determine the at least one spectral shape parameter from the feature vector, wherein the feature vector extracted from the audio signal forms an input target vector for the neural network, and the neural network is trained to provide a sub-band spectral shape parameter for the input target vector.

17. The device according to claim 13, characterized in that the spectral shape parameter represents a sub-band gain based on the sub-band energy value.

18. The device according to any one of claims 11-13, characterized in that the at least one frequency-domain component feature in the feature vector comprises at least one of the following:
a group of a plurality of energy levels of the audio signal, wherein each of the plurality of energy levels corresponds to the energy of an overlapping band of the audio signal;
a value representing the centroid of the frequency-domain spectrum of the audio signal; and
a value representing the degree of flatness of the frequency-domain spectrum.

19. The device according to any one of claims 11-13, characterized in that the at least one time-domain component feature in the feature vector comprises at least one of the following:
a gradient index based on the sum of the gradients at points in the audio signal at which the waveform of the audio signal changes direction;
a ratio of the energy of a frame of the audio signal to the energy of a previous frame of the audio signal; and
a classification of the audio signal as active or inactive by a voice activity detector.

20. The device according to any one of claims 11-13, characterized in that the at least one memory and the computer code are further configured to combine the sub-band signal with the audio signal to produce a bandwidth-extended audio signal.

21. A machine-readable medium storing program code which, when executed by a processor, implements the method according to any one of claims 1-10.



 

Same patents:

FIELD: physics, acoustics.

SUBSTANCE: invention relates to means of synchronising wireless headphones. In one embodiment, one speaker operates as the main speaker and other speaker operates as a secondary speaker. The main speaker receives digital audio data from a source and, in addition to reproducing digital audio received from the source, relays the digital audio to the secondary speaker. The main speaker additionally sends synchronisation data to the secondary speaker, such as data indicating the buffer status or the reproduction position of the main speaker. The secondary speaker uses the synchronisation data from the main speaker to control, for example, the status of its buffer or reproduction position so that the two speakers reproduce audio synchronously (for example, within 30 ms). In one embodiment, the main speaker uses a connection-based protocol, such as TCP/IP, in order to transmit buffered audio data to the secondary speaker, and uses a connectionless protocol, such as UDP or ICMP, for synchronisation data. Furthermore, the speakers can change the main and secondary speaker roles.

EFFECT: reducing Haas effect.

36 cl, 6 dwg

FIELD: radio engineering, communication.

SUBSTANCE: method of picking up speech signal in presence of interference, which comprises converting an input mixture of an acoustic signal and interference into an electrical signal, filtering with a band-pass filter to obtain a mixture of a speech signal and interference with a given bandwidth, which is amplified in a low frequency amplifier; an analogue-to-digital converter (ADC) generates readings of the mixture of the signal and interference in digital form and transmits said readings to a computing device, which forms pairs of sums of amplitudes of the readings in a certain manner and calculates signal amplitudes for each moment in time using the obtained summation results by solving corresponding systems of linear equations.

EFFECT: high efficiency of picking up a speech signal in the presence of interference.

2 dwg, 1 tbl

FIELD: physics, video.

SUBSTANCE: invention relates to a method of sounding video broadcasts. A source media stream is transmitted from a video source to commentator devices. Unique media streams are formed, which are then mixed with the source media stream with a calculated delay. Said streams are then transmitted through a broadcaster to viewers in the form of separate channels which can be switched with each other.

EFFECT: forming separate channels with the same video track and different audio tracks from different commentators while enabling automatic or manual switching between channels, wherein the commentaries can be in different languages.

2 cl, 1 dwg

FIELD: radio engineering, communication.

SUBSTANCE: invention relates to mobile computing devices. The result is achieved by receiving an indication to a touch anywhere on a touch screen interface of a mobile computing device; upon reception of an indication to a touch anywhere on the touch screen interface, activating a listening mechanism of a speech recognition unit and displaying dynamic visual feedback of a measured sound level of a spoken utterance received by the speech recognition unit, wherein the displayed visual feedback is rendered as centred around an area on the touch screen at which a touch is received.

EFFECT: providing a maximum size of a target area on a screen to initiate listening of a speech recognition unit.

15 cl, 7 dwg

Electronic computer // 2523220

FIELD: information technology.

SUBSTANCE: electronic computer has random access memory, the output of which is connected to an arithmetic logic unit, as well as rows of photocells which respond to red light and are connected through switches to the random access memory. The output of the arithmetic logic unit is connected through switches to thirty comparison units. Outputs of the thirty comparison units are connected to control electrodes of thirty switches, respectively. A pulse generator is connected to inputs of the thirty switches, outputs of which are connected to inputs of the thirty switches, respectively. Outputs of the thirty switches are connected to the random access memory of a bit-map display.

EFFECT: computer speech recognition using lip reading.

6 dwg

FIELD: radio engineering, communication.

SUBSTANCE: device for embedding digital information into an audio signal includes, connected to its first input, a first input of a subtractor and a band-pass filter, the output of which is connected to the first input of an adder and the second input of the subtractor, the output of which is connected, through series-connected residual signal analyser and replacement signal former, to the second input of the adder, the output of which is connected to the first input of an additional adder and a signal divider, the output of which is connected, through series-connected amplitude spectrum computer, global masking threshold computer and subsidiary signal former, to the second input of the additional adder, wherein additional inputs of the replacement signal former and the subsidiary signal former are connected to the second input of the embedding device, and the output of the additional adder is connected to the output of the embedding device.

EFFECT: minimising duration of transmission with minimum possible coding redundancy of a separate information message.

6 dwg

FIELD: radio engineering, communication.

SUBSTANCE: method of transmitting speech signals includes breaking down the speech signal into equal fragments, encoding said fragments, sending to a communication channel, decoding and constructing the speech signal from the fragments, the method being characterised by that at the transmitting side, two speech signals are selected instead of one speech signal; each is broken down into equal fragments; all even fragments are deleted from the first speech signal and all odd fragments are deleted from the second speech signal; the remaining fragments are encoded and sent to a communication channel; at the receiving side, for the first speech signal, even fragments deleted at the transmitting side are replaced with previously obtained odd fragments from the first speech signal, and for the second speech signal, odd fragments deleted at the transmitting side are replaced with previously obtained even fragments from the second speech signal.

EFFECT: transmitting speech signals in a narrowband channel, transmitting two speech signals with acceptable quality over one available channel.

2 cl, 10 dwg

FIELD: physics.

SUBSTANCE: user system for creating an atmosphere, such as a lighting system, can automatically create a certain atmosphere by simply using a keyword which is entered at the input of the system. A keyword, for example "eat", "read", "relax", "sunny", "cool", "party", "Christmas", "beach", may be spoken or typed by the user and may enable the user to find and explore numerous atmospheres in an interactive and playful way. Finding atmosphere elements related to the keyword may be done in various ways according to versions of the invention. The invention also allows a non-expert in designing or creating atmosphere scenes to control the creation of a desired atmosphere in an atmosphere creation system.

EFFECT: broader functional capabilities of creating a lighting atmosphere, easier control of said atmosphere.

14 cl, 6 dwg

FIELD: physics.

SUBSTANCE: method comprises configuring an audio encoder to operate in different operating modes such that if the active operating mode is a first operating mode which depends on a set of available frame coding modes does not overlap with a first subset of time-domain coding modes, and overlaps with a second subset of frequency-domain coding modes, whereas if the active operating mode is a second operating mode which depends on a set of available frame coding modes overlaps with both subsets, i.e. the subset of time-domain coding modes as well as the subset of frequency-domain coding modes.

EFFECT: reduced delay and high efficiency of encoding from the view point of the ratio of speed and distortion.

19 cl, 6 dwg

FIELD: physics, acoustics.

SUBSTANCE: group of inventions relates to means of analysing time variations of audio signals. Disclosed is an apparatus for obtaining a parameter describing variation of a signal characteristic of a signal based on actual transform-domain parameters describing an audio signal in transform-domain which includes a parameter determiner. The parameter determiner is configured to determine one or more model parameters of a transform-domain variation model describing evolution of the transform-domain parameters depending on one or more model parameters representing a signal characteristic, such that a model error, representing deviation between a modelled temporal evolution of the transform-domain parameters and evolution of the actual transform-domain parameters, is brought below a predetermined threshold value or minimised.

EFFECT: designing highly reliable means for obtaining a parameter describing time variation of a signal characteristic.

27 cl, 9 dwg

FIELD: physics, video.

SUBSTANCE: invention relates to means of processing multi-channel audio or video signals using a variable prediction direction. Two audio or video channels are combined to obtain a first combination signal as a mid signal and a residual signal which can be obtained using a predicted side signal obtained from the mid signal. The first combination signal and the residual prediction signal are encoded and written into a data stream together with the prediction information obtained by an optimiser based on an optimisation target and a prediction direction indicator indicating a prediction direction associated with the residual signal. A decoder uses the prediction residual signal, the first combination signal, the prediction direction indicator and the prediction information to obtain a decoded first channel signal and a decoded second channel signal. In an encoder example or in a decoder example, a real-to-imaginary transform can be applied for estimating the imaginary part of the spectrum of the first combination signal.

EFFECT: high audio or video quality.

19 cl, 31 dwg, 2 tbl

FIELD: radio engineering, communication.

SUBSTANCE: invention relates to means of stereo encoding and decoding using complex prediction in the frequency domain. A decoding method for obtaining an output stereo signal from an input stereo signal encoded by complex prediction coding and comprising first frequency-domain representations of two input channels, comprises the upmixing steps of: (i) computing a second frequency-domain representation of a first input channel; and (ii) computing an output channel based on the first and second frequency-domain representations of the first input channel, the first frequency-domain representation of the second input channel and a complex prediction coefficient.

EFFECT: high speed of encoding in the range of high bit transfer rates.

14 cl, 19 dwg, 1 tbl

FIELD: information technology.

SUBSTANCE: in a selective signal encoder, an input signal is first encoded (1004)using a core layer encoder to produce a core layer encoded signal. The core layer encoded signal is decoded (1006) to produce a reconstructed signal, and an error signal is generated (1008) as the difference between the reconstructed signal and the input signal. The reconstructed signal is compared (1010) with the input signal. One of two or more enhancement layer encoders is selected (1014, 1016) depending on the comparison and is used to encode the error signal. The core layer encoded signal, the enhancement layer encoded signal and a selection indicator are output (1018) to a channel (e.g., for transmission or storage).

EFFECT: high-quality speech and audio reproduction at acceptable low data rates.

18 cl, 10 dwg

FIELD: information technology.

SUBSTANCE: disclosed is an encoding device which can accurately specify a band having a large error among all bands by using a small calculation amount. The device includes: a first position identification unit (201) which uses a first layer error conversion coefficient indicating an error of decoding signal for an input signal so as to search for a band having a large error in a relatively wide bandwidth in all the bands of the input signal and generates first position information indicating the identified band; a second position identification unit (202) which searches for a target frequency band having a large error in a relatively narrow bandwidth in the band identified by the first position identification unit (201) and generates second position information indicating the identified target frequency band; and an encoding unit (203) which encodes a first layer decoding error conversion coefficient contained in the target frequency band. The first position information, the second position information, and the encoding unit are transmitted to a communication partner.

EFFECT: determining band with considerable coding error from all bands with low computational complexity.

8 cl, 45 dwg
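
The coarse-then-fine search described in the abstract above can be sketched as below. This is only an illustration under assumptions: the error measure (sum of squared coefficients), the use of non-overlapping bands and every name in the code are hypothetical and are not taken from the source.

```python
from typing import List, Sequence, Tuple


def band_energies(coeffs: Sequence[float], start: int, stop: int,
                  width: int) -> List[Tuple[int, float]]:
    """Return (band start, energy) for consecutive bands of `width` bins."""
    return [
        (s, sum(c * c for c in coeffs[s:min(s + width, stop)]))
        for s in range(start, stop, width)
    ]


def locate_target_band(error_coeffs: Sequence[float], wide_width: int,
                       narrow_width: int) -> Tuple[int, int, List[float]]:
    """Two-stage search for the band with the largest error energy."""
    # Stage 1: coarse search over the full band in wide steps.
    wide = band_energies(error_coeffs, 0, len(error_coeffs), wide_width)
    first_pos = max(range(len(wide)), key=lambda i: wide[i][1])
    wide_start = wide[first_pos][0]
    wide_stop = min(wide_start + wide_width, len(error_coeffs))

    # Stage 2: fine search restricted to the wide band found in stage 1.
    narrow = band_energies(error_coeffs, wide_start, wide_stop, narrow_width)
    second_pos = max(range(len(narrow)), key=lambda i: narrow[i][1])
    target_start = narrow[second_pos][0]
    target_stop = min(target_start + narrow_width, wide_stop)

    # Only the coefficients in the target band would be passed on for encoding.
    return first_pos, second_pos, list(error_coeffs[target_start:target_stop])
```

With N bins, the search evaluates N / wide_width wide bands plus wide_width / narrow_width narrow bands instead of every narrow band across the whole spectrum, which is where the reduced calculation amount would come from.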

FIELD: information technology.

SUBSTANCE: audio encoder (100) for encoding frames of a sampled audio signal to obtain encoded frames, wherein a frame consists of a plurality of time domain audio samples. The audio encoder (100) includes a predictive coding analysis stage (110) for determining information on coefficients of a synthesis filter and information on a prediction domain frame based on a frame of audio samples. The audio encoder (100) further includes a domain converter (120) for converting a frame of audio samples to the frequency domain to obtain a frame spectrum, and an encoding domain computer (130) for deciding whether encoded data for a frame are based on the information on coefficients and the information on the prediction domain frame, or on the frame spectrum. The audio encoder (100) includes a controller (140) for determining information on a switching coefficient when the encoding domain computer decides that encoded data of the current frame are based on the information on coefficients and the information on the prediction domain frame while data of the previous frame were encoded based on the spectrum of the previous frame, and a redundancy reducing encoder (150) for encoding the information on the prediction domain frame, the information on coefficients, the information on the switching coefficient and/or the frame spectrum.

EFFECT: improved concept of audio encoding using encoding domain switching.

14 cl, 29 dwg

FIELD: information technology.

SUBSTANCE: when a frame immediately preceding a target encoding frame to be encoded by a first encoding unit operating according to a linear predictive coding scheme is encoded by a second encoding unit operating according to a coding scheme different from the linear predictive coding scheme, the target encoding frame can be encoded according to the linear predictive coding scheme by initialising the internal state of the first encoding unit. Consequently, encoding processing performed according to a plurality of coding schemes including the linear predictive coding scheme and a coding scheme different from the linear predictive coding scheme can be realised.

EFFECT: improved speech quality.

7 cl, 5 dwg

FIELD: information technology.

SUBSTANCE: a method is proposed for compensating for lost audio signal frames in the MDCT domain, including: step a, in which, when the currently lost frame is frame P, a set of frequencies to be predicted is obtained; for each frequency in this set, the phases and amplitudes of multiple frames preceding frame (P-1) in the MDCT-MDST domain are used to predict the phase and amplitude of frame P, and the predicted phase and amplitude are used to obtain the MDCT coefficient of frame P at that frequency; step b, in which, for frequencies outside the set of predicted frequencies, the MDCT coefficients of multiple frames preceding frame P are used to calculate the MDCT coefficients of frame P at those frequencies; and step c, in which an inverse modified discrete cosine transform (IMDCT) is applied to the MDCT coefficients of frame P at all frequencies to obtain the time domain signal of frame P. A frame loss compensator is also proposed. The invention has the advantages of no delay, low computational cost, low memory use and simple implementation.

EFFECT: increased efficiency of compensation for losses of sound signal frames.

24 cl, 8 dwg
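
Step a of the method above might look roughly like the following sketch. The abstract does not give the prediction rule, so several things here are assumptions: treating MDCT + j·MDST as a complex spectrum, extrapolating phase linearly from frames P-3 and P-2, and carrying the amplitude over from frame P-2. The function and argument names are hypothetical.

```python
import cmath
import math
from typing import List, Sequence


def predict_lost_mdct(mdct_p3: Sequence[float], mdst_p3: Sequence[float],
                      mdct_p2: Sequence[float], mdst_p2: Sequence[float]
                      ) -> List[float]:
    """Predict MDCT coefficients of lost frame P for the selected frequencies.

    Inputs hold MDCT and MDST coefficients of frames P-3 and P-2 at the
    frequencies in the prediction set (one value per frequency).
    """
    predicted = []
    for c3, s3, c2, s2 in zip(mdct_p3, mdst_p3, mdct_p2, mdst_p2):
        z3 = complex(c3, s3)  # frame P-3 in the MDCT-MDST domain
        z2 = complex(c2, s2)  # frame P-2 in the MDCT-MDST domain

        # Assumed rule: phase advances by the same amount every frame,
        # so frame P is two steps beyond frame P-2.
        phase_step = cmath.phase(z2) - cmath.phase(z3)
        phase_p = cmath.phase(z2) + 2.0 * phase_step

        # Assumed rule: amplitude is held at its last known value.
        amplitude_p = abs(z2)

        # The MDCT coefficient is the real part of the predicted value.
        predicted.append(amplitude_p * math.cos(phase_p))
    return predicted
```

The coefficients produced this way would then be combined with those of step b and passed to the IMDCT of step c.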

FIELD: physics, acoustics.

SUBSTANCE: invention relates to systems for encoding audio signal sources. Provided is subband block-based harmonic transposition, where the time block of complex discrete values of subbands is processed by common phase modification. Superposition of multiple modified discrete values yields the resultant effect of limiting undesirable cross products, making it possible to use coarser frequency resolution and/or lesser degree of oversampling. In one embodiment, the invention further includes a window function suitable for use with cross product-enhanced, subband block-based HFR. A hardware embodiment may include an analysing filter unit (101), a control data-configurable subband processing module (102) and a synthesising filter unit (103).

EFFECT: efficient implementation of high-frequency reconstruction (HFR) through enhancement with cross products, where a new component with frequency QΩ+rΩ0 is generated based on existing components with frequencies Ω and Ω+Ω0.

63 cl, 9 dwg

FIELD: physics, computer engineering.

SUBSTANCE: present invention relates to signal processing means. An encoder sets an interval of 16 frames as the section to be processed and, for each section to be processed, outputs high-frequency band encoded data for obtaining the high-frequency band component of an input signal and low-frequency band encoded data obtained by encoding the low-frequency band signal of the input signal. For each frame, a coefficient used in estimating the high-frequency band component is selected, and the section to be processed is divided into continuous frame segments, each consisting of consecutive frames for which the same coefficient is selected. The high-frequency band encoded data include information indicating the length of each continuous frame segment, information indicating the number of continuous frame segments in the section to be processed, and a coefficient index indicating the coefficient selected in each continuous frame segment.

EFFECT: improved sound quality with frequency band expansion.

23 cl, 51 dwg
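
The division of a section into continuous frame segments, as described in the abstract above, amounts to run-length grouping of the per-frame coefficient indices. A minimal sketch follows; the dictionary layout and the names are assumptions made for illustration, not the bitstream format of the invention.

```python
from itertools import groupby
from typing import Dict, List, Sequence


def segment_coefficient_indices(frame_indices: Sequence[int]) -> Dict[str, object]:
    """Group consecutive frames that share the same coefficient index.

    `frame_indices` holds one selected coefficient index per frame of the
    section to be processed (16 frames in the abstract).
    """
    # groupby yields one run per stretch of identical neighbouring indices.
    segments = [(index, len(list(run))) for index, run in groupby(frame_indices)]
    return {
        "num_segments": len(segments),
        "segment_lengths": [length for _, length in segments],
        "coefficient_indices": [index for index, _ in segments],
    }
```

For a section in which the selected index changes only a few times, this representation carries a handful of lengths and indices rather than one index per frame.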

FIELD: radio engineering, communication.

SUBSTANCE: invention relates to signal processing means. The system receives an encoded low-frequency band signal and encoded energy information used for frequency shift of the encoded low-frequency band signal. The low-frequency band signal is decoded and energy suppression of the decoded signal is smoothed. The smoothed low-frequency band signal is frequency shifted to generate a high-frequency band signal. The low-frequency band signal and the high-frequency band signal are then merged and output.

EFFECT: high quality of the decoded signal.

20 cl, 14 dwg
