RussianPatents.com

Method and device for encoding an audio signal with usage of harmonics extraction

IPC classes for russian patent Method and device for encoding an audio signal with usage of harmonics extraction (RU 2289858):

G10L19 - Speech or audio signal analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; coding or decoding of speech or audio signals, e.g. for compression or expansion, using source-filter models or psychoacoustic analysis
Another patents in same IPC classes:
Method for simulating auditory patient perception of acoustic signal after cochlear implantation / 2277375
Method involves applying analog-to-digital transformation to an input signal expressed as a word, dividing the transformed signal spectrum into odd and even frequency bands, summing the odd bands, carrying out digital-to-analog transformation of the resulting summed signal, and training its perception by preliminary familiarization with the word presented for listening, followed by testing. Spectrum division is based on the tonotopic law of frequency distribution along the cochlea axis. Odd-numbered frequency bands are arranged at equal distances along the basilar membrane in agreement with the normal tonotopic law of frequency distribution along the cochlea axis. At least three odd spectrum bands are summed. Training is carried out by multiple repetition of the word presented for listening until it is unambiguously correlated with the known word meaning given during preliminary familiarization. The same words are presented in testing and training.
Multi-mode encoding device / 2262748
The speech compression system encodes a speech signal into a bitstream for later decoding into synthesized speech. It contains a full-rate codec, a half-rate codec, a quarter-rate codec and an eighth-rate codec, which are selectively activated based on rate selection. The full- and half-rate codecs are additionally activated selectively based on type classification. Each codec is selectively activated to encode and decode the speech signal at different bit rates, emphasizing different aspects of the speech signal to increase the overall quality of the synthesized speech signal.
Method and device for reproducing speech signals and method for transferring said signals / 2255380
During encoding, speech signals are divided into frames, and the divided signals are encoded on a frame basis to output encoding parameters such as linear spectral pair parameters, pitch, voiced/unvoiced decision or spectral amplitude. To obtain modified encoding parameters, the encoding parameters are interpolated to compute modified parameters associated with time instants based on the frames. During decoding, harmonic waves and noise are synthesized on the basis of the modified encoding parameters, and synthesized speech signals are output.
Improved spectrum transformation and convolution in subband spectrum / 2251795
The method for generating a high-frequency reconstructed version of a low-frequency input signal by high-frequency spectral reconstruction with a digital filter-bank system is based on splitting the low-frequency input signal through an analysis filter bank to produce complex subband signals in channels, obtaining a series of successive complex subband signals in the channels of the reconstruction range, correcting the envelope to produce a predetermined spectral envelope in the reconstruction range, and combining said series of signals through a synthesis filter bank.
Method and system for abolishing quantizer saturation during communication with data transfer in speech signal band / 2249860
In the method and system for decreasing prediction error, an averaging device for calculating the transfer coefficient is used, together with a pulse detector, a signal classifier, decision-making means and a transfer-coefficient compensation device; the compensated transfer coefficient of the quantizer sample is determined during coding/decoding of data transferred in the speech signal band using a vector linear non-adaptive prediction-type algorithm.
Method for compaction and decompaction of speech messages / 2244963
Method comprises preliminarily forming, at the receiving and transmitting sides, R matrices of permitted vectors, each of dimension m2 x m1 with unit and zero elements; forming an initial N x N matrix from the one-dimensional analog speech signal; converting the resulting matrix to digital form; forming rectangular matrices of dimensions N x m and m x N, being the digital representation of the initial matrix, from elements of the rows of permitted vectors; transmitting the elements of those rectangular matrices through a digital communication channel; correcting errors at the receiving side by testing whether element groups of the received rectangular matrices match row elements of the preliminarily formed matrices of permitted vectors; and then performing the inverse operations to decompact the speech messages. The method is especially suitable for telephone calls over digital communication systems at rates of 6-16 kbit/s.
Method and device for encoding an audio signal with usage of harmonics extraction / 2289858
In accordance with the audio signal encoding method, harmonic components are extracted from the received pulse-code-modulated (PCM) audio data using the results of the fast Fourier transform obtained in psychoacoustic model 2. The extracted harmonic components are then removed from the received PCM audio data. After that, the audio data from which the extracted harmonic components have been removed are subjected to the modified discrete cosine transform and quantization.
Method and device for transmission of speech activity in distributed voice recognition system / 2291499
The distributed voice recognition system has a local voice recognition (VR) engine in the user unit and a VR server engine in the server. The local VR engine has a feature extraction (FE) module that extracts features from voice signals. A voice activity detector (VAD) module detects voice activity in the voice signal. An indication of voice activity is transmitted from the user unit to the server ahead of the features.
Method for analysis and synthesis of speech / 2296377
Method includes: analog-to-digital conversion of the speech signal; segmentation of the converted signal into elementary speech fragments; determining the voicing of each fragment; determining, for each voiced elementary speech segment, the pitch frequency and spectrum parameters; analyzing and changing the spectrum parameters; and synthesizing the speech sequence. The technical result is achieved because, before synthesis, the pitch periods of each voiced segment are adjusted to zero starting phase by shifting the sampling start instant in each pitch period beyond the point where the envelope crosses zero amplitude; distortions appearing at the junctions of pitch periods are smoothed out; and, when an additional sample appears at the end of a modified pitch period, that period is resampled while preserving its original length.
Method for reverse filtration, method for synthesizing filtration, device for reverse filtration, device for synthesizing filtration and tools containing such devices / 2297049
In accordance with the invention, the input signal is filtered to generate a first filtered signal, and the first filtered signal is combined with said input signal to produce a difference signal. The stage of filtering the input signal to produce the first filtered signal comprises producing at least one delayed, amplified and filtered signal, which in turn comprises: storing a signal related to said input signal in a buffer; extracting a delayed signal from the buffer; filtering the signal to form at least one second filtered signal, the filtering being stable and causal; and amplifying at least one signal by an amplification coefficient. The method also comprises producing said first filtered signal on the basis of at least one such delayed, amplified and filtered signal.

FIELD: method and device for efficient compression of an audio signal into a level III MPEG-1 audio signal at a low bit rate.

SUBSTANCE: in accordance with the audio signal encoding method, harmonic components are extracted from the received pulse-code-modulated (PCM) audio data using the results of the fast Fourier transform obtained in psychoacoustic model 2. The extracted harmonic components are then removed from the received PCM audio data. After that, the audio data from which the extracted harmonic components have been removed are subjected to the modified discrete cosine transform and quantization.

EFFECT: efficient compression of the signal at a low bit rate by compressing only the changing part of the signal by means of the modified discrete cosine transform.

5 cl, 11 dwg

 

The technical field

The present invention relates to a method of compressing an audio signal, and more particularly to a method and apparatus for efficient compression of an audio signal into level III MPEG-1 audio at a low bit rate.

Prior art

MPEG-1 (Moving Picture Experts Group 1) establishes requirements for digital video and digital audio compression and is supported by the International Organization for Standardization (ISO). MPEG-1 audio is used to compress 16-bit audio sampled at 44.1 kHz and recorded on a 60- or 72-minute compact disc (CD), and is classified into three levels according to the compression method and the complexity of the codec (coder-decoder).

Level III is the most complex: it uses significantly more filters than level II and applies Huffman coding. Encoding at 112 kbit/s yields sound of excellent quality. At 128 kbit/s the sound is very close to the original. At 160 kbit/s and 192 kbit/s the quality is such that the human ear cannot distinguish it from the original sound. Level III MPEG-1 audio is usually denoted MP3 audio.

MP3 audio is formed by the discrete cosine transform (DCT), bit distribution based on psychoacoustic model 2, quantization, etc. More specifically, to keep the number of bits used to compress the audio data at a minimum, the modified DCT (MDCT) is performed using psychoacoustic model 2.

In audio compression methods the human ear is the most important factor. The human ear cannot hear a sound whose intensity is at or below a certain level. If someone speaks loudly in an office, one can easily recognize who is talking. However, if an airplane is flying over at that moment, the conversation cannot be heard. Even just after the airplane has passed, the conversation still cannot be heard, because of delayed masking. Accordingly, psychoacoustic model 2 selects data having a volume equal to or greater than the masking threshold from among the data having a volume equal to or greater than the minimum audibility threshold corresponding to a quiet environment. The selection is performed in each subband.

However, when the audio signal is compressed at a low bit rate, below 64 kbit/s, psychoacoustic model 2 is not suitable, because the number of bits available for quantizing a pre-echo-type signal is limited. Therefore, in order to overcome this problem of MP3 audio at low bit rates, the present invention provides a method for efficient processing of the audio signal at a low rate by removing the harmonic component from the original signal using the fast Fourier transform (FFT) adopted in psychoacoustic model 2, and compressing only the changing component using the MDCT.
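The idea of removing the harmonic part found by the FFT and transform-coding only the residual can be sketched as follows. This is a minimal illustration with a plain DFT on a tiny frame; all function names are ours, not the patent's, and the real encoder would use the FFT already computed by psychoacoustic model 2.

```python
import cmath
import math

def dft(x):
    """Direct discrete Fourier transform (O(N^2), fine for a demo)."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N))
            for k in range(N)]

def idft(X):
    """Inverse DFT returning real samples."""
    N = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * k * n / N) for k in range(N)).real / N
            for n in range(N)]

def extract_harmonic(x):
    """Keep only the strongest spectral peak (and its mirror bin) as the
    'harmonic' part; everything else is the residual to be MDCT-coded."""
    X = dft(x)
    N = len(x)
    k_peak = max(range(1, N // 2), key=lambda k: abs(X[k]))
    H = [0j] * N
    H[k_peak], H[N - k_peak] = X[k_peak], X[N - k_peak]
    harmonic = idft(H)
    residual = [a - b for a, b in zip(x, harmonic)]
    return harmonic, residual

# A pure tone plus a small transient: the tone is removed, the transient stays.
N = 64
x = [math.sin(2 * math.pi * 5 * n / N) for n in range(N)]
x[10] += 0.3  # transient component
h, r = extract_harmonic(x)
tone_energy = sum(v * v for v in x)
residual_energy = sum(v * v for v in r)
print(residual_energy < 0.2 * tone_energy)  # → True
```

After subtraction, only the low-energy residual remains for the transform coder, which is the point of the scheme: the few bits available at low rates are spent on the changing part of the signal.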

In the FFT process adopted in the conventional psychoacoustic model, the signal is only analyzed, and the FFT result itself is not used further. Because the FFT result is not used for compressing the signal, this can be seen as an unnecessary waste of resources.

Korean patent publication No. 1995-022322 describes a method of bit allocation using a psychoacoustic model. However, the known method differs from the method of the present invention, which increases compression efficiency by removing the harmonic component from the original signal using the FFT adopted in the psychoacoustic model.

Korean patent publication No. 1998-072457 describes a method and device for signal processing in psychoacoustic model 2 in which the amount of computation is considerably reduced by lowering the computational load when compressing audio. That is, the signal processing method includes a step of obtaining individual masking threshold values using the FFT, a step of selecting a global masking threshold value, and a step of shifting to the next frequency position. This method is similar to the present invention in its use of the FFT values, but differs in that it uses a different quantization method.

U.S. patent No. 5930373 describes how to improve the quality of an audio signal using the residual harmonics of the low-frequency signal. However, the known method and the method of the present invention differ in how they use the residual harmonics.

The invention

To solve the above and other problems, an aspect of the present invention is the provision of a method for efficient processing of an audio signal at a low rate by removing the harmonic component from the original signal using the fast Fourier transform (FFT) used in psychoacoustic model 2, and compressing only the residual changing components using the modified discrete cosine transform (MDCT).

The above and other aspects of the present invention are implemented in a method of encoding an audio signal using harmonics. In this method, pulse-code-modulated (PCM) audio data are first received, and harmonic components are extracted from the received PCM audio data using psychoacoustic model 2. A modified discrete cosine transform (MDCT) is then performed on the received PCM audio data from which the extracted harmonic components have been removed. The MDCT-transformed audio data are then quantized, and the quantized audio data and the extracted harmonic components are packetized into an audio bitstream.

The above and other aspects of the present invention are also implemented in a method of encoding an audio signal using harmonic components, in which the PCM audio data are first received and stored. Psychoacoustic model 2, based on the characteristics of human audibility, is then applied to the stored data to obtain the result of the fast Fourier transform (FFT), information about the perceptual energy of the received data, and information on the distribution of bits used for quantization. After that, harmonic components are extracted from the received PCM audio data using the FFT information. The extracted harmonic components are then encoded, and the encoded harmonic components are decoded. The MDCT is then performed on a number of samples of the received PCM audio data from which the extracted harmonic components have been removed, the number depending on the perceptual energy information. The MDCT-transformed audio data are then quantized by distributing bits in accordance with the bit distribution information. Finally, the quantized MDCT-transformed audio data and the encoded harmonics are packetized into an audio bitstream.

The above and other aspects of the present invention are, moreover, implemented in a device for encoding audio using harmonic components. In this device, a PCM audio data storage module receives and stores the PCM audio data. A psychoacoustic model 2 execution module accepts the PCM audio data from the PCM audio data storage module and executes psychoacoustic model 2 to obtain the FFT information, information about the perceptual energy of the received data, and information on the distribution of bits used for quantization. A harmonic extraction module removes harmonic components from the received PCM audio data using the FFT information. A harmonic encoding module encodes the extracted harmonic components, producing encoded harmonic components. A harmonic decoding module decodes the encoded harmonics. An MDCT module performs the MDCT on the stored PCM audio data from which the decoded harmonic components have been removed, in accordance with the perceptual energy information. A quantization module quantizes the MDCT-transformed audio data in accordance with the bit distribution information. A level III MPEG bitstream generation module converts the quantized MDCT-transformed audio data and the encoded harmonics obtained from the harmonic encoding module into a level III MPEG audio packet.

To implement the above and other aspects, the present invention also provides a computer-readable recording medium on which is stored a computer program for performing the above methods.

Brief description of drawings

Figure 1 - the format of level III MPEG-1 audio;

Figure 2 - block diagram of an apparatus for forming level III MPEG-1 audio;

Figure 3 - block diagram of an algorithm illustrating the computation process in the psychoacoustic model;

Figure 4 - block diagram of the device according to the present invention for forming low-rate level III MPEG-1 audio;

Figure 5 - block diagram of an algorithm illustrating harmonic extraction, harmonic encoding and harmonic decoding in the new psychoacoustic model 2;

Figures 6A, 6B, 6C and 6D - samples of harmonic components extracted at the stages of harmonic component extraction using the FFT in psychoacoustic model 2;

Figure 7 - a table showing the limited frequency ranges varying in accordance with the value of K; and

Figure 8 - block diagram of an algorithm illustrating the process according to the present invention for forming an audio stream by removing the harmonic component.

The preferred embodiment of the invention

According to figure 1, level III MPEG-1 audio consists of audio access units (AAU) 100. The AAU 100 is the minimum unit that can be accessed independently and that compresses and stores data with a prescribed number of samples. The AAU 100 includes a header 110, cyclic redundancy code (CRC) check bits 120, audio data 130 and auxiliary data 140.

The header 110 stores a syncword, ID information, level information, information on whether a protection bit is present, bit-rate information, sampling frequency information, information on whether a padding bit is present, a private bit, mode information, mode extension information, copyright information, information on whether the audio stream is an original or a copy, and pre-emphasis characteristics information.

The CRC 120 is optional. The presence or absence of the CRC 120 is indicated in the header 110, and the length of the CRC 120 is 16 bits.
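For illustration, a bitwise CRC-16 of the kind used for MPEG-1 audio error protection can be computed as below. The generator polynomial x^16 + x^15 + x^2 + 1 (0x8005) and the all-ones initial value match the standard, but this sketch is ours and does not reproduce exactly which header and side-information bits the standard protects; the 4-byte frame is an illustrative value.

```python
def crc16(data, poly=0x8005, init=0xFFFF):
    """Bitwise CRC-16 over a byte sequence, MSB first."""
    crc = init
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ poly) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc

frame = bytes([0xFF, 0xFB, 0x90, 0x00])  # illustrative header bytes
check = crc16(frame)
print(0 <= check <= 0xFFFF)  # → True
```

The 16-bit result is what would be placed in the CRC 120 field; on decoding, a mismatch between the recomputed and stored values signals a corrupted frame.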

The audio data 130 is the area that contains the compressed audio data.

The auxiliary data 140 is data that fills the remaining space when the end of the audio data 130 does not reach the end of the AAU. Arbitrary data other than MPEG audio can be placed in the auxiliary data 140.

Figure 2 is a block diagram of an apparatus for forming level III MPEG-1 audio. A pulse-code-modulation (PCM) audio input module 210 has a buffer for storing the PCM audio data. The PCM audio input module 210 receives PCM audio data in blocks, each consisting of 576 samples.

A psychoacoustic model 2 execution module 220 accepts PCM audio data from the input buffer of the PCM audio input module 210 and executes psychoacoustic model 2. A discrete cosine transform (DCT) module 230 accepts the PCM audio data in blocks of samples and performs the DCT operation simultaneously with the execution of psychoacoustic model 2.

A modified DCT (MDCT) module 240 performs the MDCT using the results of psychoacoustic model 2 and of the DCT performed by the DCT module 230. If the perceptual energy is greater than a predetermined threshold value, the MDCT is performed using a short window. If the perceptual energy is less than the predetermined threshold value, the MDCT is performed using a long window.

In perceptual coding, which is a method of compressing an audio signal, the reproduced signal differs from the original signal; that is, detailed information that people cannot perceive, given the characteristics of the human ear, can be omitted. Perceptual energy refers to the energy that a person can perceive.

A quantization module 250 performs quantization using the bit distribution information obtained by applying psychoacoustic model 2 and the result of the MDCT operation. A level III MPEG-1 bitstream formation module 260 converts the quantized data by placing it into the audio data area of the MPEG-1 bitstream using Huffman coding.

Figure 3 is a block diagram of an algorithm illustrating the computation process in the psychoacoustic model. First, at step 310, PCM audio data are received in blocks, each consisting of 576 samples. Then, at step 320, long windows, each consisting of 1024 samples, or short windows, each consisting of 256 samples, are formed from the received PCM audio data. One packet consists of a set of such samples.

Then, at step 330, a fast Fourier transform (FFT) is performed on the windows generated at step 320, one window at a time.
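The 1024/256-point analysis windowing of steps 320-330 might be sketched as below. This is a simplified, non-overlapping layout with a Hann window and zero padding; real MP3 analysis windows overlap, so the frame counts here are illustrative only.

```python
import math

LONG, SHORT = 1024, 256  # analysis window sizes from the text

def analysis_windows(samples, use_short):
    """Split (and zero-pad) PCM samples into Hann-weighted FFT analysis
    windows of 1024 or 256 points."""
    size = SHORT if use_short else LONG
    pad = (-len(samples)) % size          # pad so the last window is complete
    padded = list(samples) + [0.0] * pad
    hann = [0.5 - 0.5 * math.cos(2 * math.pi * n / size) for n in range(size)]
    windows = []
    for start in range(0, len(padded), size):
        frame = padded[start:start + size]
        windows.append([s * w for s, w in zip(frame, hann)])
    return windows

pcm = [1.0] * 576  # one block of 576 samples
print(len(analysis_windows(pcm, use_short=False)))  # → 1
print(len(analysis_windows(pcm, use_short=True)))   # → 3
```

A 576-sample block thus yields one padded long window or three padded short windows, each of which would then be passed to the FFT one at a time.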

Then, at step 340, psychoacoustic model 2 is applied.

At step 350, the perceptual energy values obtained using psychoacoustic model 2 are applied to the MDCT module, and the MDCT module selects the window to use. The calculated signal-to-masking ratio (SMR) value for each threshold bandwidth is applied to the quantization module to determine the number of bits to be distributed.

Finally, at step 360, the MDCT and quantization are performed using the perceptual energy values and the SMR values.

Figure 4 is a block diagram of a device for forming low-rate level III MPEG-1 audio according to the present invention. A PCM audio storage device 410 has a buffer for storing the PCM audio data. A psychoacoustic model 2 execution module 420 performs the FFT on 1024 samples or 256 samples at a time and outputs perceptual energy information and bit distribution information.

As described above with reference to figure 3, psychoacoustic model 2 is then applied and outputs the perceptual energy information and the bit distribution information, which depends on the SMR. Because the psychoacoustic model 2 execution module 420 performs the FFT, a harmonic extraction module 430 extracts the harmonic component from the FFT, as described below with reference to figures 6A-6D.

A harmonic encoding module 440 encodes the extracted harmonic component and transmits the encoded harmonic component to the level III MPEG-1 bitstream formation module 480. The encoded harmonic component forms MPEG-1 standard audio together with the quantized audio data. The encoding process for the harmonic component is described in detail below.

A harmonic decoding module 450 decodes the encoded harmonic component to obtain PCM data in the time domain. An MDCT module 460 subtracts the decoded harmonic component from the original input PCM signal and performs the MDCT on the result of the subtraction. If the value of the perceptual energy information received from the psychoacoustic model 2 execution module 420 is greater than a predetermined threshold value, the MDCT is performed on 18 samples at a time. If the value is equal to or less than the predetermined threshold value, the MDCT is performed on 36 samples at a time.
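The subtraction of the decoded harmonic part and the 18/36-sample block choice can be sketched as follows. The threshold value and the signals are purely illustrative; the patent does not state a concrete threshold here.

```python
def residual_blocks(pcm, decoded_harmonic, perceptual_energy, threshold=1800.0):
    """Subtract the decoded harmonic component from the input PCM signal
    and split the residual into MDCT input blocks: 18 samples when the
    perceptual energy indicates a transient, 36 samples otherwise."""
    residual = [p - h for p, h in zip(pcm, decoded_harmonic)]
    n = 18 if perceptual_energy > threshold else 36
    return [residual[i:i + n] for i in range(0, len(residual), n)]

# 72 residual samples: four 18-sample blocks or two 36-sample blocks.
pcm = [float(i % 7) for i in range(72)]
harm = [float(i % 7) * 0.5 for i in range(72)]
print(len(residual_blocks(pcm, harm, perceptual_energy=2500.0)))  # → 4
print(len(residual_blocks(pcm, harm, perceptual_energy=100.0)))   # → 2
```

The short 18-sample blocks play the role of the short MDCT window, localizing transients in time; the 36-sample blocks give better frequency resolution for stationary residuals.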

Removal of the harmonic component is performed on the frequency-domain data using the tonal/non-tonal decision conditions and the audibility characteristics defined in psychoacoustic model 2, described in detail below.

A quantization module 470 performs quantization using the bit distribution information received from the psychoacoustic model 2 execution module 420. The level III MPEG-1 bitstream formation module 480 packetizes the harmonic component data generated by the harmonic encoding module 440 and the quantized audio data received from the quantization module 470 to obtain compressed audio data.

Figure 5 is a block diagram of an algorithm illustrating the harmonic extraction step 510, the harmonic encoding step 520 and the harmonic decoding step 530 based on psychoacoustic model 2. The steps performed in psychoacoustic model 2 in figure 5 are the same as those performed in psychoacoustic model 2 in figure 3. Harmonic extraction step 510 uses the result of the FFT performed during the execution of psychoacoustic model 2. At step 520 the extracted harmonic component is encoded into the MPEG-1 bitstream. Harmonic extraction step 510 is described in more detail below with reference to figures 6A-6D.

Figures 6A, 6B, 6C and 6D illustrate the samples extracted at each stage when the harmonic components are removed using the FFT performed in psychoacoustic model 2. When PCM audio data are input, as shown in figure 6A, the FFT is first performed on the received data to determine the sound pressure for each data item. One of the received PCM audio data items for which the sound pressure has been obtained is selected. If the values of the PCM audio data to the left and right of the selected data item are less than the selected value, only the selected PCM audio data item is extracted. This process is applied to all the received PCM audio data.

Sound pressure is the energy value of a sample in the frequency domain. In the present invention only samples having a sound pressure exceeding a predetermined level are defined as harmonic components. The samples extracted accordingly are shown in figure 6B. Then samples having a sound pressure exceeding the predetermined level are extracted. For example, if the predetermined level is set to 7.0 dB, samples having a sound pressure below 7.0 dB are not extracted, leaving only the samples shown in figure 6C. Not all of the remaining samples are considered harmonics: from the remaining samples, some are extracted according to the table in figure 7. The finally remaining samples are shown in figure 6D.
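The two selection stages of figures 6B and 6C (a sample must exceed both neighbours, then survive the 7.0 dB floor) might look like this; the example spectrum values are ours.

```python
def spectral_peaks(power_db, floor_db=7.0):
    """Keep bins that are strict local maxima (greater than both
    neighbours) and whose level is at least floor_db."""
    peaks = {}
    for k in range(1, len(power_db) - 1):
        is_local_max = power_db[k] > power_db[k - 1] and power_db[k] > power_db[k + 1]
        if is_local_max and power_db[k] >= floor_db:
            peaks[k] = power_db[k]
    return peaks

spectrum = [0.0, 12.0, 3.0, 6.5, 2.0, 9.1, 8.0, 1.0]  # levels in dB per bin
print(spectral_peaks(spectrum))  # → {1: 12.0, 5: 9.1}
```

Bin 3 is a local maximum but sits below the 7.0 dB floor, so it is dropped at the second stage, matching the figure 6B to 6C transition described above.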

Fig. 7 is a table showing a limited frequency range that varies according to the value of K, where K represents the location of a sample in the frequency domain. If K is less than 3 or greater than 500, the range value is 0 and the corresponding samples are not selected. Similarly, as shown in Fig. 7, if K is equal to or greater than 3 and less than 63, the range value is set to 2. If K is equal to or greater than 63 and less than 127, the range value is set to 3. If K is equal to or greater than 127 and less than 255, the range value is set to 6. If K is equal to or greater than 255 and less than 500, the range value is set to 12.
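The table of Fig. 7, as described in the prose, can be expressed as a lookup function (the treatment of K exactly equal to 500, which falls between "less than 500" and "greater than 500", is an interpretation):

```python
def range_value(k):
    """Range value per the Fig. 7 table as described in the text.

    k is the sample's location in the frequency domain; handling of
    k == 500 is an assumption, since the prose leaves it unstated.
    """
    if k < 3 or k > 500:
        return 0        # outside the considered band: sample not selected
    if k < 63:
        return 2
    if k < 127:
        return 3
    if k < 255:
        return 6
    return 12           # 255 <= k <= 500

print(range_value(2), range_value(50), range_value(300))  # → 0 2 12
```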

The choice of 500 as the limit is determined by the limit of human audible frequencies, based on the assumption that there is no difference in the quality of the reproduced sound whether the sample values corresponding to frequency indexes of 500 or above are counted or ignored.

Therefore, only the sample values shown in Fig. 6D are extracted and defined as harmonic components.

The harmonic encoding step 520 includes amplitude coding, frequency coding and phase coding. These three encodings use Equations (1) and (2):

where AmpMax denotes the maximum amplitude, Enc_peak_AmpMax denotes the result obtained by encoding AmpMax, and Amp denotes an amplitude other than the maximum amplitude.

In amplitude coding, when the maximum amplitude is set as AmpMax, the maximum amplitude is first encoded on an 8-bit logarithmic scale to obtain Enc_peak_AmpMax, as shown in Equation (1), and every other amplitude Amp is encoded on a 5-bit logarithmic scale to obtain Enc_Amp, as shown in Equation (2).
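Since Equations (1) and (2) themselves are not reproduced in this text, the following is only a plausible sketch of logarithmic amplitude quantization at 8-bit and 5-bit precision; the -96 dB floor and the uniform-in-dB mapping are assumptions:

```python
import math

def encode_log_amplitude(amp, amp_max, bits):
    """Map amp's level in dB relative to amp_max onto a 2**bits-level scale.

    Sketch only: the patent's Equations (1) and (2) are not given here,
    so the dB floor and the uniform mapping are assumptions.
    """
    levels = (1 << bits) - 1
    ratio = max(amp / amp_max, 1e-12)   # guard against log of zero
    db = 20.0 * math.log10(ratio)       # 0 dB when amp == amp_max
    floor_db = -96.0                    # assumed dynamic-range floor
    code = round(levels * (db - floor_db) / -floor_db)
    return min(max(code, 0), levels)

# The maximum amplitude uses 8 bits, all other amplitudes use 5 bits:
print(encode_log_amplitude(1.0, 1.0, 8))  # → 255 (Enc_peak_AmpMax analogue)
print(encode_log_amplitude(0.1, 1.0, 5))  # → 25  (-20 dB on a 5-bit scale)
```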

In frequency coding, only the samples corresponding to K values ranging from 58 (2498 Hz) to 372 (16 kHz) are encoded, in view of human hearing characteristics. Since subtracting 58 from 372 gives 314, each sample frequency is encoded using 9 bits.

Phase coding is carried out using 3 bits.
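The frequency and phase bit budgets described above can be illustrated as follows (the uniform phase quantizer is an assumption; the patent only states the bit counts):

```python
import math

def encode_frequency_index(k):
    """Encode a harmonic's frequency-domain index K (58..372) in 9 bits.

    Since 372 - 58 = 314 < 512, the offset K - 58 always fits in 9 bits.
    """
    if not 58 <= k <= 372:
        raise ValueError("K is outside the coded range")
    return k - 58                       # value in 0..314

def encode_phase(phase, bits=3):
    """Quantize a phase in [0, 2*pi) to 2**bits uniform levels.

    The patent only states that 3 bits are used; a uniform quantizer
    is assumed here for illustration.
    """
    levels = 1 << bits
    return int(phase / (2.0 * math.pi) * levels) % levels

print(encode_frequency_index(372))      # → 314
print(encode_phase(math.pi))            # → 4 (half a turn, 8 levels)
```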

After harmonic extraction and harmonic coding are performed in this way, the coded harmonic components are decoded and then subjected to the modified discrete cosine transform (MDCT).

Fig. 8 is a flowchart illustrating the process of forming an audio stream by extracting harmonic components according to the present invention. First, at step 810, PCM audio data are received and stored. Then, at step 820, psychoacoustic model 2, which uses the characteristics of human hearing, is applied to the stored data to obtain FFT information, perceptual-energy information on the received data, and bit-allocation information used for quantization. After that, at step 830, harmonic components are extracted from the received PCM audio data using the FFT information.

The harmonic components are extracted by the following process. First, the sound pressure of each of the received PCM audio data items is obtained using the FFT information. Then one of the received PCM audio data items for which the sound pressure has been obtained is selected. If the PCM audio data values to the left and right of the selected item are both smaller than the value of the selected item, only the selected PCM audio data item is extracted. This process is applied to all received PCM audio data. From the PCM audio data extracted in the previous step, only those items whose sound pressure exceeds the predetermined value of 7.0 dB are retained. Finally, the harmonic components are extracted from the audio data retained in the previous step according to the predetermined frequency range that depends on their frequency location.

After harmonic extraction at step 830, the extracted harmonic components are encoded and output at step 840. Then, at step 850, the coded harmonic components are decoded.

Then, at step 860, the received PCM audio data, from which the decoded harmonic components have been removed, are subjected to the MDCT according to the perceptual-energy information. In this case, if the perceptual-energy value is greater than a predetermined threshold, the MDCT is performed using a short window, i.e. on 18 samples at a time. If the perceptual-energy value is less than the predetermined threshold, the MDCT is performed using a long window, i.e. on 36 samples at a time.
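The window choice at step 860 can be sketched as follows (the threshold value itself is not given in the text, so the one used below is arbitrary):

```python
def mdct_window_length(perceptual_energy, threshold):
    """Pick the MDCT window from the perceptual energy, per step 860:
    a short window (18 samples at a time) above the threshold and a
    long window (36 samples at a time) below it."""
    return 18 if perceptual_energy > threshold else 36

# 1000.0 is an illustrative threshold, not a value from the patent.
print(mdct_window_length(1200.0, 1000.0))  # → 18 (short window)
print(mdct_window_length(800.0, 1000.0))   # → 36 (long window)
```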

Then, at step 870, the MDCT values are quantized by allocating bits in accordance with the bit-allocation information.
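Step 870 can be illustrated with a simple uniform quantizer (a sketch only: Layer III actually uses non-uniform power-law quantization; this merely shows the bit-allocation information bounding the number of representable levels):

```python
def quantize(mdct_values, bits):
    """Uniformly quantize MDCT coefficients under an allocated bit budget.

    Illustrative only; not the actual Layer III quantizer.
    """
    half = ((1 << bits) - 1) // 2                  # symmetric level range
    peak = max(abs(v) for v in mdct_values) or 1.0
    return [round(v / peak * half) for v in mdct_values]

print(quantize([0.5, -1.0, 0.25, 0.0], 4))  # → [4, -7, 2, 0]
```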

Finally, at step 880, the quantized audio data and the coded harmonic components are subjected to Huffman coding to form an audio signal packet.

Embodiments of the present invention can be written as computer programs and can be implemented on general-purpose digital computers that execute the programs using computer-readable recording media. Examples of computer-readable recording media include magnetic storage media (e.g., ROM (read-only memory), floppy disks, hard disks, etc.), optical recording media (e.g., CD-ROMs or DVDs) and carrier-wave media (e.g., transmission over the Internet).

Although the present invention has been particularly shown and described with reference to preferred embodiments thereof, those skilled in the art will understand that various modifications in form and detail may be made therein without departing from the scope and spirit of the present invention as defined by the appended claims. Therefore, the disclosed embodiments should be considered in an illustrative rather than a restrictive sense. The scope of the present invention is defined not by the above description but by the claims, and all differences within the scope of equivalents of the claims should be construed as being included in the present invention.

Industrial applicability

As described above, the present invention minimizes the number of quantization bits generated when forming a low-bit-rate MPEG-1 Layer III audio stream. By using the FFT results already computed in psychoacoustic model 2, the harmonic components are simply extracted from the input audio signal, and only the remaining varying part is compressed using the MDCT. Therefore, the input audio signal can be efficiently compressed at a low bit rate.

1. A method of encoding an audio signal using harmonic components, comprising: (a) receiving audio data; (b) extracting harmonic components from the received audio data; (c) performing a transform on the received audio data without the extracted harmonic components and quantizing the transformed audio data; and (d) forming an audio signal packet from the quantized audio data and the extracted harmonic components.

2. The method according to claim 1, wherein extracting the harmonic components from the received audio data is performed using psychoacoustic model 2.

3. The method according to claim 1, wherein the transform performed on the received audio data without the extracted harmonic components is a modified discrete cosine transform (MDCT).

4. A method of encoding an audio signal using harmonic components, comprising: (a) receiving and storing pulse-code-modulated (PCM) audio data, and applying psychoacoustic model 2, based on the characteristics of human hearing, to the stored data to obtain fast Fourier transform (FFT) information, perceptual-energy information on the received data, and bit-allocation information used for quantization; (b) extracting harmonic components from the received PCM audio data using the FFT information; (c) encoding the extracted harmonic components, outputting the coded harmonic components and decoding the coded harmonics; (d) performing the MDCT on samples of the received PCM audio data without the decoded extracted harmonic components, the number of samples depending on the value of the perceptual-energy information relative to a predetermined threshold; (e) quantizing the MDCT-transformed PCM audio data without the decoded extracted harmonic components by allocating bits in accordance with the bit-allocation information; and (f) forming an audio signal packet from the quantized audio data without the decoded extracted harmonic components and the encoded extracted harmonic components.

5. The method of encoding an audio signal according to claim 4, wherein step (b) includes: (b1) obtaining the sound pressure of each of the received PCM audio data items using the FFT information; (b2) selecting an item from the set of received PCM audio data for which the sound pressure has been obtained, and extracting the selected PCM audio data item if the PCM audio data values to the right and left of the selected item are smaller than the value of the selected item; (b3) applying step (b2) to all received PCM audio data; (b4) extracting, from the PCM audio data extracted in step (b2) or (b3), only the PCM audio data whose sound pressure is greater than a predetermined sound pressure; and (b5) extracting the PCM audio data that exist within a predetermined frequency range depending on the frequency location of the PCM audio data extracted in step (b4).

6. The method of encoding an audio signal according to claim 5, wherein the predetermined sound pressure of step (b4) is 7.0 dB.

7. The method of encoding an audio signal according to claim 4, wherein in step (d), if the value of the perceptual-energy information is greater than the predetermined threshold, the MDCT is performed on 18 samples at a time, and if the value of the perceptual-energy information is less than the predetermined threshold, the MDCT is performed on 36 samples at a time.

8. A device for encoding an audio signal using harmonic components, comprising: a PCM audio data module receiving and storing PCM audio data; a psychoacoustic model 2 module receiving the PCM audio data from the PCM audio data storage module and applying psychoacoustic model 2 to obtain FFT information, perceptual-energy information on the received data, and bit-allocation information used for quantization; a harmonic extraction module extracting harmonic components from the received PCM audio data using the FFT information; a harmonic encoding module encoding the extracted harmonic components and outputting the coded harmonics; a harmonic decoding module decoding the coded harmonic components; an MDCT module performing the MDCT on the stored PCM audio data without the decoded extracted harmonic components in accordance with the aforementioned perceptual-energy information; a quantization module quantizing the MDCT-transformed audio data in accordance with the bit-allocation information; and an MPEG Layer III bitstream formation module transforming the quantized, MDCT-transformed audio data and the coded harmonics obtained from the harmonic encoding module into an MPEG Layer III audio packet.

9. The audio encoding device of claim 8, wherein the harmonic extraction module performs harmonic extraction through the following steps: obtaining the sound pressure of each of the received PCM audio data items using the FFT information; selecting an item from the set of received PCM audio data for which the sound pressure has been obtained, and extracting the selected PCM audio data item if the PCM audio data values to the right and left of the selected item are smaller than the value of the selected item, this being applied to all received PCM audio data; re-extracting, from the first-extracted PCM audio data, only the PCM audio data whose sound pressure is greater than a predetermined sound pressure; and extracting, from the second-extracted PCM audio data, those PCM audio data that are within a predetermined frequency range depending on their frequency location.

10. The audio encoding device of claim 8, wherein the MDCT module performs the MDCT on 18 samples at a time if the value of the perceptual-energy information is greater than a predetermined threshold, or performs the MDCT on 36 samples at a time if the value of the perceptual-energy information is less than the predetermined threshold.

11. A computer-readable recording medium storing a computer program for encoding an audio signal using harmonic components, the program being executable by a computer to implement the steps of the method according to claim 1.

12. A computer-readable recording medium storing a computer program for encoding an audio signal using harmonic components, the program being executable by a computer to implement the steps of the method according to claim 4.

 
