Method and device for encoding and decoding object-oriented audio signals

FIELD: physics, acoustics.

SUBSTANCE: invention relates to encoding and decoding an audio signal such that a sound image may be localised in any required position for each object audio signal. The method of decoding an audio signal includes extracting from the audio signal a downmix signal and object-oriented additional information; generating channel-oriented additional information based on the object-oriented additional information and control information for reproducing the downmix signal; processing the downmix signal using a decorrelated channel signal; and generating a multichannel audio signal using the processed downmix signal and the channel-oriented additional information.

EFFECT: high accuracy of reproducing object audio signals.

7 cl, 20 dwg

 

Technical field

The present invention relates to a method and apparatus for encoding an audio signal and a method and apparatus for decoding an audio signal in which a sound image can be localized in any desired position for each object audio signal.

Background art

In general, in multichannel audio encoding and decoding methods, a number of channel signals of a multichannel signal are downmixed into a smaller number of channel signals, additional information regarding the original channel signals is transmitted, and a multichannel signal having the same number of channels as the original multichannel signal is restored.

Object-based audio encoding and decoding methods are essentially similar to multichannel audio encoding and decoding methods in that several sound sources are downmixed into a smaller number of sound source signals and additional information regarding the original sound sources is transmitted. However, in object-based audio encoding and decoding methods, object signals, which are the basic elements of a channel signal (for example, the sound of a musical instrument or a human voice), are treated in the same way as channel signals are treated in multichannel audio encoding and decoding methods, and can thus be encoded.

In other words, in object-based audio encoding and decoding methods, each object signal is regarded as the entity to be encoded. In this sense, object-based audio encoding and decoding methods differ from multichannel audio encoding and decoding methods, in which a multichannel audio encoding operation is performed simply on the basis of inter-channel information, regardless of the number of elements of the channel signal to be encoded.

Disclosure of the invention

The technical problem

The present invention provides a method and apparatus for encoding an audio signal and a method and apparatus for decoding an audio signal in which audio signals can be encoded or decoded so that a sound image can be localized at any desired position for each object audio signal.

Technical solution

According to an aspect of the present invention, there is provided a method of decoding an audio signal, the method including: extracting a downmix signal and object-oriented additional information from an audio signal; generating channel-oriented additional information based on the object-oriented additional information and control information for reproducing the downmix signal; processing the downmix signal using a decorrelated channel signal; and generating a multichannel audio signal using the processed downmix signal and the channel-oriented additional information.

According to an aspect of the present invention, there is provided an apparatus for decoding an audio signal, the apparatus including: a demultiplexer which extracts a downmix signal and object-oriented additional information from an audio signal; a parameter converter which generates channel-oriented additional information based on the object-oriented additional information and control information for reproducing the downmix signal; a downmix processor which modifies the downmix signal using a decorrelated downmix signal if the downmix signal is a stereo downmix signal; and a multichannel decoder which generates a multichannel audio signal using the modified downmix signal obtained by the downmix processor and the channel-oriented additional information.

According to another aspect of the present invention, there is provided a method of decoding an audio signal, the method including: extracting a downmix signal and object-oriented additional information from an audio signal; generating channel-oriented additional information and one or more processing parameters based on the object-oriented additional information and control information for reproducing the downmix signal; generating a multichannel audio signal using the downmix signal and the channel-oriented additional information; and modifying the multichannel audio signal using the processing parameters.

According to another aspect of the present invention, there is provided an apparatus for decoding an audio signal, the apparatus including: a demultiplexer which extracts a downmix signal and object-oriented additional information from an audio signal; a parameter converter which generates channel-oriented additional information and one or more processing parameters based on the object-oriented additional information and control information for reproducing the downmix signal; a multichannel decoder which generates a multichannel audio signal using the downmix signal and the channel-oriented additional information; and a channel processor which modifies the multichannel audio signal using the processing parameters.

According to another aspect of the present invention, there is provided a computer-readable recording medium on which is recorded a method of decoding an audio signal, the method including: extracting a downmix signal and object-oriented additional information from an audio signal; generating channel-oriented additional information based on the object-oriented additional information and control information for reproducing the downmix signal; processing the downmix signal using a decorrelated channel signal; and generating a multichannel audio signal using the processed downmix signal obtained by the processing and the channel-oriented additional information.

According to another aspect of the present invention, there is provided a computer-readable recording medium on which is recorded a method of decoding an audio signal, the method including: extracting a downmix signal and object-oriented additional information from an audio signal; generating channel-oriented additional information and one or more processing parameters based on the object-oriented additional information and control information for reproducing the downmix signal; generating a multichannel audio signal using the downmix signal and the channel-oriented additional information; and modifying the multichannel audio signal using the processing parameters.

Advantages

Provided are a method and apparatus for encoding an audio signal and a method and apparatus for decoding an audio signal in which audio signals can be encoded or decoded so that a sound image can be localized at any desired position for each object audio signal.

Brief description of the drawings

The present invention will become more apparent from the following detailed description and accompanying drawings that are given merely for illustrative purposes and therefore should not be construed as limiting the present invention, in which:

Fig.1 is a block diagram of a conventional object-based audio encoding/decoding system;

Fig.2 is a block diagram of the device for decoding an audio signal according to the first embodiment of the present invention;

Fig.3 is a block diagram of the device for decoding an audio signal according to a second embodiment of the present invention;

Fig.4 is a graph for explaining the influence of the difference of the amplitudes and the time difference, which are independent from each other, on the localization of sound images;

Fig.5 is a graph of functions relating the amplitude difference and the time difference required to localize sound images at a predetermined position;

Fig.6 illustrates the format of control data that includes harmonic information;

Fig.7 is a block diagram of the device for decoding an audio signal according to a third embodiment of the present invention;

Fig.8 is a block diagram of an arbitrary downmix gain (ADG) module, which can be used in the device for decoding an audio signal illustrated in Fig. 7;

Fig.9 is a block diagram of the device for decoding an audio signal according to the fourth embodiment of the present invention;

Fig.10 is a block diagram of the device for decoding an audio signal according to the fifth embodiment of the present invention;

Fig.11 is a block diagram of the device for decoding an audio signal according to the sixth embodiment of the present invention;

Fig.12 is a block diagram of the device for decoding an audio signal according to the seventh embodiment of the present invention;

Fig.13 is a block diagram of the device for decoding an audio signal according to an eighth embodiment of the present invention;

Fig.14 is a diagram explaining the application of three-dimensional (3D) information to a frame by the device for decoding an audio signal illustrated in Fig. 13;

Fig.15 is a block diagram of the device for decoding an audio signal according to a ninth embodiment of the present invention;

Fig.16 is a block diagram of the device for decoding an audio signal according to a tenth embodiment of the present invention;

Fig.17-19 are diagrams for explaining a method of decoding an audio signal according to an embodiment of the present invention; and

Fig.20 is a block diagram of the device for encoding an audio signal according to the embodiment of the present invention.

Embodiments of the invention

Hereinafter the present invention will be described in more detail with reference to the accompanying drawings, which show exemplary embodiments of the invention.

The method and apparatus for encoding an audio signal and the method and apparatus for decoding an audio signal according to the present invention can be applied to object-based audio processing operations, but the present invention is not limited to this. In other words, the encoding method and apparatus and the decoding method and apparatus can also be applied to signal processing operations other than object-based audio processing operations.

Fig. 1 is a block diagram of a conventional object-based audio encoding/decoding system. In general, audio signals input to an object-based audio encoding apparatus do not correspond to channels of a multichannel signal but are independent object signals. In this sense, an object-based audio encoding apparatus differs from a multichannel audio encoding apparatus, to which channel signals of a multichannel signal are input.

For example, channel signals such as the front left channel signal and the front right channel signal of a 5.1-channel signal may be input to a multichannel audio encoding apparatus, whereas object audio signals such as a human voice or the sound of a musical instrument (for example, the sound of a violin or a piano), which are smaller entities than channel signals, may be input to an object-based audio encoding apparatus.

As shown in Fig. 1, the object-based audio encoding/decoding system includes an object-based audio encoding apparatus and an object-based audio decoding apparatus. The object-based audio encoding apparatus includes an object encoder 100, and the object-based audio decoding apparatus includes an object decoder 111 and a playback unit 113.

The object encoder 100 receives N object audio signals and generates an object-based downmix signal with one or more channels, together with additional information that includes a number of pieces of information extracted from the N object audio signals, such as energy difference information, phase difference information, and correlation values. The additional information and the object-based downmix signal are combined into a single bitstream, and the bitstream is transmitted to the object-based audio decoding apparatus.
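The role of the object encoder can be illustrated with a minimal Python sketch. It is not the encoder of the invention: the function name `encode_objects`, the mono downmix, and the choice of expressing the energy difference as a level in dB relative to the strongest object are all assumptions made for illustration.

```python
import math

def encode_objects(objects):
    """Downmix N equal-length object signals to one channel and derive
    simple additional information (hypothetical, illustrative format).

    Returns the mono downmix and, for each object, its energy in dB
    relative to the strongest object (an assumed stand-in for the
    energy difference information).
    """
    n_samples = len(objects[0])
    # Downmix: sample-wise sum of all object signals.
    downmix = [sum(obj[i] for obj in objects) for i in range(n_samples)]
    # Energy of each object over the frame.
    energies = [sum(s * s for s in obj) for obj in objects]
    ref = max(energies)
    eps = 1e-12  # avoids log(0) for silent objects
    level_diff_db = [10.0 * math.log10((e + eps) / (ref + eps)) for e in energies]
    return downmix, level_diff_db

# Two objects; the second has half the amplitude of the first.
downmix, level_diff_db = encode_objects([[1.0, -1.0], [0.5, -0.5]])
```

In this example the second object has a quarter of the energy of the first, so its level difference is about -6 dB. A real encoder would compute such parameters per frame and per subband and multiplex them with the downmix into the bitstream.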

The additional information may include a flag indicating whether to perform channel-based audio coding or object-based audio coding, so that it can be determined from the flag which of the two is to be performed. The additional information may also include envelope information, grouping information, silent period information, and delay information regarding the object signals. The additional information may further include object level difference information, inter-object correlation information, downmix gain information, downmix channel level difference information, and absolute object energy information.

The object decoder 111 receives the object-based downmix signal and the additional information from the object-based audio encoding apparatus and, based on them, restores object signals having properties similar to those of the N object audio signals. The object signals generated by the object decoder 111 have not yet been assigned to any position in a multichannel space. Thus, the playback unit 113 assigns each of the object signals generated by the object decoder 111 to a predetermined position in the multichannel space and determines the levels of the object signals, so that the object signals can be reproduced from the respective positions designated by the playback unit 113 with the respective levels determined by the playback unit 113. The control information regarding each of the object signals generated by the object decoder 111 may vary over time, and thus the spatial positions and the levels of the object signals may vary according to the control information.

Fig. 2 is a block diagram of an audio decoding apparatus 120 according to a first embodiment of the present invention. As shown in Fig. 2, the audio decoding apparatus 120 includes an object decoder 121, a playback unit 123, and a parameter converter 125. The audio decoding apparatus 120 may also include a demultiplexer (not shown) which extracts a downmix signal and additional information from an input bitstream; this applies to all the audio decoding apparatuses according to the other embodiments of the present invention.

The object decoder 121 generates a number of object signals based on the downmix signal and modified additional information provided by the parameter converter 125. The playback unit 123 assigns each of the object signals generated by the object decoder 121 to a predetermined position in a multichannel space and determines the levels of the object signals according to control information. The parameter converter 125 generates the modified additional information by combining the additional information and the control information, and then transmits the modified additional information to the object decoder 121.

The object decoder 121 may perform adaptive decoding by analyzing the control information in the modified additional information.

For example, if the control information indicates that a first object signal and a second object signal are assigned to the same position in a multichannel space and have the same level, a conventional audio decoding apparatus decodes the first and second object signals separately and then arranges them in the multichannel space through a mixing/playback operation.

On the other hand, the object decoder 121 of the audio decoding apparatus 120 learns from the control information in the modified additional information that the first and second object signals are assigned to the same position in the multichannel space and have the same level, as if they were a single sound source. Accordingly, the object decoder 121 decodes the first and second object signals by treating them as a single sound source, without decoding them separately. As a result, the decoding complexity is reduced. In addition, because fewer sound sources need to be processed, the complexity of mixing/playback is also reduced.

The audio decoding apparatus 120 can be used effectively when the number of object signals is greater than the number of output channels, because a plurality of object signals are then highly likely to be assigned to the same spatial position.

Alternatively, the audio decoding apparatus 120 can be used when the first object signal and the second object signal are assigned to the same position in the multichannel space but have different levels. In this case, the audio decoding apparatus 120 decodes the first and second object signals by treating them as a single signal, instead of decoding them separately and transmitting the decoded first and second object signals to the playback unit 123. More specifically, the object decoder 121 may obtain information regarding the difference between the levels of the first and second object signals from the control information in the modified additional information and decode the first and second object signals based on the obtained information. As a result, even if the first and second object signals have different levels, they can be decoded as if they were a single sound source.
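The grouping step implied by this adaptive decoding can be sketched as follows. This is only an illustration under assumptions: the control information is modelled as a list of hypothetical `(object_index, position, gain)` tuples, and the function name `group_by_position` is not taken from the specification.

```python
from collections import defaultdict

def group_by_position(control):
    """Group objects that the control information assigns to the same
    position, so that each group can be decoded as one sound source.

    control: list of (object_index, position, gain) tuples
             (hypothetical representation of the control information).
    Returns a dict mapping each position to its (object_index, gain) pairs.
    """
    groups = defaultdict(list)
    for index, position, gain in control:
        # Objects with equal positions end up in the same group; their
        # individual gains are kept so level differences can be applied.
        groups[position].append((index, gain))
    return dict(groups)

# Objects 0 and 1 share the left position but have different levels.
control = [(0, "L", 1.0), (1, "L", 0.5), (2, "C", 1.0)]
groups = group_by_position(control)
```

Here three objects collapse into two sources, so the decoder and the playback unit each process two signals instead of three. The stored gains correspond to the level difference information that lets objects with different levels still be decoded together.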

As another alternative, the object decoder 121 may adjust the levels of the object signals it generates according to the control information, and then decode the object signals whose levels have been adjusted. Accordingly, the playback unit 123 does not need to adjust the levels of the decoded object signals provided by the object decoder 121 but simply arranges them in the multichannel space. In short, since the object decoder 121 adjusts the levels of the object signals according to the control information, the playback unit 123 can readily arrange the object signals in the multichannel space without additionally adjusting their levels. Therefore, the complexity of mixing/playback can be reduced.

According to the embodiment of Fig. 2, the object decoder of the audio decoding apparatus 120 can adaptively perform a decoding operation by analyzing the control information, thereby reducing the decoding complexity and the complexity of mixing/playback. A combination of the above-described methods performed by the audio decoding apparatus 120 may also be used.

Fig. 3 is a block diagram of an audio decoding apparatus 130 according to a second embodiment of the present invention. As shown in Fig. 3, the audio decoding apparatus 130 includes an object decoder 131 and a playback unit 133. The audio decoding apparatus 130 is characterized in that the additional information is provided not only to the object decoder 131 but also to the playback unit 133.

The audio decoding apparatus 130 can perform a decoding operation efficiently even when there is an object signal corresponding to a silent period. For example, second through fourth object signals may correspond to a music period during which musical instruments are played, and a first object signal may correspond to a silent period during which an accompaniment is played. In this case, information indicating which of a plurality of object signals correspond to a silent period may be included in the additional information, and the additional information may be provided to the playback unit 133 as well as to the object decoder 131.

The object decoder 131 can minimize the amount of decoding by not decoding object signals corresponding to a silent period. The object decoder 131 sets such an object signal to a value of 0 and transmits the object signal to the playback unit 133. In general, object signals having a value of 0 are treated in the same way as object signals having a value other than 0 and are thus subjected to a mixing/playback operation.

On the other hand, the audio decoding apparatus 130 transmits additional information, including information indicating which of a plurality of object signals correspond to a silent period, to the playback unit 133, and thereby prevents object signals corresponding to a silent period from being subjected to the mixing/playback operation performed by the playback unit 133. Consequently, the audio decoding apparatus 130 can prevent an unnecessary increase in the complexity of mixing/playback.
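The saving obtained by passing silent period information to the playback unit can be sketched as below. The function `mix`, the gain matrix layout, and the `silent` index set are hypothetical and serve only to show that flagged objects never enter the mixing loop.

```python
def mix(objects, gains, silent):
    """Mix object signals into output channels, skipping silent objects.

    objects: list of equal-length sample lists, one per object.
    gains:   gains[i][c] is the gain of object i into channel c.
    silent:  set of object indices flagged as silent in the additional
             information (hypothetical representation).
    Returns the channel signals and the number of objects actually mixed.
    """
    n_channels = len(gains[0])
    n_samples = len(objects[0])
    out = [[0.0] * n_samples for _ in range(n_channels)]
    processed = 0
    for i, obj in enumerate(objects):
        if i in silent:
            continue  # flagged as silent: no mixing work is spent on it
        processed += 1
        for c in range(n_channels):
            g = gains[i][c]
            for k in range(n_samples):
                out[c][k] += g * obj[k]
    return out, processed

# Object 0 is silent, so only object 1 is mixed into the two channels.
channels, processed = mix(
    [[0.0, 0.0], [1.0, 2.0]],
    [[1.0, 1.0], [0.5, 0.25]],
    silent={0},
)
```

Without the flag, the all-zero signal of object 0 would go through the same multiply-accumulate loop as any other object even though it contributes nothing to the output.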

The playback unit 133 may use mixing parameter information included in the control information to localize the sound image of each object signal in a stereo scene. The mixing parameter information may include amplitude information only, or both amplitude information and time information. The mixing parameter information affects not only the localization of stereo sound images but also the user's psychoacoustic perception of spatial sound quality.

For example, when two sound images generated using a time panning method and an amplitude panning method, respectively, are reproduced at the same location using a 2-channel stereo loudspeaker and compared, it is found that amplitude panning contributes to precise localization of sound images, whereas time panning can produce sounds with a strong sense of space. Thus, if the playback unit 133 uses only amplitude panning to arrange object signals in a multichannel space, it may be able to localize each sound image precisely but may not be able to create as strong a sense of sound as when using time panning. Depending on the type of sound source, users may sometimes prefer precise localization of sound images to a strong sense of sound, or vice versa.

Figs. 4(a) and 4(b) explain the influence of intensity (amplitude difference) and time difference on the localization of sound images when signals are reproduced using a 2-channel stereo loudspeaker. As shown in Figs. 4(a) and 4(b), a sound image can be localized at a given angle according to an amplitude difference and a time difference, which are independent of each other. For example, an amplitude difference of about 8 dB, or a time difference of about 0.5 ms, which is equivalent to the amplitude difference of 8 dB, can be used to localize a sound image at an angle of 20°. Consequently, even if only an amplitude difference is provided as mixing parameter information, sounds with different properties can be obtained by converting the amplitude difference into an equivalent time difference during the localization of sound images.

Fig. 5 illustrates functions relating the amplitude differences and time differences required to localize sound images at angles of 10°, 20° and 30°. The functions illustrated in Fig. 5 can be obtained from Figs. 4(a) and 4(b). As shown in Fig. 5, various combinations of amplitude difference and time difference can be used to localize a sound image at a predetermined position. For example, suppose that an amplitude difference of 8 dB is provided as mixing parameter information for localizing a sound image at an angle of 20°. According to the function illustrated in Fig. 5, the sound image can also be localized at an angle of 20° using a combination of an amplitude difference of 3 dB and a time difference of 0.3 ms. In this case, not only amplitude difference information but also time difference information can be provided as mixing parameter information, thereby improving the sense of space.
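The trade-off between amplitude difference and time difference at 20° can be approximated by a small sketch. It assumes, purely for illustration, that the 20° curve of Fig. 5 is a straight line between the two end points stated in the text (8 dB with no time difference, and 0.5 ms with no amplitude difference); the actual curve is psychoacoustic data and need not be linear. The function name and its parameters are hypothetical.

```python
def equivalent_time_difference_ms(amp_diff_db, full_amp_db=8.0, full_time_ms=0.5):
    """Time difference (ms) that, combined with amp_diff_db, keeps the
    sound image at the same angle.

    Assumption: amplitude and time differences trade off linearly between
    the end points (full_amp_db, 0 ms) and (0 dB, full_time_ms). The
    defaults are the 20 degree values quoted in the text.
    """
    if not 0.0 <= amp_diff_db <= full_amp_db:
        raise ValueError("amplitude difference outside the modelled range")
    return full_time_ms * (1.0 - amp_diff_db / full_amp_db)

# Keep 3 dB of amplitude panning and supply the rest as time panning.
td = equivalent_time_difference_ms(3.0)
```

Under this linear assumption, 3 dB of amplitude difference pairs with about 0.31 ms of time difference, which is close to the 3 dB and 0.3 ms combination read from Fig. 5. An implementation following the specification would use the measured functions instead of a straight line.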

Therefore, to generate sounds with the properties desired by a user during a mixing/playback operation, the mixing parameter information can be appropriately converted so that whichever of amplitude panning and time panning suits the user is performed. That is, if the mixing parameter information includes only amplitude difference information and the user wishes for sounds with a strong sense of space, the amplitude difference information can be converted into equivalent time difference information with reference to psychoacoustic data. Alternatively, if the user wishes for both a strong sense of space and precise localization of sound images, the amplitude difference information can be converted into a combination of amplitude difference information and time difference information equivalent to the original amplitude difference information. Alternatively, if the mixing parameter information includes only time difference information and the user prefers precise localization of sound images, the time difference information can be converted into equivalent amplitude difference information, or into a combination of time difference information and amplitude difference information that satisfies the user's preference by improving both the accuracy of sound image localization and the sense of space.

As a further alternative, if the mixing parameter information includes both amplitude difference information and time difference information and the user prefers precise localization of sound images, the combination of the amplitude difference information and the time difference information can be converted into amplitude difference information equivalent to the original combination. On the other hand, if the mixing parameter information includes both amplitude difference information and time difference information and the user prefers an improved sense of space, the combination of the amplitude difference information and the time difference information can be converted into time difference information equivalent to the original combination.

As shown in Fig. 6, the control information may include mixing/playback information and harmonic information regarding one or more object signals. The harmonic information may include at least one of pitch information, fundamental frequency information, and dominant frequency band information regarding one or more object signals, as well as descriptions of the energy and spectrum of each subband of each of the object signals.

The harmonic information can be used to process an object signal during a playback operation, because the resolution of a playback unit that performs its operation in units of subbands is insufficient.

If the harmonic information includes pitch information regarding one or more object signals, the gain of each of the object signals can be adjusted by attenuating or amplifying a predetermined frequency region using a comb filter or an inverse comb filter. For example, if one of a plurality of object signals is a vocal signal, the object signals can be used for karaoke by attenuating only the vocal signal. Alternatively, if the harmonic information includes dominant frequency region information regarding one or more object signals, a process of attenuating or amplifying the dominant frequency region can be performed. As a further alternative, if the harmonic information includes spectrum information regarding one or more object signals, the gain of each of the object signals can be controlled by performing attenuation or amplification without being restricted by any subband boundaries.
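How pitch information can drive a comb-type filter is sketched below. The feedforward structure y[n] = x[n] - depth * x[n - D] is a standard textbook comb filter, chosen here as an assumption; the specification does not state which filter structure is used. The names `pitch_delay` and `attenuate_harmonics` are hypothetical.

```python
import math

def pitch_delay(sample_rate, pitch_hz):
    """Delay in samples corresponding to one pitch period."""
    return round(sample_rate / pitch_hz)

def attenuate_harmonics(x, delay, depth=1.0):
    """Feedforward comb filter y[n] = x[n] - depth * x[n - delay].

    With depth = 1 it places nulls at all multiples of
    sample_rate / delay, that is, at the pitch and its harmonics.
    Samples before the start of the signal are taken as zero.
    """
    return [
        x[n] - depth * (x[n - delay] if n >= delay else 0.0)
        for n in range(len(x))
    ]

# A 200 Hz tone sampled at 8 kHz is cancelled after one pitch period.
fs, f0 = 8000, 200
delay = pitch_delay(fs, f0)
tone = [math.sin(2 * math.pi * f0 * n / fs) for n in range(400)]
filtered = attenuate_harmonics(tone, delay)
```

A component lying halfway between two harmonics (300 Hz in this setting) is not removed by this filter but doubled in amplitude, which is why the filter has to be tuned to the transmitted pitch of the object to be attenuated, such as a vocal signal in the karaoke case.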

Fig. 7 is a block diagram of an audio decoding apparatus 140 according to a third embodiment of the present invention. As shown in Fig. 7, the audio decoding apparatus 140 uses a multichannel decoder 141 in place of an object decoder and a playback unit, and decodes a number of object signals after the object signals have been appropriately arranged in a multichannel space.

More specifically, the audio decoding apparatus 140 includes a multichannel decoder 141 and a parameter converter 145. The multichannel decoder 141 generates a multichannel signal, whose object signals have already been arranged in a multichannel space, based on a downmix signal and spatial parameter information, which is channel-oriented additional information provided by the parameter converter 145. The parameter converter 145 analyzes additional information and control information transmitted by an audio encoding apparatus (not shown) and generates the spatial parameter information based on the result of the analysis. More specifically, the parameter converter 145 generates the spatial parameter information by combining the additional information with the control information, which includes playback setup information and mixing information. That is, the parameter converter 145 converts the combination of the additional information and the control information into spatial data corresponding to a One-To-Two (OTT) module or a Two-To-Three (TTT) module.
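One way a parameter converter might derive a channel-oriented parameter from object-oriented information is sketched below. It is an illustration under assumptions not stated in the specification: the objects are taken to be mutually uncorrelated, so that channel energies add, and the spatial parameter of an OTT module is represented by a channel level difference in dB. The function name and argument layout are hypothetical.

```python
import math

def channel_level_difference_db(object_energies, gains_ch1, gains_ch2):
    """Level difference (dB) between the two outputs of an OTT module.

    object_energies: energy of each object (from the additional information).
    gains_ch1/2:     playback gain of each object into output channel 1 / 2
                     (from the control information).
    Assumption: objects are uncorrelated, so the energy of a channel is
    the sum of gain^2 * energy over all objects.
    """
    e1 = sum(g * g * e for g, e in zip(gains_ch1, object_energies))
    e2 = sum(g * g * e for g, e in zip(gains_ch2, object_energies))
    eps = 1e-12  # avoids division by zero and log(0) for empty channels
    return 10.0 * math.log10((e1 + eps) / (e2 + eps))

# Object 0 (energy 4) goes to channel 1, object 1 (energy 1) to channel 2.
cld = channel_level_difference_db([4.0, 1.0], [1.0, 0.0], [0.0, 1.0])
```

The point of the sketch is that the parameter is computed from energies and gains alone, so no object signal has to be decoded before the multichannel decoder runs.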

The audio decoding apparatus 140 can perform a multichannel decoding operation into which an object-based decoding operation and a mixing/playback operation are incorporated, and can thus skip the decoding of each object signal. Therefore, the complexity of decoding and/or mixing/playback can be reduced.

For example, when there are 10 object signals and a multichannel signal obtained from the 10 object signals is to be reproduced by a 5.1-channel loudspeaker system, a conventional object-based audio decoding apparatus generates decoded signals corresponding to the 10 object signals based on a downmix signal and additional information, and then generates a 5.1-channel signal by appropriately arranging the 10 object signals in a multichannel space so that they suit the 5.1-channel loudspeaker environment. However, it is inefficient to generate 10 object signals in the course of generating a 5.1-channel signal, and this problem becomes more serious as the difference between the number of object signals and the number of channels of the multichannel signal to be generated increases.

On the other hand, in the embodiment of Fig. 7, the audio signal decoding device 140 generates spatial parameter information suitable for a 5.1-channel signal based on the additional information and the control information, and passes the spatial parameter information and the downmix signal to the multi-channel decoder 141. The multi-channel decoder 141 then generates a 5.1-channel signal using the spatial parameter information and the downmix signal. In other words, when 5.1 channels are to be output, the audio signal decoding device 140 can generate a 5.1-channel signal directly from the downmix signal without having to generate 10 object signals, and is thus more efficient than a conventional audio signal decoding device in terms of complexity.

The audio signal decoding device 140 is efficient when the amount of computation required to calculate the spatial parameter information corresponding to each OTT module and TTT module, through analysis of the additional information and control information transmitted by the audio signal encoding device, is less than the amount of computation required to perform the mixing/rendering operation after decoding each object signal.

The audio signal decoding device 140 can be obtained simply by adding, to a conventional multi-channel audio decoding device, a module that generates spatial parameter information through analysis of the additional information and the control information, and can therefore remain compatible with a conventional multi-channel audio decoding device. The audio signal decoding device 140 can also improve sound quality by using existing tools of a conventional multi-channel audio decoding device, such as an envelope shaper, a sub-band temporal processing (STP) tool and a decorrelator. Given all this, the advantages of a conventional multi-channel audio decoding method can readily be carried over to an object audio decoding method.

The spatial parameter information transmitted to the multi-channel decoder 141 by the parameter converter 145 may be compressed so as to be suitable for transmission. Alternatively, the spatial parameter information may have the same format as the data transmitted by a conventional multi-channel encoding device. That is, the spatial parameter information may already have been subjected to a Huffman decoding operation or a pilot decoding operation and may thus be passed to each module as uncompressed spatial cue data. The former is suitable for transmitting the spatial parameter information to a multi-channel audio decoding device at a remote location; the latter is convenient because the multi-channel audio decoding device does not need to convert compressed spatial cue data into uncompressed spatial cue data that can readily be used in the decoding operation.

Configuring the spatial parameter information based on the analysis of the additional information and the control information may cause a delay between the downmix signal and the spatial parameter information. To address this, an additional buffer may be provided either for the downmix signal or for the spatial parameter information, so that the downmix signal and the spatial parameter information can be synchronized with each other. These methods, however, are inconvenient because they require an additional buffer. Alternatively, the additional information may be transmitted ahead of the downmix signal, taking into account the possible delay between the downmix signal and the spatial parameter information. In this case, the spatial parameter information obtained by combining the additional information and the control information need not be adjusted and can be used as is.
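The buffering approach can be sketched as a simple frame delay line. This is a minimal illustration, assuming the delay is a known whole number of frames; the function name `align_downmix` and the use of `None` as a placeholder are hypothetical.

```python
from collections import deque

def align_downmix(frames, delay_frames):
    """Delay downmix frames by `delay_frames` frames so that they line up
    with spatial parameter information that becomes available later."""
    buffer = deque([None] * delay_frames)  # None marks "no frame available yet"
    out = []
    for frame in frames:
        buffer.append(frame)
        out.append(buffer.popleft())
    return out

aligned = align_downmix(["d0", "d1", "d2", "d3"], delay_frames=1)
```

The extra memory held in `buffer` is exactly the cost that transmitting the additional information ahead of the downmix signal avoids.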

If the object signals of a downmix signal have different levels, an arbitrary downmix gain (ADG) module, which can directly compensate the downmix signal, may determine the relative levels of the object signals, and each object signal may be assigned to a predetermined position in the multi-channel space using spatial cue data such as channel level difference (CLD) information, inter-channel correlation (ICC) information and channel prediction coefficient (CPC) information.

For example, if the control information indicates that a predetermined object signal is to be assigned to a predetermined position in the multi-channel space and is to have a higher level than the other object signals, a conventional multi-channel decoder can calculate the difference between the energies of the channels of the downmix signal and divide the downmix signal into a number of output channels based on the result of the calculation. However, a conventional multi-channel decoder cannot raise or lower the volume of a specific sound in the downmix signal. In other words, a conventional multi-channel decoder simply distributes the downmix signal over a number of output channels and thus cannot raise or lower the volume of a sound within the downmix signal.

It is relatively easy to assign each of the object signals of a downmix signal generated by an object encoder to a predetermined position in the multi-channel space according to the control information. However, special techniques are required to increase or decrease the amplitude of a predetermined object signal. In other words, if the downmix signal generated by the object encoder is used as is, it is difficult to reduce the amplitude of an individual object signal within the downmix signal.

Hence, according to an embodiment of the present invention, the relative amplitudes of the object signals may be varied according to the control information by using the ADG module 147 illustrated in Fig. 8. More specifically, the amplitude of any one of the object signals of the downmix signal transmitted by the object encoder may be increased or decreased using the ADG module 147. The downmix signal obtained through the compensation performed by the ADG module 147 may then be subjected to multi-channel decoding.

If the relative amplitudes of the object signals of the downmix signal are appropriately adjusted using the ADG module 147, object decoding can be performed using a conventional multi-channel decoder. Whether the downmix signal generated by the object encoder is a mono signal, a stereo signal or a multi-channel signal with three or more channels, the downmix signal can be processed by the ADG module 147. If the downmix signal generated by the object encoder has two or more channels and the predetermined object signal to be adjusted by the ADG module 147 exists in only one of the channels of the downmix signal, the ADG module 147 may be applied only to the channel that includes the predetermined object signal, instead of to all channels of the downmix signal. A downmix signal processed by the ADG module 147 in this manner can readily be processed by a conventional multi-channel decoder without modifying the structure of the multi-channel decoder.
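The per-channel application of a downmix gain can be sketched as follows. This is an illustrative toy only: the function name `apply_adg`, the dictionary layout and the per-band gain values are assumptions, and the way gains would be derived from the control information is not shown.

```python
def apply_adg(downmix, gains_db, channel):
    """Scale one downmix channel band by band and leave the others untouched.

    downmix:  {channel_name: [band magnitudes]}
    gains_db: per-band gain in dB for the selected channel
    """
    out = {name: list(bands) for name, bands in downmix.items()}
    out[channel] = [v * 10 ** (g / 20.0) for v, g in zip(downmix[channel], gains_db)]
    return out

stereo = {"L": [1.0, 1.0], "R": [1.0, 1.0]}
# Boost only the band of the left channel that carries the target object.
adjusted = apply_adg(stereo, gains_db=[20.0, 0.0], channel="L")
```

Because the result is still an ordinary downmix signal, it can be handed to an unmodified multi-channel decoder.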

Even when the final output signal is not a multi-channel signal that can be reproduced by a multi-channel speaker system but is a binaural signal, the ADG module 147 may be used to adjust the relative amplitudes of the object signals of the final output signal.

As an alternative to using the ADG module 147, gain information specifying the gain value to be applied to each object signal may be included in the control information during the generation of the object signals. For this purpose, the structure of the conventional multi-channel decoder may be modified. Although it requires modifying an existing multi-channel decoder, this method is convenient in terms of decoding complexity, because the gain values are applied to each object signal during the decoding operation without the need to calculate an ADG and compensate each object signal.

Fig. 9 is a block diagram of an audio signal decoding device 150 according to a fourth embodiment of the present invention. As shown in Fig. 9, the audio signal decoding device 150 is characterized in that it generates a binaural signal.

More specifically, the audio signal decoding device 150 includes a multi-channel binaural decoder 151, a first parameter converter 157 and a second parameter converter 159.

The second parameter converter 159 analyzes the additional information and the control information provided by the audio signal encoding device and configures spatial parameter information based on the result of the analysis. The first parameter converter 157 configures binaural parameter information, which can be used by the multi-channel binaural decoder 151, by adding three-dimensional (3D) information such as head-related transfer function (HRTF) parameters to the spatial parameter information. The multi-channel binaural decoder 151 generates a virtual 3D signal by applying the virtual 3D parameter information to the downmix signal.

The first parameter converter 157 and the second parameter converter 159 may be replaced by a single module, namely a parameter conversion module 155, which receives the additional information, the control information and the HRTF parameters and configures the binaural parameter information based on them.

Conventionally, in order to generate a binaural signal for reproducing, through headphones, a downmix signal that includes 10 object signals, an object decoder must first generate 10 decoded signals respectively corresponding to the 10 object signals, based on the downmix signal and the additional information. A rendering unit then assigns each of the 10 object signals to a predetermined position in a multi-channel space with reference to the control information, so as to suit a 5-channel speaker environment. After that, the rendering unit generates a 5-channel signal that can be reproduced by a 5-channel speaker system. Finally, the rendering unit applies HRTF parameters to the 5-channel signal, thereby generating a 2-channel signal. In short, the conventional audio decoding method described above involves reproducing 10 object signals, converting the 10 object signals into a 5-channel signal and generating a 2-channel signal from the 5-channel signal, and is therefore inefficient.

On the other hand, the audio signal decoding device 150 can readily generate, from object audio signals, a binaural signal that can be reproduced through headphones. In addition, the audio signal decoding device 150 configures spatial parameter information through analysis of the additional information and the control information, and can therefore generate a binaural signal using a conventional multi-channel binaural decoder. Moreover, the audio signal decoding device 150 can still use a conventional multi-channel binaural decoder even when it is equipped with an integrated parameter converter that receives the additional information, the control information and the HRTF parameters and configures binaural parameter information based on them.

Fig. 10 is a block diagram of an audio signal decoding device 160 according to a fifth embodiment of the present invention. As shown in Fig. 10, the audio signal decoding device 160 includes a downmix processor 161, a multi-channel decoder 163 and a parameter converter 165. The downmix processor 161 and the parameter converter 165 may be replaced by a single module 167.

The parameter converter 165 generates spatial parameter information, which can be used by the multi-channel decoder 163, and parameter information, which can be used by the downmix processor 161. The downmix processor 161 performs a pre-processing operation on the downmix signal and passes the resulting downmix signal to the multi-channel decoder 163. The multi-channel decoder 163 performs a decoding operation on the downmix signal passed by the downmix processor 161, thereby outputting a stereo signal, a binaural stereo signal or a multi-channel signal. Examples of the pre-processing operation performed by the downmix processor 161 include modifying or converting the downmix signal in the time domain or the frequency domain using filtering.

If the downmix signal input to the audio signal decoding device 160 is a stereo signal, the downmix signal may need to be pre-processed by the downmix processor 161 before being input to the multi-channel decoder 163, because the multi-channel decoder 163 cannot map a component of the downmix signal belonging to the left channel, which is one of the multiple channels, to the right channel, which is another of the multiple channels. Therefore, in order to shift the position of an object signal belonging to the left channel toward the right channel, the downmix signal input to the audio signal decoding device 160 may be pre-processed by the downmix processor 161, and the pre-processed downmix signal may be input to the multi-channel decoder 163.

The pre-processing of a stereo downmix signal may be performed based on pre-processing information obtained from the additional information and the control information.

Fig. 11 is a block diagram of an audio signal decoding device 170 according to a sixth embodiment of the present invention. As shown in Fig. 11, the audio signal decoding device 170 includes a multi-channel decoder 171, a channel processor 173 and a parameter converter 175.

The parameter converter 175 generates spatial parameter information, which can be used by the multi-channel decoder 171, and parameter information, which can be used by the channel processor 173. The channel processor 173 performs a post-processing operation on the signal output by the multi-channel decoder 171. Examples of the signal output by the multi-channel decoder 171 include a stereo signal, a binaural stereo signal and a multi-channel signal.

Examples of the post-processing operation performed by the channel processor 173 include modifying and converting each channel or all channels of the output signal. For example, if the additional information includes fundamental frequency information regarding a predetermined object signal, the channel processor 173 may remove harmonic components from the predetermined object signal with reference to the fundamental frequency information. A multi-channel audio decoding method may not be efficient enough for use in a karaoke system. However, if fundamental frequency information regarding vocal object signals is included in the additional information and the harmonic components of the vocal object signals are removed during the post-processing operation, a high-performance karaoke system can be realized using the embodiment of Fig. 11. The embodiment of Fig. 11 may also be applied to object signals other than vocal object signals. For example, the sound of a predetermined musical instrument can be removed using the embodiment of Fig. 11. Predetermined harmonic components can also be amplified using the fundamental frequency information regarding the object signals.
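One simple way to suppress a fundamental and its harmonics, given a known fundamental frequency, is a feed-forward comb filter. The sketch below uses that technique purely as an illustration; the description does not say which filter the channel processor uses, and the sample rate, fundamental frequency and function name `remove_harmonics` are assumptions.

```python
import math

def remove_harmonics(signal, period):
    """Feed-forward comb filter y[n] = x[n] - x[n - period].

    It has nulls at every multiple of fs / period, i.e. at the fundamental
    and all its harmonics when `period` is one fundamental period in samples.
    """
    return [x - (signal[n - period] if n >= period else 0.0)
            for n, x in enumerate(signal)]

fs, f0 = 8000, 200            # assumed sample rate and fundamental frequency (Hz)
period = fs // f0             # 40 samples per fundamental period
vocal = [math.sin(2 * math.pi * f0 * n / fs)
         + 0.5 * math.sin(2 * math.pi * 2 * f0 * n / fs)
         for n in range(400)]
residual = remove_harmonics(vocal, period)
```

After the first period, the harmonic test tone is cancelled. A real system would need a fractional-delay or frequency-domain variant for fundamentals that do not correspond to a whole number of samples, and the comb also attenuates other content at those frequencies.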

The channel processor 173 may perform additional effect processing on the downmix signal. Alternatively, the channel processor 173 may add a signal obtained by the additional effect processing to the signal output by the multi-channel decoder 171. The channel processor 173 may alter the spectrum of an object or modify the downmix signal when necessary. If it is not appropriate to perform an effect processing operation such as reverberation directly on the downmix signal and to pass the resulting signal to the multi-channel decoder 171, the channel processor 173 may instead add the signal obtained by the effect processing operation to the output of the multi-channel decoder 171, rather than performing the effect processing on the downmix signal.

The audio signal decoding device 170 may be designed to include not only the channel processor 173 but also a downmix processor. In this case, the downmix processor may be placed before the multi-channel decoder 171, and the channel processor 173 may be placed after the multi-channel decoder 171.

Fig. 12 is a block diagram of an audio signal decoding device 210 according to a seventh embodiment of the present invention. As shown in Fig. 12, the audio signal decoding device 210 uses a multi-channel decoder 213 instead of an object decoder.

More specifically, the audio signal decoding device 210 includes the multi-channel decoder 213, a transcoder 215, a rendering unit 217 and a 3D information database 219.

The rendering unit 217 determines the 3D positions of a plurality of object signals based on 3D information corresponding to index data included in the control information. The transcoder 215 generates channel-oriented additional information by synthesizing position information regarding the object audio signals to which the 3D information is applied by the rendering unit 217. The multi-channel decoder 213 outputs a 3D signal by applying the channel-oriented additional information to the downmix signal.

A head-related transfer function (HRTF) may be used as the 3D information. An HRTF is a transfer function that describes the transmission of sound waves between a sound source at an arbitrary position and the eardrum, and it returns a value that varies with the direction and elevation of the sound source. If a signal without directivity is filtered using an HRTF, the signal is heard as if it were reproduced from a certain direction.
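In the time domain, filtering with an HRTF amounts to convolving the signal with a pair of head-related impulse responses, one per ear. The following minimal sketch shows this with made-up two-tap impulse responses; the function names and the toy responses are illustrative assumptions, not measured data.

```python
def convolve(signal, impulse_response):
    """Direct-form linear convolution."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += s * h
    return out

def binauralize(mono, hrir_left, hrir_right):
    """Filter a non-directional signal with left/right impulse responses."""
    return convolve(mono, hrir_left), convolve(mono, hrir_right)

# Toy responses for a source on the left: the right ear receives the sound
# one sample later and attenuated.
left, right = binauralize([1.0, 0.5], hrir_left=[1.0, 0.0], hrir_right=[0.0, 0.6])
```

The inter-aural delay and level difference visible in `left` and `right` are the cues that make the source appear to come from one side.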

When an input bitstream is received, the audio signal decoding device 210 extracts an object-oriented downmix signal and object-oriented parameter information from the input bitstream using a demultiplexer (not shown). Next, the rendering unit 217 extracts, from the control information, the index data used to determine the positions of the plurality of object signals, and retrieves the 3D information corresponding to the extracted index data from the 3D information database 219.

More specifically, the mixing parameter information included in the control information used by the audio signal decoding device 210 may include not only level information but also the index data required for searching for the 3D information. The mixing parameter information may also include time information regarding the time difference between channels, position information, and one or more parameters obtained by appropriately combining the level information and the time information.

The position of an object audio signal may be determined initially according to default mixing parameter information and may be changed later by applying, to the object audio signal, 3D information corresponding to the position desired by the user. Alternatively, if the user wishes to apply a 3D effect to only some of the object audio signals, the level information and time information regarding the other object audio signals, to which the user does not wish to apply the 3D effect, may be used as the mixing parameter information.

The transcoder 215 generates channel-oriented additional information regarding M channels by synthesizing the object-oriented parameter information regarding the N object signals transmitted by the audio signal encoding device with the position information of the object signals to which the rendering unit 217 applies 3D information such as an HRTF.

The multi-channel decoder 213 generates an audio signal based on the downmix signal and the channel-oriented additional information provided by the transcoder 215, and generates a 3D multi-channel signal by performing a 3D rendering operation using the 3D information included in the channel-oriented additional information.

Fig. 13 is a block diagram of an audio signal decoding device 220 according to an eighth embodiment of the present invention. As shown in Fig. 13, the audio signal decoding device 220 differs from the audio signal decoding device 210 illustrated in Fig. 12 in that a transcoder 225 transmits channel-oriented additional information and 3D information separately to a multi-channel decoder 223. In other words, the transcoder 225 of the audio signal decoding device 220 obtains channel-oriented additional information regarding M channels from the object-oriented parameter information regarding N object signals and transmits the channel-oriented additional information, together with the 3D information applied to each of the N object signals, to the multi-channel decoder 223, whereas the transcoder 215 of the audio signal decoding device 210 transmits channel-oriented additional information that already includes the 3D information to the multi-channel decoder 213.

As shown in Fig. 14, the channel-oriented additional information and the 3D information may each include a plurality of frame indices. Thus, the multi-channel decoder 223 can synchronize the channel-oriented additional information and the 3D information with reference to their respective frame indices, and can thereby apply the 3D information to the frame of the bitstream that corresponds to it. For example, 3D information with index 2 may be applied to frame 2, which has index 2.

Since the channel-oriented additional information and the 3D information both include frame indices, the temporal position of the channel-oriented additional information to which the 3D information is to be applied can be determined effectively, even if the 3D information is updated over time. In other words, the transcoder 225 attaches frame indices to the 3D information and to the channel-oriented additional information, so that the multi-channel decoder 223 can easily synchronize the two.
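The frame-index matching can be sketched as a lookup keyed by frame index. In the sketch below the data layout and the function name `sync_by_frame_index` are hypothetical, and holding the most recent 3D information until the next update is an assumption about how updates over time would be handled.

```python
def sync_by_frame_index(side_info, info_3d):
    """Pair each side-information frame with its 3D information.

    side_info: list of (frame_index, payload) tuples
    info_3d:   dict mapping frame_index -> 3D information
    """
    current = None
    paired = []
    for index, payload in sorted(side_info):
        # Assumption: keep the last 3D information until a new one arrives.
        current = info_3d.get(index, current)
        paired.append((index, payload, current))
    return paired

pairs = sync_by_frame_index(
    [(1, "cld1"), (2, "cld2"), (3, "cld3")],
    {1: "hrtfA", 2: "hrtfB"},
)
```

Here the 3D information with index 2 is applied to frame 2, and frame 3 reuses it because no update with index 3 was sent.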

The downmix processor 231, the transcoder 235, the rendering unit 237 and the 3D information database may be replaced by a single module 239.

Fig. 15 is a block diagram of an audio signal decoding device 230 according to a ninth embodiment of the present invention. As shown in Fig. 15, the audio signal decoding device 230 differs from the audio signal decoding device 220 illustrated in Fig. 14 in that it additionally includes a downmix processor 231.

More specifically, the audio signal decoding device 230 includes a transcoder 235, a rendering unit 237, a 3D information database 239, a multi-channel decoder 233 and the downmix processor 231. The transcoder 235, the rendering unit 237, the 3D information database 239 and the multi-channel decoder 233 are the same as their respective counterparts illustrated in Fig. 14. The downmix processor 231 performs a pre-processing operation on a stereo downmix signal in order to adjust position. The 3D information database 239 may be incorporated into the rendering unit 237. A module for applying a predetermined effect to the downmix signal may also be provided in the audio signal decoding device 230.

Fig. 16 is a block diagram of an audio signal decoding device 240 according to a tenth embodiment of the present invention. As shown in Fig. 16, the audio signal decoding device 240 differs from the audio signal decoding device 230 illustrated in Fig. 15 in that it includes a multi-point control unit combiner 241.

That is, the audio signal decoding device 240, like the audio signal decoding device 230, includes a downmix processor 243, a multi-channel decoder 244, a transcoder 245, a rendering unit 247 and a 3D information database 249. The multi-point control unit combiner 241 combines a plurality of bitstreams obtained by object-oriented encoding, thereby obtaining a single bitstream. For example, when a first bitstream for a first audio signal and a second bitstream for a second audio signal are input, the multi-point control unit combiner 241 extracts a first downmix signal from the first bitstream, extracts a second downmix signal from the second bitstream, and generates a third downmix signal by combining the first and second downmix signals. In addition, the multi-point control unit combiner 241 extracts first object-oriented additional information from the first bitstream, extracts second object-oriented additional information from the second bitstream, and generates third object-oriented additional information by combining the first and second object-oriented additional information. The multi-point control unit combiner 241 then generates a bitstream by combining the third downmix signal and the third object-oriented additional information, and outputs the generated bitstream.
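The combining step can be sketched on already-decoded data as follows. This is a deliberately simplified illustration: the stream layout, the function name `combine_streams`, summing PCM samples and concatenating per-object entries are all assumptions, and a real combiner would also have to re-express level parameters relative to the new downmix signal.

```python
def combine_streams(stream_a, stream_b):
    """Merge two object-oriented streams into one.

    Each stream is {"downmix": [PCM samples], "objects": [per-object info]}.
    """
    n = max(len(stream_a["downmix"]), len(stream_b["downmix"]))

    def pad(samples):
        # Zero-pad the shorter downmix so both have n samples.
        return samples + [0.0] * (n - len(samples))

    downmix = [a + b for a, b in zip(pad(stream_a["downmix"]),
                                     pad(stream_b["downmix"]))]
    objects = stream_a["objects"] + stream_b["objects"]
    return {"downmix": downmix, "objects": objects}

first = {"downmix": [1.0, 2.0], "objects": ["voice"]}
second = {"downmix": [0.5], "objects": ["piano", "drums"]}
third = combine_streams(first, second)
```

The merged stream carries one downmix signal and the additional information for all three objects, so a single decoder instance can handle both communication partners.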

Hence, according to the tenth embodiment of the present invention, even signals transmitted by two or more communication partners can be processed efficiently compared with the case of encoding or decoding each object signal individually.

In order for the multi-point control unit combiner 241 to merge a plurality of downmix signals, which are respectively extracted from a plurality of bitstreams and are associated with different compression codecs, into a single downmix signal, the downmix signals may need to be converted into pulse code modulation (PCM) signals or into signals in a predetermined frequency domain according to the types of their compression codecs; the PCM signals or the signals obtained by the conversion may need to be combined; and the signal obtained by the combination may need to be converted using a predetermined compression codec. In this case, a delay may occur depending on whether the downmix signals are merged into a PCM signal or into a signal in the predetermined frequency domain. The decoder, however, may be unable to estimate this delay properly. Therefore, the delay may need to be included in the bitstream and transmitted together with it. The delay may indicate the number of delay samples in the PCM signal or the number of delay samples in the predetermined frequency domain.

During an object-oriented audio encoding operation, a considerably larger number of input signals may need to be processed than are usually processed during a conventional multi-channel encoding operation (for example, a 5.1-channel or 7.1-channel encoding operation). Therefore, an object-oriented audio encoding method requires much higher bit rates than a conventional channel-oriented multi-channel audio encoding method. However, because an object-oriented audio encoding method processes object signals, which are smaller units than channel signals, it makes it possible to generate dynamic output signals.

A method of encoding an audio signal according to embodiments of the present invention will now be described with reference to Figs. 17 to 20.

In an object-oriented audio encoding method, object signals may be defined to represent individual sounds, such as a human voice or the sound of a musical instrument. Alternatively, sounds with similar characteristics, such as the sounds of stringed instruments (e.g., violin, viola and cello), sounds belonging to the same frequency band, or sounds classified into the same category according to the directions and angles of their sound sources, may be grouped together and defined as the same object signal. As a further alternative, object signals may be defined using a combination of the above methods.

A number of object signals may be transmitted as a downmix signal and additional information. When the information to be transmitted is created, the energy or power of the downmix signal, or of each of the object signals of the downmix signal, is calculated, originally for the purpose of detecting the envelope of the downmix signal. The results of the calculation may be used to transmit the object signals or the downmix signal, or to calculate the ratio of the levels of the object signals.
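A minimal sketch of the energy calculation and of one possible level ratio follows. The function names and the choice to normalize against the strongest object are assumptions for illustration; the description does not fix a particular reference.

```python
def frame_energy(samples, frame_len):
    """Sum of squared samples in each consecutive frame."""
    return [sum(s * s for s in samples[i:i + frame_len])
            for i in range(0, len(samples), frame_len)]

def level_ratios(object_energies):
    """Energy of each object relative to the strongest object (assumed reference)."""
    ref = max(object_energies)
    return [e / ref if ref > 0 else 0.0 for e in object_energies]

energies = frame_energy([1.0, 1.0, 2.0, 2.0], frame_len=2)
ratios = level_ratios([2.0, 8.0, 4.0])
```

The sequence of per-frame energies is a coarse envelope of the signal, and the ratios are the kind of compact value that can be sent as additional information instead of the waveforms themselves.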

A linear predictive coding (LPC) algorithm may be used to lower the bit rate. More specifically, a number of LPC coefficients representing the envelope of a signal are generated through analysis of the signal, and the LPC coefficients are transmitted instead of envelope information regarding the signal. This method is efficient in terms of bit rate. However, since the LPC coefficients are very likely to deviate from the actual envelope of the signal, this method requires an additional process such as error correction. In short, a method that transmits the envelope information of a signal can guarantee high sound quality but considerably increases the amount of information to be transmitted. On the other hand, a method that uses LPC coefficients can reduce the amount of information to be transmitted, but requires additional processing such as error correction and lowers the sound quality.
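For reference, LPC coefficients are commonly obtained from the autocorrelation of a signal with the Levinson-Durbin recursion. The sketch below shows that standard textbook procedure; it is not taken from the description, and the function names and test signal are illustrative.

```python
def autocorr(x, max_lag):
    """Autocorrelation r[0..max_lag] of the sequence x."""
    return [sum(x[n] * x[n - k] for n in range(k, len(x)))
            for k in range(max_lag + 1)]

def lpc(x, order):
    """Levinson-Durbin recursion.

    Returns (a, err) with a[0] == 1, such that the prediction residual is
    e[n] = sum_k a[k] * x[n - k], and err is the residual energy.
    """
    r = autocorr(x, order)
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err                      # reflection coefficient
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= (1.0 - k * k)
    return a, err

# A decaying exponential obeys x[n] = 0.9 * x[n - 1], so a[1] should be near -0.9.
signal = [0.9 ** n for n in range(200)]
coeffs, residual_energy = lpc(signal, order=2)
```

A handful of coefficients such as `coeffs` describes the spectral envelope far more compactly than per-band energies, which is the bit-rate advantage mentioned above.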

According to an embodiment of the present invention, a combination of these methods may be used. In other words, the envelope of a signal may be represented by the energy or power of the signal, by an index value, or by another value, such as an LPC coefficient, corresponding to the energy or power of the signal.

Envelope information regarding a signal may be obtained in units of time sections or frequency sections. More specifically, as shown in Fig. 17, the envelope information may be obtained in units of frames. Alternatively, if the signal is represented by a frequency band structure using a filter bank such as a quadrature mirror filter (QMF) bank, the envelope information may be obtained in units of frequency sub-bands, frequency sub-band partitions (which are smaller than frequency sub-bands), groups of frequency sub-bands, or groups of frequency sub-band partitions. As a further alternative, a combination of the frame-based method, the frequency sub-band-based method and the frequency sub-band partition-based method may be used within the scope of the present invention.

As another alternative, given that the low-frequency components of a signal generally carry more information than its high-frequency components, the envelope information regarding the low-frequency components may be transmitted as is, whereas the envelope information regarding the high-frequency components may be represented by LPC coefficients or other values, which are then transmitted in place of the envelope information regarding the high-frequency components. However, the low-frequency components of a signal do not necessarily carry more information than the high-frequency components. Therefore, the above method should be applied flexibly according to the circumstances.

According to an embodiment of the present invention, envelope information or index data corresponding to a part of a signal that appears dominant on the time-frequency axis (hereinafter referred to as the dominant part) may be transmitted, and no envelope information or index data corresponding to the non-dominant part of the signal may be transmitted. Alternatively, values (for example, LPC coefficients) that represent the energy and power of the dominant part of the signal may be transmitted, and no such values corresponding to the non-dominant part may be transmitted. As another alternative, envelope information or index data corresponding to the dominant part may be transmitted, together with values that represent the energy and power of the non-dominant part. As another alternative, only information relating to the dominant part may be transmitted, so that the non-dominant part can be estimated from the information relating to the dominant part. As a further alternative, a combination of the above methods may be used.

For example, as shown in Fig. 18, if a signal is divided into a dominant period and a non-dominant period, information relating to the signal may be transmitted in four different ways, as indicated by (a) to (d).
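The variant in which information is sent only for the dominant part can be sketched as follows. This is an illustration under stated assumptions, not the scheme of Fig. 18: the rule that a frame is dominant when its energy reaches a fixed fraction of the maximum, the default `ratio` of 0.5, and the names `split_dominant` and `rebuild` are all hypothetical.

```python
def split_dominant(energies, ratio=0.5):
    """Return (flags, payload).

    flags marks each frame as dominant or not (assumed threshold rule);
    payload holds the energies of the dominant frames only.
    """
    if not energies:
        return [], []
    threshold = ratio * max(energies)
    flags = [e >= threshold for e in energies]
    payload = [e for e, is_dominant in zip(energies, flags) if is_dominant]
    return flags, payload


def rebuild(flags, payload, fill=0.0):
    """Decoder side: restore a per-frame envelope.

    Frames for which nothing was sent receive the value fill, which stands
    in for whatever estimate the decoder derives for the non-dominant part.
    """
    values = iter(payload)
    return [next(values) if is_dominant else fill for is_dominant in flags]


flags, payload = split_dominant([10.0, 1.0, 8.0, 0.5])
restored = rebuild(flags, payload)
```

In this sketch the flags themselves still have to reach the decoder, so the saving comes from omitting the envelope values of the non-dominant frames.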

To transmit a number of object signals as a downmix signal and additional information, the downmix signal must be divided into a plurality of elements during the decoding operation, taking into account, for example, the level ratio of the object signals. To guarantee independence among the elements of the downmix signal, a decorrelation operation must additionally be performed.

Object signals, which are the units of coding in object-oriented coding, are more independent than channel signals, which are the units of coding in multi-channel coding. In other words, a channel signal includes a number of object signals and thus needs to be decorrelated. Object signals, on the other hand, are independent of one another, and thus channel separation can easily be performed using the characteristics of the object signals, without the need for a decorrelation operation.

More specifically, as shown in Fig. 19, object signals A, B and C take turns being dominant on the frequency axis. In this case, there is no need to divide the downmix signal into a number of signals according to the level ratio of the object signals A, B and C and to perform decorrelation. Instead, information relating to the dominant periods of the object signals A, B and C may be transmitted, or a gain value may be applied to each frequency component of each of the object signals A, B and C, thereby skipping decorrelation. Therefore, it is possible to reduce the amount of computation and to reduce the bit rate by the amount that would otherwise be needed for the additional information required for decorrelation.

In short, in order to skip decorrelation, which is performed to guarantee independence among the signals obtained by dividing the downmix signal according to the level ratio of the object signals, information relating to the frequency region that includes each object signal may be transmitted as additional information. Alternatively, different gain values may be applied to a dominant period, during which an object signal appears dominant, and a non-dominant period, during which the object signal appears less dominant, so that mainly information relating to the dominant period is provided as additional information. As another alternative, the information relating to the dominant period may be transmitted as additional information, and no information relating to the non-dominant period may be transmitted. As a further alternative, a combination of the above methods, which are alternatives to decorrelation, may be used.
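The gain-based alternative to decorrelation can be sketched as follows for the case in which each frequency band of the downmix is dominated by a single object signal. The representation of the downmix as one value per band, the `minor_gain` parameter and the name `separate_by_dominance` are assumptions made for illustration only.

```python
def separate_by_dominance(downmix_bands, dominant_object, objects,
                          minor_gain=0.0):
    """Split a downmix into object signals using per-band gains only.

    downmix_bands   -- one downmix value per frequency band
    dominant_object -- name of the object that dominates each band
                       (this is the transmitted additional information)
    objects         -- names of all object signals
    minor_gain      -- gain applied where an object is not dominant
    """
    separated = {}
    for name in objects:
        separated[name] = [
            value * (1.0 if owner == name else minor_gain)
            for value, owner in zip(downmix_bands, dominant_object)
        ]
    return separated


# Objects A, B and C each dominate one band, so no decorrelation is applied.
result = separate_by_dominance([1.0, 2.0, 3.0], ["A", "B", "C"],
                               ["A", "B", "C"])
```

Because every band is simply scaled, the computation is a single multiplication per band and object, and only the dominance map needs to be sent as additional information.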

The above alternatives to decorrelation may be applied to all object signals or only to those object signals whose dominant periods are easily distinguishable. The above alternatives to decorrelation may also be applied variably in units of frames.

The encoding of an object audio signal using a residual signal is described in detail below.

In general, in an object-oriented audio coding method, a number of object signals are encoded, and the results of the encoding are transmitted as a combination of a downmix signal and additional information. A number of object signals are then restored from the downmix signal by decoding according to the additional information, and the restored object signals are appropriately mixed, for example at the user's request according to control information, thereby generating a final channel signal. An object-oriented audio coding method is designed so that the output channel signal can be changed freely according to the control information with the aid of a mixer. However, an object-oriented audio coding method can also be used to generate a channel output in a predetermined manner, regardless of control information.

For this purpose, the additional information may include not only the information needed to obtain a number of object signals from the downmix signal, but also the mixing parameter information needed to generate a channel signal. Thus, it is possible to generate the final channel output signal without the aid of a mixer. In this case, an algorithm such as residual coding may be used to improve sound quality.

A typical residual coding method involves coding a signal and coding the error between the coded signal and the original signal, i.e., the residual signal. During the decoding operation, the coded signal is decoded while compensating for the error between the coded signal and the original signal, thereby restoring a signal that is as similar to the original signal as possible. Since the error between the coded signal and the original signal is generally small, the amount of information additionally required to perform residual coding can be kept low.
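The principle of residual coding can be sketched as follows. A uniform quantizer stands in for the core coder purely as an assumption, the residual is kept exact rather than being coded itself, and the names `quantize`, `encode_with_residual` and `decode_with_residual` are hypothetical.

```python
def quantize(samples, step):
    """Stand-in core coder: round each sample to a multiple of step."""
    return [round(s / step) * step for s in samples]


def encode_with_residual(samples, step):
    """Return the coarsely coded signal and the residual (original - coded)."""
    core = quantize(samples, step)
    residual = [s - c for s, c in zip(samples, core)]
    return core, residual


def decode_with_residual(core, residual):
    """Compensate the coded signal with the residual to restore the original."""
    return [c + r for c, r in zip(core, residual)]


original = [0.25, 1.0, -0.75]
core, residual = encode_with_residual(original, step=0.5)
restored = decode_with_residual(core, residual)
```

The residual never exceeds half a quantization step in magnitude here, which illustrates why it can usually be represented with fewer bits than the signal itself.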

If the final channel output of the decoder is fixed, not only the mixing parameter information needed to generate the final channel signal but also residual coding information may be provided as additional information. In this case, sound quality can be improved.

Fig. 20 is a block diagram of an audio signal encoding device 310 according to an embodiment of the present invention. As shown in Fig. 20, the audio signal encoding device 310 is characterized by its use of a residual signal.

More specifically, the audio signal encoding device 310 includes an encoder 311, a decoder 313, a first mixer 315, a second mixer 319, an adder 317 and a bitstream generator 321.

The first mixer 315 performs a mixing operation on the original signal, and the second mixer 319 performs a mixing operation on a signal obtained by encoding and then decoding the original signal. The adder 317 calculates a residual signal between the output signal of the first mixer 315 and the output signal of the second mixer 319. The bitstream generator 321 adds the residual signal to the additional information and transmits the result. Thus, it is possible to improve sound quality.
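The signal flow through the two mixers and the adder can be sketched as follows. The sketch assumes that both mixers apply the same per-object gains to produce a single output channel and that the encoder 311 and decoder 313 are modelled together by a caller-supplied round-trip function; the names `mix`, `residual_for_mix` and `codec_roundtrip` are hypothetical.

```python
def mix(objects, gains):
    """Mix object signals into one channel using per-object gains."""
    length = len(objects[0])
    return [sum(g * obj[n] for g, obj in zip(gains, objects))
            for n in range(length)]


def residual_for_mix(objects, gains, codec_roundtrip):
    """Residual between the mixed originals and the mixed decoded objects."""
    reference = mix(objects, gains)                      # first mixer 315
    decoded = [codec_roundtrip(obj) for obj in objects]  # encoder 311, decoder 313
    approximation = mix(decoded, gains)                  # second mixer 319
    return [r - a for r, a in zip(reference, approximation)]  # adder 317


# Rounding to integers stands in for a lossy encode/decode round trip.
lossy = lambda obj: [round(v) for v in obj]
residual = residual_for_mix([[0.25, 1.5], [2.0, 0.75]], [1.0, 0.5], lossy)
```

The residual is computed on the mixed output rather than on each object, so it compensates the error that actually appears in the final channel signal.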

The calculation of the residual signal may be applied to all parts of a signal or only to the low-frequency parts of the signal. Alternatively, the calculation of the residual signal may be applied selectively, on a frame-by-frame basis, only to frequency regions that contain dominant signals. As a further alternative, a combination of the above methods may be used.

Since the amount of additional information that includes residual signal information is much larger than the amount of additional information that does not, the calculation of the residual signal may be applied only to those parts of a signal that directly affect sound quality, thereby preventing an excessive increase in bit rate.

The present invention can be implemented as computer-readable code recorded on a computer-readable recording medium. The computer-readable recording medium may be any type of recording device in which data is stored in a computer-readable manner. Examples of the computer-readable recording medium include ROM, RAM, CD-ROM, magnetic tape, floppy disks, optical data storage devices and carrier waves (e.g., data transmission over the Internet). The computer-readable recording medium can also be distributed over multiple computer systems connected through a network so that the computer-readable code is written to it and executed from it in a decentralized manner. Functional programs, code and code segments needed to implement the present invention can be readily devised by those skilled in the art.

Industrial applicability

As described above, according to the present invention, sound images can be localized for each object audio signal by exploiting the advantages of object-oriented audio encoding and decoding methods. It is thus possible to produce more realistic sound when object audio signals are played back. In addition, the present invention can be applied to interactive games and can thus provide the user with a more realistic virtual reality experience.

Although the present invention has been particularly shown and described with reference to exemplary embodiments thereof, those skilled in the art will understand that various changes in form and detail may be made without departing from the spirit and scope of the present invention as defined by the following claims.

1. A method of decoding an audio signal, comprising the steps of:
receiving a downmix signal containing at least one object signal, and object-oriented additional information generated when the at least one object signal is downmixed to obtain the downmix signal, the downmix signal and the object-oriented additional information being extracted from the audio signal;
receiving control information for controlling the position or level of the at least one object signal;
generating channel-oriented additional information based on the object-oriented additional information and the control information;
generating a processed downmix signal based on the downmix signal, the object-oriented additional information and the control information for controlling the position of the at least one object signal; and
generating a multi-channel audio signal using the processed downmix signal and the channel-oriented additional information,
wherein both the downmix signal and the processed downmix signal consist of a left channel and a right channel.

2. The method of decoding an audio signal according to claim 1, wherein the processed downmix signal is generated by adding an effect to the downmix signal.

3. The method of decoding an audio signal according to claim 1, wherein generating the processed downmix signal comprises modifying the downmix signal in either the time domain or the frequency domain.

4. A device for decoding an audio signal, comprising:
a demultiplexer which receives a downmix signal containing at least one object signal, and object-oriented additional information generated when the at least one object signal is downmixed to obtain the downmix signal, the downmix signal and the object-oriented additional information being extracted from the audio signal;
a parameter converter which receives control information for controlling the position or level of the at least one object signal and generates channel-oriented additional information based on the object-oriented additional information and the control information;
a downmix processor which generates a processed downmix signal based on the downmix signal, the object-oriented additional information and the control information for controlling the position of the at least one object signal; and
a multi-channel decoder which generates a multi-channel audio signal using the processed downmix signal and the channel-oriented additional information,
wherein both the downmix signal and the processed downmix signal consist of a left channel and a right channel.

5. The device for decoding an audio signal according to claim 4, wherein the processed downmix signal is generated by adding an effect to the downmix signal.

6. The device for decoding an audio signal according to claim 4, wherein the downmix processor modifies the downmix signal in either the time domain or the frequency domain.

7. A computer-readable recording medium on which is recorded a method of decoding an audio signal, the method comprising the steps of:
receiving a downmix signal containing at least one object signal, and object-oriented additional information generated when the at least one object signal is downmixed to obtain the downmix signal, the downmix signal and the object-oriented additional information being extracted from the audio signal;
receiving control information for controlling the position or level of the at least one object signal;
generating channel-oriented additional information based on the object-oriented additional information and the control information;
generating a processed downmix signal based on the downmix signal, the object-oriented additional information and the control information for controlling the position of the at least one object signal; and
generating a multi-channel audio signal using the processed downmix signal and the channel-oriented additional information,
wherein both the downmix signal and the processed downmix signal consist of a left channel and a right channel.



 

Similar patents:

FIELD: physics, acoustics.

SUBSTANCE: invention relates to means of encoding audio signals and related spatial information in a format which is independent of the playback scheme. A first set of audio signals is assigned to a first group. The first group is encoded as a set of mono audio tracks with associated metadata describing the direction of the signal source of each track relative to the recording position and the initial playback time thereof. A second set of audio signals is assigned to a second group. The second group is encoded as at least one set of ambisonic tracks of a given order and a mixture of orders. Two groups of tracks comprising the first and second sets of audio signals are generated.

EFFECT: providing a technique capable of presenting spatial audio content independent of the exhibition method.

26 cl, 11 dwg

FIELD: physics, acoustics.

SUBSTANCE: invention relates to a surround sound system. A multi-channel spatial signal comprising at least one surround channel is received. Ultrasound is emitted towards a surface to reach a listening position via reflection off said surface. The ultrasound signal may specifically reach the listening position from the side, above or behind a nominal listener. A first drive unit generates a drive signal for the directional ultrasound transducer from the surround channel. The use of an ultrasound transducer for providing the surround sound signal provides an improved spatial experience while allowing the speaker to be located, for example, in front of the user. An ultrasound beam is much narrower and better defined than conventional audio beams and can therefore be better directed to provide the desired reflections. In some scenarios, the ultrasound transducer may be supplemented by an audio range loudspeaker.

EFFECT: high quality of reproducing audio and high efficiency of the surround sound system.

12 cl, 11 dwg

FIELD: physics, acoustics.

SUBSTANCE: binaural rendering of a multi-channel audio signal into a binaural output signal is described. The multi-channel audio signal includes a stereo downmix signal (18) into which a plurality of audio signals are downmixed; and side information includes downmix information (DMG, DCLD), indicating for each audio signal, to what degree the corresponding audio signal was mixed in the first channel and second channel of the stereo downmix signal (18), respectively, as well as object level information of the plurality of audio signals and inter-object cross correlation information, describing similarity between pairs of audio signals of the plurality of audio signals. Based on a first rendering prescription, a preliminary binaural signal (54) is computed from the first and second channels of the stereo downmix signal (18). A decorrelated signal (Xdn,k) is generated as a perceptual equivalent to a mono downmix (58) of the first and second channels of the stereo downmix signal (18) being, however, decorrelated to the mono downmix (58).

EFFECT: improved binaural rendering while eliminating restrictions with respect to free generation of a downmix signal from original audio signals.

11 cl, 6 dwg, 3 tbl

FIELD: physics, acoustics.

SUBSTANCE: invention relates to processing signals in an audio frequency band. The apparatus for generating at least one output audio signal representing a superposition of two different audio objects includes a processor for processing an input audio signal to provide an object representation of the input audio signal, where that object representation can be generated by parametrically guided approximation of original objects using an object downmix signal. An object manipulator individually manipulates objects using audio object based metadata relating to the individual audio objects to obtain manipulated audio objects. The manipulated audio objects are mixed using an object mixer for finally obtaining an output audio signal having one or multi-channel signals depending on a specific rendering setup.

EFFECT: providing efficient audio signal transmission rate.

14 cl, 17 dwg

FIELD: radio engineering, communication.

SUBSTANCE: described is a device for generating a binaural signal based on a multi-channel signal representing a plurality of channels and intended for reproduction by a speaker system, wherein each virtual sound source position is associated to each channel. The device includes a correlation reducer for differently converting, and thereby reducing correlation between, at least one of a left and a right channel of the plurality of channels, a front and a rear channel of the plurality of channels, and a centre and a non-centre channel of the plurality of channels, in order to obtain an inter-similarity reduced combination of channels; a plurality of directional filters, a first mixer for mixing output signals of the directional filters modelling the acoustic transmission to the first ear canal of the listener, and a second mixer for mixing output signals of the directional filters modelling the acoustic transmission to the second ear canal of the listener. Also disclosed is an approach where centre level is reduced to form a downmix signal, which is further transmitted to a processor for constructing an acoustic space. Another approach involves generating a set of inter-similarity reduced transfer functions modelling the ear canal of the person.

EFFECT: providing an algorithm for generating a binaural signal which provides stable and natural sound of a record in headphones.

33 cl, 14 dwg

FIELD: information technology.

SUBSTANCE: method comprises estimating a first wave representation comprising a first wave direction measure characterising the direction of a first wave and a first wave field measure being related to the magnitude of the first wave for the first spatial audio stream, having a first audio representation comprising a measure for pressure or magnitude of a first audio signal and a first direction of arrival of sound; estimating a second wave representation comprising a second wave direction characterising the direction of the second wave and a second wave field measure being related to the magnitude of the second wave for the second spatial audio stream, having a second audio representation comprising a measure for pressure or magnitude of a second audio signal and a second direction of arrival of sound; processing the first wave representation and the second wave representation to obtain a merged wave representation comprising a merged wave field measure, a merged direction of arrival measure and a merged diffuseness parameter; processing the first audio representation and the second audio representation to obtain a merged audio representation, and forming a merged audio stream.

EFFECT: high quality of a merged audio stream.

15 cl, 7 dwg

FIELD: physics.

SUBSTANCE: apparatus (100) for generating a multichannel audio signal (142) based on an input audio signal (102) comprises a main signal upmixing means (110), a section (segment) selector (120), a section signal upmixing means (130) and a combiner (140). The main signal upmixing means (110) is configured to provide a main multichannel audio signal (112) based on the input audio signal (102). The section selector (120) is configured to select or not select a section of the input audio signal (102) based on analysis of the input audio signal (102). The selected section of the input audio signal (102), a processed selected section of the input audio signal (102) or a reference signal associated with the selected section of the input audio signal (102) is provided as section signal (122). The section signal upmixing means (130) is configured to provide a section upmix signal (132) based on the section signal (122), and the combiner (140) is configured to overlay the main multichannel audio signal (112) and the section upmix signal (132) to obtain the multichannel audio signal (142).

EFFECT: improved flexibility and sound quality.

12 cl, 10 dwg

FIELD: information technology.

SUBSTANCE: invention relates to lossless multi-channel audio codec which uses adaptive segmentation with random access point (RAP) and multiple prediction parameter set (MPPS) capability. The lossless audio codec encodes/decodes a lossless variable bit rate (VBR) bit stream with random access point (RAP) capability to initiate lossless decoding at a specified segment within a frame and/or multiple prediction parameter set (MPPS) capability partitioned to mitigate transient effects. This is accomplished with an adaptive segmentation technique that fixes segment start points based on constraints imposed by the existence of a desired RAP and/or detected transient in the frame and selects an optimum segment duration in each frame to reduce encoded frame payload subject to an encoded segment payload constraint. RAP and MPPS are particularly applicable to improve overall performance for longer frame durations.

EFFECT: higher overall encoding efficiency.

48 cl, 23 dwg

FIELD: physics.

SUBSTANCE: method and system for generating output signals for reproduction by two physical speakers in response to input audio signals indicative of sound from multiple source locations including at least two rear locations. Typically, the input signals are indicative of sound from three front locations and two rear locations (left and right surround sources). A virtualiser generates left and right surround output signals suitable for driving front loudspeakers to emit sound that a listener perceives as emitted from rear sources. Typically, the virtualiser generates left and right surround output signals by transforming rear source input signals in accordance with a sound perception simulation function. To ensure that virtual channels are well heard in the presence of other channels, the virtualiser performs dynamic range compression on rear source input signals. The dynamic range compression is preferably performed by amplifying rear source input signals or partially processed versions thereof in a nonlinear way relative to front source input signals.

EFFECT: separating virtual sources while avoiding excessive emphasis of virtual channels.

34 cl, 9 dwg

FIELD: information technologies.

SUBSTANCE: invention discloses the method for reproduction of multiple audio channels, according to which out-of-phase information is extracted from side and/or rear side channels contained in a multi-channel audio signal.

EFFECT: improved reproduction of a multi-channel audio signal.

15 cl, 10 dwg

FIELD: physics, acoustics.

SUBSTANCE: group of inventions relates to expansion of a compressed audio signal which consists of one or more compressed audio channels into an expanded audio signal. An expansion unit is set up to use current variable expansion parameters to expand a compressed audio signal in order to obtain an expanded audio signal, wherein current variable expansion parameters comprise current variables of smoothed phase values. A parameter determiner is set up to obtain one or more current smoothed expansion parameters for use in the expansion unit based on input information on sampled expansion parameters. The parameter determiner is set up to combine a scaled version of the previous smoothed phase value and a scaled version of input phase information, using a phase change limiting algorithm to determine the current smoothed phase value based on the previous smoothed value and input phase information.

EFFECT: high quality of the expanded audio signal.

13 cl, 7 dwg

FIELD: physics, computer engineering.

SUBSTANCE: hardware unit for expanding a compressed audio signal into an expanded audio signal, comprising one or more expanded audio channels, including a parameter processing unit, configured to apply expansion parameters for expanding the compressed audio signal and obtain an expanded audio signal. The parameter processing unit is configured to apply phase shift to the compressed audio signal and obtain a phase-shifted version of the compressed audio signal when storing a decorrelated phase-invariable signal. The parameter processing unit is also configured to sum the phase-shifted version of the compressed audio signal and the decorrelated signal and obtain an expanded audio signal.

EFFECT: expanding a compressed audio signal into an expanded audio signal.

16 cl, 4 dwg

FIELD: radio engineering, communication.

SUBSTANCE: analogue speech signal is sampled with a standard frequency of 8000 Hz. The sampled speech signal is transmitted to the input of a bandpass filter with cut-off bands of 0.3 kHz and 3.4 kHz. Discrete Fourier transform is performed over the filtered signal to obtain expansion coefficients. Further, the expansion coefficients are rearranged in reverse order. Inverse discrete Fourier transform is then performed, after which the spectrum of the speech signal becomes inverted with respect to the initial spectrum. The disclosed transformation is characterised by that the signal becomes inverted on time.

EFFECT: faster transformation.

6 dwg

FIELD: physics.

SUBSTANCE: determination is ensured by making the conclusion on psychophysiological conditions of a person by variation in time of the ratio of absolute magnitude of arbitrary jitter of speech signal main tone period, duration of pauses in speech signal, duration of key depression and intervals between key depressions, duration of depression and intervals between depressions of left mouse key, mouse motion signal and image oscillation period exceeding the threshold to their total number.

EFFECT: higher accuracy of determination.

8 cl, 14 dwg

FIELD: radio engineering, communication.

SUBSTANCE: invention relates to means of encoding and decoding object-based audio signals. The method comprises extracting from the audio signal a first audio signal and a first audio parameter, wherein a musical object is channel-based encoded, and a second audio signal and a second audio parameter in which a vocal object is object-based encoded; generating a third audio signal using at least one of the first and second audio signals; generating a multi-channel audio signal using at least one of the first and second audio parameters and the third audio signal.

EFFECT: providing means of encoding and decoding audio.

9 cl, 16 dwg

FIELD: physics, audio.

SUBSTANCE: invention relates to encoding and decoding audio signals. The technical result is achieved due to an audio decoder for obtaining decoded audio information based on entropy encoded audio information, which includes a context-based entropy decoder configured to decode the entropy-encoded audio information depending on a context, which is based on previously-decoded audio information in a non-reset state. The context-based entropy decoder is configured to select mapping information for deriving the decoded audio information from the encoded audio information, depending on the context. The context-based entropy decoder consists of a context resetter configured to reset the context for selecting the mapping information to a default context, which is independent from the previously-decoded audio information, in response to overhead in the encoded audio information.

EFFECT: enabling adaptation of rules for mapping entropy decoding information to signal statistics.

19 cl, 21 dwg

FIELD: physics, video.

SUBSTANCE: invention relates to means of processing multi-channel audio or video signals using a variable prediction direction. Two audio or video channels are combined to obtain a first combination signal as a mid signal and a residual signal which can be obtained using a predicted side signal obtained from the mid signal. The first combination signal and the residual prediction signal are encoded and written into a data stream together with the prediction information obtained by an optimiser based on an optimisation target and a prediction direction indicator indicating a prediction direction associated with the residual signal. A decoder uses the prediction residual signal, the first combination signal, the prediction direction indicator and the prediction information to obtain a decoded first channel signal and a decoded second channel signal. In an encoder example or in a decoder example, a real-to-imaginary transform can be applied for estimating the imaginary part of the spectrum of the first combination signal.

EFFECT: high audio or video quality.

19 cl, 31 dwg, 2 tbl

FIELD: radio engineering, communication.

SUBSTANCE: adaptive delta codec includes a source and a receiver of an analogue signal, a digital communication channel, a coder containing a waveform digitizer, a comparator, an inverter, a JK trigger, a transmission adaptation circuit including a voltage divider, an operating transmission amplifier, the first, the second and the third resistors, a capacitor, as well as a clock-pulse generator (CPG), a decoder containing an amplifier, an analogue switch, a low-pass filter, a reception adaptation circuit including a voltage divider, an operating reception amplifier, the first, the second and the third resistors, and a capacitor.

EFFECT: improved transmission quality of a voice signal over digital communication channels at low transmission rates, together with a simpler device structure.

2 dwg
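
The codec in the abstract above is an analogue circuit, but the general principle of adaptive delta modulation can be sketched in software. The Python sketch below is a generic textbook-style scheme, not a model of the patented circuit: it assumes one bit per sample, a step size that grows by a factor `k` when two consecutive bits agree and shrinks by the same factor when they differ, and hypothetical parameter names (`step_min`, `step_max`, `k`).

```python
def _adapt_step(step, bit, prev_bit, step_min, step_max, k):
    """Grow the step on repeated bits, shrink it on alternating bits."""
    if prev_bit is None:
        return step
    if bit == prev_bit:
        return min(step * k, step_max)
    return max(step / k, step_min)


def adm_decode(bits, step_min=0.01, step_max=1.0, k=1.5):
    """Rebuild a staircase approximation of the signal from the bit stream."""
    estimate, step, prev_bit = 0.0, step_min, None
    output = []
    for bit in bits:
        step = _adapt_step(step, bit, prev_bit, step_min, step_max, k)
        estimate += step if bit else -step
        output.append(estimate)
        prev_bit = bit
    return output


def adm_encode(samples, step_min=0.01, step_max=1.0, k=1.5):
    """Return (bits, tracked), where tracked is the encoder's local estimate."""
    estimate, step, prev_bit = 0.0, step_min, None
    bits, tracked = [], []
    for x in samples:
        # Comparator: is the input above or below the current estimate?
        bit = 1 if x >= estimate else 0
        step = _adapt_step(step, bit, prev_bit, step_min, step_max, k)
        estimate += step if bit else -step
        bits.append(bit)
        tracked.append(estimate)
        prev_bit = bit
    return bits, tracked
```

Because the encoder and decoder adapt the step from the transmitted bits alone, no step-size information has to be sent over the channel. In a practical decoder the staircase output would then be smoothed, which is the role of the low-pass filter mentioned in the abstract.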

FIELD: physics, acoustics.

SUBSTANCE: invention relates to encoding and decoding a multichannel audio signal. The audio signal decoder generates a decoded representation of a multichannel audio signal from its encoded representation and includes a time warp decoder for reconstructing the time warping of the multiple audio signals contained in the encoded representation. The audio signal encoder generates an encoded representation of a multichannel audio signal and includes an encoded-representation generator that selectively produces either a representation containing common time warp contour information, which jointly characterises multiple audio channels of the multichannel signal, or a representation containing individual time warp contour information, which characterises each of the audio channels separately. The choice depends on information reflecting the similarity or difference between the time warp contours of the individual audio channels.

EFFECT: improved performance of a coder based on the time-warped modified discrete cosine transform, giving an efficient bit rate when storing and/or transmitting a multichannel audio signal.

14 cl, 40 dwg

FIELD: information technology.

SUBSTANCE: apparatus for encoding a multichannel audio signal has: a receiver for a multichannel audio signal comprising a first and a second audio signal from a first and a second microphone; a time difference module for determining the time difference between the first and second audio signals by combining successive observations of cross-correlations between them, wherein the cross-correlations are normalised to derive state probabilities that are accumulated using a Viterbi algorithm to obtain a time difference with built-in hysteresis, and the Viterbi algorithm calculates the state probability for each given state as the combined contribution of all paths entering that state; a delay module for generating a compensated multichannel audio signal by delaying the first or second audio signal in response to the time difference signal; a monophonic module for generating a monophonic signal by combining the channels of the compensated multichannel audio signal; and a monophonic signal encoder.

EFFECT: high quality and efficiency of encoding.

10 cl, 5 dwg
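
The time-difference tracking outlined in the abstract above can be sketched as follows. This is a simplified Python illustration under stated assumptions, not the patented apparatus: negative correlations are clipped to zero before normalisation, the transition model uses a single hypothetical `stay` probability to provide hysteresis, and the state update sums the contributions of all paths into each state, as the abstract describes. All function and parameter names are hypothetical.

```python
def cross_correlation(x, y, lag):
    """Correlate x[n] with y[n + lag] over the overlapping samples."""
    total = 0.0
    for n in range(len(x)):
        m = n + lag
        if 0 <= m < len(y):
            total += x[n] * y[m]
    return total


def observation_probs(x, y, max_lag):
    """Normalise clipped cross-correlations into per-lag probabilities."""
    raw = [max(cross_correlation(x, y, lag), 0.0)
           for lag in range(-max_lag, max_lag + 1)]
    total = sum(raw)
    if total == 0:
        return [1.0 / len(raw)] * len(raw)
    return [r / total for r in raw]


def track_delay(frames, max_lag, stay=0.9):
    """Return one delay estimate (in samples) per (x, y) frame pair."""
    n_states = 2 * max_lag + 1
    move = (1.0 - stay) / max(n_states - 1, 1)
    probs = [1.0 / n_states] * n_states
    delays = []
    for x, y in frames:
        obs = observation_probs(x, y, max_lag)
        total_prev = sum(probs)
        # Combined contribution of all paths entering each state.
        new = [obs[j] * (stay * probs[j] + move * (total_prev - probs[j]))
               for j in range(n_states)]
        norm = sum(new)
        probs = ([p / norm for p in new] if norm > 0
                 else [1.0 / n_states] * n_states)
        best = max(range(n_states), key=probs.__getitem__)
        delays.append(best - max_lag)
    return delays


def compensate_and_downmix(x, y, delay):
    """Align y to x by the estimated delay and average into a mono signal."""
    mono = []
    for n in range(len(x)):
        m = n + delay
        aligned = y[m] if 0 <= m < len(y) else 0.0
        mono.append(0.5 * (x[n] + aligned))
    return mono
```

The high `stay` probability keeps the estimate from jumping when a single frame gives weak or ambiguous correlation, which is the hysteresis effect the abstract mentions. A positive delay here means the second signal lags the first by that many samples.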
