Binaural rendering of multi-channel audio signal

FIELD: physics, acoustics.

SUBSTANCE: binaural rendering of a multi-channel audio signal into a binaural output signal is described. The multi-channel audio signal includes a stereo downmix signal (18) into which a plurality of audio signals are downmixed, and side information including downmix information (DMG, DCLD) indicating, for each audio signal, to what degree the corresponding audio signal was mixed into the first and second channels of the stereo downmix signal (18), respectively, as well as object level information of the plurality of audio signals and inter-object cross-correlation information describing the similarity between pairs of audio signals of the plurality of audio signals. Based on a first rendering prescription, a preliminary binaural signal (54) is computed from the first and second channels of the stereo downmix signal (18). A decorrelated signal (X_d^{n,k}) is generated as a perceptual equivalent to a mono downmix (58) of the first and second channels of the stereo downmix signal (18), while, however, being decorrelated to the mono downmix (58).

EFFECT: improved binaural rendering while eliminating restrictions on the free generation of a downmix signal from the original audio signals.

11 cl, 6 dwg, 3 tbl

 

This invention relates to binaural rendering of a multi-channel audio signal.

Many audio encoding algorithms have been proposed for the efficient encoding or compression of audio data of one channel, i.e. mono audio signals. Using psychoacoustics, audio samples are appropriately scaled, quantized or even set to zero in order to remove irrelevancy from, for example, a PCM (pulse code modulation) coded audio signal. Redundancy is removed as well.

As a further step, the similarity between the left and right channels of stereo audio signals has been exploited in order to efficiently encode/compress stereo audio signals.

However, upcoming applications pose further demands on audio coding algorithms. For example, in teleconferencing, computer games, music performance and the like, several audio signals which are partially or even completely uncorrelated have to be transmitted in parallel. In order to keep the necessary encoding bit rate low enough to be compatible with low-bit-rate applications, audio codecs have recently been proposed which downmix the multiple input audio signals into a downmix signal, such as a stereo or even mono downmix. For example, the MPEG Surround standard downmixes the input channels into a downmix signal in a manner prescribed by the standard. The downmixing is performed using so-called OTT-1 and TTT-1 boxes for downmixing two signals into one and three signals into two, respectively. In order to downmix more than three signals, a hierarchical structure of these boxes is used. Besides the mono downmix, each OTT-1 box outputs the level difference between the two input channels, as well as inter-channel coherence/cross-correlation parameters representing the coherence or cross-correlation between the two input channels. These parameters are output, along with the downmix signal, by the MPEG Surround encoder within the MPEG Surround data stream. Similarly, each TTT-1 box transmits channel prediction coefficients enabling the recovery of the three input channels from the resulting stereo downmix. The channel prediction coefficients are also transmitted as side information within the MPEG Surround data stream. The MPEG Surround decoder upmixes the downmix signal using the transmitted side information and recovers the original channels fed into the MPEG Surround encoder.

However, MPEG Surround unfortunately does not fulfil all requirements posed by many applications. For example, the MPEG Surround decoder is dedicated to upmixing the downmix signal of the MPEG Surround encoder such that the input channels of the MPEG Surround encoder are recovered as they are. In other words, the MPEG Surround data stream is dedicated to playback using the loudspeaker configuration used for encoding, or using typical configurations such as stereo.

However, in some applications it would be convenient if the loudspeaker configuration could be changed freely at the decoder side.

In order to address the latter need, the spatial audio object coding (SAOC) standard is currently being developed. Each channel is treated as an individual object, and all objects are downmixed into a downmix signal. The objects are thus handled as individual audio signals, independent of each other and not tied to any specific loudspeaker configuration, but with the ability to place the (virtual) loudspeakers arbitrarily at the decoder side. The individual objects may comprise individual sound sources such as instruments or vocal tracks. Unlike the MPEG Surround decoder, the SAOC decoder is free to individually upmix the downmix signal in order to render the individual objects onto any loudspeaker configuration. In order to enable the SAOC decoder to recover the individual objects encoded into the SAOC data stream, object level differences and, for objects which together form a stereo (or multi-channel) signal, inter-object cross-correlation parameters are transmitted as side information within the SAOC bit stream. In addition, the SAOC decoder/transcoder is provided with information indicating how the individual objects were downmixed into the downmix signal. Thus, the SAOC decoder can recover the individual SAOC channels and render these signals onto any loudspeaker configuration by means of user-controlled rendering information.

However, although the aforementioned codecs, i.e. MPEG Surround and SAOC, are able to transmit and render multi-channel audio content onto loudspeaker configurations having more than two loudspeakers, the increasing interest in headphones as an audio reproduction system requires that these codecs be able to render the audio content onto headphones as well. In contrast to loudspeaker playback, stereo audio content reproduced over headphones is perceived inside the head. The absence of the effect of the acoustical pathways from sources at certain physical positions to the eardrums causes the spatial image to sound unnatural, because the cues that determine the perceived azimuth, elevation and distance of a sound source are essentially missing or very inaccurate. Thus, in order to resolve the unnatural sound caused by inaccurate or absent sound-source localization cues over headphones, various techniques for modelling a virtual loudspeaker setup have been proposed. The idea is to superimpose sound-source localization cues onto each loudspeaker signal. This is achieved by filtering the audio signals with so-called head-related transfer functions (HRTFs) or, if the acoustic properties of the room are included in these measurements, binaural room impulse responses (BRIRs). However, filtering every loudspeaker signal with the aforementioned functions would require significantly more computational power at the decoder/playback side. In particular, the rendering of the multi-channel audio signal onto the "virtual" loudspeaker locations would have to be performed first, whereupon each loudspeaker signal thus obtained is filtered with the respective transfer function or impulse response in order to obtain the left and right channels of the binaural output signal.
Even worse, the binaural output signal thus obtained would have poor audio quality, because in order to obtain the virtual loudspeaker signals, a relatively large number of synthetic decorrelated signals would have to be mixed into the upmixed signals in order to compensate for the correlation between the originally uncorrelated input audio signals; this correlation is the result of downmixing the multiple input signals into the downmix signal.

In the current version of the SAOC codec, the SAOC parameter side information provides the user with interactive spatial rendering of the audio objects using any playback setup, including headphones. Binaural rendering on headphones allows spatial control of the virtual object positions in three-dimensional space by means of head-related transfer function (HRTF) parameters. For example, binaural rendering in SAOC could be realized by restricting it to the mono downmix SAOC case, where the input signals are mixed equally into one mono channel. Unfortunately, a mono downmix requires all audio signals to be mixed into a single mono downmix signal, so that the original correlation properties between the original audio signals are largely lost, and the rendering quality of the binaural output signal is therefore suboptimal.

Thus, the object of the present invention is to provide a scheme for binaural rendering of a multi-channel audio signal such that the binaural rendering is improved while, at the same time, restrictions on the freedom of creating the downmix signal from the original audio signals are removed.

This object is achieved by a device according to claim 1 and a method according to claim 10.

One of the basic ideas underlying the present invention is that starting binaural rendering of a multi-channel audio signal from a stereo downmix is preferable to starting binaural rendering of the multi-channel audio signal from a mono downmix, since some of the objects are present in the individual channels of the stereo downmix, the amount of decorrelation between the individual audio signals is better preserved, and the assignment of the audio signals to the two channels of the stereo downmix at the encoder side ensures that the correlation properties between the audio signals in different downmix channels are partially preserved. In other words, the downmixing at the encoder degrades the inter-object coherence, which has to be accounted for at the decoding side, where the inter-channel coherence of the binaural output signal is an important measure for the perception of the width of virtual sound sources; but using a stereo downmix instead of a mono downmix reduces the amount of degradation, so that the recovery/generation of the appropriate amount of inter-channel coherence by binaurally rendering the stereo downmix achieves better quality.

A further main idea of the present application is that the aforementioned ICC (ICC = inter-channel coherence) control may be achieved by means of a decorrelated signal forming a perceptual equivalent to a mono downmix of the channels of the stereo downmix, while being decorrelated to the mono downmix. Thus, while the use of a stereo downmix instead of a mono downmix preserves some of the correlation properties of the plurality of audio signals, which would have been lost with a mono downmix, the binaural rendering may be based on a decorrelated signal that is representative of both the first and second downmix channels, thereby reducing the number of decorrelations or synthetic-signal processings compared to decorrelating each stereo downmix channel separately.

Preferred embodiments of the present application are described in more detail below with reference to the drawings, in which:

Figure 1 shows a block diagram of an SAOC encoder/decoder arrangement in which embodiments of the present invention may be implemented;

Figure 2 shows a schematic and illustrative diagram of a spectral representation of a mono audio signal;

Figure 3 shows a block diagram of an audio decoder capable of binaural rendering according to an embodiment of the present invention;

Figure 4 shows a block diagram of the downmix preprocessing block of figure 3 according to an embodiment of the present invention;

Figure 5 shows a block diagram of the steps performed by the SAOC parameter processing unit 42 of figure 3 according to a first alternative; and

Figure 6 shows a graph illustrating the results of a listening test.

Before embodiments of the present invention are described in more detail below, the SAOC codec and the SAOC parameters transmitted in the SAOC bit stream are presented first, in order to ease the understanding of the specific embodiments outlined in further detail below.

Figure 1 shows a general arrangement of an SAOC encoder 10 and an SAOC decoder 12. The SAOC encoder 10 receives as input N objects, i.e. audio signals 14_1 to 14_N. In particular, the encoder 10 comprises a downmixer 16 which receives the audio signals 14_1 to 14_N and downmixes them into a downmix signal 18. In figure 1, the downmix signal is exemplarily shown as a stereo downmix signal. However, the encoder 10 and the decoder 12 may also operate in mono mode, in which case the downmix signal would be a mono downmix signal. The following description, however, concentrates on the stereo downmix case. The channels of the stereo downmix signal 18 are denoted L0 and R0.

In order to enable the SAOC decoder 12 to recover the individual objects 14_1 to 14_N, the downmixer 16 provides the SAOC decoder 12 with side information including SAOC parameters, namely object level differences (OLD), inter-object cross-correlation parameters (IOC), downmix gain values (DMG) and downmix channel level differences (DCLD). The side information 20 including the SAOC parameters, along with the downmix signal 18, forms the SAOC output data stream 21 received by the SAOC decoder 12.

The SAOC decoder 12 comprises an upmixer 22 which receives the downmix signal 18 as well as the side information 20 in order to recover and render the audio signals 14_1 to 14_N onto any user-selected set of channels 24_1 to 24_M', with the rendering prescribed by rendering information 26 input to the SAOC decoder 12, along with HRTF parameters 27 described in more detail below. The following description concentrates on binaural rendering, where M'=2 and the output signal is specifically dedicated for headphone reproduction, although the decoder 12 may also render onto other (non-binaural) loudspeaker configurations according to the user input 26.

The audio signals 14_1 to 14_N may be input to the downmixer 16 in any coding domain, such as the time or spectral domain. If the audio signals 14_1 to 14_N are fed into the downmixer 16 in the time domain, such as PCM (pulse code modulation) coded, the downmixer 16 uses a filter bank, such as a hybrid QMF (quadrature mirror filter) bank, e.g. a bank of complex exponentially modulated filters with a Nyquist filter extension for the lowest frequency bands in order to increase the frequency resolution therein, in order to transfer the signals into the spectral domain, in which the audio signals are represented in several subbands associated with different spectral portions at a specific filter-bank resolution. If the audio signals 14_1 to 14_N are already in the representation expected by the downmixer 16, it does not have to perform the spectral decomposition.

Figure 2 shows an audio signal in the just-mentioned spectral domain. As can be seen, the audio signal is represented as a plurality of subband signals. Each subband signal 30_1 to 30_P consists of a sequence of subband values, indicated by the small boxes 32. As can be seen, the subband values 32 of the subband signals 30_1 to 30_P are synchronized with each other in time, so that for each of the consecutive filter-bank time slots 34, each subband 30_1 to 30_P comprises exactly one subband value 32. As illustrated by the frequency axis 35, the subband signals 30_1 to 30_P are associated with different frequency regions, and as illustrated by the time axis 37, the filter-bank time slots 34 are arranged sequentially in time.

As outlined above, the downmixer 16 computes SAOC parameters from the input audio signals 14_1 to 14_N. The downmixer 16 performs this computation at a time/frequency resolution which may be decreased relative to the original time/frequency resolution as determined by the filter-bank time slots 34 and the subband decomposition, by a certain amount, this certain amount being signalled to the decoder side within the side information 20 by respective syntax elements bsFrameLength and bsFreqRes. For example, groups of consecutive filter-bank time slots 34 may form a frame 36. In other words, the audio signal may be divided into frames overlapping in time or immediately adjacent in time, for example. In this case, bsFrameLength may define the number of parameter time slots 38 per frame, i.e. the time units for which SAOC parameters such as OLD and IOC are computed within an SAOC frame 36, and bsFreqRes may define the number of processing frequency bands for which the SAOC parameters are computed, i.e. the number of bands into which the frequency domain is subdivided and for which the SAOC parameters are determined and transmitted. By this means, each frame is divided into time/frequency tiles, illustrated in figure 2 by the dashed lines 39.
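As a minimal illustration of this tiling (using NumPy and an even subdivision; the function and variable names are ours and not taken from the bitstream syntax, although num_param_slots and num_param_bands play the roles of bsFrameLength and bsFreqRes), the grouping of filter-bank time slots and subbands into parameter tiles may be sketched as follows:

```python
import numpy as np

def tile_indices(num_slots, num_bands, num_param_slots, num_param_bands):
    """Map filter-bank slots/bands onto coarser parameter tiles (l, m)."""
    slot_edges = np.linspace(0, num_slots, num_param_slots + 1, dtype=int)
    band_edges = np.linspace(0, num_bands, num_param_bands + 1, dtype=int)
    tiles = {}
    for l in range(num_param_slots):
        for m in range(num_param_bands):
            tiles[(l, m)] = (slice(slot_edges[l], slot_edges[l + 1]),
                             slice(band_edges[m], band_edges[m + 1]))
    return tiles

# 16 filter-bank time slots x 64 bands grouped into 4 parameter slots x 8 bands
tiles = tile_indices(16, 64, 4, 8)
```

Each slot/band pair of the filter-bank grid then falls into exactly one tile (l, m), over which the SAOC parameters below are accumulated.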

The downmixer 16 computes the SAOC parameters according to the following formulas. In particular, the downmixer 16 computes the object level difference for each object i as

OLD_i = \frac{\sum_n \sum_k x_i^{n,k} \left(x_i^{n,k}\right)^*}{\max_j \left( \sum_n \sum_k x_j^{n,k} \left(x_j^{n,k}\right)^* \right)}

where the sums over the indices n and k run over all filter-bank time slots 34 and all filter-bank subbands 30, respectively, belonging to a certain time/frequency tile 39. Thereby, the energies of all subband values x_i of an audio signal or object i are summed up and normalized to the highest energy value of that tile among all objects or audio signals.
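A minimal sketch of this computation for one tile (the array layout and the noise placeholder signals are our assumptions, not part of the standard):

```python
import numpy as np

# OLD per the formula above, for one time/frequency tile:
# x_tile holds complex subband values with shape (objects, slots, bands).
def object_level_differences(x_tile):
    energies = np.sum(np.abs(x_tile) ** 2, axis=(1, 2))  # sum over n and k
    return energies / np.max(energies)                   # normalize to loudest object

rng = np.random.default_rng(0)
x_tile = rng.standard_normal((3, 4, 8)) + 1j * rng.standard_normal((3, 4, 8))
old = object_level_differences(x_tile)
```

By construction, the dominant object of the tile has OLD = 1 and all others lie in (0, 1].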

Furthermore, the SAOC downmixer 16 is able to compute a similarity measure for the corresponding time/frequency tiles of pairs of different input objects 14_1 to 14_N. Although the SAOC downmixer 16 may compute the similarity measure between all pairs of input objects 14_1 to 14_N, the downmixer 16 may also suppress the signalling of the similarity measures or restrict the computation of the similarity measures to audio objects 14_1 to 14_N which form the left or right channels of a common stereo channel. In any case, the similarity measure is called the inter-object cross-correlation parameter IOC_{i,j}. It is computed as follows

IOC_{i,j} = IOC_{j,i} = \mathrm{Re} \left\{ \frac{\sum_n \sum_k x_i^{n,k} \left(x_j^{n,k}\right)^*}{\sqrt{ \sum_n \sum_k x_i^{n,k} \left(x_i^{n,k}\right)^* \; \sum_n \sum_k x_j^{n,k} \left(x_j^{n,k}\right)^* }} \right\}

where again the indices n and k run over all subband values belonging to a certain time/frequency tile 39, and i and j denote a certain pair of audio objects 14_1 to 14_N.
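A minimal sketch of the IOC formula for one object pair within a tile (flattened subband values; the noise signals are placeholders of our choosing):

```python
import numpy as np

# IOC per the formula above for one object pair (i, j) within a tile;
# xi and xj are the (flattened) complex subband values of the tile.
def ioc(xi, xj):
    num = np.sum(xi * np.conj(xj))
    den = np.sqrt(np.sum(np.abs(xi) ** 2) * np.sum(np.abs(xj) ** 2))
    return float(np.real(num / den))

rng = np.random.default_rng(1)
a = rng.standard_normal(256)
b = rng.standard_normal(256)
```

For identical objects, ioc(a, a) yields 1; for independent noise-like objects, the value lies near 0.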

The downmixer 16 downmixes the objects 14_1 to 14_N by means of gains applied to each object 14_1 to 14_N.

In the case of a stereo downmix signal, as shown in figure 1, a gain D_{1,i} is applied to object i, and all such weighted objects are summed up in order to obtain the left downmix channel L0; likewise, a gain D_{2,i} is applied to object i, and the thus weighted objects are summed up in order to obtain the right downmix channel R0. The factors D_{1,i} and D_{2,i} thus form a downmix matrix D of size 2×N:

D = \begin{pmatrix} D_{1,1} & \cdots & D_{1,N} \\ D_{2,1} & \cdots & D_{2,N} \end{pmatrix} \quad \text{and} \quad \begin{pmatrix} L0 \\ R0 \end{pmatrix} = D \begin{pmatrix} Obj_1 \\ \vdots \\ Obj_N \end{pmatrix}

This downmix prescription is signalled to the decoder side by means of the downmix gains DMG_i and, in the case of a stereo downmix signal, the downmix channel level differences DCLD_i.

The downmix gains are computed according to:

DMG_i = 10 \log_{10} \left( D_{1,i}^2 + D_{2,i}^2 + \varepsilon \right),

where ε is a small number such as 10^{-9}, or 96 dB below the maximum signal input.

For the DCLD_i, the following formula applies:

DCLD_i = 10 \log_{10} \left( \frac{D_{1,i}^2}{D_{2,i}^2} \right).
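The two formulas above may be sketched as follows for all objects at once. Note that, as our own choice and unlike the DCLD formula as stated, an ε is also added inside the DCLD ratio to keep the logarithm finite for objects absent from one channel:

```python
import numpy as np

# Downmix gains and channel level differences from a 2xN downmix matrix D.
def downmix_side_info(D, eps=1e-9):
    dmg = 10 * np.log10(D[0] ** 2 + D[1] ** 2 + eps)
    dcld = 10 * np.log10((D[0] ** 2 + eps) / (D[1] ** 2 + eps))
    return dmg, dcld

D = np.array([[1.0, 0.5, 0.0],    # object 1 panned left, 2 centre, 3 right
              [0.0, 0.5, 1.0]])
dmg, dcld = downmix_side_info(D)
```

For the centre-panned object, DCLD is 0 dB (equal contribution to both channels), while the hard-panned objects show large positive or negative DCLD values.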

The downmixer 16 generates the stereo downmix signal according to:

\begin{pmatrix} L0 \\ R0 \end{pmatrix} = \begin{pmatrix} D_1 \\ D_2 \end{pmatrix} \begin{pmatrix} Obj_1 \\ \vdots \\ Obj_N \end{pmatrix}

Thus, in the above formulas, the parameters OLD and IOC are a function of the audio signals, and the parameters DMG and DCLD are a function of D. Incidentally, it is noted that D may vary in time.
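A minimal sketch of applying the downmix matrix (the object signals are noise placeholders of our own choosing, not real audio objects):

```python
import numpy as np

# Applying the 2xN downmix matrix D to N object signals yields the two
# downmix channels L0 and R0.
rng = np.random.default_rng(2)
objects = rng.standard_normal((3, 1024))      # N = 3 placeholder objects
D = np.array([[1.0, 0.5, 0.0],
              [0.0, 0.5, 1.0]])
L0, R0 = D @ objects                          # stereo downmix channels
```

Each downmix channel is simply the gain-weighted sum of the objects assigned to it.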

In the case of binaural rendering described here, the output signal of the decoder's operating mode naturally comprises two channels, i.e. M'=2. Nevertheless, the aforementioned rendering information 26 indicates how the input signals 14_1 to 14_N are to be distributed onto the virtual loudspeaker positions 1 to M, where M may be greater than 2. The rendering information may thus comprise a rendering matrix M_ren indicating how the input objects Obj_i are to be distributed onto the virtual loudspeaker positions j in order to obtain the virtual loudspeaker signals vs_j, with j being between 1 and M inclusive and i being between 1 and N inclusive, with

\begin{pmatrix} vs_1 \\ \vdots \\ vs_M \end{pmatrix} = M_{ren} \begin{pmatrix} Obj_1 \\ \vdots \\ Obj_N \end{pmatrix}

The rendering information may be provided or input by the user in any way. It is even possible that the rendering information 26 is contained within the side information of the SAOC stream 21 itself. Of course, the rendering information may be allowed to vary in time. For instance, the time resolution may equal the frame resolution, i.e. M_ren may be defined per frame 36. Even a variation of M_ren by frequency is possible. For example, M_ren could be defined for each tile 39. Below, M_ren^{l,m} will be used to denote M_ren, with m denoting the frequency band and l denoting the parameter time slot 38.
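A minimal sketch of such a rendering matrix (the panning weights below are arbitrary illustrations chosen by us, not values prescribed by the standard):

```python
import numpy as np

# A rendering matrix M_ren distributing N = 3 objects onto M = 4 virtual
# loudspeaker positions.
M_ren = np.array([[1.0, 0.0, 0.0],   # object 1 -> loudspeaker 1
                  [0.0, 0.7, 0.0],   # object 2 split between 2 and 3
                  [0.0, 0.7, 0.0],
                  [0.0, 0.0, 1.0]])  # object 3 -> loudspeaker 4
objects = np.ones((3, 8))            # placeholder object signals
vs = M_ren @ objects                 # one signal per virtual loudspeaker
```

Each row of M_ren gives the mixing weights of one virtual loudspeaker position, so the same object can contribute to several positions.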

Finally, the HRTFs (head-related transfer functions) 27 are mentioned below. These HRTFs describe how a virtual loudspeaker signal j is to be rendered onto the left and right ear, respectively, so that the binaural cues are preserved. In other words, for each virtual loudspeaker position j, two HRTFs exist, namely one for the left ear and one for the right ear. As will be described in more detail below, it is possible that the decoder is provided with HRTF parameters 27 which comprise, for each virtual loudspeaker position j, a phase shift offset F_j describing the phase shift offset between the signals received by both ears and stemming from the same source j, and two amplitude attenuation factors P_{j,R} and P_{j,L} for the right and left ear, respectively, describing the attenuations of both signals due to the head of the listener. The HRTF parameters 27 could be constant over time but are defined at some frequency resolution, which could be equal to the SAOC parameter resolution, i.e. per frequency band. In the following, the HRTF parameters are given as F_j^m, P_{j,R}^m and P_{j,L}^m, with m denoting the frequency band.

Figure 3 shows the SAOC decoder 12 of figure 1 in more detail. As shown there, the decoder 12 comprises a downmix preprocessing block 40 and an SAOC parameter processing unit 42. The downmix preprocessing block 40 is configured to receive the stereo downmix signal 18 and to convert it into the binaural output signal 24. The downmix preprocessing block 40 performs this conversion in a manner controlled by the SAOC parameter processing unit 42. In particular, the SAOC parameter processing unit 42 provides the downmix preprocessing block 40 with rendering prescription information 44, which the SAOC parameter processing unit 42 derives from the SAOC side information 20 and the rendering information 26.

Figure 4 shows the downmix preprocessing block 40 in accordance with an embodiment of the present invention in more detail. In particular, in accordance with figure 4, the downmix preprocessing block 40 comprises two parallel paths between the input at which the stereo downmix signal 18, i.e. X^{n,k}, is received and the output of block 40 at which the binaural output signal \hat{X}^{n,k} is output, namely a path called the dry path 46, into which a dry rendering unit 47 is serially connected, and a wet path 48, into which a decorrelated signal generator 50 and a wet rendering unit 52 are serially connected, with a mixing stage 53 mixing the outputs of both paths 46 and 48 in order to obtain the final result, namely the binaural output signal 24.

As will be described in more detail below, the dry rendering unit 47 is configured to compute a preliminary binaural output signal 54 from the stereo downmix signal 18, with the preliminary binaural output signal 54 representing the output of the dry path 46. The dry rendering unit 47 performs its computation based on a dry rendering prescription presented by the SAOC parameter processing unit 42. In the specific embodiments described below, this rendering prescription is defined by a dry rendering matrix G^{n,k}. This provision is illustrated in figure 4 by means of a dashed arrow.

The decorrelated signal generator 50 is configured to generate a decorrelated signal X_d^{n,k} from the stereo downmix signal 18 by downmixing, such that it is a perceptual equivalent to the mono downmix 58 of the right and left channels of the stereo downmix signal 18, while being decorrelated to the mono downmix 58. As shown in figure 4, the decorrelated signal generator 50 may comprise an adder 56 for summing the left and right channels of the stereo downmix signal 18 at, for example, a ratio of 1:1 or some other fixed ratio, in order to obtain the corresponding mono downmix 58, followed by a decorrelator 60 for generating the aforementioned decorrelated signal X_d^{n,k}. The decorrelator 60 may, for example, comprise one or more delay stages in order to form the decorrelated signal X_d^{n,k} from a delayed version or a weighted sum of delayed versions of the mono downmix 58, or even a weighted sum of the mono downmix 58 and the delayed version(s) of the mono downmix. Of course, there are many alternatives for the decorrelator 60. In effect, the decorrelation performed by the decorrelator 60 and the decorrelated signal generator 50, respectively, tends to lower the inter-channel coherence between the decorrelated signal 62 and the mono downmix 58, when measured by the above formula corresponding to the inter-object cross-correlation, while substantially maintaining their object level differences, when measured by the above formula for the object level differences.
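A minimal time-domain sketch of this wet-path front end, assuming the simplest delay-stage variant mentioned above (real decorrelators typically use all-pass structures; the single delay here is only an illustration, and the signals are noise placeholders):

```python
import numpy as np

# The adder 56 sums the two downmix channels at a 1:1 ratio, and a single
# delay stage serves as the decorrelator 60.
def mono_and_decorrelated(stereo_downmix, delay=64):
    mono = stereo_downmix[0] + stereo_downmix[1]
    wet = np.zeros_like(mono)
    wet[delay:] = mono[:-delay]      # delayed copy of the mono downmix
    return mono, wet

rng = np.random.default_rng(3)
x = rng.standard_normal((2, 4096))
mono, wet = mono_and_decorrelated(x)
rho = np.corrcoef(mono, wet)[0, 1]   # near zero for a noise-like input
```

For a noise-like input, the delayed copy has nearly the same energy as the mono downmix but a correlation with it near zero, which is exactly the behaviour asked of the decorrelator.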

The wet rendering unit 52 is configured to compute a corrective binaural output signal 64 from the decorrelated signal 62, the thus obtained corrective binaural output signal 64 representing the output of the wet path 48. The wet rendering unit 52 bases its computation on a wet rendering prescription which, in turn, depends on the dry rendering prescription used by the dry rendering unit 47, as described below. Accordingly, the wet rendering prescription, which is indicated as P_2^{n,k} in figure 4, is obtained from the SAOC parameter processing unit 42, as indicated by the dashed arrow in figure 4.

The mixing stage 53 mixes both binaural output signals 54 and 64 of the dry and wet paths 46 and 48 in order to obtain the final binaural output signal 24. As shown in figure 4, the mixing stage 53 is configured to mix the left and right channels of the binaural output signals 54 and 64 individually and may, accordingly, comprise an adder 66 for summing their left channels and an adder 68 for summing their right channels, respectively.

Having described the structure of the SAOC decoder 12 and the internal structure of the downmix preprocessing block 40, their functionality is described below. In particular, the detailed embodiments described below present different alternatives for the SAOC parameter processing unit 42 to compute the rendering prescription information 44, thereby controlling the inter-channel coherence of the binaural output signal 24. In other words, the SAOC parameter processing unit 42 not only computes the rendering prescription information 44 but, at the same time, controls the mixing ratio by which the preliminary and corrective binaural signals 54 and 64 are mixed into the final binaural output signal 24.

In accordance with a first alternative, the SAOC parameter processing unit 42 is configured to control the just-mentioned mixing ratio as shown in figure 5. In particular, at step 80, an actual binaural inter-channel coherence value of the preliminary binaural output signal 54 is determined or estimated by unit 42. At step 82, the SAOC parameter processing unit 42 determines a target binaural inter-channel coherence value. Based on these thus determined inter-channel coherence values, the SAOC parameter processing unit 42 sets the aforementioned mixing ratio at step 84. In particular, step 84 may comprise the SAOC parameter processing unit 42 appropriately computing both the dry rendering prescription used by the dry rendering unit 47 and the wet rendering prescription used by the wet rendering unit 52, respectively, based on the inter-channel coherence values determined at steps 80 and 82, respectively.
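A minimal sketch of the coherence measure behind steps 80 and 82 (a magnitude-normalized cross-correlation between the two channels of a binaural signal; the exact estimator used by the standard may differ, and the noise signals are placeholders of our choosing):

```python
import numpy as np

# Inter-channel coherence (ICC) of a two-channel signal; the parameter
# processing compares such an actual value against a target value before
# setting the dry/wet mixing ratio.
def icc(left, right):
    num = np.abs(np.sum(left * np.conj(right)))
    den = np.sqrt(np.sum(np.abs(left) ** 2) * np.sum(np.abs(right) ** 2))
    return float(num / den)

rng = np.random.default_rng(5)
l_ch = rng.standard_normal(2048)
r_ch = rng.standard_normal(2048)
```

Fully coherent channels give an ICC of 1, independent noise-like channels give a value near 0, and the mixing ratio is chosen so that the output ICC moves from the actual towards the target value.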

In the following, the aforementioned alternatives are described on a mathematical basis. The alternatives differ from each other in the way the SAOC parameter processing unit 42 determines the rendering prescription information 44, including the dry rendering prescription and the wet rendering prescription, thereby controlling the mixing ratio between the dry and wet paths 46 and 48. In accordance with the first alternative shown in figure 5, the SAOC parameter processing unit 42 determines a target binaural inter-channel coherence value. As will be described in more detail below, unit 42 may perform this determination based on components of a target coherence matrix F = A·E·A*, where "*" denotes the conjugate transpose of a matrix, A is a target binaural rendering matrix relating the objects/audio signals 1...N to the right and left channels of the binaural output signal 24 and the preliminary binaural output signal 54, respectively, and is derived from the rendering information 26 and the HRTF parameters 27, and E is a matrix whose coefficients are derived from the IOC_{ij}^{l,m} and the object level differences OLD_i^{l,m}. The computation may be performed at the spectral/time resolution of the SAOC parameters, i.e. for each (l, m). However, the computation may also be performed at some lower resolution, with interpolation between the respective results. The latter statement also applies to the subsequent computations set out below.

As the target binaural rendering matrix A relates the input objects 1...N to the left and right channels of the binaural output signal 24 and the preliminary binaural output signal 54, respectively, it is of size 2×N, i.e.

A = \begin{pmatrix} a_{11} & \cdots & a_{1N} \\ a_{21} & \cdots & a_{2N} \end{pmatrix}

The aforementioned matrix E is of size N×N, with its coefficients defined as

e_{ij} = \sqrt{OLD_i \, OLD_j} \; \max(IOC_{ij}, 0)

Thus, the matrix E, with

E = \begin{pmatrix} e_{11} & \cdots & e_{1N} \\ \vdots & \ddots & \vdots \\ e_{N1} & \cdots & e_{NN} \end{pmatrix}

has along its diagonal the object level differences, i.e.

e_{ii} = OLD_i

since IOC_{ij}=1 for i=j, whereas outside its diagonal the matrix E has coefficients representing the geometric mean of the object level differences of objects i and j, respectively, weighted by the inter-object cross-correlation measure IOC_{ij} (provided it is greater than 0, with the coefficients being set to 0 otherwise).
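A minimal sketch of this construction of E for one tile, following the coefficient formula above (the OLD and IOC values are arbitrary illustrations):

```python
import numpy as np

# Build E from the OLDs and the IOC matrix; negative IOCs are clipped to 0.
def build_E(old, ioc_matrix):
    return np.sqrt(np.outer(old, old)) * np.maximum(ioc_matrix, 0.0)

old = np.array([1.0, 0.25])
ioc_matrix = np.array([[1.0, -0.3],
                       [-0.3, 1.0]])   # a negatively correlated pair
E = build_E(old, ioc_matrix)
```

The diagonal of E carries the OLDs (since IOC_{ii}=1), and the negative off-diagonal IOC is clipped to zero.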

Compared thereto, the second and third alternatives described below seek to obtain the rendering matrices by finding a best match, in the least-squares sense, of the equation mapping the stereo downmix signal 18 onto the preliminary binaural output signal 54 by means of the dry rendering matrix G, to the target rendering equation mapping the input objects, by means of the target binaural rendering matrix A, onto the "target" binaural output signal 24, with the second and third alternatives differing from each other in the way the best match is formed and in the way the wet rendering matrix is chosen.

To facilitate understanding of the following alternatives, the above description of figure 3 and 4 repeatedly described mathematically. As described above, the stereo down-mix 18 Xn,kreaches the SAOC decoder 12 along with the SAOC parameters 20 and user-defined information visualization 26. Further, the SAOC decoder 12 and the processing unit SAOC parameter 42, respectively, have access to the database HRTF, as indicated by the arrow 27. Transferred to the SAOC parameters include the difference between the levels of the objectOLDil,m, value of interobjective cross-correlationIOCijl,mthe coefficients reinforced what I down-mixing DMGil,mand the difference between the levels of channel down-mixDCLDil,mfor all N objects i, j in l, m, denoting the corresponding temporal/spectral element 39, with l representing time, and m, defines the frequency. For example, it is assumed that the HRTF parameters 27 is represented asPq,Lm,Pq,RmandFqmfor all positions of the virtual loudspeaker or the position of the virtual spatial sound source q, for the left (L) and right (R) binaural channel and for all frequency bands m.

The downmix preprocessing unit 40 is configured to compute the binaural output X̂^{n,k} as a combination of the stereo downmix X^{n,k} and the decorrelated mono downmix X_d^{n,k} as

X̂^{n,k} = G^{n,k} X^{n,k} + P_2^{n,k} X_d^{n,k}

The decorrelated signal X_d^{n,k} is perceptually equivalent to the mono downmix 58 of the left and right channels of the stereo downmix signal 18, but maximally decorrelated from it:

X_d^{n,k} = decorrFunction( (1 1) X^{n,k} )

With reference to figure 4, the decorrelated signal generator 50 performs the function decorrFunction of the above formula.
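The document does not prescribe a particular decorrFunction; MPEG Surround, for example, uses lattice all-pass filters. Purely as an illustration of the idea, namely producing an output of equal energy but low correlation with the mono downmix, a toy frequency-domain all-pass with random phases could look as follows; this is an assumption-laden sketch, not the normative decorrelator:

```python
import numpy as np

def decorr_function(x_mono, seed=0):
    # Toy decorrelator (illustrative only): apply a random unit-magnitude
    # phase to each FFT bin, which preserves the magnitude spectrum (and
    # hence the energy) while destroying the waveform correlation.
    rng = np.random.default_rng(seed)
    X = np.fft.rfft(x_mono)
    phase = np.exp(1j * rng.uniform(0.0, 2.0 * np.pi, X.shape))
    phase[0] = 1.0                     # keep the DC bin real
    if x_mono.size % 2 == 0:
        phase[-1] = 1.0                # keep the Nyquist bin real
    return np.fft.irfft(X * phase, n=x_mono.size)

x = np.random.default_rng(3).standard_normal(1024)   # stand-in mono downmix
xd = decorr_function(x)
```

By Parseval's theorem the all-pass preserves the signal energy exactly, while the normalized correlation between input and output stays close to zero.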

Further, as described above, the downmix preprocessing unit 40 comprises two parallel paths 46 and 48. Accordingly, the above equation is based on two time/frequency-dependent matrices, namely G^{l,m} for the dry path and P_2^{l,m} for the wet path.

As shown in figure 4, the decorrelation on the wet path may be achieved by feeding the sum of the left and right downmix channels to a decorrelator 60, 62, which generates a signal that is perceptually equivalent to, but maximally decorrelated from, its input 58.

The elements of the matrices just mentioned are computed by the SAOC parameter processing unit 42. As also indicated above, they may be computed at the time/frequency resolution of the SAOC parameters, i.e. for each time slot l and each processing band m. The matrix elements thus obtained may be spread over frequency and interpolated over time, resulting in matrices G^{n,k} and P_2^{n,k} defined for all filter-bank time slots n and frequency sub-bands k. However, as mentioned above, alternatives exist. For example, the interpolation could be omitted, so that in the above equation the indices n, k could effectively be replaced by l, m. Moreover, the computation of the matrix elements could even be performed at a reduced time/frequency resolution, with interpolation onto the resolution l, m or n, k. Thus, again, although in the following the indices l, m indicate that the matrix computations are performed for each tile 39, the computation may be performed at some lower resolution, whereby the matrices applied by the downmix preprocessing unit 40 may be interpolated onto a finer resolution, such as the time/frequency resolution of the individual QMF sub-band values 32.

According to the above-mentioned first alternative, the dry rendering matrix G^{l,m} is computed for the left and right downmix channels separately, so that

G^{l,m} = ( P_L^{l,m,1} cos(β^{l,m} + α^{l,m}) exp(j φ^{l,m,1}/2)    P_L^{l,m,2} cos(β^{l,m} + α^{l,m}) exp(j φ^{l,m,2}/2)
            P_R^{l,m,1} cos(β^{l,m} − α^{l,m}) exp(−j φ^{l,m,1}/2)   P_R^{l,m,2} cos(β^{l,m} − α^{l,m}) exp(−j φ^{l,m,2}/2) )

The corresponding gains P_L^{l,m,x}, P_R^{l,m,x} and the phase differences φ^{l,m,x} are defined as

P_L^{l,m,x} = sqrt( f_11^{l,m,x} / V^{l,m,x} ),    P_R^{l,m,x} = sqrt( f_22^{l,m,x} / V^{l,m,x} ),

φ^{l,m,x} = arg( f_12^{l,m,x} )  if  m ≤ const1  and  |f_12^{l,m,x}| / sqrt( f_11^{l,m,x} f_22^{l,m,x} ) > const2,
φ^{l,m,x} = 0  otherwise,

where const1 may, for example, be 11 and const2 may be 0.6. The index x denotes the left or right downmix channel and accordingly assumes the value 1 or 2.

In essence, the above provision distinguishes between a higher spectral range and a lower spectral range, and the phase is (potentially) applied only in the lower spectral range. Additionally or alternatively, the provision depends on whether the actual binaural inter-channel coherence value and the target binaural inter-channel coherence value have a predetermined relationship to a coherence threshold or not, the phase being potentially applied only if the coherence exceeds the threshold. The individual sub-conditions just mentioned may, as indicated above, be combined.

The scalar V^{l,m,x} is computed as

V^{l,m,x} = D^{l,m,x} E^{l,m} (D^{l,m,x})* + ε.

It is noted that ε may be the same as or different from the ε mentioned above in connection with the definition of the downmix gains. The matrix E has already been introduced above. The index (l, m) simply denotes the time/frequency dependence of the matrix computation, as mentioned above. Further, the matrices D^{l,m,x} were mentioned above in connection with the definition of the downmix gains and the downmix channel level differences, with D^{l,m,1} corresponding to the aforementioned D^1 and D^{l,m,2} corresponding to the aforementioned D^2.

However, to ease the understanding of how the SAOC parameter processing unit 42 derives the dry rendering matrix G^{l,m} from the received SAOC parameters, the correspondence between the downmix channel matrices D^{l,m,x} and the downmix prescription, comprising the downmix gains DMG_i^{l,m} and DCLD_i^{l,m}, is presented again, this time in the reverse direction. In particular, the elements d_i^{l,m,x} of the downmix channel matrix D^{l,m,x} of size 1×N, that is D^{l,m,x} = (d_1^{l,m,x}, ..., d_N^{l,m,x}), are given as

d_i^{l,m,1} = 10^{DMG_i^{l,m}/20} · sqrt( d̃_i^{l,m} / (1 + d̃_i^{l,m}) ),    d_i^{l,m,2} = 10^{DMG_i^{l,m}/20} · sqrt( 1 / (1 + d̃_i^{l,m}) ),

with d̃_i^{l,m} defined as

d̃_i^{l,m} = 10^{DCLD_i^{l,m}/10}.
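The recovery of the channel gains from DMG and DCLD can be sketched numerically as follows; the dB values are illustrative only:

```python
import numpy as np

# Hypothetical transmitted parameters for three objects:
DMG = np.array([0.0, -3.0, -6.0])      # downmix gains in dB
DCLD = np.array([0.0, 6.0, -6.0])      # downmix channel level differences in dB

d_tilde = 10.0 ** (DCLD / 10.0)
g = 10.0 ** (DMG / 20.0)
d1 = g * np.sqrt(d_tilde / (1.0 + d_tilde))   # gain of object i into channel 1
d2 = g * np.sqrt(1.0 / (1.0 + d_tilde))       # gain of object i into channel 2
```

By construction d1² + d2² = g², i.e. DMG fixes the total downmix energy of an object while DCLD distributes that energy between the two channels.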

In the above equation for G^{l,m}, the gains P_L^{l,m,x} and P_R^{l,m,x} and the phase differences φ^{l,m,x} depend on the coefficients f of the channel-individual target covariance matrix F^{l,m,x} of channel x, which in turn, as set out in more detail below, depends on the matrix E^{l,m,x} of size N×N, whose coefficients e_ij^{l,m,x} are computed as

e_ij^{l,m,x} = e_ij^{l,m} · ( d_i^{l,m,x} / (d_i^{l,m,1} + d_i^{l,m,2}) ) · ( d_j^{l,m,x} / (d_j^{l,m,1} + d_j^{l,m,2}) ).

The coefficients e_ij^{l,m} of the matrix E^{l,m} of size N×N are, as indicated above, given as

e_ij^{l,m} = sqrt( OLD_i^{l,m} OLD_j^{l,m} ) · max( IOC_ij^{l,m}, 0 ).

The just-mentioned target covariance matrix F^{l,m,x} of size 2×2 with elements f_uv^{l,m,x}, analogous to the covariance matrix F above, is given as

F^{l,m,x} = A^{l,m} E^{l,m,x} (A^{l,m})*,

where "*" denotes the conjugate transpose.

The target binaural rendering matrix A^{l,m} is derived from the HRTF parameters φ_q^m, P_{q,R}^m and P_{q,L}^m for all N_HRTF virtual loudspeaker positions q and from the rendering matrix M_ren^{l,m}, and is of size 2×N. Its elements a_{u,i}^{l,m} define the desired relation between all objects i and the binaural signal as

a_{1,i}^{l,m} = Σ_{q=0}^{N_HRTF−1} m_{q,i}^{l,m} P_{q,L}^m exp( j φ_q^m / 2 ),    a_{2,i}^{l,m} = Σ_{q=0}^{N_HRTF−1} m_{q,i}^{l,m} P_{q,R}^m exp( −j φ_q^m / 2 ).

The rendering matrix M_ren^{l,m} with elements m_{q,i}^{l,m} relates each audio object i to the virtual loudspeaker q represented by an HRTF.

The wet upmix matrix P_2^{l,m} is computed based on the matrix G^{l,m} as

P_2^{l,m} = ( P_L^{l,m} sin(β^{l,m} + α^{l,m}) exp( j arg(c_12^{l,m})/2 )
              P_R^{l,m} sin(β^{l,m} − α^{l,m}) exp( −j arg(c_12^{l,m})/2 ) )

The gains P_L^{l,m} and P_R^{l,m} are defined as

P_L^{l,m} = sqrt( c_11^{l,m} / V^{l,m} ),    P_R^{l,m} = sqrt( c_22^{l,m} / V^{l,m} ).

The covariance matrix C^{l,m} of size 2×2, with elements c_uv^{l,m}, of the dry binaural signal 54 is estimated as

C^{l,m} = Ĝ^{l,m} D^{l,m} E^{l,m} (D^{l,m})* (Ĝ^{l,m})*,

where Ĝ^{l,m} denotes the dry rendering matrix without the rotator angles, that is,

Ĝ^{l,m} = ( P_L^{l,m,1} exp( j φ^{l,m,1}/2 )    P_L^{l,m,2} exp( j φ^{l,m,2}/2 )
            P_R^{l,m,1} exp( −j φ^{l,m,1}/2 )   P_R^{l,m,2} exp( −j φ^{l,m,2}/2 ) )

The scalar V^{l,m} is computed as

V^{l,m} = W^{l,m} E^{l,m} (W^{l,m})* + ε,

where the elements w_i^{l,m} of the wet mono downmix matrix W^{l,m} of size 1×N are given as

w_i^{l,m} = d_i^{l,m,1} + d_i^{l,m,2},

and the elements d_{x,i}^{l,m} of the stereo downmix matrix D^{l,m} of size 2×N are given as

d_{x,i}^{l,m} = d_i^{l,m,x}.

In the above equation for G^{l,m}, α^{l,m} and β^{l,m} are rotator angles intended for ICC (inter-channel coherence) control. In particular, the rotator angle α^{l,m} controls the mixing of the dry and wet binaural signals in order to adjust the ICC of the binaural output signal 24 to the binaural target ICC. When setting the rotator angles, the ICC of the dry binaural signal 54 should be taken into account, which, depending on the audio content and the stereo downmix matrix D, is typically smaller than 1.0 and greater than the target ICC. This is in contrast to mono-downmix-based binaural rendering, where the ICC of the dry binaural signal would always equal 1.0.

The rotator angles α^{l,m} and β^{l,m} control the mixing of the dry and the wet binaural signal. The ICC ρ_C^{l,m} of the dry binaural rendered stereo downmix 54 is, in step 80, estimated as

ρ_C^{l,m} = min( |c_12^{l,m}| / sqrt( c_11^{l,m} c_22^{l,m} ), 1 ).

The overall binaural target ICC ρ_T^{l,m} is, in step 82, estimated as, or determined to be,

ρ_T^{l,m} = min( |f_12^{l,m}| / sqrt( f_11^{l,m} f_22^{l,m} ), 1 ).

The rotator angles α^{l,m} and β^{l,m} for minimizing the energy of the wet signal are then, in step 84, set as

α^{l,m} = ½ ( arccos( ρ_T^{l,m} ) − arccos( ρ_C^{l,m} ) ),

β^{l,m} = arctan( tan( α^{l,m} ) · ( P_R^{l,m} − P_L^{l,m} ) / ( P_L^{l,m} + P_R^{l,m} ) ).
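The two closing formulas can be sketched directly; the scalar inputs below are hypothetical, whereas in the decoder these quantities are computed per tile (l, m):

```python
import numpy as np

def rotator_angles(rho_T, rho_C, P_L, P_R):
    # alpha controls the dry/wet mix needed to move the ICC of the output
    # from the dry value rho_C to the target value rho_T; beta distributes
    # the wet signal between the two channels according to the gains.
    alpha = 0.5 * (np.arccos(rho_T) - np.arccos(rho_C))
    beta = np.arctan(np.tan(alpha) * (P_R - P_L) / (P_L + P_R))
    return alpha, beta

# If the dry ICC already equals the target, no wet signal is required:
a0, b0 = rotator_angles(0.8, 0.8, 1.0, 1.0)
# A target ICC below the dry ICC yields a positive alpha (more decorrelation):
a1, _ = rotator_angles(0.2, 0.8, 1.0, 1.0)
```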

Thus, according to the mathematical description just presented of the functionality of the SAOC decoder 12 for generating the binaural output signal 24, the SAOC parameter processing unit 42 computes, when determining the actual binaural ICC in step 80, ρ_C^{l,m} using the above equation for ρ_C^{l,m} and the auxiliary equations presented above. Similarly, the SAOC parameter processing unit 42 computes, when determining the binaural target ICC in step 82, ρ_T^{l,m} using the above equations and auxiliary equations. Based thereon, the SAOC parameter processing unit 42 determines, in step 84, the rotator angles, thus setting the mixing ratio between the dry and wet rendering paths. With these rotator angles, the SAOC parameter processing unit 42 builds the dry and wet rendering matrices, or upmix parameters, G^{l,m} and P_2^{l,m}, which in turn are used by the downmix preprocessing unit 40, at resolution n, k, to derive the binaural output signal 24 from the stereo downmix 18.

It should be noted that the above-described first alternative may be varied in certain ways. For example, the above equation for the inter-channel phase difference φ^{l,m,x} could be changed to the extent that the second sub-condition compares the actual ICC of the dry binaural rendered stereo downmix with const2, rather than the ICC determined from the channel-individual covariance matrix F^{l,m,x}, so that in the respective equation the term |f_12^{l,m,x}| / sqrt( f_11^{l,m,x} f_22^{l,m,x} ) would be replaced by the term |c_12^{l,m}| / sqrt( c_11^{l,m} c_22^{l,m} ).

Further, it should be noted that, in accordance with the notation chosen in some of the above equations, wherever a scalar constant such as ε is added to a matrix, this is meant such that the constant is added to each coefficient of the respective matrix.

An alternative generation of the dry rendering matrix with a higher potential for object extraction is based on joint processing of the left and right downmix channels. Omitting the sub-band pair indices for clarity, the principle is to aim for a best match, in the least-squares sense, of the equation

X̂ = G X

to the target rendering

Y = A S.

This yields the target covariance matrix

Y Y* = A S S* A*,

where the complex-valued target binaural rendering matrix A is given by the previous formula, and the matrix S contains the original object sub-band signals as rows.

The least-squares match is computed from second-order information derived from the transmitted object and downmix data. That is, the following substitutions are made:

X X* ↔ D E D*,

Y X* ↔ A E D*,

Y Y* ↔ A E A*.

To motivate these substitutions, recall that the SAOC object parameters typically carry information about the object energies (OLD) and the (selected) inter-object cross-correlations (IOC). From these parameters, the object covariance matrix E of size N×N is derived, which represents an approximation to S S*, that is, E ≈ S S*, which yields Y Y* = A E A*.

Further, X = D S, and the downmix covariance matrix becomes

X X* = D S S* D*,

which again can be derived from E via X X* = D E D*.

The dry rendering matrix G is obtained by solving the least-squares problem

min { norm( Y − G X ) },

whose solution is

G = G_0 = Y X* ( X X* )^{−1},

where Y X* is computed as Y X* = A E D*.

Thus, the dry rendering unit determines the binaural output X̂ from the downmix signal X with the 2×2 upmix matrix G by means of X̂ = G X, and the SAOC parameter processing unit 42 determines G using the above formulas to be

G = A E D* ( D E D* )^{−1}.
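The least-squares solution can be checked numerically with stand-in matrices; here A and D are random and E is set to the identity for simplicity, so none of these values come from the standard:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 4
A = rng.standard_normal((2, N)) + 1j * rng.standard_normal((2, N))  # stand-in target rendering matrix
D = rng.random((2, N))                                              # stand-in downmix matrix
E = np.eye(N)                                                       # object covariance (identity for simplicity)

# G0 = A E D* (D E D*)^{-1}
G0 = A @ E @ D.conj().T @ np.linalg.inv(D @ E @ D.conj().T)

# The missing error covariance Delta_R = A E A* - G0 D E D* G0* is the
# part of the target covariance the dry path alone cannot reproduce.
dR = A @ E @ A.conj().T - G0 @ (D @ E @ D.conj().T) @ G0.conj().T
```

As the text below states, the residual covariance of this projection is Hermitian and positive semi-definite.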

This settles the complex-valued dry rendering matrix. The complex-valued wet rendering matrix P, previously denoted P_2, is computed in the SAOC parameter processing unit 42 by considering the missing error covariance matrix

ΔR = Y Y* − G_0 X X* G_0*.

It can be shown that this matrix is positive semi-definite, and the preferred choice of P is made by selecting the unit-norm eigenvector u corresponding to the largest eigenvalue λ of ΔR and scaling it according to

P = sqrt( λ / V ) · u,

where the scalar V is computed as noted above, that is, V = W E W* + ε.

In other words, since the wet path is meant to correct the correlation of the dry solution, ΔR = A E A* − G_0 D E D* G_0* is the missing error covariance matrix, that is, Y Y* = X̂ X̂* + ΔR, and the SAOC parameter processing unit 42 therefore determines P such that V P P* reproduces ΔR as closely as possible, for which a solution is given by the above choice of the unit-norm eigenvector u.
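The eigenvector-based choice of P can be sketched with a made-up 2×2 error covariance; V stands in for the scalar W E W* + ε:

```python
import numpy as np

dR = np.array([[2.0, 0.5],
               [0.5, 1.0]])        # example positive Hermitian Delta_R
V = 1.0                            # placeholder for the scalar W E W* + eps

lam, U = np.linalg.eigh(dR)        # eigenvalues in ascending order
u = U[:, -1]                       # unit-norm eigenvector of the largest eigenvalue
P = np.sqrt(lam[-1] / V) * u       # wet rendering vector

# V * P P* is the best rank-one approximation of Delta_R, i.e. the closest
# covariance contribution a single decorrelator output can provide.
approx = V * np.outer(P, P.conj())
```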

The third way of generating the dry and wet rendering matrices estimates the rendering parameters based on cue-constrained complex prediction and combines the advantage of reinstating the correct complex covariance structure with the advantages of joint processing of the downmix channels for improved object extraction. An additional feature of this method is that in many cases it allows the wet upmix to be omitted entirely, thus paving the way for a version of binaural rendering with lower computational complexity. As with the second alternative, the third alternative presented below is based on joint processing of the left and right downmix channels.

The principle is to aim for a best match, in the least-squares sense, of

X̂ = G X

to the target rendering Y = A S, subject to the corrected complex covariance

G X X* G* + V P P* = Ŷ Ŷ*.

Thus, the goal is to find solutions for G and P such that

1) Ŷ Ŷ* = Y Y* (which constitutes the constraint of formulation 2); and

2) min { norm( Y − Ŷ ) }, as was required in the second alternative.

From the theory of Lagrange multipliers it follows that there exists a self-adjoint matrix M = M* such that

M P = 0, and

M G X X* = Y X*.

In the generic case, where Y X* and X X* are non-degenerate (non-singular), it follows from the second equation that M is non-degenerate (non-singular), and hence P = 0 is the only solution to the first equation. This is the solution without wet rendering. Setting K = M^{−1}, one observes that the dry upmix is given by

G = K G_0,

where G_0 is the predictive solution obtained above for the second alternative, and the self-adjoint matrix K solves

K G_0 X X* G_0* K* = Y Y*.

If the positive, and hence self-adjoint, matrix square root of the matrix G_0 X X* G_0* is denoted by Q, then the solution can be written as

K = Q^{−1} ( Q Y Y* Q )^{1/2} Q^{−1}.

Thus, the SAOC parameter processing unit 42 determines G as

G = K G_0 = Q^{−1} ( Q Y Y* Q )^{1/2} Q^{−1} G_0, with Q = ( G_0 D E D* G_0* )^{1/2} and G_0 = A E D* ( D E D* )^{−1}.
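The closed-form solution for K can be verified numerically with stand-in covariances; the matrices below are random, well-conditioned placeholders, and the Hermitian square root is implemented via an eigendecomposition:

```python
import numpy as np

def herm_sqrt(M):
    # Principal square root of a positive (semi-)definite Hermitian matrix.
    lam, U = np.linalg.eigh(M)
    return U @ np.diag(np.sqrt(np.maximum(lam, 0.0))) @ U.conj().T

rng = np.random.default_rng(1)
B = rng.standard_normal((2, 2))
XX = B @ B.T + np.eye(2)              # stand-in downmix covariance D E D*
YY = np.array([[2.0, 0.4],
               [0.4, 1.0]])           # stand-in target covariance A E A*
G0 = rng.standard_normal((2, 2))      # stand-in predictive solution

Q = herm_sqrt(G0 @ XX @ G0.T)
Qi = np.linalg.inv(Q)
K = Qi @ herm_sqrt(Q @ YY @ Q) @ Qi   # K = Q^{-1} (Q YY* Q)^{1/2} Q^{-1}
G = K @ G0
```

By construction G XX G* = K Q² K = YY, so the dry path alone reinstates the full target covariance and the wet path may be omitted.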

For the inner square root there are in general four self-adjoint solutions, and the one leading to the best match between X̂ and Y is selected.

In practice, it is necessary to limit the dry rendering matrix G = K G_0 to a maximum magnitude, for example by a restriction on the sum of the absolute squares of all dry rendering matrix coefficients, which can be expressed as

trace( G G* ) ≤ g_max.

If the solution violates this constraint, a solution lying on the boundary is taken instead. This is achieved by adding the constraint

trace( G G* ) = g_max

to the previous constraints and re-deriving the Lagrange equations. It turns out that the previous equation

M G X X* = Y X*

must be replaced by

M G X X* + μ I = Y X*,

where μ is an additional intermediate complex parameter and I is the 2×2 identity matrix. The result is a solution with a non-zero wet rendering matrix P. In particular, the solution for the wet upmix matrix can be found from

P P* = ( Y Y* − G X X* G* ) / V = ( A E A* − G D E D* G* ) / V,

where P is preferably derived on the basis of the eigenvector considerations set out above for the second alternative, and V is W E W* + ε. This latter determination of P is likewise performed by the SAOC parameter processing unit 42.

Thus, the matrices G and P so determined are then used by the wet and dry rendering units, as described previously.

If a version of low complexity is needed, the next step is to replace even this solution by a solution without wet rendering. A preferred way of achieving this is to relax the requirement on the complex covariance to a match on the diagonal only, so that the correct signal powers are still achieved in the left and right channels while the cross-covariance is left unconstrained.

Regarding the first alternative, subjective listening tests were conducted in an acoustically isolated listening room designed to permit high-quality listening. The outcome is summarized below.

Playback was performed using headphones (STAX SR Lambda Pro with a Lake-People D/A converter and a STAX SRM monitor). The test method followed the standard procedures used in spatial audio verification tests, based on the "Multiple stimulus with hidden reference and anchors" (MUSHRA) method for the subjective assessment of intermediate audio quality.

A total of 5 listeners participated in each of the performed tests. All subjects can be considered experienced listeners. In accordance with the MUSHRA methodology, the listeners were instructed to compare all test conditions against the reference. The test conditions were randomized automatically for each test item and for each listener. The subjective responses were recorded by a computer-based MUSHRA program on a scale ranging from 0 to 100. Instantaneous switching between the items under test was allowed. The MUSHRA tests were conducted to assess the perceptual performance of the described stereo-to-binaural processing of the MPEG SAOC system.

In order to evaluate the perceptual quality improvement of the described systems over mono-to-binaural processing, items processed by a mono-to-binaural system were also included in the test. The corresponding mono and stereo downmix signals were AAC-coded at 80 kbit/s per channel.

As HRTF database, "KEMAR_MIT_COMPACT" was used. The reference condition was generated by binaural filtering of the objects with the appropriately weighted HRTF impulse responses, taking the desired rendering into account. The anchor condition was the low-pass filtered reference condition (cut-off at 3.5 kHz). Table 1 contains the list of the tested audio items.

Table 1
Audio items of the listening tests

Listening item   Nr. mono/stereo objects   Object angles / Object gains (dB)
disco1           10/0                      angles: [-30, 0, -20, 40, 5, -5, 120, 0, -20, -40]
                                           gains:  [-3, -3, -3, -3, -3, -3, -3, -3, -3, -3]
disco2           10/0                      angles: [-30, 0, -20, 40, 5, -5, 120, 0, -20, -40]
                                           gains:  [-12, -12, 3, 3, -12, -12, 3, -12, 3, -12]
coffee1          6/0                       angles: [10, -20, 25, -35, 0, 120]
                                           gains:  [0, -3, 0, 0, 0, 0]
coffee2          6/0                       angles: [10, -20, 25, -35, 0, 120]
                                           gains:  [3, -20, -15, -15, 3, 3]
pop2             1/5                       angles: [0, 30, -30, -90, 90, 0, 0, -120, 120, -45, 45]
                                           gains:  [4, -6, -6, 4, 4, -6, -6, -6, -6, -16, -16]

Five different scenes were tested, which are the result of rendering (mono or stereo) objects from 3 different pools of original objects. Three different downmix matrices were used in the SAOC encoder; see Table 2.

Table 2
Downmix types

Downmix type      Mono                 Stereo                  Dual Mono
Matlab notation   dmx1 = ones(1,N);    dmx2 = zeros(2,N);      dmx3 = ones(2,N);
                                       dmx2(1,1:2:N) = 1;
                                       dmx2(2,2:2:N) = 1;

The listening test conditions for assessing the quality of the upmix were defined as listed in Table 3.

Table 3
Listening test conditions

Test condition     Downmix type    Core coder
x-1-b              Mono            AAC@80 kbps
x-2-b              Stereo          AAC@160 kbps
x-2-b_DualMono     Dual Mono       AAC@160 kbps
5222               Stereo          AAC@160 kbps
5222_DualMono      Dual Mono       AAC@160 kbps

The "5222" system uses the stereo downmix preprocessor as described in ISO/IEC JTC 1/SC 29/WG 11 (MPEG), Document N10045, "ISO/IEC CD 23003-2:200x Spatial Audio Object Coding (SAOC)", 85th MPEG Meeting, July 2008, Hannover, Germany, with the complex-valued binaural target rendering matrix A^{l,m} as input. That is, no ICC control is performed. An informal listening test showed that using the magnitude of A^{l,m} for the upper bands, instead of keeping it complex-valued for all bands, improves the quality. The improved "5222" system was used in the test.

A short overview, in the form of diagrams showing the results of the listening tests, can be found in Fig. 6. These plots show the average MUSHRA grading per item over all listeners and the statistical mean value over all evaluated items, together with the associated 95% confidence intervals. It should be noted that the data for the hidden reference are omitted from the MUSHRA plots, because all subjects identified it correctly.

The following observations can be made based on the results of the listening tests:

- "x-2-b_DualMono" performs comparably to "5222";

- "x-2-b_DualMono" performs clearly better than "5222_DualMono";

- "x-2-b_DualMono" performs comparably to "x-1-b";

- "x-2-b", produced according to the above-mentioned first alternative, performs slightly better than all other conditions;

- the item "disco1" does not show significant variation of the results and is probably unsuitable.

Thus, the concept of binaural rendering of a stereo downmix in SAOC, described above, copes with different downmix matrices. In particular, the quality for a dual-mono downmix is the same as for a true mono downmix, which was verified in the listening test. The quality improvement obtainable with a stereo downmix compared to a mono downmix can be seen in the listening test results. The basic processing blocks of the above implementations were the dry binaural rendering of the stereo downmix and the mixing-in of the decorrelated wet binaural signal, with a proper combination of both blocks.

In particular, the wet binaural signal was computed by applying a single decorrelator to the mono downmix of the input, so that its left and right powers and its IPD were the same as those of the dry binaural signal.

The mixing of the wet and dry binaural signals is controlled by the target ICC (inter-channel coherence) and the ICC of the dry binaural signal, which usually requires less decorrelation than mono-downmix-based binaural rendering, resulting in higher overall sound quality.

Further, the above implementation can easily be modified for any combination of mono/stereo downmix input and mono/stereo/binaural output in a stable manner.

In other words, implementations providing a signal processing framework and method for decoding and binaural rendering of stereo-downmix-based SAOC bit streams with inter-channel coherence control have been described above. All combinations of mono or stereo downmix input and mono, stereo or binaural output can be handled as special cases of the described stereo-downmix-based concept. The quality of the stereo-downmix-based concept turned out to be typically better than that of the mono-downmix-based concept, which was verified in the above MUSHRA listening test.

In Spatial Audio Object Coding (SAOC), ISO/IEC JTC 1/SC 29/WG 11 (MPEG), Document N10045, "ISO/IEC CD 23003-2:200x Spatial Audio Object Coding (SAOC)", 85th MPEG Meeting, July 2008, Hannover, Germany, multiple audio objects are downmixed to a mono or stereo signal. This signal is coded and transmitted together with side information (the SAOC parameters) to the SAOC decoder. The above implementations preserve the inter-channel coherence (ICC) of the binaural output signal, which is an important measure of the perceived width of virtual sound sources and which, having been degraded or even destroyed by the encoder's downmix, can be (almost) fully corrected.

The inputs to the system are the stereo downmix, the SAOC parameters, the spatial rendering information and an HRTF database. The output is a binaural signal. Both input and output are given in the decoder transform domain, typically by way of an oversampled, complex-modulated analysis filter bank such as the MPEG Surround hybrid QMF filter bank (ISO/IEC 23003-1:2007, Information technology - MPEG audio technologies - Part 1: MPEG Surround). The binaural output signal is converted back to the PCM time domain by the synthesis filter bank. The system is thus an extension of potentially mono-downmix-based binaural rendering towards stereo downmixes. For a dual-mono downmix, the output of the system is the same as that of the mono-downmix-based system. Therefore, the system can handle any combination of mono/stereo downmix input and mono/stereo/binaural output properly by setting the rendering parameters in a stable manner.

In other words, the above implementations perform binaural rendering and decoding of stereo-downmix-based SAOC bit streams with ICC control. Compared with binaural rendering based on a mono downmix, the implementations can exploit the stereo downmix in two ways:

- correlation properties between objects located in different downmix channels are partially preserved;

- object extraction is improved, since some objects are present in only one of the downmix channels.

Thus, the concept of binaural rendering of a stereo downmix in SAOC, described above, copes with different downmix matrices. In particular, the quality for a dual-mono downmix is the same as for a true mono downmix, which was verified in a listening test. The quality improvement achieved with a stereo downmix compared to a mono downmix could also be observed in the listening test. The basic processing blocks of the above implementations were the blocks for dry binaural rendering of the stereo downmix and for mixing-in of the decorrelated wet binaural signal, with a suitable combination of both blocks. In particular, the wet binaural signal was computed by applying a single decorrelator to the mono downmix of the input, so that its left and right powers and its IPD were the same as those of the dry binaural signal. The mixing of the wet and dry binaural signals is controlled by the target ICC, requiring less decorrelation than binaural rendering based on a mono downmix and thus leading to higher overall sound quality. Further, the above implementation can easily be modified for any combination of mono/stereo downmix input and mono/stereo/binaural output in a stable manner. In accordance with the implementation, the stereo downmix X^{n,k} is received together with the SAOC parameters, the user-supplied rendering information and an HRTF database as inputs. The transmitted SAOC parameters are OLD_i^{l,m} (object level differences), IOC_ij^{l,m} (inter-object cross-correlations), DMG_i^{l,m} (downmix gains) and DCLD_i^{l,m} (downmix channel level differences) for all N objects i, j. The HRTF parameters are given as P_{q,L}^m, P_{q,R}^m and φ_q^m for all indices q of the HRTF database, each associated with a certain spatial position of a sound source.

Finally, it is noted that, although in the above description the terms "inter-channel coherence" and "inter-object cross-correlation" were constructed differently, in that "coherence" is used in the one term and "cross-correlation" in the other, the latter terms may be used interchangeably as a measure of similarity between channels and between objects, respectively.

Depending on the actual implementation, the inventive binaural rendering concept can be implemented in hardware or in software. Therefore, the present invention also relates to a computer program which can be stored on a computer-readable medium such as a CD-ROM, DVD, flash drive, memory card or memory chip. The present invention is therefore also a computer program having program code which, when executed on a computer, performs the inventive method of encoding, converting or decoding described in connection with the above figures.

While this invention has been described in terms of several preferred implementations, there are alterations, permutations and equivalents which fall within the scope of this invention. It should also be noted that there are many alternative ways of implementing the methods and compositions of the present invention. It is therefore intended that the following appended claims be interpreted as including all such alterations, permutations and equivalents as fall within the true spirit and scope of the present invention.

In addition, it is noted that all steps indicated in the flowcharts are implemented by corresponding means in the decoder, and that the implementations may comprise subroutines running on a CPU, parts of an ASIC, or the like. A similar statement holds for the functions of the blocks in the flowcharts.

In other words, according to an implementation, an apparatus is provided for binaural rendering of a multi-channel audio signal (21) into a binaural output signal (24); the multi-channel audio signal (21) comprises a stereo downmix signal (18) into which a plurality of audio signals (14_1-14_N) are downmixed, and side information (20) comprising downmix information (DMG, DCLD) indicating, for each audio signal, to what extent the respective audio signal has been mixed into a first channel (L0) and a second channel (R0) of the stereo downmix signal (18), respectively, as well as object level information (OLD) of the plurality of audio signals and inter-object cross-correlation information (IOC) describing similarities between pairs of audio signals of the plurality of audio signals; the apparatus comprises means (47) for computing, based on a first rendering prescription (G^{l,m}) depending on the inter-object cross-correlation information, the object level information, the downmix information, rendering information relating each audio signal to a virtual speaker position, and HRTF parameters, a preliminary binaural signal (54) from the first and second channels of the stereo downmix signal (18); means (50) for generating a decorrelated signal (X_d^{n,k}) as a perceptual equivalent to a mono downmix (58) of the first and second channels of the stereo downmix signal (18), which is, however, decorrelated from the mono downmix (58); means (52) for computing, based on a second rendering prescription (P_2^{l,m}) depending on the inter-object cross-correlation information, the object level information, the downmix information, the rendering information and the HRTF parameters, a corrective binaural signal (64) from the decorrelated signal (62); and means (53) for mixing the preliminary binaural signal (54) with the corrective binaural signal (64) to obtain the binaural output signal (24).

Sources of information

1. ISO/IEC JTC 1/SC 29/WG 11 (MPEG), Document N10045, "ISO/IEC CD 23003-2:200x Spatial Audio Object Coding (SAOC)", 85th MPEG Meeting, July 2008, Hannover, Germany.

2. EBU Technical recommendation: "MUSHRA-EBU method for subjective listening tests of intermediate audio quality", Doc. B/AIM022, October 1999.

3. ISO/IEC 23003-1:2007, Information technology - MPEG audio technologies - Part 1: MPEG Surround.

4. ISO/IEC JTC1/SC29/WG11 (MPEG), Document N9099: "Final Spatial Audio Object Coding Evaluation Procedures and Criterion", April 2007, San Jose, USA.

5. Jeroen Breebaart, Christof Faller: Spatial Audio Processing. MPEG Surround and Other Applications. Wiley & Sons, 2007.

6. Jeroen Breebaart et al.: Multi-channel goes mobile: MPEG Surround binaural rendering. AES 29th International Conference, Seoul, Korea, 2006.

1. An apparatus for binaural rendering of a multi-channel audio signal (21) into a binaural output signal (24), the multi-channel audio signal (21) comprising a stereo downmix signal (18) into which a plurality of audio signals (14_1-14_N) are downmixed, and side information (20) comprising downmix information (DMG, DCLD) indicating, for each audio signal, to what extent the respective audio signal has been mixed into a first channel (L0) and a second channel (R0) of the stereo downmix signal (18), respectively, as well as object level information (OLD) of the plurality of audio signals and inter-object cross-correlation information (IOC) describing similarities between pairs of audio signals of the plurality of audio signals, the apparatus being configured to:
compute (47), based on a first rendering prescription (G^{l,m}) depending on the inter-object cross-correlation information, the object level information, the downmix information, rendering information relating each audio signal to a virtual speaker position, and HRTF parameters, a preliminary binaural signal (54) from the first and second channels of the stereo downmix signal (18);
generate (50) a decorrelated signal (X_d^{n,k}) as a perceptual equivalent to a mono downmix (58) of the first and second channels of the stereo downmix signal (18), which is, however, decorrelated from the mono downmix (58);
compute (52), based on a second rendering prescription (P_2^{l,m}) depending on the inter-object cross-correlation information, the object level information, the downmix information, the rendering information and the HRTF parameters, a corrective binaural signal (64) from the decorrelated signal (62); and
mix (53) the preliminary binaural signal (54) with the corrective binaural signal (64) to obtain the binaural output signal (24).

2. The device according to claim 1, wherein the device is further configured to, in generating the decorrelated signal (X_d^{n,k}), sum the first and second channels of the stereo downmix signal (18) and decorrelate the sum to obtain the decorrelated signal (62).
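For illustration, claim 2's generation step - sum the two downmix channels, then decorrelate the sum - could be sketched as below. The delay-based decorrelator is a deliberately crude stand-in (the claim leaves the decorrelator open; practical systems use all-pass or lattice filters), and the function name is hypothetical:

```python
import numpy as np

def decorrelated_from_downmix(x1, x2, delay=7):
    """Sum the two stereo-downmix channels to a mono downmix, then
    decorrelate the sum. A bare delay stands in for the decorrelator:
    the output has similar energy but reduced correlation to the input."""
    mono = x1 + x2                                   # mono downmix (58)
    x_d = np.concatenate([np.zeros(delay), mono[:-delay]])
    return x_d
```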

3. The device according to claim 1, further configured to:
estimate (80) an actual binaural inter-channel coherence value of the preliminary binaural signal (54);
determine (82) a target binaural inter-channel coherence value; and
set (84) a mixing ratio determining to which degree the binaural output signal (24) is influenced by the first and second channels of the stereo downmix signal (18) as processed by the computation (47) of the preliminary binaural signal (54), and by the first and second channels of the stereo downmix signal (18) as processed by the generation (50) of the decorrelated signal and the computation (52) of the corrective binaural signal (64), respectively, based on the actual binaural inter-channel coherence value and the target binaural inter-channel coherence value.
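As an illustration of claim 3's first step, the actual binaural inter-channel coherence can be estimated from the sample covariance of the preliminary binaural signal. The estimator below is an assumption (the claim does not prescribe one), and the name is hypothetical:

```python
import numpy as np

def actual_icc(x_dry):
    """Estimate the inter-channel coherence of a (2, T) binaural signal
    as |c12| / sqrt(c11 * c22) from its sample covariance matrix."""
    C = x_dry @ x_dry.conj().T / x_dry.shape[1]      # 2x2 covariance
    return abs(C[0, 1]) / (np.sqrt(C[0, 0] * C[1, 1]) + 1e-12)
```

The mixing ratio of claim 3 would then be driven by the gap between this estimate and the target coherence derived from the side information.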

4. The device according to claim 3, wherein the device is further configured to, in setting the mixing ratio, set the first rendering prescription (G^{l,m}) and the second rendering prescription (P_2^{l,m}) based on the actual binaural inter-channel coherence value and the target binaural inter-channel coherence value.

5. The device according to claim 3, wherein the device is further configured to, in determining the target binaural inter-channel coherence value, perform the determination based on components of a target covariance matrix F = A E A*, with "*" denoting conjugate transposition, A denoting a target binaural rendering matrix relating the plurality of audio signals to the first and second channels of the binaural output signal, respectively, which is uniquely determined by the rendering information and the HRTF parameters, and E being a matrix uniquely determined by the inter-object cross correlation information and the object level information.
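The target covariance of claim 5 is a plain quadratic form. A minimal numpy sketch, assuming the usual SAOC-style assembly of E from OLD/IOC (e_ij = ioc_ij * sqrt(old_i * old_j) - an assumption, the claim only says E is uniquely determined by OLD and IOC); function names are hypothetical:

```python
import numpy as np

def object_covariance(old, ioc):
    """Assemble E from object levels (OLD, length N) and the NxN
    inter-object cross correlation matrix (IOC), assumed here as
    e_ij = ioc_ij * sqrt(old_i * old_j)."""
    p = np.sqrt(old)
    return ioc * np.outer(p, p)

def target_covariance(A, E):
    """Target binaural covariance F = A E A* of claim 5.

    A : (2, N) target binaural rendering matrix (from rendering
        information and HRTF parameters)
    E : (N, N) object covariance from the side information
    """
    return A @ E @ A.conj().T
```

The target inter-channel coherence then follows from the components of F in the same way an actual coherence follows from a measured covariance.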

6. The device according to claim 5, wherein the device is further configured to, in computing the preliminary binaural signal (54), perform the computation such that
X̂_1 = G X,
where X is a 2×1 vector whose components correspond to the first and second channels of the stereo downmix signal (18), X̂_1 is a 2×1 vector whose components correspond to the first and second channels of the preliminary binaural signal (54), and G is the first rendering matrix, representing the first rendering prescription and being of size 2×2, whose entries are formed from gains P_L^x and P_R^x and phase differences φ_C^x, where, for x ∈ {1,2},
P_L^x = sqrt(f_11^x / V^x), P_R^x = sqrt(f_22^x / V^x),
φ_C^x = arg(f_12^x) if a first condition applies, and 0 otherwise,
where f_11^x, f_22^x and f_12^x are the coefficients of the 2×2 partial target covariance matrices F^x, with F^x = A E_x A*,
where the coefficients e_ij^x of the N×N matrix E_x - N being the number of audio signals - are given by e_ij^x = e_ij sqrt(d_i^x d_j^x), e_ij being the coefficients of the N×N matrix E, and d_i^x being uniquely determined by the downmix information, d_i^1 indicating the degree to which audio signal i has been mixed into the first channel of the stereo downmix signal (18), and d_i^2 indicating the degree to which audio signal i has been mixed into the second channel of the stereo downmix signal (18),
where V^x is a scalar with V^x = D_x E D_x* + ε, and D_x is a 1×N matrix whose coefficients are d_i^x;
wherein the device is further configured to, in computing the corrective binaural signal (64), perform the computation such that
X̂_2 = P_2 X_d,
where X_d is the decorrelated signal (62), X̂_2 is a vector whose components correspond to the first and second channels of the corrective binaural signal (64), and P_2 is the second rendering matrix, representing the second rendering prescription and being of size 2×2, whose entries are formed from gains P_L and P_R defined as
P_L = sqrt(c_11 / V), P_R = sqrt(c_22 / V),
where c_11 and c_22 are the coefficients of the 2×2 covariance matrix C of the preliminary binaural signal (54), with C = G D E D* G*,
where V is a scalar with V = W E W* + ε, W being a 1×N mono downmix matrix whose coefficients w_i are uniquely determined by the downmix information as w_i = d_i^1 + d_i^2;
wherein the device is further configured to, in estimating the actual binaural inter-channel coherence value, determine the actual binaural inter-channel coherence value as
ρ_C = |c_12| / sqrt(c_11 c_22 + ε);
wherein the device is further configured to, in determining the target binaural inter-channel coherence value, determine the target binaural inter-channel coherence value as
ρ_T = |f_12| / sqrt(f_11 f_22 + ε); and
wherein the device is further configured to, in setting the mixing ratio, determine rotator angles α and β according to
α = 1/2 (arccos(ρ_T) - arccos(ρ_C)),
β = arctan( tan(α) (P_R - P_L) / (P_L + P_R + ε) ),
with ε denoting a small constant avoiding division by zero.
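For illustration, the per-band gain and rotator-angle computation of claim 6 might be sketched as below. The arccos/arctan form of the angles is assumed from the parametric-stereo style of coherence control this claim describes, and all function names are hypothetical:

```python
import numpy as np

EPS = 1e-9  # small constant avoiding division by zero

def wet_gains(C, V):
    """Gains P_L = sqrt(c11 / V) and P_R = sqrt(c22 / V) from the 2x2
    covariance C of the preliminary binaural signal and scalar V."""
    return np.sqrt(C[0, 0] / V), np.sqrt(C[1, 1] / V)

def rotator_angles(rho_t, rho_c, p_l, p_r):
    """Rotator angles steering how much decorrelated signal is mixed in:
    alpha grows with the gap between actual and target coherence, and
    beta redistributes it between the louder and softer channel."""
    alpha = 0.5 * (np.arccos(np.clip(rho_t, -1.0, 1.0))
                   - np.arccos(np.clip(rho_c, -1.0, 1.0)))
    beta = np.arctan(np.tan(alpha) * (p_r - p_l) / (p_l + p_r + EPS))
    return alpha, beta
```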

7. The device according to claim 1, wherein the device is further configured to, in computing the preliminary binaural signal (54), perform the computation such that
X̂_1 = G X,
where X is a 2×1 vector whose components correspond to the first and second channels of the stereo downmix signal (18), X̂_1 is a 2×1 vector whose components correspond to the first and second channels of the preliminary binaural signal (54), and G is the first rendering matrix, representing the first rendering prescription and being of size 2×2, with
G = A E D* (D E D*)^{-1},
where E is a matrix uniquely determined by the inter-object cross correlation information and the object level information;
D is a 2×N matrix whose coefficients d_ij are uniquely determined by the downmix information, d_1j indicating the degree to which audio signal j has been mixed into the first channel of the stereo downmix signal (18), and d_2j indicating the degree to which audio signal j has been mixed into the second channel of the stereo downmix signal (18); and
A is the target binaural rendering matrix relating the plurality of audio signals to the first and second channels of the binaural output signal, respectively, and being uniquely determined by the rendering information and the HRTF parameters;
wherein the device is further configured to, in computing the corrective binaural signal (64), perform the computation such that
X̂_2 = P X_d,
where X_d is the decorrelated signal (62), X̂_2 is a vector whose components correspond to the first and second channels of the corrective binaural signal (64), and P is the second rendering matrix, representing the second rendering prescription and being of size 2×2, determined such that P P* = ΔR with ΔR = A E A* - G_0 D E D* G_0*, where G_0 = G.
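Claim 7's dry rendering matrix is a least-squares back-projection: the best 2×2 map from the stereo downmix to the target binaural signal in the second-order-statistics sense. A minimal numpy sketch (function name hypothetical):

```python
import numpy as np

def dry_rendering_matrix(A, E, D):
    """G = A E D* (D E D*)^{-1}, claim 7.

    A : (2, N) target binaural rendering matrix
    E : (N, N) object covariance from OLD/IOC side information
    D : (2, N) downmix matrix from the downmix information
    """
    Dh = D.conj().T
    return A @ E @ Dh @ np.linalg.inv(D @ E @ Dh)
```

A quick sanity check of the formula: when the target rendering coincides with the downmix (A = D), the best map from downmix to target is the identity.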

8. The device according to claim 1, wherein the device is further configured to, in computing the preliminary binaural signal (54), perform the computation such that
X̂_1 = G X,
where X is a 2×1 vector whose components correspond to the first and second channels of the stereo downmix signal (18), X̂_1 is a 2×1 vector whose components correspond to the first and second channels of the preliminary binaural signal (54), and G is the first rendering matrix, representing the first rendering prescription and being of size 2×2, with
G = (G_0 D E D* G_0*)^{-1} (G_0 D E D* G_0* A E A* G_0 D E D* G_0*)^{1/2} (G_0 D E D* G_0*)^{-1} G_0
with G_0 = A E D* (D E D*)^{-1},
where E is a matrix uniquely determined by the inter-object cross correlation information and the object level information;
D is a 2×N matrix whose coefficients d_ij are uniquely determined by the downmix information, d_1j indicating the degree to which audio signal j has been mixed into the first channel of the stereo downmix signal (18), and d_2j indicating the degree to which audio signal j has been mixed into the second channel of the stereo downmix signal (18); and
A is the target binaural rendering matrix relating the plurality of audio signals to the first and second channels of the binaural output signal, respectively, and being uniquely determined by the rendering information and the HRTF parameters;
wherein the device is further configured to, in computing the corrective binaural signal (64), perform the computation such that
X̂_2 = P X_d,
where X_d is the decorrelated signal (62), X̂_2 is a vector whose components correspond to the first and second channels of the corrective binaural signal (64), and P is the second rendering matrix, representing the second rendering prescription and being of size 2×2, determined such that P P* = (A E A* - G D E D* G*) / V, with V being a scalar.

9. The device according to claim 1, wherein the downmix information (DMG, DCLD) is time-dependent, and the object level information (OLD) and the inter-object cross correlation information (IOC) are time- and frequency-dependent.

10. A method of binaural rendering a multi-channel audio signal (21) into a binaural output signal (24), the multi-channel audio signal (21) comprising a stereo downmix signal (18) into which a plurality of audio signals (14_1-14_N) are downmixed, and side information (20) comprising downmix information (DMG, DCLD) indicating, for each audio signal, to what degree the respective audio signal has been mixed into a first channel (L0) and a second channel (R0) of the stereo downmix signal (18), respectively, as well as object level information (OLD) of the plurality of audio signals and inter-object cross correlation information (IOC) describing similarities between pairs of audio signals of the plurality of audio signals, the method comprising:
computing, based on a first rendering prescription (G^{l,m}) depending on the inter-object cross correlation information, the object level information, the downmix information, rendering information relating each audio signal to a virtual speaker position, and HRTF parameters, a preliminary binaural signal (54) from the first and second channels of the stereo downmix signal (18);
generating a decorrelated signal as a perceptual equivalent to a mono downmix (58) of the first and second channels of the stereo downmix signal (18), being, however, decorrelated to the mono downmix (58);
computing, based on a second rendering prescription (P_2^{l,m}) depending on the inter-object cross correlation information, the object level information, the downmix information, the rendering information and the HRTF parameters, a corrective binaural signal (64) from the decorrelated signal (62); and
mixing the preliminary binaural signal (54) with the corrective binaural signal (64) to obtain the binaural output signal (24).

11. A computer-readable medium having stored thereon a computer program with a program code for performing, when the computer program runs on a computer or processor, the method according to claim 10.



 
