Device and method of decomposing input signal using downmixer

FIELD: physics, acoustics.

SUBSTANCE: invention relates to audio processing and particularly to decomposing audio signals into different components. A device for decomposing an input signal, having at least three input channels, comprises a downmixer for downmixing the input signal to obtain a downmixed signal having fewer channels, an analyser for analysing the downmixed signal to obtain an analysis result which is forwarded to a signal processor for processing the input signal or the signal derived from the input signal in order to obtain a decomposed signal.

EFFECT: high accuracy of reproducing stereo sound.

15 cl, 16 dwg

 

The present invention relates to audio processing and, in particular, to the decomposition of audio signals into different components, for example perceptually distinct components.

The human auditory system senses sound from all directions. The perceived auditory environment (the adjective "auditory" denotes what is perceived, while the word "sound" is used to describe physical phenomena) creates an impression of the acoustic properties of the surrounding space and of the sound events that occur. The auditory impression perceived in a given sound field can (at least partially) be modeled by considering three different types of signals at the ear entrances: direct sound, early reflections and diffuse reflections. These signals contribute to the formation of the perceived auditory spatial image.

Direct sound denotes the waves of each sound event that first reach the listener directly from the sound source without disturbance. It is characteristic of the sound source and provides the least compromised information about the direction of incidence of the sound event. The primary cues for estimating the direction of a sound source in the horizontal plane are the differences between the left and right ear input signals, namely interaural time differences (ITD) and interaural level differences (ILD). Subsequently, a multitude of reflections of the direct sound arrive at the ears from different directions and with different relative time delays and levels. With increasing time delay relative to the direct sound, the density of the reflections increases until they constitute a statistical clutter.

The reflected sound contributes to distance perception and to the auditory spatial impression, which consists of at least two components: apparent source width (ASW) (another common term for ASW is auditory spaciousness) and listener envelopment (LEV). ASW is defined as a broadening of the apparent width of a sound source and is determined mainly by early lateral reflections. LEV refers to the listener's sense of being enveloped by sound and is determined mainly by late-arriving reflections. The goal of electroacoustic stereophonic sound reproduction is to evoke the perception of a pleasing auditory spatial image. This image can have a natural or architectural reference (for example, the recording of a concert in a hall), or it can be a sound field that does not exist in reality (for example, electroacoustic music).

From the field of concert hall acoustics it is well known that, in order to obtain a subjectively pleasing sound field, a strong sense of auditory spatial impression is important, with LEV being an integral part of it. The ability of loudspeaker setups to reproduce an enveloping sound field by reproducing a diffuse sound field is therefore of interest. In a synthetic sound field it is not possible to reproduce all naturally occurring reflections using dedicated transducers. This is particularly true for diffuse late reflections. The timing and level properties of diffuse reflections can be simulated by using "reverberated" signals as loudspeaker feeds. If these are sufficiently decorrelated, the number and location of the loudspeakers used for playback determine whether the sound field is perceived as diffuse. The goal is to evoke the perception of a continuous, diffuse sound field using only a discrete number of transducers, that is, to create a sound field in which no direction of sound arrival can be estimated and, in particular, no single transducer can be localized. The subjective diffuseness of synthetic sound fields can be evaluated in subjective tests.

Stereophonic sound reproduction aims at evoking the perception of a continuous sound field using only a discrete number of transducers. The most desired features are directional stability of localized sources and realistic rendering of the surrounding auditory environment. Most formats used today for storing or transporting stereophonic recordings are channel-based. Each channel conveys a signal that is intended to be played back over an associated loudspeaker at a specific position. A specific auditory image is designed during the recording or mixing process. This image is accurately recreated if the loudspeaker setup used for reproduction resembles the target setup for which the recording was designed.

The number of feasible transmission and playback channels is constantly growing, and with every emerging audio reproduction format comes the desire to render legacy-format content over the actual playback system. Upmix algorithms are a solution to this need: they compute a signal with more channels from the legacy signal. A number of stereo upmix algorithms have been proposed in the literature, for example, in Carlos Avendano and Jean-Marc Jot, "A frequency-domain approach to multichannel upmix", Journal of the Audio Engineering Society, vol. 52, No. 7/8, pp. 740-749, 2004; Christof Faller, "Multiple-loudspeaker playback of stereo signals", Journal of the Audio Engineering Society, vol. 54, No. 11, pp. 1051-1064, February 2006; John Usher and Jacob Benesty, "Enhancement of spatial sound quality: A new reverberation-extraction audio upmixer", IEEE Transactions on Audio, Speech and Language Processing, vol. 15, No. 7, pp. 2141-2150, September 2007. Most of these algorithms are based on a direct/ambient signal decomposition followed by rendering adapted to the target loudspeaker setup.

The described direct/ambient signal decompositions are not readily applicable to multi-channel surround signals. It is not easy to formulate a signal model and filtering that obtain, from N audio channels, the corresponding N direct sound channels and N ambient sound channels. The simple signal model used in the stereo case (see, e.g., Christof Faller, "Multiple-loudspeaker playback of stereo signals", Journal of the Audio Engineering Society, vol. 54, No. 11, pp. 1051-1064, February 2006), which assumes that the direct sound is correlated among all channels, does not capture the diversity of inter-channel relations that can exist between the channels of a surround signal.

The general goal of stereophonic sound reproduction is to evoke the perception of a continuous sound field using only a limited number of transmission channels and transducers. Two loudspeakers are the minimum requirement for spatial sound reproduction. Modern consumer systems often offer a larger number of reproduction channels. Basically, stereophonic signals (independent of the number of channels) are recorded or mixed such that, for each source, the direct sound goes coherently (= dependently) into a number of channels with specific directional cues, and reflected independent sounds go into a number of channels that determine the cues for apparent source width and listener envelopment. Correct perception of the intended auditory image is usually possible only at the ideal point of observation in the playback setup for which the recording was intended. Adding more loudspeakers to a given setup usually allows a more realistic reconstruction/simulation of a natural sound field. In order to take full advantage of an extended loudspeaker setup when the input signals are given in another format, or to manipulate the perceptually distinct parts of the input signal, those parts have to be separately accessible. This specification describes a method for separating the dependent and independent components of stereophonic recordings comprising an arbitrary number of input channels.

A decomposition of audio signals into perceptually distinct components is necessary for high-quality signal modification, enhancement, adaptive playback, and perceptual coding. Recently, a number of methods have been proposed that allow the manipulation and/or extraction of perceptually distinct signal components from two-channel input signals. Since input signals with more than two channels are becoming more and more common, the described manipulations are also desirable for multi-channel input signals. However, most of the concepts described for two-channel input cannot easily be extended to work with input signals having an arbitrary number of channels.

If one were to perform a signal analysis into direct and ambient parts with, for example, a 5.1-channel surround signal having a left channel, a center channel, a right channel, a left surround channel, a right surround channel and a low-frequency enhancement (subwoofer) channel, it is not obvious how a direct/ambient signal analysis should be applied. One might think of comparing each pair of the six channels, resulting in a hierarchical processing that comprises, in the end, up to 15 different comparison operations. Then, when all 15 comparisons have been carried out, so that each channel has been compared with every other channel, one would have to determine how the 15 results should be evaluated. This is time consuming, the results are hard to interpret and, because of the considerable amount of processing resources required, the approach is not usable, for example, for real-time applications of direct/ambient separation or, in general, for signal decompositions that may be used in the context of upmixing or any other audio processing operation.
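The figure of 15 comparisons follows from counting the unordered pairs of six channels. The short Python sketch below only illustrates that count; the channel labels are shorthand chosen here for the 5.1 layout named in the text.

```python
from itertools import combinations
from math import comb

# Shorthand labels (an assumption of this sketch) for the six 5.1 channels.
channels = ["L", "C", "R", "Ls", "Rs", "LFE"]

# Every channel compared with every other channel exactly once.
pairs = list(combinations(channels, 2))
n_pairs = len(pairs)

print(n_pairs)                  # 15
print(comb(len(channels), 2))   # 15, the same count in closed form
```

By comparison, a two-channel downmix contains only a single channel pair to analyze.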

In M. M. Goodwin and J. M. Jot, "Primary-ambient signal decomposition and vector-based localization for spatial audio coding and enhancement," in Proc. of ICASSP 2007, 2007, a principal component analysis is applied to the input channel signals in order to perform the decomposition into primary (= direct) and ambient signals.

The models used in Christof Faller, "Multiple-loudspeaker playback of stereo signals", Journal of the Audio Engineering Society, vol. 54, No. 11, pp. 1051-1064, February 2006, and C. Faller, "A highly directive 2-capsule based microphone system", Preprint 123rd Conv. Aud. Eng. Soc., October 2007, assume uncorrelated or partially correlated diffuse sound in stereo signals and in microphone signals, respectively. Filters for extracting the diffuse/ambient signal are derived on the basis of this assumption. These approaches are limited to single-channel and two-channel audio signals.

A further reference is C. Avendano and J.-M. Jot, "A frequency-domain approach to multichannel upmix", Journal of the Audio Engineering Society, vol. 52, No. 7/8, pp. 740-749, 2004. The reference M. M. Goodwin and J. M. Jot, "Primary-ambient signal decomposition and vector-based localization for spatial audio coding and enhancement," in Proc. of ICASSP 2007, 2007, comments on the Avendano and Jot reference as follows. The reference provides an approach that involves creating a time-frequency mask for extracting the ambience from a stereo input signal. However, the mask is based on the cross-correlation between the left-channel and right-channel signals, so this approach is not immediately applicable to the problem of extracting ambience from an arbitrary multi-channel input signal. Using any such correlation-based method in this higher-order case would call for a hierarchical pairwise correlation analysis, which would entail a significant computational cost, or for some alternative measure of multi-channel correlation.

Spatial Impulse Response Rendering (SIRR) (Juha Merimaa and Ville Pulkki, "Spatial impulse response rendering", in Proc. of the 7th Int. Conf. on Digital Audio Effects (DAFx'04), 2004) estimates the direct sound and the diffuse sound in B-format impulse responses. Very similarly to SIRR, Directional Audio Coding (DirAC) (Ville Pulkki, "Spatial sound reproduction with directional audio coding", Journal of the Audio Engineering Society, vol. 55, No. 6, pp. 503-516, June 2007) implements a similar direct and diffuse sound analysis for continuous B-format audio signals.

The approach presented in Julia Jakka, "Binaural to Multichannel Audio Upmix", Ph.D. thesis, Master's Thesis, Helsinki University of Technology, 2005, describes an upmix that uses binaural signals as the input signal.

The reference Boaz Rafaely, "Spatially Optimal Wiener Filtering in the Reverberant Sound Field", IEEE Workshop on Applications of Signal Processing to Audio and Acoustics 2001, 21-24 October 2001, New Paltz, New York, describes the derivation of Wiener filters that are spatially optimal for reverberant sound fields. An application to two-microphone noise cancellation in reverberant rooms is given. The optimal filters, which are derived from the spatial correlation of diffuse sound fields, capture the local behavior of the sound fields and are therefore of lower order and potentially more spatially robust than conventional adaptive noise cancellation filters in reverberant rooms. Formulations for unconstrained and causally constrained optimal filters are presented, and an example application to two-microphone speech enhancement is demonstrated using a computer simulation.

The object of the present invention is to provide an improved concept for decomposing an input signal.

This object is achieved by a device for the decomposition of the input signal according to claim 1, the method of decomposition of an input signal according to claim 14 or a computer program according to claim 15.

The present invention is based on the finding that, for decomposing a multi-channel signal, it is advantageous not to perform the analysis with respect to the different signal components on the input signal directly, i.e. on the signal having at least three input channels. Instead, the multi-channel input signal having at least three input channels is processed by a downmixer that downmixes the input signal to obtain a downmixed signal. The downmixed signal has a number of downmix channels that is smaller than the number of input channels and is preferably equal to two. The analysis is then performed on the downmixed signal rather than on the input signal directly, and it yields an analysis result. This analysis result, however, is not applied to the downmixed signal, but to the input signal or, alternatively, to a signal derived from the input signal. The signal derived from the input signal may be an upmix signal or, depending on the number of channels of the input signal, also a downmix signal, but it is different from the downmixed signal on which the analysis has been performed. When, for example, the input signal is a 5.1-channel signal, the downmixed signal on which the analysis is performed may be a stereo downmix having two channels. The analysis results are then applied directly to the 5.1 input signal, to a higher upmix such as a 7.1 output signal, or to a multi-channel downmix of the input signal having, for example, only three channels (the left channel, the center channel and the right channel) when only a three-channel audio rendering device is at hand. In any case, however, the signal to which the signal processor applies the analysis results is different from the downmixed signal on which the analysis has been performed, and it typically has more channels than the downmixed signal on which the analysis with respect to the signal components is performed.

This so-called "indirect" analysis/processing is possible because it can be assumed that any signal components in the individual input channels also occur in the downmixed channels, since a downmix typically consists of an addition of the input channels in different ways. One straightforward downmix, for example, is that the individual input channels are weighted as required by a downmix rule or a downmix matrix and are then added together after the weighting. An alternative downmix consists of filtering the input channels with certain filters, such as HRTF filters, and performing the downmix using the filtered signals, i.e. the signals filtered by the HRTF filters, as is known in the art. For a five-channel input signal, 10 HRTF filters are required; the outputs of the HRTF filters for the left side/left ear are added together, and the outputs of the HRTF filters for the right side are added together for the right ear. Alternative downmixes can be applied in order to reduce the number of channels that have to be processed in the signal analyzer.
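As a minimal sketch of the weight-and-sum downmix described above, the following Python/NumPy fragment multiplies a channel matrix by a downmix matrix. The function name `downmix`, the channel order and the weight values are assumptions made here for illustration; the text does not prescribe particular weights.

```python
import numpy as np

def downmix(x, weights):
    """Weight the input channels and sum them into fewer channels.

    x:       array of shape (n_channels, n_samples)
    weights: downmix matrix of shape (m_channels, n_channels)
    """
    x = np.asarray(x, dtype=float)
    weights = np.asarray(weights, dtype=float)
    if weights.shape[1] != x.shape[0]:
        raise ValueError("downmix matrix does not match the channel count")
    return weights @ x

# Hypothetical weights for the channel order L, C, R, Ls, Rs.
g = 1.0 / np.sqrt(2.0)
W = np.array([[1.0, g, 0.0, g, 0.0],    # left downmix channel
              [0.0, g, 1.0, 0.0, g]])   # right downmix channel

rng = np.random.default_rng(0)
x = rng.standard_normal((5, 1000))      # five test channels of noise
d = downmix(x, W)                       # two downmix channels
```

Each input channel contributes to at least one downmix channel, which is the property the indirect analysis relies on.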

Hence, embodiments of the present invention describe a novel concept for extracting perceptually distinct components from arbitrary input signals by considering an analysis signal, while the result of the analysis is applied to the input signal. Such an analysis signal can be obtained, for example, by considering a propagation model from the channel or loudspeaker signals to the ears. This is motivated in part by the fact that the human auditory system also uses only two sensors (the left and the right ear) to evaluate sound fields. Thus, the extraction of perceptually distinct components is essentially reduced to the consideration of an analysis signal, which is denoted as a downmix in the following. Throughout this document, the term "downmix" is used for any pre-processing of the multi-channel signal that results in an analysis signal (this may include, for example, a propagation model, HRTFs, BRIRs, or a simple cross-factor downmix).

Knowing the format of the given input signal and the desired characteristics of the signal to be extracted, ideal inter-channel relations can be defined for the downmixed format, and an analysis of this analysis signal is then sufficient to generate a weighting mask (or several weighting masks) for the decomposition of multi-channel signals.

In an embodiment, the multi-channel problem is simplified by using a stereo downmix of the surround signal and applying a direct/ambient analysis to the downmix. Based on the result, i.e. short-time power spectrum estimates of the direct and ambient sounds, filters are derived for decomposing an N-channel signal into N direct sound channels and N ambient sound channels.

The present invention is advantageous because the signal analysis is applied to a smaller number of channels, which significantly shortens the required processing time, so that the inventive concept can be applied even in real-time applications for upmixing, downmixing or any other signal processing operation that requires different components of a signal, for example perceptually distinct components.

A further advantage of the present invention is that, although a downmix is performed, it has been found that this does not impair the detectability of perceptually distinct components in the input signal. In other words, even when the input channels are downmixed, the individual signal components can nevertheless be separated to a large extent. Furthermore, the downmix operates as a kind of "collection" of all signal components of all input channels into two channels, and the single analysis applied to these "collected" downmixed signals provides a unique result that does not have to be interpreted any further and can be used directly for the signal processing.

In a preferred embodiment, a particular efficiency for the purpose of signal decomposition is obtained when the signal analysis is performed on the basis of a pre-calculated frequency-dependent similarity curve used as a reference curve. The term "similarity" includes correlation and coherence, where, in a strict mathematical sense, the correlation is calculated between two signals without an additional time shift, whereas the coherence is calculated by shifting the two signals in time/phase so that they have maximum correlation and then calculating the actual correlation over frequency with this time/phase shift applied. In this text, similarity, correlation and coherence are considered to mean the same thing, i.e. a quantitative degree of similarity between two signals, where a higher absolute value of the similarity means that the two signals are more similar and a lower absolute value of the similarity means that the two signals are less similar.
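The distinction between a correlation without time shift and a similarity evaluated at the best time shift can be illustrated with a small broadband, time-domain sketch in Python/NumPy. It is only an illustration under simplifying assumptions: the text refers to a frequency-dependent measure, whereas the helper names `correlation` and `max_lag_similarity` and the integer-lag search used here are hypothetical.

```python
import numpy as np

def correlation(a, b):
    """Normalized correlation of two signals at zero time shift."""
    return float(np.dot(a, b) / np.sqrt(np.dot(a, a) * np.dot(b, b)))

def max_lag_similarity(a, b, max_lag):
    """Largest absolute normalized correlation over integer shifts
    in the range [-max_lag, max_lag] (overlapping parts only)."""
    best = 0.0
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            u, v = a[lag:], b[:len(b) - lag]
        else:
            u, v = a[:len(a) + lag], b[-lag:]
        c = np.dot(u, v) / np.sqrt(np.dot(u, u) * np.dot(v, v))
        best = max(best, abs(float(c)))
    return best

rng = np.random.default_rng(1)
a = rng.standard_normal(4096)
b = np.roll(a, 5)                       # the same noise, delayed by 5 samples

zero_shift = correlation(a, b)          # close to 0 for white noise
best_shift = max_lag_similarity(a, b, 10)   # close to 1 at the matching shift
```

For the delayed copy, the zero-shift value is small while the shift-compensated value is close to one, which is why the two notions are distinguished in the strict sense and treated as one "degree of similarity" in this text.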

It has been found that using such a correlation curve as a reference curve allows an analysis that can be implemented very efficiently, since the curve can be used for straightforward comparison operations and/or weighting factor calculations. The use of a pre-calculated frequency-dependent correlation curve makes it possible to perform only simple calculations rather than more complex Wiener filtering operations. Furthermore, applying the frequency-dependent correlation curve is particularly useful because the problem is not addressed from a statistical point of view but in a more analytical way, since as much information as possible from the current setup is introduced in order to obtain a solution. Additionally, the flexibility of this procedure is very high, since the reference curve can be obtained in many different ways. One way is to actually measure two or more signals in a certain setup and then to calculate the correlation curve over frequency from the measured signals. For this purpose, one may emit from different loudspeakers independent signals, or signals having a certain degree of dependency that is known beforehand.

The other preferred alternative is simply to calculate the correlation curve under the assumption of independent signals. In this case, no actual signals are required, since the result is signal-independent.

The signal decomposition using a reference curve for the signal analysis can be applied to stereo processing, i.e. to the decomposition of a stereo signal. Alternatively, this procedure can also be implemented together with a downmixer for decomposing multi-channel signals. Alternatively, this procedure can also be implemented for multi-channel signals without using a downmixer, when a pairwise evaluation of the signals in a hierarchical manner is envisaged.

Preferred embodiments of the present invention are described hereinafter with reference to the accompanying drawings, in which:

Fig.1 is a block diagram illustrating a device for decomposing an input signal using a downmixer;

Fig.2 is a block diagram illustrating an implementation of a device for decomposing a signal having at least three input channels, using an analyzer with a pre-calculated frequency-dependent correlation curve in accordance with a further aspect of the invention;

Fig.3 illustrates a further preferred implementation of the present invention with frequency-domain processing for the downmix, the analysis and the signal processing;

Fig.4 illustrates an exemplary pre-calculated frequency-dependent correlation curve serving as a reference curve for the analysis indicated in Fig.1 or Fig.2;

Fig.5 is a block diagram illustrating further processing in order to extract independent components;

Fig.6 illustrates a further implementation of a block diagram for further processing, in which independent diffuse, independent direct and direct components are extracted;

Fig.7 is a block diagram implementing the downmixer as an analysis signal generator;

Fig.8 is a flowchart indicating a preferred way of processing in the signal analyzer of Fig.1 or Fig.2;

Fig.9A-9E illustrate different pre-calculated frequency-dependent correlation curves that can be used as reference curves for several different setups with different numbers and positions of sound sources (e.g. loudspeakers);

Fig.10 is a block diagram illustrating another embodiment for a diffuseness estimation, in which the diffuse components are the components to be decomposed; and

Fig.11A and 11B illustrate exemplary equations for applying a signal analysis without a frequency-dependent correlation curve, relying instead on a Wiener filtering approach.

Fig.1 illustrates a device for decomposing an input signal 10 having at least three input channels or, in general, N input channels. These input channels are input into a downmixer 12 for downmixing the input signal to obtain a downmixed signal 14. The downmixer 12 is arranged to downmix such that the number of downmix channels of the downmixed signal 14, indicated by "m", is at least two and smaller than the number of input channels of the input signal 10. The m downmix channels are input into an analyzer 16 for analyzing the downmixed signal to derive an analysis result 18. The analysis result 18 is input into a signal processor 20, which is arranged to process the input signal 10, or a signal derived from the input signal by a signal deriver 22, using the analysis result. The signal processor 20 is configured to apply the analysis result to the input channels, or to the channels of the signal 24 derived from the input signal, in order to obtain a decomposed signal 26.

In the embodiment illustrated in Fig.1, the number of input channels is n, the number of downmix channels is m, the number of derived channels is l, and the number of output channels is equal to l when the derived signal rather than the input signal is processed by the signal processor. Alternatively, when the signal deriver 22 does not exist, the input signal is processed directly by the signal processor, and in this case the number of channels of the decomposed signal 26, indicated by "l" in Fig.1, is equal to n. Hence, Fig.1 illustrates two different examples. In one example there is no signal deriver 22, and the input signal is applied directly to the signal processor 20. In the other example the signal deriver 22 is implemented, and the derived signal 24 rather than the input signal 10 is processed by the signal processor 20. The signal deriver may, for example, be an audio channel mixer such as an upmixer for generating additional output channels. In this case, l is greater than n. In another embodiment, the signal deriver may be a different audio processor that performs weighting, delay or any other processing on the input channels, and in this case the number l of output channels of the signal deriver 22 is equal to the number n of input channels. In a further implementation, the signal deriver may be a downmixer that reduces the number of channels from the input signal to the derived signal. In this implementation, it is preferred that the number l is still greater than the number m of downmix channels, in order to retain one of the advantages of the present invention, namely that the signal analysis is applied to a smaller number of channel signals.

The analyzer is arranged to analyze the downmixed signal with respect to perceptually distinct components. These perceptually distinct components can be independent components in the individual channels, on the one hand, and dependent components, on the other hand. Alternative signal components to be analyzed by the present invention are direct components, on the one hand, and ambient components, on the other hand. There are many other components that can be separated by means of the present invention, such as speech components from music components, noise components from speech components, noise components from music components, high-frequency noise components with respect to low-frequency noise components, the components provided by the different instruments in multi-pitch signals, etc. This is due to the fact that there are powerful analysis tools, such as Wiener filtering, as discussed in the context of Fig.11A, 11B, or other analysis procedures, such as the use of a frequency-dependent correlation curve, as discussed in the context of, for example, Fig.8 in accordance with the present invention.

Fig.2 illustrates another aspect, in which the analyzer 16 is implemented to use a pre-calculated frequency-dependent correlation curve. Thus, the device for decomposing a signal 28 having a plurality of channels comprises the analyzer 16 for analyzing a correlation between two channels of an analysis signal that is identical to the input signal or related to the input signal, for example, by a downmix operation as illustrated in the context of Fig.1. The analysis signal analyzed by the analyzer 16 has at least two analysis channels, and the analyzer 16 is configured to use a pre-calculated frequency-dependent correlation curve as a reference curve to determine the analysis result 18. The signal processor 20 can operate in the same way as discussed in the context of Fig.1 and is configured to process the analysis signal, or a signal derived from the analysis signal by a signal deriver 22, where the signal deriver 22 can be implemented similarly to the signal deriver 22 of Fig.1. Alternatively, the signal processor can process a signal from which the analysis signal is derived, the signal processing using the analysis result to obtain the decomposed signal. Hence, in the embodiment of Fig.2, the input signal can be identical to the analysis signal, and in this case the analysis signal can also be a stereo signal having just two channels, as illustrated in Fig.2. Alternatively, the analysis signal can be derived from an input signal by any kind of processing, such as a downmix as described in the context of Fig.1, or by any other processing, such as an upmix. Additionally, the signal processor 20 can apply the signal processing to the same signal that has been input into the analyzer, or to a signal from which the analysis signal has been derived, for example as indicated in the context of Fig.1, or to a signal that has been derived from the analysis signal, for example by upmixing.

Hence, different possibilities exist for the signal processor, and all of them are advantageous because of the unique operation of the analyzer, which uses the pre-calculated frequency-dependent correlation curve as a reference curve to determine the analysis result.

Further embodiments are discussed below. It should be noted that, as discussed in the context of Fig.2, even the use of a two-channel analysis signal (without a downmix) is considered. Hence, in the present invention as discussed in its different aspects in the context of Fig.1 and Fig.2, which can be used together or as separate aspects, either the downmix can be processed by the analyzer, or a two-channel signal that may not have been generated by a downmix can be processed by the signal analyzer using the pre-calculated reference curve. In this context it should be noted that the following description of implementation aspects can be applied to both aspects schematically illustrated in Fig.1 and Fig.2, even when certain features are described for only one aspect rather than for both. If, for example, Fig.3 is considered, the frequency-domain features of Fig.3 are described in the context of the aspect illustrated in Fig.1, but it is clear that the time/frequency transform described below with respect to Fig.3, and the inverse transform, can also be applied to the implementation of Fig.2, which has no downmixer but has the specified analyzer that uses a pre-calculated frequency-dependent correlation curve.

In particular, a time/frequency converter would be placed so as to convert the analysis signal before it is input into the analyzer, and a frequency/time converter would be placed at the output of the signal processor to convert the processed signal back into the time domain. When a signal deriver exists, the time/frequency converter can be placed at the input of the signal deriver, so that the signal deriver, the analyzer and the signal processor all operate in the frequency/subband domain. In this context, frequency and subband basically mean a portion in frequency of a frequency representation.

Furthermore, it is clear that the analyzer of Fig.1 can be implemented in many different ways, but in one embodiment this analyzer is also implemented as the analyzer explained in Fig.2, i.e., as an analyzer that uses a pre-calculated frequency-dependent correlation curve as an alternative to Wiener filtering or any other analysis method.

The embodiment of Fig.3 applies a downmix procedure to an arbitrary input signal to obtain a two-channel representation. The analysis is performed in the time-frequency domain, and weighting masks are calculated that are multiplied with the time-frequency representation of the input signal, as illustrated in Fig.3.

In the drawing, T/F denotes a time/frequency transform, usually a short-time Fourier transform (STFT), and iT/F denotes the corresponding inverse transform. $x_1(n), \ldots, x_N(n)$ are the time-domain input signals, where $n$ is the time index. $X_1(m,i), \ldots, X_N(m,i)$ denote the coefficients of the frequency decomposition, where $m$ is the time index of the decomposition and $i$ is its frequency index. $D_1(m,i)$ and $D_2(m,i)$ are the two channels of the downmixed signal:

$D_k(m,i) = \sum_{j=1}^{N} H_{kj}(i)\, X_j(m,i), \quad k = 1, 2$ (1)

$W(m,i)$ is the calculated weighting. $Y_1(m,i), \ldots, Y_N(m,i)$ are the weighted frequency decompositions of each channel. $H_{kj}(i)$ are the downmix coefficients, which can be real-valued or complex-valued, and the coefficients can be constant in time or time-varying. Hence, the downmix coefficients can be simple constants or filters, such as HRTF filters, reverberation filters or similar filters.

$Y_j(m,i) = W_j(m,i)\, X_j(m,i), \quad j = 1, 2, \ldots, N$ (2)

Fig.3 illustrates the case of applying the same weighting to all channels.

$Y_j(m,i) = W(m,i)\, X_j(m,i)$ (3)

$y_1(n), \ldots, y_N(n)$ are the time-domain output signals containing the extracted signal components. (The input signal can have an arbitrary number of channels (N), produced for an arbitrary target loudspeaker layout for playback. The downmix may include HRTFs to obtain ear-input signals, simulation of auditory filters, etc. The downmix can also be performed in the time domain.)
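The downmix and the weighting described above can be sketched for a single time-frequency tile as follows. This is a minimal illustration only: the function names, the five-channel example tile and the gain value 0.5 are assumptions made for the example and are not taken from the description.

```python
from typing import List, Sequence


def downmix_tile(x: Sequence[complex], h: Sequence[Sequence[complex]]) -> List[complex]:
    """Downmix one time-frequency tile: D_k = sum_j H_kj * X_j."""
    return [sum(h_kj * x_j for h_kj, x_j in zip(row, x)) for row in h]


def apply_weight(x: Sequence[complex], w: float) -> List[complex]:
    """Apply one common weight to every channel of the tile: Y_j = W * X_j."""
    return [w * x_j for x_j in x]


# Hypothetical five-channel tile and an illustrative 2 x 5 downmix matrix.
x_tile = [1 + 0j, 0.5 + 0j, 2 + 0j, 0j, 1j]
g = 0.5  # assumed gain, chosen only for this example
h_matrix = [
    [1.0, 0.0, g, g, 0.0],  # row producing D_1
    [0.0, 1.0, g, 0.0, g],  # row producing D_2
]

d_tile = downmix_tile(x_tile, h_matrix)  # two downmix coefficients
y_tile = apply_weight(x_tile, 0.25)      # N weighted coefficients
```

The analysis works on the two values in `d_tile`, while the weight is applied to all N values of the original tile, so the output keeps the channel count of the input.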

In one embodiment, the difference between a reference correlation as a function of frequency, $c_{ref}(\omega)$, and the actual correlation of the downmixed input signal, $c_{sig}(\omega)$, is calculated. (In this text, the term "correlation" is used as a synonym for inter-channel similarity and may thus also include an evaluation of time shifts, for which the term "coherence" is usually used. Even if time shifts are evaluated, the resulting value can have a sign; commonly, coherence is defined as having only positive values.) Depending on the deviation of the actual curve from the reference curve, a weighting factor is calculated for each time-frequency tile, indicating whether it contains dependent or independent components. The obtained time-frequency weighting indicates the independent components and can be applied to each channel of the input signal to yield a multichannel signal (with as many channels as the input signal) that includes the independent parts, which may be perceived as either distinct or diffuse.

The reference curve can be defined in various ways.

Examples are:

- The ideal theoretical reference curve for an idealized two- or three-dimensional diffuse sound field consisting of independent components.

- The ideal curve achievable with the reference target loudspeaker layout for the given input signal (e.g., a standard stereo layout with azimuth angles of ±30°, or a standard five-channel layout according to ITU-R BS.775 with azimuth angles of 0°, ±30° and ±110°).

- The ideal curve for the loudspeaker layout actually present (the actual positions can be measured or known through user input; the reference curve can be calculated under the assumption that independent signals are played back over the given loudspeakers).

- The actual frequency-dependent power of each input channel can be included in the calculation of the reference curve.

Given a frequency-dependent reference curve $c_{ref}(\omega)$, an upper threshold and a lower threshold can be defined (see Fig.4). The threshold curves may coincide with the reference curve, or be defined by assuming detectability thresholds, or they can be derived heuristically.

If the deviation of the actual curve from the reference curve lies within the boundaries given by the thresholds, the actual bin receives a weighting indicating independent components. Above the upper threshold or below the lower threshold, the bin is indicated as dependent. This indication can be binary or gradual (i.e. following a soft-decision function). In particular, if the upper and lower thresholds coincide with the reference curve, the applied weighting is directly related to the deviation from the reference curve.

With reference to Fig.3, reference number 32 illustrates a time/frequency converter, which can be implemented as a short-time Fourier transform or as any kind of filter bank generating subband signals, such as a QMF filter bank. Regardless of the detailed implementation of the time/frequency converter 32, its output is, for each input channel $x_i$, a spectrum for each time period of the input signal. Hence, the time/frequency converter 32 can be implemented to always take a block of input samples of an individual channel signal and to calculate a frequency representation, such as an FFT spectrum having spectral lines extending from a lower frequency to a higher frequency. Then, for the next block in time, the same procedure is performed, so that in the end a sequence of short-time spectra is calculated for each input channel signal. A certain frequency range of a certain spectrum relating to a certain block of input samples of an input channel is called a "time-frequency tile", and preferably the analysis in the analyzer 16 is performed on the basis of these time-frequency tiles. Therefore, as the input for one time-frequency tile, the analyzer receives the spectral value at a first frequency for a certain block of input samples of the first downmix channel $D_1$, and receives the value for the same frequency and the same block (in time) of the second downmix channel $D_2$.
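The block-wise conversion into short-time spectra can be sketched as below. This is a simplified illustration under stated assumptions: a plain discrete Fourier transform stands in for the FFT or filter bank, no analysis window is applied, and the names `dft`, `short_time_spectra`, `block_len` and `hop` are hypothetical.

```python
import cmath
import math
from typing import List, Sequence


def dft(block: Sequence[float]) -> List[complex]:
    """Plain O(n^2) discrete Fourier transform of one block of samples."""
    n = len(block)
    return [
        sum(block[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def short_time_spectra(signal: Sequence[float], block_len: int, hop: int) -> List[List[complex]]:
    """Return spectra[m][i]: block index m in time, frequency bin i."""
    spectra = []
    for start in range(0, len(signal) - block_len + 1, hop):
        spectra.append(dft(signal[start:start + block_len]))
    return spectra


# Hypothetical test signal: one cosine period per 8-sample block, two blocks long.
signal = [math.cos(2.0 * math.pi * n / 8.0) for n in range(16)]
spectra = short_time_spectra(signal, block_len=8, hop=8)
tile = spectra[1][1]  # time-frequency tile (block 1, bin 1), zero-based indices
```

Each entry `spectra[m][i]` corresponds to one time-frequency tile of one channel; running the same function on every input channel and on both downmix channels gives the values the analyzer compares.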

Then, as illustrated for example in Fig.8, the analyzer 16 is configured to determine (80) a correlation value between the two input channels per subband and time block, i.e. a correlation value for a time-frequency tile. Then, in the embodiment illustrated with respect to Fig.2 or Fig.4, the analyzer 16 retrieves the correlation value (82) for the corresponding subband from the reference correlation curve. When, for example, the subband is the subband indicated at 40 in Fig.4, step 82 results in the value 41, which indicates a correlation between -1 and +1; the value 41 is then the retrieved correlation value. Then, in step 83, the result for the subband is obtained from the correlation value determined in step 80 and the retrieved correlation value 41 obtained in step 82, either by performing a comparison and a subsequent decision or by calculating the actual difference. The result can be, as explained above, a binary result stating that the time-frequency tile actually considered in the downmix/analysis signal has independent components. This decision is taken when the actually determined correlation value (step 80) is equal to the reference correlation value or is sufficiently close to it.

However, when it is determined that the determined correlation value indicates a higher absolute correlation than the reference correlation value, it is determined that the time-frequency tile under consideration contains dependent components. Hence, when the correlation of a time-frequency tile of the downmix or analysis signal indicates a higher absolute correlation value than the reference curve, the components in this time-frequency tile can be said to depend on each other. When, however, the correlation is very close to the reference curve, the components can be said to be independent. Dependent components can receive a first weighting value, for example 1, and independent components can receive a second weighting value, for example 0. Preferably, as illustrated in Fig.4, high and low thresholds that are spaced apart from the reference line are used, in order to provide a better result than using the reference curve alone.

Additionally, with respect to Fig.4, it should be noted that the correlation can vary between -1 and +1. A correlation with a negative sign additionally indicates a phase shift of 180° between the signals. Therefore, other correlations extending only from 0 to 1 can be applied as well, in which the negative part of the correlation is simply made positive. In this procedure, a time shift or phase difference is then ignored for the purpose of determining the correlation.

An alternative way of calculating the result is to actually calculate the distance between the correlation value determined in step 80 and the retrieved correlation value obtained in step 82, and then to determine a metric between 0 and 1 as the weighting factor based on this distance. While the first alternative (1) in Fig.8 results only in values of 0 or 1, alternative (2) results in values between 0 and 1 and is preferred in some implementations.
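Steps 80 to 83 with the binary alternative can be sketched as follows. The description does not fix how the correlation value is estimated, so the estimator below (real part of the cross-spectrum, normalized by the channel powers, summed over the coefficients grouped into one tile) is an assumption, as are the function names and the threshold values in the example. The convention "1 means independent" follows the extraction of independent components; the opposite assignment is equally possible.

```python
import math
from typing import Sequence


def tile_correlation(d1: Sequence[complex], d2: Sequence[complex]) -> float:
    """Normalized correlation in [-1, 1] of two downmix channels over one tile."""
    cross = sum((a * b.conjugate()).real for a, b in zip(d1, d2))
    p1 = sum(abs(a) ** 2 for a in d1)
    p2 = sum(abs(b) ** 2 for b in d2)
    if p1 == 0.0 or p2 == 0.0:
        return 0.0  # assumption: a silent tile is reported as uncorrelated
    return cross / math.sqrt(p1 * p2)


def binary_weight(c_sig: float, c_lo: float, c_hi: float) -> int:
    """Return 1 (independent) inside the thresholds, 0 (dependent) outside."""
    return 1 if c_lo <= c_sig <= c_hi else 0


# Hypothetical coefficients of the two downmix channels for one tile.
d1 = [1 + 1j, 2 + 0j]
d2_same = list(d1)                 # identical channel, correlation close to +1
d2_quad = [1j * a for a in d1]     # 90 degree phase shift, correlation 0

c_same = tile_correlation(d1, d2_same)
c_quad = tile_correlation(d1, d2_quad)
```

With assumed thresholds of 0.1 and 0.5 around the retrieved reference value, `binary_weight(0.3, 0.1, 0.5)` marks the tile as independent and `binary_weight(0.9, 0.1, 0.5)` marks it as dependent.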

The signal processor 20 in Fig.3 is illustrated as multipliers, and the analysis result is simply a determined weighting factor, which is forwarded from the analyzer to the signal processor, as illustrated at 84 in Fig.8, and is then applied to the corresponding time-frequency tile of the input signal 10. When, for example, the spectrum actually considered is the 20th spectrum in the sequence of spectra, and the frequency bin actually considered is the fifth frequency bin of this 20th spectrum, the time-frequency tile can be indicated as (20, 5), where the first number indicates the number of the block in time and the second number indicates the frequency bin in this spectrum. Then, the analysis result for the time-frequency tile (20, 5) is applied to the corresponding time-frequency tile (20, 5) of each channel of the input signal in Fig.3 or, when a signal deriver as illustrated in Fig.1 is implemented, to the corresponding time-frequency tile of each channel of the derived signal.

The calculation of a reference curve is now explained in more detail. For the present invention, however, it is basically not important how the reference curve was derived. It can be an arbitrary curve or, for example, values in a lookup table indicating an ideal or desired relation of the input signals $x_j$ in the downmix signal D or, in the context of Fig.2, in the analysis signal. The following derivation is exemplary.

The physical diffuseness of a sound field can be evaluated by the method introduced by Cook et al. (Richard K. Cook, R. V. Waterhouse, R. D. Berendt, Seymour Edelman, and M. C. Thompson Jr., "Measurement of correlation coefficients in reverberant sound fields", Journal of the Acoustical Society of America, vol. 27, no. 6, pp. 1072-1077, November 1955), using the correlation coefficient (r) of the steady-state sound pressure of plane waves at two spatially separated points, as illustrated in the following equation (4):

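$r = \dfrac{\langle p_1(n)\, p_2(n) \rangle}{\left[ \langle p_1^2(n) \rangle \langle p_2^2(n) \rangle \right]^{1/2}}$ (4)

where $p_1(n)$ and $p_2(n)$ are the sound pressure measurements at the two points, $n$ is the time index and $\langle \cdot \rangle$ denotes time averaging. In a steady-state sound field, the following relations can be derived:

$r(k,d) = \dfrac{\sin(kd)}{kd}$ (for three-dimensional sound fields) (5)

$r(k,d) = J_0(kd)$ (for two-dimensional sound fields) (6)

where $d$ is the distance between the two measurement points, $k = 2\pi/\lambda$ is the wavenumber, $\lambda$ is the wavelength and $J_0$ is the zeroth-order Bessel function of the first kind. The physical reference curve $r(k,d)$ can already be used as the reference correlation curve $c_{ref}$ for subsequent processing.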

A measure of the perceived diffuseness of a sound field is the interaural cross-correlation coefficient measured in the sound field. This measurement implies that the distance between the pressure sensors (the ears) is fixed. With this restriction, r becomes a function of frequency, with the angular frequency $\omega = kc$, where c is the speed of sound in air. In addition, the pressure signals differ from the previously considered free-field signals because of reflection, diffraction and bending effects caused by the pinnae, head and torso of the listener. These effects, which are essential for spatial hearing, are described by head-related transfer functions (HRTFs). Taking these effects into account, the resulting pressure signals at the ear entrances are $p_L(n,\omega)$ and $p_R(n,\omega)$. For the calculation, measured HRTF data can be used, or approximations can be obtained from an analytical model (for example, Richard O. Duda and William L. Martens, "Range dependence of the response of a spherical head model", Journal of the Acoustical Society of America, vol. 104, no. 5, pp. 3048-3058, November 1998).

Because the human auditory system acts as a frequency analyzer with limited frequency selectivity, this frequency selectivity can additionally be incorporated. The auditory filters are assumed to behave like overlapping band-pass filters. In the following exemplary explanation, a critical-band approach is used to approximate these overlapping band-pass filters by rectangular filters. The equivalent rectangular bandwidth (ERB) can be calculated as a function of center frequency (Brian R. Glasberg and Brian C. J. Moore, "Derivation of auditory filter shapes from notched-noise data", Hearing Research, vol. 47, pp. 103-138, 1990). Given that binaural processing follows the auditory filtering, the interaural cross-correlation coefficient has to be calculated for separate frequency channels, yielding the following frequency-dependent pressure signals:

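$\hat{p}_L(n,\omega) = \dfrac{1}{b(\omega)} \int_{\omega - b(\omega)/2}^{\omega + b(\omega)/2} p_L(n,\omega')\, d\omega'$ (7)

$\hat{p}_R(n,\omega) = \dfrac{1}{b(\omega)} \int_{\omega - b(\omega)/2}^{\omega + b(\omega)/2} p_R(n,\omega')\, d\omega'$ (8)

where $p_L$ and $p_R$ are the pressure signals at the ear entrances, $b(\omega)$ is the bandwidth of the critical band, and the integration limits are given by the bounds of the critical band around the actual center frequency ω. The factors $1/b(\omega)$ may or may not be used in equations (7) and (8).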

If one of the sound pressure measurements is advanced or delayed by a frequency-independent time difference, the coherence of the signals can be evaluated. The human auditory system is able to make use of such a time-alignment property. Usually, the interaural coherence is calculated within ±1 ms. Depending on the available processing power, the calculation can be implemented using only the zero-lag value (low complexity) or the coherence with a time advance and delay (if higher complexity is possible). In the following, no distinction is made between the two cases.

The ideal behavior is obtained by assuming an ideal diffuse sound field, which can be idealized as a wave field composed of equally strong, uncorrelated plane waves propagating in all directions (i.e. a superposition of an infinite number of propagating plane waves with random phase relations and uniformly distributed directions of propagation). A signal radiated by a loudspeaker can be considered a plane wave for a listener positioned sufficiently far away. This plane-wave assumption is common in stereophonic playback over loudspeakers. Thus, a synthetic sound field reproduced by loudspeakers consists of contributing plane waves from a limited number of directions.

Consider an input signal with N channels, produced for playback over a layout with loudspeaker positions $l_1, l_2, \ldots, l_N$. (For a horizontal-only playback layout, $l_i$ specifies the azimuth angle. In the general case, $l_i$ = (azimuth, elevation) indicates the position of the loudspeaker relative to the listener's head. If the layout present in the listening room differs from the reference layout, $l_i$ may alternatively represent the loudspeaker positions of the actual playback layout.) With this information, an interaural coherence reference curve for a diffuse-field simulation can be computed for this layout under the assumption that independent signals are fed to each loudspeaker. The signal power contributed by each input channel in each time-frequency tile can be included in the calculation of the reference curve. In the exemplary implementation, this interaural coherence reference curve is used as the reference correlation curve $c_{ref}$.

Various reference curves, as examples of frequency-dependent reference curves or correlation curves, are illustrated in Fig.9A-9E for different numbers of sound sources at different positions and for different head orientations, as indicated in the drawings.

The calculation of the analysis results based on the reference curves, as explained in the context of Fig.8, is now described in more detail.

The goal is to derive a weighting that equals 1 if the correlation of the downmix channels equals the reference correlation calculated under the assumption of independent signals being played back from all loudspeakers. If the correlation of the downmix equals +1 or -1, the derived weighting should be 0, indicating that no independent components are present. Between these extreme cases, the weighting should represent a reasonable transition between the indication as independent (W=1) and completely dependent (W=0).

Given a reference correlation curve $c_{ref}(\omega)$ and an estimate of the correlation/coherence of the actual input signal played back over the actual reproduction layout, $c_{sig}(\omega)$ (where $c_{sig}$ is the correlation, or respectively the coherence, of the downmix), the deviation of $c_{sig}(\omega)$ from $c_{ref}(\omega)$ can be calculated. This deviation (possibly including an upper and a lower threshold) is mapped to the range [0; 1] to obtain a weighting $W(m,i)$ that is applied to all input channels in order to separate the independent components.

The following example illustrates a possible mapping when the thresholds coincide with the reference curve:

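The magnitude of the deviation (denoted $\Delta$) of the actual curve $c_{sig}$ from the reference $c_{ref}$ is defined as follows:

$\Delta(\omega) = \left| c_{sig}(\omega) - c_{ref}(\omega) \right|$ (9)

Given that the correlation/coherence is bounded within [-1; +1], the maximum possible deviation towards +1 or -1 for each frequency is defined as follows:

$\bar{\Delta}_{+}(\omega) = 1 - c_{ref}(\omega)$ (10)

$\bar{\Delta}_{-}(\omega) = c_{ref}(\omega) + 1$ (11)

The weighting for each frequency is thus obtained from

$W(\omega) = 1 - \dfrac{\Delta(\omega)}{\bar{\Delta}_{+}(\omega)} \quad \text{for } c_{sig}(\omega) \ge c_{ref}(\omega)$ (12)

$W(\omega) = 1 - \dfrac{\Delta(\omega)}{\bar{\Delta}_{-}(\omega)} \quad \text{for } c_{sig}(\omega) < c_{ref}(\omega)$ (13)

Taking into account the time dependence and the limited frequency resolution of the frequency decomposition, the weighting values are derived as follows (given here for the general case of a reference curve that can change over time; a time-independent reference curve, i.e. $c_{ref}(i)$, is also possible):

$W(m,i) = 1 - \dfrac{\Delta(m,i)}{\bar{\Delta}_{\pm}(m,i)}$ (14)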

Such processing may be carried out in a frequency decomposition with frequency coefficients grouped into perceptually motivated subbands, for reasons of computational complexity and in order to obtain filters with shorter impulse responses. In addition, smoothing filters can be applied, and compression functions can be applied (i.e. distorting the weighting in a desired fashion, additionally introducing minimum and/or maximum weighting values).
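A gradual weighting of this kind can be sketched as follows. The sketch assumes one particular mapping that satisfies the stated goals (weight 1 when the downmix correlation equals the reference, weight 0 at a correlation of +1 or -1): the absolute deviation from the reference is divided by the largest deviation possible on that side and subtracted from 1. The function name `soft_weight` and the optional limits `w_min` and `w_max` are hypothetical.

```python
def soft_weight(c_sig: float, c_ref: float, w_min: float = 0.0, w_max: float = 1.0) -> float:
    """Map the deviation of c_sig from c_ref to a weight in [w_min, w_max]."""
    delta = abs(c_sig - c_ref)
    # Largest possible deviation towards +1 or towards -1, depending on the side.
    max_dev = (1.0 - c_ref) if c_sig >= c_ref else (c_ref + 1.0)
    if max_dev == 0.0:
        # The reference already sits at the bound on this side, so c_sig equals it.
        return w_max
    w = 1.0 - delta / max_dev
    # Optional limiting of the weights to a minimum and/or maximum value.
    return min(w_max, max(w_min, w))


# Hypothetical reference value of 0.4 for one subband.
w_equal = soft_weight(0.4, 0.4)    # downmix correlation equals the reference
w_plus = soft_weight(1.0, 0.4)     # fully correlated downmix
w_minus = soft_weight(-1.0, 0.4)   # fully anti-correlated downmix
w_mid = soft_weight(0.7, 0.4)      # halfway between reference and +1
```

When the dependent components are to be extracted instead, one simple option under the same assumptions is to use `1.0 - soft_weight(c_sig, c_ref)` as the weight.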

Fig.5 illustrates a further implementation of the present invention, in which the downmixer is implemented using HRTF filters and auditory filters, as illustrated. In addition, Fig.5 illustrates that the analysis results output by the analyzer 16 are the weighting factors for each time-frequency bin, and the signal processor 20 is illustrated as an extractor for extracting independent components. In this case, the output of the processor 20 is again N channels, but each channel now includes only the independent components and no longer includes the dependent components. In this implementation, the analyzer would calculate the weightings so that, in the first implementation of Fig.8, an independent component receives a weighting value of 1 and a dependent component receives a weighting value of 0. The time-frequency tiles in the original N channels processed by the processor 20 that have dependent components would then be set to 0.

In the other alternative, in which there are weighting values between 0 and 1 in Fig.8, the analyzer would compute the weighting so that a time-frequency tile with a small distance to the reference curve receives a high value (closer to 1), and a time-frequency tile with a large distance to the reference curve receives a small weighting factor (closer to 0). In the subsequent weighting, illustrated for example at 20 in Fig.3, the independent components would then be amplified, while the dependent components would be attenuated.

However, when the signal processor 20 is implemented not to extract the independent components but to extract the dependent components, the weightings would be assigned in the opposite way, so that when the weighting is performed in the multipliers 20 illustrated in Fig.3, the independent components are attenuated and the dependent components are amplified. Hence, each signal processor can be used to extract signal components, since which signal components are actually extracted is determined by the actual assignment of the weighting values.

Fig.6 illustrates a further implementation of the inventive concept, but now with a different implementation of the processor 20. In the embodiment of Fig.6, the processor 20 is implemented to extract independent diffuse parts, independent direct parts, and direct parts/components per se.

To obtain, from the separated independent components ($Y_1, \ldots, Y_N$), the parts that contribute to the perception of an enveloping/ambient sound field, further constraints have to be considered. One such constraint may be the assumption that enveloping ambient sound is equally strong from every direction. Thus, for example, the minimum energy of each time-frequency tile in every channel of the independent sound signals can be extracted to obtain an enveloping ambient signal (which can be processed further to obtain a higher number of ambience channels). Example:

$\tilde{Y}_j(m,i) = g_j(m,i)\, Y_j(m,i), \quad g_j(m,i) = \sqrt{\dfrac{\min_{1 \le k \le N} P_{Y_k}(m,i)}{P_{Y_j}(m,i)}}$ (15)

where $P$ denotes a short-time power estimate. (This example shows the simplest case. One obvious exceptional case in which it is not applicable is when one of the channels contains signal pauses, during which the power in that channel would be very low or zero.)
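One possible reading of the minimum-energy constraint is sketched below: every channel of a tile is scaled so that its short-time power equals the smallest power found among the channels. The function name, the `floor` parameter and the example powers are assumptions made for illustration; the sketch also shows the exceptional case mentioned in the text, in which a silent channel forces all gains to zero.

```python
import math
from typing import List, Sequence


def equal_energy_gains(powers: Sequence[float], floor: float = 1e-12) -> List[float]:
    """Per-channel gains that scale every channel down to the minimum power."""
    p_min = min(powers)
    # The floor only guards the division; a zero-power channel still yields gain 0.
    return [math.sqrt(p_min / max(p, floor)) for p in powers]


# Hypothetical short-time powers of three channels in one time-frequency tile.
powers = [4.0, 1.0, 16.0]
gains = equal_energy_gains(powers)
scaled_powers = [g * g * p for g, p in zip(gains, powers)]

# Exceptional case: one channel pauses, so the common minimum power is zero.
paused_gains = equal_energy_gains([0.0, 1.0])
```

In a practical system the pause case would need separate handling, for example by excluding channels whose power stays below a threshold; that choice is outside what the text specifies.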

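In some cases, it is advantageous to extract the equal-energy parts of all input channels and to calculate the weighting using only these extracted spectra.

$\tilde{X}_j(m,i) = g_j(m,i)\, X_j(m,i), \quad g_j(m,i) = \sqrt{\dfrac{\min_{1 \le k \le N} P_{X_k}(m,i)}{P_{X_j}(m,i)}}$ (16)

Here $P_{X_k}(m,i)$ is the short-time power estimate of input channel k.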

The extracted dependent parts (which can, for example, be derived as $Y_{dependent} = Y_j(m,i) - X_j(m,i)$) can be used to detect channel dependencies and thus to estimate the directional cues inherent in the input signal, allowing further processing such as re-panning.

Fig.7 illustrates a variation of the general principle. The N-channel input signal is fed to an analysis signal generator (ASG). The generation of the M-channel analysis signal may, for example, include a propagation model from the channels/loudspeakers to the ears, or other methods referred to as downmix in this document. The indication of the different components is based on the analysis signal. The masks indicating the different components are applied to the input signals (A extraction / D extraction (20a, 20b)). The weighted input signals can be processed further (A post-processing / D post-processing (70a, 70b)) to yield output signals with a specific character; in this example, the designators "A" and "D" have been chosen to indicate that the components to be extracted may be "ambience" and "direct sound".

Fig.10 is described next. Stationary sound fields are called diffuse if the directional distribution of sound energy does not depend on direction. The directional energy distribution can be evaluated by measuring all directions with a highly directive microphone. In room acoustics, the reverberant sound field in an enclosed space is often modeled as a diffuse field. A diffuse sound field can be idealized as a wave field that consists of equally strong, uncorrelated plane waves propagating in all directions. Such a sound field is isotropic and homogeneous.

If the uniformity of the energy distribution is of particular interest, the point-to-point correlation coefficient

$r = \dfrac{\langle p_1(t)\, p_2(t) \rangle}{\left[ \langle p_1^2(t) \rangle \langle p_2^2(t) \rangle \right]^{1/2}}$

of the steady-state sound pressures $p_1(t)$ and $p_2(t)$ at two spatially separated points can be used to assess the physical diffuseness of the sound field. For assumed ideal three-dimensional and two-dimensional steady-state diffuse sound fields induced by a sinusoidal source, the following relations can be derived:

$r_{3D} = \dfrac{\sin(kd)}{kd}$,

and

$r_{2D} = J_0(kd)$,

where $k = 2\pi/\lambda$ (with $\lambda$ being the wavelength) is the wavenumber and $d$ is the distance between the measurement points. Given these relations, the diffuseness of a sound field can be estimated by comparing measurement data with the reference curves. Since the ideal relations are only necessary but not sufficient conditions, a number of measurements with different orientations of the axis connecting the microphones may be considered.
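The two ideal reference curves can be evaluated numerically as sketched below. The Python standard library has no Bessel function, so $J_0$ is approximated here through its integral representation with a midpoint rule; this numerical choice, the function names and the assumed speed of sound of 343 m/s are illustration choices, not part of the description.

```python
import math


def r_3d(k: float, d: float) -> float:
    """Ideal 3-D diffuse-field correlation sin(kd)/(kd)."""
    x = k * d
    return 1.0 if x == 0.0 else math.sin(x) / x


def bessel_j0(x: float, steps: int = 2000) -> float:
    """J0(x) = (1/pi) * integral over [0, pi] of cos(x sin(theta)), midpoint rule."""
    h = math.pi / steps
    total = sum(math.cos(x * math.sin((n + 0.5) * h)) for n in range(steps))
    return total * h / math.pi


def r_2d(k: float, d: float) -> float:
    """Ideal 2-D diffuse-field correlation J0(kd)."""
    return bessel_j0(k * d)


def wavenumber(freq_hz: float, c: float = 343.0) -> float:
    """k = 2*pi*f/c, with an assumed speed of sound c in m/s."""
    return 2.0 * math.pi * freq_hz / c


# Hypothetical microphone spacing of 0.175 m evaluated at 500 Hz.
k_500 = wavenumber(500.0)
ref_3d = r_3d(k_500, 0.175)
ref_2d = r_2d(k_500, 0.175)
```

Evaluating these functions over a grid of frequencies gives curves against which a measured point-to-point correlation can be compared.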

For a listener in the sound field, the sound pressure measurements are given by the ear-input signals $p_l(t)$ and $p_r(t)$. Thus, the assumed distance d between the measurement points is fixed, and r becomes a function of frequency only, with $f = kc/(2\pi)$, where c is the speed of sound in air. The ear-input signals differ from the previously considered free-field signals because of the effects caused by the pinnae, head and torso of the listener. These effects, which are essential for spatial hearing, are described by head-related transfer functions (HRTFs). Measured HRTF data can be used to include these effects. Here, an analytical model is used to simulate an approximation of the HRTFs. The head is modeled as a rigid sphere with a radius of 8.75 cm and ear locations at an azimuth of ±100° and an elevation of 0°. Given the theoretical behavior of r in an ideal diffuse sound field and the influence of the HRTFs, a frequency-dependent interaural cross-correlation reference curve for diffuse sound fields can be determined.

The diffuseness estimate is based on a comparison of simulated cues with assumed diffuse-field reference cues. This comparison is subject to the limitations of human hearing. In the auditory system, binaural processing follows the auditory periphery, which consists of the external ear, the middle ear and the inner ear. The effects of the external ear that are not approximated by the spherical model (for example, the shape of the pinnae and the ear canal) and the effects of the middle ear are not considered. The spectral selectivity of the inner ear is modeled as a bank of overlapping band-pass filters (called auditory filters in Fig.10). A critical-band approach is used to approximate these overlapping band-pass filters by rectangular filters. The equivalent rectangular bandwidth (ERB) is calculated as a function of the center frequency $f_c$ (in Hz) in accordance with the following:

$b(f_c) = 24.7\,(0.00437\, f_c + 1)$
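The ERB formula can be evaluated directly, as in the following sketch. The band edges are placed at the center frequency plus and minus half the bandwidth, which is how the text later sets the integration limits of the critical band; the function names are hypothetical.

```python
from typing import Tuple


def erb(fc_hz: float) -> float:
    """Equivalent rectangular bandwidth in Hz for a center frequency in Hz."""
    return 24.7 * (0.00437 * fc_hz + 1.0)


def critical_band_limits(fc_hz: float) -> Tuple[float, float]:
    """Lower and upper edge of the rectangular band centered on fc_hz."""
    half = erb(fc_hz) / 2.0
    return fc_hz - half, fc_hz + half


bandwidth_1k = erb(1000.0)                      # about 132.6 Hz
f_minus, f_plus = critical_band_limits(1000.0)  # about 933.7 Hz and 1066.3 Hz
```

The bandwidth grows linearly with the center frequency, so bands at high frequencies cover many more spectral lines of a uniform transform than bands at low frequencies.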

It is assumed that the human auditory system is capable of performing a time alignment in order to detect coherent signal components, and that a cross-correlation analysis is used to estimate the alignment time τ (corresponding to the ITD) in the presence of complex sounds. Up to about 1-1.5 kHz, time shifts of the carrier signal are evaluated using the cross-correlation of the waveforms, whereas at higher frequencies the cross-correlation of the envelopes becomes the relevant cue. In the following, this distinction is not made. The interaural coherence (IC) estimate is modeled as the maximum absolute value of the normalized interaural cross-correlation function:

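$IC = \max_{\tau} \left| \dfrac{\langle p_L(t)\, p_R(t+\tau) \rangle}{\left[ \langle p_L^2(t) \rangle \langle p_R^2(t) \rangle \right]^{1/2}} \right|$.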
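For sampled signals, this maximum over the lag can be sketched as below. The sketch assumes that the time averages are replaced by sums over a finite block (the common normalization factor cancels) and that samples outside the block count as zero; the function name and the example signals are hypothetical.

```python
import math
from typing import Sequence


def interaural_coherence(p_l: Sequence[float], p_r: Sequence[float], max_lag: int) -> float:
    """Maximum absolute normalized cross-correlation over lags -max_lag..max_lag."""
    energy = math.sqrt(sum(v * v for v in p_l) * sum(v * v for v in p_r))
    if energy == 0.0:
        return 0.0  # assumption: silent input is reported as incoherent
    best = 0.0
    for lag in range(-max_lag, max_lag + 1):
        acc = 0.0
        for t, left in enumerate(p_l):
            u = t + lag
            if 0 <= u < len(p_r):  # samples outside the block are treated as zero
                acc += left * p_r[u]
        best = max(best, abs(acc) / energy)
    return best


# Hypothetical ear signals: the right signal is the left one delayed by two samples.
left = [0.0, 0.0, 1.0, 2.0, 3.0, 0.0, 0.0, 0.0]
right = [0.0, 0.0, 0.0, 0.0, 1.0, 2.0, 3.0, 0.0]

ic_aligned = interaural_coherence(left, right, max_lag=3)   # delay is compensated
ic_zero_lag = interaural_coherence(left, right, max_lag=0)  # zero-lag value only
```

Setting `max_lag` to 0 corresponds to the low-complexity zero-lag variant; a search range of ±1 ms would correspond to `max_lag = 48` at an assumed sampling rate of 48 kHz.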

Some models of binaural perception consider a running interaural cross-correlation analysis. Since stationary signals are considered here, the dependence on time is not taken into account. In order to model the influence of the critical-band processing, the frequency-dependent normalized cross-correlation function is calculated as follows:

$IC(f_c) = \dfrac{\langle A \rangle}{\left[ \langle B \rangle \langle C \rangle \right]^{1/2}}$,

where A is the cross-correlation function per critical band, and B and C are the autocorrelation functions per critical band. Their relation to the frequency domain via the band-pass cross-spectrum and the band-pass auto-spectra can be formulated as follows:

$A = \max_{\tau} \left| 2\, \mathrm{Re}\left( \int_{f_-}^{f_+} L^*(f)\, R(f)\, e^{j 2\pi f (t-\tau)}\, df \right) \right|$,

$B = \left| 2 \left( \int_{f_-}^{f_+} L^*(f)\, L(f)\, e^{j 2\pi f t}\, df \right) \right|$,

$C = \left| 2 \left( \int_{f_-}^{f_+} R^*(f)\, R(f)\, e^{j 2\pi f t}\, df \right) \right|$,

where $L(f)$ and $R(f)$ are the Fourier transforms of the ear-input signals, $f_{\pm} = f_c \pm b(f_c)/2$ are the upper and lower integration limits of the critical band according to the actual center frequency, and * denotes the complex conjugate.

If the signals from two or more sources at different angles are superimposed, fluctuating ILD and ITD cues are evoked. Such ILD and ITD variations as a function of time and/or frequency can create spaciousness. However, in the long-term average, there must be no ILDs and ITDs in a diffuse sound field. An average ITD of zero means that the correlation between the signals cannot be increased by time alignment. ILDs can in principle be evaluated over the complete audible frequency range. Because the head constitutes no obstacle at low frequencies, ILDs are most pronounced at middle and high frequencies.

Fig.11A and 11B are discussed next, in order to illustrate an alternative implementation of the analyzer that does not use a reference curve as explained in the context of Fig.10 or Fig.4.

A short-time Fourier transform (STFT) is applied to the surround input audio channels, yielding the short-time spectra $X_1(m,i)$ to $X_N(m,i)$, respectively, where m is the spectrum (time) index and i is the frequency index. The spectra of a stereo downmix of the surround input signal, denoted $\hat{X}_1(m,i)$ and $\hat{X}_2(m,i)$, are computed. For 5.1 surround, an ITU downmix is suitable, as in equation (1). $X_1(m,i)$ to $X_5(m,i)$ correspond, in this order, to the left (L), right (R), center (C), left surround (LS) and right surround (RS) channels. In the following, the time and frequency indices are omitted in most cases for brevity.

Based on the stereo downmix signal, the filters $W_D$ and $W_A$ are computed to obtain the direct and ambient sound estimates of the surround signal in equations (2) and (3).

Under the assumption that the ambient sound signal is uncorrelated between all input channels, the downmix coefficients are chosen such that this assumption also holds for the downmix channels. Thus, the downmix signal model in equation 4 can be formulated.

$D_1$ and $D_2$ represent the correlated direct-sound STFT spectra, and $A_1$ and $A_2$ represent the uncorrelated ambient sound. It is further assumed that the direct sound and the ambient sound in each channel are mutually uncorrelated.

The direct sound is estimated, in a least-mean-square sense, by applying a Wiener filter to the original surround signal in order to suppress the ambience. To derive a single filter that can be applied to all input channels, the direct components in the downmix are estimated using the same filter for the left and right channels, as in equation (5).

The joint mean square error function for this estimate is given by equation (6).

E{·} is the expectation operator, and P_D and P_A are the sums of the short-term power estimates of the direct and ambient components (equation 7).

The error function (6) is minimized by setting its derivative to zero. The resulting filter for estimating the direct sound is given in equation 8.

Similarly, the estimation filter for the ambient sound can be derived according to equation 9.
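
As an illustration of the two filters, the sketch below uses the standard single-gain Wiener solution that follows from the stated signal model: with one real gain W shared by both downmix channels and with direct and ambient parts uncorrelated, the joint error reduces to (1 - W)^2 P_D + W^2 P_A, whose minimum lies at W = P_D / (P_D + P_A). These closed forms are derived here under those assumptions and are not copied from equations (8) and (9), which are not reproduced in this text. The function names and the small regularizer `eps` are hypothetical.

```python
import numpy as np

def wiener_gains(P_D, P_A, eps=1e-12):
    """Per-tile gains for direct (W_D) and ambient (W_A) estimation.

    P_D, P_A: summed short-term powers of the direct and ambient parts.
    eps only guards against division by zero in silent tiles (assumption).
    """
    total = P_D + P_A + eps
    return P_D / total, P_A / total

def joint_mse(W, P_D, P_A):
    """Joint error of estimating D_k by W * (D_k + A_k), summed over k = 1, 2."""
    return (1.0 - W) ** 2 * P_D + W ** 2 * P_A

def apply_gain(W, X):
    """Apply one gain W(m, i) to every channel of X with shape (channels, frames, bins)."""
    return W[np.newaxis, ...] * X

# Two example time-frequency tiles: mostly direct, and purely ambient.
P_D = np.array([[3.0, 0.0]])
P_A = np.array([[1.0, 2.0]])
W_D, W_A = wiener_gains(P_D, P_A)

# Numerical check that the closed form minimizes the joint error for the first tile.
grid = np.linspace(0.0, 1.0, 1001)
W_best = grid[np.argmin(joint_mse(grid, 3.0, 1.0))]

# The same gain is applied to all five input channels.
X = np.ones((5, 1, 2), dtype=complex)
D_hat = apply_gain(W_D, X)
A_hat = apply_gain(W_A, X)
```

Using a single gain per time-frequency tile for all channels matches the text's aim of one filter that can be applied to every input channel.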

In the following, estimates are derived for P_D and P_A, which are required to compute W_D and W_A. The cross-correlation of the downmix is given by equation 10 which, given the downmix signal model (4), leads to (11).

Further assuming that the ambient components in the downmix have the same power in the left and right downmix channels, one can write equation 12.

Substituting equation 12 into the last line of equation 10 and taking equation 13 into account yields equations (14) and (15).
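
One way to obtain such estimates is sketched below. It is an assumption-laden illustration rather than a transcription of equations (14) and (15), which are not reproduced in this text. It assumes the model stated above (ambience uncorrelated between the downmix channels and with the direct sound, equal ambient power P_A/2 in both channels) and, additionally, that the direct components are fully coherent, so that |E{D_1 D_2*}|^2 = E{|D_1|^2} E{|D_2|^2}. Under these assumptions P_D = sqrt((P_X1 - P_X2)^2 + 4 |E{X_1 X_2*}|^2) and P_A = P_X1 + P_X2 - P_D. Expectations are replaced by plain averages, and the function name is hypothetical.

```python
import numpy as np

def estimate_powers(P_X1, P_X2, C12):
    """Estimate summed direct power P_D and summed ambient power P_A.

    P_X1, P_X2: short-term powers of the two downmix channels.
    C12: short-term cross-correlation E{X_1 * conj(X_2)} of the downmix.
    Assumes fully coherent direct parts and equal ambient power per channel.
    """
    c2 = np.abs(C12) ** 2
    P_D = np.sqrt((P_X1 - P_X2) ** 2 + 4.0 * c2)
    P_A = np.maximum(P_X1 + P_X2 - P_D, 0.0)  # clamp guards against rounding
    return P_D, P_A

def cgauss(rng, n, power):
    """Complex Gaussian noise with the given mean power."""
    return np.sqrt(power / 2.0) * (rng.standard_normal(n) + 1j * rng.standard_normal(n))

# Synthetic test tile: one direct source panned with gains 1.0 and 0.5,
# plus independent ambience of power 0.3 in each downmix channel.
rng = np.random.default_rng(1)
n = 200_000
d = cgauss(rng, n, 1.0)
X1 = d + cgauss(rng, n, 0.3)
X2 = 0.5 * d + cgauss(rng, n, 0.3)

P_X1 = np.mean(np.abs(X1) ** 2)
P_X2 = np.mean(np.abs(X2) ** 2)
C12 = np.mean(X1 * np.conj(X2))
P_D, P_A = estimate_powers(P_X1, P_X2, C12)
# True values for this synthetic example: P_D = 1.25, P_A = 0.6
```

In a running system the plain averages would typically be replaced by short-term estimates, for example recursive smoothing over frames; that choice is outside what this text specifies.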

As explained in the context of Fig. 4, the generation of the reference curves for a minimum correlation can be pictured by placing two or more different sound sources in a playback setup and by placing the head of a listener at a specific position in that setup. Completely independent signals are then emitted by the different loudspeakers. For a setup with two loudspeakers, the two channels would have to be completely uncorrelated, with a correlation equal to 0, if there were no cross-mixing products. However, such cross-mixing products do occur because of cross-coupling from the left side to the right side of the human auditory system, and further cross-coupling occurs because of room reverberation and the like. Therefore, the resulting reference curves, as illustrated in Fig. 4 or Fig. 9A-9D, are not always equal to 0 but take values that differ from 0, even though the reference signals assumed in this scenario are completely independent. It is important to understand, however, that these signals are not actually required. It is sufficient to assume full independence between the two or more signals when calculating the reference curve. In this context it should be noted that other reference curves can be calculated for other scenarios, for example by using or assuming signals that are not completely independent but have a certain, previously known dependence or degree of dependence on each other. When such a different reference curve is calculated, the interpretation or the derivation of the weighting factors will differ from that for a reference curve for which completely independent signals were assumed.
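
How a measured similarity might be compared with such a reference curve can be sketched as follows. Everything specific here is hypothetical: the reference values are placeholders rather than a curve computed from head-related or room responses, and the linear mapping from the distance above the reference to a weight in [0, 1] is just one plausible choice, since this text does not fix the mapping.

```python
import numpy as np

def band_similarity(X1, X2, eps=1e-12):
    """Magnitude of the normalized cross-correlation per sub-band.

    X1, X2: complex sub-band signals of shape (frames, bands);
    the expectation is approximated by an average over frames.
    """
    num = np.abs(np.mean(X1 * np.conj(X2), axis=0))
    den = np.sqrt(np.mean(np.abs(X1) ** 2, axis=0)
                  * np.mean(np.abs(X2) ** 2, axis=0)) + eps
    return num / den

def weights_from_reference(similarity, reference, eps=1e-12):
    """Hypothetical mapping: 0 at or below the reference curve, 1 at full similarity."""
    return np.clip((similarity - reference) / (1.0 - reference + eps), 0.0, 1.0)

rng = np.random.default_rng(2)
frames, bands = 5000, 3
noise = lambda: rng.standard_normal((frames, bands)) + 1j * rng.standard_normal((frames, bands))

X1 = noise()
X2 = noise()
X2[:, 0] = X1[:, 0]                      # band 0: identical channels (fully direct)
X2[:, 1] = X1[:, 1] + 0.5 * X2[:, 1]     # band 1: partially similar channels
                                         # band 2: independent channels

reference = np.array([0.6, 0.3, 0.1])    # placeholder reference-curve values
similarity = band_similarity(X1, X2)
weights = weights_from_reference(similarity, reference)
```

With this mapping, a sub-band whose similarity does not exceed the reference for independent signals is treated as containing no direct sound, which mirrors the role of the reference curve as a minimum-correlation baseline.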

Although some aspects have been described in the context of a device, it is clear that these aspects also represent a description of the corresponding method, where a block or device corresponds to a method step or a feature of a method step. Analogously, aspects described in the context of a method step also represent a description of a corresponding block, item or feature of a corresponding device.

The inventive decomposed signal can be stored on a digital storage medium or can be transmitted over a transmission medium, such as a wireless transmission medium or a wired transmission medium, for example the Internet.

Depending on certain implementation requirements, embodiments of the invention can be implemented in hardware or in software. The implementation can be performed using a digital storage medium, for example a floppy disk, DVD, CD, ROM, PROM, EPROM, EEPROM or flash memory, having electronically readable control signals stored thereon, which cooperate (or are capable of cooperating) with a programmable computer system such that the respective method is performed.

Some embodiments according to the invention comprise a non-transitory storage medium having electronically readable control signals which are capable of cooperating with a programmable computer system such that one of the methods described in this document is performed.

In general, embodiments of the present invention can be implemented as a computer program product with a program code, the program code being configured to perform one of the methods when the computer program product runs on a computer. The program code may, for example, be stored on a machine-readable medium.

Other embodiments comprise the computer program for performing one of the methods described herein, stored on a machine-readable medium.

In other words, an embodiment of the inventive method is therefore a computer program having a program code for performing one of the methods described herein when the computer program runs on a computer.

A further embodiment of the inventive methods is therefore a data carrier (or a digital storage medium, or a machine-readable medium) comprising, recorded thereon, the computer program for performing one of the methods described in this document.

A further embodiment of the inventive method is therefore a data stream or a sequence of signals representing the computer program for performing one of the methods described in this document. The data stream or the sequence of signals may, for example, be configured to be transferred via a data communication connection, for example via the Internet.

A further embodiment comprises a processing means, for example a computer or a programmable logic device, configured to perform one of the methods described in this document.

A further embodiment comprises a computer having installed thereon the computer program for performing one of the methods described in this document.

In some embodiments, a programmable logic device (e.g., a field programmable gate array) may be used to perform some or all of the functionalities of the methods described in this document. In some embodiments, a field programmable gate array may cooperate with a microprocessor in order to perform one of the methods described in this document. In general, the methods are preferably performed by any hardware device.

The above-described embodiments are merely illustrative of the principles of the present invention. It is understood that modifications and variations of the arrangements and details described in this document will be apparent to those skilled in the art. The invention is therefore intended to be limited only by the scope of the following claims, and not by the specific details presented by way of description and explanation of the embodiments herein.

1. A device for decomposing an input signal (10) having at least three input channels, comprising:
- a downmixer (12) for downmixing the input signal to obtain a downmixed signal, wherein the downmixer (12) is configured for downmixing such that the number of downmix channels of the downmixed signal (14) is at least 2 and smaller than the number of input channels;
- an analyzer (16) for analyzing the downmixed signal to derive an analysis result (18); and
- a signal processor (20) for processing the input signal (10) or a signal (24) derived from the input signal using the analysis result (18), wherein the signal processor (20) is configured to apply the analysis result to the input channels of the input signal or to channels of the signal derived from the input signal to obtain a decomposed signal (26), wherein the signal derived from the input signal differs from the downmixed signal.

2. The device according to claim 1, further comprising a time-frequency converter (32) for converting the input channels into a time sequence of channel frequency representations, each input channel frequency representation having a plurality of frequency sub-bands, or in which the downmixer (12) comprises a time-frequency converter for converting the downmixed signal,
- wherein the analyzer (16) is configured to generate an analysis result (18) for individual frequency sub-bands, and
- the signal processor (20) is configured to apply the individual analysis results to the corresponding frequency sub-bands of the input signal or of the signal derived from the input signal.

3. The device according to claim 1, wherein the analyzer (16) is configured to generate, as the analysis result, weighting factors (W(m, i)), and
- the signal processor (20) is configured to apply the weighting factors to the input signal or to the signal derived from the input signal by weighting with the weighting factors.

4. The device according to claim 1, wherein the downmixer is configured to add weighted or unweighted input channels in accordance with a downmix rule such that the at least two downmix channels differ from each other.

5. The device according to claim 1, wherein the downmixer (12) is configured to filter the input signal (10) using filters based on room impulse responses, filters based on binaural room impulse responses (BRIR) or filters based on HRTFs.

6. The device according to claim 1, wherein the signal processor (20) is configured to apply a Wiener filter to the input signal or to the signal derived from the input signal, and in which the analyzer (16) is configured to calculate the Wiener filter using expectation values derived from the downmix channels.

7. The device according to one of the preceding claims, further comprising a signal deriver (22) for deriving the signal from the input signal such that the signal derived from the input signal has a different number of channels compared with the downmixed signal or the input signal.

8. The device according to claim 1, wherein the analyzer (16) is configured to use a pre-stored frequency-dependent similarity curve indicating a frequency-dependent similarity between two signals that can be generated by previously known reference signals.

9. The device according to claim 1, wherein the analyzer is configured to use a pre-stored frequency-dependent similarity curve indicating a frequency-dependent similarity between two or more signals at a listening position, under the assumption that the signals have a known similarity characteristic and that the signals can be emitted by loudspeakers at known loudspeaker positions.

10. The device according to claim 1, wherein the analyzer is configured to calculate a signal-dependent, frequency-dependent similarity curve using a frequency-dependent short-term power of the input channels.

11. The device according to claim 8, in which the analyzer (16) is configured to calculate a similarity of the downmix channels in a frequency sub-band (80), to compare the similarity result with the similarity indicated by the reference curve (82, 83), and to generate the weighting factor based on the result of the comparison as the analysis result, or
- to calculate a distance between the similarity result and the similarity indicated by the reference curve for the same frequency sub-band, and further to calculate a weighting factor based on the distance as the analysis result.

12. The device according to claim 1, wherein the analyzer (16) is configured to analyze the downmix channels in frequency sub-bands determined by the frequency resolution of the human ear.

13. The device according to claim 1, wherein the analyzer (16) is configured to analyze the downmixed signal to generate an analysis result that allows a decomposition into direct and ambient parts, and
- the signal processor (20) is configured to extract the direct part or the ambient part using the analysis result.

14. A method of decomposing an input signal (10) having at least three input channels, comprising the steps of:
- downmixing (12) the input signal to obtain a downmixed signal, such that the number of downmix channels of the downmixed signal (14) is at least 2 and smaller than the number of input channels;
- analyzing (16) the downmixed signal to derive an analysis result (18); and
- processing (20) the input signal (10) or a signal (24) derived from the input signal using the analysis result (18), wherein the analysis result is applied to the input channels of the input signal or to channels of the signal derived from the input signal to obtain a decomposed signal (26), wherein the signal derived from the input signal differs from the downmixed signal.

15. A machine-readable medium having recorded thereon a computer program for performing the method according to claim 14 when the computer program is executed by a computer or processor.



 

Same patents:

FIELD: physics, acoustics.

SUBSTANCE: invention relates to audio processing, particularly to decomposition of audio signals into different components, for example, differently detectable components. An apparatus for decomposing a signal having at least three channels comprises an analyser (16) for analysing a similarity between two channels of an analysed signal related to the signal having at least two analysed channels, wherein the analyser is configured to use a pre-calculated frequency-dependent similarity curve as a reference curve to determine the analysis result. The signal processor (20) processes the analysed signal or a signal derived from the analysed signal or a signal, from which the analysed signal is derived using the analysis result to obtain a decomposed signal.

EFFECT: decomposing a signal using a pre-calculated frequency-dependent similarity curve as a reference curve.

15 cl, 16 dwg

FIELD: physics.

SUBSTANCE: invention discloses improved tools for authoring and rendering audio reproduction data. Some of these tools allow the data to be generalised for a wide range of playback environments. Audio reproduction data can be authored by creating metadata for audio objects. The metadata can be created with reference to loudspeaker zones. Audio reproduction data can be rendered in accordance with the loudspeaker layout of a particular playback environment.

EFFECT: simplified computer processing of 3D sound.

42 cl, 47 dwg

FIELD: physics, acoustics.

SUBSTANCE: invention relates to encoding and decoding an audio signal in which audio samples for each object audio signal may be localised in any required position. In the method and device for encoding an audio signal and in the method and device for decoding an audio signal, audio signals may be encoded or decoded such that audio samples may be localised in any required position for each object audio signal. The method of decoding an audio signal includes extracting from the audio signal a downmix signal and object-oriented additional information; generating channel-oriented additional information based on the object-oriented additional information and control information for reproducing the downmix signal; processing the downmix signal using a decorrelated channel signal; and generating a multichannel audio signal using the processed downmix signal and the channel-oriented additional information.

EFFECT: high accuracy of reproducing object audio signals.

7 cl, 20 dwg

FIELD: physics, acoustics.

SUBSTANCE: invention relates to means of encoding audio signals and related spatial information in a format which is independent of the playback scheme. A first set of audio signals is assigned to a first group. The first group is encoded as a set of mono audio tracks with associated metadata describing the direction of the signal source of each track relative to the recording position and the initial playback time thereof. A second set of audio signals is assigned to a second group. The second group is encoded as at least one set of ambisonic tracks of a given order and a mixture of orders. Two groups of tracks comprising the first and second sets of audio signals are generated.

EFFECT: providing a technique capable of presenting spatial audio content independent of the exhibition method.

26 cl, 11 dwg

FIELD: physics, acoustics.

SUBSTANCE: invention relates to a surround sound system. A multi-channel spatial signal comprising at least one surround channel is received. Ultrasound is emitted towards a surface to reach a listening position via reflection off said surface. The ultrasound signal may specifically reach the listening position from the side of, above or behind a nominal listener. A first drive unit generates a drive signal for the directional ultrasound transducer from the surround channel. The use of an ultrasound transducer for providing the surround sound signal provides an improved spatial experience while allowing the speaker to be located, for example, in front of the user. An ultrasound beam is much narrower and better defined than conventional audio beams and can therefore be better directed to provide the desired reflections. In some scenarios, the ultrasound transducer may be supplemented by an audio range loudspeaker.

EFFECT: high quality of reproducing audio and high efficiency of the surround sound system.

12 cl, 11 dwg

FIELD: physics, acoustics.

SUBSTANCE: binaural rendering of a multi-channel audio signal into a binaural output signal is described. The multi-channel audio signal includes a stereo downmix signal (18) into which a plurality of audio signals are downmixed; and side information includes downmix information (DMG, DCLD), indicating for each audio signal to what degree the corresponding audio signal was mixed into the first channel and second channel of the stereo downmix signal (18), respectively, as well as object level information of the plurality of audio signals and inter-object cross correlation information, describing similarity between pairs of audio signals of the plurality of audio signals. Based on a first rendering prescription, a preliminary binaural signal (54) is computed from the first and second channels of the stereo downmix signal (18). A decorrelated signal (Xdn,k) is generated as a perceptual equivalent to a mono downmix (58) of the first and second channels of the stereo downmix signal (18) being, however, decorrelated to the mono downmix (58).

EFFECT: improved binaural rendering while eliminating restrictions with respect to free generation of a downmix signal from original audio signals.

11 cl, 6 dwg, 3 tbl

FIELD: physics, acoustics.

SUBSTANCE: invention relates to processing signals in an audio frequency band. The apparatus for generating at least one output audio signal representing a superposition of two different audio objects includes a processor for processing an input audio signal to provide an object representation of the input audio signal, where that object representation can be generated by parametrically guided approximation of original objects using an object downmix signal. An object manipulator individually manipulates objects using audio object based metadata relating to the individual audio objects to obtain manipulated audio objects. The manipulated audio objects are mixed using an object mixer for finally obtaining an output audio signal having one or multi-channel signals depending on a specific rendering setup.

EFFECT: providing efficient audio signal transmission rate.

14 cl, 17 dwg

FIELD: radio engineering, communication.

SUBSTANCE: described is a device for generating a binaural signal based on a multi-channel signal representing a plurality of channels and intended for reproduction by a speaker system, wherein each virtual sound source position is associated to each channel. The device includes a correlation reducer for differently converting, and thereby reducing correlation between, at least one of a left and a right channel of the plurality of channels, a front and a rear channel of the plurality of channels, and a centre and a non-centre channel of the plurality of channels, in order to obtain an inter-similarity reduced combination of channels; a plurality of directional filters, a first mixer for mixing output signals of the directional filters modelling the acoustic transmission to the first ear canal of the listener, and a second mixer for mixing output signals of the directional filters modelling the acoustic transmission to the second ear canal of the listener. Also disclosed is an approach where centre level is reduced to form a downmix signal, which is further transmitted to a processor for constructing an acoustic space. Another approach involves generating a set of inter-similarity reduced transfer functions modelling the ear canal of the person.

EFFECT: providing an algorithm for generating a binaural signal which provides stable and natural sound of a record in headphones.

33 cl, 14 dwg

FIELD: information technology.

SUBSTANCE: method comprises estimating a first wave representation comprising a first wave direction measure characterising the direction of a first wave and a first wave field measure being related to the magnitude of the first wave for the first spatial audio stream, having a first audio representation comprising a measure for pressure or magnitude of a first audio signal and a first direction of arrival of sound; estimating a second wave representation comprising a second wave direction characterising the direction of the second wave and a second wave field measure being related to the magnitude of the second wave for the second spatial audio stream, having a second audio representation comprising a measure for pressure or magnitude of a second audio signal and a second direction of arrival of sound; processing the first wave representation and the second wave representation to obtain a merged wave representation comprising a merged wave field measure, a merged direction of arrival measure and a merged diffuseness parameter; processing the first audio representation and the second audio representation to obtain a merged audio representation, and forming a merged audio stream.

EFFECT: high quality of a merged audio stream.

15 cl, 7 dwg

FIELD: physics.

SUBSTANCE: apparatus (100) for generating a multichannel audio signal (142) based on an input audio signal (102) comprises a main signal upmixing means (110), a section (segment) selector (120), a section signal upmixing means (130) and a combiner (140). The main signal upmixing means (110) is configured to provide a main multichannel audio signal (112) based on the input audio signal (102). The section selector (120) is configured to select or not select a section of the input audio signal (102) based on analysis of the input audio signal (102). The selected section of the input audio signal (102), a processed selected section of the input audio signal (102) or a reference signal associated with the selected section of the input audio signal (102) is provided as section signal (122). The section signal upmixing means (130) is configured to provide a section upmix signal (132) based on the section signal (122), and the combiner (140) is configured to overlay the main multichannel audio signal (112) and the section upmix signal (132) to obtain the multichannel audio signal (142).

EFFECT: improved flexibility and sound quality.

12 cl, 10 dwg

FIELD: radio engineering, communication.

SUBSTANCE: invention relates to complex transformation channel coding devices with broadband frequency coding. The coded data of multichannel sound are received in a bit flow, and the coded data of multichannel sound contain the coding data with channel expansion and coding data with frequency expansion, and coding data with channel expansion contain the combined channel for multiple sound channels and the set of parameters for representation of certain channels of this set of sound channels as modified versions of the combined channel. On the basis of information in the bit flow it is determined whether said set of parameters contains the set of parameters containing a normalised correlation matrix, or the set of parameters containing the complex parameter representing the ratio containing the imaginary component and the real component for cross-correlation between two of said set of sound channels. On the basis of this determination said set of parameters is decoded. The set of sound channels is recovered using the coding data with channel expansion and coding data with frequency expansion.

EFFECT: improvement of quality of multichannel sound.

20 cl, 42 dwg, 1 tbl

FIELD: radio engineering, communication.

SUBSTANCE: invention relates to means of stereo encoding and decoding using complex prediction in the frequency domain. In one of the versions of the invention, a decoding method for obtaining an output stereo signal from an input stereo signal encoded by complex prediction coding and comprising first frequency-domain representations of two input channels comprises upmixing steps of: computing a second frequency-domain representation of a first input channel; and computing an output channel based on the first and second frequency-domain representations of the first input channel, the first frequency-domain representation of the second input channel and a complex prediction coefficient. The method includes performing frequency-domain modifications selectively before or after upmixing.

EFFECT: providing high audio quality while reducing computational costs.

15 cl, 19 dwg

FIELD: physics, acoustics.

SUBSTANCE: invention relates to audio signal estimation means. The apparatus includes a unit for determining a codebook from a plurality of codebooks as an identified codebook. In the apparatus, an audio signal is encoded using the identified codebook and an estimation unit, which is configured to obtain a level value associated with the identified codebook as the obtained level value and for estimating the level of the audio signal using the obtained level value.

EFFECT: high efficiency of encoding an audio signal.

19 cl, 11 dwg

FIELD: radio engineering, communication.

SUBSTANCE: invention relates to bandwidth expansion devices. An excitation signal based on an acoustic signal is generated; with that, the acoustic signal includes a variety of frequency components. A feature vector is distinguished out of the acoustic signal; with that, the feature vector includes at least one feature of a component in a frequency domain and at least one feature of a component in a time domain. At least one parameter of the spectrum shape is determined based on the feature vector; with that, at least one parameter of the spectrum shape corresponds to a sub-range signal containing frequency components that belong to an additional variety of frequency components. A signal of the sub-range is generated by the filtration of an excitation signal by means of a filter bank and weighing of a filtered excitation signal using at least one parameter of the spectrum shape.

EFFECT: technical result consists in the improvement of perception of an expanded acoustic signal.

21 cl, 10 dwg

FIELD: physics, acoustics.

SUBSTANCE: group of inventions relates to expansion of a compressed audio signal which consists of one or more compressed audio channels into an expanded audio signal. An expansion unit is set up to use current variable expansion parameters to expand a compressed audio signal in order to obtain an expanded audio signal, wherein current variable expansion parameters comprise current variables of smoothed phase values. A parameter determiner is set up to obtain one or more current smoothed expansion parameters for use in the expansion unit based on input information on sampled expansion parameters. The parameter determiner is set up to combine a scaled version of the previous smoothed phase value and a scaled version of input phase information, using a phase change limiting algorithm to determine the current smoothed phase value based on the previous smoothed value and input phase information.

EFFECT: high quality of the expanded audio signal.

13 cl, 7 dwg

FIELD: physics, computer engineering.

SUBSTANCE: hardware unit for expanding a compressed audio signal into an expanded audio signal, comprising one or more expanded audio channels, including a parameter processing unit, configured to apply expansion parameters for expanding the compressed audio signal and obtain an expanded audio signal. The parameter processing unit is configured to apply phase shift to the compressed audio signal and obtain a phase-shifted version of the compressed audio signal when storing a decorrelated phase-invariable signal. The parameter processing unit is also configured to sum the phase-shifted version of the compressed audio signal and the decorrelated signal and obtain an expanded audio signal.

EFFECT: expanding a compressed audio signal into an expanded audio signal.

16 cl, 4 dwg

FIELD: radio engineering, communication.

SUBSTANCE: analogue speech signal is sampled with a standard frequency of 8000 Hz. The sampled speech signal is transmitted to the input of a bandpass filter with cut-off bands of 0.3 kHz and 3.4 kHz. Discrete Fourier transform is performed over the filtered signal to obtain expansion coefficients. Further, the expansion coefficients are rearranged in reverse order. Inverse discrete Fourier transform is then performed, after which the spectrum of the speech signal becomes inverted with respect to the initial spectrum. The disclosed transformation is characterised by that the signal becomes inverted on time.

EFFECT: faster transformation.

6 dwg

FIELD: physics.

SUBSTANCE: determination is ensured by making the conclusion on psychophysiological conditions of a person by variation in time of the ratio of absolute magnitude of arbitrary jitter of speech signal main tone period, duration of pauses in speech signal, duration of key depression and intervals between key depressions, duration of depression and intervals between depressions of left mouse key, mouse motion signal and image oscillation period exceeding the threshold to their total number.

EFFECT: higher accuracy of determination.

8 cl, 14 dwg

FIELD: information technology.

SUBSTANCE: apparatus for encoding a multichannel audio signal has a multichannel audio signal receiver, having a first and a second audio signal from a first and a second microphone, a time difference module for determining time difference between the first and second audio signals by combining successive observations of cross-correlations between the first and second audio signals, wherein the cross-correlations are normalised to derive state probabilities accumulated using a Viterbi algorithm to achieve time difference with built-in hysteresis, and the Viterbi algorithm calculates the state probability for each given state in form of a combined contribution of all routes included in that state, a delay module for multichannel audio signal compensation by delaying the first or second audio signal in response to the time difference signal, a monophonic module for generating a monophonic signal by combining multichannel audio signal compensation channels, and a monophonic signal encoder.

EFFECT: high quality and efficiency of encoding.

10 cl, 5 dwg
