RussianPatents.com

Apparatus for generating output spatial multichannel audio signal. RU patent 2504847.

IPC classes for Russian patent RU 2504847:

G10L19/008 - Speech or audio signal analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; coding or decoding of speech or audio signals, e.g. for compression or expansion, source-filter models or psychoacoustic analysis
Other patents in the same IPC classes:
Method of creating codebook and search therein during vector quantisation of data / 2504027
The method can be used to reduce consumption of computational resources and the required size of storage devices when creating codebooks and executing reference vector search algorithms therein, including when performing low-rate speech signal coding. The technical result of the disclosed method is a reduction in the required size of storage devices and in the consumption of computational resources when performing a search in a codebook during vector quantisation. This is achieved by constructing a special codebook structure based on neural networks using training algorithms with adjustment. The search is performed as step-by-step hierarchical vector quantisation, the resultant vector being the sum of the code vectors found at each step.
Parametric stereophonic upmix apparatus, parametric stereophonic decoder, parametric stereophonic downmix apparatus, parametric stereophonic encoder / 2497204
A parametric stereophonic upmix apparatus (300, 400) generates a left signal (206) and a right signal (207) from a monophonic downmix signal (204) based on spatial parameters (205). Said parametric stereophonic upmix apparatus is characterised in that it comprises a means (310) for predicting a difference signal (311), comprising a difference between the left signal (206) and the right signal (207), based on the monophonic downmix signal (204) scaled with a prediction coefficient (321). Said prediction coefficient is derived from the spatial parameters (205). Said parametric stereophonic upmix apparatus (300, 400) further comprises an arithmetic means (330) for deriving the left signal (206) and the right signal (207) based on a sum and a difference of the monophonic downmix signal (204) and said difference signal (311).
Sound encoding device, sound decoding device, sound encoding and decoding device and teleconferencing system / 2495503
The sound encoding device includes: a downmixing signal generating unit (410), which generates in the time domain a first downmixing signal, which is one of a 1-channel sound signal and a 2-channel sound signal, from an input multi-channel sound signal; a downmixing signal encoding unit (404), which encodes the first downmixing signal; a first time-frequency (t-f) conversion unit (401), which converts the input multi-channel sound signal into a frequency-domain multi-channel sound signal; and a spatial information computing unit (409), which generates spatial information for generating a multi-channel sound signal from the downmixing signal.
Lossless multi-channel audio codec using adaptive segmentation with random access point (RAP) and multiple prediction parameter set (MPPS) capability / 2495502
The invention relates to a lossless multi-channel audio codec which uses adaptive segmentation with random access point (RAP) and multiple prediction parameter set (MPPS) capability. The lossless audio codec encodes/decodes a lossless variable bit rate (VBR) bit stream with random access point (RAP) capability to initiate lossless decoding at a specified segment within a frame and/or multiple prediction parameter set (MPPS) capability partitioned to mitigate transient effects. This is accomplished with an adaptive segmentation technique that fixes segment start points based on constraints imposed by the existence of a desired RAP and/or detected transient in the frame, and selects an optimum segment duration in each frame to reduce the encoded frame payload subject to an encoded segment payload constraint. RAP and MPPS are particularly applicable for improving overall performance at longer frame durations.
Apparatus, method and computer programme for providing set of spatial indicators based on microphone signal and apparatus for providing double-channel audio signal and set of spatial indicators / 2493617
Apparatus for providing a set of spatial indicators associated with an upmix audio signal, having more than two channels, based on a double-channel microphone signal, comprises a signal analyser and an additional spatial information generator. The signal analyser is configured to receive component energy information and direction information based on the double-channel microphone signal such that the component energy information describes an estimate of energy of the direct sound component of the double-channel microphone signal, and such that the direction information describes an estimate of the direction from which the direct sound component of the double-channel microphone signal arrives. The additional spatial information generator is configured to compare component energy information and direction information with spatial indicator information which describes the set of spatial indicators associated with an upmix audio signal, having more than two channels.
Efficient use of stepwise transmitted information in audio encoding and decoding / 2491657
An audio signal can be derived using correlation information indicating a correlation between first and second input audio signals, when signal characterisation information, indicating at least a first or a second different characteristic of the input audio signals, is additionally considered. Phase information indicating a phase relation between the first and the second input audio signals is derived when the input audio signals have the first characteristic. The phase information and a correlation measure are included in the encoded representation when the input audio signals have the first characteristic, and only the correlation information is included in the encoded representation when the input audio signals have the second characteristic.
Audio signal decoder and method of controlling audio signal decoder balance / 2491656
To support stereo perception, fluctuation in the localisation of a decoded signal is suppressed. A selection unit (220) selects the balance parameters input from a gain decoding unit (210), or, if there is no balance parameter input from the gain decoding unit (210), selects the balance parameters input from a gain computation unit (223), and outputs the selected balance parameters to a multiplier unit (221). The multiplier unit (221) multiplies the gain input from the selection unit (220) by the decoded monophonic signal input from a monophonic decoding unit (202), to perform balance control.
Coding device, decoding device and method / 2488897
The speech coding device comprises a first-layer coding section, which performs coding processing on the input speech signal; a first-layer decoding section, which performs decoding processing using the first-layer coded data; a first-layer error transform coefficient calculation section, which converts the first-layer error signal into the frequency domain to calculate first-layer error transform coefficients; and a second-layer coding section, which performs coding processing on the first-layer error transform coefficients. The second-layer coding section comprises: a setting section; a selection section; a joined-band configuration section, which joins a band selected from the low-frequency band and a fixed band from the high-frequency band to configure the joined band; and a coded data generation section, which codes the first-layer error transform coefficients included in the joined band to generate second-layer coded data.
Mixing of incoming information flows and generation of outgoing information flow / 2488896
The device (500) is used for mixing a plurality of input data streams (510), wherein each input data stream (510) comprises a frame (540) of audio data in the spectral domain, the frame (540) of the input data stream (510) comprising spectral information for a plurality of spectral components. The device comprises a data processing unit (520) adapted to compare the frames (540) of the plurality of input data streams (510). The data processing unit (520) is further adapted to determine, based on the comparison, for a spectral component of an output frame (550) of an output data stream (530), exactly one input data stream (510) from the plurality of input data streams (510). The data processing unit (520) is further adapted to generate the output data stream (530) by copying at least part of the information of the corresponding spectral component of the frame of the determined data stream (510), in order to describe the spectral component of the output frame (550) of the output data stream (530).
Quantiser, encoder and methods thereof / 2486609
A quantising apparatus quantises a value related to transformation coefficients when performing a principal component analysis transformation of a first vector signal and a second vector signal, the apparatus comprising: a power and correlation calculating section that calculates the power of the first vector signal, the power of the second vector signal and a correlation value between the first vector signal and the second vector signal; an intermediate value calculating section that calculates, as an intermediate value, the result of a difference computation using the power of the first vector signal and the power of the second vector signal; a codebook that holds a plurality of numbered pairs of a first coefficient and a second coefficient, which are related to the transformation coefficients; and a quantising section that calculates, as a reference value, the sum of a first multiplication result acquired by multiplying the first coefficient by the correlation value and a second multiplication result acquired by multiplying the second coefficient by the intermediate value, and, based on the magnitude of the reference value, selects the number as a code.
Method for compaction and decompaction of speech messages / 2244963
The method comprises preliminarily forming, at the receiving and transmitting sides, R matrices of allowed vectors, each matrix having dimension m2 x m1 and consisting of unit and zero elements; then forming an initial matrix of N x N elements from the one-dimensional analogue speech signal; converting the resulting matrix to digital form; forming rectangular matrices with dimensions N x m and m x N, being the digital representation of the initial matrix, from elements of the rows of allowed vectors; transmitting the elements of these rectangular matrices through a digital communication channel; correcting errors at the receiving side by testing whether groups of elements of the received rectangular matrices match the row elements of the preliminarily formed matrices of allowed vectors; and then performing the inverse operations to decompact the speech messages. The method is especially suitable for telephone calls over digital communication systems at rates of 6-16 kbit/s.
Method and system for abolishing quantizer saturation during communication with data transfer in speech signal band / 2249860
In the method and system, to decrease the prediction error, an averaging device for calculating the transfer coefficient is used, together with a pulse detector, a signal classifier, a decision-taking means and a transfer coefficient compensation device, wherein the compensated transfer coefficient of the quantizer sample is determined during coding/decoding of data transferred in the speech signal band using a vector linear non-adaptive prediction-type algorithm.
Method and device for reproducing speech signals and method for transferring said signals / 2255380
During encoding, the speech signals are divided into frames and the divided signals are encoded on a frame basis to output encoding parameters such as linear spectral pair parameters, pitch, voiced/unvoiced decisions or spectral amplitude. During calculation of the modified encoding parameters, the encoding parameters are interpolated to calculate modified encoding parameters associated with time instants based on the frames. During decoding, harmonic waves and noise are synthesised on the basis of the modified encoding parameters, and the synthesised speech signals are output.
Method for simulating auditory patient perception of acoustic signal after cochlear implantation / 2277375
The method involves applying analogue-to-digital transformation to an input signal expressed as a word, dividing the transformed signal spectrum into odd and even frequency bands, summing the odd bands, carrying out digital-to-analogue transformation of the resulting summed signal, and training its perception by preliminary familiarisation with the word presented for listening and subsequent testing. The spectrum division is based on the tonotopic law of frequency distribution along the cochlear axis. The odd-numbered frequency bands are arranged at equal distances along the basilar membrane length in accordance with the normal tonotopic law of frequency distribution along the cochlear axis. At least three odd spectrum bands are summed. Training is carried out by multiple repetition of the word presented for listening until unambiguous correlation with the known word meaning given during the preliminary familiarisation takes place. The same words are to be presented in testing and training.
Method and device for encoding an audio signal with usage of harmonics extraction / 2289858
In accordance with the audio signal encoding method, harmonic components are extracted using information resulting from the fast Fourier transform, which is obtained by applying psychoacoustic model 2 to the received pulse-code modulated audio data. The extracted harmonic components are then removed from the received pulse-code modulated audio data. After that, the audio data from which the extracted harmonic components have been removed are subjected to the modified discrete cosine transform and quantisation.
Method and device for transmission of speech activity in distributed voice recognition system / 2291499
The distributed voice recognition system has a local voice recognition (VR) mechanism in the user unit and a VR server mechanism in the server. The local VR mechanism has a feature selection (FS) module, which selects features from voice signals. A voice activity detector (VAD) module detects voice activity in the voice signal. The indication of voice activity is transmitted from the user unit to the server ahead of the features.
Alternating frame length encoding optimized for precision / 2305870
According to the invention, the polyphonic signals are used to create a main signal, typically a mono signal, and a side signal. A set of encoding schemes for the side signal (xside) is provided, each encoding scheme being characterised by a set of sub-frames of varying length, the total length of the sub-frames corresponding to the encoding frame length of the scheme. The encoding scheme for the side signal (xside) is selected on the basis of the current content of the polyphonic signals, and a side residual signal is created as the difference between the side signal and the main signal scaled by a balance factor, which is selected so as to minimise the side residual signal. The optimised side residual signal and the balance factor are encoded and used as encoding parameters representing the side signal.
Method and device for efficient concealment of erased frames in linear-prediction-based speech coders / 2325707
The invention relates to a method and device for improving the concealment of frames of a coded sound signal that were erased during transfer from coder to decoder, and for accelerating recovery in the decoder after non-erased frames of the coded sound signal have been received. Concealment/recovery parameters are determined in the coder and transferred to the decoder, where concealment of the erased frames and recovery are performed in accordance with these parameters. The concealment/recovery parameters may be selected from the group consisting of: a signal classification parameter, an energy information parameter and a phase information parameter. Determination of the concealment/recovery parameters includes classifying successive frames of the coded sound signal as an unvoiced frame, an unvoiced transition, a voiced transition, a voiced frame or an onset frame, the classification being based on at least part of the following parameters: a normalized correlation parameter, a spectral tilt parameter, a signal-to-noise ratio parameter, a pitch stability parameter, a relative frame energy parameter and a zero-crossing parameter.
Device and method for processing signal containing sequence of discrete values / 2325708
Said utility invention relates to signal processing in the form of successive values, e.g., audio signal samples or video signal samples, which, in particular, are especially suitable for lossless coding applications. During processing of a signal containing a sequence of discrete values, having the first frequency band with high energy signal and the second frequency band with low energy signal, the sequence of discrete values is manipulated initially (202) to obtain a sequence of manipulated values so that at least one of the manipulated values would be different from an integer. After that, the sequence of manipulated values is rounded (204) to obtain a sequence of rounded manipulated values. Rounding is performed in order to create a generated rounding error spectrum so that the rounding error with the spectrum created would have higher energy in the first frequency band as compared to the second frequency band.
Improved error concealment in the frequency domain / 2328775
The essence of the invention lies in the concealment of erroneous coding coefficients, using correlation of the coding coefficients in time as well as in frequency. The concealment technique can be applied to any type of information, such as audio data, video data and image data, which is compressed into coding coefficients and transmitted under unfavourable channel conditions. Error concealment is achieved using redundancy of the initial data signal in time as well as in frequency. This provides the possibility of using redundancy between frames (inter-frame), as well as within frames (intra-frame). The use of coding coefficients from the same frame that contains the erroneous coding coefficient is sometimes called intra-frame coefficient correlation, and it is a more special case of general frequency correlation.

FIELD: physics.

SUBSTANCE: the apparatus (100) generates an output spatial multichannel audio signal based on an input audio signal and an input parameter. The apparatus (100) includes a decomposer (110) for decomposing the input audio signal based on the input parameter to obtain a first signal component and a second signal component, different from each other. The apparatus (100) also includes a renderer (120) for rendering the first signal component to obtain a first rendered signal with a first semantic property, and for rendering the second signal component to obtain a second rendered signal with a second semantic property different from the first semantic property. The apparatus (100) includes a processor (130) for processing the first and second rendered signals to obtain the output spatial multichannel audio signal.

EFFECT: providing high quality of perception when processing signals which create a background.

12 cl, 8 dwg

 

The present invention relates to the field of audio processing, in particular to the processing of spatial properties of audio.

Audio processing and/or coding has advanced in many ways, and more and more spatial audio applications are being created. In many applications, audio processing is used for decorrelating or rendering signals. Such applications may, for example, carry out mono-to-stereo upmix, mono/stereo-to-multichannel upmix, artificial reverberation, stereo widening, or user-interactive mixing/rendering.

For some classes of signals, e.g. noise-like signals such as applause-like signals, conventional methods and systems suffer either from unsatisfactory perceptual quality or, if an object-oriented approach is used, from high computational complexity due to the large number of acoustic events that must be modelled or processed. Other examples of problematic audio material are generally ambience material, such as the noise emitted by a flock of birds, a sea shore, galloping horses, a division of marching soldiers, etc.

Conventional approaches use, for example, parametric stereo or MPEG Surround coding (MPEG = Moving Picture Experts Group). Figure 6 illustrates a typical application of a decorrelator for converting a mono signal into a stereo signal. Figure 6 depicts a mono input signal provided to a decorrelator 610, which provides a decorrelated input signal at its output. The original input signal is fed to a static mixing matrix 620 together with the decorrelated signal. Depending on the mixing control parameters 630, a stereo output signal is formed. The decorrelator 610 generates a decorrelated signal D arriving at the mixing matrix 620 together with the dry mono signal M. Inside the mixing matrix 620, the stereo channels L (L = left stereo channel) and R (R = right stereo channel) are formed in accordance with a mixing matrix H. The coefficients of the matrix H can be fixed, signal-dependent, or controlled by the user.

In addition, the matrix can be controlled by side information transmitted along with the downmix signal, containing a parametric description of how to mix the signals to create the desired multichannel output signal. This side information is usually generated by an encoder prior to the upmix process.

This is usually done in parametric spatial audio coding as, for example, in Parametric Stereo, see J. Breebaart, S. van de Par, A. Kohlrausch, E. Schuijers, "High-Quality Parametric Spatial Audio Coding at Low Bitrates" in AES 116th Convention, Berlin, Preprint 6072, May 2004, and in MPEG Surround, cf. J. Herre, K. Kjörling, J. Breebaart, et al., "MPEG Surround - The ISO/MPEG Standard for Efficient and Compatible Multi-Channel Audio Coding" in Proceedings of the 122nd AES Convention, Vienna, Austria, May 2007. The typical structure of a parametric stereo decoder is shown in Figure 7. In this example, the decorrelation process is performed in the transform domain: an analysis filter bank 710 converts the input mono signal into the transform domain, for example, into a representation as a number of frequency bands in the frequency domain.

In the frequency domain, the decorrelator 720 generates the corresponding decorrelated signal, which is mixed in the mixing matrix 730. The mixing matrix 730 is controlled by parameters provided by the parameter modification unit 740, which in turn receives the spatial input parameters and combines them with the parameters from the parameter control stage 750. In the example shown in Fig. 7, the spatial parameters can be modified by a user or by additional tools, such as post-processing for stereo rendering/presentation. In this case, the mixing parameters can be combined with the parameters of the stereo filters to form the input parameters for the mixing matrix 730. The measurement of the parameters can be carried out by the parameter modification unit 740. The output of the mixing matrix 730 is connected to a synthesis filter bank 760, which generates the stereo output signal.

As described above, the output L/R of the mixing matrix H can be calculated from the input mono signal M and the decorrelated signal D, for example, in accordance with the expression

    [L]   [h11  h12]   [M]
    [R] = [h21  h22] * [D] ,

i.e. L = h11*M + h12*D and R = h21*M + h22*D.

The amount of decorrelated sound in the output of the mixing matrix can be controlled on the basis of transmitted parameters, such as the ICC (ICC = inter-channel correlation) and/or mixed or user-defined parameters.
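The relation between the ICC parameter and the mixing matrix can be illustrated with a small numerical sketch. This is not the patented apparatus: the white-noise decorrelator and the angle parameterisation cos(2a) = ICC are illustrative assumptions standing in for a real all-pass decorrelator and a standardised matrix derivation.

```python
import numpy as np

def upmix_mono_to_stereo(m, icc, rng=None):
    """Upmix mono signal m to L/R via a decorrelated signal D and a 2x2
    mixing matrix H chosen so the outputs have inter-channel correlation
    close to the target icc. Noise stands in for a real decorrelator."""
    if rng is None:
        rng = np.random.default_rng(0)
    # Hypothetical decorrelator: noise scaled to the mono signal's power.
    d = rng.standard_normal(len(m)) * np.sqrt(np.mean(m**2))
    # With M and D uncorrelated and of equal power, mixing with angle a
    # gives corr(L, R) = cos(2a), so pick a from the target ICC.
    a = 0.5 * np.arccos(np.clip(icc, -1.0, 1.0))
    H = np.array([[np.cos(a),  np.sin(a)],
                  [np.cos(a), -np.sin(a)]])
    L, R = H @ np.vstack([m, d])   # [L; R] = H [M; D]
    return L, R
```

Since L = cos(a)M + sin(a)D and R = cos(a)M - sin(a)D, the cross-term gives E[LR] proportional to cos(2a), which is exactly the controllable ICC.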

Another traditional approach is based on temporal permutation. A dedicated method for decorrelating signals such as applause-like signals can be found, for example, in Gerard Hotho, Steven van de Par, Jeroen Breebaart, "Multichannel Coding of Applause Signals," in EURASIP Journal on Advances in Signal Processing, Vol. 1, Art. 10, 2008. Here, a monaural audio signal is segmented into overlapping time segments, which are temporally permuted pseudo-randomly within a super-block to form mutually uncorrelated output channels. The permutations are mutually independent for the n output channels.
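The segment-permutation idea can be sketched as follows. This is a simplified illustration of the Hotho et al. approach: real implementations use overlapping, windowed segments with overlap-add, which is omitted here, so hard segment boundaries would click in practice.

```python
import numpy as np

def permute_decorrelate(x, n_channels=2, seg_len=1024, block_segs=8, seed=0):
    """Cut mono signal x into segments and, within each super-block of
    block_segs segments, shuffle the segment order independently per
    output channel (simplified: no overlap-add windowing)."""
    rng = np.random.default_rng(seed)
    n_segs = len(x) // seg_len
    segs = x[: n_segs * seg_len].reshape(n_segs, seg_len)
    outputs = []
    for _ in range(n_channels):
        out = segs.copy()
        for start in range(0, n_segs, block_segs):
            block = out[start : start + block_segs]
            rng.shuffle(block)      # shuffles segment rows in place
        outputs.append(out.reshape(-1))
    return outputs
```

Because each output channel is only a reordering of the same segments, signal energy is preserved exactly, which also explains the repetitive quality criticised later in the text: every segment reappears unmodified in every channel.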

Another approach is the alternating switching between the original signal and delayed copies of it in order to obtain a decorrelated signal, see German patent application 102007018032.4-55. Some well-known object-oriented systems, see, for example, Wagner, Andreas; Walther, Andreas; Melchior, Frank; Strauß, Michael, "Generation of Highly Immersive Atmospheres for Wave Field Synthesis Reproduction" at 116th International AES Convention, Berlin, 2004, describe how to create immersive effects from many objects, such as a single clap, using wave field synthesis.

Another approach is so-called directional audio coding (DirAC), which is a method of spatial sound rendering applicable to various audio playback systems, see Pulkki, Ville, "Spatial Sound Reproduction with Directional Audio Coding," in J. Audio Eng. Soc., Vol. 55, No. 6, 2007. On the analysis side, the diffuseness and the direction of arrival of the sound are measured at a single point, as functions of time and frequency. On the synthesis side, the microphone signals are first divided into a diffuse part and a non-diffuse part, which are then reproduced using different methods.
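The diffuse/non-diffuse split at the heart of DirAC can be sketched per time-frequency tile. This is an illustrative, energy-preserving formulation; the diffuseness symbol psi and the square-root weighting follow common descriptions of DirAC and are assumptions here, not the patent's method.

```python
import numpy as np

def dirac_split(stft_frame, diffuseness):
    """Split a time-frequency frame into a non-diffuse (direct) part and a
    diffuse part. diffuseness is psi in [0, 1] per bin; the parts are
    energy-complementary: |direct|^2 + |diffuse|^2 == |x|^2."""
    psi = np.clip(diffuseness, 0.0, 1.0)
    direct = stft_frame * np.sqrt(1.0 - psi)   # psi = 0 -> fully direct
    diffuse = stft_frame * np.sqrt(psi)        # psi = 1 -> fully diffuse
    return direct, diffuse
```

The direct part would then be rendered as point sources (e.g. amplitude panning), while the diffuse part is decorrelated and spread across the loudspeakers.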

Traditional approaches have a number of disadvantages. For example, guided or unguided upmixing of audio signals such as applause may require strong decorrelation. Hence, on the one hand, strong decorrelation is necessary to restore the atmosphere of presence, for example, in a concert hall. On the other hand, suitable decorrelation filters, such as all-pass filters, degrade the reproduction quality of transient events, such as a single clap, by introducing temporal smearing effects such as pre- and post-echoes and filter ringing. Additionally, the spatial panning of single clap events has to be done on a rather fine time grid, while the decorrelation of the ambience should be quasi-stationary over time.

State-of-the-art systems according to J. Breebaart, S. van de Par, A. Kohlrausch, E. Schuijers, "High-Quality Parametric Spatial Audio Coding at Low Bitrates" in AES 116th Convention, Berlin, Preprint 6072, May 2004 and J. Herre, K. Kjörling, J. Breebaart, et al., "MPEG Surround - The ISO/MPEG Standard for Efficient and Compatible Multi-Channel Audio Coding" in Proceedings of the 122nd AES Convention, Vienna, Austria, May 2007 represent a compromise between temporal resolution and ambience stability, and between the degradation of transient quality and ambience decorrelation.

For example, a system employing temporal permutations will exhibit a perceptible degradation of the output sound due to a certain repetitive quality of the output audio signal. This is due to the fact that one and the same segment of the input signal appears unmodified in every output channel, though at different points in time. Moreover, to avoid an increased density of the applause, some original channels have to be dropped during mixing, and thus some important acoustic events in the audience may be missing.

In the well-known object-oriented systems, sound events are synthesised by a large group of spatially distributed point sources, which leads to computationally complex implementations.

The object of the present invention is to provide an improved concept for spatial audio processing. This is achieved by an apparatus according to claim 1 and a method according to claim 16.

The present invention shows that sound can be decomposed into a number of components to which spatial rendering, for example in terms of decorrelation or in terms of an amplitude-panning approach, can be adapted. In other words, the present invention is based on the finding that, for example, in a scenario with multiple sound sources, foreground and background sources can be distinguished and rendered or decorrelated differently. In general, different spatial depths and/or extents of audio objects can be distinguished.

One of the key points of the present invention is the decomposition of signals, such as the sound of an applauding audience, a flock of birds, a sea shore, galloping horses, a division of marching soldiers, etc., into a foreground part and a background part, where the foreground part contains single acoustic events created by, for example, nearby sources, and the background part holds the ambience of distributed, faraway events. Prior to final mixing, these two signal parts are processed separately, for example, to synthesise correlation, to form the spatial distribution of the audio signal, etc.

The proposed solutions are not limited to distinguishing only foreground and background parts of the signal; they may distinguish several different audio parts, each of which can be rendered or decorrelated differently.

In the general case, audio signals can be divided into n different semantic components that are processed separately. The decomposition/separation into different semantic components can be carried out in the time domain and/or in the frequency domain.
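A time-domain foreground/background decomposition can be sketched as follows. This is an illustrative toy split, not the patented decomposer: the smooth signal envelope is treated as background ambience and samples protruding above it as foreground transients; the window length and the factor 2 threshold are arbitrary assumptions, and a real system would operate per frequency band.

```python
import numpy as np

def decompose_fg_bg(x, win=512):
    """Split x into a foreground part (transients above the local envelope)
    and a background part, with fg + bg == x exactly."""
    # Short-term RMS envelope via a moving average of the squared signal.
    kernel = np.ones(win) / win
    env = np.sqrt(np.convolve(x**2, kernel, mode="same"))
    mask = np.abs(x) > 2.0 * env      # crude transient detector (assumed)
    fg = np.where(mask, x, 0.0)
    bg = x - fg                        # complementary split
    return fg, bg
```

Each part could then be rendered with its own characteristic, e.g. amplitude panning of the foreground events and decorrelation of the background, before the final mix, as the text describes.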

The proposed solution can provide good perceptual quality at moderate computational cost. It provides a novel decorrelation/rendering method that achieves high perceptual quality at moderate cost, especially when processing critical audio material such as applause-like signals or other similar ambience material, such as, for example, the noise emitted by a flock of birds, a sea shore, galloping horses, a division of marching soldiers, etc.

Embodiments of the present invention will be described in detail with reference to the accompanying figures, in which

Fig.1 shows an embodiment of a device for determining an output spatial multi-channel audio signal;

Fig.1b shows a block diagram of another embodiment;

Figure 2 shows an embodiment illustrating a decomposition into multiple component signals;

Figure 3 illustrates an embodiment with a semantic decomposition into foreground and background signals;

Figure 4 illustrates an example of a method for obtaining the background signal component;

Figure 5 illustrates the synthesis of sound sources having a large spatial extent;

Figure 6 illustrates one application of a decorrelator structure in the time domain in a mono-to-stereo converter; and

Fig.7 shows another application of a decorrelator structure, in the frequency domain, in a mono-to-stereo converter.

Figure 1 shows a device 100 for determining an output spatial multi-channel audio signal based on an input audio signal. In some embodiments the device can be adapted to generate the output spatial multi-channel audio signal additionally based on an input parameter. The input parameter can be generated locally or provided together with the input audio signal, for example as side information.

In the embodiment shown in figure 1, the device 100 comprises a decomposer 110 for decomposing the input audio signal to obtain a first component signal having a first semantic property and a second component signal having a second semantic property that differs from the first semantic property.

The device 100 further comprises a renderer 120 for rendering the first component signal using a first rendering characteristic to obtain a first rendered signal having the first semantic property, and for rendering the second component signal using a second rendering characteristic to obtain a second rendered signal having the second semantic property.

A semantic property can correspond to a spatial property, such as near or far, concentrated or distributed; and/or a dynamic property, such as whether the signal is tonal, stationary or transient; and/or a dominance property, such as whether the signal belongs to the foreground or the background; in each case including a measure thereof.

Moreover, in this embodiment the device 100 comprises a processor 130 for processing the first rendered signal and the second rendered signal to obtain the output spatial multi-channel audio signal.

In other words, the decomposer 110 is adapted to decompose the input audio signal, in some embodiments based on an input parameter. The decomposition of the audio signal is based on semantic properties, for example spatial properties, of the different parts of the audio signal. Moreover, the rendering carried out by the renderer 120 according to the first and second rendering characteristics can also take the spatial properties into account, which allows, for example, in a scenario where the first component signal corresponds to a background audio signal and the second component signal corresponds to a foreground audio signal, a different rendering or decorrelation to be applied to each, or vice versa. Hereinafter the term «foreground» is understood to refer to an audio object that dominates the audio environment, such that a potential listener would notice it. A foreground audio object or source may be distinguished or differentiated from a background audio object or source. A background audio object or source may not be noticeable to a potential listener in the audio environment, as it is less dominant than a foreground audio object or source. In embodiments, foreground audio objects or sources may be, but are not limited to, point-like sound sources, where background audio objects or sources may correspond to spatially more extended audio objects or sources.

In embodiments, the renderer 120 can further be adapted to render the first component signal such that the first rendering characteristic does not have a delay-introducing characteristic. In other words, there may be no decorrelation of the first component signal. In another embodiment, the first rendering characteristic may have a delay characterized by a first delay amount, and the second rendering characteristic may have a second delay amount greater than the first. In other words, in this embodiment both the first component signal and the second component signal can be decorrelated, but the degree of decorrelation can be scaled according to the delay amounts of the respective component signals. The decorrelation may therefore be stronger for the second component signal than for the first component signal.

In embodiments, the first component signal and the second component signal may overlap and/or may be synchronous in time. In other words, signal processing can be carried out block-wise, where one block of samples of the input audio signal can be separated by the decomposer 110 into a number of blocks of component signals. In embodiments, the component signals may at least partially overlap in the time domain, i.e. they may represent overlapping samples of the time-domain signal. In other words, the component signals may correspond to parts of the input audio signal that overlap, i.e. that are at least partially simultaneous. In embodiments, the first and second component signals may represent filtered or transformed versions of the original input signal. For example, they may represent signal parts extracted from a composite spatial signal, corresponding, say, to a close sound source or to a more distant sound source. In other embodiments they may correspond to transient and stationary component signals, etc.

In embodiments, the renderer 120 can be subdivided into a first renderer and a second renderer, where the first renderer can be adapted to render the first component signal and the second renderer can be adapted to render the second component signal. In embodiments, the renderer 120 can be implemented in software, for example as a program stored in memory to be run on a processor or digital signal processor, which in turn is adapted to render the component signals sequentially.

The renderer 120 can be adapted to decorrelate the first component signal to obtain a first decorrelated signal, and/or to decorrelate the second component signal to obtain a second decorrelated signal. In other words, the renderer 120 can be adapted to decorrelate both component signals, however using different decorrelation or rendering characteristics. In embodiments, the renderer 120 can be adapted to apply amplitude panning to either of the first or second component signals instead of, or in addition to, decorrelation.

The renderer 120 can be adapted to render the first and second rendered signals each having as many components as there are channels in the output spatial multi-channel audio signal, and the processor 130 can be adapted to combine the components of the first and second rendered signals to obtain the output spatial multi-channel audio signal. In other embodiments, the renderer 120 can be adapted to render the first and second rendered signals each having fewer components than the output spatial multi-channel audio signal, and the processor 130 can be adapted to upmix the first and second rendered signals to obtain the output spatial multi-channel audio signal.

Fig.1b illustrates another embodiment of the device 100, comprising the same components that were introduced with fig.1; however, fig.1b shows an embodiment in more detail. Fig.1b depicts the decomposer 110 receiving the input audio signal and, optionally, the input parameter. As can be seen from fig.1b, the decomposer is adapted to provide the first and second component signals to the renderer 120, which is indicated by the dashed line. In the embodiment illustrated in fig.1b it is assumed that the first component signal corresponds to a point-like audio source as the first semantic property, and that the renderer 120 is adapted to apply amplitude panning as the first rendering characteristic to the first component signal. In embodiments, the first and second component signals are interchangeable, i.e. in other embodiments amplitude panning may be applied to the second component signal.

In the embodiment shown in fig.1b, the renderer 120 contains, in the signal path of the first component signal, two scalable amplifiers 121 and 122, which are adapted to amplify two copies of the first component signal differently. The different amplification factors used may, in this embodiment, be determined from the input parameter; in other embodiments they can be determined from the input audio signal, be preset, or be generated locally, possibly also based on user input. The outputs of the two scalable amplifiers 121 and 122 are provided to the processor 130, details of which will be presented below.

As can be seen from fig.1b, the decomposer 110 provides the second component signal to the renderer 120, which carries out a different rendering in the processing path of the second component signal. In other embodiments the first component signal may be processed in the same way as the second component signal, or instead of it. The first and second component signals can be interchanged.

In the embodiment shown in fig.1b, the processing path of the second component signal contains a decorrelator 123, followed by a rotator, also called a parametric stereo or upmix module 124, as the second rendering characteristic. The decorrelator 123 can be adapted to decorrelate the second component signal X[k] and to provide a decorrelated version Q[k] of the second component signal to the parametric stereo or upmix module 124. In fig.1b, the mono signal X[k] is fed into both the decorrelator "D" 123 and the upmix module 124. The decorrelator 123 can generate a decorrelated version Q[k] of the input signal, having the same frequency characteristic and the same long-term energy. The upmix module 124 can compute an upmix matrix based on the spatial parameters and synthesize the output channels Y1[k] and Y2[k]. The upmix module can be described by the expression

Y1[k] = c_l · (cos(α + β) · X[k] + sin(α + β) · Q[k]),
Y2[k] = c_r · (cos(−α + β) · X[k] + sin(−α + β) · Q[k]);

where the parameters c_l, c_r, α and β are constants, or time- and frequency-variant values adaptively computed from the input signal X[k], or transmitted as side information together with X[k], for example in the form of ILD parameters (ILD = Inter-channel Level Difference) and ICC parameters (ICC = Inter-Channel Correlation). X[k] is the received mono signal, Q[k] is the decorrelated version of X[k], and Y1[k] and Y2[k] are the output signals.

The decorrelator 123 can be implemented as an IIR filter (IIR = Infinite Impulse Response), an arbitrary FIR filter (FIR = Finite Impulse Response), or a special FIR filter using a single tap, i.e. a pure delay.

The parameters c_l, c_r, α and β can be determined in different ways. In some embodiments they are simply given by input parameters, which can be provided together with the input audio signal, for example as low-rate side information. In other embodiments they can be generated locally or derived from properties of the input audio signal.
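The mono-to-stereo path of fig.1b can be sketched as follows. This is a hedged illustration, not the patented implementation: the decorrelator is the simplest of the options named above (a single-tap pure delay), and the fixed broadband values of c_l, c_r, α and β stand in for parameters that would normally be derived per band from ILD/ICC side information.

```python
import numpy as np

def decorrelate(x, delay=400):
    # Simplest decorrelator "D" (123): a single-tap FIR, i.e. a pure
    # delay; q has the same spectrum and similar long-term energy as x.
    q = np.zeros_like(x)
    q[delay:] = x[:-delay]
    return q

def upmix(x, c_l=1.0, c_r=1.0, alpha=0.3, beta=0.0, delay=400):
    # Upmix module (124): mixes the mono signal x and its decorrelated
    # version q into left/right outputs Y1, Y2 per the expression above.
    q = decorrelate(x, delay)
    y1 = c_l * (np.cos(alpha + beta) * x + np.sin(alpha + beta) * q)
    y2 = c_r * (np.cos(-alpha + beta) * x + np.sin(-alpha + beta) * q)
    return y1, y2
```

With α = 0 the decorrelated signal does not contribute and both channels carry the (scaled) mono signal; increasing α mixes in more of Q[k] and widens the stereo image.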

In the embodiment shown in fig.1b, the renderer 120 is adapted to provide the second rendered signal in the form of the two output signals Y1[k] and Y2[k], generated by the upmix module 124 and supplied to the processor 130.

On the processing path of the first component signal, the two amplitude-panned versions of the first component signal, i.e. the outputs of the two scalable amplifiers 121 and 122, are also supplied to the processor 130. In other embodiments the scalable amplifiers 121 and 122 may be part of the processor 130, in which case only the first component signal and a panning parameter are provided by the renderer 120.

As can be seen from fig.1b, the processor 130 can be adapted to process or combine the first rendered signal and the second rendered signal, in this embodiment simply by combining the outputs to provide a stereo signal with a left channel L and a right channel R, corresponding to the output spatial multi-channel audio signal of fig.1.

In the embodiment of fig.1b, the left and right channels of a stereo signal are determined on both signal paths. On the path of the first component signal, amplitude panning is carried out by the two scalable amplifiers 121 and 122, so that two in-phase copies of the audio signal are obtained, scaled differently. This creates the impression of a point-like audio source, in accordance with the semantic property or rendering characteristic.

The parameters c_l, c_r, α and β can also be chosen by a method, or within a range, such that the L and R channels on the second signal processing path are decorrelated, thus modeling a spatially distributed audio source as the semantic property, i.e. simulating a sound source in the background, or a spatially extended one.

Figure 2 illustrates another, more general embodiment. Figure 2 shows a semantic decomposition block 210, which corresponds to the decomposer 110. The output of the semantic decomposition 210 is the input of a rendering stage 220, which corresponds to the renderer 120. The rendering stage 220 consists of a number of individual renderers 221 to 22n, i.e. the semantic decomposition block 210 is adapted to decompose a mono/stereo input signal into n component signals having n semantic properties. The decomposition can be carried out on the basis of decomposition control parameters, which can be provided together with the mono/stereo input signal, be preset, be generated locally, be entered by a user, etc.

In other words, the decomposer 110 can be adapted to decompose the input audio signal semantically based on an optional input parameter, and/or to derive the input parameter from the input audio signal. The output of the decorrelation or rendering stage 220 is then provided to an upmix block 230, which determines a multi-channel output on the basis of the decorrelated or rendered signals and, optionally, on the basis of upmix control parameters.

As a rule, the device can separate the audio material into n different semantic components and decorrelate or render each component separately with matched decorrelators D1 to Dn, as depicted in figure 2. In other words, in embodiments the rendering characteristics correspond to the semantic properties of the component signals. Each of the decorrelators or renderers can be adapted to the semantic properties of the corresponding component signal. Subsequently, the processed components can be mixed together to obtain the output multi-channel signal. The different components can, for example, correspond to foreground and background modeling objects.

In other words, the renderer 120 can be adapted to combine the first component signal and the first decorrelated signal into a stereo or multi-channel upmix signal as the first rendered signal, and/or to combine the second component signal and the second decorrelated signal into a stereo upmix signal as the second rendered signal.

Moreover, the renderer 120 can be adapted to render the first component signal according to a background audio characteristic and/or to render the second component signal according to a foreground audio characteristic, or vice versa.

Since, for example, applause-like signals can be seen as composed of individual, distinct claps and a noise-like ambience arising from very dense, distant claps, a suitable decomposition of such a signal can be obtained by separating the isolated foreground claps as one component and the noise-like background as the other component. In other words, in one embodiment n = 2. In such an embodiment the renderer 120 can, for example, be adapted to render the first component signal by amplitude panning of the first component signal. In other words, the decorrelation or rendering of the foreground claps can, in an embodiment, be achieved in D1 by amplitude panning each single event to its estimated location.

In embodiments, the renderer 120 can be adapted to render the first and/or second component signal, for example, by all-pass filtering the first or second component signal to obtain the first or second decorrelated signal.

In other words, in an embodiment the background can be decorrelated or rendered by using m mutually independent all-pass filters D2(1...m). In this embodiment only the quasi-stationary background would be processed by the all-pass filters, so the time-smearing effects that occur with traditional methods can be avoided. Since amplitude panning is applied to the events of the foreground object, the original density of the foreground applause can be approximately restored, in contrast to existing systems as presented, for example, in J. Breebaart, S. van de Par, A. Kohlrausch, E. Schuijers, "High-Quality Parametric Spatial Audio Coding at Low Bitrates", AES 116th Convention, Berlin, Preprint 6072, May 2004, and J. Herre, K. Kjörling, J. Breebaart, et al., "MPEG Surround - the ISO/MPEG Standard for Efficient and Compatible Multi-Channel Audio Coding", Proceedings of the 122nd AES Convention, Vienna, Austria, May 2007.

In other words, in embodiments the decomposer 110 can be adapted to decompose the input audio signal semantically on the basis of an input parameter, where the input parameter can be provided along with the input audio signal, for example as side information. In such an embodiment the decomposer 110 can be adapted to derive the input parameter from the input audio signal. In other embodiments the decomposer 110 can be adapted to obtain the input parameter as a control parameter independent of the input audio signal, which can be generated locally, be preset, or be entered by a user.

In embodiments, the renderer 120 can be adapted to obtain a spatial distribution of the first rendered signal or the second rendered signal by applying broadband amplitude panning. In other words, in accordance with the description of fig.1b given above, instead of creating a point-like source, the panning location of the source can be varied in time in order to create an audio source with a certain spatial distribution. In embodiments, the renderer 120 can be adapted to use locally generated low-frequency noise to control the amplitude panning, i.e. the panning gains of, for example, the scalable amplifiers 121 and 122 of fig.1b correspond to a locally generated noise value, i.e. they vary over time within a certain bandwidth.
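The noise-controlled broadband panning described above can be sketched as follows. This is an illustrative assumption-laden version: the one-pole smoother, its cutoff, and the equal-power gain law are choices made here for the sketch, not values taken from the patent.

```python
import numpy as np

def lowpass_noise(n, sr, cutoff_hz=10.0, seed=0):
    # Locally generated low-frequency noise controlling the pan
    # position over time; band-limited by a one-pole smoother
    # (cutoff_hz is an illustrative choice).
    rng = np.random.default_rng(seed)
    noise = rng.standard_normal(n)
    a = np.exp(-2 * np.pi * cutoff_hz / sr)
    out = np.empty(n)
    acc = 0.0
    for i, v in enumerate(noise):
        acc = a * acc + (1 - a) * v
        out[i] = acc
    return out / (np.max(np.abs(out)) + 1e-12)

def pan_foreground(x, sr):
    # Broadband amplitude panning (amplifiers 121/122): equal-power
    # gains driven by the noise, so the apparent source position
    # wanders and creates a spatially distributed impression.
    p = 0.5 * (1.0 + lowpass_noise(len(x), sr))   # pan position in [0, 1]
    theta = 0.5 * np.pi * p
    left = np.cos(theta) * x
    right = np.sin(theta) * x
    return left, right
```

The equal-power law keeps left^2 + right^2 equal to the input power at every sample, so the wandering position does not modulate the loudness.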

Embodiments can be adapted to operate in a guided or an unguided mode. For example, in a guided mode, cf. the dashed box in figure 2, decorrelation can be achieved by applying standard decorrelation filters, controlled on a coarse time grid, only to the background or ambience part, while correlation is provided by redistributing each single event, for example in the foreground part, via time-variant spatial positioning using broadband amplitude panning on a much finer time grid. In other words, in embodiments the renderer 120 can be adapted to operate on different component signals with different time grids, i.e. based on different time scales, which can be expressed in terms of different sample rates or different delays of the respective decorrelators. In one embodiment, with a separation into foreground and background, amplitude panning can be used for the foreground part, where the amplitude is varied on a much finer time grid than the decorrelation operation applied to the background part.

Furthermore, we note that for the decorrelation of, for example, applause-like signals, i.e. signals with a quasi-stationary random character, the exact spatial position of each single foreground clap may matter less than the recovery of the overall distribution of the many claps. An embodiment can take advantage of this fact and operate in an unguided mode. In this mode the above-mentioned amplitude panning factor can be controlled by low-frequency noise. Figure 3 illustrates a mono-to-stereo system implementing this scenario. Figure 3 shows a semantic decomposition block 310, corresponding to the decomposer 110, for decomposing the input mono signal into a foreground component signal and a background component signal.

As can be seen from figure 3, the background component signal is processed by an all-pass filter D1 320. The decorrelated signal is then provided, together with the unprocessed background component, to the upmix block 330, corresponding to the processor 130. The foreground component signal is provided to an amplitude panning stage D2 340, which corresponds to the renderer 120. Locally generated low-frequency noise 350 is also provided to the amplitude panning stage 340, which then provides the foreground component signal in an amplitude-panned configuration to the upmix block 330. The amplitude panning stage D2 340 can determine its output by providing a scaling factor k for an amplitude selection between the two channels of a stereo audio signal. The scaling factor k can be based on the low-frequency noise.

As can be seen from figure 3, there is only one arrow between the amplitude panning stage 340 and the upmix block 330. This arrow may as well represent the amplitude-panned signals, i.e., in the case of a stereo upmix, the left and right channels. As can be seen from figure 3, the upmix block 330, corresponding to the processor 130, is adapted to process or combine the background and foreground component signals to derive the stereo output.

Other embodiments may use different processing to derive the background and foreground component signals, or the input parameters for the decomposition. The decomposer 110 can be adapted to determine the first component signal and/or the second component signal based on a transient separation method. In other words, the decomposer 110 can be adapted to determine the first or second component signal based on a separation method, and the other component signal based on the difference between the determined component signal and the full audio signal. In other embodiments, the first or second component signal may be determined based on the transient separation method, and the other component signal may be based on the difference between the first or second component signal and the full audio signal.
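A crude sketch of such a transient separation, under stated assumptions: frames whose short-term energy jumps well above a running average are labelled foreground (isolated claps), and the background is taken as the difference from the full signal, as described. The frame sizes, the threshold factor k and the running-average update are placeholder choices, not the patent's method.

```python
import numpy as np

def split_foreground_background(x, frame=512, hop=256, k=2.0):
    # Short-term feature analysis: mark frames with an energy burst as
    # foreground; everything else (x - fg) forms the background, so
    # fg + bg reconstructs x exactly.
    fg = np.zeros_like(x)
    avg = None
    for start in range(0, len(x) - frame, hop):
        seg = x[start:start + frame]
        e = float(np.mean(seg ** 2))
        if avg is None:
            avg = e
        if e > k * avg:          # transient: energy k times the running average
            fg[start:start + frame] = seg
        avg = 0.9 * avg + 0.1 * e
    bg = x - fg
    return fg, bg
```

Because the background is defined as the residual, the two components always sum back to the input, matching the difference-based formulation in the text.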

The decomposer 110, and/or the renderer 120, and/or the processor 130 may comprise a DirAC monosynth stage, and/or a DirAC synthesis stage, and/or a DirAC merging stage. In embodiments, the decomposer 110 can be adapted to decompose the input audio signal, the renderer 120 can be adapted to render the first and/or second component signal, and/or the processor 130 can be adapted to process the first and/or second rendered signal, in terms of different frequency bands.

Embodiments may use the following approximation for applause-like signals. While the foreground component can be obtained by transient detection or separation methods, cf. Pulkki, Ville; "Spatial Sound Reproduction with Directional Audio Coding", J. Audio Eng. Soc., Vol. 55, No. 6, 2007, the background component can be obtained from the residual signal. Figure 4 depicts an example of a suitable method for obtaining the background component x'(n) of, for example, an applause-like signal x(n), for the implementation of the semantic decomposition 310 of figure 3, i.e. an embodiment of the decomposer 110. Figure 4 shows the time-discrete input signal x(n), which is input to a DFT block 410 (DFT = Discrete Fourier Transform). The output of the DFT block 410 is provided to a spectral smoothing block 420 and to a spectral whitening block 430, which carries out spectral whitening based on the output of the DFT 410 and the output of the spectral smoothing stage 420.

The output of the spectral whitening block 430 is then provided to a spectral peak-picking stage 440, which separates the spectrum and provides two signals: a noise-and-transient part and a tonal part. The noise-and-transient part is fed to an LPC filter 450 (LPC = Linear Prediction Coding), whose residual noise signal is provided to a mixing stage 460 together with the tonal part output by the spectral peak-picking stage 440. The output of the mixing stage 460 is then provided to a spectral shaping block 470, which shapes the spectrum on the basis of the smoothed spectrum provided by the spectral smoothing block 420. The output of the spectral shaping block 470 is then provided to the synthesis filter 480, i.e. an inverse discrete Fourier transform, to obtain x'(n), representing the background component. The foreground component can then be obtained as the difference between the input signal and the output signal, i.e. as x(n) - x'(n).
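A heavily simplified per-frame sketch of the fig.4 pipeline follows. It keeps the smoothing, whitening, peak handling and spectral re-shaping steps, but deliberately omits the LPC stage 450 and replaces peak picking with a simple clipping of the whitened spectrum; all thresholds are illustrative assumptions.

```python
import numpy as np

def smooth(mag, width=9):
    # Spectral smoothing stage (420): moving average over frequency bins.
    kernel = np.ones(width) / width
    return np.convolve(mag, kernel, mode="same")

def background_frame(frame):
    # Simplified stand-in for fig.4: whiten the spectrum (430),
    # suppress whitened peaks (crude substitute for peak picking 440),
    # re-impose the smoothed envelope (470) and transform back (480).
    X = np.fft.rfft(frame)
    mag = np.abs(X)
    env = smooth(mag) + 1e-12
    white = mag / env                  # whitened magnitude spectrum
    floor = np.minimum(white, 1.0)     # clip peaks above the local mean level
    shaped = floor * env * np.exp(1j * np.angle(X))
    return np.fft.irfft(shaped, n=len(frame))

# The foreground is then the difference, as in the text:
# x_fg = x - background_frame(x)
```

Since every magnitude bin is only ever reduced, the extracted background cannot exceed the input energy, which is consistent with it being the residual-like ambience part.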

The present invention can be used in virtual reality applications, such as 3D games. In such applications, the synthesis of sound sources with a large spatial extent based on known solutions can be complicated and complex. Such sources may, for example, be a sea shore, a flock of birds, a galloping horse, a marching division of soldiers, or an applauding audience. Typically, such sound events are spatialized as a large group of point sources, which leads to computationally complex implementations, cf. Wagner, Andreas; Walther, Andreas; Melchoir, Frank; Strauß, Michael; "Generation of Highly Immersive Atmospheres for Wave Field Synthesis Reproduction", 116th International AES Convention, Berlin, 2004.

The proposed solution may provide a method that carries out a plausible synthesis of extended sound sources but, at the same time, has lower structural and computational complexity. The embodiment may be based on DirAC (DirAC = Directional Audio Coding), cf. Pulkki, Ville; "Spatial Sound Reproduction with Directional Audio Coding", J. Audio Eng. Soc., Vol. 55, No. 6, 2007. In other words, the decomposer 110 and/or the renderer 120 and/or the processor 130 can be adapted to process DirAC signals. In other words, the decomposer 110 may comprise a DirAC monosynth stage, the renderer 120 may comprise a DirAC synthesis stage, and/or the processor 130 may comprise a DirAC merging stage.

Figure 5 illustrates the synthesis of spatially extended sound sources. Figure 5 shows an upper monosynth block 610, which creates a DirAC mono stream leading to the perception of nearby, point-like audio sources, such as the closest claps of an applauding audience. The lower monosynth block 620 is used to create a DirAC mono stream leading to the perception of spatially distributed sound, suitable, for example, for generating the background sound of the audience's applause. The outputs of the two DirAC monosynth blocks 610 and 620 are then combined in the DirAC merging stage 630. Figure 5 shows that this embodiment uses only two DirAC synthesis blocks 610 and 620. One of them is used to create the sound events located in the foreground, such as the closest or nearby birds or the closest or nearby persons in an audience, and the other creates the background sound, the continuous sound of the flock of birds, etc.

The foreground sound is converted into a DirAC mono stream by the monosynth block 610 in such a way that the azimuth parameter is kept the same across frequency but changes randomly, or is controlled by an external process, over time. The diffuseness parameter ψ is set to 0, i.e. representing a point-like source. The audio input of block 610 is assumed to consist of temporally non-overlapping sounds, such as the calls of individual birds or individual claps of applause, which create the perception of nearby sound sources such as birds or applauding persons. The spatial distribution of the foreground sound events is controlled by adjusting θ and θrange_foreground, which means that individual sound events will be perceived in the directions θ ± θrange_foreground; a single event, however, will be perceived as point-like. In other words, point-like sound sources are created at different positions within the range θ ± θrange_foreground.

The background block 620 takes an input audio stream containing all the other sound events not present in the foreground audio stream, which includes a large number of temporally overlapping sound events, for example hundreds of birds or a large number of distant claps. The accompanying azimuth values are set randomly, both in time and in frequency, within the configured azimuth limits θ ± θrange_background. Spatially extended background sounds can thus be synthesized with low computational complexity. The diffuseness parameter ψ can also be controlled. If it were increased, the DirAC decoder would apply the sound to all directions, which can be used when the sound source surrounds the listener completely. If no such envelopment is required, the diffuseness in the embodiment may be kept low, close to 0, or zero.
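The two azimuth-assignment rules above can be sketched as metadata generation for the monosynth blocks 610 and 620. The function name, signature and the per-frame/per-band grid are hypothetical conveniences for the sketch; only the rules themselves (foreground: one azimuth per time frame shared across bands, ψ = 0; background: azimuth random per frame and per band, optional ψ > 0) come from the text.

```python
import numpy as np

def monosynth_metadata(n_frames, n_bands, theta, theta_range,
                       foreground, diffuseness=0.0, seed=0):
    # DirAC monosynth parameter generation (blocks 610 / 620):
    # foreground -> azimuth varies over time only, within
    #   theta +/- theta_range_foreground, diffuseness psi = 0;
    # background -> azimuth random in both time and frequency, within
    #   theta +/- theta_range_background, optionally psi > 0.
    rng = np.random.default_rng(seed)
    if foreground:
        az = rng.uniform(theta - theta_range, theta + theta_range, n_frames)
        azimuth = np.repeat(az[:, None], n_bands, axis=1)
        psi = np.zeros((n_frames, n_bands))
    else:
        azimuth = rng.uniform(theta - theta_range, theta + theta_range,
                              (n_frames, n_bands))
        psi = np.full((n_frames, n_bands), diffuseness)
    return azimuth, psi
```

A DirAC synthesis stage would then render each time/frequency tile of the corresponding mono stream from the direction given by this metadata, and the merging stage 630 would combine the two streams.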

The embodiments of the present invention can provide the advantage that excellent perceptual quality of the processed sounds is achieved at modest computational cost. The solution also allows a modular implementation of spatial sound representations, as shown, for example, in figure 5.

Depending on specific implementation requirements, the proposed methods can be implemented in hardware or in software. The implementation can be carried out using a digital storage medium, in particular a flash memory, a disc, DVD or CD, having electronically readable control signals stored thereon, which cooperate with a programmable computer system such that the methods of the present invention are performed. Generally, therefore, the present invention is a computer program product with program code stored on a machine-readable carrier, the program code performing the proposed methods when the program runs on a computer. In other words, the proposed methods are thus a computer program having a program code for performing at least one of the proposed methods when the program runs on a computer.

3. The device (100) according to claim 1, wherein the renderer (120) is adapted to render the first and second rendered signals each having as many components as the spatial multi-channel audio signal has channels, and wherein the processor (130) is adapted to combine the components of the first and second rendered signals to obtain the output multi-channel audio signal.

4. The device (100) according to claim 1, wherein the renderer (120) is adapted to render the first and second rendered signals each having fewer components than the spatial multi-channel audio signal, and wherein the processor (130) is adapted to upmix the components of the first and second rendered signals to obtain the output spatial multi-channel audio signal.

5. The device (100) according to claim 1, wherein the decomposer (110) is adapted to determine the input parameter from the audio signal as a control parameter.

6. The device (100) according to claim 1, wherein the renderer (120) is adapted to render the first and second component signals based on different time scales.

7. The device (100) according to claim 1, wherein the decomposer (110) is configured to determine the first signal component and/or the second signal component based on a short-term feature analysis method.

8. The device (100) according to claim 1, wherein the decomposer (110) is configured to determine the first or the second signal component by an analysis of transient characteristics, and to determine the other signal component based on the difference between the one signal component and the input audio signal.
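The decomposition of claim 8 can be sketched as follows: a transient detector picks out one signal component, and the other component is simply the difference between that component and the input signal. The frame size and energy threshold below are illustrative assumptions, not values from the patent.

```python
import numpy as np

def transient_decompose(x, frame=1024, threshold=2.0):
    """Split x into a transient component and a residual component.

    A frame is classified as transient when its energy exceeds
    `threshold` times the mean frame energy (a deliberately simple
    stand-in for a real transient detector)."""
    n_frames = len(x) // frame
    fg = np.zeros_like(x)
    energies = np.array([np.sum(x[i * frame:(i + 1) * frame] ** 2)
                         for i in range(n_frames)])
    mean_e = np.mean(energies) + 1e-12
    for i in range(n_frames):
        # Frames with an energy spike are kept as the transient part.
        if energies[i] > threshold * mean_e:
            fg[i * frame:(i + 1) * frame] = x[i * frame:(i + 1) * frame]
    bg = x - fg  # the other component is the difference, as claimed
    return fg, bg
```

By construction the two components sum back to the input, which is exactly the "difference" relation the claim requires.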

9. The device (100) according to claim 1, wherein the decomposer (110) is configured to decompose the input audio signal, the renderer (120) is configured to render the first and/or the second signal component, and/or the processor (130) is configured to process the first and/or the second rendered signal, in different frequency bands.

10. The device according to claim 1, wherein the processor is configured to process the first rendered signal, the second rendered signal and a background signal to obtain the spatial output multi-channel audio signal.

11. A method for determining a spatial output multi-channel audio signal based on an input audio signal and an input parameter, comprising the steps of: semantically decomposing the input audio signal to obtain a first signal component having a first semantic property, the first signal component being a foreground signal, and a second signal component having a second semantic property different from the first semantic property, the second signal component being a background signal; rendering the foreground signal using amplitude panning to obtain a first rendered signal having the first semantic property, by processing the foreground signal in an amplitude-panning stage (221, 340), wherein a locally generated noise (350) is supplied to the amplitude-panning stage (340) for temporally varying a spatial localization of a foreground audio source; rendering the background signal by decorrelating the second signal component to obtain a second rendered signal having the second semantic property; and processing the first rendered signal and the second rendered signal to obtain the spatial output multi-channel audio signal.
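The rendering and processing steps of claim 11 can be sketched for a two-channel output: the foreground is amplitude-panned with a pan position that is varied over time by locally generated random noise, the background is decorrelated (here by a crude per-channel delay, standing in for a real decorrelator), and the two rendered signals are summed per channel. The jitter amount, pan law, frame size and delay lengths are all illustrative assumptions, not values from the patent.

```python
import numpy as np

def render_spatial_stereo(fg, bg, frame=1024, jitter=0.2, seed=0):
    """Render foreground `fg` and background `bg` (1-D arrays of
    equal length) into a (2, n) stereo output, per claim 11."""
    rng = np.random.default_rng(seed)
    n = len(fg)
    n_frames = -(-n // frame)  # ceiling division
    out = np.zeros((2, n))

    # Foreground: sine/cosine amplitude panning; the pan position is
    # jittered per frame by locally generated noise (the claimed
    # temporal variation of the source localization).
    pan = 0.5 + jitter * (rng.random(n_frames) - 0.5)  # pan in [0, 1]
    for i in range(n_frames):
        sl = slice(i * frame, min((i + 1) * frame, n))
        theta = pan[i] * np.pi / 2
        out[0, sl] += np.cos(theta) * fg[sl]
        out[1, sl] += np.sin(theta) * fg[sl]

    # Background: decorrelate by unequal per-channel delays
    # (a stand-in for the patent's decorrelator).
    d0, d1 = 17, 31
    out[0, d0:] += bg[:n - d0]
    out[1, d1:] += bg[:n - d1]
    return out
```

Note that the sine/cosine pan law keeps the per-sample foreground energy constant across the two channels while the noise moves the apparent source position from frame to frame.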

12. A machine-readable medium having stored thereon a program code for performing the method according to claim 11 when the program code runs on a computer or processor.
