Method and apparatus for encoding and optimal reconstruction of a three-dimensional acoustic field
FIELD: physics, acoustics.
SUBSTANCE: invention relates to means of encoding audio signals and related spatial information in a format which is independent of the playback scheme. A first set of audio signals is assigned to a first group. The first group is encoded as a set of mono audio tracks with associated metadata describing the direction of the signal source of each track relative to the recording position and the initial playback time thereof. A second set of audio signals is assigned to a second group. The second group is encoded as at least one set of Ambisonics tracks of a given order and mixture of orders. Two groups of tracks comprising the first and second sets of audio signals are generated.
EFFECT: providing a technique capable of presenting spatial audio content independent of the exhibition method.
26 cl, 11 dwg
Field of the invention
The present invention relates to techniques for improving the encoding, distribution and decoding of three-dimensional acoustic fields. In particular, it relates to techniques for encoding audio signals with spatial information in a manner independent of the exhibition setup, and for decoding them optimally for a given exhibition system, whether a set of loudspeakers or headphones.
In multichannel reproduction and listening, the listener is usually surrounded by several loudspeakers. One common goal of reproduction is to create a sound field in which the listener can perceive the intended location of the sound sources, for example the position of each musician in a band. Different loudspeaker setups create different spatial impressions. A standard stereo setup, for example, can convincingly recreate the acoustic scene in the space between the two loudspeakers, but fails to do so for angles outside that space.
Setups with more loudspeakers surrounding the listener can achieve a better spatial impression over a larger set of angles. One of the best-known multi-loudspeaker standards is Surround 5.1 (ITU-R BS.775-1), which consists of 5 loudspeakers located at azimuths of -30, 0, 30, -110 and 110 degrees around the listener, where 0 denotes the frontal direction. However, such a setup cannot cope with sounds located above the listener's horizontal plane.
To increase the listener's sense of immersion, the present trend is to use setups with many loudspeakers, including loudspeakers at different heights. One example is the 22.2 system developed by Hamasaki at NHK, Japan, which consists of 24 loudspeakers placed at three different heights.
The present paradigm for producing spatial audio for such setups in professional applications is to provide one audio track for each channel used during playback. For example, a stereo setup requires two audio tracks, a 5.1 setup requires six, and so on. These tracks are usually produced at the post-production stage, although they can also be created directly at the recording stage for broadcast. It is worth noting that in many cases several loudspeakers are used to play the same audio channel. This is the case in most 5.1 cinemas, where each surround channel is played by three or more loudspeakers. In these cases, although the number of loudspeakers may exceed 6, the number of distinct audio channels is still 6, and only 6 different signals are played in total.
One consequence of this one-track-per-channel paradigm is that the work done at the recording and post-production stages is tied to the exhibition setup on which the content will be shown. At the recording stage, for example in broadcasting, the type and placement of the microphones and the mixing method are determined as a function of the setup on which the event will be reproduced. Similarly, in media production, post-production engineers need to know the details of the setup on which the content will be shown and then take care of each channel individually. Failing to install correctly the multi-loudspeaker layout for which the content was produced lowers the playback quality. If the content is to be shown on different setups, several versions must be created at the post-production stage, which increases cost and time.
Another consequence of the one-track-per-channel paradigm is the size of the data required. On the one hand, without additional coding, the paradigm requires as many tracks as there are channels. On the other hand, if several versions are to be provided, either they are provided separately, which again increases the data size, or some downmixing is performed to reduce the number of channels, which degrades the quality of the result.
Finally, the last drawback of the one-track-per-channel paradigm is that content produced this way does not stand the test of time. For example, the 6 tracks of a film produced for a 5.1 setup do not include sound sources located above the listener, and do not fully exploit setups with loudspeakers at different heights.
Several technologies currently exist that can provide spatial audio independent of the exhibition system. Perhaps the simplest is Vector-Based Amplitude Panning (VBAP). It is based on feeding the same mono signal to the loudspeakers closest to the intended location of the sound source, with the volume adjusted for each loudspeaker. It works for two-dimensional or three-dimensional (with height) setups, typically selecting the two or three nearest loudspeakers, respectively. One advantage of this method is that it provides a large sweet spot, meaning that there is a wide region inside the loudspeaker setup in which the sound is perceived as coming from the intended direction. However, the method is suited neither to reproducing reverberant fields, such as those present in reverberant rooms, nor to sound sources with a large spread. At best, these methods can reproduce the early reflections of sound sources, but this is an expensive and inefficient solution.
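The two-dimensional case of amplitude panning can be illustrated with a minimal sketch. It is not taken from the patent: the function name `vbap_2d`, the constant-power normalization, and the assumption that adjacent loudspeakers are less than 180 degrees apart are choices made here for illustration only.

```python
import math

def vbap_2d(source_az_deg, speaker_az_deg):
    """Return one gain per loudspeaker for a source at the given azimuth.

    Illustrative pairwise panning: only the two loudspeakers whose arc
    contains the source receive signal. Assumes adjacent loudspeakers are
    less than 180 degrees apart.
    """
    src = math.radians(source_az_deg)
    px, py = math.cos(src), math.sin(src)
    # Visit loudspeakers in circular order so that neighbours form pairs.
    order = sorted(range(len(speaker_az_deg)),
                   key=lambda i: speaker_az_deg[i] % 360)
    gains = [0.0] * len(speaker_az_deg)
    for k in range(len(order)):
        i, j = order[k], order[(k + 1) % len(order)]
        a1 = math.radians(speaker_az_deg[i])
        a2 = math.radians(speaker_az_deg[j])
        l1x, l1y = math.cos(a1), math.sin(a1)
        l2x, l2y = math.cos(a2), math.sin(a2)
        det = l1x * l2y - l1y * l2x
        if abs(det) < 1e-9:
            continue  # degenerate pair (same or opposite directions)
        # Solve source = g1 * l1 + g2 * l2 with Cramer's rule.
        g1 = (px * l2y - py * l2x) / det
        g2 = (l1x * py - l1y * px) / det
        if g1 >= -1e-9 and g2 >= -1e-9:
            g1, g2 = max(g1, 0.0), max(g2, 0.0)
            norm = math.hypot(g1, g2)  # constant-power normalization
            gains[i], gains[j] = g1 / norm, g2 / norm
            return gains
    raise ValueError("no loudspeaker pair encloses the source direction")
```

With the 5.1 azimuths quoted earlier, `vbap_2d(15, [-30, 0, 30, -110, 110])` feeds only the loudspeakers at 0 and 30 degrees, with equal gains.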
Another technology that can provide spatial audio independent of the exhibition system is Ambisonics. Developed in the 1970s by Michael Gerzon, it provides a complete encoding-decoding chain methodology. At encoding, a set of spherical harmonics of the acoustic field at one point is stored. The zeroth order (W) corresponds to what an omnidirectional microphone would record at that point. The first order, consisting of three signals (X, Y, Z), corresponds to what three figure-of-eight microphones aligned with the Cartesian axes would record at that point. Higher-order signals correspond to what microphones with more complicated directivity patterns would record. Mixed-order Ambisonics encoding also exists, in which only a subset of the signals of each order is used; for example, using only the W, X and Y signals of first-order Ambisonics and discarding the Z signal. Although generating signals beyond first order is simple at the post-production stage or through acoustic field simulation, it is difficult when recording real sound fields with microphones; in fact, until recently only microphones capable of measuring zeroth- and first-order signals were available for professional use. Examples of first-order Ambisonics microphones are the Soundfield and the more recent TetraMic. At decoding, once the multi-loudspeaker setup is specified (the number and position of each loudspeaker), the signal fed to each loudspeaker is typically determined by requiring that the acoustic field created by the setup as a whole approximates the intended field as closely as possible (either the one created at the post-production stage or the one from which the signals were recorded).
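As a rough illustration of first-order encoding, the sketch below pans a mono signal into W, X, Y and Z using the omnidirectional and figure-of-eight patterns described above. The function name and the unscaled W channel are assumptions made here; practical B-format conventions differ in normalization (W is often attenuated) and in the sign convention for azimuth.

```python
import math

def encode_first_order(samples, azimuth_deg, elevation_deg):
    """Encode a mono signal as first-order Ambisonics (W, X, Y, Z).

    Assumed convention: azimuth is measured from the X axis towards the
    Y axis, elevation from the horizontal plane towards Z, W is unscaled.
    """
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    gx = math.cos(az) * math.cos(el)  # figure-of-eight along X
    gy = math.sin(az) * math.cos(el)  # figure-of-eight along Y
    gz = math.sin(el)                 # figure-of-eight along Z
    w = [s for s in samples]          # omnidirectional component
    x = [gx * s for s in samples]
    y = [gy * s for s in samples]
    z = [gz * s for s in samples]
    return w, x, y, z
```

Dropping the returned `z` list gives the mixed-order (W, X, Y) variant mentioned in the text.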
Besides independence from the exhibition system, further advantages of this technology are the high degree of manipulation it allows (mainly rotation and zoom of the soundscape) and its capability of faithfully reproducing reverberant fields.
However, Ambisonics technology has two main disadvantages: the inability to reproduce narrow sound sources, and the small size of the sweet spot. The terms narrow and spread are used in this context to denote the angular width of the perceived sound image. The first problem arises because, even when trying to reproduce a very narrow sound source, Ambisonics decoding drives more loudspeakers than just those near the intended position of the source. The second problem arises because, although at the sweet spot the waves coming from each loudspeaker add in phase to create the desired acoustic field, outside the sweet spot the waves do not interfere with the correct phase. This changes the coloration of the sound and, more importantly, the sound tends to be perceived as coming from the loudspeaker closest to the listener, owing to the well-known psychoacoustic precedence effect. For a fixed listening-room size, the only way to reduce both problems is to increase the Ambisonics order used, but this implies a rapid growth in the number of channels and loudspeakers.
It is worth mentioning that another technology exists that can exactly reproduce an arbitrary sound field, the so-called Wave Field Synthesis (WFS). However, this technology requires the loudspeakers to be separated by less than 15-20 cm, which requires additional approximations (and thus a loss of quality) and greatly increases the number of loudspeakers required; existing installations use between 100 and 500 loudspeakers, which limits its applicability to very high-end events.
It is desirable to provide a technology capable of delivering spatial audio content that can be distributed independently of the exhibition setup, be it two-dimensional or three-dimensional; that, once the setup is specified, can be decoded to exploit its full capabilities; that can reproduce all types of acoustic fields (narrow sources, reverberant or diffuse fields) for all listeners in the space, that is, with a large sweet spot; and that does not require a large number of loudspeakers. This would make it possible to create future-proof content, in the sense that it adapts easily to all existing and future multi-loudspeaker setups, and would allow cinemas or home users to choose the multi-loudspeaker setup that best fits their needs and purposes, with the assurance that there will be plenty of content able to exploit the full capabilities of their chosen setup.
A method and apparatus for encoding audio with spatial information in a manner independent of the exhibition setup, and for decoding and optimally playing it back on any given exhibition setup, including setups with loudspeakers at different heights, and headphones.
The invention is based on a method for encoding given input audio material into an exhibition-independent format by distributing it into two groups: the first group contains the audio that requires highly directional localization; the second group contains the audio for which the localization provided by low-order Ambisonics technology suffices.
All audio in the first group is encoded as a set of separate mono tracks with associated metadata. The number of separate mono tracks is unlimited, although some embodiments may impose restrictions, as described below. The metadata must contain information about the exact moment at which each audio track is to be played, as well as spatial information describing at least the direction of origin of the signal at every moment. All audio in the second group is encoded as a set of audio tracks representing Ambisonics signals of a given order. Ideally there is a single set of Ambisonics channels, although more than one can be used in certain embodiments.
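One possible in-memory shape for a track of the first group is sketched below. The patent does not prescribe a data layout, so the class name `NarrowTrack`, its field names, the keyframe representation of the direction, and the default sample rate are all hypothetical.

```python
from bisect import bisect_right
from dataclasses import dataclass, field

@dataclass
class NarrowTrack:
    """Hypothetical container for one mono track of the first group."""
    samples: list              # mono audio samples
    start_time: float          # seconds: when playback of the track begins
    sample_rate: int = 48000   # assumed default, not from the source
    # Keyframes (offset in seconds from start, azimuth deg, elevation deg),
    # sorted by offset; the direction is held until the next keyframe.
    directions: list = field(default_factory=lambda: [(0.0, 0.0, 0.0)])

    def duration(self):
        """Length of the track in seconds."""
        return len(self.samples) / self.sample_rate

    def direction_at(self, t):
        """Direction (azimuth, elevation) at absolute time t in seconds."""
        offsets = [d[0] for d in self.directions]
        i = max(bisect_right(offsets, t - self.start_time) - 1, 0)
        return self.directions[i][1:]
```

A three-second effect starting at 12 s that jumps from -30 to 30 degrees halfway through would be `NarrowTrack([0.0] * 144000, 12.0, directions=[(0.0, -30.0, 0.0), (1.5, 30.0, 0.0)])`.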
At reproduction, once the exhibition system is known, the first group of audio tracks is decoded for playback using standard panning algorithms that use a small number of loudspeakers close to the intended position of the audio source. The second group of audio channels is decoded for playback using Ambisonics decoders optimized for the given exhibition system.
This method and apparatus solve the problems described above, as explained next.
First, they allow the recording, post-production and distribution stages of typical productions to be carried out independently of the setups on which the content will be shown. One consequence is that content produced this way is future-proof, in the sense that it can be adapted easily to any arbitrary multi-loudspeaker setup, existing or yet to be created. Ambisonics technology also has this property.
Second, it becomes possible to reproduce very narrow sources correctly. These are encoded as individual audio tracks with associated directional metadata, which allows decoding algorithms that use only a few loudspeakers around the intended location of the audio source, such as two-dimensional or three-dimensional vector-based amplitude panning. By contrast, Ambisonics requires a very high order to achieve similar results, with the corresponding increase in the number of tracks, the amount of data and the decoding complexity.
Third, the method and apparatus are able, in most situations, to provide a large sweet spot, thus enlarging the region of optimal sound field reconstruction. This is achieved by placing in the first group of audio tracks all the parts of the audio that would otherwise shrink the sweet spot. For example, in the embodiment illustrated in Fig.8 and described below, the direct sound of a dialogue is encoded as a separate audio track with information about its direction of arrival, whereas the reverberant part is encoded as a set of first-order Ambisonics tracks. Thus most of the audience perceives the direct sound of the source as coming from the correct location, mainly from the few loudspeakers in the intended direction; coloration and precedence effects are thereby eliminated from the direct sound, which anchors the sound image at its correct location.
Fourth, in most cases of audio encoding for multi-loudspeaker setups, the amount of data is reduced compared with the one-track-per-channel paradigm and with higher-order Ambisonics encoding. This is advantageous for storage and distribution. There are two reasons for the reduction. On the one hand, assigning highly directional audio to the narrow-audio playlist makes first-order Ambisonics sufficient for reconstructing the remaining part of the soundscape, which consists of spread, diffuse or weakly directional audio. Thus the 4 tracks of the first-order Ambisonics group suffice. By contrast, correct reconstruction of narrow sources would require, for example, 16 channels for third order or 25 for fourth order. On the other hand, the number of narrow sources that must be played simultaneously is small in many cases; this is the case, for example, in cinema, where the narrow-audio playlist contains only the dialogues and some special effects. Moreover, all audio in the narrow-audio playlist group is a collection of individual tracks whose length matches only the duration of the corresponding audio source. For example, the audio of a car that appears in one scene for three seconds lasts only three seconds. Thus, in a cinema application where the soundtrack of a film is to be produced for a 22.2 setup, the one-track-per-channel paradigm would require 24 audio tracks, and third-order Ambisonics encoding would require 16 audio tracks. By contrast, the proposed exhibition-independent format would require only 4 full-length audio tracks, plus a set of separate audio tracks of different lengths, kept as short as possible so that they cover only the intended duration of the narrow audio sources.
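The track counts quoted above follow from the number of signals of a full Ambisonics set, (order+1)^2. The short sketch below reproduces that arithmetic and compares the total amount of audio stored under each approach; the film length and clip durations in the usage note are hypothetical values chosen only for illustration, and the helper names are not from the source.

```python
def ambisonics_channels(order):
    """Number of signals in a full (non-mixed) Ambisonics set of this order."""
    return (order + 1) ** 2

def track_seconds(full_length_tracks, film_seconds, clip_seconds=()):
    """Total seconds of audio stored: full-length tracks plus short clips."""
    return full_length_tracks * film_seconds + sum(clip_seconds)
```

For a hypothetical 6000-second film, `track_seconds(24, 6000)` (one track per channel on a 22.2 setup) and `track_seconds(ambisonics_channels(3), 6000)` (third-order Ambisonics) can be compared with `track_seconds(ambisonics_channels(1), 6000, [3, 120, 45])`, where the three clip lengths stand in for narrow-audio tracks.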
Brief description of the drawings
Fig.1 shows an embodiment of the method in which, given a set of initial audio tracks, they are selected and encoded, and finally decoded for optimal playback on an arbitrary exhibition setup.
Fig.2 shows a diagram of the proposed exhibition-independent format, with its two groups of audio: the narrow-audio playlist with spatial information, and the Ambisonics tracks.
Fig.3 shows a decoder that uses different algorithms to process each of the audio groups.
Fig.4 shows an embodiment of the method in which the two groups of audio can be re-encoded.
Fig.5 shows an embodiment in which the exhibition-independent format is based on audio streams instead of complete audio files stored on disk or other kinds of memory.
Fig.6 shows a further embodiment of the method, in which the exhibition-independent format is fed to a decoder that can reproduce the content on any exhibition setup.
Fig.7 shows some technical details of the rotation process, which corresponds to simple operations on both groups of audio.
Fig.8 shows an embodiment of the method in an audiovisual post-production environment.
Fig.9 shows a further embodiment, as part of audio production and post-production in a virtual scene (for example, in an animated film or a three-dimensional game).
Fig.10 shows a further embodiment of the method, as part of a digital cinema server.
Fig.11 shows an alternative embodiment of the method for cinema, in which the content can be decoded prior to distribution.
Detailed description of the preferred embodiments
Fig.1 shows an embodiment of the method in which, given a set of initial audio tracks, they are selected and encoded, and finally decoded for optimal playback on an arbitrary exhibition setup. That is, for a given loudspeaker layout, the spatial sound field is reconstructed as well as possible, adapted to the available loudspeakers, and with the sweet spot enlarged as much as possible. The initial audio can come from any source, for example: from any type of microphone with any directivity pattern or frequency response; from Ambisonics microphones capable of delivering Ambisonics signals of any order or mixed order; or from synthesized audio or effects such as room reverberation.
The selection and encoding process consists of generating two groups of tracks from the initial audio. The first group consists of those parts of the audio that require narrow localization, whereas the second group consists of the remaining audio, for which the directionality of a given Ambisonics order suffices. The audio signals assigned to the first group are kept as mono tracks, together with spatial metadata about their direction of origin over time and their initial playback time.
Selection is a user-driven process, although default actions can be taken for some types of initial audio. In the general case (that is, for non-Ambisonics audio tracks), the user defines, for each piece of initial audio, its source direction and its source type: narrow source or Ambisonics source, corresponding to the encoding groups described above. The direction angles can be defined, for example, by the azimuth and elevation of the source relative to the listener, and can be specified either as fixed values per track or as time-varying data. If no direction is specified for some tracks, a default assignment can be made, for example by assigning such tracks a fixed constant direction.
Optionally, the direction angles can be accompanied by a spread parameter. The terms spread and narrow are to be understood in this context as the angular width of the perceived sound image of the source. For example, spread can be quantified by values in the interval [0, 1], where 0 describes perfectly directional sound (that is, sound coming from a single well-defined direction) and 1 describes sound arriving from all directions with equal energy.
Default actions can be defined for some types of initial tracks. For example, tracks identified as a stereo pair can be assigned to the Ambisonics group with azimuths of -30 and 30 degrees for the left and right channels, respectively. Tracks identified as Surround 5.1 (ITU-R BS.775-1) can similarly be assigned the azimuths -30, 0, 30, -110 and 110 degrees. Finally, tracks identified as first-order Ambisonics (or B-format) can be assigned to the Ambisonics group without requiring further direction information.
The encoding process of Fig.1 takes the user-defined information described above and outputs an exhibition-independent audio format with spatial information, as described in Fig.2. For the first group, the output of the encoding process is a set of mono tracks with audio signals corresponding to different sound sources, with associated spatial metadata that includes the direction of origin with respect to a given reference system, or the spread properties of the audio. For the second group of audio, the output of the conversion process is one single set of Ambisonics tracks of the chosen order (for example, 4 tracks if first-order Ambisonics is chosen), which corresponds to the mix of all sources in the Ambisonics group.
The output of the encoding process is then used by a decoder, which uses information about the chosen exhibition setup to produce one audio track or audio stream per channel of the setup.
Fig.3 shows a decoder that uses different algorithms to process each group of audio. The group of Ambisonics tracks is decoded using Ambisonics decoders suited to the particular setup. The tracks in the narrow-audio playlist are decoded using algorithms suited to that purpose; these use the spatial metadata of each track to decode it, normally with a very small number of loudspeakers around the intended location of each track. One example of such an algorithm is vector-based amplitude panning. The time metadata is used to start the playback of each audio track at the correct moment. Finally, the decoded channels are sent for playback to the loudspeakers or headphones.
Fig.4 shows a further embodiment of the method, in which the two groups of audio can be re-encoded. In general, the re-encoding process takes as input a narrow-audio playlist containing N different audio tracks with associated directional metadata, and a set of Ambisonics tracks of a given order P and a given type of mixture A (for example, it could contain all tracks of zeroth and first order, but only two tracks corresponding to second-order signals). The output of the re-encoding process is a narrow-audio playlist containing M different audio tracks with associated directional metadata, and a set of Ambisonics tracks of order Q with a given type of mixture B. In the re-encoding process, M, Q and B can differ from N, P and A, respectively.
Re-encoding can be used, for example, to reduce the amount of data. This can be achieved by selecting one or more audio tracks contained in the narrow-audio playlist and reassigning them to the Ambisonics group, using the directional information associated with each mono track to convert it from mono to Ambisonics. In this case it becomes possible to obtain M<N, at the cost of using Ambisonics localization for the re-encoded narrow audio. With the same aim, the number of Ambisonics tracks can be reduced, for example by keeping only those required for playback on planar exhibition setups. Whereas the number of Ambisonics signals for a given order P is (P+1)^2, the reduction to planar setups lowers this number to 1+2P.
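The two counts can be checked by enumerating the spherical-harmonic components. The sketch below assumes that a planar reduction keeps the components whose index m equals plus or minus the degree l, a common choice that yields 1+2P signals; the source does not state which components are kept, and the enumeration order used here (by degree, then by m) is likewise an assumption.

```python
def ambisonics_components(order):
    """All (degree l, index m) pairs of a full set up to the given order."""
    return [(l, m) for l in range(order + 1) for m in range(-l, l + 1)]

def planar_components(order):
    """Components kept for horizontal-only playback (assumed: |m| == l)."""
    return [(l, m) for (l, m) in ambisonics_components(order) if abs(m) == l]
```

For first order, `planar_components(1)` drops only the component (1, 0), which corresponds to removing the Z signal and keeping W, X and Y.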
Another application of the re-encoding process is to reduce the number of simultaneous audio tracks required by a given narrow-audio playlist. For example, in broadcasting applications it may be desirable to limit the number of audio tracks that can be played simultaneously. Again, this can be achieved by reassigning some tracks from the narrow-audio playlist to the Ambisonics group.
Optionally, the narrow-audio playlist can contain metadata describing the relevance of the audio it contains, that is, a description of how important it is for each audio track to be decoded using algorithms for narrow sources. This metadata can be used to automatically assign the least relevant audio to the Ambisonics group.
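A relevance-driven reassignment could look like the greedy sketch below: tracks are considered from most to least relevant, and a track stays in the narrow-audio playlist only if keeping it does not push the number of simultaneous tracks above the limit. The dictionary keys (`start`, `end`, `relevance`) and the greedy strategy are illustrative assumptions, not the patent's procedure.

```python
def peak_overlap(intervals):
    """Largest number of (start, end) intervals active at the same time."""
    events = []
    for start, end in intervals:
        events.append((start, 1))
        events.append((end, -1))
    active = peak = 0
    # At equal times, ends (-1) sort before starts (+1), so tracks that
    # merely touch are not counted as simultaneous.
    for _, delta in sorted(events):
        active += delta
        peak = max(peak, active)
    return peak

def limit_simultaneous(tracks, limit):
    """Split tracks into (kept in playlist, reassigned to Ambisonics)."""
    kept, reassigned = [], []
    for t in sorted(tracks, key=lambda t: t["relevance"], reverse=True):
        spans = [(c["start"], c["end"]) for c in kept + [t]]
        if peak_overlap(spans) <= limit:
            kept.append(t)
        else:
            reassigned.append(t)
    return kept, reassigned
```

The reassigned tracks would then be converted from mono to Ambisonics using their directional metadata, as described for the re-encoding process.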
Another use of the re-encoding process is simply to let the user assign audio in the narrow-audio playlist to the Ambisonics group, or change the order and mixture type of the Ambisonics group, for aesthetic purposes. It is also possible to assign audio from the Ambisonics group to the narrow-audio playlist: one possibility is to select part of the zeroth-order track and assign its spatial metadata manually; another is to use algorithms that estimate the location of the source from the Ambisonics tracks, such as the DirAC algorithm.
Fig.5 shows a further embodiment of the present invention, in which the proposed exhibition-independent format is based on audio streams instead of complete audio files stored on disk or other kinds of memory. In broadcasting scenarios the bandwidth available for audio is limited and fixed, and so is the number of audio channels that can be streamed simultaneously. The proposed method consists, first, in splitting the available audio streams between the two groups, narrow-audio streams and Ambisonics streams, and, second, in re-encoding the intermediate file-based exhibition-independent format to the limited number of streams.
This re-encoding uses the techniques described in the previous paragraphs to reduce, where needed, the number of simultaneous tracks, both for the narrow-audio part (by reassigning low-relevance tracks to the Ambisonics group) and for the Ambisonics part (by removing Ambisonics components).
Audio streaming has further specifics, such as the need to concatenate the narrow-audio tracks into continuous streams, and the need to re-encode the directional metadata of the narrow audio within the available streaming facilities. If the audio streaming format does not allow such directional metadata to be carried, one audio track is reserved to carry this metadata, suitably encoded.
The following simple example explains this in more detail. Consider a film soundtrack in the proposed exhibition-independent format, using first-order Ambisonics (4 channels) and a narrow-audio playlist with a maximum of 4 simultaneous channels. This soundtrack is to be streamed using only 6 channels of digital TV. As shown in Fig.5, the re-encoding uses 3 Ambisonics channels (removing the Z channel) and 2 narrow-audio channels (that is, reassigning a maximum of two simultaneous tracks to the Ambisonics group).
Optionally, the proposed exhibition-independent format can make use of audio data compression. This applies to both flavours of the proposed format: file-based and stream-based. When psychoacoustically based lossy formats are used, the compression may affect the quality of the spatial reconstruction.
Fig.6 shows a further embodiment of the method, in which the exhibition-independent format is fed to a decoder able to reproduce the content on any exhibition setup. The exhibition setup can be specified in several different ways. The decoder can have standard presets, such as Surround 5.1 (ITU-R BS.775-1), from which the user simply selects the one matching the exhibition setup. This selection can optionally allow some adjustment to fine-tune the loudspeaker positions to the user's specific configuration. Optionally, the user might use some auto-detection system able to localize the position of each loudspeaker, for example by means of audio, ultrasound or infrared technology. The exhibition setup specification can be reconfigured any number of times, allowing the user to adapt to any present or future exhibition setup. The decoder can have multiple outputs, so that different decoding processes can run at the same time for simultaneous playback on different setups. Ideally, the decoding is performed before any possible adjustment of the playback system.
If the playback system is headphones, decoding is done by means of standard binaural techniques. Using one or several databases of head-related transfer functions (HRTF), spatialized sound can be produced with algorithms adapted to both groups of audio proposed in this method: the narrow-audio playlist and the Ambisonics tracks. This is usually done by first decoding to a virtual multi-loudspeaker setup using the algorithms described above, and then convolving each channel with the HRTF corresponding to the location of the virtual loudspeaker.
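The virtual-loudspeaker approach can be sketched as follows, assuming the decoder has already produced one feed per virtual loudspeaker and that a pair of time-domain impulse responses (left ear, right ear) is available for each loudspeaker position. The function names are hypothetical, all feeds and all impulse responses are assumed to have equal lengths, and a direct-form convolution is used only to keep the example dependency-free.

```python
def convolve(x, h):
    """Direct-form linear convolution of two sample lists."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y

def binauralize(speaker_feeds, impulse_responses):
    """Mix virtual loudspeaker feeds down to a (left, right) headphone pair.

    speaker_feeds: one list of samples per virtual loudspeaker.
    impulse_responses: one (left_ir, right_ir) pair per virtual loudspeaker.
    """
    n = len(speaker_feeds[0]) + len(impulse_responses[0][0]) - 1
    left, right = [0.0] * n, [0.0] * n
    for feed, (ir_left, ir_right) in zip(speaker_feeds, impulse_responses):
        for k, v in enumerate(convolve(feed, ir_left)):
            left[k] += v
        for k, v in enumerate(convolve(feed, ir_right)):
            right[k] += v
    return left, right
```

In practice the impulse responses would come from a measured HRTF database, and a fast (FFT-based) convolution would normally replace the double loop.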
A further embodiment of the method allows a final rotation of the whole soundscape at the exhibition stage, both for playback on multi-loudspeaker setups and on headphones. This can be useful in several ways. In one application, a user wearing headphones can have a head-tracking mechanism that measures the orientation of the head so as to rotate the whole soundscape accordingly.
Fig.7 shows some technical details of the rotation process, which corresponds to simple operations on both groups of audio. The Ambisonics tracks are rotated by applying different rotation matrices to each Ambisonics order. This is a well-known procedure. The spatial metadata associated with each track of the narrow-audio playlist, in turn, can be modified by simply computing the source azimuth and elevation from which a listener with the given orientation would perceive the sound. Again, this is a simple standard computation.
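For the simplest case, a rotation about the vertical axis applied to first-order signals, both operations reduce to a few lines. The sketch below is limited to that case and uses hypothetical function names; higher orders need larger rotation matrices, and rotations about other axes also change the elevation metadata.

```python
import math

def rotate_first_order(w, x, y, z, yaw_deg):
    """Rotate first-order Ambisonics signals about the vertical axis.

    W and Z are unaffected; X and Y are mixed by a 2x2 rotation matrix.
    """
    c = math.cos(math.radians(yaw_deg))
    s = math.sin(math.radians(yaw_deg))
    x_rot = [c * xi - s * yi for xi, yi in zip(x, y)]
    y_rot = [s * xi + c * yi for xi, yi in zip(x, y)]
    return w, x_rot, y_rot, z

def rotate_metadata(azimuth_deg, elevation_deg, yaw_deg):
    """Apply the same rotation to the direction metadata of a narrow track."""
    azimuth = (azimuth_deg + yaw_deg + 180.0) % 360.0 - 180.0  # wrap to [-180, 180)
    return azimuth, elevation_deg
```

With head tracking, the scene would typically be rotated by the opposite of the measured head yaw, so that sources stay fixed in the room while the listener turns.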
Fig.8 shows an embodiment of the method in an audiovisual post-production environment. The user has all the audio content in the post-production software, which can be a digital audio workstation. The user specifies the direction of each source that requires localization, using either standard or dedicated plug-ins. To generate the proposed intermediate exhibition-independent format, the user selects the audio to be encoded in the narrow-audio playlist and the audio to be encoded in the Ambisonics group. This assignment can be made in different ways. In one embodiment, the user assigns, by means of a plug-in, a directionality coefficient to each audio source; this is then used to automatically assign all sources with a directionality coefficient above a given value to the narrow-audio playlist, and the rest to the Ambisonics group. In another embodiment, some default assignments are made by the software; for example, the reverberant part of all audio, as well as all audio originally recorded with Ambisonics microphones, can be assigned to the Ambisonics group unless stated otherwise by the user. Alternatively, all assignments can be made manually.
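The threshold-based assignment described in the first of these embodiments amounts to a simple partition. In the sketch below, the function name, the source names and the coefficient scale are hypothetical; the source only states that sources above a given value go to the narrow-audio playlist.

```python
def assign_groups(directionality, threshold):
    """Split sources by directionality coefficient.

    directionality: mapping from source name to its coefficient.
    Returns (narrow-audio playlist names, Ambisonics group names).
    """
    narrow = [name for name, d in directionality.items() if d > threshold]
    ambisonics = [name for name, d in directionality.items() if d <= threshold]
    return narrow, ambisonics
```

For example, `assign_groups({"dialogue": 0.9, "car": 0.7, "room reverb": 0.1}, 0.5)` places the dialogue and the car in the narrow-audio playlist and the reverberation in the Ambisonics group.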
When the assignment is completed, the software uses a special module for generating a playlist narrow and audio tracks ambiophonic. In this procedure encode metadata about spatial properties of the playlist targeted audio. Similarly, the direction, and, optionally, separation of audio sources that are assigned to the group ambiophonic, used for the transformation of the mono or stereo in amyotonia, by applying standard algorithms. Thus, the stage of post production audio is an intermediate format that is not dependent on demonstrations, playlist targeted audio and a set of channels ambiophonic this order and mixing.
In this embodiment it may be useful to generate more than one set of Ambisonics channels to make it easier to create other versions. For example, if versions of the same film are produced in different languages, it can be useful to encode all audio related to the dialogue, including the reverberant part of the dialogue, in a second set of Ambisonics tracks. With this approach, the only changes needed to produce a version in another language are to replace the dry dialogue contained in the narrow-audio playlist and the reverberant part of the dialogue contained in the second set of Ambisonics tracks.
Fig. 9 shows a further embodiment of the method, as part of audio production and post-production in a virtual scene (for example, an animated film or a 3D game). In a virtual scene, information about the location and orientation of the sound sources and of the listener is available. Information about the three-dimensional geometry of the scene and the materials present in it may also be available. Reverberation can optionally be computed automatically by simulating the room acoustics. In this context, encoding the sound scene into the exhibition-independent intermediate format can be simplified. On the one hand, audio tracks can be assigned to each source, and their position relative to the listener at each moment can be encoded by deriving it automatically from their respective locations and orientations, instead of specifying it later in post-production. On the other hand, one can decide how much reverberation to encode in the Ambisonics group by assigning the direct sound of each source, together with a certain number of early reflections, to the narrow-audio playlist, and the remaining part of the reverberation to the Ambisonics group.
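Deriving the direction metadata automatically from scene coordinates is a short geometric calculation. The sketch below assumes a right-handed coordinate system with x pointing forward, y to the left and z up, and a listener orientation reduced to a single yaw angle. Both are simplifying assumptions for the example, and the function name is hypothetical.

```python
import math

def direction_from_scene(source_pos, listener_pos, listener_yaw_deg=0.0):
    """Return (azimuth_deg, elevation_deg, distance) of a source as seen by the listener."""
    dx, dy, dz = (s - l for s, l in zip(source_pos, listener_pos))
    distance = math.sqrt(dx * dx + dy * dy + dz * dz)
    if distance == 0.0:
        raise ValueError("source and listener coincide; direction is undefined")
    # Azimuth in the horizontal plane, made relative to where the listener faces.
    azimuth = math.degrees(math.atan2(dy, dx)) - listener_yaw_deg
    azimuth = (azimuth + 180.0) % 360.0 - 180.0  # wrap to [-180, 180)
    elevation = math.degrees(math.atan2(dz, math.hypot(dx, dy)))
    return azimuth, elevation, distance

# Evaluated once per frame, this yields time-varying metadata for a moving source.
azimuth, elevation, distance = direction_from_scene((1.0, 1.0, 0.0), (0.0, 0.0, 0.0))
```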
Fig. 10 shows a further embodiment of the method, as part of a digital cinema server. In this case, the same audio content can be distributed to cinemas in the described exhibition-independent format, consisting of the narrow-audio playlist plus a set of Ambisonics tracks. Each cinema can be equipped with a decoder holding the specification of its particular multi-loudspeaker setup, which can be entered manually or obtained by some form of auto-detection. In particular, automatic detection of the setup can easily be integrated into a system that, at the same time, computes the equalization required for each loudspeaker. This step may consist of measuring the impulse response of each loudspeaker in the given cinema, in order to calculate both the loudspeaker position and the inverse filter needed for its equalization. Measuring the impulse response, which can be done with various existing methods (such as sine sweeps or MLS sequences), and the corresponding calculation of the loudspeaker positions form a procedure that need not be performed often, but only when the characteristics of the room or of the setup change. In any case, once the decoder has the setup specification, the content can be optimally decoded into a one-track-per-channel format, ready for playback.
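As a partial illustration of what can be read from a measured impulse response, the sketch below estimates the distance of a loudspeaker from the arrival time of the direct sound and derives delays that time-align all loudspeakers. It is a simplified stand-in, not the procedure of the patent: it assumes the measurement chain adds no latency, that the strongest peak is the direct sound, and a speed of sound of 343 m/s. A full position estimate would need several microphone positions, and the inverse filter is not covered here.

```python
SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at room temperature

def distance_from_impulse_response(ir, sample_rate):
    """Estimate loudspeaker distance (m) from the strongest peak of its impulse response."""
    peak_index = max(range(len(ir)), key=lambda i: abs(ir[i]))
    return peak_index / sample_rate * SPEED_OF_SOUND

def alignment_delays(distances):
    """Delay (s) to add to each loudspeaker so that all arrivals coincide."""
    farthest = max(distances)
    return [(farthest - d) / SPEED_OF_SOUND for d in distances]

# Synthetic response: direct sound after 480 samples at 48 kHz, i.e. after 10 ms.
ir = [0.0] * 480 + [1.0] + [0.2] * 10
distance = distance_from_impulse_response(ir, 48000)  # about 3.43 m
```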
Fig. 11 shows an alternative embodiment of the method for cinema, in which the content can be decoded before distribution. In this case, the decoder needs the specification of each cinema setup, so that it can generate multiple one-track-per-channel versions, which are then distributed. This application is useful, for example, for delivering content to cinemas that are not equipped with a decoder compatible with the exhibition-independent format proposed here. It can also be useful for checking or certifying the quality of the audio adapted to a particular setup before distribution.
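Decoding ahead of distribution amounts to running the decoder once per known setup. The sketch below shows this loop for the Ambisonics group only, using a very basic first-order decoder that points a virtual cardioid microphone at each loudspeaker. This simple decoder is substituted here for illustration and is not the optimised decoding the text refers to. The mono tracks of the playlist would need a separate panning step that is not shown. The setup names, the unweighted W channel and the counter-clockwise azimuth convention are assumptions.

```python
import math

def decode_foa(w, x, y, z, speakers):
    """Return one list of samples per loudspeaker; speakers is [(azimuth_deg, elevation_deg), ...]."""
    feeds = []
    for az_deg, el_deg in speakers:
        az, el = math.radians(az_deg), math.radians(el_deg)
        gx = math.cos(az) * math.cos(el)
        gy = math.sin(az) * math.cos(el)
        gz = math.sin(el)
        # Virtual cardioid aimed at the loudspeaker direction.
        feeds.append([0.5 * (wi + gx * xi + gy * yi + gz * zi)
                      for wi, xi, yi, zi in zip(w, x, y, z)])
    return feeds

def decode_for_setups(w, x, y, z, setups):
    """Produce one track-per-channel version for every named loudspeaker setup."""
    return {name: decode_foa(w, x, y, z, spk) for name, spk in setups.items()}

setups = {
    "stereo": [(30.0, 0.0), (-30.0, 0.0)],
    "quad": [(45.0, 0.0), (135.0, 0.0), (-135.0, 0.0), (-45.0, 0.0)],
}
# One sample of a source encoded straight ahead: W = 1, X = 1, Y = 0, Z = 0.
versions = decode_for_setups([1.0], [1.0], [0.0], [0.0], setups)
```

Each entry of `versions` holds as many channels as the corresponding setup has loudspeakers and could be stored as an ordinary multichannel track.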
In a further embodiment of the method, parts of the narrow-audio playlist can be re-edited without going back to the original master project. For example, some of the metadata describing the position of the sources or their spread can be modified.
Although the foregoing has been shown and described with reference to particular embodiments of the invention, those skilled in the art will understand that various other changes in form and detail may be made without departing from the scope and spirit of the invention. It should be understood that various changes may be made in adapting to different embodiments without departing from the broader concepts disclosed herein and covered by the appended claims.
1. A method for encoding audio signals and related spatial information in a format that is independent of the reproduction layout, the method comprising:
a. assigning a first set of audio signals to a first group, and encoding the first group as a set of mono audio tracks with associated metadata describing the direction of origin of the signal of each track relative to the recording position, and its initial playback time;
b. assigning a second set of audio signals to a second group, and encoding the second group as at least one set of Ambisonics tracks of a given order and mixture of orders; and
c. generating two groups of tracks containing the first and second sets of audio signals.
2. The method according to claim 1, further comprising encoding spread parameters associated with the tracks in the set of mono audio tracks.
3. The method according to claim 1, further comprising encoding additional directional parameters associated with the tracks in the set of mono audio tracks.
4. The method according to claim 1, further comprising deriving the direction of origin of the signals for the tracks in the first set from any three-dimensional representation of the scene containing the sound sources associated with the tracks, and the recording position.
5. The method according to claim 1, further comprising assigning the direction of origin of the signals for the tracks in the first set in accordance with predefined rules.
6. The method according to claim 1, further comprising encoding the directional parameters for each track in the first set either as fixed constant values or as values that change over time.
7. The method according to claim 1, further comprising encoding metadata describing the specification of the Ambisonics format used, for example the Ambisonics order, the type of mixture of orders, the gains of the tracks, and the ordering of the tracks.
8. The method according to claim 1, further comprising encoding the initial playback time associated with the Ambisonics tracks.
9. The method according to claim 1, further comprising encoding input mono signals with associated directional data into Ambisonics tracks of a given order and mixture of orders.
10. The method according to claim 1, further comprising encoding any input multichannel signals into Ambisonics tracks of a given order and mixture of orders.
11. The method according to claim 1, further comprising encoding any input Ambisonics signals, of any order and mixture of orders, into Ambisonics tracks of a given, possibly different, order and mixture of orders.
12. The method according to claim 1, further comprising re-encoding the format that is independent of the reproduction layout, the re-encoding including at least one of the following:
a. assigning tracks from the set of mono tracks to the Ambisonics set;
b. assigning portions of audio from the Ambisonics set to the set of mono tracks, possibly including directional information derived from the Ambisonics signals;
c. changing the order or mixture of orders of the set of Ambisonics tracks;
d. modifying the directional metadata associated with the set of mono tracks;
e. modifying the Ambisonics tracks by operations such as rotation and scaling.
13. The method according to claim 12, further comprising re-encoding the format that is independent of the reproduction layout into a format suitable for broadcasting, the re-encoding satisfying the following constraints: a fixed number of continuous audio streams, and use of available protocols for transmitting the metadata contained in the format that is independent of the reproduction layout.
14. The method according to claim 1, further comprising decoding the format that is independent of the reproduction layout for a given multi-loudspeaker setup, the decoding using the specification of the loudspeaker positions for:
a. decoding the set of mono tracks using algorithms suitable for reproducing narrow sound sources;
b. decoding the set of Ambisonics tracks using algorithms adapted to the order and mixture of orders of the tracks, and to the specified setup.
15. The method according to claim 14, further comprising using the spread parameters, and possibly other spatial metadata associated with the set of mono tracks, in order to use decoding algorithms suitable for the specified spread.
16. The method according to claim 14, further comprising using presets of standard reproduction layouts, for example stereo and surround 5.1 (ITU-R775-1).
17. The method according to claim 14, further comprising decoding to headphones by means of standard binaural technology, using databases of head-related transfer functions.
18. The method according to claim 14, further comprising using rotation control parameters to rotate the complete sound scene, such control parameters being generated, for example, by a head-tracking device.
19. The method according to claim 14, further comprising using technology that automatically derives the positions of the loudspeakers, in order to define the setup specification used by the decoder.
20. The method according to claim 14 or 17, in which the output of the decoding is stored as a set of audio tracks instead of being played back directly.
21. The method according to claim 1, 12, 13, 14 or 17, in which the audio signals, in whole or in part, are encoded in compressed audio formats.
22. An audio encoder for encoding audio signals and related spatial information in a format that is independent of the reproduction layout, the encoder comprising:
a. an encoder for assigning a first set of audio signals to a first group and encoding the first group as a set of mono tracks with directional information and initial playback time;
b. an encoder for assigning a second set of audio signals to a second group and encoding the second group as a set of Ambisonics tracks of any order and mixture of orders; and
c. an encoder for generating two groups of tracks containing the first and second sets of audio signals.
23. An audio re-encoder for re-encoding audio in an input format that is independent of the reproduction layout, the re-encoder being configured to perform at least one of the following:
a. assign tracks from the set of mono tracks to the Ambisonics set;
b. assign portions of audio from the Ambisonics set to the set of mono tracks, including directional information derived from the Ambisonics signals;
c. change the order or mixture of orders of the set of Ambisonics tracks;
d. modify the directional metadata associated with the set of mono tracks;
e. modify the Ambisonics tracks by operations such as rotation and scaling.
24. An audio decoder for decoding a format that is independent of the reproduction layout to a given reproduction system with N channels, the format that is independent of the reproduction layout being generated in accordance with the method according to claim 1, the audio decoder comprising:
a. a decoder for decoding the set of mono tracks, with their directional information and initial playback times, into N audio channels based on the specification of the reproduction setup,
b. a decoder for decoding the set of Ambisonics tracks into N audio channels based on the specification of the reproduction setup,
c. a mixer for mixing the outputs of the two preceding decoders to generate N output channels, ready for playback or storage.
25. A system for encoding and re-encoding spatial audio in a format that is independent of the reproduction layout, and for decoding and playback on any multi-loudspeaker setup or on headphones, the system comprising:
a. an audio encoder for encoding a set of audio signals and related spatial information in a format that is independent of the reproduction layout, as in claim 22,
b. an audio re-encoder and converter for manipulating and re-encoding audio in the input format that is independent of the reproduction layout, as in claim 23,
c. an audio decoder for decoding the format that is independent of the reproduction layout to a given reproduction system, either a multi-loudspeaker setup or headphones, as in claim 24.
26. An audio converter for manipulating audio in an input format that is independent of the reproduction layout, the output being converted in accordance with the method according to claim 12.
FIELD: physics, acoustics.
SUBSTANCE: invention relates to a surround sound system. A multi-channel spatial signal comprising at least one surround channel is received. Ultrasound is emitted towards a surface so as to reach a listening position via reflection from said surface. The ultrasound signal may specifically reach the listening position from the side, from above or from behind a nominal listener. A first drive unit generates a drive signal for the directional ultrasound transducer from the surround channel. The use of an ultrasound transducer for providing the surround sound signal provides an improved spatial experience while allowing the speaker to be located, for example, in front of the user. An ultrasound beam is much narrower and better defined than conventional audio beams and can therefore be better directed to provide the desired reflections. In some scenarios, the ultrasound transducer may be supplemented by an audio range loudspeaker.
EFFECT: high quality of reproducing audio and high efficiency of the surround sound system.
12 cl, 11 dwg
FIELD: physics, acoustics.
SUBSTANCE: binaural rendering of a multi-channel audio signal into a binaural output signal is described. The multi-channel audio signal includes a stereo downmix signal (18) into which a plurality of audio signals are downmixed; and side information includes downmix information (DMG, DCLD), indicating for each audio signal, to what degree the corresponding audio signal was mixed in the first channel and second channel of the stereo downmix signal (18), respectively, as well as object level information of the plurality of audio signals and inter-object cross correlation information, describing similarity between pairs of audio signals of the plurality of audio signals. Based on a first rendering prescription, a preliminary binaural signal (54) is computed from the first and second channels of the stereo downmix signal (18). A decorrelated signal
EFFECT: improved binaural rendering while eliminating restrictions with respect to free generation of a downmix signal from original audio signals.
11 cl, 6 dwg, 3 tbl
FIELD: physics, acoustics.
SUBSTANCE: invention relates to processing signals in an audio frequency band. The apparatus for generating at least one output audio signal representing a superposition of two different audio objects includes a processor for processing an input audio signal to provide an object representation of the input audio signal, where that object representation can be generated by parametrically guided approximation of original objects using an object downmix signal. An object manipulator individually manipulates objects using audio object based metadata relating to the individual audio objects to obtain manipulated audio objects. The manipulated audio objects are mixed using an object mixer for finally obtaining an output audio signal having one or multi-channel signals depending on a specific rendering setup.
EFFECT: providing efficient audio signal transmission rate.
14 cl, 17 dwg
FIELD: radio engineering, communication.
SUBSTANCE: described is a device for generating a binaural signal based on a multi-channel signal representing a plurality of channels and intended for reproduction by a speaker system, wherein each virtual sound source position is associated to each channel. The device includes a correlation reducer for differently converting, and thereby reducing correlation between, at least one of a left and a right channel of the plurality of channels, a front and a rear channel of the plurality of channels, and a centre and a non-centre channel of the plurality of channels, in order to obtain an inter-similarity reduced combination of channels; a plurality of directional filters, a first mixer for mixing output signals of the directional filters modelling the acoustic transmission to the first ear canal of the listener, and a second mixer for mixing output signals of the directional filters modelling the acoustic transmission to the second ear canal of the listener. Also disclosed is an approach where centre level is reduced to form a downmix signal, which is further transmitted to a processor for constructing an acoustic space. Another approach involves generating a set of inter-similarity reduced transfer functions modelling the ear canal of the person.
EFFECT: providing an algorithm for generating a binaural signal which provides stable and natural sound of a record in headphones.
33 cl, 14 dwg
FIELD: information technology.
SUBSTANCE: method comprises estimating a first wave representation comprising a first wave direction measure characterising the direction of a first wave and a first wave field measure being related to the magnitude of the first wave for the first spatial audio stream, having a first audio representation comprising a measure for pressure or magnitude of a first audio signal and a first direction of arrival of sound; estimating a second wave representation comprising a second wave direction characterising the direction of the second wave and a second wave field measure being related to the magnitude of the second wave for the second spatial audio stream, having a second audio representation comprising a measure for pressure or magnitude of a second audio signal and a second direction of arrival of sound; processing the first wave representation and the second wave representation to obtain a merged wave representation comprising a merged wave field measure, a merged direction of arrival measure and a merged diffuseness parameter; processing the first audio representation and the second audio representation to obtain a merged audio representation, and forming a merged audio stream.
EFFECT: high quality of a merged audio stream.
15 cl, 7 dwg
SUBSTANCE: apparatus (100) for generating a multichannel audio signal (142) based on an input audio signal (102) comprises a main signal upmixing means (110), a section (segment) selector (120), a section signal upmixing means (130) and a combiner (140). The main signal upmixing means (110) is configured to provide a main multichannel audio signal (112) based on the input audio signal (102). The section selector (120) is configured to select or not select a section of the input audio signal (102) based on analysis of the input audio signal (102). The selected section of the input audio signal (102), a processed selected section of the input audio signal (102) or a reference signal associated with the selected section of the input audio signal (102) is provided as section signal (122). The section signal upmixing means (130) is configured to provide a section upmix signal (132) based on the section signal (122), and the combiner (140) is configured to overlay the main multichannel audio signal (112) and the section upmix signal (132) to obtain the multichannel audio signal (142).
EFFECT: improved flexibility and sound quality.
12 cl, 10 dwg
FIELD: information technology.
SUBSTANCE: invention relates to a lossless multi-channel audio codec which uses adaptive segmentation with random access point (RAP) and multiple prediction parameter set (MPPS) capability. The lossless audio codec encodes/decodes a lossless variable bit rate (VBR) bit stream with random access point (RAP) capability to initiate lossless decoding at a specified segment within a frame and/or multiple prediction parameter set (MPPS) capability partitioned to mitigate transient effects. This is accomplished with an adaptive segmentation technique that fixes segment start points based on constraints imposed by the existence of a desired RAP and/or detected transient in the frame and selects an optimum segment duration in each frame to reduce encoded frame payload subject to an encoded segment payload constraint. RAP and MPPS are particularly applicable to improve overall performance for longer frame durations.
EFFECT: higher overall encoding efficiency.
48 cl, 23 dwg
SUBSTANCE: method and system for generating output signals for reproduction by two physical speakers in response to input audio signals indicative of sound from multiple source locations including at least two rear locations. Typically, the input signals are indicative of sound from three front locations and two rear locations (left and right surround sources). A virtualiser generates left and right surround output signals suitable for driving front loudspeakers to emit sound that a listener perceives as emitted from rear sources. Typically, the virtualiser generates left and right surround output signals by transforming rear source input signals in accordance with a sound perception simulation function. To ensure that virtual channels are well heard in the presence of other channels, the virtualiser performs dynamic range compression on rear source input signals. The dynamic range compression is preferably performed by amplifying rear source input signals or partially processed versions thereof in a nonlinear way relative to front source input signals.
EFFECT: separating virtual sources while avoiding excessive emphasis of virtual channels.
34 cl, 9 dwg
FIELD: information technologies.
SUBSTANCE: invention discloses the method for reproduction of multiple audio channels, according to which out-of-phase information is extracted from side and/or rear side channels contained in a multi-channel audio signal.
EFFECT: improved reproduction of a multi-channel audio signal.
15 cl, 10 dwg
FIELD: information technologies.
SUBSTANCE: an audio decoder for decoding a multi-object audio signal comprises a module for computing the prediction coefficients of a prediction matrix C based on object level difference (OLD) data, and means for upmixing based on the prediction coefficients to obtain a first upmix audio signal approximating the first-type audio signal and/or a second upmix audio signal approximating the second-type audio signal. The multi-object audio signal comprises encoded audio signals of a first and a second type. The multi-object audio signal consists of a downmix signal (112) and side information. The side information comprises data on the levels of the first- and second-type signals in a first predefined time/frequency resolution.
EFFECT: separation of individual audio objects in mixing and decreasing/increasing channel number.
20 cl, 24 dwg
FIELD: physics, computer engineering.
SUBSTANCE: invention relates to means of updating the processing unit of an encoder or decoder for using modulated transforms having a size greater than a predetermined size. The method includes storing an initial prototype filter characterised by an ordered set of initial size coefficients; providing a step for constructing a prototype filter of a size greater than the initial size to implement the modulated transform of a greater size by inserting at least one coefficient between two consecutive coefficients of the initial prototype filter.
EFFECT: reducing the size of memory required for the encoding-decoding process.
10 cl, 8 dwg
FIELD: radio engineering, communication.
SUBSTANCE: adaptive delta codec includes a source and a receiver of an analogue signal, a digital communication channel, a coder containing a waveform digitizer, a comparator, an inverter, a JK flip-flop, a transmission adaptation circuit including a voltage divider, a transmission operational amplifier, first, second and third resistors and a capacitor, as well as a clock-pulse generator (CPG), and a decoder containing an amplifier, an analogue switch, a low-pass filter, and a reception adaptation circuit including a voltage divider, a reception operational amplifier, first, second and third resistors and a capacitor.
EFFECT: improved transmission quality of a voice signal over digital communication channels at low transmission rates, together with a simpler device structure.
FIELD: physics, acoustics.
SUBSTANCE: invention relates to encoding and decoding a multichannel audio signal. The audio signal decoder is designed to generate a decoded representation of a multichannel audio signal based on the encoded representation of the multichannel audio signal and includes a time warping decoder for reconstructing time warping of multiple audio signals included in the encoded representation of the multichannel audio signal. The audio signal encoder generates an encoded representation of a multichannel acoustic signal and includes a generator of the encoded representation of the audio signal, which in turn selectively generates a representation of the audio signal containing information about the general time warping outline, which cumulatively characterises multiple audio channels of the multichannel acoustic signal, or an encoded representation of the audio signal containing information about individual time warping outlines, separately characterising each of the multiple audio channels, where the choice depends on the similarity or difference between time warping outlines relating to each of the multiple audio channels reflected in the information.
EFFECT: improved characteristics of an encoder/converter for modified discrete cosine transform with time warping, providing an effective bit rate when storing and/or transmitting a multichannel audio signal.
14 cl, 40 dwg
FIELD: physics, acoustics.
SUBSTANCE: invention relates to means of decoding and/or transcoding audio. A first and a second source set of spectral band replication (SBR) parameters are merged into a target set of SBR parameters. The first and second source set comprise a first and second frequency band partitioning, respectively, which are different from one another. The first source set comprises a first set of energy related values associated with frequency bands of the first frequency band partitioning. The second source set comprises a second set of energy related values associated with frequency bands of the second frequency band partitioning. The target set comprises a target set of energy related values associated with an elementary frequency band. The method comprises steps of breaking up the first and the second frequency band partitioning into a joint grid comprising the elementary frequency band; assigning a first value of the first set of energy related values to the elementary frequency band; assigning a second value of the second set of energy related values to the elementary frequency band; and combining the first and second value to yield the target energy related value for the elementary frequency band.
EFFECT: simplifying the process of reducing the number of channels while preserving the relevant high-frequency channel information.
32 cl, 9 dwg
FIELD: radio engineering, communication.
SUBSTANCE: at transmitting side, each video signal code processing channel includes a "code 2n-code 2n-1" converter and each audio code processing channel includes a "sound-code" converter and at the receiving side, each screen matrix element is made from one emitting cell.
EFFECT: reduced capacity of transmitted video and audio codes, introducing digital microphones at the transmitting side and doubling screen resolution at the receiving side.
7 tbl, 16 dwg
FIELD: radio engineering, communication.
SUBSTANCE: invention relates to means of stereo encoding and decoding using complex prediction in the frequency domain. A decoding method for obtaining an output stereo signal from an input stereo signal encoded by complex prediction coding and comprising first frequency-domain representations of two input channels, comprises the upmixing steps of: (i) computing a second frequency-domain representation of a first input channel; and (ii) computing an output channel based on the first and second frequency-domain representations of the first input channel, the first frequency-domain representation of the second input channel and a complex prediction coefficient.
EFFECT: high speed of encoding in the range of high bit transfer rates.
14 cl, 19 dwg, 1 tbl
FIELD: radio engineering, communication.
SUBSTANCE: invention relates to digital broadcasting which provides an audio indicator of link quality. After receiving a digital radio signal using a digital radio receiver, the quality of the received digital radio transmission is determined. Then an audio message from the received digital radio transmission is decoded. Then an audio indicator is superimposed onto the audio message, to form a composite audio signal. Finally, the amplitude of the audio indicator is dynamically adjusted relative to the amplitude of the audio message depending on the quality of the received digital radio transmission.
EFFECT: improved quality of digital radio transmission of audio signals through accurate detection and correction of single-bit errors.
26 cl, 5 dwg
FIELD: physics, acoustics.
SUBSTANCE: invention relates to an information system for delivering different types of information to an end device through acoustic waves. A transmitter capable of generating acoustic waves for transmitting information, which are almost inaudible to the human ear, is required in a medium which enables to transmit information through acoustic waves. The transmitter is a device for converting different types of information into an acoustic wave in a sound spectrum, and transmission, having a microphone for receiving ambient sound at the point from where the acoustic wave is emitted, which serves as the input signal of the ambient sound; a peak frequency detector for determining in the ambient sound signal the peak frequency of the main component of ambient sound; a carrier generator for generating carriers, having a plurality of frequencies equal to the product of the peak frequency and a natural number and can be used to mask ambient sound; and a modulator for modulating the plurality of carriers of the baseband.
EFFECT: transmitting information through acoustic waves.
6 cl, 5 dwg
FIELD: physics, acoustics.
SUBSTANCE: invention relates to audio encoding and decoding technology, particularly to hierarchical audio encoding and decoding and hierarchical audio encoding and decoding for transient signals. The hierarchical audio encoding method comprises performing a transient detection on an audio signal of a current frame; performing a time-frequency transform; quantising and encoding amplitude envelope values of core layer encoding sub-bands and extended layer encoding sub-bands; quantising and encoding core layer frequency-domain coefficients; inversely quantising the frequency-domain coefficients in the core layer which are performed with a vector quantisation; performing a difference calculation with original frequency-domain coefficients to obtain a core layer difference signal; and calculating amplitude envelope quantisation indices of the core layer difference signals; quantising and encoding the extended layer encoding signals; multiplexing and packeting the amplitude envelope encoded bits of the core layer encoding sub-bands and the extended layer encoding sub-bands, the encoded bits of the core layer frequency-domain coefficients and the encoded bits of the extended layer coding signals, and then transmitting to a decoding end.
EFFECT: high quality of hierarchical encoding and decoding.
18 cl, 9 dwg, 11 tbl
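The core layer difference signal mentioned in the abstract above can be illustrated with a short sketch. It makes several assumptions that are not in the source: a uniform scalar quantiser stands in for the vector quantisation named in the abstract, random numbers stand in for the frequency-domain coefficients, the step sizes are arbitrary, and envelope coding, bit allocation and multiplexing are omitted.

```python
import numpy as np

def quantise(values, step):
    """Uniform scalar quantiser: map values to integer indices."""
    return np.round(values / step).astype(int)

def dequantise(indices, step):
    """Inverse quantiser: map integer indices back to values."""
    return indices * step

rng = np.random.default_rng(0)
coeffs = rng.normal(size=32)  # stand-in for frequency-domain coefficients

core_step, ext_step = 0.5, 0.05  # hypothetical coarse and fine step sizes

# Core layer: coarse quantisation, then inverse quantisation.
core_idx = quantise(coeffs, core_step)
core_rec = dequantise(core_idx, core_step)

# Core layer difference signal: original minus core reconstruction.
difference = coeffs - core_rec

# Extended layer: quantise the difference signal with a finer step.
ext_idx = quantise(difference, ext_step)
full_rec = core_rec + dequantise(ext_idx, ext_step)

core_err = np.max(np.abs(coeffs - core_rec))  # error with core layer only
full_err = np.max(np.abs(coeffs - full_rec))  # error with both layers
```

A decoder that receives only `core_idx` reconstructs `core_rec`; one that also receives `ext_idx` reconstructs `full_rec`, whose maximum error is bounded by half the finer step.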
FIELD: physics, acoustics.
SUBSTANCE: invention relates to audio encoding technologies. An audio encoder for encoding an audio signal has a first coding channel for encoding an audio signal using a first coding algorithm. The first coding channel has a first time/frequency converter for converting an input signal into a spectral domain. The audio encoder also has a second coding channel for encoding an audio signal using a second coding algorithm. The first coding algorithm differs from the second coding algorithm. The second coding channel has a domain converter for converting an input signal from an input domain into an output domain audio signal.
EFFECT: improved encoding/decoding of audio signals in low-bitrate schemes.
21 cl, 43 dwg, 10 tbl
FIELD: electric communication, namely systems for transmitting data over digital communication lines.
SUBSTANCE: method comprises the steps of: forming in advance, at both the receiving and transmitting sides, R matrices of allowed vectors, each matrix having dimensions m2 x m1 and consisting of one and zero elements; forming an initial matrix of N x N elements from a one-dimensional analog speech signal; converting the resulting matrix to digital form; forming rectangular matrices with dimensions N x m and m x N, which are a digital representation of the initial matrix, from elements of rows of the allowed vectors; transmitting the elements of these rectangular matrices over a digital communication channel; correcting errors at the transmission side by checking whether groups of elements of the received rectangular matrices match row elements of the previously formed matrices of allowed vectors; and then performing the inverse operations to decompress the speech messages. The method is especially suitable for telephone calls over digital communication systems at rates of 6-16 kbit/s.
EFFECT: ability to correct errors that arise in transmitted digital sequences due to unstable parameters of communication systems, and to carry telephone calls over low-speed digital communication lines.
5 cl, 20 dwg