System and tools for enhanced authoring and rendering of 3D audio data

FIELD: physics.

SUBSTANCE: the invention discloses enhanced tools for authoring and rendering audio reproduction data. Some such tools allow the audio data to be generalized for a wide range of reproduction environments. Audio reproduction data may be authored by creating metadata for audio objects. The metadata may be created with reference to speaker zones. The audio reproduction data may be rendered according to the speaker layout of a particular reproduction environment.

EFFECT: simplified computer processing of 3D sound.

42 cl, 47 dwg

 

CROSS-REFERENCE TO RELATED APPLICATIONS

[0001] This application claims priority to U.S. Provisional Patent Application No. 61/504,005, filed July 1, 2011, and U.S. Provisional Patent Application No. 61/636,102, filed April 20, 2012, both of which are hereby incorporated by reference in their entirety for all purposes.

TECHNICAL FIELD

[0002] This disclosure relates to the authoring and rendering of audio data. In particular, this disclosure relates to the authoring and rendering of audio data for reproduction environments such as cinema sound systems.

BACKGROUND OF THE INVENTION

[0003] Since the introduction of sound on film in 1927, there has been steady development of the technology used to capture the artistic intent of a motion picture soundtrack and to reproduce it in a cinema environment. In the 1930s, synchronized sound on disc gave way to variable-area sound on film, which was further improved in the 1940s with theater acoustic considerations and improved loudspeaker design, along with the early introduction of multi-track recording and steerable playback (using control tones to move sounds). In the 1950s and 1960s, magnetic striping of film allowed multi-channel playback in theaters, introducing surround channels and up to five screen channels in premium theaters.

[0004] In the 1970s, Dolby introduced noise reduction both in post-production and on film, along with a cost-effective means of encoding and distributing mixes with three screen channels and a mono surround channel. Cinema sound quality was further improved in the 1980s with Dolby Spectral Recording (SR) noise reduction and certification programs such as THX. During the 1990s, Dolby brought digital sound to the cinema with a 5.1-channel format that provides discrete left, center and right screen channels, left and right surround arrays, and a subwoofer channel for low-frequency effects. Dolby Surround 7.1, introduced in 2010, increased the number of surround channels by splitting the existing left and right surround channels into four "zones."

[0005] As the number of channels increases and loudspeaker layouts transition from planar two-dimensional (2D) arrays to three-dimensional (3D) arrays including elevation, the tasks of positioning and rendering sounds become increasingly complex. Improved methods of authoring and rendering audio data would be desirable.

SUMMARY OF THE INVENTION

[0006] Some aspects of the subject matter described in this disclosure can be implemented in tools for authoring and rendering audio data. Some such authoring tools allow audio data to be generalized for a wide variety of reproduction environments. According to some such implementations, audio data may be authored by creating metadata for audio objects. The metadata may be created with reference to speaker zones. During the rendering process, the audio data may be reproduced according to the reproduction speaker layout of a particular reproduction environment.

[0007] Some implementations described in this disclosure provide an apparatus that includes an interface system and a logic system. The logic system may be configured to receive, via the interface system, audio data that includes one or more audio objects and associated metadata, as well as reproduction environment data. The reproduction environment data may include an indication of a number of reproduction speakers in the reproduction environment and an indication of the location of each reproduction speaker within the reproduction environment. The logic system may be configured to render the audio objects into one or more speaker feed signals based, at least in part, on the associated metadata and the reproduction environment data, wherein each speaker feed signal corresponds to at least one of the reproduction speakers within the reproduction environment. The logic system may be configured to compute speaker gains corresponding to virtual speaker positions.

[0008] The reproduction environment may, for example, be a cinema sound system environment. The reproduction environment may have a Dolby Surround 5.1 configuration, a Dolby Surround 7.1 configuration, or a Hamasaki 22.2 surround sound configuration. The reproduction environment data may include reproduction speaker layout data indicating reproduction speaker locations. The reproduction environment data may include reproduction speaker zone data indicating reproduction speaker zones and reproduction speaker locations that correspond to the speaker zones.

[0009] The metadata may include information for mapping an audio object position to a single reproduction speaker location. The rendering may involve creating an aggregate gain based on one or more of the following: a desired audio object position, a distance from the desired audio object position to a reference position, a velocity of the audio object, or an audio object content type. The metadata may include data for constraining the audio object position to a one-dimensional curve or a two-dimensional surface. The metadata may include trajectory data for the audio object.

[0010] The rendering may involve imposing speaker zone constraints. For example, the apparatus may include a user input system. According to some implementations, the rendering may involve applying screen-to-room balance control according to screen-to-room balance control data received from the user input system.

[0011] The apparatus may include a display system. The logic system may be configured to control the display system to display a dynamic three-dimensional view of the reproduction environment.

[0012] The rendering may involve controlling the spread of an audio object in one or more of three dimensions. The rendering may involve dynamic reallocation of the audio object in response to speaker overload. The rendering may involve mapping audio object locations to planes of speaker arrays of the reproduction environment.

[0013] The apparatus may include one or more non-transitory storage media, such as memory devices of a memory system. The memory devices may, for example, include random access memory (RAM), read-only memory (ROM), flash memory, or one or more hard disk drives. The interface system may include an interface between the logic system and one or more such memory devices. The interface system may also include a network interface.

[0014] The metadata may include speaker zone constraint metadata. The logic system may be configured to attenuate selected speaker feed signals by performing the following operations: computing first gains that include contributions from the selected speakers; computing second gains that do not include contributions from the selected speakers; and blending the first gains with the second gains. The logic system may be configured to determine whether to apply panning rules for an audio object position or to map the audio object position to a single speaker location. The logic system may be configured to smooth transitions between speaker gains when transitioning the audio object position from a first single speaker location to a second single speaker location. The logic system may be configured to smooth transitions between speaker gains when transitioning between mapping the audio object position to a single speaker location and applying panning rules for the audio object position. The logic system may be configured to compute speaker gains for audio object positions along a one-dimensional curve between virtual speaker positions.
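The gain-blending operation described above can be illustrated with a minimal sketch. This is not the disclosed implementation: the `pan_gains` helper below is a hypothetical, distance-based stand-in for a real panning law, used only to show the compute-with, compute-without, then blend structure.

```python
import math

def pan_gains(position, speakers):
    """Toy panning law (illustrative only): gains inversely
    proportional to distance, power-normalized."""
    weights = [1.0 / (0.001 + math.dist(position, s)) for s in speakers]
    norm = math.sqrt(sum(w * w for w in weights))
    return [w / norm for w in weights]

def constrained_gains(position, speakers, muted, blend=1.0):
    """Blend gains computed with all speakers against gains computed
    without the zone-constrained (muted) speakers.
    blend=1.0 fully excludes the muted speakers; blend=0.0 ignores them."""
    g_all = pan_gains(position, speakers)
    active = [s for i, s in enumerate(speakers) if i not in muted]
    g_partial = iter(pan_gains(position, active))
    # Re-expand the partial gains to full length, zero for muted speakers
    g_excl = [0.0 if i in muted else next(g_partial)
              for i in range(len(speakers))]
    return [(1 - blend) * a + blend * b for a, b in zip(g_all, g_excl)]

speakers = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
gains = constrained_gains((0.5, 0.5), speakers, muted={3})
```

With `blend=1.0` the muted speaker receives zero gain; intermediate values of `blend` provide the smooth transitions mentioned above.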

[0015] Some methods described in this disclosure involve receiving audio data that includes one or more audio objects and associated metadata, and receiving reproduction environment data that includes an indication of a number of reproduction speakers in the reproduction environment. The reproduction environment data may include an indication of the location of each reproduction speaker within the reproduction environment. The methods may involve rendering the audio objects into one or more speaker feed signals based, at least in part, on the associated metadata. Each speaker feed signal may correspond to at least one of the reproduction speakers within the reproduction environment. The reproduction environment may be a cinema sound system environment.

[0016] The rendering may involve creating an aggregate gain based on one or more of the following: a desired audio object position, a distance from the desired audio object position to a reference position, a velocity of the audio object, or an audio object content type. The metadata may include data for constraining the audio object position to a one-dimensional curve or a two-dimensional surface. The rendering may involve imposing speaker zone constraints.

[0017] Some implementations may be embodied in one or more non-transitory media having software stored thereon. The software may include instructions for controlling one or more devices to perform the following operations: receiving audio data that includes one or more audio objects and associated metadata; receiving reproduction environment data that includes an indication of a number of reproduction speakers in the reproduction environment and an indication of the location of each reproduction speaker within the reproduction environment; and rendering the audio objects into one or more speaker feed signals based, at least in part, on the associated metadata. Each speaker feed signal may correspond to at least one of the reproduction speakers within the reproduction environment. The reproduction environment may, for example, be a cinema sound system environment.

[0018] The rendering may involve creating an aggregate gain based on one or more of the following: a desired audio object position, a distance from the desired audio object position to a reference position, a velocity of the audio object, or an audio object content type. The metadata may include data for constraining the audio object position to a one-dimensional curve or a two-dimensional surface. The rendering may involve imposing speaker zone constraints. The rendering may involve dynamic reallocation of the audio object in response to speaker overload.

[0019] Alternative apparatus are also described in this disclosure. Some such apparatus may include an interface system, a user input system, and a logic system. The logic system may be configured to receive audio data via the interface system, to receive an audio object position via the user input system or the interface system, and to determine the audio object position in a three-dimensional space. The determining may involve constraining the position to a one-dimensional curve or a two-dimensional surface within the three-dimensional space. The logic system may be configured to create metadata associated with the audio object based, at least in part, on user input received via the user input system, wherein the metadata includes data indicating the audio object position in the three-dimensional space.

[0020] The metadata may include trajectory data indicating a time-varying position of the audio object within the three-dimensional space. The logic system may be configured to compute the trajectory data according to user input received via the user input system. The trajectory data may include a set of positions within the three-dimensional space at multiple points in time. The trajectory data may include an initial position, velocity data, and acceleration data. The trajectory data may include an initial position and an equation that defines positions in the three-dimensional space and corresponding times.
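One of the trajectory forms described above, an initial position plus velocity and acceleration data, could be evaluated as in the following sketch. The function name and tuple representation are illustrative assumptions, not a format defined by the disclosure; the kinematics follow x = x0 + v·t + a·t²/2.

```python
def trajectory_position(p0, v, a, t):
    """Position of an audio object at time t, given an initial position p0,
    velocity v and acceleration a (each a 3-tuple), per x = x0 + v*t + a*t^2/2."""
    return tuple(p + vi * t + 0.5 * ai * t * t
                 for p, vi, ai in zip(p0, v, a))

# An object starting at the origin, drifting right and upward
pos = trajectory_position(p0=(0.0, 0.0, 0.0),
                          v=(1.0, 0.0, 0.5),
                          a=(0.0, 0.0, 0.0),
                          t=2.0)
# pos == (2.0, 0.0, 1.0)
```

An authoring tool could sample such a function at multiple points in time to produce the set-of-positions trajectory form also mentioned above.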

[0021] The apparatus may include a display system. The logic system may be configured to control the display system to display the audio object trajectory according to the trajectory data.

[0022] The logic system may be configured to create speaker zone constraint metadata according to user input received via the user input system. The speaker zone constraint metadata may include data for disabling selected speakers. The logic system may be configured to create speaker zone constraint metadata by mapping the audio object position to a single speaker.

[0023] The apparatus may include a sound reproduction system. The logic system may be configured to control the sound reproduction system, at least in part, according to the metadata.

[0024] The audio object position may be constrained to a one-dimensional curve. The logic system may be further configured to create virtual speaker positions along the one-dimensional curve.

[0025] Alternative methods are also described in this disclosure. Some such methods involve receiving audio data, receiving an audio object position, and determining the audio object position in a three-dimensional space. The determining may involve constraining the position to a one-dimensional curve or a two-dimensional surface within the three-dimensional space. The methods may involve creating metadata associated with the audio object based, at least in part, on user input.

[0026] The metadata may include data indicating the audio object position in the three-dimensional space. The metadata may include trajectory data indicating a time-varying position of the audio object within the three-dimensional space. Creating the metadata may involve creating speaker zone constraint metadata, e.g., according to user input. The speaker zone constraint metadata may include data for disabling selected speakers.

[0027] The audio object position may be constrained to a one-dimensional curve. The methods may involve creating virtual speaker positions along the one-dimensional curve.
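The idea of virtual speaker positions along a one-dimensional curve can be sketched as follows. This is an assumed parameterization, not the disclosed implementation: virtual speakers are sampled along a parametric curve, and an object position on the curve is rendered by a constant-power cross-fade between the two nearest virtual speakers, in the spirit of the curve-based gain computation mentioned in paragraph [0014].

```python
import math

def virtual_speakers_on_curve(curve, n):
    """Sample n virtual speaker positions along a parametric
    curve f: [0, 1] -> 3D position."""
    return [curve(i / (n - 1)) for i in range(n)]

def crossfade_gains(u, n):
    """Constant-power cross-fade between the two virtual speakers
    bracketing curve parameter u in [0, 1]."""
    s = u * (n - 1)
    lo = min(int(s), n - 2)
    frac = s - lo
    gains = [0.0] * n
    gains[lo] = math.cos(frac * math.pi / 2)
    gains[lo + 1] = math.sin(frac * math.pi / 2)
    return gains

# A semicircular overhead curve at ceiling height (z = 1)
curve = lambda u: (math.cos(math.pi * u), math.sin(math.pi * u), 1.0)
vs = virtual_speakers_on_curve(curve, 5)
g = crossfade_gains(0.5, 5)  # halfway along: all gain on the middle virtual speaker
```

The virtual speaker gains would then be mapped onto actual reproduction speakers by the panning process; the sum of squared gains stays 1, so perceived loudness is roughly constant along the curve.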

[0028] Other aspects of this disclosure may be implemented in one or more non-transitory media having software stored thereon. The software may include instructions for controlling one or more devices to perform the following operations: receiving audio data; receiving an audio object position; and determining the audio object position in a three-dimensional space. The determining may involve constraining the position to a one-dimensional curve or a two-dimensional surface within the three-dimensional space. The software may include instructions for controlling one or more devices to create metadata associated with the audio object. The metadata may be based, at least in part, on user input.

[0029] The metadata may include data indicating the audio object position in the three-dimensional space. The metadata may include trajectory data indicating a time-varying position of the audio object within the three-dimensional space. Creating the metadata may involve creating speaker zone constraint metadata, e.g., according to user input. The speaker zone constraint metadata may include data for disabling selected speakers.

[0030] The audio object position may be constrained to a one-dimensional curve. The software may include instructions for controlling one or more devices to create virtual speaker positions along the one-dimensional curve.

[0031] Details of one or more implementations of the subject matter described in this specification are set forth in the accompanying drawings and the description below. Other features, aspects and advantages will become apparent from the description, the drawings and the claims. Note that the relative dimensions of the following figures may not be drawn to scale.

BRIEF DESCRIPTION OF THE DRAWINGS

[0032] Fig. 1 shows an example of a reproduction environment having a Dolby Surround 5.1 configuration.

[0033] Fig. 2 shows an example of a reproduction environment having a Dolby Surround 7.1 configuration.

[0034] Fig. 3 shows an example of a reproduction environment having a Hamasaki 22.2 surround sound configuration.

[0035] Fig. 4A shows an example of a graphical user interface (GUI) that portrays speaker zones at varying elevations in a virtual reproduction environment.

[0036] Fig. 4B shows an example of another reproduction environment.

[0037] Figs. 5A-5C show examples of speaker responses corresponding to an audio object having a position that is constrained to a two-dimensional surface of a three-dimensional space.

[0038] Figs. 5D and 5E show examples of two-dimensional surfaces to which an audio object may be constrained.

[0039] Fig. 6A is a flow diagram that outlines one example of a process of constraining audio object positions to a two-dimensional surface.

[0040] Fig. 6B is a flow diagram that outlines one example of a process of mapping an audio object position to a single speaker location or a single speaker zone.

[0041] Fig. 7 is a flow diagram that outlines a process of establishing and using virtual speakers.

[0042] Figs. 8A-8C show examples of virtual speakers mapped to line endpoints and the corresponding speaker responses.

[0043] Figs. 9A-9C show examples of using a virtual tether to move an audio object.

[0044] Fig. 10A is a flow diagram that outlines a process of using a virtual tether to move an audio object.

[0045] Fig. 10B is a flow diagram that outlines an alternative process of using a virtual tether to move an audio object.

[0046] Figs. 10C-10E show examples of the process outlined in Fig. 10B.

[0047] Fig. 11 shows an example of applying speaker zone constraints in a virtual reproduction environment.

[0048] Fig. 12 is a flow diagram that outlines some examples of applying speaker zone constraint rules.

[0049] Figs. 13A and 13B show an example of a GUI that can switch between a two-dimensional view and a three-dimensional view of a virtual reproduction environment.

[0050] Figs. 13C-13E show combinations of two-dimensional and three-dimensional depictions of reproduction environments.

[0051] Fig. 14A is a flow diagram that outlines a process of controlling an apparatus to present GUIs such as those shown in Figs. 13C-13E.

[0052] Fig. 14B is a flow diagram that outlines a process of rendering audio objects for a reproduction environment.

[0053] Fig. 15A shows an example of an audio object and an associated audio object width in a virtual reproduction environment.

[0054] Fig. 15B shows an example of a spread profile corresponding to the audio object width shown in Fig. 15A.

[0055] Fig. 16 is a flow diagram that outlines a process of reallocating audio objects.

[0056] Figs. 17A and 17B show examples of an audio object positioned in a three-dimensional virtual reproduction environment.

[0057] Fig. 18 shows examples of zones that correspond to panning modes.

[0058] Figs. 19A-19D show examples of applying near-field and far-field panning techniques to audio objects at different locations.

[0059] Fig. 20 shows speaker zones of a reproduction environment that may be used in a screen-to-room balance control process.

[0060] Fig. 21 is a block diagram that provides examples of components of an authoring and/or rendering apparatus.

[0061] Fig. 22A is a block diagram that represents some components that may be used for audio content creation.

[0062] Fig. 22B is a block diagram that represents some components that may be used for audio playback in a reproduction environment.

[0063] Like reference numbers and designations in the various drawings indicate like elements.

DESCRIPTION OF EXEMPLARY EMBODIMENTS OF THE INVENTION

[0064] The following description is directed to certain implementations for the purpose of describing some innovative aspects of this disclosure, as well as examples of contexts in which those innovative aspects may be implemented. However, the teachings herein can be applied in various other ways. For example, while various implementations are described in terms of particular reproduction environments, the teachings herein are broadly applicable to other known reproduction environments, as well as reproduction environments that may be introduced in the future. Similarly, although examples of graphical user interfaces (GUIs) are presented herein, some of which provide examples of speaker locations, speaker zones, etc., other implementations are contemplated. Moreover, the described implementations may be embodied in various authoring and/or rendering tools, which may be implemented in a variety of hardware, software, firmware, etc. Accordingly, the teachings of this disclosure are not intended to be limited to the implementations shown in the figures and/or described herein, but instead have wide applicability.

[0065] Fig. 1 shows an example of a reproduction environment having a Dolby Surround 5.1 configuration. Dolby Surround 5.1 was developed in the 1990s, but this configuration is still widely deployed in cinema sound system environments. A projector 105 may be configured to project video images, e.g., for a motion picture, onto a screen 150. Audio data may be synchronized with the video images and processed by a sound processor 110. Power amplifiers 115 may provide speaker feed signals to speakers of the reproduction environment 100.

[0066] The Dolby Surround 5.1 configuration includes a left surround array 120 and a right surround array 125, each of which is gang-driven by a single channel. The Dolby Surround 5.1 configuration also includes separate channels for the left screen channel 130, the center screen channel 135 and the right screen channel 140. A separate channel for the subwoofer 145 is provided for low-frequency effects (LFE).

[0067] In 2010, Dolby provided enhancements to digital cinema sound by introducing Dolby Surround 7.1. Fig. 2 shows an example of a reproduction environment having a Dolby Surround 7.1 configuration. A digital projector 205 may be configured to receive digital video data and to project video images onto the screen 150. Audio data may be processed by a sound processor 210. Power amplifiers 215 may provide speaker feed signals to speakers of the reproduction environment 200.

[0068] The Dolby Surround 7.1 configuration includes a left side surround array 220 and a right side surround array 225, each of which may be driven by a single channel. Like Dolby Surround 5.1, the Dolby Surround 7.1 configuration includes separate channels for the left screen channel 230, the center screen channel 235, the right screen channel 240 and the subwoofer 245. However, Dolby Surround 7.1 increases the number of surround channels by splitting the left and right surround channels of Dolby Surround 5.1 into four zones: in addition to the left side surround array 220 and the right side surround array 225, separate channels are included for the left rear surround speakers 224 and the right rear surround speakers 226. Increasing the number of surround zones within the reproduction environment 200 can significantly improve the localization of sound.

[0069] In an effort to create a more immersive environment, some reproduction environments may be configured with increased numbers of speakers, driven by increased numbers of channels. Moreover, some reproduction environments may include speakers deployed at various elevations, some of which may be above the floor of the reproduction environment.

[0070] Fig. 3 shows an example of a reproduction environment having a Hamasaki 22.2 surround sound configuration. Hamasaki 22.2 was developed at NHK Science & Technology Research Laboratories in Japan as the surround sound component of ultra-high-definition television. Hamasaki 22.2 provides 24 speaker channels, which may be used to drive speakers arranged in three layers. The upper speaker layer 310 of the reproduction environment 300 may be driven by 9 channels. The middle speaker layer 320 may be driven by 10 channels. The lower speaker layer 330 may be driven by 5 channels, two of which are for the subwoofers 345a and 345b.

[0071] Accordingly, the modern trend is to include not only more speakers and more channels, but also speakers at differing heights. As the number of channels increases and the speaker layout transitions from a two-dimensional array to a three-dimensional array, the tasks of positioning and rendering sounds become increasingly difficult.

[0072] This disclosure provides various tools, as well as related user interfaces, that increase functionality and/or reduce authoring complexity for a three-dimensional audio sound system.

[0073] Fig. 4A shows an example of a graphical user interface (GUI) that portrays speaker zones at varying elevations in a virtual reproduction environment. GUI 400 may, for example, be displayed on a display device according to instructions from a logic system, according to signals received from user input devices, etc. Some such devices are described below with reference to Fig. 21.

[0074] As used herein with reference to virtual reproduction environments such as the virtual reproduction environment 404, the term "speaker zone" generally refers to a logical construct that may or may not have a one-to-one correspondence with a reproduction speaker of an actual reproduction environment. For example, a "speaker zone location" may or may not correspond to a particular reproduction speaker location of a cinema reproduction environment. Instead, the term "speaker zone location" may refer generally to a zone of a virtual reproduction environment. In some implementations, a speaker zone of a virtual reproduction environment may correspond to a virtual speaker, e.g., via the use of virtualizing technology such as Dolby Headphone™ (sometimes referred to as Mobile Surround™), which creates a virtual surround sound environment in real time using a set of two-channel stereo headphones. GUI 400 includes seven speaker zones 402a at a first elevation and two speaker zones 402b at a second elevation, making a total of nine speaker zones in the virtual reproduction environment 404. In this example, speaker zones 1-3 are in the front area 405 of the virtual reproduction environment 404. The front area 405 may correspond, for example, to an area of a cinema reproduction environment in which a screen 150 is located, or to an area of a home in which a television screen is located.

[0075] Here, speaker zone 4 corresponds generally to speakers in the left area 410 and speaker zone 5 corresponds to speakers in the right area 415 of the virtual reproduction environment 404. Speaker zone 6 corresponds to a left rear area 412 and speaker zone 7 corresponds to a right rear area 414 of the virtual reproduction environment 404. Speaker zone 8 corresponds to speakers in an upper area 420a, and speaker zone 9 corresponds to speakers in an upper area 420b, which may be a virtual ceiling area such as the virtual ceiling area 520 shown in Figs. 5D and 5E. Accordingly, and as described in more detail below, the locations of speaker zones 1-9 shown in Fig. 4A may or may not correspond to the locations of reproduction speakers of an actual reproduction environment. Moreover, other implementations may include more or fewer speaker zones and/or elevations.

[0076] In various implementations described herein, a user interface such as GUI 400 may be used as part of an authoring tool and/or a rendering tool. In some implementations, the authoring tool and/or rendering tool may be implemented via software stored on one or more non-transitory media. The authoring tool and/or rendering tool may be implemented (at least in part) by hardware, firmware, etc., such as the logic system and other devices described below with reference to Fig. 21. In some implementations, an associated authoring tool may be used to create metadata for associated audio data. The metadata may, for example, include data indicating the position and/or trajectory of an audio object in a three-dimensional space, speaker zone constraint data, etc. The metadata may be created with respect to the speaker zones 402 of the virtual reproduction environment 404, rather than with respect to a particular speaker layout of an actual reproduction environment. A rendering tool may receive audio data and associated metadata, and may compute audio gains and speaker feed signals for a reproduction environment. Such audio gains and speaker feed signals may be computed according to an amplitude panning process, which can create a perception that a sound is coming from a position P in the reproduction environment. For example, speaker feed signals may be provided to reproduction speakers 1 through N of the reproduction environment according to the following equation:

[0077] x_i(t) = g_i · x(t), for i = 1, … N    (Equation 1)

[0078] In Equation 1, x_i(t) represents the speaker feed signal to be applied to speaker i, g_i represents the gain factor of the corresponding channel, x(t) represents the audio signal and t represents time. The gain factors may be determined, for example, according to the amplitude panning methods described in Section 2, pages 3-4 of V. Pulkki, Compensating Displacement of Amplitude-Panned Virtual Sources (Audio Engineering Society (AES) International Conference on Virtual, Synthetic and Entertainment Audio), which is hereby incorporated by reference. In some implementations, the gains may be frequency dependent. In some implementations, a time delay may be introduced by replacing x(t) with x(t − Δt).
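Equation 1 amounts to scaling one audio signal by a per-speaker gain. The following minimal sketch illustrates this with a simple constant-power stereo panning law as an assumed example (not Pulkki's full amplitude panning method, which the disclosure references for the general case):

```python
import math

def pan_gains_stereo(pan):
    """Constant-power stereo gains for pan in [0, 1] (0 = left, 1 = right)."""
    theta = pan * math.pi / 2
    return [math.cos(theta), math.sin(theta)]

def speaker_feeds(x, gains):
    """Equation 1: x_i(t) = g_i * x(t) for each speaker i."""
    return [[g * sample for sample in x] for g in gains]

x = [0.0, 0.5, 1.0, 0.5]        # a short audio signal x(t)
gains = pan_gains_stereo(0.5)   # centered source: both gains = sqrt(2)/2
feeds = speaker_feeds(x, gains)
```

Because the gains are power-normalized (g_L² + g_R² = 1), the perceived loudness stays roughly constant as the source pans; frequency-dependent gains or a per-speaker delay x(t − Δt), as noted above, would be refinements of the same structure.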

[0079] In some rendering implementations, audio data created with reference to the speaker zones 402 may be mapped to speaker locations of a wide range of reproduction environments, which may have a Dolby Surround 5.1 configuration, a Dolby Surround 7.1 configuration, a Hamasaki 22.2 configuration, or another configuration. For example, referring to Fig. 2, a rendering tool may map audio data for speaker zones 4 and 5 to the left side surround array 220 and the right side surround array 225 of a reproduction environment having a Dolby Surround 7.1 configuration. Audio data for speaker zones 1, 2 and 3 may be mapped to the left screen channel 230, the right screen channel 240 and the center screen channel 235, respectively. Audio data for speaker zones 6 and 7 may be mapped to the left rear surround speakers 224 and the right rear surround speakers 226.
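The zone-to-channel assignments just described can be captured as a simple lookup table. The sketch below is illustrative: the channel labels and the `route` helper are assumptions, and the fold-down of the overhead zones 8 and 9 (which have no counterpart in a 7.1 layout) into the side surrounds is a hypothetical choice, not one specified by the disclosure.

```python
# Mapping of virtual speaker zones 1-9 to Dolby Surround 7.1 channels,
# following the assignments described in paragraph [0079].
ZONE_TO_7_1 = {
    1: "L",    # left screen channel 230
    2: "R",    # right screen channel 240
    3: "C",    # center screen channel 235
    4: "Lss",  # left side surround array 220
    5: "Rss",  # right side surround array 225
    6: "Lrs",  # left rear surround speakers 224
    7: "Rrs",  # right rear surround speakers 226
    # Zones 8 and 9 are overhead; a renderer might fold them into
    # the side surrounds when no height speakers exist (assumption):
    8: "Lss",
    9: "Rss",
}

def route(zone_gains):
    """Fold per-zone gains into per-channel gains."""
    channels = {}
    for zone, gain in zone_gains.items():
        ch = ZONE_TO_7_1[zone]
        channels[ch] = channels.get(ch, 0.0) + gain
    return channels
```

A different reproduction environment (e.g., the one of Fig. 4B, with upper speakers) would simply swap in a different table, which is the point of authoring against zones rather than physical channels.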

[0080] Fig. 4B depicts an example of another reproducing environment. In some implementations, a rendering tool may map audio data for speaker zones 1, 2 and 3 to corresponding screen speakers 455 of reproducing environment 450. A rendering tool may map audio data for speaker zones 4 and 5 to the left side surround array 460 and the right side surround array 465, and may map audio data for speaker zones 8 and 9 to left overhead speakers 470a and right overhead speakers 470b. Audio data for speaker zones 6 and 7 may be mapped to left rear surround speakers 480a and right rear surround speakers 480b.

[0081] In some implementations, an authoring tool may be used to create metadata for audio objects. As used herein, the term "audio object" may refer to a stream of audio data and associated metadata. The metadata typically indicates the three-dimensional position of the object, rendering constraints, and content type (e.g., dialogue, effects, etc.). Depending on the implementation, the metadata may include other types of data, such as width data, gain data, trajectory data, etc. Some audio objects may be static, whereas others may move. Audio object details may be authored or rendered according to the associated metadata which, among other things, may indicate the position of the audio object in three-dimensional space at a given point in time. When audio objects are monitored or played back in a reproducing environment, the audio objects may be rendered according to the positional metadata using the reproducing loudspeakers that are present in the reproducing environment, rather than being output to a predetermined physical channel, as is the case with traditional channel-based systems such as Dolby 5.1 and Dolby 7.1.
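As a non-authoritative illustration, an audio object of the kind described above might be represented by a structure such as the following; all field names and default values are hypothetical:

```python
from dataclasses import dataclass, field

@dataclass
class AudioObject:
    """An audio object: a stream of audio data plus associated metadata.

    The disclosure only requires that the metadata indicate 3-D position,
    optional constraints, and content type; the remaining fields sketch the
    other metadata types mentioned (width, gain, trajectory).
    """
    samples: list                     # the audio data stream
    position: tuple                   # (x, y, z) at a given point in time
    content_type: str = "effects"     # e.g. "dialogue", "effects"
    width: float = 0.0                # apparent size of the object
    gain: float = 1.0
    trajectory: list = field(default_factory=list)       # (t, x, y, z) points
    zone_constraints: list = field(default_factory=list) # speaker-zone limits

obj = AudioObject(samples=[0.0, 0.1], position=(0.2, 0.9, 0.0),
                  content_type="dialogue")
print(obj.position)  # (0.2, 0.9, 0.0)
```

A static audio object would keep `position` fixed, whereas a moving object would carry a populated `trajectory` list.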

[0082] Various authoring and rendering tools are described in this disclosure with reference to a GUI that is substantially similar to GUI 400. However, various other user interfaces, including but not limited to GUIs, may be used in connection with these authoring and rendering tools. Some such tools can simplify the authoring process by applying various types of constraints. Some implementations will now be described with reference to Fig. 5A et seq.

[0083] Figs. 5A-5C show examples of speaker responses corresponding to an audio object having a position that is constrained to a two-dimensional surface in three-dimensional space, which in this example is a hemisphere. In these examples, the speaker responses were computed by a renderer assuming a nine-speaker configuration, with each speaker corresponding to one of the speaker zones 1-9. However, as noted elsewhere herein, there may not be a one-to-one mapping between speaker zones of a virtual reproducing environment and reproducing loudspeakers of an actual reproducing environment. Referring first to Fig. 5A, the audio object 505 is shown in a location in the left front portion of the virtual reproducing environment 404. Accordingly, the speaker corresponding to speaker zone 1 indicates a substantial gain, and the speakers corresponding to speaker zones 3 and 4 indicate moderate gains.

[0084] In this example, the location of the audio object 505 may be changed by placing a cursor 510 on the audio object 505 and "dragging" the audio object 505 to a desired location in the x,y plane of the virtual reproducing environment 404. As the object is dragged towards the middle of the reproducing environment, it is also mapped to the surface of a hemisphere and its elevation increases. Here, increases in the elevation of the audio object 505 are indicated by an increase in the diameter of the circle that represents the audio object 505: as shown in Figs. 5B and 5C, as the audio object 505 is dragged to the top center of the virtual reproducing environment 404, the audio object 505 appears increasingly larger. Alternatively, or additionally, the elevation of the audio object 505 may be indicated by changes in color, brightness, a numerical elevation indication, etc. When the audio object 505 is located at the top center of the virtual reproducing environment 404, as shown in Fig. 5C, the speakers corresponding to speaker zones 8 and 9 indicate substantial gains and the other speakers indicate little or no gain.

[0085] In this implementation, the position of the audio object is constrained to a two-dimensional surface such as a spherical surface, an elliptical surface, a conical surface, a cylindrical surface, a wedge, etc. Figs. 5D and 5E show examples of two-dimensional surfaces to which an audio object may be constrained. Figs. 5D and 5E are cross-sectional views through the virtual reproducing environment 404, with the front area 405 shown on the left. In Figs. 5D and 5E, the y values of the y-z axes increase in the direction of the front area 405 of the virtual reproducing environment 404, to preserve consistency with the orientations of the x-y axes shown in Figs. 5A-5C.

[0086] In the example shown in Fig. 5D, the two-dimensional surface 515a is a section of an ellipsoid. In the example shown in Fig. 5E, the two-dimensional surface 515b is a section of a wedge. However, the shapes, orientations and positions of the two-dimensional surfaces 515 shown in Figs. 5D and 5E are merely examples. In alternative implementations, at least part of the two-dimensional surface 515 may extend outside of the virtual reproducing environment 404. In some such implementations, the two-dimensional surface 515 may extend above the virtual ceiling 520. Accordingly, the three-dimensional space within which the two-dimensional surface 515 extends is not necessarily co-extensive with the volume of the virtual reproducing environment 404. In yet other implementations, an audio object may be constrained to one-dimensional features such as curves, straight lines, etc.

[0087] Fig. 6A is a flow diagram that outlines one example of a process of constraining positions of an audio object to a two-dimensional surface. As with other flow diagrams provided herein, the operations of process 600 are not necessarily performed in the order shown. Moreover, process 600 (and other processes described herein) may include more or fewer operations than those indicated in the drawings and/or described. In this example, blocks 605-622 are performed by an authoring tool and blocks 624-630 are performed by a rendering tool. The authoring tool and the rendering tool may be implemented in a single apparatus or in more than one apparatus. Although Fig. 6A (and other flow diagrams herein) may create the impression that authoring and rendering processes are performed sequentially, in many implementations the authoring and rendering processes are performed substantially simultaneously. Authoring processes and rendering processes may be interactive. For example, the results of an authoring operation may be sent to the rendering tool, the corresponding results of the rendering tool may be evaluated by a user, who may perform further authoring based on these results, etc.

[0088] In block 605, an indication is received that an audio object position is to be constrained to a two-dimensional surface. The indication may, for example, be received by a logic system of an apparatus that is configured to provide authoring and/or rendering tools. As with other implementations described herein, the logic system may operate according to software instructions stored in a non-transitory medium, according to firmware, etc. The indication may be a signal from a user input device (such as a touch screen, a mouse, a trackball, a gesture recognition device, etc.) in response to input from a user.

[0089] Audio data are received in optional block 607. Block 607 is optional in this example because audio data may proceed directly to the rendering tool from another source (e.g., a mixing console) that is time-synchronized with the metadata authoring tool. In some such implementations, an implicit mechanism may exist to tie each audio stream to a corresponding metadata stream in order to form an audio object. For example, each metadata stream may contain an identifier of the audio object it represents, e.g., a numerical value from 1 to N. If the rendering apparatus is configured with audio inputs that are also numbered from 1 to N, the rendering tool may automatically assume that an audio object is formed by the metadata stream identified with a numerical value (e.g., 1) and the audio data received on the first audio input. Similarly, any metadata stream identified as number 2 may form an object with the audio received on the second audio input channel. In some implementations, the audio data and metadata may be pre-packaged by the authoring tool to form audio objects, and the audio objects may be provided to the rendering tool, e.g., sent over a network as TCP/IP packets.
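A minimal sketch of the implicit association mechanism described above, assuming metadata streams keyed by the numerical identifiers 1 to N; the function name and data shapes are hypothetical:

```python
def form_audio_objects(metadata_streams, audio_inputs):
    """Pair each metadata stream with the audio input of the same number.

    metadata_streams: dict mapping a numeric stream id (1..N) to metadata
    audio_inputs:     list of audio buffers; the k-th input (0-based k)
                      carries the audio for stream id k + 1
    Returns audio objects keyed by stream id.
    """
    objects = {}
    for stream_id, metadata in metadata_streams.items():
        index = stream_id - 1          # stream 1 -> first audio input, etc.
        if 0 <= index < len(audio_inputs):
            objects[stream_id] = {"audio": audio_inputs[index],
                                  "metadata": metadata}
    return objects

objs = form_audio_objects({1: {"pos": (0, 0, 0)}, 2: {"pos": (1, 0, 0)}},
                          [[0.1, 0.2], [0.3, 0.4]])
print(sorted(objs))  # [1, 2]
```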

[0090] In alternative implementations, the authoring tool may send only the metadata over the network, and the rendering tool may receive audio from another source (e.g., via a pulse code modulation (PCM) stream, via analog audio, etc.). In such implementations, the rendering tool may be configured to group the audio data and the metadata to form the audio objects. The audio data may, for example, be received by a logic system via an interface. The interface may, for example, be a network interface, an audio interface (e.g., an interface configured for communication via the AES3 standard, developed by the Audio Engineering Society and the European Broadcasting Union and also known as AES/EBU, via the Multichannel Audio Digital Interface (MADI) protocol, via analog signals, etc.) or an interface between the logic system and a memory device. In this example, the data received by the renderer includes at least one audio object.

[0091] In block 610, (x,y) or (x,y,z) coordinates of an audio object position are received. Block 610 may involve receiving an initial position of the audio object. Block 610 may also involve receiving an indication that a user has positioned or re-positioned the audio object, e.g., as described above with reference to Figs. 5A-5C. The coordinates of the audio object are mapped to a two-dimensional surface in block 615. The two-dimensional surface may be similar to one of those described above with reference to Figs. 5D and 5E, or it may be a different two-dimensional surface. In this example, each point of the x-y plane will be mapped to a single z value, so block 615 involves mapping the x and y coordinates received in block 610 to a z value. In other implementations, different mapping processes and/or coordinate systems may be used. The audio object may be displayed (block 620) at the (x,y,z) location determined in block 615. The audio data and metadata, including the mapped (x,y,z) location determined in block 615, may be stored in block 621. The audio data and metadata may be sent to the rendering tool (block 622). In some implementations, the metadata may be sent continuously while some authoring operations are being performed, e.g., while the audio object is being positioned, constrained, displayed in GUI 400, etc.
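For example, if the two-dimensional surface of block 615 were a hemisphere centered on the reproducing environment, the mapping of (x,y) coordinates to a single z value might be sketched as follows; the unit radius and centering are illustrative assumptions:

```python
import math

def constrain_to_hemisphere(x, y, radius=1.0):
    """Map (x, y) to a point (x, y, z) on a hemisphere of the given radius,
    as in block 615 of process 600.

    Each (x, y) point is assigned a single z value, so dragging the object
    toward the center of the environment raises its elevation, consistent
    with the behavior described for Figs. 5A-5C.
    """
    d2 = x * x + y * y
    if d2 >= radius * radius:
        return (x, y, 0.0)             # at or beyond the rim: zero elevation
    return (x, y, math.sqrt(radius * radius - d2))

print(constrain_to_hemisphere(0.0, 0.0))  # (0.0, 0.0, 1.0) -- top of the dome
print(constrain_to_hemisphere(1.0, 0.0))  # (1.0, 0.0, 0.0) -- on the rim
```

An ellipsoidal or wedge-shaped surface (Figs. 5D and 5E) would use a different z = f(x, y) in place of the square root above.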

[0092] In block 623, it is determined whether the authoring process will continue. For example, the authoring process may end (block 625) upon receipt of input from a user interface indicating that the user no longer wishes to constrain audio object positions to a two-dimensional surface. Otherwise, the authoring process may continue, e.g., by reverting to block 607 or block 610. In some implementations, rendering operations may continue whether or not the authoring process continues. In some implementations, audio objects may be recorded to disk on the authoring platform and then rendered and played back by a dedicated sound processing device or a cinema server connected to a sound processing device, e.g., a sound processing device similar to the sound processing device 210 of Fig. 2.

[0093] In some implementations, the rendering tool may be software that runs on an apparatus that is configured to provide authoring functionality. In other implementations, the rendering tool may be provided on another apparatus. The type of communication protocol used for communication between the authoring tool and the rendering tool may vary according to whether both tools run on the same apparatus or whether they communicate over a network.

[0094] In block 626, the audio data and metadata (including the (x,y,z) position(s) determined in block 615) are received by the rendering tool. In alternative implementations, audio data and metadata may be received separately and interpreted by the rendering tool as an audio object through an implicit mechanism. As noted above, for example, a metadata stream may contain an audio object identification code (e.g., 1, 2, 3, etc.) and may be attached, respectively, to the first, second, third, etc., audio inputs (i.e., digital or analog audio connections) of the rendering system, to form audio objects that can be rendered to the loudspeakers.

[0095] During the rendering operations of process 600 (and other rendering operations described herein), panning gain equations may be applied according to the reproducing speaker layout of a particular reproducing environment. Accordingly, the logic system of the rendering tool may receive reproducing environment data that include an indication of the number of reproducing speakers in the reproducing environment and an indication of the location of each reproducing speaker within the reproducing environment. These data may be received, for example, by accessing a data structure stored in a memory accessible to the logic system, or may be received via an interface system.

[0096] In this example, panning gain equations are applied to the (x,y,z) position(s) to determine gain values (block 628) to apply to the audio data (block 630). In some implementations, audio data whose levels have been adjusted according to the gain values may be reproduced by reproducing speakers (e.g., by the speakers of headphones, or by other speakers) that are configured for communication with the logic system of the rendering tool. In some implementations, the reproducing speaker locations may correspond to the locations of the speaker zones of a virtual reproducing environment, such as the virtual reproducing environment 404 described above. The corresponding speaker responses may be displayed on a display device, e.g., as shown in Figs. 5A-5C.

[0097] In block 635, it is determined whether the process will continue. For example, the process may end (block 640) upon receipt of input from a user interface indicating that the user no longer wishes to continue the rendering process. Otherwise, the process may continue, e.g., by reverting to block 626. If the logic system receives an indication that the user wishes to revert to the corresponding authoring process, process 600 may revert to block 607 or block 610.

[0098] Other implementations may involve imposing other types of constraints and creating other types of constraint metadata for audio objects. Fig. 6B is a flow diagram that outlines one example of a process of mapping an audio object position to a single speaker location. This process may also be referred to herein as "binding." In block 655, an indication is received that an audio object position may be bound to a single speaker location or a single speaker zone. In this example, the indication is that the audio object position will be bound to a single speaker location, when appropriate. The indication may, for example, be received by a logic system of an apparatus that is configured to provide authoring tools. The indication may correspond with input received from a user input device. However, the indication may also correspond with a category of the audio object (e.g., a bullet sound, a vocalization, etc.) and/or a width of the audio object. Information regarding the category and/or width may, for example, be received as metadata for the audio object. In such implementations, block 657 may occur before block 655.

[0099] In block 656, audio data are received. Coordinates of an audio object position are received in block 657. In this example, the audio object position is displayed (block 658) according to the coordinates received in block 657. Metadata, including the audio object coordinates and a binding flag indicating the binding functionality, are saved in block 659. The audio data and metadata are sent by the authoring tool to a rendering tool (block 660).

[0100] In block 662, it is determined whether the authoring process will continue. For example, the authoring process may end (block 663) upon receipt of input from a user interface indicating that the user no longer wishes to bind audio object positions to a speaker location. Otherwise, the authoring process may continue, e.g., by reverting to block 665. In some implementations, rendering operations may continue whether or not the authoring process continues.

[0101] The audio data and metadata sent by the authoring tool are received by the rendering tool in block 664. In block 665, it is determined (e.g., by the logic system) whether to bind the audio object position to a speaker location. This determination may be based, at least in part, on the distance between the audio object position and the nearest reproducing speaker location of the reproducing environment.

[0102] In this example, if it is determined in block 665 that the audio object position will be bound to a speaker location, the audio object position will be mapped in block 670 to a speaker location, generally the one closest to the intended (x,y,z) position received for the audio object. In this case, the gain for the audio data reproduced by this speaker location will be 1.0, whereas the gain for the audio data reproduced by the other speakers will be zero. In alternative implementations, the audio object position may be mapped in block 670 to a group of speaker locations.
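The determination of block 665 and the mapping of block 670 might be sketched as follows; the distance threshold, function name and speaker coordinates are hypothetical:

```python
import math

def snap_or_pan(obj_pos, speaker_positions, max_snap_distance=0.25):
    """Decide whether to bind an audio object to its nearest speaker.

    Returns a per-speaker gain list with 1.0 for the bound speaker and 0.0
    elsewhere when the nearest speaker is within max_snap_distance of the
    object's (x, y, z) position (block 670); otherwise returns None,
    signalling that panning rules should be applied instead (block 675).
    """
    nearest = min(range(len(speaker_positions)),
                  key=lambda i: math.dist(obj_pos, speaker_positions[i]))
    if math.dist(obj_pos, speaker_positions[nearest]) > max_snap_distance:
        return None                    # too far from any speaker: pan instead
    gains = [0.0] * len(speaker_positions)
    gains[nearest] = 1.0               # bound speaker receives unity gain
    return gains

speakers = [(0.0, 1.0, 0.0), (1.0, 1.0, 0.0), (0.5, 0.0, 0.0)]
print(snap_or_pan((0.05, 0.95, 0.0), speakers))  # [1.0, 0.0, 0.0]
print(snap_or_pan((0.5, 0.5, 0.0), speakers))    # None -- apply panning rules
```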

[0103] For example, referring again to Fig. 4B, block 670 may involve binding the audio object position to one of the left overhead speakers 470a. Alternatively, block 670 may involve binding the audio object position to a single speaker and neighboring speakers, e.g., one or two neighboring speakers. Accordingly, the corresponding metadata may apply to a small group of reproducing speakers and/or to an individual reproducing speaker.

[0104] However, if it is determined in block 665 that the audio object position will not be bound to a speaker location, for instance if this would result in a large discrepancy in position relative to the original intended position received for the object, panning rules will be applied (block 675). The panning rules may be applied according to the audio object position, as well as other characteristics of the audio object (such as width, volume, etc.).

[0105] The gain values determined in block 675 may be applied to the audio data in block 681, and the result may be saved. In some implementations, the resulting audio data may be reproduced by speakers that are configured for communication with the logic system. If it is determined in block 685 that process 650 will continue, process 650 may revert to block 664 to continue rendering operations. Alternatively, process 650 may revert to block 655 to resume authoring operations.

[0106] Process 650 may involve various types of smoothing operations. For example, the logic system may be configured to smooth transitions in the gains applied to the audio data when transitioning from mapping an audio object position from a first single speaker location to a second single speaker location. Referring again to Fig. 4B, if the audio object position were first bound to one of the left overhead speakers 470a and later bound to one of the right rear surround speakers 480b, the logic system may be configured to smooth the transition between the speakers so that the audio object does not seem to suddenly "jump" from one speaker (or speaker zone) to another. In some implementations, the smoothing may be implemented according to a crossfade rate parameter.

[0107] In some implementations, the logic system may be configured to smooth transitions in the gains applied to the audio data when transitioning between mapping an audio object position to a speaker location and applying panning rules for the audio object position. For example, if it were subsequently determined in block 665 that the position of the audio object had been moved to a position determined to be too far from the closest speaker, panning rules may be applied for the audio object position in block 675. However, when transitioning from binding to panning (or vice versa), the logic system may be configured to smooth the transitions in the gains applied to the audio data. The process may end in block 690, e.g., upon receipt of corresponding input from a user interface.
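One possible smoothing scheme, sketched here as a simple per-step linear crossfade between the current gains and the target gains; the rate value and this particular interpolation are assumptions, not the disclosed method:

```python
def smooth_gains(previous, target, rate=0.2):
    """One step of gain smoothing between two per-speaker gain sets.

    Moves each channel's gain toward its target by the fraction `rate`,
    so a switch between bound and panned gains (or between two bound
    speakers) occurs gradually rather than abruptly.
    """
    return [p + rate * (t - p) for p, t in zip(previous, target)]

gains = [1.0, 0.0]                     # currently bound to speaker 0
target = [0.0, 1.0]                    # new target: bound to speaker 1
for _ in range(3):
    gains = smooth_gains(gains, target)
print([round(g, 3) for g in gains])    # [0.512, 0.488]
```

Repeated application converges to the target gains; the `rate` parameter plays the role of the crossfade rate parameter mentioned above.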

[0108] Some alternative implementations may involve creating logical constraints. In some instances, for example, a sound mixer may desire more explicit control over the set of speakers that is being used during a particular panning operation. Some implementations allow a user to generate one- or two-dimensional "logical mappings" between sets of speakers and a panning interface.

[0109] Fig. 7 is a flow diagram that outlines a process of establishing and using virtual speakers. Figs. 8A-8C show examples of virtual speakers mapped to line endpoints and corresponding speaker zone responses. Referring first to process 700 of Fig. 7, an indication to create virtual speakers is received in block 705. The indication may, for example, be received by a logic system of an authoring apparatus and may correspond with input received from a user input device.

[0110] In block 710, an indication of a virtual speaker location is received. For example, referring to Fig. 8A, a user may use a user input device to position the cursor 510 at the location of the virtual speaker 805a and to select that location, e.g., via a mouse click. In block 715, it is determined (e.g., according to user input) that additional virtual speakers will be selected in this example. The process reverts to block 710 and, in this example, the user selects the location of the virtual speaker 805b shown in Fig. 8A.

[0111] In this instance, the user only desires to establish two virtual speaker locations. Therefore, in block 715, it is determined (e.g., according to user input) that no additional virtual speakers will be selected. As shown in Fig. 8A, a polyline 810 connecting the positions of the virtual speakers 805a and 805b may be displayed. In some implementations, the position of the audio object 505 will be constrained to the polyline 810. In some implementations, the position of the audio object 505 may be constrained to a parametric curve. For example, a set of control points may be provided according to user input, and a curve-fitting algorithm, such as a spline, may be used to determine the parametric curve. In block 725, an indication of an audio object position along the polyline 810 is received. In some such implementations, the position will be indicated as a scalar value between zero and one. In block 725, the (x,y,z) coordinates of the audio object and of the polyline defined by the virtual speakers may be displayed. Audio data and associated metadata, including the received scalar position and the (x,y,z) coordinates of the virtual speakers, may be displayed (block 727). Here, the audio data and metadata may be sent to the rendering tool via an appropriate communication protocol in block 728.
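The mapping from the scalar position received in block 725 to an (x,y,z) point on the polyline 810 might, for two virtual speakers, be sketched as follows; linear interpolation between the endpoints is an assumption:

```python
def position_on_polyline(endpoints, s):
    """Map a scalar position s in [0, 1] to an (x, y, z) point on the
    segment between two virtual-speaker locations.

    endpoints: pair of (x, y, z) virtual-speaker coordinates (e.g. the
    positions of the virtual speakers 805a and 805b); s = 0 yields the
    first endpoint and s = 1 the second.
    """
    (ax, ay, az), (bx, by, bz) = endpoints
    s = min(max(s, 0.0), 1.0)          # clamp the scalar to the valid interval
    return (ax + s * (bx - ax),
            ay + s * (by - ay),
            az + s * (bz - az))

speaker_a, speaker_b = (0.1, 0.2, 0.0), (0.9, 0.8, 0.0)
midpoint = position_on_polyline((speaker_a, speaker_b), 0.5)
print(tuple(round(c, 3) for c in midpoint))  # (0.5, 0.5, 0.0)
```

A polyline with more than two virtual speakers, or a spline, would first locate the active segment or evaluate the fitted curve at s before interpolating.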

[0112] In block 729, it is determined whether the authoring process will continue. If not, process 700 may end (block 730) or may continue to rendering operations, according to user input. As noted above, however, in many implementations at least some rendering operations may be performed concurrently with authoring operations.

[0113] In block 732, the audio data and metadata are received by the rendering tool. In block 735, the gain values to be applied to the audio data are computed for each virtual speaker position. Fig. 8B shows the speaker responses for the position of the virtual speaker 805a. Fig. 8C shows the speaker responses for the position of the virtual speaker 805b. In this example, as in many other examples described herein, the indicated speaker responses are for reproducing speakers that have locations corresponding with the locations of the speaker zones shown in GUI 400. Here, the virtual speakers 805a and 805b, and the line 810, have been positioned in a plane that is not close to the reproducing speakers having locations corresponding with speaker zones 8 and 9. Therefore, no gain for these speakers is indicated in Figs. 8B or 8C.

[0114] When the user moves the audio object 505 to other positions along the line 810, the logic system will compute cross-fading corresponding with these positions (block 740), e.g., according to the scalar position parameter of the audio object. In some implementations, a pairwise panning law (e.g., an energy-preserving sine or power law) may be used to blend between the gain values to be applied to the audio data for the position of the virtual speaker 805a and the gain values to be applied to the audio data for the position of the virtual speaker 805b.
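A sine-law pairwise panning blend of the kind mentioned above may be sketched as follows; it preserves energy in the sense that the squared gains sum to one at every scalar position:

```python
import math

def pairwise_pan(s):
    """Energy-preserving sine-law crossfade between two virtual speakers.

    s in [0, 1] is the scalar position of the audio object along the line
    810; returns (g_a, g_b), the weights used to blend the gain sets
    computed for virtual speakers 805a and 805b. g_a**2 + g_b**2 == 1.
    """
    theta = s * math.pi / 2.0
    return (math.cos(theta), math.sin(theta))

g_a, g_b = pairwise_pan(0.5)           # object halfway between the speakers
print(round(g_a, 4), round(g_b, 4))    # 0.7071 0.7071
print(round(g_a**2 + g_b**2, 6))       # 1.0 -- energy preserved
```

A power-law variant would instead use, e.g., g_a = (1 − s)^p and g_b = s^p for a chosen exponent p.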

[0115] In block 742, it may then be determined (e.g., according to user input) whether process 700 will continue. A user may, for example, be presented (e.g., via a GUI) with the option of continuing rendering operations or of reverting to authoring operations. If it is determined that process 700 will not continue, the process ends (block 745).

[0116] When panning rapidly-moving audio objects (e.g., audio objects that correspond to cars, jets, etc.), it can be difficult to author a smooth trajectory if audio object positions are selected by a user one point at a time. A lack of smoothness in the audio object trajectory may influence the perceived sound image. Accordingly, some authoring implementations provided herein apply a low-pass filter to the position of an audio object in order to smooth the resulting panning gains. Alternative authoring implementations apply a low-pass filter to the gains applied to the audio data.
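A one-pole low-pass filter applied to audio object positions might be sketched as follows; the filter order and coefficient are illustrative assumptions:

```python
def low_pass_positions(positions, alpha=0.3):
    """Smooth a sequence of object positions with a one-pole low-pass filter.

    Each output position is alpha * current + (1 - alpha) * previous output,
    which attenuates abrupt jumps before panning coefficients are computed.
    The same filter could instead be applied to the resulting gains, as in
    the alternative implementations mentioned above.
    """
    smoothed = []
    state = positions[0]
    for p in positions:
        state = tuple(alpha * c + (1.0 - alpha) * s for c, s in zip(p, state))
        smoothed.append(state)
    return smoothed

raw = [(0.0, 0.0), (1.0, 0.0), (1.0, 0.0)]   # abrupt jump at the second sample
out = low_pass_positions(raw)
print([tuple(round(c, 2) for c in p) for p in out])
# [(0.0, 0.0), (0.3, 0.0), (0.51, 0.0)] -- the jump is spread over time
```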

[0117] Other authoring implementations may allow a user to simulate grabbing, pulling, throwing or similarly interacting with audio objects. Some such implementations may involve the application of simulated physical laws, such as rule sets used to describe velocity, acceleration, momentum, kinetic energy, the application of forces, etc.

[0118] Figs. 9A-9C show examples of using a virtual tether to drag an audio object. In Fig. 9A, a virtual tether 905 has been formed between the audio object 505 and the cursor 510. In this example, the virtual tether 905 has a virtual spring constant. In some such implementations, the virtual spring constant may be selectable according to user input.

[0119] Fig. 9B shows the audio object 505 and the cursor 510 at a subsequent time, after the user has moved the cursor 510 towards speaker zone 3. The user may have moved the cursor 510 using a mouse, a joystick, a trackball, a gesture detection apparatus, or another type of user input device. The virtual tether 905 has been stretched, and the audio object 505 has been moved near speaker zone 8. The audio object 505 is approximately the same size in Figs. 9A and 9B, which indicates (in this example) that the elevation of the audio object 505 has not changed.

[0120] Fig. 9C shows the audio object 505 and the cursor 510 at a later time, after the user has moved the cursor towards speaker zone 9. The virtual tether 905 has been stretched even further. The audio object 505 has been moved downwards, as indicated by the decrease in the size of the audio object 505. The audio object 505 has been moved in a smooth arc. This example illustrates one potential benefit of such implementations: the audio object 505 may be moved in a smoother trajectory than if a user simply selected positions for the audio object 505 point by point.

[0121] Fig. 10A is a flow diagram that outlines a process of using a virtual tether to move an audio object. Process 1000 begins with block 1005, in which audio data are received. In block 1007, an indication is received to attach a virtual tether between an audio object and a cursor. The indication may be received by a logic system of an authoring apparatus and may correspond with input received from a user input device. Referring to Fig. 9A, for example, a user may position the cursor 510 over the audio object 505 and then indicate, via a user input device or a GUI, that the virtual tether 905 should be formed between the cursor 510 and the audio object 505. Cursor and object position data may be received (block 1010).

[0122] In this example, cursor velocity and/or acceleration data may be computed by the logic system according to the cursor position data, as the cursor 510 is moved (block 1015). Position data and/or trajectory data for the audio object 505 may be computed according to the virtual spring constant of the virtual tether 905 and the cursor position, velocity and acceleration data. Some such implementations may involve assigning a virtual mass to the audio object 505 (block 1020). For example, if the cursor 510 is moved at a relatively constant velocity, the virtual tether 905 may not stretch and the audio object 505 may be pulled along at a relatively constant velocity. If the cursor 510 accelerates, the virtual tether 905 may stretch, and a corresponding force may be applied to the audio object 505 by the virtual tether 905. There may be a time lag between the acceleration of the cursor 510 and the force applied by the virtual tether 905. In alternative implementations, the position and/or trajectory of the audio object 505 may be determined in a different fashion, e.g., without assigning a virtual spring constant to the virtual tether 905, by applying friction and/or inertia rules to the audio object 505, etc.
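A simplified simulation of the virtual tether, sketched with hypothetical spring, mass and damping constants and a basic Euler integration step:

```python
def drag_with_spring(cursor_path, obj_pos, k=4.0, mass=1.0, damping=2.0, dt=0.1):
    """Simulate dragging an audio object behind the cursor via a virtual
    spring tether; all constants are illustrative assumptions.

    At each step the spring pulls the object toward the cursor with a force
    proportional to the stretch, and a damping term keeps the motion from
    oscillating. Returns the object's trajectory, one position per cursor
    sample, so the lag of the object behind the cursor is visible.
    """
    pos = list(obj_pos)
    vel = [0.0] * len(pos)
    trajectory = []
    for cursor in cursor_path:
        for i in range(len(pos)):
            force = k * (cursor[i] - pos[i]) - damping * vel[i]
            vel[i] += (force / mass) * dt   # F = m * a, integrated over dt
            pos[i] += vel[i] * dt
        trajectory.append(tuple(pos))
    return trajectory

# Cursor moves steadily to the right; the object follows with a time lag.
path = [(0.1 * t, 0.0) for t in range(1, 6)]
traj = drag_with_spring(path, (0.0, 0.0))
print(all(traj[i][0] < traj[i + 1][0] for i in range(len(traj) - 1)))  # True
```

Sampling `trajectory` at a user-selected time interval would yield the discrete positions saved as metadata in block 1034.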

[0123] Discrete positions and/or the trajectory of the audio object 505 and the cursor 510 may be displayed (block 1025). In this example, the logic system samples audio object positions at a time interval (block 1030). In some such implementations, the user may determine the time interval for sampling. The audio object location and/or trajectory metadata, etc., may be saved (block 1034).

[0124] In block 1036, it is determined whether this authoring mode will continue. If the user so indicates, the process may continue, e.g., by returning to block 1005 or block 1010. Otherwise, the process 1000 may end (block 1040).

[0125] Fig. 10B is a flow chart that outlines an alternative process of using a virtual tether to move an audio object. Figs. 10C-10E show examples of the process outlined in Fig. 10B. Referring first to Fig. 10B, process 1050 begins with block 1055, in which audio data are received. In block 1057, an indication is received to attach a virtual tether between an audio object and a cursor. The indication may be received by a logic system of an authoring apparatus and may correspond with input received from a user input device. Referring to Fig. 10C, for example, the user may position the cursor 510 over the audio object 505 and then indicate, via a user input device or a GUI, that the virtual tether 905 should be formed between the cursor 510 and the audio object 505.

[0126] Cursor and audio object position data may be received in block 1060. In block 1062, the logic system may receive an indication (via a user input device or a GUI, for example) that the audio object 505 should be held in an indicated position, e.g., a position indicated by the cursor 510. In block 1065, the logic system receives an indication that the cursor 510 has been moved to a new position, which may be displayed along with the position of the audio object 505 (block 1067). For example, referring to Fig. 10D, the cursor 510 has been moved from the left side to the right side of the virtual reproduction environment 404. However, the audio object 505 is still being held in the same position indicated in Fig. 10C. As a result, the virtual tether 905 has been substantially stretched.

[0127] In block 1069, the logic system receives an indication (via a user input device or a GUI, for example) that the audio object 505 is to be released. The logic system may compute resulting audio object position and/or trajectory data, which may be displayed (block 1075). The resulting display may be similar to that shown in Fig. 10E, which shows the audio object 505 moving smoothly and rapidly across the virtual reproduction environment 404. The logic system may save the audio object location and/or trajectory metadata in a memory system (block 1080).

[0128] In block 1085, it is determined whether the authoring process 1050 will continue. The process may continue if the logic system receives an indication that the user desires to do so. For example, the process 1050 may continue by returning to block 1055 or block 1060. Otherwise, the authoring tool may send the audio data and metadata to a rendering tool (block 1090), after which the process 1050 may end (block 1095).

[0129] To optimize the perceived verisimilitude of audio object motion, it may be desirable to let a user of an authoring tool (or a rendering tool) select a subset of the speakers in the reproduction environment and thereby limit the set of active speakers to the selected subset. In some implementations, speaker zones and/or groups of speaker zones may be designated active or inactive during an authoring or rendering operation. For example, referring to Fig. 4A, the speaker zones of the front area 405, the left area 410, the right area 415 and/or the upper area 420 may be controlled as a group. The speaker zones of a back area that includes speaker zones 6 and 7 (and, in other implementations, one or more other speaker zones located between speaker zones 6 and 7) may also be controlled as a group. A user interface may be provided to dynamically enable or disable all of the speakers that correspond with a particular speaker zone, or with an area that includes multiple speaker zones.

[0130] In some implementations, the logic system of an authoring apparatus (or a rendering apparatus) may be configured to create speaker zone constraint metadata according to user input received via a user input system. The speaker zone constraint metadata may include data for disabling selected speaker zones. Some such implementations will now be described with reference to Figs. 11 and 12.

[0131] Fig. 11 shows an example of applying speaker zone constraints in a virtual reproduction environment. In some such implementations, a user may be able to select speaker zones by clicking on their representations in a GUI, such as GUI 400, using a user input device such as a mouse. Here, the user has disabled speaker zones 4 and 5, on the sides of the virtual reproduction environment 404. Speaker zones 4 and 5 may correspond with most (or all) of the speakers in a physical reproduction environment, such as a cinema sound system environment. In this example, the user has also constrained the positions of the audio object 505 to positions along the line 1105. With most or all of the speakers along the side walls disabled, a pan from the screen 150 to the rear of the virtual reproduction environment 404 will be constrained not to use the side speakers. This can create an improved perception of front-to-back motion for a wide listening area, particularly for listeners who are seated near reproduction speakers corresponding with speaker zones 4 and 5.

[0132] In some implementations, speaker zone constraints may be carried through all rendering modes. For example, speaker zone constraints may be carried through in situations where fewer zones are available for rendering, e.g., when rendering for a Dolby Surround 7.1 or 5.1 configuration exposing only 7 or 5 zones. Speaker zone constraints may also be carried through in situations where more zones are available for rendering. As such, the speaker zone constraints can also be seen as a way of guiding re-rendering, providing a non-blind solution to the traditional "upmixing/downmixing" process.

[0133] Fig. 12 is a flow chart that outlines some examples of applying speaker zone constraint rules. Process 1200 begins with block 1205, in which one or more indications are received to apply speaker zone constraint rules. The indication(s) may be received by a logic system of an authoring or rendering apparatus and may correspond with input received from a user input device. For example, the indications may correspond with a user's selection of one or more speaker zones to de-activate. In some implementations, block 1205 may involve receiving an indication of what type of speaker zone constraint rules should be applied, e.g., as described below.

[0134] In block 1207, audio data are received by an authoring tool. Audio object position data may be received (block 1210), e.g., according to input from a user of the authoring tool, and displayed (block 1215). In this example, the position data are (x,y,z) coordinates. The active and inactive speaker zones for the selected speaker zone constraint rules are also displayed in block 1215. In block 1220, the audio data and associated metadata are saved. In this example, the metadata include audio object position metadata and speaker zone constraint metadata, which may include a speaker zone identification flag.

[0135] In some implementations, the speaker zone constraint metadata may indicate that a rendering tool should apply panning equations to compute gains in a binary fashion, e.g., by regarding all speakers of the selected (disabled) speaker zones as being "off" and all other speaker zones as being "on." The logic system may be configured to create speaker zone constraint metadata that include data for disabling the selected speaker zones.

[0136] In alternative implementations, the speaker zone constraint metadata may indicate that the rendering tool should apply panning equations to compute gains in a blended fashion that includes some degree of contribution from speakers of the disabled speaker zones. For example, the logic system may be configured to create speaker zone constraint metadata indicating that the rendering tool should attenuate the selected speaker zones by performing the following operations: computing first gains that include contributions from the selected (disabled) speaker zones; computing second gains that do not include contributions from the selected speaker zones; and blending the first gains with the second gains. In some implementations, a bias may be applied to the first gains and/or the second gains (e.g., from a selected minimum value to a selected maximum value) in order to allow a range of potential contributions from the selected speaker zones.
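The blended approach described above might be sketched as follows in Python. This is an illustrative sketch only: the blend weight, the renormalization steps and the input gain values are hypothetical, and the panning equations that would produce the input gains are outside the fragment.

```python
import math

def blend_constrained_gains(all_gains, zone_disabled, blend=0.2,
                            min_contrib=0.0, max_contrib=1.0):
    """Blend first gains (every zone contributes) with second gains
    (disabled zones zeroed, remainder renormalized), preserving total
    power. blend=0 reduces to the binary 'off' behavior; blend=1 keeps
    the unconstrained gains."""
    # First gains: contributions from every zone, including disabled ones.
    first = list(all_gains)
    # Second gains: disabled zones removed, remaining power renormalized.
    second = [0.0 if zone_disabled[i] else g for i, g in enumerate(first)]
    norm = math.sqrt(sum(g * g for g in second))
    if norm > 0.0:
        second = [g / norm for g in second]
    # Clamp the blend weight into the allowed contribution range, then mix.
    w = min(max(blend, min_contrib), max_contrib)
    mixed = [w * f + (1.0 - w) * s for f, s in zip(first, second)]
    # Renormalize so overall power stays constant.
    norm = math.sqrt(sum(g * g for g in mixed))
    return [g / norm for g in mixed] if norm > 0.0 else mixed

# Four zones; zones 3 and 4 are disabled but still contribute slightly:
gains = blend_constrained_gains([0.6, 0.6, 0.4, 0.33],
                                [False, False, True, True], blend=0.25)
```

The clamping of the blend weight corresponds to the selected minimum and maximum values mentioned above, bounding how much the disabled zones may still contribute.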

[0137] In this example, the authoring tool sends the audio data and metadata to the rendering tool in block 1225. The logic system may then determine whether the authoring process will continue (block 1227). The authoring process may continue if the logic system receives an indication that the user desires to do so. Otherwise, the authoring process may end (block 1229). In some implementations, rendering operations may continue, according to user input.

[0138] Audio objects, including audio data and metadata created by the authoring tool, are received by the rendering tool in block 1230. Position data for a particular audio object are received in block 1235 in this example. The logic system of the rendering tool may apply panning equations to the audio object position data to compute gains according to the speaker zone constraint rules.

[0139] In block 1245, the computed gains are applied to the audio data. The logic system may save the gains, the audio object position metadata and the speaker zone constraint metadata in a memory system. In some implementations, the audio data may be reproduced by a speaker system. Corresponding speaker responses may be shown on a display in some implementations.

[0140] In block 1248, it is determined whether process 1200 will continue. The process may continue if the logic system receives an indication that the user desires to do so. For example, the rendering process may continue by returning to block 1230 or block 1235. If an indication is received that the user desires to return to the corresponding authoring process, the process may return to block 1207 or block 1210. Otherwise, the process 1200 may end (block 1250).

[0141] The tasks of positioning and rendering audio objects in a three-dimensional virtual reproduction environment are becoming increasingly complex. Part of the difficulty relates to the challenges of representing the virtual reproduction environment in a GUI. Some authoring and rendering implementations provided in this disclosure allow a user to switch between two-dimensional screen-space panning and three-dimensional panning. Such functionality can help to preserve the accuracy of audio object positioning while providing a GUI that is convenient for the user.

[0142] Figs. 13A and 13B show examples of a GUI that can switch between a two-dimensional view and a three-dimensional view of a virtual reproduction environment. Referring to Fig. 13A, the GUI 400 depicts an image 1305 on the screen. In this example, the image 1305 is an image of a saber-toothed tiger. In this top view of the virtual reproduction environment 404, a user can readily observe that the audio object 505 is within speaker zone 1. The elevation may be inferred, e.g., from the size, the color or some other attribute of the audio object 505. However, the relationship of this position to that of the image 1305 may be difficult to determine in this view.

[0143] In this example, the GUI 400 may be dynamically rotated about an axis, such as the axis 1310. Fig. 13B shows the GUI 1300 after the rotation process. In this view, the user can see the image 1305 more clearly and can use information from the image 1305 to position the audio object 505 more accurately. In this example, the audio object corresponds to a sound towards which the saber-toothed tiger is looking. Being able to switch between the top view and a screen view of the virtual reproduction environment 404 allows the user to quickly and accurately select the proper elevation for the audio object 505, using information from the on-screen material.

[0144] Various other convenient GUIs for authoring and/or rendering are provided in this disclosure. Figs. 13C-13E show combinations of two-dimensional and three-dimensional depictions of reproduction environments. Referring first to Fig. 13C, a top view of the virtual reproduction environment 404 is depicted in a left area of the GUI 1310. The GUI 1310 also includes a three-dimensional depiction 1345 of a virtual (or actual) reproduction environment. Area 1350 of the three-dimensional depiction 1345 corresponds with the screen 150 of the GUI 400. The position of the audio object 505, particularly its elevation, may be clearly seen in the three-dimensional depiction 1345. The width of the audio object 505 is also shown in the three-dimensional depiction 1345 in this example.

[0145] The speaker layout 1320 depicts the speaker locations 1324-1340, each of which can indicate a gain corresponding to the position of the audio object 505 in the virtual reproduction environment 404. In some implementations, the speaker layout 1320 may, for example, depict the reproduction speaker locations of an actual reproduction environment, such as a Dolby Surround 5.1 configuration, a Dolby Surround 7.1 configuration, a Dolby 7.1 configuration augmented with overhead speakers, etc. When a logic system receives an indication of the position of the audio object 505 in the virtual reproduction environment 404, the logic system may be configured to map this position to gains for the speaker locations 1324-1340 of the speaker layout 1320, e.g., by the above-described amplitude panning process. For example, in Fig. 13C, the speaker locations 1325, 1335 and 1337 each have a change in color indicating gains corresponding to the position of the audio object 505.
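One simple way to map an audio object position to per-speaker gains can be sketched as follows. This fragment is not the amplitude panning process of this disclosure; the inverse-distance law, the rolloff exponent and the constant-power normalization are hypothetical choices used only to illustrate the position-to-gains mapping:

```python
import math

def amplitude_pan_gains(obj_pos, speaker_positions, rolloff=2.0):
    """Map an audio object position to per-speaker gains using a
    distance-based amplitude law, normalized to constant power.
    Illustrative only; not the panning equations of the disclosure."""
    raw = []
    for spk in speaker_positions:
        d = math.dist(obj_pos, spk)
        # Nearer speakers receive larger raw gains.
        raw.append(1.0 / (d ** rolloff) if d > 1e-9 else 1e9)
    norm = math.sqrt(sum(g * g for g in raw))
    return [g / norm for g in raw]

# Object near the front-left speaker of a unit-square layout:
speakers = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
gains = amplitude_pan_gains((0.1, 0.1), speakers)
```

Displaying such gains (e.g., as color or brightness changes at each speaker location, as in Figs. 13C-13E) gives the user immediate feedback on how a position maps onto the layout.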

[0146] Referring to Fig. 13D, the audio object has been moved to a position behind the screen 150. For example, a user may have moved the audio object 505 by placing a cursor on the audio object in the GUI 400 and dragging it to a new position. The new position is also shown in the three-dimensional depiction 1345, which has been rotated to a new orientation. The responses of the speaker layout 1320 may appear substantially the same in Figs. 13C and 13D. However, in an actual GUI, the speaker locations 1325, 1335 and 1337 may have a different appearance (such as a different brightness or color) to indicate the corresponding gain differences caused by the new position of the audio object 505.

[0147] Referring to Fig. 13E, the audio object 505 has been moved rapidly to a position in the right rear portion of the virtual reproduction environment 404. At the moment depicted in Fig. 13E, the speaker location 1326 is responding to the current position of the audio object 505, while the speaker locations 1325 and 1337 are still responding to the former position of the audio object 505.

[0148] Fig. 14A is a flow chart that outlines a process of controlling an apparatus to present GUIs such as those shown in Figs. 13C-13E. Process 1400 begins with block 1405, in which one or more indications are received to display audio object locations, speaker zone locations and reproduction speaker locations for a reproduction environment. The speaker zone locations may correspond with a virtual reproduction environment and/or an actual reproduction environment, e.g., as shown in Figs. 13C-13E. The indication(s) may be received by a logic system of a rendering and/or authoring apparatus and may correspond with input received from a user input device. For example, the indications may correspond with a user's selection of a reproduction environment configuration.

[0149] In block 1407, audio data are received. Audio object position data and width are received in block 1410, e.g., according to user input. In block 1415, the audio object, the speaker zone locations and the reproduction speaker locations are displayed. The audio object position may be displayed in two-dimensional and/or three-dimensional views, e.g., as shown in Figs. 13C-13E. The width data may be used not only for audio object rendering, but may also affect how the audio object is displayed (see the depiction of the audio object 505 in the three-dimensional depiction 1345 of Figs. 13C-13E).

[0150] The audio data and associated metadata may be recorded. (Block 1420.) In block 1425, the authoring tool sends the audio data and metadata to a rendering tool. The logic system may then determine (block 1427) whether the authoring process will continue. The authoring process may continue (e.g., by returning to block 1405) if the logic system receives an indication that the user desires to do so. Otherwise, the authoring process may end. (Block 1429.)

[0151] Audio objects, including audio data and metadata created by the authoring tool, are received by the rendering tool in block 1430. Position data for a particular audio object are received in block 1435 in this example. The logic system of the rendering tool may apply panning equations to compute gains for the audio object position data, according to the width metadata.

[0152] In some rendering implementations, the logic system may map the speaker zones to reproduction speakers of the reproduction environment. For example, the logic system may access a data structure that includes speaker zones and corresponding reproduction speaker locations. More details and examples are described below with reference to Fig. 14B.

[0153] In some implementations, panning equations may be applied, e.g., by the logic system, according to the audio object position, width and/or other information, such as the speaker locations of the reproduction environment (block 1440). In block 1445, the audio data are processed according to the gains computed in block 1440. At least some of the resulting audio data may be stored, if so desired, along with the corresponding audio object position data and other metadata received from the authoring tool. The audio data may be reproduced by speakers.

[0154] The logic system may then determine (block 1448) whether the process 1400 will continue. The process 1400 may continue if, for example, the logic system receives an indication that the user desires to do so. Otherwise, the process 1400 may end (block 1449).

[0155] Fig. 14B is a flow chart that outlines a process of rendering audio objects for a reproduction environment. Process 1450 begins with block 1455, in which one or more indications are received to render audio objects for a reproduction environment. The indication(s) may be received by a logic system of a rendering apparatus and may correspond with input received from a user input device. For example, the indications may correspond with a user's selection of a reproduction environment configuration.

[0156] In block 1457, audio data (including one or more audio objects and associated metadata) are received. Reproduction environment data may be received in block 1460. The reproduction environment data may include an indication of the number of reproduction speakers in the reproduction environment and an indication of the location of each reproduction speaker within the reproduction environment. The reproduction environment may be a cinema sound system environment, a home theater environment, etc. In some implementations, the reproduction environment data may include reproduction speaker zone layout data indicating reproduction speaker zones and reproduction speaker locations that correspond with the speaker zones.

[0157] The reproduction environment may be displayed in block 1465. In some implementations, the reproduction environment may be displayed in a manner similar to the speaker layout 1320 shown in Figs. 13C-13E.

[0158] In block 1470, audio objects may be rendered into one or more speaker feed signals for the reproduction environment. In some implementations, the metadata associated with the audio objects may have been authored as described above, such that the metadata may include gain data corresponding to speaker zones (e.g., corresponding to speaker zones 1-9 of GUI 400). The logic system may map the speaker zones to reproduction speakers of the reproduction environment. For example, the logic system may access a data structure, stored in a memory, that includes speaker zones and corresponding reproduction speaker locations. The rendering apparatus may include a variety of such data structures, each corresponding to a different speaker configuration. In some implementations, the rendering apparatus may include such data structures for a variety of standard reproduction environment configurations, such as a Dolby Surround 5.1 configuration, a Dolby Surround 7.1 configuration and/or a Hamasaki 22.2 surround sound configuration.
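One possible shape for such a data structure is a per-configuration table keyed by speaker zone. The fragment below is a hypothetical sketch only: the zone-to-channel assignments and channel labels are illustrative, not an actual Dolby data structure.

```python
# Hypothetical tables mapping authoring-GUI speaker zones to
# reproduction speaker channels for two standard configurations.
# Zone assignments and channel labels are illustrative only.
ZONE_MAPS = {
    "dolby_surround_5.1": {
        1: ["L"], 2: ["C"], 3: ["R"],
        4: ["Ls"], 5: ["Rs"],
        6: ["Ls"], 7: ["Rs"],   # rear zones fold into the surrounds
    },
    "dolby_surround_7.1": {
        1: ["L"], 2: ["C"], 3: ["R"],
        4: ["Lss"], 5: ["Rss"],  # side surrounds
        6: ["Lsr"], 7: ["Rsr"],  # rear surrounds
    },
}

def speakers_for_zone(config, zone):
    """Return the reproduction speaker channels assigned to a zone
    in the given configuration (empty list if the zone is unmapped)."""
    return ZONE_MAPS[config].get(zone, [])
```

With several such tables in memory, the same authored zone gains can be rendered to whichever configuration the reproduction environment reports.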

[0159] In some implementations, the metadata for the audio objects may include other information from the authoring process. For example, the metadata may include speaker constraint data. The metadata may include information for mapping an audio object position to a single reproduction speaker location or to a single speaker zone location. The metadata may include data constraining the position of an audio object to a one-dimensional curve or a two-dimensional surface. The metadata may include trajectory data for an audio object. The metadata may include an identifier for content type (e.g., dialog, music or effects).

[0160] Accordingly, the rendering process may involve use of the metadata, e.g., to impose speaker zone constraints. In some such implementations, the rendering apparatus may provide the user with the option of modifying the constraints indicated by the metadata, e.g., by modifying the speaker constraints and re-rendering accordingly. The rendering may involve creating an aggregate gain based on one or more of the following: a desired audio object position, a distance from the desired audio object position to a reference position, a velocity of the audio object, or an audio object content type. The corresponding responses of the reproduction speakers may be displayed. (Block 1475.) In some implementations, the logic system may control speakers to reproduce sound corresponding to the results of the rendering process.

[0161] In block 1480, the logic system may determine whether the process 1450 will continue. The process 1450 may continue if, for example, the logic system receives an indication that the user desires to do so. For example, the process 1450 may continue by returning to block 1457 or block 1460. Otherwise, the process 1450 may end (block 1485).

[0162] Spread and apparent source width control are features of some existing surround sound authoring/rendering systems. In this disclosure, the term "spread" refers to distributing the same signal over multiple speakers in order to blur the sound image. The term "width" refers to decorrelating the output signals of each channel for apparent width control. Width may be an additional scalar value that controls the amount of decorrelation applied to each speaker feed signal.

[0163] Some implementations described in this disclosure provide spread control oriented along the three-dimensional axes. One such implementation will now be described with reference to Figs. 15A and 15B. Fig. 15A shows an example of an audio object and associated audio object width in a virtual reproduction environment. Here, the GUI 400 indicates an ellipsoid 1505 extending around the audio object 505, indicating the audio object width. The audio object width may be indicated by audio object metadata and/or received according to user input. In this example, the x and y dimensions of the ellipsoid 1505 are different, but in other implementations these dimensions may be the same. The z dimension of the ellipsoid 1505 is not shown in Fig. 15A.

[0164] Fig. 15B shows an example of a spread profile corresponding to the audio object width shown in Fig. 15A. Spread may be represented as a three-dimensional vector parameter. In this example, the spread profile 1507 can be controlled independently along three axes, e.g., according to user input. The gains along the x and y axes are represented in Fig. 15B by the respective heights of the curves 1510 and 1520. The gain for each sample 1512 is also indicated by the size of the corresponding circles 1515 within the spread profile 1507. The responses of the speakers are indicated by gray shading in Fig. 15B.

[0165] In some implementations, the spread profile 1507 may be implemented by a separable integral for each axis. According to some implementations, a minimum spread value may be set automatically as a function of speaker placement, in order to avoid timbral discrepancies when panning. Alternatively, or additionally, a minimum spread value may be set automatically as a function of the velocity of the panned audio object, such that as the audio object's velocity increases, the object becomes more spread out spatially, in the same way that rapidly moving images in a motion picture appear to blur.
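The velocity-dependent minimum spread might be sketched as follows. The linear speed-to-floor relation and all of its coefficients below are hypothetical illustrations, not values prescribed by this disclosure:

```python
def effective_spread(requested_spread, object_speed,
                     base_min=0.05, speed_coeff=0.1, max_spread=1.0):
    """Return the spread actually applied to an audio object: never
    less than a floor that grows with the object's speed, so that
    fast-moving objects blur spatially like fast-moving film images.
    All constants are illustrative only."""
    # Speed-dependent floor, capped at the maximum allowed spread.
    minimum = min(base_min + speed_coeff * object_speed, max_spread)
    return min(max(requested_spread, minimum), max_spread)

# A slow object keeps its authored spread; a fast one is widened:
slow = effective_spread(0.1, object_speed=0.2)
fast = effective_spread(0.1, object_speed=3.0)
```

A per-speaker-layout floor (the first mechanism mentioned above) could be combined with this by taking the larger of the two minimums.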

[0166] When using audio object-based rendering implementations such as those described in this disclosure, a potentially large number of audio tracks and accompanying metadata (including, as non-limiting examples, metadata indicating audio object positions in three-dimensional space) may be delivered unmixed to the reproduction environment. A real-time rendering tool may use such metadata, together with information regarding the reproduction environment, to compute the speaker feed signals for optimizing the reproduction of each audio object.

[0167] When a large number of audio objects are mixed together into the speaker outputs, overload can occur either in the digital domain (e.g., the digital signal may be clipped prior to the analog conversion) or in the analog domain, when the amplified analog signal is played back by the reproduction speakers. Both cases may result in audible distortion, which is undesirable. Overload in the analog domain may also damage the reproduction speakers.

[0168] Accordingly, some implementations described in this disclosure involve dynamic "redistribution" of objects in response to reproduction speaker overload. When audio objects are rendered with a given spread profile, in some implementations the energy may be directed to an increased number of neighboring reproduction speakers while maintaining constant overall energy. For instance, if the energy for an audio object were uniformly spread over N reproduction speakers, it may contribute to each reproduction speaker output with a gain of 1/sqrt(N). This approach provides additional mixing "headroom" and can alleviate or prevent reproduction speaker distortion, such as clipping.

[0169] To use a numerical example, suppose a speaker will clip if it receives an input greater than 1.0. Assume that two objects are indicated to be mixed into speaker A, one at level 1.0 and the other at level 0.25. If no redistribution were used, the mixed level in speaker A would total 1.25 and clipping would occur. However, if the first object is redistributed across speaker A and another speaker, then (according to some implementations) each speaker would receive the object at 0.707, resulting in additional "headroom" in speaker A for mixing additional objects. The second object can then be safely mixed into speaker A without clipping, as the mixed level for speaker A will be 0.707 + 0.25 = 0.957.
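The numerical example above can be reproduced directly. The following sketch (with hypothetical helper names and an example clip level) applies the constant-energy 1/sqrt(N) gain and compares the mixed levels with and without redistribution:

```python
import math

CLIP_LEVEL = 1.0  # a speaker clips above this input level (example value)

def spread_gain(n_speakers):
    """Per-speaker gain that preserves total energy when one object's
    energy is spread uniformly over n speakers."""
    return 1.0 / math.sqrt(n_speakers)

# Two objects destined for speaker A, at levels 1.0 and 0.25:
without_redistribution = 1.0 + 0.25       # 1.25 -> would clip

# Spread the first object over speakers A and B:
g = spread_gain(2)                        # about 0.707
with_redistribution = g * 1.0 + 0.25      # about 0.957 -> no clipping
```

Note that spreading over two speakers reduces each speaker's contribution to 0.707 rather than 0.5, because it is the total energy, not the total amplitude, that is held constant.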

[0170] In some implementations, during the authoring phase, each audio object may be mixed into a subset of the speaker zones (or into all of the speaker zones) with a given mixing gain. A dynamic list of all objects contributing to each speaker can therefore be constructed. In some implementations, this list may be sorted by decreasing energy level, e.g., by the product of the original root-mean-square (RMS) level of the signal and the mixing gain. In other implementations, the list may be sorted according to other criteria, such as the relative importance assigned to the object.

[0171] During the rendering process, if overload is detected in a given reproduction speaker output, the energy of audio objects may be spread across several reproduction speakers. For example, the energy of audio objects may be spread by using a width or spread factor that is proportional to the amount of overload and to the relative contribution of each audio object to the overloaded reproduction speaker. If the same audio object contributes to several overloaded reproduction speakers, its width or spread factor may, in some implementations, be additively increased and applied to the next rendered frame of audio data.
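The proportional, additive update just described might be sketched as follows. The proportionality constant and the cap are hypothetical, and the contribution values are assumed to come from the per-speaker object list described above:

```python
def update_spread_factors(spreads, contributions, overload,
                          gain_step=0.5, max_spread=1.0):
    """Additively increase each object's spread factor in proportion
    to the speaker overload and to the object's relative contribution
    to that speaker, for use in the next rendered frame.
    The proportionality constant is illustrative only."""
    total = sum(contributions) or 1.0
    new = []
    for s, c in zip(spreads, contributions):
        # Larger contributors to the overload are spread more.
        increment = gain_step * overload * (c / total)
        new.append(min(s + increment, max_spread))
    return new

# Speaker overloaded by 0.25; object 0 contributes most of the level:
spreads = update_spread_factors([0.1, 0.1], [1.0, 0.25], overload=0.25)
```

Applying the increased factors on the next frame, and accumulating them when an object feeds several overloaded speakers, gives the additive behavior described above.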

[0172] Generally, a hard limiter will clip any value that exceeds a threshold to the threshold value. As in the example above, if a speaker receives a mixed object at level 1.25 and can allow only a maximum level of 1.0, the object will be "hard limited" to 1.0. A soft limiter will begin applying limiting prior to reaching the absolute threshold, in order to provide a smoother, more audibly pleasing result. Soft limiters may also use a "look-ahead" feature to predict when future clipping may occur, so that the gain can be smoothly reduced before the clipping would occur, thereby avoiding clipping.
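The difference between the two limiter types can be sketched as follows. The tanh soft knee below is one common choice of limiting curve, not the specific limiter of this disclosure, and the threshold value is the example value used above:

```python
import math

def hard_limit(x, threshold=1.0):
    """Clip any value exceeding the threshold to the threshold."""
    return max(min(x, threshold), -threshold)

def soft_limit(x, threshold=1.0):
    """Begin limiting before the absolute threshold is reached, using
    a tanh knee so the output approaches the threshold asymptotically
    and small signals pass through nearly unchanged."""
    return threshold * math.tanh(x / threshold)

hard = hard_limit(1.25)   # the 1.25 mix level from the example above
soft = soft_limit(1.25)
```

A look-ahead soft limiter would additionally buffer a few milliseconds of signal and ramp its gain down before a predicted over-threshold peak arrives.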

[0173] Various redistribution implementations provided in this disclosure may be used in conjunction with a hard or soft limiter in order to limit audible distortion while avoiding degradation of spatial accuracy/sharpness. As opposed to a global spread, or the use of limiters alone, redistribution implementations may selectively target loud objects, or objects of a given content type. Such implementations may be controlled by the mixer. For example, if speaker zone constraint metadata for an audio object indicate that a subset of the reproduction speakers should not be used, the rendering apparatus may apply the corresponding speaker zone constraint rules in addition to implementing the redistribution method.

[0174] Fig. 16 is a flow chart that outlines a process of redistributing audio objects. Process 1600 begins with block 1605, in which one or more indications are received to activate audio object redistribution functionality. The indication(s) may be received by a logic system of a rendering apparatus and may correspond with input received from a user input device. In some implementations, the indications may include a user's selection of a reproduction environment configuration. In alternative implementations, the user may have previously selected a reproduction environment configuration.

[0175] In block 1607, audio data (comprising one or more audio objects and associated metadata) are received. In some implementations, the metadata may include speaker zone constraint metadata, such as described above. In this example, audio object position, time and spread data are parsed from (or otherwise received, e.g., via input from a user interface) the audio data in block 1610.

[0176] Reproduction speaker responses are determined for the playback environment configuration by applying panning equations to the audio object data, e.g., as described above (block 1612). In block 1615, the audio object position and the reproduction speaker responses are displayed. The reproduction speaker responses may also be played back through speakers that are configured for communication with the logic system.

[0177] In block 1620, the logic system determines whether an overload is detected for any reproduction speaker of the playback environment. If so, audio object redistribution rules such as those described above may be applied until no overload is detected (block 1625). The audio data output in block 1630 may be saved, if so desired, and may be output to the reproduction speakers.

[0178] In block 1635, the logic system may determine whether process 1600 is to continue. Process 1600 may continue if, for example, the logic system receives an indication that the user wishes it to do so. For example, process 1600 may continue by reverting to block 1607 or block 1610. Otherwise, process 1600 may end (block 1640).
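The redistribution step of block 1625 might look roughly like the following single-pass sketch. The equal split of excess energy among neighbors and the `neighbors` adjacency list are illustrative assumptions; a complete implementation would repeat the pass until no speaker remains overloaded, as the loop in Fig. 16 suggests.

```python
import numpy as np

def redistribute(gains, neighbors, max_gain=1.0):
    # Spread the excess energy of any overloaded speaker to its neighbors,
    # keeping the total energy (the sum of squared gains) constant.
    g = np.array(gains, dtype=float)
    for i in range(len(g)):
        if g[i] > max_gain:
            excess = g[i] ** 2 - max_gain ** 2   # energy above the cap
            g[i] = max_gain
            share = excess / len(neighbors[i])   # split equally (assumption)
            for j in neighbors[i]:
                g[j] = np.sqrt(g[j] ** 2 + share)
    return g
```

Note that a single pass can itself push a neighbor over the cap, which is why block 1625 iterates until no overload is detected.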

[0179] Some implementations provide predetermined panning gain equations that can be used to image the position of an audio object in three-dimensional space. Some examples will be described below with reference to Figs. 17A and 17B. Figs. 17A and 17B show examples of an audio object positioned in a three-dimensional virtual playback environment. Referring first to Fig. 17A, the position of audio object 505 may be seen within virtual playback environment 404. In this example, speaker zones 1-7 lie in one plane and speaker zones 8 and 9 lie in another plane, as shown in Fig. 17B. However, the numbers of speaker zones, planes, etc., are given merely as examples; the concepts described in this disclosure may be extended to different numbers of speaker zones (or speakers) and to more than two elevation planes.

[0180] In this example, an elevation parameter "z," which may range from zero to 1, maps the position of the audio object to the elevation planes. In this example, the value z=0 corresponds to the base plane, which contains speaker zones 1-7, whereas the value z=1 corresponds to the overhead plane, which contains speaker zones 8 and 9. Values of z between zero and 1 correspond to a blending between a sound image generated using only the speakers in the base plane and a sound image generated using only the speakers in the overhead plane.

[0181] In the example shown in Fig. 17B, the elevation parameter for audio object 505 has a value of 0.6. Accordingly, in one implementation a first sound image may be generated using panning equations for the base plane, according to the (x,y) coordinates of audio object 505 in the base plane. A second sound image may be generated using panning equations for the overhead plane, according to the (x,y) coordinates of audio object 505 in the overhead plane. A resulting sound image may be produced by combining the first sound image with the second sound image, according to the proximity of audio object 505 to each plane. An energy- or amplitude-preserving function of the elevation z may be applied. For example, assuming that z may range from zero to one, the gain values of the first sound image may be multiplied by cos(z·π/2) and the gain values of the second sound image may be multiplied by sin(z·π/2), so that the sum of their squares is 1 (energy-preserving).
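The energy-preserving elevation blend described above can be written directly from the formulas in the text:

```python
import math

def elevation_gains(z):
    # z = 0 -> base plane only; z = 1 -> overhead plane only.
    # cos^2 + sin^2 = 1, so the total energy is preserved for any z.
    g_base = math.cos(z * math.pi / 2)
    g_overhead = math.sin(z * math.pi / 2)
    return g_base, g_overhead
```

For the example of Fig. 17B (z = 0.6), the overhead-plane image receives more energy than the base-plane image, matching the object's proximity to the upper plane.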

[0182] Other implementations described in this disclosure may involve computing gains on the basis of two or more panning methods and creating an aggregate gain based on one or more parameters. Such parameters may include one or more of the following: the desired position of the audio object; the distance from the desired position of the audio object to its original position; the velocity of the audio object; or the content type of the audio object.

[0183] Some such implementations will be described below with reference to Fig. 18 et seq. Fig. 18 shows examples of zones that correspond to different panning modes. The sizes, shapes and extents of these zones are given merely as examples. In this example, near-field panning methods are applied to audio objects located within zone 1805, and far-field panning methods are applied to audio objects located within zone 1810.

[0184] Figs. 19A-19D show examples of applying near-field and far-field panning methods to audio objects at different locations. Referring first to Fig. 19A, the audio object is substantially outside virtual playback environment 1900. This location corresponds to zone 1815 of Fig. 18. Therefore, one or more far-field panning methods will be applied in this case. In some implementations, the far-field panning methods may be based on vector-based amplitude panning (VBAP) equations known by those of ordinary skill in the art. For example, the far-field panning methods may be based on the VBAP equations described in Section 2.3, page 4 of V. Pulkki, Compensating Displacement of Amplitude-Panned Virtual Sources (AES International Conference on Virtual, Synthetic and Entertainment Audio), which is hereby incorporated by reference. In alternative implementations, other methods may be used for panning far-field and near-field audio objects, e.g., methods that involve the synthesis of corresponding acoustic plane or spherical waves. Relevant methods are described in D. de Vries, Wave Field Synthesis (AES Monograph 1999), which is hereby incorporated by reference.
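For a flavor of what the cited pairwise VBAP equations do, here is a minimal two-dimensional sketch (an illustration only, not the method of this disclosure): the source direction is expressed as a linear combination of the two adjacent speaker direction vectors, and the resulting gains are normalized to preserve energy.

```python
import numpy as np

def vbap_2d(source_deg, spk_a_deg, spk_b_deg):
    def unit(deg):
        a = np.radians(deg)
        return np.array([np.cos(a), np.sin(a)])
    # Solve p = g_a * l_a + g_b * l_b for the two speaker gains...
    basis = np.column_stack([unit(spk_a_deg), unit(spk_b_deg)])
    g = np.linalg.solve(basis, unit(source_deg))
    # ...then normalize so that g_a^2 + g_b^2 = 1 (energy preservation).
    return g / np.linalg.norm(g)
```

A source midway between speakers at +/-30 degrees receives equal gains of 1/sqrt(2) from each; a source aligned with one speaker receives all of the gain.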

[0185] Referring to Fig. 19B, the audio object is inside virtual playback environment 1900. Its location corresponds to zone 1805 of Fig. 18. Therefore, one or more near-field panning methods will be applied in this case. Some such near-field panning methods will use a number of speaker zones enclosing audio object 505 in virtual playback environment 1900.

[0186] In some implementations, a near-field panning method may involve "dual-balance" panning and combining two sets of gains. In the example shown in Fig. 19B, the first set of gains corresponds to a front/back balance between two sets of speaker zones, according to the position of audio object 505 along the y axis. The corresponding responses involve all speaker zones of virtual playback environment 1900, except for speaker zones 1915 and 1960.

[0187] In the example shown in Fig. 19C, a second set of gains corresponds to a left/right balance between two sets of speaker zones, according to the position of audio object 505 along the x axis. The corresponding responses involve speaker zones 1905-1925. Fig. 19D indicates the result of combining the responses indicated in Figs. 19B and 19C.

[0188] It may be desirable to blend different panning modes as an audio object enters or leaves virtual playback environment 1900. Accordingly, a blend of gains computed according to near-field panning methods and far-field panning methods may be applied to audio objects located in zone 1810 (see Fig. 18). In some implementations, a pairwise panning law (e.g., an energy-preserving sine or power law) may be used to blend the gains computed according to the near-field and far-field panning methods. In alternative implementations, the pairwise panning law may be amplitude-preserving rather than energy-preserving, so that the sum of the gains equals one instead of the sum of their squares equaling one. It is also possible to blend the resulting processed signals, for example by processing the audio signal using both panning methods independently and cross-fading the two resulting audio signals.
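The blending of the two gain sets in transition zone 1810 can be sketched as a pairwise crossfade law. Parametrizing the position within the transition zone by a single value `t` is an assumption made only for illustration:

```python
import math

def blend_gains(g_near, g_far, t, energy_preserving=True):
    # t = 0 -> fully near-field; t = 1 -> fully far-field.
    if energy_preserving:
        a = math.cos(t * math.pi / 2)  # a^2 + b^2 = 1 (energy-preserving)
        b = math.sin(t * math.pi / 2)
    else:
        a, b = 1.0 - t, t              # a + b = 1 (amplitude-preserving)
    return [a * gn + b * gf for gn, gf in zip(g_near, g_far)]
```

At the zone boundaries the blend degenerates to the pure near-field or far-field gains, so no audible discontinuity occurs as the object crosses into or out of zone 1810.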

[0189] It may be desirable to provide a mechanism that allows the content creator and/or the content reproducer to fine-tune the various re-renderings of a given authored trajectory. In the context of motion picture mixing, the concept of screen-to-room energy balance is considered important. In some instances, an automatic re-rendering of a given sound trajectory (or "pan") will result in a different screen-to-room balance, depending on the number of reproduction speakers in the playback environment. According to some implementations, the screen-to-room bias may be controlled according to metadata created during the authoring process. According to alternative implementations, the screen-to-room bias may be controlled solely on the rendering side (i.e., under the control of the content reproducer), and not in response to metadata.

[0190] Accordingly, some implementations described in this disclosure provide one or more forms of screen-to-room bias control. In some such implementations, the screen-to-room bias may be implemented as a scaling operation. For example, the scaling operation may involve scaling the original intended front-to-back trajectory of an audio object and/or scaling the speaker positions used in the renderer to determine the panning gains. In some such implementations, the screen-to-room bias control may be a variable value between zero and a maximum value (e.g., one). The variation may be controlled, for example, with a GUI, a virtual or physical slider, a knob, etc.
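As one hypothetical example of such a scaling operation, the front-to-back coordinate of an object's trajectory could be scaled toward the screen before the panning gains are computed. The coordinate convention (y = 0 at the screen, y = 1 at the back wall) and the linear scaling are assumptions made only for illustration:

```python
def apply_screen_bias(y, bias):
    # y = 0 at the screen, y = 1 at the back wall; bias in [0, 1].
    # bias = 0 leaves the trajectory unchanged; bias = 1 collapses it
    # entirely onto the screen plane.
    return y * (1.0 - bias)
```

An analogous function with the roles of screen and room reversed would implement a bias toward the back of the room.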

[0191] Alternatively, or additionally, screen-to-room bias control may be implemented using some form of speaker zone constraint. Fig. 20 indicates speaker zones of a playback environment that may be used in a screen-to-room bias control process. In this example, a front speaker area 2005 and a back speaker area 2010 (or 2015) may be established. The screen-to-room bias may be adjusted as a function of the selected speaker areas. In some such implementations, the screen-to-room bias may be implemented as a scaling operation between front speaker area 2005 and back speaker area 2010 (or 2015). In alternative implementations, the screen-to-room bias may be implemented in a binary fashion, e.g., by allowing the user to select a front-side bias, a back-side bias or no bias. The bias settings for each case may correspond to predetermined (and generally non-zero) bias levels for front speaker area 2005 and back speaker area 2010 (or 2015). In essence, such implementations may provide three presets for screen-to-room bias control instead of (or in addition to) a continuous-valued scaling operation.

[0192] According to some such implementations, two additional logical speaker zones may be created in an authoring GUI (e.g., 400) by splitting the side walls into a front side wall and a back side wall. In some implementations, the two additional logical speaker zones correspond to the left wall/left surround and right wall/right surround areas of the renderer. Depending on the user's selection of which of these two logical speaker zones is active, the rendering tool may apply preset scaling factors (e.g., as described above) when rendering to Dolby 5.1 or Dolby 7.1 configurations. The rendering tool may also apply such preset scaling factors when rendering for playback environments that do not support the definition of these two additional logical zones, e.g., because their physical speaker configurations have no more than one physical speaker on a side wall.

[0193] Fig. 21 is a block diagram that shows examples of components of an authoring and/or rendering apparatus. In this example, device 2100 includes an interface system 2105. The interface system 2105 may include a network interface, such as a wireless network interface. Alternatively, or additionally, the interface system 2105 may include a universal serial bus (USB) interface or another such interface.

[0194] Device 2100 includes a logic system 2110. The logic system 2110 may include a processor, such as a general-purpose single- or multi-chip processor. The logic system 2110 may include a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, discrete gate or transistor logic, or discrete hardware components, or combinations thereof. The logic system 2110 may be configured to control the other components of device 2100. Although no interfaces between the components of device 2100 are shown in Fig. 21, the logic system 2110 may be configured with interfaces for communication with the other components. The other components may or may not be configured for communication with one another, as appropriate.

[0195] The logic system 2110 may be configured to perform audio authoring and/or rendering functionality, including but not limited to the types of audio authoring and/or rendering functionality described in this disclosure. In some such implementations, the logic system 2110 may be configured to operate (at least in part) according to software stored on one or more permanent storage media. The permanent storage media may include memory associated with the logic system 2110, such as random access memory (RAM) and/or read-only memory (ROM). The permanent storage media may include memory of the memory system 2115. The memory system 2115 may include one or more suitable types of permanent storage media, such as flash memory, a hard drive, a magnetic disk, etc.

[0196] The display system 2130 may include one or more suitable types of display, depending on the nature of device 2100. For example, the display system 2130 may include a liquid crystal display, a plasma display, a bistable display, etc.

[0197] The user input system 2135 may include one or more devices configured to receive input from a user. In some implementations, the user input system 2135 may include a touch screen that overlays a display of the display system 2130. The user input system 2135 may include a mouse, a trackball, a gesture detection system, a joystick, one or more GUIs and/or menus presented on the display system 2130, buttons, a keyboard, switches, etc. In some implementations, the user input system 2135 may include a microphone 2125: a user may provide voice commands to device 2100 via the microphone 2125. The logic system may be configured for speech recognition and for controlling at least some operations of device 2100 according to such voice commands.

[0198] The power system 2140 may include one or more suitable batteries, such as a nickel-cadmium battery or a lithium-ion battery. The power system 2140 may be configured to receive power from an electrical outlet.

[0199] Fig. 22A is a block diagram that shows some components that may be used for audio content creation. The system 2200 may, for example, be used for audio content creation in mixing studios and/or dubbing stages. In this example, the system 2200 includes an audio and metadata authoring tool 2205 and a rendering tool 2210. In this implementation, the authoring tool 2205 and the rendering tool 2210 include audio connect interfaces 2207 and 2212, respectively, which may be configured for communication via AES/EBU, MADI, analog, etc. The authoring tool 2205 and the rendering tool 2210 include network interfaces 2209 and 2217, respectively, which may be configured to send and receive metadata via TCP/IP or any other suitable protocol. The interface 2220 is configured to output audio data to speakers.

[0200] The system 2200 may, for example, include an existing authoring system, such as a Pro Tools™ system, running a metadata creation tool (i.e., a panner as described in this disclosure) as a plugin. The panner could also run on a standalone system (e.g., a PC or a mixing console) connected to the rendering tool 2210, or could run on the same physical device as the rendering tool 2210. In the latter case, the panner and the renderer could use a local connection, e.g., through shared memory. The panner GUI could also be provided remotely on a tablet device, a laptop, etc. The rendering tool 2210 may include a rendering system that includes a sound processing device configured to run rendering software. The rendering system may include, for example, a personal computer, a laptop, etc., that includes interfaces for audio input/output and an appropriate logic system.

[0201] Fig. 22B is a block diagram that shows some components that may be used for audio playback in a playback environment (e.g., a movie theater). In this example, the system 2250 includes a cinema server 2255 and a rendering system 2260. The cinema server 2255 and the rendering system 2260 include network interfaces 2257 and 2262, respectively, which may be configured to send and receive audio objects via TCP/IP or any other suitable protocol. The interface 2264 is configured to output audio data to speakers.

[0202] Various modifications to the implementations described in this disclosure may be readily apparent to those of ordinary skill in the art. The general principles defined in this disclosure may be applied to other implementations without departing from the spirit or scope of this disclosure. Thus, the invention is not intended to be limited to the implementations shown herein, but is to be accorded the widest scope consistent with this disclosure, the principles and the novel features disclosed herein.

1. A device, comprising:
an interface system; and
a logic system configured for:
receiving, via the interface system, audio data comprising one or more audio objects and associated metadata, wherein the associated metadata include trajectory data for at least one of the one or more audio objects, indicating a time-variable position of the at least one audio object within a three-dimensional space, wherein the audio object position is restricted to a two-dimensional surface, and wherein the audio data were authored with reference to a virtual playback environment comprising multiple speaker zones at different elevations;
receiving, via the interface system, playback environment data comprising an indication of a number of reproduction speakers in an actual three-dimensional playback environment and an indication of the location of each reproduction speaker within the actual playback environment;
mapping the audio data, authored with reference to the speaker zones of the virtual playback environment, to the reproduction speakers of the actual playback environment; and
rendering the one or more audio objects into one or more speaker feed signals based, at least in part, on the associated metadata, wherein each speaker feed signal corresponds to at least one of the reproduction speakers within the actual playback environment.

2. The device according to claim 1, wherein the playback environment comprises a cinema sound system environment.

3. The device according to claim 1, wherein the actual playback environment comprises a 7.1 configuration.

4. The device according to claim 1, wherein the actual playback environment data comprise reproduction speaker layout data indicating reproduction speaker locations.

5. The device according to claim 1, wherein the actual playback environment data comprise reproduction speaker zone layout data indicating reproduction speaker locations.

6. The device according to claim 5, wherein the metadata include information for mapping an audio object position to a single reproduction speaker location.

7. The device according to claim 1, wherein the rendering involves creating aggregate gains based on one or more of the following: a desired audio object position, a distance from the desired audio object position to its original position, a velocity of the audio object or an audio object content type.

8. The device according to claim 1, wherein the two-dimensional surface comprises one of a spherical surface, an elliptical surface, a conical surface, a cylindrical surface or a wedge.

9. The device according to claim 1, wherein the rendering involves imposing speaker zone constraints that comprise data for disabling selected reproduction speakers.

10. The device according to claim 1, wherein the actual playback environment comprises a screen for projecting video images, the audio data being synchronized with the video images, and wherein the rendering involves applying screen-to-room balance control according to screen-to-room balance control data received from a user input system.

11. The device according to claim 1, further comprising a display system, wherein the logic system is configured to control the display system to display a dynamic three-dimensional view of the actual playback environment.

12. The device according to claim 1, wherein the rendering involves controlling audio object spread in one or more of three dimensions across multiple reproduction speakers.

13. The device according to claim 1, wherein the rendering involves dynamic object redistribution in response to speaker overload, by spreading sound energy across an increased number of neighboring reproduction speakers while keeping the total energy constant.

14. The device according to claim 1, wherein the rendering involves imaging audio object positions onto planes of speaker arrays of the actual playback environment.

15. The device according to claim 1, further comprising a storage device, wherein the interface system comprises an interface between the logic system and the storage device.

16. The device according to claim 1, wherein the interface system comprises a network interface.

17. The device according to claim 1, wherein the logic system is configured to determine whether to apply panning rules for an audio object position across multiple speaker locations or to map the audio object position to a single speaker location.

18. The device according to claim 17, wherein the logic system is configured to smooth transitions between speaker gains when transitioning between mapping an audio object position from a first single speaker location to a second single speaker location.

19. The device according to claim 17, wherein the logic system is configured to smooth transitions between speaker gains when transitioning between mapping an audio object position to a single speaker location and applying panning rules for the audio object position across multiple speaker locations.

20. The device according to any one of claims 1-19, wherein the logic system is further configured to compute speaker gains corresponding to the multiple speaker zones.

21. The device according to claim 20, wherein the logic system is further configured to compute speaker gains for audio object positions along a one-dimensional curve between virtual speaker positions.

22. A method, comprising:
receiving audio data comprising one or more audio objects and associated metadata, wherein the associated metadata include trajectory data for at least one of the one or more audio objects, indicating a time-variable position of the at least one audio object within a three-dimensional space, wherein the audio object position is restricted to a two-dimensional surface, and wherein the audio data were authored with reference to a virtual playback environment comprising multiple speaker zones at different elevations;
receiving playback environment data comprising an indication of a number of reproduction speakers in an actual three-dimensional playback environment and an indication of the location of each reproduction speaker within the actual playback environment;
mapping the audio data, authored with reference to the speaker zones of the virtual playback environment, to the reproduction speakers of the actual playback environment; and
rendering the one or more audio objects into one or more speaker feed signals based, at least in part, on the associated metadata, wherein each speaker feed signal corresponds to at least one of the reproduction speakers within the actual playback environment.

23. The method according to claim 22, wherein the actual playback environment comprises a cinema sound system environment.

24. The method according to claim 22, wherein the rendering involves creating aggregate gains based on one or more of the following: a desired audio object position, a distance from the desired audio object position to its original position, a velocity of the audio object or an audio object content type.

25. The method according to claim 22, wherein the rendering involves imposing speaker zone constraints that comprise data for disabling selected reproduction speakers.

26. A permanent data carrier having software stored thereon, the software including instructions for performing the following operations:
receiving audio data comprising one or more audio objects and associated metadata, wherein the associated metadata include trajectory data for at least one of the one or more audio objects, indicating a time-variable position of the at least one audio object within a three-dimensional space, wherein the audio object position is restricted to a two-dimensional surface, and wherein the audio data were authored with reference to a virtual playback environment comprising multiple speaker zones at different elevations;
receiving playback environment data comprising an indication of a number of reproduction speakers in an actual three-dimensional playback environment and an indication of the location of each reproduction speaker within the actual playback environment;
mapping the audio data, authored with reference to the speaker zones of the virtual playback environment, to the reproduction speakers of the actual playback environment; and
rendering the one or more audio objects into one or more speaker feed signals based, at least in part, on the associated metadata, wherein each speaker feed signal corresponds to at least one of the reproduction speakers within the actual playback environment.

27. The permanent data carrier according to claim 26, wherein the actual playback environment comprises a cinema sound system environment.

28. The permanent data carrier according to claim 26, wherein the rendering involves creating aggregate gains based on one or more of the following: a desired audio object position, a distance from the desired audio object position to its original position, a velocity of the audio object or an audio object content type.

29. The permanent data carrier according to claim 26, wherein the rendering involves imposing speaker zone constraints that comprise data for disabling selected reproduction speakers.

30. The permanent data carrier according to claim 26, wherein the rendering involves dynamic object redistribution in response to speaker overload, by spreading sound energy across an increased number of neighboring reproduction speakers while keeping the total energy constant.

31. A device (2100) for authoring an audio object, the device (2100) comprising:
an interface system (2105);
a user input system (2135);
a display system (2130); and
a logic system (2110) configured for:
receiving audio data via the interface system;
displaying a virtual playback environment in a graphical user interface on the display system (2130), the virtual playback environment comprising multiple speaker zones at different elevations;
receiving user input regarding an audio object position via the user input system;
determining trajectory data indicating a time-variable position of the audio object within a three-dimensional space according to the user input received via the user input system, wherein the determining involves restricting the time-variable position to a two-dimensional surface within the three-dimensional space, and wherein the audio object comprises audio data;
displaying an audio object trajectory according to the trajectory data in the graphical user interface; and
creating metadata associated with the audio object, wherein the metadata include the trajectory data.

32. The device according to claim 31, wherein the two-dimensional surface comprises one of a spherical surface, an elliptical surface, a conical surface, a cylindrical surface or a wedge.

33. The device according to claim 31, wherein the trajectory data comprise a set of positions within the three-dimensional space at multiple time instants.

34. The device according to claim 31, wherein the trajectory data comprise an initial position, velocity data and acceleration data.

35. The device according to claim 31, wherein the trajectory data comprise an initial position and an equation that defines positions in the three-dimensional space and corresponding times.

36. The device according to claim 31, further comprising a sound reproduction system, wherein the logic system is configured to control the sound reproduction system, at least in part, according to the metadata.

37. The device according to claim 31, wherein the multiple speaker zones correspond to reproduction speakers of an actual three-dimensional playback environment comprising reproduction speakers, or wherein the multiple speaker zones correspond to virtual speakers of a virtual surround-sound environment.

38. The device according to claim 31, wherein an increasing elevation of the audio object is indicated in the graphical user interface by an increasing diameter of a circle that represents the audio object in the graphical user interface.

39. A method comprising the stages of:
receiving audio data;
displaying a virtual reproduction environment in a graphical user interface on a display system, the virtual reproduction environment containing multiple speaker zones at different elevations;
receiving user input relating to a position of an audio object;
determining trajectory data indicating a time-varying position of the audio object in a three-dimensional space, wherein the determining includes constraining the position to a two-dimensional surface within the three-dimensional space, and wherein the audio object comprises the audio data;
displaying a trajectory of the audio object in accordance with the trajectory data in the graphical user interface; and
creating metadata associated with the audio object, wherein the metadata contains the trajectory data.
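The step of claim 39 that constrains the position to a two-dimensional surface can be sketched for one of the claim-40 surfaces, a sphere; the radial-projection choice and the function name are assumptions for illustration, not the patent's prescribed method:

```python
import math

# A minimal sketch, assuming a sphere as the constraining 2-D surface:
# a freely entered 3-D position is projected radially onto the surface,
# realizing the "constraining the position to a two-dimensional surface
# within the three-dimensional space" step of claim 39.

def constrain_to_sphere(pos, radius=1.0):
    """Radially project a 3-D point onto a sphere of the given radius."""
    x, y, z = pos
    norm = math.sqrt(x * x + y * y + z * z)
    if norm == 0.0:
        return (radius, 0.0, 0.0)  # arbitrary surface point for the origin
    s = radius / norm
    return (x * s, y * s, z * s)
```

An elliptical, conical, cylindrical or wedge surface would only change the projection formula, not the structure of the step.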

40. The method according to claim 39, characterized in that the two-dimensional surface contains one of the following: a spherical surface, an elliptical surface, a conical surface, a cylindrical surface or a wedge.

41. A non-transitory storage medium having software stored thereon, the software containing commands for performing the following operations:
receiving audio data;
displaying a virtual reproduction environment in a graphical user interface on a display system, the virtual reproduction environment containing multiple speaker zones at different elevations;
receiving user input relating to a position of an audio object;
determining trajectory data indicating a time-varying position of the audio object in a three-dimensional space, wherein the determining includes constraining the position to a two-dimensional surface within the three-dimensional space, and wherein the audio object comprises the audio data;
displaying the trajectory of the audio object in accordance with the trajectory data in the graphical user interface; and
creating metadata associated with the audio object, wherein the metadata contains the trajectory data.

42. The non-transitory storage medium according to claim 41, characterized in that the two-dimensional surface contains one of the following: a spherical surface, an elliptical surface, a conical surface, a cylindrical surface or a wedge.

Same patents:

Device // 2554510

FIELD: physics, acoustics.

SUBSTANCE: invention relates to means of processing audio signals. The device includes at least one processor and at least one memory module storing computer program code, wherein the at least one memory module and the computer program code are configured, while interacting with the at least one processor, to allow the device to carry out at least the following: providing a visual rendering of at least one audio signal parameter associated with at least one audio signal, wherein said at least one audio signal represents an acoustic field around the device, captured in real time using at least two microphones; detecting, using an interface, interaction with said visual rendering of the audio signal parameter; and processing at least one audio signal associated with the audio signal parameter depending on said interaction.

EFFECT: reduced noise in captured audio signals.

20 cl, 7 dwg

FIELD: physics, acoustics.

SUBSTANCE: invention relates to audio signal processing. This system receives stereo signal to be divided by segmentation into frequency-time segments of stereo signals. Each of the latter can comply with sample of frequency band in the given time segment. Decomposition unit decomposes frequency-time signal segments for every pair of stereo signal time-frequency segments. This is executed in the steps that follow. Similarity measure is defined to indicate the degree of similarity of stereo signal frequency-time segments. Total signal time-frequency segment is generated as the sum of stereo signal frequency-time segments. Central stereo signal frequency-time segment is generated from total stereo signal frequency-time segments and pair of lateral stereo signal frequency-time segments from pair of stereo signal frequency-time segments in compliance with similarity measure. Then, generator generates multichannel signal containing central signal generated from total stereo signal frequency-time segments and lateral signals generated from lateral stereo signal frequency-time segments.

EFFECT: better spatial perception of sound signal.

14 cl, 5 dwg

FIELD: physics, acoustics.

SUBSTANCE: invention relates to means of estimating positions of loudspeakers in surround sound systems. The system comprises motion sensors (201, 203, 205), configured to determine motion data for a user portable unit, where motion data describe movement of the user portable unit. A user input device (207, 209) receives user activations indicating that at least one of the current position and orientation of the user portable unit is associated with a loudspeaker position when user activation is received. User activation may arise, for example, when a user presses a button. An analysing processor (211) then generates loudspeaker position estimates in response to motion data and user activations. The system can enable, for example, estimation of positions of speakers based on a handheld device, for example a remote control panel, directed towards a speaker or mounted thereon.

EFFECT: high accuracy of estimating loudspeaker positions in surround sound systems.

14 cl, 6 dwg

FIELD: physics, acoustics.

SUBSTANCE: invention relates to means of generating an output spatial multichannel audio signal based on an input audio signal and an input parameter. The input audio signal is decomposed based on the input parameter to obtain the first signal component and the second signal component that are different from each other. The first signal component is rendered to obtain the first signal representation with the first semantic property and the second signal component is rendered to obtain the second signal representation with the second semantic property different from the first semantic property. The first and second signal representations are processed to obtain an output spatial multichannel audio signal.

EFFECT: reduced computational costs of the decoding/rendering process.

15 cl, 8 dwg

FIELD: physics, control.

SUBSTANCE: invention relates to universal remote control panels designed to control a large number of household devices. Described is a method of determining the correct code set to be used to control a household device. A remote control panel transmits to the household device one or more instructions using the corresponding code of at least one of multiple code sets (3040). The code set to be used to control said household device is determined based on at least instructions transmitted to the household device by the user of the remote control panel as a response thereto (3070).

EFFECT: saving power owing to a simpler method of determining a code set to be used from multiple code sets in a remote control panel.

12 cl, 3 dwg

FIELD: physics, acoustics.

SUBSTANCE: invention relates to means of generating an output spatial multichannel audio signal based on an input audio signal. The input audio signal is decomposed based on an input parameter to obtain a first signal component and a second signal component that are different from each other. The first signal component is rendered to obtain a first signal representation with a first semantic property and the second signal component is rendered to obtain a second signal representation with a second semantic property different from the first semantic property. The first and second signal representations are processed to obtain an output spatial multichannel audio signal.

EFFECT: low computational costs of the decoding/rendering process.

5 cl, 8 dwg

FIELD: physics, acoustics.

SUBSTANCE: binaural rendering of a multi-channel audio signal into a binaural output signal is described. The multi-channel audio signal includes a stereo downmix signal (18) into which a plurality of audio signals are downmixed; and side information includes downmix information (DMG, DCLD), indicating for each audio signal, to what degree the corresponding audio signal was mixed in the first channel and second channel of the stereo downmix signal (18), respectively, as well as object level information of the plurality of audio signals and inter-object cross correlation information, describing similarity between pairs of audio signals of the plurality of audio signals. Based on a first rendering prescription, a preliminary binaural signal (54) is computed from the first and second channels of the stereo downmix signal (18). A decorrelated signal (Xdn,k) is generated as a perceptual equivalent to a mono downmix (58) of the first and second channels of the stereo downmix signal (18), while, however, being decorrelated from the mono downmix (58).

EFFECT: improved binaural rendering while eliminating restrictions with respect to free generation of a downmix signal from original audio signals.

11 cl, 6 dwg, 3 tbl

FIELD: radio engineering, communication.

SUBSTANCE: apparatus includes: a means of processing a foreground signal in order to provide a perceptible foreground angle for the foreground signal; a means of processing a foreground signal in order to provide the desirable attenuation level for the foreground signal; a means of processing a background signal in order to provide a perceptible background angle for the background signal; a means of processing a background signal in order to provide the desirable attenuation level for the background signal, wherein the background signal is processed such that it sounds fuzzier than the foreground signal; and a means of merging the foreground signal and the background signal into an output audio source signal.

EFFECT: clearer perceptible position for an audio source in an audio composition.

25 cl, 20 dwg

FIELD: information technologies.

SUBSTANCE: device to modify a sweet spot of a spatial M-channel audio signal comprises a receiver (201) to receive an N-channel audio signal, N<M, a parametric facility (203) to detect spatial parameters of step-up mixing, connecting the N-channel audio signal with the spatial M-channel audio signal, a modifying facility (207) to modify the sweet spot of the spatial M-channel audio signal by modification of at least one of spatial parameters of step-up mixing; a facility of generation (205) to generate a spatial M-channel audio signal by step-up mixing of an N-channel audio signal using at least one modified spatial parameter of step-up mixing.

EFFECT: possibility to manipulate a sweet spot with less complexity.

20 cl, 5 dwg, 2 tbl

FIELD: information technologies.

SUBSTANCE: audio processor (2) generates a stereo signal (4; 50) with enhanced perceptual properties using a central signal (6a) and a side signal (6b). The central signal (6a) represents a sum, and the side signal (6b) is a difference of the initial left and right channels (40). The audio processor includes a decorrelator (8) to generate a decorrelated representation of a component of the central signal (82) and/or a decorrelated representation of a component of the side signal (84), a combiner of a signal (10; 46) to generate an optimised side signal (14; 90) by combination of the representation (70) of the side signal with the decorrelated representation of the side signal (84) and with the decorrelated representation of the central signal component (82) or with the central signal components and the decorrelated representation of the side signal component (84), and a central side step-up mixer (12; 48), designed to generate a stereo signal with enhanced perceptual properties with application of the central signal representation and optimised side signal.

EFFECT: improved quality of sound reproduction.

20 cl, 7 dwg

FIELD: physics, acoustics.

SUBSTANCE: invention relates to encoding and decoding an audio signal in which audio samples for each object audio signal may be localised in any required position. In the method and device for encoding an audio signal and in the method and device for decoding an audio signal, audio signals may be encoded or decoded such that audio samples may be localised in any required position for each object audio signal. The method of decoding an audio signal includes extracting from the audio signal a downmix signal and object-oriented additional information; generating channel-oriented additional information based on the object-oriented additional information and control information for reproducing the downmix signal; processing the downmix signal using a decorrelated channel signal; and generating a multichannel audio signal using the processed downmix signal and the channel-oriented additional information.

EFFECT: high accuracy of reproducing object audio signals.

7 cl, 20 dwg

FIELD: physics, acoustics.

SUBSTANCE: invention relates to means of encoding audio signals and related spatial information in a format which is independent of the playback scheme. A first set of audio signals is assigned to a first group. The first group is encoded as a set of mono audio tracks with associated metadata describing the direction of the signal source of each track relative to the recording position and the initial playback time thereof. A second set of audio signals is assigned to a second group. The second group is encoded as at least one set of ambisonic tracks of a given order and a mixture of orders. Two groups of tracks comprising the first and second sets of audio signals are generated.

EFFECT: providing a technique capable of presenting spatial audio content independent of the exhibition method.

26 cl, 11 dwg

FIELD: physics, acoustics.

SUBSTANCE: invention relates to a surround sound system. A multi-channel spatial signal comprising at least one surround channel is received. Ultrasound is emitted towards a surface to reach a listening position via reflection off said surface. The ultrasound signal may specifically reach the listening position from the side, above or behind a nominal listener. A first drive unit generates a drive signal for the directional ultrasound transducer from the surround channel. The use of an ultrasound transducer for providing the surround sound signal provides an improved spatial experience while allowing the speaker to be located, for example, in front of the user. An ultrasound beam is much narrower and better defined than conventional audio beams and can therefore be better directed to provide the desired reflections. In some scenarios, the ultrasound transducer may be supplemented by an audio range loudspeaker.

EFFECT: high quality of reproducing audio and high efficiency of the surround sound system.

12 cl, 11 dwg

FIELD: physics, acoustics.

SUBSTANCE: invention relates to processing signals in an audio frequency band. The apparatus for generating at least one output audio signal representing a superposition of two different audio objects includes a processor for processing an input audio signal to provide an object representation of the input audio signal, where that object representation can be generated by parametrically guided approximation of original objects using an object downmix signal. An object manipulator individually manipulates objects using audio object based metadata relating to the individual audio objects to obtain manipulated audio objects. The manipulated audio objects are mixed using an object mixer for finally obtaining an output audio signal having one or multi-channel signals depending on a specific rendering setup.

EFFECT: providing efficient audio signal transmission rate.

14 cl, 17 dwg

FIELD: radio engineering, communication.

SUBSTANCE: described is a device for generating a binaural signal based on a multi-channel signal representing a plurality of channels and intended for reproduction by a speaker system, wherein each virtual sound source position is associated to each channel. The device includes a correlation reducer for differently converting, and thereby reducing correlation between, at least one of a left and a right channel of the plurality of channels, a front and a rear channel of the plurality of channels, and a centre and a non-centre channel of the plurality of channels, in order to obtain an inter-similarity reduced combination of channels; a plurality of directional filters, a first mixer for mixing output signals of the directional filters modelling the acoustic transmission to the first ear canal of the listener, and a second mixer for mixing output signals of the directional filters modelling the acoustic transmission to the second ear canal of the listener. Also disclosed is an approach where centre level is reduced to form a downmix signal, which is further transmitted to a processor for constructing an acoustic space. Another approach involves generating a set of inter-similarity reduced transfer functions modelling the ear canal of the person.

EFFECT: providing an algorithm for generating a binaural signal which provides stable and natural sound of a record in headphones.

33 cl, 14 dwg

FIELD: information technology.

SUBSTANCE: method comprises estimating a first wave representation comprising a first wave direction measure characterising the direction of a first wave and a first wave field measure being related to the magnitude of the first wave for the first spatial audio stream, having a first audio representation comprising a measure for pressure or magnitude of a first audio signal and a first direction of arrival of sound; estimating a second wave representation comprising a second wave direction characterising the direction of the second wave and a second wave field measure being related to the magnitude of the second wave for the second spatial audio stream, having a second audio representation comprising a measure for pressure or magnitude of a second audio signal and a second direction of arrival of sound; processing the first wave representation and the second wave representation to obtain a merged wave representation comprising a merged wave field measure, a merged direction of arrival measure and a merged diffuseness parameter; processing the first audio representation and the second audio representation to obtain a merged audio representation, and forming a merged audio stream.

EFFECT: high quality of a merged audio stream.

15 cl, 7 dwg

FIELD: physics.

SUBSTANCE: apparatus (100) for generating a multichannel audio signal (142) based on an input audio signal (102) comprises a main signal upmixing means (110), a section (segment) selector (120), a section signal upmixing means (130) and a combiner (140). The main signal upmixing means (110) is configured to provide a main multichannel audio signal (112) based on the input audio signal (102). The section selector (120) is configured to select or not select a section of the input audio signal (102) based on analysis of the input audio signal (102). The selected section of the input audio signal (102), a processed selected section of the input audio signal (102) or a reference signal associated with the selected section of the input audio signal (102) is provided as section signal (122). The section signal upmixing means (130) is configured to provide a section upmix signal (132) based on the section signal (122), and the combiner (140) is configured to overlay the main multichannel audio signal (112) and the section upmix signal (132) to obtain the multichannel audio signal (142).

EFFECT: improved flexibility and sound quality.

12 cl, 10 dwg

FIELD: information technology.

SUBSTANCE: invention relates to a lossless multi-channel audio codec which uses adaptive segmentation with random access point (RAP) and multiple prediction parameter set (MPPS) capability. The lossless audio codec encodes/decodes a lossless variable bit rate (VBR) bit stream with random access point (RAP) capability to initiate lossless decoding at a specified segment within a frame and/or multiple prediction parameter set (MPPS) capability partitioned to mitigate transient effects. This is accomplished with an adaptive segmentation technique that fixes segment start points based on constraints imposed by the existence of a desired RAP and/or detected transient in the frame and selects an optimum segment duration in each frame to reduce encoded frame payload subject to an encoded segment payload constraint. RAP and MPPS are particularly applicable to improve overall performance for longer frame durations.

EFFECT: higher overall encoding efficiency.

48 cl, 23 dwg

FIELD: physics.

SUBSTANCE: method and system for generating output signals for reproduction by two physical speakers in response to input audio signals indicative of sound from multiple source locations including at least two rear locations. Typically, the input signals are indicative of sound from three front locations and two rear locations (left and right surround sources). A virtualiser generates left and right surround output signals suitable for driving front loudspeakers to emit sound that a listener perceives as emitted from rear sources. Typically, the virtualiser generates left and right surround output signals by transforming rear source input signals in accordance with a sound perception simulation function. To ensure that virtual channels are well heard in the presence of other channels, the virtualiser performs dynamic range compression on rear source input signals. The dynamic range compression is preferably performed by amplifying rear source input signals or partially processed versions thereof in a nonlinear way relative to front source input signals.

EFFECT: separating virtual sources while avoiding excessive emphasis of virtual channels.

34 cl, 9 dwg

Slit type gas laser // 2273116

FIELD: quantum electronics; possible use in technological slit-type gas lasers.

SUBSTANCE: slit type gas laser has hermetic chamber, a pair of metallic electrodes, alternating voltage source, a pair of dielectric barriers, and an optical resonator. Chamber is filled with active gas substance. Metallic electrodes are mounted within aforementioned chamber, each of them has surface, directed to face surface of another electrode. Source of alternating voltage is connected to aforementioned electrodes for feeding excitation voltage to them. Dielectric barriers are positioned between metallic electrodes, so that surfaces of these barriers directed to each other form slit discharge gap for forming of barrier discharge in gas substance.

EFFECT: possible construction of slit type gas laser, excited by barrier discharge, dielectric barriers being made specifically to improve heat drain from active substance of laser, decrease voltage fall on these dielectric barriers, provide possible increase of electrodes area, improve efficiency of laser radiation generation, increase output power of laser, improve mode composition of its output signal.

8 cl, 4 dwg
