Audio encoder, audio decoder, encoded audio information, methods of encoding and decoding audio signal and computer programme

FIELD: radio engineering, communication.

SUBSTANCE: the invention relates to communication engineering. An audio decoder for providing decoded audio information on the basis of encoded audio information includes a window-based signal transformer configured to map a time-frequency representation, which is described by the encoded audio information, to a time-domain representation. The window-based signal transformer is configured to select, on the basis of window information, a window out of a plurality of windows comprising windows of different transition slopes and windows of different transform lengths. The audio decoder includes a window selector configured to evaluate variable-length codeword window information in order to select a window for processing a given portion of the time-frequency representation associated with a given frame of the audio information.

EFFECT: eliminating artefacts arising when processing time-limited frames.

15 cl, 23 dwg

 

Embodiments according to the invention relate to an audio encoder for providing encoded audio information on the basis of input audio information and to an audio decoder for providing decoded audio information on the basis of encoded audio information. Further embodiments according to the invention relate to encoded audio information. Further embodiments relate to a method for providing decoded audio information on the basis of encoded audio information and to a method for providing encoded audio information on the basis of input audio information. Further embodiments relate to computer programs for performing the inventive methods.

Embodiments of the invention relate to a proposed update of the bitstream syntax of unified speech and audio coding (USAC).

The background of the invention is explained below in order to facilitate understanding of the invention and its advantages. During the past decade, great effort has been put into making it possible to store and distribute audio content in digital form. One important achievement in this regard is the definition of the international standard ISO/IEC 14496-3. Part 3 of this standard relates to the encoding and decoding of audio content, and subpart 4 of part 3 relates to general audio coding. ISO/IEC 14496 part 3, subpart 4 defines a concept for encoding and decoding general audio content. In addition, further improvements have been proposed in order to improve the quality and/or reduce the required bit rate.

According to the concept described in this standard, a time-domain audio signal is converted into a time-frequency representation. The transform from the time domain to the time-frequency domain is typically performed on transform blocks of time-domain samples, which are also referred to as "frames". It has been found advantageous to use overlapping frames, shifted for example by half a frame, because the overlap makes it possible to avoid (or at least reduce) artifacts efficiently. In addition, it has been found that windowing should be performed in order to avoid artifacts originating from the processing of temporally limited frames. Windowing also makes it possible to optimize the overlap-and-add process of subsequent temporally shifted but overlapping frames.

However, it has been found problematic to represent edges efficiently, that is, sharp transitions or so-called transients within the audio content, using windows of uniform length, because the energy of a transition is spread over the entire duration of a window, which results in audible artifacts. Accordingly, it has been proposed to switch between windows of different lengths, so that approximately stationary portions of the audio content are encoded using long windows and transient portions (e.g. portions including a transient) are encoded using shorter windows.

However, in a system that allows a choice between different windows for transforming audio content from the time domain to the time-frequency domain, the decoder must be informed which window is to be used for decoding the encoded audio content of a given frame.

In conventional systems, for example in an audio decoder according to the international standard ISO/IEC 14496-3, part 3, subpart 4, a data element called "window_sequence", which indicates the window sequence used in the current frame, is written with two bits into the bitstream in a so-called "ics_info" bitstream element. Taking into account the window sequence of the previous frame, eight different window sequences are signaled.

In view of the above discussion, it should be noted that the need to signal the window type places a bit load on the encoded bitstream representing the audio information.

In view of this situation, it is desirable to create a concept that allows a more bit-rate-efficient signaling of the type of window used for transforming between a time-domain representation of the audio content and a time-frequency-domain representation of the audio content.

This problem is solved by the audio encoder according to claim 1, the audio decoder according to claim 9, the encoded audio information according to claim 12, the method for providing decoded audio information according to claim 14, the method for providing encoded audio information according to claim 15 and the computer program according to claim 16.

An embodiment according to the invention creates an audio decoder for providing decoded audio information on the basis of encoded audio information. The audio decoder includes a window-based signal transformer configured to map a time-frequency representation, which is described by the encoded audio information, to a time-domain representation of the audio content. The window-based signal transformer is configured to select, on the basis of window information, a window out of a plurality of windows comprising windows of different transition slopes and windows of different transform lengths. The audio decoder includes a window selector configured to evaluate variable-length codeword window information in order to select a window for processing a given portion (e.g. a frame) of the time-frequency representation associated with a given frame of the audio information.

This embodiment of the invention is based on the finding that the bit rate required for storing or transmitting information indicating which type of window is to be used for transforming the time-frequency-domain representation of the audio content into a time-domain representation can be reduced by using variable-length codeword window information. It has been found that variable-length codeword window information is well suited, because the information needed to select the appropriate window lends itself to such a variable-length codeword representation.

For example, by using variable-length codeword window information, it is possible to exploit the correlation between the choice of the transition slope and the choice of the transform length, because a short transform length is typically not used for windows having one or two long transition slopes. Accordingly, the transmission of redundant information can be avoided, which increases the bit-rate efficiency of the encoded audio information.

As a further example, there is typically a correlation between the window shapes of adjacent frames. This can also be exploited to selectively reduce the length of the window information codeword in cases where the window type of one or more adjacent windows (adjacent to the window currently under consideration) restricts the choice of window types for the current frame.

To summarize the above, the use of variable-length codeword window information saves bit rate without significantly increasing the complexity of the audio decoder and without changing the output waveform of the audio decoder (compared with fixed-length codeword window information). In addition, the syntax of the encoded audio information can even be simplified in some cases, as discussed in detail below.

In a preferred embodiment, the audio decoder includes a bitstream parser configured to parse a bitstream representing the encoded audio information, to extract a one-bit window slope length information from the bitstream, and to selectively extract a one-bit transform length information from the bitstream depending on the value of the one-bit window slope length information. In this case, the window selector is preferably configured to selectively use or ignore the transform length information, depending on the window slope length information, when selecting the window for processing a given portion of the time-frequency representation.

Using this concept, the window slope length information is separated from the transform length information, which helps to simplify the mapping in some cases. In addition, splitting the window information into a mandatory window slope length bit and a transform length bit whose presence depends on the state of the window slope length bit allows for a very efficient reduction of the bit rate while keeping the bitstream syntax simple. Accordingly, the complexity of the bitstream parser remains comparatively low.

In a preferred embodiment, the window selector is configured to select the window type for processing a current portion of the time-frequency information (for example, the current audio frame) depending on the window type selected for processing the previous portion (for example, the previous audio frame) of the time-frequency information, such that the left-side window slope length of the window for processing the current portion matches the right-side window slope length of the window selected for processing the previous portion. By exploiting this information, the bit rate required for selecting the window type for the current portion of the time-frequency information becomes particularly small, because the window type information can be encoded with very low complexity. In particular, it is not necessary to "waste" a bit for encoding the left-side window slope length of the window associated with the current portion of the time-frequency information. Accordingly, by using the right-side window slope length of the window used for processing the previous portion, two bits (for example, the mandatory window slope length bit and the optional transform length bit) are sufficient to select the appropriate window from a set of more than four selectable windows. Thus, unnecessary redundancy is avoided and the bit-rate efficiency of the encoded bitstream is improved.
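The slope-matching rule described above can be illustrated with a minimal Python sketch. The `Window` class and the helper names are hypothetical and introduced only for illustration; they are not part of the bitstream syntax.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Window:
    """Hypothetical window description: only the two slope lengths matter here."""
    left_slope: str   # "long" or "short"
    right_slope: str  # "long" or "short"


def implied_left_slope(previous: Window) -> str:
    # The left slope of the current window is not transmitted:
    # it is taken from the right slope of the previous window.
    return previous.right_slope


def slopes_match(frames) -> bool:
    # True if every window starts with the slope its predecessor ended with.
    return all(cur.left_slope == prev.right_slope
               for prev, cur in zip(frames, frames[1:]))
```

Because the left slope is implied, only the right slope (and, in some cases, the transform length) of the current window remains to be signaled.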

In a preferred embodiment, the window selector is configured to select between a first window type and a second window type depending on the value of the one-bit window slope length information, if the right-side window slope length of the window for processing the previous portion of the time-frequency information takes a "long" value (indicating a longer window slope than a "short" value) and if the previous portion and the current portion of the time-frequency information are both encoded in the frequency-domain core mode.

The window selector is also preferably configured to select a third window type in response to a first value (for example, a value of "one") of the one-bit window slope length information, if the right-side window slope length of the window for processing the previous portion of the time-frequency information takes the "short" value (as discussed above) and if the previous portion and the current portion of the time-frequency information are both encoded in the frequency-domain core mode.

In addition, the window selector is preferably configured to select between a fourth window type and a window sequence (which may be regarded as a fifth window type) depending on the one-bit transform length information, if the one-bit window slope length information takes a second value (for example, a value of "zero") indicating a short right-side window slope, if the right-side window slope length of the window for processing the previous portion of the time-frequency information takes the "short" value (as discussed above), and if the previous portion and the current portion of the time-frequency information are both encoded in the frequency-domain core mode.

In this case, the first window type has a (comparatively) long left-side window slope, a (comparatively) long right-side window slope and a (comparatively) long transform length; the second window type has a (comparatively) long left-side window slope, a (comparatively) short right-side window slope and a (comparatively) long transform length; the third window type has a (comparatively) short left-side window slope, a (comparatively) long right-side window slope and a (comparatively) long transform length; and the fourth window type has a (comparatively) short left-side window slope, a (comparatively) short right-side window slope and a (comparatively) long transform length. The window sequence (or fifth window type) defines a sequence or superposition of a plurality of sub-windows associated with a single portion (e.g. frame) of the time-frequency information; each of the sub-windows has a (comparatively) short transform length, a (comparatively) short left-side window slope and a (comparatively) short right-side window slope. Using this approach, a total of five window types (including the window sequence type) can be selected using only two bits, and a single bit (namely the one-bit window slope length information) is sufficient to signal the very common sequence of windows having comparatively long left-side and right-side window slopes. By contrast, two-bit window information is only required in the vicinity of a sequence of short windows (the window sequence or fifth window type) and during a temporally extended run (over multiple frames) of window sequence frames.
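The selection rules for the five window types can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the normative syntax: the bit values follow the example values in the text (slope bit "one" means a long right-side slope, "zero" a short one), the meaning of the transform length bit ("one" selects the short transform) is an assumption, and both frames are assumed to use the frequency-domain core mode. The names `WINDOW_TYPES`, `select_window` and `decode_window_types` are hypothetical.

```python
# Window type -> (left slope, right slope, transform length)
WINDOW_TYPES = {
    1: ("long", "long", "long"),
    2: ("long", "short", "long"),
    3: ("short", "long", "long"),
    4: ("short", "short", "long"),
    5: ("short", "short", "short"),  # sequence of short sub-windows
}


def select_window(bits, prev_right_slope):
    """Read one variable-length codeword from the iterator `bits`.

    Returns (window_type, number_of_bits_consumed).
    """
    slope_bit = next(bits)
    right_is_long = slope_bit == 1      # assumed mapping, per the text's example values
    if prev_right_slope == "long":
        return (1 if right_is_long else 2), 1
    if right_is_long:
        return 3, 1
    # Short left and short right slope: only now is the second bit present.
    transform_bit = next(bits)
    return (5 if transform_bit == 1 else 4), 2   # assumed: 1 = short transform


def decode_window_types(bit_list, num_frames, prev_right_slope="long"):
    """Decode the window types of `num_frames` consecutive frames."""
    bits = iter(bit_list)
    types = []
    for _ in range(num_frames):
        wtype, _used = select_window(bits, prev_right_slope)
        types.append(wtype)
        prev_right_slope = WINDOW_TYPES[wtype][1]  # right slope of this frame
    return types
```

In this sketch a run of frames of the first window type costs one bit per frame; the second bit is only read while both neighbouring slopes are short.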

To summarize, the above-described concept of selecting a window type out of a plurality of, for example, five different window types allows for a significant reduction of the bit rate. While three dedicated bits are conventionally needed to select one of, for example, five window types, only one or two bits are needed for such a selection according to the invention. Thus, a substantial saving of bits can be achieved, which reduces the required bit rate and/or makes it possible to improve the audio quality.

In a preferred embodiment, the window selector is configured to selectively evaluate the transform length bit of the variable-length codeword window information only if the window type used for processing the previous portion (e.g. frame) of the time-frequency information has a right-side window slope length that matches the left-side window slope length of the short window sequence, and if the one-bit window slope length information associated with the current portion (for example, the current frame) of the time-frequency information indicates a right-side window slope length that matches the right-side window slope length of the short window sequence.

In a preferred embodiment, the window selector is further configured to receive previous core mode information, which is associated with the previous portion (e.g. frame) of the audio information and describes the core mode used for encoding that previous portion. In this case, the window selector is configured to select the window for processing the current portion (e.g. frame) of the time-frequency representation on the basis of the previous core mode information and of the variable-length codeword window information associated with the current portion of the time-frequency representation. Thus, the core mode of the previous frame can be used to select an appropriate window for the transition (for example, in the form of an overlap-and-add operation) between the previous frame and the current frame. Again, the use of variable-length codeword window information is very advantageous, because a considerable number of bits can be saved. Particularly good savings can be obtained if the number of window types that are available (or valid) for an audio frame encoded, for example, in the linear prediction domain is small. Thus, it is often possible to use the shorter of a longer and a shorter codeword at a transition between two different core modes (for example, between the linear-prediction-domain core mode and the frequency-domain core mode).

In a preferred embodiment, the window selector is further configured to receive subsequent core mode information, which is associated with the subsequent portion (or frame) of the audio information and describes the core mode used for encoding that subsequent frame. In this case, the window selector is preferably configured to select the window for processing the current portion (e.g. frame) of the time-frequency representation on the basis of the subsequent core mode information and of the variable-length codeword window information associated with the current portion of the time-frequency representation. Again, the variable-length codeword window information can be used in combination with the subsequent core mode information in order to determine the window type with a low bit count.

In a preferred embodiment, the window selector is configured to select a window having a shortened right-side slope if the subsequent core mode information indicates that the subsequent frame of the audio information is encoded using the linear-prediction-domain core mode. Thus, the window can be adapted to a switch between the frequency-domain core mode and the time-domain core mode without additional signaling.

Another embodiment according to the invention creates an audio encoder for providing encoded audio information on the basis of input audio information. The audio encoder includes a window-based signal transformer configured to provide a sequence of audio signal parameters (for example, a time-frequency-domain representation of the input audio information) on the basis of a plurality of windowed portions (for example, overlapping or non-overlapping frames) of the input audio information. The window-based signal transformer is preferably configured to adapt the window shape used for obtaining the windowed portions of the input audio information depending on characteristics of the input audio information. The window-based signal transformer is configured to switch between the use of windows having a (comparatively) longer transition slope and windows having a (comparatively) shorter transition slope, and also to switch between the use of windows having two or more different transform lengths. The window-based signal transformer is also configured to determine the window type used for transforming a current portion (e.g. frame) of the input audio information depending on the window type used for transforming the previous portion (e.g. frame) of the input audio information and on the audio content of the current portion of the input audio information. In addition, the audio encoder is configured to encode window information describing the window type used for transforming the current portion of the input audio information using a variable-length codeword. This audio encoder provides the advantages already discussed for the inventive audio decoder. In particular, the bit rate of the encoded audio information can be reduced by avoiding the use of comparatively long codewords in some or all situations in which this is possible.
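On the encoder side, the variable-length codeword for a chosen window can be sketched as follows. This is a minimal sketch under the same assumptions as the example values given for the decoder (slope bit "one" for a long right-side slope, "zero" for a short one); the meaning of the transform length bit ("one" for the short transform) and the function name `encode_window_info` are assumptions made for illustration only.

```python
def encode_window_info(prev_right_slope, left, right, transform):
    """Return the codeword (list of bits) for the current window.

    All arguments are "long" or "short". The left slope is never written;
    it is only checked against the right slope of the previous window.
    """
    if left != prev_right_slope:
        raise ValueError("left slope must match previous right slope")
    bits = [1 if right == "long" else 0]          # mandatory slope bit
    if prev_right_slope == "short" and right == "short":
        # Only in this case do two window types remain, so a second bit is written.
        bits.append(1 if transform == "short" else 0)
    elif transform == "short":
        raise ValueError("short transform requires short slopes on both sides")
    return bits
```

How the encoder decides which window to use (for example, from a transient detector) is outside the scope of this sketch.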

Another embodiment according to the invention creates encoded audio information. The encoded audio information includes an encoded time-frequency representation describing the audio content of a plurality of windowed portions of an audio signal. Windows of different transition slopes (e.g. transition slope lengths) and different transform lengths are associated with different windowed portions of the audio signal. The encoded audio information includes encoded window information encoding the types of windows used for obtaining the encoded time-frequency representations of the plurality of windowed portions of the audio signal. The encoded window information is variable-length window information, which encodes one or more window types using a first, smaller number of bits and one or more other window types using a second, larger number of bits. This encoded audio information provides the advantages already discussed above for the inventive audio decoder and the inventive audio encoder.

Another embodiment according to the invention creates a method for providing decoded audio information on the basis of encoded audio information. The method includes evaluating variable-length codeword window information in order to select, out of a plurality of windows comprising windows of different transition slopes (e.g. different transition slope lengths) and windows of different transform lengths, a window for processing a given portion of the time-frequency representation associated with a given frame of the audio information. The method also includes mapping the given portion of the time-frequency representation, which is described by the encoded audio information, to a time-domain representation using the selected window.

Another embodiment according to the invention creates a method for providing encoded audio information on the basis of input audio information. The method includes providing a sequence of audio signal parameters (for example, a time-frequency-domain representation) on the basis of a plurality of windowed portions of the input audio information. In order to provide the sequence of audio signal parameters, a switch is made between the use of windows having a longer transition slope and windows having a shorter transition slope, and between the use of windows having two or more different transform lengths, in order to adapt the window shape used for obtaining the windowed portions of the input audio information depending on characteristics of the input audio information. The method also includes encoding window information describing the window type used for transforming the current portion of the input audio information using a variable-length codeword.

In addition, embodiments according to the invention create computer programs for implementing the above methods.

Brief description of drawings

Embodiments of the invention are described below with reference to the attached drawings, in which:

Fig.1 shows a block diagram of an audio encoder according to an embodiment of the invention;

Fig.2 shows a block diagram of an audio decoder according to an embodiment of the invention;

Fig.3 shows a schematic representation of the different window types that can be used in accordance with the inventive concept;

Fig.4 shows a graphical representation of the allowed transitions between windows of different window types that can be used in embodiments according to the invention;

Fig.5 shows a graphical representation of a sequence of different window types that can be generated by the inventive audio encoder or processed by the inventive audio decoder;

Fig.6a shows a table representing the proposed bitstream syntax according to an embodiment of the invention;

Fig.6b shows a graphical representation of a mapping of the window type of the current frame onto "window_length" information and "transform_length" information;

Fig.6c shows a graphical representation of a mapping for obtaining the window type of the current frame on the basis of previous core mode information, the "window_length" information of the previous frame, the "window_length" information of the current frame and the "transform_length" information of the current frame;

Fig.7a shows a table representing the syntax of the "window_length" information;

Fig.7b shows a table representing the syntax of the "transform_length" information;

Fig.7c shows a table representing the new bitstream syntax and the transitions;

Fig.8 shows a table giving an overview of all combinations of the "window_length" information and the "transform_length" information;

Fig.9 shows a table representing the bit savings that can be obtained using an embodiment of the invention;

Fig.10a shows a syntax representation of a so-called USAC raw data block;

Fig.10b shows a syntax representation of a so-called single channel element;

Fig.10c shows a syntax representation of a so-called channel pair element;

Fig.10d shows a syntax representation of a so-called ICS info element;

Fig.10e shows a syntax representation of a so-called frequency-domain channel stream;

Fig.11 shows a flowchart of a method for providing encoded audio information on the basis of input audio information; and

Fig.12 shows a flowchart of a method for providing decoded audio information on the basis of encoded audio information.

Detailed description of the embodiments

Brief overview of the audio encoder

An audio encoder to which the inventive concept can be applied is described below. It should be noted that the audio encoder described with reference to Fig.1 should be regarded only as an example of an audio encoder to which the invention can be applied. Even though a comparatively simple audio encoder is discussed with reference to Fig.1, the invention can also be applied in much more elaborate audio encoders, for example in audio encoders capable of switching between different core modes of encoding (for example, between frequency-domain encoding and linear-prediction-domain encoding). For the sake of simplicity, however, it is helpful to first understand the basic ideas of a simple frequency-domain audio encoder.

The audio encoder shown in Fig.1 is very similar to the audio encoder described in the international standard ISO/IEC 14496-3:2005(E), part 3, subpart 4, and in the documents referenced therein. Accordingly, reference is made to that standard, to the documents cited therein and to the extensive literature on MPEG audio coding.

The audio encoder 100 shown in Fig.1, is formed to receive the input audio information 110, such as an audio signal time interval. The audio encoder 100 further includes an additional preprocessor 120 formed to the optional pre-process the input audio information 110, for example by subdirectly the input audio information 110, or by regulating the gain of the input audio information 110. The audio encoder 100 also includes, as a key component, based on the application window signal Converter 130, which is formed to receive the input audio information 110, or a pre-processed version 122, and to convert the input audio information 110 or her pre-processed version 122 in the frequency domain (or frequency-time domain), th is would be to obtain a sequence of parameters of the audio signal, which can be the spectral values in the frequency-time domain. To this end, based on the application window signal Converter 130 includes a control device Windows/Converter 136, which may be configured to convert the blocks of samples (e.g., frames) of the input audio information 110, 122 in the set of spectral values 132. For example, the device management Windows/Converter 136 may be formed to provide one set of spectral values for each block of samples (i.e. for each "frame") of the input audio information. However, the blocks of samples (i.e., "frames") of the input audio information 110, 122 preferably can be overlapping, so that adjacent time blocks of samples (frames) of the input audio information 110, 122 together used a lot of samples. For example, two consecutive time block of samples (frames) may overlap of approximately 50% of samples. Accordingly, the control device Windows/Converter 136 may be configured to perform a so-called overlapped transform, such as a modified discrete cosine transform (MDCT). 
However, performing a modified discrete cosine transformation, the device management Windows/Converter 136 may amend the ü window for each block of samples, through this weighing Central samples (ordered in time close to the temporal center of the block of samples) is stronger than the peripheral samples (ordered in time close to the leading and rear end of the block of samples). Window management can help to avoid artifacts arising from the segmentation of the input audio information 110, 122 on the blocks. Thus, the use of Windows before or during conversion of the time interval in the frequency-time domain provides a smooth transition between successive blocks of samples of the input audio information 110, 122. Details about weighting can be found in the international standard ISO/IEC 14496, part 3, subsection 4 and in the documents to which reference is made. In a very simple version of the audio encoding device number of 2N samples of the audio frame (defined as a block of samples) will be converted into a set of N spectral coefficients, regardless of the characteristics of the signal. However, it was found that this concept, which uses a constant transform length 2N samples of audio information 110, 122 regardless of the characteristics of the input audio information 110, 122, leads to serious degradation of transitions, because in the case of the transition, the energy of the transition is spread across the frame when decoding C is okoboi information. However, it was found that the improvement of the encoding of the edges can be obtained, if you select a shorter length conversion (for example, 2N/8=N/4 samples for conversion). However, it was also found that the choice of a shorter length conversion generally increases the required bit rate, even if you get a smaller number of spectral values for a shorter length conversion, compared to the greater length of the conversion. 
Accordingly, it is recommended to switch from a long transform length (for example, 2N samples per transform) to a short transform length (for example, 2N/8=N/4 samples per transform) close to a transient (also referred to as an edge) of the audio content, and to switch back to the long transform length (for example, 2N samples per transform) after the transient. Switching the transform length goes along with a change of the window used for windowing the samples of the input audio information 110, 122 before or during the transform.
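The lapped transform described above can be illustrated with a minimal sketch. It assumes a sine window and a direct (non-optimized) MDCT/IMDCT pair with a 2/N scaling. Neither the window shape nor the function names are taken from this description; they are chosen only to show that a block of 2N windowed samples yields N coefficients, and that overlapping and adding consecutive blocks (50% overlap) restores the signal.

```python
import math

def sine_window(length):
    # Assumed window shape (not prescribed by the text); it weights central
    # samples more strongly than samples near the block edges.
    return [math.sin(math.pi * (n + 0.5) / length) for n in range(length)]

def mdct(block):
    # Windowed MDCT: 2N input samples -> N spectral coefficients.
    two_n = len(block)
    n_coef = two_n // 2
    w = sine_window(two_n)
    return [
        sum(w[n] * block[n]
            * math.cos(math.pi / n_coef * (n + 0.5 + n_coef / 2) * (k + 0.5))
            for n in range(two_n))
        for k in range(n_coef)
    ]

def imdct(coefs):
    # Windowed inverse MDCT: N coefficients -> 2N time-aliased samples.
    n_coef = len(coefs)
    two_n = 2 * n_coef
    w = sine_window(two_n)
    return [
        w[n] * (2.0 / n_coef)
        * sum(coefs[k]
              * math.cos(math.pi / n_coef * (n + 0.5 + n_coef / 2) * (k + 0.5))
              for k in range(n_coef))
        for n in range(two_n)
    ]

def analyze_and_reconstruct(signal, n_coef):
    # Blocks of 2N samples advance by N samples (50% overlap). The aliasing
    # of each block cancels when neighbouring blocks are overlap-added.
    out = [0.0] * len(signal)
    for start in range(0, len(signal) - 2 * n_coef + 1, n_coef):
        block = signal[start:start + 2 * n_coef]
        for i, value in enumerate(imdct(mdct(block))):
            out[start + i] += value
    return out

# Usage: small N keeps the direct O(N^2) transform fast.
signal = [math.sin(0.3 * i) + 0.1 * i for i in range(32)]
restored = analyze_and_reconstruct(signal, 8)
# Samples 8..23 are covered by two blocks and are reconstructed.
```

The same functions work for any even block length, so a "long" transform (2048 samples) and a "short" transform (256 samples) differ only in the length of the block passed to `mdct`.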

In this regard, it should be noted that in many cases an audio encoder may use more than two different windows. For example, the so-called "only_long_sequence" (only long sequence) can be used to encode the current audio frame if both the previous frame (preceding the currently considered frame) and the subsequent frame (following the currently considered frame) are encoded using the long transform length (for example, 2N samples). In contrast, the so-called "long_start_sequence" (long start sequence) can be used in a frame that is transformed using the long transform length, is preceded by a frame transformed using the long transform length, and is followed by a frame transformed using the short transform length. In a frame that is transformed using the short transform length, the so-called "eight_short_sequence" (sequence of eight short windows) can be applied, which includes eight short and overlapping (sub-)windows. In addition, the so-called "long_stop_sequence" (long stop sequence) can be used to transform a frame that is preceded by a frame transformed using the short transform length and followed by a frame transformed using the long transform length. Details on the possible window sequences are described in ISO/IEC 14496-3:2005 (E), part 3, subsection 4. In addition, reference is made to Fig.3, 4, 5, 6, which will be explained hereinafter in more detail.

However, it should be noted that some implementations may use one or more additional window types. For example, the so-called "stop_start_sequence" window sequence can be used if the current frame is preceded by a frame that uses the short transform length and is also followed by a frame that uses the short transform length.

Accordingly, the window-based signal converter 130 includes a window sequence determiner 138, which is configured to provide window type information 140 to the windower/transformer 136, so that the windower/transformer 136 can use an appropriate window type ("window shape"). For example, the window sequence determiner 138 may be configured to evaluate the input audio information 110 or the pre-processed input audio information 122 directly. Alternatively, however, the audio encoder 100 may include a psychoacoustic model processor 150, which is configured to receive the input audio information 110 or the pre-processed input audio information 122 and to apply a psychoacoustic model in order to extract, from the input audio information 110, 122, information that is relevant for encoding the input audio information 110, 122. For example, the psychoacoustic model processor 150 may be configured to identify transients within the input audio information 110, 122 and to provide window length information 152, which may signal the frames in which a short transform length is desired because of the presence of a transient in the corresponding input audio information 110, 122.

The psychoacoustic model processor 150 can also be configured to determine which spectral values must be encoded with high resolution (i.e. fine quantization) and which spectral values can be encoded with a lower resolution (i.e. coarser quantization) without serious degradation of the audio content. To this end, the psychoacoustic model processor 150 may be configured to assess psychoacoustic masking effects, thereby identifying spectral values (or groups of spectral values) that have a lower psychoacoustic relevance and other spectral values (or groups of spectral values) that have a higher psychoacoustic relevance. Accordingly, the psychoacoustic model processor 150 provides psychoacoustic relevance information 154.

The audio encoder 100 further includes an optional spectral post-processor 160, which is configured to receive the sequence of audio signal parameters 132 (for example, the time-frequency domain representation of the input audio information 110, 122) and to provide, on the basis thereof, a post-processed sequence of audio signal parameters 162. For example, the spectral post-processor 160 may be configured to perform temporal noise shaping, long-term prediction, perceptual noise substitution and/or audio channel processing.

The audio encoder 100 also includes an optional scaling/quantization/encoding processor 170, which is configured to scale the audio signal parameters (for example, the time-frequency domain values or "spectral values") 132, 162, to perform a quantization, and to encode the scaled and quantized values. To this end, the scaling/quantization/encoding processor 170 can be configured to use the information 154 provided by the psychoacoustic model processor, for example, to decide which scaling and/or quantization is to be applied to which of the audio signal parameters (or spectral values). Accordingly, the scaling and the quantization can be adjusted so that a desired bit rate of the scaled, quantized and encoded audio signal parameters (or spectral values) is obtained.

In addition, the audio encoder 100 includes a variable-length codeword encoder 180, which is configured to receive the window type information 140 from the window sequence determiner 138 and to provide, on the basis thereof, a variable-length codeword 182 describing the window type used for the windowing/transform operation performed by the windower/transformer 136. Details regarding the variable-length codeword encoder 180 will be described later.

In addition, the audio encoder 100 includes an optional bitstream payload formatter 190, which is configured to receive the scaled, quantized and encoded spectral information 172 (which describes the sequence of audio signal parameters or spectral values 132) and the variable-length codeword 182 describing the window type used for the windowing/transform operation. Accordingly, the bitstream payload formatter 190 provides a bitstream 192, which includes the information 172 and the variable-length codeword 182. The bitstream 192 serves as encoded audio information and can be stored on a storage medium and/or transmitted from the audio encoder 100 to an audio decoder.

To summarize the above, the audio encoder 100 is configured to provide the encoded audio information 192 on the basis of the input audio information 110. The audio encoder 100 includes, as an essential component, the window-based signal converter 130, which is configured to provide a sequence of audio signal parameters 132 (for example, a sequence of spectral values) on the basis of a plurality of windowed portions of the input audio information 110. The window-based signal converter 130 is configured such that the window type used to obtain the windowed portions of the input audio information is selected in dependence on characteristics of the audio information. The window-based signal converter 130 is configured to switch between windows having a longer transition slope and windows having a shorter transition slope, and also to switch between windows having two or more different transform lengths. For example, the window-based signal converter 130 is configured to determine the window type used to transform the current portion (e.g., frame) of the input audio information in dependence on the window type used to transform the preceding portion (e.g., frame) of the input audio information and in dependence on the audio content of the current portion of the input audio information. Moreover, the audio encoder is configured to encode, for example using the variable-length codeword encoder 180, the window type information 140, which describes the window type used to transform the current portion (e.g., frame) of the input audio information, using a variable-length codeword.

Types of transform windows

In the following, the various windows will be described that can be used by the windower/transformer 136 and that are selected by the window sequence determiner 138. However, the windows discussed here should be considered as examples only. Subsequently, the inventive concept for efficiently encoding the window type will be discussed.

With reference to Fig.3, which shows a graphical representation of different types of transform windows, a brief overview of the new example windows will now be given. In addition, reference is made to ISO/IEC 14496-3, part 3, subsection 4, in which the concept of applying transform windows is described in even more detail.

Fig.3 shows a graphical representation of a first window type 310, which includes a (relatively) long left window slope 310a (1024 samples) and a long right window slope 310b (1024 samples). A total of 2048 samples and 1024 spectral coefficients are associated with the first window type 310, so that the first window type 310 includes a so-called "long transform".

The second window type 312 is designated "long_start_sequence" or "long_start_window". The second window type includes a (relatively) long left window slope 312a (1024 samples) and a (relatively) short right window slope 312b (128 samples). A total of 2048 samples and 1024 spectral coefficients are associated with the second window type, so that the second window type 312 includes a long transform.

The third window type 314 is designated "long_stop_sequence" or "long_stop_window". The third window type 314 includes a short left window slope 314a (128 samples) and a long right window slope 314b (1024 samples). A total of 2048 samples and 1024 spectral coefficients are associated with the third window type 314, so that the third window type includes a long transform.

The fourth window type 316 is designated "stop_start_sequence" or "stop_start_window". The fourth window type 316 includes a short left window slope 316a (128 samples) and a short right window slope 316b (128 samples). A total of 2048 samples and 1024 spectral coefficients are associated with the fourth window type, so that the fourth window type includes a "long transform".

The fifth window type 318 differs significantly from the first to fourth window types. The fifth window type includes a superposition of eight short windows or sub-windows 319a-319h, which are arranged so as to overlap in time. Each of the short windows 319a-319h has a length of 256 samples. Accordingly, a short MDCT, which transforms 256 samples into 128 spectral values, is associated with each of the short windows 319a-319h. Accordingly, eight sets of 128 spectral values each are associated with the fifth window type 318, whereas a single set of 1024 spectral values is associated with each of the first to fourth window types 310, 312, 314, 316. It can therefore be said that the fifth window type includes a "short" transform length. Nevertheless, the fifth window type includes a short left window slope 318a and a short right window slope 318b.

Thus, for a frame associated with the first window type 310, the second window type 312, the third window type 314 or the fourth window type 316, 2048 samples of the input audio information are jointly windowed and transformed by the MDCT, as a group, into the time-frequency domain. In contrast, for a frame associated with the fifth window type 318, eight (at least partially overlapping) subsets of 256 samples each are transformed by the MDCT individually (or separately), so that eight sets of MDCT coefficients (time-frequency values) are obtained.
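The five basic window types can be summarized as a small lookup table. The following is only an illustrative sketch: the record layout and the names `WindowType`, `WINDOW_TYPES` and `coefficient_sets` are assumptions introduced here, while the slope lengths, transform sizes and window names are those given above (the optional window types 330, 332, 362, 366, 368, 382 are left out).

```python
from typing import NamedTuple

class WindowType(NamedTuple):
    # Hypothetical record; field names are illustrative only.
    left_slope: int      # length of the left window slope in samples
    right_slope: int     # length of the right window slope in samples
    mdct_input: int      # samples fed into one MDCT
    num_transforms: int  # number of MDCTs per frame

WINDOW_TYPES = {
    "only_long_sequence":   WindowType(1024, 1024, 2048, 1),  # type 310
    "long_start_sequence":  WindowType(1024, 128, 2048, 1),   # type 312
    "long_stop_sequence":   WindowType(128, 1024, 2048, 1),   # type 314
    "stop_start_sequence":  WindowType(128, 128, 2048, 1),    # type 316
    "eight_short_sequence": WindowType(128, 128, 256, 8),     # type 318
}

def coefficient_sets(name):
    # Each MDCT maps 2N input samples to N coefficients.
    w = WINDOW_TYPES[name]
    return [w.mdct_input // 2] * w.num_transforms

print(coefficient_sets("only_long_sequence"))    # [1024]
print(coefficient_sets("eight_short_sequence"))  # eight sets of 128
```

In this representation every window type yields 1024 spectral values per frame in total, either as one set of 1024 or as eight sets of 128.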

Again with reference to Fig.3, it should be noted that Fig.3 shows a number of additional windows. These additional windows, namely the so-called "stop_1152_sequence" or "stop_window_1152" 330 and the so-called "stop_start_1152_sequence" or "stop_start_window_1152" 332, can be applied if the current frame is preceded by a frame that is encoded in the linear-prediction domain. In such cases, the transform length is adapted in order to allow for a cancellation of time-domain aliasing artifacts.

In addition, additional windows 362, 366, 368, 382 can optionally be used if the current frame is followed by a subsequent frame that is encoded in the linear-prediction domain. However, the window types 330, 332, 362, 366, 368, 382 should be considered optional and are not required for implementing the inventive concept.

Transitions between transform window types

With reference to Fig.4, which shows a schematic representation of the allowed transitions between window sequences (or transform window types), some further details will now be explained. Bearing in mind that two successive transform windows, each of which has one of the window types 310, 312, 314, 316, 318, are applied to partially overlapping blocks of audio samples, it should be understood that the right window slope of the first window should match the left window slope of the second, subsequent window in order to avoid artifacts caused by the partial overlap. Accordingly, the choice of the window type for the second frame (of two successive frames) is limited once the window type for the first frame (of the two successive frames) is given. As can be seen in Fig.4, if the first window is an "only_long_sequence" window, the first window can only be followed by an "only_long_sequence" window or a "long_start_sequence" window. In contrast, an "eight_short_sequence" window, a "long_stop_sequence" window or a "stop_start_sequence" window may not be used for the second frame following the first frame if an "only_long_sequence" window is used to transform the first frame. Similarly, if a "long_stop_sequence" window is used in the first frame, the second frame may use an "only_long_sequence" window or a "long_start_sequence" window, but may not use an "eight_short_sequence" window, a "long_stop_sequence" window or a "stop_start_sequence" window.

In contrast, if the first frame (of two successive frames) uses a "long_start_sequence" window, an "eight_short_sequence" window or a "stop_start_sequence" window, the second frame (of the two successive frames) may not use an "only_long_sequence" window or a "long_start_sequence" window, but can use an "eight_short_sequence" window, a "long_stop_sequence" window or a "stop_start_sequence" window.

The allowed transitions between the window types "only_long_sequence", "long_start_sequence", "eight_short_sequence", "long_stop_sequence" and "stop_start_sequence" are marked with a check mark in Fig.4. In contrast, transitions between window types that are not marked with a check mark are not allowed in some implementations.
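The slope-matching rule behind these transitions can be sketched in a few lines. This is a simplified illustration, not a reproduction of Fig.4: it assumes that a transition is allowed exactly when the right slope length of the first window equals the left slope length of the second window, covers only the five basic window types, and uses the hypothetical names `SLOPES`, `is_allowed_transition` and `allowed_successors`.

```python
# (left slope, right slope) in samples for the five basic window types.
SLOPES = {
    "only_long_sequence":   (1024, 1024),
    "long_start_sequence":  (1024, 128),
    "long_stop_sequence":   (128, 1024),
    "stop_start_sequence":  (128, 128),
    "eight_short_sequence": (128, 128),
}

def is_allowed_transition(first, second):
    # Assumed rule: the right slope of the first window must match the
    # left slope of the second, subsequent window.
    return SLOPES[first][1] == SLOPES[second][0]

def allowed_successors(first):
    return sorted(name for name in SLOPES if is_allowed_transition(first, name))

print(allowed_successors("only_long_sequence"))
# ['long_start_sequence', 'only_long_sequence']
print(allowed_successors("eight_short_sequence"))
# ['eight_short_sequence', 'long_stop_sequence', 'stop_start_sequence']
```

Under this rule, every window type has either two or three permitted successors, which matches the transitions listed in the text for the basic types.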

In addition, it should be noted that the additional window types "LPD_sequence", "stop_1152_sequence" and "stop_start_1152_sequence" can be used if transitions between a frequency-domain core mode and a linear-prediction-domain core mode are possible. However, this possibility should be considered optional, and it will be discussed later.

Example window sequence

In the following, a window sequence that uses the window types 310, 312, 314, 316, 318 will be described. Fig.5 shows a graphical representation of this window sequence. As can be seen, the abscissa 510 describes the time. Frames that overlap by approximately 50% are shown in Fig.5 and labeled "frame 1" to "frame 7". Fig.5 shows a first frame 520, which may, for example, include 2048 samples. A second frame 522 is shifted in time relative to the first frame 520 by approximately 1024 samples, so that the second frame overlaps the first frame 520 by approximately 50%. The temporal alignment of the third frame 524, the fourth frame 526, the fifth frame 528, the sixth frame 530 and the seventh frame 532 can be seen in Fig.5. An "only_long_sequence" window 540 (type 310) is associated with the first frame 520. In addition, an "only_long_sequence" window 542 (type 310) is associated with the second frame 522. A "long_start_sequence" window 544 (type 312) is associated with the third frame, an "eight_short_sequence" window 546 (type 318) is associated with the fourth frame 526, a "stop_start_sequence" window 548 (type 316) is associated with the fifth frame, an "eight_short_sequence" window 550 (type 318) is associated with the sixth frame 530, and a "long_stop_sequence" window 552 (type 314) is associated with the seventh frame 532. Accordingly, a single set of 1024 MDCT coefficients is associated with the first frame 520, another single set of 1024 MDCT coefficients is associated with the second frame 522, and yet another single set of 1024 MDCT coefficients is associated with the third frame 524. However, eight sets of 128 MDCT coefficients are associated with the fourth frame 526. A single set of 1024 MDCT coefficients is associated with the fifth frame 528.

The window sequence shown in Fig.5 may, for example, result in a particularly bit-rate-efficient encoding if there is a transient in the central part of the fourth frame 526 and another transient in the central part of the sixth frame 530, while the signal is approximately stationary during the rest of the time (for example, during the first frame 520, the second frame 522, the beginning of the third frame 524, the center of the fifth frame 528 and the end of the seventh frame 532).

However, as will be explained in detail hereinafter, the invention creates a particularly efficient concept for encoding the window types associated with the audio frames. In this regard, it should be noted that a total of five different window types 310, 312, 314, 316, 318 are used in the window sequence 500 of Fig.5. Accordingly, three bits would "normally" be needed to encode the window type of a frame. In contrast, the invention creates a concept that allows the window type to be encoded with a smaller number of bits.

With reference to Fig.6A and Fig.7a, 7b and 7C, the inventive concept for encoding the window type will be explained. Fig.6A shows a table representing the proposed syntax of the window type information, which includes an encoding rule for the window type. For the purpose of explanation, it is assumed that the window type information 140, which is provided to the variable-length codeword encoder 180 by the window sequence determiner 138, describes the window type of the current frame and can take one of the values "only_long_sequence", "long_start_sequence", "eight_short_sequence", "long_stop_sequence", "stop_start_sequence" and, optionally, one of the values "stop_1152_sequence" and "stop_start_1152_sequence". According to the inventive encoding concept, the variable-length codeword encoder 180 provides one-bit "window_length" information, which describes the length of the right window slope associated with the current frame. As can be seen in Fig.7a, a value of "0" of the "window_length" bit may represent a right window slope length of 1024 samples, and a value of "1" may represent a right window slope length of 128 samples. Accordingly, the variable-length codeword encoder 180 can provide the value "0" of the "window_length" information if the window type is "only_long_sequence" (first window type 310) or "long_stop_sequence" (third window type 314).
Optionally, the variable-length codeword encoder 180 may also provide "window_length" information equal to "0" for a window of type "stop_1152_sequence" (window type 330). In contrast, the variable-length codeword encoder 180 can provide the value "1" of the "window_length" information for "long_start_sequence" (second window type 312), for "stop_start_sequence" (fourth window type 316) and for "eight_short_sequence" (fifth window type 318). Optionally, the variable-length codeword encoder 180 may also provide "window_length" information equal to "1" for "stop_start_1152_sequence" (window type 332). In addition, the variable-length codeword encoder 180 may optionally provide the value "1" of the "window_length" information for one or more of the window types 362, 366, 368, 382.

However, the variable-length codeword encoder 180 is configured to selectively provide another bit of information, namely the so-called "transform_length" information of the current frame, in dependence on the value of the "window_length" bit of the current frame. If the "window_length" information of the current frame takes the value "0" (that is, for the window types "only_long_sequence", "long_stop_sequence" and, optionally, "stop_1152_sequence"), the variable-length codeword encoder 180 does not provide "transform_length" information for inclusion in the bitstream 192. In contrast, if the "window_length" information of the current frame takes the value "1" (i.e. for the window types "long_start_sequence", "stop_start_sequence", "eight_short_sequence" and, optionally, "LPD_start_sequence" and "stop_start_1152_sequence"), the variable-length codeword encoder 180 provides one-bit "transform_length" information for inclusion in the bitstream 192. The "transform_length" information, if it is provided, represents the transform length applied to the current frame.
Thus, the "transform_length" information is provided so as to take a first value (for example, the value "0") for the window types "long_start_sequence", "stop_start_sequence" and, optionally, "stop_start_1152_sequence" and "LPD_start_sequence", thereby indicating that the MDCT kernel size applied to the current frame is equal to 1024 samples (or 1152 samples). In contrast, the "transform_length" information is provided by the variable-length codeword encoder 180 so as to take a second value (for example, the value "1") if the "eight_short_sequence" window type is associated with the current frame, thereby indicating that the MDCT kernel size associated with the current frame is equal to 128 samples (see the syntax representation of Fig.7b).

To summarize, the variable-length codeword encoder 180 provides a 1-bit codeword, comprising only the one-bit "window_length" information of the current frame, for inclusion in the bitstream 192 if the right window slope associated with the current frame is comparatively long (long window slope 310b, 314b, 330b), i.e. for the window types "only_long_sequence", "long_stop_sequence" and "stop_1152_sequence". In contrast, the variable-length codeword encoder 180 provides a 2-bit codeword, comprising the one-bit "window_length" information and the one-bit "transform_length" information, for inclusion in the bitstream 192 if the right window slope associated with the current frame is a short window slope 312b, 316b, 318b, 332b, i.e. for the window types "long_start_sequence", "eight_short_sequence", "stop_start_sequence" and, optionally, "stop_start_1152_sequence". Thus, 1 bit is saved for the "only_long_sequence" window type and the "long_stop_sequence" window type (and, optionally, for the "stop_1152_sequence" window type).

Thus, only one or two bits, depending on the window type associated with the current frame, are required to encode a selection out of five (or even more) possible window types.
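The encoding rule can be sketched as follows for the five basic window types. The bit order (the "window_length" bit first, followed by the "transform_length" bit when present) and the function name `encode_window_type` are assumptions made for illustration; the bit values follow the examples given above ("0"/"1" for long/short right slope, "0"/"1" for long/short transform).

```python
# Window types whose right slope is short (128 samples): window_length = 1.
SHORT_RIGHT_SLOPE = {
    "long_start_sequence",
    "stop_start_sequence",
    "eight_short_sequence",
}
# Window types whose right slope is long (1024 samples): window_length = 0.
LONG_RIGHT_SLOPE = {"only_long_sequence", "long_stop_sequence"}

def encode_window_type(name):
    """Return the variable-length codeword for a window type as a bit string."""
    if name in LONG_RIGHT_SLOPE:
        return "0"  # window_length only; transform_length is omitted
    if name in SHORT_RIGHT_SLOPE:
        transform_length = "1" if name == "eight_short_sequence" else "0"
        return "1" + transform_length
    raise ValueError("window type not covered by this sketch: " + name)

# Window sequence of Fig.5 (frames 1 to 7).
fig5_sequence = [
    "only_long_sequence", "only_long_sequence", "long_start_sequence",
    "eight_short_sequence", "stop_start_sequence", "eight_short_sequence",
    "long_stop_sequence",
]
bits = "".join(encode_window_type(name) for name in fig5_sequence)
print(bits, len(bits))  # 00101110110 11
```

For the seven frames of Fig.5 this gives 11 bits, compared with 7 x 3 = 21 bits for a fixed three-bit code. Note that the codeword alone does not distinguish, for example, "only_long_sequence" from "long_stop_sequence"; both have a long right slope and differ only in the left slope.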

It should be noted that Fig.6A shows the mapping of the window type, which is given in the window type column 630, onto the value of the "window_length" information, which is shown in column 620, and onto the provision status and the value (if provided) of the "transform_length" information, which is shown in column 624.

Fig.6b shows a graphical representation of a mapping for deriving the "window_length" information of the current frame and the "transform_length" information (or an indication that the "transform_length" information is omitted from the bitstream 192) from the window type of the current frame. This mapping can be performed by the variable-length codeword encoder 180, which receives the window type information 140 describing the window type of the current frame and maps it onto the "window_length" information, as shown in column 660 of the table of Fig.6b, and onto the "transform_length" information, as shown in column 662 of the table of Fig.6b. In particular, the variable-length codeword encoder 180 may provide the "transform_length" information only if the "window_length" information takes a predetermined value (for example, the value "1"), and may otherwise omit the provision of the "transform_length" information, or suppress the inclusion of the "transform_length" information in the bitstream 192. Accordingly, the number of window type bits included in the bitstream 192 for a given frame may vary, as shown in column 664 of the table of Fig.6b, depending on the window type of the current frame.

It should also be noted that in some implementations the window type of the current frame may be adapted or changed if the current frame is followed by a frame encoded in the linear-prediction domain. However, this typically does not affect the mapping of the window type onto the "window_length" information and the selectively provided "transform_length" information.

Accordingly, the audio encoder 100 is configured to provide the bitstream 192 such that the bitstream 192 obeys the syntax that will be discussed below with reference to Fig.10A-10E.

A brief overview of the audio decoder

In the following, an audio decoder according to an implementation of the invention will be described in detail with reference to Fig.2, which shows a schematic block diagram of such an audio decoder. The audio decoder 200 of Fig.2 is configured to receive a bitstream 210 that includes encoded audio information and to provide, on the basis thereof, decoded audio information 212 (e.g., in the form of a time-domain audio signal). The audio decoder 200 includes an optional bitstream payload deformatter 220, which is configured to receive the bitstream 210 and to extract from the bitstream 210 encoded spectral value information 222 and variable-length codeword window information 224. The bitstream payload deformatter 220 may be configured to extract additional information from the bitstream 210, such as control information, gain information and additional audio parameter information. However, this additional information is well known to those skilled in the art and is not relevant to the present invention. For further details, reference is made, for example, to the international standard ISO/IEC 14496-3:2005 (E), part 3, subsection 4.

The audio decoder 200 includes an optional decoder/inverse quantizer/rescaler 230, which is configured to decode the encoded spectral value information 222, to perform an inverse quantization and to rescale the inversely quantized spectral value information, thereby obtaining decoded spectral value information 232. The audio decoder 200 further includes an optional spectral preprocessor 240, which may be configured to perform one or more spectral pre-processing steps. Some possible spectral pre-processing steps are explained, for example, in the international standard ISO/IEC 14496-3:2005 (E), part 3, subsection 4. Accordingly, the functionality of the decoder/inverse quantizer/rescaler 230 and of the optional spectral preprocessor 240 results in the provision of a (decoded and, optionally, pre-processed) time-frequency representation 242 of the encoded audio information represented by the bitstream 210. The audio decoder 200 includes, as a key component, a window-based signal converter 250. The window-based signal converter 250 is configured to convert the (decoded) time-frequency representation 242 into a time-domain audio signal 252. To this end, the window-based signal converter 250 may be configured to perform a conversion from the time-frequency domain to the time domain. For example, a transformer/windower 254 of the window-based signal converter 250 may be configured to receive, as the time-frequency representation 242, modified discrete cosine transform coefficients (MDCT coefficients) associated with temporally overlapping frames of the encoded audio information.
Accordingly, the converter/windower 254 may be configured to perform a lapped transform, in the form of an inverse modified discrete cosine transform (IMDCT), in order to obtain windowed time-domain portions (frames) of the encoded audio information, and to combine subsequent windowed time-domain portions (frames) using an overlap-and-add operation. When reconstructing the time-domain audio signal 252 on the basis of the time-frequency representation 242, that is, when performing the inverse modified discrete cosine transform in combination with windowing and the overlap-and-add operation, the converter/windower 254 may select a window from a plurality of available window types in order to allow for a proper reconstruction and to avoid blocking artifacts.
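Purely as an illustration, the combination of IMDCT, windowing and overlap-and-add described above may be sketched as follows. The sketch assumes a plain sine window and a toy frame size of N = 8 samples; it does not reproduce the window types of the invention, and all function names are hypothetical.

```python
import math

def sine_window(length):
    """Sine window of even length 2N; it satisfies w[n]^2 + w[n+N]^2 = 1."""
    return [math.sin(math.pi * (n + 0.5) / length) for n in range(length)]

def mdct(block):
    """Forward MDCT: 2N (already windowed) samples -> N coefficients."""
    half = len(block) // 2
    return [
        sum(block[n] * math.cos(math.pi / half * (n + 0.5 + half / 2) * (k + 0.5))
            for n in range(2 * half))
        for k in range(half)
    ]

def imdct(coeffs):
    """Inverse MDCT: N coefficients -> 2N time-aliased samples."""
    half = len(coeffs)
    return [
        2.0 / half * sum(coeffs[k] * math.cos(math.pi / half * (n + 0.5 + half / 2) * (k + 0.5))
                         for k in range(half))
        for n in range(2 * half)
    ]

def overlap_add_decode(frames, half):
    """Window each IMDCT output and overlap-add it with a hop of N samples."""
    window = sine_window(2 * half)
    out = [0.0] * (half * (len(frames) + 1))
    for index, coeffs in enumerate(frames):
        for n, sample in enumerate(imdct(coeffs)):
            out[index * half + n] += window[n] * sample
    return out

# Round trip on a toy signal: three overlapping frames of 2N samples each.
N = 8
signal = [math.sin(0.3 * n) + 0.1 * n for n in range(4 * N)]
window = sine_window(2 * N)
frames = [mdct([window[n] * signal[start + n] for n in range(2 * N)])
          for start in range(0, 3 * N, N)]
decoded = overlap_add_decode(frames, N)

# Only samples covered by two overlapping frames are fully reconstructed.
max_error = max(abs(decoded[n] - signal[n]) for n in range(N, 3 * N))
```

In this sketch the time-domain aliasing of adjacent frames cancels only because both frames use matching transition slopes, which is why the window of a frame has to be chosen consistently with the windows of its neighbours.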

The audio decoder 200 also comprises an optional time-domain postprocessor 260, which is configured to obtain the decoded audio information 212 on the basis of the time-domain audio signal 252. However, it should be noted that the decoded audio information 212 may be identical to the time-domain audio signal 252 in some implementations. In addition, the audio decoder 200 comprises a window selector 270, which is configured to receive the variable-length codeword window information 224, for example from the optional bitstream payload deformatter 220. The window selector 270 is configured to provide window information 272 (for example, window type information or window sequence information) to the converter/windower 254. It should be noted that the window selector 270 may or may not be part of the window-based signal converter 250, depending on the actual implementation.

To summarize the above, the audio decoder 200 is configured to provide the decoded audio information 212 on the basis of the encoded audio information 210. The audio decoder 200 comprises, as a key component, the window-based signal converter 250, which is configured to map the time-frequency representation 242, which is described by the encoded audio information 210, to the time-domain representation 252. The window-based signal converter 250 is configured to select a window from a plurality of windows, including windows of different transition slopes (e.g., different transition slope lengths) and windows of different transform lengths, on the basis of the window information 272. The audio decoder 200 comprises, as another key component, the window selector 270, which is configured to evaluate the variable-length codeword window information 224 in order to select a window for processing the given portion of the time-frequency representation 242 associated with a given frame of the audio information. The other components of the audio decoder, namely the bitstream payload deformatter 220, the decoder/inverse quantizer/rescaler 230, the spectral preprocessor 240 and the time-domain postprocessor 260, may be considered optional, but may be present in some implementations of the audio decoder 200.

In the following, details regarding the selection of a window for the conversion/windowing performed by the converter/windower 254 will be described. Regarding the meaning of the different window types, reference is made to the explanations above.

The audio decoder 200 is preferably capable of using the above-mentioned window types "only_long_sequence", "long_start_sequence", "eight_short_sequence", "long_stop_sequence" and "stop_start_sequence". The audio decoder may optionally be adapted to use additional window types, for example the so-called "stop_1152_sequence" and the so-called "stop_start_1152_sequence" (both of which can be used for a transition from a frame encoded in the linear-prediction domain to a frame encoded in the frequency domain). In addition, the audio decoder 200 may be configured to use further window types, such as the window types 362, 366, 368, 382, all of which can be adapted for a transition from a frame encoded in the frequency domain to a frame encoded in the linear-prediction domain. However, the use of the window types 330, 332, 362, 366, 368, 382 can be considered optional.

However, an important feature of the inventive audio decoder is that it provides a particularly efficient solution for deriving the appropriate window type from the variable-length codeword window information 224. As mentioned above, this will be explained further below with reference to Fig.10A-10E.

The variable-length codeword window information 224 typically comprises 1 or 2 bits per frame. Preferably, the variable-length codeword window information comprises a first bit carrying the "window_length" information of the current frame and a second bit carrying the "transform_length" information of the current frame, where the presence of the second bit (the "transform_length" bit) depends on the value of the first bit (the "window_length" bit). Thus, the window selector 270 is configured to selectively evaluate one or two window information bits ("window_length" and "transform_length") in order to decide on the window type associated with the current frame, depending on the value of the "window_length" bit associated with the current frame. In the absence of the "transform_length" bit, the window selector 270 may, of course, assume that the "transform_length" bit takes a default value.
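A minimal sketch of this selective evaluation is given below. It assumes that the "transform_length" bit is present exactly when the "window_length" bit equals 1 and that the default value is 0; the function name and the bit-iterator interface are hypothetical.

```python
def read_window_info(bits, default_transform_length=0):
    """Read the 1- or 2-bit window information from an iterator of 0/1 values.

    Returns (window_length, transform_length, number_of_bits_consumed).
    """
    window_length = next(bits)
    if window_length == 1:
        # Second bit is only present when the first bit signals a short slope.
        transform_length = next(bits)
        consumed = 2
    else:
        # Bit not transmitted: fall back to the default value.
        transform_length = default_transform_length
        consumed = 1
    return window_length, transform_length, consumed

# Three frames packed back to back: "0", "1 0" and "1 1".
stream = iter([0, 1, 0, 1, 1])
first = read_window_info(stream)
second = read_window_info(stream)
third = read_window_info(stream)
```

The five bits in the example are consumed as three codewords of lengths 1, 2 and 2.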

In a preferred implementation, the window selector 270 may be configured to evaluate the syntax described above with reference to Fig.6A and to provide the window information 272 in accordance with that syntax.

Provided that the audio decoder 200 always operates in the frequency-domain core mode, that is, there is no switching between the frequency-domain core mode and the linear-prediction-domain core mode, it may be sufficient to distinguish the above-mentioned five window types ("only_long_sequence", "long_start_sequence", "long_stop_sequence", "stop_start_sequence" and "eight_short_sequence"). In this case, the "window_length" information of the previous frame, the "window_length" information of the current frame and the "transform_length" information of the current frame (if present) can be sufficient for deciding on the window type.

For example, assuming operation in the frequency-domain core mode only (at least over a sequence of three consecutive frames), it can be concluded from the fact that the "window_length" information of the previous frame indicates a long transition slope (value "0") and that the "window_length" information of the current frame indicates a long transition slope (value "0") that the window type "only_long_sequence" is associated with the current frame, without evaluating the "transform_length" information, which in this case is not transmitted by the encoder.

Again assuming operation in the frequency-domain core mode only, it can be concluded from the fact that the "window_length" information of the previous frame indicates a long (right-hand) transition slope and that the "window_length" information of the current frame indicates a short (right-hand) transition slope (value "1") that the window type "long_start_sequence" is associated with the current frame, even without evaluating the "transform_length" information of the current frame (which in this case may or may not be generated and/or transmitted by the encoder).

Again assuming operation in the frequency-domain core mode only, it can be concluded from the fact that the "window_length" information of the previous frame indicates a short (right-hand) transition slope (value "1") and that the "window_length" information of the current frame indicates a long (right-hand) transition slope (value "0") that the window type "long_stop_sequence" is associated with the current frame, even without evaluating the "transform_length" information of the current frame (which is typically not provided by a corresponding audio encoder in any case).

If, however, the "window_length" information of the previous frame indicates a short (right-hand) transition slope and the "window_length" information of the current frame also indicates a short transition slope (value "1"), it may be necessary to evaluate the "transform_length" information of the current frame. In this case, if the "transform_length" information of the current frame takes a first value (e.g., zero), the window type "stop_start_sequence" is associated with the current frame. Otherwise, i.e. if the "transform_length" information of the current frame takes a second value (e.g., one), it can be concluded that the window type "eight_short_sequence" is associated with the current frame.

To summarize the above, the window selector 270 is configured to evaluate the "window_length" information of the previous frame and the "window_length" information of the current frame in order to determine the window type associated with the current frame. In addition, the window selector 270 is configured to selectively take into account the "transform_length" information of the current frame, depending on the value of the "window_length" information of the current frame (and possibly also depending on the "window_length" information of the previous frame, or on core mode information), in order to determine the window type associated with the current frame. Thus, the window selector 270 is configured to evaluate the variable-length codeword window information in order to determine the window type associated with the current frame.
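The four cases discussed above for operation in the frequency-domain core mode only can be condensed into a small decision function. The following is a sketch under the bit values used as examples in the text (0 for a long slope, 1 for a short slope, 0 and 1 for the first and second "transform_length" values); the function name is hypothetical.

```python
def window_type_fd(prev_window_length, window_length, transform_length=0):
    """Derive the window type of the current frame (frequency-domain core mode only).

    prev_window_length: window_length bit of the previous frame (0 = long, 1 = short)
    window_length:      window_length bit of the current frame
    transform_length:   only evaluated when both slopes are short
    """
    if window_length == 0:
        # Long right-hand slope: the previous slope decides between the two types.
        if prev_window_length == 0:
            return "only_long_sequence"
        return "long_stop_sequence"
    if prev_window_length == 0:
        # Long left-hand slope, short right-hand slope.
        return "long_start_sequence"
    # Both slopes short: transform_length selects the transform length.
    if transform_length == 1:
        return "eight_short_sequence"
    return "stop_start_sequence"
```

For example, `window_type_fd(1, 1, 1)` yields "eight_short_sequence", whereas `window_type_fd(0, 0)` yields "only_long_sequence" without any "transform_length" bit being needed.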

Fig.6C shows a table representing a mapping of the "window_length" information of the previous frame, the "window_length" information of the current frame and the "transform_length" information of the current frame onto the window type of the current frame. The "window_length" information of the current frame and the "transform_length" information of the current frame can be represented by the variable-length codeword window information 224. The window type of the current frame can be represented by the window information 272. The mapping described by the table of Fig.6C may be performed by the window selector 270.

As can be seen, the mapping may depend on the previous core mode. If the previous core mode is the frequency-domain core mode (abbreviated as "FD"), the mapping may take the form discussed above. If, however, the previous core mode is the linear-prediction-domain core mode (abbreviated as "LPD"), the mapping may change, as can be seen in the last two rows of the table of Fig.6C.

In addition, the mapping may change if the subsequent core mode (i.e. the core mode associated with the subsequent frame) is not the frequency-domain core mode but the linear-prediction-domain core mode.

The audio decoder 200 may optionally comprise a bitstream parser configured to parse the bitstream 210 representing the encoded audio information, to extract from the bitstream a one-bit window slope length information (also designated herein as "window_length" information), and to selectively extract, depending on the value of the one-bit window slope length information, a one-bit transform length information (designated herein as "transform_length" information). In this case, the window selector 270 is configured to selectively use or ignore the transform length information, depending on the window slope length information of the frame, when selecting the window type for processing a given portion (e.g., a frame) of the time-frequency representation 242. The bitstream parser may, for example, be part of the bitstream payload deformatter 220, and may allow the audio decoder 200 to handle the variable-length codeword window information properly, as discussed above and as also described with reference to Fig.10A-10E.

Switching between frequency-domain core mode and time-domain core mode

In some implementations, the audio encoder 100 and the audio decoder 200 may be configured to switch between a frequency-domain core mode and a linear-prediction-domain core mode. As explained above, it is assumed that the frequency-domain core mode is the core mode to which the above explanations apply. However, if the audio encoder can switch between the frequency-domain core mode and the linear-prediction-domain core mode, there may also be a cross-fade (in the sense of an overlap-and-add operation) between frames encoded in the frequency-domain core mode and frames encoded in the linear-prediction-domain core mode. Accordingly, an appropriate window must be selected in order to ensure a proper cross-fade between frames encoded in different core modes. For example, in some implementations there may be two window types, namely the window types 330 and 332 shown in Fig.2B, which are adapted for a transition from the linear-prediction-domain core mode to the frequency-domain core mode. For example, the window type 330 may allow a transition between a frame encoded in the linear-prediction domain and a frame encoded in the frequency domain having a long left-hand transition slope, for example from a frame encoded in the linear-prediction domain to a frame encoded in the frequency domain using the window type "only_long_sequence" or the window type "long_start_sequence".
Similarly, the window type 332 may provide a transition from a frame encoded in the linear-prediction domain to a frame encoded in the frequency domain having a short left-hand transition slope (for example, from a frame encoded in the linear-prediction domain to a frame associated with the window type "eight_short_sequence", "long_stop_sequence" or "stop_start_sequence"). Accordingly, the window selector 270 may be configured to select the window type 330 if it finds that the previous frame (preceding the current frame) is encoded in the linear-prediction domain, that the current frame is encoded in the frequency domain, and that the "window_length" information of the current frame indicates a long right-hand transition slope of the current frame (for example, the value "0"). In contrast, the window selector 270 is configured to select the window type 332 for the current frame if it finds that the previous frame is encoded in the linear-prediction domain, that the current frame is encoded in the frequency domain, and that the "window_length" information of the current frame indicates that a short right-hand transition slope is associated with the current frame (for example, the value "1").
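The selection between the window types 330 and 332 can be sketched as follows. The string labels "FD" and "LPD" and the function name are hypothetical; the sketch only covers a transition from the linear-prediction domain to the frequency domain and assumes the example bit values given above.

```python
def lpd_to_fd_window(prev_core_mode, core_mode, window_length):
    """Return the reference numeral of the transition window type, or None.

    prev_core_mode, core_mode: "FD" or "LPD" (labels assumed for this sketch)
    window_length:             window_length bit of the current frame
    """
    if prev_core_mode == "LPD" and core_mode == "FD":
        # Value "0" selects window type 330, value "1" selects window type 332.
        return 332 if window_length == 1 else 330
    # No LPD-to-FD transition: the regular mapping applies instead.
    return None
```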

Thus, the inventive mechanism using the variable-length codeword window information can be applied even where transitions between frequency-domain coding and linear-prediction-domain coding occur, without compromising coding efficiency.

Detailed description of the syntax of the bitstream

In the following, details regarding the syntax of the bitstream 192, 210 will be discussed with reference to Fig.10A-10E. Fig.10A shows a syntax representation of a so-called "unified speech and audio coding" ("USAC") raw data block ("USAC raw_data_block"). As can be seen, the USAC raw data block may comprise a so-called single channel element ("single_channel_element ()") and/or a channel pair element ("channel_pair_element ()"). However, the USAC raw data block may, of course, comprise more than one single channel element and/or more than one channel pair element.

Some further details will now be explained with reference to Fig.10b, which shows a syntax representation of a single channel element. As can be seen in Fig.10b, the single channel element may comprise core mode information, for example in the form of a "core_mode" bit. The core mode information may indicate whether the current frame is encoded in the linear-prediction-domain core mode or in the frequency-domain core mode. If the current frame is encoded in the linear-prediction-domain core mode, the single channel element may comprise a linear-prediction-domain channel stream ("LPD_channel_stream ()"). If the current frame is encoded in the frequency domain, the single channel element may comprise a frequency-domain channel stream ("FD_channel_stream ()").

Some additional details will now be explained with reference to Fig.10C, which shows a syntax representation of a channel pair element. The channel pair element may comprise first core mode information, for example in the form of a "core_mode0" bit, describing the core mode of the first channel. In addition, the channel pair element may comprise second core mode information, in the form of a "core_mode1" bit, describing the core mode of the second channel. Thus, different or identical core modes can be selected for the two channels described by the channel pair element. Optionally, the channel pair element may comprise common ICS information ("ICS_info ()") for both channels. This common ICS information is advantageous if the configurations of the two channels described by the channel pair element are similar. Naturally, the common ICS information is preferably used only if both channels are encoded in the same core mode.

In addition, the channel pair element comprises a linear-prediction-domain channel stream ("LPD_channel_stream ()") or a frequency-domain channel stream ("FD_channel_stream ()") associated with the first channel, depending on the core mode defined for the first channel (core mode information "core_mode0").

Likewise, the channel pair element comprises a linear-prediction-domain channel stream ("LPD_channel_stream ()") or a frequency-domain channel stream ("FD_channel_stream ()") for the second channel, depending on the core mode used for encoding the second channel (which may be indicated by the core mode information "core_mode1").
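The dependency of the channel streams on the two core mode bits can be illustrated by the following sketch. The assignment of bit values to core modes (0 for the frequency domain, 1 for the linear-prediction domain) is an assumption made for illustration only, as it is not stated in the text; the function names are hypothetical.

```python
# Assumed bit assignment (not stated in the text): 0 = FD, 1 = LPD.
STREAM_BY_CORE_MODE = {0: "FD_channel_stream()", 1: "LPD_channel_stream()"}

def channel_pair_streams(core_mode0, core_mode1):
    """Return the channel stream type carried for each of the two channels."""
    return STREAM_BY_CORE_MODE[core_mode0], STREAM_BY_CORE_MODE[core_mode1]

def common_ics_info_allowed(core_mode0, core_mode1):
    """Common ICS information is preferably used only for identical core modes."""
    return core_mode0 == core_mode1
```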

Some additional details will now be described with reference to Fig.10d, which shows a syntax representation of the ICS information. It should be noted that the ICS information may be included in a channel pair element or in an individual frequency-domain channel stream (which will be discussed with reference to Fig.10E).

The ICS information comprises a one-bit (or single-symbol) "window_length" information, which describes the length of the right-hand transition slope of the window associated with the current frame, for example in accordance with the definition given in Fig.7a. If, and only if, the "window_length" information takes a predetermined value (for example, "1"), the ICS information comprises an additional one-bit (or single-symbol) "transform_length" information. The "transform_length" information describes the size of the MDCT kernel, for example in accordance with the definition given in Fig.7b. If the "window_length" information takes a value different from the predetermined value (for example, the value "0"), the "transform_length" information is not included in (or is omitted from) the ICS information (or the corresponding bitstream). In this case, however, the bitstream parser of the audio decoder may set the recovered value of the decoder variable "transform_length" to a default value (for example, "0").

In addition, the ICS information may comprise so-called "window_shape" information, which can be one-bit (or single-symbol) information describing the shape of the window transition. For example, the "window_shape" information may describe whether the window transition has a sine/cosine shape or a Kaiser-Bessel-derived shape. For more details on the meaning of the "window_shape" information, reference is made, for example, to the international standard ISO/IEC 14496-3:2005 (E), part 3, subsection 4. However, it should be noted that the "window_shape" information leaves the basic window type unchanged, and that the general characteristics (long transition slope or short transition slope; long transform length or short transform length) are not affected by the "window_shape" information.

Thus, in implementations according to the invention, the window shape, that is, the shape of the transition slopes, is determined separately from the window type, i.e. from the general length of the transition slopes (long or short) and the transform length (long or short).

In addition, the ICS information may comprise scale factor information that depends on the window type. For example, if the "window_length" information and the "transform_length" information indicate that the current window type is "eight_short_sequence", the ICS information may comprise "max_sfb" information describing the maximum scale factor band and "scale_factor_grouping" information describing the grouping of scale factor bands. Details regarding this information are described, for example, in the international standard ISO/IEC 14496-3:2005 (E), part 3, subsection 4. Alternatively, that is, if the "window_length" information and the "transform_length" information indicate that the current frame is not of the window type "eight_short_sequence", the ICS information may comprise only the "max_sfb" information (but not the "scale_factor_grouping" information).
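The conditional structure of the ICS information described above can be summarized by listing which fields are present for a given combination of bits. The sketch below makes two assumptions that are not taken from Fig.10d: the order in which the fields are listed, and that the combination "window_length" = 1, "transform_length" = 1 is what signals the window type "eight_short_sequence". The function name is hypothetical.

```python
def ics_info_fields(window_length, transform_length=0):
    """List the fields present in the ICS information (assumed field order)."""
    fields = ["window_length"]
    if window_length == 1:
        # Only transmitted when window_length takes the predetermined value.
        fields.append("transform_length")
    fields.append("window_shape")
    fields.append("max_sfb")
    if window_length == 1 and transform_length == 1:
        # Assumed to indicate "eight_short_sequence": grouping info is added.
        fields.append("scale_factor_grouping")
    return fields
```

With `window_length` equal to 0 the list contains only "window_length", "window_shape" and "max_sfb", which reflects the shortest form of the ICS information.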

Some further details will now be described with reference to Fig.10E, which shows a syntax representation of the frequency-domain channel stream ("FD_channel_stream ()"). The frequency-domain channel stream comprises "global_gain" information describing a global gain associated with the spectral values. In addition, the frequency-domain channel stream comprises ICS information ("ICS_info ()"), unless such information is already included in the channel element that contains the frequency-domain channel stream. Details regarding the ICS information have been described with reference to Fig.10d.

In addition, the frequency-domain channel stream comprises scale factor data ("scale_factor_data ()"), which describe the scaling to be applied to the values (or scale factor bands) of the decoded spectral value information, or time-frequency representation. The frequency-domain channel stream also comprises encoded spectral data, which may, for example, be arithmetically coded spectral data ("ac_spectral_data ()"). However, a different encoding of the spectral data may be used. Regarding the scale factor data and the encoded spectral data, reference is again made to the international standard ISO/IEC 14496-3:2005 (E), part 3, subsection 4. Different ways of encoding the scale factor data and the spectral data can, of course, be applied if desired.

In the following, some conclusions and an evaluation of the inventive concept are presented. Implementations of the present invention create a concept for reducing the required bit rate, which may be applied, for example, in combination with the audio coding schemes defined in the international standard ISO/IEC 14496-3:2005 (E), part 3, subsection 4. However, the concept discussed here can also be used in combination with the so-called "unified speech and audio coding" (USAC) approach. Based on the existing bitstream definitions and decoder architectures, the invention provides a modification of the bitstream syntax which simplifies the syntax for signaling window sequences, saves bit rate without increasing complexity, and does not change the output waveform of the decoder.

In the following, the background and the idea underlying the present invention will be briefly discussed and summarized. In current audio coding according to ISO/IEC 14496-3:2005 (E), part 3, subsection 4, as well as in the USAC working draft, a fixed-length codeword of two bits is sent to signal the window sequence. In addition, information about the window sequence of the previous frame is sometimes necessary to determine the correct sequence.

However, it has been found that, by taking this information into account and making the codeword length variable (one or two bits), the bit rate can be reduced. The new codeword has a maximum length of two bits ("window_length" and, in some cases, "transform_length"). Thus, the bit rate never increases (in comparison with the conventional approach).

The new codeword ("window_length" and, in some cases, "transform_length") consists of one bit ("window_length") indicating the length of the right-hand window slope and one bit ("transform_length") indicating the transform length. In many cases, the transform length can be derived unambiguously from information about the previous frame, namely the window sequence and the core mode. Thus, there is no need to send this information again. Accordingly, the "transform_length" bit is omitted in such cases, which reduces the bit rate.
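On the encoder side, the construction of this codeword may be sketched as follows. The sketch assumes, in line with the bit values used as examples in this description, that the "transform_length" bit is omitted exactly when "window_length" equals 0; the function name is hypothetical.

```python
def encode_window_info(window_length, transform_length):
    """Return the variable-length codeword as a list of bits (length 1 or 2)."""
    if window_length == 0:
        # Long right-hand slope: the transform length is implied, so omit the bit.
        return [0]
    return [1, transform_length]

# Codeword lengths for the three combinations that actually occur.
codeword_lengths = {
    (0, 0): len(encode_window_info(0, 0)),
    (1, 0): len(encode_window_info(1, 0)),
    (1, 1): len(encode_window_info(1, 1)),
}
```

No codeword is longer than the two bits of the conventional fixed-length signaling, which is why the bit rate cannot increase.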

In the following, some details regarding the proposed new bitstream syntax according to the present invention will be discussed. The proposed new bitstream syntax provides a more straightforward implementation and signaling of window sequences, because it transmits only the information that is actually required to determine the window sequence of the current frame, that is, the right-hand window slope and the transform length. The left-hand window slope of the current frame is derived from the right-hand window slope of the previous frame.

The proposal (or the proposed new bitstream syntax) clearly separates the information about the window slope length ("window_length" information) from the transform length ("transform_length" information). The variable-length codeword is a combination of both, where the first bit, "window_length", determines the length of the right-hand window slope (of the current frame), and the second bit, "transform_length", determines the MDCT length (of the current frame) according to Fig.7a and 7d. When "window_length" = 0, that is, when a long window slope is selected, the transmission of "transform_length" can be omitted (or is actually omitted), because an MDCT kernel size of 1024 samples (or 1152 samples in some cases) is mandatory.

Fig.7C gives a brief overview of all combinations of "window_length" and "transform_length". As can be seen, there are only three meaningful combinations of the two one-bit information items "window_length" and "transform_length", so that the transmission of "transform_length" may be omitted if the "window_length" information is set to zero, without any negative impact on the transmission of the desired information.

In the following, the mapping of the "window_length" information and the "transform_length" information onto the "window_sequence" information (which describes the window type to be used for the current frame) will be briefly summarized. The table of Fig.6A shows how the bitstream element "window_sequence" of the current USAC working draft can be derived from the newly proposed bitstream elements. This demonstrates that the proposed change is transparent in terms of information content.

In other words, the inventive bit-rate-reduced syntax for signaling the window type, which is based on the use of variable-length codeword window information, can carry the "full" information content that is conventionally transmitted at a higher bit rate. In addition, the inventive concept can be applied in conventional audio encoders and decoders, for example in an audio encoder or audio decoder according to ISO/IEC 14496-3:2005 (E), part 3, subsection 4, or according to the current USAC working draft, without any significant modifications.

In the following, the achievable bit savings will be assessed. It should be noted that in some cases the bit savings may be somewhat smaller than indicated, and in other cases they may be considerably larger than the savings discussed here. The assessment of the bit savings shown in Fig.9 illustrates an estimate of the savings for lossless transcoding, comparing bitstreams using the new bitstream syntax with conventional bitstreams (these bitstreams were submitted with the proposals). As can clearly be seen, the transmission of the "transform_length" bit can be omitted in accordance with the invention in 95.67% of all frequency-domain frames for mono at 12 kbit/s and in up to 95.15% of all frequency-domain frames at 64 kbit/s.
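The effect of these percentages on the window signaling can be expressed as an average codeword length per frequency-domain frame. The short calculation below is an illustrative sketch: it assumes a one-bit codeword whenever "transform_length" is omitted and a two-bit codeword otherwise, compared with the conventional fixed two bits. Converting the result into bits per second would additionally require the frame rate, which is not assumed here. The function name is hypothetical.

```python
def average_codeword_bits(omitted_fraction):
    """Average window signaling bits per frame for a given omission fraction."""
    return 1.0 * omitted_fraction + 2.0 * (1.0 - omitted_fraction)

CONVENTIONAL_BITS = 2.0

mono_12 = average_codeword_bits(0.9567)   # mono at 12 kbit/s
rate_64 = average_codeword_bits(0.9515)   # 64 kbit/s case
saving_mono_12 = CONVENTIONAL_BITS - mono_12
saving_rate_64 = CONVENTIONAL_BITS - rate_64
```

Under these assumptions the average codeword length is slightly above one bit per frame in both cases, and the saving per frame equals the fraction of frames in which the bit is omitted.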

As can be seen from Fig.9, on average between 2 and 24 bits per second can be saved without compromising the quality of the audio content. Because bit rate is a very critical resource for the storage and transmission of audio content, this improvement can be considered very valuable. In addition, it should be noted that in some cases the improvement in bit rate may be considerably larger, for example if relatively short frames are chosen.

To summarize the foregoing, the invention provides a new bitstream syntax for signaling window sequences. The new bitstream syntax saves data rate and is more logical and more flexible than the old syntax. It is easy to implement and has no drawbacks in terms of complexity.

Comparison with the current USAC working draft

Next, the proposed text changes to the technical description of the current USAC working draft are discussed. In order to incorporate the proposed inventive changes, the following sections should be updated:

In the definition of "payloads for the audio object type USAC", which describes the syntax of the so-called ICS information, the existing syntax must be replaced by the syntax shown in Fig.10b.

In addition, the data element "window_sequence" (window sequence) should be replaced by the following definitions of the data elements "window_length" (window length) and "transform_length" (transform length):

window_length: single-bit field that defines the slope length of the right-hand part of the window for this window sequence; and

transform_length: single-bit field that defines the transform length used for this window sequence.

Furthermore, the definition of the help element "window_sequence" (window sequence) must be added as follows:

window_sequence: indicates the window sequence as defined by the "window_length" (window length) of the previous frame, the "transform_length" (transform length) and "window_length" (window length) of the current frame, and the "core_mode" (core mode) of the next frame, according to the table shown in Fig.8.

Fig.8 shows the definition of the help element window_sequence (window sequence), which can optionally be derived from the "window_length" information (window length) of the previous frame, the "window_length" information (window length) of the current frame, the "transform_length" information (transform length) of the current frame, and the "core_mode" information (core mode) of the next frame.

In addition, the conventional definitions of "window_sequence" (window sequence) and "window_shape" (window shape) can be replaced by the more appropriate definitions of "window_length" (window length), "transform_length" (transform length) and "window_shape" (window shape) as follows:

window_length: single-bit field that defines the slope length of the right-hand part of this window;

transform_length: single-bit field that defines the transform length used for this window; and

window_shape: one bit specifying which window function is selected.
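How a decoder might read these fields as a variable-length code word can be sketched as follows. This is a minimal illustration under stated assumptions, not the normative syntax of Fig.10b: the bit values (0 for "long", 1 for "short"), the rule that "transform_length" is present only when "window_length" signals a short slope, and the names `WindowInfo` and `read_window_info` are all assumptions made for the example.

```python
from typing import Iterator, NamedTuple, Optional

# Assumed bit values; the text does not fix the polarity of the fields.
LONG, SHORT = 0, 1


class WindowInfo(NamedTuple):
    window_length: int               # 1-bit slope length of the right-hand window part
    transform_length: Optional[int]  # 1-bit transform length, or None if not transmitted


def read_window_info(bits: Iterator[int]) -> WindowInfo:
    """Read a one- or two-bit window code word from an iterator of bits."""
    window_length = next(bits)
    # Assumption: transform_length is only present for a short right-hand slope.
    transform_length = next(bits) if window_length == SHORT else None
    return WindowInfo(window_length, transform_length)


if __name__ == "__main__":
    stream = iter([0, 1, 1, 1, 0])
    print(read_window_info(stream))  # consumes one bit
    print(read_window_info(stream))  # consumes two bits
    print(read_window_info(stream))  # consumes two bits
```

Because the second bit is read only when the first one requires it, frames with a long right-hand slope cost a single bit in this sketch.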

The method according to Fig.11

Fig.11 shows a flow chart of a method for providing encoded audio information on the basis of input audio information. The method 1100 of Fig.11 includes a step 1110 of providing a sequence of audio signal parameters on the basis of a plurality of windowed portions of the input audio information. When providing the sequence of audio signal parameters, switching is performed between the use of windows having a longer transition slope and windows having a shorter transition slope, and also between the use of windows associated with two or more different transform lengths, in order to adapt the window types used for obtaining the windowed portions of the input audio information to the characteristics of the input audio information. The method 1100 also includes a step 1120 of encoding window information, which describes the type of window used for transforming the current portion of the input audio information, using a variable-length code word.
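Step 1120 can be sketched as follows. This is a hedged illustration only: the bit values (0 for "long", 1 for "short"), the rule that the transform length bit is written only after a short right-hand slope has been signaled, and the function name `encode_window_info` are assumptions and are not taken from the figures.

```python
from typing import List


def encode_window_info(right_slope_short: bool, short_transform: bool) -> List[int]:
    """Return the variable-length code word (as a list of bits) for one frame.

    Assumption: a short transform is only used together with a short
    right-hand slope, so the combination long slope / short transform
    has no code word in this sketch.
    """
    if short_transform and not right_slope_short:
        raise ValueError("short transform requires a short right-hand slope here")
    bits = [1 if right_slope_short else 0]      # window_length bit, always written
    if right_slope_short:
        bits.append(1 if short_transform else 0)  # transform_length bit, conditional
    return bits


if __name__ == "__main__":
    print(encode_window_info(False, False))  # one-bit code word
    print(encode_window_info(True, False))   # two-bit code word
    print(encode_window_info(True, True))    # two-bit code word
```

The code word length thus varies between one and two bits depending on the window type chosen for the current portion.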

The method according to Fig.12

Fig.12 shows a flow chart of a method for providing decoded audio information on the basis of encoded audio information. The method 1200 of Fig.12 includes a step 1210 of evaluating window information in the form of a variable-length code word in order to select a window from a plurality of windows, comprising windows with different transition slopes and windows associated with different transform lengths, for processing a given portion of the time-frequency representation associated with a given frame of the audio information.

The method 1200 also includes a step 1220 of mapping the given portion of the time-frequency representation, which is described by the encoded audio information, onto a time-domain representation using the selected window.
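The window selection of step 1210 can be sketched for the case in which the previous, current and subsequent frames are all encoded in the frequency-domain core mode. The sketch follows the five window types distinguished in the claims; the bit values (0 for "long", 1 for "short") and the names `WindowType` and `select_window` are assumptions made for illustration, and transitions to or from other core modes are not covered.

```python
from enum import Enum
from typing import Optional

# Assumed bit values; the text does not fix the polarity of the fields.
LONG, SHORT = 0, 1


class WindowType(Enum):
    """Window types, named after the reference numerals used in the claims."""
    TYPE_310 = "long left slope, long right slope, long transform"
    TYPE_312 = "long left slope, short right slope, long transform"
    TYPE_314 = "short left slope, long right slope, long transform"
    TYPE_316 = "short left slope, short right slope, long transform"
    TYPE_318 = "sequence of short windows, short transform"


def select_window(prev_right_slope: int, window_length: int,
                  transform_length: Optional[int] = None) -> WindowType:
    """Pick the window for the current frame.

    The left slope is inherited from the right slope of the previous window;
    window_length gives the right slope of the current window.
    """
    if prev_right_slope == LONG:
        # transform_length is not needed to distinguish these two types.
        return WindowType.TYPE_312 if window_length == SHORT else WindowType.TYPE_310
    if window_length == LONG:
        return WindowType.TYPE_314
    if transform_length is None:
        raise ValueError("transform_length is required when both slopes are short")
    return WindowType.TYPE_318 if transform_length == SHORT else WindowType.TYPE_316


if __name__ == "__main__":
    prev = LONG
    for wl, tl in [(LONG, None), (SHORT, None), (SHORT, SHORT), (LONG, None)]:
        print(select_window(prev, wl, tl).name)
        prev = wl  # the right slope of this window is the left slope of the next
```

Carrying the previous right-hand slope forward is what allows a single bit to suffice for most frames in this sketch.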

It should be noted that the methods according to Fig.11 and Fig.12 can be supplemented by any of the features and functionalities described herein with respect to the inventive apparatus and the inventive bitstream characteristics.

Implementation alternatives

Although several aspects have been described in the context of an apparatus, it is clear that these aspects also represent a description of the corresponding method, where a block or device corresponds to a method step or a feature of a method step. Similarly, aspects described in the context of a method step also represent a description of a corresponding block, item or feature of a corresponding apparatus.

Any step of the inventive method can be performed using a microprocessor, a programmable computer, an FPGA (field-programmable gate array), or any other hardware, for example data processing hardware.

The inventive encoded audio signal may be stored on a digital storage medium or may be transmitted over a transmission medium, such as a wireless transmission medium or a wired transmission medium such as the Internet.

Depending on specific implementation requirements, embodiments of the invention can be implemented in hardware or in software. The implementation can be performed using a digital storage medium, for example a floppy disk, a DVD (digital video disc), a Blu-ray disc, a CD, a ROM (read-only memory), a PROM (programmable read-only memory), an EPROM (erasable programmable read-only memory), an EEPROM (electrically erasable programmable read-only memory) or a flash memory, having electronically readable control signals stored thereon, which cooperate (or are capable of cooperating) with a programmable computer system such that the respective method is performed. Therefore, the digital storage medium may be computer-readable.

Some embodiments according to the invention include a data carrier having electronically readable control signals that are capable of cooperating with a programmable computer system such that one of the methods described herein is performed.

Generally, embodiments of the present invention can be implemented as a computer program product with a program code, the program code being operative to perform one of the methods when the computer program product runs on a computer. The program code may, for example, be stored on a machine-readable carrier.

Other embodiments include a computer program for performing one of the methods described herein, stored on a machine-readable carrier.

In other words, an embodiment of the inventive method is, therefore, a computer program having a program code for performing one of the methods described herein when the computer program runs on a computer.

A further embodiment of the inventive methods is, therefore, a data carrier (or a digital storage medium, or a computer-readable medium) having recorded thereon the computer program for performing one of the methods described herein.

A further embodiment of the inventive method is, therefore, a data stream or a sequence of signals representing the computer program for performing one of the methods described herein. The data stream or the sequence of signals may, for example, be configured to be transmitted via a data communication connection, for example via the Internet.

A further embodiment includes a processing means, for example a computer or a programmable logic device, configured or adapted to perform one of the methods described herein.

A further embodiment includes a computer having installed thereon the computer program for performing one of the methods described herein.

In some embodiments, a programmable logic device (for example, a field-programmable gate array) can be used to perform some or all of the functionalities of the methods described herein. In some embodiments, a field-programmable gate array can cooperate with a microprocessor in order to perform one of the methods described herein. Generally, the methods are preferably performed by any hardware apparatus.

The embodiments described above are merely illustrative of the principles of the present invention. It is understood that modifications and variations of the arrangements and details described herein will be apparent to those skilled in the art. It is therefore intended that the invention be limited only by the scope of the claims that follow, and not by the specific details presented by way of description and explanation of the embodiments.

1. An audio decoder (200) for providing decoded audio information (212) on the basis of encoded audio information (210), comprising a window-based signal converter (250) configured to map a time-frequency representation (242) of the audio information, which is described by the encoded audio information (210), onto a time-domain representation (252) of the audio information, wherein the window-based signal converter is configured to select a window from a plurality of windows (310, 312, 314, 316, 318), comprising windows with different transition slopes (310a, 312a, 314a, 316a, 318a, 310b, 312b, 314b, 316b, 318b) and windows associated with different transform lengths, using window information (272); wherein the audio decoder (200) comprises a window selector (270) configured to evaluate variable-length code word window information (224) in order to select a window for processing a given portion of the time-frequency representation associated with a given frame of the audio information.

2. The audio decoder (200) according to claim 1, wherein the audio decoder comprises a bitstream analyzer (220) configured to analyze a bitstream (210) representing the encoded audio information, to extract from the bitstream (210) 1-bit window slope length information ("window_length"), and to selectively extract, depending on the value of the 1-bit window slope length information, 1-bit transform length information ("transform_length"); and wherein the window selector (270) is configured to selectively, depending on the window slope length information, use or disregard the transform length information when selecting the window type (310, 312, 314, 316, 318) for processing a given portion of the time-frequency representation (242).

3. The audio decoder (200) according to claim 1, wherein the window selector (270) is configured to select the window type (310, 312, 314, 316, 318) for processing a current portion of the time-frequency representation (242) such that the left-sided window slope length of the window for processing the current portion of the time-frequency representation (242) matches the right-sided window slope length of the window used for processing the previous portion of the time-frequency representation (242).

4. The audio decoder (200) according to claim 3, wherein the window selector (270) is configured to select between a first type (310) and a second type (312) of window depending on the value of the 1-bit window slope length information, if the right-sided window slope length of the window for processing the previous portion of the time-frequency representation (242) takes a long value, and if the previous portion of the audio information, the current portion of the audio information and the subsequent portion of the audio information are all encoded using a frequency-domain core mode; wherein the window selector (270) is configured to select a third type (314) of window in response to a first value of the 1-bit window slope length information, indicating a long right-sided window slope, if the right-sided window slope length of the window for processing the previous portion of the audio information takes a short value, and if the previous portion of the audio information, the current portion of the audio information and the subsequent portion of the audio information are all encoded using a frequency-domain core mode; and wherein the window selector (270) is configured to select between a fourth type (316) of window and a fifth type (318) of window, which defines a sequence of short windows (319a-319h), depending on the 1-bit transform length information, if the 1-bit window slope length information takes a second value indicating a short right-sided window slope, if the right-sided window slope length of the window for processing the previous portion of the audio information (242) takes a short value, and if the previous portion of the audio information, the current portion of the audio information and the subsequent portion of the audio information are all encoded using a frequency-domain core mode; wherein the first type (310) of window comprises a relatively long left-sided window slope length, a relatively long right-sided window slope length and a relatively long transform length; wherein the second type (312) of window comprises a relatively long left-sided window slope length, a relatively short right-sided window slope length and a relatively long transform length; wherein the third type (314) of window comprises a relatively short left-sided window slope length, a relatively long right-sided window slope length and a relatively long transform length; wherein the fourth type (316) of window comprises a relatively short left-sided window slope length, a relatively short right-sided window slope length and a relatively long transform length; and wherein the sequence of windows (319a-319h) of the fifth type (318) of window defines a superposition of a plurality of windows (319a-319h) associated with a single portion of the audio information (242), and wherein each of the plurality of windows (319a-319h) comprises a relatively short transform length, a relatively short left-sided window slope and a relatively short right-sided window slope.

5. The audio decoder (200) according to claim 1, wherein the window selector (270) is configured to selectively evaluate a transform length bit of the variable-length code word window information (224) of the current frame of the audio information only if the window type for processing the previous portion of the audio information (242) comprises a right-sided window slope length matching the left-sided window slope length of the sequence (318) of short windows, and if the 1-bit window slope length information associated with the current portion of the time-frequency representation (242) defines a right-sided window slope length matching the right-sided window slope length of the sequence (318) of short windows.

6. The audio decoder (200) according to claim 1, wherein the window selector (270) is configured to receive previous core mode information associated with the previous frame of the audio information and describing the core mode used for encoding the previous frame of the audio information; and wherein the window selector (270) is configured to select the window type for processing a current portion of the time-frequency representation (242) depending on the previous core mode information and depending on the variable-length code word window information (224) associated with the current portion of the audio information (242).

7. The audio decoder (200) according to claim 1, wherein the window selector (270) is configured to receive subsequent core mode information associated with the subsequent portion of the audio information (242) and describing the core mode used for encoding the subsequent portion of the audio information; and wherein the window selector (270) is configured to select a window for processing a current portion of the audio information (242) depending on the subsequent core mode information and depending on the variable-length code word window information (224) associated with the current portion of the time-frequency representation (242).

8. The audio decoder (200) according to claim 7, wherein the window selector (270) is configured to select a window (362, 366, 368, 382) having a shortened right-sided slope if the subsequent core mode information indicates that the subsequent portion of the audio information is encoded using a linear-prediction-domain core mode.

9. An audio encoder (100) for providing encoded audio information (192) on the basis of input audio information (110), the audio encoder (100) comprising a window-based signal converter (130) configured to provide a sequence of audio signal parameters (132) on the basis of a plurality of windowed portions of the input audio information (110), wherein the window-based signal converter (130) is configured to adapt the window types used for obtaining the windowed portions of the input audio information depending on the characteristics of the input audio information (110); wherein the window-based signal converter (130) is configured to switch between the use of windows (310, 312, 314, 316, 318) having a longer transition slope and windows having a shorter transition slope, and also to switch between the use of windows having two or more different transform lengths; and wherein the window-based signal converter (130) is configured to determine the type of window used for transforming the current portion of the input audio information depending on the type of window used for transforming the previous portion of the input audio information and on the audio content of the current portion of the input audio information; wherein the audio encoder is configured to encode window information (140), describing the type of window used for transforming the current portion of the input audio information (110), using a variable-length code word.

10. The audio encoder (100) according to claim 9, wherein the audio encoder is configured to provide the variable-length code word such that the variable-length code word associated with a given portion of the time-frequency representation comprises 1-bit information describing the window slope length of the window used for this portion of the time-frequency representation (132); and wherein the audio encoder (100) is configured to provide the variable-length code word such that the variable-length code word selectively comprises 1-bit transform length information, describing the transform length used to obtain this portion of the time-frequency representation (132), if and only if the 1-bit information describing the window slope length takes a predetermined value.

11. The audio encoder (100) according to claim 9, wherein the audio encoder is configured to encode window slope length information, describing the right-sided window slope length of the window used to obtain a given portion of the time-frequency representation, and transform length information, describing the transform length used to obtain this portion of the time-frequency representation (132), using separate bits of the bitstream (192), and to decide on the presence of the bit carrying the transform length information depending on the value of the window slope length information.

12. A method (1200) for providing decoded audio information on the basis of encoded audio information, comprising evaluating (1210) variable-length code word window information in order to select a window from a plurality of windows, comprising windows with different transition slopes and windows associated with different transform lengths, for processing a given portion of the time-frequency representation associated with a given frame of the audio information; and mapping (1220) the given portion of the time-frequency representation, which is described by the encoded audio information, onto a time-domain representation using the selected window.

13. A method (1100) for providing encoded audio information on the basis of input audio information, comprising providing (1110) a sequence of audio signal parameters on the basis of a plurality of windowed portions of the input audio information, wherein switching is performed between the use of windows having a longer transition slope and windows having a shorter transition slope, and also between the use of windows associated with two or more different transform lengths, in order to adapt the window types used for obtaining the windowed portions of the input audio information depending on the characteristics of the input audio information; and encoding information describing the types of windows used for transforming the portions of the input audio information, using variable-length code words.

14. A machine-readable information carrier having stored thereon a computer program having a program code for performing the method according to claim 12 when the computer program runs on a computer.

15. A computer-readable storage medium having stored thereon a computer program having a program code for performing the method according to claim 13 when the computer program runs on a computer.



 

Same patents:

FIELD: radio engineering, communication.

SUBSTANCE: invention relates to encoding and decoding an audio signal having the harmonic or speech content, which can be subjected to time-deformation processing. An encoder includes a window function controller, a window set-up device, a time deformation device with final quality check functionality, a time/frequency converter, a TNS stage or encoding device quantiser. The window function controller, time deformation device, TNS stage or an additional ambient noise analyser are controlled by the signal analysis results obtained by the time deformation analyser or signal classifier. The decoder applies an ambient noise operation using an estimate of the controlled ambient noise depending on the harmonic or speech characteristic of the audio signal.

EFFECT: high encoding efficiency.

16 cl, 37 dwg

FIELD: measuring instrumentation.

SUBSTANCE: invention refers to telemetry and data compression during measurement data translation in monitoring systems, in measurements in inaccessible places, and in measurement data storage, e.g. in aircraft and vessel black boxes. The method involves context creation for measurement data compression where entropy of measurement data typical for particular measurement devices and conditions is adjusted, by measured parameter modulation as well, for actual and/or required measurement accuracy, and compression rate and transmitted/stored data content are adjusted by source data system forming one or more linked data arrays for measurement parameters and one or more data arrays for measurement content.

EFFECT: limitation of data volume during measurements, increased data coding rate and data security ensured.

7 cl, 5 dwg

FIELD: physics, computer engineering.

SUBSTANCE: invention relates to computer engineering. A method of transmitting and receiving information from multiple information sources to information consumers in a digital communication system, the method comprising, at the transmitting side a primary binary digital stream, selecting consecutive groups with a given number p of bits per group; identifying for each group a corresponding sequence of binary codes and one-to-one transformation of each group of bits into an ordered set of bits with a corresponding sequence of codes of said bits; in addition to values of the set of binary codes, using other given code values, wherein the codes may assume values only from the set of binary codes, and the last code may assume values only from the additional code values. The formed digital bit stream is converted to a signal stream. Upon receiving each of the consecutively received bits, binary codes and codes from the set of additional code values are identified, the ordered set of bits and the corresponding code sequence are identified and the primary digital stream of binary bits is restored uniquely without loss of information.

EFFECT: increasing information capacity without loss of information.

1 tbl

FIELD: information technology.

SUBSTANCE: method for information transmission and reception between the first and the second transmitting-receiving sides, as per which initial information or its part of a specified volume, which is transmitted with the first side and presented with an orderly in-sequence numbered set of integral values, which corresponds to it, is converted by the proposed method with conversion elements known only at the first side, and transmitted to the second side. It is received at the second side, converted by means of the proposed method with conversion elements known only at the second side, and transmitted to the first side. It is received at the first side, again converted by means of the proposed method with the conversion elements known only at the first side and transmitted again to the second side. It is received at the second side, converted by means of the proposed method with the conversion elements known only at the second side, and initial information or some part of the specified volume is restored with the proposed method.

EFFECT: improving efficiency of information transmission and reception systems between the first and the second transmitting-receiving sides.

FIELD: physics, video.

SUBSTANCE: group of inventions relates to data processing for performing video compression. The method includes encoding a plurality of video frames or portions thereof according to a first encoding format, the first encoding format being optimised for transmission to a client device over a current communication channel; transmitting said plurality and concurrently encoding same according to a second encoding format, the second encoding format having a relatively higher-quality compressed video and/or a lower compression ratio than the first encoding format; storing the first plurality of video frames encoded in the second encoding format on a storage device and providing access and playback of the first plurality of video frames encoded in the second encoding format on the client device.

EFFECT: improved capacity to manipulate audio and video media and shorter loading time.

18 cl, 57 dwg

FIELD: radio engineering, communication.

SUBSTANCE: method of transmitting and receiving information from an information source to a consumer in a digital communication system, in which each of series-arranged symbols in a message is transmitted to a one-to-one corresponding ordered set of bits with a given number and sequence of codes of said bits; in addition to values of a set of binary codes 0 and 1, other given code values are input, wherein codes from the first to the second last in the sequence of codes, corresponding to the ordered set of bits, can assume values only from the set of binary codes 0 and 1, and the last code can assume values only from the additionally input code values. Upon reception, each of the successively received bits is identified with binary codes 0, 1 and codes from the set of additionally input code values; the ordered set of bits and the corresponding sequence of codes situated between the previous received bit with a code from the set of additionally input code values and the next received bit with a code from the set of additionally input code values, including it, is identified and a message symbol is uniquely restored from said sequence of codes.

EFFECT: increasing information capacity without losing information.

1 tbl

FIELD: radio engineering, communication.

SUBSTANCE: system for transmitting and receiving information from an information source to a consumer in a digital communication system, in which the receiving side includes a symbol converting unit which is configured to transmit each of series-arranged symbols in a message to a one-to-one corresponding ordered set of bits with a given number and sequence of codes of said bits; inputting, in addition to values of a set of binary codes 0 and 1, other given code values, wherein codes from the first to the second last in the sequence of codes, corresponding to the ordered set of bits, can assume values only from the set of binary codes 0 and 1, and the last code can assume values only from the additionally input code values. At the receiving side, the system includes a unit for restoring symbols of the primary alphabet, which is configured to identify each of the successively received bits with binary codes 0, 1 and codes from the set of additionally input code values and identify the ordered set of bits and the corresponding sequence of codes, and uniquely restore a symbol of the message from said sequence of codes.

EFFECT: increasing information capacity without losing information.

1 dwg

FIELD: radio engineering, communication.

SUBSTANCE: system for transmitting and receiving information optionally from multiple information sources to consumers via digital communication, in which at the transmitting side a unit for converting a digital stream of binary bits is capable of selecting successive groups with a given number p of bits per group, identifying for each group a corresponding sequence of binary codes and one-to-one corresponding conversion of each group of bits into an ordered set of bits with a corresponding sequence of codes of said bits, inputting, in addition to values of a set of binary codes 0 and 1, other given code values, wherein codes from the first to the second last in the sequence of codes corresponding to the ordered set of bits can assume values only from the set of binary codes 0 and 1, and the last code can assume values only from M additionally input code values. At the receiving side, the system includes a unit for restoring the primary digital stream of binary bits, capable of identifying the ordered set of bits and the corresponding sequence of codes, and uniquely restoring the primary digital stream of binary bits without loss of information.

EFFECT: increasing information capacity without losing information.

1 tbl, 1 dwg

FIELD: information technology.

SUBSTANCE: disclosed is a method of processing a digital file of the image, video and/or audio type which comprises a phase for putting into line per colour layer and/or per audio channel of digital data of any audio, image and video file, a compression phase using an algorithm in which each compressed value VCn of position N is obtained by subtracting from the value Vn of same position N of the original file, a predetermined number of successive compressed values (VCn-1, VCn-2,…) calculated previously, and a restoration phase using an algorithm in which each restored value VDn of position N is obtained by adding to the value VCn,of the same position of the compressed file, a predetermined number of successive compressed values (VCn-1, VCn-2,…).

EFFECT: providing high quality and less compression of the digital file.

7 cl, 7 dwg

FIELD: physics, acoustics.

SUBSTANCE: invention relates to audio encoding/decoding techniques and particularly to a method of encoding/decoding audio and a lattice-type vector quantising system. The method involves: dividing frequency domain coefficients of an audio signal for which a modified discrete cosine transform (MDCT) has been performed into a plurality of coding sub-bands, and quantising and coding an amplitude envelope value of each coding sub-band to obtain coded bits of amplitude envelopes; performing bit allocation on each coding sub-band, and performing normalisation, quantisation and coding respectively on vectors in a low bit coding sub-band with pyramid lattice vector quantisation and on vectors in a high bit coding sub-band with sphere lattice vector quantisation to obtain coded bits of the frequency domain coefficients; multiplexing and packing the coded bits of the amplitude envelope and the coded bits of the frequency domain coefficients of each coding sub-band, then sending them to a decoding side.

EFFECT: providing good quality of encoding a voice information source through combined lattice vector quantisation using pyramid lattice vector quantisation and sphere lattice vector quantisation.

25 cl, 7 tbl, 9 dwg
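
The first step of this abstract, dividing MDCT coefficients into coding sub-bands and quantising an amplitude envelope value per sub-band, might be sketched as follows. The sketch assumes uniform sub-band widths and an envelope quantised as the rounded base-2 logarithm of the sub-band RMS; both choices, and all function names, are illustrative assumptions. The bit allocation and the pyramid and sphere lattice vector quantisers are not shown.

```python
import math


def split_subbands(coeffs, width):
    """Divide frequency-domain coefficients into sub-bands of equal width (assumed)."""
    return [coeffs[i:i + width] for i in range(0, len(coeffs), width)]


def envelope_exponent(band, floor=1e-12):
    """Quantise the amplitude envelope as round(log2(RMS)); an assumed quantiser."""
    rms = math.sqrt(sum(c * c for c in band) / len(band))
    return round(math.log2(max(rms, floor)))  # floor avoids log2(0) for silent bands


def normalise(band, exponent):
    """Scale a sub-band by its quantised envelope before vector quantisation."""
    scale = 2.0 ** exponent
    return [c / scale for c in band]


mdct = [4.0, -4.0, 4.0, -4.0, 0.5, 0.5, -0.5, 0.5]   # made-up coefficients
bands = split_subbands(mdct, 4)
exponents = [envelope_exponent(b) for b in bands]
normalised = [normalise(b, e) for b, e in zip(bands, exponents)]
```

After normalisation every sub-band has roughly unit amplitude, which is the usual reason for coding the envelope separately from the coefficient vectors.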

FIELD: physics, computer engineering.

SUBSTANCE: group of inventions relates to means of encoding and decoding a signal. The encoder comprises a first layer encoding section which encodes an input signal in a low-frequency range below a predetermined frequency. First encoded information is generated. The first encoded information is decoded to generate a decoded signal. The input signal is broken down in a high-frequency range above a predetermined frequency into a plurality of frequency subbands. A spectrum component is partially selected in each frequency subband. An amplitude adjustment parameter is calculated, which is used to adjust the amplitude of the selected spectrum component in order to generate second encoding information.

EFFECT: high efficiency of encoding spectral data of a high-frequency part and high quality of the decoded signal.

14 cl, 15 dwg

FIELD: physics, video.

SUBSTANCE: invention relates to a method and an apparatus for improving audio and video encoding. A signal is processed using DCTIV for each block of samples of said signal (x(k)), wherein integer transform is carried out using lifting steps which represent sub-steps of said DCTIV. Integer transform of said sample blocks using lifting steps and adaptive noise shaping is performed for at least some of said lifting steps, said transform providing corresponding blocks of transform coefficients and noise shaping being performed such that rounding noise from low-level magnitude transform coefficients in a current one of said transformed blocks is decreased whereas rounding noise from high-level magnitude transform coefficients in said current transformed block is increased, and wherein filter coefficients (h(k)) of a corresponding noise shaping filter are derived from said audio or video signal samples on a frame-by-frame basis.

EFFECT: optimising rounding error noise distribution in an integer-reversible transform (DCTIV).

26 cl, 13 dwg

FIELD: physics, acoustics.

SUBSTANCE: invention relates to means of generating an output spatial multichannel audio signal based on an input audio signal. The input audio signal is decomposed based on an input parameter to obtain a first signal component and a second signal component that are different from each other. The first signal component is rendered to obtain a first signal representation with a first semantic property and the second signal component is rendered to obtain a second signal representation with a second semantic property different from the first semantic property. The first and second signal representations are processed to obtain an output spatial multichannel audio signal.

EFFECT: low computational costs of the decoding/rendering process.

5 cl, 8 dwg

FIELD: physics, acoustics.

SUBSTANCE: invention relates to audio signal transmission and is intended for processing an audio signal by varying the phase of spectral values of the audio signal, realised in a bandwidth expansion scheme. The audio signal processing method and device comprise a window processing module for generating a plurality of successive sampling units, a plurality of successive units including at least one added audio sampling unit, an added unit having added values and audio signal values, a first converter for converting the added unit into a spectral representation having spectral values, a phase modifier for varying the phase of spectral values and obtaining a modified spectral representation and a second converter for converting the modified spectral representation into a time domain varying audio signal.

EFFECT: high sound quality.

20 cl, 15 dwg
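
The processing chain of this abstract (padded block, conversion to a spectral representation, phase modification, conversion back to the time domain) can be sketched as follows. The sketch assumes that the added values are zeros placed symmetrically around the audio values and that the phase is varied by multiplying it by a constant; `pad`, `phase_factor` and the function names are hypothetical. A naive DFT is used only to keep the example self-contained.

```python
import cmath
import math


def dft(x, inverse=False):
    """Naive O(n^2) discrete Fourier transform (inverse includes the 1/n factor)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = sum(x[t] * cmath.exp(sign * 2j * math.pi * k * t / n) for t in range(n))
        out.append(acc / n if inverse else acc)
    return out


def process_block(samples, pad, phase_factor):
    """Pad a block, scale the phase of each spectral value, return time samples."""
    # Assumption: padding values are zeros, added before and after the audio values.
    padded = [0.0] * pad + list(samples) + [0.0] * pad
    spectrum = dft(padded)
    # Keep each magnitude and multiply each phase by phase_factor.
    modified = [cmath.rect(abs(s), cmath.phase(s) * phase_factor) for s in spectrum]
    # Discard any residual imaginary part of the inverse transform.
    return [v.real for v in dft(modified, inverse=True)]


block = [1.0, 0.5, -0.5, -1.0]
unchanged = process_block(block, pad=2, phase_factor=1.0)
shifted = process_block(block, pad=2, phase_factor=2.0)
```

With `phase_factor=1.0` the output reproduces the padded block, which is a convenient sanity check; other factors redistribute signal energy in time, and the padded region gives that energy room to land inside the block.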

FIELD: physics, acoustics.

SUBSTANCE: invention relates to audio encoding technologies. An audio encoder for encoding an audio signal has a first coding channel for encoding an audio signal using a first coding algorithm. The first coding channel has a first time/frequency converter for converting an input signal into a spectral domain. The audio encoder also has a second coding channel for encoding an audio signal using a second coding algorithm. The first coding algorithm differs from the second coding algorithm. The second coding channel has a domain converter for converting an input signal from an input domain into an output domain audio signal.

EFFECT: improved encoding/decoding of audio signals in low bitrate circuits.

21 cl, 43 dwg, 10 tbl

FIELD: physics, computation hardware.

SUBSTANCE: invention relates to audio signal processing. The proposed method comprises filtering an audio signal to divide it into two frequency bands and generating multiple sub-band signals for the signal of each frequency band. For the signal in one frequency band, the sub-band signals are generated by conversion from the time domain to the frequency domain. For the other frequency band, the sub-band signals are generated using a bank of sub-band filters. The proposed device comprises at least one processor and at least one memory device with computer program code, the memory device and the computer program code being configured to cause the at least one processor to carry out the method.

EFFECT: higher accuracy of audio signals due to improved signal source SNR.

31 cl, 8 dwg

FIELD: physics, acoustics.

SUBSTANCE: audio encoder (100) for encoding audio samples includes a first encoder (110), which introduces time-domain aliasing, for encoding audio samples in a first encoding region according to a first windowing rule that uses a start window and a stop window. The audio encoder (100) further includes a second encoder (120) for encoding samples in a second encoding region, which processes a number of audio samples set by the frame format, including a series of audio samples of an encoding mode stabilisation interval, and which applies a different, second encoding rule, wherein a frame of the second encoder (120) is an encoded representation of time-consecutive audio samples, the number of which is set by the frame format. The audio encoder (100) also includes a controller (130) which switches from the first encoder (110) to the second encoder (120) according to the characteristics of the audio samples and either corrects the second windowing rule when switching from the first encoder (110) to the second encoder (120), or modifies the start window or stop window of the first encoder (110) while keeping the second windowing rule unchanged.

EFFECT: improved switching between multiple working regions when encoding sound in both the time and frequency domains.

34 cl, 28 dwg

FIELD: physics.

SUBSTANCE: input spectrum is broken into a plurality of subbands. A representative value is calculated for each subband using an arithmetic mean and a geometric mean. Nonlinear conversion is performed with respect to each representative value. The nonlinear conversion characteristic is amplified as the value increases. The representative value, which was subjected to nonlinear conversion for each subband, is smoothed in the frequency domain.

EFFECT: faster spectral smoothing and higher quality of the output audio signal.

11 cl, 15 dwg

FIELD: information technology.

SUBSTANCE: audio signal decoder designed to provide a decoded representation of an audio signal based on an encoded representation of the audio signal, which includes information on evolution of a temporary deformation loop, includes a temporary deformation loop computer, a device for changing the scale of the temporary deformation loop data and a deformation decoder. The temporary deformation loop computer is designed to generate temporary deformation loop data through multiple restarting from a predefined starting value of the temporary deformation loop based on information on evolution of the temporary deformation loop, which describes time evolution of the temporary deformation loop. The device for changing the scale of temporary deformation loop data is designed to change the scale of at least part of temporary deformation loop data to avoid, reduce or eliminate non-uniformity during restart in a scaled version of the temporary deformation loop. The deformation decoder is designed to provide a decoded representation of an audio signal based on an encoded representation of the audio signal and by using the scaled version of the temporary deformation loop.

EFFECT: supporting low bit rate with reliable reconstruction of the required temporary deformation information at the decoder side.

14 cl, 40 dwg

FIELD: information technology.

SUBSTANCE: in the encoder, spectrum residue form vector candidates are stored in a spectrum residue form codebook (305), spectrum residue gain candidates are stored in a spectrum residue gain codebook (307), and the spectrum residue form vector and the spectrum residue gain are successively output from the candidates in accordance with an instruction from a search unit (306). A multiplier (308) multiplies the spectrum residue form vector candidate by the spectrum residue gain candidate and sends the result to a filtration unit (303). Using the internal status of the filter, the filtration unit (303) filters the fundamental tone given by the filter status setting unit (302), lag T which is output by a lag setting unit (304), and the spectrum residue form vector and the controlled gain.

EFFECT: obtaining a high quality decoded signal with scalable coding of the initial signal in first and second layers, even if the unit of the second or higher layer encodes at a low bit rate.

9 cl, 21 dwg

FIELD: technologies for encoding audio signals.

SUBSTANCE: a method for generating a high-frequency reconstructed version of a low-band input signal by high-frequency spectral reconstruction using a digital filter bank system is based on splitting the low-band input signal with an analysis filter bank to produce complex sub-band signals in channels, obtaining a series of consecutive complex sub-band signals in the channels of the reconstruction range, adjusting the envelope to produce a predetermined spectral envelope in the reconstruction range, and combining said series of signals with a synthesis filter bank.

EFFECT: higher efficiency.

4 cl, 5 dwg
