Spatio-temporal prediction for bidirectionally predicted (B) pictures and motion vector prediction for multi-picture reference motion compensation

FIELD: video coding, in particular methods and devices that provide improved coding and/or prediction techniques for various types of video data.

SUBSTANCE: a method is claimed for use in encoding video data in a video encoder. The method comprises making a spatial/temporal motion vector prediction decision for at least one direct mode macroblock in a B picture, and signaling that decision in a header that carries header information for a set of macroblocks in the B picture. Signaling the decision in the header conveys the spatial/temporal motion vector prediction decision for the at least one direct mode macroblock in the B picture to the video decoder.

EFFECT: an improved coding method capable of supporting the latest models and modes of use of bidirectionally predicted (B) pictures in a video sequence, using spatial prediction or temporal distance.

2 cl, 17 dwg

 

Related patent applications

This non-provisional U.S. patent application claims priority to, and incorporates by reference the entire disclosure of, the co-pending U.S. provisional patent application No. 60/385965, filed 3 June 2002, entitled "Spatio-temporal prediction for bidirectionally predicted (B) pictures and motion vector prediction for multi-picture reference motion compensation".

The technical field

The present invention relates to video coding and, more specifically, to methods and devices that provide improved coding and/or prediction techniques for different types of video data.

Prior art

The drive for higher coding efficiency in video coding has led the Joint Video Team (JVT), a standards body, to adopt more refined and more complex models and modes for describing the motion information of a given macroblock. These models and modes aim to take better advantage of the temporal redundancy that may exist within a video sequence. See, for example: ITU-T Video Coding Expert Group (VCEG), "JVT Coding - (ITU-T H.26L & ISO/IEC JTC1 Standard - Working Draft Number 2 (WD-2)", ITU-T JVT-B118, Mar. 2002; and/or: Heiko Schwarz and Thomas Wiegand, "Tree-structured macroblock partition". Doc. VCEG-N17, Dec. 2001.

There is a continuing need for improved methods and devices that can support the latest models and modes and that can also introduce new models and modes to take advantage of improved coding techniques.

The invention

The above and other needs are met, for example, by a method for use in encoding video data within a sequence of video frames. The method includes identifying at least a portion of at least one video frame as a bidirectionally predicted (B) picture, and selectively encoding the B picture using at least spatial prediction to encode at least one motion parameter associated with the B picture. In some illustrative embodiments, the B picture may include a block, a macroblock, a subblock, a slice, or another portion of the video frame. For example, when a macroblock portion is used, the method produces a direct macroblock.

In some further illustrative embodiments, the method also includes applying linear or non-linear motion vector prediction to the B picture based on at least one reference picture that is at least another portion of the video frame. For example, in some embodiments the method uses median motion vector prediction to produce at least one motion vector.

In other illustrative embodiments, in addition to spatial prediction, the method may also process at least a portion of at least one video frame so as to selectively encode the B picture using temporal prediction to encode at least one temporal motion parameter associated with the B picture. In some cases the temporal prediction includes bidirectional temporal prediction, for example based at least in part on a predicted (P) frame.

In some other embodiments, the method also selectively determines an applicable scaling for the temporal motion parameter based at least in part on the temporal distance between the predictor frame and the frame that includes the B picture. In some embodiments, the temporal distance information is encoded, for example, in a header or similar data structure associated with the encoded B picture.

Brief description of drawings

The invention is further explained by a description of specific embodiments with reference to the accompanying drawings, in which like components and features are denoted by like reference numerals.

Fig. 1 is a block diagram showing an illustrative computing environment suitable for use with some embodiments of the present invention,

Fig. 2 is a block diagram showing an illustrative representative device suitable for use with some embodiments of the present invention,

Fig. 3 is an illustrative diagram showing spatial prediction associated with portions of a picture according to some illustrative embodiments of the present invention,

Fig. 4 is an illustrative diagram showing direct prediction in B picture coding according to some illustrative embodiments of the present invention,

Fig. 5 is an illustrative diagram showing what happens when there is a scene change or when the collocated block is intra-coded, according to some illustrative embodiments of the present invention,

Fig. 6 is an illustrative diagram showing the handling of a collocated intra-coded block in existing codecs (coder-decoders), where the motion is assumed to be zero, according to some illustrative embodiments of the present invention,

Fig. 7 is an illustrative diagram showing how direct mode is handled when the reference picture of the collocated block in the subsequent P picture is non-zero, according to some illustrative embodiments of the present invention,

Fig. 8 is an illustrative diagram showing an illustrative scheme in which MV_FW and MV_BW are derived from spatial prediction according to some illustrative embodiments of the present invention,

Fig. 9 is an illustrative diagram showing how spatial prediction solves the problem of scene changes and the like according to some illustrative embodiments of the present invention,

Fig. 10 is an illustrative diagram showing joint spatio-temporal prediction for direct mode in B picture coding according to some illustrative embodiments of the present invention,

Fig. 11 is an illustrative diagram showing motion vector prediction for a current block based on the reference picture information of the predictor macroblocks according to some illustrative embodiments of the present invention,

Fig. 12 is an illustrative diagram showing how more candidates can be used for direct mode prediction, especially if bidirectional prediction is used within the B picture, according to some illustrative embodiments of the present invention,

Fig. 13 is an illustrative diagram showing how a B picture may be restricted in its use of subsequent and previous reference pictures according to some illustrative embodiments of the present invention,

Fig. 14 is an illustrative diagram showing the projection of collocated motion vectors onto the current reference for temporal direct prediction according to some illustrative embodiments of the present invention,

Fig. 15 a-b are illustrative diagrams showing motion vector predictors for one MV in different configurations according to some illustrative embodiments of the present invention,

Fig. 16 a-b are illustrative diagrams showing motion vector predictors for one MV with 8x8 partitions in different configurations according to some illustrative embodiments of the present invention,

Fig. 17 a-b are illustrative diagrams showing motion vector predictors for one MV with additional predictors for 8x8 partitioning according to some illustrative embodiments of the present invention.

Detailed description of the preferred embodiments

Described below and illustrated in the accompanying drawings are several improvements for use with bidirectionally predicted (B) pictures within a video sequence. In some improvements, direct mode coding and/or motion vector prediction are enhanced using spatial prediction techniques. In other improvements, motion vector prediction is made more accurate by taking into account, for example, temporal distance and subblock information. These and other improvements presented here significantly improve the performance of any applicable video coding system or logic.

Although these and other illustrative methods and devices are described here, the techniques of the present invention are not limited to the examples described and shown in the drawings; they can clearly also be adapted to other similar existing and future video coding schemes.

Before these illustrative methods and devices are described, the following section presents suitable illustrative operating environments, for example in the form of a computing device and other types of devices/appliances.

Illustrative operating environment

The invention is described below with reference to the drawings, in which like reference numerals refer to like elements, as implemented in a suitable computing environment. Although not required, the invention will be described in the general context of computer-executable instructions, such as program modules, executed by a personal computer.

Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. Those skilled in the art will appreciate that the invention can be practiced with other computer system configurations, including handheld devices, multiprocessor systems, microprocessor-based or programmable consumer electronics, network PCs, minicomputers, mainframe computers, portable communication devices, etc.

The invention can also be implemented in practice in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in local or remote memory devices.

Fig. 1 illustrates an example of a suitable computing environment 120 in which the systems, devices and methods described below can be implemented. The illustrative computing environment 120 is only one example of a suitable computing environment and is not intended to suggest any limitation on the scope of use or functionality of the improved methods and systems described here. Nor should computing environment 120 be interpreted as having any dependency or requirement relating to any one or combination of the components illustrated in computing environment 120.

The improved methods and systems described here are operational with numerous other general-purpose or special-purpose computing system environments or configurations. Examples of well-known computing systems, environments and/or configurations that may be suitable include, but are not limited to, personal computers, server computers, "thin clients", "fat clients", handheld or laptop devices, multiprocessor systems, microprocessor-based systems, set-top boxes, programmable consumer electronics, network PCs, minicomputers, mainframe computers, distributed computing environments that include any of the above systems or devices, and the like.

As shown in Fig. 1, computing environment 120 includes a general-purpose computing device in the form of a computer 130. The components of computer 130 may include one or more processors or processing units 132, a system memory 134, and a bus 136 that couples various system components, including system memory 134, to processor 132.

Bus 136 represents one or more of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus architectures. By way of example, such architectures include the Industry Standard Architecture (ISA) bus, the Micro Channel Architecture (MCA) bus, the Enhanced ISA (EISA) bus, the Video Electronics Standards Association (VESA) local bus, and the Peripheral Component Interconnect (PCI) bus, also known as the Mezzanine bus.

The computer 130 typically includes a variety of machine-readable media. Such media can be any available media that can be accessed by the computer 130, and they include both volatile and nonvolatile media, removable and non removable media.

In Fig. 1, system memory 134 includes machine-readable media in the form of volatile memory, such as random access memory (RAM) 140, and/or in the form of non-volatile memory, such as read-only memory (ROM) 138. A basic input/output system (BIOS) 142, containing the basic routines that help to transfer information between elements within computer 130, such as during start-up, is stored in ROM 138. RAM 140 typically contains data and/or program modules that are immediately accessible to and/or presently being operated on by processor 132.

Computer 130 may also include other removable/non-removable, volatile/non-volatile computer storage media. For example, Fig. 1 illustrates a hard disk drive 144 for reading from and writing to a non-removable, non-volatile magnetic medium (not shown and typically called a "hard drive"), a magnetic disk drive 146 for reading from and writing to a removable, non-volatile magnetic disk 148 (e.g., a "floppy disk"), and an optical disk drive 150 for reading from or writing to a removable, non-volatile optical disk 152 such as a CD-ROM/R/RW, DVD-ROM/R/RW/+R/RAM or other optical media. Hard disk drive 144, magnetic disk drive 146 and optical disk drive 150 are each connected to bus 136 by one or more interfaces 154.

The drives and their associated machine-readable media provide non-volatile storage of machine-readable instructions, data structures, program modules and other data for computer 130. Although the illustrative environment described here employs a hard disk, a removable magnetic disk 148 and a removable optical disk 152, those skilled in the art will appreciate that other types of machine-readable media that can store data accessible by a computer, such as magnetic cassettes, flash memory cards, digital video disks, RAMs, ROMs and the like, may also be used in the illustrative operating environment.

A number of program modules may be stored on the hard disk, magnetic disk 148, optical disk 152, ROM 138 or RAM 140, including an operating system 158, one or more application programs 160, other program modules 162, and program data 164.

The improved methods and systems described here may be implemented within operating system 158, one or more application programs 160, other program modules 162, and/or program data 164.

A user may enter commands and information into computer 130 through input devices such as a keyboard 166 and a pointing device 168 (such as a "mouse"). Other input devices (not shown) may include a microphone, joystick, game pad, satellite dish, serial port, scanner, camera, etc. These and other input devices are connected to processing unit 132 through a user input interface 170 that is coupled to bus 136, but may also be connected by other interface and bus structures, such as a parallel port, game port or universal serial bus (USB).

A monitor 172 or other type of display device is also connected to bus 136 via an interface, such as a video adapter 174. In addition to monitor 172, personal computers typically include other peripheral output devices (not shown), such as speakers and printers, which may be connected through an output peripheral interface 175.

The computer 130 may operate in a networked environment using logical connections to one or more remote computers, such as remote computer 182. The remote computer 182 may include many or all of the elements and features described herein relative to computer 130.

The logical connections shown in Fig. 1 are a local area network (LAN) 177 and a general wide area network (WAN) 179. Such networking environments are commonplace in offices, enterprise-wide computer networks, intranets and the Internet.

When used in a LAN networking environment, computer 130 is connected to LAN 177 through a network interface or adapter 186. When used in a WAN networking environment, the computer typically includes a modem 178 or other means for establishing communications over WAN 179. Modem 178, which may be internal or external, can be connected to system bus 136 through user input interface 170 or another suitable mechanism.

Fig. 1 depicts a specific implementation of a WAN via the Internet. Here, computer 130 uses modem 178 to establish communications with at least one remote computer 182 via the Internet.

In a networked environment, program modules depicted relative to computer 130, or portions thereof, may be stored in a remote memory storage device. Thus, for example, as shown in Fig. 1, remote application programs 189 may reside on a memory device of remote computer 182. It will be appreciated that the network connections shown and described are illustrative and that other means of establishing a communications link between the computers may be used.

Fig. 2 is a block diagram of another illustrative device 200 that can also benefit from the methods and devices described here. Device 200 represents any one or more devices or appliances operatively configured to process video and/or related types of data in accordance with all or part of the methods and devices described here and their equivalents. Thus, device 200 may take the form of a computing device as in Fig. 1, or some other form, such as a wireless device, a portable communication device, a personal digital assistant, a video player, a television, a DVD (digital versatile disk) player, a CD player, a karaoke machine, a kiosk, a digital video projector, a flat panel video display, a set-top box, a video game machine, etc. In this example, device 200 includes logic 202 configured to process video data, a video data source 204 configured to provide video data to logic 202, and at least one display module 206 capable of displaying at least a portion of the video data for viewing by a user. Logic 202 represents hardware, firmware, software and/or any combination thereof. In some embodiments, for example, logic 202 includes a compressor/decompressor (codec) or the like. Video data source 204 represents any mechanism that can provide, communicate, output and/or at least momentarily store video data suitable for processing by logic 202. The video data source is illustratively shown as being within and/or outside device 200. Display module 206 represents any mechanism that a user can view directly or indirectly to see the visual results of the video data presented on it. Additionally, in some embodiments device 200 may also include some means for reproducing or otherwise handling audio data associated with the video data; an audio reproduction module 208 is therefore shown.

With the examples of Figs. 1 and 2, and others like them, in mind, the following sections present illustrative methods and devices that can be at least partially practiced using such environments and devices.

Encoding bidirectionally predicted (B) pictures and motion vector prediction

This section describes several illustrative improvements that can be implemented to encode bidirectionally predicted (B) pictures and to predict motion vectors within a video coding system or the like. The illustrative methods and devices can be applied to motion vector prediction and to enhancements in the design of direct mode for B pictures. Such methods and devices are particularly suitable for multi-picture reference codecs, such as JVT, and can achieve considerable coding gains, especially for panning sequences or scene changes.

Bidirectionally predicted (B) pictures are an important part of most video coding standards and systems because they tend to increase the coding efficiency of such systems, for example compared with using only predicted (P) pictures. This improvement in coding efficiency is achieved mainly through bidirectional motion compensation, which can significantly improve motion-compensated prediction and thus allows the residual information to be encoded at a significantly reduced size. Furthermore, the introduction of direct prediction mode for macroblocks/blocks within such pictures can increase efficiency considerably further (for example, by more than 10-20%), since no motion information is encoded. This can be accomplished, for example, by allowing both the forward and the backward motion information to be predicted directly from the motion vectors used in the corresponding macroblock of the subsequent reference picture.

For example, Fig. 4 illustrates direct prediction in encoding a B picture at time t+1 based on P frames at times t and t+2 and the applicable motion vectors (MVs). Here it is assumed that an object in the picture moves at constant speed. This makes it possible to predict the current position inside the B picture without transmitting any motion vectors. The direct mode motion vectors (MV_FW, MV_BW), as a function of the motion vector MV of the collocated macroblock in the first subsequent reference P picture, are calculated by the formula

$$\overrightarrow{MV}_{FW} = \frac{TR_B}{TR_D}\,\overrightarrow{MV}, \qquad \overrightarrow{MV}_{BW} = \frac{TR_B - TR_D}{TR_D}\,\overrightarrow{MV},$$

where TR_B is the temporal distance between the current B picture and the reference picture pointed to by the forward motion vector MV of the collocated macroblock, and TR_D is the temporal distance between the subsequent reference picture and the reference picture pointed to by the forward motion vector MV of the collocated macroblock.
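The scaling described above can be sketched in a few lines of Python. This is a minimal illustration, not the codec's implementation: it assumes the linear scaling MV_FW = TR_B * MV / TR_D and MV_BW = (TR_B - TR_D) * MV / TR_D, and the function name `temporal_direct_mvs` and the use of exact fractions (instead of a codec-specific integer rounding rule) are choices made only for this example.

```python
from fractions import Fraction


def temporal_direct_mvs(mv, tr_b, tr_d):
    """Derive (MV_FW, MV_BW) for a direct-mode block from the collocated MV.

    mv   : (x, y) motion vector of the collocated macroblock
    tr_b : temporal distance, current B picture to the reference of mv
    tr_d : temporal distance, subsequent reference to the reference of mv
    """
    if tr_d == 0:
        raise ValueError("TR_D must be non-zero")
    # Exact rationals are used so the sketch does not commit to a rounding rule.
    fw = tuple(Fraction(tr_b * c, tr_d) for c in mv)
    bw = tuple(Fraction((tr_b - tr_d) * c, tr_d) for c in mv)
    return fw, bw


# Layout of Fig. 4: P pictures at t and t+2, B picture at t+1,
# which gives TR_B = 1 and TR_D = 2.
mv_fw, mv_bw = temporal_direct_mvs((8, -4), tr_b=1, tr_d=2)
```

With the B picture halfway between the two references, the forward vector is half of the collocated vector and the backward vector is its negation, which is what the constant-speed assumption predicts.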

Unfortunately, there are cases in which the existing direct mode does not provide an adequate solution and thus does not exploit the properties of this mode efficiently. In particular, existing designs of this mode usually force the motion parameters of a direct macroblock to zero when the collocated macroblock in the subsequent P picture is intra-coded. In essence this means that the B picture macroblock will be encoded as the average of the two collocated macroblocks in the first subsequent and previous P references. This immediately raises the following concern: if a macroblock is intra-coded, how can one know how strongly it is related to the collocated macroblock of its reference picture? In some situations the relationship may be weak or absent altogether. The coding efficiency of direct mode may therefore be reduced. An extreme case is a scene change, as illustrated in Fig. 5. Fig. 5 illustrates what happens when a scene change occurs in the video sequence and/or when the collocated block is intra-coded. In this example there is obviously no relationship between the two reference pictures given the scene change, so bidirectional prediction would provide little if any benefit, and direct mode could be useless. Unfortunately, known implementations of direct mode restrict it to always performing bidirectional prediction of a macroblock.

Fig. 7 depicts an illustrative diagram showing how, in accordance with some illustrative embodiments of the present invention, direct mode is handled when the reference picture of the collocated block in the subsequent P picture is non-zero.

An additional problem with direct mode macroblocks arises when multi-picture reference motion compensation is used. Until recently, for example, the JVT standard provided the timing distance information (TR_B and TR_D), thus allowing the parameters to be scaled properly. Recently this was changed in a new revision of the codec (see, for example, Joint Video Team (JVT) of ISO/IEC MPEG and ITU-T VCEG, "Joint Committee Draft (CD) of Joint Video Specification (ITU-T Rec. H.264 ISO/IEC 14496-10 AVC)", ITU-T JVT-C167, May, 2002, which is incorporated into this description by reference). In the new revision, the motion vector parameters of the subsequent P picture are to be scaled equally for direct mode prediction, without taking the reference picture information into account. This could lead to significant performance degradation of direct mode, since the constant-motion assumption no longer holds.

However, even if the temporal distance parameters were available, it is not always certain that direct mode as defined previously is the most appropriate solution. In particular, for B pictures that are closer to the first forward reference picture, the correlation with that picture could be much stronger than with the subsequent reference picture. An extreme example would be a sequence in which scene A changes to scene B and then returns to scene A (as may happen, for example, in a news bulletin). All of the above could significantly impair B picture coding performance, since direct mode will not be exploited effectively within the encoding process.

With these and other concerns in mind, in contrast to previous definitions of direct mode, which used only temporal prediction, some aspects of the present invention introduce a new type of macroblock for which both temporal prediction and spatial prediction are considered. The type(s) of prediction used may depend, for example, on the reference picture information of the first subsequent reference P picture.

According to some other aspects of the present invention, motion vector prediction for both P and B pictures can additionally be improved considerably when multiple picture references are used, by taking temporal distances into account where available.

These improvements are implemented in some illustrative methods and devices, as described below. The methods and devices can reduce the bit rate while achieving similar or better quality.

Direct mode improvements

In most known video coding systems, direct mode is designed as a bidirectional prediction scheme in which the motion parameters are always predicted temporally from the motion parameters in the subsequent P pictures. This section provides an improved direct mode technique in which spatial information may also or alternatively be considered for such predictions.

One or more of the following illustrative methods may be implemented as needed, for example depending on the complexity and/or specifications of the system.

One method is to perform spatial prediction of the direct mode motion vector parameters without considering temporal prediction. Spatial prediction can be accomplished, for example, using existing motion vector prediction techniques used for motion vector coding (such as median prediction). If multiple picture references are used, the reference picture of the adjacent blocks may also be considered (although there is no such restriction, and the same reference, e.g. 0, could always be used).

The motion parameters and reference pictures could be predicted as follows, with reference to Fig. 3, which illustrates spatial prediction associated with portions A-E (e.g., macroblocks, slices, etc.) that are assumed to be available and part of a picture. Here, E is predicted in general from A, B and C as Median(A, B, C). If C is actually outside the picture, D is used instead. If B, C and D are all outside the picture, only A is used; if A does not exist, it is replaced with (0,0). Those skilled in the art will appreciate that spatial prediction may also be performed at the subblock level.
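The neighbour rule above can be sketched as follows. This is a minimal illustration under stated assumptions: the median is taken per vector component, `None` marks a neighbour that is outside the picture or otherwise unavailable, and any remaining missing neighbour is treated as (0,0). Those three choices, and the function names, are assumptions of this example and are not fixed by the text.

```python
def median3(a, b, c):
    """Return the middle value of three numbers."""
    return sorted((a, b, c))[1]


def spatial_mv_prediction(a, b, c, d):
    """Predict the motion vector of E from neighbours A, B, C, D.

    Each neighbour is an (x, y) tuple, or None when it lies outside
    the picture or is otherwise unavailable.
    """
    # If C is outside the picture, D is used instead.
    if c is None:
        c = d
    # If B, C and D are all outside the picture, only A is used;
    # a missing A is replaced with (0, 0).
    if b is None and c is None:
        return a if a is not None else (0, 0)
    # Assumption: any other missing neighbour contributes (0, 0).
    a, b, c = [(0, 0) if v is None else v for v in (a, b, c)]
    # Assumption: the median is applied to each component separately.
    return (median3(a[0], b[0], c[0]), median3(a[1], b[1], c[1]))


mv_e = spatial_mv_prediction((1, 2), (3, 0), None, (5, 7))
```

In the usage line C is unavailable, so D takes its place and the prediction is the component-wise median of (1, 2), (3, 0) and (5, 7).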

Generally speaking, spatial prediction can be viewed as a linear or non-linear function of all available motion information calculated within a picture or a group of macroblocks/blocks within the same picture.

Various methods can be arranged to predict the reference picture for direct mode. For example, one method may be to select the minimum reference picture among the predictions. In another method, the median reference picture may be selected. In some methods, a choice may be made between the minimum reference picture and the median reference picture, for example if the minimum is zero. In other embodiments, a higher priority could be given to either the vertical or the horizontal predictor (A or B) because of their possibly stronger correlation with E.
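Two of the selection rules mentioned above, minimum and median, can be sketched as below. The function name `predict_reference`, the `rule` parameter, the fallback to reference 0 when no neighbour is available, and the choice of the upper middle element when only two references exist are all assumptions of this example.

```python
def predict_reference(neighbour_refs, rule="min"):
    """Pick a reference picture index for a direct-mode block.

    neighbour_refs : reference indices of the spatial predictors,
                     with None for unavailable neighbours
    rule           : "min" or "median"
    """
    refs = sorted(r for r in neighbour_refs if r is not None)
    if not refs:
        return 0  # assumption: default to reference 0 when nothing is known
    if rule == "min":
        return refs[0]
    if rule == "median":
        # For three predictors this is the true median; for two it is
        # the larger one (an arbitrary choice made for this sketch).
        return refs[len(refs) // 2]
    raise ValueError("unknown rule: " + rule)


ref_min = predict_reference([2, 0, 1], rule="min")
ref_med = predict_reference([2, 0, 1], rule="median")
```

A rule that prioritises the vertical or horizontal predictor could be added in the same way by inspecting A or B before falling back to one of these two rules.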

If one of the predictions does not exist (for example, all surrounding macroblocks are predicted from the same direction, FW or BW, or are intra-coded), then only the existing prediction is used (single-direction prediction), or the missing one could be predicted from the one available. For example, if forward prediction is available, then:

$$\overrightarrow{MV}_{BW} = \frac{TR_B - TR_D}{TR_B}\,\overrightarrow{MV}_{FW}$$

Temporal prediction is used for macroblocks if the subsequent P reference is not intra-coded, as in existing codecs. In Fig. 8, MV_FW and MV_BW are derived from spatial prediction (median MV of the surrounding macroblocks). If either prediction is unavailable (for example, there are no predictors), a single direction is used. If the subsequent P reference is intra-coded, spatial prediction can be used instead, as described above. Assuming no restrictions exist, if one of the predictions is unavailable, direct mode becomes a single-direction prediction mode.
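The fallback logic of this paragraph can be sketched as a small selection function. It is an illustration only: the name `direct_mode_prediction`, the tuple layout of its arguments and the mode labels it returns are hypothetical and chosen for this example.

```python
def direct_mode_prediction(collocated_is_intra, temporal, spatial):
    """Choose the direct-mode vectors for one macroblock.

    collocated_is_intra : True if the collocated block in the subsequent
                          P reference is intra-coded
    temporal, spatial   : (mv_fw, mv_bw) pairs; either entry may be None
                          when that direction has no prediction
    Returns (mode, mv_fw, mv_bw).
    """
    # Temporal prediction is kept unless the collocated block is intra-coded,
    # in which case the spatially derived vectors are used instead.
    mv_fw, mv_bw = spatial if collocated_is_intra else temporal
    if mv_fw is not None and mv_bw is not None:
        return ("bidirectional", mv_fw, mv_bw)
    # Only one direction is available: direct mode becomes single-direction.
    if mv_fw is not None:
        return ("forward", mv_fw, None)
    if mv_bw is not None:
        return ("backward", None, mv_bw)
    return ("unavailable", None, None)


# Scene change: collocated block is intra-coded and only forward
# neighbours exist, so the block is predicted from one direction.
result = direct_mode_prediction(True, ((0, 0), (0, 0)), ((2, 1), None))
```

The design point is that direct mode is no longer forced to average two references; when one side carries no usable prediction, it is simply dropped.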

This could provide significant coding benefits at scene changes, for example as shown in Fig. 9, and/or even when fading is present in the video sequence. As illustrated in Fig. 9, spatial prediction can be used to handle a scene change.

If temporal distance information is not available within the codec, temporal prediction will not be as effective in direct mode for blocks whose collocated reference P block has a non-zero reference picture. In that case spatial prediction can also be used, as described above. Alternatively, the scaling parameters can be estimated if one of the surrounding macroblocks uses the same reference picture as the collocated reference P block. Furthermore, special handling may be provided for the case of zero motion (or near-zero motion) with a non-zero reference. Here, regardless of temporal distance, the forward and backward motion vectors could always be taken as zero. The best solution, however, may be to always examine the reference picture information of the surrounding macroblocks and decide on that basis how direct mode should be handled in such a case.

More specifically, for example, given a non-zero reference, the following sub-cases can be considered:

Case A: temporal prediction is used if the motion vectors of the collocated P-block are equal to zero.

Case B: if all the surrounding macroblocks use reference images different from that of the collocated P reference, spatial prediction appears to be the better choice and temporal prediction is not used.

Case C: if the motion flow in the B-picture is quite different from that in the P reference picture, spatial prediction is used instead.

Case D: spatial or temporal prediction of direct mode macroblocks could be signaled in the image header. A preliminary analysis of the image could be performed to decide which should be used.

Case E: correction of the temporally predicted parameters based on spatial information (or vice versa). Thus, for example, if both turn out to have the same or approximately the same phase information, the spatial information could be a very good candidate for direct mode prediction. A correction could also be applied to the phase, thereby correcting the sub-pixel accuracy of the prediction.
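Cases A, B and D lend themselves to a simple decision function, sketched below in Python. The function name, the argument layout, the order in which the cases are tested and the default of falling back to temporal prediction are all assumptions made for illustration; Cases C and E are omitted because they depend on a motion-field analysis that the text does not specify.

```python
def choose_direct_prediction(collocated_mv, collocated_ref, neighbor_refs,
                             header_choice=None):
    """Pick 'temporal' or 'spatial' for a direct mode macroblock.

    Intended for the situation in which the collocated P block uses a
    non-zero reference. All names here are hypothetical.
    """
    # Case D: an explicit choice signaled in the image header takes priority.
    if header_choice in ("temporal", "spatial"):
        return header_choice
    # Case A: zero motion in the collocated P block -> temporal prediction.
    if collocated_mv == (0, 0):
        return "temporal"
    # Case B: no surrounding macroblock shares the collocated reference.
    if neighbor_refs and all(ref != collocated_ref for ref in neighbor_refs):
        return "spatial"
    # Assumed default when none of the cases above applies.
    return "temporal"
```

For example, a collocated block with motion vector (3, -1) and reference index 2, surrounded by macroblocks that use references 0 and 1 only, would be assigned spatial prediction under this sketch.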

Fig. 10 illustrates joint spatio-temporal prediction for direct mode in B-picture coding. Here, in this example, direct mode may be a 1- to 4-direction mode, depending on the available information. Instead of using bi-directional prediction for direct mode macroblocks, a multi-hypothesis extension of this mode can be implemented, and multiple predictions can be used.

Taking the above into account, direct mode macroblocks can be predicted using from one to four possible motion vectors, depending on the available information. This can be decided, for example, on the basis of the mode of the collocated macroblock in the P reference image and of the surrounding macroblocks in the current B-picture. In this case, if the spatial prediction differs too much from the temporal one, one of them can be selected as the only prediction in preference to the other. Since, as described above, spatial prediction may favor a reference image different from that of the temporal prediction, the same macroblock could be predicted from more than two reference images.

The JVT standard does not restrict the first subsequent reference to being a P-picture. Therefore, in this standard, that picture may be a B-picture, as illustrated in Fig. 12, or even a multi-hypothesis (MH) picture. This implies that more motion vectors are assigned per macroblock, and this property can be used to increase the efficiency of direct mode by exploiting the additional motion information more effectively.

In Fig. 12 the first subsequent reference image is a B-picture (pictures B8 and B9). This makes more candidates available for direct mode prediction, especially if the B-picture uses bi-directional prediction.

In particular, the following can be done:

a) If the collocated reference block in the first subsequent reference uses bi-directional prediction, the corresponding motion vectors (forward and backward) are used to calculate the motion vectors of the current block. Since the backward motion vector of the reference corresponds to a subsequent reference image, special care must be taken when estimating the current motion parameters. Consider, for example, Fig. 12, in which the first subsequent reference image is a B-picture (pictures B8 and B9); this makes more candidates available for direct mode prediction, especially if the B-picture uses bi-directional prediction. Thus, as illustrated, the backward motion vector of B8, MV_8bw, can be calculated as 2×MV_7bw because of the temporal distance between B8, B7 and P6. Similarly, for B9 the backward motion vector may be taken as MV_7bw, if these vectors refer to B7. They can also be restricted to refer to the first subsequent P-picture, in which case the motion vectors can be scaled accordingly. A similar conclusion can be drawn for the forward motion vectors. Multiple-picture references or intra macroblocks can be handled as described above.

b) If bi-directional prediction is used for the collocated block, then in this example four possible predictions can be estimated for one macroblock in direct mode by projecting and inverting the forward and backward motion vectors of the reference.

c) Selective projection and inversion can be used depending on the temporal distance. In this solution, the motion vectors of the reference image that are more reliable for prediction are selected. For example, considering the illustration in Fig. 12, note that B8 is much closer to P2 than to P6. This implies that the backward motion vector of B7 may not be a very reliable prediction. In this case, therefore, the direct mode motion vectors can be calculated from the forward prediction of B7 only. For B9, however, both motion vectors appear adequate for prediction and can therefore be used. Such decisions and information can also be determined and/or supported in the image header. Other conditions and rules can also be implemented. For example, the additional spatial confidence of a prediction and/or the phase of a motion vector can be considered. Note in particular that if the forward and backward motion vectors have no relationship, the backward motion vector could be too unreliable to use.
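Projection and inversion both amount to rescaling a motion vector by a ratio of temporal distances. The Python sketch below illustrates this under stated assumptions: the function name, the `(x, y)` tuple representation and the example vector are hypothetical, temporal distances are counted in picture intervals, and the 2:1 ratio simply mirrors the 2×MV_7bw example in item a).

```python
def project_mv(mv, td_source, td_target):
    """Rescale mv, which spans td_source picture intervals, to span td_target.

    A negative td_target inverts the direction of the vector.
    """
    return (mv[0] * td_target / td_source,
            mv[1] * td_target / td_source)


# Hypothetical backward MV of the collocated block in B7.
mv_b7_bw = (4, -2)

# Projection: twice the temporal distance gives twice the vector.
mv_b8_bw = project_mv(mv_b7_bw, td_source=1, td_target=2)

# Inversion: the same distance in the opposite direction.
mv_inverted = project_mv(mv_b7_bw, td_source=1, td_target=-1)
```

Selective projection, as in item c), would then consist of calling such a function only for the vectors judged reliable; the reliability test itself is left open by the text.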

Single-picture reference for B-pictures

A special case exists in which only one reference image is used for B-pictures (although a forward and a backward reference are normally both required), regardless of how many reference images are used for P-pictures. Observations of sequences encoded with the current JVT codec, for example, showed that when the single-picture reference case is compared with the multiple-picture reference case using B-pictures, the coding performance of P-pictures with multiple references is almost always superior to that with a single reference, but the same is not always true of B-pictures.

One reason for this observation is the overhead of the reference image information carried for each macroblock. Given that B-pictures rely more on motion information than P-pictures, the reference image overhead reduces the number of bits available for the residual information at a given bit rate and thereby reduces efficiency. A fairly simple and effective solution could be to select only one reference image for each direction of motion compensation, forward or backward, so that no reference image information needs to be transmitted.

The above case is considered with reference to Figs. 13 and 14. As illustrated in Fig. 13, B-pictures can be restricted to only one subsequent and one previous reference image. Calculating the direct mode motion vectors therefore requires projection of the motion vectors. The projection of the collocated MVs onto the current reference for temporal direct prediction is illustrated in Fig. 14 (note that it is possible that TD_D,0 > TD_D,1). Thus, in this example, the direct mode motion parameters are calculated by projecting motion vectors that refer to other reference images onto the two reference images, or by using spatial prediction, as in Fig. 13. Note that these options not only allow reduced complexity in B-picture coding but also tend to reduce memory requirements, because fewer B-pictures (for example, a maximum of two) need to be stored if B-pictures are allowed to reference B-pictures.

In some cases a reference image of the first subsequent reference image may no longer be available in the reference buffer. This can cause a problem in estimating direct mode macroblocks, and such cases require special handling. Obviously there is no such problem if a single-picture reference is used. However, if multiple-picture references are desired, possible solutions include projecting the motion vectors onto either the first forward reference image or the reference image closest to the unavailable one. Either solution would be viable, and spatial prediction could again be an alternative.
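The second option, projecting onto the reference closest to the unavailable one, can be sketched as follows. This is only an illustration under assumptions: pictures are identified by hypothetical display-time indices, the function name and arguments are invented for the example, and linear rescaling by the ratio of temporal distances is assumed.

```python
def retarget_mv(mv, owner_time, missing_ref_time, available_ref_times):
    """Redirect mv from a missing reference to the nearest available one.

    owner_time is the display time of the picture that owns mv;
    missing_ref_time is the display time of the reference it pointed to.
    Returns (chosen_reference_time, rescaled_mv).
    """
    # Pick the available reference closest in time to the missing one.
    nearest = min(available_ref_times,
                  key=lambda t: abs(t - missing_ref_time))
    # Rescale the vector by the ratio of new to old temporal distance.
    scale = (owner_time - nearest) / (owner_time - missing_ref_time)
    return nearest, (mv[0] * scale, mv[1] * scale)
```

For a vector (8, -4) owned by a picture at time 6 that pointed to a missing picture at time 2, with pictures at times 3 and 5 still in the buffer, the sketch selects the picture at time 3 and shortens the vector to three quarters of its length.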

Refinement of motion vector prediction for single- and multiple-picture reference motion compensation

Motion vector prediction for multiple-picture reference motion compensation can significantly affect the coding performance of both B- and P-pictures. Existing standards, such as JVT, do not always take into account the reference images of the macroblocks used in the prediction. The only case such standards consider is when exactly one of the predictor macroblocks uses the same reference; in that case only that predictor is used for motion prediction. The reference image is not taken into account if only one or all of the predictors use a different reference.

In this case, for example, according to certain further aspects of the present invention, the predictors can be scaled according to their temporal distance relative to the current reference. Fig. 11 illustrates motion vector prediction for the current block (C), taking into account the reference image information of the predictor macroblocks (Pr) and making the appropriate adjustments (e.g., scaling of the predictors).

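If the predictors A, B and C use reference images with temporal distances TR_A, TR_B and TR_C respectively, and the current reference image has a temporal distance equal to TR, then the median predictor is calculated as follows:

$$MV_{pred} = TR \times \mathrm{Median}\left(\frac{MV_A}{TR_A}, \frac{MV_B}{TR_B}, \frac{MV_C}{TR_C}\right)$$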

If integer arithmetic must be used, it may be simpler to move the multiplication inside the median function, which also increases accuracy. The division can also be replaced by a shift, but this reduces performance, and signed shifts may need special handling (-1 >> N = -1). It is therefore very important in such cases to have the temporal distance information available in order to perform the proper scaling. It could also be made available in the header if it cannot be predicted otherwise.
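An integer version with the multiplication moved inside the median can be sketched in Python as below. The function names and the data layout are assumptions made for illustration, positive temporal distances are assumed, and rounding toward zero is one possible choice that the text does not prescribe.

```python
def median3(a, b, c):
    """Median of three values."""
    return sorted((a, b, c))[1]


def scaled_median_predictor(predictors, tr):
    """Temporally scaled median of three predictors, in integer arithmetic.

    predictors: three ((mv_x, mv_y), tr_p) pairs, where tr_p > 0 is the
    temporal distance of that predictor's reference image.
    tr: temporal distance of the current reference image.
    """
    def scale(component, tr_p):
        # Multiply before dividing to keep as much precision as possible.
        num = component * tr
        # Round toward zero so that +v and -v scale symmetrically.
        # Plain floor division (like an arithmetic shift, where
        # -1 >> N == -1) would round negative values downward instead.
        return -((-num) // tr_p) if num < 0 else num // tr_p

    xs = [scale(mv[0], tr_p) for mv, tr_p in predictors]
    ys = [scale(mv[1], tr_p) for mv, tr_p in predictors]
    return (median3(*xs), median3(*ys))
```

With predictors A = (4, 2) at distance 1, B = (6, -3) at distance 3 and C = (8, 0) at distance 2, and a current distance of 2, the scaled candidates are (8, 4), (4, -2) and (8, 0), so the component-wise median is (8, 0).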

Motion vector prediction as described above is essentially median-biased, meaning that the median value from a set of predictors is selected for the prediction. If only one type of macroblock (e.g., 16x16) with one motion vector (MV) is used, the predictors can be defined, for example, as illustrated in Fig. 15. Here the MV predictors are shown for a single MV. In Fig. 15A the MB is not in the first row or the last column. In Fig. 15B the MB is in the last column. In Fig. 15C the MB is in the first row.

The JVT standard improves on this by also taking into account the case in which only one of the three predictors exists (i.e., the other macroblocks are intra-coded or, in multiple-picture prediction, use a different reference image). In this case only the existing, or same-reference, predictor is used for the prediction, and the others are not examined.
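The single-predictor rule can be sketched as follows. This is a simplified illustration, not the normative JVT procedure: the function name and data layout are hypothetical, and treating a missing predictor as a zero vector in the median is an assumption.

```python
def jvt_style_predictor(predictors, current_ref):
    """Median prediction with the single same-reference shortcut.

    predictors: list of three entries, each either (mv, ref) or None
    for an intra-coded or otherwise unavailable neighbor.
    """
    same_ref = [p[0] for p in predictors
                if p is not None and p[1] == current_ref]
    # Exactly one neighbor uses the current reference: use it directly.
    if len(same_ref) == 1:
        return same_ref[0]
    # Otherwise take the component-wise median; missing neighbors are
    # treated as zero vectors (an assumption made for this sketch).
    mvs = [p[0] if p is not None else (0, 0) for p in predictors]
    return (sorted(m[0] for m in mvs)[1],
            sorted(m[1] for m in mvs)[1])
```

Note that the median branch ignores the reference images entirely, which is the limitation that temporal scaling of the predictors is meant to address.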

Intra coding does not always imply that a new object has appeared or that the scene has changed. It may instead be the case, for example, that motion estimation and compensation are inadequate to represent the current object (for example, because of the search range, the motion estimation algorithm used, the quantization of the residual and so on) and that better results can be achieved through intra coding. The available motion predictors may still be adequate to provide a good motion vector predictor.

Of interest is the consideration of sub-blocks within a macroblock, each of which is assigned different motion information. MPEG-4 and H.263, for example, can have up to four such sub-blocks (e.g., of size 8x8), whereas the JVT standard allows up to sixteen sub-blocks and can also handle variable block sizes (e.g., 4x4, 4x8, 8x4, 8x8, 8x16, 16x8 and 16x16). In addition, the JVT standard allows 8x8 intra sub-blocks, which complicates matters further.

Considering the cases common to JVT and MPEG-4/H.263 (8x8 and 16x16), Figs. 16A-C illustrate the predictor set for a 16x16 macroblock, in arrangements similar to those of Figs. 15A-C respectively. Here the motion vector predictors are shown for a single MV with 8x8 partitions. Although the described predictors could give acceptable results in some cases, it appears that they may not cover all possible predictions.

Figs. 17A-C show arrangements similar to those of Figs. 15A-C respectively. Here, in Figs. 17A-C, there are two additional predictors that can also be taken into account at the prediction stage (C1 and A2). If 4x4 blocks are also taken into account, the number of possible predictors increases by four.

Instead of using the median of the three predictors A, B and C (or A1, B and C2), some additional and apparently more reliable options are now available. For example, it can be observed that predictors A1 and C2 are essentially too close to each other and may not be very representative at the prediction stage. Selecting predictors A1, B and C1 instead appears to be a more reliable solution because of their separation. An alternative is to select A2 instead of A1, but A2 may in turn be too close to predictor B. Simulations suggest that the first case is usually the better choice. For the last column, A2 can be used instead of A1. For the first row, either A1 or A2, or even their average, can be used. Within the JVT standard this implementation provided a gain of up to 1%.

The previous case adds some tests for the last column. Examining Fig. 17B, for example, shows that this tends to provide the best available partitioning. An optional solution could therefore be to select A2, C1 and B (from the upper-left position). This cannot always be recommended, however, because such an implementation may adversely affect the performance of the right-hand predictors.

An alternative solution would be to use the average values of the predictors within the macroblock.

Then the median value can be obtained in the following way:

For a row/column median calculation, the median can be calculated as:

Another possible solution is a "Median5" solution. It is probably the most computationally demanding (quick sort or bubble sort could be used, for example), but it could potentially give the best results. If 4x4 blocks, for example, are taken into account, a "Median9" solution can also be used:
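As an illustration of the averaging and Median5 alternatives, the following Python sketch shows one plausible reading. It is an assumption, not a reproduction of the document's formulas: the function names are invented, the five predictors are taken to be A1, A2, B, C1 and C2 of Fig. 17, and the averages are assumed to be formed over the pairs (A1, A2) and (C1, C2).

```python
from statistics import median


def component_median(mvs):
    """Component-wise median of a list of (x, y) motion vectors."""
    return (median(m[0] for m in mvs), median(m[1] for m in mvs))


def average(mv1, mv2):
    """Component-wise average of two motion vectors."""
    return ((mv1[0] + mv2[0]) / 2, (mv1[1] + mv2[1]) / 2)


def median_of_averages(a1, a2, b, c1, c2):
    """Median of the averaged left pair, B, and the averaged top pair."""
    return component_median([average(a1, a2), b, average(c1, c2)])


def median5(a1, a2, b, c1, c2):
    """Median over all five predictors; costs a sort of five values."""
    return component_median([a1, a2, b, c1, c2])
```

The two variants can give different results for the same neighborhood, since averaging smooths each pair before the median is taken, whereas Median5 always returns values that actually occur among the predictors.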

Given that the JVT standard allows intra sub-blocks within an inter macroblock (for example, the tree macroblock structure), this can also be taken into account in motion prediction. If a sub-block (for example, only from the macroblocks above or to the left) that is to be used for MV prediction is intra-coded, the adjacent sub-block can be used instead. Thus, if A1 is intra-coded but A2 is not, A1 can be replaced by A2 in the prediction. A further possibility is to replace one missing intra macroblock with the MV predictor from the upper-left position. For example, in Fig. 17A, if C1 is missing, D can be used instead.
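The two substitutions named above can be sketched in Python as follows. The dictionary layout, the use of `None` for an intra-coded or missing predictor and the function name are assumptions made for the example; only the A1/A2 and C1/D replacements mentioned in the text are implemented.

```python
def substitute_intra(predictors):
    """Replace intra-coded predictors with neighboring ones.

    predictors: dict with keys 'A1', 'A2', 'B', 'C1', 'C2', 'D' whose
    values are (x, y) motion vectors, or None if the sub-block is
    intra-coded or missing. Returns a new dict; the input is unchanged.
    """
    out = dict(predictors)
    # A1 is intra-coded but A2 is not: use the adjacent sub-block A2.
    if out["A1"] is None and out["A2"] is not None:
        out["A1"] = out["A2"]
    # C1 is missing: use the predictor D from the upper-left position.
    if out["C1"] is None and out["D"] is not None:
        out["C1"] = out["D"]
    return out
```

After substitution, the resulting set can be passed to whichever median rule is in use.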

The sections above have presented various improvements to B-picture direct mode and to motion vector prediction. It was shown that spatial prediction can also be used for direct mode macroblocks, whereas motion vector prediction should take temporal distance and sub-block information into account for more accurate prediction. These considerations should significantly improve the performance of any applicable video coding system.

Conclusion

Although the above description uses language specific to structural features and/or methodological acts, it should be understood that the invention defined by the appended claims is not limited to the specific features or acts described. Rather, the specific features and acts are disclosed as illustrative forms of implementing the invention.

1. A method for use in encoding video data in a video encoder, comprising:

making a spatial/temporal motion vector prediction decision for at least one direct mode macroblock in a B-picture, and

signaling spatial/temporal motion vector prediction decision information for the at least one direct mode macroblock in a header that includes header information for a plurality of macroblocks in the B-picture,

wherein the signaling of said spatial/temporal motion vector prediction decision information in the header communicates a spatial/temporal motion vector prediction decision to a video decoder for the at least one direct mode macroblock in the B-picture.

2. The method according to claim 1, wherein the plurality of macroblocks in the B-picture are in a layer of the B-picture.

3. The method according to claim 1, wherein the at least one direct mode macroblock comprises a plurality of direct mode macroblocks.

4. The method according to claim 3, wherein the plurality of direct mode macroblocks are 16×16 macroblocks.

5. The method according to claim 4, wherein each of the 16×16 macroblocks includes four 8×8 sub-blocks.

6. The method according to claim 1, wherein the spatial/temporal motion vector prediction decision for the at least one direct mode macroblock indicates spatial motion vector prediction for the at least one direct mode macroblock, the method further comprising selecting a reference image for the at least one direct mode macroblock.

7. The method according to claim 6, wherein selecting the reference image for the at least one direct mode macroblock includes selecting a minimum reference image for the at least one direct mode macroblock.

8. The method according to claim 1, wherein the spatial/temporal motion vector prediction decision for the at least one direct mode macroblock indicates spatial motion vector prediction for the at least one direct mode macroblock, and wherein the spatial motion vector prediction includes median motion vector prediction.

9. The method according to claim 1, wherein the spatial/temporal motion vector prediction decision for the at least one direct mode macroblock indicates temporal motion vector prediction for the at least one direct mode macroblock, the method further comprising selecting a reference image for the at least one direct mode macroblock.

10. A method for use in decoding video data in a video decoder, comprising:

receiving signaled spatial/temporal motion vector prediction decision information for at least one direct mode macroblock in a header that includes header information for a plurality of macroblocks in a B-picture, and

determining a spatial/temporal motion vector prediction decision for the at least one direct mode macroblock in the B-picture from the signaled spatial/temporal motion vector prediction decision information in the header.

11. The method according to claim 10, wherein the plurality of macroblocks in the B-picture are in a layer of the B-picture.

12. The method according to claim 10, wherein the at least one direct mode macroblock comprises a plurality of direct mode macroblocks.

13. The method according to claim 12, wherein the plurality of direct mode macroblocks are 16×16 macroblocks.

14. The method according to claim 13, wherein each of the 16×16 macroblocks includes four 8×8 sub-blocks.

15. The method according to claim 10, wherein the spatial/temporal motion vector prediction decision for the at least one direct mode macroblock indicates spatial motion vector prediction for the at least one direct mode macroblock, the method further comprising selecting a reference image for the at least one direct mode macroblock.

16. The method according to claim 15, wherein selecting the reference image for the at least one direct mode macroblock includes selecting a minimum reference image for the at least one direct mode macroblock.

17. The method according to claim 10, wherein the spatial/temporal motion vector prediction decision for the at least one direct mode macroblock indicates spatial motion vector prediction for the at least one direct mode macroblock, and wherein the spatial motion vector prediction includes median motion vector prediction.

18. The method according to claim 10, wherein the spatial/temporal motion vector prediction decision for the at least one direct mode macroblock indicates temporal motion vector prediction for the at least one direct mode macroblock, the method further comprising selecting a reference image for the at least one direct mode macroblock.



 

