Method for recognition of gestures in a series of stereo frames

FIELD: devices and methods for recognizing dynamic gestures from a series of stereo frames.

SUBSTANCE: the method includes obtaining a series of stereo images of an object, on the basis of which a map of differences in depths is formed. The system is initialized automatically on the basis of a probabilistic model of the upper part of the body of the object. The upper part of the body is modeled as three planes, representing the torso and arms of the object, and three Gaussian components, representing the head and hands of the object. Tracking of the movements of the upper part of the body is performed using the probabilistic model of the upper body, and three-dimensional features of the performed gestures are extracted.

EFFECT: simplified operation of the system, high accuracy of gesture interpretation.

3 cl, 12 dwg

 

The technical field to which the invention relates.

The invention relates generally to system interfaces and more specifically to the recognition of dynamic gestures in a sequence of stereo frames.

Background art

Gesture recognition is a recently developed area of computer systems. In general, a gesture recognition system recognizes the physical gestures of a person and responds in accordance with its interpretation of those gestures. Gesture recognition can be used in computer interfaces for interpreting sign language, for control in industry, in entertainment applications, or for various other purposes. The task of gesture recognition systems is to offer a simple, easy-to-use system that interprets gestures very accurately.

In traditional systems, the gesture recognition process may proceed as shown in figure 1. In block 100, a sequence of video frames is obtained from a video source, such as a conventional video camera. In the background removal process, block 110, the background is eliminated from the sequence of video frames; the background is any part of the image that does not contain the gesticulating person whose gestures are to be interpreted. If the video frame is the first frame in the sequence of video frames, block 120, the process proceeds to the manual initialization stage, block 130. In manual initialization, the position and orientation of the visible parts (usually the head, hands, arms and torso) of the person are determined. Traditional systems typically use manual initialization processes in which, for example, the process may be initialized as follows: the person is asked to begin gesticulating from some predetermined position of his or her arms or hands. In other systems, the system may be adjusted to the object in the following way: the object wears colored gloves or color marks are placed on the hands and head of the object.

If the video frame is not the first frame of the sequence, block 120, the traditional system continues with the process of tracking the upper body of the object, block 140. After system initialization, or after tracking the movement of the upper body to a new position, the feature extraction process follows, block 150, which determines the features that best describe the hand gestures and distinguish them from many other gestures. Often the features used in hand gesture recognition are determined from the positions of the hands in the image plane or derived from the relative positions of the hands and the head of the object. In the traditional system the input to the system is a two-dimensional image, and the upper body can be described by six Gaussian "spots" covering the head, torso, two arms and two hands of the object. The conventional system then continues to the recognition blocks 160, which are used to identify the gesture of the object. The recognition blocks may contain hidden Markov models (HMMs).

Traditional gesture recognition systems are limited in several respects. Two-dimensional images provide no depth-of-field information, which may be insufficient for proper positioning of the upper body and can lead to misinterpretation of gestures. The need to initialize or configure the gesture recognition system using certain gestures, or with certain devices, adds complexity to the use of the system and may discourage a user from trying a system with gesture recognition.

Brief description of drawings

The features of the present invention are set forth with particularity in the claims. The invention, together with its advantages, may best be understood from the following detailed description taken in conjunction with the accompanying drawings, in which:

figure 1 depicts a block diagram illustrating the traditional system of gesture recognition;

figure 2 depicts a block diagram illustrating an implementation of the system of dynamic gesture recognition;

figure 3 shows a block diagram of a hidden Markov model used in some implementations;

figure 4 illustrates the equipment arrangement for a system of dynamic gesture recognition;

figure 5 depicts a block diagram illustrating the segmentation of the image during the initialization process;

figure 6 depicts the image of the object and background;

Fig.7 depicts the image of the object with the background removed;

Fig.8 depicts the process of torso extraction;

Fig.9 depicts the process of head extraction;

figure 10 depicts the process of arm extraction;

figure 11 depicts the process of hand extraction.

Detailed description of the invention

The following describes a method and an apparatus for recognizing dynamic gestures in a sequence of stereo frames.

In the following description, numerous specific details are set forth for the purpose of explanation, in order to provide a thorough understanding of the present invention. However, it will be obvious to one skilled in the art that the present invention may be practiced without some of these specific details. In other instances, well-known systems and devices are shown in block diagram form.

The present invention includes various processes, which are described below. The processes of this invention can be performed by hardware or implemented in machine-executable instructions. These instructions can be used to program a general-purpose processor, a dedicated processor, or a logic circuit so that the device performs the above-mentioned processes. The processes can also be implemented by a combination of hardware and software.

Figure 2 shows a block diagram that illustrates an implementation of a system of dynamic gesture recognition. In this implementation, images are taken from a sequence of stereo frames, block 200. The sequence of stereo frames is obtained using a stereo camera or multiple cameras. From the stereo images the system then creates a map of differences in depth, block 210, from which the depth of field can be determined. The system then removes the background, block 220, which, according to one implementation, can be done by removing those parts of the image that are too far from the device forming the image to be part of the object.

If the video frame is the first frame in the sequence of video frames, block 230, operation of the invention proceeds to the automatic initialization stage, block 240, which fits the upper body of the object to a probabilistic model of the upper part of the body. If the video frame is not the first frame of the sequence, block 230, then this implementation of the invention continues in block 250: the upper part of the body of the object is tracked using the model of the upper body. If tracking of the object is not sufficiently accurate, block 260, the process returns to automatic initialization, block 240, in order to re-initialize the system. This implementation thus provides criteria for deciding when tracking has failed.

After system initialization, or after tracking the movement of the upper body to a new position, the process of three-dimensional feature extraction is applied, block 270. In traditional systems, feature extraction occurs in two-dimensional space. After this stage, the process passes to the recognition blocks for three-dimensional features, block 280, which are used to identify the dynamic gesture. In some implementations, hidden Markov models can be used in the recognition blocks 280. However, unlike traditional systems, the hidden Markov models describe the trajectories of the hands in three-dimensional space. In one implementation, the gesture made by the object is detected and interpreted by comparing the dynamic gesture with information from a database of known three-dimensional gestures, block 290.

Hidden Markov models are well known in data processing systems and, consequently, will not be described in detail here. A hidden Markov model is a finite set of states, each of which is characterized by a probability distribution. Transitions between the states of the model occur under the control of a set of probabilities called transition probabilities. Observations can be made for each state of the model, but the actual state cannot be determined directly; the states are therefore called hidden. In a specific implementation, the system uses a continuous left-to-right hidden Markov model with five states. In this implementation no states are skipped, and each state is modeled by a mixture of three normal densities. The model is illustrated in figure 3, where the five states of the model are labeled 300, 310, 320, 330 and 340.
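As a concrete illustration, the left-to-right topology and its scoring can be sketched in Python with NumPy. This is a minimal sketch, not code from the patent: the 0.5 stay/advance probabilities are arbitrary placeholders, and the per-state emission likelihoods (which the patent models as mixtures of three normal densities) are assumed to be supplied as a precomputed array.

```python
import numpy as np

N_STATES = 5

# Left-to-right transition matrix with no skipped states: from state i the
# model may only stay in i or advance to i+1; the last state absorbs.
A = np.zeros((N_STATES, N_STATES))
for i in range(N_STATES - 1):
    A[i, i] = A[i, i + 1] = 0.5      # placeholder probabilities
A[-1, -1] = 1.0

pi = np.zeros(N_STATES)
pi[0] = 1.0                          # a left-to-right model starts in state 0

def forward_log_likelihood(emission_probs):
    """Scaled forward algorithm: emission_probs[t, s] = P(o_t | state s).

    Returns log P(O | model); in the patent the emissions would come from a
    mixture of three normal densities per state.
    """
    alpha = pi * emission_probs[0]
    c = alpha.sum()
    log_lik = np.log(c)
    alpha /= c
    for t in range(1, len(emission_probs)):
        alpha = (alpha @ A) * emission_probs[t]
        c = alpha.sum()              # rescale to avoid numerical underflow
        log_lik += np.log(c)
        alpha /= c
    return log_lik
```

With uninformative emissions (all ones) the sequence likelihood is exactly 1, which is a quick sanity check of the transition structure.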

The equipment arrangement for a specific implementation is shown in figure 4. In this implementation, the object 400 sits in front of a computer or terminal 410. Above the terminal is located the device 420 forming the image. The device 420 shown in figure 4 is a stereo camera, but other implementations may use multiple cameras.

In this implementation of the invention, a probabilistic framework is used for segmenting the upper body. The implementation provides tracking of the movements of the upper body with the help of stereo images. As observations for the HMM-based system of three-dimensional gesture recognition, the implementation uses the trajectories of the hands of the object. The system based on stereo images builds maps of differences in depth. The system provides accurate gesture recognition even when confronted with different lighting conditions, partial occlusions, and when the object occludes itself. Unlike traditional gesture recognition systems, in which initialization is controlled by the user, the initialization in this implementation of the present invention uses an approach to upper-body segmentation with a minimum number of assumptions concerning the relative position of the object and the device forming the image. After initialization, the model parameters are tracked in subsequent frames and adjusted to new parameter values or re-computed using the expectation-maximization algorithm. The three-dimensional positions of the hands of the object are used in the gesture recognition system as observation vectors.

According to some implementations, each new video frame in the sequence is a new stereo image of the object. In accordance with one implementation, a map of differences in depths is built from the stereo image. In accordance with another implementation, the stereo images are obtained from cameras that themselves generate the necessary depth information, so no additional map of differences in depth needs to be built. The use of such cameras allows the system to operate without the large amount of computation required to build a map of differences in depth.

Further details concerning the gesture recognition system are described below.

The image model and the model of the upper body - The probabilistic model of the upper body consists of a set of three planar components, describing the torso and arms of the object, and a set of three Gaussian components, representing the head and hands of the object. In this description, the parameters of the m-th planar component will be denoted π_m and the parameters of the n-th Gaussian component will be denoted β_n. The set of planar and Gaussian components that describes the state of the upper body of the object is therefore

Ω = {π_1, π_2, π_3, β_1, β_2, β_3}.

In the image of the object, the observation vector O_{i,j} is the pixel in the i-th row and j-th column of the image and consists of the three-dimensional position of the pixel, O^d_{i,j} = {x, y, z}_{i,j}, obtained from the map of differences in depths, and of the color of the pixel in the image color space, O^c_{i,j}; thus O_{i,j} is the concatenation of the color O^c_{i,j} and the depth difference O^d_{i,j}.

If we assume that the observation vectors are independent, then the probability of a particular sequence of observations O for the given image model is equal to

P(O) = Π_{i,j} [ P(O_{i,j} | Ω) + P(O_{i,j} | background) ],    [1]

where P(O_{i,j} | Ω) is the probability of the observation vector given the model of the upper body and P(O_{i,j} | background) denotes the probability of the observation vector given the background of the object. In one implementation, the probability of the observation vector given the background is obtained from a normal distribution for each pixel of the image, estimated from a sequence of "clean" backgrounds without an object. In another implementation, the computational complexity of the system is reduced by modeling only the observation vectors that are not associated with the background. In three-dimensional space, in which the depth of field can be determined, any part of the image that is not close enough to the camera to be part of the object is considered part of the background. Because the object is in the foreground, the probability of the sequence of foreground observations O^F given the model of the upper body is defined as follows:

P(O^F | Ω) = Π_{i,j} [ P(u) u_{i,j} + Σ_{m=1..3} P(π_m) P(O_{i,j} | π_m) + Σ_{n=1..3} P(β_n) P(O_{i,j} | β_n) ],    [2]

where u_{i,j} is the uniform distribution that models the noise in the image, and P(π_m) and P(β_n) are the a priori probabilities of the planar and Gaussian states in the model of the upper body. In one implementation, the initial values of the priors are chosen from a uniform distribution over all the components of the upper body.

After initialization of the model of the upper body, the values of the priors are estimated using the corresponding parameters of the state model. The probabilities P(O_{i,j} | π_m) and P(O_{i,j} | β_n) are the probabilities of the observation vector O_{i,j} given the planar component π_m and the Gaussian component β_n. Given these probabilities, the probability density function for the Gaussian components of the image is the normal density

P(O_{i,j} | β_n) = N(O_{i,j}; μ_n, Σ_n),

where μ_n is the vector of mathematical expectations and Σ_n is the covariance matrix of the normal density. For this implementation of the gesture recognition system the parameters of the Gaussian component are denoted β_n = (μ_n, Σ_n). Because the color distribution and the three-dimensional position can be considered independent random variables, the probability of the observation vector O_{i,j} given a planar component (arm or torso) can be decomposed as follows:

P(O_{i,j} | π) = P(O^c_{i,j} | π) P(O^d_{i,j} | π).    [5]

In equality [5] the probability P(O^c_{i,j} | π) may be a normal density or a mixture of normal densities describing the distribution of pixel colors in the plane. In accordance with one implementation, for simplicity a uniform distribution over the set of color values is used (for example, 0, ..., 255 for 256 colors). The probability of the observation vector O^d_{i,j} given the planar component π can be calculated as follows:

P(O^d_{i,j} | π) = (1 / sqrt(2π σ_z²)) exp( -(z_{i,j} - a x_{i,j} - b y_{i,j} - c)² / (2σ_z²) ).    [6]

From formula [6] it can be seen that the planar density describes a normal distribution with mean μ = a x_{i,j} + b y_{i,j} + c and variance σ_z². In this description, the parameters of the planar components will be denoted π_m = (a, b, c, σ_z²) for m = 1, 2, 3.
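Formula [6] translates directly into code. The sketch below (Python/NumPy, with assumed array shapes) evaluates the logarithm of the planar density for a batch of (x, y, z) pixel coordinates:

```python
import numpy as np

def plane_log_density(points, plane):
    """log P(O^d | pi) per formula [6]: the depth z of a pixel at (x, y) is
    normally distributed around the plane z = a*x + b*y + c with variance s2.

    points : (N, 3) array of (x, y, z) coordinates from the disparity map
    plane  : tuple (a, b, c, s2) of planar parameters
    """
    a, b, c, s2 = plane
    residual = points[:, 2] - (a * points[:, 0] + b * points[:, 1] + c)
    return -0.5 * np.log(2 * np.pi * s2) - residual**2 / (2 * s2)
```

Working in log space avoids underflow when these densities are later multiplied over many pixels, as in formula [2].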

Segmentation of the upper body - Model initialization - The optimal set of parameters for the model of the upper body is obtained using the expectation-maximization (EM) algorithm, which maximizes the likelihood of the data over the model parameters. Since the EM algorithm is, in essence, an algorithm for local optimization, convergence to the global solution depends strongly on the initial estimates of the model parameters. To ensure proper convergence of the EM algorithm, the segmentation algorithm is divided into two processes. In the first process, the system initializes the parameters of each class determined by the visibility of each component in the image. In the second process, all model parameters are estimated again simultaneously, and thus the best fit to the data is achieved.

The initialization process is essentially a sequence of two-class classification problems, repeated for each component of the model. In each of these problems, the data are attributed either to one component of the upper body or to a "remaining" class of unassigned data. Data assigned to the class of remaining data in the first classification problem become the input of the second classification process, where they are either assigned to the next component of the body or become part of a new class of remaining data. This process continues until all the data have been classified or all the components of the upper body have been initialized. The last remaining class is modeled using a uniform distribution. Note that the implementation described here uses a specific order of extraction, but those skilled in the art will recognize that alternative orders are possible, and the implementations described here are not limited to this description.

A block diagram of the initialization process is shown in figure 5. In block 500, the foreground - background segmentation process removes the background of the image of the object. In one implementation, all pixels of the image whose depth indicates a distance to the camera, or other device forming the image, greater than a certain value are attributed to the background and excluded. The remaining pixels are assigned to the foreground. In the torso extraction process, block 510, the torso plane is determined in the foreground and the rest of the pixels are included in the class of remaining data. In the head extraction process, block 520, the Gaussian spot of the head is determined and the rest of the pixels are included in the class of remaining data. In the arm extraction process, block 530, the planes of the left and right arms are determined similarly to the torso plane, and the rest of the pixels are included in a new class of remaining data. Using the pixels from the class of remaining data, the hand extraction process, block 540, determines the Gaussian spots of the left and right hands.

The segmentation processes of the initialization are described in more detail below.

Foreground - background segmentation - The first process of model initialization is background extraction. All pixels in the image that are farther from the camera than a predefined threshold, or for which no reliable depth information is available, are attributed to the background. The remaining pixels are assumed to belong to the upper body. In the case of a stationary background, using color can improve the segmentation results. However, the condition of a stationary background often does not hold, and an incorrect assumption about the statistics of the background can significantly reduce the accuracy of the segmentation results. For this reason, a particular implementation uses only depth information for the foreground - background split.

Figure 6 shows the image of the object captured by the camera or other imaging device. Note that the printed image is limited to two dimensions, whereas the images in this implementation contain depth information, thus forming three-dimensional image data. The image 600 is composed of the foreground 610, which is the object, and the background 620. The result of foreground - background segmentation is shown in Fig.7. If the segmentation was performed correctly, the image 700 contains only the foreground 710, comprising the pixels depicting the object, while the pixels of the background 720 are excluded. For simplicity, in Fig.7 through 11 all background pixels are shown as excluded, but in practice certain background pixels can be attributed to the foreground and some foreground pixels can be included in the background. Similarly, pixels representing a specific part of the body of the object can be attributed to other parts of the body.
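The depth-threshold rule described above can be sketched in a few lines of NumPy. The threshold value and the use of NaN to mark pixels with no reliable disparity are illustrative assumptions, not values from the patent:

```python
import numpy as np

def foreground_mask(depth_map, max_depth):
    """Return a boolean mask of foreground pixels (blocks 500, 610/620).

    Pixels farther than `max_depth` from the camera, or with no reliable
    depth value (encoded here as NaN), are assigned to the background.
    """
    valid = np.isfinite(depth_map)          # pixels with usable depth
    near = np.zeros_like(valid)             # default: background
    near[valid] = depth_map[valid] <= max_depth
    return near
```

For example, with a 1.5 m threshold, a pixel at 0.5 m is kept, while a pixel at 2.0 m or one with missing depth is excluded.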

Torso extraction - Each pixel classified as foreground is generated either by one of the planes of the body or by the class of remaining data with a uniform distribution. Assuming that all observations are independent random variables, the probability of the observation vectors O_{i,j} given the model of the foreground image Ω^F appears as follows:

P(O^F | Ω^F) = Π_{i,j} [ P(u) u_{i,j} + P(π) P(O_{i,j} | π) ],    [7]

where u_{i,j} is the uniform distribution that describes all members of the class of remaining data. The goal of the EM algorithm is to find the plane parameters π at which the probability P(O^F | Ω^F) reaches its maximum. Because the value O^c_{i,j} is uniformly distributed, it can be ignored in the derivation of the EM algorithm below. Let π̂ denote the parameters of the plane after re-estimation. The new plane parameters are obtained by equating to zero the derivatives of the expected log-likelihood with respect to the parameters of the planar state π. The re-estimated plane parameters are obtained by solving the following equations of the M-step (maximization):

C (â, b̂, ĉ)^T = ( Σ_{i,j} γ_{i,j}(π) x_{i,j} z_{i,j}, Σ_{i,j} γ_{i,j}(π) y_{i,j} z_{i,j}, Σ_{i,j} γ_{i,j}(π) z_{i,j} )^T,

σ̂_z² = Σ_{i,j} γ_{i,j}(π) (z_{i,j} - â x_{i,j} - b̂ y_{i,j} - ĉ)² / Σ_{i,j} γ_{i,j}(π).

The matrix C has the form

C = [ Σ γ x², Σ γ x y, Σ γ x ; Σ γ x y, Σ γ y², Σ γ y ; Σ γ x, Σ γ y, Σ γ ],

where all sums run over the pixels (i, j) and γ denotes γ_{i,j}(π). The posterior probability γ_{i,j}(π) is obtained from the following equation of the E-step (estimation):

γ_{i,j}(π) = P(π) P(O_{i,j} | π) / ( P(π) P(O_{i,j} | π) + P(u) u_{i,j} ).

The EM algorithm repeats until it converges, which happens when the change in the expected log-likelihood between subsequent iterations falls below a convergence threshold. Given the re-estimated plane parameters, all pixels for which the posterior probability of the plane exceeds that of the class of remaining data are assigned to the plane of the torso. One necessary condition for the convergence of the EM algorithm to the correct set of parameters is that the torso be the biggest part of the upper body. In most situations, except when the arms heavily occlude the torso, it can be assumed that this condition holds during the initialization stage.
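The EM loop for a single plane against a uniform remaining-data class can be sketched as follows. This is an illustrative reconstruction, not the patent's code: the uniform density value, the priors, and the initialization are assumed, and the M-step solves the weighted least-squares plane fit of formulas [8]-[11] via `numpy.linalg.lstsq`.

```python
import numpy as np

def em_plane_fit(points, n_iter=20, tol=1e-6):
    """Fit one plane z = a*x + b*y + c to (N, 3) foreground samples with EM.

    Assumed values: uniform density u, equal priors, mean-depth initialization.
    Returns the plane parameters and the per-pixel posteriors gamma.
    """
    a, b, c = 0.0, 0.0, points[:, 2].mean()
    s2 = points[:, 2].var() + 1e-6
    u = 1e-3                                  # assumed uniform noise density
    prior_plane = prior_u = 0.5               # assumed priors
    X = np.column_stack([points[:, 0], points[:, 1], np.ones(len(points))])
    prev = -np.inf
    for _ in range(n_iter):
        # E-step: posterior that each pixel was generated by the plane
        resid = points[:, 2] - X @ np.array([a, b, c])
        p_plane = np.exp(-resid**2 / (2 * s2)) / np.sqrt(2 * np.pi * s2)
        num = prior_plane * p_plane
        gamma = num / (num + prior_u * u)
        ll = np.log(num + prior_u * u).sum()
        # M-step: weighted least-squares plane fit (formulas [8]-[11])
        w = np.sqrt(gamma)
        abc, *_ = np.linalg.lstsq(X * w[:, None], points[:, 2] * w, rcond=None)
        a, b, c = abc
        resid = points[:, 2] - X @ abc
        s2 = (gamma * resid**2).sum() / gamma.sum() + 1e-12
        if ll - prev < tol:                   # convergence test on the likelihood
            break
        prev = ll
    return (a, b, c, s2), gamma
```

On synthetic pixels lying exactly on a plane, the loop recovers the plane coefficients and assigns posteriors near 1 to every pixel.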

Fig.8 illustrates torso extraction in accordance with one implementation. The image 800 includes the torso plane 810 and the class 820 of remaining pixels. The class of remaining pixels includes the rest of the pixels showing the object, that is, those that show the head, the arms and the hands of the object.

Head extraction - The initial position of the head is determined by searching the space above the torso. It is possible, however, that the head has been included in the torso plane and that the area above the torso contains only a small number of noisy points. In this case, the system searches for the head in the upper part of the torso. Next, using the depth information, an approximate head size in the image plane can be obtained from the distance and the orientation of the torso plane relative to the camera. The probability of the sequence of observations O^H in the initial head search area H is given by the following formula:

P(O^H) = Π_{(i,j) in H} [ P(β) P(O_{i,j} | β) + P(u) u_{i,j} ].    [16]

In formula [16], u_{i,j} is the density of the uniform distribution corresponding to the class of remaining data in the head area. The parameters of the Gaussian spot corresponding to the head are re-estimated using the EM algorithm for normal densities:

μ̂ = Σ_{i,j} γ_{i,j}(β) O_{i,j} / Σ_{i,j} γ_{i,j}(β),    [17]

Σ̂ = Σ_{i,j} γ_{i,j}(β) (O_{i,j} - μ̂)(O_{i,j} - μ̂)^T / Σ_{i,j} γ_{i,j}(β),    [18]

where

γ_{i,j}(β) = P(β) P(O_{i,j} | β) / ( P(β) P(O_{i,j} | β) + P(u) u_{i,j} ).

All pixels for which the posterior probability of the head spot exceeds that of the remaining-data class are assigned to the head area, and the remaining pixels again form the class of remaining data. This process is illustrated in Fig.9. The image 900 now includes the torso 910, identified earlier, and the head 920 of the object. A new class 930 of remaining pixels includes the remaining foreground pixels, which include the arms and hands of the object.
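The re-estimation formulas [17] and [18] amount to a weighted mean and a weighted covariance over the pixels. A minimal NumPy sketch, with assumed array shapes:

```python
import numpy as np

def gaussian_spot_update(obs, gamma):
    """One EM re-estimation of a Gaussian 'spot' (formulas [17]-[18]).

    obs   : (N, D) observation vectors (3-D position, optionally with color)
    gamma : (N,) posterior probabilities that each pixel belongs to the spot
    """
    w = gamma / gamma.sum()                    # normalized weights
    mu = w @ obs                               # weighted mean, formula [17]
    centered = obs - mu
    cov = (centered * w[:, None]).T @ centered # weighted covariance, formula [18]
    return mu, cov
```

The same update serves both the head spot and, later, the hand spots, since both are modeled by normal densities.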

Arm extraction - The arms are modeled by planar density functions. The planar distribution model does not restrict the natural degrees of freedom of arm movements and provides a good description of the arm-movement data available from the stereo images. The parameters of the planes of the left and right arms are obtained using the same formulas as in the case of the torso plane. The search areas for the left and right arms consist of the pixels to the left and right of the torso center that have not previously been assigned to the torso or the head.

Figure 10 illustrates the process of arm extraction. After identification of the left and right arms of the object, the image 1000 includes the torso 1010 and head 1020, identified previously, and the right arm 1030 and the left arm 1040 of the object. The class 1050 of remaining pixels includes the remaining foreground pixels, which include the left and right hands of the object.

Hand extraction - The hands are modeled using normal densities. As in the modeling of the head, the observation models for the hands consist of the three-dimensional positions and color values of the pixels. Some traditional approaches to gesture recognition use a priori information about skin color to detect the hands and/or face in the image. However, these approaches often fail when the environment exhibits strong variations in lighting. Instead, the described implementation of the present invention determines the positions of the hands by finding the areas of the arm planes having a color similar to the color tone obtained during head extraction. The parameters of the Gaussian spots for the hands are therefore determined using the same EM algorithm for normal densities that was used to estimate the parameters of the head spot.

Figure 11 illustrates the process of hand extraction. After this identification process is complete, the image 1100 includes the torso 1110, head 1120, right arm 1130 and left arm 1140 of the object, identified previously, and the right hand 1150 and the left hand 1160 of the object. For simplicity, figures 10 and 11 show that before hand extraction the class of remaining pixels contains only the hands of the object, but in practice other pixels that have not been assigned to the background or to other parts of the body may also be included in this class.

Model for tracking the movements of the upper body - The initial parameters, obtained separately for the torso, head, arms and hands, are refined by re-estimating them simultaneously. The optimal set of parameters for the model of the upper body is obtained using the EM algorithm, by equating to zero the derivatives of the expected log-likelihood with respect to the parameters of the model Ω. The a priori probabilities P(π_m) and P(β_n) of the observation vectors are computed from the model parameters estimated in the previous frame; these parameters are predicted by a Kalman predictor. During the E-step of the EM algorithm, the posterior probabilities of the model components given the data are calculated as follows:

γ_{i,j}(π_k) = P(π_k) P(O_{i,j} | π_k) / ( P(u) u_{i,j} + Σ_m P(π_m) P(O_{i,j} | π_m) + Σ_n P(β_n) P(O_{i,j} | β_n) ),

γ_{i,j}(β_k) = P(β_k) P(O_{i,j} | β_k) / ( P(u) u_{i,j} + Σ_m P(π_m) P(O_{i,j} | π_m) + Σ_n P(β_n) P(O_{i,j} | β_n) ).

During the M-step (maximization), the new set of plane parameters is re-estimated in accordance with formulas [8]-[11], and the parameters of the Gaussian spots are re-estimated using formulas [17] and [18]. Pixels for which the inequality

γ_{i,j}(π_k) > γ_{i,j}(ω) for every other component ω of the model

holds are assigned to the plane π_k. Similarly, pixels for which the corresponding inequality holds for γ_{i,j}(β_k) are assigned to the Gaussian spot β_k.
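The patent does not specify the Kalman predictor's motion model. The sketch below assumes a constant-velocity model for a single scalar parameter, with illustrative noise covariances, to show the predict/correct cycle applied between frames:

```python
import numpy as np

F = np.array([[1.0, 1.0],
              [0.0, 1.0]])   # constant-velocity model, dt = 1 frame (assumed)
H = np.array([[1.0, 0.0]])   # only the re-estimated value itself is observed
Q = np.eye(2) * 1e-3         # process noise covariance (assumed)
R = np.array([[1e-2]])       # measurement noise covariance (assumed)

def kalman_step(x, P, z):
    """One predict/update cycle; x = [value, velocity], z = new EM estimate."""
    x = F @ x                           # predict the parameter forward one frame
    P = F @ P @ F.T + Q
    S = H @ P @ H.T + R                 # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)      # Kalman gain
    x = x + K @ (z - H @ x)             # correct with the re-estimated value
    P = (np.eye(2) - K @ H) @ P
    return x, P
```

Fed a parameter that drifts linearly from frame to frame, the filter locks onto both the value and its rate of change, which is what makes it usable as a predictor for the next frame's priors.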

Gesture recognition - Hidden Markov models (HMMs) are a widely used tool for classifying dynamic gestures, due to their flexibility in modeling signals while preserving the essential structure of hand gestures. In the implementation described here, the HMM-based gesture recognition system uses as observation vectors the trajectories of the hands of the object in three-dimensional space. Although hand trajectories in the image plane are the traditional feature for gesture recognition, trajectories in a two-dimensional image plane cannot unambiguously describe the motion of the hands in the plane perpendicular to the image plane. The use of maps of differences in depth makes it possible to obtain the hand trajectories in three-dimensional space, and these trajectories are used in the implementation as observation vectors. Moreover, the use of maps of differences in depth together with color information results in a robust segmentation of the upper body, which is largely independent of the lighting conditions or changes in the background.
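The observation vectors described above, i.e. the 3-D hand trajectory, can be assembled from the per-frame segmentation as the centroid of the pixels assigned to the hand spot. A minimal sketch with assumed input shapes:

```python
import numpy as np

def hand_trajectory(depth_frames, hand_masks):
    """Build the HMM observation sequence from per-frame segmentation.

    depth_frames[t] : (H, W, 3) array of (x, y, z) coordinates per pixel
    hand_masks[t]   : (H, W) boolean mask of pixels assigned to the hand spot
    Returns a (T, 3) sequence of 3-D hand centroids, one per frame.
    """
    traj = [frame[mask].mean(axis=0) for frame, mask in zip(depth_frames, hand_masks)]
    return np.stack(traj)
```

The resulting (T, 3) array is the sequence of observation vectors scored against each gesture's HMM.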

The use of maps of differences in depth for gesture recognition is useful because stereo is much more stable than color information alone with respect to changes in lighting conditions, and because maps of differences in depth reduce the inherent depth ambiguity of two-dimensional images, thus allowing more accurate segmentation of the image under partial occlusion and self-occlusion.

Using maps of differences in depth adds some difficulties to the gesture recognition process. Stereo algorithms are often difficult to develop and require a great deal of time. They also require substantial computation. Correspondence-based stereo algorithms can generate maps of differences in depth with a large amount of noise. However, consumer cameras have become more affordable, and the performance of personal computers has increased so much, that stereo vision can be performed at a reasonable frame rate. An example is the camera used in one implementation, the Digiclops Stereo Vision System developed by Point Grey Research Inc., Vancouver, British Columbia. Since the performance of dynamic gesture recognition strongly depends on the quality of the sequences of observation vectors, the use of stereo images in the system requires additional care. The use of depth maps instead of color information to describe the model of the upper body is a very important element in building a system that provides stable performance under different lighting conditions, shadow effects, a moving background, and when something, or a part of the body, occludes a part of the upper body.

The invention has been described above with reference to concrete implementations. Nevertheless, it is clear that various modifications and changes can be made without departing from the broader spirit and scope of the present invention. The description and drawings should accordingly be regarded in an illustrative rather than a restrictive sense.

1. A method of gesture recognition, implemented by a computer system, comprising:

obtaining a sequence of stereo images, the images containing at least a part of an object making a dynamic gesture;

obtaining depth differences associated with the stereo images;

tracking the object;

extracting three-dimensional features from the images; and

interpreting the dynamic gesture performed by the object.

2. The method according to claim 1, characterized in that it further comprises segmenting the image of the object into its component parts.

3. The method according to claim 2, characterized in that the component parts are at least the torso, head, arms and hands of the object.

4. The method according to claim 1, characterized in that it further comprises automatically initializing the parameters of a probabilistic model of the object.

5. The method according to claim 4, characterized in that the probabilistic model models the arms and torso of the object as planes.

6. The method according to claim 4, characterized in that the probabilistic model models the head and hands of the object as Gaussian components.

7. The method according to claim 1, characterized in that it further comprises removing background from images.

8. The method according to claim 7, characterized in that the removal of the background from stereo images includes the removal of all parts of the stereo images, which are at a distance greater than a specified distance from a certain place.

9. The method according to claim 1, wherein the stereo images are obtained using video cameras.

10. The method according to claim 1, wherein obtaining the differences in depths comprises constructing a map of the differences in depths.

11. The method according to claim 1, wherein interpreting the dynamic gesture comprises comparing the dynamic gesture with a three-dimensional model of the gesture.

12. The method according to claim 11, wherein comparing the dynamic gesture with a three-dimensional model of the gesture comprises using hidden Markov models of three-dimensional gestures.
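The background removal of claims 7 and 8 (depth-based rather than colour-based) can be sketched as a simple thresholding of the depth values; anything farther than a specified distance from the given location is discarded. The function and threshold below are illustrative assumptions, not the patent's figures.

```python
import numpy as np

def remove_background(image, depth, max_distance):
    """Zero out all pixels whose depth exceeds max_distance (same units as depth)."""
    mask = depth <= max_distance          # True where the gesturing subject may be
    return np.where(mask, image, 0), mask
```

Because the threshold acts on metric depth rather than pixel colour, this step is insensitive to lighting changes and moving coloured objects behind the subject, which is the motivation given in the description.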

13. A gesture recognition system, comprising:

an image forming device for capturing three-dimensional images of at least part of an object and a background, the object performing a dynamic gesture; and

a processor for performing operations including:

processing a set of differences in depths associated with the stereo images;

tracking the object;

extracting three-dimensional features of the object; and

interpreting the dynamic gesture performed by the object.

14. The gesture recognition system according to claim 13, wherein the image forming device is a stereo video camera.

15. The gesture recognition system according to claim 13, wherein the processor further performs operations comprising removing the background from a sequence of stereo images.

16. The gesture recognition system according to claim 15, wherein removing the background from the sequence of stereo images comprises removing all parts of the stereo images that are at more than a specified distance from the image forming device.

17. The gesture recognition system according to claim 13, wherein the processor further performs operations comprising segmenting the image of the object into its component parts.

18. The gesture recognition system according to claim 17, wherein the component parts comprise at least the torso, head, arms and hands of the object.

19. The gesture recognition system according to claim 13, wherein the processor further performs operations including automatically initializing the parameters of a probabilistic model of the object.

20. The gesture recognition system according to claim 19, wherein the probabilistic model of the object models the arms and torso of the object as planes.

21. The gesture recognition system according to claim 19, wherein the probabilistic model of the object models the head and hands of the object as Gaussian components.

22. The gesture recognition system according to claim 13, wherein interpreting the dynamic gesture performed by the object comprises comparing the dynamic gesture with a three-dimensional model of the gesture.

23. The gesture recognition system according to claim 22, wherein comparing the dynamic gesture with a three-dimensional model of the gesture comprises using hidden Markov models of three-dimensional gestures.
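The upper-body model of claims 20 and 21 can be sketched generically: planar parts (torso, arms) are each described by a fitted plane, and compact parts (head, hands) by Gaussian components over their 3-D points. The least-squares fitting shown here is a common textbook approach assumed for illustration, not the patent's own estimation procedure.

```python
import numpy as np

def fit_plane(points):
    """Least-squares plane z = a*x + b*y + c through an Nx3 point cloud."""
    A = np.c_[points[:, 0], points[:, 1], np.ones(len(points))]
    coeffs, *_ = np.linalg.lstsq(A, points[:, 2], rcond=None)
    return coeffs                                  # (a, b, c)

def fit_gaussian(points):
    """Mean and covariance of a 3-D Gaussian component."""
    return points.mean(axis=0), np.cov(points.T)
```

Representing the torso and arms by three planes and the head and hands by three Gaussians keeps the model low-dimensional, which is what makes the automatic initialization of claim 19 tractable.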

24. A machine-readable medium storing data representing a sequence of instructions which, when executed by a machine, cause the machine to perform operations comprising:

obtaining a sequence of stereo images, the images containing at least part of an object performing a dynamic gesture;

obtaining differences in depths associated with the stereo images;

tracking the object;

extracting three-dimensional features from the images; and

interpreting the dynamic gesture performed by the object.

25. The medium according to claim 24, further storing a sequence of instructions which, when executed by a machine, cause the machine to perform operations comprising segmenting the image of the object into its component parts.

26. The medium according to claim 25, wherein the component parts comprise at least the torso, head, arms and hands of the object.

27. The medium according to claim 24, further storing a sequence of instructions which, when executed by a machine, cause the machine to perform operations including automatically initializing the parameters of a probabilistic model of the object.

28. The medium according to claim 27, wherein the probabilistic model of the object models the arms and torso of the object as planes.

29. The medium according to claim 27, wherein the probabilistic model of the object models the head and hands of the object as Gaussian components.

30. The medium according to claim 24, further storing a sequence of instructions which, when executed by a machine, cause the machine to perform operations including removing the background from the images.

31. The medium according to claim 30, wherein removing the background from the images comprises removing all parts of the stereo images that are at more than a specified distance from a given location.

32. The medium according to claim 24, wherein the stereo images are obtained using video cameras.

33. The medium according to claim 24, wherein obtaining the differences in depths comprises constructing a map of the differences in depths.

34. The medium according to claim 24, wherein interpreting the dynamic gesture comprises comparing the dynamic gesture with a three-dimensional model of the gesture.

35. The medium according to claim 34, wherein comparing the dynamic gesture with a three-dimensional model of the gesture comprises using hidden Markov models of three-dimensional gestures.
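The hidden-Markov-model matching of claims 12, 23 and 35 can be sketched as follows: each gesture class gets its own HMM over a sequence of (quantised) three-dimensional features, and an observed sequence is assigned to the model with the highest forward-algorithm likelihood. The tiny two-state models and gesture names in the usage below are invented for illustration; the patent does not specify model topologies.

```python
import numpy as np

def forward_likelihood(pi, A, B, obs):
    """P(obs | HMM) via the forward algorithm for a discrete-output HMM.

    pi: initial state distribution, A: state transition matrix,
    B: emission matrix (B[i, k] = P(symbol k | state i)), obs: symbol sequence.
    """
    alpha = pi * B[:, obs[0]]
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]
    return alpha.sum()

def classify(models, obs):
    """Return the gesture name whose HMM best explains the observation sequence."""
    return max(models, key=lambda g: forward_likelihood(*models[g], obs))
```

For example, a "wave"-like model with alternating state transitions scores alternating feature symbols higher than a "push"-like model with sticky states, so `classify` picks the gesture whose temporal structure matches the observed dynamics.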



 
