Method for storing and selecting data of n-dimensional intervals

FIELD: information technologies, possible use for optimizing storage and selection of data.

SUBSTANCE: in accordance to method, spatial data structure is formed with elements in form of original n-dimensional intervals; lens is determined, being a 2n-dimensional interval of interval request operator, representing an instruction about selection of data of required n-dimensional intervals with description of given interval and positioning conditions of required ones relatively to it, while lens is determined in accordance to rose of intervals, representing virtual two-dimensional geometric diagram of areas such as 2n-dimensional points in axes xp and yp of their space, coordinates {xp,yp} of which are appropriate coordinates of p-projections of n-dimensional intervals appropriate for these points, where p-projection of n-dimensional interval is its projection on p-axis of its space basis; built and stored on physical data carrier is interval request operator about selection from data structure of such points, which are enveloped by this lens, by means of this operator interval request to structure is performed, resulting in a set of data of 2n-dimensional points, simultaneously being the data of sought after intervals corresponding to the data.

EFFECT: possible efficient execution of any interval requests in one spatial data structure.

10 dwg, 1 tbl, 8 ex

 

The invention relates to the field of information technologies, in particular to methods of optimizing storage and sampling interval and heterogeneous data. The invention can be used in systems such as the capacious data warehouse, database management system (DBMS), geographic information systems (GIS).

There is a method of storing and sampling data of n-dimensional intervals [1]. This method is used in different ways, in particular, in data warehousing and database management, as currently one of the most productive in the tasks associated with storage and sampling data of n-dimensional intervals. According to the method of storing and retrieving data such intervals is performed in the spatial structure of the data R-tree, which is balanced block tree. Data elements leaf level of the tree, R-tree are the elements of the real data - source n-dimensional intervals, and the data elements of other levels are elements of the auxiliary data is n-dimensional intervals appropriate levels. The data elements of each level separately integrated spatial intervals as the appropriate level. This integration is achieved by combining data elements of each non-zero level in these blocks of data elements that are pointers associated with corresponding the named blocks-the ancestors of the highest level. The data elements of the zero level are integrated into a single root block.

The disadvantage of this method is that the structure of R-tree high performance interval queries are guaranteed to be mentioned in the article [1]. This is due to the fact that the data elements in this structure spatially integrated as an n-dimensional intervals. With this method of storage is impossible simultaneous integration of data elements as intervals according to the criteria of intersections, attachments, coatings, touch and other their mutual locations in space. The reason for this is the fact that there is no way simultaneous ordering of n-dimensional intervals according to these criteria in their own n-dimensional space. Therefore, the blocks of data elements of the tree R-tree may overlap. This, in turn, cannot provide the required data on the unique path during the execution interval of the query in the R-tree. Moreover, the effect of overlapping blocks leads to the following additional negative phenomenon. Although the data elements of each level of the tree, R-tree and integrated spatial a n-dimensional intervals, however, this does not imply a structural spatial proximity close intervals of the same level in the whole tree. Spatially close and is tervala in the R-tree can be structurally distant, and spatially distant intervals structurally similar. In the R-tree is the worst case execution interval of the query is very slow with zero recoil and corresponds to the full tree traversal when receiving an empty set of results.

Also there is a method of storing and sampling data of n-dimensional intervals [2]. According to the method of storing and retrieving data such intervals is carried out in combination n of data structures, each of which is a binary tree of partitions. The elements of the real data for each of these trees are the corresponding baseline projection of the original n-dimensional intervals. These projections in each tree are both 2-dimensional points, but they sorted according to the criterion of their intersections. This approach provides optimal search data of all n-dimensional intervals that intersect with the set.

The first disadvantage of this method is the fact that in accordance with the storage and sample data in n data structures, which is associated with the additional requirements of memory and performance degradation with increasing n. The second disadvantage of this method is the limitation of the implementation of interval queries in each tree. This limitation is reflected in the fact that with such a storage intervals are usort Rovaniemi in terms of their intersections, but they are not integrated in terms of investments, coatings, touch and other their mutual locations that do not ensure an effective search criteria.

Ways in the same data structure to efficiently perform any interval queries that have not been identified.

The objective of the invention is to provide opportunities for effective execution in the same spatial data structure any interval queries with different conditions relative position of the required n-dimensional intervals relative to a set.

This object is achieved in that in the method of storing and sampling data of n-dimensional intervals, according to the invention, up to a spatial data structure S, the elements of the real data which are the original n-dimensional intervals, spatial integrated as a 2n-dimensional point whose coordinates match the coordinates of the projections of these intervals on an axis basis of their space; determine the lens, which is the 2n-dimensional interval operator query interval, which is a statement about the sample data required n-dimensional intervals with the description given interval and terms of location relative to it required, and the lens is determined according to rose intervals representing virtual DV is dimensional geometric chart areas such 2n-dimensional points in the x pand Iptheir space coordinates {xp,p} which are the corresponding coordinates of the p-projections corresponding to these points of n-dimensional intervals, where p is the projection of n-dimensional interval is its projection on the R-axis basis of its space; design and maintain the physical media operator query interval about the sample from the structure's data such points that are covered by this lens; perform using this operator query interval to the structure S, which will result in the data set 2n-dimensional points that are both data corresponding to the searched interval.

This method, due to the spatial integration of n-dimensional intervals as 2n-dimensional points, allows in the same spatial data structure to create the ability to effectively perform any of interval queries with different conditions relative position of the required intervals relative to a set. The elements of real data such patterns are both n-dimensional intervals and the corresponding 2n-dimensional points. These data elements are in the 2n-dimensional space patterns simultaneously integrated as a 2n-dimensional spatial point and a n-dimensional intervals according to the criteria of intersections, attachments, coatings, touch and other is zaemnyh locations in their own n-dimensional space. Therefore, this spatial data structure is optimized to run any interval queries.

In the present invention uses the following terminology.

The term "data" in the context of the presented material refers to the physical data, i.e. the information presented in formalized form and intended for treatment of technical equipment, such as computers. Physical data can be stored on physical media, modified, moved, removed, deleted.

Under each of the concepts "point", "cut", "rectangle", "interval" refers to the physical data sets.

What are these sets, explain the following examples.

A point is called a data set consisting of one element clear (precise) data type: age equal to 43. Here the data set is {43} is a data set consisting of a single data element, this element is clear data, with the exact value of 43.

A run is a data set consisting of one element fuzzy ("blurry", approximate, assumed) data type: age is approximately equal to 35-40. Here the data set is {35-40} is a data set consisting of a single data item (not two, and one), this element are fuzzy data (about the but instead of two) interval is 35-40.

And the point and the segment in these examples are sets of 1-dimensional data.

The set of n-dimensional data is a dataset consisting of n data elements, these elements can be as crisp and fuzzy data.

Example (for two-dimensional case, i.e. when n=2):

2-dimensional point is called a data set consisting of two elements, clear (precise) data type: age equal to 43, and weight is 85. Here the data set is {43, 85} is a data set consisting of two data elements, these elements are clear data, with the exact values of 43 and 85.

2-dimensional cut is a set of data consisting of two data items, one of which is an element of precise (accurate) data, and the other is a fuzzy element ("blurry", approximate, assumed) data type: age is approximately equal to 35-40, and weight is 80. Here the data set is {35-40, 80} is a data set consisting of two data items, one of which is a clear data element having the exact value of 80 and the other element of the fuzzy data having interval is 35-40.

2-dimensional rectangle is called a data set consisting of two elements of fuzzy ("blurry", approximate, assumed) data type: the age of about equal 33-35, and the weight is approximately equal 74-76. Here laboratornyh is {33-35, 74-76} is a data set consisting of two data elements, these elements are fuzzy data with interval values 33-35 and 74-76.

n-dimensional rectangle is a set of n-dimensional data, i.e. the data set consisting of n data elements, these elements can be as clear data, and fuzzy.

Any of the above points and segments (including 1-dimensional) fall under this definition, then there are n-dimensional rectangles. Therefore, points and segments it also sets the n-dimensional data.

Each n-dimensional rectangle (set of physical data) corresponds to the geometric n-dimensional rectangle in a n-dimensional Cartesian space, which is the geometric image. Such conformity virtual.

No material transactions on mathematical (abstract, logical) objects and spaces nor in the claims, neither in the present description are not included. All compliance material and geometric objects and these match the drawings are only for illustrative explanations. If said about any of the geometry of n-dimensional rectangles (datasets), it is assumed geometry of their virtual images in a n-dimensional Cartesian space. Some of the definitions below, podpad the Ute under this definition as its special cases.

n-dimensional rectangle (set of physical data) is called orthogonal if every edge of its geometrical image parallel to any axis of a basis of this space.

n-dimensional interval is called the orthogonal n-dimensional rectangle.

So the n-dimensional interval, as well as n-dimensional rectangle is a set of physical n-dimensional data consisting of n data items. Therefore, any n-dimensional interval, as well as any n-dimensional rectangle, is a material object - a certain set of physical data.

n-dimensional interval can be point, line segment, square, cube, hypercube, etc. for Example, n-dimensional point is a n-dimensional (orthogonal) rectangle, all edges which have a length of zero.

n-dimensional intervals may be open, closed or partially open.

Open n-dimensional intervals are intervals, rectangular geometric images which do not include their boundaries.

Closed n-dimensional intervals are intervals, rectangular geometric images which include their borders.

Partially open n-dimensional intervals are intervals, rectangular geometric images which include all its borders, but at least one of them.

Open 1-dimensional intervals(sets of physical data), as well as their geometric images in this text are denoted as (x, y), closed - as [x, y], and partially open - as (x, y] or [x, y), depending on which end of this interval excluded from it.

Heterogeneous n-dimensional intervals (with respect to each other) are the n-dimensional orthogonal rectangles, some of which have different number of zero edges.

For example, heterogeneous n-dimensional intervals (with respect to each other) are points, line segments, rectangles, cubes, hypercubes, etc.

Heterogeneous data is data of heterogeneous n-dimensional intervals.

The space of n-dimensional interval is called the space of its geometrical image, that is, the space belongs to the geometric image of this interval.

A region of n-dimensional intervals is called the scope of their geometrical images.

Coordinates n-dimensional interval are called the coordinates of the geometric image.

Projections of n-dimensional interval, called its geometric projection of the image.

The coordinates of the projections of n-dimensional interval are called the coordinates of the projection of the geometrical image.

the p-projection of n-dimensional interval is its projection on the R-axis basis of its space. Any p-projection of n-dimensional interval is a 1-dimensional interval.

One n-dimensional interval l which refers strictly to the left (right) another along the R-axis if and only when the p-projection of the first one lies on this axis is strictly to the left (right) R-projection of the second, and any other of the other of the first projections is covered by a corresponding projection of the second.

One n-dimensional interval crosses to the left (right) another along the R-axis, then and only then, when their geometric images overlap, and the p-projection of the first of them on this axis intersects the left (right) R-projection of the second.

In General, when it comes to the intersection of n-dimensional intervals, their attachments, coatings, taps or other of their mutual location in space, then under these terms shall be interpreted to mean the corresponding relative position of their geometrical images in the appropriate space.

In the described method under the data structures need to understand physical systems naturally related and are in mutual communication of material objects - physical data. More precisely:

The data structure is stored on physical media set, which includes naturally associated and are in mutual communication, the elements of real and auxiliary physical data, which effectively execute specific algorithms. The data structure of the material.

Any data structure has its own logic, otherwise it would not be a structure, and random set Yes the data.

Real data is the original data, that is, the physical data that is actually interesting to the user when storing and fetching. For example, temperature data or signals of the beating heart.

Under auxiliary data refers to such physical data that the user is not interested in, but necessary for the organization of logical connections within the structure (physical systems)to ensure effective implementation of specific algorithms over its data (i.e. ensuring the functioning of its own logic). For example, formalized metadata (data about data) or address positions (locations) of specific data. Formalized metadata and addresses are physical data.

The relationship of the elements of the structure can be implemented, for example, by using pointers.

Under the directions are understood to address those or other data blocks or positions within these blocks. So pointers are physical data, which addresses the memory and stored on physical media in the same way as the real data. Pointers are auxiliary data.

Auxiliary data can be other physical data other than pointers, such as metadata (data about data), etc. All the internal organization of any material structure the market data with auxiliary data. Here, the term "internal organization of material data structure" refers to the ordering of its elements - physical data in accordance with the logic of this structure.

Under the data model in the present text refers to the physical system, which is the more common system physical data than the data structure. For example, data models (in that sense) are DBMS and GIS. The efficiency of the data model depends on the data structures that are part of the kernel.

Forming a data structure is a complex material such as memory allocation (allocation) for storing data on a physical carrier of information; search data, accompanied by their extraction and comparison; insert, delete, and modify data predvaryayutsya their search; storing data on physical media, etc. Each particular data structure is formed her own way.

The physical structure of the data in the required order contains at least one material object. If there are no such objects, there is no material structures. This single object can be auxiliary data, for example, the address of the first allocated memory block. The data structure is empty if it does not contain the actual data.

If the data structure of everything is there (even if it is empty), which means that it was formed. Any further modification of the data structure in accordance with its logic, for example, inserting data, deleting data, updating data, are a continuation of the process of its formation. In every moment of the completion of any such modification data structure is again formed. Any modifications to the data structures that do not meet its logic, is the process of its destruction.

The spatial data structure is a spatial structure (n-dimensional) data, the elements of real data which integrates spatial, that is, by the criterion of spatial proximity to each other. This criterion, for example, may be the proximity of the centers of gravity of the objects corresponding to these data elements, or the closeness of their nearest (or farthest) boundaries.

Spatial integration (spatial grouping) physical data - it is their organization (ordering) on a physical storage medium, which corresponds to the spatial integration of their geometrical images in a virtual geometrical space in the sense of a spatial criterion of proximity. Spatial integration of physical objects system allows to minimize the access time between her closest is myectomy, in the sense defined in this system, the spatial criterion of proximity. Spatial integration of data elements in a spatial data structure can be implemented connections between them and / or their groups by using pointers. Spatial integration of data elements provides spatial data structure, the possibility of their independent interval search along any of the axes of the basis of its space.

On the basis considered in the described method the principle of integration it is possible to develop different logical data structures (for example, in the form of a block or binary trees) for subsequently implemented on the basis of their relevant physical data structures. Any of these physical systems fall under spatial structure referred to in the claims, unless the principle of integration in one, which describes the proposed method.

The space of geometric images of the physical data of the spatial data structure is called the space of this structure.

The space of the spatial data structure is Cartesian. Each of the axes of this space corresponds to a multiplicity of unique data, which determined the order relation. The data type of each axis of the space corresponds to a data type that is noreste, which this axis corresponds. These types of data can be numeric, character, string, time (date), etc.

The spatial structure (n-dimensional) data can be spatial or not. For example, the structure of spatial data as a binary tree of n-dimensional data, is not a spatial data structure, since the spatial neighboring data in this structure cannot be integrated spatial. Examples of spatial data structures are KDB-tree, Quad-tree, R-tree.

The query is the process of retrieving data from a data structure that includes : search, extract, and save in a separate set on the physical storage media. Request to the data model is essentially a request to the data structure of this model. Request to the specific structure of the data is performed by a method in accordance with its algorithms.

The result of the query is retrieved from the data structure and stored on physical media set of data satisfying the query.

Range query is a query to the data structure of the sample data required n-dimensional intervals relative to a set.

The query statement is an expression that describes the actual data that you retrieve from the data structure.

The statement is not ODA shall determine the method of data retrieval. Ways to find them is entirely determined by the algorithms of the search data structure that stores or perhaps stores these data among others. The query statement is a command that is in accordance with algorithms data structures shall be implemented by technical means, such as a computer.

The operator interval query is a statement about the sample data required n-dimensional intervals, which includes the description of the specified interval and terms of location relative to it are required.

The lens is called an n-dimensional interval operator query interval on which it is possible to sample required. The lens may not be an element of the data structure. It is an object defined by its data and set as the parameter of the operator of any interval of the query.

The terms of the operator interval query conditions are the location of the desired n-dimensional intervals relative to its lens. These conditions may be crossing, attachments, cover, touch, and other requirements of the mutual locations of selected intervals relative to the lens.

Rose intervals is a virtual two-dimensional geometric chart areas such 2n-dimensional points in the xpand Iptheir space coordinates {xp,p} which are relevant CCW is dinati p-projections corresponding to these points of n-dimensional intervals.

the p-projection of all intervals in the axis roses ordered as point and, in addition, integrated as a 1-dimensional intervals by various criteria of their mutual arrangement on the axis R. In essence, rose intervals is a virtual geometric Visualizer full and global ordering in the xpand Ipall intervals of space dimension n points of dimension 2n. Each of the areas ("petals") and combinations of areas roses intervals corresponding to a particular group of p-projections of intervals on the axis R by a certain criterion with respect to such p-projection interval, which has coordinates {pbp}. Linep=xprose intervals corresponds to a point (zero-length) projections of intervals on the p-axis. Rose intervals has the form of a triangle, but a mirror reflection of its regions relative to the y axisp=xpit can be converted into the corresponding lower triangle. Both these configurations roses equivalent. The data type of the axes xpand Iprose intervals are the same and match the data type of the axis p of the space of intervals.

From the definition of roses intervals it is clear that she is a diagram for p running through the values p=0, ..., n-1, so it can be interpreted as a set of n graphs for each fixed is. In this sense, the rose intervals can be considered as n-dimensional.

Rose intervals virtual, for the application of the described method is not required to perform the image on any media or perform any material action. However, the process of integration of n-dimensional intervals as 2n-dimensional points in the data structure itself leads to action under the rose intervals (steps in accordance with the virtual chart).

The method is illustrated by the following drawings.

In figures 1A and 1b depicts a rose intervals, which showed a black dot corresponds to a p-projection of a given n-dimensional interval and xpand Ipthis rose coordinates {pbp}.

Marked on the figure 1A region roses mean the following:

- R1 (gray) - the scope of all p-projections of n-dimensional intervals, covering the p-projection given;

- R2 (gray) - the scope of all p-projections of n-dimensional intervals covered by the p-projection is given;

- R3 (gray) - the scope of all p-projections of n-dimensional intervals, lying to the left of the p-projection given;

- R4 (gray) - the scope of all p-projections of n-dimensional intervals, lying to the right of the p-projection given;

- R5 (white) - the scope of all p-projections of n-dimensional intervals that intersect to the left of the p-projection given;

- R6 (below the color.) the scope of all p-projections of n-dimensional intervals that intersect at right R-projection given;

- R7=R1+R2+R5+R6 - area all p-projections of n-dimensional intervals that intersect the p-projection specified.

In figure 1A the point qpis the left coordinate of the projection upone of the 2n-dimensional lenses operator corresponding interval of the query, and rp- right coordinate of the projection of vpthis lens.

Figure 1b shows a particular case of the rose intervals for all possible p-projections of n-dimensional intervals with integer coordinates (from 0 to 9) of these projections. Coordinates given interval in figure 1b are andp=3 and bp=7.

In figures 2-8 shows examples of heterogeneous 2-dimensional intervals having different positions relative to the 2-dimensional set (gray). In each of these figures shows the R-axis 2-dimensional space of these intervals. The coordinates of the R-projection interval, denoted by apand bp.

Figure 2 shows examples of 2-dimensional intervals that cover the specified, excluding those specified comes from within. Such intervals for a given can be only 2-dimensional intervals with non-zero edges.

Figure 3 shows examples of 2-dimensional intervals, which covers specified, excluding those specified on the inside. Still is and intervals for a given can be 2-dimensional intervals with non-zero edges, as well as points and segments.

Figure 4 shows examples of 2-dimensional intervals that lie strictly to the left of a specified along the R-axis of their space. Such intervals for a given can be 2-dimensional intervals with non-zero edges and points and segments.

Figure 5 shows examples of 2-dimensional intervals, p-projections which lie to the right of the p-projection of the specified, excluding those p-projections which are related to the p-projection specified. Such intervals for a given can be 2-dimensional intervals with non-zero edges and points and segments. But some of them may not lie strictly to the right of the specified value. So the case shown in this figure, is not mirroring the case depicted in figure 4.

The figure 6 shows examples of 2-dimensional intervals that intersect the left with a specified along the R-axis of their space, except those with p-projections which are related to the p-projection of a given inside and or outside. Such intervals for a given can be only 2-dimensional intervals with non-zero edges and line segments parallel to the R-axis.

Figure 7 shows examples of 2-dimensional intervals, p-projection which intersect to the right with a p-projection, excluding those p-projections which are related to the p-projection of a given inside and or outside. Such intervals for a given can be only 2-Mer is s intervals with non-zero edges and segments, parallel R-axis space intervals. But some of them may not intersect with the set. So the case shown in this figure, is not mirroring the case depicted in figure 6.

Figure 8 shows examples of 2-dimensional intervals that intersect with the given, including those that cover it, covered it, and touch it from the inside and or outside along any of the axes of their space. Such intervals for a given can be 2-dimensional intervals with non-zero edges and points and segments.

In the figure 9 in the p-axes of the basis of the 3-dimensional space shows an example 3-dimensional interval D (white), which is the composite relative to the 3-dimensional specified L (gray). This composition is that the condition of the interval D with respect to L is a combination of the conditions of their mutual arrangement of projections: 0-projection interval D is covered by 0 is the projection of L, 1-D projection intersects the left with the 1-projection of L, 2-D projection covers 2-projection Axis L. depicted on this figure are the basis correspond to the columns of the table below fuzzy ("blurry", approximate, assumed) data specified intervals: 0-axis corresponds to the column names, 1-axis - column ages, 2-axis - column scales. The figure 9 shows the parameters of the interval for the s D and L axes of their space. The parameters D correspond to the interval data of one particular row of this table, and the parameters of the L - data of a given 3-dimensional interval operator interval of the query.

The method is as follows.

All the examples dealt with in accordance with Rosa intervals (figa). In the first seven examples of the p-projection of n-dimensional interval {c0c1, ..., cn-1} are intervals cp=[apbp], where ap=3 and bp=7 for all p=0, ..., n-1, and all coordinates of the p-projection of the original intervals are integer values from 0 to 9. This occasion corresponds rose intervals shown on fig.1b.

Example 1.

Let many of the original n-dimensional intervals you want to select the data intervals that cover the specified, excluding those intervals, which are specified comes from within.

An example of this case when n=2 is depicted in figure 2.

To obtain data for all intervals of this request up to a spatial data structure S (for example, R-tree), the elements of the real data which are the original n-dimensional intervals, spatial integrated as a 2n-dimensional point whose coordinates match the coordinates of the projections of these intervals on an axis basis of their space. While the elements of the real data structure S are the Xia and simultaneously the n-dimensional intervals, and the corresponding 2n-dimensional points. These data elements are in the 2n-dimensional space structure of S simultaneously integrated as a 2n-dimensional spatial point and a n-dimensional intervals according to the criteria of intersections, attachments, coatings, touch and other of their mutual arrangement in their own n-dimensional space. Among the initial intervals of the structure's required, covering specified, always integrated in region R1 roses intervals (figa) for any R-axis space intervals. Under this determines the lens operator interval query. Definition lenses can be automated, for example, by means of software. In this example, the lens may be a 2n-dimensional interval {u0v0u1v1, ..., un-1vn-1}, p-projections which for all p=0, ..., n-1 are

up=(qp, apand vp=(bp, rp),

where for all such p values of qpchoose smaller than the smallest xpin the structure S, and the values of rp- larger than the largest ypin this structure (figa). For example, qp=-∞ and rp=+∞. With this choice of values of qpand rpp-the projection lens of the operator interval query are

up=(-∞, apand vp=(bp, +∞).

Then construct and maintain the physical nose is the body of information the operator query interval about the sample from the structure's data such points, covered this lens. This operator can have any form allowed by the syntax of the query language to the structure S. for Example:

Select from the structure's data such points that covers the lens is defined by the formula up=(-∞andpand vp=(bp, +∞) for all p=0, ..., n-1".

This is equivalent, in particular, the operator recorded on SQL - standard structured query language (Structured Query Language):

"SELECT ALL FROM 'S' WHERE x0<a0AND y0>b0AND ... AND xn-1<an-1AND In-1>bn-1".

Then using the constructed operator perform the query interval to the structure S. the Result of this query will be a lot of data 2n-dimensional point patterns S, at the same time which issued the corresponding desired intervals.

Any other interval queries perform similarly. Below are examples of

- conditions relative position of the required intervals relative to a set in their space,

formula of projection lenses, the corresponding operators of interval queries to the structure S,

- and these operators are written in SQL.

Example 2.

Let many of the original n-dimensional intervals you want to select the data intervals that are specified, excluding those intervals to the e deal given from within.

An example of this case when n=2 is shown in figure 3. For any R-axis space of intervals corresponds to the region R2 roses intervals (figa). Therefore, the p-projection lenses operator interval query for all p can be the following:

up=(ap, +∞) and vp=(-∞, bp).

Example SQL statement corresponding to this lens:

"SELECT ALL FROM 'S' WHERE x0>a0AND y0<b0AND ... AND xn-1>an-1AND In-1<bn-1".

Example 3.

Let many of the original n-dimensional intervals you want to select the data intervals that lie strictly to the left of a specified along the R-axis of their space.

An example of this case when n=2 is depicted in figure 4. For the axis p which corresponds with the region R3 roses intervals (figa), and for the other axes q≠p - region R2. Therefore, the p-projection lens can be:

up=(-∞, +∞) and vp=(-∞, ap),

and q is the projection (q≠p):

uq=[aq, +∞) and vq=(-∞, bq].

Example SQL statement corresponding to the lens (for 0<p<n-1):

"SELECT ALL FROM 'S' WHERE x0≥a0AND y0≤b0AND ... AND yp<apAND ... AND xn-1≥an-1AND In-1≤bn-1".

Example 4.

Let many of the original n-dimensional intervals you want to select the data intervals, p-projection which Le is a at right R-projection given, excluding those intervals, p-projections which are related to the p-projection specified.

An example of this case when n=2 is depicted in figure 5. For the p-axis corresponds to the region R4 roses intervals (figa). Therefore, the p-projection lens can be:

up=(bp, +∞) and vp=(-∞, +∞),

and the other projection is arbitrary.

Example SQL statement corresponding to this lens:

"SELECT ALL FROM 'S' WHERE xp>bp".

Example 5.

Let many of the original n-dimensional intervals you want to select the data intervals that intersect the left with a specified along the R-axis of their space, excluding those intervals, p-projection where p is the projection given comes from the inside and or outside.

An example of this case when n=2 is depicted in Fig.6. For the p-axis corresponds to the region R5 roses intervals (figa), and for the other axes q≠p - region R7. Therefore, the p-projection lens can be:

up=(-∞andpand vp=(apbp),

and q is the projection (q≠p):

uq=(-∞, bq] and vq=[aq, +∞).

Example SQL statement corresponding to the lens (for 0<R<n-1):

"SELECT ALL FROM 'S' WHERE x0<b0AND I0>a0AND ... AND xp<apAND Ip>apAND Ip<bpAND ... AND xn-1≤bn-1AND In-1≥an-1".

Example 6.

Let m is Oresta the original n-dimensional intervals you want to select the data interval, p-projections which overlap on the right with the p-projection of the specified, excluding those intervals, p-projection where p is the projection given comes from the inside and or outside.

An example of this case when n=2 is depicted in Fig.7. For the p-axis corresponds to the region R6 roses intervals (figa). Therefore, the p-projection lens can be:

up=(apbpand vp=(bp, +∞),

and the other projection is arbitrary.

Example SQL statement corresponding to this lens:

"SELECT ALL FROM 'S' WHERE xp>apAND xp<bpAND Ip>bp".

Example 7.

Let many of the original n-dimensional intervals you want to select the data intervals that intersect with the given, including on the inside and or outside.

An example of this case when n=2 is depicted in Fig. For any R-axis space of intervals corresponds to the region R7 roses intervals (figa). Therefore, the p-projection lens can be:

up=(-∞, bp] and vp=[ap, +∞).

Example SQL statement corresponding to this lens:

"SELECT ALL FROM 'S' WHERE x0≤b0AND I0>a0AND ... AND xn-1<bn-1AND In-1≥an-1".

Example 8.

Consider the specific case of the interval of the query.

Table.

T is D
Last nameAgeWeight
IvanovNULL75
NULL2570
Petrov35-4080
PA-Pb2090-100
-PP25-3580-120
Sidorov43NULL
.........

Suppose you have a table of fuzzy data (TND), each column of which is capable of storing as accurate data and fuzzy ("blurred", approximate, assumed)specified intervals. So this table is heterogeneous. In a similar table can be any number of records, for example, 500 million.

If the data is not known, but known range to which they belong, the interval in the table is recorded by a dash "-". For example, in the column "last Name" data "PA-Pb" means that exactly the surname of the person is unknown, but we know that it starts with "PA". The symbol "NULL" means that the data is unknown and considered as 1-dimensional interval of all possible values of the corresponding column.

Each line of the data under consideration table can be one-to-one mapped to a 3-dimensional interval protrans the VA, each axis of which corresponds to a column in the table: 0-axis corresponds to the column names, 1-axis - column ages, 2-axis - column weights.

Let this table you want to find all people whose surname obviously begin with "P", age or knowingly lies in the range of 30-40, including its ends, and the weight may or knowingly lies in the range of 90-105, including its ends. These requirements can be formulated differently. For example: "From a data set of 3-dimensional intervals you want to select the data interval:

- 0-projections, which cover 0-projection specified, including those 0-projections which are related to the 0-projection given inside the left and excluding those 0-projections which are related to the 0-projection given from the inside to the right,

- 1-projections which overlap with the 1-projection given,

- 2-projections which overlap with 2-projection given,

where the set is a 3-dimensional interval, 0-projection which is [P, P], 1-projection [30, 40], and 2-projection- [90, 105]."

p-the projection lens operator query interval can be such:

u0=[P, I] and v0=[A, P) (figa, R2)

u1=(0, 40] and v1=[30, +∞) (figa, R7),

u2=(0, 105] and v2=[90, +∞) (figa, R7),

and the SQL statement query interval, respectively, can be in the form:

"SELECT ALL FROM 'TND' WHERE x0≥P AND x0&#cha AND y 0≥A AND y0<P AND x1>0 AND x1≤40 AND y1≥30 AND x2>0 AND x2≤105 AND y2≥90".

This request corresponds to the fifth row of the table as illustrated in Fig.9.

Conditions of mutual location of the required intervals relative to a set can be combinations of any mutual arrangement of p-projections required intervals relative to the p-projection. In particular, combinations of any of the 13 basic types of mutual arrangement intervals classification Allen [3], or any of the 16 types - classification Prexy [4].

Since there are a finite number of types of mutual arrangement of 1-dimensional intervals in relation to the set, the parametric form of the operators of interval queries can be created for the dimension not exceeding n in advance. For example, in this syntax:

Select from the structure's data such points that covers the lens is defined by the formula up=(apbpand vp=(cpdpfor all p=0, ..., n-1".

Then constructing operators patented method will consist of substitution of input values of the selection conditions in the parameters andpbpcpdplenses of these forms.

In addition, the query operators may include additional terms. For example, you the op only data which correspond to the segments parallel to a particular axis space intervals, and with the additional option of color, which is green. Or, another example, when the initial conditions included additional conditions of mutual locations of required intervals relative to the second interval different from the first. These conditions correspond to a complex combination of interval queries regarding several lenses.

If the existing structure's undergoing some modifications, for example, due to insertions, deletions or changes its data, these modifications are a new process of its formation, different from the previous one. Therefore, mixed action, which includes various modifications of the structure S and interval queries to it, fall under the formula patentable way to store and fetch data n-dimensional intervals.

For the implementation of the proposed method as a structure S can be chosen arbitrary spatial data structure. If such data structure is the structure, oriented on the use of memory blocks (for example, R-tree), which merged its data elements n-dimensional intervals and which are put in correspondence the other intervals, the latter can be regarded as elements of the real data more vyshegoroda this structure. Therefore, the k-dimensional intervals any level according to the method of the invention can be represented as the corresponding 2k-dimensional point. In this case, the data of the data elements of the lower level data structures are "fractal nested in the data element data of its higher levels. This approach means multi-level use of the proposed method in one spatial block data structure. In addition, the splitting (split) point data blocks block spatial data structures can be implemented without overlapping. This means the structural closeness close spatial elements of the real data in such a structure that enables during sampling processes to access the required sets for the unique ways in minimal time. All this allows a block of spatial data structures are fully optimized to run at any interval queries.

The performance of interval queries according to the method of the present invention is that the data structure of the method, the requested elements of the real data is always integrated spatial points and integrated as intervals on various criteria of their mutual arrangement in their own space, which provides a high soon the th processes fetch the required data.

Thus, the present invention, unlike known, allows in the same spatial data structure to efficiently perform any interval queries with different conditions relative position of the required intervals for arbitrarily given. Describes how, in particular, opens the way to the creation of specific classes of effective storage and databases fuzzy data, optimized to run at any interval queries. This method can be effectively applied also in such spatial data models, such as:

- GIS

- mining systems and artificial intelligence,

- system monitoring and control dynamic objects

- system monitoring and management processes

systems for decision support,

- system analysis, planning and optimization

chronological, information and statistical system,

information systems, transportation networks,

- search and discover patterns.

In addition, as has been shown, any interval queries the sampling data of n-dimensional intervals in the described manner are reduced to the simplest samples of the 2n-dimensional point data covered some lens. Such requests are easily constructed and executed using the SQL language. So opican the second method can be used effectively without having to reorganize the data structures of the existing spatial models. This is possible by creating the appropriate drivers, which are the input n-dimensional intervals are converted into 2n-dimensional point and the input interval queries interval - interval in the query points, delivering these results to existing data structures.

Sources of information

1. A.Guttman. "R-trees: a dynamic index structure for spatial searching". Proceedings of the SIGMOD, p.47-57, Boston, Massachusetts, 1984, ISBN: 0-89791-128-8.

2. E.M.McCreight. "Efficient algorithms for enumerating intersecting intervals and rectangles", Technical Report CSL-80-9, 1980.

3. J.F.Allen "Maintaining Knowledge about Temporal Intervals". Communications of the ACM, Vol.26, No. 11, p.832-843, 1983.

4. .Freksa. "Temporal reasoning based on semi-intervals". Artificial intelligence 54, p.199-227, 1992.

The method of storage and retrieval of data of n-dimensional intervals, namely, that form such spatial structure S data, the elements of the real data which are the original n-dimensional intervals, spatial integrated as a 2n-dimensional point whose coordinates match the coordinates of the projections of these intervals on an axis basis of their space; determine the lens, which is the 2n-dimensional interval operator query interval, which is a statement about the sample data required n-dimensional intervals with the description given interval and terms of location relative to it required, and the lens is determined according to rose intervals, representing virtual dome the strong geometric chart areas such 2n-dimensional points in the x pand Iptheir space coordinates {xp,p} which are the corresponding coordinates of the p-projections corresponding to these points of n-dimensional intervals, where p is the projection of n-dimensional interval is its projection on the R-axis basis of its space; design and maintain the physical media operator query interval about the sample from the structure's data such points that are covered by this lens; perform using this operator query interval to the structure S, which will result in the data set 2n-dimensional points that are both data corresponding to the searched interval.



 

Same patents:

FIELD: information extraction systems engineering.

SUBSTANCE: system contains data storage, analysis mechanism of lower level, analysis mechanisms of higher level, indexer. In accordance to method, on basis of first set of rules, appropriate for first analysis mechanism, first key is connected, which is sent output second analysis mechanism, in which second key is generated based on second set of rules, first and second keys are connected to objects and keys and key values are indexed.

EFFECT: decreased time and computing resources spent on processing of large data arrays to extract required information.

2 cl, 5 dwg

FIELD: criminalistics and forensic examination.

SUBSTANCE: automated workplace consists of stand for researching electronic information carriers and personal computer. Stand featured in invention consists of controllable commutation device, ensuring possible mating of electronic information carrier and personal computer, and a source of controllable voltage. Controllable commutation device has m+n inputs/outputs and is represented by a set of m·n controlled rectifying cells, forming a commutation matrix of m×n dimensions, connecting 1÷m and (m+1)÷(m+n) inputs/outputs, while m=k+1, numbers k and n corresponding to maximal values of numbers of contacts of sockets of personal computer and electronic information carrier, respectively. Controllable rectifying cell is in turn represented by device, providing controllable capability of one-direction commutation with controllable transfer coefficient.

EFFECT: no limitations on types of electronic information carriers being connected, increased quality and speed of reviews of electronic information carriers, in other words, suggested automated workplace allows highly reliable fast access to information, stored in memory of electronic information carrier received for review, while quantitative and qualitative characteristics of electronic information carriers are not changed.

3 dwg

FIELD: informatics; computer technology.

SUBSTANCE: device can be used for soling tasks of composing dictionaries, manual as well as for creation of new databases. Device has entrance memory unit, processed words memory unit, unit for analyzing search, substitution memory unit, substitution unit, result storage unit, control unit.

EFFECT: widened functional abilities; improved reliability of operation; simplified algorithm of operation.

16 dwg

FIELD: electric communications, possible use for finding and quickly identifying information in multi-service digital data transfer networks with commutation of packets.

SUBSTANCE: device contains N generators of time intervals, N selection blocks, frequency divider, N temporary storage registers, N two-input AND elements, solving three-input element AND, N-input OR-NOT element, electronic key, mask storage register, n-input AND-NOT element, control block.

EFFECT: expanded area of possible use of device, increased speed of operation.

5 cl, 6 dwg

FIELD: syntactic analysis of bit stream, containing data having structure and content, matching certain format, possible use for generation of tree-like representation of said stream.

SUBSTANCE: proposed scheme is produced from XML, making it possible to describe encoding format in generalized form. Such scheme is used for performing syntactic analysis of stream of bits for production of document, which represents a stream of bits, which acts as a sample of aforementioned scheme, or for generation of stream of bits from document, representing the stream of bits.

EFFECT: increased resistance to interference.

7 cl, 3 dwg, 4 app

FIELD: statistical language models, used in speech recognition systems.

SUBSTANCE: word indexes of bigrams are stored in form of common base with characteristic shifting. In one variant of realization, memory volume required for serial storage of bigram word indexes is compared to volume of memory, required for storage of indexes of bigram words in form of common base with characteristic shifting. Then indexes of bigram words are stored for minimization of size of data file of language model.

EFFECT: decreased memory volume needed for storing data structure of language model.

7 cl, 4 dwg

FIELD: communication systems; method for storing geographical information in communications center.

SUBSTANCE: geographical data is received, authentication query is sent to geographical data authentication database, which communicated with communications center. Answer for authentication query is received, and geographical data is stored in informational storage, which is a database, which communicates with communications center.

EFFECT: increased accuracy of service rendering corresponding to location in communication network on the basis of previously stored location information.

10 cl, 5 dwg

FIELD: information search means, database structures.

SUBSTANCE: two data areas are created. At least one of them is resident area, and at least one other area is non-resident for searched data object query source. Control data objects array is created in resident area, and/or control data objects array with corresponding to each object initial hyperlinks as linked data. In nonresident area control associated information data objects array is created and/or control associated information data objects array with corresponding to each object associated data and/or at least one secondary hyperlink.

EFFECT: simplified logical and physical database organization with permanent renewal of control associated information data objects, and increased performance of system due to simplified functioning of informational network communication nodes.

37 cl, 1 tbl

FIELD: computer engineering, automated system for collecting and processing electronic polls data.

SUBSTANCE: system consists of input messages receiving unit, data from server database receiving unit, election committee identification unit, first and second units for candidates base addresses identification, polls results disclosure time cycles selection unit, polls results recording time cycles selection unit, input messages receiving time cycles selection unit, database read and write signals forming unit, final polls results data forming unit.

EFFECT: increased system performance due to database entries address localization using receiving messages identifiers and forming of progressive total of polls results in real-time.

9 dwg

FIELD: computer engineering, systems for supporting informational identity of geographically distributed databases of airline companies.

SUBSTANCE: systems consists of address identifiers unit, memory area identification unit, input message target selection unit, database entries base address selection unit, adder, read signal forming unit, six registers, database entries identification unit, entries quantity identification unit, counter, control signal forming unit, OR elements.

EFFECT: increased system performance due to database entries addresses localization using data sources and flights identifiers.

9 dwg

FIELD: data access technologies.

SUBSTANCE: method includes assignment of simplified network address, recording URL and converting numbers into storage system with net access, inputting assigned number into computer, transferring inputted number to storage system, converting number to URL, receiving page matching URL, and displaying it. Method for use in operation systems for message transfer include intercepting system level messages to certain objects and forming pseudonym messages during that. Systems realize said methods.

EFFECT: broader functional capabilities.

12 cl, 30 dwg

FIELD: computers.

SUBSTANCE: system has entries memory block, words memory block, control block, substitutions block, n blocks for searching and replacing.

EFFECT: broader functional capabilities.

17 dwg

FIELD: computers.

SUBSTANCE: system has nine registers, four address selectors, triggers, AND elements, OR elements and delay elements.

EFFECT: higher speed.

8 dwg

FIELD: computers.

SUBSTANCE: system has operation mode setting block, first and second blocks for selecting records addresses, block for forming addresses for reading records, data output block, first and second record codes comparison blocks, records quality comparison block, year intervals comparison block, records selection control block, register, adder and OR elements.

EFFECT: higher speed of operation.

10 dwg

FIELD: computers.

SUBSTANCE: system has memory for programs, including browser, display block, database for storing documents, addressing control block, while each document of base has at least one link with indicator of its unique number and indicator with address of program for control stored in addressing control block, system contains also, connected by data buses and control of other blocks of system, memory for links of couples of unique numbers of links and forming means for lists of unique numbers of documents links, which are interconnected.

EFFECT: higher efficiency.

2 cl, 1 dwg

FIELD: telecommunication networks.

SUBSTANCE: messages, sent by cell phones, are formed by means of printed and public-distributed classifier, wherein at least one category is made with possible detection of at least one identifier of individual mark of object, identifier is sent by sender via at least one message to computer server with software, which transfers such message into database record at server for its transfer to at least one receiver, or searches for such record in database at server in accordance to received message and transfers to sender of such message at least one found database record.

EFFECT: broader functional capabilities.

2 dwg

FIELD: web technologies.

SUBSTANCE: method for integration of printed business documents, requiring original signature, with electronic data concerning these documents and later extraction of data, inputted for forming documents, is characterized by steps for forcing end user or agent to input all necessary data for forming of required document, saving collected data in database, linking saved data to unique ID code and printing unique ID code on printed document during printing. Printed documents is signed by end user and sent together with supporting documentation. When document is received by business-client, business-client inputs ID code, which is then used for access to saved data, and updates private database of business-client with all data, used for creation of original documents.

EFFECT: higher efficiency.

2 cl, 7 dwg

FIELD: computer science.

SUBSTANCE: device has string memory block, comparator, memory block for words and substitutes, block for analysis and forming of displacement results, block for storing string address, control block.

EFFECT: broader functional capabilities, higher reliability.

10 dwg

FIELD: data bases.

SUBSTANCE: method includes presenting operations at all levels of company in form typical product life cycle tree, wherein existing objective functional-technological connections of each manufacture stage are decomposed, and forming information system in form of pertinent-relevant complex information system and search, for which typical structure-information modules of information system are formed, system objective information requirements of data consumers, being a result of decompositions by levels of operations and problems, are determined as precisely as possible, data base of found documents in form of files is formed of key nodes with set of elementary data block for each system information requirement and files of information system modules, starting from lower levels of current stage and then upwards, while each data block has a list of pertinent documents ordered by determined information requirements.

EFFECT: higher search efficiency.

13 cl, 11 dwg

FIELD: computer science.

SUBSTANCE: system has first, second, third, fourth and fifth registers, first and second memory blocks, first, second and third decoders, triggers, elements AND, OR and delay elements.

EFFECT: higher speed of operation.

1 dwg

Up!