Assigning actionable attributes to data describing personal identity

FIELD: physics, computer engineering.

SUBSTANCE: invention relates to database search means. The method includes receiving a request to initiate a search for data for a specific individual; determining, based on the request, a strategy to search a reference database; searching the reference database, in accordance with the strategy, for a match to the request and outputting the match; extracting, from said request, an attribute that is relevant to the search; assigning a weight to the attribute, thus yielding a weighted attribute, wherein said weight is indicative of the usefulness of the attribute in finding a match to the request; establishing a function, based on said weighted attribute; retrieving from the reference database, candidates having attribute values that indicate likely matches to the request, based on said function; determining a best candidate from said candidates and returning said best candidate as the match, wherein the request includes a request value for the attribute; modifying the weight depending on the number of records in the reference database that have the request value for the attribute.

EFFECT: improved match of the result with the request data.

9 cl, 2 dwg, 8 tbl

 

Technical field

[0001] The present invention relates to database searching and, in particular, to searching a database for a record that best matches an identity-related request, which may include both expected and unexpected data attributes, and to retrieving that best-matching record together with actionable feedback that explains the match event and the match result.

Background

[0002] The approaches described in this section are approaches that could be pursued, but not necessarily approaches that have been previously conceived or pursued. Therefore, unless otherwise indicated, the approaches described in this section may not be prior art to the claims in this application and are not admitted to be prior art by inclusion in this section.

[0003] Efficient database access and searching capabilities are important for the effective use of data maintained in reference databases for matching purposes. Critical to this objective are the abilities to efficiently retrieve a match result, i.e., the result of matching reference data to a request that includes personal pointers assumed to be part of the request as well as previously unknown pointers; to identify and select match results in an efficient and economical manner; and to provide actionable feedback that can be used to make business decisions about the use of the match results, for example for ongoing data management.

[0004] With respect to the identification of individuals, existing technology considers either a specific and finite number of data fields, such as names, physical and electronic addresses, titles and aliases, or a set of indeterminate data components that may or may not include information associated with an individual. This existing technology is generally based on character-by-character heuristic or mathematical comparison, which yields a precision estimate based on the number of matching characters or other basic correlation information, allowing for possible spelling variations such as different spellings of particular words and the use of hyphens, capital letters, punctuation, well-known abbreviations and synonyms. Furthermore, the existing technology assumes a specific structure for the request data and does not allow the use of a finite, but temporally unbounded, variety of valuable predictive data elements or other derived pointers associated with an individual that have been verified and synthesized or aggregated into a database of individuals for use in the matching process.

Summary of the invention

[0005] There is provided a method that includes (a) receiving a request to initiate a search for data for a specific individual, (b) determining, based on the request, a strategy to search a reference database, (c) searching the reference database, in accordance with the strategy, for a match to the request, and (d) outputting the match. The method may also output feedback associated with the match that expresses the inferred quality of the match, which an end user can use to determine the degree to which the matched entity meets that end user's quality criteria. There is also provided a system that performs the method, and a storage medium that contains instructions that control a processor to perform the method.

[0006] Requests are processed to recognize and synthesize request pointers, including both expected and unexpected data components, for the evaluation and selection of candidates. Reference data concerning individuals, maintained in a database, is accessed, evaluated and used to identify matches to the request. The requesting party or requesting system is provided with a match result and actionable data, including confidence indicators that describe the relative strength of the match result, and attributes that indicate feedback on the data and the alternative pointers that were used to arrive at the match.

Brief description of the drawings

[0007] Fig. 1 is a functional block diagram of a method that assigns actionable attributes to data describing personal identity.

[0008] Fig.2 is a block diagram of a system for employing the present invention.

Description of the invention

[0009] Pointers are information associated with an identity. Pointers include recognized request attributes, i.e., data components that are either expected components of a request, such as an individual's first name, address and date of birth, or that are specifically identified in the request, for example as metadata using column headers in a file or specific data-entry fields in an online application, and that can be used with other data to uniquely identify an individual. Pointers can also include attributes that have not been encountered before, and alternative ways of expressing or evaluating data values, such as alternative spellings of names.

[0010] Feedback is information about a match that expresses the inferred quality of the match event in terms of a confidence level given by the degree of similarity between the request and a match candidate, a relative grade for each data field used in the match event, and an indication of the data source that was used to match the request. An end user can use the feedback to determine the degree to which the matched entity meets the end user's quality criteria, and to drive different actions and management interventions based on that feedback.

[0011] Fig. 1 is a functional block diagram of a method 100 that assigns actionable attributes to data describing personal identity. In brief, method 100 receives a request 103 and executes processes 115, 120, 125, 130 and 135 to match data from request 103 against data in reference database 110, and thus yields a result 160.

[0012] Method 100 uses processing rules 104, an attribute table 105 and a frequency table 109, and in intermediate steps generates data 140, attributes 145, a function 150 and a best candidate 155.

[0013] Each of processes 115, 120, 125, 130 and 135 is described herein with respect to its general operation. Each of processes 115, 120, 125, 130 and 135 may be configured as a stand-alone process or as a hierarchy of subordinate processes.

[0014] Request 103 is a request that initiates a search for information about a specific individual. The search is performed on the basis of pointers included in request 103. Request 103 therefore includes a plurality of data elements that contain specific information about the individual in data fields representing all or a subset of the predefined recognized attributes defined in processing rules 104 and attribute table 105, and may also include additional, virtually unlimited pointers concerning the individual. Request 103 may be provided to method 100 by a human user or by an automated process. For example, request 103 may originate from an individual request processed through online data-entry screens, or from a file submitted for batch processing. Request 103 includes data that method 100 reformats as data 140 and that method 100 will use to uniquely identify an individual. Data 140 may include, for example, a name, an address, a date of birth, a social security number and other forms of identification.

[0015] Reference database 110 is a database of identity information containing highly qualified personal and professional information, i.e., known attributes, about each individual. Processes (not shown) are used to qualify data that is then entered into reference database 110 and can subsequently be used for matching. Through a set of additional processes (not shown), reference database 110 can be updated to include additional information about individuals already present in reference database 110, and to include information about additional individuals.

[0016] Processing rules 104 include automated and repeatable business rules and metadata rules (hereinafter "rules") based on standardization and normalization processes, which include semantic and numeric disambiguation logic for interpreting request values, such as various name orderings (first name/middle name/last name, first name/last name/middle name, and other permutations of all or a subset of the name attributes), addressing (separate address components or a mixed address line) and different date formats. Metadata rules define information about each data element, for example (a) whether it is alphabetic, i.e., consisting of letters of the alphabet, numeric, or alphanumeric, i.e., consisting of letters and numerals, (b) its permitted size and (c) its formatting. Business rules define actions performed on the basis of the value of one or more data elements, such as a condition that must be met before a subsequent operation or calculation can be performed.

[0017] An example of a standardization process in processing rules 104 is replacing different abbreviated spellings of the word "street" in an address, such as "St." and "Strt", with a common, consistent value such as "street". An example of a normalization process in processing rules 104 is replacing common words or abbreviations, such as "manufacturing" and "mnfctring", with the abbreviation "mnf" as a common term to facilitate matching. An example of semantic disambiguation logic is separating a street address into separate fields for house number and street name.
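The rules in the preceding paragraph can be illustrated with a minimal Python sketch. The lookup tables, function names and the regular expression are assumptions made for illustration only; the description does not prescribe how processing rules 104 are stored or executed.

```python
import re

# Hypothetical lookup tables; the variants are the ones named in the text.
STREET_VARIANTS = {"st.": "street", "strt": "street"}
NORMALIZED_TERMS = {"manufacturing": "mnf", "mnfctring": "mnf"}


def standardize_words(text: str) -> str:
    """Replace known variants token by token, leaving other tokens unchanged."""
    out = []
    for token in text.split():
        key = token.lower()
        out.append(STREET_VARIANTS.get(key) or NORMALIZED_TERMS.get(key) or token)
    return " ".join(out)


def split_street_address(address: str) -> tuple[str, str]:
    """Separate a leading house number from the street name."""
    match = re.match(r"\s*(\d+\w*)\s+(.*\S)\s*$", address)
    if match is None:
        # No leading number found: keep the whole value as the street name.
        return "", address.strip()
    return match.group(1), match.group(2)
```

A real rule set would have to decide, for example, whether a bare "St" means "street" or "saint", which is why only the variants quoted in the text are mapped here.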

[0018] Attribute table 105 is a table of recognized attributes, i.e., data fields that can be associated with data capable of identifying an individual. Attribute table 105 also includes metadata that defines the characteristics of the recognized attributes. Metadata is information about data, i.e., it describes characteristics of the data. For example, attribute table 105 may contain the attribute "first name" together with metadata indicating that a first name must be a string of alphabetic characters. Attribute table 105 can also be updated with data from data 140 to include attributes that were not previously recognized and for which predictive weights or other information can be defined. The values in attribute table 105 are monitored and adjusted as reference database 110 is updated.
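One possible in-memory shape for such a table is sketched below in Python. The class name, field names and the `conforms` helper are hypothetical; the three weights are taken from the Table 4 example later in the description.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AttributeSpec:
    kind: str      # metadata: "alphabetic", "numeric" or "alphanumeric"
    weight: float  # static weight W


# Assumed layout: attribute name -> metadata and weight.
ATTRIBUTE_TABLE = {
    "first_name": AttributeSpec("alphabetic", 0.25),
    "postal_code": AttributeSpec("alphanumeric", 0.75),
    "dob_mm": AttributeSpec("numeric", 0.3),
}


def conforms(attribute: str, value: str) -> bool:
    """Check a value against the metadata recorded for its attribute."""
    spec = ATTRIBUTE_TABLE[attribute]
    if spec.kind == "alphabetic":
        return value.isalpha()
    if spec.kind == "numeric":
        return value.isdigit()
    return value.isalnum()
```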

[0019] Frequency table 109 gives the number of records in reference database 110 that have particular values for particular attributes. That is, frequency table 109 is generated from reference database 110 to identify the frequency (F) of occurrence of particular data values in reference database 110. For example, reference database 110 may have 5,647 occurrences of "Jon" as a first name, 893 occurrences of "Smythe" as a last name and 197 occurrences of "Jon Smythe" as a combined first/last name. Accordingly, frequency table 109 would indicate that (a) the first name "Jon" has a frequency of 5,647, (b) the last name "Smythe" has a frequency of 893 and (c) the first/last name combination "Jon Smythe" has a frequency of 197. Frequency table 109 is updated whenever records in reference database 110 are updated.
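As a rough illustration, a frequency table of this kind could be derived by counting (attribute, value) pairs. The sketch below assumes records are plain dictionaries; the three sample records are invented for the example and are not data from the description.

```python
from collections import Counter


def build_frequency_table(records, attributes):
    """Count how many records carry each (attribute, value) pair."""
    table = Counter()
    for record in records:
        for attr in attributes:
            value = record.get(attr)
            if value:  # skip missing or empty values
                table[(attr, value)] += 1
    return table


# Invented sample records standing in for reference database 110.
records = [
    {"first_name": "Jon", "last_name": "Smythe"},
    {"first_name": "Jon", "last_name": "Smith"},
    {"first_name": "John", "last_name": "Smythe"},
]
freq = build_frequency_table(records, ["first_name", "last_name"])
```

Because `Counter` returns zero for unseen keys, a value that never occurs in the records simply has frequency 0.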

[0020] Method 100 begins with process 115.

[0021] Process 115 receives request 103 and structures the pointers from request 103 into a common format, i.e., data 140. Table 1 below shows an exemplary representation of data 140. In Table 1, data 140 is shown as an exemplary set of data elements presented in a loosely structured format of expected request values, such as name, address, city, state, postal code and telephone number.

Table 1
Exemplary representation of data 140

Action:
    Process 115 received request 103, which included pointers expressed as individual data elements or data fields providing specific information about an identity, and produced data 140.

Exemplary representation of data 140:
    Jon Smythe, President
    350 Sixth Ave Suite 7712
    Manhattan, NY 10118A
    (917) 555-5555
    01271960
    123-456-7890
    mailto:jsmith@abc.com
    http://www.abcllc.com

[0022] From process 115, method 100 proceeds to process 120.

[0023] Process 120 analyzes data 140 to identify the specific data fields that correspond to attributes in attribute table 105, in order to advance the identification of matches in reference database 110 using one or more of those data fields. In this regard, process 120 extracts from data 140 the attributes that are relevant to matching, thus yielding attributes 145.

[0024] Process 120 operates in accordance with processing rules 104 to cleanse, parse and standardize all components of the input request values expressed in data 140.

[0025] Cleansing involves removing extraneous values, such as punctuation and other characters that carry no information, for example the dashes in a telephone number or the slashes separating the components of a date. For example, cleansing a date value presented as 01/27/60 yields 012760.

[0026] Parsing involves breaking down data 140 to improve the ability to identify matches to request 103. It may include decomposing an individual request value into multiple data elements, for example splitting a date of birth 012760, presented in MMDDYY format, into separate elements for month (MM (01)), day (DD (27)) and year (YY (60)). Parsing may also include merging separate elements, for example first name (John), middle name or initial (Q) and last name (Public), into a single element such as a name (JohnQPublic).

[0027] Standardization involves associating alternative values with data 140 to improve the ability to identify matches. It may include associating a two-character value (NJ) with several request values that represent the name of a state (New Jersey; N Jersey; New Jrsy).
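The cleansing, parsing and standardization steps described above can be sketched in Python as follows. The function names and the state lookup table are assumptions for illustration; the sample values are the ones used in the text.

```python
import re

# Hypothetical alias table built from the variants quoted in the text.
STATE_CODES = {"new jersey": "NJ", "n jersey": "NJ", "new jrsy": "NJ"}


def cleanse(value: str) -> str:
    """Cleansing: drop punctuation, spaces and other non-alphanumeric characters."""
    return re.sub(r"[^0-9A-Za-z]", "", value)


def parse_dob(mmddyy: str) -> dict:
    """Parsing: split a cleansed MMDDYY date into month, day and year."""
    if len(mmddyy) != 6 or not mmddyy.isdigit():
        raise ValueError(f"expected six digits in MMDDYY form, got {mmddyy!r}")
    return {"MM": mmddyy[0:2], "DD": mmddyy[2:4], "YY": mmddyy[4:6]}


def merge_name(first: str, middle: str, last: str) -> str:
    """Parsing: merge separate name elements into a single element."""
    return f"{first}{middle}{last}"


def standardize_state(value: str) -> str:
    """Standardization: map known spellings of a state to its two-character code."""
    return STATE_CODES.get(value.strip().lower(), value)
```

For instance, `parse_dob(cleanse("01/27/60"))` yields the month, day and year elements of the example date.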

[0028] Process 120 also uses processing rules 104 to analyze and retain information from data 140 that has not been encountered before, in order to generate new rules that will be stored in processing rules 104 for use in subsequent executions of process 120. New rules may be derived automatically by analogy with existing rules. Thus, pointers that are included in data 140 but not defined in attribute table 105, i.e., additional pointers, are retained for later use by processes 120 and 125, and may be used by processes 130 and 135 when processing candidates identified in reference database 110. Method 100 includes the capability to retain these additional pointers automatically in order to develop and define attributes that will be listed in attribute table 105, and to develop corresponding rules that will be added to processing rules 104.

[0029] Thus, process 120 analyzes data 140, and if process 120 finds that processing rules 104 contain no rule for particular data, that data is stored in processing rules 104 and flagged for processing and analysis. For example, if request 103 contains an e-mail address, and the e-mail address is a previously unrecognized value with no corresponding rule in processing rules 104, then processing rules 104 can be updated by an update process (not shown) to retain e-mail addresses as new pointers that can be recognized as an attribute.
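A minimal sketch of how previously unrecognized values might be classified by pattern is shown below. The pattern names and regular expressions are assumptions, deliberately loose, and are not taken from the description.

```python
import re

# Hypothetical format patterns for values that have no rule yet.
PATTERNS = {
    "email_address": re.compile(r"^(mailto:)?[^@\s]+@[^@\s]+\.[^@\s]+$"),
    "url": re.compile(r"^https?://\S+$"),
}


def classify_flexible_pointer(value: str):
    """Return the name of the first matching pattern, or None if nothing matches."""
    for name, pattern in PATTERNS.items():
        if pattern.match(value):
            return name
    return None
```

Values that return `None` would be the ones flagged for later analysis rather than discarded.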

[0030] Table 2 below shows an exemplary representation of processing rules 104, and Table 3 shows an exemplary representation of attributes 145. Examples of processing rules include (i) splitting the name field of data 140 into separate first-name and last-name fields, (ii) splitting the address field of data 140 into separate house-number and street-name fields, and (iii) splitting the date-of-birth field of data 140 into separate month, day and year fields. Flexible pointers are data from data 140 that were not previously identified as part of a request but that must be retained by processing rules 104 for future match events. They include data that can be classified on the basis of patterns, as well as free-form data.

Table 2
Exemplary representation of processing rules 104

Metadata rules (examples)

Rule:
    Name: split the full name in the request into separate first-name and last-name values and remove extraneous values.
Exemplary result:
    First name: Jon
    Last name: Smythe

Rule:
    Address: split the address in the request into individual values, standardize the value for "city", and cleanse the postal code.
Exemplary result:
    House number: 350
    Street name: Sixth Ave
    Alternative street name: 6th (determined by alternative-value logic in processing rules 104)
    Address2: Suite 7712
    Alternative address2: 7th floor (determined by alternative-value logic in processing rules 104)
    City: New York (instead of "Manhattan", which is not a city)
    State: NY
    Postal code: 10118 ("A" removed as extraneous data)

Business rules (example)

Rule:
    Date of birth (DOB): split the full date of birth into separate month, day and year values and standardize them for processing.
Exemplary result:
    DOB/MM: 01 (instead of "Jan")
    DOB/DD: 27
    DOB/YY: 60

Flexible pointers

Rule:
    Values inferred from their format.
Exemplary result:
    E-mail address: mailto:jsmith@abc.com

Table 3
Exemplary representation of attributes 145

Attribute | Value

Recognized attributes
First name | Jon
Last name | Smythe
House number | 350
Street name | Sixth Ave
Address2 | Suite 7712
City | New York
State | NY
Postal code | 10118
Phone number | (917) 555-5555
DOB/MM | 01
DOB/DD | 27
DOB/YY | 60
Mobile phone number | 1234567890

Flexible pointers
E-mail address | mailto:jsmith@abc.com
Position | President
Company uniform resource locator (URL) | http://www.abcllc.com

[0031] For example, according to Table 2, processing rules 104 specify that a name is to be split into a separate first name and last name. Thus, "Jon Smythe" is decomposed into the first name "Jon" and the last name "Smythe" and stored as shown in Table 3.

[0032] Method 100 moves from process 120 to process 125.

[0033] Process 125 references attribute table 105 to further refine attributes 145 for the development of function 150. For each attribute in attributes 145, process 125 assigns a weight based on the attribute's relative value in identifying an individual, thus yielding a weighted attribute, where the weight indicates the usefulness of the attribute in finding a match to data 140. This determination includes, for example, (i) static weighting as defined in attribute table 105, e.g., a name has a higher weight than an address; (ii) weighting relative to other populated fields, as defined in attribute table 105, e.g., an employment start date is more valuable when it is at least 18 years later than the date of birth; and (iii) weighting based on the actual value of a data field, as defined in attribute table 105, e.g., an unusual first name such as Erasmus has a higher weight than a more common first name such as John. This analysis also considers alternative values for data fields in attributes 145, such as acronyms and alternative spellings (for example, Jon and Jonathan as a first name). In addition to the static weighting, attribute table 105 assigns adjusted weights to attributes based on the absence or presence of data values for other attributes and on an evaluation of predictiveness. For example, the weight of a first name is lower when no last name is provided, and the combination of house number and street name carries more weight than the two fields separately.

[0034] Process 125 determines an optimal strategy for searching reference database 110 and represents this strategy as function 150, denoted here as f(x). In particular, process 125 obtains a weight (W) from attribute table 105 and a frequency (F) from frequency table 109, and computes a predictive weight (K), where K = W × F, for each attribute (x), thus K(x), where K(x) is the predictive weight of attribute x. Function 150 may compute multiple values of f(x) based on different combinations of attributes, such as first name and last name, or first/last name and DOB, and process 125 uses the results of these computations to determine the optimal search strategy. Function 150 has the following general form:

f(x) = K1<field1> + K2<field2> + K3<field3> + ... + KN<fieldN>,

where K is computed for each component of attributes 145.

[0035] Table 4 below shows an exemplary representation of attribute table 105, and Table 5 shows an exemplary representation of frequency table 109.

Table 4
Exemplary representation of attribute table 105

Attribute | Metadata | Weight (W)

Recognized attributes
First name | alphabetic | 0.25
Last name | alphabetic | 0.5
House number | alphanumeric | 0.4
Street name | alphabetic | 0.8
Address2 | alphanumeric | 0.25
City | alphabetic | 0.9
State | alphabetic | 0.9
Postal code | alphanumeric | 0.75
Phone number | numeric | 0.5
DOB/MM | numeric | 0.3
DOB/DD | numeric | 0.2
DOB/YY | numeric | 0.5
Mobile phone number | numeric | 1

Flexible pointers
E-mail address | alphanumeric | 1
Position | alphanumeric | 0.2
Company URL | alphanumeric | 0.7

Attribute combinations
First/last name | alphabetic | 0.9
DOB/MMDDYY | numeric | 0.7

[0036] In the example shown in Table 4, attribute table 105 includes the attribute "first name", metadata indicating that a first name must be a string of alphabetic characters, and, for first name, a weight (W) of 0.25. The weights (W) reflect the relative influence of attributes in request 103, as expressed in data 140, in identifying a match in reference database 110. In the example of Table 4, an attribute with W = 1 is considered a better predictor of a match than an attribute with W less than 1. For example, if request 103 includes a personal mobile telephone number, an attribute whose value can be considered unique, the personal mobile telephone number will influence the match event more than a first name, which is likely to have a more common value.

Table 5
Exemplary representation of frequency table 109

Attribute | Frequency (F)
First name = Jon | 5,647
Last name = Smythe | 893
First/last name = Jon Smythe | 197
DOB = 012760 | 211
Mobile phone number = 1234567890 | 1

[0037] In process 125, the determination of predictive weights may take into account relationships between attributes and compute a modified weight based on such a relationship. For example, although first name and last name each have their own predictive weight, the combination of first and last name may be more or less predictive in identifying the appropriate match in reference database 110. For example, the combined first and last name "Jon Smith" may occur more frequently in reference database 110, as reflected in frequency table 109, than "Erasmus Hoffert". The frequency (F) of the combined name value according to frequency table 109 thus indicates a higher or lower predictive weight.

[0038] As noted above, for each attribute (x) process 125 obtains the weight (W) from attribute table 105 and the frequency (F) from frequency table 109, and computes the predictive weight (K), where K = W × F. Multiple predictive weights may be computed based on different combinations of attributes. For example, using the sample data in Table 4 and Table 5 for a first computation of f(x):

[0039] for first name = Jon, K1 = 0.25 × 5647 = 1411.75

[0040] for last name = Smythe, K2 = 0.5 × 893 = 446.5

[0041] Accordingly, f(x), i.e., function 150, for first name and last name is expressed as:

[0042] f(x) = 1411.75 <first name "Jon"> + 446.5 <last name "Smythe">

[0043] Using the sample data in Table 4 and Table 5 for a second computation of f(x):

[0044] for first/last name = Jon Smythe, K1 = 0.9 × 197 = 177.3

[0045] for DOB/MMDDYY = 012760, K2 = 0.7 × 211 = 147.7

[0046] Accordingly, f(x), i.e., function 150, for first/last name and DOB/MMDDYY is expressed as:

[0047] f(x) = 177.3 <first/last name "Jon Smythe"> + 147.7 <DOB/MMDDYY "012760">
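The two computations above can be reproduced with a short Python sketch. The dictionary layout and function names are assumptions; the weights and frequencies are the sample values from Table 4 and Table 5. `Decimal` is used so the products come out exactly as written in the text.

```python
from decimal import Decimal

# Weights (W) from Table 4 and frequencies (F) from Table 5.
WEIGHTS = {
    "first_name": Decimal("0.25"),
    "last_name": Decimal("0.5"),
    "first_last_name": Decimal("0.9"),
    "dob_mmddyy": Decimal("0.7"),
}
FREQUENCIES = {
    ("first_name", "Jon"): 5647,
    ("last_name", "Smythe"): 893,
    ("first_last_name", "Jon Smythe"): 197,
    ("dob_mmddyy", "012760"): 211,
}


def predictive_weight(attribute: str, value: str) -> Decimal:
    """K = W x F for one attribute value."""
    return WEIGHTS[attribute] * FREQUENCIES[(attribute, value)]


def build_function(pairs):
    """Represent f(x) as a weighted list of (attribute, value, K) terms."""
    return [(attr, value, predictive_weight(attr, value)) for attr, value in pairs]


first = build_function([("first_name", "Jon"), ("last_name", "Smythe")])
second = build_function([("first_last_name", "Jon Smythe"), ("dob_mmddyy", "012760")])
```

Keeping f(x) as a list of terms rather than a single number is a design choice of this sketch; it leaves open how a search strategy combines the terms.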

[0048] In general, for a given attribute, the weight (W) increases if the attribute is a good predictor of a match, whereas an increase in the frequency (F) indicates that the attribute is a poorer predictor of a match. Consider, for example, a search for an individual who has a common first name, e.g., "John", but a unique mobile telephone number, e.g., "1234567890", where frequency table 109 gives F = 10,000 for the first name "John" and F = 1 for the mobile telephone number "1234567890". Based on Table 4, the predictive weights (K), where K = W × F, for these attributes are K<first name "John"> = 0.25 × 10000 = 2500 and K<mobile telephone number "1234567890"> = 1 × 1 = 1. Thus, it might appear that the first name "John" has a higher predictive weight than the mobile telephone number "1234567890". However, depending on the logic actually implemented, a lower f(x) may be more predictive than a higher f(x).
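One way to read the last remark is that a search could try attributes in ascending order of K. The sketch below assumes exactly that interpretation, which the description leaves open; the function name is hypothetical.

```python
# K = W x F for the two attributes discussed above.
predictive_weights = {
    ("first_name", "John"): 0.25 * 10000,
    ("mobile_phone", "1234567890"): 1 * 1,
}


def order_for_search(weights):
    """Order attributes from lowest to highest K (assumption: lower K is more selective)."""
    return sorted(weights, key=weights.get)


search_order = order_for_search(predictive_weights)
```

Under this assumption the unique mobile telephone number is tried before the common first name.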

[0049] Although function 150 is represented in the present example as a sum of products, function 150 need not be a sum or an arithmetic equation. In general, function 150 is a weighted list of attributes, where the weight for a particular attribute or combination of attributes indicates the predictiveness and, consequently, the importance of that attribute or combination of attributes in identifying an appropriate match to a record in reference database 110.

[0050] Method 100 proceeds from process 125 to process 130.

[0051] Process 130 searches reference database 110 in accordance with function 150, i.e., the strategy determined by process 125, and produces best candidate 155. In particular, process 130 retrieves records from reference database 110 in accordance with function 150. Process 130 then compares attributes of these records with data 140 and, based on the comparison, selects from reference database 110 a set of candidates that are most likely to match data 140. Thereafter, process 130 evaluates the set of candidates by comparing the value of each attribute retrieved from reference database 110 with the value of the same attribute in data 140 to ultimately determine the best match candidate, i.e., best candidate 155.

[0052] Table 6 below shows an exemplary set of candidates from reference database 110.

Table 6
Exemplary set of candidates from reference database 110

Field | Value

Record 1
First name | Jonathan
Last name | Smith
House number | 350
Street name | 6th Ave
Address2 | (void)
City | New York
State | NY
Postal code | 10118
Phone number | (void)
DOB/MM | (void)
DOB/DD | (void)
DOB/YY | 50
Mobile phone number | 1234567890

Record 2
First name | John
Last name | Smarth
House number | 340
Street name | 5th Ave
Address2 | 7th floor
City | New York
State | NY
Postal code | 10118
Phone number | (917) 555-5000
DOB/MM | (void)
DOB/DD | (void)
DOB/YY | (void)
Mobile phone number | (void)

[0053] Best candidate 155 is the record in the set of candidates that has the greatest similarity to data 140, as determined by the candidate-evaluation techniques applied in process 130. Such techniques include consideration of the source of the data contained in reference database 110 and quality assessments of that data (where some sources are deemed more relevant and of higher quality than others).

[0054] For example, for record 1 in Table 6, process 130 compares the data value for the attribute "last name" in data 140 ("Smythe") with that in reference database 110 ("Smith") and identifies a high degree of similarity, and likewise for the attribute "street name", which has the value "Sixth Ave" in data 140 and "6th Ave" in reference database 110. For record 2 in Table 6, process 130 compares the data value for the attribute "last name" in data 140 ("Smythe") with that in reference database 110 ("Smarth") and identifies a lower degree of similarity, and for the attribute "street name" process 130 finds no similarity between "Sixth Ave" in data 140 and "5th Ave" in reference database 110.

[0055] Table 7 below shows an exemplary representation of best candidate 155.
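The last-name comparison can be approximated with a generic string-similarity measure. The sketch below uses `difflib.SequenceMatcher` from the Python standard library as a stand-in; the description does not name a comparison algorithm, so this choice and the variable names are assumptions.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Case-insensitive similarity ratio in the range 0..1."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


request_last_name = "Smythe"              # from data 140
candidates = {1: "Smith", 2: "Smarth"}    # last names of records 1 and 2 in Table 6

scores = {rec: similarity(request_last_name, name) for rec, name in candidates.items()}
best_record = max(scores, key=scores.get)
```

Character-level comparison alone would not separate "Sixth Ave" from "6th Ave" and "5th Ave" in the way the text describes; the street-name comparison would rely on alternative values such as those produced by the processing rules.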

Table 7
Exemplary representation of best candidate 155

Attribute | Value
First name | Jonathan
Last name | Smith
House number | 350
Street name | 6th Ave
Address2 | (void)
City | New York
State | NY
Postal code | 10118
Phone number | (void)
DOB/MM | (void)
DOB/DD | (void)
DOB/YY | 50
Mobile phone number | 1234567890

[0056] the Method 100 proceeds from the process 130 to process 135.

[0057] the Process 135 160 outputs the result, which includes the best candidate 155A and feedback 165. The best candidate 155A is a copy of the best candidate 155. Feedback 165 is information concerning the degree of similarity between the data 140, and the best candidate 155A, which is applicable in practice, i.e. can be used by the end user to make business decisions.

[0058] the feedback included 165 160 with the result to indicate the quality of the best candidate 155A, for example the level of confidence that the best candidate 155A provides an adequate match to the query 103. Feedback 165 may also include the relative degree of similarity, expressed relative correlation between each field in the data 140 and the components of each best candidate 155A. This feedback is expressed in three components: (1) code of reliability that indicates the relative degree of confidence in the similarity between data 140 and candidates in the database 110 of reference data; (2) the string class matches, which indicates the degree of similarity between attributes of data 140 and candidates in the database 110 reference data; and (3) p�file data matches which specifies the type of data in the database 110 reference data that were used in the event of coincidence. These feedback components the end user can use to define business rules prescribing the use and consumption of coincidences identity to this end the user can make business decisions regarding the event of coincidence, on the basis of the extent to which a matching object meets the quality criteria that the end user, and for current control interventions. These feedback patterns can be flexible, reflecting the scope and flexible start pointers in the query 103. The user may be provided additional opportunities view and review data for queries that may not produce a match.

[0059] Table 8 below shows an example of feedback 165.

Table 8
Example of feedback 165
Confidence code: 8
Match class string:
First name: A
Surname: A
House number of address: B
Street name of address: A
Address2: Z
City: A
Postal code: B
State: A
Phone number: Z
Mobile phone number: A
Match data profile:
First name: 03
Surname: 03
House number of address: 00
Street name of address: 00
Address2: 99
City: 00
Postal code: 00
State: 00
Phone number: 98
DOB: 98
Mobile phone number: 00

[0060] The match class string of the feedback can be defined using an encoding structure as follows: "A" indicates that the data for the match candidate from reference database 110 and the data in data 140 are considered the same (e.g., Jon and John); "B" indicates that there is some similarity between data 140 and the record from reference database 110 (e.g., Jon and Jhonny); "F" indicates that the data for the match candidate in reference database 110 and the data in data 140 are considered dissimilar (e.g., Jon and Jim); "Z" indicates that no value is present for a particular data field in either data 140 or reference database 110.
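
The grading scheme of paragraph [0060] can be illustrated with a short Python sketch. The similarity measure (the standard-library difflib.SequenceMatcher) and the two thresholds are assumptions, chosen only so that the three example pairs above receive the grades the text assigns them; the source does not state how similarity is computed. The name match_grade is hypothetical.

```python
from difflib import SequenceMatcher
from typing import Optional

# Hypothetical thresholds; the source gives only the meaning of each grade.
SAME_THRESHOLD = 0.85
SIMILAR_THRESHOLD = 0.50


def match_grade(query_value: Optional[str], reference_value: Optional[str]) -> str:
    """Return "A", "B", "F" or "Z" for a single attribute."""
    # "Z": no value is present on at least one side.
    if not query_value or not reference_value:
        return "Z"
    # Case-insensitive similarity ratio in the range 0.0 to 1.0.
    ratio = SequenceMatcher(None, query_value.lower(), reference_value.lower()).ratio()
    if ratio >= SAME_THRESHOLD:
        return "A"  # considered the same
    if ratio >= SIMILAR_THRESHOLD:
        return "B"  # some similarity
    return "F"      # considered dissimilar
```

A production system would more plausibly use name-specific comparison (nicknames, phonetic keys) than a generic string ratio; the sketch only makes the four grade meanings concrete.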

[0061] The match data profile of the feedback indicates the type of data in reference database 110 that process 130 used to match records from reference database 110 to data 140. It can be defined using an encoding structure in which, for example, "00" indicates the primary business name or address, "03" indicates alternative values, such as the chief executive officer (CEO) or a former name or address, "98" indicates that the attribute in data 140 was not used by process 130, and "99" indicates that the attribute was not populated in data 140.
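
As an illustration of how an end user might consume this feedback, the following Python sketch decodes the two-digit profile codes listed above and applies a simple acceptance rule. The code meanings are taken from paragraph [0061]; the function names, the rule itself, the minimum confidence of 7 and the choice of required attributes are hypothetical and serve only as an example of a business rule.

```python
from typing import Dict, Tuple

# Code meanings as listed in paragraph [0061].
PROFILE_CODES: Dict[str, str] = {
    "00": "primary business name or address",
    "03": "alternative value (e.g. CEO, former name or address)",
    "98": "attribute in the query data not used in matching",
    "99": "attribute not populated in the query data",
}


def describe_profile(profile: Dict[str, str]) -> Dict[str, str]:
    """Translate each attribute's two-digit code into its meaning."""
    return {attr: PROFILE_CODES.get(code, "unknown code") for attr, code in profile.items()}


def accept_match(confidence_code: int, grades: Dict[str, str],
                 min_confidence: int = 7,
                 required: Tuple[str, ...] = ("City", "State")) -> bool:
    """Hypothetical end-user rule: sufficient confidence and grade "A" on required attributes."""
    return confidence_code >= min_confidence and all(grades.get(a) == "A" for a in required)
```

With the values of Table 8 (confidence code 8, City and State both graded "A"), this example rule accepts the match; a stricter user could raise min_confidence or add attributes to required.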

[0062] Thus, method 100 includes 1) receiving a request to initiate a search for a specific individual, 2) processing the request to make maximal use of each data field of the request, individually and in combination with other fields of the request, including processes for cleansing, parsing and standardizing the request, 3) determining optimal ways to search the reference data based on one or more of the cleansed, parsed and standardized values of the request, 4) retrieving candidates for selecting entities in the reference data that match the request, and 5) returning the best candidate and providing feedback, including match results with actionable attributes.

[0063] Method 100 includes the steps of 1) receiving input data comprising a plurality of elements, 2) converting subsets of the plurality of elements into a set of terms, 3) evaluating predictiveness in identifying match candidates with flexible use of indicators based on the end user's query, including both data that are expected to be part of the query and alternative data that may be provided by the end user, 4) retrieving stored reference data based on the terms to identify the most likely match candidates for the input data, 5) selecting the best match from the set of match candidates based on the evaluation of predictiveness, and 6) providing match results with actionable attributes, determined by the unique aspects of each query and of the resulting candidates, that allow the end user to make business decisions regarding use of the match candidate.
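
Steps 4) and 5) of the preceding paragraph can be sketched in Python as follows. The sketch follows the idea, stated in the abstract and in claim 1, that the weight of an attribute is modified depending on the number of reference records that share the request value. Everything else is an assumption made for illustration: the logarithmic formula, the base weights, the attribute names, the exact-equality scoring and the function names are not taken from the source.

```python
import math
from typing import Dict, List, Optional

Record = Dict[str, str]

# Hypothetical base weights: assumed usefulness of each attribute for matching.
BASE_WEIGHTS = {"first_name": 1.0, "surname": 2.0, "city": 1.0, "mobile": 4.0}


def adjusted_weight(attr: str, value: str, reference: List[Record]) -> float:
    """Reduce an attribute's weight when its request value is common in the reference data."""
    count = sum(1 for rec in reference if rec.get(attr) == value)
    # Assumed formula: rarer values keep more weight; a value shared by
    # every record contributes nothing.
    return BASE_WEIGHTS[attr] * math.log((1 + len(reference)) / (1 + count))


def best_candidate(request: Record, reference: List[Record]) -> Optional[Record]:
    """Return the record with the highest summed weight of matching attributes, or None."""
    weights = {a: adjusted_weight(a, v, reference)
               for a, v in request.items() if a in BASE_WEIGHTS}
    best, best_score = None, 0.0
    for rec in reference:
        score = sum(w for a, w in weights.items() if rec.get(a) == request[a])
        if score > best_score:
            best, best_score = rec, score
    return best
```

In this sketch a first name carried by every reference record receives a weight of zero, so a rare mobile phone number dominates the choice of the best candidate. Real matching would compare attributes by similarity rather than strict equality.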

[0064] Method 100 includes functionality to identify an individual using a finite but temporally unbounded set of indicators that can be used to form an estimate of similarity between the query and the match candidates. Method 100 addresses certain problems inherent in unique identification, including 1) the commonality of personal names shared by different individuals, which is less true for businesses, 2) a specific name, without any additional indicators, that may be associated with an individual as well as with a business, or with more than one individual or business, and 3) individuals who are often associated with multiple addresses and physical locations or other identifiers. These issues can be addressed by expanding to a flexible and variable set of identity attributes and notions of identity similarity. The flexibility of the method covers both metadata and actual data values, and is used in 1) populating a database with information related to individuals, and 2) selecting an identity from the database based on a query and on rules that determine an acceptance threshold for that purpose.

[0065] According to method 100, a first set of indicators is defined such that X1, X2, ..., Xn represent the attributes to be used for matching (e.g., first name, middle name, last name, address, and other descriptive information). This set of indicators is extensible without limitation on size, and all of the reference data is used in the matching, selection and evaluation process. The reference data is constructed so as to contain as large a data set as possible, including all expected values of X as well as additional evaluative or derived data based on predictive equations and algorithms.

[0066] In each matching iteration, the query set is defined as S, a subset of the set X. Matching is performed based on a set of correlations defined over the broader set X at the time of matching or at another predetermined predictive interval, and feedback is returned in the form of (1) a confidence interval that describes the strength of the match in terms of the variations of the correlations in X and the observed subset S for the query set, (2) a match class string that indicates the correlated elements of S and the quality of the match on those specific elements, and (3) a match profile string that indicates what reference data was used to form the assessment of match quality, i.e., the assessment of the level of confidence that the best candidate is an appropriate match to the query. The match class string and the match profile string can be flexible in length and format, as determined by the data components used in the matching process.

[0067] Fig. 2 is a block diagram of a system 200 for employing the present invention. System 200 includes a computer 205 connected to a data network, i.e., network 220, such as the Internet.

[0068] Computer 205 includes a user interface 210, a processor 215, and a memory 225. Although computer 205 is represented herein as a standalone device, it is not limited to such, and may instead be connected to other devices (not shown) in a distributed processing system.

[0069] User interface 210 includes an input device, such as a keyboard or a speech recognition engine, that allows a user to communicate information and command selections to processor 215. User interface 210 also includes an output device, such as a display or a printer. A cursor control device, such as a mouse, a trackball, a joystick or a touch-sensitive material disposed over the display, allows the user to manipulate a cursor on the display to communicate additional information and command selections to processor 215.

[0070] Processor 215 is an electronic device configured of logic circuitry that responds to and executes instructions.

[0071] Memory 225 is a non-transitory computer-readable medium encoded with a computer program. In this regard, memory 225 stores data and instructions that are readable and executable by processor 215 for controlling the operation of processor 215. Memory 225 can be implemented as random access memory (RAM), a hard disk drive, read-only memory (ROM), or a combination thereof. One of the components of memory 225 is a program module 230.

[0072] Program module 230 contains instructions for controlling processor 215 to perform the methods described herein. For example, under control of program module 230, processor 215 performs the processes of method 100. The term "module" is used herein to denote a functional operation that can be implemented either as a standalone component or as an integrated configuration of a plurality of subordinate components. Thus, program module 230 can be implemented as a single module or as a plurality of modules that operate in cooperation with one another. Moreover, although program module 230 is described herein as being installed in memory 225, and therefore implemented in software, it could be implemented in hardware (e.g., electronic circuitry), firmware, software, or a combination thereof.

[0073] Processor 215 receives request 103, via network 220 or via user interface 210, and accesses processing rules 104, attribute table 105 and reference database 110. Processing rules 104, attribute table 105 and reference database 110 can be components of computer 205, for example stored in memory 225, or can be located on devices external to computer 205 that computer 205 accesses via network 220. Processor 215 outputs result 160 to user interface 210 or to a remote device (not shown) via network 220.

[0074] Although program module 230 is indicated as already loaded into memory 225, it may be configured on a storage medium 235 for subsequent loading into memory 225. Storage medium 235 is also a non-transitory computer-readable medium encoded with a computer program, and can be any conventional storage medium that stores program module 230 in tangible form. Examples of storage medium 235 include a floppy disk, a CD-ROM, magnetic tape, non-volatile memory, optical storage media, a universal serial bus (USB) flash drive, a digital versatile disc, or a zip disk. Storage medium 235 may also be a memory or another type of electronic storage device located on a remote storage system and connected to computer 205 via network 220.

[0075] The methods described herein are exemplary and should not be construed as implying any particular limitation on the present disclosure. It should be understood that those skilled in the art can devise various alternatives, combinations and modifications. For example, steps associated with the processes described herein can be performed in any order, unless otherwise specified or dictated by the steps themselves. The present disclosure is intended to embrace all such alternatives, modifications and variations that fall within the scope of the following claims.

[0076] The terms "comprises" or "comprising" are to be interpreted as specifying the presence of the stated features, integers, steps or components, but not precluding the presence of one or more other features, integers, steps or components or groups thereof.

1. A method of searching a database, comprising:
receiving a request to initiate a search for data concerning a specific individual;
determining, based on said request, a strategy to search a reference database;
searching said reference database, in accordance with said strategy, for a match to said request; and
outputting said match,
wherein said determining of the strategy comprises:
extracting, from said request, an attribute that is relevant to the search;
assigning a weight to said attribute, thus yielding a weighted attribute,
wherein said weight is indicative of the usefulness of said attribute in finding a match to said request; and
establishing a function based on said weighted attribute,
wherein said searching comprises:
retrieving, from said reference database, candidates having attribute values that indicate likely matches to said request, based on said function; and
determining a best candidate from said candidates and returning said best candidate as said match,
wherein said request includes a request value for said attribute, and
wherein said establishing comprises modifying said weight depending on the number of records in said reference database that have said request value for said attribute.

2. The method according to claim 1, further comprising outputting an indicator of a level of confidence that said match is an adequate match to said request.

3. The method according to claim 2, wherein said indicator indicates which reference data was used to form an assessment concerning said level of confidence.

4. A system for searching a database, comprising a processor and
a memory that contains instructions which, when read by said processor, cause said processor to:
receive a request to initiate a search for data concerning a specific individual;
determine, based on said request, a strategy to search a reference database;
search said reference database, in accordance with said strategy, for a match to said request; and
output said match,
wherein, to determine said strategy, said instructions cause said processor to:
extract, from said request, an attribute that is relevant to the search;
assign a weight to said attribute, thus yielding a weighted attribute,
wherein said weight is indicative of the usefulness of said attribute in finding a match to said request; and
establish a function based on said weighted attribute,
wherein, to search said reference database, said instructions cause said processor to:
retrieve, from said reference database, candidates having attribute values that indicate likely matches to said request, based on said function;
determine a best candidate from said candidates; and
return said best candidate as said match,
wherein said request includes a request value for said attribute, and
wherein, to establish said function, said instructions cause said processor to
modify said weight depending on the number of records in said reference database that have said request value for said attribute.

5. The system according to claim 4, wherein said instructions also cause said processor to output an indicator of a level of confidence that said match is an adequate match to said request.

6. The system according to claim 5, wherein said indicator indicates which reference data was used to form an assessment concerning said level of confidence.

7. A data carrier comprising instructions which, when read by a processor, cause said processor to:
receive a request to initiate a search for data concerning a specific individual;
determine, based on said request, a strategy to search a reference database;
search said reference database, in accordance with said strategy, for a match to said request; and
output said match,
wherein, to determine said strategy, said instructions cause said processor to:
extract, from said request, an attribute that is relevant to the search;
assign a weight to said attribute, thus yielding a weighted attribute,
wherein said weight is indicative of the usefulness of said attribute in finding a match to said request; and
establish a function based on said weighted attribute,
wherein, to search said reference database, said instructions cause said processor to:
retrieve, from said reference database, candidates having attribute values that indicate likely matches to said request, based on said function;
determine a best candidate from said candidates; and
return said best candidate as said match,
wherein said request includes a request value for said attribute, and
wherein, to establish said function, said instructions cause said processor to:
modify said weight depending on the number of records in said reference database that have said request value for said attribute.

8. The data carrier according to claim 7, wherein said instructions also cause said processor to output an indicator of a level of confidence that said match is an adequate match to said request.

9. The data carrier according to claim 8, wherein said indicator indicates which reference data was used to form an assessment concerning said level of confidence.



 
