RussianPatents.com
Method and device for system used for forecasting of group trade. RU patent 2510891.
FIELD: physics, computing. SUBSTANCE: the invention relates to detecting patterns in payment card transaction data in order to determine the group affiliation of merchants (sellers) in that data. The proposed method comprises storing transaction data in a database; retrieving transaction data by a first computer connected to the database; and using at least one forecasting algorithm and the retrieved transaction data to forecast multiple group affiliations for a merchant, the algorithm being executed by the first computer. The method further comprises generating metadata describing every forecast output by the at least one algorithm, and inputting the multiple forecast group affiliations for the merchant, together with the metadata describing every forecast, into a data analysis program executed on a second computer. The second computer uses the data analysis program to assign a confidence factor to every forecast group affiliation, based at least partially on the forecast group affiliations and the metadata. The confidence factor represents the probability that the merchant is actually associated with the corresponding forecast group affiliation. The second computer outputs the forecast group affiliation with the highest confidence factor as the final forecast of the merchant's affiliation. EFFECT: higher accuracy in forecasting the affiliation of a merchant with different merchant groups. 20 cl, 10 dwg
Background of the invention
The present invention relates generally to merchant forecasting, and more specifically to methods and systems for forecasting merchant group affiliation from payment transactions carried out over a bank card network on behalf of account holders.
Historically, the use of "payment" cards for consumer payments was mostly local, based on relationships between local credit-issuing banks and various local merchants. The payment card industry has since evolved, with banks forming associations (for example, MasterCard) and involving third-party transaction processors ("Merchant Acquirers"), so that cardholders can use a payment card widely at any merchant, regardless of the merchant's banking relationship with the card issuer.
For example, Figure 1 of this application shows an exemplary multi-party payment card industry system for payment card transactions. As shown, the merchant and the card issuer do not have to be directly related. Yet today there are various scenarios in the payment card industry in which a card issuer has a special or customized relationship with a specific merchant or group of merchants.
More than 25 million merchant locations accept payment cards. Sometimes merchants are affiliated with a more recognizable chain, brand, or other legal entity. In one example, a franchisee of a large international fast-food company may be identified to the card issuer for transaction purposes as "Chris's Restaurants, LLC", so that there is no apparent correlation with the franchisor.
Ways of improving purchasing with payment cards are therefore considered. In particular, the use of historical transaction data is considered, both to predict future financial card transactions and to determine whether there are correlations that should be drawn from these data.
More specifically, merchant location data collected by companies is often assigned to a higher-level group based on legal ownership, brand, or some other definition. Often these relationships are not explicitly defined or are not publicly available. Establishing such a relationship has previously involved manually inspecting transaction data to find a field, or set of fields, that can be used to qualify a location as a member of the respective group.
Brief description of the invention
In one aspect, a computer-based method is provided for detecting patterns in payment card transaction data to determine group membership in that data, where the data relate to merchants that accept payment cards for payment. The method includes receiving transaction data from at least one database; predicting the group membership of a merchant using at least one forecasting algorithm and the retrieved transaction data, the algorithm also generating metadata describing its predictions; inputting at least one predicted group membership and the metadata into a data analysis application; and assigning a confidence score to each predicted group membership, the score being produced by the application from the predicted group memberships and the metadata.
In another aspect, a computer system is provided for detecting patterns in payment card transaction data to determine the group membership of individual merchants using that data. The computer system is programmed to run a plurality of forecasting algorithms on the transaction data, each of which predicts a group membership for a merchant based on the transaction data; to assign a confidence score to each predicted group membership; and to output the predicted group membership with the highest confidence score as the final forecast of the merchant's affiliation.
Brief description of the drawings
Figure 1 is a simplified diagram illustrating an exemplary multi-party payment card industry system for payment card transactions.
Figure 2 is a simplified block diagram of an exemplary embodiment of the server architecture of a system in accordance with one embodiment of the present invention.
Figure 3 is an expanded block diagram of an exemplary embodiment of the server architecture of a system in accordance with one embodiment of the present invention.
Figure 4 is a flowchart illustrating the high-level components of the combined (ensemble) merchant forecasting system.
Figure 5 is a flowchart illustrating the operation of the scoring mechanism associated with the combined merchant forecasting system.
Figure 6 is a flowchart 250 illustrating the data input to the algorithm that classifies merchant locations.
Figure 7 is a flowchart describing the algorithm that classifies merchant locations.
Figures 8A-8B are a diagram illustrating how merchant locations are collected and treated as documents in the classification system.
Figure 9 is a flowchart illustrating the determination of a set of reference character strings, or principal components, in a database.
Figure 10 is a flowchart illustrating the use of the reference strings to determine a similarity metric for a candidate character string.
Detailed description of the invention
This document describes systems and methods of group merchant forecasting for detecting meaningful merchant patterns (for example, in merchant location information) that reveal a higher-level ordering, such as brand, chain, legal ownership, or similar, within an existing, to some extent arbitrarily selected, collection of merchant locations. A group (ensemble) forecasting system, as used here, is a set of forecasting systems whose individual forecasts are combined to form a single prediction.
Typically, when group membership is not explicitly specified, the relationship has to be derived through manual inspection of the location data. The merchant group forecasting system described here uses an algorithmic approach to solve this problem for at least part of the space of location records.
The technical effect of the systems and methods described in this document includes at least one of: (a) identifying patterns associated with merchants, for example in location data; (b) providing a combined forecast from multiple forecasts associated with merchant location data; and (c) determining a confidence level for each combined forecast, using the multiple forecasts and any metadata associated with them.
In one embodiment, a computer program is provided that is embodied on a machine-readable medium and uses Structured Query Language (SQL), with a client user interface for administration and a web interface for standard user input and reports. In an exemplary embodiment, the system is implemented as a web application and runs on an existing corporate intranet. In another embodiment, the system is fully accessible, via the Internet, to authorized users outside the corporate firewall. In a further exemplary embodiment, the system runs under Windows (Windows is a registered trademark of Microsoft Corporation, Redmond, Washington). The application is flexible and designed to run in various environments without compromising its core functionality.
The systems and processes are not limited to the specific embodiments described here. In addition, the components of each system and each process can be practiced independently and separately from the other components and processes described here. Each component and process can also be used in combination with other sets of components and processes.
As background, Figure 1 is a simplified diagram 20 illustrating an exemplary multi-party payment card industry system for conventional payment card transactions, in which the transactions are used, at least in part, with the merchant group forecasting system. As used in this document, an aggregate merchant is a high-level grouping of merchant locations. More specifically, the different locations of an individual retailer are grouped together (for example, linked to each other in a database) to form an aggregate merchant. A single merchant location is a component of an aggregate merchant. As a rule, the term aggregate merchant is used when referring to a chain of stores, and locations are grouped together, as described below, on the basis of several field values stored in the transaction database.
This invention relates to payment card systems, such as a credit card payment system using the MasterCard interchange. The MasterCard interchange is a proprietary communications standard promulgated by MasterCard International Incorporated® for the exchange of financial transaction data between financial institutions that are members of MasterCard International Incorporated®. (MasterCard is a registered trademark of MasterCard International Incorporated, based in Purchase, New York.)
In a typical payment card system, a financial institution called the "issuer" issues a payment card, such as a credit card, to a consumer, who uses the card as a means of payment for a purchase from a merchant. To accept payment by payment card, the merchant must normally open an account with a financial institution that is part of the financial payment system. This financial institution is usually called the "merchant bank", the "acquiring bank", or the "acquirer bank". When the consumer 22 pays for a purchase with a payment card (also known as a financial transaction card), the merchant 24 requests authorization from the acquiring bank 26 for the amount of the purchase.
The request can be made by telephone, but is usually made using a point-of-sale terminal, which reads the consumer's account information from the magnetic stripe on the card and connects via the Internet to the transaction-processing computers of the acquiring bank. Alternatively, the acquiring bank may authorize a third party to process transactions on its behalf. In this case, the point-of-sale terminal is configured to communicate with the third party. Such a third party is usually called a "merchant processor" or "processor".
Using the interchange 28, the computers of the acquiring bank or merchant processor communicate with the computers of the issuer bank 30 to determine whether the consumer's account is in good standing and whether the purchase is covered by the consumer's available credit limit. Based on these determinations, the request for authorization is declined or accepted. If the request is accepted, an authorization code is sent to the merchant.
When the authorization request is accepted, the available credit limit of the consumer's account 32 is decreased. Typically, the charge is not posted to the consumer's account immediately, because bank card associations such as MasterCard International Incorporated® have promulgated rules that do not allow a merchant to charge, or "capture", a transaction until the goods or services have been provided. When the merchant delivers the goods or provides the services, the merchant captures the transaction, for example through appropriate data entry procedures on the point-of-sale terminal. If the consumer cancels a transaction before it is captured, a "void" is generated. If the consumer returns the goods after the transaction has been captured, a "credit" is generated.
After a transaction is captured, it is settled between the merchant, the acquiring bank, and the issuer. Settlement refers to the transfer of financial data or funds, related to the transaction, between the merchant's account, the acquiring bank, and the issuer.
Typically, transactions are captured and accumulated into a "batch", which is settled as a group. The data associated with such transactions, as described below, are used in the technique of forecasting future purchasing activity.
Financial transaction cards or payment cards can refer to credit cards, debit cards, and prepaid cards. All of these cards can be used as a method of payment for a transaction. As used here, the terms "financial transaction card" and "payment card" cover cards such as credit cards, debit cards, and prepaid cards, and also any other devices that may hold payment account information, such as mobile phones, personal digital assistants (PDAs), and key fobs.
Figure 2 is a simplified block diagram of an exemplary system 100 in accordance with one embodiment of the present invention. In one embodiment, system 100 is a payment card system used, for example, to implement customized issuer-merchant relationships while also processing historical data associated with the transactions. In another embodiment, system 100 is a payment card system that account holders can use to enter processing codes to be applied to payment transactions.
As described below, database 120 stores transaction data generated as part of sales activity conducted over the bank network, including data relating to merchants, account holders or customers, and purchases. Database 120 additionally includes data relating to rewards programs and special offers, including processing codes and business rules associated with the different programs and offers.
Figure 3 is an expanded block diagram of an exemplary embodiment of the server architecture of system 122 in accordance with one embodiment of the present invention.
Components of system 122 that are identical to components of system 100 (shown in Figure 2) are identified in Figure 3 using the same reference numbers as in Figure 2. System 122 includes server system 112 and client system 114. Server system 112 further includes database server 116, application server 124, web server 126, fax server 128, directory server 130, and mail server 132. A disk drive 134 is coupled to database server 116 and directory server 130. Servers 116, 124, 126, 128, 130, and 132 communicate over a local area network (LAN) 136. In addition, an administrator's workstation 138, a user workstation 140, and a supervisor's workstation 142 are coupled to LAN 136. Alternatively, workstations 138, 140, and 142 are coupled to LAN 136 using an Internet link or are connected through an intranet.
Each workstation 138, 140, and 142 is a personal computer with a web browser. Although the functions performed at the workstations are typically illustrated as being performed at the respective workstations 138, 140, and 142, such functions can be performed at any of the many personal computers coupled to LAN 136. Workstations 138, 140, and 142 are illustrated as being associated with separate functions only to facilitate an understanding of the different types of functions that can be performed by users with access to LAN 136.
Server system 112 is configured to be communicatively coupled to various users, including employees 144, and to third parties 146 (for example, account holders, customers, and auditors), using an ISP Internet connection 148. The communication in the exemplary embodiment is illustrated as being performed over the Internet; however, any other wide area network (WAN) type of communication can be used in other embodiments, that is, the systems and processes are not limited to being practiced over the Internet. In addition, local area network 136 could be used in place of WAN 150.
In the exemplary embodiment, any authorized user with a workstation 154 can access system 122. At least one of the client systems includes a manager workstation 156 located remotely. Workstations 154 and 156 are personal computers with a web browser. In addition, workstations 154 and 156 are configured to communicate with server system 112. Furthermore, fax server 128 communicates with remotely located client systems, including client system 156, using a telephone line. Fax server 128 is also configured to communicate with other client systems 138, 140, and 142.
Figure 4 is a flowchart 200 illustrating the high-level functional components of one embodiment of a system for forecasting grouped, or aggregate, merchants, in which each component provides a forecast relating to payment card transactions carried out over the network. The forecasts are then combined into a single forecast, as described below. This combination of forecasts is sometimes referred to as a combined (ensemble) forecast. One example relevant to the embodiments described here combines forecasts relating to received merchant location data. Each of the forecasting algorithms shown in Figure 4 is described more fully later in this document.
The first component is the nearest-locations forecast algorithm 202 (sometimes called the k-similar-locations forecast algorithm), which is configured to retrieve the "k" merchant locations closest to a given merchant location. Algorithm 202 further classifies the group of the given merchant location as the modal group of the selected "k" closest locations.
The aggregated-locations-as-documents forecast algorithm 204 calculates the relevance of each field and field value for each aggregated location (the higher-level grouping of data) within the space of known values, and the results are stored as a document.
The most relevant values from these documents are used to generate the forecasts.
The third-party data forecast algorithm 206, which includes location matching, is used when the forecast is associated with a particular third-party brand. At least one input to algorithm 206 includes transaction records received from a third party, which are used in forming the forecast. In one embodiment, the forecast is formed after the location data have been matched against the third-party data source.
The numerical signature forecast algorithm 208 is also included in flowchart 200. In one embodiment it is based largely on Benford's Law, and further on the observed tendency of merchants belonging to the same group to deviate from the Benford distribution in a relatively consistent way. The forecast from algorithm 208 is the group of locations whose numerical distribution is closest to that of the given merchant location.
A top-level statistical model and scoring mechanism 210, implemented in Oracle in one embodiment, uses the forecasts of algorithms 202, 204, 206, and 208 to determine group membership from data that have recently been received and/or stored in the database. An example of such data is merchant location data.
In at least one embodiment, and as described below, merchant location data in the database are described in terms of location and distance, for example the merchant locations that lie within a given distance of a given location. In at least one aspect, location and distance are not necessarily geographic, but are based on proximity calculated from the merchant data stored in the database.
In certain embodiments, location and distance are based on proximity as measured by cross-attribute-weighted term frequency/inverse document frequency (TF/IDF) calculations over the field values and tokenized field values in the database.
Figure 5 is a flowchart 220 illustrating the operation of scoring mechanism 210. Scoring mechanism 210 uses 222 the merchant location forecasts from algorithms 202, 204, 206, and 208, together with metadata about each forecast, in the Oracle Data Mining (ODM) application 224 to describe the circumstances surrounding each individual forecast, and then produces 226 a final forecast compiled from the combined individual forecasts. This final forecast can relate to a merchant location. The application also calculates a confidence factor associated with the forecasts combined from algorithms 202, 204, 206, and 208. Each of these four algorithms 202, 204, 206, and 208 will now be described in more detail.
K-nearest locations (algorithm 202)
Figure 6 is a flowchart 250 illustrating the data input to algorithm 202, which classifies a merchant location based on proximity, for example the proximity of its location. A set of location-level fields, or location coordinates 252, that are known to be important in the context of chain or collection (for example, group) membership is identified in the database 254 of institutions that accept financial transaction cards. In addition, a database 256 of daily new/changed locations, together with the associated new/changed location coordinates 258, is provided to the merchant location classification algorithm.
Figure 7 is a flowchart 280 describing one of the algorithms (algorithm 202, shown in Figure 4) that is used to classify a merchant location within a group.
Algorithm 202 uses at least the data described with respect to flowchart 250 of Figure 6. Specifically, the merchant location data in the database are searched 282 to find a number (k) of locations that lie within a certain distance of a given location. In addition, the proximity search at this distance is used to determine 284 any new and/or changed locations. A modal value is determined 286 from the classifications of the (k) merchant locations found within the specific feature space (the space from which the transaction data input to algorithm 202 are drawn). The most frequently occurring value in the classification of the (k) location records, that is, the one with the highest weight, is called the modal value, and is determined as described below. This modal value is returned 288 as the forecast of algorithm 202.
A matrix is created that contains the inverse document frequency of all field values and tokenized field values, and in one embodiment it has nine dimensions. In a particular embodiment, the nine dimensions are the merchant category code, the ICA code (the code of the individual member association), the business region, the merchant name, the merchant phone number, the acquiring merchant identifier, the merchant tier identifier, the merchant legal name, and the federal tax identifier. These dimensions are included in all merchant location records.
The inverse document frequency is the logarithm (base 2 in one particular implementation) of the quotient of the total number of records divided by the number of records that contain a particular value. Examples are shown in Table 1. In one embodiment, this quotient is calculated separately for each of the nine dimensions. The number of records is calculated as the number of merchant locations. The number of records that contain a particular term is calculated by counting the number of merchant locations that contain each term within each field type.
Table 1
Field type             Field value    Inverse document frequency
Phone number           2014234177     12.788106546
Phone number           8002285882     6.0265553135
Merchant name token    DCC            5.0067468324
Merchant name token    DFQ            8.9807516239
Business region        01             1.4041323134
For each location, a cross-attribute normalized term frequency - double inverse document frequency weight is calculated for the field values and tokenized field values across the nine dimensions, as illustrated in Table 2, where the nine dimensions are again the merchant category code, the ICA code, the business region, the merchant name, the merchant phone number, the acquiring merchant identifier, the merchant tier identifier, the merchant legal name, and the federal tax identifier.
Table 2
Location    Field type             Field value    Weighted term frequency / inverse document frequency
100         Phone number           2014234177     .2453254
100         Merchant name token    BE             .125859
100         Merchant name token    ST             .1125445
100         Federal tax ID         525414152      .2155224
100         Business region        01             .0252546
The predicted group membership and its confidence for a given location are calculated by joining that location to all other locations on field type and field value, and then summing the weighted term frequency / inverse document frequency values for the field types and field values they have in common. The resulting locations are then sorted in descending order of total score, and the modal group among, for example, the thirteen highest-scoring locations is returned as the forecast. The confidence factor for this forecast is based on the number of locations among the top thirteen that belong to the same group (the predicted value), the quotient of the weights of the k locations that belong to the predicted group, and the variation in the weights.
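The steps described for algorithm 202 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the record layout, the function names `inverse_document_frequency` and `predict_group`, the sample records, and the use of a plain sum of base-2 IDF values as the similarity score (instead of the cross-attribute normalized weights of Table 2) are all hypothetical simplifications. The confidence value here is simply the predicted group's share of the total weight among the k nearest locations, which stands in for the fuller confidence calculation described above.

```python
import math
from collections import Counter, defaultdict


def inverse_document_frequency(locations):
    """Map each (field_type, value) pair to log2(N / number of locations containing it)."""
    n = len(locations)
    counts = Counter()
    for features in locations.values():
        counts.update(set(features))  # count each pair once per location
    return {feature: math.log2(n / c) for feature, c in counts.items()}


def predict_group(target_id, locations, groups, k=13):
    """Predict the group of target_id from its k most similar labeled locations."""
    idf = inverse_document_frequency(locations)
    target = set(locations[target_id])
    scores = {}
    for loc_id, features in locations.items():
        if loc_id == target_id or loc_id not in groups:
            continue
        shared = target & set(features)  # join on field type and field value
        if shared:
            scores[loc_id] = sum(idf[f] for f in shared)
    nearest = sorted(scores, key=scores.get, reverse=True)[:k]
    if not nearest:
        return None, 0.0
    weight_by_group = defaultdict(float)
    for loc_id in nearest:
        weight_by_group[groups[loc_id]] += scores[loc_id]
    best = max(weight_by_group, key=weight_by_group.get)
    confidence = weight_by_group[best] / sum(weight_by_group.values())
    return best, confidence


# Hypothetical sample records: location id -> list of (field type, field value).
locations = {
    100: [("phone", "2014234177"), ("name_token", "DCC"), ("region", "01")],
    101: [("phone", "2014234177"), ("name_token", "DCC"), ("region", "01")],
    102: [("phone", "8002285882"), ("name_token", "DFQ"), ("region", "01")],
    103: [("phone", "8002285882"), ("name_token", "DFQ"), ("region", "05")],
    200: [("phone", "2014234177"), ("name_token", "DCC"), ("region", "05")],
}
groups = {100: 14420, 101: 14420, 102: 58625, 103: 58625}  # location 200 is unlabeled

predicted_group, confidence = predict_group(200, locations, groups, k=3)
```

In this sketch a rare value, such as a phone number shared by few locations, contributes more to the score than a common one, such as a business region, which matches the pattern of the IDF values in Table 1.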
Aggregated locations as documents forecast (algorithm 204)
Figures 8A-8B show a diagram 300 illustrating locations aggregated into sets and treated as documents in the classification system. Algorithm 204 (shown in Figure 4), which builds documents from the aggregated locations, is similar to the document relevance algorithms frequently used by Internet search engines. Specifically, the relevance of a merchant location to each aggregation, or set, of merchant locations is calculated as described below.
To generate document 302, relevant features, such as a street address, are extracted from the database for multiple locations 304 that are grouped into a set, for example set 306. For illustration, diagram 300 includes four sets of locations: 306, 308, 310, and 312. Set 312 is labeled Set M, indicating that in a particular implementation the number of sets may be more or fewer than the four illustrated. Similarly, the number of locations within a set can vary from one to "N".
The generated documents 302, 320, 322, and 324, each of which contains the extracted relevant features, are collected into a dictionary 330. Using dictionary 330, a sparse matrix 340 is formed, in which the relevance of each field value and tokenized field value is calculated from the extracted features for each aggregated merchant group, based on at least one of term frequency and inverse document frequency.
First, a matrix is created containing the inverse document frequency of all field values and tokenized field values across all merchant location records, covering the nine dimensions listed in this document: the merchant category code, the ICA code, the business region, the merchant name, the merchant phone number, the acquiring merchant identifier, the merchant tier identifier, the merchant legal name, and the federal tax identifier.
For the aggregated-locations-as-documents forecast algorithm, as shown in Table 3, the inverse document frequency is the logarithm (base 2 in a particular embodiment) of the quotient of the number of records divided by the number of records that contain a particular value. In one embodiment, the inverse document frequency is calculated separately for each of the nine dimensions. The number of records is calculated as the number of merchant locations. The number of records that contain a particular term is calculated by counting the number of merchant locations that contain each term within each field type.
Table 3
Field type             Field value    Inverse document frequency
Phone number           2014234177     12.788106546
Phone number           8002285882     6.0265553135
Merchant name token    DCC            5.0067468324
Merchant name token    DFQ            8.9807516239
Business region        01             1.4041323134
For each group, a cross-attribute normalized term frequency - double inverse document frequency weight is calculated for the field values and tokenized field values across the nine dimensions (the merchant category code, the ICA code, the business region, the merchant name, the merchant phone number, the acquiring merchant identifier, the merchant tier identifier, the merchant legal name, and the federal tax identifier), over all locations belonging to the group, as shown in Table 4.
Table 4
Group    Field type                       Field value        Term frequency - double inverse document frequency
14420    Acquiring merchant identifier    000000077480312    0.0104721165
14420    Acquiring merchant identifier    000000077519532    0.0052360583
14420    Federal tax identifier           362023393          0.6529357998
14420    Business region                  05                 0.0627648557
14420    Merchant name token              TEN                0.0011391784
A group membership prediction is calculated for a given location by joining the rows of the location-level weight matrix described above for the (k) nearest locations algorithm to the group matrix on field type and field value, and then summing the products of the term frequency - double inverse document frequency weights for the field types and field values in common. The predicted group is the group with the highest degree of proximity (the sum of weight x weight over the matching field values and tokenized values). The confidence score of the forecast is the resulting value.
Third-party data forecast with location matching (algorithm 206)
The third component of the combined forecast is algorithm 206 (shown in Figure 4), which uses data provided by a third party that have been matched to the financial transaction database by merchant location. In one embodiment, these third-party records carry a chain identifier that is associated, for example, with the provider. These chain identifiers are linked to groups of merchant locations associated with the financial transaction card brand (such as the issuer). The forecast, therefore, is simply the merchant data group corresponding to the chain to which the third-party record has been linked. This linkage is made by location matching, as described in the next paragraph.
A data set of merchant locations is obtained from the third-party data source, in which locations have been assigned to a (provider) chain.
Each chain in the space of third-party seller locations is mapped to the corresponding group. The seller location matching mechanism is used to join the set of third-party seller location records to the set of seller location records held by the card issuer. The predicted group for a given location is then the group corresponding to the chain of the third-party location record that was matched to the card issuer's seller location record. The confidence factor is the match confidence factor returned by the approximate seller location matching mechanism.

Numerical signature prediction (algorithm 208)
In one embodiment, the seller numerical signature algorithm 208 (shown in figure 4) relies on observing the distribution of the digit in the first position of the transaction amount and of the daily transaction volume. Specifically, this distribution tends to be fairly distinctive when a seller's data are aggregated. In addition, in naturally occurring data the distribution tends to follow the one proposed by Benford's Law. In practice, a fast-food restaurant chain may tend to show a recurring digit as the first digit of its transaction amounts. That tendency can be used, at least in part, to link, for example, a franchisee location of a quick-service restaurant chain to a specific location or address.

One example prediction using this algorithm takes a random sample of ten percent of the seller locations in each seller aggregate (grouped seller data). The distribution of the digits 1-9 occurring in the first position of the transaction amount and transaction volume is calculated and summed over the seller aggregate. The angular distance between this distribution and the distribution identified by Benford's Law is then calculated.
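As a rough illustration of the numerical signature idea, the following Python sketch computes a leading-digit distribution, measures its angle to the Benford's Law distribution, and picks the group whose angle is closest to that of a location. It is a minimal sketch under stated assumptions, not the patented implementation: the function names and the sample amounts are hypothetical, and the angle is taken as the inverse cosine of the normalized dot product, which is one common reading of "angular distance".

```python
import math
from collections import Counter

# Benford's Law: expected share of leading digit d is log10(1 + 1/d).
BENFORD = [math.log10(1 + 1 / d) for d in range(1, 10)]


def first_digit_distribution(amounts):
    """Share of each leading digit 1-9 among the positive amounts."""
    digits = []
    for amount in amounts:
        if amount > 0:
            # Strip leading zeros and the decimal point, keep the first digit.
            digits.append(int(str(amount).lstrip("0.")[0]))
    if not digits:
        raise ValueError("no positive amounts supplied")
    counts = Counter(digits)
    return [counts[d] / len(digits) for d in range(1, 10)]


def angular_distance(u, v):
    """Angle in radians between two vectors (assumed cosine-based metric)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    # Clamp to guard against rounding slightly outside [-1, 1].
    return math.acos(max(-1.0, min(1.0, dot / norm)))


def predict_group(location_amounts, group_amounts):
    """Return (group, gap): the group whose Benford angle is closest."""
    loc_angle = angular_distance(
        first_digit_distribution(location_amounts), BENFORD)
    best = None
    for group, amounts in group_amounts.items():
        angle = angular_distance(first_digit_distribution(amounts), BENFORD)
        gap = abs(angle - loc_angle)
        if best is None or gap < best[1]:
            best = (group, gap)
    return best


# Hypothetical sample data, invented for illustration only.
groups = {"A": [100, 150, 110, 200], "B": [700, 750, 800, 900]}
print(predict_group([120, 130, 180, 210], groups))
```

In this sketch the gap between the two angles plays the role of the distance between signatures; a smaller gap indicates a closer match.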
The distribution of the digits 1-9 in the first position of the transaction amount and transaction volume is then calculated for the given seller location, and the angular distance between that distribution and the distribution identified by Benford's Law is calculated. The seller aggregate whose angular distance is closest to that of the seller location is taken as the predicted seller aggregate for that location.

More specifically, for each group, the frequency distribution of each digit (1, 2, 3, 4, 5, 6, 7, 8, 9) across all locations within the group, over the transaction amount, the number of transactions and the average transaction amount, is calculated and expressed as a percentage of the whole. These distributions are stored in a table such as Table 5.

Table 5
Group   Digit   Distribution
14420   1       16%
14420   2       14%
14420   3       20%
14420   4       12%
14420   5       5%
14420   6       19%
14420   7       2%
14420   8       8%
14420   9       4%
58625   1       8%
58625   2       14%
58625   3       12%
58625   4       3%
58625   5       5%
58625   6       3%
58625   7       30%
58625   8       18%
58625   9       7%

Once the distribution has been calculated for each group, a numerical signature is determined for each group by calculating the scalar product of the group's distribution vector and the distribution vector proposed by Benford's Law. This scalar product (the divergence angle), divided by the sum of squares of the distribution vectors for each group and for the distribution identified by Benford's Law, is calculated and stored in a table such as Table 6.

Table 6
Group   Scalar product
14420   70.9
58625   75.4

For each location, the frequency distribution of each digit (1, 2, 3, 4, 5, 6, 7, 8, 9), over the transaction amount, the number of transactions and the average transaction amount observed during one month for that location, is calculated and expressed as a percentage of the whole. These distributions are then stored in a table such as Table 7.
Table 7
Location   Digit   Distribution
100        1       16%
100        2       14%
100        3       20%
100        4       12%
100        5       5%
100        6       19%
100        7       2%
100        8       8%
100        9       4%
200        1       8%
200        2       14%
200        3       12%
200        4       3%
200        5       5%
200        6       3%
200        7       30%
200        8       18%
200        9       7%

Once the distribution has been calculated for each location, a numerical signature is determined for each location by calculating the scalar product of the location's distribution vector and the distribution vector proposed by Benford's Law. This scalar product (the divergence angle), divided by the sum of squares of the distribution vectors for each location and for the distribution identified by Benford's Law, is calculated and stored in a table such as Table 8.

Table 8
Location   Scalar product
100        70.9
200        75.4

The predicted group membership for a location is then calculated by finding the group whose numerical signature is closest to the numerical signature of that location, and the confidence factor is calculated as the distance between the two signatures.

Statistical model and assessment
As described above, one of the components of the aggregate prediction is an algorithm that uses location data that have been matched, for example, to seller locations in the database of card financial transactions. Some of these data may be provided by third-party sources. The embodiments described below relate to methods and systems for approximate matching of strings (such as character strings) against data in a database. In these embodiments, string matching is used to determine, for example, whether a string representing a location in the database corresponds to another string. This algorithm suits many kinds of implementation because of the variations that occur in transaction records, especially in the records of seller names and locations.
The approximate database string matching system joins one set of records to another when no common join key exists, that is, when the data contain no exact match or shared field values. It is assumed that the record sets bear some similarity to one another. Usually, when two data sets are joined in a database, they share identical values in one or more fields. When the two data sources (record sets) do not share identical field values because of differences in the data, the traditional approach to joining them is to implement a function that takes two values, evaluates them and returns their similarity.

Using such a function as the basis for a join requires a number of evaluations equal to the product of the numbers of records in the data sets being joined. For example, with 10,000 records in data set A and 500,000 records in data set B, the similarity function must be called five billion times to join data set A to data set B. In addition, the database optimizer cannot use any indexes or function-based indexes when such a function is called. This type of join is very inefficient and too processing-intensive for data sets of non-trivial size.

A string matching method has been developed that is implemented in various embodiments using one or more of the following components. One component is a set of reference strings used in the join criteria, obtained through principal component factor analysis (PCFA). PCFA seeks to identify, within the space of known values, a set of widely differing strings to serve as reference strings. Another component is an n-gram frequency similarity calculation implemented in pure ASCII structured query language (SQL) to maximize performance in a relational database management system (RDBMS).
Additionally, a process implemented in the RDBMS uses the n-gram frequency similarity calculation to form a binary key, as described below, that indicates the similarity of a record to each of the reference strings identified by the PCFA. In one embodiment, a set of data-driven standardization functions is implemented in the RDBMS, together with a table containing the inverse document frequency (IDF) of all n-grams and an SQL implementation of the cross-attribute term frequency/inverse document frequency (TF/IDF) weight calculation.

One embodiment of the string matching method includes a parameterized analytical SQL query that joins records sharing the same binary key, then ranks them by relevance by summing the products of the TF/IDF weights of all matching n-grams. The z-th bit of the binary key is set to logical 1 if the similarity of the record to the z-th reference string is above a certain threshold. A process implemented in the RDBMS assigns a confidence factor to each match produced by the join; the RDBMS data model used to store the data sets involved in the join is also included.

One simple version of the join problem is matching a single name (or address) against a larger set of names (or addresses) held in a database, such as an Oracle table. An example of such n-gram matching is illustrated in Table 9.

Table 9
Candidate (or new) address: 10014 S Clarkson Rd.
Existing seller addresses in the database:
100 Manchester Rd
2014 Clarkson Rd
4 Main Street
10014 South Clarkson Rd
1400 Clayton Rd

Table 10 summarizes one run of the n-gram matching algorithm, which includes determining a vector of n-gram frequencies for the candidate string (for example, Candidate_Array), determining a vector of n-gram frequencies for each record in the candidate match database (for example, Candidate_Match_Array), measuring the degree of similarity between Candidate_Array and Candidate_Match_Array, and retaining the candidate matches that exceed a specified threshold. For example, "Jojo's Diner" takes the form shown in Table 10.

Table 10
Candidate_Array   2-gram   Frequency
1                 "Jo"     2
2                 "oJ"     1
3                 "o'"     1
4                 "'s"     1
5                 "s "     1
6                 " D"     1
7                 "Di"     1
8                 "in"     1
9                 "ne"     1
10                "er"     1

Tables 11, 12 and 13 are examples of n-gram matching metrics. The "inner product" is the scalar (dot) product of the arrays, the "magnitude" is the square root of the sum of squares, "cos(angle)" is the dot product divided by the product of the magnitudes, and the angle is the inverse cosine of the dot product divided by the product of the magnitudes.

[Tables 11, 12 and 13 are not reproduced in this text.]

Reference strings
The tables and description above illustrate that strings can be represented quantitatively and that the degree of similarity between them can be measured. An index for each record in the database can then be built from its position relative to a small set of reference strings. Once the reference strings have been selected, the position of a new record relative to each reference string can be calculated. In addition, each record in the database has its own precomputed position relative to the reference strings. An approximate match can therefore be found by retrieving the records indexed in the same neighborhood, without computing the full similarity metric between the new record and every record in the database. One goal in choosing reference strings is to select records that are dissimilar to one another, which gives a better perspective.
One approach to choosing the reference strings is outlined in the following paragraphs. Reference strings are identified by taking a sample of rows from the database to be indexed. An n-gram representation is generated for each row in the sample, producing a frequency vector whose z-th component holds the number of occurrences of the z-th n-gram in that row. A similarity matrix is generated by measuring the similarity between each pair of sampled rows with the cosine similarity metric. One way of finding dissimilar components in such a collection of data is principal component analysis. Principal component analysis is carried out on the similarity matrix, and the first k principal components are retained. The sampled rows with the maximum loading on each component are retained and form the set of reference strings.

Binary key and information retrieval
To group similar strings together, so that an index can be built for fast retrieval of candidates during approximate string matching, each potential candidate record and each comparison record is compared with each of the reference strings using the SQL n-gram frequency similarity calculation. If the similarity calculation yields a value above a predefined threshold, the position of the binary key corresponding to that reference string is set to 1. If the value is below the threshold, that position of the binary key is set to 0.

N-gram similarity calculation
An SQL query was developed that builds a two-dimensional vector containing the frequency of occurrence of every unique n-gram present in the two strings. The query then divides the sum of the products of the frequencies in each dimension by the product of the magnitudes of the two frequency vectors to obtain a normalized similarity metric. This calculation is illustrated in the following example, in which comparison string A is "MASTERCARD" and comparison string B is "MASTERCHARGE".
The following table, Table 14, is a two-dimensional vector containing the frequency of occurrence of each unique n-gram present in the two comparison strings:

Table 14
2-gram   A   B
MA       1   1
AS       1   1
ST       1   1
TE       1   1
ER       1   1
RC       1   1
CA       1   0
AR       1   1
RD       1   0
CH       0   1
HA       0   1
RG       0   1
GE       0   1

The magnitude of string A is calculated as the square root of the sum of the squares of each frequency in distribution A, and the magnitude of string A is 3.0. The magnitude of string B is calculated as the square root of the sum of the squares of each frequency in distribution B, and the magnitude of string B is 3.3166247903554. The dot product of the vectors is then calculated; for this example it is 7.0 (the number of table rows in which both A and B are 1). The similarity is calculated as dot product / (magnitude of A x magnitude of B), or 0.703526470681448 for this example.

Binary key formation
The unique identifier and binary key value of each record are stored in a partitioned index-organized table (IOT) in the RDBMS. Each unique data set is stored in its own partition, and no two data sets share a partition. To maximize performance, each data set is loaded into the table using "create table as select" (CTAS) and partition exchange operations. The data in each partition are stored in binary key order to maximize join performance.

Data standardization
To improve the accuracy of the similarity comparisons and the distribution of binary key values, in one embodiment the data are standardized with respect to known abbreviations and synonyms. To perform this standardization, a table is compiled that contains all known abbreviations and synonyms for the various field types, together with their standard representations. An algorithm then tokenizes each data item and maps any known abbreviation or synonym to its standard form.
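The MASTERCARD / MASTERCHARGE comparison and the binary key rule described above can be sketched in a few lines of Python. This is an illustration only, assuming two-character n-grams and plain cosine similarity; the source describes an SQL implementation inside the RDBMS, and the function names, the reference strings and the 0.5 threshold below are hypothetical.

```python
import math
from collections import Counter


def ngram_counts(text, n=2):
    """Frequency of each n-gram in the string."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def ngram_similarity(a, b, n=2):
    """Dot product of the n-gram frequency vectors over their magnitudes."""
    fa, fb = ngram_counts(a, n), ngram_counts(b, n)
    dot = sum(count * fb[gram] for gram, count in fa.items())
    mag_a = math.sqrt(sum(c * c for c in fa.values()))
    mag_b = math.sqrt(sum(c * c for c in fb.values()))
    if mag_a == 0 or mag_b == 0:
        return 0.0  # a string shorter than n has no n-grams
    return dot / (mag_a * mag_b)


def binary_key(record, reference_strings, threshold):
    """One bit per reference string: 1 if similarity exceeds the threshold."""
    return "".join(
        "1" if ngram_similarity(record, ref) > threshold else "0"
        for ref in reference_strings
    )


print(ngram_similarity("MASTERCARD", "MASTERCHARGE"))
# Hypothetical reference strings and threshold, for illustration only.
print(binary_key("MASTERCARD", ["MASTERCHARGE", "VISA"], 0.5))
```

For the two example strings the sketch reproduces the similarity of about 0.7035 given in the text. Records that receive the same key can then be joined directly on the key, so the full similarity calculation is needed only against the small set of reference strings.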
IDF table
For higher performance when calculating the TF/IDF weights of all n-grams present in the fields involved in an approximate match join, a table is created that holds the inverse document frequency of all two-character n-grams in the candidate records. The space of all possible n-grams is generated in PL/SQL, while the IDF calculation is done in ASCII SQL. The IDF table stores the IDF value of each possible n-gram for each data category. The table is index-organized by data category and n-gram to maximize join performance.

Cross-attribute TF/IDF weights
To assign a weight, or significance, to every two-character n-gram present in a given record for each field involved in the approximate match join, a cross-attribute term frequency/inverse document frequency (TF/IDF) weight is calculated for each n-gram value. The n-gram terms and their frequencies of occurrence in each record and each field are calculated by a pipelined table function that takes a ref_cursor as input. This calculation differs slightly from the traditional TF/IDF weight calculation: after the TF/IDF has been calculated for each n-gram of each field, the weights of all n-grams of that field are adjusted upward or downward according to the total weights of the n-grams in the other fields of the same record. This method dynamically adjusts, at the record level, the relative weights of n-gram matches according to the significance of each field's values.

As mentioned above, the unique identifier of each record in the data set, together with its n-gram terms and calculated weights, is stored in a partitioned index-organized table (IOT) to maximize join performance. The table is organized by unique identifier, data category and n-gram term value.
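The following Python sketch shows what an IDF table over two-character n-grams and the resulting per-record TF/IDF weights might look like for a single data category. It is a simplified illustration, not the PL/SQL and SQL implementation the text describes: it uses the base-2 logarithm mentioned earlier for the inverse document frequency, it omits the cross-attribute adjustment between fields, and the function names and sample records are hypothetical.

```python
import math
from collections import Counter


def bigrams(text):
    """List of two-character n-grams in the string, in order."""
    return [text[i:i + 2] for i in range(len(text) - 1)]


def idf_table(records):
    """IDF per bigram: log2(number of records / records containing it)."""
    doc_freq = Counter()
    for record in records:
        doc_freq.update(set(bigrams(record)))  # count each record once
    total = len(records)
    return {gram: math.log2(total / n) for gram, n in doc_freq.items()}


def tfidf_weights(record, idf):
    """Term frequency times IDF for every bigram in one record."""
    term_freq = Counter(bigrams(record))
    # Bigrams missing from the IDF table get weight 0 in this sketch.
    return {gram: tf * idf.get(gram, 0.0) for gram, tf in term_freq.items()}


# Hypothetical records for one data category, for illustration only.
records = ["ABAB", "ABCD", "CDEF", "EFGH"]
idf = idf_table(records)
print(tfidf_weights("ABAB", idf))
```

A bigram that appears in many records receives a low IDF and contributes little to a match, while a rare bigram carries more weight.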
Each unique data set is stored in a separate partition of the table. Each partition is loaded using "create table as select" (CTAS) and partition exchange operations to maximize load performance.

Join query
Once the binary keys and cross-attribute TF/IDF values have been calculated and loaded into the RDBMS, an analytical join query is used to retrieve candidate matching records and rank them by relevance, or quality of match, against the comparison record. This is done by joining records with matching binary key values, then joining the n-gram values of the candidate records and summing the products of their weights.

Confidence factor assignment
The results of the join query are passed through functions implemented in the RDBMS that perform a very low-level comparison of each incoming record with the candidate record and then assign a confidence factor using the statistical model of the Oracle data analysis application described above.

The approximate string matching processes described above are further illustrated by Figures 9 and 10, which are flowcharts 400 and 450 illustrating, respectively, the determination of a set of reference character strings and the use of the reference strings to determine the similarity metric of a candidate character string. The sample strings with the maximum loading on each component are retained to form the set of reference strings. These sample strings represent the principal components in order of correlation. The similarity metric is based on the number of n-gram matches between a candidate character string and the individual character strings in the selected set of reference character strings.

As shown in Fig. 9, the database includes a space 402 of potential candidate comparison data, which is sometimes referred to herein as a database of character strings (such as seller name and/or location data).
As described, a random sample of the match fields (or database records) is generated 404, based, for example, on a search optimized for a heterogeneous set of character strings. A similarity matrix is calculated 406, and principal component factor analysis 408 is applied to obtain the principal components 410, each of which corresponds to a reference character string. This set of reference character strings is used for comparison with candidate character strings because it was specifically generated to contain dissimilar data.

As shown in Fig. 10, after a candidate character string is received, the similarity between the candidate character string and the reference string associated with each principal component is calculated 452. As described here, this comparison can be based on the n-gram matching algorithm, so that a binary key is created 454 indicating the similarity of the candidate character string to each reference string and its corresponding principal component. For fast and efficient fuzzy matching, character string records (reference character strings) are joined 456 to candidate character strings based on a comparison of their binary keys. This process lets the user quickly find high-probability matches between the reference character strings (which may include seller name and/or location data) and a candidate character string, which may represent a seller name and/or seller location data. By creating 458 a binary key for each database record to be compared, a file of matches between reference character strings and candidate character strings can be generated 460.

Although the invention has been described in terms of various specific embodiments, those skilled in the art will recognize that the invention can be practiced with modification within the spirit and scope of the claims.

1.
An automated method for detecting patterns in payment card transaction data to determine the group membership of a seller within the transaction data, comprising: storing transaction data in at least one database, the database including data relating to sellers that accept payment cards for payment; retrieving transaction data with a first computer connected to the at least one database; using at least one prediction algorithm and the retrieved transaction data to predict a plurality of group memberships of the seller in a group of sellers, the algorithm being executed by the first computer; generating metadata describing each prediction output by the at least one prediction algorithm, the metadata being generated by the at least one algorithm; inputting the plurality of predicted group memberships for the seller and the metadata describing each prediction into data analysis software executing on a second computer; assigning, using the second computer, a confidence factor to each predicted group membership using the data analysis software, based at least in part on the predicted group memberships and the metadata, the confidence factor representing the probability that the seller is actually associated with the corresponding predicted group membership; and outputting, using the second computer, the predicted group membership with the highest confidence factor as the final prediction of the seller's membership.

3. The automated method of claim 1, wherein using at least one prediction algorithm and the retrieved transaction data to predict a plurality of group memberships comprises: tokenizing at least one field in the database; computing the inverse document frequency for all values of the tokenized fields in the database; calculating a sparse matrix of weight metrics for each database field value and each tokenized database field value; and generating a prediction by joining a given location field of the database to every other location field in the database based on one or more field types and field values using the sparse matrix, the sparse matrix including the seller category code, the Interbank Card Association (ICA) number, the business region, the seller name, the seller phone number, the acquirer identification number of the seller, the seller level identifier, the seller legal name and the federal tax identification number.

4. The automated method of claim 1, wherein using at least one prediction algorithm and the retrieved transaction data to predict a plurality of group memberships comprises calculating the relevance of a single seller location to a set of seller locations, the location being based on a calculated proximity value, and the proximity value being based on field values and tokenized field values in the database, and wherein calculating the relevance of a single seller location to a set of seller locations further comprises: extracting relevant features from a plurality of seller locations grouped into sets in order to generate a document for each set; merging the generated documents into a dictionary; forming a sparse matrix using the dictionary, whereby the relevance of each field value and tokenized field value in the generated documents is calculated using the extracted relevant features, based on at least one of term frequency and inverse document frequency; joining the seller location weight matrix to the seller group weight matrix based on the field types and field values in the sparse matrix; using the sum of the seller location weights and the seller group location weights in a relevance mechanism to determine the relevance of each seller location to each set of seller locations; and selecting the seller location with the highest relevance as the prediction.

5. The automated method of claim 1, wherein using at least one prediction algorithm and the retrieved transaction data to predict a plurality of group memberships comprises predicting, using a numerical signature algorithm and the observed tendency of sellers belonging to the same group to deviate from a distribution in a relatively consistent manner, the groups of locations whose numerical distribution is close to that of each seller location, the location being based on a calculated proximity value, and the proximity value being based on field values and tokenized field values in the database.

6. The automated method of claim 1, wherein using at least one prediction algorithm and the retrieved transaction data to predict a plurality of group memberships comprises: randomly selecting a seller from a group of seller data in the at least one database; calculating the distribution of the digits 1, 2, 3, 4, 5, 6, 7, 8 and 9 occurring in the first position of the number of transactions; summing the transaction volume in the seller group; calculating the angular distance between the calculated digit distribution and the digit distribution identified by Benford's Law; and outputting the seller group whose angular distance is closest to the calculated angular distance as the predicted seller group for the selected seller.

7. The automated method of claim 1, wherein using at least one prediction algorithm and the retrieved transaction data to predict a plurality of group memberships comprises using a plurality of prediction algorithms to provide a plurality of membership predictions for the seller, and wherein assigning a confidence factor to each predicted group membership comprises: assigning a confidence factor to each of the plurality of membership predictions for the seller; and providing the membership prediction with the highest confidence factor as the final membership prediction for the seller.

8. An automated system for detecting patterns in payment card transaction data to determine the membership of individual sellers in one or more groups of sellers using the transaction data, the system comprising: a processing device; and a database, the processing device being configured to execute instructions stored in memory that cause a computer to: write transaction data to the database, the transaction data including data relating to sellers that accept payment cards for payment; run a plurality of prediction algorithms, stored in the database, on the transaction data, each prediction algorithm predicting the membership of a seller in one or more seller groups on the basis of the transaction data, and at least one of the prediction algorithms generating metadata describing the prediction; input the metadata and the predicted group memberships into data analysis software; assign a confidence factor to each predicted group membership for the seller, based on the results provided by the data analysis software, the confidence factor representing the probability that the seller is actually associated with the corresponding predicted group membership; and output the group membership prediction with the highest confidence factor as the final prediction of the seller's membership.

9. The automated system of claim 8, wherein at least one of the algorithms stored in the database is configured to identify, in the transaction data, a set of database fields that are significant for extracting group membership.

10. The automated system of claim 9, wherein at least one of the algorithms stored in the database is configured to: search the seller location data in the transaction data for a plurality of seller locations within a specified distance of a given location; calculate mode values across the classification of the seller locations that occur within the specified distance of the given location; and return the most frequently occurring value as the prediction of group membership, the location and distance being based on a calculated proximity value, and the proximity being based on field values and tokenized field values in the database.

11. The automated system of claim 8, configured to run a plurality of prediction algorithms on the transaction data, wherein at least one of the algorithms is configured to: tokenize at least one field in the database; compute the inverse document frequency for all values of the tokenized database fields in the database; generate a sparse matrix of weight metrics for each database field value and each tokenized database field value; and calculate a prediction by joining a given location database field to every other location database field, based on one or more field types and field values in the sparse matrix.

12. The automated system of claim 8, configured to run a plurality of prediction algorithms on the transaction data, wherein the automated system is programmed to calculate the relevance of a single seller location to a set of seller locations in the transaction data, the location being based on a calculated proximity value, and the proximity value being based on field values and tokenized field values in the database.

13. The automated system of claim 12, configured to calculate the relevance of a single seller location to a set of seller locations in the transaction data, wherein the automated system is programmed to: extract relevant features from a plurality of seller locations grouped into sets in order to generate a document for each set; merge the generated documents into a dictionary; form a sparse matrix, using the dictionary, to calculate the relevance of each field value and tokenized field value in the generated documents using the extracted relevant features, based on at least one of term frequency and inverse document frequency; and join the seller location weight matrix to the seller group weight matrix based on the field types and field values in the sparse matrix, wherein, for each set of seller location data, the automated system is programmed to use the sum of the weights in a relevance mechanism to determine the relevance of each location to the seller group.

15. The automated system of claim 8, wherein, to run a plurality of prediction algorithms on the transaction data, the computer system is programmed to use data provided by a third party that have been matched to the payment card transaction database using seller location, the location being based on a calculated proximity value, and the proximity value being based on field values and tokenized field values in the database.

16. The automated system of claim 8, wherein the location is based on a calculated proximity value, and the proximity value is based on field values and tokenized field values in the database, and wherein, to run a plurality of prediction algorithms on the transaction data, the automated system is programmed to: use third-party location data that have been matched to the transaction database of the payment card brand, the third-party location data containing a chain identifier; and link the chain identifiers to the seller location data associated with the payment card brand.

17. The automated system of claim 8, wherein the location is based on a calculated proximity value, and the proximity value is based on field values and tokenized field values in the database, and wherein, to run a plurality of prediction algorithms on the transaction data, the computer system is programmed to: use an approximate seller location matching mechanism to join a third-party set of seller location records to the set of seller location records; and calculate the predicted group for a given location as the seller group corresponding to the joined set for the given location.

18. The automated system of claim 17, further programmed to assign the corresponding confidence factor as assigned by the approximate seller location matching mechanism.

19. The automated system of claim 8, wherein, to run a plurality of prediction algorithms on the transaction data, the automated system is programmed to: randomly select seller data from a group of seller data; calculate the distribution of the digits 1, 2, 3, 4, 5, 6, 7, 8 and 9 occurring in the first position of the number of transactions in the transaction data; and sum the transaction volume in the seller group.

20. The automated system of claim 19, further programmed to: calculate the angular distance between the calculated digit distribution and the digit distribution identified by Benford's Law; and output the seller group whose angular distance is closest to the calculated angular distance as the predicted seller group for the selected seller.