Method for processing information for detecting identification signs in informational streams

FIELD: computer engineering, possible use in devices for controlling informational streams to monitor informational traffic.

SUBSTANCE: method includes preliminarily forming a base of standard informational signs, receiving informational stream, serially selecting and recording fragments of informational stream, selecting identification signs from these, comparing them to standard ones. Base of standard informational signs is formed by computing morphological coefficient d of identification sign and its address A with utilization of hash-function. For identification signs received from informational stream, morphological coefficients d and identification addresses A are additionally computed, after that on basis of computed address, identification sign selected from informational stream is compared to standard one.

EFFECT: increased information processing speed due to shorter time needed for identification of addresses of signs in base of standard informational signs.

4 cl, 2 dwg, 2 tbl

 

The invention relates to the field of computer science and can be used for processing of information flows and found in the given reference information signs. The method can be used in the device control information flows to monitor the data traffic (Traffic is a set of messages transmitted by telecommunication network (RF Government Decree of 19 October 1996 1254 N "On approval of the Rules of accession departmental and dedicated telecommunications networks to telecommunication networks of General use", p.2).

There is a method implemented in the device searched by the patent RU No. 2133500, CL G 06 F 17/30 declared 20.07.1999, the Known method includes the following steps. Pre-memorize the format of the Protocol data unit Frame Relay (G), the types of transmitted messages Vn=(a,b,c,i, T391), rules for the use of commands and responses R, the possible States of users Vt=(S1, S2, S3). Accept the flow of information, distinguish it from the data blocks, determine the type of the transmitted data block and the format of the data block. Compare the structure of the data block with a reference G. On the basis of this coincidence conclude about using or not using the Protocol FR.

Nedostatki is this method is the narrow scope of its application, since the known method can only work with the structures of the Protocol FR and, as a consequence, the lack of consistent performance when compared when using this method in systems with large number of attributes.

Also there is a method implemented in the device searched by the patent RU No. 2116670, CL 6 G 06 F 17/30, Appl. 27.07.98. The known method includes the following steps. Pre-memorize the structure of the combinations of the beginning of the message (CND), a threshold number of message (SPW, Smo), the maximum values of the symbolic sample of Nmaxtake the flow of information, select from it a sequence of characters N<Nmaxallocate in the received sequence CND, summarize the total number of SPS, compare the sum with the values of SPW, Smoon the basis of comparison make a conclusion about the changes in the intensity of information flow.

The disadvantage of this method is that the conclusion about the status of the traffic is done only on the basis of the increase or decrease in the number of SPS that do not fully reflect the changes in the intensity of traffic in the communication channel.

The closest in technical essence is a method that is implemented in the device information processing for information retrieval is the patent of the Russian Federation No. 2096825, IPC 6 G 06 F 17/00, G 06 F 17/30, Appl. 20.11.1997. Prototype method is that the pre-form the basis of the reference information value, subject to detection in the flow of information, store them, remember the number of characters in the processed text fragment (TF), remember the number of characters in words (phrases), remember the number of digits and special characters in TF, remember pre-selected combination of symbols corresponding to the structural characteristics of TF, set rules for the allocation of TF from the information stream.

Accept the flow of information, remember to predefined rules next TF. Extracted from TF words and phrases, which use pre-stored structural characteristics. Remember, TF, which is written in the memory words and phrases consistently similar positions in the selected timeframe. Compare the memorized words and phrases from the selected TF, for which: choose brute force attack from the memory of words (phrases), determine the number and type of characters in the word selected for the presence of only numbers and (or) spectracom, compare the number of characters with a reference value and remembering the data comparison. Remember data on the number of repetitions of the word in TF (about the same number of words), Zap minut data on the number of matching character patterns. Compare the selected characteristic with reference contained in the reference information signs. If there is a match is detected believe the search feature.

In comparison with analogues prototype method allows to improve the performance of search information, identify the most characteristic of the text words and phrases, taking into account their frequency of occurrence and allows you to automate the search of necessary text information.

The disadvantage of the prototype is the relatively low speed of information processing through the use of algorithms sequential search, which requires for its implementation a large amount of memory and places high demands on the computational resources of the computer. This is because with the increase of the traffic load increases the processing time required text units (words, phrases, etc. and, therefore, increases the total processing time of the entire array of information signs. The increase in memory and the necessity of increasing computing resource leads to unnecessary economic costs.

The purpose of the claimed technical solution is to develop ways to increase the speed of information processing.

This objective is achieved in that in the known method of processing information is AI, namely, that the pre-form the basis of the reference information signs (BEIP)subject identification in the information flow, accept the flow of information, consistently will remember fragments of the received information stream, of which there are rules and information signs, compare them with the reference information signs from BEEP and by comparing the results record the presence or absence of each fragment of information flow identification signs, subject to detection. New in the claimed method is that for the formation of BEIP choose a set of N≥1 reference information signs, allocate contained in them and different from other characters. Then from the selected symbols form the alphabet characters (a-C), calculate the number of S contained characters, assign the j-th (where j=1,2,...,S, the symbol number njits position in the alphabet characters and hope for a given value of the fill factor TO BEEP its volume of Nk=N/K. then for the i-th one, where i=1,2,...,N, the reference information characteristic compute the number of miforming its characters, and its morphological factor diand calculated using the hash function given as f(diaddress reference the information the aqueous symptom of A i=f(di). Then remember the i-th reference information characteristic in BEEP at the position corresponding to the address Ai. For the selection of each piece of received information flow information signs distinguish in it a group of binary digits that are located between adjacent to each other two spaces, decode it to mean information sign, calculate its morphological factor and address. Then compare selected and decoded information characteristic with the reference information signs, stored at this address in BEEP.

For the i-th one, where i=1,2,...,N, the reference information characteristic compute its morphological factor diaccording to the formula:

where nj- the position number of the j-th symbol in the alphabet of symbols, mi- the number of characters forming the i-th attribute, S is the number of symbols of the alphabet of symbols, j=1,2,..., mi- the character position in the i-th attribute.

As a hash function for calculating the address of variable Ai=f(diuse the function

Thanks to the new essential features of the claimed method is achieved by reducing the time of identification addresses signs in BEEP. The search time does not depend on the volume BEEP (unlike sequential or iterational the way), therefore, the maximum achievable winning time characteristic in BEIP can reach several orders of magnitude (depending on the amount BEEP). Marked indicates the possibility of achieving the goal of increasing the speed of information processing.

The analysis of the prior art information processing has allowed to establish that the analogues, characterized by a set of characteristics is identical for all signs technical solutions, there are no available sources of information that indicates compliance of the claimed method the condition of patentability "novelty".

Search results known solutions in this and related areas of technology in order to identify characteristics that match the distinctive features of the prototype of the features of the declared object has shown that they do not follow explicitly from the prior art. The prior art also revealed the popularity of influence provided for distinguishing the essential features of the claimed invention transformations on the achievement of the technical result. Therefore, the claimed invention meets the condition of patentability "inventive step".

The claimed method is illustrated by drawings on which is shown:

figure 1 - example of construction BEEP;

figure 2 - example of digital information stream containing comy sign.

Used in counterparts and the prototype method of finding information is consistent. The search process is reduced to the comparison of each character trait consistently with all symbols of each characteristic in BEEP that because of the low speed processing algorithm, as well as large amounts of database reference information signs, leads to an unnecessary waste of time resources.

Let for the formation of the base reference information of the characteristics of the selected N attributes, each long M characters. In other words, if one iteration takes time t, then for all iterations will be spent: T=N·M·t. When practically used at present BEEP tens of thousands of signs, the above method used in counterparts, on the structure of the execution of the search algorithm cannot provide required at the moment the speed of the search of necessary signs.

Consideration of the claimed method it is advisable to spend on the following example. Let for the formation of the base reference information of the characteristics of the selected N=100 signs, of which the first seven are set to: Bank, iron, mask, machine, frame, aircraft, people, and hundredth sign is set 1985-rise. Of these N selected characteristics distinguish contained in them and different from other characters and form the alphabet of symbols"(AU) by assigning to each symbol of the sequence number in the AC. We assume that all N signs contain characters that are summarized in table 1. Each character of the speaker is assigned a number nj. For example, the character "W" is a number n=8, the symbol "9" is the number n=24, and so on

Table 1
nj123456789101112131415161718192021222324252627
Symbol namespaceAndBInGDEW3AndToLMNOPPTHWB1985-

The composition of the AU contains a set of different symbols, sufficient for the compilation of any of the N pre is sustained fashion selected features.

Then calculate for a given value of the fill factor TO BEEP its volume of Nk=N/K, i.e. the number of rows in the generated BEEP. The fill factor value BEEP TO choose lying in the range [0,2-0,5]. At smaller values decrease the efficiency of memory usage for large - increase the difficulty of ensuring nepovtorimosti address signs in BEIP (Knuth D. the Art of computer programming. V.3. Sorting and searching. TRANS. from English. - M.: Publishing house of William, 2003. - 560 C.).

We believe that in this example K=0.2, respectively the number of rows in the reference information signs will be equal to:

Nk=100/0,2=500.

Next, for each i-th characteristic compute its morphological factor di. In General, it may be calculated in various ways. When using the decimal system for determining the address position of the reference characteristic in BEEP, morphological factor dithe i-th reference characteristic calculated by the formula (1).

As an example, consider the evaluation of the morphological factors signs of "Bank" and "1985-up".

For the characteristic "Bank":

D1=3·273+2·272+14·27+11=60896

For the characteristic "1985-up":

d100=23·2710+24·279+25·278+26·277+27·276+16·275+15·274+6 on; 273+22·272+7·27+13=4925853783453586

Similar calculations performed for the considered characteristics are summarized in table 2:

Table 2
№ p/p (i)The number of characters in the attribute (j)The number of characters in AC (S)SignMorphological factor (di)
1427BANK60896
2627IRON75706179
3627MACHINE188019684
4727PLANE7009479550
5627PEOPLE7855527742
6527MASK6961520
7427FRAME672491
...............
...............
10011271985-RISE4925853783453586

Next, taking into account the calculated values of the morphological factor determine the address Aieach of the i-th characteristic using a predetermined hash function (formula (2)), i.e. define the position of a reference characteristic in BEEP.

For example, address signs And1("the Bank") and a100("1985 - up") in BAIP will have the value:

A1(the"Bank")=(mod 60896 500)+1=396+1=397;

And100("1985-up")=(4925853783453586 mod 500)+1=86+1=87.

Similar calculations performed for the considered characteristics are shown in table 1.

Then take, for example, in telecommunication networks, information flow in the form of a case of double-bit digital electromagnetic signal containing the desired characteristics.

Processing the received digital stream, perform the following way.

Allocate a piece of digital signal between adjoining to each other by gaps (figure 2). White space characters in the data stream indicate in binary form the international code "00100"recommended by the CCITT (see, for example, V.A. Grigoriev. Message transfer by foreign information networks. - Leningrad: YOU, 1989. - 18-19 C.).

Enclosed between the nearest gaps sign decode to view the desired information sign, for example, identified the topic "Bank". The decoding order is known and described in literature, for example the book: Waheguru. Message transfer by foreign information networks. - Leningrad: YOU, 1989. - 14-27 C. Then, similarly to the example above, calculate its morphological factor di(formula 1) and the address Andi(formula 2), in this case we have: d1=60896, A1=397. Then compare it with the reference information signs, stored at this address in the reference information signs. From BEIP (see figure 1) are located on the found address And1=397 sign "Bank" and compare it with the characteristic extracted from the received digital stream (in this example, the sign "Bank"). The coincidence of the compared characteristics provides a basis for fixing presence in the received information stream of the desired attribute.

Thus the inventive method, the search process is reduced to the selection of BEIP sign, located at the computed address, and subsequent comparison of the reference and the selected information flow characteristics.

The practical implementation of the proposed method does not require a high performance CPU, since the search time does not depend on the amount of database reference information signs (in contrast to sequential or iterative methods), so the maximum possible win on their can be several p the rows (depends on the amount of base reference information signs).

Comparative evaluation of search speed characteristics using the proposed and the prototype method of information processing for detecting the identification features in the information flow can be reviewed at the next example.

Assume that the base reference information signs (dictionary of signs) has N=50000 signs, each of length M=8 characters. Upon detection of a symptom of a search from top to bottom, left to right. The maximum number of steps (iterations) for the sequential search method (prototype) for the case when the sign is in the string N:

Rmax=N·M →Rmax=50000·8=4·105.

The minimum number of iterations in the sequential search method for the case when the sign is in the first line:

Rmin=N·(M-1)+1, →Rmin=50000·(8-1)+1=350001.

When using the inventive method to locate a characteristic maximum number of operations is equal to two.

1. Calculate the morphological factor diand the address of Aithe i-th attribute.

2. Compared with the reference information signs, stored at this address in the reference information signs.

If one iteration takes time t=Δt, then for all iterations will be elapsed time: T=N·M·Δt.

For the nearest analogue of the Ton the l =4·105·Δt.

For the inventive method T3=2·Δt.

Therefore, the relative benefit V time needed to search for the sign will be:

Thus, of the essence of the claimed method can be seen that it provides reduced time identification address signs in BEEP. This achieves the stated aim of raising the rate of information processing.

1. The method of information processing for detecting the identification features in the information flow, namely, that the pre-form the basis of the reference information signs, subject to the identification of the information flow, accept the flow of information, consistently will remember fragments of the received information stream, of which there are information signs, compare them with the reference information signs from the database of reference information signs and the results of the comparison record the presence or absence of each fragment of information flow identification signs, subject to detection, characterized in that for forming the base of reference signs choose a set of N≥1 reference information signs, allocate contained therein and wherein the other is from the other characters, form of them the alphabet of symbols, calculate the number of S contained characters, assign the j-th (where j=1,2,...,S, the symbol number njits position in the alphabet characters and hope for a given value of the fill factor To base reference information signs its volume of Nk=N/K, then the i-th one, where i=1,2,...,N, the reference characteristic compute the number of miforming its symbols, and its morphological factor diand calculated using the hash function given as f(diaddress reference information characteristic of Ai=f(di), then remember the i-th reference information characteristic in the reference information signs at the position corresponding to the address Aiand for the selection of each piece of received information flow information signs distinguish in it a group of binary digits that are located between adjacent to each other two spaces, decode it to mean information sign, calculate its morphological factor and address.

2. The method according to claim 1, characterized in that the morphological factor is calculated by the formula

where nj- the position number of the j-th symbol in the alphabet of symbols;

mi- the number of characters forming the i-th characteristic;

S - the total number of SIM the tins in the alphabet of symbols;

j=1,2,...;

mi- the character position in the i-th attribute.

3. The method according to claim 1, characterized in that the fill factor value database reference information signs are chosen in the interval K=0,2÷0.5 in.

4. The method according to claim 1, characterized in that when calculating the addresses Aichoose a hash function of the form f(di)=([di]new games available at)+1).



 

Same patents:

FIELD: computer engineering, in particular, informational-reference system of industrial-economical characteristics of airlifts.

SUBSTANCE: system contains two registers, data commutation block, block for selecting automated workplace of user, block for identification of type of data being requested, block for selection of viewing direction of reference data, block for commutation of synchronization signals, reverse counter, block for receiving database update files, block for identification of type of data being updated, two blocks for comparing codes.

EFFECT: increased speed of system operation due to no need for searching information across whole volume of server database.

10 dwg, 1 app

FIELD: computer science, in particular, engineering of automatic system for controlling routing of text documents in data processing network.

SUBSTANCE: system contains first, second, third, fourth, fifth and sixth registers, first and second blocks for identification of text documents, block for integration of control signals, two counters, adder, block for forming base address for recording finished documents and commutator.

EFFECT: increased speed of system operation by localizing range of data search addresses in server database by means of identifiers of text documents.

6 dwg

FIELD: computer engineering, possible use as device for structural-statistical analysis of information arrays.

SUBSTANCE: device contains generator of signals of current estimate, discriminator of zones of estimate values, distributor of impulses, counter of temporal intervals, commutator, first and second generators of search variable, first and second adding counters, first and second memory blocks, division block, classification device, register of search strategy, signals generator, timer of current day, block for generation of cutting threshold, structural analyzer and third memory block.

EFFECT: possible recognition of target determined combinations, representing n-digit binary numbers.

2 cl, 2 dwg

FIELD: engineering of automated libraries for data storage with loading, unloading and movement of data carriers.

SUBSTANCE: library contains robotized transporting device, for moving data carriers, and multiple universal sockets with means for connecting accumulators positioned in sockets or other devices to transporting device, to which commands from main computer are sent for moving data carriers. Robotized device is programmed for recognition, whether each socket is free or occupied with a certain component, such as an accumulator or command port, and following realization of appropriate communication with component occupying the socket.

EFFECT: decreased hardware costs with adjustable configuration and use of different system components.

6 cl, 5 dwg

FIELD: computer science, in particular, automated identification of data of voting sheets of voters in national automatic system "Elections".

SUBSTANCE: system contains block for receiving data from voting sheets, block for receiving records of server database, block for setting type of signatures, block for selecting supporting addresses of server database, counter of signatures number, comparators, blocks for selecting supporting addresses of signature types, blocks for modification of addresses for recording and reading signatures, block for selecting types of signatures, block for selecting number of checked signatures of voters and block for forming signals for recording and reading signatures of voters.

EFFECT: increased speed of operations due to localization of addresses of documentary data of civilians in database of system by identifiers of their surname, name and patronymic.

12 dwg

FIELD: radio engineering, possible use as mobile communication system for realizing contact with a celebrity in form of a game.

SUBSTANCE: system contains at least two receiving-transmitting personal devices, local control device, and central control device. At least one additional transmitting-receiving device of a celebrity is provided. Receiving-transmitting personal devices are made in form of mobile communication terminals of users, local control device - in form of station of mobile cell phone communications operator, and central control device - in form of a server.

EFFECT: increased efficiency, realized game effect during making of contacts.

6 cl, 3 dwg, 2 tbl

FIELD: computer science, in particular, engineering of internet-banking system for information-marketing electronic trading center.

SUBSTANCE: system contains user identification block, block for identification of user requests, block for selecting bearing addresses of information-marketing center database, block for selection of user workplace addresses, register, block for selection of transaction addresses, block for selecting addresses of payments, block for forming signals for recording and reading for database, data dispensing block, block for identification of transactions, block for receiving dialogue messages, block for providing notifications to suppliers, block for providing notifications to buyers.

EFFECT: increased reliability of financial payments by excluding possible receipt of payment by goods supplier before warehouse of buyer receives goods.

13 dwg

FIELD: computer science, in particular, engineering of system for controlling selection and processing of governmental population register data.

SUBSTANCE: system contains registers, address selectors, adders, counters, data receipt block, block for generating temporal strobes, comparator, decoder, block for predicting quantities, data dispensing block, OR element and delay elements.

EFFECT: increased speed of operation of system due to localization of search only by supporting addresses of database of Russian Federation subjects.

10 dwg, 2 tbl

FIELD: computer science, in particular, automated system of state population register.

SUBSTANCE: system contains first and second registers, block for selecting base address of records, block for forming current address of records, selector of recording and reading modes, control block, counter, first and second comparators, data commutation block.

EFFECT: increased speed of operation of system by localization of addresses of database recording by typical identity identifiers.

6 dwg

FIELD: engineering of information processing systems.

SUBSTANCE: system contains master-system for processing information, interface, central communication device, client system for processing information, object model. In accordance to method each master system sends to central communication device elements of its data array determined in appropriate master-representation, while in master-representation of connected master system elements of data array are contained, for which system has data priority.

EFFECT: simplified specification and development of interfaces between technical applications.

2 cl, 6 dwg

FIELD: data access technologies.

SUBSTANCE: method includes assignment of simplified network address, recording URL and converting numbers into storage system with net access, inputting assigned number into computer, transferring inputted number to storage system, converting number to URL, receiving page matching URL, and displaying it. Method for use in operation systems for message transfer include intercepting system level messages to certain objects and forming pseudonym messages during that. Systems realize said methods.

EFFECT: broader functional capabilities.

12 cl, 30 dwg

FIELD: computers.

SUBSTANCE: system has entries memory block, words memory block, control block, substitutions block, n blocks for searching and replacing.

EFFECT: broader functional capabilities.

17 dwg

FIELD: computers.

SUBSTANCE: system has nine registers, four address selectors, triggers, AND elements, OR elements and delay elements.

EFFECT: higher speed.

8 dwg

FIELD: computers.

SUBSTANCE: system has operation mode setting block, first and second blocks for selecting records addresses, block for forming addresses for reading records, data output block, first and second record codes comparison blocks, records quality comparison block, year intervals comparison block, records selection control block, register, adder and OR elements.

EFFECT: higher speed of operation.

10 dwg

FIELD: computers.

SUBSTANCE: system has memory for programs, including browser, display block, database for storing documents, addressing control block, while each document of base has at least one link with indicator of its unique number and indicator with address of program for control stored in addressing control block, system contains also, connected by data buses and control of other blocks of system, memory for links of couples of unique numbers of links and forming means for lists of unique numbers of documents links, which are interconnected.

EFFECT: higher efficiency.

2 cl, 1 dwg

FIELD: telecommunication networks.

SUBSTANCE: messages, sent by cell phones, are formed by means of printed and public-distributed classifier, wherein at least one category is made with possible detection of at least one identifier of individual mark of object, identifier is sent by sender via at least one message to computer server with software, which transfers such message into database record at server for its transfer to at least one receiver, or searches for such record in database at server in accordance to received message and transfers to sender of such message at least one found database record.

EFFECT: broader functional capabilities.

2 dwg

FIELD: web technologies.

SUBSTANCE: method for integration of printed business documents, requiring original signature, with electronic data concerning these documents and later extraction of data, inputted for forming documents, is characterized by steps for forcing end user or agent to input all necessary data for forming of required document, saving collected data in database, linking saved data to unique ID code and printing unique ID code on printed document during printing. Printed documents is signed by end user and sent together with supporting documentation. When document is received by business-client, business-client inputs ID code, which is then used for access to saved data, and updates private database of business-client with all data, used for creation of original documents.

EFFECT: higher efficiency.

2 cl, 7 dwg

FIELD: computer science.

SUBSTANCE: device has string memory block, comparator, memory block for words and substitutes, block for analysis and forming of displacement results, block for storing string address, control block.

EFFECT: broader functional capabilities, higher reliability.

10 dwg

FIELD: data bases.

SUBSTANCE: method includes presenting operations at all levels of company in form typical product life cycle tree, wherein existing objective functional-technological connections of each manufacture stage are decomposed, and forming information system in form of pertinent-relevant complex information system and search, for which typical structure-information modules of information system are formed, system objective information requirements of data consumers, being a result of decompositions by levels of operations and problems, are determined as precisely as possible, data base of found documents in form of files is formed of key nodes with set of elementary data block for each system information requirement and files of information system modules, starting from lower levels of current stage and then upwards, while each data block has a list of pertinent documents ordered by determined information requirements.

EFFECT: higher search efficiency.

13 cl, 11 dwg

FIELD: computer science.

SUBSTANCE: system has first, second, third, fourth and fifth registers, first and second memory blocks, first, second and third decoders, triggers, elements AND, OR and delay elements.

EFFECT: higher speed of operation.

1 dwg

Up!