Computer system and method for preparing text in a source language and translating it into foreign languages


(57) Abstract:

The invention relates to a computer system for creating and translating documents, namely for preparing text in a restricted language and translating it into a foreign language. The technical result is a translation system that requires neither pre-editing nor post-editing. In an integrated computer system containing a processor, a text editor imposes lexical and grammatical constraints on a subset of the natural language used by authors to create text. The resulting translatable text in the source language is translated into any of a set of target languages. 7 independent and 44 dependent claims, 9 figures.

The invention relates to a computer system for creating and translating documents and, in particular, to a system for preparing text in a restricted language and translating it into a foreign language without the need for pre-editing or post-editing.

Any organization whose work requires creating large amounts of information in a variety of document types needs to ensure that those documents are fully understandable. Ideally, the language used has all the expressive attributes needed to optimize communication. This language should be applied consistently, so that the organization can be recognized by a single, stable style of expression or voice. It should also be free of ambiguity.

The pursuit of this ideal in writing has led to a number of techniques designed to control the author's writing. However, authors differ in ability, experience and education, and are not easily confined to the uniform framework of a standard approach. Writing guidelines, rules and standards have not been effective: they are difficult to define and to enforce. Previous attempts to standardize writing and improve its quality at the same time have had mixed results, and whether or not those results were positive, they inevitably increased the cost of authoring documentation.

Recent attempts to give authors a software environment that would increase their productivity and improve the quality of their documents have had only limited success.

When information must cross language barriers, all these difficulties multiply. An organization's need for a clear channel for its information then becomes significantly, if not completely, dependent on translation.

Texts have been translated from one language into another for many years. Before the advent of computers, such translations were done entirely by hand by professional translators who knew both the language of the source text (the original) and the language of the target text (the translation). In general, preference was given to translators who had first learned the target language as their native language and later learned the source language. This approach was considered to yield the most accurate and efficient translations.

Even the most qualified translator needs a significant amount of time to translate one page of text. For example, it has been estimated that a qualified translator working on technical text from English into Japanese can translate approximately 300 words (about one page) per hour. This shows how much time and effort translation requires.

Over the last hundred years, the need for translation in business and trade has grown steadily. This has been caused by a number of factors. One is the rapid growth in the volume of text associated with international business operations. Another is the large number of languages into which texts must be translated so that a company can do business on a global scale. A third is the rapid pace of trade, which leads to frequent revisions of text documents, each of which must in turn be translated.

Many organizations are responsible for creating and disseminating information in multiple languages. In the global market, manufacturers must ensure that their instructions are widely available in the native languages of the countries that are their target markets. Manual translation of documents into foreign languages is a costly, time-consuming and inefficient process. Translations are usually inconsistent because of the individual interpretations of translators, who are not always fluent in the specialized language of the documentation's subject area.

In research and development, the explosion of knowledge over the last century has likewise led to an exponential increase in the need to translate documents. Today there is no longer one dominant language for documents in a given area of research and development. Typically, such research and development is conducted in several major industrialized countries, such as the United States, Britain, France, Germany and Japan. In many cases there are additional languages in which important documents relating to a specific field appear. Technical and technological progress, especially in electronics and computer science, has further accelerated the creation of text in all languages.

The ability to create text is directly proportional to the performance of the technology used for it. For example, if documents must be written by hand, an author can write only a limited number of words per minute. Productivity increased considerably with the advent of typewriters, mimeograph machines and printing presses, and later developments increased it further. Today, the average author can create a much larger amount of text per unit of time than was possible with handwriting in the past.

This rapid growth in the volume of text, combined with major technical advances, has drawn considerable attention to the problem of translating from a source language into one or more target languages. A significant amount of research has been carried out in universities and in private and government laboratories. This research has been devoted to translation without the intervention of human translators. Computer systems have been designed to implement so-called machine translation (MT). Such a computer system is programmed to translate source text, given as input, automatically into target text, produced as output. However, researchers have found that fully automatic machine translation of this kind cannot be realized with current technology and theoretical knowledge. Today there is no system that can machine-translate text in a natural source language into text in natural target languages without some form of editing. One of the existing methods is discussed below.

In the pre-editing process, the source text is first reviewed by a source editor. The source editor's task is to change the source text so as to bring it into what is called the optimal condition for translation by the machine translation system. The source editor learns what this condition is by trial and error.

The pre-editing process described above may pass through several stages or iterations involving source editors of increasing skill and competence. The source text prepared in this way is passed to the machine translation system. The output is text in the target language which, depending on the purpose of the translation or the user's quality requirements, may or may not be post-edited.

If the required translation quality is to be comparable to that of a skilled human translation, the output of machine translation will most likely have to be post-edited by a qualified translator. This reflects the limits of the machine translation systems that can be built with current technology within the obvious constraints of time and money and with reasonable expectations of economic return. In fact, most systems built to date require post-editing in order to approach, to varying degrees, the quality of purely human translation.

One such system is KBMT-89 (see below), developed by the Center for Machine Translation at Carnegie Mellon University, which translates from English into Japanese and from Japanese into English. It is a knowledge-based system whose domain model assists in interactive disambiguation (that is, in editing the document to make it unambiguous). However, this disambiguation is usually not carried out in real time or in interaction with the author. As soon as the system encounters an ambiguous sentence that it cannot resolve, it is forced to stop processing and have the ambiguity resolved. In addition, because the system does not use a sufficiently well-defined and controlled input language, its translator-assisted interactive disambiguation produces inaccurate output text that needs post-editing.

In view of the above, the benefits of a translation system that eliminates both pre-editing and post-editing are obvious.

The invention is a system of integrated computerized procedures for preparing documents in one language and translating them into several languages. An interactive computerized text editor imposes lexical and grammatical constraints on a subset (a reduced version) of the natural language used by authors in their texts, and helps authors resolve ambiguity in their texts to ensure that the texts are translatable. The resulting translatable text in the source language is translated into any of multiple target languages, and the translated text requires no post-editing.

Fig. 1A and 1B are high-level block diagrams illustrating the architecture of the present invention.

Fig. 2 is a high-level flow diagram illustrating the operation of the present invention.

Fig. 3 is a high-level diagram illustrating the information flow and architecture of MT 120.

Fig. 4 is an example of an information element.

Fig. 5 is a block diagram of a domain model 500.

Fig. 6 is a high-level diagram illustrating the operation of the language editor 130.

Fig. 7 is a diagram illustrating the operation of the vocabulary checker 610.

Fig. 8 is a high-level diagram of the disambiguation block 630.

Fig. 9 is a block diagram illustrating the information flow and architecture of MT 120.

I. General characteristics of the integrated system

A computerized system in accordance with the present invention functionally integrates the following components:

1) an authoring environment for document preparation,

2) a module for accurate machine translation into multiple languages without pre-editing or post-editing. Using this technology to produce multilingual documentation gives the user consistently accurate, timely and cost-effective translation, in large or small volumes, with almost simultaneous release of the information in the source language and in the languages into which the source text is to be translated.

1) in a multicultural and multilingual business environment, information is not considered fully prepared unless it can be presented in the various languages of its users,

2) combining authoring and translation within a single system yields efficiency gains that could not be achieved otherwise.

Fig. 1A shows a high-level block diagram of the Integrated System for Authoring and Translation (ISAPP) 105. ISAPP 105 provides a specialized computer environment for authoring documentation in one language and translating it into various other languages. These two separate functions are supported by an integrated group of programs, as follows.

1) Authoring: one subgroup of programs provides the user with a text editor (TR) 140, which lets authors create monolingual texts within the lexical and grammatical constraints of an application-specific subset of natural language, here called the restricted source language (OIA) or restricted language. In addition, it helps the author make the text unambiguous, which makes the text translatable without pre-editing.

2) Translation: another subgroup of programs provides the machine translation (MT) function 120, which can translate OIA text into as many target languages as the generating module has been programmed for; the resulting translation requires no post-editing.

For a system in which translation is the central component, integrating the authoring and translation functions within a single system, as in the present invention, is the only approach developed to date that eliminates both pre-editing and post-editing.

The text editor (TR) 140 is a set of tools that support authors and editors in creating documents in OIA. These tools help authors use the required OIA vocabulary and grammar in writing their documents. TR 140 communicates directly with the author 160, and vice versa.

As can be seen from Fig. 1B, ISAPP 105 is divided into four main parts that perform the authoring and translation functions, namely: (1) the restricted source language (OIA) 133, (2) the text editor (TR) 140, (3) the MT module 120 and (4) the domain model (MO) 137. The text editor 140 includes the language editor (YAR) 130 and the graphics editor 150. In addition to these components, a file management system (suf) 110 is included to manage all of the system's processes.

OIA 133 is a subset of the source language whose grammar and vocabulary cover the special domain of the author's documents that are to be translated. The size and composition of OIA are determined by the vocabulary and grammatical structures needed for that domain, so as to make translation possible without pre-editing or post-editing.

The language editor (YAR) 130 communicates with the author 160 (and vice versa) through the text editor 140. The author has a two-way communication line 162 with the text editor 140. YAR 130 informs the author 160 whether the words and phrases the author uses are in OIA. YAR can also suggest OIA synonyms for words that are relevant to the domain of the document being created but are not in OIA, and it informs the author whether the text satisfies the grammatical constraints of OIA. Further, it helps the author resolve the ambiguity of sentences that may be syntactically correct but are not semantically unambiguous.

MT 120 is divided into two parts: the MT analyzer 127 and the MT generator 123. The MT analyzer 127 serves two purposes: it parses the document to verify that it conforms unambiguously to OIA, and it produces a representation of the text in an intermediate language called "Interlingua". The Interlingua text, once checked and approved with respect to OIA, is then translated into the chosen foreign (target) language. MT 120 thus implements an interlingua-based approach to translation. Instead of translating a document directly into another language, MT 120 first converts the document into a machine-readable form that is independent of any particular language, called the interlingua, and the MT generator 123 then generates the translation from this intermediate text. As a result, the documents require no post-editing. For each language a separate version of MT 120 is created, consisting primarily of a set of knowledge sources intended to guide the translation process; supporting a new target language requires developing a new MT generator 123.
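The division into an analyzer and per-language generators can be illustrated with a toy sketch. Everything in it is an assumption made for illustration: the `Frame` structure, the two-word imperative grammar, the vocabulary and the French and German tables are hypothetical and are not taken from the invention.

```python
from dataclasses import dataclass

# Hypothetical toy interlingua: a language-neutral frame for one sentence.
@dataclass(frozen=True)
class Frame:
    action: str
    obj: str

# Illustrative restricted vocabulary (assumption): two verbs and two nouns.
VERBS = {"remove": "REMOVE", "install": "INSTALL"}
NOUNS = {"filter": "FILTER", "pump": "PUMP"}

# One table per target language; supporting a new language adds one table.
TARGETS = {
    "fr": {"pattern": "{verb} {obj}",
           "REMOVE": "retirer", "INSTALL": "installer",
           "FILTER": "le filtre", "PUMP": "la pompe"},
    "de": {"pattern": "{obj} {verb}",
           "REMOVE": "entfernen", "INSTALL": "installieren",
           "FILTER": "den Filter", "PUMP": "die Pumpe"},
}

def analyze(sentence: str) -> Frame:
    """Check the sentence against the toy restricted language and build a frame."""
    words = [w for w in sentence.lower().rstrip(".").split() if w != "the"]
    if len(words) != 2 or words[0] not in VERBS or words[1] not in NOUNS:
        raise ValueError(f"not in the restricted language: {sentence!r}")
    return Frame(action=VERBS[words[0]], obj=NOUNS[words[1]])

def generate(frame: Frame, language: str) -> str:
    """Produce target text from the frame alone, without the source sentence."""
    table = TARGETS[language]
    return table["pattern"].format(verb=table[frame.action], obj=table[frame.obj])
```

In this sketch the source sentence is analyzed once, and the same frame can then be passed to `generate` for each target language, which mirrors the reason the text gives for using an intermediate representation.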

When fully operational, YAR 130 sometimes has to ask the author 160 to choose between alternative interpretations of sentences that satisfy the grammatical constraints of OIA but whose meaning is not clear. This process is called disambiguation. Once YAR 130 has determined that a given part of the text uses only OIA vocabulary and meets all OIA grammatical constraints, it marks that part of the text as OIA-approved but not yet disambiguated. As explained below, disambiguation does not require any changes to those aspects of the text that are visible to the author. Once the text has been disambiguated, it is ready for translation into a target language 180.
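The approval stages described above can be sketched as a small state function. The status names and the function signature are hypothetical and chosen only for illustration; the sketch assumes that a sentence with a single interpretation needs no choice from the author.

```python
from enum import Enum

class Status(Enum):
    UNCHECKED = "unchecked"            # vocabulary or grammar check not yet passed
    OIA_APPROVED = "oia-approved"      # passes both checks, ambiguity still open
    DISAMBIGUATED = "disambiguated"    # ready for translation

def classify(in_vocabulary: bool, grammatical: bool,
             interpretations: int, choice_made: bool) -> Status:
    """Return the status of one piece of text after checking."""
    if not (in_vocabulary and grammatical):
        return Status.UNCHECKED
    if interpretations > 1 and not choice_made:
        return Status.OIA_APPROVED
    return Status.DISAMBIGUATED

def ready_for_translation(status: Status) -> bool:
    """Only fully disambiguated text is passed on to translation."""
    return status is Status.DISAMBIGUATED
```

A sentence with two possible readings stays at `OIA_APPROVED` until the author has chosen one of them, at which point it becomes `DISAMBIGUATED`.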

In practice, YAR 130 is built as an extension of the text editor 140, which provides the basic word-processing functions that authors and editors need to create text and tables. The graphics editor 150 is used to create graphic images. The graphics editor 150 also provides a means of entering and processing the text labels in graphics, so that these labels can likewise be checked against OIA.

During disambiguation, YAR 130 (through the text editor 140) communicates with the MT analyzer 127, and through it with the domain model (MO) 137, over two-way socket-to-socket lines. In a preferred embodiment of the present invention, MO 137 is one of the knowledge bases that feed the MT analyzer 127. MO is a symbolic representation of declarative knowledge about the OIA vocabulary, used by the MT analyzer 127 and YAR 130.

Fig. 2 shows a high-level flow diagram of the operation of ISAPP 105. MT 120, YAR 130, the text editor 140 and the graphics editor 150 are all under the common control of the file management system (suf) 110. Control lines 111-113 carry the information and control signals needed for proper operation of ISAPP 105.

At the initial stage, the author uses suf 110 to select the document to be edited, and suf then starts the text editor 140, which displays the file for that document. Using the text editor 140, the author enters text into ISAPP 105; this text may be unconstrained and ambiguous, as shown in blocks 160 and 220. The author 160 uses standard editing commands to create the text. Authors are expected to enter, for the most part, text prepared with the OIA constraints largely in mind. The author then corrects the text to conform to OIA in response to feedback from the system about violations of the predefined lexical and grammatical constraints. This process is, of course, more efficient if the text as first typed does not completely ignore those constraints. However, the system also works correctly when the text as first entered does not fully observe them.

The author communicates with YAR 130 through commands entered with the mouse or keyboard. Other forms of input, such as a light pen or voice, are also possible and fall within the scope of the present invention. Examples of such input are a command to run an OIA check, or a command to look up definitions and usage examples of a given word or expression.

The text in OIA, which may still contain residual ambiguity or stylistic problems, is analyzed in block 230. The author receives feedback for correcting any errors over feedback line 215. In particular, YAR 130 gives the author 160 information about words, expressions or sentences that the author has used but that are not in OIA. Finally, the text is checked for ambiguous sentences, and YAR prompts the author to choose the correct interpretation of each such sentence. This process continues until all ambiguity has been removed from the text.

Once the author has made all the necessary corrections and the analysis phase 230 is complete, the disambiguated, constraint-satisfying text 240 is passed to the MT analyzer and interlingua translator 250. The interlingua translator resides in the MT analyzer 127 together with the syntactic part of the analyzer, and translates the text 240 into the intermediate language, Interlingua 260. This intermediate translation 260 is in turn translated by the generator 270 into the target text 280. As shown in Fig. 3, the Interlingua text 260 has a form that can be translated into multiple target languages. Because the source text is subject to special vocabulary and grammatical constraints, it becomes possible to translate restricted-language texts accurately into a foreign language without any post-editing. Post-editing is not required because the vocabulary checking block 217 within YAR and the analysis block 230 have already led the author to change and/or disambiguate all potentially ambiguous sentences and to remove all untranslatable words from the document before the translation stage.

II. Detailed description of functional blocks

In a preferred embodiment of the invention, each author works at a DEC workstation with 32 megabytes of RAM, a 400-megabyte disk drive and a 19-inch color monitor. Each workstation is configured with at least 100 megabytes of swap space on its local disk. In addition to the authors' workstations, DEC servers are used as file servers, one server for every two author groups but for no more than 45 users, on an Ethernet local area network. ISAPP runs under the Unix operating system (a variant derived from the Berkeley Software Distribution, BSD, is preferred over one derived from System V, SYSV). A C compiler and the OSF/Motif libraries are available to the system. YAR runs under the Motif window manager. It should be noted that the present invention is not limited to the hardware and software described above and can be used with other platforms.

A. Text editor

A preferred embodiment of the present invention uses a text editor 140 that lets the author enter information that is later analyzed and ultimately translated into a foreign language. Any commercially available word-processing software can be used within the scope of the present invention. A preferred embodiment uses an SGML text editor 140 from Arbor Text (Arbor Text Inc., 535 West William St., Ann Arbor, MI 48103). The SGML (Standard Generalized Markup Language) text editor 140 provides the basic word-processing functions and is used together with the Inter Cap package (Annapolis, Maryland) for creating graphic images.

An SGML text editor 140 is preferred in the present invention because it creates text using the tags of the Standard Generalized Markup Language (SGML), an international standard markup language for describing the structure of electronic documents. SGML is designed to meet the requirements of a wide range of document-processing and information-exchange tasks. SGML tags make it possible to describe documents in terms of their content (text, images and so on) and their logical structure (chapters, sections, figures, tables and so on). For larger and more complex electronic documents, the language can also describe the physical organization of the document into files. SGML is designed to describe documents of any kind, simple or complex, long or short, in a way that depends neither on the system nor on the application. This independence allows documents to be exchanged between different systems for different applications without distortion of meaning or loss of information.

SGML markup is added to the ordinary textual information carried by a given piece of text. In most cases it takes the form of character sequences placed at various points throughout the electronic document. Each such sequence is distinguished from the surrounding text by special characters that begin and end it. Software can verify that the markup has been entered correctly by checking the SGML tags on request. The markup is generalized in the sense that it is not specific to any particular system or task. More detailed information on SGML tags can be found in International Standard ISO 8879, Information processing - Text and office systems - Standard Generalized Markup Language (SGML), ISO 8879-1986 (E).

The use of SGML tags makes it possible to accomplish the following tasks:

(1) dividing documents into passages, or translatable units. The text editor 140 software uses both punctuation and SGML tags to recognize translatable units in the text as entered (for example, SGML tags for recognizing headers),

(2) excluding certain text elements from checking. Although the system works on the assumption that all words and sentences will be within the given restricted language, this can never be fully guaranteed in advance, for example for names or addresses, or for classes of vocabulary that are impossible (or very difficult) to enumerate precisely (for example, part numbers or alarm signals coming from equipment). SGML tags can be placed at the beginning and end of such text elements to show the system that they are excluded from checking,

(3) recognizing document content (e.g., part numbers), as discussed in item (2) above,

(4) enabling translation of parts of a sentence (e.g., highlighted passages),

(5) assisting in the translation of tables (cell by cell) by specifying patterns of text. This task is similar to the task described in item (1),

(6) assisting grammatical and syntactic parsing (discussed below) by means of tasks (2), (3), (4) and (5),

(7) assisting disambiguation by providing a means of inserting invisible tags into the source text that indicate the correct interpretation of ambiguous sentences,

(8) identifying text that requires special treatment,

(9) providing a means of marking parts of the text as translatable, in other words as text that has passed through the process described above and is unambiguous restricted-language text that can be translated without post-editing.
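As an illustration of excluding tagged elements from checking, the following minimal sketch strips tagged spans before the words are handed to a vocabulary check. The tag name `nocheck` is hypothetical and is not taken from the invention or from any SGML document type; the regular expressions are a simplification and do not constitute an SGML parser.

```python
import re

# Hypothetical tag name marking text that is exempt from checking.
NOCHECK = re.compile(r"<nocheck>.*?</nocheck>", re.DOTALL)
ANY_TAG = re.compile(r"<[^>]+>")

def checkable_words(marked_up: str) -> list[str]:
    """Return the words that a vocabulary check should look at."""
    visible = NOCHECK.sub(" ", marked_up)   # drop exempt spans with their content
    visible = ANY_TAG.sub(" ", visible)     # drop remaining tags, keep their content
    return re.findall(r"[a-z]+", visible.lower())
```

With this sketch, a part number wrapped in the exempt tag never reaches the vocabulary check, while the rest of the header is checked as usual.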

In the past, authors created (using a text editor 140) electronic documents (text only, no graphics) that each constituted an entire "book". In that situation all the work is done by a single author, and the resulting information is difficult to reuse. The present invention, by contrast, allows a book or pamphlet (such as a manual, a set of instructions or another document) to be assembled from a variety of smaller fragments called information elements, which means that several authors can work on one document together. As a result, documents created with the present invention have a high potential for reuse. An information element is defined as the smallest self-contained fragment of service data in a specific area. It should be noted that the present invention also yields accurate and unambiguous translated documents without the use of information elements.

Fig. 4 shows an example of an information element 410, which includes a "unique" header 415, a "unique" piece of text 420, a "borrowed" (shared) graphic 430, a "borrowed" table 435 and a "borrowed" piece of text 425.

"Unique" information is information that relates solely to the information element in which it is contained. This means that "unique" information is stored in the file as part of the information element 450.

A "borrowed" item (a graphic, a table or a piece of text) is information that the information element includes by "reference". The content of a "borrowed" item is displayed in the authoring tool, but the information element file 450 contains only a "pointer" to it.

"Borrowed" items differ from information elements in that they are not "free-standing" (that is, by themselves they do not carry enough information to convey meaningful content). Information elements are created from a combination of unique fragments or pieces of information (text and/or tables) and one or more "borrowed" items. Note how the "unique" header 415 and "unique" text 420 are combined with the "borrowed" graphic 430, the "borrowed" table 435 and the "borrowed" text 425. A set of one or more information elements constitutes a complete document ("book").

"Borrowed" items are stored in "borrowing" libraries. These include the "borrowed" graphics library 460a, the "borrowed" tables library 460b, the "borrowed" texts library 460c, the "borrowed" sound (audio) library 460d and the "borrowed" video library 460e. A "borrowed" item is stored only once. When it is used in individual information elements, only a "reference" to the original borrowed item is entered in the information element 450. This reduces the disk space required. When the source item is changed, all information elements that contain "references" to it change automatically. "Borrowed" items may be used in publications of any kind.

Further, a "borrowed" information element is an information element that is used to create document sections 480 and 485.
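The store-once, reference-many arrangement of "borrowed" items can be sketched with a small data structure. The class, field and key names are hypothetical, and a plain dictionary stands in for the borrowing libraries.

```python
from dataclasses import dataclass, field

@dataclass
class InformationElement:
    """Unique content is stored inline; borrowed content is stored as keys only."""
    unique_header: str
    unique_text: str
    borrowed_refs: list[str] = field(default_factory=list)

    def render(self, library: dict[str, str]) -> list[str]:
        # Borrowed items are resolved from the library at display time.
        return [self.unique_header, self.unique_text] + [
            library[ref] for ref in self.borrowed_refs
        ]

# Stand-in for a borrowing library: each item is stored exactly once.
library = {"graphic-430": "flywheel diagram, revision 1"}

element_a = InformationElement("Removal", "Remove the flywheel.", ["graphic-430"])
element_b = InformationElement("Installation", "Install the flywheel.", ["graphic-430"])

# Changing the single stored item changes every element that refers to it.
library["graphic-430"] = "flywheel diagram, revision 2"
```

After the update, rendering either element shows revision 2 of the graphic, because neither element holds its own copy.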

All communication between the author and YAR 130 is handled by the user interface (UI) of the language editor (YAR), implemented either as an extension of the standard facilities of the SGML editor, such as menu options, or as a separate window. The UI provides and controls access to the OIA checkers and to the OIA dictionary lookup, and is the primary tool through which users interact with the OIA language editor (YAR). Although the term "user interface" is often used in a broader sense, meaning the interface to the system software in general, in this description it has a narrower sense: the interface for accessing the OIA checkers, the on-screen dictionary lookup and the disambiguation facilities.

Among other things, the UI is required to give clear information about: (a) the actions taken by YAR, (b) the results of those actions, and (c) any resulting further actions. For example, if an operation started through the UI results in any pause other than a very short one, the UI must inform the author of the possible delay with a clear message. Menu options in the text editor 140 allow the author to start and track the progress of the OIA check (both vocabulary and grammar) and to call up the dictionary. The author can request either a check of the document on screen or a dictionary lookup for a given word or expression.

The UI clearly indicates every place in the text where a non-conformance with OIA has been identified. Possible ways of indicating non-conformance include highlighting, or a change of typeface or font size, in the SGML editor window. The UI also displays all known information about the non-conforming words. For example, where appropriate, the UI displays a message stating that a word is not in OIA but has OIA synonyms, together with a list of those synonyms.

When a message from the vocabulary checker includes a list of alternatives for a selected word not found in OIA (for example, alternative spellings or OIA synonyms), the author can choose one of the alternatives and have it substituted in the document automatically. In some cases the author will have to edit the chosen alternative so that it is entered in the document in the required form.
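A vocabulary check of this kind can be sketched as a lookup against a word list and a synonym table. The vocabulary, the synonym entries and the function name below are hypothetical examples and are not the checker of the invention; the sketch works word by word and does not handle multi-word expressions or inflected forms.

```python
import re

# Illustrative restricted vocabulary and synonym table (assumptions).
OIA_VOCABULARY = {"see", "the", "engine", "from", "flywheel", "end"}
OIA_SYNONYMS = {"view": ["see"], "motor": ["engine"]}

def check_vocabulary(text: str) -> list[tuple[str, list[str]]]:
    """Return each word not in the vocabulary with its suggested alternatives.

    An empty list of alternatives means the checker has nothing to offer
    and the author has to rephrase.
    """
    report = []
    for word in re.findall(r"[a-z]+", text.lower()):
        if word not in OIA_VOCABULARY:
            report.append((word, OIA_SYNONYMS.get(word, [])))
    return report
```

Each entry in the report corresponds to one message shown to the author: the offending word and, where available, the alternatives from which a replacement can be chosen.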

When the author requests information from the dictionary, the UI shows alternative spellings, synonyms, a definition and/or a usage example for the specified item.

The author can switch quickly between the controller and the dictionary lookup within the UI. This allows the author to search for information (e.g., to view synonyms) while revising the document to remove words and expressions that do not conform to OIA.

In most cases, the UI provides automatic replacement of vocabulary items missing from OIA with items from the OIA dictionary, and the author does not need to alter the OIA word to ensure its correct form in the document. However, in some cases the dictionary controller (described below), which does not parse the document, is unable to determine the correct form of the word to enter into the document. Consider the following caption, where the word "watch" is not in OIA but has the OIA synonym "see":

The direction of rotation of the crankshaft

(when observed from the side of the flywheel)

The dictionary controller may not know which form of the replacement to offer; one solution is to offer both options and let the author choose the correct form. Further, since there is no assurance that in every case the author can be offered a choice that permits direct replacement, YAR 130 provides a list of possible replacements in the correct form wherever it is able to do so. There may, however, be cases where the author has to edit the suggested OIA word or expression before giving the command to enter it in the document.
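
The fallback described above, offering every candidate form when the controller cannot pick one, can be sketched in Python as follows. The synonym table, the form table and the name `suggest_replacements` are assumptions made for illustration only.

```python
# Hypothetical data: non-OIA word -> OIA synonyms, and the forms of each synonym.
NON_OIA_SYNONYMS = {"watch": ["see"]}
OIA_FORMS = {"see": ["see", "sees", "saw", "seen", "seeing"]}

def suggest_replacements(word):
    """List every form of every OIA synonym of a non-OIA word.

    The controller does not parse the sentence, so it cannot tell which
    form fits; it returns all of them and leaves the choice to the author.
    """
    suggestions = []
    for synonym in NON_OIA_SYNONYMS.get(word.lower(), []):
        # Fall back to the bare synonym if no forms are recorded for it.
        suggestions.extend(OIA_FORMS.get(synonym, [synonym]))
    return suggestions
```

An empty result corresponds to the case in which the author has to rewrite the passage without help from the dictionary.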

Finally, the YAR UI provides support in resolving ambiguity in the meaning of sentences. It does this by offering the author a list of possible alternative interpretations, allowing the author to choose the correct one, and then marking up the text to record the author's choice.

B. The file management system

The file management system (suf) 110 serves as the authors' interface to the library of released information elements (IE) 470 and to the SOAR text editor 140. In general, authors select an IE for editing by naming, through the suf interface, the file in which that IE resides. The suf 110 then starts an SOAR editor session for this IE and manages the session. After that, finished documents are transmitted to [...] language (OIA)

Given the complexity of modern technical documentation, high-quality machine translation of unrestricted natural language is practically impossible. The main obstacles are linguistic in nature. The critical process in translating a source text is to render its meaning in the target language. Because meaning is hidden beneath the textual signals, these signals must be analyzed, and the meaning recovered by this analysis is then used to generate signals in the target language. Some of the most difficult problems of translation arise from inherent characteristics of language that complicate the analysis and generation of meaning.

Some of these characteristics are listed below.

1. Words that have more than one meaning in an ambiguous context -

Example: Made of material of high density.

(Which material is meant: "less permeable" or "heavier"?)

2. Ambiguous compound words

Example: Conservation.

(What is meant: the conservation of nature, or an organization engaged in conservation?)

3. Words that perform different syntactic functions -

Example: [...] flow in the pipeline.

(G) The medium should flow without any turbulence.

4. Combinations of words, each of which can have more than one syntactic function -

Example: Right and left do not pass here.

(Who does not pass here: conservatives and radicals (nouns), or the right-hand and left-hand members of the ranks (adjectives)?)

5. Combinations of words in ambiguous constructions -

Example: Visiting relatives can be tedious.

(What is tedious here: the act of visiting relatives, or relatives who are visiting?)

6. Ambiguous use of pronouns -

Example: The lecturer crumpled the report because he was...

(Who is "he" here: the lecturer or the report?)

In addition to the difficulties of analysis illustrated above, there are problems in generating the translated text, which increase the overall complexity of machine translation.

The size of the problem can be significantly reduced by any reduction in the range of linguistic forms that abound in natural language. A reduced language, or "sub-language", covers a limited range of objects, actions and relationships in a specifically delineated area. However, a "sub-language" [...] grammar. In a controlled situation, the strategy for facilitating machine translation is to restrict both the vocabulary and the grammar of the "sub-language".

Restrictions on vocabulary reduce its size by eliminating synonyms, and they control lexical ambiguity by specializing the permitted lexical items so that, where possible, each has only one meaning. It is easy to see how such restrictions would overcome the problems illustrated in examples 1, 2 and 4 above. Grammatical constraints can simply prohibit techniques such as replacing nouns with pronouns (example 6 above), or can require that the intended meaning be made clearer through expansion, through repetition of information that might otherwise seem redundant, or through complete rewording. The following example illustrates where such requirements apply.

The following sentence in unrestricted, ambiguous English can be understood in three ways (A, B1 or B2):

Clean the connecting rod and main bearings.

Unambiguous option "A" in English: [...]

Unambiguous option "B1" in English: Clean the main bearings and the connecting rod.

Unambiguous option "B2" in English: Clean the main bearings and the connecting rods.

In view of the above, the present invention restricts authors to writing documents within a reduced, or limited, language. The limited language is a "sub-language" of the source language (for example, American English), designed for the user's specific application. A more detailed general discussion of limited languages is given in Adriaens et al., "From COGRAM to ALCOGRAM: Toward a Controlled English Grammar Checker", Proc. of Coling-92, Nantes (Aug. 23-28, 1992), which is incorporated here by reference. In the context of machine translation, the goals of the limited language are the following:

1. to facilitate consistent authoring of source documents and to encourage clear and unambiguous writing,

2. to create a principled framework for source texts that would allow quick [...] authors must follow to ensure that the grammar of their texts conforms to OIA will hereinafter be called the "OIA grammatical restrictions". The computational implementation of the OIA grammatical constraints, used for the analysis of OIA texts in the MP block, will hereinafter be referred to as the "OIA functional grammar"; it is based on well-known formalisms developed by Martin Kay and subsequently modified by R. Kaplan and J. Bresnan. See Kay, M., "Parsing in Functional Unification Grammar", in D. Dowty, L. Karttunen and A. Zwicky (eds.), Natural Language Parsing: Psychological, Computational and Theoretical Perspectives, Cambridge, Mass.: Cambridge University Press, pp. 251-278 (1985), and Kaplan, R. and J. Bresnan, "Lexical-Functional Grammar: A Formal System for Grammatical Representation", in J. Bresnan (ed.), The Mental Representation of Grammatical Relations, Cambridge, Mass.: MIT Press, pp. 172-281 (1982). Both are incorporated here by reference.

The remainder of this description often refers to words and expressions as being "in OIA"; the following paragraphs set out the relevant restrictions and explain the use of that expression.

A single word or phrase in English can have many different meanings. For example, a general-purpose dictionary may give the following definitions of the word "leak" (leak, leaking, leaks, and so on):

(1) verb: to allow something to escape through a gap or defect;

(2) verb: to disclose information without official authorization or permission; and

(3) noun: a crack or hole that allows something to escape from, or enter, a vessel or pipeline.

Each of these different meanings is called a "sense" of the word or expression. Different senses of the same word or expression can create problems for an MP system, which lacks all the knowledge that people use to understand which of several senses is implied in a particular sentence. For many words, the machine translation system can eliminate some of the ambiguity by recognizing the part of speech with which the word is used in the sentence (noun, verb, adjective, and so on). This is possible because [...], as was illustrated above with the word "leak".

However, to avoid the types of ambiguity that MP 120 cannot resolve, the OIA terms and definitions aim to include only one sense of a word or expression for each part of speech. Thus, if a word or expression is "in OIA", it is permissible to use it in OIA in at least one of its possible senses. For example, an author writing in OIA may be permitted to use the word "leak" in senses (1) and (3) above, but not in sense (2). Saying that a word or expression is "in OIA" does not mean that all possible uses of that word or expression can be translated.

If a word or phrase is included in OIA, all forms of it that convey the sense (or senses) assigned to it in OIA are also included in OIA. In the example above, the author may use not only the verb "leak" but also the related verb forms "leaked", "leaking" and "leaks". If a word or expression with a noun sense is part of OIA, it can be used in both the singular and the plural. It should be noted that these considerations are less important in the case of ambiguous expressions.
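
The "one sense per part of speech, all forms included" rule can be illustrated with a short Python sketch. The tables below encode only the "leak" example from the text; the names `OIA_SENSES`, `OIA_FORMS`, `is_in_oia` and `permitted_sense` are hypothetical.

```python
# One permitted sense per (root, part of speech); sense (2) of "leak"
# ("to disclose information") is deliberately absent.
OIA_SENSES = {
    ("leak", "verb"): "to allow something to escape through a gap or defect",
    ("leak", "noun"): "a crack or hole that lets something escape or enter",
}

# All forms that convey the permitted sense are also in OIA.
OIA_FORMS = {
    ("leak", "verb"): {"leak", "leaks", "leaked", "leaking"},
    ("leak", "noun"): {"leak", "leaks"},
}

def is_in_oia(form, part_of_speech):
    """True if the form is permitted in OIA for the given part of speech."""
    return any(
        pos == part_of_speech and form.lower() in forms
        for (_root, pos), forms in OIA_FORMS.items()
    )

def permitted_sense(root, part_of_speech):
    """Return the single permitted sense, or None if there is none."""
    return OIA_SENSES.get((root, part_of_speech))
```

Being "in OIA" in this sketch says nothing about other senses of the same word, which mirrors the caveat in the text.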

A dictionary, or vocabulary, is the stock of words and expressions used in a particular language or "sub-language". A limited domain can thus be characterized by defining a limited vocabulary that is used to communicate information about a relatively limited area of human knowledge or experience. An example of a limited domain is agriculture, where the limited vocabulary would include terms relating to agricultural equipment and activities. The MP block of the system works with more than one type of dictionary. Words and expressions for machine translation are stored in the MP lexicon. The vocabulary itself can be divided into different categories: (1) function items, (2) general content, and (3) technical nomenclature.

General-content words are used mostly to describe the world around us; their main purpose is to reflect ordinary human experience and views common to everyone.

In the typical case, however, documents focus on a very specialized part of human understanding and experience (for example, [...] includes technical-content words and expressions, as well as a special vocabulary for the applications of interest to the user. Technical-content items are words and expressions specific to narrow fields or applications. Most technical words are nouns used to refer to items such as parts, assemblies, components, machines or materials. However, the technical vocabulary may also include other categories of words, such as verbs, adjectives or adverbs. Clearly, because such words are not used in everyday conversation, they differ sharply from general-content words.

Technical-content expressions are phrases built from words of all the above categories. These expressions are the most characteristic feature of the language of technical documentation. The user-application-specific vocabulary is the part of the terminology that contains words and compound terms established in the user's application area. It may include the following: product names, document names, acronyms and abbreviations used by the user, and form numbers used in the user's documentation work. When documentation is subsequently translated, this vocabulary is an important resource for the translation work. MP is designed to operate with most of the function items available in English, except for items of very personal reference (I, me, my, and so on), of gender reference (her, she, and so on) or of other pronominal reference (it, they, and so on). It also includes certain borrowings from general English words (such as "truck" or "length"). The vast majority of the limited language's vocabulary consists of "special" (e.g., technical) terms of one or more words that denote objects and processes or operations in a particular domain. To the extent that such a vocabulary can convey the full range of concepts in a particular field, it can be called complete.

The development of a sufficiently simplified yet complete vocabulary contributes greatly to the success of the ISAPP system 105. The limited language, by indicating correct and incorrect vocabulary usage, makes it possible to create documents that [...] item should reflect a clear concept and must be suitable for the target reader. Terms that are sexist, jargon-laden, idiomatic, excessively complex or technically elaborate, or that otherwise hinder communication, should be avoided. These and other general stylistic considerations, although not always strictly required for MP processing, are nevertheless important guidelines for document creation in general.

It should be noted that, although most of the discussion in this description of a limited source language, and of language in general, focuses on American English, similar considerations apply to all other languages. Nothing in the described system 100 fundamentally restricts it to American English as the source language. Indeed, the system 100 is not designed to work with American English as the sole source language. However, the databases (for example, the domain model) that interact with YAR 130 and MP 120 will require changes [...] standard American English spelling. Non-standard spellings, such as "thru" instead of "through", "moulding" instead of "molding" or "hodometer" instead of "odometer", should not be used. Capitalized words (for example, On - Off, Value Planned Repair) may be used only when it is necessary to indicate a special meaning of these words. The same applies to non-standard capitalization (Brake Saver). Similarly, if abbreviations are used (ROPS, API, PIN), they should be listed and explained in the user-application-specific vocabulary. The format of numbers, units of measurement and calendar dates must be uniform and consistent.

Items of the limited language should also be used in accordance with their meanings in that limited language. By doing so, the author ensures that MP will always translate each word using its correct meaning in the limited language. As already mentioned, some English words can belong to more than one syntactic category. In the limited language, every syntactically ambiguous word should be used in constructions that remove its ambiguity.

technical terms from a specific application user dictionary, and

compound terms from more than one word.

Complex noun-noun combinations (in which, in English, the preceding word modifies the words that follow) should be avoided where possible. However, for those items that are listed in the lexicon, MP can handle this important characteristic of documentation. Note that noun-noun combinations, which are very common in English, may not be familiar to users of another language, and therefore the constraints on which a limited language is based will be different when using any [...] which the verb is used together with a preposition, an adverb or another part of speech. Because such a "particle" can often be separated from the verb by objects and other expressions, it creates difficulties when MP processes the source text. Thus, verb-particle combinations should be rewritten where possible. Typically, this can be achieved by using a single-word verb of the same meaning. For example, one would use:

"must" or "need" instead of "have to" (all three verb forms express obligation),

"consult" instead of "refer to" (a reference to a source),

"start the motor" instead of "turn the motor on".

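A checker could flag such verb-particle combinations with a simple table lookup. The Python sketch below is a deliberately naive illustration: the table contains only the three pairs listed above, the function name `find_verb_particles` and the `max_gap` parameter are hypothetical, and inflected verb forms are not handled.

```python
# (verb, particle) -> suggested single-word replacement, from the examples above.
PHRASAL_VERBS = {
    ("have", "to"): "must",
    ("refer", "to"): "consult",
    ("turn", "on"): "start",
}

def find_verb_particles(sentence, max_gap=3):
    """Return (verb, particle, replacement) for each combination found.

    The particle may be separated from the verb by up to `max_gap`
    intervening words, as in "turn the motor on".
    """
    tokens = sentence.lower().rstrip(".").split()
    hits = []
    for i, token in enumerate(tokens):
        for (verb, particle), replacement in PHRASAL_VERBS.items():
            if token != verb:
                continue
            # Look at the words that follow the verb, within the allowed gap.
            window = tokens[i + 1 : i + 2 + max_gap]
            if particle in window:
                hits.append((verb, particle, replacement))
    return hits
```

Allowing a gap between verb and particle is what makes the separated case detectable without parsing, at the cost of occasional false positives.
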
Wherever possible, full terms and concepts should be used. This is especially important where a misunderstanding may arise. For example, in the expression:

"Use a monkey wrench to loosen the bolt..."

it is unacceptable to omit the word "wrench". Although most technically knowledgeable people would understand the meaning without the word "wrench" (which is sometimes omitted in English technical texts, the tool being called simply a "monkey"), this expression must be made unambiguous [...] however, acronyms or abbreviated terms should be rewritten as lexically complete expressions.

Consider another example:

"If the density of the electrolyte indicates that..."

In this case, the meaning would be clearer and more complete if the idea were fully expressed:

"If the measured density value of the electrolyte indicates that..."

And, finally, in the following sentences, where a single word or expression is omitted, the underlined words added to make the meaning more clear:

"Turn the ignition key to the off position and remove the key."

"Pull back (1) up and move the backrest to the desired position."

"Starting from an external source (starter): the machines must not touch one another."

When such "gaps" are filled in, the idea becomes more complete, and the correct transfer of meaning by ISAPP 105 becomes more reliable. Translation errors caused by omissions or gaps are a common reason for final editing. Accordingly, gaps or omissions are not allowed.

Colloquial English often uses words with very broad meanings. For example, words such as conditions, remove, facilities, procedure, go, do, is for, make, get, etc., are correct but often imprecise.

In this example sentence:

"When the temperature reaches 32°F, special precautions must be taken"

the word "reaches" does not tell the reader whether the temperature is rising or falling; one of those two words would be more precise here, and the text would be no less readable.

In some languages, distinctions between words exist where English does not always make them; for example, English says "oil" for both lubricating oil and fuel oil, or "fuel" for both diesel fuel and other fuels. Similarly, when the word "door" is used without context, it is not always possible to know what kind of door is meant. A car door? A building door? An apartment door? Some languages use different words for these doors. Where possible, the full term should be used in the English original.

B. Domain model

Knowledge-based machine translation [...] the meaning carried by lexical units and their combinations. The knowledge base of the MP system must be able to represent not only general taxonomic knowledge about object types, such as "a car is a kind of vehicle", "a door handle is part of a door", or "artifacts are characterized (among other properties) by the property 'made by'"; it must also embody knowledge about particular instances of object types (for example, "IBM" may be included in the domain model as a marked instance of the object type "corporation") and instances of (potentially complex) event types - for example, the election of George W. Bush as President of the United States is a marked instance of the complex action "elect". The ontological part of the knowledge base takes the form of a multiple hierarchy of concepts connected with one another through taxonomy-building links such as "is-a", "part-of" and some others. We call the resulting structure a multiple hierarchy because concepts are allowed to have multiple "parents" in each relationship.

The domain model, or conceptual lexicon, includes an ontological model [...] properties, episodes and so on) that are used as "building blocks" for describing specific domains. The world model is relatively static and is organized as a multiply interconnected network of ontological concepts. General principles for developing an ontology of an application world, or "sub-world", are well known from the literature; see, for example, Brachman and Schmolze, "An Overview of the KL-ONE Knowledge Representation System", Cognitive Science, vol. 9, 1985; Lenat et al., "Using Common Sense Knowledge to Overcome Brittleness and Knowledge Acquisition Bottlenecks", AI Magazine, VI: 65-85, 1985; Hobbs, "Overview of the Tacitus Project", Computational Linguistics, 12:3, 1983; Nirenburg et al., "Acquisition of Very Large Knowledge Bases: Methodology, Tools and Applications", Center for Machine Translation, Carnegie Mellon University (1988); all of these are incorporated here by reference.

An ontology is a language-independent conceptual representation of a specific sub-world, such as troubleshooting and repair of heavy equipment or the interaction between a person [...] to parse the source text when it is converted to text in the intermediate language (Interlingua) and when generating target text from the intermediate-language text. The domain model should be detailed enough to supply semantic constraints sufficient to eliminate ambiguity when parsing text, and the ontological model should provide a uniform definition of the basic ontological categories that serve as building blocks for descriptions of specific domains.

In the world model, ontological concepts can first be divided into objects, events, forces (introduced to account for unintentional agents) and properties. Properties can be further subdivided into relations and attributes. Relations are defined as mappings between concepts (for example, "belongs-to" is a relation, because it maps an object into the set (*human *organization)), while attributes are defined as mappings of concepts into specially defined value sets (for example, "temperature" is an attribute that maps physical objects into values on the half-open scale [0,*), calibrated in degrees Kelvin). Concepts are generally represented as frames whose slots represent [...] any knowledge-based system, not only a knowledge-based machine translation system. The domain model is a semantic hierarchy of the concepts in the translation domain. For example, we can define the object *O-TRANSPORT-AGENT, which includes *O-WHEELED-TRANSPORTATION-MEANS and *O-TRACKED-VEHICLE, the first of which in turn includes *O-FREIGHT-CAR, *O-WHEELED-TRACTOR, and so on. At the lowest level of the hierarchy are the specific concepts corresponding to terminology in OIA. We call this lowest level I/MO (the core of the domain model). To obtain an accurate translation, we must impose semantic constraints on the roles played by different concepts. For example, the fact that the agent role of the action *E-CONDUCT must be filled by a human is a semantic constraint imposed on *O-TRANSPORT-AGENT, and it automatically applies to all types of vehicles (thereby saving the repetitive work of coding each instance by hand). The authoring part of the domain model augments I/MO with synonyms not included [...] each of the information elements.
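
The inheritance of a semantic constraint down the concept hierarchy can be sketched in Python as follows. The concept names follow the text with their spelling normalized; the dictionary layout, the constraint key `agent-of-*E-CONDUCT` and the function names are assumptions made for illustration. Each concept may list several parents, as a multiple hierarchy allows.

```python
# "is-a" links: concept -> list of parent concepts.
IS_A = {
    "*O-WHEELED-TRANSPORTATION-MEANS": ["*O-TRANSPORT-AGENT"],
    "*O-TRACKED-VEHICLE": ["*O-TRANSPORT-AGENT"],
    "*O-FREIGHT-CAR": ["*O-WHEELED-TRANSPORTATION-MEANS"],
    "*O-WHEELED-TRACTOR": ["*O-WHEELED-TRANSPORTATION-MEANS"],
}

# Constraints are stated once, on the most general concept they apply to.
CONSTRAINTS = {
    "*O-TRANSPORT-AGENT": {"agent-of-*E-CONDUCT": "*human"},
}

def ancestors(concept):
    """Return all concepts reachable through is-a links, without duplicates."""
    seen = []
    stack = list(IS_A.get(concept, []))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.append(parent)
            stack.extend(IS_A.get(parent, []))
    return seen

def inherited_constraints(concept):
    """Merge constraints from all ancestors; the concept's own are applied last."""
    merged = {}
    for name in reversed([concept] + ancestors(concept)):
        merged.update(CONSTRAINTS.get(name, {}))
    return merged
```

Because the constraint is stored only on `*O-TRANSPORT-AGENT`, adding a new kind of vehicle requires one new is-a link and no new constraint coding.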

Fig. 5 illustrates the conceptual domain model (MO) used in the present invention. MO 500 is a representation of declarative knowledge about the OIA vocabulary used by MP 120 and YAR 130. MO 500 consists of three distinct parts.

1. The core (I/MO) 510 contains all lexical information needed by both the MP analyzer 127 and YAR 130. In particular, the core includes all the lexical items (words and phrases) of OIA, along with the associated semantic concepts, part-of-speech descriptions, morphological information, etc.

2. The domain model for MP (MP/MO) 520 contains only the information needed by the MP analyzer 127. It is a hierarchy of concepts used for unambiguous marking of relations and for semantic checks during translation. It includes selectional restrictions on concepts and a hierarchical classification of concepts.

3. The domain model for YAR (YAR/MO) 530 contains information needed only by YAR 130. It includes non-OIA synonyms of OIA lexical items, dictionary definitions of OIA lexical items, and usage examples of OIA lexical items.
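
The division of the domain model into a shared core and two consumer-specific parts can be sketched in Python as follows. The part names follow the text; the sample contents (including the synonym "lorry" and the definition string), the table `PARTS_NEEDED` and the function `view_for` are hypothetical.

```python
# Three parts of the domain model, with made-up sample contents.
DOMAIN_MODEL = {
    "core": {"truck": {"pos": "noun", "concept": "*O-FREIGHT-CAR"}},
    "MP/MO": {"*O-FREIGHT-CAR": {"is-a": "*O-WHEELED-TRANSPORTATION-MEANS"}},
    "YAR/MO": {
        "truck": {
            "non_oia_synonyms": ["lorry"],
            "definition": "a wheeled vehicle for carrying loads",
        }
    },
}

# Each consumer reads the shared core plus its own part only.
PARTS_NEEDED = {"MP": ("core", "MP/MO"), "YAR": ("core", "YAR/MO")}

def view_for(consumer):
    """Return the parts of the domain model that a consumer needs."""
    return {part: DOMAIN_MODEL[part] for part in PARTS_NEEDED[consumer]}
```

Keeping the lexical items in one shared core means an addition to OIA is made once and is seen by both consumers.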

The presence of a shared core 510 speeds up refinement and extension of OIA, avoids duplication of work between the authoring and translation units, and provides a human-readable structure that facilitates maintenance and extension of the system.

Thus, the core 510 is a lexicon containing both syntactic and morphological information about the terms (words and phrases) in text written in the limited language. It is the central source of lexical knowledge for the analysis side of the machine translation (MP) process. The core 510 is also used as the basis for YAR/MO.

The core 510 contains separate [...] noun and the verb (the noun "truck" and the verb "truck", meaning "to transport by truck") correspond to two entries. The entries in the core contain the following information:

the root (e.g., "truck");

part of speech (for example, N (noun));

morphological information (e.g., irregular inflected forms);

syntactic information (e.g., whether a noun is countable or uncountable);

key information: short definitions and text examples that illustrate the different meanings and uses of the word, together with an indication of the sense in which the word should be used in the limited language.
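
The fields listed above map naturally onto a record type. The Python sketch below shows one possible layout; the class name `CoreEntry`, its field names and the sample definitions are hypothetical, and only the noun/verb "truck" example comes from the text.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

@dataclass
class CoreEntry:
    root: str                       # e.g. "truck"
    part_of_speech: str             # e.g. "N" for noun
    irregular_forms: Dict[str, str] = field(default_factory=dict)
    countable: Optional[bool] = None  # syntactic information, nouns only
    definition: str = ""            # the sense permitted in the limited language
    examples: List[str] = field(default_factory=list)

# A word that is both a noun and a verb gets two separate entries.
CORE = [
    CoreEntry("truck", "N", countable=True,
              definition="a vehicle for carrying loads"),
    CoreEntry("truck", "V",
              definition="to transport by truck"),
]

def entries_for(root):
    """Return every core entry that shares the given root."""
    return [entry for entry in CORE if entry.root == root]
```

Keying entries by root and part of speech together is what lets the noun and the verb carry different definitions.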

MO 500 is defined in three sets of external human-readable files, which can also be read by the processes that require access to the model. Since MP 120 and YAR 130 run as separate processes, the information in the model is represented internally in two forms: one for the parts of MO required by MP 120, and the other for the parts required by YAR 130. Thus, the core 510 is defined as a set of files that can be represented in two forms, [...] YAR/MO 530 [...] describes the formats of the external files, the contents of the various parts of MO, and the internal representation of information used by YAR 130.

It should be reiterated that the core contains the information required by both MP 120 and YAR 130. It includes: the OIA lexical item, i.e. the base word or expression (the citation form); the semantic concept associated with the lexical item, represented in the lexical entry by a "concept name"; the part of speech, which is one of a fixed set of parts of speech (for example, verb, adjective, and so on); the definition, a "rough" definition of the term from a general dictionary, indicating which of several possible senses the OIA lexical item may have; and the "irregular" morphological variants, a list of irregular morphological forms with the name of the relevant morphological transformation for each. Examples of morphological transformations for (English) verbs are "past tense form", "third person singular present", "present participle" and "past participle form". For example, values for this field for [...] this verb is irregular, and all other forms are regular. Finally, the core includes typographical constraints - for example, whether a given lexical item must be in all capital letters, whether its first letter must be uppercase, and so on.
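
Storing only the irregular variants and deriving the rest by rule can be sketched in Python as follows. The transformation names paraphrase those in the text; the suffix rules are deliberately naive (plain concatenation, with no spelling adjustments), and the verb "see" is used as an assumed example.

```python
# Naive regular suffixes (assumption: real English morphology needs more rules).
REGULAR_SUFFIX = {
    "past": "ed",
    "third_singular_present": "s",
    "present_participle": "ing",
    "past_participle": "ed",
}

def inflect(root, transformation, irregular_forms):
    """Return the listed irregular form if present, else apply the regular rule."""
    if transformation not in REGULAR_SUFFIX:
        raise ValueError(f"unknown transformation: {transformation}")
    if transformation in irregular_forms:
        return irregular_forms[transformation]
    return root + REGULAR_SUFFIX[transformation]

# Only the irregular forms of "see" are listed; the others are derived.
SEE_IRREGULAR = {"past": "saw", "past_participle": "seen"}
```

The lexicon stays small because an entry lists a form only when the regular rule would produce the wrong result.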

The MP/MO block 520 contains information required only by the MP block 120. This information includes the selectional restrictions on concepts and the hierarchical classification of concepts, which organizes the selectional restrictions and allows them to be inherited.

The YAR/MO block 530 lists synonyms not included in OIA, to assist authors in selecting permitted lexical items from OIA. Taken together, the core and YAR/MO contain all the information and all the restrictions required to characterize the OIA lexicon in support of the YAR vocabulary controller (described below). YAR/MO contains additional information required only by the YAR vocabulary controller. It includes: the dictionary definition, i.e. the definition of the word or expression that the language editor (YAR) shows to the author; non-OIA synonyms, i.e. synonyms of OIA lexical items that authors may use when writing documents; and usage examples, i.e. [...] including this information in YAR/MO is to help authors ensure that their text is composed of permitted OIA words and expressions. Dictionary definitions and usage examples help authors ensure that they use words and expressions in the parts of speech and senses permitted in OIA. However, dictionary definitions and usage examples are not required for all OIA lexical items. Rather, they are needed only for a small percentage of ambiguous or unclear terms whose OIA meaning may not be obvious to authors. Most likely, this will be fewer than half of the lexical items in MO. For example, function words such as "for" or "the" require no definitions or examples; nor may definitions or examples be required for many technical terms, especially those with a very specific technical meaning.

The non-OIA synonyms entered in YAR/MO will help an author who has written a non-OIA word or phrase to choose a synonymous or similar OIA word or expression as a replacement. It is desirable that the dictionary controller provide information not only about synonyms of the same part of speech, but also [...] would help authors to paraphrase their sentences. If the latter is to be provided, YAR/MO should contain information regarding these related words in addition to its mandatory content.

Language editor

As shown in Fig. 1B, the limited-language editor (YAR) 130 is a set of tools that support authors and editors in creating documents within OIA. These tools help the author to use correct OIA vocabulary and grammar when writing documentation. YAR 130 is built as an "extension" of the SOAR text editor 140. Although YAR 130 uses the same communication channels as the SOAR text editor 140, the functions of the two editors are mutually exclusive. However, the user interface used for dialogue with YAR is a seamless "extension" of the SOAR text editor's interface.

The author 160 creates documents in the SOAR text editor 140 and invokes YAR 130. YAR 130 informs the author of those words in the document that are not included in OIA and can offer OIA synonyms for words that are relevant to the user's application domain but are not in OIA. In addition, YAR 130 informs the author about sentences that do not conform to OIA. YAR 130 comprises the following components: a vocabulary controller; a grammar controller, which includes an interface to the MP analyzer that provides the core grammar-checking function; and the user interface (UI). In addition, the OIA dictionary information used by YAR is also present in the core and in YAR/MO.

The ultimate goal of YAR is to verify that all of the vocabulary and sentence structures in a document conform to the OIA specification. When they do, YAR 130 marks the document with a SOAR tag indicating that it has been verified against OIA. The check must cover the full text of the document, which includes the following components: sentences, headings, list items, captions and labels in graphics, and information in tables.

The present invention assumes that author productivity should be as high as possible during OIA checking and that authors should not be forced to work on several documents at the same time. Batch processing, in which the user must submit a document and then wait until the entire document has been processed, is therefore not suitable. Instead, YAR communicates with the author interactively during vocabulary checking, grammar checking and disambiguation.

Fig. 6 shows a high-level block diagram of the operation of YAR 130. YAR 130 accepts a document 605, which may be ambiguous and unconstrained. This potentially ambiguous and unconstrained input text 605 is first checked by the vocabulary controller 610, which performs its functions (as described above) with the assistance of the spelling checker 615. (In this embodiment of the invention the spelling checker is the spell checker usually present in the host text editor (TR) 140.) After the vocabulary controller 610 has completed its check and all necessary corrections have been made (with the author), the resulting lexically constrained text 617 is passed to the grammar controller 620. The grammar controller 620 produces syntactically correct text 625 within OIA. This constrained, syntactically correct text 625 then undergoes disambiguation, as shown in block 630. The result of disambiguation is text that is free of ambiguity and suitable for translation, which eliminates the need for pre-editing. The accuracy of the subsequent translation also eliminates the need for post-editing.
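The staged flow of Fig. 6 can be illustrated with a minimal Python sketch. It is not the patented implementation: the names `CheckResult`, `run_pipeline`, the toy lexicon and the pass-through grammar and disambiguation stages are all assumptions made for illustration. It only shows the ordering, in which text reaches a later stage only after the earlier stage reports no issues.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class CheckResult:
    text: str
    issues: List[str] = field(default_factory=list)


Stage = Callable[[str], CheckResult]

# Hypothetical toy lexicon standing in for the OIA dictionary.
OIA_LEXICON = {"open", "the", "valve"}


def vocabulary_stage(text: str) -> CheckResult:
    """Flag every word that is not in the toy lexicon (block 610)."""
    words = re.findall(r"[A-Za-z]+", text.lower())
    return CheckResult(text, [f"not in OIA: {w}" for w in words if w not in OIA_LEXICON])


def grammar_stage(text: str) -> CheckResult:
    """Placeholder for the grammar controller (block 620); always passes here."""
    return CheckResult(text)


def disambiguation_stage(text: str) -> CheckResult:
    """Placeholder for disambiguation (block 630); always passes here."""
    return CheckResult(text)


def run_pipeline(text: str, stages: List[Stage]) -> CheckResult:
    """Run the stages in order and stop at the first one that reports issues."""
    for stage in stages:
        result = stage(text)
        if result.issues:
            return result  # the author must fix these before the next stage
        text = result.text
    return CheckResult(text)


STAGES = [vocabulary_stage, grammar_stage, disambiguation_stage]
```

Calling `run_pipeline("Open the faucet", STAGES)` in this sketch stops at the vocabulary stage and reports the word "faucet".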

1. Vocabulary controller

Fig. 7 shows a flowchart of the operation of the vocabulary controller 610. The vocabulary controller 610 finds words in the source text that are not in OIA and helps the author find permitted OIA replacements for them. It recognizes word boundaries in the document and identifies each occurrence in the text of a lexical item that is not known to be in OIA.

As shown in block 706, the first term of the text unit is selected for checking. The term is then checked, as shown in block 710, against the OIA lexical database (i.e. dictionary), which contains all words included in OIA. If the term is not found in the OIA dictionary, it is spell-checked against a standard dictionary, as shown in block 722. If the word is misspelled, the author is given a means of correcting the error (i.e., the vocabulary controller 610 displays alternative spellings), as shown in block 726.

The item is then checked against OIA in block 718. If, however, the item is not in the OIA dictionary, the system checks whether YAR/MO contains any synonyms for it, as shown in block 736. If YAR/MO contains at least one synonym, the system displays the synonym or synonyms that are part of the OIA dictionary and lets the author make an appropriate selection, as shown in block 738. If YAR/MO has no synonym for the item, the author is given the opportunity to rewrite the input text, as shown in block 740. The rewritten text returns to block 710. As soon as the author makes a permitted selection, operation 700 proceeds to block 718.

When a non-OIA word is identified in the text, the author is offered the following choices: he or she can select one of the suggested alternatives to replace the word used in the document, or he or she can enter a new item to replace the word in the document. As a rule, the author replaces a non-OIA item with one of the synonyms offered. If the author decides to leave the problem unresolved, the text will not be verified as conforming to OIA. If no terms remain, operation 700 stops. Otherwise, a new term is selected, as shown in block 714, and operation 700 starts again from block 710.
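The per-term decision described for Fig. 7 can be sketched as follows. This is a hedged illustration only: `check_term`, the three toy data sets and the use of Python's `difflib` for spelling suggestions are assumptions, not the mechanism disclosed in the patent. The order of the tests follows the text: OIA dictionary first, then the standard-dictionary spelling check, then the synonym lookup in YAR/MO.

```python
import difflib
from typing import Dict, List, Set, Tuple


def check_term(
    term: str,
    oia_lexicon: Set[str],
    standard_words: Set[str],
    synonyms: Dict[str, List[str]],
) -> Tuple[str, List[str]]:
    """Return (status, suggestions) for one term."""
    word = term.lower()
    if word in oia_lexicon:  # block 710: term is in OIA
        return ("ok", [])
    if word not in standard_words:  # block 722: spelling check
        close = difflib.get_close_matches(word, sorted(standard_words), n=3)
        return ("misspelled", close)  # block 726: show alternative spellings
    permitted = [s for s in synonyms.get(word, []) if s in oia_lexicon]
    if permitted:  # blocks 736/738: offer OIA synonyms
        return ("offer_synonyms", permitted)
    return ("rewrite", [])  # block 740: author rewrites the text


# Hypothetical toy data.
OIA = {"open", "the", "valve", "and", "allow", "required"}
STANDARD = OIA | {"let", "faucet", "permit"}
YAR_MO_SYNONYMS = {"let": ["allow", "permit"]}
```

In this sketch only synonyms that are themselves in the toy OIA lexicon are offered, so "let" yields "allow" but not "permit".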

In particular, the vocabulary controller 610 detects each occurrence in the submitted text of a lexical item that is not known to be part of OIA. For each such word, the vocabulary controller 610 determines which of the following descriptions applies and reports supporting information to the user interface, as shown below.

A non-OIA word with known OIA synonyms; in this case the vocabulary controller (SC) 610 identifies those synonyms. For example, suppose the word "let" (in all its forms) is not included in OIA.

Author's input text to be checked: "Open the faucet and let the additional amount of nitrogen in pneumotachometer". SC message: The term "let" is not in OIA, but OIA has related alternatives.

Alternatives from OIA: let, permitted, permit, permitted, left, left, allow to enter.

Sentence edited to conform to OIA: "Open the faucet and allow more nitrogen to enter pneumotachometer".

A word that OIA permits only within certain expressions, but not in the context used by the author; in this case the vocabulary controller 610 informs the author of the OIA-permitted expressions containing the word.

Author's input text to be checked: "When the first check valve clearance must be checked synchronization injection".

SC message: This term is used in a context that is not in OIA.

Alternatives from OIA: synchronizes the signal timing, groove synchronization timing, gear synchronization, the synchronization mechanism.

Sentence edited to conform to OIA: "When the first check valve clearance must be checked, the synchronization engine injection".

A word or expression that must be enclosed in double quotes in OIA but is not quoted in the author's context; in this case the vocabulary controller 610 informs the author that the term should be in quotes.

Author's input text to be checked: "More of this is discussed in the Chapter Tests and Adjustments in the following section".

SC message: In OIA this term is normally quoted.

Alternatives from OIA: none.


A word or expression that OIA requires to be written with specific capital letters, but which the author has written without them (for example, an OIA-permitted acronym written in lowercase); in this case the vocabulary controller 610 informs the author of the correct form permitted in OIA.

Author's input text to be checked: "Turn the screw until the reading of the manometer is not equal to 0 kPa".

SC message: In OIA this term has mandatory capital letters.

Alternatives from OIA: kPa.

Sentence edited to conform to OIA: "Turn the screw until the reading of the manometer is not equal to 0 kPa".

A non-word (i.e., a group of letters representing a misspelled word) that has alternative spellings; in this case the vocabulary controller 610 identifies the alternative spellings regardless of whether a given alternative is in OIA (because the user will submit the chosen alternative for further checking).

Enter the author of the text, to be validated: "When you want to raise the boom, for

Alternatives from OIA: required.

Edited the sentence to OIA: "When you want to raise the boom, it must be ensured reliable support".

A word that is not in OIA and about which the system knows nothing. The message that a word or expression is completely unknown gives the author the opportunity either to revise the piece of text or, if necessary, to shield the unresolved expression from checking. In the example below, the author uses a SOAR tag to tell the system to ignore the unacceptable phrase and leave it as it is.

Author's input text to be checked: "Pour approximately 0.9 liters (1 quart) of oil for hydraulic system SAE10W in the nitrogen part of pneumotachometer".

SC message: This term is not known.

Alternatives from OIA: None.

Sentence edited to conform to OIA: "Pour approximately 0.9 liters (1 quart) oil hydraulic SAE10W in the nitrogen part of pneumotachometer".

A punctuation mark or special character that is not allowed in OIA in any context.
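Some of the cases listed above can be told apart mechanically. The sketch below is an illustrative assumption, not the patent's method: the `Finding` enum, the `classify` function and the toy data are invented names, and only three of the listed cases are covered (mandatory capital letters, a non-OIA word with OIA synonyms, and a completely unknown word).

```python
from enum import Enum
from typing import Dict, List, Set, Tuple


class Finding(Enum):
    OK = "term is in OIA"
    CAPITALIZATION = "term has mandatory capital letters in OIA"
    SYNONYM = "term is not in OIA, but OIA has alternatives"
    UNKNOWN = "term is not known"


def classify(
    term: str, oia_lexicon: Set[str], synonyms: Dict[str, List[str]]
) -> Tuple[Finding, List[str]]:
    """Classify one term and return the OIA alternatives to show the author."""
    if term in oia_lexicon:
        return Finding.OK, []
    # Same letters, different case: report the permitted form.
    by_fold = {entry.casefold(): entry for entry in oia_lexicon}
    if term.casefold() in by_fold:
        return Finding.CAPITALIZATION, [by_fold[term.casefold()]]
    # Known non-OIA synonym: report only alternatives that are in OIA.
    alternatives = [s for s in synonyms.get(term.casefold(), []) if s in oia_lexicon]
    if alternatives:
        return Finding.SYNONYM, alternatives
    return Finding.UNKNOWN, []


# Hypothetical toy data.
OIA_LEXICON = {"kPa", "allow", "screw"}
SYNONYMS = {"let": ["allow"]}
```

The remaining cases (context-restricted expressions, mandatory quotes, disallowed punctuation) would need information about the surrounding text and are left out of this sketch.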

In cases where a non-OIA word has no direct OIA synonym, the system is able to identify related OIA words or expressions that the author could use to express the same thought. This functionality gives authors additional support in rephrasing a sentence so that it contains only OIA vocabulary. However, changes involving these related words cannot be made with the replacement facility provided for synonyms, because they require changes in sentence construction. For example, if the word "can" is in OIA but the word "capable" and the expression "capable of" are not, an author who wrote the sentence:

"This system is capable of programming multiple parameters defined by the user" will be informed that the word "capable" is not an OIA word. Although the word "can" is included in OIA, neither the word "capable" nor the expression "capable of" can be directly replaced by "can" without further changes to the sentence.

2. Grammatical controller

The task of the grammar controller is to identify places where the author's text does not conform to the grammatical constraints of OIA. It relies on the MP analyzer 127 of the MP system 120, extended so that the system can inform the author of cases of syntactic or semantic ambiguity. The grammar controller interface lets the author respond interactively to requests to clarify ambiguities: it presents the author with two or more possible meanings of a sentence and asks for clarification. An example of an ambiguous sentence is: "Check the cylinders inside." Are the cylinders themselves inside the assembly, or is the inside of the cylinders to be checked? There are two types of ambiguity.

Lexical ambiguity. Lexical ambiguity occurs when a word has two or more meanings in the constrained language. Although it is desirable that each word in a constrained language have only one meaning per part of speech, some words inevitably have more than one.

For example, the word "gas" can have the meanings "gas", "natural gas" and "gas".

Further, at the lexical level, problems can be caused by a word that can play two different syntactic roles in OIA (for example, "fuel", which in English can be a verb, "to fuel", or a noun used attributively). When the author enters a sentence in which the syntactic role is not clear, the grammar controller (CC) 620 can give the author the following prompt:

Author's input text to be checked: "Sensor mounted on the fuel rod".

CC message: This term can be used as a noun (also used as an adjective) or as a verb.

At this stage the author can edit the sentence without help from the system (by simply rewriting the sentence and resubmitting it to the controller). If the author asks the system for help, the system can provide specific guidance for resolving issues of this type. In this case it suggests the use of determiners:


CC message: If this word is a noun, you may want to put a determiner before it. If it is a verb, would a determiner after it help?

Example: "hull streaming" or "Should flow the hull?".

The author then edits the sentence and resubmits it to the grammar controller.

Structural ambiguity. The second type of ambiguity arises when the parts of a sentence can be combined with one another in more than one way. For example: "Remove the valve with the lever."

(In English this sentence can mean either "remove the valve that has the lever" or "remove the valve using the lever".) Is the phrase "with the lever" attached semantically to the noun "the valve", or is it attached to the verb "remove"? In other words, does the sentence call for removing the valve to which the lever is fixed, or for removing the valve by means of the lever?

The element of ISAPP 105 that is designed to answer this question is the domain model (MO) 137, which is designed to minimize the possibility of such ambiguity.

As shown in Fig. 5, the MO/MP block 520, which exclusively supports the machine translation process, contains two types of information. On the one hand, semantic information (A) specifies the relationships between concepts. On the other hand, contextual information (B) specifies, for a particular verb, its so-called "deep cases" and the governing words with which it may be used. In our example, let us first establish how semantic information (A) and contextual information (B) apply to the sentence "Remove the valve with the lever."

Among the many types of semantic relationship, the relationship "is part of" applies, for example, to the concepts "hat" and "clothes", because a "hat" is a part of "clothes". The same relationship applies to the concepts "sole" and "shoe", "heel" and "shoe", and so on. Semantic information (A) entered in MO/MP 520 records this and similar relationships between the concepts of the constrained domain.

When the MP analyzer 127 consults MO/MP 520 for semantic information about the relationship between the concepts "valve" and "lever", the information stored in MO 137 does not tell the MP analyzer 127 whether the "lever" is part of the "valve", for the simple reason that no such relationship is recorded there. Thus the MP analyzer 127 still does not know whether to attach the phrase "with the lever" to the word "valve".

But when the MP analyzer 127 consults contextual information (B), it learns that the verb "remove" can combine with three noun cases: nominative (IMA), accusative (WINES) and instrumental (TWT), understood at a deeper level of analysis than ordinary grammatical case:


On the basis of this abstract schema we can build sentences of the type described at the end of the description.

MO/MP contains the information that the preposition "with" (which in English can mean "together with" or "by means of"), combined with a noun having the semantic feature (+TOOL), forms an instrumental phrase. This information allows the analyzer to determine that:

a) because "lever" has the feature (+TOOL), the phrase "with the lever" is TWT (instrumental),

b) because "remove" can be used with the instrumental case, the phrase "with the lever" attaches to "remove" and modifies it adverbially.
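The reasoning in a) and b) can be written as a small decision procedure. The following Python sketch is an assumption-laden illustration: the tables `PART_OF`, `FEATURES` and `VERB_CASES`, the case label "instrument" and the function `attach_with_phrase` are invented for this example and stand in for semantic information (A) and contextual information (B) of MO/MP 520.

```python
from typing import Dict, Set, Tuple

# Semantic information (A): "is part of" relations between concepts (toy data).
PART_OF: Set[Tuple[str, str]] = {("sole", "shoe"), ("heel", "shoe")}

# Semantic features of nouns (toy data); "lever" carries +TOOL.
FEATURES: Dict[str, Set[str]] = {"lever": {"TOOL"}}

# Contextual information (B): deep cases each verb can combine with (toy data).
VERB_CASES: Dict[str, Set[str]] = {"remove": {"agent", "object", "instrument"}}


def attach_with_phrase(verb: str, obj: str, pp_noun: str) -> str:
    """Decide where "with <pp_noun>" attaches in "<verb> the <obj> with the <pp_noun>".

    Returns "noun" (attaches to the object), "verb" (instrumental phrase),
    or "ask_author" when the model cannot decide.
    """
    noun_reading = (pp_noun, obj) in PART_OF
    verb_reading = (
        "TOOL" in FEATURES.get(pp_noun, set())
        and "instrument" in VERB_CASES.get(verb, set())
    )
    if verb_reading and not noun_reading:
        return "verb"
    if noun_reading and not verb_reading:
        return "noun"
    return "ask_author"  # both readings or neither: resolve interactively
```

With this toy data, `attach_with_phrase("remove", "valve", "lever")` returns `"verb"`, matching the conclusion above. A noun with no recorded features and no part-of relation falls through to `"ask_author"`.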

However, the MO is only as rich as we make it. Where semantic information is not developed as fully as possible, the lexical items of a constrained domain may not be able to support the disambiguation performed by the machine translation analyzer 127.

Consider the word "nail" in the sentence "Peter took a box of nails". If MO 137 does not contain information on whether "nail" is a tool, the analyzer 127 cannot determine whether the preposition "with" combines with the word "nail" to form an instrumental phrase. Since the analyzer cannot resolve the structural ambiguity itself, it asks the author to resolve it. Thus, when the author's text undergoes grammar checking, the following interaction takes place.

Author's input text to be checked: "Peter took a box of nail/nail".

Message from the grammar controller (CC) 620: The sentence is ambiguous.

1. Is the "nail" a tool?

2. Is the "nail" part of the "box"?

As soon as the author selects the intended reading, the controller gives the sentence an invisible SOAR tag that tells the system exactly how it should be translated.

As mentioned above, the MP analyzer 127 is called by the grammar controller to check whether the entered text or IE (or part of it) conforms to the grammatical and semantic constraints of OIA. In the preferred embodiment of the invention, the response for each analyzed sentence is a clear "green light/red light" message; in the latter case the author revises the sentence with the means made available to him. When either the entire entered text or the IE in question has been fully checked for conformity with OIA, it can be either saved or sent for immediate translation.

Fig. 8 shows a high-level flowchart of the grammar controller 620 (parsing) and the disambiguation controller (semantic analysis). Hereinafter in this description, the word "sentence" means a unit of text that either passes or fails the analysis module 127. Such a text unit may be a fragment other than a full sentence, such as a title, a subtitle, a list item, a caption or a label.

The grammar controller 620 recognizes sentence boundaries and the boundaries of SOAR elements marked with SOAR tags in the text. It identifies every sentence that does not conform to the OIA specification, including every sentence that the MP analysis module 127 cannot parse satisfactorily. Parsing may fail for reasons including, but not limited to, the following.

The sentence contains grammatical structures that the MP parser 127 is unable to parse. This may be the result of deliberately omitting the relative pronoun "that" and the appropriate form of the verb "be" in a sentence such as: "Don't change the values that are programmed into the unit".

Author's input text to be checked: "Don't change values programmed into the unit".

CC message: This sentence is difficult to parse.

Please check one of the following problems:

The grammar controller 620 then lists the typical and most common situations in which parsing is hampered or made impossible by the use of grammatical structures not included in the OIA repertoire.

The punctuation used in the sentence does not conform to OIA restrictions. As mentioned above, punctuation marks and special characters that are not included in OIA are flagged in any context by the vocabulary controller 610. However, the vocabulary controller 610 does not parse text units and therefore cannot report cases where such a mark exists in OIA but is used in the wrong context. Such cases are caught by the grammar controller 620.

The sentence uses a word in a syntactic form that is not permitted for that word in OIA. The vocabulary controller 610 identifies some of these cases: for example, if the word "test" is included in the OIA dictionary as a noun but not as a verb, the vocabulary controller reports that the past form "tested" is not included in OIA. However, the vocabulary controller 610 does not flag the third-person present-tense verb form "tests", since this form is identical to the plural of the OIA-permitted noun, "tests". In this case the response of the grammar controller 620 is "failed".

The grammar controller 620 uses the MP analyzer 127 (and the domain model 137) to identify sentences that do not conform to the grammatical restrictions of OIA; this operation is called parsing and is illustrated by block 805. For each such sentence the grammar controller 620 reports that the sentence does not conform to OIA. It is also possible that a sentence conforms to OIA but is not unambiguous. For this reason the present invention includes semantic analysis, as shown in block 810. If the sentence is not semantically unambiguous, the system presents the author with its possible meanings and requires clarification, as shown in blocks 815 and 825. In a preferred embodiment of the invention, if a sentence fails validation by the grammar controller 620 and/or the disambiguation controller 630, the author is presented with the following options: edit the sentence; if it is ambiguous, resolve the ambiguity; or continue checking without editing.

Note that the present invention enforces absolute adherence to the vocabulary and grammar restrictions, rather than merely offering stylistic hints or identifying obvious errors (such as subject-verb disagreement).

If the sentence is semantically unambiguous, it is translated into the intermediate language Interlingua, as shown in block 820. Once the document has been verified by the grammar controller 620, a SOAR tag confirming conformity with OIA can be included in the document.

In a preferred embodiment of the invention the grammar controller 620 gives the author 160 "pass"/"fail" feedback. However, the author can be given more detailed guidance, including disambiguation as discussed in Tomita M., "Sentence Disambiguation by Asking", Computers and Translation 1:39-51 (1986), and Carbonell J., Tomita M., "Knowledge-Based Machine Translation", in Machine Translation: Theoretical and Methodological Issues, ed. S. Nirenburg, Cambridge: Cambridge University Press, pp. 68-89 (1987), which are incorporated herein by reference.

D. Machine translation

MP 120 is a machine translation system based on the intermediate language Interlingua. In such systems the constrained source language (OIA) and the target language never come into direct contact. Text processing in such systems usually takes place in two stages: first, the meaning of the OIA text is expressed in a formal intermediate language, independent of any natural language, called "Interlingua"; second, this meaning is expressed using the lexical items and syntactic structures of the target language.

Interlingua-based machine translation is one of several approaches to machine translation. A detailed description of these different approaches can be found in Hutchins, Machine Translation: Past, Present, Future, Ellis Horwood Ltd., Chichester, UK, 1986, and Zarechnak, "The History of Machine Translation", in Machine Translation, Trends in Linguistics: Studies and Monographs, ed. Henisz-Dostert, McDonald, Zarechnak, The Hague, Mouton, 1979; both are incorporated herein by reference.

The meaning of the OIA text 350 is expressed in a specially designed knowledge representation scheme called "Interlingua" (intermediate language), which is well known to those skilled in the art. Interlingua, in turn, is expressed in a frame-based notation and can therefore be regarded as a kind of semantic network. Like any other artificial or formal language, Interlingua has its own vocabulary and its own syntax. The vocabulary is based on the domain from which the texts to be translated are taken (for example, computer maintenance, space exploration, etc.). Thus, the "nouns" of Interlingua correspond to concepts defined in the ontology, and the adjectives and adverbs of Interlingua are the various "properties" defined in the ontology. This ontology forms a closely interconnected network of concepts of various types, called the domain model.

As can be seen in Fig. 3 and Fig. 9, the machine translation (MP) system 120, an integral part of ISAPP 105, consists of two main sections. The first, the MP analyzer 127, performs the first processing stage, expressing text written in OIA in Interlingua. The second, the MP generator 123, converts the OIA-verified texts expressed in Interlingua into the target language (e.g. French, Japanese or Spanish). In performing both tasks, MP 120 operates as one or more independent server modules that accept translation requests from a person in charge of translation (not shown). When generating target-language text, the MP generator 123 maps the Interlingua text 260 onto the corresponding units of the target language to produce high-quality output text 950 that needs no post-editing.

Once the MP analysis module 127 has produced the Interlingua text 260 for a verified, OIA-conforming IE, that text is translated into an IE in the target language, or into different IEs for several target languages, by the MP generator 123, which includes a semantics-to-syntax mapper and a generation kit (see Tomita M., Nyberg E., The Generation Kit and Transformation Kit Version 3.2 User's Manual, Technical Memo (1988), which can be obtained from the Center for Machine Translation, Carnegie Mellon University, Pittsburgh, Pa.). The MP analyzer 127 and the MP generator 123 interact in two ways: first, the output of the former is the input of the latter; second, they share information from external sources, particularly the domain model 137.

The machine translation (MP) system 120 is subdivided as shown in Fig. 9. The analysis half consists of the parsing block 910 and the interpreter 920. The second half of MP 120 can be subdivided into the mapper 930 and the generator 940. The ovals in Fig. 9 indicate the data produced and exchanged between the main software modules.

MO 137 (especially MO/MP 520) is used in the translation process in three ways: (1) the parsing block 910 uses MO 137 to limit the possible attachments, (2) the interpreter 920 uses MO 137 to place the relevant domain concepts during interpretation, and (3) the mapper 930 uses MO 137 to select the appropriate target-language realization of each concept expressed in Interlingua.

MP 120 runs as one or more server processes. Each MP process accepts translation requests from suf 110 and returns the result. Requests contain OIA text with SOAR tags, and results contain the target-language translation, also with SOAR tags. Because translation into several languages can take place at the same time, a request also indicates the desired target language. Further, since each MP server process is specialized for one target language, a routing function is needed. This routing function is performed automatically by suf 110. The exact set of MP processes at any given time, and their distribution among computing machines, is determined by suf 110, which adjusts the hardware allocation according to the set of translation jobs that have been ordered but not yet completed at that moment.
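The routing function can be pictured with a short Python sketch. Everything here is hypothetical: `TranslationRequest`, `route`, the stub servers and the language codes are illustrative names, and a real suf 110 would also manage process placement across machines, which this sketch omits. It only shows that each request carries its target language and is dispatched to the process specialized for that language.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class TranslationRequest:
    tagged_text: str       # OIA text with SOAR tags
    target_language: str   # desired target language


def route(request: TranslationRequest, servers: Dict[str, Callable[[str], str]]) -> str:
    """Dispatch the request to the MP process for its target language."""
    server = servers.get(request.target_language)
    if server is None:
        raise LookupError(
            f"no MP process for target language {request.target_language!r}"
        )
    return server(request.tagged_text)


def make_stub_server(language: str) -> Callable[[str], str]:
    """Build a stand-in for a language-specific MP process (no real translation)."""
    def translate(tagged_text: str) -> str:
        return f"[{language}] {tagged_text}"
    return translate


SERVERS = {lang: make_stub_server(lang) for lang in ("fr", "ja", "es")}
```

A request for a language with no running process raises `LookupError` in this sketch; an actual system might instead start a new process or queue the job.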

As can be seen in Fig. 9, the analysis is performed by the parsing block 910 and the interpreter 920, the latter being known in this field as a "mapping rule interpreter". The parsing block 910 receives the input OIA text 305 and produces a syntactic structure. The parsing block 910 uses a grammar of the FLG ("lexical functional grammar") type. FLG is a grammar formalism well known to specialists in machine translation. The resulting structure is an FLG f-structure (functional structure) 960. As soon as the f-structure 960 for the OIA sentence has been produced, the interpreter 920 applies mapping rules to it to replace the lexical items and syntactic structures of the source language with their Interlingua counterparts. Lexical items are mapped onto domain concepts (for example, the word "data" is mapped to "information" in Interlingua), while syntactic structures are mapped onto conceptual relations (for example, the subject of a sentence is often mapped onto an "agent" relation in Interlingua); see Mitamura, The Hierarchical Organization of Predicate Frames for Interpretive Mapping in Natural Language Processing, Center for Machine Translation, Carnegie Mellon University.
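A toy version of this mapping step is sketched below. It is an illustration under stated assumptions: the f-structure is reduced to a nested dictionary, `LEXICAL_MAP`, `ROLE_MAP` and `interpret` are invented names, and the "object" to "theme" rule is an assumption added for the example. Only the "data" to "information" and subject to "agent" mappings come from the text above.

```python
from typing import Any, Dict

# Lexical mapping rules: source word -> domain concept (toy data).
LEXICAL_MAP: Dict[str, str] = {"data": "information"}

# Structural mapping rules: grammatical function -> conceptual relation.
# "object" -> "theme" is an assumption made for this example.
ROLE_MAP: Dict[str, str] = {"subject": "agent", "object": "theme"}


def interpret(f_structure: Dict[str, Any]) -> Dict[str, Any]:
    """Map a simplified f-structure onto an Interlingua-style frame."""
    pred = f_structure["pred"]
    frame: Dict[str, Any] = {"concept": LEXICAL_MAP.get(pred, pred)}
    for function, value in f_structure.items():
        if function == "pred":
            continue
        # Rename the grammatical function and interpret the nested structure.
        frame[ROLE_MAP.get(function, function)] = interpret(value)
    return frame


EXAMPLE = {
    "pred": "send",
    "subject": {"pred": "operator"},
    "object": {"pred": "data"},
}
```

Applied to `EXAMPLE`, the sketch yields a frame whose "agent" is the concept "operator" and whose "theme" is the concept "information".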

The machine translation analyzer 127, guided by analysis knowledge (data files), translates an input sentence of the OIA text 305 from the source language into a semantic frame representation of the meaning of the sentence. The knowledge structures used at the analysis stage are the analysis grammar, the mapping rules and the concept lexicon.

The first part of the analysis is parsing, a syntax-directed analysis of the input sentence. The parsing block 910 uses the semantic constraints embodied in the concept lexicon (i.e., in the domain model) as guidance in resolving cases of syntactic ambiguity found in the input. The mapping rules mediate between the parser grammar and the concept lexicon.

The output of this analysis is a syntactic f-structure containing all applicable semantic information. This structure may be further processed by the second part of the MP analyzer 127 to obtain a semantically organized frame representation in the required form by extracting the semantic features of the f-structure; these features contain all relevant semantic information.

The parsing block 910 used in the present invention is well known to those skilled in the art and is described in detail in Tomita and Carbonell, The Universal Parser Architecture for Knowledge-Based Machine Translation, Technical Report, Center for Machine Translation, Carnegie Mellon University, May 1987, and in The Generalized LR Parser/Compiler Version 8.1: User's Guide, Technical Memo, Tomita (ed.) et al., Center for Machine Translation, Carnegie Mellon University, April 1988, both of which are incorporated herein by reference.

One of the benefits transfer systems using intermediate language Interlingua over other MT engines is the fact that the unit Interlingua 260 does not depend on natural languages, i.e., either the source or target languages never directly in contact with each other. This allows you to create native system, the cat is hiliterow structure. From the above it is clear that any such system must be able to parse multiple source languages. For this reason you want a universal unit of analysis, capable of receiving the grammar of the language at its input, instead of embed grammar in the translation unit as such. This will make the system better extensible and more generalized.

In other words, when multiple languages are involved, linguistic structure ceases to be a universal invariant that carries over to every application (as it does for parsers restricted to English) and instead becomes a new dimension of parameterization and extensibility. Semantic information, however, can remain invariant from one language to another (although not, of course, from one domain to another). It is therefore essential to keep the sources of semantic knowledge separate from the syntactic ones, so that newly added linguistic information applies across all semantic domains and newly added semantic information applies to all relevant languages. The universal parser aims to achieve this separation without sacrificing run-time efficiency or semantic accuracy.

The parsing block 910 uses three kinds of knowledge sources. The first contains syntactic grammars for different languages, the second contains semantic knowledge bases for different domains, and the third contains sets of markup rules that map syntactic forms (words and phrases) onto semantic knowledge structures. Each syntactic grammar is independent of any particular domain; likewise, each semantic knowledge base is independent of any particular language.

The markup rules, by contrast, depend on both language and domain, so a different set of markup rules is created for each language-domain combination. The syntactic grammars, the domain knowledge bases, and the markup rules are written in a highly abstract, human-readable form. This organization makes them easy to extend or modify, but it is probably not efficient for the parsing block in terms of run time.
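The separation of the three knowledge sources can be sketched as a small data structure. This is only an illustration under stated assumptions: the class `KnowledgeSources`, its field names, and the sample entries are hypothetical and do not come from the described system.

```python
from dataclasses import dataclass, field


@dataclass
class KnowledgeSources:
    """Hypothetical container for the three kinds of knowledge sources."""
    grammars: dict = field(default_factory=dict)      # language -> syntactic grammar
    domains: dict = field(default_factory=dict)       # domain -> semantic knowledge base
    markup_rules: dict = field(default_factory=dict)  # (language, domain) -> markup rules

    def missing_rule_sets(self):
        """List the language-domain pairs that still lack a set of markup rules."""
        return sorted(
            (language, domain)
            for language in self.grammars
            for domain in self.domains
            if (language, domain) not in self.markup_rules
        )


sources = KnowledgeSources(
    grammars={"english": "...", "japanese": "..."},   # placeholder grammar contents
    domains={"machinery": "..."},                     # placeholder knowledge base
    markup_rules={("english", "machinery"): "..."},
)
gaps = sources.missing_rule_sets()
```

In this sketch, adding the Japanese grammar leaves the machinery knowledge base untouched; only the markup rules for the pair `("japanese", "machinery")` remain to be written, which is what `missing_rule_sets` reports.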

The function of the translator 920 is to generate a parser from these structures.

The parsing block 910 produces all possible (i.e., valid) f-structures that can be derived from the parsed sentence. Each of these syntactic f-structures carries semantic features and, in accordance with the theory of Lexical Functional Grammar (LFG), these features are created at the same time as the rest of the syntactic f-structure. The semantic component can thus be regarded as an additional feature of the f-structure.

Thus, the semantic component is a "visible" part of parsing. Building the semantic and syntactic structures together yields a system that helps eliminate "meaningless" partial parses before they are completed. Semantic information is added to the syntactic structure when the lexicon is consulted for the definition of a word. The second part of each word's definition is a set of structural markup rules. These markup rules are applied during parsing as the equations in the grammar rules build up the syntactic structure.
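The pruning of "meaningless" analyses by semantic constraints can be sketched as follows. This is a simplified illustration, not the described parser: it filters complete candidate readings instead of pruning during parsing, and the names `DOMAIN_MODEL`, `semantically_valid`, and `prune`, as well as the sample constraints, are assumptions.

```python
# Hypothetical domain-model fragment: which fillers each (head, role) pair accepts.
DOMAIN_MODEL = {
    ("clean", "instrument"): {"brush", "cloth"},
    ("filter", "has-part"): {"housing", "element"},
}


def semantically_valid(head, role, filler, model=DOMAIN_MODEL):
    """Return True if the domain model allows `filler` in `role` of `head`."""
    return filler in model.get((head, role), set())


def prune(candidates, model=DOMAIN_MODEL):
    """Keep only the readings whose attachment satisfies the domain model."""
    return [c for c in candidates if semantically_valid(*c["attachment"], model)]


# Two syntactic readings of "clean the filter with the brush".
candidates = [
    {"reading": "verb-attached", "attachment": ("clean", "instrument", "brush")},
    {"reading": "noun-attached", "attachment": ("filter", "has-part", "brush")},
]
surviving = prune(candidates)
```

Here the noun-attached reading is discarded because the sample domain model does not list a brush as a part of a filter, so a single reading survives.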

The target language generator 123, as part of the overall system, receives input text in Interlingua 260 and passes it through a semantic module and a syntactic module. The semantic module performs lexical selection in the target language, and the syntactic module selects the syntactic structures of the target language; in these tasks they are assisted by the generation lexicon and the structural generation markup rules, respectively. The output of this component is the f-structure of a sentence in the target language, and this is the output of the system as a whole.

Thus, the purpose of the generation block is to produce sentences in the target language from the Interlingua 260 frame issued by the MT analyzer 127. Generating the target text involves three main stages.

1. Lexical selection.

For each concept in the Interlingua, the most appropriate lexical item must be chosen.

2. Creation of the f-structure.

A functional syntactic structure that defines the grammatical structure of the target sentence must be obtained from the Interlingua frame.

3. Syntactic generation.

The functional syntactic structure is processed by the generation grammar to produce sentences in the target language.
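The three stages listed above can be sketched as a small pipeline. This is a toy illustration under stated assumptions: the frame layout, the concept names such as `*OPEN`, the lexicon entries, and all function names are hypothetical, and real generation would also handle agreement, morphology, and language-specific word order.

```python
# Hypothetical generation lexicon: Interlingua concept -> target-language lexical item.
GENERATION_LEXICON = {
    "*OPEN": "öffnet",
    "*PUMP": "die Pumpe",
    "*VALVE": "das Ventil",
}


def select_lexemes(frame, lexicon):
    """Stage 1: choose a lexical item for each concept in the frame."""
    return {role: lexicon[concept] for role, concept in frame.items()}


def build_f_structure(lexemes):
    """Stage 2: arrange the chosen items into a functional structure."""
    return {"PRED": lexemes["head"], "SUBJ": lexemes["agent"], "OBJ": lexemes["theme"]}


def generate_sentence(f_structure, order=("SUBJ", "PRED", "OBJ")):
    """Stage 3: linearize the f-structure according to a word-order pattern."""
    words = " ".join(f_structure[slot] for slot in order)
    return words[0].upper() + words[1:] + "."


def generate(frame, lexicon=GENERATION_LEXICON):
    return generate_sentence(build_f_structure(select_lexemes(frame, lexicon)))


frame = {"head": "*OPEN", "agent": "*PUMP", "theme": "*VALVE"}
sentence = generate(frame)  # "Die Pumpe öffnet das Ventil."
```

Keeping the stages as separate functions mirrors the division between the semantic module (stage 1) and the syntactic module (stages 2 and 3); a different target language would supply its own lexicon and word-order pattern.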

This is consistent with the markup-generation paradigm, which has already been used in earlier translation systems.

For a more detailed treatment of machine translation, and of the specific construction and operation of the modules described above, see: Nirenburg et al., Machine Translation: A Knowledge-Based Approach, Morgan Kaufmann Publishers, Inc. (1992); Hutchins and Somers, Introduction to Machine Translation, Academic Press, London (October 1991); Mitamura et al., An Efficient Interlingua Translation System for Multi-lingual Document Production, Proceedings of Machine Translation Summit III, Washington, D.C. (July 2-4, 1991); Nirenburg, S., "Word Knowledge and Text Meaning," in The KBMT Project: A Case Study in Knowledge-Based Machine Translation, San Mateo, Calif.: Morgan Kaufmann; Project Report SMR-89, available from the Center for Machine Translation, Carnegie Mellon University, Pittsburgh, Pennsylvania (phone (412) 268-6591), 4th edition: March 1990; Machine Translation: Theoretical and Methodological Issues, Cambridge: Cambridge University Press, pp. 68-...; and IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. PAMI-3, no. 4 (July 1981), all of which are incorporated herein by reference.

Although the present invention has been specifically described and shown with reference to preferred embodiments, those skilled in the art will understand that various changes in form and detail may be made to these embodiments within the general spirit and scope of the present invention.

1. An integrated document preparation and translation system (105) for preparing a document in one language, comprising a processor that includes: a text editor (140) for interactively receiving input text in a source language from an author (160); a language editor (130), implemented as an extension of the text editor (140), designed to interactively impose lexical and grammatical constraints on a subset of the natural source language used by the author (160) to create the input text, and to assist the author interactively in imposing the lexical and grammatical constraints on the input text so as to obtain unambiguous constrained text; a machine translation system (120) for translation into a foreign language; and a domain model (137) interacting with the language editor (130), the domain model (137) supplying predetermined domain information and linguistic semantic information about lexical units and their combinations to assist the language editor (130) in imposing the lexical and grammatical constraints, characterized in that the domain model (137) consists of three parts and includes: a kernel (510) containing lexical information required by both the language editor (130) and the machine translation system (120), the lexical information including the lexical units of said subset of the natural language together with the semantic concepts, parts of speech, and morphological information assigned to them; a language editor domain model (530) containing information required only by the language editor (130), said information including at least one of: a set of synonyms for lexical units not contained in said subset of the natural language, dictionary definitions of said lexical units, and multiple usage examples of said lexical units; and a machine translation domain model (520) containing information required only by the machine translation system (120), said machine translation domain model (520) including hierarchically organized concepts for unambiguous mapping and semantic verification in translation.

2. An integrated document preparation and translation system (105) for preparing a document in one language, comprising a processor that includes: a text editor (140) for interactively receiving input text in a source language from an author (160); and a language editor (130), implemented as an extension of the text editor (140), designed to interactively impose first lexical constraints and then grammatical constraints on a subset of the natural language used by the author (160) to create the input text, and to assist the author (160) interactively in imposing said lexical and grammatical constraints on the input text so as to obtain unambiguous constrained text, characterized in that the language editor (130) contains a grammatical controller comprising an ambiguity controller.

3. An integrated document preparation and translation system (105) for producing text in one language, comprising a processor that includes: a text editor (140) for interactively receiving input text in a source language from an author (160); and a language editor (130), implemented as an extension of the text editor (140), designed to interactively impose lexical and grammatical constraints on a subset of the natural language used by the author (160) to create the input text, and to assist the author (160) interactively in imposing said lexical and grammatical constraints on the input text so as to obtain unambiguous constrained text, characterized in that the language editor (130) contains a vocabulary controller for checking the input text against the permitted vocabulary and for suggesting alternatives.

4. A computer-based method of preparing a document in one language using a processor with a text editor and a language editor, in which input text in a source language is entered into the text editor (140), characterized by: checking the input text, using the language editor (130), against a predefined set of constraints stored in a domain model (137), which supplies predetermined domain information as well as linguistic semantic information about lexical units and their combinations, said predefined set of constraints including a set of source sublanguage rules relating to vocabulary and grammar, and said domain model (137) consisting of three parts and including: a kernel (510) containing lexical information, said lexical information including lexical units that satisfy said predefined set of constraints together with the semantic concepts, parts of speech, and morphological information assigned to them; a language editor domain model (530) containing information required only by the language editor (130), said information including at least one of: a set of synonyms for lexical units that do not satisfy said predefined set of constraints, dictionary definitions of said lexical units, and a set of usage examples of said lexical units; and a machine translation domain model (520) containing information required only by a machine translation system (120), said machine translation domain model (520) including hierarchically organized concepts for unambiguous mapping and semantic verification in translation; and providing the author (160) with interactive feedback on the input text indicating whether the predefined set of constraints is satisfied, the interactive feedback being performed after consulting the domain model (137), which supplies the necessary domain information and linguistic semantic information, so that unambiguous constrained text is obtained upon completion of the preceding operation.

5. The computer-based document preparation method according to claim 4, characterized in that the predefined set of constraints includes a set of source sublanguage rules relating to vocabulary and grammar, and the interactive feedback consists in bringing the input text into conformity with the set of source sublanguage rules and resolving ambiguities.

6. A computer-based method of preparing a document in one language using a processor with a text editor and a language editor, in which input text in a source language is entered into the text editor (140), characterized by: checking the input text against the vocabulary constraints of the source language; providing the author (160) with interactive feedback on the input text if the input text includes items of the unconstrained source language, until the author (160) has corrected the input text into constrained source text, the interactive feedback being performed after consulting the domain model (137), which supplies the necessary information; checking for syntactic grammatical errors and semantic ambiguities in the constrained source text by consulting the domain model (137); and providing the author (160) with interactive feedback to remove the syntactic grammatical errors and resolve the semantic ambiguities in the constrained source text, so as to obtain unambiguous constrained text.

7. An integrated document preparation and translation system (105) for preparing a document in one language, comprising a processor that includes: a text editor (140) for interactively receiving input text in a source language from an author (160); and a language editor (130), implemented as an extension of the text editor (140), designed to interactively impose lexical and grammatical constraints on a subset of the natural language used by the author (160) to create the input text, and to assist the author (160) interactively in imposing said lexical and grammatical constraints on the input text so as to obtain unambiguous constrained text, characterized in that it contains means for tagging a portion of the input text that has been converted into unambiguous constrained text through the interactive constraints, said tag indicating the linguistic characteristics of the portion of the input text.

8. The system (105) according to claim 7, characterized in that it further comprises means for tagging a portion of the input text that has been converted into unambiguous constrained text through the interactive constraints, said tag indicating suitability for translation.

9. The system (105) according to claim 7, characterized in that it is designed to operate in a server environment so that multiple authors (160) can use the system.

10. The system (105) according to claim 7, characterized in that the workplace of the author (160) is part of a computer network.

11. The system (105) according to claim 7, characterized in that it includes a translation block (920) designed to translate the unambiguous constrained source text into the Interlingua intermediate language.

12. The system (105) according to claim 7, characterized in that the language editor (130) is designed to provide interactive communication with the author (160) in batch mode.

13. The system (105) according to claim 7, characterized in that it further comprises a graphical editor (150) designed to create ... translation with the machine translation system (120).

14. The system (105) according to claim 7, characterized in that the constrained language is a subset of a natural language, and this constrained language is specified with respect to its vocabulary and grammar.

15. The system (105) according to claim 7, characterized in that the language editor (130) includes a vocabulary controller and a grammatical controller.

16. The system (105) according to claim 15, characterized in that the vocabulary controller (610) is designed to check the input text against the permitted vocabulary and to offer alternatives to words that are not in this vocabulary.

17. The system (105) according to claim 15, characterized in that the grammatical controller (620) is designed to check compliance with predetermined grammatical rules and to suggest alternatives to grammatical structures not provided for by these rules.

18. The system (105) according to claim 15, characterized in that the grammatical controller (620) is designed to provide the author (160) with feedback on lexical ambiguity and structural ambiguity.

19. The system (105) according to claim 15, characterized in that the grammatical controller (620) includes an ambiguity controller.

21. The system (105) according to claim 15, characterized in that the vocabulary controller (610) is configured to identify words that are not included in the constrained source language.

22. The system (105) according to claim 7, characterized in that the input text is specified in the form of blocks of information elements.

23. The system (105) according to claim 22, characterized in that the information elements (410) contain tags that describe them according to their content and logical structure.

24. The system (105) according to claim 7, characterized in that it further comprises means for tagging the text of the input text that has been converted into unambiguous constrained text through the interactive constraints, said tag indicating suitability for translation.

25. The system (105) according to claim 7, characterized in that it further comprises storage means for saving the unambiguous source text for further use.

26. The method according to claim 4, characterized in that the tag indicates the content and logical structure.

27. The method according to claim 4, characterized in that the tag indicates the specific meaning of said passage selected by the author.

29. The method according to claim 28, characterized in that said characterization identifies the passage as a mathematical value or a mathematical unit.

30. The method according to claim 28, characterized in that the tag is a Standard Generalized Markup Language (SGML) tag.

31. The method according to claim 28, characterized in that said tag identifies the passage as common.

32. The method according to claim 28, characterized in that the tag can be made invisible to the user.

33. The method according to claim 28, characterized in that it includes the step of using said characterization to assist in translating the document.

34. The method according to claim 33, characterized in that said characterization identifies the passage as suitable for translation.

36. The method according to claim 33, characterized in that said characterization identifies the passage as not suitable for translation.

37. The method according to claim 33, characterized in that said characterization identifies the passage as not requiring translation.

38. The method according to claim 28, characterized in that said characterization identifies the passage as not requiring analysis.

39. The method according to claim 28, characterized in that said characterization identifies the passage as parsed.

40. The method according to claim 28, characterized in that said characterization identifies the passage as successfully parsed.

41. The method according to claim 28, characterized in that said characterization identifies the passage as having special content.

42. The method according to claim 28, characterized in that said characterization identifies a valid linguistic context of the text.

43. The method according to claim 28, characterized in that said characterization identifies the passage as having specific linguistic content.

44. The method according to claim 28, characterized in that said characterization identifies the passage as having a specific ... document is an element of a bulleted list.

46. The method according to claim 28, characterized in that said specific type of document structure is a table.

47. The method according to claim 28, characterized in that said specific type of document structure is a table element.

48. The method according to claim 28, characterized in that said specific type of document structure is a header.

49. The method according to claim 28, characterized in that said specific type of document structure is a title.

50. The method according to claim 28, characterized in that said specific type of document structure is a label associated with graphics.

51. The method according to claim 28, characterized in that the tag is set interactively by the user.
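The vocabulary check recited in claim 16 (checking the input text against the permitted vocabulary and offering alternatives) can be illustrated with a minimal sketch. The word lists and the names `APPROVED`, `SYNONYMS`, and `check_vocabulary` are assumptions chosen for illustration and are not part of the claimed system.

```python
import re

# Hypothetical permitted vocabulary of the constrained source language.
APPROVED = {"open", "the", "valve", "before", "you", "start", "pump"}

# Hypothetical synonym table: unapproved word -> approved alternative.
SYNONYMS = {"commence": "start", "prior": "before"}


def check_vocabulary(text, approved=APPROVED, synonyms=SYNONYMS):
    """Return (word, suggestion) pairs for every word outside the vocabulary.

    The suggestion is None when no approved alternative is known.
    """
    issues = []
    for word in re.findall(r"[a-z]+", text.lower()):
        if word not in approved:
            issues.append((word, synonyms.get(word)))
    return issues


feedback = check_vocabulary("Open the valve before you commence the pump.")
```

In this sketch `feedback` contains the single pair `("commence", "start")`, which an interactive editor could present to the author as a suggested replacement.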

