Method and device for executing floating-point and packed data commands using a single register file

 

(57) Abstract:

The invention relates to the field of computer systems and may be used to execute floating-point and packed data processor commands. The technical result is increased functionality. The processor includes a decoder, a set of physical registers, and a mapping device. Another embodiment of the processor additionally contains a retirement device and a set of buffer registers. The methods describe the order in which commands are executed in the processor using these elements. 4 independent and 47 dependent claims, 27 figures, 2 tables.

The invention relates to the field of computer systems. More specifically, the invention relates to the execution of floating-point and packed data commands by a processor.

In a typical computer system, one or more processors operate on data values represented by a large number of bits (for example, 16, 32, 64, and so on) to produce a result in response to a software command. For example, executing an add command adds a first data value to a second data value and stores the result as a third data value. However, multimedia applications (for example, applications for computer-supported cooperation (CSC - the integration of teleconferencing with the manipulation of mixed-type data), 2D/3D graphics, image processing, video compression/decompression, and sound recognition and manipulation) require the manipulation of large amounts of data that is often represented by a smaller number of bits. For example, multimedia data is typically represented as 64-bit numbers, but only a small part of the bits may carry meaningful information.

To improve the efficiency of multimedia applications (as well as other applications that have the same characteristics), known processors provide packed data formats. A packed data format is one in which the bits normally used to represent a single value are divided into a number of fixed-size data elements, each of which represents a separate value. For example, the data in a 64-bit register can be divided into two 32-bit elements, each of which represents a separate 32-bit value.
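The packed data format described above can be illustrated with a minimal sketch. The function names (unpack_elements, pack_elements, packed_add) and the wraparound behavior of the element-wise addition are assumptions made for illustration only; they are not taken from the description.

```python
def unpack_elements(value: int, element_bits: int, total_bits: int = 64) -> list:
    """Split a packed value into fixed-size elements, lowest element first."""
    if total_bits % element_bits != 0:
        raise ValueError("element size must divide the register width")
    mask = (1 << element_bits) - 1
    return [(value >> shift) & mask for shift in range(0, total_bits, element_bits)]


def pack_elements(elements, element_bits: int) -> int:
    """Combine fixed-size elements (lowest first) into one packed value."""
    mask = (1 << element_bits) - 1
    value = 0
    for index, element in enumerate(elements):
        value |= (element & mask) << (index * element_bits)
    return value


def packed_add(a: int, b: int, element_bits: int, total_bits: int = 64) -> int:
    """Add element by element; each sum wraps inside its own element (assumed)."""
    sums = [x + y for x, y in zip(unpack_elements(a, element_bits, total_bits),
                                  unpack_elements(b, element_bits, total_bits))]
    return pack_elements(sums, element_bits)
```

In this sketch a carry out of one element never reaches its neighbor, which is what distinguishes a packed operation from an ordinary 64-bit addition.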

Hewlett-Packard's basic 32-bit machine architecture took this approach to implementing multimedia data types. That is, ... 64-bit data types. The main drawback of this simple approach is that it severely restricts the available register space. In addition, the performance advantage of working with multimedia data in this way is minimal in view of the effort required to extend the existing architecture.

A somewhat similar approach, taken in the Motorola 88110™ processor, is to combine pairs of integer registers. The idea of combining two 32-bit registers involves concatenating arbitrary combinations of specified registers for a single operation or command. Again, however, the main drawback of implementing 64-bit multimedia data types using paired registers is that only a limited number of register pairs is available. Short of adding register space to the architecture, another way of implementing multimedia data types is needed.

One line of processors with a large installed base of software and hardware is the Intel architecture processor family, which includes the Pentium processor. Fig. 1 is a block diagram illustrating an example computer system 100 that uses the Pentium processor. For a more detailed description of the Pentium processor than is presented here, see the Pentium Processor's Users Manual - Volume 3: Architecture and Programming Manual, 1994, Intel Corporation, Santa Clara, CA. The example computer system 100 includes a processor 105, a memory device 110, and a bus 115. The processor 105 is connected to the memory device 110 by the bus 115. In addition, a number of user input/output devices, such as a keyboard 120 and a display 125, are also connected to the bus 115. A network 130 may also be connected to the bus 115. The processor 105 is a Pentium processor. The memory device 110 represents one or more mechanisms for storing data. For example, the memory device 110 may include read-only memory (ROM), random-access memory (RAM), magnetic disk storage media, optical storage media, flash memory devices, and/or other machine-readable media. The bus 115 represents one or more buses (e.g., PCI, ISA, X-bus, EISA, VESA, etc.) and bridges (also called bus controllers).

Fig. 1 also shows that the memory device 110 stores an operating system as well as other software (not shown). Fig. 1 additionally illustrates that the processor 105 includes a floating-point unit (module) 135 and a floating-point status register 155 (the abbreviation "FP" is used to denote the term "floating point"). Of course, the processor 105 contains additional circuitry that is not necessary for understanding the invention.

The floating-point module 135 is used to store floating-point data and includes a set of floating-point registers (also called a floating-point register file) 145, a set of tags 150, and the floating-point status register 155. The set of floating-point registers 145 includes eight registers, denoted R0-R7 (the notation Rn is used to indicate the physical location of a floating-point register). Each of these eight registers is 80 bits wide and contains a sign field (bit 79), an exponent field (bits [78:64]), and a mantissa field (bits [63:0]). The floating-point module 135 uses the set of floating-point registers 145 as a stack. In other words, the floating-point module 135 includes a stack-referenced register file. When a register set is used as a stack, operations are performed with reference to the top of the stack rather than to the physical locations of the registers (the notation STn is used to refer to the position of logical floating-point register n relative to the top of the stack). The floating-point status register 155 includes a top-of-stack field 160, which identifies which register in the set of floating-point registers 145 is currently at the top of the floating-point stack. In Fig. 1, the top-of-stack indication identifies the register 165 at physical location R4 as the top of the stack.
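The register layout and stack-relative addressing described above can be sketched as follows. The bit positions of the sign, exponent, and mantissa fields come from the description; the modulo-8 formula for locating STn and the function names are assumptions made for illustration.

```python
NUM_REGISTERS = 8  # R0-R7


def physical_index(top_of_stack: int, n: int) -> int:
    """Physical register Rk that logical register STn refers to.

    Assumes STn lies n positions past the top of the stack, wrapping
    around the eight physical registers.
    """
    return (top_of_stack + n) % NUM_REGISTERS


def split_fields(register_value: int):
    """Return (sign, exponent, mantissa) of an 80-bit register value."""
    sign = (register_value >> 79) & 0x1          # bit 79
    exponent = (register_value >> 64) & 0x7FFF   # bits [78:64]
    mantissa = register_value & ((1 << 64) - 1)  # bits [63:0]
    return sign, exponent, mantissa
```

With the top-of-stack field identifying R4, as in Fig. 1, physical_index(4, 0) gives 4, so ST0 refers to R4 in this sketch.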

The tag set 150 includes eight tags and is stored in a single register. Each tag corresponds to a different floating-point register and contains two bits. As shown in Fig. 1, the tag 170 corresponds to the register 165. A tag identifies information about the current contents of the floating-point register to which it corresponds: 00 = valid; 01 = zero; 10 = special; and 11 = empty. These tags are used by the floating-point module 135 to distinguish between empty and non-empty register locations. Thus, the tags can be thought of as identifying two states: empty, indicated by 11, and non-empty, indicated by 00, 01, or 10.
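As a minimal sketch of how such tags might be decoded, the following assumes that the eight two-bit tags are held in one 16-bit word with the tag for register n in bits [2n+1:2n]. That layout and the function names are assumptions for illustration; the description states only that the tags are stored in a single register.

```python
TAG_NAMES = {0b00: "valid", 0b01: "zero", 0b10: "special", 0b11: "empty"}
EMPTY = 0b11


def tag_of(tag_word: int, register: int) -> int:
    """Two-bit tag of one register (assumed layout: bits [2n+1:2n])."""
    return (tag_word >> (2 * register)) & 0b11


def is_empty(tag_word: int, register: int) -> bool:
    """Collapse the four tag values into the two states: empty or non-empty."""
    return tag_of(tag_word, register) == EMPTY


def non_empty_registers(tag_word: int) -> list:
    """Indices of all registers whose tag is 00, 01, or 10."""
    return [n for n in range(8) if not is_empty(tag_word, n)]
```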

These tags can also be used in servicing events. An "event" is any action or occurrence to which the computer system may respond, including interrupts, exceptions, faults, traps, aborts, machine checks, assists, and debug events. When an event occurs, the processor's event handling mechanism causes the processor to suspend execution of the current process, save the execution environment of the interrupted process (i.e., the information necessary to resume execution of the interrupted process), and invoke the appropriate event handler to service the event. After servicing the event, the event handler causes the processor to resume the interrupted process using the previously saved execution environment. Programmers of event handlers can use these tags to check the contents of the various floating-point registers in order to service events better.

While each of the tags has been described as containing two bits, an alternative implementation can store only one bit for each tag. Each of these one-bit tags identifies either the empty or the non-empty state. In such implementations, the one-bit tags can be implemented so as to appear to the user as containing two bits.

The status register 140 includes an EM field 175 and a TS field 180 for storing an EM indication and a TS indication, respectively. If the EM indication is equal to 1 and/or the TS indication is equal to 1, the processor hardware traps to the operating system when a floating-point command is executed by generating a "device not available" exception. By software convention, the EM and TS indications are used for emulating floating-point commands and for implementing multitasking, respectively. However, the use of these indications is purely a software convention. Thus, either or both indications can be used for any purpose. For example, the EM indication can be used for implementing multitasking.

According to the software convention described above, the EM field 175 is used to store the floating-point emulation indication (the "EM indication"), which identifies whether the floating-point module should be emulated in software. A sequence of commands or a single command (for example, CPUID) is usually executed when the system is booted to determine whether a floating-point module is present. The EM indication is then typically changed to indicate that the floating-point module should be emulated if the processor does not contain a floating-point module. While in one implementation the EM indication is equal to 1 when the floating-point module should be emulated, alternative implementations may use other values.

Using the operating system, many processors are capable of multitasking among multiple processes (here called "tasks") using techniques such as cooperative multitasking, time-sliced multitasking, and so on. Because a processor can execute only one task at a time, it must divide its processing time among the tasks by switching between them. When the processor switches from one task to another, a task switch (also called a "context switch" or "process switch") is said to occur. To perform a task switch, the processor must stop executing one task and either resume or start executing another. There are a number of registers (including the floating-point registers) whose contents at any given point in time during the execution of a task are called the "register state" of that task. During multitasking, the register state of a task is preserved while other processes execute by saving it in a data structure (called the "context structure" of the task) held in memory external to the processor. When the task is to be resumed, its register state is restored (e.g., loaded back into the processor) using the context structure of the task.

Saving and restoring the register state of a task can be accomplished in a number of different ways. For example, one operating system saves the entire register state of the previous task and restores the entire register state of the next task on every task switch. However, since saving and restoring the entire register state takes time, it is desirable to avoid saving and/or restoring any unnecessary parts during task switches. If a task does not use the floating-point module, there is no need to save and restore the contents of the floating-point registers as part of the register state of that task. To this end, the TS indication has historically been used by operating systems to avoid saving and restoring the contents of the floating-point registers during task switches (commonly referred to as "partial context switching" or "on-demand context switching").

The use of the TS indication to implement partial context switching is well known. For the purposes of the invention, however, it is relevant that an attempt to execute a floating-point command while the TS indication indicates that a partial context switch was performed (i.e., that the floating-point module is "unavailable" or "disabled") results in a "device not available" exception. In response to this exception, an event handler executing on the processor determines whether the current task is the "owner" of the floating-point module (that is, whether the data stored in the floating-point module belongs to the current task or to a previously executed task). If the current task is not the owner, the event handler causes the processor to store the contents of the floating-point registers in the context structure of the previous task, restore the floating-point state of the current task (if available), and identify the current task as the owner. However, if the current task is the owner of the floating-point module, the current task was the last task to use the floating-point module (the floating-point part of the current task's register state is already held in the floating-point module), and no action regarding the floating-point module needs to be taken (had TS not been set, no exception would have occurred). Execution of the handler also causes the processor to change the TS indication to indicate that the floating-point module is owned by the current task (also referred to as "available" or "enabled").

After the event handler completes, execution of the current task resumes by restarting the floating-point command that caused the "device not available" exception. Since the TS indication was changed to indicate that the floating-point module is available, execution of subsequent floating-point commands will not cause further "device not available" exceptions. However, during the next partial context switch, the TS indication is changed to indicate that a partial context switch was performed. Thus, if execution of another floating-point command is then attempted, another "device not available" exception will occur and the event handler will be executed again. In this way, the TS indication allows the operating system to delay, and possibly avoid, saving and loading the floating-point register file. Task-switch overhead is thereby reduced, because fewer registers must be saved and loaded.
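The partial context switching described above can be modeled with a small simulation. This is a sketch under stated assumptions, not an operating system implementation: the class names, the saves counter, and the choice to clear the registers for a task with no saved floating-point state are all hypothetical.

```python
class Task:
    def __init__(self, name):
        self.name = name
        self.saved_fp = None  # floating-point slot of the task's context structure


class Processor:
    def __init__(self):
        self.fp_regs = [0.0] * 8
        self.ts = False       # TS indication
        self.current = None
        self.fp_owner = None  # task whose data is held in the FP registers
        self.saves = 0        # number of times FP state was written to memory

    def switch_to(self, task):
        """Partial context switch: FP registers are left alone, TS is set."""
        self.current = task
        self.ts = True

    def execute_fp(self, register, value):
        """Attempt a floating-point command that writes one register."""
        if self.ts:
            self.device_not_available()  # exception, then the command restarts
        self.fp_regs[register] = value

    def device_not_available(self):
        """Event handler for the 'device not available' exception."""
        if self.fp_owner is not self.current:
            if self.fp_owner is not None:
                self.fp_owner.saved_fp = list(self.fp_regs)
                self.saves += 1
            if self.current.saved_fp is not None:
                self.fp_regs = list(self.current.saved_fp)
            else:
                self.fp_regs = [0.0] * 8  # assumed: fresh state for a new user
            self.fp_owner = self.current
        self.ts = False  # FP module is now available to the current task
```

If task A uses floating point, the processor switches to a task B that never does, and then switches back to A, the handler finds that A is still the owner and nothing is saved or restored.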

While one operating system has been described in which the floating-point state is not saved or restored during task switches, alternative implementations may use any number of other techniques. For example, as mentioned above, an operating system can be implemented to always save and restore the entire register state on each task switch.

In addition to the various times at which the floating-point state of a process can be saved (for example, during context switches, in response to a "device not available" event, and so on), there are also various ways of saving the floating-point state. For example, an operating system may be implemented to save the entire floating-point state (referred to here as a "simple task switch"). Alternatively, an operating system may be implemented to save the contents of only those floating-point registers whose corresponding tags indicate a non-empty state (referred to here as a "minimal task switch"), that is, only those floating-point registers that contain useful data. Thus, the overhead of saving the floating-point state can be reduced by reducing the number of registers that must be saved.
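The difference between the two saving strategies can be sketched as follows. The function names and the use of a dictionary as the saved image are assumptions made for illustration; the tag encoding (11 = empty) is taken from the description.

```python
EMPTY = 0b11


def simple_save(registers, tags):
    """Simple task switch: save every register regardless of its tag."""
    return {index: value for index, value in enumerate(registers)}


def minimal_save(registers, tags):
    """Minimal task switch: save only registers whose tag is non-empty."""
    return {index: value
            for index, (value, tag) in enumerate(zip(registers, tags))
            if tag != EMPTY}
```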

Fig. 2 is a flowchart illustrating the execution of a command by the Pentium processor. The flowchart begins at step 200, from which flow passes to step 205.

As shown in step 205, a set of bits is accessed as a command, and flow passes to step 210. This set of bits includes an operation code that identifies the operation(s) to be performed by the command.

At step 210, it is determined whether the operation code is valid. If the operation code is not valid, flow passes to step 215. Otherwise, flow passes to step 220.

As shown in step 215, an invalid operation code exception is generated and the corresponding event handler is executed. This event handler can be implemented to cause the processor to display a message, abort execution of the current task, and proceed to execute other tasks. Of course, alternative implementations may implement this event handler in any number of ways.

At step 220, it is determined whether the command is a floating-point command. If the command is not a floating-point command, flow passes to step 225. Otherwise, flow passes to step 230.

As shown in step 225, the processor executes the command. Because this step is not necessary for describing the invention, it is not further described.

As shown, at step 230 it is determined whether the EM indication is equal to 1 (according to the software convention described above, whether the floating-point module should be emulated) and whether the TS indication is equal to 1 (according to the software convention described above, whether a partial context switch was performed). If the EM indication and/or the TS indication is equal to 1, flow passes to step 235. Otherwise, flow passes to step 240.

At step 235, a "device not available" exception is generated and the corresponding event handler is executed. In response to this event, the event handler can be implemented to poll the EM and TS indications. If the EM indication is equal to 1, the event handler can be implemented to cause the processor to execute the command by emulating the floating-point module and to resume execution at the next command (the command that logically follows the command received at step 205). If the TS indication is equal to 1, the event handler can be implemented to operate as previously described in relation to a partial context switch (to save the contents of the floating-point module and restore the correct floating-point state, if applicable) and to cause the processor to resume execution by restarting the command received at step 205. Of course, alternative implementations may implement this event handler in any number of ways.

If certain numeric errors are generated during execution of a floating-point command, those errors are held pending until an attempt is made to execute the next floating-point command whose execution can be interrupted to service pending floating-point numeric errors. As shown, at step 240 it is determined whether there are any such pending errors. If there are any such pending errors, flow passes to step 245. Otherwise, flow passes to step 250.

At step 245, a pending floating-point error event is generated. In response to this event, the processor determines whether the floating-point error is masked. If so, the processor attempts to handle the event internally using microcode, and the floating-point command is "micro restarted". The term micro restart refers to servicing an event without executing any non-microcode handlers (also called operating system event handlers). Such an event is called an internal event (also known as a software-invisible event), because the event is handled internally by the processor and thus does not require any external operating system handlers. In contrast, if the floating-point error is not masked, the event is an external event (also called a "software-visible event"), and the corresponding event handler is executed. This event handler can be implemented to service the error and cause the processor to resume execution by restarting the command received at step 205. This technique of restarting a command is called a "macro restart" (or "command-level restart"). Of course, alternative implementations may implement this non-microcode event handler in any number of ways.

As shown in step 250, the floating-point command is executed. During this execution, the tags are changed as necessary, any numeric errors that can be serviced immediately are reported, and any other numeric errors are held pending.
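The decision sequence of Fig. 2 (steps 210 through 250) can be summarized in a short sketch. The function name and its boolean parameters are hypothetical; the step numbers and the order of the checks follow the description above.

```python
def dispatch(opcode_valid: bool, is_floating_point: bool,
             em: int, ts: int, pending_fp_error: bool) -> int:
    """Return the number of the step that handles the command in Fig. 2."""
    if not opcode_valid:              # step 210
        return 215                    # invalid operation code exception
    if not is_floating_point:         # step 220
        return 225                    # execute the non-floating-point command
    if em == 1 or ts == 1:            # step 230
        return 235                    # "device not available" exception
    if pending_fp_error:              # step 240
        return 245                    # pending floating-point error event
    return 250                        # execute the floating-point command
```

The order of the checks matters: in this sketch a floating-point command with TS equal to 1 reaches step 235 even if a numeric error is also pending.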

One limitation of the Intel architecture processor family is that these processors do not include a set of commands for manipulating packed data. Thus, it is desirable to add a set of commands for manipulating packed data to such processors in a way that is compatible with existing software and hardware. In addition, it is desirable to produce new processors that support a set of packed data commands and that are compatible with existing software, including operating systems.

The invention provides a method and device for executing floating-point and packed data commands using a single physical register file that is combined (aliased). According to one aspect of the invention, a processor is proposed that includes a decoding device, a mapping device, and a memory device. The decoding device is configured to decode commands and their operands from at least one set of commands comprising at least a first and a second set of commands. The memory device includes a physical register file. The mapping device is configured to map the operands used by the first set of commands onto the physical register file in a manner analogous to a stack reference, and to map the operands used by the second set of commands onto the same physical register file in a manner different from a stack reference.

According to another aspect of the invention, a processor is proposed that generally includes a decoding device, a mapping device, and a retirement device. The mapping device maps floating-point and packed data operands onto the same set of registers contained in the retirement device. While the mapping device maps the floating-point operands in a manner analogous to a stack reference, it maps the packed data operands in a manner different from a stack reference. In addition, the mapping device includes a set of tags, each of which corresponds to a different entry in the mapping table and identifies whether the corresponding entry is in the empty state or the non-empty state.

The invention may best be understood from the following description and the accompanying drawings which illustrate the invention. In the drawings:

Fig. 1 shows a block diagram illustrating an example computer system that uses a Pentium processor;

Fig. 2 is a flowchart illustrating the execution of a command by the Pentium processor;

Fig. 3A is a functional diagram illustrating the combination of the packed data state and the floating-point state, according to one embodiment of the invention;

Fig. 3B and 3C illustrate the mapping of the physical floating-point and packed data registers in relation to the logical floating-point registers;

Fig. 3D illustrates an execution sequence that includes packed data and floating-point commands;

Fig. 4A is a flowchart illustrating part of a method of executing floating-point and packed data commands in a way that is compatible with existing software, invisible to various operating system techniques, and conducive to efficient programming techniques, according to one embodiment of the invention;

Fig. 4B is a flowchart illustrating the remaining part of the method partially shown in Fig. 4A;

Fig. 5 shows a block diagram illustrating an example computer system, according to one embodiment of the invention;

Fig. 6A is a block diagram illustrating a device for combining the packed data register state with the floating-point state, according to one embodiment of the invention;

Fig. 6B is a block diagram illustrating an enlarged view of a portion of the stack-referenced floating-point register file of Fig. 6A, according to embodiments of the invention;

Fig. 7A is a flowchart illustrating part of a method, according to one embodiment of the invention, for executing packed data commands on a set of registers that is combined with a set of floating-point registers in a way that is compatible with existing software, invisible to various operating system techniques, conducive to good programming practice, and implementable using the hardware organization depicted in Fig. 6A;

Fig. 7B is a flowchart illustrating another part of the method partially shown in Fig. 7A;

Fig. 7C is a flowchart illustrating the remaining part of the method partially shown in Fig. 7A and 7B;

Fig. 8 is a flowchart illustrating a method for performing step 734 of Fig. 7C, according to one embodiment of the invention;

Fig. 9 ... according to one embodiment of the invention;

Fig. 10 is a block diagram illustrating the data flow through a device for combining the packed data state with the floating-point state using a single register file, according to another embodiment of the invention;

Fig. 11A illustrates part of a method, according to another embodiment of the invention, for executing packed data and floating-point commands on a single combined register file in a way that is compatible with existing software, invisible to various operating system techniques, conducive to good programming practice, and implementable using the hardware organization depicted in Fig. 10;

Fig. 11B is a flowchart illustrating another part of the method partially shown in Fig. 11A;

Fig. 11C is a flowchart illustrating the remaining part of the method partially shown in Fig. 11A and 11B;

Fig. 12A illustrates the floating-point storage format, according to the embodiment of the invention described with reference to Fig. 10;

Fig. 12B illustrates the storage format ...;

Fig. 13 illustrates a method, according to one embodiment of the invention, for performing step 113 in Fig. 11B when the storage formats described with reference to Fig. 12A, 12B and 12C are implemented;

Fig. 14 is a flowchart illustrating a method for clearing the tags, according to one embodiment of the invention;

Fig. 15A depicts a sequence of operations, including packed data and floating-point commands, to illustrate the time interval during which separate physical register files that are combined can be modified; and

Fig. 15B shows another sequence of operations, including packed data and floating-point commands, to illustrate the time interval during which separate physical register files that are combined can be modified.

In the following description, numerous specific details are set forth to provide a thorough understanding of the invention. However, it is clear that the invention may be practiced without these specific details. In other instances, well-known circuits, structures, and methods are not described in detail so as not to obscure the invention.

The invention describes a method and device for executing different sets of commands that cause the processor to perform operations on data of different types in a way that is invisible to various operating system techniques, conducive to good programming practice, and invisible to existing software. To this end, the different sets of commands, which cause the processor to perform operations on different data types, are executed on what at least logically appears to software as a single combined register file. The data type operations performed as a result of executing the different sets of commands can be of any type. For example, one set of commands may cause the processor to perform scalar operations (floating-point and/or integer), and another set of commands may cause the processor to perform packed operations (floating-point and/or integer). As another example, one set of commands may cause the processor to perform floating-point operations (scalar and/or packed), and another set of commands may cause the processor to perform integer operations (scalar and/or packed). As another example, one combined register file ... Moreover, a method and device are described for executing these different sets of commands using separate physical register files that logically appear to software as a single combined register file. Furthermore, a method and device are described for executing these different sets of commands using one physical register file.

For clarity, the invention will be described with reference to floating-point commands and packed data commands (floating-point and/or integer). However, it should be clear that any number of operations on different types of data can be performed, and the invention is in no way limited to floating-point and packed data operations.

Fig. 3A is a functional diagram illustrating the combination of the packed data state and the floating-point state, according to one embodiment of the invention. Fig. 3A depicts a set of floating-point registers 300 for storing floating-point data (referred to here as the floating-point state) and a set of packed data registers 310 for storing packed data (referred to here as the packed data state). Fig. 3A also shows that the packed data state is combined with the floating-point state. Thus, the floating-point commands and the packed data commands at least appear to software to be executed on one and the same set of logical registers. There are a number of ways to implement this combination, including the use of multiple separate physical register files or of a single physical register file. Examples of such techniques are described later with reference to Fig. 4-13.

As described above, existing operating systems are implemented to cause the processor to save the floating-point state as a result of multitasking. Since the packed data state is combined with the floating-point state, these same operating systems will cause the processor to save any packed data state that is combined with the floating-point state. As a result, the invention does not require modification of the task-switching program(s) of an old operating system (of course, the task-switching program may be implemented as one or more ...). Therefore, there is no need to develop a new or modified operating system in order to save the packed data state during multitasking. Thus, the cost of developing an operating system can be avoided. Moreover, in one embodiment, any events generated by the execution of packed data commands are serviced internally by the processor or mapped onto existing events whose corresponding operating system event handlers can service them. As a result, the packed data commands are executed in a way that is invisible to the operating system.

Fig. 3A also shows a set of floating-point tags 320 and a set of packed data tags 330. The floating-point tags 320 function like the tags 150 described with reference to Fig. 1. Thus, each tag includes two bits that indicate whether the contents of the corresponding floating-point register are empty or non-empty (for example, valid, special, or zero). The packed data tags 330 correspond to the packed data registers 310 and are combined with the floating-point tags 320. While each tag has been described as containing two bits, an alternative implementation may store only one bit for each tag. Each of these one-bit tags identifies either the empty or the non-empty state. In such implementations, the one-bit tags can be implemented so as to appear to software as containing two bits by determining the corresponding two-bit tag value when the tag value is needed. Operating systems that implement minimal task switching save the contents of only those registers whose corresponding tags indicate a non-empty state. Because the tags are combined, such an operating system will save any packed data state as well as the floating-point state. In contrast, operating systems that implement simple task switching save the entire contents of the logical combined register file regardless of the state of the tags.

In one embodiment, the floating-point registers 300 are used like the floating-point registers 145 described with reference to Fig. 1. Thus, Fig. 3A additionally shows a floating-point status register 340 that contains a top-of-stack field 350. The top-of-stack field 350 is used to store a top-of-stack (TOS) pointer that identifies one of the floating-point registers 300 as the top of the stack. When the registers 300 are used as a stack, operations are performed with reference to the top-of-stack register as opposed to the physical locations of the registers. In contrast, the packed data registers 310 are used as a fixed register file (also referred to as a direct-access register file). Thus, the packed data commands specify the physical locations of the registers to be used. The packed data registers 310 are mapped onto the physical locations of the floating-point registers 300, and this mapping does not change when the top of the stack changes. As a result, it at least appears to software that there is a single logical register file that can be used either as a stack-referenced register file or as a fixed register file.

Fig. 3B and 3C illustrate the mapping of the combined floating-point registers 300 and floating-point tags 320 in relation to the packed data registers 310 and packed data tags 330 shown in Fig. 3A. As described above, in the floating-point environment each register n is specified relative to the floating-point register identified by the TOS pointer. Two cases are shown in Fig. 3B and 3C. Each figure represents the relationship between the logical, or programmer-visible, floating-point registers (the stack) and the physical registers. The inner circle in Fig. 3B and 3C represents the physical floating-point/packed data registers and the corresponding tags, and the outer circle represents the logical floating-point registers as referenced by the top-of-stack pointer 370. As shown in Fig. 3B, the top-of-stack pointer 370 points to physical floating-point/packed data register 0. Thus, the logical floating-point registers coincide with the physical floating-point/packed data registers. As shown in the figure, when a floating-point command causes a push onto or a pop from the stack, the top-of-stack pointer 370 changes accordingly. A push is shown as a counterclockwise rotation of the top-of-stack pointer in the drawing, and a floating-point pop as a clockwise rotation.

In the example shown in Fig. 3C, the logical floating-point register ST0 and physical register 0 do not coincide. In the case shown in Fig. 3C, the top-of-stack pointer 370 points to physical floating point/packed data register 2, and the logical floating-point registers are accessed relative to the TOS 370. While an embodiment has been described in which the floating-point registers are used as a stack and the packed data registers are used as a fixed register file, alternative embodiments can implement these sets of registers in any manner. Moreover, while an embodiment has been described in relation to floating-point and packed data operations, it is understood that this technique could be used to alias any fixed register file on any stack-referenced register file, regardless of the type of operations performed on it.
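The two views of the register file described with reference to Figs. 3A-3C can be sketched in a few lines of Python. This is only an illustrative model, not the circuitry of the invention: the class name `AliasedRegisterFile`, the choice of eight registers, and the convention that a push moves the top-of-stack pointer down by one (as on x87-style stacks) are assumptions made for the example.

```python
class AliasedRegisterFile:
    """Toy model: one physical register array seen through two views."""

    NUM_REGS = 8  # assumption: eight physical registers

    def __init__(self):
        self.phys = [0] * self.NUM_REGS  # shared physical storage
        self.tos = 0                     # top-of-stack pointer

    def _st(self, i):
        # Stack-relative view: ST(i) is resolved against the TOS pointer.
        return (self.tos + i) % self.NUM_REGS

    def fp_read(self, i):
        return self.phys[self._st(i)]

    def fp_push(self, value):
        # Assumption: a push decrements TOS (modulo 8), then writes ST(0).
        self.tos = (self.tos - 1) % self.NUM_REGS
        self.phys[self.tos] = value

    def pd_read(self, n):
        # Fixed view: packed data register n is always physical register n.
        return self.phys[n]

    def pd_write(self, n, value):
        self.phys[n] = value


rf = AliasedRegisterFile()
rf.pd_write(2, 0xAB)   # packed data view writes physical register 2
rf.tos = 2             # the Fig. 3C case: TOS points to physical register 2
st0 = rf.fp_read(0)    # ST(0) now names physical register 2
rf.fp_push(0xCD)       # TOS moves; the packed data mapping does not
```

In this model the packed data view ignores `tos` entirely, which is the property the description relies on: the mapping of packed data registers onto physical registers stays fixed while the stack view rotates.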

The packed data state can be aliased (overlaid) on any portion or all of the floating-point state. In one embodiment, the packed data state is aliased on the mantissa field of the floating-point state. In addition, the aliasing can be full or partial. Full aliasing refers to an embodiment in which the entire contents of the registers are aliased. Partial aliasing is described below with reference to Fig. 6A.

Fig. 3D is a diagram illustrating the execution of floating-point and packed data instructions over time. Fig. 3D shows, in chronological order, a first set of floating-point instructions 380, a set of packed data instructions 382, and a second set of floating-point instructions 384. Execution of the set of packed data instructions 382 begins at time T1 and ends at time T2, while execution of the second set of floating-point instructions 384 begins at time T3. Other instructions may or may not be executed between the set of packed data instructions 382 and the second set of floating-point instructions 384. A first interval 386 spans the time from T1 to T3, while a second interval 388 spans the time between T2 and T3.

Since the floating-point and packed data states are stored in an aliased register file, the tags should be changed (emptied) before the second set of floating-point instructions 384 is executed. Otherwise, a stack overflow exception could be generated. Thus, at some point during the first interval 386 the tags are changed to the empty state.

This can be accomplished in several different ways. For example, an embodiment can do this by: 1) causing the execution of the first packed data instruction to change the tags to the empty state; 2) causing the execution of each packed data instruction in the set of packed data instructions 382 to change the tags to the empty state; 3) changing the tags to the empty state upon the attempted execution of the first floating-point instruction that modifies the aliased register file; and so on. These embodiments remain operating-system invisible for existing operating systems that support simple context switching (saving and restoring the entire register state on each task switch), because the packed data state is saved and restored along with the rest of the register state.

In another embodiment, in order to remain compatible with operating systems that support simple and/or minimal context switches, executing the set of packed data instructions 382 causes the tags to be changed to the non-empty state in the first interval 386, unless a set of transition instructions, represented by block 390, is executed after time T2 and before time T3 (the time at which the second set of floating-point instructions 384 begins). For example, suppose that the set of packed data instructions 382 belongs to a task A, and that task A is interrupted by a full task switch (i.e., not a partial task switch) before the set of transition instructions 390 is executed. Because this is a full task switch, the task switch handler will include floating-point instructions (illustrated by the second set of floating-point instructions 384 and referred to in this example as the "FP task switch routine") to save the floating point/packed data state. Because the set of transition instructions 390 has not been executed, the processor will change the tags to the non-empty state at some point prior to the execution of the FP task switch routine. As a result, the FP task switch routine, whether minimal or simple, will save the contents of the entire aliased register file (in this example, the packed data state of task A).

In contrast, if the set of transition instructions 390 is executed, the processor changes the tags to the empty state at some point during the second interval 388. Thus, whether or not a task switch interrupts task A after execution of the set of transition instructions 390, the processor will change the tags to the empty state at some point prior to the execution of the second set of floating-point instructions 384 (whether those instructions belong to task A or to another program).

As another example, again assume that the set of packed data instructions 382 belongs to task A, and that task A is interrupted by a task switch before the set of transition instructions 390 is executed. This time, however, the task switch is a partial task switch (i.e., the floating point/packed data state is not saved or restored). If no other task that uses floating-point or packed data instructions is executed, the processor will eventually return to executing task A, and the set of transition instructions 390 will be executed. However, if another task (e.g., task B) uses floating-point or packed data instructions, the attempted execution of these instructions causes a call to an operating system handler to save the floating point/packed data state of task A and restore the floating point/packed data state of task B. This handler will include the FP task switch routine (in this example, illustrated by the second set of floating-point instructions 384) to save the floating point/packed data state. Since the set of transition instructions 390 has not been executed, the processor will change the tags to the non-empty state at some point prior to the execution of the FP task switch routine. As a result, the FP task switch routine, whether minimal or simple, will save the contents of the entire aliased register file (i.e., the packed data state of task A). Thus, this embodiment remains operating-system invisible regardless of the technique used to save the state of the aliased registers.

The set of transition instructions can be implemented in any number of ways. In one embodiment, this set may include a new instruction, referred to herein as EMMS (empty multimedia state). This instruction clears the floating point/packed data tags to indicate to any subsequently executed code that all of the floating-point registers 300 are available for any subsequent floating-point instructions that may be executed. This avoids the stack overflow condition that might otherwise occur if the EMMS instruction is not executed after packed data instructions but before floating-point instructions are executed.
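The stack overflow condition that EMMS is meant to avoid can be illustrated with a small model. The sketch below is an assumption-laden toy, not the processor's logic: the names `TaggedStack`, `StackOverflow`, and `run` are hypothetical, tags are modeled as strings, and a push is assumed to fault when the register it would occupy is tagged non-empty.

```python
EMPTY, NON_EMPTY = "empty", "non-empty"


class StackOverflow(Exception):
    """Raised when a push targets a register that is tagged non-empty."""


class TaggedStack:
    def __init__(self):
        self.tags = [EMPTY] * 8  # one tag per aliased register
        self.tos = 0

    def packed_instruction(self):
        # One described embodiment: a packed data instruction other than
        # EMMS leaves every tag in the non-empty state.
        self.tags = [NON_EMPTY] * 8

    def emms(self):
        # Transition instruction: mark all registers as available.
        self.tags = [EMPTY] * 8

    def fp_push(self):
        new_tos = (self.tos - 1) % 8  # assumption: push decrements TOS
        if self.tags[new_tos] != EMPTY:
            raise StackOverflow
        self.tos = new_tos
        self.tags[new_tos] = NON_EMPTY


def run(with_emms):
    """Packed data code followed by a floating-point push."""
    stack = TaggedStack()
    stack.packed_instruction()
    if with_emms:
        stack.emms()
    try:
        stack.fp_push()
        return "ok"
    except StackOverflow:
        return "stack overflow"
```

Calling `run(True)` models a packed data block that ends with EMMS, and `run(False)` models one that omits it.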

In known floating-point programming practice on Intel architecture processors, blocks of floating-point code are usually terminated by an operation or operations that clear the floating-point state. Regardless of whether partial and/or minimal context switching is used, the floating-point state is left in a cleared condition upon completion of a first block of floating-point code. Therefore, the EMMS instruction is intended for use in packed data sequences in order to clear the packed data state. The EMMS instruction should be executed after a block of packed data code. Thus, a processor that implements the described methods and apparatus retains full compatibility with known floating-point processors of the Intel architecture, but also has the ability to execute packed data instructions which, if programmed with good programming techniques and appropriate housekeeping (clearing the state before transitions between packed data code and floating-point code), allow transitions between packed data code and floating-point code without adversely affecting either the floating-point state or the packed data state.

In another embodiment, the set of transition instructions can be implemented using existing floating-point instructions. In one embodiment, switching between executing packed data instructions and floating-point instructions takes time. Thus, a good programming technique is to minimize the number of these transitions. The number of transitions between floating-point instructions and packed data instructions can be reduced by grouping floating-point instructions apart from packed data instructions. Since it is desirable to encourage such good programming techniques, it is desirable to implement a processor that makes it difficult to ignore them. Thus, one embodiment also changes the top-of-stack pointer to an initialization state (e.g., zero, to indicate register R0) during the first interval 386. This can be accomplished in any number of ways, including: 1) causing the execution of the first packed data instruction to change the top-of-stack pointer; 2) causing the execution of each packed data instruction in the set of packed data instructions 382 to change the top-of-stack pointer; 3) causing the execution of the EMMS instruction to set the top-of-stack pointer; 4) changing the top-of-stack pointer upon an attempt to execute a floating-point instruction at time T3 in Fig. 3D; and so on. Again, this is done from the perspective of supporting good programming techniques. From the same perspective, in one embodiment a value indicating not a number is also stored, during the first interval 386, in the sign and exponent fields of any aliased register to which packed data is written.

Figs. 4A and 4B depict a flow diagram of a method for executing floating-point and packed data instructions in a manner that is invisible to various operating system techniques and that encourages efficient programming techniques, according to one embodiment of the invention. The flow diagram begins at step 400, from which flow passes to step 402.

As shown in step 402, a set of bits is accessed as an instruction, and flow passes to step 404. This set of bits includes an operation code (opcode) that identifies the operation(s) to be performed by the instruction.

At step 404, it is determined whether the opcode is valid. If the opcode is not valid, flow passes to step 406. Otherwise, flow passes to step 408. If a program containing packed data instructions is executed on a processor that does not support packed data instructions, the opcodes for the packed data instructions will not be valid and flow passes to step 406. Conversely, on a processor that supports packed data instructions, the opcodes for these instructions will be valid and flow passes to step 408.

As shown in step 406, an invalid opcode exception is generated and the corresponding event handler is executed. As previously described with reference to step 215 in Fig. 2, this event handler can be implemented to cause the processor to display a message, abort execution of the current task, and continue executing other tasks. Of course, this event handler can be implemented in any number of ways. For example, this event handler can be implemented to identify whether the processor is incapable of executing packed data instructions. The same event handler could also be implemented to set an indication identifying that the processor cannot execute packed data instructions. Other application programs running on the processor can use this indication to determine whether to execute using a set of scalar routines or an equivalent set of packed data routines. However, this embodiment would require either modifying an existing operating system or developing a new one.

At step 408, it is determined what type of instruction has been received. If the instruction is neither a floating-point instruction nor a packed data instruction, flow passes to step 410. However, if the instruction is a floating-point instruction, flow passes to step 412. In contrast, if the instruction is a packed data instruction, flow passes to step 414.

As shown in step 410, the processor executes the instruction. Since this step is not necessary for understanding the invention, it is not described further.

As shown in step 412, it is determined whether the EM indication is equal to 1 (according to the software convention, if the floating-point unit should be emulated) and whether the TS indication is equal to 1 (according to the software convention, if a partial context switch was performed). If the EM indication and/or the TS indication is equal to 1, flow passes to step 416. Otherwise, flow passes to step 420. While one embodiment is described in which a device not available exception is generated when the EM indication and/or the TS indication is equal to 1, alternative embodiments can be implemented to use any number of other values.

At step 416, a device not available exception is generated and the corresponding event handler is executed. As previously described with reference to step 235 in Fig. 2, the corresponding event handler can be implemented to poll the EM and TS indications. If the EM indication is equal to 1, the event handler emulates the floating-point unit to execute the instruction and causes the processor to resume execution at the next instruction (the instruction that logically follows the instruction received at step 402). If the TS indication is equal to 1, the event handler causes the processor to operate as described above with reference to partial context switches (saving the contents of the floating-point unit and restoring the correct floating-point state if necessary) and causes the processor to resume execution by restarting the instruction received at step 402. Of course, alternative embodiments can implement this event handler in any number of ways. For example, the EM indication can be used to implement multitasking.

Since the packed data state is aliased on the floating-point state, and since the EM and TS indications cause the floating-point state to change, the processor must also respond to the EM and TS indications when executing packed data instructions.

At step 414, it is determined whether the EM indication is equal to 1. As described above, the event handler executed to service the device not available exception can be implemented to poll the EM indication and attempt to emulate the floating-point unit if the EM indication is equal to 1. Since existing event handlers are not written to emulate packed data instructions, the attempted execution of a packed data instruction while the EM indication is equal to 1 cannot be serviced by this event handler. In addition, to remain operating-system invisible, the processor cannot require changes to this event handler. As a result, if it is determined at step 414 that the EM indication is equal to 1, flow passes to step 406 rather than to step 416. Otherwise, flow passes to step 418.

As described above, at step 406 an invalid opcode exception is generated and the corresponding event handler is executed. By diverting the attempted execution of a packed data instruction while EM = 1 to the invalid opcode exception, this embodiment is operating-system invisible.

While one embodiment has been described for handling the EM indication in a manner that is operating-system invisible, alternative embodiments may use other techniques. For example, an alternative embodiment may generate the device not available exception, a different existing event, or a new event in response to the attempted execution of a packed data instruction while the EM indication is equal to 1. In addition, if a slight modification of the operating system is acceptable, the selected event handler can be altered to take any action deemed appropriate in response to this situation. For example, the event handler could be written to emulate the packed data instructions. Another alternative embodiment could simply ignore the EM indication when executing packed data instructions.

As shown in step 418, it is determined whether the TS indication is equal to 1 (according to the existing software convention, if a partial context switch was performed). If the TS indication is equal to 1, flow passes to step 416. Otherwise, flow passes to step 422.
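The decisions made at steps 408, 412, 414 and 418 can be summarized as a small routing function. The following is a minimal sketch of that part of the flow diagram only; the function name `next_step` and the string labels for the instruction type are hypothetical, and the returned integers are the step numbers used in the description above.

```python
def next_step(kind, em, ts):
    """Return the step reached after the checks at steps 408-418.

    kind: "fp", "packed", or anything else for other instructions
    em, ts: the EM and TS indications (True means equal to 1)
    """
    if kind == "fp":
        # Step 412: either indication diverts to device not available.
        return 416 if (em or ts) else 420
    if kind == "packed":
        # Step 414: EM = 1 is diverted to the invalid opcode exception,
        # because existing handlers cannot emulate packed data.
        if em:
            return 406
        # Step 418: TS = 1 is diverted to device not available.
        if ts:
            return 416
        return 422
    # Step 410: neither floating-point nor packed data.
    return 410
```

Note the asymmetry the description calls out: with EM set, a floating-point instruction reaches step 416 while a packed data instruction reaches step 406, even if TS is also set.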

As described above, at step 416 a device not available exception is generated and the corresponding event handler is executed. This event handler can be implemented to poll the EM and TS indications. Since step 414 diverts the situation in which the EM indication is equal to 1 to the invalid opcode exception, the EM indication must be equal to 0 and the TS indication must be equal to 1. Since the TS indication is equal to 1, the event handler operates as described above with reference to partial context switches (saving the contents of the floating-point unit and restoring the correct floating-point state if required) and causes the processor to resume execution by restarting the instruction received at step 402. Since the packed data state is aliased (overlaid) on the floating-point state, this event handler works for both the floating-point state and the packed data state. As a result, this method is operating-system invisible. Of course, alternative embodiments can implement this event handler in any number of ways. For example, an alternative embodiment in which the packed data state is not aliased on the floating-point state can use a new event handler that saves both the floating-point state and the packed data state.

While one embodiment has been described for handling the TS indication in a manner that is operating-system invisible, alternative embodiments may use other techniques. For example, an alternative embodiment may not implement the TS indication. Such an embodiment would not be compatible with operating systems that use the TS indication to implement partial context switching. However, it would be compatible with existing operating systems that do not support partial context switching using the TS indication. As another example, the attempted execution of a packed data instruction while the TS indication is equal to 1 could be diverted to a new event handler or to an existing event handler that has been modified. This event handler can be implemented to take any action deemed appropriate in response to this situation. For example, in an embodiment in which the packed data state is not aliased on the floating-point state, this event handler can save the packed data state and/or the floating-point state.

When certain numeric errors are generated during the execution of a floating-point instruction, these errors are held pending until the attempted execution of the next floating-point instruction whose execution can be interrupted to service them. As shown in steps 420 and 422, it is determined whether there are any such pending errors that can now be serviced. Thus, these steps are similar to step 240 in Fig. 2. If there are any such pending errors, flow passes from both step 420 and step 422 to step 424. However, if it is determined at step 420 that there are no such pending errors, flow passes to step 426. Likewise, if it is determined at step 422 that there are no such pending errors, flow passes to step 430. In an alternative embodiment, such errors are left pending during the execution of packed data instructions.

At step 424, a pending floating-point error exception is generated. As described above with reference to step 245 in Fig. 2, in response to this event the processor determines whether the floating-point error is masked. If so, the processor attempts to handle the event internally, within the floating-point instruction, using microcode. If the floating-point error is not masked, the event is an external event and the corresponding event handler is executed. This event handler can be implemented to service the error and to cause the processor to resume execution by restarting the instruction received at step 402. Of course, alternative embodiments can implement this event handler in any number of ways.

As shown in step 426, the floating-point instruction is executed. To remain operating-system invisible, one embodiment also changes the tags as necessary, reports any numeric errors that can be serviced now, and holds any other numeric errors pending. Since there are many operating system techniques for saving the contents of the floating-point unit, it is desirable to execute packed data and floating-point instructions in a manner that is invisible to all such techniques. By maintaining the tags, this embodiment remains operating-system invisible to any operating system technique that saves the contents of only those floating-point registers whose corresponding tag indicates the non-empty state. However, alternative embodiments can be implemented to be compatible with fewer of these operating system techniques. For example, if an existing operating system does not use the tags, a processor that does not implement the tags would still be compatible with that operating system. In addition, it is not necessary to the invention that numeric floating-point exceptions be held pending; alternative embodiments that do not do so remain within the scope of the invention.

As shown in step 430, it is determined whether the packed data instruction is the EMMS instruction (also referred to as the transition instruction). If the packed data instruction is the EMMS instruction, flow passes to step 432. Otherwise, flow passes to step 434. The EMMS instruction is used to change the floating-point tags to an initialization state. Thus, if the packed data state is aliased on the floating-point state, this instruction should be executed when transitioning from executing packed data instructions to executing floating-point instructions. In this manner, the floating-point unit is initialized for the execution of floating-point instructions. Alternative embodiments that do not alias the packed data state on the floating-point state may not need steps 430 and 432. In addition, steps 430 and 432 are not required in embodiments that do not implement the EMMS instruction.

As shown in step 432, all tags are changed to the empty state and the top-of-stack pointer is changed to an initialization value. By changing the tags to the empty state, the floating-point unit is initialized and prepared for the execution of floating-point instructions. Changing the top-of-stack pointer to the initialization value (which in one embodiment is zero, to identify register R0) encourages separate grouping of floating-point and packed data instructions and thus encourages good programming practices. Alternative embodiments need not initialize the top-of-stack pointer. Upon completion of step 432, the system is free to execute the next instruction (the instruction that logically follows the instruction received at step 402).

As shown in step 434, the packed data instruction is executed (without generating any numeric exceptions) and the top-of-stack pointer is changed to the initialization value. To avoid generating any numeric exceptions, one embodiment implements the packed data instructions such that data values are saturated and/or clamped at a maximum or minimum value. By not generating any numeric exceptions, this embodiment of the invention is operating-system invisible. Alternatively, an embodiment may be implemented to execute microcode event handlers in response to such numeric exceptions. Alternative embodiments that are not completely operating-system invisible can be implemented such that additional event handlers are incorporated into the operating system or existing event handlers are altered to service the errors. The top of the stack is changed for the same reasons explained above. Alternative embodiments can be implemented to change the top of the stack any number of different times. For example, an alternative embodiment can be implemented to change the top-of-stack pointer upon the execution of all packed data instructions except EMMS. Other alternative embodiments can be implemented to change the top-of-stack pointer upon the execution of no packed data instruction other than EMMS. If any memory events are generated as a result of attempting to execute the packed data instruction, execution is interrupted, the top-of-stack pointer is not changed, and the event is serviced. From step 434, flow passes to step 436.
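Clamping a result at a maximum value instead of raising a numeric exception can be illustrated with a packed addition. The sketch below is only an example of the general idea, under assumptions that are not taken from the description: a 64-bit value is treated as four unsigned 16-bit elements, and the helper names `split16`, `join16` and `packed_add_saturate_u16` are hypothetical.

```python
def split16(value64):
    """Split a 64-bit value into four 16-bit elements, lowest first."""
    return [(value64 >> (16 * i)) & 0xFFFF for i in range(4)]


def join16(elements):
    """Pack four 16-bit elements (lowest first) into one 64-bit value."""
    out = 0
    for i, element in enumerate(elements):
        out |= (element & 0xFFFF) << (16 * i)
    return out


def packed_add_saturate_u16(a, b):
    """Element-wise add; each sum is clamped at 0xFFFF, never trapping."""
    sums = [min(x + y, 0xFFFF) for x, y in zip(split16(a), split16(b))]
    return join16(sums)


a = join16([0xFFF0, 1, 2, 3])
b = join16([0x0020, 1, 1, 1])
result = split16(packed_add_saturate_u16(a, b))
```

The first element would overflow 16 bits, so it is held at the maximum value while the other three elements are added normally; no error needs to be reported to an event handler.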

As shown in step 436, it is determined whether the packed data instruction causes the processor to write to an aliased register. If so, flow passes to step 438. Otherwise, flow passes to step 440.

At step 438, 1's are stored in the sign and exponent fields of each aliased register to which the packed data instruction causes the processor to write. From step 438, flow passes to step 440. This step supports good programming techniques in that it encourages separate grouping of floating-point and packed data instructions. Of course, alternative embodiments that are not concerned with this issue can omit this step. While in one embodiment 1's are written to the sign and exponent fields, alternative embodiments may use any value that represents NaN (not a number) or infinity.

As shown in step 440, all tags are changed to the non-empty state. Changing all tags to the non-empty state supports good programming techniques in that it encourages separate grouping of floating-point and packed data instructions. In addition, some operating systems save the contents of only those floating-point registers whose corresponding tags indicate the non-empty state (minimal context switching). Thus, in an embodiment in which the packed data state is aliased on the floating-point state, changing all tags to the non-empty state causes such operating systems to save the packed data state as if it were floating-point state. Alternative embodiments can change only those tags whose corresponding registers contain valid packed data items. In addition, alternative embodiments can be implemented to be compatible with fewer of these operating system techniques. For example, if an existing operating system does not use the tags (for example, an operating system that saves and restores the entire register state), an embodiment that does not implement the tags would still be compatible with that operating system. Upon completion of step 440, the system is free to execute the next instruction (the instruction that logically follows the instruction received at step 402).
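Why step 440 matters for minimal context switching can be shown with a toy save routine. This is a hedged sketch rather than any real operating system's code: `minimal_context_save` is a hypothetical routine that saves only registers whose tag is non-empty, and tags are modeled as strings.

```python
EMPTY, NON_EMPTY = "empty", "non-empty"


def minimal_context_save(registers, tags):
    """Hypothetical OS routine: save only registers tagged non-empty."""
    return {
        index: value
        for index, (value, tag) in enumerate(zip(registers, tags))
        if tag == NON_EMPTY
    }


# Packed data held in the eight aliased registers of the interrupted task.
packed_state = [0x1111 * (i + 1) for i in range(8)]

# With step 440, every tag is non-empty, so every register is saved.
saved_with_step_440 = minimal_context_save(packed_state, [NON_EMPTY] * 8)

# If the tags were left empty, the same routine would save nothing.
saved_without_step_440 = minimal_context_save(packed_state, [EMPTY] * 8)
```

In this model, leaving the tags empty would cause the packed data state to be silently dropped across a task switch, which is the outcome the non-empty tags are meant to prevent.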

Thus, in this embodiment, the effect of the packed data instructions on the content that is stored by a store-environment instruction (FSTENV) is shown with reference to Table 1.

As shown, any packed data instruction except EMMS causes the tags 320 to be set to the non-empty state (00). EMMS causes the floating-point register tags to be set to the empty state (11). In addition, any packed data instruction, including EMMS, also causes the top-of-stack pointer stored in the top-of-stack field 350 to be reset to 0.
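The tag and top-of-stack effects just listed can be written out as a short sketch. The 2-bit encodings (00 for non-empty, 11 for empty) and the reset of the top-of-stack pointer to 0 come from the description; the packing of eight 2-bit tags into a 16-bit word with register 0 in the lowest bits, and the names `tag_word` and `after_packed_instruction`, are assumptions made for illustration.

```python
TAG_NON_EMPTY = 0b00
TAG_EMPTY = 0b11


def tag_word(tags):
    """Pack eight 2-bit tags into a 16-bit word (register 0 lowest)."""
    word = 0
    for index, tag in enumerate(tags):
        word |= (tag & 0b11) << (2 * index)
    return word


def after_packed_instruction(is_emms):
    """Tag word and top-of-stack pointer left by a packed data instruction."""
    tag = TAG_EMPTY if is_emms else TAG_NON_EMPTY
    return {"tag_word": tag_word([tag] * 8), "tos": 0}
```

Under these assumptions, `after_packed_instruction(False)` yields a tag word of 0x0000 and `after_packed_instruction(True)` yields 0xFFFF, with the top-of-stack pointer at 0 in both cases.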

The remaining environment registers, such as the control and status words (except TOS) of the Intel architecture processor, remain unchanged. Any packed data read, or EMMS, leaves the mantissa and exponent portions of the floating-point registers 300 unchanged. However, in one embodiment, any packed data write to a packed data register causes, because of the aliasing mechanism, the mantissa portion of the corresponding floating-point register to be modified according to the operation being performed. In addition, in this embodiment, writing data to the mantissa portion of the floating-point registers through a modification of the packed data registers 310 causes all bits in the sign and exponent portions of the floating-point registers 300 to be set to "1". Since packed data instructions do not use the sign and exponent portions of the floating-point registers, this has no effect on packed data instructions. As described above, alternative embodiments can alias the packed data state on any portion of the floating-point state. In addition, alternative embodiments may choose to write any other value into the sign and/or exponent portions of the registers, or to leave them unchanged (see Table 2).

To further flag the execution of packed data instructions, the sign and exponent portions of the floating-point registers that are written are set to all 1's. This is because floating-point instructions use the exponent portion of the floating-point registers, and it is desirable for this portion of the register to be left in a determinate state after the execution of packed data instructions. In the Intel architecture microprocessor, an exponent portion of a floating-point register that is set to all 1's is interpreted as not a number (NaN). Thus, in addition to the packed data tags 330 being set to the non-empty state, the exponent portion of the floating-point registers is set to all 1's, which can be used to indicate that packed data instructions were previously executed. This further discourages mixing packed data and floating-point instructions in ways that could modify the data and yield incorrect results. Thus, floating-point code has an additional way to distinguish when the floating-point registers contain floating-point data and when they contain packed data.
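The marking of a register written by a packed data instruction can be sketched with plain integer arithmetic. The layout used here is an assumption for illustration: an 80-bit register with the mantissa in bits 0-63, the exponent in bits 64-78 and the sign in bit 79. The function names are hypothetical.

```python
MANTISSA_MASK = (1 << 64) - 1   # assumed: bits 0-63 hold the mantissa
SIGN_EXP_ONES = 0xFFFF << 64    # assumed: bits 64-79 hold exponent and sign


def packed_write(value64):
    """Register image after a packed data write: data plus all-1's marker."""
    return SIGN_EXP_ONES | (value64 & MANTISSA_MASK)


def exponent_field(reg80):
    """Extract the 15-bit exponent field."""
    return (reg80 >> 64) & 0x7FFF


def holds_packed_marker(reg80):
    """True if the sign and exponent fields are all 1's."""
    return (reg80 >> 64) == 0xFFFF


reg = packed_write(0x0123456789ABCDEF)

# An ordinary floating-point image for comparison: exponent 0x3FFF with the
# top mantissa bit set (the value 1.0 under the assumed layout).
one = (0x3FFF << 64) | (1 << 63)
```

Floating-point code inspecting `reg` would see an all-1's exponent, which it does not treat as an ordinary number, while the 64 mantissa bits still carry the packed data unchanged.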

Thus, a method of executing packed data instructions has been described that is compatible with existing operating systems (such as the MS WINDOWS operating environment of Microsoft Corporation, Redmond, Washington) and that encourages good programming techniques. Since the packed data state is aliased on the floating-point state, the packed data state will be saved and restored by existing operating systems as if it were floating-point state. In addition, since the events generated by the execution of packed data instructions can be serviced by existing operating system event handlers, these event handlers need not be modified and new event handlers need not be added. As a result, the processor is backward compatible, and upgrading does not require the cost of developing or modifying an operating system.

Other embodiments that are also compatible with existing operating systems are described with reference to Figs. 7A-C, 8 and 9 and with reference to Figs. 11A-C. Although these embodiments differ, the following is common to all of them (the embodiment shown in Figs. 4A-B; the embodiment shown in Figs. 7A-C, 8 and 9; and the embodiment shown in Figs. 11A-C): 1) the floating-point and packed data states at least appear to software to be stored in a single logical register file; 2) execution of a packed data instruction while the EM bit indicates "floating-point instructions should be emulated" results in an invalid opcode exception rather than a device not available exception; 3) execution of a packed data instruction while the TS bit indicates "a partial context switch was performed" results in a device not available exception; 4) pending floating-point events are serviced upon the attempted execution of any packed data instruction; 5) execution of any packed data instruction causes the top-of-stack pointer to be changed to 0 at some point before the next floating-point instruction is executed; 6) if it is not followed by the execution of any other packed data instruction, execution of the EMMS instruction causes all tags to be changed to the empty state at some point before the next floating-point instruction is executed; 7) if the execution of any packed data instruction is not followed by the execution of the EMMS instruction, the tags are changed to the non-empty state at some point before the next floating-point instruction is executed; 8) a value representing NaN (not a number) or infinity is stored in the sign and exponent fields of any FP/PD register written by the processor in response to a packed data instruction; and 9) no new non-microcode event handlers are required.

Variations of the embodiment shown in Figs. 4A-B, some of which have been described, may be fully or partially compatible with such operating systems and/or may encourage good programming techniques. For example, an alternative embodiment of the invention can move some steps to other locations in the flow shown in Figs. 4A-B. Other embodiments of the invention can modify or remove one or more steps. The invention can be used with any number of system architectures and is not limited to the architecture described herein.

When using the above methods of executing floating-point and packed data instructions, it is recommended that programmers who use embodiments of the present invention partition their code into sections that contain separate blocks of floating-point and packed data instructions, as shown in Fig. 3D. This allows the state to be saved and the packed data state to be cleared before a transition from a sequence of floating-point operations to a sequence of packed data operations, and vice versa. This also provides compatibility with known task switching mechanisms, including those that save the context during task switches.

Because packed data commands affect the floating-point registers 300 (Fig. 3A), and any single packed data command sets all floating-point tags to the non-empty state, partitioning code into blocks of a single type is recommended for correct bookkeeping. An example of executing mixed blocks of floating-point and packed data commands is illustrated in Fig. 3D. This may involve operation within [...] command code in a single application program. In any case, correct bookkeeping of the floating-point registers 300, the corresponding tags and the top-of-stack pointer is ensured by partitioning functionality into separate blocks of floating-point code and packed data code.

For example, as illustrated in Fig. 3D, the execution sequence may include a first set 380 of floating-point commands. After the block 380 of floating-point commands completes, the floating-point state may be saved, if required, by the application program. This can be accomplished using any number of known prior techniques, including saving the floating-point stack or using the FSAVE/FNSAVE command in the Intel architecture processor. This can also be done during minimal context switches, which save the floating-point environment and check the individual tags for an indication that the corresponding floating-point register contains valid data. For each tag indicating that the corresponding floating-point register contains valid data, the corresponding floating-point register is saved. Moreover, in these [...]

After the first set 380 of floating-point commands, a second set 382 of packed data commands is executed in the sequence of operations. Recall that execution of each packed data command causes all packed data tags 330 to be set to the non-empty state at some point in the interval 386, unless the set 390 of transition commands has been executed.

If no task switches occur, the set 382 of packed data commands is followed by the set 390 of transition commands. This set 390 of transition commands may be used to save the packed data state. This can be done using any mechanism, including the known floating-point save commands described above, or a dedicated command that saves only the packed data state. The packed data state may be saved by any known method, including partial and minimal context-switching mechanisms. Whether or not the packed data state is saved, the set 390 of transition commands clears the packed data state. In this case, the packed data state affects the packed data tags 330 [...]. Clearing of the packed data state is performed by executing a single EMMS command or a sequence of floating-point operations, as will be described with reference to Fig. 14. As a result, the processor clears the packed data state at some time in the interval 388 and is initialized for executing floating-point commands.

The set 390 of transition commands is followed by a second set 384 of floating-point commands. Because the tags were emptied and the top-of-stack pointer was changed to point to the first physical register 0 during the second interval 388, all floating-point registers are available for use. This prevents the floating-point stack overflow exception that might otherwise occur when a floating-point command is executed. In some software implementations, a stack overflow condition may cause an interrupt handler to save and clear the packed data state.
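
By way of illustration only, the effect of this block structure on the tags and the top-of-stack pointer can be sketched with a small software model. The class `FpState`, its method names and the `EMPTY`/`NON_EMPTY` constants are hypothetical and are not part of the described embodiments. The sketch applies the tag changes immediately, whereas the description only requires that they take effect at some point before the next floating-point command, and it assumes the usual stack convention in which a push decrements the top-of-stack pointer modulo 8.

```python
EMPTY, NON_EMPTY = "empty", "non-empty"  # hypothetical tag encoding
NUM_REGS = 8


class FpState:
    """Toy model of the eight tags and the top-of-stack (TOS) pointer."""

    def __init__(self):
        self.tags = [EMPTY] * NUM_REGS
        self.tos = 0

    def fp_push(self):
        # Assumed convention: a push moves TOS down by one (modulo 8);
        # pushing onto a non-empty register is a stack overflow.
        new_tos = (self.tos - 1) % NUM_REGS
        if self.tags[new_tos] != EMPTY:
            raise OverflowError("floating-point stack overflow")
        self.tos = new_tos
        self.tags[new_tos] = NON_EMPTY

    def packed_data_command(self):
        # Any packed data command: all tags non-empty, TOS changed to 0.
        self.tags = [NON_EMPTY] * NUM_REGS
        self.tos = 0

    def emms(self):
        # EMMS: all tags empty, TOS changed to 0.
        self.tags = [EMPTY] * NUM_REGS
        self.tos = 0
```

In this model, a push that follows a packed data command without an intervening `emms()` raises the overflow error, while the same push after `emms()` succeeds, which is the behaviour the set 390 of transition commands is intended to guarantee.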

Thus, embodiments of the present invention permit mixed blocks of packed data and floating-point commands. However, appropriate bookkeeping must be performed by the application programmer or by cooperative multitasking code [...] packed data and floating-point commands so that the state of the task is not corrupted during transitions. In addition, this method avoids unnecessary exceptions that would otherwise result from using non-recommended programming techniques with embodiments of the present invention.

The EMMS command allows a smooth transition between a packed data command stream and a floating-point command stream. As stated above, it clears the floating-point tags to avoid any floating-point overflow condition that might otherwise occur and, in addition, resets the top-of-stack pointer stored in the top-of-stack field 350. Although a dedicated command may perform these operations, it is anticipated, and within the scope of the present disclosure, that an operation of this type may be performed using a combination of existing floating-point commands. An example of such a method is shown in Fig. 14. In addition, this functionality may be folded into the first floating-point command that follows execution of a packed data command. In this embodiment, the first floating-point command after a packed data command would cause the processor to perform an implicit EMMS operation (setting all tags to the empty state).

Fig. 5 is a block diagram illustrating an example computer system 500 according to one embodiment of the invention. The example computer system 500 includes a processor 505, a memory device 510 and a bus 515. The processor 505 is connected to the memory device 510 by the bus 515. In addition, a number of user input/output devices, such as a keyboard 520 and a display 525, are also connected to the bus 515. A network 530 may also be connected to the bus 515. The processor 505 is a central processing unit of any type of architecture, such as CISC (complex instruction set computing), RISC (reduced instruction set computing), VLIW (very long instruction word), or a hybrid architecture. In addition, the processor 505 may be implemented on one or more chips. The memory device 510 represents one or more mechanisms for storing data. For example, the memory device 510 may include read-only memory (ROM), random-access memory (RAM), magnetic disk storage media, optical storage media, flash memory devices and/or other machine-readable media. The bus 515 [...]. While this embodiment is described in relation to a single-processor computer system, the invention can be implemented in a multiprocessor computer system. Moreover, while this embodiment is described in relation to 32-bit and 64-bit computer systems, the invention is not limited to such computer systems.

Fig. 5 additionally illustrates that the processor 505 includes a bus device 545, a cache memory 550, a command set device 560, a memory management device 565 and an event handling device 570. Of course, the processor 505 contains additional circuitry that is not necessary for understanding the invention.

The bus device 545 is connected to the cache memory 550. The bus device 545 is used to monitor and evaluate signals generated outside the processor 505, and to coordinate output signals in response to input signals and internal requests from other devices and mechanisms in the processor 505.

The cache memory 550 represents one or more areas of memory used by the processor 505 as a command cache and a data cache. For example, in one embodiment [...] for data. The cache memory 550 is connected to the command set device 560 and to the memory management device 565.

The command set device 560 includes hardware and/or firmware for decoding and executing at least one command set. As shown in Fig. 5, the command set device 560 includes a decode/execution device 575. The decode device decodes commands received by the processor 505 into control signals and/or microcode entry points. In response to these control signals and/or microcode entry points, the execution device performs the appropriate operations. The decode device may be implemented using any number of different mechanisms (e.g., lookup tables, a hardware implementation, a PLA, and so on). While the execution of the various commands by the decode and execution devices is represented here by a series of if/then conditions, it is understood that executing a command does not require sequential processing of these if/then conditions. Rather, any mechanism for logically performing this if/then processing is considered to be within the scope of the invention.

The decode/execution device 575 may be implemented so that its packed data commands perform any number of different operations. For example, these packed data commands, when executed, may cause the processor to perform packed floating-point and/or packed integer operations.

In one embodiment, these packed data commands are those described in "A Set of Instructions for Operating on Packed Data," dated August 31, 1995, Ser. No. 08/521,360. In addition to the packed data commands, the command set 580 may include new commands and/or commands similar or identical to those found in existing general-purpose processors. For example, in one embodiment, the processor 505 supports a command set that is compatible with the Intel architecture instruction set used by known processors such as the Pentium processor.

Fig. 5 also shows that the command set device 560 includes a memory device 585. The memory device 585 represents one or more sets of registers on the processor 505 for storing information, including floating-point data, packed data, integer data and control data (for example, the EM indication, the TS indication, the top-of-stack pointer, and so on). In some embodiments, [...].

The memory management device 565 represents hardware and firmware for implementing one or more memory management schemes, such as paging and/or segmentation. While any number of memory management schemes can be used, in one embodiment a memory management scheme compatible with the Intel processor architecture is implemented. The event handling device 570 is connected to the memory management device 565 and to the command set device 560. The event handling device 570 represents hardware and firmware for implementing one or more event handling schemes. While any number of event handling schemes can be used, in one embodiment an event handling scheme compatible with the Intel processor architecture is implemented.

Fig. 5 also illustrates that the memory device 510 has stored in it an operating system 535 and a packed data program 540 for execution by the computer system 500. The packed data program 540 is a command sequence that includes one or more packed data commands. [...] (not shown), which is not necessary for understanding the invention.

While in one embodiment the various indications (e.g., the EM indication, the TS indication, and so on) are implemented using bits in registers on the processor 505, alternative embodiments may use any number of techniques. For example, an alternative embodiment may store these indications off-chip (for example, in the memory device 510) and/or may use multiple bits for each indication. The term "memory" is used here to refer to any mechanism for storing data, including locations in the memory device 510, one or more registers in the processor 505, etc.

Fig. 6A is a block diagram illustrating a device for aliasing the packed data register state onto the floating-point state using two separate physical register files, according to one embodiment of the invention. Because these two physical register files are aliased, they logically appear to software running on the processor as a single logical register file. Fig. 6A shows a transition device 600, [...] the floating-point device 135 in Fig. 1. The floating-point device 605 includes a set of floating-point registers 615, a set of tags 620, a floating-point status register 625 and a floating-point stack reference device 630. In one embodiment, the floating-point device 605 includes eight registers (labeled R0-R7). Each of these eight registers is 80 bits wide and contains a sign field, an exponent field and a mantissa field. The floating-point stack reference device 630 operates the set of floating-point registers 615 as a stack. The floating-point status register 155 includes a top-of-stack field 635 for storing the top-of-stack pointer. As described above, the top-of-stack pointer identifies which register in the set of floating-point registers 615 is currently the top of the floating-point stack. In Fig. 6A, the top-of-stack pointer identifies the register 640 at physical location R4 as ST(0), the top of the stack.
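
The stack-relative addressing just described can be illustrated with a one-line mapping. This is a sketch only: the function name `physical_register` is hypothetical, and the wrap-around modulo 8 is assumed from the usual convention for an eight-register stack rather than stated in the text above.

```python
NUM_REGS = 8  # registers R0-R7


def physical_register(tos: int, i: int) -> int:
    """Return n such that stack location ST(i) refers to physical register Rn."""
    return (tos + i) % NUM_REGS
```

With the top-of-stack pointer equal to 4, as in Fig. 6A, `physical_register(4, 0)` gives 4, so ST(0) is R4; under the assumed wrap-around, ST(5) would be R1.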

In one embodiment, the set of tags 620 includes eight tags stored in a single register. Each tag corresponds to a different floating-point register and contains two bits. Alternatively, each of the tags can be perceived as [...]. As shown in Fig. 6A, tag 645 corresponds to register 640. As described above, these tags are used by the floating-point device 605 to distinguish between empty and non-empty register locations. As described above, an embodiment may use single-bit tags identifying either the empty or the non-empty state, yet make these single-bit tags appear to software as containing two bits by determining the appropriate two-bit tag value when the tag values are needed. Of course, an alternative embodiment may implement two-bit tags. In either case, the tags can be thought of as identifying two states: empty, which is denoted by 11, and non-empty, which is denoted by any one of 00, 01 or 10.
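
As a hedged illustration of how a single-bit tag could be presented to software as a two-bit value, the sketch below derives the two-bit value from the register contents when it is needed. The text above fixes only that 11 means empty and that 00, 01 and 10 all mean non-empty; the assignment of 00 to valid, 01 to zero and 10 to special values (NaN, infinity, denormal and similar) is an assumption borrowed from the conventional floating-point tag word encoding, and the function name `two_bit_tag` is hypothetical.

```python
def two_bit_tag(is_empty: bool, sign_exp: int, mantissa: int) -> int:
    """Derive a two-bit tag from a one-bit empty/non-empty tag.

    sign_exp is the 16-bit sign-and-exponent field and mantissa is the
    64-bit mantissa field of an 80-bit register.
    """
    if is_empty:
        return 0b11                      # empty
    exponent = sign_exp & 0x7FFF         # drop the sign bit
    integer_bit = (mantissa >> 63) & 1   # explicit leading mantissa bit
    if exponent == 0 and mantissa == 0:
        return 0b01                      # assumed: zero
    if exponent == 0 or exponent == 0x7FFF or integer_bit == 0:
        return 0b10                      # assumed: special value
    return 0b00                          # assumed: valid
```

Under these assumptions, a register written by a packed data command, whose sign and exponent fields hold all ones, would be reported as 10, which is still one of the non-empty encodings.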

The packed data device 610 is used for storing packed data and includes a set of packed data registers 650 (also called the packed data register file), a packed data status register 655 and a packed data non-stack reference device 660. In one embodiment, the set of packed data registers 650 includes eight registers. Each of these eight registers corresponds to a different one of the floating-point registers 615 and is mapped onto the 64-bit mantissa field of the floating-point register to which it corresponds. The packed data non-stack reference device 660 operates the packed data registers 650 as a fixed register file. Thus, packed data commands explicitly designate which registers in the set of packed data registers 650 are to be used.

The transition device 600 aliases the packed data registers 650 onto the floating-point registers 615 by copying data between the two physical register files. Thus, the transition device 600 causes the physical floating-point registers 615 and the physical packed data registers 650 to logically appear to the user/programmer as a single logical register file. In this way, it appears to software as if only a single logical register file is available for executing floating-point and packed data commands. The transition device 600 may be implemented using any number of techniques, including hardware and/or microcode. Of course, in alternative embodiments the transition device 600 can be located anywhere on the processor. In addition, in alternative embodiments, [...]

The transition device 600 may be implemented to provide full or partial aliasing. If the contents of all physical floating-point registers are copied into the packed data register file during a transition to packed data mode, the physical floating-point register file is fully aliased onto the packed data register file. Similarly, if the contents of all physical packed data registers are copied into the floating-point register file during a transition to floating-point mode, the physical packed data register file is fully aliased onto the physical floating-point register file. In contrast, in partial aliasing only the contents of those registers that contain "useful" data are copied. Which registers contain useful data may be determined on the basis of any number of criteria. For example, partial aliasing may be implemented by copying into the physical packed data registers the data stored in only those physical floating-point registers whose corresponding tags indicate the non-empty state. Of course, an embodiment can use the floating-point tags [...] alias the physical packed data registers onto the physical floating-point registers. Alternatively, the packed data registers and/or floating-point registers that were accessed (read and/or written) can be considered to contain useful data. The floating-point tags can be used for this purpose instead of, or in addition to, indicating the empty or non-empty state. Alternatively, additional indications can be included for the floating-point and/or packed data registers to record which registers were accessed. When partial aliasing is implemented, good programming practice must assume that registers into which no data was copied during the transition contain undefined values.
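
The difference between full and partial aliasing on a transition to packed data mode can be sketched as follows. This is an illustrative model only: the function `fp_to_packed`, its parameters and the representation of registers as Python integers are hypothetical, and `None` is used here merely to mark a register that was not copied and whose contents software should not rely on.

```python
MANTISSA_MASK = (1 << 64) - 1  # low 64 bits of an 80-bit register


def fp_to_packed(fp_regs, tag_non_empty, full):
    """Copy mantissa fields into a packed data register file.

    fp_regs:       eight 80-bit register values (as integers)
    tag_non_empty: eight booleans, True where the tag is non-empty
    full:          True for full aliasing, False for partial aliasing
    """
    packed = []
    for reg, non_empty in zip(fp_regs, tag_non_empty):
        if full or non_empty:
            packed.append(reg & MANTISSA_MASK)  # copied register
        else:
            packed.append(None)                 # not copied
    return packed
```

The partial variant uses the tag criterion given above; an embodiment that tracks accessed registers instead would substitute that indication for `tag_non_empty`.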

The packed data status register 655 includes a set of packed data dirty fields 665, a speculative field 670, a mode field 675, an exception status field 680 and an EMMS field 685. Each of the packed data dirty fields 665 corresponds to a different one of the packed data registers 650 and is used for storing a dirty indication. Since there is a correspondence between the packed data registers 650 and the floating-point registers 615, each dirty indication also corresponds to a different one of the floating-point registers 615. When a value is written to one of the packed data registers 650, the dirty indication corresponding to that register is changed to indicate the dirty state. When the transition device 600 causes a transition from the packed data device 610 to the floating-point device 605, ones are written to the sign and exponent fields of those floating-point registers 615 whose corresponding dirty indications indicate the dirty state. In this way, step 430 in Fig. 4B can be performed.
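
A minimal sketch of this use of the dirty indications is given below. It assumes, for illustration only, that on a transition to floating-point mode just the dirty packed data registers need to be copied back; the function `packed_to_fp` and the integer representation of the 80-bit registers are hypothetical.

```python
MANTISSA_MASK = (1 << 64) - 1   # 64-bit mantissa field
SIGN_EXP_ONES = 0xFFFF << 64    # sign bit and 15-bit exponent, all ones


def packed_to_fp(fp_regs, packed_regs, dirty):
    """Return the floating-point register file after the transition.

    For every register whose dirty indication is set, the packed value
    becomes the mantissa and ones are written to the sign and exponent
    fields; other registers are left unchanged (an assumption here).
    """
    result = list(fp_regs)
    for i, is_dirty in enumerate(dirty):
        if is_dirty:
            result[i] = SIGN_EXP_ONES | (packed_regs[i] & MANTISSA_MASK)
    return result
```

A register updated in this way reads as NaN or infinity when interpreted as a floating-point value, which is consistent with item 8 of the common features listed earlier.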

The mode field 675 is used for storing a mode indication that identifies the mode in which the processor is currently operating: floating-point mode, in which the floating-point device 605 is currently used, or packed data mode, in which the packed data device 610 is used. If the processor is in floating-point mode and a packed data command is received, a transition from floating-point mode to packed data mode must be performed. Conversely, if the processor is in packed data mode and a floating-point command is received, a transition from packed data mode to floating-point mode must be performed. Thus, when a packed data or floating-point command is received, the mode indication can be polled to determine whether a transition is necessary. If a transition is necessary, it is performed and the mode indication is changed accordingly. The operation of the mode indication will be further described with reference to Fig. 7A-9.

The exception status field 680 is used for storing an exception status indication. The exception status indication is used during execution of packed data commands to identify whether there are any pending exceptions from previously executed floating-point commands. In one embodiment, if the exception status indication indicates that such exceptions are pending, they are serviced before the transition to packed data mode. In one embodiment, the indications used by the floating-point device 605 for this purpose are either encoded or directly copied into the exception status field as the exception status indication.

The EMMS field 685 is used for storing an EMMS indication that identifies whether the last packed data command executed was the EMMS command. When the EMMS command is executed, the EMMS indication is changed to one to indicate that the last packed data command executed was the EMMS command. Conversely, when any other packed data command is executed, the EMMS indication is changed to zero. The transition device 600 polls the EMMS indication during a transition from packed data mode to floating-point mode to determine whether the last packed data command was the EMMS command. If the last packed data command executed was the EMMS command, the transition device 600 changes all tags 620 to the empty state. However, if the EMMS indication indicates that the last packed data command executed was not EMMS, the transition device 600 changes all tags 620 to the non-empty state. In this way, the tags are changed in a manner similar to steps 432 and 440 in Fig. 4B.
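
The handling of the EMMS indication can be modelled in a few lines. The names `PackedDataStatus`, `note_executed` and `tags_on_transition_to_fp` are hypothetical, as is the use of mnemonic strings to identify commands; the sketch is an illustration of the rule above rather than a description of the hardware.

```python
EMPTY, NON_EMPTY = "empty", "non-empty"  # hypothetical tag encoding


class PackedDataStatus:
    """Holds only the EMMS indication of the packed data status register."""

    def __init__(self):
        self.emms = 0

    def note_executed(self, mnemonic: str) -> None:
        # One after EMMS, zero after any other packed data command.
        self.emms = 1 if mnemonic == "EMMS" else 0


def tags_on_transition_to_fp(status: PackedDataStatus, num_tags: int = 8):
    """Tag values the transition device would set when entering FP mode."""
    state = EMPTY if status.emms == 1 else NON_EMPTY
    return [state] * num_tags
```

Because only the most recent packed data command matters, an EMMS followed by a further packed data command leaves the indication at zero, and the tags are then set to the non-empty state on the next transition.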

The speculative field 670 is used for storing a speculative indication that identifies whether a transition from floating-point mode to packed data mode is speculative. If the transition is speculative, time can be saved if a transition back to the floating-point device 605 is required. The operation of this indication will be further described with reference to Fig. 7A-9.

Fig. 6B is a block diagram illustrating in greater detail [...] embodiment of the invention. Fig. 6B shows the floating-point stack reference device 630 containing a tag modifier device 690 for selectively changing the tags in the set of tags 620. In the embodiment shown in Fig. 6B, each of the tags 620 contains only one bit, indicating either empty or non-empty. The tag modifier device 690 includes a set of TOS adjustment devices 696 and a check/modification device 698. Each TOS adjustment device 696 is connected to the micro-operation bus 692 to receive one or more micro-operations, depending on the implementation (for example, there may be only one TOS adjustment device, which receives only one micro-operation). At least the micro-operations for floating-point commands that require tags to be changed are received by the TOS adjustment devices 696. Of course, the floating-point stack reference device 630 can be implemented so that all, or only the relevant part, of each micro-operation is received by the TOS adjustment devices 696.

In response to receiving a micro-operation, the TOS adjustment device passes to the check/modification device 698 at least [...] to be performed on the tag(s) (e.g., change to 0 or 1, poll). Since polling of tags is not necessary for understanding the invention, it is not described further. Each TOS adjustment device 696 is also connected to lines 694 to receive the current TOS value and to adjust the address(es) of the tag(s) accordingly. The check/modification device 698 is connected to each of the tags 620 by at least a write line. For example, the check/modification device 698 is connected to tag 645 by a write line. In response to receiving the tag address and the corresponding signals, the check/modification device 698 performs the required checks and/or modifications. In implementations in which multiple micro-operations can be received at the same time, the check/modification device 698 also compares the micro-operations to determine whether they modify the same tags (for example, micro-operation 1 requires that tag 1 be changed to 1, while micro-operation 2, received at the same time as micro-operation 1, requires that tag 1 be changed to 0). If the same tag is modified, the check/modification device 698 determines which micro-operation is to be executed last and changes the tag accordingly. In the example above, if micro-operation 2 is to be executed after micro-operation 1, the check/modification device 698 changes tag 1 to indicate 0.
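
The conflict rule for micro-operations received at the same time can be sketched in software as "the micro-operation that is executed last wins". The function `apply_tag_updates` and the representation of each micro-operation as an (index, value) pair are hypothetical; the sketch assumes the list is given in the order in which the micro-operations are to be executed.

```python
def apply_tag_updates(tags, micro_ops):
    """Apply simultaneously received tag updates to a list of 1-bit tags.

    micro_ops is a list of (tag_index, new_value) pairs in execution
    order. When two entries name the same tag, the later one decides.
    """
    final = {}
    for tag_index, new_value in micro_ops:
        final[tag_index] = new_value   # a later micro-op overrides an earlier one
    updated = list(tags)
    for tag_index, new_value in final.items():
        updated[tag_index] = new_value
    return updated
```

For the example in the text, `apply_tag_updates([0] * 8, [(1, 1), (1, 0)])` leaves tag 1 at 0, because micro-operation 2 is executed after micro-operation 1.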

For example, if a floating-point operation were executed that requires a tag (e.g., tag 645) to be changed to the empty state, a TOS adjustment device would receive the current TOS value and, over the micro-operation bus 692, a micro-operation identifying the tag. The TOS adjustment device would determine the address of the tag (e.g., tag 645) and pass that address, together with signals indicating that the tag is to be changed to the empty state, to the check/modification device 698.

In response, the check/modification device 698 would change tag 645 to the empty state by transmitting a 0 on the write line connected to tag 645. In one embodiment, because floating-point commands can be implemented so that not all tags need to be changed at the same time, the tag modifier device 690 is implemented so that it cannot change all tags at the same time. To avoid complicating the circuit, a global change of the tags in response to a transition to floating-point mode can be performed using this existing mechanism. In this regard, if the transition device 600 is implemented in microcode, the set of microcode commands would cause [...]. In response to execution of a transition to packed data mode while the EMMS indication indicates that the most recently executed packed data command was the EMMS command, the decode device would access the transition device 600 and issue several existing micro-operations. In response to these micro-operations, the tag modifier device 690 would change the corresponding tags to the empty state. Conversely, in response to execution of a transition to packed data mode while the EMMS indication indicates that the EMMS command was not the most recently executed packed data command, the decode device would access the transition device 600 and issue several existing micro-operations that cause the tag modifier device 690 to change each of the tags to the non-empty state. In this embodiment, a global change of the tags may require approximately 4-8 clock cycles.

While one embodiment has been described for changing all the tags in response to a transition to packed data mode, alternative embodiments may use any number of mechanisms. For example, changing all the tags to the empty or non-empty state could be completed in [...] could globally change the tags in response to a new micro-operation. In this embodiment, the transition device 600 may be implemented to cause the decode device to issue this single micro-operation (instead of several separate micro-operations) to change all the tags to the empty state or the non-empty state. As another example, the decode device can be connected to the tags 620 and include additional hardware for changing all the tags 620 in response to receiving the EMMS command.

As described above, although the set of tags 620 is described as having single-bit tags, the set of tags 620 can be implemented so as to appear to have two bits per tag. An alternative embodiment may implement two bits for each tag by including additional encoded or non-encoded lines to indicate the different states (for example, 00, 01, 10, 11) to which the tags are to be changed.

Fig. 7A, 7B, 7C, 8 and 9 illustrate a method, in accordance with one embodiment of the invention, for executing packed data commands on a set of registers that is aliased onto a set of floating-point registers in a manner that [...] and that can be carried out using the hardware arrangement of Fig. 6A. This flow diagram is similar to the flow diagram described with reference to Fig. 4A and 4B. With reference to Fig. 4A and 4B, many alternative embodiments were described in which steps were modified, moved and/or removed. It should be understood that the steps described with reference to Fig. 7A, 7B, 7C, 8 and 9 that are similar to steps performed in Fig. 4A and 4B can at least be performed using such alternatives. The flow diagram begins at step 700. From step 700, flow passes to step 702.

As shown in step 702, a set of bits is accessed as a command, and flow passes to step 704. This set of bits includes an operation code that identifies the operation(s) to be performed by the command. Thus, step 702 is similar to step 402 in Fig. 4A.

At step 704, it is determined whether the operation code is valid. If the operation code is not valid, flow passes to step 706. Otherwise, flow passes to step 708. Step 704 is similar to step 404 in Fig. 4A.

As shown in step 706, an invalid opcode exception is generated and the appropriate event handler is executed. At step 708, the type of command received is determined. If the command is neither a floating-point command nor a packed data command, flow passes to step 710. However, if the command is a floating-point command, flow passes to step 712. Alternatively, if the command is a packed data command, flow passes to step 714. Thus, step 708 is similar to step 408 in Fig. 4A.

As shown in step 710, the processor executes the command. Because this step is not necessary for understanding the invention, it is not described further. Step 710 is similar to step 410 in Fig. 4A.

As shown, at step 712 it is determined whether the EM indication is equal to 1 (according to the described software convention, if the floating-point device should be emulated) and whether the TS indication is equal to 1 (according to the described software convention, if a partial context switch was performed). If the EM indication and/or the TS indication is equal to 1, flow passes to step 716. Otherwise, flow passes to step 720. Thus, step 712 is similar to step 412 in Fig. 4A.

At step 716, a device-not-available exception is generated and the corresponding event handler is executed. [...] This event handler can be implemented to use the EM and TS indications to determine whether to emulate the floating-point command and/or whether a partial context switch was performed.

At step 714, it is determined whether the EM indication is equal to 1. Thus, step 714 is similar to step 414 in Fig. 4A. As a result, if it is determined at step 714 that the EM indication is equal to 1, flow passes to step 706 rather than to step 716. Otherwise, flow passes to step 718.

As described above, at step 706 an invalid opcode exception is generated and the corresponding event handler is executed. By diverting the attempted execution of a packed data command while EM = 1 to the invalid opcode exception, the embodiment is invisible to the operating system, as described above with reference to step 406 in Fig. 4A.

While one embodiment has been described that handles the EM indication in a manner that is invisible to the operating system, alternative embodiments may use other techniques. For example, an alternative embodiment may generate the device-not-available exception, another existing event or a new event [...]. As another example, an alternative embodiment may ignore the EM indication when executing packed data commands.

As shown, at step 718 it is determined whether the TS indication is equal to 1 (according to the described software convention, if a partial context switch was performed). If the TS indication is equal to 1, flow passes to step 716. Otherwise, flow passes to step 722. Thus, step 718 is similar to step 418 in Fig. 4A.
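
The decisions of steps 704 through 718 can be summarised as a single function. This is an illustrative sketch only: the function `dispatch`, the string labels for command kinds and the returned outcome names are hypothetical, and the sequential `if` statements stand in for whatever logic an embodiment actually uses.

```python
def dispatch(opcode_valid: bool, kind: str, em: int, ts: int) -> str:
    """Return the outcome of steps 704-718 for one command.

    kind is "fp", "packed" or "other"; em and ts are the EM and TS
    indications (0 or 1).
    """
    if not opcode_valid:                      # step 704
        return "invalid_opcode_exception"     # step 706
    if kind == "other":                       # step 708
        return "execute"                      # step 710
    if kind == "fp":
        if em == 1 or ts == 1:                # step 712
            return "device_not_available_exception"  # step 716
        return "step_720"
    if kind == "packed":
        if em == 1:                           # step 714
            return "invalid_opcode_exception"  # step 706, not step 716
        if ts == 1:                           # step 718
            return "device_not_available_exception"  # step 716
        return "step_722"
    raise ValueError("unknown command kind: " + kind)
```

The asymmetry is visible in the two branches: with EM equal to 1 a floating-point command reaches the device-not-available exception, whereas a packed data command reaches the invalid opcode exception.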

As described above, at step 716 a device-not-available exception is generated and the corresponding event handler is executed. Step 716 is similar to step 416 in Fig. 4A. Because step 714 diverted situations in which the EM indication is equal to 1 to the invalid opcode exception, the EM indication must be equal to 0 and the TS indication must be equal to 1. Since TS is equal to 1, the event handler causes the processor to function as described above with reference to partial context switches (it saves the contents of the floating-point device and restores the correct floating-point state if necessary) and causes the processor to resume execution by restarting the command received at step 702. Because the packed data state is aliased onto the floating-point state, this [...] in a manner that is invisible to the operating system. Of course, alternative embodiments can implement this event handler in any number of ways.

While one embodiment has been described that handles the TS indication in a manner that is invisible to the operating system, alternative embodiments may use other techniques. For example, an alternative embodiment may not implement the TS indication. Such an alternative embodiment would not be compatible with operating systems that use the TS indication to implement partial context switching. However, it would be compatible with existing operating systems that do not support partial context switching using the TS indication. As another example, the attempted execution of a packed data command while the TS indication is equal to 1 could be diverted to a new event handler or to an existing event handler that has been modified. This event handler could be implemented to take any action deemed appropriate in response to this situation. For example, in an embodiment in which [...], this event handler could save the packed data state and/or the floating-point state.

As described above, if a numerical error is generated during the execution of a floating point command, the error is held pending until an attempt is made to execute the next floating point command, whose execution can be interrupted to service it. As described above, in steps 420 and 422 of Fig. 4 it is determined whether there are any such pending errors that can be serviced. Similar to step 420 of Fig. 4A, at step 720 it is determined whether there are any such pending errors that can be serviced. If there are, flow passes from step 720 to step 724. However, if it is determined at step 720 that there are no such pending errors, flow passes to step 726. In contrast, the determination of whether there are any pending errors from a previous floating point command during an attempt to execute a packed data command is made at a different step, which is described later. As a result, step 722 differs from step 422.

At step 724, the pending floating point error event is generated. Thus, step 724 is similar to step 424 of Fig. 4A. As described above with reference to step 424 of Fig. 4A, this event may be treated as either an internal or an external event and serviced accordingly.

At step 726, it is determined whether the mode indication indicates that the processor is operating in floating point mode. Thus, step 726 differs from step 426 of Fig. 4B. If the processor is not in floating point mode, it must be transitioned from packed data mode to floating point mode in order to execute the floating point command. Thus, if the processor is not in floating point mode, flow passes to step 728. Otherwise, flow passes to step 732.

At step 728, the processor is transitioned from packed data mode to floating point mode, and flow passes to step 730. Step 728 is performed by the transition device 600 of Fig. 6A and is further described with reference to Fig. 9.

As shown in step 730, the command received at step 702 is restarted by performing a "micro restart". Because in one embodiment step 728 is performed using microcode and the command is micro-restarted, no operating system event handlers need to be executed.

As a result, execution of the current task can continue without any action external to the processor. Thus, the processor can transition from packed data mode to floating point mode in a manner that is invisible to the software, including the operating system. This embodiment is therefore compatible with known operating systems. Alternative embodiments could be implemented to be less compatible. For example, an additional event could be incorporated into the processor, and an additional event handler could be added to the operating system to perform this transition.

As shown in step 732, the floating point command is executed. Step 732 is similar to step 426 of Fig. 4B. To remain invisible to the operating system, one embodiment also modifies the tags as necessary, reports any numerical errors that can now be serviced, and holds any other numerical errors pending. As described above, modifying the tags allows this embodiment to remain invisible to any operating system technique that stores the contents of only those floating point registers whose corresponding tag indicates the non-empty state. However, alternative embodiments may be compatible with fewer operating system techniques. For example, if the operating system does not use the tags, a processor that does not implement the tags is still compatible with that operating system. In addition, the invention does not require that floating point exceptions be held pending, and thus alternative embodiments in which this is not implemented remain within the scope of the present invention.

As shown, at step 722 it is determined whether the mode indication indicates that the processor is in packed data mode. Thus, step 722 differs from step 422 of Fig. 4A. Step 722 is performed to determine whether the processor is in the appropriate mode to execute the packed data command. If the processor is not in packed data mode, it must be transitioned from floating point mode to packed data mode in order to execute the packed data command. Thus, if the processor is not in packed data mode, flow passes to step 734. Otherwise, flow passes to step 738.

At step 734, the processor is transitioned from floating point mode to packed data mode, and flow passes to step 736. As shown in step 736, the command received at step 702 is restarted by performing a micro restart. Thus, step 736 is similar to step 730.

At step 738, it is determined whether the packed data command is the EMMS command. If so, flow passes to step 740. Otherwise, flow passes to step 742. Because packed data commands are executed on a separate device (i.e. the packed data device), it is more efficient to store indications (for example, the EMMS indication) that identify what must be done at step 728, when transitioning back to floating point mode, than to actually perform certain operations (for example, changing the tags to the empty state in response to execution of the EMMS command, and changing the tags to the non-empty state in response to execution of any other packed data command). The use of the EMMS indication, as well as of the other indications, is described with reference to the transition from packed data mode to floating point mode shown in Fig. 9.

As shown in step 740, the EMMS indication is changed to indicate that the last packed data command was the EMMS command. After step 740, the processor is free to execute the next command (received at step 702).

As shown in step 742, the packed data command is executed without generating any numerical exceptions. Thus, step 742 is similar to step 434 of Fig. 4B, except that the top of stack pointer is not modified. As described above, alternative embodiments that are not completely invisible to the operating system can be implemented so that either additional event handlers are incorporated into the operating system or existing event handlers are modified to service the errors. If any memory events are generated as a result of attempting to execute the packed data command, execution is interrupted and the event is serviced.

As shown in step 744, the risk indication is changed to indicate that the transition from floating point mode to packed data mode is no longer risky. From step 744, flow passes to step 746. The operation of the risk indication is further described with reference to Fig. 8.

As shown, at step 746 it is determined whether the packed data command causes the processor to write to any of the combined registers. If so, flow passes to step 748; otherwise, flow passes to step 750 (compare Fig. 4B).

At step 748, the crude indication corresponding to each combined register written is changed to the crude (that is, dirty) state, and flow passes to step 750. These crude indications are used at step 728, when transitioning from packed data mode to floating point mode. As described above, the crude indications identify those floating point registers whose sign and exponent fields must be filled with ones. While in one embodiment ones are written into the sign and exponent fields, alternative embodiments may use any value that represents NaN (not a number) or infinity. Steps 746 and 748 would not be required in an alternative embodiment in which the sign and exponent fields are not modified.

As shown in step 750, the EMMS indication is changed to indicate that the last packed data command was not the EMMS command. After step 750, the processor is free to execute the next command. Of course, an embodiment that does not use the EMMS command would not require steps 738, 740 and 750.
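By way of illustration only, the bookkeeping performed at steps 740 through 750 can be sketched as follows. This is a minimal sketch under stated assumptions, not the implementation of the invention: the class `PackedState`, its field names, and the function `note_packed_command` are hypothetical, eight combined registers are assumed, and the execution of the command itself (step 742) is not modeled.

```python
from dataclasses import dataclass, field

NUM_REGS = 8  # assumed number of combined registers


@dataclass
class PackedState:
    """Hypothetical container for the indications described in the text."""
    emms_last: bool = False   # EMMS indication
    risky: bool = True        # risk indication
    crude: list = field(default_factory=lambda: [False] * NUM_REGS)


def note_packed_command(state, is_emms, written_regs=()):
    """Record the effect of one packed data command on the indications."""
    if is_emms:
        # Step 740: remember that the last packed data command was EMMS.
        state.emms_last = True
        return
    # Step 742 (actual execution of the command) is not modeled here.
    # Step 744: the transition is no longer risky.
    state.risky = False
    # Steps 746 and 748: mark every combined register the command wrote.
    for reg in written_regs:
        state.crude[reg] = True
    # Step 750: the last packed data command was not EMMS.
    state.emms_last = False
```

The point of the sketch is that only cheap indications are updated while packed data commands run; the tags and the sign and exponent fields are left for the later transition back to floating point mode.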

Thus, a method and device have been described for executing packed data commands in a manner that is compatible with known operating systems (such as those available from Microsoft Corporation, Redmond, Washington) and that supports good programming practice. Because the packed data state is combined with the floating point state, the packed data state is saved and restored by known operating systems as if it were floating point state. Furthermore, since the events generated by the execution of packed data commands can be serviced by the existing event handlers of the operating system, these event handlers need not be modified and new event handlers need not be added. As a result, the processor is backward compatible, and upgrading does not require the expense of developing or modifying an operating system.

Variations of this embodiment, some of which have been described, may be fully or partially compatible with such operating systems and/or may support good programming practice. For example, an alternative embodiment of the invention may move some of the steps to a different location in the sequence of operations. Other embodiments of the invention may modify or remove one or more of the steps, in which case some of the hardware shown in Fig. 6A would not be required. For example, if the EMMS command is not used, the EMMS indication is not required. Of course, the invention may be useful for any number of system architectures and is not limited to the architecture described here.

In addition, while the method and device have been described for combining two physical register files, alternative embodiments may combine any number of physical register files to execute any number of different types of commands. Moreover, while this embodiment has been described with reference to a physical stack register file for executing floating point commands and a flat physical register file for executing packed data commands, the method can be used to combine at least one physical stack register file and at least one flat physical register file, regardless of the type of commands to be executed on those register files.

In addition, while the method and device have been described for executing floating point and packed data commands, alternative embodiments may be implemented to execute any number of different types of commands. For example, the packed data commands may cause the processor to perform packed integer operations and/or packed floating point operations. As another example, an alternative embodiment may combine physical register files to execute scalar floating point and scalar integer commands instead of, or in addition to, packed data commands. As another example, instead of combining (overlaying) the packed data commands on the floating point registers, an alternative embodiment may combine (overlay) the packed data commands on the integer registers. As another example, an alternative embodiment may combine scalar floating point, scalar integer, and packed (integer and/or floating point) commands on a single logical register file. Thus, what has been described can be used to make it appear to the software that a single logical register file is available for executing commands that operate on different data types.

Fig. 8 is a flow diagram illustrating a method for performing step 734 of Fig. 7C according to one embodiment of the invention. As described above, at step 734 the processor transitions from floating point mode to packed data mode. As shown, at step 800 it is determined whether there are any pending errors from a previous floating point command. If so, flow passes to step 724. Otherwise, flow passes to step 804. Thus, step 800 is similar to step 720 of Fig. 7 and step 422 of Fig. 4A.

As described above, at step 724 the pending floating point error exception is generated and the corresponding event handler is executed. As described above with reference to step 424 of Fig. 4A, this event may be treated as either an internal or an external event and serviced accordingly. In an alternative embodiment, such errors remain pending during the execution of packed data commands.

As shown in step 804, the data stored in the mantissa fields of the floating point registers is copied into the packed data registers. In this way, data that was stored in the floating point registers can be operated on as packed data. If full aliasing (combining) is implemented, the data stored in the mantissa fields of all of the floating point registers is copied into the corresponding packed data registers. In contrast, if partial aliasing is implemented, only the data stored in the mantissa fields of those floating point registers whose corresponding tags indicate the non-empty state is copied into the corresponding packed data registers. Alternative embodiments that do not allow data stored in the floating point registers to be used as packed data do not require step 804. After step 804, flow passes to step 806.

At step 806, the EMMS indication is changed to indicate that the last packed data command was not the EMMS command, and flow passes to step 808. This step is performed to initialize the packed data mode.

As shown in step 808, each of the crude indications is changed to indicate the cleared state, and flow passes to step 810. Steps 806 and 808 are performed to initialize the packed data mode.

As shown in step 810, the risk indication is changed to indicate that the transition from floating point to packed data is risky. Although the data stored in the floating point registers was copied into the packed data registers at step 804, the state of the floating point device has not been changed. Thus, the floating point state is still current (the data stored in the floating point registers is equivalent to the data stored in the packed data registers; the tags have not been modified; and the top of stack pointer has not been modified). If a packed data command is subsequently executed, the data stored in the packed data registers will be changed, and the floating point state will no longer be current. As a result, the transition from packed data mode to floating point mode will require that the floating point state be modified (for example, the data stored in the packed data registers must be copied into the mantissa fields of the floating point registers; the top of stack pointer must be changed to 0; and the tags must be changed to the empty state). However, if an attempt is made to execute a floating point command before any packed data command has been executed (which may happen if an event is generated before execution of the packed data command that caused the transition from floating point mode to packed data mode, for example if a memory fault occurred during the attempted execution of that packed data command), the floating point state need not be modified because it is still current. Avoiding this modification reduces the overhead of the transition. To take advantage of this fact, at this step the risk indication is changed to indicate that the transition from the floating point device to the packed data device is risky, because the floating point state is still current. Later, if a packed data command is executed, the risk indication is changed to indicate that the transition is no longer risky, as described above with reference to step 744 of Fig. 7. The use of the risk indication is further described with reference to Fig. 9.

While one embodiment has been described that uses the risk indication, alternative embodiments may omit the risk indication.

At step 812, the mode indication is changed to indicate that the processor is now in packed data mode. From step 812, flow passes to step 736.
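By way of illustration only, steps 800 through 812 can be sketched as follows. This is a minimal sketch under stated assumptions, not the implementation of the invention: the processor state is modeled as a plain dictionary whose keys (`fp_regs`, `pd_regs`, `crude`, `risky` and so on) are hypothetical names, full aliasing is assumed at step 804, and the pending error event of step 724 is represented by a Python exception.

```python
class PendingFloatingPointError(Exception):
    """Stands in for the pending floating point error event of step 724."""


def new_cpu(num_regs=8):
    """Build a hypothetical processor state in floating point mode."""
    return {
        "mode": "fp",
        "pending_fp_error": False,
        "fp_regs": [{"sign": 0, "exponent": 0, "mantissa": 0}
                    for _ in range(num_regs)],
        "pd_regs": [0] * num_regs,
        "emms_last": False,
        "crude": [False] * num_regs,
        "risky": False,
    }


def enter_packed_mode(cpu):
    """Transition from floating point mode to packed data mode."""
    # Step 800: pending errors from a previous floating point command?
    if cpu["pending_fp_error"]:
        raise PendingFloatingPointError()  # step 724
    # Step 804: copy the mantissa fields into the packed data registers
    # (full aliasing assumed, so every register is copied).
    cpu["pd_regs"] = [reg["mantissa"] for reg in cpu["fp_regs"]]
    # Step 806: the last packed data command was not EMMS.
    cpu["emms_last"] = False
    # Step 808: clear every crude indication.
    cpu["crude"] = [False] * len(cpu["fp_regs"])
    # Step 810: the floating point state is still current, so mark as risky.
    cpu["risky"] = True
    # Step 812: record the new mode.
    cpu["mode"] = "packed"
```

Note that the floating point registers, the tags and the top of stack pointer are left untouched, which is what allows an immediate transition back to floating point mode to be cheap.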

Fig. 9 is a flow diagram illustrating a method for performing step 728 of Fig. 7 according to one embodiment of the invention. As described above, the processor transitions from packed data mode to floating point mode at step 728. From step 726, flow passes to step 900.

At step 900, it is determined whether the risk indication indicates that the transition to packed data mode is still risky. As described above, the risk indication can be used to reduce the overhead of transitioning from packed data mode to floating point mode. If it is determined at step 900 that the transition from floating point to packed data is still risky, steps 902 through 912 are skipped, flow passes directly to step 914, and the overhead of the transition is reduced. Otherwise, flow passes to step 902.

As shown, at step 902 it is determined whether the EMMS indication indicates that the last packed data command was the EMMS command. If so, flow passes to step 904. Otherwise, flow passes to step 906. As described above, the fact that packed data commands are executed on a separate device (i.e. the packed data device) makes it more efficient to store indications (for example, the EMMS indication) that identify what must be done when transitioning back to floating point mode than to perform certain operations (for example, changing the tags). Thus, instead of the tags being changed in response to the EMMS command, the EMMS indication is changed. Then, when transitioning back to floating point mode, the tags are changed accordingly, as shown.

At step 904, all of the tags are changed to the empty state, in a manner similar to step 432 of Fig. 4B.

At step 906, all of the tags are changed to the non-empty state, and flow passes to step 908. Thus, the tags are changed in a manner similar to step 440 of Fig. 4B.

As shown in step 908, the contents of the packed data registers are copied into the mantissa fields of the floating point registers, and flow passes to step 910. Thus, data stored in the packed data registers can be operated on as floating point data. In addition, since known operating systems already save the floating point state when performing multitasking, the packed data state is saved to and restored from the various context structures as if it were floating point state. Thus, the physical packed data registers are combined with the physical floating point registers, and the processor logically appears to have a single logical register file. As a result, this embodiment is invisible to the software, including the operating system. If full aliasing is implemented, the data stored in all of the packed data registers is copied into the mantissa fields of the corresponding floating point registers. In contrast, if partial aliasing is implemented, an embodiment can be implemented so that only the data stored in those packed data registers that were touched is copied into the mantissa fields of the corresponding floating point registers.

As shown in step 910, the top of stack pointer is set to an initialization value. In one embodiment, this value is zero. In an alternative embodiment, the execution of any packed data command sets the top of stack pointer to the initialization value. From step 910, flow passes to step 912.

As shown in step 912, ones are stored in the sign and exponent fields of those floating point registers whose corresponding crude indications are in the crude state. Thus, step 912 is similar to step 438 of Fig. 4B. From step 912, flow passes to step 914.

At step 914, the mode indication is changed to indicate that the processor is operating in floating point mode, and flow passes to step 730. Thus, the transition from packed data mode to floating point mode is completed.
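By way of illustration only, steps 900 through 914 can be sketched as follows. This is a minimal sketch under stated assumptions, not the implementation of the invention: the processor state is a plain dictionary with hypothetical key names, full aliasing is assumed at step 908, flow is assumed to continue from step 904 to step 908, and the field widths (a 1-bit sign and a 15-bit exponent, as in the x87 extended format) are an assumption made for the example.

```python
SIGN_ONES = 0x1         # 1-bit sign field filled with ones (assumed width)
EXPONENT_ONES = 0x7FFF  # 15-bit exponent field filled with ones (assumed width)


def enter_fp_mode(cpu):
    """Transition from packed data mode to floating point mode."""
    # Step 900: if the transition is still risky, the floating point state
    # is current and steps 902 through 912 are skipped.
    if not cpu["risky"]:
        count = len(cpu["fp_regs"])
        # Steps 902, 904, 906: the tags follow the EMMS indication.
        tag = "empty" if cpu["emms_last"] else "non-empty"
        cpu["tags"] = [tag] * count
        for index, reg in enumerate(cpu["fp_regs"]):
            # Step 908: copy packed data into the mantissa field
            # (full aliasing assumed, so every register is copied).
            reg["mantissa"] = cpu["pd_regs"][index]
            # Step 912: fill sign and exponent of crude registers with ones.
            if cpu["crude"][index]:
                reg["sign"] = SIGN_ONES
                reg["exponent"] = EXPONENT_ONES
        # Step 910: set the top of stack pointer to the initialization value.
        cpu["top"] = 0
    # Step 914: record the new mode.
    cpu["mode"] = "fp"
```

The sketch performs steps 908 and 912 in one loop and step 910 afterwards; because these steps touch different fields, the result is the same as performing them in the order given in the text.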

Fig. 10 is a block diagram illustrating a device that uses a single physical register file, according to another embodiment of the invention. The device shown in Fig. 10 can be used as the instruction set device 560 of Fig. 5. In one embodiment, the device of Fig. 10 is at least capable of executing the set of commands 580. Fig. 10 shows a decoder 1002, a rename device 1004, a retirement device 1006, an issue device 1008, an execution device 1010, a set of status registers 1012, and a microcode ROM 1014.

The decoder 1002 is used to decode commands received by the processor into control signals and/or microcode entry points. These microcode entry points identify sequences of micro-operations (also referred to as "uops"), which are transmitted by the decoder 1002 to the various devices in the processor. While some micro-operations may be stored in the decoder 1002, in one embodiment the majority of micro-operations are stored in the microcode ROM 1014. In this embodiment, the decoder 1002 transmits the microcode entry point to the microcode ROM 1014, which in response transmits the required micro-operation(s) back to the decoder 1002.

The rename device 1004 and the retirement device 1006 are used to perform register renaming. Register renaming is well known and is performed to avoid storage conflicts that result when different commands attempt to use a limited number of storage locations, such as registers. A storage conflict occurs, for example, when such commands interfere with one another even though the conflicting commands are otherwise independent. Storage conflicts can be removed by providing additional registers (called buffer registers), which are used to reestablish the correspondence between registers and values. To perform register renaming, the processor typically assigns one of the buffer registers to each new value produced, that is, to each command that writes a register. A command that identifies the original register in order to read its value obtains instead the value in the assigned buffer register; the hardware thus renames the original register to identify the buffer register and the correct value. The same register identifier in several different commands may access different hardware registers, depending on the locations of the register references relative to the register assignments. For a further description of register renaming, see Johnson, Mike, Superscalar Microprocessor Design, 1991, PTR Prentice-Hall, Inc., New Jersey; "Flag Renaming and Flag Mask Within Alias Table", Ser. No. 08/204,521, Colwell et al.; "Integer and Floating Point Register Alias Table Within the Processor Device", Ser. No. 08/129,678, Clift et al.; and "Partial Width Stalls Within the Register Alias Table", Ser. No. 08/174,841, Colwell et al. When a command completes successfully (without causing any events that are not held pending), the buffer registers assigned to the command are "retired": the values are moved from the buffer registers to the original registers identified in the command. Alternative embodiments may implement any number of techniques for removing storage conflicts, such as interlocking, partial renaming, etc.

The retirement device 1006 includes a set of buffer registers 1020, a set of FP/PD registers 1022, and a set of integer registers 1024. The set of buffer registers 1020 provides the additional registers used for register renaming. Alternative embodiments may have any number of buffer registers. In this embodiment, the set of buffer registers 1020 operates as a reorder buffer.

In one embodiment, the FP/PD registers 1022 and the integer registers 1024 are visible to the software: that is, they are the registers identified in the commands, and thus they appear to the software to be the only registers for executing floating point, packed data, and integer commands. In contrast, the buffer registers 1020 are invisible to the software. Thus, the FP/PD registers 1022 are a single physical register file that appears to the software as a single logical register file. In one embodiment, the set of FP/PD registers 1022 and the set of integer registers 1024 each contain eight registers in order to remain compatible with existing Intel architecture software. However, alternative embodiments may implement any number of registers.

The rename device 1004 includes, among other elements, an FP/PD mapping device 1030 and an FP/PD mapping table 1032. When an operand is received by the rename device 1004, it is determined whether the operand is a floating point operand, a packed data operand, or an integer operand.

Integer operands are received by the integer mapping device 1040. The integer mapping device 1040 controls the integer mapping table 1042. In one embodiment, the integer mapping table 1042 contains the same number of entries as there are integer registers 1024. Each entry in the integer mapping table 1042 corresponds to a different one of the integer registers 1024; in Fig. 10, entry 1050 corresponds to integer register 1052. When a command is received that will cause the processor to write to an integer register (for example, integer register 1052), the integer mapping device 1040 assigns one of the buffer registers 1020 by storing, in the integer register's corresponding entry in the integer mapping table 1042 (for example, entry 1050), a pointer identifying an available register in the set of buffer registers 1020 (for example, buffer register 1054). The data is written into the selected buffer register (for example, buffer register 1054). When the command that generated the operand completes, the retirement device 1006 commits the data by copying it from the selected buffer register (for example, buffer register 1054) into the corresponding integer register (for example, integer register 1052) and causes the integer mapping device 1040 to modify the contents of the entry (for example, entry 1050) to indicate that the data is stored in the entry's corresponding integer register.

When a command is received that will cause the processor to read an integer register, the processor accesses the contents of the integer register's corresponding entry in the integer mapping table 1042 (for example, entry 1050) using the FP/PD mapping device 1030. If the entry contains a pointer to a buffer register (for example, buffer register 1054), the processor reads the contents of that buffer register. However, if the contents of the entry indicate that the data is stored in the entry's corresponding integer register (for example, integer register 1052), the processor reads the contents of that integer register. Thus, the integer registers 1024 are implemented as a fixed register file in this embodiment of the invention.
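By way of illustration only, the write, retire and read behavior of the integer mapping table described above can be sketched as follows. This is a minimal sketch under stated assumptions, not the implementation of the invention: the class `IntegerMapper` and its method names are hypothetical, the number of buffer registers and the wrap-around selection of the next available buffer register are assumptions, and the check made in `retire` (leave the entry alone if a later write has already renamed the register) is a conventional renaming rule that the text does not spell out.

```python
class IntegerMapper:
    """Hypothetical model of mapping table 1042 with its registers."""

    def __init__(self, num_regs=8, num_buffers=16):
        self.int_regs = [0] * num_regs       # integer registers 1024
        self.buffers = [None] * num_buffers  # buffer registers 1020
        # One entry per integer register. None means the value is stored
        # in the integer register itself; otherwise a buffer register index.
        self.table = [None] * num_regs
        self.next_free = 0                   # next buffer register to assign

    def write(self, reg, value):
        """Assign a buffer register to `reg` and write `value` into it."""
        buf = self.next_free
        self.next_free = (self.next_free + 1) % len(self.buffers)  # assumed
        self.table[reg] = buf
        self.buffers[buf] = value
        return buf

    def retire(self, reg, buf):
        """Commit a completed write by copying buffer to integer register."""
        self.int_regs[reg] = self.buffers[buf]
        # Assumed rule: only repoint the entry if no later write has
        # renamed the register again.
        if self.table[reg] == buf:
            self.table[reg] = None

    def read(self, reg):
        """Read through the mapping table entry for `reg`."""
        buf = self.table[reg]
        return self.int_regs[reg] if buf is None else self.buffers[buf]
```

A reader always goes through the table entry, so it sees the newest value whether or not that value has been retired to the integer register yet.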

The FP/PD mapping device 1030 controls the FP/PD mapping table 1032 and the tags 1034. The FP/PD mapping table 1032 contains the same number of entries as there are registers in the set of FP/PD registers 1022. Each entry in the FP/PD mapping table 1032 corresponds to a different one of the FP/PD registers 1022. Floating point and packed data operands are received by the FP/PD mapping device 1030, mapped onto the buffer registers 1020, and retired to the FP/PD registers 1022. Thus, the floating point state and the packed data state are combined (overlaid) on a single register file visible to the user. Since known operating systems are implemented to cause the processor to save the floating point state when multitasking, the same operating systems will cause the processor to save any packed data state that is combined with the floating point registers.

In one embodiment, packed data operands are handled in a manner similar to integer operands: the packed data registers are implemented as a fixed register file. Thus, when a packed data command is received that will cause the processor to write to an FP/PD register, the FP/PD mapping device 1030 assigns one of the buffer registers 1020 by storing, in the corresponding entry in the FP/PD mapping table 1032, a pointer identifying an available register in the set of buffer registers 1020. The data is written into the selected buffer register. When the command that generated the operand completes without any interruption (without any events being taken), the retirement device 1006 commits the data by copying it from the selected buffer register into the appropriate FP/PD register (the FP/PD register that corresponds to the entry in the FP/PD mapping table 1032) and causes the FP/PD mapping device 1030 to modify the entry in the FP/PD mapping table 1032 to indicate that the data is stored in the entry's corresponding FP/PD register.

While the registers are implemented as a fixed register file when executing packed data commands, one embodiment of the invention implements the registers as a stack-referenced register file when executing floating point commands, in a manner that is compatible with known Intel architecture software (including operating systems). As a result, the FP/PD mapping device 1030 must be able to operate the FP/PD mapping table 1032 both as a fixed register file for packed data operands and as a stack for floating point operands. To this end, a top of stack field 1072 is used to store a top of stack indication that identifies the entry in the FP/PD mapping table 1032 corresponding to the register that is currently the top of the floating point stack. Of course, alternative embodiments may operate the registers as a flat register file when executing floating point commands.

When a floating point command is received that will cause the processor to write to an FP/PD register, the FP/PD mapping device 1030 modifies the top of stack pointer and assigns one of the buffer registers 1020 by storing, in the top of stack register's corresponding entry in the FP/PD mapping table 1032, a pointer identifying an available register in the set of buffer registers 1020. The data is written into the selected buffer register. When the command that generated the operand completes without any interruption (without any events being taken), the retirement device 1006 commits the data by copying it from the selected buffer register into the appropriate FP/PD register (the FP/PD register that corresponds to the entry in the FP/PD mapping table 1032) and causes the FP/PD mapping device 1030 to modify the entry in the FP/PD mapping table 1032 to indicate that the data is stored in the entry's corresponding FP/PD register.

When a floating point command is received that will cause the processor to read an FP/PD register, the processor accesses the contents of the corresponding entry in the FP/PD mapping table 1032, located relative to the top of the stack, and modifies the stack accordingly. If a pointer to a buffer register is stored in that entry, the processor reads the contents of that buffer register. However, if the contents of the entry indicate that the data is stored in the entry's corresponding register in the FP/PD registers 1022, the processor reads the contents of that FP/PD register.

Thus, since the FP/PD mapping device 1030 maps floating point operands onto a stack-referenced register file, the entries in the FP/PD mapping table 1032 must be accessed relative to the top of the stack. In contrast, since the FP/PD mapping device 1030 maps packed data operands onto a fixed register file, the entries in the FP/PD mapping table 1032 must be accessed relative to register R0. To cause the processor to access the entries in the FP/PD mapping table relative to register R0, the top of stack pointer must be modified to indicate register R0. Therefore, the top of stack pointer must indicate register R0 while packed data commands are executed. This can be accomplished by modifying the top of stack pointer to indicate register R0 during transitions from floating point mode to packed data mode and by not changing the top of stack pointer during the execution of packed data commands. Thus, the same circuitry used to map the floating point stack can be used to map the fixed packed data register file. As a result, circuit complexity is reduced and chip area is saved compared with the embodiment described with reference to Fig. 6A. While an embodiment has been described in which the same circuitry is used to map both packed data and floating point operands, alternative embodiments may use separate circuitry.
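By way of illustration only, the reuse of top-of-stack-relative addressing for the fixed register file can be sketched as follows. This is a minimal sketch under stated assumptions: the function name `table_entry` is hypothetical, eight entries are assumed as in the embodiment described, and the modulo wrap-around follows the usual x87 stack convention rather than anything stated explicitly in the text.

```python
NUM_ENTRIES = 8  # eight FP/PD registers, as in the embodiment described


def table_entry(top_of_stack, operand_index):
    """Mapping table entry selected by a top-of-stack-relative operand."""
    return (top_of_stack + operand_index) % NUM_ENTRIES
```

With the top of stack pointer at 5, operand 4 selects entry 1, which is stack-style addressing. With the top of stack pointer held at 0, operand i always selects entry i, so the same addressing logic behaves as a fixed register file indexed from register R0.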

Regardless of the type of command being executed, in one embodiment the assignment and release of the buffer registers is handled in the same way. The retirement device 1006 includes a status register 1060 having an allocation field 1062 and a retirement field 1064. The allocation field 1062 stores an allocation pointer identifying the next buffer register to be used. When either the FP/PD mapping device 1030 or the integer mapping device 1040 requires a register, the current allocation pointer is stored in the corresponding mapping table (i.e. the FP/PD mapping table 1032 or the integer mapping table 1042) and the allocation pointer is incremented. In addition, the rename device 1004 transmits to the retirement device 1006 signals indicating whether the command is a packed data command and whether the processor is in packed data mode.

In the assigned buffer register, the retirement device 1006 stores a ready indication in a ready field 1082. The ready indication is initially set to indicate that the buffer register is not ready for retirement. However, when data is written into the data field 1080 of the buffer register, the ready indication of the buffer register is changed to indicate that the buffer register is ready for retirement.

The retirement field 1064 of the status register 1060 stores a retirement pointer identifying the next buffer register to be retired. When the ready indication of that buffer register is changed to the ready state, the retirement device 1006 must determine whether the data in the buffer register can be committed. As further described below, in one embodiment the retirement device 1006 does not commit the data if any exceptions were generated (for example, the device not available exception or the pending floating point error exception) or if a transition between the packed data and floating point modes is required. If the data can be committed, it is copied into the appropriate FP/PD or integer register, and the retirement pointer is incremented to point to the next buffer register. While the retirement and allocation pointers have been described as being stored in a control register, alternative embodiments may store these pointers, as well as any other information described herein (for example, the EMMS indication, the mode indication, and so on), in some form of sequential elements such as flip-flops.
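By way of illustration only, the interplay of the allocation pointer, the ready indication and the retirement pointer can be sketched as follows. This is a minimal sketch under stated assumptions, not the implementation of the invention: the class `RetirementBuffer` and its method names are hypothetical, the buffer size, the wrap-around of both pointers and the `in_flight` counter are assumptions, and the `blocked` argument stands in for the exception and mode-transition conditions under which data is not committed.

```python
class RetirementBuffer:
    """Hypothetical model of the buffer registers and their two pointers."""

    def __init__(self, size=4):
        self.data = [None] * size    # data field 1080 of each buffer register
        self.ready = [False] * size  # ready field 1082 of each buffer register
        self.alloc_ptr = 0           # allocation pointer (field 1062)
        self.retire_ptr = 0          # retirement pointer (field 1064)
        self.in_flight = 0           # assigned but not yet retired (assumed)

    def allocate(self):
        """Assign the next buffer register and mark it not ready."""
        if self.in_flight == len(self.data):
            raise RuntimeError("no buffer register available")
        index = self.alloc_ptr
        self.ready[index] = False
        self.alloc_ptr = (self.alloc_ptr + 1) % len(self.data)
        self.in_flight += 1
        return index

    def write(self, index, value):
        """Write the data field and mark the register ready for retirement."""
        self.data[index] = value
        self.ready[index] = True

    def retire(self, blocked=False):
        """Return (index, value) of the retired register, or None."""
        index = self.retire_ptr
        if self.in_flight == 0 or not self.ready[index] or blocked:
            return None
        self.retire_ptr = (index + 1) % len(self.data)
        self.in_flight -= 1
        return index, self.data[index]
```

Because retirement always examines the register identified by the retirement pointer, results are committed in the order in which the buffer registers were assigned, even if a later register becomes ready first.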

While an embodiment has been described in which the retirement unit 1006 includes three separate sets of registers and data is committed from the buffer registers to the FP/PD registers or integer registers, an alternative embodiment may include any number of different sets of registers. For example, one alternative embodiment may include a single set of registers. In that embodiment, each register in the set includes an indication identifying whether the data stored in it has been committed.

In one embodiment, the processor is either in the floating-point mode or in the packed data mode. If the processor is not in the packed data mode, it cannot properly execute any packed data commands, and vice versa. As a result, before committing the data stored in a buffer register, the retirement unit 1006 determines whether the data is packed data and whether the processor is in the packed data mode. If the data is packed data and the processor is not in the packed data mode, the transition unit 1036 contained in the microcode ROM 1014 is invoked to perform the transition to the packed data mode. In one embodiment, whether the processor is in the packed data mode is determined by checking whether the top-of-stack pointer is set to the initialization value (for example, identifying register R0) and all tags 1034 are in the non-empty state.

There are a number of ways to cause the processor to poll the top-of-stack pointer and the tags 1034 to determine whether the processor is in the packed data mode. For example, as described above, the decode unit 1002 accesses micro-operations from the microcode ROM 1014. These micro-operations include an encoded field identifying the appropriate mapping to be performed by the FP/PD mapping unit 1030 (for example, incrementing the top-of-stack pointer, [...]). At least one additional encoded bit pattern (called the "packed data bit pattern") is included to identify the mapping for packed data commands. Thus, when the decode unit 1002 receives a packed data command and accesses the microcode ROM 1014, at least one of the micro-operations delivered to the decode unit 1002 includes the packed data bit pattern.

Upon receiving a micro-operation containing the packed data bit pattern, the FP/PD mapping unit 1030: 1) determines the state of the tags 1034 and the top-of-stack pointer; and 2) transmits to the retirement unit 1006 signal(s) indicating whether a transition to the packed data mode is required (in one embodiment, the processor mode and the command type are transmitted). In response, the retirement unit 1006 stores a transition indication in the transition field 1084 of any buffer registers allocated by the command (in one embodiment, the transition indication includes a first bit indicating the processor mode and a second bit indicating the command type). Thus, if the command is a packed data command and the processor is not in the packed data mode, the transition indication of the respective buffer registers is set to indicate that a transition is required. Otherwise, the transition indication is set to indicate that no transition is required. When the ready indication of the buffer register identified by the retirement pointer changes to the ready state, the retirement unit 1006 checks the transition indication. If the transition indication shows that no transition is required, and if the data can otherwise be retired (for example, there are no events that must be serviced), the data is retired. Conversely, if the transition indication shows that a transition is required, the retirement unit 1006 transmits the microcode entry point for the transition unit 1036 to the microcode ROM 1014. In response, the microcode ROM 1014 delivers the micro-operations necessary to switch the processor to the packed data mode.
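
The two-bit transition indication and the check made at retirement can be illustrated with a short Python sketch. It is a minimal illustration only: the names `TransitionIndication` and `retirement_action`, the returned strings, and the order in which the two checks are made are assumptions, not details given in the description.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TransitionIndication:
    """Hypothetical model of transition field 1084."""
    in_packed_mode: bool      # first bit: processor mode
    is_packed_command: bool   # second bit: command type

    @property
    def transition_required(self) -> bool:
        # A transition is needed only for a packed data command
        # encountered while the processor is not in the packed data mode.
        return self.is_packed_command and not self.in_packed_mode


def retirement_action(ind: TransitionIndication, events_pending: bool) -> str:
    """Decide what happens when a ready buffer register reaches retirement."""
    if ind.transition_required:
        # The retirement unit would send the microcode entry point
        # of the transition unit to the microcode ROM.
        return "invoke transition unit 1036"
    if events_pending:
        # Assumption: pending events block retirement until serviced.
        return "service event"
    return "retire"
```

For instance, `retirement_action(TransitionIndication(False, True), False)` selects the transition, while the same command seen in the packed data mode is simply retired.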

Thus, incorporating the packed data mode adds only minor complexity. Of course, an alternative embodiment can implement this functionality in any number of ways, including:

1) having the decode unit 1002 transmit special signals upon receiving a packed data command, which cause the rename unit 1004 to poll the tags and the top-of-stack pointer;
2) adding bits to all micro-operations to indicate whether the tags and the top-of-stack pointer are to be polled;
3) having the FP/PD mapping unit 1030 poll the tags and the top-of-stack pointer [...] the FP/PD mapping unit 1030 when a packed data item is ready to be committed, and having the FP/PD mapping unit 1030 invoke the transition unit 1036 if the processor is not in the packed data mode; and so on.

While in one embodiment whether the processor is in the packed data mode is determined from the top-of-stack pointer and the tags 1034, an alternative embodiment may use any method, including a mode indication, as described above.

As described above, the transition unit 1036 is used to transition the processor from the floating-point mode to the packed data mode. The transition unit 1036 causes the processor to set the top-of-stack pointer to the initialization value and to change all the tags 1034 to the non-empty state. In this way, the rename unit 1004 is initialized for executing packed data commands. After the transition completes, the command that caused the transition from the floating-point mode to the packed data mode is micro-restarted. As a result, no non-microcode event handlers (including operating system event handlers) are required, and the embodiment is invisible to the operating system. While the transition unit 1036 has been described as located in the microcode ROM 1014, an alternative embodiment may place the transition unit 1036 elsewhere in the processor. In another alternative embodiment, the transition unit 1036 may be implemented to perform transitions from the floating-point mode to the packed data mode. During such a transition, the transition unit 1036 saves the current top-of-stack pointer in a storage area and sets the top-of-stack pointer to the initialization value. When the transition unit 1036 is invoked again to switch back to the floating-point mode, it restores the previous top-of-stack pointer. In addition, in alternative embodiments the transition unit 1036 may be implemented in hardware or as a non-microcode event handler stored outside the processor.

As described above with reference to one embodiment, each block of packed data commands should be terminated by the EMMS command. In response to execution of the EMMS command, the execution unit 1010 causes the rename unit 1004 to change the tags 1034 to the empty state. Thus, after execution of the EMMS command the processor is in the floating-point mode: all the tags 1034 are in the empty state and the top-of-stack pointer holds the initialization value (as described above, the top-of-stack pointer was set to the initialization value on the transition to the packed data mode and was not modified during execution of the packed data commands). As a result, the transition unit is not required to perform the transition from the packed data mode to the floating-point mode. This differs from the transition unit described with reference to Fig. 6A, which must be invoked to switch the processor back and forth between the floating-point and packed data modes. In addition, because a single aliased register file is used for the floating-point and packed data states, the transition does not require copying data between two separate register files. As a result, circuit complexity is reduced and die area on the processor is saved.
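
The mode test, the one-way transition and the effect of the EMMS command can be modelled in a few lines of Python. This is a sketch under assumptions: the class `FpPdState`, its attribute names and the use of booleans for the tags are hypothetical; only the rules themselves (top-of-stack pointer at its initialization value, all tags non-empty, EMMS emptying the tags) come from the description.

```python
from typing import List

NUM_TAGS = 8
TOS_INIT = 0  # initialization value of the top-of-stack pointer (e.g. register R0)


class FpPdState:
    """Hypothetical model of the tags 1034 and the top-of-stack pointer."""

    def __init__(self) -> None:
        self.tags_empty: List[bool] = [True] * NUM_TAGS
        self.top_of_stack: int = TOS_INIT

    def in_packed_data_mode(self) -> bool:
        # Packed data mode: pointer at its initialization value, all tags non-empty.
        return self.top_of_stack == TOS_INIT and not any(self.tags_empty)

    def transition_to_packed(self) -> None:
        """Effect of the transition unit 1036."""
        self.top_of_stack = TOS_INIT
        self.tags_empty = [False] * NUM_TAGS

    def emms(self) -> None:
        """EMMS empties every tag and leaves the top-of-stack pointer untouched."""
        self.tags_empty = [True] * NUM_TAGS
```

Because `emms()` leaves the pointer at its initialization value and empties the tags, the modelled state after a packed data block is the initialized floating-point state, which is why no reverse transition routine appears in the sketch.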

In other alternative embodiments, the changing of the tags and the top-of-stack pointer could be wholly or partly performed upon execution of the packed data commands. For example, the need for the transition unit could be avoided by: 1) causing the execution of each packed data command that is not the EMMS command to change the top-of-stack pointer [...] to change the tags to the empty state. In another alternative embodiment, the EMMS command is not implemented but is emulated using floating-point commands, as described below with reference to Fig. 14.

The issue unit 1008 is a buffer for storing commands and their operands. The issue unit 1008 may be implemented as a set of reservation stations, a central instruction window, or a hybrid of the two. With reservation stations, each functional unit (e.g., an arithmetic logic unit) has its own buffer for storing commands and the information identifying their operands. In contrast, with a central instruction window, a central buffer shared by all functional units stores the commands and the information identifying their operands. The operands of a command can take several forms depending on what information is available. If the actual data is not available, the operands identify registers in the FP/PD register set 1022, the integer register set 1024, or the set of buffer registers [...]. Once the actual data is available, that data is stored in the buffer. In one embodiment, the issue unit 1008 also receives information from the rename unit 1004. However, this information is not necessary for understanding the invention.

The issue unit 1008 issues a command to the execution unit 1010 when the necessary information has been acquired. The execution unit 1010 executes the command. The execution unit 1010 transmits any operand information that must be stored to the retirement unit 1006 for storage, as described above. In one embodiment, since commands may be delayed in the issue unit 1008 for lack of operand information, the execution unit 1010 also transmits any operand information to the issue unit 1008. This avoids the additional delay that would be incurred by sending the operand information to the retirement unit 1006 and then to the issue unit 1008. The execution unit 1010 is connected to the status registers 1012. The status registers 1012 store control information for use by the execution unit 1010. Such control information may include the EM indication and the TS indication, as described above. [...] load/store) to align the different data types accessed from the retirement unit 1006. The data alignment operation is further described with reference to Fig. 12 and 13.
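
The way operands in the issue unit change from register references to actual data, and the forwarding path from the execution unit, can be sketched as follows. This is an illustrative central-window model only: the names `IssueEntry`, `IssueUnit`, `forward` and `dispatch`, and the representation of a register reference as a `(register set, index)` tuple, are assumptions.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union

RegRef = Tuple[str, int]      # hypothetical reference, e.g. ("buffer", 3)
Operand = Union[RegRef, int]  # either a register reference or the actual data


@dataclass
class IssueEntry:
    opcode: str
    operands: List[Operand]

    def ready(self) -> bool:
        # A command can issue once every operand holds actual data.
        return all(isinstance(op, int) for op in self.operands)


class IssueUnit:
    """Toy central window shared by all functional units."""

    def __init__(self) -> None:
        self.window: List[IssueEntry] = []

    def insert(self, entry: IssueEntry) -> None:
        self.window.append(entry)

    def forward(self, reg: RegRef, value: int) -> None:
        """Result forwarded by the execution unit: replace matching references."""
        for entry in self.window:
            entry.operands = [value if op == reg else op for op in entry.operands]

    def dispatch(self) -> List[IssueEntry]:
        """Remove and return every command whose operands are all available."""
        ready = [e for e in self.window if e.ready()]
        self.window = [e for e in self.window if not e.ready()]
        return ready
```

A command waiting on `("buffer", 1)` stays in the window until `forward(("buffer", 1), value)` is called, after which `dispatch()` releases it without a round trip through the retirement unit.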

The changing of the tags 1034 may be implemented using any number of different mechanisms. For example, Fig. 10 shows the FP/PD mapping unit 1030 as also containing a tag modifier unit 1092 for changing the tags. The tag modifier unit 1092 can be implemented in any number of ways, including those described with reference to Fig. 6B.

For example, in one embodiment, because floating-point commands can be implemented so that not all tags need to be changed at the same time, the tag modifier unit 1092 is implemented such that it cannot change all tags simultaneously (one such embodiment was previously described with reference to Fig. 6B). To avoid added circuit complexity, the global changing of the tags in response to a transition to the packed data state or in response to execution of the EMMS command can be implemented using this existing mechanism. In this regard, a set of microcode commands, represented by the EMMS unit 1094, can be stored in the microcode ROM 1014 to implement the EMMS command [...] to issue several existing micro-operations to change each of the eight tags. Thus, in response to receiving the EMMS command, the decode unit 1002 accesses the EMMS unit 1094 and issues the several existing micro-operations. In response to each of these micro-operations, the tag modifier unit 1092 changes the corresponding tags to the empty state. Conversely, in response to accessing the transition unit 1036, the decode unit 1002 issues several existing micro-operations that cause the tag modifier unit 1092 to change each of the tags to the non-empty state. In this embodiment, the global changing of the tags may require approximately 4-8 clock cycles.

While one embodiment has been described for changing all the tags in response to a transition or the EMMS command, an alternative embodiment may use any number of mechanisms. For example, changing all the tags to the empty or non-empty state can be completed in a single clock cycle by including a new micro-operation and implementing the tag modifier unit 1092 so that it can change the tags globally (one such embodiment of the tag modifier unit 1092 is described with reference to Fig. 6B) in response to the new micro-operation. In this embodiment, the EMMS unit 1094 is implemented to cause the decode unit 1002 to issue this single micro-operation (instead of several separate micro-operations) to change all tags to the empty state. Conversely, the transition unit 1036 is implemented to cause the decode unit 1002 to issue this single micro-operation (instead of several existing separate micro-operations) to change all tags to the non-empty state. As another example, an alternative embodiment may include a bus connecting the execution unit 1010 to the tags 1034 and the retirement unit 1006. This alternative embodiment may be implemented so that, in response to the EMMS command, the processor is serialized (this can be performed by the rename unit 1004), signals are sent over the bus to change the tags (this can be performed by the execution unit 1010), and the processor is serialized again (this can be performed by the rename unit 1004). Such an embodiment may require approximately 10-20 clock cycles to change all the tags. Conversely, this alternative embodiment may be implemented so that the pre- and/or post-serialization is performed by another unit or is not required. As [...] means for changing all the tags 1034 in response to receiving the EMMS command.

Thus, the embodiment shown in Fig. 10 uses a single set of registers for executing floating-point and packed data commands, instead of separate floating-point and packed data units as described above with reference to Fig. 6A. Additionally, the embodiment of Fig. 6A requires separate circuitry for accessing the floating-point registers as a stack and the packed data registers as a fixed register file, whereas the FP/PD mapping unit 1030 uses the same circuitry for both. In addition, unlike the transition unit described with reference to Fig. 6A, which must be invoked to switch the processor back and forth between the floating-point and packed data modes, the transition unit described with reference to Fig. 10 is needed only to switch the processor from the floating-point mode to the packed data mode. Moreover, because a single aliased register file is used for the floating-point and packed data states, this transition does not require copying data between two separate register files. As a result, the embodiment shown in Fig. 10 requires less complex circuitry [...]. While an embodiment has been described which includes commands for executing floating-point and packed data operations, an alternative embodiment may implement different sets of commands that cause the processor to perform operations on different data types. For example, one set of commands may cause the processor to perform scalar operations (floating-point and/or integer), and another set of commands may cause the processor to perform packed operations (floating-point and/or integer). As another example, one set of commands may cause the processor to perform floating-point operations (scalar and/or packed), and another set of commands may cause the processor to perform integer operations (scalar and/or packed). As another example, a single aliased register file can be operated both as a stack-referenced register file and as a flat register file.
Moreover, while an embodiment has been described that provides full aliasing, alternative embodiments having a single physical register file may be implemented to operate as partially aliased. This [...] aliased physical register file.

Fig. 11A, 11B, and 11C illustrate a method, in accordance with another embodiment of the invention, for executing packed data and floating-point commands on a single aliased register file in a manner that is invisible to the operating system, that promotes good programming techniques, and that may be implemented using the hardware of Fig. 10. This flow diagram is similar to the flow diagrams described with reference to Fig. 4A-B and Fig. 7A-C, 9 and 10. With reference to those earlier flow diagrams, many alternative embodiments were described in which steps were changed, moved and/or removed. It should be understood that the steps described with reference to Fig. 11A-C that are similar to steps in the previously described flow diagrams can be performed using such alternative embodiments. The flow diagram starts at step 1100. From step 1100, flow passes to step 1102.

As shown in step 1102, a set of bits is accessed as a command, and flow passes to step 1104. This set of bits includes an operation code that [...]. In one embodiment, the following steps are performed in the decode stage of the pipeline.

At step 1104, it is determined whether the operation code is valid. If the operation code is not valid, flow passes to step 1106. Otherwise, flow passes to step 1108. Step 1104 is similar to step 404 in Fig. 4.

At step 1106, one or more event signal micro-operations are inserted, indicating that the invalid opcode exception should be generated. Event signal micro-operations are used to defer the servicing of errors until the retirement stage(s) of the pipeline. If a command is an event signal micro-operation, it flows through the decode stage, the register renaming stage(s) and the execution stage(s). However, when the event signal micro-operation is received in the retirement stage(s), the state of the buffer registers is not committed, and the appropriate event is generated. Event signal micro-operations are inserted before or in place of the command that causes the event. The use of such micro-operations is further described in "Method and Apparatus for Signaling an Occurrence of an Event in a Processor", Ser. No. 08/203,790, by Darrell D. Boggs et al. From step 1106, flow passes to step 1108.

At step 1108, it is determined what type of command has been received. If the command is neither a floating-point command nor a packed data command, flow passes to step 1110. Thus, if one or more event signal micro-operations were inserted at step 1106, flow passes to step 1110. However, if the command is a floating-point command, flow passes to step 1112. Conversely, if the command is a packed data command, flow passes to step 1114. Thus, step 1108 is similar to step 408 in Fig. 4A.

As shown in step 1110, the processor executes the command. If one or more micro-operations were inserted at step 1106 indicating that the invalid opcode exception should be generated, the micro-operations flow through the decode stage, the register renaming stage(s) and the execution stage(s). However, when the event signal micro-operation(s) reach the retirement stage(s), the state of the buffer registers is not committed and the invalid opcode exception is generated. As described above with reference to step 215 in Fig. 2, this event handler can be implemented to cause the processor to display a message, abort execution of the current task, and continue executing other tasks. Of course, alternative [...]. Because the execution of other commands is not necessary for understanding the invention, it is not further described.

As shown in step 1112, it is determined whether the EM indication is equal to 1 (according to the described software convention, whether the floating-point unit should be emulated) and whether the TS indication is equal to 1 (according to the described software convention, whether a partial context switch was performed). If the EM indication and/or the TS indication is equal to 1, flow passes to step 1116. Otherwise, flow passes to step 1120. Thus, step 1112 is similar to step 412 in Fig. 4A.

At step 1116, one or more event signal micro-operations are inserted to indicate that the device not available exception should be generated. From step 1116, flow passes to step 1120.

As shown in steps 1114 and 1120, register renaming is performed. From step 1120, flow passes to step 1122. Conversely, from step 1114, flow passes to step 1134. In one embodiment, steps 1114 and 1120 are performed in the rename stage(s) of the pipeline.
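
The decode-stage and rename-stage routing of steps 1102 to 1120 can be summarized as a small Python function that returns the sequence of step numbers visited. This is a sketch under assumptions: the function name, the `kind` strings and the return format are hypothetical, and the routing of commands that are neither floating-point nor packed data to step 1110 is inferred from the surrounding description.

```python
from typing import List


def decode_flow(opcode_valid: bool, kind: str, em: int = 0, ts: int = 0) -> List[int]:
    """Return the step numbers visited up to register renaming.

    kind is 'fp', 'packed' or 'other' (a hypothetical classification).
    """
    path = [1102, 1104]
    if not opcode_valid:
        # Step 1106 inserts an event signal micro-operation, which is then
        # treated as neither a floating-point nor a packed data command.
        return path + [1106, 1108, 1110]
    path.append(1108)
    if kind == "fp":
        path.append(1112)
        if em == 1 or ts == 1:
            path.append(1116)  # insert "device not available" event micro-operation
        path.append(1120)      # register renaming for floating-point commands
    elif kind == "packed":
        path.append(1114)      # register renaming for packed data commands
    elif kind == "other":
        path.append(1110)
    else:
        raise ValueError(f"unknown command kind: {kind!r}")
    return path
```

Note that in this flow the EM and TS indications are examined at decode only for floating-point commands; for packed data commands they are examined later, in the retirement stage(s).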

In one embodiment, the following steps are performed in the execution stage(s) of the pipeline.

[...] To remain invisible to the operating system, one embodiment also changes the tags as necessary, reports any numeric errors that can be serviced now, and holds any other numeric errors pending. As described above, changing the tags allows this embodiment to remain invisible to operating systems that store the contents of only those floating-point registers whose corresponding tags indicate a non-empty state. However, alternative embodiments may be implemented to be compatible with only certain operating system techniques. For example, if an operating system does not use the tags, a processor that does not implement the tags is still compatible with that operating system. In addition, it is not necessary to the invention that floating-point exceptions be held pending, and thus alternative embodiments that do not do so are within the scope of the invention. From step 1122, flow passes to step 1124.

At step 1134, it is determined whether the packed data command is the EMMS command. If so, flow passes to step 1136. Otherwise, flow passes to step 1138. As described above, the EMMS command is used to change the floating-point tags to the initialization state, and should be executed after executing any packed data commands and/or before executing any floating-point commands, in order to transition the processor to the floating-point mode.

As shown in step 1136, all tags are changed to the empty state. Thus, the tags are initialized and prepared for executing floating-point commands. After completion of step 1136, flow passes to step 1144. In an embodiment in which the EMMS command is not implemented, steps 1134 and 1136 would be absent and flow would pass from step 1114 to step 1138.

As shown in step 1138, the packed data command is executed. During this step, ones are stored in the sign and exponent fields of any FP registers, or any buffer registers acting as FP/PD registers, into which packed data is written. Thus, step 1138 is similar to steps 434, 436 and 438 in Fig. 4B. This promotes good programming techniques by encouraging the separation of floating-point and packed data commands. While in one embodiment ones are written to the sign and exponent fields, an alternative embodiment may use any value representing NaN (not a number) or infinity. Moreover, this step is performed without generating any numeric exceptions. If any memory event is generated as a result of attempting to execute the packed data command, execution is interrupted and the event is serviced. From step 1138, flow passes to step 1144.

In one embodiment, the following steps are performed in the retirement stage(s) of the pipeline.

At step 1124, it is determined whether the command is an event signal micro-operation indicating the device not available exception. If so, it was determined at step 1112 that either or both of the TS and EM indications were equal to 1. Thus, if the command is an event signal micro-operation indicating the device not available exception, flow passes to step 1126. Otherwise, flow passes to step 1128. In this way, the device not available exception can be incorporated in a processor that uses register renaming.

As shown in step 1144, it is determined whether the EM indication is equal to 1. Thus, step 1144 is similar to step 414 in Fig. 4A. If it is determined at step 1144 that the EM indication is equal to 1, flow passes to step 1146 rather than to step 1126. Otherwise, flow passes to step 1148.

At step 1146, the invalid opcode exception is generated and the corresponding event handler is executed. This is the same invalid opcode exception that was described above with reference to step 1110 in Fig. 11A. The generation of the invalid opcode exception is similar to the invalid opcode exception generated at step 406 in Fig. 4A. As described above with reference to step 215 in Fig. 2, this event handler can be implemented to cause the processor to display a message, abort execution of the current task and continue executing other tasks. Of course, alternative embodiments can implement this handler in any number of ways, as described above. By diverting the attempted execution of a packed data command while EM is equal to 1 to the invalid opcode exception, the embodiment remains invisible to the operating system.

While an embodiment has been described that handles the EM indication in a manner that is invisible to the operating system, an alternative embodiment may use other techniques. For example, an alternative embodiment may generate the device not available exception, a different existing event, or a new event in response to the attempted execution of a packed data command while the EM indication is equal to 1. As another example, an alternative embodiment may ignore the EM indication when executing packed data commands.

As shown in step 1148, it is determined whether the TS indication is equal to 1 (according to the described software convention, whether a partial context switch was performed). If a partial [...]

As described above, at step 1126 the device not available exception is generated and the corresponding event handler is executed. In response to this event, the corresponding event handler can be implemented to poll the EM and TS indications. However, when packed data commands are executed, flow passes through step 1144, and situations in which the EM indication is equal to 1 are diverted to the invalid opcode exception. As a result, when packed data commands are being executed and step 1126 is reached, the EM indication must be equal to 0 and the TS indication must be equal to 1. Since the TS indication is equal to 1, the event handler operates as described above with reference to a partial context switch and causes the processor to resume execution by restarting the command received at step 1102. Because the packed data state is aliased on the floating-point state, this event handler works for both the floating-point state and the packed data state. As a result, this method is invisible to the operating system. Of course, alternative embodiments can implement this handler in any number of ways. While an embodiment has been described that handles the TS indication in a manner that is invisible to the operating system, an alternative embodiment may use other techniques, as described above.

As described above, if numeric errors are generated during the execution of a floating-point command, those errors are held pending until an attempt is made to execute a subsequent floating-point command whose execution can be interrupted to service them. As shown in steps 1128 and 1150, it is determined whether there are any such pending errors that can be serviced. Thus, these steps are similar to steps 420 and 422 in Fig. 4A. If there are any such pending errors, flow passes from steps 1128 and 1150 to step 1130. However, if it is determined at step 1128 that there are no such pending errors, flow passes to step 1132. Conversely, if it is determined at step 1150 that there are no such pending errors, flow passes to step 1152. In an alternative embodiment, step 1150 is not performed, and the floating-point error remains pending during execution of the packed data command.

At step 1130, a pending floating-point error event is generated. Thus, step 1130 [...] either as an internal event or as an external event, and serviced accordingly.

As shown in step 1152, it is determined whether the processor is in the packed data mode. If the processor is in the packed data mode, execution of the packed data command completed successfully, and flow passes to step 1132. However, if the processor is not in the packed data mode, the packed data command was executed in the floating-point mode. As a result, the execution of the packed data command is not accurate. To correct this, the processor must be switched from the floating-point mode to the packed data mode, and the packed data command must be re-executed. To this end, if the processor is not in the packed data mode, flow passes to step 1154. The determination at step 1152 can be made in any number of ways. For example, a mode indication could be used, as described above with reference to Fig. 6A. As another example, the top-of-stack pointer and the tags can be polled. If the top-of-stack pointer is in the initialization state and all tags are in the non-empty state, the processor is in the packed data mode. Otherwise, the processor is not in the packed data mode.

At step 1154, the processor is transitioned from the floating-point mode to the packed data mode, and flow passes to step 1150. The transition is performed by changing all tags to the non-empty state and setting the top-of-stack pointer to the initialization value. Changing all tags to the non-empty state promotes good programming techniques in that it encourages the separate grouping of floating-point and packed data commands. In addition, from the standpoint of operating system compatibility, certain operating system techniques store the contents of only those floating-point registers whose corresponding tags indicate a non-empty state. Thus, in an embodiment in which the packed data state is aliased on the floating-point state, changing all tags to the non-empty state causes such operating systems to save the packed data state as if it were floating-point state. Alternative embodiments can be implemented to be compatible with only certain operating system techniques. For example, if an operating system does not use the tags, an embodiment that does not implement the tags is compatible with that operating system. Setting the top-of-stack pointer to zero is used to support efficient programming techniques, as described above. In addition, setting the top-of-stack pointer to the initialization value and not changing it during execution of the packed data commands allows the same circuitry to be used for operating the FP/PD registers both as a floating-point stack and as a fixed register file, as described above with reference to Fig. 10. Because the floating-point and packed data states are aliased on a single register file, the transition does not require data to be copied between separate floating-point and packed data register files. This reduces the time required to switch between the floating-point and packed data modes.
As described above, the transition from the floating-point mode to the packed data mode may be implemented in microcode. In an alternative embodiment, the execution of each packed data command sets the top-of-stack pointer to the initialization value.

As shown in step 1156, the command [...], execution of the current task can be resumed without any action external to the processor: no non-microcode event handlers need to be executed. Thus, this embodiment is compatible with existing operating systems. Alternative embodiments can be implemented to be less compatible. For example, an additional event may be incorporated into the processor, and an additional event handler may be added to the operating system to perform this transition.

At step 1132, the state of the buffer registers is committed to their corresponding FP/PD or integer registers. After completion of step 1132, the processor is free to continue execution.
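
The retirement-stage decisions for a packed data command (steps 1144 to 1154, ending at step 1132 on success) can be summarized in Python. This is a sketch under assumptions: the function name and argument names are hypothetical, the routing from step 1148 to step 1126 or step 1150 is inferred from the descriptions of steps 1126 and 1150, and the sketch stops at step 1154 without modelling the re-execution of the command.

```python
from typing import List


def packed_retirement_flow(em: int, ts: int,
                           pending_fp_error: bool,
                           in_packed_mode: bool) -> List[int]:
    """Return the step numbers visited when a packed data command retires."""
    path = [1144]
    if em == 1:
        return path + [1146]   # invalid opcode exception
    path.append(1148)
    if ts == 1:
        return path + [1126]   # device not available exception (inferred routing)
    path.append(1150)
    if pending_fp_error:
        return path + [1130]   # pending floating-point error event
    path.append(1152)
    if in_packed_mode:
        return path + [1132]   # commit buffer registers
    return path + [1154]       # transition to the packed data mode
```

The ordering shows why the handler reached at step 1126 always sees EM equal to 0 and TS equal to 1 for packed data commands: the EM check at step 1144 has already diverted every EM equal to 1 case to step 1146.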

Thus, a method has been described for executing packed data commands that is compatible with existing operating systems and promotes good programming techniques. Because the packed data state is aliased on the floating-point state, the packed data state is saved and restored by existing operating systems as if it were floating-point state. In addition, because the events generated by execution of packed data commands are serviced by existing operating system handlers, these event handlers need not be modified and no new event handlers need to be added. As a result, the processor is backward compatible, and upgrading does not require the cost of developing or modifying an operating system.

Variations of this embodiment, some of which have been described, may be fully or partially compatible with such operating systems and/or may support good programming practice. For example, an alternative implementation can move, modify and/or delete one or more steps of this sequence of operations. If some of the steps in Fig. 11A, 11B and/or 11C are removed, some of the hardware in Fig. 10 is not required. For example, if the TS indication is not used, the hardware for the TS indication is not required. Of course, the invention can be useful for any number of system architectures and is not limited to the architecture described here.

Fig. 12A, 12B and 12C illustrate the storage formats for floating-point data, packed data and integer data according to the embodiment described with reference to Fig. 10. Of course, alternative embodiments may use any number of different storage formats for floating-point data, packed data and integer data.

Fig. 12A illustrates the floating-point storage format according to the embodiment of the invention described with reference to Fig. 10. Fig. 12A shows the floating-point storage format 1200, which includes a sign field 1202 containing bit 85, an exponent field 1204 containing bits [84:68], a mantissa field 1206 containing bits [67:3], and a rounding field 1208 containing bits [2:0]. As described above, the same floating-point commands that save the floating-point state to memory on a task switch must also save any packed data state combined with the floating-point registers. In one embodiment, the processor does not save the rounding bits of the rounding field 1208. As a result, the packed data must be stored somewhere within the mantissa field 1206 of the floating-point storage format 1200.
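
The field boundaries of format 1200 can be checked with a short sketch that splits an 86-bit register image into the four fields named above. The helper names `bits` and `split_fp_format` are hypothetical and are used only for illustration.

```python
def bits(value: int, hi: int, lo: int) -> int:
    """Return bits [hi:lo] of value (both ends inclusive) as an integer."""
    width = hi - lo + 1
    return (value >> lo) & ((1 << width) - 1)


def split_fp_format(word: int) -> dict:
    """Split an 86-bit register image into the fields of format 1200."""
    return {
        "sign": bits(word, 85, 85),      # field 1202, 1 bit
        "exponent": bits(word, 84, 68),  # field 1204, 17 bits
        "mantissa": bits(word, 67, 3),   # field 1206, 65 bits
        "rounding": bits(word, 2, 0),    # field 1208, 3 bits
    }
```

The four widths add up to 1 + 17 + 65 + 3 = 86 bits, which matches the bit range [85:0] used throughout Fig. 12A to 12C.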

Fig. 12B illustrates the packed data storage format according to the embodiment of the invention described with reference to Fig. 10. Fig. 12B shows the packed data storage format 1210, which includes a sign/exponent field 1212 containing bits [85:68], a first reserved field 1214 containing bit [67], a packed data field 1216 containing bits [66:3], and a second reserved field 1218 containing bits [2:0]. As described above, ones are stored in the sign/exponent field 1212 when packed data is written to a register. Also as described above, the packed data field 1216 is combined with the mantissa field 1206 so that the existing floating-point commands save the packed data state. In one embodiment, zeros are written to the first and second reserved fields 1214 and 1218 when packed data is written to a register. While an embodiment has been described in which the packed data field 1216 of the packed data storage format 1210 begins at bit 3, within the mantissa field 1206 of the floating-point storage format 1200, alternative implementations can change this position.

Fig. 12C illustrates the integer data storage format according to the embodiment of the invention described with reference to Fig. 10. Fig. 12C shows the integer data storage format 1220, which includes a reserved field 1222 containing bits [85:32] and an integer data field 1224 containing bits [31:0]. While an embodiment has been described in which integer data is stored in 32 bits, alternative implementations may store integer data in other formats; for example, alternative implementations may support a 64-bit format. In one embodiment, each of the integer registers 1024 that is visible to software includes only 32 bits. As a result, the integer storage format 1220 is used only in the buffer registers 1020.

Fig. 13 illustrates a method, according to one embodiment of the invention, of performing step 1138 in Fig. 11B when the storage formats described with reference to Fig. 12A, 12B and 12C are implemented. The flow passes from step 1138 to step 1300.

At step 1300, it is determined whether the packed data command reads packed data from any of the FP/PD registers or from any of the buffer registers acting as FP/PD registers. If so, the flow passes to step 1302. Otherwise, the flow passes to step 1308.

As shown in step 1302, bits [66:3] are extracted from the combined buffer or FP/PD register, and the flow passes to step 1308. This step is necessary because the packed data is not stored starting at bit 0 but starting at bit 3, as shown in Fig. 12B. As a result, bits [2:0] must be discarded. In one embodiment, this step is performed by the data alignment device 1090 in Fig. 10. In this embodiment, the data reaches the execution units 1010 in the format shown in Fig. 12B, and the data alignment device 1090 extracts bits [66:3]. While Fig. 10 shows a single data alignment device, in one embodiment each functional unit in the execution units 1010 that operates on packed data includes a data alignment device to extract bits [66:3]. Because the data is aligned in the execution units 1010, the use of the packed data format is transparent to the rest of the processor. The data alignment device(s) can be implemented to access bits [66:3] using any method. For example, in one embodiment, the data alignment device(s) shift right by three bits all packed data read from the FP/PD registers or from the buffer registers that function as FP/PD registers. In an alternative embodiment, the retirement unit or the issue unit can be implemented to strip bits [2:0] and bits [85:67].

It is then determined whether the packed data command reads packed data from any of the integer registers or from any of the buffer registers functioning as integer registers. If so, the flow passes to step 1306. Otherwise, the flow passes to step 1308.

As shown in step 1306, bits [31:0] are extracted from the combined buffer or integer register, and the flow passes to step 1308. This step is necessary because the data is stored starting at bit 0. As described above, in one embodiment this step is performed by the data alignment device 1090 in Fig. 10. In this embodiment, the data is transferred from the retirement unit 1006, through the issue unit 1008, to the execution units 1010. If the data is read from the buffer registers 1020, it reaches the execution units 1010 in the format shown in Fig. 12C, and the data alignment device(s) extract bits [31:0]. However, if the data is read from the integer registers 1024 in an embodiment in which the integer registers 1024 are 32-bit registers, the data reaches the execution units 1010 in 32-bit format. In either case, the 32-bit data can be treated as either 32-bit half of a 64-bit packed data element. For example, a first move command could move 32 bits from an integer register into the upper 32 bits of a packed data element, while a second move command could move 32 bits from an integer register into the lower 32 bits of the packed data element.

As shown in step 1308, the operation requested by the command is performed, and the flow passes to step 1310.

At step 1310, it is determined whether the packed data command causes the processor to write to any of the FP/PD registers or to any of the buffer registers that function as FP/PD registers. If so, the flow passes to step 1312. Otherwise, the flow passes to step 1314.

If the packed data command causes the processor to write to any of the FP/PD registers or to the buffer registers that function as FP/PD registers, the data must be stored in the correct format. Thus, at step 1312 the packed data is stored in bits [66:3] of these FP/PD or buffer registers. In one embodiment, the data alignment device 1090 in Fig. 10 is used again. Again, there are a number of ways to perform this function. For example, the data alignment device(s) can be implemented to shift the data left by three bits, fill bits [2:0] with zeros, fill bit [67] with zero, and store ones in bits [85:68].

At step 1314, it is determined whether the packed data command causes the processor to write to any of the integer registers or to any of the buffer registers functioning as integer registers. If so, the flow passes to step 1316. Otherwise, the flow passes to step 1144.
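
The alignment performed at steps 1302 and 1312 can be sketched as a pair of shift-and-mask helpers. The function names `store_packed` and `load_packed` are hypothetical, and the sketch models only the shift-by-three variant named in the text, not the alternative that strips bits in the retirement or issue unit.

```python
MASK64 = (1 << 64) - 1
SIGN_EXP_ONES = ((1 << 18) - 1) << 68   # ones in bits [85:68], field 1212


def store_packed(value64: int) -> int:
    """Step 1312: place 64 bits of packed data in bits [66:3].

    Bits [2:0] and bit [67] are left zero; bits [85:68] are set to ones.
    """
    return SIGN_EXP_ONES | ((value64 & MASK64) << 3)


def load_packed(reg86: int) -> int:
    """Step 1302: shift right by three bits and keep the 64 data bits."""
    return (reg86 >> 3) & MASK64
```

A value written with `store_packed` and read back with `load_packed` is unchanged, which is the property that makes the storage format transparent to the rest of the processor.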

If the packed data command causes the processor to write to any of the integer registers or to the buffer registers functioning as integer registers, the packed data must be stored in the correct integer storage format. Thus, at step 1316 the data is stored in the integer registers as bits [31:0], or in the buffer registers as bits [63:0] or [31:0] (depending on the implementation). Since the packed data is 64 bits wide, either 32-bit half of it can be stored in these registers. For example, a first move command could move the upper 32 bits of a packed data element into an integer register, while a second move command could move the lower 32 bits of the packed data element into an integer register. In one embodiment, this step is again performed by the data alignment device 1090.
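
The two move commands mentioned above can be modeled as follows. This is only an arithmetic sketch of moving the two 32-bit halves; the function names `split_element` and `join_halves` are hypothetical and do not correspond to any named command in the description.

```python
MASK32 = 0xFFFFFFFF


def split_element(elem64: int) -> tuple:
    """Return (upper 32 bits, lower 32 bits) of a 64-bit packed element,
    as two values that each fit a 32-bit integer register."""
    return (elem64 >> 32) & MASK32, elem64 & MASK32


def join_halves(high32: int, low32: int) -> int:
    """Rebuild a 64-bit packed element from two 32-bit integer registers."""
    return ((high32 & MASK32) << 32) | (low32 & MASK32)
```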

Thus, the storage formats used by the different data types are correctly aligned in the registers of the processor. In one embodiment, the same storage formats are used in the buffer registers 1020 as in the FP/PD registers 1022 and the integer registers 1024. Of course, alternative implementations may use any number of different storage formats, and such alternative implementations remain within the scope of the invention. For example, one alternative implementation uses these storage formats in the set of buffer registers 1020 and uses a different storage format in the registers that are visible to software (e.g., the FP/PD registers 1022 and the integer registers 1024).

As described above, a transition between the floating-point and packed data modes can take time and is not good programming practice. To help programmers determine how many such transitions they perform, various performance monitoring techniques can be used. For example, in one embodiment a performance monitoring counter is used. The counter counts occurrences of various conditions in the processor. In one embodiment of the invention, one of these conditions is a transition between the floating-point and packed data modes. Thus, a programmer can learn how many transitions a program requires. For further information concerning such counters, see "Apparatus for Monitoring the Performance of a Processor", Ser. No. 07/883,845, Robert S. Dreyer et al.
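
What such a counter reports can be illustrated with a small software model. The sketch assumes, for illustration only, that a program is given as a list of command kinds (`"fp"`, `"pd"`, or anything else for commands that do not affect the mode); the function name `count_mode_transitions` is hypothetical.

```python
def count_mode_transitions(stream) -> int:
    """Count switches between floating-point and packed data commands.

    Commands of any other kind (for example integer commands) are
    assumed not to change the current mode.
    """
    transitions = 0
    mode = None
    for kind in stream:
        if kind not in ("fp", "pd"):
            continue
        if mode is not None and kind != mode:
            transitions += 1
        mode = kind
    return transitions
```

Grouping the commands of each kind together lowers the count, which is the programming practice the text recommends.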

Since known floating-point processors do not allow direct manipulation of the floating-point tags, the EMMS command can be emulated using floating-point commands.

Fig. 14 is a flow diagram illustrating a method of clearing the tags according to one embodiment of the invention. The sequence of operations begins at step 1402 by saving the floating-point environment to a predetermined location in memory. This is done using the FNSAVE or FSAVE command of the Intel processor architecture.

Once this is done, the tag and/or TOS portions of the predetermined memory location in which the environment was saved can be changed to the empty state at step 1404. This is accomplished using any number of existing commands, including MOV commands with immediate operands that set the tag and TOS portions of the predetermined memory location to the empty state. The environment is then reloaded at step 1406 from the modified predetermined memory location. Because the other parts of the environment (such as the control word, the status word, and so on) can be left unchanged, only the floating-point tags are modified, and the remainder of the environment is the same as it was at the save operation 1402. Note further that, in order to prevent any unforeseen interruption, this process may be performed using any known technique, including the use of commands that disable interrupts (for example, FNSTENV). In any case, once the environment has been reloaded using any known method, such as FRSTOR or FLDENV, the reloaded environment differs only in that the floating-point tags have been set to the empty state. Note further that step 1404 may optionally include an additional step that clears the part of the floating-point environment that contains the top-of-stack pointer, stored in the top-of-stack field 350.
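
Step 1404 can be sketched on a saved environment image. The sketch assumes the 28-byte, 32-bit protected-mode layout written by FNSTENV on Intel architecture processors (control word at byte offset 0, status word at offset 4, tag word at offset 8), the two-bit tag encoding in which 11b means empty, and the TOP field in status word bits [13:11]. The function name `clear_tags_in_saved_env` is hypothetical; a real emulation would perform the save and reload with the processor commands named above rather than in Python.

```python
import struct

STATUS_WORD_OFFSET = 4   # assumed 32-bit protected-mode environment layout
TAG_WORD_OFFSET = 8
ALL_TAGS_EMPTY = 0xFFFF  # eight two-bit tags, each 11b (empty)
TOS_MASK = 0x3800        # TOP field, status word bits [13:11]


def clear_tags_in_saved_env(env: bytearray, clear_tos: bool = False) -> bytearray:
    """Model of step 1404: mark every tag empty in a saved environment.

    Optionally also zero the top-of-stack field. All other bytes of the
    image, such as the control word, are left unchanged.
    """
    struct.pack_into("<H", env, TAG_WORD_OFFSET, ALL_TAGS_EMPTY)
    if clear_tos:
        (status,) = struct.unpack_from("<H", env, STATUS_WORD_OFFSET)
        struct.pack_into("<H", env, STATUS_WORD_OFFSET, status & ~TOS_MASK & 0xFFFF)
    return env
```

Reloading the modified image (step 1406) then yields an environment in which only the tags, and optionally the top-of-stack pointer, differ from the saved one.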

In yet another alternative embodiment, the floating-point registers are popped a sufficient number of times until all of the tag bits indicate the empty state. In any case, EMMS can be executed as a dedicated command or can be emulated, and either method is within the scope of this disclosure.

Fig. 15A shows a flow of execution, including packed data commands and floating-point commands, to illustrate the time interval during which separate physical register files that are combined can be updated. Fig. 15A shows a floating-point command 1500 followed by a set of packed data commands 1510. In addition, Fig. 15A shows that the floating-point command 1500 is executed at time T1, while execution of the set of packed data commands 1510 begins at time T2. Execution of the floating-point command 1500 causes the processor to write a value to a floating-point register. Interval 1520 marks the time between time T1 and time T2 during which this value must be combined. For example, in the embodiment described with reference to Fig. 6A-9, in which separate physical register files are used to execute floating-point and packed data commands, the floating-point state is not copied to the packed data register file until time T2 (assuming that no other value is written to the same floating-point register before time T2). In contrast, when a single physical register file is used (the embodiments described with reference to Fig. 10-11C), the floating-point value written to the register is combined at time T1.

Thus, the two extremes of the interval 1520 have been described. However, alternative implementations can combine the registers at any time during the interval 1520. For example, alternative embodiments that use separate physical register files to execute floating-point and packed data commands can be implemented so that data written to the floating-point physical register file is also written to the packed data physical register file at time T1. In one embodiment that writes the value to both physical register files at the same time (for example, at time T1), the part of the transition device that copies data from the floating-point registers to the packed data registers may be implemented in hardware (of course, alternative implementations may use other means). As another example, an alternative implementation that uses separate physical register files to execute floating-point and packed data commands can be implemented so that data written to the floating-point physical register file is written to the packed data physical register file when free processing time is available in the interval 1520 (but before time T2). Thus, these embodiments can reduce the transition time.
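
The two extremes of interval 1520 can be compared in a minimal sketch of an implementation with separate register files. The class name `SeparateRegisterFiles`, the register count of eight and the dirty-register bookkeeping are assumptions introduced for illustration; the description does not state how a lazy copy tracks which registers changed, and the sketch ignores any format conversion.

```python
class SeparateRegisterFiles:
    """Hypothetical model: a floating-point file and a packed data file
    that must hold the same (combined) values by time T2."""

    def __init__(self, eager: bool, num_regs: int = 8):
        self.fp = [0] * num_regs
        self.pd = [0] * num_regs
        self.eager = eager
        self.dirty = set()             # registers still to be copied
        self.copies_at_transition = 0

    def write_fp(self, index: int, value: int) -> None:
        """Time T1: a floating-point command writes a register."""
        self.fp[index] = value
        if self.eager:
            self.pd[index] = value     # combined immediately, at T1
        else:
            self.dirty.add(index)      # combined later, no later than T2

    def begin_packed_data(self) -> None:
        """Time T2: the first packed data command is about to execute."""
        for index in sorted(self.dirty):
            self.pd[index] = self.fp[index]
            self.copies_at_transition += 1
        self.dirty.clear()

    def read_pd(self, index: int) -> int:
        return self.pd[index]
```

Both policies return the same value to the first packed data command; they differ only in how much copying is left to do at time T2, which is the transition time the text seeks to reduce.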

Fig. 15B shows a flow of execution, including packed data commands and floating-point commands, to illustrate the time interval during which separate physical register files that are combined can be updated. Fig. 15B is similar to Fig. 15A, except that a packed data command 1530 is followed by a set of floating-point commands 1540. Fig. 15B shows that the packed data command 1530 is executed at time T1, while execution of the set of floating-point commands 1540 begins at time T2. Execution of the packed data command 1530 causes the processor to write a value to a packed data register. The interval between time T1 and time T2 is the time during which this value must be combined. The alternative implementations described with reference to Fig. 15A (a floating-point command followed by packed data commands) can also be applied with reference to Fig. 15B (a packed data command followed by floating-point commands).

Although the invention has been described with respect to several embodiments, those skilled in the art will recognize that the invention is not limited to the embodiments described. The method and apparatus of the present invention may be modified and changed, and the scope of patent protection is defined by the attached claims.

1. The processor contains a decoder, configured to decode commands from the at least one set of commands, the set of physical registers and the display device connected to the specified decoding device and a given set of physical registers, wherein the said display device configured to display the logical registers specified by the first instruction set, registers in a specified set of physical registers with a case similar to the appeal to the local set of physical registers with address, other than references to the stack in which the first set of commands defines operations on operands of data type different from the second set of commands.

2. The processor under item 1, characterized in that said display device includes a set of tags, each tag in the specified set of tags corresponds to a different one of the specified logical registers, each tag in the specified set of tags identifies whether the specified corresponding register in the empty state or a non-empty state, and the display device together with the previously mentioned set of tags form a device renaming.

3. The processor under item 2, characterized in that said display device is configured with the ability to change each tag in the specified set of tags in the specified non-empty state in the first time interval between the beginning of the execute commands from the specified first set of commands and start executing commands from the specified second set of commands, if the command is not executed in the second time interval after executing commands from the specified first set of commands before executing commands from the second set of commands.

4. The processor under item 2, characterized theorem tags specified in an empty state in response to receiving the specified processor single command.

5. The processor under item 2, characterized in that said display device is configured with the ability to change each tag in the specified set of tags in the specified non-empty state in response to receiving the specified processor at least one command to the specified second set of commands.

6. The processor under item 1, characterized in that the said set of physical registers includes a first physical register file and the set of buffer registers and a specified display device is configured to initially display the logical registers that are defined as specified by the first instruction set and the specified second set of commands in the registers in the specified set of buffer registers and to remove the abovementioned registers from the specified set of buffer registers in said first physical register file.

7. The processor under item 6, wherein the specified set of buffer registers is many reservations.

8. The processor according to claim 1, characterized in that said first set of commands causes said processor to perform scalar floating-point operations.

9. The processor under item 1, characterized in that the first laborioso by p. 1, wherein the specified second set of commands causes the specified processor to perform a scalar integer operations.

11. The processor according to claim 1, characterized in that said second set of commands causes said processor to perform packed integer operations.

12. The processor under item 1, characterized in that said display device is also configured to display a variety of logical registers that are defined by a third set of commands in the registers in a specified set of physical registers.

13. The processor under item 12, characterized in that said first set of commands causes the specified processor to perform scalar operations with floating-point specified by the second set of commands causes the specified processor to perform Packed integer operations, and the specified third set of commands causes the specified processor to perform a scalar integer operations.

14. The processor on p. 13, characterized in that the said set of physical registers includes a set of buffer registers, and a specified display device is configured to initially display the logical registers, predestinate registers, assigned to the specified first and second sets of commands from the specified set of buffer registers in the physical register file and remove the registers assigned to the specified third set of commands from the specified set of buffer registers in the second physical register file.

15. The processor under item 12, characterized in that said display device displays the logical registers specified by that third set of commands with an address other than the address to the stack.

16. The processor according to claim 1, characterized in that the logical registers defined by said first and second sets of commands are at least partially combined.

17. The processor containing the display device of the floating point and Packed data, connected to the first table, the display initially shows the first operand type second operand type in the set of buffer registers, the display device integer, connected to the second table display initially shows the third type of the operand in the specified set of buffer registers, characterized in that it contains the device exemptions, including the first and second physical files reg opportunity to withdraw the operands of the specified first and second operand types from the specified set of buffer registers in said first physical register file, to remove the operands specified in the third operand type of a specified set of buffer registers to the specified second physical register file and make the specified display device to modify the above first and second tables display accordingly.

18. The processor under item 17, characterized in that each register specified in the first physical register file corresponds to a different input specified in the first table display and each register in the specified second physical register file corresponds to a different input specified in the second table display.

19. The processor under item 17, wherein the specified display device of the floating point and Packed data additionally includes a control register storing the pointer to the top of the stack, and configured to process the first table specified display as stack when displaying operands specified first type of the operand.

20. The processor under item 19, wherein the specified display device of the floating point and Packed data is made with the ability to handle the specified first table display so that Kazanov the specified second type of the operand.

21. The processor under item 19, wherein the specified display device of the floating point and Packed data also connected to a memory having a set of inputs, each input at the specified set of inputs has a corresponding entry in the specified first table display, each input has saved the tag identifying information about the data displayed is specified by the corresponding entry in the specified first table display.

22. The processor on p. 21, characterized in that the device further comprises a transition connected to the specified display device of the floating point and Packed data, configured to make the specified display device change the pointer to the top of the stack is initialized and edit each tag to a non-empty state in response to an attempt by the specified processor to execute one instruction set of Packed data if the specified processor has executed one instruction set floating-point later than one of the specified instruction set of Packed data.

23. The processor under item 22, wherein the specified device display spalanie the specified processor instruction set.

24. The processor according to claim 23, characterized in that said set of commands includes a single command.

25. The processor on p. 22, characterized in that the display device does not change the specified pointer to the top of the stack when executing any of a specified set of commands Packed data.

26. The processor according to claim 17, characterized in that said first operand type is associated with floating-point commands, said second operand type is associated with packed data commands, and said third operand type is associated with scalar integer commands.

27. The processor according to claim 26, characterized in that the packed data commands cause said processor to perform packed integer operations.

28. The processor according to claim 26, characterized in that the packed data commands cause said processor to perform packed floating-point operations.

29. The method is executed in the processing unit, according to which decode the first and second sets of commands whose execution causes operations on operands of different data types to be performed on the contents of the same logical file regrow with the specified first set of commands and process the specified logical register file as a stack with the specified second set of commands, resolve conflicts between registers in the specified logical register file using the rename register, perform the specified first set of commands when processing the specified logical register file as a fixed register file, perform the specified second set of commands when processing the specified logical register file as a stack.

30. The method according to p. 29, characterized in that it additionally will change all the tags in the tag set in an empty state at any time between the beginning of the implementation of this phase of the specified first set of commands and the beginning of the implementation of this phase of the specified second set of commands and each of a specified set of tags identify whether different register in the specified logical register file is empty or non-empty.

31. The method according to p. 30, characterized in that at this stage of execution of the specified first set of commands executing commands Packed data and at this stage of executing the second set of commands perform scalar command.

32. The method according to p. 29, characterized in that it additionally will change all the tags in the tag set to non-empty state between the start of execution of the specified step d, with a specified set of tags corresponds to the specified logical register file and identifies whether the registers in the specified logical register file is empty or non-empty.

33. The method according to p. 32, wherein each tag in the specified set of tags corresponds to a different register in the specified logical register file and identifies whether the specified corresponding logical register is empty or non-empty.

34. The method according to p. 29, characterized in that it further modify the pointer to the top of the stack is initialized at some point between the beginning of the implementation of this phase of the specified first set of commands and completion of specified stages of a specified second set of commands with the specified pointer on top of stack point one register in the specified logical register file as the tops of the specified register file by treatment similar treatment to the stack.

35. The method according to p. 29, characterized in that at this stage of execution of the second command set advanced copy of the contents of the physical register mapped in each case in the specified logical register file.

36. SPO is the R.

37. The method according to claim 29, characterized in that packed integer operations are performed during execution of said first set of commands.

38. The method according to claim 29, characterized in that packed floating-point operations are performed during execution of said first set of commands.

39. The method according to claim 29, characterized in that scalar floating-point operations are performed during execution of said second set of commands.

40. The method is executed in the processing unit, including a processor, according to which receive the first command from a first program, determine whether the specified first one of the instruction set floating-point or one of the command set Packed data, and as specified instruction set floating-point, and the specified instruction set of Packed data defines operations to be performed on the contents of the same logical register file, wherein risky perform the mentioned first command if the first command is one of a specified set of commands Packed data, determine if the specified processor mode, the compressed data if the specified processor is not the R data and restarts the execution of the specified first team, transmit data generated in this phase risky perform the first command, the transmit data generated in this phase risky perform the first command.

41. The method according to p. 40, characterized in that at this stage of the translation of the specified processor in the specified mode Packed advanced data changing each of the set of tags to a non-empty state, and each tag in the specified set of tags corresponds to a different register in the specified logical register file, at this stage of determining whether the specified first team one of the specified instruction set floating-point or one of a specified set of commands Packed data, additionally determine whether the specified first team team transition from a specified set of commands Packed data, and, if determined, that said, the first command is a command to transition to the stage of implementation of this first command will change each of a specified set of tags to an empty state.

42. The method according to p. 40, characterized in that at this stage a risky perform the first team additionally perform the specified command with the specified instruction set floating-point the pointer to the top of the stack is controlled by the specified processor to identify a single case in the specified logical register file, which is currently at the top of the specified stack, execute the specified command Packed data with reference similar treatment to the stack, if the first command is one of a specified set of commands Packed data.

43. The method according to p. 42, characterized in that at this stage of the translation of the specified processor in the specified mode, the compressed data is additionally modify the pointer to the top of the stack is initialized.

44. The method according to p. 43, characterized in that at this stage risky execute the specified command Packed data, working with the specified address other than the address to the stack, not further modify the pointer to the top of the stack.

45. The method according to p. 43, characterized in that the phase transfer of the specified processor in the specified mode Packed advanced data changing each of the set of tags to a non-empty state, and each tag in the specified set of tags corresponds to a different register in the specified logical fallaway comma, or one of a specified set of commands Packed data additionally determine whether the specified first team team transition from a specified set of commands Packed data, and, if determined that the first command is a command to transition to the stage of implementation of this first command will change each of a specified set of tags to an empty state.

46. The method according to p. 40, characterized in that to determine whether a partial context switch if you have previously specified a partial context switch, interrupt the execution of the specified first program, execute the second program.

47. The method according to p. 40, characterized in that it further determine whether to emulate the execution of the specified instruction set floating-point, if you can emulate the execution of the specified instruction set floating-point, then stop execution of the specified first program, execute the second program if the first command is one of a specified set of commands floating point, otherwise, terminate the execution of the specified first program, performing a third program, if the first command is one of a specified set of commands floating-point number.

48. The method is performed the specified partial context switch, it interrupts the execution of the specified first program and perform the specified second program.

49. The method according to p. 48, characterized in that it further determines whether the specified first team one of the set of valid codes operation, if specified in the first command is missing from any of a specified set of valid codes operation, interrupt the execution of the specified first program and perform the specified third program.

50. The method according to claim 40, characterized in that the step of executing said first command further comprises writing a predetermined value into the sign and exponent fields of the physical register mapped to said logical register, and writing said data value into the mantissa field of said logical register, if said first command is one of said set of packed data commands and execution of said packed data command causes said processor to write a data value to a logical register in said logical register file.

51. The method according to claim 40, characterized in that the step of executing said first command further comprises executing said first command without recognizing any numerical errors if said first command is one of said set of packed data commands, and it is further determined whether there are any pending numerical errors from the previous execution of one of said set of floating-point commands, and execution of said first program is interrupted and a second subprogram is executed if there are any such pending numerical errors.
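
The behaviour recited in claims 45, 50 and 51 can be illustrated with a minimal Python sketch. It is not the claimed implementation. The register count, the all-ones fill used for the "predetermined value", the 16-bit and 64-bit field widths, and every name in the code (RegisterFile, write_packed, transition, PendingNumericError) are assumptions introduced only for illustration.

```python
from dataclasses import dataclass, field

NUM_REGS = 8                    # assumption: eight logical registers
SIGN_EXP_FILL = 0xFFFF          # assumption: "predetermined value" modeled as all ones
MANTISSA_MASK = (1 << 64) - 1   # assumption: 64-bit mantissa field

EMPTY, NON_EMPTY = "empty", "non-empty"


class PendingNumericError(Exception):
    """Hypothetical stand-in for interrupting the first program (claim 51)."""


@dataclass
class RegisterFile:
    sign_exp: list = field(default_factory=lambda: [0] * NUM_REGS)
    mantissa: list = field(default_factory=lambda: [0] * NUM_REGS)
    tags: list = field(default_factory=lambda: [EMPTY] * NUM_REGS)
    pending_fp_error: bool = False

    def write_packed(self, reg: int, value: int) -> None:
        """Model a packed data command that writes one logical register."""
        # Claim 51: a pending numerical error left by an earlier
        # floating-point command is handled before the command runs.
        if self.pending_fp_error:
            raise PendingNumericError("pending floating-point error")
        # Claim 45: entering packed data mode marks every tag non-empty.
        self.tags = [NON_EMPTY] * NUM_REGS
        # Claim 50: fixed value in sign/exponent, data in the mantissa.
        self.sign_exp[reg] = SIGN_EXP_FILL
        self.mantissa[reg] = value & MANTISSA_MASK

    def transition(self) -> None:
        """Model the transition command: every tag becomes empty (claim 45)."""
        self.tags = [EMPTY] * NUM_REGS


rf = RegisterFile()
rf.write_packed(3, 0x0001000200030004)
print(hex(rf.sign_exp[3]), hex(rf.mantissa[3]), rf.tags[0])
rf.transition()
print(rf.tags[0])
```

Filling the sign and exponent fields with a fixed pattern means that a later floating-point command reading the same register sees a recognizable, non-ordinary value rather than an arbitrary number. The all-ones pattern used here is only one possible choice of "predetermined value".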

 

Similar patents:

FIELD: engineering of data processing systems, which realize operations of type "one command stream and multiple data streams".

SUBSTANCE: a system is disclosed with a command (ADD8TO16) that unpacks non-adjacent parts of a data word using sign or zero extension and combines them by means of a "one command stream, multiple data streams" arithmetic operation, such as addition, performed in response to one and the same command. The command is especially useful in systems having a data path that contains a shifting circuit before the arithmetic circuit.

EFFECT: more efficient use of existing processing resources in a data processing system.

3 cl, 5 dwg

FIELD: computing devices with configurable number length for long numbers.

SUBSTANCE: the device consists of two computing device units, each divided into at least four subunits, which consist of a number of unit cells. Said units are spatially located so that the distance between a unit cell of the first unit and the corresponding unit cell of the second unit is minimal. The configuration of the computing device can be changed using configuration switches installed between the device subunits.

EFFECT: increased performance of computing device, reduced time of data processing.

12 cl, 6 dwg

FIELD: network communications, in particular, control means built into applications for conduction of network exchange.

SUBSTANCE: expandable communication control means is used for maintaining communication between computing device and remote communication device. In a computer program adapted for using expandable communication control means, information about contacting side is found, and on basis of found contact information it is determined which types of transactions may be used for communication with contacting side at remote communication device. As soon as communication setup function is determined using contacting side information, communication setup request, associated with such a function, is dispatched to communication address. After receipt, expandable communication control means begins conduction of communication with remote communication device.

EFFECT: creation of more flexible and adaptable software communication control means (program components) for processing communications (connections, exchange) between devices.

3 cl, 11 dwg

FIELD: engineering of microprocessors and computer systems.

SUBSTANCE: in accordance with a shuffling instruction, a first operand containing a set of L data elements and a second operand containing a set of L shuffling masks are received, where each shuffling mask includes a "reset to zero" field and a selection field. For each shuffling mask, if the "reset to zero" field is not set, the data indicated by the selection field are moved from the data element of the first operand into the associated data element of the result; if the "reset to zero" field is set, zero is placed in the associated data element of the result.

EFFECT: improved characteristics of processor and increased productivity thereof.

8 cl, 43 dwg

FIELD: physics.

SUBSTANCE: invention pertains to the means of providing for computer architecture. Description is given of the method, system and the computer program for computing the data authentication code. The data are stored in the memory of the computing medium. The memory unit required for computing the authentication code is given through commands. During the computing operation the processor defines one of the encoding methods, which is subject to implementation during computation of the authentication code.

EFFECT: wider functional capabilities of the computing system with provision for new extra commands or instructions with possibility of emulating other architectures.

10 cl, 15 dwg

FIELD: physics; computer technology.

SUBSTANCE: the present invention pertains to digital signal processors with configurable multiplier-accumulation units and arithmetic-logical units. The device has a first multiplier-accumulation unit for receiving and multiplying the first and second operands, storing the obtained result in the first intermediate register and adding it to the third operand, and a second multiplier-accumulation unit for receiving and multiplying the fourth and fifth operands, storing the obtained result in the second intermediate register and adding the sixth operand either to the stored second intermediate result or to the sum of the stored first and second intermediate results. The multiplier-accumulation units respond to processor instructions for dynamic reconfiguration between the first configuration, in which the first and second multiplier-accumulation units operate independently, and the second configuration, in which the first and second multiplier-accumulation units are connected and operate together.

EFFECT: faster operation of the device and flexible simultaneous carrying out of different types of operations.

21 cl, 9 dwg

FIELD: information technologies.

SUBSTANCE: a message digest generation command is fetched from memory. In response to fetching the command, the message digest generation operation to be executed is determined on the basis of a previously specified function code, which defines either a message digest calculation operation or a function request operation. If the operation to be executed is the message digest calculation operation, it is executed on an operand and includes a hash coding algorithm. If the operation to be executed is the function request operation, condition word bits that correspond to one or several function codes installed in the processor are stored in a parameter block.

EFFECT: expansion of computer field by addition of new commands or instructions.

14 cl, 18 dwg

FIELD: information technology.

SUBSTANCE: present invention relates to computer engineering and can be used in signal processing systems. The device contains an instruction buffer, memory control unit, second level cache memory, integral arithmetic-logic unit (ALU), floating point arithmetic unit and a system controller.

EFFECT: more functional capabilities of the device due to processing signals and images when working with floating point arithmetic.

4 cl, 4 dwg

FIELD: physics; computer engineering.

SUBSTANCE: the invention relates to processors with pipeline architecture. The method of correcting an incorrectly early decoded instruction comprises steps in which the early decoding error is detected and, in response to detection of said error, a procedure is called for correcting branching with a destination address for the incorrectly early decoded instruction. The early decoded instruction is evaluated as an instruction that corresponds to incorrectly predicted branching.

EFFECT: improved processor efficiency.

22 cl, 3 dwg, 1 tbl

FIELD: information technology.

SUBSTANCE: the method involves defining a granule equal to the smallest length instruction in the instruction set and defining the number of granules making up the longest length instruction in the instruction set, denoted MAX. The method also involves determining the end of an embedded data segment when a program is compiled or assembled into the instruction string, and inserting a padding of length MAX-1 into the instruction string at the end of the embedded data. Upon pre-decoding of the padded instruction string, a pre-decoder maintains synchronisation with the instructions in the padded instruction string even if embedded data are randomly encoded to resemble an existing instruction in the variable length instruction set.

EFFECT: ensuring reconstruction during repeated synchronisation owing to reduced errors of synchronising the mechanism for pre-decoding the instruction string.

20 cl, 11 dwg
