Method of data processing using a single instruction stream and multiple data streams

FIELD: data processing systems that implement single instruction multiple data (SIMD) operations.

SUBSTANCE: a system is disclosed with an instruction (ADD8TO16) that unpacks non-adjacent portions of a data word using sign or zero extension and combines them by means of a SIMD arithmetic operation, such as addition, performed in response to the same instruction. The instruction is especially useful in systems having a data path that contains a shift circuit upstream of the arithmetic circuit.

EFFECT: more efficient use of the existing processing resources of a data processing system.

3 cl, 5 dwg

 

The present invention relates to the field of data processing systems. More specifically, it relates to data processing systems in which it is desirable to provide operations of the single instruction multiple data (SIMD) type.

Single instruction multiple data (SIMD) operations are a known technique in which a data word manipulated by a single instruction actually represents multiple data values, and the specified manipulation is performed independently on each of those data values. Instructions of this type can improve the efficiency of a data processing system and are particularly useful for reducing code size and speeding up processing. The technique is typically, but not exclusively, used when manipulating data values that represent physical signals, for example in digital signal processing applications.

An important factor when extending the capabilities of a data processing system is the overhead, in terms of increased size, complexity, cost and power consumption, that is incurred to support the additional processing capability. Measures that add processing capability while keeping the associated overhead low are strongly preferred.

In one aspect, the present invention provides a device for processing data, comprising a shift circuit, an arithmetic circuit and an instruction decoder responsive to an instruction to control the shift circuit and the arithmetic circuit to perform an operation on a data word Rn and a data word Rm, the operation yielding a value given by: selecting a plurality of non-adjacent multi-bit portions of the data word Rm to form a plurality of multi-bit portions, each of bit length A; optionally shifting the plurality of multi-bit portions by a common shift amount to shifted bit positions; converting each of the multi-bit portions from bit length A to a bit length B to form a plurality of converted multi-bit portions, such that the converted multi-bit portions can be abutted to form a converted data word; and performing a plurality of independent arithmetic operations, using as input operands the respective portions of bit length B at corresponding bit positions of both the converted data word and the data word Rn, to generate a result data word Rd.

The invention provides a new data processing instruction that can be used to unpack data values held within a data word and to perform a SIMD-type arithmetic operation on the unpacked values. The invention recognizes that unpacking non-adjacent data values from a data word can be implemented with significantly less additional overhead than conventional unpacking instructions designed to extract adjacent data values. In particular, it avoids the need for additional data routing paths that would separate the bit positions of data values that were previously adjacent. Instead, existing masking and shifting circuits can be used, for example. In addition, the simpler unpacking function allows a single instruction to also perform an arithmetic operation on the unpacked operands without introducing cycle-time constraint problems.

Although in its general form the invention can be applied to non-adjacent multi-bit portions of arbitrary length relative to their converted length, particularly efficient and convenient embodiments are those in which the selected multi-bit portions are half the length of the converted multi-bit portions, and the converted multi-bit portions abut within the converted data word so that it has the same length as the input data words of the operation.

It should be understood that the length conversion of the selected multi-bit portions can be performed in different ways. Two particularly useful ways are sign extension and zero extension.
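The two conversions can be illustrated with a minimal sketch. The function names `zero_extend` and `sign_extend` and their parameters are hypothetical and chosen only for illustration; the bit lengths A and B are passed as `a_bits` and `b_bits`.

```python
def zero_extend(value: int, a_bits: int) -> int:
    """Zero extension: keep the low a_bits; every higher bit is zero."""
    return value & ((1 << a_bits) - 1)


def sign_extend(value: int, a_bits: int, b_bits: int) -> int:
    """Sign extension of an a_bits-wide field to b_bits.

    The result is returned as an unsigned b_bits-wide bit pattern.
    """
    value &= (1 << a_bits) - 1
    sign_bit = 1 << (a_bits - 1)
    # A set sign bit means the field encodes a negative number.
    signed_value = value - (1 << a_bits) if value & sign_bit else value
    # Wrap the (possibly negative) number into b_bits.
    return signed_value & ((1 << b_bits) - 1)
```

With A=8 and B=16, the byte 0x80 becomes 0x0080 under zero extension and 0xFF80 under sign extension, while a byte with a clear top bit, such as 0x7F, is unchanged by either.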

In addition, the arithmetic operation combined with the unpacking may take many different forms. However, particularly preferred embodiments are those in which the arithmetic operation is an addition performed independently on each of the converted multi-bit portions. This instruction is especially useful in many practical data processing situations, for example when calculating a sum of absolute differences as part of motion compensation computations under the MPEG moving-image compression standard.

As mentioned above, the invention allows the processing resources of a data processing system to be used more efficiently. This is particularly the case in systems in which a shift circuit is provided upstream of the arithmetic circuit in the data path. This configuration allows the unpacking, together with any optional shifting, to be performed before the arithmetic operation.

In preferred embodiments, in order to provide the required functionality without imposing additional constraints on cycle time, a conversion circuit that converts the length of the selected multi-bit portions (for example by sign extension or zero extension) is connected in parallel with part of the shift circuit, and the range of shift amounts that can be specified is restricted so that the first part of the shift circuit can be used in combination with the conversion circuit. The required operation can then be performed without increasing the time the data values take to pass beyond that already allowed for passing through the complete shift circuit in other operations.

In another aspect, the invention provides a method of processing data comprising the steps of decoding and executing an instruction operating on a data word Rn and a data word Rm, the instruction yielding a result given by: selecting a plurality of non-adjacent multi-bit portions of the data word Rm to form a plurality of multi-bit portions, each of bit length A; optionally shifting the plurality of multi-bit portions by a common shift amount to shifted bit positions; converting each of the multi-bit portions from bit length A to a bit length B to form a plurality of converted multi-bit portions, such that the converted multi-bit portions can be abutted to form a converted data word; and performing a plurality of independent arithmetic operations, using as input operands the respective portions of bit length B at corresponding bit positions of both the converted data word and the data word Rn, to generate a result data word Rd.

The invention also provides a computer program product storing a computer program for controlling a general-purpose computer in accordance with the above method, the program comprising instructions of the form described above.

Embodiments of the present invention are described below, by way of example only, with reference to the drawings, in which:

Figure 1 is a schematic representation of the operation of a first SIMD (single instruction multiple data) instruction;

Figure 2 is a schematic representation of a data path of a data processing device that can be used to execute the instruction of Figure 1;

Figures 3 and 4 are schematic representations of two variants of a further SIMD instruction; and

Figure 5 is a schematic representation of a data path of a data processing system that can be used to execute the instructions of Figures 3 and 4.

Figure 1 shows the operation of a first SIMD instruction, designated ADD8TO16. The instruction is provided in both a signed and an unsigned variant, according to the kind of extension applied to the selected portions of the input operand data word when they are extended in length as part of the processing. The first input operand data word is held in register Rm of the data processing device. This data word consists of four 8-bit portions P0, P1, P2 and P3. Depending on whether an optional rotate right by 8 bits is specified in the instruction, either the multi-bit portions P0 and P2 or, alternatively, P1 and P3 are selected from the input data word in register Rm. An optional rotate right by 16 or 24 may also be provided if desired; this effectively swaps the positions of the high-order and low-order portions. The example in Figure 1 illustrates the selection of the non-adjacent portions P0 and P2 as the variant without rotation (shifting), with the other possibility shown in dashed lines.

Once the multi-bit portions have been selected, each is converted in length from 8 bits to 16 bits by zero extension or sign extension. The shaded areas of the converted data word in the figure show these extension parts.

The second input data word is held in register Rn and contains two 16-bit values. In the example shown, the operation performed is a SIMD addition: the extended value P0 is added to the lower 16-bit portion A0 of Rn, while the extended value P2 is added to the upper 16-bit portion A2 of Rn. This kind of addition can be regarded as a full-width addition with the carry chain broken between bits 15 and 16 of the result. Other kinds of SIMD arithmetic operation, such as a SIMD subtraction, are also possible.

The output data word produced by the instruction of Figure 1 holds the sum of P0 and A0 in its lower 16 bits, while its upper 16 bits hold the sum of P2 and A2. This instruction is especially useful in operations that compute the sum of absolute differences between corresponding data values, where A0 and A2 represent the accumulated sums and P0 and P2 represent individual absolute differences of signal values, such as pixel differences. This kind of operation is commonly required in motion estimation processing under the MPEG standard, and the ability to perform it at high speed is highly desirable.
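The behaviour described for Figure 1 can be modelled in a few lines. This is only a software sketch of the result value on 32-bit words, not a description of the hardware; the function name `add8to16` and the parameters `rotate` and `signed` are assumptions introduced for illustration.

```python
def add8to16(rn: int, rm: int, rotate: int = 0, signed: bool = False) -> int:
    """Model of the ADD8TO16 result on 32-bit words (illustrative only).

    rotate: optional rotate right applied to Rm (0, 8, 16 or 24).
    signed: choose sign extension (True) or zero extension (False).
    """
    if rotate not in (0, 8, 16, 24):
        raise ValueError("rotate must be 0, 8, 16 or 24")
    rn &= 0xFFFFFFFF
    rm &= 0xFFFFFFFF
    # Optional rotate right: selects P0/P2 (rotate 0) or P1/P3 (rotate 8).
    rotated = ((rm >> rotate) | (rm << (32 - rotate))) & 0xFFFFFFFF
    low8 = rotated & 0xFF            # non-adjacent 8-bit portions
    high8 = (rotated >> 16) & 0xFF

    def extend(byte: int) -> int:
        # Convert 8 bits to 16 bits by sign or zero extension.
        return byte | 0xFF00 if signed and byte & 0x80 else byte

    # Two independent 16-bit additions; no carry crosses bit 15 -> 16.
    low_sum = ((rn & 0xFFFF) + extend(low8)) & 0xFFFF
    high_sum = ((rn >> 16) + extend(high8)) & 0xFFFF
    return (high_sum << 16) | low_sum
```

For example, with Rm = 0x44332211 and Rn = 0x00100020 the unrotated form adds P0 = 0x11 and P2 = 0x33 to give 0x00430031, while a rotation by 8 adds P1 = 0x22 and P3 = 0x44 to give 0x00540042. In a sum-of-absolute-differences loop, Rn would hold the two running 16-bit sums and Rm the packed 8-bit differences.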

Figure 2 shows an example of a data path 2 of a data processing system that can be used to implement the instruction of Figure 1. A register bank 4 stores the 32-bit data words to be processed. Both input operand data words Rm and Rn are read from the register bank 4, and the result data word is written to register Rd in the register bank 4. The data path 2 includes a shift circuit 6 and an adder circuit 8. Many of the other data processing instructions supported by the system use the shift circuit 6 and the adder circuit 8 in a variety of ways. The data path 2 is carefully designed so that the time a data value takes to pass through the shift circuit 6 and the adder circuit 8 is well matched to the data processing cycle time. The hardware resources of the data path 2 are used efficiently in systems where those resources are active for a large proportion of the time each data word takes to pass through the data path 2. A sign/zero extension and masking circuit 10 is provided in parallel with the lower part of the shift circuit 6. A multiplexer 12 selects either the output of the complete shift circuit 6 or the output of the sign/zero extension and masking circuit 10 as one input of the adder circuit 8. The input operand data word Rn is supplied to the other input of the adder circuit 8.

When the instruction of Figure 1 is executed, the input operand data word Rm is supplied to the shift circuit 6, where an optional right shift by 8 bit positions is applied, depending on whether this parameter is specified in the instruction. Optional rotations right by 16 and 24 bit positions can also be used. In a shift circuit based on multi-level multiplexers, this restricted shift capability can be provided relatively simply by the first part of the shift circuit 6 (for example, in a 32-bit system the first multiplexer level can provide the 16-bit shift and the second multiplexer level the 8-bit shift). Accordingly, the value, optionally shifted by the specified amount, can be tapped off from the shift circuit 6 and supplied to the sign/zero extension and masking circuit 10. This circuit 10 masks out the unselected multi-bit portions of the (possibly shifted) input operand data word Rm and replaces the masked portions with the zero or sign extension of the corresponding selected multi-bit portions. The output of the sign/zero extension and masking circuit 10 passes through the multiplexer 12 to the first input of the adder circuit 8. The input operand data word Rn is supplied to the second input of the adder circuit 8. The adder circuit 8 performs a SIMD addition on its inputs (i.e., two parallel 16-bit additions, with the carry chain effectively broken between bit positions 15 and 16). The output of the adder circuit 8 is written back to register Rd in the register bank 4.
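The data flow just described (partial shift, then masking with extension, then an adder whose carry chain is broken between bits 15 and 16) can be mimicked in software as a rough sketch. The function names are hypothetical, and the carry break is modelled here with a standard bit-masking trick on a full-width addition, which is an assumption about one way to obtain the same result rather than a statement of how the adder circuit 8 is built.

```python
def early_shift_stages(rm: int, rotate: int) -> int:
    """First part of the shifter: rotate right by 0, 8, 16 or 24 only."""
    rm &= 0xFFFFFFFF
    return ((rm >> rotate) | (rm << (32 - rotate))) & 0xFFFFFFFF


def extend_and_mask(word: int, signed: bool) -> int:
    """Mask out the unselected bytes and fill them with the extension."""
    selected = word & 0x00FF00FF          # keep bytes 0 and 2
    if not signed:
        return selected                   # masked bytes stay zero
    fill = 0
    for lane_shift in (0, 16):
        if (selected >> lane_shift) & 0x80:   # sign bit of this byte
            fill |= 0xFF00 << lane_shift      # replicate it upwards
    return selected | fill


def simd_add16(x: int, y: int) -> int:
    """Full-width add with the carry chain broken between bits 15 and 16."""
    top = 0x80008000                      # top bit of each 16-bit lane
    partial = (x & ~top) + (y & ~top)     # carries stop below each top bit
    return (partial ^ ((x ^ y) & top)) & 0xFFFFFFFF


def datapath_add8to16(rn: int, rm: int, rotate: int, signed: bool) -> int:
    operand = extend_and_mask(early_shift_stages(rm, rotate), signed)
    return simd_add16(operand, rn & 0xFFFFFFFF)
```

Because only the early shifter stages feed the extension logic, the sketch never needs a shift amount other than a multiple of 8, which mirrors the restricted shift range mentioned in the text.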

Alternatively, the sign/zero extension and masking circuit 10 can take the input word Rm (unrotated), separately apply the rotation by 0, 8, 16 or 24 to the four possible sign bits, and then create the mask. The shift circuit 6 would operate in parallel to shift all 32 bits of the word Rm.

Figures 3 and 4 illustrate two variants of a SIMD-type halfword packing instruction. The PKHTB instruction of Figure 3 takes the fixed upper halfword of one input operand data word, held in register Rn, and a halfword-length portion at a variable position in the second input operand data word, held in register Rm, and combines them as the upper half and the lower half, respectively, of the output data word to be stored in register Rd. The PKHBT instruction of Figure 4 takes the lower halfword of one input operand data word Rn and a halfword-length portion at a variable position in the second input operand data word Rm, and combines them as the lower and upper halves, respectively, of the output data word to be stored in register Rd. It can be seen that in either case the selected portion of the input operand data word Rn does not change position within the output data word Rd. This portion can therefore be provided by simple masking or selection, which requires very little additional hardware. The variable-position halfword of the instruction of Figure 3 is taken from bit positions 15 to 0 of the word Rm after that word has been shifted right by k bit positions. Similarly, the variable-position halfword of Rm for the instruction of Figure 4 is taken from bit positions 31 to 16 of the word Rm after that word has been shifted left by k bit positions.

The variable shift provided by the packing instructions of Figures 3 and 4 is particularly useful for adjusting the "Q" value of fixed-point numeric values, as may be needed when processing such values.

Figure 5 shows a data path 14 that is particularly well suited to executing the instructions of Figures 3 and 4. A register bank 16 again supplies the input operand data words, which in this example are 32-bit data words, and stores the output data word. The data path includes a shift circuit 18, an adder circuit 20 and a select-and-combine logic circuit 22.

In operation, the input operand data word Rn passes unshifted directly from the register bank 16 to the select-and-combine circuit 22. For the instruction of Figure 3, the upper 16 bits of Rn are selected and form the corresponding bits of the output data word Rd. For the instruction of Figure 4, the lower 16 bits of the input operand data word Rn are selected and passed through to form the low-order bits of the output data word Rd. The input operand data word Rm passes through the complete shift circuit 18. For the instruction of Figure 3, an arithmetic right shift by k bit positions is applied, and the lower 16 bits of the output of the shift circuit 18 are then selected by the select-and-combine circuit 22 to form the 16 low-order bits of the output data word Rd. For the instruction of Figure 4, the shift circuit 18 performs a logical left shift by k bit positions and supplies the result to the select-and-combine circuit 22, which selects the upper 16 bits of the shift circuit output and uses them to form the upper 16 bits of the output data word Rd.
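The two packing operations of Figures 3 and 4 can be sketched as follows. The function names `pack_top_bottom` and `pack_bottom_top` and the helper `asr32` are hypothetical names introduced for this illustration; the sketch models only the result value on 32-bit words.

```python
MASK32 = 0xFFFFFFFF


def asr32(value: int, k: int) -> int:
    """Arithmetic shift right of a 32-bit word by k positions."""
    value &= MASK32
    if value & 0x80000000:        # reinterpret as a negative number
        value -= 1 << 32
    return (value >> k) & MASK32


def pack_top_bottom(rn: int, rm: int, k: int) -> int:
    """Figure 3: upper half from Rn, lower half from (Rm ASR k)[15:0]."""
    return (rn & 0xFFFF0000) | (asr32(rm, k) & 0x0000FFFF)


def pack_bottom_top(rn: int, rm: int, k: int) -> int:
    """Figure 4: lower half from Rn, upper half from (Rm LSL k)[31:16]."""
    shifted = (rm << k) & MASK32
    return (shifted & 0xFFFF0000) | (rn & 0x0000FFFF)
```

For instance, `pack_top_bottom(0xAAAA1111, 0x12345678, 8)` gives 0xAAAA3456 and `pack_bottom_top(0xAAAA1111, 0x12345678, 8)` gives 0x34561111. Choosing k amounts to choosing which 16 bits of a wider fixed-point value are kept, which is how the shift can rescale the "Q" format while packing.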

It can be seen that the select-and-combine circuit 22 is connected in parallel with the adder circuit 20. Accordingly, given that the data path 14 is carefully designed so that a full shift and an addition can be performed within one cycle, the relatively simple select-and-combine operation can be performed within the time normally allowed for the operation of the adder circuit 20 without imposing any cycle-time constraint.

It should be borne in mind that the instructions explained above, and as set out in the claims, are defined in terms of the result value they yield. The same result value can be obtained by many different processing steps and orderings of steps. The invention encompasses all such variants that produce the same result value using a single instruction.

1. A device for processing data, comprising a shift circuit, an arithmetic circuit and an instruction decoder responsive to an instruction to control the shift circuit and the arithmetic circuit to perform an operation using a first input data word Rm and a second, different input data word Rn, the operation yielding a result value given by: selecting a plurality of non-adjacent multi-bit portions of the first input data word Rm to form a plurality of multi-bit portions each of bit length A; optionally shifting said plurality of non-adjacent multi-bit portions by a common shift amount to shifted bit positions, depending on whether a shift parameter is specified in the instruction; converting each of said non-adjacent multi-bit portions from said bit length A to a bit length B to form a plurality of converted multi-bit portions, such that the converted multi-bit portions can be abutted to form a converted data word; and performing a plurality of independent arithmetic operations, using as input operands the respective portions of bit length B at corresponding bit positions of said converted data word and said second, different input data word Rn, to generate a result data word Rd.

2. The device according to claim 1, characterized in that B=2·A.

3. The device according to claim 1 or 2, characterized in that said plurality of multi-bit portions is shifted to the shifted bit positions so that the lowest multi-bit portion occupies bit positions starting from bit position zero.

4. The device according to claim 1, characterized in that converting the multi-bit portions from bit length A to bit length B comprises one of the following: sign extension of the multi-bit portions to bit length B, and zero extension of the multi-bit portions to bit length B.

5. The device according to claim 1, characterized in that said plurality of independent arithmetic operations are independent addition operations.

6. The device according to claim 1, characterized in that said first input data word and said second, different input data word each have a bit length C, where C=N·B and N is an integer greater than 1.

7. The device according to claim 2 or 6, characterized in that N=2.

8. The device according to claim 1, characterized in that B=16 and A=8.

9. The device according to claim 1, characterized in that said common shift amount is equal to B-A.

10. The device according to claim 1, characterized in that said instruction is an instruction of the single instruction multiple data (SIMD) type.

11. The device according to claim 1, characterized in that the instruction combines unpacking of data values with an arithmetic operation.

12. The device according to claim 1, characterized in that said shift circuit is connected upstream of the arithmetic circuit in the data path of said device.

13. The device according to claim 1, characterized in that a conversion circuit that converts the multi-bit portions from bit length A to bit length B is connected in parallel with the shift circuit, and the shift circuit provides, for data values passing through it when executing said instruction, a restricted range of common shift amounts compared with the range of shift amounts provided by said shift circuit when executing other instructions.

14. A method of processing data comprising the steps of decoding and executing an instruction operating on a first input data word Rm and a second, different input data word Rn, the instruction yielding a result value given by: selecting a plurality of non-adjacent multi-bit portions of the first input data word Rm to form a plurality of multi-bit portions each of bit length A; optionally shifting said plurality of non-adjacent multi-bit portions by a common shift amount to shifted bit positions, depending on whether a shift parameter is specified in said instruction; converting each of said non-adjacent multi-bit portions from said bit length A to a bit length B to form a plurality of converted multi-bit portions, such that said converted multi-bit portions can be abutted to form a converted data word; and performing a plurality of independent arithmetic operations, using as input operands the respective portions of bit length B at corresponding bit positions of said converted data word and said second, different input data word Rn, to generate a result data word Rd.

15. A computer program product containing a computer program for controlling a computer to perform the method according to claim 14.



 

Same patents:

FIELD: engineering of microprocessors and computing systems, in particular, engineering of devices for parallel conjunction of data with shift to the right.

SUBSTANCE: method includes in parallel with shift to left for 'L - M' data elements of first operand having first set of L data elements, second operand is shifted having second set of L data elements, to the right for M data elements, and aforementioned shifted first set is combined with aforementioned shifted second set for producing a result having L data elements.

EFFECT: efficient support of SIMD operations without substantial decrease of efficiency as a whole.

6 cl, 39 dwg

The invention relates to data processing systems

FIELD: computing devices with configurable number length for long numbers.

SUBSTANCE: device consists of two computing device units, each of them divided into at least four subunits, which consist of a quantity of unit cells. Named units are spatially located so that the distance between unit cell of first unit and equal unit cell in the second unit is minimal. Computing device configuration can be changed using configurational switches, which are installed between device subunits.

EFFECT: increased performance of computing device, reduced time of data processing.

12 cl, 6 dwg

FIELD: physics, computer engineering.

SUBSTANCE: group of inventions relates to computer engineering and can be used in arithmetic processors. A processor receives at least one floating-point operand and performs a floating-point operation using at least one floating-point operand to provide a floating-point result. The method includes determining if a preferred quantum is stored in the floating-point result, said quantum indicating a value which is presented as the least significant digit of the significant of the floating-point result. An indication of the occurrence of a quantum exclusion is provided at the output in response to the determination that the preferred quantum is not stored.

EFFECT: high accuracy.

18 cl, 3 dwg

FIELD: physics, computer engineering.

SUBSTANCE: group of inventions relates to computer engineering and can be used to convert data. The method includes steps of obtaining, by a processor, a machine instruction for execution, wherein the machine instruction is defined for execution by the processor according to computer architecture and includes at least one operation code field which provides an operation code, the operation code identifying a conversion function from a zoned from decimal floating-point; a first register field defining a first operand cell; a second register field and a displacement field, wherein contents of a second register defined by the second register field are combined with contents of the displacement field to form an address of a second operand; and a sign directive used to indicate whether the second operand has a sign field; and executing the machine instruction, which includes converting the second operand in a zoned format to a decimal floating-point format; and placing the conversion result in the first operand cell.

EFFECT: high efficiency.

20 cl, 18 dwg, 6 tbl

FIELD: computer engineering.

SUBSTANCE: processor contains first logical means for preserving set of bit groups into non-adjacent groups of storage cells and second logical means for storing a copy of a set of non-adjacent bit groups. In accordance to method, set of bit groups is saved to set of non-adjacent storage cells and set of non-adjacent bit groups is copied into remaining groups of bit storage cells. System contains memory and processor for storing first bit group in first and second storage cell groups and for storing second bit group in third and fourth storage cells. Device contains execution module for storing bits [31-0] in positions [31-0] and [62-32], bits [95-64] in positions [95-64] and [127-96] of destination register bits.

EFFECT: possible use of single command for moving/loading, which provides for loading and following copying of series of bits of operand of source to register of destination.

4 cl, 5 dwg

FIELD: computer engineering, in particular, devices for priority servicing of requests.

SUBSTANCE: device contains request register, AND element, two OR elements, modulus two addition element, clock impulses generator, control trigger, counter, decoder, switches, additionally incorporated AND element, additionally incorporated OR elements, number of which is equal to capacity of request register.

EFFECT: increased speed of operation of device due to fast processing of single requests.

1 dwg
