RussianPatents.com
Device to predict exceptional situation "accuracy loss" of "multiplication with accumulation" operation unit. RU patent 2498392.
FIELD: information technology. SUBSTANCE: the device comprises a sub-block for predicting the sum of the mantissas, a leading-zero counter for the sum of the mantissas, mantissa input registers, exponent input registers, a trailing-zero counter for the mantissa of the addend, a sub-block that calculates the alignment shift and predicts the preliminary normalization shift, an early accuracy-loss comparator, a trailing-zero counter for the sum of the mantissas, and a late accuracy-loss comparator. EFFECT: faster execution of a stream of independent "multiplication with accumulation" instructions when the "accuracy loss" exception is enabled. 5 dwg, 1 tbl
The invention relates to computer engineering, namely to computing systems based on microprocessors with coprocessors for real and complex arithmetic. The structural organization of a microprocessor's computing units varies with the way exceptional situations in operations on real or complex numbers are handled. Systems that support the precise exceptions of the IEEE 754 standard need a mechanism that undoes the results of all operations issued after the operation that caused the exception. Since most real-arithmetic instructions take longer than integer instructions, when integer and real instructions execute simultaneously an integer instruction may complete before the real one, and undoing its result is very difficult. Therefore, for correct operation with exceptional situations such as "overflow", "underflow" and "loss of accuracy" enabled, the microprocessor uses a mode in which no other instruction may execute in parallel with the current arithmetic operation. This mode significantly reduces the speed of real calculations.

To mitigate this loss of performance, an exception prediction device is used. It is built into the arithmetic execution sub-block and calculates an early indication of a possible exception. The microprocessor uses the predicted indication to decide whether to issue arithmetic instructions to the execution sub-blocks. The effectiveness of an exception prediction device depends on the accuracy of the prediction and on the stage of instruction execution at which the prediction can be made. The longer the arithmetic instruction, the greater the benefit, provided the prediction can be made at an early stage.
Because the computing sections of modern microprocessors are built primarily from hardware "multiplication with accumulation" modules, equipping these modules with an exception prediction device yields the greatest gain in performance. Predicting the "loss of accuracy" exception is more complicated than predicting "overflow" and "underflow", because it is difficult to predict whether significant bits of the intermediate result will be dropped in rounding. Usually the indication that bits were lost in rounding is calculated after the final normalization of the intermediate result.

A real-arithmetic execution sub-block is known that performs multiplication, addition, multiplication with accumulation, division and square root and has a device for predicting the overflow and underflow exceptions (US patent 6,631,392, cl. G06F 7/38, publ. 7.10.2003). Its disadvantage is that the prediction device serves several real-arithmetic sub-blocks (adder, multiplier, square root/division sub-block), which makes it difficult to determine which instruction a predicted indication belongs to.

The closest in technical essence and achieved result is a device that performs the real "multiplication with accumulation" operation and contains an accelerated circuit for calculating the accuracy-loss bit. This circuit consists of a predictor of the intermediate sum, a leading-zero counter for the predicted sum, an adder for the intermediate value, a normalization sub-circuit for the intermediate value, and OR sub-circuits placed after each stage of the normalization sub-circuit, which detect significant bits dropped in rounding (US patent 5,771,183, cl. G06F 7/00, publ. 23.07.1998).
The disadvantage of this prototype is a structural feature that slows the execution of a stream of independent "multiplication with accumulation" instructions when IEEE 754 exceptions are enabled: the accuracy-loss bit is calculated after the main summation (with carry propagation), simultaneously with the normalization of the sum. This organization speeds up the calculation of the accuracy-loss bit so that it completes together with normalization, but it does not allow the bit to be calculated early and therefore predicted. Because of this restriction, a sufficient-condition flag for the "loss of accuracy" exception cannot be formed in the early stages of the module's pipeline and signalled to the microprocessor core without waiting for the calculation to finish.

The technical result of the invention is faster execution of a stream of independent "multiplication with accumulation" instructions when the "loss of accuracy" exception is enabled.
This technical result is achieved in that the device for predicting the "loss of accuracy" exception of a "multiplication with accumulation" unit, which includes a sum prediction sub-block and a leading-zero counter for the sum, is, according to the invention, provided with mantissa input registers, exponent input registers for the numbers A, B and C, a trailing-zero counter for the mantissa of the addend, a sub-block that calculates the alignment shift and predicts the preliminary normalization shift, an early accuracy-loss comparator, a trailing-zero counter for the sum, and a late accuracy-loss comparator. The trailing-zero counter for the mantissa of the addend is connected to the mantissa input register of the number C, in order to count the trailing zeros of the mantissa of the addend C, and to the early accuracy-loss comparator. The comparator compares this count with the alignment shift coming from the sub-block that calculates the alignment shift and predicts the preliminary normalization shift, which is connected to the exponent input registers of the numbers A, B and C. The comparison indicates whether significant bits of the aligned mantissa of the addend fall outside the width of the internal data representation. The late accuracy-loss comparator is connected to the leading-zero counter for the sum and to the trailing-zero counter for the sum, both of which are connected to the sum prediction sub-block, and indicates whether significant bits of the result mantissa fall in the part discarded by rounding.

The invention is illustrated with a single-precision "multiplication with accumulation" unit, as the simplest case, and by drawings. Figure 1 shows the block diagram of the single-precision "multiplication with accumulation" computing unit without intermediate rounding, with the integrated accuracy-loss prediction circuit. Figure 2 shows a detailed block diagram of the sum prediction sub-block.
Figure 3 shows a detailed block diagram of the trailing-zero and leading-zero counters and the late accuracy-loss comparator. Figures 4 and 5 illustrate the operation of the early and late accuracy-loss prediction circuits. If the invention is applied to "multiplication with accumulation" units of other precisions, the comparison constants of the comparators and the widths of the zero counters will depend on the width of the input numbers.

The single-precision "multiplication with accumulation" computing unit without intermediate rounding, with the integrated accuracy-loss prediction circuit, includes: sign input register of the number A 1, sign input register of the number B 2, sign input register of the number C 3, mantissa input register of the number A 4, mantissa input register of the number B 5, mantissa input register of the number C 6, exponent input register of the number A 7, exponent input register of the number B 8, exponent input register of the number C 9, multiplication sub-block 10, addend mantissa alignment sub-block 11, trailing-zero counter for the mantissa of the addend 12, exponent calculation sub-block 13, sub-block 14 for calculating the alignment shift and predicting the preliminary normalization shift, early accuracy-loss comparator 15, 3:2 compressor 16, preliminary normalization sub-block 17, sum prediction sub-block 18, trailing-zero counter for the sum 19, leading-zero counter for the sum 20, late accuracy-loss comparator 21, main normalization sub-block 22, prediction error detector 23, exponent corrector sub-block 24, overflow/underflow detector 25, addition sign detector 26, sum multiplexer 27, normalization shift multiplexer 28, overall accuracy-loss multiplexer 29, exponent post-correction sub-block 30, overflow/underflow correction sub-block 31, rounding adder 32, operation sign former 33, post-normalization sub-block 34, sign output register 35, mantissa output register 36, accuracy-loss flag output register 37, exponent output register 38, exception vector output register 39.

The sum prediction sub-block 18 in turn consists of: bitwise exclusive-OR sub-block 40, bitwise OR sub-block 41, bitwise exclusive-OR sub-block 42, left-shift sub-block 43, mask sub-block 44 for bits [73:48], mask sub-block 45 for bits [48:0], and data bus merging element 46.

The set of the trailing-zero counter for the sum 19, the leading-zero counter for the sum 20 and the late accuracy-loss comparator 21 consists of: leading-zero counter 1 for bits [48:23] 47, leading-zero counter 2 for bits [22:0] 48, trailing-zero counter for bits [48:0] 49, leading-zero multiplexer 50, masking sub-block 51, comparison constant former 52, 3:2 compressor 53, late accuracy-loss comparator adder 54, correction multiplexer 55.

The device works as follows. The unit receives three operands encoded in IEEE 754 format; each number has an 8-bit exponent field, a 24-bit mantissa and a 1-bit sign. In the first stage the following takes place. In multiplication sub-block 10, the operands A and B, taken from mantissa input registers 4 and 5, are multiplied. The intermediate value of the result exponent is calculated in exponent calculation sub-block 13, and the alignment shift and the predicted preliminary normalization shift are calculated in sub-block 14, from the data coming from exponent input registers 7, 8 and 9. At the same time, trailing-zero counter 12 counts the trailing zeros of the mantissa of the addend C, using the data from mantissa input register 6.
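The operand format described above (8-bit exponent, 24-bit mantissa, 1-bit sign) can be illustrated with a minimal Python sketch. The function names `unpack_single` and `trailing_zeros` are hypothetical and only model, in software, what the input registers and trailing-zero counter 12 hold; the 24-bit mantissa is assumed to include the implicit leading one of normal numbers, and infinities and NaNs are not treated specially.

```python
import struct

def unpack_single(x: float):
    """Split an IEEE 754 single-precision value into
    (sign, biased exponent, 24-bit mantissa)."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    sign = bits >> 31
    exponent = (bits >> 23) & 0xFF
    fraction = bits & 0x7FFFFF
    # Normal numbers carry an implicit leading one; zeros and
    # subnormals (exponent == 0) do not.
    mantissa = fraction | (0x800000 if exponent != 0 else 0)
    return sign, exponent, mantissa

def trailing_zeros(value: int, width: int) -> int:
    """Count low-order zero bits; returns width when value is zero."""
    if value == 0:
        return width
    return (value & -value).bit_length() - 1

sign, exponent, mantissa = unpack_single(-1.5)
print(sign, exponent, hex(mantissa), trailing_zeros(mantissa, 24))
```

For the addend C = -1.5 the mantissa is 0xC00000, so the trailing-zero count that would be compared with the alignment shift is 22.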
The number of trailing zeros of the mantissa of the addend C is compared with the alignment shift in early accuracy-loss comparator 15, to determine whether significant bits of the aligned mantissa of the addend fall outside the bit grid (74 bits) of the internal representation of the result. Then, as soon as the low-order bits of the alignment shift value 56, formed in sub-block 14, are ready, the mantissa of the addend C is aligned in alignment sub-block 11.

The operation of the early accuracy-loss prediction circuit is illustrated in figure 4: 67 is the mantissa of the addend C at the input of the alignment circuit; 68 is the number of trailing zeros of the mantissa; 69 is the zero bits that extend the mantissa to a width of 74 bits; 70 is the result of multiplying the operands A and B in redundant carry-save form; 71 is the alignment shift value. The early accuracy-loss flag is equal to one if

$$\text{(trailing zeros of the mantissa of } C) + (2m + 2) < \text{(alignment shift)},$$

or

$$\text{(trailing zeros of the mantissa of } C) + (2m + 2) < Exp_A + Exp_B - Exp_C - bias + m + 3,$$

or [...], where $Exp_A$, $Exp_B$, $Exp_C$ are the exponents of the processed numbers in IEEE 754 format, $m$ is the mantissa width of the input numbers, and $bias$ is the exponent offset.

All actions of the second stage are duplicated in two branches: one branch assumes that the product is larger than the mantissa of the addend, the other that the mantissa of the addend is larger. This allows a subtraction to select the magnitude of the result directly instead of converting a negative value. Each branch performs the following steps on the results of the first stage. The product, in redundant carry-save form, and the aligned mantissa of the addend are added in 3:2 compressor 16. In preliminary normalization sub-block 17, the intermediate sum is preliminarily normalized by a left shift of at most 25 bits.
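The early accuracy-loss condition from the first stage can be sketched in Python as follows. This is an illustration under stated assumptions, not the patented circuit: the operand names in `alignment_shift` are an assumed reading of the partly illegible formula, the helper names are hypothetical, the mantissa of C is assumed to sit at the top of the 74-bit field with 2m + 2 zero bits below it, and a zero addend is assumed never to raise the flag. The function `bits_shifted_out` is a direct model used only as a cross-check.

```python
M = 24       # mantissa width for single precision
BIAS = 127   # single-precision exponent bias

def trailing_zeros(value: int, width: int) -> int:
    """Count low-order zero bits; returns width when value is zero."""
    if value == 0:
        return width
    return (value & -value).bit_length() - 1

def alignment_shift(exp_a: int, exp_b: int, exp_c: int,
                    m: int = M, bias: int = BIAS) -> int:
    # Assumed reading of the right-hand side of the second inequality.
    return exp_a + exp_b - exp_c - bias + m + 3

def early_loss_flag(mant_c: int, shift: int, m: int = M) -> bool:
    """Comparator sketch: trailing zeros of C plus (2m + 2) < shift."""
    if mant_c == 0:
        return False  # assumption: a zero addend cannot lose bits
    return trailing_zeros(mant_c, m) + (2 * m + 2) < shift

def bits_shifted_out(mant_c: int, shift: int, m: int = M) -> bool:
    """Cross-check: place C at the top of the (3m + 2)-bit field and
    test whether a right shift by `shift` drops any set bit."""
    field = mant_c << (2 * m + 2)
    return (field & ((1 << max(shift, 0)) - 1)) != 0

print(alignment_shift(127, 127, 127))   # shift for A = B = C = 1.0
print(early_loss_flag(0xFFFFFF, 51))    # lowest bit of C leaves the field
```

The comparison needs only the trailing-zero count and the exponents, which is why the flag can be formed before the mantissa of C is actually shifted.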
The preliminary normalization shift is formed in the first stage of the module's operation in sub-block 14. Simultaneously with preliminary normalization, sum prediction sub-block 18 predicts the value of the sum from output values 59 and 60 of 3:2 compressor 16, the alignment shift 56 of the mantissa of the addend C, and the value 58 of the aligned mantissa of the addend from the first stage. A detailed block diagram of sum prediction sub-block 18 is shown in figure 2. It is a simplified binary adder in which the value formed in sub-block 40 is added modulo two to a simplified carry that is formed in sub-block 41 and arrives at the current bit position. The simplified carry takes into account only the bits of the previous position. Namely, when binary numbers $A_n$ and $B_n$ are summed, the predicted sum bit, as follows from the structure of sub-blocks 40 to 43, is

$$S_n = (A_n \oplus B_n) \oplus (A_{n-1} \lor B_{n-1}). \tag{2}$$

Sum prediction sub-block 18 is logically divided into two parts: the high-order part (bits [73:48]) and the remainder (bits [48:0]). The high-order part takes bits [73:48] of the aligned mantissa of the addend C 58. The bits [48:0] of the sum are predicted with formula (2). The predicted values of the high-order part and of the remainder pass through mask sub-blocks 44 and 45, in which, based on flag 57 indicating that the alignment shift 56 lies in the range from 0 to 25 inclusive, the correct value of the predicted sum is formed; the two parts are then merged by data bus merging element 46. The inexact sum formed in this way has the same number of trailing zeros as the exact sum that a full binary adder would produce, and a number of leading zeros equal to or one less than that of the exact sum. Table 1 lists the possible leading-zero values in the 74-bit intermediate sum.
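The simplified adder described above can be modelled in a few lines of Python. This is a sketch under assumptions: it covers only unsigned addition of two bit vectors (the device itself works on the carry-save outputs of the 3:2 compressor and duplicates the logic in two branches), the function name `predict_sum` is hypothetical, and the field is assumed to have one spare high-order bit so that the estimated carry is not cut off.

```python
def trailing_zeros(value: int, width: int) -> int:
    """Count low-order zero bits; returns width when value is zero."""
    if value == 0:
        return width
    return (value & -value).bit_length() - 1

def leading_zeros(value: int, width: int) -> int:
    """Count high-order zero bits of a value that fits in width bits."""
    return width - value.bit_length()

def predict_sum(a: int, b: int) -> int:
    """Carry-free estimate of a + b.

    XOR of the operands (role of sub-block 40) is combined with a
    carry estimate that looks only at the previous bit position:
    OR of the operands (sub-block 41) shifted left by one
    (sub-block 43), merged by a second XOR (sub-block 42).
    """
    return (a ^ b) ^ ((a | b) << 1)

a, b = 0b0110, 0b0001
exact, predicted = a + b, predict_sum(a, b)
print(bin(exact), bin(predicted))
print(trailing_zeros(exact, 8), trailing_zeros(predicted, 8))
print(leading_zeros(exact, 8), leading_zeros(predicted, 8))
```

In this example the exact sum is 0b0111 and the prediction is 0b1001: the trailing-zero counts agree, and the predicted leading-zero count is one less than the exact one, which is the property the text relies on.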
Table 1. Possible leading-zero values in the intermediate sum

Alignment shift of the mantissa of C      | 0 to 25           | > 25
Leading zeros in bits [73:48], VAL_ST_0   | = alignment shift | >= 25
Leading zeros in bits [48:23], VAL_ST_1   | 0                 | counted
Leading zeros in bits [22:0], VAL_ST_2    | 0                 | counted, if VAL_ST_1 = 25

Trailing-zero counter 19 counts the trailing zeros of the predicted sum. Leading-zero counter 20 counts the leading zeros of the predicted sum. The trailing-zero and leading-zero counts are then compared with a constant in late accuracy-loss comparator 21 to determine whether significant bits of the result mantissa fall in the part discarded by rounding.

The set of leading-zero counter 20, trailing-zero counter 19 and late accuracy-loss comparator 21 operates as follows. The bit field of the internal sum is divided into three parts (table 1), in each of which leading zeros are counted independently. The leading-zero count for bits [73:48] (VAL_ST_0) takes the value of the alignment shift 56 of the mantissa of the addend C if the shift does not exceed 25. If the shift is greater than 25, the number of zeros in bits [73:48] is 25 or 26. The leading-zero count for bits [48:23] (VAL_ST_1) is calculated in leading-zero counter 1 for bits [48:23] 47 and is considered only if the shift is greater than 25; otherwise it is 0. The leading-zero count for bits [22:0] (VAL_ST_2) is calculated in leading-zero counter 2 for bits [22:0] 48 and is taken into account only if the number of leading zeros of the whole intermediate-sum field is greater than or equal to 51, i.e. VAL_ST_0 + VAL_ST_1 = 51. In that case late loss of accuracy does not occur, so VAL_ST_2 is not needed to calculate the late accuracy-loss flag and is used only for normalization. The trailing-zero count for bits [48:0] is calculated in trailing-zero counter for bits [48:0] 49.
The operation of the late accuracy-loss prediction circuit is illustrated in figure 5: 72 is the mantissa of the "multiplication with accumulation" operation in redundant carry-save form; 74 is the predicted sum, whose number of trailing zeros 77 coincides with that of the full sum 73 and whose number of leading zeros 75 is equal to or one less than that of the full sum 73; 78 is the normalization shift value. The late accuracy-loss flag is set if [...], where m is the mantissa width of the input numbers.

Thus late accuracy-loss comparator 21 compares the sum of the leading-zero and trailing-zero counts with a constant. If flag 57, which indicates that the alignment shift 56 lies in the range from 0 to 25 inclusive, is set to "1", the leading-zero count is taken to be the alignment shift 56 (VAL_ST_0) and the comparison constant is decimal 49. If flag 57 is set to "0", the leading-zero count is VAL_ST_1 and the comparison constant is decimal 24. The leading-zero count for bits [22:0] (VAL_ST_2) is not fed to the comparator because, in the case where it would have to be considered, late loss of accuracy does not occur. VAL_ST_2 is passed from masking sub-block 51 through output 66 to main normalization sub-block 22.

Simultaneously with the prediction of late accuracy loss, main normalization sub-block 22 performs the final normalization of the mantissa and corrects the main shift value using the correction signal from prediction error detector 23. In parallel with the normalization and sum prediction branches, the sign of the addition result is calculated in addition sign detector 26 from the data coming from sign input registers 1, 2 and 3, from the product, and from the value 58 of the aligned mantissa of the addend C.
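The idea behind comparing the sum of the leading-zero and trailing-zero counts with a constant can be sketched as follows. This is a general illustration, not the patent's exact comparator: it assumes a single 74-bit field and the generic threshold width - m, whereas the device uses the constants 49 and 24 for its two shift ranges, which are not derived here. The function names are hypothetical.

```python
WIDTH = 74   # internal field width for single precision
M = 24       # mantissa width of the rounded result

def trailing_zeros(value: int, width: int) -> int:
    """Count low-order zero bits; returns width when value is zero."""
    if value == 0:
        return width
    return (value & -value).bit_length() - 1

def leading_zeros(value: int, width: int) -> int:
    """Count high-order zero bits of a value that fits in width bits."""
    return width - value.bit_length()

def late_loss(lead: int, trail: int, width: int = WIDTH, m: int = M) -> bool:
    """Bits are lost in rounding when the span between the highest and
    lowest set bit (width - lead - trail) exceeds m bits."""
    return lead + trail < width - m

def late_loss_of(total: int) -> bool:
    """Apply the comparison to the exact zero counts of a value."""
    return late_loss(leading_zeros(total, WIDTH), trailing_zeros(total, WIDTH))

print(late_loss_of((1 << 24) - 1))   # 24 significant bits fit in the result
print(late_loss_of((1 << 25) - 1))   # 25 significant bits do not
```

Because a predicted leading-zero count is never larger than the exact one, using it in the same comparison can only report a loss more often, never miss one. This matches the role of a sufficient-condition flag: with the count reduced by one, `late_loss` also fires for a value whose span is exactly m bits.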
Based on the sign value 63, the normalized mantissa from one of the branches is selected in sum multiplexer 27, and the number of leading zeros for exponent correction is selected in normalization shift multiplexer 28. The overall accuracy-loss bit is selected in overall accuracy-loss multiplexer 29, where it is supplemented by the early overflow indication formed in overflow/underflow detector 25. The overall accuracy-loss flag goes to accuracy-loss flag output register 37. The intermediate exponent correction, based on the leading-zero count, is performed in exponent corrector sub-block 24.

At this stage the following takes place. The mantissa, presented in redundant form, is added and rounded in rounding adder 32. The rounded mantissa is then finally normalized in post-normalization sub-block 34, because rounding may cause the mantissa to overflow. On a possible overflow signal, the final exponent adjustment is made in exponent post-correction sub-block 30, from which the value goes to exponent output register 38. Overflow/underflow correction sub-block 31 forms the exception vector, which goes to exception vector output register 39. Operation sign former 33 forms the sign of the operation as a whole, which goes to sign output register 35. The mantissa of the result goes to mantissa output register 36.
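The reason post-normalization is needed after rounding can be shown with a small Python sketch. It assumes round-to-nearest-even, the IEEE 754 default mode; the text does not state which rounding mode the device uses, and the function name `round_mantissa` is hypothetical.

```python
def round_mantissa(wide: int, extra_bits: int, m: int = 24):
    """Round a normalized (m + extra_bits)-bit mantissa to m bits.

    Returns (mantissa, exponent_increment, inexact).
    Assumes round-to-nearest-even and extra_bits >= 1.
    """
    kept = wide >> extra_bits
    dropped = wide & ((1 << extra_bits) - 1)
    half = 1 << (extra_bits - 1)
    inexact = dropped != 0          # the accuracy-loss condition
    if dropped > half or (dropped == half and kept & 1):
        kept += 1                   # round up
    exponent_increment = 0
    if kept >> m:                   # rounding carried out of the top bit
        kept >>= 1                  # post-normalization: shift right by one
        exponent_increment = 1      # and adjust the exponent
    return kept, exponent_increment, inexact

mantissa, exp_inc, inexact = round_mantissa((0xFFFFFF << 2) | 0b11, 2)
print(hex(mantissa), exp_inc, inexact)
```

An all-ones mantissa that is rounded up becomes 0x1000000, which no longer fits in 24 bits, so it is shifted right to 0x800000 and the exponent is increased by one.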
A device for predicting the "loss of accuracy" exception of a "multiplication with accumulation" unit, including a sum prediction sub-block and a leading-zero counter for the sum, wherein it is provided with mantissa input registers, exponent input registers for the numbers A, B and C, a trailing-zero counter for the mantissa of the addend, a sub-block that calculates the alignment shift and predicts the preliminary normalization shift, an early accuracy-loss comparator, a trailing-zero counter for the sum, and a late accuracy-loss comparator; the trailing-zero counter for the mantissa of the addend is connected to the mantissa input register of the number C, in order to count the trailing zeros of the mantissa of the addend C, and to the early accuracy-loss comparator, which compares this count with the alignment shift coming from the sub-block that calculates the alignment shift and predicts the preliminary normalization shift, which is connected to the exponent input registers of the numbers A, B and C, in order to indicate whether significant bits of the aligned mantissa of the addend fall outside the width of the internal data representation; and the late accuracy-loss comparator is connected to the leading-zero counter for the sum and to the trailing-zero counter for the sum, both connected to the sum prediction sub-block, in order to indicate whether significant bits of the result mantissa fall in the part discarded by rounding.