
Data processing apparatus and method of switching workload between first and second processing circuitry (RU 2520411)

FIELD: physics, computer engineering.

SUBSTANCE: invention relates to a data processing apparatus and a method of switching a workload between first and second processing circuitry. The data processing apparatus has first processing circuitry which is architecturally compatible with second processing circuitry, but with the first processing circuitry being micro-architecturally different from the second processing circuitry. During the handover operation, the switch controller is operable to cause the source processing circuitry to make its current architectural state available to the destination processing circuitry, the current architectural state being that state which is not available from shared memory at a time the handover operation is initiated, and which is necessary for the destination processing circuitry to successfully take over performance of the workload from the source processing circuitry.

EFFECT: improved efficiency of power usage by the data processing apparatus.

20 cl, 19 dwg

 

The technical field to which the invention relates

The present invention relates to a data processing apparatus and a method of switching a workload between first and second processing circuitry, and in particular to a method of performing such switching so as to improve the energy efficiency of the data processing apparatus.

Prior art

In modern data processing systems, the difference in performance requirements between high-performance tasks, such as running games, and low-performance tasks, such as MP3 playback, can exceed a ratio of 100:1. If a single processor is to be used for all tasks, that processor must be high-performance, but it is an axiom of processor micro-architecture that high-performance processors are less energy efficient than low-performance processors. It is known to improve energy efficiency at the processor level by using techniques such as Dynamic Voltage and Frequency Scaling (DVFS) or power gating, to provide the processor with a range of performance levels and corresponding energy consumption characteristics. However, such techniques are generally insufficient to allow a single processor to take on tasks with such a difference in performance requirements.

Accordingly, it has been proposed to use multi-core architectures to provide an energy-efficient system for performing such diverse tasks. Whilst multi-core systems, in which different cores can execute different tasks in parallel to increase throughput, have been used for some time to improve performance, analysis of how such systems can be used to improve energy efficiency is a much more recent development.

In the article "Towards Better Performance Per Watt in Virtual Environments on Asymmetric Single-ISA Multi-Core Systems", V Kumar and others, ACM SIGOPS Operating Systems Review, Volume 43, Issue 3 (July 2009) discusses multi-core systems with Asymmetric single architecture instruction set (Asymmetric Single Instruction Set Architecture, ASISA), consisting of multiple cores, providing an identical architecture instruction set (ISA), but differ in features, complexity, power consumption and performance. In this article we investigate the properties of virtualized workloads to understand how these workloads should be planned in si the topics ASISA, to improve the performance and energy consumption. In this article indicates that certain tasks are more suitable for microarchitecture team with high frequency/performance (usually tasks that require a lot of computing power), while others are more suitable for microarchitecture team with less frequency/performance, and as a side effect, consume less energy (usually tasks that require performance I/o). Although these studies show how you can use system ASISA to perform various tasks in efficient use of energy, there is still a need to provide a mechanism for scheduling individual tasks for more suitable processors, and such management planning is usually a significant load on the operating system.

In the article "Single-ISA Heterogeneous Multi-Core Architectures: The Potential for Processor Power Reduction", R Kumar and others, Proceedings of the 36th International Symposium of Microarchitecture (MICRO-36'03) discusses multi-core architecture, in which all cores execute the same instruction set, but have different capabilities and levels of performance. At run time, the system software estimates the resource requirements of the applications and selects the core of a good satisfies these requirements, along with the fact that minimize the duty to regulate energy consumption. As discussed in section 2 of this article, during execution of the application software of the operating system maps this app with different kernels to choose a kernel that meet specific selection criteria, such as specific performance requirement. In section 2.3 it is noted that there are costs to switching cores, which will inevitably entail the constraint granularity switching. The following paragraphs discuss a specific example in which, if the operating system is based on the logic decides to switch, it turns the power of the new engine, starts to clear cache and saves all modified cache data in shared memory structure, and then sends a signal into the new kernel to run in a predefined entry point into the operating system. The power of the old kernel can then be turned off, while the new kernel fetch data from memory. This approach is described in section 2.3 as providing the ability to switch applications between cores by the operating system. In the rest of the article discusses how dynamically it is possible to perform the specified switch in the conditions of multi-core environment with the aim of reducing energy consumption.

Whilst the above paper discusses the potential for reducing energy consumption through single-ISA heterogeneous multi-core architectures, it still requires the operating system to be provided with sufficient functionality to make scheduling decisions for individual applications. The operating system's role in this respect becomes more complex when switching between processor instances with different architectural features. In this regard, it should be noted that the Alpha EV4 to EV8 cores considered in the paper are not fully ISA compatible, as discussed, for example, in the fifth paragraph of section 2.2.

In addition, the paper does not address the problem of the significant overhead involved in switching applications between cores, which can substantially reduce the benefits to be obtained from such switching.

The invention

Viewed from a first aspect, the present invention provides a data processing apparatus, comprising: first processing circuitry for performing data processing operations; second processing circuitry for performing data processing operations; the first processing circuitry being architecturally compatible with the second processing circuitry, such that a workload to be performed by the data processing apparatus can be performed on either the first processing circuitry or the second processing circuitry, said workload comprising at least one application and at least one operating system for running said at least one application; the first processing circuitry being micro-architecturally different from the second processing circuitry, such that the performance of the first processing circuitry is different from the performance of the second processing circuitry; the first and second processing circuitry being configured such that the workload is performed by one of the first processing circuitry and the second processing circuitry at any point in time; and a switch controller, responsive to a transfer stimulus, to perform a handover operation to transfer performance of the workload from source processing circuitry to destination processing circuitry, the source processing circuitry being one of the first processing circuitry and the second processing circuitry, and the destination processing circuitry being the other of the first processing circuitry and the second processing circuitry; the switch controller being arranged, during the handover operation: (i) to cause the source processing circuitry to make its current architectural state available to the destination processing circuitry, the current architectural state being that state which is not available from shared memory shared between the first and second processing circuitry at a time the handover operation is initiated, and which is necessary for the destination processing circuitry to successfully take over performance of the workload from the source processing circuitry; and (ii) to mask predetermined processor specific configuration information from said at least one operating system, such that the transfer of the workload is transparent to said at least one operating system.

In accordance with the present invention, a data processing apparatus is provided with first and second processing circuitry which are architecturally compatible with each other, but micro-architecturally different from each other. Due to the architectural compatibility of the first and second processing circuitry, a workload consisting of not only one or more applications, but also at least one operating system for running those applications, can be moved between the first and second processing circuitry. Further, since the first and second processing circuitry are micro-architecturally different from each other, the performance characteristics (and hence the energy consumption characteristics) of the first and second processing circuitry differ.

In accordance with the present invention, the workload is performed by one of the first and second processing circuitry at any point in time, and a switch controller is responsive to a transfer stimulus to perform a handover operation to transfer performance of the workload between the processing circuitries. On receipt of the transfer stimulus, whichever of the two processing circuitries is currently performing the workload is treated as the source processing circuitry, and the other is treated as the destination processing circuitry. The switch controller, responsible for the handover operation, causes the source processing circuitry's current architectural state to be made available to the destination processing circuitry, and masks predetermined processor specific configuration information from the at least one operating system forming part of the workload, such that the transfer of the workload is transparent to the operating system.

Through use of the present invention, it is possible to switch the entire workload from one processing circuitry to the other whilst masking that migration from the operating system, and whilst ensuring that the required architectural state, which is not available in the shared memory at the time the handover operation is initiated, is made available to the destination processing circuitry, so that it can successfully take over performance of the workload.

By treating the entire workload as a macroscopic entity which is performed on only one of the first and second processing circuitry at any particular point in time, the technique of the present invention allows the workload to be readily switched between the first and second processing circuitry in a manner transparent to the operating system, whilst ensuring that the destination processing circuitry has all of the information necessary to enable it to take over performance of the workload. This approach removes the problems discussed earlier that arise when using the operating system to manage the scheduling of applications onto particular processing circuitry, and has been found to allow significant energy consumption savings to be achieved.

In one embodiment, the data processing apparatus further comprises power control circuitry for independently controlling the power provided to the first processing circuitry and the second processing circuitry, wherein prior to the transfer stimulus occurring, the destination processing circuitry is in a power saving condition, and during the handover operation the power control circuitry causes the destination processing circuitry to exit the power saving condition before the destination processing circuitry takes over performance of the workload. The use of such power control circuitry allows a reduction in the energy consumed by whichever processing circuitry is not currently performing the workload.

In one embodiment, following the handover operation, the power control circuitry causes the source processing circuitry to enter the power saving condition. This may occur immediately following the handover operation or, in alternative embodiments, the source processing circuitry may be arranged to enter the power saving condition only after some predetermined period has elapsed, which can allow data still held by the source processing circuitry to be made available to the destination processing circuitry with greater energy efficiency and higher performance.

A further problem which exists in the prior art, regardless of the technique used to switch between different processing circuitries, is how to transfer the information required for the switch to be successful both quickly and in an energy-efficient manner. In particular, the aforementioned current architectural state needs to be made available to the destination processing circuitry. One way this could be achieved would be to write all of that current architectural state out to shared memory as part of the handover operation, so that it could subsequently be read by the destination processing circuitry. As used herein, the term "shared memory" refers to memory which can be directly accessed by both the first processing circuitry and the second processing circuitry, for example main memory coupled to the first and second processing circuitry via an interconnect.

However, the problem with saving all of the current architectural state to shared memory is that such a process not only takes a significant amount of time, but also consumes significant energy, which can substantially negate the potential benefits to be obtained by performing the switch.

In accordance with one embodiment, during the handover operation, the switch controller causes the source processing circuitry to employ an accelerated mechanism to make its current architectural state available to the destination processing circuitry, without the destination processing circuitry needing to reference the shared memory in order to obtain that current architectural state. Hence, in accordance with such embodiments, a mechanism is provided which avoids the need to route the architectural state through shared memory in order to make it available to the destination processing circuitry. As a result, not only is performance during the handover operation improved, but the energy consumption associated with the handover operation is also reduced.

In one embodiment, at least the source processing circuitry has an associated cache, the data processing apparatus further comprises snoop control circuitry, and the accelerated mechanism comprises transfer of the current architectural state to the destination processing circuitry through use of the associated cache and the snoop control circuitry.

In accordance with this technique, the source processing circuitry's local cache is used to store the current architectural state that needs to be made available to the destination processing circuitry. That state is marked as shareable, allowing it to be snooped by the destination processing circuitry via the snoop control circuitry. Hence, in such embodiments, the first and second processing circuitry are hardware cache coherent with one another, and this reduces the amount of time, energy and hardware complexity involved in switching from the source processing circuitry to the destination processing circuitry.

In one particular embodiment, the accelerated mechanism is a save and restore mechanism which causes the source processing circuitry to save its current architectural state to its associated cache, and which causes the destination processing circuitry to perform a restore operation, whereby the snoop control circuitry retrieves the current architectural state from the source processing circuitry's associated cache and provides that retrieved current architectural state to the destination processing circuitry. The save/restore mechanism provides a particularly efficient technique for saving the architectural state to the source processing circuitry's local cache, and subsequently retrieving that state for the destination processing circuitry.
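The following minimal C sketch illustrates the shape of such a save/restore mechanism. It is an illustration under stated assumptions, not the patented implementation: in a real system the save and restore would be performed by low-level code on each circuitry, and the cache-to-cache transfer would be carried out by the snoop hardware rather than by anything visible at this level.

    #include <stdint.h>
    #include <string.h>

    /* Simplified, illustrative subset of the architectural state. */
    struct arch_state { uint32_t gpr[16]; uint32_t cpsr, spsr, cp15_sctlr; };

    /* Shareable, cacheable transfer area: after the save, it resides in
     * the source circuitry's cache marked shareable, so the destination's
     * reads can be serviced cache-to-cache by the snoop control circuitry
     * rather than from main memory. */
    static struct arch_state transfer_area;

    /* Executed on the source circuitry as part of the handover. */
    void save_state(const struct arch_state *current)
    {
        memcpy(&transfer_area, current, sizeof transfer_area);
    }

    /* Executed on the destination circuitry when taking over the workload.
     * These loads miss in the destination's own cache; the snoop control
     * circuitry finds the lines in the source's cache and returns them. */
    void restore_state(struct arch_state *out)
    {
        memcpy(out, &transfer_area, sizeof transfer_area);
    }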

This approach can be used irrespective of whether the destination processing circuitry itself has an associated local cache. Each time the snoop control circuitry receives a request for an item of the architectural state, whether directly from the destination processing circuitry or from the destination processing circuitry's associated local cache in the event of a cache miss, it determines that the required item of architectural state is held in the local cache associated with the source processing circuitry, and retrieves that data from the source processing circuitry's local cache for return to the destination processing circuitry (either directly, or via the destination processing circuitry's associated cache, if one exists).

In one particular embodiment, the destination processing circuitry has an associated cache within which the transferred architectural state received via the snoop control circuitry is stored for subsequent reference by the destination processing circuitry.

However, the hardware cache coherency approach described above is not the only technique which can be used to provide the aforementioned accelerated mechanism. For example, in an alternative embodiment, the accelerated mechanism comprises a dedicated bus between the source processing circuitry and the destination processing circuitry, over which the source processing circuitry provides the current architectural state to the destination processing circuitry. Whilst this approach will typically incur a higher hardware cost than the cache coherency approach, it provides an even faster way of performing the transfer, which may be advantageous in certain implementations.

The switch controller can take a variety of forms. However, in one embodiment, the switch controller comprises at least virtualisation software logically separating the at least one operating system from the first processing circuitry and the second processing circuitry. It is known to use virtual machines to allow applications written using a particular native instruction set to be executed on hardware having a different native instruction set: the applications are executed within a virtual machine environment, in which the applications' instructions are native to the virtual machine, with the virtual machine implemented by software executing on the hardware having the different native instruction set. The virtualisation software provided by the switch controller of the above embodiments can be viewed as operating in a similar manner to a hypervisor in such a virtual machine environment, since it provides a level of separation between the workload and the underlying hardware platform. In the context of the present invention, the virtualisation software provides an efficient mechanism for migrating the workload from one processing circuitry to the other, whilst masking the processor specific configuration information from the operating system(s) forming that workload.

The transfer stimulus can be generated for a variety of reasons. However, in one embodiment, the timing of the transfer stimulus is chosen so as to improve the energy efficiency of the data processing apparatus. This can be achieved in a variety of ways. For example, performance counters can be set up to count performance-sensitive events (for example the number of instructions executed, or the number of load/store operations). Coupled with a cycle counter or a system timer, this allows identification that a highly compute-intensive application is executing, which may be better served by switching to the higher-performance processing circuitry, or identification of a large number of load/store operations, indicating an I/O-bound application which may be better served on the energy-efficient processing circuitry, and so on. An alternative approach is to profile applications and mark them as "big", "little" or "big/little", as a result of which the operating system can interface with the switch controller to move the workload accordingly (here the term "big" refers to the higher-performance processing circuitry, whilst the term "little" refers to the energy-efficient processing circuitry).

The architectural state required by the destination processing circuitry in order to successfully take over performance of the workload from the source processing circuitry can take a variety of forms. However, in one embodiment, the architectural state comprises at least the current value of one or more special purpose registers of the source processing circuitry, including a program counter value. Along with the program counter value, various other items of information may be held in the special purpose registers. For example, other special purpose registers include processor status registers (for example the CPSR and SPSR in the ARM architecture), which hold control bits for processor mode, interrupt masking, execution state and flags. Other special purpose registers include architectural control registers (the CP15 system control register in the ARM architecture), which hold bits for changing data endianness, turning the MMU on or off, turning the data/instruction caches on or off, etc. Other special purpose registers within CP15 store exception address and status information.

In one embodiment, the architectural state further comprises the current values stored in the architectural register file of the source processing circuitry. As will be appreciated by those skilled in the art, the architectural register file contains the registers accessed by instructions as those instructions are executed while running the applications, with those registers holding source operands for the computations and providing the locations into which the results of those computations are stored.
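Purely as an illustration, the items of architectural state described in the preceding two paragraphs might be grouped as in the following C structure. The register names follow the ARM examples given in the text, but the structure layout itself is a hypothetical illustration, not something defined by the patent.

    #include <stdint.h>

    /* Illustrative grouping of the architectural state discussed above.
     * Field names follow the ARM registers mentioned in the text; the
     * exact layout is an assumption made for illustration only. */
    struct arch_state {
        uint32_t gpr[16];       /* architectural register file, incl. r15 = program counter */
        uint32_t cpsr;          /* current processor status register                        */
        uint32_t spsr;          /* saved processor status register                          */
        uint32_t cp15_sctlr;    /* CP15 system control: endianness, MMU on/off,             */
                                /* data/instruction cache on/off, etc.                      */
        uint32_t cp15_fault[2]; /* CP15 exception address and status information            */
    };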

In one embodiment, at least one of the first processing circuitry and the second processing circuitry comprises a single processing unit. Further, in one embodiment, at least one of the first processing circuitry and the second processing circuitry comprises a cluster of processing units having the same micro-architecture. In one particular embodiment, the first processing circuitry may comprise a cluster of processing units with the same micro-architecture, whilst the second processing circuitry comprises a single processing unit (with a different micro-architecture to that of the processing units within the cluster forming the first processing circuitry).

The power saving condition into which the power control circuitry can selectively place the first and second processing circuitry can take a variety of forms. In one embodiment, the power saving condition is one of: a powered-off condition, a partial/full data retention condition, or a standby condition. Such conditions will be well understood by those skilled in the art, and accordingly are not discussed in more detail herein.

There are a number of ways in which the first and second processing circuitry may be arranged with different micro-architectures. In one embodiment, the first processing circuitry and the second processing circuitry are micro-architecturally different by virtue of having at least one of: different execution pipeline lengths, or different execution resources. Differences in pipeline length will typically result in differences in operating frequency, which in turn will have an effect on performance. Similarly, differences in execution resources will have an effect on throughput and hence on performance. For example, a processing circuitry having wider execution resources is able to process more data at any particular point in time, thus improving throughput. In addition, or alternatively, one processing circuitry may have more execution resources than the other, for example more arithmetic logic units (ALUs), which will also improve throughput. As another example of different execution resources, an energy-efficient processing circuitry may be provided with a simple in-order pipeline, whilst a higher-performance processing circuitry may be provided with an out-of-order superscalar pipeline.

A further issue which can arise when using high-performance processing circuitry, for example circuitry operating at frequencies in the GHz range, is that such processors approach, and sometimes exceed, the thermal limits within which they are designed to operate. Known techniques for addressing this problem can include placing the processing circuitry in a low-power condition to reduce heat dissipation, which may involve clock gating and/or voltage reduction, and possibly even turning the processing circuitry off entirely for a period of time. However, with the technique of embodiments of the present invention, an alternative approach to managing such thermal limits becomes possible. In particular, in one embodiment, the source processing circuitry has higher performance than the destination processing circuitry, and the data processing apparatus further comprises thermal monitoring circuitry for monitoring the thermal output of the source processing circuitry, and for triggering said transfer stimulus when said thermal output reaches a predetermined level. Using this technique, the entire workload can be moved from the higher-performance processing circuitry to the lower-performance processing circuitry, following which less heat will be dissipated, allowing the source processing circuitry to cool. The package containing the two circuitries can therefore be cooled whilst program execution continues, albeit at lower throughput.

The data processing apparatus can be constructed in a variety of ways. However, in one embodiment, the first processing circuitry and the second processing circuitry reside on a single integrated circuit.

Viewed from a second aspect, the present invention provides a data processing apparatus, comprising: first processing means for performing data processing operations; second processing means for performing data processing operations; the first processing means being architecturally compatible with the second processing means, such that a workload to be performed by the data processing apparatus can be performed on either the first processing means or the second processing means, said workload comprising at least one application and at least one operating system for running said at least one application; the first processing means being micro-architecturally different from the second processing means, such that the performance of the first processing means is different from the performance of the second processing means; the first and second processing means being configured such that the workload is performed by one of the first processing means and the second processing means at any point in time; and switch control means, responsive to a transfer stimulus, for performing a handover operation to transfer performance of the workload from source processing means to destination processing means, the source processing means being one of the first processing means and the second processing means, and the destination processing means being the other of the first processing means and the second processing means; the switch control means being arranged, during the handover operation: (i) to cause the source processing means to make its current architectural state available to the destination processing means, the current architectural state being that state which is not available from shared memory means shared between the first and second processing means at a time the handover operation is initiated, and which is necessary for the destination processing means to successfully take over performance of the workload from the source processing means; and (ii) to mask predetermined processor specific configuration information from said at least one operating system, such that the transfer of the workload is transparent to said at least one operating system.

Viewed from a third aspect, the present invention provides a method of operating a data processing apparatus comprising first processing circuitry for performing data processing operations and second processing circuitry for performing data processing operations, the first processing circuitry being architecturally compatible with the second processing circuitry, such that a workload to be performed by the data processing apparatus can be performed on either the first processing circuitry or the second processing circuitry, said workload comprising at least one application and at least one operating system for running said at least one application, and the first processing circuitry being micro-architecturally different from the second processing circuitry, such that the performance of the first processing circuitry is different from the performance of the second processing circuitry, the method comprising the steps of: performing, at any point in time, the workload on one of the first processing circuitry and the second processing circuitry; performing, in response to a transfer stimulus, a handover operation to transfer performance of the workload from source processing circuitry to destination processing circuitry, the source processing circuitry being one of the first processing circuitry and the second processing circuitry, and the destination processing circuitry being the other of the first processing circuitry and the second processing circuitry; and, during the handover operation: (i) causing the source processing circuitry to make its current architectural state available to the destination processing circuitry, the current architectural state being that state which is not available from shared memory shared between the first and second processing circuitry at a time the handover operation is initiated, and which is necessary for the destination processing circuitry to successfully take over performance of the workload from the source processing circuitry; and (ii) masking predetermined processor specific configuration information from said at least one operating system, such that the transfer of the workload is transparent to said at least one operating system.

Brief description of drawings

The present invention will be described further, by way of example only, with reference to embodiments thereof as illustrated in the accompanying drawings, in which:

Figure 1 is a block diagram of a data processing system in accordance with one embodiment;

Figure 2 schematically illustrates the switch controller (also referred to herein as the workload transfer controller) provided in accordance with one embodiment to logically separate the workload performed by the data processing apparatus from the particular hardware platform within the data processing apparatus that is used to perform that workload;

Figure 3 is a diagram schematically illustrating the steps performed by both the source processor and the destination processor in response to a switch stimulus, in order to migrate the workload from the source processor to the destination processor, in accordance with one embodiment;

Figure 4A schematically illustrates the saving of the source processing circuitry's current architectural state to its associated cache during the save operation of Figure 3;

Figure 4B schematically illustrates the use of the snoop control unit to control the transfer of the source processing circuitry's current architectural state to the destination processing circuitry during the restore operation of Figure 3;

Figure 5 illustrates an alternative structure for providing an accelerated mechanism for transferring the source processing circuitry's current architectural state to the destination processing circuitry during the handover operation, in accordance with one embodiment;

Figures 6A to 6I schematically illustrate the steps performed to transfer a workload from a source processing circuitry to a destination processing circuitry in accordance with one embodiment;

Figure 7 is a graph showing how energy efficiency varies with performance, and illustrating how the various processor cores shown in Figure 1 are used at different points along that curve, in accordance with one embodiment;

Figures 8A and 8B schematically illustrate a low-performance processor pipeline and a high-performance processor pipeline, respectively, as used in one embodiment;

Figure 9 is a graph showing the variation in energy consumed by the data processing system as performance of the workload is switched between the low-performance, high-energy-efficiency processing circuitry and the high-performance, low-energy-efficiency processing circuitry.

Description of embodiments

Figure 1 is a block diagram schematically illustrating a data processing system in accordance with one embodiment. As shown in Figure 1, the system contains two architecturally compatible processing circuitry instances (processing circuitry 0 10 and processing circuitry 1 50), but those different processing circuitry instances have different micro-architectures. In particular, the processing circuitry 10 is arranged to operate with higher performance than the processing circuitry 50, but as a trade-off the processing circuitry 10 will be less energy efficient than the processing circuitry 50. Examples of micro-architectural differences will be described in more detail below with reference to Figures 8A and 8B.

Each processing circuitry may comprise a single processing unit (also referred to as a processor core), or alternatively at least one of the processing circuitry instances may itself comprise a cluster of processing units with the same micro-architecture.

In the example shown in Figure 1, the processing circuitry 10 includes two processor cores 15, 20 which are both architecturally and micro-architecturally identical. In contrast, the processing circuitry 50 contains only a single processor core 55. In the following description, the processor cores 15, 20 will be referred to as "big" cores, whilst the processor core 55 will be referred to as a "little" core, since the processor cores 15, 20 will typically be more complex than the processor core 55, due to being designed with performance in mind, whereas the processor core 55, in contrast, will typically be significantly less complex, due to being designed with energy efficiency in mind.

In Figure 1, it is assumed that each of the cores 15, 20, 55 has its own associated level 1 cache 25, 30, 60, respectively, which may be arranged as a unified cache for storing both instructions and data for reference by the associated core, or may be arranged with a Harvard architecture, providing separate level 1 instruction and data caches. Whilst each core is shown as having its own associated level 1 cache, this is not a requirement, and in alternative embodiments one or more of the cores may have no local cache.

In the embodiment shown in Figure 1, the processing circuitry 10 also includes a level 2 cache 35 shared between the core 15 and the core 20, with a snoop control unit 40 being used to ensure cache coherency between the two level 1 caches 25, 30 and the level 2 cache 35. In one embodiment, the level 2 cache is arranged as an inclusive cache, and hence any data stored in either of the level 1 caches 25, 30 will also reside in the level 2 cache 35. As will be well understood by those skilled in the art, the purpose of the snoop control unit 40 is to ensure cache coherency between the various caches, so that either core 15, 20 can always be guaranteed access to the most up-to-date version of any data when it issues an access request. Hence, purely by way of example, if the core 15 issues an access request for data that is not stored in its associated level 1 cache 25, the snoop control unit 40 intercepts the request as it propagates onward from the level 1 cache 25, and determines, with reference to the level 1 cache 30 and/or the level 2 cache 35, whether the access request can be serviced using the contents of one of those other caches. Only if the data is not present in any of those caches is the access request propagated via the interconnect 70 to the main memory 80, the main memory 80 being memory which is shared between both the processing circuitry 10 and the processing circuitry 50.
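As an illustration only, the lookup sequence performed by the snoop control unit 40 can be sketched in C as follows; the lookup functions are hypothetical stand-ins (with trivial stub bodies to keep the sketch self-contained) for what is, in reality, hardware behaviour.

    #include <stdbool.h>
    #include <stdint.h>

    typedef uint32_t data_t;

    /* Hypothetical stand-ins for hardware cache lookups; each returns
     * true and fills *out on a hit. */
    static bool l1_lookup(int cache_id, uint32_t addr, data_t *out)
    { (void)cache_id; (void)addr; (void)out; return false; }
    static bool l2_lookup(uint32_t addr, data_t *out)
    { (void)addr; (void)out; return false; }
    static data_t main_memory_read(uint32_t addr) { (void)addr; return 0; }

    /* A read from core 15 that missed in its own L1 cache 25: the snoop
     * control unit 40 checks the other core's L1 cache 30 and the shared
     * L2 cache 35 before the request ever reaches main memory 80. */
    data_t snoop_serviced_read(uint32_t addr)
    {
        data_t d;
        if (l1_lookup(1, addr, &d)) return d;   /* hit in L1 cache 30         */
        if (l2_lookup(addr, &d))    return d;   /* hit in shared L2 cache 35  */
        return main_memory_read(addr);          /* miss everywhere: go via    */
                                                /* the interconnect 70        */
    }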

The snoop control unit 75 provided within the interconnect 70 operates in a similar manner to the snoop control unit 40, but in this instance seeks to maintain coherency between the cache structure provided within the processing circuitry 10 and the cache structure provided within the processing circuitry 50. In examples where the level 2 cache 35 is an inclusive cache, the snoop control unit 75 maintains hardware cache coherency between the level 2 cache 35 of the processing circuitry 10 and the level 1 cache 60 of the processing circuitry 50. However, if the level 2 cache 35 is arranged as an exclusive level 2 cache, the snoop control unit 75 will also snoop the data held in the level 1 caches 25, 30, in order to ensure cache coherency between the caches of the processing circuitry 10 and the cache 60 of the processing circuitry 50.

In accordance with one embodiment, only one of the processing circuitry 10 and the processing circuitry 50 will be actively processing a workload at any point in time. For the purposes of the present application, the workload can be considered to comprise at least one application and at least one operating system for running that at least one application, as shown schematically by the reference numeral 100 in Figure 2. In this example, the applications 105, 110 run under control of the operating system 115, and collectively the applications 105, 110 and the operating system 115 form the workload 100. The applications can be considered to exist at a user level, whilst the operating system exists at a privileged level, and collectively the workload formed by the applications and the operating system runs on a hardware platform 125 (representing the hardware-level view). At any point in time, that hardware platform will be provided either by the processing circuitry 10 or by the processing circuitry 50.

As shown in Figure 1, power control circuitry 65 is provided for selectively and independently supplying power to the processing circuitry 10 and the processing circuitry 50. Prior to a transfer of the workload from one processing circuitry to the other, typically only one of the processing circuitries will be fully powered, namely the processing circuitry currently performing the workload (the source processing circuitry), and the other processing circuitry (the destination processing circuitry) will typically be in a power saving condition. When it is determined that the workload should be transferred from one processing circuitry to the other, there will be a period of time during the handover operation when both are in the powered-on state, but at some point in time following the handover operation, the source processing circuitry from which the workload has been transferred will be placed into the power saving condition.
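A minimal sketch of this power sequencing is shown below, assuming hypothetical power-control functions; the enum of power states mirrors the conditions named in the surrounding text, but the code itself is illustrative only, not the behaviour of the power control circuitry 65.

    #include <stdio.h>

    /* Power saving conditions named in the text; the enum itself is an
     * illustrative assumption. */
    enum power_state { POWERED_ON, POWERED_OFF, DATA_RETENTION, DORMANT, STANDBY };

    static enum power_state state[2] = { POWERED_ON, DORMANT };  /* 0 = big, 1 = little */

    static void set_power(int circuitry, enum power_state s)
    {
        state[circuitry] = s;
        printf("circuitry %d -> power state %d\n", circuitry, (int)s);
    }

    /* Power sequencing around the handover operation: the destination
     * leaves its power saving condition before taking over, and the
     * source enters a power saving condition some time afterwards. */
    static void handover_power_sequence(int source, int destination)
    {
        set_power(destination, POWERED_ON);  /* wake the destination first         */
        /* ... handover operation: architectural state transferred here ...        */
        set_power(source, DORMANT);          /* later: the source saves power      */
    }

    int main(void) { handover_power_sequence(0, 1); return 0; }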

The power saving condition can take a variety of forms, depending on the implementation, and hence may, for example, be one of a powered-off condition, a partial/full data retention condition, a dormant condition, or a standby condition. Such conditions will be well understood by those skilled in the art, and accordingly will not be discussed in more detail herein.

The aim of the described embodiments is to perform switching of the workload between the processing circuitries depending on the required performance/energy level of the workload. Accordingly, when the workload involves the performance of one or more performance-intensive tasks, for example running games applications, the workload can be executed on the high-performance processing circuitry 10, using either one or both of the big cores 15, 20. However, in contrast, when the workload is performing only low-performance-intensity tasks, for example MP3 playback, the entire workload can instead be transferred to the processing circuitry 50, so as to benefit from the energy efficiency that can be realised through use of the processing circuitry 50.

To make best use of such switching capabilities, it is necessary to provide a mechanism that allows the switching to take place in a simple and efficient manner, such that the act of transferring the workload does not consume energy to an extent that negates the benefits of the switch, and such that the switching process is fast enough that it does not itself degrade performance to any significant degree.

In one embodiment, such benefits are at least partly obtained by arranging the processing circuitry 10 to be architecturally compatible with the processing circuitry 50. This ensures that the workload can be migrated from one processing circuitry to the other whilst still operating correctly. As a bare minimum, such architectural compatibility requires both processing circuitries 10 and 50 to share the same instruction set architecture. However, in one embodiment, such architectural compatibility also entails higher compatibility requirements, to ensure that the two processing circuitry instances are seen as identical from a programmer's point of view. In one embodiment, this involves use of the same architectural registers, and of one or more special purpose registers storing data used by the operating system when running the applications. With this level of architectural compatibility, it is possible to mask the transfer of the workload between the processing circuitries from the operating system 115, so that the operating system is entirely unaware of whether the workload is being executed on the processing circuitry 10 or on the processing circuitry 50.

In one embodiment, the handover from one processing circuitry to the other is managed by the switch controller 120 shown in Figure 2 (there also referred to as the virtualiser, and elsewhere in this description as the workload transfer controller). The switch controller can be embodied by a mixture of hardware, firmware and/or software features, but in one embodiment includes software similar in nature to the hypervisor software found in virtual machines, which enables applications written using one set of native instructions to be executed on a hardware platform employing a different set of native instructions. Due to the architectural compatibility between the two processing circuitries 10, 50, the switch controller 120 can mask the transfer from the operating system 115 merely by masking one or more items of predetermined processor specific configuration information from the operating system. For example, the processor specific configuration information may include the contents of the CP15 processor ID register and the CP15 cache type register.
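Purely by way of illustration, the masking of such registers by the virtualiser might take the following shape in C. The trap mechanism, the register encodings and the constant values are all hypothetical assumptions made for the sketch, not details taken from the patent.

    #include <stdint.h>

    /* Virtual (masked) values presented to the operating system regardless
     * of which processing circuitry is currently running the workload.
     * These constants are placeholders, not real part numbers. */
    #define VIRTUAL_MIDR 0x41000000u  /* virtualised CP15 processor ID register */
    #define VIRTUAL_CTR  0x00000001u  /* virtualised CP15 cache type register   */

    /* Hypothetical trap handler: guest reads of the CP15 ID registers are
     * trapped into the virtualiser, which substitutes fixed values so that
     * a switch between the circuitries is invisible to the operating
     * system. */
    uint32_t on_cp15_id_read(int reg)
    {
        switch (reg) {
        case 0:  return VIRTUAL_MIDR;  /* processor ID register */
        case 1:  return VIRTUAL_CTR;   /* cache type register   */
        default: return 0;             /* other registers would pass through
                                          in a real system; simplified here */
        }
    }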

In such an embodiment, the switch controller then merely needs to ensure that the current architectural state held by the source processing circuitry at the time of the transfer, and which is not, at the time the transfer is initiated, already available from the shared memory 80, is made available to the destination processing circuitry, in order to enable the destination circuitry to successfully take over performance of the workload. Using the example described earlier, such architectural state will typically comprise the current values stored in the architectural register file of the source processing circuitry, along with the current values of one or more special purpose registers of the source processing circuitry. Due to the architectural compatibility of the processing circuitries 10, 50, if this current architectural state can be migrated from the source processing circuitry to the destination processing circuitry, the destination processing circuitry will be able to successfully take over performance of the workload from the source processing circuitry.

Whilst architectural compatibility between the processing circuitries 10, 50 facilitates the transfer of the entire workload between the two processing circuitries, in one embodiment the processing circuitries 10, 50 are micro-architecturally different from one another, so that there are different performance characteristics, and hence different energy consumption characteristics, associated with the two processing circuitries. As discussed earlier, in one embodiment, the processing circuitry 10 is high-performance, high-energy-consumption processing circuitry, whilst the processing circuitry 50 is lower-performance, lower-energy-consumption processing circuitry. The two circuitries can be micro-architecturally different from each other in a number of respects, but will typically have at least one of different execution pipeline lengths and/or different execution resources. Differences in pipeline length will typically result in differences in operating frequency, which in turn will have an effect on performance. Similarly, differences in execution resources will have an effect on throughput and hence on performance. Hence, by way of example, the processing circuitry 10 may have wider execution resources and/or more execution resources, in order to improve throughput. Further, the pipelines within the processor cores 15, 20 may be arranged to perform out-of-order superscalar processing, whilst the simpler core 55 within the energy-efficient processing circuitry 50 may be arranged as an in-order pipeline. Further discussion of micro-architectural differences will be provided below with reference to Figures 8A and 8B.

The transfer stimulus that causes the switch controller 120 to initiate the handover operation, transferring the workload from one processing circuitry to the other, can be triggered for a variety of reasons. For example, in one embodiment, applications may be profiled and marked as 'big', 'little' or 'big/little', whereby the operating system can interface with the switch controller to move the workload accordingly. Hence, by this approach, the generation of the transfer stimulus can be mapped to particular combinations of applications being executed, ensuring that when high performance is required the workload is executed on the high-performance processing circuitry 10, whereas when that performance is not required the energy-efficient processing circuitry 50 is used instead. In other embodiments, algorithms may be executed to dynamically determine when to trigger a transfer of the workload from one processing circuitry to the other based on one or more inputs. For example, performance counters may be set up to count performance-sensitive events (for example, the number of instructions executed, or the number of load/store operations). Coupled with a cycle counter or a system timer, this makes it possible to identify that a highly compute-intensive application is executing, which may be better served by switching to the higher-performance processing circuitry, or to identify a large number of load/store operations indicating an IO-intensive application which may be better served on the energy-efficient processing circuitry, etc.
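Purely by way of illustration, the following C sketch shows one possible shape for such a counter-based decision algorithm. The counter-reading helpers (read_instr_count, read_loadstore_count, read_cycle_count) and the threshold values are hypothetical placeholders, not part of the described embodiment:

```c
#include <stdint.h>

/* Hypothetical helpers reading the performance counters described above. */
extern uint64_t read_instr_count(void);     /* instructions executed   */
extern uint64_t read_loadstore_count(void); /* load/store operations   */
extern uint64_t read_cycle_count(void);     /* cycle counter reference */

/* Illustrative thresholds, chosen only for the sketch. */
#define COMPUTE_IPC_THRESHOLD_X1000  800   /* ~0.8 instructions/cycle  */
#define IO_LS_PER_MCYC_THRESHOLD   50000   /* loads/stores per Mcycle  */

typedef enum { STAY, GO_BIG, GO_LITTLE } switch_decision;

/* Called periodically: compares event rates against the thresholds to
 * decide whether a transfer stimulus should be generated. */
switch_decision evaluate_transfer_stimulus(void)
{
    static uint64_t last_instr, last_ls, last_cyc;

    uint64_t instr = read_instr_count();
    uint64_t ls    = read_loadstore_count();
    uint64_t cyc   = read_cycle_count();
    uint64_t d_cyc = cyc - last_cyc;

    if (d_cyc == 0)
        return STAY;

    /* Rates scaled to integers to avoid floating point. */
    uint64_t ipc_x1000   = ((instr - last_instr) * 1000) / d_cyc;
    uint64_t ls_per_mcyc = ((ls - last_ls) * 1000000) / d_cyc;

    last_instr = instr; last_ls = ls; last_cyc = cyc;

    if (ipc_x1000 > COMPUTE_IPC_THRESHOLD_X1000)
        return GO_BIG;     /* compute-intensive: favour performance  */
    if (ls_per_mcyc > IO_LS_PER_MCYC_THRESHOLD)
        return GO_LITTLE;  /* IO-bound: favour energy efficiency     */
    return STAY;
}
```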

As another example of when the transfer stimulus may be generated, the data processing system may include one or more thermal sensors 90 for monitoring the temperature of the system during operation. It can be the case that modern high-performance processing circuits, for example those operating at frequencies in the gigahertz (GHz) range, sometimes approach, or even exceed, the thermal limits within which they are designed to operate. By using such thermal sensors 90, the approach of those thermal limits can be detected, and under those conditions a transfer stimulus can be generated to trigger a transfer of the workload to the more energy-efficient processing circuitry, allowing an overall cooling of the data processing system. Hence, considering the example of Figure 1, where the processing circuitry 10 is high-performance processing circuitry and the processing circuitry 50 is lower-performance, lower-energy-consumption processing circuitry, moving the workload from the processing circuitry 10 to the processing circuitry 50 as the thermal limits of the device are approached will cause a subsequent cooling of the device, whilst allowing continued program execution, albeit at lower throughput.
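A minimal sketch of such a thermal trigger, assuming a hypothetical sensor-access helper and an illustrative limit value (neither taken from the source):

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical sensor access: die temperature in millidegrees Celsius. */
extern int32_t read_temp_sensor(int sensor_id);

#define THERMAL_LIMIT_MDEG 95000  /* illustrative ceiling only */
#define NUM_SENSORS 2

/* Returns true if any thermal sensor 90 is approaching the thermal
 * limit, in which case a transfer stimulus towards the more
 * energy-efficient processing circuitry would be generated. */
bool thermal_transfer_stimulus(void)
{
    for (int i = 0; i < NUM_SENSORS; i++)
        if (read_temp_sensor(i) >= THERMAL_LIMIT_MDEG)
            return true;
    return false;
}
```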

Although Figure 1 shows two processing circuits 10, 50, it will be appreciated that the techniques of the above embodiments can also be applied to systems containing more than two different processing circuits, allowing the data processing system to span a wider range of performance/energy points. In such embodiments, each of the different processing circuits is arranged to be architecturally compatible with the others, to allow ready migration of the entire workload between them, but they differ from each other micro-architecturally, allowing the choice of which processing circuit to use to be made according to the required performance/energy levels.

Figure 3 is a flow diagram illustrating the sequence of steps performed by both the source processor and the destination processor when transferring the workload from the source processor to the destination processor upon receipt of a transfer stimulus. Such a transfer stimulus may be generated by the operating system 115 or the virtualiser 120 via a system firmware interface, resulting in the detection of the switch stimulus at step 200 by the source processor (which is executing not only the workload but also the virtualiser software forming at least part of the switch controller 120). Receipt of the transfer stimulus (also referred to in this description as the switch stimulus) at step 200 causes the power controller 65 to initiate a power-up and reset operation 205 on the destination processor. Following this power-up and reset, the destination processor invalidates its local cache at step 210 and then enables snooping at step 215. At this point the destination processor signals to the source processor that it is ready for handover of the workload, this signal causing the source processor to perform a state saving operation at step 225. This saving operation is discussed in more detail below with reference to Figure 4A, but in one embodiment involves the source processing circuitry storing to its local cache any of its current state which is not available from the shared memory at the time the handover operation is initiated, and which is needed by the destination processor in order to successfully take over performance of the workload.
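The Figure 3 handshake can be restated in code form as a sketch; every function below is a hypothetical placeholder for a hardware or virtualiser operation, not an API defined by the embodiment:

```c
/* Illustrative restatement of the Figure 3 handover sequence. */
extern void invalidate_local_cache(void);
extern void enable_snooping(void);
extern void signal_ready_to_source(void);
extern void wait_for_switch_state_signal(void);
extern void restore_architectural_state(void);
extern void begin_normal_operation(void);
extern void power_up_destination(void);
extern void wait_for_destination_ready(void);
extern void save_architectural_state_to_cache(void);
extern void signal_switch_state(void);

void destination_processor(void)
{
    /* entered after the power-up and reset operation 205 */
    invalidate_local_cache();        /* step 210 */
    enable_snooping();               /* step 215 */
    signal_ready_to_source();        /* ready-for-handover signal */
    wait_for_switch_state_signal();  /* switch state signal 230   */
    restore_architectural_state();   /* restore operation 230     */
    begin_normal_operation();        /* step 235                  */
}

void source_processor(void)
{
    /* step 200: switch stimulus detected while running the workload */
    power_up_destination();              /* via power controller 65  */
    wait_for_destination_ready();
    save_architectural_state_to_cache(); /* save operation, step 225 */
    signal_switch_state();               /* signal 230               */
    /* the source cache stays powered during the snooping period */
}
```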

Following the save operation 225, a switch state signal 230 is issued to the destination processor, indicating to the destination processor that it should now begin snooping the source processor in order to retrieve the required architectural state. This takes place via a restore operation 230, which is discussed more fully below with reference to Figure 4B, but which in one embodiment involves the destination processing circuitry initiating a sequence of accesses which are intercepted by the snoop control unit 75 within the interconnect 70, causing the cached copy of the architectural state in the source processor's local cache to be retrieved and returned to the destination processor.

Following step 230, the destination processor is in a position to take over the workload, and accordingly normal operation begins at step 235.

In one embodiment, once normal operation has begun on the destination processor, the source processor's cache could be cleaned, as indicated at step 250, flushing all dirty data to the shared memory 80, and the source processor could then be powered down at step 255. However, in one embodiment, to further improve the efficiency of the destination processor, the source processor is arranged to remain powered up for a period of time, referred to in Figure 3 as the snooping period. During this time, at least one of the caches of the source circuitry remains enabled, so that its contents can be snooped by the snoop control circuitry 75 in response to access requests issued by the destination processor. Following the transfer of the entire workload via the process described in Figure 3, it is expected that, at least for an initial period of time after the destination processor begins operational handling of the workload, some of the data required when executing the workload will reside in the source processor's cache. If the source processor had flushed its contents to memory and been powered down, the destination processor would operate relatively inefficiently during these early stages, since there would be many misses in its local cache and many fetches of data from shared memory, significantly affecting performance while the destination processor's cache 'warms up', i.e. fills with the data values required by the destination processing circuitry for performing the workload. However, because the source processor's cache remains enabled during the snooping period, the snoop control circuitry 75 can service many of these cache miss requests by reference to the source processor's cache, yielding a significant performance benefit compared with fetching that data from the shared memory 80.
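The lookup order applied during the snooping period can be summarised by a sketch such as the following; the helper names model hardware paths and are invented for illustration (real snoop hardware performs this in logic, not software):

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t addr_t;
typedef struct { uint8_t bytes[64]; } cache_line_t;  /* illustrative */

/* Hypothetical helpers modelling the hardware data paths. */
extern bool snoop_source_cache(addr_t a, cache_line_t *out); /* hit? */
extern void fetch_from_shared_memory(addr_t a, cache_line_t *out);
extern void fill_destination_cache(addr_t a, const cache_line_t *l);

/* Servicing of a destination-cache miss during the snooping period:
 * try the still-powered source cache first, falling back to the
 * shared memory 80 only on a snoop miss. */
void service_destination_miss(addr_t a)
{
    cache_line_t line;
    if (!snoop_source_cache(a, &line))       /* snoop miss            */
        fetch_from_shared_memory(a, &line);  /* slower, more energy   */
    fill_destination_cache(a, &line);        /* warm the target cache */
}
```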

However, this performance benefit is expected to last only for a certain period of time after the switch, after which the contents of the source processor's cache will have become stale. Accordingly, at some point a snoop stop event is generated at step 245 to disable snooping, whereafter the source processor's cache is cleaned at step 250, and the source processor is then powered down at step 255. The various scenarios in which the snoop stop event may be generated are discussed in more detail below with reference to Figure 6G.

Figure 4A schematically illustrates the save operation performed at step 225 of Figure 3 in accordance with one embodiment. In particular, in one embodiment, the architectural state that is to be saved from the source processing circuitry 300 into the local cache 330 consists of the contents of the register file 310 accessed by the arithmetic logic unit (ALU) 305 during the performance of data processing operations, together with the contents of various special-purpose registers 320 identifying assorted pieces of information required by the workload, to enable the destination processing circuitry to successfully take over that workload. The contents of the special-purpose registers 320 include, for example, the program counter identifying the current instruction being executed, along with various other information. For example, other special-purpose registers include processor status registers (for example, the CPSR and SPSR in the ARM architecture), which hold control bits for processor mode, interrupt masking, execution state and flags. Other special-purpose registers include architectural control registers (the CP15 system control register in the ARM architecture), containing bits for changing data endianness, turning the MMU on or off, turning the data/instruction caches on or off, etc. Other CP15 special-purpose registers store exception status and location information.
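As a hedged illustration only, the architectural state enumerated above could be represented by a plain C structure of the following shape; the field set follows the description (ARM-style PC/CPSR/SPSR and CP15 registers), but the exact layout is invented for the sketch, not mandated by the embodiment:

```c
#include <stdint.h>

#define NUM_ARCH_REGS 16  /* illustrative size of the register file 310 */

/* Sketch of the architectural state captured by the save operation 225. */
typedef struct {
    uint32_t regfile[NUM_ARCH_REGS]; /* architectural register file 310  */
    uint32_t pc;                     /* program counter                  */
    uint32_t cpsr;                   /* current program status register  */
    uint32_t spsr;                   /* saved program status register    */
    uint32_t cp15_sctlr;             /* CP15 system control: MMU, caches,
                                        endianness                       */
    uint32_t cp15_fault_status;      /* exception status information     */
    uint32_t cp15_fault_address;     /* exception location information   */
} arch_state_t;
```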

As schematically illustrated in Figure 4A, the source processing circuitry 300 also typically holds some processor-specific configuration information 315, but this information is not saved to the cache 330, since it is not applicable to the destination circuitry. The processor-specific configuration information 315 is typically hard-wired in the source processing circuitry 300 using logic constants, and may include, for example, the contents of the CP15 processor ID register (which will differ for each processing circuitry) or the contents of the CP15 cache type register (which depends on the configuration of the caches 25, 30, 60, for example indicating that those caches have different line lengths). When the operating system 115 requires an item of the processor-specific configuration information 315, then, unless the processor is already in hypervisor mode, an execution trap to hypervisor mode occurs. In response, the virtualiser 120 may in one embodiment indicate the value of the requested information, but in another embodiment returns a 'virtual' value. In the case of the processor ID value, this virtual value can be chosen to be identical for both the 'big' and 'little' processors, thereby causing the actual hardware configuration to be hidden from the operating system 115 by the virtualiser 120.
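A minimal sketch of such a trap handler follows; the trap-kind constants, the virtual ID value and the helper function are all hypothetical names introduced for illustration, not identifiers from the source:

```c
#include <stdint.h>

#define VIRTUAL_CPU_ID  0x410FC0F0u /* one ID reported for both big and
                                       little cores; illustrative value */
#define TRAP_READ_MIDR  1           /* CP15 processor ID register read  */
#define TRAP_READ_CTR   2           /* CP15 cache type register read    */

extern uint32_t virtual_cache_type_value(void); /* hypothetical helper */

/* Hypervisor-mode handler invoked when the OS reads processor-specific
 * configuration information 315: returns a virtualised value so that
 * the underlying hardware configuration stays hidden from the OS 115. */
uint32_t handle_config_read_trap(int trap_kind)
{
    switch (trap_kind) {
    case TRAP_READ_MIDR:
        return VIRTUAL_CPU_ID;             /* same on both processors */
    case TRAP_READ_CTR:
        return virtual_cache_type_value();
    default:
        return 0;                          /* unhandled: sketch only  */
    }
}
```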

As schematically illustrated in Figure 4A, during the save operation the source processing circuitry stores the contents of the register file 310 and the special-purpose registers 320 to the cache 330 to form a cached copy. That cached copy is then marked as shareable, allowing the destination processor to snoop that state via the snoop control unit 75.

The restore operation subsequently performed by the destination processor is schematically illustrated in Figure 4B. In particular, the destination processing circuitry 350, which may or may not have its own local cache, issues a request for a particular item of the architectural state, and that request is intercepted by the snoop control unit 75. The snoop control unit then issues a snoop request to the local cache 330 of the source processing circuitry to determine whether that item of architectural state is present in the source's cache. Because of the steps taken during the save operation illustrated in Figure 4A, a 'hit' will be detected in the source's cache 330, resulting in that cached architectural state being returned via the snoop control unit 75 to the destination processing circuitry 350. This process can be repeated iteratively until all items of the architectural state have been retrieved by snooping the cache of the source processing circuitry. As discussed earlier, any processor-specific configuration information relevant to the destination processing circuitry 350 is typically hard-wired in that destination processing circuitry 350. Accordingly, once the restore operation has completed, the destination processing circuitry has all the information it needs to be able to take over handling of the workload successfully.
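Taken together, the save and restore operations of Figures 4A and 4B could be sketched as follows, treating the state as an array of words and assuming hypothetical cache/snoop primitives:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical primitives for the coherent-cache transfer mechanism. */
extern void cache_store_shareable(const uint32_t *words, unsigned n);
extern bool snoop_fetch_word(unsigned index, uint32_t *out); /* hit? */

/* Save (Figure 4A, step 225): the source writes the words of its
 * current architectural state into its local cache 330 and marks the
 * lines shareable, making them visible to the destination's snoops. */
void save_state(const uint32_t *state_words, unsigned n)
{
    cache_store_shareable(state_words, n);
}

/* Restore (Figure 4B): the destination requests each item in turn; the
 * snoop control unit 75 intercepts the accesses and services them from
 * the source cache 330, where the save operation guarantees a hit. */
void restore_state(uint32_t *state_words, unsigned n)
{
    for (unsigned i = 0; i < n; i++)
        (void)snoop_fetch_word(i, &state_words[i]); /* hit expected */
}
```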

Further, in one embodiment, irrespective of whether the workload 100 is being performed by the 'big' processing circuitry 10 or the 'little' processing circuitry 50, the virtualiser 120 provides the operating system 115 with virtual configuration information having the same values in both cases, so that the hardware differences between the 'big' and 'little' processing circuits 10, 50 are masked from the operating system 115 by the virtualiser 120. This means that the operating system 115 is unaware that the performance of the workload 100 has been transferred to a different hardware platform.

In accordance with the save and restore operations described with reference to Figures 4A and 4B, the processor instances 10, 50 are arranged to be hardware cache coherent with one another, reducing the amount of time, energy and hardware complexity involved in transferring the architectural state from the source processor to the destination processor. The technique uses the local cache of the source processor to store all of the state which must be transferred from the source processor to the destination processor and which is not available from shared memory at the time the handover operation is performed. Because that state is marked as shareable within the source processor's cache, the hardware-cache-coherent destination processor can snoop that state during the handover operation. Using this technique, the state can be transferred between the processor instances without having to be saved to main memory or to a local-memory-mapped storage element. This yields significant performance and energy-consumption benefits, increasing the variety of situations in which it is worthwhile switching the workload to realise energy consumption benefits.

However, although the above-described technique of using cache coherency provides one accelerated mechanism for making the current architectural state available to the destination processor without routing it via shared memory, it is not the only way such an accelerated mechanism could be implemented. For example, Figure 5 illustrates an alternative mechanism in which a dedicated bus 380 is provided between the source processing circuitry 300 and the destination processing circuitry 350 to allow the architectural state to be transferred during the handover operation. Hence, in such embodiments, the save and restore operations 225, 230 of Figure 3 are replaced by an alternative transfer mechanism using the dedicated bus 380. Whilst this approach will typically involve a higher hardware cost than the cache coherency approach (which typically makes use of hardware already provided within the data processing system), it provides a faster way of performing the transfer, which may be advantageous in certain implementations.

Figures 6A to 6I illustrate a sequence of steps performed to transfer the performance of the workload from the source processing circuitry 300 to the destination processing circuitry 350. The source processing circuitry 300 is whichever of the processing circuits 10, 50 is performing the workload prior to the transfer, the destination processing circuitry being the other of the processing circuits 10, 50.

Figure 6A shows the system in an initial state, in which the source processing circuitry 300 is powered by the power controller 65 and is performing the workload 100, whilst the destination processing circuitry 350 is in the power saving state. In this embodiment, the power saving state is a power-off state but, as mentioned above, other kinds of power saving state could also be used. The workload 100, comprising the applications 105, 110 and the operating system 115 for running the applications 105, 110, is abstracted from the hardware platform of the source processing circuitry 300 by the virtualiser 120. While performing the workload 100, the source processing circuitry 300 maintains the architectural state 400, which may comprise, for example, the contents of the register file 310 and the special-purpose registers 320 as shown in Figure 4A.

In Figure 6B, the virtualiser 120 detects a transfer stimulus 430. Although in Figure 6B the transfer stimulus 430 is represented as an external event (for example, the detection of excessive heat by the thermal sensor 90), the stimulus 430 could also be an event triggered by the virtualiser 120 itself or by the operating system 115 (for example, the operating system 115 could be configured to inform the virtualiser 120 when a particular type of application is about to be processed). In response to the transfer stimulus 430, the virtualiser 120 controls the power controller 65 to supply power to the destination processing circuitry 350 in order to place it in the powered state.

In Figure 6C, the destination processing circuitry 350 starts executing the virtualiser 120. The virtualiser 120 controls the destination processing circuitry 350 to invalidate its cache 420, in order to prevent processing errors caused by erroneous data values which may be present in the cache 420 on powering up the destination processing circuitry 350. While the destination cache 420 is being invalidated, the source processing circuitry 300 continues to perform the workload 100. When invalidation of the destination cache 420 is complete, the virtualiser 120 controls the destination processing circuitry 350 to signal to the source processing circuitry 300 that it is ready for handover of the workload 100. By continuing to process the workload 100 on the source processing circuitry 300 until the destination processing circuitry 350 is ready for the handover operation, the impact of the handover on the performance of the workload can be reduced.

At the next stage, shown in Figure 6D, the source processing circuitry 300 stops performing the workload 100. During this stage, neither the source processing circuitry 300 nor the destination processing circuitry 350 is performing the workload 100. A copy of the architectural state 400 is transferred from the source processing circuitry 300 to the destination processing circuitry 350. For example, the architectural state 400 can be saved to the source cache 410 and restored to the destination processing circuitry 350 as shown in Figures 4A and 4B, or it can be transferred over a dedicated bus as shown in Figure 5. The architectural state 400 comprises all the state information required for the destination processing circuitry 350 to perform the workload 100, other than the information already present in the shared memory 80.

Having transferred the architectural state 400 to the destination processing circuitry 350, the source processing circuitry 300 is placed in the power saving state by the power control circuitry 65 (see Figure 6E), with the exception that the source cache 410 remains powered. The destination processing circuitry 350 then begins performing the workload 100 using the transferred architectural state 400.

When the destination processing circuitry 350 begins processing the workload 100, the snooping period begins (see Figure 6F). During the snooping period, the snoop control unit 75 can snoop the data stored in the source cache 410 and retrieve that data on behalf of the destination processing circuitry 350. When the destination processing circuitry 350 requests data which is not present in the destination cache 420, the destination processing circuitry 350 requests the data from the snoop control unit 75. The snoop control unit 75 then snoops the source cache 410 and, if the snoop results in a cache hit, the snoop control unit 75 retrieves the snooped data from the source cache 410 and returns it to the destination processing circuitry 350, where the snooped data can be stored in the destination cache 420. Conversely, if the snoop results in a cache miss in the source cache 410, the requested data is fetched from the shared memory 80 and returned to the destination processing circuitry 350. Since accesses to the data in the source cache 410 are faster, and require less energy, than accesses to the shared memory 80, snooping the source cache 410 for a period improves performance and reduces energy consumption during the initial period following handover of the workload 100 to the destination processing circuitry 350.

At the stage shown in Figure 6G, the snoop control unit 75 detects a snoop stop event indicating that it is no longer worthwhile maintaining the source cache 410 in the powered state. The snoop stop event ends the snooping period. The snoop stop event may be any one of a set of snoop stop events monitored by the snoop control circuitry 75. For example, the set of snoop stop events may include any one or more of the following events:

a) when the percentage or fraction of snoops which result in cache hits in the source cache 410 (i.e. a quantity proportional to 'number of snoop hits'/'total number of snoops') drops below a predetermined threshold level after the destination processing circuitry 350 has begun performing the workload 100,

b) when the number of transactions, or the number of transactions of a predetermined kind (for example, cacheable transactions), performed since the destination processing circuitry 350 began performing the workload 100 exceeds a predetermined threshold,

c) when the number of processing cycles elapsed since the destination processing circuitry 350 began performing the workload 100 exceeds a predetermined threshold,

d) when a particular region of the shared memory 80 is accessed for the first time since the destination processing circuitry 350 began performing the workload 100,

e) when a particular region of the shared memory 80, which was accessed during an initial period after the destination processing circuitry 350 began performing the workload 100, has not been accessed for a predetermined number of cycles or a predetermined period of time,

f) when the destination processing circuitry 350 writes to a predetermined memory location for the first time since beginning performance of the transferred workload 100.

These snoop stop events can be detected using programmable counters within the coherent interconnect 70 which includes the snoop control unit 75, as sketched below. Other kinds of snoop stop event could also be included in the set of snoop stop events.
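The following C sketch illustrates how a few of the listed events might be checked against such programmable counters; the counter names and threshold values are assumptions introduced for the example, not taken from the source:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical programmable counters in the coherent interconnect 70. */
extern uint64_t snoop_hits, snoop_total;     /* for event (a) */
extern uint64_t cacheable_transactions;      /* for event (b) */
extern uint64_t cycles_since_handover;       /* for event (c) */

/* Illustrative thresholds only. */
#define HIT_RATIO_PCT_MIN      10
#define TRANSACTION_LIMIT  100000
#define CYCLE_LIMIT       5000000

/* Returns true when any monitored snoop stop event has occurred, so
 * that the snoop control unit 75 can end the snooping period and
 * issue the snoop stop signal 440. */
bool snoop_stop_event_detected(void)
{
    if (snoop_total > 0 &&
        (snoop_hits * 100) / snoop_total < HIT_RATIO_PCT_MIN)
        return true;                        /* event (a) */
    if (cacheable_transactions > TRANSACTION_LIMIT)
        return true;                        /* event (b) */
    if (cycles_since_handover > CYCLE_LIMIT)
        return true;                        /* event (c) */
    return false;  /* events (d) to (f) omitted from this sketch */
}
```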

When a snoop stop event is detected, the snoop control unit 75 sends a snoop stop signal 440 to the source processor 300. The snoop control unit 75 stops snooping the source cache 410 and, from this point on, responds to data access requests from the destination processing circuitry 350 by fetching the requested data from the shared memory 80 and returning the fetched data to the destination processing circuitry 350, where the fetched data can be cached.

In Figure 6H, the control circuitry of the source cache responds to the snoop stop signal 440 by cleaning the cache 410, in order to save to the shared memory 80 all valid and dirty data values (that is, cached values which are more up to date than the corresponding values in the shared memory 80).

In Figure 6I, the source cache 410 is then powered down by the power controller 65, so that the source processing circuitry 300 is placed entirely in the power saving state. The destination processing circuitry 350 continues to perform the workload 100. From the point of view of the operating system 115, the situation is now the same as in Figure 6A. The operating system 115 is unaware that the performance of the workload has been transferred from one processing circuitry to the other. When a further transfer stimulus occurs, the same steps of Figures 6A to 6I can be used to switch the performance of the workload back to the first processor (in this case, whichever of the processing circuits 10, 50 was the source processing circuitry now becomes the destination processing circuitry, and vice versa).

In the embodiment of Figures 6A to 6I, independent power control is available for the cache 410 and for the rest of the source processing circuitry 300, so that the source processing circuitry 300, other than the source cache 410, can be powered down once the destination processing circuitry 350 has begun performing the workload (see Figure 6E), while the source cache 410 alone remains in the powered state (see Figures 6F to 6H). Then, in Figure 6I, the source cache 410 is powered down. This approach can be useful for saving energy, particularly when the source processing circuitry 300 is the 'big' processing circuitry 10.

However, it would also be possible to continue supplying power to the whole of the source processing circuitry 300 during the snooping period, and then, in Figure 6I, to place the source processing circuitry 300 as a whole into the power saving state once the snooping period has ended and the source cache 410 has been cleaned. This can be useful when the source cache 410 is embedded so deeply within the core of the source processor that it cannot be powered independently of the source processor core. This approach may also be more practical when the source processor is the 'little' processing circuitry 50, whose power consumption is insignificant compared with the 'big' processing circuitry 10: once the 'big' processing circuitry 10 has begun processing the transferred workload 100, switching the 'little' processing circuitry 50, other than its cache 60, into the power saving state during the snooping period may have little effect on the overall power consumption of the system. This can mean that the additional hardware complexity of providing separate power control for the 'little' processing circuitry 50 and for the cache 60 of the 'little' core may not be justified.

In some situations it may be known, at the time of the workload transfer, that the data stored in the source cache 410 will not be needed by the destination processing circuitry 350 when it begins performing the workload 100. For example, the source processing circuitry 300 may just have completed an application at the point the transfer occurs, so that the data in the source cache 410 at the time of the transfer relates to the completed application and not to the application to be performed by the destination processing circuitry 350 following the transfer. In that case, a snoop override controller can trigger the virtualiser 120 and the snoop control circuitry 75 to override the snooping of the source cache 410, and can control the source processing circuitry 300 to clean and power down the source cache 410 without waiting for a snoop stop event to signal the end of the snooping period. In that case, the technique of Figures 6A to 6I jumps from the stage of Figure 6E straight to the stage of Figure 6G, omitting the stage of Figure 6F in which data is snooped (and fetched) from the source cache 410. Accordingly, if it is known in advance that the data in the source cache 410 will not be used by the destination processing circuitry 350, energy can be saved by placing the source cache 410 and the source processing circuitry 300 in the power saving state without waiting for a snoop stop event. The snoop override controller may be part of the virtualiser 120, or may be implemented as firmware executing on the source processing circuitry 300. The snoop override controller may also be implemented as a combination of elements: for example, the operating system 115 may inform the virtualiser 120 when an application is completed, and the virtualiser 120 may then override snooping of the source cache 410 if the transfer occurs on completion of the application.

Figure 7 is a graph in which line 600 illustrates how energy consumption varies with performance. Over different portions of this graph, the data processing system can be arranged to use different combinations of the processor cores 15, 20, 55 shown in Figure 1, with the aim of achieving the most appropriate trade-off between performance and energy consumption. Hence, by way of example, when many very high performance tasks need to be performed, both of the big cores 15, 20 of the processing circuitry 10 can be used to achieve the required performance. Optionally, supply-voltage variation techniques can be used to allow some variation in performance and energy consumption when using these two cores.

When the performance requirement drops to a level at which the required performance can be achieved using only one of the big cores, the tasks can be migrated onto just one of the big cores 15, 20, with the other core being powered down or placed in some other power saving state. Again, supply-voltage variation can be used to allow some variation between performance and energy consumption when using this single big core. It should be noted that the transition from two big cores to one big core does not require generation of a transfer stimulus, nor use of the above-described techniques for transferring the workload, since the processing circuitry 10 is in use throughout and the processing circuitry 50 remains in the power saving state throughout. However, as indicated by the dotted line of Figure 7, when performance drops to a level at which the little core can achieve the required performance, a transfer stimulus can be generated to trigger the mechanism described earlier for transferring the entire workload from the processing circuitry 10 to the processing circuitry 50, so that the entire workload is then performed on the little core 55, with the processing circuitry 10 being placed in the power saving state. Again, supply-voltage variation can be used to allow some variation in the performance and energy consumption of the little core 55.
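This staged policy could be expressed, as a sketch only, along the following lines; the enum values, breakpoints and required_performance() helper are invented for illustration:

```c
#include <stdint.h>

typedef enum { TWO_BIG, ONE_BIG, ONE_LITTLE } core_config;

extern uint32_t required_performance(void);  /* hypothetical demand metric */

/* Illustrative breakpoints along line 600 of Figure 7. */
#define ONE_BIG_MAX  70   /* above this, both big cores are needed       */
#define LITTLE_MAX   30   /* at or below this, the little core 55
                             suffices; crossing this boundary is what
                             generates the transfer stimulus             */

core_config select_core_config(void)
{
    uint32_t perf = required_performance();
    if (perf > ONE_BIG_MAX)
        return TWO_BIG;     /* cores 15 and 20 in combination            */
    if (perf > LITTLE_MAX)
        return ONE_BIG;     /* no transfer stimulus: still circuitry 10  */
    return ONE_LITTLE;      /* workload transfer to core 55 of circuitry
                               50 via the handover mechanism             */
}
```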

Figures 8A and 8B respectively illustrate micro-architectural differences between a low-performance processor pipeline 800 and a high-performance processor pipeline 850 in accordance with one embodiment. The low-performance processor pipeline 800 of Figure 8A would be suitable for the little core 55 of Figure 1, whilst the high-performance processor pipeline 850 of Figure 8B would be suitable for the big cores 15, 20.

The low-performance processor pipeline 800 of Figure 8A includes a fetch stage 810 for fetching instructions from the memory 80, a decode stage 820 for decoding the fetched instructions, an issue stage 830 for issuing instructions for execution, and multiple execution pipelines, including an integer pipeline 840 for performing integer operations, a MAC pipeline 842 for performing multiply-accumulate operations, and a SIMD/FPU pipeline 844 for performing SIMD (single instruction, multiple data) or floating-point operations. In the low-performance processor pipeline 800, the issue stage 830 issues a single instruction at a time, and instructions are issued in the order in which they are fetched.

The high-performance processor pipeline 850 of Figure 8B includes a fetch stage 860 for fetching instructions from the memory 80, a decode stage 870 for decoding the fetched instructions, a rename stage 875 for renaming the registers specified in the decoded instructions, a dispatch stage 880 for dispatching instructions for execution, and a set of execution pipelines including two integer pipelines 890, 892, a MAC pipeline 894 and two SIMD/FPU pipelines 896, 898. In the high-performance processor pipeline 850, the dispatch stage 880 is a parallel issue stage which can issue multiple instructions to different ones of the pipelines 890, 892, 894, 896, 898 at the same time. The dispatch stage 880 may also issue instructions out of order. Unlike in the low-performance processor pipeline 800, the SIMD/FPU pipelines 896, 898 may be of variable length, meaning that operations proceeding through the SIMD/FPU pipelines 896, 898 can be arranged to skip certain stages. An advantage of this approach is that, where multiple execution pipelines each have different resources, there is no need to artificially lengthen the shortest pipeline to make it the same length as the longest pipeline; instead, logic is required to deal with the out-of-order nature of the results produced by the different pipelines (for example, to put everything back in order if a processing exception occurs).

The rename stage 875 is provided to map register specifiers, which are included in program instructions and identify particular architectural registers as seen from the point of view of the programmer's model, onto physical registers, which are the actual registers of the hardware platform. The rename stage 875 enables the microprocessor to provide a larger pool of physical registers than exists in the programmer's-model view of the microprocessor. This larger pool of physical registers is useful during out-of-order execution because it makes it possible to avoid pipeline hazards, such as write-after-write (WAW) hazards, by mapping the same architectural register specified in two or more different instructions onto two or more different physical registers, so that the different instructions can be executed concurrently. For further details of register renaming techniques, the reader is referred to the present applicant's US patent application US 2008/114966 and US patent 7590826.
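A heavily simplified sketch of the renaming idea follows; the table sizes and the naive free-register handling are invented for illustration and do not describe the actual design of the pipeline 850:

```c
#include <stdint.h>

#define ARCH_REGS  16   /* registers visible in the programmer's model  */
#define PHYS_REGS  64   /* larger physical pool behind the rename stage */

static uint8_t rename_map[ARCH_REGS]; /* architectural -> physical      */
static uint8_t next_free;             /* trivial allocator, sketch only */

/* Each instruction that writes architectural register 'arch_dst' is
 * given a fresh physical register; a later write to the same
 * architectural register gets a different physical register, so the
 * two instructions no longer conflict (avoiding the WAW hazard).
 * A real design would track a free list and reclaim registers at
 * instruction retirement, rather than allocating round-robin. */
uint8_t rename_dest(uint8_t arch_dst)
{
    uint8_t phys = next_free++ % PHYS_REGS;  /* naive allocation */
    rename_map[arch_dst] = phys;
    return phys;
}

/* Source operands simply read the current mapping. */
uint8_t rename_src(uint8_t arch_src)
{
    return rename_map[arch_src];
}
```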

The low-performance pipeline 800 and the high-performance pipeline 850 differ micro-architecturally in a number of ways. The micro-architectural differences may include:

a) the pipelines having different stages. For example, the high-performance pipeline 850 includes a rename stage 875 which is absent from the low-performance pipeline 800.

b) the pipeline stages having different capabilities. For example, the issue stage 830 of the low-performance pipeline 800 can only issue a single instruction at a time, whereas the dispatch stage 880 of the high-performance pipeline 850 can issue instructions in parallel. Parallel issue increases the throughput of the processing pipeline and hence improves performance.

c) the pipeline stages having different lengths. For example, the decode stage 870 of the high-performance pipeline 850 may include three sub-stages, whereas the decode stage 820 of the low-performance pipeline 800 may include a single stage. The longer a pipeline stage (the greater its number of sub-stages), the greater the number of instructions that can be 'in flight' at the same time, and accordingly the greater the operating frequency at which the pipeline can operate, resulting in a higher level of performance.

d) different numbers of execution pipelines (for example, the high-performance pipeline 850 has more execution pipelines than the low-performance pipeline 800). With more execution pipelines, more instructions can be processed in parallel, and so performance is increased.

e) providing in-order execution (as in the pipeline 800) or out-of-order execution (as in the pipeline 850). When instructions can be executed out of order, performance improves because the execution of instructions can be scheduled dynamically to optimise performance. For example, in the in-order low-performance pipeline 800, a sequence of MAC instructions must be executed one by one by the MAC pipeline 842 before a following instruction can be executed by one of the integer pipeline 840 and the SIMD/floating-point pipeline 844. By contrast, in the high-performance pipeline 850, the MAC instructions can be executed by the MAC pipeline 894 while (barring any inter-instruction data hazards which cannot be resolved by renaming) following instructions using a different execution pipeline 890, 892, 896, 898 can be executed in parallel with the MAC instructions. This means that out-of-order execution can improve processing performance.

These and other examples of micro-architectural differences result in the pipeline 850 providing higher-throughput processing than the pipeline 800. On the other hand, the same micro-architectural differences also cause the pipeline 850 to consume more energy than the pipeline 800. Accordingly, providing the micro-architecturally different pipelines 800, 850 enables the processing of the workload to be optimised either for high performance (by using the 'big' processing circuitry 10 containing the pipeline 850) or for energy efficiency (by using the 'little' processing circuitry 50 containing the low-performance pipeline 800).

Figure 9 is a graph illustrating how the energy consumption of the data processing system varies as the workload 100 is switched between the big processing circuitry 10 and the little processing circuitry 50.

At point A of Figure 9, the workload 100 is being performed on the little processing circuitry 50, and so the energy consumption is low. At point B a transfer stimulus occurs, indicating that high-intensity processing is about to be performed, and so the performance of the workload is handed over to the big processing circuitry 10. The energy consumption then rises and remains high at point C while the big processing circuitry 10 is performing the workload. At point D it is assumed that both big cores are operating in combination to process the workload. If, however, the performance requirement drops to a level at which the workload can be serviced by only one of the big cores, then the workload is transferred to just one of the big cores and the other is powered down, as indicated by the drop in energy to a level just before point E. However, at point E another transfer stimulus occurs (indicating that a return to low-intensity processing is required), triggering the transfer of the performance of the workload back to the little processing circuitry 50.

When the little processing circuitry 50 begins processing the workload, most of the big processing circuitry is in the power saving state, but the cache of the big processing circuitry 10 remains powered during the snooping period (point F of Figure 9), to enable the data stored in that cache to be retrieved for the little processing circuitry 50. Hence, the powered cache of the big processing circuitry 10 causes the energy consumption at point F to be higher than at point A, where only the little processing circuitry 50 is powered. At the end of the snooping period the cache of the big processing circuitry 10 is powered down, and at point G the energy consumption returns to a low level, with only the little processing circuitry 50 active.

As mentioned above, in Figure 9 more energy is consumed during the snooping period at point F than at point G, because the cache of the big processing circuitry remains powered during the snooping period. Although an increase in energy consumption is indicated only following the big-to-little transition, a snooping period may also exist following a little-to-big transition, during which the data stored in the cache of the little processing circuitry 50 can be snooped for the big processing circuitry 10 by the snoop control unit 75. The little-to-big snooping period is not indicated in Figure 9 because the energy consumed by keeping the cache of the little processing circuitry 50 in the powered state during the snooping period is insignificant compared with the energy consumed by the big processing circuitry 10 while performing the workload, so the very small increase in energy consumption due to powering the cache of the little processing circuitry 50 is not visible on the graph of Figure 9.

The above embodiments describe a system containing two or more architecturally compatible processor instances, with micro-architectures optimised for energy efficiency or for performance. The architectural state required by the operating system and the applications can be switched between the processor instances, depending on the required performance/energy level, to allow the entire workload to be switched between the processor instances. In one embodiment, only one of the processor instances is performing the workload at any given time, with the other processor instance being in a power saving state, or in the process of entering or leaving the power saving state.

In one embodiment, the processor instances may be arranged to be hardware cache coherent with one another in order to reduce the amount of time, energy and hardware complexity involved in switching the architectural state from the source processor to the destination processor. This reduces the time taken to perform the switching operation, which increases the range of situations in which the techniques of the embodiments can be used.

Such systems may be used in a variety of situations where energy efficiency is important for battery life and/or thermal management, and where the spread of performance requirements is such that a more energy-efficient processor can be used for lower-intensity workloads, whilst a higher-performance processor can be used for higher-intensity workloads.

Because the two or more processor instances are architecturally compatible, from an application's perspective the only difference between the two processors is the performance available. Via the techniques of one embodiment, all of the required architectural state can be moved between the processors without involving the operating system, so that which processor the operating system and the applications are running on is transparent to the operating system and to the applications running under it.

When architecturally compatible processor instances are used, as described in the above embodiments, the total amount of architectural state to be moved can readily fit within a data cache and, since cache coherency is commonly implemented in modern data processing systems, then by storing the architectural state to be switched within the data cache, the destination processor can quickly snoop that state in an energy-efficient manner using existing circuit structures.

In one embodiment, the switching mechanism is used to ensure that the thermal limits of the data processing system are not breached. In particular, as the thermal limits are approached, the entire workload can be switched to a more energy-efficient processor instance, allowing the overall system to cool whilst program execution continues, albeit at lower throughput.

Although particular embodiments have been described herein, it will be apparent that the invention is not limited thereto, and that many modifications and additions may be made within the scope of the invention. For example, features of the following dependent claims could be combined in various ways with features of the independent claims without departing from the scope of the present invention.

1. A data processing apparatus comprising:
first processing circuitry for performing data processing operations,
second processing circuitry for performing data processing operations,
the first processing circuitry being architecturally compatible with the second processing circuitry, such that a workload to be performed by the data processing apparatus can be performed on either the first processing circuitry or the second processing circuitry, said workload comprising at least one application and at least one operating system for running said at least one application,
the first processing circuitry being micro-architecturally different from the second processing circuitry, such that the performance of the first processing circuitry differs from the performance of the second processing circuitry,
the first and second processing circuitry being arranged such that the workload is performed by one of the first processing circuitry and the second processing circuitry at any point in time,
a switch controller, responsive to a transfer stimulus, to perform a handover operation to transfer performance of the workload from source processing circuitry to destination processing circuitry, the source processing circuitry being one of the first processing circuitry and the second processing circuitry, and the destination processing circuitry being the other of the first processing circuitry and the second processing circuitry,
the switch controller being arranged, during the handover operation:
(i) to cause the source processing circuitry to make its current architectural state available to the destination processing circuitry, the current architectural state being that state which is not available from shared memory shared between the first and second processing circuitry at the time the handover operation is initiated, and which is needed by the destination processing circuitry in order to successfully take over performance of the workload from the source processing circuitry, and
(ii) to mask predetermined processor-specific configuration information from said at least one operating system, such that the transfer of the workload is transparent to said at least one operating system.

2. The data processing apparatus according to claim 1, further comprising:
power control circuitry for independently controlling the power supplied to the first processing circuitry and the second processing circuitry,
wherein, prior to the transfer stimulus, the destination processing circuitry is in a power saving state, and during the handover operation the power control circuitry causes the destination processing circuitry to exit the power saving state before the destination processing circuitry takes over performance of the workload.

3. The data processing apparatus according to claim 2, wherein, following the handover operation, the power control circuitry causes the source processing circuitry to enter the power saving state.

4. The data processing apparatus according to any preceding claim, wherein, during the handover operation, the switch controller causes the source processing circuitry to employ an accelerated mechanism to make its current architectural state available to the destination processing circuitry, without the destination processing circuitry accessing the shared memory to obtain the current architectural state.

5. The data processing apparatus according to claim 4, wherein:
at least said source processing circuitry has an associated cache,
the data processing apparatus further comprises snoop control circuitry, and
the accelerated mechanism makes the current architectural state available to the destination processing circuitry using the associated cache of the source processing circuitry and said snoop control circuitry.

6. The data processing apparatus according to claim 5, wherein the accelerated mechanism is a save and restore mechanism which causes the source processing circuitry to save its current architectural state to its associated cache, and causes the destination processing circuitry to perform a restore operation whereby the snoop control circuitry retrieves the current architectural state from the associated cache of the source processing circuitry and provides the retrieved current architectural state to the destination processing circuitry.

7. The data processing apparatus according to claim 5, wherein the destination processing circuitry has an associated cache in which the transferred architectural state, as received by the snoop control circuitry, is stored for access by the destination processing circuitry.

8. The data processing apparatus according to claim 4, wherein the accelerated mechanism comprises a dedicated bus between the source processing circuitry and the destination processing circuitry, over which the source processing circuitry provides its current architectural state to the destination processing circuitry.

9. The data processing apparatus according to any of claims 1, 2 and 3, wherein the switch controller comprises at least virtualiser software logically separating said at least one operating system from the first processing circuitry and the second processing circuitry.

10. The data processing apparatus according to any of claims 1, 2 and 3, wherein the timing of the transfer stimulus is chosen so as to improve the energy efficiency of the data processing apparatus.

11. The data processing apparatus according to any of claims 1, 2 and 3, wherein said architectural state comprises at least the current values of one or more special-purpose registers of the source processing circuitry, including a program counter.

12. The data processing apparatus according to claim 11, wherein said architectural state further comprises the current values stored in an architectural register file of the source processing circuitry.

13. The data processing apparatus according to any of claims 1, 2 and 3, wherein at least one of the first processing circuitry and the second processing circuitry comprises a single processing unit.

14. The data processing apparatus according to any of claims 1, 2 and 3, wherein at least one of the first processing circuitry and the second processing circuitry comprises a cluster of processing units having the same micro-architecture.

15. The data processing apparatus according to claim 2, wherein said power saving state is one of:
a power-off state,
a partial/full data retention state,
a sleep state, or
an idle state.

16. The data processing apparatus according to any of claims 1, 2, 3 and 15, wherein the first processing circuitry and the second processing circuitry differ micro-architecturally by having at least one of:
different execution pipeline lengths, or
different execution resources.

17. The data processing apparatus according to any of claims 1, 2, 3 and 15, wherein the source processing circuitry has higher performance than the destination processing circuitry, and the data processing apparatus further comprises:
thermal monitoring circuitry for monitoring the thermal output of the source processing circuitry, and for triggering said transfer stimulus when said thermal output reaches a predetermined level.

18. The data processing apparatus according to any of claims 1, 2, 3 and 15, wherein the first processing circuitry and the second processing circuitry reside within a single integrated circuit.

19. A data processing apparatus comprising:
first processing means for performing data processing operations,
second processing means for performing data processing operations,
the first processing means being architecturally compatible with the second processing means, such that a workload to be performed by the data processing apparatus can be performed on either the first processing means or the second processing means, said workload comprising at least one application and at least one operating system for running said at least one application,
the first processing means being micro-architecturally different from the second processing means, such that the performance of the first processing means differs from the performance of the second processing means,
the first and second processing means being arranged such that the workload is performed by one of the first processing means and the second processing means at any point in time,
transfer control means, responsive to a transfer stimulus, for performing a handover operation to transfer performance of the workload from source processing means to destination processing means, the source processing means being one of the first processing means and the second processing means, and the destination processing means being the other of the first processing means and the second processing means,
the transfer control means being arranged, during the handover operation:
(i) to cause the source processing means to make its current architectural state available to the destination processing means, the current architectural state being that state which is not available from shared memory means shared between the first and second processing means at the time the handover operation is initiated, and which is needed by the destination processing means in order to successfully take over performance of the workload from the source processing means, and
(ii) to mask predetermined processor-specific configuration information from said at least one operating system, such that the transfer of the workload is transparent to said at least one operating system.

20. A method of operating a data processing apparatus comprising first processing circuitry for performing data processing operations and second processing circuitry for performing data processing operations, the first processing circuitry being architecturally compatible with the second processing circuitry, such that a workload to be performed by the data processing apparatus can be performed on either the first processing circuitry or the second processing circuitry, said workload comprising at least one application and at least one operating system for running said at least one application, and the first processing circuitry being micro-architecturally different from the second processing circuitry, such that the performance of the first processing circuitry differs from the performance of the second processing circuitry, the method comprising the steps of:
performing, at any point in time, the workload on one of the first processing circuitry and the second processing circuitry,
performing, in response to a transfer stimulus, a handover operation to transfer performance of the workload from source processing circuitry to destination processing circuitry, the source processing circuitry being one of the first processing circuitry and the second processing circuitry, and the destination processing circuitry being the other of the first processing circuitry and the second processing circuitry,
during the handover operation:
(i) causing the source processing circuitry to make its current architectural state available to the destination processing circuitry, the current architectural state being that state which is not available from shared memory shared between the first and second processing circuitry at the time the handover operation is initiated, and which is needed by the destination processing circuitry in order to successfully take over performance of the workload from the source processing circuitry, and
(ii) masking predetermined processor-specific configuration information from said at least one operating system, such that the transfer of the workload is transparent to said at least one operating system.

 
