Caching apparatus, method and system (Russian patent RU 2483347)
FIELD: information technology.

SUBSTANCE: the caching apparatus has: a cache memory for storing one or more entries, wherein each entry corresponds to an input/output memory access request and contains a guest physical address (GPA) corresponding to that request together with the corresponding host physical address (HPA); and first logic circuitry which receives a first input/output memory access request from a terminal device and determines that the request includes a future-access hint associated with an address, wherein the future-access hint indicates to the host whether the address may be accessed again in the future. Entries in the cache memory which do not contain a hint corresponding to previous input/output memory access requests are to be replaced earlier than entries which contain a hint. The first logic circuitry updates one or more bits corresponding to the address, both in the cache-memory entry and in the entry of the input/output translation look-aside buffer (IOTLB), in response to determining that the first input/output memory access request includes a future-access hint.

EFFECT: improved address-translation caching during virtualisation for directed input/output.

19 cl, 4 dwg
BACKGROUND OF THE INVENTION

The present disclosure generally relates to the field of electronics. More specifically, example embodiments of the invention relate to improved caching of address translations and/or to characteristics of the I/O cache memory in a virtualized environment. I/O virtualization technology was developed to ensure efficient operation of I/O devices in a virtualized environment. In general, a virtualized environment is one in which multiple operating systems (OS) can run simultaneously. In some embodiments, hardware may be used to improve the efficiency of I/O virtualization. However, such an embodiment may require a relatively large number of transistor switches, which in turn makes the device more expensive and/or more difficult to implement.

BRIEF DESCRIPTION OF THE DRAWINGS

The detailed description references the accompanying drawings. In the drawings, the leftmost digit of a reference number identifies the figure in which that number first appears. Use of the same reference number in different figures indicates similar or identical components. Figures 1-3 illustrate block diagrams of example embodiments of computing systems which may be used to implement the various embodiments of the invention discussed here. Figure 4 illustrates a flow diagram of a method according to one example embodiment of the invention.

DETAILED DESCRIPTION

The following description sets out numerous details in order to ensure a full understanding of the various example embodiments of the invention. However, some example embodiments can be practiced without these specific details. In other instances, well-known methods, procedures, components and circuits are not described in detail so as not to obscure the invention.
Various aspects of example embodiments of the invention can be implemented using various means, such as semiconductor integrated circuits (hardware), machine-readable instructions organized in the form of one or more programs (software), or a combination of hardware and software. A reference to "logic" shall mean hardware, software or some combination thereof. Some of the example embodiments discussed here can improve the caching of address translations (with virtualization for directed I/O (VT-d)) and/or the I/O cache memory in virtualized environments. More specifically, some example embodiments can be implemented in hardware structures which are used to translate guest physical addresses (GPA) into host physical addresses (HPA). Accordingly, such structures can support caching of GPA-to-HPA translations, for example in the form of an input/output translation look-aside buffer (IOTLB). In some of these example embodiments, the cache structure can provide lower latency for requests that target the same address translation. In addition, some of these techniques can be used in various types of computing environments, such as those discussed with reference to figures 1-4.

Figure 1 shows the block diagram of a computer system 100 according to one example embodiment of the invention. System 100 may include one or more agents 102-1 to 102-M (collectively referred to here as "agents 102", or in a general sense "agent 102"). In example embodiments, the agents 102 may be components of a computer system, for example the systems discussed with reference to figures 2-4. As shown in figure 1, the agents 102 may communicate through the network fabric 104. In one example embodiment, the network fabric 104 may include a computer network that allows different agents (such as computing devices) to transmit data.
In one example embodiment, the network fabric 104 may include one or more interconnects (or interconnect networks) over which agents communicate through a serial (for example, point-to-point) communication line and/or through a shared network connection. In particular, some example embodiments can facilitate debugging of components or test lines that communicate with fully buffered dual in-line memory modules (FBD), for example where the FBD line is a serial communication line used for connecting memory modules to a host controller (such as a processor or memory hub). Debug information can be transferred from the host over the FBD line so that the debug information can be observed along the line by traffic-validation appliances (for example, one or more logic analyzers). In one example embodiment, the system 100 can support a multilayer protocol scheme, which may include a physical layer, a link layer, a routing layer, a transport layer and/or a protocol layer. The fabric 104 may also facilitate the transfer of data (e.g. packets) from one protocol (e.g., a processor cache or cache memory) to another protocol over a point-to-point network or a shared network. In addition, in some example embodiments the network fabric 104 may provide connectivity to one or more cache-coherent memory devices according to the relevant protocols. In addition, as shown by the arrows in figure 1, the agents 102 may transmit and/or receive data via the network fabric 104. Some agents may use a unidirectional communication line, while others may use a bidirectional communication line. For example, one or more agents (such as agent 102-M) can transmit data (e.g., via a unidirectional communication line 106), other agents (such as agent 102-2) may receive data (e.g., via a unidirectional communication line 108), while some agents (such as agent 102-1) can both send and receive data (e.g., via a bidirectional communication line 110).
Figure 2 presents the block diagram of part of a computer system 200 according to one example embodiment of the invention. In this example embodiment, various components of the system 200 can be embodied as one or more of the agents 102-1 to 102-M discussed with reference to figure 1. Further details of some of the operations of the computing system 200 will be discussed with reference to figure 4. The system 200 can include one or more processors 202-1 to 202-N (collectively referred to here as "processors 202", or in a general sense "processor 202"). Each of the processors 202-1 to 202-N may include various components, such as private or shared cache memory, execution units, one or more cores, etc. In addition, each of the processors 202 may have access to memory 204 (e.g., to memory 204-1 to 204-N). In addition, the system 200 may further include a system memory 206, which may be shared between the various components of the system 200, including, for example, one or more processors 202, the uncore part of the processor or the chipset (CS) 208, or components connected to the chipset 208, etc. One or more memory modules 204 and/or 206 can be used to store one or more operating systems. Thus, the system 200 in some example embodiments can run multiple operating systems (in particular, at the same time). As shown in figure 2, the uncore portion 208 may include various components such as the cache memory RC 210 (which may be shared among the various components of a computing system such as system 200). In some example embodiments, the cache memory RC 210 may be present in the memory controller hub (MCH) and/or in the graphics part of the MCH (GMCH) of a chipset or uncore part (for example, CS/uncore 208).
The cache memory RC 210 may communicate with other components over the data path 212 (which may include additional interconnect circuitry 214, for example in order to facilitate communication between one or more processor cores 202 and other components of the system 200). The system 200 may additionally include prefetch logic 216, for example to prefetch data (including instructions or microinstructions) from various locations (such as one or more memory modules 204, system memory 206, or other memory devices, including, for example, volatile or non-volatile storage devices, etc.) into the IOTLB block 220 (for example, through virtualization or translation logic 222-1 to 222-R (referred to here as the "logic circuits 222", or in a general sense "logic 222")). As shown in figure 2, in at least one example embodiment the data path 212 may be coupled to one or more input/output devices. Any type of input/output device can be used. For illustrative purposes, in the example embodiment shown in figure 2, the input/output devices include one or more devices 224-1 to 224-P (collectively referred to here as the "target devices 224", or in a general sense "terminal device 224"). A target device 224 may be a peripheral component interconnect (PCI) device in one example implementation. For example, the target device 224 can communicate with the CS/uncore 208 through a local PCI bus connection protocol, in accordance with Revision 3.0 dated March 9, 2004, published by the PCI Special Interest Group, Portland, Oregon, USA (hereinafter referred to as "PCI"). Alternatively, the revised PCI-X Addendum, hereafter referred to as "PCI-X" (2 October 2006), published by the above PCI Special Interest Group, Portland, Oregon, USA, may be used.
In addition, other peripheral devices coupled to the CS/uncore 208 may, in various example embodiments of the invention, include an integrated drive electronics (IDE) controller or a small computer system interface (SCSI) hard disk, a universal serial bus (USB) port, a keyboard, a mouse, parallel ports, serial ports, floppy drives, digital output support (for example, in the form of a digital visual interface (DVI)), etc. As shown in figure 2, the target devices 224 can communicate through the root ports 226-1 to 226-P (collectively referred to here as "ports 226", or in a general sense "port 226") with other components of the system 200, such as the logic circuits 222. In one example embodiment, the logic 222 may perform address translation for virtualized environments, in particular the translation of virtual addresses into physical addresses, for example via the IOTLB 220. A physical address may correspond to a location (e.g., an entry) in system memory 206. The logic 222 may additionally perform other actions, such as those discussed with reference to figures 3 and 4, which may include translating GPA and HPA entries in a memory device coupled to the system 200 and/or 300 (such as the system memory 206). In addition, the logic 222 may be a root complex in accordance with the PCI specification. Furthermore, the processor 202 may be a processor of any type, such as a general-purpose processor, a network processor (which can handle data from a computer network 250) and so on (including a reduced instruction set computer (RISC) processor or a complex instruction set computer (CISC) processor). In addition, the processor 202 may have a single-core or multi-core design. A processor 202 with a multi-core design can include different types of processor cores on the same integrated circuit (IC). In addition, a multi-core processor 202 can be implemented as a symmetric or asymmetric multiprocessor.
Also, as shown in figure 2, at least one or more target devices 224 in one example embodiment can be connected to the network 250. Further, the processors 202 may include one or more cache memory devices (not shown), which may be private and/or shared in various example embodiments. In general, a cache memory stores data corresponding to original data stored elsewhere or computed previously. To reduce memory access latency, once data is stored in the cache, future use can be based on accessing the cached copy instead of re-fetching or re-computing the original data. The cache memories discussed here, including, for example, the cache memory RC 210, the IOTLB 220, combinations thereof, etc., may be any type of cache, such as a level-1 (L1) cache, a level-2 (L2) cache, a level-3 (L3) cache, a mid-level cache, a last-level cache (LLC), combinations thereof, etc., for storing electronic data (e.g., instructions) used by one or more components of the system 200. The example embodiments of systems 200 and/or 300 may also include other devices, such as one or more displays (e.g., connected to the CS/uncore 208 and used to display images), audio devices (e.g., connected to the CS/uncore 208 to process audio signals), etc. In some example embodiments, such devices can be implemented as target devices 224 (which can communicate with the CS/uncore 208 through the root ports 226). Figure 3 shows the block diagram of part of a computer system 300 according to another example embodiment of the invention. In this example embodiment, various components of the system 300 may be integrated into one of the agents 102-1 to 102-M discussed with reference to figure 1. A detailed description of some of the actions of the computing system 300 will be given here with reference to figure 4.
As shown in figure 3, the system 300 can include one or more of the processors 202, the memory blocks 204, the system memory 206, the cache memory RC 210, the data path 212, the additional interconnect circuitry 214, the prefetch logic 216, the IOTLB 220, the logic 222, the target devices 224 and the root ports 226. In addition, as shown in the drawing, in one example embodiment the cache memory RC 210 and the IOTLB 220 may be combined into a single cache module. Figure 4 shows the flow diagram of the operations of a process 400 for updating information stored in the I/O cache memory and improving the caching of address translations and/or the I/O cache memory in virtualized environments, according to one example embodiment. In one example embodiment, the various components discussed with reference to figures 1-3 are used to explain one or more of the stages discussed with reference to figure 4. Referring to figures 1-4, stage 402 of the method 400 begins with receiving a memory access request. For example, a memory access request (e.g., a read or write access) may be generated by one of the terminal devices 224 and received by the corresponding virtualization logic 222 through one of the ports 226 at stage 402. At stage 404 it may be determined whether an entry corresponding to the access request exists in the cache memory. In one example embodiment, the virtualization logic 222 may access the IOTLB 220, the cache memory RC 210 and/or their combination (such as shown in figure 3) at stage 404. If a corresponding entry does not exist, data can be recorded in the cache memory at stage 406 (for example, by the virtualization logic 222 and/or the prefetch logic 216). In one example embodiment, the relevant data can be transferred to the cache memory by the logic 216 prior to stage 402. In one example embodiment, a prefetch request is issued by one of the terminal devices 224, which performs the prefetch and maintains a coherent copy of the destination address.
These prefetch requests can also refresh the IOTLB 220, the cache memory RC 210 and/or their combination; the entries are allocated and cached until the request is sent to the device. The ASN settings determine whether an entry in the IOTLB 220, the cache memory RC 210 and/or their combination should be retained or tagged for replacement. At stage 408 it may be determined (for example, by the virtualization logic 222) whether the memory access request contains a hint (in the form of one or more hints within the memory access request). If there is no hint, the memory access request can be processed at stage 410, for example by translating the address between HPA and GPA and/or between physical and virtual addresses in an entry of the IOTLB 220, the cache memory RC 210 and/or their combination. In one example embodiment, the caching of address translations and/or the operation of the I/O cache memory in virtualized environments can be improved based on hints carried in I/O device traffic (which may also be referred to as access control hints (ASN)). The ASN block (or blocks) may be provided by an I/O device (for example, one of the terminal devices 224) in a memory request (e.g., over the PCI bus) to indicate whether that device will access the same address again. Accordingly, at stage 412 it can be determined whether the hint indicates a future access to the same address. This information may be stored as one or more bits of the corresponding entry in the cache (for example, an entry in the IOTLB 220, the cache memory RC 210 and/or their combination), where it is useful to the cache line replacement algorithm: for example, cached translations whose planned-reuse bits are not set (or are cleared, depending on the implementation) could be candidates for replacement. In one example embodiment, the logic 222 may perform the operation of stage 412. If no future access is indicated, the method 400 resumes from stage 410.
Otherwise, the corresponding entry information can be updated at stage 414 (e.g., one or more bits of the corresponding entry in the IOTLB 220, the cache memory RC 210 and/or their combination can be updated by the corresponding logic 222). After stage 414, the method 400 returns to stage 410. In some example embodiments, consolidating the IOTLB 220 and cache memory RC 210 structures into a combined IOTLB-and-RC cache structure (also called here the I/O cache memory) can improve performance (for example, by reducing latency for I/O transfers) and/or provide more efficient use of chip area or silicon (for example, by reducing the total number of gates). In one example, an application request generated by the processor (for example, one or more processors 202) will look up the contents of the cache memory RC 210 (or the combined I/O cache memory) using the physical address, whereas I/O accesses will look up the address in the cache memory RC 210 (or the combined I/O cache memory) on the basis of the GPA. In some example embodiments, various cache replacement algorithms can be applied to the cache memory RC 210, the IOTLB 220 and/or their combination. For example, the replacement algorithm may be a random replacement algorithm, or in another case a least recently used (LRU) algorithm. Accordingly, in some example embodiments, address translation latency and/or the latency associated with servicing I/O requests can be reduced. In addition, consolidating the storage structures (e.g., address or data) used for the cache RC 210 and the IOTLB 220 (for example, into one I/O cache memory module) can lead to improved silicon efficiency (for example, by reducing the number of transistor switches).
In various example embodiments of the invention, the operations discussed here, for example with reference to figures 1-4, may be implemented using hardware (e.g., circuits), software, programmable hardware, a set of microinstructions, or combinations thereof, which may be provided as a computer program product, for example including a machine-readable medium that stores the instructions (or software) used for programming a computer to perform a process discussed here. In addition, the term "logic" may include, for example, software, hardware, or a combination of software and hardware. The machine-readable medium may include a storage device similar to the devices discussed here. For example, a storage device discussed here may include volatile and/or non-volatile memory (or storage). Non-volatile memory may include one of the following: read-only memory (ROM), programmable ROM (PROM), erasable PROM (EPROM), electrically erasable programmable read-only memory (EEPROM), a disk drive, a floppy disk, a compact disc ROM (CD-ROM), a digital versatile disk (DVD), flash memory, a magneto-optical disk, or other types of non-volatile machine-readable media capable of storing electronic data (e.g., instructions). Volatile storage (or memory) may include devices such as random-access memory (RAM), dynamic RAM (DRAM), synchronous dynamic RAM (SDRAM), static RAM (SRAM), etc. Additionally, such a computer-readable medium may be downloaded as a computer program product, where the program may be transferred from a remote computer (e.g., a server) to a requesting computer (e.g., a client) by means of data signals embodied in a carrier wave or another medium via a communication line (for example, a bus, a modem or a network connection).
Reference in the description to "one example embodiment" or "an example embodiment" means that a particular feature, structure or characteristic described in connection with the example embodiment may be included in at least one implementation. The appearance of the phrase "in one embodiment" in various places in the description may or may not refer to the same example implementation. In addition, in the description and the claims the terms "connected" and "coupled", along with their derivatives, may be used. In some example embodiments of the invention, the term "connected" may mean that two or more elements are in direct physical or electrical contact with each other. The term "coupled" may mean that two or more elements are in direct physical or electrical contact. However, "coupled" may also mean that two or more elements are not in direct contact with each other, but may still interact with each other. Thus, although example embodiments of the invention have been described in language specific to structural features and/or methodological acts, it should be understood that the claimed subject matter is not necessarily limited to the specific features or acts described. Rather, the specific features and acts are disclosed as typical forms of implementing the claimed subject matter.

1. A caching device, containing:

2. The device according to claim 1, in which the target device must generate a request to access the memory.

3. The device according to claim 1, additionally containing prefetch logic for prefetching data into the cache memory in response to a request issued by the target device.

4. The device according to claim 1, wherein the target device includes a peripheral component interconnect (PCI) device.

5. The device according to claim 1, in which one or more circuits of the first logic, one or more cores of the processor, or the cache memory are in the same integrated circuit chip.

6.
The device according to claim 1, wherein the cache memory includes one or more root complex cache memory blocks, an input/output translation look-aside buffer (IOTLB), or combinations thereof.

7. The device according to claim 1, in which the cache memory is a shared or a private cache.

8. The device according to claim 1, in which the cache memory contains one or more of a level-1 (L1) cache, a level-2 (L2) cache, a level-3 (L3) cache, a mid-level cache, a last-level cache (LLC), or a combination thereof.

9. The device according to claim 1, additionally containing a root port used to connect the first logic to the target device.

10. The device according to claim 1, in which the cache memory must combine the root complex cache memory block and the input/output translation look-aside buffer (IOTLB).

11. The device according to claim 1, in which an application request formed by the processor must be looked up in the cache memory using the physical address, and an input/output memory access request must be looked up in the cache memory using the guest physical address (GPA).

12. A caching method, containing the following stages:

13. The method according to claim 12, additionally containing translating an address corresponding to the first input/output memory access.

14. The method according to claim 12, in which a lookup executed by the processor must be performed in the cache memory using the physical address, and an input/output memory access request must be looked up in the cache memory using the guest physical address (GPA).

15. A caching system, comprising:

16. The system of claim 15, in which the target device must generate a request to access the memory.

17. The system of claim 15, further containing prefetch logic for prefetching data into the cache memory in response to a request issued by the target device.

18. The system of claim 15, in which the target device includes a peripheral component interconnect (PCI) device.

19.
The system of claim 15, further containing a display connected to the uncore that contains the cache memory.