What Is Cache Coherence?
Cache coherence (also called cache consistency) is the property of a hierarchical memory system that the data held in the caches remains consistent with the data in main memory. When many different devices share a common memory resource, inconsistent copies of the same data in different caches cause correctness problems. This problem is especially prone to occur in multiprocessor systems with several CPUs, each with its own private cache.
- Intuitively, coherence means that every read of a memory location must return the value most recently written to that location. For the operations issued by each processor, there is a single serial order of operations that any coherent memory system should appear to follow. This leads to a more formal definition: a memory system is coherent if, for every execution of a program, it is possible to construct a hypothetical serial order of all read and write operations (a total order consistent with the execution result) such that (1) the operations issued by any particular processor appear in this order in the same order in which that processor issued them to the memory system, and (2) every read returns the value written by the last write to the corresponding location in the serial order.
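The two conditions in this definition can be turned into a small checker: given a candidate serial order of all operations, verify that it respects each processor's program order and that every read returns the most recent preceding write to the same location. This is an illustrative sketch, not from the text; the operation-tuple format and function name are assumptions, and memory is assumed to hold 0 initially.

```python
# Sketch: check whether a candidate serial order of memory operations
# satisfies the two coherence conditions above.
# Each operation is a tuple: (proc_id, seq_no, op, addr, value).

def is_coherent_order(serial_order):
    # Condition 1: each processor's operations appear in program order.
    # We assume each processor tags its ops with an increasing sequence number.
    last_seq = {}
    for proc, seq, op, addr, val in serial_order:
        if proc in last_seq and seq <= last_seq[proc]:
            return False
        last_seq[proc] = seq
    # Condition 2: every read returns the last value written to its address.
    mem = {}  # assume every address initially holds 0
    for proc, seq, op, addr, val in serial_order:
        if op == "W":
            mem[addr] = val
        elif op == "R" and mem.get(addr, 0) != val:
            return False
    return True

# A legal serial order: P0 writes x=1, then P1 reads x and sees 1.
ok = is_coherent_order([
    (0, 0, "W", "x", 1),
    (1, 0, "R", "x", 1),
])
# An illegal one: P1 reads x=2, a value that was never written.
bad = is_coherent_order([
    (0, 0, "W", "x", 1),
    (1, 0, "R", "x", 2),
])
print(ok, bad)  # True False
```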
- Multiple private caches raise a further question: if one processor writes data to a location and another processor then reads it, when does the write become visible? Coherence guarantees that a written value eventually becomes visible to all readers, but it does not say when. When writing a parallel program, we generally want to establish an order between writes and reads; that is, we need an ordering model from which the programmer can reason about the execution result of the program and its correctness. This model is the memory consistency model.
- A complete model of shared memory therefore has two complementary parts: cache coherence and memory consistency. Cache coherence defines the behavior of reads and writes to the same memory address, while the memory consistency model defines the behavior of reads and writes across all memory addresses. In a shared address space, multiple processes perform concurrent reads and writes to different memory locations, and each process observes some order in which these operations complete. A memory consistency model places constraints on that order. It is worth noting that the concurrent operations involved may target the same location or different locations, and may come from the same process or from different processes. In this sense, memory consistency subsumes cache coherence.
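The distinction can be seen in the classic "message passing" litmus test, which involves two different addresses and is therefore a consistency question, not a coherence question. The sketch below enumerates every sequentially consistent interleaving of a producer and a consumer; the thread programs and helper names are illustrative assumptions.

```python
# Sketch: message-passing litmus test under sequential consistency.
# P0 writes data, then sets a flag; P1 reads the flag, then reads data.
# Under any interleaving that preserves each thread's program order,
# the outcome (flag == 1, data == 0) never appears.

from itertools import combinations

P0 = [("W", "data", 1), ("W", "flag", 1)]        # producer
P1 = [("R", "flag", None), ("R", "data", None)]  # consumer

def interleavings(a, b):
    # All merges of a and b that preserve each thread's program order.
    n = len(a) + len(b)
    for slots in combinations(range(n), len(a)):
        merged, ai, bi = [], 0, 0
        for i in range(n):
            if i in slots:
                merged.append(a[ai]); ai += 1
            else:
                merged.append(b[bi]); bi += 1
        yield merged

outcomes = set()
for order in interleavings(P0, P1):
    mem, reads = {"data": 0, "flag": 0}, []
    for op, addr, val in order:
        if op == "W":
            mem[addr] = val
        else:
            reads.append(mem[addr])
    outcomes.add(tuple(reads))  # (flag_read, data_read)

print(sorted(outcomes))  # [(0, 0), (0, 1), (1, 1)] — (1, 0) never occurs
```

A weaker consistency model could permit the forbidden outcome (1, 0) even on a fully coherent system, which is why programmers must reason about the consistency model and not only about coherence.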
- Cache coherence protocols are the main solution to the coherence problem and an important means of enforcing the memory consistency model. A protocol defines the forms in which a shared cache block may exist in each private cache and specifies in detail the communication between private caches. Academia and industry have proposed many coherence protocol models, but all of them share the same starting point: maintaining the SWMR (single-writer, multiple-reader) invariant. For any given cache block, at any moment during system operation, either (1) exactly one processor core holds write access to the block, or (2) zero or more processor cores hold read access to it. Based on how they handle modifications to shared data, implementations fall into two families: write-invalidate protocols and write-update protocols. In a write-invalidate protocol, a processor core must first acquire read-write permission for a memory block before writing to it. If two or more cores attempt to write the same data item at the same time, only one can proceed and the other requests are blocked; when a core performs a write, all other copies of the data in private caches are invalidated. After the writing core completes the write, all subsequent operations on this data must first obtain a copy of the newly written value. The write-invalidate protocol therefore enforces serialization of write operations.
The write-update protocol, also called the write-broadcast protocol, instead has the writing core simultaneously update the copies of the data held in all other caches whenever it performs a write.
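The two write policies can be contrasted on a toy model of private caches. The class and function names below are illustrative assumptions, and memory is modeled as write-through for simplicity; real protocols operate on cache lines and bus messages, not Python dictionaries.

```python
# Sketch: write-invalidate vs. write-update on a toy set of private caches.

class ToyCache:
    def __init__(self):
        self.data = {}  # addr -> value; presence means the block is cached

    def load(self, addr, memory):
        self.data[addr] = memory.get(addr, 0)
        return self.data[addr]

def write_invalidate(caches, writer, addr, value, memory):
    # The writer gains exclusive access; every other copy is invalidated.
    for i, c in enumerate(caches):
        if i != writer:
            c.data.pop(addr, None)  # invalidate other copies
    caches[writer].data[addr] = value
    memory[addr] = value            # assume write-through for simplicity

def write_update(caches, writer, addr, value, memory):
    # The writer broadcasts the new value; every existing copy is updated.
    for c in caches:
        if addr in c.data:
            c.data[addr] = value
    caches[writer].data[addr] = value
    memory[addr] = value

memory = {"x": 0}
caches = [ToyCache(), ToyCache()]
caches[0].load("x", memory)
caches[1].load("x", memory)

write_invalidate(caches, 0, "x", 1, memory)
invalidated = "x" not in caches[1].data  # core 1 lost its copy
caches[1].load("x", memory)              # core 1 must re-fetch the new value

write_update(caches, 0, "x", 2, memory)
updated = caches[1].data["x"]            # core 1's copy was updated in place
print(invalidated, updated)  # True 2
```

The trade-off mirrors the text: invalidation forces readers to re-fetch (serializing writes cheaply), while updating broadcasts every written value to all sharers.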
- There are two main families of cache coherence protocols: snooping-based protocols and directory-based protocols. A snooping protocol relies on a bus or bus-like interconnect: every request issued by the private cache of a single processor core is broadcast over this network to the private caches of all other cores in the system, and the bus also serializes the accesses of all processors, satisfying the ordering requirements of the coherence model and the memory consistency model. The bus structure further arbitrates conflicting requests for the same data block, and the private caches of multiple processors can communicate directly over it, reducing communication latency. However, because every request is transmitted over the bus and bus bandwidth is limited, snooping limits the scalability of the whole system. A directory protocol instead manages cache blocks through a directory structure. A fetch request from a core's private cache is first sent to the directory entry that owns the corresponding cache block; this entry records which caches currently share the block. Based on the recorded state of the block, the directory controller either responds to the request itself or forwards it to the appropriate private caches. This method does not depend on a specific network topology and reduces bandwidth consumption through point-to-point communication, so the protocol scales well. On the other hand, every request must be processed through the directory structure, which introduces additional latency [2].
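A minimal sketch of the directory's decision logic: for each block it tracks a state and a set of sharers, and decides whether to answer a request from memory, forward it to a dirty owner, or invalidate sharers. The class layout and action tuples are illustrative assumptions; real directory protocols handle many more transient states and races.

```python
# Sketch: a toy directory that records, per block, its state and sharers,
# and either answers a request or forwards it to the owning cache.

class Directory:
    def __init__(self):
        self.entries = {}  # addr -> {"state": "I"/"S"/"M", "sharers": set()}

    def entry(self, addr):
        return self.entries.setdefault(addr, {"state": "I", "sharers": set()})

    def read_request(self, core, addr):
        e = self.entry(addr)
        if e["state"] == "M":
            # One core holds a dirty copy: forward the request to that owner.
            (owner,) = e["sharers"]
            action = ("forward_to", owner)
        else:
            # Block is uncached or clean: the directory answers from memory.
            action = ("reply_from_memory",)
        e["state"] = "S"
        e["sharers"].add(core)
        return action

    def write_request(self, core, addr):
        e = self.entry(addr)
        # Invalidate every other sharer, then grant exclusive ownership.
        invalidated = e["sharers"] - {core}
        e["state"], e["sharers"] = "M", {core}
        return ("invalidate", invalidated)

d = Directory()
a1 = d.read_request(0, "x")   # ('reply_from_memory',)
a2 = d.read_request(1, "x")   # ('reply_from_memory',)
a3 = d.write_request(1, "x")  # ('invalidate', {0})
a4 = d.read_request(0, "x")   # ('forward_to', 1): core 1 holds the dirty copy
print(a1, a2, a3, a4)
```

Because the directory already knows the sharers, invalidations and forwards go point-to-point instead of being broadcast, which is the scalability advantage described above.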
- In a single-core cache, each cache line carries two flag bits, a dirty flag and a valid flag, which describe the relationship between the cached data and memory (whether the data is valid, and whether it has been modified). In a multi-core processor, multiple cores may share data, so the MESI protocol adds a description of this shared state.
- In the MESI protocol, each cache line is in one of four states, which can be encoded in 2 bits:
- M (Modified): the line is valid, the data has been modified and is inconsistent with memory, and the data exists only in this cache.
- E (Exclusive): the line is valid, the data is consistent with memory, and the data exists only in this cache.
- S (Shared): the line is valid, the data is consistent with memory, and the data may exist in multiple caches.
- I (Invalid): the line is invalid.
- Under this protocol, although each cache controller snoops the system bus at all times, the only events it must react to are read misses, write misses, and write hits to shared lines. A valid line whose address is snooped in a read must transition to the S state and signal a snoop hit; an M-state line must first be written back to main memory. A valid line whose address is snooped in a write must transition to the I state; an M-state line must first write its data back to main memory upon receiving an RWITM (read-with-intent-to-modify) request. In short, the snooping logic is not complicated and the added bus traffic is modest, yet the MESI protocol effectively keeps the dirty copies of a main-memory block consistent across multiple caches, and writes them back in time to preserve the correctness of cache and main-memory accesses [3].
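The transitions described above can be collected into a small state table for a single cache line, driven by local processor events and snooped bus events. The table encoding and event names are an illustrative sketch (e.g., I → E on a local read assumes no other cache holds the line); write-backs are noted in comments rather than modeled.

```python
# Sketch: MESI state of one cache line in one cache, reacting to local
# accesses and snooped bus events.

MESI = {
    # (state, event) -> next state
    ("I", "local_read"):  "E",  # assumes no other cache holds the line
    ("I", "local_write"): "M",
    ("E", "local_write"): "M",  # silent upgrade: no bus traffic needed
    ("E", "snoop_read"):  "S",
    ("S", "local_write"): "M",  # invalidation broadcast on the bus
    ("S", "snoop_write"): "I",
    ("M", "snoop_read"):  "S",  # dirty line must be written back first
    ("M", "snoop_write"): "I",  # RWITM: write back, then invalidate
}

def step(state, event):
    # Unlisted (state, event) pairs leave the state unchanged,
    # e.g. a local read hit in E, S, or M.
    return MESI.get((state, event), state)

# A line is fetched, shared with another core, then written locally:
s = step("I", "local_read")  # -> E
s = step(s, "snoop_read")    # -> S (another core read the same line)
s = step(s, "local_write")   # -> M (other copies are invalidated)
print(s)  # M
```

Note how the E state pays off: a write to an E-state line upgrades to M without any bus transaction, whereas a write to an S-state line must first invalidate the other sharers.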