What is L3 Cache?

The third-level (L3) cache is designed to serve reads that miss in the second-level cache. In a CPU with an L3 cache, only about 5% of data has to be fetched from memory, which further improves CPU efficiency. Like any cache, it works by keeping a copy of data read from a slower storage device on a faster one. When that data needs to be read or written again, the operation completes on the fast device first, so the system responds faster.

When the CPU reads data, it first looks in the cache. If the data is found there, it is read immediately and passed to the CPU for processing. If it is not found, the data is read from memory and then passed to the CPU, which takes longer. The block containing that data is then stored in the cache, so future reads of the same data are served directly from the cache instead of going back to memory.
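The lookup described above can be sketched as a toy model. This is only an illustration, not how any real CPU is implemented: the class name `SimpleCache`, the block size and the dictionary standing in for memory are all assumptions, and the sketch never evicts anything, so it ignores the limited capacity of a real cache.

```python
# Toy model of a cache read: look in the cache first, fall back to memory
# on a miss, and keep the whole block so later reads are served quickly.
# BLOCK_SIZE and the dict-based "memory" are illustrative assumptions.

BLOCK_SIZE = 4  # addresses per block (hypothetical)


class SimpleCache:
    def __init__(self, memory):
        self.memory = memory  # slow backing store: address -> value
        self.blocks = {}      # cached blocks: block number -> list of values
        self.hits = 0
        self.misses = 0

    def read(self, address):
        block_no = address // BLOCK_SIZE
        if block_no in self.blocks:
            self.hits += 1
        else:
            # Miss: fetch the whole block from memory and keep a copy.
            self.misses += 1
            start = block_no * BLOCK_SIZE
            self.blocks[block_no] = [
                self.memory[a] for a in range(start, start + BLOCK_SIZE)
            ]
        return self.blocks[block_no][address % BLOCK_SIZE]


memory = {addr: addr * 10 for addr in range(16)}
cache = SimpleCache(memory)
values = [cache.read(a) for a in (0, 1, 2, 5, 0)]
print(values, "hits:", cache.hits, "misses:", cache.misses)
```

Reads of addresses 1 and 2 hit because the first read of address 0 already brought in their block, which is the benefit of loading whole blocks rather than single values.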
Cache size is one of the CPU's important specifications, and the structure and size of the cache have a large effect on CPU speed. The cache inside the CPU runs at a very high frequency, generally the same frequency as the processor, so it is far faster than system memory or the hard disk. In practice the CPU often reads the same data block repeatedly, so a larger cache can greatly raise the hit rate for reads inside the CPU, sparing trips to memory or the hard disk and improving system performance. However, because of constraints such as chip area and cost, caches are small.
The first-level (L1) cache is built into the CPU and runs at the same speed as the CPU, which effectively improves the CPU's operating efficiency. The larger the L1 cache, the higher the CPU's operating efficiency. However, because of the CPU's internal structure, the capacity of the L1 cache is very small.
The L2 cache bridges the speed gap between the L1 cache and memory. The CPU goes to the L1 cache first. As processor speeds rose, the L1 cache could no longer keep up with demand, so a second level was added. The L2 cache is slower than the L1 cache but larger, and it mainly serves as a staging area for data moving between the L1 cache and memory.
The L3 cache is designed to serve reads that miss in the L2 cache. In a CPU with an L3 cache, only about 5% of data has to be fetched from memory, which further improves CPU efficiency. As with the other levels, it keeps a copy of data from slower storage on a faster device, so that repeated reads and writes complete on the fast device first and the system responds faster.
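To see how an extra cache level reduces the share of reads that reach memory, here is a minimal sketch. The per-level miss rates are hypothetical values chosen so that the three-level result matches the "about 5%" figure quoted above; they are not measurements of any processor.

```python
def fraction_reaching_memory(local_miss_rates):
    """Fraction of all reads that miss every cache level in turn.

    Each entry is the miss rate of one level, counted only over the
    reads that actually reach that level.
    """
    fraction = 1.0
    for miss_rate in local_miss_rates:
        fraction *= miss_rate
    return fraction


# Hypothetical local miss rates for L1, L2 and L3 (assumed, not measured).
two_level = fraction_reaching_memory([0.4, 0.5])
three_level = fraction_reaching_memory([0.4, 0.5, 0.25])

print(f"L1 + L2 only:  {two_level:.0%} of reads go to memory")
print(f"L1 + L2 + L3:  {three_level:.0%} of reads go to memory")
```

Under these assumed rates, the L3 cache catches three quarters of the reads that would otherwise have gone to memory.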
L3 caches come in two types: the early ones were external, and later products have them built in. In practice, an L3 cache can further reduce ...
First, let's do a quick calculation (prices as of February 2009) on the L2 cache of Intel processors:
The Celeron dual-core E1200 with 512K of L2 cache costs only 270 yuan. The Pentium dual-core E2140 with 1M of L2 cache sells for 370 yuan, so the extra 512K of cache costs 100 yuan. The Core 2 E4300 and the Pentium dual-core E5200, both with 2M of L2 cache, are priced above 550 yuan, so another 1M of L2 cache costs roughly another 200 yuan. The Core 2 E7200 with 3M of cache costs 750 yuan, again 200 yuan for the extra 1M of L2 cache. The same goes for the Core 2 series processors with 4M / 6M of L2 cache, and so on ...
Whether Core 2, Pentium dual-core or Celeron dual-core, the core architecture is actually the same and the frequency can be set at will; the only real difference is the L2 cache. It is no exaggeration to say that Intel is selling L2 cache, at 200 yuan per 1M.
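The arithmetic behind the "200 yuan per 1M" estimate can be checked with a short script. The prices and cache sizes are the ones quoted above; the 550 yuan figure is a lower bound ("above 550 yuan"), and the function name is only illustrative.

```python
# (model, L2 cache in MB, price in yuan) as quoted in the text.
# 550 is a lower bound, since the text says "above 550 yuan".
tiers = [
    ("Celeron E1200", 0.5, 270),
    ("Pentium E2140", 1.0, 370),
    ("Core 2 E4300 / Pentium E5200", 2.0, 550),
    ("Core 2 E7200", 3.0, 750),
]


def yuan_per_extra_mb(tiers):
    """Price difference per extra MB of L2 cache between adjacent tiers."""
    steps = []
    for (_, mb_a, price_a), (name_b, mb_b, price_b) in zip(tiers, tiers[1:]):
        steps.append((name_b, (price_b - price_a) / (mb_b - mb_a)))
    return steps


steps = yuan_per_extra_mb(tiers)
for name, cost in steps:
    print(f"{name}: about {cost:.0f} yuan per extra MB")
```

Each step works out to between 180 and 200 yuan per extra MB, consistent with the rough figure in the text.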
In fact, Intel has segmented its product line by L2 cache size for years. In the early days there were only two tiers, Pentium and Celeron. In the Core 2 era Intel took this to its peak: dual-core products alone come in six versions (512K, 1M, 2M, 3M, 4M and 6M), and quad-core products in four (4M, 6M, 8M and 12M). It is dazzling! Intel's strategy of subdividing the product line offers a good product at every price point, but it also creates unprecedented confusion for users: how big an L2 cache is enough?
Throughout the development of Intel processors, however the core architecture has changed, the step-by-step growth of the L2 cache has been the most visible trend. In the Pentium 4 era, Willamette on a 0.18-micron process had 256K of L2 cache, Northwood on 0.13 micron had 512K, and Prescott on 0.09 micron later increased this to 1M. In the Core era, while the architecture changed dramatically, the 65nm process doubled the L2 cache again. Even Allendale, the low-end Core part at launch, had 2M of L2 cache, and the high-end Core parts had 4M. With the 45nm process the L2 cache grew further: the high-end E8X00 series reached an astonishing 6M, and the low-end E7X00 reached 3M. Intel now covers L2 cache sizes from 512K to 6M, or even 12M, "seamlessly".
No one lags behind in the market forever. With the move to 45nm and the arrival of Phenom II, AMD can also design CPUs for different market segments by combining core count and cache size.
The effect of the L3 cache on performance varies. In games, a larger L3 cache has a large effect on performance. It is of little use in an ordinary home machine, but in an Internet cafe or enthusiast machine a larger L3 cache still gives a significant performance increase. Although the L3 cache can bring significant gains on a PC, it matters most on servers; on a PC it plays only a supporting role. With all other parameters equal, the larger the L3 cache, the better the performance. If other parameters differ, the effect of the L3 cache is not obvious.
Whatever its exact role, the L3 cache is one of the parameters that has contributed to the development of computers.
The earliest L3 cache appeared with AMD's K6-III processor. At the time, limits of the manufacturing process meant the L3 cache was not integrated into the chip but placed on the motherboard. Since it could only run in sync with the system bus frequency, it was not much faster than main memory. The L3 cache was next used in Intel's Itanium processor for the server market, followed by the P4EE and Xeon MP. Intel also planned an Itanium 2 processor with 9MB of L3 cache and a dual-core Itanium 2 processor with 24MB of L3 cache.
Fundamentally, though, the L3 cache is not very important for processor performance. For example, the Xeon MP processor with 1MB of L3 cache was still no match for the Opteron. This shows that a faster front-side bus can bring a more effective performance gain than simply adding cache.
Looking at AMD
AMD's attitude on L3 cache
First, L3 cache capacity matters more in the server field, but using different architectures for server and desktop processors would inevitably increase production difficulty and cost, so the L3 cache was brought to the desktop as well;
Second, on the desktop, increasing the L3 cache from 2MB to 6MB brings about a 5% performance improvement, and actual tests have confirmed this;
Third, the earlier data shows that although the L3 cache has tripled, improvements in the production process mean the core area is smaller and the cost is lower.
Those familiar with Intel's Nehalem Core i7 processor will recall that Intel uses the same kind of large shared L3 cache, with a capacity of up to 8MB, which likewise takes up about one-third of the die area. The reason is that the Core i7 has only 64KB of L1 cache and 256KB of L2 cache per core, half as much as Phenom / Phenom II.
Interestingly, the Core i7, which is also based on the 45nm process, integrates 731 million transistors, slightly less than the Phenom II, but with a slightly larger core area of 263 square millimeters.
From a cost perspective, the die layout of the Phenom II X4 shows that the L3 cache occupies more area than two cores plus their L1 and L2 caches combined. So even the Phenom II X3, which has one core disabled, is not cheap to make. For AMD, whose main strategy is value for money, the hit to profit is relatively large.
Therefore, after releasing the Phenom II X4 and X3 processors, AMD is also actively preparing mainstream mid-range and low-end products to replace the Athlon 64 X2 series that has served for many years. Because of the high cost of the L3 cache, AMD removed it from the Phenom II X4 entirely (note: removed, not disabled), and the result is the Athlon X4.
A comparative test then makes it easy to see how much the 6M L3 cache contributes to the performance of AMD's Phenom II architecture. It also gives an early look at the Phenom II X3, which has the full L3 cache but one core fewer. And what about the Athlon X4, with no L3 cache but four cores? Many readers will be interested.
AMD has already released the Phenom II 920 (6M L3) and Phenom 9850 (2M L3), and there is also a mysterious Athlon X4 engineering sample with no L3 cache. With all of them set to 200MHz × 14 = 2.8GHz, the performance difference due to 6M / 2M / 0M of L3 cache can be seen directly.
The just-released Phenom II X3 720 was also included. It has the full 6M L3 cache but one core fewer, so it shows whether an extra core or 6M of L3 cache contributes more. The test results show that, architecturally, the cache has a large effect on performance. Yet the Athlon X4, especially in compute-heavy workloads, clearly beats the previous-generation 9850 with its full L3 cache. The advantage of memory bandwidth is self-evident.
Intel 6-core processor (16MB L3 cache)
First to market are the high-end desktop processor brand Core i7 and Nehalem-EP for the energy-efficient server market, expected in the fourth quarter of 2008. More products on the new architecture will follow, including Nehalem-EX for the scalable server market, Havendale and Lynnfield for the desktop market, and Auburndale and Clarksfield for the mobile market, all expected to debut in the second half of 2009.
Next-generation microarchitecture (Nehalem) processors all start at 4 cores and also use Hyper-Threading, so they can process 8 threads at once. Core i7 supports Turbo Mode and Power Gates technology, which can completely shut down idle cores when multi-threaded computing is not needed. Each core can run at its own voltage and frequency. Turbo Mode, which raises the frequency of an individual core, can significantly improve the performance of single-threaded applications.
Intel also released its first 6-core processor, the "Dunnington" Xeon X7460 for the multi-processor server market, with 16MB of built-in L3 cache; it went on sale in September 2008. It is Intel's last 45nm Core 2 microarchitecture processor before the move to Nehalem. Servers using this processor have broken many world records: the 8-way, 48-core IBM System x3950 M2 was the first to exceed 1 million tpmC in the TPC Benchmark C database test, the 4-way HP ProLiant DL580 G5 broke the TPC-C record, the Dell PowerEdge R900 broke the TPC-E record, the Sun Fire X4450 broke the SPECjbb2005 record, and the Fujitsu Siemens PRIMERGY RX600 S4 broke the SPECint_rate2006 record. [1]
AMD 45nm processors without L3 cache
With the improved process and technology, AMD's new 45nm desktop processors will span five sub-families. The Phenom II brand has the L3 cache, while the models without it will still be called Athlon (I don't know why not Athlon II). As for Sempron, it will soon be phased out.
The highest-end quad-core "Phenom II X4 900/800" series is code-named Deneb. The L2 cache is 4 × 512KB; the L3 cache is 6MB in the 900 series and cut to 4MB in the 800 series. Both series will first appear in January next year. The first two models, the Phenom II X4 940/920 with the AM2+ interface, will launch at CES 2009 on January 8. All the others will use the AM3 interface.
The triple-core "Phenom II X3 700" series is code-named Heka, with 3 × 512KB of L2 cache and the full 6MB of L3 cache; it will follow in February next year.
Besides these two series, AMD has also prepared versions without an L3 cache. These are designed without the L3 cache from the start rather than simply having it disabled, so no die area is wasted.
The quad-core version will be the "Athlon X4 600" series, code-named Propus, with 4 × 512KB of L2 cache, and the triple-core version will be the "Athlon X3 400" series, code-named Rana, with 3 × 512KB of L2 cache. Both will debut in April next year.
Last is the dual-core "Athlon X2 200" series, code-named Regor, with 2 × 1MB of L2 cache, twice as much per core as the other series. It has the latest release date and will not launch until June next year. It conflicts with some older-architecture models.
As for the chipset, the current AMD 7 series motherboards can support the new processor by updating the BIOS, depending on the technical support of the motherboard manufacturer. In addition, AMD will launch a new 8-series integrated chipset in the first half of 2009. RS880 and RS880C are paired with the SB750 South Bridge, and RS880D and RS890 are paired with the next-generation SB800. [2]
Which is more important: L1, L2 or L3 cache?
The L1 cache is the most important, but current CPUs have nearly the same L1 cache, so it can be ignored.
The L2 cache is very important for Intel CPUs: the larger it is, the more significant the performance gain. It matters for AMD CPUs too, but there L2 cache size does not improve performance as noticeably.
The L3 cache really only plays a supporting role. Outside servers it is of little use in most home machines, where memory still matters a great deal. If you run large programs or games, however, the L3 cache is very important. CPUs now come with three levels of cache.
So besides frequency, there are also the core count and the cache size to consider. In theory, the larger the L2 cache, the better the processor performs, but doubling the L2 cache does not double performance. In 2006, most of the data a CPU processed was between 0 and 256KB in size, a small part was between 256KB and 512KB, and only a very small amount exceeded 512KB. By 2009, sizes of 1M and 2M were already seen. So as long as the processor's available L1 and L2 cache reaches 256KB or more, it can cope with normal applications, and 512KB of L2 cache is enough for most applications.
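The point that doubling the cache does not double performance can be illustrated with a small simulation. This is a sketch under stated assumptions: it models a fully associative cache with least-recently-used (LRU) replacement, which real CPU caches only approximate, and the looping workload over 8 blocks is invented for the example.

```python
from collections import OrderedDict


def lru_hit_rate(accesses, capacity):
    """Hit rate of a fully associative LRU cache holding `capacity` blocks."""
    cache = OrderedDict()
    hits = 0
    for block in accesses:
        if block in cache:
            hits += 1
            cache.move_to_end(block)       # mark as most recently used
        else:
            if len(cache) >= capacity:
                cache.popitem(last=False)  # evict the least recently used
            cache[block] = True
    return hits / len(accesses)


# Hypothetical workload: loop over a working set of 8 blocks, 100 times.
accesses = list(range(8)) * 100
rates = {capacity: lru_hit_rate(accesses, capacity) for capacity in (4, 8, 16)}

for capacity, rate in rates.items():
    print(f"capacity {capacity:2d} blocks: hit rate {rate:.2%}")
```

Once the cache is large enough to hold the whole working set (8 blocks here), doubling it again to 16 blocks gives no further gain. The 4-block case is a worst case for LRU, because each block is evicted just before it is needed again.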
Which of the main frequency, second-level cache and third-level cache is more important
A cache works as follows. When the CPU wants to read a piece of data, it first looks in the cache. If the data is found, it is read immediately and sent to the CPU for processing. If not, it is read from memory at a relatively slow speed and sent to the CPU, and at the same time the block containing it is loaded into the cache, so that the whole block can later be read from the cache without going to memory.
This read mechanism gives the CPU a very high cache hit rate (most CPUs reach about 90%), meaning that 90% of the data the CPU reads next is already in the cache and only about 10% has to come from memory. This saves a great deal of the time the CPU would spend reading memory directly and largely eliminates waiting for data. In general, the CPU reads from the cache first and memory second.
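The time saved by a 90% hit rate can be estimated with a simple weighted average. The latencies below (1 ns for the cache, 100 ns for memory) are illustrative assumptions, not figures from the text, and the model charges a miss only the memory latency.

```python
def average_access_time(hit_rate, cache_ns, memory_ns):
    """Average read latency when hits cost cache_ns and misses cost memory_ns."""
    return hit_rate * cache_ns + (1.0 - hit_rate) * memory_ns


# Assumed latencies, for illustration only.
CACHE_NS = 1.0
MEMORY_NS = 100.0

with_cache = average_access_time(0.90, CACHE_NS, MEMORY_NS)
without_cache = average_access_time(0.0, CACHE_NS, MEMORY_NS)

print(f"90% hit rate: {with_cache:.1f} ns per read on average")
print(f"no cache:     {without_cache:.1f} ns per read on average")
```

Under these assumptions the average read is roughly nine times faster with the cache, and the remaining cost is dominated by the 10% of reads that still go to memory.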
The size of the L2 and L3 caches is not the only measure of CPU performance. It also depends on the CPU's frequency and manufacturing process (45nm is better than 65nm, for example), on the instruction sets it supports, and on whose product it is. The L2 cache is very important for Intel's products but less so for AMD's, because AMD has an L3 cache in addition to the L2 cache.
Which of clock frequency, L2 cache and L3 cache matters more depends entirely on what you use the computer for and which tasks it mainly performs. A high clock frequency means fast computation. The L2 and L3 caches act as a buffer between memory and the CPU, easing the speed mismatch between them that would otherwise hurt CPU efficiency. So large L2 and L3 caches are more efficient when the CPU processes a large amount of data over a long period, while a high clock speed is faster for processing a small amount of data in a short period. In fact all three matter: any one that falls below a certain standard becomes a bottleneck.
Intel Xeon 7100 series CPU (16MB L3 cache)
Intel officially released its latest dual-core Xeon processors for high-end servers, the Xeon 7100 series, code-named Tulsa. The processor is still based on the previous-generation NetBurst architecture, but performance and power consumption have improved considerably.
Xeon 7100 series CPU configuration
The Xeon 7100 series core is code-named Tulsa. It is a dual-core design with 1MB of L2 (level 2) cache per core and up to 16MB of L3 (level 3) cache. The processor also supports Hyper-Threading (HT), Intel Virtualization Technology and Intel Cache Safe Technology. Xeon 7100 series processors above 3.0GHz have a TDP of 150W; those below 3.0GHz have a TDP of 95W.
Xeon 7100 series CPU specifications
Model    Clock speed   FSB      L2 cache   L3 cache   Price per 1,000 units
7140M    3.40GHz       800MHz   2 × 1MB    16MB       1980
7140N    3.33GHz       667MHz   2 × 1MB    16MB       1980
7130M    3.20GHz       800MHz   2 × 1MB    8MB        1391
7130N    3.16GHz       667MHz   2 × 1MB    8MB        1391
7120M    3.00GHz       800MHz   2 × 1MB    4MB        1177
7120N    3.00GHz       667MHz   2 × 1MB    4MB        1177
7110M    2.60GHz       800MHz   2 × 1MB    4MB        856
7110N    2.60GHz       667MHz   2 × 1MB    4MB        856
The Xeon 7100 series processors support 667MHz and 800MHz front-side buses and are compatible with the E8501 chipset, so existing server platforms can easily be upgraded to the new processors. Intel also stated that the 65nm process is progressing very smoothly and that shipments of 65nm products now exceed those of 90nm products.
