What Is a Translation Lookaside Buffer?
The translation lookaside buffer (TLB), also called a page table cache or translation bypass cache, is a cache in the CPU that the memory management unit (MMU) uses to speed up the translation of virtual addresses to physical addresses. All current desktop and server processors (such as x86) use a TLB. A TLB has a fixed number of slots that hold page table entries mapping virtual addresses to physical addresses. It is typically implemented as content-addressable memory (CAM): the search key is a virtual address and the search result is a physical address. If the requested virtual address is present in the TLB, the CAM returns a match very quickly, and the resulting physical address can be used to access memory. If the requested virtual address is not in the TLB, the translation falls back to the page table, which is much slower to access than the TLB. Some systems allow page tables to be swapped out to secondary storage, in which case a virtual-to-physical translation can take a very long time.
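The lookup path described above can be sketched in a few lines of Python. This is only a toy model under stated assumptions, not how any real MMU is built: the `TLB` class, the 4 KiB `PAGE_SIZE`, the LRU replacement policy, and the dictionary standing in for the page table are all hypothetical choices made for illustration.

```python
from collections import OrderedDict

PAGE_SIZE = 4096  # assumed page size in bytes (illustrative)


class TLB:
    """Toy fixed-capacity TLB keyed by virtual page number (VPN).

    LRU replacement is an assumption; real hardware may use other policies.
    """

    def __init__(self, capacity, page_table):
        self.capacity = capacity
        self.page_table = page_table  # dict: VPN -> physical frame number
        self.entries = OrderedDict()  # cached subset of the page table
        self.hits = 0
        self.misses = 0

    def translate(self, vaddr):
        vpn, offset = divmod(vaddr, PAGE_SIZE)
        if vpn in self.entries:
            # TLB hit: the cached entry gives the frame directly.
            self.hits += 1
            self.entries.move_to_end(vpn)
        else:
            # TLB miss: fall back to the (slower) page table.
            self.misses += 1
            if vpn not in self.page_table:
                raise LookupError(f"page fault at virtual address {vaddr:#x}")
            if len(self.entries) >= self.capacity:
                self.entries.popitem(last=False)  # evict least recently used
            self.entries[vpn] = self.page_table[vpn]
        return self.entries[vpn] * PAGE_SIZE + offset


page_table = {0: 5, 1: 9, 2: 3}  # hypothetical mappings
tlb = TLB(capacity=2, page_table=page_table)
first = tlb.translate(0x1004)   # miss: VPN 1 is loaded from the page table
second = tlb.translate(0x1008)  # hit: VPN 1 is already cached
```

The second access touches the same page as the first, so it is served from the cached entry without consulting the page table again.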
- The TLB caches a subset of the page table entries. TLB can be between CPU and
- Two common solutions to TLB misses in modern architectures:
- Hardware-based TLB management: the CPU walks the page table to check whether it contains a valid page table entry for the specified virtual address. If such an entry exists, the CPU loads it into the TLB and retries the TLB access; the retry is guaranteed to hit, and the program continues normally. If the CPU cannot find a valid page table entry for the specified virtual address, a page fault occurs.
- Capacity: 8-4,096 page table entries
- Hit time: 0.5-1 clock cycles
- Miss penalty: 10-30 clock cycles
- Miss rate: 0.01%-3%
- If a TLB lookup takes 1 clock cycle, a miss costs an additional 30 clock cycles, and the miss rate is 1%, the average effective memory access time is $1 \times 0.99 + (1 + 30) \times 0.01 = 1.30$ clock cycles per memory access.
- Instructions and data can use separate TLBs, an instruction TLB (ITLB) and a data TLB (DTLB), or share a unified TLB (UTLB); a block TLB (BTLB) may also be used.
- On a task switch, some TLB entries may become invalid, for example entries for a page that the previous process accessed but the incoming process has not yet touched. The simplest strategy is to flush the entire TLB. Newer CPUs use more effective strategies; for example, in the Alpha EV6 each TLB entry is tagged with an "address space number" (ASN), and only entries whose ASN matches that of the currently running task are considered valid.
- The B2 version of the AMD Phenom processor has a TLB bug. Working around it in software costs 10-30% of performance. For this reason, AMD introduced the B3 version of the Phenom, which fixes the bug in hardware and so avoids the performance loss of the B2 workaround.
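The average access-time arithmetic given in the list above (1-cycle lookup, 30-cycle miss penalty, 1% miss rate) can be reproduced with a short Python sketch. The function name `effective_access_cycles` and its parameter names are hypothetical, chosen only for this illustration.

```python
def effective_access_cycles(hit_cycles, miss_penalty_cycles, miss_rate):
    """Average cycles per memory access for a TLB.

    A hit costs hit_cycles; a miss costs the lookup plus the miss penalty.
    """
    hit_rate = 1.0 - miss_rate
    return hit_cycles * hit_rate + (hit_cycles + miss_penalty_cycles) * miss_rate


# Figures taken from the example in the text.
average = effective_access_cycles(hit_cycles=1, miss_penalty_cycles=30, miss_rate=0.01)
print(f"{average:.2f} cycles per memory access")  # prints "1.30 cycles per memory access"
```

Trying other values from the ranges listed above (for example a 3% miss rate) shows how quickly the miss penalty comes to dominate the average.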