2019 conference paper
Diligent TLBs: a mechanism for exploiting heterogeneity in TLB miss behavior
Proceedings of the ACM International Conference on Supercomputing, 195–205.
Modern workloads such as graph analytics, sparse matrix multiplication, and in-memory key-value stores use very large datasets and typically have non-uniform memory access patterns that defy traditional concepts of locality. Moreover, many of these algorithms simultaneously use multiple data structures whose pages have distinct access patterns, leading to heterogeneity in TLB behavior. Our intuition suggests that these two factors make it important to architect a heterogeneity-aware TLB hierarchy. Our results confirm the existence of heterogeneity in TLB behavior, where a few pages have high reuse but poor temporal locality. These pages are responsible for a significant percentage of the TLB misses (e.g., for the Canneal kernel, over 15% of the TLB misses result from only 17 pages, which is 0.04% of the total number of pages). In this paper, we propose Diligent TLBs (Di-TLBs), a novel hardware-software co-design for TLBs that identifies such delinquent page mappings by tracking their reuse behavior and pins them in the TLBs to reduce misses. We show that Di-TLBs reduce TLB misses by up to 24.93% on average while improving performance by up to 9.13% on average for a collection of memory-intensive workloads.
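The idea of pinning pages with high reuse but poor temporal locality can be illustrated with a toy simulation. The sketch below is not the Di-TLB design from the paper. It assumes a fully associative LRU TLB, a per-page miss counter as the reuse tracker, and a fixed miss threshold for pinning. The names `PinningTLB`, `miss_threshold`, `max_pinned`, and `make_trace` are hypothetical and introduced only for illustration.

```python
from collections import Counter, OrderedDict


class PinningTLB:
    """Toy fully associative LRU TLB with pinnable entries (illustrative only)."""

    def __init__(self, capacity, max_pinned, miss_threshold):
        # Assumption: at least one slot always stays available for LRU entries.
        assert 0 <= max_pinned < capacity
        self.capacity = capacity
        self.max_pinned = max_pinned
        self.miss_threshold = miss_threshold
        self.lru = OrderedDict()      # unpinned entries, least recently used first
        self.pinned = set()           # pinned entries are never evicted
        self.miss_counts = Counter()  # hypothetical reuse tracker: misses per page
        self.hits = 0
        self.misses = 0

    def access(self, page):
        """Look up a page; return True on a hit and False on a miss."""
        if page in self.pinned:
            self.hits += 1
            return True
        if page in self.lru:
            self.lru.move_to_end(page)  # refresh recency
            self.hits += 1
            return True

        self.misses += 1
        self.miss_counts[page] += 1
        should_pin = (self.miss_counts[page] >= self.miss_threshold
                      and len(self.pinned) < self.max_pinned)
        # Evict the least recently used unpinned entry if the TLB is full.
        if len(self.lru) + len(self.pinned) >= self.capacity:
            self.lru.popitem(last=False)
        if should_pin:
            self.pinned.add(page)
        else:
            self.lru[page] = None
        return False


def make_trace(iterations, stream_len):
    """Page 0 is reused every iteration, separated by stream_len one-off pages."""
    trace, next_page = [], 1
    for _ in range(iterations):
        trace.append(0)
        for _ in range(stream_len):
            trace.append(next_page)
            next_page += 1
    return trace


def run(trace, capacity, max_pinned, miss_threshold):
    """Replay a trace and return (hits, misses)."""
    tlb = PinningTLB(capacity, max_pinned, miss_threshold)
    for page in trace:
        tlb.access(page)
    return tlb.hits, tlb.misses


if __name__ == "__main__":
    trace = make_trace(iterations=10, stream_len=4)
    print("LRU only:    ", run(trace, capacity=4, max_pinned=0, miss_threshold=2))
    print("with pinning:", run(trace, capacity=4, max_pinned=1, miss_threshold=2))
```

In this synthetic trace, page 0 is reused in every iteration, but its reuse distance exceeds the TLB capacity, so plain LRU evicts it before each reuse. Once the page crosses the assumed miss threshold and is pinned, its later accesses hit. The trace and parameters are made up to show the effect and say nothing about the workloads or results reported in the paper.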