Paper

Niki C. Thornock. Using Set Sampling in Level three Cache Studies. Master's thesis, Brigham Young University, 1999.

Abstract

In single-processor systems, one or two cache levels are sufficient to reduce the performance gap between the processor and main memory. With the increasing popularity of multiprocessor systems, this level of caching is becoming inadequate; adding a third, very large cache (level 3 or L3) seems a likely candidate for reducing the performance gap. Simulation, especially trace-driven simulation, is a frequently used method of testing new cache configurations. Creating a simulator is fairly straightforward, but it is difficult to obtain the long, accurate traces necessary for simulating the extremely large L3 caches used in current and future multiprocessor systems.

We discuss some of the difficulties present in trace collection and trace-driven simulation. We then describe our multiprocessor tracing technique and verify that it accurately collects long traces. We investigate time sampling and two types of set sampling and conclude that the second set sampling technique achieves the most accurate results. The miss rate for the second set sampling method is calculated as the number of misses to sampled sets divided by the total number of references scaled by the sample size (that is, multiplied by the fraction of sets sampled). We found that the sampling accuracy depends on the workload. If the workload warms up the cache, the sampling technique is accurate for all cache configurations. If the workload does not warm up the cache, the sampling technique is accurate only for highly associative caches. We determined that the 10% sample size was the most accurate. Our chosen sampling method reduces required disk space, enables simulations to run faster, and effectively enlarges the trace buffer of our hardware monitor, decreasing trace distortion.
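The miss-rate calculation described above can be illustrated with a minimal sketch. It is not the thesis's simulator: the function name `sampled_miss_rate`, its parameters, the LRU replacement policy, the 64-byte line size, and the modulo set indexing are all assumptions chosen for illustration. Only the sampled sets are simulated, and the miss count is divided by the total reference count multiplied by the sampled fraction.

```python
from collections import OrderedDict


def sampled_miss_rate(addresses, num_sets, ways, sampled_sets, line_bytes=64):
    """Estimate a cache miss rate by simulating only the sets in `sampled_sets`.

    Assumptions (not from the source): LRU replacement, modulo set indexing,
    and byte addresses given as non-negative integers.
    """
    sampled = set(sampled_sets)
    sample_fraction = len(sampled) / num_sets
    # One LRU-ordered tag store per sampled set; unsampled sets are never modeled.
    lru_by_set = {s: OrderedDict() for s in sampled}

    total_refs = 0
    sampled_misses = 0
    for addr in addresses:
        total_refs += 1                      # every reference counts toward the total
        line = addr // line_bytes
        set_index = line % num_sets
        if set_index not in sampled:
            continue                         # reference falls outside the sample
        tag = line // num_sets
        lru = lru_by_set[set_index]
        if tag in lru:
            lru.move_to_end(tag)             # hit: mark as most recently used
        else:
            sampled_misses += 1
            lru[tag] = True
            if len(lru) > ways:
                lru.popitem(last=False)      # evict the least recently used tag

    if total_refs == 0 or sample_fraction == 0:
        return 0.0
    # Misses to sampled sets / (total references scaled by the sample size).
    return sampled_misses / (total_refs * sample_fraction)


if __name__ == "__main__":
    # Tiny hypothetical trace: four cache lines touched twice each.
    trace = [0, 64, 128, 192, 0, 64, 128, 192]
    print(sampled_miss_rate(trace, num_sets=4, ways=1, sampled_sets=[0, 1]))
```

In this toy trace, sampling half of the sets gives the same estimate as simulating all of them, because the references are spread evenly across sets. Real traces are not this uniform, so the estimate would generally differ from the full-cache miss rate.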