Design Considerations of High Performance Data Cache with Prefetching

Abstract. In this paper, we propose a set of four load-balancing techniques to address the memory latency problem of the on-chip cache. The first two mechanisms, sequential unification and aggressive lookahead, are mainly used to reduce the chance of partial hits and the abortion of accurate prefetch requests. The latter two mechanisms, default prefetching and cache partitioning, are used to optimize the cache performance of unpredictable references. The resulting cache, called the LBD (Load-Balancing Data) cache, is found to have superior performance over a wide range of applications. Simulation of the LBD cache with RPT prefetching (Reference Prediction Table, one of the most cited selective data prefetch schemes [2,3]) on SPEC95 shows that a significant reduction in data reference latency can be obtained, ranging from about 20% to over 90%, with an average of 55.89%. By comparison, prefetch-on-miss and RPT alone achieve average latency reductions of only 17.37% and 26.05%, respectively.

1. Introduction

To improve the accuracy and coverage of data prefetching, current research emphasizes the exploration of hybrid data address and value prediction. Data accesses in a program are partitioned into distinct, mutually exclusive reference classes, each of which is handled exclusively by one predictor/prefetch unit. Good examples of these predictors include linear memory reference predictors and pointer predictors. With an accurate predictor and its supporting hardware/software, very good cache performance is expected. However, experiments show that it might not be as simple as it appears. Even though the overall cache performance can be improved, there are still many data references whose access patterns are predicted accurately but which are nevertheless missing from the cache [5]. This percentage ranges from a few percent to over 95%, with an average of 54%. In other words, about half of the cache misses are actually due to data references that can be predicted accurately! The overall cache effectiveness is further bounded by the access behavior of the unpredictable references in the cache, which contribute to the other half of the cache misses. In this paper, we propose a set of four load-balancing mechanisms to make up for the discrepancy between the ideal and the observable performance of accurate prefetching. They are (i) sequential unification of demand and prefetch requests, (ii) aggressive lookahead prefetching, (iii) default prefetching, and (iv) cache partitioning. The first two mechanisms are mainly used to reduce the chance of partial hits and the abortion of accurate prefetch requests by demand fetch requests, while the latter two are used to optimize the cache performance of the unpredictable references. To ensure a performance gain, these mechanisms will only be triggered selectively, based on the level of confidence
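To make the predictor class concrete, the sketch below is a minimal software model of a stride-based reference predictor in the spirit of the RPT scheme cited as [2,3]: a table indexed by the load/store instruction address, where each entry tracks the last effective address, the observed stride, and a small confidence state machine. The table size, the indexing, names such as rpt_access, and the exact four-state transitions follow common descriptions in the literature and are assumptions for illustration, not details taken from this paper; the load-balancing mechanisms proposed here would sit on top of such a predictor and govern when its requests actually reach the cache.

```c
/* Illustrative sketch of an RPT-style stride predictor (assumed parameters). */
#include <stdint.h>
#include <stdio.h>

#define RPT_ENTRIES 256                      /* assumed table size (power of two) */

typedef enum { RPT_INITIAL, RPT_TRANSIENT, RPT_STEADY, RPT_NOPRED } rpt_state_t;

typedef struct {
    uint64_t    pc;                          /* tag: address of the load/store    */
    uint64_t    prev_addr;                   /* last effective address seen       */
    int64_t     stride;                      /* last learned stride               */
    rpt_state_t state;                       /* prediction confidence             */
    int         valid;
} rpt_entry_t;

static rpt_entry_t rpt[RPT_ENTRIES];

/* Update the table for one memory reference.  Returns 1 and sets
 * *prefetch_addr when the entry is confident enough to issue a prefetch. */
int rpt_access(uint64_t pc, uint64_t addr, uint64_t *prefetch_addr)
{
    rpt_entry_t *e = &rpt[(pc >> 2) & (RPT_ENTRIES - 1)];

    if (!e->valid || e->pc != pc) {          /* allocate a fresh entry            */
        e->pc = pc; e->prev_addr = addr; e->stride = 0;
        e->state = RPT_INITIAL; e->valid = 1;
        return 0;
    }

    int64_t new_stride = (int64_t)(addr - e->prev_addr);
    int correct = (new_stride == e->stride);

    switch (e->state) {                      /* commonly described RPT transitions */
    case RPT_INITIAL:   e->state = correct ? RPT_STEADY    : RPT_TRANSIENT; break;
    case RPT_TRANSIENT: e->state = correct ? RPT_STEADY    : RPT_NOPRED;    break;
    case RPT_STEADY:    e->state = correct ? RPT_STEADY    : RPT_INITIAL;   break;
    case RPT_NOPRED:    e->state = correct ? RPT_TRANSIENT : RPT_NOPRED;    break;
    }
    if (!correct && e->state != RPT_INITIAL) /* keep old stride on a steady miss  */
        e->stride = new_stride;
    e->prev_addr = addr;

    if (e->state == RPT_STEADY) {            /* confident: predict next address   */
        *prefetch_addr = addr + (uint64_t)e->stride;
        return 1;
    }
    return 0;
}

int main(void)                               /* usage example: a unit-stride walk */
{
    uint64_t pf;
    for (uint64_t i = 0; i < 8; i++)
        if (rpt_access(0x400100, 0x1000 + 8 * i, &pf))
            printf("prefetch 0x%llx\n", (unsigned long long)pf);
    return 0;
}
```

In this model the predictor issues a prefetch only once an entry reaches the steady state, which is exactly the kind of "accurate" request whose partial hits and abortions the first two load-balancing mechanisms are meant to avoid.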