Twin-Load: Bridging the Gap between Conventional Direct-Attached and Buffer-on-Board Memory Systems
Paper in proceedings, 2016
Conventional systems with direct-attached DRAM struggle to meet growing memory capacity demands: the number of channels is limited by pin count, and the number of modules per channel is limited by signal integrity issues. Recent buffer-on-board (BOB) designs move some memory controller functionality to a separate buffer chip, which lets them support larger capacities (by adding more DRAM or denser, non-volatile components). Nonetheless, lower-cost, lower-latency, direct-attached DRAM still represents a better price-performance solution for many applications. Most processors exclusively implement either the direct-attached or the BOB approach. Combining both technologies within one processor has obvious benefits, but current memory-interface requirements complicate this straightforward solution. The standard DRAM interface is DDR, which requires data to be returned at a fixed latency. In contrast, the BOB interface supports diverse memory technologies precisely because it allows asynchrony. We propose Twin-Load technology to enable one processor to support both direct-attached and BOB memory. We show how to use Twin-Load to support BOB memory over standard DDR interfaces with minimal processor modifications. We build an asynchronous protocol over the existing, synchronous interface by splitting each memory read into twinned loads. The first acts as a prefetch to the buffer chip, and the second asynchronously fetches the data. We describe three methods for generating twinned loads, each leveraging different layers of the system stack.
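The twinned-load idea can be illustrated with a small toy model in Python. This is only a sketch under stated assumptions, not the mechanism as implemented in the paper: the names `BufferChip`, `backing`, `staged`, and `twin_load` are hypothetical, the placeholder reply is an invented convention, and the model assumes the slow fetch always completes before the second load arrives. The abstract does not say what happens when the data is not yet ready, so that case is simply raised as an error here.

```python
from dataclasses import dataclass, field


@dataclass
class BufferChip:
    """Toy buffer chip behind a fixed-latency, DDR-style interface (hypothetical)."""
    backing: dict                                # slow memory behind the buffer: address -> value
    staged: dict = field(default_factory=dict)   # data already pulled into the buffer

    def read(self, addr):
        """Reply within the fixed window: (True, data) if staged, else (False, None)."""
        if addr in self.staged:
            return True, self.staged.pop(addr)
        # Miss: start fetching from slow memory. Assumption: the fetch is
        # modelled as finishing before the next access to this address.
        self.staged[addr] = self.backing[addr]
        return False, None


def twin_load(chip, addr):
    """Turn one logical read into two loads: a prefetch, then the real fetch."""
    chip.read(addr)                   # first load: acts as a prefetch, reply is discarded
    valid, value = chip.read(addr)    # second load: picks up the staged data
    if not valid:
        # Not covered by the abstract; a real design would need some recovery path.
        raise RuntimeError("data not ready on second load")
    return value


chip = BufferChip(backing={0x40: 7, 0x80: 11})
print(twin_load(chip, 0x40))
```

In this model every reply arrives in the same fixed window, which is what a synchronous interface demands; the asynchrony is hidden in the gap between the two loads.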