Design and Fabrication of 3D Microprocessors


0970-Y03-02

Patrick Morrow¹, Bryan Black², Mauro J. Kobrinsky¹, Sriram Muthukumar³, Don Nelson⁴, Chang-Min Park¹, and Clair Webb⁴

¹ Components Research, Intel Corp., M/S RA3-252, 5200 N.E. Elam Young Parkway, Hillsboro, OR, 97124-6497
² Microprocessor Research Labs, Intel Corp., MS AN1, 1501 S Mopac, Suite 400, Austin, TX, 78746
³ Assembly Technology Development, Intel Corp., M/S CH2-140, 5000 W. Chandler Blvd., Chandler, AZ, 85226
⁴ Portland Technology Development, Intel Corp., M/S RA3-256, 5200 N.E. Elam Young Parkway, Hillsboro, OR, 97124-6497

ABSTRACT

Stacking multiple device strata can improve the system performance of a microprocessor (µP) by reducing interconnect length. This enables latency improvement, power reduction, and improved memory bandwidth. In this paper we review some of our recent design analysis and process results, which quantitatively show the benefits of stacking applied to µPs. We report on two applications for stacking that take advantage of reduced wire length: "logic+logic" stacking and "logic+memory" stacking. In addition to minimizing wire length, we carefully considered the thermal ramifications of the new designs. For the logic+memory application, we considered the case of reducing off-die wiring by stacking a DRAM cache (32 to 64MB) onto a high performance µP. Simulations showed a 3x reduction in off-die bandwidth, a 13% reduction in Cycles Per Memory Access (CPMA), and a 66% average bus power reduction. For logic+logic applications, we considered a high performance µP where the unit blocks were repartitioned into two strata. For this case, simulations showed that stacking can simultaneously reduce power by 15% and increase performance by 15%, with a minor 14 °C increase in peak temperature compared to the planar design. Using voltage scaling, this translates to a 34% power reduction and an 8% performance improvement with no temperature increase. We found that these results can be further improved by a secondary splitting of the individual blocks. As an example, we split a 32KB first level data cache, resulting in a 25% power reduction, a 10% latency reduction, and a 20% area reduction. We also discuss the fabrication of stacked structures with two complementary process flows. In one case, we developed a 300mm wafer stacking process using Cu-Cu bonding, wafer thinning, and through-silicon vias (TSVs). This technology provides reliable bonding with non-detectable bonding-interface resistance and inter-strata via pitch below 8µm.
We investigated the impact of this wafer stacking process on the transistor and interconnect layers built using a 65nm strained-Si/Cu-Low-K process technology and found no impact on either discrete N- and P-MOS devices or thin 4Mb SRAMs. We verified fully functional SRAMs on thinned wafers with thicknesses down to 5µm. Although wafer stacking lends itself well to tight-pitch, same-die-size stacking, die stacking enables integration of dies of different sizes and offers the opportunity to improve yield by stacking known good dies. We dem