Reconfigurable Many-Core Processor with Cache Coherence




Abstract. As the number of cores integrated on one processor increases, on-chip communication becomes more expensive in terms of both latency and link load. This in turn limits the utilization of the many-core processor. This paper describes a virtual computing group (VCG) model that improves the utilization of computing resources on NoC-based many-core processors. Each VCG can be reconfigured to a different size and topology before the program starts. The Token protocol for cache coherence is adopted to improve memory access performance. The Token protocol is modified to maintain cache coherence within the local VCG only, which reduces the communication penalty on a large NoC. We implement this reconfigurable system in the Gem5 simulator, and the simulation results show improved performance.

Keywords: Reconfiguration, Many-core, Cache Coherence, VCG, Parallel Library.

1 Introduction

A current trend in microprocessor design is the many-core processor, which integrates a large number of cores on a single chip to provide highly parallel computing capability. Tilera announced its first many-core processor, Tile64[4], in 2007, followed by the Single-chip Cloud Computer (SCC)[1] from Intel in 2009. These processor series contain tens to hundreds of cores and use a network-on-chip (NoC) as their interconnection. The ITRS 2011 report[7] shows that the number of cores per chip grows by a factor of 1.4 each year, and predicts that more than one hundred cores will be integrated on one processor by 2016 in SoC designs. Although some research, such as TRIPS[13], puts considerable effort into processing element (PE) design to make the best use of each computing component, current research on many-core processors usually focuses on two aspects: interconnection and memory hierarchy.

In single-core and even some multi-core processors, the bus is the most commonly used interconnection among processing cores. However, as the number of cores increases, exclusive access to the bus becomes the bottleneck for data exchange. Other interconnections, such as the crossbar and the NoC, have been designed to reduce communication latency. Compared to a crossbar, an NoC has higher access latency, but it needs fewer physical links and is much easier to extend. Over the past two decades, much research has aimed to improve NoC performance, covering topology design and link optimization, such as application-specific NoC design from Xu[16], 3D NoC from Xie[17], and link addition and removal from Jiao[8]. Other research has focused on avoiding deadlock in specific NoC topologies, mainly based on the theory of William Dally[3], such as routing algorithms on the torus[15]. The mesh is the most commonly used NoC topology in research.

W. Xu et al. (Eds.): NCCET 2013, CCIS 396, pp. 198–207, 2013. © Springer-Verlag Berlin Heidelberg 2013

Memory hierarchy is another aspect to be considered in many-core processors. In multi-core pr