Reconfigurable Many-Core Processor with Cache Coherence


Abstract. As the number of cores integrated on a single processor increases, on-chip communication becomes more expensive in terms of both latency and link load, which in turn limits the utilization of the many-core processor. This paper describes a virtual computing group (VCG) model that improves the utilization of computing resources on NoC-based many-core processors. Each VCG can be reconfigured to a different size and topology before the program starts. The token protocol for cache coherence is adopted to improve memory access performance. The Token protocol is modified to maintain cache coherence within the local VCG only, which reduces the communication penalty on a large NoC. We implement this reconfigurable system in the Gem5 simulator, and the simulation results show improved performance.

Keywords: Reconfiguration, Many-core, Cache Coherence, VCG, Parallel Library.

1 Introduction

A current trend in microprocessor design is the many-core processor, which integrates a large number of cores on a single chip to provide highly parallel computing capability. Tilera announced its first many-core processor, Tile64[4], in 2007, followed by the Single-chip Cloud Computer (SCC)[1] from Intel in 2009. These series of processors contain tens to hundreds of cores and use a network-on-chip (NoC) as their interconnect. The ITRS 2011 report[7] shows that the number of cores per chip grows by a factor of 1.4 each year, and predicts that more than one hundred cores will be integrated on one processor by 2016 in SoC design.

Although some research, such as TRIPS[13], puts considerable effort into processing element (PE) design to make the best use of each computing component, research on many-core processors today usually focuses on two aspects: interconnection and memory hierarchy.

In single-core and even some multi-core processors, the bus is the most commonly used interconnect among processing cores. However, as the number of cores increases, exclusive access to the bus becomes the bottleneck for data exchange. Other interconnects, such as the crossbar and the NoC, have been designed to reduce communication latency. Compared to a crossbar, an NoC has higher access latency, but it needs fewer physical links and is much easier to extend. Over the past two decades, much research has aimed to improve NoC performance, covering topology design and link optimization, such as application-specific NoC design by Xu[16], 3D NoC by Xie[17], and link addition and removal by Jiao[8]. Other work has focused on avoiding deadlock for specific NoC topologies, mainly based on the theory of William Dally[3], such as routing algorithms on the Torus[15]. For NoC, the Mesh is the most commonly used topology in research.

Memory hierarchy is another aspect to be considered in many-core processors. In multi-core pr

W. Xu et al. (Eds.): NCCET 2013, CCIS 396, pp. 198–207, 2013. © Springer-Verlag Berlin Heidelberg 2013
