Dual Data Rate Network-on-Chip Architectures
Doctoral thesis, 2021

Networks-on-Chip (NoCs) are becoming increasingly important for the performance of modern multi-core systems-on-chip. The performance of current NoCs is limited by, among other things, two factors: their limited clock frequency and their long router pipelines. The clock frequency of a network bounds its saturation throughput. In high-throughput routers, however, the clock is constrained by the control logic (virtual channel and switch allocation), whereas the datapath (crossbar switch and links) has significant slack. This slack in the datapath wastes potential network throughput. Second, routers require flits to pass through a large number of pipeline stages, which increases packet latency at low traffic loads. These stages include router resource allocation, switch traversal (ST) and link traversal (LT). The allocation stages manage contention among flits that attempt to access the switch and links simultaneously, and the ST stage is needed to change the routing dimension. In some cases these stages are not needed, and forcing flits through them increases packet latency unnecessarily.

The aim of this thesis is to improve NoC performance in two ways: network throughput, by removing the slack in the router datapath, and average packet latency, by enabling incoming flits to bypass the allocation and ST stages when possible. More precisely, this thesis introduces Dual Data-Rate (DDR) NoC architectures, which exploit the slack in the NoC datapath to operate it at DDR. This requires a clock whose period is twice the datapath delay, which removes the control logic from the critical path. DDR datapaths enable higher throughput than existing single data-rate (SDR) networks, where the clock period is defined by the control logic. Additionally, this thesis supplements DDR NoC architectures with varying levels of pipeline-stage bypassing to reduce low-load packet latency.
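The clocking argument above can be illustrated with a small back-of-the-envelope sketch. It is not taken from the thesis: the function names and the delay values (1.2 ns control logic, 1.0 ns datapath) are hypothetical, and the model only compares the peak flit rate of one datapath port, not measured saturation throughput. It assumes the SDR clock period is set by the slower of control logic and datapath, and that the DDR clock period is twice the datapath delay with two flit transfers per cycle.

```python
def sdr_clock_period(control_delay_ns, datapath_delay_ns):
    """SDR: the slower of control logic and datapath sets the clock period."""
    return max(control_delay_ns, datapath_delay_ns)


def ddr_clock_period(control_delay_ns, datapath_delay_ns):
    """DDR: the period is twice the datapath delay (as stated in the abstract).

    Assumption: the control logic must fit in that period, otherwise it
    would be back on the critical path and the model no longer applies.
    """
    period_ns = 2.0 * datapath_delay_ns
    if control_delay_ns > period_ns:
        raise ValueError("control logic does not fit in the DDR clock period")
    return period_ns


def peak_flits_per_ns(period_ns, transfers_per_cycle):
    """Peak flit rate of one datapath port."""
    return transfers_per_cycle / period_ns


# Hypothetical delays, chosen only for illustration.
control_ns, datapath_ns = 1.2, 1.0

sdr_rate = peak_flits_per_ns(sdr_clock_period(control_ns, datapath_ns), 1)
ddr_rate = peak_flits_per_ns(ddr_clock_period(control_ns, datapath_ns), 2)
print(f"SDR: {sdr_rate:.3f} flits/ns, DDR: {ddr_rate:.3f} flits/ns")
```

In this sketch the DDR datapath moves one flit per datapath delay regardless of how slow the control logic is, as long as the control logic fits within two datapath delays. The size of the gain depends entirely on the assumed delays.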
To avoid the complex logic required for bypassing from all inputs to all outputs, this thesis implements and evaluates a simplified bypassing approach. DDR NoC routers support bypassing of the allocation stage for flits that make an in-network straight hop (i.e. East to West, North to South and vice versa) and for flits entering or exiting the network. Disabling bypassing during XY-turns limits its benefits, but for most routing algorithms under low traffic loads, flits encounter at most one XY-turn on their way to the destination. Bypassing the allocation stage enables incoming flits to initiate ST directly, when the required conditions are met, and to propagate at one cycle per hop. Furthermore, DDR NoC routers allow flits to bypass the ST stage when making a straight hop from the head of a specific input VC. Restricting ST bypassing to a specific VC further simplifies the check logic, so that the clock period remains defined by the datapath delay. Bypassing ST requires dedicated bypass paths from non-local input ports to the opposite output ports. It enables flits to propagate at half a cycle per hop.

This thesis shows that, compared to current state-of-the-art SDR NoCs, operating the router's datapath at DDR improves throughput by up to 20%. Adding an allocation bypassing mechanism for straight hops to a DDR NoC reduces its packet latency by up to 45%, while maintaining the DDR throughput advantage. Extending allocation bypassing to flits entering and exiting the network reduces latency by another 24%. Finally, adding ST bypassing reduces latency by another 32%. Overall, DDR NoCs offer up to 40% lower latency and about 20% higher throughput compared to SDR networks.
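The effect of the bypassing levels on zero-load latency can be sketched with a simple per-router cost model. This is an illustrative assumption, not the thesis's evaluation method: it assumes XY routing on a 2D mesh, charges one cycle per router traversal when allocation is bypassed and half a cycle when ST is also bypassed (the per-hop figures given in the abstract), and uses a hypothetical `base_cycles` cost wherever no bypass applies. It ignores contention, serialization and the head-of-VC condition. The mode names are invented for this sketch.

```python
MODES = ("none", "straight", "straight+local", "st")


def classify_route(src, dst):
    """Count router traversals on an XY route between two distinct mesh nodes.

    Returns (straight, endpoints, turns): routers crossed in a straight line,
    the source and destination routers, and routers where the XY-turn occurs.
    """
    dx, dy = abs(dst[0] - src[0]), abs(dst[1] - src[1])
    if dx + dy == 0:
        raise ValueError("source and destination must differ")
    turns = 1 if dx > 0 and dy > 0 else 0
    endpoints = 2
    straight = (dx + dy + 1) - endpoints - turns
    return straight, endpoints, turns


def zero_load_cycles(src, dst, base_cycles, mode):
    """Zero-load head-flit latency in cycles under the assumed cost model."""
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode}")
    straight_cost = endpoint_cost = turn_cost = base_cycles
    if mode != "none":
        straight_cost = 1.0   # allocation bypass on straight hops
    if mode in ("straight+local", "st"):
        endpoint_cost = 1.0   # allocation bypass when entering/exiting
    if mode == "st":
        straight_cost = 0.5   # ST bypass on straight hops
    straight, endpoints, turns = classify_route(src, dst)
    return (straight * straight_cost
            + endpoints * endpoint_cost
            + turns * turn_cost)   # XY-turns are never bypassed


# base_cycles=2.0 is a hypothetical non-bypassed cost, not a thesis figure.
for mode in MODES:
    print(mode, zero_load_cycles((0, 0), (3, 2), base_cycles=2.0, mode=mode))
```

Because a route has at most one XY-turn, the non-bypassed cost is paid at most once per packet in the most aggressive mode, which is why restricting bypassing to straight hops and network entry/exit still captures most of the benefit on longer routes.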

Dual Data-Rate

Network-on-Chip

Multiprocessor System-on-Chip

On-Chip Interconnect

System-on-Chip

Chip Multiprocessors

Opponent: Tushar Krishna, Georgia Institute of Technology, USA

Author

Ahsen Ejaz

Chalmers, Computer Science and Engineering

FastTrackNoC: A NoC with FastTrack Router Datapaths

HighwayNoC: Approaching Ideal NoC Performance With Dual Data Rate Routers

IEEE/ACM Transactions on Networking, Vol. 29 (2021), p. 318-331

Journal article

FreewayNoC: A DDR NoC with Pipeline Bypassing

2018 12th IEEE/ACM International Symposium on Networks-on-Chip, NOCS 2018 (2018)

Paper in proceedings

DDRNoC: Dual Data-Rate Network-on-Chip

Transactions on Architecture and Code Optimization, Vol. 15 (2018)

Journal article

Computers are used extensively to solve many of the problems our civilization faces today. As these problems become more complex, we require increasingly powerful computers capable of processing information faster and more efficiently. One key component of a computer is the Central Processing Unit (CPU), which is responsible for processing information. Until 2005, CPU design engineers increased the processing speed of computers by designing faster CPUs. Simply increasing CPU speed is no longer possible, however, because faster CPUs consume much more power and generate more heat than can easily be removed, which risks permanent damage to the chip. These limitations were addressed by designing CPUs with multiple processing cores, integrated in a single multi-core chip and capable of processing data in parallel.

Over the years, the number of processing cores in a CPU chip has increased from two, to four, to over a thousand in some cutting-edge CPUs today. A CPU with so many cores can offer very high performance, but it is limited by, among other things, inefficient communication between the cores in the chip. A high-speed communication fabric is therefore deployed within the chip to connect all the cores efficiently and improve CPU performance. This thesis proposes novel architectures for this on-chip interconnect that offer better performance than existing approaches.

Energy-efficient Heterogeneous COmputing at exaSCALE (ECOSCALE)

European Commission (EU), 2015-10-01 -- 2018-12-31.

Green Computing Node for European micro-servers (EUROSERVER)

European Commission (EU), 2013-09-01 -- 2016-08-31.

Subject categories

Computer Engineering

Embedded Systems

Computer Systems

Areas of Advance

Information and Communication Technology

Infrastructure

C3SE (Chalmers Centre for Computational Science and Engineering)

Driving forces

Innovation and entrepreneurship

ISBN

978-91-7905-497-7

Doktorsavhandlingar vid Chalmers tekniska högskola. Ny serie: 4964

Publisher

Chalmers University of Technology

Online

More information

Last updated

2021-05-18