Exploring Power-Thermal-Performance Trade-Offs in 3D Network on Chip-Enabled Many-Core Systems

Author: Dongjin Lee
Publisher:
Total Pages: 132
Release: 2018
Genre: Networks on a chip
ISBN:

High-performance and energy-efficient Network-on-Chip (NoC) architecture is one of the crucial components of manycore processing platforms. A very promising NoC architecture recently proposed in the literature is the three-dimensional small-world NoC (3D SWNoC). Due to short vertical links in 3D integration and the robustness of small-world networks, the 3D SWNoC architecture outperforms its other 3D counterparts. However, the performance of 3D SWNoC is highly dependent on the placement of the links and associated routers. In this dissertation, we propose a sensitivity-based link placement algorithm (SEN) to optimize the performance of 3D SWNoC. The sensitivity of a link in a NoC measures the importance of the link. The SEN algorithm optimizes the performance of 3D SWNoC by calculating the sensitivities of all the links in the NoC and removing the least important link repeatedly. We compare the performance of the SEN algorithm with simulated annealing and a machine learning-based optimization algorithm. 3D NoC architectures suffer from high power density and the resultant thermal hotspots, leading to functionality and reliability concerns over time. The power consumption and thermal profiles of 3D NoCs can be improved by incorporating Voltage-Frequency Island (VFI)-based power management and Reciprocal Design Symmetry (RDS)-based floor planning. We undertake a detailed design space exploration for 3D NoC by considering power-thermal-performance trade-offs. We consider a small-world network-enabled 3D NoC in this performance evaluation due to its superior performance and energy-efficiency compared to other existing 3D NoCs. For TSV-based systems, high power density and the resultant thermal hotspots remain major concerns from the perspectives of chip functionality and overall reliability. Due to the inherent thermal constraints of a TSV-based 3D system, we are unable to fully exploit the benefits offered by the power management methodology.
In this context, the emergence of monolithic 3D (M3D) integration has opened new possibilities for designing ultra-low-power and high-performance circuits and systems. The smaller dimensions of the inter-layer dielectric and monolithic inter-tier vias offer high-density integration, flexibility of partitioning logic blocks across multiple tiers, and a significant reduction of total wire-length. We present a comparative performance evaluation of M3D NoCs with respect to their conventional TSV-based counterparts.
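
The abstract describes SEN only at a high level: compute a sensitivity for every link, then repeatedly remove the least important one. The following is a minimal sketch of that greedy loop, not the dissertation's implementation. The abstract does not define sensitivity, so the sketch assumes it is the increase in average hop count caused by removing a link; the names avg_hops and prune_links are hypothetical, and the starting network is assumed to be connected.

```python
from collections import deque
from itertools import combinations


def avg_hops(nodes, links):
    """Average shortest-path hop count over all ordered node pairs.

    Returns infinity if the network is disconnected.
    """
    adj = {n: set() for n in nodes}
    for a, b in links:
        adj[a].add(b)
        adj[b].add(a)
    total = 0
    for src in nodes:
        # Breadth-first search gives hop counts from src to every reachable node.
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        if len(dist) < len(nodes):
            return float("inf")
        total += sum(dist.values())
    return total / (len(nodes) * (len(nodes) - 1))


def prune_links(nodes, links, target_count):
    """Greedily remove the least sensitive link until target_count links remain.

    Assumed sensitivity: increase in average hop count when the link is removed.
    """
    links = set(links)
    while len(links) > target_count:
        base = avg_hops(nodes, links)
        scored = [(avg_hops(nodes, links - {l}) - base, l) for l in sorted(links)]
        sensitivity, victim = min(scored)
        if sensitivity == float("inf"):
            break  # every remaining link is a bridge; removing one disconnects the NoC
        links.remove(victim)
    return links


# Usage: start from a fully connected 4-router network and keep 3 links.
routers = [0, 1, 2, 3]
kept = prune_links(routers, combinations(routers, 2), 3)
print(sorted(kept), avg_hops(routers, kept))
```

A greedy removal order like this is cheap to evaluate but is not guaranteed to reach the best final topology, which is presumably why the abstract compares SEN against simulated annealing and a machine learning-based optimizer.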

Evaluation of Temperature-performance Trade-offs in Wireless Network-on-chip Architectures

Author: Nishad Nerurkar
Publisher:
Total Pages: 144
Release: 2013
Genre: Interconnects (Integrated circuit technology)
ISBN:

"Continued scaling of device geometries according to Moore's Law is enabling complete end-user systems on a single chip. Massive multicore processors are enablers for many information and communication technology (ICT) innovations spanning various domains, including healthcare, defense, and entertainment. In the design of high-performance massive multicore chips, power and heat are dominant constraints. Temperature hotspots witnessed in multicore systems exacerbate the problem of reliability in deep submicron technologies. Hence, there is a great need to explore holistic power and thermal optimization and management strategies for the massive multicore chips. High power consumption not only raises chip temperature and cooling cost, but also decreases chip reliability and performance. Thus, addressing thermal concerns at different stages of the design and operation is critical to the success of future generation systems. The performance of a multicore chip is also influenced by its overall communication infrastructure, which is predominantly a Network-on-Chip (NoC). The existing method of implementing a NoC with planar metal interconnects is deficient due to high latency, significant power consumption, and temperature hotspots arising out of long, multi-hop wireline links used in data exchange. On-chip wireless networks are envisioned as an enabling technology to design low power and high bandwidth massive multicore architectures. However, optimizing wireless NoCs for best performance does not necessarily guarantee a thermally optimal interconnection architecture. The wireless links, being highly efficient, attract very high traffic densities, which in turn result in temperature hotspots. Therefore, while the wireless links result in better performance and energy-efficiency, they can also cause temperature hotspots and undermine the reliability of the system.
Consequently, the location and utilization of the wireless links is an important factor in thermal optimization of high performance wireless Networks-on-Chip. Architectural innovation in conjunction with suitable power and thermal management strategies is the key for designing high performance yet energy-efficient massive multicore chips. This work contributes to exploration of various design methodologies for establishing wireless NoC architectures that achieve the best trade-offs between temperature, performance and energy-efficiency. It further demonstrates that incorporating Dynamic Thermal Management (DTM) on a multicore chip designed with such temperature and performance optimized Wireless Network-on-Chip architectures improves thermal profile while simultaneously providing lower latency and reduced network energy dissipation compared to its conventional counterparts."--Abstract.

Temperature Evaluation of NoC Architectures and Dynamically Reconfigurable NoC

Author: Aniket Dilip Mhatre
Publisher:
Total Pages: 124
Release: 2014
Genre: Interconnects (Integrated circuit technology)
ISBN:

"Advancements in the field of chip fabrication led to the integration of a large number of transistors in a small area, giving rise to the multi-core processor era. Massive multi-core processors facilitate innovation and research in the fields of healthcare, defense, entertainment, meteorology and many others. Reduction in chip area and increase in the number of on-chip cores is accompanied by power and temperature issues. In high performance multi-core chips, power and heat are predominant constraints. High performance massive multicore systems suffer from thermal hotspots, exacerbating the problem of reliability in deep submicron technologies. High power consumption not only increases the chip temperature but also jeopardizes the integrity of the system. Hence, there is a need to explore holistic power and thermal optimization and management strategies for massive on-chip multi-core environments. In multi-core environments, the communication fabric plays a major role in deciding the efficiency of the system. In multi-core processor chips this communication infrastructure is predominantly a Network-on-Chip (NoC). Traditional NoC designs incorporate planar interconnects; as a result, these NoCs have long, multi-hop wireline links for data exchange. Due to the presence of multi-hop planar links, such NoC architectures fall prey to high latency, significant power dissipation and temperature hotspots. Networks inspired from nature are envisioned as an enabling technology to achieve highly efficient and low power NoC designs. Adopting wireless technology in such architectures enhances their performance. Placement of wireless interconnects (WIs) alters the behavior of the network and hence a random deployment of WIs may not result in a thermally optimal solution. In such scenarios, the WIs, being highly efficient, would attract high traffic densities, resulting in thermal hotspots.
Hence, the location and utilization of the wireless links is a key factor in obtaining a thermally optimal, highly efficient Network-on-Chip. Optimization of the NoC framework alone is incapable of addressing the effects due to the runtime dynamics of the system. Minimal paths solely optimized for performance in the network may lead to excessive utilization of certain NoC components, leading to thermal hotspots. Hence, architectural innovation in conjunction with suitable power and thermal management strategies is the key for designing high performance and energy-efficient multicore systems. This work contributes to exploring various wired and wireless NoC architectures that achieve the best trade-offs between temperature, performance and energy-efficiency. It further proposes an adaptive routing scheme which factors in the thermal profile of the chip. The proposed routing mechanism dynamically reacts to the thermal profile of the chip and takes measures to avoid thermal hotspots, achieving a thermally efficient dynamically reconfigurable network on chip architecture."--Abstract.
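
The abstract does not specify how the adaptive routing scheme uses the thermal profile. As a rough illustration of the general idea only, the sketch below assumes a 2D mesh, restricts itself to minimal paths, and at each hop forwards to the cooler of the neighbors that move a packet closer to its destination. The function names, the per-router temperature map, and the tie-breaking rule are all hypothetical and are not taken from the thesis.

```python
def next_hop(cur, dst, temp):
    """Pick the coolest neighbor among the hops that move closer to dst on a 2D mesh."""
    (x, y), (dx, dy) = cur, dst
    candidates = []
    if x != dx:
        candidates.append((x + (1 if dx > x else -1), y))
    if y != dy:
        candidates.append((x, y + (1 if dy > y else -1)))
    if not candidates:
        return cur  # already at the destination
    # Ties are broken in favor of the X-direction hop, which is listed first.
    return min(candidates, key=lambda node: temp[node])


def route(src, dst, temp):
    """Follow next_hop decisions from src to dst and return the visited routers."""
    path = [src]
    while path[-1] != dst:
        path.append(next_hop(path[-1], dst, temp))
    return path


# Usage: a 3x3 mesh at a uniform 50 degrees with one hot router at (1, 0).
temperature = {(x, y): 50.0 for x in range(3) for y in range(3)}
temperature[(1, 0)] = 90.0
print(route((0, 0), (2, 2), temperature))
```

Because every hop still reduces the distance to the destination, the path length stays minimal; the thermal profile only influences which of the equally short paths is taken.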

Towards Energy Efficient and Reliable 3D Manycore Chip Enabled by Machine Learning

Author: Sourav Das
Publisher:
Total Pages: 200
Release: 2018
Genre:
ISBN:

Finally, we summarize our contributions and outline some promising directions for future work based on the findings of this work. Future work includes incorporating machine learning approaches for on-chip security analysis and development of online mitigation techniques against external attacks.

3D Networks-on-Chip Architecture Optimization for Low Power Design

Author: Opoku Agyeman Michael
Publisher: LAP Lambert Academic Publishing
Total Pages: 180
Release: 2015-07-13
Genre:
ISBN: 9783659758133

Three-dimensional Networks-on-Chip (3D NoCs) have attracted growing interest as a way to meet the on-chip communication demands of future multi-core embedded systems. However, 3D NoCs have not been completely accepted into the mainstream due to issues such as the high cost and complexity of manufacturing 3D vertical wires, and the larger memory, area and power consumption of 3D NoC components compared with conventional 2D NoCs. This thesis aims at optimizing 3D NoCs by modeling and evaluating alternate NoC topologies, routing algorithms and mapping techniques to achieve optimized area, power and performance parameters (latency and throughput). In particular, novel 3D NoC router architectures and their possible combinations have been investigated with the aim of achieving lower area and power consumption of on-chip communication components with a minimal performance trade-off. This book investigates different heterogeneous 3D NoC architectures which combine 2D and 3D routers to improve the area and energy efficiency of 3D NoCs with minimal performance degradation.

Performance and Energy Trade-offs for 3D IC NoC Interconnects and Architectures

Author: James David Coddington
Publisher:
Total Pages: 108
Release: 2015
Genre: Networks on a chip
ISBN:

"With the increased complexity and continual scaling of integrated circuit performance, multi-core chips with dozens, hundreds, even thousands of parallel computing units require high performance interconnects to maximize data throughput and minimize latency and energy consumption. High core counts render bus based interconnects inefficient and lackluster in performance. Networks-on-Chip were introduced to simplify the interconnect design process and maintain a more scalable interconnection architecture. With the continual scaling of feature sizes for smaller and smaller transistors, the global interconnections of planar integrated circuits are consuming higher energy proportional to the rest of the chip power dissipation as well as increasing communication delays. Three-dimensional integrated circuits were introduced to shorten global wire lengths and increase chip connectivity. These 3D ICs bring heat dissipation challenges as the power density increases drastically for each additional chip layer. One of the most popularly researched vertical interconnection technologies is through-silicon vias (TSVs). TSVs require additional manufacturing steps to build but generally have low energy dissipation and good performance. Alternative wireless technologies such as capacitive or inductive coupling do not require additional manufacturing steps and also provide the option of having a liquid cooling layer between planar chips. However, they are typically much slower and consume more energy than their wired counterparts. This work compares the interconnection technologies across several different NoC architectures including a proposed sparse 3D mesh for inductive coupling that increases vertical throughput per link and reduces chip area compared to the other wireless architectures and technologies."--Abstract.

Hardware Accelerators for Machine Learning: From 3D Manycore to Processing-in-Memory Architectures

Author: Aqeeb Iqbal Arka
Publisher:
Total Pages: 0
Release: 2022
Genre: Machine learning
ISBN:

Big data applications such as deep learning and graph analytics require hardware platforms that are energy-efficient yet computationally powerful. 3D manycore architectures are the key to efficiently executing such compute- and data-intensive applications. A through-silicon-via (TSV)-based 3D manycore system is a promising solution in this direction as it enables integration of disparate heterogeneous computing cores on a single system. Recent industry trends show the viability of 3D integration in real products (e.g., Intel Lakefield SoC Architecture, the AMD Radeon R9 Fury X graphics card, and Xilinx Virtex-7 2000T/H580T, etc.). However, the achievable performance of conventional TSV-based 3D systems is ultimately bottlenecked by the horizontal wires (wires in each planar die). Moreover, current TSV 3D architectures suffer from thermal limitations. Hence, TSV-based architectures do not realize the full potential of 3D integration. Monolithic 3D (M3D) integration, a breakthrough technology to achieve "More Moore and More Than Moore," opens up the possibility of designing cores and associated network routers using multiple layers by utilizing monolithic inter-tier vias (MIVs) and hence reducing the effective wire length. Compared to TSV-based 3D ICs, M3D offers the "true" benefits of the vertical dimension for system integration: the size of an MIV used in M3D is over 100x smaller than a TSV. However, designing these new architectures often involves optimizing multiple conflicting objectives (e.g., performance, thermal, etc.) due to the presence of a mix of computing elements and communication methodologies, each with a different requirement for high performance. To overcome the difficult optimization challenges due to the large design space and complex interactions among the heterogeneous components (CPU, GPU, Last Level Cache, etc.)
in an M3D-based manycore chip, machine learning algorithms can be explored as a promising solution. The first part of this dissertation focuses on the design of high-performance and energy-efficient architectures for big-data applications, enabled by M3D vertical integration and data-driven machine learning algorithms. As an example, we consider heterogeneous manycore architectures with CPUs, GPUs, and Cache as the choice of hardware platform in this part of the work. The disparate nature of these processing elements introduces conflicting design requirements that need to be satisfied simultaneously. Moreover, the on-chip traffic pattern exhibited by different big-data applications (like many-to-few-to-many in CPU/GPU-based manycore architectures) needs to be incorporated in the design process for an optimal power-performance trade-off. In this dissertation, we first design an M3D-enabled heterogeneous manycore architecture and demonstrate the efficacy of machine learning algorithms for efficiently exploring a large design space. For large design space exploration problems, the proposed machine learning algorithm can find good solutions in significantly less time than existing state-of-the-art counterparts. However, the M3D-enabled heterogeneous manycore architecture is still limited by the inherent memory bandwidth bottlenecks of traditional von Neumann architectures. As a result, later in this dissertation, we focus on Processing-in-Memory (PIM) architectures tailor-made to accelerate deep learning applications such as Graph Neural Networks (GNNs), as such architectures can achieve massive data parallelism and do not suffer from memory bandwidth-related issues. We choose GNNs as an example workload because GNNs are more complex than traditional deep learning applications: they simultaneously exhibit attributes of both deep learning and graph computations. Hence, they are both compute- and data-intensive in nature.
The high amount of data movement required by GNN computation poses a challenge to conventional von Neumann architectures (such as CPUs, GPUs, and heterogeneous system-on-chips (SoCs)) as they have limited memory bandwidth. Hence, we propose the use of PIM-based non-volatile memory such as Resistive Random Access Memory (ReRAM). We leverage the efficient matrix operations enabled by ReRAMs and design manycore architectures that can facilitate the unique computation and communication needs of large-scale GNN training. We then exploit various techniques such as regularization methods to further accelerate GNN training on ReRAM-based manycore systems. Finally, we streamline the GNN training process by reducing the amount of redundant information in both the GNN model and the input graph. Overall, this work focuses on the design challenges of high-performance and energy-efficient manycore architectures for machine learning applications. We propose novel architectures that use M3D or ReRAM-based PIM architectures to accelerate such applications. Moreover, we focus on hardware/software co-design to ensure the best possible performance.

Resource Management in Manycore Architecture: 3D NoC to Embedded Systems

Author: Shouvik Musavvir
Publisher:
Total Pages: 0
Release: 2022
Genre: Embedded computer systems
ISBN:

Manycore architecture exploits tremendous computation capability for highly parallelized workloads and big data analysis. A manycore chip uses a network-on-chip (NoC) to transfer messages between cores and memory. Three-dimensional (3D) NoC provides a scalable, high-performance and energy-efficient communication backbone. By taking advantage of the shorter distance in the z-dimension, 3D NoC enables lower latency and energy consumption compared to its 2D counterpart. Through-silicon-via (TSV)-based 3D NoC suffers from several fabrication and reliability imperfections. Recently, monolithic 3D (M3D) architecture has been proposed as an alternative to TSV-based design. M3D technology enables high density integration by sequentially stacking tiers on top of each other using minuscule monolithic inter-tier vias (MIVs). In M3D fabrication, the active layers are fabricated on the same die and high temperature annealing can damage the chip. This has necessitated low temperature annealing techniques for M3D fabrication, leading to inferior performance of transistors in the top tier and slower interconnects in the bottom tier. To this end, we developed a process-variation aware monolithic 3D NoC design technique to place the NoC components optimally and minimize the effect of process related degradation. Manycore chips also suffer from thermal hotspots resulting from power-hungry processors. Voltage frequency island (VFI)-based power management is a popular strategy to enhance the energy efficiency of a manycore chip without incurring noticeable performance degradation. The heart of a VFI-based system is changing the voltage/frequency (V/F) pairs of each island to match the requirements of a dynamically varying workload. However, negative bias temperature instability (NBTI) increases the threshold voltage of PMOS transistors, leading to timing failures for fixed V/F pairs.
Hence, we propose an online NBTI-aware VFI design to improve the chip lifetime and energy efficiency while dynamically tuning V/F pairs. Modern mobile chips are shifting from a traditional homogeneous structure to a heterogeneous one to support diverse workloads. In mobile chips, the resource management technique needs to fulfil two contradictory objectives: energy efficiency and application-wise performance requirements. Moreover, smartphones also run numerous unseen applications throughout their lifetime. Hence, we propose a machine learning based resource management strategy to adapt in the presence of multiple new applications.

Design Automation of Cyber-Physical Systems

Author: Mohammad Abdullah Al Faruque
Publisher: Springer
Total Pages: 288
Release: 2019-05-09
Genre: Technology & Engineering
ISBN: 3030130509

This book presents the state-of-the-art and breakthrough innovations in design automation for cyber-physical systems. The authors discuss various aspects of cyber-physical systems design, including modeling, co-design, optimization, tools, formal methods, validation, verification, and case studies. Coverage includes a survey of the various existing cyber-physical systems functional design methodologies and related tools, which will provide the reader with unique insights into the conceptual design of cyber-physical systems.

Modeling and Optimization of Parallel and Distributed Embedded Systems

Author: Arslan Munir
Publisher: John Wiley & Sons
Total Pages: 399
Release: 2016-02-08
Genre: Computers
ISBN: 1119086418

This book introduces the state-of-the-art in research in parallel and distributed embedded systems, which have been enabled by developments in silicon technology, micro-electro-mechanical systems (MEMS), wireless communications, computer networking, and digital electronics. These systems have diverse applications in domains including military and defense, medical, automotive, and unmanned autonomous vehicles. The emphasis of the book is on the modeling and optimization of emerging parallel and distributed embedded systems in relation to the three key design metrics of performance, power and dependability. Key features: Includes an embedded wireless sensor networks case study to help illustrate the modeling and optimization of distributed embedded systems. Provides an analysis of multi-core/many-core based embedded systems to explain the modeling and optimization of parallel embedded systems. Features an application metrics estimation model; Markov modeling for fault tolerance and analysis; and queueing theoretic modeling for performance evaluation. Discusses optimization approaches for distributed wireless sensor networks; high-performance and energy-efficient techniques at the architecture, middleware and software levels for parallel multicore-based embedded systems; and dynamic optimization methodologies. Highlights research challenges and future research directions. The book is primarily aimed at researchers in embedded systems; however, it will also serve as an invaluable reference to senior undergraduate and graduate students with an interest in embedded systems research.