Towards Energy Efficient and Reliable 3D Manycore Chip Enabled by Machine Learning

Author: Sourav Das
Publisher:
Total Pages: 200
Release: 2018
Genre:
ISBN:

Finally, we summarize our contributions and outline promising directions for future work, including the use of machine learning for on-chip security analysis and the development of online mitigation techniques against external attacks.

Hardware Accelerators for Machine Learning: From 3D Manycore to Processing-in-Memory Architectures

Author: Aqeeb Iqbal Arka
Publisher:
Total Pages: 0
Release: 2022
Genre: Machine learning
ISBN:

Big-data applications such as deep learning and graph analytics require hardware platforms that are energy-efficient yet computationally powerful. 3D manycore architectures are key to executing such compute- and data-intensive applications efficiently. A through-silicon via (TSV)-based 3D manycore system is a promising solution in this direction, as it enables the integration of disparate heterogeneous computing cores on a single system. Recent industry trends show the viability of 3D integration in real products (e.g., the Intel Lakefield SoC architecture, the AMD Radeon R9 Fury X graphics card, and the Xilinx Virtex-7 2000T/H580T). However, the achievable performance of conventional TSV-based 3D systems is ultimately bottlenecked by the horizontal wires within each planar die. Moreover, current TSV-based 3D architectures suffer from thermal limitations. Hence, TSV-based architectures do not realize the full potential of 3D integration. Monolithic 3D (M3D) integration, a breakthrough technology for achieving "More Moore and More Than Moore," opens up the possibility of designing cores and their associated network routers across multiple layers by using monolithic inter-tier vias (MIVs), thereby reducing the effective wire length. Compared to TSV-based 3D ICs, M3D offers the "true" benefits of the vertical dimension for system integration: an MIV used in M3D is over 100x smaller than a TSV. However, designing these new architectures often involves optimizing multiple conflicting objectives (e.g., performance and thermal behavior) because they combine a mix of computing elements and communication methodologies, each with different requirements for high performance. Machine learning algorithms are a promising way to overcome the difficult optimization challenges that arise from the large design space and the complex interactions among the heterogeneous components (CPU, GPU, last-level cache, etc.) of an M3D-based manycore chip.
The first part of this dissertation focuses on the design of high-performance and energy-efficient architectures for big-data applications, enabled by M3D vertical integration and data-driven machine learning algorithms. As an example, we consider heterogeneous manycore architectures with CPUs, GPUs, and caches as the hardware platform in this part of the work. The disparate nature of these processing elements introduces conflicting design requirements that must be satisfied simultaneously. Moreover, the on-chip traffic patterns exhibited by different big-data applications (such as the many-to-few-to-many pattern in CPU/GPU-based manycore architectures) must be incorporated into the design process to achieve an optimal power-performance trade-off. In this dissertation, we first design an M3D-enabled heterogeneous manycore architecture and demonstrate the efficacy of machine learning algorithms for efficiently exploring a large design space. For large design space exploration problems, the proposed machine learning algorithm can find good solutions in significantly less time than existing state-of-the-art counterparts. However, the M3D-enabled heterogeneous manycore architecture is still limited by the memory bandwidth bottleneck inherent in traditional von Neumann architectures. As a result, later in this dissertation, we focus on Processing-in-Memory (PIM) architectures tailor-made to accelerate deep learning applications such as Graph Neural Networks (GNNs), since such architectures can achieve massive data parallelism and do not suffer from memory bandwidth-related issues. We choose GNNs as an example workload because they are more complex than traditional deep learning applications: they simultaneously exhibit attributes of both deep learning and graph computation, which makes them both compute- and data-intensive.
The large amount of data movement required by GNN computation poses a challenge to conventional von Neumann architectures (such as CPUs, GPUs, and heterogeneous systems-on-chip (SoCs)) because of their limited memory bandwidth. Hence, we propose the use of PIM-based non-volatile memory such as Resistive Random Access Memory (ReRAM). We leverage the efficient matrix operations enabled by ReRAM and design manycore architectures that can meet the unique computation and communication needs of large-scale GNN training. We then exploit techniques such as regularization methods to further accelerate GNN training on ReRAM-based manycore systems. Finally, we streamline the GNN training process by reducing the amount of redundant information in both the GNN model and the input graph. Overall, this work focuses on the design challenges of high-performance and energy-efficient manycore architectures for machine learning applications. We propose novel architectures that use M3D integration or ReRAM-based PIM to accelerate such applications. Moreover, we focus on hardware/software co-design to achieve the best possible performance.
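
To illustrate the kind of matrix operation a ReRAM crossbar accelerates, the sketch below models an idealized crossbar: inputs are applied as row voltages, weights are stored as cell conductances, and each column current is the dot product I_j = sum_i V_i G_ij (Ohm's law plus Kirchhoff's current law). Signed weights are split across two non-negative arrays, a common differential scheme. It then computes one linear GNN layer, A_hat (H W), with both products done on crossbars. The function names, G_UNIT, and the absence of noise, ADC/DAC quantization, and conductance limits are all assumptions for illustration; this is not the dissertation's architecture.

```python
from typing import List, Tuple

Matrix = List[List[float]]
G_UNIT = 1e-6  # hypothetical conductance (siemens) that encodes a weight of 1.0


def program_crossbar(weights: Matrix) -> Tuple[Matrix, Matrix]:
    """Store a signed weight matrix as two non-negative conductance arrays.

    Row i of the crossbar corresponds to input i and column j to output j.
    Positive weights go to g_pos; negative weights (as magnitudes) go to g_neg.
    """
    g_pos = [[max(w, 0.0) * G_UNIT for w in row] for row in weights]
    g_neg = [[max(-w, 0.0) * G_UNIT for w in row] for row in weights]
    return g_pos, g_neg


def crossbar_mvm(voltages: List[float], g_pos: Matrix, g_neg: Matrix) -> List[float]:
    """Idealized analog read: column current I_j = sum_i V_i * G_ij.

    The differential current (positive array minus negative array) is divided
    by G_UNIT to convert it back into the weight domain.
    """
    outputs = []
    for j in range(len(g_pos[0])):
        i_pos = sum(v * g_pos[i][j] for i, v in enumerate(voltages))
        i_neg = sum(v * g_neg[i][j] for i, v in enumerate(voltages))
        outputs.append((i_pos - i_neg) / G_UNIT)
    return outputs


def gnn_layer(adj_norm: Matrix, features: Matrix, weights: Matrix) -> Matrix:
    """One linear GNN layer, A_hat @ (H @ W), with both products on crossbars."""
    w_pos, w_neg = program_crossbar(weights)
    # Combination: each node's feature vector drives the weight crossbar.
    combined = [crossbar_mvm(h, w_pos, w_neg) for h in features]
    # Aggregation: program H @ W into a second crossbar, drive it with rows of A_hat.
    c_pos, c_neg = program_crossbar(combined)
    return [crossbar_mvm(a_row, c_pos, c_neg) for a_row in adj_norm]


adj = [[0.5, 0.5], [0.5, 0.5]]
h = [[1.0, 2.0], [3.0, -1.0]]
w = [[1.0, -1.0], [0.5, 2.0]]
print(gnn_layer(adj, h, w))  # approximately [[2.25, -1.0], [2.25, -1.0]]
```

Real designs typically bit-slice inputs and weights across several cells and digitize column currents with ADCs; those details are omitted here to keep the dot-product mechanism visible.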

Machine Learning-Inspired Resource Management in M3D-Enabled Manycore Architectures

Author: Anwesha Chatterjee
Publisher:
Total Pages: 0
Release: 2022
Genre: High performance computing
ISBN:

Monolithic 3D (M3D) integration has emerged as an enabling technology for designing high-performance and energy-efficient circuits and systems. The small dimensions of vertical monolithic inter-tier vias (MIVs) lower the effective wirelength and allow high integration density. Designing an energy-efficient manycore architecture necessitates efficient resource management of the full SoC in terms of both power and performance. Voltage/frequency island (VFI)-based power management is a popular methodology for designing energy-efficient manycore architectures without incurring significant performance overhead. In an M3D chip, the vertical layers introduce inter-tier process variations that affect the performance of transistors and interconnects in different layers. Therefore, VFI-based power management in M3D manycore systems must account for inter-tier process variation effects. In this dissertation, we address the problem of resource management in M3D manycore architectures that are degraded by the inter-tier process variation inherent in M3D chips. First, we present the design of an imitation learning (IL)-enabled, VFI-based power management strategy that considers inter-tier process variation effects in M3D manycore chips. We demonstrate that the IL-based power management strategy can be fine-tuned based on the M3D characteristics. Our policy generates suitable V/F levels based on the computation and communication characteristics of the system for both process-oblivious and process-aware configurations. Subsequently, we propose a machine learning-based online update strategy for IL-based dynamic VFI (DVFI) policies in process-degraded M3D architectures. We demonstrate that, with no prior knowledge of the process-variation parameters, our online strategy captures the inter-tier process variations in the M3D system and achieves a better power-performance trade-off than a process-oblivious offline DVFI policy for the degraded M3D manycore architecture.
Furthermore, we show that the online update strategy improves overall energy efficiency for unseen workloads that were not considered during offline DVFI policy creation.
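
The IL-based DVFI idea can be sketched as behavioural cloning: an oracle that sees the full cost model labels each system state with its best V/F level, and a cheap learned policy imitates those labels from observable features. In this minimal sketch, the V/F table, the toy energy-delay cost, the `tier_speed` factor standing in for inter-tier process variation, and the 1-nearest-neighbour learner are all hypothetical choices, not the dissertation's policy or features.

```python
import math
import random
from typing import List, Tuple

# Hypothetical (voltage in V, frequency in GHz) levels available to one island.
VF_LEVELS: List[Tuple[float, float]] = [(0.8, 1.0), (0.9, 1.5), (1.0, 2.0)]

# Observable state: (compute load, network traffic, tier speed factor <= 1.0).
State = Tuple[float, float, float]


def edp(level: int, load: float, traffic: float, tier_speed: float) -> float:
    """Toy energy-delay product for one control epoch (hypothetical model).

    Compute time scales as 1 / (f * tier_speed); communication time is treated
    as frequency-independent; energy uses only the dynamic V^2 * f term.
    """
    v, f = VF_LEVELS[level]
    t = load / (f * tier_speed) + traffic
    energy = v * v * f * t
    return energy * t


def oracle(load: float, traffic: float, tier_speed: float = 1.0) -> int:
    """Expert: the V/F level index with the lowest EDP under the full model."""
    return min(range(len(VF_LEVELS)), key=lambda lvl: edp(lvl, load, traffic, tier_speed))


class ImitationPolicy:
    """1-nearest-neighbour policy trained on (state, oracle action) pairs."""

    def __init__(self) -> None:
        self.states: List[State] = []
        self.actions: List[int] = []

    def add_demonstrations(self, states: List[State]) -> None:
        # Label each state with the oracle's choice and store the pair.
        for s in states:
            self.states.append(s)
            self.actions.append(oracle(*s))

    def act(self, state: State) -> int:
        # Imitate the oracle's action for the closest demonstrated state.
        nearest = min(range(len(self.states)), key=lambda i: math.dist(self.states[i], state))
        return self.actions[nearest]


rng = random.Random(0)
demos = [(rng.random(), rng.random(), rng.choice([1.0, 0.8])) for _ in range(200)]
policy = ImitationPolicy()
policy.add_demonstrations(demos)
print(policy.act((0.9, 0.1, 0.8)))  # chosen V/F level index for a compute-heavy state
```

Calling `add_demonstrations` again with newly observed states is one simple way to refine such a policy online, but it assumes oracle labels are available at run time; the dissertation's online update, which works without prior knowledge of process-variation parameters, is not reproduced here.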

Towards Heterogeneous Multi-core Systems-on-Chip for Edge Machine Learning

Author: Vikram Jain
Publisher: Springer Nature
Total Pages: 199
Release: 2023-09-15
Genre: Technology & Engineering
ISBN: 3031382307

This book explores and motivates the need for building homogeneous and heterogeneous multi-core systems for machine learning to enable flexibility and energy efficiency. Coverage focuses on a key aspect of the challenges of (extreme-)edge computing, i.e., the design of energy-efficient and flexible hardware architectures, and on hardware-software co-optimization strategies that enable early design space exploration of hardware architectures. The authors investigate possible design solutions for building single-core specialized hardware accelerators for machine learning and motivate the need for homogeneous and heterogeneous multi-core systems. The advantages of scaling to heterogeneous multi-core systems are shown through the implementation of multiple test chips and architectural optimizations.

Machine Learning-Enabled Vertically Integrated Heterogeneous Manycore Systems for Big-Data Analytics

Author: Biresh Kumar Joardar
Publisher:
Total Pages: 101
Release: 2020
Genre: Big data
ISBN:

The rising use of deep learning and other big-data algorithms has led to increasing demand for hardware platforms that are computationally powerful yet energy-efficient. Heterogeneous manycore architectures that integrate multiple types of cores on a single chip present a promising direction in this regard. However, designing these new architectures often involves optimizing multiple conflicting objectives (e.g., performance, power, thermal behavior, and reliability) due to the presence of a mix of computing elements and communication methodologies, each with different requirements for high performance. This has made the design and evaluation of new architectures an increasingly challenging problem. Machine learning algorithms are a promising solution to this problem and should be investigated further. This dissertation focuses on the design of high-performance and energy-efficient architectures for big-data applications, enabled by data-driven machine learning algorithms. As an example, we consider heterogeneous manycore architectures with CPUs, GPUs, and Resistive Random-Access Memory (ReRAM) as the hardware platform in this work. The disparate nature of these processing elements introduces conflicting design requirements that must be satisfied simultaneously. In addition, novel design techniques such as processing-in-memory and 3D integration introduce additional design constraints (such as temperature and noise) that must be considered in the design process. Moreover, the on-chip traffic patterns exhibited by different big-data applications (such as the many-to-few-to-many pattern in CPU/GPU-based manycore architectures) must be incorporated into the design process to achieve an optimal power-performance trade-off. However, optimizing all these objectives simultaneously leads to an exponential increase in the design space of possible architectures.
Existing optimization algorithms do not scale well to such large design spaces and often require more time to reach a good solution. In this work, we highlight the efficacy of machine learning algorithms for efficiently designing a suitable heterogeneous manycore architecture. For large design space exploration problems, the proposed machine learning algorithm can find good solutions in significantly less time than existing state-of-the-art counterparts. Overall, this work focuses on the design challenges of high-performance and energy-efficient architectures for big-data applications and proposes machine learning algorithms capable of addressing these challenges.
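
Multi-objective design space exploration of this kind typically keeps the set of Pareto-optimal designs, i.e., those that no other candidate beats on every objective at once. The sketch below shows only that dominance filter, assuming every objective is to be minimized; the design names and the (latency, energy, peak temperature) values are hypothetical, and the machine learning that decides which candidates get evaluated is not shown.

```python
from typing import Dict, List, Tuple

Objectives = Tuple[float, ...]  # every objective is to be minimized


def dominates(a: Objectives, b: Objectives) -> bool:
    """True if a is no worse than b in every objective and strictly better in one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(designs: Dict[str, Objectives]) -> List[str]:
    """Names of the designs that no other design dominates."""
    return [
        name
        for name, obj in designs.items()
        if not any(dominates(other, obj) for other_name, other in designs.items() if other_name != name)
    ]


# Hypothetical (latency, energy, peak temperature) values for four candidate designs.
candidates = {
    "A": (1.0, 5.0, 80.0),
    "B": (2.0, 3.0, 75.0),
    "C": (2.5, 3.5, 78.0),
    "D": (1.0, 6.0, 85.0),
}
print(pareto_front(candidates))  # ['A', 'B']
```

Here A dominates D and B dominates C, while A and B trade latency against energy, so both survive; choosing among Pareto-optimal designs is left to the designer's priorities.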

Resource Management in Manycore Architecture: 3D NoC to Embedded Systems

Author: Shouvik Musavvir
Publisher:
Total Pages: 0
Release: 2022
Genre: Embedded computer systems
ISBN:

Manycore architectures offer tremendous computational capability for highly parallel workloads and big-data analysis. A manycore chip uses a network-on-chip (NoC) to transfer messages between cores and between cores and memory. A three-dimensional (3D) NoC provides a scalable, high-performance, and energy-efficient communication backbone. By taking advantage of the shorter distances in the z-dimension, a 3D NoC achieves lower latency and energy consumption than its 2D counterpart. However, through-silicon via (TSV)-based 3D NoCs suffer from several fabrication and reliability imperfections. Recently, monolithic 3D (M3D) architecture has been proposed as an alternative to TSV-based design. M3D technology enables high-density integration by sequentially stacking tiers on top of each other using minuscule monolithic inter-tier vias (MIVs). In M3D fabrication, the active layers are fabricated on the same die, and high-temperature annealing can damage the chip. This has necessitated low-temperature annealing techniques for M3D fabrication, leading to inferior transistor performance in the top tier and slower interconnects in the bottom tier. To this end, we developed a process-variation-aware monolithic 3D NoC design technique that places the NoC components optimally to minimize the effect of process-related degradation. Manycore chips also suffer from thermal hotspots caused by power-hungry processors. Voltage/frequency island (VFI)-based power management is a popular strategy for enhancing the energy efficiency of a manycore chip without incurring noticeable performance degradation. The heart of a VFI-based system is changing the voltage/frequency (V/F) pair of each island to match the requirements of a dynamically varying workload. However, negative bias temperature instability (NBTI) increases the threshold voltage of PMOS transistors, leading to timing failures for fixed V/F pairs.
Hence, we propose an online NBTI-aware VFI design that improves chip lifetime and energy efficiency while dynamically tuning V/F pairs. Modern mobile chips are shifting from traditional homogeneous structures to heterogeneous ones to support diverse workloads. In mobile chips, the resource management technique must fulfill two conflicting objectives: energy efficiency and application-specific performance requirements. Moreover, smartphones run numerous previously unseen applications over their lifetime. Hence, we propose a machine learning-based resource management strategy that adapts in the presence of multiple new applications.
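
The interaction between V/F selection and NBTI aging can be sketched numerically. The example assumes the alpha-power law for the highest frequency at which an island meets timing, f_max proportional to (V - Vth)^alpha / V, and models the NBTI threshold shift as a power law in stress time. The V/F steps, constants, and function names are hypothetical. The controller picks the smallest frequency step that covers an island's demand and then the lowest voltage that still meets timing, so the same demand needs a higher voltage once Vth has drifted upward. This is not the dissertation's online NBTI-aware VFI algorithm.

```python
from typing import Optional, Tuple

VOLTAGES = [0.7, 0.8, 0.9, 1.0, 1.1]  # hypothetical supply levels (V)
FREQS = [1.0, 1.5, 2.0, 2.5]          # hypothetical frequency steps (GHz)
VTH0 = 0.30                           # hypothetical fresh PMOS threshold voltage (V)
ALPHA = 1.3                           # assumed alpha-power-law exponent
K = 4.0                               # hypothetical fitting constant (GHz)


def f_max(v: float, vth: float) -> float:
    """Alpha-power-law estimate of the highest frequency that meets timing."""
    return K * (v - vth) ** ALPHA / v if v > vth else 0.0


def delta_vth(stress_hours: float, a: float = 0.015, n: float = 1 / 6) -> float:
    """NBTI threshold-voltage shift as a power law in stress time (hypothetical a, n)."""
    return a * stress_hours ** n


def select_vf(demand_ghz: float, vth: float) -> Optional[Tuple[float, float]]:
    """Pick the lowest-voltage V/F pair for one island that meets timing at this vth.

    Start from the smallest frequency step that covers the demand; if no voltage
    can sustain it (e.g. after NBTI aging), fall back to lower frequency steps.
    """
    target = next((f for f in FREQS if f >= demand_ghz), FREQS[-1])
    for f in sorted((f for f in FREQS if f <= target), reverse=True):
        for v in VOLTAGES:  # lowest voltage first, to minimize power
            if f_max(v, vth) >= f:
                return v, f
    return None


aged_vth = VTH0 + delta_vth(4096.0)  # about 0.36 V after 4096 hours of stress
print(select_vf(1.4, VTH0))          # (0.7, 1.5) on a fresh chip
print(select_vf(1.4, aged_vth))      # (0.8, 1.5): voltage raised to keep timing
```

The fallback to lower frequency steps mirrors the timing-failure problem described above: once aging pushes Vth high enough, a fixed V/F pair is no longer safe, and either the voltage must rise or the frequency must drop.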

Machine Learning-inspired High-performance and Energy-efficient Heterogeneous Manycore Chip Design

Author: Wonje Choi
Publisher:
Total Pages: 134
Release: 2018
Genre:
ISBN:

In this dissertation, we address the problem of designing efficient heterogeneous manycore architectures. First, we propose a hybrid Network-on-Chip architecture consisting of both wireline and wireless links that can seamlessly handle the varied traffic requirements that arise in heterogeneous manycore platforms. Second, we develop a machine learning-based multi-objective optimization (MOO) algorithm that learns an evaluation function and uses it to guide the search toward optimal designs for heterogeneous manycore systems. Finally, we propose an architecture-independent, imitation learning-based methodology for dynamic VFI control in heterogeneous manycore systems to address power and thermal issues.
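
One well-known way to learn an evaluation function that guides a search is to learn, from previous runs, how good a local search will end up when started from a given design, and to use that prediction to choose the next starting point. The sketch below does this with greedy bit-flip descent and a k-nearest-neighbour regressor over Hamming distance. The bit-vector design encoding (for example, which candidate links are present), the random quadratic cost standing in for an expensive simulator, and all names are hypothetical, and the objective is a single scalar rather than the several objectives a MOO algorithm handles; it is not the dissertation's algorithm.

```python
import random
from typing import Callable, List, Tuple

Design = Tuple[int, ...]
CostFn = Callable[[Design], float]


def make_toy_cost(n: int, seed: int = 0) -> CostFn:
    """Hypothetical stand-in for an expensive evaluation (a random quadratic cost)."""
    rng = random.Random(seed)
    w = [rng.uniform(-1, 1) for _ in range(n)]
    q = [[rng.uniform(-1, 1) for _ in range(n)] for _ in range(n)]

    def cost(x: Design) -> float:
        linear = sum(wi * xi for wi, xi in zip(w, x))
        pairwise = sum(q[i][j] * x[i] * x[j] for i in range(n) for j in range(i + 1, n))
        return linear + pairwise

    return cost


def local_search(start: Design, cost: CostFn) -> Tuple[Design, float]:
    """Greedy single-bit-flip descent until no flip lowers the cost."""
    best, best_cost = start, cost(start)
    improved = True
    while improved:
        improved = False
        for i in range(len(best)):
            neighbour = best[:i] + (1 - best[i],) + best[i + 1:]
            c = cost(neighbour)
            if c < best_cost:
                best, best_cost, improved = neighbour, c, True
    return best, best_cost


def predict(history: List[Tuple[Design, float]], x: Design, k: int = 3) -> float:
    """Learned evaluation function: k-NN (Hamming distance) estimate of the
    cost that local search will reach when started from x."""
    ranked = sorted(history, key=lambda item: sum(a != b for a, b in zip(item[0], x)))
    nearest = ranked[:k]
    return sum(c for _, c in nearest) / len(nearest)


def learned_search(cost: CostFn, n: int, iters: int = 10, pool_size: int = 20,
                   seed: int = 1) -> Tuple[Design, float]:
    rng = random.Random(seed)
    history: List[Tuple[Design, float]] = []  # (start design, search outcome) pairs
    best: Tuple[Design, float] = ((), float("inf"))
    for _ in range(iters):
        pool = [tuple(rng.randint(0, 1) for _ in range(n)) for _ in range(pool_size)]
        # Use the learned function to pick a promising start (arbitrary on the first pass).
        start = min(pool, key=lambda s: predict(history, s)) if history else pool[0]
        local_opt, outcome = local_search(start, cost)
        history.append((start, outcome))  # new training example for the learner
        if outcome < best[1]:
            best = (local_opt, outcome)
    return best


cost = make_toy_cost(12)
design, value = learned_search(cost, 12)
print(design, round(value, 3))
```

The learner only decides where to start; every candidate it suggests is still checked with the real (here, toy) cost, so a poor prediction costs time rather than correctness.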

Exploring Power-Thermal-Performance Trade-Offs in 3D Network on Chip-Enabled Many-Core Systems

Author: Dongjin Lee
Publisher:
Total Pages: 132
Release: 2018
Genre: Networks on a chip
ISBN:

A high-performance and energy-efficient Network-on-Chip (NoC) is one of the crucial components of manycore processing platforms. A very promising NoC architecture recently proposed in the literature is the three-dimensional small-world NoC (3D SWNoC). Owing to the short vertical links of 3D integration and the robustness of small-world networks, the 3D SWNoC architecture outperforms other 3D counterparts. However, the performance of a 3D SWNoC is highly dependent on the placement of its links and associated routers. In this dissertation, we propose a sensitivity-based link placement algorithm (SEN) to optimize the performance of 3D SWNoC. The sensitivity of a link in a NoC measures the importance of that link. The SEN algorithm optimizes the performance of 3D SWNoC by calculating the sensitivities of all links in the NoC and repeatedly removing the least important one. We compare the performance of the SEN algorithm with simulated annealing and a machine learning-based optimization algorithm. 3D NoC architectures suffer from high power density and the resultant thermal hotspots, which lead to functionality and reliability concerns over time. The power consumption and thermal profiles of 3D NoCs can be improved by incorporating Voltage-Frequency Island (VFI)-based power management and Reciprocal Design Symmetry (RDS)-based floorplanning. We undertake a detailed design space exploration for 3D NoCs that considers power-thermal-performance trade-offs. We consider a small-world network-enabled 3D NoC in this performance evaluation due to its superior performance and energy efficiency compared to other existing 3D NoCs. For TSV-based systems, high power density and the resultant thermal hotspots remain major concerns from the perspectives of chip functionality and overall reliability. Due to the inherent thermal constraints of a TSV-based 3D system, we are unable to fully exploit the benefits offered by the power management methodology.
In this context, the emergence of monolithic 3D (M3D) integration has opened new possibilities for designing ultra-low-power and high-performance circuits and systems. The smaller dimensions of the inter-layer dielectric and monolithic inter-tier vias offer high-density integration, flexibility in partitioning logic blocks across multiple tiers, and a significant reduction in total wire length. We present a comparative performance evaluation of M3D NoCs with respect to their conventional TSV-based counterparts.
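
The SEN procedure (score every link, then repeatedly remove the least important one) can be sketched as a greedy pruning loop. Here, as an assumption, a link's sensitivity is the increase in average shortest-path hop count caused by removing it, and a link whose removal would disconnect the network is never removed. The dissertation's actual sensitivity measure and the small-world and 3D constraints (link-length distribution, router port limits, vertical links) are not modeled; the function names and the 8-router example are hypothetical.

```python
from collections import deque
from itertools import combinations
from typing import Iterable, Set, Tuple

Link = Tuple[int, int]


def avg_hops(n: int, links: Iterable[Link]) -> float:
    """Mean shortest-path hop count over all ordered router pairs (inf if disconnected)."""
    adj = {v: set() for v in range(n)}
    for a, b in links:
        adj[a].add(b)
        adj[b].add(a)
    total = 0
    for src in range(n):
        # Breadth-first search gives hop distances from src.
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        if len(dist) < n:
            return float("inf")
        total += sum(dist.values())
    return total / (n * (n - 1))


def sen_prune(n: int, links: Iterable[Link], budget: int) -> Set[Link]:
    """Repeatedly remove the least sensitive link until `budget` links remain.

    Assumes the starting network is connected and n >= 2.
    """
    remaining = set(links)
    while len(remaining) > budget:
        base = avg_hops(n, remaining)
        sensitivity = {e: avg_hops(n, remaining - {e}) - base for e in remaining}
        least = min(sorted(sensitivity), key=lambda e: sensitivity[e])  # deterministic ties
        if sensitivity[least] == float("inf"):
            break  # every remaining link is a bridge: removing any would disconnect the NoC
        remaining.remove(least)
    return remaining


# Hypothetical example: start from all 28 links among 8 routers and keep 12.
candidate_links = list(combinations(range(8), 2))
kept = sen_prune(8, candidate_links, budget=12)
print(len(kept), round(avg_hops(8, kept), 3))
```

Recomputing every sensitivity after each removal is what makes the procedure greedy but adaptive: a link that looked redundant early on can become critical once its neighbours have been pruned.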

Embedded Machine Learning for Cyber-Physical, IoT, and Edge Computing

Author: Sudeep Pasricha
Publisher: Springer Nature
Total Pages: 481
Release: 2023-10-09
Genre: Technology & Engineering
ISBN: 3031399323

This book presents recent advances towards the goal of enabling efficient implementation of machine learning models on resource-constrained systems, covering different application domains. The focus is on presenting interesting new use cases of applying machine learning to innovative application domains, exploring the design of efficient machine learning hardware accelerators and memory optimization techniques, illustrating model compression and neural architecture search techniques for energy-efficient and fast execution on resource-constrained hardware platforms, and understanding hardware-software co-design techniques for achieving even greater energy, reliability, and performance benefits. Discusses efficient implementation of machine learning in embedded, CPS, IoT, and edge computing; offers comprehensive coverage of hardware design, software design, and hardware/software co-design and co-optimization; describes real applications to demonstrate how embedded, CPS, IoT, and edge applications benefit from machine learning.

AI for Computer Architecture

Author: Lizhong Chen
Publisher: Springer Nature
Total Pages: 124
Release: 2022-05-31
Genre: Technology & Engineering
ISBN: 3031017706

Artificial intelligence has already enabled pivotal advances in diverse fields, yet its impact on computer architecture has only just begun. In particular, recent work has explored broader application to the design, optimization, and simulation of computer architecture. Notably, machine-learning-based strategies often surpass prior state-of-the-art analytical, heuristic, and human-expert approaches. This book reviews the application of machine learning in system-wide simulation and run-time optimization, and in many individual components such as caches/memories, branch predictors, networks-on-chip, and GPUs. The book further analyzes current practice to highlight useful design strategies and identify areas for future work, based on optimized implementation strategies, opportune extensions to existing work, and ambitious long term possibilities. Taken together, these strategies and techniques present a promising future for increasingly automated computer architecture designs.