Compilation of Stream Programs Onto Embedded Multicore Architectures

Author: Weijia Che
Publisher:
Total Pages: 230
Release: 2012
Genre: Compilers (Computer programs)
ISBN:

In recent years, stream applications have become prevalent in many embedded domains. Stream programs distinguish themselves from traditional sequential programs through well-defined independent actors, explicit data communication, and stable code/data access patterns. To achieve high performance and low power, scratch pad memory (SPM) has been introduced in today's embedded multicore processors. Current design frameworks for developing stream applications on SPM-enhanced embedded architectures typically do not include a compiler that can perform automatic partitioning, mapping and scheduling under limited on-chip SPM capacities and memory access delays. Consequently, many designs are implemented manually, which leads to lengthy design tasks and inferior designs. In this work, optimization techniques that automatically compile stream programs onto embedded multicore architectures are proposed. As an initial case study, we implemented an automatic target recognition (ATR) algorithm on the IBM Cell Broadband Engine (BE). Then integer linear programming (ILP) and heuristic approaches were proposed to schedule stream programs on a single-core embedded processor that has an SPM with code overlay. Later, ILP and heuristic approaches for Compiling Stream programs on SPM enhanced Multicore Processors (CSMP) were studied. The proposed CSMP ILP and heuristic approaches do not optimize for cycles in stream applications. Further, the number of software pipeline stages in the implementation depends on the actor to processing engine (PE) mapping and cannot be controlled. We next presented a Retiming technique for Throughput optimization on Embedded Multi-core processors (RTEM). The RTEM approach inherently handles cycles and can accept an upper bound on the number of software pipeline stages to be generated. We further enhanced RTEM by incorporating unrolling (URSTEM), which preserves all the beneficial properties of the RTEM heuristic and also scales with the number of PEs.

Smart Multicore Embedded Systems

Author: Massimo Torquati
Publisher: Springer Science & Business Media
Total Pages: 194
Release: 2013-11-09
Genre: Technology & Engineering
ISBN: 1461488001

This book provides a single-source reference to the state-of-the-art of high-level programming models and compilation tool-chains for embedded system platforms. The authors address challenges faced by programmers developing software to implement parallel applications in embedded systems, where very often they are forced to rewrite sequential programs into parallel software, taking into account all the low level features and peculiarities of the underlying platforms. Readers will benefit from these authors’ approach, which takes into account both the application requirements and the platform specificities of various embedded systems from different industries. Parallel programming tool-chains are described that take as input parameters both the application and the platform model, then determine relevant transformations and mapping decisions on the concrete platform, minimizing user intervention and hiding the difficulties related to the correct and efficient use of memory hierarchy and low level code generation.

Pipelined Multiprocessor System-on-Chip for Multimedia

Author: Haris Javaid
Publisher: Springer Science & Business Media
Total Pages: 174
Release: 2013-11-26
Genre: Technology & Engineering
ISBN: 3319011138

This book describes analytical models and estimation methods to enhance performance estimation of pipelined multiprocessor systems-on-chip (MPSoCs). A framework is introduced for both design-time and run-time optimizations. For design space exploration, several algorithms are presented to minimize the area footprint of a pipelined MPSoC under a latency or a throughput constraint. A novel adaptive pipelined MPSoC architecture is described, where idle processors are transitioned into low-power states at run-time to reduce energy consumption. Multi-mode pipelined MPSoCs are introduced, where multiple pipelined MPSoCs optimized separately are merged into a single pipelined MPSoC, enabling further reduction of the area footprint by sharing the processors and communication buffers. Readers will benefit from the authors’ combined use of analytical models, estimation methods and exploration algorithms and will be enabled to explore billions of design points in a few minutes.

Software Development for Embedded Multi-core Systems

Author: Max Domeika
Publisher: Newnes
Total Pages: 435
Release: 2011-04-08
Genre: Technology & Engineering
ISBN: 0080558585

The multicore revolution has reached the deployment stage in embedded systems ranging from small ultramobile devices to large telecommunication servers. The transition from single to multicore processors, motivated by the need to increase performance while conserving power, has placed great responsibility on the shoulders of software engineers. In this new embedded multicore era, the toughest task is the development of code to support more sophisticated systems. This book provides embedded engineers with solid grounding in the skills required to develop software targeting multicore processors. Within the text, the author undertakes an in-depth exploration of performance analysis, and a close-up look at the tools of the trade. Both general multicore design principles and processor-specific optimization techniques are revealed. Detailed coverage of critical issues for multicore employment within embedded systems is provided, including the Threading Development Cycle, with discussions of analysis, design, development, debugging, and performance tuning of threaded applications. Software development techniques engendering optimal mobility and energy efficiency are highlighted through multiple case studies, which provide practical “how-to” advice on implementing the latest multicore processors. Finally, future trends are discussed, including terascale, speculative multithreading, transactional memory, interconnects, and the software-specific implications of these looming architectural developments.
* This is the only book to explain software optimization for embedded multi-core systems
* Helpful tips, tricks and design secrets from an Intel programming expert, with detailed examples using the popular X86 architecture
* Covers hot topics, including ultramobile devices, low-power designs, Pthreads vs. OpenMP, and heterogeneous cores

Transactions on High-Performance Embedded Architectures and Compilers III

Author: Per Stenström
Publisher: Springer
Total Pages: 309
Release: 2011-02-23
Genre: Computers
ISBN: 3642194486

Transactions on HiPEAC aims at the timely dissemination of research contributions in computer architecture and compilation methods for high-performance embedded computer systems. Recognizing the convergence of embedded and general-purpose computer systems, this journal publishes original research on systems targeted at specific computing tasks as well as systems with broad application bases. The scope of the journal therefore covers all aspects of computer architecture, code generation and compiler optimization methods of interest to researchers and practitioners designing future embedded systems. This third issue contains 14 papers carefully reviewed and selected out of numerous submissions and is divided into four sections. The first section contains the top four papers from the Third International Conference on High-Performance Embedded Architectures and Compilers, HiPEAC 2008, held in Göteborg, Sweden, in January 2008. The second section consists of four papers from the 8th MEDEA Workshop held in conjunction with PACT 2007 in Brasov, Romania, in September 2007. The third section contains two regular papers and the fourth section provides a snapshot from the First Workshop on Programmability Issues for Multicore Computers, MULTIPROG, held in conjunction with HiPEAC 2008.

Programming Multicore and Many-core Computing Systems

Author: Sabri Pllana
Publisher: John Wiley & Sons
Total Pages: 511
Release: 2017-02-06
Genre: Computers
ISBN: 0470936908

Programming multi-core and many-core computing systems. Sabri Pllana, Linnaeus University, Sweden; Fatos Xhafa, Technical University of Catalonia, Spain. Provides state-of-the-art methods for programming multi-core and many-core systems. The book comprises a selection of twenty-two chapters covering: fundamental techniques and algorithms; programming approaches; methodologies and frameworks; scheduling and management; testing and evaluation methodologies; and case studies for programming multi-core and many-core systems. Program development for multi-core processors, especially for heterogeneous multi-core processors, is significantly more complex than for single-core processors. However, programmers have traditionally been trained to develop sequential programs, and only a small percentage of them have experience with parallel programming. In the past, only a relatively small group of programmers interested in High Performance Computing (HPC) was concerned with parallel programming issues, but the situation has changed dramatically with the appearance of multi-core processors in commonly used computing systems. It is expected that with the pervasiveness of multi-core processors, parallel programming will become mainstream. The pervasiveness of multi-core processors affects a large spectrum of systems, from embedded and general-purpose to high-end computing systems. This book assists programmers in mastering the efficient programming of multi-core systems, which is of paramount importance for the software-intensive industry in moving towards a more effective product-development cycle. Key features: Lessons, challenges, and roadmaps ahead. Contains real-world examples and case studies. Helps programmers in mastering the efficient programming of multi-core and many-core systems. The book serves as a reference for a larger audience of practitioners, young researchers and graduate-level students. A basic level of programming knowledge is required to use this book.

Compiler Techniques for Scalable Performance of Stream Programs on Multicore Architectures

Author: Michael Ian Gordon
Publisher:
Total Pages: 223
Release: 2010
Genre:
ISBN:

Given the ubiquity of multicore processors, there is an acute need to enable the development of scalable parallel applications without unduly burdening programmers. Currently, programmers are asked not only to explicitly expose parallelism but also to concern themselves with issues of granularity, load-balancing, synchronization, and communication. This thesis demonstrates that when algorithmic parallelism is expressed in the form of a stream program, a compiler can effectively and automatically manage the parallelism. Our compiler assumes responsibility for low-level architectural details, transforming implicit algorithmic parallelism into a mapping that achieves scalable parallel performance for a given multicore target. Stream programming is characterized by regular processing of sequences of data, and it is a natural expression of algorithms in the areas of audio, video, digital signal processing, networking, and encryption. Streaming computation is represented as a graph of independent computation nodes that communicate explicitly over data channels. Our techniques operate on contiguous regions of the stream graph where the input and output rates of the nodes are statically determinable. Within a static region, the compiler first automatically adjusts the granularity and then exploits data, task, and pipeline parallelism in a holistic fashion. We introduce techniques that data-parallelize nodes that operate on overlapping sliding windows of their input, translating serializing state into minimal and parametrized inter-core communication. Finally, for nodes that cannot be data-parallelized due to state, we are the first to automatically apply software-pipelining techniques at a coarse granularity to exploit pipeline parallelism between stateful nodes. Our framework is evaluated in the context of the StreamIt programming language. StreamIt is a high-level stream programming language that has been shown to improve programmer productivity in implementing streaming algorithms. We employ the StreamIt Core benchmark suite of 12 real-world applications to demonstrate the effectiveness of our techniques for varying multicore architectures. For a 16-core distributed memory multicore, we achieve a 14.9x mean speedup. For benchmarks that include sliding-window computation, our sliding-window data-parallelization techniques are required to enable scalable performance for a 16-core SMP multicore (14x mean speedup) and a 64-core distributed shared memory multicore (52x mean speedup).
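To make the sliding-window idea in this abstract concrete, here is a minimal sketch of why such nodes need extra care when split across cores: each worker needs its own slice of the input plus `window - 1` overlapping items from its neighbour. This is an illustration only, not the thesis's algorithm or StreamIt code; the function names `sliding_sums`, `split_with_overlap` and `parallel_sliding_sums` are hypothetical, and the workers are simulated sequentially.

```python
def sliding_sums(data, window):
    """Sequential reference: one output per window position."""
    return [sum(data[i:i + window]) for i in range(len(data) - window + 1)]


def split_with_overlap(data, window, workers):
    """Give each worker a contiguous share of the outputs, plus the
    window - 1 extra input items it needs from the next share."""
    outputs = len(data) - window + 1
    per_worker = -(-outputs // workers)  # ceiling division
    chunks = []
    for w in range(workers):
        first = w * per_worker
        last = min(first + per_worker, outputs)
        if first >= last:
            break  # fewer outputs than workers
        chunks.append(data[first:last + window - 1])
    return chunks


def parallel_sliding_sums(data, window, workers):
    """Simulated data-parallel run: each chunk is processed independently
    and the per-chunk results are concatenated in order."""
    result = []
    for chunk in split_with_overlap(data, window, workers):
        result.extend(sliding_sums(chunk, window))
    return result
```

The duplicated `window - 1` items per chunk stand in for the inter-core communication the abstract refers to; how a real compiler minimizes that traffic is beyond this sketch.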

Scheduling and Optimizing Stream Programs on Multicore Machines by Exploiting High-Level Abstractions

Author: Dai Nguyen Bui
Publisher:
Total Pages: 144
Release: 2013
Genre:
ISBN:

Real-time streaming of HD movies and TV via YouTube, Netflix, Apple TV and Xbox Live is gaining popularity. Stream programs often consume considerable amounts of energy due to their compute-intensive nature. Making stream programs energy-efficient is important, especially for energy-constrained computing devices such as mobile phones and tablets. The first part of this thesis focuses on exploiting the popular Synchronous Dataflow (SDF) high-level abstraction of stream programs to design adaptive stream programs for energy reduction on multicore machines. Observing that IO rates of stream programs can vary at runtime, we seek to make stream programs adaptive by transforming their internal structures so that the computing resources they occupy, e.g., cores and memory, adapt to workload changes at runtime. Our experiments show that adapting stream programs to IO rate changes can lead to significant energy reduction. In addition, we show that the modularity and static attributes of the stream program abstraction not only help map stream programs onto multicore machines more easily but also enable energy-efficient routing schemes for high-bandwidth stream traffic on the interconnection fabric, such as networks-on-chip. While SDF abstractions can help optimize stream programs on multicore machines, SDF is more suitable for describing data-intensive stream computations such as FFT, DCT, and FIR. Modern stream operations such as MPEG2 or MP3 encoders/decoders are often more sophisticated and composed of multiple such computations. Enabling operation synchronization between such computations with different semantics leads to the need for control messaging. We extend previous work on control messaging and give a formal definition for control message latency via the semantics of information wavefronts. This control-operation-integrated SDF (COSDF) is able to model sophisticated stream programs more precisely. However, the conventional scheduling method developed for SDF is not sufficient to schedule COSDF applications. To schedule COSDF applications, we develop a scheduling method using dependency graphs and applying periodic graph theory, based on reduced dependency graphs (RDG). This RDG scheduling method also helps extract parallelism of stream programs. The more precise abstraction of COSDF is expected to help synthesize and generate sophisticated stream programs more efficiently. Although the SDF modularity property also improves programmability, it can come at the price of efficiency when SDF models are not compiled and run using model-based design environments. However, compiling large SDF models to mitigate the inefficiency can be prohibitive in situations where even a small change in a model may lead to large recompilation overhead. We tackle the problem by proposing a method for incrementally compiling large SDF models that faithfully captures the executions of the original SDF models to avoid potential artificial deadlocks of a naive compilation method.
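For readers unfamiliar with SDF, the static token rates are what make compile-time scheduling possible: for every channel, the tokens produced per period must equal the tokens consumed. The following is a minimal sketch of the standard balance-equation calculation of per-actor firing counts (the repetition vector). It is textbook SDF rather than the thesis's COSDF or RDG method; the function name `repetition_vector` and the edge tuple layout are assumptions for illustration, and the graph is assumed to be connected.

```python
from collections import defaultdict, deque
from fractions import Fraction
from functools import reduce
from math import gcd


def repetition_vector(edges):
    """edges: list of (src, dst, produced_per_firing, consumed_per_firing).
    Returns the smallest positive firing counts r such that, on every edge,
    r[src] * produced == r[dst] * consumed."""
    adj = defaultdict(list)
    for src, dst, prod, cons in edges:
        adj[src].append((dst, Fraction(prod, cons)))  # r[dst] = r[src] * prod / cons
        adj[dst].append((src, Fraction(cons, prod)))  # and the inverse direction

    start = edges[0][0]
    rate = {start: Fraction(1)}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt, ratio in adj[node]:
            candidate = rate[node] * ratio
            if nxt not in rate:
                rate[nxt] = candidate
                queue.append(nxt)
            elif rate[nxt] != candidate:
                raise ValueError("inconsistent rates: no periodic schedule exists")

    # Scale the fractional rates to the smallest integer solution.
    scale = reduce(lambda a, b: a * b // gcd(a, b),
                   (r.denominator for r in rate.values()), 1)
    counts = {n: int(r * scale) for n, r in rate.items()}
    common = reduce(gcd, counts.values())
    return {n: c // common for n, c in counts.items()}
```

For example, with `A` producing 2 tokens consumed 3 at a time by `B`, and `B` producing 1 token consumed 2 at a time by `C`, the call `repetition_vector([("A", "B", 2, 3), ("B", "C", 1, 2)])` yields firing counts of 3, 2 and 1 for `A`, `B` and `C`.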

Multi-Processor System-on-Chip 2

Author:
Publisher: John Wiley & Sons
Total Pages: 272
Release: 2021-03-31
Genre: Computers
ISBN: 1119818389

A Multi-Processor System-on-Chip (MPSoC) is the key component for complex applications. These applications put huge pressure on memory, communication devices and computing units. This book, presented in two volumes – Architectures and Applications – therefore celebrates the 20th anniversary of MPSoC, an interdisciplinary forum that focuses on multi-core and multi-processor hardware and software systems. It is this interdisciplinarity which has led to MPSoC bringing together experts in these fields from around the world, over the last two decades. Multi-Processor System-on-Chip 2 covers application-specific MPSoC design, including compilers and architecture exploration. This second volume describes optimization methods and tools for optimizing and porting specific applications to MPSoC architectures. Details on compilation, power consumption and wireless communication are also presented, as well as examples of modeling frameworks and CAD tools. Explanations of specific platforms for automotive and real-time computing are also included.

Software Development for Embedded Multi-core Systems

Author: Max Domeika
Publisher:
Total Pages: 440
Release: 2011
Genre: Computer software
ISBN:

The multicore revolution has reached the deployment stage in embedded systems ranging from small ultramobile devices to large telecommunication servers. The transition from single to multicore processors, motivated by the need to increase performance while conserving power, has placed great responsibility on the shoulders of software engineers. In this new embedded multicore era, the toughest task is the development of code to support more sophisticated systems. This book provides embedded engineers with solid grounding in the skills required to develop software targeting multicore processors. Within the text, the author undertakes an in-depth exploration of performance analysis, and a close-up look at the tools of the trade. Both general multicore design principles and processor-specific optimization techniques are revealed. Detailed coverage of critical issues for multicore employment within embedded systems is provided, including the Threading Development Cycle, with discussions of analysis, design, development, debugging, and performance tuning of threaded applications. Software development techniques engendering optimal mobility and energy efficiency are highlighted through multiple case studies, which provide practical 'how-to' advice on implementing the latest multicore processors. Finally, future trends are discussed, including terascale, speculative multithreading, transactional memory, interconnects, and the software-specific implications of these looming architectural developments. 
Table of Contents
Chapter 1 - Introduction
Chapter 2 - Basic System and Processor Architecture
Chapter 3 - Multi-core Processors & Embedded
Chapter 4 - Moving To Multi-core Intel Architecture
Chapter 5 - Scalar Optimization & Usability
Chapter 6 - Parallel Optimization Using Threads
Chapter 7 - Case Study: Data Decomposition
Chapter 8 - Case Study: Functional Decomposition
Chapter 9 - Virtualization & Partitioning
Chapter 10 - Getting Ready For Low Power Intel Architecture
Chapter 11 - Summary, Trends, and Conclusions
Appendix I Glossary References
* This is the only book to explain software optimization for embedded multi-core systems
* Helpful tips, tricks and design secrets from an Intel programming expert, with detailed examples using the popular X86 architecture
* Covers hot topics, including ultramobile devices, low-power designs, Pthreads vs. OpenMP, and heterogeneous cores.