Dynamic Scheduling of Stream Programs on Embedded Multi-core Processors

Author: Haeseung Lee
Publisher:
Total Pages: 31
Release: 2013
Genre: Computer architecture
ISBN:

Stream computing has emerged as an important model of computation for embedded system applications, particularly in the multimedia and network processing domains. In the recent past, several programming languages and embedded multi-core processors have been proposed for streaming applications. This thesis examines the execution and dynamic scheduling of stream programs on embedded multi-core processors. The thesis addresses the problem in the context of a multi-tasking environment with a time-varying allocation of processing elements for a particular streaming application. As a solution, the thesis proposes a two-step approach. In the first step, the stream program is compiled to gather key application information and to generate re-targetable code. A lightweight dynamic scheduler forms the second step. The dynamic scheduler uses the static information and the available resources to partition the application across the multi-core architecture. The objective of the dynamic scheduler is to maximize the throughput of the application while respecting the resource constraints (processing elements, scratch-pad memory, DMA bandwidth) imposed by the target architecture. We evaluate the proposed approach by compiling and scheduling benchmark stream programs on a representative embedded multi-core processor. We present experimental results that evaluate the quality of the solutions generated by the proposed approach through comparisons with existing techniques.
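To make the two-step idea concrete, here is a minimal sketch of what a run-time partitioning step could look like. It is not the thesis's scheduler: the greedy longest-work-first heuristic, the `Actor` and `PE` types, and the per-actor `work` and `spm_bytes` fields are all assumptions chosen for illustration, and DMA bandwidth is ignored.

```python
from dataclasses import dataclass, field

@dataclass
class Actor:
    name: str
    work: int       # assumed compile-time estimate: cycles per steady-state iteration
    spm_bytes: int  # assumed compile-time estimate: scratch-pad footprint

@dataclass
class PE:
    spm_capacity: int
    load: int = 0
    spm_used: int = 0
    actors: list = field(default_factory=list)

def partition(actors, num_pes, spm_capacity):
    """Greedy sketch: place the heaviest actors first on the least-loaded PE
    that still has scratch-pad room. Hypothetical, not the thesis's algorithm."""
    pes = [PE(spm_capacity) for _ in range(num_pes)]
    for actor in sorted(actors, key=lambda a: a.work, reverse=True):
        candidates = [p for p in pes
                      if p.spm_used + actor.spm_bytes <= p.spm_capacity]
        if not candidates:
            raise ValueError(f"{actor.name} does not fit in any scratch-pad")
        target = min(candidates, key=lambda p: p.load)
        target.actors.append(actor.name)
        target.load += actor.work
        target.spm_used += actor.spm_bytes
    return pes

def bottleneck(pes):
    """Throughput is limited by the most heavily loaded PE."""
    return max(p.load for p in pes)
```

Because the static information is gathered once, a scheduler of this shape can simply be re-run whenever the number of processing elements granted to the application changes. With four actors of work 8, 5, 4 and 3, the sketch gives a bottleneck of 11 on two PEs and 8 on four.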

A Hybrid Static/dynamic Approach to Scheduling Stream Programs

Author: Ceryen C. Tan
Publisher:
Total Pages: 97
Release: 2009
Genre:
ISBN:

Streaming languages such as StreamIt are often used to write stream programs that execute on multicore processors. Stream programs consist of actors that operate on streams of data. To execute on multiple cores, actors are scheduled for parallel execution while satisfying data dependencies between actors. In StreamIt, the compiler analyzes data dependencies between actors at compile time and generates a static schedule that determines where and when actors are executed on the available cores. Statically scheduling actors onto cores results in no scheduling overhead at run time and allows for sophisticated compile-time scheduling optimizations. Unfortunately, static scheduling has a number of severe limitations. The generated static schedule is inflexible and cannot be adapted to run-time conditions, such as cores that are unexpectedly unavailable. Static scheduling may also load-balance cores poorly because of inaccurate static work estimates. This thesis contributes a hybrid static/dynamic scheduling approach that attempts to address the limitations of static scheduling. Dynamic load-balancing is used to adjust the static schedule to run-time conditions and to correct load imbalances that might exist after static scheduling. Dynamic load-balancing is designed to add very little run-time overhead.
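The hybrid idea can be illustrated with a small sketch that starts from a static actor-to-core mapping and adjusts it using measured work. This is not the thesis's algorithm or StreamIt's API: the `rebalance` function, its `threshold` parameter, and the single-migration policy are assumptions, and data dependencies are taken to be enforced elsewhere by the schedule.

```python
def rebalance(mapping, measured, cores, threshold=0.10):
    """Adjust a static actor->core mapping using run-time measurements.

    mapping:  dict actor -> core chosen by the (assumed) static scheduler
    measured: dict actor -> measured cycles per iteration
    cores:    cores currently available
    Hypothetical sketch: at most one actor migrates per call, keeping the
    run-time cost small.
    """
    new_map = dict(mapping)
    loads = {c: 0 for c in cores}
    for actor, core in mapping.items():
        if core in loads:
            loads[core] += measured[actor]

    # Actors whose core has become unavailable go to the least-loaded core.
    for actor, core in mapping.items():
        if core not in loads:
            target = min(loads, key=loads.get)
            new_map[actor] = target
            loads[target] += measured[actor]

    # Correct a load imbalance by moving one actor from the hottest core
    # to the coldest, but only if the move cannot make things worse.
    hot = max(loads, key=loads.get)
    cold = min(loads, key=loads.get)
    gap = loads[hot] - loads[cold]
    if loads[hot] > 0 and gap > threshold * loads[hot]:
        movable = [a for a, c in new_map.items()
                   if c == hot and measured[a] < gap]
        if movable:
            best = min(movable, key=lambda a: abs(gap / 2 - measured[a]))
            new_map[best] = cold
    return new_map
```

Restricting migration to actors whose work is smaller than the gap guarantees that the receiving core ends up below the previous maximum, so a single step never increases the bottleneck.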

Compilation of Stream Programs Onto Embedded Multicore Architectures

Author: Weijia Che
Publisher:
Total Pages: 230
Release: 2012
Genre: Compilers (Computer programs)
ISBN:

In recent years, we have observed the prevalence of stream applications in many embedded domains. Stream programs distinguish themselves from traditional sequential programs through well-defined independent actors, explicit data communication, and stable code/data access patterns. In order to achieve high performance and low power, scratch-pad memory (SPM) has been introduced in today's embedded multicore processors. Current design frameworks for developing stream applications on SPM-enhanced embedded architectures typically do not include a compiler that can perform automatic partitioning, mapping and scheduling under limited on-chip SPM capacities and memory access delays. Consequently, many designs are implemented manually, which leads to lengthy tasks and inferior designs. In this work, optimization techniques that automatically compile stream programs onto embedded multi-core architectures are proposed. As an initial case study, we implemented an automatic target recognition (ATR) algorithm on the IBM Cell Broadband Engine (BE). Then integer linear programming (ILP) and heuristic approaches were proposed to schedule stream programs on a single-core embedded processor that has an SPM with code overlay. Later, ILP and heuristic approaches for Compiling Stream programs on SPM enhanced Multicore Processors (CSMP) were studied. The proposed CSMP ILP and heuristic approaches do not optimize for cycles in stream applications. Further, the number of software pipeline stages in the implementation depends on the actor to processing engine (PE) mapping and cannot be controlled. We next presented a Retiming technique for Throughput optimization on Embedded Multi-core processors (RTEM). The RTEM approach inherently handles cycles and can accept an upper bound on the number of software pipeline stages to be generated. We further enhanced RTEM by incorporating unrolling (URSTEM), which preserves all the beneficial properties of the RTEM heuristic and also scales with the number of PEs.

Dynamic Scheduling in Multicore Processors

Author: Demian Rosas Ham
Publisher:
Total Pages: 155
Release: 2012
Genre:
ISBN:

The advent of multi-core processors, particularly with projections that numbers of cores will continue to increase, has focused attention on parallel programming. It is widely recognized that current programming techniques, including those that are used for scientific parallel programming, will not allow the easy formulation of general purpose applications. An area which is receiving interest is the use of programming styles which do not have side-effects. Previous work on parallel functional programming demonstrated the potential of this to permit the easy exploitation of parallelism. This thesis investigates a dynamic load balancing system for shared memory Chip Multiprocessors. This system is based on a parallel computing model called SLAM (Spreading Load with Active Messages), which makes use of functional language evaluation techniques. A novel hardware/software mechanism for exploiting fine grain parallelism is presented. This mechanism comprises a runtime system which performs dynamic scheduling and synchronization automatically when executing parallel applications. Additionally the interface for using this mechanism is provided in the form of an API. The proposed system is evaluated using cycle-level models and multithreaded applications running in a full system simulation environment.

Dynamic Resource Allocation in Embedded, High-Performance and Cloud Computing

Author: Leandro Soares Indrusiak
Publisher: CRC Press
Total Pages: 177
Release: 2022-09-01
Genre: Computers
ISBN: 1000794385

The availability of many-core computing platforms enables a wide variety of technical solutions for systems across the embedded, high-performance and cloud computing domains. However, large scale manycore systems are notoriously hard to optimise. Choices regarding resource allocation alone can account for wide variability in timeliness and energy dissipation (up to several orders of magnitude). Dynamic Resource Allocation in Embedded, High-Performance and Cloud Computing covers dynamic resource allocation heuristics for manycore systems, aiming to provide appropriate guarantees on performance and energy efficiency. It addresses different types of systems, aiming to harmonise the approaches to dynamic allocation across the complete spectrum between systems with little flexibility and strict real-time guarantees all the way to highly dynamic systems with soft performance requirements. Technical topics presented in the book include:
• Load and Resource Models
• Admission Control
• Feedback-based Allocation and Optimisation
• Search-based Allocation Heuristics
• Distributed Allocation based on Swarm Intelligence
• Value-Based Allocation
Each of the topics is illustrated with examples based on realistic computational platforms such as Network-on-Chip manycore processors, grids and private cloud environments.

Memory Optimizations of Embedded Applications for Energy Efficiency

Author: Jong Soo Park
Publisher: Stanford University
Total Pages: 177
Release: 2011
Genre:
ISBN:

Current embedded processors often do not satisfy the increasingly demanding computation requirements of embedded applications with acceptable energy efficiency, whereas application-specific integrated circuits require excessive design costs. In the Stanford Elm project, it was identified that instruction and data delivery, not computation, dominate the energy consumption of embedded processors. Consequently, the energy efficiency of delivering instructions and data must be sufficiently improved to close the efficiency gap between application-specific integrated circuits and programmable embedded processors. This dissertation demonstrates that the compiler and run-time system can play a crucial role in improving the energy efficiency of delivering instructions and data. Regarding instruction delivery, I present a compiler algorithm that manages L0 instruction scratch-pad memories that reside between processor cores and L1 caches. Despite the lack of tags, the scratch-pad memories with our algorithm can achieve lower miss rates than caches with the same capacities, saving significant instruction delivery energy. Regarding data delivery, I present methods that minimize memory-space requirements for parallelizing stream applications, which are commonly found in the embedded domain. When stream applications are parallelized by pipelining, large enough buffers are required between pipeline stages to sustain the throughput (e.g., double buffering). For static stream applications, where the production and consumption rates of stages are close to compile-time constants, a compiler analysis is presented that computes the minimum buffer capacity that maximizes the throughput. Based on this analysis, a new static stream scheduling algorithm is developed, which yields considerable speed-up and data delivery energy saving compared to a previous algorithm. For dynamic stream applications, I present a dynamically-sized array-based queue design that achieves speed-up and data delivery energy saving compared to a linked-list based queue design.
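As an illustration of the last point, the following is a generic sketch of a dynamically-sized array-based queue: a circular buffer that doubles its capacity when full. It is not the dissertation's design; the `ArrayQueue` class, its method names, and the doubling policy are assumptions. The contrast with a linked list is that elements sit in one contiguous array, so no per-element node has to be allocated or followed through a pointer.

```python
class ArrayQueue:
    """FIFO queue on a circular array that doubles when full (hypothetical sketch)."""

    def __init__(self, capacity=4):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self._buf = [None] * capacity
        self._head = 0   # index of the oldest element
        self._size = 0   # number of stored elements

    def __len__(self):
        return self._size

    def push(self, item):
        if self._size == len(self._buf):
            self._grow()
        tail = (self._head + self._size) % len(self._buf)
        self._buf[tail] = item
        self._size += 1

    def pop(self):
        if self._size == 0:
            raise IndexError("pop from empty queue")
        item = self._buf[self._head]
        self._buf[self._head] = None  # drop the reference
        self._head = (self._head + 1) % len(self._buf)
        self._size -= 1
        return item

    def _grow(self):
        # Copy elements in FIFO order into a buffer twice as large,
        # which also unwraps the circular layout.
        old, n = self._buf, len(self._buf)
        self._buf = [old[(self._head + i) % n] for i in range(self._size)]
        self._buf += [None] * n
        self._head = 0
```

Doubling keeps the amortized cost of `push` constant, while the initial capacity bounds the memory used by a stage whose production rate turns out to be low.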

Advances in Computing Applications

Author: Amlan Chakrabarti
Publisher: Springer
Total Pages: 290
Release: 2017-01-19
Genre: Computers
ISBN: 9811026300

This edited volume presents the latest high-quality technical contributions and research results in the areas of computing, informatics, and information management. The book deals with state-of-the-art topics, discusses challenges and possible solutions, and explores future research directions. The main goal of this volume is not only to summarize new research findings but also to place them in the context of past work. This volume is designed for a professional audience composed of researchers, practitioners, scientists and engineers in both academia and industry.

Multicore Systems On-Chip: Practical Software/Hardware Design

Author: Abderazek Ben Abdallah
Publisher: Springer Science & Business Media
Total Pages: 291
Release: 2013-07-20
Genre: Computers
ISBN: 9491216929

System-on-chip designs have evolved from fairly simple unicore, single-memory designs to complex heterogeneous multicore SoC architectures consisting of a large number of IP blocks on the same silicon. To meet the high computational demands posed by the latest consumer electronic devices, most current systems are based on this paradigm, which represents a real revolution in many aspects of computing. The attraction of multicore processing for power reduction is compelling. By splitting a set of tasks among multiple processor cores, the operating frequency necessary for each core can be reduced, which in turn allows the voltage on each core to be reduced. Because dynamic power is proportional to the frequency and to the square of the voltage, the gain is large, even though more cores may be running. As more and more cores are integrated into these designs to share the ever increasing processing load, the main challenges lie in efficient memory hierarchy, scalable system interconnect, new programming paradigms, and efficient integration methodology for connecting such heterogeneous cores into a single system capable of leveraging their individual flexibility. Current design methods tend toward mixed HW/SW co-designs targeting multicore systems on-chip for specific applications. To decide on the lowest-cost mix of cores, designers must iteratively map the device’s functionality to a particular HW/SW partition and target architecture. In addition, to connect the heterogeneous cores, the architecture requires high-performance communication architectures and efficient communication protocols, such as hierarchical bus, point-to-point connection, or Network-on-Chip. Software development also becomes far more complex because of the difficulty of breaking a single processing task into multiple parts that can be processed separately and then reassembled later. This reflects the fact that certain processor jobs cannot be easily parallelized to run concurrently on multiple processing cores and that load balancing between processing cores, especially heterogeneous cores, is very difficult.
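The power argument in this description can be made concrete with a short worked example. The symbols α (switching activity) and C (switched capacitance) are the usual proportionality factors and are not named in the description. The assumptions that the workload splits perfectly across two cores and that halving the frequency lets the voltage drop to 0.8 V are purely illustrative.

```latex
P_{\text{dyn}} = \alpha\, C\, V^{2} f
\qquad\Longrightarrow\qquad
\frac{P_{\text{2 cores}}}{P_{\text{1 core}}}
= \frac{2 \cdot \alpha C\,(0.8V)^{2}\,(f/2)}{\alpha C\, V^{2} f}
= 0.64
```

Under these assumptions, two cores at half the frequency deliver the same aggregate throughput for about 64% of the dynamic power. The saving comes entirely from the voltage reduction, since halving the frequency on twice as many cores cancels out.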

Euro-Par 2019: Parallel Processing Workshops

Author: Ulrich Schwardmann
Publisher: Springer Nature
Total Pages: 765
Release: 2020-05-29
Genre: Computers
ISBN: 3030483401

This book constitutes revised selected papers from the workshops held at the 25th International Conference on Parallel and Distributed Computing, Euro-Par 2019, which took place in Göttingen, Germany, in August 2019. The 53 full papers and 10 poster papers presented in this volume were carefully reviewed and selected from 77 submissions. Euro-Par is an annual, international conference in Europe, covering all aspects of parallel and distributed processing. These range from theory to practice, from small to the largest parallel and distributed systems and infrastructures, from fundamental computational problems to full-fledged applications, from architecture, compiler, language and interface design and implementation to tools, support infrastructures, and application performance aspects. Chapter "In Situ Visualization of Performance-Related Data in Parallel CFD Applications" is available open access under a Creative Commons Attribution 4.0 International License via link.springer.com.

Multi-Core Embedded Systems

Author: Georgios Kornaros
Publisher: CRC Press
Total Pages: 421
Release: 2018-10-08
Genre: Computers
ISBN: 1351834088

Details a real-world product that applies a cutting-edge multi-core architecture

Increasingly demanding modern applications, such as those used in telecommunications networking and real-time processing of audio, video, and multimedia streams, require multiple processors to achieve computational performance at the rate of a few giga-operations per second. This necessity for speed and manageable power consumption makes it likely that the next generation of embedded processing systems will include hundreds of cores, while being increasingly programmable, blending processors and configurable hardware in a power-efficient manner. Multi-Core Embedded Systems presents a variety of perspectives that elucidate the technical challenges associated with such increased integration of homogeneous (processors) and heterogeneous multiple cores. It offers an analysis that industry engineers and professionals will need to understand the physical details of both software and hardware in embedded architectures, as well as their limitations and potential for future growth.

Discusses the available programming models spread across different abstraction levels

The book begins with an overview of the evolution of multiprocessor architectures for embedded applications and discusses techniques for autonomous power management of system-level parameters. It addresses the use of existing open-source (and free) tools originating from several application domains, such as traffic modeling, graph theory, parallel computing and network simulation. In addition, the authors cover other important topics associated with multi-core embedded systems, such as:
• Architectures and interconnects
• Embedded design methodologies
• Mapping of applications