Fault-Tolerance Techniques for High-Performance Computing
Author: Thomas Herault
Publisher: Springer
Total Pages: 325
Release: 2015-07-01
Genre: Computers
ISBN: 3319209434

This timely text presents a comprehensive overview of fault tolerance techniques for high-performance computing (HPC). The text opens with a detailed introduction to the concepts of checkpoint protocols and scheduling algorithms, prediction, replication, silent error detection and correction, together with some application-specific techniques such as ABFT. Emphasis is placed on analytical performance models. This is then followed by a review of general-purpose techniques, including several checkpoint and rollback recovery protocols. Relevant execution scenarios are also evaluated and compared through quantitative models. Features: provides a survey of resilience methods and performance models; examines the various sources for errors and faults in large-scale systems; reviews the spectrum of techniques that can be applied to design a fault-tolerant MPI; investigates different approaches to replication; discusses the challenge of energy consumption of fault-tolerance methods in extreme-scale systems.

Data Driven e-Science
Author: Simon C. Lin
Publisher: Springer Science & Business Media
Total Pages: 526
Release: 2011-02-04
Genre: Computers
ISBN: 1441980148

ISGC 2010, the International Symposium on Grid Computing, was held at Academia Sinica, Taipei, Taiwan, in March 2010. The symposium brought together prestigious scientists and engineers from around the world to exchange ideas, present challenges and solutions, and discuss new topics in the field of Grid Computing. Data Driven e-Science: Use Cases and Successful Applications of Distributed Computing Infrastructures (ISGC 2010), an edited volume, introduces the latest achievements in grid technology for Biomedicine and Life Sciences, Middleware, Security, Networking, Digital Library, Cloud Computing and more. This book provides Grid developers and end users with invaluable information for developing grid technology and applications. The last section of this book presents future development in the field of Grid Computing. This book is designed for a professional audience composed of grid users, developers and researchers working in the field of grid computing. Advanced-level students focused on computer science and engineering will also find this book valuable as a reference or secondary textbook.

The Datacenter as a Computer
Author: Luiz Barroso
Publisher: Springer Nature
Total Pages: 112
Release: 2009-05-06
Genre: Technology & Engineering
ISBN: 3031017226

As computation continues to move into the cloud, the computing platform of interest no longer resembles a pizza box or a refrigerator, but a warehouse full of computers. These new large datacenters are quite different from traditional hosting facilities of earlier times and cannot be viewed simply as a collection of co-located servers. Large portions of the hardware and software resources in these facilities must work in concert to efficiently deliver good levels of Internet service performance, something that can only be achieved by a holistic approach to their design and deployment. In other words, we must treat the datacenter itself as one massive warehouse-scale computer (WSC). We describe the architecture of WSCs, the main factors influencing their design, operation, and cost structure, and the characteristics of their software base. We hope it will be useful to architects and programmers of today's WSCs, as well as those of future many-core platforms which may one day implement the equivalent of today's WSCs on a single board. Table of Contents: Introduction / Workloads and Software Infrastructure / Hardware Building Blocks / Datacenter Basics / Energy and Power Efficiency / Modeling Costs / Dealing with Failures and Repairs / Closing Remarks

Security for Cloud Storage Systems
Author: Kan Yang
Publisher: Springer Science & Business Media
Total Pages: 91
Release: 2013-07-01
Genre: Computers
ISBN: 1461478731

Cloud storage is an important service of cloud computing that allows data owners to host their data in the cloud. This new paradigm of data hosting and data access services introduces two major security concerns. The first is the protection of data integrity. Data owners may not fully trust the cloud server and worry that data stored in the cloud could be corrupted or even removed. The second is data access control. Data owners may worry that some dishonest servers will, for profit, grant data access to users who are not permitted, so the owners can no longer rely on the servers for access control. To protect data integrity in the cloud, an efficient and secure dynamic auditing protocol is introduced, which can support dynamic auditing and batch auditing. To ensure data security in the cloud, two efficient and secure data access control schemes are introduced in this brief: ABAC for Single-authority Systems and DAC-MACS for Multi-authority Systems. While Ciphertext-Policy Attribute-based Encryption (CP-ABE) is a promising technique for access control of encrypted data, the existing schemes cannot be directly applied to data access control for cloud storage systems because of the attribute revocation problem. To solve the attribute revocation problem, new Revocable CP-ABE methods are proposed in both ABAC and DAC-MACS.

The Datacenter as a Computer
Author: Luiz André Barroso
Publisher: Springer Nature
Total Pages: 201
Release: 2022-06-01
Genre: Technology & Engineering
ISBN: 3031017617

This book describes warehouse-scale computers (WSCs), the computing platforms that power cloud computing and all the great web services we use every day. It discusses how these new systems treat the datacenter itself as one massive computer designed at warehouse scale, with hardware and software working in concert to deliver good levels of internet service performance. The book details the architecture of WSCs and covers the main factors influencing their design, operation, and cost structure, and the characteristics of their software base. Each chapter contains multiple real-world examples, including detailed case studies and previously unpublished details of the infrastructure used to power Google's online services. Targeted at the architects and programmers of today's WSCs, this book provides a great foundation for those looking to innovate in this fascinating and important area, but the material will also be broadly interesting to those who just want to understand the infrastructure powering the internet. The third edition reflects four years of advancements since the previous edition and nearly doubles the number of pictures and figures. New topics range from additional workloads like video streaming, machine learning, and public cloud to specialized silicon accelerators, storage and network building blocks, and a revised discussion of data center power and cooling, and uptime. Further discussions of emerging trends and opportunities ensure that this revised edition will remain an essential resource for educators and professionals working on the next generation of WSCs.

Multi-Core Cache Hierarchies
Author: Rajeev Balasubramonian
Publisher: Springer Nature
Total Pages: 137
Release: 2022-06-01
Genre: Technology & Engineering
ISBN: 303101734X

A key determinant of overall system performance and power dissipation is the cache hierarchy since access to off-chip memory consumes many more cycles and energy than on-chip accesses. In addition, multi-core processors are expected to place ever higher bandwidth demands on the memory system. All these issues make it important to avoid off-chip memory access by improving the efficiency of the on-chip cache. Future multi-core processors will have many large cache banks connected by a network and shared by many cores. Hence, many important problems must be solved: cache resources must be allocated across many cores, data must be placed in cache banks that are near the accessing core, and the most important data must be identified for retention. Finally, difficulties in scaling existing technologies require adapting to and exploiting new technology constraints. The book attempts a synthesis of recent cache research that has focused on innovations for multi-core processors. It is an excellent starting point for early-stage graduate students, researchers, and practitioners who wish to understand the landscape of recent cache research. The book is suitable as a reference for advanced computer architecture classes as well as for experienced researchers and VLSI engineers. Table of Contents: Basic Elements of Large Cache Design / Organizing Data in CMP Last Level Caches / Policies Impacting Cache Hit Rates / Interconnection Networks within Large Caches / Technology / Concluding Remarks

Designing for Network and Service Continuity in Wireless Mesh Networks
Author: Parth H. Pathak
Publisher: Springer Science & Business Media
Total Pages: 226
Release: 2012-11-02
Genre: Technology & Engineering
ISBN: 1461446279

“Designing for Network and Service Continuity in Wireless Mesh Networks” examines the performance predictability of the new wireless mesh network paradigm and discusses considerations in designing networks from the perspective of survivability and service continuity metrics. The work provides design insights for network design researchers and industry professionals. It includes designs for new mesh networks and extensions of existing networks with predictable performance.

Cloud Computing
Author: Nick Antonopoulos
Publisher: Springer
Total Pages: 418
Release: 2017-06-02
Genre: Computers
ISBN: 3319546457

This practically focused reference presents a comprehensive overview of the state of the art in Cloud Computing, and examines the potential for future Cloud and Cloud-related technologies to address specific industrial and research challenges. This new edition explores both established and emergent principles, techniques, protocols and algorithms involved with the design, development, and management of Cloud-based systems. The text reviews a range of applications and methods for linking Clouds, undertaking data management and scientific data analysis, and addressing requirements both of data analysis and of management of large-scale and complex systems. This new edition also extends into the emergent next generation of mobile telecommunications, relating network function virtualization and mobile edge Cloud Computing, in support of Smart Grids and Smart Cities. As with the first edition, emphasis is placed on the four quality-of-service cornerstones of efficiency, scalability, robustness, and security.

The Datacenter as a Computer
Author: Luiz André Barroso
Publisher: Springer Nature
Total Pages: 145
Release: 2013-08-06
Genre: Technology & Engineering
ISBN: 3031017412

As computation continues to move into the cloud, the computing platform of interest no longer resembles a pizza box or a refrigerator, but a warehouse full of computers. These new large datacenters are quite different from traditional hosting facilities of earlier times and cannot be viewed simply as a collection of co-located servers. Large portions of the hardware and software resources in these facilities must work in concert to efficiently deliver good levels of Internet service performance, something that can only be achieved by a holistic approach to their design and deployment. In other words, we must treat the datacenter itself as one massive warehouse-scale computer (WSC). We describe the architecture of WSCs, the main factors influencing their design, operation, and cost structure, and the characteristics of their software base. We hope it will be useful to architects and programmers of today’s WSCs, as well as those of future many-core platforms which may one day implement the equivalent of today’s WSCs on a single board. Notes for the Second Edition: After nearly four years of substantial academic and industrial developments in warehouse-scale computing, we are delighted to present our first major update to this lecture. The increased popularity of public clouds has made WSC software techniques relevant to a larger pool of programmers since our first edition. Therefore, we expanded Chapter 2 to reflect our better understanding of WSC software systems and the toolbox of software techniques for WSC programming. In Chapter 3, we added to our coverage of the evolving landscape of wimpy vs. brawny server trade-offs, and we now present an overview of WSC interconnects and storage systems that was promised but lacking in the original edition. Thanks largely to the help of our new co-author, Google Distinguished Engineer Jimmy Clidaras, the material on facility mechanical and power distribution design has been updated and greatly extended (see Chapters 4 and 5). Chapters 6 and 7 have also been revamped significantly. We hope this revised edition continues to meet the needs of educators and professionals in this area.