Pro Apache Hadoop

Author: Jason Venner
Publisher: Apress
Total Pages: 428
Release: 2014-09-18
Genre: Computers
ISBN: 1430248645

Pro Apache Hadoop, Second Edition brings you up to speed on Hadoop, the framework of big data. Revised to cover Hadoop 2.0, the book covers the very latest developments such as YARN (aka MapReduce 2.0), new HDFS high-availability features, and increased scalability in the form of HDFS Federation. All the old content has been revised too, giving the latest on the ins and outs of MapReduce, cluster design, the Hadoop Distributed File System, and more. This book covers everything you need to build your first Hadoop cluster and begin analyzing and deriving value from your business and scientific data. Learn to solve big-data problems the MapReduce way, by breaking a big problem into chunks and creating small-scale solutions that can be flung across thousands upon thousands of nodes to analyze large data volumes in a short amount of wall-clock time. Learn how to let Hadoop take care of distributing and parallelizing your software: you just focus on the code, and Hadoop takes care of the rest. The book covers all that is new in Hadoop 2.0, is written by a professional involved in Hadoop since day one, and takes you quickly to the seasoned pro level on the hottest cloud-computing framework.

Pro Hadoop Data Analytics

Author: Kerry Koitzsch
Publisher: Apress
Total Pages: 304
Release: 2016-12-29
Genre: Computers
ISBN: 1484219104

Learn advanced analytical techniques and leverage existing tool kits to make your analytic applications more powerful, precise, and efficient. This book provides the right combination of architecture, design, and implementation information to create analytical systems that go beyond the basics of classification, clustering, and recommendation. Pro Hadoop Data Analytics emphasizes best practices to ensure coherent, efficient development. A complete example system is developed using standard third-party components: tool kits, libraries, visualization and reporting code, and the support glue needed to provide a working and extensible end-to-end system. The book also highlights the importance of end-to-end, flexible, configurable, high-performance data pipeline systems with analytical components and appropriate visualization of results. You'll discover the importance of mix-and-match or hybrid systems, which use different analytical components in one application. This hybrid approach is prominent in the examples.
What You'll Learn: Build big data analytic systems with the Hadoop ecosystem; use libraries, tool kits, and algorithms to make development easier and more effective; apply metrics to measure performance and efficiency of components and systems; connect to standard relational databases, NoSQL data sources, and more; follow case studies with example components to create your own systems.
Who This Book Is For: Software engineers, architects, and data scientists with an interest in the design and implementation of big data analytical systems using Hadoop, the Hadoop ecosystem, and other associated technologies.

Pro Apache Phoenix

Author: Shakil Akhtar
Publisher: Apress
Total Pages: 148
Release: 2016-12-29
Genre: Computers
ISBN: 1484223705

Leverage Phoenix as an ANSI SQL engine built on top of the highly distributed and scalable NoSQL framework HBase. Learn the basics and best practices that are being adopted in Phoenix to enable high write and read throughput in a big data space. This book includes real-world cases such as Internet of Things devices that send continuous streams to Phoenix, and it explains how key features such as joins, indexes, transactions, and functions help you understand the simple, flexible, and powerful API that Phoenix provides. Examples use real-time data and data-driven businesses to show you how to collect, analyze, and act in seconds. Pro Apache Phoenix covers the nuances of setting up a distributed HBase cluster with Phoenix libraries, running performance benchmarks, configuring parameters for production scenarios, and viewing the results. The book also shows how Phoenix plays well with other key frameworks in the Hadoop ecosystem such as Apache Spark, Pig, Flume, and Sqoop.
You will learn how to: Handle a petabyte data store by applying familiar SQL techniques; store, analyze, and manipulate data in a NoSQL Hadoop ecosystem with HBase; apply best practices while working with a scalable data store on Hadoop and HBase; integrate popular frameworks (Apache Spark, Pig, Flume) to simplify big data analysis; demonstrate real-time use cases and big data modeling techniques.
Who This Book Is For: Data engineers, Big Data administrators, and architects.

Professional Hadoop

Author: Benoy Antony
Publisher: John Wiley & Sons
Total Pages: 216
Release: 2016-05-23
Genre: Computers
ISBN: 111926717X

The professional's one-stop guide to this open-source, Java-based big data framework. Professional Hadoop is the complete reference and resource for experienced developers looking to employ Apache Hadoop in real-world settings. Written by an expert team of certified Hadoop developers, committers, and Summit speakers, this book details every key aspect of Hadoop technology to enable optimal processing of large data sets. Designed expressly for the professional developer, this book skips over the basics of database development to get you acquainted with the framework's processes and capabilities right away. The discussion covers each key Hadoop component individually, culminating in a sample application that brings all of the pieces together to illustrate the cooperation and interplay that make Hadoop a major big data solution. Coverage includes everything from storage and security to computing and user experience, with expert guidance on integrating other software and more. Hadoop is quickly reaching significant market usage, and more and more developers are being called upon to develop big data solutions using the Hadoop framework. This book covers the process from beginning to end, providing a crash course for professionals needing to learn and apply Hadoop quickly. You will learn to configure storage, user experience (UE), and in-memory computing; integrate Hadoop with other programs including Kafka and Storm; master the fundamentals of Apache Bigtop and Ignite; and build robust data security with expert tips and advice. Hadoop's popularity is largely due to its accessibility. Open-source and written in Java, the framework offers almost no barrier to entry for experienced database developers already familiar with the skills and requirements real-world programming entails. Professional Hadoop gives you the practical information and framework-specific skills you need quickly.

Big Data Analytics: Applications, Hadoop Technologies and Hive

Author: Dr.P.Pushpa
Publisher: Leilani Katie Publication
Total Pages: 251
Release: 2024-04-22
Genre: Computers
ISBN: 8197147965

Dr. P. Pushpa, Lecturer, School of Software Engineering, East China University of Technology, Nanchang, Jiangxi, China. Dr. V. Thamilarasi, Assistant Professor, Department of Computer Science, Sri Sarada College for Women (Autonomous), Salem, Tamil Nadu, India. Dr. S. Lakshmi Prabha, Associate Professor, Department of Computer Science, Seethalakshmi Ramaswami College, Tiruchirappalli, Tamil Nadu, India. Mrs. Sudha Nagarajan, Assistant Professor, Department of Computer Science, Excel College for Commerce and Science, Komarapalayam, Namakkal, Tamil Nadu, India.

Handbook of Research on Advanced Practical Approaches to Deepfake Detection and Applications

Author: Obaid, Ahmed J.
Publisher: IGI Global
Total Pages: 409
Release: 2023-01-03
Genre: Computers
ISBN: 1668460629

In recent years, falsification and digital modification of video clips, images, and textual content have become widespread, especially as deepfake technologies are adopted across many sources. Because of these deepfake techniques, a lot of content can no longer be distinguished from its original sources. As a result, the field of study previously devoted to general multimedia forensics has been revived. The Handbook of Research on Advanced Practical Approaches to Deepfake Detection and Applications discusses the recent techniques and applications of illustration, generation, and detection of deepfake content in multimedia. It introduces the techniques and gives an overview of deepfake applications, types of deepfakes, the algorithms and applications used in deepfakes, recent challenges and problems, and practical applications to identify, generate, and detect deepfakes. Covering topics such as anomaly detection, intrusion detection, and security enhancement, this major reference work is a comprehensive resource for cyber security specialists, government officials, law enforcement, business leaders, students and faculty of higher education, librarians, researchers, and academicians.

Research Challenges in Information Science

Author: Renata Guizzardi
Publisher: Springer Nature
Total Pages: 836
Release: 2022-05-13
Genre: Computers
ISBN: 3031057600

This book constitutes the proceedings of the 16th International Conference on Research Challenges in Information Science, RCIS 2022, which took place in Barcelona, Spain, during May 17–20, 2022. It focused on the special theme "Ethics and Trustworthiness in Information Science". The scope of RCIS is summarized by the thematic areas of information systems and their engineering; user-oriented approaches; data and information management; business process management; domain-specific information systems engineering; data science; information infrastructures; and reflective research and practice. The 35 full papers presented in this volume were carefully reviewed and selected from a total of 100 submissions. The 18 Forum papers are based on 11 Forum submissions, from which 5 were selected, and the remaining 13 were transferred from the regular submissions. The 6 Doctoral Consortium papers were selected from 10 submissions to the consortium. The contributions were organized in topical sections named: Data Science and Data Management; Information Search and Analysis; Business Process Management; Business Process Mining; Digital Transformation and Smart Life; Conceptual Modelling and Ontologies; Requirements Engineering; Model-Driven Engineering; Machine Learning Applications. In addition, two-page summaries of the tutorials can be found in the back matter.

Inventive Computation and Information Technologies

Author: S. Smys
Publisher: Springer Nature
Total Pages: 911
Release: 2022-01-18
Genre: Technology & Engineering
ISBN: 9811667233

This book is a collection of the best selected papers presented at the International Conference on Inventive Computation and Information Technologies (ICICIT 2021), held on 12–13 August 2021. The book includes papers in the research areas of information sciences and communication engineering. It presents novel and innovative research results in the theory, methodology, and applications of communication engineering and information technologies.

Big Data Processing Using Spark in Cloud

Author: Mamta Mittal
Publisher: Springer
Total Pages: 275
Release: 2018-06-16
Genre: Computers
ISBN: 9811305501

The book describes the emergence of big data technologies and the role of Spark in the entire big data stack. It compares Spark and Hadoop and identifies the shortcomings of Hadoop that have been overcome by Spark. The book mainly focuses on the in-depth architecture of Spark and on Spark RDDs, showing how RDDs complement big data's immutable nature through lazy evaluation, caching, and type inference. It also addresses advanced topics in Spark, starting with the basics of Scala and the core Spark framework, and exploring Spark data frames, machine learning using MLlib, graph analytics using GraphX, and real-time processing with Apache Kafka, AWS Kinesis, and Azure Event Hubs. It then goes on to investigate Spark using PySpark and R. Focusing on the current big data stack, the book examines the interaction with current big data tools, with Spark being the core processing layer for all types of data. The book is intended for data engineers and scientists working on massive datasets and big data technologies in the cloud. In addition to industry professionals, it is helpful for aspiring data processing professionals and students working in big data processing and cloud computing environments.

Big Data Computing

Author: Tanvir Habib Sardar
Publisher: CRC Press
Total Pages: 397
Release: 2024-02-27
Genre: Computers
ISBN: 100382272X

This book primarily aims to provide an in-depth understanding of recent advances in big data computing technologies, methodologies, and applications, along with introductory details of big data computing models such as Apache Hadoop, MapReduce, Hive, Pig, Mahout, in-memory storage systems, NoSQL databases, and big data streaming services such as Apache Spark, Kafka, and so forth. It also covers developments in big data computing applications such as machine learning, deep learning, graph processing, and many others.
Features: Provides comprehensive analysis of advanced aspects of big data challenges and enabling technologies. Explains computing models using real-world examples and dataset-based experiments. Includes case studies, quality diagrams, and demonstrations in each chapter. Describes modifications and optimization of existing technologies along with the novel big data computing models. Explores references to machine learning, deep learning, and graph processing.
This book is aimed at graduate students and researchers in high-performance computing, data mining, knowledge discovery, and distributed computing.