Using OpenRefine

Using OpenRefine
Author: Ruben Verborgh
Publisher: Packt Publishing Ltd
Total Pages: 155
Release: 2013-09-10
Genre: Computers
ISBN: 1783289090

The book is styled on a Cookbook, containing recipes - combined with free datasets - which will turn readers into proficient OpenRefine users in the fastest possible way.This book is targeted at anyone who works on or handles a large amount of data. No prior knowledge of OpenRefine is required, as we start from the very beginning and gradually reveal more advanced features. You don't even need your own dataset, as we provide example data to try out the book's recipes.

Practical Data Analysis Cookbook

Practical Data Analysis Cookbook
Author: Tomasz Drabas
Publisher: Packt Publishing Ltd
Total Pages: 384
Release: 2016-04-29
Genre: Computers
ISBN: 1783558512

Over 60 practical recipes on data exploration and analysis About This Book Clean dirty data, extract accurate information, and explore the relationships between variables Forecast the output of an electric plant and the water flow of American rivers using pandas, NumPy, Statsmodels, and scikit-learn Find and extract the most important features from your dataset using the most efficient Python libraries Who This Book Is For If you are a beginner or intermediate-level professional who is looking to solve your day-to-day, analytical problems with Python, this book is for you. Even with no prior programming and data analytics experience, you will be able to finish each recipe and learn while doing so. What You Will Learn Read, clean, transform, and store your data usng Pandas and OpenRefine Understand your data and explore the relationships between variables using Pandas and D3.js Explore a variety of techniques to classify and cluster outbound marketing campaign calls data of a bank using Pandas, mlpy, NumPy, and Statsmodels Reduce the dimensionality of your dataset and extract the most important features with pandas, NumPy, and mlpy Predict the output of a power plant with regression models and forecast water flow of American rivers with time series methods using pandas, NumPy, Statsmodels, and scikit-learn Explore social interactions and identify fraudulent activities with graph theory concepts using NetworkX and Gephi Scrape Internet web pages using urlib and BeautifulSoup and get to know natural language processing techniques to classify movies ratings using NLTK Study simulation techniques in an example of a gas station with agent-based modeling In Detail Data analysis is the process of systematically applying statistical and logical techniques to describe and illustrate, condense and recap, and evaluate data. Its importance has been most visible in the sector of information and communication technologies. It is an employee asset in almost all economy sectors. This book provides a rich set of independent recipes that dive into the world of data analytics and modeling using a variety of approaches, tools, and algorithms. You will learn the basics of data handling and modeling, and will build your skills gradually toward more advanced topics such as simulations, raw text processing, social interactions analysis, and more. First, you will learn some easy-to-follow practical techniques on how to read, write, clean, reformat, explore, and understand your data—arguably the most time-consuming (and the most important) tasks for any data scientist. In the second section, different independent recipes delve into intermediate topics such as classification, clustering, predicting, and more. With the help of these easy-to-follow recipes, you will also learn techniques that can easily be expanded to solve other real-life problems such as building recommendation engines or predictive models. In the third section, you will explore more advanced topics: from the field of graph theory through natural language processing, discrete choice modeling to simulations. You will also get to expand your knowledge on identifying fraud origin with the help of a graph, scrape Internet websites, and classify movies based on their reviews. By the end of this book, you will be able to efficiently use the vast array of tools that the Python environment has to offer. Style and approach This hands-on recipe guide is divided into three sections that tackle and overcome real-world data modeling problems faced by data analysts/scientist in their everyday work. Each independent recipe is written in an easy-to-follow and step-by-step fashion.

A Hands-on Introduction to Big Data Analytics

A Hands-on Introduction to Big Data Analytics
Author: Funmi Obembe
Publisher: SAGE Publications Limited
Total Pages: 415
Release: 2024-02-23
Genre: Business & Economics
ISBN: 1529615909

This practical textbook offers a hands-on introduction to big data analytics, helping you to develop the skills required to hit the ground running as a data professional. It complements theoretical foundations with an emphasis on the application of big data analytics, illustrated by real-life examples and datasets. Containing comprehensive coverage of all the key topics in this area, this book uses open-source technologies and examples in Python and Apache Spark. Learning features include: - Ethics by Design encourages you to consider data ethics at every stage. - Industry Insights facilitate a deeper understanding of the link between what you are studying and how it is applied in industry. - Datasets, questions, and exercises give you the opportunity to apply your learning. Dr Funmi Obembe is the Head of Technology at the Faculty of Arts, Science and Technology, University of Northampton. Dr Ofer Engel is a Data Scientist at the University of Groningen.

Practical Data Analysis

Practical Data Analysis
Author: Hector Cuesta
Publisher: Packt Publishing Ltd
Total Pages: 330
Release: 2016-09-30
Genre: Computers
ISBN: 1785286668

A practical guide to obtaining, transforming, exploring, and analyzing data using Python, MongoDB, and Apache Spark About This Book Learn to use various data analysis tools and algorithms to classify, cluster, visualize, simulate, and forecast your data Apply Machine Learning algorithms to different kinds of data such as social networks, time series, and images A hands-on guide to understanding the nature of data and how to turn it into insight Who This Book Is For This book is for developers who want to implement data analysis and data-driven algorithms in a practical way. It is also suitable for those without a background in data analysis or data processing. Basic knowledge of Python programming, statistics, and linear algebra is assumed. What You Will Learn Acquire, format, and visualize your data Build an image-similarity search engine Generate meaningful visualizations anyone can understand Get started with analyzing social network graphs Find out how to implement sentiment text analysis Install data analysis tools such as Pandas, MongoDB, and Apache Spark Get to grips with Apache Spark Implement machine learning algorithms such as classification or forecasting In Detail Beyond buzzwords like Big Data or Data Science, there are a great opportunities to innovate in many businesses using data analysis to get data-driven products. Data analysis involves asking many questions about data in order to discover insights and generate value for a product or a service. This book explains the basic data algorithms without the theoretical jargon, and you'll get hands-on turning data into insights using machine learning techniques. We will perform data-driven innovation processing for several types of data such as text, Images, social network graphs, documents, and time series, showing you how to implement large data processing with MongoDB and Apache Spark. Style and approach This is a hands-on guide to data analysis and data processing. The concrete examples are explained with simple code and accessible data.

Data Literacy

Data Literacy
Author: David Herzog
Publisher: SAGE Publications
Total Pages: 197
Release: 2015-01-29
Genre: Language Arts & Disciplines
ISBN: 1483378667

A practical, skill-based introduction to data analysis and literacy We are swimming in a world of data, and this handy guide will keep you afloat while you learn to make sense of it all. In Data Literacy: A User′s Guide, David Herzog, a journalist with a decade of experience using data analysis to transform information into captivating storytelling, introduces students and professionals to the fundamentals of data literacy, a key skill in today’s world. Assuming the reader has no advanced knowledge of data analysis or statistics, this book shows how to create insight from publicly-available data through exercises using simple Excel functions. Extensively illustrated, step-by-step instructions within a concise, yet comprehensive, reference will help readers identify, obtain, evaluate, clean, analyze and visualize data. A concluding chapter introduces more sophisticated data analysis methods and tools including database managers such as Microsoft Access and MySQL and standalone statistical programs such as SPSS, SAS and R.

Digital Humanities and Religions in Asia

Digital Humanities and Religions in Asia
Author: L.W.C. van Lit
Publisher: Walter de Gruyter GmbH & Co KG
Total Pages: 342
Release: 2023-12-04
Genre: Religion
ISBN: 3110747758

In pre-modern religions in the geographical context of Asia we encounter unique scripts, number systems, calendars, and naming conventions. These can make Western-built technologies – even tools specifically developed for digital humanities – an ill fit to our needs. The present volume explores this struggle and the limitations and potential opportunities of applying a digital humanities approach to pre-modern Asian religions. The authors cover Buddhism, Christianity, Daoism, Islam, Jainism, Judaism and Shintoism with chapters categorized according to their focus on: 1) temples, 2) manuscripts, 3) texts, and 4) social media. Thus, the volume guides readers through specific methodologies and practical examples while also providing a critical reflection on the state of the field, pushing the interface between digital humanities and pre-modern Asian religions into new territory.

The SAGE Handbook of Social Media Research Methods

The SAGE Handbook of Social Media Research Methods
Author: Luke Sloan
Publisher: SAGE
Total Pages: 992
Release: 2017-01-28
Genre: Social Science
ISBN: 1473987970

The SAGE Handbook of Social Media Research Methods offers a step-by-step guide to overcoming the challenges inherent in research projects that deal with ‘big and broad data’, from the formulation of research questions through to the interpretation of findings. The handbook includes chapters on specific social media platforms such as Twitter, Sina Weibo and Instagram, as well as a series of critical chapters. The holistic approach is organised into the following sections: Conceptualising & Designing Social Media Research Collection & Storage Qualitative Approaches to Social Media Data Quantitative Approaches to Social Media Data Diverse Approaches to Social Media Data Analytical Tools Social Media Platforms This handbook is the single most comprehensive resource for any scholar or graduate student embarking on a social media project.

Practical Python Data Wrangling and Data Quality

Practical Python Data Wrangling and Data Quality
Author: Susan E. McGregor
Publisher: "O'Reilly Media, Inc."
Total Pages: 416
Release: 2021-12-03
Genre: Computers
ISBN: 1492091456

The world around us is full of data that holds unique insights and valuable stories, and this book will help you uncover them. Whether you already work with data or want to learn more about its possibilities, the examples and techniques in this practical book will help you more easily clean, evaluate, and analyze data so that you can generate meaningful insights and compelling visualizations. Complementing foundational concepts with expert advice, author Susan E. McGregor provides the resources you need to extract, evaluate, and analyze a wide variety of data sources and formats, along with the tools to communicate your findings effectively. This book delivers a methodical, jargon-free way for data practitioners at any level, from true novices to seasoned professionals, to harness the power of data. Use Python 3.8+ to read, write, and transform data from a variety of sources Understand and use programming basics in Python to wrangle data at scale Organize, document, and structure your code using best practices Collect data from structured data files, web pages, and APIs Perform basic statistical analyses to make meaning from datasets Visualize and present data in clear and compelling ways

CeDEM14

CeDEM14
Author: Parycek, Peter
Publisher: MV-Verlag
Total Pages: 620
Release: 2014
Genre:
ISBN: 3902505354

Web Scraping with Python

Web Scraping with Python
Author: Ryan Mitchell
Publisher: "O'Reilly Media, Inc."
Total Pages: 264
Release: 2015-06-15
Genre: Computers
ISBN: 1491910259

Learn web scraping and crawling techniques to access unlimited data from any web source in any format. With this practical guide, you’ll learn how to use Python scripts and web APIs to gather and process data from thousands—or even millions—of web pages at once. Ideal for programmers, security professionals, and web administrators familiar with Python, this book not only teaches basic web scraping mechanics, but also delves into more advanced topics, such as analyzing raw data or using scrapers for frontend website testing. Code samples are available to help you understand the concepts in practice. Learn how to parse complicated HTML pages Traverse multiple pages and sites Get a general overview of APIs and how they work Learn several methods for storing the data you scrape Download, read, and extract data from documents Use tools and techniques to clean badly formatted data Read and write natural languages Crawl through forms and logins Understand how to scrape JavaScript Learn image processing and text recognition