Scalable Algorithms for Data and Network Analysis

Author: Shang-Hua Teng
Publisher:
Total Pages: 274
Release: 2016
Genre: Big data
ISBN: 9781680831313

In the age of Big Data, efficient algorithms are in higher demand than ever before. While Big Data takes us into the asymptotic world envisioned by our pioneers, it also challenges the classical notion of efficient algorithms: algorithms that used to be considered efficient under the polynomial-time characterization may no longer be adequate for solving today's problems. It is not just desirable, but essential, that efficient algorithms be scalable. In other words, their complexity should be nearly linear or sub-linear with respect to the problem size. Thus, scalability, not just polynomial-time computability, should be elevated as the central complexity notion for characterizing efficient computation. In this tutorial, I will survey a family of algorithmic techniques for the design of provably good scalable algorithms. These techniques include local network exploration, advanced sampling, sparsification, and geometric partitioning. They also include spectral graph-theoretical methods, such as those used for computing electrical flows and sampling from Gaussian Markov random fields. These methods exemplify the fusion of combinatorial, numerical, and statistical thinking in network analysis. I will illustrate the use of these techniques with a few basic problems that are fundamental in network analysis, particularly the identification of significant nodes and coherent clusters/communities in social and information networks. I also take this opportunity to discuss some frameworks beyond graph-theoretical models for studying conceptual questions about the multifaceted network data that arise in social influence, network dynamics, and Internet economics.

Scalable Algorithms for Data and Network Analysis

Author: Shang-Hua Teng
Publisher:
Total Pages: 292
Release: 2016-05-04
Genre: Computers
ISBN: 9781680831306

In the age of Big Data, efficient algorithms are in high demand. It is also essential that efficient algorithms should be scalable. This book surveys a family of algorithmic techniques for the design of scalable algorithms. These techniques include local network exploration, advanced sampling, sparsification, and geometric partitioning.

Data Algorithms

Author: Mahmoud Parsian
Publisher: "O'Reilly Media, Inc."
Total Pages: 778
Release: 2015-07-13
Genre: Computers
ISBN: 1491906154

If you are ready to dive into the MapReduce framework for processing large datasets, this practical book takes you step by step through the algorithms and tools you need to build distributed MapReduce applications with Apache Hadoop or Apache Spark. Each chapter provides a recipe for solving a massive computational problem, such as building a recommendation system. You’ll learn how to implement the appropriate MapReduce solution with code that you can use in your projects. Dr. Mahmoud Parsian covers basic design patterns, optimization techniques, and data mining and machine learning solutions for problems in bioinformatics, genomics, statistics, and social network analysis. This book also includes an overview of MapReduce, Hadoop, and Spark. Topics include: market basket analysis for a large set of transactions; data mining algorithms (K-means, KNN, and Naive Bayes); using huge genomic data to sequence DNA and RNA; Naive Bayes theorem and Markov chains for data and market prediction; recommendation algorithms and pairwise document similarity; linear regression, Cox regression, and Pearson correlation; allelic frequency and mining DNA; social network analysis (recommendation systems, counting triangles, sentiment analysis).

Computing and Combinatorics

Author: Yixin Cao
Publisher: Springer
Total Pages: 708
Release: 2017-07-25
Genre: Computers
ISBN: 3319623893

This book constitutes the refereed proceedings of the 23rd International Conference on Computing and Combinatorics, COCOON 2017, held in Hong Kong, China, in August 2017. The 56 full papers presented in this book were carefully reviewed and selected from 119 submissions. The papers cover various topics, including algorithms and data structures, complexity theory and computability, algorithmic game theory, computational learning theory, cryptography, computational biology, computational geometry and number theory, graph theory, and parallel and distributed computing.

Algorithms for Big Data

Author: Hannah Bast
Publisher: Springer Nature
Total Pages: 296
Release: 2022
Genre: Algorithms
ISBN: 3031215346

This open access book surveys the progress in addressing selected challenges related to the growth of big data in combination with increasingly complicated hardware. It emerged from a research program established by the German Research Foundation (DFG) as priority program SPP 1736 on Algorithmics for Big Data, where researchers from theoretical computer science worked together with application experts in order to tackle problems in domains such as networking, genomics research, and information retrieval. Such domains are unthinkable without substantial hardware and software support, and these systems acquire, process, exchange, and store data at an exponential rate. The chapters of this volume summarize the results of projects realized within the program and survey related work.

Working with Network Data

Author: James Bagrow
Publisher: Cambridge University Press
Total Pages: 555
Release: 2024-05-31
Genre: Science
ISBN: 1009212591

Drawing examples from real-world networks, this essential book traces the methods behind network analysis and explains how network data is first gathered, then processed and interpreted. The text will equip you with a toolbox of diverse methods and data modelling approaches, allowing you to quickly start making your own calculations on a huge variety of networked systems. This book sets you up to succeed, addressing the questions of what you need to know and what to do with it, when beginning to work with network data. The hands-on approach adopted throughout means that beginners quickly become capable practitioners, guided by a wealth of interesting examples that demonstrate key concepts. Exercises using real-world data extend and deepen your understanding, and develop effective working patterns in network calculations and analysis. Suitable for both graduate students and researchers across a range of disciplines, this novel text provides a fast-track to network data expertise.

On the Analysis of Complex Networks

Author: Feizi-Khankandi Feizi
Publisher:
Total Pages: 496
Release: 2016
Genre:
ISBN:

Network models provide a unifying framework for understanding dependencies among variables in data-driven and engineering sciences. Networks can be used to reveal underlying data structures, infer functional modules, and facilitate experiment design. In practice, however, size, uncertainty and complexity of the underlying associations render these applications challenging. In this thesis, we illustrate the use of spectral, combinatorial, and statistical inference techniques in several network science problems.
In Chapters 2-4, we consider network inference challenges. In Chapter 2, we introduce Network Maximal Correlation (NMC) as a multivariate measure of nonlinear association suitable for evaluation on large datasets. We characterize a solution of the NMC optimization using geometric properties of Hilbert spaces for finite discrete and jointly Gaussian random variables. We illustrate an application of NMC and multiple MC in inference of graphical models for bijective, possibly non-monotone, functions of jointly Gaussian variables. As a demonstration of NMC's utility, we infer nonlinear gene association networks and modules in cancer datasets and validate them using survival times of patients. In Chapter 3, we develop a network integration framework to infer gene regulatory networks in human and model organisms fly and worm using diverse and high-throughput datasets. Inferred regulatory interactions have significant overlap with known edges, indicating the robustness and accuracy of the proposed network inference framework. In Chapter 4, we formulate the transitive noise problem in networks as the inverse of matrix transitive closure and introduce an algorithm to solve it efficiently. We demonstrate the effectiveness of our approach in several applications such as regulatory network inference, protein contact map inference and strong collaboration tie inference.
In Chapters 5-8, we consider network analysis challenges. In Chapter 5, we consider the problem of network alignment where the goal is to find a bijective mapping between nodes of two networks to maximize their overlapping edges while minimizing mismatches. This problem is essential in comparative analysis across large datasets and networks. To solve this combinatorial problem, we present a new scalable spectral algorithm which creates an eigenvector relaxation for the underlying optimization. We prove the optimality of the method under certain technical conditions, and show its effectiveness over various synthetic networks as well as in comparative analysis of gene regulatory networks across human, fly and worm species. In Chapter 6, we consider the source inference problem where the goal is to identify the source(s) of propagated signals across biological, social and engineered networks. To solve this problem, we propose a computationally tractable general method based on a path-based network diffusion kernel. We prove mean-field optimality of this method for different scenarios and show its effectiveness over several synthetic networks as well as in identifying sources in a Digg social news network. In Chapter 7, we consider the problem of learning low dimensional structures (such as clusters) in large networks. Here we introduce logistic Random Dot Product Graphs (RDPGs) as a new class of networks which includes most stochastic block models as well as other low dimensional structures. Using this model, we propose a scalable spectral method that solves the maximum likelihood inference problem asymptotically exactly. This leads to a new scalable spectral network clustering algorithm that is robust under different clustering setups. In Chapter 8, we consider the biclustering problem, the analog of clustering on bipartite graphs. This problem has several applications such as inference of co-regulated genes, document classification, and so on. Here we propose an algorithm based on message-passing that closely approximates a general likelihood function and excels at resolving the overlaps between biclusters.
In Chapters 9-12, we consider design challenges of systems and algorithms for engineering networks such as communication networks. In Chapters 9-10, we create a connection between compressive sensing and traditional information theoretic techniques in source, channel and network coding and propose a joint coding scheme over wireless networks based on random projection and restricted eigenvalue principles. Moreover, we characterize fundamental results on the trade-off between the communication rate and the decoding complexity. In Chapters 11-12, we propose an adaptive nonuniform sampling framework, in which time increments between samples are determined as a function of the most recent increments and sample values, obviating the need to track time stamps. We analyze the performance of the proposed method for different stochastic and deterministic signal models and show its effectiveness to enhance measurements of heart ECG signals.
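The adaptive nonuniform sampling idea mentioned for Chapters 11-12 can be illustrated with a small Python sketch. This is not the thesis's method: the update rule `next_increment`, its threshold and scaling constants, and the function names `sample` and `recover_times` are all assumptions made up for illustration. The sketch only shows the general principle that, when the next time increment is a deterministic function of the previous increment and the most recent sample values, a receiver can rebuild the sampling times from the values alone.

```python
import math


def next_increment(prev_dt, prev_value, curr_value,
                   threshold=0.1, shrink=0.5, grow=2.0,
                   dt_min=0.01, dt_max=1.0):
    """Hypothetical rule (an assumption, not from the thesis):
    sample faster after a large change, slower after a small one."""
    if abs(curr_value - prev_value) > threshold:
        dt = prev_dt * shrink
    else:
        dt = prev_dt * grow
    # Keep the increment inside fixed bounds so it stays positive.
    return min(dt_max, max(dt_min, dt))


def sample(signal, t_end, dt0=0.1):
    """Sampler side: returns the sample values and, for checking only,
    the times at which they were taken."""
    t, dt = 0.0, dt0
    values, times = [signal(t)], [t]
    t += dt
    while t <= t_end:
        values.append(signal(t))
        times.append(t)
        # The next increment depends only on the last increment
        # and the last two sample values.
        dt = next_increment(dt, values[-2], values[-1])
        t += dt
    return values, times


def recover_times(values, dt0=0.1):
    """Receiver side: rebuild the sampling times from the values alone
    by replaying the same increment rule (no time stamps needed)."""
    if not values:
        return []
    times = [0.0]
    if len(values) == 1:
        return times
    dt = dt0
    t = 0.0 + dt
    times.append(t)
    for i in range(2, len(values)):
        dt = next_increment(dt, values[i - 2], values[i - 1])
        t += dt
        times.append(t)
    return times


# Usage: sample a sine wave, then recover the times from values only.
values, times = sample(math.sin, t_end=5.0)
recovered = recover_times(values)
```

Both sides must agree on the initial increment `dt0` and on the rule's constants. In this toy version the receiver replays exactly the same arithmetic as the sampler, so `recovered` matches `times`.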

Network Algorithms, Data Mining, and Applications

Author: Ilya Bychkov
Publisher: Springer Nature
Total Pages: 251
Release: 2020-02-22
Genre: Mathematics
ISBN: 3030371573

These proceedings present the results of the 8th International Conference in Network Analysis, held at the Higher School of Economics, Moscow, in May 2018. The conference brought together scientists, engineers, and researchers from academia, industry, and government. Contributions in this book focus on the development of network algorithms for data mining and its applications. Researchers and students in mathematics, economics, statistics, computer science, and engineering will find this collection a valuable resource filled with the latest research in network analysis. Computational aspects and applications of large-scale networks in market models, neural networks, social networks, power transmission grids, the maximum clique problem, telecommunication networks, and complexity graphs are included, along with new tools for efficient analysis of large-scale networks. Machine learning techniques in network settings, including community detection, clustering, and biclustering algorithms, are presented with applications to social network analysis.