Computational Methods for Analysis of Large-Scale Epigenomics Data

Author: Petko Plamenov Fiziev
Publisher:
Total Pages: 248
Release: 2018
Genre:
ISBN:

Reverse-engineering and understanding the regulatory dynamics of genes is key to gaining insights into many biological processes at the molecular level. Advances in genomics technologies and decreasing costs of DNA sequencing have enabled interrogation of relevant properties of the genome, collectively referred to as epigenetics, on a very large scale. This work presents results from two collaborative projects with experimental biologists and two new general computational methods for analysis of high-throughput epigenomic data. The first collaborative project is joint work with Dr. Kathrin Plath and members of her lab at UCLA on studying the epigenetics of somatic cell reprogramming in mouse. By generating and analyzing a large compendium of genomics datasets at four distinct stages during reprogramming, we discovered key properties of the regulatory dynamics during this process and proposed new ways to improve its efficiency. The first computational method in this work, ChromTime, presents a novel framework for modeling spatio-temporal dynamics of chromatin marks. ChromTime detects expanding, contracting and steady domains of chromatin marks from time course epigenomics data. Applications of the method to a diverse set of biological systems show that predicted dynamic domains likely mark important regulatory regions, as they associate with changes in gene expression and transcription factor binding. Furthermore, ChromTime enables analyses of the directionality of spatio-temporal dynamics of epigenetic domains, a previously understudied aspect of chromatin dynamics. Our results uncover associations between the direction of expanding and contracting domains of several chromatin marks and the direction of transcription of nearby genes.
The second collaborative project is joint work with cancer researchers Dr. Lynda Chin and Dr. Kunal Rai and members of their labs at MD Anderson Cancer Center in Houston, TX. Within this project we studied the epigenetics of melanoma progression. Our collaborators generated genome-wide maps for a large number of histone modifications, DNA methylation and gene expression in tumorigenic and non-tumorigenic human melanocytes. By comparing these maps we discovered that loss of acetylation marks at regulatory regions is characteristic of tumorigenic melanocytes and that modulating acetylation levels can impact the tumorigenic potential of cells. In addition, we developed a novel nanostring assay for interrogating the chromatin state at a small subset of genomic locations, which can potentially be used for diagnostic or prognostic purposes in the future. The second computational method presented in this work, CSDELTA, is designed to detect differential chromatin sites from genome-wide chromatin state maps in groups with multiple samples. The biological relevance of detected differential sites is supported by associations with changes in gene expression and transcription factor binding. Furthermore, CSDELTA models the functional similarity between chromatin states and improves upon the resolution of detection compared to existing methods, which enables more accurate downstream analyses to gain insights into the regulatory dynamics of biological systems.

Computational Epigenetics and Diseases

Author:
Publisher: Academic Press
Total Pages: 450
Release: 2019-02-06
Genre: Business & Economics
ISBN: 0128145145

Computational Epigenetics and Diseases, written by leading scientists in this evolving field, provides comprehensive, cutting-edge knowledge of computational epigenetics in human diseases. In particular, the major computational tools, databases, and strategies for computational epigenetics analysis, for example, DNA methylation, histone modifications, microRNA, noncoding RNA, and ceRNA, are summarized in the context of human diseases. This book discusses bioinformatics methods for epigenetic analysis specifically applied to human conditions such as aging, atherosclerosis, diabetes mellitus, schizophrenia, bipolar disorder, Alzheimer disease, Parkinson disease, liver and autoimmune disorders, and reproductive and respiratory diseases. Additionally, cancers of different organs, such as breast, lung, and colon, are discussed. This book is a valuable source for graduate students and researchers in genetics and bioinformatics, and for members of several biomedical fields interested in applying computational epigenetics in their research. Provides comprehensive, cutting-edge knowledge of computational epigenetics in human diseases; Summarizes the major computational tools, databases, and strategies for computational epigenetics analysis, such as DNA methylation, histone modifications, microRNA, noncoding RNA, and ceRNA; Covers the major milestones and future directions of computational epigenetics in various kinds of human diseases such as aging, atherosclerosis, diabetes, heart disease, neurological disorders, cancers, blood disorders, liver diseases, reproductive diseases, respiratory diseases, autoimmune diseases, human imprinting disorders, and infectious diseases.

Computational Methods for Processing and Analyzing Large Scale Genomics Datasets

Author: Olivera Grujic
Publisher:
Total Pages: 144
Release: 2016
Genre:
ISBN:

This dissertation develops computational methods for analyzing large-scale genomic and epigenomic datasets. We developed a supervised machine learning approach to predict non-exonic evolutionarily conserved regions in the human genome based on a vast amount of functional genomics data. The resulting probabilistic predictions provide a resource for prioritizing functionally important regulatory regions in the human genome. We also developed a method for identifying, from large-scale gene expression datasets, genes that are differentially expressed in both blood and brain from 12 vervet monkeys, which we used to identify 29 transcripts whose expression is variable between individuals and heritable. Additionally, we developed a method using a global search optimization algorithm to successfully improve a model of human thyroid hormone regulation dynamics, leading to a better fit of data for thyrotoxicosis. Together, these three approaches have the potential to impact the understanding and eventual treatment of disease.

Computational Epigenomics and Epitranscriptomics

Author: Pedro H. Oliveira
Publisher: Springer Nature
Total Pages: 267
Release: 2023-02-01
Genre: Science
ISBN: 107162962X

This volume details state-of-the-art computational methods designed to manage, analyze, and generally leverage epigenomic and epitranscriptomic data. Chapters guide readers through fine-mapping and quantification of modifications, visual analytics, imputation methods, supervised analysis, and integrative approaches for single-cell data. Written in the highly successful Methods in Molecular Biology series format, chapters include introductions to their respective topics, lists of the necessary materials and reagents, step-by-step, readily reproducible laboratory protocols, and tips on troubleshooting and avoiding known pitfalls. Cutting-edge and thorough, Computational Epigenomics and Epitranscriptomics aims to provide an overview of epiomic protocols, making it easier for researchers to extract impactful biological insight from their data.

Computational Methods for the Analysis of Genomic Data and Biological Processes

Author: Francisco A. Gómez Vela
Publisher:
Total Pages: 222
Release: 2021
Genre:
ISBN: 9783039437726

In recent decades, new technologies have made remarkable progress in helping to understand biological systems. Rapid advances in genomic profiling techniques such as microarrays or high-performance sequencing have brought new opportunities and challenges in the fields of computational biology and bioinformatics. Such genetic sequencing techniques allow large amounts of data to be produced, whose analysis and cross-integration could provide a complete view of organisms. As a result, it is necessary to develop new techniques and algorithms that carry out an analysis of these data with reliability and efficiency. This Special Issue collected the latest advances in the field of computational methods for the analysis of gene expression data, and, in particular, the modeling of biological processes. Here we present eleven works selected to be published in this Special Issue due to their interest, quality, and originality.

Computational Methods and Analyses in Comparative Genomics and Epigenomics

Author: Qian Peng
Publisher:
Total Pages: 139
Release: 2012
Genre:
ISBN: 9781267247681

As biological problems become more complex and data grow at a rate much faster than that of computer hardware, new and faster algorithms are required. This dissertation investigates computational problems arising in two such fields, comparative genomics and epigenomics, and employs a variety of computational techniques to address the problems. One fundamental question in the study of chromosome evolution is whether rearrangement breakpoints occur at random positions or along certain hotspots. We investigate the breakpoint reuse phenomenon and present analyses that support the more recently proposed fragile breakage model as opposed to the conventional random breakage models for chromosome evolution. The identification of syntenic regions between chromosomes forms the basis for studies of genome architectures, comparative genomics, and evolutionary genomics. Previous synteny block reconstruction algorithms could not be scaled to the large number of mammalian genomes being sequenced; neither did they address the issue of generating non-overlapping synteny blocks suitable for analyzing rearrangements and the evolutionary history of large-scale duplications prevalent in plant genomes. We present a new unified synteny block generation algorithm based on the A-Bruijn graph framework that overcomes these shortcomings. In epigenome sequencing, a sample may contain a mixture of epigenomes, and there is a need to resolve the distinct methylation patterns from the mixture. Many sequencing applications, such as haplotype inference for diploid or polyploid genomes and metagenomic sequencing, share a similar objective: to infer a set of distinct assemblies from reads that are sequenced from a heterogeneous sample and subsequently aligned to a reference genome. We model the problem from both combinatorial and statistical angles. First, we describe a theoretical framework.
A linear-time algorithm is then given to resolve a minimum number of assemblies that are consistent with all reads, substantially improving on previous algorithms. An efficient algorithm is also described to determine a set of assemblies that is consistent with a maximum subset of the reads, a previously untreated problem. We then prove that allowing nested reads or permitting mismatches between reads and their assemblies renders these problems NP-hard. Second, we describe a mixture model-based approach and apply the model to the detection of allele-specific methylation.

Computational Epigenomics

Author: Angela Yen
Publisher:
Total Pages: 225
Release: 2016
Genre:
ISBN:

One of the fundamental aims of biology is to determine what lies at the root of differences across individuals, species, diseases, and cell types. Furthermore, the sequencing of genomes has revolutionized the ways in which scientists can investigate biological processes and disease pathways; new genome-wide, high-throughput experiments require computer scientists with a biological understanding to analyze and interpret the data to improve our understanding of the life sciences. This provides us with a key opportunity to use computational techniques for new biological discoveries. While genetic variation plays an important role in influencing phenotype, sequence alone cannot account for all differences: for example, different types of cells in an individual have varying function and attributes, but identical genetic makeup. This highlights the importance of studying epigenetic changes, which are dynamic chemical changes to and around the DNA. While the DNA of every cell in an individual is the same, the epigenetic context for that DNA varies from cell to cell. In this way, these epigenetic differences play a crucial role in gene regulation, with epigenetic changes both causing and recording regulatory mechanisms. In this thesis, we combine the power of computational, statistical, and data science approaches with the new wave of epigenetic data at a genome-wide level in a number of ways. First, in chapter 2, we demonstrate the importance of computational analysis at an epigenomic level by identifying an epigenomic signature of the olfactory receptor gene family that gives insight into the mechanism behind monogenic gene regulation. Next, in chapter 3, we explain our development of ChromDiff, a novel statistical and information theoretic computational methodology to identify chromatin state differences in groups of samples.
In our methodology, we use correction for external covariates to isolate the relevant signal, and as a result, we find that our method outperforms existing computational methods, with further validation through randomized simulations. In chapter 4, we apply our methodology to characteristics including sex, developmental age, and tissue type, and unveil relevant chromatin states and genes that distinguish the groups of epigenomes, with further validation of our results through differential expression analysis and gene set enrichment. In chapter 5, we show the power of integrative analysis through the combination of DNA methylation data with chromatin state profiles, cell types, sample groups, experimental technologies, and histone mark data to reveal insightful epigenetic patterns and relationships. Finally, in chapter 6, we identify "hidden" or "unknown" covariates in epigenomic data by using agnostic principal component analysis on our samples to discover similarities between our known covariates and the identified components. In summation, our research highlights the importance of both algorithm development and method application for epigenomic questions, reaffirming the importance of interdisciplinary research that brings together cutting-edge techniques in computer science with appropriate biological hypotheses and data. While questions and analysis must be carefully paired in an informed manner to produce meaningful, interpretable, and believable results in computational biology, our work here provides a sampling of the vast potential for scientific discovery at the intersection of the fields of computer science and biology.
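
The chapter 6 idea above, running principal component analysis on the samples and comparing the components against known covariates, can be illustrated with a small sketch. This is not the thesis's implementation: the data are synthetic, and the function names (`principal_components`, `covariate_correlations`), the choice of absolute Pearson correlation, and the covariate labels are all assumptions made for illustration.

```python
import numpy as np

def principal_components(X, n_components=2):
    """Per-sample scores on the top principal components of X (samples x features)."""
    Xc = X - X.mean(axis=0)                      # center each feature
    U, S, _ = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :n_components] * S[:n_components]

def covariate_correlations(scores, covariates):
    """Absolute Pearson correlation of each component with each numeric covariate."""
    out = {}
    for name, values in covariates.items():
        v = np.asarray(values, dtype=float)
        out[name] = [abs(np.corrcoef(scores[:, k], v)[0, 1])
                     for k in range(scores.shape[1])]
    return out

# Synthetic stand-in for an epigenomic signal matrix (hypothetical data).
rng = np.random.default_rng(0)
n_samples, n_features = 40, 200
sex = np.tile([0, 1], n_samples // 2)            # known covariate that drives the signal
shuffled = rng.permutation(sex)                  # covariate unrelated to the signal
effect = rng.normal(size=n_features)             # per-feature effect of the covariate
X = rng.normal(size=(n_samples, n_features)) + 3.0 * np.outer(sex, effect)

scores = principal_components(X, n_components=2)
corr = covariate_correlations(scores, {"sex": sex, "shuffled": shuffled})
print(corr)
```

A component that correlates strongly with a known covariate is explained by it; a high-variance component that matches none of the known covariates is a candidate hidden covariate worth investigating.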

Computational Epigenomics and Disease

Author: Misook Ha
Publisher: Academic Press
Total Pages: 320
Release: 2017-01-01
Genre: Medical
ISBN: 9780128041048

Computational Epigenomics and Diseases: Epigenomic Data Analytics for Human Health Application explains the current computational approaches for inferring epigenetic mechanisms from epigenetic data. Epigenetic research leads to a considerable amount of data that can be more efficiently organized and analyzed using computer-based systems. All applicable computational approaches are explained in detail within this volume. Computational Epigenetics discusses topics such as statistical analysis and management of big epigenetics datasets; relationships among epigenetic factors and diseases; computational inference of the spatial organization of the genome; differential regulation and inference of variations of chromatin modifications; and systems biology approaches for identifying chromatin regulators. Additionally, strategies for applying epigenetics data analysis results to disease diagnosis, prognosis, and case studies are included in order to provide thorough and translational comprehension and applicability. The book is a valuable resource for computer scientists, mathematicians, and statisticians interested in bioinformatics and computational biology approaches to epigenetic data analysis, as well as geneticists who are looking to improve their knowledge of computational analytics for their research. Explains the computational methods for inferring features of epigenetic marks; Describes the basic computational methods for understanding and deciphering chromatin signatures at the primary organization level; Offers example publications and case studies to show the range of possible applications of the computational analyses of epigenetics data.

Big Data Analytics in Genomics

Author: Ka-Chun Wong
Publisher: Springer
Total Pages: 426
Release: 2016-10-24
Genre: Computers
ISBN: 3319412795

This contributed volume explores the emerging intersection between big data analytics and genomics. Recent sequencing technologies have enabled high-throughput sequencing data generation for genomics, resulting in several international projects which have led to massive genomic data accumulation at an unprecedented pace. To reveal novel genomic insights from this data within a reasonable time frame, traditional data analysis methods may not be sufficient or scalable, creating the need for big data analytics to be developed for genomics. The computational methods addressed in the book are intended to tackle crucial biological questions using big data, and are appropriate for either newcomers or veterans in the field. This volume offers thirteen peer-reviewed contributions, written by leading international experts from different regions, representing Argentina, Brazil, China, France, Germany, Hong Kong, India, Japan, Spain, and the USA. In particular, the book surveys three main areas: statistical analytics, computational analytics, and cancer genome analytics. Sample topics covered include: statistical methods for integrative analysis of genomic data, computation methods for protein function prediction, and perspectives on machine learning techniques in big data mining of cancer. Self-contained and suitable for graduate students, this book is also designed for bioinformaticians, computational biologists, and researchers in communities ranging from genomics, big data, molecular genetics, data mining, biostatistics, biomedical science, cancer research, medical research, and biology to machine learning and computer science. Readers will find this volume to be an essential read for appreciating the role of big data in genomics, making this an invaluable resource for stimulating further research on the topic.