Theoretical Guarantees for Species Tree and Network Reconstruction from Internode Distance
Author: Yu Sun (Ph.D.)
Release: 2023

Phylogenetic inference is a fundamental aspect of evolutionary biology, and the recoverability of the species tree is crucial to the statistical consistency of algorithms that rely on certain statistical quantities. It is also important to analyze the theoretical sample complexity of statistically consistent algorithms. In this thesis, we summarize our results in three directions regarding recoverability and sample complexity:

1. For a widely used species tree to gene tree model, can its recoverability result be extended to a more general phylogenetic structure?
2. For a statistically consistent species tree inference algorithm, can we find a good theoretical upper bound on its sample complexity?
3. Can similar tools be used to carry recoverability results over from a well-known model to other models?

For the first question, we consider phylogenetic networks, which generalize phylogenetic trees to model non-vertical inheritance, in which a lineage inherits genetic material from multiple parents through hybridization or lateral gene transfer. Under the network multispecies coalescent (NMSC) model, individual gene trees arising from a network can display any topology. In Chapter 3, we provide a recoverability result for level-1 networks under the NMSC through expected internode distances. This result can be considered an extension of the recoverability of species trees under the multispecies coalescent (MSC) model. The global and local features of our identifiability result are consistent with other studies of level-1 networks with or without the NMSC (e.g., [1] and NANUQ in [2]).

For the second question, we make progress towards deriving sample complexity bounds for our new species tree reconstruction methods based on sample averages of internode distances. Our sample complexity bounds depend polylogarithmically on N, the number of species in the tree. These results are similar to sample complexity results for other statistically consistent species tree inference algorithms, such as GLASS (Mossel and Roch [3]) and ASTRAL (Mirarab et al. [4] and Shekhar et al. [5]), although they do not quite match them.

For the last question, we consider the gene duplication and loss (GDL) model and estimate species trees from multilocus data sets. As a top-down model, GDL explains gene tree incongruence through gene duplication and loss. In Chapter 5, we derive a theoretical result using expected internode distances to recover the unrooted species tree topology under the GDL model in the loss-free case. To our knowledge, this is the first theoretical result on the recoverability of the species tree using distance-based methods under the GDL model.
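The methods described above are built on sample averages of internode distances across gene trees. The Python sketch below illustrates that quantity under stated assumptions; it is not the thesis's algorithm. It assumes each unrooted gene tree is given as an edge list whose degree-1 nodes are the taxa, takes the internode distance to be the number of internal nodes on the leaf-to-leaf path, and uses a four-point (quartet) comparison on the averaged distances as a simple stand-in for a reconstruction step. The helper names `internode_distances`, `average_internode_distances`, and `quartet_split` are hypothetical.

```python
from collections import deque


def internode_distances(edges):
    """Internode distance for every ordered pair of leaves in one unrooted gene tree.

    `edges` is a list of (u, v) pairs; leaves are the degree-1 nodes.
    Assumption: internode distance = number of internal nodes on the path
    between two leaves (edge count minus one).
    """
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    leaves = [n for n, nbrs in adj.items() if len(nbrs) == 1]
    dist = {}
    for src in leaves:
        # Breadth-first search gives the number of edges from src to every node.
        depth = {src: 0}
        queue = deque([src])
        while queue:
            node = queue.popleft()
            for nxt in adj[node]:
                if nxt not in depth:
                    depth[nxt] = depth[node] + 1
                    queue.append(nxt)
        for dst in leaves:
            if dst != src:
                dist[(src, dst)] = depth[dst] - 1
    return dist


def average_internode_distances(gene_trees):
    """Sample average of internode distances over gene trees on the same taxon set."""
    totals = {}
    for tree in gene_trees:
        for pair, d in internode_distances(tree).items():
            totals[pair] = totals.get(pair, 0) + d
    return {pair: total / len(gene_trees) for pair, total in totals.items()}


def quartet_split(dist, w, x, y, z):
    """Four-point comparison: return the pairing with the smallest within-pair distance sum."""
    pairings = [((w, x), (y, z)), ((w, y), (x, z)), ((w, z), (x, y))]
    return min(pairings, key=lambda p: dist[p[0]] + dist[p[1]])


# Two gene trees with topology AB|CD and one with AC|BD (toy data; x, y are internal nodes).
t1 = [("A", "x"), ("B", "x"), ("x", "y"), ("C", "y"), ("D", "y")]
t2 = [("A", "x"), ("C", "x"), ("x", "y"), ("B", "y"), ("D", "y")]
avg = average_internode_distances([t1, t1, t2])
print(quartet_split(avg, "A", "B", "C", "D"))  # (('A', 'B'), ('C', 'D'))
```

Counting edges instead of internal nodes shifts every pairwise distance by one, so the quartet comparison gives the same answer under either convention. Ties between pairings are broken here by list order, which a real method would need to handle explicitly.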

Phylogenetics
Author: E. O. Wiley
Publisher: John Wiley & Sons
Total Pages: 444
Release: 2011-06-07
Genre: Science
ISBN: 0470905964

The long-awaited revision of the industry standard on phylogenetics.

Since the publication of the first edition of this landmark volume more than twenty-five years ago, phylogenetic systematics has taken its place as the dominant paradigm of systematic biology. It has profoundly influenced the way scientists study evolution, and has seen many theoretical and technical advances as the field has continued to grow. It goes almost without saying that the next twenty-five years of phylogenetic research will prove as fascinating as the first, with many exciting developments yet to come. This new edition of Phylogenetics captures the very essence of this rapidly evolving discipline. Written for the practicing systematist and phylogeneticist, it addresses both the philosophical and technical issues of the field and surveys general practices in taxonomy. Major sections of the book deal with the nature of species and higher taxa, homology and characters, trees and tree graphs, and biogeography, the purpose being to develop biologically relevant species, character, tree, and biogeographic concepts that can be applied fruitfully to phylogenetics. The book then turns its focus to phylogenetic trees, including an in-depth guide to tree-building algorithms. Additional coverage includes:
• Parsimony and parsimony analysis
• Parametric phylogenetics, including maximum likelihood and Bayesian approaches
• Phylogenetic classification
• Critiques of evolutionary taxonomy, phenetics, and transformed cladistics
• Specimen selection, field collecting, and curating
• Systematic publication and the rules of nomenclature

Providing a thorough synthesis of the field, this important update to Phylogenetics is essential for students and researchers in evolutionary biology, molecular evolution, genetics and evolutionary genetics, paleontology, physical anthropology, and zoology.

Bayesian Evolutionary Analysis with BEAST
Author: Alexei J. Drummond
Publisher: Cambridge University Press
Total Pages: 263
Release: 2015-08-06
Genre: Science
ISBN: 1316298345

What are the models used in phylogenetic analysis, and what exactly is involved in Bayesian evolutionary analysis using Markov chain Monte Carlo (MCMC) methods? How can you choose and apply these models, which parameterisations and priors make sense, and how can you diagnose Bayesian MCMC when things go wrong? These are just a few of the questions answered in this comprehensive overview of Bayesian approaches to phylogenetics. This practical guide:
• Addresses the theoretical aspects of the field
• Advises on how to prepare and perform phylogenetic analysis
• Helps with interpreting analyses and visualising phylogenies
• Describes the software architecture
• Helps with developing BEAST 2.2 extensions so that these models can be extended further

With an accompanying website providing example files and tutorials (http://beast2.org/), this one-stop reference to applying the latest phylogenetic models in BEAST 2 will provide essential guidance for all users, from those using phylogenetic tools to computational biologists and Bayesian statisticians.

Gossip Algorithms
Author: Devavrat Shah
Publisher: Now Publishers Inc
Total Pages: 140
Release: 2009
Genre: Computers
ISBN: 1601982364

A systematic survey of many recent results on gossip network algorithms.

Molecular Evolution
Author: Ziheng Yang
Publisher: Oxford University Press
Total Pages: 509
Release: 2014
Genre: Science
ISBN: 0199602603

Studies of evolution at the molecular level have experienced phenomenal growth in the last few decades, due to the rapid accumulation of genetic sequence data, improved computer hardware and software, and the development of sophisticated analytical methods. The flood of genomic data has generated an acute need for powerful statistical methods and efficient computational algorithms to enable its effective analysis and interpretation. Molecular Evolution: A Statistical Approach presents and explains modern statistical methods and computational algorithms for the comparative analysis of genetic sequence data in the fields of molecular evolution, molecular phylogenetics, statistical phylogeography, and comparative genomics. Written by an expert in the field, the book emphasizes conceptual understanding rather than mathematical proofs. The text is enlivened with numerous examples of real data analysis and numerical calculations to illustrate the theory, in addition to the problems at the end of each chapter. The coverage of maximum likelihood and Bayesian methods is in particular up-to-date, comprehensive, and authoritative. This advanced textbook is aimed at graduate students and professional researchers (both empiricists and theoreticians) in bioinformatics and computational biology, statistical genomics, evolutionary biology, molecular systematics, and population genetics. It will also be of relevance and use to a wider audience of applied statisticians, mathematicians, and computer scientists working in computational biology.

Fundamentals of Tree Ring Research
Author: James H. Speer
Publisher: University of Arizona Press
Total Pages: 360
Release: 2010
Genre: Science
ISBN: 0816526850

This comprehensive text addresses all of the subjects that a reader who is new to the field will need to know and will be a welcome reference for practitioners at all levels. It includes a history of the discipline, biological and ecological background, principles of the field, basic scientific information on the structure and growth of trees, the complete range of dendrochronology methods, and a full description of each of the relevant subdisciplines.

Bioinformatics and Phylogenetics
Author: Tandy Warnow
Publisher: Springer
Total Pages: 410
Release: 2019-04-08
Genre: Computers
ISBN: 3030108376

This volume presents a compelling collection of state-of-the-art work in algorithmic computational biology, honoring the legacy of Professor Bernard M.E. Moret in this field. Reflecting the wide-ranging influences of Prof. Moret’s research, the coverage encompasses such areas as phylogenetic tree and network estimation, genome rearrangements, cancer phylogeny, species trees, divide-and-conquer strategies, and integer linear programming. Each self-contained chapter provides an introduction to a cutting-edge problem of particular computational and mathematical interest. Topics and features:
• Addresses the challenges in developing accurate and efficient software for the NP-hard maximum likelihood phylogeny estimation problem
• Describes the inference of species trees, covering strategies to scale phylogeny estimation methods to large datasets, and the construction of taxonomic supertrees
• Discusses the inference of ultrametric distances from additive distance matrices, and the inference of ancestral genomes under genome rearrangement events
• Reviews different techniques for inferring evolutionary histories in cancer, from the use of chromosomal rearrangements to tumor phylogenetics approaches
• Examines problems in phylogenetic networks, including questions relating to discrete mathematics, and issues of statistical estimation
• Highlights how evolution can provide a framework within which to understand comparative and functional genomics
• Provides an introduction to Integer Linear Programming and its use in computational biology, including its use for solving the Traveling Salesman Problem

Offering an invaluable source of insights for computer scientists, applied mathematicians, and statisticians, this illuminating volume will also prove useful for graduate courses on computational biology and bioinformatics.