Topics in Phylogenetic Species Tree Inference Under the Coalescent Model

Author: Yuan Tian
Publisher:
Total Pages: 160
Release: 2016
Genre:
ISBN:

Phylogenetic tree inference is a fundamental tool for estimating the ancestor-descendant relationships among species. Currently, it is of great interest to explore the evolutionary relationships for a set of species for which DNA data have been collected, and thus accurate and efficient methods are required to estimate phylogenetic trees. However, because evolutionary relationships can be analyzed at two distinct levels (gene trees and species trees), and gene trees and species trees need not agree with one another, phylogenetic inference has become increasingly complicated. Incomplete lineage sorting (ILS) is considered to be one of the major factors that cause disagreement between species trees and gene trees. The coalescent process is a widely accepted model for ILS, and numerous genealogy-based phylogenetic inference methods have been established based on the coalescent model. In this thesis, coalescent-based methods for phylogenetic tree inference are studied. In Chapter 2, the expected amount of incongruence between gene trees under the same species tree is considered. More specifically, the extent of gene tree incongruence arising from incomplete lineage sorting, as modeled by the coalescent process, is computed. The results in Chapter 2 highlight the fact that substantial discordance among gene trees may occur, even when the number of species is very small. In Chapter 3, a coalescent model for three species that allows gene flow between both pairs of sister populations is proposed, and the resulting gene tree history distribution is derived. The results suggest conditions under which the species tree and associated parameters, such as the ancestral effective population sizes and the rates of gene flow, are not identifiable from the gene tree topology distribution when gene flow is present, but indicate that the coalescent history distribution may identify the species tree and associated parameters.
In Chapter 4, a rooting method based on the site pattern probabilities under the coalescent model is developed. The proposed technique provides a method to root every four-taxon species tree within a larger species tree of more than four taxa. The inferred roots for the four-taxon subtrees are then used together to estimate the root for the larger species tree. This rooting method is computationally feasible and is the first method proposed to root a species tree that explicitly incorporates the coalescent process.
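
To give a sense of how much gene tree discordance incomplete lineage sorting alone can produce, the following is a minimal sketch of the standard three-species coalescent result, not the thesis's own computation. It assumes a rooted species tree ((A,B),C), one sampled lineage per species, no gene flow, and an internal branch of length t in coalescent units; the function names are hypothetical.

```python
import math


def gene_tree_probabilities(t):
    """Gene tree topology probabilities for species tree ((A,B),C).

    Assumes one lineage per species, no gene flow, and an internal
    branch of length t measured in coalescent units.
    """
    if t < 0:
        raise ValueError("branch length must be non-negative")
    # With probability exp(-t) the A and B lineages fail to coalesce on the
    # internal branch; the three lineages then coalesce in a random order
    # in the ancestral population, so each topology gets 1/3 of that mass.
    p_each_discordant = math.exp(-t) / 3.0
    return {
        "((A,B),C)": 1.0 - 2.0 * p_each_discordant,
        "((A,C),B)": p_each_discordant,
        "((B,C),A)": p_each_discordant,
    }


def prob_two_loci_disagree(t):
    """Probability that two independent loci have different topologies."""
    probs = gene_tree_probabilities(t).values()
    return 1.0 - sum(p * p for p in probs)


if __name__ == "__main__":
    for t in (0.1, 0.5, 1.0, 2.0):
        match = gene_tree_probabilities(t)["((A,B),C)"]
        print(f"t={t}: P(match species tree)={match:.3f}, "
              f"P(two loci disagree)={prob_two_loci_disagree(t):.3f}")
```

Under these assumptions, a short internal branch leaves the concordant topology only slightly more likely than either discordant one, which illustrates how discordance can be substantial even with three species.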

Molecular Evolution

Author: Ziheng Yang
Publisher: Oxford University Press
Total Pages: 509
Release: 2014
Genre: Science
ISBN: 0199602603

Studies of evolution at the molecular level have experienced phenomenal growth in the last few decades, due to rapid accumulation of genetic sequence data, improved computer hardware and software, and the development of sophisticated analytical methods. The flood of genomic data has generated an acute need for powerful statistical methods and efficient computational algorithms to enable their effective analysis and interpretation. Molecular Evolution: A Statistical Approach presents and explains modern statistical methods and computational algorithms for the comparative analysis of genetic sequence data in the fields of molecular evolution, molecular phylogenetics, statistical phylogeography, and comparative genomics. Written by an expert in the field, the book emphasizes conceptual understanding rather than mathematical proofs. The text is enlivened with numerous examples of real data analysis and numerical calculations to illustrate the theory, in addition to the working problems at the end of each chapter. The coverage of maximum likelihood and Bayesian methods is, in particular, up-to-date, comprehensive, and authoritative. This advanced textbook is aimed at graduate-level students and professional researchers (both empiricists and theoreticians) in the fields of bioinformatics and computational biology, statistical genomics, evolutionary biology, molecular systematics, and population genetics. It will also be of relevance and use to a wider audience of applied statisticians, mathematicians, and computer scientists working in computational biology.

Species Tree Inference

Author: Laura Kubatko
Publisher: Princeton University Press
Total Pages: 352
Release: 2023-03-14
Genre: Science
ISBN: 0691207607

"Inferring evolutionary relationships among a collection of organisms -- that is, their relationship to each other on the tree of life -- remains a central focus of much of evolutionary biology as these relationships provide the background for key hypotheses. For example, support for different hypotheses about early animal evolution are contingent upon the phylogenetic relationships among the earliest animal lineages. Within the last 20 years, the field of phylogenetics has grown rapidly, both in the quantity of data available for inference and in the number of methods available for phylogenetic estimation. The authors' first book, "Estimating Species Trees: Practical and Theoretical Aspects", published in 2010, gave an overview of the state of phylogenetic practice for analyzing data at the time, but much has changed since then. The goal of this book is to serve as an updated reference on current methods within the field. The book is organized in three sections, the first of which provides an overview of the analytical and methodological developments of species tree inference. Section two focuses on empirical inference. Section three explores various applications of species trees in evolutionary biology. The combination of theoretical and empirical approaches is meant to provide readers with a level of knowledge of both the advances and limitations of species-tree inference that can help researchers in applying the methods, while also inspiring future advances among those researchers with an interest in methodological development"--

Estimating Species Trees

Author: L. Lacey Knowles
Publisher: John Wiley & Sons
Total Pages: 332
Release: 2011-09-20
Genre: Science
ISBN: 1118211405

Recent computational and modeling advances have produced methods for estimating species trees directly, avoiding the problems and limitations of the traditional phylogenetic paradigm where an estimated gene tree is equated with the history of species divergence. The overarching goal of the volume is to increase the visibility and use of these new methods by the entire phylogenetic community by specifically addressing several challenges: (i) firm understanding of the theoretical underpinnings of the methodology, (ii) empirical examples demonstrating the utility of the methodology as well as its limitations, and (iii) attention to technical aspects involved in the actual software implementation of the methodology. As such, this volume will not only be poised to become the quintessential guide to training the next generation of researchers, but it will also be instrumental in ushering in a new phylogenetic paradigm for the 21st century.

Inferring Phylogenies

Author: Joseph Felsenstein
Publisher: Sinauer Associates Incorporated
Total Pages: 664
Release: 2004-01
Genre: Science
ISBN: 9780878931774

Phylogenies, or evolutionary trees, are the basic structures necessary to think about and analyze differences between species. Statistical, computational, and algorithmic work in this field has been ongoing for four decades now, and there have been great advances in understanding. Yet no book has summarized this work. Inferring Phylogenies does just that in a single, compact volume. Phylogenies are inferred with various kinds of data. This book concentrates on some of the central ones: discretely coded characters, molecular sequences, gene frequencies, and quantitative traits. Also covered are restriction sites, RAPDs, and microsatellites.

The Phylogenetic Handbook

Author: Marco Salemi
Publisher: Cambridge University Press
Total Pages: 750
Release: 2009-03-26
Genre: Science
ISBN: 0521877105

A broad, hands-on guide with detailed explanations of current methodology, relevant exercises, and popular software tools.

Analysis of Phylogenetics and Evolution with R

Author: Emmanuel Paradis
Publisher: Springer Science & Business Media
Total Pages: 221
Release: 2006-11-25
Genre: Science
ISBN: 0387351000

This book integrates a wide variety of data analysis methods into a single and flexible interface: the R language. The book starts with a presentation of different R packages and gives a short introduction to R for phylogeneticists unfamiliar with this language. The basic phylogenetic topics are covered. The chapter on tree drawing uses R's powerful graphical environment. A section deals with the analysis of diversification with phylogenies, one of the author's favorite research topics. The last chapter is devoted to the development of phylogenetic methods with R and interfaces with other languages (C and C++). Some exercises conclude these chapters.

Gene Genealogies, Variation and Evolution: A primer in coalescent theory

Author: Jotun Hein
Publisher: Oxford University Press, USA
Total Pages: 298
Release: 2004-12-09
Genre: Population genetics
ISBN: 9780191546150

Authored by leading experts, this seminal text presents a straightforward and elementary account of coalescent theory, a central concept in the study of genetic sequence variation that probabilistically describes the genealogy relating the sampled sequences. Besides fulfilling the need for such a book, the authors describe the statistical and computational methods used in modelling and analyzing genetic sequence variation. Rich in examples and illustrations, the book covers basic concepts and the complications arising from geographical structure and recombination before considering aspects of statistical inference based on these models. It ends with chapters on Gene Mapping, which combines sequence variation data with phenotypic data (such as disease) to define areas of the genome where genes are responsible for the trait, and Human Evolution, a research area that is experiencing a renaissance due to the enormous amounts of data produced in molecular studies.
The book is highly suitable for a graduate course in statistics, population, molecular and medical genetics, bioscience and medicine, and for students studying the evolution of human population and disease. It will also be an invaluable reference for bioscientists and statisticians in the pharmaceutical industry and academia.
"An excellent and timely book that should appeal to a variety of people in genetics and applied mathematics." - Professor Montgomery Slatkin (Berkeley)
"The authors are outstanding experts in the field, and the book is topical and timely." - Professor David Balding (Imperial College)
"Hein, Schierup and Wiuf have written the first general book on the coalescent. It is an engaging combination of clear mathematical derivation and real data examples." - Professor Joe Felsenstein (University of Washington)

Theoretical Guarantees for Species Tree and Network Reconstruction from Internode Distance

Author: Yu Sun (Ph.D.)
Publisher:
Total Pages: 0
Release: 2023
Genre:
ISBN:

Phylogenetic inference is a fundamental aspect of evolutionary biology, with the recoverability of the species tree being crucial to the statistical consistency of algorithms that utilize certain statistical quantities. Additionally, it is important to analyze the theoretical sample complexity of statistically consistent algorithms. In this thesis, we summarize our results in three different directions regarding recoverability and sample complexity: 1. For a widely used species tree to gene tree model, can we extend its recoverability result to a more general phylogenetic structure? 2. For a statistically consistent species tree inference algorithm, can we find a good theoretical upper bound for sample complexity? 3. Can we use similar tools for providing recoverability results from a well-known model to other models? For the first question, we consider phylogenetic networks, which generalize phylogenetic trees to model non-vertical inheritance, by which a lineage inherits genetic material from multiple parents through hybridization or other lateral gene transfer. Under the network multispecies coalescent (NMSC) model, any topology can be observed in individual gene trees arising from a network. In Chapter 3, we provide a recoverability result for level-1 networks under NMSC through expected internode distances. This result can be considered as an extension of the recoverability of species trees under the multispecies coalescent (MSC) model. The global and local features of our identifiability result are consistent with other studies about the level-1 network with or without NMSC (i.e., [1] and NANUQ in [2]). For the second question, we make progress towards deriving sample complexity bounds for our new species tree reconstruction methods based on sample averages of internode distances. Our sample complexity depends polylogarithmically on N, the number of species in the tree.
Our results are similar to sample complexity results for other stochastically consistent species tree inference algorithms, i.e., GLASS (Mossel and Roch, [3]) and ASTRAL (Mirarab et al. [4] and Shekhar et al. [5]), although they do not quite match them. For the last question, we consider the gene duplication and loss (GDL) model and estimate the species trees from multilocus data sets. As a top-down model, GDL explains the phenomenon of gene tree incongruence by gene duplication and loss. In Chapter 5, we derive a theoretical result using expected internode distance to recover the unrooted species tree topology under the GDL model in the loss-less case. As far as we know, this is the first theoretical result about the recoverability of the species tree using distance-based methods under the GDL model.

Bayesian Phylogenetics

Author: Ming-Hui Chen
Publisher: CRC Press
Total Pages: 398
Release: 2014-05-27
Genre: Mathematics
ISBN: 1466500794

Offering a rich diversity of models, Bayesian phylogenetics allows evolutionary biologists, systematists, ecologists, and epidemiologists to obtain answers to very detailed phylogenetic questions. Suitable for graduate-level researchers in statistics and biology, Bayesian Phylogenetics: Methods, Algorithms, and Applications presents a snapshot of current trends in Bayesian phylogenetic research. Encouraging interdisciplinary research, this book introduces state-of-the-art phylogenetics to the Bayesian statistical community and, likewise, presents state-of-the-art Bayesian statistics to the phylogenetics community. The book emphasizes model selection, reflecting recent interest in accurately estimating marginal likelihoods. It also discusses new approaches to improve mixing in Bayesian phylogenetic analyses in which the tree topology varies. In addition, the book covers divergence time estimation, biologically realistic models, and the burgeoning interface between phylogenetics and population genetics.