Statistical Methods for Large-scale Multiple Testing Problems

Author: Yu Gao
Publisher:
Total Pages: 100
Release: 2019
Genre: Genetics
ISBN:

A large-scale multiple testing problem involves testing thousands or even millions of null hypotheses simultaneously, and such problems arise in many fields, for example genetics and astronomy. An error rate serves as a measure of the performance of a testing procedure. The family-wise error rate can accommodate any dependence between hypotheses, but it is often overly conservative and has limited detection power. The false discovery rate is more powerful, but it is not as widely used, owing to its independence requirement and other reasons. In this thesis, we develop statistical methods for large-scale multiple testing problems in pharmacovigilance and genetic studies, and adopt the false discovery rate to improve detection power by tackling mixed challenges.
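
To illustrate why family-wise error rate control can be conservative, here is a minimal sketch of the Bonferroni correction, one standard family-wise procedure that is valid under any dependence. It is not taken from the thesis; the function name `bonferroni_reject` and the default level are assumptions made for illustration.

```python
def bonferroni_reject(p_values, alpha=0.05):
    """Return one boolean per hypothesis: True where the null is rejected.

    Illustrative helper (hypothetical name). Each p-value is compared
    against alpha / m, where m is the number of hypotheses tested.
    """
    m = len(p_values)
    if m == 0:
        return []
    threshold = alpha / m  # shrinks as the number of tests grows
    return [p <= threshold for p in p_values]


if __name__ == "__main__":
    # With 3 tests at alpha = 0.05, only p-values at or below 0.05 / 3 are rejected.
    print(bonferroni_reject([0.001, 0.02, 0.04]))
```

Because the per-test threshold is alpha divided by the number of tests, it becomes very small when millions of hypotheses are tested, which is the loss of detection power the abstract refers to.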

Large-scale Multiple Hypothesis Testing with Complex Data Structure

Author: Xiaoyu Dai
Publisher:
Total Pages: 104
Release: 2018
Genre: Electronic dissertations
ISBN:

In the last decade, motivated by a variety of applications in medicine, bioinformatics, genomics, brain imaging, etc., a growing amount of statistical research has been devoted to large-scale multiple testing, where thousands or even greater numbers of tests are conducted simultaneously. However, due to the complexity of real data sets, the assumptions of many existing multiple testing procedures, e.g. that tests are independent and that p-values have continuous null distributions, may not hold. This limits their performance, leading to problems such as low detection power and an inflated false discovery rate (FDR). In this dissertation, we study how to better approach multiple testing problems under complex data structures. In Chapter 2, we study multiple testing with discrete test statistics. In Chapter 3, we study discrete multiple testing with prior ordering information incorporated. In Chapter 4, we study multiple testing under a complex dependency structure. We propose novel procedures for each scenario, based on the marginal critical functions (MCFs) of randomized tests, the conditional random field (CRF), or the deep neural network (DNN). The theoretical properties of our procedures are carefully studied, and their performance is evaluated through various simulations and real applications involving the analysis of genetic data from next-generation sequencing (NGS) experiments.

A Multiple-Testing Approach to the Multivariate Behrens-Fisher Problem

Author: Tejas Desai
Publisher: Springer Science & Business Media
Total Pages: 60
Release: 2013-02-26
Genre: Mathematics
ISBN: 1461464439

In statistics, the Behrens–Fisher problem is the problem of interval estimation and hypothesis testing concerning the difference between the means of two normally distributed populations when the variances of the two populations are not assumed to be equal, based on two independent samples. In his 1935 paper, Fisher outlined an approach to the Behrens–Fisher problem. Since high-speed computers were not available in Fisher’s time, this approach was not implementable and was soon forgotten. Now that high-speed computers are available, this approach can easily be implemented using just a desktop or a laptop computer. Furthermore, Fisher’s approach was proposed for univariate samples, but it can also be generalized to the multivariate case. In this monograph, we present the solution to the aforementioned multivariate generalization of the Behrens–Fisher problem. We start out by presenting a test of multivariate normality, proceed to test(s) of equality of covariance matrices, and end with our solution to the multivariate Behrens–Fisher problem. All methods proposed in this monograph cover both the randomly-incomplete-data case and the complete-data case. Moreover, all methods considered in this monograph are tested using both simulations and examples.

Handbook of Multiple Comparisons

Author: Xinping Cui
Publisher: CRC Press
Total Pages: 418
Release: 2021-11-18
Genre: Mathematics
ISBN: 0429633882

Written by experts who include originators of some key ideas, chapters in the Handbook of Multiple Comparisons cover multiple comparison problems big and small, with guidance toward error rate control and insights on how principles developed earlier can be applied to current and emerging problems. Some highlights of the coverage are as follows. Error rate control is useful for controlling the incorrect decision rate. Chapter 1 introduces Tukey's original multiple comparison error rates and points to how they have been applied and adapted to modern multiple comparison problems, as discussed in the later chapters. Principles endure. While the closed testing principle is more familiar, Chapter 4 shows that the partitioning principle can derive confidence sets for multiple tests, which may become important as the profession goes beyond making decisions based on p-values. Multiple comparisons of treatment efficacy often involve multiple doses and endpoints. Chapter 12 on multiple endpoints explains how different choices of endpoint types lead to different multiplicity adjustment strategies, while Chapter 11 on the MCP-Mod approach is particularly useful for dose-finding. To assess efficacy in clinical trials with multiple doses and multiple endpoints, the reader can see the traditional approach in Chapter 2, the graphical approach in Chapter 5, and the multivariate approach in Chapter 3. Personalized/precision medicine based on targeted therapies, already a reality, naturally leads to analysis of efficacy in subgroups. Chapter 13 draws attention to subtle logical issues in inferences on subgroups and their mixtures, with a principled solution that resolves these issues. This chapter has implications for meeting the ICH E9(R1) Estimands requirement.
Besides the multiple testing methodology itself, the handbook also covers related topics such as the statistical task of model selection in Chapter 7 and the estimation of the proportion of true null hypotheses (in other words, the signal prevalence) in Chapter 8. It also contains decision-theoretic considerations regarding the admissibility of multiple tests in Chapter 6. The issue of selective inference is addressed in Chapter 9. Comparison of responses can involve millions of voxels in medical imaging or SNPs in genome-wide association studies (GWAS). Chapter 14 and Chapter 15 provide state-of-the-art methods for large-scale simultaneous inference in these settings.

Resampling-Based Multiple Testing

Author: Peter H. Westfall
Publisher: John Wiley & Sons
Total Pages: 382
Release: 1993-01-12
Genre: Mathematics
ISBN: 9780471557616

Combines recent developments in resampling technology (including the bootstrap) with new methods for multiple testing that are easy to use, convenient to report and widely applicable. Software from SAS Institute is available to execute many of the methods and programming is straightforward for other applications. Explains how to summarize results using adjusted p-values which do not necessitate cumbersome table look-ups. Demonstrates how to incorporate logical constraints among hypotheses, further improving power.

Multiple Testing and False Discovery Rate Control

Author: Shiyun Chen
Publisher:
Total Pages: 142
Release: 2019
Genre:
ISBN:

Multiple testing, a situation where multiple hypothesis tests are performed simultaneously, is a core research topic in statistics that arises in almost every scientific field. When more hypotheses are tested, more errors are bound to occur. Controlling the false discovery rate (FDR) [BH95], which is the expected proportion of falsely rejected null hypotheses among all rejections, is an important challenge for making meaningful inferences. Throughout the dissertation, we analyze the asymptotic performance of several FDR-controlling procedures under different multiple testing settings. In Chapter 1, we study the famous Benjamini-Hochberg (BH) method [BH95], which often serves as a benchmark among FDR-controlling procedures, and show that it is asymptotically optimal in a stylized setting. We then prove that a distribution-free FDR control method of Barber and Candès [FBC15], which only requires the (unknown) null distribution to be symmetric, can achieve the same asymptotic performance as the BH method and is thus also optimal. Chapter 2 proposes an interval-type procedure which identifies the longest interval with the estimated FDR under a given level and rejects the corresponding hypotheses with P-values lying inside the interval. Unlike the threshold approaches, this procedure scans over all intervals, with the left endpoint not necessarily being zero. We show that this scan procedure provides strong control of the asymptotic false discovery rate. In addition, we investigate its asymptotic false non-discovery rate (FNR), deriving conditions under which it outperforms the BH procedure. In Chapter 3, we consider an online multiple testing problem where the hypotheses arrive sequentially in a stream, and investigate two procedures proposed by Javanmard and Montanari [JM15] which control FDR in an online manner. We quantify their asymptotic performance in the same location models as in Chapter 1 and compare their power with the (static) BH method.
In Chapter 4, we propose a new class of powerful online testing procedures which incorporates the available contextual information, and prove that any rule in this class controls the online FDR under some standard assumptions. We also derive a practical algorithm that can make more empirical discoveries in an online fashion, compared to the state-of-the-art procedures.
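
For readers unfamiliar with the Benjamini-Hochberg (BH) method that this abstract uses as its benchmark, the following is a minimal sketch of the standard step-up procedure. It is not code from the dissertation; the function name `benjamini_hochberg` and the default level are assumptions made for illustration.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return one boolean per hypothesis: True where BH rejects the null.

    Illustrative helper (hypothetical name). Sort the m p-values, find the
    largest rank k with p_(k) <= k * alpha / m, and reject the k smallest.
    """
    m = len(p_values)
    # Indices of the hypotheses, ordered from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])

    k = 0  # largest rank whose p-value falls under the BH line
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * alpha / m:
            k = rank

    rejected = [False] * m
    for idx in order[:k]:
        rejected[idx] = True
    return rejected


if __name__ == "__main__":
    # Input need not be sorted; results are reported in the input order.
    print(benjamini_hochberg([0.04, 0.01, 0.03, 0.02]))
    print(benjamini_hochberg([0.9, 0.001, 0.5]))
```

Note the step-up behaviour: a p-value that misses its own threshold is still rejected if a larger p-value further down the sorted list meets its threshold, which is why the loop keeps the largest qualifying rank rather than stopping at the first failure.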

Model-free Methods for Multiple Testing and Predictive Inference

Author: Zhimei Ren
Publisher:
Total Pages:
Release: 2021
Genre:
ISBN:

Recent advances in technology have allowed us to collect, store and process an enormous amount of data, bringing unprecedented challenges to interpretable data analysis: first, the structure of data is often complicated, while model assumptions are hard to justify in practice; second, the algorithms used to analyze the data can be extremely complex (think of convolutional neural nets), making it difficult to develop validity guarantees for the results. Indeed, researchers have noticed that many classical statistical methods fail when applied to modern types of problems, so a new set of tools is needed to conduct statistical data analysis in the modern era. This dissertation contributes to the toolbox of statistical data analysis in the modern world by presenting several model-free methods for multiple testing and predictive inference. The methods proposed in this dissertation, building upon knockoffs and conformal inference, bypass the modelling of the data structure and the analysis of complex algorithms, and work as wrappers of other (potentially black-box) existing algorithms. Despite the flexibility of these methods, they are guaranteed to achieve statistical validity under a minimal set of assumptions. The validity and efficacy of these methods are evaluated in extensive numerical experiments. Applying these methods to real genetic and clinical data has led to new scientific insights.

Permutation Tests

Author: Phillip I. Good
Publisher: Springer Science & Business Media
Total Pages: 296
Release: 2000
Genre: Mathematics
ISBN:

This book provides a step-by-step manual on the application of permutation tests in biology, business, medicine, science, and engineering. The first edition of this book is well known for its intuitive and informal style, and the inclusion of numerous real-world problems. This new edition has more than 100 additional pages, and includes streamlined statistics for the k-sample comparison and analysis of variance plus expanded sections on computational techniques, multiple comparisons, multiple regression, comparing variances, and testing interactions in balanced designs.
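
As a rough idea of what a permutation test looks like in practice, here is a minimal sketch of an exact two-sample test for a difference in means. It is not drawn from the book; the function name `permutation_test_mean_diff` and the choice of test statistic are assumptions made for illustration, and full enumeration is only practical for small samples.

```python
from itertools import combinations


def permutation_test_mean_diff(x, y):
    """Exact two-sided permutation p-value for a difference in means.

    Illustrative helper (hypothetical name). Every way of relabelling the
    pooled observations into groups of the original sizes is enumerated,
    and the p-value is the share of relabellings whose absolute mean
    difference is at least as large as the observed one.
    """
    if not x or not y:
        raise ValueError("both samples must be non-empty")

    pooled = list(x) + list(y)
    n_x, n_y = len(x), len(y)
    total = sum(pooled)

    def abs_mean_diff(sum_x):
        # The second group's sum follows from the pooled total.
        return abs(sum_x / n_x - (total - sum_x) / n_y)

    observed = abs_mean_diff(sum(x))
    extreme = 0
    n_splits = 0
    for idx in combinations(range(len(pooled)), n_x):
        n_splits += 1
        stat = abs_mean_diff(sum(pooled[i] for i in idx))
        # Small tolerance guards against floating-point ties.
        if stat >= observed - 1e-12:
            extreme += 1
    return extreme / n_splits


if __name__ == "__main__":
    print(permutation_test_mean_diff([1, 2, 3], [4, 5, 6]))
```

For larger samples the number of relabellings grows too quickly to enumerate, and a random subset of relabellings is typically drawn instead.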