Large-Scale Global and Simultaneous Inference

Author: Tony Cai
Publisher:
Total Pages: 0
Release: 2017
Genre:
ISBN:

Due to rapid technological advances, researchers are now able to collect and analyze ever larger data sets. Statistical inference for big data often requires solving thousands or even millions of parallel inference problems simultaneously. This poses significant challenges and calls for new principles, theories, and methodologies. This review provides a selective survey of some recently developed methods and results for large-scale statistical inference, including detection, estimation, and multiple testing. We begin with the global testing problem, where the goal is to detect the existence of sparse signals in a data set, and then move to the problem of estimating the proportion of nonnull effects. Finally, we focus on multiple testing with false discovery rate (FDR) control. The FDR provides a powerful and practical approach to large-scale multiple testing and has been successfully used in a wide range of applications. We discuss several effective data-driven procedures and also present efficient strategies to handle various grouping, hierarchical, and dependency structures in the data.
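As an illustration of FDR control, the following is a minimal sketch of the classical Benjamini-Hochberg step-up procedure. The abstract does not say which procedures the review covers in detail, so this choice, the function name `benjamini_hochberg`, and the target level `q` are assumptions made here for illustration only.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Benjamini-Hochberg step-up procedure (illustrative sketch).

    Returns a list of booleans, True where the corresponding
    hypothesis is rejected at nominal FDR level q.
    """
    m = len(p_values)
    # Indices of the p-values, sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])

    # Find the largest rank k such that p_(k) <= k * q / m.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * q / m:
            k_max = rank

    # Reject every hypothesis whose p-value ranks at or below k_max.
    rejected = [False] * m
    for idx in order[:k_max]:
        rejected[idx] = True
    return rejected
```

Because the procedure is step-up, a hypothesis can be rejected even if its own p-value misses its threshold, as long as a larger p-value meets its threshold. For the p-values 0.01, 0.04, 0.03, 0.045 with q = 0.05, the largest one satisfies 0.045 <= 0.05, so all four hypotheses are rejected.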

Global Testing and Large-Scale Multiple Testing for High-Dimensional Covariance Structures

Author: Tony Cai
Publisher:
Total Pages:
Release: 2017
Genre:
ISBN:

Driven by a wide range of contemporary applications, statistical inference for covariance structures has been an active area of current research in high-dimensional statistics. This review provides a selective survey of some recent developments in hypothesis testing for high-dimensional covariance structures, including global testing for the overall pattern of the covariance structures and simultaneous testing of a large collection of hypotheses on the local covariance structures with false discovery proportion and false discovery rate control. Both one-sample and two-sample settings are considered. The specific testing problems discussed include global testing for the covariance, correlation, and precision matrices, and multiple testing for the correlations, Gaussian graphical models, and differential networks.

Large-Scale Inference

Author: Bradley Efron
Publisher: Cambridge University Press
Total Pages:
Release: 2012-11-29
Genre: Mathematics
ISBN: 1139492136

We live in a new age for statistical inference, where modern scientific technology such as microarrays and fMRI machines routinely produce thousands and sometimes millions of parallel data sets, each with its own estimation or testing problem. Doing thousands of problems at once is more than repeated application of classical methods. Taking an empirical Bayes approach, Bradley Efron, inventor of the bootstrap, shows how information accrues across problems in a way that combines Bayesian and frequentist ideas. Estimation, testing and prediction blend in this framework, producing opportunities for new methodologies of increased power. New difficulties also arise, easily leading to flawed inferences. This book takes a careful look at both the promise and pitfalls of large-scale statistical inference, with particular attention to false discovery rates, the most successful of the new statistical techniques. Emphasis is on the inferential ideas underlying technical developments, illustrated using a large number of real examples.

Topics in Large-scale Statistical Inference

Author: Jeffrey Regier
Publisher:
Total Pages: 133
Release: 2016
Genre:
ISBN:

Statistical inference may be large-scale in terms of the size of the dataset, the dimension of the data, or the amount of data needed for provably accurate inference. This dissertation presents three applications of large-scale statistical inference. Part I considers finding and characterizing stars and galaxies in images from telescopes. Part II considers determining who wrote what in a large collection of articles, where authors often do not have unique names. Part III considers approximating a high-dimensional function based on a small number of observations, a common problem when interpreting computer experiments.

Simultaneous Inference for High Dimensional and Correlated Data

Author: Afroza Polin
Publisher:
Total Pages: 100
Release: 2019
Genre: Correlation (Statistics)
ISBN:

In high-dimensional data, the number of covariates is larger than the sample size, which makes estimation challenging. We consider high-dimensional longitudinal data in which, at each time point, the number of covariates is much larger than the number of subjects. We consider two settings. In the first, the samples at different time points are generated from different populations. In the second, the samples at different time points are generated from a multivariate distribution. In both cases, the number of covariates is much larger than the sample size, and standard least squares methods are not applicable.

In a longitudinal study, our main focus is on the changes in the mean responses over time and how these changes are related to the explanatory variables. We are therefore interested in testing the effects of the covariates across the time points simultaneously. In the first scenario, we use the lasso at each time point to regress the response on the explanatory variables. Along with estimating the regression coefficients, the lasso also performs dimension reduction. We use the de-biased lasso for inference. To adjust for multiplicity in simultaneous testing, we apply the Bonferroni, Holm, Hochberg, and coherent stepwise procedures. In the second scenario, the samples at different time points are generated from a multivariate distribution whose dimension equals the number of time points. We again use the lasso and the de-biased lasso for inference, and we adjust for multiplicity with the Bonferroni, Holm, Hochberg, and stepwise procedures.

We provide theoretical details showing that the Bonferroni, Holm step-down, and coherent stepwise procedures control the family-wise error rate in the strong sense for de-biased lasso estimators, whereas Hochberg's procedure provides strong control of the family-wise error rate only for independent or positively correlated test statistics.
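The three classical multiplicity adjustments named above can be sketched as follows. This is a minimal illustration that operates on a plain list of p-values. It is not the dissertation's implementation, it omits the coherent stepwise procedure, and the function names and the level `alpha` are assumptions introduced here.

```python
def bonferroni(p_values, alpha=0.05):
    """Single-step: reject H_i when p_i <= alpha / m."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


def holm(p_values, alpha=0.05):
    """Step-down: test the smallest p-value first, stop at the first failure."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    rejected = [False] * m
    for step, idx in enumerate(order):  # step = 0, 1, ..., m - 1
        if p_values[idx] <= alpha / (m - step):
            rejected[idx] = True
        else:
            break  # all larger p-values are retained
    return rejected


def hochberg(p_values, alpha=0.05):
    """Step-up: find the largest k with p_(k) <= alpha / (m - k + 1)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= alpha / (m - rank + 1):
            k_max = rank
    rejected = [False] * m
    for idx in order[:k_max]:
        rejected[idx] = True
    return rejected
```

Holm and Hochberg use the same thresholds but scan in opposite directions, which is why Hochberg rejects at least as much and needs the stronger dependence assumption. For the p-values 0.01, 0.02, 0.03, 0.04 at alpha = 0.05, Bonferroni and Holm each reject only the first hypothesis, while Hochberg rejects all four because the largest p-value satisfies 0.04 <= 0.05.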