Simultaneous Inference for High Dimensional and Correlated Data

Author: Afroza Polin
Publisher:
Total Pages: 100
Release: 2019
Genre: Correlation (Statistics)
ISBN:

In high-dimensional data, the number of covariates is larger than the sample size, which makes estimation challenging. We consider high-dimensional longitudinal data in which, at each time point, the number of covariates is much larger than the number of subjects. We consider two settings. In the first, the samples at different time points are generated from different populations. In the second, the samples at different time points are generated from a multivariate distribution. In both cases, the number of covariates is much larger than the sample size, so standard least squares methods are not applicable. In a longitudinal study, the main focus is on the changes in the mean response over time and on how these changes relate to the explanatory variables. We are therefore interested in testing the effects of the covariates across the time points simultaneously. In the first setting, we use the lasso at each time point to regress the response on the explanatory variables. Along with estimating the regression coefficients, the lasso also performs dimension reduction. We use the de-biased lasso for inference. To adjust for multiplicity in simultaneous testing, we apply the Bonferroni, Holm, Hochberg and coherent stepwise procedures. In the second setting, the samples at different time points are generated from a multivariate distribution whose dimension equals the number of time points. We again use the lasso and the de-biased lasso for inference, and adjust for multiplicity with the Bonferroni, Holm, Hochberg and stepwise procedures. We provide theoretical details showing that the Bonferroni, Holm step-down and coherent stepwise procedures control the family-wise error rate in the strong sense for de-biased lasso estimators, whereas Hochberg's procedure provides strong control of the family-wise error rate only for independent or positively correlated test statistics.
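
As a rough illustration of the multiplicity adjustments named in this abstract, the sketch below computes Bonferroni, Holm and Hochberg adjusted p-values in plain Python. It is not taken from the thesis: the function names and the example p-values are hypothetical, and it does not cover the lasso fitting, the de-biasing step or the coherent stepwise procedure.

```python
# Minimal sketch of three classical adjustments (illustrative only;
# function names and inputs are assumptions, not the author's code).

def bonferroni(pvals):
    """Single-step adjustment: multiply every p-value by the number of tests."""
    m = len(pvals)
    return [min(1.0, m * p) for p in pvals]

def holm(pvals):
    """Step-down adjustment: smallest p-value first, running maximum."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # The (rank+1)-th smallest p-value is multiplied by m - rank.
        running_max = max(running_max, (m - rank) * pvals[i])
        adjusted[i] = min(1.0, running_max)
    return adjusted

def hochberg(pvals):
    """Step-up adjustment: largest p-value first, running minimum."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for rank, i in enumerate(order):
        # The largest p-value is multiplied by 1, the next by 2, and so on.
        running_min = min(running_min, (rank + 1) * pvals[i])
        adjusted[i] = running_min
    return adjusted

# Hypothetical p-values for four covariate effects.
pvals = [0.01, 0.04, 0.03, 0.005]
print(bonferroni(pvals))
print(holm(pvals))
print(hochberg(pvals))
```

A hypothesis is rejected at level alpha when its adjusted p-value is at most alpha. For any input, the Hochberg adjusted values are no larger than the Holm values, which in turn are no larger than the Bonferroni values, which is why Hochberg's procedure is the most powerful of the three but needs the additional dependence condition stated above.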

Valid Simultaneous Inference in High-dimensional Settings (with the HDM Package for R)

Author: Philipp Bach
Publisher:
Total Pages:
Release: 2019
Genre:
ISBN:

Due to the increasing availability of high-dimensional empirical applications in many research disciplines, valid simultaneous inference is becoming more and more important. For instance, high-dimensional settings might arise in economic studies due to very rich data sets with many potential covariates or in the analysis of treatment heterogeneities. The evaluation of potentially more complicated (non-linear) functional forms of the regression relationship also leads to many potential variables for which simultaneous inferential statements might be of interest. Here we provide a review of classical and modern methods for simultaneous inference in (high-dimensional) settings and illustrate their use in a case study using the R package hdm. The R package hdm implements valid, powerful and efficient joint hypothesis tests for a potentially large number of coefficients, as well as the construction of simultaneous confidence intervals, and therefore provides useful methods for performing valid post-selection inference based on the LASSO.

Simultaneous Statistical Inference

Author: Thorsten Dickhaus
Publisher: Springer Science & Business Media
Total Pages: 182
Release: 2014-01-23
Genre: Science
ISBN: 3642451829

This monograph will provide an in-depth mathematical treatment of modern multiple test procedures controlling the false discovery rate (FDR) and related error measures, particularly addressing applications to fields such as genetics, proteomics, neuroscience and general biology. The book will also include a detailed description of how to implement these methods in practice. Moreover, new developments focusing on non-standard assumptions are also included, especially multiple tests for discrete data. The book primarily addresses researchers and practitioners but will also be beneficial for graduate students.

Partially Linear Models

Author: Wolfgang Härdle
Publisher: Springer Science & Business Media
Total Pages: 210
Release: 2012-12-06
Genre: Mathematics
ISBN: 3642577008

In the last ten years, there has been increasing interest and activity in the general area of partially linear regression smoothing in statistics. Many methods and techniques have been proposed and studied. This monograph aims to give an up-to-date presentation of the state of the art of partially linear regression techniques. The emphasis is on methodologies rather than on theory, with a particular focus on applications of partially linear regression techniques to various statistical problems. These problems include least squares regression, asymptotically efficient estimation, bootstrap resampling, censored data analysis, linear measurement error models, nonlinear measurement models, and nonlinear and nonparametric time series models.

Simultaneous Inference on Sample Covariances

Author: Han Xiao
Publisher:
Total Pages: 125
Release: 2011
Genre:
ISBN: 9781124869605

This thesis considers the maximum deviations of the sample covariances in the contexts of high dimensional data analysis and time series analysis.

Resampling-Based Multiple Testing

Author: Peter H. Westfall
Publisher: John Wiley & Sons
Total Pages: 382
Release: 1993-01-12
Genre: Mathematics
ISBN: 9780471557616

Combines recent developments in resampling technology (including the bootstrap) with new methods for multiple testing that are easy to use, convenient to report and widely applicable. Software from SAS Institute is available to execute many of the methods and programming is straightforward for other applications. Explains how to summarize results using adjusted p-values which do not necessitate cumbersome table look-ups. Demonstrates how to incorporate logical constraints among hypotheses, further improving power.

High-Dimensional Probability

Author: Roman Vershynin
Publisher: Cambridge University Press
Total Pages: 299
Release: 2018-09-27
Genre: Business & Economics
ISBN: 1108415199

An integrated package of powerful probabilistic tools and key applications in modern mathematical data science.

Epigenetic Biomarker and Personalized Precision Medicine

Author: Jiucun Wang
Publisher: Frontiers Media SA
Total Pages: 485
Release: 2020-12-21
Genre: Science
ISBN: 2889661849

This eBook is a collection of articles from a Frontiers Research Topic. Frontiers Research Topics are very popular trademarks of the Frontiers Journals Series: they are collections of at least ten articles, all centered on a particular subject. With their unique mix of varied contributions from Original Research to Review Articles, Frontiers Research Topics unify the most influential researchers, the latest key findings and historical advances in a hot research area! Find out more on how to host your own Frontiers Research Topic or contribute to one as an author by contacting the Frontiers Editorial Office: frontiersin.org/about/contact.

Handbook of Big Data Analytics

Author: Wolfgang Karl Härdle
Publisher: Springer
Total Pages: 532
Release: 2018-07-20
Genre: Computers
ISBN: 3319182846

Addressing a broad range of big data analytics in cross-disciplinary applications, this essential handbook focuses on the statistical prospects offered by recent developments in this field. To do so, it covers statistical methods for high-dimensional problems, algorithmic designs, computation tools, analysis flows and the software-hardware co-designs that are needed to support insightful discoveries from big data. The book is primarily intended for statisticians, computer experts, engineers and application developers interested in using big data analytics with statistics. Readers should have a solid background in statistics and computer science.

Large-Scale Global and Simultaneous Inference

Author: Tony Cai
Publisher:
Total Pages: 0
Release: 2017
Genre:
ISBN:

Due to rapid technological advances, researchers are now able to collect and analyze ever larger data sets. Statistical inference for big data often requires solving thousands or even millions of parallel inference problems simultaneously. This poses significant challenges and calls for new principles, theories, and methodologies. This review provides a selective survey of some recently developed methods and results for large-scale statistical inference, including detection, estimation, and multiple testing. We begin with the global testing problem, where the goal is to detect the existence of sparse signals in a data set, and then move to the problem of estimating the proportion of nonnull effects. Finally, we focus on multiple testing with false discovery rate (FDR) control. The FDR provides a powerful and practical approach to large-scale multiple testing and has been successfully used in a wide range of applications. We discuss several effective data-driven procedures and also present efficient strategies to handle various grouping, hierarchical, and dependency structures in the data.
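
To make the idea of FDR control concrete, here is a minimal sketch of the classical Benjamini-Hochberg step-up procedure in plain Python. It is an illustration only, not one of the data-driven procedures surveyed in the review; the function name, the level `alpha` and the example p-values are assumptions chosen for the example.

```python
# Minimal sketch of the Benjamini-Hochberg step-up procedure
# (illustrative; names and inputs are hypothetical).

def benjamini_hochberg(pvals, alpha=0.05):
    """Return a list of booleans: True where the hypothesis is rejected."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    # Find the largest rank k such that the k-th smallest p-value
    # is at most (k / m) * alpha.
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= rank / m * alpha:
            k_max = rank
    # Reject the k_max hypotheses with the smallest p-values.
    rejected = [False] * m
    for i in order[:k_max]:
        rejected[i] = True
    return rejected

# Hypothetical p-values from five parallel tests.
pvals = [0.035, 0.001, 0.5, 0.025, 0.028]
print(benjamini_hochberg(pvals, alpha=0.05))
```

In this example the p-value 0.025 exceeds its own threshold (2/5 × 0.05 = 0.02) but is still rejected, because a larger p-value further up the ordering meets its threshold. That is the step-up behaviour that distinguishes the procedure from a simple per-test cutoff.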