Statistical Inference as Severe Testing
Author: Deborah G. Mayo
Publisher: Cambridge University Press
Total Pages: 503
Release: 2018-09-20
Genre: Mathematics
ISBN: 1108563309

Mounting failures of replication in the social and biological sciences give a new urgency to critically appraising proposed reforms. This book pulls back the cover on disagreements among experts charged with restoring integrity to science. It denies two pervasive views of the role of probability in inference: to assign degrees of belief, and to control error rates in the long run. If statistical consumers are unaware of the assumptions behind rival evidence reforms, they cannot scrutinize the consequences that affect them (in personalized medicine, psychology, etc.). The book sets sail with a simple tool: if little has been done to rule out flaws in inferring a claim, then it has not passed a severe test. Many methods advocated by data experts do not stand up to severe scrutiny and are in tension with successful strategies for blocking or accounting for cherry picking and selective reporting. Through a series of excursions and exhibits, the philosophy and history of inductive inference come alive. Philosophical tools are put to work to solve problems about science and pseudoscience, induction and falsification.
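
The selection effects the book warns about are easy to see in a small simulation. The sketch below is not from the book; the setup and numbers are illustrative. It shows how reporting only the best result from a search over twenty true null hypotheses inflates the error rate far beyond the nominal level, exactly the kind of inference that fails a severity requirement.

```python
# Illustrative simulation: all twenty null hypotheses are true, yet reporting
# only the smallest p-value makes "significant" findings routine.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_sims, n_hyps, n_obs, alpha = 2000, 20, 30, 0.05

hits = 0
for _ in range(n_sims):
    data = rng.normal(0.0, 1.0, size=(n_hyps, n_obs))     # pure noise
    pvals = stats.ttest_1samp(data, 0.0, axis=1).pvalue   # 20 t-tests
    hits += pvals.min() < alpha                           # cherry pick the best
print("nominal level:", alpha)
print("rate of 'significant' findings when cherry picking:", hits / n_sims)
# Roughly 1 - 0.95**20, i.e. about 0.64 rather than 0.05.
```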

Statistical Inference: Testing Of Hypotheses
Author: Srivastava & Srivastava
Publisher: PHI Learning Pvt. Ltd.
Total Pages: 414
Release: 2009-12
Genre: Reference
ISBN: 812033728X

This book emphasizes J. Neyman and Egon Pearson's mathematical foundations of hypothesis testing, one of the principal methodologies for reaching conclusions about population parameters. Following the approach of Wald and Ferguson, it presents Neyman-Pearson theory under the broader premises of decision theory, which simplifies and generalizes the results. To keep the mathematical development smooth, the book outlines the main results of Lebesgue theory in abstract spaces before developing the rigorous theory of most powerful (MP), uniformly most powerful (UMP), and UMP unbiased tests for different types of testing problems. Likelihood ratio tests, their large-sample properties, their application to a variety of testing situations, and the connection between confidence estimation and testing of hypotheses are discussed in separate chapters. The book also shows how the principles of sufficiency and invariance simplify testing problems and reduce the dimensionality of the class of tests, leading to the existence of an optimal test. It concludes with a rigorous treatment of non-parametric tests, including their optimality, asymptotic relative efficiency, consistency, and asymptotic null distributions.
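
As a concrete illustration of the Neyman-Pearson machinery the book develops, the sketch below (not from the book) works out the most powerful level-0.05 test of a simple null against a simple alternative for a normal mean with known variance, where the likelihood ratio is monotone in the sample mean.

```python
# Illustrative computation: MP test of H0: mu = 0 vs H1: mu = 1 for N(mu, 1)
# data.  By the Neyman-Pearson lemma the likelihood ratio test is most
# powerful; here the ratio is monotone in the sample mean, so we reject for
# large sqrt(n) * xbar.
import numpy as np
from scipy import stats

n, alpha = 25, 0.05
crit = stats.norm.ppf(1 - alpha)                      # critical value for sqrt(n)*xbar
power = 1 - stats.norm.cdf(crit - np.sqrt(n) * 1.0)   # exact power at mu = 1
print("reject H0 when sqrt(n)*xbar >", round(crit, 3))
print("power at mu = 1:", round(power, 4))

# Check by simulation under the alternative.
rng = np.random.default_rng(1)
xbar = rng.normal(1.0, 1.0, size=(100_000, n)).mean(axis=1)
print("simulated power:", (np.sqrt(n) * xbar > crit).mean())
```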

A Solution to the Ecological Inference Problem
Author: Gary King
Publisher: Princeton University Press
Total Pages: 366
Release: 2013-09-20
Genre: Political Science
ISBN: 1400849209

This book provides a solution to the ecological inference problem, which has plagued users of statistical methods for over seventy-five years: How can researchers reliably infer individual-level behavior from aggregate (ecological) data? In political science, this question arises when individual-level surveys are unavailable (for instance, local or comparative electoral politics), unreliable (racial politics), insufficient (political geography), or infeasible (political history). This ecological inference problem also confronts researchers in numerous areas of major significance in public policy, and in other academic disciplines ranging from epidemiology and marketing to sociology and quantitative history. Although many have attempted to make such cross-level inferences, scholars agree that all existing methods yield very inaccurate conclusions about the world. In this volume, Gary King lays out a unique and reliable solution to this venerable problem. King begins with a qualitative overview, readable even by those without a statistical background. He then unifies the apparently diverse findings in the methodological literature, so that only one aggregation problem remains to be solved. Finally, he presents his solution, as well as empirical evaluations that include over 16,000 comparisons of his estimates from real aggregate data to the known individual-level answer. The method works in practice. King's solution to the ecological inference problem will enable empirical researchers to investigate substantive questions that have heretofore proved unanswerable, and move forward fields of inquiry in which progress has been stifled by this problem.
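
For readers unfamiliar with the problem, the sketch below (illustrative numbers, not King's estimator) shows the precinct-level accounting identity that ecological inference starts from and the deterministic bounds it implies; King's method combines these bounds with a statistical model to obtain point estimates and uncertainty.

```python
# Illustrative precincts: X is the observed share of a group, T the observed
# outcome rate.  The accounting identity T = B*X + W*(1 - X), with both
# unknown rates B and W constrained to [0, 1], yields deterministic bounds.
import numpy as np

X = np.array([0.20, 0.55, 0.80])   # observed group share per precinct
T = np.array([0.45, 0.50, 0.62])   # observed outcome rate per precinct

lower = np.clip((T - (1 - X)) / X, 0.0, 1.0)   # value of B when W = 1
upper = np.clip(T / X, 0.0, 1.0)               # value of B when W = 0

for i, (lo, hi) in enumerate(zip(lower, upper)):
    print(f"precinct {i}: the unobserved rate B lies in [{lo:.2f}, {hi:.2f}]")
```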

On Some Inference Problems for Networks
Author: Soumendu Sundar Mukherjee
Publisher:
Total Pages: 109
Release: 2018
Genre:
ISBN:

Networks are abstract representations of relationships between a set of entities. As such, they can be used to represent data in a variety of complex interactive systems, such as people and their social connections, researchers and their collaborations, proteins and their interactions, and so on. Vast amounts of such interaction data are being collected routinely in a range of disciplines and thus call for the attention of the statistician. Because of their large size (the number of observations scales as the square of the number of nodes), traditional statistical methods are usually not scalable, and one needs more computationally feasible inference techniques. A concrete example is the problem of community detection in networks. Traditional likelihood-based methods are computationally intractable, so researchers have come up with various computation-friendly alternatives. Although these methods work well on small to moderately large networks, most of them cannot handle truly large networks in a reasonable amount of time.

In this dissertation, we first advance divide-and-conquer strategies for community detection. We propose two algorithms which perform clustering on a number of small subgraphs and finally patch the results into a single clustering. The main advantage of these algorithms is that they significantly bring down the computational cost of traditional algorithms, including spectral clustering, semidefinite programs, modularity-based methods, and likelihood-based methods, without losing accuracy, and at times even improving it. These algorithms are also, by nature, parallelizable. Thus, exploiting the fact that most traditional algorithms are accurate on small problems, where the corresponding optimization problems are much simpler, our divide-and-conquer methods provide an omnibus recipe for scaling traditional algorithms up to large networks. We prove consistency of these algorithms under various subgraph selection procedures and perform extensive simulations and real data analyses to understand the advantages of the divide-and-conquer approach in various settings. We then extend these divide-and-conquer methods to the more realistic situation of mixed memberships; models that can be tackled include the mixed membership blockmodel and topic models.

Next we focus on the problem of network comparison. We tackle two aspects of this problem: clustering and changepoint detection. While being able to cluster within a network, in the sense of community detection, is important, there are emerging needs to be able to cluster multiple networks. This is largely motivated by the routine collection of network data generated from potentially different populations. These networks may or may not have node correspondence. For example, brain networks of a group of patients have node correspondence, whereas collaboration networks of researchers in different disciplines, such as Computer Science, Mathematics, or Statistics, will have little node correspondence. When node correspondence is present, we cluster networks by summarizing each network by its graphon estimate; when node correspondence is absent, we propose a novel solution that associates a computationally feasible feature vector to each network based on traces of powers of its adjacency matrix. We illustrate our methods using both simulated and real data sets, and provide theoretical justification in terms of consistency.
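
As a rough illustration of the feature-based approach to clustering networks without node correspondence, the sketch below computes a vector of normalized traces of powers of each adjacency matrix and clusters those vectors. The normalization and toy data are illustrative; the dissertation's exact construction may differ.

```python
# Illustrative feature vector for clustering networks without node
# correspondence: normalized traces of powers of the adjacency matrix.
import numpy as np
from sklearn.cluster import KMeans

def trace_features(A, max_power=5):
    """Return [tr(A^2)/n^2, ..., tr(A^k)/n^k] for a symmetric 0/1 matrix A."""
    n = A.shape[0]
    feats, P = [], A.astype(float)
    for k in range(2, max_power + 1):
        P = P @ A
        feats.append(np.trace(P) / n ** k)
    return np.array(feats)

def er_graph(rng, n, p):
    """Symmetric adjacency matrix of an Erdos-Renyi graph G(n, p)."""
    U = np.triu((rng.random((n, n)) < p).astype(float), 1)
    return U + U.T

# Toy example: two denser and two sparser random graphs of different sizes.
rng = np.random.default_rng(2)
graphs = [er_graph(rng, 60, 0.30), er_graph(rng, 80, 0.30),
          er_graph(rng, 60, 0.05), er_graph(rng, 80, 0.05)]
features = np.array([trace_features(A) for A in graphs])
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(features)
print("cluster labels:", labels)   # the two dense graphs end up together
```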
In the changepoint problem, one observes a series of networks indexed by time and wishes to check whether there is a significant change in the structure of these networks at some point in time. Potential applications include brain imaging, where one has brain scans of individuals collected over time and is looking for abnormalities, and ecological networks observed over time, where one wonders whether there has been a structural change. We consider a CUSUM (cumulative sum) statistic for this problem and prove its consistency. We find that in this high-dimensional setting, the estimation error rate is better than the classical rate for fixed-dimensional changepoint problems. As applications, we detect changepoints in the MIT Reality Mining data and the US Senate roll call data.
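
A minimal sketch of a CUSUM-type scan over a sequence of adjacency matrices is given below; the exact statistic, norm, and scaling used in the dissertation may differ, but the idea of comparing averages before and after each candidate time is the same.

```python
# Illustrative CUSUM-type scan: at each candidate time t, compare the average
# adjacency matrix before t with the average after t, weighted so the
# statistic is comparable across t.
import numpy as np

def cusum_scan(seq):
    """seq has shape (T, n, n); returns scan values and the argmax."""
    T = seq.shape[0]
    scan = np.full(T, -np.inf)
    for t in range(1, T):
        diff = seq[:t].mean(axis=0) - seq[t:].mean(axis=0)
        scan[t] = np.sqrt(t * (T - t) / T) * np.linalg.norm(diff)  # Frobenius
    return scan, int(np.argmax(scan))

# Toy example: the edge probability jumps from 0.05 to 0.20 at time 30.
rng = np.random.default_rng(4)
n, T, change = 50, 60, 30
def er(p):
    U = np.triu((rng.random((n, n)) < p).astype(float), 1)
    return U + U.T
seq = np.array([er(0.05 if t < change else 0.20) for t in range(T)])
scan, t_hat = cusum_scan(seq)
print("estimated changepoint:", t_hat)   # should be close to 30
```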

Targeted Learning in Data Science
Author: Mark J. van der Laan
Publisher: Springer
Total Pages: 655
Release: 2018-03-28
Genre: Mathematics
ISBN: 3319653040

This textbook for graduate students in statistics, data science, and public health deals with the practical challenges that come with big, complex, and dynamic data. It presents a scientific roadmap for translating real-world data science applications into formal statistical estimation problems by using the general template of targeted maximum likelihood estimators. These targeted machine learning algorithms estimate quantities of interest while still providing valid inference. Targeted learning methods within data science are a critical component for solving scientific problems in the modern age. The techniques can answer complex questions, including optimal rules for assigning treatment based on longitudinal data with time-dependent confounding, as well as other estimands in dependent data structures, such as networks. Included in Targeted Learning in Data Science are demonstrations with software packages and real data sets that make the case that targeted learning is crucial for the next generation of statisticians and data scientists. This book is a sequel to the first textbook on machine learning for causal inference, Targeted Learning, published in 2011. Mark van der Laan, PhD, is Jiann-Ping Hsu/Karl E. Peace Professor of Biostatistics and Statistics at UC Berkeley. His research interests include statistical methods in genomics, survival analysis, censored data, machine learning, semiparametric models, causal inference, and targeted learning. Dr. van der Laan received the 2004 Mortimer Spiegelman Award, the 2005 Van Dantzig Award, the 2005 COPSS Snedecor Award, and the 2005 COPSS Presidential Award, and has graduated over 40 PhD students in biostatistics and statistics. Sherri Rose, PhD, is Associate Professor of Health Care Policy (Biostatistics) at Harvard Medical School. Her work is centered on developing and integrating innovative statistical approaches to advance human health. Dr. Rose’s methodological research focuses on nonparametric machine learning for causal inference and prediction. She co-leads the Health Policy Data Science Lab and currently serves as an associate editor for the Journal of the American Statistical Association and Biostatistics.
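
The sketch below is not the book's software; plain logistic regressions stand in for the flexible learners the book recommends. It only illustrates the core targeting step of a TMLE for the average treatment effect with a binary outcome.

```python
# Illustrative TMLE for the average treatment effect (binary treatment A,
# binary outcome Y, covariates W) on simulated data.
import numpy as np
from sklearn.linear_model import LogisticRegression

def expit(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(5)
n = 5000
W = rng.normal(size=(n, 2))
A = rng.binomial(1, expit(0.4 * W[:, 0] - 0.3 * W[:, 1]))
Y = rng.binomial(1, expit(-0.5 + 0.8 * A + 0.5 * W[:, 0] - 0.4 * W[:, 1]))

# Step 1: initial outcome regression Q(A, W) and propensity score g(W).
XQ = np.column_stack([A, W])
Qfit = LogisticRegression().fit(XQ, Y)
gW = LogisticRegression().fit(W, A).predict_proba(W)[:, 1]
Q_A = Qfit.predict_proba(XQ)[:, 1]
Q_1 = Qfit.predict_proba(np.column_stack([np.ones(n), W]))[:, 1]
Q_0 = Qfit.predict_proba(np.column_stack([np.zeros(n), W]))[:, 1]

# Step 2: targeting step.  Fluctuate the initial fit along the "clever
# covariate" H = A/g(W) - (1-A)/(1-g(W)) via a one-parameter logistic MLE.
H = A / gW - (1 - A) / (1 - gW)
offset = np.log(Q_A / (1 - Q_A))
eps = 0.0
for _ in range(25):                     # Newton-Raphson for epsilon
    p = expit(offset + eps * H)
    eps += np.sum(H * (Y - p)) / np.sum(H ** 2 * p * (1 - p))

# Step 3: update the counterfactual predictions and average their difference.
Q1_star = expit(np.log(Q_1 / (1 - Q_1)) + eps / gW)
Q0_star = expit(np.log(Q_0 / (1 - Q_0)) - eps / (1 - gW))
print("TMLE estimate of the average treatment effect:",
      round(np.mean(Q1_star - Q0_star), 3))
```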

The Prevention and Treatment of Missing Data in Clinical Trials
Author: National Research Council
Publisher: National Academies Press
Total Pages: 163
Release: 2010-12-21
Genre: Medical
ISBN: 030918651X

Randomized clinical trials are the primary tool for evaluating new medical interventions. Randomization provides for a fair comparison between treatment and control groups, balancing out, on average, distributions of known and unknown factors among the participants. Unfortunately, a substantial percentage of data is often missing in these studies. This missing data reduces the benefit provided by randomization and introduces potential biases into the comparison of the treatment groups. Missing data can arise for a variety of reasons, including the inability or unwillingness of participants to meet appointments for evaluation. And in some studies, some or all data collection ceases when participants discontinue study treatment. Existing guidelines for the design and conduct of clinical trials, and for the analysis of the resulting data, provide only limited advice on how to handle missing data. Thus, approaches to the analysis of data with an appreciable amount of missing values tend to be ad hoc and variable. The Prevention and Treatment of Missing Data in Clinical Trials concludes that a more principled approach to design and analysis in the presence of missing data is both needed and possible. Such an approach needs to focus on two critical elements: (1) careful design and conduct to limit the amount and impact of missing data and (2) analysis that makes full use of information on all randomized participants and is based on careful attention to the assumptions about the nature of the missing data underlying estimates of treatment effects. In addition to its highest-priority recommendations, the book offers more detailed recommendations on the conduct of clinical trials and techniques for analysis of trial data.
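
A small simulation (not from the report) makes the central concern concrete: when dropout depends on the unobserved outcome, a complete-case comparison of randomized arms is biased, even though the randomization itself was sound.

```python
# Illustrative trial: the true treatment effect is 1.0, but participants with
# poor outcomes drop out more often, so the complete-case estimate is biased.
import numpy as np

rng = np.random.default_rng(8)
n = 100_000
arm = rng.binomial(1, 0.5, n)                 # randomized assignment
y = 1.0 * arm + rng.normal(0.0, 1.0, n)       # outcome; true effect = 1.0

p_drop = 1.0 / (1.0 + np.exp(2.0 + 1.5 * y))  # dropout depends on the outcome
seen = rng.random(n) > p_drop                 # which outcomes are observed

full = y[arm == 1].mean() - y[arm == 0].mean()
cc = y[seen & (arm == 1)].mean() - y[seen & (arm == 0)].mean()
print("full-data estimate:    ", round(full, 3))   # close to 1.0
print("complete-case estimate:", round(cc, 3))     # systematically off
```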

An Introduction to Causal Inference
Author: Judea Pearl
Publisher: Createspace Independent Publishing Platform
Total Pages: 0
Release: 2015
Genre: Causation
ISBN: 9781507894293

This paper summarizes recent advances in causal inference and underscores the paradigmatic shifts that must be undertaken in moving from traditional statistical analysis to causal analysis of multivariate data. Special emphasis is placed on the assumptions that underlie all causal inferences, the languages used in formulating those assumptions, the conditional nature of all causal and counterfactual claims, and the methods that have been developed for the assessment of such claims. These advances are illustrated using a general theory of causation based on the Structural Causal Model (SCM) described in Pearl (2000a), which subsumes and unifies other approaches to causation and provides a coherent mathematical foundation for the analysis of causes and counterfactuals. In particular, the paper surveys the development of mathematical tools for inferring (from a combination of data and assumptions) answers to three types of causal queries: (1) queries about the effects of potential interventions (also called "causal effects" or "policy evaluation"), (2) queries about probabilities of counterfactuals (including assessment of "regret," "attribution," or "causes of effects"), and (3) queries about direct and indirect effects (also known as "mediation"). Finally, the paper defines the formal and conceptual relationships between the structural and potential-outcome frameworks and presents tools for a symbiotic analysis that uses the strong features of both. The tools are demonstrated in the analyses of mediation, causes of effects, and probabilities of causation. -- p. 1.
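
The sketch below (a toy model, not from the paper) illustrates the first type of query: an interventional effect P(Y | do(X)) identified by back-door adjustment for an observed confounder, contrasted with the biased observational association.

```python
# Illustrative structural causal model with an observed confounder Z.
import numpy as np

rng = np.random.default_rng(6)
n = 200_000
Z = rng.binomial(1, 0.4, n)                         # confounder
X = rng.binomial(1, np.where(Z == 1, 0.7, 0.2))     # treatment depends on Z
Y = rng.binomial(1, np.where(X == 1, 0.5, 0.3) + 0.2 * Z)   # true effect: +0.2

naive = Y[X == 1].mean() - Y[X == 0].mean()         # confounded contrast

# Back-door adjustment: P(Y=1 | do(X=x)) = sum_z P(Y=1 | X=x, Z=z) P(Z=z).
def p_do(x):
    return sum(Y[(X == x) & (Z == z)].mean() * np.mean(Z == z) for z in (0, 1))

print("observational association:", round(naive, 3))              # around 0.30
print("interventional effect:    ", round(p_do(1) - p_do(0), 3))  # around 0.20
```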

Some Statistical Inference Problems in Data Fusion and Semi-Parametric Regression
Author:
Publisher:
Total Pages: 4
Release: 1996
Genre:
ISBN:

The proposal dealt with several problems in the areas of data fusion and semi-parametric regression models. The problems were formulated specifically in the context of linear models, both univariate and multivariate. Some data fusion problems that arise in the area of calibration have been successfully solved. In addition, in the context of the calibration problem, both univariate and multivariate, several satisfactory confidence regions have been constructed and applied to real data. In the same spirit, confidence regions and tests in some linear functional relationship models have been derived as well.
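
As context for the calibration problem mentioned above, the sketch below (illustrative data and a textbook construction, not the proposal's methods) estimates an unknown x0 from a new response after fitting a linear calibration curve, and forms an interval by inverting the usual t-test.

```python
# Illustrative univariate calibration: fit y = a + b*x on known standards,
# then recover the unknown x0 behind a new response y0.
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
x = np.linspace(0.0, 10.0, 20)                        # known standards
y = 2.0 + 0.8 * x + rng.normal(0.0, 0.3, x.size)      # calibration responses
y0 = 6.5                                              # new response, x0 unknown

b, a = np.polyfit(x, y, 1)                            # slope, intercept
n = x.size
s = np.sqrt(np.sum((y - (a + b * x)) ** 2) / (n - 2)) # residual std. error
print("point estimate of x0:", round((y0 - a) / b, 3))

# Accept candidate values c whose predicted response is consistent with y0.
tcrit = stats.t.ppf(0.975, n - 2)
grid = np.linspace(-2.0, 12.0, 2001)
se = s * np.sqrt(1 + 1 / n + (grid - x.mean()) ** 2 / np.sum((x - x.mean()) ** 2))
ok = np.abs(y0 - (a + b * grid)) <= tcrit * se
print("95% calibration interval: [%.3f, %.3f]" % (grid[ok].min(), grid[ok].max()))
```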