Contributions to Numerical Formal Concept Analysis, Bayesian Predictive Inference and Sample Size Determination

Author: Junheng Ma
Publisher:
Total Pages: 154
Release: 2011
Genre:
ISBN:

This dissertation contributes to three areas of statistics: numerical formal concept analysis (nFCA), Bayesian predictive inference, and sample size determination, with applications beyond statistics. Formal concept analysis (FCA) is a powerful data analysis tool, popular in computer science (CS), for visualizing binary data and its inherent structure. In the first part of this dissertation, numerical formal concept analysis (nFCA) is developed. It overcomes FCA's restriction to binary data, providing a new methodology for analyzing more general numerical data. Combining statistical methods with CS-style graphical visualization, nFCA produces a pair of graphs, an H-graph and an I-graph, that reveal the hierarchical clustering and inherent structure of the data. Compared with conventional statistical hierarchical clustering methods, nFCA provides a more intuitive and complete relational network among the data, and it performs better in terms of the cophenetic correlation coefficient, which measures how faithfully a dendrogram preserves the original distance matrix. We have also applied nFCA to cardiovascular (CV) traits data; nFCA produces results consistent with earlier findings and provides a complete relational network among the CV traits. In the second part of this dissertation, Bayesian predictive inference is investigated for finite population quantities under informative sampling, i.e., unequal selection probabilities. Only limited information about the sample design is available: only the first-order selection probabilities corresponding to the sampled units are known. We have developed a full Bayesian approach to make inference for the parameters of the finite population and predictive inference for the non-sampled units. Thus we can make inference for any characteristic of the finite population quantities.
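nFCA itself is not publicly packaged, but the evaluation criterion named above can be illustrated. As a sketch with synthetic data, the cophenetic correlation of conventional hierarchical clustering methods can be computed with SciPy:

```python
# Sketch: scoring conventional hierarchical clusterings by the cophenetic
# correlation coefficient, the consistency measure used to evaluate nFCA.
# The data here are synthetic, purely for illustration.
import numpy as np
from scipy.cluster.hierarchy import linkage, cophenet
from scipy.spatial.distance import pdist

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 4))          # 20 items measured on 4 numerical traits
d = pdist(X)                          # condensed pairwise distance matrix

for method in ("single", "complete", "average"):
    Z = linkage(d, method=method)     # conventional hierarchical clustering
    c, _ = cophenet(Z, d)             # consistency of dendrogram vs. distances
    print(f"{method:>8}: cophenetic correlation = {c:.3f}")
```

A coefficient closer to 1 means the dendrogram distorts the original distance matrix less; the dissertation reports that nFCA scores higher on this measure than the conventional methods.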
In addition, our methodology, using Markov chain Monte Carlo, avoids the need for asymptotic approximations. Sample size determination is one of the most important practical tasks for statisticians. There has been extensive research on methodology for sample size determination for continuous or ordered categorical outcome data. However, sample size determination for comparative studies with unordered categorical data remains largely untouched. In statistical terms, one is interested in the sample size needed to detect a specified difference between the parameters of two multinomial distributions. For this purpose, in the third part of this dissertation, we have developed a frequentist approach based on a chi-squared test to calculate the required sample size. Three improvements to the original frequentist approach (using the bootstrap, a minimum difference, and an asymptotic correction) have been proposed and investigated. In addition, using an extension of a posterior predictive p-value, we further develop a simulation-based Bayesian approach to determine the required sample size. The performance of these methods is evaluated via both a simulation study and a real application to Leukoplakia lesion data. Some asymptotic results are also provided.
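A minimal sketch of the underlying idea, not the dissertation's exact procedure: estimate by Monte Carlo the power of a chi-squared test at a candidate per-group sample size for a specified difference between two multinomial distributions, then increase the size until the target power is reached. The distributions below are hypothetical.

```python
# Sketch: simulation-based power of the chi-squared test for comparing two
# multinomial distributions, used to pick a required per-group sample size.
import numpy as np
from scipy.stats import chi2_contingency

def sim_power(n, p1, p2, alpha=0.05, reps=1000, seed=1):
    """Monte Carlo estimate of rejection probability at per-group size n."""
    rng = np.random.default_rng(seed)
    rejections = 0
    for _ in range(reps):
        table = np.vstack([rng.multinomial(n, p1), rng.multinomial(n, p2)])
        table = table[:, table.sum(axis=0) > 0]   # drop empty categories
        _, pval, _, _ = chi2_contingency(table)
        rejections += pval < alpha
    return rejections / reps

p1 = [0.5, 0.3, 0.2]       # hypothetical three-category outcome distribution
p2 = [0.4, 0.3, 0.3]       # the specified difference to detect
results = {n: sim_power(n, p1, p2) for n in (100, 200, 400)}
print(results)             # choose the smallest n whose power meets the target
```

The same simulation skeleton extends naturally to the Bayesian variant: replace the chi-squared p-value with a posterior predictive p-value computed for each simulated data set.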

Extreme Value Modeling and Risk Analysis

Author: Dipak K. Dey
Publisher: CRC Press
Total Pages: 538
Release: 2016-01-06
Genre: Mathematics
ISBN: 1498701310

Extreme Value Modeling and Risk Analysis: Methods and Applications presents a broad overview of statistical modeling of extreme events along with the most recent methodologies and various applications. The book brings together background material and advanced topics, eliminating the need to sort through the massive amount of literature on the subject.

All of Statistics

Author: Larry Wasserman
Publisher: Springer Science & Business Media
Total Pages: 446
Release: 2013-12-11
Genre: Mathematics
ISBN: 0387217363

Taken literally, the title "All of Statistics" is an exaggeration. But in spirit, the title is apt, as the book does cover a much broader range of topics than a typical introductory book on mathematical statistics. This book is for people who want to learn probability and statistics quickly. It is suitable for graduate or advanced undergraduate students in computer science, mathematics, statistics, and related disciplines. The book includes modern topics like non-parametric curve estimation, bootstrapping, and classification, topics that are usually relegated to follow-up courses. The reader is presumed to know calculus and a little linear algebra. No previous knowledge of probability and statistics is required. Statistics, data mining, and machine learning are all concerned with collecting and analysing data.
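Of the modern topics the blurb mentions, the bootstrap is easy to illustrate. A minimal sketch, with synthetic data: a percentile confidence interval for a sample median, a statistic with no simple closed-form standard error.

```python
# Sketch: percentile bootstrap confidence interval for the median.
import numpy as np

rng = np.random.default_rng(42)
data = rng.exponential(scale=2.0, size=50)      # skewed synthetic sample

# Resample the data with replacement and recompute the statistic each time
boot_medians = np.array([
    np.median(rng.choice(data, size=data.size, replace=True))
    for _ in range(5000)
])
lo, hi = np.percentile(boot_medians, [2.5, 97.5])
print(f"median = {np.median(data):.2f}, 95% bootstrap CI = ({lo:.2f}, {hi:.2f})")
```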

Statistical Inference as Severe Testing

Author: Deborah G. Mayo
Publisher: Cambridge University Press
Total Pages: 503
Release: 2018-09-20
Genre: Mathematics
ISBN: 1108563309

Mounting failures of replication in social and biological sciences give a new urgency to critically appraising proposed reforms. This book pulls back the cover on disagreements between experts charged with restoring integrity to science. It denies two pervasive views of the role of probability in inference: to assign degrees of belief, and to control error rates in a long run. If statistical consumers are unaware of assumptions behind rival evidence reforms, they can't scrutinize the consequences that affect them (in personalized medicine, psychology, etc.). The book sets sail with a simple tool: if little has been done to rule out flaws in inferring a claim, then it has not passed a severe test. Many methods advocated by data experts do not stand up to severe scrutiny and are in tension with successful strategies for blocking or accounting for cherry picking and selective reporting. Through a series of excursions and exhibits, the philosophy and history of inductive inference come alive. Philosophical tools are put to work to solve problems about science and pseudoscience, induction and falsification.

Conformal and Probabilistic Prediction with Applications

Author: Alexander Gammerman
Publisher: Springer
Total Pages: 235
Release: 2016-04-16
Genre: Computers
ISBN: 331933395X

This book constitutes the refereed proceedings of the 5th International Symposium on Conformal and Probabilistic Prediction with Applications, COPA 2016, held in Madrid, Spain, in April 2016. The 14 revised full papers presented together with 1 invited paper were carefully reviewed and selected from 23 submissions and cover topics on theory of conformal prediction; applications of conformal prediction; and machine learning.

Elements of Causal Inference

Author: Jonas Peters
Publisher: MIT Press
Total Pages: 289
Release: 2017-11-29
Genre: Computers
ISBN: 0262037319

A concise and self-contained introduction to causal inference, increasingly important in data science and machine learning. The mathematization of causality is a relatively recent development, and has become increasingly important in data science and machine learning. This book offers a self-contained and concise introduction to causal models and how to learn them from data. After explaining the need for causal models and discussing some of the principles underlying causal inference, the book teaches readers how to use causal models: how to compute intervention distributions, how to infer causal models from observational and interventional data, and how causal ideas could be exploited for classical machine learning problems. All of these topics are discussed first in terms of two variables and then in the more general multivariate case. The bivariate case turns out to be a particularly hard problem for causal learning because there are no conditional independences as used by classical methods for solving multivariate cases. The authors consider analyzing statistical asymmetries between cause and effect to be highly instructive, and they report on their decade of intensive research into this problem. The book is accessible to readers with a background in machine learning or statistics, and can be used in graduate courses or as a reference for researchers. The text includes code snippets that can be copied and pasted, exercises, and an appendix with a summary of the most important technical concepts.
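The "intervention distributions" mentioned above can be sketched in a few lines. In a toy two-variable structural causal model X → Y (illustrative, not taken from the book), intervening on X means replacing X's assignment mechanism while leaving Y's mechanism unchanged:

```python
# Sketch: observational vs. interventional distributions in the SCM
#   X := N_X,  Y := 2X + N_Y   (toy model for illustration)
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Observational regime: X follows its own mechanism
x_obs = rng.normal(size=n)
y_obs = 2 * x_obs + rng.normal(size=n)

# Interventional regime do(X = 1): replace X's mechanism with the constant 1,
# keep Y's mechanism exactly as before
x_do = np.ones(n)
y_do = 2 * x_do + rng.normal(size=n)

print("E[Y]           ~", round(y_obs.mean(), 2))   # about 0
print("E[Y | do(X=1)] ~", round(y_do.mean(), 2))    # about 2
```

Here intervening on the cause shifts the effect; intervening on Y instead would leave the distribution of X unchanged, which is the asymmetry the book builds on.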

High-Frequency Financial Econometrics

Author: Yacine Aït-Sahalia
Publisher: Princeton University Press
Total Pages: 683
Release: 2014-07-21
Genre: Business & Economics
ISBN: 0691161437

A comprehensive introduction to the statistical and econometric methods for analyzing high-frequency financial data High-frequency trading is an algorithm-based computerized trading practice that allows firms to trade stocks in milliseconds. Over the last fifteen years, the use of statistical and econometric methods for analyzing high-frequency financial data has grown exponentially. This growth has been driven by the increasing availability of such data, the technological advancements that make high-frequency trading strategies possible, and the need of practitioners to analyze these data. This comprehensive book introduces readers to these emerging methods and tools of analysis. Yacine Aït-Sahalia and Jean Jacod cover the mathematical foundations of stochastic processes, describe the primary characteristics of high-frequency financial data, and present the asymptotic concepts that their analysis relies on. Aït-Sahalia and Jacod also deal with estimation of the volatility portion of the model, including methods that are robust to market microstructure noise, and address estimation and testing questions involving the jump part of the model. As they demonstrate, the practical importance and relevance of jumps in financial data are universally recognized, but only recently have econometric methods become available to rigorously analyze jump processes. Aït-Sahalia and Jacod approach high-frequency econometrics with a distinct focus on the financial side of matters while maintaining technical rigor, which makes this book invaluable to researchers and practitioners alike.
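A toy sketch of the volatility-estimation problem described above (synthetic data, not the book's estimators): realized variance of a simulated log-price path recovers the integrated variance at high frequency, but additive market microstructure noise biases it upward, which is why noise-robust estimators are needed.

```python
# Sketch: realized variance with and without simulated microstructure noise.
import numpy as np

rng = np.random.default_rng(3)
n = 23_400                  # one observation per second over a 6.5-hour session
sigma = 0.2                 # illustrative volatility of the efficient price
dt = 1.0 / n

# Efficient log price: Brownian motion with variance sigma^2 over the session
log_price = np.cumsum(sigma * np.sqrt(dt) * rng.normal(size=n))
# Observed price = efficient price + i.i.d. microstructure noise
noisy = log_price + 0.0005 * rng.normal(size=n)

def realized_variance(p):
    return np.sum(np.diff(p) ** 2)   # sum of squared high-frequency returns

rv_clean = realized_variance(log_price)
rv_noisy = realized_variance(noisy)
print("true integrated variance:", sigma**2)
print("RV, clean prices:        ", round(rv_clean, 4))
print("RV, noisy prices:        ", round(rv_noisy, 4))   # biased upward
```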

Microeconometrics

Author: A. Colin Cameron
Publisher: Cambridge University Press
Total Pages: 1058
Release: 2005-05-09
Genre: Business & Economics
ISBN: 1139444867

This book provides the most comprehensive treatment to date of microeconometrics, the analysis of individual-level data on the economic behavior of individuals or firms using regression methods for cross section and panel data. The book is oriented to the practitioner. A basic understanding of the linear regression model with matrix algebra is assumed. The text can be used for a microeconometrics course, typically a second-year economics PhD course; for data-oriented applied microeconometrics field courses; and as a reference work for graduate students and applied researchers who wish to fill in gaps in their toolkit. Distinguishing features of the book include emphasis on nonlinear models and robust inference, simulation-based estimation, and problems of complex survey data. The book makes frequent use of numerical examples based on generated data to illustrate the key models and methods. More substantially, it systematically integrates into the text empirical illustrations based on seven large and exceptionally rich data sets.

Predictive Inference

Author: Seymour Geisser
Publisher: Routledge
Total Pages: 280
Release: 2017-11-22
Genre: Mathematics
ISBN: 1351422294

The author's research has been directed towards inference involving observables rather than parameters. In this book, he brings together his views on predictive or observable inference and its advantages over parametric inference. While the book discusses a variety of approaches to prediction including those based on parametric, nonparametric, and nonstochastic statistical models, it is devoted mainly to predictive applications of the Bayesian approach. It not only substitutes predictive analyses for parametric analyses, but it also presents predictive analyses that have no real parametric analogues. It demonstrates that predictive inference can be a critical component of even strict parametric inference when dealing with interim analyses. This approach to predictive inference will be of interest to statisticians, psychologists, econometricians, and sociologists.
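A toy stand-in for the observable-focused viewpoint, not Geisser's full framework: under a normal model with known variance and a flat prior on the mean, the Bayesian predictive distribution for the next observation is obtained by averaging over posterior uncertainty in the parameter, and is wider than the sampling distribution alone.

```python
# Sketch: posterior predictive distribution for a NEW observable y_new
# under y_i ~ N(mu, sigma^2) with sigma known and a flat prior on mu.
import numpy as np

rng = np.random.default_rng(7)
sigma = 2.0
y = rng.normal(loc=10.0, scale=sigma, size=30)   # observed data
n = y.size

# Posterior for mu under a flat prior: N(ybar, sigma^2 / n)
mu_draws = rng.normal(y.mean(), sigma / np.sqrt(n), size=10_000)
# Predictive draws: integrate over mu uncertainty, then add sampling noise
y_new = rng.normal(mu_draws, sigma)

print("predictive mean:", round(y_new.mean(), 2))
print("predictive sd:  ", round(y_new.std(), 2))  # ~ sigma * sqrt(1 + 1/n)
```

The predictive standard deviation exceeds sigma, reflecting parameter uncertainty on top of sampling variability; inference is stated directly about the next observable rather than about mu.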