Methodology (stat.ME)
Fri, 09 Jun 2023
1.Variable screening using factor analysis for high-dimensional data with multicollinearity
Authors:Shuntaro Tanaka, Hidetoshi Matsui
Abstract: Screening methods are useful tools for variable selection in regression analysis when the number of predictors is much larger than the sample size. Factor analysis is used to eliminate multicollinearity among predictors, which improves the variable selection performance. We propose a new method, called Truncated Preconditioned Profiled Independence Screening (TPPIS), that better selects the number of factors to eliminate multicollinearity. The proposed method improves the variable selection performance by truncating unnecessary parts from the information obtained by factor analysis. We confirmed the superior performance of the proposed method in variable selection through analysis using simulation data and real datasets.
2.The Use of Covariate Adjustment in Randomized Controlled Trials: An Overview
Authors:Kelly Van Lancker, Frank Bretz, Oliver Dukes
Abstract: There has been a growing interest in covariate adjustment in the analysis of randomized controlled trials in past years. For instance, the U.S. Food and Drug Administration recently issued guidance that emphasizes the importance of distinguishing between conditional and marginal treatment effects. Although these effects coincide in linear models, this is not typically the case in other settings, and this distinction is often overlooked in clinical trial practice. Considering these developments, this paper provides a review of when and how to utilize covariate adjustment to enhance precision in randomized controlled trials. We describe the differences between conditional and marginal estimands and stress the necessity of aligning statistical analysis methods with the chosen estimand. Additionally, we highlight the potential misalignment of current practices in estimating marginal treatment effects. Instead, we advocate for the utilization of standardization, which can improve efficiency by leveraging the information contained in baseline covariates while remaining robust to model misspecification. Finally, we present practical considerations that have arisen in our respective consultations to further clarify the advantages and limitations of covariate adjustment.
3.A reduced-rank approach to predicting multiple binary responses through machine learning
Authors:The Tien Mai
Abstract: This paper investigates the problem of simultaneously predicting multiple binary responses by utilizing a shared set of covariates. Our approach incorporates machine learning techniques for binary classification, without making assumptions about the underlying observations. Instead, our focus lies on a group of predictors, aiming to identify the one that minimizes prediction error. Unlike previous studies that primarily address estimation error, we directly analyze the prediction error of our method using PAC-Bayesian bounds techniques. In this paper, we introduce a pseudo-Bayesian approach capable of handling incomplete response data. Our strategy is efficiently implemented using the Langevin Monte Carlo method. Through simulation studies and a practical application using real data, we demonstrate the effectiveness of our proposed method, producing comparable or sometimes superior results compared to the current state-of-the-art method.
4.Causal Effect Estimation from Observational and Interventional Data Through Matrix Weighted Linear Estimators
Authors:Klaus-Rudolf Kladny, Julius von Kügelgen, Bernhard Schölkopf, Michael Muehlebach
Abstract: We study causal effect estimation from a mixture of observational and interventional data in a confounded linear regression model with multivariate treatments. We show that the statistical efficiency in terms of expected squared error can be improved by combining estimators arising from both the observational and interventional setting. To this end, we derive methods based on matrix weighted linear estimators and prove that our methods are asymptotically unbiased in the infinite sample limit. This is an important improvement compared to the pooled estimator using the union of interventional and observational data, for which the bias only vanishes if the ratio of observational to interventional data tends to zero. Studies on synthetic data confirm our theoretical findings. In settings where confounding is substantial and the ratio of observational to interventional data is large, our estimators outperform a Stein-type estimator and various other baselines.
5.Semiparametric posterior corrections
Authors:Andrew Yiu, Edwin Fong, Chris Holmes, Judith Rousseau
Abstract: We present a new approach to semiparametric inference using corrected posterior distributions. The method allows us to leverage the adaptivity, regularization and predictive power of nonparametric Bayesian procedures to estimate low-dimensional functionals of interest without being restricted by the holistic Bayesian formalism. Starting from a conventional nonparametric posterior, we target the functional of interest by transforming the entire distribution with a Bayesian bootstrap correction. We provide conditions for the resulting $\textit{one-step posterior}$ to possess calibrated frequentist properties and specialize the results for several canonical examples: the integrated squared density, the mean of a missing-at-random outcome, and the average causal treatment effect on the treated. The procedure is computationally attractive, requiring only a simple, efficient post-processing step that can be attached onto any arbitrary posterior sampling algorithm. Using the ACIC 2016 causal data analysis competition, we illustrate that our approach can outperform the existing state-of-the-art through the propagation of Bayesian uncertainty.