Methodology (stat.ME)
Fri, 30 Jun 2023
1. Minimax optimal subgroup identification
Authors: Matteo Bonvini, Edward H. Kennedy, Luke J. Keele
Abstract: Quantifying treatment effect heterogeneity is a crucial task in many areas of causal inference, e.g. optimal treatment allocation and estimation of subgroup effects. We study the problem of estimating the level sets of the conditional average treatment effect (CATE), identified under the no-unmeasured-confounders assumption. Given a user-specified threshold, the goal is to estimate the set of all units for whom the treatment effect exceeds that threshold. For example, if the cutoff is zero, the estimand is the set of all units who would benefit from receiving treatment. Assigning treatment just to this set represents the optimal treatment rule that maximises the mean population outcome. Similarly, cutoffs greater than zero represent optimal rules under resource constraints. The level set estimator that we study follows the plug-in principle and consists of simply thresholding a good estimator of the CATE. While many CATE estimators have been recently proposed and analysed, how their properties relate to those of the corresponding level set estimators remains unclear. Our first goal is thus to fill this gap by deriving the asymptotic properties of level set estimators depending on which estimator of the CATE is used. Next, we identify a minimax optimal estimator in a model where the CATE, the propensity score and the outcome model are Hölder-smooth of varying orders. We consider data generating processes that satisfy a margin condition governing the probability of observing units for whom the CATE is close to the threshold. We investigate the performance of the estimators in simulations and illustrate our methods on a dataset used to study the effects on mortality of laparoscopic vs open surgery in the treatment of various conditions of the colon.
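A minimal sketch of the plug-in principle the abstract describes: estimate the CATE with any regression strategy, then threshold it at the cutoff. The T-learner with random forests below is purely illustrative, since the paper's point is that the properties of the level set estimator depend on which CATE estimator is plugged in; all names are assumptions, not the authors' code.

    from sklearn.ensemble import RandomForestRegressor

    def plugin_level_set(X, A, Y, X_new, cutoff=0.0):
        """Estimate the level set {x : tau(x) > cutoff} by thresholding a CATE estimate."""
        mu1 = RandomForestRegressor(random_state=0).fit(X[A == 1], Y[A == 1])  # E[Y | X, A=1]
        mu0 = RandomForestRegressor(random_state=0).fit(X[A == 0], Y[A == 0])  # E[Y | X, A=0]
        tau_hat = mu1.predict(X_new) - mu0.predict(X_new)  # plug-in CATE estimate
        return tau_hat > cutoff  # units whose estimated effect exceeds the threshold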
2. Leveraging Observational Data for Efficient CATE Estimation in Randomized Controlled Trials
Authors: Amir Asiaee, Chiara Di Gravio, Yuting Mei, Jared D. Huling
Abstract: Randomized controlled trials (RCTs) are the gold standard for causal inference, but they are often powered only for average effects, making estimation of heterogeneous treatment effects (HTEs) challenging. Conversely, large-scale observational studies (OS) offer a wealth of data but suffer from confounding bias. Our paper presents a novel framework that leverages OS data to improve the efficiency of estimating conditional average treatment effects (CATEs) from RCTs while mitigating common biases. We propose an innovative approach to combining RCT and OS data that expands on the traditional use of control arms drawn from external sources. The framework relaxes the typical assumption of CATE invariance across populations, acknowledging the often unaccounted-for systematic differences between RCT and OS participants. We demonstrate this through the special case of a linear outcome model in which the CATE differs sparsely between the two populations. The core of our framework relies on learning potential outcome means from the OS data and using them as a nuisance parameter in CATE estimation from the RCT data. We further illustrate through experiments that using OS findings reduces the variance of the CATE estimated from RCTs and can decrease the sample size required for detecting HTEs.
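The general recipe (a hedged sketch, not the paper's exact estimator) can be written down directly: learn an outcome mean from the OS and plug it in as a nuisance when regressing a pseudo-outcome on covariates in the RCT. Since the randomization probability pi is known from the trial design, the pseudo-outcome stays conditionally unbiased for the CATE regardless of how well the nuisance fits; a good fit only reduces variance. All names below are illustrative.

    from sklearn.ensemble import GradientBoostingRegressor

    def os_assisted_cate(X_os, Y_os, X_rct, A_rct, Y_rct, pi=0.5):
        # Nuisance: outcome mean learned from the large observational study.
        m_hat = GradientBoostingRegressor().fit(X_os, Y_os)
        # Centering by m_hat reduces variance; the known randomization
        # probability pi keeps the pseudo-outcome unbiased for tau(x).
        pseudo = (A_rct - pi) / (pi * (1 - pi)) * (Y_rct - m_hat.predict(X_rct))
        return GradientBoostingRegressor().fit(X_rct, pseudo)  # CATE model fit on RCT data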
3. Flexible and Accurate Methods for Estimation and Inference of Gaussian Graphical Models with Applications
Authors: Yueqi Qian, Xianghong Hu, Can Yang
Abstract: The Gaussian graphical model (GGM) incorporates an undirected graph to represent the conditional dependence between variables, with the precision matrix encoding the partial correlation between pairs of variables given the others. To achieve flexible and accurate estimation and inference for GGMs, we propose the novel method FLAG, which utilizes a random effects model for pairwise conditional regression to estimate the precision matrix and applies statistical tests to recover the graph. Compared with existing methods, FLAG has several unique advantages: (i) it provides accurate estimation without sparsity assumptions on the precision matrix; (ii) it allows for element-wise inference on the precision matrix; (iii) it achieves computational efficiency via an efficient PX-EM algorithm and an MM algorithm accelerated with low-rank updates; and (iv) it enables joint estimation of multiple graphs using FLAG-Meta or FLAG-CA. The proposed methods are evaluated in various simulation settings and real data applications, including gene expression in the human brain, term associations on university websites, and stock prices in the U.S. financial market. The results demonstrate that FLAG and its extensions provide accurate precision matrix estimation and graph recovery.
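For intuition about the pairwise conditional regression idea: the partial correlation between two variables given the rest is the correlation of their residuals after regressing each on the remaining variables. The plain least-squares sketch below only conveys that idea; FLAG replaces this step with a random effects model, and the function name is illustrative.

    import numpy as np
    from scipy import stats

    def partial_corr_test(X, i, j):
        """Test for an edge (i, j): correlate residuals given all other variables."""
        rest = [k for k in range(X.shape[1]) if k not in (i, j)]
        Z = np.column_stack([np.ones(X.shape[0]), X[:, rest]])
        r_i = X[:, i] - Z @ np.linalg.lstsq(Z, X[:, i], rcond=None)[0]
        r_j = X[:, j] - Z @ np.linalg.lstsq(Z, X[:, j], rcond=None)[0]
        return stats.pearsonr(r_i, r_j)  # (partial correlation, p-value)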
4. Top-Two Thompson Sampling for Contextual Top-m_c Selection Problems
Authors: Xinbo Shi, Yijie Peng, Gongbo Zhang
Abstract: We aim to efficiently allocate a fixed simulation budget to identify the top-m_c designs for each context among a finite number of contexts. The performance of each design under a context is measured by an identifiable statistical characteristic, possibly in the presence of nuisance parameters. Under a Bayesian framework, we extend the top-two Thompson sampling method, designed for selecting the best design in a single context, to contextual top-m_c selection problems, leading to an efficient sampling policy that simultaneously allocates simulation samples to both contexts and designs. To demonstrate the asymptotic optimality of the proposed sampling policy, we characterize the exponential convergence rate of the posterior distribution for a wide range of identifiable sampling distribution families. The proposed sampling policy is proved to be consistent and asymptotically satisfies a necessary condition for optimality. In particular, when selecting contextual best designs (i.e., m_c = 1), the proposed sampling policy is proved to be asymptotically optimal. Numerical experiments demonstrate the good finite-sample performance of the proposed sampling policy.
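A hedged sketch of top-two Thompson sampling for the single-context, m_c = 1 case, assuming Gaussian rewards and a flat prior (so the running mean stands in for the posterior mean); extending the index from designs to (context, design) pairs gives the contextual version the paper studies. All names are illustrative.

    import numpy as np

    def top_two_thompson(sample_reward, k, budget, beta=0.5, seed=0):
        """Select the best of k designs with a fixed simulation budget."""
        rng = np.random.default_rng(seed)
        n, mean = np.zeros(k), np.zeros(k)
        for _ in range(budget):
            draw = rng.normal(mean, 1.0 / np.sqrt(n + 1.0))  # approximate posterior draw
            leader = int(np.argmax(draw))
            challenger = leader
            while challenger == leader:  # resample until a different design leads
                draw = rng.normal(mean, 1.0 / np.sqrt(n + 1.0))
                challenger = int(np.argmax(draw))
            arm = leader if rng.random() < beta else challenger  # top-two tie-break
            reward = sample_reward(arm)
            n[arm] += 1.0
            mean[arm] += (reward - mean[arm]) / n[arm]  # running-mean update
        return int(np.argmax(mean))

Here beta controls how often the leader rather than the challenger is simulated; beta = 0.5 is the usual default in the top-two literature.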
5. Learned harmonic mean estimation of the marginal likelihood with normalizing flows
Authors: Alicja Polanska, Matthew A. Price, Alessio Spurio Mancini, Jason D. McEwen
Abstract: Computing the marginal likelihood (also called the Bayesian model evidence) is an important task in Bayesian model selection, providing a principled quantitative way to compare models. The learned harmonic mean estimator solves the exploding-variance problem of the original harmonic mean estimator of the marginal likelihood by learning an importance sampling target distribution that approximates the optimal distribution. While the approximation need not be highly accurate, it is critical that the probability mass of the learned distribution is contained within the posterior in order to avoid the exploding-variance problem. In previous work, a bespoke optimisation problem was introduced when training models in order to ensure this property is satisfied. In the current article we introduce the use of normalizing flows to represent the importance sampling target distribution. A flow-based model is trained on samples from the posterior by maximum likelihood estimation; the probability density of the flow is then concentrated by lowering the variance of the base distribution, i.e. by lowering its "temperature", ensuring its probability mass is contained within the posterior. This approach avoids the need for a bespoke optimisation problem and careful fine-tuning of parameters, resulting in a more robust method. Moreover, the use of normalizing flows has the potential to scale to high-dimensional settings. We present preliminary experiments demonstrating the effectiveness of flows for the learned harmonic mean estimator. The harmonic code implementing the learned harmonic mean, which is publicly available, has been updated to support normalizing flows.
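Once the flow is trained and its base temperature lowered, the estimator itself is short. The sketch below is generic (not the harmonic package's API) and assumes vectorized callables for the log-likelihood, log-prior, and the concentrated flow's log-density, evaluated at posterior samples; flow training is omitted.

    import numpy as np

    def log_evidence(theta, log_lik, log_prior, flow_log_prob):
        """Learned harmonic mean: 1/Z = E_posterior[phi(theta) / (L(theta) pi(theta))]."""
        log_ratio = flow_log_prob(theta) - log_lik(theta) - log_prior(theta)
        log_inv_z = np.logaddexp.reduce(log_ratio) - np.log(len(log_ratio))  # stable log-mean-exp
        return -log_inv_z  # estimate of the log marginal likelihood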
6. Proximal nested sampling with data-driven priors for physical scientists
Authors: Jason D. McEwen, Tobías I. Liaudat, Matthew A. Price, Xiaohao Cai, Marcelo Pereyra
Abstract: Proximal nested sampling was introduced recently to open up Bayesian model selection for high-dimensional problems such as computational imaging. The framework is suitable for models with a log-convex likelihood, which are ubiquitous in the imaging sciences. The purpose of this article is two-fold. First, we review proximal nested sampling in a pedagogical manner in an attempt to elucidate the framework for physical scientists. Second, we show how proximal nested sampling can be extended in an empirical Bayes setting to support data-driven priors, such as deep neural networks learned from training data.
7. Design Sensitivity and Its Implications for Weighted Observational Studies
Authors: Melody Huang, Dan Soriano, Samuel D. Pimentel
Abstract: Sensitivity to unmeasured confounding is not typically a primary consideration in designing treated-control comparisons in observational studies. We introduce a framework allowing researchers to optimize robustness to omitted variable bias at the design stage using a measure called design sensitivity. Design sensitivity, which describes the asymptotic power of a sensitivity analysis, allows transparent assessment of how different estimation strategies affect sensitivity. We apply this general framework to two commonly used sensitivity models: the marginal sensitivity model and the variance-based sensitivity model. By comparing design sensitivities, we interrogate how key features of weighted designs, including choices about weight trimming and model augmentation, affect robustness to unmeasured confounding, and how these impacts may differ across the two sensitivity models. We illustrate the proposed framework on a study examining drivers of support for the 2016 Colombian peace agreement.
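As a toy illustration of the marginal sensitivity model (not the paper's design-sensitivity computation): if each inverse-propensity weight may be misspecified by a factor of at most lam, the worst-case weighted means are attained at threshold configurations after sorting by outcome, so the bounds can be found by a one-dimensional search. Names are illustrative.

    import numpy as np

    def msm_bounds(y, w, lam):
        """Range of the weighted mean when true weights lie in [w/lam, w*lam]."""
        order = np.argsort(y)
        y, w = np.asarray(y, float)[order], np.asarray(w, float)[order]
        lowers, uppers = [], []
        for k in range(len(y) + 1):
            up = np.concatenate([w[:k] / lam, w[k:] * lam])  # inflate weights on large y
            lo = np.concatenate([w[:k] * lam, w[k:] / lam])  # inflate weights on small y
            uppers.append(np.sum(up * y) / np.sum(up))
            lowers.append(np.sum(lo * y) / np.sum(lo))
        return min(lowers), max(uppers)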
8. High-Dimensional Bayesian Structure Learning in Gaussian Graphical Models using Marginal Pseudo-Likelihood
Authors: Reza Mohammadi, Marit Schoonhoven, Lucas Vogels, S. Ilker Birbil
Abstract: Gaussian graphical models depict the conditional dependencies between variables within a multivariate normal distribution in a graphical format. The identification of these graph structures is an area known as structure learning. However, when utilizing Bayesian methodologies in structure learning, computational complexities can arise, especially with high-dimensional graphs surpassing 250 nodes. This paper introduces two innovative search algorithms that employ marginal pseudo-likelihood to address this computational challenge. These methods can swiftly generate reliable estimates for problems with 1000 variables in just a few minutes on standard computers. For those interested in practical applications, the code supporting this new approach is available through the R package BDgraph.
9. Latent Subgroup Identification in Image-on-scalar Regression
Authors: Zikai Lin, Yajuan Si, Jian Kang
Abstract: Image-on-scalar regression has been a popular approach to modeling the association between brain activities and scalar characteristics in neuroimaging research. The associations can be heterogeneous across individuals in the population, as indicated by recent large-scale neuroimaging studies, e.g., the Adolescent Brain Cognitive Development (ABCD) study. The ABCD data can inform our understanding of heterogeneous associations and of how to leverage this heterogeneity to tailor interventions and increase the number of youths who benefit. It is of great interest to identify subgroups of individuals in the population such that: 1) within each subgroup the brain activities have homogeneous associations with the clinical measures; 2) across subgroups the associations are heterogeneous; and 3) the group allocation depends on individual characteristics. Existing image-on-scalar regression methods and clustering methods cannot directly achieve this goal. We propose a latent subgroup image-on-scalar regression model (LASIR) to analyze large-scale, multi-site neuroimaging data with diverse sociodemographics. LASIR introduces a latent subgroup for each individual and group-specific, spatially varying effects, with an efficient stochastic expectation-maximization algorithm for inference. We demonstrate that LASIR outperforms existing alternatives for subgroup identification of brain activation patterns with functional magnetic resonance imaging data via comprehensive simulations and applications to the ABCD study. We have released our reproducible code for public use, with the software package available on GitHub: https://github.com/zikaiLin/lasir.