Methodology (stat.ME)
Fri, 26 May 2023
1. Angular Combining of Forecasts of Probability Distributions
Authors: James W. Taylor, Xiaochun Meng
Abstract: When multiple forecasts are available for a probability distribution, forecast combining enables a pragmatic synthesis of the available information to extract the wisdom of the crowd. A linear opinion pool has been widely used, whereby the combining is applied to the probability predictions of the distributional forecasts. However, it has been argued that this will tend to deliver overdispersed distributional forecasts, prompting the combination to be applied, instead, to the quantile predictions of the distributional forecasts. Results from different applications are mixed, leaving it as an empirical question whether to combine probabilities or quantiles. In this paper, we present an alternative approach. Looking at the distributional forecasts, combining the probability forecasts can be viewed as vertical combining, with quantile forecast combining seen as horizontal combining. Our alternative approach is to allow combining to take place on an angle between the extreme cases of vertical and horizontal combining. We term this angular combining. The angle is a parameter that can be optimized using a proper scoring rule. We show that, as with vertical and horizontal averaging, angular averaging results in a distribution with mean equal to the average of the means of the distributions that are being combined. We also show that angular averaging produces a distribution with lower variance than vertical averaging, and, under certain assumptions, greater variance than horizontal averaging. We provide empirical support for angular combining using weekly distributional forecasts of COVID-19 mortality at the national and state level in the U.S.
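The two extreme cases named in the abstract can be illustrated with a small sketch. It does not implement angular combining itself; it only compares equal-weight vertical averaging (averaging CDFs, i.e. a mixture) with horizontal averaging (averaging quantile functions) for two normal forecasts. The function names, the choice of normal distributions, and the grid size are assumptions made for illustration.

```python
from statistics import NormalDist

import numpy as np


def vertical_mean_var(dists):
    """Mean and variance of the equal-weight linear opinion pool (a mixture)."""
    means = np.array([d.mean for d in dists])
    variances = np.array([d.variance for d in dists])
    m = means.mean()
    # Mixture second moment minus squared mixture mean.
    v = (variances + means**2).mean() - m**2
    return m, v


def horizontal_mean_var(dists, n=20001):
    """Mean and variance of the averaged quantile function (midpoint rule)."""
    p = (np.arange(n) + 0.5) / n  # probability levels strictly inside (0, 1)
    q = np.mean([[d.inv_cdf(pi) for pi in p] for d in dists], axis=0)
    m = q.mean()
    v = ((q - m) ** 2).mean()
    return m, v


# Two hypothetical forecasts: N(0, 1^2) and N(2, 3^2).
forecasts = [NormalDist(0.0, 1.0), NormalDist(2.0, 3.0)]
v_mean, v_var = vertical_mean_var(forecasts)
h_mean, h_var = horizontal_mean_var(forecasts)
print(f"vertical:   mean={v_mean:.3f}, var={v_var:.3f}")
print(f"horizontal: mean={h_mean:.3f}, var={h_var:.3f}")
```

In this toy case both averages share the same mean, while the vertical average has the larger variance, which is consistent with the ordering the abstract describes for the two extremes.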
2. On Consistent Bayesian Inference from Synthetic Data
Authors: Ossi Räisä, Joonas Jälkö, Antti Honkela
Abstract: Generating synthetic data, with or without differential privacy, has attracted significant attention as a potential solution to the dilemma between making data easily available, and the privacy of data subjects. Several works have shown that consistency of downstream analyses from synthetic data, including accurate uncertainty estimation, requires accounting for the synthetic data generation. There are very few methods of doing so, most of them for frequentist analysis. In this paper, we study how to perform consistent Bayesian inference from synthetic data. We prove that mixing posterior samples obtained separately from multiple large synthetic datasets converges to the posterior of the downstream analysis under standard regularity conditions when the analyst's model is compatible with the data provider's model. We show experimentally that this works in practice, unlocking consistent Bayesian inference from synthetic data while reusing existing downstream analysis methods.
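The mixing step described in the abstract can be sketched as pooling posterior draws that were obtained separately from each synthetic dataset. This is only a minimal illustration: the function name `mix_posterior_samples` is hypothetical, the equal weighting is an assumption, and the draws below are random placeholders rather than output of a real downstream analysis.

```python
import numpy as np


def mix_posterior_samples(samples_per_dataset):
    """Pool posterior draws from several synthetic datasets with equal weight.

    Each element is an array of shape (n_draws, n_params). The same number of
    draws is taken from every dataset so that each contributes equally.
    """
    n = min(len(s) for s in samples_per_dataset)
    return np.concatenate([np.asarray(s)[:n] for s in samples_per_dataset], axis=0)


# Placeholder draws standing in for three separate downstream posteriors.
rng = np.random.default_rng(0)
draws = [rng.normal(loc=mu, scale=1.0, size=(500, 2)) for mu in (0.0, 0.1, -0.1)]
mixed = mix_posterior_samples(draws)
print(mixed.shape)
```

Because the pooling only concatenates draws, any existing sampler can be run unchanged on each synthetic dataset, which matches the abstract's point about reusing downstream analysis methods.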
3. A novel framework extending cause-effect inference methods to multivariate causal discovery
Authors: Hongyi Chen, Maurits Kaptein
Abstract: We focus on systematically extending bivariate causal learning methods to multivariate problem settings via a novel framework. Broadening the range of problems to which bivariate causal discovery approaches can be applied is worthwhile because, in contrast to traditional causal discovery methods, bivariate methods yield an estimate in the form of a causal Directed Acyclic Graph (DAG) rather than its completed partially directed acyclic graph (CPDAG). To this end, we propose an auxiliary framework that, together with any bivariate causal inference method, can identify and estimate the causal structure over more than two variables from observational data. In particular, we propose a local graphical structure in the causal graph that is identifiable by a given bivariate method and that can be exploited iteratively to discover the whole causal structure under certain assumptions. We show both theoretically and experimentally that the proposed framework can achieve sound results in causal learning problems.
4. On efficient covariate adjustment selection in causal effect estimation
Authors: Hongyi Chen, Maurits Kaptein
Abstract: To obtain unbiased and efficient estimators of causal effects from observational data, covariate selection for confounding adjustment is an important task in causal inference. Despite recent advances in graphical criteria for constructing valid and efficient adjustment sets, these methods often rely on assumptions that may not hold in practice. We examine the properties of existing graph-free covariate selection methods with respect to both validity and efficiency, highlighting the potential dangers of producing invalid adjustment sets when hidden variables are present. To address this issue, we propose a novel graph-free method, referred to as CMIO, adapted from Mixed Integer Optimization (MIO) with a set of causal constraints. Our results demonstrate that CMIO outperforms existing state-of-the-art methods and provides theoretically sound outputs. Furthermore, we present a revised version of CMIO capable of handling the scenario in which both causal sufficiency and graphical information are absent, offering efficient and valid covariate adjustments for causal inference.
5. Learning Causal Graphs via Monotone Triangular Transport Maps
Authors: Sina Akbari, Luca Ganassali, Negar Kiyavash
Abstract: We study the problem of causal structure learning from data using optimal transport (OT). Specifically, we first provide a constraint-based method which builds upon lower-triangular monotone parametric transport maps to design conditional independence tests which are agnostic to the noise distribution. We provide an algorithm for causal discovery up to Markov Equivalence with no assumptions on the structural equations/noise distributions, which allows for settings with latent variables. Our approach also extends to score-based causal discovery by providing a novel means for defining scores. This allows us to uniquely recover the causal graph under additional identifiability and structural assumptions, such as additive noise or post-nonlinear models. We provide experimental results to compare the proposed approach with the state of the art on both synthetic and real-world datasets.
6. Clip-OGD: An Experimental Design for Adaptive Neyman Allocation in Sequential Experiments
Authors: Jessica Dai, Paula Gradu, Christopher Harshaw
Abstract: From clinical development of cancer therapies to investigations into partisan bias, adaptive sequential designs have become an increasingly popular method for causal inference, as they offer the possibility of improved precision over their non-adaptive counterparts. However, even in simple settings (e.g. two treatments), the extent to which adaptive designs can improve precision is not sufficiently well understood. In this work, we study the problem of Adaptive Neyman Allocation in a design-based potential outcomes framework, where the experimenter seeks to construct an adaptive design which is nearly as efficient as the optimal (but infeasible) non-adaptive Neyman design, which has access to all potential outcomes. Motivated by connections to online optimization, we propose Neyman Ratio and Neyman Regret as two (equivalent) performance measures of adaptive designs for this problem. We present Clip-OGD, an adaptive design which achieves $\widetilde{O}(\sqrt{T})$ expected Neyman regret and thereby recovers the optimal Neyman variance in large samples. Finally, we construct a conservative variance estimator which facilitates the development of asymptotically valid confidence intervals. To complement our theoretical results, we conduct simulations using data from a microeconomic experiment.
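As a rough illustration of the non-adaptive benchmark mentioned in the abstract, the sketch below computes a Neyman-style treatment probability for two arms. It is not Clip-OGD. It assumes, beyond what the abstract states, that the design-dependent part of the estimator's variance is proportional to `s1**2 / p + s0**2 / (1 - p)`, where `s1` and `s0` are hypothetical summaries of outcome magnitude in the treated and control arms; under that assumption the minimizer is `s1 / (s1 + s0)`.

```python
import numpy as np


def neyman_probability(s1, s0):
    """Minimizer of s1^2/p + s0^2/(1-p) over p in (0, 1)."""
    return s1 / (s1 + s0)


def variance_proxy(p, s1, s0):
    """Assumed design-dependent part of the variance at treatment probability p."""
    return s1**2 / p + s0**2 / (1.0 - p)


# Hypothetical arm-level magnitudes: the treated arm is three times as variable.
s1, s0 = 3.0, 1.0
p_star = neyman_probability(s1, s0)

# Cross-check the closed form against a brute-force grid search.
grid = np.linspace(0.01, 0.99, 9801)
p_grid = grid[np.argmin(variance_proxy(grid, s1, s0))]
print(p_star, p_grid, variance_proxy(p_star, s1, s0))
```

The benchmark is infeasible in practice because `s1` and `s0` depend on all potential outcomes, which is why the abstract frames the goal as an adaptive design that approaches it over time.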