Thu, 10 Aug 2023
1. TSLiNGAM: DirectLiNGAM under heavy tails
Authors: Sarah Leyder, Jakob Raymaekers, Tim Verdonck
Abstract: One of the established approaches to causal discovery consists of combining directed acyclic graphs (DAGs) with structural causal models (SCMs) to describe the functional dependencies of effects on their causes. Whether an SCM is identifiable from data depends on the assumptions made about the noise variables and the functional classes in the SCM. For instance, in the LiNGAM model, the functional class is restricted to linear functions and the disturbances have to be non-Gaussian. In this work, we propose TSLiNGAM, a new method for identifying the DAG of a causal model based on observational data. TSLiNGAM builds on DirectLiNGAM, a popular algorithm which uses simple OLS regression to identify causal directions between variables. TSLiNGAM leverages the non-Gaussianity assumption on the error terms in the LiNGAM model to obtain more efficient and robust estimation of the causal structure. TSLiNGAM is justified theoretically and studied empirically in an extensive simulation study. It performs significantly better on heavy-tailed and skewed data and demonstrates high small-sample efficiency. In addition, TSLiNGAM shows better robustness properties, as it is more resilient to contamination.
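For intuition, the core DirectLiNGAM step — regress each variable on the other and favor the direction whose OLS residual looks independent of the regressor, which is detectable precisely because the noise is non-Gaussian — can be sketched in a two-variable toy example. This is a simplified illustration using a crude cubed-transform dependence score, not the actual DirectLiNGAM or TSLiNGAM estimators:

```python
import random

def mean(v):
    return sum(v) / len(v)

def cov(a, b):
    ma, mb = mean(a), mean(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / len(a)

def dependence(a, b):
    # Crude nonlinear-dependence score: covariance between a cubed
    # transform of the regressor and the residual. An independent
    # residual gives a score near zero; a dependent one does not.
    return abs(cov([x ** 3 for x in a], b))

def causal_direction(x, y):
    # Regress each variable on the other by OLS and keep the residuals.
    b_yx = cov(x, y) / cov(x, x)                 # slope of y ~ x
    r_y = [yi - b_yx * xi for xi, yi in zip(x, y)]
    b_xy = cov(x, y) / cov(y, y)                 # slope of x ~ y
    r_x = [xi - b_xy * yi for xi, yi in zip(x, y)]
    # In the true direction the residual is independent of the regressor
    # (for non-Gaussian noise); pick the direction with less dependence.
    return "x->y" if dependence(x, r_y) < dependence(y, r_x) else "y->x"

random.seed(0)
n = 20000
x = [random.uniform(-1, 1) for _ in range(n)]     # non-Gaussian cause
y = [2 * xi + random.uniform(-1, 1) for xi in x]  # linear effect
print(causal_direction(x, y))                     # prints "x->y"
```

In the Gaussian case both dependence scores would vanish and the direction would be unidentifiable, which is exactly why the LiNGAM model requires non-Gaussian disturbances.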
2. A Forecaster's Review of Judea Pearl's Causality: Models, Reasoning and Inference, Second Edition, 2009
Abstract: Given the wide popularity and success of Judea Pearl's original causality book, this review covers the main topics updated in the 2009 second edition and illustrates an easy-to-follow causal inference strategy in a forecasting scenario. It further discusses potential benefits and challenges of causal inference with time series forecasting when modeling counterfactuals, estimating uncertainty, and incorporating prior knowledge to estimate causal effects in different forecasting scenarios.
3. Optimally weighted average derivative effects
Authors: Oliver Hines, Karla Diaz-Ordaz, Stijn Vansteelandt
Abstract: Inference for weighted average derivative effects (WADEs) usually relies on kernel density estimators, which introduce complicated bandwidth-dependent biases. By considering a new class of Riesz representers, we propose WADEs which require estimating conditional expectations only, and we derive an optimally efficient WADE, which is also connected to projection parameters in partially linear models. We derive efficient estimators under the nonparametric model, which are amenable to machine learning of working models. We propose novel learning strategies based on the R-learner. We perform a simulation study and apply our estimators to determine the effect of Warfarin dose on blood clotting function.
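For context, the classical density-weighted formulation (standard notation, not taken from the abstract) shows where kernel density estimation enters. With $m(x) = E[Y \mid X = x]$, weight function $w$, and covariate density $f$:

```latex
% Classical WADE and its integration-by-parts (Riesz) form, assuming
% w(x) m(x) f(x) vanishes at the boundary of the support:
\theta_w = \mathbb{E}\{ w(X)\, m'(X) \}
         = -\,\mathbb{E}\!\left[ Y \left\{ w'(X) + w(X)\,\frac{f'(X)}{f(X)} \right\} \right]
```

The classical Riesz representer thus involves the density score $f'/f$, whose estimation is bandwidth-sensitive; the abstract's contribution is a class of representers that avoids this by requiring conditional expectations only.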
4. Filtering Dynamical Systems Using Observations of Statistics
Authors: Eviatar Bach, Tim Colonius, Andrew Stuart
Abstract: We consider the problem of filtering dynamical systems, possibly stochastic, using observations of statistics. Thus the computational task is to estimate a time-evolving density $\rho(v, t)$ given noisy observations of $\rho$; this contrasts with the standard filtering problem based on observations of the state $v$. The task is naturally formulated as an infinite-dimensional filtering problem in the space of densities $\rho$. However, for the purposes of tractability, we seek algorithms in state space; specifically we introduce a mean field state space model and, using interacting particle system approximations to this model, we propose an ensemble method. We refer to the resulting methodology as the ensemble Fokker-Planck filter (EnFPF). Under certain restrictive assumptions we show that the EnFPF approximates the Kalman-Bucy filter for the Fokker-Planck equation, which is the exact solution of the infinite-dimensional filtering problem; our numerical experiments show that the methodology is useful beyond this restrictive setting. Specifically the experiments show that the EnFPF is able to correct ensemble statistics, to accelerate convergence to the invariant density for autonomous systems, and to accelerate convergence to time-dependent invariant densities for non-autonomous systems. We discuss possible applications of the EnFPF to climate ensembles and to turbulence modelling.
5. Quantile regression outcome-adaptive lasso: variable selection for causal quantile treatment effect estimation
Authors: Yahang Liu, Kecheng Wei, Chen Huang, Yongfu Yu, Guoyou Qin
Abstract: Quantile treatment effects (QTEs) can characterize the potentially heterogeneous causal effect of a treatment on different points of the entire outcome distribution. Propensity score (PS) methods are commonly employed to estimate QTEs in non-randomized studies. Empirical and theoretical studies have shown that insufficient or unnecessary adjustment for covariates in PS models can lead to bias and efficiency loss in estimating treatment effects. Striking a balance between bias and efficiency through variable selection is a crucial concern in causal inference. It is essential to acknowledge that the covariates related to the treatment and the outcome may vary across different quantiles of the outcome distribution. However, previous studies have overlooked adjusting for different covariates separately in the PS models when estimating different QTEs. In this article, we propose the quantile regression outcome-adaptive lasso (QROAL) method to select covariates that can provide unbiased and efficient estimates of QTEs. A distinctive feature of the proposed method is the use of linear quantile regression models to construct penalty weights, enabling covariate selection in PS models separately when estimating different QTEs. We conducted simulation studies to show the superiority of the proposed method over the outcome-adaptive lasso (OAL) method in variable selection. Moreover, the proposed method exhibited favorable performance compared to the OAL method in terms of root mean square error in a range of settings, including both homogeneous and heterogeneous scenarios. Additionally, we applied the QROAL method to datasets from the China Health and Retirement Longitudinal Study (CHARLS) to explore the impact of smoking status on the severity of depression symptoms.
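Schematically (notation ours, following the outcome-adaptive lasso construction; the exact form used by QROAL may differ), the PS model coefficients $\alpha$ are estimated with an adaptive penalty whose weights come from a linear quantile regression at the quantile $\tau$ of interest:

```latex
% \hat{\beta}(\tau): coefficients of a linear quantile regression of Y on X at quantile \tau
% \ell(\alpha): log-likelihood of the logistic PS model for treatment A given X
\hat{\alpha}(\tau) = \arg\min_{\alpha} \; -\ell(\alpha)
    + \lambda \sum_{j} \bigl|\hat{\beta}_j(\tau)\bigr|^{-\gamma} \, |\alpha_j|
```

Covariates with small quantile-regression coefficients at $\tau$ receive large penalties and tend to be dropped from the PS model for that quantile, so different covariate sets can be selected when estimating different QTEs.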
6. Rank tests for outlier detection
Authors: Chiara G. Magnani, Aldo Solari
Abstract: In novelty detection, the objective is to determine whether the test sample contains any outliers, using a sample of controls (inliers). This involves many-to-one comparisons of individual test points against the control sample. A recent approach applies the Benjamini-Hochberg procedure to the conformal $p$-values resulting from these comparisons, ensuring false discovery rate control. In this paper, we suggest using Wilcoxon-Mann-Whitney tests for the comparisons and subsequently applying the closed testing principle to derive post-hoc confidence bounds for the number of outliers in any subset of the test sample. We revisit an elegant result that, under a nonparametric alternative known as Lehmann's alternative, the Wilcoxon-Mann-Whitney test is locally most powerful among rank tests. By combining this result with a simple observation, we demonstrate that the proposed procedure is more powerful for the null hypothesis of no outliers than the Benjamini-Hochberg procedure applied to conformal $p$-values.
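The benchmark procedure in the abstract — the Benjamini-Hochberg procedure applied to conformal $p$-values — can be sketched as follows. This is a toy illustration with made-up nonconformity scores, not the paper's proposed rank-test procedure:

```python
def conformal_pvalues(calib, test):
    # Conformal p-value: p = (1 + #{calibration scores >= test score}) / (n + 1);
    # larger scores are treated as more outlying.
    n = len(calib)
    return [(1 + sum(c >= t for c in calib)) / (n + 1) for t in test]

def benjamini_hochberg(pvals, q=0.1):
    # Return the indices rejected by the BH step-up procedure at level q.
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    k = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= rank * q / m:
            k = rank
    return sorted(order[:k])

# Toy example: calibration (inlier) scores, plus a test set whose last
# two points have clearly outlying scores.
calib = [0.1 * i for i in range(1, 100)]      # scores in (0, 9.9)
test = [1.0, 4.2, 25.0, 30.0]
pvals = conformal_pvalues(calib, test)
print(benjamini_hochberg(pvals, q=0.1))       # flags the two large scores
```

The paper's proposal replaces these per-point comparisons with Wilcoxon-Mann-Whitney rank tests and uses closed testing to obtain post-hoc bounds on the number of outliers in any subset of the test sample.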
7. Optimal Designs for Two-Stage Inference
Authors: Jonathan W. Stallrich, Michael McKibben
Abstract: The analysis of screening experiments is often done in two stages, starting with factor selection via an analysis under a main effects model. The success of this first stage is influenced by three components: (1) the variances and (2) the biases of the main effect estimators, and (3) the estimate of the noise variance. Component (3) has only recently received attention, with design techniques that ensure an unbiased estimate of the noise variance. In this paper, we propose a design criterion based on expected confidence intervals of the first-stage analysis that balances all three components. To address model misspecification, we propose a computationally efficient all-subsets analysis and a corresponding constrained design criterion based on lack of fit. Scenarios found in the existing design literature are revisited with our criteria, and new designs are provided that improve upon existing methods.
8. Bayesian Record Linkage with Variables in One File
Authors: Gauri Kamat, Mingyang Shan, Roee Gutman
Abstract: In many healthcare and social science applications, information about units is dispersed across multiple data files. Linking records across files is necessary to estimate associations between variables exclusive to each of the files. Common record linkage algorithms rely only on similarities between linking variables that appear in all the files. Moreover, analysis of linked files often ignores errors that may arise from incorrect or missed links. Bayesian record linkage methods allow for natural propagation of linkage error by jointly sampling the linkage structure and the model parameters. We extend an existing Bayesian record linkage method to integrate associations between variables exclusive to each file being linked. We show analytically, and using simulations, that the proposed method improves the linking process and results in accurate inferences. We apply the method to link Meals on Wheels recipients to Medicare Enrollment records.