Mon, 31 Jul 2023
1.Approximating Counterfactual Bounds while Fusing Observational, Biased and Randomised Data Sources
Authors:Marco Zaffalon, Alessandro Antonucci, Rafael Cabañas, David Huber
Abstract: We address the problem of integrating data from multiple, possibly biased, observational and interventional studies, to eventually compute counterfactuals in structural causal models. We start from the case of a single observational dataset affected by a selection bias. We show that the likelihood of the available data has no local maxima. This enables us to use the causal expectation-maximisation scheme to approximate the bounds for partially identifiable counterfactual queries, which are the focus of this paper. We then show how the same approach can address the general case of multiple datasets, no matter whether interventional or observational, biased or unbiased, by remapping it into the former one via graphical transformations. Systematic numerical experiments and a case study on palliative care show the effectiveness of our approach, while hinting at the benefits of fusing heterogeneous data sources to get informative outcomes in case of partial identifiability.
2.A Spectral Approach for the Dynamic Bradley-Terry Model
Authors:Xin-Yu Tian, Jian Shi, Xiaotong Shen, Kai Song
Abstract: The dynamic ranking, due to its increasing importance in many applications, is becoming crucial, especially with the collection of voluminous time-dependent data. One such application is sports statistics, where dynamic ranking aids in forecasting the performance of competitive teams, drawing on historical and current data. Despite its usefulness, predicting and inferring rankings pose challenges in environments necessitating time-dependent modeling. This paper introduces a spectral ranker called Kernel Rank Centrality, designed to rank items based on pairwise comparisons over time. The ranker operates via kernel smoothing in the Bradley-Terry model, utilizing a Markov chain model. Unlike the maximum likelihood approach, the spectral ranker is nonparametric, demands fewer model assumptions and computations, and allows for real-time ranking. We establish the asymptotic distribution of the ranker by applying an innovative group inverse technique, resulting in a uniform and precise entrywise expansion. This result allows us to devise a new inferential method for predictive inference, previously unavailable in existing approaches. Our numerical examples showcase the ranker's utility in predictive accuracy and constructing an uncertainty measure for prediction, leveraging data from the National Basketball Association (NBA). The results underscore our method's potential compared to the gold standard in sports, the Arpad Elo rating system.
3.Clustering multivariate functional data based on new definitions of the epigraph and hypograph indexes
Authors:Belén Pulido, Alba M. Franco-Pereira, Rosa E. Lillo
Abstract: The proliferation of data generation has spurred advancements in functional data analysis. With the ability to analyze multiple variables simultaneously, the demand for working with multivariate functional data has increased. This study proposes a novel formulation of the epigraph and hypograph indexes, as well as their generalized expressions, specifically tailored for the multivariate functional context. These definitions take into account the interrelations between components. Furthermore, the proposed indexes are employed to cluster multivariate functional data. In the clustering process, the indexes are applied to both the data and their first and second derivatives. This generates a reduced-dimension dataset from the original multivariate functional data, enabling the application of well-established multivariate clustering techniques that have been extensively studied in the literature. This methodology has been tested through simulated and real datasets, performing comparative analyses against state-of-the-art to assess its performance.
4.Forster-Warmuth Counterfactual Regression: A Unified Learning Approach
Authors:Yachong Yang, Arun Kumar Kuchibhotla, Eric Tchetgen Tchetgen
Abstract: Series or orthogonal basis regression is one of the most popular non-parametric regression techniques in practice, obtained by regressing the response on features generated by evaluating the basis functions at observed covariate values. The most routinely used series estimator is based on ordinary least squares fitting, which is known to be minimax rate optimal in various settings, albeit under stringent restrictions on the basis functions and the distribution of covariates. In this work, inspired by the recently developed Forster-Warmuth (FW) learner, we propose an alternative series regression estimator that can attain the minimax estimation rate under strictly weaker conditions imposed on the basis functions and the joint law of covariates, than existing series estimators in the literature. Moreover, a key contribution of this work generalizes the FW-learner to a so-called counterfactual regression problem, in which the response variable of interest may not be directly observed (hence, the name ``counterfactual'') on all sampled units, and therefore needs to be inferred in order to identify and estimate the regression in view from the observed data. Although counterfactual regression is not entirely a new area of inquiry, we propose the first-ever systematic study of this challenging problem from a unified pseudo-outcome perspective. In fact, we provide what appears to be the first generic and constructive approach for generating the pseudo-outcome (to substitute for the unobserved response) which leads to the estimation of the counterfactual regression curve of interest with small bias, namely bias of second order. Several applications are used to illustrate the resulting FW-learner including many nonparametric regression problems in missing data and causal inference literature, for which we establish high-level conditions for minimax rate optimality of the proposed FW-learner.