Mon, 03 Jul 2023
1.Statistical Inference on Multi-armed Bandits with Delayed Feedback
Authors:Lei Shi, Jingshen Wang, Tianhao Wu
Abstract: Multi armed bandit (MAB) algorithms have been increasingly used to complement or integrate with A/B tests and randomized clinical trials in e-commerce, healthcare, and policymaking. Recent developments incorporate possible delayed feedback. While existing MAB literature often focuses on maximizing the expected cumulative reward outcomes (or, equivalently, regret minimization), few efforts have been devoted to establish valid statistical inference approaches to quantify the uncertainty of learned policies. We attempt to fill this gap by providing a unified statistical inference framework for policy evaluation where a target policy is allowed to differ from the data collecting policy, and our framework allows delay to be associated with the treatment arms. We present an adaptively weighted estimator that on one hand incorporates the arm-dependent delaying mechanism to achieve consistency, and on the other hand mitigates the variance inflation across stages due to vanishing sampling probability. In particular, our estimator does not critically depend on the ability to estimate the unknown delay mechanism. Under appropriate conditions, we prove that our estimator converges to a normal distribution as the number of time points goes to infinity, which provides guarantees for large-sample statistical inference. We illustrate the finite-sample performance of our approach through Monte Carlo experiments.
2.Engression: Extrapolation for Nonlinear Regression?
Authors:Xinwei Shen, Nicolai Meinshausen
Abstract: Extrapolation is crucial in many statistical and machine learning applications, as it is common to encounter test data outside the training support. However, extrapolation is a considerable challenge for nonlinear models. Conventional models typically struggle in this regard: while tree ensembles provide a constant prediction beyond the support, neural network predictions tend to become uncontrollable. This work aims at providing a nonlinear regression methodology whose reliability does not break down immediately at the boundary of the training support. Our primary contribution is a new method called `engression' which, at its core, is a distributional regression technique for pre-additive noise models, where the noise is added to the covariates before applying a nonlinear transformation. Our experimental results indicate that this model is typically suitable for many real data sets. We show that engression can successfully perform extrapolation under some assumptions such as a strictly monotone function class, whereas traditional regression approaches such as least-squares regression and quantile regression fall short under the same assumptions. We establish the advantages of engression over existing approaches in terms of extrapolation, showing that engression consistently provides a meaningful improvement. Our empirical results, from both simulated and real data, validate these findings, highlighting the effectiveness of the engression method. The software implementations of engression are available in both R and Python.
3.Variable selection in a specific regression time series of counts
Abstract: Time series of counts occurring in various applications are often overdispersed, meaning their variance is much larger than the mean. This paper proposes a novel variable selection approach for processing such data. Our approach consists in modelling them using sparse negative binomial GLARMA models. It combines estimating the autoregressive moving average (ARMA) coefficients of GLARMA models and the overdispersion parameter with performing variable selection in regression coefficients of Generalized Linear Models (GLM) with regularised methods. We describe our three-step estimation procedure, which is implemented in the NBtsVarSel package. We evaluate the performance of the approach on synthetic data and compare it to other methods. Additionally, we apply our approach to RNA sequencing data. Our approach is computationally efficient and outperforms other methods in selecting variables, i.e. recovering the non-null regression coefficients.
4.Pareto optimal proxy metrics
Authors:Lee Richardson, Alessandro Zito, Dylan Greaves, Jacopo Soriano
Abstract: North star metrics and online experimentation play a central role in how technology companies improve their products. In many practical settings, however, evaluating experiments based on the north star metric directly can be difficult. The two most significant issues are 1) low sensitivity of the north star metric and 2) differences between the short-term and long-term impact on the north star metric. A common solution is to rely on proxy metrics rather than the north star in experiment evaluation and launch decisions. Existing literature on proxy metrics concentrates mainly on the estimation of the long-term impact from short-term experimental data. In this paper, instead, we focus on the trade-off between the estimation of the long-term impact and the sensitivity in the short term. In particular, we propose the Pareto optimal proxy metrics method, which simultaneously optimizes prediction accuracy and sensitivity. In addition, we give an efficient multi-objective optimization algorithm that outperforms standard methods. We applied our methodology to experiments from a large industrial recommendation system, and found proxy metrics that are eight times more sensitive than the north star and consistently moved in the same direction, increasing the velocity and the quality of the decisions to launch new features.
5.Vector Quantile Regression on Manifolds
Authors:Marco Pegoraro, Sanketh Vedula, Aviv A. Rosenberg, Irene Tallini, Emanuele Rodolà, Alex M. Bronstein
Abstract: Quantile regression (QR) is a statistical tool for distribution-free estimation of conditional quantiles of a target variable given explanatory features. QR is limited by the assumption that the target distribution is univariate and defined on an Euclidean domain. Although the notion of quantiles was recently extended to multi-variate distributions, QR for multi-variate distributions on manifolds remains underexplored, even though many important applications inherently involve data distributed on, e.g., spheres (climate measurements), tori (dihedral angles in proteins), or Lie groups (attitude in navigation). By leveraging optimal transport theory and the notion of $c$-concave functions, we meaningfully define conditional vector quantile functions of high-dimensional variables on manifolds (M-CVQFs). Our approach allows for quantile estimation, regression, and computation of conditional confidence sets. We demonstrate the approach's efficacy and provide insights regarding the meaning of non-Euclidean quantiles through preliminary synthetic data experiments.
6.Reliever: Relieving the Burden of Costly Model Fits for Changepoint Detection
Authors:Chengde Qian, Guanghui Wang, Changliang Zou
Abstract: We propose a general methodology Reliever for fast and reliable changepoint detection when the model fitting is costly. Instead of fitting a sequence of models for each potential search interval, Reliever employs a substantially reduced number of proxy/relief models that are trained on a predetermined set of intervals. This approach can be seamlessly integrated with state-of-the-art changepoint search algorithms. In the context of high-dimensional regression models with changepoints, we establish that the Reliever, when combined with an optimal search scheme, achieves estimators for both the changepoints and corresponding regression coefficients that attain optimal rates of convergence, up to a logarithmic factor. Through extensive numerical studies, we showcase the ability of Reliever to rapidly and accurately detect changes across a diverse range of parametric and nonparametric changepoint models.
7.Conditional partial exchangeability: a probabilistic framework for multi-view clustering
Authors:Beatrice Franzolini, Maria De Iorio, Johan Eriksson
Abstract: Standard clustering techniques assume a common configuration for all features in a dataset. However, when dealing with multi-view or longitudinal data, the clusters' number, frequencies, and shapes may need to vary across features to accurately capture dependence structures and heterogeneity. In this setting, classical model-based clustering fails to account for within-subject dependence across domains. We introduce conditional partial exchangeability, a novel probabilistic paradigm for dependent random partitions of the same objects across distinct domains. Additionally, we study a wide class of Bayesian clustering models based on conditional partial exchangeability, which allows for flexible dependent clustering of individuals across features, capturing the specific contribution of each feature and the within-subject dependence, while ensuring computational feasibility.