Methodology (stat.ME)
Fri, 02 Jun 2023
1. Alternative Measures of Direct and Indirect Effects
Authors: Jose M. Peña
Abstract: There are a number of measures of direct and indirect effects in the literature. They are suitable in some cases and unsuitable in others. We describe a case where the existing measures are unsuitable and propose new suitable ones. We also show that the new measures can partially handle unmeasured treatment-outcome confounding, and bound long-term effects by combining experimental and observational data.
2. Robust Bayesian Inference for Measurement Error Models
Authors: Charita Dellaporta, Theodoros Damoulas
Abstract: Measurement error occurs when a set of covariates influencing a response variable are corrupted by noise. This can lead to misleading inference outcomes, particularly in problems where accurately estimating the relationship between covariates and response variables is crucial, such as causal effect estimation. Existing methods for dealing with measurement error often rely on strong assumptions such as knowledge of the error distribution or its variance and availability of replicated measurements of the covariates. We propose a Bayesian Nonparametric Learning framework which is robust to mismeasured covariates, does not require the preceding assumptions, and is able to incorporate prior beliefs about the true error distribution. Our approach gives rise to two methods that are robust to measurement error via different loss functions: one based on the Total Least Squares objective and the other based on Maximum Mean Discrepancy (MMD). The latter allows for generalisation to non-Gaussian distributed errors and non-linear covariate-response relationships. We provide bounds on the generalisation error using the MMD-loss and showcase the effectiveness of the proposed framework versus prior art in real-world mental health and dietary datasets that contain significant measurement errors.
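The MMD loss mentioned above compares two samples through a kernel. As a rough, self-contained illustration (not the authors' implementation, and with an arbitrary bandwidth), the biased V-statistic estimate of the squared MMD under a Gaussian kernel can be written as:

```python
import math

def gaussian_kernel(x, y, bandwidth=1.0):
    # Gaussian (RBF) kernel on scalars; bandwidth is an illustrative choice.
    return math.exp(-(x - y) ** 2 / (2 * bandwidth ** 2))

def mmd_squared(xs, ys, bandwidth=1.0):
    # Biased (V-statistic) estimate of squared MMD between samples xs and ys:
    # mean k(x, x') + mean k(y, y') - 2 * mean k(x, y).
    kxx = sum(gaussian_kernel(a, b, bandwidth) for a in xs for b in xs) / len(xs) ** 2
    kyy = sum(gaussian_kernel(a, b, bandwidth) for a in ys for b in ys) / len(ys) ** 2
    kxy = sum(gaussian_kernel(a, b, bandwidth) for a in xs for b in ys) / (len(xs) * len(ys))
    return kxx + kyy - 2 * kxy
```

The estimate is exactly zero when the two samples coincide and grows as the samples separate, which is what makes it usable as a distribution-matching loss.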
3. Uncertainty Quantification in Bayesian Reduced-Rank Sparse Regressions
Authors: Maria F. Pintado, Matteo Iacopini, Luca Rossini, Alexander Y. Shestopaloff
Abstract: Reduced-rank regression recognises the possibility of a rank-deficient matrix of coefficients, which is particularly useful when the data is high-dimensional. We propose a novel Bayesian model for estimating the rank of the coefficient matrix, which obviates the need for post-processing steps and allows for uncertainty quantification. Our method employs a mixture prior on the regression coefficient matrix along with a global-local shrinkage prior on its low-rank decomposition. Then, we rely on the Signal Adaptive Variable Selector to perform sparsification, and define two novel tools: the Posterior Inclusion Probability uncertainty index and the Relevance Index. The validity of the method is assessed in a simulation study; its advantages and usefulness are then shown in real-data applications on the chemical composition of tobacco and on the photometry of galaxies.
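For intuition on what a rank constraint on the coefficient matrix buys, here is a sketch of the classical (non-Bayesian) reduced-rank estimator — OLS followed by SVD truncation. The mixture and shrinkage priors of the abstract are not reproduced; this only illustrates the low-rank structure the model targets.

```python
import numpy as np

def reduced_rank_fit(X, Y, rank):
    # Classical reduced-rank regression sketch: truncate the SVD of the
    # OLS coefficient matrix to the requested rank.
    B_ols, *_ = np.linalg.lstsq(X, Y, rcond=None)
    U, s, Vt = np.linalg.svd(B_ols, full_matrices=False)
    s[rank:] = 0.0  # keep only the leading singular directions
    return U @ np.diag(s) @ Vt

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
B_true = np.outer(rng.normal(size=5), rng.normal(size=3))  # rank-1 coefficients
Y = X @ B_true + 0.01 * rng.normal(size=(200, 3))
B_hat = reduced_rank_fit(X, Y, rank=1)
```

With low noise, the truncated estimate is exactly rank-deficient and close to the true coefficient matrix.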
4. Augmenting treatment arms with external data through propensity-score weighted power-priors: an application in expanded access
Authors: Tobias B. Polak, Jeremy A. Labrecque, Carin A. Uyl-de Groot, Joost van Rosmalen
Abstract: The incorporation of "real-world data" to supplement the analysis of trials and improve decision-making has spurred the development of statistical techniques to account for introduced confounding. Recently, "hybrid" methods have been developed through which measured confounding is first attenuated via propensity scores and unmeasured confounding is addressed through (Bayesian) dynamic borrowing. Most efforts to date have focused on augmenting control arms with historical controls. Here we consider augmenting treatment arms through "expanded access", which is a pathway of non-trial access to investigational medicine for patients with seriously debilitating or life-threatening illnesses. Motivated by a case study on expanded access, we developed a novel method (the ProPP) that provides a conceptually simple and easy-to-use combination of propensity score weighting and the modified power prior. Our weighting scheme is based on the estimation of the average treatment effect of the patients in the trial, with the constraint that external patients cannot receive higher weights than trial patients. The causal implications of the weighting scheme and propensity-score integrated approaches in general are discussed. In a simulation study our method compares favorably with existing (hybrid) borrowing methods in terms of precision and type-I error rate. We illustrate our method by jointly analysing individual patient data from the trial and expanded access program for vemurafenib to treat metastatic melanoma. Our method provides a double safeguard against prior-data conflict and forms a straightforward addition to evidence synthesis methods of trial and real-world (expanded access) data.
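The abstract's constraint — external patients never receive higher weights than trial patients — can be sketched as follows. The odds-style weight e/(1-e) shown here is the standard weight used when targeting the treatment effect in the trial population; whether this is exactly the ProPP scheme is an assumption, so treat it as illustrative only.

```python
def external_weights(propensity_scores):
    # Hypothetical weighting sketch: each external patient gets an
    # odds-style weight e / (1 - e), where e is the estimated probability
    # of being a trial patient, capped at 1 so that no external patient
    # can outweigh a trial patient (the abstract's constraint).
    return [min(e / (1.0 - e), 1.0) for e in propensity_scores]
```

For example, propensity scores of 0.2, 0.5 and 0.8 yield weights 0.25, 1.0 and 1.0: external patients who look very trial-like are capped at trial-patient weight rather than dominating the analysis.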
5. Fatigue detection via sequential testing of biomechanical data using martingale statistic
Authors: Rupsa Basu, Katharina Proksch
Abstract: Injuries to the knee joint are very common for long-distance and frequent runners, an issue which is often attributed to fatigue. We address the problem of fatigue detection from biomechanical data from different sources, consisting of lower extremity joint angles and ground reaction forces from running athletes, with the goal of better understanding the impact of fatigue on the biomechanics of runners in general and on an individual level. This is done by sequentially testing for change in a datastream using a simple martingale test statistic. Time-uniform probabilistic martingale bounds are provided which are used as thresholds for the test statistic. Sharp bounds can be developed by a hybrid of a piecewise-linear bound and a law-of-the-iterated-logarithm bound over all time regimes, where the probability of an early detection is controlled in a uniform way. If the underlying distribution of the data gradually changes over the course of a run, then a timely upcrossing of the martingale over these bounds is expected. The methods are developed for a setting when change sets in gradually in an incoming stream of data. Parameter selection for the bounds is based on simulations, and a methodological comparison is made with respect to existing advances. The algorithms presented here can be easily adapted to an online change-detection setting. Finally, we provide a detailed data analysis based on extensive measurements of several athletes and benchmark the fatigue detection results with the runners' individual feedback over the course of the data collection. Qualitative conclusions on the biomechanical profiles of the athletes can be made based on the shape of the martingale trajectories even in the absence of an upcrossing of the threshold.
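A toy version of this idea: track the running sum of deviations from the pre-change mean (a martingale under no change) and flag the first upcrossing of a law-of-iterated-logarithm-style envelope. The scale constant and the minimum sample size are illustrative choices, not the paper's tuned hybrid bound, and the small-n regime (where the paper uses a piecewise-linear bound) is simply skipped.

```python
import math
import random

def detect_change(stream, mean0=0.0, scale=3.0, min_n=20):
    # Sequential change detection sketch: s_n = sum of (x_i - mean0) is a
    # zero-mean martingale under the pre-change distribution; alarm when
    # |s_n| crosses a LIL-style bound scale * sqrt(2 n log log n).
    s = 0.0
    for n, x in enumerate(stream, start=1):
        s += x - mean0
        if n >= min_n:  # skip the small-n regime for simplicity
            bound = scale * math.sqrt(2 * n * math.log(math.log(n)))
            if abs(s) > bound:
                return n  # first upcrossing = detection time
    return None

# Synthetic stream: no change for 200 steps, then the mean shifts to 2.
random.seed(1)
data = [random.gauss(0, 1) for _ in range(200)] + [random.gauss(2, 1) for _ in range(200)]
alarm = detect_change(data)
```

On this stream the statistic drifts upward after the change point at step 200 and crosses the envelope shortly afterwards, while staying below it beforehand.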
6. On the minimum information checkerboard copulas under fixed Kendall's rank correlation
Authors: Issey Sukeda, Tomonari Sei
Abstract: Copulas have become very popular as a statistical model to represent dependence structures between multiple variables in many applications. Given a finite number of constraints in advance, the minimum information copula is the closest to the uniform copula when measured in Kullback-Leibler divergence. Previous research has mostly considered constraints on the expectations of moments such as Spearman's rho; the resulting copulas are obtained as the optimal solutions to convex programs. Other types of correlation, however, have not been studied in this context. In this paper, we present MICK, a novel minimum information copula where Kendall's rank correlation is specified. Although this copula is defined as the solution to a non-convex optimization problem, we show that its uniqueness is guaranteed when the correlation is small enough. We also show that the family of checkerboard copulas admits a representation as a non-orthogonal vector space. In doing so, we observe local and global dependencies of MICK, thereby unifying results on minimum information copulas.
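The quantity being fixed here is Kendall's rank correlation. Its sample version — the one the copula constraint mirrors at the population level — is the normalised difference between concordant and discordant pairs, sketched below (the copula optimization itself is not reproduced):

```python
def kendall_tau(xs, ys):
    # Sample Kendall's tau: (#concordant pairs - #discordant pairs)
    # divided by the total number of pairs n * (n - 1) / 2.
    n = len(xs)
    concordant = discordant = 0
    for i in range(n):
        for j in range(i + 1, n):
            s = (xs[i] - xs[j]) * (ys[i] - ys[j])
            if s > 0:
                concordant += 1
            elif s < 0:
                discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)
```

Perfectly concordant samples give tau = 1 and perfectly discordant samples give tau = -1; a checkerboard copula's tau is a smooth function of its cell masses, which is what makes the fixed-tau constraint non-linear (and the problem non-convex).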
7. Bayesian Segmentation Modeling of Epidemic Growth
Authors: Tejasv Bedi, Yanxun Xu, Qiwei Li
Abstract: Tracking the spread of infectious disease during a pandemic has posed a great challenge to governments and health sectors on a global scale. To facilitate informed public health decision-making, the concerned parties usually rely on short-term daily and weekly projections generated via predictive modeling. Several deterministic and stochastic epidemiological models, including growth and compartmental models, have been proposed in the literature. These models assume that an epidemic lasts a short duration and that the observed cases/deaths attain a single peak. However, some infectious diseases, such as COVID-19, extend over a longer duration than expected. Moreover, time-varying disease transmission rates due to government interventions have made the observed data multi-modal. To address these challenges, this work proposes stochastic epidemiological models under a unified Bayesian framework augmented by a change-point detection mechanism to account for multiple peaks. The Bayesian framework allows us to incorporate prior knowledge, such as dates of influential policy changes, to predict the change-point locations precisely. We develop a trans-dimensional reversible jump Markov chain Monte Carlo algorithm to sample the posterior distributions of epidemiological parameters while estimating the number of change points and the resulting parameters. The proposed method is evaluated and compared to alternative methods in terms of change-point detection, parameter estimation, and long-term forecasting accuracy on both simulated and COVID-19 data from several major states in the United States.
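The segmentation idea can be made concrete with a deliberately simplified mean curve: a multi-phase exponential whose growth rate switches at each change point. This is a hypothetical illustration of the segmented structure, not the paper's stochastic growth models or the reversible-jump sampler.

```python
import math

def piecewise_rate(t, change_points, rates):
    # rates[k] applies on the k-th segment; change_points must be sorted.
    k = sum(1 for cp in change_points if t >= cp)
    return rates[k]

def mean_curve(times, change_points, rates, y0=1.0):
    # Multi-phase exponential: integrate y' = rate(t) * y interval by
    # interval, using the rate in effect at each interval's left endpoint.
    ys = [y0]
    for a, b in zip(times, times[1:]):
        ys.append(ys[-1] * math.exp(piecewise_rate(a, change_points, rates) * (b - a)))
    return ys

# Growth at rate 0.1 until the change point at t = 5, flat afterwards:
# a single intervention turns an exponential rise into a plateau.
curve = mean_curve(list(range(11)), [5], [0.1, 0.0])
```

Inferring where the change points sit, and how many there are, is exactly the trans-dimensional problem the reversible-jump sampler addresses.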