Methodology (stat.ME)
Wed, 13 Sep 2023
1.Empirical Bayes Double Shrinkage for Combining Biased and Unbiased Causal Estimates
Authors:Evan T. R. Rosenman, Francesca Dominici, Luke Miratrix
Abstract: Motivated by the proliferation of observational datasets and the need to integrate non-randomized evidence with randomized controlled trials, causal inference researchers have recently proposed several new methodologies for combining biased and unbiased estimators. We contribute to this growing literature by developing a new class of estimators for the data-combination problem: double-shrinkage estimators. Double-shrinkers first compute a data-driven convex combination of the biased and unbiased estimators, and then apply a final, Stein-like shrinkage toward zero. Such estimators do not require hyperparameter tuning and are targeted at multidimensional causal estimands, such as vectors of conditional average treatment effects (CATEs). We derive several workable versions of double-shrinkage estimators and propose a method for constructing valid Empirical Bayes confidence intervals. We also demonstrate the utility of our estimators using simulations on data from the Women's Health Initiative.
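The two-step structure described above can be sketched as follows. This is not the authors' estimator: as an assumption, step 1 uses a positive-part James-Stein factor to pull the unbiased estimate toward the biased one (which yields a convex combination), and step 2 reuses the unbiased estimator's variances as the noise scale for the shrinkage toward zero. The names `shrink_toward` and `double_shrink` are hypothetical, and the sketch needs at least three CATEs.

```python
import numpy as np

def shrink_toward(target, x, noise_trace):
    """Positive-part James-Stein step: move x toward target by a data-driven amount."""
    diff = x - target
    sq = float(diff @ diff)
    # factor lies in [0, 1], so the result is a convex combination of x and target
    factor = max(0.0, 1.0 - noise_trace / sq) if sq > 0 else 0.0
    return target + factor * diff

def double_shrink(tau_u, tau_b, var_u):
    """Illustrative double-shrinker for K >= 3 CATEs (hypothetical, not the paper's form).

    tau_u: unbiased estimates (e.g. from an RCT), var_u: their sampling variances,
    tau_b: possibly biased estimates (e.g. from observational data).
    """
    tau_u, tau_b, var_u = (np.asarray(v, dtype=float) for v in (tau_u, tau_b, var_u))
    # Step 1: convex combination, shrinking the unbiased estimate toward the biased one
    combined = shrink_toward(tau_b, tau_u, var_u.sum())
    # Step 2: Stein-like shrinkage toward zero; reusing var_u here is a simplifying assumption
    k = combined.size
    return shrink_toward(np.zeros(k), combined, (k - 2) * var_u.mean())

tau_u = np.array([0.30, -0.10, 0.25, 0.05])
tau_b = np.array([0.20, 0.00, 0.20, 0.10])
var_u = np.full(4, 0.01)
print(double_shrink(tau_u, tau_b, var_u))
```

Neither step has a tuning parameter: both shrinkage factors are computed from the data and the assumed variances, which mirrors the "no hyperparameter tuning" property claimed in the abstract.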
2.Bayesian jackknife empirical likelihood with complex surveys
Authors:Mengdong Shang, Xia Chen
Abstract: We introduce a novel approach, the Bayesian jackknife empirical likelihood method, for analyzing survey data collected under various unequal-probability sampling designs. The method is particularly suited to parameters defined by U-statistics. We prove that, under a non-informative prior, the Bayesian jackknife pseudo-empirical likelihood ratio statistic converges asymptotically to a normal distribution. This statistic can be used to construct confidence intervals for complex survey samples. We investigate several scenarios, including the presence or absence of auxiliary information and the use of design weights or calibration weights. Numerical studies assess the performance of the Bayesian jackknife pseudo-empirical likelihood ratio confidence intervals in terms of coverage probability and tail error rates. Our findings show that the proposed methods outperform those based solely on the jackknife pseudo-empirical likelihood, addressing its limitations.
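For orientation, the non-Bayesian building block can be sketched for a simple random sample: compute jackknife pseudo-values of a U-statistic, then apply Owen's empirical likelihood for their mean. This sketch deliberately omits design weights, calibration and the prior, so it is not the proposed Bayesian method. The variance kernel, the function names and the chi-square(1) calibration at 3.841 are illustrative choices.

```python
import numpy as np

def variance_u_stat(x):
    """U-statistic with kernel h(a, b) = (a - b)^2 / 2; equals the unbiased sample variance."""
    x = np.asarray(x, dtype=float)
    n = x.size
    pair_terms = (x[:, None] - x[None, :]) ** 2 / 2.0  # diagonal terms are zero
    return pair_terms.sum() / (n * (n - 1))

def pseudo_values(x, stat):
    """Jackknife pseudo-values V_i = n*U_n - (n-1)*U_{n-1}^(-i)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    loo = np.array([stat(np.delete(x, i)) for i in range(n)])
    return n * stat(x) - (n - 1) * loo

def neg2_log_el(v, theta, iters=200):
    """-2 log empirical likelihood ratio for the mean of v at theta (Owen's EL)."""
    z = np.asarray(v, dtype=float) - theta
    if z.min() >= 0 or z.max() <= 0:
        return np.inf  # theta outside the convex hull of the pseudo-values
    lo, hi = -1.0 / z.max(), -1.0 / z.min()
    # sum z / (1 + lam z) is strictly decreasing on (lo, hi); bisect for its root
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if np.sum(z / (1.0 + mid * z)) > 0:
            lo = mid
        else:
            hi = mid
    lam = 0.5 * (lo + hi)
    return float(2.0 * np.sum(np.log1p(lam * z)))

rng = np.random.default_rng(1)
x = rng.exponential(size=60)  # simulated simple random sample
v = pseudo_values(x, variance_u_stat)
# 95% interval: all theta with -2 log R(theta) <= 3.841, the chi-square(1) 0.95 quantile
grid = np.linspace(v.min(), v.max(), 400)
inside = [t for t in grid if neg2_log_el(v, t) <= 3.841]
print(min(inside), max(inside))
```

The pseudo-values turn a U-statistic into an approximately independent-looking sample mean, which is what lets plain empirical likelihood apply. The survey-weighted pseudo-empirical likelihood in the abstract replaces the equal weights implicit here.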
3.Spatial autoregressive fractionally integrated moving average model
Authors:Philipp Otto, Philipp Sibbertsen
Abstract: In this paper, we introduce the concept of fractional integration for spatial autoregressive models. We show that the range of the dependence can be spatially extended or diminished by adding a fractional integration parameter to spatial autoregressive moving average (SARMA) models. We call the new model the spatial autoregressive fractionally integrated moving average model, or sp-ARFIMA for short. We show its relation to time-series ARFIMA models and to (higher-order) spatial autoregressive models. Moreover, we introduce an estimation procedure based on the maximum-likelihood principle and analyse it in a series of simulation studies. Finally, the use of the model is illustrated by an empirical example on atmospheric fine particles, measured as aerosol optical thickness, which is important in weather, climate and environmental science.
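The fractional operator at the heart of the idea can be illustrated in isolation. The abstract does not give the model equations, so the following are all assumptions: the operator takes the form (I - W)^d, W is a symmetric nearest-neighbour matrix rescaled so its eigenvalues lie in (-1, 1) (which keeps the fractional power real), and the SAR and MA parts are omitted. Only the fractionally integrated noise y = (I - W)^{-d} eps is simulated. The names `line_weights` and `frac_operator` are hypothetical.

```python
import numpy as np

def line_weights(n, scale=0.9):
    """Symmetric nearest-neighbour weights on n sites along a transect,
    rescaled so every eigenvalue lies in (-1, 1)."""
    w = np.zeros((n, n))
    i = np.arange(n - 1)
    w[i, i + 1] = 1.0
    w[i + 1, i] = 1.0
    return scale * w / np.max(np.abs(np.linalg.eigvalsh(w)))

def frac_operator(w, d):
    """(I - W)^d for symmetric W with spectrum in (-1, 1), via eigendecomposition."""
    vals, vecs = np.linalg.eigh(w)
    return (vecs * (1.0 - vals) ** d) @ vecs.T

n, d = 100, 0.3
w = line_weights(n)
rng = np.random.default_rng(2)
eps = rng.normal(size=n)
y = frac_operator(w, -d) @ eps  # fractionally integrated noise: (I - W)^d y = eps

# Correlation between sites 50 and 60 for several d (Cov(y) = (I - W)^(-2d) for unit-variance eps)
for d_val in (0.1, 0.3, 0.45):
    cov = frac_operator(w, -2 * d_val)
    print(d_val, cov[50, 60] / np.sqrt(cov[50, 50] * cov[60, 60]))
```

With d = 1 the operator reduces to the ordinary spatial filter I - W, and with d = 0 to the identity; intermediate values of d interpolate between them, which is one way to see how a single parameter can stretch or shrink the spatial range of dependence.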
4.CARE: Large Precision Matrix Estimation for Compositional Data
Authors:Shucong Zhang, Huiyuan Wang, Wei Lin
Abstract: High-dimensional compositional data are prevalent in many applications. The simplex constraint poses intrinsic challenges to inferring the conditional dependence relationships among the components forming a composition, as encoded by a large precision matrix. We introduce a precise specification of the compositional precision matrix and relate it to its basis counterpart, which is shown to be asymptotically identifiable under suitable sparsity assumptions. By exploiting this connection, we propose a composition adaptive regularized estimation (CARE) method for estimating the sparse basis precision matrix. We derive rates of convergence for the estimator and provide theoretical guarantees on support recovery and data-driven parameter tuning. Our theory reveals an intriguing trade-off between identification and estimation, thereby highlighting the blessing of dimensionality in compositional data analysis. In particular, in sufficiently high dimensions, the CARE estimator achieves minimax optimality and performs as well as if the basis were observed. We further discuss how our framework can be extended to handle data containing zeros, including sampling zeros and structural zeros. The advantages of CARE over existing methods are illustrated by simulation studies and an application to inferring microbial ecological networks in the human gut.
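The intrinsic challenge posed by the simplex constraint can be made concrete with the centred log-ratio (clr) transform. This sketch does not implement CARE. It simulates a hypothetical log-normal basis, observes only its composition, and checks the algebraic identity Gamma = G Sigma G linking the clr covariance Gamma to the basis covariance Sigma, where G = I - 11'/p. Because Gamma annihilates the all-ones vector, it cannot simply be inverted to obtain a precision matrix.

```python
import numpy as np

def clr(comp):
    """Centred log-ratio transform applied to each row of a positive composition matrix."""
    log_c = np.log(comp)
    return log_c - log_c.mean(axis=1, keepdims=True)

rng = np.random.default_rng(3)
n, p = 200, 30
log_basis = rng.normal(size=(n, p))              # hypothetical log absolute abundances
basis = np.exp(log_basis)
comp = basis / basis.sum(axis=1, keepdims=True)  # only the composition is observed

gamma = np.cov(clr(comp), rowvar=False)          # compositional (clr) covariance
sigma = np.cov(log_basis, rowvar=False)          # basis covariance, unobservable in practice
g = np.eye(p) - np.ones((p, p)) / p              # centring matrix G = I - 11'/p

# Simplex constraint: gamma = G sigma G, so gamma @ 1 = 0 and gamma has rank p - 1
print(np.allclose(gamma, g @ sigma @ g), np.linalg.matrix_rank(gamma))
```

The identity holds exactly because the closure step (division by the row total) cancels in the clr transform. The rank deficiency is what forces methods such as CARE to target the basis precision matrix through sparsity assumptions rather than by direct inversion.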
5.Basket trial designs based on power priors
Authors:Lukas Baumann, Lukas Sauer, Meinhard Kieser
Abstract: In basket trials, a treatment is investigated in several subgroups. They are used mainly in oncology in early clinical phases, as single-arm trials with a binary endpoint. For their analysis, mostly Bayesian methods have been suggested, as they allow partial sharing of information based on the observed similarity between subgroups. Fujikawa et al. (2020) suggested an empirical Bayes approach that allows flexible sharing based on easily interpretable weights derived from the Jensen-Shannon divergence between the subgroup-wise posterior distributions. We show that this design is closely related to the method of power priors and investigate several modifications of Fujikawa's design using methods from the power prior literature. While in Fujikawa's design the amount of information shared between two baskets is determined only by their pairwise similarity, we also discuss extensions in which the outcomes of all baskets are considered when computing the sharing weights. The results of our comparison study show that the power prior design performs comparably to fully Bayesian designs in a range of different scenarios. At the same time, the power prior design is computationally cheap and even allows analytical computation of operating characteristics in some settings.
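A sketch of pairwise Jensen-Shannon sharing follows. It rests on our reading of the design, not the authors' code: each basket has a Beta(1, 1)-prior posterior; the weight between baskets i and j is (1 - JSD)^eps, set to zero below a threshold tau; and basket i's posterior adds the other baskets' responders and non-responders multiplied by w[i, j]. That last step is exactly a power prior with exponent w[i, j]. The values eps = 2 and tau = 0.3 are illustrative, the JSD is computed numerically on a grid, and all function names are hypothetical.

```python
import numpy as np
from math import lgamma

GRID = np.linspace(0.0005, 0.9995, 2000)  # response-rate grid for numerical integration

def beta_on_grid(a, b):
    """Beta(a, b) density evaluated on GRID, normalised to sum to one."""
    log_pdf = (lgamma(a + b) - lgamma(a) - lgamma(b)
               + (a - 1) * np.log(GRID) + (b - 1) * np.log1p(-GRID))
    pdf = np.exp(log_pdf)
    return pdf / pdf.sum()

def kl_bits(p, q):
    mask = p > 0
    return float(np.sum(p[mask] * np.log2(p[mask] / q[mask])))

def jsd(p, q):
    """Jensen-Shannon divergence in bits, so it lies in [0, 1]."""
    m = 0.5 * (p + q)
    return 0.5 * kl_bits(p, m) + 0.5 * kl_bits(q, m)

def sharing_weights(responders, sizes, a=1.0, b=1.0, eps=2.0, tau=0.3):
    """Pairwise weights (1 - JSD)^eps, set to 0 below tau; eps and tau are illustrative."""
    post = [beta_on_grid(a + r, b + n - r) for r, n in zip(responders, sizes)]
    k = len(post)
    w = np.ones((k, k))
    for i in range(k):
        for j in range(i + 1, k):
            w[i, j] = w[j, i] = (1.0 - jsd(post[i], post[j])) ** eps
    w[w < tau] = 0.0
    return w

def borrowed_posteriors(responders, sizes, w, a=1.0, b=1.0):
    """Beta parameters per basket: other baskets' likelihoods enter raised to w[i, j]
    (power-prior form)."""
    r = np.asarray(responders, dtype=float)
    n = np.asarray(sizes, dtype=float)
    return a + w @ r, b + w @ (n - r)

responders, sizes = [2, 3, 9, 10], [20, 20, 20, 20]
w = sharing_weights(responders, sizes)
alpha, beta_ = borrowed_posteriors(responders, sizes, w)
print(np.round(w, 2))
print(alpha / (alpha + beta_))  # posterior mean response rate per basket
```

Everything here is closed-form Beta updating plus a one-dimensional numerical integral per basket pair, which illustrates why such a design is computationally cheap compared with fully Bayesian hierarchical models fitted by MCMC.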
6.An adaptive functional regression framework for spatially heterogeneous signals in spectroscopy
Authors:Federico Ferraccioli, Alessandro Casa, Marco Stefanucci
Abstract: Attention to food product characteristics, such as nutritional properties and traceability, has risen substantially in recent years. Consequently, there is increasing demand for modern tools to monitor, analyse and assess food quality and authenticity. Within this framework, vibrational spectroscopy provides an essential set of data collection techniques. Methods such as Fourier-transform near-infrared and mid-infrared spectroscopy have often been used to analyse different foodstuffs. Nonetheless, existing statistical methods often struggle with the challenges posed by spectral data, such as their high dimensionality paired with strong correlations among wavelengths. Therefore, statistical procedures that account for the peculiarities of spectroscopy data are essential. In this work, motivated by two dairy science applications, we propose an adaptive functional regression framework for spectroscopy data. The method stems from the trend filtering literature, yielding a highly flexible and adaptive estimator able to handle different degrees of smoothness. We provide a fast optimization procedure that is suitable for both Gaussian and non-Gaussian scalar responses and allows for the inclusion of scalar covariates. Moreover, we develop inferential procedures for both the functional and the scalar components, thus enhancing not only the interpretability of the results but also their usability in real-world scenarios. The method is applied to two sets of MIR spectroscopy data, providing excellent results when predicting milk chemical composition and cows' dietary treatments. Moreover, the developed inferential routine provides relevant insights, potentially paving the way for a richer interpretation and a better understanding of the impact of specific wavelengths on milk features.
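The trend filtering idea can be sketched for the Gaussian case by treating each wavelength as a column of X and penalising the L1 norm of the (k+1)-th order differences of the coefficient curve. This generic scaled ADMM solver for the generalized lasso is a swapped-in technique, not the authors' fast procedure. It also omits non-Gaussian responses, scalar covariates and inference, and uses independent simulated predictors rather than correlated spectra. All function names are hypothetical.

```python
import numpy as np

def difference_matrix(p, k):
    """(k+1)-th order discrete difference operator D, shape (p - k - 1, p)."""
    return np.diff(np.eye(p), n=k + 1, axis=0)

def soft_threshold(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def objective(X, y, beta, lam, k=1):
    resid = y - X @ beta
    return resid @ resid / (2 * len(y)) + lam * np.abs(difference_matrix(X.shape[1], k) @ beta).sum()

def tf_regression(X, y, lam, k=1, rho=1.0, iters=1000):
    """Scaled ADMM for min_b (1/2n)||y - X b||^2 + lam * ||D b||_1 (Gaussian response only)."""
    n, p = X.shape
    D = difference_matrix(p, k)
    A = X.T @ X / n + rho * D.T @ D  # system matrix for the beta-update
    Xty = X.T @ y / n
    beta, z, u = np.zeros(p), np.zeros(D.shape[0]), np.zeros(D.shape[0])
    for _ in range(iters):
        beta = np.linalg.solve(A, Xty + rho * D.T @ (z - u))
        Db = D @ beta
        z = soft_threshold(Db + u, lam / rho)  # encourages sparse (k+1)-th differences
        u = u + Db - z
    return beta

rng = np.random.default_rng(4)
n, p = 120, 60                               # samples x wavelengths (simulated)
X = rng.normal(size=(n, p))
t = np.linspace(0, 1, p)
beta_true = np.where(t < 0.5, t, 1.0 - t)    # piecewise-linear coefficient curve
y = X @ beta_true + 0.1 * rng.normal(size=n)
beta_hat = tf_regression(X, y, lam=0.01, k=1)
```

With k = 1 the penalty favours piecewise-linear coefficient curves with few kinks, and the kink locations are chosen by the data. This is the sense in which trend filtering adapts to different degrees of smoothness along the spectrum.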
7.A Study of "Symbiosis Bias" in A/B Tests of Recommendation Algorithms
Authors:David Holtz, Jennifer Brennan, Jean Pouget-Abadie
Abstract: One assumption underlying the unbiasedness of global treatment effect estimates from randomized experiments is the stable unit treatment value assumption (SUTVA). Many experiments that compare the efficacy of different recommendation algorithms violate SUTVA because each algorithm is trained on a shared pool of data, often generated by a mixture of the recommendation algorithms in the experiment. We call the resulting bias "symbiosis bias" and explore, through simulation, cluster-randomized and data-diverted solutions for mitigating it.
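The mechanism can be illustrated with a toy feedback loop; this is a hypothetical model, not the authors' simulation. Arm A is a greedy recommender that learns item click rates from logged data, and arm B recommends uniformly at random. With a shared training pool, A can exploit B's exploration. With data diversion, A trains only on its own users' interactions, as it would if deployed to everyone. The click rates, the two policies and `run_experiment` are all assumptions, and cluster randomization is not modelled.

```python
import numpy as np

P_CLICK = np.array([0.10, 0.15, 0.20, 0.25, 0.30])  # hypothetical true item click rates

def run_experiment(n_users, p_treat, diverted, seed=0):
    """Mean click rate per arm. Arm A: greedy recommender trained on logged data.
    Arm B: uniformly random recommendations (a stand-in exploration policy)."""
    rng = np.random.default_rng(seed)
    k = len(P_CLICK)
    shows, clicks = np.zeros(k), np.zeros(k)  # training pool used by arm A
    outcomes = {"A": [], "B": []}
    for _ in range(n_users):
        arm = "A" if rng.random() < p_treat else "B"
        if arm == "A":
            est = np.divide(clicks, shows, out=np.zeros(k), where=shows > 0)
            item = int(np.argmax(est))  # ties go to the lowest index
        else:
            item = int(rng.integers(k))
        click = rng.random() < P_CLICK[item]
        outcomes[arm].append(click)
        # Shared pool: every interaction trains A. Data diversion: only A's own users do.
        if arm == "A" or not diverted:
            shows[item] += 1
            clicks[item] += click
    return {a: float(np.mean(v)) if v else float("nan") for a, v in outcomes.items()}

shared = run_experiment(5000, 0.5, diverted=False)
diverted = run_experiment(5000, 0.5, diverted=True)
global_a = run_experiment(5000, 1.0, diverted=True)  # everyone gets A: the target estimand
print(shared["A"], diverted["A"], global_a["A"])
```

In this toy setup, A's measured click rate under the shared pool reflects data it would never collect on its own, so the A/B comparison can even reverse sign relative to the global treatment effect. Diverting data removes that dependence at the cost of giving each arm less training data.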