Mon, 11 Sep 2023
1. A Note on Location Parameter Estimation using the Weighted Hodges-Lehmann Estimator
Authors: Xuehong Gao, Zhijin Chen, Bosung Kim, Chanseok Park
Abstract: Robust design is one of the main tools engineers use to develop high-quality processes. Most real-world processes, however, are subject to external uncontrollable factors, often manifesting as outliers or contaminated data, which substantially distort the computed sample mean. To mitigate the bias that outliers introduce into the dataset, weight adjustment is a prudent recourse that makes the sample more representative of the statistical population. The challenge then lies in applying these weights judiciously to estimate a robust alternative to the usual location estimator. Departing from previous studies, this study proposes two categories of new weighted Hodges-Lehmann (WHL) estimators that incorporate weight factors into the estimation of the location parameter. To evaluate their robustness in estimating the location parameter, this study constructs a set of comprehensive simulations comparing the proposed WHL estimators with various location estimators, including the mean, weighted mean, weighted median, and Hodges-Lehmann estimator. The findings show that the proposed WHL estimators clearly outperform the traditional methods in terms of breakdown points, biases, and relative efficiencies.
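The abstract does not spell out the two WHL variants, but the classical Hodges-Lehmann estimator is the median of the pairwise Walsh averages $(x_i + x_j)/2$, so one natural weighted version takes a weighted median of those averages with pair weights $w_i w_j$. A minimal Python sketch of that construction (the function names and this specific weighting scheme are illustrative assumptions, not the paper's definitions):

```python
from itertools import combinations_with_replacement

def weighted_median(values, weights):
    """Smallest value whose cumulative weight reaches half of the total weight."""
    total = sum(weights)
    cumulative = 0.0
    for v, w in sorted(zip(values, weights)):
        cumulative += w
        if cumulative >= total / 2:
            return v

def weighted_hodges_lehmann(x, w):
    """Weighted median of Walsh averages (x[i] + x[j]) / 2 with pair weight w[i] * w[j]."""
    averages, pair_weights = [], []
    for i, j in combinations_with_replacement(range(len(x)), 2):
        averages.append((x[i] + x[j]) / 2)
        pair_weights.append(w[i] * w[j])
    return weighted_median(averages, pair_weights)

# With equal weights this reduces to the ordinary Hodges-Lehmann estimate;
# down-weighting a gross outlier leaves the estimate essentially unchanged.
print(weighted_hodges_lehmann([1, 2, 3], [1, 1, 1]))              # 2.0
print(weighted_hodges_lehmann([1, 2, 3, 1000], [1, 1, 1, 0.01]))  # 2.0
```

The second call shows the robustness the abstract describes: an outlier of 1000 barely moves the estimate once its weight is small, whereas it would shift the weighted mean substantially.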
2. A conformal test of linear models via permutation-augmented regressions
Abstract: Permutation tests are widely recognized as robust alternatives to tests based on normal theory. Random permutation tests have frequently been employed to assess the significance of variables in linear models. Despite their widespread use, existing random permutation tests lack finite-sample, assumption-free guarantees for controlling the type I error of partial correlation tests. To address this long-standing challenge, we develop a conformal test through permutation-augmented regressions, which we refer to as PALMRT. PALMRT not only achieves power competitive with conventional methods but also provides reliable control of the type I error at no more than $2\alpha$ for any targeted level $\alpha$, for arbitrary fixed designs and error distributions; we confirm this through extensive simulations. Compared with the cyclic permutation test (CPT), which also offers theoretical guarantees, PALMRT does not significantly compromise power or impose stringent requirements on the sample size, making it suitable for diverse biomedical applications. We further illustrate the difference in a long-Covid study, where PALMRT validated key findings previously identified using the t-test, while CPT suffered a drastic loss of power. We endorse PALMRT as a robust and practical hypothesis test for scientific research because of its superior error control, power preservation, and simplicity.
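For readers new to the area, the baseline idea that PALMRT builds on can be seen in a toy random permutation test of association. This is the classical scheme, not the PALMRT construction, and the helper names are ours:

```python
import random

def pearson(x, y):
    """Sample Pearson correlation, computed from scratch."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def permutation_pvalue(x, y, n_perm=999, seed=0):
    """Monte Carlo p-value for 'no association', obtained by permuting y."""
    rng = random.Random(seed)
    observed = abs(pearson(x, y))
    y_perm = list(y)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(y_perm)
        exceed += abs(pearson(x, y_perm)) >= observed
    # The +1 terms keep the test valid at any finite number of permutations.
    return (exceed + 1) / (n_perm + 1)
```

PALMRT's contribution is to retain this finite-sample flavor while guaranteeing type I error control at no more than $2\alpha$ for partial correlation tests in fixed-design linear models, a guarantee the naive scheme above does not provide.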
3. Bayesian spatial+: A joint model perspective
Authors: Isa Marques, Paul F. V. Wiemann
Abstract: A common phenomenon in spatial regression models is spatial confounding, which occurs when spatially indexed covariates modeling the mean of the response are correlated with a spatial effect included in the model. The spatial+ approach of Dupont et al. (2022) is a popular way to reduce spatial confounding: it is a two-stage frequentist method that explicitly models the spatial structure in the confounded covariate, removes it, and uses the corresponding residuals in the second stage. In a frequentist setting, no uncertainty is propagated from the first-stage estimation that determines the residuals, since only point estimates are used; inference can also be cumbersome, and some of the gaps in the original approach are easily remedied in a Bayesian framework. First, a Bayesian joint model can readily propagate uncertainty from the first to the second stage. A Bayesian framework also provides the tools to infer the model's parameters directly. Notably, another advantage we explore thoroughly is the ability to use prior information to impose restrictions on the spatial effects, rather than applying them directly to their posterior. We build a joint prior for the smoothness of all spatial effects that simultaneously shrinks towards a high smoothness of the response and imposes that the spatial effect in the response be smoother than the confounded covariate's spatial effect. This prevents the response from operating at a smaller scale than the covariate and can help avoid situations where the residuals resulting from the first-stage model have insufficient variation. We evaluate the performance of the Bayesian spatial+ on both simulated and real datasets.
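As a point of reference for the Bayesian version, the frequentist two-stage spatial+ idea can be sketched in a few lines: smooth the confounded covariate over space, keep the residuals, and use them in the outcome regression. The 1-D moving-average smoother and the synthetic example below are simplifying assumptions for illustration, not the spline machinery of Dupont et al. (2022):

```python
def moving_average(values, window=5):
    """Crude spatial smoother: centered moving average along a 1-D transect."""
    half, n = window // 2, len(values)
    return [sum(values[max(0, i - half):min(n, i + half + 1)]) /
            (min(n, i + half + 1) - max(0, i - half)) for i in range(n)]

def ols_slope(x, y):
    """Simple-regression slope of y on x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    return (sum((a - mx) * (b - my) for a, b in zip(x, y)) /
            sum((a - mx) ** 2 for a in x))

def spatial_plus_slope(x, y, window=5):
    """Stage 1: remove the spatial trend from x; stage 2: regress y on the residuals."""
    trend = moving_average(x, window)
    residuals = [a - t for a, t in zip(x, trend)]
    return ols_slope(residuals, y)
```

On a confounded toy example (covariate = spatial trend + local signal, response driven by the local signal plus the same trend), the naive slope of y on x is badly biased towards the trend, while the residual-based slope is much closer to the local effect. The Bayesian joint model discussed above additionally propagates the stage 1 uncertainty that this plug-in sketch ignores.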
4. Inverse probability of treatment weighting with generalized linear outcome models for doubly robust estimation
Authors: Erin E Gabriel, Michael C Sachs, Torben Martinussen, Ingeborg Waernbaum, Els Goetghebeur, Stijn Vansteelandt, Arvid Sjölander
Abstract: There are now many options for doubly robust estimation; however, there is a concerning trend in the applied literature to believe that the combination of a propensity score and an adjusted outcome model automatically yields a doubly robust estimator, and/or to misuse more complex established doubly robust estimators. A simple alternative, a canonical-link generalized linear model (GLM) fit via inverse-probability-of-treatment (propensity score) weighted maximum likelihood estimation, followed by standardization (the g-formula) for the average causal effect, is a doubly robust estimation method. Our aim is for the reader not just to be able to use this method, which we refer to as IPTW GLM, for doubly robust estimation, but to fully understand why it has the doubly robust property. For this reason, we define clearly, and in multiple ways, all concepts needed to understand the method and why it is doubly robust. In addition, we want to make very clear that the mere combination of propensity score weighting and an adjusted outcome model does not generally result in a doubly robust estimator. Finally, we hope to dispel the misconception that one can adjust for residual confounding remaining after propensity score weighting by adjusting in the outcome model for whatever remains `unbalanced', even when using doubly robust estimators. We provide R code for our simulations and real open-source data examples that can be followed step by step to use, and hopefully understand, the IPTW GLM method. We also compare it to a much better-known, but still simple, doubly robust estimator.
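The recipe in the abstract (estimate propensity scores, fit a weighted outcome model, then standardize via the g-formula) can be illustrated on discrete data, where a saturated outcome model makes every step explicit. This is our own minimal sketch, not the paper's R code; `iptw_standardized_ace` and the toy data are illustrative assumptions:

```python
from collections import defaultdict

def iptw_standardized_ace(data):
    """Average causal effect from records (l, a, y): binary treatment a,
    discrete confounder l. Assumes every (a, l) cell is observed."""
    # 1) Propensity scores P(A=1 | L=l) estimated by stratum frequencies.
    n_l, n_treated = defaultdict(int), defaultdict(int)
    for l, a, y in data:
        n_l[l] += 1
        n_treated[l] += a
    propensity = {l: n_treated[l] / n_l[l] for l in n_l}
    # 2) Inverse-probability-of-treatment weights, and
    # 3) a weighted outcome "model": weighted mean of Y in each (a, l) cell
    #    (saturated, so it plays the role of the fitted GLM).
    wsum, wy = defaultdict(float), defaultdict(float)
    for l, a, y in data:
        w = 1 / propensity[l] if a == 1 else 1 / (1 - propensity[l])
        wsum[(a, l)] += w
        wy[(a, l)] += w * y
    m = {cell: wy[cell] / wsum[cell] for cell in wsum}
    # 4) Standardization (g-formula): average predictions over the observed L.
    return sum(m[(1, l)] - m[(0, l)] for l, _, _ in data) / len(data)

data = ([(0, 0, 0)] * 8 + [(0, 1, 1)] * 2 +   # stratum L=0: treatment effect 1
        [(1, 0, 2)] * 2 + [(1, 1, 3)] * 8)    # stratum L=1: treatment effect 1
print(iptw_standardized_ace(data))  # 1.0
```

Here the crude treated-vs-untreated mean difference is 2.2 because L confounds, while the standardized estimate recovers the stratum-constant effect of 1. The double robustness discussed in the paper concerns the non-saturated case, where either the propensity model or the outcome model may be misspecified.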
5. D-Vine GAM Copula based Quantile Regression with Application to Ensemble Postprocessing
Authors: David Jobst, Annette Möller, Jürgen Groß
Abstract: Temporal, spatial, and spatio-temporal probabilistic models are frequently used in weather forecasting. The D-vine (drawable vine) copula quantile regression (DVQR) is a powerful tool for this application field, as it can automatically select important predictor variables from a large set and can model complex nonlinear relationships among them. However, the current DVQR does not always allow additional covariate effects, e.g. temporal or spatio-temporal information, to be accounted for explicitly and economically. Consequently, we propose an extension of the current DVQR in which the bivariate copulas in the D-vine copula are parametrized through Kendall's tau, which in turn can be linked to additional covariates. This parametrization of the correlation parameter allows generalized additive models (GAMs) and spline smoothing to detect potentially hidden covariate effects. The new method is called GAM-DVQR, and its performance is illustrated in a case study on the postprocessing of 2 m surface temperature forecasts, in which we investigate both a constant and a time-dependent Kendall's tau. The GAM-DVQR models are compared to the benchmark methods Ensemble Model Output Statistics (EMOS), its gradient-boosted extension (EMOS-GB), and basic DVQR. The results indicate that the GAM-DVQR models can identify time-dependent correlations as well as relevant predictor variables, and significantly outperform the state-of-the-art methods EMOS and EMOS-GB. Furthermore, the introduced parametrization allows a static training period to be used for GAM-DVQR, yielding a more sustainable model estimation than DVQR with a sliding training window. Finally, we give an outlook on further applications and extensions of the GAM-DVQR model. To complement this article, our method is accompanied by an R package called gamvinereg.
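The appeal of parametrizing through Kendall's tau is that tau maps one-to-one onto the dependence parameter of many bivariate copula families, so a smooth model fitted on the tau scale immediately yields the copula parameter. A small sketch of two standard tau links plus a hypothetical seasonal tau curve (the seasonal function is purely illustrative, standing in for a fitted GAM spline):

```python
import math

def clayton_theta(tau):
    """Clayton copula parameter theta = 2*tau / (1 - tau), for tau in (0, 1)."""
    return 2 * tau / (1 - tau)

def gaussian_rho(tau):
    """Gaussian copula correlation rho = sin(pi * tau / 2)."""
    return math.sin(math.pi * tau / 2)

def seasonal_tau(day_of_year, base=0.4, amplitude=0.2):
    """Hypothetical smooth seasonal Kendall's tau (a stand-in for a GAM term)."""
    return base + amplitude * math.sin(2 * math.pi * day_of_year / 365.25)

print(clayton_theta(0.5))  # 2.0
```

A time-dependent Kendall's tau in this style is what lets GAM-DVQR train on a static period instead of a sliding window: the seasonal variation lives in the tau curve rather than in the training sample.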
6. Minimum Area Confidence Set Optimality for Simultaneous Confidence Bands for Percentiles in Linear Regression
Authors: Lingjiao Wang, Yang Han, Wei Liu, Frank Bretz
Abstract: Simultaneous confidence bands (SCBs) for percentiles in linear regression are valuable tools with many applications. In this paper, we propose a novel criterion for comparing SCBs for percentiles, termed the Minimum Area Confidence Set (MACS) criterion. This criterion uses the area of the confidence set for the pivotal quantities, which is generated from the confidence set of the unknown parameters. We then employ the MACS criterion to construct exact SCBs over any finite covariate interval and to compare multiple SCBs of different forms, which allows the optimal SCB to be determined. We find that the area of the confidence set for the pivotal quantities of an asymmetric SCB is uniformly smaller, and can be very substantially smaller, than that of the corresponding symmetric SCB; under the MACS criterion, exact asymmetric SCBs should therefore always be preferred. Furthermore, a new computationally efficient method is proposed to calculate the critical constants of exact SCBs for percentiles. A real data example on a drug stability study is provided for illustration.