Methodology (stat.ME)
Thu, 04 May 2023
1. On adjustment for temperature in heatwave epidemiology: a new method and toward clarification of methods to estimate health effects of heatwaves
Authors: Honghyok Kim, Michelle Bell
Abstract: Defining the effect of the exposure of interest and selecting an appropriate estimation method are prerequisites for causal inference. Little is understood about how the association between heatwaves (i.e., consecutive days of extreme high temperature) and an outcome depends on whether adjustment was made for temperature and on how that adjustment was conducted. This paper aims to investigate this dependency, demonstrate that temperature is a confounder in heatwave-outcome associations, and introduce a new modeling approach to estimate a new heatwave-outcome relation: E[R(Y)|HW=1, Z]/E[R(Y)|T=OT, Z], where HW is a daily binary variable indicating the presence of a heatwave; R(Y) is the risk of an outcome, Y; T is a temperature variable; OT is the optimal temperature; and Z is a set of confounders that includes typical confounders as well as some types of T. We recommend characterization of heatwave-outcome relations and careful selection of modeling approaches to understand the impacts of heatwaves under climate change. We demonstrate our approach using real-world data for Seoul, which suggest that the effect of heatwaves may be larger than what may be inferred from the extant literature. An R package, HEAT (Heatwave effect Estimation via Adjustment for Temperature), was developed and made publicly available.
2. Credibility of high $R^2$ in regression problems: a permutation approach
Authors: Michał Ciszewski, Jakob Söhl, Ton Leenen, Bart van Trigt, Geurt Jongbloed
Abstract: The question of whether $Y$ can be predicted based on $X$ often arises. A well-adjusted model may perform well on observed data, but the risk of overfitting always exists, leading to poor generalization on unseen data. This paper proposes a rigorous permutation test to assess the credibility of high $R^2$ values in regression models. The test can also be applied to any measure of goodness of fit and requires no sample splitting: it generates new pairings $(X_i, Y_j)$ and provides an overall interpretation of the model's accuracy. The paper introduces a new formulation of the null hypothesis and a justification for the test, which distinguish it from previous literature. The theoretical findings are applied to both simulated data and sensor data of tennis serves in an experimental context. The simulation study underscores how the available information affects the test, showing that the less informative the predictors, the lower the probability of rejecting the null hypothesis, and emphasizing that detecting weaker dependence between variables requires a sufficient sample size.
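The idea of re-pairing $(X_i, Y_j)$ can be illustrated with a minimal sketch. The code below is the classical permutation test for $R^2$ with a single predictor, not the authors' formulation of the null hypothesis; the function names, the toy data, and the choice of 999 permutations are all assumptions made for illustration.

```python
import random


def r_squared(x, y):
    """R^2 of the least-squares line y ~ a + b*x (the squared Pearson correlation)."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    if sxx == 0 or syy == 0:
        return 0.0  # a constant variable explains nothing
    return (sxy ** 2) / (sxx * syy)


def permutation_p_value(x, y, n_perm=999, seed=0):
    """Share of random re-pairings whose R^2 is at least the observed R^2."""
    rng = random.Random(seed)
    observed = r_squared(x, y)
    y_perm = list(y)
    count = 0
    for _ in range(n_perm):
        rng.shuffle(y_perm)  # breaks any dependence between x and y
        if r_squared(x, y_perm) >= observed:
            count += 1
    # The +1 terms count the observed pairing as one of the permutations.
    return observed, (count + 1) / (n_perm + 1)


# Toy data (assumed for illustration): a strong linear signal plus noise.
rng = random.Random(1)
x = [float(i) for i in range(30)]
y = [2.0 * xi + rng.gauss(0.0, 1.0) for xi in x]
observed_r2, p_value = permutation_p_value(x, y)
print(f"R^2 = {observed_r2:.3f}, permutation p-value = {p_value:.3f}")
```

A small p-value here says that an $R^2$ this high is unlikely to arise when the pairing between predictors and responses is arbitrary. Any other goodness-of-fit measure could be swapped in for `r_squared`.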
3. On factor copula-based mixed regression models
Authors: Pavel Krupskii, Bouchra R Nasri, Bruno N Remillard
Abstract: In this article, a copula-based method for mixed regression models is proposed, where the conditional distribution of the response variable, given covariates, is modelled by a parametric family of continuous or discrete distributions, and the effect of a common latent variable pertaining to a cluster is modelled with a factor copula. We show how to estimate the parameters of the copula and the parameters of the margins, and we find the asymptotic behaviour of the estimation errors. Numerical experiments are performed to assess the precision of the estimators for finite samples. An example of an application is given using COVID-19 vaccination hesitancy from several countries. Computations are based on R package CopulaGAMM.
4. An Efficient Doubly-robust Imputation Framework for Longitudinal Dropout, with an Application to an Alzheimer's Clinical Trial
Authors: Yuqi Qiu, Karen Messer
Abstract: We develop a novel doubly-robust (DR) imputation framework for longitudinal studies with monotone dropout, motivated by the informative dropout that is common in FDA-regulated trials for Alzheimer's disease. In this approach, the missing data are first imputed using a doubly-robust augmented inverse probability weighting (AIPW) estimator, then the imputed completed data are substituted into a full-data estimating equation, and the estimate is obtained using standard software. The imputed completed data may be inspected and compared to the observed data, and standard model diagnostics are available. The same imputed completed data can be used for several different estimands, such as subgroup analyses in a clinical trial, allowing for reduced computation and increased consistency across analyses. We present two specific DR imputation estimators, AIPW-I and AIPW-S, study their theoretical properties, and investigate their performance by simulation. AIPW-S has substantially reduced computational burden compared to many other DR estimators, at the cost of some loss of efficiency and the requirement of stronger assumptions. Simulation studies support the theoretical properties and good performance of the DR imputation framework. Importantly, we demonstrate their ability to address time-varying covariates, such as a time by treatment interaction. We illustrate using data from a large randomized Phase III trial investigating the effect of donepezil in Alzheimer's disease, from the Alzheimer's Disease Cooperative Study (ADCS) group.
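For readers unfamiliar with AIPW, the following minimal sketch shows the textbook estimator of a mean when some outcomes are missing at a single time point. It is not the AIPW-I or AIPW-S estimator of the paper and does not handle monotone longitudinal dropout; the function name, argument names, and toy numbers are assumptions for illustration.

```python
def aipw_mean(y, observed, prop, pred):
    """Textbook AIPW estimate of E[Y] with missing outcomes.

    y        : outcomes (None where missing)
    observed : 1 if the outcome was observed, 0 otherwise
    prop     : estimated probability of being observed, given covariates
    pred     : outcome-model prediction of Y, given covariates
    """
    total = 0.0
    for yi, ri, pi, mi in zip(y, observed, prop, pred):
        # Start from the outcome-model prediction and, for observed cases,
        # add the inverse-probability-weighted residual.
        correction = (yi - mi) / pi if ri == 1 else 0.0
        total += mi + correction
    return total / len(y)


# Toy case 1 (assumed numbers): the outcome model is exactly right, so the
# estimate does not depend on the propensities supplied.
est_outcome_model = aipw_mean(
    y=[1.0, 2.0, None, 4.0],
    observed=[1, 1, 0, 1],
    prop=[0.5, 0.9, 0.3, 0.7],
    pred=[1.0, 2.0, 3.0, 4.0],
)

# Toy case 2: nothing is missing and the propensities are all 1, so the
# estimate equals the sample mean even with a badly wrong outcome model.
est_full_data = aipw_mean(
    y=[1.0, 2.0, 3.0, 4.0],
    observed=[1, 1, 1, 1],
    prop=[1.0, 1.0, 1.0, 1.0],
    pred=[0.0, 0.0, 0.0, 0.0],
)
print(est_outcome_model, est_full_data)
```

The two toy cases hint at the double-robustness property: the estimator remains consistent if either the outcome model or the missingness model is correctly specified.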
5. Marginal Inference for Hierarchical Generalized Linear Mixed Models with Patterned Covariance Matrices Using the Laplace Approximation
Authors: Jay M. Ver Hoef, Eryn Blagg, Michael Dumelle, Philip M. Dixon, Dale L. Zimmerman, Paul Conn
Abstract: Using a hierarchical construction, we develop methods for a wide and flexible class of models by taking a fully parametric approach to generalized linear mixed models with complex covariance dependence. The Laplace approximation is used to marginally estimate covariance parameters while integrating out all fixed and latent random effects. The Laplace approximation relies on Newton-Raphson updates, which also lead to predictions for the latent random effects. We develop methodology for complete marginal inference, from estimating covariance parameters and fixed effects to making predictions for unobserved data, for any patterned covariance matrix in the hierarchical generalized linear mixed models framework. The marginal likelihood is developed for six distributions that are often used for binary, count, and positive continuous data, and our framework is easily extended to other distributions. The methods are illustrated with simulations from stochastic processes with known parameters, and their efficacy in terms of bias and interval coverage is shown through simulation experiments. Examples with binary and proportional data on election results, count data for marine mammals, and positive-continuous data on heavy metal concentration in the environment are used to illustrate all six distributions with a variety of patterned covariance structures that include spatial models (e.g., geostatistical and areal models), time series models (e.g., first-order autoregressive models), and mixtures with typical random intercepts based on grouping.
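The link between Newton-Raphson updates and the Laplace approximation can be sketched in one dimension. The code below integrates a single latent effect out of a Poisson model with a log link; it is a scalar toy, not the paper's multivariate method with patterned covariance matrices, and the function names and the values of `y_count`, `beta`, and `sigma2` are assumptions for illustration.

```python
import math


def laplace_log_integral(h, dh, d2h, w0=0.0, tol=1e-10, max_iter=100):
    """Approximate log of the integral of exp(h(w)) dw for a concave h.

    Newton-Raphson finds the mode of h; the integrand is then replaced by
    the Gaussian that matches h's value and curvature at that mode.
    """
    w = w0
    for _ in range(max_iter):
        step = dh(w) / d2h(w)
        w -= step
        if abs(step) < tol:
            break
    log_integral = h(w) + 0.5 * math.log(2.0 * math.pi) - 0.5 * math.log(-d2h(w))
    return w, log_integral


# Toy model (assumed values): y | w ~ Poisson(exp(beta + w)), w ~ N(0, sigma2).
y_count, beta, sigma2 = 3, 0.5, 1.0


def h(w):
    """Log joint density of the count and the latent effect w."""
    eta = beta + w
    log_lik = y_count * eta - math.exp(eta) - math.lgamma(y_count + 1)
    log_prior = -0.5 * w * w / sigma2 - 0.5 * math.log(2.0 * math.pi * sigma2)
    return log_lik + log_prior


def dh(w):
    """First derivative of h with respect to w."""
    return y_count - math.exp(beta + w) - w / sigma2


def d2h(w):
    """Second derivative of h with respect to w (always negative here)."""
    return -math.exp(beta + w) - 1.0 / sigma2


mode, log_marginal = laplace_log_integral(h, dh, d2h)
print(f"predicted latent effect = {mode:.4f}, log marginal likelihood = {log_marginal:.4f}")
```

The mode returned by the Newton-Raphson loop doubles as a prediction of the latent effect, which is the by-product the abstract refers to. When `h` is exactly quadratic the approximation is exact.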