Methodology (stat.ME)
Tue, 05 Sep 2023
1. Some Additional Remarks on Statistical Properties of Cohen's d from Linear Regression
Authors: Jürgen Groß, Annette Möller
Abstract: The size of the effect of a difference between two groups with respect to a variable of interest may be estimated by the classical Cohen's $d$. A recently proposed generalized estimator allows conditioning on further independent variables within the framework of a linear regression model. In this note, it is demonstrated how unbiased estimation of the effect size parameter, together with a corresponding standard error, may be obtained based on the non-central $t$ distribution. The resulting estimator may be considered a natural generalization of the unbiased Hedges' $g$. In addition, confidence interval estimation for the unknown parameter is demonstrated by applying the so-called inversion confidence interval principle. In the absence of any additional independent variables, these properties reduce to already known ones. The stated remarks are illustrated with a publicly available data set.
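As a rough illustration of the classical two-group case that the abstract says the generalized results reduce to, the following Python sketch computes Cohen's $d$ with a pooled standard deviation and applies the exact gamma-function correction that yields Hedges' $g$. It is not the authors' regression-based estimator; the function names `cohens_d`, `hedges_correction`, and `hedges_g` are hypothetical, and the correction factor $J(m) = \Gamma(m/2) / (\sqrt{m/2}\,\Gamma((m-1)/2))$ with $m = n_1 + n_2 - 2$ is the standard textbook form, assumed here rather than taken from the paper.

```python
import math
from statistics import mean, variance


def cohens_d(x, y):
    """Classical Cohen's d: mean difference divided by the pooled standard deviation."""
    n1, n2 = len(x), len(y)
    # Pooled variance weights each sample variance by its degrees of freedom.
    pooled_var = ((n1 - 1) * variance(x) + (n2 - 1) * variance(y)) / (n1 + n2 - 2)
    return (mean(x) - mean(y)) / math.sqrt(pooled_var)


def hedges_correction(m):
    """Exact small-sample correction J(m) for m > 1 degrees of freedom.

    Computed on the log scale with lgamma to avoid overflow for large m.
    """
    log_ratio = math.lgamma(m / 2) - math.lgamma((m - 1) / 2)
    return math.exp(log_ratio) / math.sqrt(m / 2)


def hedges_g(x, y):
    """Hedges' g: Cohen's d multiplied by the correction factor J(n1 + n2 - 2)."""
    m = len(x) + len(y) - 2
    return hedges_correction(m) * cohens_d(x, y)
```

Because $J(m) < 1$ for finite $m$, the corrected value is always slightly closer to zero than $d$, and the two agree as the sample sizes grow.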
2. Debiased Regression Adjustment in Completely Randomized Experiments with Moderately High-dimensional Covariates
Authors: Xin Lu, Fan Yang, Yuhao Wang
Abstract: The completely randomized experiment is the gold standard for causal inference. When covariate information for each experimental candidate is available, a typical approach is to include it in covariate adjustment for more accurate treatment effect estimation. In this paper, we investigate this problem under the randomization-based framework, i.e., the covariates and potential outcomes of all experimental candidates are treated as deterministic quantities and the randomness comes solely from the treatment assignment mechanism. Under this framework, to achieve asymptotically valid inference, existing estimators usually require either (i) that the dimension of covariates $p$ grows at a rate no faster than $O(n^{2 / 3})$ as sample size $n \to \infty$; or (ii) certain sparsity constraints on the linear representations of potential outcomes constructed via possibly high-dimensional covariates. In this paper, we consider the moderately high-dimensional regime where $p$ is allowed to be of the same order of magnitude as $n$. We develop a novel debiased estimator with a corresponding inference procedure and establish its asymptotic normality under mild assumptions. Our estimator is model-free and does not require any sparsity constraint on the linear representations of the potential outcomes. We also discuss its asymptotic efficiency improvements over the unadjusted treatment effect estimator under different dimensionality constraints. Numerical analysis confirms that, compared to other regression-adjustment-based treatment effect estimators, our debiased estimator performs well in moderately high dimensions.
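To make the baseline setting concrete, the sketch below contrasts the unadjusted difference-in-means estimator with a standard low-dimensional regression adjustment (ordinary least squares with centered covariates and treatment-by-covariate interactions) under complete randomization. This is not the debiased estimator proposed in the paper, whose construction the abstract does not spell out; the function names and the choice of the interacted adjustment are assumptions made for illustration, and NumPy is assumed to be available.

```python
import numpy as np


def complete_randomization(n, n_treated, rng):
    """Assign exactly n_treated of n units to treatment, uniformly at random."""
    z = np.zeros(n, dtype=int)
    z[rng.choice(n, size=n_treated, replace=False)] = 1
    return z


def difference_in_means(y, z):
    """Unadjusted estimator: treated mean minus control mean."""
    return y[z == 1].mean() - y[z == 0].mean()


def regression_adjusted(y, z, x):
    """Interacted OLS adjustment (illustrative baseline, not the paper's method).

    Regresses y on an intercept, z, centered covariates, and their
    interactions with z; the coefficient on z is the effect estimate.
    x must be a 2-D array of shape (n, p).
    """
    xc = x - x.mean(axis=0)  # center so the coefficient on z targets the average effect
    design = np.column_stack([np.ones(len(y)), z, xc, z[:, None] * xc])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[1]
```

With few covariates relative to $n$, the adjusted estimate is typically less variable than the unadjusted one; the regime the abstract targets, with $p$ comparable to $n$, is where this kind of plain adjustment is reported to need debiasing.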
3. Detecting Spatial Health Disparities Using Disease Maps
Authors: Luca Aiello, Sudipto Banerjee
Abstract: Epidemiologists commonly use regional aggregates of health outcomes to map mortality or incidence rates and identify geographic disparities. However, to detect health disparities across regions, it is necessary to identify "difference boundaries" that separate neighboring regions with significantly different spatial effects. This can be particularly challenging when dealing with multiple outcomes for each unit and accounting for dependence among diseases and across areal units. In this study, we address the issue of multivariate difference boundary detection for correlated diseases by formulating the problem in terms of Bayesian pairwise multiple comparisons and extending it through the introduction of adjacency modeling and disease graph dependencies. Specifically, we seek the posterior probabilities of neighboring spatial effects being different. To accomplish this, we adopt a class of multivariate areally referenced Dirichlet process models that accommodate spatial and interdisease dependence by endowing the spatial random effects with a discrete probability law. Our method is evaluated through simulation studies and applied to detect difference boundaries for multiple cancers using data from the Surveillance, Epidemiology, and End Results Program of the National Cancer Institute.
4. Identifying Causal Effects Using Instrumental Variables from the Auxiliary Population
Authors: Kang Shuai, Shanshan Luo, Wei Li, Yangbo He
Abstract: Instrumental variable approaches have gained popularity for estimating causal effects in the presence of unmeasured confounding. However, the availability of instrumental variables in the primary population is often challenged due to stringent and untestable assumptions. This paper presents a novel method to identify and estimate causal effects in the primary population by utilizing instrumental variables from the auxiliary population, incorporating a structural equation model, even in scenarios with nonlinear treatment effects. Our approach involves using two datasets: one from the primary population with joint observations of treatment and outcome, and another from the auxiliary population providing information about the instrument and treatment. Our strategy differs from most existing methods by not depending on the simultaneous measurements of instrument and outcome. The central idea for identifying causal effects is to establish a valid substitute through the auxiliary population, addressing unmeasured confounding. This is achieved by developing a control function and projecting it onto the function space spanned by the treatment variable. We then propose a three-step estimator for estimating causal effects and derive its asymptotic results. We illustrate the proposed estimator through simulation studies, and the results demonstrate favorable performance. We also conduct a real data analysis to evaluate the causal effect between vitamin D status and BMI.
5. Beyond the classical type I error: Bayesian metrics for Bayesian designs using informative priors
Authors: Nicky Best (GSK, UK), Maxine Ajimi (AstraZeneca, UK), Beat Neuenschwander (Novartis Pharma AG, Switzerland), Gaelle Saint-Hilary (Saryga, France; Politecnico di Torino, Italy), Simon Wandel (Novartis Pharma AG, Switzerland)
Abstract: There is growing interest in Bayesian clinical trial designs with informative prior distributions, e.g. for extrapolation of adult data to pediatrics, or use of external controls. While the classical type I error is commonly used to evaluate such designs, it cannot be strictly controlled and it is acknowledged that other metrics may be more appropriate. We focus on two common situations - borrowing control data or information on the treatment contrast - and discuss several fully probabilistic metrics to evaluate the risk of false positive conclusions. Each metric requires specification of a design prior, which can differ from the analysis prior and permits understanding of the behaviour of a Bayesian design under scenarios where the analysis prior differs from the true data generation process. The metrics include the average type I error and the pre-posterior probability of a false positive result. We show that, when borrowing control data, the average type I error is asymptotically (in certain cases strictly) controlled when the analysis and design prior coincide. We illustrate use of these Bayesian metrics with real applications, and discuss how they could facilitate discussions between sponsors, regulators and other stakeholders about the appropriateness of Bayesian borrowing designs for pivotal studies.