Methodology (stat.ME)
Tue, 06 Jun 2023
1. Resampling-based confidence intervals and bands for the average treatment effect in observational studies with competing risks
Authors: Jasmin Rühl, Sarah Friedrich
Abstract: The g-formula can be used to estimate the treatment effect while accounting for confounding bias in observational studies. For time-to-event endpoints that are possibly subject to competing risks, however, the construction of valid pointwise confidence intervals and time-simultaneous confidence bands for the causal risk difference is complicated. A convenient solution is to approximate the asymptotic distribution of the corresponding stochastic process by means of resampling approaches. In this paper, we consider three different resampling methods, namely the classical nonparametric bootstrap, the influence function equipped with a resampling approach, and a martingale-based bootstrap version, the so-called wild bootstrap. We set up a simulation study to compare the accuracy of the different techniques, which reveals that the wild bootstrap should in general be preferred if the sample size is moderate and sufficient data on the event of interest have been accrued. For illustration, the three resampling methods are applied to data on long-term survival in patients with early-stage Hodgkin's disease.
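As a concrete illustration of the first of these techniques, here is a minimal sketch of a classical nonparametric bootstrap confidence interval for a risk difference at a fixed time point. The simulated data and the naive plug-in risk estimator (which ignores censoring) are assumptions for illustration only, not the authors' g-formula estimator or wild bootstrap.

```python
import numpy as np

rng = np.random.default_rng(0)

def risk_difference(time, event, treat, t0):
    """Plug-in risk of the event of interest by time t0,
    treated minus untreated (censoring ignored for brevity)."""
    hit = (time <= t0) & (event == 1)
    return hit[treat == 1].mean() - hit[treat == 0].mean()

# illustrative data: two competing event types, treatment lowers the hazard
n = 400
treat = rng.integers(0, 2, n)
time = rng.exponential(1.0 / (0.8 - 0.3 * treat))
event = rng.integers(1, 3, n)              # 1 = event of interest, 2 = competing
theta_hat = risk_difference(time, event, treat, t0=1.0)

# classical nonparametric bootstrap: resample individuals with replacement
B = 2000
boot = np.empty(B)
for b in range(B):
    idx = rng.integers(0, n, n)
    boot[b] = risk_difference(time[idx], event[idx], treat[idx], t0=1.0)

lo, hi = np.percentile(boot, [2.5, 97.5])
print(f"risk difference at t=1: {theta_hat:.3f}, 95% CI [{lo:.3f}, {hi:.3f}]")
```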
2. Statistical inference for sketching algorithms
Authors: R. P. Browne, J. L. Andrews
Abstract: Sketching algorithms use random projections to generate a smaller sketched data set, often for the purposes of modelling. Complete and partial sketch regression estimates can be constructed using information from only the sketched data set or a combination of the full and sketched data sets. Previous work has obtained the distribution of these estimators under repeated sketching, along with the first two moments for both estimators. Using a different approach, we also derive the distribution of the complete sketch estimator, but additionally consider the error term under both repeated sketching and sampling. Importantly, we obtain pivotal quantities that are based solely on the sketched data set and specifically do not require information from the full data model fit. These pivotal quantities can be used for inference on the full data set regression estimates or the model parameters. For partial sketching, we derive pivotal quantities for a marginal test and an approximate distribution for the partial sketch under repeated sketching or repeated sampling, again avoiding reliance on a full data model fit. We extend these results to include the Hadamard and Clarkson-Woodruff sketches and then compare them in a simulation study.
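A minimal sketch of the complete sketch estimator described above, using a Gaussian random projection; the dimensions and data generation are illustrative assumptions, and the paper's pivotal quantities, Hadamard, and Clarkson-Woodruff sketches are not implemented here.

```python
import numpy as np

rng = np.random.default_rng(1)

# illustrative full data set: n observations, p predictors
n, p, k = 10_000, 10, 500
X = rng.standard_normal((n, p))
beta = rng.standard_normal(p)
y = X @ beta + rng.standard_normal(n)

# Gaussian sketch: a k x n random projection applied to both X and y
S = rng.standard_normal((k, n)) / np.sqrt(k)
Xs, ys = S @ X, S @ y

# complete sketch estimator: least squares on the sketched data alone
beta_sketch, *_ = np.linalg.lstsq(Xs, ys, rcond=None)
beta_full, *_ = np.linalg.lstsq(X, y, rcond=None)
print("max |sketch - full|:", np.abs(beta_sketch - beta_full).max())
```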
3. Fair and Robust Estimation of Heterogeneous Treatment Effects for Policy Learning
Authors: Kwangho Kim, José R. Zubizarreta
Abstract: We propose a simple and general framework for nonparametric estimation of heterogeneous treatment effects under fairness constraints. Under standard regularity conditions, we show that the resulting estimators possess the double robustness property. We use this framework to characterize the trade-off between fairness and the maximum welfare achievable by the optimal policy. We evaluate the methods in a simulation study and illustrate them in a real-world case study.
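A minimal sketch of the doubly robust construction that underlies such estimators, here the standard augmented inverse-probability-weighted (AIPW) estimator of an average treatment effect; the data-generating process and nuisance models are assumptions for illustration, and the paper's fairness constraints and policy-learning components are not shown.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

rng = np.random.default_rng(2)

# illustrative observational data with confounding through X[:, 0]
n = 5_000
X = rng.standard_normal((n, 3))
e = 1 / (1 + np.exp(-X[:, 0]))                 # true propensity score
A = rng.binomial(1, e)
Y = X[:, 0] + 2 * A + rng.standard_normal(n)   # true effect is 2

# nuisance fits: propensity model and per-arm outcome regressions
ps = LogisticRegression().fit(X, A).predict_proba(X)[:, 1]
mu1 = LinearRegression().fit(X[A == 1], Y[A == 1]).predict(X)
mu0 = LinearRegression().fit(X[A == 0], Y[A == 0]).predict(X)

# AIPW pseudo-outcome: consistent if either nuisance model is correct
phi = (mu1 - mu0
       + A * (Y - mu1) / ps
       - (1 - A) * (Y - mu0) / (1 - ps))
print(f"doubly robust ATE estimate: {phi.mean():.3f} (truth: 2.0)")
```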
4. Bayesian inference for group-level cortical surface image-on-scalar regression with Gaussian process priors
Authors: Andrew S. Whiteman, Timothy D. Johnson, Jian Kang
Abstract: In regression-based analyses of group-level neuroimage data, researchers typically fit a series of marginal general linear models to image outcomes at each spatially referenced pixel. Spatial regularization of effects of interest is usually induced indirectly by applying spatial smoothing to the data during preprocessing. While this procedure often works well, the resulting inference can be poorly calibrated. Spatial modeling of effects of interest leads to more powerful analyses; however, the number of locations in a typical neuroimage can preclude standard computation with explicitly spatial models. Here we contribute a Bayesian spatial regression model for group-level neuroimaging analyses. We induce regularization of spatially varying regression coefficient functions through Gaussian process priors. When combined with a simple nonstationary model for the error process, our prior hierarchy can lead to more data-adaptive smoothing than standard methods. We achieve computational tractability through Vecchia approximation of our prior, which, critically, can be constructed for a wide class of spatial correlation functions and results in prior models that retain full spatial rank. We outline several ways to work with our model in practice and compare performance against standard vertex-wise analyses. Finally, we illustrate our method in an analysis of cortical surface fMRI task contrast data from a large cohort of children enrolled in the Adolescent Brain Cognitive Development study.
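A minimal sketch of the Vecchia idea used here for computational tractability: the joint Gaussian process density is approximated by conditioning each ordered location on a small set of previously ordered nearest neighbors. The exponential kernel, coordinate ordering, and simulated data are illustrative assumptions, not the authors' prior hierarchy.

```python
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(3)

def exp_cov(a, b, range_=0.2):
    """Exponential covariance between two sets of 2-d locations."""
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    return np.exp(-d / range_)

# illustrative locations and one GP realization
n, m = 500, 10
locs = rng.random((n, 2))
L = np.linalg.cholesky(exp_cov(locs, locs) + 1e-8 * np.eye(n))
z = L @ rng.standard_normal(n)

# Vecchia: order locations, condition each on m nearest previous points
order = np.argsort(locs[:, 0])
locs, z = locs[order], z[order]
loglik = -0.5 * (np.log(2 * np.pi) + z[0] ** 2)   # first point: N(0, 1) marginal
for i in range(1, n):
    nn = np.atleast_1d(cKDTree(locs[:i]).query(locs[i], k=min(m, i))[1])
    C = exp_cov(locs[nn], locs[nn]) + 1e-8 * np.eye(len(nn))
    c = exp_cov(locs[i:i + 1], locs[nn]).ravel()
    w = np.linalg.solve(C, c)
    mu, var = w @ z[nn], 1.0 - w @ c
    loglik += -0.5 * (np.log(2 * np.pi * var) + (z[i] - mu) ** 2 / var)
print(f"Vecchia log-likelihood with m = {m} neighbors: {loglik:.1f}")
```

Each conditional requires only an m x m solve, so for fixed m the cost grows linearly in the number of locations rather than cubically.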
5. Bayesian meta-analysis for evaluating treatment effectiveness in biomarker subgroups using trials of mixed patient populations
Authors: Lorna Wheaton, Dan Jackson, Sylwia Bujkiewicz
Abstract: During drug development, evidence can emerge to suggest a treatment is more effective in a specific patient subgroup. Whilst early trials may be conducted in biomarker-mixed populations, later trials are more likely to enrol biomarker-positive patients alone, thus leading to trials of the same treatment investigated in different populations. When conducting a meta-analysis, a conservative approach would be to combine only trials conducted in the biomarker-positive subgroup. However, this discards potentially useful information on treatment effects in the biomarker-positive subgroup concealed within observed treatment effects in biomarker-mixed populations. We extend standard random-effects meta-analysis to combine treatment effects obtained from trials with different populations to estimate pooled treatment effects in a biomarker subgroup of interest. The model assumes a systematic difference in treatment effects between biomarker-positive and biomarker-negative subgroups, which is estimated from trials which report either or both treatment effects. The estimated systematic difference and proportion of biomarker-negative patients in biomarker-mixed studies are used to interpolate treatment effects in the biomarker-positive subgroup from observed treatment effects in the biomarker-mixed population. The developed methods are applied to an illustrative example in metastatic colorectal cancer and evaluated in a simulation study. In the example, the developed method resulted in improved precision of the pooled treatment effect estimate compared to standard random-effects meta-analysis of trials investigating only biomarker-positive patients. The simulation study confirmed that when the systematic difference in treatment effects between biomarker subgroups is not very large, the developed method can improve precision of estimation of pooled treatment effects while maintaining low bias.
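A minimal worked example of the interpolation step described above: if the biomarker-mixed effect is a mixture of the subgroup effects and the subgroups differ by a systematic difference delta, the biomarker-positive effect can be recovered algebraically. All numbers are assumed values for illustration, not taken from the colorectal cancer example.

```python
# assumed inputs for illustration
d_mix = -0.30   # observed treatment effect in a biomarker-mixed trial
p_neg = 0.40    # proportion of biomarker-negative patients in that trial
delta = 0.25    # estimated systematic difference: d_neg - d_pos

# mixture assumption:
#   d_mix = (1 - p_neg) * d_pos + p_neg * (d_pos + delta)
#         = d_pos + p_neg * delta
d_pos = d_mix - p_neg * delta
print(f"interpolated biomarker-positive effect: {d_pos:.3f}")
```

In the full model this relation enters a Bayesian random-effects meta-analysis, so that uncertainty in both delta and the trial-level effects propagates to the pooled estimate.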
6. U-Statistic Reduction: Higher-Order Accurate Risk Control and Statistical-Computational Trade-Off, with Application to Network Method-of-Moments
Authors: Meijia Shao, Dong Xia, Yuan Zhang
Abstract: U-statistics play central roles in many statistical learning tools but face the haunting issue of scalability. Significant efforts have been devoted to accelerating computation by U-statistic reduction. However, existing results almost exclusively focus on power analysis, while little work addresses the accuracy of risk control, which requires distinct and much more challenging techniques. In this paper, we establish the first statistical inference procedure with provably higher-order accurate risk control for incomplete U-statistics. The sharpness of our new result enables us to reveal, for the first time in the literature, how risk control accuracy also trades off with speed, complementing the well-known variance-speed trade-off. Our proposed general framework converts the long-standing challenge of formulating accurate statistical inference procedures for many different designs into a surprisingly routine task. This paper covers non-degenerate and degenerate U-statistics, as well as network moments. We conduct comprehensive numerical studies and observe results that validate the sharpness of our theory. Our method also demonstrates effectiveness on real-world data applications.
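A minimal sketch of U-statistic reduction in the network-moment setting: a third-order moment (triangle density) is estimated from a uniformly sampled subset of triples instead of all C(n, 3) of them. The Erdos-Renyi graph and the sampling budget are illustrative assumptions; the paper's higher-order accurate inference procedure is not implemented here.

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(4)

# illustrative Erdos-Renyi network with edge probability p
n, p = 60, 0.3
A = np.triu(rng.random((n, n)) < p, 1).astype(int)
A = A + A.T

def triangle(i, j, k):
    """Order-3 kernel: 1 if nodes i, j, k form a triangle."""
    return A[i, j] * A[j, k] * A[i, k]

# complete U-statistic: average over all C(n, 3) = 34,220 triples
full = np.mean([triangle(*t) for t in combinations(range(n), 3)])

# incomplete U-statistic: N uniformly sampled triples
N = 5_000
reduced = np.mean([triangle(*rng.choice(n, 3, replace=False)) for _ in range(N)])

print(f"complete: {full:.4f}  incomplete (N={N:,}): {reduced:.4f}  p^3 = {p**3:.4f}")
```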
7. Functional repeated measures analysis of variance and its application
Authors: Katarzyna Kuryło, Łukasz Smaga
Abstract: This paper is motivated by medical studies in which the same patients with multiple sclerosis are examined at several successive visits and described by fractional anisotropy tract profiles, which can be represented as functions. Since the observations for each patient are dependent random processes, they follow a repeated measures design for functional data. To compare the results for different visits, we thus consider functional repeated measures analysis of variance. For this purpose, a pointwise test statistic is constructed by adapting the classical test statistic for one-way repeated measures analysis of variance to the functional data framework. By integrating and taking the supremum of the pointwise test statistic, we create two global test statistics. Apart from verifying the general null hypothesis on the equality of mean functions corresponding to different objects, we also propose a simple method for post hoc analysis. We illustrate the finite sample properties of permutation and bootstrap testing procedures in an extensive simulation study. Finally, we analyze a motivating real data example in detail.
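A minimal sketch of the testing procedure described above: a pointwise one-way repeated measures F-statistic is computed over the grid, integrated and maximized to form the two global statistics, and calibrated by permuting visit labels within subjects. The simulated tract-profile data are an assumption for illustration; the post hoc analysis is not shown.

```python
import numpy as np

rng = np.random.default_rng(5)

# illustrative functional repeated measures data:
# n subjects x k visits x T grid points (e.g., FA tract profiles)
n, k, T = 30, 3, 100
grid = np.linspace(0, 1, T)
Y = rng.standard_normal((n, k, T)) + np.sin(2 * np.pi * grid)
Y[:, 2] += 0.4 * grid                          # visit 3 drifts upward

def pointwise_F(Y):
    """One-way repeated measures ANOVA F-statistic at each grid point."""
    n, k, _ = Y.shape
    grand = Y.mean(axis=(0, 1))
    subj = Y.mean(axis=1)                      # per-subject mean curves
    visit = Y.mean(axis=0)                     # per-visit mean curves
    ss_visit = n * ((visit - grand) ** 2).sum(axis=0)
    resid = Y - subj[:, None, :] - visit[None, :, :] + grand
    ss_err = (resid ** 2).sum(axis=(0, 1))
    return (ss_visit / (k - 1)) / (ss_err / ((n - 1) * (k - 1)))

# global statistics: integral and supremum of the pointwise statistic
F = pointwise_F(Y)
stat_int, stat_sup = np.trapz(F, grid), F.max()

# permutation test: shuffle visit labels independently within each subject
B, ge_int, ge_sup = 1000, 0, 0
for _ in range(B):
    Yp = np.stack([Y[i][rng.permutation(k)] for i in range(n)])
    Fp = pointwise_F(Yp)
    ge_int += np.trapz(Fp, grid) >= stat_int
    ge_sup += Fp.max() >= stat_sup
print(f"permutation p-values: integral {ge_int / B:.3f}, supremum {ge_sup / B:.3f}")
```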