Methodology (stat.ME)
Tue, 22 Aug 2023
1. Computational Inference for Directions in Canonical Correlation Analysis
Authors: Daniel Kessler, Elizaveta Levina
Abstract: Canonical Correlation Analysis (CCA) is a method for analyzing pairs of random vectors; it learns a sequence of paired linear transformations such that the resulting canonical variates are maximally correlated within pairs and uncorrelated across pairs. CCA outputs both the canonical correlations and the canonical directions that define the transformations. While inference for canonical correlations is well developed, inference for canonical directions is more challenging and not well studied, yet it is key to interpretability. We propose a computational bootstrap method (combootcca) for inference on CCA directions. We conduct thorough simulation studies, ranging from simple and well-controlled settings to complex but realistic ones, that validate the statistical properties of combootcca and compare it to several competitors. We also apply combootcca to a brain imaging dataset and discover linked patterns in brain connectivity and behavioral scores.
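The abstract does not spell out the combootcca algorithm, so the following is only a generic sketch of bootstrap inference for the first CCA direction (all function names and design choices are mine, not the paper's): a pairs bootstrap with per-draw sign alignment, since canonical directions are identified only up to sign.

```python
import numpy as np

def cca_first(X, Y):
    """Leading canonical directions via whitening + SVD."""
    n = len(X)
    Xc, Yc = X - X.mean(0), Y - Y.mean(0)
    Lx = np.linalg.cholesky(Xc.T @ Xc / n)
    Ly = np.linalg.cholesky(Yc.T @ Yc / n)
    # SVD of the whitened cross-covariance; the top singular value is rho_1
    M = np.linalg.solve(Lx, (Xc.T @ Yc / n) @ np.linalg.inv(Ly.T))
    U, s, Vt = np.linalg.svd(M)
    a = np.linalg.solve(Lx.T, U[:, 0])   # first x-direction
    b = np.linalg.solve(Ly.T, Vt[0])     # first y-direction
    return a, b, s[0]

def bootstrap_direction_ci(X, Y, B=500, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the first x-direction."""
    rng = np.random.default_rng(seed)
    a_hat, _, _ = cca_first(X, Y)
    n, draws = len(X), []
    for _ in range(B):
        idx = rng.integers(0, n, n)      # resample (x, y) rows in pairs
        a, _, _ = cca_first(X[idx], Y[idx])
        if a @ a_hat < 0:                # align sign with the point estimate
            a = -a
        draws.append(a)
    lo, hi = np.quantile(np.array(draws), [alpha / 2, 1 - alpha / 2], axis=0)
    return a_hat, lo, hi
```

Coordinate-wise percentile intervals like these ignore the joint geometry of the direction vector; the paper's procedure presumably handles identification and alignment more carefully.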
2. A one-step spatial+ approach to mitigate spatial confounding in multivariate spatial areal models
Authors: A. Urdangarin, T. Goicoa, T. Kneib, M. D. Ugarte
Abstract: Ecological spatial areal models encounter the well-known and challenging problem of spatial confounding. This issue makes it arduous to distinguish between the impacts of observed covariates and spatial random effects. Despite previous research and various proposed methods to tackle this problem, finding a definitive solution remains elusive. In this paper, we propose a one-step version of the spatial+ approach that involves dividing the covariate into two components. One component captures large-scale spatial dependence, while the other accounts for short-scale dependence. This approach eliminates the need to separately fit spatial models for the covariates. We apply this method to analyze two forms of crimes against women, namely rapes and dowry deaths, in Uttar Pradesh, India, exploring their relationship with socio-demographic covariates. To evaluate the performance of the new approach, we conduct extensive simulation studies under different spatial confounding scenarios. The results demonstrate that the proposed method provides reliable estimates of fixed effects and posterior correlations between different responses.
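The core idea of splitting a covariate into a large-scale and a short-scale spatial component can be illustrated with a simple projection. This sketch (my own illustration, not the paper's spatial+ formulation, which is embedded in the model fit) uses the smoothest eigenvectors of the graph Laplacian of the areal adjacency as the "large-scale" basis:

```python
import numpy as np

def split_covariate(x, W, k):
    """Split an areal covariate x into a large-scale (spatially smooth)
    component and a short-scale remainder, by projecting onto the k
    smoothest eigenvectors of the graph Laplacian of adjacency W."""
    D = np.diag(W.sum(axis=1))
    L = D - W                        # combinatorial Laplacian
    _, vecs = np.linalg.eigh(L)      # eigenvalues ascending: smooth modes first
    basis = vecs[:, :k]
    x_large = basis @ (basis.T @ x)  # projection onto the smooth subspace
    return x_large, x - x_large      # short-scale component is the residual
```

Because `x_large` is an orthogonal projection, the two components are exactly orthogonal and sum back to the original covariate.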
3. Identification and validation of periodic autoregressive model with additive noise: finite-variance case
Authors: Wojciech Żuławiński, Aleksandra Grzesiek, Radosław Zimroz, Agnieszka Wyłomańska
Abstract: In this paper, we address the problem of modeling data with periodic autoregressive (PAR) time series and additive noise. In most cases, data are processed under a noise-free model (i.e., without additive noise), which is rarely realistic in practice. The first two steps in PAR model identification are order selection and period estimation, so these issues are our main focus. Finally, the model should be validated, so we propose a procedure for analyzing the residuals, which are treated here as multidimensional vectors. Order and period selection, as well as model validation, are all addressed using the characteristic function (CF) of the residual series. The CF is used to obtain the probability density function, which is utilized in the information criterion and for testing the distribution of the residuals. A procedure for estimating the coefficients is also needed to complete the PAR model analysis; however, this issue is only mentioned here, as it is a separate task under consideration in parallel work. The presented methodology can be viewed as a general framework for analyzing data with periodically non-stationary characteristics disturbed by finite-variance external noise. The original contribution lies in the selection of the optimal model order, period identification, and the analysis of residuals. These findings were inspired by our previous work on machine condition monitoring that used PAR modeling.
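To fix intuition for the model class, here is a minimal simulation of the setting the abstract describes: a PAR(1) series observed through additive finite-variance noise, plus a crude per-phase variance diagnostic. The CF-based identification machinery is the paper's contribution and is not reproduced here; these helper names are my own.

```python
import numpy as np

def simulate_par1_with_noise(phi, n, sigma_eps=1.0, sigma_noise=0.5, seed=0):
    """Simulate x[t] = phi[t % T] * x[t-1] + eps[t] (period T = len(phi)),
    observed through additive Gaussian measurement noise."""
    rng = np.random.default_rng(seed)
    T = len(phi)
    x = np.zeros(n)
    for t in range(1, n):
        x[t] = phi[t % T] * x[t - 1] + sigma_eps * rng.normal()
    return x + sigma_noise * rng.normal(size=n)   # noisy observation

def phase_variances(y, T):
    """Sample variance of observations at each phase 0..T-1 — a naive
    period diagnostic, unlike the paper's characteristic-function approach."""
    return np.array([y[s::T].var() for s in range(T)])
```

With period-2 coefficients of different magnitudes, the per-phase variances separate clearly, which is the periodically non-stationary behavior the methodology targets.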
4. The modified Yule-Walker method for multidimensional infinite-variance periodic autoregressive model of order 1
Authors: Prashant Giri, Aleksandra Grzesiek, Wojciech Żuławiński, S. Sundar, Agnieszka Wyłomańska
Abstract: Time series with periodic behavior, such as the periodic autoregressive (PAR) models belonging to the class of periodically correlated processes, arise in various real applications. In the literature, such processes have been considered in different directions, especially with Gaussian-distributed noise. However, in many applications the assumption of a finite-variance distribution is too simplistic. One can therefore consider extensions of the classical PAR model with non-Gaussian distributions; in particular, the Gaussian distribution can be replaced by an infinite-variance distribution, e.g. the $\alpha$-stable distribution. In this paper, we focus on multidimensional $\alpha$-stable PAR time series models. For such models, we propose a new estimation method based on the Yule-Walker equations. Since the covariance does not exist in the infinite-variance case, it is replaced by another measure, namely the covariation. We propose two estimators of the covariation measure: the first is based on the moment representation (moment-based), while the second is based on the spectral measure representation (spectral-based). The validity of the new approaches is verified using Monte Carlo simulations in different settings, including varying sample sizes and stability indices of the noise. Moreover, we compare the moment-based method with the spectral-based technique. Finally, a real data analysis is presented.
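A one-dimensional flavor of the moment-based idea can be sketched with a fractional lower-order moment (FLOM) ratio estimator for the per-phase coefficients of a PAR(1) series. This is only an illustration in the spirit of covariation-based estimation; the paper's modified Yule-Walker method for the multidimensional case is more involved, and the function below is my own naming.

```python
import numpy as np

def par1_phase_coefs(x, T, p=1.0):
    """Per-phase AR coefficients of a PAR(1) series via a ratio of
    fractional lower-order moments; requires p < alpha (p=1 suits
    1 < alpha <= 2, including the Gaussian boundary alpha = 2)."""
    phis = np.empty(T)
    t = np.arange(1, len(x))
    for s in range(T):
        m = (t % T) == s                      # times with phase s
        cur, prev = x[t[m]], x[t[m] - 1]
        num = np.mean(cur * np.sign(prev) * np.abs(prev) ** (p - 1))
        phis[s] = num / np.mean(np.abs(prev) ** p)
    return phis
```

For x[t] = phi[s] x[t-1] + eps with symmetric noise, the ratio is consistent for phi[s] because the numerator equals phi[s] times the denominator in expectation.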
5. Weighting Based Approaches to Borrowing Historical Controls for Indirect Comparison for Time-to-Event Data with a Cure Fraction
Authors: Jixian Wang, Hongtao Zhang, Ram Tiwari
Abstract: To use historical controls for indirect comparison with single-arm trials, the population difference between data sources should be adjusted for to reduce confounding bias. The adjustment is more difficult for time-to-event data with a cure fraction. We propose different adjustment approaches based on pseudo-observations and on calibration weighting by entropy balancing. We show a simple way to obtain the pseudo-observations for the cure rate and propose a simple weighted estimator based on them. Estimation of the survival function in the presence of a cure fraction is also considered. Simulations are conducted to examine the proposed approaches. An application to a breast cancer study is presented.
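Entropy balancing itself is a generic calibration step: choose weights for the historical controls, as close as possible (in KL divergence) to uniform, so that their weighted covariate means match the single-arm trial's means. A minimal sketch, assuming mean constraints only (the paper may calibrate on additional moments):

```python
import numpy as np

def entropy_balancing_weights(X, target, n_iter=50):
    """Weights w_i proportional to exp(lam . x_i) (minimum KL from uniform)
    such that the weighted mean of the rows of X equals `target`.
    Solved by Newton's method on the convex dual log-sum-exp objective."""
    Xc = X - target                      # deviations from the target means
    lam = np.zeros(X.shape[1])
    for _ in range(n_iter):
        eta = Xc @ lam
        w = np.exp(eta - eta.max())      # stabilized softmax weights
        w /= w.sum()
        g = w @ Xc                       # gradient: weighted mean deviation
        H = Xc.T @ (w[:, None] * Xc) - np.outer(g, g)   # Hessian
        lam -= np.linalg.solve(H + 1e-10 * np.eye(len(g)), g)
    eta = Xc @ lam
    w = np.exp(eta - eta.max())
    return w / w.sum()
```

The target must lie in the convex hull of the historical covariates for exact balance to be feasible; in the application, these weights would then feed a weighted cure-rate or survival estimator built on pseudo-observations.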
6. Towards a unified approach to formal risk of bias assessments for causal and descriptive inference
Authors: Oliver L. Pescott, Robin J. Boyd, Gary D. Powney, Gavin B. Stewart
Abstract: Statistics is sometimes described as the science of reasoning under uncertainty. Statistical models provide one view of this uncertainty, but what is frequently neglected is the invisible portion of uncertainty: that assumed not to exist once a model has been fitted to some data. Systematic errors, i.e. bias, in data relative to some model and inferential goal can seriously undermine research conclusions, and qualitative and quantitative techniques have been created across several disciplines to quantify and generally appraise such potential biases. Perhaps best known are so-called risk of bias assessment instruments used to investigate the likely quality of randomised controlled trials in medical research. However, the logic of assessing the risks caused by various types of systematic error to statistical arguments applies far more widely. This logic applies even when statistical adjustment strategies for potential biases are used, as these frequently make assumptions (e.g. data missing at random) that can never be guaranteed in finite samples. Mounting concern about such situations can be seen in the increasing calls for greater consideration of biases caused by nonprobability sampling in descriptive inference (i.e. survey sampling), and the statistical generalisability of in-sample causal effect estimates in causal inference; both of which relate to the consideration of model-based and wider uncertainty when presenting research conclusions from models. Given that model-based adjustments are never perfect, we argue that qualitative risk of bias reporting frameworks for both descriptive and causal inferential arguments should be further developed and made mandatory by journals and funders. It is only through clear statements of the limits to statistical arguments that consumers of research can fully judge their value for any specific application.
7. Nonparametric Assessment of Variable Selection and Ranking Algorithms
Authors: Zhou Tang, Ted Westling
Abstract: Selecting from or ranking a set of candidate variables in terms of their capacity for predicting an outcome of interest is an important task in many scientific fields. A variety of methods for variable selection and ranking have been proposed in the literature. In practice, it can be challenging to know which method is most appropriate for a given dataset. In this article, we propose methods of comparing variable selection and ranking algorithms. We first introduce measures of the quality of variable selection and ranking algorithms. We then define estimators of our proposed measures, and establish asymptotic results for our estimators in the regime where the dimension of the covariates is fixed as the sample size grows. We use our results to conduct large-sample inference for our measures, and we propose a computationally efficient partial bootstrap procedure to potentially improve finite-sample inference. We assess the properties of our proposed methods using numerical studies, and we illustrate our methods with an analysis of data for predicting wine quality from its physicochemical properties.
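The abstract's proposed quality measures are not specified, but the shape of the problem can be sketched: a ranking algorithm maps data to an ordering of variables, and a quality measure scores that ordering against the outcome. Here is a deliberately simple pairing (my own choices, not the paper's measures or estimators): ranking by absolute marginal correlation, scored by the R² of OLS on the top-k variables.

```python
import numpy as np

def marginal_rank(X, y):
    """One candidate ranking algorithm: order variables by absolute
    marginal correlation with y (any other ranking could be plugged in)."""
    r = np.array([abs(np.corrcoef(X[:, j], y)[0, 1])
                  for j in range(X.shape[1])])
    return np.argsort(-r)

def topk_r2(X, y, ranking, k):
    """One simple quality measure: in-sample R^2 of OLS on the top-k
    ranked variables (the paper's measures are nonparametric and
    would be estimated with proper out-of-sample machinery)."""
    S = ranking[:k]
    Z = np.column_stack([np.ones(len(y)), X[:, S]])
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
    resid = y - Z @ beta
    return 1.0 - resid.var() / y.var()
```

Comparing two algorithms then amounts to comparing such scores, with resampling (e.g. the paper's partial bootstrap) quantifying the uncertainty of the comparison.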