arXiv daily

Methodology (stat.ME)

Tue, 27 Jun 2023

1. General multiple tests for functional data

Authors: Merle Munko, Marc Ditzhaus, Markus Pauly, Łukasz Smaga, Jin-Ting Zhang

Abstract: While several inferential methods exist for analyzing functional data in factorial designs, there is a lack of statistical tests that (i) are valid in general designs, (ii) are valid under non-restrictive assumptions on the data-generating process, and (iii) allow for coherent post-hoc analyses. In particular, most existing methods assume Gaussianity or equal covariance functions across groups (homoscedasticity) and are only applicable to specific study designs that do not allow for the evaluation of interactions. Moreover, all available strategies are designed only for testing global hypotheses and do not directly allow a more in-depth analysis of multiple local hypotheses. To address the first two problems, (i) and (ii), we propose flexible integral-type test statistics that are applicable in general factorial designs under minimal assumptions on the data-generating process. In particular, we postulate neither homoscedasticity nor Gaussianity. To approximate the statistics' null distribution, we adopt a resampling approach and validate it methodologically. Finally, we use our flexible testing framework to (iii) infer several local null hypotheses simultaneously. To allow for powerful data analysis, we take the complex dependencies between the different local test statistics into account. Extensive simulations confirm that the new methods are flexibly applicable. Two illustrative data analyses complete our study. The new testing procedures are implemented in the R package multiFANOVA, which will be available on CRAN soon.

2. Multilayer random dot product graphs: Estimation and online change point detection

Authors: Fan Wang, Wanshan Li, Oscar Hernan Madrid Padilla, Yi Yu, Alessandro Rinaldo

Abstract: In this paper, we first introduce the multilayer random dot product graph (MRDPG) model, which can be seen as an extension of the random dot product graph model to multilayer networks. The MRDPG model is convenient for incorporating nodes' latent positions when studying connectivity. By modelling a multilayer network as an MRDPG, we deploy a tensor-based estimation method and demonstrate its superiority over state-of-the-art methods. We then move from a static to a dynamic MRDPG and consider online change point detection. At every time point, we observe a realisation from an $L$-layered MRDPG. Across layers, we assume a common node set and shared latent positions, but allow for different connectivity matrices. We give a comprehensive treatment of a range of problems in this setting. For both fixed and random latent positions, we propose efficient online change point detection algorithms that minimise the detection delay while controlling false alarms. Notably, in the random latent position case, we devise a novel nonparametric change point detection algorithm built around a kernel estimator, which also covers the case where the density does not exist and accommodates stochastic block models as special cases. Our theoretical findings are supported by extensive numerical experiments, with code available online at https://github.com/MountLee/MRDPG.
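The MRDPG structure described above (shared latent positions X and layer-specific connectivity matrices W_l, so that layer l has edge probabilities X W_l X^T) can be simulated in a few lines. The Python sketch below does that and then estimates every P_l by projecting each layer onto the top-d left singular vectors of the horizontally stacked adjacency matrices. This unfolding estimator is a simple, assumed stand-in, not the tensor-based method of the paper, and all function names are hypothetical.

```python
import numpy as np

def sample_mrdpg(X, Ws, rng):
    """Draw one symmetric, hollow adjacency matrix per layer with P_l = X W_l X^T."""
    n = X.shape[0]
    layers = []
    for W in Ws:
        P = X @ W @ X.T                                   # layer-specific edge probabilities
        upper = np.triu(rng.random((n, n)) < P, k=1)      # independent edges above the diagonal
        layers.append((upper | upper.T).astype(float))    # symmetrise, no self-loops
    return np.stack(layers)                               # shape (L, n, n)

def unfolding_estimate(A, d):
    """Estimate each P_l from the top-d left singular vectors of [A_1, ..., A_L]."""
    unfolded = np.concatenate(list(A), axis=1)            # shape (n, n * L)
    U, _, _ = np.linalg.svd(unfolded, full_matrices=False)
    proj = U[:, :d] @ U[:, :d].T                          # projector onto the shared latent subspace
    return np.stack([proj @ A_l @ proj for A_l in A])
```

Latent positions must be chosen so that every entry of X W_l X^T lies in [0, 1]; rows of X on the probability simplex combined with W_l entries in [0, 1] guarantee this.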

3. Multivariate Rank-Based Analysis of Multiple Endpoints in Clinical Trials: A Global Test Approach

Authors: Kexuan Li, Lingli Yang, Shaofei Zhao, Susie Sinks, Luan Lin, Peng Sun

Abstract: Clinical trials often involve the assessment of multiple endpoints to comprehensively evaluate the efficacy and safety of interventions. In this work, we consider a global nonparametric testing procedure based on multivariate ranks for the analysis of multiple endpoints in clinical trials. Unlike existing approaches that rely on pairwise comparisons for each individual endpoint, the proposed method directly incorporates the multivariate ranks of the observations. By considering the joint ranking of all endpoints, the proposed approach provides robustness against the diverse data distributions and censoring mechanisms commonly encountered in clinical trials. Through extensive simulations, we demonstrate the superior performance of the multivariate rank-based approach in controlling the type I error and in achieving higher power than existing rank-based methods. The simulations illustrate the advantages of leveraging multivariate ranks and highlight the robustness of the approach in various settings. The proposed method offers an effective tool for the analysis of multiple endpoints in clinical trials, enhancing the reliability and efficiency of outcome evaluations.
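One common notion of multivariate rank is the spatial rank: the average unit vector pointing from all other observations to a given one. The sketch below, assuming that notion, builds a global two-sample permutation test on the difference of mean spatial ranks across all endpoints jointly. The paper's exact definition of multivariate rank, its test statistic, and its handling of censoring may differ; this version ignores censoring entirely, and the function names are hypothetical.

```python
import numpy as np

def spatial_ranks(Z):
    """Spatial (multivariate) rank of each row: mean unit vector from the other points to it."""
    diff = Z[:, None, :] - Z[None, :, :]                  # (N, N, p) pairwise differences
    norms = np.linalg.norm(diff, axis=2, keepdims=True)
    with np.errstate(invalid="ignore", divide="ignore"):
        units = np.where(norms > 0, diff / norms, 0.0)    # zero vector for a point versus itself
    return units.mean(axis=1)

def spatial_rank_test(x, y, n_perm=999, seed=0):
    """Global two-sample permutation test on the difference of mean spatial ranks."""
    rng = np.random.default_rng(seed)
    ranks = spatial_ranks(np.vstack([x, y]))              # ranks of the pooled sample
    labels = np.r_[np.ones(len(x), bool), np.zeros(len(y), bool)]

    def stat(lab):
        diff = ranks[lab].mean(axis=0) - ranks[~lab].mean(axis=0)
        return float(diff @ diff)                         # squared Euclidean norm

    observed = stat(labels)
    exceed = sum(stat(rng.permutation(labels)) >= observed for _ in range(n_perm))
    return observed, (exceed + 1) / (n_perm + 1)
```

Because the ranks depend only on the pooled sample, they are computed once; each permutation merely re-averages them under shuffled labels.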

4. Testing for asymmetric dependency structures in financial markets: regime-switching and local Gaussian correlation

Authors: Kristian Gundersen, Timothée Bacri, Jan Bulla, Sondre Hølleland, Bård Støve

Abstract: This paper examines asymmetric and time-varying dependency structures between financial returns, using a novel approach that combines regime-switching models with the local Gaussian correlation (LGC). We propose an LGC-based bootstrap test of whether the dependence structure in financial returns is equal across regimes. We examine this test in a Monte Carlo study, where it shows good level and power properties. We argue that this approach is more intuitive than competing approaches, which typically combine regime-switching models with copula theory. Furthermore, the LGC is a semi-parametric approach and hence avoids any parametric specification of the dependence structure. We illustrate our approach using returns from the US and UK stock markets and from the US stock and government bond markets. Using a two-regime model for the US-UK stock returns, the test rejects equality of the dependence structure in the two regimes. We also find evidence of lower tail dependence in the LGC structure in the regime associated with financial downturns. For a three-regime model fitted to US stock and bond returns, the test rejects equality of the dependence structures between all regime pairs. Moreover, we find that the LGC is primarily positive in the period 1980-2000 and mostly negative from 2000 onwards. In addition, the regime associated with bear markets indicates weaker, but asymmetric, dependence, clearly documenting the loss of diversification benefits in times of crisis.
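The local Gaussian correlation at a point is the correlation parameter of a bivariate normal density fitted locally around that point. The sketch below estimates it by local likelihood with a Gaussian kernel of bandwidth h, for which the integral term of the local likelihood has a closed form. It assumes SciPy is available and covers only the LGC estimate at a single point; the regime-switching model and the bootstrap test from the paper are not included, and the function names are hypothetical.

```python
import numpy as np
from scipy.optimize import minimize

def _bvn_logpdf(z, mu, s1, s2, rho):
    """Log-density of a bivariate normal, evaluated row-wise on z of shape (m, 2)."""
    u = (z[:, 0] - mu[0]) / s1
    v = (z[:, 1] - mu[1]) / s2
    q = (u**2 - 2 * rho * u * v + v**2) / (1 - rho**2)
    return -np.log(2 * np.pi * s1 * s2 * np.sqrt(1 - rho**2)) - q / 2

def local_gaussian_correlation(data, point, h):
    """Local likelihood estimate of the local Gaussian correlation at `point`.

    With a Gaussian kernel of bandwidth h, the integral term of the local
    likelihood equals the N(mu, Sigma + h^2 I) density evaluated at `point`.
    """
    data = np.asarray(data, dtype=float)
    point = np.asarray(point, dtype=float)
    weights = np.exp(_bvn_logpdf(data, point, h, h, 0.0))   # kernel weights K_h(X_i - point)

    def unpack(theta):
        # Log-scales and tanh keep the standard deviations positive and |rho| < 1
        return theta[:2], np.exp(theta[2]), np.exp(theta[3]), np.tanh(theta[4])

    def negative_local_loglik(theta):
        mu, s1, s2, rho = unpack(theta)
        fit = np.mean(weights * _bvn_logpdf(data, mu, s1, s2, rho))
        t1, t2 = np.sqrt(s1**2 + h**2), np.sqrt(s2**2 + h**2)
        rho_conv = rho * s1 * s2 / (t1 * t2)                # correlation of Sigma + h^2 I
        penalty = np.exp(_bvn_logpdf(point[None, :], mu, t1, t2, rho_conv))[0]
        return -(fit - penalty)

    start = np.array([*data.mean(axis=0), *np.log(data.std(axis=0)), 0.0])
    bounds = [(None, None)] * 2 + [(-5.0, 5.0)] * 2 + [(-3.0, 3.0)]
    result = minimize(negative_local_loglik, start, method="L-BFGS-B", bounds=bounds)
    return float(unpack(result.x)[3])
```

For data drawn from a bivariate normal distribution, the LGC is constant and equal to the global correlation, which gives a simple sanity check for the estimator.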

5. Bayesian Interrupted Time Series for evaluating policy change on mental well-being: an application to England's welfare reform

Authors: Connor Gascoigne, Marta Blangiardo, Zejing Shao, Annie Jeffery, Sara Geneletti, James Kirkbride, Gianluca Baio

Abstract: Factors contributing to social inequalities are also associated with negative mental health outcomes, leading to disparities in mental well-being. We propose a Bayesian hierarchical model that can evaluate the impact of policies on population well-being while accounting for spatial and temporal dependencies. Building on an interrupted time series framework, our approach can evaluate how different profiles of individuals are affected in different ways, whilst accounting for the associated uncertainty. We apply the framework to assess the impact of the United Kingdom's welfare reform, which took place throughout the 2010s, on mental well-being using data from the UK Household Longitudinal Study. This additional depth of knowledge is essential for effective evaluation of current policy and implementation of future policy.
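The interrupted time series framework that the model builds on can be illustrated in its classical, non-Bayesian form: a segmented regression with a pre-intervention trend plus a level change and a slope change at the intervention time t0. The sketch below fits that by ordinary least squares. It has no hierarchical, spatial, or temporal random effects and no priors, so it shows only the skeleton that the Bayesian model extends; the function name is hypothetical.

```python
import numpy as np

def segmented_regression(y, t0):
    """Classical interrupted time series fit: pre-trend plus level and slope change at t0."""
    t = np.arange(len(y), dtype=float)
    post = (t >= t0).astype(float)                       # 1 from the policy change onwards
    design = np.column_stack([np.ones_like(t), t, post, (t - t0) * post])
    coef, *_ = np.linalg.lstsq(design, np.asarray(y, dtype=float), rcond=None)
    return dict(zip(["intercept", "pre_trend", "level_change", "trend_change"], coef))
```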

6. Sparse estimation in ordinary kriging for functional data

Authors: Hidetoshi Matsui, Yuya Yamakawa

Abstract: We introduce sparse estimation for ordinary kriging of functional data. Functional kriging predicts a feature given as a function at an unobserved location by a linear combination of the data observed at other locations. To estimate the weights of the linear combination, we apply lasso-type regularization when minimizing the expected squared error. We derive an algorithm that computes the estimator using the augmented Lagrangian method. Tuning parameters included in the estimation procedure are selected by cross-validation. Since the proposed method can shrink some of the weights of the linear combination exactly to zero, we can investigate which locations are necessary or unnecessary for predicting the feature. Simulation and real data analysis show that the proposed method provides reasonable results.
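The unpenalized ordinary kriging step reduces to a linear system: minimizing the expected squared prediction error subject to the weights summing to one gives a bordered covariance system with a Lagrange multiplier. The sketch below solves that system and applies the resulting weights to whole curves. It assumes a known exponential covariance model with a hypothetical range parameter and omits the lasso penalty and the augmented Lagrangian algorithm that make the paper's weights sparse; the function names are hypothetical.

```python
import numpy as np

def exponential_cov(dist, sill=1.0, range_=0.3):
    """Hypothetical isotropic exponential covariance model (no nugget)."""
    return sill * np.exp(-dist / range_)

def ordinary_kriging_weights(sites, target, cov=exponential_cov):
    """Solve [[C, 1], [1^T, 0]] [w; mu] = [c0; 1] for weights w that sum to one."""
    n = len(sites)
    dist = np.linalg.norm(sites[:, None, :] - sites[None, :, :], axis=2)
    lhs = np.ones((n + 1, n + 1))
    lhs[:n, :n] = cov(dist)
    lhs[n, n] = 0.0
    rhs = np.ones(n + 1)
    rhs[:n] = cov(np.linalg.norm(sites - target, axis=1))
    solution = np.linalg.solve(lhs, rhs)
    return solution[:n]                                  # solution[n] is the Lagrange multiplier

def krige_curves(curves, sites, target, cov=exponential_cov):
    """Predict a whole curve at `target` as a weighted sum of the observed curves."""
    return ordinary_kriging_weights(sites, target, cov) @ curves
```

Because the exponential covariance matrix is positive definite for distinct sites, predicting at an observed site returns that site's curve exactly.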

7. Robust and efficient projection predictive inference

Authors: Yann McLatchie, Sölvi Rögnvaldsson, Frank Weber, Aki Vehtari

Abstract: The concepts of Bayesian prediction, model comparison, and model selection have developed significantly over the last decade. As a result, the Bayesian community has witnessed rapid growth in theoretical and applied contributions to building and selecting predictive models. Projection predictive inference in particular has shown promise to this end, finding application across a broad range of fields. It is less prone to overfitting than naïve selection based purely on cross-validation or information criteria performance metrics, and has been known to outperform other methods in terms of predictive performance. We survey the core concept of and contemporary contributions to projection predictive inference, and present a safe, efficient, and modular workflow for prediction-oriented model selection therein. We also provide an interpretation of the projected posteriors achieved by projection predictive inference in terms of their limitations in causal settings.
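For a Gaussian model with a common noise variance, projecting the reference model's predictive mean onto a submodel amounts to a least-squares fit of the submodel to the reference fitted values. The sketch below uses that fact in a greedy forward search. It assumes a single point summary of the reference model (in the usage, ordinary least-squares fitted values stand in for a Bayesian posterior mean) instead of projecting full posterior draws; it is not the workflow proposed in the paper, and the function name is hypothetical.

```python
import numpy as np

def projection_forward_search(X, mu_ref):
    """Greedy forward search: at each step, add the predictor whose submodel
    best reproduces the reference fit (squared-error projection)."""
    n, p = X.shape
    selected, remaining = [], list(range(p))
    while remaining:
        losses = {}
        for j in remaining:
            design = np.column_stack([np.ones(n), X[:, selected + [j]]])
            beta, *_ = np.linalg.lstsq(design, mu_ref, rcond=None)   # projection onto submodel
            losses[j] = np.mean((mu_ref - design @ beta) ** 2)
        best = min(losses, key=losses.get)
        selected.append(best)
        remaining.remove(best)
    return selected                                       # predictors in order of inclusion
```

In use, one would fit the reference model once, pass its fitted values as `mu_ref`, and then choose a submodel size along the returned path, for example by cross-validated predictive performance.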

8. Assessing small area estimates via artificial populations from KBAABB: a kNN-based approximation to ABB

Authors: Jerzy A. Wieczorek, Grayson W. White, Zachariah W. Cody, Emily X. Tan, Jacqueline O. Chistolini, Kelly S. McConville, Tracey S. Frescino, Gretchen G. Moisen

Abstract: Comparing and evaluating small area estimation (SAE) models for a given application is inherently difficult. Typically, we do not have enough data in many areas to check unit-level modeling assumptions or to assess unit-level predictions empirically, and there is no ground truth available for checking area-level estimates. Design-based simulation from artificial populations can help with each of these issues, but only if the artificial populations (a) realistically represent the application at hand and (b) are not built using assumptions that could inherently favor one SAE model over another. In this paper, we borrow ideas from random hot deck, approximate Bayesian bootstrap (ABB), and k-nearest neighbor (kNN) imputation methods, which are often used for multiple imputation of missing data. We propose a kNN-based approximation to ABB (KBAABB) for a different purpose: generating an artificial population when rich unit-level auxiliary data are available. We introduce diagnostic checks on the process of building the artificial population itself, and we demonstrate how to use such an artificial population for design-based simulation studies to compare and evaluate SAE models, using real data from the Forest Inventory and Analysis (FIA) program of the US Forest Service. We illustrate how such simulation studies may be disseminated and explored interactively through an online R Shiny application.
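One reading of a kNN-based approximation to the approximate Bayesian bootstrap, based only on the description above, is: for every population unit, take its k nearest sampled units in auxiliary space as the donor pool, bootstrap that pool, and copy a response from the bootstrap sample. The sketch below implements that reading. It is an assumption about the procedure, not the authors' algorithm, the function name is hypothetical, and it uses brute-force distances that would not scale to a full population frame.

```python
import numpy as np

def knn_abb_population(aux_sample, y_sample, aux_pop, k, rng):
    """Fill in a response for every population unit from its k nearest sampled units.

    For each population unit: (1) find the k sampled units closest in auxiliary space,
    (2) draw a bootstrap sample of those k donors (the ABB step), and
    (3) copy the response of one donor drawn from that bootstrap sample.
    """
    dists = np.linalg.norm(aux_pop[:, None, :] - aux_sample[None, :, :], axis=2)
    donors = np.argsort(dists, axis=1)[:, :k]            # brute force; fine for a sketch
    values = np.empty(len(aux_pop), dtype=float)
    for i, pool in enumerate(donors):
        boot = rng.choice(pool, size=k, replace=True)     # bootstrap the donor pool
        values[i] = y_sample[rng.choice(boot)]            # draw one donor from the bootstrap sample
    return values
```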

9. Network-Adjusted Covariates for Community Detection

Authors: Yaofang Hu, Wanjie Wang

Abstract: Community detection is a crucial task in network analysis that can be significantly improved by incorporating subject-level information, i.e., covariates. However, current methods often struggle with selecting tuning parameters and analyzing low-degree nodes. In this paper, we introduce a novel method that addresses these challenges by constructing network-adjusted covariates, which combine the network connections with the covariates and assign each node a unique weight based on its degree. Spectral clustering on the network-adjusted covariates yields exact recovery of community labels under certain conditions, and it is tuning-free and computationally efficient. We present novel theoretical results on the strong consistency of our method under degree-corrected stochastic blockmodels with covariates, even in the presence of misspecification and sparse communities with bounded degrees. Additionally, we establish a general lower bound for the community detection problem when both network and covariates are present, which shows that our method is optimal up to a constant factor. Our method outperforms existing approaches in simulations and on a LastFM app user network, and provides interpretable community structures in a statistics publication citation network where $30\%$ of nodes are isolated.
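The sketch below illustrates the general recipe under an assumed form of the network-adjusted covariates, Y = (A + diag(alpha)) X, followed by spectral clustering (top-k left singular vectors of Y, then k-means). The per-node weights alpha are left as an input because the abstract does not give their formula, and a small Lloyd's k-means is included to keep the block self-contained. The function names are hypothetical and this is not the authors' implementation.

```python
import numpy as np

def kmeans(points, k, n_iter=100):
    """Plain Lloyd's k-means with deterministic farthest-first initialisation."""
    centers = [points[0]]
    for _ in range(1, k):
        sq_dist = np.min([np.sum((points - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(points[np.argmax(sq_dist)])
    centers = np.array(centers)
    for _ in range(n_iter):
        labels = np.argmin(((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2), axis=1)
        new_centers = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels

def nac_spectral_clustering(A, X, alpha, k):
    """Spectral clustering on covariates adjusted as Y = (A + diag(alpha)) X (assumed form)."""
    Y = A @ X + alpha[:, None] * X                        # neighbours' covariates plus own, weighted
    U, _, _ = np.linalg.svd(Y, full_matrices=False)
    return kmeans(U[:, :k], k)
```

For example, `alpha = np.sqrt(A.sum(axis=1) + 1.0)` is one purely hypothetical degree-based choice of weights.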

10. Biclustering random matrix partitions with an application to classification of forensic body fluids

Authors: Chieh-Hsi Wu, Amy D. Roeder, Geoff K. Nicholls

Abstract: Classification of unlabeled data is usually achieved by supervised learning from labeled samples. Although there exist many sophisticated supervised machine learning methods that can predict the missing labels with a high level of accuracy, they often lack the transparency required in situations where it is important to provide interpretable results and meaningful measures of confidence. Body fluid classification of forensic casework data is a case in point. We develop a new Biclustering Dirichlet Process (BDP), with a three-level hierarchy of clustering, and a model-based approach to classification that adapts to block structure in the data matrix. As the class labels of some observations are missing, the number of rows in the data matrix for each class is unknown. The BDP handles this and extends existing biclustering methods by simultaneously biclustering multiple matrices, each having a randomly variable number of rows. We demonstrate our method by applying it to the motivating problem: the classification of body fluids based on mRNA profiles taken from crime scenes. The analyses of casework-like data show that our method is interpretable and produces well-calibrated posterior probabilities. Our model can be applied more generally to other types of data with a structure similar to that of the forensic data.

11. A non-parametric approach to detect patterns in binary sequences

Authors: Anushka De

Abstract: When looking for a pattern in an ordered binary sequence of a player's wins and losses over a period of time, the Runs Test may give results that contradict the intuition suggested by scatter plots of win proportions over time. We design a test suitable for this purpose by computing the gaps between consecutive wins and then applying exact binomial tests and non-parametric tests, namely Kendall's tau and the Siegel-Tukey test for the scale problem, to determine heteroscedastic patterns and the direction of occurrence of wins. Further modifications suggested by Jan Vegelius (1982) are applied to the Siegel-Tukey test to adjust for tied ranks.
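The gap-based idea can be sketched as follows, assuming gaps are measured as index differences between consecutive wins and using SciPy's `kendalltau` to test for a monotone trend in the gap lengths. The exact binomial tests and the tie-corrected Siegel-Tukey test described in the abstract are not included, and the function names are hypothetical.

```python
import numpy as np
from scipy.stats import kendalltau

def win_gaps(outcomes):
    """Distances (in games) between consecutive wins in a 0/1 sequence."""
    wins = np.flatnonzero(np.asarray(outcomes) == 1)
    return np.diff(wins)

def gap_trend_test(outcomes):
    """Kendall's tau between gap index and gap length: a negative tau suggests that
    wins are becoming more frequent over time, a positive tau that they are thinning out."""
    gaps = win_gaps(outcomes)
    tau, p_value = kendalltau(np.arange(len(gaps)), gaps)
    return gaps, tau, p_value
```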

12. Likelihood-free neural Bayes estimators for censored peaks-over-threshold models

Authors: Jordan Richards, Matthew Sainsbury-Dale, Andrew Zammit-Mangion, Raphaël Huser

Abstract: Inference for spatial extremal dependence models can be computationally burdensome in moderate-to-high dimensions due to their reliance on intractable and/or censored likelihoods. Exploiting recent advances in likelihood-free inference with neural Bayes estimators (that is, neural estimators that target Bayes estimators), we develop a novel approach to construct highly efficient estimators for censored peaks-over-threshold models by encoding censoring information in the neural network architecture. Our new method provides a paradigm shift that challenges traditional censored likelihood-based inference for spatial extremes. Our simulation studies highlight significant gains in both computational and statistical efficiency, relative to competing likelihood-based approaches, when applying our novel estimators to inference for popular extremal dependence models, such as max-stable, $r$-Pareto, and random scale mixture processes. We also show that it is possible to train a single estimator for a general censoring level, obviating the need to retrain when the censoring level changes. We illustrate the efficacy of our estimators by performing fast inference on hundreds of thousands of high-dimensional spatial extremal dependence models to assess concentrations of particulate matter 2.5 microns or less in diameter (PM2.5) over the whole of Saudi Arabia.
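The abstract does not say how censoring information is encoded. One simple possibility, assumed here, is to give the network two input channels per site: the observed values with censored (below-threshold) entries replaced by a constant, and a binary censoring indicator. The sketch below builds such an input array with NumPy. The fill value and the function name are hypothetical, and no neural network is defined or trained.

```python
import numpy as np

def encode_censored(fields, threshold, fill_value=0.0):
    """Two-channel network input: values with censored entries replaced, plus a censoring indicator.

    fields: array of shape (n_replicates, n_sites); threshold: scalar or per-site array.
    Returns an array of shape (n_replicates, 2, n_sites).
    """
    fields = np.asarray(fields, dtype=float)
    censored = fields <= threshold                        # below-threshold values are censored
    values = np.where(censored, fill_value, fields)       # hide censored magnitudes
    return np.stack([values, censored.astype(float)], axis=1)
```

Because the threshold enters only through the indicator channel, the same network input format can be reused for different censoring levels, which is in the spirit of training a single estimator across censoring levels.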