arXiv daily

Methodology (stat.ME)

Mon, 07 Aug 2023

1. Nonparametric Bayes multiresolution testing for high-dimensional rare events

Authors: Jyotishka Datta, Sayantan Banerjee, David B. Dunson

Abstract: In a variety of application areas, there is interest in assessing evidence of differences in the intensity of event realizations between groups. For example, in cancer genomic studies collecting data on rare variants, the focus is on assessing whether and how the variant profile changes with the disease subtype. Motivated by this application, we develop multiresolution nonparametric Bayes tests for differential mutation rates across groups. The multiresolution approach yields fast and accurate detection of spatial clusters of rare variants, and our nonparametric Bayes framework provides great flexibility for modeling the intensities of rare variants. Some theoretical properties are also assessed, including weak consistency of our Dirichlet Process-Poisson-Gamma mixture over multiple resolutions. Simulation studies illustrate excellent small sample properties relative to competitors, and we apply the method to detect rare variants related to common variable immunodeficiency from whole exome sequencing data on 215 patients and over 60,027 control subjects.

2. Not Linearly Correlated, But Dependent: A Family of Normal Mode Copulas

Authors: Kentaro Fukumoto

Abstract: When scholars study joint distributions of multiple variables, copulas are useful. However, if the variables are not linearly correlated with each other yet are still not independent, most conventional copulas are not up to the task. Examples include (inverted) U-shaped relationships and heteroskedasticity. To fill this gap, this manuscript sheds new light on a little-known copula, which I call the "normal mode copula." I characterize the copula's properties and show that the copula is asymmetric and nonmonotonic under certain conditions. I also apply the copula to a dataset about U.S. House vote share and campaign expenditure to demonstrate that the normal mode copula has better performance than other conventional copulas.
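The situation this abstract describes, variables that are dependent yet not linearly correlated, can be illustrated with a toy U-shaped relationship. The sketch below is not the normal mode copula and is not taken from the paper; the symmetric grid, the choice y = x^2, and the helper name `pearson` are assumptions made purely for illustration.

```python
import numpy as np


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length 1-D arrays."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    a_c = a - a.mean()
    b_c = b - b.mean()
    return float((a_c @ b_c) / np.sqrt((a_c @ a_c) * (b_c @ b_c)))


# Hypothetical toy data: a grid symmetric around zero and a U-shaped response.
x = np.linspace(-1.0, 1.0, 201)
y = x ** 2

# Linear correlation vanishes because the positive and negative halves cancel.
linear_corr = pearson(x, y)

# The dependence reappears once the sign of x is discarded.
folded_corr = pearson(np.abs(x), y)

print(f"corr(x, y)   = {linear_corr:.3e}")
print(f"corr(|x|, y) = {folded_corr:.3f}")
```

Here y is a deterministic function of x, so the two are as dependent as they can be, yet the linear correlation is zero up to floating-point error. A model that expresses dependence only through a linear correlation parameter has nothing to work with in this case.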

3. Individual participant data from digital sources informed and improved precision in the evaluation of predictive biomarkers in Bayesian network meta-analysis

Authors: Chinyereugo M Umemneku-Chikere, Lorna Wheaton, Heather Poad, Devleena Ray, Ilse Cuevas Andrade, Sam Khan, Paul Tappenden, Keith R Abrams, Rhiannon K Owen, Sylwia Bujkiewicz

Abstract: Objective: We aimed to develop a meta-analytic model for evaluation of predictive biomarkers and targeted therapies, utilising data from digital sources when individual participant data (IPD) from randomised controlled trials (RCTs) are unavailable. Methods: A Bayesian network meta-regression model, combining aggregate data (AD) from RCTs and IPD, was developed for modelling time-to-event data to evaluate predictive biomarkers. IPD were sourced from electronic health records, using a target trial emulation approach, or from digitised Kaplan-Meier curves. The model is illustrated using two examples: breast cancer with a hormone receptor biomarker, and metastatic colorectal cancer with the Kirsten Rat Sarcoma (KRAS) biomarker. Results: The model developed allowed for estimation of treatment effects in two subgroups of patients defined by their biomarker status. Effectiveness of taxane did not differ in hormone receptor positive and negative breast cancer patients. Epidermal growth factor receptor (EGFR) inhibitors were more effective than chemotherapy in KRAS wild type colorectal cancer patients but not in patients with KRAS mutant status. Use of IPD reduced uncertainty of the sub-group specific treatment effect estimates by up to 49%. Conclusion: Utilisation of IPD allowed for more detailed evaluation of predictive biomarkers and cancer therapies and improved precision of the estimates compared to use of AD alone.

4. Measuring income inequality via percentile relativities

Authors: Vytaras Brazauskas, Francesca Greselin, Ricardas Zitikis

Abstract: "The rich are getting richer" implies that the population income distributions are getting more right skewed and heavily tailed. For such distributions, the mean is not the best measure of the center, but the classical indices of income inequality, including the celebrated Gini index, are all mean-based. In view of this, Professor Gastwirth sounded an alarm back in 2014 by suggesting that the median be incorporated into the definition of the Gini index, although he noted a few shortcomings of his proposed index. In the present paper we make a further step in the modification of classical indices and, to acknowledge the possibility of differing viewpoints, arrive at three median-based indices of inequality. They avoid the shortcomings of the previous indices and can be used even when populations are ultra heavily tailed, that is, when their first moments are infinite. The new indices are illustrated both analytically and numerically using parametric families of income distributions, and further illustrated using capital incomes coming from 2001 and 2018 surveys of fifteen European countries. We also discuss the performance of the indices from the perspective of income transfers.
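To make the mean-versus-median point concrete, the sketch below computes the classical Gini index (Gini mean difference divided by twice the mean) next to a variant that divides by twice the median instead. The median-normalised variant is an assumption about the kind of modification the abstract alludes to; it is not one of the three indices proposed in the paper, and the function names and the toy income vector are hypothetical.

```python
import numpy as np


def gini_mean_difference(incomes):
    """Average of |x_i - x_j| over all n^2 ordered pairs (i, j)."""
    x = np.sort(np.asarray(incomes, dtype=float))
    n = x.size
    ranks = np.arange(1, n + 1)
    # For sorted data, sum_{i,j} |x_i - x_j| = 2 * sum_i (2i - n - 1) * x_(i).
    return float(2.0 * np.sum((2 * ranks - n - 1) * x) / n ** 2)


def gini_mean_based(incomes):
    """Classical Gini index: Gini mean difference over twice the mean."""
    return gini_mean_difference(incomes) / (2.0 * float(np.mean(incomes)))


def gini_median_based(incomes):
    """Illustrative variant (assumption): normalise by the median instead."""
    return gini_mean_difference(incomes) / (2.0 * float(np.median(incomes)))


# Hypothetical toy incomes with one very large value in the right tail.
incomes = [1.0, 2.0, 3.0, 4.0, 100.0]

mean_based = gini_mean_based(incomes)
median_based = gini_median_based(incomes)

print(f"mean-based index:   {mean_based:.4f}")
print(f"median-based index: {median_based:.4f}")
```

The single large income pulls the mean far above the median, so the two normalisations give very different values. Note also that a median-normalised index is not confined to the unit interval, which hints at why such modifications need care.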

5. Spatial wildfire risk modeling using mixtures of tree-based multivariate Pareto distributions

Authors: Daniela Cisneros, Arnab Hazra, Raphaël Huser

Abstract: Wildfires pose a severe threat to the ecosystem and economy, and risk assessment is typically based on fire danger indices such as the McArthur Forest Fire Danger Index (FFDI) used in Australia. Studying the joint tail dependence structure of high-resolution spatial FFDI data is thus crucial for estimating current and future extreme wildfire risk. However, existing likelihood-based inference approaches are computationally prohibitive in high dimensions due to the need to censor observations in the bulk of the distribution. To address this, we construct models for spatial FFDI extremes by leveraging the sparse conditional independence structure of Hüsler-Reiss-type generalized Pareto processes defined on trees. These models allow for a simplified likelihood function that is computationally efficient. Our framework involves a mixture of tree-based multivariate Pareto distributions with randomly generated tree structures, resulting in a flexible model that can capture nonstationary spatial dependence structures. We fit the model to summer FFDI data from different spatial clusters in Mainland Australia and 14 decadal windows between 1999-2022 to study local spatiotemporal variability with respect to the magnitude and extent of extreme wildfires. Our results demonstrate that our proposed method fits the margins and spatial tail dependence structure adequately, and is helpful to provide extreme wildfire risk measures.

6. Regulation-incorporated Gene Expression Network-based Heterogeneity Analysis

Authors: Rong Li, Qingzhao Zhang, Shuangge Ma

Abstract: Gene expression-based heterogeneity analysis has been extensively conducted. In recent studies, it has been shown that network-based analysis, which takes a system perspective and accommodates the interconnections among genes, can be more informative than that based on simpler statistics. Gene expressions are highly regulated. Incorporating regulations in analysis can better delineate the "sources" of gene expression effects. Although conditional network analysis can somewhat serve this purpose, it does not pay enough attention to the regulation relationships. In this article, significantly advancing from the existing heterogeneity analyses based only on gene expression networks, conditional gene expression network analyses, and regression-based heterogeneity analyses, we propose heterogeneity analysis based on gene expression networks (after accounting for or "removing" regulation effects) as well as regulations of gene expressions. A high-dimensional penalized fusion approach is proposed, which can determine the number of sample groups and parameter values in a single step. An effective computational algorithm is proposed. It is rigorously proved that the proposed approach enjoys the estimation, selection, and grouping consistency properties. Extensive simulations demonstrate its practical superiority over closely related alternatives. In the analysis of two breast cancer datasets, the proposed approach identifies heterogeneity and gene network structures different from the alternatives and with sound biological implications.