Methodology (stat.ME)
Mon, 28 Aug 2023
1. Categorical data analysis using discretization of continuous variables to investigate associations in marine ecosystems
Authors: H. Solvang, S. Imori, M. Biuw, U. Lindstrøm, T. Haug
Abstract: Understanding and predicting interactions between predators and prey and their environment are fundamental for understanding food web structure, dynamics, and ecosystem function in both terrestrial and marine ecosystems. Thus, estimating the conditional associations between species and their environments is important for exploring connections or cooperative links in the ecosystem, which in turn can help to clarify such causal relationships. For this purpose, a relevant and practical statistical method is required to link presence/absence observations with biomass, abundance, and physical quantities obtained as continuous real values. These data are sometimes sparse in oceanic space and too short as time series. To meet this challenge, we provide an approach based on applying categorical data analysis to presence/absence observations and real-number data. This approach consists of a two-step procedure for categorical data analysis: 1) finding the appropriate threshold to discretize the real-number data for applying a test of independence; and 2) identifying the best conditional probability model to investigate the possible associations among the data based on a statistical information criterion. We conduct a simulation study to validate our proposed approach. Furthermore, the approach is applied to two datasets: 1) one collected during an international synoptic krill survey in the Scotia Sea west of the Antarctic Peninsula to investigate associations among krill, fin whale (Balaenoptera physalus), surface temperature, depth, slope in depth, and temperature gradient; and 2) the other collected by ecosystem surveys conducted during August-September in 2014-2017 to investigate associations among common minke whales, the predatory fish Atlantic cod, and their main prey groups in Arctic Ocean waters to the west and north of Svalbard, Norway.
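Step 1 of the two-step procedure can be sketched in a few lines. The abstract does not state how the threshold is chosen, so this sketch assumes one simple rule: try each observed value as a cut point and keep the one that maximizes the Pearson chi-square statistic of the resulting 2x2 table against the presence/absence variable. The function names (`chi_square_2x2`, `contingency`, `best_threshold`) and the selection rule are illustrative assumptions, not the authors' implementation.

```python
def chi_square_2x2(table):
    """Pearson chi-square statistic for a 2x2 table [[a, b], [c, d]]."""
    (a, b), (c, d) = table
    n = a + b + c + d
    rows = (a + b, c + d)
    cols = (a + c, b + d)
    # A table with an empty row or column carries no association information.
    if 0 in rows or 0 in cols:
        return 0.0
    return n * (a * d - b * c) ** 2 / (rows[0] * rows[1] * cols[0] * cols[1])


def contingency(presence, values, threshold):
    """Cross-tabulate presence (0/1) against the indicator value > threshold."""
    table = [[0, 0], [0, 0]]
    for p, v in zip(presence, values):
        table[int(bool(p))][int(v > threshold)] += 1
    return table


def best_threshold(presence, values):
    """Assumed rule: pick the cut point with the largest chi-square statistic."""
    # Drop the maximum so that the "above threshold" bin is never empty.
    candidates = sorted(set(values))[:-1]
    if not candidates:
        raise ValueError("need at least two distinct values to discretize")
    return max(
        candidates,
        key=lambda t: chi_square_2x2(contingency(presence, values, t)),
    )


# Hypothetical toy data: the species is present only at the larger values.
presence = [0, 0, 0, 1, 1, 1]
values = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
threshold = best_threshold(presence, values)
```

Because the cut point is selected to maximize the statistic, a p-value read from the usual chi-square reference distribution would be optimistic; the sketch only illustrates the discretization idea.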
2. Temporal clustering of extreme events in sequences of dependent observations separated by heavy-tailed waiting times
Authors: Christina Meschede, Katharina Hees, Roland Fried
Abstract: The occurrence of extreme events like heavy precipitation or storms at a certain location often shows a clustering behaviour and is thus not described well by a Poisson process. We construct a general model for the inter-exceedance times between such events which combines different candidate models for such behaviour. This allows us to distinguish data generating mechanisms leading to clusters of dependent events with exponential inter-exceedance times in between clusters from independent events with heavy-tailed inter-exceedance times, and even allows us to combine these two mechanisms for better descriptions of such occurrences. We investigate a modification of the Cramér-von Mises distance for the purpose of model fitting. An application to mid-latitude winter cyclones illustrates the usefulness of our work.
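For orientation, minimum-distance fitting with the standard Cramér-von Mises statistic can be sketched as below. This is the textbook statistic, not the modification the authors investigate, and the exponential waiting-time model, the grid search, and the names `exponential_cdf`, `cvm_statistic`, and `fit_rate` are assumptions made for illustration only.

```python
import math


def exponential_cdf(rate):
    """Return the CDF of an exponential distribution with the given rate."""
    def cdf(x):
        return 1.0 - math.exp(-rate * x) if x > 0 else 0.0
    return cdf


def cvm_statistic(sample, cdf):
    """Standard Cramer-von Mises statistic between a sample and a model CDF:
    W^2 = 1/(12 n) + sum_i (F(x_(i)) - (2 i - 1) / (2 n))^2
    """
    xs = sorted(sample)
    n = len(xs)
    total = 1.0 / (12 * n)
    for i, x in enumerate(xs, start=1):
        total += (cdf(x) - (2 * i - 1) / (2 * n)) ** 2
    return total


def fit_rate(sample, candidate_rates):
    """Pick the candidate rate whose exponential CDF is closest to the sample."""
    return min(
        candidate_rates,
        key=lambda r: cvm_statistic(sample, exponential_cdf(r)),
    )


# Hypothetical inter-exceedance times placed at the 25% and 75% quantiles
# of a unit-rate exponential distribution.
sample = [-math.log(0.75), -math.log(0.25)]
rate = fit_rate(sample, [0.1, 1.0, 10.0])
```

A heavy-tailed candidate model would be compared in the same way by passing its CDF to `cvm_statistic`.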
3. A generalized Bayesian stochastic block model for microbiome community detection
Authors: Kevin C. Lutz, Michael L. Neugent, Tejasv Bedi, Nicole J. De Nisco, Qiwei Li
Abstract: Advances in next-generation sequencing technology have enabled the high-throughput profiling of metagenomes and accelerated microbiome research. Recently, there has been a rise in quantitative studies that aim to decipher the microbiome co-occurrence network and its underlying community structure based on metagenomic sequence data. Uncovering the complex microbiome community structure is essential to understanding the role of the microbiome in disease progression and susceptibility. Taxonomic abundance data generated from metagenomic sequencing technologies are high-dimensional and compositional, suffering from uneven sampling depth, over-dispersion, and zero-inflation. These characteristics often challenge the reliability of the current methods for microbiome community detection. To this end, we propose a Bayesian stochastic block model to study the microbiome co-occurrence network based on the recently developed modified centered log-ratio transformation tailored for microbiome data analysis. Our model allows us to incorporate taxonomic tree information using a Markov random field prior. The model parameters are jointly inferred by using Markov chain Monte Carlo sampling techniques. Our simulation study showed that the proposed approach performs better than competing methods even when taxonomic tree information is non-informative. We applied our approach to a real urinary microbiome dataset from postmenopausal women, the first time the urinary microbiome co-occurrence network structure has been studied. In summary, this statistical methodology provides a new tool for facilitating advanced microbiome studies.
4. Exploring the likelihood surface in multivariate Gaussian mixtures using Hamiltonian Monte Carlo
Authors: Francesca Azzolini, Hans Skaug
Abstract: Multimodality of the likelihood in Gaussian mixtures is a well-known problem. The choice of the initial parameter vector for the numerical optimizer may affect whether the optimizer finds the global maximum, or gets trapped in a local maximum of the likelihood. We propose to use Hamiltonian Monte Carlo (HMC) to explore the part of the parameter space which has a high likelihood. Each sampled parameter vector is used as the initial value for a quasi-Newton optimizer, and the resulting sample of (maximum) likelihood values is used to determine if the likelihood is multimodal. We use a single simulated data set from a three-component bivariate mixture to develop and test the method. We use state-of-the-art HMC software, but experience difficulties when trying to directly apply HMC to the full model with 15 parameters. To improve the mixing of the Markov chain we explore various tricks, and conclude that for the dataset at hand we have found the global maximum likelihood estimate.
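The multi-start idea behind this abstract can be sketched on a much smaller problem. The sketch below makes several substitutions, all of them assumptions for illustration: a one-dimensional, two-component mixture with a fixed standard deviation replaces the 15-parameter bivariate model, a hand-picked list of starting points replaces the HMC sample, and EM replaces the quasi-Newton optimizer as the local optimizer. Only the final step, comparing the maximized likelihood values across starts, mirrors what the abstract describes.

```python
import math

SIGMA = 1.0  # component standard deviation, held fixed to keep the sketch short


def log_likelihood(data, w, mu1, mu2):
    """Log-likelihood of the mixture w*N(mu1, SIGMA^2) + (1-w)*N(mu2, SIGMA^2)."""
    norm = SIGMA * math.sqrt(2 * math.pi)
    total = 0.0
    for x in data:
        p1 = w * math.exp(-0.5 * ((x - mu1) / SIGMA) ** 2) / norm
        p2 = (1 - w) * math.exp(-0.5 * ((x - mu2) / SIGMA) ** 2) / norm
        total += math.log(p1 + p2)
    return total


def em_fit(data, w, mu1, mu2, iters=200):
    """Local optimizer (EM stand-in for quasi-Newton): update w, mu1, mu2."""
    n = len(data)
    for _ in range(iters):
        # E-step: posterior probability that each point came from component 1.
        resp = []
        for x in data:
            p1 = w * math.exp(-0.5 * ((x - mu1) / SIGMA) ** 2)
            p2 = (1 - w) * math.exp(-0.5 * ((x - mu2) / SIGMA) ** 2)
            resp.append(p1 / (p1 + p2))
        # M-step: weighted updates of the mixing weight and the two means.
        s = sum(resp)
        w = s / n
        mu1 = sum(r * x for r, x in zip(resp, data)) / s
        mu2 = sum((1 - r) * x for r, x in zip(resp, data)) / (n - s)
    return w, mu1, mu2


def multistart(data, starts):
    """Run the local optimizer from each start; return (loglik, fit) pairs."""
    results = []
    for start in starts:
        fit = em_fit(data, *start)
        results.append((log_likelihood(data, *fit), fit))
    return results


def count_modes(results, tol=1e-6):
    """Count distinct maximized log-likelihood values, up to a tolerance."""
    values = sorted(ll for ll, _ in results)
    modes = 1
    for prev, cur in zip(values, values[1:]):
        if cur - prev > tol:
            modes += 1
    return modes


# Hypothetical toy data and starting points (w, mu1, mu2).
data = [-2.1, -1.9, -2.0, 2.0, 1.9, 2.1]
starts = [(0.5, -1.0, 1.0), (0.5, 1.0, -1.0), (0.3, -3.0, 3.0)]
results = multistart(data, starts)
n_modes = count_modes(results)
```

If `count_modes` returns more than one, the starts reached local maxima with different likelihood values. Fits that differ only by swapping the component labels share the same likelihood and are counted once.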