Methodology (stat.ME)
Tue, 13 Jun 2023
1. A gamma tail statistic and its asymptotics
Authors: Toshiya Iwashita, Bernhard Klar
Abstract: Asmussen and Lehtomaa [Distinguishing log-concavity from heavy tails. Risks 5(10), 2017] introduced an interesting function $g$ which is able to distinguish between log-convex and log-concave tail behaviour of distributions, and proposed a randomized estimator for $g$. In this paper, we show that $g$ can also be seen as a tool to detect gamma distributions or distributions with gamma tail. We construct a more efficient estimator $\hat{g}_n$ based on $U$-statistics, propose several estimators of the (asymptotic) variance of $\hat{g}_n$, and study their performance by simulations. Finally, the methods are applied to several real data sets.
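A rough Python sketch of the contrast between a randomized estimator and a $U$-statistic of order two is given below. Because the abstract does not state the form of $g$, the kernel h is a placeholder; this illustrates the general construction only, not the estimator $\hat{g}_n$ of the paper.

    import numpy as np
    from itertools import combinations

    def u_statistic(x, h):
        """Order-2 U-statistic: average of the kernel h over all unordered pairs."""
        return np.mean([h(a, b) for a, b in combinations(x, 2)])

    def randomized_estimator(x, h, rng):
        """Randomized estimator: average of h over a single random pairing of the sample."""
        x = rng.permutation(x)
        return np.mean([h(a, b) for a, b in zip(x[::2], x[1::2])])

    # placeholder symmetric kernel; the kernel defining the actual statistic is not given in the abstract
    h = lambda a, b: np.log(a + b)
    rng = np.random.default_rng(0)
    x = rng.gamma(shape=2.0, scale=1.0, size=200)
    print(u_statistic(x, h), randomized_estimator(x, h, rng))

The $U$-statistic reuses every pair and therefore has smaller variance than the single random pairing, which is the sense in which such an estimator is more efficient.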
2. On the term "randomization test"
Authors: Jesse Hemerik
Abstract: There exists no consensus on the meaning of the term "randomization test". Contradictory uses of the term lead to confusion, misunderstandings and even invalid data analyses. As we point out, a main source of the confusion is that the term was not explicitly defined when it was first used in the 1930s. Later authors made clear proposals to reach a consensus regarding the term. This resulted in some level of agreement around the 1970s. However, in the last few decades, the term has often been used in ways that contradict these proposals. This paper provides an overview of the history of the term itself, for the first time tracing it back to 1937. This will hopefully lead to more agreement on terminology and less confusion about the related fundamental concepts.
3. An Approach to Nonparametric Inference on the Causal Dose Response Function
Authors: Aaron Hudson, Elvin H. Geng, Thomas A. Odeny, Elizabeth A. Bukusi, Maya L. Petersen, Mark J. van der Laan
Abstract: The causal dose-response curve is commonly selected as the statistical parameter of interest in studies where the goal is to understand the effect of a continuous exposure on an outcome. Most of the available methodology for statistical inference on the dose-response function in the continuous exposure setting requires strong parametric assumptions on the probability distribution. Such parametric assumptions are typically untenable in practice and lead to invalid inference. It is often preferable to instead use nonparametric methods for inference, which only make mild assumptions about the data-generating mechanism. We propose a nonparametric test of the null hypothesis that the dose-response function is equal to a constant function. We argue that when the null hypothesis holds, the dose-response function has zero variance. Thus, one can test the null hypothesis by assessing whether there is sufficient evidence to claim that the variance is positive. We construct a novel estimator for the variance of the dose-response function, for which we can fully characterize the null limiting distribution and thus perform well-calibrated tests of the null hypothesis. We also present an approach for constructing simultaneous confidence bands for the dose-response function by inverting our proposed hypothesis test. We assess the validity of our proposal in a simulation study. In a data example, we study, in a population of patients who have initiated treatment for HIV, how the distance required to travel to an HIV clinic affects retention in care.
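A naive plug-in analogue of this testing idea can be sketched in Python: fit a flexible regression of the outcome on the exposure, take the empirical variance of the fitted values, and calibrate the flatness test by permuting exposures. This is only a caricature under strong simplifying assumptions; it omits confounder adjustment and is not the authors' estimator, whose null limiting distribution is characterized analytically.

    import numpy as np
    from sklearn.ensemble import GradientBoostingRegressor  # any flexible regressor would do

    def dose_response_variance(a, y):
        """Plug-in estimate of Var_A[ E(Y | A) ]: variance of the fitted regression values."""
        fit = GradientBoostingRegressor().fit(a.reshape(-1, 1), y)
        return np.var(fit.predict(a.reshape(-1, 1)))

    def flatness_pvalue(a, y, n_perm=200, seed=None):
        """Permutation calibration: permuting exposures enforces a flat dose-response."""
        rng = np.random.default_rng(seed)
        obs = dose_response_variance(a, y)
        null = [dose_response_variance(rng.permutation(a), y) for _ in range(n_perm)]
        return (1 + sum(v >= obs for v in null)) / (n_perm + 1)

    rng = np.random.default_rng(1)
    a = rng.uniform(0, 1, 300)
    y = 0.5 * a + rng.normal(scale=1.0, size=300)  # non-flat dose-response
    print(flatness_pvalue(a, y, seed=1))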
4. Local inference for functional data on manifold domains using permutation tests
Authors: Niels Lundtorp Olsen, Alessia Pini, Simone Vantini
Abstract: Pini and Vantini (2017) introduced the interval-wise testing procedure which performs local inference for functional data defined on an interval domain, where the output is an adjusted p-value function that controls for type I errors. We extend this idea to a general setting where the domain is a Riemannian manifold. This requires new methodology such as how to define adjustment sets on product manifolds and how to approximate the test statistic when the domain has non-zero curvature. We propose to use permutation tests for inference and apply the procedure in three settings: a simulation on a "chameleon-shaped" manifold and two applications related to climate change where the manifolds are a complex subset of $S^2$ and $S^2 \times S^1$, respectively. We note the tradeoff between type I and type II errors: increasing the adjustment set reduces the type I error but also results in smaller areas of significance. However, some areas still remain significant even at maximal adjustment.
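For intuition, the sketch below implements a stripped-down interval-wise testing procedure on a one-dimensional grid in a two-sample setting, using label permutations; the adjusted p-value at a grid point is the maximum interval-level p-value over all intervals containing that point. The manifold-specific parts of the paper (adjustment sets on product manifolds, handling of non-zero curvature) are not reproduced here.

    import numpy as np

    def iwt_adjusted_pvalues(x, y, n_perm=200, seed=None):
        """Simplified interval-wise testing for two samples of functional data.

        x, y: arrays of shape (n_subjects, n_grid). Returns one adjusted
        p-value per grid point."""
        rng = np.random.default_rng(seed)
        data = np.vstack([x, y])
        labels = np.array([0] * len(x) + [1] * len(y))
        n_grid = data.shape[1]

        def pointwise_stat(lab):
            return (data[lab == 0].mean(0) - data[lab == 1].mean(0)) ** 2

        obs = pointwise_stat(labels)
        perm = np.array([pointwise_stat(rng.permutation(labels)) for _ in range(n_perm)])

        adj = np.zeros(n_grid)
        for i in range(n_grid):                      # loop over all intervals [i, j]
            for j in range(i, n_grid):
                t_obs = obs[i:j + 1].sum()
                t_perm = perm[:, i:j + 1].sum(axis=1)
                p_ij = (1 + np.sum(t_perm >= t_obs)) / (n_perm + 1)
                adj[i:j + 1] = np.maximum(adj[i:j + 1], p_ij)
        return adj

    rng = np.random.default_rng(5)
    x = rng.normal(0, 1, (20, 30))
    y = rng.normal(0, 1, (20, 30))
    y[:, 10:20] += 1.0                               # signal on part of the domain
    print(iwt_adjusted_pvalues(x, y, seed=5) < 0.05)

Enlarging the family of intervals used for adjustment, up to the whole domain, is what trades type I error control against power, the tradeoff noted in the abstract.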
5. Simulation-Based Frequentist Inference with Tractable and Intractable Likelihoods
Authors: Ali Al Kadhim, Harrison B. Prosper, Olivia F. Prosper
Abstract: High-fidelity simulators that connect theoretical models with observations are indispensable tools in many sciences. When coupled with machine learning, a simulator makes it possible to infer the parameters of a theoretical model directly from real and simulated observations without explicit use of the likelihood function. This is of particular interest when the latter is intractable. We introduce a simple modification of the recently proposed likelihood-free frequentist inference (LF2I) approach that has some computational advantages. The utility of our algorithm is illustrated by applying it to three pedagogically interesting examples: the first is from cosmology, the second from high-energy physics and astronomy, both with tractable likelihoods, while the third, with an intractable likelihood, is from epidemiology.
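The skeleton shared by such likelihood-free frequentist procedures is Neyman inversion with simulated critical values: for each candidate parameter value, simulate the null distribution of a test statistic and keep the values that are not rejected. The sketch below uses a toy Gaussian-mean example with a hand-picked statistic; it omits the machine-learned statistic and the critical-value regression that make LF2I-type algorithms practical in higher dimensions.

    import numpy as np

    def neyman_inversion_set(x_obs, simulate, statistic, theta_grid, alpha=0.05,
                             n_sim=500, seed=None):
        """Confidence set by Neyman inversion with Monte Carlo critical values.

        simulate(theta, rng) -> one synthetic data set generated under theta
        statistic(x, theta)  -> scalar; larger values mean more evidence against theta"""
        rng = np.random.default_rng(seed)
        accepted = []
        for theta in theta_grid:
            sims = np.array([statistic(simulate(theta, rng), theta) for _ in range(n_sim)])
            if statistic(x_obs, theta) <= np.quantile(sims, 1 - alpha):
                accepted.append(theta)
        return np.array(accepted)

    # toy example: inference on a Gaussian mean with unit variance
    simulate = lambda theta, rng: rng.normal(theta, 1.0, size=20)
    statistic = lambda x, theta: abs(x.mean() - theta)
    rng = np.random.default_rng(2)
    x_obs = rng.normal(0.3, 1.0, size=20)
    print(neyman_inversion_set(x_obs, simulate, statistic, np.linspace(-1, 1, 81), seed=2))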
6. Regionalization approaches for the spatial analysis of extremal dependence
Authors: Justus Contzen, Thorsten Dickhaus, Gerrit Lohmann
Abstract: The impact of an extreme climate event depends strongly on its geographical scale. Max-stable processes can be used for the statistical investigation of climate extremes and their spatial dependencies on a continuous area. Most existing parametric models of max-stable processes assume spatial stationarity and are therefore not suitable for the application to data that cover a large and heterogeneous area. For this reason, it has recently been proposed to use a clustering algorithm to divide the area of investigation into smaller regions and to fit parametric max-stable processes to the data within those regions. We investigate this clustering algorithm further and point out that there are cases in which it results in regions on which spatial stationarity is not a reasonable assumption. We propose an alternative clustering algorithm and demonstrate in a simulation study that it can lead to improved results.
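As a generic illustration of regionalization by extremal dependence, and not necessarily the clustering algorithm discussed or proposed in the paper, one can estimate pairwise extremal coefficients from block maxima via the F-madogram and pass the resulting dissimilarities to an off-the-shelf hierarchical clustering, as in the Python sketch below.

    import numpy as np
    from scipy.cluster.hierarchy import linkage, fcluster
    from scipy.spatial.distance import squareform
    from scipy.stats import rankdata

    def extremal_coefficients(block_maxima):
        """Pairwise extremal coefficients via the F-madogram (Cooley et al., 2006).

        block_maxima: array (n_blocks, n_sites), e.g. annual maxima per site."""
        n_blocks, n_sites = block_maxima.shape
        u = np.apply_along_axis(rankdata, 0, block_maxima) / (n_blocks + 1)  # empirical PIT per site
        theta = np.ones((n_sites, n_sites))
        for i in range(n_sites):
            for j in range(i + 1, n_sites):
                nu = 0.5 * np.mean(np.abs(u[:, i] - u[:, j]))                # F-madogram
                theta[i, j] = theta[j, i] = (1 + 2 * nu) / (1 - 2 * nu)
        return np.clip(theta, 1.0, 2.0)   # 1 = complete dependence, 2 = independence

    def cluster_sites(block_maxima, n_regions):
        dist = extremal_coefficients(block_maxima) - 1.0                     # dissimilarity in [0, 1]
        z = linkage(squareform(dist, checks=False), method="average")
        return fcluster(z, t=n_regions, criterion="maxclust")

Whether the resulting regions are homogeneous enough for a stationary max-stable model to be fitted within each of them is exactly the question the abstract raises.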
7. Stochastic differential equation for modelling health related quality of life
Authors: Ralph Brinks
Abstract: In this work we propose a stochastic differential equation (SDE) for modelling health-related quality of life (HRQoL) over a lifespan. HRQoL is assumed to be bounded between 0 and 1, corresponding to death and perfect health, respectively. The drift and diffusion parameters of the SDE are chosen to mimic the decline of HRQoL over the life course while remaining epidemiologically meaningful. The Euler-Maruyama method is used to simulate trajectories of individuals in a population of n = 1000 people. The age at death of an individual is simulated as a stopping time with a Weibull distribution, conditional on the current value of HRQoL as a time-varying covariate. The resulting life expectancy and health-adjusted life years are compared to the corresponding values for German women.
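The simulation step can be sketched in a few lines of Python with the Euler-Maruyama scheme. The drift, the diffusion and the Weibull-type mortality hazard below are illustrative choices that merely respect the stated constraints (values in [0, 1], decline with age, higher mortality at low HRQoL); they are not the specification used in the paper.

    import numpy as np

    def simulate_cohort(n=1000, t_max=110.0, dt=0.1, seed=None):
        """Euler-Maruyama simulation of HRQoL trajectories with a HRQoL-dependent
        Weibull-type mortality hazard; returns the simulated ages at death."""
        rng = np.random.default_rng(seed)
        q = np.full(n, 0.95)                        # start near perfect health
        alive = np.ones(n, dtype=bool)
        age_at_death = np.full(n, np.nan)
        k, lam = 8.0, 90.0                          # Weibull shape and scale (years)
        for step in range(1, int(t_max / dt) + 1):
            t = step * dt
            drift = -0.002 * q * (1 - q) * t / 50.0      # slow decline, vanishing at the bounds
            diff = 0.02 * np.sqrt(q * (1 - q))           # diffusion vanishing at 0 and 1
            q = np.clip(q + drift * dt + diff * rng.normal(scale=np.sqrt(dt), size=n), 0.0, 1.0)
            hazard = (k / lam) * (t / lam) ** (k - 1) * (2.0 - q)   # inflated at low HRQoL
            died = alive & (rng.random(n) < 1.0 - np.exp(-hazard * dt))
            age_at_death[died] = t
            alive &= ~died
            q[~alive] = 0.0                              # death corresponds to HRQoL 0
        return age_at_death

    ages = simulate_cohort(seed=3)
    print(np.nanmean(ages))   # crude life expectancy of the simulated cohort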
8. Topological Data Analysis for Directed Dependence Networks of Multivariate Time Series Data
Authors: Anass B. El-Yaagoubi, Hernando Ombao
Abstract: Topological data analysis (TDA) approaches are becoming increasingly popular for studying dependence patterns in multivariate time series data. In particular, various dependence patterns in brain networks may be linked to specific tasks and cognitive processes, which can be altered by neurological impairments such as epileptic seizures. Existing TDA approaches build graph filtrations on a notion of distance between data points that is symmetric by definition. For brain dependence networks, this is a major limitation that constrains practitioners to symmetric dependence measures, such as correlation or coherence. However, the brain dependence network can be very complex and may contain a directed flow of information from one brain region to another. Such dependence networks are usually captured by more advanced measures of dependence, such as partial directed coherence, a Granger-causality-based dependence measure. These dependence measures result in a non-symmetric distance function, especially during epileptic seizures. In this paper we address this limitation by decomposing the weighted connectivity network into its symmetric and anti-symmetric components via a matrix decomposition and comparing the anti-symmetric component before and after the seizure. Our analysis of epileptic seizure EEG data shows promising results.
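The decomposition itself is elementary: any weighted, directed connectivity matrix $A$ splits uniquely into a symmetric part $(A + A^\top)/2$ and an anti-symmetric part $(A - A^\top)/2$. A minimal numpy sketch is given below; the Frobenius-norm summary and the toy pre/post matrices are hypothetical stand-ins for the actual seizure comparison carried out in the paper.

    import numpy as np

    def split_connectivity(conn):
        """Split a directed connectivity matrix into symmetric and anti-symmetric parts."""
        sym = 0.5 * (conn + conn.T)
        anti = 0.5 * (conn - conn.T)          # conn == sym + anti by construction
        return sym, anti

    def directed_flow_strength(conn):
        """Summarise the directed (anti-symmetric) component by its Frobenius norm."""
        _, anti = split_connectivity(conn)
        return np.linalg.norm(anti, "fro")

    # toy pre/post-seizure matrices standing in for, e.g., partial directed coherence estimates
    rng = np.random.default_rng(4)
    pre = rng.uniform(0.0, 1.0, (10, 10))
    post = pre + rng.normal(scale=0.3, size=(10, 10))
    print(directed_flow_strength(pre), directed_flow_strength(post))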