Methodology (stat.ME)
Thu, 15 Jun 2023
1. Spatial modeling of extremes and an angular component
Authors: Gaspard Tamagny, Mathieu Ribatet
Abstract: Many environmental processes, such as rainfall, wind, or snowfall, are inherently spatial, and the modeling of extremes has to take that feature into account. In addition, environmental extremes are often associated with an angle, e.g., wind gusts and direction, or extreme snowfall and time of occurrence. This article proposes a Bayesian hierarchical model with a conditional independence assumption that aims at modeling spatial extremes and angles simultaneously. The proposed model relies on extreme value theory as well as a recent development for handling directional statistics over a continuous domain. The model is motivated by first sketching the necessary elements of extreme value theory and directional statistics. Working within a Bayesian setting, a Gibbs sampler is introduced, and its performance is analyzed through a simulation study. The paper ends with an application to extreme snowfall in the French Alps. Results show that the most severe events tend to occur later in the snowfall season in high-elevation regions than at lower altitudes.
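For intuition, here is a minimal Metropolis-within-Gibbs sketch in Python for a toy version of the setting: Gumbel-distributed magnitudes and von Mises-distributed angles, conditionally independent given their parameters. This is an illustrative sketch only, not the paper's hierarchical spatial model, and all names and values in it are hypothetical.

    # Toy joint model of extreme magnitudes (Gumbel) and angles (von Mises);
    # conditional independence makes the two log-likelihoods additive.
    import numpy as np
    from scipy.stats import gumbel_r, vonmises

    rng = np.random.default_rng(0)
    y = gumbel_r.rvs(loc=10.0, scale=2.0, size=200, random_state=rng)     # magnitudes
    theta = vonmises.rvs(kappa=3.0, loc=1.0, size=200, random_state=rng)  # angles

    def log_post(mu, log_sigma, nu, log_kappa):
        # Flat priors; scale parameters are sampled on the log scale.
        lp = gumbel_r.logpdf(y, loc=mu, scale=np.exp(log_sigma)).sum()
        return lp + vonmises.logpdf(theta, kappa=np.exp(log_kappa), loc=nu).sum()

    state = np.array([y.mean(), 0.0, 0.0, 0.0])    # mu, log_sigma, nu, log_kappa
    chain = []
    for it in range(5000):
        for j in range(4):                          # update one coordinate at a time
            prop = state.copy()
            prop[j] += 0.1 * rng.standard_normal()
            if np.log(rng.uniform()) < log_post(*prop) - log_post(*state):
                state = prop
        chain.append(state.copy())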
2. Bootstrap aggregation and confidence measures to improve time series causal discovery
Authors: Kevin Debeire (DLR, Institut für Physik der Atmosphäre, Oberpfaffenhofen, Germany; DLR, Institut für Datenwissenschaften, Jena, Germany), Jakob Runge (DLR, Institut für Datenwissenschaften, Jena, Germany; Technische Universität Berlin, Faculty of Computer Science, Berlin, Germany), Andreas Gerhardus (DLR, Institut für Datenwissenschaften, Jena, Germany), Veronika Eyring (DLR, Institut für Physik der Atmosphäre, Oberpfaffenhofen, Germany; University of Bremen, Institute of Environmental Physics, Bremen, Germany)
Abstract: Causal discovery methods have demonstrated the ability to identify the time series graphs representing the causal temporal dependency structure of dynamical systems. However, they do not include a measure of the confidence of the estimated links. Here, we introduce a novel bootstrap aggregation (bagging) and confidence measure method that is combined with time series causal discovery. This new method measures confidence for the links of the time series graphs calculated by causal discovery methods. It works by bootstrapping the original time series data set while preserving temporal dependencies. In addition to providing confidence measures, aggregating the bootstrapped graphs by majority voting yields a final aggregated output graph. In this work, we combine our approach with the state-of-the-art conditional-independence-based algorithm PCMCI+. Extensive numerical experiments empirically demonstrate that, in addition to providing confidence measures for links, Bagged-PCMCI+ improves the precision and recall of its base algorithm PCMCI+. Specifically, Bagged-PCMCI+ has a higher detection power for adjacencies and a higher precision in orienting contemporaneous edges, while at the same time showing a lower rate of false positives. These performance improvements are especially pronounced in the more challenging settings (short time series sample size, large number of variables, high autocorrelation). Our bootstrap approach can also be combined with other time series causal discovery algorithms and can be of considerable use in many real-world applications, especially when confidence measures for the links are desired.
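The core bagging idea can be sketched generically: resample the series with a moving-block bootstrap (one standard way to preserve temporal dependence; the paper's exact resampling scheme may differ), run a causal discovery routine on each replicate, and keep links by majority vote. In the sketch below, discover_graph is a hypothetical stand-in for an algorithm such as PCMCI+, assumed to return a boolean adjacency array.

    import numpy as np

    def block_bootstrap(data, block_len, rng):
        # Resample whole blocks of rows so short-range temporal
        # dependence within each block is preserved.
        T = data.shape[0]
        n_blocks = int(np.ceil(T / block_len))
        starts = rng.integers(0, T - block_len + 1, size=n_blocks)
        return np.concatenate([data[s:s + block_len] for s in starts])[:T]

    def bagged_discovery(data, discover_graph, n_boot=100, block_len=20, seed=0):
        rng = np.random.default_rng(seed)
        votes = 0
        for _ in range(n_boot):
            votes = votes + discover_graph(block_bootstrap(data, block_len, rng))
        freq = votes / n_boot          # link frequency = confidence measure
        return freq > 0.5, freq        # majority-vote graph and confidences

Note that the paper aggregates full time series graphs, including the orientation of contemporaneous edges; the sketch votes on adjacencies only, but the aggregation logic is the same.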
3. Ranking and Selection in Large-Scale Inference of Heteroscedastic Units
Authors: Bowen Gang, Luella Fu, Gareth James, Wenguang Sun
Abstract: The allocation of limited resources to a large number of potential candidates presents a pervasive challenge. In the context of ranking and selecting top candidates from heteroscedastic units, conventional methods often result in the over-representation of certain subpopulations, and this issue is further exacerbated in large-scale settings where thousands of candidates are considered simultaneously. To address this challenge, we propose a new multiple-comparison framework that incorporates a modified power notion to prioritize the selection of important effects and employs a novel ranking metric to assess the relative importance of units. We develop both oracle and data-driven algorithms and demonstrate their effectiveness in controlling the error rates and achieving optimality. We evaluate the numerical performance of the proposed method on simulated and real data. The results show that our framework enables a more balanced selection of effects that are both statistically significant and practically important, and yields an objective and relevant ranking scheme well suited to practical scenarios.
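The over-representation problem the abstract describes is easy to reproduce: under heteroscedastic noise, ranking units by their raw estimates over-selects the noisiest ones. The toy below contrasts raw ranking with simple shrinkage toward the prior mean; it illustrates the issue only and is not the ranking metric proposed in the paper.

    import numpy as np

    rng = np.random.default_rng(1)
    n, k = 5000, 100
    true = rng.normal(0.0, 1.0, n)              # true effects
    sd = rng.choice([0.5, 3.0], size=n)         # two heteroscedastic groups
    obs = true + rng.normal(0.0, sd)            # observed estimates

    top_raw = np.argsort(obs)[-k:]              # rank by raw estimate
    shrunk = obs / (1.0 + sd**2)                # posterior mean under a N(0, 1) prior
    top_shrunk = np.argsort(shrunk)[-k:]

    print("noisy units among raw top-100:   ", np.mean(sd[top_raw] == 3.0))
    print("noisy units among shrunk top-100:", np.mean(sd[top_shrunk] == 3.0))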
4. Estimating the Sampling Distribution of Test-Statistics in Bayesian Clinical Trials
Authors: Shirin Golchi, James Willard
Abstract: Bayesian inference and the use of posterior or posterior predictive probabilities for decision making have become increasingly popular in clinical trials. The current practice in Bayesian clinical trials is, however, a hybrid Bayesian-frequentist one, where the design and decision criteria are assessed with respect to frequentist operating characteristics such as power and the type I error rate. These operating characteristics are commonly obtained via simulation studies. In this article we propose methodology that utilizes large-sample theory of the posterior distribution to define simple parametric models for the sampling distribution of the Bayesian test statistics, i.e., posterior tail probabilities. The parameters of these models are then estimated using a small number of simulation scenarios, thereby refining the models to capture the sampling distribution at small to moderate sample sizes. The proposed approach to assessing operating characteristics and determining sample size can be considered simulation-assisted rather than simulation-based, and it significantly reduces the computational burden of designing Bayesian trials.
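As a point of reference, the test statistic in question can be written down directly for a single-arm binomial trial with a conjugate Beta(1, 1) prior; the brute-force simulation below is the expensive baseline that the proposed simulation-assisted approach is meant to replace with a fitted parametric model. The design values here (n = 60, decision threshold 0.975) are arbitrary choices for illustration.

    import numpy as np
    from scipy.stats import beta, binom

    def bayes_stat(successes, n, p0=0.3, a=1.0, b=1.0):
        # Bayesian test statistic: posterior tail probability Pr(p > p0 | data).
        return beta.sf(p0, a + successes, b + n - successes)

    rng = np.random.default_rng(2)
    n, p_true = 60, 0.45
    x = binom.rvs(n, p_true, size=10000, random_state=rng)   # simulated trials
    stats = bayes_stat(x, n)                                 # sampling distribution

    # Frequentist operating characteristic at this design point:
    # power of the rule "declare success if Pr(p > p0 | data) > 0.975".
    print("power:", np.mean(stats > 0.975))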
5. Orthogonal Extended Infomax Algorithm
Authors: Nicole Ille
Abstract: The extended infomax algorithm for independent component analysis (ICA) can separate sub- and super-Gaussian signals but converges slowly because it uses stochastic gradient optimization. In this paper, an improved extended infomax algorithm is presented that converges much faster. The accelerated convergence is achieved by replacing the natural gradient learning rule of extended infomax with a fully multiplicative, orthogonal-group-based update scheme for the unmixing matrix, leading to an orthogonal extended infomax algorithm (OgExtInf). The computational performance of OgExtInf is compared with that of two fast ICA algorithms: the popular FastICA and Picard, an L-BFGS algorithm belonging to the family of quasi-Newton methods. Our results demonstrate superior performance of the proposed method on small EEG data sets, as used for example in online EEG processing systems such as brain-computer interfaces or clinical systems for spike and seizure detection.
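A generic sketch of such a multiplicative, orthogonality-preserving update is given below: for whitened data, the extended-infomax relative gradient is projected onto the skew-symmetric matrices and applied through a matrix exponential, so the unmixing matrix stays exactly orthogonal. This follows the idea described in the abstract but is not necessarily the paper's exact update scheme.

    import numpy as np
    from scipy.linalg import expm

    def og_ext_infomax(X, n_iter=200, mu=0.5):
        # X: (channels, samples), assumed whitened (identity covariance).
        d, T = X.shape
        W = np.eye(d)
        for _ in range(n_iter):
            U = W @ X
            # Extended infomax switches the nonlinearity sign per component
            # according to the kurtosis (sub- vs super-Gaussian sources).
            k = np.sign(np.mean(U**4, axis=1) - 3.0 * np.mean(U**2, axis=1)**2)
            G = np.eye(d) - (k[:, None] * np.tanh(U)) @ U.T / T - U @ U.T / T
            A = 0.5 * (G - G.T)     # project onto skew-symmetric matrices
            W = expm(mu * A) @ W    # multiplicative update on the orthogonal group
        return W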