Methodology (stat.ME)
Fri, 12 May 2023
1. A comparison between Bayesian and ordinary kriging based on validation criteria: application to radiological characterisation
Authors: Martin Wieskotten (LMA, ISEC), Marielle Crozet (ISEC, CETAMA), Bertrand Iooss (EDF R&D PRISME, GdR MASCOT-NUM, IMT), Céline Lacaux (LMA), Amandine Marrel (IMT, IRESNE)
Abstract: In decommissioning projects of nuclear facilities, the radiological characterisation step aims to estimate the quantity and spatial distribution of different radionuclides. To carry out the estimation, measurements are performed on site to obtain preliminary information. The usual industrial practice consists in applying spatial interpolation tools (such as the ordinary kriging method) to these data to predict the value of interest for the contamination (radionuclide concentration, radioactivity, etc.) at unobserved positions. This paper questions the ordinary kriging tool on the well-known problem of overoptimistic prediction variances, which arise because uncertainties in the estimation of the kriging parameters (variance and range) are not taken into account. To overcome this issue, the practical use of the Bayesian kriging method, where the model parameters are treated as random variables, is examined in depth. The usefulness of Bayesian kriging, compared with ordinary kriging, is demonstrated in the small data context (which is often the case in decommissioning projects). This result is obtained via several numerical tests on different toy models, using complementary validation criteria: the predictivity coefficient ($Q^2$), the Predictive Variance Adequacy (PVA), the $\alpha$-Confidence Interval plot (and its associated Mean Squared Error alpha (MSEalpha)), and the Predictive Interval Adequacy (PIA). The latter is a new criterion adapted to the Bayesian kriging results. Finally, the same comparison is performed on a real dataset coming from the decommissioning project of the CEA Marcoule G3 reactor. It illustrates the practical interest of Bayesian kriging in industrial radiological characterisation.
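As a rough illustration of two of the validation criteria named in this abstract, the sketch below computes the predictivity coefficient $Q^2$ and a PVA-style statistic on a held-out set. The formulas are the ones commonly given in the kriging validation literature ($Q^2$ as one minus the ratio of squared prediction error to total variation, PVA as the absolute log of the mean squared standardized residual); they are assumptions here, since the abstract does not spell them out. The function names and the data are hypothetical.

```python
import math

def q2(y_true, y_pred):
    """Predictivity coefficient: 1 - SSE / SST on validation points."""
    mean_y = sum(y_true) / len(y_true)
    sse = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    sst = sum((y - mean_y) ** 2 for y in y_true)
    return 1.0 - sse / sst

def pva(y_true, y_pred, pred_var):
    """PVA-style statistic (assumed form): |log(mean((y - yhat)^2 / var))|.

    A value near 0 means the predictive variances match the observed errors;
    large values signal over- or under-estimated variances.
    """
    ratios = [(y - p) ** 2 / v for y, p, v in zip(y_true, y_pred, pred_var)]
    return abs(math.log(sum(ratios) / len(ratios)))

# Hypothetical validation data: observations, predictions, predictive variances.
y_obs = [1.0, 2.0, 3.0, 4.0]
y_hat = [1.1, 1.9, 3.2, 3.8]
var_hat = [0.01, 0.01, 0.04, 0.04]

q2_value = q2(y_obs, y_hat)
pva_value = pva(y_obs, y_hat, var_hat)
# Variances that are four times too small (overoptimistic) inflate the PVA.
pva_overoptimistic = pva(y_obs, y_hat, [v / 4.0 for v in var_hat])
```

In this toy setting the predictive variances equal the squared errors, so the PVA-style statistic is close to zero, while shrinking the variances by a factor of four moves it to about log 4.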
2. Robust score matching for compositional data
Authors: Janice L. Scealy, Kassel L. Hingee, John T. Kent, Andrew T. A. Wood
Abstract: The restricted polynomially-tilted pairwise interaction (RPPI) distribution gives a flexible model for compositional data. It is particularly well-suited to situations where some of the marginal distributions of the components of a composition are concentrated near zero, possibly with right skewness. This article develops a method of tractable robust estimation for the model by combining two ideas. The first idea is to use score matching estimation after an additive log-ratio transformation. The resulting estimator is automatically insensitive to zeros in the data compositions. The second idea is to incorporate suitable weights in the estimating equations. The resulting estimator is additionally resistant to outliers. These properties are confirmed in simulation studies, where we also demonstrate that our new outlier-robust estimator is efficient in high concentration settings, even when there is no model contamination. An example is given using microbiome data. A user-friendly R package accompanies the article.
3. Distribution free MMD tests for model selection with estimated parameters
Authors: Florian Brück, Jean-David Fermanian, Aleksey Min
Abstract: Several kernel-based testing procedures are proposed to solve the problem of model selection in the presence of parameter estimation in a family of candidate models. Extending the two sample test of Gretton et al. (2006), we first provide a way of testing whether some data is drawn from a given parametric model (model specification). Second, we provide a test statistic to decide whether two parametric models are equally valid to describe some data (model comparison), in the spirit of Vuong (1989). All our tests are asymptotically standard normal under the null, even when the true underlying distribution belongs to the competing parametric families. Some simulations illustrate the performance of our tests in terms of power and level.
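For readers unfamiliar with the maximum mean discrepancy (MMD) underlying these tests, here is a minimal sketch of the standard unbiased U-statistic estimate of squared MMD between two samples. It is only the basic two-sample building block, not the tests proposed in the paper (which handle estimated parameters and are normalised to be asymptotically standard normal). The Gaussian kernel, the bandwidth, the scalar data, and all names are assumptions made for illustration.

```python
import math

def gaussian_kernel(a, b, bandwidth=1.0):
    """Gaussian (RBF) kernel on scalars; the bandwidth is an assumed choice."""
    return math.exp(-((a - b) ** 2) / (2.0 * bandwidth ** 2))

def mmd2_unbiased(xs, ys, bandwidth=1.0):
    """Unbiased estimate of squared MMD between samples xs and ys.

    Within-sample terms exclude i == j, so the estimate can be
    slightly negative when the two samples are very similar.
    """
    m, n = len(xs), len(ys)
    k_xx = sum(gaussian_kernel(xs[i], xs[j], bandwidth)
               for i in range(m) for j in range(m) if i != j)
    k_yy = sum(gaussian_kernel(ys[i], ys[j], bandwidth)
               for i in range(n) for j in range(n) if i != j)
    k_xy = sum(gaussian_kernel(x, y, bandwidth) for x in xs for y in ys)
    return k_xx / (m * (m - 1)) + k_yy / (n * (n - 1)) - 2.0 * k_xy / (m * n)

# Hypothetical samples: one close to xs, one far from it.
xs = [0.0, 0.5, 1.0, 1.5]
ys_near = [0.1, 0.6, 1.1, 1.6]
ys_far = [5.0, 5.5, 6.0, 6.5]

mmd_near = mmd2_unbiased(xs, ys_near)
mmd_far = mmd2_unbiased(xs, ys_far)
```

The estimate for the distant sample is much larger than for the nearby one, which is the behaviour a discrepancy-based test statistic relies on.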
4. Robustness of Bayesian ordinal response model against outliers via divergence approach
Authors: Tomotaka Momozaki, Tomoyuki Nakagawa
Abstract: The ordinal response model is a popular and commonly used regression model for ordered categorical data in a wide range of fields such as medicine and social sciences. However, it is empirically known that the existence of "outliers", combinations of the ordered categorical response and covariates that are heterogeneous compared to other pairs, makes inference with the ordinal response model unreliable. In this article, we prove that the posterior distribution in the ordinal response model does not satisfy posterior robustness with any link function, i.e., the posterior cannot ignore the influence of large outliers. Furthermore, to achieve robust Bayesian inference in the ordinal response model, this article defines general posteriors in the ordinal response model with two robust divergences (the density-power and $\gamma$-divergences) based on the framework of general posterior inference. We also provide an algorithm for generating posterior samples from the proposed posteriors. The robustness of the proposed methods against outliers is examined in terms of posterior robustness and an index of robustness based on the Fisher-Rao metric. Through numerical experiments on artificial data and two real datasets, we show that the proposed methods perform better than the ordinary Bayesian methods with and without outliers in the data for various link functions.
5. An Application of the Causal Roadmap in Two Safety Monitoring Case Studies: Covariate-Adjustment and Outcome Prediction using Electronic Health Record Data
Authors: Brian D Williamson, Richard Wyss, Elizabeth A Stuart, Lauren E Dang, Andrew N Mertens, Andrew Wilson, Susan Gruber
Abstract: Real-world data, such as administrative claims and electronic health records, are increasingly used for safety monitoring and to help guide regulatory decision-making. In these settings, it is important to document analytic decisions transparently and objectively to ensure that analyses meet their intended goals. The Causal Roadmap is an established framework that can guide and document analytic decisions through each step of the analytic pipeline, which will help investigators generate high-quality real-world evidence. In this paper, we illustrate the utility of the Causal Roadmap using two case studies previously led by workgroups sponsored by the Sentinel Initiative -- a program for actively monitoring the safety of regulated medical products. Each case example focuses on different aspects of the analytic pipeline for drug safety monitoring. The first case study shows how the Causal Roadmap encourages transparency, reproducibility, and objective decision-making for causal analyses. The second case study highlights how this framework can guide analytic decisions beyond inference on causal parameters, improving outcome ascertainment in clinical phenotyping. These examples provide a structured framework for implementing the Causal Roadmap in safety surveillance and guide transparent, reproducible, and objective analysis.
6. Nonparametric data segmentation in multivariate time series via joint characteristic functions
Authors: Euan T. McGonigle, Haeran Cho
Abstract: Modern time series data often exhibit complex dependence and structural changes which are not easily characterised by shifts in the mean or model parameters. We propose a nonparametric data segmentation methodology for multivariate time series termed NP-MOJO. By considering joint characteristic functions between the time series and its lagged values, NP-MOJO is able to detect change points not only in the marginal distribution but also in possibly non-linear serial dependence, all without the need to pre-specify the type of changes. We show the theoretical consistency of NP-MOJO in estimating the total number and the locations of the change points, and demonstrate the good performance of NP-MOJO against a variety of change point scenarios. We further demonstrate its usefulness in applications to seismology and economic time series.
7. Smoothed empirical likelihood estimation and automatic variable selection for an expectile high-dimensional model with possibly missing response variable
Authors: Gabriela Ciuperca
Abstract: We consider a linear model that may have a large number of explanatory variables, errors with an asymmetric distribution, or values of the response variable missing at random. To take these situations into account, we consider the nonparametric empirical likelihood (EL) estimation method. Because a constraint in EL contains an indicator function, a smoothed function is used in place of the indicator. Two smoothed expectile maximum EL methods are proposed, one of which automatically selects the explanatory variables. For each method we obtain the convergence rate of the estimators and their asymptotic normality. The smoothed expectile empirical log-likelihood ratio process asymptotically follows a chi-square distribution; moreover, the adaptive LASSO smoothed expectile maximum EL estimator satisfies the sparsity property, which guarantees the automatic selection of zero model coefficients. To implement these methods, we propose four algorithms.
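To make the expectile notion concrete, the sketch below shows the usual asymmetric squared loss, whose weight depends on an indicator of the sign of the residual, and computes a sample expectile by iteratively reweighted averaging. This is a plain illustration of expectiles under standard definitions, not the smoothed empirical likelihood estimators or the four algorithms of the paper; all names and data are hypothetical.

```python
def expectile_loss(u, tau):
    """Asymmetric squared loss: weight tau for u >= 0, (1 - tau) for u < 0."""
    weight = tau if u >= 0 else 1.0 - tau
    return weight * u * u

def sample_expectile(ys, tau, tol=1e-12, max_iter=1000):
    """Sample tau-expectile via iteratively reweighted averaging.

    The minimiser of sum(expectile_loss(y - e, tau)) satisfies
    e = sum(w_i * y_i) / sum(w_i) with w_i = tau if y_i >= e else 1 - tau.
    """
    e = sum(ys) / len(ys)  # start from the mean (the 0.5-expectile)
    for _ in range(max_iter):
        weights = [tau if y >= e else 1.0 - tau for y in ys]
        new_e = sum(w * y for w, y in zip(weights, ys)) / sum(weights)
        if abs(new_e - e) < tol:
            return new_e
        e = new_e
    return e

# Hypothetical right-skewed sample.
ys = [1.0, 2.0, 3.0, 4.0, 10.0]
e_mid = sample_expectile(ys, 0.5)   # coincides with the sample mean
e_high = sample_expectile(ys, 0.9)  # pulled towards the upper tail
```

The jump in the weight at zero is the kind of non-smooth indicator term that motivates replacing the indicator with a smoothed function.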