Methodology (stat.ME)
Thu, 17 Aug 2023
1. Spectral information criterion for automatic elbow detection
Authors: L. Martino, R. San Millan-Castillo, E. Morgado
Abstract: We introduce a generalized information criterion that contains other well-known information criteria, such as the Bayesian information criterion (BIC) and the Akaike information criterion (AIC), as special cases. Furthermore, the proposed spectral information criterion (SIC) is more general than other information criteria since, for example, knowledge of a likelihood function is not strictly required. SIC extracts geometric features of the error curve and, as a consequence, it can be considered an automatic elbow detector. SIC provides a subset of all possible models, whose cardinality is often much smaller than the total number of possible models. The elements of this subset are elbows of the error curve. A practical rule for selecting a unique model within the set of elbows is suggested as well. Theoretical invariance properties of SIC are analyzed. Moreover, we test SIC in ideal scenarios, where it always provides the optimal expected results. We also test SIC in several numerical experiments: some involving synthetic data and two involving real datasets. They all address real-world applications such as clustering, variable selection, or polynomial order selection, to name a few. The results show the benefits of the proposed scheme. Matlab code related to the experiments is also provided. Possible future research lines are finally discussed.
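To make the "special cases" concrete, the following minimal sketch shows how AIC and BIC can both be read as an error curve plus a penalty that is linear in the number of parameters. It does not implement SIC, whose construction is not given in the abstract. It assumes Gaussian errors, so that minus twice the maximized log-likelihood equals n*log(RSS/n) up to a constant. The function name `aic_bic_curve` and the synthetic quadratic data are hypothetical, chosen only for illustration.

```python
import numpy as np


def aic_bic_curve(x, y, max_order):
    """Return (orders, aic, bic) for polynomial fits of order 0..max_order.

    Assumes Gaussian errors, so that -2 * max log-likelihood equals
    n * log(RSS / n) up to an additive constant shared by all models.
    """
    n = len(x)
    orders = np.arange(max_order + 1)
    aic = np.empty(len(orders))
    bic = np.empty(len(orders))
    for i, order in enumerate(orders):
        coeffs = np.polyfit(x, y, order)
        rss = np.sum((y - np.polyval(coeffs, x)) ** 2)
        k = order + 1  # number of fitted polynomial coefficients
        fit_term = n * np.log(rss / n)  # the "error curve" part
        aic[i] = fit_term + 2 * k          # AIC: penalty slope 2
        bic[i] = fit_term + k * np.log(n)  # BIC: penalty slope log(n)
    return orders, aic, bic


# Hypothetical synthetic data: a noisy quadratic.
rng = np.random.default_rng(0)
x = np.linspace(-1.0, 1.0, 200)
y = 1.0 - 2.0 * x + 3.0 * x**2 + rng.normal(scale=0.1, size=x.size)

orders, aic, bic = aic_bic_curve(x, y, max_order=8)
best_aic = int(orders[np.argmin(aic)])
best_bic = int(orders[np.argmin(bic)])
print("order chosen by AIC:", best_aic)
print("order chosen by BIC:", best_bic)
```

Because both criteria share the same fit term and differ only in the slope of the penalty, the order selected by BIC can never exceed the one selected by AIC once log(n) > 2.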
2. Rethinking Hypothesis Tests
Authors: Rafael Izbicki, Luben M. C. Cabezas, Fernando A. B. Colugnatti, Rodrigo F. L. Lassance, Altay A. L. de Souza, Rafael B. Stern
Abstract: Null Hypothesis Significance Testing (NHST) has been a popular statistical tool across various scientific disciplines since the 1920s. However, the exclusive reliance on a p-value threshold of 0.05 has recently come under criticism; in particular, it is argued to have contributed significantly to the reproducibility crisis. We revisit some of the main issues associated with NHST and propose an alternative approach that is easy to implement and can address these concerns. Our proposed approach builds on equivalence tests and three-way decision procedures, which offer several advantages over the traditional NHST. We demonstrate the efficacy of our approach on real-world examples and show that it has many desirable properties.
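The abstract does not spell out the procedure, so the sketch below only illustrates one common way to combine an equivalence region with a three-way decision: compare a confidence interval for the effect with a region of practically negligible effects. The function name `three_way_decision`, the symmetric region [-margin, margin], and the normal approximation for the interval are all assumptions made for illustration, not the authors' method.

```python
from statistics import NormalDist


def three_way_decision(estimate, std_error, margin, alpha=0.05):
    """Classify an effect estimate as 'accept', 'reject', or 'agnostic'.

    Assumptions (illustrative only): the equivalence region is the
    symmetric interval [-margin, margin], and the confidence interval
    uses a normal approximation.
    """
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    lower = estimate - z * std_error
    upper = estimate + z * std_error
    if -margin <= lower and upper <= margin:
        return "accept"    # interval lies entirely inside the region
    if upper < -margin or lower > margin:
        return "reject"    # interval lies entirely outside the region
    return "agnostic"      # interval straddles a boundary


print(three_way_decision(0.0, 0.1, margin=0.5))
print(three_way_decision(2.0, 0.1, margin=0.5))
print(three_way_decision(0.4, 0.2, margin=0.5))
```

Unlike a single p-value threshold, this rule separates "the data support a negligible effect" from "the data are inconclusive", which is the distinction a three-way procedure is meant to capture.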
3. Sparse reconstruction of ordinary differential equations with inference
Authors: Sara Venkatraman, Sumanta Basu, Martin T. Wells
Abstract: Sparse regression has emerged as a popular technique for learning dynamical systems from temporal data, beginning with the SINDy (Sparse Identification of Nonlinear Dynamics) framework proposed by arXiv:1509.03580. Quantifying the uncertainty inherent in differential equations learned from data remains an open problem; we therefore propose leveraging recent advances in statistical inference for sparse regression to address this issue. Focusing on systems of ordinary differential equations (ODEs), SINDy assumes that each equation is a parsimonious linear combination of a few candidate functions, such as polynomials, and uses methods such as sequentially thresholded least squares or the Lasso to identify a small subset of these functions that govern the system's dynamics. We instead employ bias-corrected versions of the Lasso and ridge regression estimators, as well as an empirical Bayes variable selection technique known as SEMMS, to estimate each ODE as a linear combination of terms that are statistically significant. We demonstrate through simulations that this approach allows us to recover the functional terms that correctly describe the dynamics more often than existing methods that do not account for uncertainty.
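As a rough illustration of the baseline that the abstract contrasts with, the sketch below implements sequentially thresholded least squares on a polynomial candidate library. It covers only the SINDy-style fitting step, not the bias-corrected Lasso, ridge, or SEMMS estimators the authors propose. The function name `stlsq`, the toy two-state system, the threshold, and the noise level are all hypothetical. To keep the example short, states are sampled at random and derivatives are generated directly from the true coefficients plus noise, rather than by integrating a trajectory and differentiating it numerically.

```python
import numpy as np


def stlsq(theta, dxdt, threshold=0.1, n_iter=10):
    """Sequentially thresholded least squares.

    theta: (n_samples, n_features) matrix of candidate functions.
    dxdt:  (n_samples, n_states) matrix of time derivatives.
    Returns a sparse (n_features, n_states) coefficient matrix.
    """
    xi = np.linalg.lstsq(theta, dxdt, rcond=None)[0]
    for _ in range(n_iter):
        small = np.abs(xi) < threshold
        xi[small] = 0.0  # prune coefficients below the threshold
        for j in range(dxdt.shape[1]):
            keep = ~small[:, j]
            if keep.any():
                # refit each equation using only the surviving terms
                xi[keep, j] = np.linalg.lstsq(
                    theta[:, keep], dxdt[:, j], rcond=None
                )[0]
    return xi


# Hypothetical toy system: dx/dt = -2x + y, dy/dt = 0.5x - 1.5xy.
rng = np.random.default_rng(0)
states = rng.uniform(-1.0, 1.0, size=(200, 2))
x, y = states[:, 0], states[:, 1]

# Candidate library: 1, x, y, x^2, x*y, y^2.
theta = np.column_stack([np.ones_like(x), x, y, x**2, x * y, y**2])

true_xi = np.zeros((6, 2))
true_xi[1, 0], true_xi[2, 0] = -2.0, 1.0
true_xi[1, 1], true_xi[4, 1] = 0.5, -1.5

# Derivatives from the true model plus small Gaussian noise.
dxdt = theta @ true_xi + rng.normal(scale=0.01, size=(200, 2))

xi_hat = stlsq(theta, dxdt, threshold=0.1)
print(np.round(xi_hat, 3))
```

The hard threshold yields a sparse model but attaches no measure of uncertainty to the retained terms, which is the gap the abstract says the inference-based estimators are meant to address.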