Methodology (stat.ME)
Tue, 25 Jul 2023
1. Minimum regularized covariance trace estimator and outlier detection for functional data
Authors: Jeremy Oguamalam, Una Radojičić, Peter Filzmoser
Abstract: In this paper, we propose the Minimum Regularized Covariance Trace (MRCT) estimator, a novel method for robust covariance estimation and functional outlier detection. The MRCT estimator employs a subset-based approach that prioritizes subsets exhibiting greater centrality based on a generalization of the Mahalanobis distance, resulting in a fast-MCD type algorithm. Notably, the MRCT estimator handles high-dimensional data sets without the need for preprocessing or dimension reduction techniques, owing to internal smoothing, the amount of which is determined by the regularization parameter $\alpha > 0$. The selection of the regularization parameter $\alpha$ is automated. The proposed method adapts seamlessly to sparsely observed data by working directly with the finite matrix of basis coefficients. An extensive simulation study demonstrates the efficacy of the MRCT estimator in terms of robust covariance estimation and automated outlier detection, emphasizing the balance between noise exclusion and signal preservation achieved through appropriate selection of $\alpha$. The method converges quickly in practice and performs favorably when compared to other functional outlier detection methods.
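The subset-based idea behind fast-MCD type algorithms can be illustrated with a minimal sketch. This is not the MRCT estimator itself: it is a generic concentration step on multivariate data that uses a ridge-regularized sample covariance as a stand-in for the paper's regularization. The function names `c_step` and `concentrate`, the ridge form `S + alpha * I`, and all default values are assumptions made for illustration only.

```python
import numpy as np

def c_step(X, subset, alpha):
    """One concentration step: fit on `subset`, then keep the h most central points."""
    mu = X[subset].mean(axis=0)
    # Ridge-regularized covariance (an illustrative assumption, not the MRCT form).
    S = np.cov(X[subset], rowvar=False) + alpha * np.eye(X.shape[1])
    diff = X - mu
    # Squared Mahalanobis-type distance of every observation to the subset fit.
    d2 = np.einsum("ij,ij->i", diff, np.linalg.solve(S, diff.T).T)
    h = len(subset)
    return np.argsort(d2)[:h], d2

def concentrate(X, h, alpha=0.1, n_iter=20, seed=0):
    """Iterate concentration steps from a random start until the subset is stable."""
    rng = np.random.default_rng(seed)
    subset = rng.choice(X.shape[0], size=h, replace=False)
    d2 = None
    for _ in range(n_iter):
        new_subset, d2 = c_step(X, subset, alpha)
        if set(new_subset.tolist()) == set(subset.tolist()):
            break  # subset unchanged: d2 is consistent with the final subset
        subset = new_subset
    return np.sort(subset), d2
```

Observations left outside the final subset with large distances are the natural outlier candidates. A full fast-MCD style procedure would typically also use several random starts, which this sketch omits.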
2. A unified class of null proportion estimators with plug-in FDR control
Authors: Sebastian Döhler, Iqraa Meah
Abstract: Since the work of \cite{Storey2004}, it has been well known that the performance of the Benjamini-Hochberg (BH) procedure can be improved by incorporating estimators of the number (or proportion) of null hypotheses, yielding an adaptive BH procedure which still controls FDR. Several such plug-in estimators have been proposed since then. For some of these, like Storey's estimator, plug-in FDR control has been established, while for others, e.g. the estimator of \cite{PC2006}, some gaps remain to be closed. In this work we introduce a unified class of estimators, which encompasses existing and new estimators and unifies proofs of plug-in FDR control using simple convex ordering arguments. We also show that any convex combination of such estimators once more yields estimators with guaranteed plug-in FDR control. Additionally, the flexibility of the new class of estimators allows incorporating distributional information on the $p$-values. We illustrate this for the case of discrete tests, where the null distributions of the $p$-values are typically known. In that setting, we describe two generic approaches for adapting any estimator from the general class to the discrete setting while guaranteeing plug-in FDR control. While the focus of this paper is on presenting the generality and flexibility of the new class of estimators, we also include some analyses of simulated and real data.
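The plug-in idea mentioned in this abstract can be sketched as follows, using Storey's estimator of the number of nulls, $\hat{m}_0 = (1 + \#\{p_i > \lambda\}) / (1 - \lambda)$, inside the BH step-up rule. This is only a minimal illustration of one familiar member of the family, not the unified class proposed in the paper; the function names and the defaults `alpha=0.05` and `lam=0.5` are assumptions.

```python
import numpy as np

def storey_m0(pvals, lam=0.5):
    """Storey-type estimate of the number of true nulls (with the +1 correction)."""
    p = np.asarray(pvals, dtype=float)
    return (1.0 + np.sum(p > lam)) / (1.0 - lam)

def adaptive_bh(pvals, alpha=0.05, lam=0.5):
    """BH step-up procedure with m replaced by the plug-in estimate of m0."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    m0_hat = storey_m0(p, lam)
    order = np.argsort(p)
    # Step-up critical values alpha * k / m0_hat for k = 1, ..., m.
    thresholds = alpha * np.arange(1, m + 1) / m0_hat
    below = p[order] <= thresholds
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.max(np.nonzero(below)[0])  # largest index meeting its threshold
        reject[order[: k + 1]] = True
    return reject
```

Replacing `m0_hat` with `m` recovers the ordinary BH procedure, so the gain in power comes entirely from the estimate being smaller than `m` when many hypotheses are non-null.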
3. A flexible class of priors for conducting posterior inference on structured orthonormal matrices
Authors: Joshua S. North, Mark D. Risser, F. Jay Breidt
Abstract: The big data era of science and technology motivates statistical modeling of matrix-valued data using a low-rank representation that simultaneously summarizes key characteristics of the data and enables dimension reduction for data compression and storage. Low-rank representations such as the singular value decomposition factor the original data into the product of orthonormal basis functions and weights, where each basis function represents an independent feature of the data. However, the basis functions in these factorizations are typically computed using algorithmic methods that cannot quantify uncertainty or account for explicit structure beyond what is implicitly specified via data correlation. We propose a flexible prior distribution for orthonormal matrices that can explicitly model structure in the basis functions. The prior is used within a general probabilistic model for singular value decomposition to conduct posterior inference on the basis functions while accounting for measurement error and fixed effects. To contextualize the proposed prior and model, we discuss how the prior specification can be used for various scenarios and relate the model to its deterministic counterpart. We demonstrate favorable model properties through synthetic data examples and apply our method to sea surface temperature data from the northern Pacific, enhancing our understanding of the ocean's internal variability.
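For orientation, the deterministic low-rank factorization that this abstract contrasts with its probabilistic model can be sketched with a truncated singular value decomposition. The helper names `truncated_svd` and `reconstruct` are hypothetical, and the sketch carries no prior, measurement-error model, or uncertainty quantification; it shows only the algorithmic baseline.

```python
import numpy as np

def truncated_svd(Y, k):
    """Return the leading k left vectors, singular values, and right vectors of Y."""
    U, d, Vt = np.linalg.svd(Y, full_matrices=False)
    return U[:, :k], d[:k], Vt[:k, :]

def reconstruct(U, d, Vt):
    """Rebuild the rank-k approximation U diag(d) V^T from its factors."""
    return (U * d) @ Vt  # broadcasting scales column j of U by d[j]
```

The columns of `U` and the rows of `Vt` are orthonormal by construction, which is the constraint a prior on orthonormal matrices has to respect.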