Methodology (stat.ME)
Mon, 12 Jun 2023
1.Revisiting Whittaker-Henderson Smoothing
Authors:Guillaume Biessy LPSM
Abstract: Introduced nearly a century ago, Whittaker-Henderson smoothing remains one of the most commonly used methods by actuaries for constructing one-dimensional and two-dimensional experience tables for mortality and other Life Insurance risks. This paper proposes to reframe this smoothing technique within a modern statistical framework and addresses six questions of practical interest regarding its use. Firstly, we adopt a Bayesian view of this smoothing method to build credible intervals. Next, we shed light on the choice of observation vectors and weights to which the smoothing should be applied by linking it to a maximum likelihood estimator introduced in the context of duration models. We then enhance the precision of the smoothing by relaxing an implicit asymptotic approximation on which it relies. Afterward, we select the smoothing parameters based on maximizing a marginal likelihood. We later improve numerical performance in the presence of a large number of observation points and, consequently, parameters. Finally, we extrapolate the results of the smoothing while preserving consistency between estimated and predicted values through the use of constraints.
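For orientation, the classical one-dimensional form of Whittaker-Henderson smoothing can be sketched as follows. It minimizes the weighted fidelity term sum_i w_i (y_i - theta_i)^2 plus lam times the squared norm of the order-q differences of theta, which gives the linear system (W + lam D'D) theta = W y. This is only the textbook formulation, not the paper's Bayesian or likelihood-based extensions; the function names, the dense Gaussian elimination solver, and the default order of 2 are illustrative assumptions.

```python
def difference_matrix(n, order):
    """Return the (n - order) x n finite-difference matrix of the given order."""
    # Start from the identity and difference consecutive rows `order` times.
    rows = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(order):
        rows = [[b - a for a, b in zip(rows[i], rows[i + 1])]
                for i in range(len(rows) - 1)]
    return rows


def solve(a, b):
    """Solve the dense system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def whittaker_henderson(y, w, lam, order=2):
    """Classical 1-D smoother: solve (W + lam * D'D) theta = W y."""
    n = len(y)
    d = difference_matrix(n, order)
    # Penalty matrix P = D'D.
    p = [[sum(row[i] * row[j] for row in d) for j in range(n)] for i in range(n)]
    a = [[lam * p[i][j] + (w[i] if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    b = [w[i] * y[i] for i in range(n)]
    return solve(a, b)


# Hypothetical observations and weights, for illustration only.
observed = [1.0, 3.0, 2.0, 5.0, 4.0, 6.0]
weights = [1.0, 2.0, 1.0, 2.0, 1.0, 2.0]
smoothed = whittaker_henderson(observed, weights, lam=10.0)
```

Two properties are easy to check with this sketch: a series that is exactly linear is returned unchanged under a second-order penalty, and the weighted sum of the smoothed values equals the weighted sum of the observations. The dense solver costs O(n^3), which is part of why the abstract's point about numerical performance with many observation points matters in practice.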
2.On the closed-loop Volterra method for analyzing time series
Authors:Maryam Movahedifar, Thorsten Dickhaus
Abstract: The main focus of this paper is to approximate time series data based on the closed-loop Volterra series representation. Volterra series expansions are a valuable tool for representing, analyzing, and synthesizing nonlinear dynamical systems. However, a major limitation of this approach is that as the order of the expansion increases, the number of terms that need to be estimated grows exponentially, posing a considerable challenge. This paper considers a practical solution for estimating the closed-loop Volterra series in stationary nonlinear time series using the concepts of Reproducing Kernel Hilbert Spaces (RKHS) and polynomial kernels. We illustrate the applicability of the suggested Volterra representation by means of simulations and real data analysis. Furthermore, we apply the Kolmogorov-Smirnov Predictive Accuracy (KSPA) test, first to determine whether there exists a statistically significant difference between the distributions of estimated errors for competing time series models, and second to determine whether the estimated time series with the lower error based on some loss function also exhibits a stochastically smaller error than the estimated time series from a competing method. The obtained results indicate that the closed-loop Volterra method can outperform the ARFIMA, ETS, and Ridge regression methods in terms of both smaller error and increased interpretability.
3.Multivariate extensions of the Multilevel Best Linear Unbiased Estimator for ensemble-variational data assimilation
Authors:Mayeul Destouches, Paul Mycek, Selime Gürol
Abstract: Multilevel estimators aim at reducing the variance of Monte Carlo statistical estimators, by combining samples generated with simulators of different costs and accuracies. In particular, the recent work of Schaden and Ullmann (2020) on the multilevel best linear unbiased estimator (MLBLUE) introduces a framework unifying several multilevel and multifidelity techniques. The MLBLUE is reintroduced here using a variance minimization approach rather than the regression approach of Schaden and Ullmann. We then discuss possible extensions of the scalar MLBLUE to a multidimensional setting, i.e. from the expectation of scalar random variables to the expectation of random vectors. Several estimators of increasing complexity are proposed: a) multilevel estimators with scalar weights, b) with element-wise weights, c) with spectral weights and d) with general matrix weights. The computational cost of each method is discussed. We finally extend the MLBLUE to the estimation of second-order moments in the multidimensional case, i.e. to the estimation of covariance matrices. The multilevel estimators proposed are d) a multilevel estimator with scalar weights and e) with element-wise weights. In large-dimension applications such as data assimilation for geosciences, the latter estimator is computationally unaffordable. As a remedy, we also propose f) a multilevel covariance matrix estimator with optimal multilevel localization, inspired by the optimal localization theory of Ménétrier and Auligné (2015). Some practical details on weighted MLMC estimators of covariance matrices are given in the appendix.
4.Foundations of Causal Discovery on Groups of Variables
Authors:Jonas Wahl, Urmi Ninad, Jakob Runge
Abstract: Discovering causal relationships from observational data is a challenging task that relies on assumptions connecting statistical quantities to graphical or algebraic causal models. In this work, we focus on widely employed assumptions for causal discovery when objects of interest are (multivariate) groups of random variables rather than individual (univariate) random variables, as is the case in a variety of problems in scientific domains such as climate science or neuroscience. If the group-level causal models are derived from partitioning a micro-level model into groups, we explore the relationship between micro and group-level causal discovery assumptions. We investigate the conditions under which assumptions like Causal Faithfulness hold or fail to hold. Our analysis encompasses graphical causal models that contain cycles and bidirected edges. We also discuss grouped time series causal graphs and variants thereof as special cases of our general theoretical framework. Thereby, we aim to provide researchers with a solid theoretical foundation for the development and application of causal discovery methods for variable groups.
5.Improving Forecasts for Heterogeneous Time Series by "Averaging", with Application to Food Demand Forecast
Authors:Lukas Neubauer, Peter Filzmoser
Abstract: A common forecasting setting in real-world applications considers a set of possibly heterogeneous time series of the same domain. Due to different properties of each time series such as length, obtaining forecasts for each individual time series in a straightforward way is challenging. This paper proposes a general framework utilizing a similarity measure in Dynamic Time Warping to find similar time series to build neighborhoods in a k-Nearest Neighbor fashion, and improve forecasts of possibly simple models by averaging. Several ways of performing the averaging are suggested, and theoretical arguments underline the usefulness of averaging for forecasting. Additionally, diagnostic tools are proposed allowing a deep understanding of the procedure.
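The neighborhood idea in this abstract can be illustrated with a minimal sketch: compute the classical Dynamic Time Warping distance between series of possibly different lengths, pick the k nearest neighbors of a target series, and average their forecasts with the target's own. The equal-weight mean is only one possible averaging scheme and is an assumption here (the abstract mentions several without naming them), as are the function names and the toy data.

```python
def dtw_distance(a, b):
    """Classical DTW distance with absolute-difference local cost."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = best cumulative cost aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step + min(cost[i - 1][j],      # repeat b[j-1]
                                    cost[i][j - 1],      # repeat a[i-1]
                                    cost[i - 1][j - 1])  # advance both
    return cost[n][m]


def averaged_forecast(target, series, forecasts, k):
    """Average the target's forecast with those of its k DTW-nearest series.

    `series` maps a name to its history, `forecasts` maps a name to a scalar
    forecast from some base model. Equal weights are an illustrative assumption.
    """
    others = [name for name in series if name != target]
    others.sort(key=lambda name: dtw_distance(series[target], series[name]))
    neighbours = others[:k]
    group = [target] + neighbours
    return sum(forecasts[name] for name in group) / len(group), neighbours


# Hypothetical toy data: series "b" is a time-warped copy of "a".
series = {"a": [1, 2, 3], "b": [1, 2, 2, 3], "c": [10, 11, 12]}
forecasts = {"a": 4.0, "b": 6.0, "c": 100.0}
combined, neighbours = averaged_forecast("a", series, forecasts, k=1)
```

Because DTW allows one point to align with several, series of unequal length can be compared directly, which fits the heterogeneous setting the abstract describes.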
6.Ultra-efficient MCMC for Bayesian longitudinal functional data analysis
Authors:Thomas Y. Sun, Daniel R. Kowal
Abstract: Functional mixed models are widely useful for regression analysis with dependent functional data, including longitudinal functional data with scalar predictors. However, existing algorithms for Bayesian inference with these models only provide either scalable computing or accurate approximations to the posterior distribution, but not both. We introduce a new MCMC sampling strategy for highly efficient and fully Bayesian regression with longitudinal functional data. Using a novel blocking structure paired with an orthogonalized basis reparametrization, our algorithm jointly samples the fixed effects regression functions together with all subject- and replicate-specific random effects functions. Crucially, the joint sampler optimizes sampling efficiency for these key parameters while preserving computational scalability. Perhaps surprisingly, our new MCMC sampling algorithm even surpasses state-of-the-art algorithms for frequentist estimation and variational Bayes approximations for functional mixed models -- while also providing accurate posterior uncertainty quantification -- and is orders of magnitude faster than existing Gibbs samplers. Simulation studies show improved point estimation and interval coverage in nearly all simulation settings over competing approaches. We apply our method to a large physical activity dataset to study how various demographic and health factors associate with intraday activity.
7.Bayesian estimation of covariate assisted principal regression for brain functional connectivity
Authors:Hyung G. Park
Abstract: This paper presents a Bayesian reformulation of the covariate-assisted principal (CAP) regression of Zhao et al. (2021), which aims to identify components in the covariance of the response signal that are associated with covariates in a regression framework. We introduce a geometric formulation and reparameterization of individual covariance matrices in their tangent space. By mapping the covariance matrices to the tangent space, we leverage Euclidean geometry to perform posterior inference. This approach enables joint estimation of all parameters and uncertainty quantification within a unified framework, fusing dimension reduction for covariance matrices with regression model estimation. We validate the proposed method through simulation studies and apply it to analyze associations between covariates and brain functional connectivity, utilizing data from the Human Connectome Project.
8.Three-way Cross-Fitting and Pseudo-Outcome Regression for Estimation of Conditional Effects and other Linear Functionals
Authors:Aaron Fisher, Virginia Fisher
Abstract: We propose an approach to better inform treatment decisions at an individual level by adapting recent advances in average treatment effect estimation to conditional average treatment effect estimation. Our work is based on doubly robust estimation methods, which combine flexible machine learning tools to produce efficient effect estimates while relaxing parametric assumptions about the data generating process. Refinements to doubly robust methods have achieved faster convergence by incorporating 3-way cross-fitting, which entails dividing the sample into three partitions, using the first to estimate the conditional probability of treatment, the second to estimate the conditional expectation of the outcome, and the third to perform a first order bias correction step. Here, we combine the approaches of 3-way cross-fitting and pseudo-outcome regression to produce personalized effect estimates. We show that this approach yields fast convergence rates under a smoothness condition on the conditional expectation of the outcome.
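The sample-splitting scheme described in this abstract can be sketched as below: the indices are divided into three disjoint folds, one for estimating the conditional probability of treatment, one for the conditional expectation of the outcome, and one for the correction step. Rotating the roles across folds, the fold construction, and all names are illustrative assumptions. The pseudo-outcome shown is the standard augmented inverse-probability-weighted (AIPW) form, used here as an assumed example and not necessarily the exact quantity used in the paper.

```python
import random


def three_way_folds(n, seed=0):
    """Shuffle indices 0..n-1 and split them into three disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::3] for i in range(3)]


def role_assignments(folds):
    """Rotate the three folds through the three roles (rotation is an assumption)."""
    roles = ("propensity", "outcome", "correction")
    return [{role: folds[(shift + r) % 3] for r, role in enumerate(roles)}
            for shift in range(3)]


def aipw_pseudo_outcome(y, a, pi, mu1, mu0):
    """Standard AIPW pseudo-outcome for one unit.

    y: observed outcome, a: treatment indicator (0 or 1),
    pi: estimated P(A = 1 | X), mu1 / mu0: estimated E[Y | X, A = 1 / 0].
    """
    return (mu1 - mu0
            + a * (y - mu1) / pi
            - (1 - a) * (y - mu0) / (1 - pi))


folds = three_way_folds(10)
plans = role_assignments(folds)
```

In a full procedure, the nuisance models would be fitted on the "propensity" and "outcome" folds, the pseudo-outcomes computed on the "correction" fold, and then regressed on covariates to obtain conditional effect estimates; that regression step is omitted here.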
9.Nonparametric empirical Bayes biomarker imputation and estimation
Authors:Alton Barbehenn, Sihai Dave Zhao
Abstract: Biomarkers are often measured in bulk to diagnose patients, monitor patient conditions, and research novel drug pathways. The measurement of these biomarkers often suffers from detection limits that result in missing and untrustworthy measurements. Frequently, missing biomarkers are imputed so that downstream analysis can be conducted with modern statistical methods that cannot normally handle data subject to informative censoring. This work develops an empirical Bayes $g$-modeling method for imputing and denoising biomarker measurements. We establish superior estimation properties compared to popular methods in simulations and demonstrate the utility of the estimated biomarker measurements for downstream analysis.