Methodology (stat.ME)
Wed, 07 Jun 2023
1.Transfer Learning for General M-estimators with Decomposable Regularizers in High-dimensions
Authors:Zeyu Li, Dong Liu, Yong He, Xinsheng Zhang
Abstract: To incorporate useful information from related statistical tasks into the target one, we propose a two-step transfer learning algorithm in the general M-estimators framework with decomposable regularizers in high dimensions. When the informative sources are known in the oracle sense, in the first step, we acquire knowledge of the target parameter by pooling the useful source datasets and the target one. In the second step, the primal estimator is fine-tuned using the target dataset. In contrast to the existing literature, which imposes homogeneity conditions on the Hessian matrices of the various population loss functions, our theoretical analysis shows that even if the Hessian matrices are heterogeneous, the pooling estimators still provide adequate information after slightly enlarging the regularization, and numerical studies further validate the assertion. Sparse regression and low-rank trace regression for both linear and generalized linear cases are discussed as two specific examples under the M-estimators framework. When the informative source datasets are unknown, a novel truncated-penalized algorithm is proposed to acquire the primal estimator and its oracle property is proved. Extensive numerical experiments are conducted to support the theoretical arguments.
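The pool-then-fine-tune idea can be illustrated for the sparse linear regression case. The following is a minimal sketch under stated assumptions, not the authors' algorithm: the function names (lasso_ista, two_step_transfer), the simulated data, the tuning values, and the choice to fine-tune by fitting a sparse correction to the pooled estimate on the target residuals are all illustrative.

```python
import numpy as np

def lasso_ista(X, y, lam, n_iter=500):
    """Minimise (1/2n)||y - Xb||^2 + lam*||b||_1 by proximal gradient (ISTA)."""
    n, p = X.shape
    step = n / np.linalg.norm(X, 2) ** 2  # 1 / Lipschitz constant of the gradient
    beta = np.zeros(p)
    for _ in range(n_iter):
        z = beta - step * (X.T @ (X @ beta - y) / n)  # gradient step
        beta = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)  # soft-threshold
    return beta

def two_step_transfer(X_t, y_t, sources, lam_pool, lam_tune):
    """Step 1: lasso on pooled target + source data. Step 2: sparse correction on target."""
    X_pool = np.vstack([X_t] + [X for X, _ in sources])
    y_pool = np.concatenate([y_t] + [y for _, y in sources])
    w = lasso_ista(X_pool, y_pool, lam_pool)          # pooled (primal) estimator
    delta = lasso_ista(X_t, y_t - X_t @ w, lam_tune)  # fine-tune on target residuals
    return w + delta

# Simulated example (assumed setup): one source whose coefficients differ slightly.
rng = np.random.default_rng(0)
p, n_t, n_s = 50, 40, 400
beta_t = np.zeros(p)
beta_t[:5] = 1.0
beta_s = beta_t.copy()
beta_s[5] = 0.3
X_t = rng.normal(size=(n_t, p))
y_t = X_t @ beta_t + rng.normal(size=n_t)
X_s = rng.normal(size=(n_s, p))
y_s = X_s @ beta_s + rng.normal(size=n_s)

beta_hat = two_step_transfer(X_t, y_t, [(X_s, y_s)], lam_pool=0.1, lam_tune=0.2)
print("estimation error:", np.linalg.norm(beta_hat - beta_t))
```

In this sketch, enlarging lam_pool plays the role of the slightly enlarged regularization mentioned in the abstract; the values used here are arbitrary and would normally be chosen by cross-validation.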
2.funBIalign: a hierarchical algorithm for functional motif discovery based on mean squared residue scores
Authors:Jacopo Di Iorio, Marzia A. Cremona, Francesca Chiaromonte
Abstract: Motif discovery is gaining increasing attention in the domain of functional data analysis. Functional motifs are typical "shapes" or "patterns" that recur multiple times in different portions of a single curve and/or in misaligned portions of multiple curves. In this paper, we define functional motifs using an additive model and we propose funBIalign for their discovery and evaluation. Inspired by clustering and biclustering techniques, funBIalign is a multi-step procedure which uses agglomerative hierarchical clustering with complete linkage and a functional distance based on mean squared residue scores to discover functional motifs, both in a single curve (e.g., time series) and in a set of curves. We assess its performance and compare it to other recent methods through extensive simulations. Moreover, we use funBIalign to discover motifs in two real-data case studies: one on food price inflation and one on temperature changes.
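To make the mean squared residue score concrete, here is a minimal sketch of its classical discrete form from the biclustering literature, applied to curve portions evaluated on a common grid. It is an assumption that this discretised version is a fair stand-in for the functional score used by funBIalign; the function name and the toy curves are illustrative.

```python
import numpy as np

def mean_squared_residue(block):
    """Mean squared residue of a matrix whose rows are curve portions on a common grid.

    The residue of entry (i, j) is a_ij - rowmean_i - colmean_j + overall mean,
    which vanishes when the rows follow an additive model (shared shape + row shift).
    """
    block = np.asarray(block, dtype=float)
    row_means = block.mean(axis=1, keepdims=True)
    col_means = block.mean(axis=0, keepdims=True)
    residue = block - row_means - col_means + block.mean()
    return float(np.mean(residue ** 2))

t = np.linspace(0.0, 1.0, 50)
shape = np.sin(2 * np.pi * t)

# Three portions sharing one shape up to a vertical shift: a perfect additive motif.
shifted = np.stack([shape + c for c in (0.0, 1.5, -2.0)])
score_additive = mean_squared_residue(shifted)

# The same portions with noise added: the score moves away from zero.
rng = np.random.default_rng(0)
noisy = shifted + rng.normal(scale=0.5, size=shifted.shape)
score_noisy = mean_squared_residue(noisy)

print(score_additive, score_noisy)
```

A score near zero indicates that the portions differ only by vertical shifts, which is why a distance built on it suits motifs defined through an additive model.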
3.Inferring unknown unknowns: Regularized bias-aware ensemble Kalman filter
Authors:Andrea Nóvoa, Alberto Racca, Luca Magri
Abstract: Because of physical assumptions and numerical approximations, reduced-order models are affected by uncertainties in the state and parameters, and by model biases. Model biases, also known as model errors or systematic errors, are difficult to infer because they are 'unknown unknowns', i.e., we do not necessarily know their functional form a priori. With biased models, data assimilation methods may be ill-posed because (i) they are 'bias-unaware', i.e., the estimators are assumed unbiased, or (ii) they rely on an a priori parametric model for the bias, or (iii) they can infer model biases that are not unique for the same model and data. First, we design a data assimilation framework to perform combined state, parameter, and bias estimation. Second, we propose a mathematical solution with a sequential method, i.e., the regularized bias-aware ensemble Kalman filter (r-EnKF). The method requires a model of the bias and its gradient (i.e., the Jacobian). Third, we propose an echo state network as the model bias estimator. We derive the Jacobian of the network, and design a robust training strategy with data augmentation to accurately infer the bias in different scenarios. Fourth, we apply the r-EnKF to nonlinearly coupled oscillators (with and without time-delay) affected by different forms of bias. The r-EnKF infers parameters and states in real time, together with a unique bias. The applications that we showcase are relevant to acoustics, thermoacoustics, and vibrations; however, the r-EnKF opens new opportunities for combined state, parameter, and bias estimation for real-time prediction and control in nonlinear systems.
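For orientation, the sketch below shows the analysis step of the standard stochastic (perturbed-observation) ensemble Kalman filter, i.e., the 'bias-unaware' baseline that the abstract contrasts with. It is not the r-EnKF: no bias model, Jacobian, or regularization term is included, and the function name, linear observation operator H, and toy numbers are assumptions made for illustration.

```python
import numpy as np

def enkf_analysis(ensemble, obs, H, R, rng):
    """Standard stochastic EnKF update (no bias correction).

    ensemble: (n_state, m) forecast members; obs: (n_obs,) observation;
    H: (n_obs, n_state) linear observation operator; R: (n_obs, n_obs) noise covariance.
    """
    n_state, m = ensemble.shape
    anomalies = ensemble - ensemble.mean(axis=1, keepdims=True)
    C = anomalies @ anomalies.T / (m - 1)             # sample forecast covariance
    K = C @ H.T @ np.linalg.inv(H @ C @ H.T + R)      # Kalman gain
    noise = rng.multivariate_normal(np.zeros(len(obs)), R, size=m).T
    perturbed_obs = obs[:, None] + noise              # one perturbed observation per member
    return ensemble + K @ (perturbed_obs - H @ ensemble)

# Toy example: two-dimensional state, only the first component is observed.
rng = np.random.default_rng(0)
forecast = rng.normal(size=(2, 100)) + np.array([[1.0], [0.0]])
H = np.array([[1.0, 0.0]])
R = np.array([[0.01]])
obs = np.array([3.0])

analysis = enkf_analysis(forecast, obs, H, R, rng)
print("forecast mean:", forecast.mean(axis=1))
print("analysis mean:", analysis.mean(axis=1))
```

If the forecast model is biased, this update treats the model-observation mismatch as zero-mean error, which is the limitation a bias-aware filter is designed to address.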
4.Evaluating the impact of outcome delay on the efficiency of two-arm group-sequential trials
Authors:Aritra Mukherjee, Michael J. Grayling, James M. S. Wason
Abstract: Adaptive designs (ADs) are a broad class of trial designs that allow preplanned modifications based on patient data, providing improved efficiency and flexibility. However, a delay in observing the primary outcome variable can erode this added efficiency. In this paper, we aim to ascertain the size of outcome delay at which the realised efficiency gains of ADs become negligible compared to classical fixed-sample randomised controlled trials (RCTs). We measure the impact of delay by developing formulae for the number of overruns in two-arm group-sequential designs (GSDs) with normal data, assuming different recruitment models. The efficiency of a GSD is usually measured in terms of the expected sample size (ESS), with GSDs generally reducing the ESS compared to a standard RCT. Our formulae measure the efficiency gain from a GSD, in terms of ESS reduction, that is lost due to delay. We assess whether careful choice of design (e.g., altering the spacing of the interim analyses (IAs)) can help recover the benefits of GSDs in the presence of delay. We also analyse the efficiency of GSDs with respect to the time to complete the trial. Comparing the expected efficiency gains with and without consideration of delay, it is evident that GSDs suffer considerable losses due to delay. Even a small delay can have a significant impact on the trial's efficiency. In contrast, even in the presence of substantial delay, a GSD will have a smaller expected time to trial completion than a simple RCT. Although the number of stages has little influence on the efficiency losses, the timing of IAs can impact the efficiency of a GSD with delay. In particular, for unequally spaced IAs, pushing IAs towards the latter end of the trial can be harmful for a design with delay.
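The effect of delay on the ESS can be illustrated with a back-of-the-envelope calculation for a two-stage design. This is a simplified sketch, not the formulae derived in the paper: it assumes a single interim analysis, a constant recruitment rate, recruitment that continues while interim outcomes are awaited, and a stopping probability supplied as an input. All names and numbers are hypothetical.

```python
def ess_two_stage(n_max, n_interim, p_stop, rate=0.0, delay=0.0):
    """Expected sample size of a two-stage design with outcome delay.

    n_max: maximum sample size; n_interim: patients whose outcomes inform the interim;
    p_stop: probability of stopping at the interim; rate: patients recruited per unit
    time; delay: time between recruitment and outcome observation.
    """
    # Overruns: patients recruited while waiting for interim outcomes (capped at n_max).
    overruns = min(rate * delay, n_max - n_interim)
    recruited_at_interim = n_interim + overruns
    return recruited_at_interim + (1.0 - p_stop) * (n_max - recruited_at_interim)

n_max, n_interim, p_stop = 200, 100, 0.4

ess_no_delay = ess_two_stage(n_max, n_interim, p_stop)
ess_delay = ess_two_stage(n_max, n_interim, p_stop, rate=10, delay=3)

# Share of the ESS reduction (relative to a fixed design of size n_max) lost to delay.
lost_fraction = (ess_delay - ess_no_delay) / (n_max - ess_no_delay)

print(ess_no_delay, ess_delay, lost_fraction)
```

With these inputs, 30 patients are recruited during the delay, so stopping at the interim saves fewer patients than the no-delay calculation suggests.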
5.Tree models for assessing covariate-dependent method agreement
Authors:Siranush Karapetyan, Achim Zeileis, André Henriksen, Alexander Hapfelmeier
Abstract: Method comparison studies explore the agreement of measurements made by two or more methods. Commonly, agreement is evaluated by the well-established Bland-Altman analysis. However, the underlying assumption is that differences between measurements are identically distributed for all observational units and in all application settings. We introduce the concept of conditional method agreement and propose a corresponding modeling approach to relax this constraint. To this end, the Bland-Altman analysis is embedded in the framework of recursive partitioning to explicitly define subgroups with heterogeneous agreement depending on covariates in an exploratory analysis. Three different modeling approaches are considered: conditional inference trees with an appropriate transformation of the modeled differences (CTreeTrafo), distributional regression trees (DistTree), and model-based trees (MOB). The performance of these models is evaluated in terms of type-I error probability and power in several simulation studies. Further, the adjusted Rand index (ARI) is used to quantify the models' ability to uncover given subgroups. An application to real data of accelerometer device measurements is used to demonstrate the applicability. Additionally, a two-sample Bland-Altman test is proposed for exploratory or confirmatory hypothesis testing of differences in agreement between subgroups. Results indicate that all models were able to detect given subgroups with high accuracy as the sample size increased. Relevant covariates that may affect agreement could be detected in the application to accelerometer data. We conclude that conditional method agreement trees (COAT) enable the exploratory analysis of method agreement depending on covariates and the corresponding exploratory or confirmatory hypothesis testing of group differences. The method is made publicly available through the R package coat.
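The classical Bland-Altman quantities, and the way they can differ between subgroups, can be sketched as follows. This is an illustration only: the subgroups here are fixed in advance by a hypothetical binary covariate rather than found by a tree, the data are simulated, and the 1.96 multiplier assumes approximately normal differences.

```python
import numpy as np

def bland_altman(method_a, method_b):
    """Return the bias (mean difference) and the 95% limits of agreement."""
    diff = np.asarray(method_a, dtype=float) - np.asarray(method_b, dtype=float)
    bias = diff.mean()
    sd = diff.std(ddof=1)  # sample standard deviation of the differences
    return bias, bias - 1.96 * sd, bias + 1.96 * sd

# Small worked example: differences are 1, 0, -1, 2.
bias, lower, upper = bland_altman([10, 12, 14, 16], [9, 12, 15, 14])
print(bias, lower, upper)

# Simulated measurements where agreement depends on a binary covariate (assumption).
rng = np.random.default_rng(0)
n = 200
group = rng.integers(0, 2, size=n)        # hypothetical covariate, e.g. device placement
truth = rng.normal(50.0, 10.0, size=n)
a = truth + rng.normal(0.0, 1.0, size=n)
b = truth + np.where(group == 1, 3.0, 0.0) + rng.normal(0.0, 1.0 + 2.0 * group, size=n)

# Conditional agreement: bias and limits of agreement computed within each subgroup.
for g in (0, 1):
    print("group", g, bland_altman(a[group == g], b[group == g]))
```

A single pooled analysis of a and b would average over both groups and hide the larger bias and wider limits in the second one, which is the situation that motivates modeling agreement conditionally on covariates.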