Methodology (stat.ME)
Wed, 09 Aug 2023
1. Linear shrinkage of sample covariance matrix or matrices under elliptical distributions: a review
Authors: Esa Ollila
Abstract: This chapter reviews methods for linear shrinkage of the sample covariance matrix (SCM) and matrices (SCM-s) under elliptical distributions in single- and multiple-population settings, respectively. In the single-sample setting, a popular linear shrinkage estimator is defined as a linear combination of the SCM with a scaled identity matrix. The optimal shrinkage coefficients minimizing the mean squared error (MSE) under elliptical sampling are shown to be functions of only a few key parameters, such as the elliptical kurtosis and the sphericity parameter. Similar results and estimators are derived for the multiple-population setting, and applications of the studied shrinkage estimators are illustrated in portfolio optimization.
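As a minimal illustration of the single-sample estimator described in this abstract, the sketch below forms a convex combination of the SCM and a scaled identity matrix. The function name `linear_shrinkage` and the user-supplied weight `beta` are assumptions made for illustration; the chapter's MSE-optimal coefficients, which depend on the elliptical kurtosis and sphericity, are not reproduced here.

```python
import numpy as np

def linear_shrinkage(X, beta):
    """Return beta * S + (1 - beta) * (tr(S) / p) * I for data X of shape (n, p).

    beta in [0, 1] is a hypothetical, user-chosen weight, not the
    MSE-optimal coefficient derived in the chapter.
    """
    X = np.asarray(X, dtype=float)
    p = X.shape[1]
    S = np.cov(X, rowvar=False)      # sample covariance matrix (SCM), p x p
    eta = np.trace(S) / p            # scale of the identity target
    return beta * S + (1.0 - beta) * eta * np.eye(p)

rng = np.random.default_rng(0)
X = rng.standard_normal((20, 5))     # n = 20 observations, p = 5 variables
Sigma_hat = linear_shrinkage(X, beta=0.6)
```

With this particular target the trace of the SCM is preserved and each eigenvalue is pulled toward the eigenvalue mean, which is one common way to improve conditioning when the dimension is close to the sample size.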
2. Repelled point processes with application to numerical integration
Authors: Diala Hawat, Rémi Bardenet, Raphaël Lachièze-Rey
Abstract: Linear statistics of point processes yield Monte Carlo estimators of integrals. While the simplest approach relies on a homogeneous Poisson point process, more regularly spread point processes, such as scrambled low-discrepancy sequences or determinantal point processes, can yield Monte Carlo estimators with fast-decaying mean square error. Following the intuition that more regular configurations result in lower integration error, we introduce the repulsion operator, which reduces clustering by slightly pushing the points of a configuration away from each other. Our main theoretical result is that applying the repulsion operator to a homogeneous Poisson point process yields an unbiased Monte Carlo estimator with lower variance than under the original point process. On the computational side, the evaluation of our estimator is only quadratic in the number of integrand evaluations and can be easily parallelized without any communication across tasks. We illustrate our variance reduction result with numerical experiments and compare it to popular Monte Carlo methods. Finally, we numerically investigate a few open questions on the repulsion operator. In particular, the experiments suggest that the variance reduction also holds when the operator is applied to other motion-invariant point processes.
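The baseline that this abstract starts from, a linear statistic of a homogeneous Poisson point process used as a Monte Carlo estimator, can be sketched as follows. The unit-cube domain, the intensity name `rho`, and the function name `poisson_mc_estimate` are assumptions for illustration; the repulsion operator itself is not implemented here.

```python
import numpy as np

def poisson_mc_estimate(f, rho, d, rng):
    """Estimate the integral of f over [0, 1]^d from one Poisson sample.

    The number of points is Poisson(rho); given the count, the points are
    i.i.d. uniform on the unit cube. The linear statistic sum f(x) / rho
    is an unbiased estimator of the integral.
    """
    n_points = rng.poisson(rho)
    points = rng.random((n_points, d))
    values = np.array([f(x) for x in points], dtype=float)
    return values.sum() / rho, n_points

rng = np.random.default_rng(1)
# Integrand x1 * x2 on the unit square; its exact integral is 1/4.
estimate, n_points = poisson_mc_estimate(lambda x: np.prod(x), rho=500.0, d=2, rng=rng)
```

Dividing by the intensity `rho` rather than by the realized number of points is what makes the estimator unbiased for a Poisson process on a unit-volume domain.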
3. Stein Variational Rare Event Simulation
Authors: Max Ehre, Iason Papaioannou, Daniel Straub
Abstract: Rare event simulation and rare event probability estimation are important tasks within the analysis of systems subject to uncertainty and randomness. Simultaneously, accurately estimating rare event probabilities is an inherently difficult task that calls for dedicated tools and methods. One way to improve estimation efficiency on difficult rare event estimation problems is to leverage gradients of the computational model representing the system under consideration, e.g., to explore the rare event faster and more reliably. We present a novel approach for estimating rare event probabilities using such model gradients by drawing on a technique to generate samples from non-normalized posterior distributions in Bayesian inference, the Stein variational gradient descent. We propagate samples generated from a tractable input distribution towards a near-optimal rare event importance sampling distribution by exploiting a similarity of the latter with Bayesian posterior distributions. Sample propagation takes the shape of passing samples through a sequence of invertible transforms such that their densities can be tracked and used to construct an unbiased importance sampling estimate of the rare event probability, the Stein variational rare event estimator. We discuss settings and parametric choices of the algorithm and suggest a method for balancing convergence speed with stability by choosing the step width or base learning rate adaptively. We analyze the method's performance on several analytical test functions and two engineering examples in low to high stochastic dimensions (from $d = 2$ to $d = 869$) and find that it consistently outperforms other state-of-the-art gradient-based rare event simulation methods.
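The importance sampling identity underlying this abstract can be illustrated on a toy problem: estimating P(X > t) for a standard normal X. The sketch below assumes a hand-picked proposal N(t, 1) and a hypothetical function name `rare_event_is`; it does not implement Stein variational gradient descent or the tracked invertible transforms.

```python
import math
import numpy as np

def rare_event_is(threshold, n_samples, rng):
    """Importance sampling estimate of P(X > threshold) for X ~ N(0, 1).

    Proposal: N(threshold, 1), a hand-picked shift used for illustration,
    not the Stein variational construction from the paper.
    """
    y = rng.standard_normal(n_samples) + threshold   # draws from the proposal
    # Log density ratio nominal / proposal: -y^2/2 + (y - t)^2/2 = -t*y + t^2/2
    log_w = -threshold * y + 0.5 * threshold**2
    indicator = y > threshold
    return float(np.mean(indicator * np.exp(log_w)))

rng = np.random.default_rng(0)
p_hat = rare_event_is(4.0, 200_000, rng)
p_exact = 0.5 * math.erfc(4.0 / math.sqrt(2.0))      # closed-form reference value
```

Because about half of the proposal draws land in the rare event region, the weighted average converges far faster than crude Monte Carlo, which would see roughly three hits per hundred thousand samples at this threshold.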
4. Harmonized Estimation of Subgroup-Specific Treatment Effects in Randomized Trials: The Use of External Control Data
Authors: Daniel Schwartz, Riddhiman Saha, Steffen Ventz, Lorenzo Trippa
Abstract: Subgroup analysis of randomized controlled trials (RCTs) constitutes an important component of the drug development process in precision medicine. In particular, subgroup analysis of early phase trials often influences the design and eligibility criteria of subsequent confirmatory trials and ultimately impacts which subpopulations will receive the treatment after regulatory approval. However, subgroup analysis is typically complicated by small sample sizes, which lead to substantial uncertainty about the subgroup-specific treatment effects. In this work we explore the use of external control (EC) data to augment an RCT's subgroup analysis. We define and discuss harmonized estimators of subgroup-specific treatment effects to leverage EC data while ensuring that the subgroup analysis is coherent with a primary analysis using RCT data alone. Our approach modifies subgroup-specific treatment effect estimators obtained through popular methods (e.g., linear regression) applied jointly to the RCT and EC datasets. We shrink these estimates so that their weighted average is close to a robust estimate of the average treatment effect in the overall trial population based on the RCT data alone. We study the proposed harmonized estimators with analytic results and simulations, and investigate standard performance metrics. The method is illustrated with a case study in glioblastoma.
5. Dynamic survival analysis: modelling the hazard function via ordinary differential equations
Authors: J. A. Christen, F. J. Rubio
Abstract: The hazard function represents one of the main quantities of interest in the analysis of survival data. We propose a general approach for modelling the dynamics of the hazard function using systems of autonomous ordinary differential equations (ODEs). This modelling approach can be used to provide qualitative and quantitative analyses of the evolution of the hazard function over time. Our proposal capitalises on the extensive literature on ODEs, which, in particular, allows basic rules or laws on the dynamics of the hazard function to be established via autonomous ODEs. We show how to implement the proposed modelling framework in cases where there is an analytic solution to the system of ODEs or where an ODE solver is required to obtain a numerical solution. We focus on the use of a Bayesian modelling approach, but the proposed methodology can also be coupled with maximum likelihood estimation. A simulation study is presented to illustrate the performance of these models and the interplay of sample size and censoring. Two case studies using real data are presented to illustrate the use of the proposed approach and to highlight the interpretability of the corresponding models. We conclude with a discussion on potential extensions of our work and strategies to include covariates in our framework.
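To make the idea of a hazard function governed by an autonomous ODE concrete, the sketch below integrates a logistic growth law for the hazard together with the cumulative hazard. The logistic form, the parameter names `lam` and `kappa`, and the fixed-step RK4 solver are assumptions chosen for illustration, not necessarily the models or solvers used in the paper.

```python
import math

def hazard_rhs(h, lam, kappa):
    """Right-hand side of the autonomous logistic ODE h'(t) = lam * h * (1 - h / kappa)."""
    return lam * h * (1.0 - h / kappa)

def solve_hazard(h0, lam, kappa, t_end, n_steps=2000):
    """Integrate h' = hazard_rhs(h) and H' = h with classical RK4.

    Returns the hazard h(t_end), the cumulative hazard H(t_end), and the
    survival probability S(t_end) = exp(-H(t_end)).
    """
    dt = t_end / n_steps
    h, H = h0, 0.0
    for _ in range(n_steps):
        # Each stage evaluates both derivatives: (h', H') = (rhs(h), h).
        k1h, k1H = hazard_rhs(h, lam, kappa), h
        h2 = h + 0.5 * dt * k1h
        k2h, k2H = hazard_rhs(h2, lam, kappa), h2
        h3 = h + 0.5 * dt * k2h
        k3h, k3H = hazard_rhs(h3, lam, kappa), h3
        h4 = h + dt * k3h
        k4h, k4H = hazard_rhs(h4, lam, kappa), h4
        h += dt * (k1h + 2 * k2h + 2 * k3h + k4h) / 6.0
        H += dt * (k1H + 2 * k2H + 2 * k3H + k4H) / 6.0
    return h, H, math.exp(-H)

h_t, H_t, S_t = solve_hazard(h0=0.1, lam=1.0, kappa=2.0, t_end=5.0)
```

The logistic law happens to have a closed-form solution, so it can be used to check a numerical solver before moving to systems where only a numerical solution is available.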