Tue, 11 Apr 2023
1.Density Estimation on the Binary Hypercube using Transformed Fourier-Walsh Diagonalizations
Authors:Arthur C. Campello
Abstract: This article focuses on estimating distribution elements over a high-dimensional binary hypercube from multivariate binary data. A popular approach to this problem, optimizing Walsh basis coefficients, is made more interpretable by an alternative representation as a "Fourier-Walsh" diagonalization. Allowing monotonic transformations of the resulting matrix elements yields a versatile binary density estimator, which is the main contribution of this article. It is shown that the Aitchison and Aitken kernel emerges from a constrained exponential form of this estimator, and that relaxing these constraints yields a flexible variable-weighted version of the kernel that retains positive-definiteness. Estimators within this unifying framework mix together well and span the extremes of the speed-flexibility trade-off, allowing them to serve a wide range of statistical inference and learning problems.
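For orientation, the classical Aitchison and Aitken kernel named in this abstract can be sketched as below. This is a minimal illustration of the standard kernel density estimate on binary vectors, not the article's Fourier-Walsh estimator. The function name `aitchison_aitken_density` is hypothetical, and the per-variable form of `lam` is only an assumed reading of "variable-weighted"; the article's exact parameterization may differ.

```python
import numpy as np

def aitchison_aitken_density(x, data, lam):
    """Kernel density estimate at a binary point x.

    x    : length-d sequence of 0/1 values
    data : (n, d) array of observed binary vectors
    lam  : scalar in [0.5, 1] (classical kernel), or a length-d array of
           per-variable weights (an assumed illustration of a
           variable-weighted kernel, not the article's definition)
    """
    x = np.asarray(x)
    data = np.asarray(data)
    lam = np.broadcast_to(np.asarray(lam, dtype=float), x.shape)
    # Each coordinate contributes lam where it matches x, 1 - lam otherwise.
    per_coord = np.where(data == x, lam, 1.0 - lam)   # shape (n, d)
    # Product over coordinates gives the kernel; average over observations.
    return per_coord.prod(axis=1).mean()
```

With `lam = 1` the estimate reduces to the empirical frequency of `x`, and with `lam = 0.5` it is uniform over the hypercube; intermediate values smooth between the two.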
Authors:Fernando Delbianco, Fernando Tohmé
Abstract: The problem of individualized prediction can be addressed using variants of conformal prediction, obtaining the intervals to which the actual values of the variables of interest belong. Here we present a method based on detecting the observations that may be relevant for a given question and then using simulated controls to yield the intervals for the predicted values. This method is shown to be adaptive and able to detect the presence of latent relevant variables.
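As background for the intervals mentioned in this abstract, standard split conformal prediction can be sketched as follows. This is the textbook procedure with absolute-residual scores, not the authors' simulated-controls method. The function name `split_conformal_interval` and the `predict` callable are hypothetical names introduced for illustration.

```python
import numpy as np

def split_conformal_interval(predict, X_cal, y_cal, X_new, alpha=0.1):
    """Split conformal intervals using absolute residuals as scores.

    predict : callable mapping inputs to point predictions (already fitted
              on data disjoint from the calibration set)
    X_cal, y_cal : calibration inputs and targets
    X_new   : inputs for which intervals are wanted
    alpha   : target miscoverage level
    """
    residuals = np.abs(np.asarray(y_cal, dtype=float) - predict(X_cal))
    n = len(residuals)
    # Rank of the conformal quantile among the n calibration scores.
    k = int(np.ceil((n + 1) * (1 - alpha)))
    # With too few calibration points, the interval is unbounded.
    q = np.inf if k > n else np.sort(residuals)[k - 1]
    center = predict(X_new)
    return center - q, center + q
```

Under exchangeability of calibration and new observations, these intervals cover the true value with probability at least `1 - alpha` on average; they have constant width, which is the kind of limitation that individualized or adaptive variants aim to address.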
3.A nonparametric framework for treatment effect modifier discovery in high dimensions
Authors:Philippe Boileau, Ning Leng, Nima S. Hejazi, Mark van der Laan, Sandrine Dudoit
Abstract: Heterogeneous treatment effects are driven by treatment effect modifiers, pre-treatment covariates that modify the effect of a treatment on an outcome. Current approaches for uncovering these variables are limited to low-dimensional data, data with weakly correlated covariates, or data generated according to parametric processes. We resolve these issues by developing a framework for defining model-agnostic treatment effect modifier variable importance parameters applicable to high-dimensional data with arbitrary correlation structure, deriving one-step, estimating equation and targeted maximum likelihood estimators of these parameters, and establishing these estimators' asymptotic properties. This framework is showcased by defining variable importance parameters for data-generating processes with continuous, binary, and time-to-event outcomes with binary treatments, and deriving accompanying multiply-robust and asymptotically linear estimators. Simulation experiments demonstrate that these estimators' asymptotic guarantees are approximately achieved in realistic sample sizes for observational and randomized studies alike. This framework is applied to gene expression data collected for a clinical trial assessing the effect of a monoclonal antibody therapy on disease-free survival in breast cancer patients. Genes predicted to have the greatest potential for treatment effect modification have previously been linked to breast cancer. An open-source R package implementing this methodology, unihtee, is made available on GitHub at https://github.com/insightsengineering/unihtee.