Tue, 02 May 2023
1.Robust and Adaptive Functional Logistic Regression
Abstract: We introduce and study a family of robust estimators for the functional logistic regression model whose robustness automatically adapts to the data, thereby leading to estimators with high efficiency in clean data and a high degree of resistance towards atypical observations. The estimators are based on the concept of density power divergence between densities and may be formed with any combination of lower rank approximations and penalties, as the need arises. For these estimators we prove uniform convergence and high rates of convergence with respect to the commonly used prediction error under fairly general assumptions. The highly competitive practical performance of our proposal is illustrated in a simulation study and a real data example which includes atypical observations.
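For orientation, the density power divergence mentioned in the abstract is usually written as below. This is the standard textbook form between a data density g and a model density f with tuning parameter \alpha > 0; the symbols g, f, z and \alpha are notation assumed here, and the abstract does not state how the paper specializes this to the functional logistic model or how \alpha is chosen adaptively.

```latex
% Density power divergence between a data density g and a model density f
% (standard form; notation assumed, not taken from the abstract)
d_\alpha(g, f) = \int \left\{ f^{1+\alpha}(z)
    - \left(1 + \frac{1}{\alpha}\right) g(z)\, f^{\alpha}(z)
    + \frac{1}{\alpha}\, g^{1+\alpha}(z) \right\} dz, \qquad \alpha > 0,
% and in the limit \alpha \to 0 it reduces to the Kullback-Leibler divergence
\lim_{\alpha \to 0} d_\alpha(g, f) = \int g(z) \log \frac{g(z)}{f(z)} \, dz .
```

Larger values of \alpha downweight observations that are unlikely under the model, trading some efficiency for robustness, while \alpha near zero behaves like maximum likelihood.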
2.tmfast fits topic models fast
Authors:Daniel J. Hicks
Abstract: tmfast is an R package for fitting topic models using a fast algorithm based on partial PCA and the varimax rotation. After providing mathematical background to the method, we present two examples, using a simulated corpus and aggregated works of a selection of authors from the long nineteenth century, and compare the quality of the fitted models to a standard topic modeling package.
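The two ingredients named in the abstract, partial PCA followed by a varimax rotation, can be sketched in a few lines. This is a minimal Python illustration on a simulated document-term matrix, not the tmfast R package or its API; the function names `partial_pca_loadings` and `varimax`, the simulated counts, and the choice of k = 3 are all assumptions made for the example, and any further steps the package performs to turn rotated loadings into topic distributions are not shown.

```python
import numpy as np

def partial_pca_loadings(X, k):
    """Scaled loadings of the first k principal components of X (docs x terms)."""
    Xc = X - X.mean(axis=0)                      # center each term column
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Vt[:k].T * s[:k]                      # shape (n_terms, k)

def varimax(L, gamma=1.0, max_iter=100, tol=1e-8):
    """Orthogonally rotate loadings L toward simple structure (varimax criterion)."""
    p, k = L.shape
    R = np.eye(k)                                # start from the identity rotation
    d = 0.0
    for _ in range(max_iter):
        Lr = L @ R
        grad = L.T @ (Lr**3 - (gamma / p) * Lr @ np.diag(np.sum(Lr**2, axis=0)))
        u, s, vt = np.linalg.svd(grad)
        R = u @ vt                               # closest orthogonal matrix
        d_new = s.sum()
        if d != 0 and d_new < d * (1 + tol):     # criterion stopped improving
            break
        d = d_new
    return L @ R, R

# Hypothetical toy corpus: 50 documents, 30 terms, Poisson word counts.
rng = np.random.default_rng(0)
counts = rng.poisson(lam=2.0, size=(50, 30)).astype(float)

loadings = partial_pca_loadings(counts, k=3)
rotated, R = varimax(loadings)
```

Because the rotation is orthogonal, it leaves the fitted low-rank approximation unchanged and only redistributes the loadings so that each term tends to load on few components.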
3.Network method for voxel-pair-level brain connectivity analysis under spatial-contiguity constraints
Authors:Tong Lu, Yuan Zhang, Peter Kochunov, Elliot Hong, Shuo Chen
Abstract: Brain connectome analysis commonly compresses high-resolution brain scans (typically composed of millions of voxels) down to only hundreds of regions of interest (ROIs) by averaging within-ROI signals. This huge dimension reduction improves computational speed and the morphological properties of anatomical structures; however, it also comes at the cost of substantial losses in spatial specificity and sensitivity, especially when the signals exhibit high within-ROI heterogeneity. Oftentimes, abnormally expressed functional connectivity (FC) between a pair of ROIs caused by a brain disease is primarily driven by only small subsets of voxel pairs within the ROI pair. This article proposes a new network method for detection of voxel-pair-level neural dysconnectivity with spatial constraints. Specifically, focusing on an ROI pair, our model aims to extract dense sub-areas that contain aberrant voxel-pair connections while ensuring that the involved voxels are spatially contiguous. In addition, we develop sub-community-detection algorithms to realize the model, and the consistency of these algorithms is justified. Comprehensive simulation studies demonstrate our method's effectiveness in reducing the false-positive rate while increasing statistical power, detection replicability, and spatial specificity. We apply our approach to reveal: (i) voxel-wise schizophrenia-altered FC patterns within the salience and temporal-thalamic network from 330 participants in a schizophrenia study; (ii) disrupted voxel-wise FC patterns related to nicotine addiction between the basal ganglia, hippocampus, and insular gyrus from 3269 participants using UK Biobank data. The detected results align with previous medical findings but include improved localized information.
4.On the selection of optimal subdata for big data regression based on leverage scores
Authors:Vasilis Chasiotis, Dimitris Karlis
Abstract: Regression can be very difficult for big datasets, since we have to deal with huge volumes of data. The demand for computational resources in the modeling process increases with the scale of the datasets, since traditional approaches for regression involve inverting huge data matrices. The main problem lies in the large data size, and so a standard approach is subsampling, which aims at obtaining the most informative portion of the big data. In the current paper we consider an approach based on leverage scores that already exists in the literature. This approach was proposed in order to select subdata for linear model discrimination. However, we highlight its importance for the selection of data points that are the most informative for estimating unknown parameters. We conclude that the approach based on leverage scores improves on existing approaches, and we provide simulation experiments as well as a real data application.
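To make the idea of leverage-based subdata concrete, the sketch below computes leverage scores (the diagonal of the hat matrix) and fits ordinary least squares on a subset of rows. The selection rule used here, keeping the k rows with the largest leverage, is an assumption chosen for simplicity and is not necessarily the algorithm studied in the paper; the function names, the simulated data, and the values of n, p and k are likewise hypothetical.

```python
import numpy as np

def leverage_scores(X):
    """Diagonal of the hat matrix X (X^T X)^{-1} X^T, via a thin QR factorization."""
    Q, _ = np.linalg.qr(X)            # reduced QR: Q has shape (n, p)
    return np.sum(Q**2, axis=1)       # squared row norms of Q

def select_subdata(X, y, k):
    """Keep the k rows with the largest leverage (illustrative rule only)."""
    h = leverage_scores(X)
    idx = np.argsort(h)[-k:]
    return X[idx], y[idx], idx

def fit_ols(X, y):
    """Least-squares coefficients without forming an explicit inverse."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

# Hypothetical simulated data: intercept plus heavy-tailed covariates.
rng = np.random.default_rng(0)
n, p, k = 10_000, 5, 200
X = np.column_stack([np.ones(n), rng.standard_t(df=3, size=(n, p - 1))])
beta_true = np.arange(1, p + 1, dtype=float)
y = X @ beta_true + rng.normal(size=n)

X_sub, y_sub, idx = select_subdata(X, y, k)
beta_hat = fit_ols(X_sub, y_sub)      # estimate from the subdata only
```

Computing the scores through a thin QR costs on the order of n p^2 operations and avoids inverting X^T X directly; the full-data fit is replaced by a regression on only k rows.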