
Machine Learning (stat.ML)
Mon, 11 Sep 2023
1. Comprehensive analysis of synthetic learning applied to neonatal brain MRI segmentation
Authors: R Valabregue (ICM), F Girka (ICM), A Pron (INT), F Rousseau (LaTIM), G Auzias (INT)
Abstract: Brain segmentation from neonatal MRI images is a very challenging task due to large changes in the shape of cerebral structures and variations in signal intensities reflecting the gestational process. In this context, there is a clear need for segmentation techniques that are robust to variations in image contrast and to the spatial configuration of anatomical structures. In this work, we evaluate the potential of synthetic learning, a contrast-independent model trained using synthetic images generated from the ground truth labels of very few subjects. We base our experiments on the dataset released by the developmental Human Connectome Project, for which high-quality T1- and T2-weighted images are available for more than 700 babies aged between 26 and 45 weeks post-conception. First, we confirm the impressive performance of a standard U-Net trained on a few T2-weighted volumes, but also confirm that such models learn intensity-related features specific to the training domain. We then evaluate the synthetic learning approach and confirm its robustness to variations in image contrast by reporting the capacity of such a model to segment both T1- and T2-weighted images from the same individuals. However, we observe a clear influence of the age of the baby on the predictions. We improve the performance of this model by enriching the synthetic training set with realistic motion artifacts and over-segmentation of the white matter. Based on extensive visual assessment, we argue that the better performance of the model trained on real T2w data may be due to systematic errors in the ground truth. We propose an original experiment combining two definitions of the ground truth, allowing us to show that learning from real data will reproduce any systematic bias from the training set, while synthetic models can avoid this limitation. Overall, our experiments confirm that synthetic learning is an effective solution for segmenting neonatal brain MRI. Our adapted synthetic learning approach combines key features that will be instrumental for large multi-site studies and clinical applications.
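As a rough illustration of generating a training image from a label map alone, the sketch below assigns each label a randomly drawn mean intensity and noise level and then samples every voxel from the corresponding Gaussian. The abstract does not specify the generative model, so the per-label Gaussian assumption, the function name `synthesize_image`, and the parameter `noise_std_max` are all hypothetical choices made for this example; it also omits the motion artifacts and white-matter over-segmentation mentioned above.

```python
import random

def synthesize_image(label_map, rng, noise_std_max=0.1):
    """Sample one synthetic image from a 2D label map (list of rows of labels).

    Assumption for this sketch: every label gets its own random mean in [0, 1]
    and its own random noise level in [0, noise_std_max], so each call yields
    a different, arbitrary contrast for the same anatomy.
    """
    labels = sorted({lab for row in label_map for lab in row})
    # One (mean, std) pair per label, redrawn on every call.
    params = {
        lab: (rng.uniform(0.0, 1.0), rng.uniform(0.0, noise_std_max))
        for lab in labels
    }
    # Each voxel is drawn from the Gaussian attached to its label.
    return [[rng.gauss(*params[lab]) for lab in row] for row in label_map]

# Toy label map with three labels; two draws give two different contrasts.
toy_labels = [[0, 0, 1], [0, 2, 1]]
image_a = synthesize_image(toy_labels, random.Random(0))
image_b = synthesize_image(toy_labels, random.Random(1))
```

Because the intensities are redrawn for every sample while the labels stay fixed, a network trained on such pairs cannot rely on a particular contrast, which is the intuition behind the contrast independence described in the abstract.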
2. Boundary Peeling: Outlier Detection Method Using One-Class Peeling
Authors: Sheikh Arafat, Na Sun, Maria L. Weese, Waldyn G. Martinez
Abstract: Unsupervised outlier detection is a crucial step in data analysis and remains an active area of research. A good outlier detection algorithm should be computationally efficient, robust to tuning parameter selection, and perform consistently well across diverse underlying data distributions. We introduce One-Class Boundary Peeling, an unsupervised outlier detection algorithm. One-Class Boundary Peeling uses the average signed distance from iteratively peeled, flexible boundaries generated by one-class support vector machines. One-Class Boundary Peeling has robust hyperparameter settings and, for increased flexibility, can be cast as an ensemble method. In synthetic data simulations, One-Class Boundary Peeling outperforms all state-of-the-art methods when no outliers are present while maintaining comparable or superior performance in the presence of outliers, as compared to benchmark methods. One-Class Boundary Peeling performs competitively in terms of correct classification, AUC, and processing time on common benchmark data sets.
3. On the quality of randomized approximations of Tukey's depth
Authors: Simon Briend, Gábor Lugosi, Roberto Imbuzeiro Oliveira
Abstract: Tukey's depth (or halfspace depth) is a widely used measure of centrality for multivariate data. However, exact computation of Tukey's depth is known to be a hard problem in high dimensions. As a remedy, randomized approximations of Tukey's depth have been proposed. In this paper we explore when such randomized algorithms return a good approximation of Tukey's depth. We study the case when the data are sampled from a log-concave isotropic distribution. We prove that, if one requires that the algorithm runs in polynomial time in the dimension, the randomized algorithm correctly approximates the maximal depth $1/2$ and depths close to zero. On the other hand, for any point of intermediate depth, any good approximation requires exponential complexity.
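For readers unfamiliar with the object being studied, here is a minimal sketch of a randomized approximation of Tukey's depth: the exact depth of a point is the smallest fraction of data lying in any closed halfspace whose boundary passes through that point, and the randomized version takes the minimum over a finite number of random directions only. The function names and the choice of directions drawn uniformly on the unit sphere are assumptions of this example, not necessarily the exact scheme analyzed in the paper.

```python
import math
import random

def random_direction(dim, rng):
    """Draw a direction uniformly on the unit sphere by normalizing a Gaussian vector."""
    while True:
        g = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        norm = math.sqrt(sum(v * v for v in g))
        if norm > 0.0:
            return [v / norm for v in g]

def random_tukey_depth(x, data, num_directions, rng):
    """Approximate the Tukey depth of x with respect to the points in data.

    For each random direction u, count the fraction of data points p with
    <u, p> <= <u, x>, and return the smallest fraction seen. Since the minimum
    is taken over a subset of all directions, the result can only overestimate
    the exact empirical depth.
    """
    n = len(data)
    best = 1.0
    for _ in range(num_directions):
        u = random_direction(len(x), rng)
        threshold = sum(ui * xi for ui, xi in zip(u, x))
        count = sum(
            1 for p in data
            if sum(ui * pi for ui, pi in zip(u, p)) <= threshold
        )
        best = min(best, count / n)
    return best

# Toy usage: standard Gaussian sample in the plane (isotropic and log-concave).
rng = random.Random(0)
sample = [[rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)] for _ in range(2000)]
depth_center = random_tukey_depth([0.0, 0.0], sample, 200, rng)
depth_far = random_tukey_depth([10.0, 10.0], sample, 200, rng)
```

In this low-dimensional toy case a few hundred directions suffice for both a central and a far-away point. The abstract's result concerns high dimension, where the number of directions is limited to polynomial in the dimension and points of intermediate depth are the ones that cannot be approximated well.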