arXiv daily

Image and Video Processing (eess.IV)

Mon, 17 Jul 2023

1. Liver Tumor Screening and Diagnosis in CT with Pixel-Lesion-Patient Network

Authors: Ke Yan, Xiaoli Yin, Yingda Xia, Fakai Wang, Shu Wang, Yuan Gao, Jiawen Yao, Chunli Li, Xiaoyu Bai, Jingren Zhou, Ling Zhang, Le Lu, Yu Shi

Abstract: Liver tumor segmentation and classification are important tasks in computer-aided diagnosis. We aim to address three problems: liver tumor screening and preliminary diagnosis in non-contrast computed tomography (CT), and differential diagnosis in dynamic contrast-enhanced CT. A novel framework named Pixel-Lesion-pAtient Network (PLAN) is proposed. It uses a mask transformer to jointly segment and classify each lesion, with improved anchor queries and a foreground-enhanced sampling loss. It also has an image-wise classifier to effectively aggregate global information and predict the patient-level diagnosis. A large-scale multi-phase dataset is collected, containing 939 tumor patients and 810 normal subjects; 4010 tumor instances of eight types are extensively annotated. On the non-contrast tumor screening task, PLAN achieves patient-level sensitivity and specificity of 95% and 96%. On contrast-enhanced CT, our lesion-level detection precision, recall, and classification accuracy are 92%, 89%, and 86%, outperforming widely used CNNs and transformers for lesion segmentation. We also conduct a reader study on a holdout set of 250 cases. PLAN is on par with a senior human radiologist, showing the clinical significance of our results.
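
The abstract gives no implementation details, but the patient-level screening step it describes can be illustrated with a toy aggregation of lesion-level outputs. Everything below (the function name, the threshold, the max-pooling rule) is a hypothetical sketch, not PLAN's actual method, which additionally uses an image-wise classifier over global features.

```python
import numpy as np

def patient_level_screening(lesion_probs, threshold=0.5):
    """Toy aggregation of per-lesion malignancy probabilities into a
    patient-level screening decision (hypothetical sketch).

    lesion_probs: (num_lesions,) array; an empty array means no lesion found.
    """
    if lesion_probs.size == 0:
        return False, 0.0
    patient_score = float(lesion_probs.max())  # most suspicious lesion drives the call
    return patient_score >= threshold, patient_score

# Example: three candidate lesions, one highly suspicious.
flag, score = patient_level_screening(np.array([0.12, 0.31, 0.87]))
print(flag, score)  # True 0.87
```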

2. Combiner and HyperCombiner Networks: Rules to Combine Multimodality MR Images for Prostate Cancer Localisation

Authors: Wen Yan, Bernard Chiu, Ziyi Shen, Qianye Yang, Tom Syer, Zhe Min, Shonit Punwani, Mark Emberton, David Atkinson, Dean C. Barratt, Yipeng Hu

Abstract: One of the distinct characteristics of radiologists' reading of multiparametric prostate MR scans, using reporting systems such as PI-RADS v2.1, is to score individual MR modalities, T2-weighted, diffusion-weighted, and dynamic contrast-enhanced, and then combine these image-modality-specific scores using standardised decision rules to predict the likelihood of clinically significant cancer. This work aims to demonstrate that low-dimensional parametric models can represent such decision rules in the proposed Combiner networks without compromising the accuracy of predicting radiologic labels. First, it is shown that either a linear mixture model or a nonlinear stacking model is sufficient to model PI-RADS decision rules for localising prostate cancer. Second, parameters of these (generalised) linear models are proposed as hyperparameters to weigh the multiple networks that independently represent individual image modalities in Combiner network training, as opposed to an end-to-end modality ensemble. A HyperCombiner network is developed to train a single image segmentation network that can be conditioned on these hyperparameters during inference, for much improved efficiency. Experimental results based on data from 850 patients, for the application of automating radiologist labelling of multi-parametric MR, compare the proposed Combiner networks with other commonly adopted end-to-end networks. Using the added advantage of obtaining and interpreting the modality-combining rules, in terms of the linear weights or odds ratios on individual image modalities, three clinical applications are presented for prostate cancer segmentation: modality availability assessment, importance quantification, and rule discovery.
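
As a rough illustration of the linear mixture idea described above (not the authors' implementation), per-modality cancer-probability maps can be combined with scalar weights; in the HyperCombiner setting those weights would instead condition a single network at inference. The function and variable names below are assumptions for the sketch.

```python
import numpy as np

def linear_combiner(prob_maps, weights):
    """Linearly mix per-modality probability maps (e.g. T2w, DWI, DCE).

    prob_maps: (M, H, W) per-modality cancer probabilities.
    weights:   (M,) non-negative mixture weights summing to 1.
    """
    weights = np.asarray(weights, dtype=float)
    assert np.isclose(weights.sum(), 1.0)
    return np.tensordot(weights, prob_maps, axes=1)  # (H, W) combined map

# Example with random maps for three modalities.
rng = np.random.default_rng(0)
maps = rng.random((3, 128, 128))
combined = linear_combiner(maps, [0.5, 0.3, 0.2])
```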

3. Power-Efficient Video Streaming on Mobile Devices Using Optimal Spatial Scaling

Authors: Christian Herglotz, André Kaup, Stéphane Coulombe, Ahmad Vakili

Abstract: This paper derives optimal spatial scaling and rate-control parameters for power-efficient wireless video streaming on portable devices. We study a video streaming application that receives a high-resolution, high-quality video stream from a remote server and displays the content to the end-user. We show that the resolution of the input video can be adjusted such that the quality-power trade-off is optimized. Making use of a power model from the literature and subjective quality evaluation using a perceptual metric, we derive optimal combinations of the scaling factor and the rate-control parameter for encoding. For HD sequences, up to 10% of power can be saved at negligible quality losses and up to 15% at tolerable distortions. To show general validity, the method was tested on Wi-Fi and a mobile network as well as on two different smartphones.
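
A minimal sketch of the underlying optimization: given a power model P(s, qp) and a perceptual quality model Q(s, qp), both hypothetical placeholders here, pick the scaling factor and rate-control parameter that minimize power subject to a quality floor. The paper derives this model-based; the exhaustive search below is only an illustration.

```python
def optimal_scaling(power_model, quality_model, scales, qps, min_quality):
    """Exhaustive search over (scaling factor, QP) pairs; a hypothetical
    stand-in for the paper's model-based derivation."""
    best = None
    for s in scales:
        for qp in qps:
            if quality_model(s, qp) < min_quality:
                continue  # violates the quality floor
            p = power_model(s, qp)
            if best is None or p < best[0]:
                best = (p, s, qp)
    return best  # (power, scale, qp), or None if the floor is infeasible

# Usage with toy models: power falls with downscaling, quality rises with resolution.
power = lambda s, qp: 1000 * s - 5 * qp
quality = lambda s, qp: 100 * s - 0.8 * (qp - 22)
print(optimal_scaling(power, quality, [0.5, 0.75, 1.0], range(22, 38), 60))
```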

4. Power Modeling for Virtual Reality Video Playback Applications

Authors: Christian Herglotz, Stéphane Coulombe, Ahmad Vakili, André Kaup

Abstract: This paper proposes a method to evaluate and model the power consumption of modern virtual reality playback and streaming applications on smartphones. Due to the high computational complexity of the virtual reality processing toolchain, the corresponding power consumption is very high, which reduces the operating times of battery-powered devices. To tackle this problem, we analyze the power consumption in detail by performing power measurements. Furthermore, we construct a model that estimates the true power consumption with a mean error of less than 3.5%. The model can be used to save power at critical battery levels by changing the streaming video parameters. In particular, the results show that power consumption is significantly reduced by decreasing the input video resolution.
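
The abstract does not state the model form; a common choice in this line of work is a model linear in the streaming parameters, which could be fitted to measurements as sketched below. All numbers and feature choices here are illustrative assumptions, not the paper's data.

```python
import numpy as np

# Hypothetical measurements (not from the paper): columns are resolution in
# megapixels, framerate in fps, and bitrate in Mbit/s; y is measured power in mW.
X = np.array([[0.5, 30, 2.0],
              [2.1, 30, 8.0],
              [2.1, 60, 8.0],
              [8.3, 30, 16.0],
              [8.3, 60, 16.0]])
y = np.array([1800.0, 2600.0, 2900.0, 3900.0, 4300.0])

# Least-squares fit of P ~ p0 + a*megapixels + b*fps + c*bitrate.
A = np.hstack([np.ones((len(X), 1)), X])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
print(coef, np.abs(A @ coef - y).mean() / y.mean())  # coefficients, mean relative error
```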

5. Efficient coding of 360° videos exploiting inactive regions in projection formats

Authors: Christian Herglotz, Mohammadreza Jamali, Stéphane Coulombe, Carlos Vazquez, Ahmad Vakili

Abstract: This paper presents an efficient method for encoding common projection formats in 360$^\circ$ video coding, in which we exploit inactive regions. These regions are ignored in the reconstruction of the equirectangular format or the viewport in virtual reality applications. As the content of these pixels is irrelevant, we neglect the corresponding pixel values in rate-distortion optimization, residual transformation, and in-loop filtering, achieving bitrate savings of up to 10%.
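
As a small illustration of the core idea (not the codec integration itself): distortion is accumulated only over active pixels, so the encoder is free to spend zero effort on inactive ones. The function name and toy frame are assumptions.

```python
import numpy as np

def active_region_sse(orig, recon, active_mask):
    """Sum of squared errors restricted to active pixels; inactive pixels
    (e.g. unused areas of some 360-degree projection formats) contribute nothing."""
    diff = orig.astype(np.int64) - recon.astype(np.int64)
    return int((diff[active_mask] ** 2).sum())

# Toy frame: only the upper-left quadrant is active.
orig = np.full((4, 4), 128, dtype=np.uint8)
recon = np.zeros((4, 4), dtype=np.uint8)
mask = np.zeros((4, 4), dtype=bool)
mask[:2, :2] = True
print(active_region_sse(orig, recon, mask))  # 4 * 128**2 = 65536
```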

6. Neural Modulation Fields for Conditional Cone Beam Neural Tomography

Authors: Samuele Papa, David M. Knigge, Riccardo Valperga, Nikita Moriakov, Miltos Kofinas, Jan-Jakob Sonke, Efstratios Gavves

Abstract: Conventional Computed Tomography (CT) methods require large numbers of noise-free projections for accurate density reconstructions, limiting their applicability to the more complex class of Cone Beam Geometry CT (CBCT) reconstruction. Recently, deep learning methods have been proposed to overcome these limitations, with methods based on neural fields (NF) showing strong performance by approximating the reconstructed density through a continuous-in-space, coordinate-based neural network. Our focus is on improving such methods. However, unlike previous work, which requires training an NF from scratch for each new set of projections, we instead propose to leverage anatomical consistencies over different scans by training a single conditional NF on a dataset of projections. We propose a novel conditioning method in which local modulations are modeled per patient as a field over the input domain through a Neural Modulation Field (NMF). The resulting Conditional Cone Beam Neural Tomography (CondCBNT) shows improved performance for both high and low numbers of available projections on noise-free and noisy data.
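
A loose PyTorch sketch of the conditioning idea: a modulation field maps coordinates to shifts applied to the hidden features of a shared density network. Layer sizes, the shift-based modulation, and the class name are assumptions for illustration; in the paper's setting the backbone would be shared across patients and only the NMF would be patient-specific.

```python
import torch
import torch.nn as nn

class ConditionalNF(nn.Module):
    """Shared coordinate MLP whose hidden features are shifted by a
    Neural Modulation Field (illustrative sketch, not the paper's code)."""
    def __init__(self, hidden=64):
        super().__init__()
        self.backbone_in = nn.Sequential(nn.Linear(3, hidden), nn.ReLU())
        # Per-patient modulation field: local in space, output per coordinate.
        self.nmf = nn.Sequential(nn.Linear(3, hidden), nn.ReLU(),
                                 nn.Linear(hidden, hidden))
        self.head = nn.Sequential(nn.ReLU(), nn.Linear(hidden, 1))

    def forward(self, coords):       # coords: (N, 3) in [-1, 1]
        h = self.backbone_in(coords)
        h = h + self.nmf(coords)     # coordinate-dependent local modulation
        return self.head(h)          # predicted density at each point

model = ConditionalNF()
print(model(torch.rand(8, 3) * 2 - 1).shape)  # torch.Size([8, 1])
```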

7. Component-wise Power Estimation of Electrical Devices Using Thermal Imaging

Authors: Christian Herglotz, Sion Grosche, Akarsh Bharadwaj, André Kaup

Abstract: This paper presents a novel method to estimate the power consumption of distinct active components on an electronic carrier board using thermal imaging. The components and the board can be made of heterogeneous materials such as plastic, coated microchips, and metal bonds or wires; a special coating for high emissivity is not required. The thermal images are recorded while the components on the board are dissipating power. To enable reliable estimates, a segmentation of the thermal image must be available, which can be obtained by manual labeling, object detection methods, or exploiting layout information. Evaluations show that with low-resolution consumer infrared cameras and dissipated powers above 300 mW, mean estimation errors of 10% can be achieved.
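
Sketching one plausible reading of the approach (the exact estimation model is not given in the abstract): with a component segmentation available, per-component temperature rise can be mapped to dissipated power by a coefficient fitted beforehand. The model form, names, and numbers below are all hypothetical.

```python
import numpy as np

def component_power(thermal_img, ambient, segments, coeffs):
    """Estimate per-component power from a thermal image (hypothetical model:
    power proportional to mean temperature rise over the component's pixels).

    thermal_img: (H, W) temperatures in deg C; segments: dict name -> bool mask;
    coeffs: dict name -> mW per deg C of mean rise, fitted beforehand.
    """
    return {name: coeffs[name] * float((thermal_img[mask] - ambient).mean())
            for name, mask in segments.items()}

img = np.full((4, 4), 25.0)
img[:2, :2] = 31.0  # one warm chip, 6 deg C rise
est = component_power(img, 25.0, {"chip": img > 28}, {"chip": 55.0})
print(est)  # {'chip': 330.0} -> about 330 mW
```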

8. Domain Adaptation using Silver Standard Masks for Lateral Ventricle Segmentation in FLAIR MRI

Authors: Owen Crystal, Pejman J. Maralani, Sandra Black, Alan R. Moody, April Khademi

Abstract: Lateral ventricular volume (LVV) is an important biomarker for clinical investigation. We present the first transfer learning-based LVV segmentation method for fluid-attenuated inversion recovery (FLAIR) MRI. To mitigate covariate shifts between source and target domains, this work proposes a domain adaptation method that optimizes performance on three target datasets. Silver standard (SS) masks were generated from the target domain using a novel conventional image processing ventricular segmentation algorithm and used to supplement the gold standard (GS) data from the source domain, the Canadian Atherosclerosis Imaging Network (CAIN). Four models were tested on held-out test sets from four datasets: 1) SS+GS: trained on target SS masks and fine-tuned on source GS masks, 2) GS+SS: trained on source GS masks and fine-tuned on target SS masks, 3) trained on source GS masks only (GS CAIN Only), and 4) trained on target SS masks only (SS Only). The SS+GS model had the best and most consistent performance (mean DSC = 0.89, CoV = 0.05) and showed significantly (p < 0.05) higher DSC than the GS-only model on three target domains. The results suggest that pre-training with noisy labels from the target domain allows the model to adapt to dataset-specific characteristics and provides robust parameter initialization, while fine-tuning with GS masks allows the model to learn detailed features. This method has wide application to other medical imaging problems where labeled data is scarce, and can be used as a per-dataset calibration method to accelerate wide-scale adoption.
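
The SS+GS recipe can be sketched as an ordinary two-stage training loop. Framework details, the loss, and the learning rates below are assumptions; the paper's architecture and losses are not specified in the abstract.

```python
import torch

def train_stage(model, loader, epochs, lr):
    """One supervised segmentation stage (loss choice abstracted to BCE here)."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = torch.nn.BCEWithLogitsLoss()
    for _ in range(epochs):
        for img, mask in loader:
            opt.zero_grad()
            loss = loss_fn(model(img), mask)
            loss.backward()
            opt.step()

# SS+GS: pre-train on noisy target-domain silver-standard masks,
# then fine-tune (lower LR) on source-domain gold-standard masks.
# train_stage(model, silver_loader, epochs=50, lr=1e-3)
# train_stage(model, gold_loader,   epochs=20, lr=1e-4)
```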

9. EGE-UNet: an Efficient Group Enhanced UNet for skin lesion segmentation

Authors: Jiacheng Ruan, Mingye Xie, Jingsheng Gao, Ting Liu, Yuzhuo Fu

Abstract: Transformers and their variants have been widely used for medical image segmentation. However, the large number of parameters and the computational load of these models make them unsuitable for mobile health applications. To address this issue, we propose a more efficient approach, the Efficient Group Enhanced UNet (EGE-UNet). We incorporate a Group multi-axis Hadamard Product Attention module (GHPA) and a Group Aggregation Bridge module (GAB) in a lightweight manner. The GHPA groups input features and performs the Hadamard Product Attention (HPA) mechanism on different axes to extract pathological information from diverse perspectives. The GAB effectively fuses multi-scale information by grouping low-level features, high-level features, and a mask generated by the decoder at each stage. Comprehensive experiments on the ISIC2017 and ISIC2018 datasets demonstrate that EGE-UNet outperforms existing state-of-the-art methods. In short, compared to TransFuse, our model achieves superior segmentation performance while reducing parameter and computation costs by 494x and 160x, respectively. Moreover, to the best of our knowledge, this is the first model whose parameter count is limited to just 50KB. Our code is available at https://github.com/JCruan519/EGE-UNet.
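
A rough sketch of the grouped Hadamard-product idea (deliberately simplified; the real GHPA applies attention along different axes per group with its own projections, which the sketch does not reproduce):

```python
import torch
import torch.nn as nn

class GroupedHadamardAttention(nn.Module):
    """Split channels into groups; each group is reweighted by the Hadamard
    product with a learned attention map (illustrative simplification of GHPA)."""
    def __init__(self, channels, groups=4):
        super().__init__()
        assert channels % groups == 0
        self.groups = groups
        g = channels // groups
        # One depthwise 3x3 projection per group to produce its attention map.
        self.proj = nn.ModuleList(
            nn.Conv2d(g, g, 3, padding=1, groups=g) for _ in range(groups))

    def forward(self, x):                       # x: (B, C, H, W)
        chunks = x.chunk(self.groups, dim=1)
        out = [c * torch.sigmoid(p(c)) for c, p in zip(chunks, self.proj)]
        return torch.cat(out, dim=1)

attn = GroupedHadamardAttention(32)
print(attn(torch.rand(1, 32, 16, 16)).shape)  # torch.Size([1, 32, 16, 16])
```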

10. Multi-class point cloud completion networks for 3D cardiac anatomy reconstruction from cine magnetic resonance images

Authors: Marcel Beetz, Abhirup Banerjee, Julius Ossenberg-Engels, Vicente Grau

Abstract: Cine magnetic resonance imaging (MRI) is the current gold standard for the assessment of cardiac anatomy and function. However, it typically acquires only a set of two-dimensional (2D) slices of the underlying three-dimensional (3D) anatomy of the heart, thus limiting the understanding and analysis of both healthy and pathological cardiac morphology and physiology. In this paper, we propose a novel fully automatic surface reconstruction pipeline capable of reconstructing multi-class 3D cardiac anatomy meshes from raw cine MRI acquisitions. Its key component is a multi-class point cloud completion network (PCCN) capable of correcting both the sparsity and misalignment issues of the 3D reconstruction task in a unified model. We first evaluate the PCCN on a large synthetic dataset of biventricular anatomies and observe Chamfer distances between reconstructed and gold standard anatomies below or similar to the underlying image resolution for multiple levels of slice misalignment. Furthermore, we find reductions in reconstruction error compared to a benchmark 3D U-Net of 32% and 24% in terms of Hausdorff distance and mean surface distance, respectively. We then apply the PCCN as part of our automated reconstruction pipeline to 1000 subjects from the UK Biobank study in a cross-domain transfer setting and demonstrate its ability to reconstruct accurate and topologically plausible biventricular heart meshes with clinical metrics comparable to the previous literature. Finally, we investigate the robustness of our proposed approach and observe its capacity to successfully handle multiple common outlier conditions.
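
Chamfer distance, the reconstruction metric used in the synthetic evaluation, can be computed for small point clouds as follows (a direct O(N·M) implementation; the paper's exact variant and normalization are not specified in the abstract):

```python
import numpy as np

def chamfer_distance(a, b):
    """Symmetric Chamfer distance between point sets a (N, 3) and b (M, 3):
    mean nearest-neighbour distance from a to b plus from b to a."""
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)  # (N, M) pairwise
    return d.min(axis=1).mean() + d.min(axis=0).mean()

a = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
b = np.array([[0.0, 0.1, 0.0], [1.0, 0.0, 0.1]])
print(chamfer_distance(a, b))  # 0.2: each point is 0.1 from its nearest neighbour
```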

11. Reconstructed Convolution Module Based Look-Up Tables for Efficient Image Super-Resolution

Authors: Guandu Liu, Yukang Ding, Mading Li, Ming Sun, Xing Wen, Bin Wang

Abstract: Look-up table (LUT)-based methods have shown great efficacy in the single-image super-resolution (SR) task. However, previous methods ignore the essential reason for the restricted receptive field (RF) size in LUTs, which is the interaction of space and channel features in vanilla convolution; they can only increase the RF at the cost of a linearly growing LUT size. To enlarge the RF while containing the LUT size, we propose a novel Reconstructed Convolution (RC) module, which decouples channel-wise and spatial calculation. It can be formulated as $n^2$ 1D LUTs that maintain an $n\times n$ receptive field, which is far smaller than the $n\times n$-dimensional LUT formulated before. The LUT generated by our RC module requires less than 1/10000 of the storage of the SR-LUT baseline. The proposed Reconstructed Convolution module based LUT method, termed RCLUT, enlarges the RF size by 9 times over the state-of-the-art LUT-based SR method and achieves superior performance on five popular benchmark datasets. Moreover, the efficient and robust RC module can be used as a plugin to improve other LUT-based SR methods. The code is available at https://github.com/liuguandu/RC-LUT.
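
The storage argument can be made concrete with a little arithmetic. The configurations below are illustrative assumptions (SR-LUT-style sampling at 17 levels per input pixel), not the paper's exact setup:

```python
# A single multi-dimensional LUT indexed by an n x n patch with m sampled
# levels per pixel needs m**(n*n) entries; n**2 decoupled 1D LUTs need n**2 * m.
def lut_entries(n, m):
    full = m ** (n * n)
    decoupled = n * n * m
    return full, decoupled, full / decoupled

print(lut_entries(2, 17))  # (83521, 68, ~1228x fewer entries)
print(lut_entries(3, 17))  # 17**9 entries vs. 153: the full LUT is infeasible
```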

12. Neural Image Compression: Generalization, Robustness, and Spectral Biases

Authors: Kelsey Lieberman, James Diffenderfer, Charles Godfrey, Bhavya Kailkhura

Abstract: Recent neural image compression (NIC) advances have produced models which are starting to outperform traditional codecs. While this has led to growing excitement about using NIC in real-world applications, the successful adoption of any machine learning system in the wild requires it to generalize (and be robust) to unseen distribution shifts at deployment. Unfortunately, current research lacks comprehensive datasets and informative tools to evaluate and understand NIC performance in real-world settings. To bridge this crucial gap, first, this paper presents a comprehensive benchmark suite to evaluate the out-of-distribution (OOD) performance of image compression methods. Specifically, we provide CLIC-C and Kodak-C by introducing 15 corruptions to popular CLIC and Kodak benchmarks. Next, we propose spectrally inspired inspection tools to gain deeper insight into errors introduced by image compression methods as well as their OOD performance. We then carry out a detailed performance comparison of a classical codec with several NIC variants, revealing intriguing findings that challenge our current understanding of the strengths and limitations of NIC. Finally, we corroborate our empirical findings with theoretical analysis, providing an in-depth view of the OOD performance of NIC and its dependence on the spectral properties of the data. Our benchmarks, spectral inspection tools, and findings provide a crucial bridge to the real-world adoption of NIC. We hope that our work will propel future efforts in designing robust and generalizable NIC methods. Code and data will be made available at https://github.com/klieberman/ood_nic.
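
One simple spectral inspection tool in this spirit (an assumption about the paper's exact tooling, for illustration only) is the radially averaged power spectrum of the compression error image, which shows whether a codec's artifacts concentrate in low or high spatial frequencies:

```python
import numpy as np

def radial_power_spectrum(err):
    """Radially averaged power spectrum of an error image (original minus
    decoded). Each output bin is the mean spectral power at that radius."""
    f = np.fft.fftshift(np.fft.fft2(err))
    power = np.abs(f) ** 2
    h, w = err.shape
    y, x = np.indices((h, w))
    r = np.hypot(y - h / 2, x - w / 2).astype(int).ravel()
    return np.bincount(r, weights=power.ravel()) / np.bincount(r)

err = np.random.default_rng(0).normal(size=(64, 64))  # white-noise error
spec = radial_power_spectrum(err)                      # roughly flat profile
```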