arXiv daily

Image and Video Processing (eess.IV)

Mon, 10 Apr 2023

1. HDR Video Reconstruction with a Large Dynamic Dataset in Raw and sRGB Domains

Authors: Huanjing Yue, Yubo Peng, Biting Yu, Xuanwu Yin, Zhenyu Zhou, Jingyu Yang

Abstract: High dynamic range (HDR) video reconstruction is attracting increasing attention due to its superior visual quality compared with that of low dynamic range (LDR) videos. The availability of LDR-HDR training pairs is essential for HDR reconstruction quality. However, there are still no real LDR-HDR pairs for dynamic scenes due to the difficulty of capturing LDR and HDR frames simultaneously. In this work, we propose to utilize a staggered sensor to capture two alternate-exposure images simultaneously, which are then fused into an HDR frame in both the raw and sRGB domains. In this way, we build a large-scale LDR-HDR video dataset with 85 scenes, each containing 60 frames. Based on this dataset, we further propose Raw-HDRNet, which utilizes the raw LDR frames as inputs, and a pyramid flow-guided deformable convolution to align neighboring frames. Experimental results demonstrate that 1) the proposed dataset improves HDR reconstruction performance on real scenes for three benchmark networks, and 2) compared with sRGB inputs, utilizing raw inputs further improves reconstruction quality, and our proposed Raw-HDRNet is a strong baseline for raw HDR reconstruction. Our dataset and code will be released after the acceptance of this paper.
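
As a rough illustration of the alternating-exposure fusion idea described above (not the paper's learned Raw-HDRNet pipeline), the sketch below merges a short- and a long-exposure raw frame into a linear HDR estimate via exposure normalization and saturation-aware weighting; the function name, weighting scheme, and parameters are assumptions for illustration only.

```python
import numpy as np

def fuse_alternate_exposures(short_raw, long_raw, t_short, t_long, sat=0.95):
    """Merge a short- and a long-exposure raw frame into a linear HDR frame.

    Frames are assumed to be linear, normalized to [0, 1], and already
    motion-aligned (the paper instead learns a flow-guided alignment).
    """
    # Normalize each frame by its exposure time to estimate scene radiance.
    rad_short = short_raw / t_short
    rad_long = long_raw / t_long

    # Trust the long exposure except where it approaches saturation.
    w_long = np.clip((sat - long_raw) / sat, 0.0, 1.0)
    w_short = 1.0 - w_long

    return w_long * rad_long + w_short * rad_short

# Toy usage: the long exposure clips highlights, the short one recovers them.
rng = np.random.default_rng(0)
scene = rng.uniform(0.0, 2.0, size=(64, 64))      # true radiance
short = np.clip(scene * 0.5, 0.0, 1.0)            # exposure time 0.5
long_ = np.clip(scene * 1.0, 0.0, 1.0)            # exposure time 1.0
hdr = fuse_alternate_exposures(short, long_, 0.5, 1.0)
```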

2. ADS_UNet: A Nested UNet for Histopathology Image Segmentation

Authors: Yilong Yang, Srinandan Dasmahapatra, Sasan Mahmoodi

Abstract: The UNet model consists of fully convolutional network (FCN) layers arranged as contracting encoder and upsampling decoder maps. Nested arrangements of these encoder and decoder maps give rise to extensions of the UNet model, such as UNet^e and UNet++. Other refinements include constraining the outputs of the convolutional layers to discriminate between segment labels when trained end to end, a property called deep supervision. This reduces feature diversity in these nested UNet models despite their large parameter space. Furthermore, for texture segmentation, pixel correlations at multiple scales contribute to the classification task; hence, explicit deep supervision of shallower layers is likely to enhance performance. In this paper, we propose ADS_UNet, a stage-wise additive training algorithm that incorporates resource-efficient deep supervision in shallower layers and takes performance-weighted combinations of the sub-UNets to create the segmentation model. We provide empirical evidence on three histopathology datasets to support the claim that the proposed ADS_UNet reduces correlations between constituent features and improves performance while being more resource efficient. We demonstrate that ADS_UNet outperforms state-of-the-art Transformer-based models by 1.08 and 0.6 points on the CRAG and BCSS datasets, respectively, yet requires only 37% of the GPU consumption and 34% of the training time of the Transformer-based models.
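
The stage-wise, performance-weighted combination of sub-UNets can be pictured with the minimal sketch below; the actual ADS_UNet weighting and training schedule are defined in the paper, and the Dice-based weights and function names here are assumptions.

```python
import numpy as np

def weighted_ensemble(stage_probs, stage_scores):
    """Fuse per-stage segmentation probability maps with performance weights.

    stage_probs:  list of (H, W, C) softmax outputs, one per sub-UNet stage.
    stage_scores: per-stage validation scores (e.g. Dice) used as weights.
    """
    w = np.asarray(stage_scores, dtype=np.float64)
    w = w / w.sum()                                   # normalize weights
    fused = sum(wi * p for wi, p in zip(w, stage_probs))
    return fused.argmax(axis=-1)                      # final label map

# Toy usage: three stages, the better-performing stages dominate the fusion.
probs = [np.random.default_rng(i).dirichlet(np.ones(4), size=(32, 32))
         for i in range(3)]
labels = weighted_ensemble(probs, stage_scores=[0.71, 0.80, 0.84])
```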

3. Reconstruction-driven Dynamic Refinement based Unsupervised Domain Adaptation for Joint Optic Disc and Cup Segmentation

Authors: Ziyang Chen, Yongsheng Pan, Yong Xia

Abstract: Glaucoma is one of the leading causes of irreversible blindness. Segmentation of the optic disc (OD) and optic cup (OC) on fundus images is a crucial step in glaucoma screening. Although many deep learning models have been constructed for this task, it remains challenging to train an OD/OC segmentation model that can be deployed successfully to different healthcare centers. The difficulty mainly comes from the domain shift issue, i.e., the fundus images collected at these centers usually vary greatly in tone, contrast, and brightness. To address this issue, in this paper, we propose a novel unsupervised domain adaptation (UDA) method called the Reconstruction-driven Dynamic Refinement Network (RDR-Net), where we employ a dual-path segmentation backbone for simultaneous edge detection and region prediction and design three modules to alleviate the domain gap. The reconstruction alignment (RA) module uses a variational auto-encoder (VAE) to reconstruct the input image and thus boosts the image representation ability of the network in a self-supervised way. It also uses a style-consistency constraint to force the network to retain more domain-invariant information. The low-level feature refinement (LFR) module employs input-specific dynamic convolutions to suppress the domain-variant information in the obtained low-level features. The prediction-map alignment (PMA) module uses entropy-driven adversarial learning to encourage the network to generate source-like boundaries and regions. We evaluated our RDR-Net against state-of-the-art solutions on four public fundus image datasets. Our results indicate that RDR-Net is superior to competing models in both segmentation performance and generalization ability.
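
As a small sketch of the entropy-driven signal behind the prediction-map alignment idea (the full PMA module and its adversarial discriminator are described in the paper; the helper below is an assumption):

```python
import numpy as np

def entropy_map(probs, eps=1e-8):
    """Pixel-wise Shannon entropy of a softmax segmentation map.

    probs: (H, W, C) class probabilities. High-entropy pixels mark uncertain,
    typically target-domain-like predictions; a discriminator trained on such
    maps can push target predictions toward source-like, low-entropy
    boundaries and regions.
    """
    return -np.sum(probs * np.log(probs + eps), axis=-1)

# Toy usage on a random prediction map with 3 classes (background, OD, OC).
probs = np.random.default_rng(0).dirichlet(np.ones(3), size=(128, 128))
uncertainty = entropy_map(probs)   # (128, 128) map fed to the discriminator
```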

4. Accelerated deep self-supervised ptycho-laminography for three-dimensional nanoscale imaging of integrated circuits

Authors: Iksung Kang, Yi Jiang, Mirko Holler, Manuel Guizar-Sicairos, A. F. J. Levi, Jeffrey Klug, Stefan Vogt, George Barbastathis

Abstract: Three-dimensional inspection of nanostructures such as integrated circuits is important for security and reliability assurance. Two scanning operations are required: ptychographic scanning to recover the complex transmissivity of the specimen, and rotation of the specimen to acquire multiple projections covering the 3D spatial frequency domain. Two types of rotational scanning are possible: tomographic and laminographic. For flat, extended samples, for which full 180-degree coverage is not possible, the latter is preferable: it provides better coverage of the 3D spatial frequency domain than limited-angle tomography, and the amount of attenuation through the sample is approximately the same for all projections. However, both techniques are time consuming because of extensive acquisition and computation time. Here, we demonstrate the acceleration of ptycho-laminographic reconstruction of integrated circuits with 16 times fewer angular samples and 4.67 times faster computation by using a physics-regularized deep self-supervised learning architecture. We check the fidelity of our reconstruction against a densely sampled reconstruction that uses full scanning and no learning. As already reported elsewhere [Zhou and Horstmeyer, Opt. Express, 28(9), pp. 12872-12896], we observe improvement of reconstruction quality even over the densely sampled reconstruction, due to the ability of the self-supervised learning kernel to fill the missing cone.
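
The self-supervised part can be sketched as a measurement-consistency loss: the network's object estimate is pushed through a known forward model and compared against the recorded diffraction intensities, with no ground-truth labels involved. The forward model and loss form below are placeholders, not the paper's exact physics-regularized formulation.

```python
import numpy as np

def measurement_consistency_loss(object_estimate, measured_intensity, forward_model):
    """Self-supervised loss: agreement between simulated and recorded intensities.

    forward_model: callable implementing the known physics (probe illumination,
    projection at the laminographic angle, free-space propagation).
    """
    predicted_intensity = forward_model(object_estimate)
    # Amplitude-domain MSE is a common choice for photon-limited data.
    return np.mean((np.sqrt(predicted_intensity) - np.sqrt(measured_intensity)) ** 2)
```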

5. Localise to segment: crop to improve organ at risk segmentation accuracy

Authors: Abraham George Smith, Denis Kutnár, Ivan Richter Vogelius, Sune Darkner, Jens Petersen

Abstract: Increased organ-at-risk segmentation accuracy is required to reduce cost and complications for patients receiving radiotherapy treatment. Some deep learning methods for the segmentation of organs at risk use a two-stage process in which a localisation network first crops an image to the relevant region and a locally specialised network then segments the cropped organ of interest. We investigate the accuracy improvements brought about by such a localisation stage by comparing against a single-stage baseline network trained on full-resolution images. We find that localisation approaches can improve both training time and stability, and that a two-stage process involving both a localisation and an organ segmentation network provides a significant increase in segmentation accuracy for the spleen, pancreas and heart from the Medical Segmentation Decathlon dataset. We also observe increased benefits of localisation for smaller organs. Source code that recreates the main results is available at https://github.com/Abe404/localise_to_segment.
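
A minimal sketch of the crop-then-segment step (the released repository linked above implements the actual pipeline; the function and margin below are illustrative assumptions):

```python
import numpy as np

def crop_to_mask(image, coarse_mask, margin=16):
    """Crop an image to the bounding box of a coarse localisation mask.

    The cropped region is then passed to the locally specialised segmentation
    network; the margin guards against localisation errors near the boundary.
    """
    ys, xs = np.nonzero(coarse_mask)
    y0 = max(ys.min() - margin, 0)
    y1 = min(ys.max() + margin + 1, image.shape[0])
    x0 = max(xs.min() - margin, 0)
    x1 = min(xs.max() + margin + 1, image.shape[1])
    return image[y0:y1, x0:x1], (y0, y1, x0, x1)
```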

6. Brain Extraction comparing Segment Anything Model (SAM) and FSL Brain Extraction Tool

Authors: Sovesh Mohapatra, Advait Gosai, Gottfried Schlaug

Abstract: Brain extraction is a critical preprocessing step in almost every neuroimaging study, enabling accurate segmentation and analysis of Magnetic Resonance Imaging (MRI) data. FSL's Brain Extraction Tool (BET), although considered the current gold standard, has limitations: over-extraction, which can be particularly problematic in brains with lesions affecting the outer regions; inaccurate differentiation between brain tissue and the surrounding meninges; and susceptibility to image-quality issues. Recent advances in computer vision research have led to the development of the Segment Anything Model (SAM) by Meta AI, which has demonstrated remarkable potential across a wide range of applications. In this paper, we present a comparative analysis of brain extraction techniques using BET and SAM on a variety of brain scans with varying image qualities, MRI sequences, and brain lesions affecting different brain regions. We find that SAM outperforms BET on several metrics, particularly in cases where image quality is compromised by signal inhomogeneities, non-isotropic voxel resolutions, or the presence of brain lesions that are located near or involve the outer regions of the brain and the meninges. These results suggest that SAM has the potential to emerge as a more accurate and precise tool for a broad range of brain extraction applications.
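
The paper compares masks on several metrics; one common choice for such comparisons is the Dice similarity coefficient, sketched below against a reference mask (the specific metrics and reference used in the study are not restated here).

```python
import numpy as np

def dice_score(mask_a, mask_b):
    """Dice similarity coefficient between two binary brain masks."""
    a, b = mask_a.astype(bool), mask_b.astype(bool)
    intersection = np.logical_and(a, b).sum()
    return 2.0 * intersection / (a.sum() + b.sum())

# Toy usage: compare a predicted brain mask against a manual reference.
pred = np.zeros((128, 128), dtype=bool); pred[20:100, 20:100] = True
ref = np.zeros((128, 128), dtype=bool); ref[25:105, 25:105] = True
print(f"Dice = {dice_score(pred, ref):.3f}")
```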

7. SAM.MD: Zero-shot medical image segmentation capabilities of the Segment Anything Model

Authors: Saikat Roy, Tassilo Wald, Gregor Koehler, Maximilian R. Rokuss, Nico Disch, Julius Holzschuh, David Zimmerer, Klaus H. Maier-Hein

Abstract: Foundation models have taken over the natural language processing and image generation domains due to the flexibility of prompting. With the recent introduction of the Segment Anything Model (SAM), this prompt-driven paradigm has entered image segmentation with a hitherto unexplored abundance of capabilities. The purpose of this paper is to conduct an initial evaluation of the out-of-the-box zero-shot capabilities of SAM for medical image segmentation, by evaluating its performance on an abdominal CT organ segmentation task via point- or bounding-box-based prompting. We show that SAM generalizes well to CT data, making it a potential catalyst for the advancement of semi-automatic segmentation tools for clinicians. We believe that this foundation model, while not reaching state-of-the-art segmentation performance in our investigations, can serve as a highly potent starting point for further adaptations of such models to the intricacies of the medical domain.

Keywords: medical image segmentation, SAM, foundation models, zero-shot learning
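
For reference, prompting SAM with a bounding box through the official segment-anything package looks roughly like the sketch below; the checkpoint filename, box coordinates, and CT-to-RGB preprocessing are placeholders rather than the paper's exact setup.

```python
# pip install segment-anything; checkpoint from the official SAM release.
import numpy as np
from segment_anything import sam_model_registry, SamPredictor

sam = sam_model_registry["vit_b"](checkpoint="sam_vit_b_01ec64.pth")
predictor = SamPredictor(sam)

# A CT slice windowed to [0, 255] and replicated to an (H, W, 3) uint8 array.
ct_slice_rgb = np.zeros((512, 512, 3), dtype=np.uint8)  # placeholder slice
predictor.set_image(ct_slice_rgb)

# Bounding-box prompt around the target organ, in (x0, y0, x1, y1) pixels.
box = np.array([120, 150, 260, 300])
masks, scores, _ = predictor.predict(box=box, multimask_output=False)
organ_mask = masks[0]  # boolean (H, W) segmentation for the prompted organ
```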

8. LCDctCNN: Lung Cancer Diagnosis of CT scan Images Using CNN Based Model

Authors: Muntasir Mamun, Md Ishtyaq Mahmud, Mahabuba Meherin, Ahmed Abdelgawad

Abstract: Lung cancer is the most deadly and life-threatening disease in the world, and early diagnosis and accurate treatment are necessary for lowering its mortality rate. Computed tomography (CT) imaging is one of the most effective techniques for lung cancer detection using deep learning models. In this article, we propose a convolutional neural network (CNN) based deep learning framework for the early detection of lung cancer from CT scan images. We also analyzed other models, namely Inception V3, Xception, and ResNet-50, for comparison with our proposed model. We compared the models using the metrics of accuracy, area under the curve (AUC), recall, and loss. After evaluating the models' performance, we observed that the proposed CNN outperformed the other models and is promising compared to traditional methods, achieving an accuracy of 92%, an AUC of 98.21%, a recall of 91.72%, and a loss of 0.328.
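
To make the setup concrete, the sketch below is a compact CNN classifier for CT slices in PyTorch; it is an illustrative stand-in, not the architecture evaluated in the paper, and the layer sizes are assumptions.

```python
import torch
import torch.nn as nn

class SmallCTCNN(nn.Module):
    """A compact CNN for binary lung-cancer classification of CT slices."""

    def __init__(self, num_classes=2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1),
        )
        self.classifier = nn.Linear(64, num_classes)

    def forward(self, x):                     # x: (N, 1, H, W) CT slices
        return self.classifier(self.features(x).flatten(1))

logits = SmallCTCNN()(torch.randn(4, 1, 224, 224))  # -> shape (4, 2)
```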