arXiv daily

Image and Video Processing (eess.IV)

Fri, 28 Apr 2023

1. Segment Anything Model for Medical Images?

Authors: Yuhao Huang, Xin Yang, Lian Liu, Han Zhou, Ao Chang, Xinrui Zhou, Rusi Chen, Junxuan Yu, Jiongquan Chen, Chaoyu Chen, Haozhe Chi, Xindi Hu, Deng-Ping Fan, Fajin Dong, Dong Ni

Abstract: The Segment Anything Model (SAM) is the first foundation model for general image segmentation. It introduced a novel promptable segmentation task, enabling zero-shot image segmentation with the pre-trained model via two main modes: automatic everything and manual prompting. SAM has achieved impressive results on various natural image segmentation tasks. However, medical image segmentation (MIS) is more challenging due to complex modalities, fine anatomical structures, uncertain and complex object boundaries, and a wide range of object scales. Meanwhile, zero-shot and efficient MIS can greatly reduce annotation time and boost the development of medical image analysis. Hence, SAM seems to be a potential tool, and its performance on large medical datasets should be further validated. We collected and sorted 52 open-source datasets and built a large medical segmentation dataset with 16 modalities, 68 objects, and 553K slices. We conducted a comprehensive analysis of different SAM testing strategies on this dataset, dubbed COSMOS 553K. Extensive experiments show that SAM perceives objects in medical images better with manual hints such as points and boxes, leading to better performance in prompt mode than in everything mode. Additionally, SAM performs remarkably well on some specific objects and modalities but is imperfect or even fails completely in other situations. Finally, we analyze the influence of different factors (e.g., the Fourier-based boundary complexity and the size of the segmented objects) on SAM's segmentation performance. Overall, the experiments indicate that SAM's zero-shot segmentation capability is not sufficient for direct application to MIS.
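
For readers who want to try the two testing modes compared above, a minimal sketch using the public segment-anything package is given below. The checkpoint path, input image, and prompt coordinates are placeholders for illustration, not values taken from the paper.

    import numpy as np
    from segment_anything import sam_model_registry, SamPredictor, SamAutomaticMaskGenerator

    # Placeholder inputs: a medical slice as an H x W x 3 uint8 RGB array and a downloaded checkpoint.
    image = np.zeros((256, 256, 3), dtype=np.uint8)
    sam = sam_model_registry["vit_b"](checkpoint="sam_vit_b.pth")

    # Everything mode: fully automatic mask generation, no prompts.
    masks_auto = SamAutomaticMaskGenerator(sam).generate(image)

    # Prompt mode: a foreground click and/or a bounding box around the target structure.
    predictor = SamPredictor(sam)
    predictor.set_image(image)
    masks, scores, _ = predictor.predict(
        point_coords=np.array([[128, 96]]),  # (x, y) click inside the object
        point_labels=np.array([1]),          # 1 = foreground point
        box=np.array([64, 48, 192, 160]),    # optional x1, y1, x2, y2 box prompt
        multimask_output=True,
    )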

2. SAM Meets Robotic Surgery: An Empirical Study in Robustness Perspective

Authors: An Wang, Mobarakol Islam, Mengya Xu, Yang Zhang, Hongliang Ren

Abstract: The Segment Anything Model (SAM) is a foundation model for semantic segmentation and shows excellent generalization capability with prompts. In this empirical study, we investigate the robustness and zero-shot generalizability of SAM in the domain of robotic surgery under various settings: (i) prompted vs. unprompted; (ii) bounding-box vs. point-based prompts; (iii) generalization under corruptions and perturbations at five severity levels; and (iv) a state-of-the-art supervised model vs. SAM. We conduct all observations on two well-known robotic instrument segmentation datasets from the MICCAI EndoVis 2017 and 2018 challenges. Our extensive evaluation reveals that although SAM shows remarkable zero-shot generalization ability with bounding-box prompts, it struggles to segment whole instruments with point-based prompts and in unprompted settings. Furthermore, our qualitative figures demonstrate that the model either fails to predict parts of the instrument mask (e.g., jaws, wrist) or predicts parts of the instrument as different classes when instruments overlap within the same bounding box or under point-based prompts. It is also unable to identify instruments in some complex surgical scenarios involving blood, reflection, blur, and shade. Additionally, SAM is not robust enough to maintain high performance when subjected to various forms of data corruption. Therefore, we argue that SAM is not ready for downstream surgical tasks without further domain-specific fine-tuning.
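
As a lightweight way to approximate a corruption-robustness protocol of this kind, the sketch below applies an illustrative corruption (additive Gaussian noise) at five severity levels and scores masks by IoU. The corruption type, severity scale, and the segment() call are assumptions standing in for the study's actual setup.

    import numpy as np

    def gaussian_noise(image, severity):
        """Illustrative corruption: additive Gaussian noise, severity 1-5."""
        sigma = (4, 8, 16, 24, 32)[severity - 1]
        noisy = image.astype(np.float32) + np.random.normal(0.0, sigma, image.shape)
        return np.clip(noisy, 0, 255).astype(np.uint8)

    def mask_iou(pred, gt):
        """Intersection-over-union of two boolean masks."""
        union = np.logical_or(pred, gt).sum()
        return np.logical_and(pred, gt).sum() / union if union else 0.0

    # Hypothetical evaluation loop; segment() stands in for any SAM inference call.
    # for severity in range(1, 6):
    #     iou = mask_iou(segment(gaussian_noise(frame, severity)), gt_mask)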

3. An Efficient Hash-based Data Structure for Dynamic Vision Sensors and its Application to Low-energy Low-memory Noise Filtering

Authors: Pradeep Kumar Gopalakrishnan, Chip-Hong Chang, Arindam Basu

Abstract: Events generated by the Dynamic Vision Sensor (DVS) are generally stored and processed in two-dimensional data structures whose memory complexity and energy-per-event scale proportionately with increasing sensor dimensions. In this paper, we propose a new two-dimensional data structure (BF_2) that takes advantage of the sparsity of events and enables compact storage of data using hash functions. It overcomes the saturation issue of the Bloom Filter (BF) and the memory-reset issue of other hash-based arrays by using a second dimension to clear 1 out of D rows at regular intervals. A hardware-friendly, low-power, low-memory-footprint noise filter for DVS is demonstrated using BF_2. For the tested datasets, the performance of the filter matches that of state-of-the-art filters such as BAF/STCF while consuming less than 10% and 15% of their memory and energy-per-event, respectively, for a correlation time constant Tau = 5 ms. The memory and energy advantages of the proposed filter increase with increasing sensor size. The proposed filter compares favourably with other hardware-friendly, event-based filters in hardware complexity, memory requirement, and energy-per-event, as demonstrated through its implementation on an FPGA. The parameters of the data structure can be adjusted to trade off performance against memory consumption, based on application requirements.
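
To make the idea concrete, here is a toy Python sketch of one plausible reading of such a structure: D bit-rows addressed by hash functions, with one row cleared and reused at regular intervals so that stale events age out without resetting the whole array. The rotation scheme, hashing, and parameters are illustrative assumptions, not the paper's BF_2 implementation.

    class RotatingHashArray:
        """Toy 2-D hash-based store: D rows of W bits; one row is cleared
        (and becomes the new write target) every clear_interval seconds,
        so entries expire after roughly D intervals without a full reset."""

        def __init__(self, rows=4, width=4096, num_hashes=2, clear_interval=0.00125):
            self.rows = [bytearray(width) for _ in range(rows)]
            self.width = width
            self.num_hashes = num_hashes
            self.clear_interval = clear_interval  # e.g. Tau / D with Tau = 5 ms, D = 4
            self.active = 0
            self.last_rotation = 0.0

        def _indices(self, x, y):
            # Double hashing over the pixel address (width is a power of two, step is odd).
            h1, h2 = hash((x, y)) % self.width, (hash((y, x)) | 1) % self.width
            return [(h1 + i * h2) % self.width for i in range(self.num_hashes)]

        def _maybe_rotate(self, t):
            if t - self.last_rotation >= self.clear_interval:
                self.active = (self.active + 1) % len(self.rows)
                self.rows[self.active] = bytearray(self.width)  # clear 1 of D rows
                self.last_rotation = t

        def insert(self, x, y, t):
            self._maybe_rotate(t)
            for i in self._indices(x, y):
                self.rows[self.active][i] = 1

        def seen_recently(self, x, y, t):
            """True if some row still holds all hashed bits for this address."""
            self._maybe_rotate(t)
            return any(all(row[i] for i in self._indices(x, y)) for row in self.rows)

A noise filter could, for example, pass an incoming event only if seen_recently() holds for a neighbouring pixel address, and then insert the event itself.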

4. Making the Invisible Visible: Toward High-Quality Terahertz Tomographic Imaging via Physics-Guided Restoration

Authors: Weng-Tai Su, Yi-Chun Hung, Po-Jen Yu, Shang-Hua Yang, Chia-Wen Lin

Abstract: Terahertz (THz) tomographic imaging has recently attracted significant attention thanks to its non-invasive, non-destructive, non-ionizing, ultra-fast nature and its material-classification capability for object exploration and inspection. However, strong water absorption and low noise tolerance lead to undesired blur and distortion in reconstructed THz images. The diffraction-limited THz signals severely constrain the performance of existing restoration methods. To address this problem, we propose a novel multi-view Subspace-Attention-guided Restoration Network (SARNet) that fuses multi-view and multi-spectral features of THz images for effective image restoration and 3D tomographic reconstruction. To this end, SARNet uses multi-scale branches to extract intra-view spatio-spectral amplitude and phase features and fuses them via shared subspace projection and self-attention guidance. We then perform inter-view fusion to further improve the restoration of individual views by leveraging the redundancies between neighboring views. We experimentally construct a THz time-domain spectroscopy (THz-TDS) system covering a broad frequency range from 0.1 THz to 4 THz to build a temporal/spectral/spatial/material THz database of hidden 3D objects. Complementary to a quantitative evaluation, we demonstrate the effectiveness of our SARNet model on 3D THz tomographic reconstruction applications.
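
As a rough illustration of the kind of intra-view fusion described above, the PyTorch sketch below projects amplitude and phase features of one view into a shared subspace and blends them with self-attention. The module names, dimensions, and layer choices are our assumptions, not the authors' SARNet code.

    import torch
    import torch.nn as nn

    class SubspaceAttentionFusion(nn.Module):
        """Generic sketch: project amplitude and phase features into a shared
        subspace and let self-attention weight the combination."""

        def __init__(self, in_channels=64, subspace_dim=32, heads=4):
            super().__init__()
            self.proj_amp = nn.Conv2d(in_channels, subspace_dim, kernel_size=1)
            self.proj_phase = nn.Conv2d(in_channels, subspace_dim, kernel_size=1)
            self.attn = nn.MultiheadAttention(subspace_dim, heads, batch_first=True)
            self.out = nn.Conv2d(subspace_dim, in_channels, kernel_size=1)

        def forward(self, amp_feat, phase_feat):
            b, _, h, w = amp_feat.shape
            # Shared-subspace projection of both feature types.
            tokens = self.proj_amp(amp_feat) + self.proj_phase(phase_feat)
            tokens = tokens.flatten(2).transpose(1, 2)        # (B, H*W, C)
            fused, _ = self.attn(tokens, tokens, tokens)      # self-attention guidance
            fused = fused.transpose(1, 2).reshape(b, -1, h, w)
            return self.out(fused)

    # Example: fuse 64-channel amplitude/phase maps from one view.
    # y = SubspaceAttentionFusion()(torch.rand(1, 64, 32, 32), torch.rand(1, 64, 32, 32))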

5. Unified Noise-aware Network for Low-count PET Denoising

Authors: Huidong Xie, Qiong Liu, Bo Zhou, Xiongchao Chen, Xueqi Guo, Chi Liu

Abstract: As PET imaging is accompanied by substantial radiation exposure and cancer risk, reducing the radiation dose in PET scans is an important topic. However, low-count PET scans often suffer from high image noise, which can negatively impact image quality and diagnostic performance. Recent advances in deep learning have shown great potential for recovering the underlying signal from noisy counterparts. However, a neural network trained at a specific noise level does not generalize easily to other noise levels because of differences in noise amplitude and variance. To obtain optimal denoised results, one may need to train multiple networks on data with different noise levels, which may be infeasible in practice due to limited data availability. Denoising dynamic PET images presents an additional challenge due to tracer decay and continuously changing noise levels across dynamic frames. To address these issues, we propose a Unified Noise-aware Network (UNN) that combines multiple sub-networks with varying denoising power to generate optimal denoised results regardless of the input noise level. Evaluated on large-scale data from two medical centers with different vendors, UNN consistently produces promising denoised results regardless of the input noise level and outperforms networks trained on single-noise-level data, especially for extremely low-count data.
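
To illustrate the general idea of combining sub-networks of different denoising power, here is a compact PyTorch sketch in which lightweight denoisers of increasing depth are blended by weights predicted from the noisy input. The architecture, depths, and gating are illustrative assumptions, not the UNN described in the paper.

    import torch
    import torch.nn as nn

    def small_denoiser(depth, width=32):
        """Plain convolutional denoiser; deeper variants have more smoothing power."""
        layers = [nn.Conv2d(1, width, 3, padding=1), nn.ReLU(inplace=True)]
        for _ in range(depth):
            layers += [nn.Conv2d(width, width, 3, padding=1), nn.ReLU(inplace=True)]
        layers += [nn.Conv2d(width, 1, 3, padding=1)]
        return nn.Sequential(*layers)

    class NoiseAwareEnsemble(nn.Module):
        """Illustrative stand-in for a unified noise-aware network: sub-networks of
        varying denoising power whose outputs are blended by weights predicted
        from the noisy input."""

        def __init__(self, depths=(1, 3, 5)):
            super().__init__()
            self.branches = nn.ModuleList(small_denoiser(d) for d in depths)
            self.gate = nn.Sequential(
                nn.AdaptiveAvgPool2d(8), nn.Flatten(),
                nn.Linear(64, len(depths)), nn.Softmax(dim=1),
            )

        def forward(self, x):                  # x: (B, 1, H, W) low-count image
            w = self.gate(x)                   # (B, K) blending weights
            outs = torch.stack([b(x) for b in self.branches], dim=1)  # (B, K, 1, H, W)
            return (w[:, :, None, None, None] * outs).sum(dim=1)

    # y = NoiseAwareEnsemble()(torch.rand(2, 1, 128, 128))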