arXiv daily

Image and Video Processing (eess.IV)

Mon, 22 May 2023

1. Quantifying the effect of X-ray scattering for data generation in real-time defect detection

Authors: Vladyslav Andriiashen, Robert van Liere, Tristan van Leeuwen, K. Joost Batenburg

Abstract: X-ray imaging is widely used for non-destructive detection of defects in industrial products on a conveyor belt. Real-time detection requires highly accurate, robust, and fast algorithms to analyze X-ray images. Deep convolutional neural networks (DCNNs) satisfy these requirements if a large amount of labeled data is available. To overcome the challenge of collecting these data, different methods of X-ray image generation can be considered. Depending on the desired level of similarity to real data, various physical effects either should be simulated or can be ignored. X-ray scattering is known to be computationally expensive to simulate, and this effect can heavily influence the accuracy of a generated X-ray image. We propose a methodology for quantitative evaluation of the effect of scattering on defect detection. This methodology compares the accuracy of DCNNs trained on different versions of the same data that include and exclude the scattering signal. We use the Probability of Detection (POD) curves to find the size of the smallest defect that can be detected with a DCNN and evaluate how this size is affected by the choice of training data. We apply the proposed methodology to a model problem of defect detection in cylinders. Our results show that the exclusion of the scattering signal from the training data has the largest effect on the smallest detectable defects. Furthermore, we demonstrate that accurate inspection is more reliant on high-quality training data for images with a high quantity of scattering. We discuss how the presented methodology can be used for other tasks and objects.
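
As an illustration of how a POD curve yields a smallest detectable defect size, here is a minimal sketch. It assumes a logistic hit/miss model, which is one common parametrization and is not confirmed by the abstract. The function names and the parameter values b0 and b1 are hypothetical and are not results from the paper.

```python
import math

def pod(size, b0, b1):
    """Probability of detection for a defect of the given size.

    Assumes a logistic model: POD(s) = 1 / (1 + exp(-(b0 + b1 * s))).
    """
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * size)))

def size_at_pod(p, b0, b1):
    """Invert the logistic model: the defect size at which POD equals p."""
    return (math.log(p / (1.0 - p)) - b0) / b1

# Hypothetical fitted parameters, chosen only for illustration.
b0, b1 = -4.0, 2.0
s90 = size_at_pod(0.9, b0, b1)  # size detected with 90% probability
```

Under this assumption, comparing the value of s90 for networks trained with and without the scattering signal would give one number per training set to compare.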

2. An efficient deep learning model to categorize brain tumor using reconstruction and fine-tuning

Authors: Md. Alamin Talukder, Md. Manowarul Islam, Md Ashraf Uddin, Arnisha Akhter, Md. Alamgir Jalil Pramanik, Sunil Aryal, Muhammad Ali Abdulllah Almoyad, Khondokar Fida Hasan, Mohammad Ali Moni

Abstract: Brain tumors are among the most fatal and devastating diseases, often resulting in significantly reduced life expectancy. An accurate diagnosis of brain tumors is crucial to devise treatment plans that can extend the lives of affected individuals. Manually identifying and analyzing large volumes of MRI data is both challenging and time-consuming. Consequently, there is a pressing need for a reliable deep learning (DL) model to accurately diagnose brain tumors. In this study, we propose a novel DL approach based on transfer learning to effectively classify brain tumors. Our novel method incorporates extensive pre-processing, transfer learning architecture reconstruction, and fine-tuning. We employ several transfer learning algorithms, including Xception, ResNet50V2, InceptionResNetV2, and DenseNet201. Our experiments used the Figshare MRI brain tumor dataset, comprising 3,064 images, and achieved accuracy scores of 99.40%, 99.68%, 99.36%, and 98.72% for Xception, ResNet50V2, InceptionResNetV2, and DenseNet201, respectively. Our findings reveal that ResNet50V2 achieves the highest accuracy rate of 99.68% on the Figshare MRI brain tumor dataset, outperforming existing models. Therefore, our proposed model's ability to accurately classify brain tumors in a short timeframe can aid neurologists and clinicians in making prompt and precise diagnostic decisions for brain tumor patients.

3. RSA-INR: Riemannian Shape Autoencoding via 4D Implicit Neural Representations

Authors: Sven Dummer, Nicola Strisciuglio, Christoph Brune

Abstract: Shape encoding and shape analysis are valuable tools for comparing shapes and for dimensionality reduction. A specific framework for shape analysis is the Large Deformation Diffeomorphic Metric Mapping (LDDMM) framework, which is capable of shape matching and dimensionality reduction. Researchers have recently introduced neural networks into this framework. However, these works cannot match more than two objects simultaneously or have suboptimal performance in shape variability modeling. The latter limitation occurs because the works do not use state-of-the-art shape encoding methods. Moreover, the literature does not discuss the connection between the LDDMM Riemannian distance and the literature on Riemannian geometry for deep learning. Our work aims to bridge this gap by demonstrating how LDDMM can integrate Riemannian geometry into deep learning. Furthermore, we discuss how deep learning solves and generalizes shape matching and dimensionality reduction formulations of LDDMM. We achieve both goals by designing a novel implicit encoder for shapes. This model extends a neural network-based algorithm for LDDMM-based pairwise registration, results in a nonlinear manifold PCA, and adds a Riemannian geometry aspect to deep learning models for shape variability modeling. Additionally, we demonstrate that the Riemannian geometry component improves the reconstruction procedure of the implicit encoder in terms of reconstruction quality and stability to noise. We hope our discussion paves the way to more research into how Riemannian geometry, shape/image analysis, and deep learning can be combined.

4. TSPTQ-ViT: Two-scaled post-training quantization for vision transformer

Authors: Yu-Shan Tai, Ming-Guang Lin, An-Yeu (Andy) Wu

Abstract: Vision transformers (ViTs) have achieved remarkable performance in various computer vision tasks. However, intensive memory and computation requirements impede ViTs from running on resource-constrained edge devices. Due to the non-normally distributed values after Softmax and GeLU, post-training quantization on ViTs results in severe accuracy degradation. Moreover, conventional methods fail to address the high channel-wise variance in LayerNorm. To reduce the quantization loss and improve classification accuracy, we propose a two-scaled post-training quantization scheme for vision transformer (TSPTQ-ViT). We design the value-aware two-scaled scaling factors (V-2SF) specialized for post-Softmax and post-GeLU values, which leverage the bit sparsity in non-normal distribution to save bit-widths. In addition, the outlier-aware two-scaled scaling factors (O-2SF) are introduced to LayerNorm, alleviating the dominant impacts from outlier values. Our experimental results show that the proposed methods reach near-lossless accuracy drops (<0.5%) on the ImageNet classification task under 8-bit fully quantized ViTs.
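
To show why a single scaling factor suits skewed post-Softmax values poorly, here is a minimal sketch comparing a plain uniform quantizer with a generic two-region quantizer. It is not the paper's V-2SF or O-2SF scheme. The function names, the split threshold, and the sample values are all hypothetical, and one bit is assumed to select the region.

```python
def quantize_uniform(xs, bits, x_max):
    """Quantize values in [0, x_max] with one scale and 2**bits - 1 steps."""
    levels = 2 ** bits - 1
    scale = x_max / levels
    return [min(levels, round(x / scale)) * scale for x in xs]

def quantize_two_scale(xs, bits, x_max, split):
    """Hypothetical two-region quantizer (illustration only).

    One bit selects the region; the remaining bits - 1 bits index a step.
    Values below `split` use a fine scale, the rest use a coarse scale.
    """
    levels = 2 ** (bits - 1) - 1
    fine, coarse = split / levels, x_max / levels
    out = []
    for x in xs:
        scale = fine if x < split else coarse
        out.append(min(levels, round(x / scale)) * scale)
    return out

def mse(xs, ys):
    """Mean squared error between two equally long sequences."""
    return sum((x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

# Made-up softmax-like row: one dominant entry and many small ones.
probs = [0.9, 0.03, 0.02, 0.02, 0.01, 0.01, 0.005, 0.005]
err_uniform = mse(probs, quantize_uniform(probs, 4, 1.0))
err_two = mse(probs, quantize_two_scale(probs, 4, 1.0, 0.1))
```

On this toy input the uniform quantizer rounds every small entry to zero, while the two-region variant keeps most of them distinct at the same total bit-width.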

5. A Fast and Accurate Optical Flow Camera for Resource-Constrained Edge Applications

Authors: Jonas Kühne, Michele Magno, Luca Benini

Abstract: Optical Flow (OF) is the movement pattern of pixels or edges that is caused in a visual scene by the relative motion between an agent and a scene. OF is used in a wide range of computer vision algorithms and robotics applications. While the calculation of OF is a resource-demanding task in terms of computational load and memory footprint, it needs to be executed at low latency, especially in robotics applications. Therefore, OF estimation is today performed on powerful CPUs or GPUs to satisfy the stringent requirements in terms of execution speed for control and actuation. On-sensor hardware acceleration is a promising approach to enable low latency OF calculations and fast execution even on resource-constrained devices such as nano drones and AR/VR glasses and headsets. This paper analyzes the achievable accuracy, frame rate, and power consumption when using a novel optical flow sensor consisting of a global shutter camera with an Application Specific Integrated Circuit (ASIC) for optical flow computation. The paper characterizes the optical flow sensor in high frame-rate, low-latency settings, with a frame rate of up to 88 fps at the full resolution of 1124 by 1364 pixels and up to 240 fps at a reduced camera resolution of 280 by 336, for both classical camera images and optical flow data.
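
The two operating points quoted above can be put side by side with a little arithmetic. This sketch derives only the pixel rate and the frame period from the stated resolutions and frame rates. The helper names are hypothetical, and the frame period is not the sensor's processing latency, which the abstract does not state.

```python
def pixel_rate(width, height, fps):
    """Pixels delivered per second at a given resolution and frame rate."""
    return width * height * fps

def frame_period_ms(fps):
    """Time between consecutive frames in milliseconds."""
    return 1000.0 / fps

full = pixel_rate(1124, 1364, 88)    # full resolution at 88 fps
reduced = pixel_rate(280, 336, 240)  # reduced resolution at 240 fps
```

The reduced mode trades pixel throughput for a shorter frame period: about 4.2 ms against about 11.4 ms at full resolution.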

6. GSURE-Based Diffusion Model Training with Corrupted Data

Authors: Bahjat Kawar, Noam Elata, Tomer Michaeli, Michael Elad

Abstract: Diffusion models have demonstrated impressive results in both data generation and downstream tasks such as inverse problems, text-based editing, classification, and more. However, training such models usually requires large amounts of clean signals which are often difficult or impossible to obtain. In this work, we propose a novel training technique for generative diffusion models based only on corrupted data. We introduce a loss function based on the Generalized Stein's Unbiased Risk Estimator (GSURE), and prove that under some conditions, it is equivalent to the training objective used in fully supervised diffusion models. We demonstrate our technique on face images as well as Magnetic Resonance Imaging (MRI), where the use of undersampled data significantly alleviates data collection costs. Our approach achieves generative performance comparable to its fully supervised counterpart without training on any clean signals. In addition, we deploy the resulting diffusion model in various downstream tasks beyond the degradation present in the training set, showcasing promising results.

7. Morphological Sampling Theorem and its Extension to Grey-value Images

Authors: Vivek Sridhar, Michael Breuß

Abstract: Sampling is a basic operation in image processing. In the classic literature, a morphological sampling theorem has been established, which shows how sampling interacts with image reconstruction by morphological operations. Many aspects of morphological sampling have been investigated for binary images, but only some of them have been explored for grey-value imagery. With this paper, we take a step towards completing this open matter. By relying on the umbra notion, we show how to transfer classic theorems in binary morphology about the interaction of sampling with the fundamental morphological operations dilation, erosion, opening and closing, to the grey-value setting. In doing this we also extend the theory relating the morphological operations and corresponding reconstructions to the use of non-flat structuring elements. We illustrate the theoretical developments by means of examples.
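
For readers unfamiliar with grey-value morphology, here is a minimal 1D sketch of the four fundamental operations with a non-flat structuring element, written as the usual max-plus and min-minus formulas. It does not implement the paper's sampling or reconstruction results. The function names, the boundary handling (out-of-range positions are skipped), and the sample data are assumptions.

```python
def dilate(f, b):
    """Grey-value dilation: max over offsets s of f(x - s) + b(s).

    b maps integer offsets to additive weights and must contain offset 0;
    it is non-flat when the weights differ.
    """
    n = len(f)
    return [max(f[x - s] + w for s, w in b.items() if 0 <= x - s < n)
            for x in range(n)]

def erode(f, b):
    """Grey-value erosion: min over offsets s of f(x + s) - b(s)."""
    n = len(f)
    return [min(f[x + s] - w for s, w in b.items() if 0 <= x + s < n)
            for x in range(n)]

def opening(f, b):
    """Erosion followed by dilation; never exceeds f."""
    return dilate(erode(f, b), b)

def closing(f, b):
    """Dilation followed by erosion; never falls below f."""
    return erode(dilate(f, b), b)

signal = [0, 0, 5, 0, 0]
element = {-1: 0, 0: 1, 1: 0}       # non-flat: the centre weight is higher
dilated = dilate(signal, element)   # [1, 5, 6, 5, 1]
```

With a flat element (all weights zero) these reduce to moving maximum and minimum filters.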