arXiv daily: Image and Video Processing (eess.IV)

1.Towards Large-scale Single-shot Millimeter-wave Imaging for Low-cost Security Inspection

Authors:Liheng Bian, Daoyu Li, Shuoguang Wang, Huteng Liu, Chunyang Teng, Hanwen Xu, Rike Jie, Xuyang Chang, Guoqiang Zhao, Houjun Sun, Shiyong Li, Jun Zhang

Abstract: Millimeter-wave (MMW) imaging is emerging as a promising technique for safe security inspection. It achieves a delicate balance between imaging resolution, penetrability and human safety, resulting in higher resolution compared to low-frequency microwave, stronger penetrability compared to visible light, and stronger safety compared to X-ray. Despite advances in recent decades, the high cost of the requisite large-scale antenna array hinders widespread adoption of MMW imaging in practice. To tackle this challenge, we report a large-scale single-shot MMW imaging framework using a sparse antenna array, achieving low-cost but high-fidelity security inspection under an interpretable learning scheme. We first collected extensive full-sampled MMW echoes to study the statistical ranking of each element in the large-scale array. These elements are then sampled based on the ranking, building the experimentally optimal sparse sampling strategy that reduces the cost of the antenna array by up to one order of magnitude. Additionally, we derived an untrained interpretable learning scheme, which realizes robust and accurate image reconstruction from sparsely sampled echoes. Last, we developed a neural network for automatic object detection, and experimentally demonstrated successful detection of concealed centimeter-sized targets using a 10% sparse array, whereas all the other contemporary approaches failed at the same sampling ratio. The reported technique outperforms existing MMW imaging schemes by more than 50% on various metrics including precision, recall, and mAP50. With such strong detection ability and order-of-magnitude cost reduction, we anticipate that this technique provides a practical way for large-scale single-shot MMW imaging, and could promote its further practical applications.

2.Dynamic Data Augmentation via MCTS for Prostate MRI Segmentation

Authors:Xinyue Xu, Yuhan Hsi, Haonan Wang, Xiaomeng Li

Abstract: Medical image data are often limited due to the expensive acquisition and annotation process. Hence, training a deep-learning model with only raw data can easily lead to overfitting. One solution to this problem is to augment the raw data with various transformations, improving the model's ability to generalize to new data. However, manually configuring a generic augmentation combination and parameters for different datasets is non-trivial due to inconsistent acquisition approaches and data distributions. Therefore, automatic data augmentation is proposed to learn favorable augmentation strategies for different datasets while incurring large GPU overhead. To this end, we present a novel method, called Dynamic Data Augmentation (DDAug), which is efficient and has negligible computation cost. Our DDAug develops a hierarchical tree structure to represent various augmentations and utilizes an efficient Monte-Carlo tree searching algorithm to update, prune, and sample the tree. As a result, the augmentation pipeline can be optimized for each dataset automatically. Experiments on multiple Prostate MRI datasets show that our method outperforms the current state-of-the-art data augmentation strategies.

3.Leveraging object detection for the identification of lung cancer

Authors:Karthick Prasad Gunasekaran

Abstract: Lung cancer poses a significant global public health challenge, emphasizing the importance of early detection for improved patient outcomes. Recent advancements in deep learning algorithms have shown promising results in medical image analysis. This study aims to explore the application of object detection particularly YOLOv5, an advanced object identification system, in medical imaging for lung cancer identification. To train and evaluate the algorithm, a dataset comprising chest X-rays and corresponding annotations was obtained from Kaggle. The YOLOv5 model was employed to train an algorithm capable of detecting cancerous lung lesions. The training process involved optimizing hyperparameters and utilizing augmentation techniques to enhance the model's performance. The trained YOLOv5 model exhibited exceptional proficiency in identifying lung cancer lesions, displaying high accuracy and recall rates. It successfully pinpointed malignant areas in chest radiographs, as validated by a separate test set where it outperformed previous techniques. Additionally, the YOLOv5 model demonstrated computational efficiency, enabling real-time detection and making it suitable for integration into clinical procedures. This proposed approach holds promise in assisting radiologists in the early discovery and diagnosis of lung cancer, ultimately leading to prompt treatment and improved patient outcomes.

4.A Diffusion Probabilistic Prior for Low-Dose CT Image Denoising

Authors:Xuan Liu, Yaoqin Xie, Songhui Diao, Shan Tan, Xiaokun Liang

Abstract: Low-dose computed tomography (CT) image denoising is crucial in medical image computing. Recent years have seen remarkable improvement in deep learning-based methods for this task. However, training deep denoising neural networks requires low-dose and normal-dose CT image pairs, which are difficult to obtain in clinical settings. To address this challenge, we propose a novel fully unsupervised method for low-dose CT image denoising, which is based on a denoising diffusion probabilistic model, a powerful generative model. First, we train an unconditional denoising diffusion probabilistic model capable of generating high-quality normal-dose CT images from random noise. Subsequently, the probabilistic priors of the pre-trained diffusion model are incorporated into a Maximum A Posteriori (MAP) estimation framework for iteratively solving the image denoising problem. Our method ensures the diffusion model produces high-quality normal-dose CT images while keeping the image content consistent with the input low-dose CT images. We evaluate our method on a widely used low-dose CT image denoising benchmark, and it outperforms several supervised low-dose CT image denoising methods in terms of both quantitative and visual performance.
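
For orientation, a MAP estimate of the kind mentioned in this abstract can be written generically as below. The abstract does not give the exact objective, so the symbols are assumptions for illustration: y is the low-dose image, x the normal-dose estimate, p_theta the prior induced by the pre-trained diffusion model, and the second line additionally assumes i.i.d. Gaussian noise with variance sigma^2.

```latex
\hat{x}_{\mathrm{MAP}}
  = \arg\max_{x}\; p(x \mid y)
  = \arg\min_{x}\; \bigl[ -\log p(y \mid x) - \log p_{\theta}(x) \bigr]

% Illustrative special case (assumed Gaussian noise model):
\hat{x}_{\mathrm{MAP}}
  = \arg\min_{x}\; \frac{1}{2\sigma^{2}} \lVert y - x \rVert_{2}^{2} - \log p_{\theta}(x)
```

The first term keeps the estimate consistent with the measured low-dose image, while the second favours images that the generative prior considers plausible.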

5.NexToU: Efficient Topology-Aware U-Net for Medical Image Segmentation

Authors:Pengcheng Shi, Xutao Guo, Yanwu Yang, Chenfei Ye, Ting Ma

Abstract: Convolutional neural networks (CNN) and Transformer variants have emerged as the leading medical image segmentation backbones. Nonetheless, due to their limitations in either preserving global image context or efficiently processing irregular shapes in visual objects, these backbones struggle to effectively integrate information from diverse anatomical regions and reduce inter-individual variability, particularly for the vasculature. Motivated by the successful breakthroughs of graph neural networks (GNN) in capturing topological properties and non-Euclidean relationships across various fields, we propose NexToU, a novel hybrid architecture for medical image segmentation. NexToU comprises improved Pool GNN and Swin GNN modules from Vision GNN (ViG) for learning both global and local topological representations while minimizing computational costs. To address the containment and exclusion relationships among various anatomical structures, we reformulate the topological interaction (TI) module based on the nature of binary trees, rapidly encoding the topological constraints into NexToU. Extensive experiments conducted on three datasets (including distinct imaging dimensions, disease types, and imaging modalities) demonstrate that our method consistently outperforms other state-of-the-art (SOTA) architectures. All the code is publicly available at

6.VEDA: Uneven light image enhancement via a vision-based exploratory data analysis model

Authors:Tian Pu, Shuhang Wang, Zhenming Peng, Qingsong Zhu

Abstract: Uneven light image enhancement is a highly demanded task in many industrial image processing applications. Many existing enhancement methods using physical lighting models or deep-learning techniques often lead to unnatural results. This is mainly because: 1) the assumptions and priors made by the physical lighting model (PLM) based approaches are often violated in most natural scenes, and 2) the training datasets or loss functions used by deep-learning technique based methods cannot handle the various lighting scenarios in the real world well. In this paper, we propose a novel vision-based exploratory data analysis model (VEDA) for uneven light image enhancement. Our method is conceptually simple yet effective. A given image is first decomposed into a contrast image that preserves most of the perceptually important scene details, and a residual image that preserves the lighting variations. After achieving this decomposition at multiple scales using a retinal model that simulates the neuron response to light, the enhanced result at each scale can be obtained by manipulating the two images and recombining them. Then, a weighted averaging strategy based on the residual image is designed to obtain the output image by combining enhanced results at multiple scales. A similar weighting strategy can also be leveraged to reconcile noise suppression and detail preservation. Extensive experiments on different image datasets demonstrate that the proposed method can achieve competitive results in its simplicity and effectiveness compared with state-of-the-art methods. It does not require any explicit assumptions and priors about the scene imaging process, nor iteratively solving any optimization functions or any learning procedures.

7.Constrained Probabilistic Mask Learning for Task-specific Undersampled MRI Reconstruction

Authors:Tobias Weber, Michael Ingrisch, Bernd Bischl, David Rügamer

Abstract: Undersampling is a common method in Magnetic Resonance Imaging (MRI) to subsample the number of data points in k-space and thereby reduce acquisition times at the cost of decreased image quality. In this work, we directly learn the undersampling masks to derive task- and domain-specific patterns. To solve this discrete optimization challenge, we propose a general optimization routine called ProM: A fully probabilistic, differentiable, versatile, and model-free framework for mask optimization that enforces acceleration factors through a convex constraint. Analyzing knee, brain, and cardiac MRI datasets with our method, we discover that different anatomic regions reveal distinct optimal undersampling masks. Furthermore, ProM can create undersampling masks that maximize performance in downstream tasks like segmentation with networks trained on fully-sampled MRIs. Even with extreme acceleration factors, ProM yields reasonable performance while being more versatile than existing methods, paving the way for data-driven all-purpose mask generation.
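
As a rough illustration of how an acceleration factor constrains an undersampling mask, the sketch below keeps only the highest-scoring k-space lines so that one line in every `acceleration` is sampled. It is a hard top-k stand-in, not the probabilistic, differentiable ProM routine described in the abstract; the function name `topk_mask` and the per-line `scores` are hypothetical.

```python
def topk_mask(scores, acceleration):
    """Binary mask keeping the len(scores) // acceleration best-scoring lines.

    scores       : per-line importance scores (hypothetical; e.g. learned logits)
    acceleration : integer acceleration factor R, so about 1/R of lines are kept
    """
    n = len(scores)
    keep = max(1, n // acceleration)  # sampling budget implied by R
    ranked = sorted(range(n), key=lambda i: scores[i], reverse=True)
    chosen = set(ranked[:keep])
    return [1 if i in chosen else 0 for i in range(n)]


def apply_mask(mask, kspace_lines):
    """Zero out the k-space lines that the mask does not sample."""
    return [m * k for m, k in zip(mask, kspace_lines)]


if __name__ == "__main__":
    scores = [0.1, 0.9, 0.8, 0.2, 0.95, 0.3, 0.05, 0.4]
    mask = topk_mask(scores, acceleration=4)
    print(mask, "sampled fraction:", sum(mask) / len(mask))
```

A learned approach would replace the fixed `scores` with parameters optimized for the reconstruction or downstream task, while still respecting the same budget.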

8.Learned Wavelet Video Coding using Motion Compensated Temporal Filtering

Authors:Anna Meyer, Fabian Brand, André Kaup

Abstract: We present an end-to-end trainable wavelet video coder based on motion compensated temporal filtering (MCTF). Thereby, we introduce a different coding scheme for learned video compression, which is currently dominated by residual and conditional coding approaches. By performing discrete wavelet transforms in the temporal, horizontal, and vertical dimensions, we obtain an explainable framework with spatial and temporal scalability. We focus on investigating a novel trainable MCTF module that is implemented using the lifting scheme. We show how multiple temporal decomposition levels in MCTF can be considered during training and how larger temporal displacements due to the MCTF coding order can be handled. Further, we present a content adaptive extension to MCTF which adapts to different motion strengths during inference. In our experiments, we compare our MCTF-based approach to learning-based conditional coders and traditional hybrid video coding. Especially at high rates, our approach has promising rate-distortion performance. Our method achieves average Bjøntegaard Delta savings of up to 21% over HEVC on the UVG data set and thereby outperforms state-of-the-art learned video coders.
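
The lifting scheme referred to in this abstract can be sketched with a plain temporal Haar transform. This is a minimal sketch under simplifying assumptions: there is no motion compensation and nothing is trainable, each "frame" is a single number standing in for one pixel position, and the function names are hypothetical.

```python
def haar_lift_forward(frames):
    """Split an even-length frame sequence into low-pass and high-pass bands."""
    even, odd = frames[0::2], frames[1::2]
    # Predict step: each odd frame is predicted from its even neighbour,
    # leaving only the prediction residual (high-pass band).
    high = [o - e for e, o in zip(even, odd)]
    # Update step: the even frame is corrected with half the residual,
    # which yields the pairwise temporal average (low-pass band).
    low = [e + h / 2 for e, h in zip(even, high)]
    return low, high


def haar_lift_inverse(low, high):
    """Undo the lifting steps in reverse order to recover the frames."""
    even = [l - h / 2 for l, h in zip(low, high)]
    odd = [h + e for e, h in zip(even, high)]
    frames = []
    for e, o in zip(even, odd):
        frames.extend([e, o])
    return frames


if __name__ == "__main__":
    low, high = haar_lift_forward([10.0, 12.0, 8.0, 8.0])
    print(low, high)
    print(haar_lift_inverse(low, high))
```

Because the inverse simply reverses each lifting step, the transform stays invertible whatever predictor is used, which is what makes it possible to swap in motion-compensated or learned predict and update operators.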

9.Incomplete Multimodal Learning for Complex Brain Disorders Prediction

Authors:Reza Shirkavand, Liang Zhan, Heng Huang, Li Shen, Paul M. Thompson

Abstract: Recent advancements in the acquisition of various brain data sources have created new opportunities for integrating multimodal brain data to assist in early detection of complex brain disorders. However, current data integration approaches typically need a complete set of biomedical data modalities, which may not always be feasible, as some modalities are only available in large-scale research cohorts and are prohibitive to collect in routine clinical practice. Especially in studies of brain diseases, research cohorts may include both neuroimaging data and genetic data, but for practical clinical diagnosis, we often need to make disease predictions only based on neuroimages. As a result, it is desired to design machine learning models which can use all available data (different data could provide complementary information) during training but conduct inference using only the most common data modality. We propose a new incomplete multimodal data integration approach that employs transformers and generative adversarial networks to effectively exploit auxiliary modalities available during training in order to improve the performance of a unimodal model at inference. We apply our new method to predict cognitive degeneration and disease outcomes using the multimodal imaging genetic data from Alzheimer's Disease Neuroimaging Initiative (ADNI) cohort. Experimental results demonstrate that our approach outperforms the related machine learning and deep learning methods by a significant margin.

10.An AI-Ready Multiplex Staining Dataset for Reproducible and Accurate Characterization of Tumor Immune Microenvironment

Authors:Parmida Ghahremani, Joseph Marino, Juan Hernandez-Prera, Janis V. de la Iglesia, Robbert JC Slebos, Christine H. Chung, Saad Nadeem

Abstract: We introduce a new AI-ready computational pathology dataset containing restained and co-registered digitized images from eight head-and-neck squamous cell carcinoma patients. Specifically, the same tumor sections were stained with the expensive multiplex immunofluorescence (mIF) assay first and then restained with cheaper multiplex immunohistochemistry (mIHC). This is a first public dataset that demonstrates the equivalence of these two staining methods which in turn allows several use cases; due to the equivalence, our cheaper mIHC staining protocol can offset the need for expensive mIF staining/scanning which requires highly-skilled lab technicians. As opposed to subjective and error-prone immune cell annotations from individual pathologists (disagreement > 50%) to drive SOTA deep learning approaches, this dataset provides objective immune and tumor cell annotations via mIF/mIHC restaining for more reproducible and accurate characterization of tumor immune microenvironment (e.g. for immunotherapy). We demonstrate the effectiveness of this dataset in three use cases: (1) IHC quantification of CD3/CD8 tumor-infiltrating lymphocytes via style transfer, (2) virtual translation of cheap mIHC stains to more expensive mIF stains, and (3) virtual tumor/immune cellular phenotyping on standard hematoxylin images. The dataset is available at \url{}.

11.Score-based Diffusion Models for Bayesian Image Reconstruction

Authors:Michael T. McCann, Hyungjin Chung, Jong Chul Ye, Marc L. Klasky

Abstract: This paper explores the use of score-based diffusion models for Bayesian image reconstruction. Diffusion models are an efficient tool for generative modeling. Diffusion models can also be used for solving image reconstruction problems. We present a simple and flexible algorithm for training a diffusion model and using it for maximum a posteriori reconstruction, minimum mean square error reconstruction, and posterior sampling. We present experiments on both a linear and a nonlinear reconstruction problem that highlight the strengths and limitations of the approach.

1.Deep Learning-based Bio-Medical Image Segmentation using UNet Architecture and Transfer Learning

Authors:Nima Hassanpour, Abouzar Ghavami

Abstract: Image segmentation is a branch of computer vision that is widely used in real-world applications including biomedical image processing. With recent advances in deep learning, image segmentation has reached a very high level of performance. Recently, the UNet architecture has become the core of novel deep learning segmentation methods. In this paper we implement the UNet architecture from scratch using basic blocks in PyTorch and evaluate its performance on multiple biomedical image datasets. We also use transfer learning to apply novel modified UNet segmentation packages to the biomedical image datasets. We fine-tune the pre-trained transferred model on each specific dataset and compare its performance with our basic UNet implementation. We show that the transfer-learning model performs better in image segmentation than the UNet model implemented from scratch.

2.Power Reduction Opportunities on End-User Devices in Quality-Steady Video Streaming

Authors:Christian Herglotz, Werner Robitza, Alexander Raake, Tobias Hossfeld, André Kaup

Abstract: This paper uses a crowdsourced dataset of online video streaming sessions to investigate opportunities to reduce the power consumption while considering QoE. For this, we base our work on prior studies which model both the end-user's QoE and the end-user device's power consumption with the help of high-level video features such as the bitrate, the frame rate, and the resolution. On top of existing research, which focused on reducing the power consumption at the same QoE by optimizing video parameters, we investigate potential power savings by other means such as using a different playback device, a different codec, or a predefined maximum quality level. We find that based on the power consumption of the streaming sessions from the crowdsourcing dataset, devices could save more than 55% of power if all participants adhere to low-power settings.

3.Solving Diffusion ODEs with Optimal Boundary Conditions for Better Image Super-Resolution

Authors:Yiyang Ma, Huan Yang, Wenhan Yang, Jianlong Fu, Jiaying Liu

Abstract: Diffusion models, as a kind of powerful generative model, have given impressive results on image super-resolution (SR) tasks. However, due to the randomness introduced in the reverse process of diffusion models, the performance of diffusion-based SR models fluctuates from one sampling run to the next, especially for samplers with few resampled steps. This inherent randomness of diffusion models results in ineffectiveness and instability, making it challenging for users to guarantee the quality of SR results. However, our work takes this randomness as an opportunity: fully analyzing and leveraging it leads to the construction of an effective plug-and-play sampling method that has the potential to benefit a series of diffusion-based SR methods. In more detail, we propose to steadily sample high-quality SR images from pretrained diffusion-based SR models by solving diffusion ordinary differential equations (diffusion ODEs) with optimal boundary conditions (BCs) and analyze the characteristics between the choices of BCs and their corresponding SR results. Our analysis shows the route to obtain an approximately optimal BC via an efficient exploration in the whole space. The quality of SR results sampled by the proposed method with fewer steps outperforms the quality of results sampled by current methods with randomness from the same pretrained diffusion-based SR model, which means that our sampling method "boosts" current diffusion-based SR models without any additional training.

1.KidneyRegNet: A Deep Learning Method for 3DCT-2DUS Kidney Registration during Breathing

Authors:Chi Yanling, Xu Yuyu, Liu Huiying, Wu Xiaoxiang, Liu Zhiqiang, Mao Jiawei, Xu Guibin, Huang Weimin

Abstract: This work proposed a novel deep registration pipeline for 3D CT and 2D U/S kidney scans acquired during free breathing, which consists of a feature network and a 3D-2D CNN-based registration network. The feature network has handcrafted texture feature layers to reduce the semantic gap. The registration network has an encoder-decoder structure with a feature-image-motion (FIM) loss, which enables hierarchical regression at decoder layers and avoids multiple network concatenation. It was first pretrained with retrospective datasets together with a training data generation strategy, then adapted to specific patient data under unsupervised one-cycle transfer learning in onsite application. The experiment was on 132 U/S sequences, 39 multiple phase CT and 210 public single phase CT images, and 25 pairs of CT and U/S sequences. It resulted in mean contour distance (MCD) of 0.94 mm between kidneys on CT and U/S images and MCD of 1.15 mm on CT and reference CT images. For datasets with small transformations, it resulted in MCD of 0.82 and 1.02 mm respectively. For large transformations, it resulted in MCD of 1.10 and 1.28 mm respectively. This work addressed difficulties in 3DCT-2DUS kidney registration during free breathing via novel network structures and training strategy.
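
To make the reported metric concrete, the sketch below computes one common form of mean contour distance: the symmetric average of nearest-point distances between two contours given as point lists. The abstract does not spell out its exact definition, so this formula and the function name are assumptions for illustration.

```python
import math


def mean_contour_distance(contour_a, contour_b):
    """Symmetric mean nearest-point distance between two contours.

    Each contour is a list of (x, y) points in the same physical units
    (e.g. millimetres).
    """
    def one_way(src, dst):
        # Average, over the source points, of the distance to the
        # closest point on the other contour.
        return sum(min(math.dist(p, q) for q in dst) for p in src) / len(src)

    return 0.5 * (one_way(contour_a, contour_b) + one_way(contour_b, contour_a))


if __name__ == "__main__":
    a = [(0.0, 0.0), (1.0, 0.0)]
    b = [(0.0, 1.0), (1.0, 1.0)]
    print(mean_contour_distance(a, b))
```

The brute-force nearest-point search is quadratic in the number of contour points, which is acceptable for a sketch but would usually be replaced by a spatial index for dense contours.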

2.Multi-BVOC Super-Resolution Exploiting Compounds Inter-Connection

Authors:Antonio Giganti, Sara Mandelli, Paolo Bestagini, Marco Marcon, Stefano Tubaro

Abstract: Biogenic Volatile Organic Compounds (BVOCs) emitted from the terrestrial ecosystem into the Earth's atmosphere are an important component of atmospheric chemistry. Due to the scarcity of measurement, a reliable enhancement of BVOCs emission maps can aid in providing denser data for atmospheric chemical, climate, and air quality models. In this work, we propose a strategy to super-resolve coarse BVOC emission maps by simultaneously exploiting the contributions of different compounds. To this purpose, we first accurately investigate the spatial inter-connections between several BVOC species. Then, we exploit the found similarities to build a Multi-Image Super-Resolution (MISR) system, in which a number of emission maps associated with diverse compounds are aggregated to boost Super-Resolution (SR) performance. We compare different configurations regarding the species and the number of joined BVOCs. Our experimental results show that incorporating BVOCs' relationship into the process can substantially improve the accuracy of the super-resolved maps. Interestingly, the best results are achieved when we aggregate the emission maps of strongly uncorrelated compounds. This peculiarity seems to confirm what was already guessed for other data-domains, i.e., joined uncorrelated information are more helpful than correlated ones to boost MISR performance. Nonetheless, the proposed work represents the first attempt in SR of BVOC emissions through the fusion of multiple different compounds.

3.A Laplacian Pyramid Based Generative H&E Stain Augmentation Network

Authors:Fangda Li, Zhiqiang Hu, Wen Chen, Avinash Kak

Abstract: Hematoxylin and Eosin (H&E) staining is a widely used sample preparation procedure for enhancing the saturation of tissue sections and the contrast between nuclei and cytoplasm in histology images for medical diagnostics. However, various factors, such as the differences in the reagents used, result in high variability in the colors of the stains actually recorded. This variability poses a challenge in achieving generalization for machine-learning based computer-aided diagnostic tools. To desensitize the learned models to stain variations, we propose the Generative Stain Augmentation Network (G-SAN) -- a GAN-based framework that augments a collection of cell images with simulated yet realistic stain variations. At its core, G-SAN uses a novel and highly computationally efficient Laplacian Pyramid (LP) based generator architecture, that is capable of disentangling stain from cell morphology. Through the task of patch classification and nucleus segmentation, we show that using G-SAN-augmented training data provides on average 15.7% improvement in F1 score and 7.3% improvement in panoptic quality, respectively. Our code is available at

1.Quantifying the effect of X-ray scattering for data generation in real-time defect detection

Authors:Vladyslav Andriiashen, Robert van Liere, Tristan van Leeuwen, K. Joost Batenburg

Abstract: X-ray imaging is widely used for non-destructive detection of defects in industrial products on a conveyor belt. Real-time detection requires highly accurate, robust, and fast algorithms to analyze X-ray images. Deep convolutional neural networks (DCNNs) satisfy these requirements if a large amount of labeled data is available. To overcome the challenge of collecting these data, different methods of X-ray image generation can be considered. Depending on the desired level of similarity to real data, various physical effects either should be simulated or can be ignored. X-ray scattering is known to be computationally expensive to simulate, and this effect can heavily influence the accuracy of a generated X-ray image. We propose a methodology for quantitative evaluation of the effect of scattering on defect detection. This methodology compares the accuracy of DCNNs trained on different versions of the same data that include and exclude the scattering signal. We use the Probability of Detection (POD) curves to find the size of the smallest defect that can be detected with a DCNN and evaluate how this size is affected by the choice of training data. We apply the proposed methodology to a model problem of defect detection in cylinders. Our results show that the exclusion of the scattering signal from the training data has the largest effect on the smallest detectable defects. Furthermore, we demonstrate that accurate inspection is more reliant on high-quality training data for images with a high quantity of scattering. We discuss how the presented methodology can be used for other tasks and objects.

2.An efficient deep learning model to categorize brain tumor using reconstruction and fine-tuning

Authors:Md. Alamin Talukder, Md. Manowarul Islam, Md Ashraf Uddin, Arnisha Akhter, Md. Alamgir Jalil Pramanik, Sunil Aryal, Muhammad Ali Abdulllah Almoyad, Khondokar Fida Hasan, Mohammad Ali Moni

Abstract: Brain tumors are among the most fatal and devastating diseases, often resulting in significantly reduced life expectancy. An accurate diagnosis of brain tumors is crucial to devise treatment plans that can extend the lives of affected individuals. Manually identifying and analyzing large volumes of MRI data is both challenging and time-consuming. Consequently, there is a pressing need for a reliable deep learning (DL) model to accurately diagnose brain tumors. In this study, we propose a novel DL approach based on transfer learning to effectively classify brain tumors. Our novel method incorporates extensive pre-processing, transfer learning architecture reconstruction, and fine-tuning. We employ several transfer learning algorithms, including Xception, ResNet50V2, InceptionResNetV2, and DenseNet201. Our experiments used the Figshare MRI brain tumor dataset, comprising 3,064 images, and achieved accuracy scores of 99.40%, 99.68%, 99.36%, and 98.72% for Xception, ResNet50V2, InceptionResNetV2, and DenseNet201, respectively. Our findings reveal that ResNet50V2 achieves the highest accuracy rate of 99.68% on the Figshare MRI brain tumor dataset, outperforming existing models. Therefore, our proposed model's ability to accurately classify brain tumors in a short timeframe can aid neurologists and clinicians in making prompt and precise diagnostic decisions for brain tumor patients.

3.RSA-INR: Riemannian Shape Autoencoding via 4D Implicit Neural Representations

Authors:Sven Dummer, Nicola Strisciuglio, Christoph Brune

Abstract: Shape encoding and shape analysis are valuable tools for comparing shapes and for dimensionality reduction. A specific framework for shape analysis is the Large Deformation Diffeomorphic Metric Mapping (LDDMM) framework, which is capable of shape matching and dimensionality reduction. Researchers have recently introduced neural networks into this framework. However, these works can not match more than two objects simultaneously or have suboptimal performance in shape variability modeling. The latter limitation occurs as the works do not use state-of-the-art shape encoding methods. Moreover, the literature does not discuss the connection between the LDDMM Riemannian distance and the Riemannian geometry for deep learning literature. Our work aims to bridge this gap by demonstrating how LDDMM can integrate Riemannian geometry into deep learning. Furthermore, we discuss how deep learning solves and generalizes shape matching and dimensionality reduction formulations of LDDMM. We achieve both goals by designing a novel implicit encoder for shapes. This model extends a neural network-based algorithm for LDDMM-based pairwise registration, results in a nonlinear manifold PCA, and adds a Riemannian geometry aspect to deep learning models for shape variability modeling. Additionally, we demonstrate that the Riemannian geometry component improves the reconstruction procedure of the implicit encoder in terms of reconstruction quality and stability to noise. We hope our discussion paves the way to more research into how Riemannian geometry, shape/image analysis, and deep learning can be combined.

4.TSPTQ-ViT: Two-scaled post-training quantization for vision transformer

Authors:Yu-Shan Tai, Ming-Guang Lin, An-Yeu (Andy) Wu

Abstract: Vision transformers (ViTs) have achieved remarkable performance in various computer vision tasks. However, intensive memory and computation requirements impede ViTs from running on resource-constrained edge devices. Due to the non-normally distributed values after Softmax and GeLU, post-training quantization on ViTs results in severe accuracy degradation. Moreover, conventional methods fail to address the high channel-wise variance in LayerNorm. To reduce the quantization loss and improve classification accuracy, we propose a two-scaled post-training quantization scheme for vision transformer (TSPTQ-ViT). We design the value-aware two-scaled scaling factors (V-2SF) specialized for post-Softmax and post-GeLU values, which leverage the bit sparsity in non-normal distribution to save bit-widths. In addition, the outlier-aware two-scaled scaling factors (O-2SF) are introduced to LayerNorm, alleviating the dominant impacts from outlier values. Our experimental results show that the proposed methods reach near-lossless accuracy drops (<0.5%) on the ImageNet classification task under 8-bit fully quantized ViTs.
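
The sensitivity to outliers that motivates this work can be seen with a plain single-scale quantizer. The sketch below is generic symmetric uniform quantization, not the V-2SF or O-2SF schemes from the abstract; the function names are hypothetical.

```python
def quantize_uniform(values, num_bits=8):
    """Symmetric uniform quantization with one scale factor for all values."""
    qmax = 2 ** (num_bits - 1) - 1  # 127 for 8 bits
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Round to the nearest integer code and clamp to the representable range.
    codes = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return codes, scale


def dequantize(codes, scale):
    """Map integer codes back to approximate real values."""
    return [c * scale for c in codes]


if __name__ == "__main__":
    small = [0.1, -0.2, 0.3]
    print(quantize_uniform(small))
    # A single outlier inflates the shared scale, so the small values
    # are squeezed into very few integer codes.
    print(quantize_uniform(small + [30.0]))
```

With the outlier present, the shared scale grows by two orders of magnitude and the smallest entries round to zero, which is the kind of loss that more than one scaling factor is meant to avoid.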

5.A Fast and Accurate Optical Flow Camera for Resource-Constrained Edge Applications

Authors:Jonas Kühne, Michele Magno, Luca Benini

Abstract: Optical Flow (OF) is the movement pattern of pixels or edges that is caused in a visual scene by the relative motion between an agent and a scene. OF is used in a wide range of computer vision algorithms and robotics applications. While the calculation of OF is a resource-demanding task in terms of computational load and memory footprint, it needs to be executed at low latency, especially in robotics applications. Therefore, OF estimation is today performed on powerful CPUs or GPUs to satisfy the stringent requirements in terms of execution speed for control and actuation. On-sensor hardware acceleration is a promising approach to enable low latency OF calculations and fast execution even on resource-constrained devices such as nano drones and AR/VR glasses and headsets. This paper analyzes the achievable accuracy, frame rate, and power consumption when using a novel optical flow sensor consisting of a global shutter camera with an Application Specific Integrated Circuit (ASIC) for optical flow computation. The paper characterizes the optical flow sensor in high frame-rate, low-latency settings, with a frame rate of up to 88 fps at the full resolution of 1124 by 1364 pixels and up to 240 fps at a reduced camera resolution of 280 by 336, for both classical camera images and optical flow data.

6.GSURE-Based Diffusion Model Training with Corrupted Data

Authors:Bahjat Kawar, Noam Elata, Tomer Michaeli, Michael Elad

Abstract: Diffusion models have demonstrated impressive results in both data generation and downstream tasks such as inverse problems, text-based editing, classification, and more. However, training such models usually requires large amounts of clean signals which are often difficult or impossible to obtain. In this work, we propose a novel training technique for generative diffusion models based only on corrupted data. We introduce a loss function based on the Generalized Stein's Unbiased Risk Estimator (GSURE), and prove that under some conditions, it is equivalent to the training objective used in fully supervised diffusion models. We demonstrate our technique on face images as well as Magnetic Resonance Imaging (MRI), where the use of undersampled data significantly alleviates data collection costs. Our approach achieves generative performance comparable to its fully supervised counterpart without training on any clean signals. In addition, we deploy the resulting diffusion model in various downstream tasks beyond the degradation present in the training set, showcasing promising results.

7.Morphological Sampling Theorem and its Extension to Grey-value Images

Authors:Vivek Sridhar, Michael Breuß

Abstract: Sampling is a basic operation in image processing. In classic literature, a morphological sampling theorem has been established, which shows how sampling interacts, through morphological operations, with image reconstruction. Many aspects of morphological sampling have been investigated for binary images, but only some of them have been explored for grey-value imagery. With this paper, we take a step towards resolving this open matter. By relying on the umbra notion, we show how to transfer classic theorems in binary morphology about the interaction of sampling with the fundamental morphological operations dilation, erosion, opening and closing, to the grey-value setting. In doing this we also extend the theory relating the morphological operations and corresponding reconstructions to the use of non-flat structuring elements. We illustrate the theoretical developments by means of examples.

1.JOINEDTrans: Prior Guided Multi-task Transformer for Joint Optic Disc/Cup Segmentation and Fovea Detection

Authors:Huaqing He, Li Lin, Zhiyuan Cai, Pujin Cheng, Xiaoying Tang

Abstract: Deep learning-based image segmentation and detection models have largely improved the efficiency of analyzing retinal landmarks such as optic disc (OD), optic cup (OC), and fovea. However, factors including ophthalmic disease-related lesions and low image quality may severely complicate automatic OD/OC segmentation and fovea detection. Most existing works treat the identification of each landmark as a single task and do not take prior information into account. To address these issues, we propose a prior guided multi-task transformer framework for joint OD/OC segmentation and fovea detection, named JOINEDTrans. JOINEDTrans effectively combines various spatial features of the fundus images, relieving the structural distortions induced by lesions and other imaging issues. It contains a segmentation branch and a detection branch. Notably, we employ an encoder pretrained in a vessel segmentation task to effectively exploit the positional relationship among vessel, OD/OC, and fovea, successfully incorporating spatial prior into the proposed JOINEDTrans framework. JOINEDTrans comprises a coarse stage and a fine stage. In the coarse stage, OD/OC coarse segmentation and fovea heatmap localization are obtained through a joint segmentation and detection module. In the fine stage, we crop regions of interest for subsequent refinement and use predictions obtained in the coarse stage to provide additional information for better performance and faster convergence. Experimental results demonstrate that JOINEDTrans outperforms existing state-of-the-art methods on the publicly available GAMMA, REFUGE, and PALM fundus image datasets. We make our code available at

2.Sim-to-Real Segmentation in Robot-assisted Transoral Tracheal Intubation

Authors:Guankun Wang, Tian-Ao Ren, Jiewen Lai, Long Bai, Hongliang Ren

Abstract: Robotic-assisted tracheal intubation requires the robot to distinguish anatomical features like an experienced physician using deep-learning techniques. However, real datasets of oropharyngeal organs are limited due to patient privacy issues, making it challenging to train deep-learning models for accurate image segmentation. We hereby consider generating a new data modality through a virtual environment to assist the training process. Specifically, this work introduces a virtual dataset generated by the Simulation Open Framework Architecture (SOFA) framework to overcome the limited availability of actual endoscopic images. We also propose a domain adaptive Sim-to-Real method for oropharyngeal organ image segmentation, which employs an image blending strategy called IoU-Ranking Blend (IRB) and style-transfer techniques to address discrepancies between datasets. Experimental results demonstrate the superior performance of the proposed approach with domain adaptive models, improving segmentation accuracy and training stability. In the practical application, the trained segmentation model holds great promise for robot-assisted intubation surgery and intelligent surgical navigation.

3.A quality assurance framework for real-time monitoring of deep learning segmentation models in radiotherapy

Authors:Xiyao Jin, Yao Hao, Jessica Hilliard, Zhehao Zhang, Maria A. Thomas, Hua Li, Abhinav K. Jha, Geoffrey D. Hugo

Abstract: To safely deploy deep learning models in the clinic, a quality assurance framework is needed for routine or continuous monitoring of input-domain shift and the models' performance without ground truth contours. In this work, cardiac substructure segmentation was used as an example task to establish a QA framework. A benchmark dataset consisting of Computed Tomography (CT) images along with manual cardiac delineations of 241 patients was collected, including one 'common' image domain and five 'uncommon' domains. Segmentation models were tested on the benchmark dataset for an initial evaluation of model capacity and limitations. An image domain shift detector was developed by utilizing a trained denoising autoencoder (DAE) and two hand-engineered features. A Variational Autoencoder (VAE) was also trained to estimate the shape quality of the auto-segmentation results. Using the extracted features from the image/segmentation pair as inputs, a regression model was trained to predict the per-patient segmentation accuracy, measured by the Dice similarity coefficient (DSC). The framework was tested across 19 segmentation models to evaluate the generalizability of the entire framework. As a result, the predicted DSC of the regression models achieved a mean absolute error (MAE) ranging from 0.036 to 0.046, with an averaged MAE of 0.041. When tested on the benchmark dataset, the performance of all segmentation models was not significantly affected by scanning parameters: FOV, slice thickness and reconstruction kernels. For input images with Poisson noise, CNN-based segmentation models demonstrated a decreased DSC ranging from 0.07 to 0.41, while the transformer-based model was not significantly affected.
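
The two quantities named in the abstract above, the Dice similarity coefficient and the mean absolute error between predicted and measured DSC, can be sketched as follows. This is a minimal illustration, not the authors' code: the function names and toy values are assumptions, and masks are given as flat 0/1 sequences.

```python
def dice_coefficient(pred_mask, true_mask):
    """Dice similarity coefficient between two binary masks (flat 0/1 sequences)."""
    if len(pred_mask) != len(true_mask):
        raise ValueError("masks must have the same length")
    overlap = sum(1 for p, t in zip(pred_mask, true_mask) if p and t)
    total = sum(1 for p in pred_mask if p) + sum(1 for t in true_mask if t)
    if total == 0:
        # Both masks empty: treat as perfect agreement (a common convention).
        return 1.0
    return 2.0 * overlap / total


def mean_absolute_error(predicted, measured):
    """Mean absolute error between two equally long sequences of numbers."""
    if len(predicted) != len(measured):
        raise ValueError("sequences must have the same length")
    return sum(abs(p - m) for p, m in zip(predicted, measured)) / len(predicted)


# Toy data (hypothetical, not taken from the paper).
pred = [1, 1, 0, 0, 1]
truth = [1, 0, 0, 1, 1]
dsc = dice_coefficient(pred, truth)

# MAE between DSC values predicted by a regressor and DSC values measured
# against ground truth, for two imaginary patients.
mae = mean_absolute_error([0.90, 0.80], [0.85, 0.86])
```

In this setting the MAE is computed per segmentation model over all patients, which is how a range such as 0.036 to 0.046 across models would arise.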

4.Towards More Transparent and Accurate Cancer Diagnosis with an Unsupervised CAE Approach

Authors:Zahra Tabatabaei, Adrian Colomer, Javier Oliver Moll, Valery Naranjo

Abstract: Digital pathology has revolutionized cancer diagnosis by leveraging Content-Based Medical Image Retrieval (CBMIR) for analyzing histopathological Whole Slide Images (WSIs). CBMIR enables searching for similar content, enhancing diagnostic reliability and accuracy. In 2020, breast and prostate cancer constituted 11.7% and 14.1% of cases, respectively, as reported by the Global Cancer Observatory (GCO). The proposed Unsupervised CBMIR (UCBMIR) replicates the traditional cancer diagnosis workflow, offering a dependable method to support pathologists in WSI-based diagnostic conclusions. This approach alleviates pathologists' workload, potentially enhancing diagnostic efficiency. To address the challenge of the lack of labeled histopathological images in CBMIR, a customized unsupervised Convolutional Auto Encoder (CAE) was developed, extracting 200 features per image for the search engine component. UCBMIR was evaluated using widely-used numerical techniques in CBMIR, alongside visual evaluation and comparison with a classifier. The validation involved three distinct datasets, with an external evaluation demonstrating its effectiveness. UCBMIR outperformed previous studies, achieving a top 5 recall of 99% and 80% on BreaKHis and SICAPv2, respectively, using the first evaluation technique. Precision rates of 91% and 70% were achieved for BreaKHis and SICAPv2, respectively, using the second evaluation technique. Furthermore, UCBMIR demonstrated the capability to identify various patterns in patches, achieving an 81% accuracy in the top 5 when tested on an external image from Arvaniti.

1.BlindHarmony: "Blind" Harmonization for MR Images via Flow model

Authors:Hwihun Jeong, Heejoon Byun, Dong un Kang, Jongho Lee

Abstract: In MRI, images of the same contrast (e.g., T1) from the same subject can show noticeable differences when acquired using different hardware, sequences, or scan parameters. These differences in images create a domain gap that needs to be bridged by a step called image harmonization, in order to process the images successfully using conventional or deep learning-based image analysis (e.g., segmentation). Several methods, including deep learning-based approaches, have been proposed to achieve image harmonization. However, they often require datasets of multiple characteristics for deep learning training and may still be unsuccessful when applied to images of an unseen domain. To address this limitation, we propose a novel concept called "Blind Harmonization," which utilizes only target domain data for training but still has the capability of harmonizing unseen domain images. For the implementation of Blind Harmonization, we developed BlindHarmony using an unconditional flow model trained on target domain data. The harmonized image is optimized to have a correlation with the input source domain image while ensuring that the latent vector of the flow model is close to the center of the Gaussian. BlindHarmony was evaluated using simulated and real datasets and compared with conventional methods. BlindHarmony achieved a noticeable performance in both datasets, highlighting its potential for future use in clinical settings.

2.Transformer-based Variable-rate Image Compression with Region-of-interest Control

Authors:Chia-Hao Kao, Ying-Chieh Weng, Yi-Hsin Chen, Wei-Chen Chiu, Wen-Hsiao Peng

Abstract: This paper proposes a transformer-based learned image compression system. It is capable of achieving variable-rate compression with a single model while supporting the region-of-interest (ROI) functionality. Inspired by prompt tuning, we introduce prompt generation networks to condition the transformer-based autoencoder of compression. Our prompt generation networks generate content-adaptive tokens according to the input image, an ROI mask, and a rate parameter. The separation of the ROI mask and the rate parameter allows an intuitive way to achieve variable-rate and ROI coding simultaneously. Extensive experiments validate the effectiveness of our proposed method and confirm its superiority over the other competing methods.

3.Benchmarking Deep Learning Frameworks for Automated Diagnosis of Ocular Toxoplasmosis: A Comprehensive Approach to Classification and Segmentation

Authors:Syed Samiul Alam, Samiul Based Shuvo, Shams Nafisa Ali, Fardeen Ahmed, Arbil Chakma, Yeong Min Jang

Abstract: Ocular Toxoplasmosis (OT) is a common eye infection caused by T. gondii that can cause vision problems. Diagnosis is typically done through a clinical examination and imaging, but these methods can be complicated and costly, requiring trained personnel. To address this issue, we have created a benchmark study that evaluates the effectiveness of existing pre-trained networks using transfer learning techniques to detect OT from fundus images. Furthermore, we have also analysed the performance of transfer-learning based segmentation networks to segment lesions in the images. This research seeks to provide a guide for future researchers looking to utilise DL techniques and develop a cheap, automated, easy-to-use, and accurate diagnostic method. We have performed in-depth analysis of different feature extraction techniques in order to find the best one for OT classification and segmentation of lesions. For classification tasks, we have evaluated pre-trained models such as VGG16, MobileNetV2, InceptionV3, ResNet50, and DenseNet121. Among them, MobileNetV2 outperformed all other models in terms of Accuracy (Acc), Recall, and F1 Score, exceeding the second-best model, InceptionV3, by 0.7% in Acc. However, DenseNet121 achieved the best result in terms of Precision, which was 0.1% higher than MobileNetV2. For the segmentation task, this work has exploited the U-Net architecture. In order to utilize transfer learning, the encoder block of the traditional U-Net was replaced by MobileNetV2, InceptionV3, ResNet34, and VGG16 to evaluate different architectures. Moreover, two different loss functions (Dice loss and Jaccard loss) were exploited in order to find the best one. The MobileNetV2/U-Net outperformed ResNet34 by 0.5% and 2.1% in terms of Acc and Dice Score, respectively, when the Jaccard loss function is employed during training.
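
The two segmentation losses compared above differ only in how the overlap is normalised. A minimal sketch, assuming soft predictions in [0, 1] and binary targets given as flat sequences; the smoothing constant `eps` and the function names are illustrative choices, not taken from the paper.

```python
def soft_dice_loss(probs, targets, eps=1e-7):
    """1 - Dice, where Dice = 2*|P.T| / (|P| + |T|) on soft predictions."""
    intersection = sum(p * t for p, t in zip(probs, targets))
    denominator = sum(probs) + sum(targets)
    return 1.0 - (2.0 * intersection + eps) / (denominator + eps)


def soft_jaccard_loss(probs, targets, eps=1e-7):
    """1 - IoU, where IoU = |P.T| / (|P| + |T| - |P.T|) on soft predictions."""
    intersection = sum(p * t for p, t in zip(probs, targets))
    union = sum(probs) + sum(targets) - intersection
    return 1.0 - (intersection + eps) / (union + eps)


# Toy example (hypothetical values): two predicted foreground pixels,
# one true foreground pixel, one pixel of overlap.
probs = [1.0, 1.0, 0.0, 0.0]
targets = [1.0, 0.0, 0.0, 0.0]
dice_loss = soft_dice_loss(probs, targets)        # close to 1 - 2/3
jaccard_loss = soft_jaccard_loss(probs, targets)  # close to 1 - 1/2
```

For the same prediction the Jaccard loss is never smaller than the Dice loss, since IoU = Dice / (2 - Dice); it therefore penalises partial overlap more strongly.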

4.NODE-ImgNet: a PDE-informed effective and robust model for image denoising

Authors:Xinheng Xie, Yue Wu, Hao Ni, Cuiyu He

Abstract: Inspired by the traditional partial differential equation (PDE) approach for image denoising, we propose a novel neural network architecture, referred to as NODE-ImgNet, that combines neural ordinary differential equations (NODEs) with convolutional neural network (CNN) blocks. NODE-ImgNet is intrinsically a PDE model, where the dynamic system is learned implicitly without the explicit specification of the PDE. This naturally circumvents the typical issues associated with introducing artifacts during the learning process. By invoking such a NODE structure, which can also be viewed as a continuous variant of a residual network (ResNet) and inherits its advantage in image denoising, our model achieves enhanced accuracy and parameter efficiency. In particular, our model exhibits consistent effectiveness in different scenarios, including denoising gray and color images perturbed by Gaussian noise, as well as real-noisy images, and demonstrates superiority in learning from small image datasets.
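
The remark that a NODE is a continuous variant of a ResNet can be made concrete with a forward-Euler integrator: each step x + h*f(x) has the form of a residual block. The sketch below is a generic illustration under that assumption, not the NODE-ImgNet architecture; the toy dynamics `decay` stands in for a learned CNN block.

```python
import math


def euler_integrate(f, x0, t_end, steps):
    """Integrate dx/dt = f(x) from t = 0 to t_end with forward Euler.

    Each update x <- x + h * f(x) mirrors a residual block y = x + F(x).
    """
    h = t_end / steps
    x = list(x0)
    for _ in range(steps):
        dx = f(x)
        x = [xi + h * di for xi, di in zip(x, dx)]
    return x


def decay(x):
    """Toy dynamics dx/dt = -x (a stand-in for a learned network)."""
    return [-xi for xi in x]


# One Euler step of size 1 is exactly one residual update: 2.0 + (-2.0).
single_block = euler_integrate(decay, [2.0], 1.0, 1)

# Many small steps approach the continuous solution x(1) = x(0) * exp(-1).
fine = euler_integrate(decay, [1.0], 1.0, 1000)
exact = math.exp(-1.0)
```

Practical neural ODE implementations usually replace the fixed-step Euler loop with an adaptive solver, but the residual-update reading stays the same.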

5.Skin Lesion Diagnosis Using Convolutional Neural Networks

Authors:Daniel Alonso Villanueva Nunez, Yongmin Li

Abstract: Cancerous skin lesions are one of the most common malignancies detected in humans, and if not detected at an early stage, they can lead to death. Therefore, it is crucial to have access to accurate results early on to optimize the chances of survival. Unfortunately, accurate results are typically obtained by highly trained dermatologists, who may not be accessible to many people, particularly in low-income and middle-income countries. Artificial Intelligence (AI) appears to be a potential solution to this problem, as it has proven to provide equal or even better diagnoses than healthcare professionals. This project aims to address the issue by collecting state-of-the-art techniques for image classification from various fields and implementing them. Some of these techniques include mixup, presizing, and test-time augmentation, among others. Three architectures were used for the implementation: DenseNet121, VGG16 with batch normalization, and ResNet50. The models were designed with two main purposes. First, to classify images into seven categories, including melanocytic nevus, melanoma, benign keratosis-like lesions, basal cell carcinoma, actinic keratoses and intraepithelial carcinoma, vascular lesions, and dermatofibroma. Second, to classify images into benign or malignant. The models were trained using a dataset of 8012 images, and their performance was evaluated using 2003 images. It's worth noting that this model is trained end-to-end, directly from the image to the labels, without the need for handcrafted feature extraction.

1.DeepMSS: Deep Multi-Modality Segmentation-to-Survival Learning for Survival Outcome Prediction from PET/CT Images

Authors:Mingyuan Meng, Bingxin Gu, Michael Fulham, Shaoli Song, Dagan Feng, Lei Bi, Jinman Kim

Abstract: Survival prediction is a major concern for cancer management. Deep survival models based on deep learning have been widely adopted to perform end-to-end survival prediction from medical images. Recent deep survival models achieved promising performance by jointly performing tumor segmentation with survival prediction, where the models were guided to extract tumor-related information through Multi-Task Learning (MTL). However, existing deep survival models have difficulties in exploring out-of-tumor prognostic information (e.g., local lymph node metastasis and adjacent tissue invasions). In addition, existing deep survival models are underdeveloped in utilizing multi-modality images. Empirically-designed strategies were commonly adopted to fuse multi-modality information via fixed pre-designed networks. In this study, we propose a Deep Multi-modality Segmentation-to-Survival model (DeepMSS) for survival prediction from PET/CT images. Instead of adopting MTL, we propose a novel Segmentation-to-Survival Learning (SSL) strategy, where our DeepMSS is trained for tumor segmentation and survival prediction sequentially. This strategy enables the DeepMSS to initially focus on tumor regions and gradually expand its focus to include other prognosis-related regions. We also propose a data-driven strategy to fuse multi-modality image information, which realizes automatic optimization of fusion strategies based on training data during training and also improves the adaptability of DeepMSS to different training targets. Our DeepMSS is also capable of incorporating conventional radiomics features as an enhancement, where handcrafted features can be extracted from the DeepMSS-segmented tumor regions and cooperatively integrated into the DeepMSS's training and inference. Extensive experiments with two large clinical datasets show that our DeepMSS outperforms state-of-the-art survival prediction methods.

2.A robust multi-domain network for short-scanning amyloid PET reconstruction

Authors:Hyoung Suk Park, Young Jin Jeong, Kiwan Jeon

Abstract: This paper presents a robust multi-domain network designed to restore low-quality amyloid PET images acquired in a short period of time. The proposed method is trained on pairs of PET images from short (2 minutes) and standard (20 minutes) scanning times, sourced from multiple domains. Learning relevant image features between these domains with a single network is challenging. Our key contribution is the introduction of a mapping label, which enables effective learning of specific representations between different domains. The network, trained with various mapping labels, can efficiently correct amyloid PET datasets in multiple training domains and unseen domains, such as those obtained with new radiotracers, acquisition protocols, or PET scanners. Internal, temporal, and external validations demonstrate the effectiveness of the proposed method. Notably, for external validation datasets from unseen domains, the proposed method achieved comparable or superior results relative to methods trained with these datasets, in terms of quantitative metrics such as normalized root mean-square error and structural similarity index measure. Two nuclear medicine physicians evaluated the amyloid status as positive or negative for the external validation datasets, with accuracies of 0.970 and 0.930 for readers 1 and 2, respectively.

3.An Ensemble Deep Learning Approach for COVID-19 Severity Prediction Using Chest CT Scans

Authors:Sidra Aleem, Mayug Maniparambil, Suzanne Little, Noel O'Connor, Kevin McGuinness

Abstract: Chest X-rays have been widely used for COVID-19 screening; however, 3D computed tomography (CT) is a more effective modality. We present our findings on COVID-19 severity prediction from chest CT scans using the STOIC dataset. We developed an ensemble deep learning based model that incorporates multiple neural networks to improve predictions. To address data imbalance, we used slicing functions and data augmentation. We further improved performance using test time data augmentation. Our approach, which employs a simple yet effective ensemble of deep learning-based models with strong test time augmentations, achieved results comparable to more complex methods and secured the fourth position in the STOIC2021 COVID-19 AI Challenge. Our code is available online at: baseline-finalphase-main.
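
Test-time augmentation, as mentioned above, averages a model's predictions over several transformed copies of the same input. A minimal sketch, assuming a 2D image stored as a list of rows and a model that returns a single score; `toy_model` and the choice of flips are hypothetical and unrelated to the authors' released code.

```python
def identity(image):
    """Return the image unchanged."""
    return image


def horizontal_flip(image):
    """Mirror each row (left-right flip)."""
    return [row[::-1] for row in image]


def vertical_flip(image):
    """Reverse the row order (top-bottom flip)."""
    return image[::-1]


def predict_with_tta(model, image, augmentations):
    """Average the model's scores over all augmented views of the image."""
    scores = [model(augment(image)) for augment in augmentations]
    return sum(scores) / len(scores)


def toy_model(image):
    """Hypothetical, deliberately flip-sensitive model: scores the top-left pixel."""
    return image[0][0]


image = [[1.0, 0.0],
         [0.0, 0.0]]
tta_score = predict_with_tta(
    toy_model, image, [identity, horizontal_flip, vertical_flip]
)
```

An ensemble extends the same idea by also averaging over several models, so the final score is a mean over (model, augmentation) pairs.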

4.Can Deep Learning Reliably Recognize Abnormality Patterns on Chest X-rays? A Multi-Reader Study Examining One Month of AI Implementation in Everyday Radiology Clinical Practice

Authors:Daniel Kvak, Anna Chromcová, Petra Ovesná, Jakub Dandár, Marek Biroš, Robert Hrubý, Daniel Dufek, Marija Pajdaković

Abstract: In this study, we developed a deep-learning-based automatic detection algorithm (DLAD, Carebot AI CXR) to detect and localize seven specific radiological findings (atelectasis (ATE), consolidation (CON), pleural effusion (EFF), pulmonary lesion (LES), subcutaneous emphysema (SCE), cardiomegaly (CMG), pneumothorax (PNO)) on chest X-rays (CXR). We collected 956 CXRs and compared the performance of the DLAD with that of six individual radiologists who assessed the images in a hospital setting. The proposed DLAD achieved high sensitivity (ATE 1.000 (0.624-1.000), CON 0.864 (0.671-0.956), EFF 0.953 (0.887-0.983), LES 0.905 (0.715-0.978), SCE 1.000 (0.366-1.000), CMG 0.837 (0.711-0.917), PNO 0.875 (0.538-0.986)), even when compared to the radiologists (LOWEST: ATE 0.000 (0.000-0.376), CON 0.182 (0.070-0.382), EFF 0.400 (0.302-0.506), LES 0.238 (0.103-0.448), SCE 0.000 (0.000-0.634), CMG 0.347 (0.228-0.486), PNO 0.375 (0.134-0.691), HIGHEST: ATE 1.000 (0.624-1.000), CON 0.864 (0.671-0.956), EFF 0.953 (0.887-0.983), LES 0.667 (0.456-0.830), SCE 1.000 (0.366-1.000), CMG 0.980 (0.896-0.999), PNO 0.875 (0.538-0.986)). The findings of the study demonstrate that the suggested DLAD holds potential for integration into everyday clinical practice as a decision support system, effectively mitigating the false negative rate associated with junior and intermediate radiologists.

5.CHMMOTv1 -- Cardiac and Hepatic Multi-Echo (T2*) MRI Images and Clinical Dataset for Iron Overload on Thalassemia Patients

Authors:Iraj Abedi, Maryam Zamanian, Hamidreza Bolhasani, Milad Jalilian

Abstract: Owing to the invasiveness and low accuracy of other tests, including biopsy and ferritin levels, magnetic resonance imaging (T2 and T2*-MRI) has been considered the standard test for patients with thalassemia (THM). Given the use of deep learning networks in medical sciences to improve diagnosis and treatment, and the scarcity of resources for them, we decided to provide a set of magnetic resonance images of the cardiac and hepatic organs. The dataset included 124 patients with THM (67 women and 57 men), aged 5-52 years. In addition, patients were divided into two groups: with follow-up (1-5 times) at time intervals of about 5-6 months, and without follow-up. Also, T2* and R2* values, the results of the cardiac and hepatic report (normal, mild, moderate, severe, and very severe), and laboratory tests including Ferritin, Bilirubin (D and T), AST, ALT, and ALP levels were provided as an Excel file. This dataset (CHMMOTv1) has been published in Mendeley Dataverse and is accessible through the web at:

6.PromptUNet: Toward Interactive Medical Image Segmentation

Authors:Junde Wu

Abstract: Prompt-based segmentation, also known as interactive segmentation, has recently become a popular approach in image segmentation. A well-designed prompt-based model called Segment Anything Model (SAM) has demonstrated its ability to segment a wide range of natural images, which has sparked a lot of discussion in the community. However, recent studies have shown that SAM performs poorly on medical images. This has motivated us to design a new prompt-based segmentation model specifically for medical image segmentation. In this paper, we combine the prompt-based segmentation paradigm with UNet, which is a widely recognized successful architecture for medical image segmentation. We have named the resulting model PromptUNet. In order to adapt to real-world clinical use, we expand the existing prompt types in SAM to include novel Supportive Prompts and En-face Prompts. We have evaluated the capabilities of PromptUNet on 19 medical image segmentation tasks using a variety of image modalities, including CT, MRI, ultrasound, fundus, and dermoscopic images. Our results show that PromptUNet outperforms a wide range of state-of-the-art (SOTA) medical image segmentation methods, including nnUNet, TransUNet, UNetr, MedSegDiff, and MSA. Code will be released at:

7.Evolving Tsukamoto Neuro Fuzzy Model for Multiclass Covid 19 Classification with Chest X Ray Images

Authors:Marziyeh Rezaei, Sevda Molani, Negar Firoozeh, Hossein Abbasi, Farzan Vahedifard, Maysam Orouskhani

Abstract: Due to rapid population growth and the need to use artificial intelligence to make quick decisions, developing a machine learning-based disease detection model and abnormality identification system has greatly improved the level of medical diagnosis. Since COVID-19 has become one of the most severe diseases in the world, developing an automatic COVID-19 detection framework helps medical doctors in the diagnostic process of disease and provides correct and fast results. In this paper, we propose a machine learning based framework for the detection of Covid 19. The proposed model employs a Tsukamoto Neuro Fuzzy Inference network to identify and distinguish Covid 19 disease from normal and pneumonia cases. While the traditional training methods tune the parameters of the neuro-fuzzy model by gradient-based algorithms and recursive least square method, we use an evolutionary-based optimization, the Cat swarm algorithm, to update the parameters. In addition, six texture features extracted from chest X-ray images are given as input to the model. Finally, the proposed model is evaluated on the chest X-ray dataset to detect Covid 19. The simulation results indicate that the proposed model achieves an accuracy of 98.51%, sensitivity of 98.35%, specificity of 98.08%, and F1 score of 98.17%.
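
The four metrics reported above all derive from confusion-matrix counts. A minimal sketch, assuming a binary (one-vs-rest) simplification of the three-class problem; the counts below are hypothetical and are not the paper's results.

```python
def classification_metrics(tp, fp, tn, fn):
    """Accuracy, sensitivity, specificity and F1 from binary confusion counts.

    tp/fp/tn/fn are true positives, false positives, true negatives and
    false negatives for the class treated as "positive".
    """
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    sensitivity = tp / (tp + fn)   # also called recall
    specificity = tn / (tn + fp)
    f1 = 2 * tp / (2 * tp + fp + fn)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": f1,
    }


# Hypothetical counts for one class versus the rest.
metrics = classification_metrics(tp=8, fp=1, tn=9, fn=2)
```

For a multiclass problem these values are typically computed per class and then averaged; the abstract does not state which averaging was used.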

1.CB-HVTNet: A channel-boosted hybrid vision transformer network for lymphocyte assessment in histopathological images

Authors:Momina Liaqat Ali, Zunaira Rauf, Asifullah Khan, Anabia Sohail, Rafi Ullah, Jeonghwan Gwak

Abstract: Transformers, due to their ability to learn long range dependencies, have overcome the shortcomings of convolutional neural networks (CNNs) for global perspective learning. Therefore, they have gained the focus of researchers for several vision related tasks including medical diagnosis. However, their multi-head attention module only captures global level feature representations, which is insufficient for medical images. To address this issue, we propose a Channel Boosted Hybrid Vision Transformer (CB HVT) that uses transfer learning to generate boosted channels and employs both transformers and CNNs to analyse lymphocytes in histopathological images. The proposed CB HVT comprises five modules, including a channel generation module, channel exploitation module, channel merging module, region-aware module, and a detection and segmentation head, which work together to effectively identify lymphocytes. The channel generation module uses the idea of channel boosting through transfer learning to extract diverse channels from different auxiliary learners. In the CB HVT, these boosted channels are first concatenated and ranked using an attention mechanism in the channel exploitation module. A fusion block is then utilized in the channel merging module for a gradual and systematic merging of the diverse boosted channels to improve the network's learning representations. The CB HVT also employs a proposal network in its region aware module and a head to effectively identify objects, even in overlapping regions and with artifacts. We evaluated the proposed CB HVT on two publicly available datasets for lymphocyte assessment in histopathological images. The results show that CB HVT outperformed other state of the art detection models, and has good generalization ability, demonstrating its value as a tool for pathologists.

2.Osteosarcoma Tumor Detection using Transfer Learning Models

Authors:Raisa Fairooz Meem, Khandaker Tabin Hasan

Abstract: The field of clinical image analysis has been applying transfer learning models increasingly due to their lower computational complexity, better accuracy, etc. These are pre-trained models that do not need to be trained from scratch, which eliminates the necessity of large datasets. Transfer learning models are mostly used for the analysis of brain, breast, or lung images, but other sectors such as bone marrow cell detection or bone cancer detection can also benefit from using transfer learning models, especially considering the lack of available large datasets for these tasks. This paper studies the performance of several transfer learning models for osteosarcoma tumour detection. Osteosarcoma is a type of bone cancer mostly found in the cells of the long bones of the body. The dataset consists of H&E stained images divided into 4 categories: Viable Tumor, Non-viable Tumor, Non-Tumor and Viable Non-viable. Both datasets were randomly divided into train and test sets following an 80-20 ratio: 80% was used for training and 20% for testing. 4 models are considered for comparison: EfficientNetB7, InceptionResNetV2, NasNetLarge and ResNet50. All these models are pre-trained on ImageNet. According to the results, InceptionResNetV2 achieved the highest accuracy (93.29%), followed by NasNetLarge (90.91%), ResNet50 (89.83%) and EfficientNetB7 (62.77%). It also had the highest precision (0.8658) and recall (0.8658) values among the 4 models.

3.Annotating 8,000 Abdominal CT Volumes for Multi-Organ Segmentation in Three Weeks

Authors:Chongyu Qu, Tiezheng Zhang, Hualin Qiao, Jie Liu, Yucheng Tang, Alan Yuille, Zongwei Zhou

Abstract: Annotating medical images, particularly for organ segmentation, is laborious and time-consuming. For example, annotating an abdominal organ requires an estimated rate of 30-60 minutes per CT volume based on the expertise of an annotator and the size, visibility, and complexity of the organ. Therefore, publicly available datasets for multi-organ segmentation are often limited in data size and organ diversity. This paper proposes a systematic and efficient method to expedite the annotation process for organ segmentation. We have created the largest multi-organ dataset (by far) with the spleen, liver, kidneys, stomach, gallbladder, pancreas, aorta, and IVC annotated in 8,448 CT volumes, equating to 3.2 million slices. The conventional annotation methods would take an experienced annotator up to 1,600 weeks (or roughly 30.8 years) to complete this task. In contrast, our annotation method has accomplished this task in three weeks (based on an 8-hour workday, five days a week) while maintaining a similar or even better annotation quality. This achievement is attributed to three unique properties of our method: (1) label bias reduction using multiple pre-trained segmentation models, (2) effective error detection in the model predictions, and (3) attention guidance for annotators to make corrections on the most salient errors. Furthermore, we summarize the taxonomy of common errors made by AI algorithms and annotators. This allows for continuous refinement of both AI and annotations and significantly reduces the annotation costs required to create large-scale datasets for a wider variety of medical imaging tasks.
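
The figures quoted above can be cross-checked with simple arithmetic. The sketch below uses only numbers stated in the abstract; the 52-weeks-per-year conversion is an assumption about how the year count was obtained.

```python
# Figures as stated in the abstract.
conventional_weeks = 1600      # estimated conventional annotation time
proposed_weeks = 3             # reported time with the proposed method
ct_volumes = 8448
total_slices = 3.2e6

# Assumption: a year is counted as 52 working weeks.
conventional_years = conventional_weeks / 52

# Relative speed-up of the proposed annotation procedure.
speedup = conventional_weeks / proposed_weeks

# Average number of slices per CT volume implied by the totals.
slices_per_volume = total_slices / ct_volumes

print(f"{conventional_years:.1f} years, {speedup:.0f}x faster, "
      f"{slices_per_volume:.0f} slices per volume on average")
```

Under these assumptions the stated 30.8 years follows from 1,600 weeks, and the claimed reduction corresponds to a speed-up of roughly 530 times.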

1.MaxViT-UNet: Multi-Axis Attention for Medical Image Segmentation

Authors:Abdul Rehman, Asifullah Khan

Abstract: Convolutional neural networks have made significant strides in medical image analysis in recent years. However, the local nature of the convolution operator inhibits CNNs from capturing global and long-range interactions. Recently, Transformers have gained popularity in the computer vision community and also in medical image segmentation. However, scalability issues of the self-attention mechanism and the lack of CNN-like inductive bias have limited their adoption. In this work, we present MaxViT-UNet, an Encoder-Decoder based hybrid vision transformer for medical image segmentation. The proposed hybrid decoder, also based on the MaxViT-block, is designed to harness the power of convolution and the self-attention mechanism at each decoding stage with minimal computational burden. The multi-axis self-attention in each decoder stage helps in differentiating between the object and background regions much more efficiently. The hybrid decoder block initially fuses the lower-level features, upsampled via transpose convolution, with skip-connection features coming from the hybrid encoder; the fused features are then refined using the multi-axis attention mechanism. The proposed decoder block is repeated multiple times to accurately segment the nuclei regions. Experimental results on the MoNuSeg dataset prove the effectiveness of the proposed technique. Our MaxViT-UNet outperformed the previous CNN-only (UNet) and Transformer-only (Swin-UNet) techniques by a large margin of 2.36% and 5.31% on the Dice metric, respectively.
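
For readers unfamiliar with the Dice metric used in this comparison, the following is a minimal sketch of its standard definition, 2|A∩B| / (|A| + |B|), on binary masks. The flat-list mask representation and the handling of two empty masks are assumptions for illustration, not the paper's evaluation code.

```python
def dice_score(pred, target):
    """Dice coefficient between two binary masks given as flat 0/1 sequences."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of pixels")
    intersection = sum(p * t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    # Convention (an assumption here): two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2.0 * intersection / total


# Toy masks: one of the two predicted pixels overlaps the single target pixel.
pred = [1, 1, 0, 0]
target = [1, 0, 0, 0]
print(dice_score(pred, target))
```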

2.Towards Automated COVID-19 Presence and Severity Classification

Authors:Dominik Müller, Niklas Schröter, Silvan Mertes, Fabio Hellmann, Miriam Elia, Wolfgang Reif, Bernhard Bauer, Elisabeth André, Frank Kramer

Abstract: COVID-19 presence classification and severity prediction via (3D) thorax computed tomography scans have become important tasks in recent times. Especially for capacity planning of intensive care units, predicting the future severity of a COVID-19 patient is crucial. The presented approach follows state-of-the-art techniques to aid medical professionals in these situations. It comprises an ensemble learning strategy via 5-fold cross-validation that includes transfer learning and combines pre-trained 3D versions of ResNet34 and DenseNet121 for COVID-19 classification and severity prediction, respectively. Further, domain-specific preprocessing was applied to optimize model performance. In addition, medical information such as the infection-lung-ratio, patient age, and sex was included. The presented model achieves an AUC of 79.0% to predict COVID-19 severity, and 83.7% AUC to classify the presence of an infection, which is comparable with other currently popular methods. This approach is implemented using the AUCMEDI framework and relies on well-known network architectures to ensure robustness and reproducibility.

1.Color Deconvolution applied to Domain Adaptation in HER2 histopathological images

Authors:David Anglada-Rotger, Ferran Marqués, Montse Pardàs

Abstract: Breast cancer early detection is crucial for improving patient outcomes. The Institut Català de la Salut (ICS) has launched the DigiPatICS project to develop and implement artificial intelligence algorithms to assist with the diagnosis of cancer. In this paper, we propose a new approach for facing the color normalization problem in HER2-stained histopathological images of breast cancer tissue, posed as a style transfer problem. We combine the Color Deconvolution technique with the Pix2Pix GAN network to present a novel approach to correct the color variations between different HER2 stain brands. Our approach focuses on maintaining the HER2 score of the cells in the transformed images, which is crucial for the HER2 analysis. Results demonstrate that our final model outperforms the state-of-the-art image style transfer methods in maintaining the cell classes in the transformed images and is as effective as them in generating realistic images.

2.Unlocking the Potential of Medical Imaging with ChatGPT's Intelligent Diagnostics

Authors:Ayyub Alzahem, Shahid Latif, Wadii Boulila, Anis Koubaa

Abstract: Medical imaging is an essential tool for diagnosing various healthcare diseases and conditions. However, analyzing medical images is a complex and time-consuming task that requires expertise and experience. This article aims to design a decision support system to assist healthcare providers and patients in making decisions about diagnosing, treating, and managing health conditions. The proposed architecture contains three stages: 1) data collection and labeling, 2) model training, and 3) diagnosis report generation. The key idea is to train a deep learning model on a medical image dataset to extract four types of information: the type of image scan, the body part, the test image, and the results. This information is then fed into ChatGPT to generate automatic diagnostics. The proposed system has the potential to enhance decision-making, reduce costs, and improve the capabilities of healthcare providers. The efficacy of the proposed system is analyzed by conducting extensive experiments on a large medical image dataset. The experimental outcomes exhibited promising performance for automatic diagnosis through medical images.

3.Beware of diffusion models for synthesizing medical images -- A comparison with GANs in terms of memorizing brain tumor images

Authors:Muhammad Usman Akbar, Wuhao Wang, Anders Eklund

Abstract: Diffusion models were initially developed for text-to-image generation and are now being utilized to generate high quality synthetic images. Preceded by GANs, diffusion models have shown impressive results using various evaluation metrics. However, commonly used metrics such as FID and IS are not suitable for determining whether diffusion models are simply reproducing the training images. Here we train StyleGAN and diffusion models, using BRATS20 and BRATS21 datasets, to synthesize brain tumor images, and measure the correlation between the synthetic images and all training images. Our results show that diffusion models are much more likely to memorize the training images, especially for small datasets. Researchers should be careful when using diffusion models for medical imaging, if the final goal is to share the synthetic images.
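
The memorization check described in this abstract, correlating each synthetic image with all training images, can be sketched as follows. The abstract does not state which correlation measure was used; Pearson correlation on flattened pixels is assumed here, and the function names and toy images are hypothetical.

```python
import math


def pearson(a, b):
    """Pearson correlation between two images given as flat pixel lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # assumption: constant images are treated as uncorrelated
    return cov / math.sqrt(var_a * var_b)


def max_training_correlation(synthetic, training_images):
    """Highest correlation between one synthetic image and any training image."""
    return max(pearson(synthetic, t) for t in training_images)


# Toy 4-pixel "images"; real use would flatten full image arrays.
training = [[0, 1, 2, 3], [3, 1, 0, 2]]
copied = [0, 1, 2, 3]   # identical to the first training image
novel = [1, 3, 0, 2]    # not a copy of either training image

print(max_training_correlation(copied, training))
print(max_training_correlation(novel, training))
```

A synthetic image whose maximum correlation is close to 1 is a candidate copy of a training image, which is the risk the abstract warns about when sharing synthetic data.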

1.Deep Learning for Retrospective Motion Correction in MRI: A Comprehensive Review

Authors:Veronika Spieker, Hannah Eichhorn, Kerstin Hammernik, Daniel Rueckert, Christine Preibisch, Dimitrios C. Karampinos, Julia A. Schnabel

Abstract: Motion represents one of the major challenges in magnetic resonance imaging (MRI). Since the MR signal is acquired in frequency space, any motion of the imaged object leads to complex artefacts in the reconstructed image in addition to other MR imaging artefacts. Deep learning has been frequently proposed for motion correction at several stages of the reconstruction process. The wide range of MR acquisition sequences, anatomies and pathologies of interest, and motion patterns (rigid vs. deformable and random vs. regular) makes a comprehensive solution unlikely. To facilitate the transfer of ideas between different applications, this review provides a detailed overview of proposed methods for learning-based motion correction in MRI together with their common challenges and potentials. This review identifies differences and synergies in underlying data usage, architectures and evaluation strategies. We critically discuss general trends and outline future directions, with the aim to enhance interaction between different application areas and research fields.

2.Generating high-quality 3DMPCs by adaptive data acquisition and NeREF-based reflectance correction to facilitate efficient plant phenotyping

Authors:Pengyao Xie, Zhihong Ma, Ruiming Du, Mengqi Lv, Yutao Shen, Xuqi Lu, Jiangpeng Zhu, Haiyan Cen

Abstract: Non-destructive assessments of plant phenotypic traits using high-quality three-dimensional (3D) and multispectral data can deepen breeders' understanding of plant growth and allow them to make informed managerial decisions. However, subjective viewpoint selection and complex illumination effects under natural light conditions decrease the data quality and increase the difficulty of resolving phenotypic parameters. We proposed methods for adaptive data acquisition and reflectance correction respectively, to generate high-quality 3D multispectral point clouds (3DMPCs) of plants. In the first stage, we proposed an efficient next-best-view (NBV) planning method based on a novel UGV platform with a multi-sensor-equipped robotic arm. In the second stage, we eliminated the illumination effects by using the neural reference field (NeREF) to predict the digital number (DN) of the reference. We tested them on 6 perilla and 6 tomato plants, and selected 2 visible leaves and 4 regions of interest (ROIs) for each plant to assess the biomass and the chlorophyll content. For NBV planning, the average execution time for single perilla and tomato plant at a joint speed of 1.55 rad/s was 58.70 s and 53.60 s respectively. The whole-plant data integrity was improved by an average of 27% compared to using fixed viewpoints alone, and the coefficients of determination (R2) for leaf biomass estimation reached 0.99 and 0.92. For reflectance correction, the average root mean squared error of the reflectance spectra with hemisphere reference-based correction at different ROIs was 0.08 and 0.07 for perilla and tomato. The R2 of chlorophyll content estimation was 0.91 and 0.93 respectively when principal component analysis and Gaussian process regression were applied. Our approach is promising for generating high-quality 3DMPCs of plants under natural light conditions and facilitates accurate plant phenotyping.

3.Generation of Structurally Realistic Retinal Fundus Images with Diffusion Models

Authors:Sojung Go, Younghoon Ji, Sang Jun Park, Soochahn Lee

Abstract: We introduce a new technique for generating retinal fundus images that have anatomically accurate vascular structures, using diffusion models. We generate artery/vein masks to create the vascular structure, which we then condition on to produce retinal fundus images. The proposed method can generate high-quality images with more realistic vascular structures and can create a diverse range of images based on the strengths of the diffusion model. We present quantitative evaluations that demonstrate the performance improvement using our method for data augmentation on vessel segmentation and artery/vein classification. We also present Turing test results by clinical experts, showing that our generated images are difficult to distinguish from real images. We believe that our method can be applied to construct stand-alone datasets that are free of patient privacy concerns.

4.Implicit Neural Networks with Fourier-Feature Inputs for Free-breathing Cardiac MRI Reconstruction

Authors:Johannes F. Kunz, Stefan Ruschke, Reinhard Heckel

Abstract: In this paper, we propose an approach for cardiac magnetic resonance imaging (MRI), which aims to reconstruct a real-time video of a beating heart from continuous highly under-sampled measurements. This task is challenging since the object to be reconstructed (the heart) is continuously changing during signal acquisition. To address this challenge, we represent the beating heart with an implicit neural network and fit the network so that the representation of the heart is consistent with the measurements. The network in the form of a multi-layer perceptron with Fourier-feature inputs acts as an effective signal prior and enables adjusting the regularization strength in both the spatial and temporal dimensions of the signal. We examine the proposed approach for 2D free-breathing cardiac real-time MRI in different operating regimes, i.e., for different image resolutions, slice thicknesses, and acquisition lengths. Our method achieves reconstruction quality on par with or slightly better than state-of-the-art untrained convolutional neural networks and superior image quality compared to a recent method that fits an implicit representation directly to Fourier-domain measurements. However, this comes at a higher computational cost. Our approach does not require any additional patient data or biosensors including electrocardiography, making it potentially applicable in a wide range of clinical scenarios.
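
The Fourier-feature inputs mentioned in this abstract can be illustrated with a minimal sketch of a common encoding, which maps a coordinate to sine and cosine values at several frequencies before it enters the multi-layer perceptron. The frequency values and the function name below are assumptions; the abstract does not specify the encoding the authors use.

```python
import math


def fourier_features(t, frequencies):
    """Map a scalar coordinate t to [sin(2*pi*f*t), cos(2*pi*f*t)] per frequency."""
    features = []
    for f in frequencies:
        angle = 2.0 * math.pi * f * t
        features.append(math.sin(angle))
        features.append(math.cos(angle))
    return features


# Hypothetical frequency sets: more and higher frequencies let a network
# represent faster variation along that coordinate.
low = fourier_features(0.25, [1.0])
high = fourier_features(0.25, [1.0, 2.0, 4.0])
print(len(low), len(high))
```

In encodings of this kind, the frequency scale chosen for each coordinate generally controls how smooth the fitted signal is along that coordinate, which is one plausible reading of how regularization strength could be adjusted separately in space and time.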

5.Transformers for CT Reconstruction From Monoplanar and Biplanar Radiographs

Authors:Firas Khader, Gustav Müller-Franzes, Tianyu Han, Sven Nebelung, Christiane Kuhl, Johannes Stegmaier, Daniel Truhn

Abstract: Computed Tomography (CT) scans provide detailed and accurate information of internal structures in the body. They are constructed by sending x-rays through the body from different directions and combining this information into a three-dimensional volume. Such volumes can then be used to diagnose a wide range of conditions and allow for volumetric measurements of organs. In this work, we tackle the problem of reconstructing CT images from biplanar x-rays only. X-rays are widely available and even if the CT reconstructed from these radiographs is not a replacement of a complete CT in the diagnostic setting, it might serve to spare the patients from radiation where a CT is only acquired for rough measurements such as determining organ size. We propose a novel method based on the transformer architecture, by framing the underlying task as a language translation problem. Radiographs and CT images are first embedded into latent quantized codebook vectors using two different autoencoder networks. We then train a GPT model, to reconstruct the codebook vectors of the CT image, conditioned on the codebook vectors of the x-rays and show that this approach leads to realistic looking images. To encourage further research in this direction, we make our code publicly available on GitHub: XXX.

1.Deep Learning for Predicting Progression of Patellofemoral Osteoarthritis Based on Lateral Knee Radiographs, Demographic Data and Symptomatic Assessments

Authors:Neslihan Bayramoglu, Martin Englund, Ida K. Haugen, Muneaki Ishijima, Simo Saarakkala

Abstract: In this study, we propose a novel framework that utilizes deep learning (DL) and attention mechanisms to predict the radiographic progression of patellofemoral osteoarthritis (PFOA) over a period of seven years. This study included subjects (1832 subjects, 3276 knees) from the baseline of the MOST study. PF joint regions-of-interest were identified using an automated landmark detection tool (BoneFinder) on lateral knee X-rays. An end-to-end DL method was developed for predicting PFOA progression based on imaging data in a 5-fold cross-validation setting. A set of baselines based on known risk factors was developed and analyzed using a gradient boosting machine (GBM). Risk factors included age, sex, BMI and WOMAC score, and the radiographic osteoarthritis stage of the tibiofemoral joint (KL score). Finally, we trained an ensemble model using both imaging and clinical data. Among the individual models, our deep convolutional neural network attention model achieved the best performance with an AUC of 0.856 and AP of 0.431, slightly outperforming the deep learning approach without attention (AUC = 0.832, AP = 0.4) and the best performing reference GBM model (AUC = 0.767, AP = 0.334). The inclusion of imaging data and clinical variables in an ensemble model allowed statistically more powerful prediction of PFOA progression (AUC = 0.865, AP = 0.447), although the clinical significance of this minor performance gain remains unknown. This study demonstrated the potential of machine learning models to predict the progression of PFOA using imaging and clinical variables. These models could be used to identify patients who are at high risk of progression and prioritize them for new treatments. However, even though the accuracy of the models was excellent in this study using the MOST dataset, they should still be validated using external patient cohorts in the future.

2.Uncertainty-Aware Semi-Supervised Learning for Prostate MRI Zonal Segmentation

Authors:Matin Hosseinzadeh, Anindo Saha, Joeran Bosma, Henkjan Huisman

Abstract: The quality of deep convolutional neural network predictions strongly depends on the size of the training dataset and the quality of the annotations. Creating annotations, especially for 3D medical image segmentation, is time-consuming and requires expert knowledge. We propose a novel semi-supervised learning (SSL) approach that requires only a relatively small number of annotations while being able to use the remaining unlabeled data to improve model performance. Our method uses a pseudo-labeling technique that employs recent deep learning uncertainty estimation models. By using the estimated uncertainty, we were able to rank pseudo-labels and automatically select the best pseudo-annotations generated by the supervised model. We applied this to prostate zonal segmentation in T2-weighted MRI scans. Our proposed model outperformed the semi-supervised model in experiments with the ProstateX dataset and an external test set. By leveraging only a subset of unlabeled data rather than the full collection of 4953 cases, our proposed model demonstrated improved performance. The segmentation Dice similarity coefficient in the transition zone and peripheral zone increased from 0.835 and 0.727 to 0.852 and 0.751, respectively, for the fully supervised model and the uncertainty-aware semi-supervised learning model (USSL). Our USSL model demonstrates the potential to allow deep learning models to be trained on large datasets without requiring full annotation. Our code is available at
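
The ranking and selection step described in this abstract can be sketched in a few lines. The function name, the `keep_fraction` parameter, the case identifiers, and the uncertainty values are all hypothetical; the abstract does not say how many pseudo-labels are kept or how uncertainty is computed.

```python
def select_pseudo_labels(candidates, keep_fraction=0.5):
    """Keep the most certain pseudo-labels.

    `candidates` is a list of (case_id, uncertainty) pairs, where a lower
    uncertainty means a more trustworthy pseudo-label.
    """
    ranked = sorted(candidates, key=lambda item: item[1])
    keep = max(1, int(len(ranked) * keep_fraction)) if ranked else 0
    return [case_id for case_id, _ in ranked[:keep]]


# Hypothetical case IDs and uncertainty scores.
candidates = [("case_a", 0.30), ("case_b", 0.05), ("case_c", 0.80), ("case_d", 0.12)]
print(select_pseudo_labels(candidates))
```

Only the selected cases would then be added to the training set, which matches the abstract's point that a subset of the unlabeled data can work better than the full collection.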

3.Image Segmentation For Improved Lossless Screen Content Compression

Authors:Shabhrish Reddy Uddehal, Tilo Strutz, Hannah Och, André Kaup

Abstract: In recent years, it has been found that screen content images (SCI) can be effectively compressed based on appropriate probability modelling and suitable entropy coding methods such as arithmetic coding. The key objective is determining the best probability distribution for each pixel position. This strategy works particularly well for images with synthetic (textual) content. However, usually screen content images not only consist of synthetic but also pictorial (natural) regions. These images require diverse models of probability distributions to be optimally compressed. One way to achieve this goal is to separate synthetic and natural regions. This paper proposes a segmentation method that identifies natural regions enabling better adaptive treatment. It supplements a compression method known as Soft Context Formation (SCF) and operates as a pre-processing step. If at least one natural segment is found within the SCI, it is split into two sub images (natural and synthetic parts), and the process of modelling and coding is performed separately for both. For SCIs with natural regions, the proposed method achieves a bit-rate reduction of up to 11.6% and 1.52% with respect to HEVC and the previous version of the SCF.

4.Self-Supervised Federated Learning for Fast MR Imaging

Authors:Juan Zou, Cheng Li, Ruoyou Wu, Tingrui Pei, Hairong Zheng, Shanshan Wang

Abstract: Federated learning (FL) based magnetic resonance (MR) image reconstruction can facilitate learning valuable priors from multi-site institutions without violating patient's privacy for accelerating MR imaging. However, existing methods rely on fully sampled data for collaborative training of the model. The client that only possesses undersampled data can neither participate in FL nor benefit from other clients. Furthermore, heterogeneous data distributions hinder FL from training an effective deep learning reconstruction model and thus cause performance degradation. To address these issues, we propose a Self-Supervised Federated Learning method (SSFedMRI). SSFedMRI explores the physics-based contrastive reconstruction networks in each client to realize cross-site collaborative training in the absence of fully sampled data. Furthermore, a personalized soft update scheme is designed to simultaneously capture the global shared representations among different centers and maintain the specific data distribution of each client. The proposed method is evaluated on four datasets and compared to the latest state-of-the-art approaches. Experimental results demonstrate that SSFedMRI possesses strong capability in reconstructing accurate MR images both visually and quantitatively on both in-distribution and out-of-distribution datasets.

5.Multiclass MRI Brain Tumor Segmentation using 3D Attention-based U-Net

Authors:Maryann M. Gitonga

Abstract: This paper proposes a 3D attention-based U-Net architecture for multi-region segmentation of brain tumors using a single stacked multi-modal volume created by combining three non-native MRI volumes. The attention mechanism added to the decoder side of the U-Net helps to improve segmentation accuracy by de-emphasizing healthy tissues and accentuating malignant tissues, resulting in better generalization power and reduced computational resources. The method is trained and evaluated on the BraTS 2021 Task 1 dataset, and demonstrates improvement of accuracy over other approaches. My findings suggest that the proposed approach has potential to enhance brain tumor segmentation using multi-modal MRI data, contributing to better understanding and diagnosis of brain diseases. This work highlights the importance of combining multiple imaging modalities and incorporating attention mechanisms for improved accuracy in brain tumor segmentation.

1.Trustworthy Multi-phase Liver Tumor Segmentation via Evidence-based Uncertainty

Authors:Chuanfei Hu, Tianyi Xia, Ying Cui, Quchen Zou, Yuancheng Wang, Wenbo Xiao, Shenghong Ju, Xinde Li

Abstract: Multi-phase liver contrast-enhanced computed tomography (CECT) images convey complementary multi-phase information for liver tumor segmentation (LiTS), which is crucial to assist the clinical diagnosis of liver cancer. However, the performance of existing multi-phase liver tumor segmentation (MPLiTS)-based methods suffers from redundancy and weak interpretability, resulting in the implicit unreliability of clinical applications. In this paper, we propose a novel trustworthy multi-phase liver tumor segmentation (TMPLiTS), which is a unified framework jointly conducting segmentation and uncertainty estimation. The trustworthy results could assist clinicians in making a reliable diagnosis. Specifically, Dempster-Shafer Evidence Theory (DST) is introduced to parameterize the segmentation and uncertainty as evidence following a Dirichlet distribution. The reliability of segmentation results among multi-phase CECT images is quantified explicitly. Meanwhile, a multi-expert mixture scheme (MEMS) is proposed to fuse the multi-phase evidence, which can guarantee the effect of the fusion procedure based on theoretical analysis. Experimental results demonstrate the superiority of TMPLiTS compared with the state-of-the-art methods. Meanwhile, the robustness of TMPLiTS is verified, where reliable performance can be guaranteed against perturbations.

2.Trans-Inpainter: A Transformer Model for High Accuracy Image Inpainting from Channel State Information

Authors:Cheng Chen, Shoki Ohta, Takayuki Nishio, Mehdi Bennis, Jihong Park, Mohamed Wahib

Abstract: Radio Frequency (RF) signal-based multimodal image inpainting has recently emerged as a promising paradigm to enhance the capability of distortion-free image restoration by integrating wireless and visual information from the identical physical environment and has potential applications in fields like security and surveillance systems. In this paper, we aim to implement an RF-based image inpainting system that enables image restoration in a complex environment while maintaining high robustness and accuracy. This requires accurately converting RF signals into meaningful visual information and overcoming the challenges of RF signals in complex environments, such as multipath interference, signal attenuation, and noise. To tackle this problem, we propose Trans-Inpainter, a novel image inpainting method that utilizes the Channel State Information (CSI) of WiFi signals in combination with transformer networks to generate high-quality reconstructed images. This approach is the first to use CSI for image inpainting, which allows for extracting visual information from WiFi signals to fill in missing regions in images. To further improve Trans-Inpainter's performance, we investigate the impact of variations in CSI data on RF-based imaging ability, i.e., analyzing how the location of the CSI sensors, the combination of CSI from different sensors, and changes in temporal or frequency dimensions of CSI matrix affect the imaging quality. We compare the performance of Trans-Inpainter with RF-Inpainter, the state-of-the-art technology for RF-based multimodal image inpainting, under more realistic experimental scenarios, and with single-modality image inpainting models when only RF or image data is available, respectively. The results show that Trans-Inpainter outperforms other baseline methods in all cases.

3.Echo from noise: synthetic ultrasound image generation using diffusion models for real image segmentation

Authors:David Stojanovski, Uxio Hermida, Pablo Lamata, Arian Beqiri, Alberto Gomez

Abstract: We propose a novel pipeline for the generation of synthetic images via Denoising Diffusion Probabilistic Models (DDPMs) guided by cardiac ultrasound semantic label maps. We show that these synthetic images can serve as a viable substitute for real data in the training of deep-learning models for medical image analysis tasks such as image segmentation. To demonstrate the effectiveness of this approach, we generated synthetic 2D echocardiography images and trained a neural network for segmentation of the left ventricle and left atrium. The performance of the network trained on exclusively synthetic images was evaluated on an unseen dataset of real images and yielded mean Dice scores of 88.5 ± 6.0%, 92.3 ± 3.9% and 86.3 ± 10.7% for left ventricular endocardial, epicardial and left atrial segmentation respectively. This represents an increase of 9.09%, 3.7% and 15.0% in Dice scores compared to the previous state-of-the-art. The proposed pipeline has the potential for application to a wide range of other tasks across various medical imaging modalities.

4.Bone Marrow Cytomorphology Cell Detection using InceptionResNetV2

Authors:Raisa Fairooz Meem, Khandaker Tabin Hasan

Abstract: Critical clinical decision points in haematology are influenced by the requirement of bone marrow cytology for a haematological diagnosis. Bone marrow cytology, however, is restricted to reference facilities with expertise, is linked to inter-observer variability, and takes a long time to process, which could result in a delayed or inaccurate diagnosis, leaving an unmet need for cutting-edge supporting technologies. This paper presents a novel transfer learning model for Bone Marrow Cell Detection that addresses these difficulties while achieving considerable accuracy. The proposed model achieved 96.19% accuracy and could be used in the future for analysis of other medical images in this domain.

5.Improved Screen Content Coding in VVC Using Soft Context Formation

Authors:Hannah Och (Friedrich-Alexander Universität Erlangen-Nürnberg), Shabhrish Reddy Uddehal (Friedrich-Alexander Universität Erlangen-Nürnberg; Hochschule für angewandte Wissenschaften Coburg), Tilo Strutz (Hochschule für angewandte Wissenschaften Coburg), André Kaup (Friedrich-Alexander Universität Erlangen-Nürnberg)

Abstract: Screen content images (SCIs) often contain a mix of natural and synthetic image parts. Synthetic sections usually are comprised of uniformly colored areas as well as repeating colors and patterns. In the Versatile Video Coding (VVC) standard, these properties are largely exploited using Intra Block Copy and Palette Mode. However, the Soft Context Formation (SCF) coder, a pixel-wise lossless coder for SCIs based on pattern matching and entropy coding, outperforms the VVC in very synthetic image areas even when compared to the lossy VVC. In this paper, we propose an enhanced VVC coding approach for SCIs using Soft Context Formation. First, the image is separated into two distinct layers in a block-wise manner using a learning-based method with 4 block features. Highly synthetic image parts are coded losslessly using the SCF coder, whereas the rest of the image is coded using VVC. The SCF coder is further modified to incorporate information gained by the decoded VVC layer when encoding the SCF layer. Using this approach, we achieve BD-rate gains of 4.15% on average on the evaluated data sets when compared to VVC.

6.Multiscale Augmented Normalizing Flows for Image Compression

Authors:Marc Windsheimer, Fabian Brand, André Kaup

Abstract: Most learning-based image compression methods lack efficiency for high image quality due to their non-invertible design. The decoding function of the frequently applied compressive autoencoder architecture is only an approximated inverse of the encoding transform. This issue can be resolved by using invertible latent variable models, which allow a perfect reconstruction if no quantization is performed. Furthermore, many traditional image and video coders apply dynamic block partitioning to vary the compression of certain image regions depending on their content. Inspired by this approach, hierarchical latent spaces have been applied to learning-based compression networks. In this paper, we present a novel concept, which adapts the hierarchical latent space for augmented normalizing flows, an invertible latent variable model. Our best performing model achieved average rate savings of more than 7% over comparable single-scale models.
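
The claim that invertible models allow perfect reconstruction when no quantization is performed can be illustrated with a generic additive coupling layer, a standard building block of normalizing flows. This is a toy sketch and not the augmented normalizing flow architecture from the paper; the `shift` function stands in for a learned network and all values are hypothetical.

```python
def shift(x1):
    """Stand-in for a learned network; any function of x1 keeps the layer invertible."""
    return [2.0 * v for v in x1]


def forward(x1, x2):
    """Additive coupling: pass x1 through unchanged and shift x2 by a function of x1."""
    return x1, [b + s for b, s in zip(x2, shift(x1))]


def inverse(y1, y2):
    """Exact inverse: recompute the shift from y1 and subtract it."""
    return y1, [b - s for b, s in zip(y2, shift(y1))]


x1, x2 = [1.0, 2.0], [0.5, -1.5]
y1, y2 = forward(x1, x2)

# Without quantization the input is recovered.
r1, r2 = inverse(y1, y2)

# Rounding the latent (a crude stand-in for quantization) breaks exact recovery.
q1, q2 = inverse(y1, [float(round(v)) for v in y2])
print(r2, q2)
```

By contrast, an autoencoder's decoder is only trained to approximate the inverse of its encoder, so some reconstruction error remains even without quantization.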

1.Multi-Scale Energy (MuSE) plug and play framework for inverse problems

Authors:Jyothi Rikhab Chand, Mathews Jacob

Abstract: We introduce a multi-scale energy formulation for plug and play (PnP) image recovery. The main highlight of the proposed framework is energy formulation, where the log prior of the distribution is learned by a convolutional neural network (CNN) module. The energy formulation enables us to introduce optimization algorithms with guaranteed convergence, even when the CNN module is not constrained as a contraction. Current PnP methods, which do not often have well-defined energy formulations, require a contraction constraint that restricts their performance in challenging applications. The energy and the corresponding score function are learned from reference data using denoising score matching, where the noise variance serves as a smoothness parameter that controls the shape of the learned energy function. We introduce a multi-scale optimization strategy, where a sequence of smooth approximations of the true prior is used in the optimization process. This approach improves the convergence of the algorithm to the global minimum, which translates to improved performance. The preliminary results in the context of MRI show that the multi-scale energy PnP framework offers comparable performance to unrolled algorithms. Unlike unrolled methods, the proposed PnP approach can work with arbitrary forward models, making it an easier option for clinical deployment. In addition, the training of the proposed model is more efficient from a memory and computational perspective, making it attractive in large-scale (e.g., 4D) settings.

2.Compressed Video Quality Assessment for Super-Resolution: a Benchmark and a Quality Metric

Authors:Evgeney Bogatyrev, Ivan Molodetskikh, Dmitriy Vatolin

Abstract: We developed a super-resolution (SR) benchmark to analyze SR's capacity to upscale compressed videos. Our dataset employed video codecs based on five compression standards: H.264, H.265, H.266, AV1, and AVS3. We assessed 17 state-of-the-art SR models using our benchmark and evaluated their ability to preserve scene context and their susceptibility to compression artifacts. To get an accurate perceptual ranking of SR models, we conducted a crowd-sourced side-by-side comparison of their outputs. The benchmark is publicly available at We also analyzed benchmark results and developed an objective-quality-assessment metric based on the current best-performing objective metrics. Our metric outperforms others, according to Spearman correlation with subjective scores for compressed video upscaling. It is publicly available at

1.Dynamic DH-MBIR for Phase-Error Estimation from Streaming Digital-Holography Data

Authors:Ali G. Sheikh, Casey J. Pellizzari, Sherman J. Kisner, Gregery T. Buzzard, Charles A. Bouman

Abstract: Directed energy applications require the estimation of digital-holographic (DH) phase errors due to atmospheric turbulence in order to accurately focus the outgoing beam. These phase error estimates must be computed with very low latency to keep pace with changing atmospheric parameters, which requires that phase errors be estimated in a single shot of DH data. The digital holography model-based iterative reconstruction (DH-MBIR) algorithm is capable of accurately estimating phase errors in a single shot using the expectation maximization (EM) algorithm. However, existing implementations of DH-MBIR require hundreds of iterations, which is not practical for real-time applications. In this paper, we present the Dynamic DH-MBIR (DDH-MBIR) algorithm for estimating isoplanatic phase errors from streaming single-shot data with extremely low latency. The Dynamic DH-MBIR algorithm reduces the computation and latency by orders of magnitude relative to conventional DH-MBIR, making real-time throughput and latency feasible in applications. Using simulated data that models frozen flow of atmospheric turbulence, we show that our algorithm can achieve a consistently high Strehl ratio with realistic simulation parameters using only 1 iteration per timestep.

2.WWFedCBMIR: World-Wide Federated Content-Based Medical Image Retrieval

Authors:Zahra Tabatabaei, Yuandou Wang, Adrián Colomer, Javier Oliver Moll, Zhiming Zhao, Valery Naranjo

Abstract: The paper proposes a Federated Content-Based Medical Image Retrieval (FedCBMIR) platform that utilizes Federated Learning (FL) to address the challenges of acquiring a diverse medical data set for training CBMIR models. CBMIR assists pathologists in diagnosing breast cancer more rapidly by identifying similar medical images and relevant patches in prior cases compared to traditional cancer detection methods. However, CBMIR in histopathology necessitates a pool of Whole Slide Images (WSIs) to train to extract an optimal embedding vector that leverages search engine performance, which may not be available in all centers. The strict regulations surrounding data sharing in medical data sets also hinder research and model development, making it difficult to collect a rich data set. The proposed FedCBMIR distributes the model to collaborative centers for training without sharing the data set, resulting in shorter training times than local training. FedCBMIR was evaluated in two experiments with three scenarios on BreaKHis and Camelyon17 (CAM17). The study shows that the FedCBMIR method increases the F1-Score (F1S) of each client to 98%, 96%, 94%, and 97% in the BreaKHis experiment with a generalized model of four magnifications and does so in 6.30 hours less time than total local training. FedCBMIR also achieves 98% accuracy with CAM17 in 2.49 hours less training time than local training, demonstrating that our FedCBMIR is both fast and accurate for both pathologists and engineers. In addition, our FedCBMIR provides similar images with higher magnification for non-developed countries that participate in the worldwide FedCBMIR with developed countries, to facilitate mitosis measuring in breast cancer diagnosis. We evaluate this scenario by scattering BreaKHis into four centers with different magnifications.

3.AsConvSR: Fast and Lightweight Super-Resolution Network with Assembled Convolutions

Authors:Jiaming Guo, Xueyi Zou, Yuyi Chen, Yi Liu, Jia Hao, Jianzhuang Liu, Youliang Yan

Abstract: In recent years, videos and images in 720p (HD), 1080p (FHD) and 4K (UHD) resolution have become more popular for display devices such as TVs, mobile phones and VR. However, these high resolution images cannot achieve the expected visual effect due to the limitation of the internet bandwidth, and bring a great challenge for super-resolution networks to achieve real-time performance. Following this challenge, we explore multiple efficient network designs, such as pixel-unshuffle, repeat upscaling, and local skip connection removal, and propose a fast and lightweight super-resolution network. Furthermore, by analyzing the applications of the idea of divide-and-conquer in super-resolution, we propose assembled convolutions which can adapt convolution kernels according to the input features. Experiments suggest that our method outperforms all the state-of-the-art efficient super-resolution models, and achieves optimal results in terms of runtime and quality. In addition, our method also wins the first place in NTIRE 2023 Real-Time Super-Resolution - Track 1 ($\times$2). The code will be available at

4.Domain-agnostic segmentation of thalamic nuclei from joint structural and diffusion MRI

Authors:Henry F. J. Tregidgo, Sonja Soskic, Mark D. Olchanyi, Juri Althonayan, Benjamin Billot, Chiara Maffei, Polina Golland, Anastasia Yendiki, Daniel C. Alexander, Martina Bocchetta, Jonathan D. Rohrer, Juan Eugenio Iglesias

Abstract: The human thalamus is a highly connected subcortical grey-matter structure within the brain. It comprises dozens of nuclei with different function and connectivity, which are affected differently by disease. For this reason, there is growing interest in studying the thalamic nuclei in vivo with MRI. Tools are available to segment the thalamus from 1 mm T1 scans, but the contrast of the lateral and internal boundaries is too faint to produce reliable segmentations. Some tools have attempted to incorporate information from diffusion MRI in the segmentation to refine these boundaries, but do not generalise well across diffusion MRI acquisitions. Here we present the first CNN that can segment thalamic nuclei from T1 and diffusion data of any resolution without retraining or fine tuning. Our method builds on a public histological atlas of the thalamic nuclei and silver standard segmentations on high-quality diffusion data obtained with a recent Bayesian adaptive segmentation tool. We combine these with an approximate degradation model for fast domain randomisation during training. Our CNN produces a segmentation at 0.7 mm isotropic resolution, irrespective of the resolution of the input. Moreover, it uses a parsimonious model of the diffusion signal at each voxel (fractional anisotropy and principal eigenvector) that is compatible with virtually any set of directions and b-values, including huge amounts of legacy data. We show results of our proposed method on three heterogeneous datasets acquired on dozens of different scanners. An implementation of the method is publicly available at

5.Steered Mixture-of-Experts Autoencoder Design for Real-Time Image Modelling and Denoising

Authors:Elvira Fleig, Erik Bochinski, Thomas Sikora

Abstract: Research in the past years introduced Steered Mixture-of-Experts (SMoE) as a framework to form sparse, edge-aware models for 2D and higher-dimensional pixel data, applicable to compression, denoising, and beyond, and capable of competing with state-of-the-art compression methods. To circumvent the computationally demanding, iterative optimization method used in prior works, an autoencoder design is introduced that reduces the run-time drastically while simultaneously improving reconstruction quality for block-based SMoE approaches. Coupling a deep encoder network with a shallow, parameter-free SMoE decoder enforces an efficient and explainable latent representation. Our initial work on the autoencoder design presented a simple model, with limited applicability to compression and beyond. In this paper, we build on the foundation of the first autoencoder design and improve the reconstruction quality by expanding it to models of higher complexity and different block sizes. Furthermore, we improve the noise robustness of the autoencoder for SMoE denoising applications. Our results reveal that the newly adapted autoencoders allow ultra-fast estimation of parameters for complex SMoE models with excellent reconstruction quality, both for noise-free input and under severe noise. This enables the SMoE image model framework for a wide range of image processing applications, including compression, noise reduction, and super-resolution.

6.Deep Unsupervised Learning for 3D ALS Point Clouds Change Detection

Authors:Iris de Gélis (Magellium, Toulouse, France; IRISA UMR 6074, Université Bretagne Sud, Vannes, France), Sudipan Saha (Yardi School of Artificial Intelligence, Indian Institute of Technology Delhi, New Delhi, India), Muhammad Shahzad (Technical University of Munich), Thomas Corpetti (CNRS, LETG UMR 6554, Rennes, France), Sébastien Lefèvre (IRISA UMR 6074, Université Bretagne Sud, Vannes, France), Xiao Xiang Zhu (Technical University of Munich)

Abstract: Change detection from traditional optical images has limited capability to model the changes in the height or shape of objects. Change detection using 3D point cloud aerial LiDAR survey data can fill this gap by providing critical depth information. Most existing machine learning based 3D point cloud change detection methods are supervised and depend heavily on the availability of annotated training data, which is a critical limitation in practice. To circumvent this dependence, we propose an unsupervised 3D point cloud change detection method mainly based on self-supervised learning using deep clustering and contrastive learning. The proposed method also relies on an adaptation of deep change vector analysis to 3D point clouds via nearest point comparison. Experiments conducted on a publicly available real dataset show that the proposed method obtains higher performance in comparison to the traditional unsupervised methods, with a gain of about 9% in mean accuracy (to reach more than 85%). Thus, it appears to be a relevant choice in scenarios where prior knowledge (labels) is not guaranteed.
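
As a rough illustration of the "nearest point comparison" idea mentioned in the abstract, the sketch below pairs every point of a second acquisition with its nearest neighbour in the first and measures the magnitude of the feature difference. It is not the authors' implementation: the function name `nearest_point_change` is hypothetical, the search is brute force, and point height is used as a stand-in for the learned deep features.

```python
import numpy as np

def nearest_point_change(points_a, feats_a, points_b, feats_b):
    """Change magnitude for each point of epoch B versus its nearest point in epoch A.

    points_*: (N, 3) coordinates; feats_*: (N, D) per-point features.
    Assumption: features are given; here any array works (e.g. heights).
    """
    # Pairwise squared distances, brute force (a KD-tree would scale better).
    d2 = ((points_b[:, None, :] - points_a[None, :, :]) ** 2).sum(axis=2)
    nearest = d2.argmin(axis=1)              # index of the closest A point per B point
    # Change vector = feature difference; its norm is the change magnitude.
    return np.linalg.norm(feats_b - feats_a[nearest], axis=1)

# Toy scene: a flat 5x5 grid, then the first five points are raised by 3 units.
xs, ys = np.meshgrid(np.arange(5.0), np.arange(5.0), indexing="ij")
points_a = np.stack([xs.ravel(), ys.ravel(), np.zeros(25)], axis=1)
points_b = points_a.copy()
points_b[:5, 2] = 3.0
magnitude = nearest_point_change(points_a, points_a[:, 2:3], points_b, points_b[:, 2:3])
changed = magnitude > 1.0                    # hypothetical fixed threshold
```

Thresholding the magnitude gives a binary change map; in the toy scene only the raised points are flagged.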

7.Breast Cancer Immunohistochemical Image Generation: a Benchmark Dataset and Challenge Review

Authors:Chuang Zhu, Shengjie Liu, Feng Xu, Zekuan Yu, Arpit Aggarwal, Germán Corredor, Anant Madabhushi, Qixun Qu, Hongwei Fan, Fangda Li, Yueheng Li, Xianchao Guan, Yongbing Zhang, Vivek Kumar Singh, Farhan Akram, Md. Mostafa Kamal Sarker, Zhongyue Shi, Mulan Jin

Abstract: For invasive breast cancer, immunohistochemical (IHC) techniques are often used to detect the expression level of human epidermal growth factor receptor-2 (HER2) in breast tissue to formulate a precise treatment plan. From the perspective of saving manpower, material and time costs, directly generating IHC-stained images from hematoxylin and eosin (H&E) stained images is a valuable research direction. Therefore, we held the breast cancer immunohistochemical image generation challenge, aiming to explore novel ideas of deep learning technology in pathological image generation and promote research in this field. The challenge provided registered H&E and IHC-stained image pairs, and participants were required to use these images to train a model that can directly generate IHC-stained images from corresponding H&E-stained images. We selected and reviewed the five highest-ranking methods based on their PSNR and SSIM metrics, while also providing overviews of the corresponding pipelines and implementations. In this paper, we further analyze the current limitations in the field of breast cancer immunohistochemical image generation and forecast the future development of this field. We hope that the released dataset and the challenge will inspire more scholars to jointly study higher-quality IHC-stained image generation.

8.Segmentation of fundus vascular images based on a dual-attention mechanism

Authors:Yuanyuan Peng, Pengpeng Luan, Zixu Zhang

Abstract: Accurately segmenting blood vessels in retinal fundus images is crucial for the early screening, diagnosis, and evaluation of some ocular diseases. However, significant light variations and non-uniform contrast in these images make segmentation quite challenging. Thus, this paper employs an attention fusion mechanism that combines the channel attention and spatial attention mechanisms constructed by Transformer to extract information from retinal fundus images in both spatial and channel dimensions. To eliminate noise from the encoder image, a spatial attention mechanism is introduced in the skip connection. Moreover, a Dropout layer is employed to randomly discard some neurons, which can prevent overfitting of the neural network and improve its generalization performance. Experiments were conducted on publicly available datasets DRIVE, STARE, and CHASEDB1. The results demonstrate that our method produces satisfactory results compared to some recent retinal fundus image segmentation algorithms.

9.How Segment Anything Model (SAM) Boost Medical Image Segmentation?

Authors:Yichi Zhang, Rushi Jiao

Abstract: Due to the flexibility of prompting, foundation models have become the dominant force in the domains of natural language processing and image generation. With the recent introduction of the Segment Anything Model (SAM), the prompt-driven paradigm has entered the realm of image segmentation, bringing with it a range of previously unexplored capabilities. However, it remains unclear whether it is applicable to medical image segmentation due to the significant differences between natural images and medical images. In this report, we summarize recent efforts to extend the success of SAM to medical image segmentation tasks, including both empirical benchmarking and methodological adaptations, and discuss potential future directions for SAM in medical image segmentation. We also set up a collection of literature reviews to boost the research on this topic at

1.Conditional and Residual Methods in Scalable Coding for Humans and Machines

Authors:Anderson de Andrade, Alon Harell, Yalda Foroutan, Ivan V. Bajić

Abstract: We present methods for conditional and residual coding in the context of scalable coding for humans and machines. Our focus is on optimizing the rate-distortion performance of the reconstruction task using the information available in the computer vision task. We include an information analysis of both approaches to provide baselines and also propose an entropy model suitable for conditional coding with increased modelling capacity and similar tractability as previous work. We apply these methods to image reconstruction, using, in one instance, representations created for semantic segmentation on the Cityscapes dataset, and in another instance, representations created for object detection on the COCO dataset. In both experiments, we obtain similar performance between the conditional and residual methods, with the resulting rate-distortion curves contained within our baselines.

2.Semantically Structured Image Compression via Irregular Group-Based Decoupling

Authors:Ruoyu Feng, Yixin Gao, Xin Jin, Runsen Feng, Zhibo Chen

Abstract: Image compression techniques typically focus on compressing rectangular images for human consumption, which results in transmitting redundant content for downstream applications. To overcome this limitation, some previous works propose to semantically structure the bitstream, which can meet specific application requirements by selective transmission and reconstruction. Nevertheless, they divide the input image into multiple rectangular regions according to semantics and do not prevent information interaction among them, causing wasted bitrate and distorted reconstruction of region boundaries. In this paper, we propose to decouple an image into multiple groups with irregular shapes based on a customized group mask and compress them independently. Our group mask describes the image at a finer granularity, enabling significant bitrate saving by reducing the transmission of redundant content. Moreover, to ensure the fidelity of selective reconstruction, this paper proposes the concept of a group-independent transform that maintains the independence among distinct groups. We instantiate it with the proposed Group-Independent Swin-Block (GI Swin-Block). Experimental results demonstrate that our framework structures the bitstream with negligible cost, and exhibits superior performance in both visual quality and support for intelligent tasks.

3."Seeing" Electric Network Frequency from Events

Authors:Lexuan Xu, Guang Hua, Haijian Zhang, Lei Yu, Ning Qiao

Abstract: Most of the artificial lights fluctuate in response to the grid's alternating current and exhibit subtle variations in terms of both intensity and spectrum, providing the potential to estimate the Electric Network Frequency (ENF) from conventional frame-based videos. Nevertheless, the performance of Video-based ENF (V-ENF) estimation largely relies on the imaging quality and thus may suffer from significant interference caused by non-ideal sampling, motion, and extreme lighting conditions. In this paper, we show that the ENF can be extracted without the above limitations from a new modality provided by the so-called event camera, a neuromorphic sensor that encodes the light intensity variations and asynchronously emits events with extremely high temporal resolution and high dynamic range. Specifically, we first formulate and validate the physical mechanism for the ENF captured in events, and then propose a simple yet robust Event-based ENF (E-ENF) estimation method through mode filtering and harmonic enhancement. Furthermore, we build an Event-Video ENF Dataset (EV-ENFD) that records both events and videos in diverse scenes. Extensive experiments on EV-ENFD demonstrate that our proposed E-ENF method can extract more accurate ENF traces, outperforming the conventional V-ENF by a large margin, especially in challenging environments with object motions and extreme lighting conditions. The code and dataset are available at

4.Neuralizer: General Neuroimage Analysis without Re-Training

Authors:Steffen Czolbe, Adrian V. Dalca

Abstract: Neuroimage processing tasks like segmentation, reconstruction, and registration are central to the study of neuroscience. Robust deep learning strategies and architectures used to solve these tasks are often similar. Yet, when presented with a new task or a dataset with different visual characteristics, practitioners most often need to train a new model, or fine-tune an existing one. This is a time-consuming process that poses a substantial barrier for the thousands of neuroscientists and clinical researchers who often lack the resources or machine-learning expertise to train deep learning models. In practice, this leads to a lack of adoption of deep learning, and neuroscience tools being dominated by classical frameworks. We introduce Neuralizer, a single model that generalizes to previously unseen neuroimaging tasks and modalities without the need for re-training or fine-tuning. Tasks do not have to be known a priori, and generalization happens in a single forward pass during inference. The model can solve processing tasks across multiple image modalities, acquisition methods, and datasets, and generalize to tasks and modalities it has not been trained on. Our experiments on coronal slices show that when few annotated subjects are available, our multi-task network outperforms task-specific baselines without training on the task.

5.Expanding Synthetic Real-World Degradations for Blind Video Super Resolution

Authors:Mehran Jeelani, Sadbhawna, Noshaba Cheema, Klaus Illgner-Fehns, Philipp Slusallek, Sunil Jaiswal

Abstract: Video super-resolution (VSR) techniques, especially deep-learning-based algorithms, have drastically improved over the last few years and shown impressive performance on synthetic data. However, their performance on real-world video data suffers because of the complexity of real-world degradations and misaligned video frames. Since obtaining a synthetic dataset consisting of low-resolution (LR) and high-resolution (HR) frames is easier than obtaining real-world LR and HR images, we propose in this paper synthesizing real-world degradations on synthetic training datasets. The proposed synthetic real-world degradations (SRWD) include a combination of blur, noise, downsampling, pixel binning, and image and video compression artifacts. We then propose using a random shuffling-based strategy to simulate these degradations on the training datasets and train a single end-to-end deep neural network (DNN) on the proposed larger variation of realistic synthesized training data. Our quantitative and qualitative comparative analysis shows that the proposed training strategy using diverse realistic degradations improves the performance by 7.1 % in terms of NRQM compared to RealBasicVSR and by 3.34 % compared to BSRGAN on the VideoLQ dataset. We also introduce a new dataset that contains high-resolution real-world videos that can serve as a common ground for benchmarking.
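
The random shuffling strategy described above can be pictured with a small sketch: a fixed set of degradation operators is applied in a freshly drawn random order for every training sample. This is only an assumption-laden toy, not the paper's pipeline: all function names and parameter values are hypothetical, the operators are deliberately simple, and the image and video compression artifacts are left out because they would need an actual codec.

```python
import numpy as np

def blur(img):
    """3x3 box blur with edge padding."""
    p = np.pad(img, 1, mode="edge")
    h, w = img.shape
    return sum(p[i:i + h, j:j + w] for i in range(3) for j in range(3)) / 9.0

def add_noise(img, rng, sigma=0.05):
    """Additive Gaussian noise (sigma is an illustrative value)."""
    return img + rng.normal(0.0, sigma, img.shape)

def downsample(img, factor=2):
    """Plain decimation: keep every factor-th pixel."""
    return img[::factor, ::factor]

def bin_pixels(img, factor=2):
    """Pixel binning: average non-overlapping factor x factor blocks."""
    h = (img.shape[0] // factor) * factor
    w = (img.shape[1] // factor) * factor
    blocks = img[:h, :w].reshape(h // factor, factor, w // factor, factor)
    return blocks.mean(axis=(1, 3))

def degrade(img, rng):
    """Apply all operators once, in a random order drawn per call."""
    ops = [blur, lambda x: add_noise(x, rng), downsample, bin_pixels]
    order = rng.permutation(len(ops))
    for idx in order:
        img = ops[idx](img)
    return np.clip(img, 0.0, 1.0), order

rng = np.random.default_rng(0)
hr = rng.random((64, 64))          # stand-in for a high-resolution frame in [0, 1]
lr, order = degrade(hr, rng)       # low-resolution frame and the order that was used
```

Because the order changes from sample to sample, the same HR frame yields many different LR counterparts, which is the source of the "larger variation" in the training data.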

6.Using Spatio-Temporal Dual-Stream Network with Self-Supervised Learning for Lung Tumor Classification on Radial Probe Endobronchial Ultrasound Video

Authors:Ching-Kai Lin, Chin-Wen Chen, Yun-Chien Cheng

Abstract: The purpose of this study is to develop a computer-aided diagnosis system for classifying benign and malignant lung lesions, and to assist physicians in real-time analysis of radial probe endobronchial ultrasound (EBUS) videos. During the biopsy process of lung cancer, physicians use real-time ultrasound images to find suitable lesion locations for sampling. However, most of these images are difficult to classify and contain a lot of noise. Previous studies have employed 2D convolutional neural networks to effectively differentiate between benign and malignant lung lesions, but doctors still need to manually select good-quality images, which can result in additional labor costs. In addition, the 2D neural network has no ability to capture the temporal information of the ultrasound video, so it is difficult to obtain the relationship between the features of the continuous images. This study designs an automatic diagnosis system based on a 3D neural network, uses the SlowFast architecture as the backbone to fuse temporal and spatial features, and uses the SwAV method of contrastive learning to enhance the noise robustness of the model. The proposed method has the following advantages: (1) it uses clinical ultrasound films as model input, thereby reducing the need for high-quality image selection by physicians; (2) its high-accuracy classification of benign and malignant lung lesions can assist doctors in clinical diagnosis and reduce the time and risk of surgery; and (3) it classifies well even in the presence of significant image noise. The AUC, accuracy, precision, recall and specificity of our proposed method on the validation set reached 0.87, 83.87%, 86.96%, 90.91% and 66.67%, respectively. The results have verified the importance of incorporating temporal information and the effectiveness of using the method of contrastive learning on feature extraction.

7.Spatial and Modal Optimal Transport for Fast Cross-Modal MRI Reconstruction

Authors:Qi Wang, Zhijie Wen, Jun Shi, Qian Wang, Dinggang Shen, Shihui Ying

Abstract: Multi-modal Magnetic Resonance Imaging (MRI) plays an important role in clinical medicine. However, the acquisition of some modalities, such as the T2-weighted modality, takes a long time and is always accompanied by motion artifacts. On the other hand, the T1-weighted image (T1WI), which needs a shorter scanning time, shares the same underlying information with the T2-weighted image (T2WI). Therefore, in this paper we accelerate the acquisition of the T2WI by introducing the auxiliary modality (T1WI). Concretely, we first reconstruct high-quality T2WIs from under-sampled T2WIs. Here, we realize fast T2WI reconstruction by reducing the sampling rate in the k-space. Second, we establish a cross-modal synthesis task to generate the synthetic T2WIs for guiding better T2WI reconstruction. Here, we obtain the synthetic T2WIs by decomposing the whole cross-modal generation mapping into two optimal transport (OT) processes, the spatial alignment mapping on the T1 image manifold and the cross-modal synthesis mapping from aligned T1WIs to T2WIs. It overcomes the negative transfer caused by the spatial misalignment. Then, we prove that the reconstruction and synthesis tasks are complementary. Finally, we compare it with state-of-the-art approaches on an open dataset FastMRI and an in-house dataset to verify the validity of the proposed method.
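
To make "reducing the sampling rate in the k-space" concrete, the sketch below simulates Cartesian undersampling of a 2D image and the resulting zero-filled reconstruction, which is the degraded input that a learned reconstruction would start from. The mask design (every fourth line plus a fully sampled centre band) and the function name `undersample_kspace` are assumptions for illustration, not the sampling pattern used in the paper.

```python
import numpy as np

def undersample_kspace(image, keep_every=4, center_lines=8):
    """Return (line mask, zero-filled reconstruction) for a Cartesian mask."""
    k = np.fft.fftshift(np.fft.fft2(image))        # centred k-space
    rows = image.shape[0]
    mask = np.zeros(rows, dtype=bool)
    mask[::keep_every] = True                      # regular subsampling of lines
    c = rows // 2
    mask[c - center_lines // 2 : c + center_lines // 2] = True  # keep low frequencies
    k_under = k * mask[:, None]                    # zero the unsampled lines
    recon = np.fft.ifft2(np.fft.ifftshift(k_under))
    return mask, np.abs(recon)

rng = np.random.default_rng(0)
image = rng.random((32, 32))                       # stand-in for a T2-weighted slice
mask, zero_filled = undersample_kspace(image)
sampling_rate = mask.mean()                        # fraction of k-space lines acquired
```

Fewer acquired lines mean a shorter scan but stronger aliasing in `zero_filled`; with `keep_every=1` the original image is recovered exactly.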

8.Comparison of different retinal regions-of-interest imaged by OCT for the classification of intermediate AMD

Authors:Danilo A. Jesus, Eric F. Thee, Tim Doekemeijer, Daniel Luttikhuizen, Caroline Klaver, Stefan Klein, Theo van Walsum, Hans Vingerling, Luisa Sanchez

Abstract: To study whether it is possible to differentiate intermediate age-related macular degeneration (AMD) from healthy controls using partial optical coherence tomography (OCT) data, that is, restricting the input B-scans to certain pre-defined regions of interest (ROIs). A total of 15744 B-scans from 269 intermediate AMD patients and 115 normal subjects were used in this study (split on subject level in 80% train, 10% validation and 10% test). From each OCT B-scan, three ROIs were extracted: retina, complex between retinal pigment epithelium (RPE) and Bruch membrane (BM), and choroid (CHO). These ROIs were obtained using two different methods: masking and cropping. In addition to the six ROIs, the whole OCT B-scan and the binary mask corresponding to the segmentation of the RPE-BM complex were used. For each subset, a convolutional neural network (based on VGG16 architecture and pre-trained on ImageNet) was trained and tested. The performance of the models was evaluated using the area under the receiver operating characteristic (AUROC), accuracy, sensitivity, and specificity. All trained models presented an AUROC, accuracy, sensitivity, and specificity equal to or higher than 0.884, 0.816, 0.685, and 0.644, respectively. The model trained on the whole OCT B-scan presented the best performance (AUROC = 0.983, accuracy = 0.927, sensitivity = 0.862, specificity = 0.913). The models trained on the ROIs obtained with the cropping method led to significantly higher outcomes than those obtained with masking, with the exception of the retinal tissue, where no statistically significant difference was observed between cropping and masking (p = 0.47). This study demonstrated that while using the complete OCT B-scan provided the highest accuracy in classifying intermediate AMD, models trained on specific ROIs such as the RPE-BM complex or the choroid can still achieve high performance.

9.The Polynomial Connection between Morphological Dilation and Discrete Convolution

Authors:Vivek Sridhar, Keyvan Shahin, Michael Breuß, Marc Reichenbach

Abstract: In this paper we consider the fundamental operations of mathematical morphology, dilation and erosion. Many powerful image filtering operations are based on their combinations. We establish a homomorphism between the max-plus semi-ring of integers and a subset of polynomials over the field of real numbers. This makes it possible to reformulate the task of computing morphological dilation as that of computing sums and products of polynomials. Therefore, dilation and its dual operation erosion can be computed by convolution of discrete linear signals, which is efficiently accomplished using a Fast Fourier Transform technique. The novel method may deal with non-flat filters and incorporates no restrictions on shape or size of the structuring element, unlike many other fast methods in the field. In contrast to previous fast Fourier techniques it gives exact results and is not an approximation. The new method is in practice particularly suitable for filtering images with small tonal range or when employing large filter sizes. We explore the benefits by investigating an implementation on FPGA hardware. Several experiments demonstrate the exactness and efficiency of the proposed method.
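
The max-plus-to-polynomial idea can be tried on a 1D toy signal. In the sketch below, each non-negative integer grey value v is encoded as the monomial t^v, so that taking a maximum corresponds to the degree of a sum of monomials and adding values corresponds to multiplying them; the dilation then becomes a 2D linear convolution (positions and exponents both add), computed here with NumPy's FFT. This is one reading of the abstract under stated assumptions (1D signals, non-negative integer values, full-length output), not the authors' FPGA implementation, and the function names are hypothetical.

```python
import numpy as np

def dilate_bruteforce(f, g):
    """Max-plus dilation: out[k] = max over i + j = k of f[i] + g[j]."""
    out = np.full(len(f) + len(g) - 1, -1, dtype=int)
    for i, fv in enumerate(f):
        for j, gv in enumerate(g):
            out[i + j] = max(out[i + j], fv + gv)
    return out

def dilate_polynomial(f, g):
    """Same dilation via polynomial products evaluated with a 2D FFT."""
    f = np.asarray(f)
    g = np.asarray(g)
    n = len(f) + len(g) - 1              # number of output positions
    d = int(f.max() + g.max()) + 1       # number of possible exponents
    F = np.zeros((n, d))
    G = np.zeros((n, d))
    F[np.arange(len(f)), f] = 1.0        # F[i, v] = 1 iff f[i] == v, i.e. t**f[i]
    G[np.arange(len(g)), g] = 1.0
    # Zero padding to (n, d) makes the circular convolution a linear one.
    H = np.fft.irfft2(np.fft.rfft2(F) * np.fft.rfft2(G), s=(n, d))
    present = np.rint(H) > 0             # coefficients are integer counts, so rounding is exact
    # Degree of each polynomial = largest exponent with a non-zero coefficient.
    return d - 1 - np.argmax(present[:, ::-1], axis=1)

f = [0, 3, 1, 2]                         # signal (non-negative integers)
g = [1, 0, 2]                            # non-flat structuring function
fast = dilate_polynomial(f, g)
slow = dilate_bruteforce(f, g)
```

The second array dimension grows with the tonal range, which is consistent with the abstract's remark that the method suits images with a small tonal range or large filters.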

1.DPSeq: A Novel and Efficient Digital Pathology Classifier for Predicting Cancer Biomarkers using Sequencer Architecture

Authors:Min Cen, Xingyu Li, Bangwei Guo, Jitendra Jonnagaddala, Hong Zhang, Xu Steven Xu

Abstract: In digital pathology tasks, transformers have achieved state-of-the-art results, surpassing convolutional neural networks (CNNs). However, transformers are usually complex and resource intensive. In this study, we developed a novel and efficient digital pathology classifier called DPSeq, to predict cancer biomarkers through fine-tuning a sequencer architecture integrating horizontal and vertical bidirectional long short-term memory (BiLSTM) networks. Using hematoxylin and eosin (H&E)-stained histopathological images of colorectal cancer (CRC) from two international datasets, The Cancer Genome Atlas (TCGA) and Molecular and Cellular Oncology (MCO), the predictive performance of DPSeq was evaluated in a series of experiments. DPSeq demonstrated exceptional performance for predicting key biomarkers in CRC (MSI status, Hypermutation, CIMP status, BRAF mutation, TP53 mutation and chromosomal instability [CING]), outperforming most published state-of-the-art classifiers in a within-cohort internal validation and a cross-cohort external validation. Additionally, under the same experimental conditions using the same set of training and testing datasets, DPSeq surpassed 4 CNN (ResNet18, ResNet50, MobileNetV2, and EfficientNet) and 2 transformer (ViT and Swin-T) models, achieving the highest AUROC and AUPRC values in predicting MSI status, BRAF mutation, and CIMP status. Furthermore, DPSeq required less time for both training and prediction due to its simple architecture. Therefore, DPSeq appears to be the preferred choice over transformer and CNN models for predicting cancer biomarkers.

2.Extraction of volumetric indices from echocardiography: which deep learning solution for clinical use?

Authors:Hang Jung Ling, Nathan Painchaud, Pierre-Yves Courand, Pierre-Marc Jodoin, Damien Garcia, Olivier Bernard

Abstract: Deep learning-based methods have spearheaded the automatic analysis of echocardiographic images, taking advantage of the publication of multiple open access datasets annotated by experts (CAMUS being one of the largest public databases). However, these models are still considered unreliable by clinicians due to unresolved issues concerning i) the temporal consistency of their predictions, and ii) their ability to generalize across datasets. In this context, we propose a comprehensive comparison between the current best performing methods in medical/echocardiographic image segmentation, with a particular focus on temporal consistency and cross-dataset aspects. We introduce a new private dataset, named CARDINAL, of apical two-chamber and apical four-chamber sequences, with reference segmentation over the full cardiac cycle. We show that the proposed 3D nnU-Net outperforms alternative 2D and recurrent segmentation methods. We also report that the best models trained on CARDINAL, when tested on CAMUS without any fine-tuning, still manage to perform competitively with respect to prior methods. Overall, the experimental results suggest that with sufficient training data, 3D nnU-Net could become the first automated tool to finally meet the standards of an everyday clinical device.

3.Semi-Supervised Segmentation of Functional Tissue Units at the Cellular Level

Authors:Volodymyr Sydorskyi, Igor Krashenyi, Denis Savka, Oleksandr Zarichkovyi

Abstract: We present a new method for functional tissue unit segmentation at the cellular level, which utilizes the latest deep learning semantic segmentation approaches together with domain adaptation and semi-supervised learning techniques. This approach minimizes the domain gap, the class imbalance, and the influence of capture settings between the HPA and HubMAP datasets. The presented approach achieves results comparable with the state of the art in functional tissue unit segmentation at the cellular level. The source code is available at

1.Geometric Prior Based Deep Human Point Cloud Geometry Compression

Authors:Xinju Wu, Pingping Zhang, Meng Wang, Peilin Chen, Shiqi Wang, Sam Kwong

Abstract: The emergence of digital avatars has driven an exponential increase in the demand for human point clouds with realistic and intricate details. The compression of such data becomes challenging with overwhelming data amounts comprising millions of points. Herein, we leverage the human geometric prior in geometry redundancy removal of point clouds, greatly improving the compression performance. More specifically, the prior provides topological constraints as geometry initialization, allowing adaptive adjustments with a compact parameter set that could be represented with only a few bits. Therefore, we can envisage high-resolution human point clouds as a combination of geometric priors and structural deviations. The priors could first be derived with an aligned point cloud, and subsequently the difference of features is compressed into a compact latent code. The proposed framework can operate in a plug-and-play fashion with existing learning-based point cloud compression methods. Extensive experimental results show that our approach significantly improves the compression performance without deteriorating the quality, demonstrating its promise in a variety of applications.

2.Self-supervised arbitrary scale super-resolution framework for anisotropic MRI

Authors:Haonan Zhang, Yuhan Zhang, Qing Wu, Jiangjie Wu, Zhiming Zhen, Feng Shi, Jianmin Yuan, Hongjiang Wei, Chen Liu, Yuyao Zhang

Abstract: In this paper, we propose an efficient self-supervised arbitrary-scale super-resolution (SR) framework to reconstruct isotropic magnetic resonance (MR) images from anisotropic MRI inputs without involving external training data. The proposed framework builds a training dataset using in-the-wild anisotropic MR volumes with arbitrary image resolution. We then formulate the 3D volume SR task as a SR problem for 2D image slices. The anisotropic volume's high-resolution (HR) plane is used to build the HR-LR image pairs for model training. We further adapt the implicit neural representation (INR) network to implement the 2D arbitrary-scale image SR model. Finally, we leverage the well-trained proposed model to up-sample the 2D LR plane extracted from the anisotropic MR volumes to their HR views. The isotropic MR volumes thus can be reconstructed by stacking and averaging the generated HR slices. Our proposed framework has two major advantages: (1) It only involves the arbitrary-resolution anisotropic MR volumes, which greatly improves the model practicality in real MR imaging scenarios (e.g., clinical brain image acquisition); (2) The INR-based SR model enables arbitrary-scale image SR from the arbitrary-resolution input image, which significantly improves model training efficiency. We perform experiments on a simulated public adult brain dataset and a real collected 7T brain dataset. The results indicate that our current framework greatly outperforms two well-known self-supervised models for anisotropic MR image SR tasks.

1.LCAUnet: A skin lesion segmentation network with enhanced edge and body fusion

Authors:Qisen Ma, Keming Mao, Gao Wang, Lisheng Xu, Yuhai Zhao

Abstract: Accurate segmentation of skin lesions in dermatoscopic images is crucial for the early diagnosis of skin cancer and improving the survival rate of patients. However, it is still a challenging task due to the irregularity of lesion areas, the fuzziness of boundaries, and other complex interference factors. In this paper, a novel LCAUnet is proposed to improve the ability of complementary representation with fusion of edge and body features, which are often paid little attention in traditional methods. First, two separate branches are set for edge and body segmentation with CNN-based and Transformer-based architectures respectively. Then, an LCAF module is utilized to fuse feature maps of edge and body of the same level by a local cross-attention operation in the encoder stage. Furthermore, a PGMF module is embedded for feature integration with prior guided multi-scale adaption. Comprehensive experiments on the publicly available datasets ISIC 2017, ISIC 2018, and PH2 demonstrate that LCAUnet outperforms most state-of-the-art methods. The ablation studies also verify the effectiveness of the proposed fusion techniques.

2.A Novel Low-Rank Tensor Method for Undersampling Artifact Removal in Respiratory Motion-Resolved Multi-Echo 3D Cones MRI

Authors:Seongho Jeong, MungSoo Kang, Gerald Behr, Heechul Jeong, Youngwook Kee

Abstract: We propose a novel low-rank tensor method for respiratory motion-resolved multi-echo image reconstruction. The key idea is to construct a 3-way image tensor (space $\times$ echo $\times$ motion state) from the conventional gridding reconstruction of highly undersampled multi-echo k-space raw data, and exploit low-rank tensor structure to separate it from undersampling artifacts. Healthy volunteers and patients with iron overload were recruited and imaged on a 3T clinical MRI system for this study. Results show that our proposed method successfully reduced severe undersampling artifacts in respiratory motion-state resolved complex source images, as well as subsequent R2* and quantitative susceptibility mapping (QSM). Compared to conventional respiratory motion-resolved compressed sensing (CS) image reconstruction, the proposed method had a reconstruction time at least three times faster, accounting for signal evolution along the echo dimension in the multi-echo data.

3.Early Detection of Alzheimer's Disease using Bottleneck Transformers

Authors:Arunima Jaiswal, Ananya Sadana

Abstract: Early detection of Alzheimer's Disease (AD) and its prodromal state, Mild Cognitive Impairment (MCI), is crucial for providing suitable treatment and preventing the disease from progressing. It can also help researchers and clinicians identify early biomarkers and administer new treatments that have been a subject of extensive research. The application of deep learning techniques on structural Magnetic Resonance Imaging (MRI) has shown promising results in diagnosing the disease. In this research, we intend to introduce a novel approach of using an ensemble of the self-attention-based Bottleneck Transformers with a sharpness aware minimizer for early detection of Alzheimer's Disease. The proposed approach has been tested on the widely accepted ADNI dataset and evaluated using accuracy, precision, recall, F1 score, and ROC-AUC score as the performance metrics.

4.Probabilistic 3D segmentation for aleatoric uncertainty quantification in full 3D medical data

Authors:Christiaan G. A. Viviers, Amaan M. M. Valiuddin, Peter H. N. de With, Fons van der Sommen

Abstract: Uncertainty quantification in medical images has become an essential addition to segmentation models for practical application in the real world. Although there are valuable developments in accurate uncertainty quantification methods using 2D images and slices of 3D volumes, in clinical practice, the complete 3D volumes (such as CT and MRI scans) are used to evaluate and plan the medical procedure. As a result, the existing 2D methods miss the rich 3D spatial information when resolving the uncertainty. A popular approach for quantifying the ambiguity in the data is to learn a distribution over the possible hypotheses. In recent work, this ambiguity has been modeled to be strictly Gaussian. Normalizing Flows (NFs) are capable of modelling more complex distributions and thus better fit the embedding space of the data. To this end, we have developed a 3D probabilistic segmentation framework augmented with NFs to capture distributions of varying complexity. To test the proposed approach, we evaluate the model on the LIDC-IDRI dataset for lung nodule segmentation and quantify the aleatoric uncertainty introduced by the multi-annotator setting and inherent ambiguity in the CT data. Following this approach, we are the first to present a 3D Squared Generalized Energy Distance (GED) of 0.401 and a high 0.468 Hungarian-matched 3D IoU. The obtained results reveal the value in capturing the 3D uncertainty, using a flexible posterior distribution augmented with a Normalizing Flow. Finally, we present the aleatoric uncertainty in a visual manner with the aim of providing clinicians with additional insight into data ambiguity and facilitating more informed decision-making.
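
For readers unfamiliar with the metric, the abstract does not spell out the squared Generalized Energy Distance. The form commonly used in the probabilistic segmentation literature is given below as a reference sketch. The symbols $P_{\mathrm{out}}$ (model distribution), $P_{\mathrm{gt}}$ (annotator distribution) and the choice $d = 1 - \mathrm{IoU}$ are assumptions taken from that common usage, not from this abstract.

```latex
D^2_{\mathrm{GED}}(P_{\mathrm{gt}}, P_{\mathrm{out}})
  = 2\,\mathbb{E}\big[d(S, Y)\big]
  - \mathbb{E}\big[d(S, S')\big]
  - \mathbb{E}\big[d(Y, Y')\big],
\qquad
d(A, B) = 1 - \mathrm{IoU}(A, B),
```

where $S, S'$ are independent samples from $P_{\mathrm{out}}$ and $Y, Y'$ are independent samples from $P_{\mathrm{gt}}$. Lower values indicate that the predicted segmentations match both the accuracy and the diversity of the annotations.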

1.Segment Anything Model for Medical Images?

Authors:Yuhao Huang, Xin Yang, Lian Liu, Han Zhou, Ao Chang, Xinrui Zhou, Rusi Chen, Junxuan Yu, Jiongquan Chen, Chaoyu Chen, Haozhe Chi, Xindi Hu, Deng-Ping Fan, Fajin Dong, Dong Ni

Abstract: The Segment Anything Model (SAM) is the first foundation model for general image segmentation. It designed a novel promptable segmentation task, ensuring zero-shot image segmentation using the pre-trained model via two main modes including automatic everything and manual prompt. SAM has achieved impressive results on various natural image segmentation tasks. However, medical image segmentation (MIS) is more challenging due to the complex modalities, fine anatomical structures, uncertain and complex object boundaries, and wide-range object scales. Meanwhile, zero-shot and efficient MIS can well reduce the annotation time and boost the development of medical image analysis. Hence, SAM seems to be a potential tool and its performance on large medical datasets should be further validated. We collected and sorted 52 open-source datasets, and built a large medical segmentation dataset with 16 modalities, 68 objects, and 553K slices. We conducted a comprehensive analysis of different SAM testing strategies on the so-called COSMOS 553K dataset. Extensive experiments validate that SAM performs better with manual hints like points and boxes for object perception in medical images, leading to better performance in prompt mode compared to everything mode. Additionally, SAM shows remarkable performance in some specific objects and modalities, but is imperfect or even totally fails in other situations. Finally, we analyze the influence of different factors (e.g., the Fourier-based boundary complexity and size of the segmented objects) on SAM's segmentation performance. Extensive experiments validate that SAM's zero-shot segmentation capability is not sufficient to ensure its direct application to the MIS.

2.SAM Meets Robotic Surgery: An Empirical Study in Robustness Perspective

Authors:An Wang, Mobarakol Islam, Mengya Xu, Yang Zhang, Hongliang Ren

Abstract: Segment Anything Model (SAM) is a foundation model for semantic segmentation and shows excellent generalization capability with the prompts. In this empirical study, we investigate the robustness and zero-shot generalizability of the SAM in the domain of robotic surgery in various settings of (i) prompted vs. unprompted; (ii) bounding box vs. points-based prompt; (iii) generalization under corruptions and perturbations with five severity levels; and (iv) state-of-the-art supervised model vs. SAM. We conduct all the observations with two well-known robotic instrument segmentation datasets of MICCAI EndoVis 2017 and 2018 challenges. Our extensive evaluation results reveal that although SAM shows remarkable zero-shot generalization ability with bounding box prompts, it struggles to segment the whole instrument with point-based prompts and unprompted settings. Furthermore, our qualitative figures demonstrate that the model either failed to predict the parts of the instrument mask (e.g., jaws, wrist) or predicted parts of the instrument as different classes in the scenario of overlapping instruments within the same bounding box or with the point-based prompt. In fact, it is unable to identify instruments in some complex surgical scenarios of blood, reflection, blur, and shade. Additionally, SAM is insufficiently robust to maintain high performance when subjected to various forms of data corruption. Therefore, we can argue that SAM is not ready for downstream surgical tasks without further domain-specific fine-tuning.

3.An Efficient Hash-based Data Structure for Dynamic Vision Sensors and its Application to Low-energy Low-memory Noise Filtering

Authors:Pradeep Kumar Gopalakrishnan, Chip-Hong Chang, Arindam Basu

Abstract: Events generated by the Dynamic Vision Sensor (DVS) are generally stored and processed in two-dimensional data structures whose memory complexity and energy-per-event scale proportionately with increasing sensor dimensions. In this paper, we propose a new two-dimensional data structure (BF_2) that takes advantage of the sparsity of events and enables compact storage of data using hash functions. It overcomes the saturation issue in the Bloom Filter (BF) and the memory reset issue in other hash-based arrays by using a second dimension to clear 1 out of D rows at regular intervals. A hardware-friendly, low-power, and low-memory-footprint noise filter for DVS is demonstrated using BF_2. For the tested datasets, the performance of the filter matches those of state-of-the-art filters like the BAF/STCF while consuming less than 10% and 15% of their memory and energy-per-event, respectively, for a correlation time constant Tau = 5 ms. The memory and energy advantages of the proposed filter increase with increasing sensor sizes. The proposed filter compares favourably with other hardware-friendly, event-based filters in hardware complexity, memory requirement and energy-per-event - as demonstrated through its implementation on an FPGA. The parameters of the data structure can be adjusted for trade-offs between performance and memory consumption, based on application requirements.
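
To make the idea of a hash-based array with periodic row clearing more concrete, here is a toy Python sketch. It is not the authors' BF_2 design: the class name `RowClearingHashArray`, the use of one hashed bit per row, the BLAKE2 hash, and the "any row" query rule are all assumptions made for illustration only.

```python
import hashlib


class RowClearingHashArray:
    """Toy D x M bit array: one hashed bit per row, rows cleared round-robin."""

    def __init__(self, rows, cols):
        self.rows = rows
        self.cols = cols
        self.bits = [[False] * cols for _ in range(rows)]
        self.next_row = 0  # row that the next clearing step will reset

    def _column(self, row, x, y):
        # Row-specific hash of the pixel address (x, y); hypothetical choice of hash.
        key = f"{row}:{x}:{y}".encode()
        digest = hashlib.blake2b(key, digest_size=8).digest()
        return int.from_bytes(digest, "big") % self.cols

    def insert(self, x, y):
        # Record an event by setting one bit in every row.
        for row in range(self.rows):
            self.bits[row][self._column(row, x, y)] = True

    def contains(self, x, y):
        # Assumed query rule: an event counts as recent while any row still holds its bit.
        return any(
            self.bits[row][self._column(row, x, y)] for row in range(self.rows)
        )

    def clear_next_row(self):
        # Called at regular intervals: reset 1 out of D rows, then advance.
        self.bits[self.next_row] = [False] * self.cols
        self.next_row = (self.next_row + 1) % self.rows
```

In this sketch the memory is fixed at D x M bits regardless of sensor resolution, and old events fade out after at most D clearing steps, so the array never saturates and never needs a full reset. Hash collisions can still produce false positives, and the rate depends on the chosen D and M.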

4.Making the Invisible Visible: Toward High-Quality Terahertz Tomographic Imaging via Physics-Guided Restoration

Authors:Weng-Tai Su, Yi-Chun Hung, Po-Jen Yu, Shang-Hua Yang, Chia-Wen Lin

Abstract: Terahertz (THz) tomographic imaging has recently attracted significant attention thanks to its non-invasive, non-destructive, non-ionizing, material-classification, and ultra-fast nature for object exploration and inspection. However, its strong water absorption nature and low noise tolerance lead to undesired blurs and distortions of reconstructed THz images. The diffraction-limited THz signals highly constrain the performances of existing restoration methods. To address the problem, we propose a novel multi-view Subspace-Attention-guided Restoration Network (SARNet) that fuses multi-view and multi-spectral features of THz images for effective image restoration and 3D tomographic reconstruction. To this end, SARNet uses multi-scale branches to extract intra-view spatio-spectral amplitude and phase features and fuse them via shared subspace projection and self-attention guidance. We then perform inter-view fusion to further improve the restoration of individual views by leveraging the redundancies between neighboring views. Here, we experimentally construct a THz time-domain spectroscopy (THz-TDS) system covering a broad frequency range from 0.1 THz to 4 THz for building up a temporal/spectral/spatial/material THz database of hidden 3D objects. Complementary to a quantitative evaluation, we demonstrate the effectiveness of our SARNet model on 3D THz tomographic reconstruction applications.

5.Unified Noise-aware Network for Low-count PET Denoising

Authors:Huidong Xie, Qiong Liu, Bo Zhou, Xiongchao Chen, Xueqi Guo, Chi Liu

Abstract: As PET imaging is accompanied by substantial radiation exposure and cancer risk, reducing radiation dose in PET scans is an important topic. However, low-count PET scans often suffer from high image noise, which can negatively impact image quality and diagnostic performance. Recent advances in deep learning have shown great potential for recovering underlying signal from noisy counterparts. However, neural networks trained on a specific noise level cannot be easily generalized to other noise levels due to different noise amplitudes and variances. To obtain optimal denoised results, we may need to train multiple networks using data with different noise levels. But this approach may be infeasible in reality due to limited data availability. Denoising dynamic PET images presents an additional challenge due to tracer decay and continuously changing noise levels across dynamic frames. To address these issues, we propose a Unified Noise-aware Network (UNN) that combines multiple sub-networks with varying denoising power to generate optimal denoised results regardless of the input noise levels. Evaluated using large-scale data from two medical centers with different vendors, the presented results show that the UNN can consistently produce promising denoised results regardless of input noise levels, and demonstrates superior performance over networks trained on single noise level data, especially for extremely low-count data.

1.A Deep Registration Method for Accurate Quantification of Joint Space Narrowing Progression in Rheumatoid Arthritis

Authors:Haolin Wang, Yafei Ou, Wanxuan Fang, Prasoon Ambalathankandy, Naoto Goto, Gen Ota, Masayuki Ikebe, Tamotsu Kamishima

Abstract: Rheumatoid arthritis (RA) is a chronic autoimmune inflammatory disease that results in progressive articular destruction and severe disability. Joint space narrowing (JSN) progression has been regarded as an important indicator for RA progression and has received sustained attention. In the diagnosis and monitoring of RA, radiology plays a crucial role in monitoring joint space. A new framework for monitoring joint space by quantifying JSN progression through image registration in radiographic images has been developed. This framework offers the advantage of high accuracy; however, challenges remain in reducing mismatches and improving reliability. In this work, a deep intra-subject rigid registration network is proposed to automatically quantify JSN progression in the early stage of RA. In our experiments, the mean-square error of Euclidean distance between moving and fixed image is 0.0031, standard deviation is 0.0661 mm, and the mismatching rate is 0.48%. The proposed method has sub-pixel level accuracy, exceeding manual measurements by far, and is robust to noise, rotation, and scaling of joints. Moreover, this work provides loss visualization, which can aid radiologists and rheumatologists in assessing quantification reliability, with important implications for possible future clinical applications. As a result, we are optimistic that this proposed work will make a significant contribution to the automatic quantification of JSN progression in RA.

1.Detection of Alzheimer's Disease using MRI scans based on Inertia Tensor and Machine Learning

Authors:Krishna Mahapatra, Selvakumar R

Abstract: Alzheimer's Disease is a devastating neurological disorder that is increasingly affecting the elderly population. Early and accurate detection of Alzheimer's is crucial for providing effective treatment and support for patients and their families. In this study, we present a novel approach for detecting four different stages of Alzheimer's disease from MRI scan images based on inertia tensor analysis and machine learning. From each available MRI scan image for different classes of Dementia, we first compute a very simple 2 x 2 matrix, using the techniques of forming a moment of inertia tensor, which is largely used in different physical problems. Using the properties of the obtained inertia tensor and their eigenvalues, along with some other machine learning techniques, we were able to significantly classify the different types of Dementia. This process provides a new and unique approach to identifying and classifying different types of images using machine learning, achieving a classification accuracy of 90%. Our proposed method not only has the potential to be more cost-effective than current methods but also provides a new physical insight into the disease by reducing the dimension of the image matrix. The results of our study highlight the potential of this approach for advancing the field of Alzheimer's disease detection and improving patient outcomes.
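
As a rough illustration of the 2 x 2 matrix mentioned in the abstract, the sketch below treats pixel intensities as point masses and computes a planar moment-of-inertia tensor about the intensity-weighted centroid, followed by its eigenvalues. The function names `inertia_tensor_2d` and `inertia_features` and the exact tensor convention are assumptions for illustration; the abstract does not specify the authors' formulation.

```python
import numpy as np


def inertia_tensor_2d(image):
    """Return the 2 x 2 inertia tensor of a grayscale image about its intensity centroid."""
    img = np.asarray(image, dtype=float)
    total = img.sum()
    if total <= 0:
        raise ValueError("image must contain positive total intensity")
    ys, xs = np.indices(img.shape)  # row (y) and column (x) coordinates of every pixel
    cx = (xs * img).sum() / total   # intensity-weighted centroid, x
    cy = (ys * img).sum() / total   # intensity-weighted centroid, y
    dx, dy = xs - cx, ys - cy
    ixx = (img * dy ** 2).sum()     # moment about the x axis
    iyy = (img * dx ** 2).sum()     # moment about the y axis
    ixy = -(img * dx * dy).sum()    # product of inertia
    return np.array([[ixx, ixy], [ixy, iyy]])


def inertia_features(image):
    """Eigenvalues (ascending) of the inertia tensor, usable as a two-number feature vector."""
    return np.linalg.eigvalsh(inertia_tensor_2d(image))
```

Under these assumptions each scan is reduced to a symmetric 2 x 2 matrix, so a downstream classifier would see only a handful of numbers per image instead of the full pixel grid.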

2.Low-field magnetic resonance image enhancement via stochastic image quality transfer

Authors:Hongxiang Lin, Matteo Figini, Felice D'Arco, Godwin Ogbole, Ryutaro Tanno, Stefano B. Blumberg, Lisa Ronan, Biobele J. Brown, David W. Carmichael, Ikeoluwa Lagunju, Judith Helen Cross, Delmiro Fernandez-Reyes, Daniel C. Alexander

Abstract: Low-field (<1T) magnetic resonance imaging (MRI) scanners remain in widespread use in low- and middle-income countries (LMICs) and are commonly used for some applications in higher income countries e.g. for small child patients with obesity, claustrophobia, implants, or tattoos. However, low-field MR images commonly have lower resolution and poorer contrast than images from high field (1.5T, 3T, and above). Here, we present Image Quality Transfer (IQT) to enhance low-field structural MRI by estimating from a low-field image the image we would have obtained from the same subject at high field. Our approach uses (i) a stochastic low-field image simulator as the forward model to capture uncertainty and variation in the contrast of low-field images corresponding to a particular high-field image, and (ii) an anisotropic U-Net variant specifically designed for the IQT inverse problem. We evaluate the proposed algorithm both in simulation and using multi-contrast (T1-weighted, T2-weighted, and fluid attenuated inversion recovery (FLAIR)) clinical low-field MRI data from an LMIC hospital. We show the efficacy of IQT in improving contrast and resolution of low-field MR images. We demonstrate that IQT-enhanced images have potential for enhancing visualisation of anatomical structures and pathological lesions of clinical relevance from the perspective of radiologists. IQT is shown to be capable of boosting the diagnostic value of low-field MRI, especially in low-resource settings.

3.DiffuseExpand: Expanding dataset for 2D medical image segmentation using diffusion models

Authors:Shitong Shao, Xiaohan Yuan, Zhen Huang, Ziming Qiu, Shuai Wang, Kevin Zhou

Abstract: Dataset expansion can effectively alleviate the data scarcity in medical image segmentation that arises from privacy concerns and labeling difficulties. However, existing expansion algorithms still face great challenges due to their inability to guarantee the diversity of synthesized images with paired segmentation masks. In recent years, Diffusion Probabilistic Models (DPMs) have shown powerful image synthesis performance, even better than Generative Adversarial Networks. Based on this insight, we propose an approach called DiffuseExpand for expanding datasets for 2D medical image segmentation using DPM, which first samples a variety of masks from Gaussian noise to ensure the diversity, and then synthesizes images to ensure the alignment of images and masks. After that, DiffuseExpand chooses high-quality samples to further enhance the effectiveness of data expansion. Our comparison and ablation experiments on COVID-19 and CGMH Pelvis datasets demonstrate the effectiveness of DiffuseExpand. Our code is released at

4.OPDN: Omnidirectional Position-aware Deformable Network for Omnidirectional Image Super-Resolution

Authors:Xiaopeng Sun, Weiqi Li, Zhenyu Zhang, Qiufang Ma, Xuhan Sheng, Ming Cheng, Haoyu Ma, Shijie Zhao, Jian Zhang, Junlin Li, Li Zhang

Abstract: 360° omnidirectional images have gained research attention due to their immersive and interactive experience, particularly in AR/VR applications. However, they suffer from lower angular resolution due to being captured by fisheye lenses with the same sensor size for capturing planar images. To solve the above issues, we propose a two-stage framework for 360° omnidirectional image super-resolution. The first stage employs two branches: model A, which incorporates omnidirectional position-aware deformable blocks (OPDB) and Fourier upsampling, and model B, which adds a spatial frequency fusion module (SFF) to model A. Model A aims to enhance the feature extraction ability of 360° image positional information, while Model B further focuses on the high-frequency information of 360° images. The second stage performs same-resolution enhancement based on the structure of model A with a pixel unshuffle operation. In addition, we collected data from YouTube to improve the fitting ability of the transformer, and created pseudo low-resolution images using a degradation network. Our proposed method achieves superior performance and wins the NTIRE 2023 challenge of 360° omnidirectional image super-resolution.
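
The pixel unshuffle operation mentioned for the second stage trades spatial resolution for channels without discarding any values. The NumPy sketch below shows the usual definition on a single (C, H, W) array; the function name `pixel_unshuffle` and the channel ordering follow a common deep learning convention and are assumptions here, since the abstract does not describe the authors' implementation.

```python
import numpy as np


def pixel_unshuffle(x, r):
    """Rearrange a (C, H, W) array into (C * r * r, H // r, W // r).

    Each r x r spatial block is moved into the channel dimension, so the
    output has the same number of elements as the input.
    """
    c, h, w = x.shape
    if h % r != 0 or w % r != 0:
        raise ValueError("height and width must be divisible by r")
    x = x.reshape(c, h // r, r, w // r, r)   # split H and W into (coarse, fine) parts
    x = x.transpose(0, 2, 4, 1, 3)           # bring the fine offsets next to the channels
    return x.reshape(c * r * r, h // r, w // r)
```

With r = 2, for example, a network can process an upsampled image at a quarter of the spatial size while still seeing every pixel, which is one plausible reason to pair it with a same-resolution enhancement stage.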

5.Mixing Data Augmentation with Preserving Foreground Regions in Medical Image Segmentation

Authors:Xiaoqing Liu, Kenji Ono, Ryoma Bise

Abstract: The development of medical image segmentation using deep learning can significantly support doctors' diagnoses. Deep learning needs large amounts of data for training, which also requires data augmentation to extend diversity for preventing overfitting. However, the existing methods for data augmentation of medical image segmentation are mainly based on models which need to update parameters and cost extra computing resources. We propose data augmentation methods designed to train a high-accuracy deep learning network for medical image segmentation. The proposed data augmentation approaches are called KeepMask and KeepMix, which can create medical images by better identifying the boundary of the organ with no more parameters. Our methods achieved better performance and obtained more precise boundaries for medical image segmentation on datasets. The dice coefficient of our methods achieved 94.15% (3.04% higher than baseline) on CHAOS and 74.70% (5.25% higher than baseline) on MSD spleen with Unet.
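
The Dice coefficient reported above measures the overlap between a predicted mask and a reference mask as twice the intersection divided by the sum of the two mask sizes. A minimal sketch for binary masks follows; the function name `dice_coefficient` and the convention of returning 1.0 for two empty masks are assumptions, not details from the abstract.

```python
import numpy as np


def dice_coefficient(pred, target):
    """Dice overlap between two binary masks, in the range [0, 1]."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    intersection = np.logical_and(pred, target).sum()
    denom = pred.sum() + target.sum()
    if denom == 0:
        return 1.0  # assumed convention: two empty masks agree perfectly
    return 2.0 * intersection / denom
```

A value of 0.9415 would correspond to the 94.15% figure quoted for CHAOS, assuming the score is averaged over test cases in the usual way.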

6.Tissue Classification During Needle Insertion Using Self-Supervised Contrastive Learning and Optical Coherence Tomography

Authors:Debayan Bhattacharya, Sarah Latus, Finn Behrendt, Florin Thimm, Dennis Eggert, Christian Betz, Alexander Schlaefer

Abstract: Needle positioning is essential for various medical applications such as epidural anaesthesia. Physicians rely on their instincts while navigating the needle in epidural spaces. Thereby, identifying the tissue structures may be helpful to the physician as they can provide additional feedback in the needle insertion process. To this end, we propose a deep neural network that classifies the tissues from the phase and intensity data of complex OCT signals acquired at the needle tip. We investigate the performance of the deep neural network in a limited labelled dataset scenario and propose a novel contrastive pretraining strategy that learns invariant representation for phase and intensity data. We show that with 10% of the training set, our proposed pretraining strategy helps the model achieve an F1 score of 0.84 whereas the model achieves an F1 score of 0.60 without it. Further, we analyse the importance of phase and intensity individually towards tissue classification.

7.Multi-Modality Deep Network for Extreme Learned Image Compression

Authors:Xuhao Jiang, Weimin Tan, Tian Tan, Bo Yan, Liquan Shen

Abstract: Image-based single-modality compression learning approaches have demonstrated exceptionally powerful encoding and decoding capabilities in the past few years, but suffer from blur and severe semantics loss at extremely low bitrates. To address this issue, we propose a multimodal machine learning method for text-guided image compression, in which the semantic information of text is used as prior information to guide image compression for better compression performance. We fully study the role of text description in different components of the codec, and demonstrate its effectiveness. In addition, we adopt the image-text attention module and image-request complement module to better fuse image and text features, and propose an improved multimodal semantic-consistent loss to produce semantically complete reconstructions. Extensive experiments, including a user study, prove that our method can obtain visually pleasing results at extremely low bitrates, and achieves a comparable or even better performance than state-of-the-art methods, even though these methods are at 2x to 4x bitrates of ours.

8.HDR-VDP-3: A multi-metric for predicting image differences, quality and contrast distortions in high dynamic range and regular content

Authors:Rafal K. Mantiuk, Dounia Hammou, Param Hanji

Abstract: High-Dynamic-Range Visual-Difference-Predictor version 3, or HDR-VDP-3, is a visual metric that can fulfill several tasks, such as full-reference image/video quality assessment, prediction of visual differences between a pair of images, or prediction of contrast distortions. Here we present a high-level overview of the metric, position it with respect to related work, explain the main differences compared to version 2.2, and describe how the metric was adapted for the HDR Video Quality Measurement Grand Challenge 2023.

9.Phagocytosis Unveiled: A Scalable and Interpretable Deep learning Framework for Neurodegenerative Disease Analysis

Authors:Mehdi Ounissi, Morwena Latouche, Daniel Racoceanu

Abstract: Quantifying the phagocytosis of dynamic, unstained cells is essential for evaluating neurodegenerative diseases. However, measuring rapid cell interactions and distinguishing cells from backgrounds make this task challenging when processing time-lapse phase-contrast video microscopy. In this study, we introduce a fully automated, scalable, and versatile real-time framework for quantifying and analyzing phagocytic activity. Our proposed pipeline can process large datasets and includes a data quality verification module to counteract potential perturbations such as microscope movements and frame blurring. We also propose an explainable cell segmentation module to improve the interpretability of deep learning methods compared to black-box algorithms. This includes two interpretable deep learning capabilities: visual explanation and model simplification. We demonstrate that interpretability in deep learning is not the opposite of high performance, but rather provides essential deep learning algorithm optimization insights and solutions. Incorporating interpretable modules results in an efficient architecture design and optimized execution time. We apply this pipeline to quantify and analyze microglial cell phagocytosis in frontotemporal dementia (FTD) and obtain statistically reliable results showing that FTD mutant cells are larger and more aggressive than control cells. To stimulate translational approaches and future research, we release an open-source pipeline and a unique microglial cell phagocytosis dataset for immune system characterization in neurodegenerative disease research. This pipeline and dataset will consistently crystallize future advances in this field, promoting the development of efficient and effective interpretable algorithms dedicated to this critical domain.

10.Automated Classification of Stroke Blood Clot Origin using Whole-Slide Digital Pathology Images

Authors:Koushik Sivarama Krishnan, P. J. Joe Nikesh, M. Logeshwaran, G. Senthilkumar, D. Elangovan

Abstract: The classification of the origin of blood clots is a crucial step in diagnosing and treating ischemic stroke. Various imaging techniques such as computed tomography (CT), magnetic resonance imaging (MRI), and ultrasound have been employed to detect and locate blood clots within the body. However, identifying the origin of a blood clot remains challenging due to the complexity of the blood flow dynamics and the limitations of the imaging techniques. This study proposes a novel methodology for classifying the source of a blood clot using whole-slide digital pathology images, which are used to fine-tune several cutting-edge computer vision models. Upon comparison, the SwinTransformerV2 model outperforms all the other models and achieves an accuracy of 94.24%, a precision of 94.41%, a recall of 94.09%, and an F1-score of 94.06%. Our approach shows promising results in detecting the origin of blood clots in different vascular regions and can potentially improve the diagnosis and management of ischemic stroke.

1.STM-UNet: An Efficient U-shaped Architecture Based on Swin Transformer and Multi-scale MLP for Medical Image Segmentation

Authors:Lei Shi, Tianyu Gao, Zheng Zhang, Junxing Zhang

Abstract: Automated medical image segmentation can help doctors diagnose faster and more accurately. Deep learning based models for medical image segmentation have made great progress in recent years. However, existing models fail to leverage Transformer and MLP effectively to improve the U-shaped architecture efficiently. In addition, the multi-scale features of the MLP have not been fully extracted in the bottleneck of the U-shaped architecture. In this paper, we propose an efficient U-shaped architecture based on Swin Transformer and multi-scale MLP, namely STM-UNet. Specifically, the Swin Transformer block is added to the skip connections of STM-UNet in the form of a residual connection, which can enhance the modeling of global features and long-range dependencies. Meanwhile, a novel PCAS-MLP with a parallel convolution module is designed and placed into the bottleneck of our architecture to improve segmentation performance. The experimental results on ISIC 2016 and ISIC 2018 demonstrate the effectiveness of our proposed method. Our method also outperforms several state-of-the-art methods in terms of IoU and Dice, and achieves a better trade-off between high segmentation accuracy and low model complexity.
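
Since several abstracts in this listing report IoU and Dice, a minimal sketch of how the two scores are commonly computed on binary masks may help. This is an illustration only, not the evaluation code of any listed paper; the function name `dice_and_iou` and the empty-mask convention are assumptions.

```python
def dice_and_iou(pred, target):
    """Return (Dice, IoU) for two equally sized 2D lists of 0/1 labels."""
    p = [v for row in pred for v in row]
    t = [v for row in target for v in row]
    inter = sum(1 for a, b in zip(p, t) if a and b)   # pixels set in both masks
    p_sum = sum(1 for a in p if a)                    # foreground pixels in prediction
    t_sum = sum(1 for b in t if b)                    # foreground pixels in ground truth
    union = p_sum + t_sum - inter
    if union == 0:
        # Assumed convention: two empty masks count as perfect agreement.
        return 1.0, 1.0
    return 2 * inter / (p_sum + t_sum), inter / union


pred = [[1, 1, 0],
        [0, 1, 0]]
target = [[1, 0, 0],
          [0, 1, 1]]
dice, iou = dice_and_iou(pred, target)
print(round(dice, 4), iou)
```

The two scores are monotonically related (Dice = 2·IoU / (1 + IoU)), so they rank methods identically on a single image but can differ once averaged over a dataset.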

2.Eye tracking guided deep multiple instance learning with dual cross-attention for fundus disease detection

Authors:Hongyang Jiang, Jingqi Huang, Chen Tang, Xiaoqing Zhang, Mengdi Gao, Jiang Liu

Abstract: Deep neural networks (DNNs) have promoted the development of computer aided diagnosis (CAD) systems for fundus diseases, helping ophthalmologists reduce missed diagnosis and misdiagnosis rates. However, the majority of CAD systems are data-driven but lack medical prior knowledge that could improve performance. In this regard, we propose a human-in-the-loop (HITL) CAD system that leverages ophthalmologists' eye-tracking information and is more efficient and accurate. Concretely, the HITL CAD system was implemented with multiple instance learning (MIL), where eye-tracking gaze maps helped select diagnosis-related instances. Furthermore, the dual-cross-attention MIL (DCAMIL) network was utilized to curb the adverse effects of noisy instances. Meanwhile, a sequence augmentation module and a domain adversarial module were introduced to enrich and standardize instances in the training bag, respectively, thereby enhancing the robustness of our method. We conduct comparative experiments on our newly constructed datasets (namely, AMD-Gaze and DR-Gaze) for AMD and early DR detection, respectively. Rigorous experiments demonstrate the feasibility of our HITL CAD system and the superiority of the proposed DCAMIL, which fully exploits the ophthalmologists' eye-tracking information. These investigations indicate that physicians' gaze maps, as medical prior knowledge, have the potential to contribute to CAD systems for clinical diseases.

3.The Bjøntegaard Bible -- Why your Way of Comparing Video Codecs May Be Wrong

Authors:Christian Herglotz, Hannah Och, Anna Meyer, Geetha Ramasubbu, Lena Eichermüller, Matthias Kränzler, Fabian Brand, Kristian Fischer, Dat Thanh Nguyen, Andy Regensky, André Kaup

Abstract: In this paper, we provide an in-depth assessment on the Bjøntegaard Delta. We construct a large data set of video compression performance comparisons using a diverse set of metrics including PSNR, VMAF, bitrate, and processing energies. These metrics are evaluated for visual data types such as classic perspective video, 360° video, point clouds, and screen content. As compression technology, we consider multiple hybrid video codecs as well as state-of-the-art neural network based compression methods. Using additional performance points in between standard points defined by parameters such as the quantization parameter, we assess the interpolation error of the Bjøntegaard-Delta (BD) calculus and its impact on the final BD value. Performing an in-depth analysis, we find that the BD calculus is most accurate in the standard application of rate-distortion comparisons with mean errors below 0.5 percentage points. For other applications, the errors are higher (up to 10 percentage points), but can be reduced by a higher number of performance points. We finally come up with recommendations on how to use the BD calculus such that the validity of the resulting BD-values is maximized. Main recommendations include the use of Akima interpolation, the interpretation of relative difference curves, and the use of the logarithmic domain for saturating metrics such as SSIM and VMAF.
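
To make the BD calculus concrete, here is a minimal sketch of a BD-rate computation. It rests on stated assumptions: piecewise-linear interpolation of log-bitrate over quality is swapped in for brevity (the abstract recommends Akima interpolation, and the classic formulation uses a cubic fit), and the function names and sample rate-distortion points are hypothetical.

```python
import math


def _interp(x, xs, ys):
    """Piecewise-linear interpolation of ys over strictly increasing xs."""
    for x0, x1, y0, y1 in zip(xs, xs[1:], ys, ys[1:]):
        if x0 <= x <= x1:
            return y0 + (y1 - y0) * (x - x0) / (x1 - x0)
    raise ValueError("x outside interpolation range")


def _integrate(xs, ys, lo, hi):
    """Exact integral of the piecewise-linear curve between lo and hi."""
    knots = [lo] + [x for x in xs if lo < x < hi] + [hi]
    vals = [_interp(x, xs, ys) for x in knots]
    return sum((b - a) * (va + vb) / 2
               for a, b, va, vb in zip(knots, knots[1:], vals, vals[1:]))


def bd_rate(anchor, test):
    """Average bitrate difference in percent at equal quality.

    anchor, test: lists of (bitrate, quality) points, e.g. (kbit/s, PSNR in dB).
    A negative result means the test codec needs less bitrate than the anchor.
    """
    a = sorted((q, math.log(r)) for r, q in anchor)
    t = sorted((q, math.log(r)) for r, q in test)
    qa, la = zip(*a)
    qt, lt = zip(*t)
    lo, hi = max(qa[0], qt[0]), min(qa[-1], qt[-1])  # overlapping quality range
    avg_diff = (_integrate(qt, lt, lo, hi) - _integrate(qa, la, lo, hi)) / (hi - lo)
    return (math.exp(avg_diff) - 1) * 100


# Hypothetical data: the test codec needs half the bitrate at every quality level.
anchor = [(1000, 32.0), (2000, 35.0), (4000, 38.0), (8000, 41.0)]
test = [(500, 32.0), (1000, 35.0), (2000, 38.0), (4000, 41.0)]
print(bd_rate(anchor, test))
```

Integrating in the logarithmic bitrate domain is what makes the result a relative (percentage) difference; replacing `_interp` with a smoother interpolant changes only how the curve between measured points is estimated.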

4.Retinal Vessel Segmentation via a Multi-resolution Contextual Network and Adversarial Learning

Authors:Tariq M. Khan, Syed S. Naqvi, Antonio Robles-Kelly, Imran Razzak

Abstract: Timely and affordable computer-aided diagnosis of retinal diseases is pivotal in precluding blindness. Accurate retinal vessel segmentation plays an important role in disease progression and diagnosis of such vision-threatening diseases. To this end, we propose a Multi-resolution Contextual Network (MRC-Net) that addresses these issues by extracting multi-scale features to learn contextual dependencies between semantically different features and using bi-directional recurrent learning to model former-latter and latter-former dependencies. Another key idea is training in adversarial settings for foreground segmentation improvement through optimization of the region-based scores. This novel strategy boosts the performance of the segmentation network in terms of the Dice score (and correspondingly Jaccard index) while keeping the number of trainable parameters comparatively low. We have evaluated our method on three benchmark datasets, including DRIVE, STARE, and CHASE, demonstrating its superior performance as compared with competitive approaches elsewhere in the literature.

5.MRI Recovery with Self-Calibrated Denoisers without Fully-Sampled Data

Authors:Sizhuo Liu, Philip Schniter, Rizwan Ahmad

Abstract: PURPOSE: To present and validate a self-supervised MRI reconstruction method that does not require fully sampled k-space data. METHODS: ReSiDe is inspired by plug-and-play (PnP) methods and employs a denoiser as a regularizer. In contrast to traditional PnP approaches that utilize generic denoisers or train deep learning-based denoisers using high-quality images or image patches, ReSiDe directly trains the denoiser on the image or images being reconstructed from the undersampled data. We introduce two variations of our method, ReSiDe-S and ReSiDe-M. ReSiDe-S is scan-specific and works with a single set of undersampled measurements, while ReSiDe-M operates on multiple sets of undersampled measurements. More importantly, the trained denoisers in ReSiDe-M are stored for PnP recovery without further training. To improve robustness, the denoising strength in ReSiDe-S and ReSiDe-M is auto-tuned using the discrepancy principle. RESULTS: Studies I, II, and III compare ReSiDe-S and ReSiDe-M against other self-supervised or unsupervised methods using data from T1- and T2-weighted brain MRI, MRXCAT digital perfusion phantom, and first-pass cardiac perfusion, respectively. ReSiDe-S and ReSiDe-M outperform other methods in terms of reconstruction signal-to-noise ratio and structural similarity index measure for Studies I and II and in terms of expert scoring for Study III. CONCLUSION: A self-supervised image reconstruction method is presented and validated in both static and dynamic MRI applications. These developments can benefit MRI applications where availability of fully sampled training data is limited.

6.Multi-Scale Feature Fusion using Parallel-Attention Block for COVID-19 Chest X-ray Diagnosis

Authors:Xiao Qi, David J. Foran, John L. Nosher, Ilker Hacihaliloglu

Abstract: Under the global COVID-19 crisis, accurate diagnosis of COVID-19 from Chest X-ray (CXR) images is critical. To reduce intra- and inter-observer variability during the radiological assessment, computer-aided diagnostic tools have been utilized to supplement medical decision-making and subsequent disease management. Computational methods with high accuracy and robustness are required for rapid triaging of patients and aiding radiologists in the interpretation of the collected data. In this study, we propose a novel multi-feature fusion network using parallel attention blocks to fuse the original CXR images and local-phase feature-enhanced CXR images at multiple scales. We examine our model on various COVID-19 datasets acquired from different organizations to assess the generalization ability. Our experiments demonstrate that our method achieves state-of-the-art performance and has improved generalization capability, which is crucial for widespread deployment.

1.Synthetic Datasets for Autonomous Driving: A Survey

Authors:Zhihang Song, Zimin He, Xingyu Li, Qiming Ma, Ruibo Ming, Zhiqi Mao, Huaxin Pei, Lihui Peng, Jianming Hu, Danya Yao, Yi Zhang

Abstract: Autonomous driving techniques have been flourishing in recent years while thirsting for huge amounts of high-quality data. However, it is difficult for real-world datasets to keep up with the pace of changing requirements due to their expensive and time-consuming experimental and labeling costs. Therefore, more and more researchers are turning to synthetic datasets to easily generate rich and changeable data as an effective complement to the real world and to improve the performance of algorithms. In this paper, we summarize the evolution of synthetic dataset generation methods and review the work to date on synthetic datasets related to single-task and multi-task categories for autonomous driving research. We also discuss the role that synthetic datasets play in evaluation and gap testing, and their positive effect in testing autonomous driving related algorithms, especially with respect to trustworthiness and safety. Finally, we discuss general trends and possible development directions. To the best of our knowledge, this is the first survey focusing on the application of synthetic datasets in autonomous driving. This survey also raises awareness of the problems of real-world deployment of autonomous driving technology and provides researchers with a possible solution.

2.Topology-Aware Focal Loss for 3D Image Segmentation

Authors:Andac Demir, Elie Massaad, Bulent Kiziltan

Abstract: The efficacy of segmentation algorithms is frequently compromised by topological errors like overlapping regions, disrupted connections, and voids. To tackle this problem, we introduce a novel loss function, namely Topology-Aware Focal Loss (TAFL), that combines the conventional Focal Loss with a topological constraint term based on the Wasserstein distance between the persistence diagrams of the ground truth and predicted segmentation masks. By enforcing the same topology as the ground truth, the topological constraint can effectively resolve topological errors, while Focal Loss tackles class imbalance. We begin by constructing persistence diagrams from filtered cubical complexes of the ground truth and predicted segmentation masks. We subsequently utilize the Sinkhorn-Knopp algorithm to determine the optimal transport plan between the two persistence diagrams. The resultant transport plan minimizes the cost of transporting mass from one distribution to the other and provides a mapping between the points in the two persistence diagrams. We then compute the Wasserstein distance based on this transport plan to measure the topological dissimilarity between the ground truth and predicted masks. We evaluate our approach by training a 3D U-Net with the MICCAI Brain Tumor Segmentation (BraTS) challenge validation dataset, which requires accurate segmentation of 3D MRI scans that integrate various modalities for the precise identification and tracking of malignant brain tumors. Then, we demonstrate that the quality of segmentation performance is enhanced by regularizing the focal loss through the addition of a topological constraint as a penalty term.
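
The transport step described in this abstract can be sketched with a small entropic-regularized Sinkhorn iteration. This is a toy illustration under several assumptions, not the authors' implementation: the persistence diagrams are given directly as (birth, death) pairs, all points carry equal mass, the ground cost is squared Euclidean distance, matching to the diagonal is omitted, and the names `sinkhorn_cost`, `reg`, and `n_iters` are hypothetical.

```python
import math


def sinkhorn_cost(diag_a, diag_b, reg=0.05, n_iters=200):
    """Return (plan, cost) for entropic optimal transport between two point sets.

    diag_a, diag_b: lists of (birth, death) pairs, each point given equal mass.
    reg: entropic regularization strength (smaller means closer to exact transport).
    """
    n, m = len(diag_a), len(diag_b)
    a = [1.0 / n] * n                                   # source masses
    b = [1.0 / m] * m                                   # target masses
    cost = [[(p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2 for q in diag_b]
            for p in diag_a]
    K = [[math.exp(-c / reg) for c in row] for row in cost]  # Gibbs kernel
    u, v = [1.0] * n, [1.0] * m
    for _ in range(n_iters):
        # Alternately rescale rows and columns to match the two marginals.
        u = [a[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    plan = [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]
    total = sum(plan[i][j] * cost[i][j] for i in range(n) for j in range(m))
    return plan, total


truth = [(0.0, 1.0), (0.5, 0.6)]        # hypothetical ground-truth diagram
pred = [(0.0, 1.0), (0.5, 0.8)]         # hypothetical predicted diagram
plan, cost = sinkhorn_cost(truth, pred)
print(cost)
```

In a loss of the kind the abstract describes, a transport cost like this would be added to the focal loss as a penalty term; the weighting between the two terms is not specified here.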

3.Segment Anything in Medical Images

Authors:Jun Ma, Bo Wang

Abstract: Segment anything model (SAM) has revolutionized natural image segmentation, but its performance on medical images is limited. This work presents MedSAM, the first attempt at extending the success of SAM to medical images, with the goal of creating a universal tool for the segmentation of various medical targets. Specifically, we first curate a large-scale medical image dataset, encompassing over 200,000 masks across 11 different modalities. Then, we develop a simple fine-tuning method to adapt SAM to general medical image segmentation. Comprehensive experiments on 21 3D segmentation tasks and 9 2D segmentation tasks demonstrate that MedSAM outperforms the default SAM model with an average Dice Similarity Coefficient (DSC) of 22.5% and 17.6% on 3D and 2D segmentation tasks, respectively. The code and trained model are publicly available.

1.Boosting multiple sclerosis lesion segmentation through attention mechanism

Authors:Alessia Rondinella, Elena Crispino, Francesco Guarnera, Oliver Giudice, Alessandro Ortis, Giulia Russo, Clara Di Lorenzo, Davide Maimone, Francesco Pappalardo, Sebastiano Battiato

Abstract: Magnetic resonance imaging is a fundamental tool for diagnosing multiple sclerosis and monitoring its progression. Although several attempts have been made to segment multiple sclerosis lesions using artificial intelligence, fully automated analysis is not yet available. State-of-the-art methods rely on slight variations in segmentation architectures (e.g., U-Net). However, recent research has demonstrated how exploiting temporal-aware features and attention mechanisms can provide a significant boost to traditional architectures. This paper proposes a framework that exploits an augmented U-Net architecture with a convolutional long short-term memory layer and an attention mechanism, which is able to segment and quantify multiple sclerosis lesions detected in magnetic resonance images. Quantitative and qualitative evaluation on challenging examples demonstrated that the method outperforms previous state-of-the-art approaches, reporting an overall Dice score of 89% and also demonstrating robustness and generalization ability on previously unseen test samples from a new dedicated dataset that is under construction.

2.WATT-EffNet: A Lightweight and Accurate Model for Classifying Aerial Disaster Images

Authors:Gao Yu Lee, Tanmoy Dam, Md Meftahul Ferdaus, Daniel Puiu Poenar, Vu N. Duong

Abstract: Incorporating deep learning (DL) classification models into unmanned aerial vehicles (UAVs) can significantly augment search-and-rescue operations and disaster management efforts. In such critical situations, the UAV's ability to promptly comprehend the crisis and optimally utilize its limited power and processing resources to narrow down search areas is crucial. Therefore, developing an efficient and lightweight method for scene classification is of utmost importance. However, current approaches tend to prioritize accuracy on benchmark datasets at the expense of computational efficiency. To address this shortcoming, we introduce the Wider ATTENTION EfficientNet (WATT-EffNet), a novel method that achieves higher accuracy with a more lightweight architecture compared to the baseline EfficientNet. The WATT-EffNet leverages width-wise incremental feature modules and attention mechanisms over width-wise features to ensure the network structure remains lightweight. We evaluate our method on a UAV-based aerial disaster image classification dataset and demonstrate that it outperforms the baseline by up to 15 times in terms of classification accuracy and 38.3% in terms of computing efficiency as measured by floating-point operations (FLOPs). Additionally, we conduct an ablation study to investigate the effect of varying the width of WATT-EffNet on accuracy and computational efficiency. Our code is publicly available.

3.Multi-frame-based Cross-domain Image Denoising for Low-dose Computed Tomography

Authors:Yucheng Lu, Zhixin Xu, Moon Hyung Choi, Jimin Kim, Seung-Won Jung

Abstract: Computed tomography (CT) has been used worldwide for decades as one of the most important non-invasive tests in assisting diagnosis. However, the ionizing nature of X-ray exposure raises concerns about potential health risks such as cancer. The desire for lower radiation dose has driven researchers to improve the reconstruction quality, especially by removing noise and artifacts. Although previous studies on low-dose computed tomography (LDCT) denoising have demonstrated the effectiveness of learning-based methods, most of them were developed on the simulated data collected using Radon transform. However, the real-world scenario significantly differs from the simulation domain, and the joint optimization of denoising with modern CT image reconstruction pipeline is still missing. In this paper, for the commercially available third-generation multi-slice spiral CT scanners, we propose a two-stage method that better exploits the complete reconstruction pipeline for LDCT denoising across different domains. Our method makes good use of the high redundancy of both the multi-slice projections and the volumetric reconstructions while avoiding the collapse of information in conventional cascaded frameworks. The dedicated design also provides a clearer interpretation of the workflow. Through extensive evaluations, we demonstrate its superior performance against state-of-the-art methods.

1.MIPI 2023 Challenge on RGBW Fusion: Methods and Results

Authors:Qianhui Sun, Qingyu Yang, Chongyi Li, Shangchen Zhou, Ruicheng Feng, Yuekun Dai, Wenxiu Sun, Qingpeng Zhu, Chen Change Loy, Jinwei Gu

Abstract: Developing and integrating advanced image sensors with novel algorithms in camera systems are prevalent with the increasing demand for computational photography and imaging on mobile platforms. However, the lack of high-quality data for research and the rare opportunity for an in-depth exchange of views from industry and academia constrain the development of mobile intelligent photography and imaging (MIPI). With the success of the 1st MIPI Workshop@ECCV 2022, we introduce the second MIPI challenge, including four tracks focusing on novel image sensors and imaging algorithms. This paper summarizes and reviews the RGBW Joint Remosaic and Denoise track on MIPI 2023. In total, 81 participants were successfully registered, and 4 teams submitted results in the final testing phase. The final results are evaluated using objective metrics, including PSNR, SSIM, LPIPS, and KLD. A detailed description of the top three models developed in this challenge is provided in this paper. More details of this challenge and the link to the dataset can be found at

2.Dark-field and directional dark-field on low coherence X-ray sources with random mask modulations: validation with SAXS anisotropy measurements

Authors:Clara Magnin, Laurene Quenot, Sylvain Bohic, Dan Mihai Cenda, Manuel Fernández Martínez, Blandine Lantz, Bertrand Faure, Emmanuel Brun

Abstract: Phase Contrast Imaging (PCI), Dark-Field (DF) and Directional Dark-Field (DDF) imaging are recent X-ray imaging modalities that have demonstrated their interest by providing access to information and contrasts different from those provided by conventional absorption X-ray imaging. However, access to these two types of images is currently limited because the acquisitions require the use of coherent sources such as synchrotron radiation or complicated optical setups to exploit the coherence requirements. This work demonstrates the possibility of efficiently performing phase contrast, dark-field and directional dark-field imaging on a low-coherence laboratory system equipped with a conventional X-ray tube, using a simple, fast and robust single-mask technique. The transfer to a low spatial coherence laboratory system was made possible by using random modulation based imaging (MoBI) and extending the low coherence system algorithm to retrieve dark-field and directional dark-field.

1.MAMAF-Net: Motion-Aware and Multi-Attention Fusion Network for Stroke Diagnosis

Authors:Aysen Degerli, Pekka Jakala, Juha Pajula, Miguel Bordallo Lopez

Abstract: Stroke is a major cause of mortality and disability worldwide, and one in four people are at risk of suffering one in their lifetime. The pre-hospital stroke assessment plays a vital role in identifying stroke patients accurately to accelerate further examination and treatment in hospitals. Accordingly, the National Institutes of Health Stroke Scale (NIHSS), Cincinnati Pre-hospital Stroke Scale (CPSS) and Face Arm Speech Time (F.A.S.T.) are globally known tests for stroke assessment. However, the validity of these tests is questionable in the absence of neurologists. Therefore, in this study, we propose a motion-aware and multi-attention fusion network (MAMAF-Net) that can detect stroke from multimodal examination videos. Contrary to other studies on stroke detection from video analysis, our study for the first time proposes an end-to-end solution from multiple video recordings of each subject with a dataset encapsulating stroke, transient ischemic attack (TIA), and healthy controls. The proposed MAMAF-Net consists of motion-aware modules to sense the mobility of patients, attention modules to fuse the multi-input video data, and 3D convolutional layers to perform diagnosis from the attention-based extracted features. Experimental results over the collected StrokeDATA dataset show that the proposed MAMAF-Net achieves a successful detection of stroke with 93.62% sensitivity and 95.33% AUC score.

2.Self-supervised Image Denoising with Downsampled Invariance Loss and Conditional Blind-Spot Network

Authors:Yeong Il Jang, Keuntek Lee, Gu Yong Park, Seyun Kim, Nam Ik Cho

Abstract: There have been many image denoisers using deep neural networks, which outperform conventional model-based methods by large margins. Recently, self-supervised methods have attracted attention because constructing a large real noise dataset for supervised training is an enormous burden. The most representative self-supervised denoisers are based on blind-spot networks, which exclude the receptive field's center pixel. However, excluding any input pixel is abandoning some information, especially when the input pixel at the corresponding output position is excluded. In addition, a standard blind-spot network fails to reduce real camera noise due to the pixel-wise correlation of noise, though it successfully removes independently distributed synthetic noise. Hence, to realize a more practical denoiser, we propose a novel self-supervised training framework that can remove real noise. For this, we derive the theoretic upper bound of a supervised loss where the network is guided by the downsampled blinded output. Also, we design a conditional blind-spot network (C-BSN), which selectively controls the blindness of the network to use the center pixel information. Furthermore, we exploit a random subsampler to decorrelate noise spatially, making the C-BSN free of visual artifacts that were often seen in downsample-based methods. Extensive experiments show that the proposed C-BSN achieves state-of-the-art performance on real-world datasets as a self-supervised denoiser and shows qualitatively pleasing results without any post-processing or refinement.

3.DCELANM-Net:Medical Image Segmentation based on Dual Channel Efficient Layer Aggregation Network with Learner

Authors:Chengzhun Lu, Zhangrun Xia, Krzysztof Przystupa, Orest Kochan, Jun Su

Abstract: The DCELANM-Net structure proposed in this article is a model that combines a Dual Channel Efficient Layer Aggregation Network (DCELAN) and a Micro Masked Autoencoder (Micro-MAE). On the one hand, for the DCELAN, the features are more effectively fitted by deepening the network structure; the deeper network can successfully learn and fuse the features, which can more accurately locate the local feature information; and the utilization of each layer of channels is more effectively improved by widening the network structure and residual connections. On the other hand, we adopted Micro-MAE as the learner of the model. In addition to being straightforward in its methodology, it also offers a self-supervised learning method, which has the benefit of being highly scalable for the model.

4.Cross-Reference Transformer for Few-shot Medical Image Segmentation

Authors:Yao Huang, Jianming Liu

Abstract: Medical image processing faces a contradiction: medical images are applied more and more widely, yet they are difficult to label. As a result, few-shot learning has begun to receive more attention in the field of medical image processing. This paper proposes a Cross-Reference Transformer for medical image segmentation, which addresses the lack of interaction between the support image and the query image in existing approaches. It can better mine and enhance the similar parts of support features and query features in high-dimensional channels. Experimental results show that the proposed model achieves good results on both a CT dataset and an MRI dataset.

5.Optimizations of Autoencoders for Analysis and Classification of Microscopic In Situ Hybridization Images

Authors:Aleksandar A. Yanev, Galina D. Momcheva, Stoyan P. Pavlov

Abstract: Currently, analysis of microscopic In Situ Hybridization images is done manually by experts. Precise evaluation and classification of such microscopic images can ease experts' work and reveal further insights about the data. In this work, we propose a deep-learning framework to detect and classify areas of microscopic images with similar levels of gene expression. The data we analyze requires an unsupervised learning model for which we employ a type of Artificial Neural Network - Deep Learning Autoencoders. The model's performance is optimized by balancing the latent layers' length and complexity and fine-tuning hyperparameters. The results are validated by adapting the mean-squared error (MSE) metric, and comparison to expert's evaluation.

6.Application of attention-based Siamese composite neural network in medical image recognition

Authors:Zihao Huang, Xia Chen, Yue Wang, Weixing Xin, Xingtong Lin, Huizhen Li

Abstract: Medical image recognition often faces the problem of insufficient data in practical applications. Image recognition and processing under few-shot conditions suffer from overfitting, low recognition accuracy, low reliability and insufficient robustness. The differences between characteristics are often subtle, and recognition is affected by perspective, background, occlusion and other factors, which increases its difficulty. Furthermore, in fine-grained images, the few-shot problem leads to insufficient useful feature information in the images. Considering the characteristics of few-shot and fine-grained image recognition, this study establishes a recognition model based on attention and a Siamese neural network. To address the few-shot problem, a Siamese neural network suitable for classification is proposed. An attention-based neural network is used as the main network to improve the classification performance. COVID-19 lung samples were selected for testing the model. The results show that the fewer image samples are available, the more pronounced the model's advantage over an ordinary neural network becomes.

1.Cashew dataset generation using augmentation and RaLSGAN and a transfer learning based tinyML approach towards disease detection

Authors:Varsha Jayaprakash, Akilesh K, Ajay kumar, Balamurugan M. S, Manoj Kumar Rajagopal

Abstract: Cashew is one of the most extensively consumed nuts in the world, and it is also known as a cash crop. A tree may generate a substantial yield in a few months and has a lifetime of around 70 to 80 years. Yet, in addition to the benefits, there are certain constraints to its cultivation. With the exception of parasites and algae, anthracnose is the most common disease affecting trees. When it comes to cashew, the dense structure of the tree makes it difficult to diagnose the disease with ease compared to short crops. Hence, we present a dataset that exclusively consists of healthy and diseased cashew leaves and fruits. The dataset is authenticated by adding RGB color transformation to highlight diseased regions, photometric and geometric augmentations, and RaLSGAN to enlarge the initial collection of images and boost performance in real-time situations when working with a constrained dataset. Further, transfer learning is used to test the classification efficiency of the dataset using algorithms such as MobileNet and Inception. TensorFlow Lite is utilized to develop these algorithms for real-time disease diagnosis using drones. Several post-training optimization strategies are applied, and the resulting memory sizes are compared. They have proven their effectiveness by delivering high accuracy (up to 99%) and a decrease in memory and latency, making them ideal for use in applications with limited resources.

2.Making Thermal Imaging More Equitable and Accurate: Resolving Solar Loading Biases

Authors:Ellin Q. Zhao, Alexander Vilesov, Shreeram Athreya, Pradyumna Chari, Jeanette Merlos, Kendall Millett, Nia St. Cyr, Laleh Jalilian, Achuta Kadambi

Abstract: Thermal cameras and thermal point detectors are used to measure the temperature of human skin. These are important devices that are used every day in clinical and mass screening settings, particularly in an epidemic. Unfortunately, despite the wide use of thermal sensors, the temperature estimates from thermal sensors do not work well in uncontrolled scene conditions. Previous work has studied the effect of wind and other environmental factors on skin temperature, but has not considered the heating effect from sunlight, which is termed solar loading. Existing device manufacturers recommend that a subject who has been outdoors in the sun re-acclimate to an indoor environment for a waiting period. The waiting period, up to 30 minutes, is too long for a rapid screening tool. Moreover, the error bias from solar loading is greater for darker skin tones since melanin absorbs solar radiation. This paper explores two approaches to address this problem. The first approach uses the transient behavior of cooling to more quickly extrapolate the steady state temperature. A second approach explores the spatial modulation of solar loading to propose single-shot correction with a wide-field thermal camera. A real world dataset comprising thermal point, thermal image, subjective, and objective measurements of melanin is collected with statistical significance for the effect size observed. The single-shot correction scheme is shown to eliminate solar loading bias in the time of a typical frame exposure (33ms).
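
As an illustration of the first approach (extrapolating the steady state from transient cooling), the sketch below assumes a single-exponential cooling model, T(t) = T_inf + A·exp(-k·t), which is an assumption made here and not necessarily the model used in the paper. Under that model, three equally spaced readings determine T_inf in closed form. The function name and the sample values are hypothetical.

```python
import math


def steady_state_temperature(t0, t1, t2):
    """Estimate the asymptote T_inf from three equally spaced readings.

    Assumes T(t) = T_inf + A * exp(-k * t). With x = exp(-k * dt):
      t0*t2 - t1**2   = T_inf * A * (1 - x)**2
      t0 + t2 - 2*t1  =         A * (1 - x)**2
    so their ratio is T_inf, without knowing A or k.
    """
    denom = t0 + t2 - 2 * t1
    if abs(denom) < 1e-12:
        raise ValueError("readings show no curvature; cannot extrapolate")
    return (t0 * t2 - t1 * t1) / denom


# Hypothetical skin readings (deg C) taken 5 s apart while cooling toward 34.0.
readings = [34.0 + 3.0 * math.exp(-0.2 * t) for t in (0.0, 5.0, 10.0)]
print(steady_state_temperature(*readings))
```

With noisy sensor data, a least-squares fit over many samples would be more robust than this three-point formula, which amplifies noise when the readings are nearly collinear.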

3.Segmentation of glioblastomas in early post-operative multi-modal MRI with deep neural networks

Authors:Ragnhild Holden Helland, Alexandros Ferles, André Pedersen, Ivar Kommers, Hilko Ardon, Frederik Barkhof, Lorenzo Bello, Mitchel S. Berger, Tora Dunås, Marco Conti Nibali, Julia Furtner, Shawn Hervey-Jumper, Albert J. S. Idema, Barbara Kiesel, Rishi Nandoe Tewari, Emmanuel Mandonnet, Domenique M. J. Müller, Pierre A. Robe, Marco Rossi, Lisa M. Sagberg, Tommaso Sciortino, Tom Aalders, Michiel Wagemakers, Georg Widhalm, Marnix G. Witte, Aeilko H. Zwinderman, Paulina L. Majewska, Asgeir S. Jakola, Ole Solheim, Philip C. De Witt Hamer, Ingerid Reinertsen, Roelant S. Eijgelaar, David Bouget

Abstract: Extent of resection after surgery is one of the main prognostic factors for patients diagnosed with glioblastoma. To estimate it, accurate segmentation and classification of residual tumor from post-operative MR images is essential. The current standard method for estimating it is subject to high inter- and intra-rater variability, and an automated method for segmentation of residual tumor in early post-operative MRI could lead to a more accurate estimation of extent of resection. In this study, two state-of-the-art neural network architectures for pre-operative segmentation were trained for the task. The models were extensively validated on a multicenter dataset with nearly 1000 patients, from 12 hospitals in Europe and the United States. The best segmentation performance achieved was a 61% Dice score, and the best classification performance was about 80% balanced accuracy, with a demonstrated ability to generalize across hospitals. In addition, the segmentation performance of the best models was on par with human expert raters. The predicted segmentations can be used to accurately classify the patients into those with residual tumor, and those with gross total resection.

4.Early detection of hip periprosthetic joint infections through CNN on Computed Tomography images

Authors:Francesco Guarnera, Alessia Rondinella, Oliver Giudice, Alessandro Ortis, Sebastiano Battiato, Francesco Rundo, Giorgio Fallica, Francesco Traina, Sabrina Conoci

Abstract: Early detection of an infection prior to prosthesis removal (e.g., hips, knees or other areas) would provide significant benefits to patients. Currently, the detection task is carried out only retrospectively with a limited number of methods relying on biometric or other medical data. The automatic detection of a periprosthetic joint infection from tomography imaging is a task never addressed before. This study introduces a novel method for early detection of hip prosthesis infections by analyzing Computed Tomography images. The proposed solution is based on a novel ResNeSt Convolutional Neural Network architecture trained on samples from more than 100 patients. In experiments, the solution detected infections with a high level of accuracy and F-score.

5.Fibroglandular Tissue Segmentation in Breast MRI using Vision Transformers -- A multi-institutional evaluation

Authors:Gustav Müller-Franzes, Fritz Müller-Franzes, Luisa Huck, Vanessa Raaff, Eva Kemmer, Firas Khader, Soroosh Tayebi Arasteh, Teresa Nolte, Jakob Nikolas Kather, Sven Nebelung, Christiane Kuhl, Daniel Truhn

Abstract: Accurate and automatic segmentation of fibroglandular tissue in breast MRI screening is essential for the quantification of breast density and background parenchymal enhancement. In this retrospective study, we developed and evaluated a transformer-based neural network for breast segmentation (TraBS) in multi-institutional MRI data, and compared its performance to the well-established convolutional neural network nnUNet. TraBS and nnUNet were trained and tested on 200 internal and 40 external breast MRI examinations using manual segmentations generated by experienced human readers. Segmentation performance was assessed in terms of the Dice score and the average symmetric surface distance. The Dice score for nnUNet was lower than for TraBS on the internal test set (0.909±0.069 versus 0.916±0.067, P<0.001) and on the external test set (0.824±0.144 versus 0.864±0.081, P=0.004). Moreover, the average symmetric surface distance was higher (i.e., worse) for nnUNet than for TraBS on the internal (0.657±2.856 versus 0.548±2.195, P=0.001) and on the external test set (0.727±0.620 versus 0.584±0.413, P=0.03). Our study demonstrates that transformer-based networks improve the quality of fibroglandular tissue segmentation in breast MRI compared to convolutional-based models like nnUNet. These findings might help to enhance the accuracy of breast density and parenchymal enhancement quantification in breast MRI screening.
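
Several abstracts in this listing report segmentation quality as a Dice score. As a reminder of what that number measures, here is a minimal sketch in plain Python, assuming flattened binary masks; the function name and the empty-mask convention are illustrative assumptions, not taken from any of the listed papers.

```python
def dice_score(pred, target):
    """Dice = 2 * |A and B| / (|A| + |B|) for two flat binary masks."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same length")
    # Count voxels marked foreground in both masks.
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in target if t)
    # Assumed convention: two empty masks count as perfect agreement.
    return 1.0 if total == 0 else 2.0 * intersection / total


pred = [0, 1, 1, 1, 0, 0]
target = [0, 1, 1, 0, 1, 0]
print(dice_score(pred, target))  # 2 shared voxels, 3 + 3 foreground voxels
```

A score of 1 means the two masks coincide and 0 means they share no foreground voxel, so higher is better, whereas for a distance metric such as the average symmetric surface distance lower is better.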

6.A Comparison of Image Denoising Methods

Authors:Zhaoming Kong, Fangxi Deng, Haomin Zhuang, Xiaowei Yang, Jun Yu, Lifang He

Abstract: The advancement of imaging devices and countless images generated every day pose an increasingly high demand on image denoising, which still remains a challenging task in terms of both effectiveness and efficiency. To improve denoising quality, numerous denoising techniques and approaches have been proposed in the past decades, including different transforms, regularization terms, algebraic representations and especially advanced deep neural network (DNN) architectures. Despite their sophistication, many methods may fail to achieve desirable results for simultaneous noise removal and fine detail preservation. In this paper, to investigate the applicability of existing denoising techniques, we compare a variety of denoising methods on both synthetic and real-world datasets for different applications. We also introduce a new dataset for benchmarking, and the evaluations are performed from four different perspectives including quantitative metrics, visual effects, human ratings and computational cost. Our experiments demonstrate: (i) the effectiveness and efficiency of representative traditional denoisers for various denoising tasks, (ii) a simple matrix-based algorithm may be able to produce similar results compared with its tensor counterparts, and (iii) the notable achievements of DNN models, which exhibit impressive generalization ability and show state-of-the-art performance on various datasets. In spite of the progress in recent years, we discuss shortcomings and possible extensions of existing techniques. Datasets, code and results are made publicly available and will be continuously updated at

7.Performance of GAN-based augmentation for deep learning COVID-19 image classification

Authors:Oleksandr Fedoruk, Konrad Klimaszewski, Aleksander Ogonowski, Rafał Możdżonek

Abstract: The biggest challenge in the application of deep learning to the medical domain is the availability of training data. Data augmentation is a typical methodology used in machine learning when confronted with a limited data set. In the classical approach, image transformations such as rotations, cropping and brightness changes are used. In this work, a StyleGAN2-ADA model of Generative Adversarial Networks is trained on the limited COVID-19 chest X-ray image set. After assessing the quality of generated images, they are used to increase the training data set, improving its balance between classes. We consider the multi-class classification problem of chest X-ray images including the COVID-19 positive class, which has not yet been thoroughly explored in the literature. Results of transfer learning-based classification of COVID-19 chest X-ray images are presented. The performance of several deep convolutional neural network models is compared. The impact on the detection performance of classical image augmentations, i.e. rotations, cropping, and brightness changes, is studied. Furthermore, classical image augmentation is compared with GAN-based augmentation. The most accurate model is an EfficientNet-B0 with an accuracy of 90.2 percent, trained on a dataset with a simple class balancing. The GAN augmentation approach is found to be inferior to classical methods for the considered dataset.

8.Detection and Classification of Glioblastoma Brain Tumor

Authors:Utkarsh Maurya, Appisetty Krishna Kalyan, Swapnil Bohidar, Dr. S. Sivakumar

Abstract: Glioblastoma brain tumors are highly malignant and often require early detection and accurate segmentation for effective treatment. We propose two deep learning models in this paper, namely UNet and Deeplabv3, for the detection and segmentation of glioblastoma brain tumors using preprocessed brain MRI images. The models are evaluated in terms of accuracy and computational efficiency. Our experimental results demonstrate that both UNet and Deeplabv3 models achieve accurate detection and segmentation of glioblastoma brain tumors. However, Deeplabv3 outperforms UNet in terms of accuracy, albeit at the cost of requiring more computational resources. Our proposed models offer a promising approach for the early detection and segmentation of glioblastoma brain tumors, which can aid in effective treatment strategies. Further research can focus on optimizing the computational efficiency of the Deeplabv3 model while maintaining its high accuracy for real-world clinical applications. Overall, our approach contributes to the field of medical image analysis and deep learning-based approaches for brain tumor detection and segmentation. Our suggested models can have a major influence on the prognosis and treatment of people with glioblastoma, a fatal form of brain cancer. More research is needed to examine the practical use of these models in real-life healthcare settings.

9.Structure Preserving Cycle-GAN for Unsupervised Medical Image Domain Adaptation

Authors:Paolo Iacono, Naimul Khan

Abstract: The presence of domain shift in medical imaging is a common issue, which can greatly impact the performance of segmentation models when dealing with unseen image domains. Adversarial-based deep learning models, such as Cycle-GAN, have become a common choice for unsupervised domain adaptation of medical images. These models, however, have no ability to enforce the preservation of structures of interest when translating medical scans, which can lead to potentially poor results for unsupervised domain adaptation within the context of segmentation. This work introduces the Structure Preserving Cycle-GAN (SP Cycle-GAN), which promotes medical structure preservation during image translation through the enforcement of a segmentation loss term in the overall Cycle-GAN training process. We demonstrate the structure preserving capability of the SP Cycle-GAN both visually and through comparison of Dice score segmentation performance for the unsupervised domain adaptation models. The SP Cycle-GAN is able to outperform baseline approaches and standard Cycle-GAN domain adaptation for binary blood vessel segmentation in the STARE and DRIVE datasets, and multi-class Left Ventricle and Myocardium segmentation in the multi-modal MM-WHS dataset. SP Cycle-GAN achieved a state-of-the-art Myocardium segmentation Dice score (DSC) of 0.7435 for the MR to CT MM-WHS domain adaptation problem, and excelled in nearly all categories for the MM-WHS dataset. SP Cycle-GAN also demonstrated a strong ability to preserve blood vessel structure in the DRIVE to STARE domain adaptation problem, achieving a 4% DSC increase over a default Cycle-GAN implementation.

1.One-Class SVM on siamese neural network latent space for Unsupervised Anomaly Detection on brain MRI White Matter Hyperintensities

Authors:Nicolas Pinon (MYRIAD), Robin Trombetta (MYRIAD), Carole Lartizien (MYRIAD)

Abstract: Anomaly detection remains a challenging task in neuroimaging when little to no supervision is available and when lesions can be very small or with subtle contrast. Patch-based representation learning has shown powerful representation capacities when applied to industrial or medical imaging, and outlier detection methods have been applied successfully to these images. In this work, we propose an unsupervised anomaly detection (UAD) method based on a latent space constructed by a siamese patch-based auto-encoder and perform the outlier detection with a One-Class SVM training paradigm tailored to the lesion detection task in multi-modality neuroimaging. We evaluate the performance of this model on a public database, the White Matter Hyperintensities (WMH) challenge, and show performance on par with the two best-performing state-of-the-art methods reported so far.

2.Two-stage MR Image Segmentation Method for Brain Tumors based on Attention Mechanism

Authors:Li Zhu, Jiawei Jiang, Lin Lu, Jin Li

Abstract: Multimodal magnetic resonance imaging (MRI) can reveal different patterns of human tissue and is crucial for clinical diagnosis. However, limited by cost, noise and manual labeling, obtaining diverse and reliable multimodal MR images remains a challenge. For the same lesion, different MRI manifestations have great differences in background information, coarse positioning and fine structure. In order to obtain better generation and segmentation performance, a coordination-spatial attention generative adversarial network (CASP-GAN) based on the cycle-consistent generative adversarial network (CycleGAN) is proposed. The performance of the generator is optimized by introducing the Coordinate Attention (CA) module and the Spatial Attention (SA) module. The two modules make full use of the captured location information, accurately locating the region of interest and enhancing the generator network structure. The ability to extract the structure information and the detailed information of the original medical image can help generate the desired image with higher quality. The original CycleGAN suffers from long training time, an excessive number of parameters, and difficulty converging. In response to these problems, we introduce the Coordinate Attention (CA) module to replace the Res Block to reduce the number of parameters, and combine it with the spatial information extraction network above to strengthen the information extraction ability. On the basis of CASP-GAN, an attentional generative cross-modality segmentation (AGCMS) method is further proposed. This method inputs the modalities generated by CASP-GAN and the real modalities into the segmentation network for brain tumor segmentation. Experimental results show that CASP-GAN outperforms CycleGAN and some state-of-the-art methods in PSNR, SSIM and RMSE in most tasks.

3.Towards Tumour Graph Learning for Survival Prediction in Head & Neck Cancer Patients

Authors:Angel Victor Juanco Muller, Joao F. C. Mota, Keith A. Goatman, Corne Hoogendoorn

Abstract: With nearly one million new cases diagnosed worldwide in 2020, head & neck cancer is a deadly and common malignancy. There are challenges to decision making and treatment of such cancer, due to lesions in multiple locations and outcome variability between patients. Therefore, automated segmentation and prognosis estimation approaches can help ensure each patient gets the most effective treatment. This paper presents a framework to perform these functions on arbitrary field of view (FoV) PET and CT registered scans, thus approaching tasks 1 and 2 of the HECKTOR 2022 challenge as team VokCow. The method consists of three stages: localization, segmentation and survival prediction. First, the scans with arbitrary FoV are cropped to the head and neck region and a u-shaped convolutional neural network (CNN) is trained to segment the region of interest. Then, using the obtained regions, another CNN is combined with a support vector machine classifier to obtain the semantic segmentation of the tumours, which results in an aggregated Dice score of 0.57 in task 1. Finally, survival prediction is approached with an ensemble of a Weibull accelerated failure time model and deep learning methods. In addition to patient health record data, we explore whether processing graphs of image patches centred at the tumours via graph convolutions can improve the prognostic predictions. A concordance index of 0.64 was achieved in the test set, ranking 6th in the challenge leaderboard for this task.
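
For readers unfamiliar with the concordance index quoted above, the following is a minimal sketch of Harrell's C-index in plain Python. It is a generic textbook formulation under stated assumptions (a higher risk score should mean a shorter survival time, tied scores count as half-concordant), not the challenge's official scoring code; all names are illustrative.

```python
def concordance_index(times, events, risk):
    """Fraction of comparable patient pairs whose risk ordering matches outcome.

    A pair (i, j) is comparable when patient i had an observed event
    (events[i] is truthy) strictly before patient j's recorded time.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            if events[i] and times[i] < times[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0   # earlier event, higher risk: correct order
                elif risk[i] == risk[j]:
                    concordant += 0.5   # assumed convention for tied scores
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


times = [5, 10, 15, 20]         # follow-up time per patient
events = [1, 0, 1, 1]           # 1 = event observed, 0 = censored
risk = [0.9, 0.4, 0.1, 0.5]     # hypothetical model risk scores
print(concordance_index(times, events, risk))  # 3 of 4 comparable pairs agree
```

A value of 0.5 corresponds to random ordering and 1.0 to a perfect ranking, which puts the reported 0.64 in context.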

4.Features-over-the-Air: Contrastive Learning Enabled Cooperative Edge Inference

Authors:Haotian Wu, Nitish Mital, Krystian Mikolajczyk, Deniz Gündüz

Abstract: We study the collaborative image retrieval problem at the wireless edge, where multiple edge devices capture images of the same object, which are then used jointly to retrieve similar images at the edge server over a shared multiple access channel. We propose a semantic non-orthogonal multiple access (NOMA) communication paradigm, in which extracted features from each device are mapped directly to channel inputs, which are then added over-the-air. We propose a novel contrastive learning (CL)-based semantic communication (CL-SC) paradigm, aiming to exploit signal correlations to maximize the retrieval accuracy under a total bandwidth constraint. Specifically, we treat noisy correlated signals as different augmentations of a common identity, and propose a cross-view CL algorithm to optimize the correlated signals in a coarse-to-fine fashion to improve retrieval accuracy. Extensive numerical experiments verify that our method achieves state-of-the-art performance and can significantly improve retrieval accuracy, with particularly large gains in low signal-to-noise ratio (SNR) and limited bandwidth regimes.

5.Deep-Learning-based Vasculature Extraction for Single-Scan Optical Coherence Tomography Angiography

Authors:Jinpeng Liao, Tianyu Zhang, Yilong Zhang, Chunhui Li, Zhihong Huang

Abstract: Optical coherence tomography angiography (OCTA) is a non-invasive imaging modality that extends the functionality of OCT by extracting moving red blood cell signals from surrounding static biological tissues. OCTA has emerged as a valuable tool for analyzing skin microvasculature, enabling more accurate diagnosis and treatment monitoring. Most existing OCTA extraction algorithms, such as speckle variance (SV)- and eigen-decomposition (ED)-OCTA, require a larger number of repeated (NR) OCT scans at the same position to produce high-quality angiography images. However, a higher NR requires a longer data acquisition time, leading to more unpredictable motion artifacts. In this study, we propose a vasculature extraction pipeline that uses only one-repeated OCT scan to generate OCTA images. The pipeline is based on the proposed Vasculature Extraction Transformer (VET), which leverages convolutional projection to better learn the spatial relationships between image patches. In comparison to OCTA images obtained via the SV-OCTA (PSNR: 17.809) and ED-OCTA (PSNR: 18.049) using four-repeated OCT scans, OCTA images extracted by VET exhibit moderate quality (PSNR: 17.515) and higher image contrast while reducing the required data acquisition time from ~8 s to ~2 s. Based on visual observations, the proposed VET outperforms SV and ED algorithms when using neck and face OCTA data in areas that are challenging to scan. This study demonstrates that the VET has the capacity to extract vasculature images from a fast one-repeated OCT scan, facilitating accurate diagnosis for patients.
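
The PSNR figures compared above follow the standard definition PSNR = 10 log10(MAX^2 / MSE). A minimal sketch in plain Python, assuming flattened images with intensities in [0, max_value]; the function name and the toy data are illustrative and unrelated to the paper's OCTA data.

```python
import math


def psnr(reference, estimate, max_value=1.0):
    """Peak signal-to-noise ratio in dB between two equally sized flat images."""
    if len(reference) != len(estimate):
        raise ValueError("images must have the same number of pixels")
    mse = sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


reference = [0.0, 0.5, 1.0, 0.5]
estimate = [0.1, 0.5, 0.9, 0.5]
print(round(psnr(reference, estimate), 2))  # MSE is 0.005, so roughly 23 dB
```

Because the scale is logarithmic, the gap between 17.515 dB and 18.049 dB corresponds to a modest difference in mean squared error, which is consistent with the abstract describing the single-scan result as moderate quality.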

6.Implicit Bayes Adaptation: A Collaborative Transport Approach

Authors:Bo Jiang, Hamid Krim, Tianfu Wu, Derya Cansever

Abstract: The power and flexibility of Optimal Transport (OT) have pervaded a wide spectrum of problems, including recent Machine Learning challenges such as unsupervised domain adaptation. Its essence of quantitatively relating two probability distributions by some optimal metric has been creatively exploited and shown to hold promise for many real-world data challenges. In a related theme in the present work, we posit that domain adaptation robustness is rooted in the intrinsic (latent) representations of the respective data, which inherently lie in a non-linear submanifold embedded in a higher dimensional Euclidean space. We account for the geometric properties by refining the $l^2$ Euclidean metric to better reflect the geodesic distance between two distinct representations. We integrate a metric correction term as well as a prior cluster structure in the source data of the OT-driven adaptation. We show that this is tantamount to an implicit Bayesian framework, which we demonstrate to be viable for a more robust and better-performing approach to domain adaptation. Substantiating experiments are also included for validation purposes.

7.Transformer with Selective Shuffled Position Embedding using ROI-Exchange Strategy for Early Detection of Knee Osteoarthritis

Authors:Zhe Wang, Aladine Chetouani, Rachid Jennane

Abstract: Knee OsteoArthritis (KOA) is a prevalent musculoskeletal disorder that causes decreased mobility in seniors. The lack of sufficient data in the medical field is always a challenge for training a learning model due to the high cost of labelling. At present, deep neural network training strongly depends on data augmentation to improve the model's generalization capability and avoid over-fitting. However, existing data augmentation operations, such as rotation, gamma correction, etc., are designed based on the data itself, which does not substantially increase the data diversity. In this paper, we proposed a novel approach based on the Vision Transformer (ViT) model with Selective Shuffled Position Embedding (SSPE) and a ROI-exchange strategy to obtain different input sequences as a method of data augmentation for early detection of KOA (KL-0 vs KL-2). More specifically, we fixed and shuffled the position embedding of ROI and non-ROI patches, respectively. Then, for the input image, we randomly selected other images from the training set to exchange their ROI patches and thus obtained different input sequences. Finally, a hybrid loss function was derived using different loss functions with optimized weights. Experimental results show that our proposed approach is a valid method of data augmentation as it can significantly improve the model's classification performance.
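
The two augmentation steps described above (keeping ROI position embeddings fixed while shuffling the non-ROI ones, and exchanging ROI patches between images) can be illustrated with index bookkeeping alone. This is a rough sketch under the assumption that patches and position embeddings are addressed by a flat patch index; the function names are hypothetical and the authors' actual implementation may differ.

```python
import random


def selective_shuffle(num_patches, roi_indices, rng):
    """Return position ids where ROI slots keep their own id and
    non-ROI slots receive a random permutation of the non-ROI ids."""
    roi = set(roi_indices)
    non_roi = [i for i in range(num_patches) if i not in roi]
    shuffled = non_roi[:]
    rng.shuffle(shuffled)
    positions = list(range(num_patches))
    for slot, source in zip(non_roi, shuffled):
        positions[slot] = source
    return positions


def exchange_roi(patches, donor_patches, roi_indices):
    """Copy the donor image's ROI patches into a copy of `patches`."""
    mixed = list(patches)
    for i in roi_indices:
        mixed[i] = donor_patches[i]
    return mixed


rng = random.Random(0)               # seeded only to make the demo repeatable
roi = [4]                            # hypothetical ROI: centre patch of a 3x3 grid
print(selective_shuffle(9, roi, rng))
print(exchange_roi(list("aaaaaaaaa"), list("bbbbbbbbb"), roi))
```

In a ViT-style model the returned ids would select which position embedding is added to each patch token, so every call yields a different input sequence for the same image.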

8.Morph-SSL: Self-Supervision with Longitudinal Morphing to Predict AMD Progression from OCT

Authors:Arunava Chakravarty, Taha Emre, Oliver Leingang, Sophie Riedl, Julia Mai, Hendrik P. N. Scholl, Sobha Sivaprasad, Daniel Rueckert, Andrew Lotery, Ursula Schmidt-Erfurth, Hrvoje Bogunović

Abstract: The lack of reliable biomarkers makes predicting the conversion from intermediate to neovascular age-related macular degeneration (iAMD, nAMD) a challenging task. We develop a Deep Learning (DL) model to predict the future risk of conversion of an eye from iAMD to nAMD from its current OCT scan. Although eye clinics generate vast amounts of longitudinal OCT scans to monitor AMD progression, only a small subset can be manually labeled for supervised DL. To address this issue, we propose Morph-SSL, a novel Self-supervised Learning (SSL) method for longitudinal data. It uses pairs of unlabelled OCT scans from different visits and involves morphing the scan from the previous visit to the next. The Decoder predicts the transformation for morphing and ensures a smooth feature manifold that can generate intermediate scans between visits through linear interpolation. Next, the Morph-SSL trained features are input to a Classifier which is trained in a supervised manner to model the cumulative probability distribution of the time to conversion with a sigmoidal function. Morph-SSL was trained on unlabelled scans of 399 eyes (3570 visits). The Classifier was evaluated with a five-fold cross-validation on 2418 scans from 343 eyes with clinical labels of the conversion date. The Morph-SSL features achieved an AUC of 0.766 in predicting the conversion to nAMD within the next 6 months, outperforming the same network when trained end-to-end from scratch or pre-trained with popular SSL methods. Automated prediction of the future risk of nAMD onset can enable timely treatment and individualized AMD management.

1.Bitstream-Corrupted JPEG Images are Restorable: Two-stage Compensation and Alignment Framework for Image Restoration

Authors:Wenyang Liu, Yi Wang, Kim-Hui Yap, Lap-Pui Chau

Abstract: In this paper, we study a real-world JPEG image restoration problem with bit errors on the encrypted bitstream. The bit errors bring unpredictable color casts and block shifts on decoded image contents, which cannot be resolved by existing image restoration methods mainly relying on pre-defined degradation models in the pixel domain. To address these challenges, we propose a robust JPEG decoder, followed by a two-stage compensation and alignment framework to restore bitstream-corrupted JPEG images. Specifically, the robust JPEG decoder adopts an error-resilient mechanism to decode the corrupted JPEG bitstream. The two-stage framework is composed of the self-compensation and alignment (SCA) stage and the guided-compensation and alignment (GCA) stage. The SCA adaptively performs block-wise image color compensation and alignment based on the estimated color and block offsets via image content similarity. The GCA leverages the extracted low-resolution thumbnail from the JPEG header to guide full-resolution pixel-wise image restoration in a coarse-to-fine manner. It is achieved by a coarse-guided pix2pix network and a refine-guided bi-directional Laplacian pyramid fusion network. We conduct experiments on three benchmarks with varying degrees of bit error rates. Experimental results and ablation studies demonstrate the superiority of our proposed method. The code will be released at

2.Hierarchical Agent-based Reinforcement Learning Framework for Automated Quality Assessment of Fetal Ultrasound Video

Authors:Sijing Liu, Qilong Ying, Shuangchi He, Xin Yang, Dong Ni, Ruobing Huang

Abstract: Ultrasound is the primary modality to examine fetal growth during pregnancy, while the image quality could be affected by various factors. Quality assessment is essential for controlling the quality of ultrasound images to guarantee both the perceptual and diagnostic values. Existing automated approaches often require heavy structural annotations and the predictions may not necessarily be consistent with the assessment results by human experts. Furthermore, the overall quality of a scan and the correlation between the quality of frames should not be overlooked. In this work, we propose a reinforcement learning framework powered by two hierarchical agents that collaboratively learn to perform both frame-level and video-level quality assessments. It is equipped with a specially-designed reward mechanism that considers temporal dependency among frame quality and only requires sparse binary annotations to train. Experimental results on a challenging fetal brain dataset verify that the proposed framework could perform dual-level quality assessment and its predictions correlate well with the subjective assessment results.

3.Perceptual Quality Assessment of Face Video Compression: A Benchmark and An Effective Method

Authors:Yixuan Li, Bolin Chen, Baoliang Chen, Meng Wang, Shiqi Wang

Abstract: Recent years have witnessed an exponential increase in the demand for face video compression, and the success of artificial intelligence has expanded the boundaries beyond traditional hybrid video coding. Generative coding approaches have been identified as promising alternatives with reasonable perceptual rate-distortion trade-offs, leveraging the statistical priors of face videos. However, the great diversity of distortion types in spatial and temporal domains, ranging from the traditional hybrid coding frameworks to generative models, presents grand challenges in compressed face video quality assessment (VQA). In this paper, we introduce the large-scale Compressed Face Video Quality Assessment (CFVQA) database, which is the first attempt to systematically understand the perceptual quality and diversified compression distortions in face videos. The database contains 3,240 compressed face video clips in multiple compression levels, which are derived from 135 source videos with diversified content using six representative video codecs, including two traditional methods based on hybrid coding frameworks, two end-to-end methods, and two generative methods. In addition, a FAce VideO IntegeRity (FAVOR) index for face video compression was developed to measure the perceptual quality, considering the distinct content characteristics and temporal priors of the face videos. Experimental results exhibit its superior performance on the proposed CFVQA dataset. The benchmark is now made publicly available at:

4.Weighted Siamese Network to Predict the Time to Onset of Alzheimer's Disease from MRI Images

Authors:Misgina Tsighe Hagos, Niamh Belton, Ronan P. Killeen, Kathleen M. Curran, Brian Mac Namee

Abstract: Alzheimer's Disease (AD), which is the most common cause of dementia, is a progressive disease preceded by Mild Cognitive Impairment (MCI). Early detection of the disease is crucial for making treatment decisions. However, most of the literature on computer-assisted detection of AD focuses on classifying brain images into one of three major categories: healthy, MCI, and AD; or categorising MCI patients into one of (1) progressive: those who progress from MCI to AD at a future examination time during a given study period, and (2) stable: those who stay as MCI and never progress to AD. This misses the opportunity to accurately identify the trajectory of progressive MCI patients. In this paper, we revisit the brain image classification task for AD identification and re-frame it as an ordinal classification task to predict how close a patient is to the severe AD stage. To this end, we select progressive MCI patients from the Alzheimer's Disease Neuroimaging Initiative (ADNI) dataset and construct an ordinal dataset with a prediction target that indicates the time to progression to AD. We train a siamese network model to predict the time to onset of AD based on MRI brain images. We also propose a weighted variety of siamese networks and compare its performance to a baseline model. Our evaluations show that incorporating a weighting factor to siamese networks brings considerable performance gain at predicting how close input brain MRI images are to progressing to AD.

5.Cross Attention Transformers for Multi-modal Unsupervised Whole-Body PET Anomaly Detection

Authors:Ashay Patel, Petru-Danial Tudiosu, Walter H. L. Pinaya, Gary Cook, Vicky Goh, Sebastien Ourselin, M. Jorge Cardoso

Abstract: Cancer is a highly heterogeneous condition that can occur almost anywhere in the human body. 18F-fluorodeoxyglucose PET is an imaging modality commonly used to detect cancer due to its high sensitivity and clear visualisation of the pattern of metabolic activity. Nonetheless, as cancer is highly heterogeneous, it is challenging to train general-purpose discriminative cancer detection models, with data availability and disease complexity often cited as limiting factors. Unsupervised anomaly detection models have been suggested as a putative solution. These models learn a healthy representation of tissue and detect cancer by predicting deviations from the healthy norm, which requires models capable of accurately learning long-range interactions between organs and their imaging patterns with high levels of expressivity. Such characteristics are suitably satisfied by transformers, which have been shown to generate state-of-the-art results in unsupervised anomaly detection by training on normal data. This work expands upon such approaches by introducing multi-modal conditioning of the transformer via cross-attention, i.e. supplying anatomical reference from paired CT. Using 294 whole-body PET/CT samples, we show that our anomaly detection method is robust and capable of achieving accurate cancer localization results even in cases where normal training data is unavailable. In addition, we show the efficacy of this approach on out-of-sample data, showcasing the generalizability of this approach with limited training data. Lastly, we propose to combine model uncertainty with a new kernel density estimation approach, and show that it provides clinically and statistically significant improvements when compared to the classic residual-based anomaly maps. Overall, a superior performance is demonstrated against leading state-of-the-art alternatives, drawing attention to the potential of these approaches.

6.Robust thalamic nuclei segmentation from T1-weighted MRI

Authors:Julie P. Vidal, Lola Danet, Patrice Péran, Jérémie Pariente, Meritxell Bach Cuadra, Natalie M. Zahr, Emmanuel J. Barbeau, Manojkumar Saranathan

Abstract: Accurate segmentation of thalamic nuclei, crucial for understanding their role in healthy cognition and in pathologies, is challenging to achieve on standard T1-weighted (T1w) magnetic resonance imaging (MRI) due to poor image contrast. White-matter-nulled (WMn) MRI sequences improve intrathalamic contrast but are not part of clinical protocols or extant databases. Here, we introduce Histogram-based polynomial synthesis (HIPS), a fast preprocessing step that synthesizes WMn-like image contrast from standard T1w MRI using a polynomial approximation. HIPS was incorporated into our Thalamus Optimized Multi-Atlas Segmentation (THOMAS) pipeline, developed and optimized for WMn MRI. HIPS-THOMAS was compared to a convolutional neural network (CNN)-based segmentation method and THOMAS modified for T1w images (T1w-THOMAS). The robustness and accuracy of the three methods were tested across different image contrasts, scanner manufacturers, and field strength. HIPS-synthesized images improved intra-thalamic contrast and thalamic boundaries, and their segmentations yielded significantly better mean Dice, lower percentage of volume error, and lower standard deviations compared to both the CNN method and T1w-THOMAS. Finally, using THOMAS, HIPS-synthesized images were as effective as WMn images for identifying thalamic nuclei atrophy in alcohol use disorders subjects relative to healthy controls, with a higher area under the ROC curve compared to T1w-THOMAS (0.79 vs 0.73).
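
The abstract above reports mean Dice and percentage of volume error without defining them. As a rough illustration only, the sketch below computes both for a pair of flattened binary masks using their common definitions. The function names and the toy masks are hypothetical and are not taken from the HIPS-THOMAS code; the paper's exact evaluation protocol may differ.

```python
def dice(pred, ref):
    """Dice overlap between two binary masks given as equal-length 0/1 sequences."""
    if len(pred) != len(ref):
        raise ValueError("masks must have the same number of voxels")
    intersection = sum(1 for p, r in zip(pred, ref) if p and r)
    total = sum(1 for p in pred if p) + sum(1 for r in ref if r)
    # Convention assumed here: two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2.0 * intersection / total


def percent_volume_error(pred, ref):
    """Absolute volume difference as a percentage of the reference volume."""
    ref_volume = sum(1 for r in ref if r)
    pred_volume = sum(1 for p in pred if p)
    if ref_volume == 0:
        raise ValueError("reference mask is empty")
    return 100.0 * abs(pred_volume - ref_volume) / ref_volume


# Toy flattened masks (hypothetical data, not from the paper).
reference = [0, 1, 1, 1, 0, 0, 1, 0]
predicted = [0, 1, 1, 0, 0, 1, 1, 0]

print(dice(predicted, reference))
print(percent_volume_error(predicted, reference))
```

Note that the two metrics are complementary: in the toy masks the volumes match exactly even though the overlap is imperfect, which is why segmentation studies usually report both.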

7.The University of California San Francisco, Brain Metastases Stereotactic Radiosurgery (UCSF-BMSR) MRI Dataset

Authors:Jeffrey D. Rudie, Rachit Saluja, David A. Weiss, Pierre Nedelec, Evan Calabrese, John B. Colby, Benjamin Laguna, John Mongan, Steve Braunstein, Christopher P. Hess, Andreas M. Rauschecker, Leo P. Sugrue, Javier E. Villanueva-Meyer

Abstract: The University of California San Francisco Brain Metastases Stereotactic Radiosurgery (UCSF-BMSR) dataset is a public, clinical, multimodal brain MRI dataset consisting of 560 brain MRIs from 412 patients with expert annotations of 5136 brain metastases. Data consists of registered and skull-stripped T1 post-contrast, T1 pre-contrast, FLAIR and subtraction (T1 pre-contrast - T1 post-contrast) images and voxelwise segmentations of enhancing brain metastases in NIfTI format. The dataset also includes patient demographics, surgical status and primary cancer types. The UCSF-BMSR has been made publicly available in the hopes that researchers will use these data to push the boundaries of AI applications for brain metastases.

8.CAD-RADS scoring of coronary CT angiography with Multi-Axis Vision Transformer: a clinically-inspired deep learning pipeline

Authors:Alessia Gerbasi, Arianna Dagliati, Giuseppe Albi, Mattia Chiesa, Daniele Andreini, Andrea Baggiano, Saima Mushtaq, Gianluca Pontone, Riccardo Bellazzi, Gualtiero Colombo

Abstract: The standard non-invasive imaging technique used to assess the severity and extent of Coronary Artery Disease (CAD) is Coronary Computed Tomography Angiography (CCTA). However, manual grading of each patient's CCTA according to the CAD-Reporting and Data System (CAD-RADS) scoring is time-consuming and operator-dependent, especially in borderline cases. This work proposes a fully automated and visually explainable deep learning pipeline to be used as a decision support system for the CAD screening procedure. The pipeline performs two classification tasks: first, identifying patients who require further clinical investigations, and second, classifying patients into subgroups based on the degree of stenosis, according to commonly used CAD-RADS thresholds. The pipeline pre-processes multiplanar projections of the coronary arteries, extracted from the original CCTAs, and classifies them using a fine-tuned Multi-Axis Vision Transformer architecture. With the aim of emulating the current clinical practice, the model is trained to assign a per-patient score by stacking the bi-dimensional longitudinal cross-sections of the three main coronary arteries along the channel dimension. Furthermore, it generates visually interpretable maps to assess the reliability of the predictions. When run on a database of 1873 three-channel images of 253 patients collected at the Monzino Cardiology Center in Milan, the pipeline obtained an AUC of 0.87 and 0.93 for the two classification tasks, respectively. To our knowledge, this is the first model trained to assign CAD-RADS scores learning solely from patient scores and not requiring finer imaging annotation steps that are not part of the clinical routine.

1.Generalizable Deep Learning Method for Suppressing Unseen and Multiple MRI Artifacts Using Meta-learning

Authors:Arun Palla, Sriprabha Ramanarayanan, Keerthi Ram, Mohanasankar Sivaprakasam

Abstract: Magnetic Resonance (MR) images suffer from various types of artifacts due to motion, spatial resolution, and under-sampling. Conventional deep learning methods deal with removing a specific type of artifact, leading to separately trained models for each artifact type that lack the shared knowledge generalizable across artifacts. Moreover, training a model for each type and amount of artifact is a tedious process that consumes more training time and model storage. On the other hand, the shared knowledge learned by jointly training the model on multiple artifacts might be inadequate to generalize under deviations in the types and amounts of artifacts. Model-agnostic meta-learning (MAML), a nested bi-level optimization framework, is a promising technique to learn common knowledge across artifacts in the outer level of optimization, and artifact-specific restoration in the inner level. We propose curriculum-MAML (CMAML), a learning process that integrates MAML with curriculum learning to impart the knowledge of variable artifact complexity to adaptively learn restoration of multiple artifacts during training. Comparative studies against Stochastic Gradient Descent and MAML, using two cardiac datasets, reveal that CMAML exhibits (i) better generalization with improved PSNR for 83% of unseen types and amounts of artifacts and improved SSIM in all cases, and (ii) better artifact suppression in 4 out of 5 cases of composite artifacts (scans with multiple artifacts).
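
The PSNR figure quoted above is a standard restoration metric. As a minimal sketch, assuming intensities normalised to [0, 1] and images flattened to plain lists, it can be computed as follows. The function name `psnr`, the `max_value` parameter and the toy signals are illustrative assumptions, not part of the CMAML code.

```python
import math


def psnr(reference, restored, max_value=1.0):
    """Peak signal-to-noise ratio in dB between two equal-length intensity sequences."""
    if len(reference) != len(restored):
        raise ValueError("images must have the same number of pixels")
    mse = sum((a - b) ** 2 for a, b in zip(reference, restored)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


# Toy 1D "images" (hypothetical values, not from the paper).
clean = [0.0, 0.5, 1.0, 0.5]
corrupted = [0.1, 0.4, 0.9, 0.6]      # every pixel off by 0.1
restored = [0.05, 0.45, 0.95, 0.55]   # every pixel off by 0.05

print(round(psnr(clean, corrupted), 2))
print(round(psnr(clean, restored), 2))
```

Halving the per-pixel error raises PSNR by about 6 dB, which gives a sense of scale when comparing PSNR gains between methods.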

2.Deep Learning in Breast Cancer Imaging: A Decade of Progress and Future Directions

Authors:Luyang Luo, Xi Wang, Yi Lin, Xiaoqi Ma, Andong Tan, Ronald Chan, Vince Vardhanabhuti, Winnie CW Chu, Kwang-Ting Cheng, Hao Chen

Abstract: Breast cancer has reached the highest incidence rate worldwide among all malignancies since 2020. Breast imaging plays a significant role in early diagnosis and intervention to improve the outcome of breast cancer patients. In the past decade, deep learning has shown remarkable progress in breast cancer imaging analysis, holding great promise in interpreting the rich information and complex context of breast imaging modalities. Considering the rapid improvement in the deep learning technology and the increasing severity of breast cancer, it is critical to summarize past progress and identify future challenges to be addressed. In this paper, we provide an extensive survey of deep learning-based breast cancer imaging research, covering studies on mammogram, ultrasound, magnetic resonance imaging, and digital pathology images over the past decade. The major deep learning methods, publicly available datasets, and applications on imaging-based screening, diagnosis, treatment response prediction, and prognosis are described in detail. Drawn from the findings of this survey, we present a comprehensive discussion of the challenges and potential avenues for future research in deep learning-based breast cancer imaging.

1.SAMM (Segment Any Medical Model): A 3D Slicer Integration to SAM

Authors:Yihao Liu, Jiaming Zhang, Zhangcong She, Amir Kheradmand, Mehran Armand

Abstract: The Segment Anything Model (SAM) is a new image segmentation tool trained with the largest segmentation dataset at this time. The model has demonstrated that it can create high-quality masks for image segmentation with good promptability and generalizability. However, the performance of the model on medical images requires further validation. To assist with the development, assessment, and utilization of SAM on medical images, we introduce Segment Any Medical Model (SAMM), an extension of SAM on 3D Slicer, a widely-used open-source image processing and visualization software that has been extensively used in the medical imaging community. This open-source extension to 3D Slicer and its demonstrations are posted on GitHub. SAMM achieves 0.6-second latency of a complete cycle and can infer image masks in near real time.

2.A Multi-Institutional Open-Source Benchmark Dataset for Breast Cancer Clinical Decision Support using Synthetic Correlated Diffusion Imaging Data

Authors:Chi-en Amy Tai, Hayden Gunraj, Alexander Wong

Abstract: Recently, a new form of magnetic resonance imaging (MRI) called synthetic correlated diffusion (CDI$^s$) imaging was introduced and showed considerable promise for clinical decision support for cancers such as prostate cancer when compared to current gold-standard MRI techniques. However, the efficacy for CDI$^s$ for other forms of cancers such as breast cancer has not been as well-explored nor have CDI$^s$ data been previously made publicly available. Motivated to advance efforts in the development of computer-aided clinical decision support for breast cancer using CDI$^s$, we introduce Cancer-Net BCa, a multi-institutional open-source benchmark dataset of volumetric CDI$^s$ imaging data of breast cancer patients. Cancer-Net BCa contains CDI$^s$ volumetric images from a pre-treatment cohort of 253 patients across ten institutions, along with detailed annotation metadata (the lesion type, genetic subtype, longest diameter on the MRI (MRLD), the Scarff-Bloom-Richardson (SBR) grade, and the post-treatment breast cancer pathologic complete response (pCR) to neoadjuvant chemotherapy). We further examine the demographic and tumour diversity of the Cancer-Net BCa dataset to gain deeper insights into potential biases. Cancer-Net BCa is publicly available as a part of a global open-source initiative dedicated to accelerating advancement in machine learning to aid clinicians in the fight against cancer.

3.Unifying and Personalizing Weakly-supervised Federated Medical Image Segmentation via Adaptive Representation and Aggregation

Authors:Li Lin, Jiewei Wu, Yixiang Liu, Kenneth K. Y. Wong, Xiaoying Tang

Abstract: Federated learning (FL) enables multiple sites to collaboratively train powerful deep models without compromising data privacy and security. The statistical heterogeneity (e.g., non-IID data and domain shifts) is a primary obstacle in FL, impairing the generalization performance of the global model. Weakly supervised segmentation, which uses sparsely-grained (i.e., point-, bounding box-, scribble-, block-wise) supervision, is increasingly being paid attention to due to its great potential of reducing annotation costs. However, there may exist label heterogeneity, i.e., different annotation forms across sites. In this paper, we propose a novel personalized FL framework for medical image segmentation, named FedICRA, which uniformly leverages heterogeneous weak supervision via adaptIve Contrastive Representation and Aggregation. Concretely, to facilitate personalized modeling and to avoid confusion, a channel selection based site contrastive representation module is employed to adaptively cluster intra-site embeddings and separate inter-site ones. To effectively integrate the common knowledge from the global model with the unique knowledge from each local model, an adaptive aggregation module is applied for updating and initializing local models at the element level. Additionally, a weakly supervised objective function that leverages a multiscale tree energy loss and a gated CRF loss is employed to generate more precise pseudo-labels and further boost the segmentation performance. Through extensive experiments on two distinct medical image segmentation tasks of different modalities, the proposed FedICRA demonstrates overwhelming performance over other state-of-the-art personalized FL methods. Its performance even approaches that of fully supervised training on centralized data. Our code and data are available at

4.Multisensor fusion-based digital twin in additive manufacturing for in-situ quality monitoring and defect correction

Authors:Lequn Chen, Xiling Yao, Kui Liu, Chaolin Tan, Seung Ki Moon

Abstract: Early detection and correction of defects are critical in additive manufacturing (AM) to avoid build failures. In this paper, we present a multisensor fusion-based digital twin for in-situ quality monitoring and defect correction in a robotic laser direct energy deposition process. Multisensor fusion sources consist of an acoustic sensor, an infrared thermal camera, a coaxial vision camera, and a laser line scanner. The key novelty and contribution of this work is a spatiotemporal data fusion method that synchronizes and registers the multisensor features within the part's 3D volume. The fused dataset can be used to predict location-specific quality using machine learning. On-the-fly identification of regions requiring material addition or removal is feasible. Robot toolpath and auto-tuned process parameters are generated for defect correction. In contrast to traditional single-sensor-based monitoring, multisensor fusion allows for a more in-depth understanding of underlying process physics, such as pore formation and laser-material interactions. The proposed methods pave the way for self-adaptation AM with higher efficiency, less waste, and cleaner production.

5.FetMRQC: Automated Quality Control for fetal brain MRI

Authors:Thomas Sanchez, Oscar Esteban, Yvan Gomez, Elisenda Eixarch, Meritxell Bach Cuadra

Abstract: Quality control (QC) has long been considered essential to guarantee the reliability of neuroimaging studies. It is particularly important for fetal brain MRI, where large and unpredictable fetal motion can lead to substantial artifacts in the acquired images. Existing methods for fetal brain quality assessment operate at the slice level, and fail to get a comprehensive picture of the quality of an image, which can only be achieved by looking at the entire brain volume. In this work, we propose FetMRQC, a machine learning framework for automated image quality assessment tailored to fetal brain MRI, which extracts an ensemble of quality metrics that are then used to predict experts' ratings. Based on the manual ratings of more than 1000 low-resolution stacks acquired across two different institutions, we show that, compared with existing quality metrics, FetMRQC is able to generalize out-of-domain, while being interpretable and data efficient. We also release a novel manual quality rating tool designed to facilitate and optimize quality rating of fetal brain images. Our tool, along with all the code to generate, train and evaluate the model, will be released upon acceptance of the paper.

6.Automatic Aortic Valve Pathology Detection from 3-Chamber Cine MRI with Spatio-Temporal Attention Maps

Authors:Y. On, K. Vimalesvaran, C. Galazis, S. Zaman, J. Howard, N. Linton, N. Peters, G. Cole, A. A. Bharath, M. Varela

Abstract: The assessment of aortic valve pathology using magnetic resonance imaging (MRI) typically relies on blood velocity estimates acquired using phase contrast (PC) MRI. However, abnormalities in blood flow through the aortic valve often manifest by the dephasing of blood signal in gated balanced steady-state free precession (bSSFP) scans (Cine MRI). We propose a 3D classification neural network (NN) to automatically identify aortic valve pathology (aortic regurgitation, aortic stenosis, mixed valve disease) from Cine MR images. We train and test our approach on a retrospective clinical dataset from three UK hospitals, using single-slice 3-chamber cine MRI from N = 576 patients. Our classification model accurately predicts the presence of aortic valve pathology (AVD) with an accuracy of 0.85 +/- 0.03 and can also correctly discriminate the type of AVD pathology (accuracy: 0.75 +/- 0.03). Gradient-weighted class activation mapping (Grad-CAM) confirms that the blood pool voxels close to the aortic root contribute the most to the classification. Our approach can be used to improve the diagnosis of AVD and optimise clinical CMR protocols for accurate and efficient AVD detection.

7.Automated computed tomography and magnetic resonance imaging segmentation using deep learning: a beginner's guide

Authors:Diedre Carmo, Gustavo Pinheiro, Lívia Rodrigues, Thays Abreu, Roberto Lotufo, Letícia Rittner

Abstract: Medical image segmentation is an increasingly popular area of research in medical imaging processing and analysis. However, many researchers who are new to the field struggle with basic concepts. This tutorial paper aims to provide an overview of the fundamental concepts of medical imaging, with a focus on Magnetic Resonance and Computerized Tomography. We will also discuss deep learning algorithms, tools, and frameworks used for segmentation tasks, and suggest best practices for method development and image analysis. Our tutorial includes sample tasks using public data, and accompanying code is available on GitHub. By sharing our insights gained from years of experience in the field and learning from relevant literature, we hope to assist researchers in overcoming the initial challenges they may encounter in this exciting and important area of research.

1.A Deep Analysis of Transfer Learning Based Breast Cancer Detection Using Histopathology Images

Authors:Md Ishtyaq Mahmud, Muntasir Mamun, Ahmed Abdelgawad

Abstract: Breast cancer is one of the most common and dangerous cancers in women, although it can also afflict men. Breast cancer treatment and detection are greatly aided by the use of histopathological images since they contain sufficient phenotypic data. A Deep Neural Network (DNN) is commonly employed to improve accuracy and breast cancer detection. In our research, we have analyzed pre-trained deep transfer learning models such as ResNet50, ResNet101, VGG16, and VGG19 for detecting breast cancer using a dataset of 2453 histopathology images. Images in the dataset were separated into two categories: those with invasive ductal carcinoma (IDC) and those without IDC. After analyzing the transfer learning models, we found that ResNet50 outperformed the other models, achieving an accuracy of 90.2%, an Area under Curve (AUC) of 90.0%, a recall of 94.7%, and a marginal loss of 3.5%.

2.SFT-KD-Recon: Learning a Student-friendly Teacher for Knowledge Distillation in Magnetic Resonance Image Reconstruction

Authors:Matcha Naga Gayathri, Sriprabha Ramanarayanan, Mohammad Al Fahim, Rahul G S, Keerthi Ram, Mohanasankar Sivaprakasam

Abstract: Deep cascaded architectures for magnetic resonance imaging (MRI) acceleration have shown remarkable success in providing high-quality reconstruction. However, as the number of cascades increases, the improvements in reconstruction tend to become marginal, indicating possible excess model capacity. Knowledge distillation (KD) is an emerging technique to compress these models, in which a trained deep teacher network is used to distill knowledge to a smaller student network such that the student learns to mimic the behavior of the teacher. Most KD methods focus on effectively training the student with a pre-trained teacher unaware of the student model. We propose SFT-KD-Recon, a student-friendly teacher training approach along with the student as a prior step to KD to make the teacher aware of the structure and capacity of the student and enable aligning the representations of the teacher with the student. In SFT, the teacher is jointly trained with the unfolded branch configurations of the student blocks using three loss terms (teacher-reconstruction loss, student-reconstruction loss, and teacher-student imitation loss), followed by KD of the student. We perform extensive experiments for MRI acceleration in 4x and 5x under-sampling on the brain and cardiac datasets on five KD methods using the proposed approach as a prior step. We consider the DC-CNN architecture and set up the teacher as D5C5 (141765 parameters) and the student as D3C5 (49285 parameters), denoting a compression of 2.87:1. Results show that (i) our approach consistently improves the KD methods with improved reconstruction performance and image quality, and (ii) the student distilled using our approach is competitive with the teacher, with the performance gap reduced from 0.53 dB to 0.03 dB.
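
The compression figure in the abstract above follows directly from the two parameter counts it states. The short check below reproduces the arithmetic; the variable names are illustrative, and only the four numbers (141765, 49285, 0.53 dB, 0.03 dB) come from the abstract.

```python
import math

teacher_params = 141765  # D5C5, as stated in the abstract
student_params = 49285   # D3C5, as stated in the abstract

ratio = teacher_params / student_params
# Truncating (not rounding) to two decimals matches the quoted 2.87:1;
# rounding would give 2.88.
ratio_truncated = math.floor(ratio * 100) / 100

gap_before_db = 0.53  # teacher-student gap without the proposed step
gap_after_db = 0.03   # gap with the proposed step
gap_closed_fraction = (gap_before_db - gap_after_db) / gap_before_db

print(f"compression ratio: {ratio:.4f} (truncated: {ratio_truncated}:1)")
print(f"share of the gap closed: {gap_closed_fraction:.1%}")
```

Under these numbers the student keeps roughly a third of the teacher's parameters while closing most of the reported performance gap.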

3.Artificial intelligence based prediction on lung cancer risk factors using deep learning

Authors:Muhammad Sohaib, Mary Adewunmi

Abstract: In this proposed work, we identified the significant research issues on lung cancer risk factors. Capturing and defining symptoms at an early stage is one of the most difficult phases for patients. Based on the history of patients records, we reviewed a number of current research studies on lung cancer and its various stages. We identified that lung cancer is one of the significant research issues in predicting the early stages of cancer disease. This research aimed to develop a model that can detect lung cancer with a remarkably high level of accuracy using the deep learning approach (convolution neural network). This method considers and resolves significant gaps in previous studies. We compare the accuracy levels and loss values of our model with VGG16, InceptionV3, and Resnet50. We found that our model achieved an accuracy of 94% and a minimum loss of 0.1%. Hence physicians can use our convolution neural network models for predicting lung cancer risk factors in the real world. Moreover, this investigation reveals that squamous cell carcinoma, normal, adenocarcinoma, and large cell carcinoma are the most significant risk factors. In addition, the remaining attributes are also crucial for achieving the best performance.

4.Real time enhancement of operator's ergonomics in physical human - robot collaboration scenarios using a multi-stereo camera system

Authors:Gerasimos Arvanitis, Nikos Piperigkos, Christos Anagnostopoulos, Aris S. Lalos, Konstantinos Moustakas

Abstract: In collaborative tasks where humans work alongside machines, the robot's movements and behaviour can have a significant impact on the operator's safety, health, and comfort. To address this issue, we present a multi-stereo camera system that continuously monitors the operator's posture while they work with the robot. This system uses a novel distributed fusion approach to assess the operator's posture in real-time and to help avoid uncomfortable or unsafe positions. The system adjusts the robot's movements and informs the operator of any incorrect or potentially harmful postures, reducing the risk of accidents, strain, and musculoskeletal disorders. The analysis is personalized, taking into account the unique anthropometric characteristics of each operator, to ensure optimal ergonomics. The results of our experiments show that the proposed approach leads to improved human body postures and offers a promising solution for enhancing the ergonomics of operators in collaborative tasks.

5.Mask-conditioned latent diffusion for generating gastrointestinal polyp images

Authors:Roman Macháček, Leila Mozaffari, Zahra Sepasdar, Sravanthi Parasa, Pål Halvorsen, Michael A. Riegler, Vajira Thambawita

Abstract: In order to take advantage of AI solutions in endoscopy diagnostics, we must overcome the issue of limited annotations. These limitations are caused by the high privacy concerns in the medical field and the requirement of getting aid from experts for the time-consuming and costly medical data annotation process. In computer vision, image synthesis has made a significant contribution in recent years as a result of the progress of generative adversarial networks (GANs) and diffusion probabilistic models (DPM). Novel DPMs have outperformed GANs in text, image, and video generation tasks. Therefore, this study proposes a conditional DPM framework to generate synthetic GI polyp images conditioned on given generated segmentation masks. Our experimental results show that our system can generate an unlimited number of high-fidelity synthetic polyp images with the corresponding ground truth masks of polyps. To test the usefulness of the generated data, we trained binary image segmentation models to study the effect of using synthetic data. Results show that the best micro-imagewise IOU of 0.7751 was achieved from DeepLabv3+ when the training data consists of both real data and synthetic data. However, the results reflect that achieving good segmentation performance with synthetic data heavily depends on model architectures.

6.Deep-learning assisted detection and quantification of (oo)cysts of Giardia and Cryptosporidium on smartphone microscopy images

Authors:Suprim Nakarmi, Sanam Pudasaini, Safal Thapaliya, Pratima Upretee, Retina Shrestha, Basant Giri, Bhanu Bhakta Neupane, Bishesh Khanal

Abstract: The consumption of microbial-contaminated food and water is responsible for the deaths of millions of people annually. Smartphone-based microscopy systems are portable, low-cost, and more accessible alternatives for the detection of Giardia and Cryptosporidium than traditional brightfield microscopes. However, the images from smartphone microscopes are noisier and require manual cyst identification by trained technicians, usually unavailable in resource-limited settings. Automatic detection of (oo)cysts using deep-learning-based object detection could offer a solution for this limitation. We evaluate the performance of three state-of-the-art object detectors to detect (oo)cysts of Giardia and Cryptosporidium on a custom dataset that includes both smartphone and brightfield microscopic images from vegetable samples. Faster RCNN, RetinaNet, and you only look once (YOLOv8s) deep-learning models were employed to explore their efficacy and limitations. Our results show that while the deep-learning models perform better with the brightfield microscopy image dataset than the smartphone microscopy image dataset, the smartphone microscopy predictions are still comparable to the prediction performance of non-experts.

7.A comparative study between paired and unpaired Image Quality Assessment in Low-Dose CT Denoising

Authors:Francesco Di Feola, Lorenzo Tronchin, Paolo Soda

Abstract: The current deep learning approaches for low-dose CT denoising can be divided into paired and unpaired methods. The former involves the use of well-paired datasets, whilst the latter relaxes this constraint. The large availability of unpaired datasets has raised the interest in deepening unpaired denoising strategies that, in turn, need robust evaluation techniques going beyond qualitative evaluation. To this end, we can use quantitative image quality assessment scores that we divide into two categories, i.e., paired and unpaired measures. However, the interpretation of unpaired metrics is not straightforward, partly because their consistency with paired metrics has not been fully investigated. To cope with this limitation, in this work we consider 15 paired and unpaired scores, which we apply to assess the performance of low-dose CT denoising. We perform an in-depth statistical analysis that studies the correlation not only between paired and unpaired metrics but also within each category. This brings out useful guidelines that can help researchers and practitioners select the right measure for their applications.

1.HDR Video Reconstruction with a Large Dynamic Dataset in Raw and sRGB Domains

Authors:Huanjing Yue, Yubo Peng, Biting Yu, Xuanwu Yin, Zhenyu Zhou, Jingyu Yang

Abstract: High dynamic range (HDR) video reconstruction is attracting more and more attention due to the superior visual quality compared with those of low dynamic range (LDR) videos. The availability of LDR-HDR training pairs is essential for the HDR reconstruction quality. However, there are still no real LDR-HDR pairs for dynamic scenes due to the difficulty in capturing LDR-HDR frames simultaneously. In this work, we propose to utilize a staggered sensor to capture two alternate exposure images simultaneously, which are then fused into an HDR frame in both raw and sRGB domains. In this way, we build a large scale LDR-HDR video dataset with 85 scenes and each scene contains 60 frames. Based on this dataset, we further propose a Raw-HDRNet, which utilizes the raw LDR frames as inputs. We propose a pyramid flow-guided deformation convolution to align neighboring frames. Experimental results demonstrate that 1) the proposed dataset can improve the HDR reconstruction performance on real scenes for three benchmark networks; 2) Compared with sRGB inputs, utilizing raw inputs can further improve the reconstruction quality and our proposed Raw-HDRNet is a strong baseline for raw HDR reconstruction. Our dataset and code will be released after the acceptance of this paper.

2.ADS_UNet: A Nested UNet for Histopathology Image Segmentation

Authors:Yilong Yang, Srinandan Dasmahapatra, Sasan Mahmoodi

Abstract: The UNet model consists of fully convolutional network (FCN) layers arranged as contracting encoder and upsampling decoder maps. Nested arrangements of these encoder and decoder maps give rise to extensions of the UNet model, such as UNet^e and UNet++. Other refinements include constraining the outputs of the convolutional layers to discriminate between segment labels when trained end to end, a property called deep supervision. This reduces feature diversity in these nested UNet models despite their large parameter space. Furthermore, for texture segmentation, pixel correlations at multiple scales contribute to the classification task; hence, explicit deep supervision of shallower layers is likely to enhance performance. In this paper, we propose ADS_UNet, a stage-wise additive training algorithm that incorporates resource-efficient deep supervision in shallower layers and takes performance-weighted combinations of the sub-UNets to create the segmentation model. We provide empirical evidence on three histopathology datasets to support the claim that the proposed ADS_UNet reduces correlations between constituent features and improves performance while being more resource efficient. We demonstrate that ADS_UNet outperforms state-of-the-art Transformer-based models by 1.08 and 0.6 points on the CRAG and BCSS datasets, and yet requires only 37% of the GPU consumption and 34% of the training time required by Transformers.

3.Reconstruction-driven Dynamic Refinement based Unsupervised Domain Adaptation for Joint Optic Disc and Cup Segmentation

Authors:Ziyang Chen, Yongsheng Pan, Yong Xia

Abstract: Glaucoma is one of the leading causes of irreversible blindness. Segmentation of optic disc (OD) and optic cup (OC) on fundus images is a crucial step in glaucoma screening. Although many deep learning models have been constructed for this task, it remains challenging to train an OD/OC segmentation model that could be deployed successfully to different healthcare centers. The difficulty mainly comes from the domain shift issue, i.e., the fundus images collected at these centers usually vary greatly in tone, contrast, and brightness. To address this issue, in this paper, we propose a novel unsupervised domain adaptation (UDA) method called Reconstruction-driven Dynamic Refinement Network (RDR-Net), where we employ a dual-path segmentation backbone for simultaneous edge detection and region prediction and design three modules to alleviate the domain gap. The reconstruction alignment (RA) module uses a variational auto-encoder (VAE) to reconstruct the input image and thus boosts the image representation ability of the network in a self-supervised way. It also uses a style-consistency constraint to force the network to retain more domain-invariant information. The low-level feature refinement (LFR) module employs input-specific dynamic convolutions to suppress the domain-variant information in the obtained low-level features. The prediction-map alignment (PMA) module elaborates the entropy-driven adversarial learning to encourage the network to generate source-like boundaries and regions. We evaluated our RDR-Net against state-of-the-art solutions on four public fundus image datasets. Our results indicate that RDR-Net is superior to competing models in both segmentation performance and generalization ability.

4.Accelerated deep self-supervised ptycho-laminography for three-dimensional nanoscale imaging of integrated circuits

Authors:Iksung Kang, Yi Jiang, Mirko Holler, Manuel Guizar-Sicairos, A. F. J. Levi, Jeffrey Klug, Stefan Vogt, George Barbastathis

Abstract: Three-dimensional inspection of nanostructures such as integrated circuits is important for security and reliability assurance. Two scanning operations are required: ptychographic scanning to recover the complex transmissivity of the specimen, and rotation of the specimen to acquire multiple projections covering the 3D spatial frequency domain. Two types of rotational scanning are possible: tomographic and laminographic. For flat, extended samples, for which full 180-degree coverage is not possible, the latter is preferable because it provides better coverage of the 3D spatial frequency domain than limited-angle tomography, and because the amount of attenuation through the sample is approximately the same for all projections. However, both techniques are time consuming because of extensive acquisition and computation time. Here, we demonstrate the acceleration of ptycho-laminographic reconstruction of integrated circuits with 16-times fewer angular samples and 4.67-times faster computation by using a physics-regularized deep self-supervised learning architecture. We check the fidelity of our reconstruction against a densely sampled reconstruction that uses full scanning and no learning. As already reported elsewhere [Zhou and Horstmeyer, Opt. Express, 28(9), pp. 12872-12896], we observe improvement of reconstruction quality even over the densely sampled reconstruction, due to the ability of the self-supervised learning kernel to fill the missing cone.

5.Localise to segment: crop to improve organ at risk segmentation accuracy

Authors:Abraham George Smith, Denis Kutnár, Ivan Richter Vogelius, Sune Darkner, Jens Petersen

Abstract: Increased organ-at-risk segmentation accuracy is required to reduce cost and complications for patients receiving radiotherapy treatment. Some deep learning methods for the segmentation of organs at risk use a two-stage process in which a localisation network first crops an image to the relevant region and a locally specialised network then segments the cropped organ of interest. We investigate the accuracy improvements brought about by such a localisation stage by comparing against a single-stage baseline network trained on full-resolution images. We find that localisation approaches can improve both training time and stability, and that a two-stage process involving both a localisation and an organ segmentation network provides a significant increase in segmentation accuracy for the spleen, pancreas and heart from the Medical Segmentation Decathlon dataset. We also observe increased benefits of localisation for smaller organs. Source code that recreates the main results is available at this https URL.
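
The cropping step between the two stages can be sketched as follows: take the coarse mask produced by a localisation stage, compute its bounding box with an optional safety margin, and crop the image to that box before passing it to the second network. The helper names `bounding_box` and `crop`, the `margin` parameter, and the 2D nested-list layout are assumptions for illustration; the authors' released code may organise this differently.

```python
def bounding_box(mask, margin=0):
    """Return (row_start, row_end, col_start, col_end) around non-zero mask cells.

    End indices are exclusive. Returns None when the mask is empty.
    The box is grown by `margin` cells and clipped to the mask extent.
    """
    rows = [r for r, row in enumerate(mask) if any(row)]
    cols = [c for c in range(len(mask[0])) if any(row[c] for row in mask)]
    if not rows:
        return None
    row_start = max(rows[0] - margin, 0)
    row_end = min(rows[-1] + margin + 1, len(mask))
    col_start = max(cols[0] - margin, 0)
    col_end = min(cols[-1] + margin + 1, len(mask[0]))
    return row_start, row_end, col_start, col_end


def crop(image, box):
    """Crop a 2D nested-list image to the given bounding box."""
    row_start, row_end, col_start, col_end = box
    return [row[col_start:col_end] for row in image[row_start:row_end]]


# Toy 4x5 example: the coarse mask marks two cells of the "organ".
coarse_mask = [
    [0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 0, 1, 0],
    [0, 0, 0, 0, 0],
]
image = [[r * 10 + c for c in range(5)] for r in range(4)]
cropped = crop(image, bounding_box(coarse_mask))
```

A margin larger than zero is a simple way to tolerate small localisation errors, at the cost of a larger input to the second stage.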

6.Brain Extraction comparing Segment Anything Model (SAM) and FSL Brain Extraction Tool

Authors:Sovesh Mohapatra, Advait Gosai, Gottfried Schlaug

Abstract: Brain extraction is a critical preprocessing step in almost every neuroimaging study, enabling accurate segmentation and analysis of Magnetic Resonance Imaging (MRI) data. FSL's Brain Extraction Tool (BET), although considered the current gold standard, presents limitations such as over-extraction, which can be particularly problematic in brains with lesions affecting the outer regions, inaccurate differentiation between brain tissue and surrounding meninges, and susceptibility to image quality issues. Recent advances in computer vision research have led to the development of the Segment Anything Model (SAM) by Meta AI, which has demonstrated remarkable potential across a wide range of applications. In this paper, we present a comparative analysis of brain extraction techniques using BET and SAM on a variety of brain scans with varying image qualities, MRI sequences, and brain lesions affecting different brain regions. We find that SAM outperforms BET based on several metrics, particularly in cases where image quality is compromised by signal inhomogeneities, non-isotropic voxel resolutions, or the presence of brain lesions that are located near or involve the outer regions of the brain and the meninges. These results suggest that SAM has the potential to emerge as a more accurate and precise tool for a broad range of brain extraction applications.

7.SAM.MD: Zero-shot medical image segmentation capabilities of the Segment Anything Model

Authors:Saikat Roy, Tassilo Wald, Gregor Koehler, Maximilian R. Rokuss, Nico Disch, Julius Holzschuh, David Zimmerer, Klaus H. Maier-Hein

Abstract: Foundation models have taken over natural language processing and image generation domains due to the flexibility of prompting. With the recent introduction of the Segment Anything Model (SAM), this prompt-driven paradigm has entered image segmentation with a hitherto unexplored abundance of capabilities. The purpose of this paper is to conduct an initial evaluation of the out-of-the-box zero-shot capabilities of SAM for medical image segmentation, by evaluating its performance on an abdominal CT organ segmentation task, via point or bounding box based prompting. We show that SAM generalizes well to CT data, making it a potential catalyst for the advancement of semi-automatic segmentation tools for clinicians. We believe that this foundation model, while not reaching state-of-the-art segmentation performance in our investigations, can serve as a highly potent starting point for further adaptations of such models to the intricacies of the medical domain. Keywords: medical image segmentation, SAM, foundation models, zero-shot learning

8.LCDctCNN: Lung Cancer Diagnosis of CT scan Images Using CNN Based Model

Authors:Muntasir Mamun, Md Ishtyaq Mahmud, Mahabuba Meherin, Ahmed Abdelgawad

Abstract: The most deadly and life-threatening disease in the world is lung cancer. Early diagnosis and accurate treatment are necessary for lowering the lung cancer mortality rate. A computerized tomography (CT) scan is one of the most effective imaging techniques for lung cancer detection using deep learning models. In this article, we propose a deep learning-based Convolutional Neural Network (CNN) framework for the early detection of lung cancer using CT scan images. We also analyzed other models, for instance Inception V3, Xception, and ResNet-50, to compare with our proposed model. We compared the models with each other on the metrics of accuracy, Area Under Curve (AUC), recall, and loss. After evaluating the models' performance, we observed that our CNN outperformed the other models and is promising compared to traditional methods. It achieved an accuracy of 92%, AUC of 98.21%, recall of 91.72%, and loss of 0.328.
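
For reference, two of the metrics named above can be computed from predicted and true labels as in the sketch below. The function name `accuracy_and_recall`, the binary label encoding, and the toy labels are assumptions for illustration; they are not the authors' evaluation code or data.

```python
def accuracy_and_recall(y_true, y_pred, positive=1):
    """Compute accuracy and recall for the `positive` class.

    accuracy = correct predictions / all predictions
    recall   = true positives / (true positives + false negatives)
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    pairs = list(zip(y_true, y_pred))
    correct = sum(1 for t, p in pairs if t == p)
    true_pos = sum(1 for t, p in pairs if t == positive and p == positive)
    false_neg = sum(1 for t, p in pairs if t == positive and p != positive)
    accuracy = correct / len(pairs)
    # Recall is undefined without positive samples; report 0.0 in that case.
    actual_pos = true_pos + false_neg
    recall = true_pos / actual_pos if actual_pos else 0.0
    return accuracy, recall


# Toy labels: 1 = cancerous, 0 = non-cancerous (made-up data).
acc, rec = accuracy_and_recall([1, 1, 1, 0, 0], [1, 1, 0, 0, 1])
```

In a screening setting recall is often the more important of the two, because a false negative corresponds to a missed cancer case.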