AttCatVgg-Net: An Explainable Multioutput Deep Learning Framework for Cataract Stage Classification and Visual Acuity Regression Using Multicolor Fundus Images
Nazarpour-Servak, M.; Taghinezhad, N.; Mahmoudi, T.; Azimi, A.; Nowroozzadeh, M. H.
Abstract

Purpose: To develop and evaluate an attention-guided deep learning model for automated cataract severity classification and visual acuity (VA) prediction using the multicolor imaging module of the Spectralis optical coherence tomography (OCT) platform.

Methods: We analyzed 314 multicolor fundus images from 169 patients. Images were preprocessed with an enhanced Retinex algorithm and segmented into three concentric macular zones: Zone 1 (fovea, central 1.5 mm diameter), Zone 2 (parafovea, 1.5-2.5 mm ring), and Zone 3 (perifovea, >2.5 mm radius). A multi-output convolutional neural network (AttCatVgg-Net), built on a VGG-16 backbone and enhanced with a Convolutional Block Attention Module (CBAM), was trained to simultaneously perform three-class cataract classification (normal to mild, moderate, severe) and VA regression. Model performance was assessed with accuracy, area under the ROC curve (AUC), F1-score, and regression metrics. Statistical analyses included the Wilcoxon signed-rank test and the Spearman correlation test.

Results: For cataract grading, the integrated model using all wavelengths and zones achieved 92.5% accuracy, an AUC of 0.947, and a 92.1% F1-score. The green channel alone achieved 90.1% accuracy and 0.93 AUC, whereas the red channel performed worse (76.3% accuracy, 0.83 AUC). Among anatomical zones, Zone 1 (fovea) and Zone 3 achieved 84.3% and 84.71% accuracy with AUCs of 0.88 and 0.89, respectively, whereas Zone 2 underperformed (60.41% accuracy, 0.71 AUC). For VA prediction, the full model achieved a mean absolute error (MAE) of 0.1181 and a coefficient of determination (R²) of 0.7759. The green channel showed the strongest correlation with measured VA (correlation coefficient = 0.823, p < 0.001), followed by the green-red (0.817) and blue (0.809) channels, and also achieved the lowest mean squared error (MSE = 0.0369) and root mean squared error (RMSE = 0.1920), outperforming the other channels.

Conclusions: Attention-guided deep learning applied to Spectralis multicolor imaging enables accurate, objective classification of cataract severity and estimation of cataract-related visual acuity loss.

Keywords: Cataract Classification, Visual Acuity Prediction, Multicolor Imaging, Deep Learning, Attention Mechanism, CBAM, Multioutput Deep Learning
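The abstract names an "enhanced Retinex" preprocessing step without specifying the variant. As a point of reference, a minimal sketch of the classic single-scale Retinex (log image minus log of a Gaussian-blurred illumination estimate) is shown below; the paper's enhanced variant presumably differs in its details.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def single_scale_retinex(img: np.ndarray, sigma: float = 30.0,
                         eps: float = 1e-6) -> np.ndarray:
    """Classic single-scale Retinex (illustrative; the paper's 'enhanced'
    variant is unspecified). Returns log(image) minus the log of a
    Gaussian-blurred illumination estimate, reducing uneven illumination."""
    img = img.astype(np.float64) + eps          # avoid log(0)
    illumination = gaussian_filter(img, sigma)  # smooth illumination estimate
    return np.log(img) - np.log(illumination + eps)

# Example: enhance a synthetic single-channel fundus-like image.
image = np.random.rand(128, 128)
enhanced = single_scale_retinex(image)
```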
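The three concentric macular zones described in the Methods can be generated as radial masks around the fovea. A minimal sketch follows; the `mm_per_px` scale and the radial boundaries `r1_mm`/`r2_mm` are hypothetical parameters, since the abstract mixes diameter- and radius-based zone definitions.

```python
import numpy as np

def macular_zone_masks(shape, mm_per_px, r1_mm=0.75, r2_mm=1.25, center=None):
    """Boolean masks for three concentric macular zones.

    r1_mm and r2_mm are assumed radial boundaries between Zone 1 (fovea),
    Zone 2 (parafovea), and Zone 3 (perifovea); defaults are illustrative.
    """
    h, w = shape
    cy, cx = center if center is not None else (h / 2.0, w / 2.0)
    yy, xx = np.ogrid[:h, :w]
    r = np.hypot(yy - cy, xx - cx) * mm_per_px  # radial distance in mm
    zone1 = r < r1_mm
    zone2 = (r >= r1_mm) & (r < r2_mm)
    zone3 = r >= r2_mm
    return zone1, zone2, zone3

# Example: a 100x100 image at an assumed 0.03 mm/pixel scale.
z1, z2, z3 = macular_zone_masks((100, 100), mm_per_px=0.03)
```

The three masks partition the image, so zone-specific inputs can be obtained by multiplying each mask with the fundus image before feeding the network.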
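The multi-output design described above (a shared CBAM-augmented convolutional backbone feeding a three-class grading head and a VA regression head) can be sketched as follows. This is a minimal PyTorch illustration, not the authors' implementation: the backbone here is a small stand-in rather than full VGG-16, and all layer sizes are assumptions.

```python
import torch
import torch.nn as nn

class CBAM(nn.Module):
    """Convolutional Block Attention Module: channel then spatial attention."""
    def __init__(self, channels: int, reduction: int = 16, kernel_size: int = 7):
        super().__init__()
        # Channel attention: shared MLP over global average- and max-pooled features.
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
        )
        # Spatial attention: convolution over stacked channel-wise avg/max maps.
        self.spatial = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, x):
        b, c, _, _ = x.shape
        avg = self.mlp(x.mean(dim=(2, 3)))
        mx = self.mlp(x.amax(dim=(2, 3)))
        x = x * torch.sigmoid(avg + mx).view(b, c, 1, 1)      # channel attention
        s = torch.cat([x.mean(dim=1, keepdim=True),
                       x.amax(dim=1, keepdim=True)], dim=1)
        return x * torch.sigmoid(self.spatial(s))             # spatial attention

class AttCatVggNet(nn.Module):
    """Sketch of the multi-output idea: shared features + CBAM, then a
    cataract-grade classification head and a VA regression head."""
    def __init__(self, num_classes: int = 3):
        super().__init__()
        # Small stand-in backbone (the paper uses VGG-16).
        self.features = nn.Sequential(
            nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(inplace=True),
            nn.MaxPool2d(2),
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(inplace=True),
            nn.MaxPool2d(2),
        )
        self.cbam = CBAM(128)
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.cls_head = nn.Linear(128, num_classes)  # normal-to-mild / moderate / severe
        self.reg_head = nn.Linear(128, 1)            # visual acuity estimate

    def forward(self, x):
        f = self.pool(self.cbam(self.features(x))).flatten(1)
        return self.cls_head(f), self.reg_head(f)
```

Training such a model would typically combine a cross-entropy loss on the classification head with an L1 or L2 loss on the regression head, weighted by a hyperparameter.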