Distilling Efficient Vision Transformers from CNNs for Semantic Segmentation

This paper is a preprint and has not been certified by peer review.


Authors

Xu Zheng, Yunhao Luo, Pengyuan Zhou, Lin Wang

Abstract

In this paper, we tackle a new problem: how to transfer knowledge from a pre-trained, cumbersome yet well-performing CNN-based model to learn a compact Vision Transformer (ViT)-based model while maintaining its learning capacity? Due to the completely different characteristics of ViTs and CNNs and the long-standing capacity gap between teacher and student models in Knowledge Distillation (KD), directly transferring cross-model knowledge is non-trivial. To this end, we leverage the visual- and linguistic-compatible feature characteristics of the ViT (i.e., student) and its capacity gap with the CNN (i.e., teacher), and propose a novel CNN-to-ViT KD framework, dubbed C2VKD. Importantly, as the teacher's features are heterogeneous to those of the student, we first propose a visual-linguistic feature distillation (VLFD) module that explores efficient KD among the aligned visual and linguistic-compatible representations. Moreover, due to the large capacity gap between teacher and student and the teacher's inevitable prediction errors, we then propose a pixel-wise decoupled distillation (PDD) module that supervises the student with a combination of ground-truth labels and the teacher's predictions, decoupled into target and non-target classes. Experiments on three semantic segmentation benchmark datasets consistently show that the mIoU improvement achieved by our method is more than 200% of that of state-of-the-art (SoTA) KD methods.
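The abstract only summarizes the PDD idea at a high level. As a rough illustration, below is a minimal PyTorch-style sketch of what a pixel-wise decoupled distillation term could look like, assuming a DKD-style split of each pixel's prediction into a target-class part and a non-target-class part. The function name `pixelwise_decoupled_kd`, its arguments, and the weights `alpha`, `beta`, and `temperature` are illustrative assumptions, not the paper's actual formulation.

```python
import torch
import torch.nn.functional as F


def pixelwise_decoupled_kd(student_logits, teacher_logits, labels,
                           temperature=1.0, alpha=1.0, beta=1.0,
                           ignore_index=255):
    """Hypothetical pixel-wise decoupled KD loss (not the paper's exact PDD).

    student_logits, teacher_logits: (B, C, H, W) segmentation logits.
    labels: (B, H, W) ground-truth class indices; ignore_index pixels are skipped.
    """
    _, C, _, _ = student_logits.shape

    # Flatten to per-pixel logits (N, C) and labels (N,), dropping ignored pixels.
    s = student_logits.permute(0, 2, 3, 1).reshape(-1, C)
    t = teacher_logits.permute(0, 2, 3, 1).reshape(-1, C)
    y = labels.reshape(-1)
    valid = y != ignore_index
    s, t, y = s[valid], t[valid], y[valid]

    p_s = F.softmax(s / temperature, dim=1)
    p_t = F.softmax(t / temperature, dim=1)
    target_mask = F.one_hot(y, C).bool()  # (N, C), one True per pixel

    # Target-class distillation: binary distribution (ground-truth class vs. rest).
    pt_s = torch.stack([p_s[target_mask], 1.0 - p_s[target_mask]], dim=1)
    pt_t = torch.stack([p_t[target_mask], 1.0 - p_t[target_mask]], dim=1)
    tckd = F.kl_div(torch.log(pt_s.clamp_min(1e-8)), pt_t, reduction="batchmean")

    # Non-target-class distillation: re-normalized over the remaining classes.
    s_nt = s.masked_fill(target_mask, -1e9)
    t_nt = t.masked_fill(target_mask, -1e9)
    nckd = F.kl_div(F.log_softmax(s_nt / temperature, dim=1),
                    F.softmax(t_nt / temperature, dim=1),
                    reduction="batchmean")

    return (alpha * tckd + beta * nckd) * temperature ** 2
```

In line with the abstract's description of supervising the student with both labels and the teacher's predictions, such a term would typically be added to the standard per-pixel cross-entropy loss on the ground-truth labels; the exact weighting used in the paper may differ.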
