Science Cast

Attention Is Not All You Need Anymore

Zhe Chen, February 26, 2024 11:36am


This paper is a preprint and has not been certified by peer review.



Authors

Zhe Chen

Abstract

In recent years, the popular Transformer architecture has achieved great success in many application areas, including natural language processing and computer vision. Many existing works aim to reduce the computational and memory complexity of the self-attention mechanism in the Transformer by trading off performance. However, performance is key for the continuing success of the Transformer. In this paper, a drop-in replacement for the self-attention mechanism in the Transformer, called the Extractor, is proposed. Experimental results show that replacing the self-attention mechanism with the Extractor improves the performance of the Transformer. Furthermore, the proposed Extractor has the potential to run faster than the self-attention since it has a much shorter critical path of computation. Additionally, the sequence prediction problem in the context of text generation is formulated using variable-length discrete-time Markov chains, and the Transformer is reviewed based on our understanding.
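
As a rough illustration of the last point in the abstract, the sketch below treats next-token prediction as a Markov chain whose state is a variable-length suffix of the text generated so far. It is a count-based toy, not the paper's formulation and not the Extractor; the names `build_model`, `next_token_distribution`, and `max_order` are hypothetical and chosen only for this example.

```python
from collections import Counter, defaultdict


def build_model(tokens, max_order=3):
    """Count next-token frequencies for every context of length 0..max_order.

    Assumption for illustration only: contexts are capped at max_order tokens.
    """
    counts = defaultdict(Counter)
    for i, nxt in enumerate(tokens):
        for k in range(max_order + 1):
            if i - k < 0:
                break
            context = tuple(tokens[i - k:i])  # the k tokens preceding position i
            counts[context][nxt] += 1
    return counts


def next_token_distribution(counts, prefix, max_order=3):
    """Estimate P(next token | prefix) from the longest suffix seen in training."""
    for k in range(min(max_order, len(prefix)), -1, -1):
        context = tuple(prefix[len(prefix) - k:])  # k == 0 gives the empty context
        if context in counts:
            total = sum(counts[context].values())
            return {tok: c / total for tok, c in counts[context].items()}
    return {}


corpus = "a b a c a b".split()
model = build_model(corpus)
print(next_token_distribution(model, ["c", "a"]))
```

The state here grows with the prefix up to `max_order` and backs off to shorter contexts when a longer one was never observed. A Transformer conditions on the whole prefix with learned parameters rather than counts, so this sketch only conveys the shape of the prediction problem.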
