Science Cast

Scaling Transformer to 1M tokens and beyond with RMT

paulclinton · February 26, 2024 11:35am


Voice is AI-generated

Connected to paper. This paper is a preprint and has not been certified by peer review.

Scaling Transformer to 1M tokens and beyond with RMT

arXiv · PDF · April 19, 2023 12:00am

Authors

Aydar Bulatov, Yuri Kuratov, Yermek Kapushev, Mikhail S. Burtsev

Abstract

A major limitation for the broader scope of problems solvable by transformers is the quadratic scaling of computational complexity with input size. In this study, we investigate the recurrent memory augmentation of pre-trained transformer models to extend input context length while linearly scaling compute. Our approach demonstrates the capability to store information in memory for sequences of up to an unprecedented two million tokens while maintaining high retrieval accuracy. Experiments with language modeling tasks show perplexity improvement as the number of processed input segments increases. These results underscore the effectiveness of our method, which has significant potential to enhance long-term dependency handling in natural language understanding and generation tasks, as well as enable large-scale context processing for memory-intensive applications.
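The contrast the abstract draws between quadratic and linear scaling can be made concrete with a rough count of attention token pairs. The sketch below is illustrative arithmetic only, not the authors' implementation: the function names, the segment length of 512, and the 10 memory tokens are assumptions chosen for the example. It also assumes each segment attends only to itself plus a fixed number of memory tokens, and that the last segment is padded to full length.

```python
def full_attention_pairs(n_tokens: int) -> int:
    """Token pairs scored by full self-attention over the whole input."""
    return n_tokens * n_tokens


def segmented_attention_pairs(n_tokens: int, segment_len: int, n_memory: int) -> int:
    """Token pairs scored when the input is processed one segment at a time.

    Assumption: each segment attends to its own tokens plus a fixed set of
    memory tokens, and the final segment is padded to full length.
    """
    n_segments = -(-n_tokens // segment_len)  # ceiling division
    window = segment_len + n_memory
    return n_segments * window * window


if __name__ == "__main__":
    SEGMENT_LEN = 512  # assumed segment length, for illustration only
    N_MEMORY = 10      # assumed number of memory tokens, for illustration only

    for n in (4_096, 65_536, 1_048_576):
        full = full_attention_pairs(n)
        seg = segmented_attention_pairs(n, SEGMENT_LEN, N_MEMORY)
        print(f"{n:>9} tokens: full={full:.3e}  segmented={seg:.3e}  ratio={full / seg:.1f}x")
```

Under these assumptions, doubling the input length doubles the segmented count but quadruples the full-attention count, which is the linear versus quadratic behavior the abstract refers to.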
