From Deep to Long Learning?
The article discusses the development of sequence models that can handle longer sequences with more context. While traditional attention-based Transformers scale quadratically with sequence length, the authors have developed models based on structured state space models (SSMs) that scale nearly li..









