Longformer / BigBird

Long-sequence Transformers such as Longformer (Beltagy, Peters & Cohan, 2020) and BigBird (Zaheer et al., 2020) replace the standard Transformer's full self-attention, whose time and memory cost grows as O(n²) in the sequence length n, with sparse attention patterns that scale linearly, O(n). This lets a single model attend over thousands of tokens (full documents, legal texts, or genomic sequences) that would be too costly for a conventional Transformer to process in one pass.
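
To make the scaling claim concrete, the following is a minimal sketch that counts how many (query, key) pairs a sparse pattern allows compared with full attention. It assumes a simplified pattern in the spirit of Longformer: a sliding local window plus a few global tokens. The function names and the parameters `window` and `num_global` are hypothetical, chosen for illustration; BigBird additionally uses random attention, which this sketch omits.

```python
def sparse_attention_pairs(n, window, num_global=0):
    """Return the set of (query, key) index pairs allowed by a simplified
    sliding-window pattern plus global tokens (illustrative only)."""
    half = window // 2
    pairs = set()
    for i in range(n):
        # Local band: each token attends to neighbours within `half` positions.
        for j in range(max(0, i - half), min(n, i + half + 1)):
            pairs.add((i, j))
    for g in range(min(num_global, n)):
        # Assumption: the first `num_global` tokens are global. They attend
        # to every position, and every position attends to them.
        for j in range(n):
            pairs.add((g, j))
            pairs.add((j, g))
    return pairs


def full_attention_pairs(n):
    """Full self-attention scores every query against every key."""
    return n * n


if __name__ == "__main__":
    for n in (64, 128, 256):
        sparse = len(sparse_attention_pairs(n, window=8, num_global=2))
        print(f"n={n}: sparse pairs={sparse}, full pairs={full_attention_pairs(n)}")
```

With the window size and the number of global tokens held fixed, doubling n roughly doubles the sparse pair count, while the full-attention count quadruples.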


Sources

  1. Beltagy, I., Peters, M. E. & Cohan, A. (2020). Longformer: The Long-Document Transformer. arXiv.
  2. Zaheer, M. et al. (2020). Big Bird: Transformers for Longer Sequences. NeurIPS.

ScholarGate. Longformer / BigBird (Long-Sequence Transformers with Sparse Attention). Retrieved 2026-06-04 from https://scholargate.app/tr/deep-learning/longformer-bigbird