Self-supervised Transformer

A self-supervised Transformer is a Transformer network pretrained on supervision signals constructed automatically from unlabeled data, such as masked token prediction, next-token prediction, or next-sentence prediction, rather than on human-annotated labels. The resulting representations are then fine-tuned or probed on downstream tasks. BERT (masked token prediction plus next-sentence prediction), GPT (next-token prediction), and Vision Transformers pretrained with masked image modeling are among the most widely known instantiations of this paradigm.
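To make "automatically constructed supervision signals" concrete, the sketch below builds training targets from raw tokens alone, with no human labels. It is a simplified illustration, not the exact procedure of any published model: the names `mask_tokens`, `next_token_pairs`, `MASK`, and `IGNORE` are hypothetical, the 15% masking rate follows the rate reported for BERT, and BERT's additional step of sometimes substituting a random token or keeping the original is omitted.

```python
import random

MASK = "[MASK]"   # placeholder token shown to the model (name assumed here)
IGNORE = None     # hypothetical marker: position is excluded from the loss


def mask_tokens(tokens, mask_prob=0.15, seed=0):
    """Build (inputs, targets) for masked token prediction.

    Each token is hidden with probability mask_prob. The hidden token
    becomes the training target; unmasked positions get IGNORE.
    """
    rng = random.Random(seed)  # local RNG so results are reproducible
    inputs, targets = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            inputs.append(MASK)
            targets.append(tok)      # label comes from the data itself
        else:
            inputs.append(tok)
            targets.append(IGNORE)
    return inputs, targets


def next_token_pairs(tokens):
    """Build (context, target) pairs for next-token prediction."""
    return [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]


if __name__ == "__main__":
    sentence = "the model learns from unlabeled text".split()
    masked_inputs, masked_targets = mask_tokens(sentence, mask_prob=0.3)
    print(masked_inputs)
    print(masked_targets)
    for context, target in next_token_pairs(sentence):
        print(context, "->", target)
```

In both functions the target is derived from the input sequence itself, which is what distinguishes self-supervised pretraining from training on annotated labels. A real pipeline would operate on subword token IDs and feed these pairs to a Transformer with a cross-entropy loss.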

Sources

  1. Devlin, J., Chang, M.-W., Lee, K., & Toutanova, K. (2019). BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. Proceedings of NAACL-HLT 2019, 4171–4186. DOI: 10.18653/v1/N19-1423
  2. Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, L., & Polosukhin, I. (2017). Attention Is All You Need. Advances in Neural Information Processing Systems, 30.

ScholarGate. Self-supervised Transformer (Pretraining with Self-generated Supervision). Retrieved 2026-06-04 from https://scholargate.app/en/deep-learning/self-supervised-transformer