Semi-supervised Transformer
Semi-supervised learning with Transformer architectures combines a large pool of unlabeled data with a small labeled set to train sequence models. The dominant pattern, exemplified by BERT, first pre-trains the Transformer on unlabeled data with a self-supervised objective such as masked token prediction, then fine-tunes it on the labeled task. Because the pre-trained weights already encode general structure of the input, this two-stage approach needs far fewer labeled examples to reach strong performance than training on the labeled set alone.
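To make the masked token prediction objective concrete, here is a minimal sketch of how training pairs can be built from unlabeled text. The function name `mask_tokens`, the `[MASK]` symbol, and the example sentence are illustrative assumptions, not taken from any particular library. The sketch also simplifies the procedure: it always substitutes the mask symbol, whereas BERT masks about 15% of tokens and sometimes swaps in a random token or keeps the original.

```python
import random

MASK = "[MASK]"  # assumed mask symbol for this sketch


def mask_tokens(tokens, mask_prob=0.15, rng=None):
    """Build one masked-token-prediction example from unlabeled tokens.

    Returns (inputs, labels):
      inputs - copy of tokens with some positions replaced by MASK
      labels - the original token at masked positions, None elsewhere
               (a loss function would skip the None positions)
    """
    if rng is None:
        rng = random.Random()
    inputs, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            inputs.append(MASK)   # hide the token from the model
            labels.append(tok)    # the model must recover it
        else:
            inputs.append(tok)    # token stays visible
            labels.append(None)   # no prediction target here
    return inputs, labels


# Unlabeled text supplies its own targets: no human annotation is involved.
unlabeled = ["the", "model", "reads", "raw", "text", "without", "labels"]
inputs, labels = mask_tokens(unlabeled, mask_prob=0.3, rng=random.Random(0))
```

In the pre-training stage a Transformer would be trained to predict each non-`None` label from the masked input. The fine-tuning stage would then reuse those weights with a task-specific output layer and the small labeled set.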
Sources
- Devlin, J., Chang, M.-W., Lee, K., & Toutanova, K. (2019). BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. Proceedings of NAACL-HLT 2019, 4171–4186. DOI: 10.18653/v1/N19-1423
- Zoph, B., Ghiasi, G., Lin, T.-Y., Cui, Y., Liu, H., Cubuk, E. D., & Le, Q. V. (2020). Rethinking Pre-training and Self-training. Advances in Neural Information Processing Systems (NeurIPS), 33, 3833–3845.
Related methods
Referenced by
- Semi-supervised BERT-based Classification
- Semi-supervised GRU
- Semi-supervised LDA Topic Model
- Semi-supervised NMF Topic Model
- Semi-supervised Question Answering
- Semi-supervised Reinforcement Learning
- Semi-supervised RoBERTa-based Classification
- Semi-supervised Sentence Embeddings
- Semi-supervised Variational Autoencoder
- Weakly supervised transformer