Knowledge Distillation

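Knowledge distillation is a model-compression technique, introduced by Geoffrey Hinton and colleagues in 2015, that trains a small student model to reproduce the soft-label outputs (the full predicted class probabilities) of a large teacher model rather than only the hard ground-truth labels. Distilled models such as DistilBERT and TinyBERT retain roughly 97% of the larger model's performance while being substantially smaller and faster; DistilBERT, for example, is reported to be 40% smaller and 60% faster than BERT.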
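
To make the idea of training on soft labels concrete, here is a minimal sketch of one common form of the distillation loss for a single example, written in plain Python. It is an illustration under stated assumptions, not the implementation used by any particular model: the function names, the temperature value, and the weighting factor alpha are hypothetical choices made for this example. The soft term is the KL divergence between the temperature-softened teacher and student distributions, scaled by the squared temperature; the hard term is ordinary cross-entropy against the true label.

```python
import math


def softmax(logits, temperature=1.0):
    """Convert logits to probabilities; a higher temperature gives a flatter distribution."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, true_label,
                      temperature=2.0, alpha=0.5):
    """Weighted sum of a soft-label term and a hard-label term.

    temperature and alpha are illustrative defaults (assumptions), not
    values prescribed by the source.
    """
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)

    # Soft term: KL(teacher || student) on the softened distributions.
    soft = sum(pt * math.log(pt / ps)
               for pt, ps in zip(p_teacher, p_student) if pt > 0)

    # Hard term: cross-entropy with the ground-truth class at temperature 1.
    hard = -math.log(softmax(student_logits)[true_label])

    # Scaling the soft term by temperature squared keeps the two terms
    # on a comparable scale as the temperature changes.
    return alpha * temperature ** 2 * soft + (1 - alpha) * hard


# Hypothetical logits for a three-class problem.
teacher = [4.0, 1.0, 0.5]
student_close = [3.5, 1.2, 0.4]
student_far = [0.5, 1.0, 4.0]

print(distillation_loss(student_close, teacher, true_label=0))
print(distillation_loss(student_far, teacher, true_label=0))
```

A student whose logits resemble the teacher's should receive the lower loss. In practice this quantity would be averaged over a batch and minimized with a deep-learning framework's automatic differentiation; the pure-Python version here only shows the arithmetic.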

Sources

  1. Hinton, G., Vinyals, O. & Dean, J. (2015). Distilling the Knowledge in a Neural Network. NeurIPS Deep Learning Workshop. arXiv:1503.02531.
  2. Sanh, V., Debut, L., Chaumond, J. & Wolf, T. (2019). DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter. arXiv:1910.01108.

ScholarGate. Knowledge Distillation (Teacher–Student Model Compression). Retrieved 2026-06-04 from https://scholargate.app/en/deep-learning/knowledge-distillation