Semi-supervised Doc2Vec extends the Paragraph Vector framework of Le and Mikolov (2014) by training dense document embeddings jointly on labeled and unlabeled documents. The available class labels act as an auxiliary signal that steers the representation toward task-relevant structure, while the full unlabeled collection still contributes to generalization.
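The joint training idea can be illustrated with a small, standard-library-only sketch. It is not the reference formulation: it assumes a PV-DBOW-style objective (each document vector predicts the words of its document through a full softmax) plus a cross-entropy classifier term that is applied only to labeled documents and weighted by a hypothetical coefficient `lam`. The names `train_joint`, `predict`, `lam`, and the toy corpus are illustrative assumptions.

```python
import math
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def train_joint(docs, labels, dim=8, epochs=300, lr=0.05, lam=1.0, seed=0):
    """docs: list of token lists; labels: class index, or None if unlabeled.

    Assumed loss: word-prediction loss on every document, plus lam times
    a classification loss on the labeled documents only.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    n_classes = max(y for y in labels if y is not None) + 1

    def init(rows):
        return [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(rows)]

    D = init(len(docs))    # one vector per document
    W = init(len(vocab))   # output vectors for words
    C = init(n_classes)    # classifier weights, one row per class

    def step(d, grad_d, weights, target, scale):
        # Softmax cross-entropy step: updates `weights` in place and
        # accumulates the gradient with respect to the document vector.
        probs = softmax([dot(d, row) for row in weights])
        for j, row in enumerate(weights):
            err = scale * (probs[j] - (1.0 if j == target else 0.0))
            for k in range(dim):
                grad_d[k] += err * row[k]
                row[k] -= lr * err * d[k]

    for _ in range(epochs):
        for d, tokens, y in zip(D, docs, labels):
            grad_d = [0.0] * dim
            for w in tokens:              # unsupervised signal: all documents
                step(d, grad_d, W, word_id[w], 1.0)
            if y is not None:             # supervised signal: labeled only
                step(d, grad_d, C, y, lam)
            for k in range(dim):
                d[k] -= lr * grad_d[k]
    return D, C


def predict(doc_vec, C):
    scores = [dot(doc_vec, row) for row in C]
    return scores.index(max(scores))


# Toy corpus (made up for illustration): two labeled, four unlabeled documents.
docs = [
    ["cat", "dog", "pet"], ["dog", "pet", "fur"], ["cat", "fur", "pet"],
    ["stock", "bond", "market"], ["bond", "market", "trade"],
    ["stock", "trade", "market"],
]
labels = [0, None, None, 1, None, None]
D, C = train_joint(docs, labels)
predictions = [predict(d, C) for d in D]
```

Because unlabeled documents share word output vectors with the labeled ones, their embeddings are shaped by the same space the classifier is trained in, which is the intuition behind using the unlabeled collection. A practical implementation would typically replace the full softmax with negative sampling or hierarchical softmax and would infer vectors for unseen documents rather than reuse training vectors.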
Le, Q. V., & Mikolov, T. (2014). Distributed Representations of Sentences and Documents. Proceedings of the 31st International Conference on Machine Learning (ICML 2014), PMLR 32(2), 1188–1196.