Language Identification (LID)

Language identification (LID) is the natural-language-processing task of automatically determining which language a piece of text is written in. Off-the-shelf tools such as langid.py (Lui & Baldwin, 2012) and efficient text classifiers such as fastText (Joulin et al., 2017) have made it a routine step in preprocessing and filtering multilingual datasets.
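To make the idea concrete, here is a minimal sketch of one common approach in the spirit of langid.py. langid.py itself uses a multinomial naive Bayes model over a selected set of byte n-gram features. The sketch below is simpler: a toy naive Bayes classifier over character 1- to 3-grams with add-one smoothing, plus a filter that keeps only texts confidently assigned to a target language. The class `NaiveBayesLID`, the function `filter_by_language`, the `min_prob` threshold and the training sentences are hypothetical illustrations. They are not the langid.py or fastText API, and real systems are trained on far larger corpora.

```python
import math
from collections import Counter, defaultdict


def char_ngrams(text, n_values=(1, 2, 3)):
    """Return character n-grams of the lowercased text, padded with spaces."""
    padded = f" {text.lower()} "
    grams = []
    for n in n_values:
        for i in range(len(padded) - n + 1):
            grams.append(padded[i:i + n])
    return grams


class NaiveBayesLID:
    """Hypothetical toy multinomial naive Bayes LID over character n-grams."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha                  # add-alpha (Laplace) smoothing
        self.counts = defaultdict(Counter)  # language -> n-gram counts
        self.totals = Counter()             # language -> total n-gram count
        self.docs = Counter()               # language -> number of training texts
        self.vocab = set()                  # all n-grams seen in training

    def fit(self, samples):
        for text, lang in samples:
            grams = char_ngrams(text)
            self.counts[lang].update(grams)
            self.totals[lang] += len(grams)
            self.docs[lang] += 1
            self.vocab.update(grams)
        return self

    def log_scores(self, text):
        """Log prior plus smoothed log likelihood of the text for each language."""
        n_docs = sum(self.docs.values())
        v = len(self.vocab)
        grams = char_ngrams(text)
        scores = {}
        for lang in self.counts:
            score = math.log(self.docs[lang] / n_docs)
            denom = self.totals[lang] + self.alpha * v
            for g in grams:
                score += math.log((self.counts[lang][g] + self.alpha) / denom)
            scores[lang] = score
        return scores

    def predict(self, text):
        """Return (best language, its posterior probability under the model)."""
        scores = self.log_scores(text)
        best = max(scores, key=scores.get)
        # Normalise with log-sum-exp: posterior(best) = 1 / sum(exp(s - s_best)).
        m = scores[best]
        z = sum(math.exp(s - m) for s in scores.values())
        return best, 1.0 / z


def filter_by_language(texts, model, target, min_prob=0.9):
    """Keep texts predicted as `target` with posterior >= min_prob (hypothetical helper)."""
    kept = []
    for t in texts:
        lang, prob = model.predict(t)
        if lang == target and prob >= min_prob:
            kept.append(t)
    return kept


# Tiny, illustrative training set (not real training data).
TRAIN = [
    ("the quick brown fox jumps over the lazy dog", "en"),
    ("this is a short sentence written in english", "en"),
    ("where is the train station and the hotel", "en"),
    ("der schnelle braune fuchs springt über den faulen hund", "de"),
    ("das ist ein kurzer satz auf deutsch geschrieben", "de"),
    ("wo ist der bahnhof und das hotel", "de"),
    ("le renard brun rapide saute par-dessus le chien paresseux", "fr"),
    ("ceci est une courte phrase écrite en français", "fr"),
    ("où est la gare et l'hôtel", "fr"),
]

if __name__ == "__main__":
    model = NaiveBayesLID().fit(TRAIN)
    for sentence in ["the dog is in the station", "der hund ist im bahnhof"]:
        print(sentence, model.predict(sentence))
    corpus = ["the dog is in the station", "le chien est dans la gare"]
    print(filter_by_language(corpus, model, "en"))
```

Naive Bayes posteriors tend to be overconfident, especially on long inputs, so a threshold such as `min_prob` is a coarse knob. In practice it is usually tuned on held-out data or combined with a minimum text length.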

Sources

  1. Lui, M. & Baldwin, T. (2012). langid.py: An Off-the-shelf Language Identification Tool. Proceedings of the ACL 2012 System Demonstrations.
  2. Joulin, A., Grave, E., Bojanowski, P. & Mikolov, T. (2017). Bag of Tricks for Efficient Text Classification. Proceedings of EACL 2017.

ScholarGate. Language Identification (LID). Retrieved 2026-06-04 from https://scholargate.app/tr/text-mining/language-identification