Process / pipeline

Language Identification (LID)

Language identification (LID) is a natural-language-processing task that automatically detects the language in which a piece of text is written. It is commonly performed with off-the-shelf tools such as langid.py (Lui & Baldwin, 2012) or with efficient text classifiers of the kind described by Joulin et al. (2017), and it is widely used to preprocess and filter multilingual datasets.
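To make the filtering use case concrete, the following is a minimal sketch of a character-trigram language identifier with add-one smoothing, written with the Python standard library only. It is not the method of langid.py or of Joulin et al. (2017); the training sentences, the language codes, and the function names (`char_ngrams`, `train`, `identify`, `filter_by_language`) are all assumptions made up for illustration, and a real system would be trained on far more data.

```python
import math
from collections import Counter


def char_ngrams(text, n=3):
    """Lowercase, collapse whitespace, and return overlapping character n-grams."""
    text = " ".join(text.lower().split())
    return [text[i:i + n] for i in range(len(text) - n + 1)]


# Toy training data, invented for this sketch (one short sentence per language).
TRAIN = {
    "en": "the quick brown fox jumps over the lazy dog and this is a simple "
          "sentence written in the english language",
    "de": "der schnelle braune fuchs springt über den faulen hund und dies ist "
          "ein einfacher satz in der deutschen sprache",
    "fr": "le renard brun rapide saute par dessus le chien paresseux et ceci est "
          "une phrase simple écrite dans la langue française",
}


def train(samples, n=3):
    """Count character n-grams per language; return (counts, vocabulary size)."""
    counts = {lang: Counter(char_ngrams(text, n)) for lang, text in samples.items()}
    vocab = set().union(*counts.values())
    return counts, len(vocab)


def identify(text, model, n=3):
    """Return the language whose smoothed n-gram log-likelihood is highest."""
    counts, vocab_size = model
    scores = {}
    for lang, grams in counts.items():
        total = sum(grams.values())
        # Add-one smoothing so unseen n-grams do not zero out the score.
        scores[lang] = sum(
            math.log((grams[g] + 1) / (total + vocab_size))
            for g in char_ngrams(text, n)
        )
    return max(scores, key=scores.get)


def filter_by_language(docs, model, keep):
    """Keep only the documents identified as the language code `keep`."""
    return [doc for doc in docs if identify(doc, model) == keep]


if __name__ == "__main__":
    model = train(TRAIN)
    corpus = [
        "this is the english language",
        "dies ist ein satz in der deutschen sprache",
        "ceci est une phrase simple dans la langue française",
    ]
    print(filter_by_language(corpus, model, keep="en"))
```

Character n-grams are used here because they need no tokenizer and still work on short strings. With only one training sentence per language, the sketch is reliable only on text that closely resembles its training data.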

Sources

  1. Lui, M. & Baldwin, T. (2012). langid.py: An Off-the-shelf Language Identification Tool. Proceedings of the ACL 2012 System Demonstrations.
  2. Joulin, A., Grave, E., Bojanowski, P. & Mikolov, T. (2017). Bag of Tricks for Efficient Text Classification. Proceedings of EACL 2017.

ScholarGate. Language Identification (LID). Retrieved 2026-06-04 from https://scholargate.app/en/text-mining/language-identification