Language Identification (LID)
Language identification is a natural-language-processing task that automatically determines which language a piece of text is written in. It is commonly performed with off-the-shelf tools such as langid.py (Lui & Baldwin, 2012) or the efficient fastText classifiers of Joulin et al. (2017), and it is widely used to preprocess and filter multilingual datasets.
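To make the idea concrete, here is a minimal sketch, assuming a toy multinomial naive Bayes classifier over character n-grams. This is loosely in the spirit of langid.py, which also uses a naive Bayes model over n-gram features, but it is not the actual model or API of langid.py or fastText. The training sentences, the `NaiveBayesLID` class, and the `filter_by_language` helper are hypothetical illustrations.

```python
import math
from collections import Counter


def char_ngrams(text: str, n_max: int = 3) -> list[str]:
    """Return all character n-grams of length 1..n_max, padding the text with spaces."""
    padded = f" {text.lower()} "
    return [padded[i:i + n]
            for n in range(1, n_max + 1)
            for i in range(len(padded) - n + 1)]


class NaiveBayesLID:
    """Hypothetical toy language identifier: multinomial naive Bayes over char n-grams."""

    def __init__(self, n_max: int = 3):
        self.n_max = n_max
        self.counts: dict[str, Counter] = {}  # per-language n-gram counts
        self.totals: dict[str, int] = {}      # per-language total n-gram count
        self.vocab: set[str] = set()          # n-grams seen in any language

    def fit(self, samples: dict[str, list[str]]) -> "NaiveBayesLID":
        for lang, texts in samples.items():
            c = Counter()
            for t in texts:
                c.update(char_ngrams(t, self.n_max))
            self.counts[lang] = c
            self.totals[lang] = sum(c.values())
            self.vocab.update(c)
        return self

    def scores(self, text: str) -> dict[str, float]:
        """Log-likelihood of text per language (uniform priors, add-one smoothing)."""
        v = len(self.vocab) + 1  # +1 reserves probability mass for unseen n-grams
        grams = char_ngrams(text, self.n_max)
        return {
            lang: sum(math.log((self.counts[lang][g] + 1) / (self.totals[lang] + v))
                      for g in grams)
            for lang in self.counts
        }

    def classify(self, text: str) -> str:
        s = self.scores(text)
        return max(s, key=s.get)


def filter_by_language(docs: list[str], model: NaiveBayesLID, keep: str) -> list[str]:
    """Keep only documents whose predicted language equals `keep` (dataset filtering)."""
    return [d for d in docs if model.classify(d) == keep]


# Hypothetical, deliberately tiny training data: one sentence per language.
TRAIN = {
    "en": ["the quick brown fox jumps over the lazy dog and the cat is on the mat"],
    "fr": ["le renard brun rapide saute par dessus le chien paresseux et le chat est sur le tapis"],
    "de": ["der schnelle braune fuchs springt über den faulen hund und die katze ist auf der matte"],
}

model = NaiveBayesLID().fit(TRAIN)
docs = ["the cat is on the mat", "le chien est sur le tapis", "die katze ist auf der matte"]

print(model.classify("le chat est sur le tapis"))
print(filter_by_language(docs, model, keep="en"))
```

Real tools are trained on far larger corpora and cover many more languages; with a single short sentence per language, this toy model is only reliable on text that closely resembles its training data.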