langid.py: An off-the-shelf language identification tool.
We present langid.py, an off-the-shelf language identification tool. We discuss the design and implementation of langid.py and provide an empirical comparison on 5 long-document datasets and 2 datasets from the microblog domain. We find that langid.py maintains consistently high accuracy across all domains, making it ideal for end-users who require language identification but do not want to invest in preparing in-domain training data.
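To make the task concrete, the sketch below shows one common way to build a language identifier: a multinomial naive Bayes classifier over byte n-grams with add-alpha smoothing. This is an illustrative assumption, not langid.py's source code or API. The class `NaiveBayesLangID`, the helper `byte_ngrams`, the parameters `n_max` and `alpha`, and the toy training sentences are all hypothetical. The sketch also omits the feature selection and large multi-domain training data that an off-the-shelf tool would rely on.

```python
import math
from collections import Counter, defaultdict


def byte_ngrams(text, n_max=3):
    """Return all byte n-grams of length 1..n_max from the UTF-8 encoding of text."""
    data = text.encode("utf-8")
    return [data[i:i + n] for n in range(1, n_max + 1) for i in range(len(data) - n + 1)]


class NaiveBayesLangID:
    """Hypothetical multinomial naive Bayes language identifier (illustration only)."""

    def __init__(self, n_max=3, alpha=1.0):
        self.n_max = n_max                  # longest byte n-gram used as a feature
        self.alpha = alpha                  # add-alpha (Laplace) smoothing constant
        self.counts = defaultdict(Counter)  # per-language n-gram counts
        self.totals = Counter()             # per-language total n-gram count
        self.docs = Counter()               # per-language document count (for the prior)
        self.vocab = set()                  # all n-grams seen in training

    def train(self, samples):
        """samples: iterable of (text, language_label) pairs."""
        for text, lang in samples:
            feats = byte_ngrams(text, self.n_max)
            self.counts[lang].update(feats)
            self.totals[lang] += len(feats)
            self.docs[lang] += 1
            self.vocab.update(feats)

    def classify(self, text):
        """Return (best_language, log_score) under the smoothed naive Bayes model."""
        feats = Counter(byte_ngrams(text, self.n_max))
        n_docs = sum(self.docs.values())
        v = len(self.vocab)
        best_lang, best_score = None, float("-inf")
        for lang in self.counts:
            # Log prior plus the sum of smoothed log likelihoods of each n-gram occurrence.
            score = math.log(self.docs[lang] / n_docs)
            denom = self.totals[lang] + self.alpha * v
            for f, c in feats.items():
                score += c * math.log((self.counts[lang][f] + self.alpha) / denom)
            if score > best_score:
                best_lang, best_score = lang, score
        return best_lang, best_score


model = NaiveBayesLangID()
model.train([
    ("the cat sat on the mat and the dog", "en"),
    ("where is the house of the king", "en"),
    ("der hund und die katze sind im haus", "de"),
    ("wo ist das haus des königs", "de"),
])
print(model.classify("the dog and the cat")[0])
print(model.classify("die katze ist im haus")[0])
```

Working on raw bytes rather than characters keeps the feature extractor independent of script and tokenization, so the same code applies to any input encoded as UTF-8. The toy training set is far too small for real use. Accuracy across domains depends on the amount and variety of training text.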
References in zbMATH (referenced in 2 articles)
- Lim, Kar Wai; Buntine, Wray: Bibliographic analysis on research publications using authors, categorical labels and the citation network (2016)
- Lim, Kar Wai; Buntine, Wray; Chen, Changyou; Du, Lan: Nonparametric Bayesian topic modelling with the hierarchical Pitman-Yor processes (2016)