---
license: apache-2.0
tags:
- text-classification
- language-identification
library_name: fasttext
datasets:
- cis-lmu/GlotSparse
- cis-lmu/GlotStoryBook
metrics:
- f1
---


# GlotLID

## Description

GlotLID is a fastText language identification (LID) model covering around 2,000 languages.


### How to use

Here is how to use this model to detect the language of a given text:

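```python
>>> import fasttext
>>> from huggingface_hub import hf_hub_download

>>> # Download the model file from the Hugging Face Hub (cached locally after the first call)
>>> model_path = hf_hub_download(repo_id="cis-lmu/glotlid", filename="model.bin")
>>> model = fasttext.load_model(model_path)
>>> # Returns a (labels, probabilities) pair for the most likely language
>>> model.predict("Hello, world!")
```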
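The raw result of `predict` is a pair of parallel sequences (labels and probabilities), and fastText prefixes each label with `__label__` by default. The sketch below shows one way to turn that pair into plain `(language, probability)` tuples. The helper name `parse_predictions` is hypothetical, and the `eng_Latn`-style label format (language code plus script) is an assumption about this model's label set; the sample values are hand-written for illustration, not real model output.

```python
from typing import List, Sequence, Tuple

LABEL_PREFIX = "__label__"  # fastText's default label prefix


def parse_predictions(
    labels: Sequence[str], probs: Sequence[float]
) -> List[Tuple[str, float]]:
    """Pair each label with its probability, stripping the fastText prefix.

    Hypothetical helper, not part of fastText or GlotLID.
    """
    results = []
    for label, prob in zip(labels, probs):
        if label.startswith(LABEL_PREFIX):
            label = label[len(LABEL_PREFIX):]
        results.append((label, float(prob)))
    return results


# Illustrative input shaped like fastText output; the values are made up.
sample_labels = ("__label__eng_Latn", "__label__deu_Latn")
sample_probs = (0.9, 0.1)
print(parse_predictions(sample_labels, sample_probs))
```

With a loaded model, the same helper could be applied as `parse_predictions(*model.predict("Hello, world!", k=3))`, where `k` asks fastText for the top three candidates. Note that `predict` handles one line at a time, so newline characters should be removed from the input text first.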

## License

The model is distributed under the Apache License, Version 2.0.

## References

If you use this model, please cite the following paper:

```
@inproceedings{kargaran2023glotlid,
  title={{GlotLID}: Language Identification for Low-Resource Languages},
  author={Kargaran, Amir Hossein and Imani, Ayyoob and Yvon, Fran{\c{c}}ois and Sch{\"u}tze, Hinrich},
  booktitle={The 2023 Conference on Empirical Methods in Natural Language Processing},
  year={2023},
  url={https://openreview.net/forum?id=dl4e3EBz5j}
}
```