---
license: apache-2.0
tags:
- text-classification
- language-identification
library_name: fasttext
datasets:
- cis-lmu/GlotSparse
- cis-lmu/GlotStoryBook
metrics:
- f1
---
# GlotLID
## Description
GlotLID is a fastText language identification (LID) model that covers around 2,000 languages.
### How to use
Here is how to use this model to detect the language of a given text:
```python
>>> import fasttext
>>> from huggingface_hub import hf_hub_download
>>> model_path = hf_hub_download(repo_id="cis-lmu/glotlid", filename="model.bin")
>>> model = fasttext.load_model(model_path)
>>> model.predict("Hello, world!")
```
If you prefer not to use `huggingface_hub`, download the model directly:
```bash
wget https://huggingface.co/cis-lmu/glotlid/resolve/main/model.bin
```
```python
>>> import fasttext
>>> model = fasttext.load_model("/path/to/model.bin")
>>> model.predict("Hello, world!")
```
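`model.predict` returns a pair: a tuple of labels and an array of probabilities. The helpers below are a minimal sketch of one way to turn that pair into plain `(label, probability)` tuples. The names `clean_text`, `parse_prediction`, and `identify` are hypothetical and not part of GlotLID or fastText, and the sketch assumes the model uses fastText's default `__label__` prefix. Newlines are collapsed first because fastText's `predict` processes one line at a time and raises an error on text containing `\n`.

```python
from typing import List, Sequence, Tuple

# Assumption: the model was trained with fastText's default label prefix.
LABEL_PREFIX = "__label__"


def clean_text(text: str) -> str:
    """Collapse newlines and repeated whitespace into single spaces."""
    return " ".join(text.split())


def parse_prediction(
    labels: Sequence[str], probs: Sequence[float]
) -> List[Tuple[str, float]]:
    """Pair each label (prefix stripped, if present) with its probability."""
    return [
        (label[len(LABEL_PREFIX):] if label.startswith(LABEL_PREFIX) else label, float(p))
        for label, p in zip(labels, probs)
    ]


def identify(model, text: str, k: int = 3) -> List[Tuple[str, float]]:
    """Return up to k (label, probability) pairs for a single piece of text."""
    labels, probs = model.predict(clean_text(text), k=k)
    return parse_prediction(labels, probs)
```

With a model loaded as shown above, `identify(model, "Hello, world!", k=3)` would return up to three candidate labels with their probabilities, most probable first.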
## License
The model is distributed under the Apache License, Version 2.0.
## Version
Previous versions of GlotLID remain available in the repository.
To access a specific version, append the version number to the base name of the `filename`:
- For v1: `model_v1.bin` (introduced in the GlotLID paper and used in all of its experiments).
- For v2: `model_v2.bin` (a revision of v1 that covers more languages and excludes noisy corpora identified in the analysis of v1).
`model.bin` always refers to the latest version (v2).
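The naming scheme above can be wrapped in a small helper so that a specific version is pinned explicitly. This is a sketch only: `glotlid_filename` and `download_glotlid` are hypothetical names, and the set of known versions is taken from the list above, so it would need updating if further versions are published.

```python
from typing import Optional

# Versions listed in this model card; an assumption that may go out of date.
KNOWN_VERSIONS = (1, 2)


def glotlid_filename(version: Optional[int] = None) -> str:
    """Map a version number to its file name; None selects the latest model."""
    if version is None:
        return "model.bin"
    if version not in KNOWN_VERSIONS:
        raise ValueError(f"unknown GlotLID version: {version!r}")
    return f"model_v{version}.bin"


def download_glotlid(version: Optional[int] = None) -> str:
    """Download the requested version from the Hub and return its local path."""
    # Imported lazily so glotlid_filename works without huggingface_hub installed.
    from huggingface_hub import hf_hub_download

    return hf_hub_download(repo_id="cis-lmu/glotlid", filename=glotlid_filename(version))
```

For example, `fasttext.load_model(download_glotlid(1))` would load the v1 model used in the paper's experiments, while `download_glotlid()` follows whatever `model.bin` currently points to.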
## References
If you use this model, please cite the following paper:
```
@inproceedings{
kargaran2023glotlid,
title={{GlotLID}: Language Identification for Low-Resource Languages},
author={Kargaran, Amir Hossein and Imani, Ayyoob and Yvon, Fran{\c{c}}ois and Sch{\"u}tze, Hinrich},
booktitle={The 2023 Conference on Empirical Methods in Natural Language Processing},
year={2023},
url={https://openreview.net/forum?id=dl4e3EBz5j}
}
```