lpq29743 kargaranamir committed on
Commit
cbf952c
·
1 Parent(s): c762441

add readme update (#1)

- add readme update (918c7d2a532aa7038160e30e8f36fda7dc17dd28)


Co-authored-by: Amir Hossein Kargaran <kargaranamir@users.noreply.huggingface.co>

Files changed (1)
  1. README.md +41 -2
README.md CHANGED
@@ -2,7 +2,46 @@
2
  license: apache-2.0
3
  ---
4
 
5
- # Glot500
6
 
7
- Pretrained model on 500+ languages using a masked language modeling (MLM) objective. It was introduced in
8
  [this paper]() (ACL 2023) and first released in [this repository](https://github.com/cisnlp/Glot500).
2
  license: apache-2.0
3
  ---
4
 
5
+ # Glot500 (base-sized model)
6
 
7
+ The Glot500 model (Glot500-m) is pre-trained on 500+ languages using a masked language modeling (MLM) objective. It was introduced in
8
  [this paper]() (ACL 2023) and first released in [this repository](https://github.com/cisnlp/Glot500).
9
+
10
+
11
+ ## Usage
12
+
13
+ You can use this model directly with a pipeline for masked language modeling:
14
+
15
+ ```python
16
+ >>> from transformers import pipeline
17
+ >>> unmasker = pipeline('fill-mask', model='cis-lmu/glot500-base')
18
+ >>> unmasker("Hello I'm a <mask> model.")
19
+ ```
20
+
21
+
22
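+ Here is how to run a forward pass with this model on a given text in PyTorch (the masked-LM head returns logits; pass `output_hidden_states=True` to the model call to also get hidden-state features):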
23
+
24
+ ```python
25
+ from transformers import AutoTokenizer, AutoModelForMaskedLM
26
+
27
+ tokenizer = AutoTokenizer.from_pretrained('cis-lmu/glot500-base')
28
+ model = AutoModelForMaskedLM.from_pretrained('cis-lmu/glot500-base')
29
+
30
+ # prepare input
31
+ text = "Replace me by any text you'd like."
32
+ encoded_input = tokenizer(text, return_tensors='pt')
33
+
34
+ # forward pass
35
+ output = model(**encoded_input)
36
+ ```
37
+
38
+ ### BibTeX entry and citation info
39
+
40
+ ```bibtex
41
+ @inproceedings{imani-etal-2023-glot500,
42
+ title = "Glot500: Scaling Multilingual Corpora and Language Models to 500 Languages",
43
+ author = "ImaniGooghari, Ayyoob and Lin, Peiqin and Kargaran, Amir Hossein and Severini, Silvia and Sabet, Masoud Jalili and Kassner, Nora and Ma, Chunlan and Schmid, Helmut and Martins, André and Yvon, François and Sch{\"u}tze, Hinrich",
44
+ booktitle = "Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics",
45
+ year = "2023",
46
+ }
47
+ ```
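
The PyTorch snippet in this README change stops at the forward pass. As a minimal sketch of one common way to turn per-token hidden states into a single feature vector for a text, the block below implements masked mean pooling. The pooling choice is an assumption for illustration, not something the README or the Glot500 authors prescribe, and `mean_pool`, `toy_hidden`, and `toy_mask` are hypothetical names. It is written in plain Python on toy values so it runs without downloading the model; with the real model, the inputs would correspond to the last entry of `output.hidden_states` (available when the model is called with `output_hidden_states=True`) and `encoded_input['attention_mask']`.

```python
from typing import List


def mean_pool(hidden: List[List[float]], mask: List[int]) -> List[float]:
    """Average the token vectors whose mask value is 1, skipping padding."""
    if len(hidden) != len(mask):
        raise ValueError("hidden and mask must have the same length")
    # Keep only the vectors at positions the attention mask marks as real tokens.
    kept = [vec for vec, m in zip(hidden, mask) if m == 1]
    if not kept:
        raise ValueError("mask selects no tokens")
    dim = len(kept[0])
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]


# Toy input (made-up values): three token vectors of size 2,
# where the last position is padding and must be ignored.
toy_hidden = [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]
toy_mask = [1, 1, 0]

pooled = mean_pool(toy_hidden, toy_mask)
print(pooled)  # [2.0, 3.0]
```

Masking before averaging matters for batched inputs, where shorter texts are padded and the padding vectors would otherwise skew the result.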