guillermoruiz committed · Commit 6641744 · verified · Parent: 562cfb9

Update README.md

Files changed (1): README.md (+49, -1)

README.md CHANGED
pipeline_tag: fill-mask
widget:
- text: Vamos a comer unos [MASK]
  example_title: "Vamos a comer unos tacos"
tags:
- code
- nlp
 
tokenizer:
- yes
---

# BILMA (Bert In Latin aMericA)

BILMA is a BERT implementation in TensorFlow, trained on the masked language model (MLM) task with the regional Spanish datasets presented at https://sadit.github.io/regional-spanish-models-talk-2022/.

The accuracy of the models trained on the MLM task for the different regions is shown below:

![bilma-mlm-comp](https://user-images.githubusercontent.com/392873/163045798-89bd45c5-b654-4f16-b3e2-5cf404e12ddd.png)

# Prerequisites

You will need TensorFlow 2.4 or newer.
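
If you are unsure which version you have installed, a quick check (plain TensorFlow, nothing BILMA-specific):
```
import tensorflow as tf

# BILMA needs TensorFlow 2.4 or newer
print(tf.__version__)
```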

# Quick guide

See the demo notebooks for a quick guide on how to use the models.

Clone this repository and then run
```
bash download-emoji15-bilma.sh
```
to download the MX model. The model and tokenizer can then be loaded with:
```
from bilma import bilma_model

# Weights and vocabulary fetched by the download script above
vocab_file = "vocab_file_All.txt"
model_file = "bilma_small_MX_epoch-1_classification_epochs-13.h5"

model = bilma_model.load(model_file)

# Maximum sequence length of 280
tokenizer = bilma_model.tokenizer(vocab_file=vocab_file,
                                  max_length=280)
```

Now you will need some text:
```
# Example messages in Spanish, the language the model was trained on
texts = ["Tenemos tres dias sin internet ni senal de celular en el pueblo.",
         "Incomunicados en el siglo XXI tampoco hay servicio de telefonia fija",
         "Vamos a comer unos tacos",
         "Los del banco no dejan de llamarme"]
toks = tokenizer.tokenize(texts)
```

With this, you are ready to use the model:
```
p = model.predict(toks)

# p[1] holds the emoji-classification output
tokenizer.decode_emo(p[1])
```
which produces the output: ![emoji-output](https://user-images.githubusercontent.com/392873/165176270-77dd32ca-377e-4d29-ab4a-bc5f75913241.jpg)

Each emoji corresponds to one entry in `texts`.
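
To pair each prediction with its input, here is a minimal sketch; it assumes `decode_emo` returns one decoded emoji per input, which is what the output above suggests:
```
emojis = tokenizer.decode_emo(p[1])

# Print each input next to its predicted emoji
for text, emoji in zip(texts, emojis):
    print(emoji, text)
```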