---
license: mit
language:
- es
metrics:
- accuracy
pipeline_tag: fill-mask
widget:
- text: Vamos a comer unos [MASK]
  example_title: "Vamos a comer unos tacos"
tags:
- code
- nlp
- custom
- bilma
tokenizer:
- yes
---
# BILMA (Bert In Latin aMericA)

Bilma is a BERT implementation in TensorFlow, trained on the masked language model (MLM) task using the regional Spanish datasets described at https://sadit.github.io/regional-spanish-models-talk-2022/.

The accuracy of the models trained on the MLM task for the different regions is shown below:

![bilma-mlm-comp](https://user-images.githubusercontent.com/392873/163045798-89bd45c5-b654-4f16-b3e2-5cf404e12ddd.png)

# Prerequisites

You will need TensorFlow 2.4 or newer.
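
If you are not sure which version is installed, you can check it directly:
```
import tensorflow as tf
print(tf.__version__)  # should be 2.4 or newer
```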

# Quick guide

Install the following version of the transformers library:
```
!pip install transformers==4.30.2
```



Instantiate the tokenizer and the trained model:
```
from transformers import AutoTokenizer, TFAutoModel

# Load the tokenizer and the trained model; trust_remote_code=True is needed
# because the BILMA model code lives in the repository itself
tok = AutoTokenizer.from_pretrained("guillermoruiz/bilma_mx")
model = TFAutoModel.from_pretrained("guillermoruiz/bilma_mx",
                                    trust_remote_code=True, include_top=False)
```
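
The mask token and its id can be read directly from the tokenizer; this is handy when writing your own masked inputs:
```
print(tok.mask_token, tok.mask_token_id)
```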

Next, we need some text to pass through the tokenizer:
```
text = ["Vamos a comer [MASK].",
        "Hace mucho que no voy al [MASK]."]
# Pad every input to a fixed length of 280 tokens
t = tok(text, padding="max_length", return_tensors="tf", max_length=280)
```
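
As a quick sanity check, `t["input_ids"]` should now hold both sentences padded to the fixed length:
```
print(t["input_ids"].shape)  # (2, 280): two sentences, 280 tokens each
```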

With this, we are ready to run the model:
```
# Forward pass; returns a dictionary that includes the MLM logits
p = model(t)
```

Now, we get the most likely words with:
```
import tensorflow as tf
# Most likely token at each position; [:, 1:] drops the start-of-sequence token
tok.batch_decode(tf.argmax(p["logits"], 2)[:, 1:], skip_special_tokens=True)
```

which produces the output:
```
['vamos a comer tacos.', 'hace mucho que no voy al gym.']
```
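
If you want several candidates for the masked position instead of a single argmax, here is a minimal sketch (it assumes exactly one [MASK] per sentence and reuses `text`, `t`, `p`, and `tok` from above):
```
import tensorflow as tf

# Find the position of the [MASK] token in each input
# (argmax returns the first match, so we assume one mask per sentence)
mask_pos = tf.argmax(
    tf.cast(t["input_ids"] == tok.mask_token_id, tf.int32), axis=1)

for i, pos in enumerate(mask_pos.numpy()):
    top = tf.math.top_k(p["logits"][i, pos], k=5)
    print(text[i], "->", tok.convert_ids_to_tokens(top.indices.numpy().tolist()))
```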