guillermoruiz committed · Commit 6641744 · verified · Parent: 562cfb9

Update README.md

Files changed (1): README.md (+49, -1)

README.md CHANGED
pipeline_tag: fill-mask
widget:
- text: Vamos a comer unos [MASK]
  example_title: "Vamos a comer unos tacos"
tags:
- code
- nlp
 
tokenizer:
- yes
---

# BILMA (Bert In Latin aMericA)

BILMA is a BERT implementation in TensorFlow, trained on the masked language model (MLM) task with the regional Spanish datasets presented at https://sadit.github.io/regional-spanish-models-talk-2022/.

The accuracy of the models trained on the MLM task for the different regions is shown below:

![bilma-mlm-comp](https://user-images.githubusercontent.com/392873/163045798-89bd45c5-b654-4f16-b3e2-5cf404e12ddd.png)

# Prerequisites

You will need TensorFlow 2.4 or newer.
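
If you are unsure which version you have installed, a quick check (plain TensorFlow, nothing BILMA-specific):
```
import tensorflow as tf

# BILMA needs TensorFlow 2.4 or newer
print(tf.__version__)
```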

# Quick guide

See the demo notebooks for a quick guide on how to use the models.

Clone this repository and then run
```
bash download-emoji15-bilma.sh
```
to download the MX model. The model and tokenizer can then be loaded with:
```
from bilma import bilma_model

# Weights and vocabulary fetched by the download script above
vocab_file = "vocab_file_All.txt"
model_file = "bilma_small_MX_epoch-1_classification_epochs-13.h5"

model = bilma_model.load(model_file)

# Maximum sequence length of 280
tokenizer = bilma_model.tokenizer(vocab_file=vocab_file,
                                  max_length=280)
```

Now you will need some text:
```
# Example messages in Spanish, the language the model was trained on
texts = ["Tenemos tres dias sin internet ni senal de celular en el pueblo.",
         "Incomunicados en el siglo XXI tampoco hay servicio de telefonia fija",
         "Vamos a comer unos tacos",
         "Los del banco no dejan de llamarme"]
toks = tokenizer.tokenize(texts)
```

With this, you are ready to use the model:
```
p = model.predict(toks)

# p[1] holds the emoji-classification output
tokenizer.decode_emo(p[1])
```
which produces the output: ![emoji-output](https://user-images.githubusercontent.com/392873/165176270-77dd32ca-377e-4d29-ab4a-bc5f75913241.jpg)

Each emoji corresponds to one entry in `texts`.
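
To pair each prediction with its input, here is a minimal sketch; it assumes `decode_emo` returns one decoded emoji per input, which is what the output above suggests:
```
emojis = tokenizer.decode_emo(p[1])

# Print each input next to its predicted emoji
for text, emoji in zip(texts, emojis):
    print(emoji, text)
```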