upload
- .gitattributes +1 -0
- 34t.txt +1 -0
- README.md +91 -0
- config.json +20 -0
- flax_model.msgpack +3 -0
- gitattributes.txt +9 -0
- nbest_predictions_.json +3 -0
- null_odds_.json +0 -0
- predictions_.json +0 -0
- pytorch_model.bin +3 -0
- saved_model.tar.gz +3 -0
- special_tokens_map.json +1 -0
- tokenizer_config.json +1 -0
- training_args.bin +3 -0
- vocab (1).txt +0 -0
.gitattributes
CHANGED
@@ -32,3 +32,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
+nbest_predictions_.json filter=lfs diff=lfs merge=lfs -text
34t.txt
ADDED
@@ -0,0 +1 @@
Gerry Cotten, creator of a cryptocurrency trading marketplace called QuadrigaCX, died in India on December 9, 2018. He died under mysterious circumstances, above all because with his death not only did he disappear, but so did around 120 million euros' worth of cryptocurrency. QuadrigaCX's 115,000 clients watched their investments vanish, which set in motion a convoluted investigation that, a year later, has still not established where the money is. Not for lack of trying: Cotten's body was even exhumed in an attempt to move the process forward. Cotten was one of those entrepreneurs who bet on the cryptocurrency market very early. He founded the company Quadriga in November 2013 in Vancouver with a partner named Michael Patryn (note this name, as he is a protagonist in this story), and they were among the first to launch an ATM with cryptocurrency support in Canada. The business went through some ups and downs, but Cotten eventually transitioned Quadriga into a cryptocurrency marketplace, or exchange, which did notable business during bitcoin's rise in value in 2017. In 2018, with prices falling, several clients reported problems when trying to withdraw funds, and investigations into potential fraud got underway.
README.md
ADDED
@@ -0,0 +1,91 @@
---
language: es
thumbnail: https://i.imgur.com/jgBdimh.png
---

# BETO (Spanish BERT) + Spanish SQuAD2.0

This model is provided by the [BETO team](https://github.com/dccuchile/beto) and fine-tuned on [SQuAD-es-v2.0](https://github.com/ccasimiro88/TranslateAlignRetrieve) for the **Q&A** downstream task.
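A minimal usage sketch (an editor's illustration, not part of the original card) with the `transformers` question-answering pipeline; `MODEL_ID` below is a placeholder for this repository's Hub id or a local path containing the files in this commit:

```python
# Sketch: query the fine-tuned checkpoint with the question-answering pipeline.
# NOTE: MODEL_ID is a placeholder -- substitute this repository's actual Hub id
# or a local directory with config.json, pytorch_model.bin and the vocab file.
from transformers import pipeline

MODEL_ID = "path/or/hub-id-of-this-repo"  # placeholder

nlp = pipeline("question-answering", model=MODEL_ID, tokenizer=MODEL_ID)

context = (
    "BETO es un modelo BERT entrenado con un gran corpus en español "
    "y afinado sobre SQuAD-es-v2.0 para respuesta a preguntas."
)
result = nlp(question="¿Sobre qué dataset fue afinado BETO?", context=context)
print(result["answer"], result["score"])
```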

## Details of the language model ('dccuchile/bert-base-spanish-wwm-cased')

Language model: [**'dccuchile/bert-base-spanish-wwm-cased'**](https://github.com/dccuchile/beto/blob/master/README.md)

BETO is a [BERT model](https://github.com/google-research/bert) trained on a [large Spanish corpus](https://github.com/josecannete/spanish-corpora). BETO is similar in size to BERT-Base and was trained with the Whole Word Masking technique. TensorFlow and PyTorch checkpoints are available for the uncased and cased versions, along with results on Spanish benchmarks comparing BETO with [Multilingual BERT](https://github.com/google-research/bert/blob/master/multilingual.md) and other (non-BERT-based) models.

## Details of the downstream task (Q&A) - Dataset

[SQuAD-es-v2.0](https://github.com/ccasimiro88/TranslateAlignRetrieve)

| Dataset                 | # Q&A |
| ----------------------- | ----- |
| SQuAD2.0 Train          | 130 K |
| SQuAD-es-v2.0           | 111 K |
| SQuAD2.0 Dev            | 12 K  |
| SQuAD-es-v2.0-small Dev | 69 K  |

## Model training

The model was trained on a Tesla P100 GPU with 25 GB of RAM using the following command:

```bash
export SQUAD_DIR=path/to/nl_squad
python transformers/examples/question-answering/run_squad.py \
  --model_type bert \
  --model_name_or_path dccuchile/bert-base-spanish-wwm-cased \
  --do_train \
  --do_eval \
  --do_lower_case \
  --train_file $SQUAD_DIR/train_nl-v2.0.json \
  --predict_file $SQUAD_DIR/dev_nl-v2.0.json \
  --per_gpu_train_batch_size 12 \
  --learning_rate 3e-5 \
  --num_train_epochs 2.0 \
  --max_seq_length 384 \
  --doc_stride 128 \
  --output_dir /content/model_output \
  --save_steps 5000 \
  --threads 4 \
  --version_2_with_negative
```

## Results

| Metric    | Value     |
| --------- | --------- |
| **Exact** | **76.50** |
| **F1**    | **86.07** |

```json
{
  "exact": 76.50501430594491,
  "f1": 86.07818773108252,
  "total": 69202,
  "HasAns_exact": 67.93020719738277,
  "HasAns_f1": 82.37912207996466,
  "HasAns_total": 45850,
  "NoAns_exact": 93.34104145255225,
  "NoAns_f1": 93.34104145255225,
  "NoAns_total": 23352,
  "best_exact": 76.51223953064941,
  "best_exact_thresh": 0.0,
  "best_f1": 86.08541295578848,
  "best_f1_thresh": 0.0
}
```
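The overall scores are the example-count-weighted average of the answerable (`HasAns_*`) and unanswerable (`NoAns_*`) subsets. A quick sanity check, using only the numbers reported above:

```python
# The overall "exact" score is the count-weighted average of the HasAns and
# NoAns subset scores from the evaluation JSON above.
has_ans_exact, has_ans_total = 67.93020719738277, 45850
no_ans_exact, no_ans_total = 93.34104145255225, 23352

overall_exact = (has_ans_exact * has_ans_total + no_ans_exact * no_ans_total) / (
    has_ans_total + no_ans_total
)
print(overall_exact)  # ~76.505, matching "exact": 76.50501430594491
```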

### Model in action (in a Colab Notebook)

<details>

1. Set the context and ask some questions:

![Set context and questions](https://media.giphy.com/media/mCIaBpfN0LQcuzkA2F/giphy.gif)

2. Run predictions:

![Run the model](https://media.giphy.com/media/WT453aptcbCP7hxWTZ/giphy.gif)

</details>

> Created by [Manuel Romero/@mrm8488](https://twitter.com/mrm8488)

> Made with <span style="color: #e25555;">♥</span> in Spain

config.json
ADDED
@@ -0,0 +1,20 @@
{
  "architectures": [
    "BertForQuestionAnswering"
  ],
  "attention_probs_dropout_prob": 0.1,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 768,
  "initializer_range": 0.02,
  "intermediate_size": 3072,
  "layer_norm_eps": 1e-12,
  "max_position_embeddings": 512,
  "model_type": "bert",
  "num_attention_heads": 12,
  "num_hidden_layers": 12,
  "output_past": true,
  "pad_token_id": 1,
  "type_vocab_size": 2,
  "vocab_size": 31002
}
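This config describes a standard BERT-Base encoder (12 layers, 12 heads, hidden size 768) with a question-answering head and a 31,002-token vocabulary. A minimal sketch (an editor's illustration, assuming `transformers` is installed and the file above is saved locally as `config.json`) that rebuilds the architecture from it:

```python
# Rebuild the architecture described by config.json. Weights here are randomly
# initialized; to get the fine-tuned weights, call
# BertForQuestionAnswering.from_pretrained(...) on the full repo directory,
# which also loads pytorch_model.bin.
from transformers import BertConfig, BertForQuestionAnswering

config = BertConfig.from_json_file("config.json")  # the file shown above
model = BertForQuestionAnswering(config)

print(config.num_hidden_layers, config.num_attention_heads, config.vocab_size)
# -> 12 12 31002
```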

flax_model.msgpack
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:0f7ed9d36d9dfabb74e6f9a297232a880c4048fe3baff7d7aace5027dbe461f9
size 437054446

gitattributes.txt
ADDED
@@ -0,0 +1,9 @@
*.bin.* filter=lfs diff=lfs merge=lfs -text
*.lfs.* filter=lfs diff=lfs merge=lfs -text
*.bin filter=lfs diff=lfs merge=lfs -text
*.h5 filter=lfs diff=lfs merge=lfs -text
*.tflite filter=lfs diff=lfs merge=lfs -text
*.tar.gz filter=lfs diff=lfs merge=lfs -text
*.ot filter=lfs diff=lfs merge=lfs -text
*.onnx filter=lfs diff=lfs merge=lfs -text
*.msgpack filter=lfs diff=lfs merge=lfs -text

nbest_predictions_.json
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:085a6a937305d45965a24a19916bcb43722a1a7cb1d5c14753bdc85e8b4a3166
size 320885570

null_odds_.json
ADDED
The diff for this file is too large to render.

predictions_.json
ADDED
The diff for this file is too large to render.

pytorch_model.bin
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:280c5e41b49fff41568fcc282c9419ee8b3c6681ac6c30ae5e3718f546b61bff
size 439457908

saved_model.tar.gz
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:3a6f5510408fb6e69cd13e7bfa8c7e3cf6f390ccbf7b57430e9cbf18f7a97bd4
size 408021292

special_tokens_map.json
ADDED
@@ -0,0 +1 @@
{"unk_token": "[UNK]", "sep_token": "[SEP]", "pad_token": "[PAD]", "cls_token": "[CLS]", "mask_token": "[MASK]"}

tokenizer_config.json
ADDED
@@ -0,0 +1 @@
{"do_lower_case": true, "unk_token": "[UNK]", "sep_token": "[SEP]", "pad_token": "[PAD]", "cls_token": "[CLS]", "mask_token": "[MASK]"}

training_args.bin
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:483b8d0eb37493df82b9e3f8a049e84af9f12b190537f91c4e8fd5e0b58b0f4d
size 1537

vocab (1).txt
ADDED
The diff for this file is too large to render.