---
language: es
tags:
  - GPT-2
datasets:
  - large_spanish_corpus
widget:
  - text: "Érase una vez un"
license: mit
---

# Spanish GPT-2 trained on BETO's corpus (large_spanish_corpus)

This is a Spanish GPT-2 model trained from scratch on the large_spanish_corpus (also known as BETO's corpus) with Flax. It was built as part of the Flax/JAX Community Week, organised by HuggingFace, with TPU usage sponsored by Google.
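A minimal way to try the model is the `transformers` text-generation pipeline. The hub id `mrm8488/spanish-gpt2` is an assumption based on this repository's name; adjust it if the model lives under a different name:

```python
from transformers import pipeline

# Assumed hub id for this model card; change it if the repo name differs.
MODEL_ID = "mrm8488/spanish-gpt2"

# Build a text-generation pipeline and sample a short continuation
# of the Spanish prompt "Érase una vez un" ("Once upon a time a").
generator = pipeline("text-generation", model=MODEL_ID)
out = generator("Érase una vez un", max_length=30, num_return_sequences=1)
print(out[0]["generated_text"])
```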

## Dataset

The dataset is about 20 GB. 95% of the data was used for training and the remaining 5% for validation.
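The 95/5 split above can be sketched with plain Python; the actual split method used during training is not stated here, so this is only an illustration on a toy corpus:

```python
# Illustrative 95/5 train/validation split over a toy list of documents.
# (The real corpus is ~20 GB of text; how it was actually split is not documented here.)
documents = [f"doc_{i}" for i in range(1000)]

cut = int(len(documents) * 0.95)  # index separating train from validation
train_docs = documents[:cut]      # first 95% of the data
valid_docs = documents[cut:]      # remaining 5% held out for evaluation

print(len(train_docs), len(valid_docs))  # → 950 50
```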

## Metrics (on evaluation dataset)

  • Loss: 2.413
  • Perplexity: 11.36
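For causal language models, perplexity is conventionally the exponential of the mean cross-entropy loss. exp(2.413) ≈ 11.17, which is close to the reported 11.36; the small gap likely comes from rounding of the loss or from how it was averaged. A quick sanity check:

```python
import math

eval_loss = 2.413                 # reported evaluation loss
perplexity = math.exp(eval_loss)  # ppl = exp(mean cross-entropy loss)
print(round(perplexity, 2))       # → 11.17
```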

## Team members

## Useful links