w11wo's picture
Create README.md
51b3647
metadata
language: id
tags:
  - gpt2-indo-small-kids-stories
license: mit
widget:
  - text: Archie sedang mengendarai roket ke planet Mars.

GPT-2 Indonesian Small Kids Stories

GPT-2 Indonesian Small Kids Stories is a causal language model based on the OpenAI GPT-2 model. The model was originally the pre-trained GPT2 Small Indonesian model, which was then fine-tuned on Indonesian kids' stories from Room To Read and Let's Read.

10% of the dataset was kept for evaluation purposes. The pre-trained model was fine-tuned and achieved an evaluation loss of 3.777 and an evaluation perplexity of 43.68.

Hugging Face's Trainer class from the Transformers library was used to train the model. PyTorch was used as the backend framework during training, but the model remains compatible with other frameworks nonetheless.

Model

Model #params Arch. Training/Validation data (text)
gpt2-indo-small-kids-stories 124M GPT2 Small Indonesian Kids' Stories (860 KB)

Evaluation Results

The model was fine-tuned for 10 epochs.

Epoch Training Loss Validation Loss
1 4.259600 4.020201
2 3.979100 3.911295
3 3.818300 3.849313
4 3.691600 3.809931
5 3.589300 3.789201
6 3.506200 3.778665
7 3.439200 3.774871
8 3.387600 3.774859
9 3.351300 3.776672
10 3.330100 3.776935

How to Use (PyTorch)

As Causal Language Model

from transformers import pipeline

pretrained_name = "bookbot/gpt2-indo-small-kids-stories"

nlp = pipeline(
    "text-generation",
    model=pretrained_name,
    tokenizer=pretrained_name
)

nlp("Archie sedang mengendarai roket ke planet Mars.")

Feature Extraction in PyTorch

from transformers import GPT2LMHeadModel, GPT2TokenizerFast

pretrained_name = "bookbot/gpt2-indo-small-kids-stories"
model = GPT2LMHeadModel.from_pretrained(pretrained_name)
tokenizer = GPT2TokenizerFast.from_pretrained(pretrained_name)

prompt = "Archie sedang mengendarai roket ke planet Mars."
encoded_input = tokenizer(prompt, return_tensors='pt')
output = model(**encoded_input)

Disclaimer

Do consider the biases which come from both the pre-trained GPT-2 model and the Indonesian Kids' Stories dataset that may be carried over into the results of this model.

Author

GPT-2 Indonesian Small Kids Stories was trained and evaluated by Wilson Wongso. All computation and development are done on Google Colaboratory using their free GPU access.