---
license: apache-2.0
datasets:
- wmt/wmt14
language:
- de
- en
pipeline_tag: text2text-generation
---
This is a custom Hugging Face model port of the PyTorch implementation of the original Transformer model introduced in the 2017 paper "Attention Is All You Need". This is the 65M-parameter base model version, trained for English-to-German translation.
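For reference, the paper's "base" configuration uses 6 encoder and 6 decoder layers, 8 attention heads, a model dimension of 512 and a feed-forward dimension of 2048. The sketch below only illustrates that configuration with PyTorch's stock `nn.Transformer`; it is not how this repository's custom modeling file builds the network.

```python
import torch.nn as nn

# "Base" hyperparameters from "Attention Is All You Need" (illustration only;
# the checkpoint itself is defined by this repo's custom modeling file).
base = nn.Transformer(
    d_model=512,            # embedding / model dimension
    nhead=8,                # attention heads
    num_encoder_layers=6,
    num_decoder_layers=6,
    dim_feedforward=2048,
    dropout=0.1,
)
n = sum(p.numel() for p in base.parameters())
print(f"{n / 1e6:.1f}M")    # ~44M; a ~37k-token shared embedding matrix brings the total near 65M
```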
Usage:

```python
from transformers import AutoModel, AutoTokenizer

model = AutoModel.from_pretrained("ubaada/original-transformer", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("ubaada/original-transformer")

text = "This is my cat"
inputs = tokenizer(text, return_tensors="pt", add_special_tokens=True, truncation=True, max_length=100)
output = model.generate(**inputs)
print(tokenizer.decode(output[0], skip_special_tokens=True, clean_up_tokenization_spaces=True))
# Output: ' Das ist meine Katze.'
```
(Remember to pass `trust_remote_code=True` because the model uses a custom modeling file.)
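As a quick sanity check on the 65M figure, the loaded model's parameters can be counted directly (the exact number depends on the checkpoint's vocabulary size):

```python
from transformers import AutoModel

model = AutoModel.from_pretrained("ubaada/original-transformer", trust_remote_code=True)
n_params = sum(p.numel() for p in model.parameters())
print(f"{n_params / 1e6:.1f}M parameters")  # expected to land in the ~65M range
```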
Training:
| Parameter | Value |
|---|---|
| Dataset | WMT14 de-en |
| Translation Pairs | 4.5M (135M tokens total) |
| Epochs | 24 |
| Batch Size | 16 |
| Gradient Accumulation Steps | 8 |
| Effective Batch Size | 128 (16 × 8) |
| Training Script | train.py |
| Optimiser | Adam (learning rate = 0.0001) |
| Loss Type | Cross-Entropy |
| Final Test Loss | 1.87 |
| GPU | RTX 4070 (12 GB) |
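The settings above translate into a fairly standard gradient-accumulation loop. The sketch below is not the repository's train.py; the model, data and pad id are toy stand-ins, and only the optimiser, loss and accumulation logic mirror the table.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Toy stand-ins so the sketch runs end to end; in the real script these are the
# transformer, the WMT14 dataloader (batch size 16), and the tokenizer's pad id.
PAD_ID = 0
VOCAB = 1000
model = nn.Sequential(nn.Embedding(VOCAB, 64), nn.Linear(64, VOCAB))
data = TensorDataset(torch.randint(1, VOCAB, (256, 20)),   # source ids (unused by the toy model)
                     torch.randint(1, VOCAB, (256, 20)),   # target input ids
                     torch.randint(1, VOCAB, (256, 20)))   # target output ids (shifted)
train_loader = DataLoader(data, batch_size=16)

optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)   # Adam, learning rate 0.0001
criterion = nn.CrossEntropyLoss(ignore_index=PAD_ID)        # cross-entropy loss
ACCUM_STEPS = 8                                             # 16 × 8 = 128 effective batch size

for epoch in range(1):                                      # the card reports 24 epochs
    optimizer.zero_grad()
    for step, (src, tgt_in, tgt_out) in enumerate(train_loader):
        logits = model(tgt_in)                               # toy forward pass; the real model consumes (src, tgt_in)
        loss = criterion(logits.reshape(-1, VOCAB), tgt_out.reshape(-1))
        (loss / ACCUM_STEPS).backward()                      # scale loss for gradient accumulation
        if (step + 1) % ACCUM_STEPS == 0:
            optimizer.step()                                 # optimizer step every 8 micro-batches
            optimizer.zero_grad()
```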
Results