---
license: mit
language:
- nl
---
# hmByT5 - Preliminary Language Models

Preliminary Historic Multilingual and Monolingual ByT5 Models. The following languages are currently covered:
- Dutch (Delpher Corpus)
More details can be found in our GitHub repository.
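The checkpoint can be loaded like any other ByT5 model with Hugging Face Transformers. The snippet below is only a minimal sketch; the model id is a placeholder and should be replaced with the actual id of this repository on the Hugging Face Hub.

```python
from transformers import AutoTokenizer, T5ForConditionalGeneration

# Placeholder id - replace with the actual Hub id of this checkpoint.
model_id = "hmbyt5-preliminary/byt5-base-historic-dutch"

# ByT5 works on raw UTF-8 bytes, so the tokenizer needs no vocabulary file.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = T5ForConditionalGeneration.from_pretrained(model_id)

# Encode a Dutch example sentence into byte-level input ids.
inputs = tokenizer("Dit is een voorbeeldzin.", return_tensors="pt")
print(inputs.input_ids.shape)
```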
## Pretraining
We use the official JAX/FLAX example in Hugging Face Transformers to pretrain a ByT5 model on a single v3-8 TPU. Details about the training can be found here.
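As an illustrative sketch only, the Flax model can be set up as below, assuming the `google/byt5-base` configuration and the T5 span-corruption pretraining example (`run_t5_mlm_flax.py`) from Transformers. The exact setup (including whether training starts from scratch or from an existing ByT5 checkpoint) is described in the training details linked above.

```python
import jax.numpy as jnp
from transformers import AutoTokenizer, FlaxT5ForConditionalGeneration, T5Config

# Assumption: the architecture follows google/byt5-base; the actual
# pretraining configuration is documented in the training details.
config = T5Config.from_pretrained("google/byt5-base")

# ByT5 works directly on UTF-8 bytes, so no vocabulary has to be trained
# on the Delpher corpus.
tokenizer = AutoTokenizer.from_pretrained("google/byt5-base")

# Randomly initialized Flax model, as created by the official T5 MLM
# pretraining example when training from scratch.
model = FlaxT5ForConditionalGeneration(config, seed=42, dtype=jnp.bfloat16)

# Byte-level encoding of a sample sentence from the target domain.
batch = tokenizer(["Dit is een voorbeeldzin."], return_tensors="np")
print(batch.input_ids.shape)
```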
## Evaluation on Downstream Tasks (NER)

We evaluated the hmByT5 model on the ICDAR Europeana dataset:
| Configuration                            | Run 1 | Avg.        |
|------------------------------------------|-------|-------------|
| `wsFalse-bs4-e10-lr0.00015-poolingfirst` | 87.63 | 87.63 ± 0.0 |
| `wsFalse-bs8-e10-lr0.00016-poolingfirst` | 87.35 | 87.35 ± 0.0 |
| `wsFalse-bs8-e10-lr0.00015-poolingfirst` | 87.26 | 87.26 ± 0.0 |
| `wsFalse-bs4-e10-lr0.00016-poolingfirst` | 86.31 | 86.31 ± 0.0 |
We performed only one fine-tuning run per configuration. Unfortunately, this ByT5 Base model shows no improvement over the ByT5 Small architecture.
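The configuration names encode the hyperparameters: batch size (`bs`), number of epochs (`e`), learning rate (`lr`) and the subtoken pooling strategy. The actual fine-tuning code lives in our GitHub repository; the following is only a hypothetical sketch of such a setup with the Flair library, assuming its `NER_ICDAR_EUROPEANA` corpus loader and that the ByT5 encoder can be consumed via `TransformerWordEmbeddings` (the model id is again a placeholder).

```python
from flair.datasets import NER_ICDAR_EUROPEANA
from flair.embeddings import TransformerWordEmbeddings
from flair.models import SequenceTagger
from flair.trainers import ModelTrainer

# Dutch split of the ICDAR Europeana NER corpus (assumed loader).
corpus = NER_ICDAR_EUROPEANA(language="nl")
label_dict = corpus.make_label_dictionary(label_type="ner")

# Placeholder id - replace with the actual Hub id of this checkpoint.
# subtoken_pooling="first" mirrors the "poolingfirst" setting above.
embeddings = TransformerWordEmbeddings(
    model="hmbyt5-preliminary/byt5-base-historic-dutch",
    layers="-1",
    subtoken_pooling="first",
    fine_tune=True,
)

tagger = SequenceTagger(
    hidden_size=256,
    embeddings=embeddings,
    tag_dictionary=label_dict,
    tag_type="ner",
    use_crf=False,
    use_rnn=False,
    reproject_embeddings=False,
)

trainer = ModelTrainer(tagger, corpus)
trainer.fine_tune(
    "resources/taggers/ner-icdar-europeana-nl",
    learning_rate=0.00015,  # one of the learning rates from the table above
    mini_batch_size=4,      # "bs4" in the configuration name
    max_epochs=10,          # "e10" in the configuration name
)
```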
## Acknowledgements

Research supported with Cloud TPUs from Google's TPU Research Cloud (TRC). Many thanks for providing access to the TPUs ❤️