A newer version of this model is available:
ybracke/transnormer-19c-beta-v02
Transnormer 19th century (beta v01)
This model normalizes spelling variants in historical German text to modern spelling. It is a fine-tuned version of google/byt5-small, trained on a modified version of the DTA EvalCorpus (1780-1901).
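Because the base model, google/byt5-small, operates on UTF-8 bytes rather than subword tokens, historical characters such as ſ occupy more than one token. The sketch below approximates the ByT5 input encoding in plain Python. It assumes the standard ByT5 convention (byte value plus 3, with id 1 as end-of-sequence); the helper name `byt5_style_ids` is hypothetical and not part of this model's API. For real inputs, the tokenizer loaded via `AutoTokenizer` remains authoritative.

```python
def byt5_style_ids(text: str) -> list:
    """Approximate ByT5 input ids: each UTF-8 byte + 3, then EOS (id 1).

    Assumption: ids 0, 1, 2 are reserved for pad, eos, unk, as in the
    standard ByT5 tokenizer. This is an illustration, not the tokenizer itself.
    """
    offset = 3
    eos_id = 1
    return [b + offset for b in text.encode("utf-8")] + [eos_id]


word = "ſeyn"
ids = byt5_style_ids(word)
# The long s (ſ, U+017F) is two bytes in UTF-8, so 4 characters
# become 5 byte tokens plus the end-of-sequence id.
print(len(word), len(ids))  # 4 6
print(ids)                  # [200, 194, 104, 124, 113, 1]
```

Since sequence length is counted in bytes, a limit such as `max_new_tokens = 512` corresponds to roughly 512 bytes of output, which is somewhat fewer than 512 characters for text with umlauts or ß.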
Demo Usage
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
from transformers.generation import GenerationConfig
tokenizer = AutoTokenizer.from_pretrained("ybracke/transnormer-19c-beta-v01")
model = AutoModelForSeq2SeqLM.from_pretrained("ybracke/transnormer-19c-beta-v01")
gen_cfg = GenerationConfig.from_model_config(model.config)
gen_cfg.max_new_tokens = 512
sentence = "Der Officier mußte ſich dazu setzen, man trank und ließ ſich’s wohl ſeyn."
inputs = tokenizer(sentence, return_tensors="pt")
outputs = model.generate(**inputs, generation_config=gen_cfg)
print(tokenizer.batch_decode(outputs, skip_special_tokens=True))
# >>> ['Der Offizier musste sich dazusetzen, man trank und ließ sich es wohl sein.']
Here is how to use this model with the pipeline API:
from transformers import pipeline
transnormer = pipeline('text2text-generation', model='ybracke/transnormer-19c-beta-v01')
sentence = "Der Officier mußte ſich dazu setzen, man trank und ließ ſich’s wohl ſeyn."
print(transnormer(sentence))
# >>> [{'generated_text': 'Der Offizier musste sich dazusetzen, man trank und ließ sich es wohl sein.'}]
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 5e-05
- train_batch_size: 32
- eval_batch_size: 32
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 3.76
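As a rough illustration of the linear `lr_scheduler_type` listed above, the following sketch computes the learning rate at a given optimizer step. It assumes no warmup (the model card does not state a warmup setting) and uses a hypothetical total of 1,000 steps, since the actual step count is not given. The function name `linear_lr` is illustrative only.

```python
def linear_lr(step: int, total_steps: int,
              base_lr: float = 5e-05, warmup_steps: int = 0) -> float:
    """Linear schedule: optional linear warmup, then linear decay to zero.

    Assumptions: warmup_steps=0 and total_steps are placeholders; only
    base_lr=5e-05 and the 'linear' scheduler type come from the model card.
    """
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


TOTAL_STEPS = 1000  # hypothetical value for illustration
for step in (0, 500, 1000):
    print(step, linear_lr(step, TOTAL_STEPS))
```

Under these assumptions the rate starts at the configured 5e-05, reaches half of it at the midpoint, and falls to zero at the final step.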
Framework versions
- Transformers 4.31.0
- Pytorch 2.1.0+cu121
- Datasets 2.18.0
- Tokenizers 0.13.3