---
license: apache-2.0
datasets:
- wmt/wmt14
language:
- en
- de
---

```python
from tokenizers import Tokenizer

tok = Tokenizer.from_pretrained("llm-scratch/wmt-14-en-de-tok")
```
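
Once the tokenizer is loaded, a quick encode/decode round trip is a simple way to inspect how it splits English or German text. The sketch below assumes `tok` is the `Tokenizer` object from the snippet above; `roundtrip` is a hypothetical helper name used only for illustration and is not part of this repository. It relies only on `Tokenizer.encode`, the `tokens` and `ids` attributes of the returned `Encoding`, and `Tokenizer.decode`.

```python
def roundtrip(tok, text):
    """Encode `text`, then decode the ids back to a string.

    `tok` is assumed to be a `tokenizers.Tokenizer` (or any object with
    compatible `encode` / `decode` methods). `roundtrip` itself is a
    hypothetical helper, not something shipped with the tokenizer.
    """
    enc = tok.encode(text)          # Encoding object with .tokens and .ids
    decoded = tok.decode(enc.ids)   # map the ids back to text
    return enc.tokens, enc.ids, decoded
```

For example, `tokens, ids, decoded = roundtrip(tok, "Guten Morgen!")` returns the subword pieces, their integer ids, and the decoded string. Whether `decoded` matches the input exactly depends on the normalizer and decoder configured for this tokenizer, which the card does not describe.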