Add files

Files changed (13)

README.md +197 -0
added_tokens.json +1 -0
config.gin +150 -0
config.json +35 -0
events.out.tfevents.1672813339.t1v-n-c82e3785-w-0.2029925.0.v2 +3 -0
flax_model.msgpack +3 -0
pytorch_model.bin +3 -0
special_tokens_map.json +107 -0
spiece.model +3 -0
spiece.vocab +0 -0
test_results.json +10 -0
tokenizer_config.json +113 -0
training_state.json +1 -0

README.md ADDED

	@@ -0,0 +1,197 @@

+---
+language:
+- nl
+- en
+- multilingual
+license: apache-2.0
+tags:
+- dutch
+- english
+- t5
+- t5x
+- ul2
+- seq2seq
+- translation
+datasets:
+- yhavinga/mc4_nl_cleaned
+- yhavinga/nedd_wiki_news
+pipeline_tag: translation
+widget:
+  - text: >-
+      Redistricting and West Virginia’s shrinking population forced the state’s
+      Republican Legislature to pit Mr. McKinley, a six-term Republican with a
+      pragmatic bent, against Mr. Mooney, who has served four terms marked more
+      by conservative rhetoric than legislative achievements.
+  - text: >-
+      It is a painful and tragic spectacle that rises before me: I have drawn
+      back the curtain from the rottenness of man. This word, in my mouth, is at
+      least free from one suspicion: that it involves a moral accusation against
+      humanity.
+  - text: >-
+      Young Wehling was hunched in his chair, his head in his hand. He was so
+      rumpled, so still and colorless as to be virtually invisible. His
+      camouflage was perfect, since the waiting room had a disorderly and
+      demoralized air, too. Chairs and ashtrays had been moved away from the
+      walls. The floor was paved with spattered dropcloths.
+---
+# ul2-large-en-nl for English to Dutch translation
+T5 model pretrained on Dutch with a UL2 (Mixture-of-Denoisers) objective and then fine-tuned for English-to-Dutch translation.
+The T5 model was introduced in
+[this paper](https://arxiv.org/abs/1910.10683)
+and first released at [this page](https://github.com/google-research/text-to-text-transfer-transformer).
+The UL2 objective was introduced in
+[this paper](https://arxiv.org/abs/2205.05131)
+and first released at [this page](https://github.com/google-research/google-research/tree/master/ul2).
+## Model description
+T5 is an encoder-decoder model and treats all NLP problems in a text-to-text format.
+`ul2-large-en-nl` T5 is a transformers model fine-tuned on parallel sentence and paragraph pairs
+sampled from books.
+During pretraining, this model used the [T5 v1.1](https://github.com/google-research/text-to-text-transfer-transformer/blob/main/released_checkpoints.md#t511) improvements over the original T5 model:
+- GEGLU activation in the feed-forward hidden layer, rather than ReLU - see [here](https://arxiv.org/abs/2002.05202)
+- Dropout was turned off during pre-training. Dropout should be re-enabled during fine-tuning
+- Pre-trained on a self-supervised objective only, without mixing in the downstream tasks
+- No parameter sharing between the embedding and classifier layers
+### UL2 pretraining objective
+This model was pretrained with UL2's Mixture-of-Denoisers (MoD) objective, which combines diverse pre-training
+paradigms. UL2 frames different objective functions for training language models as denoising tasks, where
+the model has to recover missing sub-sequences of a given input. During pre-training it uses a novel mixture-of-denoisers
+that samples from a varied set of such objectives, each with different configurations. UL2 is trained using a mixture of
+three denoising tasks:
+1. R-denoising (or regular span corruption), which emulates the standard T5 span corruption objective;
+2. X-denoising (or extreme span corruption); and
+3. S-denoising (or sequential PrefixLM).
+During pre-training, we sample from the available denoising tasks based on user-specified ratios.
+UL2 introduces a notion of mode switching, wherein downstream fine-tuning is associated with a specific pre-training
+denoising task. During pre-training, a paradigm token is inserted into the input
+(`[NLU]` for R-denoising, `[NLG]` for X-denoising, or `[S2S]` for S-denoising) indicating the denoising task at hand.
+Then, during fine-tuning, the same token should be inserted to get the best performance on the different downstream
+fine-tuning tasks.
+## Intended uses & limitations
+This model was fine-tuned on parallel sentence and paragraph pairs and can be used
+for machine translation.
+### How to use
+Here is how to use this model in PyTorch:
+```python
+import torch
+from transformers import AutoModelForSeq2SeqLM, AutoTokenizer, pipeline
+
+model_name = "yhavinga/ul2-large-en-nl"
+# Use the first GPU when available, otherwise fall back to CPU (-1).
+device_num = 0 if torch.cuda.is_available() else -1
+device = "cpu" if device_num < 0 else f"cuda:{device_num}"
+# The repository ships a SentencePiece model, so load the slow tokenizer.
+tokenizer = AutoTokenizer.from_pretrained(model_name, use_fast=False)
+model = AutoModelForSeq2SeqLM.from_pretrained(model_name).to(device)
+# Generation settings matching the defaults stored in config.json.
+params = {"max_length": 370, "num_beams": 4, "early_stopping": True}
+translator = pipeline("translation", tokenizer=tokenizer, model=model, device=device_num)
+text = (
+    "Young Wehling was hunched in his chair, his head in his hand. "
+    "He was so rumpled, so still and colorless as to be virtually invisible."
+)
+print(translator(text, **params)[0]["translation_text"])
+```
+### Limitations and bias
+The training data used for this model contains a lot of unfiltered content from the internet, which is far from neutral.
+Therefore, the model can have biased predictions. This bias will also affect all fine-tuned versions of this model.
+## Training data
+The `ul2-large-en-nl` T5 model was pre-trained simultaneously on a combination of several datasets,
+including the `full` config of the "mc4_nl_cleaned" dataset, which is a cleaned version of Common Crawl's web
+crawl corpus, Dutch books, the Dutch subset of Wikipedia (2022-03-20), and a subset of "mc4_nl_cleaned"
+containing only texts from Dutch and Belgian newspapers. This last dataset is oversampled to bias the model
+towards descriptions of events in the Netherlands and Belgium.
+After pre-training, the model was
+fine-tuned on a translation dataset containing 13 million sentence and paragraph pairs
+sampled from books.
+## Training procedure
+### Preprocessing
+The ul2-large-en-nl T5 model uses a SentencePiece unigram tokenizer with a vocabulary of 32,000 tokens.
+The tokenizer includes the special tokens `<pad>`, `</s>` and `<unk>`, known from the original T5 paper,
+`[NLU]`, `[NLG]` and `[S2S]` for the MoD pre-training, and `<n>` for newline.
+During pre-training with the UL2 objective, input and output sequences consist of 512 consecutive tokens.
+The tokenizer does not lowercase texts and is therefore case-sensitive; it distinguishes
+between `dutch` and `Dutch`.
+Additionally, 100+28 extra tokens were added for pre-training tasks, resulting in a total of 32,128 tokens.
+### Fine-tuning
+This model was fine-tuned on a dataset containing 13M sentence and paragraph translation pairs sampled from books.
+* Pre-trained model used as starting point: yhavinga/ul2-large-dutch
+* Number of fine-tuning steps: 77600
+* Batch size: 512 (gradient accumulation steps: 16)
+* Sequence length: 370 tokens
+* Model dtype: bfloat16
+* z_loss: 0.0001
+* Optimizer: adamw_hf beta1: 0.9 beta2: 0.9969 eps: 1e-08
+* Dropout rate: 0.01
+* Learning rate: 0.0009 with linear decay to 0 and warmup for 500 steps
+* Label smoothing factor: 0.11
+* BLEU score: 45.1
+### Model list
+Models in this series:
+|                      | ul2-base-en-nl   | ul2-base-nl36-en-nl   | ul2-large-en-nl   |
+|:---------------------|:-----------------|:----------------------|:------------------|
+| model_type           | t5               | t5                    | t5                |
+| _pipeline_tag        | translation      | translation           | translation       |
+| d_model              | 768              | 768                   | 1024              |
+| d_ff                 | 2048             | 3072                  | 2816              |
+| num_heads            | 12               | 12                    | 16                |
+| d_kv                 | 64               | 64                    | 64                |
+| num_layers           | 12               | 36                    | 24                |
+| num_decoder_layers   | 12               | 36                    | 24                |
+| feed_forward_proj    | gated-silu       | gated-silu            | gated-silu        |
+| dense_act_fn         | silu             | silu                  | silu              |
+| vocab_size           | 32128            | 32128                 | 32128             |
+| tie_word_embeddings  | 0                | 0                     | 0                 |
+| torch_dtype          | float32          | float32               | float32           |
+| _gin_batch_size      | 128              | 64                    | 64                |
+| _gin_z_loss          | 0.0001           | 0.0001                | 0.0001            |
+| _gin_t5_config_dtype | 'bfloat16'       | 'bfloat16'            | 'bfloat16'        |
+## Evaluation results
+See the evaluation section in the interactive [Pre-training Dutch T5 Models](https://huggingface.co/spaces/yhavinga/pre-training-dutch-t5-models) blog.
+## Acknowledgements
+This project would not have been possible without compute generously provided by Google through the
+[TPU Research Cloud](https://sites.research.google/trc/).
+Thanks to the [Finnish-NLP](https://huggingface.co/Finnish-NLP) authors for releasing their code for the UL2 objective and associated task definitions.
+Thanks to [Stephenn Fernandes](https://huggingface.co/StephennFernandes) for helping me get started with the t5x framework.
+Created by [Yeb Havinga](https://www.linkedin.com/in/yeb-havinga-86530825/)
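
The mode switching described in the README's UL2 section can be illustrated with a small sketch. The mapping from denoiser to paradigm token is taken from the README; the sampling ratios, the space separator, and the helper names `sample_denoiser` and `add_paradigm_token` are hypothetical, since the ratios and exact input format used for this model are not stated here.

```python
import random

# Paradigm tokens per denoiser, as listed in the README.
PARADIGM_TOKENS = {"R": "[NLU]", "X": "[NLG]", "S": "[S2S]"}

# Hypothetical mixing ratios; the ratios actually used in pre-training are not given here.
ASSUMED_RATIOS = {"R": 0.25, "X": 0.5, "S": 0.25}


def sample_denoiser(rng: random.Random) -> str:
    """Pick a denoising task according to the assumed ratios."""
    names = list(ASSUMED_RATIOS)
    weights = [ASSUMED_RATIOS[name] for name in names]
    return rng.choices(names, weights=weights, k=1)[0]


def add_paradigm_token(text: str, denoiser: str) -> str:
    """Prepend the paradigm token (space-separated by assumption) to the input."""
    return f"{PARADIGM_TOKENS[denoiser]} {text}"


rng = random.Random(0)
example = add_paradigm_token("Een voorbeeldzin.", sample_denoiser(rng))
print(example)
```

For fine-tuning or inference, the same helper would be called with a fixed denoiser instead of a sampled one.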

added_tokens.json ADDED

	@@ -0,0 +1 @@

+ {"[new_id_17]": 32117, "[new_id_20]": 32120, "[new_id_13]": 32113, "[new_id_2]": 32102, "[new_id_16]": 32116, "[new_id_7]": 32107, "[new_id_5]": 32105, "[new_id_1]": 32101, "[new_id_15]": 32115, "[new_id_12]": 32112, "[new_id_0]": 32100, "[new_id_11]": 32111, "[new_id_25]": 32125, "[new_id_24]": 32124, "[new_id_10]": 32110, "[new_id_27]": 32127, "[new_id_23]": 32123, "[new_id_14]": 32114, "[new_id_22]": 32122, "[new_id_21]": 32121, "[new_id_19]": 32119, "[new_id_3]": 32103, "[new_id_4]": 32104, "[new_id_18]": 32118, "[new_id_9]": 32109, "[new_id_8]": 32108, "[new_id_26]": 32126, "[new_id_6]": 32106}
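
The ids in `added_tokens.json` line up with the vocabulary arithmetic in the README (32,000 + 100 + 28 = 32,128). The sketch below rebuilds the mapping, assuming the 100 `<extra_id_*>` tokens occupy the ids between the base vocabulary and the first `[new_id_*]` token; the constant names are illustrative.

```python
BASE_VOCAB = 32_000   # SentencePiece unigram vocabulary size (README)
NUM_EXTRA_IDS = 100   # <extra_id_0> ... <extra_id_99>
NUM_NEW_IDS = 28      # [new_id_0] ... [new_id_27]

# Rebuild the mapping implied by added_tokens.json: ids continue after the extra ids.
new_ids = {f"[new_id_{i}]": BASE_VOCAB + NUM_EXTRA_IDS + i for i in range(NUM_NEW_IDS)}

total_vocab = BASE_VOCAB + NUM_EXTRA_IDS + NUM_NEW_IDS
print(min(new_ids.values()), max(new_ids.values()), total_vocab)
```

The total matches `vocab_size` in `config.json` and `network.T5Config.vocab_size` in `config.gin`.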

config.gin ADDED

	@@ -0,0 +1,150 @@

+from __gin__ import dynamic_registration
+import __main__ as train_script
+import seqio
+import t5.data.mixtures
+from t5x import adafactor
+from t5x.examples.t5 import network
+from t5x import gin_utils
+from t5x import models
+from t5x import partitioning
+from t5x import trainer
+from t5x import utils
+import tasks.nedd_tasks
+import tasks.ul2_tasks as tasks2
+# Macros:
+# ==============================================================================
+BATCH_SIZE = 64
+DROPOUT_RATE = 0.0
+LABEL_SMOOTHING = 0.0
+LOSS_NORMALIZING_FACTOR = None
+MIXTURE_OR_TASK_MODULE = None
+MIXTURE_OR_TASK_NAME = 'ul2_mc4_nedd_wiki_news_mix_1'
+MODEL = @models.EncoderDecoderModel()
+MODEL_DIR = 'ul2_large_mc4_nedd_wiki_news_nl'
+OPTIMIZER = @adafactor.Adafactor()
+RANDOM_SEED = None
+SHUFFLE_TRAIN_EXAMPLES = True
+TASK_FEATURE_LENGTHS = {'inputs': 512, 'targets': 512}
+TRAIN_STEPS = 1000000
+USE_CACHED_TASKS = False
+USE_HARDWARE_RNG = False
+VOCABULARY = @seqio.SentencePieceVocabulary()
+Z_LOSS = 0.0001
+# Parameters for adafactor.Adafactor:
+# ==============================================================================
+adafactor.Adafactor.decay_rate = 0.8
+adafactor.Adafactor.logical_factor_rules = \
+    @adafactor.standard_logical_factor_rules()
+adafactor.Adafactor.step_offset = 0
+# Parameters for utils.CheckpointConfig:
+# ==============================================================================
+utils.CheckpointConfig.restore = @utils.RestoreCheckpointConfig()
+utils.CheckpointConfig.save = @utils.SaveCheckpointConfig()
+# Parameters for utils.create_learning_rate_scheduler:
+# ==============================================================================
+utils.create_learning_rate_scheduler.base_learning_rate = 1.0
+utils.create_learning_rate_scheduler.factors = 'constant * rsqrt_decay'
+utils.create_learning_rate_scheduler.warmup_steps = 10000
+# Parameters for train/utils.DatasetConfig:
+# ==============================================================================
+train/utils.DatasetConfig.batch_size = %BATCH_SIZE
+train/utils.DatasetConfig.mixture_or_task_name = %MIXTURE_OR_TASK_NAME
+train/utils.DatasetConfig.module = %MIXTURE_OR_TASK_MODULE
+train/utils.DatasetConfig.pack = True
+train/utils.DatasetConfig.seed = None
+train/utils.DatasetConfig.shuffle = %SHUFFLE_TRAIN_EXAMPLES
+train/utils.DatasetConfig.split = 'train'
+train/utils.DatasetConfig.task_feature_lengths = %TASK_FEATURE_LENGTHS
+train/utils.DatasetConfig.use_cached = %USE_CACHED_TASKS
+# Parameters for train_eval/utils.DatasetConfig:
+# ==============================================================================
+train_eval/utils.DatasetConfig.batch_size = %BATCH_SIZE
+train_eval/utils.DatasetConfig.mixture_or_task_name = %MIXTURE_OR_TASK_NAME
+train_eval/utils.DatasetConfig.module = %MIXTURE_OR_TASK_MODULE
+train_eval/utils.DatasetConfig.pack = True
+train_eval/utils.DatasetConfig.seed = 42
+train_eval/utils.DatasetConfig.shuffle = False
+train_eval/utils.DatasetConfig.split = 'validation'
+train_eval/utils.DatasetConfig.task_feature_lengths = %TASK_FEATURE_LENGTHS
+train_eval/utils.DatasetConfig.use_cached = %USE_CACHED_TASKS
+# Parameters for models.EncoderDecoderModel:
+# ==============================================================================
+models.EncoderDecoderModel.input_vocabulary = %VOCABULARY
+models.EncoderDecoderModel.label_smoothing = %LABEL_SMOOTHING
+models.EncoderDecoderModel.loss_normalizing_factor = %LOSS_NORMALIZING_FACTOR
+models.EncoderDecoderModel.module = @network.Transformer()
+models.EncoderDecoderModel.optimizer_def = %OPTIMIZER
+models.EncoderDecoderModel.output_vocabulary = %VOCABULARY
+models.EncoderDecoderModel.z_loss = %Z_LOSS
+# Parameters for partitioning.PjitPartitioner:
+# ==============================================================================
+partitioning.PjitPartitioner.logical_axis_rules = \
+    @partitioning.standard_logical_axis_rules()
+partitioning.PjitPartitioner.model_parallel_submesh = None
+partitioning.PjitPartitioner.num_partitions = 1
+# Parameters for utils.RestoreCheckpointConfig:
+# ==============================================================================
+utils.RestoreCheckpointConfig.path = []
+# Parameters for utils.SaveCheckpointConfig:
+# ==============================================================================
+utils.SaveCheckpointConfig.dtype = 'float32'
+utils.SaveCheckpointConfig.keep = 4
+utils.SaveCheckpointConfig.period = 50000
+utils.SaveCheckpointConfig.save_dataset = False
+utils.SaveCheckpointConfig.use_gda = False
+# Parameters for seqio.SentencePieceVocabulary:
+# ==============================================================================
+seqio.SentencePieceVocabulary.sentencepiece_model_file = \
+    'gs://t5-dutch-english/vocabs/nedd.32000.128extra/spiece.model'
+# Parameters for network.T5Config:
+# ==============================================================================
+network.T5Config.dropout_rate = %DROPOUT_RATE
+network.T5Config.dtype = 'bfloat16'
+network.T5Config.emb_dim = 1024
+network.T5Config.head_dim = 64
+network.T5Config.logits_via_embedding = False
+network.T5Config.mlp_activations = ('gelu', 'linear')
+network.T5Config.mlp_dim = 2816
+network.T5Config.num_decoder_layers = 24
+network.T5Config.num_encoder_layers = 24
+network.T5Config.num_heads = 16
+network.T5Config.vocab_size = 32128
+# Parameters for train_script.train:
+# ==============================================================================
+train_script.train.checkpoint_cfg = @utils.CheckpointConfig()
+train_script.train.eval_period = 2000
+train_script.train.eval_steps = 20
+train_script.train.infer_eval_dataset_cfg = None
+train_script.train.model = %MODEL
+train_script.train.model_dir = %MODEL_DIR
+train_script.train.partitioner = @partitioning.PjitPartitioner()
+train_script.train.random_seed = %RANDOM_SEED
+train_script.train.stats_period = 100
+train_script.train.summarize_config_fn = @gin_utils.summarize_gin_config
+train_script.train.total_steps = %TRAIN_STEPS
+train_script.train.train_dataset_cfg = @train/utils.DatasetConfig()
+train_script.train.train_eval_dataset_cfg = @train_eval/utils.DatasetConfig()
+train_script.train.trainer_cls = @trainer.Trainer
+train_script.train.use_hardware_rng = %USE_HARDWARE_RNG
+# Parameters for trainer.Trainer:
+# ==============================================================================
+trainer.Trainer.learning_rate_fn = @utils.create_learning_rate_scheduler()
+trainer.Trainer.num_microbatches = None
+# Parameters for network.Transformer:
+# ==============================================================================
+network.Transformer.config = @network.T5Config()
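
The pre-training learning-rate schedule configured above (`'constant * rsqrt_decay'`, base rate 1.0, 10,000 warmup steps) can be sketched in plain Python. This assumes t5x applies `rsqrt_decay` as a division by `sqrt(max(step, warmup_steps))`; the function name `learning_rate` is illustrative and not part of t5x.

```python
import math

BASE_LEARNING_RATE = 1.0  # utils.create_learning_rate_scheduler.base_learning_rate
WARMUP_STEPS = 10_000     # utils.create_learning_rate_scheduler.warmup_steps


def learning_rate(step: int) -> float:
    """'constant * rsqrt_decay': flat until warmup ends, then proportional to 1/sqrt(step)."""
    return BASE_LEARNING_RATE / math.sqrt(max(step, WARMUP_STEPS))


for step in (0, 10_000, 100_000, 1_000_000):
    print(step, learning_rate(step))
```

Under this assumption the rate stays at 0.01 for the first 10,000 steps and decays to 0.001 by step 1,000,000, the configured `TRAIN_STEPS`.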

config.json ADDED

	@@ -0,0 +1,35 @@

+{
+  "_name_or_path": "yhavinga/ul2-large-en-nl",
+  "architectures": [
+    "T5ForConditionalGeneration"
+  ],
+  "d_ff": 2816,
+  "d_kv": 64,
+  "d_model": 1024,
+  "decoder_start_token_id": 0,
+  "dense_act_fn": "silu",
+  "dropout_rate": 0.01,
+  "eos_token_id": 1,
+  "early_stopping": true,
+  "feed_forward_proj": "gated-silu",
+  "initializer_factor": 1.0,
+  "is_encoder_decoder": true,
+  "is_gated_act": true,
+  "layer_norm_epsilon": 1e-06,
+  "max_length": 370,
+  "model_type": "t5",
+  "n_positions": 512,
+  "num_beams": 4,
+  "num_decoder_layers": 24,
+  "num_heads": 16,
+  "num_layers": 24,
+  "output_past": true,
+  "pad_token_id": 0,
+  "relative_attention_max_distance": 128,
+  "relative_attention_num_buckets": 32,
+  "tie_word_embeddings": false,
+  "torch_dtype": "float32",
+  "transformers_version": "4.23.1",
+  "use_cache": true,
+  "vocab_size": 32128
+}
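
As a sanity check, the parameter count can be estimated from the values in `config.json`. The sketch assumes the usual Hugging Face T5 layout: bias-free linear layers, scale-only layer norms, one relative-attention bias table per stack, and untied input and output embeddings (`tie_word_embeddings` is false).

```python
# Values copied from config.json.
d_model, d_kv, num_heads, d_ff = 1024, 64, 16, 2816
num_layers, num_decoder_layers, vocab_size = 24, 24, 32128
rel_buckets = 32

inner = num_heads * d_kv            # attention projection width
attention = 4 * d_model * inner     # q, k, v and o projections, no biases
feed_forward = 3 * d_model * d_ff   # gated FFN: wi_0, wi_1 and wo
layer_norm = d_model                # scale parameter only

encoder_layer = attention + feed_forward + 2 * layer_norm
decoder_layer = 2 * attention + feed_forward + 3 * layer_norm  # adds cross-attention

# Each stack: its layers, one relative-attention bias table, one final layer norm.
encoder = num_layers * encoder_layer + rel_buckets * num_heads + layer_norm
decoder = num_decoder_layers * decoder_layer + rel_buckets * num_heads + layer_norm

# Untied embeddings: input embedding plus a separate output projection.
embeddings = 2 * vocab_size * d_model

total_params = encoder + decoder + embeddings
print(f"{total_params:,} parameters, about {4 * total_params / 1e9:.2f} GB in float32")
```

Under these assumptions the estimate is roughly 783M parameters, or about 3.13 GB in float32, which is in line with the 3,132,781,861-byte size recorded for `pytorch_model.bin`.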

events.out.tfevents.1672813339.t1v-n-c82e3785-w-0.2029925.0.v2 ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:ac1d98fc1bccf791ede706b7a1dcfd34f998c772b2f8deba7c73ec9de5c2271c
+size 3472065

flax_model.msgpack ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:0943ba2f958f2d04dec07e54975e3aae3c00a933dc752cef4d9787ab4c264588
+size 1632372682

pytorch_model.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:67ccb5db4ce777787f9147bb8d5d941ef6c3d4cc0fdbacf02209ca52622d438a
+size 3132781861

special_tokens_map.json ADDED

	@@ -0,0 +1,107 @@

+{
+  "additional_special_tokens": [
+    "<extra_id_0>",
+    "<extra_id_1>",
+    "<extra_id_2>",
+    "<extra_id_3>",
+    "<extra_id_4>",
+    "<extra_id_5>",
+    "<extra_id_6>",
+    "<extra_id_7>",
+    "<extra_id_8>",
+    "<extra_id_9>",
+    "<extra_id_10>",
+    "<extra_id_11>",
+    "<extra_id_12>",
+    "<extra_id_13>",
+    "<extra_id_14>",
+    "<extra_id_15>",
+    "<extra_id_16>",
+    "<extra_id_17>",
+    "<extra_id_18>",
+    "<extra_id_19>",
+    "<extra_id_20>",
+    "<extra_id_21>",
+    "<extra_id_22>",
+    "<extra_id_23>",
+    "<extra_id_24>",
+    "<extra_id_25>",
+    "<extra_id_26>",
+    "<extra_id_27>",
+    "<extra_id_28>",
+    "<extra_id_29>",
+    "<extra_id_30>",
+    "<extra_id_31>",
+    "<extra_id_32>",
+    "<extra_id_33>",
+    "<extra_id_34>",
+    "<extra_id_35>",
+    "<extra_id_36>",
+    "<extra_id_37>",
+    "<extra_id_38>",
+    "<extra_id_39>",
+    "<extra_id_40>",
+    "<extra_id_41>",
+    "<extra_id_42>",
+    "<extra_id_43>",
+    "<extra_id_44>",
+    "<extra_id_45>",
+    "<extra_id_46>",
+    "<extra_id_47>",
+    "<extra_id_48>",
+    "<extra_id_49>",
+    "<extra_id_50>",
+    "<extra_id_51>",
+    "<extra_id_52>",
+    "<extra_id_53>",
+    "<extra_id_54>",
+    "<extra_id_55>",
+    "<extra_id_56>",
+    "<extra_id_57>",
+    "<extra_id_58>",
+    "<extra_id_59>",
+    "<extra_id_60>",
+    "<extra_id_61>",
+    "<extra_id_62>",
+    "<extra_id_63>",
+    "<extra_id_64>",
+    "<extra_id_65>",
+    "<extra_id_66>",
+    "<extra_id_67>",
+    "<extra_id_68>",
+    "<extra_id_69>",
+    "<extra_id_70>",
+    "<extra_id_71>",
+    "<extra_id_72>",
+    "<extra_id_73>",
+    "<extra_id_74>",
+    "<extra_id_75>",
+    "<extra_id_76>",
+    "<extra_id_77>",
+    "<extra_id_78>",
+    "<extra_id_79>",
+    "<extra_id_80>",
+    "<extra_id_81>",
+    "<extra_id_82>",
+    "<extra_id_83>",
+    "<extra_id_84>",
+    "<extra_id_85>",
+    "<extra_id_86>",
+    "<extra_id_87>",
+    "<extra_id_88>",
+    "<extra_id_89>",
+    "<extra_id_90>",
+    "<extra_id_91>",
+    "<extra_id_92>",
+    "<extra_id_93>",
+    "<extra_id_94>",
+    "<extra_id_95>",
+    "<extra_id_96>",
+    "<extra_id_97>",
+    "<extra_id_98>",
+    "<extra_id_99>"
+  ],
+  "eos_token": "</s>",
+  "pad_token": "<pad>",
+  "unk_token": "<unk>"
+}

spiece.model ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:caa6e2f21aeec181276ab80273e3f869ce303ccb8602d68e0524783c3581092d
+size 800223

spiece.vocab ADDED

The diff for this file is too large to render.

test_results.json ADDED

	@@ -0,0 +1,10 @@

+{
+    "test_bp": 1.0,
+    "test_precision_ng1": 73.84971296375632,
+    "test_precision_ng2": 51.14788183314563,
+    "test_precision_ng3": 38.4143670608848,
+    "test_precision_ng4": 29.72446177017808,
+    "test_ref_len": 11637,
+    "test_score": 45.5717846591585,
+    "test_sys_len": 11671
+}
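
The fields in `test_results.json` are enough to recompute the reported score. The sketch below assumes they follow the standard corpus-level BLEU definition (brevity penalty times the geometric mean of the 1- to 4-gram precisions, with precisions given in percent); the variable names are illustrative.

```python
import math

# Values copied from test_results.json.
results = {
    "test_precision_ng1": 73.84971296375632,
    "test_precision_ng2": 51.14788183314563,
    "test_precision_ng3": 38.4143670608848,
    "test_precision_ng4": 29.72446177017808,
    "test_ref_len": 11637,
    "test_sys_len": 11671,
}

precisions = [results[f"test_precision_ng{n}"] for n in range(1, 5)]

# Brevity penalty: 1.0 when the system output is at least as long as the reference.
sys_len, ref_len = results["test_sys_len"], results["test_ref_len"]
bp = 1.0 if sys_len >= ref_len else math.exp(1.0 - ref_len / sys_len)

# BLEU: brevity penalty times the geometric mean of the n-gram precisions.
bleu = bp * math.exp(sum(math.log(p) for p in precisions) / len(precisions))
print(round(bleu, 2))
```

Because the system output is slightly longer than the reference, the brevity penalty is 1.0 and the score is determined by the precisions alone.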

tokenizer_config.json ADDED

	@@ -0,0 +1,113 @@

+{
+  "additional_special_tokens": [
+    "<extra_id_0>",
+    "<extra_id_1>",
+    "<extra_id_2>",
+    "<extra_id_3>",
+    "<extra_id_4>",
+    "<extra_id_5>",
+    "<extra_id_6>",
+    "<extra_id_7>",
+    "<extra_id_8>",
+    "<extra_id_9>",
+    "<extra_id_10>",
+    "<extra_id_11>",
+    "<extra_id_12>",
+    "<extra_id_13>",
+    "<extra_id_14>",
+    "<extra_id_15>",
+    "<extra_id_16>",
+    "<extra_id_17>",
+    "<extra_id_18>",
+    "<extra_id_19>",
+    "<extra_id_20>",
+    "<extra_id_21>",
+    "<extra_id_22>",
+    "<extra_id_23>",
+    "<extra_id_24>",
+    "<extra_id_25>",
+    "<extra_id_26>",
+    "<extra_id_27>",
+    "<extra_id_28>",
+    "<extra_id_29>",
+    "<extra_id_30>",
+    "<extra_id_31>",
+    "<extra_id_32>",
+    "<extra_id_33>",
+    "<extra_id_34>",
+    "<extra_id_35>",
+    "<extra_id_36>",
+    "<extra_id_37>",
+    "<extra_id_38>",
+    "<extra_id_39>",
+    "<extra_id_40>",
+    "<extra_id_41>",
+    "<extra_id_42>",
+    "<extra_id_43>",
+    "<extra_id_44>",
+    "<extra_id_45>",
+    "<extra_id_46>",
+    "<extra_id_47>",
+    "<extra_id_48>",
+    "<extra_id_49>",
+    "<extra_id_50>",
+    "<extra_id_51>",
+    "<extra_id_52>",
+    "<extra_id_53>",
+    "<extra_id_54>",
+    "<extra_id_55>",
+    "<extra_id_56>",
+    "<extra_id_57>",
+    "<extra_id_58>",
+    "<extra_id_59>",
+    "<extra_id_60>",
+    "<extra_id_61>",
+    "<extra_id_62>",
+    "<extra_id_63>",
+    "<extra_id_64>",
+    "<extra_id_65>",
+    "<extra_id_66>",
+    "<extra_id_67>",
+    "<extra_id_68>",
+    "<extra_id_69>",
+    "<extra_id_70>",
+    "<extra_id_71>",
+    "<extra_id_72>",
+    "<extra_id_73>",
+    "<extra_id_74>",
+    "<extra_id_75>",
+    "<extra_id_76>",
+    "<extra_id_77>",
+    "<extra_id_78>",
+    "<extra_id_79>",
+    "<extra_id_80>",
+    "<extra_id_81>",
+    "<extra_id_82>",
+    "<extra_id_83>",
+    "<extra_id_84>",
+    "<extra_id_85>",
+    "<extra_id_86>",
+    "<extra_id_87>",
+    "<extra_id_88>",
+    "<extra_id_89>",
+    "<extra_id_90>",
+    "<extra_id_91>",
+    "<extra_id_92>",
+    "<extra_id_93>",
+    "<extra_id_94>",
+    "<extra_id_95>",
+    "<extra_id_96>",
+    "<extra_id_97>",
+    "<extra_id_98>",
+    "<extra_id_99>"
+  ],
+  "eos_token": "</s>",
+  "extra_ids": 100,
+  "name_or_path": "yhavinga/ul2-large-dutch",
+  "pad_token": "<pad>",
+  "sp_model_kwargs": {},
+  "special_tokens_map_file": null,
+  "tokenizer_class": "T5Tokenizer",
+  "unk_token": "<unk>",
+  "use_fast_tokenizer": false
+}

training_state.json ADDED

	@@ -0,0 +1 @@


+ {"step": 1241629}