yhavinga/ul2-large-en-nl

yhavinga committed on May 17, 2023

Commit d95fa88 • 1 Parent(s): 5f09441

Autoupdate README.md

Files changed (1)

README.md +1 -4

@@ -121,14 +121,11 @@ Therefore, the model can have biased predictions. This bias will also affect all
 The `ul2-large-en-nl` T5 model was pre-trained simultaneously on a combination of several datasets,
 including the `full` config of the "mc4_nl_cleaned" dataset, which is a cleaned version of Common Crawl's web
 crawl corpus, Dutch books, the Dutch subset of Wikipedia (2022-03-20), and a subset of "mc4_nl_cleaned"
-containing only texts from Dutch and Belgian newspapers. This last dataset is oversampled to bias the model
-towards descriptions of events in the Netherlands and Belgium.
+containing only texts from Dutch newspapers.
 After pre-training, the model was
 fine-tuned on a translation dataset containing 13 million sentence and paragraph pairs
 sampled from books.
 ## Training procedure