AudreyVM committed · Commit d6a7552 · verified · 1 Parent(s): cd2bca9

Update README.md

Files changed (1)
  1. README.md +5 -8
README.md CHANGED
@@ -59,6 +59,10 @@ The Catalan-German data collected from the web was a combination of the followin
  | GNOME |
  | KDE4 |
  | OpenSubtitles |
+ | MultiParaCrawl |
+ | DGT |
+ | EUBookshop |
+ | NLLB |
  | GlobalVoices|
  | Tatoeba |
  | Books |
@@ -67,17 +71,10 @@ The Catalan-German data collected from the web was a combination of the followin
 
  All corpora except Europarl and Tilde were collected from [Opus](https://opus.nlpl.eu/).
  The Europarl and Tilde corpora are synthetic parallel corpora created from the original Spanish-German corpora by [SoftCatalà](https://github.com/Softcatala).
+ Once all available Catalan-German data had been collected, additional synthetic Catalan data was created from the Spanish side of Spanish-German corpora using [Projecte Aina’s Spanish-Catalan model.](https://huggingface.co/projecte-aina/aina-translator-es-ca)
 
  The synthetic parallel data was created from the following Spanish-German datasets:
 
- | Datasets |
- |-------------------|
- |globalvoices_es-de_20230901 |
- |multiparacrawl_es-de_20230901 |
- |dgt_es-de_20240129 |
- |eubookshop_es-de_20240129 |
- |nllb_es-de_20240129 |
- |opensubtitles_es-de_20240129 |
 
 
  ### Training procedure
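
The line added in this commit describes creating synthetic Catalan data by machine-translating the Spanish side of Spanish-German corpora. A minimal sketch of that idea follows, assuming the corpus is available as (Spanish, German) sentence pairs. The names `make_synthetic_ca_de` and `translate_es_to_ca` are hypothetical and do not come from the README; in practice the translator callable would wrap the Projecte Aina Spanish-Catalan model, whose loading code is not shown here.

```python
from typing import Callable, Iterable, List, Tuple


def make_synthetic_ca_de(
    es_de_pairs: Iterable[Tuple[str, str]],
    translate_es_to_ca: Callable[[List[str]], List[str]],
    batch_size: int = 32,
) -> List[Tuple[str, str]]:
    """Build synthetic (Catalan, German) pairs from (Spanish, German) pairs.

    The German side is kept unchanged; only the Spanish side is replaced
    by its machine translation into Catalan.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")

    pairs = list(es_de_pairs)
    synthetic: List[Tuple[str, str]] = []

    for start in range(0, len(pairs), batch_size):
        batch = pairs[start:start + batch_size]
        # Hypothetical translator: takes a list of Spanish sentences and
        # returns the Catalan translations in the same order.
        ca_batch = translate_es_to_ca([es for es, _ in batch])
        if len(ca_batch) != len(batch):
            raise ValueError("translator returned a different number of sentences")
        synthetic.extend((ca, de) for ca, (_, de) in zip(ca_batch, batch))

    return synthetic


if __name__ == "__main__":
    # Stand-in translator for demonstration only: it tags each sentence
    # instead of translating it, so the pairing logic can be inspected.
    def fake_translate(sentences: List[str]) -> List[str]:
        return ["[ca] " + s for s in sentences]

    demo = [("Buenos días", "Guten Morgen"), ("Gracias", "Danke")]
    for ca, de in make_synthetic_ca_de(demo, fake_translate, batch_size=1):
        print(ca, "|||", de)
```

Passing the translator in as a callable keeps the pairing logic independent of any particular inference library, so the same function works with a real model wrapper or a stub.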