Update README.md
README.md
CHANGED
@@ -322,7 +322,8 @@ including all of the official European languages plus Catalan, Basque, Galician,
 It amounts to 6,574,251,526 parallel sentence pairs.
 
 This highly multilingual corpus is predominantly composed of data sourced from [OPUS](https://opus.nlpl.eu/),
-with additional data taken from the [NTEU project](https://nteu.eu/), [Aina Project](https://projecteaina.cat/), and other sources
+with additional data taken from the [NTEU project](https://nteu.eu/), [Aina Project](https://projecteaina.cat/), and other sources
+(see: [Data Sources](#pre-data-sources) and [References](#pre-references)).
 Where little parallel Catalan <-> xx data could be found, synthetic Catalan data was generated from the Spanish side of the collected Spanish <-> xx corpora using
 [Projecte Aina’s Spanish-Catalan model](https://huggingface.co/projecte-aina/aina-translator-es-ca). The final distribution of languages was as below:
 