Commit 822951e by fdelucaf (verified) · 1 Parent(s): bb4b2f3

Update README.md

Files changed (1)
  1. README.md +32 -20
README.md CHANGED
@@ -49,7 +49,9 @@ base_model:
 
  # Salamandra Model Card
 
- SalamandraTA-7b-instruct is a translation LLM that has been instruction-tuned from SalamandraTA-7b-base. The base model results from continually pre-training [Salamandra-7b](https://huggingface.co/BSC-LT/salamandra-7b) on parallel data. The model is proficient in 37 European languages and supports translation-related tasks, namely: sentence-level translation, paragraph-level translation, document-level translation, automatic post-editing, machine translation evaluation, multi-reference translation, named-entity recognition and context-aware translation.
 
 
 
  > [!WARNING]
  > **DISCLAIMER:** This version of Salamandra is tailored exclusively for translation tasks. It lacks chat capabilities and has not been trained with any chat instructions.
@@ -129,7 +131,9 @@ The accelerated partition is composed of 1,120 nodes with the following specific
 
  You can translate between the following 37 languages:
 
- Aragonese, Aranese, Asturian, Basque, Bulgarian, Croatian, Czech, Danish, Dutch, English, Estonian, Finnish, French, Galician, German, Greek, Hungarian, Irish, Italian, Latvian, Lithuanian, Maltese, Norwegian Bokmål, Norwegian Nynorsk, Occitan, Polish, Portuguese, Romanian, Russian, Serbian, Slovak, Slovenian, Spanish, Swedish, Ukrainian, Valencian, Welsh.
 
 
 
  The instruction-following model uses the commonly adopted ChatML template:
 
@@ -194,7 +198,7 @@ Using this template, each turn is preceded by a `<|im_start|>` delimiter and the
 
  #### General translation
 
- For machine translation tasks you can use the following prompt template:
 
  ```
  Translate the following text from {source} into {target}.
@@ -217,7 +221,7 @@ text = f"Translate the following text from {source} into {target}.\n{source}: {s
 
  ### Post-editing
 
- For post-editing tasks you can use the following prompt template:
 
  ```
  Please fix any mistakes in the following {source}-{target} machine translation or keep it unedited if it's correct.
@@ -244,7 +248,7 @@ text = f"Please fix any mistakes in the following {source}-{target} machine tran
 
  ### Document-level translation
 
- For document-level translation tasks you can use the following prompt template:
 
  ```
  Please translate this text from {source} into {target}.
@@ -274,7 +278,7 @@ The Farm Workforce Modernization Act of 2023, which could grant legal status to
 
  ### Named-entity recognition
 
- For named-entity recognition tasks you can use the following prompt template:
 
  ```
  Analyse the following tokenized text and mark the tokens containing named entities.
@@ -313,10 +317,12 @@ Marked: """
 
  ### Pretraining Data
 
- The training corpus consists of 424 billion tokens of Catalan-, Spanish-centric, and English-centric parallel data, including all of the official European languages plus Catalan, Basque,
- Galician, Asturian, Aragonese and Aranese. It amounts to 6,574,251,526 parallel sentence pairs.
 
 
- This highly multilingual corpus is predominantly composed of data sourced from [OPUS](https://opus.nlpl.eu/), with additional data taken from the [NTEU project](https://nteu.eu/), Project Aina’s existing corpora, and our own curated datasets..
 
  Where little parallel Catalan <-> xx data could be found, synthetic Catalan data was generated from the Spanish side of the collected Spanish <-> xx corpora using
  [Projecte Aina’s Spanish-Catalan model](https://huggingface.co/projecte-aina/aina-translator-es-ca). The final distribution of languages was as below:
 
@@ -331,24 +337,24 @@ Click the expand button below to see the full list of corpora included in the tr
  |-----------------------------------------------|----------------------------------------------------------------|-----------------------------------------------|----------------------------------------------------------------|
  |[AINA](https://huggingface.co/projecte-aina) | en | | |
  |ARANESE-SYNTH-CORPUS-BSC | arn | | |
- |BOUA-BSC | | val | |
  |[BOUMH](https://github.com/transducens/PILAR/tree/main/valencian/BOUMH) | | val | |
  |[BOUA-PILAR](https://github.com/transducens/PILAR/tree/main/valencian/BOUA) | | val | |
  |[CCMatrix](https://opus.nlpl.eu/CCMatrix/corpus/version/CCMatrix) |eu | | ga |
  |[DGT](https://opus.nlpl.eu/DGT/corpus/version/DGT) | |bg,cs,da,de,el ,et,fi,fr,ga,hr,hu,lt,lv,mt,nl,pl,pt,ro,sk,sl,sv | da,et,ga,hr,hu,lt,lv,mt,sh,sl|
- |DOGV-BSC | | val | |
  |[DOGV-PILAR](https://github.com/transducens/PILAR/tree/main/valencian/DOGV-html) | | val | |
  |[ELRC-EMEA](https://opus.nlpl.eu/ELRC-EMEA/corpus/version/ELRC-EMEA) | |bg,cs,da,hu,lt,lv,mt,pl,ro,sk,sl | et,hr,lv,ro,sk,sl |
  |[EMEA](https://opus.nlpl.eu/EMEA/corpus/version/EMEA) | |bg,cs,da,el,fi,hu,lt,mt,nl,pl,ro,sk,sl,sv | et,mt |
  |[EUBookshop](https://opus.nlpl.eu/EUbookshop/corpus/version/EUbookshop) |lt,pl,pt |cs,da,de,el,fi,fr,ga,it,lv,mt,nl,pl,pt,ro,sk,sl,sv |cy,ga|
  |[Europarl](https://opus.nlpl.eu/Europarl/corpus/version/Europarl) | |bg,cs,da,el,en,fi,fr,hu,lt,lv,nl,pl,pt ,ro,sk,sl,sv | |
  |[Europat](https://opus.nlpl.eu/EuroPat/corpus/version/EuroPat) | |en,hr | no |
- |[GAITU](https://gaitu.eus/) | | | eu|
  |[KDE4](https://opus.nlpl.eu/KDE4/corpus/version/KDE4) |bg,cs,da,de,el ,et,eu,fi,fr,ga,gl,hr,it,lt,lv,nl,pl,pt,ro,sk,sl,sv |bg,ga,hr |cy,ga,nn,oc |
  |[GlobalVoices](https://opus.nlpl.eu/GlobalVoices/corpus/version/GlobalVoices) | bg,de,fr,it,nl,pl,pt |bg,de,fr,pt | |
  |[GNOME](https://opus.nlpl.eu/GNOME/corpus/version/GNOME) |eu,fr,ga,gl,pt |ga |cy,ga,nn|
  |[JRC-Acquis](https://opus.nlpl.eu/JRC-Acquis/corpus/version/JRC-Acquis) | |cs,da,et,fr,lt,lv,mt,nl,pl ,ro,sv| et |
- |LES-CORTS-VALENCIANES-BSC | | val | |
  |[MaCoCu](https://opus.nlpl.eu/MaCoCu/corpus/version/MaCoCu) | en | | hr,mt,uk |
  |[MultiCCAligned](https://opus.nlpl.eu/JRC-Acquis/corpus/version/JRC-Acquis) |bg,cs,de,el,et,fi,fr,hr,hu,it,lt,lv,nl,pl,ro,sk,sv |bg,fi,fr,hr,it,lv,nl,pt |bg,cy,da,et,fi,hr,hu,lt,lv,no,sl,sr,uk|
  |[MultiHPLT](https://opus.nlpl.eu/MultiHPLT/corpus/version/MultiHPLT) |en, et,fi,ga,hr,mt | |fi,ga,gl,hr,mt,nn,sr |
@@ -356,7 +362,7 @@ Click the expand button below to see the full list of corpora included in the tr
  |[MultiUN](https://opus.nlpl.eu/MultiUN/corpus/version/MultiUN) | |fr | |
  |[News-Commentary](https://opus.nlpl.eu/News-Commentary/corpus/version/News-Commentary) | |fr | |
  |[NLLB](https://opus.nlpl.eu/NLLB/corpus/version/NLLB) |bg,da,el,en,et,fi,fr,gl,hu,it ,lt,lv,pt,ro,sk,sl |bg,cs,da,de,el ,et,fi,fr,hu,it,lt,lv,nl,pl,pt ,ro,sk,sl,sv| bg,cs,cy,da,de,el,et,fi,fr,ga,hr,hu,it,lt,lv,mt,nl,no,oc,pl,pt,ro,ru,sk,sl,sr,sv,uk|
- |[NÓS](https://zenodo.org/records/7675110) | | | gl |
  |[NÓS-SYN](https://zenodo.org/records/7685180) | | | gl |
  |[NTEU](https://www.elrc-share.eu/repository/search/?q=NTEU) | |bg,cs,da,de,el,en,et,fi,fr,ga,hr,hu,it,lt,lv,mt,nl,pl,pt,ro,sk,sl,sv | da,et,ga,hr,lt,lv,mt,ro,sk,sl,sv |
  |[OpenSubtitles](https://opus.nlpl.eu/OpenSubtitles/corpus/version/OpenSubtitles) |bg,cs,da,de,el ,et,eu,fi,gl,hr,hu,lt,lv,nl,pl,pt,ro,sk,sl,sv |da,de,fi,fr,hr,hu,it,lv,nl | bg,cs,de,el,et,hr,fi,fr,hr,hu,no,sl,sr|
@@ -365,14 +371,15 @@ Click the expand button below to see the full list of corpora included in the tr
  |[Tatoeba](https://opus.nlpl.eu/Tatoeba/corpus/version/Tatoeba) |de,pt |pt | |
  |[TildeModel](https://opus.nlpl.eu/TildeMODEL/corpus/version/TildeMODEL) | |bg | et,hr,lt,lv,mt |
  |[UNPC](https://opus.nlpl.eu/UNPC/corpus/version/UNPC) | |en,fr | ru |
- |[VALENCIAN-AUTH](https://github.com/transducens/PILAR/tree/main/valencian/Generalitat) | | val | |
- |[VALENCIAN-SYNTH](https://github.com/transducens/PILAR/tree/main/valencian/Generalitat) | | val | |
  |[WikiMatrix](https://opus.nlpl.eu/WikiMatrix/corpus/version/WikiMatrix) |bg,cs,da,de,el ,et,eu,fi,fr,gl,hr,hu,it,lt,nl,pl,pt,ro,sk,sl,sv |bg,en,fr,hr,it,pt | oc,sh |
  |[Wikimedia](https://opus.nlpl.eu/wikimedia/corpus/version/wikimedia) | | |cy,nn |
  |[XLENT](https://opus.nlpl.eu/XLEnt/corpus/version/XLEnt) |eu,ga,gl |ga |cy,et,ga,gl,hr,oc,sh|
 
 
- Datasets marked with "BSC" (e.g., BOUA-BSC, DOGV-BSC) are synthetic data generated using our own seq-to-seq models and are for internal use only.
 
 
  To consult the data summary document with the respective licences, please send an e-mail to ipr@bsc.es.
 
@@ -411,9 +418,13 @@ To consult the data summary document with the respective licences, please send a
 
  ### Instruction Tuning Data
 
- This model has been fine-tuned on ~135k instructions, primarily targeting machine translation performance for Catalan, English, and Spanish. Additional instruction data for other European and closely related Iberian languages was also included, as it yielded a positive impact on the languages of interest. That said, the performance in these additional languages is not guaranteed due to the limited amount of available data and the lack of resources for thorough testing.
 
 
 
- A portion of our fine-tuning data comes directly from, or is sampled from [TowerBlocks](https://huggingface.co/datasets/Unbabel/TowerBlocks-v0.2). We also created additional datasets for our main languages of interest. While tasks relating to machine translation are included, it’s important to note that no chat data was used in the fine-tuning process.
 
 
 
  Click the expand button below to see the full list of tasks included in the finetuning data.
 
@@ -459,7 +470,8 @@ Click the expand button below to see the full list of tasks included in the fine
  | Context-Aware Translation | [TowerBlocks](https://huggingface.co/datasets/Unbabel/TowerBlocks-v0.2): [MT-GenEval](https://github.com/amazon-science/machine-translation-gender-eval) | en-de | 558 |
  |**Total** | | | **135,404** |
 
- The non-public portion of this dataset was jointly created by BSC, HiTZ, and CiTIUS. For further information regarding the instruction-tuning data, please contact <langtech@bsc.es>.
 
 
  </details>
 
 
 
  # Salamandra Model Card
 
+ SalamandraTA-7b-instruct is a translation LLM that has been instruction-tuned from SalamandraTA-7b-base.
+ The base model results from continually pre-training [Salamandra-7b](https://huggingface.co/BSC-LT/salamandra-7b) on parallel data and has not been published, but is reserved for internal use.
+ The model is proficient in 37 European languages and supports translation-related tasks, namely: sentence-level translation, paragraph-level translation, document-level translation, automatic post-editing, machine translation evaluation, multi-reference translation, named-entity recognition and context-aware translation.
 
  > [!WARNING]
  > **DISCLAIMER:** This version of Salamandra is tailored exclusively for translation tasks. It lacks chat capabilities and has not been trained with any chat instructions.
 
 
  You can translate between the following 37 languages:
 
+ Aragonese, Aranese, Asturian, Basque, Bulgarian, Croatian, Czech, Danish, Dutch, English, Estonian, Finnish, French, Galician, German, Greek, Hungarian,
+ Irish, Italian, Latvian, Lithuanian, Maltese, Norwegian Bokmål, Norwegian Nynorsk, Occitan, Polish, Portuguese, Romanian, Russian, Serbian, Slovak,
+ Slovenian, Spanish, Swedish, Ukrainian, Valencian, Welsh.
 
  The instruction-following model uses the commonly adopted ChatML template:
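
The ChatML template itself falls outside this hunk. As a rough sketch only, assuming the usual ChatML framing (a `<|im_start|>` delimiter plus role before each turn and an `<|im_end|>` delimiter after it), a single-turn prompt for the general translation instruction in this README could be assembled as below. The function name `build_translation_prompt`, the `<|im_end|>` closing token, and the `{source}: ... / {target}:` lines after the first instruction line are assumptions for illustration, not text confirmed by this diff.

```python
def build_translation_prompt(source: str, target: str, sentence: str) -> str:
    """Fill the general translation instruction and wrap it in ChatML (hypothetical helper)."""
    # The first line is the instruction quoted in the README.
    # The two lines after it are an ASSUMED layout for the source text and the answer slot.
    instruction = (
        f"Translate the following text from {source} into {target}.\n"
        f"{source}: {sentence}\n"
        f"{target}:"
    )
    # ASSUMED standard ChatML framing: one closed user turn, then an open assistant turn
    # that the model is expected to complete.
    return (
        f"<|im_start|>user\n{instruction}<|im_end|>\n"
        f"<|im_start|>assistant\n"
    )


prompt = build_translation_prompt("Spanish", "Catalan", "Hola, ¿cómo estás?")
print(prompt)
```

In practice the tokenizer's own chat template is the safer source of truth for the exact delimiters; if the model is loaded through `transformers`, `tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)` should produce the framing without hand-written special tokens.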
 
 
 
  #### General translation
 
+ For machine translation tasks, you can use the following prompt template:
 
  ```
  Translate the following text from {source} into {target}.
 
 
  ### Post-editing
 
+ For post-editing tasks, you can use the following prompt template:
 
  ```
  Please fix any mistakes in the following {source}-{target} machine translation or keep it unedited if it's correct.
 
 
  ### Document-level translation
 
+ For document-level translation tasks, you can use the following prompt template:
 
  ```
  Please translate this text from {source} into {target}.
 
 
  ### Named-entity recognition
 
+ For named-entity recognition tasks, you can use the following prompt template:
 
  ```
  Analyse the following tokenized text and mark the tokens containing named entities.
 
 
  ### Pretraining Data
 
+ The pretraining corpus consists of 424 billion tokens of Catalan-centric, Spanish-centric, and English-centric parallel data,
+ including all of the official European languages plus Catalan, Basque, Galician, Asturian, Aragonese and Aranese.
+ It amounts to 6,574,251,526 parallel sentence pairs.
 
+ This highly multilingual corpus is predominantly composed of data sourced from [OPUS](https://opus.nlpl.eu/),
+ with additional data taken from the [NTEU project](https://nteu.eu/), Project Aina’s corpora, and other sources (see: Data Sources and References below).
  Where little parallel Catalan <-> xx data could be found, synthetic Catalan data was generated from the Spanish side of the collected Spanish <-> xx corpora using
  [Projecte Aina’s Spanish-Catalan model](https://huggingface.co/projecte-aina/aina-translator-es-ca). The final distribution of languages was as below:
 
 
  |-----------------------------------------------|----------------------------------------------------------------|-----------------------------------------------|----------------------------------------------------------------|
  |[AINA](https://huggingface.co/projecte-aina) | en | | |
  |ARANESE-SYNTH-CORPUS-BSC | arn | | |
+ |BOUA-SYNTH-BSC | | val | |
  |[BOUMH](https://github.com/transducens/PILAR/tree/main/valencian/BOUMH) | | val | |
  |[BOUA-PILAR](https://github.com/transducens/PILAR/tree/main/valencian/BOUA) | | val | |
  |[CCMatrix](https://opus.nlpl.eu/CCMatrix/corpus/version/CCMatrix) |eu | | ga |
  |[DGT](https://opus.nlpl.eu/DGT/corpus/version/DGT) | |bg,cs,da,de,el ,et,fi,fr,ga,hr,hu,lt,lv,mt,nl,pl,pt,ro,sk,sl,sv | da,et,ga,hr,hu,lt,lv,mt,sh,sl|
+ |DOGV-SYNTH-BSC | | val | |
  |[DOGV-PILAR](https://github.com/transducens/PILAR/tree/main/valencian/DOGV-html) | | val | |
  |[ELRC-EMEA](https://opus.nlpl.eu/ELRC-EMEA/corpus/version/ELRC-EMEA) | |bg,cs,da,hu,lt,lv,mt,pl,ro,sk,sl | et,hr,lv,ro,sk,sl |
  |[EMEA](https://opus.nlpl.eu/EMEA/corpus/version/EMEA) | |bg,cs,da,el,fi,hu,lt,mt,nl,pl,ro,sk,sl,sv | et,mt |
  |[EUBookshop](https://opus.nlpl.eu/EUbookshop/corpus/version/EUbookshop) |lt,pl,pt |cs,da,de,el,fi,fr,ga,it,lv,mt,nl,pl,pt,ro,sk,sl,sv |cy,ga|
  |[Europarl](https://opus.nlpl.eu/Europarl/corpus/version/Europarl) | |bg,cs,da,el,en,fi,fr,hu,lt,lv,nl,pl,pt ,ro,sk,sl,sv | |
  |[Europat](https://opus.nlpl.eu/EuroPat/corpus/version/EuroPat) | |en,hr | no |
+ |[GAITU Corpus](https://gaitu.eus/) | | | eu|
  |[KDE4](https://opus.nlpl.eu/KDE4/corpus/version/KDE4) |bg,cs,da,de,el ,et,eu,fi,fr,ga,gl,hr,it,lt,lv,nl,pl,pt,ro,sk,sl,sv |bg,ga,hr |cy,ga,nn,oc |
  |[GlobalVoices](https://opus.nlpl.eu/GlobalVoices/corpus/version/GlobalVoices) | bg,de,fr,it,nl,pl,pt |bg,de,fr,pt | |
  |[GNOME](https://opus.nlpl.eu/GNOME/corpus/version/GNOME) |eu,fr,ga,gl,pt |ga |cy,ga,nn|
  |[JRC-Acquis](https://opus.nlpl.eu/JRC-Acquis/corpus/version/JRC-Acquis) | |cs,da,et,fr,lt,lv,mt,nl,pl ,ro,sv| et |
+ |LES-CORTS-VALENCIANES-SYNTH-BSC | | val | |
  |[MaCoCu](https://opus.nlpl.eu/MaCoCu/corpus/version/MaCoCu) | en | | hr,mt,uk |
  |[MultiCCAligned](https://opus.nlpl.eu/JRC-Acquis/corpus/version/JRC-Acquis) |bg,cs,de,el,et,fi,fr,hr,hu,it,lt,lv,nl,pl,ro,sk,sv |bg,fi,fr,hr,it,lv,nl,pt |bg,cy,da,et,fi,hr,hu,lt,lv,no,sl,sr,uk|
  |[MultiHPLT](https://opus.nlpl.eu/MultiHPLT/corpus/version/MultiHPLT) |en, et,fi,ga,hr,mt | |fi,ga,gl,hr,mt,nn,sr |
 
  |[MultiUN](https://opus.nlpl.eu/MultiUN/corpus/version/MultiUN) | |fr | |
  |[News-Commentary](https://opus.nlpl.eu/News-Commentary/corpus/version/News-Commentary) | |fr | |
  |[NLLB](https://opus.nlpl.eu/NLLB/corpus/version/NLLB) |bg,da,el,en,et,fi,fr,gl,hu,it ,lt,lv,pt,ro,sk,sl |bg,cs,da,de,el ,et,fi,fr,hu,it,lt,lv,nl,pl,pt ,ro,sk,sl,sv| bg,cs,cy,da,de,el,et,fi,fr,ga,hr,hu,it,lt,lv,mt,nl,no,oc,pl,pt,ro,ru,sk,sl,sr,sv,uk|
+ |[NÓS Corpus](https://zenodo.org/records/7675110) | | | gl |
  |[NÓS-SYN](https://zenodo.org/records/7685180) | | | gl |
  |[NTEU](https://www.elrc-share.eu/repository/search/?q=NTEU) | |bg,cs,da,de,el,en,et,fi,fr,ga,hr,hu,it,lt,lv,mt,nl,pl,pt,ro,sk,sl,sv | da,et,ga,hr,lt,lv,mt,ro,sk,sl,sv |
  |[OpenSubtitles](https://opus.nlpl.eu/OpenSubtitles/corpus/version/OpenSubtitles) |bg,cs,da,de,el ,et,eu,fi,gl,hr,hu,lt,lv,nl,pl,pt,ro,sk,sl,sv |da,de,fi,fr,hr,hu,it,lv,nl | bg,cs,de,el,et,hr,fi,fr,hr,hu,no,sl,sr|
 
  |[Tatoeba](https://opus.nlpl.eu/Tatoeba/corpus/version/Tatoeba) |de,pt |pt | |
  |[TildeModel](https://opus.nlpl.eu/TildeMODEL/corpus/version/TildeMODEL) | |bg | et,hr,lt,lv,mt |
  |[UNPC](https://opus.nlpl.eu/UNPC/corpus/version/UNPC) | |en,fr | ru |
+ |[PILAR-VALENCIAN-AUTH](https://github.com/transducens/PILAR/tree/main/valencian/Generalitat) | | val | |
+ |[PILAR-VALENCIAN-SYNTH](https://github.com/transducens/PILAR/tree/main/valencian/Generalitat) | | val | |
  |[WikiMatrix](https://opus.nlpl.eu/WikiMatrix/corpus/version/WikiMatrix) |bg,cs,da,de,el ,et,eu,fi,fr,gl,hr,hu,it,lt,nl,pl,pt,ro,sk,sl,sv |bg,en,fr,hr,it,pt | oc,sh |
  |[Wikimedia](https://opus.nlpl.eu/wikimedia/corpus/version/wikimedia) | | |cy,nn |
  |[XLENT](https://opus.nlpl.eu/XLEnt/corpus/version/XLEnt) |eu,ga,gl |ga |cy,et,ga,gl,hr,oc,sh|
 
 
+ Datasets with "-BSC" in their names (e.g., BOUA-SYNTH-BSC, DOGV-SYNTH-BSC) are synthetic datasets obtained by machine translating
+ pre-existing monolingual corpora with our own seq-to-seq models. These datasets were generated internally for model training and are not published.
 
  To consult the data summary document with the respective licences, please send an e-mail to ipr@bsc.es.
 
 
 
  ### Instruction Tuning Data
 
+ This model has been fine-tuned on ~135k instructions, primarily targeting machine translation performance for Catalan, English, and Spanish.
+ Additional instruction data for other European and closely related Iberian languages was also included, as it yielded a positive impact on the languages of interest.
+ That said, the performance in these additional languages is not guaranteed due to the limited amount of available data and the lack of resources for thorough testing.
 
+ A portion of our fine-tuning data comes directly from, or is sampled from [TowerBlocks](https://huggingface.co/datasets/Unbabel/TowerBlocks-v0.2).
+ We also created additional datasets for our main languages of interest.
+ While tasks relating to machine translation are included, it’s important to note that no chat data was used in the fine-tuning process.
 
  Click the expand button below to see the full list of tasks included in the finetuning data.
 
 
  | Context-Aware Translation | [TowerBlocks](https://huggingface.co/datasets/Unbabel/TowerBlocks-v0.2): [MT-GenEval](https://github.com/amazon-science/machine-translation-gender-eval) | en-de | 558 |
  |**Total** | | | **135,404** |
 
+ The non-public portion of this dataset was jointly created by the [ILENIA](https://proyectoilenia.es/) partners BSC, HiTZ, and CiTIUS. For further information regarding the instruction-tuning data,
+ please contact <langtech@bsc.es>.
 
  </details>