fdelucaf committed · Commit 290f442 · verified · 1 Parent(s): 822951e

Update README.md

Files changed (1)
  1. README.md +20 -12
README.md CHANGED
@@ -362,8 +362,8 @@ Click the expand button below to see the full list of corpora included in the tr
 |[MultiUN](https://opus.nlpl.eu/MultiUN/corpus/version/MultiUN) | |fr | |
 |[News-Commentary](https://opus.nlpl.eu/News-Commentary/corpus/version/News-Commentary) | |fr | |
 |[NLLB](https://opus.nlpl.eu/NLLB/corpus/version/NLLB) |bg,da,el,en,et,fi,fr,gl,hu,it,lt,lv,pt,ro,sk,sl |bg,cs,da,de,el,et,fi,fr,hu,it,lt,lv,nl,pl,pt,ro,sk,sl,sv| bg,cs,cy,da,de,el,et,fi,fr,ga,hr,hu,it,lt,lv,mt,nl,no,oc,pl,pt,ro,ru,sk,sl,sr,sv,uk|
-|[NÓS Corpus](https://zenodo.org/records/7675110) | | | gl |
-|[NÓS-SYN](https://zenodo.org/records/7685180) | | | gl |
+|[NÓS Authentic Corpus](https://zenodo.org/records/7675110) | | | gl |
+|[NÓS Synthetic Corpus](https://zenodo.org/records/7685180) | | | gl |
 |[NTEU](https://www.elrc-share.eu/repository/search/?q=NTEU) | |bg,cs,da,de,el,en,et,fi,fr,ga,hr,hu,it,lt,lv,mt,nl,pl,pt,ro,sk,sl,sv | da,et,ga,hr,lt,lv,mt,ro,sk,sl,sv |
 |[OpenSubtitles](https://opus.nlpl.eu/OpenSubtitles/corpus/version/OpenSubtitles) |bg,cs,da,de,el,et,eu,fi,gl,hr,hu,lt,lv,nl,pl,pt,ro,sk,sl,sv |da,de,fi,fr,hr,hu,it,lv,nl | bg,cs,de,el,et,hr,fi,fr,hr,hu,no,sl,sr|
 |[OPUS-100](https://opus.nlpl.eu/opus-100.php) | en | | gl |
@@ -470,7 +470,8 @@ Click the expand button below to see the full list of tasks included in the fine
 | Context-Aware Translation | [TowerBlocks](https://huggingface.co/datasets/Unbabel/TowerBlocks-v0.2): [MT-GenEval](https://github.com/amazon-science/machine-translation-gender-eval) | en-de | 558 |
 |**Total** | | | **135,404** |
 
-The non-public portion of this dataset was jointly created by the [ILENIA](https://proyectoilenia.es/) partners BSC, HiTZ, and CiTIUS. For further information regarding the instruction-tuning data,
+The non-public portion of this dataset was jointly created by the [ILENIA](https://proyectoilenia.es/) partners BSC, [HiTZ](http://hitz.ehu.eus/es),
+and [CiTIUS](https://citius.gal/es/). For further information regarding the instruction-tuning data,
 please contact <langtech@bsc.es>.
 
 </details>
@@ -498,7 +499,11 @@ please contact <langtech@bsc.es>.
 
 ## Evaluation
 
-Below are the evaluation results on the [Flores+200 devtest set](https://huggingface.co/datasets/openlanguagedata/flores_plus), compared against the state-of-the-art MADLAD400-7B model ([Kudugunta, S., et al.](https://arxiv.org/abs/2309.04662)) and SalamandraTA-7b-base model. These results cover translation directions between CA-XX, ES-XX, EN-XX, as well as XX-CA, XX-ES, and XX-EN. The metrics have been computed excluding Asturian, Aranese, and Aragonese as we report them separately. The evaluation was conducted using [MT Lens](https://github.com/langtech-bsc/mt-evaluation) following the standard setting (beam search with beam size 5, limiting the translation length to 500 tokens). We report the following metrics:
+Below are the evaluation results on the [Flores+200 devtest set](https://huggingface.co/datasets/openlanguagedata/flores_plus),
+compared against the state-of-the-art MADLAD400-7B model ([Kudugunta, S., et al.](https://arxiv.org/abs/2309.04662)) and SalamandraTA-7b-base model.
+These results cover translation directions between CA-XX, ES-XX, EN-XX, as well as XX-CA, XX-ES, and XX-EN.
+The metrics have been computed excluding Asturian, Aranese, and Aragonese, as we report them separately.
+The evaluation was conducted using [MT Lens](https://github.com/langtech-bsc/mt-evaluation) following the standard setting (beam search with beam size 5, limiting the translation length to 500 tokens). We report the following metrics:
 
 <details>
 <summary>Click to show metrics details</summary>
@@ -639,7 +644,9 @@ This section presents the evaluation metrics for Basque translation tasks.
 
 ### Low-Resource Languages of Spain
 
-The tables below summarize the performance metrics for English, Spanish, and Catalan to Asturian, Aranese and Aragonese compared against [Transducens/IbRo-nllb](https://huggingface.co/Transducens/IbRo-nllb) [(Galiano Jimenez, et al.)](https://aclanthology.org/2024.wmt-1.85/), NLLB-3.3 ([Costa-jussà et al., 2022](https://arxiv.org/abs/2207.04672)) and [SalamandraTA-2B](https://huggingface.co/BSC-LT/salamandraTA-2B).
+The tables below summarize the performance metrics for English, Spanish, and Catalan to Asturian, Aranese and Aragonese compared
+against [Transducens/IbRo-nllb](https://huggingface.co/Transducens/IbRo-nllb) [(Galiano Jimenez, et al.)](https://aclanthology.org/2024.wmt-1.85/),
+NLLB-3.3 ([Costa-jussà et al., 2022](https://arxiv.org/abs/2207.04672)) and [SalamandraTA-2B](https://huggingface.co/BSC-LT/salamandraTA-2B).
 
 <details>
 <summary>English evaluation</summary>
@@ -674,18 +681,18 @@ The tables below summarize the performance metrics for English, Spanish, and Cat
 | SalamandraTA-7b-instruct | es | ast | **21.28** | **68.11** | **52.73** |
 | SalamandraTA-7b-base | es | ast | 17.65 | 75.78 | 51.05 |
 | Transducens/IbRo-nllb | es | ast | 16.79 | 76.36 | 50.89 |
-| salamandraTA2B | es | ast | 16.68 | 77.29 | 49.46 |
+| SalamandraTA-2B | es | ast | 16.68 | 77.29 | 49.46 |
 | nllb-3.3B | es | ast | 11.85 | 100.86 | 40.27 |
 | | | | | | |
 | SalamandraTA-7b-base | es | arn | **29.19** | **71.85** | **49.42** |
 | Transducens/IbRo-nllb | es | arn | 28.45 | 72.56 | 49.28 |
 | SalamandraTA-7b-instruct | es | arn | 26.82 | 74.04 | 47.55 |
-| salamandraTA2B | es | arn | 25.41 | 74.71 | 47.33 |
+| SalamandraTA-2B | es | arn | 25.41 | 74.71 | 47.33 |
 | | | | | | |
 | Transducens/IbRo-nllb | es | arg | **59.75** | **28.01** | **78.73** |
 | SalamandraTA-7b-base | es | arg | 53.96 | 31.51 | 76.08 |
 | SalamandraTA-7b-instruct | es | arg | 47.54 | 36.57 | 72.38 |
-| salamandraTA2B | es | arg | 44.57 | 37.93 | 71.32 |
+| SalamandraTA-2B | es | arg | 44.57 | 37.93 | 71.32 |
 
 </details>
 
@@ -701,19 +708,19 @@ The tables below summarize the performance metrics for English, Spanish, and Cat
 |:---------------------------------|:---------|:---------|-------:|-------:|-------:|
 | SalamandraTA-7b-instruct | ca | ast | **27.86** | **58.19** | 57.98 |
 | SalamandraTA-7b-base | ca | ast | 26.11 | 63.63 | **58.08** |
-| salamandraTA2B | ca | ast | 25.32 | 62.59 | 55.98 |
+| SalamandraTA-2B | ca | ast | 25.32 | 62.59 | 55.98 |
 | Transducens/IbRo-nllb | ca | ast | 24.77 | 61.60 | 57.49 |
 | nllb-3.3B | ca | ast | 17.17 | 91.47 | 45.83 |
 | | | | | | |
 | SalamandraTA-7b-base | ca | arn | **17.77** | **80.88** | **42.12** |
 | Transducens/IbRo-nllb | ca | arn | 17.51 | 81.18 | 41.91 |
 | SalamandraTA-7b-instruct | ca | arn | 16.45 | 82.01 | 41.04 |
-| salamandraTA2B | ca | arn | 15.37 | 82.76 | 40.53 |
+| SalamandraTA-2B | ca | arn | 15.37 | 82.76 | 40.53 |
 | | | | | | |
 | Transducens/IbRo-nllb | ca | arg | **24.44** | **60.79** | **55.51** |
 | SalamandraTA-7b-base | ca | arg | 22.53 | 62.37 | 54.32 |
 | SalamandraTA-7b-instruct | ca | arg | 21.62 | 63.38 | 53.01 |
-| salamandraTA2B | ca | arg | 18.6 | 65.82 | 51.21 |
+| SalamandraTA-2B | ca | arg | 18.6 | 65.82 | 51.21 |
 
 </details>
 
@@ -725,7 +732,8 @@ With regard to MT models, no specific analysis has yet been carried out in order
 accuracy across different languages, dialects, or domains. However, we recognize the importance of identifying and addressing any harmful stereotypes,
 cultural inaccuracies, or systematic performance discrepancies that may arise in Machine Translation. As such, we plan to perform more analyses as soon
 as we have implemented the necessary metrics and methods within our evaluation framework [MT Lens](https://github.com/langtech-bsc/mt-evaluation).
-Note that the model has only undergone preliminary instruction tuning. We urge developers to consider potential limitations and conduct safety testing and tuning tailored to their specific applications.
+Note that the model has only undergone preliminary instruction tuning.
+We urge developers to consider potential limitations and conduct safety testing and tuning tailored to their specific applications.
 
 ## Additional information
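
The decoding setting quoted in the evaluation paragraph above (beam search with beam size 5, translations capped at 500 tokens) maps onto standard Hugging Face `generate()` arguments. Below is a minimal, hedged sketch of that setting; the checkpoint ID `BSC-LT/salamandraTA-7b-instruct` and the chat-style prompt are illustrative assumptions and are not taken from this commit.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "BSC-LT/salamandraTA-7b-instruct"  # assumed checkpoint ID
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# Assumed chat-style translation prompt; the model card defines the exact template.
messages = [{
    "role": "user",
    "content": "Translate the following text from Spanish into Catalan.\n"
               "Spanish: Hola, ¿cómo estás?\nCatalan:",
}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Evaluation-style decoding: beam search (beam size 5), at most 500 new tokens.
output = model.generate(
    input_ids,
    num_beams=5,
    max_new_tokens=500,
    early_stopping=True,
)
print(tokenizer.decode(output[0][input_ids.shape[1]:], skip_special_tokens=True))
```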