johntsi/ZeroSwot-Large_asr-cv_en-to-200

@@ -265,7 +265,7 @@ The compression module is a light-weight transformer that takes as input the hid
 ## Version
-This version of ZeroSwot is trained with ASR data from CommonVoice and adapts [wav2vec2.0-large](https://huggingface.co/facebook/wav2vec2-large-960h-lv60-self) to the [nllb-200-distilled-600M](https://huggingface.co/facebook/nllb-200-distilled-600M) model.
 We have more versions available:
@@ -302,18 +302,18 @@ def load_and_resample_audio(audio_path, target_sr=16000):
 # Load processors and tokenizers
 processor = Wav2Vec2Processor.from_pretrained("facebook/wav2vec2-large-960h-lv60-self")
-tokenizer = NllbTokenizer.from_pretrained("facebook/nllb-200-distilled-600M")
 # Load ZeroSwot Encoder
-commit_hash = "eafabee295ea1c8b45483d1fd26bd747d9a7d937"
 zeroswot_encoder = AutoModel.from_pretrained(
-    "johntsi/ZeroSwot-Medium_asr-cv_en-to-200", trust_remote_code=True, revision=commit_hash,
 )
 zeroswot_encoder.eval()
 zeroswot_encoder.to("cuda")
 # Load NLLB Model
-nllb_model = AutoModelForSeq2SeqLM.from_pretrained("facebook/nllb-200-distilled-600M")
 nllb_model.eval()
 nllb_model.to("cuda")
@@ -335,14 +335,15 @@ print(translation)
 ## Results
-BLEU scores on the CoVoST-2 test set, compared to the supervised SOTA models [XLS-R-1B](https://huggingface.co/facebook/wav2vec2-xls-r-1b) and [SeamlessM4T-Medium](https://huggingface.co/facebook/seamless-m4t-medium). See Table 5 in the Results section of the paper for more details.
 |     Models     |  ZS  |  Size (B)  |  Ar  |  Ca  |  Cy  |  De  |  Et  |  Fa  |  Id  |  Ja  |  Lv  |  Mn  |  Sl  |  Sv  |  Ta  |  Tr  |  Zh  | Average |
 |:--------------:|:----:|:----------:|:----:|:----:|:----:|:----:|:----:|:----:|:----:|:----:|:----:|:----:|:----:|:----:|:----:|:----:|:----:|:-------:|
-|    [XLS-R-1B](https://huggingface.co/facebook/wav2vec2-xls-r-1b)    |  ✗   |    1.0     | 19.2 | 32.1 | **31.8** | 26.2 | 22.4 | 21.3 | 30.3 | 39.9 | 22.0 | 14.9 | 25.4 | 32.3 | 18.1 | 17.1 | 36.7 |   26.0  |
-| [SeamlessM4T-Medium](https://huggingface.co/facebook/seamless-m4t-medium)  |  ✗   |    1.2     | 20.8 | 37.3 | 29.9 | **31.4** | 23.3 | 17.2 | 34.8 | 37.5 | 19.5 | 12.9 | 29.0 | 37.3 | 18.9 | **19.8** | 30.0 |   26.6  |
-| [ZeroSwot-M_asr-cv](https://huggingface.co/johntsi/ZeroSwot-Medium_asr-cv_en-to-200) |  ✓   | 0.35/0.95  | 17.6 | 32.5 | 18.0 | 29.9 | 20.4 | 16.3 | 32.4 | 32.0 | 13.3 | 10.0 | 25.2 | 34.4 | 17.8 | 15.6 | 30.5 |   23.1  |
-| [ZeroSwot-M_asr-cv_mt-covost2](https://huggingface.co/johntsi/ZeroSwot-Medium_asr-cv_mt-covost2_en-to-200) |  ✓   | 0.35/0.95  | **24.4** | **38.7** | 28.8 | 31.2 | **26.2** | **26.0** | **36.0** | **46.0** | **24.8** | **19.0** | **31.6** | **37.8** | **24.4** | 18.6 | **39.0** |   **30.2**  |
 ## Citation

 ## Version
+This version of ZeroSwot is trained with ASR data from CommonVoice and adapts [wav2vec2.0-large](https://huggingface.co/facebook/wav2vec2-large-960h-lv60-self) to the [nllb-200-distilled-1.3B](https://huggingface.co/facebook/nllb-200-distilled-1.3B) model.
 We have more versions available:
 # Load processors and tokenizers
 processor = Wav2Vec2Processor.from_pretrained("facebook/wav2vec2-large-960h-lv60-self")
+tokenizer = NllbTokenizer.from_pretrained("facebook/nllb-200-distilled-1.3B")
 # Load ZeroSwot Encoder
+commit_hash = "fc0da35496bd26102f342b0694a3a89791eb713c"
 zeroswot_encoder = AutoModel.from_pretrained(
+    "johntsi/ZeroSwot-Large_asr-cv_en-to-200", trust_remote_code=True, revision=commit_hash,
 )
 zeroswot_encoder.eval()
 zeroswot_encoder.to("cuda")
 # Load NLLB Model
+nllb_model = AutoModelForSeq2SeqLM.from_pretrained("facebook/nllb-200-distilled-1.3B")
 nllb_model.eval()
 nllb_model.to("cuda")
 ## Results
+BLEU scores on the CoVoST-2 test set, compared to the supervised SOTA models XLS-R-2B and SeamlessM4T-Large. See Table 5 in the Results section of the paper for more details.
 |     Models     |  ZS  |  Size (B)  |  Ar  |  Ca  |  Cy  |  De  |  Et  |  Fa  |  Id  |  Ja  |  Lv  |  Mn  |  Sl  |  Sv  |  Ta  |  Tr  |  Zh  | Average |
 |:--------------:|:----:|:----------:|:----:|:----:|:----:|:----:|:----:|:----:|:----:|:----:|:----:|:----:|:----:|:----:|:----:|:----:|:----:|:-------:|
+|    [XLS-R-2B](https://huggingface.co/facebook/wav2vec2-xls-r-2b-en-to-15)    |  ✗   |    2.0     | 20.7 | 34.2 | 33.8 | 28.3 | 24.1 | 22.9 | 32.5 | 41.5 | 23.5 | 16.2 | 27.6 | 34.5 | 19.8 | 18.6 | 38.5 |   27.8  |
+| [SeamlessM4T-L-v1](https://huggingface.co/facebook/seamless-m4t-large)  |  ✗   |    2.3     | 24.5 | 41.6 | 33.6 | 35.9 | 28.5 | 19.3 | 39.0 | 39.4 | 23.8 | 15.7 | 35.0 | 42.5 | 22.7 | 23.9 | 33.1 |   30.6  |
+|   [SeamlessM4T-L-v2](https://huggingface.co/facebook/seamless-m4t-v2-large)      |  ✗   |    2.3     | 25.4 | **43.6** | **35.5** | **37.0** | **29.3** | 19.2 | **40.2** | 39.7 | 24.8 | 16.4 | **36.2** | **43.7** | 23.4 | **24.7** | 35.9 |   **31.7**  |
+| [ZeroSwot-Large_asr-cv](https://huggingface.co/johntsi/ZeroSwot-Large_asr-cv_en-to-200) |  ✓   | 0.35/1.65  | 19.8 | 36.1 | 22.6 | 31.8 | 23.6 | 16.8 | 34.2 | 33.6 | 17.5 | 11.8 | 28.9 | 36.8 | 19.1 | 17.5 | 32.2 |   25.5  |
+| [ZeroSwot-Large_asr-cv_mt-covost2](https://huggingface.co/johntsi/ZeroSwot-Large_asr-cv_mt-covost2_en-to-15) |  ✓   | 0.35/1.65  | **25.7** | 40.0 | 29.0 | 32.8 | 27.2 | **26.6** | 37.1 | **47.1** | **25.7** | **18.9** | 33.2 | 39.3 | **25.3** | 19.8 | **40.5** |  31.2  |
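The Average column can be sanity-checked against the per-language scores. The sketch below assumes the average is an unweighted mean over the 15 languages. The paper may compute it from unrounded scores, so small differences in the last digit are possible. The `scores` dictionary and the `column_average` helper are illustrative names, not part of the ZeroSwot code.

```python
# BLEU scores copied from the two ZeroSwot-Large rows of the table above,
# in column order Ar, Ca, Cy, De, Et, Fa, Id, Ja, Lv, Mn, Sl, Sv, Ta, Tr, Zh.
scores = {
    "ZeroSwot-Large_asr-cv": [
        19.8, 36.1, 22.6, 31.8, 23.6, 16.8, 34.2, 33.6,
        17.5, 11.8, 28.9, 36.8, 19.1, 17.5, 32.2,
    ],
    "ZeroSwot-Large_asr-cv_mt-covost2": [
        25.7, 40.0, 29.0, 32.8, 27.2, 26.6, 37.1, 47.1,
        25.7, 18.9, 33.2, 39.3, 25.3, 19.8, 40.5,
    ],
}


def column_average(values):
    """Unweighted mean of per-language BLEU scores (assumed averaging scheme)."""
    return sum(values) / len(values)


for name, values in scores.items():
    print(f"{name}: {column_average(values):.1f}")
```

Under this assumption, the two rows give 25.5 and 31.2, matching the Average column.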
 ## Citation