Update README.md

README.md (CHANGED)
@@ -265,7 +265,7 @@ The compression module is a light-weight transformer that takes as input the hid

## Version

This version of ZeroSwot is trained with ASR data from CommonVoice, adapting [wav2vec2.0-large](https://huggingface.co/facebook/wav2vec2-large-960h-lv60-self) to the [nllb-200-distilled-1.3B](https://huggingface.co/facebook/nllb-200-distilled-1.3B) model.

We have more versions available:

@@ -302,18 +302,18 @@ def load_and_resample_audio(audio_path, target_sr=16000):

```python
# Load processors and tokenizers
processor = Wav2Vec2Processor.from_pretrained("facebook/wav2vec2-large-960h-lv60-self")
tokenizer = NllbTokenizer.from_pretrained("facebook/nllb-200-distilled-1.3B")

# Load ZeroSwot Encoder
commit_hash = "fc0da35496bd26102f342b0694a3a89791eb713c"
zeroswot_encoder = AutoModel.from_pretrained(
    "johntsi/ZeroSwot-Large_asr-cv_en-to-200", trust_remote_code=True, revision=commit_hash,
)
zeroswot_encoder.eval()
zeroswot_encoder.to("cuda")

# Load NLLB Model
nllb_model = AutoModelForSeq2SeqLM.from_pretrained("facebook/nllb-200-distilled-1.3B")
nllb_model.eval()
nllb_model.to("cuda")
```
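
Per the hunk headers, this snippet sits inside a longer README example: a `load_and_resample_audio` helper is defined earlier, and the example ends with `print(translation)`. The sketch below shows how the loaded pieces might be wired together; the encoder's return values (assumed here to be compressed speech embeddings plus an attention mask) and the generation settings are assumptions, not the repo's confirmed API:

```python
import torch

# Load and resample the input audio to 16 kHz (helper defined earlier in the README)
audio = load_and_resample_audio("example.wav")
inputs = processor(audio, sampling_rate=16000, return_tensors="pt").to("cuda")

with torch.no_grad():
    # Assumption: the encoder maps speech to NLLB-compatible embeddings + mask
    speech_embeds, attention_mask = zeroswot_encoder(**inputs)
    generated_ids = nllb_model.generate(
        inputs_embeds=speech_embeds,
        attention_mask=attention_mask,
        # Target language, e.g. German; NLLB uses FLORES-200 language codes
        forced_bos_token_id=tokenizer.convert_tokens_to_ids("deu_Latn"),
        num_beams=5,
    )

translation = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(translation)
```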

@@ -335,14 +335,15 @@ print(translation)

## Results

BLEU scores on the CoVoST-2 test set, compared to the supervised SOTA models XLS-R-2B and SeamlessM4T-Large (ZS = zero-shot). See Table 5 in the Results section of the paper for more details.

| Models | ZS | Size (B) | Ar | Ca | Cy | De | Et | Fa | Id | Ja | Lv | Mn | Sl | Sv | Ta | Tr | Zh | Average |
|:--------------:|:----:|:----------:|:----:|:----:|:----:|:----:|:----:|:----:|:----:|:----:|:----:|:----:|:----:|:----:|:----:|:----:|:----:|:-------:|
| [XLS-R-2B](https://huggingface.co/facebook/wav2vec2-xls-r-2b-en-to-15) | ✗ | 2.0 | 20.7 | 34.2 | 33.8 | 28.3 | 24.1 | 22.9 | 32.5 | 41.5 | 23.5 | 16.2 | 27.6 | 34.5 | 19.8 | 18.6 | 38.5 | 27.8 |
| [SeamlessM4T-L-v1](https://huggingface.co/facebook/seamless-m4t-large) | ✗ | 2.3 | 24.5 | 41.6 | 33.6 | 35.9 | 28.5 | 19.3 | 39.0 | 39.4 | 23.8 | 15.7 | 35.0 | 42.5 | 22.7 | 23.9 | 33.1 | 30.6 |
| [SeamlessM4T-L-v2](https://huggingface.co/facebook/seamless-m4t-v2-large) | ✗ | 2.3 | 25.4 | **43.6** | **35.5** | **37.0** | **29.3** | 19.2 | **40.2** | 39.7 | 24.8 | 16.4 | **36.2** | **43.7** | 23.4 | **24.7** | 35.9 | **31.7** |
| [ZeroSwot-Large_asr-cv](https://huggingface.co/johntsi/ZeroSwot-Large_asr-cv_en-to-200) | ✓ | 0.35/1.65 | 19.8 | 36.1 | 22.6 | 31.8 | 23.6 | 16.8 | 34.2 | 33.6 | 17.5 | 11.8 | 28.9 | 36.8 | 19.1 | 17.5 | 32.2 | 25.5 |
| [ZeroSwot-Large_asr-cv_mt-covost2](https://huggingface.co/johntsi/ZeroSwot-Large_asr-cv_mt-covost2_en-to-15) | ✓ | 0.35/1.65 | **25.7** | 40.0 | 29.0 | 32.8 | 27.2 | **26.6** | 37.1 | **47.1** | **25.7** | **18.9** | 33.2 | 39.3 | **25.3** | 19.8 | **40.5** | 31.2 |
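
These are corpus-level BLEU scores. As a minimal sketch of how such numbers can be reproduced once hypotheses and references are collected for a language pair (sacrebleu is an assumption here; this section does not state the paper's exact scorer settings):

```python
import sacrebleu

# Hypothetical model outputs and references for one CoVoST-2 language pair
hypotheses = ["Das ist ein Beispiel.", "Guten Morgen."]
references = [["Dies ist ein Beispiel.", "Guten Morgen."]]  # one reference set

bleu = sacrebleu.corpus_bleu(hypotheses, references)
print(f"BLEU: {bleu.score:.1f}")
```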

## Citation