johntsi committed (verified)
Commit 50e8e96 · 1 Parent(s): 762878c

Update README.md

Files changed (1):
  1. README.md +15 -225
README.md CHANGED
@@ -1,231 +1,20 @@
 ---
 language:
-- ace
-- acm
-- acq
-- aeb
-- af
-- ajp
-- ak
-- als
-- am
-- apc
+- en
 - ar
-- ars
-- ary
-- arz
-- as
-- ast
-- awa
-- ayr
-- azb
-- azj
-- ba
-- bm
-- ban
-- be
-- bem
-- bn
-- bho
-- bjn
-- bo
-- bs
-- bug
-- bg
 - ca
-- ceb
-- cs
-- cjk
-- ckb
-- crh
-- cy
-- da
 - de
-- dik
-- dyu
-- dz
-- el
-- en
-- eo
 - et
-- eu
-- ee
-- fo
-- fj
-- fi
-- fon
-- fr
-- fur
-- fuv
-- gaz
-- gd
-- ga
-- gl
-- gn
-- gu
-- ht
-- ha
-- he
-- hi
-- hne
-- hr
-- hu
-- hy
-- ig
-- ilo
+- fa
 - id
-- is
-- it
-- jv
 - ja
-- kab
-- kac
-- kam
-- kn
-- ks
-- ka
-- kk
-- kbp
-- kea
-- khk
-- km
-- ki
-- rw
-- ky
-- kmb
-- kmr
-- knc
-- kg
-- ko
-- lo
-- lij
-- li
-- ln
-- lt
-- lmo
-- ltg
-- lb
-- lua
-- lg
-- luo
-- lus
-- lvs
-- mag
-- mai
-- ml
-- mar
-- min
-- mk
-- mt
-- mni
-- mos
-- mi
-- my
-- nl
-- nn
-- nb
-- npi
-- nso
-- nus
-- ny
-- oc
-- ory
-- pag
-- pa
-- pap
-- pbt
-- pes
-- plt
-- pl
-- pt
-- prs
-- quy
-- ro
-- rn
-- ru
-- sg
-- sa
-- sat
-- scn
-- shn
-- si
-- sk
+- lv
+- mn
 - sl
-- sm
-- sn
-- sd
-- so
-- st
-- es
-- sc
-- sr
-- ss
-- su
 - sv
-- swh
-- szl
 - ta
-- taq
-- tt
-- te
-- tg
-- tl
-- th
-- ti
-- tpi
-- tn
-- ts
-- tk
-- tum
 - tr
-- tw
-- tzm
-- ug
-- uk
-- umb
-- ur
-- uzn
-- vec
-- vi
-- war
-- wo
-- xh
-- ydd
-- yo
-- yue
 - zh
-- zsm
-- zu
-language_details: >-
-  ace_Arab, ace_Latn, acm_Arab, acq_Arab, aeb_Arab, afr_Latn, ajp_Arab,
-  aka_Latn, amh_Ethi, apc_Arab, arb_Arab, ars_Arab, ary_Arab, arz_Arab,
-  asm_Beng, ast_Latn, awa_Deva, ayr_Latn, azb_Arab, azj_Latn, bak_Cyrl,
-  bam_Latn, ban_Latn, bel_Cyrl, bem_Latn, ben_Beng, bho_Deva, bjn_Arab, bjn_Latn,
-  bod_Tibt, bos_Latn, bug_Latn, bul_Cyrl, cat_Latn, ceb_Latn, ces_Latn,
-  cjk_Latn, ckb_Arab, crh_Latn, cym_Latn, dan_Latn, deu_Latn, dik_Latn,
-  dyu_Latn, dzo_Tibt, ell_Grek, eng_Latn, epo_Latn, est_Latn, eus_Latn,
-  ewe_Latn, fao_Latn, pes_Arab, fij_Latn, fin_Latn, fon_Latn, fra_Latn,
-  fur_Latn, fuv_Latn, gla_Latn, gle_Latn, glg_Latn, grn_Latn, guj_Gujr,
-  hat_Latn, hau_Latn, heb_Hebr, hin_Deva, hne_Deva, hrv_Latn, hun_Latn,
-  hye_Armn, ibo_Latn, ilo_Latn, ind_Latn, isl_Latn, ita_Latn, jav_Latn,
-  jpn_Jpan, kab_Latn, kac_Latn, kam_Latn, kan_Knda, kas_Arab, kas_Deva,
-  kat_Geor, knc_Arab, knc_Latn, kaz_Cyrl, kbp_Latn, kea_Latn, khm_Khmr,
-  kik_Latn, kin_Latn, kir_Cyrl, kmb_Latn, kon_Latn, kor_Hang, kmr_Latn,
-  lao_Laoo, lvs_Latn, lij_Latn, lim_Latn, lin_Latn, lit_Latn, lmo_Latn,
-  ltg_Latn, ltz_Latn, lua_Latn, lug_Latn, luo_Latn, lus_Latn, mag_Deva,
-  mai_Deva, mal_Mlym, mar_Deva, min_Latn, mkd_Cyrl, plt_Latn, mlt_Latn,
-  mni_Beng, khk_Cyrl, mos_Latn, mri_Latn, zsm_Latn, mya_Mymr, nld_Latn,
-  nno_Latn, nob_Latn, npi_Deva, nso_Latn, nus_Latn, nya_Latn, oci_Latn,
-  gaz_Latn, ory_Orya, pag_Latn, pan_Guru, pap_Latn, pol_Latn, por_Latn,
-  prs_Arab, pbt_Arab, quy_Latn, ron_Latn, run_Latn, rus_Cyrl, sag_Latn,
-  san_Deva, sat_Beng, scn_Latn, shn_Mymr, sin_Sinh, slk_Latn, slv_Latn,
-  smo_Latn, sna_Latn, snd_Arab, som_Latn, sot_Latn, spa_Latn, als_Latn,
-  srd_Latn, srp_Cyrl, ssw_Latn, sun_Latn, swe_Latn, swh_Latn, szl_Latn,
-  tam_Taml, tat_Cyrl, tel_Telu, tgk_Cyrl, tgl_Latn, tha_Thai, tir_Ethi,
-  taq_Latn, taq_Tfng, tpi_Latn, tsn_Latn, tso_Latn, tuk_Latn, tum_Latn,
-  tur_Latn, twi_Latn, tzm_Tfng, uig_Arab, ukr_Cyrl, umb_Latn, urd_Arab,
-  uzn_Latn, vec_Latn, vie_Latn, war_Latn, wol_Latn, xho_Latn, ydd_Hebr,
-  yor_Latn, yue_Hant, zho_Hans, zho_Hant, zul_Latn
 license: mit
 metrics:
 - bleu
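
The trimmed `language` list now matches the CoVoST2 target set of this checkpoint. For orientation, here is a small illustrative sketch (the dictionary name and the choice of `zho_Hans` for `zh` are our assumptions) mapping these ISO 639-1 codes to the FLORES-200 codes that the NLLB tokenizer expects as target-language tokens; all FLORES-200 codes are taken from the `language_details` block removed above.

```python
# Illustrative mapping from the ISO 639-1 codes in the updated YAML header
# to the FLORES-200 codes used by the NLLB tokenizer (e.g. for the
# forced_bos_token_id target-language token).
ISO_TO_NLLB = {
    "en": "eng_Latn",  # source language
    "ar": "arb_Arab",
    "ca": "cat_Latn",
    "de": "deu_Latn",
    "et": "est_Latn",
    "fa": "pes_Arab",
    "id": "ind_Latn",
    "ja": "jpn_Jpan",
    "lv": "lvs_Latn",
    "mn": "khk_Cyrl",
    "sl": "slv_Latn",
    "sv": "swe_Latn",
    "ta": "tam_Taml",
    "tr": "tur_Latn",
    "zh": "zho_Hans",  # NLLB also supports zho_Hant
    # Note: the results table below also reports Welsh (Cy -> cym_Latn),
    # which is absent from the YAML list in this commit.
}
```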
@@ -265,7 +54,7 @@ The compression module is a light-weight transformer that takes as input the hid
 
 ## Version
 
-This version of ZeroSwot is trained with ASR data from CommonVoice, and adapted [wav2vec2.0-large](https://huggingface.co/facebook/wav2vec2-large-960h-lv60-self) to the [nllb-200-distilled-600M](https://huggingface.co/facebook/nllb-200-distilled-600M) model.
+This version of ZeroSwot is trained with ASR data from CommonVoice, and adapted [wav2vec2.0-large](https://huggingface.co/facebook/wav2vec2-large-960h-lv60-self) to the [nllb-200-distilled-1.3B_covost2](https://huggingface.co/johntsi/nllb-200-distilled-1.3B_covost2_en-to-15) model, which was first finetuned on CoVoST2 MT data.
 
 We have more versions available:
 
@@ -302,18 +91,18 @@ def load_and_resample_audio(audio_path, target_sr=16000):
 
 # Load processors and tokenizers
 processor = Wav2Vec2Processor.from_pretrained("facebook/wav2vec2-large-960h-lv60-self")
-tokenizer = NllbTokenizer.from_pretrained("facebook/nllb-200-distilled-600M")
+tokenizer = NllbTokenizer.from_pretrained("johntsi/nllb-200-distilled-600M_covost2_en-to-15")
 
 # Load ZeroSwot Encoder
-commit_hash = "eafabee295ea1c8b45483d1fd26bd747d9a7d937"
+commit_hash = "762878c55bf91406318983c724db22590a828e96"
 zeroswot_encoder = AutoModel.from_pretrained(
-    "johntsi/ZeroSwot-Medium_asr-cv_en-to-200", trust_remote_code=True, revision=commit_hash,
+    "johntsi/ZeroSwot-Large_asr-cv_mt-covost2_en-to-15", trust_remote_code=True, revision=commit_hash,
 )
 zeroswot_encoder.eval()
 zeroswot_encoder.to("cuda")
 
 # Load NLLB Model
-nllb_model = AutoModelForSeq2SeqLM.from_pretrained("facebook/nllb-200-distilled-600M")
+nllb_model = AutoModelForSeq2SeqLM.from_pretrained("johntsi/nllb-200-distilled-600M_covost2_en-to-15")
 nllb_model.eval()
 nllb_model.to("cuda")
 
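
Only the checkpoint identifiers change in this hunk; the README's surrounding snippet (audio loading and generation) is unchanged and elided from the diff. For orientation, a minimal sketch of how the loaded objects fit together. The following are assumptions, not part of this commit: the body of `load_and_resample_audio` (only its signature appears in the hunk header above), the encoder returning an `(embeddings, attention_mask)` pair, and the `generate` arguments, which follow common NLLB usage. Note that the pinned `commit_hash` equals this commit's parent, `762878c`.

```python
import torch
import torchaudio

def load_and_resample_audio(audio_path, target_sr=16000):
    # Hypothetical body: load an audio file and resample to 16 kHz mono.
    waveform, sr = torchaudio.load(audio_path)
    if sr != target_sr:
        waveform = torchaudio.functional.resample(waveform, sr, target_sr)
    return waveform.mean(dim=0)  # collapse channels to mono

audio = load_and_resample_audio("example.wav")  # hypothetical input file

# processor, tokenizer, zeroswot_encoder, nllb_model come from the snippet above.
input_values = processor(
    audio.numpy(), sampling_rate=16000, return_tensors="pt"
).input_values.to("cuda")

with torch.no_grad():
    # Assumption: the encoder compresses speech into NLLB's embedding space
    # and returns the embeddings together with an attention mask.
    embeds, attention_mask = zeroswot_encoder(input_values)
    generated_ids = nllb_model.generate(
        inputs_embeds=embeds,
        attention_mask=attention_mask,
        # Target language as a FLORES-200 code, e.g. German.
        forced_bos_token_id=tokenizer.convert_tokens_to_ids("deu_Latn"),
        num_beams=5,
    )

translation = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(translation)
```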
 
@@ -335,14 +124,15 @@ print(translation)
 
 ## Results
 
-BLEU scores on CoVoST-2 test compared to supervised SOTA models [XLS-R-1B](https://huggingface.co/facebook/wav2vec2-xls-r-1b) and [SeamlessM4T-Medium](https://huggingface.co/facebook/seamless-m4t-medium). You can refer to Table 5 of the Results section in the paper for more details.
+BLEU scores on the CoVoST-2 test set compared to the supervised SOTA models XLS-R-2B and SeamlessM4T-Large. You can refer to Table 5 of the Results section in the paper for more details.
 
 | Models | ZS | Size (B) | Ar | Ca | Cy | De | Et | Fa | Id | Ja | Lv | Mn | Sl | Sv | Ta | Tr | Zh | Average |
 |:--------------:|:----:|:----------:|:----:|:----:|:----:|:----:|:----:|:----:|:----:|:----:|:----:|:----:|:----:|:----:|:----:|:----:|:----:|:-------:|
-| [XLS-R-1B](https://huggingface.co/facebook/wav2vec2-xls-r-1b) | ✗ | 1.0 | 19.2 | 32.1 | **31.8** | 26.2 | 22.4 | 21.3 | 30.3 | 39.9 | 22.0 | 14.9 | 25.4 | 32.3 | 18.1 | 17.1 | 36.7 | 26.0 |
-| [SeamlessM4T-Medium](https://huggingface.co/facebook/seamless-m4t-medium) | ✗ | 1.2 | 20.8 | 37.3 | 29.9 | **31.4** | 23.3 | 17.2 | 34.8 | 37.5 | 19.5 | 12.9 | 29.0 | 37.3 | 18.9 | **19.8** | 30.0 | 26.6 |
-| [ZeroSwot-M_asr-cv](https://huggingface.co/johntsi/ZeroSwot-Medium_asr-cv_en-to-200) | ✓ | 0.35/0.95 | 17.6 | 32.5 | 18.0 | 29.9 | 20.4 | 16.3 | 32.4 | 32.0 | 13.3 | 10.0 | 25.2 | 34.4 | 17.8 | 15.6 | 30.5 | 23.1 |
-| [ZeroSwot-M_asr-cv_mt-covost2](https://huggingface.co/johntsi/ZeroSwot-Medium_asr-cv_mt-covost2_en-to-200) | ✓ | 0.35/0.95 | **24.4** | **38.7** | 28.8 | 31.2 | **26.2** | **26.0** | **36.0** | **46.0** | **24.8** | **19.0** | **31.6** | **37.8** | **24.4** | 18.6 | **39.0** | **30.2** |
+| [XLS-R-2B](https://huggingface.co/facebook/wav2vec2-xls-r-2b-en-to-15) | ✗ | 2.0 | 20.7 | 34.2 | 33.8 | 28.3 | 24.1 | 22.9 | 32.5 | 41.5 | 23.5 | 16.2 | 27.6 | 34.5 | 19.8 | 18.6 | 38.5 | 27.8 |
+| [SeamlessM4T-L-v1](https://huggingface.co/facebook/seamless-m4t-large) | ✗ | 2.3 | 24.5 | 41.6 | 33.6 | 35.9 | 28.5 | 19.3 | 39.0 | 39.4 | 23.8 | 15.7 | 35.0 | 42.5 | 22.7 | 23.9 | 33.1 | 30.6 |
+| [SeamlessM4T-L-v2](https://huggingface.co/facebook/seamless-m4t-v2-large) | ✗ | 2.3 | 25.4 | **43.6** | **35.5** | **37.0** | **29.3** | 19.2 | **40.2** | 39.7 | 24.8 | 16.4 | **36.2** | **43.7** | 23.4 | **24.7** | 35.9 | **31.7** |
+| [ZeroSwot-Large_asr-cv](https://huggingface.co/johntsi/ZeroSwot-Large_asr-cv_en-to-200) | ✓ | 0.35/1.65 | 19.8 | 36.1 | 22.6 | 31.8 | 23.6 | 16.8 | 34.2 | 33.6 | 17.5 | 11.8 | 28.9 | 36.8 | 19.1 | 17.5 | 32.2 | 25.5 |
+| [ZeroSwot-Large_asr-cv_mt-covost2](https://huggingface.co/johntsi/ZeroSwot-Large_asr-cv_mt-covost2_en-to-15) | ✓ | 0.35/1.65 | **25.7** | 40.0 | 29.0 | 32.8 | 27.2 | **26.6** | 37.1 | **47.1** | **25.7** | **18.9** | 33.2 | 39.3 | **25.3** | 19.8 | **40.5** | 31.2 |
 
 ## Citation
 