	---
	tags:
	- mms
	language:
	- ab
	- af
	- ak
	- am
	- ar
	- as
	- av
	- ay
	- az
	- ba
	- bm
	- be
	- bn
	- bi
	- bo
	- sh
	- br
	- bg
	- ca
	- cs
	- ce
	- cv
	- ku
	- cy
	- da
	- de
	- dv
	- dz
	- el
	- en
	- eo
	- et
	- eu
	- ee
	- fo
	- fa
	- fj
	- fi
	- fr
	- fy
	- ff
	- ga
	- gl
	- gn
	- gu
	- zh
	- ht
	- ha
	- he
	- hi
	- sh
	- hu
	- hy
	- ig
	- ia
	- ms
	- is
	- it
	- jv
	- ja
	- kn
	- ka
	- kk
	- kr
	- km
	- ki
	- rw
	- ky
	- ko
	- kv
	- lo
	- la
	- lv
	- ln
	- lt
	- lb
	- lg
	- mh
	- ml
	- mr
	- ms
	- mk
	- mg
	- mt
	- mn
	- mi
	- my
	- zh
	- nl
	- 'no'
	- 'no'
	- ne
	- ny
	- oc
	- om
	- or
	- os
	- pa
	- pl
	- pt
	- ms
	- ps
	- qu
	- qu
	- qu
	- qu
	- qu
	- qu
	- qu
	- qu
	- qu
	- qu
	- qu
	- qu
	- qu
	- qu
	- qu
	- qu
	- qu
	- qu
	- qu
	- qu
	- qu
	- qu
	- ro
	- rn
	- ru
	- sg
	- sk
	- sl
	- sm
	- sn
	- sd
	- so
	- es
	- sq
	- su
	- sv
	- sw
	- ta
	- tt
	- te
	- tg
	- tl
	- th
	- ti
	- ts
	- tr
	- uk
	- ms
	- vi
	- wo
	- xh
	- ms
	- yo
	- ms
	- zu
	- za
	license: cc-by-nc-4.0
	datasets:
	- google/fleurs
	metrics:
	- wer
	---

	# Massively Multilingual Speech (MMS) - Finetuned ASR - L1107

	This checkpoint is a model fine-tuned for multilingual ASR and is part of Facebook's [Massively Multilingual Speech project](https://research.facebook.com/publications/scaling-speech-technology-to-1000-languages/).
	It is based on the [Wav2Vec2 architecture](https://huggingface.co/docs/transformers/model_doc/wav2vec2) and uses adapter models to transcribe 1000+ languages.
	The checkpoint has 1 billion parameters and was fine-tuned from [facebook/mms-1b](https://huggingface.co/facebook/mms-1b) on 1107 languages.

	## Table of Contents

	- [Example](#example)
	- [Supported Languages](#supported-languages)
	- [Model details](#model-details)
	- [Additional links](#additional-links)

	## Example

	This MMS checkpoint can be used with [Transformers](https://github.com/huggingface/transformers) to transcribe audio in 1107 different
	languages. Let's look at a simple example.

	First, we install `transformers` and some other libraries:
	```
	pip install torch accelerate torchaudio datasets
	pip install --upgrade transformers
	```

	Note: To use MMS you need `transformers >= 4.30`. If version `4.30`
	is not yet available [on PyPI](https://pypi.org/project/transformers/), install `transformers` from
	source:
	```
	pip install git+https://github.com/huggingface/transformers.git
	```
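
As a quick sanity check before running the example, the installed version can be compared against the 4.30 minimum mentioned above. The sketch below is only illustrative: `meets_minimum` and `installed_transformers_ok` are hypothetical helpers, not part of Transformers, and only the leading major and minor numbers of the version string are compared (so a `4.30.0.dev0` source install counts as sufficient).

```python
import re
from importlib import metadata


def meets_minimum(version: str, minimum: tuple = (4, 30)) -> bool:
    """Return True if the major.minor part of `version` is at least `minimum`."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return (int(match.group(1)), int(match.group(2))) >= minimum


def installed_transformers_ok() -> bool:
    """Check the locally installed `transformers`; False if it is not installed."""
    try:
        return meets_minimum(metadata.version("transformers"))
    except metadata.PackageNotFoundError:
        return False
```

Calling `installed_transformers_ok()` in a fresh environment is a cheap way to decide whether the source install is needed.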

	Next, we load a couple of audio samples via `datasets`. Make sure that the audio data is sampled at 16,000 Hz (16 kHz).

	```py
	from datasets import load_dataset, Audio

	# English
	stream_data = load_dataset("mozilla-foundation/common_voice_13_0", "en", split="test", streaming=True)
	stream_data = stream_data.cast_column("audio", Audio(sampling_rate=16000))
	en_sample = next(iter(stream_data))["audio"]["array"]

	# French
	stream_data = load_dataset("mozilla-foundation/common_voice_13_0", "fr", split="test", streaming=True)
	stream_data = stream_data.cast_column("audio", Audio(sampling_rate=16000))
	fr_sample = next(iter(stream_data))["audio"]["array"]
	```
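
To make the 16 kHz requirement concrete: one second of audio corresponds to 16,000 samples, so resampling changes the array length by the ratio of the two rates. The arithmetic can be sketched as below. The helper names are hypothetical and the 48 kHz input is just an example value. Real resamplers may round the output length slightly differently.

```python
TARGET_RATE_HZ = 16_000  # 16 kHz, the rate used in the cast_column calls above


def duration_seconds(num_samples: int, sampling_rate: int = TARGET_RATE_HZ) -> float:
    """Length of a clip in seconds, given its sample count and sampling rate."""
    return num_samples / sampling_rate


def resampled_length(num_samples: int, orig_rate: int,
                     target_rate: int = TARGET_RATE_HZ) -> int:
    """Approximate sample count after resampling (rounded to the nearest integer)."""
    return (num_samples * target_rate + orig_rate // 2) // orig_rate


# A hypothetical 3-second clip recorded at 48 kHz has 144,000 samples;
# after resampling to 16 kHz it has about 48,000 samples and is still 3 s long.
n_16k = resampled_length(144_000, orig_rate=48_000)
print(n_16k, duration_seconds(n_16k))
```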

	Next, we load the model and processor:

	```py
	from transformers import Wav2Vec2ForCTC, AutoProcessor
	import torch

	model_id = "facebook/mms-1b-l1107"

	processor = AutoProcessor.from_pretrained(model_id)
	model = Wav2Vec2ForCTC.from_pretrained(model_id)
	```

	Now we process the audio data, pass the processed audio to the model, and decode the model output into text, just as we usually do for Wav2Vec2 models such as [facebook/wav2vec2-base-960h](https://huggingface.co/facebook/wav2vec2-base-960h):

	```py
	inputs = processor(en_sample, sampling_rate=16_000, return_tensors="pt")

	with torch.no_grad():
	    outputs = model(**inputs).logits

	ids = torch.argmax(outputs, dim=-1)[0]
	transcription = processor.decode(ids)
	# 'joe keton disapproved of films and buster also had reservations about the media'
	```
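
The `argmax` followed by `processor.decode` above amounts to greedy CTC decoding: pick the most likely token for every frame, merge consecutive repeats, then drop the blank token. The following is a minimal pure-Python sketch of that collapse step. The toy vocabulary and the blank id of 0 are made up for illustration and do not reflect this checkpoint's tokenizer.

```python
from itertools import groupby

# Toy vocabulary, invented for this example; id 0 plays the role of the CTC blank.
TOY_VOCAB = {0: "<blank>", 1: "h", 2: "e", 3: "l", 4: "o", 5: "|"}
BLANK_ID = 0


def ctc_greedy_collapse(frame_ids, vocab=TOY_VOCAB, blank_id=BLANK_ID,
                        word_delimiter="|"):
    """Collapse per-frame argmax ids into text: merge repeats, then drop blanks."""
    merged = [token_id for token_id, _ in groupby(frame_ids)]
    chars = [vocab[token_id] for token_id in merged if token_id != blank_id]
    return "".join(chars).replace(word_delimiter, " ").strip()


# The blank between the two runs of "l" is what keeps the double letter.
frames = [1, 1, 0, 2, 3, 3, 0, 3, 4, 4, 5, 0, 1, 1]
print(ctc_greedy_collapse(frames))  # hello h
```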

	We can now keep the same model in memory and simply switch out the language adapters by calling `load_adapter()` on the model and `set_target_lang()` on the tokenizer. We pass the target language as an input, here `"fra"` for French.

	```py
	processor.tokenizer.set_target_lang("fra")
	model.load_adapter("fra")

	inputs = processor(fr_sample, sampling_rate=16_000, return_tensors="pt")

	with torch.no_grad():
	    outputs = model(**inputs).logits

	ids = torch.argmax(outputs, dim=-1)[0]
	transcription = processor.decode(ids)
	# "ce dernier est volé tout au long de l'histoire romaine"
	```
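
When transcribing a mixed set of clips, one possible pattern is to group them by language first so that each adapter is switched in only once. The sketch below shows that grouping with injected callbacks. `switch_language` and `transcribe` are hypothetical stand-ins: in practice the former would wrap the `set_target_lang()` and `load_adapter()` calls shown above, and the latter the processor, model, and decode steps.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, Iterable, List, Tuple


def transcribe_grouped(
    samples: Iterable[Tuple[str, Any]],
    switch_language: Callable[[str], None],
    transcribe: Callable[[Any], str],
) -> List[str]:
    """Transcribe (lang, audio) pairs, switching adapters once per language.

    Results are returned in the same order as the input samples.
    """
    # Remember each sample's original position while grouping by language.
    by_lang: Dict[str, List[Tuple[int, Any]]] = defaultdict(list)
    for index, (lang, audio) in enumerate(samples):
        by_lang[lang].append((index, audio))

    results: Dict[int, str] = {}
    for lang, items in by_lang.items():
        switch_language(lang)  # hypothetical: set_target_lang + load_adapter
        for index, audio in items:
            results[index] = transcribe(audio)
    return [results[i] for i in range(len(results))]
```

Passing the callbacks in keeps the grouping logic independent of the model, which also makes it easy to try out with dummy functions first.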

	The language can be switched in the same way for every other supported language. To see the available language codes, have a look at:
	```py
	processor.tokenizer.vocab.keys()
	```
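
Most identifiers in the Supported Languages list are plain ISO 639-3 codes, but some carry a `-script_...` or `-dialect_...` suffix, for example `azj-script_latin` or `cak-dialect_central`. A small helper like the following can split such identifiers. `parse_lang_id` and `LangId` are hypothetical names, and the suffix format is inferred only from the entries listed in this README.

```python
from typing import NamedTuple, Optional


class LangId(NamedTuple):
    iso: str                # ISO 639-3 code, e.g. "azj"
    kind: Optional[str]     # "script" or "dialect" when a suffix is present
    variant: Optional[str]  # e.g. "latin" or "central"


def parse_lang_id(lang_id: str) -> LangId:
    """Split an identifier such as 'azj-script_latin' into its parts."""
    iso, sep, suffix = lang_id.partition("-")
    if not sep:
        return LangId(iso, None, None)
    kind, _, variant = suffix.partition("_")
    return LangId(iso, kind, variant)


print(parse_lang_id("fra"))
print(parse_lang_id("azj-script_latin"))
```

Note that the full identifier, suffix included, is presumably what gets passed as the target language. The parsed parts are only useful for filtering or display.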

	For more details, please have a look at [the official docs](https://huggingface.co/docs/transformers/main/en/model_doc/mms).

	## Supported Languages

	This model supports 1107 languages. Click the following to toggle all supported languages of this checkpoint, listed by [ISO 639-3 code](https://en.wikipedia.org/wiki/ISO_639-3).
	You can find more details about the languages and their ISO 639-3 codes in the [MMS Language Coverage Overview](https://dl.fbaipublicfiles.com/mms/misc/language_coverage_mms.html).
	<details>
	<summary>Click to toggle</summary>

	- abi
	- abp
	- aca
	- acd
	- ace
	- acf
	- ach
	- acn
	- acr
	- acu
	- ade
	- adh
	- adj
	- adx
	- aeu
	- agd
	- agg
	- agn
	- agr
	- agu
	- agx
	- aha
	- ahk
	- aia
	- aka
	- akb
	- ake
	- akp
	- alj
	- alp
	- alt
	- alz
	- ame
	- amf
	- amh
	- ami
	- amk
	- ann
	- any
	- aoz
	- apb
	- apr
	- ara
	- arl
	- asa
	- asg
	- asm
	- ata
	- atb
	- atg
	- ati
	- atq
	- ava
	- avn
	- avu
	- awa
	- awb
	- ayo
	- ayr
	- ayz
	- azb
	- azg
	- azj-script_cyrillic
	- azj-script_latin
	- azz
	- bak
	- bam
	- ban
	- bao
	- bav
	- bba
	- bbb
	- bbc
	- bbo
	- bcc-script_arabic
	- bcc-script_latin
	- bcl
	- bcw
	- bdg
	- bdh
	- bdq
	- bdu
	- bdv
	- beh
	- bem
	- ben
	- bep
	- bex
	- bfa
	- bfo
	- bfy
	- bfz
	- bgc
	- bgq
	- bgr
	- bgt
	- bgw
	- bha
	- bht
	- bhz
	- bib
	- bim
	- bis
	- biv
	- bjr
	- bjv
	- bjw
	- bjz
	- bkd
	- bkv
	- blh
	- blt
	- blx
	- blz
	- bmq
	- bmr
	- bmu
	- bmv
	- bng
	- bno
	- bnp
	- boa
	- bod
	- boj
	- bom
	- bor
	- bov
	- box
	- bpr
	- bps
	- bqc
	- bqi
	- bqj
	- bqp
	- bru
	- bsc
	- bsq
	- bss
	- btd
	- bts
	- btt
	- btx
	- bud
	- bul
	- bus
	- bvc
	- bvz
	- bwq
	- bwu
	- byr
	- bzh
	- bzi
	- bzj
	- caa
	- cab
	- cac-dialect_sanmateoixtatan
	- cac-dialect_sansebastiancoatan
	- cak-dialect_central
	- cak-dialect_santamariadejesus
	- cak-dialect_santodomingoxenacoj
	- cak-dialect_southcentral
	- cak-dialect_western
	- cak-dialect_yepocapa
	- cap
	- car
	- cas
	- cat
	- cax
	- cbc
	- cbi
	- cbr
	- cbs
	- cbt
	- cbu
	- cbv
	- cce
	- cco
	- cdj
	- ceb
	- ceg
	- cek
	- cfm
	- cgc
	- chf
	- chv
	- chz
	- cjo
	- cjp
	- cjs
	- cko
	- ckt
	- cla
	- cle
	- cly
	- cme
	- cmo-script_khmer
	- cmo-script_latin
	- cmr
	- cnh
	- cni
	- cnl
	- cnt
	- coe
	- cof
	- cok
	- con
	- cot
	- cou
	- cpa
	- cpb
	- cpu
	- crh
	- crk-script_latin
	- crk-script_syllabics
	- crn
	- crq
	- crs
	- crt
	- csk
	- cso
	- ctd
	- ctg
	- cto
	- ctu
	- cuc
	- cui
	- cuk
	- cul
	- cwa
	- cwe
	- cwt
	- cya
	- cym
	- daa
	- dah
	- dar
	- dbj
	- dbq
	- ddn
	- ded
	- des
	- deu
	- dga
	- dgi
	- dgk
	- dgo
	- dgr
	- dhi
	- did
	- dig
	- dik
	- dip
	- div
	- djk
	- dnj-dialect_blowowest
	- dnj-dialect_gweetaawueast
	- dnt
	- dnw
	- dop
	- dos
	- dsh
	- dso
	- dtp
	- dts
	- dug
	- dwr
	- dyi
	- dyo
	- dyu
	- dzo
	- eip
	- eka
	- ell
	- emp
	- enb
	- eng
	- enx
	- ese
	- ess
	- eus
	- evn
	- ewe
	- eza
	- fal
	- fao
	- far
	- fas
	- fij
	- fin
	- flr
	- fmu
	- fon
	- fra
	- frd
	- ful
	- gag-script_cyrillic
	- gag-script_latin
	- gai
	- gam
	- gau
	- gbi
	- gbk
	- gbm
	- gbo
	- gde
	- geb
	- gej
	- gil
	- gjn
	- gkn
	- gld
	- glk
	- gmv
	- gna
	- gnd
	- gng
	- gof-script_latin
	- gog
	- gor
	- gqr
	- grc
	- gri
	- grn
	- grt
	- gso
	- gub
	- guc
	- gud
	- guh
	- guj
	- guk
	- gum
	- guo
	- guq
	- guu
	- gux
	- gvc
	- gvl
	- gwi
	- gwr
	- gym
	- gyr
	- had
	- hag
	- hak
	- hap
	- hat
	- hau
	- hay
	- heb
	- heh
	- hif
	- hig
	- hil
	- hin
	- hlb
	- hlt
	- hne
	- hnn
	- hns
	- hoc
	- hoy
	- hto
	- hub
	- hui
	- hun
	- hus-dialect_centralveracruz
	- hus-dialect_westernpotosino
	- huu
	- huv
	- hvn
	- hwc
	- hyw
	- iba
	- icr
	- idd
	- ifa
	- ifb
	- ife
	- ifk
	- ifu
	- ify
	- ign
	- ikk
	- ilb
	- ilo
	- imo
	- inb
	- ind
	- iou
	- ipi
	- iqw
	- iri
	- irk
	- isl
	- itl
	- itv
	- ixl-dialect_sangasparchajul
	- ixl-dialect_sanjuancotzal
	- ixl-dialect_santamarianebaj
	- izr
	- izz
	- jac
	- jam
	- jav
	- jbu
	- jen
	- jic
	- jiv
	- jmc
	- jmd
	- jun
	- juy
	- jvn
	- kaa
	- kab
	- kac
	- kak
	- kan
	- kao
	- kaq
	- kay
	- kaz
	- kbo
	- kbp
	- kbq
	- kbr
	- kby
	- kca
	- kcg
	- kdc
	- kde
	- kdh
	- kdi
	- kdj
	- kdl
	- kdn
	- kdt
	- kek
	- ken
	- keo
	- ker
	- key
	- kez
	- kfb
	- kff-script_telugu
	- kfw
	- kfx
	- khg
	- khm
	- khq
	- kia
	- kij
	- kik
	- kin
	- kir
	- kjb
	- kje
	- kjg
	- kjh
	- kki
	- kkj
	- kle
	- klu
	- klv
	- klw
	- kma
	- kmd
	- kml
	- kmr-script_arabic
	- kmr-script_cyrillic
	- kmr-script_latin
	- kmu
	- knb
	- kne
	- knf
	- knj
	- knk
	- kno
	- kog
	- kor
	- kpq
	- kps
	- kpv
	- kpy
	- kpz
	- kqe
	- kqp
	- kqr
	- kqy
	- krc
	- kri
	- krj
	- krl
	- krr
	- krs
	- kru
	- ksb
	- ksr
	- kss
	- ktb
	- ktj
	- kub
	- kue
	- kum
	- kus
	- kvn
	- kvw
	- kwd
	- kwf
	- kwi
	- kxc
	- kxf
	- kxm
	- kxv
	- kyb
	- kyc
	- kyf
	- kyg
	- kyo
	- kyq
	- kyu
	- kyz
	- kzf
	- lac
	- laj
	- lam
	- lao
	- las
	- lat
	- lav
	- law
	- lbj
	- lbw
	- lcp
	- lee
	- lef
	- lem
	- lew
	- lex
	- lgg
	- lgl
	- lhu
	- lia
	- lid
	- lif
	- lip
	- lis
	- lje
	- ljp
	- llg
	- lln
	- lme
	- lnd
	- lns
	- lob
	- lok
	- lom
	- lon
	- loq
	- lsi
	- lsm
	- luc
	- lug
	- lwo
	- lww
	- lzz
	- maa-dialect_sanantonio
	- maa-dialect_sanjeronimo
	- mad
	- mag
	- mah
	- mai
	- maj
	- mak
	- mal
	- mam-dialect_central
	- mam-dialect_northern
	- mam-dialect_southern
	- mam-dialect_western
	- maq
	- mar
	- maw
	- maz
	- mbb
	- mbc
	- mbh
	- mbj
	- mbt
	- mbu
	- mbz
	- mca
	- mcb
	- mcd
	- mco
	- mcp
	- mcq
	- mcu
	- mda
	- mdv
	- mdy
	- med
	- mee
	- mej
	- men
	- meq
	- met
	- mev
	- mfe
	- mfh
	- mfi
	- mfk
	- mfq
	- mfy
	- mfz
	- mgd
	- mge
	- mgh
	- mgo
	- mhi
	- mhr
	- mhu
	- mhx
	- mhy
	- mib
	- mie
	- mif
	- mih
	- mil
	- mim
	- min
	- mio
	- mip
	- miq
	- mit
	- miy
	- miz
	- mjl
	- mjv
	- mkl
	- mkn
	- mlg
	- mmg
	- mnb
	- mnf
	- mnk
	- mnw
	- mnx
	- moa
	- mog
	- mon
	- mop
	- mor
	- mos
	- mox
	- moz
	- mpg
	- mpm
	- mpp
	- mpx
	- mqb
	- mqf
	- mqj
	- mqn
	- mrw
	- msy
	- mtd
	- mtj
	- mto
	- muh
	- mup
	- mur
	- muv
	- muy
	- mvp
	- mwq
	- mwv
	- mxb
	- mxq
	- mxt
	- mxv
	- mya
	- myb
	- myk
	- myl
	- myv
	- myx
	- myy
	- mza
	- mzi
	- mzj
	- mzk
	- mzm
	- mzw
	- nab
	- nag
	- nan
	- nas
	- naw
	- nca
	- nch
	- ncj
	- ncl
	- ncu
	- ndj
	- ndp
	- ndv
	- ndy
	- ndz
	- neb
	- new
	- nfa
	- nfr
	- nga
	- ngl
	- ngp
	- ngu
	- nhe
	- nhi
	- nhu
	- nhw
	- nhx
	- nhy
	- nia
	- nij
	- nim
	- nin
	- nko
	- nlc
	- nld
	- nlg
	- nlk
	- nmz
	- nnb
	- nnq
	- nnw
	- noa
	- nod
	- nog
	- not
	- npl
	- npy
	- nst
	- nsu
	- ntm
	- ntr
	- nuj
	- nus
	- nuz
	- nwb
	- nxq
	- nya
	- nyf
	- nyn
	- nyo
	- nyy
	- nzi
	- obo
	- ojb-script_latin
	- ojb-script_syllabics
	- oku
	- old
	- omw
	- onb
	- ood
	- orm
	- ory
	- oss
	- ote
	- otq
	- ozm
	- pab
	- pad
	- pag
	- pam
	- pan
	- pao
	- pap
	- pau
	- pbb
	- pbc
	- pbi
	- pce
	- pcm
	- peg
	- pez
	- pib
	- pil
	- pir
	- pis
	- pjt
	- pkb
	- pls
	- plw
	- pmf
	- pny
	- poh-dialect_eastern
	- poh-dialect_western
	- poi
	- pol
	- por
	- poy
	- ppk
	- pps
	- prf
	- prk
	- prt
	- pse
	- pss
	- ptu
	- pui
	- pwg
	- pww
	- pxm
	- qub
	- quc-dialect_central
	- quc-dialect_east
	- quc-dialect_north
	- quf
	- quh
	- qul
	- quw
	- quy
	- quz
	- qvc
	- qve
	- qvh
	- qvm
	- qvn
	- qvo
	- qvs
	- qvw
	- qvz
	- qwh
	- qxh
	- qxl
	- qxn
	- qxo
	- qxr
	- rah
	- rai
	- rap
	- rav
	- raw
	- rej
	- rel
	- rgu
	- rhg
	- rif-script_arabic
	- rif-script_latin
	- ril
	- rim
	- rjs
	- rkt
	- rmc-script_cyrillic
	- rmc-script_latin
	- rmo
	- rmy-script_cyrillic
	- rmy-script_latin
	- rng
	- rnl
	- rol
	- ron
	- rop
	- rro
	- rub
	- ruf
	- rug
	- run
	- rus
	- sab
	- sag
	- sah
	- saj
	- saq
	- sas
	- sba
	- sbd
	- sbl
	- sbp
	- sch
	- sck
	- sda
	- sea
	- seh
	- ses
	- sey
	- sgb
	- sgj
	- sgw
	- shi
	- shk
	- shn
	- sho
	- shp
	- sid
	- sig
	- sil
	- sja
	- sjm
	- sld
	- slu
	- sml
	- smo
	- sna
	- sne
	- snn
	- snp
	- snw
	- som
	- soy
	- spa
	- spp
	- spy
	- sqi
	- sri
	- srm
	- srn
	- srx
	- stn
	- stp
	- suc
	- suk
	- sun
	- sur
	- sus
	- suv
	- suz
	- swe
	- swh
	- sxb
	- sxn
	- sya
	- syl
	- sza
	- tac
	- taj
	- tam
	- tao
	- tap
	- taq
	- tat
	- tav
	- tbc
	- tbg
	- tbk
	- tbl
	- tby
	- tbz
	- tca
	- tcc
	- tcs
	- tcz
	- tdj
	- ted
	- tee
	- tel
	- tem
	- teo
	- ter
	- tes
	- tew
	- tex
	- tfr
	- tgj
	- tgk
	- tgl
	- tgo
	- tgp
	- tha
	- thk
	- thl
	- tih
	- tik
	- tir
	- tkr
	- tlb
	- tlj
	- tly
	- tmc
	- tmf
	- tna
	- tng
	- tnk
	- tnn
	- tnp
	- tnr
	- tnt
	- tob
	- toc
	- toh
	- tom
	- tos
	- tpi
	- tpm
	- tpp
	- tpt
	- trc
	- tri
	- trn
	- trs
	- tso
	- tsz
	- ttc
	- tte
	- ttq-script_tifinagh
	- tue
	- tuf
	- tuk-script_arabic
	- tuk-script_latin
	- tuo
	- tur
	- tvw
	- twb
	- twe
	- twu
	- txa
	- txq
	- txu
	- tye
	- tzh-dialect_bachajon
	- tzh-dialect_tenejapa
	- tzj-dialect_eastern
	- tzj-dialect_western
	- tzo-dialect_chamula
	- tzo-dialect_chenalho
	- ubl
	- ubu
	- udm
	- udu
	- uig-script_arabic
	- uig-script_cyrillic
	- ukr
	- unr
	- upv
	- ura
	- urb
	- urd-script_arabic
	- urd-script_devanagari
	- urd-script_latin
	- urk
	- urt
	- ury
	- usp
	- uzb-script_cyrillic
	- vag
	- vid
	- vie
	- vif
	- vmw
	- vmy
	- vun
	- vut
	- wal-script_ethiopic
	- wal-script_latin
	- wap
	- war
	- waw
	- way
	- wba
	- wlo
	- wlx
	- wmw
	- wob
	- wsg
	- wwa
	- xal
	- xdy
	- xed
	- xer
	- xmm
	- xnj
	- xnr
	- xog
	- xon
	- xrb
	- xsb
	- xsm
	- xsr
	- xsu
	- xta
	- xtd
	- xte
	- xtm
	- xtn
	- xua
	- xuo
	- yaa
	- yad
	- yal
	- yam
	- yao
	- yas
	- yat
	- yaz
	- yba
	- ybb
	- ycl
	- ycn
	- yea
	- yka
	- yli
	- yor
	- yre
	- yua
	- yuz
	- yva
	- zaa
	- zab
	- zac
	- zad
	- zae
	- zai
	- zam
	- zao
	- zaq
	- zar
	- zas
	- zav
	- zaw
	- zca
	- zga
	- zim
	- ziw
	- zlm
	- zmz
	- zne
	- zos
	- zpc
	- zpg
	- zpi
	- zpl
	- zpm
	- zpo
	- zpt
	- zpu
	- zpz
	- ztq
	- zty
	- zyb
	- zyp
	- zza

	</details>

	## Model details

	- Developed by: Vineel Pratap et al.
	- Model type: Multilingual Automatic Speech Recognition model
	- Language(s): 1000+ languages, see [supported languages](#supported-languages)
	- License: CC-BY-NC 4.0
	- Num parameters: 1 billion
	- Audio sampling rate: 16,000 Hz (16 kHz)
	- Cite as:

	    @article{pratap2023mms,
	      title={Scaling Speech Technology to 1,000+ Languages},
	      author={Vineel Pratap and Andros Tjandra and Bowen Shi and Paden Tomasello and Arun Babu and Sayani Kundu and Ali Elkahky and Zhaoheng Ni and Apoorv Vyas and Maryam Fazel-Zarandi and Alexei Baevski and Yossi Adi and Xiaohui Zhang and Wei-Ning Hsu and Alexis Conneau and Michael Auli},
	      journal={arXiv},
	      year={2023}
	    }

	## Additional Links

	- [Blog post](https://ai.facebook.com/blog/multilingual-model-speech-recognition/)
	- [Transformers documentation](https://huggingface.co/docs/transformers/main/en/model_doc/mms)
	- [Paper](https://arxiv.org/abs/2305.13516)
	- [GitHub Repository](https://github.com/facebookresearch/fairseq/tree/main/examples/mms#asr)
	- [Other MMS checkpoints](https://huggingface.co/models?other=mms)
	- MMS base checkpoints:
	  - [facebook/mms-1b](https://huggingface.co/facebook/mms-1b)
	  - [facebook/mms-300m](https://huggingface.co/facebook/mms-300m)
	- [Official Space](https://huggingface.co/spaces/facebook/MMS)