Text Classification
fastText
language-identification
glotlid / README.md
kargaranamir's picture
Update README.md
74cb50b verified
|
raw
history blame
15.8 kB
---
license: apache-2.0
language:
- aah
- aai
- aak
- aau
- aaz
- ab
- aba
- abi
- abk
- abn
- abq
- abs
- abt
- abx
- aby
- abz
- aca
- acd
- ace
- acf
- ach
- acm
- acn
- acq
- acr
- acu
- ada
- ade
- adh
- adi
- adj
- adl
- adx
- ady
- adz
- aeb
- aer
- aeu
- aey
- af
- afb
- afh
- afr
- agd
- agg
- agm
- agn
- agr
- agt
- agu
- agw
- agx
- aha
- ahk
- aia
- aii
- aim
- ain
- ajg
- aji
- ajp
- ajz
- ak
- aka
- akb
- ake
- akh
- akl
- akp
- ald
- alj
- aln
- alp
- alq
- als
- alt
- aly
- alz
- am
- ame
- amf
- amh
- ami
- amk
- amm
- amn
- amp
- amr
- amu
- amx
- an
- ang
- anm
- ann
- anp
- anv
- any
- aoc
- aoi
- aoj
- aom
- aon
- aoz
- apb
- apc
- ape
- apn
- apr
- apt
- apu
- apw
- apy
- apz
- aqz
- ar
- ara
- arb
- are
- arg
- arh
- arl
- arn
- arp
- arq
- arr
- ars
- ary
- arz
- as
- asg
- asm
- aso
- ast
- ata
- atb
- atd
- atg
- ati
- atj
- atq
- att
- auc
- aui
- auy
- av
- ava
- avk
- avn
- avt
- avu
- awa
- awb
- awi
- awx
- ay
- aym
- ayo
- ayp
- ayr
- az
- azb
- aze
- azg
- azj
- azz
- ba
- bak
- bal
- bam
- ban
- bao
- bar
- bas
- bav
- bba
- bbb
- bbc
- bbj
- bbk
- bbo
- bbr
- bcc
- bch
- bci
- bcl
- bco
- bcw
- bdd
- bdh
- bdq
- be
- bea
- bef
- bel
- bem
- ben
- beq
- ber
- bew
- bex
- bfd
- bfo
- bfz
- bg
- bgr
- bgs
- bgt
- bgz
- bhg
- bhl
- bho
- bhp
- bhw
- bhz
- bi
- bib
- big
- bih
- bik
- bim
- bin
- bis
- biu
- biv
- bjn
- bjp
- bjr
- bjv
- bkd
- bkl
- bkq
- bku
- bkv
- bla
- blh
- blk
- blt
- blw
- blz
- bm
- bmb
- bmh
- bmk
- bmq
- bmr
- bmu
- bmv
- bn
- bnj
- bno
- bnp
- bo
- boa
- bod
- boj
- bom
- bon
- bor
- bos
- bov
- box
- bpr
- bps
- bpy
- bqc
- bqj
- bqp
- br
- bre
- brh
- bru
- brx
- bs
- bsc
- bsn
- bsp
- bsq
- bss
- btd
- btg
- bth
- bts
- btt
- btx
- bua
- bud
- bug
- buk
- bul
- bum
- bus
- bvc
- bvd
- bvr
- bvy
- bvz
- bwd
- bwi
- bwq
- bwu
- bxh
- bxr
- byr
- byv
- byx
- bzd
- bzh
- bzi
- bzj
- bzt
- ca
- caa
- cab
- cac
- caf
- cag
- cak
- cao
- cap
- caq
- car
- cas
- cat
- cav
- cax
- cbc
- cbi
- cbk
- cbr
- cbs
- cbt
- cbu
- cbv
- cce
- cco
- ccp
- cdf
- ce
- ceb
- ceg
- cek
- ces
- cfm
- cgc
- cgg
- ch
- cha
- chd
- che
- chf
- chj
- chk
- chn
- cho
- chq
- chr
- chu
- chv
- chw
- chz
- cjk
- cjo
- cjp
- cjs
- cjv
- ckb
- ckm
- cko
- ckt
- cle
- clu
- cly
- cme
- cmi
- cmn
- cmo
- cmr
- cnh
- cni
- cnk
- cnl
- cnr
- cnt
- cnw
- co
- coe
- cof
- cok
- con
- cop
- cor
- cos
- cot
- cou
- cpa
- cpb
- cpc
- cpu
- cpy
- crh
- cri
- crj
- crk
- crl
- crm
- crn
- crq
- crs
- crt
- crx
- cs
- csb
- csk
- cso
- csw
- csy
- cta
- ctd
- cto
- ctp
- ctu
- cu
- cub
- cuc
- cui
- cuk
- cul
- cut
- cux
- cv
- cwd
- cwe
- cwt
- cy
- cya
- cym
- czt
- da
- daa
- dad
- daf
- dag
- dah
- dak
- dan
- dar
- dbq
- ddg
- ddn
- de
- ded
- des
- deu
- dga
- dgc
- dgi
- dgr
- dgz
- dhg
- dhm
- dhv
- did
- dig
- dik
- din
- dip
- diq
- dis
- diu
- div
- dje
- djk
- djr
- dks
- dln
- dng
- dnj
- dnw
- dob
- doi
- dop
- dos
- dow
- drg
- drt
- dru
- dsb
- dsh
- dtb
- dtp
- dts
- dty
- dua
- due
- dug
- duo
- dur
- dv
- dwr
- dws
- dww
- dyi
- dyo
- dyu
- dz
- dzo
- ebk
- ee
- efi
- egl
- eka
- ekk
- eko
- el
- ell
- eme
- emi
- eml
- emp
- en
- enb
- eng
- enl
- enm
- enq
- enx
- eo
- epo
- eri
- es
- ese
- esi
- esk
- ess
- est
- esu
- et
- eto
- etr
- etu
- eu
- eus
- eve
- evn
- ewe
- ewo
- ext
- eza
- fa
- faa
- fad
- fai
- fal
- fan
- fao
- far
- fas
- fat
- ffm
- fi
- fij
- fil
- fin
- fit
- fj
- fkv
- fmp
- fmu
- fo
- fon
- for
- fr
- fra
- frd
- frm
- fro
- frp
- frr
- fry
- fub
- fud
- fue
- fuf
- fuh
- fuq
- fur
- fuv
- fy
- ga
- gaa
- gag
- gah
- gai
- gam
- gaw
- gaz
- gba
- gbi
- gbo
- gbr
- gcf
- gcr
- gd
- gde
- gdg
- gdn
- gdr
- geb
- gej
- gfk
- ghe
- ghs
- gid
- gil
- giz
- gjn
- gkn
- gkp
- gl
- gla
- gle
- glg
- glk
- glv
- gmh
- gmv
- gn
- gna
- gnb
- gnd
- gng
- gnn
- gnw
- goa
- gof
- gog
- goh
- gom
- gor
- gos
- got
- gqr
- grc
- grn
- grt
- gso
- gsw
- gu
- gub
- guc
- gud
- gug
- guh
- gui
- guj
- guk
- gul
- gum
- gun
- guo
- guq
- gur
- guu
- guw
- gux
- guz
- gv
- gvc
- gvf
- gvl
- gvn
- gwi
- gwr
- gxx
- gya
- gym
- gyr
- ha
- hac
- hae
- hag
- hak
- hat
- hau
- hav
- haw
- hay
- hbo
- hbs
- hch
- he
- heb
- heg
- heh
- her
- hi
- hif
- hig
- hil
- hin
- hix
- hla
- hlt
- hmn
- hmo
- hmr
- hne
- hnj
- hnn
- hns
- ho
- hoc
- hop
- hot
- hr
- hra
- hrv
- hrx
- hsb
- ht
- hto
- hu
- hub
- hui
- hun
- hus
- huu
- huv
- hvn
- hwc
- hy
- hye
- hyw
- hz
- ia
- ian
- iba
- ibg
- ibo
- icr
- id
- ido
- idu
- ie
- ifa
- ifb
- ife
- ifk
- ifu
- ify
- ig
- ige
- ign
- igs
- ii
- iii
- ijc
- ike
- ikk
- ikt
- ikw
- ilb
- ile
- ilo
- imo
- ina
- inb
- ind
- inh
- ino
- io
- iou
- ipi
- iqw
- iri
- irk
- iry
- is
- isd
- ish
- isl
- iso
- it
- ita
- itl
- its
- itv
- ium
- ivb
- ivv
- iws
- ixl
- izh
- izr
- izz
- ja
- jaa
- jac
- jae
- jam
- jav
- jbo
- jbu
- jdt
- jic
- jiv
- jmc
- jmx
- jpa
- jpn
- jra
- jun
- jv
- jvn
- ka
- kaa
- kab
- kac
- kak
- kal
- kam
- kan
- kao
- kap
- kaq
- kas
- kat
- kaz
- kbc
- kbd
- kbh
- kbm
- kbo
- kbp
- kbq
- kbr
- kby
- kca
- kcg
- kck
- kdc
- kde
- kdh
- kdi
- kdj
- kdl
- kdp
- kdr
- kea
- kei
- kek
- ken
- keo
- ker
- kew
- kex
- kez
- kff
- kg
- kgf
- kgk
- kgp
- kgr
- kha
- khg
- khk
- khm
- khq
- khs
- khy
- khz
- ki
- kia
- kij
- kik
- kin
- kir
- kiu
- kix
- kj
- kjb
- kje
- kjh
- kjs
- kk
- kkc
- kki
- kkj
- kkl
- kl
- kle
- kln
- klt
- klv
- km
- kma
- kmb
- kmd
- kmg
- kmh
- kmk
- kmm
- kmo
- kmr
- kms
- kmu
- kmy
- kn
- knc
- kne
- knf
- kng
- knj
- knk
- kno
- knv
- knx
- kny
- ko
- kog
- koi
- kom
- kon
- koo
- kor
- kos
- kpe
- kpf
- kpg
- kpj
- kpq
- kpr
- kpv
- kpw
- kpx
- kpz
- kqa
- kqc
- kqe
- kqf
- kql
- kqn
- kqo
- kqp
- kqs
- kqw
- kqy
- krc
- kri
- krj
- krl
- kru
- krx
- ks
- ksb
- ksc
- ksd
- ksf
- ksh
- ksj
- ksp
- ksr
- kss
- ksw
- ktb
- ktj
- ktm
- kto
- ktu
- ktz
- kua
- kub
- kud
- kue
- kuj
- kum
- kup
- kus
- kv
- kvg
- kvj
- kvn
- kw
- kwd
- kwf
- kwi
- kwj
- kwn
- kwy
- kxc
- kxm
- kxw
- ky
- kyc
- kyf
- kyg
- kyq
- kyu
- kyz
- kze
- kzf
- kzj
- kzn
- la
- lac
- lad
- lai
- laj
- lam
- lao
- lap
- las
- lat
- lav
- law
- lb
- lbb
- lbe
- lbj
- lbk
- lch
- lcm
- lcp
- ldi
- ldn
- lea
- led
- lee
- lef
- leh
- lem
- leu
- lew
- lex
- lez
- lfn
- lg
- lgg
- lgl
- lgm
- lhi
- lhm
- lhu
- li
- lia
- lid
- lif
- lij
- lim
- lin
- lip
- lir
- lis
- lit
- liv
- ljp
- lki
- llb
- lld
- llg
- lln
- lmk
- lmo
- lmp
- ln
- lnd
- lo
- lob
- loe
- log
- lok
- lol
- lom
- loq
- loz
- lrc
- lsi
- lsm
- lt
- ltg
- ltz
- lu
- lua
- lub
- luc
- lud
- lue
- lug
- lun
- luo
- lus
- lut
- lv
- lvs
- lwg
- lwo
- lww
- lzh
- lzz
- maa
- mad
- maf
- mag
- mah
- mai
- maj
- mak
- mal
- mam
- maq
- mar
- mas
- mau
- mav
- maw
- max
- maz
- mbb
- mbc
- mbd
- mbf
- mbh
- mbi
- mbj
- mbl
- mbs
- mbt
- mca
- mcb
- mcd
- mcf
- mck
- mcn
- mco
- mcp
- mcq
- mcu
- mda
- mdf
- mdy
- med
- mee
- meh
- mej
- mek
- men
- meq
- mer
- met
- meu
- mev
- mfa
- mfe
- mfg
- mfh
- mfi
- mfk
- mfq
- mfy
- mfz
- mg
- mgc
- mgh
- mgm
- mgo
- mgr
- mgv
- mh
- mhi
- mhl
- mhr
- mhw
- mhx
- mhy
- mi
- mib
- mic
- mie
- mif
- mig
- mih
- mik
- mil
- mim
- min
- mio
- mip
- miq
- mir
- mit
- miy
- miz
- mjc
- mjw
- mk
- mkd
- mkl
- mkn
- mks
- mkz
- ml
- mlg
- mlh
- mlp
- mlt
- mlu
- mmn
- mmo
- mmx
- mn
- mna
- mnb
- mnf
- mni
- mnk
- mns
- mnw
- mnx
- mny
- moa
- moc
- mog
- moh
- mon
- mop
- mor
- mos
- mox
- mpg
- mph
- mpm
- mpp
- mps
- mpt
- mpx
- mqb
- mqj
- mqy
- mr
- mrg
- mri
- mrj
- mrq
- mrv
- mrw
- ms
- msa
- msb
- msc
- mse
- msk
- msm
- msy
- mt
- mta
- mtg
- mti
- mtj
- mto
- mtp
- mua
- mug
- muh
- mui
- mup
- mur
- mus
- mux
- muy
- mva
- mvn
- mvp
- mwc
- mwf
- mwl
- mwm
- mwn
- mwp
- mwq
- mwv
- mww
- mxb
- mxp
- mxq
- mxt
- mxv
- my
- mya
- myb
- myk
- myu
- myv
- myw
- myx
- myy
- mza
- mzh
- mzk
- mzl
- mzm
- mzn
- mzw
- mzz
- nab
- naf
- nah
- nak
- nan
- nap
- naq
- nas
- nav
- naw
- nb
- nba
- nbc
- nbe
- nbl
- nbq
- nbu
- nca
- nch
- ncj
- ncl
- ncq
- nct
- ncu
- ncx
- nd
- ndc
- nde
- ndh
- ndi
- ndj
- ndo
- ndp
- nds
- ndy
- ndz
- ne
- neb
- nep
- new
- nfa
- nfr
- ng
- ngb
- ngc
- ngl
- ngp
- ngu
- nhd
- nhe
- nhg
- nhi
- nhk
- nho
- nhr
- nhu
- nhw
- nhx
- nhy
- nia
- nif
- nii
- nij
- nim
- nin
- nio
- niq
- niu
- niy
- njb
- njm
- njn
- njo
- njz
- nka
- nkf
- nki
- nko
- nl
- nla
- nlc
- nld
- nlg
- nma
- nmf
- nmh
- nmo
- nmw
- nmz
- nn
- nnb
- nng
- nnh
- nnl
- nno
- nnp
- nnq
- nnw
- no
- noa
- nob
- nod
- nog
- non
- nop
- nor
- not
- nou
- nov
- nph
- npi
- npl
- npo
- npy
- nqo
- nr
- nre
- nrf
- nri
- nrm
- nsa
- nse
- nsm
- nsn
- nso
- nss
- nst
- nsu
- ntp
- ntr
- ntu
- nuj
- nus
- nuy
- nuz
- nv
- nvm
- nwb
- nwi
- nwx
- nxd
- ny
- nya
- nyf
- nyk
- nyn
- nyo
- nyu
- nyy
- nza
- nzb
- nzi
- nzm
- obo
- oc
- oci
- ogo
- oj
- ojb
- oji
- ojs
- oke
- oku
- okv
- old
- olo
- om
- omb
- omw
- ong
- ons
- ood
- opm
- or
- ori
- orm
- orv
- ory
- os
- oss
- ota
- otd
- ote
- otm
- otn
- oto
- otq
- ots
- otw
- oym
- ozm
- pa
- pab
- pad
- pag
- pah
- pam
- pan
- pao
- pap
- pau
- pbb
- pbc
- pbi
- pbl
- pbt
- pcd
- pck
- pcm
- pdc
- pdt
- pem
- pes
- pez
- pfe
- pfl
- phm
- pib
- pid
- pih
- pio
- pir
- pis
- pjt
- pkb
- pl
- plg
- pls
- plt
- plu
- plw
- pma
- pmf
- pmq
- pms
- pmx
- pnb
- pne
- pnt
- pny
- poe
- poh
- poi
- pol
- pon
- por
- pos
- pot
- pov
- poy
- ppk
- ppl
- ppo
- pps
- prf
- prg
- pri
- prk
- prq
- prs
- ps
- pse
- pss
- pt
- ptp
- ptu
- pua
- pui
- pus
- pwg
- pwn
- pww
- pxm
- qu
- qub
- quc
- que
- quf
- qug
- quh
- qul
- qup
- qus
- quw
- quy
- quz
- qva
- qvc
- qve
- qvh
- qvi
- qvm
- qvn
- qvo
- qvs
- qvw
- qvz
- qwh
- qxh
- qxl
- qxn
- qxo
- qxr
- qya
- rad
- rai
- rap
- rar
- rav
- raw
- rcf
- rej
- rel
- rgu
- rhg
- ria
- rif
- rim
- rjs
- rkb
- rm
- rmc
- rme
- rml
- rmn
- rmo
- rmq
- rmy
- rn
- rnd
- rng
- rnl
- ro
- roh
- rom
- ron
- roo
- rop
- row
- rro
- rtm
- ru
- rub
- rue
- ruf
- rug
- run
- rup
- rus
- rw
- rwo
- sa
- sab
- sag
- sah
- saj
- san
- sas
- sat
- say
- sba
- sbd
- sbe
- sbl
- sbs
- sby
- sc
- sck
- scn
- sco
- sd
- sda
- sdc
- sdh
- sdo
- sdq
- se
- seh
- sel
- ses
- sey
- sfw
- sg
- sgb
- sgc
- sgh
- sgs
- sgw
- sgz
- sh
- shi
- shk
- shn
- shp
- shr
- shs
- shu
- shy
- si
- sid
- sig
- sil
- sim
- sin
- sja
- sjn
- sjo
- sju
- sk
- skg
- skr
- sl
- sld
- slk
- sll
- slv
- sm
- sma
- sme
- smj
- smk
- sml
- smn
- smo
- sms
- smt
- sn
- sna
- snc
- snd
- snf
- snn
- snp
- snw
- sny
- so
- soe
- som
- sop
- soq
- sot
- soy
- spa
- spl
- spm
- spp
- sps
- spy
- sq
- sqi
- sr
- srd
- sri
- srm
- srn
- srp
- srq
- srr
- ss
- ssd
- ssg
- ssw
- ssx
- st
- stn
- stp
- stq
- su
- sua
- suc
- sue
- suk
- sun
- sur
- sus
- sux
- suz
- sv
- sw
- swa
- swb
- swc
- swe
- swg
- swh
- swk
- swp
- sxb
- sxn
- syb
- syc
- syl
- szb
- szl
- szy
- ta
- tab
- tac
- tah
- taj
- tam
- tap
- taq
- tar
- tat
- tav
- taw
- tay
- tbc
- tbg
- tbk
- tbl
- tbo
- tbw
- tby
- tbz
- tca
- tcc
- tcf
- tcs
- tcy
- tcz
- tdt
- tdx
- te
- ted
- tee
- tel
- tem
- teo
- ter
- tet
- tew
- tfr
- tg
- tgk
- tgl
- tgo
- tgp
- th
- tha
- thk
- thl
- thv
- ti
- tif
- tig
- tih
- tik
- tim
- tir
- tiv
- tiy
- tk
- tke
- tkl
- tkr
- tku
- tl
- tlb
- tlf
- tlh
- tlj
- tll
- tly
- tmc
- tmd
- tmr
- tn
- tna
- tnc
- tnk
- tnn
- tnp
- tnr
- to
- tob
- toc
- tod
- tog
- toh
- toi
- toj
- tok
- ton
- too
- top
- tos
- tpa
- tpi
- tpm
- tpn
- tpp
- tpt
- tpw
- tpz
- tqb
- tqo
- tr
- trc
- trn
- tro
- trp
- trq
- trs
- trv
- ts
- tsc
- tsg
- tsn
- tso
- tsw
- tsz
- tt
- ttc
- tte
- ttj
- ttq
- tts
- tuc
- tue
- tuf
- tui
- tuk
- tul
- tum
- tuo
- tur
- tuv
- tvk
- tvl
- tw
- twb
- twi
- twu
- twx
- txq
- txu
- ty
- tyv
- tzh
- tzj
- tzl
- tzm
- tzo
- ubr
- ubu
- udm
- udu
- ug
- uig
- uk
- ukr
- umb
- und
- upv
- ur
- ura
- urb
- urd
- urh
- uri
- urk
- urt
- urw
- ury
- usa
- usp
- uth
- uvh
- uvl
- uz
- uzb
- uzn
- uzs
- vag
- vap
- var
- ve
- vec
- ven
- vep
- vgt
- vi
- vid
- vie
- viv
- vls
- vmk
- vmw
- vmy
- vo
- vol
- vot
- vro
- vun
- vut
- wa
- waj
- wal
- wap
- war
- wat
- way
- wba
- wbm
- wbp
- wca
- wed
- wer
- wes
- wew
- whg
- whk
- wib
- wim
- wiu
- wln
- wls
- wlv
- wlx
- wmt
- wmw
- wnc
- wnu
- wo
- wob
- wol
- wos
- wrk
- wrs
- wsg
- wsk
- wuu
- wuv
- wwa
- xal
- xav
- xbi
- xbr
- xed
- xh
- xho
- xla
- xmf
- xmm
- xmv
- xnn
- xog
- xon
- xpe
- xrb
- xsb
- xsi
- xsm
- xsr
- xsu
- xtd
- xtm
- xtn
- xum
- xuo
- yaa
- yad
- yal
- yam
- yan
- yao
- yap
- yaq
- yas
- yat
- yaz
- ybb
- yby
- ycn
- ydd
- yi
- yid
- yim
- yka
- yle
- yli
- yml
- yo
- yom
- yon
- yor
- yrb
- yre
- yrk
- yrl
- yss
- yua
- yue
- yuj
- yup
- yut
- yuw
- yuz
- yva
- zaa
- zab
- zac
- zad
- zae
- zai
- zam
- zao
- zar
- zas
- zat
- zav
- zaw
- zca
- zdj
- zea
- zgh
- zh
- zho
- zia
- ziw
- zlm
- zne
- zoc
- zom
- zos
- zpa
- zpc
- zpd
- zpf
- zpg
- zpi
- zpj
- zpl
- zpm
- zpo
- zpq
- zpt
- zpu
- zpv
- zpz
- zsm
- zsr
- ztq
- zty
- zu
- zul
- zxx
- zyb
- zyp
- zza
tags:
- text-classification
- language-identification
library_name: fasttext
datasets:
- cis-lmu/GlotSparse
- cis-lmu/GlotStoryBook
- cis-lmu/glotlid-corpus
metrics:
- f1
---
# GlotLID
[![GlotLID](https://img.shields.io/badge/🤗-Open%20In%20Spaces-blue.svg)](https://huggingface.co/spaces/cis-lmu/glotlid-space)
## Description
**GlotLID** is a Fasttext language identification (LID) model that supports more than **2000 labels**.
**Latest:** GlotLID is now updated to **V3**. V3 supports **2102 labels** (three-letter ISO codes with script). For more details on the supported languages and performance, as well as significant changes from previous versions, please refer to [https://github.com/cisnlp/GlotLID/blob/main/languages-v3.md](https://github.com/cisnlp/GlotLID/blob/main/languages-v3.md).
- **Demo:** [huggingface](https://huggingface.co/spaces/cis-lmu/glotlid-space)
- **Repository:** [github](https://github.com/cisnlp/GlotLID)
- **Paper:** [paper](https://arxiv.org/abs/2310.16248) (EMNLP 2023)
- **Point of Contact:** amir@cis.lmu.de
### How to use
Here is how to use this model to detect the language of a given text:
```python
>>> import fasttext
>>> from huggingface_hub import hf_hub_download
# model.bin is the latest version always
>>> model_path = hf_hub_download(repo_id="cis-lmu/glotlid", filename="model.bin")
>>> model = fasttext.load_model(model_path)
>>> model.predict("Hello, world!")
```
If you are not a fan of huggingface_hub, then download the model directyly:
```python
>>> ! wget https://huggingface.co/cis-lmu/glotlid/resolve/main/model.bin
```
```python
>>> import fasttext
>>> model = fasttext.load_model("/path/to/model.bin")
>>> model.predict("Hello, world!")
```
## License
The model is distributed under the Apache License, Version 2.0.
## Version
We always maintain the previous version of GlotLID in our repository.
To access a specific version, simply append the version number to the `filename`.
- For v1: `model_v1.bin` (introduced in the GlotLID [paper](https://arxiv.org/abs/2310.16248) and used in all experiments).
- For v2: `model_v2.bin` (an edited version of v1, featuring more languages, and cleaned from noisy corpora based on the analysis of v1).
- For v3: `model_v3.bin` (an edited version of v2, featuring more languages, excluding macro languages, further cleaned from noisy corpora and incorrect metadata labels based on the analysis of v2, supporting "zxx" and "und" series labels)
`model.bin` always refers to the latest version (v3).
## References
If you use this model, please cite the following paper:
```
@inproceedings{
kargaran2023glotlid,
title={{GlotLID}: Language Identification for Low-Resource Languages},
author={Kargaran, Amir Hossein and Imani, Ayyoob and Yvon, Fran{\c{c}}ois and Sch{\"u}tze, Hinrich},
booktitle={The 2023 Conference on Empirical Methods in Natural Language Processing},
year={2023},
url={https://openreview.net/forum?id=dl4e3EBz5j}
}
```