---
language:
- en
- hi
- multilingual
license: cc-by-sa-4.0
---
# en-hi-codemixed
This is a masked language model based on the CamemBERT model architecture.
The en-hi-codemixed model was trained from scratch on English, Hindi, and codemixed English-Hindi
corpora for 40 epochs.
The corpora consist primarily of web-crawled data, including codemixed tweets, and focus on conversational
language and the COVID-19 pandemic.
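
Because this is a masked language model, one way to try it is the `fill-mask` pipeline from the `transformers` library. The snippet below is a minimal sketch, not an official usage recipe: the `MODEL_ID` value is a hypothetical placeholder (substitute the actual Hub repository ID or a local path), `mask_word` and `top_predictions` are illustrative helper names, and the example sentence is made up. The mask token is read from the loaded tokenizer rather than hard-coded, since this card does not state it.

```python
from typing import List, Tuple


def mask_word(sentence: str, target: str, mask_token: str) -> str:
    """Replace the first whitespace-delimited occurrence of `target` with `mask_token`."""
    words = sentence.split()
    if target not in words:
        raise ValueError(f"{target!r} not found in sentence")
    words[words.index(target)] = mask_token
    return " ".join(words)


def top_predictions(
    model_id: str, sentence: str, target: str, top_k: int = 5
) -> List[Tuple[str, float]]:
    """Mask `target` in `sentence` and return the model's top (token, score) guesses."""
    # Imported lazily so that mask_word can be used without transformers installed.
    from transformers import pipeline

    fill_mask = pipeline("fill-mask", model=model_id)
    # Use the tokenizer's own mask token instead of assuming a specific string.
    masked = mask_word(sentence, target, fill_mask.tokenizer.mask_token)
    return [(p["token_str"], p["score"]) for p in fill_mask(masked, top_k=top_k)]


if __name__ == "__main__":
    # Hypothetical placeholder: replace with the real Hub ID or a local directory.
    MODEL_ID = "your-namespace/en-hi-codemixed"
    # Made-up codemixed example sentence for illustration only.
    for token, score in top_predictions(
        MODEL_ID, "yaar aaj ka match bahut exciting tha", "match"
    ):
        print(f"{token}\t{score:.4f}")
```

The helper masks a whole whitespace-delimited word, so the model's single-token predictions may be subword pieces rather than full words.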