
This is a MicroBERT model for Tamil.

  • Its suffix, -mx, indicates that it was pretrained with supervision from masked language modeling and XPOS tagging.
  • The unlabeled Tamil data was taken from a June 2022 dump of Tamil Wikipedia, downsampled to 1,429,735 tokens.
  • The labeled data was taken from the UD treebank UD_Tamil-TTB, v2.9, which totals 9,581 tokens.
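
To make the suffix convention and the data sizes above concrete, here is a minimal Python sketch. It is illustrative only: `SUFFIX_TASKS` and `decode_suffix` are hypothetical names and are not part of the MicroBERT repository. The mapping covers only the two letters described above (m and x), and any other letter is rejected rather than guessed.

```python
# Hypothetical helper, not part of MicroBERT: maps each suffix letter
# described in this card to the pretraining task it stands for.
SUFFIX_TASKS = {
    "m": "masked language modeling",
    "x": "XPOS tagging",
}


def decode_suffix(suffix: str) -> list[str]:
    """Return the pretraining tasks named by a suffix such as '-mx'."""
    letters = suffix.lstrip("-")
    unknown = [c for c in letters if c not in SUFFIX_TASKS]
    if unknown:
        # Letters not described in this card are deliberately not interpreted.
        raise ValueError(f"unrecognized suffix letters: {unknown}")
    return [SUFFIX_TASKS[c] for c in letters]


# Token counts as stated in this card.
UNLABELED_TOKENS = 1_429_735  # downsampled Tamil Wikipedia
LABELED_TOKENS = 9_581        # UD_Tamil-TTB, v2.9

# Size of the labeled data relative to the unlabeled data.
labeled_share = LABELED_TOKENS / UNLABELED_TOKENS

print(decode_suffix("-mx"))
print(f"labeled / unlabeled tokens: {labeled_share:.2%}")
```

By this arithmetic, the labeled treebank is roughly 0.67% the size of the unlabeled Wikipedia sample.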

Please see the repository and the paper for more details.
