This is a tiny model used for testing language-model training from scratch for Indic languages. It starts with Assamese, since the dataset was small (and was trimmed further to fit within the limits of the Google Colab free tier). The longer-term goal is to do the same for other Indic languages and to move to the BART architecture, extending IndicBART.

The model uses the RoBERTa architecture, with a byte-level Byte-Pair Encoding (BPE) tokenizer.
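The core idea of byte-level BPE can be sketched in plain Python: start from raw UTF-8 bytes (so any script, including Assamese, is covered without a hand-built character vocabulary) and repeatedly merge the most frequent adjacent pair. This is an illustrative toy, not the actual Hugging Face `tokenizers` implementation used for this model; the corpus and merge count below are made up.

```python
from collections import Counter

def train_byte_level_bpe(corpus, num_merges):
    """Toy byte-level BPE trainer: returns the learned merge rules."""
    # Represent each text as a tuple of raw UTF-8 bytes.
    words = [tuple(text.encode("utf-8")) for text in corpus]
    merges = []
    for _ in range(num_merges):
        # Count every adjacent token pair across the corpus.
        pairs = Counter()
        for w in words:
            for a, b in zip(w, w[1:]):
                pairs[(a, b)] += 1
        if not pairs:
            break
        best = max(pairs, key=pairs.get)
        merges.append(best)
        # Replace every occurrence of the best pair with one merged token.
        new_words = []
        for w in words:
            out, i = [], 0
            while i < len(w):
                if i + 1 < len(w) and (w[i], w[i + 1]) == best:
                    out.append(best)
                    i += 2
                else:
                    out.append(w[i])
                    i += 1
            new_words.append(tuple(out))
        words = new_words
    return merges

# Hypothetical toy corpus, just to exercise the loop.
corpus = ["low lower lowest", "low low"]
merges = train_byte_level_bpe(corpus, num_merges=5)
print(len(merges))
```

In practice the tokenizer for this model would be trained with the `tokenizers` library (e.g. `ByteLevelBPETokenizer`) on the Assamese corpus, with a much larger merge budget.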
