language: ur | |
thumbnail: https://raw.githubusercontent.com/urduhack/urduhack/master/docs/_static/urduhack.png | |
tags: | |
- roberta-urdu-small | |
- urdu | |
- transformers | |
license: mit | |
## roberta-urdu-small | |
[![License: MIT](https://img.shields.io/badge/license-MIT-blue.svg)](https://github.com/urduhack/urduhack/blob/master/LICENSE) | |
### Overview | |
**Language model:** roberta-urdu-small | |
**Model size:** 125M | |
**Language:** Urdu | |
**Training data:** News data from urdu news resources in Pakistan | |
### About roberta-urdu-small | |
roberta-urdu-small is a language model for urdu language. | |
``` | |
from transformers import pipeline | |
fill_mask = pipeline("fill-mask", model="urduhack/roberta-urdu-small", tokenizer="urduhack/roberta-urdu-small") | |
``` | |
## Training procedure | |
roberta-urdu-small was trained on urdu news corpus. Training data was normalized using normalization module from | |
urduhack to eliminate characters from other languages like arabic. | |
### About Urduhack | |
Urduhack is a Natural Language Processing (NLP) library for urdu language. | |
Github: https://github.com/urduhack/urduhack | |