Update README.md
README.md
CHANGED
@@ -30,8 +30,7 @@ language:
 Varta-BERT is a model pre-trained on the `full` training set of [Varta](https://huggingface.co/datasets/rahular/varta) in 14 Indic languages (Assamese, Bhojpuri, Bengali, Gujarati, Hindi, Kannada, Malayalam, Marathi, Nepali, Oriya, Punjabi, Tamil, Telugu, and Urdu) and English, using a masked language modeling (MLM) objective.
 
 [Varta](https://huggingface.co/datasets/rahular/varta) is a large-scale news corpus for Indic languages, including 41.8 million news articles in 14 different Indic languages (and English), which come from a variety of high-quality sources.
-The dataset and the model are introduced in [this paper](https://arxiv.org/abs/2305.05858). The code is released in [this repository](https://github.com/rahular/varta).
-
+The dataset and the model are introduced in [this paper](https://arxiv.org/abs/2305.05858). The code is released in [this repository](https://github.com/rahular/varta).
 
 ## Uses
 You can use the raw model for masked language modeling, but it is mostly intended to be fine-tuned on a downstream task.
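The raw masked-language-modeling use mentioned under "Uses" could look roughly like the sketch below. It is not taken from the README: the model id `rahular/varta-bert` is an assumption about where the checkpoint is published, the helper names `mask_one_word` and `predict_masked` are hypothetical, and the example relies on the `fill-mask` pipeline from the `transformers` library (which must be installed together with a backend such as PyTorch).

```python
from typing import List, Tuple

# Assumed Hub id for the checkpoint; adjust it if the model lives elsewhere.
MODEL_ID = "rahular/varta-bert"


def mask_one_word(text: str, target: str, mask_token: str) -> str:
    """Replace the first occurrence of `target` in `text` with `mask_token`."""
    if target not in text:
        raise ValueError(f"{target!r} does not occur in the text")
    return text.replace(target, mask_token, 1)


def predict_masked(text: str, target: str, top_k: int = 5) -> List[Tuple[str, float]]:
    """Mask `target` in `text` and return the model's top_k (token, score) guesses.

    transformers is imported lazily so that the helper above stays usable
    without the library installed. Calling this function downloads the model.
    """
    from transformers import pipeline

    fill_mask = pipeline("fill-mask", model=MODEL_ID)
    # Use the tokenizer's own mask token instead of hard-coding "[MASK]".
    masked = mask_one_word(text, target, fill_mask.tokenizer.mask_token)
    return [(p["token_str"], p["score"]) for p in fill_mask(masked, top_k=top_k)]
```

Calling `predict_masked("The capital of India is Delhi.", "Delhi")` would return the candidate tokens with their scores. For fine-tuning on a downstream task, the same checkpoint would typically be loaded through a task-specific `AutoModelFor...` class instead of the pipeline.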