rahular and oooooorange committed
Commit 5a51359
Parent(s): 11ef4a6

correct the first paragraph (#2)


- correct the first paragraph (33f0455ab0cb08961489ea4da8b3e122821bdb96)


Co-authored-by: Ziling Cheng <oooooorange@users.noreply.huggingface.co>

Files changed (1)
- README.md +1 -1
README.md CHANGED
@@ -7,7 +7,7 @@
 # Varta-T5
 
 ## Model Description
-Varta-BERT is a model pre-trained on the `full` training set of Varta in 14 Indic languages (Assamese, Bhojpuri, Bengali, Gujarati, Hindi, Kannada, Malayalam, Marathi, Nepali, Oriya, Punjabi, Tamil, Telugu, and Urdu) and English, using a masked language modeling (MLM) objective.
+Varta-T5 is a model pre-trained on the `full` training set of Varta in 14 Indic languages (Assamese, Bhojpuri, Bengali, Gujarati, Hindi, Kannada, Malayalam, Marathi, Nepali, Oriya, Punjabi, Tamil, Telugu, and Urdu) and English, using span corruption and gap-sentence generation as objectives.
 
 Varta is a large-scale news corpus for Indic languages, including 41.8 million news articles in 14 different Indic languages (and English), which come from a variety of high-quality sources.
 The dataset and the model are introduced in [this paper](https://arxiv.org/abs/2305.05858). The code is released in [this repository](https://github.com/rahular/varta). The data is released in [this bucket](https://console.cloud.google.com/storage/browser/varta-eu/data-release).
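For context on the two objectives named in the corrected paragraph: span corruption (the T5 denoising objective) replaces contiguous token spans with sentinel tokens and trains the model to regenerate them, while gap-sentence generation (the PEGASUS objective) does the same with whole sentences. The sketch below shows what a span-corruption training pair looks like with `transformers`; the checkpoint id `rahular/varta-t5` is an assumption based on this repository's name, not something stated in the diff.

```python
from transformers import AutoTokenizer, T5ForConditionalGeneration

# Assumed checkpoint id for the repo this README belongs to; swap in the
# actual model id if it differs.
tokenizer = AutoTokenizer.from_pretrained("rahular/varta-t5")
model = T5ForConditionalGeneration.from_pretrained("rahular/varta-t5")

# T5-style span corruption: masked spans in the input are marked with
# sentinel tokens (<extra_id_0>, <extra_id_1>, ...), and the target spells
# out the dropped spans in order, each introduced by its sentinel.
inputs = tokenizer(
    "Varta is a large-scale news corpus for <extra_id_0> languages.",
    return_tensors="pt",
)
labels = tokenizer(
    "<extra_id_0> Indic <extra_id_1>",
    return_tensors="pt",
).input_ids

# Standard seq2seq cross-entropy over the sentinel-delimited targets.
loss = model(**inputs, labels=labels).loss
print(float(loss))
```

Gap-sentence generation pairs are built the same way, except entire sentences (rather than sub-sentence spans) are removed from the input and concatenated to form the target.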