abhik1505040 committed
Commit 86242c1 • Parent(s): 7edb8fe
Update README.md

README.md CHANGED
@@ -5,9 +5,9 @@ licenses:
 - cc-by-nc-sa-4.0
 ---
 
-# BanglaBERT
+# BanglaBERT
 
-This repository contains the pretrained discriminator checkpoint of the model **BanglaBERT
+This repository contains the pretrained discriminator checkpoint of the model **BanglaBERT**. This is an [ELECTRA](https://openreview.net/pdf?id=r1xMH1BtvB) discriminator model pretrained with the Replaced Token Detection (RTD) objective. Finetuned models using this checkpoint achieve state-of-the-art results on many of the NLP tasks in Bengali.
 
 For finetuning on different downstream tasks such as `Sentiment classification`, `Named Entity Recognition`, `Natural Language Inference` etc., refer to the scripts in the official GitHub [repository](https://github.com/csebuetnlp/banglabert).
 
@@ -20,8 +20,8 @@ from transformers import AutoModelForPreTraining, AutoTokenizer
 from normalizer import normalize # pip install git+https://github.com/csebuetnlp/normalizer
 import torch
 
-model = AutoModelForPreTraining.from_pretrained("csebuetnlp/
-tokenizer = AutoTokenizer.from_pretrained("csebuetnlp/
+model = AutoModelForPreTraining.from_pretrained("csebuetnlp/banglabert_small")
+tokenizer = AutoTokenizer.from_pretrained("csebuetnlp/banglabert_small")
 
 original_sentence = "আমি কৃতজ্ঞ কারণ আপনি আমার জন্য অনেক কিছু করেছেন।"
 fake_sentence = "আমি হতাশ কারণ আপনি আমার জন্য অনেক কিছু করেছেন।"
@@ -58,9 +58,9 @@ print("\n" + "-" * 50)
 |[XLM-R (large)](https://huggingface.co/xlm-roberta-large) | 550M | 70.97 | 82.40 | 78.39 | 73.15/79.06 | 76.79 |
 |[sahajBERT](https://huggingface.co/neuropark/sahajBERT) | 18M | 71.12 | 76.92 | 70.94 | 65.48/70.69 | 71.03 |
 |[BanglishBERT](https://huggingface.co/csebuetnlp/banglishbert) | 110M | 70.61 | 80.95 | 76.28 | 72.43/78.40 | 75.73 |
-|[BanglaBERT
-|[BanglaBERT](https://huggingface.co/csebuetnlp/
+|[BanglaBERT](https://huggingface.co/csebuetnlp/banglabert) | 110M | 72.89 | 82.80 | 77.78 | 72.63/79.34 | **77.09** |
+|[BanglaBERT (Small)](https://huggingface.co/csebuetnlp/banglabert_small) | 13M | 69.29 | 76.75 | 73.41 | 63.30/69.65 | **70.38** |
 
 
 The benchmarking datasets are as follows:
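For context, the snippet edited by the second hunk is the README's Replaced Token Detection demo: the ELECTRA discriminator labels each token of a corrupted sentence as original or replaced. Only the two checkpoint lines change in this commit, so the sketch below stitches the visible context lines together with an assumed continuation (the tokenize/forward/print steps hinted at by the `print("\n" + "-" * 50)` line in the third hunk header), modeled on the standard ELECTRA discriminator example rather than taken from this diff:

```python
from transformers import AutoModelForPreTraining, AutoTokenizer
from normalizer import normalize  # pip install git+https://github.com/csebuetnlp/normalizer
import torch

# The commit points both calls at the small checkpoint.
model = AutoModelForPreTraining.from_pretrained("csebuetnlp/banglabert_small")
tokenizer = AutoTokenizer.from_pretrained("csebuetnlp/banglabert_small")

# "I am grateful because you have done a lot for me."
original_sentence = "আমি কৃতজ্ঞ কারণ আপনি আমার জন্য অনেক কিছু করেছেন।"
# The same sentence with "grateful" replaced by "disappointed".
fake_sentence = "আমি হতাশ কারণ আপনি আমার জন্য অনেক কিছু করেছেন।"

# Assumed continuation: normalize the corrupted sentence, run the
# discriminator, and threshold its per-token logits (a positive logit
# means the token is predicted to be replaced).
fake_sentence = normalize(fake_sentence)
fake_tokens = tokenizer.tokenize(fake_sentence)
fake_inputs = tokenizer.encode(fake_sentence, return_tensors="pt")
logits = model(fake_inputs).logits
predictions = torch.round((torch.sign(logits) + 1) / 2)

# Print each token above its replaced(1)/original(0) prediction,
# skipping the [CLS] and [SEP] positions.
for token in fake_tokens:
    print("%10s" % token, end="")
print("\n" + "-" * 50)
for p in predictions.squeeze().tolist()[1:-1]:
    print("%10d" % int(p), end="")
print()
```

If the checkpoint behaves as advertised, the substituted token (হতাশ) comes out flagged with 1 and the remaining tokens with 0.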