abhik1505040 committed
Commit: 86242c1
Parent(s): 7edb8fe

Update README.md

Files changed (1):
  1. README.md (+7 -7)

README.md CHANGED
@@ -5,9 +5,9 @@ licenses:
 - cc-by-nc-sa-4.0
 ---
 
-# BanglaBERT (small)
+# BanglaBERT
 
-This repository contains the pretrained discriminator checkpoint of the model **BanglaBERT (small)**. This is an [ELECTRA](https://openreview.net/pdf?id=r1xMH1BtvB) discriminator model pretrained with the Replaced Token Detection (RTD) objective. Finetuned models using this checkpoint achieve state-of-the-art results on many NLP tasks in Bengali.
+This repository contains the pretrained discriminator checkpoint of the model **BanglaBERT**. This is an [ELECTRA](https://openreview.net/pdf?id=r1xMH1BtvB) discriminator model pretrained with the Replaced Token Detection (RTD) objective. Finetuned models using this checkpoint achieve state-of-the-art results on many NLP tasks in Bengali.
 
 For finetuning on different downstream tasks such as `Sentiment classification`, `Named Entity Recognition`, `Natural Language Inference` etc., refer to the scripts in the official GitHub [repository](https://github.com/csebuetnlp/banglabert).
 
@@ -20,8 +20,8 @@ from transformers import AutoModelForPreTraining, AutoTokenizer
 from normalizer import normalize # pip install git+https://github.com/csebuetnlp/normalizer
 import torch
 
-model = AutoModelForPreTraining.from_pretrained("csebuetnlp/banglabert_large")
-tokenizer = AutoTokenizer.from_pretrained("csebuetnlp/banglabert_large")
+model = AutoModelForPreTraining.from_pretrained("csebuetnlp/banglabert_small")
+tokenizer = AutoTokenizer.from_pretrained("csebuetnlp/banglabert_small")
 
 original_sentence = "আমি কৃতজ্ঞ কারণ আপনি আমার জন্য অনেক কিছু করেছেন।"
 fake_sentence = "আমি হতাশ কারণ আপনি আমার জন্য অনেক কিছু করেছেন।"
@@ -58,9 +58,9 @@ print("\n" + "-" * 50)
 |[XLM-R (large)](https://huggingface.co/xlm-roberta-large) | 550M | 70.97 | 82.40 | 78.39 | 73.15/79.06 | 76.79 |
 |[sahajBERT](https://huggingface.co/neuropark/sahajBERT) | 18M | 71.12 | 76.92 | 70.94 | 65.48/70.69 | 71.03 |
 |[BanglishBERT](https://huggingface.co/csebuetnlp/banglishbert) | 110M | 70.61 | 80.95 | 76.28 | 72.43/78.40 | 75.73 |
-|[BanglaBERT (small)](https://huggingface.co/csebuetnlp/banglabert_small) | 14M | 69.29 | 76.75 | 73.41 | 63.30/69.65 | **70.40** |
-|[BanglaBERT](https://huggingface.co/csebuetnlp/banglabert) | 110M | 72.89 | 82.80 | 77.78 | 72.63/79.34 | 77.09 |
-|[BanglaBERT (large)](https://huggingface.co/csebuetnlp/banglabert_large) | 335M | 71.94 | 83.41 | 79.20 | 76.10/81.50 | **78.43** |
+|[BanglaBERT](https://huggingface.co/csebuetnlp/banglabert) | 110M | 72.89 | 82.80 | 77.78 | 72.63/79.34 | **77.09** |
+|[BanglaBERT (Small)](https://huggingface.co/csebuetnlp/banglabert_small) | 13M | 69.29 | 76.75 | 73.41 | 63.30/69.65 | **70.38** |
+
 
 
 The benchmarking datasets are as follows:
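The second hunk above edits the README's replaced-token-detection demo, of which the diff context shows only the opening lines (imports, model loading, the two sentences) and the closing `print("\n" + "-" * 50)`. Below is a minimal sketch of the complete demo, assuming the elided middle follows the standard ELECTRA discriminator pattern; the normalization, encoding, and logit-thresholding steps are reconstructed, not copied from this commit:

```python
from transformers import AutoModelForPreTraining, AutoTokenizer
from normalizer import normalize  # pip install git+https://github.com/csebuetnlp/normalizer
import torch

model = AutoModelForPreTraining.from_pretrained("csebuetnlp/banglabert_small")
tokenizer = AutoTokenizer.from_pretrained("csebuetnlp/banglabert_small")

original_sentence = "আমি কৃতজ্ঞ কারণ আপনি আমার জন্য অনেক কিছু করেছেন।"
fake_sentence = "আমি হতাশ কারণ আপনি আমার জন্য অনেক কিছু করেছেন।"

# The model was pretrained on normalized text, so normalize before encoding.
fake_sentence = normalize(fake_sentence)
fake_tokens = tokenizer.tokenize(fake_sentence)
fake_inputs = tokenizer.encode(fake_sentence, return_tensors="pt")

# The ELECTRA discriminator emits one logit per token; a positive logit
# means the token is predicted to be a replacement.
discriminator_outputs = model(fake_inputs).logits
predictions = torch.round((torch.sign(discriminator_outputs) + 1) / 2)

# Print each token next to its replaced(1)/original(0) prediction,
# dropping the [CLS] and [SEP] positions.
for token, pred in zip(fake_tokens, predictions.squeeze().tolist()[1:-1]):
    print(f"{token}\t{int(pred)}")
print("-" * 50)
```

If the checkpoint behaves as described, the substituted word (হতাশ in place of কৃতজ্ঞ) should be the token flagged with 1.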
 
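The README also defers downstream fine-tuning to the official scripts in the GitHub repository. As a hedged illustration only (the label count and task here are placeholders, not the repository's recipe), the checkpoint can be loaded under a standard Hugging Face task head like this:

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer
from normalizer import normalize  # same text normalizer used at pretraining time

tokenizer = AutoTokenizer.from_pretrained("csebuetnlp/banglabert_small")
model = AutoModelForSequenceClassification.from_pretrained(
    "csebuetnlp/banglabert_small",
    num_labels=2,  # hypothetical label count; set per task
)

inputs = tokenizer(
    normalize("আমি কৃতজ্ঞ কারণ আপনি আমার জন্য অনেক কিছু করেছেন।"),
    return_tensors="pt",
)
logits = model(**inputs).logits  # the classification head is freshly initialized; train it before use
```

For the maintained fine-tuning pipelines (sentiment classification, NER, NLI), refer to https://github.com/csebuetnlp/banglabert.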