abhik1505040 committed
Commit 86242c1 • Parent(s): 7edb8fe
Update README.md

README.md CHANGED
@@ -5,9 +5,9 @@ licenses:
 - cc-by-nc-sa-4.0
 ---
 
-# BanglaBERT
+# BanglaBERT
 
-This repository contains the pretrained discriminator checkpoint of the model **BanglaBERT
+This repository contains the pretrained discriminator checkpoint of the model **BanglaBERT**. This is an [ELECTRA](https://openreview.net/pdf?id=r1xMH1BtvB) discriminator model pretrained with the Replaced Token Detection (RTD) objective. Finetuned models using this checkpoint achieve state-of-the-art results on many of the NLP tasks in Bengali.
 
 For finetuning on different downstream tasks such as `Sentiment classification`, `Named Entity Recognition`, `Natural Language Inference` etc., refer to the scripts in the official GitHub [repository](https://github.com/csebuetnlp/banglabert).
 
@@ -20,8 +20,8 @@ from transformers import AutoModelForPreTraining, AutoTokenizer
 from normalizer import normalize # pip install git+https://github.com/csebuetnlp/normalizer
 import torch
 
-model = AutoModelForPreTraining.from_pretrained("csebuetnlp/
-tokenizer = AutoTokenizer.from_pretrained("csebuetnlp/
+model = AutoModelForPreTraining.from_pretrained("csebuetnlp/banglabert_small")
+tokenizer = AutoTokenizer.from_pretrained("csebuetnlp/banglabert_small")
 
 original_sentence = "আমি কৃতজ্ঞ কারণ আপনি আমার জন্য অনেক কিছু করেছেন।"
 fake_sentence = "আমি হতাশ কারণ আপনি আমার জন্য অনেক কিছু করেছেন।"
@@ -58,9 +58,9 @@ print("\n" + "-" * 50)
 |[XLM-R (large)](https://huggingface.co/xlm-roberta-large) | 550M | 70.97 | 82.40 | 78.39 | 73.15/79.06 | 76.79 |
 |[sahajBERT](https://huggingface.co/neuropark/sahajBERT) | 18M | 71.12 | 76.92 | 70.94 | 65.48/70.69 | 71.03 |
 |[BanglishBERT](https://huggingface.co/csebuetnlp/banglishbert) | 110M | 70.61 | 80.95 | 76.28 | 72.43/78.40 | 75.73 |
-|[BanglaBERT
-|[BanglaBERT](https://huggingface.co/csebuetnlp/
+|[BanglaBERT](https://huggingface.co/csebuetnlp/banglabert) | 110M | 72.89 | 82.80 | 77.78 | 72.63/79.34 | **77.09** |
+|[BanglaBERT (Small)](https://huggingface.co/csebuetnlp/banglabert_small) | 13M | 69.29 | 76.75 | 73.41 | 63.30/69.65 | **70.38** |
 
 
 The benchmarking datasets are as follows:
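For context, the snippet edited by the second hunk is the README's Replaced Token Detection demo: the ELECTRA discriminator labels each token of a corrupted sentence as original or replaced. Only the two checkpoint lines change in this commit, so the sketch below stitches the visible context lines together with an assumed continuation (the tokenize/forward/print steps hinted at by the `print("\n" + "-" * 50)` line in the third hunk header), modeled on the standard ELECTRA discriminator example rather than taken from this diff:

```python
from transformers import AutoModelForPreTraining, AutoTokenizer
from normalizer import normalize  # pip install git+https://github.com/csebuetnlp/normalizer
import torch

# The commit points both calls at the small checkpoint.
model = AutoModelForPreTraining.from_pretrained("csebuetnlp/banglabert_small")
tokenizer = AutoTokenizer.from_pretrained("csebuetnlp/banglabert_small")

# "I am grateful because you have done a lot for me."
original_sentence = "আমি কৃতজ্ঞ কারণ আপনি আমার জন্য অনেক কিছু করেছেন।"
# The same sentence with "grateful" replaced by "disappointed".
fake_sentence = "আমি হতাশ কারণ আপনি আমার জন্য অনেক কিছু করেছেন।"

# Assumed continuation: normalize the corrupted sentence, run the
# discriminator, and threshold its per-token logits (a positive logit
# means the token is predicted to be replaced).
fake_sentence = normalize(fake_sentence)
fake_tokens = tokenizer.tokenize(fake_sentence)
fake_inputs = tokenizer.encode(fake_sentence, return_tensors="pt")
logits = model(fake_inputs).logits
predictions = torch.round((torch.sign(logits) + 1) / 2)

# Print each token above its replaced(1)/original(0) prediction,
# skipping the [CLS] and [SEP] positions.
for token in fake_tokens:
    print("%10s" % token, end="")
print("\n" + "-" * 50)
for p in predictions.squeeze().tolist()[1:-1]:
    print("%10d" % int(p), end="")
print()
```

If the checkpoint behaves as advertised, the substituted token (হতাশ) comes out flagged with 1 and the remaining tokens with 0.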