neuropark/sahajBERT-NER

@@ -1,43 +1,95 @@
 ---
-tags: autonlp
 language: bn
-widget:
-- text: "I love AutoNLP 🤗"
-datasets:
-- albertvillanova/autonlp-data-baselines-wikiann-entity_extraction
 ---
-# Model Trained Using AutoNLP
-- Problem type: Entity Extraction
-- Model ID: 1341171
-## Validation Metrics
-- Loss: 0.13715848326683044
-- Accuracy: 0.9730101212045483
-- Precision: 0.0
-- Recall: 0.0
-- F1: 0.0
-## Usage
-You can use cURL to access this model:
-```
-$ curl -X POST -H "Authorization: Bearer YOUR_API_KEY" -H "Content-Type: application/json" -d '{"inputs": "I love AutoNLP"}' https://api-inference.huggingface.co/models/albertvillanova/autonlp-baselines-wikiann-entity_extraction-1341171
-```
-Or Python API:
 ```
-from transformers import AutoModelForTokenClassification, AutoTokenizer
-model = AutoModelForTokenClassification.from_pretrained("albertvillanova/autonlp-baselines-wikiann-entity_extraction-1341171", use_auth_token=True)
-tokenizer = AutoTokenizer.from_pretrained("albertvillanova/autonlp-baselines-wikiann-entity_extraction-1341171", use_auth_token=True)
-inputs = tokenizer("I love AutoNLP", return_tensors="pt")
-outputs = model(**inputs)
-```

 ---
 language: bn
+tags:
+- collaborative
+- bengali
+- NER
+license: apache-2.0
+datasets: xtreme
+metrics:
+- Loss
+- Accuracy
+- Precision
+- Recall
 ---
+# sahajBERT Named Entity Recognition
+## Model description
+[sahajBERT](https://huggingface.co/neuropark/sahajBERT-NER) fine-tuned for named entity recognition (NER) on the Bengali split of [WikiANN](https://huggingface.co/datasets/wikiann).
+Named entity labels predicted by the model:
+| Label id | Label |
+|:--------:|:----:|
+|0 |O|
+|1 |B-PER|
+|2 |I-PER|
+|3 |B-ORG|
+|4 |I-ORG|
+|5 |B-LOC|
+|6 |I-LOC|
+## Intended uses & limitations
+#### How to use
+You can use this model directly with a pipeline for token classification:
+```python
+from transformers import AlbertForTokenClassification, TokenClassificationPipeline, PreTrainedTokenizerFast
+# Initialize tokenizer
+tokenizer = PreTrainedTokenizerFast.from_pretrained("neuropark/sahajBERT-NER")
+# Initialize model
+model = AlbertForTokenClassification.from_pretrained("neuropark/sahajBERT-NER")
+# Initialize pipeline
+pipeline = TokenClassificationPipeline(tokenizer=tokenizer, model=model)
+raw_text = "এই ইউনিয়নে ৩ টি মৌজা ও ১০ টি গ্রাম আছে ।" # Change me
+output = pipeline(raw_text)
 ```
+#### Limitations and bias
+<!-- Provide examples of latent issues and potential remediations. -->
+WIP
+## Training data
+The model was initialized with the pre-trained weights of [sahajBERT](https://huggingface.co/neuropark/sahajBERT-NER) at step 2489 and fine-tuned on the Bengali split of [WikiANN](https://huggingface.co/datasets/wikiann).
+## Training procedure
+Coming soon!
+<!-- ```bibtex
+@inproceedings{...,
+  year={2020}
+}
+``` -->
+## Eval results
+| Metric | Value |
+|:------:|:-----:|
+|accuracy |0.9291424418604651|
+|f1 |0.8475143403441683|
+|loss |0.2975200116634369|
+|precision |0.8254189944134078|
+|recall |0.8708251473477406|
+### BibTeX entry and citation info
+Coming soon!
+<!-- ```bibtex
+@inproceedings{...,
+  year={2020}
+}
+``` -->
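Note on the label table in the new README: the `B-`/`I-` prefixes indicate a BIO tagging scheme, so per-token predictions usually need to be merged into entity spans before use. The following is a minimal sketch of that merge, assuming predictions are available as integer label ids in the table's order. `ID2LABEL` copies the table; `group_entities` and the sample tokens and ids are hypothetical and are not part of this repository.

```python
# Label ids copied from the README table.
ID2LABEL = {
    0: "O",
    1: "B-PER", 2: "I-PER",
    3: "B-ORG", 4: "I-ORG",
    5: "B-LOC", 6: "I-LOC",
}


def group_entities(tokens, label_ids):
    """Merge BIO-tagged tokens into (entity_type, [tokens]) spans."""
    spans = []
    current = None  # the span currently being extended, if any
    for token, label_id in zip(tokens, label_ids):
        label = ID2LABEL[label_id]
        if label == "O":
            current = None
            continue
        prefix, entity_type = label.split("-", 1)
        if prefix == "I" and current is not None and current[0] == entity_type:
            # Continuation of the open span.
            current[1].append(token)
        else:
            # A B- tag, or an I- tag with no matching open span: start a new span.
            current = (entity_type, [token])
            spans.append(current)
    return spans


# Hypothetical tokens and predictions, for illustration only.
tokens = ["Rabindranath", "Tagore", "lived", "in", "Kolkata"]
label_ids = [1, 2, 0, 0, 5]
entities = group_entities(tokens, label_ids)
print(entities)
```

Treating a stray `I-` tag as the start of a new span is one possible design choice; dropping such tokens instead would be equally defensible.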

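Note on the "Eval results" values: the README does not say how F1 was computed. Assuming it is the usual harmonic mean of precision and recall, the reported figures can be cross-checked with a few lines. The three constants are copied from the README; everything else is illustrative.

```python
# Values copied from the "Eval results" section of the new README.
precision = 0.8254189944134078
recall = 0.8708251473477406
reported_f1 = 0.8475143403441683

# Assumption: F1 is the harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)

print(f"recomputed F1 = {f1:.6f}, reported F1 = {reported_f1:.6f}")
print("consistent:", abs(f1 - reported_f1) < 1e-9)
```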
config.json

@@ -1,5 +1,5 @@
 {
-  "_name_or_path": "AutoNLP",
   "_num_labels": 7,
   "architectures": [
     "AlbertForTokenClassification"
@@ -36,7 +36,7 @@
     "6": 6
   },
   "layer_norm_eps": 1e-12,
-  "max_length": 128,
   "max_position_embeddings": 512,
   "model_type": "albert",
   "net_structure_type": 0,
@@ -47,7 +47,7 @@
   "pad_token_id": 0,
   "padding": "max_length",
   "position_embedding_type": "absolute",
-  "transformers_version": "4.5.1",
   "type_vocab_size": 2,
   "vocab_size": 32000
 }

 {
+  "_name_or_path": "albertvillanova/autonlp-wikiann-entity_extraction-0c6d343-101875",
   "_num_labels": 7,
   "architectures": [
     "AlbertForTokenClassification"
     "6": 6
   },
   "layer_norm_eps": 1e-12,
+  "max_length": 96,
   "max_position_embeddings": 512,
   "model_type": "albert",
   "net_structure_type": 0,
   "pad_token_id": 0,
   "padding": "max_length",
   "position_embedding_type": "absolute",
+  "transformers_version": "4.6.1",
   "type_vocab_size": 2,
   "vocab_size": 32000
 }

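Note on the `max_length` change (128 to 96): the config alone does not show how this field is consumed. Assuming it is a tokenization setting that works together with `"padding": "max_length"` and `"pad_token_id": 0`, every input would be cut or padded to exactly 96 token ids. The sketch below illustrates that behavior in plain Python; `pad_or_truncate` and the sample ids are hypothetical, and real tokenizers also preserve special tokens when truncating, which this simplification ignores.

```python
MAX_LENGTH = 96    # "max_length" in the new config (previously 128)
PAD_TOKEN_ID = 0   # "pad_token_id" in the config


def pad_or_truncate(input_ids, max_length=MAX_LENGTH, pad_id=PAD_TOKEN_ID):
    """Return (ids, attention_mask), both exactly max_length long."""
    ids = list(input_ids[:max_length])          # truncate long inputs
    mask = [1] * len(ids)                       # 1 marks a real token
    padding = max_length - len(ids)
    return ids + [pad_id] * padding, mask + [0] * padding


# Hypothetical token ids, for illustration only.
short_ids, short_mask = pad_or_truncate([2, 15, 27, 3])
long_ids, long_mask = pad_or_truncate(list(range(1, 131)))

print(len(short_ids), sum(short_mask))
print(len(long_ids), sum(long_mask))
```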
pytorch_model.bin

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:63e33ea5155516d7f13890f2acf28b5e3f9d23141d23d515ab62a28ed94635e1
-size 67605529

 version https://git-lfs.github.com/spec/v1
+oid sha256:42080d3ed92e65c13c467829d36c6f803f5a64587a55089aa8f7e94dffaf62cb
+size 67605209
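Note on the `pytorch_model.bin` change: the diff shows Git LFS pointer files, not the weights themselves. Each pointer is a short list of `key value` lines, so the two versions can be compared with a small parser. The pointer texts below are copied from the diff; `parse_lfs_pointer` is a hypothetical helper written for this sketch.

```python
OLD_POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:63e33ea5155516d7f13890f2acf28b5e3f9d23141d23d515ab62a28ed94635e1
size 67605529
"""

NEW_POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:42080d3ed92e65c13c467829d36c6f803f5a64587a55089aa8f7e94dffaf62cb
size 67605209
"""


def parse_lfs_pointer(text):
    """Parse the 'key value' lines of a Git LFS pointer into a dict."""
    fields = {}
    for line in text.splitlines():
        if line.strip():
            key, _, value = line.partition(" ")
            fields[key] = value
    fields["size"] = int(fields["size"])  # size is given in bytes
    return fields


old = parse_lfs_pointer(OLD_POINTER)
new = parse_lfs_pointer(NEW_POINTER)

print("weights changed:", old["oid"] != new["oid"])
print("size delta in bytes:", new["size"] - old["size"])
```

The differing `oid` values confirm that new weights were uploaded; the small size difference suggests the architecture is unchanged, though the pointers alone cannot prove that.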