ksirts committed on
Commit
e658cb8
1 Parent(s): f1e0a96

Update to README.md

Files changed (1)
  1. README.md +42 -1
README.md CHANGED
@@ -1 +1,42 @@
- EstBERT_NER
+ # EstBERT_NER
+
+ ## Model description
+
+ EstBERT_NER is a fine-tuned EstBERT model for named entity recognition (NER). It was trained on the Estonian NER dataset created by [Tkachenko et al.](https://www.aclweb.org/anthology/W13-2412.pdf) and recognizes three types of entities: locations (LOC), organizations (ORG), and persons (PER).
+
+ ## How to use
+
+ You can use this model with the Transformers `pipeline` for NER. Post-processing of the results may be necessary because the model occasionally tags subword tokens as entities.
+
+ ```
+ from transformers import BertTokenizer, BertForTokenClassification
+ from transformers import pipeline
+
+ # Load the fine-tuned tokenizer and token-classification model from the Hub
+ tokenizer = BertTokenizer.from_pretrained('tartuNLP/EstBERT_NER')
+ bertner = BertForTokenClassification.from_pretrained('tartuNLP/EstBERT_NER')
+
+ # Build an NER pipeline and tag an Estonian sentence
+ nlp = pipeline("ner", model=bertner, tokenizer=tokenizer)
+ sentence = 'Eesti Ekspressi teada on Eesti Pank uurinud Hansapanga tehinguid , mis toimusid kaks aastat tagasi suvel ja mille käigus voolas panka ligi miljardi krooni ulatuses kahtlast raha .'
+
+ ner_results = nlp(sentence)
+ print(ner_results)
+ ```
+ ```
+ [{'word': 'Eesti', 'score': 0.9964128136634827, 'entity': 'B-ORG', 'index': 1},
+  {'word': 'Ekspressi', 'score': 0.9978809356689453, 'entity': 'I-ORG', 'index': 2},
+  {'word': 'Eesti', 'score': 0.9988121390342712, 'entity': 'B-ORG', 'index': 5},
+  {'word': 'Pank', 'score': 0.9985784292221069, 'entity': 'I-ORG', 'index': 6},
+  {'word': 'Hansapanga', 'score': 0.9979034662246704, 'entity': 'B-ORG', 'index': 8}]
+ ```
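
The post-processing mentioned above can take several forms. The following is a minimal sketch of one option, not the authors' method: it merges consecutive `B-`/`I-` predictions into whole entity spans and glues subword pieces back onto the preceding token. The helper name `group_entities` is hypothetical, the `##` prefix is assumed to be the WordPiece continuation marker used by the tokenizer, and the sample input is copied from the output shown above so the sketch runs without downloading the model. Recent versions of Transformers also offer an `aggregation_strategy` argument on the NER pipeline that may make a helper like this unnecessary.

```python
def group_entities(ner_results):
    """Merge token-level B-/I- predictions into whole entity spans.

    Hypothetical helper, not part of Transformers or of this model's code.
    """
    entities = []
    current = None      # entity span currently being built
    prev_index = None   # token index of the previously seen prediction

    for tok in ner_results:
        word, index = tok["word"], tok["index"]
        prefix, _, label = tok["entity"].partition("-")
        is_subword = word.startswith("##")  # assumed WordPiece continuation marker
        adjacent = prev_index is not None and index == prev_index + 1

        # Extend the open span if this token directly follows it and is either
        # a subword piece or an I- tag of the same entity type.
        if current is not None and adjacent and (
            is_subword or (prefix == "I" and label == current["label"])
        ):
            current["text"] += word[2:] if is_subword else " " + word
            current["score"] = min(current["score"], tok["score"])
        else:
            if current is not None:
                entities.append(current)
            # Any other token, including a stray subword piece, opens a new span.
            current = {
                "label": label,
                "text": word[2:] if is_subword else word,
                "score": tok["score"],
            }
        prev_index = index

    if current is not None:
        entities.append(current)
    return entities


# Sample input copied from the pipeline output shown above.
sample = [
    {"word": "Eesti", "score": 0.9964128136634827, "entity": "B-ORG", "index": 1},
    {"word": "Ekspressi", "score": 0.9978809356689453, "entity": "I-ORG", "index": 2},
    {"word": "Eesti", "score": 0.9988121390342712, "entity": "B-ORG", "index": 5},
    {"word": "Pank", "score": 0.9985784292221069, "entity": "I-ORG", "index": 6},
    {"word": "Hansapanga", "score": 0.9979034662246704, "entity": "B-ORG", "index": 8},
]

for entity in group_entities(sample):
    print(entity["label"], entity["text"])
# ORG Eesti Ekspressi
# ORG Eesti Pank
# ORG Hansapanga
```

Each merged span keeps the lowest token score as a conservative confidence value; averaging the scores would be an equally reasonable choice.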
+
+ ## BibTeX entry and citation info
+
+ ```
+ @misc{tanvir2020estbert,
+   title={EstBERT: A Pretrained Language-Specific BERT for Estonian},
+   author={Hasan Tanvir and Claudia Kittask and Kairit Sirts},
+   year={2020},
+   eprint={2011.04784},
+   archivePrefix={arXiv},
+   primaryClass={cs.CL}
+ }
+ ```