dicta-il/otobert

Shaltiel committed on Sep 28

Commit

bf54b0f

1 Parent(s): e4c44b3

Update README.md

README.md +56 -3

@@ -1,3 +1,56 @@
----
-license: cc-by-4.0
----

+---
+license: cc-by-4.0
+language:
+- he
+---
+# OtoBERT: Identifying Suffixed Verbal Forms in Modern Hebrew Literature
+OtoBERT is a new language model for Hebrew, designed specifically for identifying suffixed verbal forms in Modern Hebrew and released [here](https://arxiv.org/abs/2308.16687).
+This is the base model pretrained with the masked-language-modeling objective.
+This model was trained with a special tokenizer that combines the bound suffix of an object pronoun into a single unit (e.g., `ראיתי אותו` becomes one unit), and it was also trained to predict those units during the mask prediction stage. For more details, please check out the paper listed on this page.
+Sample usage:
+```python
+import torch
+from transformers import AutoModelForMaskedLM, AutoTokenizer
+tokenizer = AutoTokenizer.from_pretrained('dicta-il/otobert')
+model = AutoModelForMaskedLM.from_pretrained('dicta-il/otobert')
+model.eval()
+sentence = 'אני לא יכול להגיד לך מתי [MASK] לאחרונה.' # Supposed to be ראיתי אותו
+output = model(tokenizer.encode(sentence, return_tensors='pt'))
+# the [MASK] is at index 7 (counting [CLS] as index 0)
+top_2 = torch.topk(output.logits[0, 7, :], 2)[1]
+print('\n'.join(tokenizer.convert_ids_to_tokens(top_2))) # should print נפגשנו / ראיתי_אותו
+```
+## Citation
+If you use OtoBERT in your research, please cite "OtoBERT: Identifying Suffixed Verbal Forms in Modern Hebrew Literature".
+**BibTeX:**
+```bibtex
+tbd
+```
+## License
+Shield: [![CC BY 4.0][cc-by-shield]][cc-by]
+This work is licensed under a
+[Creative Commons Attribution 4.0 International License][cc-by].
+[![CC BY 4.0][cc-by-image]][cc-by]
+[cc-by]: http://creativecommons.org/licenses/by/4.0/
+[cc-by-image]: https://i.creativecommons.org/l/by/4.0/88x31.png
+[cc-by-shield]: https://img.shields.io/badge/License-CC%20BY%204.0-lightgrey.svg
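
The sample usage in the README hard-codes the `[MASK]` position as index 7, which holds only for that exact sentence. A minimal sketch of locating the position from the token ids instead is shown below. The helper names `find_mask_positions` and `single_mask_position` are hypothetical and not part of the model card or of `transformers`, and the ids in the demo are toy values, not real OtoBERT vocabulary ids.

```python
from typing import List, Sequence


def find_mask_positions(input_ids: Sequence[int], mask_token_id: int) -> List[int]:
    """Return every index in input_ids that holds the mask token id."""
    return [i for i, tok in enumerate(input_ids) if tok == mask_token_id]


def single_mask_position(input_ids: Sequence[int], mask_token_id: int) -> int:
    """Return the index of the only mask token, or raise if there is not exactly one."""
    positions = find_mask_positions(input_ids, mask_token_id)
    if len(positions) != 1:
        raise ValueError(f"expected exactly one mask token, found {len(positions)}")
    return positions[0]


# Toy ids for illustration only (assumed values, not taken from the real vocabulary).
CLS, SEP, MASK = 101, 102, 103
toy_ids = [CLS, 11, 12, 13, 14, 15, 16, MASK, 17, SEP]

print(single_mask_position(toy_ids, MASK))  # prints 7
```

With the objects from the README sample, this would presumably be called as `single_mask_position(tokenizer.encode(sentence), tokenizer.mask_token_id)`, and the returned index would replace the literal `7` in `output.logits[0, 7, :]`.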