Update README.md
README.md
@@ -1,3 +1,72 @@
---
license: cc-by-4.0
language:
- he
---
# DictaBERT-Large: A State-of-the-Art BERT-Large Suite for Modern Hebrew

State-of-the-art language model for Hebrew, released [here](https://arxiv.org/abs/2308.16687).

This is the BERT-large base model pretrained with the masked-language-modeling objective.

For the BERT-base models for other tasks, see [here](https://huggingface.co/collections/dicta-il/dictabert-6588e7cc08f83845fc42a18b).

For the BERT-large models for other tasks, see [to-be-added].

Sample usage (question answering with `dicta-il/dictabert-large-heq`):

```python
from transformers import pipeline

# Load the Hebrew extractive question-answering pipeline
oracle = pipeline('question-answering', model='dicta-il/dictabert-large-heq')


context = 'בניית פרופילים של משתמשים נחשבת על ידי רבים כאיום פוטנציאלי על הפרטיות. מסיבה זו הגבילו חלק מהמדינות באמצעות חקיקה את המידע שניתן להשיג באמצעות עוגיות ואת אופן השימוש בעוגיות. ארצות הברית, למשל, קבעה חוקים נוקשים בכל הנוגע ליצירת עוגיות חדשות. חוקים אלו, אשר נקבעו בשנת 2000, נקבעו לאחר שנחשף כי המשרד ליישום המדיניות של הממשל האמריקאי נגד השימוש בסמים (ONDCP) בבית הלבן השתמש בעוגיות כדי לעקוב אחרי משתמשים שצפו בפרסומות נגד השימוש בסמים במטרה לבדוק האם משתמשים אלו נכנסו לאתרים התומכים בשימוש בסמים. דניאל בראנט, פעיל הדוגל בפרטיות המשתמשים באינטרנט, חשף כי ה-CIA שלח עוגיות קבועות למחשבי אזרחים במשך עשר שנים. ב-25 בדצמבר 2005 גילה בראנט כי הסוכנות לביטחון לאומי (ה-NSA) השאירה שתי עוגיות קבועות במחשבי מבקרים בגלל שדרוג תוכנה. לאחר שהנושא פורסם, הם ביטלו מיד את השימוש בהן.'
question = 'כיצד הוגבל המידע שניתן להשיג באמצעות העוגיות?'

# Returns a dict with the answer text, its span in `context`, and a confidence score
oracle(question=question, context=context)
```

Output:
```json
{
  "score": 0.998887836933136,
  "start": 101,
  "end": 114,
  "answer": "באמצעות חקיקה"
}
```
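
The `start` and `end` values in the output appear to be character offsets into `context`, so slicing the context with them should give back `answer`. The following is a minimal sketch of that check, not part of the original sample: `answer_matches_span` is a hypothetical helper (it is not a `transformers` function), and the context is shortened to its first two sentences, which is enough to cover offsets 101 to 114.

```python
# Hypothetical helper (not part of transformers): checks that the character
# offsets in a question-answering result point at the reported answer text.
def answer_matches_span(context: str, result: dict) -> bool:
    return context[result["start"]:result["end"]] == result["answer"]


# Assumption: only the first two sentences of the sample context are used here.
context_prefix = (
    "בניית פרופילים של משתמשים נחשבת על ידי רבים כאיום פוטנציאלי על הפרטיות. "
    "מסיבה זו הגבילו חלק מהמדינות באמצעות חקיקה את המידע שניתן להשיג באמצעות עוגיות ואת אופן השימוש בעוגיות."
)

# The result dict shown in the sample output.
result = {
    "score": 0.998887836933136,
    "start": 101,
    "end": 114,
    "answer": "באמצעות חקיקה",
}

print(answer_matches_span(context_prefix, result))
```

A check like this can be useful when highlighting the answer inside the original passage, since it confirms the offsets and the answer string agree before they are used for display.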

## Citation

If you use DictaBERT in your research, please cite `DictaBERT: A State-of-the-Art BERT Suite for Modern Hebrew`.

**BibTeX:**

```bibtex
@misc{shmidman2023dictabert,
  title={DictaBERT: A State-of-the-Art BERT Suite for Modern Hebrew},
  author={Shaltiel Shmidman and Avi Shmidman and Moshe Koppel},
  year={2023},
  eprint={2308.16687},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}
```

## License

Shield: [![CC BY 4.0][cc-by-shield]][cc-by]

This work is licensed under a
[Creative Commons Attribution 4.0 International License][cc-by].

[![CC BY 4.0][cc-by-image]][cc-by]

[cc-by]: http://creativecommons.org/licenses/by/4.0/
[cc-by-image]: https://i.creativecommons.org/l/by/4.0/88x31.png
[cc-by-shield]: https://img.shields.io/badge/License-CC%20BY%204.0-lightgrey.svg