Commit db7bf93 by pranaydeeps (parent: f9d785f): Update README.md

README.md (after this commit):
# Ancient Greek BERT

<img src="https://ichef.bbci.co.uk/images/ic/832xn/p02m4gzb.jpg"/>

The first and only available Ancient Greek sub-word BERT model!

Please refer to our paper titled: "A Pilot Study for BERT Language Modelling and Morphological Analysis for Ancient and Medieval Greek"

## How to use
Requirements:

```bash
pip install transformers
pip install flair
```

(`unicodedata` is also used, but it is part of the Python standard library and needs no separate installation.)
25 |
Can be directly used from the HuggingFace Model Hub with:
|
26 |
|
27 |
+
|
28 |
```python
|
29 |
from transformers import AutoTokenizer, AutoModel
|
30 |
tokeniser = AutoTokenizer.from_pretrained("pranaydeeps/Ancient-Greek-BERT")
|
31 |
model = AutoModel.from_pretrained("pranaydeeps/Ancient-Greek-BERT")
|
32 |
```
|
33 |
+
|
34 |
+
## Fine-tuning for POS/Morphological Analysis
|
35 |
+
|
36 |
+
Please refer the GitHub repository for the code and details regarding fine-tuning
|
37 |
+
|
38 |
## Training data
|
39 |
|
40 |
The model was initialised from [AUEB NLP Group's Greek BERT](https://huggingface.co/nlpaueb/bert-base-greek-uncased-v1)
|
|
|
44 |
## Training and Eval details
|
45 |
|
46 |
Standard de-accentuating and lower-casing for Greek as suggested in [AUEB NLP Group's Greek BERT](https://huggingface.co/nlpaueb/bert-base-greek-uncased-v1)
|
47 |
+
The model was trained on 4 NVIDIA Tesla V100 16GB GPUs for 80 epochs, with a max-seq-len of 512 and results in a perplexity of 4.8 on the held out test set.
|
48 |
+
It also gives state-of-the-art results when fine-tuned for PoS Tagging and Morphological Analysis on all 3 treebanks averaging >90% accuracy. Please consult our paper or contact [me](mailto:pranaydeep.singh@ugent.be) for further questions!
|
49 |
|
50 |
+
## Cite
|
51 |
|
52 |
+
If you end up using Ancient-Greek-BERT in your research, please cite the paper:
|
53 |
+
|
54 |
+
```
|
55 |
+
@inproceedings{ancient-greek-bert,
|
56 |
+
author = {Singh, Pranaydeep and Rutten, Gorik and Lefever, Els},
|
57 |
+
title = {A Pilot Study for BERT Language Modelling and Morphological Analysis for Ancient and Medieval Greek},
|
58 |
+
year = {2021},
|
59 |
+
booktitle = {The 5th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature (LaTeCH-CLfL 2021)}
|
60 |
+
}
|
61 |
+
```
|