# Logion: Machine Learning for Greek Philology

The most advanced Ancient Greek BERT model trained to date! Read the paper on [arXiv](https://arxiv.org/abs/2305.01099) by Charlie Cowen-Breen, Creston Brooks, Johannes Haubold, and Barbara Graziosi.

We train a WordPiece tokenizer (with a vocabulary size of 50,000) on a corpus of over 70 million words of premodern Greek. Using this tokenizer and the same corpus, we train a BERT model.
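
As a minimal sketch of what the tokenizer does (the Greek phrase is purely illustrative, and the snippet assumes the `transformers` library is installed as described below):

```python
from transformers import BertTokenizer

# Download the 50,000-entry WordPiece tokenizer from the Hugging Face Hub.
tokenizer = BertTokenizer.from_pretrained("cabrooks/LOGION-50k_wordpiece")

# An illustrative premodern Greek phrase (the opening of the Iliad);
# words outside the vocabulary are split into subword pieces marked "##".
print(tokenizer.tokenize("μῆνιν ἄειδε θεὰ Πηληϊάδεω Ἀχιλῆος"))
```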

Further information on this project and code for error detection can be found on [GitHub](https://github.com/charliecb/Logion).

We're adding more models trained on cleaner data and with different tokenizations, so keep an eye out!

## How to use

Requirements:

```shell
pip install transformers
```

Load the model and tokenizer directly from the Hugging Face Model Hub:

```python
from transformers import BertTokenizer, BertForMaskedLM

# Both the tokenizer and the model weights are downloaded from the
# Hugging Face Hub on first use and cached locally thereafter.
tokenizer = BertTokenizer.from_pretrained("cabrooks/LOGION-50k_wordpiece")
model = BertForMaskedLM.from_pretrained("cabrooks/LOGION-50k_wordpiece")
```
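
Since the model is loaded as `BertForMaskedLM`, the natural way to query it is masked-token prediction. The sketch below is only an illustration (the Greek input and the top-5 readout are our assumptions, not an official example, and it additionally requires `torch`); see the GitHub repository above for the project's actual error-detection code:

```python
import torch
from transformers import BertTokenizer, BertForMaskedLM

tokenizer = BertTokenizer.from_pretrained("cabrooks/LOGION-50k_wordpiece")
model = BertForMaskedLM.from_pretrained("cabrooks/LOGION-50k_wordpiece")
model.eval()

# Mask one word in an illustrative Greek phrase.
inputs = tokenizer("ἐν ἀρχῇ ἦν ὁ [MASK].", return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

# Locate the [MASK] position and print the five most probable fills.
mask_pos = (inputs.input_ids == tokenizer.mask_token_id).nonzero(as_tuple=True)[1]
top_ids = logits[0, mask_pos].topk(5).indices[0]
print(tokenizer.convert_ids_to_tokens(top_ids.tolist()))
```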

## Cite

If you use this model in your research, please cite the paper:

```bibtex
@inproceedings{logion-base,
  author = {Cowen-Breen, Charlie and Brooks, Creston and Haubold, Johannes and Graziosi, Barbara},
  title = {Logion: Machine Learning for Greek Philology},
  year = {2023},
  url = {https://arxiv.org/abs/2305.01099}
}
```