new5558 committed on
Commit d8e4832
1 Parent(s): e4b1b40

docs: add information about attacut

Files changed (1)
  1. README.md +8 -0
README.md CHANGED
````diff
@@ -17,9 +17,17 @@ This repository includes the Thai pretrained language representation (HoogBERTa_
 
 # Documentation
 
+## Prerequisite
+Since we use subword-nmt BPE encoding, input needs to be pre-tokenize using [BEST](https://huggingface.co/datasets/best2009) standard before inputting into HoogBERTa
+```
+pip install attacut
+```
+
+## Getting Start
 To initialize the model from hub, use the following commands
 ```
 from transformers import AutoTokenizer, AutoModel
+from attacut import tokenize
 
 tokenizer = AutoTokenizer.from_pretrained("new5558/HoogBERTa")
 model = AutoModel.from_pretrained("new5558/HoogBERTa")
````
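The added prerequisite means raw Thai text has to be word-segmented before it reaches the HoogBERTa tokenizer, because subword-nmt BPE operates on space-separated words. The following is a minimal sketch of that step. It assumes only that the segmenter is a callable returning a list of words; attacut's `tokenize`, imported in the README snippet, is meant to fill that role. The function name `pretokenize` and the `chunk_separator` parameter are hypothetical. This commit does not say how whitespace already present in the input should be marked, so check the HoogBERTa model card before relying on the default separator.

```python
from typing import Callable, List


def pretokenize(
    text: str,
    word_tokenize: Callable[[str], List[str]],
    chunk_separator: str = " ",
) -> str:
    """Word-segment text so a BPE tokenizer sees space-separated words.

    word_tokenize: any callable returning a list of words (e.g. attacut's
        `tokenize`; this is an assumption, not confirmed by the commit).
    chunk_separator: hypothetical parameter controlling how whitespace that
        already exists in `text` is represented after segmentation.
    """
    segmented = []
    # Split on existing whitespace first so each chunk is segmented independently.
    for chunk in text.split():
        # Drop empty or whitespace-only pieces a segmenter may emit.
        words = [w for w in word_tokenize(chunk) if w.strip()]
        segmented.append(" ".join(words))
    return chunk_separator.join(segmented)
```

In practice one would call `pretokenize(sentence, tokenize)` with `from attacut import tokenize`, pass the resulting string to `tokenizer(text, return_tensors="pt")` (PyTorch tensors), and feed the returned inputs to `model(**inputs)`. Taking the segmenter as a parameter keeps the helper independent of attacut, so it can be tested with a stand-in function.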