Commit 0376b91 (parent: 0585988): Update README.md

README.md CHANGED

````diff
@@ -13,38 +13,4 @@ widget:
 * pre-processing: normalization + SentencePiece
 * test set scores: syllable: 15.95, word: 8.43
 
-
-
-Training scripts from [LalitaDeelert/NLP-ZH_TH-Project](https://github.com/LalitaDeelert/NLP-ZH_TH-Project). Experiments tracked at [cstorm125/marianmt-zh_cn-th](https://wandb.ai/cstorm125/marianmt-zh_cn-th).
-
-```
-export WANDB_PROJECT=marianmt-zh_cn-th
-python train_model.py --input_fname ../data/v1/Train.csv \
-    --output_dir ../models/marianmt-zh_cn-th \
-    --source_lang zh --target_lang th \
-    --metric_tokenize th_syllable --fp16
-```
-
-## Usage
-
-```
-from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
-
-tokenizer = AutoTokenizer.from_pretrained("Lalita/marianmt-zh_cn-th")
-model = AutoModelForSeq2SeqLM.from_pretrained("Lalita/marianmt-zh_cn-th").cpu()
-
-src_text = [
-    '我爱你',
-    '我想吃米饭',
-]
-translated = model.generate(**tokenizer(src_text, return_tensors="pt", padding=True))
-print([tokenizer.decode(t, skip_special_tokens=True) for t in translated])
-
-> ['ผมรักคุณนะ', 'ฉันอยากกินข้าว']
-```
-
-## Requirements
-```
-transformers==4.6.0
-torch==1.8.0
-```
+#
````
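The removed training command exports `WANDB_PROJECT` before launching `train_model.py`; wandb reads the project name from the process environment rather than from a script argument. A minimal sketch of that mechanism (the fallback name below is illustrative, not from the commit, and `train_model.py` internals are not shown in the diff):

```python
import os

# Equivalent of the shell line `export WANDB_PROJECT=marianmt-zh_cn-th`:
# the training process (and wandb inside it) inherits this environment variable.
os.environ["WANDB_PROJECT"] = "marianmt-zh_cn-th"

# Inside the training process, the project name can be recovered like this;
# "uncategorized" is a hypothetical fallback when the variable is unset.
project = os.environ.get("WANDB_PROJECT", "uncategorized")
print(project)  # marianmt-zh_cn-th
```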