Pavarissy committed
Commit 5e464ad
1 Parent(s): 1ebd810

Update README.md

Files changed (1):
  1. README.md (+18 -9)

README.md CHANGED
@@ -6,6 +6,9 @@ datasets:
 - universal_dependencies
 metrics:
 - accuracy
+- recall
+- precision
+- f1
 model-index:
 - name: wangchanberta-ud-thai-pud-upos
   results:
@@ -22,6 +25,9 @@ model-index:
     - name: Accuracy
       type: accuracy
      value: 0.9883334914161055
+language:
+- th
+library_name: transformers
 ---
 
 <!-- This model card has been generated automatically according to the information the Trainer had access to. You
@@ -42,17 +48,20 @@ It achieves the following results on the evaluation set:
 
 ## Model description
 
-More information needed
+This model is trained on the Thai UD Thai PUD corpus with `Universal Part-of-Speech (UPOS)` tags to support POS tagging for the Thai language.
 
-## Intended uses & limitations
+## Example
+```python
+from transformers import AutoModelForTokenClassification, AutoTokenizer, TokenClassificationPipeline
 
-More information needed
+model = AutoModelForTokenClassification.from_pretrained("Pavarissy/wangchanberta-ud-thai-pud-upos")
+tokenizer = AutoTokenizer.from_pretrained("Pavarissy/wangchanberta-ud-thai-pud-upos")
 
-## Training and evaluation data
-
-More information needed
-
-## Training procedure
+pipeline = TokenClassificationPipeline(model=model, tokenizer=tokenizer, grouped_entities=True)
+outputs = pipeline("ประเทศไทย อยู่ใน ทวีป เอเชีย")
+print(outputs)
+# [{'entity_group': 'NOUN', 'score': 0.419697, 'word': '', 'start': 0, 'end': 1}, {'entity_group': 'PROPN', 'score': 0.8809489, 'word': 'ประเทศไทย', 'start': 0, 'end': 9}, {'entity_group': 'VERB', 'score': 0.7754166, 'word': 'อยู่ใน', 'start': 9, 'end': 16}, {'entity_group': 'NOUN', 'score': 0.9976932, 'word': 'ทวีป', 'start': 16, 'end': 21}, {'entity_group': 'PROPN', 'score': 0.97770107, 'word': 'เอเชีย', 'start': 21, 'end': 28}]
+```
 
 ### Training hyperparameters
 
@@ -86,4 +95,4 @@ The following hyperparameters were used during training:
 - Transformers 4.34.1
 - Pytorch 2.1.0+cu118
 - Datasets 2.14.6
-- Tokenizers 0.14.1
+- Tokenizers 0.14.1
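
The grouped pipeline output added in this commit includes an empty, low-confidence fragment alongside the real tokens. A minimal post-processing sketch that collapses such output into plain (word, UPOS tag) pairs — the `to_tagged_pairs` helper and the `min_score` threshold are illustrative, not part of the model repo:

```python
def to_tagged_pairs(entities, min_score=0.5):
    """Collapse TokenClassificationPipeline grouped output into
    (word, UPOS tag) pairs, dropping empty words and low-confidence groups."""
    return [
        (e["word"], e["entity_group"])
        for e in entities
        if e["word"].strip() and e["score"] >= min_score
    ]

# Entities shaped like the printed result in the README example above:
entities = [
    {"entity_group": "NOUN", "score": 0.419697, "word": "", "start": 0, "end": 1},
    {"entity_group": "PROPN", "score": 0.8809489, "word": "ประเทศไทย", "start": 0, "end": 9},
    {"entity_group": "VERB", "score": 0.7754166, "word": "อยู่ใน", "start": 9, "end": 16},
    {"entity_group": "NOUN", "score": 0.9976932, "word": "ทวีป", "start": 16, "end": 21},
    {"entity_group": "PROPN", "score": 0.97770107, "word": "เอเชีย", "start": 21, "end": 28},
]
print(to_tagged_pairs(entities))
# [('ประเทศไทย', 'PROPN'), ('อยู่ใน', 'VERB'), ('ทวีป', 'NOUN'), ('เอเชีย', 'PROPN')]
```

The empty-word group is an artifact of subword aggregation at the sentence start; filtering on both the stripped word and the score keeps only meaningful tagged spans.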