---
license: mit
base_model: microsoft/mdeberta-v3-base
tags:
- generated_from_trainer
datasets:
- universal_dependencies
metrics:
- accuracy
- precision
- recall
model-index:
- name: mdeberta-v3-ud-thai-pud-upos
  results:
  - task:
      name: Token Classification
      type: token-classification
    dataset:
      name: universal_dependencies
      type: universal_dependencies
      config: th_pud
      split: test
      args: th_pud
    metrics:
    - name: Accuracy
      type: accuracy
      value: 0.9934846474601972
widget:
- text: นักวิจัยกล่าวว่าการวิเคราะห์ดีเอ็นเอของเนื้องอกอาจช่วยอธิบายถึงสาเหตุที่แท้จริงของมะเร็งชนิดอื่นๆ ได้
  example_title: test_example_1
- text: >-
    คือผมไม่ได้ชอบกดดันพวกคุณหรอกนะ แต่ชะตากรรมของสาธารณรัฐอยู่ในกำมือคุณ
  example_title: test_example_2
language:
- th
library_name: transformers
---

# mdeberta-v3-ud-thai-pud-upos

This model is a fine-tuned version of [microsoft/mdeberta-v3-base](https://huggingface.co/microsoft/mdeberta-v3-base) on the universal_dependencies dataset (`th_pud` configuration).
It achieves the following results on the evaluation set:
- Loss: 0.0303
- Macro avg precision: 0.9235
- Macro avg recall: 0.9228
- Macro avg f1: 0.9231
- Weighted avg precision: 0.9935
- Weighted avg recall: 0.9935
- Weighted avg f1: 0.9935
- Accuracy: 0.9935

## Model description

This model is trained on the UD Thai PUD corpus with Universal Part-of-Speech (UPOS) tags to support part-of-speech tagging for Thai.

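
The evaluation results above report both macro and weighted averages, and the two differ noticeably (about 0.92 versus 0.99). The card does not say how they were computed, so the following is only a minimal sketch of the usual definitions: the macro average gives every tag equal weight, while the weighted average scales each tag by how often it occurs. The function name `averaged_f1` and the toy tag sequences are hypothetical and are not taken from the model's evaluation data.

```python
from collections import Counter


def averaged_f1(gold, pred):
    """Return (macro_f1, weighted_f1) for two equal-length tag sequences."""
    labels = sorted(set(gold) | set(pred))
    support = Counter(gold)  # number of gold tokens per tag
    f1_by_label = {}
    for label in labels:
        tp = sum(1 for g, p in zip(gold, pred) if g == label and p == label)
        fp = sum(1 for g, p in zip(gold, pred) if g != label and p == label)
        fn = sum(1 for g, p in zip(gold, pred) if g == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1_by_label[label] = 2 * precision * recall / denom if denom else 0.0
    # Macro: unweighted mean over tags, so rare tags count as much as common ones.
    macro = sum(f1_by_label.values()) / len(labels)
    # Weighted: mean over tags, weighted by each tag's gold frequency.
    weighted = sum(f1_by_label[l] * support[l] for l in labels) / len(gold)
    return macro, weighted


# Hypothetical toy data: one frequent tag, one rare tag with a single error.
gold = ["NOUN"] * 98 + ["SYM"] * 2
pred = ["NOUN"] * 98 + ["NOUN", "SYM"]

macro, weighted = averaged_f1(gold, pred)
print(round(macro, 4), round(weighted, 4))
```

In this toy case a single mistake on the rare tag pulls the macro score well below the weighted one, which is one plausible reading of the gap in the reported figures: errors are likely concentrated in infrequent UPOS tags.
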
## Example

```python
from transformers import AutoModelForTokenClassification, AutoTokenizer, TokenClassificationPipeline

# Load the fine-tuned tagger and its tokenizer from the Hugging Face Hub.
model = AutoModelForTokenClassification.from_pretrained("Pavarissy/mdeberta-v3-ud-thai-pud-upos")
tokenizer = AutoTokenizer.from_pretrained("Pavarissy/mdeberta-v3-ud-thai-pud-upos")

# aggregation_strategy="simple" merges sub-word pieces into word-level groups;
# it replaces the deprecated grouped_entities=True argument.
pipeline = TokenClassificationPipeline(model=model, tokenizer=tokenizer, aggregation_strategy="simple")

# The input is pre-segmented into words separated by spaces.
outputs = pipeline("ประเทศไทย อยู่ใน ทวีป เอเชีย")
print(outputs)
# [{'entity_group': 'PROPN', 'score': 0.9946701, 'word': 'ประเทศไทย', 'start': 0, 'end': 9},
#  {'entity_group': 'VERB', 'score': 0.85809743, 'word': 'อยู่ใน', 'start': 9, 'end': 16},
#  {'entity_group': 'NOUN', 'score': 0.99632, 'word': 'ทวีป', 'start': 16, 'end': 21},
#  {'entity_group': 'PROPN', 'score': 0.9961184, 'word': 'เอเชีย', 'start': 21, 'end': 28}]
```

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 2e-05
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 10

### Training results

| Training Loss | Epoch | Step | Validation Loss | Macro avg precision | Macro avg recall | Macro avg f1 | Weighted avg precision | Weighted avg recall | Weighted avg f1 | Accuracy |
|:-------------:|:-----:|:----:|:---------------:|:-------------------:|:----------------:|:------------:|:----------------------:|:-------------------:|:---------------:|:--------:|
| No log        | 1.0   | 125  | 0.3898          | 0.8417              | 0.7849           | 0.8078       | 0.9119                 | 0.9112              | 0.9101          | 0.9112   |
| No log        | 2.0   | 250  | 0.1768          | 0.8765              | 0.8683           | 0.8720       | 0.9561                 | 0.9560              | 0.9559          | 0.9560   |
| No log        | 3.0   | 375  | 0.1217          | 0.8972              | 0.8892           | 0.8929       | 0.9701                 | 0.9701              | 0.9699          | 0.9701   |
| 0.4709        | 4.0   | 500  | 0.0841          | 0.9057              | 0.9064           | 0.9059       | 0.9802                 | 0.9800              | 0.9800          | 0.9800   |
| 0.4709        | 5.0   | 625  | 0.0649          | 0.9128              | 0.9133           | 0.9130       | 0.9854                 | 0.9853              | 0.9853          | 0.9853   |
| 0.4709        | 6.0   | 750  | 0.0513          | 0.9147              | 0.9170           | 0.9158       | 0.9878                 | 0.9877              | 0.9877          | 0.9877   |
| 0.4709        | 7.0   | 875  | 0.0423          | 0.9199              | 0.9180           | 0.9189       | 0.9900                 | 0.9900              | 0.9900          | 0.9900   |
| 0.0857        | 8.0   | 1000 | 0.0350          | 0.9226              | 0.9207           | 0.9216       | 0.9921                 | 0.9921              | 0.9921          | 0.9921   |
| 0.0857        | 9.0   | 1125 | 0.0318          | 0.9237              | 0.9219           | 0.9228       | 0.9932                 | 0.9932              | 0.9932          | 0.9932   |
| 0.0857        | 10.0  | 1250 | 0.0303          | 0.9235              | 0.9228           | 0.9231       | 0.9935                 | 0.9935              | 0.9935          | 0.9935   |

### Framework versions

- Transformers 4.34.1
- Pytorch 2.1.0+cu118
- Datasets 2.14.6
- Tokenizers 0.14.1
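
As an illustration of the `lr_scheduler_type: linear` setting listed under the training hyperparameters, the sketch below shows how a linear schedule would decay the learning rate from 2e-05 to zero over the 1250 steps recorded in the training results. It assumes zero warmup steps, which the card does not state; the function name `linear_lr` is hypothetical, and this is a plain-Python approximation rather than the scheduler code actually used in training.

```python
def linear_lr(step, base_lr=2e-5, total_steps=1250, warmup_steps=0):
    """Learning rate at `step`: linear warmup (if any), then linear decay to zero."""
    if step < warmup_steps:
        # Ramp up from 0 to base_lr during warmup (assumed unused here).
        return base_lr * step / max(1, warmup_steps)
    # Decay linearly so that the rate reaches 0 at total_steps.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


for epoch in (0, 5, 10):
    step = epoch * 125  # 125 optimizer steps per epoch, per the results table
    print(epoch, step, linear_lr(step))
```

Under these assumptions the rate is halved by the end of epoch 5 and reaches zero at step 1250, which is consistent with the shrinking per-epoch gains in the later rows of the results table.
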