---
license: apache-2.0
language:
- ko
base_model:
- monologg/koelectra-small-v3-discriminator
library_name: transformers
---
# KoELECTRA-small-v3-privacy-ner
This model is a fine-tuned version of [monologg/koelectra-small-v3-discriminator](https://huggingface.co/monologg/koelectra-small-v3-discriminator) on a synthetic Korean personal-information (PII) dataset. It achieves the following results on the evaluation set:
- f1 = 0.9999
- loss = 0.0531
- precision = 0.9999
- recall = 0.9998
## Model description
Tagging scheme: BIO
- B (begin): the token starts an entity
- I (inside): the token is inside an entity
- O (outside): the token is not part of any entity

Tag set covering 12 Korean personal-information patterns (a label-mapping sketch follows the table):
| ๋ถ„๋ฅ˜ | ํ‘œ๊ธฐ | ์ •์˜ |
|:------------:|:---:|:-----------|
| PERSON | PER | ํ•œ๊ตญ์ธ ์ด๋ฆ„ |
| LOCATION | LOC | ํ•œ๊ตญ ์ฃผ์†Œ |
| RESIDENT REGISTRATION NUMBER | RRN | ํ•œ๊ตญ์ธ ์ฃผ๋ฏผ๋“ฑ๋ก๋ฒˆํ˜ธ |
| EMAIL | EMA | ์ด๋ฉ”์ผ |
| ID | ID | ์ผ๋ฐ˜ ๋กœ๊ทธ์ธ ID |
| PASSWORD | PWD | ์ผ๋ฐ˜ ๋กœ๊ทธ์ธ ๋น„๋ฐ€๋ฒˆํ˜ธ |
| ORGANIZATION | ORG | ์†Œ์† ๊ธฐ๊ด€ |
| PHONE NUMBER | PHN | ์ „ํ™”๋ฒˆํ˜ธ |
| CARD NUMBER | CRD | ์นด๋“œ๋ฒˆํ˜ธ |
| ACCOUNT NUMBER | ACC | ๊ณ„์ขŒ๋ฒˆํ˜ธ |
| PASSPORT NUMBER | PSP | ์—ฌ๊ถŒ๋ฒˆํ˜ธ |
| DRIVER'S LICENSE NUMBER | DLN | ์šด์ „๋ฉดํ—ˆ๋ฒˆํ˜ธ |
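Crossing the 12 entity types with B/I and adding O presumably yields 25 labels. A minimal sketch of that mapping follows; the actual `id2label` order stored in this model's `config.json` may differ, so treat it as illustrative:

```python
# Illustrative sketch: the 25-label BIO tag set implied by the table above
# (12 entity types x {B, I} + O). The actual id2label order in this model's
# config.json may differ; check the config before relying on indices.
ENTITY_TYPES = [
    "PER", "LOC", "RRN", "EMA", "ID", "PWD",
    "ORG", "PHN", "CRD", "ACC", "PSP", "DLN",
]

labels = ["O"] + [f"{etype}-{pos}" for etype in ENTITY_TYPES for pos in ("B", "I")]
id2label = dict(enumerate(labels))
label2id = {label: idx for idx, label in id2label.items()}

print(len(labels))  # 25
```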
### How to use
You can use this model with the Transformers *pipeline* API for NER.
```python
from transformers import AutoTokenizer, AutoModelForTokenClassification, pipeline

# Load the fine-tuned tokenizer and token-classification model
tokenizer = AutoTokenizer.from_pretrained("amoeba04/test1")
model = AutoModelForTokenClassification.from_pretrained("amoeba04/test1")

# Token-level NER pipeline
ner = pipeline("ner", model=model, tokenizer=tokenizer)
example = "์ง€๋‚œ์ฃผ, ํ™๊ธธ๋™ ์”จ๋Š” ์„œ์šธํŠน๋ณ„์‹œ ๊ฐ•๋‚จ๊ตฌ์— ์œ„์น˜ํ•œ ํ…Œํ—ค๋ž€๋กœ 101๋นŒ๋”ฉ์—์„œ ์ง„ํ–‰๋œ IT ์ปจํผ๋Ÿฐ์Šค์— ์ฐธ์„ํ–ˆ์Šต๋‹ˆ๋‹ค."
ner_results = ner(example)
print(ner_results)
```
Output (predicted tags substituted for the recognized tokens): "PER-B, PER-B ์”จ๋Š” LOC-B LOC-I LOC-I LOC-I LOC-I LOC-I LOC-I LOC-I LOC-I LOC-I LOC-I LOC-I์—์„œ ์ง„ํ–‰๋œ IT ์ปจํผ๋Ÿฐ์Šค์— ์ฐธ์„ํ–ˆ์Šต๋‹ˆ๋‹ค."
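The raw pipeline returns one prediction per subword token. Below is a minimal sketch for merging consecutive tokens into whole entity spans, assuming the `<TYPE>-B` / `<TYPE>-I` label format shown above; note that the built-in `aggregation_strategy` options in Transformers expect `B-<TYPE>`-style labels, so a small manual pass may be more predictable here:

```python
# Sketch: merge consecutive <TYPE>-B / <TYPE>-I token predictions into spans.
# Assumes WordPiece subwords are marked with "##"; the pipeline drops "O"
# tokens by default (ignore_labels=["O"]), so every entry has a TYPE-B/I tag.
def group_entities(token_results):
    spans = []
    for tok in token_results:
        etype, position = tok["entity"].rsplit("-", 1)
        is_subword = tok["word"].startswith("##")
        word = tok["word"].replace("##", "")
        if position == "B" or not spans or spans[-1]["type"] != etype:
            spans.append({"type": etype, "text": word})  # start a new span
        else:
            # -I continuation: join subwords directly, separate whole words
            spans[-1]["text"] += word if is_subword else " " + word
    return spans

for span in group_entities(ner_results):
    print(span["type"], span["text"])
```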
## Training and evaluation data
์ž์ฒด ์ œ์ž‘ํ•œ ํ•œ๊ตญ์ธ ๊ฐœ์ธ์ •๋ณด ํŒจํ„ด ๊ธฐ๋ฐ˜ ๊ฐœ์ฒด๋ช… ์ธ์‹ (NER) ๋ฐ์ดํ„ฐ์…‹
### Training hyperparameters
The following hyperparameters were used during training (a `TrainingArguments` sketch follows the list):
- learning_rate: 5e-05
- train_batch_size: 512
- eval_batch_size: 1024
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 1
- mixed_precision_training: Native AMP
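A minimal sketch of how these settings might map onto `TrainingArguments`; the original training script is not published, and `output_dir` is a placeholder:

```python
from transformers import TrainingArguments

# Sketch only: maps the hyperparameters listed above onto TrainingArguments.
# The original training script is not published; output_dir is a placeholder.
training_args = TrainingArguments(
    output_dir="koelectra-small-v3-privacy-ner",
    learning_rate=5e-5,
    per_device_train_batch_size=512,
    per_device_eval_batch_size=1024,
    seed=42,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    num_train_epochs=1,
    fp16=True,  # Native AMP mixed-precision training
)
```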
### Framework versions
- Transformers 4.40.0
- Pytorch 2.2.1+cu118
- Datasets 2.19.0
- Tokenizers 0.19.1