|
--- |
|
license: apache-2.0 |
|
language: |
|
- ko |
|
base_model: |
|
- monologg/koelectra-small-v3-discriminator |
|
library_name: transformers |
|
--- |
|
# KoELECTRA-small-v3-privacy-ner |
|
|
|
This model is a fine-tuned version of [monologg/koelectra-small-v3-discriminator](https://huggingface.co/monologg/koelectra-small-v3-discriminator) on a synthesized privacy dataset. It achieves the following results on the evaluation set: |
|
- f1 = 0.9998728608843798 |
|
- loss = 0.05310981854414328 |
|
- precision = 0.9999237126509853 |
|
- recall = 0.9998220142897098 |
|
|
|
## Model description |
|
|
|
Tagging scheme: BIO

- B (begin): the token starts an entity

- I (inside): the token continues an entity

- O (outside): the token is not part of any entity
|
|
|
Tag set for the 12 Korean personal-information patterns:
|
|
|
|
| Category | Tag | Definition |
|:------------:|:---:|:-----------|
| PERSON | PER | Korean personal name |
| LOCATION | LOC | Korean address |
| RESIDENT REGISTRATION NUMBER | RRN | Korean resident registration number |
| EMAIL | EMA | Email address |
| ID | ID | Generic login ID |
| PASSWORD | PWD | Generic login password |
| ORGANIZATION | ORG | Affiliated organization |
| PHONE NUMBER | PHN | Phone number |
| CARD NUMBER | CRD | Card number |
| ACCOUNT NUMBER | ACC | Bank account number |
| PASSPORT NUMBER | PSP | Passport number |
| DRIVER'S LICENSE NUMBER | DLN | Driver's license number |
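
To make the scheme concrete, the toy sketch below pairs word-level tokens with the labels they would receive under this tag set. The sentence, the word-level tokenization, and the `PER-B`/`LOC-I` label spelling are illustrative assumptions; the model itself predicts a label for every subword token.

```python
# Toy illustration only: word-level tokens and the BIO labels they would
# receive under the tag set above. The real model assigns labels to
# subword tokens, not whole words.
tokens = ["홍길동", "씨는", "서울특별시", "강남구", "에", "삽니다", "."]
labels = ["PER-B", "O",    "LOC-B",      "LOC-I",  "O",  "O",      "O"]

for token, label in zip(tokens, labels):
    print(f"{token}\t{label}")
```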
|
|
|
### How to use |
|
You can use this model with the Transformers *pipeline* for NER.
|
```python |
|
from transformers import AutoTokenizer, AutoModelForTokenClassification, pipeline

# Load the fine-tuned tokenizer and token-classification model.
tokenizer = AutoTokenizer.from_pretrained("amoeba04/test1")
model = AutoModelForTokenClassification.from_pretrained("amoeba04/test1")
ner = pipeline("ner", model=model, tokenizer=tokenizer)

example = "지난주, 홍길동 씨는 서울특별시 강남구에 위치한 테헤란로 101빌딩에서 진행된 IT 컨퍼런스에 참석했습니다."
ner_results = ner(example)
print(ner_results)
|
``` |
|
Output (entity tokens in the input replaced by their predicted tags): "PER-B, PER-B 씨는 LOC-B LOC-I LOC-I LOC-I LOC-I LOC-I LOC-I LOC-I LOC-I LOC-I LOC-I LOC-I에서 진행된 IT 컨퍼런스에 참석했습니다."
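
The pipeline returns one prediction per subword token. Below is a minimal post-processing sketch (our own helper, not part of this repository) that merges consecutive token predictions into entity spans; it assumes the snippet above has been run, that labels are spelled `TYPE-B`/`TYPE-I` as in the output shown, and that the tokenizer is a fast tokenizer so character offsets (`start`/`end`) are available.

```python
# Minimal sketch (not part of this repository): merge per-token predictions
# into entity spans. Assumes labels look like "LOC-B"/"LOC-I" and that the
# pipeline results include character offsets. "O" tokens are already
# filtered out by the pipeline's default ignore_labels.
def group_entities(token_results, text):
    spans = []
    for item in token_results:
        etype, position = item["entity"].rsplit("-", 1)  # "LOC-I" -> ("LOC", "I")
        if spans and spans[-1]["type"] == etype and position == "I":
            spans[-1]["end"] = item["end"]                # extend the current span
        else:
            spans.append({"type": etype, "start": item["start"], "end": item["end"]})
    return [(s["type"], text[s["start"]:s["end"]]) for s in spans]

print(group_entities(ner_results, example))
```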
|
|
|
## Training and evaluation data |
|
|
|
A self-constructed named-entity recognition (NER) dataset synthesized from the Korean personal-information patterns listed above.
|
|
|
|
### Training hyperparameters |
|
|
|
The following hyperparameters were used during training: |
|
- learning_rate: 5e-05 |
|
- train_batch_size: 512 |
|
- eval_batch_size: 1024 |
|
- seed: 42 |
|
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08 |
|
- lr_scheduler_type: linear |
|
- num_epochs: 1 |
|
- mixed_precision_training: Native AMP |
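
The training script is not published with this model. As a rough, hypothetical sketch, the hyperparameters above map onto Hugging Face `TrainingArguments` roughly as follows (the output directory is a placeholder, and the dataset, label list, and `Trainer` wiring are omitted because they are not released):

```python
from transformers import TrainingArguments

# Sketch only: values mirror the hyperparameters listed above.
# betas=(0.9, 0.999) and epsilon=1e-8 are the optimizer defaults.
training_args = TrainingArguments(
    output_dir="koelectra-small-v3-privacy-ner",  # hypothetical output path
    learning_rate=5e-5,
    per_device_train_batch_size=512,
    per_device_eval_batch_size=1024,
    num_train_epochs=1,
    lr_scheduler_type="linear",
    seed=42,
    fp16=True,  # "Native AMP" mixed precision; requires a CUDA device
)
```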
|
|
|
### Framework versions |
|
|
|
- Transformers 4.40.0 |
|
- Pytorch 2.2.1+cu118 |
|
- Datasets 2.19.0 |
|
- Tokenizers 0.19.1 |