---
license: apache-2.0
language:
  - ko
base_model:
  - monologg/koelectra-small-v3-discriminator
library_name: transformers
---

# KoELECTRA-small-v3-privacy-ner

This model is a fine-tuned version of monologg/koelectra-small-v3-discriminator on a synthesized privacy dataset. It achieves the following results on the evaluation set:

- f1 = 0.9998728608843798
- loss = 0.05310981854414328
- precision = 0.9999237126509853
- recall = 0.9998220142897098

## Model description

Tagging scheme: BIO

- -B (begin): the token starts an entity
- -I (inside): the token is inside an entity
- O (outside): the token is not part of any entity
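As a quick illustration of the suffix-style BIO tags (the subword splits below are hypothetical, not actual tokenizer output; PER and LOC are defined in the table that follows):

```python
# Hypothetical BIO tagging of a short sentence. Token boundaries are
# illustrative; "##" marks a WordPiece continuation subword.
tokens = ["ํ™", "##๊ธธ๋™", "์”จ", "๋Š”", "์„œ์šธ", "##ํŠน๋ณ„์‹œ", "์—", "์‚ฐ๋‹ค"]
labels = ["PER-B", "PER-I", "O", "O", "LOC-B", "LOC-I", "O", "O"]
for token, label in zip(tokens, labels):
    print(f"{token}\t{label}")
```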

Tag set for the 12 Korean personal-information patterns:

| Category | Tag | Definition |
| --- | --- | --- |
| PERSON | PER | Korean name |
| LOCATION | LOC | Korean address |
| RESIDENT REGISTRATION NUMBER | RRN | Korean resident registration number |
| EMAIL | EMA | Email address |
| ID | ID | Login ID |
| PASSWORD | PWD | Login password |
| ORGANIZATION | ORG | Affiliated organization |
| PHONE NUMBER | PHN | Phone number |
| CARD NUMBER | CRD | Card number |
| ACCOUNT NUMBER | ACC | Account number |
| PASSPORT NUMBER | PSP | Passport number |
| DRIVER'S LICENSE NUMBER | DLN | Driver's license number |
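With suffix-style B/I tags for each of the 12 categories plus O, the full label set has 25 entries. A minimal sketch of building it (the label ordering here is an assumption; check `model.config.id2label` for the checkpoint's actual mapping):

```python
# Build the 25-label set: "O" plus {TAG}-B / {TAG}-I per entity type.
# Ordering is illustrative only; the released checkpoint's mapping
# lives in model.config.id2label.
ENTITY_TAGS = ["PER", "LOC", "RRN", "EMA", "ID", "PWD",
               "ORG", "PHN", "CRD", "ACC", "PSP", "DLN"]
labels = ["O"] + [f"{tag}-{affix}" for tag in ENTITY_TAGS for affix in ("B", "I")]
print(len(labels))  # 25
```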

## How to use

You can use this model for NER with the Transformers `pipeline` API:

```python
from transformers import AutoTokenizer, AutoModelForTokenClassification, pipeline

# Load the fine-tuned tokenizer and token-classification model.
tokenizer = AutoTokenizer.from_pretrained("amoeba04/test1")
model = AutoModelForTokenClassification.from_pretrained("amoeba04/test1")

# Build a token-classification (NER) pipeline.
ner = pipeline("ner", model=model, tokenizer=tokenizer)

example = "์ง€๋‚œ์ฃผ, ํ™๊ธธ๋™ ์”จ๋Š” ์„œ์šธํŠน๋ณ„์‹œ ๊ฐ•๋‚จ๊ตฌ์— ์œ„์น˜ํ•œ ํ…Œํ—ค๋ž€๋กœ 101๋นŒ๋”ฉ์—์„œ ์ง„ํ–‰๋œ IT ์ปจํผ๋Ÿฐ์Šค์— ์ฐธ์„ํ–ˆ์Šต๋‹ˆ๋‹ค."
ner_results = ner(example)
print(ner_results)
```

Output (each detected subword is shown as its predicted tag): "PER-B, PER-B ์”จ๋Š” LOC-BLOC-ILOC-I LOC-ILOC-I LOC-ILOC-I LOC-ILOC-I LOC-ILOC-ILOC-I์—์„œ ์ง„ํ–‰๋œ IT ์ปจํผ๋Ÿฐ์Šค์— ์ฐธ์„ํ–ˆ์Šต๋‹ˆ๋‹ค."
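The pipeline emits one prediction per subword token, which is why the same tag repeats in the output above. A minimal sketch of merging consecutive subwords into entity spans; `group_entities` is a hypothetical helper written for suffix-style tags ("PER-B"/"PER-I"), not part of the released model:

```python
# Merge consecutive subword predictions into (type, text) spans.
# Assumes suffix-style tags such as "PER-B" / "PER-I"; "##" marks a
# WordPiece continuation. The pipeline drops "O" predictions by
# default, so only entity tokens arrive here. Illustrative only.
def group_entities(results):
    spans, current = [], None
    for r in results:
        etype, affix = r["entity"].rsplit("-", 1)
        word = r["word"].removeprefix("##")
        if affix == "B" or current is None or current["type"] != etype:
            if current:
                spans.append(current)
            current = {"type": etype, "text": word}
        else:
            current["text"] += word
    if current:
        spans.append(current)
    return spans

print(group_entities(ner_results))
```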

## Training and evaluation data

์ž์ฒด ์ œ์ž‘ํ•œ ํ•œ๊ตญ์ธ ๊ฐœ์ธ์ •๋ณด ํŒจํ„ด ๊ธฐ๋ฐ˜ ๊ฐœ์ฒด๋ช… ์ธ์‹ (NER) ๋ฐ์ดํ„ฐ์…‹

## Training hyperparameters

The following hyperparameters were used during training:

- learning_rate: 5e-05
- train_batch_size: 512
- eval_batch_size: 1024
- seed: 42
- optimizer: Adam with betas=(0.9, 0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 1
- mixed_precision_training: Native AMP
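A minimal sketch of expressing these settings with the Transformers `Trainer` API (dataset loading and tokenization are omitted, `output_dir` is a placeholder, `fp16=True` stands in for Native AMP, and the Adam betas/epsilon listed above are the library defaults):

```python
from transformers import TrainingArguments

# Mirrors the hyperparameters listed above; output_dir is a placeholder.
training_args = TrainingArguments(
    output_dir="koelectra-small-v3-privacy-ner",
    learning_rate=5e-5,
    per_device_train_batch_size=512,
    per_device_eval_batch_size=1024,
    seed=42,
    lr_scheduler_type="linear",
    num_train_epochs=1,
    fp16=True,  # Native AMP mixed precision
)
```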

## Framework versions

- Transformers 4.40.0
- Pytorch 2.2.1+cu118
- Datasets 2.19.0
- Tokenizers 0.19.1