KLUE RoBERTa-base for legal documents
- This model continues the pretraining of KLUE/RoBERTa-base on legal_text_merged02_light.txt, a corpus file consisting of Korean court rulings.
Model Details
Model Description
- Developed by: J.Park @ KETI
- Model type: klue/roberta-base
- Language(s) (NLP): Korean
- License: [More Information Needed]
- Finetuned from model: klue/roberta-base
Training Procedure
```python
from transformers import AutoTokenizer, RobertaForMaskedLM

base_model = 'klue/roberta-base'
base_tokenizer = 'klue/roberta-base'

# Load the pretrained checkpoint with a masked-language-modeling head.
model = RobertaForMaskedLM.from_pretrained(base_model)
tokenizer = AutoTokenizer.from_pretrained(base_tokenizer)
```
```python
from transformers import LineByLineTextDataset

# Each non-empty line of the corpus file becomes one training example,
# truncated to block_size tokens.
dataset = LineByLineTextDataset(
    tokenizer=tokenizer,
    file_path=fpath_dataset,  # path to legal_text_merged02_light.txt
    block_size=512,
)
```
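To make the data format concrete, here is a minimal sketch of what `LineByLineTextDataset` does: one example per non-empty line, truncated to `block_size` tokens. The `lines_to_examples` helper and the whitespace "tokenizer" are illustrative stand-ins, not the real KLUE tokenizer.

```python
import tempfile

def lines_to_examples(path, tokenize, block_size):
    """Mimic LineByLineTextDataset: one example per non-empty line,
    truncated to block_size tokens."""
    examples = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                examples.append(tokenize(line)[:block_size])
    return examples

# Toy whitespace "tokenizer" standing in for the KLUE subword tokenizer.
toy_tokenize = lambda text: text.split()

# Tiny stand-in for the legal corpus file.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False,
                                 encoding="utf-8") as f:
    f.write("first ruling text\n\nsecond ruling text with more words\n")
    path = f.name

examples = lines_to_examples(path, toy_tokenize, block_size=4)
print(len(examples))  # 2 (the blank line is skipped)
print(examples[1])    # ['second', 'ruling', 'text', 'with']
```

Note that long rulings are simply truncated at `block_size`; text beyond 512 tokens on a single line never reaches the model.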
```python
from transformers import DataCollatorForLanguageModeling

# Dynamically mask 15% of the tokens in each batch for the MLM objective.
data_collator = DataCollatorForLanguageModeling(
    tokenizer=tokenizer, mlm=True, mlm_probability=0.15
)
```
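Under the hood, the collator applies BERT-style dynamic masking: each token is selected with probability 0.15, and of the selected tokens 80% are replaced with `[MASK]`, 10% with a random vocabulary token, and 10% are kept unchanged. A simplified string-level sketch (the real collator works on token IDs and uses label `-100` for unselected positions):

```python
import random

def mask_tokens(tokens, mask_token="[MASK]", vocab=None, p=0.15, seed=1):
    """Sketch of BERT-style dynamic masking with the 80/10/10 split."""
    rng = random.Random(seed)
    vocab = vocab or ["the", "court", "ruled", "guilty"]  # toy vocabulary
    inputs, labels = [], []
    for tok in tokens:
        if rng.random() < p:
            labels.append(tok)  # only selected positions contribute to the loss
            roll = rng.random()
            if roll < 0.8:
                inputs.append(mask_token)       # 80%: replace with [MASK]
            elif roll < 0.9:
                inputs.append(rng.choice(vocab))  # 10%: random token
            else:
                inputs.append(tok)              # 10%: keep unchanged
        else:
            inputs.append(tok)
            labels.append(None)  # ignored (label -100 in the real collator)
    return inputs, labels

orig = ["the", "court", "ruled", "the", "defendant", "guilty"]
inputs, labels = mask_tokens(orig)
print(inputs)
```

Because masking is applied per batch rather than once at preprocessing time, each epoch sees a different masking pattern over the same rulings.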
```python
from transformers import Trainer, TrainingArguments

training_args = TrainingArguments(
    output_dir=output_dir,
    overwrite_output_dir=True,
    num_train_epochs=5,
    per_device_train_batch_size=18,
    save_steps=100,       # checkpoint every 100 optimizer steps
    save_total_limit=2,   # keep only the two most recent checkpoints
    seed=1,
)
```
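These arguments determine the optimization schedule. As a back-of-the-envelope check (the actual line count of legal_text_merged02_light.txt is not stated in this card, so 100,000 examples is a placeholder; a single device and no gradient accumulation are assumed):

```python
import math

# Hypothetical corpus size -- not stated in the model card.
num_examples = 100_000
per_device_train_batch_size = 18
num_train_epochs = 5
save_steps = 100

steps_per_epoch = math.ceil(num_examples / per_device_train_batch_size)
total_steps = steps_per_epoch * num_train_epochs
checkpoints_written = total_steps // save_steps

print(steps_per_epoch)      # 5556
print(total_steps)          # 27780
print(checkpoints_written)  # 277 written; save_total_limit=2 keeps the last two
```

With `save_total_limit=2`, older checkpoints are deleted as training progresses, so disk usage stays bounded regardless of corpus size.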
```python
trainer = Trainer(
    model=model,
    args=training_args,
    data_collator=data_collator,
    train_dataset=dataset,
)

train_metrics = trainer.train()
trainer.save_model(output_dir)
trainer.push_to_hub()
```