---
library_name: transformers
license: mit
datasets:
- heegyu/open-korean-instructions
language:
- ko
tags:
- Llama-3
- LoRA
- MLP-KTLim/llama-3-Korean-Bllossom-8B
---

# MLP-KTLim/llama-3-Korean-Bllossom-8B Model Fine-Tuning
# (TREX-Lab at Seoul Cyber University)

<!-- Provide a quick summary of what the model is/does. -->

## Summary

- Base Model: MLP-KTLim/llama-3-Korean-Bllossom-8B
- Dataset: heegyu/open-korean-instructions (random 10% sample)
- Tuning Method
  - PEFT (Parameter-Efficient Fine-Tuning)
  - LoRA (Low-Rank Adaptation of Large Language Models)
- Related Articles: https://arxiv.org/abs/2106.09685, https://arxiv.org/pdf/2403.10882
- Fine-tuned the base model on a random 10% sample of the Korean chatbot data (open-korean-instructions); see the sampling sketch below
- Goal: test whether fine-tuning a large language model is feasible on a single A30 GPU (successful)
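
The sampling code itself is not included in this card; a minimal sketch of how the random 10% subset could be drawn with the `datasets` library (the split name and seed below are assumptions, not the values actually used) is:

```
from datasets import load_dataset

# Load the instruction dataset (assumed 'train' split) and keep a random 10% sample
dataset = load_dataset('heegyu/open-korean-instructions', split='train')
dataset = dataset.shuffle(seed=42).select(range(int(0.1 * len(dataset))))
```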

<!-- Provide a longer summary of what this model is. -->

- **Developed by:** TREX-Lab at Seoul Cyber University
- **Language(s) (NLP):** Korean
- **Finetuned from model:** MLP-KTLim/llama-3-Korean-Bllossom-8B

## Fine-Tuning Details

- LoRA alpha: 16
- LoRA r: 64 (arguably larger than necessary for this setup)

```
from peft import LoraConfig

# LoRA adapter configuration
peft_config = LoraConfig(
    lora_alpha=16,           # scaling factor applied to the LoRA updates
    lora_dropout=0.1,        # dropout on the adapter layers
    r=64,                    # rank of the low-rank update matrices
    bias='none',             # do not train bias parameters
    task_type='CAUSAL_LM'    # causal language modeling task
)
```


- Quantization: 4-bit NF4 with double quantization (bnb_4bit_use_double_quant), float16 compute

```
from transformers import BitsAndBytesConfig

# 4-bit quantization settings used to load the base model
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                  # load weights in 4-bit precision
    bnb_4bit_use_double_quant=True,     # also quantize the quantization constants
    bnb_4bit_quant_type='nf4',          # NormalFloat4 data type
    bnb_4bit_compute_dtype='float16',   # compute in float16
)
```

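The model-loading step is not shown in the card; a sketch of how the quantized base model and the LoRA adapters defined above could be combined (the exact procedure used in the original run is an assumption) might look like:

```
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import get_peft_model, prepare_model_for_kbit_training

base_id = 'MLP-KTLim/llama-3-Korean-Bllossom-8B'

# Load the base model in 4-bit using the bnb_config defined above
model = AutoModelForCausalLM.from_pretrained(
    base_id,
    quantization_config=bnb_config,
    device_map='auto',
)
tokenizer = AutoTokenizer.from_pretrained(base_id)

# Prepare the quantized model for k-bit training and attach the LoRA adapters
model = prepare_model_for_kbit_training(model)
peft_model = get_peft_model(model, peft_config)
peft_model.print_trainable_parameters()
```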

- Trainer: TRL SFTTrainer (https://huggingface.co/docs/trl/sft_trainer)

```
from trl import SFTTrainer

# Supervised fine-tuning run
trainer = SFTTrainer(
    model=peft_model,                                       # 4-bit base model with LoRA adapters attached
    train_dataset=dataset,
    dataset_text_field='text',                              # dataset column holding the formatted prompts
    max_seq_length=min(tokenizer.model_max_length, 2048),   # cap sequence length at 2048 tokens
    tokenizer=tokenizer,
    packing=True,                                           # pack short examples into full-length sequences
    args=training_args
)
```

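The `training_args` object referenced above is not shown in the card; a plausible configuration (all values below are illustrative assumptions, not the settings actually used) might look like:

```
from transformers import TrainingArguments

# Illustrative training arguments (assumed values, not taken from the original run)
training_args = TrainingArguments(
    output_dir='./results',
    num_train_epochs=3,                 # the reported run finished at epoch 2.99
    per_device_train_batch_size=2,
    gradient_accumulation_steps=4,
    learning_rate=2e-4,
    fp16=True,
    logging_steps=10,
)
```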

### Training Result

```
time taken: 21h 45m 55s
```

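The wall-clock time is consistent with the `train_runtime` reported below: 78354.6 s ≈ 21 h 45 m 55 s.
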
```
TrainOutput(global_step=816, training_loss=1.718194248045192,
            metrics={'train_runtime': 78354.6002,
                     'train_samples_per_second': 0.083,
                     'train_steps_per_second': 0.01,
                     'train_loss': 1.718194248045192,
                     'epoch': 2.99})
```
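
The `TrainOutput` above is the value returned by `trainer.train()`; a short sketch of the final training and adapter-saving step (the output path is illustrative, not the one actually used) is:

```
# Run training; trainer.train() returns the TrainOutput shown above
train_result = trainer.train()
print(train_result)

# Save the LoRA adapter weights (path is illustrative)
trainer.save_model('./llama3-bllossom-ko-lora')
```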