LEESM/llama-3-Korean-Bllossom-8B-trexlab-oki10p

Text Generation

MLP-KTLim/llama-3-Korean-Bllossom-8B

text-generation-inference

Inference Endpoints

Model card Files Files and versions Community

llama-3-Korean-Bllossom-8B-trexlab-oki10p / README.md

LEESM's picture

Update README.md

d105e03 verified 4 days ago

|

history blame contribute delete

2.19 kB

---
library_name: transformers
license: mit
datasets:
- heegyu/open-korean-instructions
language:
- ko
tags:
- Llama-3
- LoRA
- MLP-KTLim/llama-3-Korean-Bllossom-8B
---

# Fine-tuning MLP-KTLim/llama-3-Korean-Bllossom-8B (TREX-Lab at Seoul Cyber University)


## Summary
- Base model: MLP-KTLim/llama-3-Korean-Bllossom-8B
- Dataset: heegyu/open-korean-instructions (10%)
- Tuning method:
  - PEFT (Parameter-Efficient Fine-Tuning)
  - LoRA (Low-Rank Adaptation of Large Language Models)
- Related articles: https://arxiv.org/abs/2106.09685, https://arxiv.org/pdf/2403.10882
- The base model was fine-tuned on a random 10% sample of the Korean chatbot data in open-korean-instructions.
- Goal: test whether a large language model can be fine-tuned on a single NVIDIA A30 GPU (result: successful).
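
The card does not say how the random 10% subset was drawn. The following is a minimal sketch, assuming a seeded uniform sample without replacement. The function name `sample_fraction_indices` and the seed value are hypothetical and were not taken from the authors' code.

```python
import random


def sample_fraction_indices(n_rows: int, fraction: float, seed: int = 42) -> list[int]:
    """Return sorted row indices for a seeded uniform sample (without replacement)
    covering `fraction` of `n_rows` rows. Hypothetical helper, not the authors' code."""
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    k = round(n_rows * fraction)  # e.g. fraction=0.1 keeps 10% of the rows
    rng = random.Random(seed)     # local RNG so the same seed gives the same subset
    return sorted(rng.sample(range(n_rows), k))
```

With the Hugging Face `datasets` library, the returned indices could then be passed to `Dataset.select(...)` on the loaded training split.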


- Developed by: TREX-Lab at Seoul Cyber University
- Language(s) (NLP): Korean
- Fine-tuned from model: MLP-KTLim/llama-3-Korean-Bllossom-8B

## Fine-Tuning Details

- LoRA alpha (`lora_alpha`): 16
- LoRA rank (`r`): 64 (arguably on the large side)
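```python
from peft import LoraConfig

# LoRA adapter configuration. No target_modules are given, so PEFT uses
# its built-in default target modules for the model type.
peft_config = LoraConfig(
    lora_alpha=16,          # LoRA updates are scaled by lora_alpha / r
    lora_dropout=0.1,       # dropout applied on the LoRA branch
    r=64,                   # rank of the low-rank update matrices
    bias='none',            # bias terms are not trained
    task_type='CAUSAL_LM',
)
```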
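
For reference, LoRA (Hu et al., https://arxiv.org/abs/2106.09685) keeps a pretrained weight $W_0 \in \mathbb{R}^{d_\text{out} \times d_\text{in}}$ frozen and learns two smaller matrices, $B \in \mathbb{R}^{d_\text{out} \times r}$ and $A \in \mathbb{R}^{r \times d_\text{in}}$. PEFT scales their product by `lora_alpha / r`:

$$h = W_0 x + \frac{\alpha}{r} B A x$$

With $\alpha = 16$ and $r = 64$, the scale is 0.25. Each adapted matrix adds $r(d_\text{in} + d_\text{out})$ trainable parameters.

The sketch below counts these parameters. It rests on two assumptions that the card does not state. First, because the config above sets no `target_modules`, it is assumed to fall back to PEFT's Llama default of `q_proj` and `v_proj`. Second, the base model is assumed to keep Llama-3-8B's shapes: 32 layers, hidden size 4096, and 8 key/value heads of dimension 128. The helper names are hypothetical.

```python
def lora_params(d_in: int, d_out: int, r: int) -> int:
    """Trainable parameters in one LoRA pair: A is r x d_in, B is d_out x r."""
    return r * d_in + d_out * r


def llama3_8b_lora_trainable(r: int = 64, n_layers: int = 32, hidden: int = 4096,
                             n_kv_heads: int = 8, head_dim: int = 128) -> int:
    """LoRA parameter count under ASSUMED Llama-3-8B shapes and q_proj/v_proj targets."""
    kv_dim = n_kv_heads * head_dim                  # grouped-query attention: v_proj maps 4096 -> 1024
    per_layer = (lora_params(hidden, hidden, r)     # q_proj: 4096 -> 4096
                 + lora_params(hidden, kv_dim, r))  # v_proj: 4096 -> 1024
    return n_layers * per_layer


lora_alpha, r = 16, 64
scaling = lora_alpha / r
trainable = llama3_8b_lora_trainable(r=r)
print(f"scaling={scaling}, trainable LoRA params={trainable:,}")  # scaling=0.25, trainable LoRA params=27,262,976
```

Under these assumptions, the adapter has about 27.3M trainable parameters. That is roughly 0.3% of an ~8B-parameter base model, whose weights stay frozen during training.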

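- Quantization: 4-bit NF4 with double quantization (`bnb_4bit_use_double_quant`); computation runs in float16
```python
import torch
from transformers import BitsAndBytesConfig

# Load the base model weights in 4-bit NF4 and also quantize the
# quantization constants themselves (double quantization).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type='nf4',
    bnb_4bit_compute_dtype=torch.float16,  # dtype used for matmuls on dequantized weights
)
```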
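
The following is a rough back-of-the-envelope estimate of why 4-bit loading matters on a single A30, which has 24 GB of memory. It estimates weight memory only and makes three assumptions:

- The model has about 8.0e9 parameters.
- Quantization uses blocks of 64 weights.
- The overhead of the quantization constants follows the QLoRA paper: 0.5 bits per parameter without double quantization and about 0.127 bits with it.

It ignores activations, LoRA optimizer state, and any layers that bitsandbytes leaves unquantized, so real usage is higher.

```python
GIB = 1024 ** 3


def weight_gib(n_params: float, bits_per_param: float) -> float:
    """Memory for model weights in GiB at a given storage cost per parameter."""
    return n_params * bits_per_param / 8 / GIB


N = 8.0e9  # ASSUMED parameter count (~8B); the base model's exact count may differ

# Quantization-constant overhead per parameter with block size 64 (QLoRA paper figures):
no_dq = 32 / 64                     # one fp32 absmax per 64 weights -> 0.5 bits
with_dq = 8 / 64 + 32 / (64 * 256)  # 8-bit absmax + fp32 second-level constants -> ~0.127 bits

fp16 = weight_gib(N, 16)
nf4 = weight_gib(N, 4 + no_dq)
nf4_dq = weight_gib(N, 4 + with_dq)
print(f"fp16: {fp16:.1f} GiB, nf4: {nf4:.2f} GiB, nf4 + double quant: {nf4_dq:.2f} GiB")
```

Under these assumptions, 4-bit weights need roughly a quarter of the fp16 footprint. Double quantization trims about 0.37 more bits per parameter.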

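- Trainer: TRL `SFTTrainer` (https://huggingface.co/docs/trl/sft_trainer)
```python
from trl import SFTTrainer

# peft_model, dataset, tokenizer and training_args are created earlier in the
# training script (not shown in this card). Passing dataset_text_field,
# max_seq_length, tokenizer and packing directly to SFTTrainer matches older
# TRL releases; newer releases expect them in SFTConfig instead.
trainer = SFTTrainer(
    model=peft_model,
    train_dataset=dataset,
    dataset_text_field='text',                              # column holding the training text
    max_seq_length=min(tokenizer.model_max_length, 2048),   # cap sequences at 2048 tokens
    tokenizer=tokenizer,
    packing=True,                                           # pack examples into fixed-length sequences
    args=training_args,
)
```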
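
With `packing=True`, the trainer concatenates tokenized examples into fixed-length sequences instead of padding each one. This way, short chat turns do not waste compute.

The sketch below is a simplified illustration of that idea, not TRL's implementation. It assumes that:

- examples are already lists of token IDs,
- examples are separated by an EOS token (the id used here is hypothetical),
- the incomplete tail chunk is dropped.

```python
from typing import Iterable


def pack_sequences(examples: Iterable[list[int]], seq_len: int, eos_id: int) -> list[list[int]]:
    """Concatenate token-ID lists (each followed by eos_id) and cut the stream
    into chunks of exactly seq_len tokens. Simplified illustration of packing."""
    buffer: list[int] = []
    for tokens in examples:
        buffer.extend(tokens)
        buffer.append(eos_id)  # marks the boundary between examples
    # Keep only full chunks; a leftover tail shorter than seq_len is discarded.
    n_full = len(buffer) // seq_len
    return [buffer[i * seq_len:(i + 1) * seq_len] for i in range(n_full)]


chunks = pack_sequences([[1, 2, 3], [4, 5], [6, 7, 8, 9]], seq_len=4, eos_id=0)
print(chunks)  # [[1, 2, 3, 0], [4, 5, 0, 6], [7, 8, 9, 0]]
```

In the configuration above, `seq_len` would correspond to `min(tokenizer.model_max_length, 2048)`. Packed chunks can span example boundaries, which is the trade-off packing makes for throughput.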

### Training Results

```
time taken: executed in 21h 45m 55s
```

```
TrainOutput(global_step=816, training_loss=1.718194248045192,
            metrics={'train_runtime': 78354.6002,
                     'train_samples_per_second': 0.083,
                     'train_steps_per_second': 0.01,
                     'train_loss': 1.718194248045192,
                     'epoch': 2.99})
```
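
The logged numbers are mutually consistent. The sketch below rechecks them using only values from the `TrainOutput` above:

- `train_runtime` in seconds rounds to the 21h 45m 55s wall-clock time.
- 816 steps over that runtime reproduce the logged step rate.
- Combining the sample rate with the runtime suggests roughly 8 packed sequences per optimizer step. This is an inference from rounded metrics, not a setting stated in the card.

```python
train_runtime = 78354.6002  # seconds, from TrainOutput
global_step = 816
samples_per_second = 0.083  # logged value, rounded to 3 decimals


def format_hms(seconds: float) -> str:
    """Format a duration in seconds as 'Hh Mm Ss', rounded to the nearest second."""
    total = round(seconds)
    hours, rem = divmod(total, 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours}h {minutes}m {secs}s"


steps_per_second = global_step / train_runtime
samples_per_step = samples_per_second * train_runtime / global_step  # inferred, approximate

print(format_hms(train_runtime))     # 21h 45m 55s
print(round(steps_per_second, 2))    # 0.01
print(round(samples_per_step, 1))    # 8.0
```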