---
license: apache-2.0
tags:
- trl
- dpo
- generated_from_trainer
base_model: yanolja/EEVE-Korean-Instruct-10.8B-v1.0
---

# ENERGY-DRINK-LOVE/eeve_dpo-v3
## Our Team
- Jingyeom Kim
- Youjin Chung
## Model

### Base Model
- yanolja/EEVE-Korean-Instruct-10.8B-v1.0

### Hardware and Software
- Hardware: 8× A100 GPUs for training our model
- Software: Deepspeed library & Huggingface TRL Trainer
## Dataset
- DPO dataset
  - In-house DPO dataset (built using AI-Hub data)
  - Korean translations of English preference datasets such as OpenOrca DPO (ENERGY-DRINK-LOVE/translate_share_gpt_dedup_llama_SFT_1024, translated with our own model)
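As a minimal sketch of what such preference data looks like, assuming TRL's standard DPO column names (`prompt`, `chosen`, `rejected`); the strings below are illustrative placeholders, not rows from the actual dataset:

```python
# Hypothetical example of one DPO preference record in the
# prompt/chosen/rejected format that TRL's DPOTrainer consumes.
# The Korean strings are placeholders, not real dataset rows.
record = {
    "prompt": "대한민국의 수도는 어디인가요?",
    "chosen": "대한민국의 수도는 서울입니다.",
    "rejected": "잘 모르겠습니다.",
}

# A DPO dataset is simply a collection of such records.
dataset = [record]

def is_valid_dpo_record(r):
    # Each record must carry exactly the three preference fields,
    # and chosen/rejected must differ for the pair to be informative.
    return set(r) == {"prompt", "chosen", "rejected"} and r["chosen"] != r["rejected"]

print(all(is_valid_dpo_record(r) for r in dataset))
```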
## Training Method
- DPO (Direct Preference Optimization) via the Huggingface TRL Trainer
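For reference, the per-example DPO objective can be sketched in plain Python; the `beta` value and log-probability arguments below are illustrative, not the hyperparameters actually used for this model:

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-example DPO loss: -log sigmoid(beta * (policy log-ratio - reference log-ratio))."""
    policy_logratio = policy_chosen_logp - policy_rejected_logp
    ref_logratio = ref_chosen_logp - ref_rejected_logp
    logits = beta * (policy_logratio - ref_logratio)
    # -log sigmoid(x) = log(1 + exp(-x))
    return math.log(1.0 + math.exp(-logits))

# When policy and reference agree, the loss sits at log(2) ≈ 0.693;
# it falls as the policy prefers the chosen response more strongly
# than the reference model does.
print(dpo_loss(0.0, 0.0, 0.0, 0.0))
print(dpo_loss(-1.0, -3.0, -2.0, -2.5))
```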
## Benchmark

| Task | 0-shot | 5-shot |
|---|---|---|
| kobest_boolq | 0.950142 | 0.944444 |
| kobest_copa | 0.751 | 0.835 |
| kobest_hellaswag | 0.474 | 0.508 |
| kobest_sentineg | 0.811083 | 0.972292 |
| **Average** | 0.74655625 | 0.81493399 |
- (Ranked 7th as of 2024-03-07)

| Average | Ko-ARC | Ko-HellaSwag | Ko-MMLU | Ko-TruthfulQA | Ko-CommonGen V2 |
|---|---|---|---|---|---|
| 57.97 | 57.51 | 67.01 | 56.3 | 54.86 | 54.19 |