ENERGY-DRINK-LOVE/DataVortexS_dpov3


jingyeom committed on Mar 16, 2024

Commit 385552b · verified · 1 Parent(s): 91f4236

Update README.md

Files changed (1)

README.md +24 -32


@@ -10,51 +10,43 @@ model-index:
   results: []
 ---
-<!-- This model card has been generated automatically according to the information the Trainer had access to. You
-should probably proofread and complete it, then remove this comment. -->
-# nhn_dpo_v3_DataVortexS-10.7B-dpo-v1.11_DPO
-This model is a fine-tuned version of [Edentns/DataVortexS-10.7B-dpo-v1.11](https://huggingface.co/Edentns/DataVortexS-10.7B-dpo-v1.11) on an unknown dataset.
-## Model description
-More information needed
-## Intended uses & limitations
-More information needed
-## Training and evaluation data
-More information needed
-## Training procedure
-### Training hyperparameters
-The following hyperparameters were used during training:
-- learning_rate: 5e-07
-- train_batch_size: 1
-- eval_batch_size: 8
-- seed: 42
-- distributed_type: multi-GPU
-- num_devices: 7
-- gradient_accumulation_steps: 8
-- total_train_batch_size: 56
-- total_eval_batch_size: 56
-- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
-- lr_scheduler_type: cosine
-- lr_scheduler_warmup_ratio: 0.1
-- num_epochs: 1
-### Training results
-### Framework versions
-- Transformers 4.38.1
-- Pytorch 2.2.1+cu118
-- Datasets 2.17.1
-- Tokenizers 0.15.2

+# ENERGY-DRINK-LOVE/DataVortexS_dpov3
+### Our Team
+* Youjin Chung
+* Jingyeom Kim
+## Model
+### Base Model
+* [Edentns/DataVortexS-10.7B-dpo-v1.11](https://huggingface.co/Edentns/DataVortexS-10.7B-dpo-v1.11)
+### Hardware and Software
+* Hardware: 8x A100 GPUs used for training
+* Software: DeepSpeed library and Hugging Face TRL Trainer
+### Dataset
+* DPO_dataset
+  * In-house DPO dataset (built from AI-hub datasets)
+  * Translations of English datasets such as OpenOrca DPO (ENERGY-DRINK-LOVE/translate_share_gpt_dedup_llama_SFT_1024, using our own model)
+### Training Method
+* [DPO](https://arxiv.org/abs/2305.18290)
+## Benchmark
+**[Ko LM Eval Harness](https://github.com/Beomi/ko-lm-evaluation-harness)**
+**[Ko-LLM-Leaderboard](https://www.aihub.or.kr/leaderboard/view.do?currMenu=500&topMenu=102)**
+* (Ranked 7th as of 2024-03-16)
+* ![image/png](https://cdn-uploads.huggingface.co/production/uploads/6551c0e37bbfce18781a8748/S4cpra6iTlzCdN7PP6A3o.png)
+| Average | Ko-ARC | Ko-HellaSwag | Ko-MMLU | Ko-TruthfulQA | Ko-CommonGen V2 |
+| ------: | -----: | -----------: | ------: | ------------: | --------------: |
+|   60.18 |  56.23 |        69.15 |   52.76 |         67.87 |           54.9  |
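
As a quick sanity check on the benchmark table, the sketch below recomputes the Average column from the five task scores. It assumes the leaderboard average is the unweighted mean of the five tasks, which the card does not state explicitly; the names `scores`, `reported_average`, and `mean_score` are illustrative only.

```python
# Scores copied from the benchmark table in the README diff.
scores = {
    "Ko-ARC": 56.23,
    "Ko-HellaSwag": 69.15,
    "Ko-MMLU": 52.76,
    "Ko-TruthfulQA": 67.87,
    "Ko-CommonGen V2": 54.9,
}
reported_average = 60.18  # "Average" column in the same table


def mean_score(values):
    """Unweighted arithmetic mean (assumed to be how the average is formed)."""
    values = list(values)
    return sum(values) / len(values)


computed = mean_score(scores.values())

# The table reports two decimals, so compare within half a unit in the last place.
matches = abs(computed - reported_average) < 0.005
print(f"computed={computed:.3f} reported={reported_average} matches={matches}")
```

Under that assumption the recomputed mean agrees with the reported 60.18 to two decimal places.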