metadata

license: llama2
library_name: peft
tags:
  - trl
  - dpo
  - generated_from_trainer
base_model: meta-llama/Llama-2-7b-chat-hf
model-index:
  - name: model_hh_shp4_200
    results: []

model_hh_shp4_200

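This model is a fine-tuned version of meta-llama/Llama-2-7b-chat-hf, trained as a PEFT adapter with TRL's DPO trainer according to the card metadata. The training dataset is not recorded. At the final checkpoint (step 1000), it achieves the following results on the evaluation set: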

Loss: 1.4445
Rewards/chosen: -0.1401
Rewards/rejected: -1.3796
Rewards/accuracies: 0.6300
Rewards/margins: 1.2395
Logps/rejected: -230.1749
Logps/chosen: -224.4940
Logits/rejected: -0.7701
Logits/chosen: -0.7769
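
The reward figures above are related by simple arithmetic. As a minimal sketch, assuming TRL's usual DPO metric definitions (the margin is the mean chosen reward minus the mean rejected reward, and accuracy is the fraction of evaluation pairs whose chosen reward exceeds the rejected one), the reported margin can be re-derived from the two reward values. The variable names are illustrative and do not come from the training code.

```python
# Values copied from the evaluation results listed above.
rewards_chosen = -0.1401
rewards_rejected = -1.3796
reported_margin = 1.2395

# Assumed definition: margin = chosen reward - rejected reward.
margin = rewards_chosen - rewards_rejected

# The card rounds to four decimals, so compare with a small tolerance.
assert abs(margin - reported_margin) < 1e-4
print(f"margin = {margin:.4f}")
```

Under the same assumption, an accuracy of 0.6300 would mean the adapter assigns the higher reward to the chosen response in about 63% of evaluation pairs.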

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 0.0005
train_batch_size: 4
eval_batch_size: 1
seed: 42
gradient_accumulation_steps: 4
total_train_batch_size: 16
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: cosine
lr_scheduler_warmup_steps: 100
training_steps: 1000
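
As a sketch of how these settings combine: the total train batch size is the per-device batch size times the gradient accumulation steps, times the device count (assumed here to be 1, which the card does not state). A cosine schedule with linear warmup ramps the learning rate to its peak over the first 100 steps and then decays it toward zero at step 1000. The function below re-implements that schedule shape in plain Python for illustration; it is meant to mirror a standard half-cosine decay, not to reproduce the trainer's code exactly.

```python
import math

LEARNING_RATE = 5e-4
TRAIN_BATCH_SIZE = 4
GRAD_ACCUM_STEPS = 4
WARMUP_STEPS = 100
TRAINING_STEPS = 1000
NUM_DEVICES = 1  # assumption: the card does not state the device count

total_train_batch_size = TRAIN_BATCH_SIZE * GRAD_ACCUM_STEPS * NUM_DEVICES


def lr_at(step: int) -> float:
    """Learning rate after `step` optimizer updates (linear warmup, cosine decay)."""
    if step < WARMUP_STEPS:
        # Linear ramp from 0 to the peak learning rate.
        return LEARNING_RATE * step / max(1, WARMUP_STEPS)
    # Fraction of the post-warmup phase that has elapsed, in [0, 1].
    progress = (step - WARMUP_STEPS) / max(1, TRAINING_STEPS - WARMUP_STEPS)
    return LEARNING_RATE * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


print(f"total_train_batch_size = {total_train_batch_size}")
for step in (0, 50, 100, 550, 1000):
    print(f"step {step:4d}: lr = {lr_at(step):.6f}")
```

With these values the learning rate peaks at 0.0005 at step 100 and is half of that at step 550, the midpoint of the decay phase.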

Training results

Epoch	Step	Validation Loss	Rewards/chosen	Rewards/rejected	Rewards/accuracies	Rewards/margins	Logps/rejected	Logps/chosen	Logits/rejected	Logits/chosen
8.0	100	1.4330	-0.1080	-1.3964	0.6200	1.2883	-230.1935	-224.4584	-0.7684	-0.7753
16.0	200	1.4371	-0.0911	-1.3887	0.6400	1.2976	-230.1849	-224.4396	-0.7692	-0.7762
24.0	300	1.4477	-0.1125	-1.3921	0.6200	1.2795	-230.1887	-224.4634	-0.7693	-0.7763
32.0	400	1.4521	-0.1143	-1.4167	0.6200	1.3024	-230.2161	-224.4653	-0.7696	-0.7763
40.0	500	1.4631	-0.1153	-1.3806	0.6200	1.2653	-230.1759	-224.4665	-0.7701	-0.7771
48.0	600	1.4455	-0.1180	-1.3970	0.6300	1.2791	-230.1942	-224.4695	-0.7698	-0.7769
56.0	700	1.4292	-0.0800	-1.3720	0.6100	1.2920	-230.1664	-224.4273	-0.7704	-0.7775
64.0	800	1.4434	-0.0943	-1.3739	0.6200	1.2796	-230.1686	-224.4432	-0.7703	-0.7773
72.0	900	1.4493	-0.1016	-1.4044	0.6100	1.3028	-230.2024	-224.4513	-0.7704	-0.7773
80.0	1000	1.4445	-0.1401	-1.3796	0.6300	1.2395	-230.1749	-224.4940	-0.7701	-0.7769
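
One way to read this table is to pick the checkpoint with the lowest validation loss and to estimate the dataset size from the epoch counter. The sketch below copies three columns from the table. The examples-per-epoch figure is only a rough estimate: it assumes a single device and the total train batch size of 16 listed in the hyperparameters.

```python
# (epoch, step, validation_loss) copied from the table above.
ROWS = [
    (8.0, 100, 1.4330),
    (16.0, 200, 1.4371),
    (24.0, 300, 1.4477),
    (32.0, 400, 1.4521),
    (40.0, 500, 1.4631),
    (48.0, 600, 1.4455),
    (56.0, 700, 1.4292),
    (64.0, 800, 1.4434),
    (72.0, 900, 1.4493),
    (80.0, 1000, 1.4445),
]
TOTAL_TRAIN_BATCH_SIZE = 16  # from the hyperparameters section

# Checkpoint with the lowest validation loss.
best_epoch, best_step, best_loss = min(ROWS, key=lambda row: row[2])

# Examples processed so far divided by epochs completed (rough estimate).
last_epoch, last_step, _ = ROWS[-1]
examples_per_epoch = last_step * TOTAL_TRAIN_BATCH_SIZE / last_epoch

print(f"lowest validation loss: {best_loss} at step {best_step}")
print(f"estimated examples per epoch: {examples_per_epoch:.0f}")
```

Validation loss stays within a narrow band (1.4292 to 1.4631) across all ten evaluations, so the differences between checkpoints are small.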

Framework versions

PEFT 0.10.0
Transformers 4.39.1
Pytorch 2.2.1+cu121
Datasets 2.18.0
Tokenizers 0.15.2
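
To compare a local environment against the versions listed above, a small check along these lines may help. It is a sketch only: the distribution names (`peft`, `transformers`, `torch`, `datasets`, `tokenizers`) are assumed to be the usual PyPI names, and the helper `check_versions` is hypothetical, not part of any of these libraries.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions listed above, keyed by the assumed PyPI distribution name.
PINNED = {
    "peft": "0.10.0",
    "transformers": "4.39.1",
    "torch": "2.2.1+cu121",
    "datasets": "2.18.0",
    "tokenizers": "0.15.2",
}


def check_versions(pinned=PINNED):
    """Return {package: (installed_version_or_None, matches_pin)}."""
    report = {}
    for name, wanted in pinned.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None  # package is not installed in this environment
        report[name] = (installed, installed == wanted)
    return report


for name, (installed, ok) in check_versions().items():
    status = "ok" if ok else f"differs (installed: {installed})"
    print(f"{name}: {status}")
```

A mismatch is not necessarily a problem; the listed versions are simply the ones recorded at training time.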