metadata

license: mit
library_name: peft
tags:
  - trl
  - dpo
  - generated_from_trainer
base_model: microsoft/phi-2
model-index:
  - name: phi2-lora-quantized-distilabel-intel-orca-dpo-pairs
    results: []

phi2-lora-quantized-distilabel-intel-orca-dpo-pairs

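This model is a fine-tuned version of microsoft/phi-2, trained with DPO (per the trl and dpo tags). The training run did not record the dataset; the model name suggests distilabel-intel-orca-dpo-pairs. It achieves the following results on the evaluation set: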

Loss: 0.0972
Rewards/chosen: 0.2699
Rewards/rejected: -5.8246
Rewards/accuracies: 0.9623
Rewards/margins: 6.0944
Logps/rejected: -311.1872
Logps/chosen: -115.6127
Logits/rejected: 0.0766
Logits/chosen: 0.0242
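
The reward figures above are related: in TRL's DPO metrics, Rewards/margins is the mean of the per-example difference between the chosen and rejected rewards, so it should equal Rewards/chosen minus Rewards/rejected up to rounding. The sketch below only re-checks that arithmetic with the numbers from this card. It does not recompute the rewards themselves, which would need the DPO beta and the reference-model log-probabilities, neither of which is recorded here.

```python
# Final evaluation values copied from the list above.
rewards_chosen = 0.2699
rewards_rejected = -5.8246
reported_margin = 6.0944

# The margin is the gap between the chosen and rejected rewards.
margin = rewards_chosen - rewards_rejected
print(f"recomputed margin: {margin:.4f}, reported: {reported_margin:.4f}")

# The two agree to within the four-decimal rounding of the card.
assert abs(margin - reported_margin) < 1e-3
```

Most of the margin comes from the rejected reward falling to about -5.8, while the chosen reward stays close to zero.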

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 1e-05
train_batch_size: 2
eval_batch_size: 2
seed: 42
gradient_accumulation_steps: 16
total_train_batch_size: 32
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
lr_scheduler_warmup_steps: 20
num_epochs: 1
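
As a rough illustration of how these settings combine, the sketch below recomputes the effective batch size and evaluates a linear warmup-then-decay learning-rate schedule. Two values are assumptions, not taken from the hyperparameter list: num_devices = 1 (consistent with 2 x 16 = 32) and total_steps = 320 (the last step in the results table, where the epoch column reads 1.0). The function linear_lr is a hypothetical helper written for this example; it mirrors the usual shape of a linear schedule with warmup and is not the trainer's own code.

```python
train_batch_size = 2
gradient_accumulation_steps = 16
num_devices = 1  # assumption: single device, not stated in the card

# Examples consumed per optimizer step.
total_train_batch_size = train_batch_size * gradient_accumulation_steps * num_devices
print("total_train_batch_size:", total_train_batch_size)


def linear_lr(step, base_lr=1e-05, warmup_steps=20, total_steps=320):
    """Linear warmup from 0 to base_lr, then linear decay to 0.

    total_steps=320 is an assumption read off the results table.
    """
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    remaining = max(0, total_steps - step)
    return base_lr * remaining / (total_steps - warmup_steps)


# Learning rate at a few points in the run.
for step in (0, 10, 20, 170, 320):
    print(step, linear_lr(step))
```

Under these assumptions the learning rate peaks at 1e-05 at step 20, the first logged evaluation point, and decays linearly for the rest of the epoch.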

Training results

Training Loss	Epoch	Step	Validation Loss	Rewards/chosen	Rewards/rejected	Rewards/accuracies	Rewards/margins	Logps/rejected	Logps/chosen	Logits/rejected	Logits/chosen
0.6805	0.06	20	0.6540	0.0096	-0.0728	0.8367	0.0824	-253.6698	-118.2153	0.3760	0.3395
0.5821	0.12	40	0.4977	0.0383	-0.4385	0.9199	0.4768	-257.3268	-117.9285	0.3836	0.3356
0.4163	0.19	60	0.3225	0.0641	-1.1656	0.9257	1.2298	-264.5979	-117.6701	0.3836	0.3192
0.275	0.25	80	0.2245	0.0476	-2.1180	0.9316	2.1656	-274.1212	-117.8351	0.3399	0.2698
0.1808	0.31	100	0.1771	-0.0012	-3.2019	0.9366	3.2007	-284.9609	-118.3238	0.2615	0.1964
0.1405	0.37	120	0.1528	0.0185	-4.0396	0.9425	4.0581	-293.3371	-118.1262	0.1983	0.1407
0.1121	0.44	140	0.1389	0.0285	-4.6518	0.9471	4.6802	-299.4591	-118.0267	0.1493	0.0980
0.1544	0.5	160	0.1289	0.0745	-4.9025	0.9506	4.9771	-301.9670	-117.5659	0.1257	0.0785
0.1594	0.56	180	0.1204	0.1435	-4.8770	0.9561	5.0205	-301.7119	-116.8765	0.1168	0.0696
0.0988	0.62	200	0.1136	0.1830	-5.1569	0.9576	5.3400	-304.5108	-116.4809	0.1078	0.0579
0.1141	0.68	220	0.1080	0.2052	-5.4532	0.9580	5.6584	-307.4731	-116.2591	0.0962	0.0460
0.0943	0.75	240	0.1037	0.2326	-5.6061	0.9592	5.8387	-309.0026	-115.9850	0.0913	0.0393
0.1108	0.81	260	0.1008	0.2500	-5.7399	0.9607	5.9900	-310.3409	-115.8109	0.0827	0.0316
0.1088	0.87	280	0.0987	0.2677	-5.7068	0.9619	5.9745	-310.0096	-115.6346	0.0825	0.0301
0.0741	0.93	300	0.0975	0.2701	-5.7873	0.9623	6.0574	-310.8145	-115.6102	0.0788	0.0261
0.1059	1.0	320	0.0972	0.2699	-5.8246	0.9623	6.0944	-311.1872	-115.6127	0.0766	0.0242

Framework versions

PEFT 0.7.1
Transformers 4.37.1
Pytorch 2.1.0+cu121
Datasets 2.16.1
Tokenizers 0.15.1