llama-3-orpo-qlora

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B on the mlabonne/orpo-dpo-mix-40k dataset. It achieves the following results on the evaluation set at the end of training (epoch 5, step 8195):

Loss: 1.0581
Rewards/chosen: -0.0823
Rewards/rejected: -0.2496
Rewards/accuracies: 0.7879
Rewards/margins: 0.1673
Logps/rejected: -2.4958
Logps/chosen: -0.8230
Logits/rejected: -1.0347
Logits/chosen: -0.9355
Nll Loss: 1.0625
Log Odds Ratio: -0.3947
Log Odds Chosen: 2.1017
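The reported metrics are related to each other, and a quick check makes that visible. The sketch below only re-derives numbers from the list above. The reward margin is the chosen reward minus the rejected reward, and each reward is the matching log-probability multiplied by a constant of about 0.1. Reading that constant as the ORPO beta is an inference from the numbers and is not stated in this card.

```python
# Final evaluation metrics copied from the list above.
rewards_chosen = -0.0823
rewards_rejected = -0.2496
logps_chosen = -0.8230
logps_rejected = -2.4958

# Rewards/margins is the gap between the chosen and rejected rewards.
margin = round(rewards_chosen - rewards_rejected, 4)

# Ratio of reward to log-probability. A shared constant suggests the rewards
# are scaled log-probabilities (assumption: this constant is the ORPO beta).
scale_chosen = round(rewards_chosen / logps_chosen, 3)
scale_rejected = round(rewards_rejected / logps_rejected, 3)

print(margin)          # 0.1673, matches Rewards/margins
print(scale_chosen)    # 0.1
print(scale_rejected)  # 0.1
```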

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 8e-06
train_batch_size: 2
eval_batch_size: 2
seed: 77
distributed_type: multi-GPU
num_devices: 3
gradient_accumulation_steps: 4
total_train_batch_size: 24
total_eval_batch_size: 6
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
lr_scheduler_warmup_steps: 30
num_epochs: 5
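The total batch sizes follow from the per-device sizes, the device count, and gradient accumulation. The sketch below reproduces that arithmetic and also illustrates a linear schedule with 30 warmup steps. The total step count of 8195 is taken from the last row of the training results, and the helper name lr_at is hypothetical. The schedule shape (linear ramp up, then linear decay to zero) is an assumption about how the linear scheduler behaved in this run.

```python
# Values copied from the hyperparameter list.
learning_rate = 8e-06
train_batch_size = 2
eval_batch_size = 2
num_devices = 3
gradient_accumulation_steps = 4
warmup_steps = 30
total_steps = 8195  # assumption: final step reported in the training results

# Effective batch sizes across devices and accumulation steps.
total_train_batch_size = train_batch_size * num_devices * gradient_accumulation_steps
total_eval_batch_size = eval_batch_size * num_devices


def lr_at(step: int) -> float:
    """Learning rate at an optimizer step under linear warmup and linear decay."""
    if step < warmup_steps:
        return learning_rate * step / warmup_steps
    remaining = max(0, total_steps - step)
    return learning_rate * remaining / (total_steps - warmup_steps)


print(total_train_batch_size)  # 24
print(total_eval_batch_size)   # 6
print(lr_at(15))               # 4e-06, halfway through warmup
```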

Training results

Training Loss	Epoch	Step	Validation Loss	Rewards/chosen	Rewards/rejected	Rewards/accuracies	Rewards/margins	Logps/rejected	Logps/chosen	Logits/rejected	Logits/chosen	Nll Loss	Log Odds Ratio	Log Odds Chosen
1.2408	0.9998	1639	1.1078	-0.0850	-0.1592	0.7045	0.0742	-1.5920	-0.8498	-0.9217	-0.9313	1.1014	-0.4987	1.0714
1.2158	1.9997	3278	1.0768	-0.0818	-0.1961	0.7273	0.1143	-1.9613	-0.8183	-0.7536	-0.7772	1.0726	-0.4562	1.5271
1.0891	2.9995	4917	1.0654	-0.0820	-0.2184	0.7197	0.1365	-2.1845	-0.8200	-0.9358	-0.8876	1.0648	-0.4458	1.7377
1.0521	3.9994	6556	1.0605	-0.0824	-0.2405	0.7727	0.1581	-2.4049	-0.8244	-0.9998	-0.8917	1.0630	-0.4060	1.9929
1.0763	4.9992	8195	1.0581	-0.0823	-0.2496	0.7879	0.1673	-2.4958	-0.8230	-1.0347	-0.9355	1.0625	-0.3947	2.1017
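To read the trend in the table, it can help to look at the change in validation loss from one epoch to the next. The sketch below uses only the Validation Loss column above and shows that the improvement shrinks each epoch.

```python
# Validation loss at the end of each epoch, copied from the table above.
validation_loss = [1.1078, 1.0768, 1.0654, 1.0605, 1.0581]

# Change in validation loss between consecutive epochs.
deltas = [
    round(later - earlier, 4)
    for earlier, later in zip(validation_loss, validation_loss[1:])
]

print(deltas)  # [-0.031, -0.0114, -0.0049, -0.0024]
```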

Framework versions

PEFT 0.11.1
Transformers 4.41.2
PyTorch 2.3.1
Datasets 2.19.2
Tokenizers 0.19.1
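When reproducing results, it may be useful to compare a local environment against the versions listed above. The following is a minimal sketch using only the standard library. The helper name version_mismatches is hypothetical, and the distribution names (for example torch for PyTorch) are the usual PyPI names, assumed here.

```python
from importlib import metadata

# Versions listed in this card, keyed by the usual PyPI distribution name.
EXPECTED = {
    "peft": "0.11.1",
    "transformers": "4.41.2",
    "torch": "2.3.1",
    "datasets": "2.19.2",
    "tokenizers": "0.19.1",
}


def version_mismatches(expected=EXPECTED):
    """Return {package: (wanted, installed)} for missing or differing packages."""
    mismatches = {}
    for name, wanted in expected.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        # Some builds carry a local suffix such as "+cu121"; ignore it.
        if installed is None or installed.split("+")[0] != wanted:
            mismatches[name] = (wanted, installed)
    return mismatches


for package, (wanted, installed) in version_mismatches().items():
    print(f"{package}: card lists {wanted}, found {installed}")
```

An empty result means the installed versions match the ones listed in this card.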

Model repository: dchoi44/llama-3-orpo-qlora