	---
	library_name: transformers
	base_model: data/OpenELM-1_1B-SFT-2
	tags:
	- alignment-handbook
	- trl
	- dpo
	- generated_from_trainer
	datasets:
	- HuggingFaceH4/ultrafeedback_binarized
	model-index:
	- name: OpenELM-1_1B-DPO-full-2
	  results: []
	---

	<!-- This model card has been generated automatically according to the information the Trainer had access to. You
	should probably proofread and complete it, then remove this comment. -->

	# OpenELM-1_1B-DPO-full-2

	This model is a fine-tuned version of [data/OpenELM-1_1B-SFT-2](https://huggingface.co/data/OpenELM-1_1B-SFT-2), trained with DPO on the HuggingFaceH4/ultrafeedback_binarized dataset.
	It achieves the following results on the evaluation set:
	- Loss: 0.7945
	- Rewards/chosen: -8.3125
	- Rewards/rejected: -10.4375
	- Rewards/accuracies: 0.7324
	- Rewards/margins: 2.1406
	- Logps/rejected: -1336.0
	- Logps/chosen: -1144.0
	- Logits/rejected: 5.5
	- Logits/chosen: 3.5938
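
The reward metrics above are easier to read with the DPO definitions in hand. The following is a minimal sketch of how these quantities relate for a single preference pair under the standard sigmoid DPO loss. The `beta` value and the reference-model log-probabilities are hypothetical (the card reports neither), and the function name `dpo_pair_metrics` is illustrative rather than part of any library.

```python
import math

def dpo_pair_metrics(policy_chosen_logp, policy_rejected_logp,
                     ref_chosen_logp, ref_rejected_logp, beta):
    """Per-pair DPO quantities from summed sequence log-probabilities."""
    # Implicit reward: beta-scaled log-ratio between policy and reference model.
    reward_chosen = beta * (policy_chosen_logp - ref_chosen_logp)
    reward_rejected = beta * (policy_rejected_logp - ref_rejected_logp)
    margin = reward_chosen - reward_rejected
    # Sigmoid DPO loss, -log(sigmoid(margin)), written as a stable softplus(-margin).
    loss = max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))
    return {
        "reward_chosen": reward_chosen,
        "reward_rejected": reward_rejected,
        "margin": margin,
        "correct": margin > 0,  # averaged over pairs, this gives Rewards/accuracies
        "loss": loss,
    }

# Hypothetical inputs: beta and the reference log-probs are NOT from this card.
# The policy log-probs reuse the card's evaluation averages only to set the scale.
example = dpo_pair_metrics(
    policy_chosen_logp=-1144.0, policy_rejected_logp=-1336.0,
    ref_chosen_logp=-1060.0, ref_rejected_logp=-1230.0,
    beta=0.1,
)
print(example)
```

Both rewards can be negative while the pair still counts as correct: a negative reward only means the policy assigns lower log-probability than the reference model, and the pair is scored as correct whenever the chosen response lost less than the rejected one.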

	## Model description

	More information needed

	## Intended uses & limitations

	More information needed

	## Training and evaluation data

	More information needed

	## Training procedure

	### Training hyperparameters

	The following hyperparameters were used during training:
	- learning_rate: 5e-05
	- train_batch_size: 8
	- eval_batch_size: 16
	- seed: 42
	- distributed_type: multi-GPU
	- num_devices: 4
	- gradient_accumulation_steps: 2
	- total_train_batch_size: 64
	- total_eval_batch_size: 64
	- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
	- lr_scheduler_type: cosine
	- lr_scheduler_warmup_ratio: 0.1
	- num_epochs: 3
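
The total batch sizes listed above follow from the per-device values. As a quick check, the sketch below recomputes them and estimates the number of optimizer steps per epoch from the first logged row of the training results table (step 100 at epoch 0.1047). The helper name `effective_batch_size` is illustrative, and the dataset-size figure is an estimate derived from the log, not a value stated in the card.

```python
def effective_batch_size(per_device_batch_size, num_devices, grad_accum_steps=1):
    """Examples consumed per optimizer step (training) or per evaluation step."""
    return per_device_batch_size * num_devices * grad_accum_steps

# Values taken from the hyperparameter list.
total_train = effective_batch_size(8, 4, grad_accum_steps=2)
total_eval = effective_batch_size(16, 4)

# Estimate from the results table: step 100 is logged at epoch 0.1047.
steps_per_epoch = round(100 / 0.1047)
approx_train_examples = steps_per_epoch * total_train

print(total_train, total_eval, steps_per_epoch, approx_train_examples)
```

This gives 64 for both totals, about 955 optimizer steps per epoch, and roughly 61,000 preference pairs seen per epoch.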

	### Training results

	| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
	|:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
	| 0.6007 | 0.1047 | 100 | 0.6140 | -1.2344 | -1.5781 | 0.6562 | 0.3418 | -444.0 | -438.0 | -8.5 | -8.8125 |
	| 0.591 | 0.2093 | 200 | 0.6025 | -1.9297 | -2.4688 | 0.6895 | 0.5312 | -532.0 | -508.0 | -6.9375 | -7.5312 |
	| 0.6351 | 0.3140 | 300 | 0.5962 | -2.2344 | -2.6875 | 0.6875 | 0.4512 | -556.0 | -540.0 | -4.9062 | -5.7812 |
	| 0.6031 | 0.4186 | 400 | 0.5900 | -1.7109 | -2.2812 | 0.6875 | 0.5625 | -512.0 | -486.0 | -6.25 | -7.2188 |
	| 0.5813 | 0.5233 | 500 | 0.5824 | -2.25 | -2.8125 | 0.7051 | 0.5547 | -568.0 | -540.0 | -3.6406 | -4.8125 |
	| 0.5376 | 0.6279 | 600 | 0.5624 | -2.625 | -3.3281 | 0.7012 | 0.7109 | -620.0 | -576.0 | 2.4219 | 0.9258 |
	| 0.5582 | 0.7326 | 700 | 0.5655 | -3.2812 | -4.0938 | 0.7051 | 0.8008 | -696.0 | -644.0 | -0.3281 | -1.7891 |
	| 0.5437 | 0.8373 | 800 | 0.5704 | -2.8281 | -3.4375 | 0.6992 | 0.6172 | -632.0 | -596.0 | -1.6719 | -3.1719 |
	| 0.567 | 0.9419 | 900 | 0.5633 | -3.1406 | -3.9062 | 0.7227 | 0.7539 | -676.0 | -628.0 | -1.0781 | -2.4219 |
	| 0.223 | 1.0466 | 1000 | 0.5835 | -4.1562 | -5.25 | 0.7461 | 1.0859 | -812.0 | -732.0 | 3.375 | 1.7734 |
	| 0.1774 | 1.1512 | 1100 | 0.6000 | -4.8438 | -5.9688 | 0.7227 | 1.1328 | -884.0 | -800.0 | 2.8906 | 0.9844 |
	| 0.1868 | 1.2559 | 1200 | 0.5954 | -4.9062 | -6.0625 | 0.7188 | 1.1484 | -892.0 | -804.0 | 3.5 | 1.9609 |
	| 0.1871 | 1.3605 | 1300 | 0.6086 | -5.3438 | -6.5 | 0.7324 | 1.1562 | -932.0 | -848.0 | 3.1719 | 1.3281 |
	| 0.1651 | 1.4652 | 1400 | 0.5995 | -5.375 | -6.4688 | 0.7090 | 1.0938 | -932.0 | -852.0 | 2.9375 | 1.0625 |
	| 0.1557 | 1.5699 | 1500 | 0.6073 | -5.3125 | -6.5938 | 0.7012 | 1.2656 | -944.0 | -848.0 | 1.9219 | -0.1582 |
	| 0.2145 | 1.6745 | 1600 | 0.6256 | -5.1875 | -6.4688 | 0.7031 | 1.2656 | -932.0 | -832.0 | 3.0469 | 0.9570 |
	| 0.1666 | 1.7792 | 1700 | 0.6223 | -5.5312 | -6.8438 | 0.7246 | 1.3047 | -972.0 | -868.0 | 3.8906 | 1.7969 |
	| 0.164 | 1.8838 | 1800 | 0.6084 | -4.6875 | -5.9375 | 0.7383 | 1.2266 | -880.0 | -784.0 | 2.6562 | 0.5117 |
	| 0.1552 | 1.9885 | 1900 | 0.6211 | -5.4375 | -6.7812 | 0.7363 | 1.3359 | -964.0 | -856.0 | 2.5469 | 0.4004 |
	| 0.0204 | 2.0931 | 2000 | 0.6830 | -6.4062 | -8.0 | 0.7383 | 1.6328 | -1088.0 | -952.0 | 4.1562 | 2.1719 |
	| 0.0205 | 2.1978 | 2100 | 0.8096 | -9.0 | -11.125 | 0.7168 | 2.1094 | -1400.0 | -1216.0 | 5.4375 | 3.5469 |
	| 0.0228 | 2.3025 | 2200 | 0.8077 | -8.625 | -10.8125 | 0.7305 | 2.1562 | -1368.0 | -1176.0 | 5.25 | 3.3281 |
	| 0.0148 | 2.4071 | 2300 | 0.7832 | -8.1875 | -10.1875 | 0.7227 | 2.0469 | -1304.0 | -1128.0 | 5.25 | 3.3906 |
	| 0.0202 | 2.5118 | 2400 | 0.7835 | -8.1875 | -10.25 | 0.7344 | 2.0781 | -1312.0 | -1136.0 | 5.3125 | 3.375 |
	| 0.01 | 2.6164 | 2500 | 0.7940 | -8.1875 | -10.3125 | 0.7363 | 2.1094 | -1320.0 | -1136.0 | 5.4688 | 3.5312 |
	| 0.0153 | 2.7211 | 2600 | 0.8036 | -8.5625 | -10.75 | 0.7324 | 2.1719 | -1360.0 | -1168.0 | 5.625 | 3.75 |
	| 0.0205 | 2.8257 | 2700 | 0.7961 | -8.375 | -10.5 | 0.7344 | 2.1562 | -1336.0 | -1152.0 | 5.5312 | 3.6406 |
	| 0.0184 | 2.9304 | 2800 | 0.7947 | -8.3125 | -10.5 | 0.7324 | 2.1562 | -1336.0 | -1144.0 | 5.5 | 3.5938 |
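
In the table, training loss drops sharply at each epoch boundary (0.567 to 0.223 near epoch 1, 0.1552 to 0.0204 near epoch 2) while validation loss rises from its minimum to roughly 0.79, a pattern consistent with overfitting to the training preferences. A small sketch for locating the best evaluation steps is shown below. The rows are copied from the table (step, training loss, validation loss, reward accuracy); the variable names are illustrative, and whether either step corresponds to a saved checkpoint is not stated in the card.

```python
# (step, training_loss, validation_loss, rewards_accuracy), copied from the table.
LOG = [
    (100, 0.6007, 0.6140, 0.6562), (200, 0.591, 0.6025, 0.6895),
    (300, 0.6351, 0.5962, 0.6875), (400, 0.6031, 0.5900, 0.6875),
    (500, 0.5813, 0.5824, 0.7051), (600, 0.5376, 0.5624, 0.7012),
    (700, 0.5582, 0.5655, 0.7051), (800, 0.5437, 0.5704, 0.6992),
    (900, 0.567, 0.5633, 0.7227), (1000, 0.223, 0.5835, 0.7461),
    (1100, 0.1774, 0.6000, 0.7227), (1200, 0.1868, 0.5954, 0.7188),
    (1300, 0.1871, 0.6086, 0.7324), (1400, 0.1651, 0.5995, 0.7090),
    (1500, 0.1557, 0.6073, 0.7012), (1600, 0.2145, 0.6256, 0.7031),
    (1700, 0.1666, 0.6223, 0.7246), (1800, 0.164, 0.6084, 0.7383),
    (1900, 0.1552, 0.6211, 0.7363), (2000, 0.0204, 0.6830, 0.7383),
    (2100, 0.0205, 0.8096, 0.7168), (2200, 0.0228, 0.8077, 0.7305),
    (2300, 0.0148, 0.7832, 0.7227), (2400, 0.0202, 0.7835, 0.7344),
    (2500, 0.01, 0.7940, 0.7363), (2600, 0.0153, 0.8036, 0.7324),
    (2700, 0.0205, 0.7961, 0.7344), (2800, 0.0184, 0.7947, 0.7324),
]

# Step with the lowest validation loss, and step with the highest reward accuracy.
best_by_loss = min(LOG, key=lambda row: row[2])
best_by_accuracy = max(LOG, key=lambda row: row[3])

# Gap between validation and training loss at the last logged step.
final_step, final_train, final_val, _ = LOG[-1]
final_gap = final_val - final_train

print("lowest validation loss:", best_by_loss[0], best_by_loss[2])
print("highest reward accuracy:", best_by_accuracy[0], best_by_accuracy[3])
print("final val-train gap:", round(final_gap, 4))
```

On these rows the lowest validation loss (0.5624) occurs at step 600 and the highest reward accuracy (0.7461) at step 1000, both well before the final step.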


	### Framework versions

	- Transformers 4.44.2
	- PyTorch 2.3.0
	- Datasets 2.21.0
	- Tokenizers 0.19.1