	---
	library_name: transformers
	license: other
	base_model: trl-lib/qwen1.5-0.5b-sft
	tags:
	- alignment-handbook
	- trl
	- simpo
	- generated_from_trainer
	datasets:
	- yakazimir/ultrafeedback_binarized
	model-index:
	- name: qwen_l21_entropy
	  results: []
	---

	<!-- This model card has been generated automatically according to the information the Trainer had access to. You
	should probably proofread and complete it, then remove this comment. -->

	# qwen_l21_entropy

	This model is a fine-tuned version of [trl-lib/qwen1.5-0.5b-sft](https://huggingface.co/trl-lib/qwen1.5-0.5b-sft) on the yakazimir/ultrafeedback_binarized dataset.
	It achieves the following results on the evaluation set:
	- Loss: 0.6612
	- Rewards/chosen: -4.9613
	- Rewards/rejected: -8.3580
	- Rewards/accuracies: 0.6766
	- Rewards/margins: 3.3967
	- Logps/rejected: -8.3580
	- Logps/chosen: -4.9613
	- Logits/rejected: 1.3373
	- Logits/chosen: 0.9296
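
The reported margin is the gap between the chosen and rejected rewards. A quick arithmetic check, using only the numbers listed above (in this card the `Rewards/*` and `Logps/*` values coincide):

```python
# Evaluation metrics copied from the list above.
rewards_chosen = -4.9613
rewards_rejected = -8.3580

# Margin = chosen reward minus rejected reward.
margin = rewards_chosen - rewards_rejected
print(round(margin, 4))
```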

	## Model description

	More information needed

	## Intended uses & limitations

	More information needed

	## Training and evaluation data

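	Training used the yakazimir/ultrafeedback_binarized dataset. More information needed on the evaluation set and preprocessing.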

	## Training procedure

	### Training hyperparameters

	The following hyperparameters were used during training:
	- learning_rate: 1e-06
	- train_batch_size: 2
	- eval_batch_size: 4
	- seed: 42
	- distributed_type: multi-GPU
	- gradient_accumulation_steps: 16
	- total_train_batch_size: 32
	- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
	- lr_scheduler_type: cosine
	- lr_scheduler_warmup_ratio: 0.1
	- num_epochs: 3.0
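
A minimal sketch of how these settings combine, not the Trainer's own code. The total batch size is the per-device batch size times the accumulation steps times the number of processes, and a cosine schedule with a warmup ratio of 0.1 ramps up linearly over the first 10% of steps before decaying. Two values are assumptions the card does not state: a process count of 1 (inferred from 2 × 16 = 32) and `total_steps = 5600` (the last step logged in the training results). `lr_at_step` is a hypothetical helper written for illustration.

```python
import math

PER_DEVICE_BATCH = 2      # train_batch_size
GRAD_ACCUM_STEPS = 16     # gradient_accumulation_steps
NUM_PROCESSES = 1         # assumption: inferred from total_train_batch_size = 32

# Number of examples contributing to each optimizer step.
effective_batch = PER_DEVICE_BATCH * GRAD_ACCUM_STEPS * NUM_PROCESSES


def lr_at_step(step, total_steps=5600, base_lr=1e-6, warmup_ratio=0.1):
    """Learning rate under linear warmup followed by cosine decay to zero.

    total_steps is an assumed value; the card does not report it.
    """
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Linear ramp from 0 up to base_lr.
        return base_lr * step / max(1, warmup_steps)
    # Fraction of the post-warmup phase completed, in [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


print(effective_batch)
for step in (0, 280, 560, 3080, 5600):
    print(step, lr_at_step(step))
```

Under these assumptions the learning rate peaks at 1e-06 at step 560 and has fallen to half of that by step 3080, the midpoint of the decay phase.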

	### Training results

	| Training Loss | Epoch  | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
	|:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
	| 0.6893        | 0.2141 | 400  | 0.6976          | -5.6399        | -5.6514          | 0.5134             | 0.0115          | -5.6514        | -5.6399      | 0.6073          | 0.4970        |
	| 0.6905        | 0.4282 | 800  | 0.6888          | -9.5942        | -10.2217         | 0.5772             | 0.6275          | -10.2217       | -9.5942      | 0.9367          | 0.7851        |
	| 0.6827        | 0.6422 | 1200 | 0.6809          | -3.7037        | -4.6831          | 0.6417             | 0.9794          | -4.6831        | -3.7037      | 0.4628          | 0.3100        |
	| 0.665         | 0.8563 | 1600 | 0.6737          | -4.1597        | -6.3017          | 0.6588             | 2.1420          | -6.3017        | -4.1597      | 0.9087          | 0.6452        |
	| 0.674         | 1.0704 | 2000 | 0.6702          | -4.7093        | -7.4594          | 0.6677             | 2.7501          | -7.4594        | -4.7093      | 1.0243          | 0.7072        |
	| 0.6648        | 1.2845 | 2400 | 0.6651          | -4.2327        | -7.0267          | 0.6654             | 2.7940          | -7.0267        | -4.2327      | 0.9760          | 0.6519        |
	| 0.6665        | 1.4986 | 2800 | 0.6654          | -4.6367        | -7.6607          | 0.6706             | 3.0240          | -7.6607        | -4.6367      | 1.0821          | 0.7239        |
	| 0.6746        | 1.7127 | 3200 | 0.6641          | -5.1015        | -8.2207          | 0.6803             | 3.1192          | -8.2207        | -5.1015      | 1.0711          | 0.6993        |
	| 0.6634        | 1.9267 | 3600 | 0.6629          | -4.7411        | -7.8576          | 0.6855             | 3.1165          | -7.8576        | -4.7411      | 1.0738          | 0.7086        |
	| 0.6224        | 2.1408 | 4000 | 0.6607          | -4.6523        | -7.8867          | 0.6818             | 3.2344          | -7.8867        | -4.6523      | 1.1108          | 0.7335        |
	| 0.6604        | 2.3549 | 4400 | 0.6618          | -4.7746        | -8.0447          | 0.6780             | 3.2700          | -8.0447        | -4.7746      | 1.2654          | 0.8695        |
	| 0.6512        | 2.5690 | 4800 | 0.6615          | -4.9147        | -8.2777          | 0.6773             | 3.3630          | -8.2777        | -4.9147      | 1.2819          | 0.8805        |
	| 0.6594        | 2.7831 | 5200 | 0.6611          | -4.9802        | -8.3859          | 0.6795             | 3.4057          | -8.3859        | -4.9802      | 1.2711          | 0.8676        |
	| 0.6402        | 2.9972 | 5600 | 0.6612          | -4.9613        | -8.3580          | 0.6766             | 3.3967          | -8.3580        | -4.9613      | 1.3373          | 0.9296        |
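
To compare checkpoints, the logged evaluation rows can be scanned for the best value of each metric. The sketch below copies four columns from the table above by hand. The variable names are illustrative only, and which metric should drive checkpoint selection is left open, since the card does not say.

```python
# (step, validation_loss, rewards_accuracy, rewards_margin), copied from the table above.
rows = [
    (400, 0.6976, 0.5134, 0.0115),
    (800, 0.6888, 0.5772, 0.6275),
    (1200, 0.6809, 0.6417, 0.9794),
    (1600, 0.6737, 0.6588, 2.1420),
    (2000, 0.6702, 0.6677, 2.7501),
    (2400, 0.6651, 0.6654, 2.7940),
    (2800, 0.6654, 0.6706, 3.0240),
    (3200, 0.6641, 0.6803, 3.1192),
    (3600, 0.6629, 0.6855, 3.1165),
    (4000, 0.6607, 0.6818, 3.2344),
    (4400, 0.6618, 0.6780, 3.2700),
    (4800, 0.6615, 0.6773, 3.3630),
    (5200, 0.6611, 0.6795, 3.4057),
    (5600, 0.6612, 0.6766, 3.3967),
]

best_loss = min(rows, key=lambda r: r[1])      # lowest validation loss
best_accuracy = max(rows, key=lambda r: r[2])  # highest reward accuracy
best_margin = max(rows, key=lambda r: r[3])    # largest reward margin

print("lowest validation loss at step", best_loss[0])
print("highest reward accuracy at step", best_accuracy[0])
print("largest reward margin at step", best_margin[0])
```

The three criteria pick different steps (4000, 3600, and 5200), while the headline evaluation numbers in this card correspond to the final logged step, 5600.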


	### Framework versions

	- Transformers 4.44.2
	- PyTorch 2.2.2+cu121
	- Datasets 2.18.0
	- Tokenizers 0.19.1