NanQiangHF / llama3.1_8b_dpo_bwgenerator

	---
	license: llama3.1
	library_name: peft
	tags:
	- trl
	- dpo
	- generated_from_trainer
	base_model: meta-llama/Meta-Llama-3.1-8B-Instruct
	model-index:
	- name: llama3.1_8b_dpo_bwgenerator
	results: []
	---

	<!-- This model card has been generated automatically according to the information the Trainer had access to. You
	should probably proofread and complete it, then remove this comment. -->

	# llama3.1_8b_dpo_bwgenerator

	This model is a fine-tuned version of [meta-llama/Meta-Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3.1-8B-Instruct), trained with DPO via TRL according to the card's tags, on a dataset that the card does not identify.
	It achieves the following results on the evaluation set:
	- Loss: 0.0325
	- Rewards/chosen: -8.0882
	- Rewards/rejected: -39.4615
	- Rewards/accuracies: 0.9958
	- Rewards/margins: 31.3733
	- Logps/rejected: -504.7621
	- Logps/chosen: -165.4306
	- Logits/rejected: -1.1893
	- Logits/chosen: -1.7730
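
The reward figures above are related by simple arithmetic: in the usual DPO logging convention, the margin is the chosen reward minus the rejected reward. The sketch below checks that relation against the reported means. It assumes the standard definition in which each reward is a positive coefficient times the log-probability difference between the policy and the reference model; the card states neither the coefficient nor the reference model, so the snippet does not try to recover them.

```python
# Evaluation-set means copied from the results list above.
rewards_chosen = -8.0882
rewards_rejected = -39.4615
reported_margin = 31.3733

# Margin = chosen reward - rejected reward (linear, so it also holds for means).
margin = rewards_chosen - rewards_rejected
print(f"computed margin: {margin:.4f}  reported: {reported_margin:.4f}")

# Both rewards are negative: under the assumed definition, the policy assigns
# lower log-probability than the reference to chosen AND rejected responses,
# but it penalizes the rejected ones far more.
both_negative = rewards_chosen < 0 and rewards_rejected < 0
print("both rewards negative:", both_negative)
```

A high accuracy with negative chosen rewards is a pattern worth noting when reading the numbers: the model separates the pairs well, yet the evaluation metrics alone do not show that chosen responses became more likely than under the reference.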

	## Model description

	More information needed

	## Intended uses & limitations

	More information needed

	## Training and evaluation data

	More information needed

	## Training procedure

	### Training hyperparameters

	The following hyperparameters were used during training:
	- learning_rate: 5e-05
	- train_batch_size: 4
	- eval_batch_size: 4
	- seed: 42
	- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
	- lr_scheduler_type: linear
	- num_epochs: 1
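
For readers who want to reproduce a comparable run, the listed values can be written as keyword arguments using the field names of `transformers.TrainingArguments`. This is only a sketch of the mapping: the original training script is not part of the card, and DPO-specific settings (for example the DPO coefficient or any LoRA configuration) are not listed, so they are left out rather than guessed.

```python
# Hypothetical mapping of the card's hyperparameters onto the keyword names
# used by transformers.TrainingArguments. The original script is not shown
# in the card, so this dictionary is an illustration, not the author's config.
training_kwargs = {
    "learning_rate": 5e-05,
    "per_device_train_batch_size": 4,  # card: train_batch_size
    "per_device_eval_batch_size": 4,   # card: eval_batch_size
    "seed": 42,
    "adam_beta1": 0.9,                 # card: betas=(0.9, 0.999)
    "adam_beta2": 0.999,
    "adam_epsilon": 1e-08,
    "lr_scheduler_type": "linear",
    "num_train_epochs": 1,
}

for name, value in training_kwargs.items():
    print(f"{name} = {value}")
```

The dictionary can be unpacked into an arguments object (`TrainingArguments(output_dir=..., **training_kwargs)`), with `output_dir` and every unlisted setting chosen by the reader.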

	### Training results

	| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
	|:-------------:|:------:|:-----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
	| 0.0854 | 0.0719 | 1000 | 0.1058 | -28.5182 | -64.6284 | 0.9929 | 36.1101 | -756.4312 | -369.7310 | -1.1763 | -1.7541 |
	| 0.078 | 0.1438 | 2000 | 0.0582 | -16.5113 | -45.2514 | 0.9938 | 28.7401 | -562.6614 | -249.6615 | -1.1262 | -1.7216 |
	| 0.0458 | 0.2157 | 3000 | 0.0506 | -12.8337 | -41.3538 | 0.9942 | 28.5201 | -523.6852 | -212.8855 | -1.3210 | -1.8884 |
	| 0.0295 | 0.2876 | 4000 | 0.0534 | -12.7034 | -45.1669 | 0.9942 | 32.4635 | -561.8164 | -211.5826 | -1.2303 | -1.8040 |
	| 0.0442 | 0.3595 | 5000 | 0.0428 | -10.9032 | -42.1320 | 0.9955 | 31.2288 | -531.4679 | -193.5811 | -1.2327 | -1.8028 |
	| 0.0329 | 0.4313 | 6000 | 0.0365 | -8.5207 | -36.8790 | 0.9951 | 28.3583 | -478.9377 | -169.7559 | -1.2024 | -1.7841 |
	| 0.0384 | 0.5032 | 7000 | 0.0418 | -12.1405 | -46.4364 | 0.9955 | 34.2959 | -574.5117 | -205.9535 | -1.1646 | -1.7549 |
	| 0.0596 | 0.5751 | 8000 | 0.0344 | -8.7801 | -39.5544 | 0.9951 | 30.7743 | -505.6917 | -172.3499 | -1.2145 | -1.7970 |
	| 0.0437 | 0.6470 | 9000 | 0.0347 | -9.4417 | -41.5833 | 0.9955 | 32.1416 | -525.9807 | -178.9660 | -1.1796 | -1.7709 |
	| 0.0203 | 0.7189 | 10000 | 0.0357 | -9.3723 | -41.8496 | 0.9951 | 32.4773 | -528.6439 | -178.2718 | -1.1694 | -1.7593 |
	| 0.0257 | 0.7908 | 11000 | 0.0347 | -8.6569 | -40.6073 | 0.9961 | 31.9505 | -516.2208 | -171.1173 | -1.1821 | -1.7676 |
	| 0.0355 | 0.8627 | 12000 | 0.0332 | -8.4060 | -40.1402 | 0.9964 | 31.7342 | -511.5494 | -168.6083 | -1.1878 | -1.7722 |
	| 0.0553 | 0.9346 | 13000 | 0.0325 | -8.0882 | -39.4615 | 0.9958 | 31.3733 | -504.7621 | -165.4306 | -1.1893 | -1.7730 |
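
The Epoch and Step columns give a rough idea of the training set size, which the card does not otherwise state. The sketch below divides step by epoch fraction to estimate the optimizer steps in one epoch, then multiplies by the listed batch size. It assumes a single device and no gradient accumulation, since neither is mentioned in the card; with either in use, the true number of preference pairs would be a multiple of this estimate.

```python
# (epoch fraction, step) pairs copied from the first and last table rows.
checkpoints = [(0.0719, 1000), (0.9346, 13000)]
train_batch_size = 4  # from the hyperparameter list


def steps_per_epoch(epoch_fraction: float, step: int) -> float:
    """Estimate total steps in one epoch from a single (epoch, step) reading."""
    return step / epoch_fraction


estimates = [steps_per_epoch(e, s) for e, s in checkpoints]
mean_steps = sum(estimates) / len(estimates)

# Assumption: one device, gradient_accumulation_steps == 1.
approx_examples = mean_steps * train_batch_size

print(f"estimated steps per epoch: {mean_steps:.0f}")
print(f"estimated training pairs:  {approx_examples:.0f}")
```

Both readings agree to within a couple of steps, which suggests roughly 13,900 steps per epoch and, under the stated assumptions, on the order of 55,000 to 56,000 training pairs.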


	### Framework versions

	- PEFT 0.10.0
	- Transformers 4.44.0
	- Pytorch 2.3.0+cu121
	- Datasets 2.14.7
	- Tokenizers 0.19.1
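
Because the metadata lists `library_name: peft` and a `base_model`, the repository most likely holds adapter weights that are applied on top of the base model. The following is a minimal loading sketch under that assumption. The adapter id is inferred from the repository path shown for this card, `device_map="auto"` additionally requires the `accelerate` package, and access to the gated base model is needed; none of this is confirmed by the card itself.

```python
BASE_MODEL_ID = "meta-llama/Meta-Llama-3.1-8B-Instruct"  # from the card metadata
ADAPTER_ID = "NanQiangHF/llama3.1_8b_dpo_bwgenerator"    # assumed repository id


def load_model():
    """Load the base model and attach the adapter (assumes a PEFT adapter repo)."""
    # Imports are kept local so the constants can be inspected without
    # torch, transformers, or peft installed.
    import torch
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL_ID)
    base_model = AutoModelForCausalLM.from_pretrained(
        BASE_MODEL_ID,
        torch_dtype=torch.bfloat16,
        device_map="auto",  # needs the accelerate package
    )
    # Wrap the base model with the fine-tuned adapter weights.
    model = PeftModel.from_pretrained(base_model, ADAPTER_ID)
    model.eval()
    return tokenizer, model
```

Calling `load_model()` downloads several gigabytes of weights, so it is best run once and the returned objects reused. The prompt format the adapter expects is not documented in the card.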