yakazimir committed
Commit ced244c (1 parent: 034cdb1)

Model save

README.md ADDED
@@ -0,0 +1,86 @@
+ ---
+ library_name: transformers
+ license: other
+ base_model: trl-lib/qwen1.5-0.5b-sft
+ tags:
+ - trl
+ - simpo
+ - generated_from_trainer
+ model-index:
+ - name: qwen_cpo_entropy_0_1
+   results: []
+ ---
+
+ <!-- This model card has been generated automatically according to the information the Trainer had access to. You
+ should probably proofread and complete it, then remove this comment. -->
+
+ # qwen_cpo_entropy_0_1
+
+ This model is a fine-tuned version of [trl-lib/qwen1.5-0.5b-sft](https://huggingface.co/trl-lib/qwen1.5-0.5b-sft) on an unknown dataset.
+ It achieves the following results on the evaluation set:
+ - Loss: 0.7448
+ - Sft Loss: 1.9881
+ - Rewards/chosen: -2.0736
+ - Rewards/rejected: -3.1334
+ - Rewards/accuracies: 0.7070
+ - Rewards/margins: 1.0598
+ - Logps/rejected: -3.1334
+ - Logps/chosen: -2.0736
+ - Logits/rejected: 0.5813
+ - Logits/chosen: 0.4340
+
+ ## Model description
+
+ A fine-tune of [trl-lib/qwen1.5-0.5b-sft](https://huggingface.co/trl-lib/qwen1.5-0.5b-sft) (Qwen1.5, 0.5B parameters) trained with a CPO/SimPO-style preference objective, as suggested by the model name and the `trl`/`simpo` tags. Further details have not been provided.
+
+ ## Intended uses & limitations
+
+ More information needed
+
+ ## Training and evaluation data
+
+ More information needed
+
+ ## Training procedure
+
+ ### Training hyperparameters
+
+ The following hyperparameters were used during training:
+ - learning_rate: 3e-06
+ - train_batch_size: 2
+ - eval_batch_size: 4
+ - seed: 42
+ - distributed_type: multi-GPU
+ - gradient_accumulation_steps: 16
+ - total_train_batch_size: 32
+ - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
+ - lr_scheduler_type: cosine
+ - lr_scheduler_warmup_ratio: 0.1
+ - num_epochs: 3.0
+
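The effective batch size follows from the per-device batch and gradient accumulation. Since the card reports a total of 32, a single GPU appears to have been used (an inference from the numbers, not stated explicitly):

```python
# Effective (total) train batch size = per-device batch size
#   x gradient_accumulation_steps x number of GPUs.
per_device_batch = 2    # train_batch_size above
grad_accum_steps = 16   # gradient_accumulation_steps above
num_gpus = 1            # assumption: implied by total_train_batch_size = 32
total_batch = per_device_batch * grad_accum_steps * num_gpus
print(total_batch)  # 32, matching total_train_batch_size
```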
+ ### Training results
+
+ | Training Loss | Epoch | Step | Validation Loss | Sft Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
+ |:-------------:|:------:|:----:|:---------------:|:--------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
+ | 0.7978 | 0.2141 | 400 | 0.7943 | 1.4613 | -1.4694 | -1.7232 | 0.6046 | 0.2538 | -1.7232 | -1.4694 | 0.3545 | 0.2666 |
+ | 0.7634 | 0.4282 | 800 | 0.7630 | 1.6008 | -1.6434 | -2.1177 | 0.6461 | 0.4743 | -2.1177 | -1.6434 | 0.2223 | 0.1296 |
+ | 0.7941 | 0.6422 | 1200 | 0.7446 | 1.6244 | -1.6029 | -2.1044 | 0.6751 | 0.5015 | -2.1044 | -1.6029 | 0.4172 | 0.3041 |
+ | 0.7152 | 0.8563 | 1600 | 0.7451 | 1.6680 | -1.6570 | -2.1579 | 0.6795 | 0.5009 | -2.1579 | -1.6570 | 0.6005 | 0.4663 |
+ | 0.7358 | 1.0704 | 2000 | 0.7325 | 1.6955 | -1.6992 | -2.3333 | 0.6825 | 0.6341 | -2.3333 | -1.6992 | 0.5037 | 0.3711 |
+ | 0.6698 | 1.2845 | 2400 | 0.7332 | 1.8308 | -1.8658 | -2.6575 | 0.7018 | 0.7917 | -2.6575 | -1.8658 | 0.5028 | 0.3708 |
+ | 0.6975 | 1.4986 | 2800 | 0.7278 | 1.7287 | -1.7409 | -2.4820 | 0.6892 | 0.7411 | -2.4820 | -1.7409 | 0.8810 | 0.7225 |
+ | 0.7081 | 1.7127 | 3200 | 0.7276 | 1.6777 | -1.6914 | -2.3420 | 0.6773 | 0.6506 | -2.3420 | -1.6914 | 0.5738 | 0.4414 |
+ | 0.6451 | 1.9267 | 3600 | 0.7215 | 1.7451 | -1.7517 | -2.4919 | 0.6914 | 0.7402 | -2.4919 | -1.7517 | 0.4252 | 0.3011 |
+ | 0.5342 | 2.1408 | 4000 | 0.7366 | 1.9275 | -1.9957 | -2.9730 | 0.6966 | 0.9773 | -2.9730 | -1.9957 | 0.6010 | 0.4547 |
+ | 0.5733 | 2.3549 | 4400 | 0.7454 | 1.9592 | -2.0642 | -3.1108 | 0.7055 | 1.0466 | -3.1108 | -2.0642 | 0.5969 | 0.4500 |
+ | 0.5581 | 2.5690 | 4800 | 0.7417 | 1.9637 | -2.0442 | -3.0719 | 0.7018 | 1.0277 | -3.0719 | -2.0442 | 0.6573 | 0.5046 |
+ | 0.5281 | 2.7831 | 5200 | 0.7447 | 1.9814 | -2.0666 | -3.1212 | 0.7033 | 1.0546 | -3.1212 | -2.0666 | 0.6008 | 0.4524 |
+ | 0.5395 | 2.9972 | 5600 | 0.7448 | 1.9881 | -2.0736 | -3.1334 | 0.7070 | 1.0598 | -3.1334 | -2.0736 | 0.5813 | 0.4340 |
+
+
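Note that Rewards/chosen equals Logps/chosen (and likewise for rejected) in every row, consistent with a SimPO-style implicit reward derived directly from sequence log-probabilities, and the reported margin is simply their difference. A quick check on the final evaluation row:

```python
# Final evaluation row: the logged rewards coincide with the
# sequence log-probs, and the margin is chosen minus rejected.
rewards_chosen = -2.0736
rewards_rejected = -3.1334
margin = rewards_chosen - rewards_rejected
print(round(margin, 4))  # 1.0598, the reported Rewards/margins
```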
+ ### Framework versions
+
+ - Transformers 4.44.2
+ - Pytorch 2.2.2+cu121
+ - Datasets 2.18.0
+ - Tokenizers 0.19.1
all_results.json ADDED
@@ -0,0 +1,9 @@
+ {
+     "epoch": 2.999297541394882,
+     "total_flos": 0.0,
+     "train_loss": 0.6688217556451066,
+     "train_runtime": 34631.8364,
+     "train_samples": 59790,
+     "train_samples_per_second": 5.179,
+     "train_steps_per_second": 0.162
+ }
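These throughput figures are mutually consistent: samples per second should be roughly total samples processed divided by runtime, and also roughly steps per second times the effective batch size of 32 from the README. A sketch of the cross-check:

```python
# Cross-check the reported throughput numbers (values from all_results.json;
# the effective batch size of 32 comes from the README hyperparameters).
epoch = 2.999297541394882
runtime_s = 34631.8364
train_samples = 59790

samples_per_s = train_samples * epoch / runtime_s
print(round(samples_per_s, 2))    # close to the reported 5.179

steps_per_s = 0.162
print(round(steps_per_s * 32, 2)) # likewise close to 5.179
```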
generation_config.json ADDED
@@ -0,0 +1,6 @@
+ {
+     "bos_token_id": 151643,
+     "eos_token_id": 151643,
+     "max_new_tokens": 2048,
+     "transformers_version": "4.44.2"
+ }
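This config uses a single token id (151643, Qwen's `<|endoftext|>`) for both BOS and EOS, and caps generation at 2048 new tokens. A minimal sketch of what a downstream loader would read from it:

```python
import json

# Parse the generation config shown above and confirm the shared
# BOS/EOS token id and the generation length cap.
cfg = json.loads("""
{
    "bos_token_id": 151643,
    "eos_token_id": 151643,
    "max_new_tokens": 2048,
    "transformers_version": "4.44.2"
}
""")
assert cfg["bos_token_id"] == cfg["eos_token_id"] == 151643
print(cfg["max_new_tokens"])  # 2048
```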
train_results.json ADDED
@@ -0,0 +1,9 @@
+ {
+     "epoch": 2.999297541394882,
+     "total_flos": 0.0,
+     "train_loss": 0.6688217556451066,
+     "train_runtime": 34631.8364,
+     "train_samples": 59790,
+     "train_samples_per_second": 5.179,
+     "train_steps_per_second": 0.162
+ }
trainer_state.json ADDED
The diff for this file is too large to render.