lole25 committed
Commit 8c151a7
1 Parent(s): d3f8ff6

Model save
README.md ADDED
@@ -0,0 +1,59 @@
+ ---
+ library_name: peft
+ tags:
+ - trl
+ - dpo
+ - generated_from_trainer
+ base_model: DUAL-GPO/phi-2-irepo-chatml-merged-i0
+ model-index:
+ - name: phi-2-irepo-chatml-v6-i1
+   results: []
+ ---
+
+ <!-- This model card has been generated automatically according to the information the Trainer had access to. You
+ should probably proofread and complete it, then remove this comment. -->
+
+ # phi-2-irepo-chatml-v6-i1
+
+ This model is a fine-tuned version of [DUAL-GPO/phi-2-irepo-chatml-merged-i0](https://huggingface.co/DUAL-GPO/phi-2-irepo-chatml-merged-i0) on the None dataset.
+
+ ## Model description
+
+ More information needed
+
+ ## Intended uses & limitations
+
+ More information needed
+
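Since the "Intended uses & limitations" section is still a stub, here is a minimal, hypothetical sketch of how a PEFT adapter like this one is typically loaded on top of the base model named in the card. The adapter repo id is a placeholder, and the prompt is illustrative only (the ChatML template actually used in training is not documented here):

```python
# Hypothetical usage sketch -- not part of the committed model card.
# Assumes the standard transformers + peft pattern for loading a saved adapter.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "DUAL-GPO/phi-2-irepo-chatml-merged-i0"  # base model from the card
adapter_id = "<this-adapter-repo-id>"              # placeholder for this repository

tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(base_id)
model = PeftModel.from_pretrained(model, adapter_id)  # attach the DPO-trained adapter

inputs = tokenizer("Explain preference optimization in one sentence.", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```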
+ ## Training and evaluation data
+
+ More information needed
+
+ ## Training procedure
+
+ ### Training hyperparameters
+
+ The following hyperparameters were used during training:
+ - learning_rate: 5e-06
+ - train_batch_size: 4
+ - eval_batch_size: 4
+ - seed: 42
+ - distributed_type: multi-GPU
+ - gradient_accumulation_steps: 4
+ - total_train_batch_size: 16
+ - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
+ - lr_scheduler_type: cosine
+ - lr_scheduler_warmup_ratio: 0.1
+ - num_epochs: 1
+
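The hyperparameters listed above map almost one-to-one onto a `transformers.TrainingArguments` configuration. The sketch below only illustrates that mapping; it is not the original training script (the actual run was driven by TRL's DPO training loop on top of the PEFT adapter, and `output_dir` is a placeholder):

```python
# Illustrative mapping of the card's hyperparameters onto TrainingArguments.
# Sketch only -- the real run used TRL's DPO trainer; output_dir is a placeholder.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="phi-2-irepo-chatml-v6-i1",  # placeholder
    learning_rate=5e-6,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    gradient_accumulation_steps=4,          # 4 per device x 4 accumulation -> total batch 16 per the card
    seed=42,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=1,
    adam_beta1=0.9,                         # Adam with betas=(0.9, 0.999) and epsilon=1e-08
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)
```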
+ ### Training results
+
+
+
+ ### Framework versions
+
+ - PEFT 0.7.1
+ - Transformers 4.36.2
+ - Pytorch 2.1.2+cu121
+ - Datasets 2.14.6
+ - Tokenizers 0.15.2
adapter_model.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:f9d8f18247139151a96386c3c93a9502c5d8df5199c4f8437c7211aaadd69dc6
+ oid sha256:e23a501f5b5dd40d12fb0b8eb4b1dbfa05f23050b8cbc8a1b071ec239df0ac9f
  size 335579632
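The lines above are a Git LFS pointer rather than the weights themselves: the `oid` is the SHA-256 of the actual `adapter_model.safetensors` payload and `size` is its byte count. A small sketch for checking a downloaded copy against the pointer (the local path is an assumption):

```python
# Sketch: verify a downloaded adapter_model.safetensors against its LFS pointer.
# In a Git LFS pointer, the oid is the SHA-256 hash of the file contents.
import hashlib
import os

def sha256_of(path: str) -> str:
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):  # read in 1 MiB chunks
            h.update(chunk)
    return h.hexdigest()

path = "adapter_model.safetensors"  # assumed local download
print(os.path.getsize(path))        # expected: 335579632
print(sha256_of(path))              # expected: e23a501f5b5dd40d12fb0b8eb4b1dbfa05f23050b8cbc8a1b071ec239df0ac9f
```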
all_results.json ADDED
@@ -0,0 +1,8 @@
+ {
+     "epoch": 1.0,
+     "train_loss": 0.20755663267677513,
+     "train_runtime": 11765.5627,
+     "train_samples": 21000,
+     "train_samples_per_second": 1.785,
+     "train_steps_per_second": 0.112
+ }
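As a quick sanity check, the reported throughput figures are consistent with each other and with the run length recorded in trainer_state.json (1312 optimization steps). The snippet below just redoes that arithmetic; all inputs are copied from files in this commit:

```python
# Consistency check of the throughput numbers reported in all_results.json.
# All inputs are taken from this commit; nothing is recomputed from raw logs.
train_samples = 21000
train_runtime = 11765.5627   # seconds
global_steps = 1312          # "global_step" in trainer_state.json

print(round(train_samples / train_runtime, 3))  # 1.785 -> train_samples_per_second
print(round(global_steps / train_runtime, 3))   # 0.112 -> train_steps_per_second
print(round(train_samples / global_steps))      # 16    -> total_train_batch_size in the card
```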
runs/May20_19-09-07_gpu4-119-5/events.out.tfevents.1716196294.gpu4-119-5.941181.0 CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:69ae1c726dd0e3c2a40c9e22af57c9e539c668ec3407e7b5915d9e1e9a1f28e7
- size 87820
+ oid sha256:2a7e49606ed870668ddc7cbdf3b237d85ce507cd0898d829c0d9f3b81143dd37
+ size 88808
train_results.json ADDED
@@ -0,0 +1,8 @@
+ {
+     "epoch": 1.0,
+     "train_loss": 0.20755663267677513,
+     "train_runtime": 11765.5627,
+     "train_samples": 21000,
+     "train_samples_per_second": 1.785,
+     "train_steps_per_second": 0.112
+ }
trainer_state.json ADDED
@@ -0,0 +1,1878 @@
1
+ {
2
+ "best_metric": null,
3
+ "best_model_checkpoint": null,
4
+ "epoch": 0.9996190476190476,
5
+ "eval_steps": 500,
6
+ "global_step": 1312,
7
+ "is_hyper_param_search": false,
8
+ "is_local_process_zero": true,
9
+ "is_world_process_zero": true,
10
+ "log_history": [
11
+ {
12
+ "epoch": 0.0,
13
+ "learning_rate": 3.787878787878788e-08,
14
+ "logits/chosen": 0.3414086103439331,
15
+ "logits/rejected": 0.3618736267089844,
16
+ "logps/chosen": -457.8209228515625,
17
+ "logps/rejected": -477.39923095703125,
18
+ "loss": 0.2223,
19
+ "rewards/accuracies": 0.0,
20
+ "rewards/chosen": 0.0,
21
+ "rewards/margins": 0.0,
22
+ "rewards/rejected": 0.0,
23
+ "step": 1
24
+ },
25
+ {
26
+ "epoch": 0.01,
27
+ "learning_rate": 3.787878787878788e-07,
28
+ "logits/chosen": 0.3330618441104889,
29
+ "logits/rejected": 0.3205517530441284,
30
+ "logps/chosen": -306.0744323730469,
31
+ "logps/rejected": -326.5745849609375,
32
+ "loss": 0.2099,
33
+ "rewards/accuracies": 0.2777777910232544,
34
+ "rewards/chosen": -0.00023350439732894301,
35
+ "rewards/margins": 6.652800948359072e-05,
36
+ "rewards/rejected": -0.0003000324359163642,
37
+ "step": 10
38
+ },
39
+ {
40
+ "epoch": 0.02,
41
+ "learning_rate": 7.575757575757576e-07,
42
+ "logits/chosen": 0.3300573229789734,
43
+ "logits/rejected": 0.3766365945339203,
44
+ "logps/chosen": -294.01226806640625,
45
+ "logps/rejected": -328.8618469238281,
46
+ "loss": 0.2087,
47
+ "rewards/accuracies": 0.3062500059604645,
48
+ "rewards/chosen": 9.059391595656052e-05,
49
+ "rewards/margins": 1.657838947721757e-05,
50
+ "rewards/rejected": 7.401553739327937e-05,
51
+ "step": 20
52
+ },
53
+ {
54
+ "epoch": 0.02,
55
+ "learning_rate": 1.1363636363636364e-06,
56
+ "logits/chosen": 0.33157598972320557,
57
+ "logits/rejected": 0.3647070527076721,
58
+ "logps/chosen": -308.50250244140625,
59
+ "logps/rejected": -316.5055236816406,
60
+ "loss": 0.21,
61
+ "rewards/accuracies": 0.26249998807907104,
62
+ "rewards/chosen": -9.306786523666233e-05,
63
+ "rewards/margins": -5.975523890811019e-06,
64
+ "rewards/rejected": -8.709232497494668e-05,
65
+ "step": 30
66
+ },
67
+ {
68
+ "epoch": 0.03,
69
+ "learning_rate": 1.5151515151515152e-06,
70
+ "logits/chosen": 0.30125534534454346,
71
+ "logits/rejected": 0.3123665452003479,
72
+ "logps/chosen": -278.3026123046875,
73
+ "logps/rejected": -290.43365478515625,
74
+ "loss": 0.221,
75
+ "rewards/accuracies": 0.20624999701976776,
76
+ "rewards/chosen": -1.4783197912038304e-05,
77
+ "rewards/margins": 5.556107498705387e-05,
78
+ "rewards/rejected": -7.034426380414516e-05,
79
+ "step": 40
80
+ },
81
+ {
82
+ "epoch": 0.04,
83
+ "learning_rate": 1.8939393939393941e-06,
84
+ "logits/chosen": 0.33402004837989807,
85
+ "logits/rejected": 0.3797036111354828,
86
+ "logps/chosen": -250.8700408935547,
87
+ "logps/rejected": -274.0110778808594,
88
+ "loss": 0.2,
89
+ "rewards/accuracies": 0.26249998807907104,
90
+ "rewards/chosen": 0.0001303141616517678,
91
+ "rewards/margins": 2.952014074253384e-05,
92
+ "rewards/rejected": 0.00010079403000418097,
93
+ "step": 50
94
+ },
95
+ {
96
+ "epoch": 0.05,
97
+ "learning_rate": 2.2727272727272728e-06,
98
+ "logits/chosen": 0.3142412304878235,
99
+ "logits/rejected": 0.3297632336616516,
100
+ "logps/chosen": -261.3359069824219,
101
+ "logps/rejected": -272.4247131347656,
102
+ "loss": 0.216,
103
+ "rewards/accuracies": 0.3125,
104
+ "rewards/chosen": 7.56668159738183e-05,
105
+ "rewards/margins": 0.0001469437702326104,
106
+ "rewards/rejected": -7.12769542587921e-05,
107
+ "step": 60
108
+ },
109
+ {
110
+ "epoch": 0.05,
111
+ "learning_rate": 2.6515151515151514e-06,
112
+ "logits/chosen": 0.31074008345603943,
113
+ "logits/rejected": 0.3008107542991638,
114
+ "logps/chosen": -305.2731628417969,
115
+ "logps/rejected": -301.9687805175781,
116
+ "loss": 0.21,
117
+ "rewards/accuracies": 0.25,
118
+ "rewards/chosen": -0.0002583070017863065,
119
+ "rewards/margins": -3.105907671852037e-05,
120
+ "rewards/rejected": -0.0002272479177918285,
121
+ "step": 70
122
+ },
123
+ {
124
+ "epoch": 0.06,
125
+ "learning_rate": 3.0303030303030305e-06,
126
+ "logits/chosen": 0.310350626707077,
127
+ "logits/rejected": 0.3315640389919281,
128
+ "logps/chosen": -232.4719696044922,
129
+ "logps/rejected": -277.22833251953125,
130
+ "loss": 0.1962,
131
+ "rewards/accuracies": 0.26875001192092896,
132
+ "rewards/chosen": 9.660933574195951e-05,
133
+ "rewards/margins": 0.00016913158469833434,
134
+ "rewards/rejected": -7.252227078424767e-05,
135
+ "step": 80
136
+ },
137
+ {
138
+ "epoch": 0.07,
139
+ "learning_rate": 3.409090909090909e-06,
140
+ "logits/chosen": 0.28212469816207886,
141
+ "logits/rejected": 0.35147595405578613,
142
+ "logps/chosen": -291.93121337890625,
143
+ "logps/rejected": -309.50341796875,
144
+ "loss": 0.2222,
145
+ "rewards/accuracies": 0.3062500059604645,
146
+ "rewards/chosen": -0.0006029252544976771,
147
+ "rewards/margins": 0.00015305924171116203,
148
+ "rewards/rejected": -0.0007559844525530934,
149
+ "step": 90
150
+ },
151
+ {
152
+ "epoch": 0.08,
153
+ "learning_rate": 3.7878787878787882e-06,
154
+ "logits/chosen": 0.2799969017505646,
155
+ "logits/rejected": 0.33390945196151733,
156
+ "logps/chosen": -313.59515380859375,
157
+ "logps/rejected": -333.5437316894531,
158
+ "loss": 0.2158,
159
+ "rewards/accuracies": 0.3187499940395355,
160
+ "rewards/chosen": -0.0013320285361260176,
161
+ "rewards/margins": 0.00041345407953485847,
162
+ "rewards/rejected": -0.001745482673868537,
163
+ "step": 100
164
+ },
165
+ {
166
+ "epoch": 0.08,
167
+ "learning_rate": 4.166666666666667e-06,
168
+ "logits/chosen": 0.3202076852321625,
169
+ "logits/rejected": 0.31327325105667114,
170
+ "logps/chosen": -339.7900085449219,
171
+ "logps/rejected": -391.5009765625,
172
+ "loss": 0.2034,
173
+ "rewards/accuracies": 0.39375001192092896,
174
+ "rewards/chosen": -0.0016250547487288713,
175
+ "rewards/margins": 0.000587599235586822,
176
+ "rewards/rejected": -0.002212654100731015,
177
+ "step": 110
178
+ },
179
+ {
180
+ "epoch": 0.09,
181
+ "learning_rate": 4.5454545454545455e-06,
182
+ "logits/chosen": 0.2636483311653137,
183
+ "logits/rejected": 0.3315524756908417,
184
+ "logps/chosen": -257.42376708984375,
185
+ "logps/rejected": -279.6871643066406,
186
+ "loss": 0.2195,
187
+ "rewards/accuracies": 0.3187499940395355,
188
+ "rewards/chosen": -0.0009949387749657035,
189
+ "rewards/margins": 0.0003876467817462981,
190
+ "rewards/rejected": -0.0013825856149196625,
191
+ "step": 120
192
+ },
193
+ {
194
+ "epoch": 0.1,
195
+ "learning_rate": 4.924242424242425e-06,
196
+ "logits/chosen": 0.25745201110839844,
197
+ "logits/rejected": 0.3214574456214905,
198
+ "logps/chosen": -302.13787841796875,
199
+ "logps/rejected": -334.5384826660156,
200
+ "loss": 0.1975,
201
+ "rewards/accuracies": 0.36250001192092896,
202
+ "rewards/chosen": -0.0025944842491298914,
203
+ "rewards/margins": 0.0014145069289952517,
204
+ "rewards/rejected": -0.004008991178125143,
205
+ "step": 130
206
+ },
207
+ {
208
+ "epoch": 0.11,
209
+ "learning_rate": 4.999432965739786e-06,
210
+ "logits/chosen": 0.26108819246292114,
211
+ "logits/rejected": 0.21493813395500183,
212
+ "logps/chosen": -321.00067138671875,
213
+ "logps/rejected": -349.4842834472656,
214
+ "loss": 0.2053,
215
+ "rewards/accuracies": 0.3687500059604645,
216
+ "rewards/chosen": -0.003485953900963068,
217
+ "rewards/margins": 0.0010465523228049278,
218
+ "rewards/rejected": -0.0045325057581067085,
219
+ "step": 140
220
+ },
221
+ {
222
+ "epoch": 0.11,
223
+ "learning_rate": 4.997129829895409e-06,
224
+ "logits/chosen": 0.16764743626117706,
225
+ "logits/rejected": 0.20418021082878113,
226
+ "logps/chosen": -216.89199829101562,
227
+ "logps/rejected": -234.42935180664062,
228
+ "loss": 0.208,
229
+ "rewards/accuracies": 0.2562499940395355,
230
+ "rewards/chosen": -0.0010050686541944742,
231
+ "rewards/margins": 0.0011736680753529072,
232
+ "rewards/rejected": -0.0021787367295473814,
233
+ "step": 150
234
+ },
235
+ {
236
+ "epoch": 0.12,
237
+ "learning_rate": 4.9930567839810125e-06,
238
+ "logits/chosen": 0.13134139776229858,
239
+ "logits/rejected": 0.14908884465694427,
240
+ "logps/chosen": -268.61419677734375,
241
+ "logps/rejected": -301.322265625,
242
+ "loss": 0.2203,
243
+ "rewards/accuracies": 0.3375000059604645,
244
+ "rewards/chosen": -0.0038432697765529156,
245
+ "rewards/margins": 0.005182007793337107,
246
+ "rewards/rejected": -0.009025278501212597,
247
+ "step": 160
248
+ },
249
+ {
250
+ "epoch": 0.13,
251
+ "learning_rate": 4.987216714880929e-06,
252
+ "logits/chosen": 0.1437113881111145,
253
+ "logits/rejected": 0.13215014338493347,
254
+ "logps/chosen": -259.3898010253906,
255
+ "logps/rejected": -267.3370056152344,
256
+ "loss": 0.2012,
257
+ "rewards/accuracies": 0.2750000059604645,
258
+ "rewards/chosen": -0.01510892529040575,
259
+ "rewards/margins": 0.0029346179217100143,
260
+ "rewards/rejected": -0.01804354228079319,
261
+ "step": 170
262
+ },
263
+ {
264
+ "epoch": 0.14,
265
+ "learning_rate": 4.979613761906212e-06,
266
+ "logits/chosen": 0.07960179448127747,
267
+ "logits/rejected": 0.08153261244297028,
268
+ "logps/chosen": -357.0888366699219,
269
+ "logps/rejected": -368.5904846191406,
270
+ "loss": 0.2207,
271
+ "rewards/accuracies": 0.29374998807907104,
272
+ "rewards/chosen": -0.021911371499300003,
273
+ "rewards/margins": 0.0033388130832463503,
274
+ "rewards/rejected": -0.025250181555747986,
275
+ "step": 180
276
+ },
277
+ {
278
+ "epoch": 0.14,
279
+ "learning_rate": 4.970253313860788e-06,
280
+ "logits/chosen": 0.12686966359615326,
281
+ "logits/rejected": 0.092551589012146,
282
+ "logps/chosen": -323.23162841796875,
283
+ "logps/rejected": -354.8710632324219,
284
+ "loss": 0.2065,
285
+ "rewards/accuracies": 0.32499998807907104,
286
+ "rewards/chosen": -0.027633970603346825,
287
+ "rewards/margins": 0.007219144143164158,
288
+ "rewards/rejected": -0.03485311195254326,
289
+ "step": 190
290
+ },
291
+ {
292
+ "epoch": 0.15,
293
+ "learning_rate": 4.959142005221991e-06,
294
+ "logits/chosen": 0.030479246750473976,
295
+ "logits/rejected": 0.03255733102560043,
296
+ "logps/chosen": -324.7137145996094,
297
+ "logps/rejected": -360.15802001953125,
298
+ "loss": 0.2118,
299
+ "rewards/accuracies": 0.29374998807907104,
300
+ "rewards/chosen": -0.036979734897613525,
301
+ "rewards/margins": 0.0006652610609307885,
302
+ "rewards/rejected": -0.037644993513822556,
303
+ "step": 200
304
+ },
305
+ {
306
+ "epoch": 0.16,
307
+ "learning_rate": 4.94628771143819e-06,
308
+ "logits/chosen": 0.011406493373215199,
309
+ "logits/rejected": 0.0717320591211319,
310
+ "logps/chosen": -335.43402099609375,
311
+ "logps/rejected": -334.8843078613281,
312
+ "loss": 0.2127,
313
+ "rewards/accuracies": 0.32499998807907104,
314
+ "rewards/chosen": -0.02695571817457676,
315
+ "rewards/margins": 0.0005523961153812706,
316
+ "rewards/rejected": -0.027508113533258438,
317
+ "step": 210
318
+ },
319
+ {
320
+ "epoch": 0.17,
321
+ "learning_rate": 4.931699543346854e-06,
322
+ "logits/chosen": 0.11805645376443863,
323
+ "logits/rejected": 0.13258619606494904,
324
+ "logps/chosen": -321.3063049316406,
325
+ "logps/rejected": -337.80084228515625,
326
+ "loss": 0.221,
327
+ "rewards/accuracies": 0.2874999940395355,
328
+ "rewards/chosen": -0.009172406047582626,
329
+ "rewards/margins": 0.0013926494866609573,
330
+ "rewards/rejected": -0.010565054602921009,
331
+ "step": 220
332
+ },
333
+ {
334
+ "epoch": 0.18,
335
+ "learning_rate": 4.9153878407169815e-06,
336
+ "logits/chosen": 0.04168776422739029,
337
+ "logits/rejected": 0.052837539464235306,
338
+ "logps/chosen": -321.91546630859375,
339
+ "logps/rejected": -337.6319274902344,
340
+ "loss": 0.2142,
341
+ "rewards/accuracies": 0.36250001192092896,
342
+ "rewards/chosen": 0.00017253603436984122,
343
+ "rewards/margins": 0.0029792343266308308,
344
+ "rewards/rejected": -0.002806698437780142,
345
+ "step": 230
346
+ },
347
+ {
348
+ "epoch": 0.18,
349
+ "learning_rate": 4.897364164920515e-06,
350
+ "logits/chosen": 0.0712689757347107,
351
+ "logits/rejected": 0.07041509449481964,
352
+ "logps/chosen": -275.91632080078125,
353
+ "logps/rejected": -272.06927490234375,
354
+ "loss": 0.2087,
355
+ "rewards/accuracies": 0.2562499940395355,
356
+ "rewards/chosen": 0.0034352331422269344,
357
+ "rewards/margins": 0.0012143031926825643,
358
+ "rewards/rejected": 0.002220930065959692,
359
+ "step": 240
360
+ },
361
+ {
362
+ "epoch": 0.19,
363
+ "learning_rate": 4.8776412907378845e-06,
364
+ "logits/chosen": 0.0784018337726593,
365
+ "logits/rejected": 0.07138744741678238,
366
+ "logps/chosen": -270.2529296875,
367
+ "logps/rejected": -276.2494812011719,
368
+ "loss": 0.2158,
369
+ "rewards/accuracies": 0.29374998807907104,
370
+ "rewards/chosen": 0.004147468134760857,
371
+ "rewards/margins": 0.0022368025965988636,
372
+ "rewards/rejected": 0.0019106656545773149,
373
+ "step": 250
374
+ },
375
+ {
376
+ "epoch": 0.2,
377
+ "learning_rate": 4.8562331973035396e-06,
378
+ "logits/chosen": 0.04922521486878395,
379
+ "logits/rejected": 0.03360098600387573,
380
+ "logps/chosen": -256.4879455566406,
381
+ "logps/rejected": -295.20220947265625,
382
+ "loss": 0.2052,
383
+ "rewards/accuracies": 0.3062500059604645,
384
+ "rewards/chosen": 0.0018207021057605743,
385
+ "rewards/margins": -4.667430039262399e-05,
386
+ "rewards/rejected": 0.0018673762679100037,
387
+ "step": 260
388
+ },
389
+ {
390
+ "epoch": 0.21,
391
+ "learning_rate": 4.833155058197842e-06,
392
+ "logits/chosen": 0.003297035349532962,
393
+ "logits/rejected": 0.05275546759366989,
394
+ "logps/chosen": -241.99462890625,
395
+ "logps/rejected": -263.92120361328125,
396
+ "loss": 0.2039,
397
+ "rewards/accuracies": 0.29374998807907104,
398
+ "rewards/chosen": 0.005943115334957838,
399
+ "rewards/margins": 0.004178833216428757,
400
+ "rewards/rejected": 0.0017642822349444032,
401
+ "step": 270
402
+ },
403
+ {
404
+ "epoch": 0.21,
405
+ "learning_rate": 4.808423230692374e-06,
406
+ "logits/chosen": 0.05449346452951431,
407
+ "logits/rejected": 0.06165250390768051,
408
+ "logps/chosen": -253.69369506835938,
409
+ "logps/rejected": -293.31585693359375,
410
+ "loss": 0.2029,
411
+ "rewards/accuracies": 0.3499999940395355,
412
+ "rewards/chosen": 0.012908290140330791,
413
+ "rewards/margins": 0.007234405726194382,
414
+ "rewards/rejected": 0.005673886742442846,
415
+ "step": 280
416
+ },
417
+ {
418
+ "epoch": 0.22,
419
+ "learning_rate": 4.7820552441562625e-06,
420
+ "logits/chosen": -0.005410021636635065,
421
+ "logits/rejected": 0.010305285453796387,
422
+ "logps/chosen": -270.04083251953125,
423
+ "logps/rejected": -316.2156066894531,
424
+ "loss": 0.1928,
425
+ "rewards/accuracies": 0.35624998807907104,
426
+ "rewards/chosen": 0.010301874950528145,
427
+ "rewards/margins": 0.007835518568754196,
428
+ "rewards/rejected": 0.002466357545927167,
429
+ "step": 290
430
+ },
431
+ {
432
+ "epoch": 0.23,
433
+ "learning_rate": 4.754069787631761e-06,
434
+ "logits/chosen": -0.05866356939077377,
435
+ "logits/rejected": -0.09499181061983109,
436
+ "logps/chosen": -276.0648193359375,
437
+ "logps/rejected": -317.32037353515625,
438
+ "loss": 0.203,
439
+ "rewards/accuracies": 0.32499998807907104,
440
+ "rewards/chosen": 0.008139651268720627,
441
+ "rewards/margins": 0.008907916024327278,
442
+ "rewards/rejected": -0.0007682637078687549,
443
+ "step": 300
444
+ },
445
+ {
446
+ "epoch": 0.24,
447
+ "learning_rate": 4.724486696587862e-06,
448
+ "logits/chosen": -0.08903555572032928,
449
+ "logits/rejected": -0.07178256660699844,
450
+ "logps/chosen": -291.4513244628906,
451
+ "logps/rejected": -304.63726806640625,
452
+ "loss": 0.2106,
453
+ "rewards/accuracies": 0.26875001192092896,
454
+ "rewards/chosen": 0.001623433781787753,
455
+ "rewards/margins": 0.0021732584573328495,
456
+ "rewards/rejected": -0.0005498243262991309,
457
+ "step": 310
458
+ },
459
+ {
460
+ "epoch": 0.24,
461
+ "learning_rate": 4.693326938861367e-06,
462
+ "logits/chosen": -0.12407398223876953,
463
+ "logits/rejected": -0.06067138910293579,
464
+ "logps/chosen": -249.7843475341797,
465
+ "logps/rejected": -290.00408935546875,
466
+ "loss": 0.2082,
467
+ "rewards/accuracies": 0.34375,
468
+ "rewards/chosen": 0.010884495452046394,
469
+ "rewards/margins": 0.008355050347745419,
470
+ "rewards/rejected": 0.002529443707317114,
471
+ "step": 320
472
+ },
473
+ {
474
+ "epoch": 0.25,
475
+ "learning_rate": 4.660612599795343e-06,
476
+ "logits/chosen": -0.11239596456289291,
477
+ "logits/rejected": -0.12089481204748154,
478
+ "logps/chosen": -290.61737060546875,
479
+ "logps/rejected": -312.4447937011719,
480
+ "loss": 0.2046,
481
+ "rewards/accuracies": 0.30000001192092896,
482
+ "rewards/chosen": 0.007386790122836828,
483
+ "rewards/margins": 0.009295441210269928,
484
+ "rewards/rejected": -0.0019086506217718124,
485
+ "step": 330
486
+ },
487
+ {
488
+ "epoch": 0.26,
489
+ "learning_rate": 4.626366866585528e-06,
490
+ "logits/chosen": -0.12021325528621674,
491
+ "logits/rejected": -0.11920841783285141,
492
+ "logps/chosen": -334.55303955078125,
493
+ "logps/rejected": -341.6528625488281,
494
+ "loss": 0.2095,
495
+ "rewards/accuracies": 0.32499998807907104,
496
+ "rewards/chosen": 0.005449886433780193,
497
+ "rewards/margins": -0.0001580852986080572,
498
+ "rewards/rejected": 0.0056079719215631485,
499
+ "step": 340
500
+ },
501
+ {
502
+ "epoch": 0.27,
503
+ "learning_rate": 4.590614011845758e-06,
504
+ "logits/chosen": -0.0888114720582962,
505
+ "logits/rejected": -0.11601515114307404,
506
+ "logps/chosen": -275.80316162109375,
507
+ "logps/rejected": -300.4934997558594,
508
+ "loss": 0.1974,
509
+ "rewards/accuracies": 0.3687500059604645,
510
+ "rewards/chosen": 0.01670825481414795,
511
+ "rewards/margins": 0.00685430783778429,
512
+ "rewards/rejected": 0.009853946976363659,
513
+ "step": 350
514
+ },
515
+ {
516
+ "epoch": 0.27,
517
+ "learning_rate": 4.553379376404085e-06,
518
+ "logits/chosen": -0.1058160662651062,
519
+ "logits/rejected": -0.11791692674160004,
520
+ "logps/chosen": -266.61322021484375,
521
+ "logps/rejected": -279.0782470703125,
522
+ "loss": 0.1983,
523
+ "rewards/accuracies": 0.3125,
524
+ "rewards/chosen": 0.021091196686029434,
525
+ "rewards/margins": 0.004850545898079872,
526
+ "rewards/rejected": 0.01624065265059471,
527
+ "step": 360
528
+ },
529
+ {
530
+ "epoch": 0.28,
531
+ "learning_rate": 4.514689351341751e-06,
532
+ "logits/chosen": -0.16150256991386414,
533
+ "logits/rejected": -0.1245572566986084,
534
+ "logps/chosen": -277.8587341308594,
535
+ "logps/rejected": -272.55792236328125,
536
+ "loss": 0.208,
537
+ "rewards/accuracies": 0.3062500059604645,
538
+ "rewards/chosen": 0.017171379178762436,
539
+ "rewards/margins": 8.804257959127426e-05,
540
+ "rewards/rejected": 0.017083335667848587,
541
+ "step": 370
542
+ },
543
+ {
544
+ "epoch": 0.29,
545
+ "learning_rate": 4.474571359287791e-06,
546
+ "logits/chosen": -0.1678142547607422,
547
+ "logits/rejected": -0.1356990933418274,
548
+ "logps/chosen": -280.3147277832031,
549
+ "logps/rejected": -318.61224365234375,
550
+ "loss": 0.2096,
551
+ "rewards/accuracies": 0.375,
552
+ "rewards/chosen": 0.024694612249732018,
553
+ "rewards/margins": 0.010597179643809795,
554
+ "rewards/rejected": 0.014097435399889946,
555
+ "step": 380
556
+ },
557
+ {
558
+ "epoch": 0.3,
559
+ "learning_rate": 4.4330538349824684e-06,
560
+ "logits/chosen": -0.12732991576194763,
561
+ "logits/rejected": -0.12050570547580719,
562
+ "logps/chosen": -287.99615478515625,
563
+ "logps/rejected": -286.3104553222656,
564
+ "loss": 0.2079,
565
+ "rewards/accuracies": 0.29374998807907104,
566
+ "rewards/chosen": 0.017507802695035934,
567
+ "rewards/margins": 1.922808587551117e-05,
568
+ "rewards/rejected": 0.017488572746515274,
569
+ "step": 390
570
+ },
571
+ {
572
+ "epoch": 0.3,
573
+ "learning_rate": 4.3901662051233755e-06,
574
+ "logits/chosen": -0.18693214654922485,
575
+ "logits/rejected": -0.11668721586465836,
576
+ "logps/chosen": -293.45611572265625,
577
+ "logps/rejected": -291.56011962890625,
578
+ "loss": 0.2156,
579
+ "rewards/accuracies": 0.2562499940395355,
580
+ "rewards/chosen": 0.011068233288824558,
581
+ "rewards/margins": -0.005082954186946154,
582
+ "rewards/rejected": 0.01615118607878685,
583
+ "step": 400
584
+ },
585
+ {
586
+ "epoch": 0.31,
587
+ "learning_rate": 4.345938867508439e-06,
588
+ "logits/chosen": -0.06379786878824234,
589
+ "logits/rejected": -0.07422082126140594,
590
+ "logps/chosen": -299.4942626953125,
591
+ "logps/rejected": -322.50201416015625,
592
+ "loss": 0.209,
593
+ "rewards/accuracies": 0.2874999940395355,
594
+ "rewards/chosen": 0.015014107339084148,
595
+ "rewards/margins": 0.003927754703909159,
596
+ "rewards/rejected": 0.011086350306868553,
597
+ "step": 410
598
+ },
599
+ {
600
+ "epoch": 0.32,
601
+ "learning_rate": 4.30040316949064e-06,
602
+ "logits/chosen": -0.0699755996465683,
603
+ "logits/rejected": -0.08232170343399048,
604
+ "logps/chosen": -288.906494140625,
605
+ "logps/rejected": -310.1832580566406,
606
+ "loss": 0.2186,
607
+ "rewards/accuracies": 0.3375000059604645,
608
+ "rewards/chosen": 0.015213333070278168,
609
+ "rewards/margins": 0.0060978480614721775,
610
+ "rewards/rejected": 0.009115484543144703,
611
+ "step": 420
612
+ },
613
+ {
614
+ "epoch": 0.33,
615
+ "learning_rate": 4.253591385759705e-06,
616
+ "logits/chosen": -0.10615179687738419,
617
+ "logits/rejected": -0.06481219828128815,
618
+ "logps/chosen": -278.0174255371094,
619
+ "logps/rejected": -334.2236328125,
620
+ "loss": 0.2108,
621
+ "rewards/accuracies": 0.3499999940395355,
622
+ "rewards/chosen": 0.012311488389968872,
623
+ "rewards/margins": 0.00595985259860754,
624
+ "rewards/rejected": 0.006351633928716183,
625
+ "step": 430
626
+ },
627
+ {
628
+ "epoch": 0.34,
629
+ "learning_rate": 4.205536695466524e-06,
630
+ "logits/chosen": -0.13743802905082703,
631
+ "logits/rejected": -0.133576899766922,
632
+ "logps/chosen": -274.42333984375,
633
+ "logps/rejected": -305.0398864746094,
634
+ "loss": 0.1894,
635
+ "rewards/accuracies": 0.28125,
636
+ "rewards/chosen": 0.00840366818010807,
637
+ "rewards/margins": 0.004010985605418682,
638
+ "rewards/rejected": 0.004392682109028101,
639
+ "step": 440
640
+ },
641
+ {
642
+ "epoch": 0.34,
643
+ "learning_rate": 4.15627315870651e-06,
644
+ "logits/chosen": -0.2103540152311325,
645
+ "logits/rejected": -0.19342456758022308,
646
+ "logps/chosen": -282.6641540527344,
647
+ "logps/rejected": -310.85113525390625,
648
+ "loss": 0.1974,
649
+ "rewards/accuracies": 0.3187499940395355,
650
+ "rewards/chosen": 0.00448573287576437,
651
+ "rewards/margins": 0.0029663294553756714,
652
+ "rewards/rejected": 0.0015194038860499859,
653
+ "step": 450
654
+ },
655
+ {
656
+ "epoch": 0.35,
657
+ "learning_rate": 4.105835692378557e-06,
658
+ "logits/chosen": -0.1941739320755005,
659
+ "logits/rejected": -0.22001402080059052,
660
+ "logps/chosen": -267.8673095703125,
661
+ "logps/rejected": -265.15618896484375,
662
+ "loss": 0.2101,
663
+ "rewards/accuracies": 0.2562499940395355,
664
+ "rewards/chosen": 0.007565724663436413,
665
+ "rewards/margins": 0.0029779626056551933,
666
+ "rewards/rejected": 0.004587762989103794,
667
+ "step": 460
668
+ },
669
+ {
670
+ "epoch": 0.36,
671
+ "learning_rate": 4.05426004543672e-06,
672
+ "logits/chosen": -0.2290777713060379,
673
+ "logits/rejected": -0.22092656791210175,
674
+ "logps/chosen": -267.7095947265625,
675
+ "logps/rejected": -307.23468017578125,
676
+ "loss": 0.2034,
677
+ "rewards/accuracies": 0.3499999940395355,
678
+ "rewards/chosen": 0.013477683067321777,
679
+ "rewards/margins": 0.006231679115444422,
680
+ "rewards/rejected": 0.007246003951877356,
681
+ "step": 470
682
+ },
683
+ {
684
+ "epoch": 0.37,
685
+ "learning_rate": 4.001582773552153e-06,
686
+ "logits/chosen": -0.2154918909072876,
687
+ "logits/rejected": -0.2263382226228714,
688
+ "logps/chosen": -236.22079467773438,
689
+ "logps/rejected": -281.47308349609375,
690
+ "loss": 0.208,
691
+ "rewards/accuracies": 0.3062500059604645,
692
+ "rewards/chosen": 0.008721614256501198,
693
+ "rewards/margins": 0.004514150787144899,
694
+ "rewards/rejected": 0.004207463003695011,
695
+ "step": 480
696
+ },
697
+ {
698
+ "epoch": 0.37,
699
+ "learning_rate": 3.947841213203262e-06,
700
+ "logits/chosen": -0.25622615218162537,
701
+ "logits/rejected": -0.28361567854881287,
702
+ "logps/chosen": -307.4781494140625,
703
+ "logps/rejected": -311.0342712402344,
704
+ "loss": 0.2118,
705
+ "rewards/accuracies": 0.3125,
706
+ "rewards/chosen": 0.00395470205694437,
707
+ "rewards/margins": 0.005559145472943783,
708
+ "rewards/rejected": -0.001604443765245378,
709
+ "step": 490
710
+ },
711
+ {
712
+ "epoch": 0.38,
713
+ "learning_rate": 3.893073455212438e-06,
714
+ "logits/chosen": -0.2454405277967453,
715
+ "logits/rejected": -0.23036327958106995,
716
+ "logps/chosen": -305.7126159667969,
717
+ "logps/rejected": -320.9441833496094,
718
+ "loss": 0.1978,
719
+ "rewards/accuracies": 0.29374998807907104,
720
+ "rewards/chosen": 0.006379422731697559,
721
+ "rewards/margins": 0.0032655359245836735,
722
+ "rewards/rejected": 0.003113886807113886,
723
+ "step": 500
724
+ },
725
+ {
726
+ "epoch": 0.39,
727
+ "learning_rate": 3.837318317748134e-06,
728
+ "logits/chosen": -0.23187056183815002,
729
+ "logits/rejected": -0.25806108117103577,
730
+ "logps/chosen": -295.49847412109375,
731
+ "logps/rejected": -312.9036560058594,
732
+ "loss": 0.2095,
733
+ "rewards/accuracies": 0.35624998807907104,
734
+ "rewards/chosen": 0.010056470520794392,
735
+ "rewards/margins": 0.005588938947767019,
736
+ "rewards/rejected": 0.004467530641704798,
737
+ "step": 510
738
+ },
739
+ {
740
+ "epoch": 0.4,
741
+ "learning_rate": 3.7806153188114027e-06,
742
+ "logits/chosen": -0.28555235266685486,
743
+ "logits/rejected": -0.27182719111442566,
744
+ "logps/chosen": -278.7008361816406,
745
+ "logps/rejected": -286.72845458984375,
746
+ "loss": 0.2112,
747
+ "rewards/accuracies": 0.3062500059604645,
748
+ "rewards/chosen": 0.0004449138359632343,
749
+ "rewards/margins": 0.0027928180061280727,
750
+ "rewards/rejected": -0.002347904024645686,
751
+ "step": 520
752
+ },
753
+ {
754
+ "epoch": 0.4,
755
+ "learning_rate": 3.7230046482264256e-06,
756
+ "logits/chosen": -0.29695600271224976,
757
+ "logits/rejected": -0.30071455240249634,
758
+ "logps/chosen": -303.6511535644531,
759
+ "logps/rejected": -311.7902526855469,
760
+ "loss": 0.2241,
761
+ "rewards/accuracies": 0.3062500059604645,
762
+ "rewards/chosen": 0.0028660064563155174,
763
+ "rewards/margins": -0.001624327152967453,
764
+ "rewards/rejected": 0.00449033360928297,
765
+ "step": 530
766
+ },
767
+ {
768
+ "epoch": 0.41,
769
+ "learning_rate": 3.6645271391548542e-06,
770
+ "logits/chosen": -0.2697906196117401,
771
+ "logits/rejected": -0.2420235425233841,
772
+ "logps/chosen": -287.1878662109375,
773
+ "logps/rejected": -289.3710632324219,
774
+ "loss": 0.1866,
775
+ "rewards/accuracies": 0.29374998807907104,
776
+ "rewards/chosen": 0.010242622345685959,
777
+ "rewards/margins": 0.0021990591194480658,
778
+ "rewards/rejected": 0.008043562062084675,
779
+ "step": 540
780
+ },
781
+ {
782
+ "epoch": 0.42,
783
+ "learning_rate": 3.6052242391541746e-06,
784
+ "logits/chosen": -0.2508285641670227,
785
+ "logits/rejected": -0.22416014969348907,
786
+ "logps/chosen": -282.50885009765625,
787
+ "logps/rejected": -304.0210876464844,
788
+ "loss": 0.1951,
789
+ "rewards/accuracies": 0.29374998807907104,
790
+ "rewards/chosen": 0.014059506356716156,
791
+ "rewards/margins": 0.006549051962792873,
792
+ "rewards/rejected": 0.007510454393923283,
793
+ "step": 550
794
+ },
795
+ {
796
+ "epoch": 0.43,
797
+ "learning_rate": 3.5451379808006014e-06,
798
+ "logits/chosen": -0.2925412654876709,
799
+ "logits/rejected": -0.27184224128723145,
800
+ "logps/chosen": -226.45083618164062,
801
+ "logps/rejected": -252.47207641601562,
802
+ "loss": 0.2026,
803
+ "rewards/accuracies": 0.26249998807907104,
804
+ "rewards/chosen": 0.006832784973084927,
805
+ "rewards/margins": 0.006407728884369135,
806
+ "rewards/rejected": 0.00042505591409280896,
807
+ "step": 560
808
+ },
809
+ {
810
+ "epoch": 0.43,
811
+ "learning_rate": 3.484310951897323e-06,
812
+ "logits/chosen": -0.280683308839798,
813
+ "logits/rejected": -0.2360977679491043,
814
+ "logps/chosen": -272.83953857421875,
815
+ "logps/rejected": -299.12847900390625,
816
+ "loss": 0.1933,
817
+ "rewards/accuracies": 0.32499998807907104,
818
+ "rewards/chosen": 0.0044427914544939995,
819
+ "rewards/margins": 0.007019062992185354,
820
+ "rewards/rejected": -0.0025762729346752167,
821
+ "step": 570
822
+ },
823
+ {
824
+ "epoch": 0.44,
825
+ "learning_rate": 3.4227862652892106e-06,
826
+ "logits/chosen": -0.23029498755931854,
827
+ "logits/rejected": -0.30575060844421387,
828
+ "logps/chosen": -284.2434997558594,
829
+ "logps/rejected": -325.15606689453125,
830
+ "loss": 0.2113,
831
+ "rewards/accuracies": 0.30000001192092896,
832
+ "rewards/chosen": -0.0008473257767036557,
833
+ "rewards/margins": 0.0063803717494010925,
834
+ "rewards/rejected": -0.007227697875350714,
835
+ "step": 580
836
+ },
837
+ {
838
+ "epoch": 0.45,
839
+ "learning_rate": 3.3606075283054005e-06,
840
+ "logits/chosen": -0.21216976642608643,
841
+ "logits/rejected": -0.24321305751800537,
842
+ "logps/chosen": -289.8013610839844,
843
+ "logps/rejected": -292.233642578125,
844
+ "loss": 0.2076,
845
+ "rewards/accuracies": 0.2750000059604645,
846
+ "rewards/chosen": -0.006494040135294199,
847
+ "rewards/margins": 0.0029291484970599413,
848
+ "rewards/rejected": -0.009423188865184784,
849
+ "step": 590
850
+ },
851
+ {
852
+ "epoch": 0.46,
853
+ "learning_rate": 3.2978188118513814e-06,
854
+ "logits/chosen": -0.3345397114753723,
855
+ "logits/rejected": -0.2698872983455658,
856
+ "logps/chosen": -288.9801025390625,
857
+ "logps/rejected": -325.5001525878906,
858
+ "loss": 0.2204,
859
+ "rewards/accuracies": 0.3375000059604645,
860
+ "rewards/chosen": -0.005164691712707281,
861
+ "rewards/margins": 0.006596796214580536,
862
+ "rewards/rejected": -0.011761486530303955,
863
+ "step": 600
864
+ },
865
+ {
866
+ "epoch": 0.46,
867
+ "learning_rate": 3.234464619172522e-06,
868
+ "logits/chosen": -0.28216275572776794,
869
+ "logits/rejected": -0.36517441272735596,
870
+ "logps/chosen": -275.3839416503906,
871
+ "logps/rejected": -297.3681640625,
872
+ "loss": 0.2018,
873
+ "rewards/accuracies": 0.30000001192092896,
874
+ "rewards/chosen": -0.007699407637119293,
875
+ "rewards/margins": 0.0056534623727202415,
876
+ "rewards/rejected": -0.01335287094116211,
877
+ "step": 610
878
+ },
879
+ {
880
+ "epoch": 0.47,
881
+ "learning_rate": 3.1705898543111576e-06,
882
+ "logits/chosen": -0.32491374015808105,
883
+ "logits/rejected": -0.31058216094970703,
884
+ "logps/chosen": -318.451904296875,
885
+ "logps/rejected": -336.5116271972656,
886
+ "loss": 0.2052,
887
+ "rewards/accuracies": 0.32499998807907104,
888
+ "rewards/chosen": -0.007857877761125565,
889
+ "rewards/margins": 0.005408051423728466,
890
+ "rewards/rejected": -0.013265928253531456,
891
+ "step": 620
892
+ },
893
+ {
894
+ "epoch": 0.48,
895
+ "learning_rate": 3.106239790279606e-06,
896
+ "logits/chosen": -0.3316212594509125,
897
+ "logits/rejected": -0.27467650175094604,
898
+ "logps/chosen": -306.818359375,
899
+ "logps/rejected": -310.139892578125,
900
+ "loss": 0.2198,
901
+ "rewards/accuracies": 0.30000001192092896,
902
+ "rewards/chosen": -0.008624967187643051,
903
+ "rewards/margins": 0.0013632235350087285,
904
+ "rewards/rejected": -0.009988190606236458,
905
+ "step": 630
906
+ },
907
+ {
908
+ "epoch": 0.49,
909
+ "learning_rate": 3.041460036971664e-06,
910
+ "logits/chosen": -0.30510929226875305,
911
+ "logits/rejected": -0.2651751637458801,
912
+ "logps/chosen": -289.76177978515625,
913
+ "logps/rejected": -306.5693664550781,
914
+ "loss": 0.2091,
915
+ "rewards/accuracies": 0.3187499940395355,
916
+ "rewards/chosen": -0.0034012228716164827,
917
+ "rewards/margins": 0.005689640529453754,
918
+ "rewards/rejected": -0.009090864099562168,
919
+ "step": 640
920
+ },
921
+ {
922
+ "epoch": 0.5,
923
+ "learning_rate": 2.976296508835326e-06,
924
+ "logits/chosen": -0.27622857689857483,
925
+ "logits/rejected": -0.3029255270957947,
926
+ "logps/chosen": -254.8801727294922,
927
+ "logps/rejected": -280.85552978515625,
928
+ "loss": 0.2118,
929
+ "rewards/accuracies": 0.26875001192092896,
930
+ "rewards/chosen": 0.003135355655103922,
931
+ "rewards/margins": 0.00885056797415018,
932
+ "rewards/rejected": -0.005715211853384972,
933
+ "step": 650
934
+ },
935
+ {
936
+ "epoch": 0.5,
937
+ "learning_rate": 2.910795392329649e-06,
938
+ "logits/chosen": -0.29605865478515625,
939
+ "logits/rejected": -0.27770885825157166,
940
+ "logps/chosen": -277.80804443359375,
941
+ "logps/rejected": -309.5155029296875,
942
+ "loss": 0.2133,
943
+ "rewards/accuracies": 0.3687500059604645,
944
+ "rewards/chosen": 0.0010512445587664843,
945
+ "rewards/margins": 0.008048586547374725,
946
+ "rewards/rejected": -0.006997342221438885,
947
+ "step": 660
948
+ },
949
+ {
950
+ "epoch": 0.51,
951
+ "learning_rate": 2.8450031131888147e-06,
952
+ "logits/chosen": -0.35795870423316956,
953
+ "logits/rejected": -0.37449702620506287,
954
+ "logps/chosen": -289.4441833496094,
955
+ "logps/rejected": -332.01055908203125,
956
+ "loss": 0.1977,
957
+ "rewards/accuracies": 0.33125001192092896,
958
+ "rewards/chosen": 0.004084504209458828,
959
+ "rewards/margins": 0.011728955432772636,
960
+ "rewards/rejected": -0.0076444498263299465,
961
+ "step": 670
962
+ },
963
+ {
964
+ "epoch": 0.52,
965
+ "learning_rate": 2.7789663035166035e-06,
966
+ "logits/chosen": -0.35911503434181213,
967
+ "logits/rejected": -0.3697941303253174,
968
+ "logps/chosen": -251.0248565673828,
969
+ "logps/rejected": -283.61279296875,
970
+ "loss": 0.2036,
971
+ "rewards/accuracies": 0.32499998807907104,
972
+ "rewards/chosen": 0.0005092100473120809,
973
+ "rewards/margins": 0.011741384863853455,
974
+ "rewards/rejected": -0.011232174932956696,
975
+ "step": 680
976
+ },
977
+ {
978
+ "epoch": 0.53,
979
+ "learning_rate": 2.7127317687345973e-06,
980
+ "logits/chosen": -0.4329107403755188,
981
+ "logits/rejected": -0.38218554854393005,
982
+ "logps/chosen": -275.99176025390625,
983
+ "logps/rejected": -282.9251708984375,
984
+ "loss": 0.2255,
985
+ "rewards/accuracies": 0.29374998807907104,
986
+ "rewards/chosen": -0.0015629607951268554,
987
+ "rewards/margins": 0.005656304769217968,
988
+ "rewards/rejected": -0.007219265215098858,
989
+ "step": 690
990
+ },
991
+ {
992
+ "epoch": 0.53,
993
+ "learning_rate": 2.6463464544075344e-06,
994
+ "logits/chosen": -0.38624441623687744,
995
+ "logits/rejected": -0.4270317554473877,
996
+ "logps/chosen": -313.55047607421875,
997
+ "logps/rejected": -360.2120361328125,
998
+ "loss": 0.1998,
999
+ "rewards/accuracies": 0.33125001192092896,
1000
+ "rewards/chosen": -0.0040723588317632675,
1001
+ "rewards/margins": 0.013303135521709919,
1002
+ "rewards/rejected": -0.01737549528479576,
1003
+ "step": 700
1004
+ },
1005
+ {
1006
+ "epoch": 0.54,
1007
+ "learning_rate": 2.579857412969345e-06,
1008
+ "logits/chosen": -0.4271603226661682,
1009
+ "logits/rejected": -0.4234057366847992,
1010
+ "logps/chosen": -304.10211181640625,
1011
+ "logps/rejected": -354.1361389160156,
1012
+ "loss": 0.1979,
1013
+ "rewards/accuracies": 0.36250001192092896,
1014
+ "rewards/chosen": -0.007728138472884893,
1015
+ "rewards/margins": 0.016483409330248833,
1016
+ "rewards/rejected": -0.024211544543504715,
1017
+ "step": 710
1018
+ },
1019
+ {
1020
+ "epoch": 0.55,
1021
+ "learning_rate": 2.513311770373421e-06,
1022
+ "logits/chosen": -0.5024863481521606,
1023
+ "logits/rejected": -0.47420111298561096,
1024
+ "logps/chosen": -317.60174560546875,
1025
+ "logps/rejected": -362.4761657714844,
1026
+ "loss": 0.192,
1027
+ "rewards/accuracies": 0.36250001192092896,
1028
+ "rewards/chosen": -0.011080259457230568,
1029
+ "rewards/margins": 0.010900650173425674,
1030
+ "rewards/rejected": -0.021980909630656242,
1031
+ "step": 720
1032
+ },
1033
+ {
1034
+ "epoch": 0.56,
1035
+ "learning_rate": 2.446756692690804e-06,
1036
+ "logits/chosen": -0.4976615905761719,
1037
+ "logits/rejected": -0.4921053349971771,
1038
+ "logps/chosen": -306.67547607421875,
1039
+ "logps/rejected": -361.6275939941406,
1040
+ "loss": 0.2069,
1041
+ "rewards/accuracies": 0.36250001192092896,
1042
+ "rewards/chosen": -0.006612093187868595,
1043
+ "rewards/margins": 0.014301796443760395,
1044
+ "rewards/rejected": -0.02091388963162899,
1045
+ "step": 730
1046
+ },
1047
+ {
1048
+ "epoch": 0.56,
1049
+ "learning_rate": 2.380239352679908e-06,
1050
+ "logits/chosen": -0.45196834206581116,
1051
+ "logits/rejected": -0.48185840249061584,
1052
+ "logps/chosen": -313.2336730957031,
1053
+ "logps/rejected": -313.48260498046875,
1054
+ "loss": 0.2149,
1055
+ "rewards/accuracies": 0.2874999940395355,
1056
+ "rewards/chosen": -0.010375128127634525,
1057
+ "rewards/margins": 0.004926495254039764,
1058
+ "rewards/rejected": -0.015301624312996864,
1059
+ "step": 740
1060
+ },
1061
+ {
1062
+ "epoch": 0.57,
1063
+ "learning_rate": 2.313806896351529e-06,
1064
+ "logits/chosen": -0.5004686117172241,
1065
+ "logits/rejected": -0.47765883803367615,
1066
+ "logps/chosen": -274.7811279296875,
1067
+ "logps/rejected": -311.44207763671875,
1068
+ "loss": 0.1904,
1069
+ "rewards/accuracies": 0.33125001192092896,
1070
+ "rewards/chosen": 0.002466167788952589,
1071
+ "rewards/margins": 0.012883700430393219,
1072
+ "rewards/rejected": -0.010417533107101917,
1073
+ "step": 750
1074
+ },
1075
+ {
1076
+ "epoch": 0.58,
1077
+ "learning_rate": 2.247506409552795e-06,
1078
+ "logits/chosen": -0.563784122467041,
1079
+ "logits/rejected": -0.5083848237991333,
1080
+ "logps/chosen": -286.28277587890625,
1081
+ "logps/rejected": -303.66925048828125,
1082
+ "loss": 0.2037,
1083
+ "rewards/accuracies": 0.30000001192092896,
1084
+ "rewards/chosen": -0.0052384668961167336,
1085
+ "rewards/margins": 0.004287950228899717,
1086
+ "rewards/rejected": -0.009526416659355164,
1087
+ "step": 760
1088
+ },
1089
+ {
1090
+ "epoch": 0.59,
1091
+ "learning_rate": 2.1813848845937695e-06,
1092
+ "logits/chosen": -0.5057908296585083,
1093
+ "logits/rejected": -0.48972076177597046,
1094
+ "logps/chosen": -297.547607421875,
1095
+ "logps/rejected": -294.28863525390625,
1096
+ "loss": 0.2101,
1097
+ "rewards/accuracies": 0.28125,
1098
+ "rewards/chosen": -0.0030983686447143555,
1099
+ "rewards/margins": 0.0027166479267179966,
1100
+ "rewards/rejected": -0.00581501517444849,
1101
+ "step": 770
1102
+ },
1103
+ {
1104
+ "epoch": 0.59,
1105
+ "learning_rate": 2.1154891869403436e-06,
1106
+ "logits/chosen": -0.472622811794281,
1107
+ "logits/rejected": -0.4686339497566223,
1108
+ "logps/chosen": -241.92630004882812,
1109
+ "logps/rejected": -279.33062744140625,
1110
+ "loss": 0.2091,
1111
+ "rewards/accuracies": 0.3125,
1112
+ "rewards/chosen": 0.009422892704606056,
1113
+ "rewards/margins": 0.009348099119961262,
1114
+ "rewards/rejected": 7.479395571863279e-05,
1115
+ "step": 780
1116
+ },
1117
+ {
1118
+ "epoch": 0.6,
1119
+ "learning_rate": 2.0498660219970395e-06,
1120
+ "logits/chosen": -0.44067639112472534,
1121
+ "logits/rejected": -0.46692126989364624,
1122
+ "logps/chosen": -285.769287109375,
1123
+ "logps/rejected": -305.04296875,
1124
+ "loss": 0.2271,
1125
+ "rewards/accuracies": 0.3187499940395355,
1126
+ "rewards/chosen": 0.008178983815014362,
1127
+ "rewards/margins": 0.007535718381404877,
1128
+ "rewards/rejected": 0.0006432650843635201,
1129
+ "step": 790
1130
+ },
1131
+ {
1132
+ "epoch": 0.61,
1133
+ "learning_rate": 1.9845619020032552e-06,
1134
+ "logits/chosen": -0.4349413514137268,
1135
+ "logits/rejected": -0.4484075605869293,
1136
+ "logps/chosen": -305.90972900390625,
1137
+ "logps/rejected": -287.2285461425781,
1138
+ "loss": 0.2105,
1139
+ "rewards/accuracies": 0.3062500059604645,
1140
+ "rewards/chosen": 0.006356480531394482,
1141
+ "rewards/margins": 0.0014576372923329473,
1142
+ "rewards/rejected": 0.004898843355476856,
1143
+ "step": 800
1144
+ },
1145
+ {
1146
+ "epoch": 0.62,
1147
+ "learning_rate": 1.9196231130664282e-06,
1148
+ "logits/chosen": -0.45880216360092163,
1149
+ "logits/rejected": -0.44850587844848633,
1150
+ "logps/chosen": -263.9127502441406,
1151
+ "logps/rejected": -292.25177001953125,
1152
+ "loss": 0.2033,
1153
+ "rewards/accuracies": 0.3125,
1154
+ "rewards/chosen": 0.0159307774156332,
1155
+ "rewards/margins": 0.009099806658923626,
1156
+ "rewards/rejected": 0.006830970756709576,
1157
+ "step": 810
1158
+ },
1159
+ {
1160
+ "epoch": 0.62,
1161
+ "learning_rate": 1.8550956823554708e-06,
1162
+ "logits/chosen": -0.4190438687801361,
1163
+ "logits/rejected": -0.39324015378952026,
1164
+ "logps/chosen": -297.383056640625,
1165
+ "logps/rejected": -307.9434814453125,
1166
+ "loss": 0.2115,
1167
+ "rewards/accuracies": 0.2750000059604645,
1168
+ "rewards/chosen": 0.008881737478077412,
1169
+ "rewards/margins": 0.0028643650002777576,
1170
+ "rewards/rejected": 0.006017371080815792,
1171
+ "step": 820
1172
+ },
1173
+ {
1174
+ "epoch": 0.63,
1175
+ "learning_rate": 1.7910253454777346e-06,
1176
+ "logits/chosen": -0.4339013993740082,
1177
+ "logits/rejected": -0.4780551791191101,
1178
+ "logps/chosen": -264.0601501464844,
1179
+ "logps/rejected": -309.7989807128906,
1180
+ "loss": 0.2071,
1181
+ "rewards/accuracies": 0.3687500059604645,
1182
+ "rewards/chosen": 0.01502480823546648,
1183
+ "rewards/margins": 0.013155427761375904,
1184
+ "rewards/rejected": 0.0018693817546591163,
1185
+ "step": 830
1186
+ },
1187
+ {
1188
+ "epoch": 0.64,
1189
+ "learning_rate": 1.7274575140626318e-06,
1190
+ "logits/chosen": -0.413571834564209,
1191
+ "logits/rejected": -0.4285905361175537,
1192
+ "logps/chosen": -263.08453369140625,
1193
+ "logps/rejected": -286.2958068847656,
1194
+ "loss": 0.2184,
1195
+ "rewards/accuracies": 0.25,
1196
+ "rewards/chosen": 0.008341384120285511,
1197
+ "rewards/margins": 0.0012967393267899752,
1198
+ "rewards/rejected": 0.007044644560664892,
1199
+ "step": 840
1200
+ },
1201
+ {
1202
+ "epoch": 0.65,
1203
+ "learning_rate": 1.6644372435748823e-06,
1204
+ "logits/chosen": -0.408627986907959,
1205
+ "logits/rejected": -0.4011750817298889,
1206
+ "logps/chosen": -229.0798797607422,
1207
+ "logps/rejected": -262.96075439453125,
1208
+ "loss": 0.2043,
1209
+ "rewards/accuracies": 0.2750000059604645,
1210
+ "rewards/chosen": 0.0140065373852849,
1211
+ "rewards/margins": 0.007881510071456432,
1212
+ "rewards/rejected": 0.006125027313828468,
1213
+ "step": 850
1214
+ },
1215
+ {
1216
+ "epoch": 0.66,
1217
+ "learning_rate": 1.6020092013802002e-06,
1218
+ "logits/chosen": -0.43513408303260803,
1219
+ "logits/rejected": -0.4182719588279724,
1220
+ "logps/chosen": -283.6298828125,
1221
+ "logps/rejected": -299.7771301269531,
1222
+ "loss": 0.2094,
1223
+ "rewards/accuracies": 0.3125,
1224
+ "rewards/chosen": 0.015776053071022034,
1225
+ "rewards/margins": 0.005827675573527813,
1226
+ "rewards/rejected": 0.00994837749749422,
1227
+ "step": 860
1228
+ },
1229
+ {
1230
+ "epoch": 0.66,
1231
+ "learning_rate": 1.5402176350860653e-06,
1232
+ "logits/chosen": -0.4543350338935852,
1233
+ "logits/rejected": -0.43857789039611816,
1234
+ "logps/chosen": -258.48822021484375,
1235
+ "logps/rejected": -309.0599365234375,
1236
+ "loss": 0.2144,
1237
+ "rewards/accuracies": 0.36250001192092896,
1238
+ "rewards/chosen": 0.01519864983856678,
1239
+ "rewards/margins": 0.0170084610581398,
1240
+ "rewards/rejected": -0.0018098097061738372,
1241
+ "step": 870
1242
+ },
1243
+ {
1244
+ "epoch": 0.67,
1245
+ "learning_rate": 1.4791063411799938e-06,
1246
+ "logits/chosen": -0.4367285370826721,
1247
+ "logits/rejected": -0.4124454855918884,
1248
+ "logps/chosen": -258.4836120605469,
1249
+ "logps/rejected": -256.78948974609375,
1250
+ "loss": 0.2125,
1251
+ "rewards/accuracies": 0.26249998807907104,
1252
+ "rewards/chosen": 0.010697437450289726,
1253
+ "rewards/margins": 0.0008281940827146173,
1254
+ "rewards/rejected": 0.009869244880974293,
1255
+ "step": 880
1256
+ },
1257
+ {
1258
+ "epoch": 0.68,
1259
+ "learning_rate": 1.4187186339875697e-06,
1260
+ "logits/chosen": -0.3717727065086365,
1261
+ "logits/rejected": -0.3890214264392853,
1262
+ "logps/chosen": -238.35391235351562,
1263
+ "logps/rejected": -270.5140686035156,
1264
+ "loss": 0.202,
1265
+ "rewards/accuracies": 0.26875001192092896,
1266
+ "rewards/chosen": 0.015574706718325615,
1267
+ "rewards/margins": 0.009048102423548698,
1268
+ "rewards/rejected": 0.006526602897793055,
1269
+ "step": 890
1270
+ },
1271
+ {
1272
+ "epoch": 0.69,
1273
+ "learning_rate": 1.3590973149722103e-06,
1274
+ "logits/chosen": -0.4208298325538635,
1275
+ "logits/rejected": -0.4319419860839844,
1276
+ "logps/chosen": -272.0270690917969,
1277
+ "logps/rejected": -303.1417541503906,
1278
+ "loss": 0.2083,
1279
+ "rewards/accuracies": 0.32499998807907104,
1280
+ "rewards/chosen": 0.015811121091246605,
1281
+ "rewards/margins": 0.010845163837075233,
1282
+ "rewards/rejected": 0.004965959116816521,
1283
+ "step": 900
1284
+ },
1285
+ {
1286
+ "epoch": 0.69,
1287
+ "learning_rate": 1.300284642398445e-06,
1288
+ "logits/chosen": -0.47379741072654724,
1289
+ "logits/rejected": -0.4520508348941803,
1290
+ "logps/chosen": -262.06292724609375,
1291
+ "logps/rejected": -311.0146789550781,
1292
+ "loss": 0.1942,
1293
+ "rewards/accuracies": 0.3499999940395355,
1294
+ "rewards/chosen": 0.014805925078690052,
1295
+ "rewards/margins": 0.01618744060397148,
1296
+ "rewards/rejected": -0.0013815152924507856,
1297
+ "step": 910
1298
+ },
1299
+ {
1300
+ "epoch": 0.7,
1301
+ "learning_rate": 1.2423223013801946e-06,
1302
+ "logits/chosen": -0.45992302894592285,
1303
+ "logits/rejected": -0.4493522644042969,
1304
+ "logps/chosen": -277.81231689453125,
1305
+ "logps/rejected": -303.0189514160156,
1306
+ "loss": 0.2059,
1307
+ "rewards/accuracies": 0.30000001192092896,
1308
+ "rewards/chosen": 0.007296368479728699,
1309
+ "rewards/margins": 0.007703060749918222,
1310
+ "rewards/rejected": -0.00040669291047379375,
1311
+ "step": 920
1312
+ },
1313
+ {
1314
+ "epoch": 0.71,
1315
+ "learning_rate": 1.1852513743352886e-06,
1316
+ "logits/chosen": -0.4434199929237366,
1317
+ "logits/rejected": -0.4254834055900574,
1318
+ "logps/chosen": -236.1838836669922,
1319
+ "logps/rejected": -280.4830627441406,
1320
+ "loss": 0.2005,
1321
+ "rewards/accuracies": 0.2874999940395355,
1322
+ "rewards/chosen": 0.00997086614370346,
1323
+ "rewards/margins": 0.012648127973079681,
1324
+ "rewards/rejected": -0.0026772604323923588,
1325
+ "step": 930
1326
+ },
1327
+ {
1328
+ "epoch": 0.72,
1329
+ "learning_rate": 1.1291123118671665e-06,
1330
+ "logits/chosen": -0.4678889811038971,
1331
+ "logits/rejected": -0.4497453570365906,
1332
+ "logps/chosen": -260.39630126953125,
1333
+ "logps/rejected": -310.59783935546875,
1334
+ "loss": 0.2033,
1335
+ "rewards/accuracies": 0.29374998807907104,
1336
+ "rewards/chosen": 0.013568739406764507,
1337
+ "rewards/margins": 0.013490801677107811,
1338
+ "rewards/rejected": 7.793791883159429e-05,
1339
+ "step": 940
1340
+ },
1341
+ {
1342
+ "epoch": 0.72,
1343
+ "learning_rate": 1.073944904094385e-06,
1344
+ "logits/chosen": -0.4590323865413666,
1345
+ "logits/rejected": -0.4857376217842102,
1346
+ "logps/chosen": -250.05941772460938,
1347
+ "logps/rejected": -269.81182861328125,
1348
+ "loss": 0.2081,
1349
+ "rewards/accuracies": 0.26875001192092896,
1350
+ "rewards/chosen": 0.009169178083539009,
1351
+ "rewards/margins": 0.0009465640177950263,
1352
+ "rewards/rejected": 0.008222613483667374,
1353
+ "step": 950
1354
+ },
1355
+ {
1356
+ "epoch": 0.73,
1357
+ "learning_rate": 1.019788252448267e-06,
1358
+ "logits/chosen": -0.42771902680397034,
1359
+ "logits/rejected": -0.4784491956233978,
1360
+ "logps/chosen": -286.91033935546875,
1361
+ "logps/rejected": -356.3207092285156,
1362
+ "loss": 0.2176,
1363
+ "rewards/accuracies": 0.36250001192092896,
1364
+ "rewards/chosen": 0.009989687241613865,
1365
+ "rewards/margins": 0.01874762959778309,
1366
+ "rewards/rejected": -0.008757943287491798,
1367
+ "step": 960
1368
+ },
1369
+ {
1370
+ "epoch": 0.74,
1371
+ "learning_rate": 9.66680741958685e-07,
1372
+ "logits/chosen": -0.5021045804023743,
1373
+ "logits/rejected": -0.4944073557853699,
1374
+ "logps/chosen": -256.72845458984375,
1375
+ "logps/rejected": -282.7173767089844,
1376
+ "loss": 0.2023,
1377
+ "rewards/accuracies": 0.33125001192092896,
1378
+ "rewards/chosen": 0.008730728179216385,
1379
+ "rewards/margins": 0.010655781254172325,
1380
+ "rewards/rejected": -0.001925053307786584,
1381
+ "step": 970
1382
+ },
1383
+ {
1384
+ "epoch": 0.75,
1385
+ "learning_rate": 9.146600140475945e-07,
1386
+ "logits/chosen": -0.49039751291275024,
1387
+ "logits/rejected": -0.4602015018463135,
1388
+ "logps/chosen": -319.5450134277344,
1389
+ "logps/rejected": -356.46478271484375,
1390
+ "loss": 0.2141,
1391
+ "rewards/accuracies": 0.3375000059604645,
1392
+ "rewards/chosen": 0.0025001950562000275,
1393
+ "rewards/margins": 0.0142419608309865,
1394
+ "rewards/rejected": -0.011741766706109047,
1395
+ "step": 980
1396
+ },
1397
+ {
1398
+ "epoch": 0.75,
1399
+ "learning_rate": 8.637629398496378e-07,
1400
+ "logits/chosen": -0.4612942636013031,
1401
+ "logits/rejected": -0.4650956988334656,
1402
+ "logps/chosen": -314.468017578125,
1403
+ "logps/rejected": -336.6165466308594,
1404
+ "loss": 0.2201,
1405
+ "rewards/accuracies": 0.3125,
1406
+ "rewards/chosen": 0.004640791565179825,
1407
+ "rewards/margins": 0.011841602623462677,
1408
+ "rewards/rejected": -0.0072008101269602776,
1409
+ "step": 990
1410
+ },
1411
+ {
1412
+ "epoch": 0.76,
1413
+ "learning_rate": 8.140255940787059e-07,
1414
+ "logits/chosen": -0.4051927924156189,
1415
+ "logits/rejected": -0.41572269797325134,
1416
+ "logps/chosen": -266.1190185546875,
1417
+ "logps/rejected": -307.9905090332031,
1418
+ "loss": 0.1983,
1419
+ "rewards/accuracies": 0.3062500059604645,
1420
+ "rewards/chosen": 0.005232260562479496,
1421
+ "rewards/margins": 0.011333976872265339,
1422
+ "rewards/rejected": -0.006101716309785843,
1423
+ "step": 1000
1424
+ },
1425
+ {
1426
+ "epoch": 0.77,
1427
+ "learning_rate": 7.654832294589776e-07,
1428
+ "logits/chosen": -0.433448851108551,
1429
+ "logits/rejected": -0.4530878961086273,
1430
+ "logps/chosen": -264.8626403808594,
1431
+ "logps/rejected": -305.2262878417969,
1432
+ "loss": 0.2114,
1433
+ "rewards/accuracies": 0.3187499940395355,
1434
+ "rewards/chosen": 0.004321249667555094,
1435
+ "rewards/margins": 0.009284446947276592,
1436
+ "rewards/rejected": -0.004963197745382786,
1437
+ "step": 1010
1438
+ },
1439
+ {
1440
+ "epoch": 0.78,
1441
+ "learning_rate": 7.181702517385789e-07,
1442
+ "logits/chosen": -0.470588743686676,
1443
+ "logits/rejected": -0.4782032370567322,
1444
+ "logps/chosen": -274.71527099609375,
1445
+ "logps/rejected": -307.9129943847656,
1446
+ "loss": 0.1997,
1447
+ "rewards/accuracies": 0.3125,
1448
+ "rewards/chosen": 0.007287648506462574,
1449
+ "rewards/margins": 0.00972357951104641,
1450
+ "rewards/rejected": -0.0024359312374144793,
1451
+ "step": 1020
1452
+ },
1453
+ {
1454
+ "epoch": 0.78,
1455
+ "learning_rate": 6.721201953035511e-07,
1456
+ "logits/chosen": -0.4322214126586914,
1457
+ "logits/rejected": -0.458743155002594,
1458
+ "logps/chosen": -262.1369323730469,
1459
+ "logps/rejected": -275.8923034667969,
1460
+ "loss": 0.2108,
1461
+ "rewards/accuracies": 0.23749999701976776,
1462
+ "rewards/chosen": 0.0005865416023880243,
1463
+ "rewards/margins": 0.000941948383115232,
1464
+ "rewards/rejected": -0.00035540637327358127,
1465
+ "step": 1030
1466
+ },
1467
+ {
1468
+ "epoch": 0.79,
1469
+ "learning_rate": 6.273656994094232e-07,
1470
+ "logits/chosen": -0.4811081886291504,
1471
+ "logits/rejected": -0.40350890159606934,
1472
+ "logps/chosen": -272.11419677734375,
1473
+ "logps/rejected": -307.12371826171875,
1474
+ "loss": 0.1991,
1475
+ "rewards/accuracies": 0.32499998807907104,
1476
+ "rewards/chosen": 0.005647097248584032,
1477
+ "rewards/margins": 0.007634321693331003,
1478
+ "rewards/rejected": -0.0019872249104082584,
1479
+ "step": 1040
1480
+ },
1481
+ {
1482
+ "epoch": 0.8,
1483
+ "learning_rate": 5.839384850472359e-07,
1484
+ "logits/chosen": -0.46519404649734497,
1485
+ "logits/rejected": -0.40799570083618164,
1486
+ "logps/chosen": -289.98883056640625,
1487
+ "logps/rejected": -292.3999938964844,
1488
+ "loss": 0.2163,
1489
+ "rewards/accuracies": 0.29374998807907104,
1490
+ "rewards/chosen": 0.008130472153425217,
1491
+ "rewards/margins": 0.002877952065318823,
1492
+ "rewards/rejected": 0.005252520553767681,
1493
+ "step": 1050
1494
+ },
1495
+ {
1496
+ "epoch": 0.81,
1497
+ "learning_rate": 5.418693324604082e-07,
1498
+ "logits/chosen": -0.5294805765151978,
1499
+ "logits/rejected": -0.5169610977172852,
1500
+ "logps/chosen": -272.34686279296875,
1501
+ "logps/rejected": -307.16412353515625,
1502
+ "loss": 0.1938,
1503
+ "rewards/accuracies": 0.35624998807907104,
1504
+ "rewards/chosen": 0.012028077617287636,
1505
+ "rewards/margins": 0.013656511902809143,
1506
+ "rewards/rejected": -0.0016284309094771743,
1507
+ "step": 1060
1508
+ },
1509
+ {
1510
+ "epoch": 0.82,
1511
+ "learning_rate": 5.01188059328386e-07,
1512
+ "logits/chosen": -0.5478223562240601,
1513
+ "logits/rejected": -0.4814823269844055,
1514
+ "logps/chosen": -289.8566589355469,
1515
+ "logps/rejected": -289.57525634765625,
1516
+ "loss": 0.2065,
1517
+ "rewards/accuracies": 0.28125,
1518
+ "rewards/chosen": 0.008872238919138908,
1519
+ "rewards/margins": 0.006515635643154383,
1520
+ "rewards/rejected": 0.0023566028103232384,
1521
+ "step": 1070
1522
+ },
1523
+ {
1524
+ "epoch": 0.82,
1525
+ "learning_rate": 4.619234996325314e-07,
1526
+ "logits/chosen": -0.4742591977119446,
1527
+ "logits/rejected": -0.45757928490638733,
1528
+ "logps/chosen": -328.6953125,
1529
+ "logps/rejected": -340.1205139160156,
1530
+ "loss": 0.2115,
1531
+ "rewards/accuracies": 0.32499998807907104,
1532
+ "rewards/chosen": 0.006806342396885157,
1533
+ "rewards/margins": 0.0063201361335814,
1534
+ "rewards/rejected": 0.000486206088680774,
1535
+ "step": 1080
1536
+ },
1537
+ {
1538
+ "epoch": 0.83,
1539
+ "learning_rate": 4.241034832192434e-07,
1540
+ "logits/chosen": -0.5209555625915527,
1541
+ "logits/rejected": -0.5151633024215698,
1542
+ "logps/chosen": -276.89581298828125,
1543
+ "logps/rejected": -273.1256103515625,
1544
+ "loss": 0.2135,
1545
+ "rewards/accuracies": 0.24375000596046448,
1546
+ "rewards/chosen": 0.0027219385374337435,
1547
+ "rewards/margins": 0.0031283677089959383,
1548
+ "rewards/rejected": -0.00040642902604304254,
1549
+ "step": 1090
1550
+ },
1551
+ {
1552
+ "epoch": 0.84,
1553
+ "learning_rate": 3.877548160747768e-07,
1554
+ "logits/chosen": -0.4666425585746765,
1555
+ "logits/rejected": -0.48001235723495483,
1556
+ "logps/chosen": -280.6470031738281,
1557
+ "logps/rejected": -315.48443603515625,
1558
+ "loss": 0.2048,
1559
+ "rewards/accuracies": 0.3187499940395355,
1560
+ "rewards/chosen": 0.002273663878440857,
1561
+ "rewards/margins": 0.008108812384307384,
1562
+ "rewards/rejected": -0.0058351485058665276,
1563
+ "step": 1100
1564
+ },
1565
+ {
1566
+ "epoch": 0.85,
1567
+ "learning_rate": 3.529032613257574e-07,
1568
+ "logits/chosen": -0.4815247654914856,
1569
+ "logits/rejected": -0.45564061403274536,
1570
+ "logps/chosen": -300.01959228515625,
1571
+ "logps/rejected": -330.40899658203125,
1572
+ "loss": 0.2076,
1573
+ "rewards/accuracies": 0.32499998807907104,
1574
+ "rewards/chosen": 0.0031171857845038176,
1575
+ "rewards/margins": 0.009788629598915577,
1576
+ "rewards/rejected": -0.006671445909887552,
1577
+ "step": 1110
1578
+ },
1579
+ {
1580
+ "epoch": 0.85,
1581
+ "learning_rate": 3.195735209788528e-07,
1582
+ "logits/chosen": -0.4933408796787262,
1583
+ "logits/rejected": -0.5491820573806763,
1584
+ "logps/chosen": -258.11822509765625,
1585
+ "logps/rejected": -296.2974853515625,
1586
+ "loss": 0.213,
1587
+ "rewards/accuracies": 0.3062500059604645,
1588
+ "rewards/chosen": 0.008377974852919579,
1589
+ "rewards/margins": 0.012130925431847572,
1590
+ "rewards/rejected": -0.003752949880436063,
1591
+ "step": 1120
1592
+ },
1593
+ {
1594
+ "epoch": 0.86,
1595
+ "learning_rate": 2.8778921841253774e-07,
1596
+ "logits/chosen": -0.43251729011535645,
1597
+ "logits/rejected": -0.5054982900619507,
1598
+ "logps/chosen": -281.94403076171875,
1599
+ "logps/rejected": -280.0823669433594,
1600
+ "loss": 0.2204,
1601
+ "rewards/accuracies": 0.23125000298023224,
1602
+ "rewards/chosen": 0.0010522855445742607,
1603
+ "rewards/margins": 0.00011768531840061769,
1604
+ "rewards/rejected": 0.0009346003644168377,
1605
+ "step": 1130
1606
+ },
1607
+ {
1608
+ "epoch": 0.87,
1609
+ "learning_rate": 2.5757288163336806e-07,
1610
+ "logits/chosen": -0.4674537181854248,
1611
+ "logits/rejected": -0.43040981888771057,
1612
+ "logps/chosen": -267.8955078125,
1613
+ "logps/rejected": -268.64312744140625,
1614
+ "loss": 0.2134,
1615
+ "rewards/accuracies": 0.2750000059604645,
1616
+ "rewards/chosen": -0.00031215575290843844,
1617
+ "rewards/margins": -0.00015194570005405694,
1618
+ "rewards/rejected": -0.00016021067858673632,
1619
+ "step": 1140
1620
+ },
1621
+ {
1622
+ "epoch": 0.88,
1623
+ "learning_rate": 2.2894592730863336e-07,
1624
+ "logits/chosen": -0.4933416247367859,
1625
+ "logits/rejected": -0.4184231758117676,
1626
+ "logps/chosen": -274.43487548828125,
1627
+ "logps/rejected": -287.72259521484375,
1628
+ "loss": 0.2058,
1629
+ "rewards/accuracies": 0.29374998807907104,
1630
+ "rewards/chosen": 0.012380121275782585,
1631
+ "rewards/margins": 0.007005924824625254,
1632
+ "rewards/rejected": 0.005374195985496044,
1633
+ "step": 1150
1634
+ },
1635
+ {
1636
+ "epoch": 0.88,
1637
+ "learning_rate": 2.019286455866981e-07,
1638
+ "logits/chosen": -0.5218170881271362,
1639
+ "logits/rejected": -0.46864786744117737,
1640
+ "logps/chosen": -298.1045227050781,
1641
+ "logps/rejected": -322.55181884765625,
1642
+ "loss": 0.1925,
1643
+ "rewards/accuracies": 0.32499998807907104,
1644
+ "rewards/chosen": -0.0008321186760440469,
1645
+ "rewards/margins": 0.004535870626568794,
1646
+ "rewards/rejected": -0.005367989186197519,
1647
+ "step": 1160
1648
+ },
1649
+ {
1650
+ "epoch": 0.89,
1651
+ "learning_rate": 1.7654018571579557e-07,
1652
+ "logits/chosen": -0.48008307814598083,
1653
+ "logits/rejected": -0.4698426127433777,
1654
+ "logps/chosen": -307.2906494140625,
1655
+ "logps/rejected": -356.3848571777344,
1656
+ "loss": 0.2072,
1657
+ "rewards/accuracies": 0.3812499940395355,
1658
+ "rewards/chosen": 0.013308033347129822,
1659
+ "rewards/margins": 0.019972598180174828,
1660
+ "rewards/rejected": -0.00666456576436758,
1661
+ "step": 1170
1662
+ },
1663
+ {
1664
+ "epoch": 0.9,
1665
+ "learning_rate": 1.5279854247146703e-07,
1666
+ "logits/chosen": -0.46893006563186646,
1667
+ "logits/rejected": -0.4585728645324707,
1668
+ "logps/chosen": -340.1523132324219,
1669
+ "logps/rejected": -351.4723205566406,
1670
+ "loss": 0.2135,
1671
+ "rewards/accuracies": 0.3187499940395355,
1672
+ "rewards/chosen": 0.0029886148404330015,
1673
+ "rewards/margins": 0.005356112495064735,
1674
+ "rewards/rejected": -0.002367498353123665,
1675
+ "step": 1180
1676
+ },
1677
+ {
1678
+ "epoch": 0.91,
1679
+ "learning_rate": 1.307205434022671e-07,
1680
+ "logits/chosen": -0.4998309016227722,
1681
+ "logits/rejected": -0.49961596727371216,
1682
+ "logps/chosen": -272.09832763671875,
1683
+ "logps/rejected": -272.3662109375,
1684
+ "loss": 0.1976,
1685
+ "rewards/accuracies": 0.26875001192092896,
1686
+ "rewards/chosen": 0.0032546469010412693,
1687
+ "rewards/margins": 0.003977665212005377,
1688
+ "rewards/rejected": -0.0007230177288874984,
1689
+ "step": 1190
1690
+ },
1691
+ {
1692
+ "epoch": 0.91,
1693
+ "learning_rate": 1.1032183690276754e-07,
1694
+ "logits/chosen": -0.518333911895752,
1695
+ "logits/rejected": -0.5002647638320923,
1696
+ "logps/chosen": -280.9651794433594,
1697
+ "logps/rejected": -300.6822204589844,
1698
+ "loss": 0.2098,
1699
+ "rewards/accuracies": 0.26875001192092896,
1700
+ "rewards/chosen": 0.002680544275790453,
1701
+ "rewards/margins": 0.003743491368368268,
1702
+ "rewards/rejected": -0.001062947092577815,
1703
+ "step": 1200
1704
+ },
1705
+ {
1706
+ "epoch": 0.92,
1707
+ "learning_rate": 9.161688112232836e-08,
1708
+ "logits/chosen": -0.46787554025650024,
1709
+ "logits/rejected": -0.45076021552085876,
1710
+ "logps/chosen": -282.44903564453125,
1711
+ "logps/rejected": -328.4521789550781,
1712
+ "loss": 0.2076,
1713
+ "rewards/accuracies": 0.35624998807907104,
1714
+ "rewards/chosen": 0.012471871450543404,
1715
+ "rewards/margins": 0.01295357383787632,
1716
+ "rewards/rejected": -0.00048170099034905434,
1717
+ "step": 1210
1718
+ },
1719
+ {
1720
+ "epoch": 0.93,
1721
+ "learning_rate": 7.46189337174788e-08,
1722
+ "logits/chosen": -0.4574583172798157,
1723
+ "logits/rejected": -0.48262113332748413,
1724
+ "logps/chosen": -287.9381408691406,
1725
+ "logps/rejected": -307.37017822265625,
1726
+ "loss": 0.2135,
1727
+ "rewards/accuracies": 0.33125001192092896,
1728
+ "rewards/chosen": 0.01407331507652998,
1729
+ "rewards/margins": 0.009314117953181267,
1730
+ "rewards/rejected": 0.004759198985993862,
1731
+ "step": 1220
1732
+ },
1733
+ {
1734
+ "epoch": 0.94,
1735
+ "learning_rate": 5.934004245518793e-08,
1736
+ "logits/chosen": -0.5343510508537292,
1737
+ "logits/rejected": -0.42381030321121216,
1738
+ "logps/chosen": -301.0657958984375,
1739
+ "logps/rejected": -314.10260009765625,
1740
+ "loss": 0.2144,
1741
+ "rewards/accuracies": 0.33125001192092896,
1742
+ "rewards/chosen": 0.011122044175863266,
1743
+ "rewards/margins": 0.00779320765286684,
1744
+ "rewards/rejected": 0.0033288367558270693,
1745
+ "step": 1230
1746
+ },
1747
+ {
1748
+ "epoch": 0.94,
1749
+ "learning_rate": 4.579103667367385e-08,
1750
+ "logits/chosen": -0.5401466488838196,
1751
+ "logits/rejected": -0.4905480444431305,
1752
+ "logps/chosen": -302.44818115234375,
1753
+ "logps/rejected": -316.00640869140625,
1754
+ "loss": 0.2009,
1755
+ "rewards/accuracies": 0.3125,
1756
+ "rewards/chosen": 0.009371964260935783,
1757
+ "rewards/margins": 0.005004683043807745,
1758
+ "rewards/rejected": 0.004367280751466751,
1759
+ "step": 1240
1760
+ },
1761
+ {
1762
+ "epoch": 0.95,
1763
+ "learning_rate": 3.398151960681162e-08,
1764
+ "logits/chosen": -0.4864376485347748,
1765
+ "logits/rejected": -0.474257230758667,
1766
+ "logps/chosen": -272.18450927734375,
1767
+ "logps/rejected": -293.8534851074219,
1768
+ "loss": 0.212,
1769
+ "rewards/accuracies": 0.29374998807907104,
1770
+ "rewards/chosen": 0.010555078275501728,
1771
+ "rewards/margins": 0.007184301503002644,
1772
+ "rewards/rejected": 0.003370775608345866,
1773
+ "step": 1250
1774
+ },
1775
+ {
1776
+ "epoch": 0.96,
1777
+ "learning_rate": 2.3919861577572924e-08,
1778
+ "logits/chosen": -0.48694509267807007,
1779
+ "logits/rejected": -0.4881502091884613,
1780
+ "logps/chosen": -285.6047668457031,
1781
+ "logps/rejected": -344.7032470703125,
1782
+ "loss": 0.207,
1783
+ "rewards/accuracies": 0.3375000059604645,
1784
+ "rewards/chosen": 0.010286719538271427,
1785
+ "rewards/margins": 0.011380873620510101,
1786
+ "rewards/rejected": -0.0010941538494080305,
1787
+ "step": 1260
1788
+ },
1789
+ {
1790
+ "epoch": 0.97,
1791
+ "learning_rate": 1.5613194065327854e-08,
1792
+ "logits/chosen": -0.4984816610813141,
1793
+ "logits/rejected": -0.4886694550514221,
1794
+ "logps/chosen": -341.03369140625,
1795
+ "logps/rejected": -376.7138977050781,
1796
+ "loss": 0.1955,
1797
+ "rewards/accuracies": 0.38749998807907104,
1798
+ "rewards/chosen": 0.008053203113377094,
1799
+ "rewards/margins": 0.012355836108326912,
1800
+ "rewards/rejected": -0.004302632994949818,
1801
+ "step": 1270
1802
+ },
1803
+ {
1804
+ "epoch": 0.98,
1805
+ "learning_rate": 9.067404651211808e-09,
1806
+ "logits/chosen": -0.47365856170654297,
1807
+ "logits/rejected": -0.463469922542572,
1808
+ "logps/chosen": -296.1658630371094,
1809
+ "logps/rejected": -324.7825012207031,
1810
+ "loss": 0.2016,
1811
+ "rewards/accuracies": 0.35624998807907104,
1812
+ "rewards/chosen": 0.00670830812305212,
1813
+ "rewards/margins": 0.009644337929785252,
1814
+ "rewards/rejected": -0.0029360298067331314,
1815
+ "step": 1280
1816
+ },
1817
+ {
1818
+ "epoch": 0.98,
1819
+ "learning_rate": 4.287132845137709e-09,
1820
+ "logits/chosen": -0.4601454734802246,
1821
+ "logits/rejected": -0.44960451126098633,
1822
+ "logps/chosen": -305.7701110839844,
1823
+ "logps/rejected": -349.0350341796875,
1824
+ "loss": 0.1976,
1825
+ "rewards/accuracies": 0.3499999940395355,
1826
+ "rewards/chosen": 0.007751205004751682,
1827
+ "rewards/margins": 0.011196482926607132,
1828
+ "rewards/rejected": -0.0034452793188393116,
1829
+ "step": 1290
1830
+ },
1831
+ {
1832
+ "epoch": 0.99,
1833
+ "learning_rate": 1.2757667974155896e-09,
1834
+ "logits/chosen": -0.4950791001319885,
1835
+ "logits/rejected": -0.47172340750694275,
1836
+ "logps/chosen": -312.46453857421875,
1837
+ "logps/rejected": -315.9293212890625,
1838
+ "loss": 0.2221,
1839
+ "rewards/accuracies": 0.2874999940395355,
1840
+ "rewards/chosen": 0.00022761887521483004,
1841
+ "rewards/margins": -0.004159080795943737,
1842
+ "rewards/rejected": 0.0043866997584700584,
1843
+ "step": 1300
1844
+ },
1845
+ {
1846
+ "epoch": 1.0,
1847
+ "learning_rate": 3.544089730633804e-11,
1848
+ "logits/chosen": -0.48454493284225464,
1849
+ "logits/rejected": -0.46828117966651917,
1850
+ "logps/chosen": -290.0190734863281,
1851
+ "logps/rejected": -295.50665283203125,
1852
+ "loss": 0.1916,
1853
+ "rewards/accuracies": 0.29374998807907104,
1854
+ "rewards/chosen": 0.0072511411271989346,
1855
+ "rewards/margins": 0.00542284082621336,
1856
+ "rewards/rejected": 0.001828300068154931,
1857
+ "step": 1310
1858
+ },
1859
+ {
1860
+ "epoch": 1.0,
1861
+ "step": 1312,
1862
+ "total_flos": 0.0,
1863
+ "train_loss": 0.20755663267677513,
1864
+ "train_runtime": 11765.5627,
1865
+ "train_samples_per_second": 1.785,
1866
+ "train_steps_per_second": 0.112
1867
+ }
1868
+ ],
1869
+ "logging_steps": 10,
1870
+ "max_steps": 1312,
1871
+ "num_input_tokens_seen": 0,
1872
+ "num_train_epochs": 1,
1873
+ "save_steps": 100,
1874
+ "total_flos": 0.0,
1875
+ "train_batch_size": 4,
1876
+ "trial_name": null,
1877
+ "trial_params": null
1878
+ }
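
The tail of `trainer_state.json` above records the per-step DPO metrics (loss, learning rate, logits/logps statistics, reward margins). As a minimal sketch only, assuming the file has been downloaded locally from this repository, the log could be inspected like this; the filename/path is the reader's assumption, and `log_history` is the standard Transformers trainer-state field that holds these records:

```python
import json

# Minimal sketch (not part of this repo): summarize the per-step DPO metrics
# recorded in trainer_state.json. Assumes the file was downloaded locally.
with open("trainer_state.json") as f:
    state = json.load(f)

# Each entry in log_history mirrors the records shown above
# (loss, learning_rate, rewards/*, step, ...).
for entry in state["log_history"]:
    if "loss" in entry:  # the final summary entry reports train_loss instead
        margin = entry.get("rewards/margins", float("nan"))
        print(f"step {entry['step']:>4}  loss {entry['loss']:.4f}  margin {margin:.5f}")
```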