nlile committed on
Commit
b87cd70
1 Parent(s): 331f758

Model save

README.md ADDED
@@ -0,0 +1,83 @@
+ ---
+ base_model: stabilityai/StableBeluga-13B
+ tags:
+ - generated_from_trainer
+ model-index:
+ - name: PE-13b-lora
+   results: []
+ ---
+
+ <!-- This model card has been generated automatically according to the information the Trainer had access to. You
+ should probably proofread and complete it, then remove this comment. -->
+
+ # PE-13b-lora
+
+ This model is a fine-tuned version of [stabilityai/StableBeluga-13B](https://huggingface.co/stabilityai/StableBeluga-13B) on an unknown dataset.
+ It achieves the following results on the evaluation set:
+ - Loss: 0.5704
+ - Rewards/chosen: 0.1581
+ - Rewards/rejected: -0.1076
+ - Rewards/accuracies: 0.9472
+ - Rewards/margins: 0.2658
+ - Logps/rejected: -73.1769
+ - Logps/chosen: -90.4042
+ - Logits/rejected: -1.7758
+ - Logits/chosen: -2.0462
+
+ ## Model description
+
+ More information needed
+
+ ## Intended uses & limitations
+
+ More information needed
+
+ ## Training and evaluation data
+
+ More information needed
+
+ ## Training procedure
+
+ ### Training hyperparameters
+
+ The following hyperparameters were used during training:
+ - learning_rate: 5e-07
+ - train_batch_size: 6
+ - eval_batch_size: 4
+ - seed: 42
+ - distributed_type: multi-GPU
+ - num_devices: 8
+ - gradient_accumulation_steps: 2
+ - total_train_batch_size: 96
+ - total_eval_batch_size: 32
+ - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
+ - lr_scheduler_type: linear
+ - lr_scheduler_warmup_ratio: 0.1
+ - num_epochs: 1
+
+ ### Training results
+
+ | Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
+ |:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
+ | 0.693 | 0.07 | 100 | 0.6933 | -0.0008 | -0.0005 | 0.4889 | -0.0003 | -72.1053 | -91.9932 | -1.7861 | -2.0525 |
+ | 0.69 | 0.14 | 200 | 0.6901 | 0.0031 | -0.0015 | 0.5611 | 0.0046 | -72.1153 | -91.9544 | -1.7859 | -2.0524 |
+ | 0.6842 | 0.21 | 300 | 0.6832 | 0.0139 | -0.0056 | 0.6917 | 0.0195 | -72.1567 | -91.8467 | -1.7847 | -2.0513 |
+ | 0.672 | 0.27 | 400 | 0.6718 | 0.0281 | -0.0131 | 0.8250 | 0.0412 | -72.2312 | -91.7049 | -1.7836 | -2.0504 |
+ | 0.6563 | 0.34 | 500 | 0.6575 | 0.0498 | -0.0211 | 0.8861 | 0.0709 | -72.3116 | -91.4876 | -1.7821 | -2.0494 |
+ | 0.6437 | 0.41 | 600 | 0.6416 | 0.0705 | -0.0340 | 0.9111 | 0.1044 | -72.4401 | -91.2810 | -1.7807 | -2.0486 |
+ | 0.6261 | 0.48 | 700 | 0.6277 | 0.0885 | -0.0435 | 0.9250 | 0.1320 | -72.5355 | -91.1010 | -1.7796 | -2.0478 |
+ | 0.6117 | 0.55 | 800 | 0.6127 | 0.1097 | -0.0567 | 0.9222 | 0.1664 | -72.6675 | -90.8891 | -1.7786 | -2.0474 |
+ | 0.6002 | 0.62 | 900 | 0.6019 | 0.1226 | -0.0683 | 0.9278 | 0.1909 | -72.7836 | -90.7598 | -1.7777 | -2.0468 |
+ | 0.5912 | 0.68 | 1000 | 0.5912 | 0.1344 | -0.0805 | 0.9333 | 0.2148 | -72.9053 | -90.6422 | -1.7770 | -2.0466 |
+ | 0.5822 | 0.75 | 1100 | 0.5822 | 0.1441 | -0.0909 | 0.9472 | 0.2350 | -73.0092 | -90.5447 | -1.7763 | -2.0462 |
+ | 0.5789 | 0.82 | 1200 | 0.5759 | 0.1517 | -0.0992 | 0.9333 | 0.2509 | -73.0923 | -90.4690 | -1.7763 | -2.0465 |
+ | 0.5689 | 0.89 | 1300 | 0.5722 | 0.1555 | -0.1033 | 0.9500 | 0.2588 | -73.1332 | -90.4305 | -1.7762 | -2.0465 |
+ | 0.5694 | 0.96 | 1400 | 0.5702 | 0.1579 | -0.1066 | 0.9417 | 0.2644 | -73.1662 | -90.4070 | -1.7761 | -2.0465 |
+
+
+ ### Framework versions
+
+ - Transformers 4.35.0
+ - Pytorch 2.1.1+cu121
+ - Datasets 2.14.6
+ - Tokenizers 0.14.1
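The `total_train_batch_size: 96` and `total_eval_batch_size: 32` entries in the hyperparameter list above follow arithmetically from the per-device batch sizes, the device count, and gradient accumulation. A minimal sketch of that relationship (plain Python, no training code; the variable names mirror the card's hyperparameter names):

```python
# Effective (total) train batch size, as reported in the model card:
# per-device batch size x number of GPUs x gradient accumulation steps.
train_batch_size = 6              # per device
num_devices = 8
gradient_accumulation_steps = 2

total_train_batch_size = train_batch_size * num_devices * gradient_accumulation_steps
print(total_train_batch_size)  # 96

# Evaluation applies no gradient accumulation:
eval_batch_size = 4               # per device
total_eval_batch_size = eval_batch_size * num_devices
print(total_eval_batch_size)  # 32
```

With `num_epochs: 1` over 140201 training samples, this effective batch size of 96 also explains the roughly 1460 optimizer steps seen in the training logs.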
adapter_model.safetensors CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
- oid sha256:1b357d4b7b080ad3d7237a7afd4161656148afebdbc4a78d442ac3a5be6eff1f
+ oid sha256:3f9a1af2ba9753d0ecbe37035f3e479048c69509166da286ba458b139a0ec623
 size 209758976
all_results.json ADDED
@@ -0,0 +1,21 @@
+ {
+ "epoch": 1.0,
+ "eval_logits/chosen": -2.0462207794189453,
+ "eval_logits/rejected": -1.7758138179779053,
+ "eval_logps/chosen": -90.40424346923828,
+ "eval_logps/rejected": -73.17691802978516,
+ "eval_loss": 0.5703843832015991,
+ "eval_rewards/accuracies": 0.9472222328186035,
+ "eval_rewards/chosen": 0.1581471860408783,
+ "eval_rewards/margins": 0.2657936215400696,
+ "eval_rewards/rejected": -0.10764642059803009,
+ "eval_runtime": 118.208,
+ "eval_samples": 2862,
+ "eval_samples_per_second": 24.212,
+ "eval_steps_per_second": 0.761,
+ "train_loss": 0.6280729855576607,
+ "train_runtime": 9689.6427,
+ "train_samples": 140201,
+ "train_samples_per_second": 14.469,
+ "train_steps_per_second": 0.151
+ }
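The throughput figures in `all_results.json` above are derived quantities: samples per second is the sample count divided by the runtime. A small sanity-check sketch (the dict below copies the relevant fields from the file; nothing else is assumed):

```python
# Fields copied from all_results.json above.
all_results = {
    "train_samples": 140201,
    "train_runtime": 9689.6427,          # seconds
    "train_samples_per_second": 14.469,
    "eval_samples": 2862,
    "eval_runtime": 118.208,             # seconds
    "eval_samples_per_second": 24.212,
}

# Recompute throughput and compare with the reported (rounded) values.
train_tps = all_results["train_samples"] / all_results["train_runtime"]
eval_tps = all_results["eval_samples"] / all_results["eval_runtime"]

assert abs(train_tps - all_results["train_samples_per_second"]) < 0.001
assert abs(eval_tps - all_results["eval_samples_per_second"]) < 0.001
print(train_tps, eval_tps)
```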
eval_results.json ADDED
@@ -0,0 +1,16 @@
+ {
+ "epoch": 1.0,
+ "eval_logits/chosen": -2.0462207794189453,
+ "eval_logits/rejected": -1.7758138179779053,
+ "eval_logps/chosen": -90.40424346923828,
+ "eval_logps/rejected": -73.17691802978516,
+ "eval_loss": 0.5703843832015991,
+ "eval_rewards/accuracies": 0.9472222328186035,
+ "eval_rewards/chosen": 0.1581471860408783,
+ "eval_rewards/margins": 0.2657936215400696,
+ "eval_rewards/rejected": -0.10764642059803009,
+ "eval_runtime": 118.208,
+ "eval_samples": 2862,
+ "eval_samples_per_second": 24.212,
+ "eval_steps_per_second": 0.761
+ }
train_results.json ADDED
@@ -0,0 +1,8 @@
+ {
+ "epoch": 1.0,
+ "train_loss": 0.6280729855576607,
+ "train_runtime": 9689.6427,
+ "train_samples": 140201,
+ "train_samples_per_second": 14.469,
+ "train_steps_per_second": 0.151
+ }
trainer_state.json ADDED
@@ -0,0 +1,2310 @@
+ {
+ "best_metric": null,
+ "best_model_checkpoint": null,
+ "epoch": 0.999657651489216,
+ "eval_steps": 100,
+ "global_step": 1460,
+ "is_hyper_param_search": false,
+ "is_local_process_zero": true,
+ "is_world_process_zero": true,
+ "log_history": [
+ {
+ "epoch": 0.0,
+ "learning_rate": 3.424657534246575e-09,
+ "logits/chosen": -1.796067237854004,
+ "logits/rejected": -1.6250377893447876,
+ "logps/chosen": -84.08734130859375,
+ "logps/rejected": -66.90229797363281,
+ "loss": 0.6931,
+ "rewards/accuracies": 0.0,
+ "rewards/chosen": 0.0,
+ "rewards/margins": 0.0,
+ "rewards/rejected": 0.0,
+ "step": 1
+ },
+ {
+ "epoch": 0.01,
+ "learning_rate": 3.424657534246575e-08,
+ "logits/chosen": -1.7872660160064697,
+ "logits/rejected": -1.5217690467834473,
+ "logps/chosen": -91.57577514648438,
+ "logps/rejected": -78.510498046875,
+ "loss": 0.6935,
+ "rewards/accuracies": 0.4166666567325592,
+ "rewards/chosen": 0.00252585974521935,
+ "rewards/margins": 0.003410403151065111,
+ "rewards/rejected": -0.0008845434640534222,
+ "step": 10
+ },
+ {
+ "epoch": 0.01,
+ "learning_rate": 6.84931506849315e-08,
+ "logits/chosen": -1.9197826385498047,
+ "logits/rejected": -1.6265367269515991,
+ "logps/chosen": -96.18563079833984,
+ "logps/rejected": -73.0329818725586,
+ "loss": 0.6941,
+ "rewards/accuracies": 0.4333333373069763,
+ "rewards/chosen": -0.0032726190984249115,
+ "rewards/margins": -0.004598576575517654,
+ "rewards/rejected": 0.0013259568950161338,
+ "step": 20
+ },
+ {
+ "epoch": 0.02,
+ "learning_rate": 1.0273972602739725e-07,
+ "logits/chosen": -1.8969062566757202,
+ "logits/rejected": -1.5750768184661865,
+ "logps/chosen": -96.1051254272461,
+ "logps/rejected": -74.69762420654297,
+ "loss": 0.6929,
+ "rewards/accuracies": 0.5416666865348816,
+ "rewards/chosen": -0.0011650085216388106,
+ "rewards/margins": 0.001948213903233409,
+ "rewards/rejected": -0.0031132223084568977,
+ "step": 30
+ },
+ {
+ "epoch": 0.03,
+ "learning_rate": 1.36986301369863e-07,
+ "logits/chosen": -1.9271103143692017,
+ "logits/rejected": -1.6277456283569336,
+ "logps/chosen": -96.0181884765625,
+ "logps/rejected": -79.47406005859375,
+ "loss": 0.6927,
+ "rewards/accuracies": 0.5166666507720947,
+ "rewards/chosen": -0.001698245876468718,
+ "rewards/margins": -0.0010797118302434683,
+ "rewards/rejected": -0.0006185341626405716,
+ "step": 40
+ },
+ {
+ "epoch": 0.03,
+ "learning_rate": 1.7123287671232875e-07,
+ "logits/chosen": -1.9380648136138916,
+ "logits/rejected": -1.6861069202423096,
+ "logps/chosen": -93.2973861694336,
+ "logps/rejected": -76.30517578125,
+ "loss": 0.6927,
+ "rewards/accuracies": 0.34166663885116577,
+ "rewards/chosen": -0.0030570379458367825,
+ "rewards/margins": -0.005480821710079908,
+ "rewards/rejected": 0.002423783764243126,
+ "step": 50
+ },
+ {
+ "epoch": 0.04,
+ "learning_rate": 2.054794520547945e-07,
+ "logits/chosen": -1.832024335861206,
+ "logits/rejected": -1.511380910873413,
+ "logps/chosen": -98.84480285644531,
+ "logps/rejected": -75.86248779296875,
+ "loss": 0.6931,
+ "rewards/accuracies": 0.4500000476837158,
+ "rewards/chosen": -0.003249814035370946,
+ "rewards/margins": -0.00364103470928967,
+ "rewards/rejected": 0.00039122122689150274,
+ "step": 60
+ },
+ {
+ "epoch": 0.05,
+ "learning_rate": 2.3972602739726023e-07,
+ "logits/chosen": -1.9178974628448486,
+ "logits/rejected": -1.6177875995635986,
+ "logps/chosen": -92.13301086425781,
+ "logps/rejected": -75.75408172607422,
+ "loss": 0.6927,
+ "rewards/accuracies": 0.5333333611488342,
+ "rewards/chosen": 0.0013952379813417792,
+ "rewards/margins": 0.0021971219684928656,
+ "rewards/rejected": -0.0008018844528123736,
+ "step": 70
+ },
+ {
+ "epoch": 0.05,
+ "learning_rate": 2.73972602739726e-07,
+ "logits/chosen": -1.938677191734314,
+ "logits/rejected": -1.6598854064941406,
+ "logps/chosen": -88.68348693847656,
+ "logps/rejected": -74.39127349853516,
+ "loss": 0.6929,
+ "rewards/accuracies": 0.5333333015441895,
+ "rewards/chosen": 0.0035478367935866117,
+ "rewards/margins": 0.003417719155550003,
+ "rewards/rejected": 0.00013011766714043915,
+ "step": 80
+ },
+ {
+ "epoch": 0.06,
+ "learning_rate": 3.0821917808219176e-07,
+ "logits/chosen": -1.832851767539978,
+ "logits/rejected": -1.5398364067077637,
+ "logps/chosen": -92.82958984375,
+ "logps/rejected": -73.15336608886719,
+ "loss": 0.6927,
+ "rewards/accuracies": 0.5833333730697632,
+ "rewards/chosen": 0.00075482705142349,
+ "rewards/margins": 0.0024536100681871176,
+ "rewards/rejected": -0.001698783366009593,
+ "step": 90
+ },
+ {
+ "epoch": 0.07,
+ "learning_rate": 3.424657534246575e-07,
+ "logits/chosen": -1.8959871530532837,
+ "logits/rejected": -1.6301053762435913,
+ "logps/chosen": -92.27674865722656,
+ "logps/rejected": -74.60256958007812,
+ "loss": 0.693,
+ "rewards/accuracies": 0.44999998807907104,
+ "rewards/chosen": 0.0006887881900183856,
+ "rewards/margins": 0.0005900462856516242,
+ "rewards/rejected": 9.874170791590586e-05,
+ "step": 100
+ },
+ {
+ "epoch": 0.07,
+ "eval_logits/chosen": -2.0524938106536865,
+ "eval_logits/rejected": -1.7860654592514038,
+ "eval_logps/chosen": -91.99323272705078,
+ "eval_logps/rejected": -72.10526275634766,
+ "eval_loss": 0.693277895450592,
+ "eval_rewards/accuracies": 0.4888888895511627,
+ "eval_rewards/chosen": -0.0007517762714996934,
+ "eval_rewards/margins": -0.00027055441751144826,
+ "eval_rewards/rejected": -0.00048122191219590604,
+ "eval_runtime": 117.2952,
+ "eval_samples_per_second": 24.4,
+ "eval_steps_per_second": 0.767,
+ "step": 100
+ },
+ {
+ "epoch": 0.08,
+ "learning_rate": 3.767123287671233e-07,
+ "logits/chosen": -1.947257399559021,
+ "logits/rejected": -1.6791489124298096,
+ "logps/chosen": -93.37996673583984,
+ "logps/rejected": -72.86904907226562,
+ "loss": 0.6936,
+ "rewards/accuracies": 0.5083333849906921,
+ "rewards/chosen": -0.002063233172520995,
+ "rewards/margins": 0.0007919662748463452,
+ "rewards/rejected": -0.0028551991563290358,
+ "step": 110
+ },
+ {
+ "epoch": 0.08,
+ "learning_rate": 4.10958904109589e-07,
+ "logits/chosen": -1.8895361423492432,
+ "logits/rejected": -1.587282419204712,
+ "logps/chosen": -93.67366027832031,
+ "logps/rejected": -72.37762451171875,
+ "loss": 0.6921,
+ "rewards/accuracies": 0.491666704416275,
+ "rewards/chosen": 0.0028340499848127365,
+ "rewards/margins": 0.0019404724007472396,
+ "rewards/rejected": 0.000893576827365905,
+ "step": 120
+ },
+ {
+ "epoch": 0.09,
+ "learning_rate": 4.4520547945205477e-07,
+ "logits/chosen": -1.9517349004745483,
+ "logits/rejected": -1.663013219833374,
+ "logps/chosen": -84.09037780761719,
+ "logps/rejected": -72.67787170410156,
+ "loss": 0.6928,
+ "rewards/accuracies": 0.46666663885116577,
+ "rewards/chosen": 0.0004304441681597382,
+ "rewards/margins": -0.0008509824983775616,
+ "rewards/rejected": 0.001281426870264113,
+ "step": 130
+ },
+ {
+ "epoch": 0.1,
+ "learning_rate": 4.794520547945205e-07,
+ "logits/chosen": -1.947576880455017,
+ "logits/rejected": -1.722400426864624,
+ "logps/chosen": -89.95055389404297,
+ "logps/rejected": -76.27583312988281,
+ "loss": 0.6919,
+ "rewards/accuracies": 0.625,
+ "rewards/chosen": 0.004270606208592653,
+ "rewards/margins": 0.009159665554761887,
+ "rewards/rejected": -0.004889058880507946,
+ "step": 140
+ },
+ {
+ "epoch": 0.1,
+ "learning_rate": 4.984779299847793e-07,
+ "logits/chosen": -1.9852988719940186,
+ "logits/rejected": -1.6872488260269165,
+ "logps/chosen": -90.7852783203125,
+ "logps/rejected": -73.9708251953125,
+ "loss": 0.6915,
+ "rewards/accuracies": 0.5249999761581421,
+ "rewards/chosen": 0.0020091754850000143,
+ "rewards/margins": 0.0034632813185453415,
+ "rewards/rejected": -0.0014541053678840399,
+ "step": 150
+ },
+ {
+ "epoch": 0.11,
+ "learning_rate": 4.946727549467275e-07,
+ "logits/chosen": -1.9156850576400757,
+ "logits/rejected": -1.5840113162994385,
+ "logps/chosen": -97.8337173461914,
+ "logps/rejected": -73.20053100585938,
+ "loss": 0.6921,
+ "rewards/accuracies": 0.5249999761581421,
+ "rewards/chosen": 0.0025704619474709034,
+ "rewards/margins": 0.002448607701808214,
+ "rewards/rejected": 0.00012185415107524022,
+ "step": 160
+ },
+ {
+ "epoch": 0.12,
+ "learning_rate": 4.908675799086758e-07,
+ "logits/chosen": -1.803034782409668,
+ "logits/rejected": -1.4945720434188843,
+ "logps/chosen": -96.1871337890625,
+ "logps/rejected": -72.80181121826172,
+ "loss": 0.6921,
+ "rewards/accuracies": 0.5333333611488342,
+ "rewards/chosen": 0.003732017008587718,
+ "rewards/margins": 0.003292496781796217,
+ "rewards/rejected": 0.0004395198484417051,
+ "step": 170
+ },
+ {
+ "epoch": 0.12,
+ "learning_rate": 4.87062404870624e-07,
+ "logits/chosen": -1.8866764307022095,
+ "logits/rejected": -1.5327541828155518,
+ "logps/chosen": -97.74296569824219,
+ "logps/rejected": -76.4982681274414,
+ "loss": 0.6909,
+ "rewards/accuracies": 0.5916666984558105,
+ "rewards/chosen": 0.004220059607177973,
+ "rewards/margins": 0.006730073597282171,
+ "rewards/rejected": -0.002510013757273555,
+ "step": 180
+ },
+ {
+ "epoch": 0.13,
+ "learning_rate": 4.832572298325722e-07,
+ "logits/chosen": -1.945041298866272,
+ "logits/rejected": -1.6189870834350586,
+ "logps/chosen": -94.37440490722656,
+ "logps/rejected": -73.1822738647461,
+ "loss": 0.6904,
+ "rewards/accuracies": 0.574999988079071,
+ "rewards/chosen": 0.004208459984511137,
+ "rewards/margins": 0.005117190536111593,
+ "rewards/rejected": -0.0009087308426387608,
+ "step": 190
+ },
+ {
+ "epoch": 0.14,
+ "learning_rate": 4.794520547945205e-07,
+ "logits/chosen": -1.9501270055770874,
+ "logits/rejected": -1.6498991250991821,
+ "logps/chosen": -93.06495666503906,
+ "logps/rejected": -72.0920181274414,
+ "loss": 0.69,
+ "rewards/accuracies": 0.6000000238418579,
+ "rewards/chosen": 0.0078277587890625,
+ "rewards/margins": 0.010002164170145988,
+ "rewards/rejected": -0.0021744065452367067,
+ "step": 200
+ },
+ {
+ "epoch": 0.14,
+ "eval_logits/chosen": -2.052419900894165,
+ "eval_logits/rejected": -1.7859160900115967,
+ "eval_logps/chosen": -91.95441436767578,
+ "eval_logps/rejected": -72.11531066894531,
+ "eval_loss": 0.6901015043258667,
+ "eval_rewards/accuracies": 0.5611110925674438,
+ "eval_rewards/chosen": 0.0031298992689698935,
+ "eval_rewards/margins": 0.004615597892552614,
+ "eval_rewards/rejected": -0.0014856986235827208,
+ "eval_runtime": 117.9228,
+ "eval_samples_per_second": 24.27,
+ "eval_steps_per_second": 0.763,
+ "step": 200
+ },
+ {
+ "epoch": 0.14,
+ "learning_rate": 4.756468797564688e-07,
+ "logits/chosen": -1.9345300197601318,
+ "logits/rejected": -1.6839030981063843,
+ "logps/chosen": -92.93646240234375,
+ "logps/rejected": -75.16416931152344,
+ "loss": 0.6903,
+ "rewards/accuracies": 0.6166666746139526,
+ "rewards/chosen": 0.0068414295092225075,
+ "rewards/margins": 0.005408720578998327,
+ "rewards/rejected": 0.001432707766070962,
+ "step": 210
+ },
+ {
+ "epoch": 0.15,
+ "learning_rate": 4.71841704718417e-07,
+ "logits/chosen": -1.9018728733062744,
+ "logits/rejected": -1.599656343460083,
+ "logps/chosen": -94.06185913085938,
+ "logps/rejected": -72.92243957519531,
+ "loss": 0.6888,
+ "rewards/accuracies": 0.6000000238418579,
+ "rewards/chosen": 0.00332554685883224,
+ "rewards/margins": 0.007436770014464855,
+ "rewards/rejected": -0.004111223388463259,
+ "step": 220
+ },
+ {
+ "epoch": 0.16,
+ "learning_rate": 4.680365296803653e-07,
+ "logits/chosen": -1.879314661026001,
+ "logits/rejected": -1.5864207744598389,
+ "logps/chosen": -93.1510238647461,
+ "logps/rejected": -74.74366760253906,
+ "loss": 0.6898,
+ "rewards/accuracies": 0.6000000238418579,
+ "rewards/chosen": 0.007647272199392319,
+ "rewards/margins": 0.00648108497262001,
+ "rewards/rejected": 0.0011661878088489175,
+ "step": 230
+ },
+ {
+ "epoch": 0.16,
+ "learning_rate": 4.642313546423135e-07,
+ "logits/chosen": -1.975441336631775,
+ "logits/rejected": -1.7306878566741943,
+ "logps/chosen": -86.38069915771484,
+ "logps/rejected": -75.68878173828125,
+ "loss": 0.6884,
+ "rewards/accuracies": 0.5749999284744263,
+ "rewards/chosen": 0.007093862630426884,
+ "rewards/margins": 0.008020764216780663,
+ "rewards/rejected": -0.0009269017027691007,
+ "step": 240
+ },
+ {
+ "epoch": 0.17,
+ "learning_rate": 4.604261796042618e-07,
+ "logits/chosen": -1.843636155128479,
+ "logits/rejected": -1.5651946067810059,
+ "logps/chosen": -92.92024230957031,
+ "logps/rejected": -75.78328704833984,
+ "loss": 0.6864,
+ "rewards/accuracies": 0.6333333849906921,
+ "rewards/chosen": 0.007120449095964432,
+ "rewards/margins": 0.011716886423528194,
+ "rewards/rejected": -0.004596438258886337,
+ "step": 250
+ },
+ {
+ "epoch": 0.18,
+ "learning_rate": 4.5662100456621e-07,
+ "logits/chosen": -1.8188960552215576,
+ "logits/rejected": -1.5440260171890259,
+ "logps/chosen": -91.01820373535156,
+ "logps/rejected": -72.49656677246094,
+ "loss": 0.687,
+ "rewards/accuracies": 0.6166666150093079,
+ "rewards/chosen": 0.007922597229480743,
+ "rewards/margins": 0.010534586384892464,
+ "rewards/rejected": -0.002611987991258502,
+ "step": 260
+ },
+ {
+ "epoch": 0.18,
+ "learning_rate": 4.528158295281583e-07,
+ "logits/chosen": -1.931947946548462,
+ "logits/rejected": -1.6909589767456055,
+ "logps/chosen": -88.5571517944336,
+ "logps/rejected": -72.00991821289062,
+ "loss": 0.6866,
+ "rewards/accuracies": 0.5999999642372131,
+ "rewards/chosen": 0.005753463599830866,
+ "rewards/margins": 0.010802066884934902,
+ "rewards/rejected": -0.005048603750765324,
+ "step": 270
+ },
+ {
+ "epoch": 0.19,
+ "learning_rate": 4.490106544901065e-07,
+ "logits/chosen": -1.9274402856826782,
+ "logits/rejected": -1.6220242977142334,
+ "logps/chosen": -91.9419174194336,
+ "logps/rejected": -77.3694076538086,
+ "loss": 0.6851,
+ "rewards/accuracies": 0.7083333134651184,
+ "rewards/chosen": 0.014447471126914024,
+ "rewards/margins": 0.018269026651978493,
+ "rewards/rejected": -0.0038215541280806065,
+ "step": 280
+ },
+ {
+ "epoch": 0.2,
+ "learning_rate": 4.4520547945205477e-07,
+ "logits/chosen": -1.8854271173477173,
+ "logits/rejected": -1.6241188049316406,
+ "logps/chosen": -92.77326965332031,
+ "logps/rejected": -77.81119537353516,
+ "loss": 0.6838,
+ "rewards/accuracies": 0.64166659116745,
+ "rewards/chosen": 0.013706192374229431,
+ "rewards/margins": 0.015467122197151184,
+ "rewards/rejected": -0.0017609309870749712,
+ "step": 290
+ },
+ {
+ "epoch": 0.21,
+ "learning_rate": 4.41400304414003e-07,
+ "logits/chosen": -1.8017442226409912,
+ "logits/rejected": -1.5209267139434814,
+ "logps/chosen": -90.86246490478516,
+ "logps/rejected": -72.70795440673828,
+ "loss": 0.6842,
+ "rewards/accuracies": 0.699999988079071,
+ "rewards/chosen": 0.013066952116787434,
+ "rewards/margins": 0.018115142360329628,
+ "rewards/rejected": -0.00504818931221962,
+ "step": 300
+ },
+ {
+ "epoch": 0.21,
+ "eval_logits/chosen": -2.051283121109009,
+ "eval_logits/rejected": -1.7846735715866089,
+ "eval_logps/chosen": -91.84673309326172,
+ "eval_logps/rejected": -72.15672302246094,
+ "eval_loss": 0.6832027435302734,
+ "eval_rewards/accuracies": 0.6916666626930237,
+ "eval_rewards/chosen": 0.013898174278438091,
+ "eval_rewards/margins": 0.01952529139816761,
+ "eval_rewards/rejected": -0.005627114325761795,
+ "eval_runtime": 116.9802,
+ "eval_samples_per_second": 24.466,
+ "eval_steps_per_second": 0.769,
+ "step": 300
+ },
+ {
+ "epoch": 0.21,
+ "learning_rate": 4.375951293759513e-07,
+ "logits/chosen": -1.8816429376602173,
+ "logits/rejected": -1.5877922773361206,
+ "logps/chosen": -92.76510620117188,
+ "logps/rejected": -77.3439712524414,
+ "loss": 0.6827,
+ "rewards/accuracies": 0.7166666388511658,
+ "rewards/chosen": 0.016739103943109512,
+ "rewards/margins": 0.020335419103503227,
+ "rewards/rejected": -0.003596315626055002,
+ "step": 310
+ },
+ {
+ "epoch": 0.22,
+ "learning_rate": 4.337899543378995e-07,
+ "logits/chosen": -1.8938591480255127,
+ "logits/rejected": -1.6317039728164673,
+ "logps/chosen": -93.17677307128906,
+ "logps/rejected": -74.86593627929688,
+ "loss": 0.6818,
+ "rewards/accuracies": 0.699999988079071,
+ "rewards/chosen": 0.01637251302599907,
+ "rewards/margins": 0.019245153293013573,
+ "rewards/rejected": -0.0028726388700306416,
+ "step": 320
+ },
+ {
+ "epoch": 0.23,
+ "learning_rate": 4.2998477929984777e-07,
+ "logits/chosen": -1.8730627298355103,
+ "logits/rejected": -1.6184980869293213,
+ "logps/chosen": -91.06995391845703,
+ "logps/rejected": -72.5287857055664,
+ "loss": 0.6808,
+ "rewards/accuracies": 0.7666667103767395,
+ "rewards/chosen": 0.01837759278714657,
+ "rewards/margins": 0.02310130000114441,
+ "rewards/rejected": -0.004723704420030117,
+ "step": 330
+ },
+ {
+ "epoch": 0.23,
+ "learning_rate": 4.26179604261796e-07,
+ "logits/chosen": -1.8828544616699219,
+ "logits/rejected": -1.576556921005249,
+ "logps/chosen": -92.8334732055664,
+ "logps/rejected": -74.53410339355469,
+ "loss": 0.6788,
+ "rewards/accuracies": 0.8166666030883789,
+ "rewards/chosen": 0.023710301145911217,
+ "rewards/margins": 0.03184016793966293,
+ "rewards/rejected": -0.008129866793751717,
+ "step": 340
+ },
+ {
+ "epoch": 0.24,
+ "learning_rate": 4.223744292237443e-07,
+ "logits/chosen": -1.9196786880493164,
+ "logits/rejected": -1.625704050064087,
+ "logps/chosen": -90.74723815917969,
+ "logps/rejected": -76.68948364257812,
+ "loss": 0.6777,
+ "rewards/accuracies": 0.8083333969116211,
+ "rewards/chosen": 0.026256781071424484,
+ "rewards/margins": 0.03269972652196884,
+ "rewards/rejected": -0.00644295010715723,
+ "step": 350
+ },
+ {
+ "epoch": 0.25,
+ "learning_rate": 4.185692541856925e-07,
+ "logits/chosen": -1.981180191040039,
+ "logits/rejected": -1.6961545944213867,
+ "logps/chosen": -93.02009582519531,
+ "logps/rejected": -76.1031723022461,
+ "loss": 0.6762,
+ "rewards/accuracies": 0.8500000238418579,
+ "rewards/chosen": 0.03220153599977493,
+ "rewards/margins": 0.03597740828990936,
+ "rewards/rejected": -0.0037758699618279934,
+ "step": 360
+ },
+ {
+ "epoch": 0.25,
+ "learning_rate": 4.1476407914764077e-07,
+ "logits/chosen": -1.8603204488754272,
+ "logits/rejected": -1.5979934930801392,
+ "logps/chosen": -94.58716583251953,
+ "logps/rejected": -75.44288635253906,
+ "loss": 0.6759,
+ "rewards/accuracies": 0.841666579246521,
+ "rewards/chosen": 0.021503183990716934,
+ "rewards/margins": 0.03375691920518875,
+ "rewards/rejected": -0.012253734283149242,
+ "step": 370
+ },
+ {
+ "epoch": 0.26,
+ "learning_rate": 4.10958904109589e-07,
+ "logits/chosen": -1.9121630191802979,
+ "logits/rejected": -1.613445520401001,
+ "logps/chosen": -93.34215545654297,
+ "logps/rejected": -74.89527893066406,
+ "loss": 0.6739,
+ "rewards/accuracies": 0.824999988079071,
+ "rewards/chosen": 0.02675667405128479,
+ "rewards/margins": 0.03688011318445206,
+ "rewards/rejected": -0.010123440995812416,
+ "step": 380
+ },
+ {
+ "epoch": 0.27,
+ "learning_rate": 4.071537290715373e-07,
+ "logits/chosen": -1.9191780090332031,
+ "logits/rejected": -1.532434105873108,
+ "logps/chosen": -94.9774169921875,
+ "logps/rejected": -73.49574279785156,
+ "loss": 0.6744,
+ "rewards/accuracies": 0.8666666150093079,
+ "rewards/chosen": 0.03616553544998169,
+ "rewards/margins": 0.048189498484134674,
+ "rewards/rejected": -0.012023964896798134,
+ "step": 390
+ },
+ {
+ "epoch": 0.27,
+ "learning_rate": 4.033485540334855e-07,
+ "logits/chosen": -1.8837106227874756,
+ "logits/rejected": -1.6179090738296509,
+ "logps/chosen": -93.18563079833984,
+ "logps/rejected": -75.48190307617188,
+ "loss": 0.672,
+ "rewards/accuracies": 0.8333333730697632,
+ "rewards/chosen": 0.03295673802495003,
+ "rewards/margins": 0.044657547026872635,
+ "rewards/rejected": -0.011700802482664585,
+ "step": 400
+ },
+ {
+ "epoch": 0.27,
+ "eval_logits/chosen": -2.0504438877105713,
+ "eval_logits/rejected": -1.78355073928833,
+ "eval_logps/chosen": -91.70490264892578,
+ "eval_logps/rejected": -72.23121643066406,
+ "eval_loss": 0.671801745891571,
+ "eval_rewards/accuracies": 0.824999988079071,
+ "eval_rewards/chosen": 0.028079798445105553,
+ "eval_rewards/margins": 0.041155941784381866,
+ "eval_rewards/rejected": -0.013076143339276314,
+ "eval_runtime": 122.6508,
+ "eval_samples_per_second": 23.335,
+ "eval_steps_per_second": 0.734,
+ "step": 400
+ },
+ {
+ "epoch": 0.28,
+ "learning_rate": 3.9954337899543377e-07,
+ "logits/chosen": -1.9980239868164062,
+ "logits/rejected": -1.710172414779663,
+ "logps/chosen": -90.13986206054688,
+ "logps/rejected": -74.13956451416016,
+ "loss": 0.6726,
+ "rewards/accuracies": 0.7833333015441895,
+ "rewards/chosen": 0.026057172566652298,
+ "rewards/margins": 0.03604119271039963,
+ "rewards/rejected": -0.00998402014374733,
+ "step": 410
+ },
663
+ {
664
+ "epoch": 0.29,
665
+ "learning_rate": 3.95738203957382e-07,
666
+ "logits/chosen": -1.9292805194854736,
667
+ "logits/rejected": -1.6599111557006836,
668
+ "logps/chosen": -89.1054458618164,
669
+ "logps/rejected": -76.2321548461914,
670
+ "loss": 0.67,
671
+ "rewards/accuracies": 0.875,
672
+ "rewards/chosen": 0.03600526601076126,
673
+ "rewards/margins": 0.04854360967874527,
674
+ "rewards/rejected": -0.012538343667984009,
675
+ "step": 420
676
+ },
677
+ {
678
+ "epoch": 0.29,
679
+ "learning_rate": 3.919330289193303e-07,
680
+ "logits/chosen": -1.8889102935791016,
681
+ "logits/rejected": -1.5878620147705078,
682
+ "logps/chosen": -94.20903015136719,
683
+ "logps/rejected": -75.64437103271484,
684
+ "loss": 0.6677,
685
+ "rewards/accuracies": 0.8833333253860474,
686
+ "rewards/chosen": 0.04072072356939316,
687
+ "rewards/margins": 0.05353847146034241,
688
+ "rewards/rejected": -0.012817745096981525,
689
+ "step": 430
690
+ },
691
+ {
692
+ "epoch": 0.3,
693
+ "learning_rate": 3.881278538812785e-07,
694
+ "logits/chosen": -1.889439344406128,
695
+ "logits/rejected": -1.5668418407440186,
696
+ "logps/chosen": -92.68305206298828,
697
+ "logps/rejected": -71.53545379638672,
698
+ "loss": 0.6653,
699
+ "rewards/accuracies": 0.9083333015441895,
700
+ "rewards/chosen": 0.042834434658288956,
701
+ "rewards/margins": 0.05957134813070297,
702
+ "rewards/rejected": -0.016736917197704315,
703
+ "step": 440
704
+ },
705
+ {
706
+ "epoch": 0.31,
707
+ "learning_rate": 3.8432267884322677e-07,
708
+ "logits/chosen": -1.8060448169708252,
709
+ "logits/rejected": -1.5509716272354126,
710
+ "logps/chosen": -88.83539581298828,
711
+ "logps/rejected": -71.70087432861328,
712
+ "loss": 0.6645,
713
+ "rewards/accuracies": 0.8416666984558105,
714
+ "rewards/chosen": 0.04349132627248764,
715
+ "rewards/margins": 0.05466403439640999,
716
+ "rewards/rejected": -0.011172705329954624,
717
+ "step": 450
718
+ },
719
+ {
720
+ "epoch": 0.31,
721
+ "learning_rate": 3.80517503805175e-07,
722
+ "logits/chosen": -1.8319886922836304,
723
+ "logits/rejected": -1.5673277378082275,
724
+ "logps/chosen": -88.6297378540039,
725
+ "logps/rejected": -74.78864288330078,
726
+ "loss": 0.6631,
727
+ "rewards/accuracies": 0.9083333015441895,
728
+ "rewards/chosen": 0.044696420431137085,
729
+ "rewards/margins": 0.0604400709271431,
730
+ "rewards/rejected": -0.015743646770715714,
731
+ "step": 460
732
+ },
733
+ {
734
+ "epoch": 0.32,
735
+ "learning_rate": 3.767123287671233e-07,
736
+ "logits/chosen": -1.9542433023452759,
737
+ "logits/rejected": -1.641196608543396,
738
+ "logps/chosen": -99.97395324707031,
739
+ "logps/rejected": -78.014404296875,
740
+ "loss": 0.6626,
741
+ "rewards/accuracies": 0.9166666865348816,
742
+ "rewards/chosen": 0.047287195920944214,
743
+ "rewards/margins": 0.06728993356227875,
744
+ "rewards/rejected": -0.020002741366624832,
745
+ "step": 470
746
+ },
747
+ {
748
+ "epoch": 0.33,
749
+ "learning_rate": 3.729071537290715e-07,
750
+ "logits/chosen": -1.916868805885315,
751
+ "logits/rejected": -1.6233441829681396,
752
+ "logps/chosen": -97.3001937866211,
753
+ "logps/rejected": -76.58578491210938,
754
+ "loss": 0.6619,
755
+ "rewards/accuracies": 0.9083333015441895,
756
+ "rewards/chosen": 0.05124374479055405,
757
+ "rewards/margins": 0.07088983058929443,
758
+ "rewards/rejected": -0.019646091386675835,
759
+ "step": 480
760
+ },
761
+ {
762
+ "epoch": 0.34,
763
+ "learning_rate": 3.6910197869101977e-07,
764
+ "logits/chosen": -1.9672422409057617,
765
+ "logits/rejected": -1.700531244277954,
766
+ "logps/chosen": -89.29413604736328,
767
+ "logps/rejected": -72.65511322021484,
768
+ "loss": 0.66,
769
+ "rewards/accuracies": 0.9166666269302368,
770
+ "rewards/chosen": 0.050498634576797485,
771
+ "rewards/margins": 0.06749961525201797,
772
+ "rewards/rejected": -0.01700098067522049,
773
+ "step": 490
774
+ },
775
+ {
776
+ "epoch": 0.34,
777
+ "learning_rate": 3.65296803652968e-07,
778
+ "logits/chosen": -1.833929419517517,
779
+ "logits/rejected": -1.5389481782913208,
780
+ "logps/chosen": -85.8098373413086,
781
+ "logps/rejected": -72.5254135131836,
782
+ "loss": 0.6563,
783
+ "rewards/accuracies": 0.8999999761581421,
784
+ "rewards/chosen": 0.05660303682088852,
785
+ "rewards/margins": 0.0805855318903923,
786
+ "rewards/rejected": -0.023982489481568336,
787
+ "step": 500
788
+ },
789
+ {
790
+ "epoch": 0.34,
791
+ "eval_logits/chosen": -2.0494003295898438,
792
+ "eval_logits/rejected": -1.7820732593536377,
793
+ "eval_logps/chosen": -91.48755645751953,
794
+ "eval_logps/rejected": -72.31159973144531,
795
+ "eval_loss": 0.6574758887290955,
796
+ "eval_rewards/accuracies": 0.8861111402511597,
797
+ "eval_rewards/chosen": 0.049815867096185684,
798
+ "eval_rewards/margins": 0.0709303691983223,
799
+ "eval_rewards/rejected": -0.021114489063620567,
800
+ "eval_runtime": 126.0719,
801
+ "eval_samples_per_second": 22.701,
802
+ "eval_steps_per_second": 0.714,
803
+ "step": 500
804
+ },
805
+ {
806
+ "epoch": 0.35,
807
+ "learning_rate": 3.614916286149163e-07,
808
+ "logits/chosen": -1.913190484046936,
809
+ "logits/rejected": -1.6543632745742798,
810
+ "logps/chosen": -89.34779357910156,
811
+ "logps/rejected": -71.7219009399414,
812
+ "loss": 0.6567,
813
+ "rewards/accuracies": 0.8916667103767395,
814
+ "rewards/chosen": 0.05136920139193535,
815
+ "rewards/margins": 0.07187855988740921,
816
+ "rewards/rejected": -0.02050935849547386,
817
+ "step": 510
818
+ },
819
+ {
820
+ "epoch": 0.36,
821
+ "learning_rate": 3.576864535768645e-07,
822
+ "logits/chosen": -1.8918750286102295,
823
+ "logits/rejected": -1.621498465538025,
824
+ "logps/chosen": -92.99454498291016,
825
+ "logps/rejected": -72.34910583496094,
826
+ "loss": 0.6569,
827
+ "rewards/accuracies": 0.841666579246521,
828
+ "rewards/chosen": 0.05106909200549126,
829
+ "rewards/margins": 0.0683208703994751,
830
+ "rewards/rejected": -0.01725177839398384,
831
+ "step": 520
832
+ },
833
+ {
834
+ "epoch": 0.36,
835
+ "learning_rate": 3.5388127853881277e-07,
836
+ "logits/chosen": -1.8910505771636963,
837
+ "logits/rejected": -1.6094516515731812,
838
+ "logps/chosen": -92.9715347290039,
839
+ "logps/rejected": -75.84952545166016,
840
+ "loss": 0.6559,
841
+ "rewards/accuracies": 0.8500000238418579,
842
+ "rewards/chosen": 0.05335830897092819,
843
+ "rewards/margins": 0.07476408034563065,
844
+ "rewards/rejected": -0.021405773237347603,
845
+ "step": 530
846
+ },
847
+ {
848
+ "epoch": 0.37,
849
+ "learning_rate": 3.50076103500761e-07,
850
+ "logits/chosen": -1.8273894786834717,
851
+ "logits/rejected": -1.5480557680130005,
852
+ "logps/chosen": -93.09770202636719,
853
+ "logps/rejected": -77.72835540771484,
854
+ "loss": 0.6531,
855
+ "rewards/accuracies": 0.8833333253860474,
856
+ "rewards/chosen": 0.06002092361450195,
857
+ "rewards/margins": 0.08498416841030121,
858
+ "rewards/rejected": -0.02496323548257351,
859
+ "step": 540
860
+ },
861
+ {
862
+ "epoch": 0.38,
863
+ "learning_rate": 3.462709284627093e-07,
864
+ "logits/chosen": -1.7952553033828735,
865
+ "logits/rejected": -1.5454210042953491,
866
+ "logps/chosen": -92.24395751953125,
867
+ "logps/rejected": -74.6566390991211,
868
+ "loss": 0.6514,
869
+ "rewards/accuracies": 0.8666666746139526,
870
+ "rewards/chosen": 0.06232692673802376,
871
+ "rewards/margins": 0.08038316667079926,
872
+ "rewards/rejected": -0.018056249246001244,
873
+ "step": 550
874
+ },
875
+ {
876
+ "epoch": 0.38,
877
+ "learning_rate": 3.424657534246575e-07,
878
+ "logits/chosen": -1.8805965185165405,
879
+ "logits/rejected": -1.586458444595337,
880
+ "logps/chosen": -96.27812194824219,
881
+ "logps/rejected": -75.61478424072266,
882
+ "loss": 0.6487,
883
+ "rewards/accuracies": 0.9083333015441895,
884
+ "rewards/chosen": 0.06480798870325089,
885
+ "rewards/margins": 0.09341150522232056,
886
+ "rewards/rejected": -0.02860351838171482,
887
+ "step": 560
888
+ },
889
+ {
890
+ "epoch": 0.39,
891
+ "learning_rate": 3.3866057838660576e-07,
892
+ "logits/chosen": -1.8029924631118774,
893
+ "logits/rejected": -1.5792688131332397,
894
+ "logps/chosen": -87.8210678100586,
895
+ "logps/rejected": -74.4839096069336,
896
+ "loss": 0.651,
897
+ "rewards/accuracies": 0.8749998807907104,
898
+ "rewards/chosen": 0.05512385442852974,
899
+ "rewards/margins": 0.07892224937677383,
900
+ "rewards/rejected": -0.023798387497663498,
901
+ "step": 570
902
+ },
903
+ {
904
+ "epoch": 0.4,
905
+ "learning_rate": 3.34855403348554e-07,
906
+ "logits/chosen": -1.913487434387207,
907
+ "logits/rejected": -1.612151861190796,
908
+ "logps/chosen": -94.51315307617188,
909
+ "logps/rejected": -74.9591293334961,
910
+ "loss": 0.6465,
911
+ "rewards/accuracies": 0.9583333134651184,
912
+ "rewards/chosen": 0.0840396136045456,
913
+ "rewards/margins": 0.107123002409935,
914
+ "rewards/rejected": -0.023083383217453957,
915
+ "step": 580
916
+ },
917
+ {
918
+ "epoch": 0.4,
919
+ "learning_rate": 3.310502283105023e-07,
920
+ "logits/chosen": -1.9018150568008423,
921
+ "logits/rejected": -1.6232668161392212,
922
+ "logps/chosen": -93.73323822021484,
923
+ "logps/rejected": -75.38792419433594,
924
+ "loss": 0.6435,
925
+ "rewards/accuracies": 0.8666666150093079,
926
+ "rewards/chosen": 0.07307516038417816,
927
+ "rewards/margins": 0.10251389443874359,
928
+ "rewards/rejected": -0.029438745230436325,
929
+ "step": 590
930
+ },
931
+ {
932
+ "epoch": 0.41,
933
+ "learning_rate": 3.272450532724505e-07,
934
+ "logits/chosen": -1.8873398303985596,
935
+ "logits/rejected": -1.6442911624908447,
936
+ "logps/chosen": -90.5989761352539,
937
+ "logps/rejected": -77.07279968261719,
938
+ "loss": 0.6437,
939
+ "rewards/accuracies": 0.966666579246521,
940
+ "rewards/chosen": 0.08184785395860672,
941
+ "rewards/margins": 0.11672016233205795,
942
+ "rewards/rejected": -0.034872300922870636,
943
+ "step": 600
944
+ },
945
+ {
946
+ "epoch": 0.41,
947
+ "eval_logits/chosen": -2.048579692840576,
948
+ "eval_logits/rejected": -1.780737280845642,
949
+ "eval_logps/chosen": -91.28104400634766,
950
+ "eval_logps/rejected": -72.44007110595703,
951
+ "eval_loss": 0.6415905356407166,
952
+ "eval_rewards/accuracies": 0.9111111164093018,
953
+ "eval_rewards/chosen": 0.07046664506196976,
954
+ "eval_rewards/margins": 0.10442798584699631,
955
+ "eval_rewards/rejected": -0.03396133333444595,
956
+ "eval_runtime": 117.9826,
957
+ "eval_samples_per_second": 24.258,
958
+ "eval_steps_per_second": 0.763,
959
+ "step": 600
960
+ },
961
+ {
962
+ "epoch": 0.42,
963
+ "learning_rate": 3.2343987823439876e-07,
964
+ "logits/chosen": -1.9880199432373047,
965
+ "logits/rejected": -1.676412582397461,
966
+ "logps/chosen": -91.00142669677734,
967
+ "logps/rejected": -73.52288818359375,
968
+ "loss": 0.6401,
969
+ "rewards/accuracies": 0.9333332777023315,
970
+ "rewards/chosen": 0.07217723876237869,
971
+ "rewards/margins": 0.1089334487915039,
972
+ "rewards/rejected": -0.036756210029125214,
973
+ "step": 610
974
+ },
975
+ {
976
+ "epoch": 0.42,
977
+ "learning_rate": 3.19634703196347e-07,
978
+ "logits/chosen": -1.8773486614227295,
979
+ "logits/rejected": -1.573994755744934,
980
+ "logps/chosen": -89.79063415527344,
981
+ "logps/rejected": -73.29415130615234,
982
+ "loss": 0.6391,
983
+ "rewards/accuracies": 0.9750000238418579,
984
+ "rewards/chosen": 0.06451607495546341,
985
+ "rewards/margins": 0.09732060134410858,
986
+ "rewards/rejected": -0.032804541289806366,
987
+ "step": 620
988
+ },
989
+ {
990
+ "epoch": 0.43,
991
+ "learning_rate": 3.158295281582953e-07,
992
+ "logits/chosen": -1.8495439291000366,
993
+ "logits/rejected": -1.5227216482162476,
994
+ "logps/chosen": -93.12200164794922,
995
+ "logps/rejected": -74.12937927246094,
996
+ "loss": 0.6371,
997
+ "rewards/accuracies": 0.9333333969116211,
998
+ "rewards/chosen": 0.08449102938175201,
999
+ "rewards/margins": 0.11922381818294525,
1000
+ "rewards/rejected": -0.034732796251773834,
1001
+ "step": 630
1002
+ },
1003
+ {
1004
+ "epoch": 0.44,
1005
+ "learning_rate": 3.120243531202435e-07,
1006
+ "logits/chosen": -1.93155837059021,
1007
+ "logits/rejected": -1.6498839855194092,
1008
+ "logps/chosen": -94.86695861816406,
1009
+ "logps/rejected": -76.8067626953125,
1010
+ "loss": 0.6354,
1011
+ "rewards/accuracies": 0.8833333253860474,
1012
+ "rewards/chosen": 0.08870759606361389,
1013
+ "rewards/margins": 0.12238001823425293,
1014
+ "rewards/rejected": -0.03367242217063904,
1015
+ "step": 640
1016
+ },
1017
+ {
1018
+ "epoch": 0.45,
1019
+ "learning_rate": 3.0821917808219176e-07,
1020
+ "logits/chosen": -1.8748143911361694,
1021
+ "logits/rejected": -1.6098239421844482,
1022
+ "logps/chosen": -91.36283874511719,
1023
+ "logps/rejected": -73.6336669921875,
1024
+ "loss": 0.6328,
1025
+ "rewards/accuracies": 0.9083333015441895,
1026
+ "rewards/chosen": 0.0779472142457962,
1027
+ "rewards/margins": 0.1164277195930481,
1028
+ "rewards/rejected": -0.038480497896671295,
1029
+ "step": 650
1030
+ },
1031
+ {
1032
+ "epoch": 0.45,
1033
+ "learning_rate": 3.0441400304414e-07,
1034
+ "logits/chosen": -1.8293393850326538,
1035
+ "logits/rejected": -1.4808156490325928,
1036
+ "logps/chosen": -94.16930389404297,
1037
+ "logps/rejected": -72.66043853759766,
1038
+ "loss": 0.6341,
1039
+ "rewards/accuracies": 0.9166666269302368,
1040
+ "rewards/chosen": 0.09546177834272385,
1041
+ "rewards/margins": 0.1290528029203415,
1042
+ "rewards/rejected": -0.033591024577617645,
1043
+ "step": 660
1044
+ },
1045
+ {
1046
+ "epoch": 0.46,
1047
+ "learning_rate": 3.006088280060883e-07,
1048
+ "logits/chosen": -1.8831846714019775,
1049
+ "logits/rejected": -1.577532172203064,
1050
+ "logps/chosen": -95.71855926513672,
1051
+ "logps/rejected": -74.64360046386719,
1052
+ "loss": 0.6305,
1053
+ "rewards/accuracies": 0.925000011920929,
1054
+ "rewards/chosen": 0.09222020953893661,
1055
+ "rewards/margins": 0.1298314929008484,
1056
+ "rewards/rejected": -0.037611283361911774,
1057
+ "step": 670
1058
+ },
1059
+ {
1060
+ "epoch": 0.47,
1061
+ "learning_rate": 2.968036529680365e-07,
1062
+ "logits/chosen": -1.884874701499939,
1063
+ "logits/rejected": -1.56434965133667,
1064
+ "logps/chosen": -91.71931457519531,
1065
+ "logps/rejected": -74.07929992675781,
1066
+ "loss": 0.6298,
1067
+ "rewards/accuracies": 0.949999988079071,
1068
+ "rewards/chosen": 0.09030507504940033,
1069
+ "rewards/margins": 0.13949953019618988,
1070
+ "rewards/rejected": -0.04919447377324104,
1071
+ "step": 680
1072
+ },
1073
+ {
1074
+ "epoch": 0.47,
1075
+ "learning_rate": 2.9299847792998476e-07,
1076
+ "logits/chosen": -1.939512848854065,
1077
+ "logits/rejected": -1.6418523788452148,
1078
+ "logps/chosen": -88.81291198730469,
1079
+ "logps/rejected": -71.72727966308594,
1080
+ "loss": 0.6255,
1081
+ "rewards/accuracies": 0.925000011920929,
1082
+ "rewards/chosen": 0.1102442592382431,
1083
+ "rewards/margins": 0.14661459624767303,
1084
+ "rewards/rejected": -0.036370351910591125,
1085
+ "step": 690
1086
+ },
1087
+ {
1088
+ "epoch": 0.48,
1089
+ "learning_rate": 2.89193302891933e-07,
1090
+ "logits/chosen": -1.8088592290878296,
1091
+ "logits/rejected": -1.554652452468872,
1092
+ "logps/chosen": -88.36043548583984,
1093
+ "logps/rejected": -74.16236877441406,
1094
+ "loss": 0.6261,
1095
+ "rewards/accuracies": 0.9416666030883789,
1096
+ "rewards/chosen": 0.09904900938272476,
1097
+ "rewards/margins": 0.13188037276268005,
1098
+ "rewards/rejected": -0.032831382006406784,
1099
+ "step": 700
1100
+ },
1101
+ {
1102
+ "epoch": 0.48,
1103
+ "eval_logits/chosen": -2.0478439331054688,
1104
+ "eval_logits/rejected": -1.7795759439468384,
1105
+ "eval_logps/chosen": -91.1009750366211,
1106
+ "eval_logps/rejected": -72.53546905517578,
1107
+ "eval_loss": 0.6277271509170532,
1108
+ "eval_rewards/accuracies": 0.925000011920929,
1109
+ "eval_rewards/chosen": 0.08847405016422272,
1110
+ "eval_rewards/margins": 0.13197554647922516,
1111
+ "eval_rewards/rejected": -0.04350150376558304,
1112
+ "eval_runtime": 117.9463,
1113
+ "eval_samples_per_second": 24.265,
1114
+ "eval_steps_per_second": 0.763,
1115
+ "step": 700
1116
+ },
1117
+ {
1118
+ "epoch": 0.49,
1119
+ "learning_rate": 2.853881278538813e-07,
1120
+ "logits/chosen": -1.8632869720458984,
1121
+ "logits/rejected": -1.598181962966919,
1122
+ "logps/chosen": -89.87626647949219,
1123
+ "logps/rejected": -74.79205322265625,
1124
+ "loss": 0.6285,
1125
+ "rewards/accuracies": 0.949999988079071,
1126
+ "rewards/chosen": 0.09145402163267136,
1127
+ "rewards/margins": 0.13780589401721954,
1128
+ "rewards/rejected": -0.04635186120867729,
1129
+ "step": 710
1130
+ },
1131
+ {
1132
+ "epoch": 0.49,
1133
+ "learning_rate": 2.815829528158295e-07,
1134
+ "logits/chosen": -1.8058583736419678,
1135
+ "logits/rejected": -1.5232179164886475,
1136
+ "logps/chosen": -87.31756591796875,
1137
+ "logps/rejected": -72.79600524902344,
1138
+ "loss": 0.6247,
1139
+ "rewards/accuracies": 0.949999988079071,
1140
+ "rewards/chosen": 0.09649817645549774,
1141
+ "rewards/margins": 0.1376573145389557,
1142
+ "rewards/rejected": -0.041159145534038544,
1143
+ "step": 720
1144
+ },
1145
+ {
1146
+ "epoch": 0.5,
1147
+ "learning_rate": 2.7777777777777776e-07,
1148
+ "logits/chosen": -1.8448421955108643,
1149
+ "logits/rejected": -1.6317142248153687,
1150
+ "logps/chosen": -87.55412292480469,
1151
+ "logps/rejected": -75.91432189941406,
1152
+ "loss": 0.6245,
1153
+ "rewards/accuracies": 0.925000011920929,
1154
+ "rewards/chosen": 0.10163182020187378,
1155
+ "rewards/margins": 0.14317932724952698,
1156
+ "rewards/rejected": -0.0415474958717823,
1157
+ "step": 730
1158
+ },
1159
+ {
1160
+ "epoch": 0.51,
1161
+ "learning_rate": 2.73972602739726e-07,
1162
+ "logits/chosen": -1.800663709640503,
1163
+ "logits/rejected": -1.5165865421295166,
1164
+ "logps/chosen": -92.58036041259766,
1165
+ "logps/rejected": -73.22102355957031,
1166
+ "loss": 0.6216,
1167
+ "rewards/accuracies": 0.908333420753479,
1168
+ "rewards/chosen": 0.09606580436229706,
1169
+ "rewards/margins": 0.14485225081443787,
1170
+ "rewards/rejected": -0.04878643900156021,
1171
+ "step": 740
1172
+ },
1173
+ {
1174
+ "epoch": 0.51,
1175
+ "learning_rate": 2.701674277016743e-07,
1176
+ "logits/chosen": -1.884385347366333,
1177
+ "logits/rejected": -1.6005618572235107,
1178
+ "logps/chosen": -94.69025421142578,
1179
+ "logps/rejected": -74.93373107910156,
1180
+ "loss": 0.6225,
1181
+ "rewards/accuracies": 0.949999988079071,
1182
+ "rewards/chosen": 0.11928985267877579,
1183
+ "rewards/margins": 0.16091035306453705,
1184
+ "rewards/rejected": -0.04162050783634186,
1185
+ "step": 750
1186
+ },
1187
+ {
1188
+ "epoch": 0.52,
1189
+ "learning_rate": 2.663622526636225e-07,
1190
+ "logits/chosen": -1.8744373321533203,
1191
+ "logits/rejected": -1.5992999076843262,
1192
+ "logps/chosen": -93.0952377319336,
1193
+ "logps/rejected": -76.86521911621094,
1194
+ "loss": 0.6187,
1195
+ "rewards/accuracies": 0.925000011920929,
1196
+ "rewards/chosen": 0.12268207967281342,
1197
+ "rewards/margins": 0.15957853198051453,
1198
+ "rewards/rejected": -0.036896444857120514,
1199
+ "step": 760
1200
+ },
1201
+ {
1202
+ "epoch": 0.53,
1203
+ "learning_rate": 2.6255707762557076e-07,
1204
+ "logits/chosen": -1.9210466146469116,
1205
+ "logits/rejected": -1.6140285730361938,
1206
+ "logps/chosen": -94.59263610839844,
1207
+ "logps/rejected": -75.74002838134766,
1208
+ "loss": 0.6174,
1209
+ "rewards/accuracies": 0.908333420753479,
1210
+ "rewards/chosen": 0.09473783522844315,
1211
+ "rewards/margins": 0.14541365206241608,
1212
+ "rewards/rejected": -0.050675809383392334,
1213
+ "step": 770
1214
+ },
1215
+ {
1216
+ "epoch": 0.53,
1217
+ "learning_rate": 2.58751902587519e-07,
1218
+ "logits/chosen": -1.9041986465454102,
1219
+ "logits/rejected": -1.643701195716858,
1220
+ "logps/chosen": -92.38731384277344,
1221
+ "logps/rejected": -76.4394302368164,
1222
+ "loss": 0.6147,
1223
+ "rewards/accuracies": 0.925000011920929,
1224
+ "rewards/chosen": 0.09374178946018219,
1225
+ "rewards/margins": 0.14521454274654388,
1226
+ "rewards/rejected": -0.05147276073694229,
1227
+ "step": 780
1228
+ },
1229
+ {
1230
+ "epoch": 0.54,
1231
+ "learning_rate": 2.549467275494673e-07,
1232
+ "logits/chosen": -1.8866844177246094,
1233
+ "logits/rejected": -1.5806655883789062,
1234
+ "logps/chosen": -89.06953430175781,
1235
+ "logps/rejected": -75.25362396240234,
1236
+ "loss": 0.6127,
1237
+ "rewards/accuracies": 0.949999988079071,
1238
+ "rewards/chosen": 0.1129077672958374,
1239
+ "rewards/margins": 0.1759023219347,
1240
+ "rewards/rejected": -0.06299454718828201,
1241
+ "step": 790
1242
+ },
1243
+ {
1244
+ "epoch": 0.55,
1245
+ "learning_rate": 2.511415525114155e-07,
1246
+ "logits/chosen": -1.9603450298309326,
1247
+ "logits/rejected": -1.658216118812561,
1248
+ "logps/chosen": -91.58163452148438,
1249
+ "logps/rejected": -74.63312530517578,
1250
+ "loss": 0.6117,
1251
+ "rewards/accuracies": 0.949999988079071,
1252
+ "rewards/chosen": 0.1321893036365509,
1253
+ "rewards/margins": 0.18407562375068665,
1254
+ "rewards/rejected": -0.051886312663555145,
1255
+ "step": 800
1256
+ },
1257
+ {
1258
+ "epoch": 0.55,
1259
+ "eval_logits/chosen": -2.047370672225952,
1260
+ "eval_logits/rejected": -1.7785520553588867,
1261
+ "eval_logps/chosen": -90.88909149169922,
1262
+ "eval_logps/rejected": -72.66747283935547,
1263
+ "eval_loss": 0.6126503348350525,
1264
+ "eval_rewards/accuracies": 0.9222221970558167,
1265
+ "eval_rewards/chosen": 0.10966197401285172,
1266
+ "eval_rewards/margins": 0.16636402904987335,
1267
+ "eval_rewards/rejected": -0.05670207738876343,
1268
+ "eval_runtime": 117.9996,
1269
+ "eval_samples_per_second": 24.254,
1270
+ "eval_steps_per_second": 0.763,
1271
+ "step": 800
1272
+ },
1273
+ {
1274
+ "epoch": 0.55,
1275
+ "learning_rate": 2.4733637747336376e-07,
1276
+ "logits/chosen": -1.9528766870498657,
1277
+ "logits/rejected": -1.594366192817688,
1278
+ "logps/chosen": -97.9262466430664,
1279
+ "logps/rejected": -73.47969055175781,
1280
+ "loss": 0.6145,
1281
+ "rewards/accuracies": 0.9083333015441895,
1282
+ "rewards/chosen": 0.12691722810268402,
1283
+ "rewards/margins": 0.1831960827112198,
1284
+ "rewards/rejected": -0.056278862059116364,
1285
+ "step": 810
1286
+ },
1287
+ {
1288
+ "epoch": 0.56,
1289
+ "learning_rate": 2.43531202435312e-07,
1290
+ "logits/chosen": -1.9507348537445068,
1291
+ "logits/rejected": -1.6255791187286377,
1292
+ "logps/chosen": -93.60541534423828,
1293
+ "logps/rejected": -74.82536315917969,
1294
+ "loss": 0.6094,
1295
+ "rewards/accuracies": 0.9166666269302368,
1296
+ "rewards/chosen": 0.11346453428268433,
1297
+ "rewards/margins": 0.17843127250671387,
1298
+ "rewards/rejected": -0.06496672332286835,
1299
+ "step": 820
1300
+ },
1301
+ {
1302
+ "epoch": 0.57,
1303
+ "learning_rate": 2.3972602739726023e-07,
1304
+ "logits/chosen": -1.7865006923675537,
1305
+ "logits/rejected": -1.5033773183822632,
1306
+ "logps/chosen": -89.35474395751953,
1307
+ "logps/rejected": -70.51929473876953,
1308
+ "loss": 0.6081,
1309
+ "rewards/accuracies": 0.9333332777023315,
1310
+ "rewards/chosen": 0.1159995049238205,
1311
+ "rewards/margins": 0.1914626657962799,
1312
+ "rewards/rejected": -0.07546313852071762,
1313
+ "step": 830
1314
+ },
1315
+ {
1316
+ "epoch": 0.58,
1317
+ "learning_rate": 2.359208523592085e-07,
1318
+ "logits/chosen": -1.9572938680648804,
1319
+ "logits/rejected": -1.6931850910186768,
1320
+ "logps/chosen": -90.90480041503906,
1321
+ "logps/rejected": -77.85359191894531,
1322
+ "loss": 0.6088,
1323
+ "rewards/accuracies": 0.9083333015441895,
1324
+ "rewards/chosen": 0.10320776700973511,
1325
+ "rewards/margins": 0.16814057528972626,
1326
+ "rewards/rejected": -0.06493280827999115,
1327
+ "step": 840
1328
+ },
1329
+ {
1330
+ "epoch": 0.58,
1331
+ "learning_rate": 2.3211567732115676e-07,
1332
+ "logits/chosen": -2.0167603492736816,
1333
+ "logits/rejected": -1.715150237083435,
1334
+ "logps/chosen": -90.37574768066406,
1335
+ "logps/rejected": -77.0097427368164,
1336
+ "loss": 0.6039,
1337
+ "rewards/accuracies": 0.9333332777023315,
1338
+ "rewards/chosen": 0.1519448608160019,
1339
+ "rewards/margins": 0.21030330657958984,
1340
+ "rewards/rejected": -0.058358438313007355,
1341
+ "step": 850
1342
+ },
1343
+ {
1344
+ "epoch": 0.59,
1345
+ "learning_rate": 2.28310502283105e-07,
1346
+ "logits/chosen": -1.8651390075683594,
1347
+ "logits/rejected": -1.5632587671279907,
1348
+ "logps/chosen": -94.64379119873047,
1349
+ "logps/rejected": -75.53591918945312,
1350
+ "loss": 0.6078,
1351
+ "rewards/accuracies": 0.9416667222976685,
1352
+ "rewards/chosen": 0.13561630249023438,
1353
+ "rewards/margins": 0.19979830086231232,
1354
+ "rewards/rejected": -0.06418199837207794,
1355
+ "step": 860
1356
+ },
1357
+ {
1358
+ "epoch": 0.6,
1359
+ "learning_rate": 2.2450532724505325e-07,
1360
+ "logits/chosen": -1.9391330480575562,
1361
+ "logits/rejected": -1.663351058959961,
1362
+ "logps/chosen": -86.51952362060547,
1363
+ "logps/rejected": -73.44007873535156,
1364
+ "loss": 0.6057,
1365
+ "rewards/accuracies": 0.9333332777023315,
1366
+ "rewards/chosen": 0.1343904286623001,
1367
+ "rewards/margins": 0.1892845332622528,
1368
+ "rewards/rejected": -0.054894138127565384,
1369
+ "step": 870
1370
+ },
1371
+ {
1372
+ "epoch": 0.6,
1373
+ "learning_rate": 2.207001522070015e-07,
1374
+ "logits/chosen": -1.842283010482788,
1375
+ "logits/rejected": -1.5807665586471558,
1376
+ "logps/chosen": -90.40550231933594,
1377
+ "logps/rejected": -77.0767593383789,
1378
+ "loss": 0.6045,
1379
+ "rewards/accuracies": 0.9249998927116394,
1380
+ "rewards/chosen": 0.13356170058250427,
1381
+ "rewards/margins": 0.1978417932987213,
1382
+ "rewards/rejected": -0.06428009271621704,
1383
+ "step": 880
1384
+ },
1385
+ {
1386
+ "epoch": 0.61,
1387
+ "learning_rate": 2.1689497716894975e-07,
1388
+ "logits/chosen": -1.913578748703003,
1389
+ "logits/rejected": -1.648938536643982,
1390
+ "logps/chosen": -87.49850463867188,
1391
+ "logps/rejected": -74.8587646484375,
1392
+ "loss": 0.6026,
1393
+ "rewards/accuracies": 0.9333332777023315,
1394
+ "rewards/chosen": 0.13597458600997925,
1395
+ "rewards/margins": 0.20577768981456757,
1396
+ "rewards/rejected": -0.06980310380458832,
1397
+ "step": 890
1398
+ },
1399
+ {
1400
+ "epoch": 0.62,
1401
+ "learning_rate": 2.13089802130898e-07,
1402
+ "logits/chosen": -1.8712265491485596,
1403
+ "logits/rejected": -1.5505828857421875,
1404
+ "logps/chosen": -92.89669036865234,
1405
+ "logps/rejected": -77.97479248046875,
1406
+ "loss": 0.6002,
1407
+ "rewards/accuracies": 0.9416667222976685,
1408
+ "rewards/chosen": 0.1514441967010498,
1409
+ "rewards/margins": 0.2083124816417694,
1410
+ "rewards/rejected": -0.056868284940719604,
1411
+ "step": 900
1412
+ },
1413
+ {
1414
+ "epoch": 0.62,
1415
+ "eval_logits/chosen": -2.046788454055786,
1416
+ "eval_logits/rejected": -1.7776765823364258,
1417
+ "eval_logps/chosen": -90.75981140136719,
1418
+ "eval_logps/rejected": -72.78363037109375,
1419
+ "eval_loss": 0.6019285321235657,
1420
+ "eval_rewards/accuracies": 0.9277777671813965,
1421
+ "eval_rewards/chosen": 0.12258908152580261,
1422
+ "eval_rewards/margins": 0.1909066140651703,
1423
+ "eval_rewards/rejected": -0.06831753998994827,
1424
+ "eval_runtime": 117.8634,
1425
+ "eval_samples_per_second": 24.282,
1426
+ "eval_steps_per_second": 0.764,
1427
+ "step": 900
1428
+ },
1429
+ {
1430
+ "epoch": 0.62,
1431
+ "learning_rate": 2.0928462709284625e-07,
1432
+ "logits/chosen": -1.7578074932098389,
1433
+ "logits/rejected": -1.445682406425476,
1434
+ "logps/chosen": -92.38932800292969,
1435
+ "logps/rejected": -74.73234558105469,
1436
+ "loss": 0.5977,
1437
+ "rewards/accuracies": 0.9583333134651184,
1438
+ "rewards/chosen": 0.14447703957557678,
1439
+ "rewards/margins": 0.2118067443370819,
1440
+ "rewards/rejected": -0.06732969731092453,
1441
+ "step": 910
1442
+ },
1443
+ {
1444
+ "epoch": 0.63,
1445
+ "learning_rate": 2.054794520547945e-07,
1446
+ "logits/chosen": -1.9356153011322021,
1447
+ "logits/rejected": -1.6309950351715088,
1448
+ "logps/chosen": -90.90232849121094,
1449
+ "logps/rejected": -75.2989730834961,
1450
+ "loss": 0.5962,
1451
+ "rewards/accuracies": 0.949999988079071,
1452
+ "rewards/chosen": 0.14640390872955322,
1453
+ "rewards/margins": 0.21756890416145325,
1454
+ "rewards/rejected": -0.07116499543190002,
1455
+ "step": 920
1456
+ },
1457
+ {
1458
+ "epoch": 0.64,
1459
+ "learning_rate": 2.0167427701674275e-07,
1460
+ "logits/chosen": -1.903969407081604,
1461
+ "logits/rejected": -1.6307262182235718,
1462
+ "logps/chosen": -91.58943939208984,
1463
+ "logps/rejected": -75.18601989746094,
1464
+ "loss": 0.6003,
1465
+ "rewards/accuracies": 0.949999988079071,
1466
+ "rewards/chosen": 0.13933506608009338,
1467
+ "rewards/margins": 0.20189881324768066,
1468
+ "rewards/rejected": -0.06256375461816788,
1469
+ "step": 930
1470
+ },
1471
+ {
1472
+ "epoch": 0.64,
1473
+ "learning_rate": 1.97869101978691e-07,
1474
+ "logits/chosen": -1.866371512413025,
1475
+ "logits/rejected": -1.6297776699066162,
1476
+ "logps/chosen": -91.3348617553711,
1477
+ "logps/rejected": -75.30091857910156,
1478
+ "loss": 0.5968,
1479
+ "rewards/accuracies": 0.949999988079071,
1480
+ "rewards/chosen": 0.16422715783119202,
1481
+ "rewards/margins": 0.22424063086509705,
1482
+ "rewards/rejected": -0.06001347303390503,
1483
+ "step": 940
1484
+ },
1485
+ {
1486
+ "epoch": 0.65,
1487
+ "learning_rate": 1.9406392694063925e-07,
1488
+ "logits/chosen": -1.860414743423462,
1489
+ "logits/rejected": -1.5813719034194946,
1490
+ "logps/chosen": -94.010498046875,
1491
+ "logps/rejected": -73.02679443359375,
1492
+ "loss": 0.595,
1493
+ "rewards/accuracies": 0.9583333134651184,
1494
+ "rewards/chosen": 0.14462696015834808,
1495
+ "rewards/margins": 0.22170260548591614,
1496
+ "rewards/rejected": -0.07707564532756805,
1497
+ "step": 950
1498
+ },
1499
+ {
1500
+ "epoch": 0.66,
1501
+ "learning_rate": 1.902587519025875e-07,
1502
+ "logits/chosen": -1.755253791809082,
1503
+ "logits/rejected": -1.5063341856002808,
1504
+ "logps/chosen": -89.60089111328125,
1505
+ "logps/rejected": -75.72676086425781,
1506
+ "loss": 0.5942,
1507
+ "rewards/accuracies": 0.949999988079071,
1508
+ "rewards/chosen": 0.12570711970329285,
1509
+ "rewards/margins": 0.19667670130729675,
1510
+ "rewards/rejected": -0.0709695890545845,
1511
+ "step": 960
1512
+ },
1513
+ {
1514
+ "epoch": 0.66,
1515
+ "learning_rate": 1.8645357686453575e-07,
1516
+ "logits/chosen": -1.9189532995224,
1517
+ "logits/rejected": -1.60482919216156,
1518
+ "logps/chosen": -91.0718765258789,
1519
+ "logps/rejected": -76.86122131347656,
1520
+ "loss": 0.5919,
1521
+ "rewards/accuracies": 0.949999988079071,
1522
+ "rewards/chosen": 0.14830803871154785,
1523
+ "rewards/margins": 0.22014153003692627,
1524
+ "rewards/rejected": -0.07183349877595901,
1525
+ "step": 970
1526
+ },
1527
+ {
1528
+ "epoch": 0.67,
1529
+ "learning_rate": 1.82648401826484e-07,
1530
+ "logits/chosen": -1.9582935571670532,
1531
+ "logits/rejected": -1.6904990673065186,
1532
+ "logps/chosen": -92.6358871459961,
1533
+ "logps/rejected": -75.13966369628906,
1534
+ "loss": 0.593,
1535
+ "rewards/accuracies": 0.9333333969116211,
1536
+ "rewards/chosen": 0.12001453340053558,
1537
+ "rewards/margins": 0.20278160274028778,
1538
+ "rewards/rejected": -0.0827670693397522,
1539
+ "step": 980
1540
+ },
1541
+ {
1542
+ "epoch": 0.68,
1543
+ "learning_rate": 1.7884322678843225e-07,
1544
+ "logits/chosen": -1.7855488061904907,
1545
+ "logits/rejected": -1.498254418373108,
1546
+ "logps/chosen": -89.93516540527344,
1547
+ "logps/rejected": -71.06876373291016,
1548
+ "loss": 0.5923,
1549
+ "rewards/accuracies": 0.9666666984558105,
1550
+ "rewards/chosen": 0.13533183932304382,
1551
+ "rewards/margins": 0.2196374386548996,
1552
+ "rewards/rejected": -0.08430557698011398,
1553
+ "step": 990
1554
+ },
1555
+ {
1556
+ "epoch": 0.68,
1557
+ "learning_rate": 1.750380517503805e-07,
1558
+ "logits/chosen": -1.9469058513641357,
1559
+ "logits/rejected": -1.655491828918457,
1560
+ "logps/chosen": -92.86201477050781,
1561
+ "logps/rejected": -77.54034423828125,
1562
+ "loss": 0.5912,
1563
+ "rewards/accuracies": 0.9499999284744263,
1564
+ "rewards/chosen": 0.14615695178508759,
1565
+ "rewards/margins": 0.22164049744606018,
1566
+ "rewards/rejected": -0.0754835233092308,
1567
+ "step": 1000
1568
+ },
1569
+ {
1570
+ "epoch": 0.68,
1571
+ "eval_logits/chosen": -2.046569585800171,
1572
+ "eval_logits/rejected": -1.7769988775253296,
1573
+ "eval_logps/chosen": -90.64215850830078,
1574
+ "eval_logps/rejected": -72.90531921386719,
1575
+ "eval_loss": 0.5911818742752075,
1576
+ "eval_rewards/accuracies": 0.9333333373069763,
1577
+ "eval_rewards/chosen": 0.13435469567775726,
1578
+ "eval_rewards/margins": 0.21484099328517914,
1579
+ "eval_rewards/rejected": -0.08048629015684128,
1580
+ "eval_runtime": 117.9339,
1581
+ "eval_samples_per_second": 24.268,
1582
+ "eval_steps_per_second": 0.763,
1583
+ "step": 1000
1584
+ },
1585
+ {
1586
+ "epoch": 0.69,
1587
+ "learning_rate": 1.7123287671232875e-07,
1588
+ "logits/chosen": -1.940800666809082,
1589
+ "logits/rejected": -1.6664402484893799,
1590
+ "logps/chosen": -86.94990539550781,
1591
+ "logps/rejected": -73.69987487792969,
1592
+ "loss": 0.5931,
1593
+ "rewards/accuracies": 0.925000011920929,
1594
+ "rewards/chosen": 0.09665495157241821,
1595
+ "rewards/margins": 0.18213674426078796,
1596
+ "rewards/rejected": -0.08548180013895035,
1597
+ "step": 1010
1598
+ },
1599
+ {
1600
+ "epoch": 0.7,
1601
+ "learning_rate": 1.67427701674277e-07,
1602
+ "logits/chosen": -1.9208215475082397,
1603
+ "logits/rejected": -1.6697795391082764,
1604
+ "logps/chosen": -90.12159729003906,
1605
+ "logps/rejected": -75.43165588378906,
1606
+ "loss": 0.5936,
1607
+ "rewards/accuracies": 0.9083331823348999,
1608
+ "rewards/chosen": 0.1299724578857422,
1609
+ "rewards/margins": 0.21237368881702423,
1610
+ "rewards/rejected": -0.08240120857954025,
1611
+ "step": 1020
1612
+ },
1613
+ {
1614
+ "epoch": 0.71,
1615
+ "learning_rate": 1.6362252663622525e-07,
1616
+ "logits/chosen": -1.8031762838363647,
1617
+ "logits/rejected": -1.552947759628296,
1618
+ "logps/chosen": -93.24359130859375,
1619
+ "logps/rejected": -73.16694641113281,
1620
+ "loss": 0.595,
1621
+ "rewards/accuracies": 0.9333333969116211,
1622
+ "rewards/chosen": 0.14134086668491364,
1623
+ "rewards/margins": 0.22129371762275696,
1624
+ "rewards/rejected": -0.07995286583900452,
1625
+ "step": 1030
1626
+ },
1627
+ {
1628
+ "epoch": 0.71,
1629
+ "learning_rate": 1.598173515981735e-07,
1630
+ "logits/chosen": -1.9094417095184326,
1631
+ "logits/rejected": -1.6032779216766357,
1632
+ "logps/chosen": -95.291015625,
1633
+ "logps/rejected": -76.30043029785156,
1634
+ "loss": 0.588,
1635
+ "rewards/accuracies": 0.949999988079071,
1636
+ "rewards/chosen": 0.17578403651714325,
1637
+ "rewards/margins": 0.24385884404182434,
1638
+ "rewards/rejected": -0.0680748000741005,
1639
+ "step": 1040
1640
+ },
1641
+ {
1642
+ "epoch": 0.72,
1643
+ "learning_rate": 1.5601217656012175e-07,
1644
+ "logits/chosen": -1.8715407848358154,
1645
+ "logits/rejected": -1.5754361152648926,
1646
+ "logps/chosen": -93.08331298828125,
1647
+ "logps/rejected": -71.55984497070312,
1648
+ "loss": 0.5885,
1649
+ "rewards/accuracies": 0.908333420753479,
1650
+ "rewards/chosen": 0.16208195686340332,
1651
+ "rewards/margins": 0.24299950897693634,
1652
+ "rewards/rejected": -0.08091756701469421,
1653
+ "step": 1050
1654
+ },
1655
+ {
1656
+ "epoch": 0.73,
1657
+ "learning_rate": 1.5220700152207e-07,
1658
+ "logits/chosen": -1.9729465246200562,
1659
+ "logits/rejected": -1.6472351551055908,
1660
+ "logps/chosen": -96.95377349853516,
1661
+ "logps/rejected": -74.73013305664062,
1662
+ "loss": 0.5829,
1663
+ "rewards/accuracies": 0.9583333134651184,
1664
+ "rewards/chosen": 0.17102012038230896,
1665
+ "rewards/margins": 0.26194968819618225,
1666
+ "rewards/rejected": -0.09092956781387329,
1667
+ "step": 1060
1668
+ },
1669
+ {
1670
+ "epoch": 0.73,
1671
+ "learning_rate": 1.4840182648401825e-07,
1672
+ "logits/chosen": -1.87423574924469,
1673
+ "logits/rejected": -1.5873512029647827,
1674
+ "logps/chosen": -92.32403564453125,
1675
+ "logps/rejected": -77.21125793457031,
1676
+ "loss": 0.5842,
1677
+ "rewards/accuracies": 0.966666579246521,
1678
+ "rewards/chosen": 0.1644188016653061,
1679
+ "rewards/margins": 0.27040743827819824,
1680
+ "rewards/rejected": -0.10598863661289215,
1681
+ "step": 1070
1682
+ },
1683
+ {
1684
+ "epoch": 0.74,
1685
+ "learning_rate": 1.445966514459665e-07,
1686
+ "logits/chosen": -1.8510887622833252,
1687
+ "logits/rejected": -1.5857640504837036,
1688
+ "logps/chosen": -92.05542755126953,
1689
+ "logps/rejected": -77.68433380126953,
1690
+ "loss": 0.5877,
1691
+ "rewards/accuracies": 0.9333333969116211,
1692
+ "rewards/chosen": 0.146043062210083,
1693
+ "rewards/margins": 0.22200524806976318,
1694
+ "rewards/rejected": -0.07596220076084137,
1695
+ "step": 1080
1696
+ },
1697
+ {
1698
+ "epoch": 0.75,
1699
+ "learning_rate": 1.4079147640791475e-07,
1700
+ "logits/chosen": -1.8213762044906616,
1701
+ "logits/rejected": -1.5474189519882202,
1702
+ "logps/chosen": -91.6738052368164,
1703
+ "logps/rejected": -74.36346435546875,
1704
+ "loss": 0.5796,
1705
+ "rewards/accuracies": 0.9833332896232605,
1706
+ "rewards/chosen": 0.1721878945827484,
1707
+ "rewards/margins": 0.26567989587783813,
1708
+ "rewards/rejected": -0.09349202364683151,
1709
+ "step": 1090
1710
+ },
1711
+ {
1712
+ "epoch": 0.75,
1713
+ "learning_rate": 1.36986301369863e-07,
1714
+ "logits/chosen": -1.8960554599761963,
1715
+ "logits/rejected": -1.5947376489639282,
1716
+ "logps/chosen": -95.58143615722656,
1717
+ "logps/rejected": -75.37010192871094,
1718
+ "loss": 0.5822,
1719
+ "rewards/accuracies": 0.9166666865348816,
1720
+ "rewards/chosen": 0.18196699023246765,
1721
+ "rewards/margins": 0.25899866223335266,
1722
+ "rewards/rejected": -0.0770316869020462,
1723
+ "step": 1100
1724
+ },
1725
+ {
1726
+ "epoch": 0.75,
1727
+ "eval_logits/chosen": -2.0461771488189697,
1728
+ "eval_logits/rejected": -1.7763375043869019,
1729
+ "eval_logps/chosen": -90.54474639892578,
1730
+ "eval_logps/rejected": -73.00917053222656,
1731
+ "eval_loss": 0.5822051763534546,
1732
+ "eval_rewards/accuracies": 0.9472222328186035,
1733
+ "eval_rewards/chosen": 0.14409679174423218,
1734
+ "eval_rewards/margins": 0.23496907949447632,
1735
+ "eval_rewards/rejected": -0.09087225794792175,
1736
+ "eval_runtime": 117.8174,
1737
+ "eval_samples_per_second": 24.292,
1738
+ "eval_steps_per_second": 0.764,
1739
+ "step": 1100
1740
+ },
1741
+ {
1742
+ "epoch": 0.76,
1743
+ "learning_rate": 1.3318112633181125e-07,
1744
+ "logits/chosen": -1.893864393234253,
1745
+ "logits/rejected": -1.5917404890060425,
1746
+ "logps/chosen": -90.5267105102539,
1747
+ "logps/rejected": -73.85464477539062,
1748
+ "loss": 0.5809,
1749
+ "rewards/accuracies": 0.9666666984558105,
1750
+ "rewards/chosen": 0.18791857361793518,
1751
+ "rewards/margins": 0.26672154664993286,
1752
+ "rewards/rejected": -0.07880295813083649,
1753
+ "step": 1110
1754
+ },
1755
+ {
1756
+ "epoch": 0.77,
1757
+ "learning_rate": 1.293759512937595e-07,
1758
+ "logits/chosen": -1.851485013961792,
1759
+ "logits/rejected": -1.5561844110488892,
1760
+ "logps/chosen": -91.46730041503906,
1761
+ "logps/rejected": -75.5124740600586,
1762
+ "loss": 0.5803,
1763
+ "rewards/accuracies": 0.9083333015441895,
1764
+ "rewards/chosen": 0.15885117650032043,
1765
+ "rewards/margins": 0.24810326099395752,
1766
+ "rewards/rejected": -0.08925210684537888,
1767
+ "step": 1120
1768
+ },
1769
+ {
1770
+ "epoch": 0.77,
1771
+ "learning_rate": 1.2557077625570775e-07,
1772
+ "logits/chosen": -1.9113174676895142,
1773
+ "logits/rejected": -1.6106466054916382,
1774
+ "logps/chosen": -88.4524917602539,
1775
+ "logps/rejected": -71.1933364868164,
1776
+ "loss": 0.5805,
1777
+ "rewards/accuracies": 0.8999999761581421,
1778
+ "rewards/chosen": 0.1571049988269806,
1779
+ "rewards/margins": 0.23198041319847107,
1780
+ "rewards/rejected": -0.07487543672323227,
1781
+ "step": 1130
1782
+ },
1783
+ {
1784
+ "epoch": 0.78,
1785
+ "learning_rate": 1.21765601217656e-07,
1786
+ "logits/chosen": -1.9750347137451172,
1787
+ "logits/rejected": -1.6996724605560303,
1788
+ "logps/chosen": -91.44354248046875,
1789
+ "logps/rejected": -77.03483581542969,
1790
+ "loss": 0.5768,
1791
+ "rewards/accuracies": 0.9416666030883789,
1792
+ "rewards/chosen": 0.14498132467269897,
1793
+ "rewards/margins": 0.23422233760356903,
1794
+ "rewards/rejected": -0.08924100548028946,
1795
+ "step": 1140
1796
+ },
1797
+ {
1798
+ "epoch": 0.79,
1799
+ "learning_rate": 1.1796042617960425e-07,
1800
+ "logits/chosen": -1.9331929683685303,
1801
+ "logits/rejected": -1.6460412740707397,
1802
+ "logps/chosen": -84.95231628417969,
1803
+ "logps/rejected": -73.00121307373047,
1804
+ "loss": 0.5759,
1805
+ "rewards/accuracies": 0.9416667222976685,
1806
+ "rewards/chosen": 0.17358574271202087,
1807
+ "rewards/margins": 0.2643177807331085,
1808
+ "rewards/rejected": -0.09073203802108765,
1809
+ "step": 1150
1810
+ },
1811
+ {
1812
+ "epoch": 0.79,
1813
+ "learning_rate": 1.141552511415525e-07,
1814
+ "logits/chosen": -1.9368797540664673,
1815
+ "logits/rejected": -1.6109968423843384,
1816
+ "logps/chosen": -95.30493927001953,
1817
+ "logps/rejected": -79.38502502441406,
1818
+ "loss": 0.5725,
1819
+ "rewards/accuracies": 0.9166666269302368,
1820
+ "rewards/chosen": 0.16715973615646362,
1821
+ "rewards/margins": 0.26029014587402344,
1822
+ "rewards/rejected": -0.09313040971755981,
1823
+ "step": 1160
1824
+ },
1825
+ {
1826
+ "epoch": 0.8,
1827
+ "learning_rate": 1.1035007610350075e-07,
1828
+ "logits/chosen": -1.9934200048446655,
1829
+ "logits/rejected": -1.7067596912384033,
1830
+ "logps/chosen": -91.30809020996094,
1831
+ "logps/rejected": -75.1029052734375,
1832
+ "loss": 0.577,
1833
+ "rewards/accuracies": 0.9750000238418579,
1834
+ "rewards/chosen": 0.1644877940416336,
1835
+ "rewards/margins": 0.2513524889945984,
1836
+ "rewards/rejected": -0.08686470985412598,
1837
+ "step": 1170
1838
+ },
1839
+ {
1840
+ "epoch": 0.81,
1841
+ "learning_rate": 1.06544901065449e-07,
1842
+ "logits/chosen": -1.8585717678070068,
1843
+ "logits/rejected": -1.593147873878479,
1844
+ "logps/chosen": -94.06452941894531,
1845
+ "logps/rejected": -78.18238830566406,
1846
+ "loss": 0.5798,
1847
+ "rewards/accuracies": 0.9166666865348816,
1848
+ "rewards/chosen": 0.1385236382484436,
1849
+ "rewards/margins": 0.2384444922208786,
1850
+ "rewards/rejected": -0.09992088377475739,
1851
+ "step": 1180
1852
+ },
1853
+ {
1854
+ "epoch": 0.81,
1855
+ "learning_rate": 1.0273972602739725e-07,
1856
+ "logits/chosen": -1.921449065208435,
1857
+ "logits/rejected": -1.6535272598266602,
1858
+ "logps/chosen": -87.8677749633789,
1859
+ "logps/rejected": -78.9840087890625,
1860
+ "loss": 0.5767,
1861
+ "rewards/accuracies": 0.9666666984558105,
1862
+ "rewards/chosen": 0.14824248850345612,
1863
+ "rewards/margins": 0.25046244263648987,
1864
+ "rewards/rejected": -0.10221991688013077,
1865
+ "step": 1190
1866
+ },
1867
+ {
1868
+ "epoch": 0.82,
1869
+ "learning_rate": 9.89345509893455e-08,
1870
+ "logits/chosen": -1.8294260501861572,
1871
+ "logits/rejected": -1.5607296228408813,
1872
+ "logps/chosen": -94.44920349121094,
1873
+ "logps/rejected": -77.64144134521484,
1874
+ "loss": 0.5789,
1875
+ "rewards/accuracies": 0.9166666269302368,
1876
+ "rewards/chosen": 0.16983038187026978,
1877
+ "rewards/margins": 0.2730123996734619,
1878
+ "rewards/rejected": -0.10318204015493393,
1879
+ "step": 1200
1880
+ },
1881
+ {
1882
+ "epoch": 0.82,
1883
+ "eval_logits/chosen": -2.0464720726013184,
1884
+ "eval_logits/rejected": -1.7763383388519287,
1885
+ "eval_logps/chosen": -90.46904754638672,
1886
+ "eval_logps/rejected": -73.09234619140625,
1887
+ "eval_loss": 0.5758996605873108,
1888
+ "eval_rewards/accuracies": 0.9333333373069763,
1889
+ "eval_rewards/chosen": 0.1516665369272232,
1890
+ "eval_rewards/margins": 0.2508557140827179,
1891
+ "eval_rewards/rejected": -0.0991891473531723,
1892
+ "eval_runtime": 117.7184,
1893
+ "eval_samples_per_second": 24.312,
1894
+ "eval_steps_per_second": 0.765,
1895
+ "step": 1200
1896
+ },
1897
+ {
1898
+ "epoch": 0.83,
1899
+ "learning_rate": 9.512937595129374e-08,
1900
+ "logits/chosen": -1.8505995273590088,
1901
+ "logits/rejected": -1.570237398147583,
1902
+ "logps/chosen": -95.08009338378906,
1903
+ "logps/rejected": -75.39155578613281,
1904
+ "loss": 0.5739,
1905
+ "rewards/accuracies": 0.9583333134651184,
1906
+ "rewards/chosen": 0.14622533321380615,
1907
+ "rewards/margins": 0.23580579459667206,
1908
+ "rewards/rejected": -0.08958044648170471,
1909
+ "step": 1210
1910
+ },
1911
+ {
1912
+ "epoch": 0.84,
1913
+ "learning_rate": 9.1324200913242e-08,
1914
+ "logits/chosen": -1.8380448818206787,
1915
+ "logits/rejected": -1.5534820556640625,
1916
+ "logps/chosen": -89.47401428222656,
1917
+ "logps/rejected": -73.60565948486328,
1918
+ "loss": 0.5772,
1919
+ "rewards/accuracies": 0.9416667222976685,
1920
+ "rewards/chosen": 0.1780690848827362,
1921
+ "rewards/margins": 0.2615908980369568,
1922
+ "rewards/rejected": -0.08352181315422058,
1923
+ "step": 1220
1924
+ },
1925
+ {
1926
+ "epoch": 0.84,
1927
+ "learning_rate": 8.751902587519024e-08,
1928
+ "logits/chosen": -1.975015640258789,
1929
+ "logits/rejected": -1.6878995895385742,
1930
+ "logps/chosen": -93.83460998535156,
1931
+ "logps/rejected": -76.86270904541016,
1932
+ "loss": 0.576,
1933
+ "rewards/accuracies": 0.9750000238418579,
1934
+ "rewards/chosen": 0.17377497255802155,
1935
+ "rewards/margins": 0.25830045342445374,
1936
+ "rewards/rejected": -0.08452550321817398,
1937
+ "step": 1230
1938
+ },
1939
+ {
1940
+ "epoch": 0.85,
1941
+ "learning_rate": 8.37138508371385e-08,
1942
+ "logits/chosen": -1.8430579900741577,
1943
+ "logits/rejected": -1.5395987033843994,
1944
+ "logps/chosen": -92.52012634277344,
1945
+ "logps/rejected": -79.96121215820312,
1946
+ "loss": 0.5745,
1947
+ "rewards/accuracies": 0.9583333134651184,
1948
+ "rewards/chosen": 0.15905624628067017,
1949
+ "rewards/margins": 0.25326135754585266,
1950
+ "rewards/rejected": -0.0942051112651825,
1951
+ "step": 1240
1952
+ },
1953
+ {
1954
+ "epoch": 0.86,
1955
+ "learning_rate": 7.990867579908676e-08,
1956
+ "logits/chosen": -1.8872146606445312,
1957
+ "logits/rejected": -1.5909473896026611,
1958
+ "logps/chosen": -89.97090148925781,
1959
+ "logps/rejected": -77.28529357910156,
1960
+ "loss": 0.5727,
1961
+ "rewards/accuracies": 0.949999988079071,
1962
+ "rewards/chosen": 0.16229455173015594,
1963
+ "rewards/margins": 0.2598329186439514,
1964
+ "rewards/rejected": -0.09753839671611786,
1965
+ "step": 1250
1966
+ },
1967
+ {
1968
+ "epoch": 0.86,
1969
+ "learning_rate": 7.6103500761035e-08,
1970
+ "logits/chosen": -1.8499984741210938,
1971
+ "logits/rejected": -1.5525275468826294,
1972
+ "logps/chosen": -93.28520202636719,
1973
+ "logps/rejected": -74.24462890625,
1974
+ "loss": 0.5731,
1975
+ "rewards/accuracies": 0.925000011920929,
1976
+ "rewards/chosen": 0.14929968118667603,
1977
+ "rewards/margins": 0.2529754042625427,
1978
+ "rewards/rejected": -0.10367570072412491,
1979
+ "step": 1260
1980
+ },
1981
+ {
1982
+ "epoch": 0.87,
1983
+ "learning_rate": 7.229832572298326e-08,
1984
+ "logits/chosen": -2.009666681289673,
1985
+ "logits/rejected": -1.7177025079727173,
1986
+ "logps/chosen": -89.4050521850586,
1987
+ "logps/rejected": -79.06332397460938,
1988
+ "loss": 0.5712,
1989
+ "rewards/accuracies": 0.949999988079071,
1990
+ "rewards/chosen": 0.1704416573047638,
1991
+ "rewards/margins": 0.27492305636405945,
1992
+ "rewards/rejected": -0.10448137670755386,
1993
+ "step": 1270
1994
+ },
1995
+ {
1996
+ "epoch": 0.88,
1997
+ "learning_rate": 6.84931506849315e-08,
1998
+ "logits/chosen": -1.81943678855896,
1999
+ "logits/rejected": -1.5563119649887085,
2000
+ "logps/chosen": -88.48675537109375,
2001
+ "logps/rejected": -72.3918685913086,
2002
+ "loss": 0.5729,
2003
+ "rewards/accuracies": 0.8916667103767395,
2004
+ "rewards/chosen": 0.13842932879924774,
2005
+ "rewards/margins": 0.24020667374134064,
2006
+ "rewards/rejected": -0.10177735984325409,
2007
+ "step": 1280
2008
+ },
2009
+ {
2010
+ "epoch": 0.88,
2011
+ "learning_rate": 6.468797564687976e-08,
2012
+ "logits/chosen": -1.8918097019195557,
2013
+ "logits/rejected": -1.6081184148788452,
2014
+ "logps/chosen": -87.69499969482422,
2015
+ "logps/rejected": -71.82918548583984,
2016
+ "loss": 0.5745,
2017
+ "rewards/accuracies": 0.9333332777023315,
2018
+ "rewards/chosen": 0.17439430952072144,
2019
+ "rewards/margins": 0.2683844268321991,
2020
+ "rewards/rejected": -0.09399012476205826,
2021
+ "step": 1290
2022
+ },
2023
+ {
2024
+ "epoch": 0.89,
2025
+ "learning_rate": 6.0882800608828e-08,
2026
+ "logits/chosen": -1.7783477306365967,
2027
+ "logits/rejected": -1.4948413372039795,
2028
+ "logps/chosen": -89.221435546875,
2029
+ "logps/rejected": -72.36725616455078,
2030
+ "loss": 0.5689,
2031
+ "rewards/accuracies": 0.9583333134651184,
2032
+ "rewards/chosen": 0.16254201531410217,
2033
+ "rewards/margins": 0.27369171380996704,
2034
+ "rewards/rejected": -0.11114968359470367,
2035
+ "step": 1300
2036
+ },
2037
+ {
2038
+ "epoch": 0.89,
2039
+ "eval_logits/chosen": -2.0464510917663574,
2040
+ "eval_logits/rejected": -1.7761657238006592,
2041
+ "eval_logps/chosen": -90.43045043945312,
2042
+ "eval_logps/rejected": -73.13316345214844,
2043
+ "eval_loss": 0.5722280740737915,
2044
+ "eval_rewards/accuracies": 0.949999988079071,
2045
+ "eval_rewards/chosen": 0.15552671253681183,
2046
+ "eval_rewards/margins": 0.2587975263595581,
2047
+ "eval_rewards/rejected": -0.10327085852622986,
2048
+ "eval_runtime": 118.3212,
2049
+ "eval_samples_per_second": 24.188,
2050
+ "eval_steps_per_second": 0.761,
2051
+ "step": 1300
2052
+ },
2053
+ {
2054
+ "epoch": 0.9,
2055
+ "learning_rate": 5.707762557077625e-08,
2056
+ "logits/chosen": -1.7794984579086304,
2057
+ "logits/rejected": -1.5128498077392578,
2058
+ "logps/chosen": -94.00843048095703,
2059
+ "logps/rejected": -75.231201171875,
2060
+ "loss": 0.5719,
2061
+ "rewards/accuracies": 0.9750000238418579,
2062
+ "rewards/chosen": 0.18278029561042786,
2063
+ "rewards/margins": 0.2786504328250885,
2064
+ "rewards/rejected": -0.09587012976408005,
2065
+ "step": 1310
2066
+ },
2067
+ {
2068
+ "epoch": 0.9,
2069
+ "learning_rate": 5.32724505327245e-08,
2070
+ "logits/chosen": -1.9731611013412476,
2071
+ "logits/rejected": -1.6866945028305054,
2072
+ "logps/chosen": -93.30955505371094,
2073
+ "logps/rejected": -78.57191467285156,
2074
+ "loss": 0.571,
2075
+ "rewards/accuracies": 0.9583333134651184,
2076
+ "rewards/chosen": 0.15462180972099304,
2077
+ "rewards/margins": 0.26251837611198425,
2078
+ "rewards/rejected": -0.10789655148983002,
2079
+ "step": 1320
2080
+ },
2081
+ {
2082
+ "epoch": 0.91,
2083
+ "learning_rate": 4.946727549467275e-08,
2084
+ "logits/chosen": -1.9133844375610352,
2085
+ "logits/rejected": -1.636694312095642,
2086
+ "logps/chosen": -88.56840515136719,
2087
+ "logps/rejected": -76.69215393066406,
2088
+ "loss": 0.5716,
2089
+ "rewards/accuracies": 0.966666579246521,
2090
+ "rewards/chosen": 0.15411558747291565,
2091
+ "rewards/margins": 0.26415151357650757,
2092
+ "rewards/rejected": -0.11003589630126953,
2093
+ "step": 1330
2094
+ },
2095
+ {
2096
+ "epoch": 0.92,
2097
+ "learning_rate": 4.5662100456621e-08,
2098
+ "logits/chosen": -1.8889293670654297,
2099
+ "logits/rejected": -1.5668977499008179,
2100
+ "logps/chosen": -93.04912567138672,
2101
+ "logps/rejected": -74.54277038574219,
2102
+ "loss": 0.5671,
2103
+ "rewards/accuracies": 0.9166666269302368,
2104
+ "rewards/chosen": 0.1801871955394745,
2105
+ "rewards/margins": 0.2698620557785034,
2106
+ "rewards/rejected": -0.08967487514019012,
2107
+ "step": 1340
2108
+ },
2109
+ {
2110
+ "epoch": 0.92,
2111
+ "learning_rate": 4.185692541856925e-08,
2112
+ "logits/chosen": -1.853014588356018,
2113
+ "logits/rejected": -1.6049798727035522,
2114
+ "logps/chosen": -92.84537506103516,
2115
+ "logps/rejected": -76.91346740722656,
2116
+ "loss": 0.5734,
2117
+ "rewards/accuracies": 0.9333332777023315,
2118
+ "rewards/chosen": 0.1830371469259262,
2119
+ "rewards/margins": 0.27097687125205994,
2120
+ "rewards/rejected": -0.08793972432613373,
2121
+ "step": 1350
2122
+ },
2123
+ {
2124
+ "epoch": 0.93,
2125
+ "learning_rate": 3.80517503805175e-08,
2126
+ "logits/chosen": -1.8321574926376343,
2127
+ "logits/rejected": -1.559768795967102,
2128
+ "logps/chosen": -90.3908462524414,
2129
+ "logps/rejected": -77.5791015625,
2130
+ "loss": 0.5728,
2131
+ "rewards/accuracies": 0.9333333969116211,
2132
+ "rewards/chosen": 0.15050294995307922,
2133
+ "rewards/margins": 0.2644408941268921,
2134
+ "rewards/rejected": -0.11393795162439346,
2135
+ "step": 1360
2136
+ },
2137
+ {
2138
+ "epoch": 0.94,
2139
+ "learning_rate": 3.424657534246575e-08,
2140
+ "logits/chosen": -1.8819458484649658,
2141
+ "logits/rejected": -1.5830302238464355,
2142
+ "logps/chosen": -96.20283508300781,
2143
+ "logps/rejected": -79.02750396728516,
2144
+ "loss": 0.57,
2145
+ "rewards/accuracies": 0.9166666269302368,
2146
+ "rewards/chosen": 0.1495673507452011,
2147
+ "rewards/margins": 0.26126623153686523,
2148
+ "rewards/rejected": -0.11169885098934174,
2149
+ "step": 1370
2150
+ },
2151
+ {
2152
+ "epoch": 0.94,
2153
+ "learning_rate": 3.0441400304414e-08,
2154
+ "logits/chosen": -1.8712704181671143,
2155
+ "logits/rejected": -1.5640711784362793,
2156
+ "logps/chosen": -87.56044006347656,
2157
+ "logps/rejected": -73.2048110961914,
2158
+ "loss": 0.5701,
2159
+ "rewards/accuracies": 0.925000011920929,
2160
+ "rewards/chosen": 0.15671199560165405,
2161
+ "rewards/margins": 0.2683314383029938,
2162
+ "rewards/rejected": -0.11161943525075912,
2163
+ "step": 1380
2164
+ },
2165
+ {
2166
+ "epoch": 0.95,
2167
+ "learning_rate": 2.663622526636225e-08,
2168
+ "logits/chosen": -1.9705963134765625,
2169
+ "logits/rejected": -1.6459972858428955,
2170
+ "logps/chosen": -93.333251953125,
2171
+ "logps/rejected": -75.71430969238281,
2172
+ "loss": 0.5669,
2173
+ "rewards/accuracies": 0.9833332896232605,
2174
+ "rewards/chosen": 0.22007617354393005,
2175
+ "rewards/margins": 0.3152759373188019,
2176
+ "rewards/rejected": -0.09519973397254944,
2177
+ "step": 1390
2178
+ },
2179
+ {
2180
+ "epoch": 0.96,
2181
+ "learning_rate": 2.28310502283105e-08,
2182
+ "logits/chosen": -1.9400713443756104,
2183
+ "logits/rejected": -1.6670045852661133,
2184
+ "logps/chosen": -90.7359619140625,
2185
+ "logps/rejected": -72.49601745605469,
2186
+ "loss": 0.5694,
2187
+ "rewards/accuracies": 0.9416667222976685,
2188
+ "rewards/chosen": 0.18735943734645844,
2189
+ "rewards/margins": 0.2668268084526062,
2190
+ "rewards/rejected": -0.07946738600730896,
2191
+ "step": 1400
2192
+ },
2193
+ {
2194
+ "epoch": 0.96,
2195
+ "eval_logits/chosen": -2.0464956760406494,
2196
+ "eval_logits/rejected": -1.776126742362976,
2197
+ "eval_logps/chosen": -90.40695190429688,
2198
+ "eval_logps/rejected": -73.16618347167969,
2199
+ "eval_loss": 0.5701765418052673,
2200
+ "eval_rewards/accuracies": 0.9416666626930237,
2201
+ "eval_rewards/chosen": 0.15787601470947266,
2202
+ "eval_rewards/margins": 0.26444879174232483,
2203
+ "eval_rewards/rejected": -0.10657278448343277,
2204
+ "eval_runtime": 118.3051,
2205
+ "eval_samples_per_second": 24.192,
2206
+ "eval_steps_per_second": 0.761,
2207
+ "step": 1400
2208
+ },
2209
+ {
2210
+ "epoch": 0.97,
2211
+ "learning_rate": 1.902587519025875e-08,
2212
+ "logits/chosen": -1.9084300994873047,
2213
+ "logits/rejected": -1.5941137075424194,
2214
+ "logps/chosen": -91.3935546875,
2215
+ "logps/rejected": -76.21849060058594,
2216
+ "loss": 0.5703,
2217
+ "rewards/accuracies": 0.9416666030883789,
2218
+ "rewards/chosen": 0.18224991858005524,
2219
+ "rewards/margins": 0.2809165418148041,
2220
+ "rewards/rejected": -0.09866663068532944,
2221
+ "step": 1410
2222
+ },
2223
+ {
2224
+ "epoch": 0.97,
2225
+ "learning_rate": 1.5220700152207e-08,
2226
+ "logits/chosen": -1.8584785461425781,
2227
+ "logits/rejected": -1.5252989530563354,
2228
+ "logps/chosen": -94.4938735961914,
2229
+ "logps/rejected": -76.2500228881836,
2230
+ "loss": 0.5705,
2231
+ "rewards/accuracies": 0.9166666269302368,
2232
+ "rewards/chosen": 0.17414768040180206,
2233
+ "rewards/margins": 0.2775163948535919,
2234
+ "rewards/rejected": -0.10336872190237045,
2235
+ "step": 1420
2236
+ },
2237
+ {
2238
+ "epoch": 0.98,
2239
+ "learning_rate": 1.141552511415525e-08,
2240
+ "logits/chosen": -1.8703104257583618,
2241
+ "logits/rejected": -1.5645638704299927,
2242
+ "logps/chosen": -92.91007232666016,
2243
+ "logps/rejected": -72.98766326904297,
2244
+ "loss": 0.5703,
2245
+ "rewards/accuracies": 0.9416666030883789,
2246
+ "rewards/chosen": 0.1589893251657486,
2247
+ "rewards/margins": 0.26451554894447327,
2248
+ "rewards/rejected": -0.10552623122930527,
2249
+ "step": 1430
2250
+ },
2251
+ {
2252
+ "epoch": 0.99,
2253
+ "learning_rate": 7.6103500761035e-09,
2254
+ "logits/chosen": -1.9101779460906982,
2255
+ "logits/rejected": -1.6234633922576904,
2256
+ "logps/chosen": -88.99760437011719,
2257
+ "logps/rejected": -73.46092224121094,
2258
+ "loss": 0.5694,
2259
+ "rewards/accuracies": 0.9333333969116211,
2260
+ "rewards/chosen": 0.17230169475078583,
2261
+ "rewards/margins": 0.25403517484664917,
2262
+ "rewards/rejected": -0.08173345029354095,
2263
+ "step": 1440
2264
+ },
2265
+ {
2266
+ "epoch": 0.99,
2267
+ "learning_rate": 3.80517503805175e-09,
2268
+ "logits/chosen": -1.837728500366211,
2269
+ "logits/rejected": -1.6049699783325195,
2270
+ "logps/chosen": -92.38600158691406,
2271
+ "logps/rejected": -78.56689453125,
2272
+ "loss": 0.571,
2273
+ "rewards/accuracies": 0.949999988079071,
2274
+ "rewards/chosen": 0.18492794036865234,
2275
+ "rewards/margins": 0.28263044357299805,
2276
+ "rewards/rejected": -0.0977025032043457,
2277
+ "step": 1450
2278
+ },
2279
+ {
2280
+ "epoch": 1.0,
2281
+ "learning_rate": 0.0,
2282
+ "logits/chosen": -1.8631808757781982,
2283
+ "logits/rejected": -1.5711183547973633,
2284
+ "logps/chosen": -90.29685974121094,
2285
+ "logps/rejected": -76.57972717285156,
2286
+ "loss": 0.5684,
2287
+ "rewards/accuracies": 0.9750000238418579,
2288
+ "rewards/chosen": 0.18871954083442688,
2289
+ "rewards/margins": 0.3021387457847595,
2290
+ "rewards/rejected": -0.11341919749975204,
2291
+ "step": 1460
2292
+ },
2293
+ {
2294
+ "epoch": 1.0,
2295
+ "step": 1460,
2296
+ "total_flos": 0.0,
2297
+ "train_loss": 0.6280729855576607,
2298
+ "train_runtime": 9689.6427,
2299
+ "train_samples_per_second": 14.469,
2300
+ "train_steps_per_second": 0.151
2301
+ }
2302
+ ],
2303
+ "logging_steps": 10,
2304
+ "max_steps": 1460,
2305
+ "num_train_epochs": 1,
2306
+ "save_steps": 100,
2307
+ "total_flos": 0.0,
2308
+ "trial_name": null,
2309
+ "trial_params": null
2310
+ }