lole25 committed
Commit 21f5880
1 Parent(s): b7f762d

Model save

README.md ADDED
@@ -0,0 +1,62 @@
+ ---
+ license: apache-2.0
+ library_name: peft
+ tags:
+ - trl
+ - dpo
+ - generated_from_trainer
+ base_model: DUAL-GPO/phi-2-sft-lora-ultrachat-merged
+ model-index:
+ - name: phi-2-ipo-chatml
+   results: []
+ ---
+
+ <!-- This model card has been generated automatically according to the information the Trainer had access to. You
+ should probably proofread and complete it, then remove this comment. -->
+
+ # phi-2-ipo-chatml
+
+ This model is a fine-tuned version of [DUAL-GPO/phi-2-sft-lora-ultrachat-merged](https://huggingface.co/DUAL-GPO/phi-2-sft-lora-ultrachat-merged) on the None dataset.
+
+ ## Model description
+
+ More information needed
+
+ ## Intended uses & limitations
+
+ More information needed
+
+ ## Training and evaluation data
+
+ More information needed
+
+ ## Training procedure
+
+ ### Training hyperparameters
+
+ The following hyperparameters were used during training:
+ - learning_rate: 5e-06
+ - train_batch_size: 4
+ - eval_batch_size: 4
+ - seed: 42
+ - distributed_type: multi-GPU
+ - num_devices: 2
+ - gradient_accumulation_steps: 4
+ - total_train_batch_size: 32
+ - total_eval_batch_size: 8
+ - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
+ - lr_scheduler_type: cosine
+ - lr_scheduler_warmup_ratio: 0.1
+ - num_epochs: 1
+
+ ### Training results
+
+
+
+ ### Framework versions
+
+ - PEFT 0.7.1
+ - Transformers 4.36.2
+ - Pytorch 2.1.2+cu121
+ - Datasets 2.14.6
+ - Tokenizers 0.15.2
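
Reviewer note: the batch-size and scheduler entries in the hyperparameter list of README.md are tied together by simple arithmetic, and the sketch below recomputes them. It is an illustration only, not code from the training run. The function names are hypothetical, the total of 1910 optimizer steps is taken from `global_step` in trainer_state.json in this same commit, and the schedule shape (linear warmup followed by a half-cosine decay to zero) is an assumption about what `lr_scheduler_type: cosine` with `lr_scheduler_warmup_ratio: 0.1` means here.

```python
import math


def effective_batch_size(per_device: int, num_devices: int, grad_accum: int) -> int:
    """Samples consumed per optimizer step across all devices."""
    return per_device * num_devices * grad_accum


def warmup_steps(total_steps: int, warmup_ratio: float) -> int:
    """Number of warmup steps (assumed: ratio times total steps, rounded up)."""
    return math.ceil(total_steps * warmup_ratio)


def lr_at(step: int, total_steps: int, peak_lr: float, warmup_ratio: float) -> float:
    """Learning rate at a given optimizer step under the assumed schedule."""
    n_warmup = warmup_steps(total_steps, warmup_ratio)
    if step < n_warmup:
        # Linear ramp from 0 up to peak_lr.
        return peak_lr * step / max(1, n_warmup)
    # Half-cosine decay from peak_lr down to 0 over the remaining steps.
    progress = (step - n_warmup) / max(1, total_steps - n_warmup)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    print(effective_batch_size(4, 2, 4))   # per-device 4, 2 devices, 4 accumulation steps
    print(warmup_steps(1910, 0.1))
    print(lr_at(1, 1910, 5e-06, 0.1))
```

With the values from this commit, 4 x 2 x 4 gives the reported `total_train_batch_size` of 32, and a 191-step warmup puts the learning rate at step 1 at 5e-06 / 191, which is the 2.617801047120419e-08 logged in trainer_state.json.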
adapter_model.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:f6129406b7d164dbd1602bfc97116dcebc8946aa0a55b580a02bfec272c7b76e
+ oid sha256:e10f8da8c9ca888d0e2c2f7ebf035cdaf9ab2942f4ffbdef8926e08d3225748d
  size 335579632
all_results.json ADDED
@@ -0,0 +1,8 @@
+ {
+ "epoch": 1.0,
+ "train_loss": 2071.8518089414265,
+ "train_runtime": 14310.5789,
+ "train_samples": 61135,
+ "train_samples_per_second": 4.272,
+ "train_steps_per_second": 0.133
+ }
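
Reviewer note: the two throughput fields in all_results.json can be cross-checked against the runtime and sample count in the same file. The sketch below is a consistency check under stated assumptions, not the trainer's own bookkeeping: the helper name is hypothetical, the step count of 1910 comes from `global_step` in trainer_state.json, and rounding to three decimals is assumed from the precision of the reported values.

```python
def throughput(results: dict, global_step: int) -> dict:
    """Recompute samples/s and steps/s from runtime, sample count and step count."""
    runtime = results["train_runtime"]  # seconds
    return {
        "samples_per_second": round(results["train_samples"] / runtime, 3),
        "steps_per_second": round(global_step / runtime, 3),
    }


if __name__ == "__main__":
    reported = {"train_runtime": 14310.5789, "train_samples": 61135}
    print(throughput(reported, global_step=1910))
```

Both recomputed values agree with the reported 4.272 samples per second and 0.133 steps per second, so the run took a little under four hours for one pass over 61135 samples.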
runs/May19_15-00-24_gpu4-119-5/events.out.tfevents.1716095069.gpu4-119-5.3541575.0 CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:88d9dec4a7b8d92c657bf7b38fd90617be4c02d5832da3b921f0b9f1554b2cc2
- size 125822
+ oid sha256:caaa13a0e8513dfec362daca2ff17f22179ff4dd275de55216c3053b33319d02
+ size 126810
train_results.json ADDED
@@ -0,0 +1,8 @@
+ {
+ "epoch": 1.0,
+ "train_loss": 2071.8518089414265,
+ "train_runtime": 14310.5789,
+ "train_samples": 61135,
+ "train_samples_per_second": 4.272,
+ "train_steps_per_second": 0.133
+ }
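
Reviewer note: the loss values in this commit are in the thousands, which looks odd for a preference loss until the objective is taken into account. The model name suggests IPO, and the first `log_history` entry in trainer_state.json reports a loss of exactly 2500.0 with all rewards at 0. Under the squared IPO objective, a zero log-ratio margin gives a loss of (1 / (2 * beta))^2, which is 2500 for beta = 0.01. The commit does not state beta, so that value is an inference. The sketch below is a minimal per-pair version of that objective with hypothetical function names, not the trainer's implementation.

```python
import math


def ipo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float) -> float:
    """Squared distance between the log-ratio margin and the target 1 / (2 * beta)."""
    margin = (policy_chosen_logp - ref_chosen_logp) - (policy_rejected_logp - ref_rejected_logp)
    return (margin - 1.0 / (2.0 * beta)) ** 2


def beta_from_initial_loss(initial_loss: float) -> float:
    """Invert loss = (1 / (2 * beta))^2, valid only when the margin is zero."""
    return 1.0 / (2.0 * math.sqrt(initial_loss))


if __name__ == "__main__":
    # At step 1 the policy equals the reference, so both log-ratios are zero.
    print(ipo_loss(-341.4, -250.3, -341.4, -250.3, beta=0.01))
    print(beta_from_initial_loss(2500.0))
```

Read this way, the final `train_loss` of about 2071.85 means the average margin moved only part of the way toward the target of 1 / (2 * beta).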
trainer_state.json ADDED
@@ -0,0 +1,2718 @@
+ {
+ "best_metric": null,
+ "best_model_checkpoint": null,
+ "epoch": 0.9997382884061764,
+ "eval_steps": 500,
+ "global_step": 1910,
+ "is_hyper_param_search": false,
+ "is_local_process_zero": true,
+ "is_world_process_zero": true,
+ "log_history": [
+ {
+ "epoch": 0.0,
+ "learning_rate": 2.617801047120419e-08,
+ "logits/chosen": 0.5248222947120667,
+ "logits/rejected": 0.7921571731567383,
+ "logps/chosen": -341.40020751953125,
+ "logps/rejected": -250.28689575195312,
+ "loss": 2500.0,
+ "rewards/accuracies": 0.0,
+ "rewards/chosen": 0.0,
+ "rewards/margins": 0.0,
+ "rewards/rejected": 0.0,
+ "step": 1
+ },
+ {
+ "epoch": 0.01,
+ "learning_rate": 2.617801047120419e-07,
+ "logits/chosen": 0.6701858043670654,
+ "logits/rejected": 0.7214743494987488,
+ "logps/chosen": -282.2621765136719,
+ "logps/rejected": -253.0035858154297,
+ "loss": 2503.3785,
+ "rewards/accuracies": 0.3472222089767456,
+ "rewards/chosen": -0.0003065296623390168,
+ "rewards/margins": -0.0004795099375769496,
+ "rewards/rejected": 0.00017298036254942417,
+ "step": 10
+ },
+ {
+ "epoch": 0.01,
+ "learning_rate": 5.235602094240838e-07,
+ "logits/chosen": 0.6246625781059265,
+ "logits/rejected": 0.6360182166099548,
+ "logps/chosen": -314.952392578125,
+ "logps/rejected": -267.18927001953125,
+ "loss": 2498.9023,
+ "rewards/accuracies": 0.4375,
+ "rewards/chosen": 0.0006705065025016665,
+ "rewards/margins": -0.00035223033046349883,
+ "rewards/rejected": 0.0010227367747575045,
+ "step": 20
+ },
+ {
+ "epoch": 0.02,
+ "learning_rate": 7.853403141361258e-07,
+ "logits/chosen": 0.621583104133606,
+ "logits/rejected": 0.6869794726371765,
+ "logps/chosen": -260.4806213378906,
+ "logps/rejected": -254.1117401123047,
+ "loss": 2501.4164,
+ "rewards/accuracies": 0.4375,
+ "rewards/chosen": -0.0003680586232803762,
+ "rewards/margins": 8.779224299360067e-05,
+ "rewards/rejected": -0.00045585090992972255,
+ "step": 30
+ },
+ {
+ "epoch": 0.02,
+ "learning_rate": 1.0471204188481676e-06,
+ "logits/chosen": 0.6395634412765503,
+ "logits/rejected": 0.7919565439224243,
+ "logps/chosen": -240.0687255859375,
+ "logps/rejected": -229.44149780273438,
+ "loss": 2499.6219,
+ "rewards/accuracies": 0.48750001192092896,
+ "rewards/chosen": 0.001351092243567109,
+ "rewards/margins": 0.0006501252064481378,
+ "rewards/rejected": 0.0007009669207036495,
+ "step": 40
+ },
+ {
+ "epoch": 0.03,
+ "learning_rate": 1.3089005235602096e-06,
+ "logits/chosen": 0.5841951370239258,
+ "logits/rejected": 0.6311219334602356,
+ "logps/chosen": -292.2647705078125,
+ "logps/rejected": -244.11605834960938,
+ "loss": 2499.3211,
+ "rewards/accuracies": 0.4625000059604645,
+ "rewards/chosen": -0.0004217842943035066,
+ "rewards/margins": -0.0002647504734341055,
+ "rewards/rejected": -0.00015703373355790973,
+ "step": 50
+ },
+ {
+ "epoch": 0.03,
+ "learning_rate": 1.5706806282722515e-06,
+ "logits/chosen": 0.5548856854438782,
+ "logits/rejected": 0.6343160271644592,
+ "logps/chosen": -250.15615844726562,
+ "logps/rejected": -239.6744842529297,
+ "loss": 2497.8898,
+ "rewards/accuracies": 0.35624998807907104,
+ "rewards/chosen": -0.00014420936349779367,
+ "rewards/margins": -0.00012744043488055468,
+ "rewards/rejected": -1.6768904970376752e-05,
+ "step": 60
+ },
+ {
+ "epoch": 0.04,
+ "learning_rate": 1.8324607329842933e-06,
+ "logits/chosen": 0.5806199312210083,
+ "logits/rejected": 0.6849480867385864,
+ "logps/chosen": -263.5242004394531,
+ "logps/rejected": -242.7255096435547,
+ "loss": 2495.5641,
+ "rewards/accuracies": 0.5,
+ "rewards/chosen": -0.0010010639671236277,
+ "rewards/margins": 0.0013473776634782553,
+ "rewards/rejected": -0.0023484418634325266,
+ "step": 70
+ },
+ {
+ "epoch": 0.04,
+ "learning_rate": 2.094240837696335e-06,
+ "logits/chosen": 0.6474049687385559,
+ "logits/rejected": 0.6796275973320007,
+ "logps/chosen": -265.99652099609375,
+ "logps/rejected": -255.4057159423828,
+ "loss": 2483.6129,
+ "rewards/accuracies": 0.5062500238418579,
+ "rewards/chosen": 0.00029454095056280494,
+ "rewards/margins": 0.0018218889599666,
+ "rewards/rejected": -0.0015273483004420996,
+ "step": 80
+ },
+ {
+ "epoch": 0.05,
+ "learning_rate": 2.356020942408377e-06,
+ "logits/chosen": 0.6297029256820679,
+ "logits/rejected": 0.6693249344825745,
+ "logps/chosen": -259.20013427734375,
+ "logps/rejected": -260.8564453125,
+ "loss": 2468.257,
+ "rewards/accuracies": 0.59375,
+ "rewards/chosen": 0.0004229500482324511,
+ "rewards/margins": 0.0030678685288876295,
+ "rewards/rejected": -0.002644918393343687,
+ "step": 90
+ },
+ {
+ "epoch": 0.05,
+ "learning_rate": 2.617801047120419e-06,
+ "logits/chosen": 0.6495063304901123,
+ "logits/rejected": 0.6480900645256042,
+ "logps/chosen": -265.02508544921875,
+ "logps/rejected": -234.58029174804688,
+ "loss": 2468.3684,
+ "rewards/accuracies": 0.512499988079071,
+ "rewards/chosen": -0.000877298996783793,
+ "rewards/margins": 0.0035102677065879107,
+ "rewards/rejected": -0.0043875668197870255,
+ "step": 100
+ },
+ {
+ "epoch": 0.06,
+ "learning_rate": 2.8795811518324613e-06,
+ "logits/chosen": 0.6897019147872925,
+ "logits/rejected": 0.7566229701042175,
+ "logps/chosen": -304.11102294921875,
+ "logps/rejected": -264.30621337890625,
+ "loss": 2465.5504,
+ "rewards/accuracies": 0.5249999761581421,
+ "rewards/chosen": -0.004195456858724356,
+ "rewards/margins": 0.0027198302559554577,
+ "rewards/rejected": -0.006915287580341101,
+ "step": 110
+ },
+ {
+ "epoch": 0.06,
+ "learning_rate": 3.141361256544503e-06,
+ "logits/chosen": 0.6134337186813354,
+ "logits/rejected": 0.7231487035751343,
+ "logps/chosen": -311.39984130859375,
+ "logps/rejected": -257.10650634765625,
+ "loss": 2428.1793,
+ "rewards/accuracies": 0.643750011920929,
+ "rewards/chosen": -0.0012777966912835836,
+ "rewards/margins": 0.009219733066856861,
+ "rewards/rejected": -0.010497529059648514,
+ "step": 120
+ },
+ {
+ "epoch": 0.07,
+ "learning_rate": 3.403141361256545e-06,
+ "logits/chosen": 0.7281027436256409,
+ "logits/rejected": 0.7294681668281555,
+ "logps/chosen": -287.9706115722656,
+ "logps/rejected": -253.9458770751953,
+ "loss": 2393.4254,
+ "rewards/accuracies": 0.637499988079071,
+ "rewards/chosen": 0.0035368993412703276,
+ "rewards/margins": 0.01360202394425869,
+ "rewards/rejected": -0.010065125301480293,
+ "step": 130
+ },
+ {
+ "epoch": 0.07,
+ "learning_rate": 3.6649214659685865e-06,
+ "logits/chosen": 0.6975444555282593,
+ "logits/rejected": 0.7464872598648071,
+ "logps/chosen": -283.344970703125,
+ "logps/rejected": -269.69134521484375,
+ "loss": 2391.2451,
+ "rewards/accuracies": 0.574999988079071,
+ "rewards/chosen": -0.0033713714219629765,
+ "rewards/margins": 0.01122850738465786,
+ "rewards/rejected": -0.0145998764783144,
+ "step": 140
+ },
+ {
+ "epoch": 0.08,
+ "learning_rate": 3.926701570680629e-06,
+ "logits/chosen": 0.6442640423774719,
+ "logits/rejected": 0.7055094838142395,
+ "logps/chosen": -297.76904296875,
+ "logps/rejected": -262.79010009765625,
+ "loss": 2339.1273,
+ "rewards/accuracies": 0.6000000238418579,
+ "rewards/chosen": -0.006125102750957012,
+ "rewards/margins": 0.016310054808855057,
+ "rewards/rejected": -0.022435154765844345,
+ "step": 150
+ },
+ {
+ "epoch": 0.08,
+ "learning_rate": 4.18848167539267e-06,
+ "logits/chosen": 0.6234780550003052,
+ "logits/rejected": 0.5896192789077759,
+ "logps/chosen": -279.10498046875,
+ "logps/rejected": -247.49490356445312,
+ "loss": 2336.1186,
+ "rewards/accuracies": 0.581250011920929,
+ "rewards/chosen": -0.01037096418440342,
+ "rewards/margins": 0.019876617938280106,
+ "rewards/rejected": -0.030247583985328674,
+ "step": 160
+ },
+ {
+ "epoch": 0.09,
+ "learning_rate": 4.450261780104713e-06,
+ "logits/chosen": 0.6194897890090942,
+ "logits/rejected": 0.6363841891288757,
+ "logps/chosen": -293.1652526855469,
+ "logps/rejected": -242.565185546875,
+ "loss": 2349.7029,
+ "rewards/accuracies": 0.6187499761581421,
+ "rewards/chosen": -0.013055374845862389,
+ "rewards/margins": 0.02146710641682148,
+ "rewards/rejected": -0.03452248126268387,
+ "step": 170
+ },
+ {
+ "epoch": 0.09,
+ "learning_rate": 4.712041884816754e-06,
+ "logits/chosen": 0.6403040885925293,
+ "logits/rejected": 0.7126356363296509,
+ "logps/chosen": -291.5748596191406,
+ "logps/rejected": -251.09121704101562,
+ "loss": 2273.3473,
+ "rewards/accuracies": 0.6000000238418579,
+ "rewards/chosen": -0.030034661293029785,
+ "rewards/margins": 0.02929743006825447,
+ "rewards/rejected": -0.05933208391070366,
+ "step": 180
+ },
+ {
+ "epoch": 0.1,
+ "learning_rate": 4.9738219895287965e-06,
+ "logits/chosen": 0.7060214281082153,
+ "logits/rejected": 0.7062759399414062,
+ "logps/chosen": -251.7637939453125,
+ "logps/rejected": -224.1781768798828,
+ "loss": 2288.9881,
+ "rewards/accuracies": 0.53125,
+ "rewards/chosen": -0.039670929312705994,
+ "rewards/margins": 0.025101035833358765,
+ "rewards/rejected": -0.06477196514606476,
+ "step": 190
+ },
+ {
+ "epoch": 0.1,
+ "learning_rate": 4.999661831436499e-06,
+ "logits/chosen": 0.6065430045127869,
+ "logits/rejected": 0.5599089860916138,
+ "logps/chosen": -303.3594970703125,
+ "logps/rejected": -281.65301513671875,
+ "loss": 2323.0852,
+ "rewards/accuracies": 0.6187499761581421,
+ "rewards/chosen": -0.051525335758924484,
+ "rewards/margins": 0.03378116711974144,
+ "rewards/rejected": -0.08530650287866592,
+ "step": 200
+ },
+ {
+ "epoch": 0.11,
+ "learning_rate": 4.9984929711403395e-06,
+ "logits/chosen": 0.6741048097610474,
+ "logits/rejected": 0.7067887187004089,
+ "logps/chosen": -256.2054138183594,
+ "logps/rejected": -225.0240478515625,
+ "loss": 2257.4254,
+ "rewards/accuracies": 0.5625,
+ "rewards/chosen": -0.05382019281387329,
+ "rewards/margins": 0.03237619996070862,
+ "rewards/rejected": -0.08619637787342072,
+ "step": 210
+ },
+ {
+ "epoch": 0.12,
+ "learning_rate": 4.996489634487865e-06,
+ "logits/chosen": 0.6408742070198059,
+ "logits/rejected": 0.7459608316421509,
+ "logps/chosen": -280.1336975097656,
+ "logps/rejected": -261.680908203125,
+ "loss": 2225.691,
+ "rewards/accuracies": 0.643750011920929,
+ "rewards/chosen": -0.06642502546310425,
+ "rewards/margins": 0.03849685937166214,
+ "rewards/rejected": -0.10492189228534698,
+ "step": 220
+ },
+ {
+ "epoch": 0.12,
+ "learning_rate": 4.9936524905772466e-06,
+ "logits/chosen": 0.511016845703125,
+ "logits/rejected": 0.6876882910728455,
+ "logps/chosen": -295.7177429199219,
+ "logps/rejected": -278.0430908203125,
+ "loss": 2274.0725,
+ "rewards/accuracies": 0.46875,
+ "rewards/chosen": -0.0953628420829773,
+ "rewards/margins": 0.025789355859160423,
+ "rewards/rejected": -0.12115219980478287,
+ "step": 230
+ },
+ {
+ "epoch": 0.13,
+ "learning_rate": 4.9899824869915e-06,
+ "logits/chosen": 0.6104953289031982,
+ "logits/rejected": 0.6206714510917664,
+ "logps/chosen": -269.36212158203125,
+ "logps/rejected": -230.29348754882812,
+ "loss": 2182.6342,
+ "rewards/accuracies": 0.625,
+ "rewards/chosen": -0.0956740602850914,
+ "rewards/margins": 0.05557785555720329,
+ "rewards/rejected": -0.1512519270181656,
+ "step": 240
+ },
+ {
+ "epoch": 0.13,
+ "learning_rate": 4.985480849482012e-06,
+ "logits/chosen": 0.5841513872146606,
+ "logits/rejected": 0.7289382815361023,
+ "logps/chosen": -297.09185791015625,
+ "logps/rejected": -283.0509338378906,
+ "loss": 2300.1652,
+ "rewards/accuracies": 0.5249999761581421,
+ "rewards/chosen": -0.12446895986795425,
+ "rewards/margins": 0.020662058144807816,
+ "rewards/rejected": -0.14513102173805237,
+ "step": 250
+ },
+ {
+ "epoch": 0.14,
+ "learning_rate": 4.980149081559142e-06,
+ "logits/chosen": 0.594504177570343,
+ "logits/rejected": 0.659439742565155,
+ "logps/chosen": -318.9085693359375,
+ "logps/rejected": -282.30548095703125,
+ "loss": 2157.8906,
+ "rewards/accuracies": 0.6499999761581421,
+ "rewards/chosen": -0.0681561678647995,
+ "rewards/margins": 0.05495098978281021,
+ "rewards/rejected": -0.12310715764760971,
+ "step": 260
+ },
+ {
+ "epoch": 0.14,
+ "learning_rate": 4.9739889639900655e-06,
+ "logits/chosen": 0.660437822341919,
+ "logits/rejected": 0.6371047496795654,
+ "logps/chosen": -278.40045166015625,
+ "logps/rejected": -276.32794189453125,
+ "loss": 2100.2004,
+ "rewards/accuracies": 0.6187499761581421,
+ "rewards/chosen": -0.0825929194688797,
+ "rewards/margins": 0.06175379827618599,
+ "rewards/rejected": -0.1443466991186142,
+ "step": 270
+ },
+ {
+ "epoch": 0.15,
+ "learning_rate": 4.967002554204009e-06,
+ "logits/chosen": 0.5346761345863342,
+ "logits/rejected": 0.653354287147522,
+ "logps/chosen": -274.6492919921875,
+ "logps/rejected": -256.0135192871094,
+ "loss": 2231.843,
+ "rewards/accuracies": 0.550000011920929,
+ "rewards/chosen": -0.10598695278167725,
+ "rewards/margins": 0.04181862622499466,
+ "rewards/rejected": -0.1478056013584137,
+ "step": 280
+ },
+ {
+ "epoch": 0.15,
+ "learning_rate": 4.959192185605089e-06,
+ "logits/chosen": 0.6121161580085754,
+ "logits/rejected": 0.6786028742790222,
+ "logps/chosen": -291.69268798828125,
+ "logps/rejected": -270.1987609863281,
+ "loss": 2302.0896,
+ "rewards/accuracies": 0.5687500238418579,
+ "rewards/chosen": -0.11571818590164185,
+ "rewards/margins": 0.04277648404240608,
+ "rewards/rejected": -0.15849466621875763,
+ "step": 290
+ },
+ {
+ "epoch": 0.16,
+ "learning_rate": 4.950560466792969e-06,
+ "logits/chosen": 0.6503465175628662,
+ "logits/rejected": 0.6750337481498718,
+ "logps/chosen": -297.166259765625,
+ "logps/rejected": -264.58905029296875,
+ "loss": 2290.2439,
+ "rewards/accuracies": 0.59375,
+ "rewards/chosen": -0.1382652223110199,
+ "rewards/margins": 0.050003498792648315,
+ "rewards/rejected": -0.1882687211036682,
+ "step": 300
+ },
+ {
+ "epoch": 0.16,
+ "learning_rate": 4.9411102806916185e-06,
+ "logits/chosen": 0.6111949682235718,
+ "logits/rejected": 0.5789340734481812,
+ "logps/chosen": -335.71209716796875,
+ "logps/rejected": -271.94281005859375,
+ "loss": 2030.3432,
+ "rewards/accuracies": 0.71875,
+ "rewards/chosen": -0.1214783638715744,
+ "rewards/margins": 0.07279713451862335,
+ "rewards/rejected": -0.19427552819252014,
+ "step": 310
+ },
+ {
+ "epoch": 0.17,
+ "learning_rate": 4.930844783586424e-06,
+ "logits/chosen": 0.6168066263198853,
+ "logits/rejected": 0.6404728293418884,
+ "logps/chosen": -267.7217102050781,
+ "logps/rejected": -254.76156616210938,
+ "loss": 2180.659,
+ "rewards/accuracies": 0.606249988079071,
+ "rewards/chosen": -0.1399281919002533,
+ "rewards/margins": 0.05871574953198433,
+ "rewards/rejected": -0.19864396750926971,
+ "step": 320
+ },
+ {
+ "epoch": 0.17,
+ "learning_rate": 4.919767404070033e-06,
+ "logits/chosen": 0.6463326215744019,
+ "logits/rejected": 0.6055666208267212,
+ "logps/chosen": -288.9108581542969,
+ "logps/rejected": -271.7781066894531,
+ "loss": 2136.0844,
+ "rewards/accuracies": 0.6000000238418579,
+ "rewards/chosen": -0.18461689352989197,
+ "rewards/margins": 0.058290112763643265,
+ "rewards/rejected": -0.24290700256824493,
+ "step": 330
+ },
+ {
+ "epoch": 0.18,
+ "learning_rate": 4.907881841897216e-06,
+ "logits/chosen": 0.5913775563240051,
+ "logits/rejected": 0.6087537407875061,
+ "logps/chosen": -348.69317626953125,
+ "logps/rejected": -276.8719787597656,
+ "loss": 2095.2922,
+ "rewards/accuracies": 0.65625,
+ "rewards/chosen": -0.21791012585163116,
+ "rewards/margins": 0.08299825340509415,
+ "rewards/rejected": -0.3009083867073059,
+ "step": 340
+ },
+ {
+ "epoch": 0.18,
+ "learning_rate": 4.89519206674919e-06,
+ "logits/chosen": 0.5306932926177979,
+ "logits/rejected": 0.5599890947341919,
+ "logps/chosen": -277.52801513671875,
+ "logps/rejected": -282.8912048339844,
+ "loss": 2027.6721,
+ "rewards/accuracies": 0.6187499761581421,
+ "rewards/chosen": -0.2483474463224411,
+ "rewards/margins": 0.08130116015672684,
+ "rewards/rejected": -0.32964861392974854,
+ "step": 350
+ },
+ {
+ "epoch": 0.19,
+ "learning_rate": 4.881702316907769e-06,
+ "logits/chosen": 0.5017831921577454,
+ "logits/rejected": 0.6145971417427063,
+ "logps/chosen": -247.94332885742188,
+ "logps/rejected": -276.70562744140625,
+ "loss": 2096.8201,
+ "rewards/accuracies": 0.59375,
+ "rewards/chosen": -0.2660738229751587,
+ "rewards/margins": 0.07246068120002747,
+ "rewards/rejected": -0.33853450417518616,
+ "step": 360
+ },
+ {
+ "epoch": 0.19,
+ "learning_rate": 4.86741709783982e-06,
+ "logits/chosen": 0.5044198632240295,
+ "logits/rejected": 0.610668957233429,
+ "logps/chosen": -370.6680908203125,
+ "logps/rejected": -315.87445068359375,
+ "loss": 2240.1234,
+ "rewards/accuracies": 0.625,
+ "rewards/chosen": -0.2816161513328552,
+ "rewards/margins": 0.0706692636013031,
+ "rewards/rejected": -0.3522854149341583,
+ "step": 370
+ },
+ {
+ "epoch": 0.2,
+ "learning_rate": 4.852341180692471e-06,
+ "logits/chosen": 0.5787937045097351,
+ "logits/rejected": 0.6844087839126587,
+ "logps/chosen": -314.65386962890625,
+ "logps/rejected": -279.57464599609375,
+ "loss": 2000.0223,
+ "rewards/accuracies": 0.6499999761581421,
+ "rewards/chosen": -0.28157955408096313,
+ "rewards/margins": 0.09683749079704285,
+ "rewards/rejected": -0.3784170150756836,
+ "step": 380
+ },
+ {
+ "epoch": 0.2,
+ "learning_rate": 4.836479600699579e-06,
+ "logits/chosen": 0.5943303108215332,
+ "logits/rejected": 0.5367528200149536,
+ "logps/chosen": -326.36322021484375,
+ "logps/rejected": -322.28851318359375,
+ "loss": 2038.491,
+ "rewards/accuracies": 0.6499999761581421,
+ "rewards/chosen": -0.31988343596458435,
+ "rewards/margins": 0.08327943086624146,
+ "rewards/rejected": -0.4031628668308258,
+ "step": 390
+ },
+ {
+ "epoch": 0.21,
+ "learning_rate": 4.819837655500014e-06,
+ "logits/chosen": 0.4382709562778473,
+ "logits/rejected": 0.5377568006515503,
+ "logps/chosen": -268.5506286621094,
+ "logps/rejected": -256.25457763671875,
+ "loss": 2150.4953,
+ "rewards/accuracies": 0.606249988079071,
+ "rewards/chosen": -0.3451124131679535,
+ "rewards/margins": 0.06852660328149796,
+ "rewards/rejected": -0.41363900899887085,
+ "step": 400
+ },
+ {
+ "epoch": 0.21,
+ "learning_rate": 4.802420903368286e-06,
+ "logits/chosen": 0.5234228372573853,
+ "logits/rejected": 0.5555657744407654,
+ "logps/chosen": -306.6415100097656,
+ "logps/rejected": -284.44061279296875,
+ "loss": 2237.9328,
+ "rewards/accuracies": 0.5562499761581421,
+ "rewards/chosen": -0.3614524304866791,
+ "rewards/margins": 0.05758042261004448,
+ "rewards/rejected": -0.41903290152549744,
+ "step": 410
+ },
+ {
+ "epoch": 0.22,
+ "learning_rate": 4.784235161358124e-06,
+ "logits/chosen": 0.477716863155365,
+ "logits/rejected": 0.5133184194564819,
+ "logps/chosen": -334.1138916015625,
+ "logps/rejected": -301.9080505371094,
+ "loss": 2092.0188,
+ "rewards/accuracies": 0.643750011920929,
+ "rewards/chosen": -0.3627592921257019,
+ "rewards/margins": 0.10072964429855347,
+ "rewards/rejected": -0.463488906621933,
+ "step": 420
+ },
+ {
+ "epoch": 0.23,
+ "learning_rate": 4.765286503359632e-06,
+ "logits/chosen": 0.5422715544700623,
+ "logits/rejected": 0.6406744122505188,
+ "logps/chosen": -305.5355529785156,
+ "logps/rejected": -282.97210693359375,
+ "loss": 2014.3465,
+ "rewards/accuracies": 0.6312500238418579,
+ "rewards/chosen": -0.29455187916755676,
+ "rewards/margins": 0.07915346324443817,
+ "rewards/rejected": -0.3737053871154785,
+ "step": 430
+ },
+ {
+ "epoch": 0.23,
+ "learning_rate": 4.745581258070654e-06,
+ "logits/chosen": 0.43309369683265686,
+ "logits/rejected": 0.5271438360214233,
+ "logps/chosen": -287.7671813964844,
+ "logps/rejected": -281.0647888183594,
+ "loss": 2185.2459,
+ "rewards/accuracies": 0.550000011920929,
+ "rewards/chosen": -0.34788942337036133,
+ "rewards/margins": 0.05820406228303909,
+ "rewards/rejected": -0.4060935080051422,
+ "step": 440
+ },
+ {
+ "epoch": 0.24,
+ "learning_rate": 4.725126006883047e-06,
+ "logits/chosen": 0.45388108491897583,
+ "logits/rejected": 0.5083224177360535,
+ "logps/chosen": -268.25958251953125,
+ "logps/rejected": -266.2535400390625,
+ "loss": 2137.6896,
+ "rewards/accuracies": 0.550000011920929,
+ "rewards/chosen": -0.3282170295715332,
+ "rewards/margins": 0.05696592479944229,
+ "rewards/rejected": -0.3851829469203949,
+ "step": 450
+ },
+ {
+ "epoch": 0.24,
+ "learning_rate": 4.70392758168454e-06,
+ "logits/chosen": 0.5513511896133423,
+ "logits/rejected": 0.510746955871582,
+ "logps/chosen": -370.2562255859375,
+ "logps/rejected": -305.41265869140625,
+ "loss": 2088.2188,
+ "rewards/accuracies": 0.6312500238418579,
+ "rewards/chosen": -0.31895995140075684,
+ "rewards/margins": 0.08234255015850067,
+ "rewards/rejected": -0.4013025164604187,
+ "step": 460
+ },
+ {
+ "epoch": 0.25,
+ "learning_rate": 4.68199306257695e-06,
+ "logits/chosen": 0.5287891626358032,
+ "logits/rejected": 0.5082263350486755,
+ "logps/chosen": -355.8836364746094,
+ "logps/rejected": -318.6692810058594,
+ "loss": 2073.2719,
+ "rewards/accuracies": 0.643750011920929,
+ "rewards/chosen": -0.327432781457901,
+ "rewards/margins": 0.10078072547912598,
+ "rewards/rejected": -0.42821353673934937,
+ "step": 470
+ },
+ {
+ "epoch": 0.25,
+ "learning_rate": 4.659329775511478e-06,
+ "logits/chosen": 0.553450345993042,
+ "logits/rejected": 0.5118339657783508,
+ "logps/chosen": -307.2048645019531,
+ "logps/rejected": -274.0884094238281,
+ "loss": 2053.993,
+ "rewards/accuracies": 0.612500011920929,
+ "rewards/chosen": -0.3153889775276184,
+ "rewards/margins": 0.08515409380197525,
+ "rewards/rejected": -0.40054306387901306,
+ "step": 480
+ },
+ {
+ "epoch": 0.26,
+ "learning_rate": 4.635945289841902e-06,
+ "logits/chosen": 0.49377956986427307,
+ "logits/rejected": 0.4585692286491394,
+ "logps/chosen": -312.8728942871094,
+ "logps/rejected": -320.9603271484375,
+ "loss": 2090.7937,
+ "rewards/accuracies": 0.606249988079071,
+ "rewards/chosen": -0.3384076952934265,
+ "rewards/margins": 0.10788736492395401,
+ "rewards/rejected": -0.4462950825691223,
+ "step": 490
+ },
+ {
+ "epoch": 0.26,
+ "learning_rate": 4.611847415796476e-06,
+ "logits/chosen": 0.4848670959472656,
+ "logits/rejected": 0.42902207374572754,
+ "logps/chosen": -343.72515869140625,
+ "logps/rejected": -293.16815185546875,
+ "loss": 2090.8078,
+ "rewards/accuracies": 0.574999988079071,
+ "rewards/chosen": -0.37585896253585815,
+ "rewards/margins": 0.0655459612607956,
+ "rewards/rejected": -0.44140490889549255,
+ "step": 500
+ },
+ {
+ "epoch": 0.27,
+ "learning_rate": 4.587044201869378e-06,
+ "logits/chosen": 0.4169999957084656,
+ "logits/rejected": 0.4936888813972473,
+ "logps/chosen": -311.93963623046875,
+ "logps/rejected": -320.00408935546875,
+ "loss": 2092.8064,
+ "rewards/accuracies": 0.5687500238418579,
+ "rewards/chosen": -0.42008256912231445,
+ "rewards/margins": 0.07488216459751129,
+ "rewards/rejected": -0.49496474862098694,
737
+ "step": 510
738
+ },
739
+ {
740
+ "epoch": 0.27,
741
+ "learning_rate": 4.561543932132574e-06,
742
+ "logits/chosen": 0.46682390570640564,
743
+ "logits/rejected": 0.4480462074279785,
744
+ "logps/chosen": -331.2510070800781,
745
+ "logps/rejected": -289.58355712890625,
746
+ "loss": 2058.4975,
747
+ "rewards/accuracies": 0.5874999761581421,
748
+ "rewards/chosen": -0.4290853440761566,
749
+ "rewards/margins": 0.08971880376338959,
750
+ "rewards/rejected": -0.5188041925430298,
751
+ "step": 520
752
+ },
753
+ {
754
+ "epoch": 0.28,
755
+ "learning_rate": 4.535355123469009e-06,
756
+ "logits/chosen": 0.4296157956123352,
757
+ "logits/rejected": 0.4146192967891693,
758
+ "logps/chosen": -331.16595458984375,
759
+ "logps/rejected": -309.8785705566406,
760
+ "loss": 2106.7164,
761
+ "rewards/accuracies": 0.637499988079071,
762
+ "rewards/chosen": -0.5114426612854004,
763
+ "rewards/margins": 0.09367333352565765,
764
+ "rewards/rejected": -0.6051160097122192,
765
+ "step": 530
766
+ },
767
+ {
768
+ "epoch": 0.28,
769
+ "learning_rate": 4.508486522728037e-06,
770
+ "logits/chosen": 0.5072197318077087,
771
+ "logits/rejected": 0.5970960855484009,
772
+ "logps/chosen": -322.4172668457031,
773
+ "logps/rejected": -315.7408752441406,
774
+ "loss": 2065.5746,
775
+ "rewards/accuracies": 0.6000000238418579,
776
+ "rewards/chosen": -0.49308618903160095,
777
+ "rewards/margins": 0.08288516104221344,
778
+ "rewards/rejected": -0.5759714245796204,
779
+ "step": 540
780
+ },
781
+ {
782
+ "epoch": 0.29,
783
+ "learning_rate": 4.480947103804044e-06,
784
+ "logits/chosen": 0.5679532289505005,
785
+ "logits/rejected": 0.6028575897216797,
786
+ "logps/chosen": -318.32794189453125,
787
+ "logps/rejected": -275.52716064453125,
788
+ "loss": 2036.5631,
789
+ "rewards/accuracies": 0.699999988079071,
790
+ "rewards/chosen": -0.5034439563751221,
791
+ "rewards/margins": 0.1076350212097168,
792
+ "rewards/rejected": -0.6110790371894836,
793
+ "step": 550
794
+ },
795
+ {
796
+ "epoch": 0.29,
797
+ "learning_rate": 4.452746064639239e-06,
798
+ "logits/chosen": 0.3957621455192566,
799
+ "logits/rejected": 0.5071176290512085,
800
+ "logps/chosen": -377.244873046875,
801
+ "logps/rejected": -318.47821044921875,
802
+ "loss": 2035.9066,
803
+ "rewards/accuracies": 0.65625,
804
+ "rewards/chosen": -0.4797874093055725,
805
+ "rewards/margins": 0.0891679972410202,
806
+ "rewards/rejected": -0.5689553022384644,
807
+ "step": 560
808
+ },
809
+ {
810
+ "epoch": 0.3,
811
+ "learning_rate": 4.423892824151617e-06,
812
+ "logits/chosen": 0.5428576469421387,
813
+ "logits/rejected": 0.5582025051116943,
814
+ "logps/chosen": -325.265869140625,
815
+ "logps/rejected": -279.44622802734375,
816
+ "loss": 2099.3277,
817
+ "rewards/accuracies": 0.625,
818
+ "rewards/chosen": -0.4889600872993469,
819
+ "rewards/margins": 0.08092018216848373,
820
+ "rewards/rejected": -0.5698802471160889,
821
+ "step": 570
822
+ },
823
+ {
824
+ "epoch": 0.3,
825
+ "learning_rate": 4.3943970190891164e-06,
826
+ "logits/chosen": 0.47033435106277466,
827
+ "logits/rejected": 0.4538179039955139,
828
+ "logps/chosen": -338.0190124511719,
829
+ "logps/rejected": -284.2043151855469,
830
+ "loss": 1990.1314,
831
+ "rewards/accuracies": 0.612500011920929,
832
+ "rewards/chosen": -0.5050845742225647,
833
+ "rewards/margins": 0.07665327191352844,
834
+ "rewards/rejected": -0.5817378163337708,
835
+ "step": 580
836
+ },
837
+ {
838
+ "epoch": 0.31,
839
+ "learning_rate": 4.364268500811025e-06,
840
+ "logits/chosen": 0.43608421087265015,
841
+ "logits/rejected": 0.588505208492279,
842
+ "logps/chosen": -355.6614685058594,
843
+ "logps/rejected": -311.62603759765625,
844
+ "loss": 1957.0082,
845
+ "rewards/accuracies": 0.668749988079071,
846
+ "rewards/chosen": -0.5208545327186584,
847
+ "rewards/margins": 0.10252735763788223,
848
+ "rewards/rejected": -0.6233818531036377,
849
+ "step": 590
850
+ },
851
+ {
852
+ "epoch": 0.31,
853
+ "learning_rate": 4.333517331997704e-06,
854
+ "logits/chosen": 0.5493937730789185,
855
+ "logits/rejected": 0.5916265249252319,
856
+ "logps/chosen": -346.2372131347656,
857
+ "logps/rejected": -297.2783203125,
858
+ "loss": 1937.0328,
859
+ "rewards/accuracies": 0.65625,
860
+ "rewards/chosen": -0.4746910631656647,
861
+ "rewards/margins": 0.10906670242547989,
862
+ "rewards/rejected": -0.583757758140564,
863
+ "step": 600
864
+ },
865
+ {
866
+ "epoch": 0.32,
867
+ "learning_rate": 4.302153783289737e-06,
868
+ "logits/chosen": 0.5612093210220337,
869
+ "logits/rejected": 0.5712814331054688,
870
+ "logps/chosen": -299.03192138671875,
871
+ "logps/rejected": -295.932373046875,
872
+ "loss": 1899.6357,
873
+ "rewards/accuracies": 0.643750011920929,
874
+ "rewards/chosen": -0.45807942748069763,
875
+ "rewards/margins": 0.11209867149591446,
876
+ "rewards/rejected": -0.5701780915260315,
877
+ "step": 610
878
+ },
879
+ {
880
+ "epoch": 0.32,
881
+ "learning_rate": 4.270188329857613e-06,
882
+ "logits/chosen": 0.4900333285331726,
883
+ "logits/rejected": 0.6181058287620544,
884
+ "logps/chosen": -345.901123046875,
885
+ "logps/rejected": -315.23150634765625,
886
+ "loss": 1829.7244,
887
+ "rewards/accuracies": 0.637499988079071,
888
+ "rewards/chosen": -0.44543027877807617,
889
+ "rewards/margins": 0.11854176223278046,
890
+ "rewards/rejected": -0.563971996307373,
891
+ "step": 620
892
+ },
893
+ {
894
+ "epoch": 0.33,
895
+ "learning_rate": 4.237631647903115e-06,
896
+ "logits/chosen": 0.5171164274215698,
897
+ "logits/rejected": 0.6000061631202698,
898
+ "logps/chosen": -305.6175842285156,
899
+ "logps/rejected": -288.61529541015625,
900
+ "loss": 1928.8063,
901
+ "rewards/accuracies": 0.643750011920929,
902
+ "rewards/chosen": -0.41954368352890015,
903
+ "rewards/margins": 0.10498888790607452,
904
+ "rewards/rejected": -0.5245326161384583,
905
+ "step": 630
906
+ },
907
+ {
908
+ "epoch": 0.33,
909
+ "learning_rate": 4.204494611093548e-06,
910
+ "logits/chosen": 0.5002217292785645,
911
+ "logits/rejected": 0.4931250512599945,
912
+ "logps/chosen": -374.89447021484375,
913
+ "logps/rejected": -323.4224853515625,
914
+ "loss": 2052.2016,
915
+ "rewards/accuracies": 0.643750011920929,
916
+ "rewards/chosen": -0.4047914445400238,
917
+ "rewards/margins": 0.10931004583835602,
918
+ "rewards/rejected": -0.5141014456748962,
919
+ "step": 640
920
+ },
921
+ {
922
+ "epoch": 0.34,
923
+ "learning_rate": 4.170788286930024e-06,
924
+ "logits/chosen": 0.5889243483543396,
925
+ "logits/rejected": 0.49473047256469727,
926
+ "logps/chosen": -311.41204833984375,
927
+ "logps/rejected": -297.2741394042969,
928
+ "loss": 2161.5602,
929
+ "rewards/accuracies": 0.606249988079071,
930
+ "rewards/chosen": -0.4144001007080078,
931
+ "rewards/margins": 0.09458984434604645,
932
+ "rewards/rejected": -0.5089899301528931,
933
+ "step": 650
934
+ },
935
+ {
936
+ "epoch": 0.35,
937
+ "learning_rate": 4.136523933051005e-06,
938
+ "logits/chosen": 0.46171918511390686,
939
+ "logits/rejected": 0.5521343350410461,
940
+ "logps/chosen": -260.4045715332031,
941
+ "logps/rejected": -253.0673370361328,
942
+ "loss": 2058.6592,
943
+ "rewards/accuracies": 0.5562499761581421,
944
+ "rewards/chosen": -0.35439711809158325,
945
+ "rewards/margins": 0.0739353746175766,
946
+ "rewards/rejected": -0.42833250761032104,
947
+ "step": 660
948
+ },
949
+ {
950
+ "epoch": 0.35,
951
+ "learning_rate": 4.101712993472348e-06,
952
+ "logits/chosen": 0.5095491409301758,
953
+ "logits/rejected": 0.5851413607597351,
954
+ "logps/chosen": -333.0138244628906,
955
+ "logps/rejected": -299.2201232910156,
956
+ "loss": 2114.274,
957
+ "rewards/accuracies": 0.5687500238418579,
958
+ "rewards/chosen": -0.3759954273700714,
959
+ "rewards/margins": 0.07854921370744705,
960
+ "rewards/rejected": -0.45454463362693787,
961
+ "step": 670
962
+ },
963
+ {
964
+ "epoch": 0.36,
965
+ "learning_rate": 4.066367094765091e-06,
966
+ "logits/chosen": 0.5400400161743164,
967
+ "logits/rejected": 0.6057177782058716,
968
+ "logps/chosen": -314.9399108886719,
969
+ "logps/rejected": -299.5291442871094,
970
+ "loss": 2165.3699,
971
+ "rewards/accuracies": 0.6187499761581421,
972
+ "rewards/chosen": -0.3974139094352722,
973
+ "rewards/margins": 0.0817367285490036,
974
+ "rewards/rejected": -0.4791506826877594,
975
+ "step": 680
976
+ },
977
+ {
978
+ "epoch": 0.36,
979
+ "learning_rate": 4.030498042172277e-06,
980
+ "logits/chosen": 0.46717318892478943,
981
+ "logits/rejected": 0.5683552622795105,
982
+ "logps/chosen": -311.09478759765625,
983
+ "logps/rejected": -286.71856689453125,
984
+ "loss": 1970.309,
985
+ "rewards/accuracies": 0.637499988079071,
986
+ "rewards/chosen": -0.36804935336112976,
987
+ "rewards/margins": 0.09274602681398392,
988
+ "rewards/rejected": -0.46079540252685547,
989
+ "step": 690
990
+ },
991
+ {
992
+ "epoch": 0.37,
993
+ "learning_rate": 3.994117815666095e-06,
994
+ "logits/chosen": 0.5168116688728333,
995
+ "logits/rejected": 0.5277969241142273,
996
+ "logps/chosen": -331.3283996582031,
997
+ "logps/rejected": -316.9101257324219,
998
+ "loss": 2094.1586,
999
+ "rewards/accuracies": 0.5874999761581421,
1000
+ "rewards/chosen": -0.3705582618713379,
1001
+ "rewards/margins": 0.07725582271814346,
1002
+ "rewards/rejected": -0.44781407713890076,
1003
+ "step": 700
1004
+ },
1005
+ {
1006
+ "epoch": 0.37,
1007
+ "learning_rate": 3.957238565946672e-06,
1008
+ "logits/chosen": 0.4891355633735657,
1009
+ "logits/rejected": 0.5829272270202637,
1010
+ "logps/chosen": -341.33270263671875,
1011
+ "logps/rejected": -326.12237548828125,
1012
+ "loss": 2015.8363,
1013
+ "rewards/accuracies": 0.668749988079071,
1014
+ "rewards/chosen": -0.3461948037147522,
1015
+ "rewards/margins": 0.1047135442495346,
1016
+ "rewards/rejected": -0.450908362865448,
1017
+ "step": 710
1018
+ },
1019
+ {
1020
+ "epoch": 0.38,
1021
+ "learning_rate": 3.919872610383831e-06,
1022
+ "logits/chosen": 0.5361688733100891,
1023
+ "logits/rejected": 0.4950820505619049,
1024
+ "logps/chosen": -301.51470947265625,
1025
+ "logps/rejected": -264.515625,
1026
+ "loss": 2067.884,
1027
+ "rewards/accuracies": 0.637499988079071,
1028
+ "rewards/chosen": -0.31028127670288086,
1029
+ "rewards/margins": 0.07867135107517242,
1030
+ "rewards/rejected": -0.38895267248153687,
1031
+ "step": 720
1032
+ },
1033
+ {
1034
+ "epoch": 0.38,
1035
+ "learning_rate": 3.882032428903195e-06,
1036
+ "logits/chosen": 0.5467667579650879,
1037
+ "logits/rejected": 0.6214498281478882,
1038
+ "logps/chosen": -310.25006103515625,
1039
+ "logps/rejected": -304.9784240722656,
1040
+ "loss": 1854.108,
1041
+ "rewards/accuracies": 0.6187499761581421,
1042
+ "rewards/chosen": -0.31836968660354614,
1043
+ "rewards/margins": 0.10814164578914642,
1044
+ "rewards/rejected": -0.42651137709617615,
1045
+ "step": 730
1046
+ },
1047
+ {
1048
+ "epoch": 0.39,
1049
+ "learning_rate": 3.84373065981799e-06,
1050
+ "logits/chosen": 0.5906549692153931,
1051
+ "logits/rejected": 0.5477146506309509,
1052
+ "logps/chosen": -336.2082214355469,
1053
+ "logps/rejected": -320.98419189453125,
1054
+ "loss": 1870.9252,
1055
+ "rewards/accuracies": 0.6937500238418579,
1056
+ "rewards/chosen": -0.4078896641731262,
1057
+ "rewards/margins": 0.12890958786010742,
1058
+ "rewards/rejected": -0.5367991924285889,
1059
+ "step": 740
1060
+ },
1061
+ {
1062
+ "epoch": 0.39,
1063
+ "learning_rate": 3.8049800956079552e-06,
1064
+ "logits/chosen": 0.5637394189834595,
1065
+ "logits/rejected": 0.5951513648033142,
1066
+ "logps/chosen": -312.58380126953125,
1067
+ "logps/rejected": -294.388671875,
1068
+ "loss": 2077.7766,
1069
+ "rewards/accuracies": 0.581250011920929,
1070
+ "rewards/chosen": -0.4279128909111023,
1071
+ "rewards/margins": 0.0825895294547081,
1072
+ "rewards/rejected": -0.5105024576187134,
1073
+ "step": 750
1074
+ },
1075
+ {
1076
+ "epoch": 0.4,
1077
+ "learning_rate": 3.765793678646753e-06,
1078
+ "logits/chosen": 0.49519261717796326,
1079
+ "logits/rejected": 0.6189637184143066,
1080
+ "logps/chosen": -306.1346435546875,
1081
+ "logps/rejected": -292.5857238769531,
1082
+ "loss": 1969.0184,
1083
+ "rewards/accuracies": 0.6499999761581421,
1084
+ "rewards/chosen": -0.36586111783981323,
1085
+ "rewards/margins": 0.12026441097259521,
1086
+ "rewards/rejected": -0.48612555861473083,
1087
+ "step": 760
1088
+ },
1089
+ {
1090
+ "epoch": 0.4,
1091
+ "learning_rate": 3.726184496879323e-06,
1092
+ "logits/chosen": 0.4921692907810211,
1093
+ "logits/rejected": 0.501875102519989,
1094
+ "logps/chosen": -299.2643127441406,
1095
+ "logps/rejected": -286.10308837890625,
1096
+ "loss": 2031.8379,
1097
+ "rewards/accuracies": 0.612500011920929,
1098
+ "rewards/chosen": -0.4021153450012207,
1099
+ "rewards/margins": 0.09204810857772827,
1100
+ "rewards/rejected": -0.494163453578949,
1101
+ "step": 770
1102
+ },
1103
+ {
1104
+ "epoch": 0.41,
1105
+ "learning_rate": 3.686165779450619e-06,
1106
+ "logits/chosen": 0.4793570935726166,
1107
+ "logits/rejected": 0.529563307762146,
1108
+ "logps/chosen": -292.5688171386719,
1109
+ "logps/rejected": -271.9722595214844,
1110
+ "loss": 2042.7758,
1111
+ "rewards/accuracies": 0.5625,
1112
+ "rewards/chosen": -0.35684093832969666,
1113
+ "rewards/margins": 0.09431228041648865,
1114
+ "rewards/rejected": -0.4511532187461853,
1115
+ "step": 780
1116
+ },
1117
+ {
1118
+ "epoch": 0.41,
1119
+ "learning_rate": 3.645750892287178e-06,
1120
+ "logits/chosen": 0.5109771490097046,
1121
+ "logits/rejected": 0.5633312463760376,
1122
+ "logps/chosen": -340.49176025390625,
1123
+ "logps/rejected": -291.06756591796875,
1124
+ "loss": 1950.8371,
1125
+ "rewards/accuracies": 0.6312500238418579,
1126
+ "rewards/chosen": -0.33940690755844116,
1127
+ "rewards/margins": 0.10259196907281876,
1128
+ "rewards/rejected": -0.4419988691806793,
1129
+ "step": 790
1130
+ },
1131
+ {
1132
+ "epoch": 0.42,
1133
+ "learning_rate": 3.604953333633009e-06,
1134
+ "logits/chosen": 0.5391252636909485,
1135
+ "logits/rejected": 0.5337890982627869,
1136
+ "logps/chosen": -289.8169860839844,
1137
+ "logps/rejected": -260.87677001953125,
1138
+ "loss": 1936.325,
1139
+ "rewards/accuracies": 0.612500011920929,
1140
+ "rewards/chosen": -0.32662123441696167,
1141
+ "rewards/margins": 0.08759995549917221,
1142
+ "rewards/rejected": -0.4142211973667145,
1143
+ "step": 800
1144
+ },
1145
+ {
1146
+ "epoch": 0.42,
1147
+ "learning_rate": 3.56378672954129e-06,
1148
+ "logits/chosen": 0.5750656127929688,
1149
+ "logits/rejected": 0.6396702527999878,
1150
+ "logps/chosen": -315.6180114746094,
1151
+ "logps/rejected": -275.87847900390625,
1152
+ "loss": 1988.6426,
1153
+ "rewards/accuracies": 0.6000000238418579,
1154
+ "rewards/chosen": -0.3871499300003052,
1155
+ "rewards/margins": 0.11068111658096313,
1156
+ "rewards/rejected": -0.4978310167789459,
1157
+ "step": 810
1158
+ },
1159
+ {
1160
+ "epoch": 0.43,
1161
+ "learning_rate": 3.5222648293233806e-06,
1162
+ "logits/chosen": 0.5217655897140503,
1163
+ "logits/rejected": 0.5655398368835449,
1164
+ "logps/chosen": -312.867919921875,
1165
+ "logps/rejected": -288.189697265625,
1166
+ "loss": 2051.176,
1167
+ "rewards/accuracies": 0.6000000238418579,
1168
+ "rewards/chosen": -0.36912721395492554,
1169
+ "rewards/margins": 0.11765513569116592,
1170
+ "rewards/rejected": -0.48678237199783325,
1171
+ "step": 820
1172
+ },
1173
+ {
1174
+ "epoch": 0.43,
1175
+ "learning_rate": 3.4804015009566573e-06,
1176
+ "logits/chosen": 0.5041700005531311,
1177
+ "logits/rejected": 0.5614610314369202,
1178
+ "logps/chosen": -299.4610595703125,
1179
+ "logps/rejected": -294.58990478515625,
1180
+ "loss": 1867.1797,
1181
+ "rewards/accuracies": 0.6625000238418579,
1182
+ "rewards/chosen": -0.3925167918205261,
1183
+ "rewards/margins": 0.11273722350597382,
1184
+ "rewards/rejected": -0.5052539706230164,
1185
+ "step": 830
1186
+ },
1187
+ {
1188
+ "epoch": 0.44,
1189
+ "learning_rate": 3.4382107264527244e-06,
1190
+ "logits/chosen": 0.5158332586288452,
1191
+ "logits/rejected": 0.5967845916748047,
1192
+ "logps/chosen": -292.78387451171875,
1193
+ "logps/rejected": -286.22259521484375,
1194
+ "loss": 2125.177,
1195
+ "rewards/accuracies": 0.581250011920929,
1196
+ "rewards/chosen": -0.38172584772109985,
1197
+ "rewards/margins": 0.08153903484344482,
1198
+ "rewards/rejected": -0.4632648825645447,
1199
+ "step": 840
1200
+ },
1201
+ {
1202
+ "epoch": 0.44,
1203
+ "learning_rate": 3.3957065971875387e-06,
1204
+ "logits/chosen": 0.5155011415481567,
1205
+ "logits/rejected": 0.6147416234016418,
1206
+ "logps/chosen": -309.46826171875,
1207
+ "logps/rejected": -292.97857666015625,
1208
+ "loss": 2040.8613,
1209
+ "rewards/accuracies": 0.6000000238418579,
1210
+ "rewards/chosen": -0.3627835512161255,
1211
+ "rewards/margins": 0.08301069587469101,
1212
+ "rewards/rejected": -0.4457942843437195,
1213
+ "step": 850
1214
+ },
1215
+ {
1216
+ "epoch": 0.45,
1217
+ "learning_rate": 3.352903309194999e-06,
1218
+ "logits/chosen": 0.49975937604904175,
1219
+ "logits/rejected": 0.5919264554977417,
1220
+ "logps/chosen": -314.7627868652344,
1221
+ "logps/rejected": -304.3371276855469,
1222
+ "loss": 2078.6633,
1223
+ "rewards/accuracies": 0.643750011920929,
1224
+ "rewards/chosen": -0.3559049963951111,
1225
+ "rewards/margins": 0.08720938861370087,
1226
+ "rewards/rejected": -0.44311434030532837,
1227
+ "step": 860
1228
+ },
1229
+ {
1230
+ "epoch": 0.46,
1231
+ "learning_rate": 3.309815158425591e-06,
1232
+ "logits/chosen": 0.5549314618110657,
1233
+ "logits/rejected": 0.45847368240356445,
1234
+ "logps/chosen": -302.06573486328125,
1235
+ "logps/rejected": -272.84136962890625,
1236
+ "loss": 2036.8318,
1237
+ "rewards/accuracies": 0.6000000238418579,
1238
+ "rewards/chosen": -0.35362687706947327,
1239
+ "rewards/margins": 0.08619710803031921,
1240
+ "rewards/rejected": -0.4398239552974701,
1241
+ "step": 870
1242
+ },
1243
+ {
1244
+ "epoch": 0.46,
1245
+ "learning_rate": 3.266456535971654e-06,
1246
+ "logits/chosen": 0.5452480912208557,
1247
+ "logits/rejected": 0.6346302628517151,
1248
+ "logps/chosen": -334.15423583984375,
1249
+ "logps/rejected": -286.4046630859375,
1250
+ "loss": 2093.8855,
1251
+ "rewards/accuracies": 0.637499988079071,
1252
+ "rewards/chosen": -0.37248533964157104,
1253
+ "rewards/margins": 0.08576939254999161,
1254
+ "rewards/rejected": -0.45825472474098206,
1255
+ "step": 880
1256
+ },
1257
+ {
1258
+ "epoch": 0.47,
1259
+ "learning_rate": 3.2228419232608692e-06,
1260
+ "logits/chosen": 0.535306990146637,
1261
+ "logits/rejected": 0.4712342321872711,
1262
+ "logps/chosen": -282.85968017578125,
1263
+ "logps/rejected": -274.203125,
1264
+ "loss": 2098.0096,
1265
+ "rewards/accuracies": 0.612500011920929,
1266
+ "rewards/chosen": -0.34499186277389526,
1267
+ "rewards/margins": 0.08891385793685913,
1268
+ "rewards/rejected": -0.433905690908432,
1269
+ "step": 890
1270
+ },
1271
+ {
1272
+ "epoch": 0.47,
1273
+ "learning_rate": 3.1789858872195888e-06,
1274
+ "logits/chosen": 0.6342380046844482,
1275
+ "logits/rejected": 0.629281759262085,
1276
+ "logps/chosen": -286.6862487792969,
1277
+ "logps/rejected": -271.05792236328125,
1278
+ "loss": 1953.5867,
1279
+ "rewards/accuracies": 0.6312500238418579,
1280
+ "rewards/chosen": -0.3626910150051117,
1281
+ "rewards/margins": 0.0837903618812561,
1282
+ "rewards/rejected": -0.4464813768863678,
1283
+ "step": 900
1284
+ },
1285
+ {
1286
+ "epoch": 0.48,
1287
+ "learning_rate": 3.1349030754075945e-06,
1288
+ "logits/chosen": 0.5246872901916504,
1289
+ "logits/rejected": 0.5165098905563354,
1290
+ "logps/chosen": -342.14508056640625,
1291
+ "logps/rejected": -293.86541748046875,
1292
+ "loss": 1883.041,
1293
+ "rewards/accuracies": 0.5687500238418579,
1294
+ "rewards/chosen": -0.3446907103061676,
1295
+ "rewards/margins": 0.09030432999134064,
1296
+ "rewards/rejected": -0.43499502539634705,
1297
+ "step": 910
1298
+ },
1299
+ {
1300
+ "epoch": 0.48,
1301
+ "learning_rate": 3.0906082111259313e-06,
1302
+ "logits/chosen": 0.5085369944572449,
1303
+ "logits/rejected": 0.6210469603538513,
1304
+ "logps/chosen": -324.135498046875,
1305
+ "logps/rejected": -288.7994079589844,
1306
+ "loss": 2054.7223,
1307
+ "rewards/accuracies": 0.6000000238418579,
1308
+ "rewards/chosen": -0.4166959822177887,
1309
+ "rewards/margins": 0.06678648293018341,
1310
+ "rewards/rejected": -0.4834825098514557,
1311
+ "step": 920
1312
+ },
1313
+ {
1314
+ "epoch": 0.49,
1315
+ "learning_rate": 3.046116088499449e-06,
1316
+ "logits/chosen": 0.4991762638092041,
1317
+ "logits/rejected": 0.5716468691825867,
1318
+ "logps/chosen": -322.47100830078125,
1319
+ "logps/rejected": -305.85247802734375,
1320
+ "loss": 1875.9535,
1321
+ "rewards/accuracies": 0.643750011920929,
1322
+ "rewards/chosen": -0.4109939634799957,
1323
+ "rewards/margins": 0.1062377318739891,
1324
+ "rewards/rejected": -0.5172317028045654,
1325
+ "step": 930
1326
+ },
1327
+ {
1328
+ "epoch": 0.49,
1329
+ "learning_rate": 3.0014415675356813e-06,
1330
+ "logits/chosen": 0.5478680729866028,
1331
+ "logits/rejected": 0.5703476667404175,
1332
+ "logps/chosen": -349.9249572753906,
1333
+ "logps/rejected": -293.72625732421875,
1334
+ "loss": 2017.5168,
1335
+ "rewards/accuracies": 0.637499988079071,
1336
+ "rewards/chosen": -0.42143529653549194,
1337
+ "rewards/margins": 0.10333251953125,
1338
+ "rewards/rejected": -0.5247678160667419,
1339
+ "step": 940
1340
+ },
1341
+ {
1342
+ "epoch": 0.5,
1343
+ "learning_rate": 2.9565995691617242e-06,
1344
+ "logits/chosen": 0.5334219932556152,
1345
+ "logits/rejected": 0.5596605539321899,
1346
+ "logps/chosen": -266.18426513671875,
1347
+ "logps/rejected": -274.7750244140625,
1348
+ "loss": 2175.4988,
1349
+ "rewards/accuracies": 0.550000011920929,
1350
+ "rewards/chosen": -0.4017234444618225,
1351
+ "rewards/margins": 0.06810633838176727,
1352
+ "rewards/rejected": -0.469829797744751,
1353
+ "step": 950
1354
+ },
1355
+ {
1356
+ "epoch": 0.5,
1357
+ "learning_rate": 2.9116050702407706e-06,
1358
+ "logits/chosen": 0.602304995059967,
1359
+ "logits/rejected": 0.5311247110366821,
1360
+ "logps/chosen": -277.86669921875,
1361
+ "logps/rejected": -269.7265319824219,
1362
+ "loss": 2177.3363,
1363
+ "rewards/accuracies": 0.5375000238418579,
1364
+ "rewards/chosen": -0.4107086658477783,
1365
+ "rewards/margins": 0.04552067071199417,
1366
+ "rewards/rejected": -0.4562292993068695,
1367
+ "step": 960
1368
+ },
1369
+ {
1370
+ "epoch": 0.51,
1371
+ "learning_rate": 2.8664730985699537e-06,
1372
+ "logits/chosen": 0.503399670124054,
1373
+ "logits/rejected": 0.639790415763855,
1374
+ "logps/chosen": -273.44622802734375,
1375
+ "logps/rejected": -260.8460693359375,
1376
+ "loss": 2045.4781,
1377
+ "rewards/accuracies": 0.550000011920929,
1378
+ "rewards/chosen": -0.35943537950515747,
1379
+ "rewards/margins": 0.06276627629995346,
1380
+ "rewards/rejected": -0.42220163345336914,
1381
+ "step": 970
1382
+ },
1383
+ {
1384
+ "epoch": 0.51,
1385
+ "learning_rate": 2.8212187278611907e-06,
1386
+ "logits/chosen": 0.5885658860206604,
1387
+ "logits/rejected": 0.5756844282150269,
1388
+ "logps/chosen": -325.32952880859375,
1389
+ "logps/rejected": -282.21453857421875,
1390
+ "loss": 1891.5383,
1391
+ "rewards/accuracies": 0.6625000238418579,
1392
+ "rewards/chosen": -0.33539459109306335,
1393
+ "rewards/margins": 0.12442357838153839,
1394
+ "rewards/rejected": -0.45981818437576294,
1395
+ "step": 980
1396
+ },
1397
+ {
1398
+ "epoch": 0.52,
1399
+ "learning_rate": 2.7758570727066843e-06,
1400
+ "logits/chosen": 0.6374679803848267,
1401
+ "logits/rejected": 0.6624254584312439,
1402
+ "logps/chosen": -295.37738037109375,
1403
+ "logps/rejected": -270.3449401855469,
1404
+ "loss": 1947.8027,
1405
+ "rewards/accuracies": 0.606249988079071,
1406
+ "rewards/chosen": -0.31836217641830444,
1407
+ "rewards/margins": 0.11628556251525879,
1408
+ "rewards/rejected": -0.43464773893356323,
1409
+ "step": 990
1410
+ },
1411
+ {
1412
+ "epoch": 0.52,
1413
+ "learning_rate": 2.730403283530767e-06,
1414
+ "logits/chosen": 0.5361552238464355,
1415
+ "logits/rejected": 0.5510913133621216,
1416
+ "logps/chosen": -301.1198425292969,
1417
+ "logps/rejected": -294.21234130859375,
1418
+ "loss": 2041.7438,
1419
+ "rewards/accuracies": 0.6187499761581421,
1420
+ "rewards/chosen": -0.34723925590515137,
1421
+ "rewards/margins": 0.09882111847400665,
1422
+ "rewards/rejected": -0.44606032967567444,
1423
+ "step": 1000
1424
+ },
1425
+ {
1426
+ "epoch": 0.53,
1427
+ "learning_rate": 2.6848725415297888e-06,
1428
+ "logits/chosen": 0.4825092852115631,
1429
+ "logits/rejected": 0.5265048742294312,
1430
+ "logps/chosen": -304.63140869140625,
1431
+ "logps/rejected": -276.94464111328125,
1432
+ "loss": 1806.9965,
1433
+ "rewards/accuracies": 0.675000011920929,
1434
+ "rewards/chosen": -0.360734760761261,
1435
+ "rewards/margins": 0.12996909022331238,
1436
+ "rewards/rejected": -0.49070388078689575,
1437
+ "step": 1010
1438
+ },
1439
+ {
1440
+ "epoch": 0.53,
1441
+ "learning_rate": 2.639280053601719e-06,
1442
+ "logits/chosen": 0.5663384199142456,
1443
+ "logits/rejected": 0.6073741912841797,
1444
+ "logps/chosen": -316.7065124511719,
1445
+ "logps/rejected": -322.63494873046875,
1446
+ "loss": 2131.3072,
1447
+ "rewards/accuracies": 0.6000000238418579,
1448
+ "rewards/chosen": -0.40306559205055237,
1449
+ "rewards/margins": 0.07923634350299835,
1450
+ "rewards/rejected": -0.48230189085006714,
1451
+ "step": 1020
1452
+ },
1453
+ {
1454
+ "epoch": 0.54,
1455
+ "learning_rate": 2.59364104726716e-06,
1456
+ "logits/chosen": 0.5961092710494995,
1457
+ "logits/rejected": 0.6114310622215271,
1458
+ "logps/chosen": -321.1643981933594,
1459
+ "logps/rejected": -285.0345153808594,
1460
+ "loss": 1963.3941,
1461
+ "rewards/accuracies": 0.625,
1462
+ "rewards/chosen": -0.3763844072818756,
1463
+ "rewards/margins": 0.11694605648517609,
1464
+ "rewards/rejected": -0.4933304190635681,
1465
+ "step": 1030
1466
+ },
1467
+ {
1468
+ "epoch": 0.54,
1469
+ "learning_rate": 2.547970765583491e-06,
1470
+ "logits/chosen": 0.5694259405136108,
1471
+ "logits/rejected": 0.5002479553222656,
1472
+ "logps/chosen": -326.5146484375,
1473
+ "logps/rejected": -302.0265808105469,
1474
+ "loss": 2105.7691,
1475
+ "rewards/accuracies": 0.643750011920929,
1476
+ "rewards/chosen": -0.40365272760391235,
1477
+ "rewards/margins": 0.09866581857204437,
1478
+ "rewards/rejected": -0.5023185014724731,
1479
+ "step": 1040
1480
+ },
1481
+ {
1482
+ "epoch": 0.55,
1483
+ "learning_rate": 2.502284462053799e-06,
1484
+ "logits/chosen": 0.48704004287719727,
1485
+ "logits/rejected": 0.5804616808891296,
1486
+ "logps/chosen": -332.91162109375,
1487
+ "logps/rejected": -303.14617919921875,
1488
+ "loss": 1834.8215,
1489
+ "rewards/accuracies": 0.675000011920929,
1490
+ "rewards/chosen": -0.3842596411705017,
1491
+ "rewards/margins": 0.12224143743515015,
1492
+ "rewards/rejected": -0.5065010786056519,
1493
+ "step": 1050
1494
+ },
1495
+ {
1496
+ "epoch": 0.55,
1497
+ "learning_rate": 2.456597395532338e-06,
1498
+ "logits/chosen": 0.4348418116569519,
1499
+ "logits/rejected": 0.5593420267105103,
1500
+ "logps/chosen": -335.1719055175781,
1501
+ "logps/rejected": -308.69830322265625,
1502
+ "loss": 1984.7055,
1503
+ "rewards/accuracies": 0.6499999761581421,
1504
+ "rewards/chosen": -0.3781774044036865,
1505
+ "rewards/margins": 0.09801622480154037,
1506
+ "rewards/rejected": -0.4761936068534851,
1507
+ "step": 1060
1508
+ },
1509
+ {
1510
+ "epoch": 0.56,
1511
+ "learning_rate": 2.4109248251281953e-06,
1512
+ "logits/chosen": 0.5774113535881042,
1513
+ "logits/rejected": 0.6447092294692993,
1514
+ "logps/chosen": -314.6346740722656,
1515
+ "logps/rejected": -309.253173828125,
1516
+ "loss": 2091.5516,
1517
+ "rewards/accuracies": 0.625,
1518
+ "rewards/chosen": -0.3814038932323456,
1519
+ "rewards/margins": 0.10535021126270294,
1520
+ "rewards/rejected": -0.48675408959388733,
1521
+ "step": 1070
1522
+ },
1523
+ {
1524
+ "epoch": 0.57,
1525
+ "learning_rate": 2.365282005108875e-06,
1526
+ "logits/chosen": 0.5727014541625977,
1527
+ "logits/rejected": 0.5744349360466003,
1528
+ "logps/chosen": -324.27154541015625,
1529
+ "logps/rejected": -283.59686279296875,
1530
+ "loss": 1978.2016,
1531
+ "rewards/accuracies": 0.6187499761581421,
1532
+ "rewards/chosen": -0.3670370876789093,
1533
+ "rewards/margins": 0.101667121052742,
1534
+ "rewards/rejected": -0.46870413422584534,
1535
+ "step": 1080
1536
+ },
1537
+ {
1538
+ "epoch": 0.57,
1539
+ "learning_rate": 2.319684179805491e-06,
1540
+ "logits/chosen": 0.51801598072052,
1541
+ "logits/rejected": 0.5624555349349976,
1542
+ "logps/chosen": -300.46734619140625,
1543
+ "logps/rejected": -287.84307861328125,
1544
+ "loss": 1968.3572,
1545
+ "rewards/accuracies": 0.625,
1546
+ "rewards/chosen": -0.34267181158065796,
1547
+ "rewards/margins": 0.11022396385669708,
1548
+ "rewards/rejected": -0.45289579033851624,
1549
+ "step": 1090
1550
+ },
1551
+ {
1552
+ "epoch": 0.58,
1553
+ "learning_rate": 2.2741465785212905e-06,
1554
+ "logits/chosen": 0.5614932775497437,
1555
+ "logits/rejected": 0.5673717856407166,
1556
+ "logps/chosen": -306.4109191894531,
1557
+ "logps/rejected": -280.79205322265625,
1558
+ "loss": 2043.6598,
1559
+ "rewards/accuracies": 0.574999988079071,
1560
+ "rewards/chosen": -0.3364596962928772,
1561
+ "rewards/margins": 0.0902407318353653,
1562
+ "rewards/rejected": -0.42670050263404846,
1563
+ "step": 1100
1564
+ },
1565
+ {
1566
+ "epoch": 0.58,
1567
+ "learning_rate": 2.2286844104451848e-06,
1568
+ "logits/chosen": 0.5359445810317993,
1569
+ "logits/rejected": 0.6252576112747192,
1570
+ "logps/chosen": -322.9229736328125,
1571
+ "logps/rejected": -268.3149108886719,
1572
+ "loss": 2021.5035,
1573
+ "rewards/accuracies": 0.6000000238418579,
1574
+ "rewards/chosen": -0.3492088317871094,
1575
+ "rewards/margins": 0.09384563565254211,
1576
+ "rewards/rejected": -0.4430544972419739,
1577
+ "step": 1110
1578
+ },
1579
+ {
1580
+ "epoch": 0.59,
1581
+ "learning_rate": 2.183312859572008e-06,
1582
+ "logits/chosen": 0.4629904627799988,
1583
+ "logits/rejected": 0.6062875390052795,
1584
+ "logps/chosen": -305.40814208984375,
1585
+ "logps/rejected": -251.60107421875,
1586
+ "loss": 2015.4504,
1587
+ "rewards/accuracies": 0.6000000238418579,
1588
+ "rewards/chosen": -0.33251887559890747,
1589
+ "rewards/margins": 0.09136182069778442,
1590
+ "rewards/rejected": -0.4238806664943695,
1591
+ "step": 1120
1592
+ },
1593
+ {
1594
+ "epoch": 0.59,
1595
+ "learning_rate": 2.1380470796311843e-06,
1596
+ "logits/chosen": 0.5688912868499756,
1597
+ "logits/rejected": 0.5831522941589355,
1598
+ "logps/chosen": -300.88079833984375,
1599
+ "logps/rejected": -278.1544494628906,
1600
+ "loss": 2065.1121,
1601
+ "rewards/accuracies": 0.6187499761581421,
1602
+ "rewards/chosen": -0.35972604155540466,
1603
+ "rewards/margins": 0.08174722641706467,
1604
+ "rewards/rejected": -0.4414733052253723,
1605
+ "step": 1130
1606
+ },
1607
+ {
1608
+ "epoch": 0.6,
1609
+ "learning_rate": 2.092902189025507e-06,
1610
+ "logits/chosen": 0.5642494559288025,
1611
+ "logits/rejected": 0.5858667492866516,
1612
+ "logps/chosen": -290.29217529296875,
1613
+ "logps/rejected": -271.4120788574219,
1614
+ "loss": 1997.6766,
1615
+ "rewards/accuracies": 0.6625000238418579,
1616
+ "rewards/chosen": -0.33833402395248413,
1617
+ "rewards/margins": 0.11596833169460297,
1618
+ "rewards/rejected": -0.4543024003505707,
1619
+ "step": 1140
1620
+ },
1621
+ {
1622
+ "epoch": 0.6,
1623
+ "learning_rate": 2.0478932657817105e-06,
1624
+ "logits/chosen": 0.6026065349578857,
1625
+ "logits/rejected": 0.637354850769043,
1626
+ "logps/chosen": -318.32611083984375,
1627
+ "logps/rejected": -278.863037109375,
1628
+ "loss": 2020.9238,
1629
+ "rewards/accuracies": 0.625,
1630
+ "rewards/chosen": -0.3429223895072937,
1631
+ "rewards/margins": 0.10237185657024384,
1632
+ "rewards/rejected": -0.4452942907810211,
1633
+ "step": 1150
1634
+ },
1635
+ {
1636
+ "epoch": 0.61,
1637
+ "learning_rate": 2.0030353425145376e-06,
1638
+ "logits/chosen": 0.5387909412384033,
1639
+ "logits/rejected": 0.5283801555633545,
1640
+ "logps/chosen": -281.98944091796875,
1641
+ "logps/rejected": -282.2474060058594,
1642
+ "loss": 2121.0545,
1643
+ "rewards/accuracies": 0.6000000238418579,
1644
+ "rewards/chosen": -0.40112051367759705,
1645
+ "rewards/margins": 0.060013122856616974,
1646
+ "rewards/rejected": -0.4611336290836334,
1647
+ "step": 1160
1648
+ },
1649
+ {
1650
+ "epoch": 0.61,
1651
+ "learning_rate": 1.958343401405964e-06,
1652
+ "logits/chosen": 0.5407668352127075,
1653
+ "logits/rejected": 0.5168458223342896,
1654
+ "logps/chosen": -271.4942321777344,
1655
+ "logps/rejected": -262.18310546875,
1656
+ "loss": 1999.9113,
1657
+ "rewards/accuracies": 0.581250011920929,
1658
+ "rewards/chosen": -0.3525828719139099,
1659
+ "rewards/margins": 0.07946565002202988,
1660
+ "rewards/rejected": -0.4320485591888428,
1661
+ "step": 1170
1662
+ },
1663
+ {
1664
+ "epoch": 0.62,
1665
+ "learning_rate": 1.9138323692012734e-06,
1666
+ "logits/chosen": 0.5828490257263184,
1667
+ "logits/rejected": 0.6618238091468811,
1668
+ "logps/chosen": -355.33355712890625,
1669
+ "logps/rejected": -293.50323486328125,
1670
+ "loss": 2052.1723,
1671
+ "rewards/accuracies": 0.6187499761581421,
1672
+ "rewards/chosen": -0.3654106557369232,
1673
+ "rewards/margins": 0.09124873578548431,
1674
+ "rewards/rejected": -0.45665937662124634,
1675
+ "step": 1180
1676
+ },
1677
+ {
1678
+ "epoch": 0.62,
1679
+ "learning_rate": 1.8695171122236443e-06,
1680
+ "logits/chosen": 0.5557527542114258,
1681
+ "logits/rejected": 0.6298555135726929,
1682
+ "logps/chosen": -312.16094970703125,
1683
+ "logps/rejected": -288.3011779785156,
1684
+ "loss": 1976.1777,
1685
+ "rewards/accuracies": 0.6625000238418579,
1686
+ "rewards/chosen": -0.3318810760974884,
1687
+ "rewards/margins": 0.12243027985095978,
1688
+ "rewards/rejected": -0.4543113708496094,
1689
+ "step": 1190
1690
+ },
1691
+ {
1692
+ "epoch": 0.63,
1693
+ "learning_rate": 1.8254124314089225e-06,
1694
+ "logits/chosen": 0.49655452370643616,
1695
+ "logits/rejected": 0.6068973541259766,
1696
+ "logps/chosen": -294.2212829589844,
1697
+ "logps/rejected": -318.2776794433594,
1698
+ "loss": 2004.0912,
1699
+ "rewards/accuracies": 0.6499999761581421,
1700
+ "rewards/chosen": -0.33194708824157715,
1701
+ "rewards/margins": 0.09898443520069122,
1702
+ "rewards/rejected": -0.43093156814575195,
1703
+ "step": 1200
1704
+ },
1705
+ {
1706
+ "epoch": 0.63,
1707
+ "learning_rate": 1.781533057362221e-06,
1708
+ "logits/chosen": 0.5596984028816223,
1709
+ "logits/rejected": 0.5853079557418823,
1710
+ "logps/chosen": -322.6697082519531,
1711
+ "logps/rejected": -272.7930603027344,
1712
+ "loss": 1955.9035,
1713
+ "rewards/accuracies": 0.6625000238418579,
1714
+ "rewards/chosen": -0.369145929813385,
1715
+ "rewards/margins": 0.12208880484104156,
1716
+ "rewards/rejected": -0.491234689950943,
1717
+ "step": 1210
1718
+ },
1719
+ {
1720
+ "epoch": 0.64,
1721
+ "learning_rate": 1.7378936454380277e-06,
1722
+ "logits/chosen": 0.5217611193656921,
1723
+ "logits/rejected": 0.5789054036140442,
1724
+ "logps/chosen": -336.248779296875,
1725
+ "logps/rejected": -306.63067626953125,
1726
+ "loss": 1938.6953,
1727
+ "rewards/accuracies": 0.6312500238418579,
1728
+ "rewards/chosen": -0.3923150599002838,
1729
+ "rewards/margins": 0.09614621102809906,
1730
+ "rewards/rejected": -0.48846131563186646,
1731
+ "step": 1220
1732
+ },
1733
+ {
1734
+ "epoch": 0.64,
1735
+ "learning_rate": 1.6945087708454273e-06,
1736
+ "logits/chosen": 0.5122044682502747,
1737
+ "logits/rejected": 0.54069584608078,
1738
+ "logps/chosen": -292.97711181640625,
1739
+ "logps/rejected": -275.40069580078125,
1740
+ "loss": 2029.4551,
1741
+ "rewards/accuracies": 0.6499999761581421,
1742
+ "rewards/chosen": -0.4053107798099518,
1743
+ "rewards/margins": 0.08846473693847656,
1744
+ "rewards/rejected": -0.49377545714378357,
1745
+ "step": 1230
1746
+ },
1747
+ {
1748
+ "epoch": 0.65,
1749
+ "learning_rate": 1.651392923780105e-06,
1750
+ "logits/chosen": 0.5897082090377808,
1751
+ "logits/rejected": 0.5889968276023865,
1752
+ "logps/chosen": -332.0108947753906,
1753
+ "logps/rejected": -292.29119873046875,
1754
+ "loss": 1970.8734,
1755
+ "rewards/accuracies": 0.606249988079071,
1756
+ "rewards/chosen": -0.38728633522987366,
1757
+ "rewards/margins": 0.11461566388607025,
1758
+ "rewards/rejected": -0.5019019842147827,
1759
+ "step": 1240
1760
+ },
1761
+ {
1762
+ "epoch": 0.65,
1763
+ "learning_rate": 1.608560504584737e-06,
1764
+ "logits/chosen": 0.5301613807678223,
1765
+ "logits/rejected": 0.6047118902206421,
1766
+ "logps/chosen": -335.94598388671875,
1767
+ "logps/rejected": -308.5770263671875,
1768
+ "loss": 1999.6207,
1769
+ "rewards/accuracies": 0.6812499761581421,
1770
+ "rewards/chosen": -0.41463202238082886,
1771
+ "rewards/margins": 0.1080860048532486,
1772
+ "rewards/rejected": -0.522718071937561,
1773
+ "step": 1250
1774
+ },
1775
+ {
1776
+ "epoch": 0.66,
1777
+ "learning_rate": 1.5660258189393945e-06,
1778
+ "logits/chosen": 0.46664172410964966,
1779
+ "logits/rejected": 0.6063970923423767,
1780
+ "logps/chosen": -292.8914489746094,
1781
+ "logps/rejected": -287.7743835449219,
1782
+ "loss": 1998.873,
1783
+ "rewards/accuracies": 0.59375,
1784
+ "rewards/chosen": -0.41233953833580017,
1785
+ "rewards/margins": 0.0787600427865982,
1786
+ "rewards/rejected": -0.49109959602355957,
1787
+ "step": 1260
1788
+ },
1789
+ {
1790
+ "epoch": 0.66,
1791
+ "learning_rate": 1.5238030730835578e-06,
1792
+ "logits/chosen": 0.5474873185157776,
1793
+ "logits/rejected": 0.6035341024398804,
1794
+ "logps/chosen": -316.2177734375,
1795
+ "logps/rejected": -293.90936279296875,
1796
+ "loss": 2059.1195,
1797
+ "rewards/accuracies": 0.6187499761581421,
1798
+ "rewards/chosen": -0.4276704788208008,
1799
+ "rewards/margins": 0.09831003844738007,
1800
+ "rewards/rejected": -0.525980532169342,
1801
+ "step": 1270
1802
+ },
1803
+ {
1804
+ "epoch": 0.67,
1805
+ "learning_rate": 1.4819063690713565e-06,
1806
+ "logits/chosen": 0.631860077381134,
1807
+ "logits/rejected": 0.604947566986084,
1808
+ "logps/chosen": -280.0582275390625,
1809
+ "logps/rejected": -259.1899108886719,
1810
+ "loss": 1954.3059,
1811
+ "rewards/accuracies": 0.5625,
1812
+ "rewards/chosen": -0.42631417512893677,
1813
+ "rewards/margins": 0.09757888317108154,
1814
+ "rewards/rejected": -0.5238930583000183,
1815
+ "step": 1280
1816
+ },
1817
+ {
1818
+ "epoch": 0.68,
1819
+ "learning_rate": 1.4403497000615885e-06,
1820
+ "logits/chosen": 0.5674291253089905,
1821
+ "logits/rejected": 0.5523722767829895,
1822
+ "logps/chosen": -313.6990661621094,
1823
+ "logps/rejected": -303.3889465332031,
1824
+ "loss": 1995.9789,
1825
+ "rewards/accuracies": 0.612500011920929,
1826
+ "rewards/chosen": -0.4305340647697449,
1827
+ "rewards/margins": 0.10536874830722809,
1828
+ "rewards/rejected": -0.5359027981758118,
1829
+ "step": 1290
1830
+ },
1831
+ {
1832
+ "epoch": 0.68,
1833
+ "learning_rate": 1.3991469456441273e-06,
1834
+ "logits/chosen": 0.5443872809410095,
1835
+ "logits/rejected": 0.629615843296051,
1836
+ "logps/chosen": -343.49072265625,
1837
+ "logps/rejected": -309.73480224609375,
1838
+ "loss": 1837.8883,
1839
+ "rewards/accuracies": 0.699999988079071,
1840
+ "rewards/chosen": -0.43309682607650757,
1841
+ "rewards/margins": 0.1318664252758026,
1842
+ "rewards/rejected": -0.5649632811546326,
1843
+ "step": 1300
1844
+ },
1845
+ {
1846
+ "epoch": 0.69,
1847
+ "learning_rate": 1.3583118672042441e-06,
1848
+ "logits/chosen": 0.43076056241989136,
1849
+ "logits/rejected": 0.4598851799964905,
1850
+ "logps/chosen": -333.4667053222656,
1851
+ "logps/rejected": -277.95086669921875,
1852
+ "loss": 1996.2734,
1853
+ "rewards/accuracies": 0.637499988079071,
1854
+ "rewards/chosen": -0.44571250677108765,
1855
+ "rewards/margins": 0.10792438685894012,
1856
+ "rewards/rejected": -0.553636908531189,
1857
+ "step": 1310
1858
+ },
1859
+ {
1860
+ "epoch": 0.69,
1861
+ "learning_rate": 1.3178581033264218e-06,
1862
+ "logits/chosen": 0.5987478494644165,
1863
+ "logits/rejected": 0.571110725402832,
1864
+ "logps/chosen": -277.62542724609375,
1865
+ "logps/rejected": -271.3750305175781,
1866
+ "loss": 2000.3379,
1867
+ "rewards/accuracies": 0.6000000238418579,
1868
+ "rewards/chosen": -0.4611617624759674,
1869
+ "rewards/margins": 0.09958922863006592,
1870
+ "rewards/rejected": -0.5607509613037109,
1871
+ "step": 1320
1872
+ },
1873
+ {
1874
+ "epoch": 0.7,
1875
+ "learning_rate": 1.2777991652391757e-06,
1876
+ "logits/chosen": 0.5397824048995972,
1877
+ "logits/rejected": 0.5152544379234314,
1878
+ "logps/chosen": -285.2364501953125,
1879
+ "logps/rejected": -273.76214599609375,
1880
+ "loss": 1866.8488,
1881
+ "rewards/accuracies": 0.668749988079071,
1882
+ "rewards/chosen": -0.44938772916793823,
1883
+ "rewards/margins": 0.10914802551269531,
1884
+ "rewards/rejected": -0.5585357546806335,
1885
+ "step": 1330
1886
+ },
1887
+ {
1888
+ "epoch": 0.7,
1889
+ "learning_rate": 1.2381484323024178e-06,
1890
+ "logits/chosen": 0.4960288107395172,
1891
+ "logits/rejected": 0.5434283018112183,
1892
+ "logps/chosen": -351.0420837402344,
1893
+ "logps/rejected": -310.51483154296875,
1894
+ "loss": 2042.0777,
1895
+ "rewards/accuracies": 0.675000011920929,
1896
+ "rewards/chosen": -0.45654526352882385,
1897
+ "rewards/margins": 0.11480891704559326,
1898
+ "rewards/rejected": -0.5713542103767395,
1899
+ "step": 1340
1900
+ },
1901
+ {
1902
+ "epoch": 0.71,
1903
+ "learning_rate": 1.1989191475388518e-06,
1904
+ "logits/chosen": 0.5680415034294128,
1905
+ "logits/rejected": 0.5574635863304138,
1906
+ "logps/chosen": -313.9222106933594,
1907
+ "logps/rejected": -292.32843017578125,
1908
+ "loss": 2134.3227,
1909
+ "rewards/accuracies": 0.550000011920929,
1910
+ "rewards/chosen": -0.43487685918807983,
1911
+ "rewards/margins": 0.0749758630990982,
1912
+ "rewards/rejected": -0.5098527669906616,
1913
+ "step": 1350
1914
+ },
1915
+ {
1916
+ "epoch": 0.71,
1917
+ "learning_rate": 1.160124413210918e-06,
1918
+ "logits/chosen": 0.49776411056518555,
1919
+ "logits/rejected": 0.5919879078865051,
1920
+ "logps/chosen": -329.8394775390625,
1921
+ "logps/rejected": -288.6294860839844,
1922
+ "loss": 1978.9926,
1923
+ "rewards/accuracies": 0.606249988079071,
1924
+ "rewards/chosen": -0.38458770513534546,
1925
+ "rewards/margins": 0.08599748462438583,
1926
+ "rewards/rejected": -0.4705851972103119,
1927
+ "step": 1360
1928
+ },
1929
+ {
1930
+ "epoch": 0.72,
1931
+ "learning_rate": 1.1217771864447396e-06,
1932
+ "logits/chosen": 0.4789052903652191,
1933
+ "logits/rejected": 0.5946076512336731,
1934
+ "logps/chosen": -343.82440185546875,
1935
+ "logps/rejected": -312.8480224609375,
1936
+ "loss": 1838.2496,
1937
+ "rewards/accuracies": 0.675000011920929,
1938
+ "rewards/chosen": -0.41200247406959534,
1939
+ "rewards/margins": 0.13190023601055145,
1940
+ "rewards/rejected": -0.5439027547836304,
1941
+ "step": 1370
1942
+ },
1943
+ {
1944
+ "epoch": 0.72,
1945
+ "learning_rate": 1.08389027490255e-06,
1946
+ "logits/chosen": 0.518727719783783,
1947
+ "logits/rejected": 0.5448898673057556,
1948
+ "logps/chosen": -308.6346130371094,
1949
+ "logps/rejected": -262.6923828125,
1950
+ "loss": 2192.4686,
1951
+ "rewards/accuracies": 0.637499988079071,
1952
+ "rewards/chosen": -0.45256251096725464,
1953
+ "rewards/margins": 0.08636941015720367,
1954
+ "rewards/rejected": -0.5389319062232971,
1955
+ "step": 1380
1956
+ },
1957
+ {
1958
+ "epoch": 0.73,
1959
+ "learning_rate": 1.046476332505036e-06,
1960
+ "logits/chosen": 0.5441064834594727,
1961
+ "logits/rejected": 0.6444369554519653,
1962
+ "logps/chosen": -301.1471862792969,
1963
+ "logps/rejected": -272.04296875,
1964
+ "loss": 2016.5557,
1965
+ "rewards/accuracies": 0.637499988079071,
1966
+ "rewards/chosen": -0.3905426263809204,
1967
+ "rewards/margins": 0.11363613605499268,
1968
+ "rewards/rejected": -0.5041787028312683,
1969
+ "step": 1390
1970
+ },
1971
+ {
1972
+ "epoch": 0.73,
1973
+ "learning_rate": 1.0095478552050348e-06,
1974
+ "logits/chosen": 0.5234050750732422,
1975
+ "logits/rejected": 0.6232806444168091,
1976
+ "logps/chosen": -278.05462646484375,
1977
+ "logps/rejected": -237.6455078125,
1978
+ "loss": 1943.2496,
1979
+ "rewards/accuracies": 0.6000000238418579,
1980
+ "rewards/chosen": -0.4108065664768219,
1981
+ "rewards/margins": 0.09128499031066895,
1982
+ "rewards/rejected": -0.5020915269851685,
1983
+ "step": 1400
1984
+ },
1985
+ {
1986
+ "epoch": 0.74,
1987
+ "learning_rate": 9.731171768139808e-07,
1988
+ "logits/chosen": 0.5350190997123718,
1989
+ "logits/rejected": 0.5485085248947144,
1990
+ "logps/chosen": -287.7547607421875,
1991
+ "logps/rejected": -263.4801025390625,
1992
+ "loss": 2019.2094,
1993
+ "rewards/accuracies": 0.625,
1994
+ "rewards/chosen": -0.3987637162208557,
1995
+ "rewards/margins": 0.07709892094135284,
1996
+ "rewards/rejected": -0.47586268186569214,
1997
+ "step": 1410
1998
+ },
1999
+ {
2000
+ "epoch": 0.74,
2001
+ "learning_rate": 9.371964648825221e-07,
2002
+ "logits/chosen": 0.5190201997756958,
2003
+ "logits/rejected": 0.5698527693748474,
2004
+ "logps/chosen": -345.8193664550781,
2005
+ "logps/rejected": -291.22821044921875,
2006
+ "loss": 1915.1037,
2007
+ "rewards/accuracies": 0.6625000238418579,
2008
+ "rewards/chosen": -0.39381805062294006,
2009
+ "rewards/margins": 0.11871011555194855,
2010
+ "rewards/rejected": -0.512528121471405,
2011
+ "step": 1420
2012
+ },
2013
+ {
2014
+ "epoch": 0.75,
2015
+ "learning_rate": 9.017977166366445e-07,
2016
+ "logits/chosen": 0.5621356964111328,
2017
+ "logits/rejected": 0.574070155620575,
2018
+ "logps/chosen": -301.27276611328125,
2019
+ "logps/rejected": -268.485595703125,
2020
+ "loss": 2027.6629,
2021
+ "rewards/accuracies": 0.606249988079071,
2022
+ "rewards/chosen": -0.4134562611579895,
2023
+ "rewards/margins": 0.08553650230169296,
2024
+ "rewards/rejected": -0.49899277091026306,
2025
+ "step": 1430
2026
+ },
2027
+ {
2028
+ "epoch": 0.75,
2029
+ "learning_rate": 8.669327549707096e-07,
2030
+ "logits/chosen": 0.5577922463417053,
2031
+ "logits/rejected": 0.6254245042800903,
2032
+ "logps/chosen": -307.9657897949219,
2033
+ "logps/rejected": -278.01068115234375,
2034
+ "loss": 1957.0965,
2035
+ "rewards/accuracies": 0.65625,
2036
+ "rewards/chosen": -0.38403937220573425,
2037
+ "rewards/margins": 0.10890159755945206,
2038
+ "rewards/rejected": -0.4929409921169281,
2039
+ "step": 1440
2040
+ },
2041
+ {
2042
+ "epoch": 0.76,
2043
+ "learning_rate": 8.326132244986932e-07,
2044
+ "logits/chosen": 0.5165051817893982,
2045
+ "logits/rejected": 0.463472455739975,
2046
+ "logps/chosen": -325.6259460449219,
2047
+ "logps/rejected": -282.59246826171875,
2048
+ "loss": 1931.7,
2049
+ "rewards/accuracies": 0.668749988079071,
2050
+ "rewards/chosen": -0.3894692063331604,
2051
+ "rewards/margins": 0.11373750865459442,
2052
+ "rewards/rejected": -0.5032067894935608,
2053
+ "step": 1450
2054
+ },
2055
+ {
2056
+ "epoch": 0.76,
2057
+ "learning_rate": 7.988505876649863e-07,
2058
+ "logits/chosen": 0.5601966977119446,
2059
+ "logits/rejected": 0.5657812356948853,
2060
+ "logps/chosen": -329.68304443359375,
2061
+ "logps/rejected": -279.58612060546875,
2062
+ "loss": 2044.8211,
2063
+ "rewards/accuracies": 0.6499999761581421,
2064
+ "rewards/chosen": -0.4129094183444977,
2065
+ "rewards/margins": 0.08905430138111115,
2066
+ "rewards/rejected": -0.5019636750221252,
2067
+ "step": 1460
2068
+ },
2069
+ {
2070
+ "epoch": 0.77,
2071
+ "learning_rate": 7.656561209160248e-07,
2072
+ "logits/chosen": 0.5333635210990906,
2073
+ "logits/rejected": 0.6559829711914062,
2074
+ "logps/chosen": -334.6247863769531,
2075
+ "logps/rejected": -293.94403076171875,
2076
+ "loss": 1924.4891,
2077
+ "rewards/accuracies": 0.668749988079071,
2078
+ "rewards/chosen": -0.3982022702693939,
2079
+ "rewards/margins": 0.12732462584972382,
2080
+ "rewards/rejected": -0.5255268812179565,
2081
+ "step": 1470
2082
+ },
2083
+ {
2084
+ "epoch": 0.77,
2085
+ "learning_rate": 7.330409109340563e-07,
2086
+ "logits/chosen": 0.44074922800064087,
2087
+ "logits/rejected": 0.5341771245002747,
2088
+ "logps/chosen": -325.88763427734375,
2089
+ "logps/rejected": -301.38427734375,
2090
+ "loss": 2064.7137,
2091
+ "rewards/accuracies": 0.6000000238418579,
2092
+ "rewards/chosen": -0.39723503589630127,
2093
+ "rewards/margins": 0.09469970315694809,
2094
+ "rewards/rejected": -0.49193471670150757,
2095
+ "step": 1480
2096
+ },
2097
+ {
2098
+ "epoch": 0.78,
2099
+ "learning_rate": 7.010158509342682e-07,
2100
+ "logits/chosen": 0.6108728647232056,
2101
+ "logits/rejected": 0.5816964507102966,
2102
+ "logps/chosen": -280.2751159667969,
2103
+ "logps/rejected": -272.6864929199219,
2104
+ "loss": 2012.7195,
2105
+ "rewards/accuracies": 0.5874999761581421,
2106
+ "rewards/chosen": -0.39345765113830566,
2107
+ "rewards/margins": 0.10830279439687729,
2108
+ "rewards/rejected": -0.5017604827880859,
2109
+ "step": 1490
2110
+ },
2111
+ {
2112
+ "epoch": 0.79,
2113
+ "learning_rate": 6.695916370265529e-07,
2114
+ "logits/chosen": 0.5402048826217651,
2115
+ "logits/rejected": 0.5511065721511841,
2116
+ "logps/chosen": -293.0211486816406,
2117
+ "logps/rejected": -299.3897399902344,
2118
+ "loss": 2051.2686,
2119
+ "rewards/accuracies": 0.637499988079071,
2120
+ "rewards/chosen": -0.38481405377388,
2121
+ "rewards/margins": 0.09063725918531418,
2122
+ "rewards/rejected": -0.4754512906074524,
2123
+ "step": 1500
2124
+ },
2125
+ {
2126
+ "epoch": 0.79,
2127
+ "learning_rate": 6.387787646430854e-07,
2128
+ "logits/chosen": 0.4866926670074463,
2129
+ "logits/rejected": 0.553752064704895,
2130
+ "logps/chosen": -323.8518371582031,
2131
+ "logps/rejected": -315.7661437988281,
2132
+ "loss": 1892.2789,
2133
+ "rewards/accuracies": 0.6937500238418579,
2134
+ "rewards/chosen": -0.40690216422080994,
2135
+ "rewards/margins": 0.1288827359676361,
2136
+ "rewards/rejected": -0.535784900188446,
2137
+ "step": 1510
2138
+ },
2139
+ {
2140
+ "epoch": 0.8,
2141
+ "learning_rate": 6.085875250329401e-07,
2142
+ "logits/chosen": 0.5334519743919373,
2143
+ "logits/rejected": 0.629740834236145,
2144
+ "logps/chosen": -342.43341064453125,
2145
+ "logps/rejected": -304.27178955078125,
2146
+ "loss": 1996.1777,
2147
+ "rewards/accuracies": 0.59375,
2148
+ "rewards/chosen": -0.4247370660305023,
2149
+ "rewards/margins": 0.07812504470348358,
2150
+ "rewards/rejected": -0.5028620958328247,
2151
+ "step": 1520
2152
+ },
2153
+ {
2154
+ "epoch": 0.8,
2155
+ "learning_rate": 5.79028001824894e-07,
2156
+ "logits/chosen": 0.536971926689148,
2157
+ "logits/rejected": 0.6389668583869934,
2158
+ "logps/chosen": -344.0328063964844,
2159
+ "logps/rejected": -307.81195068359375,
2160
+ "loss": 2099.0133,
2161
+ "rewards/accuracies": 0.65625,
2162
+ "rewards/chosen": -0.4052516520023346,
2163
+ "rewards/margins": 0.1204022616147995,
2164
+ "rewards/rejected": -0.5256539583206177,
2165
+ "step": 1530
2166
+ },
2167
+ {
2168
+ "epoch": 0.81,
2169
+ "learning_rate": 5.501100676595761e-07,
2170
+ "logits/chosen": 0.501252293586731,
2171
+ "logits/rejected": 0.582123875617981,
2172
+ "logps/chosen": -350.5921936035156,
2173
+ "logps/rejected": -305.71551513671875,
2174
+ "loss": 2080.3439,
2175
+ "rewards/accuracies": 0.6312500238418579,
2176
+ "rewards/chosen": -0.386319637298584,
2177
+ "rewards/margins": 0.1132698804140091,
2178
+ "rewards/rejected": -0.4995895326137543,
2179
+ "step": 1540
2180
+ },
2181
+ {
2182
+ "epoch": 0.81,
2183
+ "learning_rate": 5.218433808920884e-07,
2184
+ "logits/chosen": 0.5797770619392395,
2185
+ "logits/rejected": 0.5843828916549683,
2186
+ "logps/chosen": -311.27313232421875,
2187
+ "logps/rejected": -272.329833984375,
2188
+ "loss": 1840.9139,
2189
+ "rewards/accuracies": 0.6312500238418579,
2190
+ "rewards/chosen": -0.4069671630859375,
2191
+ "rewards/margins": 0.10061927884817123,
2192
+ "rewards/rejected": -0.5075864195823669,
2193
+ "step": 1550
2194
+ },
2195
+ {
2196
+ "epoch": 0.82,
2197
+ "learning_rate": 4.942373823661928e-07,
2198
+ "logits/chosen": 0.4943598806858063,
2199
+ "logits/rejected": 0.5981740355491638,
2200
+ "logps/chosen": -339.5876770019531,
2201
+ "logps/rejected": -323.6890563964844,
2202
+ "loss": 2091.2941,
2203
+ "rewards/accuracies": 0.606249988079071,
2204
+ "rewards/chosen": -0.42342454195022583,
2205
+ "rewards/margins": 0.09093138575553894,
2206
+ "rewards/rejected": -0.5143559575080872,
2207
+ "step": 1560
2208
+ },
2209
+ {
2210
+ "epoch": 0.82,
2211
+ "learning_rate": 4.6730129226114363e-07,
2212
+ "logits/chosen": 0.5075241327285767,
2213
+ "logits/rejected": 0.5773884654045105,
2214
+ "logps/chosen": -301.22906494140625,
2215
+ "logps/rejected": -278.94293212890625,
2216
+ "loss": 2206.433,
2217
+ "rewards/accuracies": 0.5874999761581421,
2218
+ "rewards/chosen": -0.4186634421348572,
2219
+ "rewards/margins": 0.08012911677360535,
2220
+ "rewards/rejected": -0.4987925887107849,
2221
+ "step": 1570
2222
+ },
2223
+ {
2224
+ "epoch": 0.83,
2225
+ "learning_rate": 4.4104410701222703e-07,
2226
+ "logits/chosen": 0.47776398062705994,
2227
+ "logits/rejected": 0.6287192106246948,
2228
+ "logps/chosen": -321.4864196777344,
2229
+ "logps/rejected": -324.2401428222656,
2230
+ "loss": 2054.8516,
2231
+ "rewards/accuracies": 0.643750011920929,
2232
+ "rewards/chosen": -0.4205222725868225,
2233
+ "rewards/margins": 0.08562308549880981,
2234
+ "rewards/rejected": -0.5061453580856323,
2235
+ "step": 1580
2236
+ },
2237
+ {
2238
+ "epoch": 0.83,
2239
+ "learning_rate": 4.154745963060197e-07,
2240
+ "logits/chosen": 0.5759707093238831,
2241
+ "logits/rejected": 0.6366346478462219,
2242
+ "logps/chosen": -310.05487060546875,
2243
+ "logps/rejected": -309.42303466796875,
2244
+ "loss": 1960.8133,
2245
+ "rewards/accuracies": 0.668749988079071,
2246
+ "rewards/chosen": -0.40062469244003296,
2247
+ "rewards/margins": 0.12129988521337509,
2248
+ "rewards/rejected": -0.521924614906311,
2249
+ "step": 1590
2250
+ },
2251
+ {
2252
+ "epoch": 0.84,
2253
+ "learning_rate": 3.9060130015138863e-07,
2254
+ "logits/chosen": 0.5741788148880005,
2255
+ "logits/rejected": 0.6168212294578552,
2256
+ "logps/chosen": -313.01531982421875,
2257
+ "logps/rejected": -290.6816101074219,
2258
+ "loss": 1803.5051,
2259
+ "rewards/accuracies": 0.668749988079071,
2260
+ "rewards/chosen": -0.39632636308670044,
2261
+ "rewards/margins": 0.0985620766878128,
2262
+ "rewards/rejected": -0.49488845467567444,
2263
+ "step": 1600
2264
+ },
2265
+ {
2266
+ "epoch": 0.84,
2267
+ "learning_rate": 3.664325260271953e-07,
2268
+ "logits/chosen": 0.5977301001548767,
2269
+ "logits/rejected": 0.5980079770088196,
2270
+ "logps/chosen": -305.8802185058594,
2271
+ "logps/rejected": -245.7880859375,
2272
+ "loss": 1887.7617,
2273
+ "rewards/accuracies": 0.59375,
2274
+ "rewards/chosen": -0.38649195432662964,
2275
+ "rewards/margins": 0.1288624256849289,
2276
+ "rewards/rejected": -0.5153544545173645,
2277
+ "step": 1610
2278
+ },
2279
+ {
2280
+ "epoch": 0.85,
2281
+ "learning_rate": 3.429763461076677e-07,
2282
+ "logits/chosen": 0.4545535147190094,
2283
+ "logits/rejected": 0.656732439994812,
2284
+ "logps/chosen": -348.40594482421875,
2285
+ "logps/rejected": -304.6084899902344,
2286
+ "loss": 2087.083,
2287
+ "rewards/accuracies": 0.6187499761581421,
2288
+ "rewards/chosen": -0.39987581968307495,
2289
+ "rewards/margins": 0.09592723846435547,
2290
+ "rewards/rejected": -0.4958030581474304,
2291
+ "step": 1620
2292
+ },
2293
+ {
2294
+ "epoch": 0.85,
2295
+ "learning_rate": 3.202405945663556e-07,
2296
+ "logits/chosen": 0.5257354974746704,
2297
+ "logits/rejected": 0.5875495076179504,
2298
+ "logps/chosen": -321.76397705078125,
2299
+ "logps/rejected": -286.9283142089844,
2300
+ "loss": 2075.5064,
2301
+ "rewards/accuracies": 0.5625,
2302
+ "rewards/chosen": -0.39552539587020874,
2303
+ "rewards/margins": 0.10027774423360825,
2304
+ "rewards/rejected": -0.4958031177520752,
2305
+ "step": 1630
2306
+ },
2307
+ {
2308
+ "epoch": 0.86,
2309
+ "learning_rate": 2.982328649595856e-07,
2310
+ "logits/chosen": 0.4653220772743225,
2311
+ "logits/rejected": 0.5480079650878906,
2312
+ "logps/chosen": -309.8276672363281,
2313
+ "logps/rejected": -274.49029541015625,
2314
+ "loss": 1948.8629,
2315
+ "rewards/accuracies": 0.65625,
2316
+ "rewards/chosen": -0.3729521930217743,
2317
+ "rewards/margins": 0.1282852441072464,
2318
+ "rewards/rejected": -0.5012374520301819,
2319
+ "step": 1640
2320
+ },
2321
+ {
2322
+ "epoch": 0.86,
2323
+ "learning_rate": 2.7696050769026954e-07,
2324
+ "logits/chosen": 0.5885381698608398,
2325
+ "logits/rejected": 0.5182594060897827,
2326
+ "logps/chosen": -282.14544677734375,
2327
+ "logps/rejected": -316.86639404296875,
2328
+ "loss": 1897.5818,
2329
+ "rewards/accuracies": 0.668749988079071,
2330
+ "rewards/chosen": -0.4102315902709961,
2331
+ "rewards/margins": 0.10423590242862701,
2332
+ "rewards/rejected": -0.5144674777984619,
2333
+ "step": 1650
2334
+ },
2335
+ {
2336
+ "epoch": 0.87,
2337
+ "learning_rate": 2.564306275529341e-07,
2338
+ "logits/chosen": 0.5538032650947571,
2339
+ "logits/rejected": 0.6108819246292114,
2340
+ "logps/chosen": -317.8446960449219,
2341
+ "logps/rejected": -322.9876403808594,
2342
+ "loss": 1870.0818,
2343
+ "rewards/accuracies": 0.6312500238418579,
2344
+ "rewards/chosen": -0.40971916913986206,
2345
+ "rewards/margins": 0.10465312004089355,
2346
+ "rewards/rejected": -0.5143723487854004,
2347
+ "step": 1660
2348
+ },
2349
+ {
2350
+ "epoch": 0.87,
2351
+ "learning_rate": 2.3665008136077332e-07,
2352
+ "logits/chosen": 0.5508009791374207,
2353
+ "logits/rejected": 0.6037713885307312,
2354
+ "logps/chosen": -357.86907958984375,
2355
+ "logps/rejected": -307.35723876953125,
2356
+ "loss": 2033.6316,
2357
+ "rewards/accuracies": 0.625,
2358
+ "rewards/chosen": -0.4212369918823242,
2359
+ "rewards/margins": 0.09935277700424194,
2360
+ "rewards/rejected": -0.5205897688865662,
2361
+ "step": 1670
2362
+ },
2363
+ {
2364
+ "epoch": 0.88,
2365
+ "learning_rate": 2.1762547565553293e-07,
2366
+ "logits/chosen": 0.5105335712432861,
2367
+ "logits/rejected": 0.5201815366744995,
2368
+ "logps/chosen": -263.2312316894531,
2369
+ "logps/rejected": -266.52801513671875,
2370
+ "loss": 2212.5818,
2371
+ "rewards/accuracies": 0.5,
2372
+ "rewards/chosen": -0.4164590835571289,
2373
+ "rewards/margins": 0.04953977093100548,
2374
+ "rewards/rejected": -0.4659988284111023,
2375
+ "step": 1680
2376
+ },
2377
+ {
2378
+ "epoch": 0.88,
2379
+ "learning_rate": 1.993631645009747e-07,
2380
+ "logits/chosen": 0.5417348742485046,
2381
+ "logits/rejected": 0.5702573657035828,
2382
+ "logps/chosen": -342.66839599609375,
2383
+ "logps/rejected": -324.40557861328125,
2384
+ "loss": 2001.3066,
2385
+ "rewards/accuracies": 0.65625,
2386
+ "rewards/chosen": -0.42196089029312134,
2387
+ "rewards/margins": 0.09920088946819305,
2388
+ "rewards/rejected": -0.5211617350578308,
2389
+ "step": 1690
2390
+ },
2391
+ {
2392
+ "epoch": 0.89,
2393
+ "learning_rate": 1.818692473606748e-07,
2394
+ "logits/chosen": 0.5118446350097656,
2395
+ "logits/rejected": 0.6164907217025757,
2396
+ "logps/chosen": -285.0245056152344,
2397
+ "logps/rejected": -270.7235412597656,
2398
+ "loss": 1936.7527,
2399
+ "rewards/accuracies": 0.6312500238418579,
2400
+ "rewards/chosen": -0.39391833543777466,
2401
+ "rewards/margins": 0.10235867649316788,
2402
+ "rewards/rejected": -0.49627700448036194,
2403
+ "step": 1700
2404
+ },
2405
+ {
2406
+ "epoch": 0.9,
2407
+ "learning_rate": 1.6514956706084885e-07,
2408
+ "logits/chosen": 0.5716298818588257,
2409
+ "logits/rejected": 0.533431351184845,
2410
+ "logps/chosen": -298.3558654785156,
2411
+ "logps/rejected": -315.1064453125,
2412
+ "loss": 2030.7699,
2413
+ "rewards/accuracies": 0.5874999761581421,
2414
+ "rewards/chosen": -0.444709837436676,
2415
+ "rewards/margins": 0.07402367889881134,
2416
+ "rewards/rejected": -0.5187335014343262,
2417
+ "step": 1710
2418
+ },
2419
+ {
2420
+ "epoch": 0.9,
2421
+ "learning_rate": 1.4920970783889737e-07,
2422
+ "logits/chosen": 0.41161495447158813,
2423
+ "logits/rejected": 0.5431499481201172,
2424
+ "logps/chosen": -335.1160583496094,
2425
+ "logps/rejected": -285.2781677246094,
2426
+ "loss": 2117.8803,
2427
+ "rewards/accuracies": 0.5874999761581421,
2428
+ "rewards/chosen": -0.382477343082428,
2429
+ "rewards/margins": 0.10167716443538666,
2430
+ "rewards/rejected": -0.48415452241897583,
2431
+ "step": 1720
2432
+ },
2433
+ {
2434
+ "epoch": 0.91,
2435
+ "learning_rate": 1.340549934783164e-07,
2436
+ "logits/chosen": 0.5718288421630859,
2437
+ "logits/rejected": 0.5717177987098694,
2438
+ "logps/chosen": -286.6661682128906,
2439
+ "logps/rejected": -286.0840759277344,
2440
+ "loss": 2009.7559,
2441
+ "rewards/accuracies": 0.6000000238418579,
2442
+ "rewards/chosen": -0.40986448526382446,
2443
+ "rewards/margins": 0.08955219388008118,
2444
+ "rewards/rejected": -0.49941664934158325,
2445
+ "step": 1730
2446
+ },
2447
+ {
2448
+ "epoch": 0.91,
2449
+ "learning_rate": 1.196904855305961e-07,
2450
+ "logits/chosen": 0.5395318865776062,
2451
+ "logits/rejected": 0.6408040523529053,
2452
+ "logps/chosen": -318.8950500488281,
2453
+ "logps/rejected": -298.7578125,
2454
+ "loss": 2034.2352,
2455
+ "rewards/accuracies": 0.6499999761581421,
2456
+ "rewards/chosen": -0.4017771780490875,
2457
+ "rewards/margins": 0.11808891594409943,
2458
+ "rewards/rejected": -0.5198661088943481,
2459
+ "step": 1740
2460
+ },
2461
+ {
2462
+ "epoch": 0.92,
2463
+ "learning_rate": 1.0612098162470302e-07,
2464
+ "logits/chosen": 0.4985600411891937,
2465
+ "logits/rejected": 0.5432295799255371,
2466
+ "logps/chosen": -286.3262634277344,
2467
+ "logps/rejected": -289.0979919433594,
2468
+ "loss": 1954.6828,
2469
+ "rewards/accuracies": 0.6187499761581421,
2470
+ "rewards/chosen": -0.39057815074920654,
2471
+ "rewards/margins": 0.0826260969042778,
2472
+ "rewards/rejected": -0.47320422530174255,
2473
+ "step": 1750
2474
+ },
+ {
+ "epoch": 0.92,
+ "learning_rate": 9.335101386471285e-08,
+ "logits/chosen": 0.5725753307342529,
+ "logits/rejected": 0.6116470098495483,
+ "logps/chosen": -335.08843994140625,
+ "logps/rejected": -312.4408264160156,
+ "loss": 2050.2887,
+ "rewards/accuracies": 0.668749988079071,
+ "rewards/chosen": -0.40838685631752014,
+ "rewards/margins": 0.0921243280172348,
+ "rewards/rejected": -0.5005111694335938,
+ "step": 1760
+ },
+ {
+ "epoch": 0.93,
+ "learning_rate": 8.138484731612273e-08,
+ "logits/chosen": 0.6018707156181335,
+ "logits/rejected": 0.5222383141517639,
+ "logps/chosen": -253.42379760742188,
+ "logps/rejected": -274.9010925292969,
+ "loss": 2175.2457,
+ "rewards/accuracies": 0.574999988079071,
+ "rewards/chosen": -0.3636482059955597,
+ "rewards/margins": 0.072720468044281,
+ "rewards/rejected": -0.4363686442375183,
+ "step": 1770
+ },
+ {
+ "epoch": 0.93,
+ "learning_rate": 7.022647858135501e-08,
+ "logits/chosen": 0.5884579420089722,
+ "logits/rejected": 0.5674210786819458,
+ "logps/chosen": -338.01837158203125,
+ "logps/rejected": -324.35980224609375,
+ "loss": 2028.5854,
+ "rewards/accuracies": 0.625,
+ "rewards/chosen": -0.42924776673316956,
+ "rewards/margins": 0.09629428386688232,
+ "rewards/rejected": -0.5255420804023743,
+ "step": 1780
+ },
+ {
+ "epoch": 0.94,
+ "learning_rate": 5.987963446492384e-08,
+ "logits/chosen": 0.5171129703521729,
+ "logits/rejected": 0.6137627363204956,
+ "logps/chosen": -315.68609619140625,
+ "logps/rejected": -290.91619873046875,
+ "loss": 2091.5113,
+ "rewards/accuracies": 0.6187499761581421,
+ "rewards/chosen": -0.41310811042785645,
+ "rewards/margins": 0.09795816987752914,
+ "rewards/rejected": -0.5110663175582886,
+ "step": 1790
+ },
+ {
+ "epoch": 0.94,
+ "learning_rate": 5.034777072871394e-08,
+ "logits/chosen": 0.5506697297096252,
+ "logits/rejected": 0.5700551867485046,
+ "logps/chosen": -306.2592468261719,
+ "logps/rejected": -284.780029296875,
+ "loss": 2043.7551,
+ "rewards/accuracies": 0.6000000238418579,
+ "rewards/chosen": -0.3679637312889099,
+ "rewards/margins": 0.10389117151498795,
+ "rewards/rejected": -0.4718549847602844,
+ "step": 1800
+ },
+ {
+ "epoch": 0.95,
+ "learning_rate": 4.163407093778243e-08,
+ "logits/chosen": 0.4897570013999939,
+ "logits/rejected": 0.565123438835144,
+ "logps/chosen": -326.5579833984375,
+ "logps/rejected": -310.0860595703125,
+ "loss": 1972.2996,
+ "rewards/accuracies": 0.612500011920929,
+ "rewards/chosen": -0.4172098636627197,
+ "rewards/margins": 0.09725239127874374,
+ "rewards/rejected": -0.5144622921943665,
+ "step": 1810
+ },
+ {
+ "epoch": 0.95,
+ "learning_rate": 3.37414453970758e-08,
+ "logits/chosen": 0.4947708249092102,
+ "logits/rejected": 0.6116417050361633,
+ "logps/chosen": -369.77789306640625,
+ "logps/rejected": -310.3260192871094,
+ "loss": 1986.4135,
+ "rewards/accuracies": 0.6000000238418579,
+ "rewards/chosen": -0.3952573835849762,
+ "rewards/margins": 0.10770855844020844,
+ "rewards/rejected": -0.5029659271240234,
+ "step": 1820
+ },
+ {
+ "epoch": 0.96,
+ "learning_rate": 2.6672530179410183e-08,
+ "logits/chosen": 0.49206972122192383,
+ "logits/rejected": 0.6127623915672302,
+ "logps/chosen": -326.89813232421875,
+ "logps/rejected": -278.120849609375,
+ "loss": 1947.9119,
+ "rewards/accuracies": 0.625,
+ "rewards/chosen": -0.4299922585487366,
+ "rewards/margins": 0.10199449956417084,
+ "rewards/rejected": -0.5319867730140686,
+ "step": 1830
+ },
+ {
+ "epoch": 0.96,
+ "learning_rate": 2.04296862450451e-08,
+ "logits/chosen": 0.5761113166809082,
+ "logits/rejected": 0.5587304830551147,
+ "logps/chosen": -328.021240234375,
+ "logps/rejected": -305.7353210449219,
+ "loss": 1986.5195,
+ "rewards/accuracies": 0.6625000238418579,
+ "rewards/chosen": -0.4139803349971771,
+ "rewards/margins": 0.09623098373413086,
+ "rewards/rejected": -0.5102113485336304,
+ "step": 1840
+ },
+ {
+ "epoch": 0.97,
+ "learning_rate": 1.501499865314171e-08,
+ "logits/chosen": 0.533464252948761,
+ "logits/rejected": 0.5722233057022095,
+ "logps/chosen": -323.434814453125,
+ "logps/rejected": -295.4156188964844,
+ "loss": 1949.7422,
+ "rewards/accuracies": 0.59375,
+ "rewards/chosen": -0.3916718363761902,
+ "rewards/margins": 0.10514561086893082,
+ "rewards/rejected": -0.4968174397945404,
+ "step": 1850
+ },
+ {
+ "epoch": 0.97,
+ "learning_rate": 1.0430275865371265e-08,
+ "logits/chosen": 0.5557342171669006,
+ "logits/rejected": 0.5775563716888428,
+ "logps/chosen": -320.87158203125,
+ "logps/rejected": -306.94561767578125,
+ "loss": 1832.5422,
+ "rewards/accuracies": 0.7124999761581421,
+ "rewards/chosen": -0.39497238397598267,
+ "rewards/margins": 0.13699549436569214,
+ "rewards/rejected": -0.53196781873703,
+ "step": 1860
+ },
+ {
+ "epoch": 0.98,
+ "learning_rate": 6.677049141901315e-09,
+ "logits/chosen": 0.5044248104095459,
+ "logits/rejected": 0.5328904986381531,
+ "logps/chosen": -296.86865234375,
+ "logps/rejected": -287.9560546875,
+ "loss": 1983.6154,
+ "rewards/accuracies": 0.6625000238418579,
+ "rewards/chosen": -0.39695119857788086,
+ "rewards/margins": 0.09827554225921631,
+ "rewards/rejected": -0.49522677063941956,
+ "step": 1870
+ },
+ {
+ "epoch": 0.98,
+ "learning_rate": 3.756572029968708e-09,
+ "logits/chosen": 0.481318861246109,
+ "logits/rejected": 0.6070166230201721,
+ "logps/chosen": -329.2176818847656,
+ "logps/rejected": -315.81561279296875,
+ "loss": 1846.1697,
+ "rewards/accuracies": 0.668749988079071,
+ "rewards/chosen": -0.404033899307251,
+ "rewards/margins": 0.11680416762828827,
+ "rewards/rejected": -0.5208381414413452,
+ "step": 1880
+ },
+ {
+ "epoch": 0.99,
+ "learning_rate": 1.6698199452053199e-09,
+ "logits/chosen": 0.5581511855125427,
+ "logits/rejected": 0.5964113473892212,
+ "logps/chosen": -296.1249694824219,
+ "logps/rejected": -285.54205322265625,
+ "loss": 1886.4951,
+ "rewards/accuracies": 0.6499999761581421,
+ "rewards/chosen": -0.41859906911849976,
+ "rewards/margins": 0.10544377565383911,
+ "rewards/rejected": -0.5240427851676941,
+ "step": 1890
+ },
+ {
+ "epoch": 0.99,
+ "learning_rate": 4.1748984585560094e-10,
+ "logits/chosen": 0.5473756194114685,
+ "logits/rejected": 0.5344858765602112,
+ "logps/chosen": -310.27728271484375,
+ "logps/rejected": -314.9624328613281,
+ "loss": 2101.4789,
+ "rewards/accuracies": 0.574999988079071,
+ "rewards/chosen": -0.41071709990501404,
+ "rewards/margins": 0.09045806527137756,
+ "rewards/rejected": -0.5011752247810364,
+ "step": 1900
+ },
+ {
+ "epoch": 1.0,
+ "learning_rate": 0.0,
+ "logits/chosen": 0.6098369359970093,
+ "logits/rejected": 0.6253079771995544,
+ "logps/chosen": -299.2195739746094,
+ "logps/rejected": -283.7442932128906,
+ "loss": 2031.7352,
+ "rewards/accuracies": 0.5874999761581421,
+ "rewards/chosen": -0.4044032096862793,
+ "rewards/margins": 0.08278089016675949,
+ "rewards/rejected": -0.4871840476989746,
+ "step": 1910
+ },
+ {
+ "epoch": 1.0,
+ "step": 1910,
+ "total_flos": 0.0,
+ "train_loss": 2071.8518089414265,
+ "train_runtime": 14310.5789,
+ "train_samples_per_second": 4.272,
+ "train_steps_per_second": 0.133
+ }
+ ],
+ "logging_steps": 10,
+ "max_steps": 1910,
+ "num_input_tokens_seen": 0,
+ "num_train_epochs": 1,
+ "save_steps": 100,
+ "total_flos": 0.0,
+ "train_batch_size": 4,
+ "trial_name": null,
+ "trial_params": null
+ }