NicholasCorrado committed on
Commit 69c9d49
1 Parent(s): 5ee7120
Model save
- README.md +94 -0
- all_results.json +9 -0
- generation_config.json +6 -0
- train_results.json +9 -0
README.md
ADDED
@@ -0,0 +1,94 @@
---
library_name: transformers
license: apache-2.0
base_model: alignment-handbook/zephyr-7b-sft-full
tags:
- trl
- dpo
- generated_from_trainer
model-index:
- name: zephyr-7b-uf-rlced-conifer-group-dpo-2e
  results: []
---

<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->

# zephyr-7b-uf-rlced-conifer-group-dpo-2e

This model is a fine-tuned version of [alignment-handbook/zephyr-7b-sft-full](https://huggingface.co/alignment-handbook/zephyr-7b-sft-full) on an unknown dataset.
It achieves the following results on the evaluation set:
- Loss: 0.2410
- Rewards/chosen: -3.4515
- Rewards/rejected: -8.7505
- Rewards/accuracies: 0.8769
- Rewards/margins: 5.2990
- Logps/rejected: -1278.7848
- Logps/chosen: -737.6204
- Logits/rejected: 3.0507
- Logits/chosen: 0.9407
- Alpha0: 0.6369
- Alpha1: 0.3631
- Task Loss1: 0.1726
- Task Excess Loss1: 0.0379
- Excess Loss: 0.0341
- Task Loss0: 0.5306
- Task Excess Loss0: 0.0889
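
As a quick consistency check on the metrics above, the reported reward margin should equal the chosen reward minus the rejected reward. The sketch below assumes the usual DPO logging convention (the margin is the mean of chosen minus rejected rewards); the variable names are illustrative and not part of the model card. `Alpha0` and `Alpha1` are custom metrics whose meaning the card does not document, so the code only observes that they sum to 1.

```python
# Final evaluation metrics copied from the list above.
rewards_chosen = -3.4515
rewards_rejected = -8.7505
reported_margin = 5.2990

# Assumption: the margin is logged as mean(chosen - rejected), which equals
# mean(chosen) - mean(rejected) by linearity of the mean.
computed_margin = rewards_chosen - rewards_rejected

# Alpha0 / Alpha1 are undocumented custom metrics; we only check their sum.
alpha0, alpha1 = 0.6369, 0.3631
alpha_sum = alpha0 + alpha1

print(f"computed margin: {computed_margin:.4f} (reported {reported_margin:.4f})")
print(f"alpha0 + alpha1: {alpha_sum:.4f}")
```

Both rewards are negative while the margin is positive, which under this convention would mean the rejected responses lost more implicit reward than the chosen ones.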

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 5e-07
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- gradient_accumulation_steps: 4
- total_train_batch_size: 256
- total_eval_batch_size: 64
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 2
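
The total batch sizes and the learning-rate schedule follow from the values listed above. The sketch below is a minimal illustration, not the Trainer's own code: `cosine_lr` is a hypothetical helper that mimics the shape of a cosine schedule with linear warmup, and the total of 1440 optimizer steps is an assumption inferred from the results table (step 1400 is logged at epoch 1.9431).

```python
import math

# Hyperparameters copied from the list above.
learning_rate = 5e-07
train_batch_size = 8              # per device
eval_batch_size = 8               # per device
num_devices = 8
gradient_accumulation_steps = 4
warmup_ratio = 0.1

# Samples consumed per optimizer step, and per evaluation step.
total_train_batch_size = train_batch_size * num_devices * gradient_accumulation_steps
total_eval_batch_size = eval_batch_size * num_devices


def cosine_lr(step: int, total_steps: int) -> float:
    """Hypothetical helper: linear warmup, then half-cosine decay to zero."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return learning_rate * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return learning_rate * 0.5 * (1.0 + math.cos(math.pi * progress))


# Assumption: about 1440 optimizer steps in total (inferred, not stated in the card).
TOTAL_STEPS = 1440

print("total_train_batch_size:", total_train_batch_size)
print("total_eval_batch_size:", total_eval_batch_size)
for step in (0, 72, 144, 792, 1440):
    print(f"step {step:>4}: lr = {cosine_lr(step, TOTAL_STEPS):.3e}")
```

Under these assumptions the learning rate climbs linearly for roughly the first 10% of steps, peaks at 5e-07, and decays to zero by the final step.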

### Training results

| Training Loss | Epoch  | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen | Alpha0 | Alpha1 | Task Loss1 | Task Excess Loss1 | Excess Loss | Task Loss0 | Task Excess Loss0 |
|:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|:------:|:------:|:----------:|:-----------------:|:-----------:|:----------:|:-----------------:|
| 0.3541        | 0.1388 | 100  | 0.4194          | -1.3743        | -2.6267          | 0.8102             | 1.2524          | -666.4093      | -529.9026    | -2.7580         | -2.7843       | 0.8214 | 0.1786 | 0.3373     | 0.1973            | 0.1899      | 0.6883     | 0.2655            |
| 0.2214        | 0.2776 | 200  | 0.3480          | -1.2450        | -2.9488          | 0.8412             | 1.7038          | -698.6146      | -516.9692    | 0.1216          | -0.2174       | 0.8786 | 0.1214 | 0.2866     | 0.1517            | 0.1250      | 0.5355     | 0.0929            |
| 0.2284        | 0.4164 | 300  | 0.3271          | -1.7298        | -3.6279          | 0.8515             | 1.8981          | -766.5247      | -565.4502    | 1.3769          | 0.5823        | 0.6417 | 0.3583 | 0.2721     | 0.1383            | 0.1130      | 0.5406     | 0.0794            |
| 0.1837        | 0.5552 | 400  | 0.3040          | -1.7232        | -4.0037          | 0.8553             | 2.2805          | -804.1021      | -564.7872    | 1.8300          | 0.7862        | 0.7891 | 0.2109 | 0.2517     | 0.1159            | 0.0949      | 0.5490     | 0.0796            |
| 0.1749        | 0.6940 | 500  | 0.2966          | -1.7976        | -4.1927          | 0.8637             | 2.3951          | -823.0039      | -572.2305    | 1.7164          | 0.5785        | 0.8057 | 0.1943 | 0.2448     | 0.1097            | 0.0856      | 0.5124     | 0.0570            |
| 0.1823        | 0.8328 | 600  | 0.3030          | -1.7187        | -3.9261          | 0.8647             | 2.2074          | -796.3432      | -564.3366    | 2.4921          | 1.3988        | 0.9053 | 0.0947 | 0.2541     | 0.1193            | 0.0922      | 0.5047     | 0.0596            |
| 0.1766        | 0.9715 | 700  | 0.2895          | -1.6400        | -4.2369          | 0.8647             | 2.5969          | -827.4293      | -556.4711    | 1.6749          | 0.1680        | 0.9622 | 0.0378 | 0.2417     | 0.1057            | 0.0812      | 0.5020     | 0.0532            |
| 0.1131        | 1.1103 | 800  | 0.2646          | -2.7794        | -6.7040          | 0.8647             | 3.9245          | -1074.1326     | -670.4117    | 2.3249          | 0.3844        | 0.0325 | 0.9675 | 0.1990     | 0.0653            | 0.0567      | 0.5372     | 0.0871            |
| 0.1006        | 1.2491 | 900  | 0.2490          | -3.6465        | -8.6692          | 0.8712             | 5.0227          | -1270.6554     | -757.1147    | 3.3211          | 1.0777        | 0.4760 | 0.5240 | 0.1852     | 0.0492            | 0.0420      | 0.5341     | 0.0967            |
| 0.0951        | 1.3879 | 1000 | 0.2470          | -3.0354        | -7.7369          | 0.8797             | 4.7015          | -1177.4214     | -696.0082    | 3.1614          | 0.9199        | 0.0150 | 0.9850 | 0.1756     | 0.0450            | 0.0382      | 0.5249     | 0.0834            |
| 0.0885        | 1.5267 | 1100 | 0.2435          | -3.4543        | -8.4740          | 0.8731             | 5.0197          | -1251.1321     | -737.8961    | 3.4589          | 1.3892        | 0.0151 | 0.9849 | 0.1747     | 0.0421            | 0.0368      | 0.5310     | 0.0887            |
| 0.1003        | 1.6655 | 1200 | 0.2416          | -3.3615        | -8.4285          | 0.875              | 5.0670          | -1246.5889     | -728.6184    | 2.9341          | 0.9100        | 0.0721 | 0.9279 | 0.1730     | 0.0396            | 0.0352      | 0.5285     | 0.0863            |
| 0.0865        | 1.8043 | 1300 | 0.2412          | -3.3114        | -8.4737          | 0.8769             | 5.1623          | -1251.1091     | -723.6140    | 2.9432          | 0.8628        | 0.0755 | 0.9245 | 0.1734     | 0.0388            | 0.0343      | 0.5272     | 0.0847            |
| 0.0893        | 1.9431 | 1400 | 0.2410          | -3.4515        | -8.7505          | 0.8769             | 5.2990          | -1278.7848     | -737.6204    | 3.0507          | 0.9407        | 0.6369 | 0.3631 | 0.1726     | 0.0379            | 0.0341      | 0.5306     | 0.0889            |
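
The Epoch and Step columns in the table above are tied together by the number of micro-batches per epoch. The following sketch is an empirical reconstruction, not the Trainer's implementation: it assumes `train_samples` from `all_results.json` in this commit, the batch settings from the hyperparameter list, and that an epoch is counted in per-device micro-batches. `epoch_at` is a hypothetical helper name.

```python
import math

train_samples = 184443        # from all_results.json in this commit
per_device_batch = 8          # train_batch_size
num_devices = 8
grad_accum = 4                # gradient_accumulation_steps

# Assumption: micro-batches each device processes in one pass over the data.
batches_per_epoch = math.ceil(train_samples / (per_device_batch * num_devices))


def epoch_at(step: int) -> float:
    """Hypothetical helper: fractional epoch after `step` optimizer steps."""
    return step * grad_accum / batches_per_epoch


# A few (step, epoch) pairs copied from the table above.
logged = {100: 0.1388, 700: 0.9715, 1400: 1.9431}
for step, epoch in logged.items():
    print(f"step {step:>4}: computed {epoch_at(step):.4f}, logged {epoch:.4f}")
```

With these assumptions `batches_per_epoch` comes out to 2882, and 1440 steps would correspond to epoch 1.9986, the final epoch value recorded in `all_results.json`.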


### Framework versions

- Transformers 4.44.1
- Pytorch 2.1.2+cu121
- Datasets 2.21.0
- Tokenizers 0.19.1
all_results.json
ADDED
@@ -0,0 +1,9 @@
{
    "epoch": 1.9986120749479528,
    "total_flos": 0.0,
    "train_loss": 0.17575526105033026,
    "train_runtime": 46867.94,
    "train_samples": 184443,
    "train_samples_per_second": 7.871,
    "train_steps_per_second": 0.031
}
ADDED
@@ -0,0 +1,6 @@
{
  "_from_model_config": true,
  "bos_token_id": 1,
  "eos_token_id": 2,
  "transformers_version": "4.44.1"
}
train_results.json
ADDED
@@ -0,0 +1,9 @@
{
    "epoch": 1.9986120749479528,
    "total_flos": 0.0,
    "train_loss": 0.17575526105033026,
    "train_runtime": 46867.94,
    "train_samples": 184443,
    "train_samples_per_second": 7.871,
    "train_steps_per_second": 0.031
}
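
The throughput fields in this file can be cross-checked against the runtime. The sketch below is a rough sanity check under stated assumptions: `num_epochs = 2` is taken from the README hyperparameters, and the total of 1440 optimizer steps is inferred from the logged epochs rather than stated anywhere in the commit.

```python
# Values copied from train_results.json.
train_runtime = 46867.94      # seconds
train_samples = 184443

num_epochs = 2                # from the README hyperparameters
total_steps = 1440            # assumption: inferred, not stated in the commit

# Assumption: throughput counts every sample once per epoch.
samples_per_second = train_samples * num_epochs / train_runtime
steps_per_second = total_steps / train_runtime
runtime_hours = train_runtime / 3600

print(f"samples/s: {samples_per_second:.3f}")
print(f"steps/s:   {steps_per_second:.3f}")
print(f"runtime:   {runtime_hours:.2f} h")
```

Under these assumptions both rates round to the reported values (7.871 and 0.031), and the run took roughly 13 hours of wall-clock time.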