Commit 01300dd by mnoukhov · verified · 1 Parent(s): d331ed6

mnoukhov/pythia410m-dpo-tldr

This view is limited to 50 files because it contains too many changes.
Files changed (50)
  1. .gitattributes +2 -0
  2. README.md +73 -0
  3. adapter_config.json +31 -0
  4. adapter_model.safetensors +3 -0
  5. code/README.md +4 -0
  6. code/Untitled.ipynb +1093 -0
  7. code/__pycache__/callbacks.cpython-311.pyc +0 -0
  8. code/__pycache__/generate_and_eval.cpython-311.pyc +0 -0
  9. code/__pycache__/generate_and_llm_judge.cpython-311.pyc +0 -0
  10. code/__pycache__/generate_vllm.cpython-311.pyc +0 -0
  11. code/__pycache__/gpt_reward_modeling.cpython-311.pyc +0 -0
  12. code/__pycache__/scalar_rm_model.cpython-311.pyc +0 -0
  13. code/callbacks.py +471 -0
  14. code/configs/accelerate_zero2_4gpu.yml +20 -0
  15. code/configs/create_rlhf_410m.yml +11 -0
  16. code/configs/create_rlhf_410m_1b.yml +11 -0
  17. code/configs/dpo1b2_10k_pythia410m_fp16.yml +19 -0
  18. code/configs/dpo1b2_20k-reuse_pythia410m_fp16.yml +19 -0
  19. code/configs/dpo1b2_20k_pythia410m-iter1_fp16.yml +19 -0
  20. code/configs/dpo1b2_20k_pythia410m_fp16.yml +19 -0
  21. code/configs/dpo1b2_20kgold_pythia410m-iter1_fp16.yml +19 -0
  22. code/configs/dpo1b2_20kgold_pythia410m_fp16.yml +19 -0
  23. code/configs/dpo1b2_20kgoldonly_pythia410m-iter1_fp16.yml +20 -0
  24. code/configs/dpo1b2_20kgoldonly_pythia410m_fp16.yml +20 -0
  25. code/configs/dpo1b2_20konly-reuse_pythia410m_fp16.yml +20 -0
  26. code/configs/dpo1b2_20konly_pythia410m-iter1_fp16.yml +20 -0
  27. code/configs/dpo1b2_20konly_pythia410m_fp16.yml +20 -0
  28. code/configs/dpo1b2_a100.yml +20 -0
  29. code/configs/dpo1b_eval_generated_pythia410m_fp16.yml +11 -0
  30. code/configs/dpo1b_eval_pythia410m_fp16.yml +19 -0
  31. code/configs/dpo1b_eval_regenerated_pythia410m_fp16.yml +11 -0
  32. code/configs/dpo1b_predict_generated_pythia410m-dpo1.yml +11 -0
  33. code/configs/dpo1b_pythia410m_costa_fp16.yml +28 -0
  34. code/configs/dpo1b_pythia410m_fp16.yml +28 -0
  35. code/configs/dpo1b_relabel_comparisons.yml +12 -0
  36. code/configs/dpo1b_relabel_generated_pythia410m_fp16.yml +12 -0
  37. code/configs/dpo1b_relabel_generated_same_prompts.yml +12 -0
  38. code/configs/dpo1b_relabel_vllm_generated_pythia410m.yml +12 -0
  39. code/configs/dpo1b_test.yml +19 -0
  40. code/configs/dpo1b_vllm_pythia410m.yml +18 -0
  41. code/configs/dpo2_costa_1b_20k_bf16.yml +36 -0
  42. code/configs/dpo2_costa_1b_20k_fp16.yml +37 -0
  43. code/configs/dpo2_costa_2.8b_bf16.yml +40 -0
  44. code/configs/dpo2_pythia2.8b_tldr.yml +34 -0
  45. code/configs/dpo3_costa_1b_20k_fp16.yml +35 -0
  46. code/configs/dpo_1b_bf16.yml +28 -0
  47. code/configs/dpo_1b_fp16.yml +31 -0
  48. code/configs/dpo_20konly_1b_bf16.yml +32 -0
  49. code/configs/dpo_20konly_1b_fp16.yml +33 -0
  50. code/configs/dpo_costa_1b_constantlr_fp16.yml +32 -0
.gitattributes CHANGED
@@ -33,3 +33,5 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zip filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
+ code/wandb/run-20240510_164928-cfb3179a6dd00a0d09b55fc900877f5b/run-cfb3179a6dd00a0d09b55fc900877f5b.wandb filter=lfs diff=lfs merge=lfs -text
+ code/wandb/run-20240510_204631-cfb3179a6dd00a0d09b55fc900877f5b/run-cfb3179a6dd00a0d09b55fc900877f5b.wandb filter=lfs diff=lfs merge=lfs -text
README.md ADDED
@@ -0,0 +1,73 @@
+ ---
+ license: apache-2.0
+ library_name: peft
+ tags:
+ - generated_from_trainer
+ base_model: mnoukhov/pythia410m-sft-tldr
+ model-index:
+ - name: pythia410m-dpo-tldr
+   results: []
+ ---
+
+ <!-- This model card has been generated automatically according to the information the Trainer had access to. You
+ should probably proofread and complete it, then remove this comment. -->
+
+ # pythia410m-dpo-tldr
+
+ This model is a fine-tuned version of [mnoukhov/pythia410m-sft-tldr](https://huggingface.co/mnoukhov/pythia410m-sft-tldr) on an unknown dataset.
+ It achieves the following results on the evaluation set:
+ - Loss: 0.5395
+ - Rewards/chosen: -1.3883
+ - Rewards/rejected: -1.9858
+ - Rewards/accuracies: 0.7226
+ - Rewards/margins: 0.5975
+ - Logps/rejected: -98.0320
+ - Logps/chosen: -98.0320
+ - Logps/ref Rejected: -63.5119
+ - Logps/ref Chosen: -70.2656
+
+ ## Model description
+
+ More information needed
+
+ ## Intended uses & limitations
+
+ More information needed
+
+ ## Training and evaluation data
+
+ More information needed
+
+ ## Training procedure
+
+ ### Training hyperparameters
+
+ The following hyperparameters were used during training:
+ - learning_rate: 3e-05
+ - train_batch_size: 16
+ - eval_batch_size: 8
+ - seed: 42
+ - distributed_type: multi-GPU
+ - gradient_accumulation_steps: 4
+ - total_train_batch_size: 64
+ - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
+ - lr_scheduler_type: cosine
+ - num_epochs: 1.0
+
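As a quick sanity check on the hyperparameters listed above, the total batch size is commonly the per-step batch size times the gradient-accumulation steps times the number of processes. The sketch below assumes that convention (the card itself does not state the formula or the process count), and `effective_batch_size` is a hypothetical helper name. The step estimate uses the first logged evaluation in this card (step 291 at epoch 0.2) and is only approximate because the epoch value is rounded.

```python
def effective_batch_size(train_batch_size: int,
                         grad_accum_steps: int,
                         num_processes: int = 1) -> int:
    """Examples consumed per optimizer step (assumed convention)."""
    return train_batch_size * grad_accum_steps * num_processes


# Values from the hyperparameter list; num_processes=1 is an assumption
# that happens to reproduce the reported total_train_batch_size of 64.
total = effective_batch_size(train_batch_size=16, grad_accum_steps=4)
print(total)

# Step 291 is logged at epoch 0.2 (one fifth), so one epoch is roughly
# 291 * 5 optimizer steps; multiplying by the total batch size gives a
# rough count of training examples per epoch.
approx_steps_per_epoch = 291 * 5
approx_examples_per_epoch = approx_steps_per_epoch * total
print(approx_steps_per_epoch, approx_examples_per_epoch)
```

Under this convention the listed values are consistent with a single process, even though the card reports `distributed_type: multi-GPU`; the card does not say how many devices were used.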
+ ### Training results
+
+ | Training Loss | Epoch | Step | Logps/chosen | Logps/ref Chosen | Logps/ref Rejected | Logps/rejected | Validation Loss | Rewards/accuracies | Rewards/chosen | Rewards/margins | Rewards/rejected |
+ |:-------------:|:-----:|:----:|:------------:|:----------------:|:------------------:|:--------------:|:---------------:|:------------------:|:--------------:|:---------------:|:----------------:|
+ | 0.5961 | 0.2 | 291 | -93.0907 | -70.2656 | -63.5119 | -93.0907 | 0.5659 | 0.7036 | -1.1413 | 0.4667 | -1.6079 |
+ | 0.5574 | 0.4 | 582 | -102.6558 | -70.2656 | -63.5119 | -102.6558 | 0.5405 | 0.7216 | -1.6195 | 0.6178 | -2.2373 |
+ | 0.5418 | 0.6 | 873 | -100.0813 | -70.2656 | -63.5119 | -100.0813 | 0.5373 | 0.7226 | -1.4908 | 0.6283 | -2.1191 |
+ | 0.5339 | 0.8 | 1164 | -98.0320 | -70.2656 | -63.5119 | -98.0320 | 0.5395 | 0.7226 | -1.3883 | 0.5975 | -1.9858 |
+
+
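The reward columns reported in this card can be cross-checked with a little arithmetic. In DPO the logged reward is commonly `beta * (policy log-prob - reference log-prob)`, and the margin is the chosen reward minus the rejected reward. The sketch below checks both relations for the step-291 row and for the final evaluation summary. The card does not state `beta`; the value 0.05 is inferred from the chosen-side numbers and is an assumption, as is the helper name `check_row`. The card reports identical values for Logps/rejected and Logps/chosen, so the rejected-side log-probs are not checked here.

```python
# (label, logps_chosen, ref_logps_chosen,
#  rewards_chosen, rewards_rejected, rewards_margins), as reported in the card
rows = [
    ("step 291", -93.0907, -70.2656, -1.1413, -1.6079, 0.4667),
    ("final evaluation", -98.0320, -70.2656, -1.3883, -1.9858, 0.5975),
]

ASSUMED_BETA = 0.05  # inferred, not stated in the card
TOL = 2e-4           # the card rounds every value to four decimals


def check_row(row):
    """Return (margin_ok, reward_ok) for one reported row."""
    _, logp_c, ref_c, r_chosen, r_rejected, margin = row
    # margin should equal chosen reward minus rejected reward
    margin_ok = abs((r_chosen - r_rejected) - margin) < TOL
    # chosen reward should equal beta * (policy logp - reference logp)
    reward_ok = abs(ASSUMED_BETA * (logp_c - ref_c) - r_chosen) < TOL
    return margin_ok, reward_ok


for row in rows:
    print(row[0], check_row(row))
```

Both relations hold within rounding for these rows, which suggests (but does not prove) that the run used a `beta` close to 0.05.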
+ ### Framework versions
+
+ - PEFT 0.10.0
+ - Transformers 4.38.2
+ - Pytorch 2.1.2+cu121
+ - Datasets 2.17.0
+ - Tokenizers 0.15.2
adapter_config.json ADDED
@@ -0,0 +1,31 @@
+ {
+   "alpha_pattern": {},
+   "auto_mapping": null,
+   "base_model_name_or_path": "mnoukhov/pythia410m-sft-tldr",
+   "bias": "none",
+   "fan_in_fan_out": false,
+   "inference_mode": true,
+   "init_lora_weights": true,
+   "layer_replication": null,
+   "layers_pattern": null,
+   "layers_to_transform": null,
+   "loftq_config": {},
+   "lora_alpha": 32,
+   "lora_dropout": 0.05,
+   "megatron_config": null,
+   "megatron_core": "megatron.core",
+   "modules_to_save": null,
+   "peft_type": "LORA",
+   "r": 16,
+   "rank_pattern": {},
+   "revision": null,
+   "target_modules": [
+     "dense",
+     "dense_h_to_4h",
+     "dense_4h_to_h",
+     "query_key_value"
+   ],
+   "task_type": "CAUSAL_LM",
+   "use_dora": false,
+   "use_rslora": false
+ }
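The adapter config above fixes the LoRA rank and alpha, which together determine the update scaling (`lora_alpha / r` when `use_rslora` is false) and the number of trainable parameters (`r * (in_features + out_features)` per adapted linear layer). The sketch below works through that arithmetic. The layer dimensions (hidden size 1024, 24 layers, 4x MLP width, fused QKV projection) are assumptions about Pythia-410M that do not appear in this diff, and it also assumes each target name matches exactly one linear layer per transformer block.

```python
R = 16            # "r" from adapter_config.json
LORA_ALPHA = 32   # "lora_alpha" from adapter_config.json

# Assumed Pythia-410M dimensions (not stated in this diff).
HIDDEN = 1024
NUM_LAYERS = 24
MLP = 4 * HIDDEN

# (in_features, out_features) for each target module, per transformer block.
TARGETS = {
    "query_key_value": (HIDDEN, 3 * HIDDEN),  # fused Q, K, V projection
    "dense": (HIDDEN, HIDDEN),                # attention output projection
    "dense_h_to_4h": (HIDDEN, MLP),
    "dense_4h_to_h": (MLP, HIDDEN),
}


def lora_params(in_features: int, out_features: int, r: int) -> int:
    """Parameters in one LoRA pair: A is (r x in), B is (out x r)."""
    return r * (in_features + out_features)


scaling = LORA_ALPHA / R
per_layer = sum(lora_params(i, o, R) for i, o in TARGETS.values())
total = per_layer * NUM_LAYERS
print(scaling, per_layer, total)
```

Under these assumptions the adapter holds a few million trainable parameters, a small fraction of the base model.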
adapter_model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:e44ce263e6fd885f50d82ca515b9325375b43ee36ededb75acf161ce88bc2e41
+ size 48
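The three lines above are a Git LFS pointer rather than the weights themselves: a spec version, an object ID with its hash algorithm, and a size in bytes. A minimal sketch of reading such a pointer follows; `parse_lfs_pointer` is a hypothetical helper, and the pointer text is copied as it appears in this diff.

```python
def parse_lfs_pointer(text: str) -> dict:
    """Split a Git LFS pointer file into its key/value fields."""
    fields = {}
    for line in text.strip().splitlines():
        # Each line is "<key> <value>", separated by the first space.
        key, _, value = line.partition(" ")
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": algo,
        "digest": digest,
        "size_bytes": int(fields["size"]),
    }


POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:e44ce263e6fd885f50d82ca515b9325375b43ee36ededb75acf161ce88bc2e41
size 48
"""

info = parse_lfs_pointer(POINTER)
print(info["hash_algo"], len(info["digest"]), info["size_bytes"])
```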
code/README.md ADDED
@@ -0,0 +1,4 @@
+ # how to generate and pseudo-label
+
+ - generate with `generate_vllm.py`
+ - pseudo-label with either `dpo_training.py` or `gpt_reward_modeling.py` by setting `mode = relabel`
code/Untitled.ipynb ADDED
@@ -0,0 +1,1093 @@
+ {
+ "cells": [
+ {
+ "cell_type": "code",
+ "execution_count": 2,
+ "id": "070a4097-7a17-409f-af5d-3d0cf43926ca",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "from peft import AutoPeftModelForCausalLM, PeftModelForCausalLM\n",
+ "from huggingface_hub import list_repo_refs\n",
+ "from transformers import AutoTokenizer, AutoModelForCausalLM"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 4,
+ "id": "100ec138-f7c1-4d8f-b7e0-eb715f320fdc",
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stderr",
+ "output_type": "stream",
+ "text": [
+ "Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.\n"
+ ]
+ }
+ ],
+ "source": [
+ "tokenizer = AutoTokenizer.from_pretrained(\"mnoukhov/pythia410m-tldr-sft\")"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 9,
+ "id": "dbc9a2db-2c16-4e8f-bd2a-213ddc5d139d",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "0"
+ ]
+ },
+ "execution_count": 9,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "tokenizer.add_special_tokens({\"pad_token\": \"<|padding|>\"}) "
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 16,
+ "id": "03788af8-6733-492f-84e3-fd358bb88ffd",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "1"
+ ]
+ },
+ "execution_count": 16,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "tokenizer.pad_token_id"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 12,
+ "id": "576d3fda-7902-43d7-b4b1-3054f6192b11",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "example_text = \"hello my name is mr hello\""
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 24,
+ "id": "c73ddb0c-1551-4b12-82d8-26d3742d6f57",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "toks = tokenizer(example_text + tokenizer.eos_token, padding=\"max_length\", max_length=7, truncation=True)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 25,
+ "id": "8904af15-4d27-4718-b53a-060ae65173a9",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "{'input_ids': [25521, 619, 1416, 310, 278, 83, 23120], 'attention_mask': [1, 1, 1, 1, 1, 1, 1]}"
+ ]
+ },
+ "execution_count": 25,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "toks"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 26,
+ "id": "8fcf7c83-e8df-457b-9eab-1b1ed2145a76",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "7"
+ ]
+ },
+ "execution_count": 26,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "sum(toks['attention_mask'])"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 2,
+ "id": "ef1dddf6-1d26-4950-910a-c40b2cc394c6",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "base_model_name = \"vwxyzjn/EleutherAI_pythia-1b-deduped__sft__tldr\"\n",
+ "base_model_revision = \"sft__55513__1706646024\""
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 35,
+ "id": "bb0df32c-9d90-4ab0-a87d-0ff6ecab03b6",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "model_path = \"/home/toolkit/trl_results/mnoukhov/EleutherAI_pythia-1b-deduped__sft__tldr_dpo_costa_1b_fp16.yml_3d94f50_b9ff2_merged/main\""
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 36,
+ "id": "3ae77b2a-3132-4dd1-903b-35f28b7e7e5f",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "base_model = AutoModelForCausalLM.from_pretrained(model_path)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 37,
+ "id": "08c1d05d-44a4-4859-9d54-48e7a3cd1da7",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "application/vnd.jupyter.widget-view+json": {
+ "model_id": "12749e76749a40469d7732dc23e0f1dc",
+ "version_major": 2,
+ "version_minor": 0
+ },
+ "text/plain": [
+ "model.safetensors: 0%| | 0.00/4.05G [00:00<?, ?B/s]"
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ },
+ {
+ "data": {
+ "text/plain": [
+ "CommitInfo(commit_url='https://huggingface.co/mnoukhov/EleutherAI_pythia-1b-deduped__sft__tldr_dpo_costa_1b_fp16.yml_3d94f50_b9ff2_merged/commit/cd8f4bf53ab02881549cb73b6271005b2e8c3be6', commit_message='Upload GPTNeoXForCausalLM', commit_description='', oid='cd8f4bf53ab02881549cb73b6271005b2e8c3be6', pr_url=None, pr_revision=None, pr_num=None)"
+ ]
+ },
+ "execution_count": 37,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "base_model.push_to_hub(\"mnoukhov/EleutherAI_pythia-1b-deduped__sft__tldr_dpo_costa_1b_fp16.yml_3d94f50_b9ff2_merged\")"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 4,
+ "id": "9ef8927b-f908-460f-adba-54508b133ae0",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "adapter_repo = \"mnoukhov/EleutherAI_pythia-1b-deduped__sft__tldr_dpo_1b_fp16.yml_24e9f83\""
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 5,
+ "id": "cb7336d2-a4ac-4607-83ae-e7e1e0b1665d",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "refs = list_repo_refs(adapter_repo)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 6,
+ "id": "2ab002af-7f3b-41b1-a8ad-f7c2296bd68f",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "application/vnd.jupyter.widget-view+json": {
+ "model_id": "f4c8e90f4fba4589a00ec3ee75dc7505",
+ "version_major": 2,
+ "version_minor": 0
+ },
+ "text/plain": [
+ "adapter_config.json: 0%| | 0.00/706 [00:00<?, ?B/s]"
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ },
+ {
+ "data": {
+ "application/vnd.jupyter.widget-view+json": {
+ "model_id": "1eb21476d51f44858c32c59e72d70105",
+ "version_major": 2,
+ "version_minor": 0
+ },
+ "text/plain": [
+ "adapter_model.safetensors: 0%| | 0.00/18.5M [00:00<?, ?B/s]"
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ },
+ {
+ "data": {
+ "application/vnd.jupyter.widget-view+json": {
+ "model_id": "014e10f937374412aa10524d1a4d7a8f",
+ "version_major": 2,
+ "version_minor": 0
+ },
+ "text/plain": [
+ "model.safetensors: 0%| | 0.00/4.05G [00:00<?, ?B/s]"
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ },
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "step2324\n"
+ ]
+ },
+ {
+ "data": {
+ "application/vnd.jupyter.widget-view+json": {
+ "model_id": "7c55865b8a7f4c6795b22c0a68b702a6",
+ "version_major": 2,
+ "version_minor": 0
+ },
+ "text/plain": [
+ "adapter_model.safetensors: 0%| | 0.00/18.5M [00:00<?, ?B/s]"
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ },
+ {
+ "data": {
+ "application/vnd.jupyter.widget-view+json": {
+ "model_id": "58cb851b46974ae5a3cb066717520f8d",
+ "version_major": 2,
+ "version_minor": 0
+ },
+ "text/plain": [
+ "model.safetensors: 0%| | 0.00/4.05G [00:00<?, ?B/s]"
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ },
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "step1743\n"
+ ]
+ },
+ {
+ "data": {
+ "application/vnd.jupyter.widget-view+json": {
+ "model_id": "742410c4a20d4a09b10c0c96c8977df5",
+ "version_major": 2,
+ "version_minor": 0
+ },
+ "text/plain": [
+ "adapter_model.safetensors: 0%| | 0.00/18.5M [00:00<?, ?B/s]"
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ },
+ {
+ "data": {
+ "application/vnd.jupyter.widget-view+json": {
+ "model_id": "48117e9ab89248038fa8a76ca9a191db",
+ "version_major": 2,
+ "version_minor": 0
+ },
+ "text/plain": [
+ "model.safetensors: 0%| | 0.00/4.05G [00:00<?, ?B/s]"
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ },
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "step1162\n"
+ ]
+ },
+ {
+ "data": {
+ "application/vnd.jupyter.widget-view+json": {
+ "model_id": "9b9cef19d78c465f900e345ac44acae6",
+ "version_major": 2,
+ "version_minor": 0
+ },
+ "text/plain": [
+ "adapter_model.safetensors: 0%| | 0.00/18.5M [00:00<?, ?B/s]"
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ },
+ {
+ "data": {
+ "application/vnd.jupyter.widget-view+json": {
+ "model_id": "08878fc4eb1a407bb6238d4bec9e2817",
+ "version_major": 2,
+ "version_minor": 0
+ },
+ "text/plain": [
+ "model.safetensors: 0%| | 0.00/4.05G [00:00<?, ?B/s]"
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ },
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "step581\n"
+ ]
+ },
+ {
+ "data": {
+ "application/vnd.jupyter.widget-view+json": {
+ "model_id": "f39fc2322c504c4b9d6b601bbcbbb923",
+ "version_major": 2,
+ "version_minor": 0
+ },
+ "text/plain": [
+ "adapter_model.safetensors: 0%| | 0.00/18.5M [00:00<?, ?B/s]"
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ },
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "step1\n"
+ ]
+ }
+ ],
+ "source": [
+ "for branch in refs.branches:\n",
+ "    if branch.name == \"main\":\n",
+ "        continue\n",
+ "\n",
+ "    model = PeftModelForCausalLM.from_pretrained(base_model, adapter_repo, revision=branch.name)\n",
+ "    merged = model.merge_and_unload()\n",
+ "    merged.push_to_hub(f\"{adapter_repo}_merged\", revision=branch.name)\n",
+ "    print(branch.name)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 1,
+ "id": "24627996-2bc2-4944-a36c-0d86108a82c6",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "from datasets import load_dataset, builder, load_from_disk\n",
+ "builder.has_sufficient_disk_space = lambda needed_bytes, directory=\".\": True "
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 4,
+ "id": "ab8916ed-d39b-4d64-b287-ea4569567005",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "ds = load_from_disk(\"/home/toolkit/trl_results/vwxyzjn_summarize_from_feedback_tldr_3_filtered_oai_preprocessing_1706381144/vwxyzjn_EleutherAI_pythia-1b-deduped__dpo__tldr\")"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 11,
+ "id": "6ee65d83-872d-4d96-9c81-be53f2fc54c1",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "'?'"
+ ]
+ },
+ "execution_count": 11,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "ds['generations_dpo__55513__1707379566'][0][-1]"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 13,
+ "id": "a11a3760-515b-4a02-9053-853aa3b06fd4",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "ppo_ds = load_from_disk(\"vwxyzjn_summarize_from_feedback_tldr_3_filtered_oai_preprocessing_1706381144/vwxyzjn_EleutherAI_pythia-1b-deduped__ppo_left_padding_new_nowhiten_reward__tldr\")"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 24,
+ "id": "5d0c3c4f-71b1-46b0-abdb-036e1bd49a26",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "text = ppo_ds[\"generations_ppo_left_padding_new_nowhiten_reward__55513__1709671967\"][0]"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 3,
+ "id": "8d2ec316-db2b-481b-9e25-82b2dd363772",
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stderr",
+ "output_type": "stream",
+ "text": [
+ "Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.\n"
+ ]
+ }
+ ],
+ "source": [
+ "tokenizer = AutoTokenizer.from_pretrained(\"EleutherAI/pythia-6.9b-deduped\")"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 4,
+ "id": "1fedd4e0-a0a5-4499-9561-605e5adc8d88",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "[1]"
+ ]
+ },
+ "execution_count": 4,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "tokenizer.encode('<|padding|>')"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 5,
+ "id": "42b8260f-19a7-42e1-b809-a24deff3699c",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "application/vnd.jupyter.widget-view+json": {
+ "model_id": "032ad7febe1b4eb9899d22e5d44d23a0",
+ "version_major": 2,
+ "version_minor": 0
+ },
+ "text/plain": [
+ "Downloading readme: 0%| | 0.00/456 [00:00<?, ?B/s]"
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ },
+ {
+ "data": {
+ "application/vnd.jupyter.widget-view+json": {
+ "model_id": "954fb6b000ac4b29b0c9033f242aac73",
+ "version_major": 2,
+ "version_minor": 0
+ },
+ "text/plain": [
+ "Downloading data: 0%| | 0.00/122M [00:00<?, ?B/s]"
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ },
+ {
+ "data": {
+ "application/vnd.jupyter.widget-view+json": {
+ "model_id": "3e544b20f15d48f59e901fbaf896a24d",
+ "version_major": 2,
+ "version_minor": 0
+ },
+ "text/plain": [
+ "Downloading data: 0%| | 0.00/6.54M [00:00<?, ?B/s]"
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ },
+ {
+ "data": {
+ "application/vnd.jupyter.widget-view+json": {
+ "model_id": "62d3170267d742ceaf6bdad2a2cef5ae",
+ "version_major": 2,
+ "version_minor": 0
+ },
+ "text/plain": [
+ "Generating train split: 0%| | 0/160800 [00:00<?, ? examples/s]"
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ },
+ {
+ "data": {
+ "application/vnd.jupyter.widget-view+json": {
+ "model_id": "ce7cff29f9c042949acad2dcec3ddd6e",
+ "version_major": 2,
+ "version_minor": 0
+ },
+ "text/plain": [
+ "Generating test split: 0%| | 0/8552 [00:00<?, ? examples/s]"
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "ds = load_dataset(\"sophiex/hh-rlhf\")"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 9,
+ "id": "df1ccb5e-7206-45e7-a449-76b64fda72ed",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "application/vnd.jupyter.widget-view+json": {
+ "model_id": "a9abf38ffb184ba4a4995450a4413bf2",
+ "version_major": 2,
+ "version_minor": 0
+ },
+ "text/plain": [
+ "Map (num_proc=16): 0%| | 0/160800 [00:00<?, ? examples/s]"
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ },
+ {
+ "data": {
+ "application/vnd.jupyter.widget-view+json": {
+ "model_id": "8e0af258d31742998176207df5cac540",
+ "version_major": 2,
+ "version_minor": 0
+ },
+ "text/plain": [
+ "Map (num_proc=16): 0%| | 0/8552 [00:00<?, ? examples/s]"
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "tokds = ds.map(lambda x: tokenizer(x['prompt'] + x['chosen']), num_proc=16)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 12,
+ "id": "2e72f7f3-b047-4eab-99a7-cc08d19efeba",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "application/vnd.jupyter.widget-view+json": {
+ "model_id": "99c7615c05da46d6be5c68ecfba3e748",
+ "version_major": 2,
+ "version_minor": 0
+ },
+ "text/plain": [
+ "Map: 0%| | 0/160800 [00:00<?, ? examples/s]"
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ },
+ {
+ "data": {
+ "application/vnd.jupyter.widget-view+json": {
+ "model_id": "c9b61731ac524d8c8ad1a44e47bb12b2",
+ "version_major": 2,
+ "version_minor": 0
+ },
+ "text/plain": [
+ "Map: 0%| | 0/8552 [00:00<?, ? examples/s]"
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "tokds = tokds.map(lambda x: {\"length\": len(x['input_ids'])})"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 16,
+ "id": "413e3eb3-ad2f-4f71-9f27-894c4942be4f",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import seaborn as sns"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 17,
+ "id": "a4c42a89-88dd-4f3d-82cb-1fd7ecb60815",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "<seaborn.axisgrid.FacetGrid at 0x7f8abec580d0>"
+ ]
+ },
+ "execution_count": 17,
+ "metadata": {},
+ "output_type": "execute_result"
+ },
+ {
+ "data": {
+ "image/png": "iVBORw0KGgoAAAANSUhEUgAAAekAAAHpCAYAAACmzsSXAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjguMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8g+/7EAAAACXBIWXMAAA9hAAAPYQGoP6dpAAAz8UlEQVR4nO3deXhV1b3/8U+mE3IgI5gEymAAi6CgBRXTwSKkBEqrXrnPNZYKrYDFG6hAq5Rbq5b2Xvg5gAgRbCvEai3qfRRbsCCEQSwBMSUVUHkcsHgrCZWckzBlXr8/6NnNycAQTrJXct6v5zmPOXuv7Hz3Jp5P9t5rrxVhjDECAADWiXS7AAAA0DxCGgAASxHSAABYipAGAMBShDQAAJYipAEAsBQhDQCApQjp82CMUUVFhXikHADQngjp83D8+HElJibq+PHjbpcCAAgjhDQAAJYipAEAsBQhDQCApQhpAAAsRUgDAGApQhoAAEsR0gAAWIqQBgDAUoQ0AACWIqQBALAUIQ0AgKUIaQAALEVIAwBgKUIaAABLEdIAAFiKkAYAwFKENAAAliKkAQCwFCENAIClCOkOxhijsrIyGWPcLgUA0MYI6Q7G5/Mp59G18vl8bpcCAGhjhHQHFOONd7sEAEA7IKRdxuVrAEBLCGmXcfkaANASQtoCZ7t8zZk2AIQvQtpynGkDQPgipDsAOooBQHgipAEAsBQhDQCApQhpAAAsRUgDAGApQhoAAEsR0gAAWIqQ7uAY7AQAOi9CuoNjsBMA6LwI6U6AwU4AoHMipAEAsBQhDQCApQhpAAAsRUgDAGApQhoAAEsR0gAAWIqQBgDAUoQ0AACWIqQBALAUIQ0AgKUIaQAALEVIAwBgKUIaAABLEdKWYX5oAECANSG9aNEiRUREaPbs2c6yyspK5ebmqnv37urWrZsmTpyo0tLSoO87fPiwJkyYIK/Xq9TUVN17772qra0NarNt2zYNHz5csbGxGjhwoPLz89thj1qH+aEBAAFWhPSePXv01FNPadiwYUHL58yZoz/+8Y966aWXtH37dn322We69dZbnfV1dXWaMGGCqqurtXPnTj3zzDPKz8/XAw884LQ5dOiQJkyYoBtvvFHFxcWaPXu2pk2bpo0bN7bb/l0o5ocGAEgWhPSJEyc0adIk/frXv1ZycrKzvLy8XE8//bQWL16s0aNHa8SIEVq9erV27typXbt2SZJef/11vfvuu3ruued09dVXa/z48frFL36hvLw8VVdXS5JWrlypjIwMPfbYYxo8eLBmzpypf//3f9eSJUtarKmqqkoVFRVBLwAA2pvrIZ2bm6sJEyYoKysraHlRUZFqamqCll9++eXq27evCgsLJUmFhYUaOnSo0tLSnDbZ2dmqqKjQgQMHnDaNt52dne1sozkLFy5UYmKi8+rTp89F7ycAABfK1ZBes2aN/vKXv2jhwoVN1pWUlMjj8SgpKSloeVpamkpKSpw2DQM6sD6w7mxtKioqdPr06Wbrmj9/vsrLy53Xp59+2qr9AwDgYkS79YM//fRT3XPPPdq0aZO6dOniVhnNio2NVWxsrNtlAADCnGtn0kVFRTp69KiGDx+u6OhoRUdHa/v27XriiScUHR2ttLQ0VVdXy+/3B31faWmp0tPTJUnp6elNensH3p+rTUJCguLi4tpo7wAAuHiuhfSYMWO0b98+FRcXO69rrrlGkyZNcr6OiYlRQUGB8z0HDx7U4cOHlZmZKUnKzMzUvn37dPToUafNpk2blJCQoCFDhjhtGm4j0CawDQAAbOXa5e74+HhdeeWVQcu6du2q7t27O8unTp2quXPnKiUlRQkJCZo1a5YyMzN1/fXXS5LGjh2rIUOG6I477tDDDz+skpIS3X///crNzXUuV8+YMUPLly/XfffdpzvvvFNbtmzRiy++qPXr17fvDgMAcIFcC+nzsWTJEkVGRmrixImqqqpSdna2nnzySWd9VFSU1q1b
p7vvvluZmZnq2rWrpkyZogULFjhtMjIytH79es2ZM0dLly5V79699Zvf/EbZ2dlu7BIAAOfNqpDetm1b0PsuXbooLy9PeXl5LX5Pv3799Nprr511u6NGjdLevXtDUSIAAO3G9eekAQBA8whpAAAsRUh3IMYYJt4AgDBCSHcgPp9P05avV12jWb4AAJ0TId3BRMcxQxYAhAurenfj/DS87G2McbkaAEBbIaQ7oJrKk8p9do+iY6K0NGe42+UAANoIl7s7KI83XjHehCbLjTEqKyvjDBsAOgFCupPx+XzKeXQtvcABoBMgpDuhGC+dywCgMyCkAQCwFCENAIClCGkAACxFSAMAYClCGgAASxHSAABYihHHOgBmvwKA8ERIdwA1p88MA1pfc1p1dcyABQDhgsvdHYTHG68YZsACgLBCSAMAYClCGgAASxHSHZgxRn6/3+0yAABthJDuwGpOndCc/B2qq6UzGQB0RoR0Bxft7ep2CQCANkJIAwBgKUIaAABLMZiJhRqOMGaMcbkaAIBbCGkLBUYYi4qO1ILxA9wuBwDgEi53W8rjjZcUeab3NkOBAkBYIqQtR+9tAAhfhDQAAJYipAEAsBQdxzoBeoMDQOdESHcCNZVneoNHx0Rpac7woHWBAE9OTlZERIRLFQIAWoPL3RYIBOmxY8dUVlbWqm14vPGK8SY0We7z+ZTz6FrnTBsA0HFwJm2BwHPR9TWnVVlRHvIe3THe+JBuDwDQPghpS3i88aqrjlYtM1oBAP6Jy90AAFiKkAYAwFKENAAAliKkAQCwFCHdiRhj5Pf73S4DABAihHQnUnPqxJlZs+ghDgCdAiHdyTBrFgB0HoQ0AACWIqQBALAUIQ0AgKUIaQAALEVIAwBgKUIaAABLEdIAAFiKkAYAwFKENAAAliKkAQCwFCENAIClCGkAACxFSAMAYClCGgAASxHSAABYKtrtAhB6xhj5fD7n6+bWJScnKyIiwo3yAADniTPpTqim8qRyn92jySsK5Pf7g9b5fD7lPLrWCXEAgL04k3ZRwzPeUPN44xUV0/w/b4w3vk1+JgAgtDiTdpHP59O05etVV1frdikAAAsR0i6LjuOsFgDQPEIaAABLEdIAAFiKkAYAwFKENAAAliKkAQCwFCENAIClCOlOzBjTZMQxAEDHQUh3YjWnTmhO/g7V1TJYCgB0RIR0Jxft7ep2CQCAViKkAQCwFCENAIClCGkAACxFSAMAYClCGgAASxHSAABYipAGAMBShDQAAJYipAEAsBQhDQCApQhpAAAsRUgDAGApQjrMGWNUVlYmY4zbpQAAGiGkw5zP51POo2vl8/ncLgUA0IirIb1ixQoNGzZMCQkJSkhIUGZmpv70pz856ysrK5Wbm6vu3burW7dumjhxokpLS4O2cfjwYU2YMEFer1epqam69957Vdto/uRt27Zp+PDhio2N1cCBA5Wfn98eu9dhxHjj3S4BANAMV0O6d+/eWrRokYqKivT2229r9OjRuvnmm3XgwAFJ0pw5c/THP/5RL730krZv367PPvtMt956q/P9dXV1mjBhgqqrq7Vz504988wzys/P1wMPPOC0OXTokCZMmKAbb7xRxcXFmj17tqZNm6aNGze2+/4CAHAhot384d/+9reD3v/3f/+3VqxYoV27dql37956+umn9fzzz2v06NGSpNWrV2vw4MHatWuXrr/+er3++ut69913tXnzZqWlpenqq6/WL37xC82bN08PPfSQPB6PVq5cqYyMDD322GOSpMGDB+vNN9/UkiVLlJ2d3WxdVVVVqqqqct5XVFS00REAAKBl1tyTrqur05o1a3Ty5EllZmaqqKhINTU1ysrKctpcfvnl6tu3rwoLCyVJhYWFGjp0qNLS0pw22dnZqqiocM7GCwsLg7YRaBPYRnMWLlyoxMRE59WnT59Q7ioAAOfF9ZDet2+funXrptjYWM2YMUOvvPKKhgwZopKSEnk8HiUlJQW1T0tLU0lJ
[base64-encoded PNG data omitted: seaborn displot of tokds["train"]["length"]]",
+       "text/plain": [
+        "<Figure size 500x500 with 1 Axes>"
+       ]
+      },
+      "metadata": {},
+      "output_type": "display_data"
+     }
+    ],
+    "source": [
+     "sns.displot(tokds[\"train\"][\"length\"])"
+    ]
+   },
+   {
+    "cell_type": "code",
+    "execution_count": 18,
+    "id": "d11597f9-0441-440c-8214-b9d8b2df6f79",
+    "metadata": {},
+    "outputs": [
+     {
+      "data": {
+       "application/vnd.jupyter.widget-view+json": {
+        "model_id": "46d3909d41c649acb800d4bf00197951",
+        "version_major": 2,
+        "version_minor": 0
+       },
+       "text/plain": [
+        "Map (num_proc=16): 0%| | 0/160800 [00:00<?, ? examples/s]"
+       ]
+      },
+      "metadata": {},
+      "output_type": "display_data"
+     },
+     {
+      "data": {
+       "application/vnd.jupyter.widget-view+json": {
+        "model_id": "e886faa17c774740a2058a5dd8e0673d",
+        "version_major": 2,
+        "version_minor": 0
+       },
+       "text/plain": [
+        "Map (num_proc=16): 0%| | 0/8552 [00:00<?, ? examples/s]"
+       ]
+      },
+      "metadata": {},
+      "output_type": "display_data"
+     }
+    ],
+    "source": [
+     "tokds = ds.map(lambda x: tokenizer(x['prompt']), num_proc=16)"
+    ]
+   },
+   {
+    "cell_type": "code",
+    "execution_count": 19,
+    "id": "84290aac-1c4e-4d29-89bd-318cf2c9daf3",
+    "metadata": {},
+    "outputs": [
+     {
+      "data": {
+       "application/vnd.jupyter.widget-view+json": {
+        "model_id": "eb0406bdb9884fcc826630224f2d1a8a",
+        "version_major": 2,
+        "version_minor": 0
+       },
+       "text/plain": [
+        "Map: 0%| | 0/160800 [00:00<?, ? examples/s]"
+       ]
+      },
+      "metadata": {},
+      "output_type": "display_data"
+     },
+     {
+      "data": {
+       "application/vnd.jupyter.widget-view+json": {
+        "model_id": "50580c27e575445bb239783adee19f90",
+        "version_major": 2,
+        "version_minor": 0
+       },
+       "text/plain": [
+        "Map: 0%| | 0/8552 [00:00<?, ? examples/s]"
+       ]
+      },
+      "metadata": {},
+      "output_type": "display_data"
+     }
+    ],
+    "source": [
+     "tokds = tokds.map(lambda x: {\"prompt_length\": len(x['input_ids'])})"
+    ]
+   },
+   {
+    "cell_type": "code",
+    "execution_count": 22,
+    "id": "44d2f307-118b-493d-b626-97490e2bc4aa",
+    "metadata": {},
+    "outputs": [
+     {
+      "data": {
+       "application/vnd.jupyter.widget-view+json": {
+        "model_id": "588d062fd6c2489da6f57b287c66d6e6",
+        "version_major": 2,
+        "version_minor": 0
+       },
+       "text/plain": [
+        "Filter (num_proc=16): 0%| | 0/160800 [00:00<?, ? examples/s]"
+       ]
+      },
+      "metadata": {},
+      "output_type": "display_data"
+     },
+     {
+      "data": {
+       "application/vnd.jupyter.widget-view+json": {
+        "model_id": "bc9327dc6ed2467597a56b4655aca9a9",
+        "version_major": 2,
+        "version_minor": 0
+       },
+       "text/plain": [
+        "Filter (num_proc=16): 0%| | 0/8552 [00:00<?, ? examples/s]"
+       ]
+      },
+      "metadata": {},
+      "output_type": "display_data"
+     }
+    ],
+    "source": [
+     "filttokds = tokds.filter(lambda x: x[\"prompt_length\"] > 1024, num_proc=16)"
+    ]
+   },
+   {
+    "cell_type": "code",
+    "execution_count": 25,
+    "id": "2b6d57f7-40b7-4417-88bc-83c63b22f153",
+    "metadata": {},
+    "outputs": [
+     {
+      "data": {
+       "text/plain": [
+        "31"
+       ]
+      },
+      "execution_count": 25,
+      "metadata": {},
+      "output_type": "execute_result"
+     }
+    ],
+    "source": [
+     "len(filttokds[\"test\"])"
+    ]
+   },
+   {
+    "cell_type": "code",
+    "execution_count": null,
+    "id": "78d391fa-9a57-446b-9007-fe64ef8fc735",
+    "metadata": {},
+    "outputs": [],
+    "source": [
+     "tokds = ds.map(lambda x: tokenizer(x['prompt']), num_proc=16)"
+    ]
+   },
+   {
+    "cell_type": "code",
+    "execution_count": 31,
+    "id": "176dbd05-67c5-45a6-b891-1237deb7d6c9",
+    "metadata": {},
+    "outputs": [],
+    "source": [
+     "ds = load_dataset(\"mnoukhov/summarize_from_feedback_tldr_3_filtered_oai_preprocessing_1706381144\")"
+    ]
+   },
+   {
+    "cell_type": "code",
+    "execution_count": 32,
+    "id": "9eab3eaa-55ed-4279-96d6-3c189266ba86",
+    "metadata": {},
+    "outputs": [
+     {
+      "data": {
+       "application/vnd.jupyter.widget-view+json": {
+        "model_id": "911df020ac294d9ca2b360e1a0be3f93",
+        "version_major": 2,
+        "version_minor": 0
+       },
+       "text/plain": [
+        "Filter: 0%| | 0/116722 [00:00<?, ? examples/s]"
+       ]
+      },
+      "metadata": {},
+      "output_type": "display_data"
+     }
+    ],
+    "source": [
+     "ds[\"train\"] = ds[\"train\"].filter(lambda x: x[\"has_comparison\"] == True)"
+    ]
+   },
+   {
+    "cell_type": "code",
+    "execution_count": 34,
+    "id": "9cc44838-af6d-4e3d-be4e-49436900f469",
+    "metadata": {},
+    "outputs": [
+     {
+      "data": {
+       "application/vnd.jupyter.widget-view+json": {
+        "model_id": "4041ceb7527a4e638b138fc12897c35e",
+        "version_major": 2,
+        "version_minor": 0
+       },
+       "text/plain": [
+        "Uploading the dataset shards: 0%| | 0/1 [00:00<?, ?it/s]"
+       ]
+      },
+      "metadata": {},
+      "output_type": "display_data"
+     },
+     {
+      "data": {
+       "application/vnd.jupyter.widget-view+json": {
+        "model_id": "51b19ac739f14b31a5820a9f077cf177",
+        "version_major": 2,
+        "version_minor": 0
+       },
+       "text/plain": [
+        "Creating parquet from Arrow format: 0%| | 0/10 [00:00<?, ?ba/s]"
+       ]
+      },
+      "metadata": {},
+      "output_type": "display_data"
+     },
+     {
+      "data": {
+       "application/vnd.jupyter.widget-view+json": {
+        "model_id": "d7e0c946845940bb8de920287d982cd6",
+        "version_major": 2,
+        "version_minor": 0
+       },
+       "text/plain": [
+        "Uploading the dataset shards: 0%| | 0/1 [00:00<?, ?it/s]"
+       ]
+      },
+      "metadata": {},
+      "output_type": "display_data"
+     },
+     {
+      "data": {
+       "application/vnd.jupyter.widget-view+json": {
+        "model_id": "acd8997214f448ca8cc665b4fd3b1af6",
+        "version_major": 2,
+        "version_minor": 0
+       },
+       "text/plain": [
+        "Creating parquet from Arrow format: 0%| | 0/7 [00:00<?, ?ba/s]"
+       ]
+      },
+      "metadata": {},
+      "output_type": "display_data"
+     },
+     {
+      "data": {
+       "application/vnd.jupyter.widget-view+json": {
+        "model_id": "cae753400174470f8c97d61fbb557202",
+        "version_major": 2,
+        "version_minor": 0
+       },
+       "text/plain": [
+        "Uploading the dataset shards: 0%| | 0/1 [00:00<?, ?it/s]"
+       ]
+      },
+      "metadata": {},
+      "output_type": "display_data"
+     },
+     {
+      "data": {
+       "application/vnd.jupyter.widget-view+json": {
+        "model_id": "2e6161edbdc24eca8fc89781a3b511e8",
+        "version_major": 2,
+        "version_minor": 0
+       },
+       "text/plain": [
+        "Creating parquet from Arrow format: 0%| | 0/7 [00:00<?, ?ba/s]"
+       ]
+      },
+      "metadata": {},
+      "output_type": "display_data"
+     },
+     {
+      "name": "stderr",
+      "output_type": "stream",
+      "text": [
+       "/home/toolkit/.local/lib/python3.11/site-packages/huggingface_hub/file_download.py:983: UserWarning: Not enough free disk space to download the file. The expected file size is: 0.00 MB. The target location /home/toolkit/huggingface/hub only has 0.00 MB free disk space.\n",
+       "  warnings.warn(\n",
+       "/home/toolkit/.local/lib/python3.11/site-packages/huggingface_hub/file_download.py:983: UserWarning: Not enough free disk space to download the file. The expected file size is: 0.00 MB. The target location /home/toolkit/huggingface/hub/datasets--mnoukhov--summarize_from_feedback_tldr_3_filtered_oai_preprocessing_1706381144_labelled/blobs only has 0.00 MB free disk space.\n",
+       "  warnings.warn(\n"
+      ]
+     },
+     {
+      "data": {
+       "application/vnd.jupyter.widget-view+json": {
+        "model_id": "d6504b2578bf42a3810c24e04da166e1",
+        "version_major": 2,
+        "version_minor": 0
+       },
+       "text/plain": [
+        "README.md: 0%| | 0.00/1.17k [00:00<?, ?B/s]"
+       ]
+      },
+      "metadata": {},
+      "output_type": "display_data"
+     },
+     {
+      "data": {
+       "text/plain": [
+        "CommitInfo(commit_url='https://huggingface.co/datasets/mnoukhov/summarize_from_feedback_tldr_3_filtered_oai_preprocessing_1706381144_labelled/commit/a873a0b902f97283fb440254b724da8257439c33', commit_message='Upload dataset', commit_description='', oid='a873a0b902f97283fb440254b724da8257439c33', pr_url=None, pr_revision=None, pr_num=None)"
+       ]
+      },
+      "execution_count": 34,
+      "metadata": {},
+      "output_type": "execute_result"
+     }
+    ],
+    "source": [
+     "ds.push_to_hub(\"mnoukhov/summarize_from_feedback_tldr_3_filtered_oai_preprocessing_1706381144_labelled\")"
+    ]
+   },
+   {
+    "cell_type": "code",
+    "execution_count": 33,
+    "id": "557d07cf-781c-4c9a-8499-2bd4d076e98d",
+    "metadata": {},
+    "outputs": [
+     {
+      "data": {
+       "text/plain": [
+        "DatasetDict({\n",
+        "    train: Dataset({\n",
+        "        features: ['id', 'subreddit', 'title', 'post', 'summary', 'query_token', 'query', 'reference_response', 'reference_response_token', 'reference_response_token_len', 'query_reference_response', 'query_reference_response_token', 'query_reference_response_token_response_label', 'query_reference_response_token_len', 'has_comparison'],\n",
+        "        num_rows: 9504\n",
+        "    })\n",
+        "    validation: Dataset({\n",
+        "        features: ['id', 'subreddit', 'title', 'post', 'summary', 'query_token', 'query', 'reference_response', 'reference_response_token', 'reference_response_token_len', 'query_reference_response', 'query_reference_response_token', 'query_reference_response_token_response_label', 'query_reference_response_token_len', 'has_comparison'],\n",
+        "        num_rows: 6447\n",
+        "    })\n",
+        "    test: Dataset({\n",
+        "        features: ['id', 'subreddit', 'title', 'post', 'summary', 'query_token', 'query', 'reference_response', 'reference_response_token', 'reference_response_token_len', 'query_reference_response', 'query_reference_response_token', 'query_reference_response_token_response_label', 'query_reference_response_token_len', 'has_comparison'],\n",
+        "        num_rows: 6553\n",
+        "    })\n",
+        "})"
+       ]
+      },
+      "execution_count": 33,
+      "metadata": {},
+      "output_type": "execute_result"
+     }
+    ],
+    "source": [
+     "ds"
+    ]
+   },
+   {
+    "cell_type": "code",
+    "execution_count": null,
+    "id": "a7ad1cea-5ac6-4d57-9aff-6783ea61fb13",
+    "metadata": {},
+    "outputs": [],
+    "source": []
+   }
+  ],
+  "metadata": {
+   "kernelspec": {
+    "display_name": "Python 3 (ipykernel)",
+    "language": "python",
+    "name": "python3"
+   },
+   "language_info": {
+    "codemirror_mode": {
+     "name": "ipython",
+     "version": 3
+    },
+    "file_extension": ".py",
+    "mimetype": "text/x-python",
+    "name": "python",
+    "nbconvert_exporter": "python",
+    "pygments_lexer": "ipython3",
+    "version": "3.11.2"
+   }
+  },
+  "nbformat": 4,
+  "nbformat_minor": 5
+ }
code/__pycache__/callbacks.cpython-311.pyc ADDED
Binary file (18.8 kB)

code/__pycache__/generate_and_eval.cpython-311.pyc ADDED
Binary file (15.5 kB)

code/__pycache__/generate_and_llm_judge.cpython-311.pyc ADDED
Binary file (17.2 kB)

code/__pycache__/generate_vllm.cpython-311.pyc ADDED
Binary file (13.5 kB)

code/__pycache__/gpt_reward_modeling.cpython-311.pyc ADDED
Binary file (24.1 kB)

code/__pycache__/scalar_rm_model.cpython-311.pyc ADDED
Binary file (13.7 kB)
 
code/callbacks.py ADDED
@@ -0,0 +1,471 @@
+ import math
+ from dataclasses import dataclass
+ from typing import Any, Dict, List, Optional, Tuple, Union
+
+ import accelerate
+ import torch
+ from datasets import Dataset
+ from torch.utils.data import DataLoader
+ from tqdm.auto import tqdm
+ from transformers import PreTrainedTokenizerBase, TrainerCallback
+
+ import wandb
+ from trl.trainer.utils import pad_to_length
+
+
+ @dataclass
+ class PromptAndTextCollator:
+     tokenizer: PreTrainedTokenizerBase
+     padding: Union[bool, str] = True
+     max_prompt_length: Optional[int] = None
+     max_length: Optional[int] = None
+     prompt_field: str = "prompt"
+     target_field: str = "label"
+     return_tensors: str = "pt"
+
+     def __call__(self, features: List[Dict[str, Any]]) -> Dict[str, Any]:
+         prompts = [feat[self.prompt_field] for feat in features]
+         texts = [feat[self.prompt_field] + " " + feat[self.target_field] for feat in features]
+
+         original_side = self.tokenizer.padding_side
+         self.tokenizer.padding_side = "left"
+
+         tokenized_batch = self.tokenizer(
+             prompts,
+             truncation=True,
+             padding=True,
+             max_length=self.max_prompt_length,
+             return_tensors=self.return_tensors,
+         )
+         tokenized_batch["prompt"] = prompts
+
+         self.tokenizer.padding_side = original_side
+
+         tokenized_texts = self.tokenizer(
+             texts,
+             truncation=True,
+             padding=True,
+             max_length=self.max_length,
+             return_tensors=self.return_tensors,
+         )
+
+         text_labels = tokenized_texts["input_ids"].clone()
+         if self.tokenizer.pad_token_id is not None:
+             text_labels[text_labels == self.tokenizer.pad_token_id] = -100
+
+         tokenized_batch.update(
+             {
+                 "text_input_ids": tokenized_texts["input_ids"],
+                 "text_attention_mask": tokenized_texts["attention_mask"],
+                 "text_labels": text_labels,
+             }
+         )
+
+         return tokenized_batch
+
+
+ class GoldModelRewardCallback(TrainerCallback):
+     def __init__(
+         self,
+         args,
+         gold_model,
+         gold_eval_dataset,
+         tokenizer,
+         accelerator,
+         max_length,
+         max_prompt_length,
+         prompt_field,
+         target_field,
+         gold_load_and_unload=False,
+         log_n_samples_during_eval=0,
+         generation_config=None,
+     ):
+         self.max_length = max_length
+         self.log_n_samples_during_eval = log_n_samples_during_eval
+         self.generation_config = generation_config
+
+         # data_collator = DataCollatorWithPadding(tokenizer)
+         data_collator = PromptAndTextCollator(
+             tokenizer,
+             max_prompt_length=max_prompt_length,
+             max_length=max_length,
+             prompt_field=prompt_field,
+             target_field=target_field,
+         )
+         dataloader_params = {
+             "batch_size": args.eval_batch_size,
+             "collate_fn": data_collator,
+             "num_workers": args.dataloader_num_workers,
+             "pin_memory": args.dataloader_pin_memory,
+         }
+         dataloader = DataLoader(gold_eval_dataset, **dataloader_params)
+         self.dataloader = accelerator.prepare(dataloader)
+         self.accelerator = accelerator
+         self.completed_step = -1
+         self.gold_model = gold_model
+         self.gold_load_and_unload = gold_load_and_unload
+         # keep model on gpu the whole time
+         if not self.gold_load_and_unload:
+             self.gold_model = self.accelerator.prepare(self.gold_model)
+
+     def on_evaluate(self, args, state, control, model, tokenizer, metrics, **kwargs):
+         # skip steps that were already evaluated, before the gold model is loaded
+         if state.global_step == self.completed_step:
+             return
+
+         samples_to_log = []
+         gold_reward_sum = 0.0
+         nll_sum = 0.0
+         nll_count = 0
+         total_samples = 0
+         sample_length_sum = 0.0
+
+         # load model onto gpu for inference then unload
+         if self.gold_load_and_unload:
+             self.gold_model = self.accelerator.prepare(self.gold_model)
+
+         for inputs in tqdm(
+             self.dataloader, desc="Gold Eval", dynamic_ncols=True, disable=not state.is_local_process_zero
+         ):
+             # get loss over true continuation i.e. ppl on dataset
+             with torch.no_grad():
+                 nll_loss = model(
+                     input_ids=inputs["text_input_ids"],
+                     attention_mask=inputs["text_attention_mask"],
+                     labels=inputs["text_labels"],
+                 ).loss
+
+             nll_loss = self.accelerator.gather_for_metrics(nll_loss)
+
+             # generate from model
+             policy_output_decoded, ref_output_decoded, policy_output_ids = self.get_batch_samples(
+                 model,
+                 tokenizer,
+                 inputs["input_ids"],
+                 inputs["attention_mask"],
+                 return_ids=True,
+             )
+
+             # gold reward
+             policy_output_attention_mask = (policy_output_ids != tokenizer.pad_token_id).to(torch.int64)
+             with torch.no_grad():
+                 gold_rewards = self.gold_model(
+                     input_ids=policy_output_ids, attention_mask=policy_output_attention_mask
+                 )[0]
+
+             gold_rewards = self.accelerator.gather_for_metrics(gold_rewards)
+
+             if state.is_local_process_zero:
+                 nll_sum += nll_loss.sum().item()
+                 # nll_loss holds one batch-mean loss per process, not one value per sample
+                 nll_count += nll_loss.numel()
+                 gold_reward_sum += gold_rewards.sum().item()
+                 total_samples += gold_rewards.size(0)
+                 sample_length_sum += policy_output_attention_mask.sum().item()
+
+                 # Sample and save to game log if requested (for one batch to save time)
+                 for i, (prompt, pol, ref) in enumerate(
+                     zip(inputs["prompt"], policy_output_decoded, ref_output_decoded)
+                 ):
+                     if len(samples_to_log) < self.log_n_samples_during_eval:
+                         samples_to_log.append([prompt, pol[len(prompt) :], ref[len(prompt) :]])
+                     else:
+                         break
+
+         if self.gold_load_and_unload:
+             self.gold_model = self.gold_model.to("cpu")
+             torch.cuda.empty_cache()
+
+         if state.is_world_process_zero:
+             gold_log = {
+                 "eval/gold_rewards_mean": gold_reward_sum / total_samples,
+                 # average the batch-mean losses over the number of losses collected
+                 "eval/perplexity": math.exp(nll_sum / nll_count),
+                 "eval/gold_sample_length": sample_length_sum / total_samples,
+             }
+             for key, value in gold_log.items():
+                 print(f"{key}: {value}")
+             if state.epoch:
+                 gold_log["epoch"] = round(state.epoch, 2)
+             gold_log["step"] = state.global_step
+             if samples_to_log:
+                 # log the table itself; a trailing comma here would wrap it in a tuple
+                 gold_log["gold_log"] = wandb.Table(
+                     columns=["Prompt", "Policy", "Ref Model"],
+                     rows=samples_to_log,
+                 )
+             wandb.log(gold_log)
+
+         self.completed_step = state.global_step
+
+     def get_batch_samples(self, model, tokenizer, input_ids, attention_mask, return_ids=False) -> Tuple[str, str]:
+         """Reduce inputs to unseen prompts, and maximum batch size if necessary
+         Generate samples from the model and reference model for the given batch of inputs."""
+         policy_output = model.generate(
+             input_ids=input_ids,
+             attention_mask=attention_mask,
+             generation_config=self.generation_config,
+         )
+
+         # if self.ref_model is None:
+         with self.accelerator.unwrap_model(model).disable_adapter():
+             reference_output = model.generate(
+                 input_ids=input_ids,
+                 attention_mask=attention_mask,
+                 generation_config=self.generation_config,
+             )
+         # else:
+         #     reference_output = self.ref_model.generate(
+         #         **inputs,
+         #         generation_config=self.generation_config,
+         #     )
+
+         policy_output = pad_to_length(policy_output, self.max_length, tokenizer.pad_token_id)
+         policy_output_decoded = tokenizer.batch_decode(policy_output, skip_special_tokens=True)
+
+         reference_output = pad_to_length(reference_output, self.max_length, tokenizer.pad_token_id)
+         reference_output_decoded = tokenizer.batch_decode(reference_output, skip_special_tokens=True)
+
+         if return_ids:
+             return policy_output_decoded, reference_output_decoded, policy_output
+         else:
+             return policy_output_decoded, reference_output_decoded
+
+
+ class PerplexityCallback(TrainerCallback):
+ """Like GoldModelReward, but only computes ppl on the dataset
+
+ No samples are generated and no eval is run with the gold model
+ Useful when the gold model is very large and you want to run inference later
+ """
+
+ def __init__(
+ self,
+ args,
+ dataset,
+ tokenizer,
+ accelerator,
+ max_length,
+ max_prompt_length,
+ prompt_field,
+ target_field,
+ hub_model_id=None,
+ **kwargs,
+ ):
+ self.max_length = max_length
+
+ # data_collator = DataCollatorWithPadding(tokenizer)
+ data_collator = PromptAndTextCollator(
+ tokenizer,
+ max_prompt_length=max_prompt_length,
+ max_length=max_length,
+ prompt_field=prompt_field,
+ target_field=target_field,
+ )
+ dataloader_params = {
+ "batch_size": args.eval_batch_size,
+ "collate_fn": data_collator,
+ "num_workers": args.dataloader_num_workers,
+ "pin_memory": args.dataloader_pin_memory,
+ }
+ dataloader = DataLoader(dataset, **dataloader_params)
+ self.dataloader = accelerator.prepare(dataloader)
+ self.accelerator = accelerator
+ self.completed_step = -1
+ self.hub_model_id = hub_model_id
+
+ def on_evaluate(self, args, state, control, model, tokenizer, metrics, **kwargs):
+ nll_sum = 0.0
+ total_samples = 0
+
+ if state.global_step == self.completed_step:
+ return
+
+ for inputs in tqdm(
+ self.dataloader, desc="PPL and Gen Eval", dynamic_ncols=True, disable=not state.is_local_process_zero
+ ):
+ # get loss over true continuation i.e. ppl on dataset
+ with torch.no_grad():
+ nll_loss = model(
+ input_ids=inputs["text_input_ids"],
+ attention_mask=inputs["text_attention_mask"],
+ labels=inputs["text_labels"],
+ ).loss
+
+ nll_loss = self.accelerator.gather_for_metrics(nll_loss)
+
+ if state.is_local_process_zero:
+ total_samples += nll_loss.size(0)
+ nll_sum += nll_loss.sum().item()
+
+ if state.is_world_process_zero:
+ # gather_for_metrics doesn't work for list of strings?
+ gold_log = {
+ "eval/perplexity": math.exp(nll_sum / total_samples),
+ }
+ for key, value in gold_log.items():
+ print(f"{key}: {value}")
+ if state.epoch:
+ gold_log["epoch"] = round(state.epoch, 2)
+ gold_log["step"] = state.global_step
+
+ wandb.log(gold_log)
+
+ if self.hub_model_id is not None:
+ model.push_to_hub(self.hub_model_id, revision=f"step{state.global_step}")
+
+ self.completed_step = state.global_step
+
+
+ class PerplexityGenCallback(TrainerCallback):
+ """Like GoldModelReward in that you generate and get ppl on the dataset
+
+ But you don't run eval with the gold model
+ Useful when the gold model is very large and you want to run inference later
+ """
+
+ def __init__(
+ self,
+ args,
+ dataset,
+ tokenizer,
+ accelerator,
+ max_length,
+ max_prompt_length,
+ prompt_field,
+ target_field,
+ log_n_samples_during_eval=0,
+ generation_config=None,
+ hub_model_id="tmp",
+ ):
+ self.max_length = max_length
+ self.log_n_samples_during_eval = log_n_samples_during_eval
+ self.generation_config = generation_config
+
+ # data_collator = DataCollatorWithPadding(tokenizer)
+ data_collator = PromptAndTextCollator(
+ tokenizer,
+ max_prompt_length=max_prompt_length,
+ max_length=max_length,
+ prompt_field=prompt_field,
+ target_field=target_field,
+ )
+ dataloader_params = {
+ "batch_size": args.eval_batch_size,
+ "collate_fn": data_collator,
+ "num_workers": args.dataloader_num_workers,
+ "pin_memory": args.dataloader_pin_memory,
+ }
+ dataloader = DataLoader(dataset, **dataloader_params)
+ self.dataloader = accelerator.prepare(dataloader)
+ self.accelerator = accelerator
+ self.completed_step = -1
+ self.hub_name = hub_model_id
+
+ def on_evaluate(self, args, state, control, model, tokenizer, metrics, **kwargs):
+ all_generations = []
+ all_prompts = []
+ nll_sum = 0.0
+ total_samples = 0
+ sample_length_sum = 0.0
+
+ if state.global_step == self.completed_step:
+ return
+
+ for inputs in tqdm(
+ self.dataloader, desc="PPL and Gen Eval", dynamic_ncols=True, disable=not state.is_local_process_zero
+ ):
+ # get loss over true continuation i.e. ppl on dataset
+ with torch.no_grad():
+ nll_loss = model(
+ input_ids=inputs["text_input_ids"],
+ attention_mask=inputs["text_attention_mask"],
+ labels=inputs["text_labels"],
+ ).loss
+
+ # generate from model
+ policy_output_ids = model.generate(
+ input_ids=inputs["input_ids"],
+ attention_mask=inputs["attention_mask"],
+ generation_config=self.generation_config,
+ )
+ policy_output_ids = pad_to_length(policy_output_ids, self.max_length, tokenizer.pad_token_id)
+
+ policy_output_attention_mask = (policy_output_ids != tokenizer.pad_token_id).to(torch.int64)
+ generation_sizes = policy_output_attention_mask.sum(dim=1)
+
+ (nll_loss, generation_ids, generation_sizes) = self.accelerator.gather_for_metrics(
+ (nll_loss, policy_output_ids, generation_sizes)
+ )
+
+ prompts = accelerate.utils.gather_object(inputs["prompt"])
+
+ if state.is_local_process_zero:
+ nll_sum += nll_loss.sum().item()
+ total_samples += generation_sizes.size(0)
+ sample_length_sum += generation_sizes.sum().item()
+ generation_strs = tokenizer.batch_decode(generation_ids, skip_special_tokens=True)
+ all_prompts.extend(prompts)
+ all_generations.extend(generation_strs)
+
+ if state.is_world_process_zero:
+ # gather_for_metrics doesn't work for list of strings?
+ gold_log = {
+ "eval/perplexity": math.exp(nll_sum / total_samples),
+ "eval/gold_sample_length": sample_length_sum / total_samples,
+ }
+ for key, value in gold_log.items():
+ print(f"{key}: {value}")
+ if state.epoch:
+ gold_log["epoch"] = round(state.epoch, 2)
+ gold_log["step"] = state.global_step
+
+ if self.log_n_samples_during_eval:
+ samples_to_log = [
+ [prompt, generation[len(prompt) :]]
+ for prompt, generation in zip(
+ all_prompts[: self.log_n_samples_during_eval],
+ all_generations[: self.log_n_samples_during_eval],
+ )
+ ]
+ gold_log["gold_log"] = wandb.Table(
+ columns=["Prompt", "Policy"],
+ rows=samples_to_log,
+ )
+
+ wandb.log(gold_log)
+ generation_ds = Dataset.from_dict({"generations": all_generations})
+ generation_ds.push_to_hub(f"{self.hub_name}_generations", revision=str(state.global_step))
+
+ self.completed_step = state.global_step
+
+ def get_batch_samples(self, model, tokenizer, input_ids, attention_mask, return_ids=False) -> Tuple[str, str]:
+ """Generate samples from the policy and from the reference model
+ (the same model with its adapter disabled) for the given batch of inputs."""
+ policy_output = model.generate(
+ input_ids=input_ids,
+ attention_mask=attention_mask,
+ generation_config=self.generation_config,
+ )
+
+ # if self.ref_model is None:
+ with self.accelerator.unwrap_model(model).disable_adapter():
+ reference_output = model.generate(
+ input_ids=input_ids,
+ attention_mask=attention_mask,
+ generation_config=self.generation_config,
+ )
+ # else:
+ # reference_output = self.ref_model.generate(
+ # **inputs,
+ # generation_config=self.generation_config,
+ # )
+
+ policy_output = pad_to_length(policy_output, self.max_length, tokenizer.pad_token_id)
+ policy_output_decoded = tokenizer.batch_decode(policy_output, skip_special_tokens=True)
+
+ reference_output = pad_to_length(reference_output, self.max_length, tokenizer.pad_token_id)
+ reference_output_decoded = tokenizer.batch_decode(reference_output, skip_special_tokens=True)
+
+ if return_ids:
+ return policy_output_decoded, reference_output_decoded, policy_output
+ else:
+ return policy_output_decoded, reference_output_decoded
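The callbacks above report `eval/perplexity` as `math.exp(nll_sum / total_samples)`. A minimal standalone sketch of that reduction follows, assuming each accumulated value is a mean token-level negative log-likelihood for one batch (as a causal LM `.loss` usually is). The helper name `perplexity_from_losses` is hypothetical and not part of `callbacks.py`.

```python
import math


def perplexity_from_losses(batch_losses):
    """Return exp(mean NLL) for a list of per-batch mean NLL values."""
    if not batch_losses:
        raise ValueError("need at least one loss value")
    nll_sum = 0.0
    count = 0
    for loss in batch_losses:
        nll_sum += loss
        # Count the same unit that is being summed (one entry per loss value).
        count += 1
    return math.exp(nll_sum / count)


ppl = perplexity_from_losses([2.0, 2.5, 3.0])
print(f"eval/perplexity: {ppl:.3f}")
```

The denominator has to count the same unit as the numerator: if one loss value is summed per batch, dividing by the number of samples instead would shrink the exponent and understate perplexity.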
code/configs/accelerate_zero2_4gpu.yml ADDED
@@ -0,0 +1,20 @@
+ compute_environment: LOCAL_MACHINE
+ debug: false
+ deepspeed_config:
+   offload_optimizer_device: none
+   offload_param_device: none
+   zero3_init_flag: false
+   zero_stage: 2
+ distributed_type: DEEPSPEED
+ downcast_bf16: 'no'
+ machine_rank: 0
+ main_training_function: main
+ mixed_precision: 'no'
+ num_machines: 1
+ num_processes: 4
+ rdzv_backend: static
+ same_network: true
+ tpu_env: []
+ tpu_use_cluster: false
+ tpu_use_sudo: false
+ use_cpu: false
code/configs/create_rlhf_410m.yml ADDED
@@ -0,0 +1,11 @@
+ output_dir: /home/toolkit/huggingface/openai_summarize_tldr_rbaseline
+ train_split: train
+ eval_split: valid[:2000]
+ ###
+ model_name: mnoukhov/pythia410m-tldr-sft-rm-adapter
+ new_column_name: reward_baseline
+ dataset_name: CarperAI/openai_summarize_tldr
+ load_in_8bit: False
+ fp16: True
+ batch_size: 32
+ max_length: 560
code/configs/create_rlhf_410m_1b.yml ADDED
@@ -0,0 +1,11 @@
+ output_dir: /home/toolkit/huggingface/openai_summarize_tldr_grbaseline
+ train_split: train
+ eval_split: valid[:2000]
+ ###
+ model_name: mnoukhov/pythia1b-sft-rm-tldrprompt
+ new_column_name: gold_reward_baseline
+ dataset_name: mnoukhov/openai_summarize_tldr_rbaseline
+ load_in_8bit: False
+ fp16: True
+ batch_size: 32
+ max_length: 560
code/configs/dpo1b2_10k_pythia410m_fp16.yml ADDED
@@ -0,0 +1,19 @@
+ model_name: mnoukhov/pythia410m-tldr-sft
+ dataset_name: mnoukhov/openai_summarize_comparisons_tldrprompt_relabel1b
+ gold_model_name: mnoukhov/pythia1b-sft-rm-tldrprompt
+ pseudo_dataset_name: mnoukhov/openai_summarize_generated_10k
+ beta: 0.5
+ num_train_epochs: 5
+ eval_steps: 750
+ load_in_8bit: False
+ bf16: False
+ fp16: True
+ learning_rate: 1e-5
+ use_peft: True
+ lora_all_linear: True
+ lora_r: 8
+ lora_alpha: 32
+ lora_dropout: 0.05
+ gradient_accumulation_steps: 4
+ per_device_train_batch_size: 4
+ warmup_steps: 150
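The config above sets `per_device_train_batch_size: 4` and `gradient_accumulation_steps: 4`. A rough sketch of the resulting effective batch size per optimizer step follows. It assumes the run is launched with 4 processes, as in `accelerate_zero2_4gpu.yml`; the diff does not say which launcher config is paired with which run. The function name is hypothetical.

```python
def effective_batch_size(per_device_train_batch_size, gradient_accumulation_steps, num_processes):
    """Number of training examples consumed per optimizer step."""
    return per_device_train_batch_size * gradient_accumulation_steps * num_processes


# num_processes=4 is an assumption taken from accelerate_zero2_4gpu.yml.
print(effective_batch_size(4, 4, 4))  # 64
```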
code/configs/dpo1b2_20k-reuse_pythia410m_fp16.yml ADDED
@@ -0,0 +1,19 @@
+ model_name: mnoukhov/pythia410m-tldr-sft
+ dataset_name: mnoukhov/openai_summarize_comparisons_tldrprompt_relabel1b
+ gold_model_name: mnoukhov/pythia1b-sft-rm-tldrprompt
+ pseudo_dataset_name: mnoukhov/openai_comparisons_20k_regen_and_relabelled
+ beta: 0.5
+ max_steps: 10000
+ eval_steps: 1000
+ load_in_8bit: False
+ bf16: False
+ fp16: True
+ learning_rate: 1e-5
+ use_peft: True
+ lora_all_linear: True
+ lora_r: 8
+ lora_alpha: 32
+ lora_dropout: 0.05
+ gradient_accumulation_steps: 4
+ per_device_train_batch_size: 4
+ warmup_steps: 150
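The LoRA settings above (`lora_r: 8`, `lora_alpha: 32`) determine how strongly the low-rank update is scaled. A small sketch under the standard LoRA convention that the update is multiplied by `alpha / r` is given below. That `lora_r` and `lora_alpha` map directly onto the adapter's rank and alpha, and that no rank-stabilized variant is in use, are assumptions; the helper name is hypothetical.

```python
def lora_scaling(lora_alpha, lora_r):
    """Standard LoRA scaling factor applied to the low-rank update B @ A."""
    if lora_r <= 0:
        raise ValueError("lora_r must be positive")
    return lora_alpha / lora_r


print(lora_scaling(32, 8))   # 4.0
print(lora_scaling(32, 16))  # 2.0
```

Under this convention, the configs in this commit that use `lora_r: 16` with the same `lora_alpha: 32` apply half the scaling of the `lora_r: 8` ones.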
code/configs/dpo1b2_20k_pythia410m-iter1_fp16.yml ADDED
@@ -0,0 +1,19 @@
+ model_name: mnoukhov/pythia410m-tldrprompt-dpo1b
+ dataset_name: mnoukhov/openai_summarize_comparisons_tldrprompt_relabel1b
+ gold_model_name: mnoukhov/pythia1b-sft-rm-tldrprompt
+ pseudo_dataset_name: mnoukhov/openai_summarize_generated_20k_relabel_410m_dpo1
+ beta: 0.5
+ max_steps: 10000
+ eval_steps: 1000
+ load_in_8bit: False
+ bf16: False
+ fp16: True
+ learning_rate: 1e-5
+ use_peft: True
+ lora_all_linear: True
+ lora_r: 8
+ lora_alpha: 32
+ lora_dropout: 0.05
+ gradient_accumulation_steps: 4
+ per_device_train_batch_size: 4
+ warmup_steps: 150
code/configs/dpo1b2_20k_pythia410m_fp16.yml ADDED
@@ -0,0 +1,19 @@
+ model_name: mnoukhov/pythia410m-tldr-sft
+ dataset_name: mnoukhov/openai_summarize_comparisons_tldrprompt_relabel1b
+ gold_model_name: mnoukhov/pythia1b-sft-rm-tldrprompt
+ pseudo_dataset_name: mnoukhov/openai_summarize_generated_20k_relabel_410m_dpo1
+ beta: 0.5
+ max_steps: 10000
+ eval_steps: 1000
+ load_in_8bit: False
+ bf16: False
+ fp16: True
+ learning_rate: 1e-5
+ use_peft: True
+ lora_all_linear: True
+ lora_r: 8
+ lora_alpha: 32
+ lora_dropout: 0.05
+ gradient_accumulation_steps: 4
+ per_device_train_batch_size: 4
+ warmup_steps: 150
code/configs/dpo1b2_20kgold_pythia410m-iter1_fp16.yml ADDED
@@ -0,0 +1,19 @@
+ model_name: mnoukhov/pythia410m-tldrprompt-dpo1b
+ dataset_name: mnoukhov/openai_summarize_comparisons_tldrprompt_relabel1b
+ gold_model_name: mnoukhov/pythia1b-sft-rm-tldrprompt
+ pseudo_dataset_name: mnoukhov/openai_summarize_generated_20k_relabel_1b
+ beta: 0.5
+ max_steps: 10000
+ eval_steps: 1000
+ load_in_8bit: False
+ bf16: False
+ fp16: True
+ learning_rate: 1e-5
+ use_peft: True
+ lora_all_linear: True
+ lora_r: 8
+ lora_alpha: 32
+ lora_dropout: 0.05
+ gradient_accumulation_steps: 4
+ per_device_train_batch_size: 4
+ warmup_steps: 150
code/configs/dpo1b2_20kgold_pythia410m_fp16.yml ADDED
@@ -0,0 +1,19 @@
+ model_name: mnoukhov/pythia410m-tldr-sft
+ dataset_name: mnoukhov/openai_summarize_comparisons_tldrprompt_relabel1b
+ gold_model_name: mnoukhov/pythia1b-sft-rm-tldrprompt
+ pseudo_dataset_name: mnoukhov/openai_summarize_generated_20k_relabel_1b
+ beta: 0.5
+ max_steps: 10000
+ eval_steps: 1000
+ load_in_8bit: False
+ bf16: False
+ fp16: True
+ learning_rate: 1e-5
+ use_peft: True
+ lora_all_linear: True
+ lora_r: 8
+ lora_alpha: 32
+ lora_dropout: 0.05
+ gradient_accumulation_steps: 4
+ per_device_train_batch_size: 4
+ warmup_steps: 150
code/configs/dpo1b2_20kgoldonly_pythia410m-iter1_fp16.yml ADDED
@@ -0,0 +1,20 @@
+ model_name: mnoukhov/pythia410m-tldrprompt-dpo1b
+ dataset_name: mnoukhov/openai_summarize_comparisons_tldrprompt_relabel1b
+ train_split: train[:1]
+ gold_model_name: mnoukhov/pythia1b-sft-rm-tldrprompt
+ pseudo_dataset_name: mnoukhov/openai_summarize_generated_20k_relabel_1b
+ beta: 0.5
+ max_steps: 10000
+ eval_steps: 1000
+ load_in_8bit: False
+ bf16: False
+ fp16: True
+ learning_rate: 1e-5
+ use_peft: True
+ lora_all_linear: True
+ lora_r: 8
+ lora_alpha: 32
+ lora_dropout: 0.05
+ gradient_accumulation_steps: 4
+ per_device_train_batch_size: 4
+ warmup_steps: 150
code/configs/dpo1b2_20kgoldonly_pythia410m_fp16.yml ADDED
@@ -0,0 +1,20 @@
+ model_name: mnoukhov/pythia410m-tldr-sft
+ dataset_name: mnoukhov/openai_summarize_comparisons_tldrprompt_relabel1b
+ train_split: train[:1]
+ gold_model_name: mnoukhov/pythia1b-sft-rm-tldrprompt
+ pseudo_dataset_name: mnoukhov/openai_summarize_generated_20k_relabel_1b
+ beta: 0.5
+ max_steps: 10000
+ eval_steps: 1000
+ load_in_8bit: False
+ bf16: False
+ fp16: True
+ learning_rate: 1e-5
+ use_peft: True
+ lora_all_linear: True
+ lora_r: 8
+ lora_alpha: 32
+ lora_dropout: 0.05
+ gradient_accumulation_steps: 4
+ per_device_train_batch_size: 4
+ warmup_steps: 150
code/configs/dpo1b2_20konly-reuse_pythia410m_fp16.yml ADDED
@@ -0,0 +1,20 @@
+ model_name: mnoukhov/pythia410m-tldr-sft
+ dataset_name: mnoukhov/openai_summarize_comparisons_tldrprompt_relabel1b
+ train_split: train[:1]
+ gold_model_name: mnoukhov/pythia1b-sft-rm-tldrprompt
+ pseudo_dataset_name: mnoukhov/openai_comparisons_20k_regen_and_relabelled
+ beta: 0.5
+ max_steps: 10000
+ eval_steps: 1000
+ load_in_8bit: False
+ bf16: False
+ fp16: True
+ learning_rate: 1e-5
+ use_peft: True
+ lora_all_linear: True
+ lora_r: 8
+ lora_alpha: 32
+ lora_dropout: 0.05
+ gradient_accumulation_steps: 4
+ per_device_train_batch_size: 4
+ warmup_steps: 150
code/configs/dpo1b2_20konly_pythia410m-iter1_fp16.yml ADDED
@@ -0,0 +1,20 @@
+ model_name: mnoukhov/pythia410m-tldrprompt-dpo1b
+ dataset_name: mnoukhov/openai_summarize_comparisons_tldrprompt_relabel1b
+ train_split: train[:1]
+ gold_model_name: mnoukhov/pythia1b-sft-rm-tldrprompt
+ pseudo_dataset_name: mnoukhov/openai_summarize_generated_20k_relabel_410m_dpo1
+ beta: 0.5
+ max_steps: 10000
+ eval_steps: 1000
+ load_in_8bit: False
+ bf16: False
+ fp16: True
+ learning_rate: 1e-5
+ use_peft: True
+ lora_all_linear: True
+ lora_r: 8
+ lora_alpha: 32
+ lora_dropout: 0.05
+ gradient_accumulation_steps: 4
+ per_device_train_batch_size: 4
+ warmup_steps: 150
code/configs/dpo1b2_20konly_pythia410m_fp16.yml ADDED
@@ -0,0 +1,20 @@
+ model_name: mnoukhov/pythia410m-tldr-sft
+ dataset_name: mnoukhov/openai_summarize_comparisons_tldrprompt_relabel1b
+ train_split: train[:1]
+ gold_model_name: mnoukhov/pythia1b-sft-rm-tldrprompt
+ pseudo_dataset_name: mnoukhov/openai_summarize_generated_20k_relabel_410m_dpo1
+ beta: 0.5
+ max_steps: 10000
+ eval_steps: 1000
+ load_in_8bit: False
+ bf16: False
+ fp16: True
+ learning_rate: 1e-5
+ use_peft: True
+ lora_all_linear: True
+ lora_r: 8
+ lora_alpha: 32
+ lora_dropout: 0.05
+ gradient_accumulation_steps: 4
+ per_device_train_batch_size: 4
+ warmup_steps: 150
code/configs/dpo1b2_a100.yml ADDED
@@ -0,0 +1,20 @@
+ model_name: mnoukhov/pythia410m-tldr-sft
+ dataset_name: mnoukhov/openai_summarize_comparisons_tldrprompt_relabel1b
+ train_split: train[:1]
+ gold_model_name: mnoukhov/pythia1b-sft-rm-tldrprompt
+ pseudo_dataset_name: mnoukhov/openai_summarize_generated_20k_relabel_410m_dpo1
+ beta: 0.5
+ max_steps: 10000
+ eval_steps: 1000
+ load_in_8bit: False
+ bf16: True
+ fp16: False
+ learning_rate: 1e-5
+ use_peft: True
+ lora_all_linear: True
+ lora_r: 8
+ lora_alpha: 32
+ lora_dropout: 0.05
+ gradient_accumulation_steps: 4
+ per_device_train_batch_size: 16
+ warmup_steps: 150
code/configs/dpo1b_eval_generated_pythia410m_fp16.yml ADDED
@@ -0,0 +1,11 @@
+ model_name: mnoukhov/pythia410m-tldrprompt-dpo1b-adapter
+ dataset_name: mnoukhov/openai_comparisons_20k_regen_and_relabelled
+ eval_split: train
+ use_peft: False
+ beta: 0.5
+ load_in_8bit: False
+ bf16: False
+ fp16: True
+ per_device_eval_batch_size: 8
+ warmup_steps: 150
+ mode: eval
code/configs/dpo1b_eval_pythia410m_fp16.yml ADDED
@@ -0,0 +1,19 @@
+ model_name: mnoukhov/pythia410m-tldr-sft
+ dataset_name: mnoukhov/openai_summarize_comparisons_tldrprompt_relabel1b
+ gold_model_name: mnoukhov/pythia1b-sft-rm-tldrprompt
+ beta: 0.5
+ num_train_epochs: 5
+ eval_steps: 750
+ load_in_8bit: False
+ bf16: False
+ fp16: True
+ learning_rate: 1e-5
+ use_peft: True
+ lora_all_linear: True
+ lora_r: 8
+ lora_alpha: 32
+ lora_dropout: 0.05
+ gradient_accumulation_steps: 4
+ per_device_train_batch_size: 4
+ warmup_steps: 150
+ just_eval: True
code/configs/dpo1b_eval_regenerated_pythia410m_fp16.yml ADDED
@@ -0,0 +1,11 @@
+ model_name: mnoukhov/pythia410m-tldrprompt-dpo1b-adapter
+ dataset_name: arianhosseini/openai_comparisons_20k_regen_and_relabelled
+ eval_split: train
+ use_peft: False
+ beta: 0.5
+ load_in_8bit: False
+ bf16: False
+ fp16: True
+ per_device_eval_batch_size: 8
+ warmup_steps: 150
+ mode: eval
code/configs/dpo1b_predict_generated_pythia410m-dpo1.yml ADDED
@@ -0,0 +1,11 @@
+ output_dir: /home/toolkit/huggingface/openai_summarize_generated_20k_relabel_1b_predict_410m-dpo1
+ mode: predict
+ model_name: mnoukhov/pythia410m-tldrprompt-dpo1b-adapter
+ dataset_name: mnoukhov/openai_summarize_generated_20k_relabel_1b_margin
+ eval_split: train
+ use_peft: False
+ beta: 0.5
+ load_in_8bit: False
+ bf16: False
+ fp16: True
+ per_device_eval_batch_size: 8
code/configs/dpo1b_pythia410m_costa_fp16.yml ADDED
@@ -0,0 +1,28 @@
+ model_name: mnoukhov/pythia410m-tldr-sft
+ dataset_name: mnoukhov/openai_summarize_comparisons_tldrprompt_relabel1b
+ # gold_model_name: mnoukhov/pythia1b-sft-rm-tldrprompt
+ max_prompt_length: 512
+ max_target_length: 131
+ max_length: 640
+ lr_scheduler_type: cosine
+ ## hub stuff
+ push_to_hub: True
+ push_to_hub_organization: mnoukhov
+ ## training stuff
+ gold_eval: ppl
+ eval_steps: 0.2
+ save_steps: 0.2
+ beta: 0.05
+ max_steps: -1
+ num_train_epochs: 1
+ load_in_8bit: False
+ bf16: False
+ fp16: True
+ learning_rate: 1e-5
+ use_peft: True
+ lora_r: 16
+ lora_alpha: 32
+ lora_dropout: 0.
+ gradient_accumulation_steps: 4
+ per_device_train_batch_size: 4
+ per_device_eval_batch_size: 4
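This config uses fractional values (`eval_steps: 0.2`, `save_steps: 0.2`) together with `max_steps: -1`. Assuming these keys are forwarded to Hugging Face `TrainingArguments`, a value below 1 is interpreted as a ratio of the total number of training steps. The sketch below illustrates that conversion. The helper name is hypothetical, and rounding up with `math.ceil` is an assumption about how the ratio is resolved.

```python
import math


def resolve_interval(steps, total_training_steps):
    """Turn an eval/save interval into an absolute number of optimizer steps."""
    if steps < 1:
        # Fractional values are treated as a ratio of the full run.
        return math.ceil(total_training_steps * steps)
    return int(steps)


# With a hypothetical run of 1000 optimizer steps, 0.2 means every 200 steps.
print(resolve_interval(0.2, 1000))  # 200
```

With `eval_steps: 0.2` a one-epoch run is evaluated and checkpointed about five times, whatever the dataset size.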
code/configs/dpo1b_pythia410m_fp16.yml ADDED
@@ -0,0 +1,28 @@
+ model_name: mnoukhov/pythia410m-tldr-sft
+ dataset_name: mnoukhov/openai_summarize_comparisons_tldrprompt_relabel1b
+ gold_model_name: mnoukhov/pythia1b-sft-rm-tldrprompt
+ max_prompt_length: 512
+ max_target_length: 131
+ max_length: 640
+ lr_scheduler_type: cosine
+ ## hub stuff
+ push_to_hub: True
+ push_to_hub_organization: mnoukhov
+ ## training stuff
+ gold_eval: full
+ eval_steps: 0.2
+ save_steps: 0.2
+ beta: 0.05
+ max_steps: -1
+ num_train_epochs: 1
+ load_in_8bit: False
+ bf16: False
+ fp16: True
+ learning_rate: 1e-5
+ use_peft: True
+ lora_r: 16
+ lora_alpha: 32
+ lora_dropout: 0.
+ gradient_accumulation_steps: 4
+ per_device_train_batch_size: 4
+ per_device_eval_batch_size: 4
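The `beta` key in these configs (0.05 here, 0.5 in the `dpo1b2_*` files) is the DPO temperature. A minimal sketch of the standard sigmoid DPO loss for a single preference pair shows how `beta` scales the implicit reward margin. The function and argument names are hypothetical, and the training script in this commit may compute the loss differently, for example through a library trainer.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp, ref_chosen_logp, ref_rejected_logp, beta):
    """-log(sigmoid(beta * margin)) for one (chosen, rejected) pair."""
    margin = (policy_chosen_logp - ref_chosen_logp) - (policy_rejected_logp - ref_rejected_logp)
    z = beta * margin
    # Numerically stable form of -log(sigmoid(z)) = log(1 + exp(-z)).
    if z >= 0:
        return math.log1p(math.exp(-z))
    return -z + math.log1p(math.exp(z))


# Same log-probabilities, two beta values from the configs in this commit.
small_beta = dpo_loss(-10.0, -12.0, -11.0, -11.0, beta=0.05)
large_beta = dpo_loss(-10.0, -12.0, -11.0, -11.0, beta=0.5)
print(small_beta, large_beta)
```

For the same positive margin, a larger `beta` drives the loss down faster, so the policy is rewarded more strongly for separating chosen from rejected completions relative to the reference model.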
code/configs/dpo1b_relabel_comparisons.yml ADDED
@@ -0,0 +1,12 @@
+ output_dir: /home/toolkit/huggingface/openai_summarize_comparisons_relabelled_margin
+ mode: relabel
+ model_name: mnoukhov/pythia410m-tldrprompt-dpo1b-adapter
+ dataset_name: mnoukhov/openai_summarize_comparisons_tldrprompt
+ eval_split: train
+ use_peft: False
+ beta: 0.5
+ load_in_8bit: False
+ bf16: False
+ fp16: True
+ per_device_eval_batch_size: 8
+ warmup_steps: 150
code/configs/dpo1b_relabel_generated_pythia410m_fp16.yml ADDED
@@ -0,0 +1,12 @@
+ output_dir: /home/toolkit/huggingface/openai_summarize_generated_20k_relabelled_margin
+ mode: relabel
+ model_name: mnoukhov/pythia410m-tldrprompt-dpo1b-adapter
+ dataset_name: mnoukhov/openai_summarize_generated_20k
+ eval_split: train
+ use_peft: False
+ beta: 0.5
+ load_in_8bit: False
+ bf16: False
+ fp16: True
+ per_device_eval_batch_size: 8
+ warmup_steps: 150
code/configs/dpo1b_relabel_generated_same_prompts.yml ADDED
@@ -0,0 +1,12 @@
+ output_dir: /home/toolkit/huggingface/openai_comparisons_20k_regen_and_relabelled
+ mode: relabel
+ model_name: mnoukhov/pythia410m-tldrprompt-dpo1b-adapter
+ dataset_name: arianhosseini/openai_comparisons_20k_regen_and_relabelled
+ eval_split: train
+ use_peft: False
+ beta: 0.5
+ load_in_8bit: False
+ bf16: False
+ fp16: True
+ per_device_eval_batch_size: 8
+ warmup_steps: 150
code/configs/dpo1b_relabel_vllm_generated_pythia410m.yml ADDED
@@ -0,0 +1,12 @@
+ output_dir: openai_summarize_vllm_generated_20k_label410m
+ mode: relabel
+ model_name: mnoukhov/pythia410m-tldrprompt-dpo1b-adapter
+ dataset_name: mnoukhov/openai_summarize_vllm_generated_20k
+ eval_split: train
+ use_peft: False
+ beta: 0.5
+ load_in_8bit: False
+ bf16: False
+ fp16: True
+ per_device_eval_batch_size: 8
+ warmup_steps: 150
code/configs/dpo1b_test.yml ADDED
@@ -0,0 +1,19 @@
+ model_name: /home/toolkit/huggingface/tldr_sft_pythia410m_fp32_trainall_3epochs
+ dataset_name: mnoukhov/openai_summarize_comparisons_relabel_pythia1b
+ beta: 0.5
+ num_train_epochs: 3
+ eval_steps: 750
+ load_in_8bit: False
+ bf16: False
+ fp16: True
+ learning_rate: 1e-5
+ use_peft: True
+ lora_all_linear: True
+ lora_r: 8
+ lora_alpha: 32
+ lora_dropout: 0.05
+ gradient_accumulation_steps: 4
+ per_device_train_batch_size: 4
+ warmup_steps: 150
+ eval_steps: 10
+ save_steps: 10
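The test config above lists `eval_steps` twice (750, then 10). Loaders such as PyYAML silently keep the last occurrence, so the earlier value has no effect, while stricter parsers may reject the file. The sketch below is a small standard-library check for repeated top-level keys. It assumes flat `key: value` files like the ones in this commit and is not a general YAML parser; the function name is hypothetical.

```python
from collections import Counter


def duplicate_top_level_keys(text):
    """Return the sorted top-level keys that appear more than once."""
    keys = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].rstrip()  # drop comments and trailing spaces
        if not line or line[0].isspace() or ":" not in line:
            continue  # skip blanks, nested entries, and non key/value lines
        keys.append(line.split(":", 1)[0].strip())
    return sorted(key for key, count in Counter(keys).items() if count > 1)


config = """\
beta: 0.5
eval_steps: 750
warmup_steps: 150
eval_steps: 10
save_steps: 10
"""
print(duplicate_top_level_keys(config))  # ['eval_steps']
```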
code/configs/dpo1b_vllm_pythia410m.yml ADDED
@@ -0,0 +1,18 @@
+ model_name: mnoukhov/pythia410m-tldr-sft
+ dataset_name: mnoukhov/openai_summarize_vllm_generated_20k_label410m
+ gold_model_name: mnoukhov/pythia1b-sft-rm-tldrprompt
+ beta: 0.5
+ max_steps: 10000
+ eval_steps: 1000
+ load_in_8bit: False
+ bf16: False
+ fp16: True
+ learning_rate: 1e-5
+ use_peft: True
+ lora_all_linear: True
+ lora_r: 8
+ lora_alpha: 32
+ lora_dropout: 0.05
+ gradient_accumulation_steps: 4
+ per_device_train_batch_size: 4
+ warmup_steps: 150
code/configs/dpo2_costa_1b_20k_bf16.yml ADDED
@@ -0,0 +1,36 @@
+ ## dpo 2
+ pseudo_dataset_name: mnoukhov/summarize_from_feedback_tldr3_generated_20k_relabel_pythia1b_dpo_temp0.7_length128
+ train_split: train[:1]
+ max_prompt_length: 512
+ max_target_length: 131
+ max_length: 640
+ ## costa stuff
+ model_name: vwxyzjn/EleutherAI_pythia-1b-deduped__sft__tldr
+ model_revision: sft__55513__1706646024
+ dataset_name: vwxyzjn/summarize_from_feedback_oai_preprocessing_1706381144
+ tokenizer_name: EleutherAI/pythia-1b-deduped
+ prompt_field: query
+ eval_split: validation
+ ## hub stuff
+ push_to_hub: True
+ push_to_hub_organization: mnoukhov
+ ## training stuff
+ gold_eval: ppl
+ eval_steps: 0.2
+ save_steps: 0.2
+ beta: 0.5
+ max_steps: -1
+ num_train_epochs: 1
+ load_in_8bit: False
+ bf16: True
+ fp16: False
+ learning_rate: 3e-6
+ use_peft: True
+ lora_all_linear: True
+ lora_r: 8
+ lora_alpha: 32
+ lora_dropout: 0.05
+ gradient_accumulation_steps: 4
+ per_device_train_batch_size: 16
+ per_device_eval_batch_size: 4
+ warmup_steps: 150
code/configs/dpo2_costa_1b_20k_fp16.yml ADDED
@@ -0,0 +1,37 @@
+ ## dpo 2
+ pseudo_dataset_name: mnoukhov/summarize_from_feedback_tldr3_unlabelled_vllm_dpo2_costa_1b_fp16.yml_bfcef
+ pseudo_dataset_split: train[:20000]
+ train_split: train[:1]
+ max_prompt_length: 512
+ max_target_length: 131
+ max_length: 640
+ lr_scheduler_type: cosine
+ ## costa stuff
+ # model_name: mnoukhov/EleutherAI_pythia-1b-deduped__sft__tldr_dpo_costa_1b_fp16.yml_3d94f50_b9ff2_merged
+ model_name: vwxyzjn/EleutherAI_pythia-1b-deduped__sft__tldr
+ model_revision: sft__55513__1706646024
+ dataset_name: vwxyzjn/summarize_from_feedback_oai_preprocessing_1706381144
+ tokenizer_name: EleutherAI/pythia-1b-deduped
+ prompt_field: query
+ eval_split: validation
+ ## hub stuff
+ push_to_hub: True
+ push_to_hub_organization: mnoukhov
+ ## training stuff
+ gold_eval: ppl
+ eval_steps: 0.2
+ save_steps: 0.2
+ beta: 0.05
+ max_steps: -1
+ num_train_epochs: 2
+ load_in_8bit: False
+ bf16: False
+ fp16: True
+ learning_rate: 1e-5
+ use_peft: True
+ lora_r: 16
+ lora_alpha: 32
+ lora_dropout: 0.
+ gradient_accumulation_steps: 4
+ per_device_train_batch_size: 4
+ per_device_eval_batch_size: 4
code/configs/dpo2_costa_2.8b_bf16.yml ADDED
@@ -0,0 +1,40 @@
+ # dpo2
+ pseudo_dataset_name: mnoukhov/summarize_from_feedback_tldr3_unlabelled_vllm_dpo_costa_2.8b_bf16.yml_6e799
+ pseudo_dataset_split: train[:20000]
+ train_split: train[:1]
+ ## costa stuff
+ model_name: vwxyzjn/EleutherAI_pythia-2.8b-deduped__sft__tldr
+ model_revision: sft__55513__1708611267
+ dataset_name: vwxyzjn/summarize_from_feedback_oai_preprocessing_1706381144
+ tokenizer_name: EleutherAI/pythia-1b-deduped
+ prompt_field: query
+ eval_split: validation
+ max_prompt_length: 512
+ max_target_length: 131
+ max_length: 640
+ lr_scheduler_type: cosine
+ ## hub stuff
+ push_to_hub: True
+ push_to_hub_organization: mnoukhov
+ ## training stuff
+ gold_eval: ppl
+ eval_steps: 0.33
+ save_steps: 0.33
+ beta: 0.05
+ max_steps: -1
+ num_train_epochs: 1
+ load_in_8bit: False
+ bf16: True
+ fp16: False
+ learning_rate: 1e-5
+ use_peft: True
+ lora_r: 16
+ lora_alpha: 32
+ lora_dropout: 0.
+ load_in_8bit: False
+ gradient_checkpointing: True
+ gradient_checkpointing_use_reentrant: False
+ gradient_accumulation_steps: 4
+ per_device_train_batch_size: 16
+ per_device_eval_batch_size: 8
+ eval_first_step: False
code/configs/dpo2_pythia2.8b_tldr.yml ADDED
@@ -0,0 +1,34 @@
+ pseudo_dataset_name: mnoukhov/summarize_from_feedback_tldr3_unlabelled_vllm_dpo_costa_2.8b_bf16.yml_6e799_new
+ train_split: train[:1]
+ # dpo 2
+ eval_first_step: False
+ model_name: mnoukhov/EleutherAI_pythia-2.8b-deduped__sft__tldr_55513
+ dataset_name: vwxyzjn/summarize_from_feedback_oai_preprocessing_1706381144
+ tokenizer_name: EleutherAI/pythia-1b-deduped
+ prompt_field: query
+ eval_split: validation
+ max_prompt_length: 512
+ max_target_length: 131
+ max_length: 640
+ lr_scheduler_type: cosine
+ ## hub stuff
+ push_to_hub: True
+ push_to_hub_organization: mnoukhov
+ ## training stuff
+ gold_eval: ppl
+ eval_steps: 0.2
+ save_steps: 0.2
+ beta: 0.05
+ max_steps: -1
+ num_train_epochs: 1
+ load_in_8bit: False
+ bf16: True
+ fp16: False
+ learning_rate: 1e-5
+ use_peft: True
+ lora_r: 16
+ lora_alpha: 32
+ lora_dropout: 0.
+ gradient_accumulation_steps: 16
+ per_device_train_batch_size: 4
+ per_device_eval_batch_size: 4
code/configs/dpo3_costa_1b_20k_fp16.yml ADDED
@@ -0,0 +1,35 @@
+ ## dpo 2
+ pseudo_dataset_name:
+ train_split: train[:1]
+ max_prompt_length: 512
+ max_target_length: 131
+ max_length: 640
+ lr_scheduler_type: cosine
+ ## costa stuff
+ model_name: vwxyzjn/EleutherAI_pythia-1b-deduped__sft__tldr
+ model_revision: sft__55513__1706646024
+ dataset_name: vwxyzjn/summarize_from_feedback_oai_preprocessing_1706381144
+ tokenizer_name: EleutherAI/pythia-1b-deduped
+ prompt_field: query
+ eval_split: validation
+ ## hub stuff
+ push_to_hub: True
+ push_to_hub_organization: mnoukhov
+ ## training stuff
+ gold_eval: ppl
+ eval_steps: 0.2
+ save_steps: 0.2
+ beta: 0.05
+ max_steps: -1
+ num_train_epochs: 1
+ load_in_8bit: False
+ bf16: False
+ fp16: True
+ learning_rate: 3e-5
+ use_peft: True
+ lora_r: 16
+ lora_alpha: 32
+ lora_dropout: 0.
+ gradient_accumulation_steps: 4
+ per_device_train_batch_size: 4
+ per_device_eval_batch_size: 4
code/configs/dpo_1b_bf16.yml ADDED
@@ -0,0 +1,28 @@
+ model_name: vwxyzjn/EleutherAI_pythia-1b-deduped__sft__tldr
+ model_revision: sft__55513__1706646024
+ dataset_name: vwxyzjn/summarize_from_feedback_oai_preprocessing_1706381144
+ eval_split: validation
+ tokenizer_name: EleutherAI/pythia-1b-deduped
+ prompt_field: query
+ gold_model_name: vwxyzjn/EleutherAI_pythia-6.9b-deduped__reward__tldr
+ gold_model_revision: reward__55513__1706651113
+ gold_dataset_name: vwxyzjn/summarize_from_feedback_tldr_3_filtered_oai_preprocessing_1706381144
+ gold_prompt_field: query
+ gold_eval_split: validation
+ strip_prompt: False
+ ## training stuff
+ beta: 0.5
+ max_steps: 10000
+ eval_steps: 1000
+ load_in_8bit: False
+ bf16: True
+ fp16: False
+ learning_rate: 1e-5
+ use_peft: True
+ lora_all_linear: True
+ lora_r: 8
+ lora_alpha: 32
+ lora_dropout: 0.05
+ gradient_accumulation_steps: 16
+ per_device_train_batch_size: 4
+ warmup_steps: 150
code/configs/dpo_1b_fp16.yml ADDED
@@ -0,0 +1,31 @@
+ ## costa stuff
+ model_name: vwxyzjn/EleutherAI_pythia-1b-deduped__sft__tldr
+ model_revision: sft__55513__1706646024
+ dataset_name: vwxyzjn/summarize_from_feedback_oai_preprocessing_1706381144
+ tokenizer_name: EleutherAI/pythia-1b-deduped
+ prompt_field: query
+ eval_split: validation
+ max_target_length: 128
+ ## hub stuff
+ push_to_hub: True
+ push_to_hub_organization: mnoukhov
+ ## training stuff
+ gold_eval: ppl
+ eval_steps: 0.2
+ save_steps: 0.2
+ beta: 0.5
+ max_steps: -1
+ num_train_epochs: 2
+ load_in_8bit: False
+ bf16: False
+ fp16: True
+ learning_rate: 1e-5
+ use_peft: True
+ lora_all_linear: True
+ lora_r: 8
+ lora_alpha: 32
+ lora_dropout: 0.05
+ gradient_accumulation_steps: 4
+ per_device_train_batch_size: 4
+ per_device_eval_batch_size: 4
+ warmup_steps: 150
code/configs/dpo_20konly_1b_bf16.yml ADDED
@@ -0,0 +1,32 @@
+ ## costa stuff
+ model_name: vwxyzjn/EleutherAI_pythia-1b-deduped__sft__tldr
+ model_revision: sft__55513__1706646024
+ dataset_name: vwxyzjn/summarize_from_feedback_oai_preprocessing_1706381144
+ tokenizer_name: EleutherAI/pythia-1b-deduped
+ eval_split: validation
+ prompt_field: query
+ gold_model_name: vwxyzjn/EleutherAI_pythia-6.9b-deduped__reward__tldr
+ gold_model_revision: reward__55513__1706651113
+ gold_dataset_name: vwxyzjn/summarize_from_feedback_tldr_3_filtered_oai_preprocessing_1706381144
+ gold_prompt_field: query
+ gold_target_field: reference_response
+ gold_eval_split: validation
+ strip_prompt: False
+ ## training stuff
+ eval_first_step: False
+ pseudo_dataset_name: mnoukhov/summarize_from_feedback_tldr3_generated_20k_relabel_pythia1b_dpo
+ beta: 0.5
+ max_steps: 10000
+ eval_steps: 1000
+ load_in_8bit: False
+ bf16: True
+ fp16: False
+ learning_rate: 1e-5
+ use_peft: True
+ lora_all_linear: True
+ lora_r: 8
+ lora_alpha: 32
+ lora_dropout: 0.05
+ gradient_accumulation_steps: 16
+ per_device_train_batch_size: 4
+ warmup_steps: 150
code/configs/dpo_20konly_1b_fp16.yml ADDED
@@ -0,0 +1,33 @@
+ ## costa stuff
+ model_name: vwxyzjn/EleutherAI_pythia-1b-deduped__sft__tldr
+ model_revision: sft__55513__1706646024
+ dataset_name: vwxyzjn/summarize_from_feedback_oai_preprocessing_1706381144
+ tokenizer_name: EleutherAI/pythia-1b-deduped
+ prompt_field: query
+ eval_split: validation
+ pseudo_dataset_name: mnoukhov/summarize_from_feedback_tldr3_generated_20k_relabel_pythia1b_dpo
+ max_target_length: 128
+ ## hub stuff
+ push_to_hub: True
+ push_to_hub_organization: mnoukhov
+ ## training stuff
+ gold_eval: ppl
+ eval_steps: 0.2
+ save_steps: 0.2
+ train_split: train[:1]
+ beta: 0.5
+ max_steps: -1
+ num_train_epochs: 5
+ load_in_8bit: False
+ bf16: False
+ fp16: True
+ learning_rate: 1e-5
+ use_peft: True
+ lora_all_linear: True
+ lora_r: 8
+ lora_alpha: 32
+ lora_dropout: 0.05
+ gradient_accumulation_steps: 4
+ per_device_train_batch_size: 4
+ per_device_eval_batch_size: 4
+ warmup_steps: 150
code/configs/dpo_costa_1b_constantlr_fp16.yml ADDED
@@ -0,0 +1,32 @@
+ ## costa stuff
+ model_name: vwxyzjn/EleutherAI_pythia-1b-deduped__sft__tldr
+ model_revision: sft__55513__1706646024
+ dataset_name: vwxyzjn/summarize_from_feedback_oai_preprocessing_1706381144
+ tokenizer_name: EleutherAI/pythia-1b-deduped
+ prompt_field: query
+ eval_split: validation
+ max_target_length: 169
+ ## hub stuff
+ push_to_hub: True
+ push_to_hub_organization: mnoukhov
+ ## training stuff
+ gold_eval: ppl
+ eval_steps: 0.2
+ save_steps: 0.2
+ beta: 0.5
+ max_steps: -1
+ num_train_epochs: 1
+ load_in_8bit: False
+ bf16: False
+ fp16: True
+ learning_rate: 1e-6
+ lr_scheduler_type: constant_with_warmup
+ use_peft: True
+ lora_all_linear: True
+ lora_r: 32
+ lora_alpha: 64
+ lora_dropout: 0.05
+ gradient_accumulation_steps: 4
+ per_device_train_batch_size: 4
+ per_device_eval_batch_size: 4
+ warmup_steps: 150