Commit cecdd87 · andreaskoepf committed
1 Parent(s): 94537a6

Update README.md

README.md CHANGED
@@ -4,8 +4,6 @@ license: other
 
 # OpenAssistant LLaMa 30B SFT 6
 
-- **Paper:** https://arxiv.org/abs/2304.07327
-
 Due to the license attached to LLaMa models by Meta AI it is not possible to directly distribute LLaMa-based models. Instead we provide XOR weights for the OA models.
 
 Thanks to Mick for writing the `xor_codec.py` script which enables this process
@@ -140,3 +138,50 @@ ae48c4c68e4e171d502dd0896aa19a84 ./pytorch_model-00002-of-00007.bin
 ```
 
 If so you have successfully decoded the weights and should be able to use the model with HuggingFace Transformers. **If your checksums do not match those above, there is a problem.**
+
+### Configuration
+
+```
+llama-30b-sft-6:
+  dtype: fp16
+  log_dir: "llama_log_30b"
+  learning_rate: 1e-5
+  model_name: /home/ubuntu/Open-Assistant/model/model_training/.saved/llama-30b-super-pretrain/checkpoint-3500
+  output_dir: llama_model_30b
+  deepspeed_config: configs/zero3_config_sft.json
+  weight_decay: 0.0
+  residual_dropout: 0.0
+  max_length: 2048
+  use_flash_attention: true
+  warmup_steps: 20
+  gradient_checkpointing: true
+  gradient_accumulation_steps: 16
+  per_device_train_batch_size: 2
+  per_device_eval_batch_size: 3
+  eval_steps: 101
+  save_steps: 292
+  num_train_epochs: 8
+  save_total_limit: 3
+  use_custom_sampler: true
+  sort_by_length: false
+  save_strategy: steps
+  datasets:
+    - oasst_export:
+        lang: "bg,ca,cs,da,de,en,es,fr,hr,hu,it,nl,pl,pt,ro,ru,sl,sr,sv,uk"
+        input_file_path: 2023-04-12_oasst_release_ready_synth.jsonl.gz
+        val_split: 0.05
+    - vicuna:
+        val_split: 0.05
+        max_val_set: 800
+        fraction: 0.8
+    - dolly15k:
+        val_split: 0.05
+        max_val_set: 300
+    - grade_school_math_instructions:
+        val_split: 0.05
+    - code_alpaca:
+        val_split: 0.05
+        max_val_set: 250
+```
+
+- **OASST dataset paper:** https://arxiv.org/abs/2304.07327
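
As a reading aid for the configuration added in this commit, the sketch below works out how many sequences a single optimizer step would consume under ordinary data-parallel training. The batch values are copied from the config; the number of devices is not stated anywhere in the commit, so `num_devices` (and the example value of 8) is purely an assumption, and the helper names are hypothetical.

```python
# Values copied from the llama-30b-sft-6 configuration in this commit.
PER_DEVICE_TRAIN_BATCH_SIZE = 2
GRADIENT_ACCUMULATION_STEPS = 16
MAX_LENGTH = 2048


def effective_batch_size(num_devices: int) -> int:
    """Sequences consumed per optimizer step across all data-parallel devices."""
    if num_devices < 1:
        raise ValueError("num_devices must be at least 1")
    return PER_DEVICE_TRAIN_BATCH_SIZE * GRADIENT_ACCUMULATION_STEPS * num_devices


def max_tokens_per_step(num_devices: int) -> int:
    """Upper bound on tokens per optimizer step (every sequence at max_length)."""
    return effective_batch_size(num_devices) * MAX_LENGTH


if __name__ == "__main__":
    # ASSUMPTION: 8 devices is an illustrative value, not taken from the commit.
    example_devices = 8
    print(effective_batch_size(example_devices))  # 256
    print(max_tokens_per_step(example_devices))   # 524288
```

On a single device the same arithmetic gives 32 sequences per step. The token figure is only a ceiling, since most training samples are shorter than `max_length`.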