fblgit committed
Commit ae3a42f
1 Parent(s): ed457a1

Update README.md

Files changed (1)
  1. README.md +146 -2
README.md CHANGED

---
library_name: peft
tags:
- generated_from_trainer
base_model: 152334H/miqu-1-70b-sf
model-index:
- name: qlora-out
  results: []
license: cc0-1.0
datasets:
- Open-Orca/SlimOrca
---

# ShinojiResearch/Senku-70B-Full

## Model Details

Finetune of the miqu-70b-sf dequant of miqudev's leak of Mistral-70B (allegedly an early Mistral Medium). My diffs are available under CC-0 in the Senku-70B repo; this "Full" repository includes the merge with the leaked model, so you can use the other repository to save bandwidth.

EQ-Bench: 84.89
 
The model uses a ChatML prompt template:

<|im_start|>user
The user’s message goes here
<|im_end|>
<|im_start|>assistant <|im_end|>
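
As a reference for the template above, here is a minimal sketch of building a ChatML prompt with the Transformers tokenizer. It assumes the repository ships a chat template in its tokenizer config; if not, the string can be assembled by hand exactly as shown above.

```python
from transformers import AutoTokenizer

# Assumption: the tokenizer for this repo bundles a ChatML chat template.
tokenizer = AutoTokenizer.from_pretrained("ShinojiResearch/Senku-70B-Full")

messages = [
    {"role": "system", "content": "You are a helpful assistant."},  # example system prompt
    {"role": "user", "content": "The user's message goes here"},
]

# add_generation_prompt=True appends the opening <|im_start|>assistant tag
# so the model continues with the assistant turn.
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
print(prompt)
```
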
Credit to https://twitter.com/hu_yifei for providing the GSM and HellaSwag evaluations. It is the first open-weight model to dethrone GPT-4 on EQ-Bench.

## Base Model Details

This model is a fine-tuned version of [152334H/miqu-1-70b-sf](https://huggingface.co/152334H/miqu-1-70b-sf) on the Open-Orca/SlimOrca dataset.
It achieves the following results on the evaluation set:
- Loss: 0.3110
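
A minimal inference sketch, assuming the merged weights in this repository load directly with `transformers` and that 4-bit quantization is acceptable on your hardware; alternatively, the CC-0 adapter from the Senku-70B repository can be attached to the base model with `peft.PeftModel.from_pretrained`.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

repo_id = "ShinojiResearch/Senku-70B-Full"  # assumption: merged, directly loadable weights

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    device_map="auto",
    # 4-bit loading greatly reduces memory use for a 70B model;
    # drop quantization_config for full-precision inference if you have the VRAM.
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),
)

inputs = tokenizer("Hello, Senku.", return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```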

## Training procedure

[<img src="https://raw.githubusercontent.com/OpenAccess-AI-Collective/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/OpenAccess-AI-Collective/axolotl)
<details><summary>See axolotl config</summary>

axolotl version: `0.4.0`
```yaml
base_model: 152334H/miqu-1-70b-sf
model_type: MistralForCausalLM
tokenizer_type: LlamaTokenizer
is_mistral_derived_model: true

load_in_8bit: false
load_in_4bit: true
strict: false

datasets:
  - path: Open-Orca/SlimOrca
    type: sharegpt
    conversation: chatml
dataset_prepared_path: last_run_prepared
val_set_size: 0.1
output_dir: ./qlora-out

adapter: qlora
lora_model_dir:

sequence_len: 8192
sample_packing: true
pad_to_sequence_len: true

lora_r: 32
lora_alpha: 16
lora_dropout: 0.05
lora_target_linear: true
lora_fan_in_fan_out:
lora_target_modules:
  - gate_proj
  - down_proj
  - up_proj
  - q_proj
  - v_proj
  - k_proj
  - o_proj

wandb_project:
wandb_entity:
wandb_watch:
wandb_name:
wandb_log_model:

gradient_accumulation_steps: 4
micro_batch_size: 2
num_epochs: 1
optimizer: adamw_bnb_8bit
lr_scheduler: cosine
learning_rate: 0.0002

train_on_inputs: false
group_by_length: false
bf16: auto
fp16:
tf32: false

gradient_checkpointing: true
early_stopping_patience:
resume_from_checkpoint:
local_rank:
logging_steps: 1
xformers_attention:
flash_attention: true

loss_watchdog_threshold: 5.0
loss_watchdog_patience: 3

warmup_steps: 10
evals_per_epoch: 4
eval_table_size:
eval_table_max_new_tokens: 128
saves_per_epoch: 1
debug:
deepspeed:
weight_decay: 0.0
fsdp:
fsdp_config:
special_tokens:
  bos_token: "<s>"
  eos_token: "</s>"
  unk_token: "<unk>"
```

</details><br>
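
For readers who think in `peft` terms rather than axolotl, the `lora_*` settings above map approximately to the following `LoraConfig`. This is an illustrative translation, not the code that produced the checkpoint.

```python
from peft import LoraConfig

# Approximate peft equivalent of the adapter settings in the axolotl config above.
lora_config = LoraConfig(
    r=32,                  # lora_r
    lora_alpha=16,         # lora_alpha
    lora_dropout=0.05,     # lora_dropout
    target_modules=[       # lora_target_modules: all linear projections
        "gate_proj", "down_proj", "up_proj",
        "q_proj", "v_proj", "k_proj", "o_proj",
    ],
    bias="none",
    task_type="CAUSAL_LM",
)
```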

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 0.0002
- train_batch_size: 2
- eval_batch_size: 2
- seed: 42
- gradient_accumulation_steps: 4
- total_train_batch_size: 8
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 10
- num_epochs: 1
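
For reference, the total train batch size above is the micro batch size multiplied by the gradient accumulation steps: 2 × 4 = 8, i.e. gradients from four micro-batches of two packed sequences are accumulated before each optimizer step.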

### Training results

| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:-----:|:----:|:---------------:|
| 0.9043 | 0.0 | 1 | 0.6387 |
| 0.5612 | 0.25 | 881 | 0.3279 |
| 0.6044 | 0.5 | 1762 | 0.3177 |
| 0.6592 | 0.75 | 2643 | 0.3110 |

### Framework versions

- PEFT 0.8.2
- Transformers 4.38.0.dev0
- Pytorch 2.1.2+cu118
- Datasets 2.16.1
- Tokenizers 0.15.0
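
A quick way to check a local environment against the versions listed above (note that Transformers 4.38.0.dev0 was a pre-release development build):

```python
# Print installed versions to compare against the framework versions above.
import datasets
import peft
import tokenizers
import torch
import transformers

for name, module in [
    ("PEFT", peft),
    ("Transformers", transformers),
    ("PyTorch", torch),
    ("Datasets", datasets),
    ("Tokenizers", tokenizers),
]:
    print(f"{name}: {module.__version__}")
```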