Weyaxi committed
Commit 6c29087
1 Parent(s): 3eef002

adapter readme

Files changed (1)
  1. README.md +6 -166
README.md CHANGED
@@ -3,176 +3,16 @@ license: other
  library_name: peft
  tags:
  - axolotl
- - generated_from_trainer
  base_model: Qwen/Qwen1.5-32B
- model-index:
- - name: Einstein-v4-Qwen-1.5-32B
-   results: []
  ---
-
- <!-- This model card has been generated automatically according to the information the Trainer had access to. You
- should probably proofread and complete it, then remove this comment. -->
-
- [<img src="https://raw.githubusercontent.com/OpenAccess-AI-Collective/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/OpenAccess-AI-Collective/axolotl)
- <details><summary>See axolotl config</summary>
-
- axolotl version: `0.4.0`
- ```yaml
- base_model: Qwen/Qwen1.5-32B
- model_type: AutoModelForCausalLM
- tokenizer_type: AutoTokenizer
-
- load_in_8bit: false
- load_in_4bit: true
- strict: false
-
- chat_template: chatml
- datasets:
-   - path: data/merged_all.json
-     ds_type: json
-     type: alpaca
-     conversation: chatml
-
-   - path: data/capybara_sharegpt.json
-     ds_type: json
-     type: sharegpt
-     conversation: chatml
-
-   - path: data/synthia-v1.3_sharegpt_12500.json
-     ds_type: json
-     type: sharegpt
-     conversation: chatml
-
-   - path: data/cot_alpaca_gpt4_extracted_openhermes_2.5_sharegpt.json
-     ds_type: json
-     type: sharegpt
-     conversation: chatml
-
-   - path: data/slimorca_dedup_filtered_95k_sharegpt.json
-     ds_type: json
-     type: sharegpt
-     conversation: chatml
-
-   - path: data/airoboros_3.2_without_contextual_slimorca_orca_sharegpt.json
-     ds_type: json
-     type: sharegpt
-     conversation: chatml
-
- dataset_prepared_path: last_run_prepared
- val_set_size: 0 # because we won't eval, out of memory :(
- output_dir: ./Einstein-v4-Qwen-1.5-32B-model
-
- sequence_len: 4096
- sample_packing: true
- pad_to_sequence_len: true
- eval_sample_packing: false
-
- adapter: qlora
- lora_model_dir:
- lora_r: 64
- lora_alpha: 32
- lora_dropout: 0.05
- lora_target_linear: true
- lora_fan_in_fan_out:
- lora_modules_to_save:
-   - "embed_tokens"
-   - "lm_head"
-
- wandb_project: Einstein
- wandb_entity:
- wandb_watch:
- wandb_name: Einstein-v4-Qwen-1.5-32B-qlora-2-epoch
- wandb_log_model:
- hub_model_id: Weyaxi/Einstein-v4-Qwen-1.5-32B
-
- save_safetensors: true
-
- gradient_accumulation_steps: 4
- micro_batch_size: 1
- num_epochs: 2
- optimizer: adamw_bnb_8bit
- lr_scheduler: cosine
- learning_rate: 0.0002
-
- train_on_inputs: false
- group_by_length: false
- bf16: true
- fp16: false
- tf32: false
-
- gradient_checkpointing: true
- early_stopping_patience:
- resume_from_checkpoint:
- local_rank:
- logging_steps: 1
- xformers_attention:
- flash_attention: true
-
- warmup_steps: 10
- evals_per_epoch: 0 # because we won't eval, out of memory :(
- eval_table_size:
- eval_table_max_new_tokens: 128
- saves_per_epoch: 2
- debug:
-
- deepspeed: zero3_bf16_cpuoffload_params.json
- weight_decay: 0.0
- fsdp:
- fsdp_config:
- special_tokens:
-   bos_token: "<s>"
-   eos_token: "<|im_end|>"
-   unk_token: "<unk>"
- tokens:
-   - "<|im_start|>"
-
- ```
-
- </details><br>
-
- # Einstein-v4-Qwen-1.5-32B
-
- This model is a fine-tuned version of [Qwen/Qwen1.5-32B](https://huggingface.co/Qwen/Qwen1.5-32B) on the None dataset.
-
- ## Model description
-
- More information needed
-
- ## Intended uses & limitations
-
- More information needed
-
- ## Training and evaluation data
-
- More information needed
-
- ## Training procedure
-
- ### Training hyperparameters
-
- The following hyperparameters were used during training:
- - learning_rate: 0.0002
- - train_batch_size: 1
- - eval_batch_size: 1
- - seed: 42
- - distributed_type: multi-GPU
- - num_devices: 9
- - gradient_accumulation_steps: 4
- - total_train_batch_size: 36
- - total_eval_batch_size: 9
- - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- - lr_scheduler_type: cosine
- - lr_scheduler_warmup_steps: 10
- - num_epochs: 2
-
- ### Training results
-
-
-
- ### Framework versions
-
- - PEFT 0.10.0
- - Transformers 4.40.0.dev0
- - Pytorch 2.1.2+cu118
- - Datasets 2.18.0
- - Tokenizers 0.15.0
+ ![image/png](https://cdn-uploads.huggingface.co/production/uploads/6468ce47e134d050a58aa89c/l9tUquGV3mprrh9nzRpon.png)
+
+ # Einstein-v4-Qwen-1.5-32B-adapter
+
+ Adapter files of Einstein-v4-Qwen-1.5-32B. Finetuned from [Qwen/Qwen1.5-32B](https://huggingface.co/Qwen/Qwen1.5-32B).
+
+ ## Original Weights
+
+ You can access original weights from here:
+
+ [Weyaxi/Einstein-v4-Qwen-1.5-32B](https://huggingface.co/Weyaxi/Einstein-v4-Qwen-1.5-32B)
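
Since the README this commit introduces describes PEFT adapter files rather than merged weights, a minimal loading sketch may help readers. This is not part of the commit: the adapter repo id `Weyaxi/Einstein-v4-Qwen-1.5-32B-adapter` is assumed from the new card title, and the dtype/device settings are illustrative only.

```python
# Minimal sketch (not from this commit) of attaching a QLoRA adapter to its base model.
# Assumption: the adapter repo id below matches this repository, and enough GPU memory
# is available to hold the 32B base model.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_model_id = "Qwen/Qwen1.5-32B"                      # base model named in the card
adapter_id = "Weyaxi/Einstein-v4-Qwen-1.5-32B-adapter"  # assumed id of this adapter repo

tokenizer = AutoTokenizer.from_pretrained(base_model_id)

# Load the frozen base model, then attach the adapter weights on top of it.
base = AutoModelForCausalLM.from_pretrained(
    base_model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
model = PeftModel.from_pretrained(base, adapter_id)

# Optionally fold the adapter into the base weights for faster inference:
# merged = model.merge_and_unload()
```

The axolotl config in the removed card used a ChatML chat template, so prompts at inference time should follow the `<|im_start|>` / `<|im_end|>` format; the already-merged weights are available at the [Weyaxi/Einstein-v4-Qwen-1.5-32B](https://huggingface.co/Weyaxi/Einstein-v4-Qwen-1.5-32B) repository linked above.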