ehartford committed
Commit 4b21382
1 Parent(s): 810440c

Upload folder using huggingface_hub

Files changed (3)
  1. README.md +165 -29
  2. model.safetensors +1 -1
  3. pytorch_model.bin +3 -0
README.md CHANGED
@@ -3,48 +3,184 @@ license: apache-2.0
  base_model: Qwen/Qwen2-0.5B
  tags:
  - generated_from_trainer
- - axolotl
- datasets:
- - cognitivecomputations/Dolphin-2.9
- - teknium/OpenHermes-2.5
- - m-a-p/CodeFeedback-Filtered-Instruction
- - cognitivecomputations/dolphin-coder
- - cognitivecomputations/samantha-data
- - microsoft/orca-math-word-problems-200k
- - Locutusque/function-calling-chatml
- - internlm/Agent-FLAN
  ---

- # Dolphin 2.9.3 Qwen2 0.5B 🐬

- Curated and trained by Eric Hartford, Lucas Atkins, Fernando Fernandes, and Cognitive Computations

- [![Discord](https://img.shields.io/discord/1156064224225808488?logo=Discord&logoColor=%23ffffff&label=Discord&link=https%3A%2F%2Fdiscord.gg%2FtCMkMDDHwm)](https://discord.gg/cognitivecomputations)
- Discord: https://discord.gg/cognitivecomputations

- <img src="https://cdn-uploads.huggingface.co/production/uploads/63111b2d88942700629f5771/ldkN1J0WIDQwU4vutGYiD.png" width="600" />

- Our appreciation for the sponsors of Dolphin 2.9.3:
- - [Crusoe Cloud](https://crusoe.ai/) - provided an excellent on-demand 8xH100 node

- This model is based on Qwen2-0.5B and is governed by the Apache-2.0 license.

- The base model has 128k context, and the full-weight fine-tuning used a 16k sequence length.

- Example:

- ```
- <|im_start|>system
- You are Dolphin, a helpful AI assistant.<|im_end|>
- <|im_start|>user
- {prompt}<|im_end|>
- <|im_start|>assistant

  ```

- Dolphin-2.9.3 has a variety of instruction-following, conversational, and coding skills. It also has initial agentic abilities and supports function calling.
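
As an illustrative sketch of the ChatML format above (the repo id below is a placeholder, and the snippet assumes the tokenizer ships the ChatML chat template with `<|im_end|>` as the end-of-turn/eos token, as the training config further down specifies):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder id -- substitute the actual Hugging Face repo id of this checkpoint.
model_id = "cognitivecomputations/dolphin-2.9.3-qwen2-0.5b"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

# Build the ChatML prompt shown above via the tokenizer's chat template.
messages = [
    {"role": "system", "content": "You are Dolphin, a helpful AI assistant."},
    {"role": "user", "content": "Write a haiku about dolphins."},
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# <|im_end|> is configured as the eos token, so generation stops at the end of the assistant turn.
output_ids = model.generate(input_ids, max_new_tokens=128)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```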

- Dolphin is uncensored. We have filtered the dataset to remove alignment and bias, which makes the model more compliant. You are advised to implement your own alignment layer before exposing the model as a service: it will be highly compliant with any request, even unethical ones. Please read my blog post about uncensored models (https://erichartford.com/uncensored-models). You are responsible for any content you create using this model. Enjoy responsibly.

- Dolphin is licensed under Apache-2.0. We grant permission for any use, including commercial, that complies with that license. Dolphin was trained on data generated from GPT-4, among other models.

  base_model: Qwen/Qwen2-0.5B
  tags:
  - generated_from_trainer
+ model-index:
+ - name: 0.5b out
+   results: []
  ---

+ <!-- This model card has been generated automatically according to the information the Trainer had access to. You
+ should probably proofread and complete it, then remove this comment. -->
+
+ [<img src="https://raw.githubusercontent.com/OpenAccess-AI-Collective/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/OpenAccess-AI-Collective/axolotl)
+ <details><summary>See axolotl config</summary>
+
+ axolotl version: `0.4.1`
+ ```yaml
+ base_model: Qwen/Qwen2-0.5B
+ model_type: AutoModelForCausalLM
+ tokenizer_type: AutoTokenizer
+
+ # load_in_4bit: true
+
+ chat_template: chatml
+ datasets:
+   - path: /workspace/datasets/dolphin201-sharegpt2.jsonl
+     type: sharegpt
+     conversation: chatml
+   - path: /workspace/datasets/SystemChat_sharegpt.jsonl
+     type: sharegpt
+     conversation: chatml
+   # - path: /workspace/datasets/SystemChat_multilingual_sharegpt.jsonl
+   #   type: sharegpt
+   #   conversation: chatml
+   # - path: /workspace/datasets/SystemChat-2.0-Arabic/SystemChatArabic_sharegpt.jsonl
+   #   type: sharegpt
+   #   conversation: chatml
+   # - path: /workspace/datasets/dolphin-coder-translate-sharegpt2.jsonl
+   #   type: sharegpt
+   #   conversation: chatml
+   # - path: /workspace/datasets/dolphin-coder-codegen-sharegpt2.jsonl
+   #   type: sharegpt
+   #   conversation: chatml
+   # - path: /workspace/datasets/m-a-p_Code-Feedback-sharegpt-unfiltered.jsonl
+   #   type: sharegpt
+   #   conversation: chatml
+   # - path: /workspace/datasets/m-a-p_CodeFeedback-Filtered-Instruction-sharegpt-unfiltered.jsonl
+   #   type: sharegpt
+   #   conversation: chatml
+   - path: /workspace/datasets/not_samantha_norefusals.jsonl
+     type: sharegpt
+     conversation: chatml
+   - path: /workspace/datasets/Orca-Math-resort-unfiltered.jsonl
+     type: sharegpt
+     # conversation: chatml
+   # - path: /workspace/datasets/agent_instruct_react_unfiltered.jsonl
+   #   type: sharegpt
+   #   conversation: chatml
+   # - path: /workspace/datasets/toolbench_instruct_j1s1_3k_unfiltered.jsonl
+   #   type: sharegpt
+   #   conversation: chatml
+   # - path: /workspace/datasets/toolbench_negative_unfiltered.jsonl
+   #   type: sharegpt
+   #   conversation: chatml
+   # - path: /workspace/datasets/toolbench_react_10p_unfiltered.jsonl
+   #   type: sharegpt
+   #   conversation: chatml
+   # - path: /workspace/datasets/toolbench_tflan_cot_30p_unfiltered.jsonl
+   #   type: sharegpt
+   #   conversation: chatml
+   - path: /workspace/datasets/openhermes200k_unfiltered.jsonl
+     type: sharegpt
+     conversation: chatml
+
+ dataset_prepared_path: last_run_prepared
+ val_set_size: 0.03
+ output_dir: 0.5b out
+
+ sequence_len: 16384
+ sample_packing: true
+ pad_to_sequence_len: true
+
+ # adapter: qlora
+ # lora_r: 16
+ # lora_alpha: 32
+ # lora_dropout: 0.05
+ # lora_target_modules:
+ #   - q_proj
+ #   - k_proj
+ #   - v_proj
+ #   - o_proj
+ #   - gate_proj
+ #   - up_proj
+ #   - down_proj
+
+ wandb_project: 2.9.3-qwen-2.9.3-qwen2-0.5b
+ # wandb_entity: oaaic
+ # wandb_watch:
+ # wandb_name:
+ # wandb_log_model:
+
+ gradient_accumulation_steps: 4
+ micro_batch_size: 1
+ num_epochs: 3
+ optimizer: adamw_8bit
+ lr_scheduler: constant
+ learning_rate: 1e-4
+ # max_grad_norm: 1.0
+
+ train_on_inputs: false
+ group_by_length: false
+ bf16: true
+ tf32: false
+
+ gradient_checkpointing: true
+ gradient_checkpointing_kwargs:
+   use_reentrant: true
+ logging_steps: 1
+ flash_attention: true
+ deepspeed: /workspace/axolotl/deepspeed_configs/zero3_bf16.json
+ warmup_steps: 10
+ # evals_per_epoch: 2
+ saves_per_epoch: 2
+ save_total_limit: 2
+ weight_decay: 0.1
+ special_tokens:
+   eos_token: <|im_end|>
+

  ```

+ </details><br>
+
+ # 0.5b out
+
+ This model is a fine-tuned version of [Qwen/Qwen2-0.5B](https://huggingface.co/Qwen/Qwen2-0.5B) on the None dataset.
+ It achieves the following results on the evaluation set:
+ - Loss: 0.9948
+
+ ## Model description
+
+ More information needed
+
+ ## Intended uses & limitations
+
+ More information needed
+
+ ## Training and evaluation data
+
+ More information needed
+
+ ## Training procedure
+
+ ### Training hyperparameters
+
+ The following hyperparameters were used during training:
+ - learning_rate: 0.0001
+ - train_batch_size: 1
+ - eval_batch_size: 1
+ - seed: 42
+ - distributed_type: multi-GPU
+ - num_devices: 8
+ - gradient_accumulation_steps: 4
+ - total_train_batch_size: 32
+ - total_eval_batch_size: 8
+ - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
+ - lr_scheduler_type: constant
+ - lr_scheduler_warmup_steps: 10
+ - num_epochs: 3
+
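Read outside axolotl, the list above maps approximately onto plain `transformers` `TrainingArguments` as sketched below; this is an approximation for reference only (the run itself was driven by the axolotl/DeepSpeed ZeRO-3 config earlier in this card), and `adamw_bnb_8bit` is an assumed stand-in for the config's `adamw_8bit` optimizer.

```python
from transformers import TrainingArguments

# Approximate equivalent of the hyperparameters listed above (a sketch, not the
# exact training setup; the actual run used axolotl with DeepSpeed ZeRO-3).
args = TrainingArguments(
    output_dir="0.5b out",
    learning_rate=1e-4,
    per_device_train_batch_size=1,   # micro batch size per GPU
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=4,   # 1 x 4 x 8 GPUs -> total train batch size 32
    num_train_epochs=3,
    lr_scheduler_type="constant",
    warmup_steps=10,
    optim="adamw_bnb_8bit",          # assumed 8-bit AdamW, betas=(0.9, 0.999), eps=1e-8
    weight_decay=0.1,                # from the axolotl config above
    bf16=True,                       # requires a bf16-capable GPU
    tf32=False,
    gradient_checkpointing=True,
    logging_steps=1,
    seed=42,
)
```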
+ ### Training results
+
+ | Training Loss | Epoch  | Step | Validation Loss |
+ |:-------------:|:------:|:----:|:---------------:|
+ | 1.0145        | 1.0111 | 933  | 1.0237          |
+ | 0.9716        | 2.0116 | 1867 | 0.9972          |
+ | 0.8939        | 2.9743 | 2766 | 0.9948          |
+
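As a quick back-of-the-envelope reading of the table above (assuming the reported validation loss is mean token cross-entropy in nats, the usual Trainer convention), the final checkpoint's implied perplexity is about 2.70:

```python
import math

# Perplexity implied by the final validation loss of 0.9948 (cross-entropy in nats).
print(math.exp(0.9948))  # ~= 2.70
```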

+ ### Framework versions
+
+ - Transformers 4.41.1
+ - Pytorch 2.3.0+cu121
+ - Datasets 2.19.1
+ - Tokenizers 0.19.1
model.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:9adccc59fd39ac70af1c269702fa727fc14840d6c6ddc60668e95b9478947294
  size 988097824

  version https://git-lfs.github.com/spec/v1
+ oid sha256:c26f3f20158634d20c14afb1c56a674ae9ffe07db89cb9aad146e1d9ca03f13f
  size 988097824
pytorch_model.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:5c457a2141607b17260280acf162945f9b1b5a4d02a9001ce86eff4592d6c292
+ size 38325