Quantization made by Richard Erkhov.

[Github](https://github.com/RichardErkhov)

[Discord](https://discord.gg/pvy7H8DZMG)

[Request more models](https://github.com/RichardErkhov/quant_request)

zephyr-7b-gemma-v0.1 - bnb 4bits
- Model creator: https://huggingface.co/HuggingFaceH4/
- Original model: https://huggingface.co/HuggingFaceH4/zephyr-7b-gemma-v0.1/

Original model description:
---
license: other
tags:
- alignment-handbook
- trl
- dpo
- generated_from_trainer
base_model: HuggingFaceH4/zephyr-7b-gemma-sft-v0.1
datasets:
- argilla/dpo-mix-7k
license_name: gemma-terms-of-use
license_link: https://ai.google.dev/gemma/terms
pipeline_tag: text-generation
model-index:
- name: zephyr-7b-gemma
  results:
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MT-Bench
      type: unknown
    metrics:
    - type: unknown
      value: 7.81
      name: score
    source:
      url: https://huggingface.co/spaces/lmsys/mt-bench
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: AI2 Reasoning Challenge (25-Shot)
      type: ai2_arc
      config: ARC-Challenge
      split: test
      args:
        num_few_shot: 25
    metrics:
    - type: acc_norm
      value: 58.45
      name: normalized accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=HuggingFaceH4/zephyr-7b-gemma-v0.1
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: HellaSwag (10-Shot)
      type: hellaswag
      split: validation
      args:
        num_few_shot: 10
    metrics:
    - type: acc_norm
      value: 83.48
      name: normalized accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=HuggingFaceH4/zephyr-7b-gemma-v0.1
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MMLU (5-Shot)
      type: cais/mmlu
      config: all
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 60.68
      name: accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=HuggingFaceH4/zephyr-7b-gemma-v0.1
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: TruthfulQA (0-shot)
      type: truthful_qa
      config: multiple_choice
      split: validation
      args:
        num_few_shot: 0
    metrics:
    - type: mc2
      value: 52.07
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=HuggingFaceH4/zephyr-7b-gemma-v0.1
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: Winogrande (5-shot)
      type: winogrande
      config: winogrande_xl
      split: validation
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 74.19
      name: accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=HuggingFaceH4/zephyr-7b-gemma-v0.1
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: GSM8k (5-shot)
      type: gsm8k
      config: main
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 45.56
      name: accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=HuggingFaceH4/zephyr-7b-gemma-v0.1
      name: Open LLM Leaderboard
---

<img src="https://huggingface.co/HuggingFaceH4/zephyr-7b-gemma-v0.1/resolve/main/thumbnail.png" alt="Zephyr 7B Gemma Logo" width="800" style="margin-left: auto; margin-right: auto; display: block;"/>

# Model Card for Zephyr 7B Gemma

Zephyr is a series of language models that are trained to act as helpful assistants. Zephyr 7B Gemma is the third model in the series, and is a fine-tuned version of [`google/gemma-7b`](https://huggingface.co/google/gemma-7b) that was trained on a mix of publicly available, synthetic datasets using Direct Preference Optimization (DPO). You can reproduce the training of this model via the recipe provided in the [Alignment Handbook](https://github.com/huggingface/alignment-handbook).

## Model description

- **Model type:** A 7B parameter GPT-like model fine-tuned on a mix of publicly available, synthetic datasets.
- **Language(s) (NLP):** Primarily English
- **License:** Gemma Terms of Use
- **Finetuned from model:** [google/gemma-7b](https://huggingface.co/google/gemma-7b)

### Model Sources

- **Repository:** https://github.com/huggingface/alignment-handbook
- **Demo:** https://huggingface.co/spaces/HuggingFaceH4/zephyr-7b-gemma-chat

## Performance

| Model |MT Bench⬇️|IFEval|
|-----------------------------------------------------------------------|------:|------:|
|[zephyr-7b-gemma-v0.1](https://huggingface.co/HuggingFaceH4/zephyr-7b-gemma-v0.1)| 7.81 | 28.76|
|[zephyr-7b-beta](https://huggingface.co/HuggingFaceH4/zephyr-7b-beta) | 7.34 | 43.81|
|[google/gemma-7b-it](https://huggingface.co/google/gemma-7b-it) | 6.38 | 38.01|

| Model |AGIEval|GPT4All|TruthfulQA|BigBench|Average ⬇️|
|-----------------------------------------------------------------------|------:|------:|---------:|-------:|------:|
|[zephyr-7b-beta](https://huggingface.co/HuggingFaceH4/zephyr-7b-beta) | 37.52| 71.77| 55.26| 39.77| 51.08|
|[zephyr-7b-gemma-v0.1](https://huggingface.co/HuggingFaceH4/zephyr-7b-gemma-v0.1)| 34.22| 66.37| 52.19| 37.10| 47.47|
|[mlabonne/Gemmalpaca-7B](https://huggingface.co/mlabonne/Gemmalpaca-7B)| 21.6 | 40.87| 44.85 | 30.49| 34.45|
|[google/gemma-7b-it](https://huggingface.co/google/gemma-7b-it) | 21.33| 40.84| 41.70| 30.25| 33.53|

<details><summary>Details of AGIEval, GPT4All, TruthfulQA, BigBench</summary>

### AGIEval
| Task |Version| Metric |Value| |Stderr|
|------------------------------|------:|--------|----:|---|-----:|
|agieval_aqua_rat | 0|acc |21.65|± | 2.59|
| | |acc_norm|25.20|± | 2.73|
|agieval_logiqa_en | 0|acc |34.72|± | 1.87|
| | |acc_norm|35.94|± | 1.88|
|agieval_lsat_ar | 0|acc |19.57|± | 2.62|
| | |acc_norm|21.74|± | 2.73|
|agieval_lsat_lr | 0|acc |30.59|± | 2.04|
| | |acc_norm|32.55|± | 2.08|
|agieval_lsat_rc | 0|acc |49.07|± | 3.05|
| | |acc_norm|42.75|± | 3.02|
|agieval_sat_en | 0|acc |54.85|± | 3.48|
| | |acc_norm|53.40|± | 3.48|
|agieval_sat_en_without_passage| 0|acc |37.38|± | 3.38|
| | |acc_norm|33.98|± | 3.31|
|agieval_sat_math | 0|acc |30.91|± | 3.12|
| | |acc_norm|28.18|± | 3.04|

Average: 34.22%

### GPT4All
| Task |Version| Metric |Value| |Stderr|
|-------------|------:|--------|----:|---|-----:|
|arc_challenge| 0|acc |49.15|± | 1.46|
| | |acc_norm|52.47|± | 1.46|
|arc_easy | 0|acc |77.44|± | 0.86|
| | |acc_norm|74.75|± | 0.89|
|boolq | 1|acc |79.69|± | 0.70|
|hellaswag | 0|acc |60.59|± | 0.49|
| | |acc_norm|78.00|± | 0.41|
|openbookqa | 0|acc |29.20|± | 2.04|
| | |acc_norm|37.80|± | 2.17|
|piqa | 0|acc |76.82|± | 0.98|
| | |acc_norm|77.80|± | 0.97|
|winogrande | 0|acc |64.09|± | 1.35|

Average: 66.37%

### TruthfulQA
| Task |Version|Metric|Value| |Stderr|
|-------------|------:|------|----:|---|-----:|
|truthfulqa_mc| 1|mc1 |35.74|± | 1.68|
| | |mc2 |52.19|± | 1.59|

Average: 52.19%

### Bigbench
| Task |Version| Metric |Value| |Stderr|
|------------------------------------------------|------:|---------------------|----:|---|-----:|
|bigbench_causal_judgement | 0|multiple_choice_grade|53.68|± | 3.63|
|bigbench_date_understanding | 0|multiple_choice_grade|59.89|± | 2.55|
|bigbench_disambiguation_qa | 0|multiple_choice_grade|30.23|± | 2.86|
|bigbench_geometric_shapes | 0|multiple_choice_grade|11.42|± | 1.68|
| | |exact_str_match | 0.00|± | 0.00|
|bigbench_logical_deduction_five_objects | 0|multiple_choice_grade|28.40|± | 2.02|
|bigbench_logical_deduction_seven_objects | 0|multiple_choice_grade|19.14|± | 1.49|
|bigbench_logical_deduction_three_objects | 0|multiple_choice_grade|44.67|± | 2.88|
|bigbench_movie_recommendation | 0|multiple_choice_grade|26.80|± | 1.98|
|bigbench_navigate | 0|multiple_choice_grade|50.00|± | 1.58|
|bigbench_reasoning_about_colored_objects | 0|multiple_choice_grade|52.75|± | 1.12|
|bigbench_ruin_names | 0|multiple_choice_grade|33.04|± | 2.22|
|bigbench_salient_translation_error_detection | 0|multiple_choice_grade|33.37|± | 1.49|
|bigbench_snarks | 0|multiple_choice_grade|48.62|± | 3.73|
|bigbench_sports_understanding | 0|multiple_choice_grade|58.11|± | 1.57|
|bigbench_temporal_sequences | 0|multiple_choice_grade|37.20|± | 1.53|
|bigbench_tracking_shuffled_objects_five_objects | 0|multiple_choice_grade|20.08|± | 1.13|
|bigbench_tracking_shuffled_objects_seven_objects| 0|multiple_choice_grade|15.77|± | 0.87|
|bigbench_tracking_shuffled_objects_three_objects| 0|multiple_choice_grade|44.67|± | 2.88|

Average: 37.1%

</details>

## Intended uses & limitations

The model was initially fine-tuned on the [DEITA 10K](https://huggingface.co/datasets/HuggingFaceH4/deita-10k-v0-sft) dataset, which contains a diverse range of synthetic dialogues generated by ChatGPT.
We then further aligned the model with [🤗 TRL's](https://github.com/huggingface/trl) `DPOTrainer` on the [argilla/dpo-mix-7k](https://huggingface.co/datasets/argilla/dpo-mix-7k) dataset, which contains 7k prompts and model completions that are ranked by GPT-4. As a result, the model can be used for chat and you can check out our [demo](https://huggingface.co/spaces/HuggingFaceH4/zephyr-chat) to test its capabilities.

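For background, DPO trains directly on such preference pairs by optimizing the standard preference loss below (notation from the DPO paper: $y_w$ and $y_l$ are the chosen and rejected completions, $\pi_{\text{ref}}$ is the SFT reference model, and $\beta$ is a temperature hyperparameter); this is general context, not the exact training configuration used here:

$$
\mathcal{L}_{\text{DPO}}(\pi_\theta; \pi_{\text{ref}}) = -\,\mathbb{E}_{(x, y_w, y_l)\sim\mathcal{D}}\left[\log \sigma\!\left(\beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\text{ref}}(y_w \mid x)} - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\text{ref}}(y_l \mid x)}\right)\right]
$$
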
Here's how you can run the model using the `pipeline()` function from 🤗 Transformers:

```python
# pip install transformers>=4.38.2
# pip install accelerate

import torch
from transformers import pipeline

pipe = pipeline(
    "text-generation",
    model="HuggingFaceH4/zephyr-7b-gemma-v0.1",
    device_map="auto",
    torch_dtype=torch.bfloat16,
)
messages = [
    {
        "role": "system",
        "content": "",  # The model has not been trained to follow a system prompt
    },
    {"role": "user", "content": "How many helicopters can a human eat in one sitting?"},
]
outputs = pipe(
    messages,
    max_new_tokens=128,
    do_sample=True,
    temperature=0.7,
    top_k=50,
    top_p=0.95,
    stop_sequence="<|im_end|>",
)
print(outputs[0]["generated_text"][-1]["content"])
# It is not possible for a human to eat a helicopter in one sitting, as a
# helicopter is a large and inedible machine. Helicopters are made of metal,
# plastic, and other materials that are not meant to be consumed by humans.
# Eating a helicopter would be extremely dangerous and would likely cause
# serious health problems, including choking, suffocation, and poisoning. It is
# important to only eat food that is safe and intended for human consumption.
```
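The `stop_sequence` above hints at the underlying chat format. As an illustration only (the template shown is an assumption inferred from the `<|im_end|>` token; the tokenizer's `apply_chat_template` method is the authoritative source), a ChatML-style layout arranges messages like this:

```python
# Sketch of a ChatML-style chat template (assumed format; in practice use
# tokenizer.apply_chat_template to obtain the exact prompt string).
def format_chatml(messages, add_generation_prompt=True):
    parts = []
    for m in messages:
        # Each turn is wrapped in <|im_start|>{role} ... <|im_end|> markers.
        parts.append(f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n")
    if add_generation_prompt:
        # Open an assistant turn so the model continues from here.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)

prompt = format_chatml([{"role": "user", "content": "Hello!"}])
print(prompt)
# <|im_start|>user
# Hello!<|im_end|>
# <|im_start|>assistant
```

Generation then stops when the model emits the `<|im_end|>` marker that closes the assistant turn, which is why it is passed as the stop sequence above.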

## Bias, Risks, and Limitations

Zephyr 7B Gemma has not been aligned to human preferences for safety within the RLHF phase or deployed with in-the-loop filtering of responses like ChatGPT, so the model can produce problematic outputs (especially when prompted to do so). It is also unknown what the size and composition of the corpus used to train the base model (`google/gemma-7b`) were; however, it is likely to have included a mix of web data and technical sources like books and code. See the [StarCoder2 model card](https://huggingface.co/bigcode/starcoder2-15b) for an example of this.

## Training and evaluation data

This model is a fine-tuned version of [HuggingFaceH4/zephyr-7b-gemma-sft-v0.1](https://huggingface.co/HuggingFaceH4/zephyr-7b-gemma-sft-v0.1) on the argilla/dpo-mix-7k dataset.

It achieves the following results on the evaluation set:
- Loss: 0.4695
- Rewards/chosen: -3.3746
- Rewards/rejected: -4.9715
- Rewards/accuracies: 0.7188
- Rewards/margins: 1.5970
- Logps/rejected: -459.4853
- Logps/chosen: -429.9115
- Logits/rejected: 86.4684
- Logits/chosen: 92.8200

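Note that the reward margin is simply the gap between the chosen and rejected rewards; a quick arithmetic check (the last digit differs because the summary values are themselves rounded):

```python
# Sanity check: Rewards/margins = Rewards/chosen - Rewards/rejected.
rewards_chosen = -3.3746
rewards_rejected = -4.9715
margin = rewards_chosen - rewards_rejected
print(round(margin, 4))  # 1.5969, consistent with the reported 1.5970
```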
### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 5e-07
- train_batch_size: 2
- eval_batch_size: 4
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- gradient_accumulation_steps: 8
- total_train_batch_size: 128
- total_eval_batch_size: 32
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 2

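The two batch-size totals above follow directly from the per-device settings:

```python
# Effective batch sizes from the per-device settings listed above.
train_batch_size = 2              # per-device micro-batch size
eval_batch_size = 4               # per-device eval batch size
num_devices = 8
gradient_accumulation_steps = 8

total_train_batch_size = train_batch_size * num_devices * gradient_accumulation_steps
total_eval_batch_size = eval_batch_size * num_devices  # no gradient accumulation at eval

print(total_train_batch_size, total_eval_batch_size)  # 128 32
```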
### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.1923 | 1.9 | 100 | 0.4736 | -3.4575 | -4.9556 | 0.75 | 1.4980 | -459.1662 | -431.5707 | 86.3863 | 92.7360 |

### Framework versions

- Transformers 4.39.0.dev0
- Pytorch 2.1.2+cu121
- Datasets 2.14.6
- Tokenizers 0.15.1

## Citation Information

If you find this model useful in your work, please consider citing the Zephyr technical report:

```
@misc{tunstall2023zephyr,
      title={Zephyr: Direct Distillation of LM Alignment},
      author={Lewis Tunstall and Edward Beeching and Nathan Lambert and Nazneen Rajani and Kashif Rasul and Younes Belkada and Shengyi Huang and Leandro von Werra and Clémentine Fourrier and Nathan Habib and Nathan Sarrazin and Omar Sanseviero and Alexander M. Rush and Thomas Wolf},
      year={2023},
      eprint={2310.16944},
      archivePrefix={arXiv},
      primaryClass={cs.LG}
}
```

You may also wish to cite the creators of this model:

```
@misc{zephyr_7b_gemma,
      author = {Lewis Tunstall and Philipp Schmid},
      title = {Zephyr 7B Gemma},
      year = {2024},
      publisher = {Hugging Face},
      journal = {Hugging Face repository},
      howpublished = {\url{https://huggingface.co/HuggingFaceH4/zephyr-7b-gemma-v0.1}}
}
```

# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_HuggingFaceH4__zephyr-7b-gemma-v0.1).

| Metric |Value|
|---------------------------------|----:|
|Avg. |62.41|
|AI2 Reasoning Challenge (25-Shot)|58.45|
|HellaSwag (10-Shot) |83.48|
|MMLU (5-Shot) |60.68|
|TruthfulQA (0-shot) |52.07|
|Winogrande (5-shot) |74.19|
|GSM8k (5-shot) |45.56|