Commit d59408a by RichardErkhov · verified · 1 parent: 337037e

uploaded readme

Files changed (1)
  1. README.md +95 -0
README.md ADDED
@@ -0,0 +1,95 @@
+ Quantization made by Richard Erkhov.
+
+ [Github](https://github.com/RichardErkhov)
+
+ [Discord](https://discord.gg/pvy7H8DZMG)
+
+ [Request more models](https://github.com/RichardErkhov/quant_request)
+
+
+ zephyr-7b-gemma-sft-v0.1 - bnb 4bits
+ - Model creator: https://huggingface.co/HuggingFaceH4/
+ - Original model: https://huggingface.co/HuggingFaceH4/zephyr-7b-gemma-sft-v0.1/
+
+
+
+
+ Original model description:
+ ---
+ license: other
+ license_name: gemma-terms-of-use
+ license_link: https://ai.google.dev/gemma/terms
+ base_model: google/gemma-7b
+ tags:
+ - alignment-handbook
+ - trl
+ - sft
+ - generated_from_trainer
+ - trl
+ - sft
+ - generated_from_trainer
+ datasets:
+ - HuggingFaceH4/deita-10k-v0-sft
+ model-index:
+ - name: zephyr-7b-gemma-sft
+ results: []
+ language:
+ - en
+ ---
+
+ <!-- This model card has been generated automatically according to the information the Trainer had access to. You
+ should probably proofread and complete it, then remove this comment. -->
+
+ # zephyr-7b-gemma-sft
+
+ This model is a fine-tuned version of [google/gemma-7b](https://huggingface.co/google/gemma-7b) on the HuggingFaceH4/deita-10k-v0-sft dataset.
+ It achieves the following results on the evaluation set:
+ - Loss: 0.9732
+
+ ## Model description
+
+ More information needed
+
+ ## Intended uses & limitations
+
+ More information needed
+
+ ## Training and evaluation data
+
+ More information needed
+
+ ## Training procedure
+
+ ### Training hyperparameters
+
+ The following hyperparameters were used during training:
+ - learning_rate: 2e-05
+ - train_batch_size: 4
+ - eval_batch_size: 4
+ - seed: 42
+ - distributed_type: multi-GPU
+ - num_devices: 16
+ - gradient_accumulation_steps: 2
+ - total_train_batch_size: 128
+ - total_eval_batch_size: 64
+ - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
+ - lr_scheduler_type: cosine
+ - lr_scheduler_warmup_ratio: 0.1
+ - num_epochs: 3
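The two `total_*` values in the list above follow from the per-device settings. A minimal sketch of that arithmetic, assuming the usual Trainer convention that gradient accumulation applies to training only (the variable names are illustrative and not taken from the training script):

```python
# Values copied from the hyperparameter list above.
train_batch_size = 4             # per-device training batch size
eval_batch_size = 4              # per-device evaluation batch size
num_devices = 16
gradient_accumulation_steps = 2

# Assumption: effective train batch = per-device batch x devices x accumulation steps.
total_train_batch_size = train_batch_size * num_devices * gradient_accumulation_steps

# Assumption: evaluation does not accumulate gradients, so only the device count multiplies.
total_eval_batch_size = eval_batch_size * num_devices

print(total_train_batch_size)  # 128
print(total_eval_batch_size)   # 64
```

Both results match the reported `total_train_batch_size: 128` and `total_eval_batch_size: 64`.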
+
+ ### Training results
+
+ | Training Loss | Epoch | Step | Validation Loss |
+ |:-------------:|:-----:|:----:|:---------------:|
+ | 0.9482 | 1.0 | 299 | 0.9848 |
+ | 0.8139 | 2.0 | 599 | 0.9610 |
+ | 0.722 | 2.99 | 897 | 0.9732 |
88
+
89
+ ### Framework versions
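The `cosine` scheduler with a warmup ratio of 0.1 can be sketched in plain Python. This is an illustration under stated assumptions, not the training code: it assumes linear warmup followed by a single half-cosine decay to zero, warmup steps rounded up from ratio times total steps, and 897 total steps (the last step in the results table). `lr_at` is a hypothetical helper name.

```python
import math

BASE_LR = 2e-05        # learning_rate from the hyperparameter list
WARMUP_RATIO = 0.1     # lr_scheduler_warmup_ratio
TOTAL_STEPS = 897      # final step in the training results table

# Assumption: warmup length is the ratio times total steps, rounded up.
WARMUP_STEPS = math.ceil(TOTAL_STEPS * WARMUP_RATIO)


def lr_at(step: int) -> float:
    """Learning rate at an optimizer step: linear warmup, then cosine decay to zero."""
    if step < WARMUP_STEPS:
        return BASE_LR * step / max(1, WARMUP_STEPS)
    progress = (step - WARMUP_STEPS) / max(1, TOTAL_STEPS - WARMUP_STEPS)
    return BASE_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


# Print the rate at the start, the end of warmup, and each evaluation step.
for step in (0, WARMUP_STEPS, 299, 599, TOTAL_STEPS):
    print(step, f"{lr_at(step):.2e}")
```

With these assumptions the rate peaks at 2e-05 after 90 warmup steps and decays toward zero by step 897, so the third epoch runs at a much lower learning rate than the first.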
90
+
91
+ - Transformers 4.39.0.dev0
92
+ - Pytorch 2.1.2+cu121
93
+ - Datasets 2.14.6
94
+ - Tokenizers 0.15.1
95
+