Felladrin committed
Commit 010d90c
1 Parent(s): 8c0ff27

Update model

Files changed (2)
  1. README.md +32 -21
  2. model.safetensors +1 -1
README.md CHANGED
@@ -5,6 +5,18 @@ license: apache-2.0
 datasets:
 - HuggingFaceH4/ultrachat_200k
 - Felladrin/ChatML-ultrachat_200k
+- Open-Orca/OpenOrca
+- Felladrin/ChatML-OpenOrca
+- hkust-nlp/deita-10k-v0
+- Felladrin/ChatML-deita-10k-v0
+- LDJnr/Capybara
+- Felladrin/ChatML-Capybara
+- databricks/databricks-dolly-15k
+- Felladrin/ChatML-databricks-dolly-15k
+- euclaise/reddit-instruct-curated
+- Felladrin/ChatML-reddit-instruct-curated
+- CohereForAI/aya_dataset
+- Felladrin/ChatML-aya_dataset
 base_model: Locutusque/TinyMistral-248M
 pipeline_tag: text-generation
 widget:
@@ -45,20 +57,22 @@ widget:
 inference:
   parameters:
     max_new_tokens: 250
-    penalty_alpha: 0.45
-    top_k: 4
-    repetition_penalty: 1.03
-    guidance_scale: 1.3
+    penalty_alpha: 0.5
+    top_k: 5
 ---

-# Locutusque's TinyMistral-248M trained on UltraChat dataset
+# Locutusque's TinyMistral-248M trained on chat datasets

 - Base model: [Locutusque/TinyMistral-248M](https://huggingface.co/Locutusque/TinyMistral-248M) with two additional special tokens (`<|im_start|>` and `<|im_end|>`)
-- Dataset: [[ChatML](https://huggingface.co/datasets/Felladrin/ChatML-ultrachat_200k)] [HuggingFaceH4/ultrachat_200k](https://huggingface.co/datasets/HuggingFaceH4/ultrachat_200k)
-- License: [Apache License 2.0](https://huggingface.co/Felladrin/TinyMistral-248M-Chat-v1/resolve/main/license.txt)
-- Availability in other ML formats:
-  - ONNX: [Felladrin/onnx-TinyMistral-248M-Chat-v1](https://huggingface.co/Felladrin/onnx-TinyMistral-248M-Chat-v1)
-  - GGUF: [Felladrin/gguf-TinyMistral-248M-Chat-v1](https://huggingface.co/Felladrin/gguf-TinyMistral-248M-Chat-v1)
+- Datasets:
+  - [[ChatML](https://huggingface.co/datasets/Felladrin/ChatML-ultrachat_200k)] [HuggingFaceH4/ultrachat_200k](https://huggingface.co/datasets/HuggingFaceH4/ultrachat_200k)
+  - [[ChatML](https://huggingface.co/datasets/Felladrin/ChatML-OpenOrca)] [Open-Orca/OpenOrca](https://huggingface.co/datasets/Open-Orca/OpenOrca)
+  - [[ChatML](https://huggingface.co/datasets/Felladrin/ChatML-deita-10k-v0)] [hkust-nlp/deita-10k-v0](https://huggingface.co/datasets/hkust-nlp/deita-10k-v0)
+  - [[ChatML](https://huggingface.co/datasets/Felladrin/ChatML-Capybara)] [LDJnr/Capybara](https://huggingface.co/datasets/LDJnr/Capybara)
+  - [[ChatML](https://huggingface.co/datasets/Felladrin/ChatML-databricks-dolly-15k)] [databricks/databricks-dolly-15k](https://huggingface.co/datasets/databricks/databricks-dolly-15k)
+  - [[ChatML](https://huggingface.co/datasets/Felladrin/ChatML-reddit-instruct-curated)] [euclaise/reddit-instruct-curated](https://huggingface.co/datasets/euclaise/reddit-instruct-curated)
+  - [[ChatML](https://huggingface.co/datasets/Felladrin/ChatML-aya_dataset)] [CohereForAI/aya_dataset](https://huggingface.co/datasets/CohereForAI/aya_dataset)
+- License: [Apache License 2.0](https://huggingface.co/Felladrin/TinyMistral-248M-Chat-v2/resolve/main/license.txt)

 ## Recommended Prompt Format

@@ -73,10 +87,8 @@ inference:
 ## Recommended Inference Parameters

 ```yml
-penalty_alpha: 0.45
-top_k: 4
-repetition_penalty: 1.03
-guidance_scale: 1.3
+penalty_alpha: 0.5
+top_k: 5
 ```

 ## Usage Example
@@ -84,7 +96,7 @@ guidance_scale: 1.3
 ```python
 from transformers import pipeline

-generate = pipeline("text-generation", "Felladrin/TinyMistral-248M-Chat-v1")
+generate = pipeline("text-generation", "Felladrin/TinyMistral-248M-Chat-v2")

 messages = [
     {
@@ -110,10 +122,8 @@ prompt = generate.tokenizer.apply_chat_template(messages, tokenize=False, add_ge
 output = generate(
     prompt,
     max_new_tokens=256,
-    penalty_alpha=0.45,
-    top_k=4,
-    repetition_penalty=1.03,
-    guidance_scale=1.3,
+    penalty_alpha=0.5,
+    top_k=5,
 )

 print(output[0]["generated_text"])
@@ -126,10 +136,11 @@ This model was trained with [SFTTrainer](https://huggingface.co/docs/trl/main/en
 | Hyperparameter | Value |
 | :--------------------- | :-------------------------------------------- |
 | Learning rate | 2e-5 |
-| Total train batch size | 16 |
+| Total train batch size | 32 |
 | Max. sequence length | 2048 |
-| Weight decay | 0 |
+| Weight decay | 0.01 |
 | Warmup ratio | 0.1 |
+| NEFTune Noise Alpha | 5 |
 | Optimizer | Adam with betas=(0.9,0.999) and epsilon=1e-08 |
 | Scheduler | cosine |
 | Seed | 42 |
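For reference, the updated hyperparameter table maps almost one-to-one onto an SFTTrainer setup. The sketch below is not the author's training script: the dataset choice, the "text" column name, the output directory, and the 8 x 4 per-device/accumulation split that yields the total batch size of 32 are assumptions, and it presumes a TRL release where `SFTTrainer` still accepts `dataset_text_field`, `max_seq_length`, and `neftune_noise_alpha` directly (newer releases move these into `trl.SFTConfig`).

```python
# Hedged sketch of an SFT run matching the hyperparameter table above.
# Dataset, column name, paths, and the batch-size split are assumptions.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import SFTTrainer

model_name = "Locutusque/TinyMistral-248M"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# One of the ChatML-formatted datasets listed in the card; rows are assumed
# to expose the rendered conversation under a "text" column.
dataset = load_dataset("Felladrin/ChatML-ultrachat_200k", split="train")

args = TrainingArguments(
    output_dir="TinyMistral-248M-Chat-v2",  # assumed output path
    learning_rate=2e-5,                     # Learning rate
    per_device_train_batch_size=8,          # 8 x 4 accumulation = 32 total (assumed split)
    gradient_accumulation_steps=4,
    weight_decay=0.01,                      # Weight decay
    warmup_ratio=0.1,                       # Warmup ratio
    lr_scheduler_type="cosine",             # Scheduler
    optim="adamw_torch",                    # Adam with betas=(0.9,0.999), eps=1e-8 (defaults)
    seed=42,                                # Seed
)

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    args=args,
    train_dataset=dataset,
    dataset_text_field="text",              # assumed column name
    max_seq_length=2048,                    # Max. sequence length
    neftune_noise_alpha=5,                  # NEFTune Noise Alpha
)

trainer.train()
```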
model.safetensors CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:6dcc360a5b94a63142091dba60d272644ea30b0af2a0f0b0192691b1d2f4417a
+oid sha256:52178bd78ce2e9eaff3fba98236b261d0c97c5423b6eb1dee8d6d3abe1a37850
 size 992108712
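The model.safetensors change only swaps the Git LFS pointer: the repository stores the blob's sha256 oid and byte size, while the weights themselves live in LFS storage, which is why the size stays at 992108712 bytes and only the oid differs. A minimal sketch for checking a downloaded copy against the new pointer (the local file path is an assumption):

```python
import hashlib

# Path to a locally downloaded copy of the weights (assumed location).
path = "model.safetensors"

sha256 = hashlib.sha256()
with open(path, "rb") as f:
    # Hash in 1 MiB chunks so the ~1 GB file is not read into memory at once.
    for chunk in iter(lambda: f.read(1024 * 1024), b""):
        sha256.update(chunk)

# Should print the oid from the updated LFS pointer:
# 52178bd78ce2e9eaff3fba98236b261d0c97c5423b6eb1dee8d6d3abe1a37850
print(sha256.hexdigest())
```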