altomek committed on
Commit
b3152c1
·
verified ·
1 Parent(s): 9d5f392

quants upload

Bielik-11B-v2.2-Instruct 8bpw quant

README.md ADDED
@@ -0,0 +1,410 @@
1
+ ---
2
+ license: apache-2.0
3
+ base_model: speakleash/Bielik-11B-v2
4
+ language:
5
+ - pl
6
+ library_name: transformers
7
+ tags:
8
+ - finetuned
9
+ inference:
10
+ parameters:
11
+ temperature: 0.2
12
+ widget:
13
+ - messages:
14
+ - role: user
15
+ content: Co przedstawia polskie godło?
16
+ extra_gated_description: If you want to learn more about how you can use the model, please refer to our <a href="https://bielik.ai/terms/">Terms of Use</a>.
17
+ ---
18
+
19
+ <p align="center">
20
+ <img src="https://huggingface.co/speakleash/Bielik-11B-v2.2-Instruct/raw/main/speakleash_cyfronet.png">
21
+ </p>
22
+
23
+ # Bielik-11B-v2.2-Instruct
24
+
25
+ Bielik-11B-v2.2-Instruct is a generative text model featuring 11 billion parameters.
26
+ It is an instruct fine-tuned version of the [Bielik-11B-v2](https://huggingface.co/speakleash/Bielik-11B-v2).
27
+ The aforementioned model stands as a testament to the unique collaboration between the open-science/open-source project SpeakLeash and the High Performance Computing (HPC) center: ACK Cyfronet AGH.
28
+ Developed and trained on Polish text corpora, which have been cherry-picked and processed by the SpeakLeash team, this endeavor leverages Polish large-scale computing infrastructure,
29
+ specifically within the PLGrid environment, and more precisely, the HPC center: ACK Cyfronet AGH.
30
+ The creation and training of Bielik-11B-v2.2-Instruct was propelled by the support of computational grant number PLG/2024/016951, conducted on the Athena and Helios supercomputers,
31
+ enabling the use of cutting-edge technology and computational resources essential for large-scale machine learning processes.
32
+ As a result, the model exhibits an exceptional ability to understand and process the Polish language, providing accurate responses and performing a variety of linguistic tasks with high precision.
33
+
34
+ 🎥 Demo: https://chat.bielik.ai
35
+
36
+ 🗣️ Chat Arena<span style="color:red;">*</span>: https://arena.speakleash.org.pl/
37
+
38
+ <span style="color:red;">*</span>Chat Arena is a platform for testing and comparing different AI language models, allowing users to evaluate their performance and quality.
39
+
40
+ ## Model
41
+
42
+ The [SpeakLeash](https://speakleash.org/) team is working on their own set of instructions in Polish, which is continuously being expanded and refined by annotators. A portion of these instructions, which had been manually verified and corrected, has been utilized for training purposes. Moreover, due to the limited availability of high-quality instructions in Polish, synthetic instructions were generated with [Mixtral 8x22B](https://huggingface.co/mistralai/Mixtral-8x22B-v0.1) and used in training. The dataset used for training comprised over 20 million instructions, consisting of more than 10 billion tokens. The instructions varied in quality, leading to a deterioration in the model’s performance. To counteract this while still allowing ourselves to utilize the aforementioned datasets, several improvements were introduced:
43
+ * Weighted token-level loss - a strategy inspired by [offline reinforcement learning](https://arxiv.org/abs/2005.01643) and [C-RLFT](https://arxiv.org/abs/2309.11235)
44
+ * Adaptive learning rate inspired by the study on [Learning Rates as a Function of Batch Size](https://arxiv.org/abs/2006.09092)
45
+ * Masked prompt tokens
46
+
47
+ To align the model with user preferences we tested many different techniques: DPO, PPO, KTO, SimPO. Finally, the [DPO-Positive](https://arxiv.org/abs/2402.13228) method was employed, utilizing both generated and manually corrected examples, which were scored by a metamodel. The dataset comprised over 66,000 examples of varying lengths to address different aspects of response style. It was filtered and evaluated by the reward model to select instructions with the right level of difference between chosen and rejected responses. The novelty introduced to DPO-P was the inclusion of multi-turn conversations.
48
+
49
+ Bielik-11B-v2.2-Instruct has been trained with the use of an original open source framework called [ALLaMo](https://github.com/chrisociepa/allamo) implemented by [Krzysztof Ociepa](https://www.linkedin.com/in/krzysztof-ociepa-44886550/). This framework allows users to train language models with architecture similar to LLaMA and Mistral in a fast and efficient way.
50
+
51
+
52
+ ### Model description:
53
+
54
+ * **Developed by:** [SpeakLeash](https://speakleash.org/) & [ACK Cyfronet AGH](https://www.cyfronet.pl/)
55
+ * **Language:** Polish
56
+ * **Model type:** causal decoder-only
57
+ * **Finetuned from:** [Bielik-11B-v2](https://huggingface.co/speakleash/Bielik-11B-v2)
58
+ * **License:** Apache 2.0 and [Terms of Use](https://bielik.ai/terms/)
59
+ * **Model ref:** speakleash:0deb975c3780df3a3ae98b619185faa1
60
+
61
+
62
+ ### Quantized models:
63
+ We know that some people want to explore smaller models or don't have the resources to run a full model. Therefore, we have prepared quantized versions of the Bielik-11B-v2.2-Instruct model in separate repositories:
64
+ - [GGUF - Q4_K_M, Q5_K_M, Q6_K, Q8_0](https://huggingface.co/speakleash/Bielik-11B-v2.2-Instruct-GGUF)
65
+ - [GPTQ - 4bit](https://huggingface.co/speakleash/Bielik-11B-v2.2-Instruct-GPTQ)
66
+ - HQQ - [4bit](https://huggingface.co/speakleash/Bielik-11B-v2.2-Instruct-HQQ-4bit-128gs), [8bit](https://huggingface.co/speakleash/Bielik-11B-v2.2-Instruct-HQQ-8bit-128gs)
67
+ - [AWQ - 4bit GEMM](https://huggingface.co/speakleash/Bielik-11B-v2.2-Instruct-AWQ)
68
+ - EXL2 - [4.5bit](https://huggingface.co/speakleash/Bielik-11B-v2.2-Instruct-EXL2-4.5bit), [6.5bit](https://huggingface.co/speakleash/Bielik-11B-v2.2-Instruct-EXL2-6.5bit)
69
+ - MLX - [4bit](https://huggingface.co/speakleash/Bielik-11B-v2.2-Instruct-MLX-4bit), [8bit](https://huggingface.co/speakleash/Bielik-11B-v2.2-Instruct-MLX-8bit)
70
+ - Quanto - [4bit](https://huggingface.co/speakleash/Bielik-11B-v2.2-Instruct-Quanto-4bit), [8bit](https://huggingface.co/speakleash/Bielik-11B-v2.2-Instruct-Quanto-8bit)
71
+ - [FP8](https://huggingface.co/speakleash/Bielik-11B-v2.2-Instruct-FP8) (vLLM, SGLang - Ada Lovelace, Hopper optimized)
72
+ - [INT8 W8A8](https://huggingface.co/speakleash/Bielik-11B-v2.2-Instruct-W8A8) (vLLM INT8 quantization Weights=8bits and Activations=8bits)
73
+ - [GGUF - experimental - IQ imatrix IQ2_XXS, IQ3_XXS, IQ4_XS and calibrated Q4_K_M, Q5_K_M, Q6_K, Q8_0](https://huggingface.co/speakleash/Bielik-11B-v2.2-Instruct-GGUF-IQ-Imatrix)
74
+
75
+ Please note that quantized models may offer lower quality of generated answers compared to full-sized variants.
76
+
77
+
78
+ ### Chat template
79
+
80
+ Bielik-11B-v2.2-Instruct uses [ChatML](https://github.com/cognitivecomputations/OpenChatML) as the prompt format.
81
+
82
+ E.g.
83
+ ```
84
+ prompt = "<s><|im_start|> user\nJakie mamy pory roku?<|im_end|> \n<|im_start|> assistant\n"
85
+ completion = "W Polsce mamy 4 pory roku: wiosna, lato, jesień i zima.<|im_end|> \n"
86
+ ```
87
+
88
+ This format is available as a [chat template](https://huggingface.co/docs/transformers/main/chat_templating) via the `apply_chat_template()` method:
89
+
90
+ ```python
91
+ import torch
92
+ from transformers import AutoModelForCausalLM, AutoTokenizer
93
+
94
+ device = "cuda" # the device to load the model onto
95
+
96
+ model_name = "speakleash/Bielik-11B-v2.2-Instruct"
97
+
98
+ tokenizer = AutoTokenizer.from_pretrained(model_name)
99
+ model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16)
100
+
101
+ messages = [
102
+ {"role": "system", "content": "Odpowiadaj krótko, precyzyjnie i wyłącznie w języku polskim."},
103
+ {"role": "user", "content": "Jakie mamy pory roku w Polsce?"},
104
+ {"role": "assistant", "content": "W Polsce mamy 4 pory roku: wiosna, lato, jesień i zima."},
105
+ {"role": "user", "content": "Która jest najcieplejsza?"}
106
+ ]
107
+
108
+ input_ids = tokenizer.apply_chat_template(messages, return_tensors="pt")
109
+
110
+ model_inputs = input_ids.to(device)
111
+ model.to(device)
112
+
113
+ generated_ids = model.generate(model_inputs, max_new_tokens=1000, do_sample=True)
114
+ decoded = tokenizer.batch_decode(generated_ids)
115
+ print(decoded[0])
116
+ ```
117
+
118
+ Fully formatted input conversation produced by `apply_chat_template` from the previous example:
119
+
120
+ ```
121
+ <s><|im_start|> system
122
+ Odpowiadaj krótko, precyzyjnie i wyłącznie w języku polskim.<|im_end|>
123
+ <|im_start|> user
124
+ Jakie mamy pory roku w Polsce?<|im_end|>
125
+ <|im_start|> assistant
126
+ W Polsce mamy 4 pory roku: wiosna, lato, jesień i zima.<|im_end|>
127
+ <|im_start|> user
128
+ Która jest najcieplejsza?<|im_end|>
129
+ ```
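To make the layout above easier to follow, here is a minimal sketch that assembles a ChatML-style prompt by hand. The helper name `build_chatml_prompt` and its parameters are hypothetical and not part of the model card. The sketch follows the plain ChatML layout (no space after the special tokens); the examples printed in this card show extra spaces, which may be a decoding artifact, so for real inference `tokenizer.apply_chat_template()` should remain the source of truth.

```python
def build_chatml_prompt(messages, bos="<s>", add_generation_prompt=True):
    """Assemble a ChatML-style prompt string (illustrative helper, not the tokenizer's template)."""
    parts = [bos]
    for message in messages:
        # Each turn is wrapped in <|im_start|>ROLE ... <|im_end|> markers.
        parts.append(f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n")
    if add_generation_prompt:
        # Open an assistant turn so the model continues with its answer.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


messages = [{"role": "user", "content": "Jakie mamy pory roku?"}]
prompt = build_chatml_prompt(messages)
print(prompt)
```

Building the string manually is mainly useful for inspecting or debugging prompts; it does not guarantee token-for-token agreement with the template shipped with the tokenizer.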
130
+
131
+
132
+ ## Evaluation
133
+
134
+ Bielik-11B-v2.2-Instruct has been evaluated on several benchmarks to assess its performance across various tasks and languages. These benchmarks include:
135
+
136
+ 1. Open PL LLM Leaderboard
137
+ 2. Open LLM Leaderboard
138
+ 3. Polish MT-Bench
139
+ 4. Polish EQ-Bench (Emotional Intelligence Benchmark)
140
+ 5. MixEval
141
+
142
+ The following sections provide detailed results for each of these benchmarks, demonstrating the model's capabilities in both Polish and English language tasks.
143
+
144
+ ### Open PL LLM Leaderboard
145
+
146
+ Models have been evaluated on the [Open PL LLM Leaderboard](https://huggingface.co/spaces/speakleash/open_pl_llm_leaderboard) in a 5-shot setting. The benchmark evaluates models on NLP tasks such as sentiment analysis, categorization and text classification, but does not test chatting skills. The Average column is the average score across all tasks, normalized by baseline scores.
147
+
148
+
149
+ | Model | Parameters (B)| Average |
150
+ |---------------------------------|------------|---------|
151
+ | Meta-Llama-3.1-405B-Instruct-FP8,API | 405 | 69.44 |
152
+ | Mistral-Large-Instruct-2407 | 123 | 69.11 |
153
+ | Qwen2-72B-Instruct | 72 | 65.87 |
154
+ | **Bielik-11B-v2.2-Instruct** | **11** | **65.57** |
155
+ | Meta-Llama-3.1-70B-Instruct | 70 | 65.49 |
156
+ | Bielik-11B-v2.1-Instruct | 11 | 65.45 |
157
+ | Mixtral-8x22B-Instruct-v0.1 | 141 | 65.23 |
158
+ | Bielik-11B-v2.0-Instruct | 11 | 64.98 |
159
+ | Meta-Llama-3-70B-Instruct | 70 | 64.45 |
160
+ | Athene-70B | 70 | 63.65 |
161
+ | WizardLM-2-8x22B | 141 | 62.35 |
162
+ | Qwen1.5-72B-Chat | 72 | 58.67 |
163
+ | Qwen2-57B-A14B-Instruct | 57 | 56.89 |
164
+ | glm-4-9b-chat | 9 | 56.61 |
165
+ | aya-23-35B | 35 | 56.37 |
166
+ | Phi-3.5-MoE-instruct | 41.9 | 56.34 |
167
+ | openchat-3.5-0106-gemma | 7 | 55.69 |
168
+ | Mistral-Nemo-Instruct-2407 | 12 | 55.27 |
169
+ | SOLAR-10.7B-Instruct-v1.0 | 10.7 | 55.24 |
170
+ | Mixtral-8x7B-Instruct-v0.1 | 46.7 | 55.07 |
171
+ | Bielik-7B-Instruct-v0.1 | 7 | 44.70 |
172
+ | trurl-2-13b-academic | 13 | 36.28 |
173
+ | trurl-2-7b | 7 | 26.93 |
174
+
175
+ The results from the Open PL LLM Leaderboard demonstrate the exceptional performance of Bielik-11B-v2.2-Instruct:
176
+
177
+ 1. Superior performance in its class: Bielik-11B-v2.2-Instruct outperforms all other models with less than 70B parameters. This is a significant achievement, showcasing its efficiency and effectiveness despite having fewer parameters than many competitors.
178
+
179
+ 2. Competitive with larger models: with a score of 65.57, Bielik-11B-v2.2-Instruct performs on par with models in the 70B parameter range. This indicates that it achieves comparable results to much larger models, demonstrating its advanced architecture and training methodology.
180
+
181
+ 3. Substantial improvement over previous version: the model shows a marked improvement over its predecessor, Bielik-7B-Instruct-v0.1, which scored 44.70. This leap in performance highlights the successful enhancements and optimizations implemented in this newer version.
182
+
183
+ 4. Leading position for Polish language models: in the context of Polish language models, Bielik-11B-v2.2-Instruct stands out as a leader. There are no other competitive models specifically tailored for the Polish language that match its performance, making it a crucial resource for Polish NLP tasks.
184
+
185
+ These results underscore Bielik-11B-v2.2-Instruct's position as a state-of-the-art model for Polish language processing, offering high performance with relatively modest computational requirements.
186
+
187
+ #### Open PL LLM Leaderboard - Generative Tasks Performance
188
+
189
+ This section presents a focused comparison of generative Polish language task performance between Bielik models and GPT-3.5. The evaluation is limited to generative tasks due to the constraints of assessing OpenAI models. The comprehensive nature and associated costs of the benchmark explain the limited number of models evaluated.
190
+
191
+ | Model | Parameters (B) | Average g |
192
+ |-------------------------------|----------------|---------------|
193
+ | Bielik-11B-v2.1-Instruct | 11 | 66.58 |
194
+ | **Bielik-11B-v2.2-Instruct** | 11 | **66.11** |
195
+ | Bielik-11B-v2.0-Instruct | 11 | 65.58 |
196
+ | gpt-3.5-turbo-instruct | Unknown | 55.65 |
197
+
198
+ The performance variation among Bielik versions is minimal, indicating consistent quality across iterations. Bielik-11B-v2.2-Instruct demonstrates an impressive 18.8% performance advantage over GPT-3.5.
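The 18.8% figure can be reproduced from the table as a relative difference between the two average scores. The snippet below is a small arithmetic check; the variable names are illustrative.

```python
# Average scores on generative tasks, taken from the table above.
bielik_v22 = 66.11
gpt35_turbo_instruct = 55.65

# Relative advantage of Bielik-11B-v2.2-Instruct over GPT-3.5, in percent.
relative_advantage = (bielik_v22 - gpt35_turbo_instruct) / gpt35_turbo_instruct * 100
print(f"{relative_advantage:.1f}%")
```

Note that this is a relative gain; the absolute gap between the two scores is 10.46 points.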
199
+
200
+
201
+ ### Open LLM Leaderboard
202
+
203
+ The Open LLM Leaderboard evaluates models on various English language tasks, providing insights into the model's performance across different linguistic challenges.
204
+
205
+ | Model | AVG | arc_challenge | hellaswag | truthfulqa_mc2 | mmlu | winogrande | gsm8k |
206
+ |--------------------------|-------|---------------|-----------|----------------|-------|------------|-------|
207
+ | **Bielik-11B-v2.2-Instruct** | **69.86** | 59.90 | 80.16 | 58.34 | 64.34 | 75.30 | 81.12 |
208
+ | Bielik-11B-v2.1-Instruct | 69.82 | 59.56 | 80.20 | 59.35 | 64.18 | 75.06 | 80.59 |
209
+ | Bielik-11B-v2.0-Instruct | 68.04 | 58.62 | 78.65 | 54.65 | 63.71 | 76.32 | 76.27 |
210
+ | Bielik-11B-v2 | 65.87 | 60.58 | 79.84 | 46.13 | 63.06 | 77.82 | 67.78 |
211
+ | Mistral-7B-Instruct-v0.2 | 65.71 | 63.14 | 84.88 | 68.26 | 60.78 | 77.19 | 40.03 |
212
+ | Bielik-7B-Instruct-v0.1 | 51.26 | 47.53 | 68.91 | 49.47 | 46.18 | 65.51 | 29.95 |
213
+
214
+
215
+
216
+ Bielik-11B-v2.2-Instruct shows impressive performance on English language tasks:
217
+
218
+ 1. Significant improvement over its base model (4-point increase).
219
+ 2. Substantial 18-point improvement over Bielik-7B-Instruct-v0.1.
220
+
221
+ These results demonstrate Bielik-11B-v2.2-Instruct's versatility in both Polish and English, highlighting the effectiveness of its instruction tuning process.
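As a quick sanity check, the AVG column appears to be the plain mean of the six task scores, and the quoted improvements follow from the AVG values in the table. The sketch below recomputes them; treating AVG as an unweighted mean is an assumption inferred from the numbers, not something the card states.

```python
# Per-task scores for Bielik-11B-v2.2-Instruct, copied from the table above.
scores = {
    "arc_challenge": 59.90,
    "hellaswag": 80.16,
    "truthfulqa_mc2": 58.34,
    "mmlu": 64.34,
    "winogrande": 75.30,
    "gsm8k": 81.12,
}

# Unweighted mean over the six tasks (assumed definition of AVG).
avg = sum(scores.values()) / len(scores)

# AVG values of the comparison models, also from the table.
avg_base_model = 65.87      # Bielik-11B-v2
avg_bielik_7b = 51.26       # Bielik-7B-Instruct-v0.1

gain_over_base = avg - avg_base_model
gain_over_7b = avg - avg_bielik_7b
print(f"AVG={avg:.2f}, vs base: +{gain_over_base:.2f}, vs 7B: +{gain_over_7b:.2f}")
```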
222
+
223
+ ### Polish MT-Bench
224
+ The Bielik-11B-v2.2-Instruct (16 bit) model was also evaluated using the MT-Bench benchmark. The quality of the model was evaluated using the English version (the original version without modifications) and the Polish version created by SpeakLeash (tasks and evaluation in Polish; the content of the tasks was also changed to take into account the context of the Polish language).
225
+
226
+ #### MT-Bench English
227
+ | Model | Score |
228
+ |-----------------|----------|
229
+ | Bielik-11B-v2.1 | 8.537500 |
230
+ | **Bielik-11B-v2.2** | **8.390625** |
231
+ | Bielik-11B-v2.0 | 8.159375 |
232
+
233
+ #### MT-Bench Polish
234
+ | Model | Parameters (B) | Score |
235
+ |-------------------------------------|----------------|----------|
236
+ | Qwen2-72B-Instruct | 72 | 8.775000 |
237
+ | Mistral-Large-Instruct-2407 (123B) | 123 | 8.662500 |
238
+ | gemma-2-27b-it | 27 | 8.618750 |
239
+ | Mixtral-8x22b | 141 | 8.231250 |
240
+ | Meta-Llama-3.1-405B-Instruct | 405 | 8.168750 |
241
+ | Meta-Llama-3.1-70B-Instruct | 70 | 8.150000 |
242
+ | **Bielik-11B-v2.2-Instruct** | **11** | **8.115625** |
243
+ | Bielik-11B-v2.1-Instruct | 11 | 7.996875 |
244
+ | gpt-3.5-turbo | Unknown | 7.868750 |
245
+ | Mixtral-8x7b | 46.7 | 7.637500 |
246
+ | Bielik-11B-v2.0-Instruct | 11 | 7.562500 |
247
+ | Mistral-Nemo-Instruct-2407 | 12 | 7.368750 |
248
+ | openchat-3.5-0106-gemma | 7 | 6.812500 |
249
+ | Mistral-7B-Instruct-v0.2 | 7 | 6.556250 |
250
+ | Meta-Llama-3.1-8B-Instruct | 8 | 6.556250 |
251
+ | Bielik-7B-Instruct-v0.1 | 7 | 6.081250 |
252
+ | Mistral-7B-Instruct-v0.3 | 7 | 5.818750 |
253
+ | Polka-Mistral-7B-SFT | 7 | 4.518750 |
254
+ | trurl-2-7b | 7 | 2.762500 |
255
+
256
+ Key observations on Bielik-11B-v2.2 performance:
257
+
258
+ 1. Strong performance among mid-sized models: Bielik-11B-v2.2-Instruct scored **8.115625**, placing it ahead of several well-known models like GPT-3.5-turbo (7.868750) and Mixtral-8x7b (7.637500). This indicates that Bielik-11B-v2.2-Instruct is competitive among mid-sized models, particularly those in the 11B-70B parameter range.
259
+
260
+ 2. Competitive against larger models: Bielik-11B-v2.2-Instruct performs close to Meta-Llama-3.1-70B-Instruct (8.150000), Meta-Llama-3.1-405B-Instruct (8.168750) and even Mixtral-8x22b (8.231250), which have significantly more parameters. This efficiency in performance relative to size could make it an attractive option for tasks where resource constraints are a consideration. Bielik generated 100% of its answers in Polish, while other models (not typically trained for Polish) sometimes answer Polish questions in English.
261
+
262
+ 3. Significant improvement over previous versions: compared to its predecessor, **Bielik-7B-Instruct-v0.1**, which scored **6.081250**, the Bielik-11B-v2.2-Instruct shows a significant improvement. The score increased by more than **2 points**, highlighting substantial advancements in model quality, optimization and training methodology.
263
+
264
+ For more information - answers to test tasks and values in each category, visit the [MT-Bench PL](https://huggingface.co/spaces/speakleash/mt-bench-pl) website.
265
+
266
+ ### Polish EQ-Bench
267
+
268
+ [Polish Emotional Intelligence Benchmark for LLMs](https://huggingface.co/spaces/speakleash/polish_eq-bench)
269
+
270
+ | Model | Parameters (B) | Score |
271
+ |-------------------------------|--------|-------|
272
+ | Mistral-Large-Instruct-2407 | 123 | 78.07 |
273
+ | Meta-Llama-3.1-405B-Instruct-FP8 | 405 | 77.23 |
274
+ | gpt-4o-2024-08-06 | ? | 75.15 |
275
+ | gpt-4-turbo-2024-04-09 | ? | 74.59 |
276
+ | Meta-Llama-3.1-70B-Instruct | 70 | 72.53 |
277
+ | Qwen2-72B-Instruct | 72 | 71.23 |
278
+ | Meta-Llama-3-70B-Instruct | 70 | 71.21 |
279
+ | gpt-4o-mini-2024-07-18 | ? | 71.15 |
280
+ | WizardLM-2-8x22B | 141 | 69.56 |
281
+ | **Bielik-11B-v2.2-Instruct** | **11** | **69.05** |
282
+ | Bielik-11B-v2.0-Instruct | 11 | 68.24 |
283
+ | Qwen1.5-72B-Chat | 72 | 68.03 |
284
+ | Mixtral-8x22B-Instruct-v0.1 | 141 | 67.63 |
285
+ | Bielik-11B-v2.1-Instruct | 11 | 60.07 |
286
+ | Qwen1.5-32B-Chat | 32 | 59.63 |
287
+ | openchat-3.5-0106-gemma | 7 | 59.58 |
288
+ | aya-23-35B | 35 | 58.41 |
289
+ | gpt-3.5-turbo | ? | 57.7 |
290
+ | Qwen2-57B-A14B-Instruct | 57 | 57.64 |
291
+ | Mixtral-8x7B-Instruct-v0.1 | 47 | 57.61 |
292
+ | SOLAR-10.7B-Instruct-v1.0 | 10.7 | 55.21 |
293
+ | Mistral-7B-Instruct-v0.2 | 7 | 47.02 |
294
+
295
+
296
+ The results show that Bielik-11B-v2.2-Instruct is the best performing model among those with less than 70B parameters. With a score of 69.05, it outperforms larger models like Qwen1.5-72B-Chat and Mixtral-8x22B-Instruct-v0.1, demonstrating its exceptional efficiency and effectiveness despite its smaller parameter count.
297
+
298
+
299
+ ### MixEval
300
+
301
+ MixEval is a ground-truth-based English benchmark designed to evaluate Large Language Models (LLMs) efficiently and effectively. Key features of MixEval include:
302
+
303
+ 1. Derived from off-the-shelf benchmark mixtures
304
+ 2. Highly capable model ranking with a 0.96 correlation to Chatbot Arena
305
+ 3. Local and quick execution, requiring only 6% of the time and cost compared to running MMLU
306
+
307
+ This benchmark provides a robust and time-efficient method for assessing LLM performance, making it a valuable tool for ongoing model evaluation and comparison.
308
+
309
+ | Model | MixEval | MixEval-Hard |
310
+ |-------------------------------|---------|--------------|
311
+ | Bielik-11B-v2.1-Instruct | 74.55 | 45.00 |
312
+ | **Bielik-11B-v2.2-Instruct** | **72.35** | **39.65** |
313
+ | Bielik-11B-v2.0-Instruct | 72.10 | 40.20 |
314
+ | Mistral-7B-Instruct-v0.2 | 70.00 | 36.20 |
315
+
316
+ The results show that Bielik-11B-v2.2-Instruct performs well on the MixEval benchmark, achieving a score of 72.35 on the standard MixEval and 39.65 on MixEval-Hard. Notably, Bielik-11B-v2.2-Instruct significantly outperforms Mistral-7B-Instruct-v0.2 on both metrics, demonstrating its improved capabilities despite being based on a similar architecture.
317
+
318
+
319
+ ### Chat Arena PL
320
+
321
+ Chat Arena PL is a human-evaluated benchmark that provides a direct comparison of model performance through head-to-head battles. Unlike the automated benchmarks mentioned above, this evaluation relies on human judgment to assess the quality and effectiveness of model responses. The results offer valuable insights into how different models perform in real-world, conversational scenarios as perceived by human evaluators.
322
+
323
+ Results accessed on 2024-08-26.
324
+
325
+ | # | Model | Battles | Won | Lost | Draws | Win % | ELO |
326
+ |---|-------|-------|---------|-----------|--------|-------------|-----|
327
+ | 1 | **Bielik-11B-v2.2-Instruct** | 92 | 72 | 14 | 6 | **83.72%** | 1234 |
328
+ | 2 | Bielik-11B-v2.1-Instruct | 240 | 171 | 50 | 19 | 77.38% | 1174 |
329
+ | 3 | gpt-4o-mini | 639 | 402 | 117 | 120 | 77.46% | 1141 |
330
+ | 4 | Mistral Large 2 (2024-07) | 324 | 188 | 69 | 67 | 73.15% | 1125 |
331
+ | 5 | Llama-3.1-405B | 548 | 297 | 144 | 107 | 67.35% | 1090 |
332
+ | 6 | Bielik-11B-v2.0-Instruct | 1289 | 695 | 352 | 242 | 66.38% | 1059 |
333
+ | 7 | Llama-3.1-70B | 498 | 221 | 187 | 90 | 54.17% | 1033 |
334
+ | 8 | Bielik-1-7B | 2041 | 1029 | 638 | 374 | 61.73% | 1020 |
335
+ | 9 | Mixtral-8x22B-v0.1 | 432 | 166 | 167 | 99 | 49.85% | 1018 |
336
+ | 10 | Qwen2-72B | 451 | 179 | 177 | 95 | 50.28% | 1011 |
337
+ | 11 | gpt-3.5-turbo | 2186 | 1007 | 731 | 448 | 57.94% | 1008 |
338
+ | 12 | Llama-3.1-8B | 440 | 155 | 227 | 58 | 40.58% | 975 |
339
+ | 13 | Mixtral-8x7B-v0.1 | 1997 | 794 | 804 | 399 | 49.69% | 973 |
340
+ | 14 | Llama-3-70b | 2008 | 733 | 909 | 366 | 44.64% | 956 |
341
+ | 15 | Mistral Nemo (2024-07) | 301 | 84 | 164 | 53 | 33.87% | 954 |
342
+ | 16 | Llama-3-8b | 1911 | 473 | 1091 | 347 | 30.24% | 909 |
343
+ | 17 | gemma-7b-it | 1928 | 418 | 1221 | 289 | 25.5% | 888 |
344
+
345
+ The results show that Bielik-11B-v2.2-Instruct outperforms all other models in this benchmark, achieving the highest win percentage (83.72%) and ELO score (1234). This impressive performance demonstrates its effectiveness in real-world conversational scenarios, as judged by human evaluators.
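The Win % column in the table appears to exclude draws, i.e. it matches Won / (Won + Lost) rather than Won / Battles. The following sketch checks this against the first three rows; the formula is inferred from the published numbers and is an assumption, not a documented definition of the arena's metric.

```python
def win_pct(won: int, lost: int) -> float:
    """Win percentage with draws excluded (assumed definition)."""
    return 100 * won / (won + lost)


# (battles, won, lost, draws) for the top three rows of the table above.
rows = {
    "Bielik-11B-v2.2-Instruct": (92, 72, 14, 6),
    "Bielik-11B-v2.1-Instruct": (240, 171, 50, 19),
    "gpt-4o-mini": (639, 402, 117, 120),
}

for name, (battles, won, lost, draws) in rows.items():
    # Battles should equal the sum of wins, losses and draws.
    assert battles == won + lost + draws
    print(f"{name}: {win_pct(won, lost):.2f}%")
```

Because draws are left out, a model with few battles (92 for Bielik-11B-v2.2-Instruct) can show a high win percentage with relatively little evidence, which is worth keeping in mind when comparing rows.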
346
+
347
+ ## Limitations and Biases
348
+
349
+ Bielik-11B-v2.2-Instruct is a quick demonstration that the base model can be easily fine-tuned to achieve compelling and promising performance. It does not have any moderation mechanisms. We're looking forward to engaging with the community in ways to make the model respect guardrails, allowing for deployment in environments requiring moderated outputs.
350
+
351
+ Bielik-11B-v2.2-Instruct can produce factually incorrect output, and should not be relied on to produce factually accurate data. Bielik-11B-v2.2-Instruct was trained on various public datasets. While great efforts have been taken to clean the training data, it is possible that this model can generate lewd, false, biased or otherwise offensive outputs.
352
+
353
+ ## Citation
354
+ Please cite this model using the following format:
355
+
356
+ ```
357
+ @misc{Bielik11Bv2i,
358
+ title = {Bielik-11B-v2.2-Instruct model card},
359
+ author = {Ociepa, Krzysztof and Flis, Łukasz and Kinas, Remigiusz and Gwoździej, Adrian and Wróbel, Krzysztof and {SpeakLeash Team} and {Cyfronet Team}},
360
+ year = {2024},
361
+ url = {https://huggingface.co/speakleash/Bielik-11B-v2.2-Instruct},
362
+ note = {Accessed: 2024-08-28}, % change this date
363
+ urldate = {2024-08-28} % change this date
364
+ }
365
+ @unpublished{Bielik11Bv2a,
366
+ author = {Ociepa, Krzysztof and Flis, Łukasz and Kinas, Remigiusz and Gwoździej, Adrian and Wróbel, Krzysztof},
367
+ title = {Bielik: A Family of Large Language Models for the Polish Language - Development, Insights, and Evaluation},
368
+ year = {2024},
369
+ }
370
+ ```
371
+
372
+ ## Responsible for training the model
373
+
374
+ * [Krzysztof Ociepa](https://www.linkedin.com/in/krzysztof-ociepa-44886550/)<sup>SpeakLeash</sup> - team leadership, conceptualizing, data preparation, process optimization and oversight of training
375
+ * [Łukasz Flis](https://www.linkedin.com/in/lukasz-flis-0a39631/)<sup>Cyfronet AGH</sup> - coordinating and supervising the training
376
+ * [Remigiusz Kinas](https://www.linkedin.com/in/remigiusz-kinas/)<sup>SpeakLeash</sup> - conceptualizing and coordinating DPO training, data preparation
377
+ * [Adrian Gwoździej](https://www.linkedin.com/in/adrgwo/)<sup>SpeakLeash</sup> - data preparation and ensuring data quality
378
+ * [Krzysztof Wróbel](https://www.linkedin.com/in/wrobelkrzysztof/)<sup>SpeakLeash</sup> - benchmarks
379
+
380
+
381
+ The model could not have been created without the commitment and work of the entire SpeakLeash team, whose contribution is invaluable. Thanks to the hard work of many individuals, it was possible to gather a large amount of content in Polish and establish collaboration between the open-science SpeakLeash project and the HPC center: ACK Cyfronet AGH. Individuals who contributed to the creation of the model:
382
+ [Sebastian Kondracki](https://www.linkedin.com/in/sebastian-kondracki/),
383
+ [Igor Ciuciura](https://www.linkedin.com/in/igor-ciuciura-1763b52a6/),
384
+ [Paweł Kiszczak](https://www.linkedin.com/in/paveu-kiszczak/),
385
+ [Szymon Baczyński](https://www.linkedin.com/in/szymon-baczynski/),
386
+ [Jacek Chwiła](https://www.linkedin.com/in/jacek-chwila/),
387
+ [Maria Filipkowska](https://www.linkedin.com/in/maria-filipkowska/),
388
+ [Jan Maria Kowalski](https://www.linkedin.com/in/janmariakowalski/),
389
+ [Karol Jezierski](https://www.linkedin.com/in/karol-jezierski/),
390
+ [Kacper Milan](https://www.linkedin.com/in/kacper-milan/),
391
+ [Jan Sowa](https://www.linkedin.com/in/janpiotrsowa/),
392
+ [Len Krawczyk](https://www.linkedin.com/in/magdalena-krawczyk-7810942ab/),
393
+ [Marta Seidler](https://www.linkedin.com/in/marta-seidler-751102259/),
394
+ [Agnieszka Ratajska](https://www.linkedin.com/in/agnieszka-ratajska/),
395
+ [Krzysztof Koziarek](https://www.linkedin.com/in/krzysztofkoziarek/),
396
+ [Szymon Pepliński](http://linkedin.com/in/szymonpeplinski/),
397
+ [Zuzanna Dabić](https://www.linkedin.com/in/zuzanna-dabic/),
398
+ [Filip Bogacz](https://linkedin.com/in/Fibogacci),
399
+ [Agnieszka Kosiak](https://www.linkedin.com/in/agn-kosiak),
400
+ [Izabela Babis](https://www.linkedin.com/in/izabela-babis-2274b8105/),
401
+ [Nina Babis](https://www.linkedin.com/in/nina-babis-00055a140/).
402
+
403
+ Members of the ACK Cyfronet AGH team providing valuable support and expertise:
404
+ [Szymon Mazurek](https://www.linkedin.com/in/sz-mazurek-ai/),
405
+ [Marek Magryś](https://www.linkedin.com/in/magrys/),
406
+ [Mieszko Cholewa](https://www.linkedin.com/in/mieszko-cholewa-613726301/).
407
+
408
+ ## Contact Us
409
+
410
+ If you have any questions or suggestions, please use the discussion tab. If you want to contact us directly, join our [Discord SpeakLeash](https://discord.gg/pv4brQMDTy).
README.org.md ADDED
@@ -0,0 +1,14 @@
1
+ ---
2
+ license: apache-2.0
3
+ base_model: speakleash/Bielik-11B-v2.2-Instruct
4
+ language:
5
+ - pl
6
+ library_name: transformers
7
+ tags:
8
+ - finetuned
9
+ inference: false
10
+ ---
11
+
12
+ # Bielik-11B-v2.2-Instruct
13
+
14
+ ExLlamav2 8 bpw quant of https://huggingface.co/speakleash/Bielik-11B-v2.2-Instruct
added_tokens.json ADDED
@@ -0,0 +1,130 @@
1
+ {
2
+ "<|control_100|>": 32099,
3
+ "<|control_101|>": 32100,
4
+ "<|control_102|>": 32101,
5
+ "<|control_103|>": 32102,
6
+ "<|control_104|>": 32103,
7
+ "<|control_105|>": 32104,
8
+ "<|control_106|>": 32105,
9
+ "<|control_107|>": 32106,
10
+ "<|control_108|>": 32107,
11
+ "<|control_109|>": 32108,
12
+ "<|control_10|>": 32009,
13
+ "<|control_110|>": 32109,
14
+ "<|control_111|>": 32110,
15
+ "<|control_112|>": 32111,
16
+ "<|control_113|>": 32112,
17
+ "<|control_114|>": 32113,
18
+ "<|control_115|>": 32114,
19
+ "<|control_116|>": 32115,
20
+ "<|control_117|>": 32116,
21
+ "<|control_118|>": 32117,
22
+ "<|control_119|>": 32118,
23
+ "<|control_11|>": 32010,
24
+ "<|control_120|>": 32119,
25
+ "<|control_121|>": 32120,
26
+ "<|control_122|>": 32121,
27
+ "<|control_123|>": 32122,
28
+ "<|control_124|>": 32123,
29
+ "<|control_125|>": 32124,
30
+ "<|control_126|>": 32125,
31
+ "<|control_127|>": 32126,
32
+ "<|control_128|>": 32127,
33
+ "<|control_12|>": 32011,
34
+ "<|control_13|>": 32012,
35
+ "<|control_14|>": 32013,
36
+ "<|control_15|>": 32014,
37
+ "<|control_16|>": 32015,
38
+ "<|control_17|>": 32016,
39
+ "<|control_18|>": 32017,
40
+ "<|control_19|>": 32018,
41
+ "<|control_20|>": 32019,
42
+ "<|control_21|>": 32020,
43
+ "<|control_22|>": 32021,
44
+ "<|control_23|>": 32022,
45
+ "<|control_24|>": 32023,
46
+ "<|control_25|>": 32024,
47
+ "<|control_26|>": 32025,
48
+ "<|control_27|>": 32026,
49
+ "<|control_28|>": 32027,
50
+ "<|control_29|>": 32028,
51
+ "<|control_30|>": 32029,
52
+ "<|control_31|>": 32030,
53
+ "<|control_32|>": 32031,
54
+ "<|control_33|>": 32032,
55
+ "<|control_34|>": 32033,
56
+ "<|control_35|>": 32034,
57
+ "<|control_36|>": 32035,
58
+ "<|control_37|>": 32036,
59
+ "<|control_38|>": 32037,
60
+ "<|control_39|>": 32038,
61
+ "<|control_40|>": 32039,
62
+ "<|control_41|>": 32040,
63
+ "<|control_42|>": 32041,
64
+ "<|control_43|>": 32042,
65
+ "<|control_44|>": 32043,
66
+ "<|control_45|>": 32044,
67
+ "<|control_46|>": 32045,
68
+ "<|control_47|>": 32046,
69
+ "<|control_48|>": 32047,
70
+ "<|control_49|>": 32048,
71
+ "<|control_50|>": 32049,
72
+ "<|control_51|>": 32050,
73
+ "<|control_52|>": 32051,
74
+ "<|control_53|>": 32052,
75
+ "<|control_54|>": 32053,
76
+ "<|control_55|>": 32054,
77
+ "<|control_56|>": 32055,
78
+ "<|control_57|>": 32056,
79
+ "<|control_58|>": 32057,
80
+ "<|control_59|>": 32058,
81
+ "<|control_60|>": 32059,
82
+ "<|control_61|>": 32060,
83
+ "<|control_62|>": 32061,
84
+ "<|control_63|>": 32062,
85
+ "<|control_64|>": 32063,
86
+ "<|control_65|>": 32064,
87
+ "<|control_66|>": 32065,
88
+ "<|control_67|>": 32066,
89
+ "<|control_68|>": 32067,
90
+ "<|control_69|>": 32068,
91
+ "<|control_6|>": 32005,
92
+ "<|control_70|>": 32069,
93
+ "<|control_71|>": 32070,
94
+ "<|control_72|>": 32071,
95
+ "<|control_73|>": 32072,
96
+ "<|control_74|>": 32073,
97
+ "<|control_75|>": 32074,
98
+ "<|control_76|>": 32075,
99
+ "<|control_77|>": 32076,
100
+ "<|control_78|>": 32077,
101
+ "<|control_79|>": 32078,
102
+ "<|control_7|>": 32006,
103
+ "<|control_80|>": 32079,
104
+ "<|control_81|>": 32080,
105
+ "<|control_82|>": 32081,
106
+ "<|control_83|>": 32082,
107
+ "<|control_84|>": 32083,
108
+ "<|control_85|>": 32084,
109
+ "<|control_86|>": 32085,
110
+ "<|control_87|>": 32086,
111
+ "<|control_88|>": 32087,
112
+ "<|control_89|>": 32088,
113
+ "<|control_8|>": 32007,
114
+ "<|control_90|>": 32089,
115
+ "<|control_91|>": 32090,
116
+ "<|control_92|>": 32091,
117
+ "<|control_93|>": 32092,
118
+ "<|control_94|>": 32093,
119
+ "<|control_95|>": 32094,
120
+ "<|control_96|>": 32095,
121
+ "<|control_97|>": 32096,
122
+ "<|control_98|>": 32097,
123
+ "<|control_99|>": 32098,
124
+ "<|control_9|>": 32008,
125
+ "<|function_call|>": 32004,
126
+ "<|function_list|>": 32002,
127
+ "<|function_output|>": 32003,
128
+ "<|im_end|>": 32001,
129
+ "<|im_start|>": 32000
130
+ }
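The IDs in this listing appear to follow a simple layout: five named chat and function-calling tokens at 32000 to 32004, then `<|control_N|>` at `31999 + N`. The following is a minimal Python sketch that rebuilds the map under that assumption so it can be compared against `added_tokens.json`. The helper name `expected_added_tokens` is hypothetical and not part of the repository, and the range 6 to 128 for control tokens is inferred from the entries in the diff.

```python
def expected_added_tokens() -> dict[str, int]:
    """Rebuild the added-token map from the pattern visible in the diff."""
    # Named tokens occupy the first five IDs after the 32000-entry base vocabulary.
    named = [
        "<|im_start|>",
        "<|im_end|>",
        "<|function_list|>",
        "<|function_output|>",
        "<|function_call|>",
    ]
    tokens = {name: 32000 + i for i, name in enumerate(named)}
    # Assumption: control tokens run from 6 to 128 with ID = 31999 + N.
    for n in range(6, 129):
        tokens[f"<|control_{n}|>"] = 31999 + n
    return tokens


tokens = expected_added_tokens()
```

Comparing `tokens` with the result of `json.load` on the real `added_tokens.json` would confirm or refute the assumed pattern. Under it, the highest ID plus one is 32128, which agrees with `vocab_size` in `config.json`.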
config.json ADDED
@@ -0,0 +1,36 @@
+ {
+   "architectures": [
+     "MistralForCausalLM"
+   ],
+   "attention_dropout": 0.0,
+   "bos_token_id": 1,
+   "eos_token_id": 32001,
+   "hidden_act": "silu",
+   "hidden_size": 4096,
+   "initializer_range": 0.02,
+   "intermediate_size": 14336,
+   "max_position_embeddings": 32768,
+   "model_type": "mistral",
+   "num_attention_heads": 32,
+   "num_hidden_layers": 50,
+   "num_key_value_heads": 8,
+   "rms_norm_eps": 1e-05,
+   "rope_theta": 1000000,
+   "sliding_window": null,
+   "tie_word_embeddings": false,
+   "torch_dtype": "bfloat16",
+   "transformers_version": "4.39.3",
+   "use_cache": true,
+   "vocab_size": 32128,
+   "quantization_config": {
+     "quant_method": "exl2",
+     "version": "0.1.8",
+     "bits": 8.0,
+     "head_bits": 8,
+     "calibration": {
+       "rows": 115,
+       "length": 2048,
+       "dataset": "(default)"
+     }
+   }
+ }
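The architecture fields above are enough to estimate the parameter count of the unquantized model. The sketch below assumes the standard Mistral layout: grouped-query attention with `head_dim = hidden_size / num_attention_heads`, a three-matrix gated MLP, two RMSNorm weights per layer plus one final norm, and no bias terms. The function name `count_params` is illustrative only.

```python
def count_params(cfg: dict) -> int:
    """Estimate parameter count for a Mistral-style decoder from its config."""
    h = cfg["hidden_size"]
    # Assumption: head_dim is hidden_size / num_attention_heads (no explicit override).
    head_dim = h // cfg["num_attention_heads"]
    kv_dim = cfg["num_key_value_heads"] * head_dim
    attn = 2 * h * h + 2 * h * kv_dim          # q_proj, o_proj + k_proj, v_proj
    mlp = 3 * h * cfg["intermediate_size"]     # gate_proj, up_proj, down_proj
    norms = 2 * h                              # input and post-attention layernorm
    per_layer = attn + mlp + norms
    embed = cfg["vocab_size"] * h
    lm_head = 0 if cfg["tie_word_embeddings"] else embed
    final_norm = h
    return cfg["num_hidden_layers"] * per_layer + embed + lm_head + final_norm


cfg = {
    "hidden_size": 4096,
    "intermediate_size": 14336,
    "num_attention_heads": 32,
    "num_key_value_heads": 8,
    "num_hidden_layers": 50,
    "vocab_size": 32128,
    "tie_word_embeddings": False,
}
n_params = count_params(cfg)
bf16_bytes = 2 * n_params  # bfloat16 stores each weight in two bytes
```

With these values the estimate is 11,168,796,672 parameters, or 22,337,593,344 bytes in bfloat16, which equals the `total_size` recorded in `model.safetensors.index.json` in this commit. That figure describes the original bfloat16 shards, not the 8.0 bpw exl2 quant.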
generation_config.json ADDED
@@ -0,0 +1,8 @@
+ {
+   "_from_model_config": true,
+   "bos_token_id": 1,
+   "eos_token_id": 32001,
+   "pad_token": 2,
+   "unk_token": 0,
+   "transformers_version": "4.39.3"
+ }
huggingface-metadata.txt ADDED
@@ -0,0 +1,10 @@
+ url: https://huggingface.co/speakleash/Bielik-11B-v2.2-Instruct
+ branch: main
+ download date: 2024-09-13 14:38:05
+ sha256sum:
+ bbe26ff97f87a26a4707ab73445ce19a693f9a32f472945f3301d7d009929ea9 model-00001-of-00005.safetensors
+ 2a189757128983755b1fe21616dbec69138d6de576bbb7585355c7f7994af169 model-00002-of-00005.safetensors
+ 29709bf596c0a0981262651cfa227aa77fc71022a31ab2222f6922f130c77e49 model-00003-of-00005.safetensors
+ db946e22c9720bd802dde87f7074bf68b28ba2e2368770bd18caed3b818d13a5 model-00004-of-00005.safetensors
+ 79e88fee3555322da7be628092a2c07ce5df88c5825878a4b86f5307af95b2c5 model-00005-of-00005.safetensors
+ dadfd56d766715c61d2ef780a525ab43b8e6da4de6865bda3d95fdef5e134055 tokenizer.model
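The checksums above can be used to confirm that a local copy of the source files matches what was downloaded for quantization. The following is a sketch using only the Python standard library. The helper names `sha256_of`, `parse_checksums` and `verify` are hypothetical, and the parser assumes each checksum line is a 64-character hex digest followed by a file name, as in the listing.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so multi-gigabyte shards need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def parse_checksums(text: str) -> dict[str, str]:
    """Map file name -> expected digest; non-checksum lines are skipped."""
    expected = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and len(parts[0]) == 64:
            expected[parts[1]] = parts[0].lower()
    return expected


def verify(directory: Path, expected: dict[str, str]) -> dict[str, bool]:
    """Return True per file when it exists and its digest matches."""
    return {
        name: (directory / name).is_file()
        and sha256_of(directory / name) == digest
        for name, digest in expected.items()
    }
```

A typical call would be `verify(Path("."), parse_checksums(Path("huggingface-metadata.txt").read_text()))`, run from the directory that holds the downloaded files. Any `False` entry points to a missing or altered file.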
model.safetensors.index.json ADDED
@@ -0,0 +1,460 @@
1
+ {
2
+ "metadata": {
3
+ "total_size": 22337593344
4
+ },
5
+ "weight_map": {
6
+ "lm_head.weight": "model-00005-of-00005.safetensors",
7
+ "model.embed_tokens.weight": "model-00001-of-00005.safetensors",
8
+ "model.layers.0.input_layernorm.weight": "model-00001-of-00005.safetensors",
9
+ "model.layers.0.mlp.down_proj.weight": "model-00001-of-00005.safetensors",
10
+ "model.layers.0.mlp.gate_proj.weight": "model-00001-of-00005.safetensors",
11
+ "model.layers.0.mlp.up_proj.weight": "model-00001-of-00005.safetensors",
12
+ "model.layers.0.post_attention_layernorm.weight": "model-00001-of-00005.safetensors",
13
+ "model.layers.0.self_attn.k_proj.weight": "model-00001-of-00005.safetensors",
14
+ "model.layers.0.self_attn.o_proj.weight": "model-00001-of-00005.safetensors",
15
+ "model.layers.0.self_attn.q_proj.weight": "model-00001-of-00005.safetensors",
16
+ "model.layers.0.self_attn.v_proj.weight": "model-00001-of-00005.safetensors",
17
+ "model.layers.1.input_layernorm.weight": "model-00001-of-00005.safetensors",
18
+ "model.layers.1.mlp.down_proj.weight": "model-00001-of-00005.safetensors",
19
+ "model.layers.1.mlp.gate_proj.weight": "model-00001-of-00005.safetensors",
20
+ "model.layers.1.mlp.up_proj.weight": "model-00001-of-00005.safetensors",
21
+ "model.layers.1.post_attention_layernorm.weight": "model-00001-of-00005.safetensors",
22
+ "model.layers.1.self_attn.k_proj.weight": "model-00001-of-00005.safetensors",
23
+ "model.layers.1.self_attn.o_proj.weight": "model-00001-of-00005.safetensors",
24
+ "model.layers.1.self_attn.q_proj.weight": "model-00001-of-00005.safetensors",
25
+ "model.layers.1.self_attn.v_proj.weight": "model-00001-of-00005.safetensors",
26
+ "model.layers.10.input_layernorm.weight": "model-00002-of-00005.safetensors",
27
+ "model.layers.10.mlp.down_proj.weight": "model-00002-of-00005.safetensors",
28
+ "model.layers.10.mlp.gate_proj.weight": "model-00001-of-00005.safetensors",
29
+ "model.layers.10.mlp.up_proj.weight": "model-00001-of-00005.safetensors",
30
+ "model.layers.10.post_attention_layernorm.weight": "model-00002-of-00005.safetensors",
31
+ "model.layers.10.self_attn.k_proj.weight": "model-00001-of-00005.safetensors",
32
+ "model.layers.10.self_attn.o_proj.weight": "model-00001-of-00005.safetensors",
33
+ "model.layers.10.self_attn.q_proj.weight": "model-00001-of-00005.safetensors",
34
+ "model.layers.10.self_attn.v_proj.weight": "model-00001-of-00005.safetensors",
35
+ "model.layers.11.input_layernorm.weight": "model-00002-of-00005.safetensors",
36
+ "model.layers.11.mlp.down_proj.weight": "model-00002-of-00005.safetensors",
37
+ "model.layers.11.mlp.gate_proj.weight": "model-00002-of-00005.safetensors",
38
+ "model.layers.11.mlp.up_proj.weight": "model-00002-of-00005.safetensors",
39
+ "model.layers.11.post_attention_layernorm.weight": "model-00002-of-00005.safetensors",
40
+ "model.layers.11.self_attn.k_proj.weight": "model-00002-of-00005.safetensors",
41
+ "model.layers.11.self_attn.o_proj.weight": "model-00002-of-00005.safetensors",
42
+ "model.layers.11.self_attn.q_proj.weight": "model-00002-of-00005.safetensors",
43
+ "model.layers.11.self_attn.v_proj.weight": "model-00002-of-00005.safetensors",
44
+ "model.layers.12.input_layernorm.weight": "model-00002-of-00005.safetensors",
45
+ "model.layers.12.mlp.down_proj.weight": "model-00002-of-00005.safetensors",
46
+ "model.layers.12.mlp.gate_proj.weight": "model-00002-of-00005.safetensors",
47
+ "model.layers.12.mlp.up_proj.weight": "model-00002-of-00005.safetensors",
48
+ "model.layers.12.post_attention_layernorm.weight": "model-00002-of-00005.safetensors",
49
+ "model.layers.12.self_attn.k_proj.weight": "model-00002-of-00005.safetensors",
50
+ "model.layers.12.self_attn.o_proj.weight": "model-00002-of-00005.safetensors",
51
+ "model.layers.12.self_attn.q_proj.weight": "model-00002-of-00005.safetensors",
52
+ "model.layers.12.self_attn.v_proj.weight": "model-00002-of-00005.safetensors",
53
+ "model.layers.13.input_layernorm.weight": "model-00002-of-00005.safetensors",
54
+ "model.layers.13.mlp.down_proj.weight": "model-00002-of-00005.safetensors",
55
+ "model.layers.13.mlp.gate_proj.weight": "model-00002-of-00005.safetensors",
56
+ "model.layers.13.mlp.up_proj.weight": "model-00002-of-00005.safetensors",
57
+ "model.layers.13.post_attention_layernorm.weight": "model-00002-of-00005.safetensors",
58
+ "model.layers.13.self_attn.k_proj.weight": "model-00002-of-00005.safetensors",
59
+ "model.layers.13.self_attn.o_proj.weight": "model-00002-of-00005.safetensors",
60
+ "model.layers.13.self_attn.q_proj.weight": "model-00002-of-00005.safetensors",
61
+ "model.layers.13.self_attn.v_proj.weight": "model-00002-of-00005.safetensors",
62
+ "model.layers.14.input_layernorm.weight": "model-00002-of-00005.safetensors",
63
+ "model.layers.14.mlp.down_proj.weight": "model-00002-of-00005.safetensors",
64
+ "model.layers.14.mlp.gate_proj.weight": "model-00002-of-00005.safetensors",
65
+ "model.layers.14.mlp.up_proj.weight": "model-00002-of-00005.safetensors",
66
+ "model.layers.14.post_attention_layernorm.weight": "model-00002-of-00005.safetensors",
67
+ "model.layers.14.self_attn.k_proj.weight": "model-00002-of-00005.safetensors",
68
+ "model.layers.14.self_attn.o_proj.weight": "model-00002-of-00005.safetensors",
69
+ "model.layers.14.self_attn.q_proj.weight": "model-00002-of-00005.safetensors",
70
+ "model.layers.14.self_attn.v_proj.weight": "model-00002-of-00005.safetensors",
71
+ "model.layers.15.input_layernorm.weight": "model-00002-of-00005.safetensors",
72
+ "model.layers.15.mlp.down_proj.weight": "model-00002-of-00005.safetensors",
73
+ "model.layers.15.mlp.gate_proj.weight": "model-00002-of-00005.safetensors",
74
+ "model.layers.15.mlp.up_proj.weight": "model-00002-of-00005.safetensors",
75
+ "model.layers.15.post_attention_layernorm.weight": "model-00002-of-00005.safetensors",
76
+ "model.layers.15.self_attn.k_proj.weight": "model-00002-of-00005.safetensors",
77
+ "model.layers.15.self_attn.o_proj.weight": "model-00002-of-00005.safetensors",
78
+ "model.layers.15.self_attn.q_proj.weight": "model-00002-of-00005.safetensors",
79
+ "model.layers.15.self_attn.v_proj.weight": "model-00002-of-00005.safetensors",
80
+ "model.layers.16.input_layernorm.weight": "model-00002-of-00005.safetensors",
81
+ "model.layers.16.mlp.down_proj.weight": "model-00002-of-00005.safetensors",
82
+ "model.layers.16.mlp.gate_proj.weight": "model-00002-of-00005.safetensors",
83
+ "model.layers.16.mlp.up_proj.weight": "model-00002-of-00005.safetensors",
84
+ "model.layers.16.post_attention_layernorm.weight": "model-00002-of-00005.safetensors",
85
+ "model.layers.16.self_attn.k_proj.weight": "model-00002-of-00005.safetensors",
86
+ "model.layers.16.self_attn.o_proj.weight": "model-00002-of-00005.safetensors",
87
+ "model.layers.16.self_attn.q_proj.weight": "model-00002-of-00005.safetensors",
88
+ "model.layers.16.self_attn.v_proj.weight": "model-00002-of-00005.safetensors",
89
+ "model.layers.17.input_layernorm.weight": "model-00002-of-00005.safetensors",
90
+ "model.layers.17.mlp.down_proj.weight": "model-00002-of-00005.safetensors",
91
+ "model.layers.17.mlp.gate_proj.weight": "model-00002-of-00005.safetensors",
92
+ "model.layers.17.mlp.up_proj.weight": "model-00002-of-00005.safetensors",
93
+ "model.layers.17.post_attention_layernorm.weight": "model-00002-of-00005.safetensors",
94
+ "model.layers.17.self_attn.k_proj.weight": "model-00002-of-00005.safetensors",
95
+ "model.layers.17.self_attn.o_proj.weight": "model-00002-of-00005.safetensors",
96
+ "model.layers.17.self_attn.q_proj.weight": "model-00002-of-00005.safetensors",
97
+ "model.layers.17.self_attn.v_proj.weight": "model-00002-of-00005.safetensors",
98
+ "model.layers.18.input_layernorm.weight": "model-00002-of-00005.safetensors",
99
+ "model.layers.18.mlp.down_proj.weight": "model-00002-of-00005.safetensors",
100
+ "model.layers.18.mlp.gate_proj.weight": "model-00002-of-00005.safetensors",
101
+ "model.layers.18.mlp.up_proj.weight": "model-00002-of-00005.safetensors",
102
+ "model.layers.18.post_attention_layernorm.weight": "model-00002-of-00005.safetensors",
103
+ "model.layers.18.self_attn.k_proj.weight": "model-00002-of-00005.safetensors",
104
+ "model.layers.18.self_attn.o_proj.weight": "model-00002-of-00005.safetensors",
105
+ "model.layers.18.self_attn.q_proj.weight": "model-00002-of-00005.safetensors",
106
+ "model.layers.18.self_attn.v_proj.weight": "model-00002-of-00005.safetensors",
107
+ "model.layers.19.input_layernorm.weight": "model-00002-of-00005.safetensors",
108
+ "model.layers.19.mlp.down_proj.weight": "model-00002-of-00005.safetensors",
109
+ "model.layers.19.mlp.gate_proj.weight": "model-00002-of-00005.safetensors",
110
+ "model.layers.19.mlp.up_proj.weight": "model-00002-of-00005.safetensors",
111
+ "model.layers.19.post_attention_layernorm.weight": "model-00002-of-00005.safetensors",
112
+ "model.layers.19.self_attn.k_proj.weight": "model-00002-of-00005.safetensors",
113
+ "model.layers.19.self_attn.o_proj.weight": "model-00002-of-00005.safetensors",
114
+ "model.layers.19.self_attn.q_proj.weight": "model-00002-of-00005.safetensors",
115
+ "model.layers.19.self_attn.v_proj.weight": "model-00002-of-00005.safetensors",
116
+ "model.layers.2.input_layernorm.weight": "model-00001-of-00005.safetensors",
117
+ "model.layers.2.mlp.down_proj.weight": "model-00001-of-00005.safetensors",
118
+ "model.layers.2.mlp.gate_proj.weight": "model-00001-of-00005.safetensors",
119
+ "model.layers.2.mlp.up_proj.weight": "model-00001-of-00005.safetensors",
120
+ "model.layers.2.post_attention_layernorm.weight": "model-00001-of-00005.safetensors",
121
+ "model.layers.2.self_attn.k_proj.weight": "model-00001-of-00005.safetensors",
122
+ "model.layers.2.self_attn.o_proj.weight": "model-00001-of-00005.safetensors",
123
+ "model.layers.2.self_attn.q_proj.weight": "model-00001-of-00005.safetensors",
124
+ "model.layers.2.self_attn.v_proj.weight": "model-00001-of-00005.safetensors",
125
+ "model.layers.20.input_layernorm.weight": "model-00002-of-00005.safetensors",
126
+ "model.layers.20.mlp.down_proj.weight": "model-00002-of-00005.safetensors",
127
+ "model.layers.20.mlp.gate_proj.weight": "model-00002-of-00005.safetensors",
128
+ "model.layers.20.mlp.up_proj.weight": "model-00002-of-00005.safetensors",
129
+ "model.layers.20.post_attention_layernorm.weight": "model-00002-of-00005.safetensors",
130
+ "model.layers.20.self_attn.k_proj.weight": "model-00002-of-00005.safetensors",
131
+ "model.layers.20.self_attn.o_proj.weight": "model-00002-of-00005.safetensors",
132
+ "model.layers.20.self_attn.q_proj.weight": "model-00002-of-00005.safetensors",
133
+ "model.layers.20.self_attn.v_proj.weight": "model-00002-of-00005.safetensors",
134
+ "model.layers.21.input_layernorm.weight": "model-00002-of-00005.safetensors",
135
+ "model.layers.21.mlp.down_proj.weight": "model-00002-of-00005.safetensors",
136
+ "model.layers.21.mlp.gate_proj.weight": "model-00002-of-00005.safetensors",
137
+ "model.layers.21.mlp.up_proj.weight": "model-00002-of-00005.safetensors",
138
+ "model.layers.21.post_attention_layernorm.weight": "model-00002-of-00005.safetensors",
139
+ "model.layers.21.self_attn.k_proj.weight": "model-00002-of-00005.safetensors",
140
+ "model.layers.21.self_attn.o_proj.weight": "model-00002-of-00005.safetensors",
141
+ "model.layers.21.self_attn.q_proj.weight": "model-00002-of-00005.safetensors",
142
+ "model.layers.21.self_attn.v_proj.weight": "model-00002-of-00005.safetensors",
143
+ "model.layers.22.input_layernorm.weight": "model-00003-of-00005.safetensors",
144
+ "model.layers.22.mlp.down_proj.weight": "model-00003-of-00005.safetensors",
145
+ "model.layers.22.mlp.gate_proj.weight": "model-00003-of-00005.safetensors",
146
+ "model.layers.22.mlp.up_proj.weight": "model-00003-of-00005.safetensors",
147
+ "model.layers.22.post_attention_layernorm.weight": "model-00003-of-00005.safetensors",
148
+ "model.layers.22.self_attn.k_proj.weight": "model-00002-of-00005.safetensors",
149
+ "model.layers.22.self_attn.o_proj.weight": "model-00002-of-00005.safetensors",
150
+ "model.layers.22.self_attn.q_proj.weight": "model-00002-of-00005.safetensors",
151
+ "model.layers.22.self_attn.v_proj.weight": "model-00002-of-00005.safetensors",
152
+ "model.layers.23.input_layernorm.weight": "model-00003-of-00005.safetensors",
153
+ "model.layers.23.mlp.down_proj.weight": "model-00003-of-00005.safetensors",
154
+ "model.layers.23.mlp.gate_proj.weight": "model-00003-of-00005.safetensors",
155
+ "model.layers.23.mlp.up_proj.weight": "model-00003-of-00005.safetensors",
156
+ "model.layers.23.post_attention_layernorm.weight": "model-00003-of-00005.safetensors",
157
+ "model.layers.23.self_attn.k_proj.weight": "model-00003-of-00005.safetensors",
158
+ "model.layers.23.self_attn.o_proj.weight": "model-00003-of-00005.safetensors",
159
+ "model.layers.23.self_attn.q_proj.weight": "model-00003-of-00005.safetensors",
160
+ "model.layers.23.self_attn.v_proj.weight": "model-00003-of-00005.safetensors",
161
+ "model.layers.24.input_layernorm.weight": "model-00003-of-00005.safetensors",
162
+ "model.layers.24.mlp.down_proj.weight": "model-00003-of-00005.safetensors",
163
+ "model.layers.24.mlp.gate_proj.weight": "model-00003-of-00005.safetensors",
164
+ "model.layers.24.mlp.up_proj.weight": "model-00003-of-00005.safetensors",
165
+ "model.layers.24.post_attention_layernorm.weight": "model-00003-of-00005.safetensors",
166
+ "model.layers.24.self_attn.k_proj.weight": "model-00003-of-00005.safetensors",
167
+ "model.layers.24.self_attn.o_proj.weight": "model-00003-of-00005.safetensors",
168
+ "model.layers.24.self_attn.q_proj.weight": "model-00003-of-00005.safetensors",
169
+ "model.layers.24.self_attn.v_proj.weight": "model-00003-of-00005.safetensors",
170
+ "model.layers.25.input_layernorm.weight": "model-00003-of-00005.safetensors",
171
+ "model.layers.25.mlp.down_proj.weight": "model-00003-of-00005.safetensors",
172
+ "model.layers.25.mlp.gate_proj.weight": "model-00003-of-00005.safetensors",
173
+ "model.layers.25.mlp.up_proj.weight": "model-00003-of-00005.safetensors",
174
+ "model.layers.25.post_attention_layernorm.weight": "model-00003-of-00005.safetensors",
175
+ "model.layers.25.self_attn.k_proj.weight": "model-00003-of-00005.safetensors",
176
+ "model.layers.25.self_attn.o_proj.weight": "model-00003-of-00005.safetensors",
177
+ "model.layers.25.self_attn.q_proj.weight": "model-00003-of-00005.safetensors",
178
+ "model.layers.25.self_attn.v_proj.weight": "model-00003-of-00005.safetensors",
179
+ "model.layers.26.input_layernorm.weight": "model-00003-of-00005.safetensors",
180
+ "model.layers.26.mlp.down_proj.weight": "model-00003-of-00005.safetensors",
181
+ "model.layers.26.mlp.gate_proj.weight": "model-00003-of-00005.safetensors",
182
+ "model.layers.26.mlp.up_proj.weight": "model-00003-of-00005.safetensors",
183
+ "model.layers.26.post_attention_layernorm.weight": "model-00003-of-00005.safetensors",
184
+ "model.layers.26.self_attn.k_proj.weight": "model-00003-of-00005.safetensors",
185
+ "model.layers.26.self_attn.o_proj.weight": "model-00003-of-00005.safetensors",
186
+ "model.layers.26.self_attn.q_proj.weight": "model-00003-of-00005.safetensors",
187
+ "model.layers.26.self_attn.v_proj.weight": "model-00003-of-00005.safetensors",
188
+ "model.layers.27.input_layernorm.weight": "model-00003-of-00005.safetensors",
189
+ "model.layers.27.mlp.down_proj.weight": "model-00003-of-00005.safetensors",
190
+ "model.layers.27.mlp.gate_proj.weight": "model-00003-of-00005.safetensors",
191
+ "model.layers.27.mlp.up_proj.weight": "model-00003-of-00005.safetensors",
192
+ "model.layers.27.post_attention_layernorm.weight": "model-00003-of-00005.safetensors",
193
+ "model.layers.27.self_attn.k_proj.weight": "model-00003-of-00005.safetensors",
194
+ "model.layers.27.self_attn.o_proj.weight": "model-00003-of-00005.safetensors",
195
+ "model.layers.27.self_attn.q_proj.weight": "model-00003-of-00005.safetensors",
196
+ "model.layers.27.self_attn.v_proj.weight": "model-00003-of-00005.safetensors",
197
+ "model.layers.28.input_layernorm.weight": "model-00003-of-00005.safetensors",
198
+ "model.layers.28.mlp.down_proj.weight": "model-00003-of-00005.safetensors",
199
+ "model.layers.28.mlp.gate_proj.weight": "model-00003-of-00005.safetensors",
200
+ "model.layers.28.mlp.up_proj.weight": "model-00003-of-00005.safetensors",
201
+ "model.layers.28.post_attention_layernorm.weight": "model-00003-of-00005.safetensors",
202
+ "model.layers.28.self_attn.k_proj.weight": "model-00003-of-00005.safetensors",
203
+ "model.layers.28.self_attn.o_proj.weight": "model-00003-of-00005.safetensors",
204
+ "model.layers.28.self_attn.q_proj.weight": "model-00003-of-00005.safetensors",
205
+ "model.layers.28.self_attn.v_proj.weight": "model-00003-of-00005.safetensors",
206
+ "model.layers.29.input_layernorm.weight": "model-00003-of-00005.safetensors",
207
+ "model.layers.29.mlp.down_proj.weight": "model-00003-of-00005.safetensors",
208
+ "model.layers.29.mlp.gate_proj.weight": "model-00003-of-00005.safetensors",
209
+ "model.layers.29.mlp.up_proj.weight": "model-00003-of-00005.safetensors",
210
+ "model.layers.29.post_attention_layernorm.weight": "model-00003-of-00005.safetensors",
211
+ "model.layers.29.self_attn.k_proj.weight": "model-00003-of-00005.safetensors",
212
+ "model.layers.29.self_attn.o_proj.weight": "model-00003-of-00005.safetensors",
213
+ "model.layers.29.self_attn.q_proj.weight": "model-00003-of-00005.safetensors",
214
+ "model.layers.29.self_attn.v_proj.weight": "model-00003-of-00005.safetensors",
215
+ "model.layers.3.input_layernorm.weight": "model-00001-of-00005.safetensors",
216
+ "model.layers.3.mlp.down_proj.weight": "model-00001-of-00005.safetensors",
217
+ "model.layers.3.mlp.gate_proj.weight": "model-00001-of-00005.safetensors",
218
+ "model.layers.3.mlp.up_proj.weight": "model-00001-of-00005.safetensors",
219
+ "model.layers.3.post_attention_layernorm.weight": "model-00001-of-00005.safetensors",
220
+ "model.layers.3.self_attn.k_proj.weight": "model-00001-of-00005.safetensors",
221
+ "model.layers.3.self_attn.o_proj.weight": "model-00001-of-00005.safetensors",
222
+ "model.layers.3.self_attn.q_proj.weight": "model-00001-of-00005.safetensors",
223
+ "model.layers.3.self_attn.v_proj.weight": "model-00001-of-00005.safetensors",
224
+ "model.layers.30.input_layernorm.weight": "model-00003-of-00005.safetensors",
225
+ "model.layers.30.mlp.down_proj.weight": "model-00003-of-00005.safetensors",
226
+ "model.layers.30.mlp.gate_proj.weight": "model-00003-of-00005.safetensors",
227
+ "model.layers.30.mlp.up_proj.weight": "model-00003-of-00005.safetensors",
228
+ "model.layers.30.post_attention_layernorm.weight": "model-00003-of-00005.safetensors",
229
+ "model.layers.30.self_attn.k_proj.weight": "model-00003-of-00005.safetensors",
230
+ "model.layers.30.self_attn.o_proj.weight": "model-00003-of-00005.safetensors",
231
+ "model.layers.30.self_attn.q_proj.weight": "model-00003-of-00005.safetensors",
232
+ "model.layers.30.self_attn.v_proj.weight": "model-00003-of-00005.safetensors",
233
+ "model.layers.31.input_layernorm.weight": "model-00003-of-00005.safetensors",
234
+ "model.layers.31.mlp.down_proj.weight": "model-00003-of-00005.safetensors",
235
+ "model.layers.31.mlp.gate_proj.weight": "model-00003-of-00005.safetensors",
236
+ "model.layers.31.mlp.up_proj.weight": "model-00003-of-00005.safetensors",
237
+ "model.layers.31.post_attention_layernorm.weight": "model-00003-of-00005.safetensors",
238
+ "model.layers.31.self_attn.k_proj.weight": "model-00003-of-00005.safetensors",
239
+ "model.layers.31.self_attn.o_proj.weight": "model-00003-of-00005.safetensors",
240
+ "model.layers.31.self_attn.q_proj.weight": "model-00003-of-00005.safetensors",
241
+ "model.layers.31.self_attn.v_proj.weight": "model-00003-of-00005.safetensors",
242
+ "model.layers.32.input_layernorm.weight": "model-00003-of-00005.safetensors",
243
+ "model.layers.32.mlp.down_proj.weight": "model-00003-of-00005.safetensors",
244
+ "model.layers.32.mlp.gate_proj.weight": "model-00003-of-00005.safetensors",
245
+ "model.layers.32.mlp.up_proj.weight": "model-00003-of-00005.safetensors",
246
+ "model.layers.32.post_attention_layernorm.weight": "model-00003-of-00005.safetensors",
247
+ "model.layers.32.self_attn.k_proj.weight": "model-00003-of-00005.safetensors",
248
+ "model.layers.32.self_attn.o_proj.weight": "model-00003-of-00005.safetensors",
249
+ "model.layers.32.self_attn.q_proj.weight": "model-00003-of-00005.safetensors",
250
+ "model.layers.32.self_attn.v_proj.weight": "model-00003-of-00005.safetensors",
251
+ "model.layers.33.input_layernorm.weight": "model-00004-of-00005.safetensors",
252
+ "model.layers.33.mlp.down_proj.weight": "model-00004-of-00005.safetensors",
253
+ "model.layers.33.mlp.gate_proj.weight": "model-00003-of-00005.safetensors",
254
+ "model.layers.33.mlp.up_proj.weight": "model-00004-of-00005.safetensors",
255
+ "model.layers.33.post_attention_layernorm.weight": "model-00004-of-00005.safetensors",
256
+ "model.layers.33.self_attn.k_proj.weight": "model-00003-of-00005.safetensors",
257
+ "model.layers.33.self_attn.o_proj.weight": "model-00003-of-00005.safetensors",
258
+ "model.layers.33.self_attn.q_proj.weight": "model-00003-of-00005.safetensors",
259
+ "model.layers.33.self_attn.v_proj.weight": "model-00003-of-00005.safetensors",
260
+ "model.layers.34.input_layernorm.weight": "model-00004-of-00005.safetensors",
261
+ "model.layers.34.mlp.down_proj.weight": "model-00004-of-00005.safetensors",
262
+ "model.layers.34.mlp.gate_proj.weight": "model-00004-of-00005.safetensors",
263
+ "model.layers.34.mlp.up_proj.weight": "model-00004-of-00005.safetensors",
264
+ "model.layers.34.post_attention_layernorm.weight": "model-00004-of-00005.safetensors",
265
+ "model.layers.34.self_attn.k_proj.weight": "model-00004-of-00005.safetensors",
266
+ "model.layers.34.self_attn.o_proj.weight": "model-00004-of-00005.safetensors",
267
+ "model.layers.34.self_attn.q_proj.weight": "model-00004-of-00005.safetensors",
268
+ "model.layers.34.self_attn.v_proj.weight": "model-00004-of-00005.safetensors",
269
+ "model.layers.35.input_layernorm.weight": "model-00004-of-00005.safetensors",
270
+ "model.layers.35.mlp.down_proj.weight": "model-00004-of-00005.safetensors",
271
+ "model.layers.35.mlp.gate_proj.weight": "model-00004-of-00005.safetensors",
272
+ "model.layers.35.mlp.up_proj.weight": "model-00004-of-00005.safetensors",
273
+ "model.layers.35.post_attention_layernorm.weight": "model-00004-of-00005.safetensors",
274
+ "model.layers.35.self_attn.k_proj.weight": "model-00004-of-00005.safetensors",
275
+ "model.layers.35.self_attn.o_proj.weight": "model-00004-of-00005.safetensors",
276
+ "model.layers.35.self_attn.q_proj.weight": "model-00004-of-00005.safetensors",
277
+ "model.layers.35.self_attn.v_proj.weight": "model-00004-of-00005.safetensors",
278
+ "model.layers.36.input_layernorm.weight": "model-00004-of-00005.safetensors",
279
+ "model.layers.36.mlp.down_proj.weight": "model-00004-of-00005.safetensors",
280
+ "model.layers.36.mlp.gate_proj.weight": "model-00004-of-00005.safetensors",
281
+ "model.layers.36.mlp.up_proj.weight": "model-00004-of-00005.safetensors",
282
+ "model.layers.36.post_attention_layernorm.weight": "model-00004-of-00005.safetensors",
283
+ "model.layers.36.self_attn.k_proj.weight": "model-00004-of-00005.safetensors",
284
+ "model.layers.36.self_attn.o_proj.weight": "model-00004-of-00005.safetensors",
285
+ "model.layers.36.self_attn.q_proj.weight": "model-00004-of-00005.safetensors",
286
+ "model.layers.36.self_attn.v_proj.weight": "model-00004-of-00005.safetensors",
287
+ "model.layers.37.input_layernorm.weight": "model-00004-of-00005.safetensors",
288
+ "model.layers.37.mlp.down_proj.weight": "model-00004-of-00005.safetensors",
289
+ "model.layers.37.mlp.gate_proj.weight": "model-00004-of-00005.safetensors",
290
+ "model.layers.37.mlp.up_proj.weight": "model-00004-of-00005.safetensors",
291
+ "model.layers.37.post_attention_layernorm.weight": "model-00004-of-00005.safetensors",
292
+ "model.layers.37.self_attn.k_proj.weight": "model-00004-of-00005.safetensors",
293
+ "model.layers.37.self_attn.o_proj.weight": "model-00004-of-00005.safetensors",
294
+ "model.layers.37.self_attn.q_proj.weight": "model-00004-of-00005.safetensors",
295
+ "model.layers.37.self_attn.v_proj.weight": "model-00004-of-00005.safetensors",
296
+ "model.layers.38.input_layernorm.weight": "model-00004-of-00005.safetensors",
297
+ "model.layers.38.mlp.down_proj.weight": "model-00004-of-00005.safetensors",
298
+ "model.layers.38.mlp.gate_proj.weight": "model-00004-of-00005.safetensors",
299
+ "model.layers.38.mlp.up_proj.weight": "model-00004-of-00005.safetensors",
300
+ "model.layers.38.post_attention_layernorm.weight": "model-00004-of-00005.safetensors",
301
+ "model.layers.38.self_attn.k_proj.weight": "model-00004-of-00005.safetensors",
302
+ "model.layers.38.self_attn.o_proj.weight": "model-00004-of-00005.safetensors",
303
+ "model.layers.38.self_attn.q_proj.weight": "model-00004-of-00005.safetensors",
304
+ "model.layers.38.self_attn.v_proj.weight": "model-00004-of-00005.safetensors",
305
+ "model.layers.39.input_layernorm.weight": "model-00004-of-00005.safetensors",
306
+ "model.layers.39.mlp.down_proj.weight": "model-00004-of-00005.safetensors",
307
+ "model.layers.39.mlp.gate_proj.weight": "model-00004-of-00005.safetensors",
308
+ "model.layers.39.mlp.up_proj.weight": "model-00004-of-00005.safetensors",
309
+ "model.layers.39.post_attention_layernorm.weight": "model-00004-of-00005.safetensors",
310
+ "model.layers.39.self_attn.k_proj.weight": "model-00004-of-00005.safetensors",
311
+ "model.layers.39.self_attn.o_proj.weight": "model-00004-of-00005.safetensors",
312
+ "model.layers.39.self_attn.q_proj.weight": "model-00004-of-00005.safetensors",
313
+ "model.layers.39.self_attn.v_proj.weight": "model-00004-of-00005.safetensors",
314
+ "model.layers.4.input_layernorm.weight": "model-00001-of-00005.safetensors",
315
+ "model.layers.4.mlp.down_proj.weight": "model-00001-of-00005.safetensors",
316
+ "model.layers.4.mlp.gate_proj.weight": "model-00001-of-00005.safetensors",
317
+ "model.layers.4.mlp.up_proj.weight": "model-00001-of-00005.safetensors",
318
+ "model.layers.4.post_attention_layernorm.weight": "model-00001-of-00005.safetensors",
319
+ "model.layers.4.self_attn.k_proj.weight": "model-00001-of-00005.safetensors",
320
+ "model.layers.4.self_attn.o_proj.weight": "model-00001-of-00005.safetensors",
321
+ "model.layers.4.self_attn.q_proj.weight": "model-00001-of-00005.safetensors",
322
+ "model.layers.4.self_attn.v_proj.weight": "model-00001-of-00005.safetensors",
323
+ "model.layers.40.input_layernorm.weight": "model-00004-of-00005.safetensors",
324
+ "model.layers.40.mlp.down_proj.weight": "model-00004-of-00005.safetensors",
325
+ "model.layers.40.mlp.gate_proj.weight": "model-00004-of-00005.safetensors",
326
+ "model.layers.40.mlp.up_proj.weight": "model-00004-of-00005.safetensors",
327
+ "model.layers.40.post_attention_layernorm.weight": "model-00004-of-00005.safetensors",
328
+ "model.layers.40.self_attn.k_proj.weight": "model-00004-of-00005.safetensors",
329
+ "model.layers.40.self_attn.o_proj.weight": "model-00004-of-00005.safetensors",
330
+ "model.layers.40.self_attn.q_proj.weight": "model-00004-of-00005.safetensors",
331
+ "model.layers.40.self_attn.v_proj.weight": "model-00004-of-00005.safetensors",
332
+ "model.layers.41.input_layernorm.weight": "model-00004-of-00005.safetensors",
333
+ "model.layers.41.mlp.down_proj.weight": "model-00004-of-00005.safetensors",
334
+ "model.layers.41.mlp.gate_proj.weight": "model-00004-of-00005.safetensors",
335
+ "model.layers.41.mlp.up_proj.weight": "model-00004-of-00005.safetensors",
336
+ "model.layers.41.post_attention_layernorm.weight": "model-00004-of-00005.safetensors",
337
+ "model.layers.41.self_attn.k_proj.weight": "model-00004-of-00005.safetensors",
338
+ "model.layers.41.self_attn.o_proj.weight": "model-00004-of-00005.safetensors",
339
+ "model.layers.41.self_attn.q_proj.weight": "model-00004-of-00005.safetensors",
340
+ "model.layers.41.self_attn.v_proj.weight": "model-00004-of-00005.safetensors",
341
+ "model.layers.42.input_layernorm.weight": "model-00004-of-00005.safetensors",
342
+ "model.layers.42.mlp.down_proj.weight": "model-00004-of-00005.safetensors",
343
+ "model.layers.42.mlp.gate_proj.weight": "model-00004-of-00005.safetensors",
344
+ "model.layers.42.mlp.up_proj.weight": "model-00004-of-00005.safetensors",
345
+ "model.layers.42.post_attention_layernorm.weight": "model-00004-of-00005.safetensors",
346
+ "model.layers.42.self_attn.k_proj.weight": "model-00004-of-00005.safetensors",
347
+ "model.layers.42.self_attn.o_proj.weight": "model-00004-of-00005.safetensors",
348
+ "model.layers.42.self_attn.q_proj.weight": "model-00004-of-00005.safetensors",
349
+ "model.layers.42.self_attn.v_proj.weight": "model-00004-of-00005.safetensors",
350
+ "model.layers.43.input_layernorm.weight": "model-00004-of-00005.safetensors",
351
+ "model.layers.43.mlp.down_proj.weight": "model-00004-of-00005.safetensors",
352
+ "model.layers.43.mlp.gate_proj.weight": "model-00004-of-00005.safetensors",
353
+ "model.layers.43.mlp.up_proj.weight": "model-00004-of-00005.safetensors",
354
+ "model.layers.43.post_attention_layernorm.weight": "model-00004-of-00005.safetensors",
355
+ "model.layers.43.self_attn.k_proj.weight": "model-00004-of-00005.safetensors",
356
+ "model.layers.43.self_attn.o_proj.weight": "model-00004-of-00005.safetensors",
357
+ "model.layers.43.self_attn.q_proj.weight": "model-00004-of-00005.safetensors",
358
+ "model.layers.43.self_attn.v_proj.weight": "model-00004-of-00005.safetensors",
359
+ "model.layers.44.input_layernorm.weight": "model-00005-of-00005.safetensors",
360
+ "model.layers.44.mlp.down_proj.weight": "model-00005-of-00005.safetensors",
361
+ "model.layers.44.mlp.gate_proj.weight": "model-00004-of-00005.safetensors",
362
+ "model.layers.44.mlp.up_proj.weight": "model-00004-of-00005.safetensors",
363
+ "model.layers.44.post_attention_layernorm.weight": "model-00005-of-00005.safetensors",
364
+ "model.layers.44.self_attn.k_proj.weight": "model-00004-of-00005.safetensors",
365
+ "model.layers.44.self_attn.o_proj.weight": "model-00004-of-00005.safetensors",
366
+ "model.layers.44.self_attn.q_proj.weight": "model-00004-of-00005.safetensors",
367
+ "model.layers.44.self_attn.v_proj.weight": "model-00004-of-00005.safetensors",
368
+ "model.layers.45.input_layernorm.weight": "model-00005-of-00005.safetensors",
369
+ "model.layers.45.mlp.down_proj.weight": "model-00005-of-00005.safetensors",
370
+ "model.layers.45.mlp.gate_proj.weight": "model-00005-of-00005.safetensors",
371
+ "model.layers.45.mlp.up_proj.weight": "model-00005-of-00005.safetensors",
372
+ "model.layers.45.post_attention_layernorm.weight": "model-00005-of-00005.safetensors",
373
+ "model.layers.45.self_attn.k_proj.weight": "model-00005-of-00005.safetensors",
374
+ "model.layers.45.self_attn.o_proj.weight": "model-00005-of-00005.safetensors",
375
+ "model.layers.45.self_attn.q_proj.weight": "model-00005-of-00005.safetensors",
376
+ "model.layers.45.self_attn.v_proj.weight": "model-00005-of-00005.safetensors",
377
+ "model.layers.46.input_layernorm.weight": "model-00005-of-00005.safetensors",
378
+ "model.layers.46.mlp.down_proj.weight": "model-00005-of-00005.safetensors",
379
+ "model.layers.46.mlp.gate_proj.weight": "model-00005-of-00005.safetensors",
380
+ "model.layers.46.mlp.up_proj.weight": "model-00005-of-00005.safetensors",
381
+ "model.layers.46.post_attention_layernorm.weight": "model-00005-of-00005.safetensors",
382
+ "model.layers.46.self_attn.k_proj.weight": "model-00005-of-00005.safetensors",
383
+ "model.layers.46.self_attn.o_proj.weight": "model-00005-of-00005.safetensors",
384
+ "model.layers.46.self_attn.q_proj.weight": "model-00005-of-00005.safetensors",
385
+ "model.layers.46.self_attn.v_proj.weight": "model-00005-of-00005.safetensors",
386
+ "model.layers.47.input_layernorm.weight": "model-00005-of-00005.safetensors",
387
+ "model.layers.47.mlp.down_proj.weight": "model-00005-of-00005.safetensors",
388
+ "model.layers.47.mlp.gate_proj.weight": "model-00005-of-00005.safetensors",
389
+ "model.layers.47.mlp.up_proj.weight": "model-00005-of-00005.safetensors",
390
+ "model.layers.47.post_attention_layernorm.weight": "model-00005-of-00005.safetensors",
391
+ "model.layers.47.self_attn.k_proj.weight": "model-00005-of-00005.safetensors",
392
+ "model.layers.47.self_attn.o_proj.weight": "model-00005-of-00005.safetensors",
393
+ "model.layers.47.self_attn.q_proj.weight": "model-00005-of-00005.safetensors",
394
+ "model.layers.47.self_attn.v_proj.weight": "model-00005-of-00005.safetensors",
395
+ "model.layers.48.input_layernorm.weight": "model-00005-of-00005.safetensors",
396
+ "model.layers.48.mlp.down_proj.weight": "model-00005-of-00005.safetensors",
397
+ "model.layers.48.mlp.gate_proj.weight": "model-00005-of-00005.safetensors",
398
+ "model.layers.48.mlp.up_proj.weight": "model-00005-of-00005.safetensors",
399
+ "model.layers.48.post_attention_layernorm.weight": "model-00005-of-00005.safetensors",
400
+ "model.layers.48.self_attn.k_proj.weight": "model-00005-of-00005.safetensors",
401
+ "model.layers.48.self_attn.o_proj.weight": "model-00005-of-00005.safetensors",
402
+ "model.layers.48.self_attn.q_proj.weight": "model-00005-of-00005.safetensors",
403
+ "model.layers.48.self_attn.v_proj.weight": "model-00005-of-00005.safetensors",
404
+ "model.layers.49.input_layernorm.weight": "model-00005-of-00005.safetensors",
405
+ "model.layers.49.mlp.down_proj.weight": "model-00005-of-00005.safetensors",
406
+ "model.layers.49.mlp.gate_proj.weight": "model-00005-of-00005.safetensors",
407
+ "model.layers.49.mlp.up_proj.weight": "model-00005-of-00005.safetensors",
408
+ "model.layers.49.post_attention_layernorm.weight": "model-00005-of-00005.safetensors",
409
+ "model.layers.49.self_attn.k_proj.weight": "model-00005-of-00005.safetensors",
410
+ "model.layers.49.self_attn.o_proj.weight": "model-00005-of-00005.safetensors",
411
+ "model.layers.49.self_attn.q_proj.weight": "model-00005-of-00005.safetensors",
412
+ "model.layers.49.self_attn.v_proj.weight": "model-00005-of-00005.safetensors",
413
+ "model.layers.5.input_layernorm.weight": "model-00001-of-00005.safetensors",
414
+ "model.layers.5.mlp.down_proj.weight": "model-00001-of-00005.safetensors",
415
+ "model.layers.5.mlp.gate_proj.weight": "model-00001-of-00005.safetensors",
416
+ "model.layers.5.mlp.up_proj.weight": "model-00001-of-00005.safetensors",
417
+ "model.layers.5.post_attention_layernorm.weight": "model-00001-of-00005.safetensors",
418
+ "model.layers.5.self_attn.k_proj.weight": "model-00001-of-00005.safetensors",
419
+ "model.layers.5.self_attn.o_proj.weight": "model-00001-of-00005.safetensors",
420
+ "model.layers.5.self_attn.q_proj.weight": "model-00001-of-00005.safetensors",
421
+ "model.layers.5.self_attn.v_proj.weight": "model-00001-of-00005.safetensors",
422
+ "model.layers.6.input_layernorm.weight": "model-00001-of-00005.safetensors",
423
+ "model.layers.6.mlp.down_proj.weight": "model-00001-of-00005.safetensors",
424
+ "model.layers.6.mlp.gate_proj.weight": "model-00001-of-00005.safetensors",
425
+ "model.layers.6.mlp.up_proj.weight": "model-00001-of-00005.safetensors",
426
+ "model.layers.6.post_attention_layernorm.weight": "model-00001-of-00005.safetensors",
427
+ "model.layers.6.self_attn.k_proj.weight": "model-00001-of-00005.safetensors",
428
+ "model.layers.6.self_attn.o_proj.weight": "model-00001-of-00005.safetensors",
429
+ "model.layers.6.self_attn.q_proj.weight": "model-00001-of-00005.safetensors",
430
+ "model.layers.6.self_attn.v_proj.weight": "model-00001-of-00005.safetensors",
431
+ "model.layers.7.input_layernorm.weight": "model-00001-of-00005.safetensors",
432
+ "model.layers.7.mlp.down_proj.weight": "model-00001-of-00005.safetensors",
433
+ "model.layers.7.mlp.gate_proj.weight": "model-00001-of-00005.safetensors",
434
+ "model.layers.7.mlp.up_proj.weight": "model-00001-of-00005.safetensors",
435
+ "model.layers.7.post_attention_layernorm.weight": "model-00001-of-00005.safetensors",
436
+ "model.layers.7.self_attn.k_proj.weight": "model-00001-of-00005.safetensors",
437
+ "model.layers.7.self_attn.o_proj.weight": "model-00001-of-00005.safetensors",
438
+ "model.layers.7.self_attn.q_proj.weight": "model-00001-of-00005.safetensors",
439
+ "model.layers.7.self_attn.v_proj.weight": "model-00001-of-00005.safetensors",
440
+ "model.layers.8.input_layernorm.weight": "model-00001-of-00005.safetensors",
441
+ "model.layers.8.mlp.down_proj.weight": "model-00001-of-00005.safetensors",
442
+ "model.layers.8.mlp.gate_proj.weight": "model-00001-of-00005.safetensors",
443
+ "model.layers.8.mlp.up_proj.weight": "model-00001-of-00005.safetensors",
444
+ "model.layers.8.post_attention_layernorm.weight": "model-00001-of-00005.safetensors",
445
+ "model.layers.8.self_attn.k_proj.weight": "model-00001-of-00005.safetensors",
446
+ "model.layers.8.self_attn.o_proj.weight": "model-00001-of-00005.safetensors",
447
+ "model.layers.8.self_attn.q_proj.weight": "model-00001-of-00005.safetensors",
448
+ "model.layers.8.self_attn.v_proj.weight": "model-00001-of-00005.safetensors",
449
+ "model.layers.9.input_layernorm.weight": "model-00001-of-00005.safetensors",
450
+ "model.layers.9.mlp.down_proj.weight": "model-00001-of-00005.safetensors",
451
+ "model.layers.9.mlp.gate_proj.weight": "model-00001-of-00005.safetensors",
452
+ "model.layers.9.mlp.up_proj.weight": "model-00001-of-00005.safetensors",
453
+ "model.layers.9.post_attention_layernorm.weight": "model-00001-of-00005.safetensors",
454
+ "model.layers.9.self_attn.k_proj.weight": "model-00001-of-00005.safetensors",
455
+ "model.layers.9.self_attn.o_proj.weight": "model-00001-of-00005.safetensors",
456
+ "model.layers.9.self_attn.q_proj.weight": "model-00001-of-00005.safetensors",
457
+ "model.layers.9.self_attn.v_proj.weight": "model-00001-of-00005.safetensors",
458
+ "model.norm.weight": "model-00005-of-00005.safetensors"
459
+ }
460
+ }
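The `weight_map` above assigns each tensor name to one of five shard files, and a single decoder layer can straddle two shards (layer 44 has tensors in both `model-00004-of-00005.safetensors` and `model-00005-of-00005.safetensors`). A minimal sketch of how such a map could be inspected follows; it uses only a hand-copied excerpt of the entries above, and the helper name `shards_per_layer` is hypothetical rather than part of any library.

```python
import json
import re
from collections import defaultdict

# Excerpt copied from the weight_map above; the real index has many more entries.
index_excerpt = json.loads("""
{
  "weight_map": {
    "model.layers.43.self_attn.v_proj.weight": "model-00004-of-00005.safetensors",
    "model.layers.44.input_layernorm.weight": "model-00005-of-00005.safetensors",
    "model.layers.44.mlp.down_proj.weight": "model-00005-of-00005.safetensors",
    "model.layers.44.mlp.gate_proj.weight": "model-00004-of-00005.safetensors",
    "model.layers.44.self_attn.q_proj.weight": "model-00004-of-00005.safetensors",
    "model.layers.45.mlp.up_proj.weight": "model-00005-of-00005.safetensors",
    "model.norm.weight": "model-00005-of-00005.safetensors"
  }
}
""")


def shards_per_layer(weight_map):
    """Map each decoder layer number to the set of shard files holding its tensors.

    Hypothetical helper for illustration only.
    """
    layers = defaultdict(set)
    for name, shard in weight_map.items():
        match = re.match(r"model\.layers\.(\d+)\.", name)
        if match:  # skips non-layer tensors such as model.norm.weight
            layers[int(match.group(1))].add(shard)
    return dict(layers)


layer_shards = shards_per_layer(index_excerpt["weight_map"])
# Layers whose tensors are spread over more than one shard file.
split_layers = sorted(n for n, shards in layer_shards.items() if len(shards) > 1)
print(split_layers)
```

A loader that reads one layer at a time would need to open both shard files for any layer reported in `split_layers`.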
output-00001-of-00002.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:a2c3da6e0aaf9d7f3aa13804303d106bf88f0db52e0f3fe56d847b86bf418cb3
3
+ size 8537928016
output-00002-of-00002.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:b1468026564245867540a1ea4ea786eae9121afd39ffefd90bdfca8021d600c4
3
+ size 1966873608
special_tokens_map.json ADDED
@@ -0,0 +1,160 @@
1
+ {
2
+ "additional_special_tokens": [
3
+ "<|im_start|>",
4
+ "<|im_end|>",
5
+ "<|function_list|>",
6
+ "<|function_output|>",
7
+ "<|function_call|>",
8
+ "<|control_6|>",
9
+ "<|control_7|>",
10
+ "<|control_8|>",
11
+ "<|control_9|>",
12
+ "<|control_10|>",
13
+ "<|control_11|>",
14
+ "<|control_12|>",
15
+ "<|control_13|>",
16
+ "<|control_14|>",
17
+ "<|control_15|>",
18
+ "<|control_16|>",
19
+ "<|control_17|>",
20
+ "<|control_18|>",
21
+ "<|control_19|>",
22
+ "<|control_20|>",
23
+ "<|control_21|>",
24
+ "<|control_22|>",
25
+ "<|control_23|>",
26
+ "<|control_24|>",
27
+ "<|control_25|>",
28
+ "<|control_26|>",
29
+ "<|control_27|>",
30
+ "<|control_28|>",
31
+ "<|control_29|>",
32
+ "<|control_30|>",
33
+ "<|control_31|>",
34
+ "<|control_32|>",
35
+ "<|control_33|>",
36
+ "<|control_34|>",
37
+ "<|control_35|>",
38
+ "<|control_36|>",
39
+ "<|control_37|>",
40
+ "<|control_38|>",
41
+ "<|control_39|>",
42
+ "<|control_40|>",
43
+ "<|control_41|>",
44
+ "<|control_42|>",
45
+ "<|control_43|>",
46
+ "<|control_44|>",
47
+ "<|control_45|>",
48
+ "<|control_46|>",
49
+ "<|control_47|>",
50
+ "<|control_48|>",
51
+ "<|control_49|>",
52
+ "<|control_50|>",
53
+ "<|control_51|>",
54
+ "<|control_52|>",
55
+ "<|control_53|>",
56
+ "<|control_54|>",
57
+ "<|control_55|>",
58
+ "<|control_56|>",
59
+ "<|control_57|>",
60
+ "<|control_58|>",
61
+ "<|control_59|>",
62
+ "<|control_60|>",
63
+ "<|control_61|>",
64
+ "<|control_62|>",
65
+ "<|control_63|>",
66
+ "<|control_64|>",
67
+ "<|control_65|>",
68
+ "<|control_66|>",
69
+ "<|control_67|>",
70
+ "<|control_68|>",
71
+ "<|control_69|>",
72
+ "<|control_70|>",
73
+ "<|control_71|>",
74
+ "<|control_72|>",
75
+ "<|control_73|>",
76
+ "<|control_74|>",
77
+ "<|control_75|>",
78
+ "<|control_76|>",
79
+ "<|control_77|>",
80
+ "<|control_78|>",
81
+ "<|control_79|>",
82
+ "<|control_80|>",
83
+ "<|control_81|>",
84
+ "<|control_82|>",
85
+ "<|control_83|>",
86
+ "<|control_84|>",
87
+ "<|control_85|>",
88
+ "<|control_86|>",
89
+ "<|control_87|>",
90
+ "<|control_88|>",
91
+ "<|control_89|>",
92
+ "<|control_90|>",
93
+ "<|control_91|>",
94
+ "<|control_92|>",
95
+ "<|control_93|>",
96
+ "<|control_94|>",
97
+ "<|control_95|>",
98
+ "<|control_96|>",
99
+ "<|control_97|>",
100
+ "<|control_98|>",
101
+ "<|control_99|>",
102
+ "<|control_100|>",
103
+ "<|control_101|>",
104
+ "<|control_102|>",
105
+ "<|control_103|>",
106
+ "<|control_104|>",
107
+ "<|control_105|>",
108
+ "<|control_106|>",
109
+ "<|control_107|>",
110
+ "<|control_108|>",
111
+ "<|control_109|>",
112
+ "<|control_110|>",
113
+ "<|control_111|>",
114
+ "<|control_112|>",
115
+ "<|control_113|>",
116
+ "<|control_114|>",
117
+ "<|control_115|>",
118
+ "<|control_116|>",
119
+ "<|control_117|>",
120
+ "<|control_118|>",
121
+ "<|control_119|>",
122
+ "<|control_120|>",
123
+ "<|control_121|>",
124
+ "<|control_122|>",
125
+ "<|control_123|>",
126
+ "<|control_124|>",
127
+ "<|control_125|>",
128
+ "<|control_126|>",
129
+ "<|control_127|>",
130
+ "<|control_128|>"
131
+ ],
132
+ "bos_token": {
133
+ "content": "<s>",
134
+ "lstrip": false,
135
+ "normalized": false,
136
+ "rstrip": false,
137
+ "single_word": false
138
+ },
139
+ "eos_token": {
140
+ "content": "<|im_end|>",
141
+ "lstrip": false,
142
+ "normalized": false,
143
+ "rstrip": false,
144
+ "single_word": false
145
+ },
146
+ "pad_token": {
147
+ "content": "</s>",
148
+ "lstrip": false,
149
+ "normalized": false,
150
+ "rstrip": false,
151
+ "single_word": false
152
+ },
153
+ "unk_token": {
154
+ "content": "<unk>",
155
+ "lstrip": false,
156
+ "normalized": false,
157
+ "rstrip": false,
158
+ "single_word": false
159
+ }
160
+ }
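In the map above, the named tokens (`bos_token`, `eos_token`, `pad_token`, `unk_token`) are objects with a `content` field, while `additional_special_tokens` is a plain list of strings. Note that `eos_token` is `<|im_end|>` and `</s>` is reused as the padding token. The sketch below reads those values from an abbreviated copy of the JSON; the helper `token_text` is a hypothetical name, and accepting a bare string as well as an object is an assumption made for robustness, not something this file requires.

```python
import json

# Abbreviated copy of special_tokens_map.json as shown above
# (the list of <|control_N|> tokens is truncated for brevity).
special_tokens_map = json.loads("""
{
  "additional_special_tokens": ["<|im_start|>", "<|im_end|>", "<|function_list|>"],
  "bos_token": {"content": "<s>", "lstrip": false, "normalized": false,
                "rstrip": false, "single_word": false},
  "eos_token": {"content": "<|im_end|>", "lstrip": false, "normalized": false,
                "rstrip": false, "single_word": false},
  "pad_token": {"content": "</s>", "lstrip": false, "normalized": false,
                "rstrip": false, "single_word": false},
  "unk_token": {"content": "<unk>", "lstrip": false, "normalized": false,
                "rstrip": false, "single_word": false}
}
""")


def token_text(entry):
    """Return the token string for an entry given as an object or a bare string."""
    return entry["content"] if isinstance(entry, dict) else entry


# Collect only the singular *_token keys; the additional_special_tokens list is skipped.
named_tokens = {
    key: token_text(value)
    for key, value in special_tokens_map.items()
    if key.endswith("_token")
}
print(named_tokens)
```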
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer.model ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:dadfd56d766715c61d2ef780a525ab43b8e6da4de6865bda3d95fdef5e134055
3
+ size 493443
tokenizer_config.json ADDED
@@ -0,0 +1,1197 @@
1
+ {
2
+ "add_bos_token": false,
3
+ "add_eos_token": false,
4
+ "add_prefix_space": true,
5
+ "added_tokens_decoder": {
6
+ "0": {
7
+ "content": "<unk>",
8
+ "lstrip": false,
9
+ "normalized": false,
10
+ "rstrip": false,
11
+ "single_word": false,
12
+ "special": true
13
+ },
14
+ "1": {
15
+ "content": "<s>",
16
+ "lstrip": false,
17
+ "normalized": false,
18
+ "rstrip": false,
19
+ "single_word": false,
20
+ "special": true
21
+ },
22
+ "2": {
23
+ "content": "</s>",
24
+ "lstrip": false,
25
+ "normalized": false,
26
+ "rstrip": false,
27
+ "single_word": false,
28
+ "special": true
29
+ },
30
+ "32000": {
31
+ "content": "<|im_start|>",
32
+ "lstrip": false,
33
+ "normalized": false,
34
+ "rstrip": false,
35
+ "single_word": false,
36
+ "special": true
37
+ },
38
+ "32001": {
39
+ "content": "<|im_end|>",
40
+ "lstrip": false,
41
+ "normalized": false,
42
+ "rstrip": false,
43
+ "single_word": false,
44
+ "special": true
45
+ },
46
+ "32002": {
47
+ "content": "<|function_list|>",
48
+ "lstrip": false,
49
+ "normalized": false,
50
+ "rstrip": false,
51
+ "single_word": false,
52
+ "special": true
53
+ },
54
+ "32003": {
55
+ "content": "<|function_output|>",
56
+ "lstrip": false,
57
+ "normalized": false,
58
+ "rstrip": false,
59
+ "single_word": false,
60
+ "special": true
61
+ },
62
+ "32004": {
63
+ "content": "<|function_call|>",
64
+ "lstrip": false,
65
+ "normalized": false,
66
+ "rstrip": false,
67
+ "single_word": false,
68
+ "special": true
69
+ },
70
+ "32005": {
71
+ "content": "<|control_6|>",
72
+ "lstrip": false,
73
+ "normalized": false,
74
+ "rstrip": false,
75
+ "single_word": false,
76
+ "special": true
77
+ },
78
+ "32006": {
79
+ "content": "<|control_7|>",
80
+ "lstrip": false,
81
+ "normalized": false,
82
+ "rstrip": false,
83
+ "single_word": false,
84
+ "special": true
85
+ },
86
+ "32007": {
87
+ "content": "<|control_8|>",
88
+ "lstrip": false,
89
+ "normalized": false,
90
+ "rstrip": false,
91
+ "single_word": false,
92
+ "special": true
93
+ },
94
+ "32008": {
95
+ "content": "<|control_9|>",
96
+ "lstrip": false,
97
+ "normalized": false,
98
+ "rstrip": false,
99
+ "single_word": false,
100
+ "special": true
101
+ },
102
+ "32009": {
103
+ "content": "<|control_10|>",
104
+ "lstrip": false,
105
+ "normalized": false,
106
+ "rstrip": false,
107
+ "single_word": false,
108
+ "special": true
109
+ },
110
+ "32010": {
111
+ "content": "<|control_11|>",
112
+ "lstrip": false,
113
+ "normalized": false,
114
+ "rstrip": false,
115
+ "single_word": false,
116
+ "special": true
117
+ },
118
+ "32011": {
119
+ "content": "<|control_12|>",
120
+ "lstrip": false,
121
+ "normalized": false,
122
+ "rstrip": false,
123
+ "single_word": false,
124
+ "special": true
125
+ },
126
+ "32012": {
127
+ "content": "<|control_13|>",
128
+ "lstrip": false,
129
+ "normalized": false,
130
+ "rstrip": false,
131
+ "single_word": false,
132
+ "special": true
133
+ },
134
+ "32013": {
135
+ "content": "<|control_14|>",
136
+ "lstrip": false,
137
+ "normalized": false,
138
+ "rstrip": false,
139
+ "single_word": false,
140
+ "special": true
141
+ },
142
+ "32014": {
143
+ "content": "<|control_15|>",
144
+ "lstrip": false,
145
+ "normalized": false,
146
+ "rstrip": false,
147
+ "single_word": false,
148
+ "special": true
149
+ },
150
+ "32015": {
151
+ "content": "<|control_16|>",
152
+ "lstrip": false,
153
+ "normalized": false,
154
+ "rstrip": false,
155
+ "single_word": false,
156
+ "special": true
157
+ },
158
+ "32016": {
159
+ "content": "<|control_17|>",
160
+ "lstrip": false,
161
+ "normalized": false,
162
+ "rstrip": false,
163
+ "single_word": false,
164
+ "special": true
165
+ },
166
+ "32017": {
167
+ "content": "<|control_18|>",
168
+ "lstrip": false,
169
+ "normalized": false,
170
+ "rstrip": false,
171
+ "single_word": false,
172
+ "special": true
173
+ },
174
+ "32018": {
175
+ "content": "<|control_19|>",
176
+ "lstrip": false,
177
+ "normalized": false,
178
+ "rstrip": false,
179
+ "single_word": false,
180
+ "special": true
181
+ },
182
+ "32019": {
183
+ "content": "<|control_20|>",
184
+ "lstrip": false,
185
+ "normalized": false,
186
+ "rstrip": false,
187
+ "single_word": false,
188
+ "special": true
189
+ },
190
+ "32020": {
191
+ "content": "<|control_21|>",
192
+ "lstrip": false,
193
+ "normalized": false,
194
+ "rstrip": false,
195
+ "single_word": false,
196
+ "special": true
197
+ },
198
+ "32021": {
199
+ "content": "<|control_22|>",
200
+ "lstrip": false,
201
+ "normalized": false,
202
+ "rstrip": false,
203
+ "single_word": false,
204
+ "special": true
205
+ },
206
+ "32022": {
207
+ "content": "<|control_23|>",
208
+ "lstrip": false,
209
+ "normalized": false,
210
+ "rstrip": false,
211
+ "single_word": false,
212
+ "special": true
213
+ },
214
+ "32023": {
215
+ "content": "<|control_24|>",
216
+ "lstrip": false,
217
+ "normalized": false,
218
+ "rstrip": false,
219
+ "single_word": false,
220
+ "special": true
221
+ },
222
+ "32024": {
223
+ "content": "<|control_25|>",
224
+ "lstrip": false,
225
+ "normalized": false,
226
+ "rstrip": false,
227
+ "single_word": false,
228
+ "special": true
229
+ },
230
+ "32025": {
231
+ "content": "<|control_26|>",
232
+ "lstrip": false,
233
+ "normalized": false,
234
+ "rstrip": false,
235
+ "single_word": false,
236
+ "special": true
237
+ },
238
+ "32026": {
239
+ "content": "<|control_27|>",
240
+ "lstrip": false,
241
+ "normalized": false,
242
+ "rstrip": false,
243
+ "single_word": false,
244
+ "special": true
245
+ },
246
+ "32027": {
247
+ "content": "<|control_28|>",
248
+ "lstrip": false,
249
+ "normalized": false,
250
+ "rstrip": false,
251
+ "single_word": false,
252
+ "special": true
253
+ },
254
+ "32028": {
255
+ "content": "<|control_29|>",
256
+ "lstrip": false,
257
+ "normalized": false,
258
+ "rstrip": false,
259
+ "single_word": false,
260
+ "special": true
261
+ },
262
+ "32029": {
263
+ "content": "<|control_30|>",
264
+ "lstrip": false,
265
+ "normalized": false,
266
+ "rstrip": false,
267
+ "single_word": false,
268
+ "special": true
269
+ },
270
+ "32030": {
271
+ "content": "<|control_31|>",
272
+ "lstrip": false,
273
+ "normalized": false,
274
+ "rstrip": false,
275
+ "single_word": false,
276
+ "special": true
277
+ },
278
+ "32031": {
279
+ "content": "<|control_32|>",
280
+ "lstrip": false,
281
+ "normalized": false,
282
+ "rstrip": false,
283
+ "single_word": false,
284
+ "special": true
285
+ },
286
+ "32032": {
287
+ "content": "<|control_33|>",
288
+ "lstrip": false,
289
+ "normalized": false,
290
+ "rstrip": false,
291
+ "single_word": false,
292
+ "special": true
293
+ },
294
+ "32033": {
295
+ "content": "<|control_34|>",
296
+ "lstrip": false,
297
+ "normalized": false,
298
+ "rstrip": false,
299
+ "single_word": false,
300
+ "special": true
301
+ },
302
+ "32034": {
303
+ "content": "<|control_35|>",
304
+ "lstrip": false,
305
+ "normalized": false,
306
+ "rstrip": false,
307
+ "single_word": false,
308
+ "special": true
309
+ },
310
+ "32035": {
311
+ "content": "<|control_36|>",
312
+ "lstrip": false,
313
+ "normalized": false,
314
+ "rstrip": false,
315
+ "single_word": false,
316
+ "special": true
317
+ },
318
+ "32036": {
319
+ "content": "<|control_37|>",
320
+ "lstrip": false,
321
+ "normalized": false,
322
+ "rstrip": false,
323
+ "single_word": false,
324
+ "special": true
325
+ },
326
+ "32037": {
327
+ "content": "<|control_38|>",
328
+ "lstrip": false,
329
+ "normalized": false,
330
+ "rstrip": false,
331
+ "single_word": false,
332
+ "special": true
333
+ },
334
+ "32038": {
335
+ "content": "<|control_39|>",
336
+ "lstrip": false,
337
+ "normalized": false,
338
+ "rstrip": false,
339
+ "single_word": false,
340
+ "special": true
341
+ },
342
+ "32039": {
343
+ "content": "<|control_40|>",
344
+ "lstrip": false,
345
+ "normalized": false,
346
+ "rstrip": false,
347
+ "single_word": false,
348
+ "special": true
349
+ },
350
+ "32040": {
351
+ "content": "<|control_41|>",
352
+ "lstrip": false,
353
+ "normalized": false,
354
+ "rstrip": false,
355
+ "single_word": false,
356
+ "special": true
357
+ },
358
+ "32041": {
359
+ "content": "<|control_42|>",
360
+ "lstrip": false,
361
+ "normalized": false,
362
+ "rstrip": false,
363
+ "single_word": false,
364
+ "special": true
365
+ },
366
+ "32042": {
367
+ "content": "<|control_43|>",
368
+ "lstrip": false,
369
+ "normalized": false,
370
+ "rstrip": false,
371
+ "single_word": false,
372
+ "special": true
373
+ },
374
+ "32043": {
375
+ "content": "<|control_44|>",
376
+ "lstrip": false,
377
+ "normalized": false,
378
+ "rstrip": false,
379
+ "single_word": false,
380
+ "special": true
381
+ },
382
+ "32044": {
383
+ "content": "<|control_45|>",
384
+ "lstrip": false,
385
+ "normalized": false,
386
+ "rstrip": false,
387
+ "single_word": false,
388
+ "special": true
389
+ },
390
+ "32045": {
391
+ "content": "<|control_46|>",
392
+ "lstrip": false,
393
+ "normalized": false,
394
+ "rstrip": false,
395
+ "single_word": false,
396
+ "special": true
397
+ },
398
+ "32046": {
399
+ "content": "<|control_47|>",
400
+ "lstrip": false,
401
+ "normalized": false,
402
+ "rstrip": false,
403
+ "single_word": false,
404
+ "special": true
405
+ },
406
+ "32047": {
407
+ "content": "<|control_48|>",
408
+ "lstrip": false,
409
+ "normalized": false,
410
+ "rstrip": false,
411
+ "single_word": false,
412
+ "special": true
413
+ },
414
+ "32048": {
415
+ "content": "<|control_49|>",
416
+ "lstrip": false,
417
+ "normalized": false,
418
+ "rstrip": false,
419
+ "single_word": false,
420
+ "special": true
421
+ },
422
+ "32049": {
423
+ "content": "<|control_50|>",
424
+ "lstrip": false,
425
+ "normalized": false,
426
+ "rstrip": false,
427
+ "single_word": false,
428
+ "special": true
429
+ },
430
+ "32050": {
431
+ "content": "<|control_51|>",
432
+ "lstrip": false,
433
+ "normalized": false,
434
+ "rstrip": false,
435
+ "single_word": false,
436
+ "special": true
437
+ },
438
+ "32051": {
439
+ "content": "<|control_52|>",
440
+ "lstrip": false,
441
+ "normalized": false,
442
+ "rstrip": false,
443
+ "single_word": false,
444
+ "special": true
445
+ },
446
+ "32052": {
447
+ "content": "<|control_53|>",
448
+ "lstrip": false,
449
+ "normalized": false,
450
+ "rstrip": false,
451
+ "single_word": false,
452
+ "special": true
453
+ },
454
+ "32053": {
455
+ "content": "<|control_54|>",
456
+ "lstrip": false,
457
+ "normalized": false,
458
+ "rstrip": false,
459
+ "single_word": false,
460
+ "special": true
461
+ },
462
+ "32054": {
463
+ "content": "<|control_55|>",
464
+ "lstrip": false,
465
+ "normalized": false,
466
+ "rstrip": false,
467
+ "single_word": false,
468
+ "special": true
469
+ },
470
+ "32055": {
471
+ "content": "<|control_56|>",
472
+ "lstrip": false,
473
+ "normalized": false,
474
+ "rstrip": false,
475
+ "single_word": false,
476
+ "special": true
477
+ },
478
+ "32056": {
479
+ "content": "<|control_57|>",
480
+ "lstrip": false,
481
+ "normalized": false,
482
+ "rstrip": false,
483
+ "single_word": false,
484
+ "special": true
485
+ },
486
+ "32057": {
487
+ "content": "<|control_58|>",
488
+ "lstrip": false,
489
+ "normalized": false,
490
+ "rstrip": false,
491
+ "single_word": false,
492
+ "special": true
493
+ },
494
+ "32058": {
495
+ "content": "<|control_59|>",
496
+ "lstrip": false,
497
+ "normalized": false,
498
+ "rstrip": false,
499
+ "single_word": false,
500
+ "special": true
501
+ },
502
+ "32059": {
503
+ "content": "<|control_60|>",
504
+ "lstrip": false,
505
+ "normalized": false,
506
+ "rstrip": false,
507
+ "single_word": false,
508
+ "special": true
509
+ },
510
+ "32060": {
511
+ "content": "<|control_61|>",
512
+ "lstrip": false,
513
+ "normalized": false,
514
+ "rstrip": false,
515
+ "single_word": false,
516
+ "special": true
517
+ },
518
+ "32061": {
519
+ "content": "<|control_62|>",
520
+ "lstrip": false,
521
+ "normalized": false,
522
+ "rstrip": false,
523
+ "single_word": false,
524
+ "special": true
525
+ },
526
+ "32062": {
527
+ "content": "<|control_63|>",
528
+ "lstrip": false,
529
+ "normalized": false,
530
+ "rstrip": false,
531
+ "single_word": false,
532
+ "special": true
533
+ },
534
+ "32063": {
535
+ "content": "<|control_64|>",
536
+ "lstrip": false,
537
+ "normalized": false,
538
+ "rstrip": false,
539
+ "single_word": false,
540
+ "special": true
541
+ },
542
+ "32064": {
543
+ "content": "<|control_65|>",
544
+ "lstrip": false,
545
+ "normalized": false,
546
+ "rstrip": false,
547
+ "single_word": false,
548
+ "special": true
549
+ },
550
+ "32065": {
551
+ "content": "<|control_66|>",
552
+ "lstrip": false,
553
+ "normalized": false,
554
+ "rstrip": false,
555
+ "single_word": false,
556
+ "special": true
557
+ },
558
+ "32066": {
559
+ "content": "<|control_67|>",
560
+ "lstrip": false,
561
+ "normalized": false,
562
+ "rstrip": false,
563
+ "single_word": false,
564
+ "special": true
565
+ },
566
+ "32067": {
567
+ "content": "<|control_68|>",
568
+ "lstrip": false,
569
+ "normalized": false,
570
+ "rstrip": false,
571
+ "single_word": false,
572
+ "special": true
573
+ },
574
+ "32068": {
575
+ "content": "<|control_69|>",
576
+ "lstrip": false,
577
+ "normalized": false,
578
+ "rstrip": false,
579
+ "single_word": false,
580
+ "special": true
581
+ },
582
+ "32069": {
583
+ "content": "<|control_70|>",
584
+ "lstrip": false,
585
+ "normalized": false,
586
+ "rstrip": false,
587
+ "single_word": false,
588
+ "special": true
589
+ },
590
+ "32070": {
591
+ "content": "<|control_71|>",
592
+ "lstrip": false,
593
+ "normalized": false,
594
+ "rstrip": false,
595
+ "single_word": false,
596
+ "special": true
597
+ },
598
+ "32071": {
599
+ "content": "<|control_72|>",
600
+ "lstrip": false,
601
+ "normalized": false,
602
+ "rstrip": false,
603
+ "single_word": false,
604
+ "special": true
605
+ },
606
+ "32072": {
607
+ "content": "<|control_73|>",
608
+ "lstrip": false,
609
+ "normalized": false,
610
+ "rstrip": false,
611
+ "single_word": false,
612
+ "special": true
613
+ },
614
+ "32073": {
615
+ "content": "<|control_74|>",
616
+ "lstrip": false,
617
+ "normalized": false,
618
+ "rstrip": false,
619
+ "single_word": false,
620
+ "special": true
621
+ },
622
+ "32074": {
623
+ "content": "<|control_75|>",
624
+ "lstrip": false,
625
+ "normalized": false,
626
+ "rstrip": false,
627
+ "single_word": false,
628
+ "special": true
629
+ },
630
+ "32075": {
631
+ "content": "<|control_76|>",
632
+ "lstrip": false,
633
+ "normalized": false,
634
+ "rstrip": false,
635
+ "single_word": false,
636
+ "special": true
637
+ },
638
+ "32076": {
639
+ "content": "<|control_77|>",
640
+ "lstrip": false,
641
+ "normalized": false,
642
+ "rstrip": false,
643
+ "single_word": false,
644
+ "special": true
645
+ },
646
+ "32077": {
647
+ "content": "<|control_78|>",
648
+ "lstrip": false,
649
+ "normalized": false,
650
+ "rstrip": false,
651
+ "single_word": false,
652
+ "special": true
653
+ },
654
+ "32078": {
655
+ "content": "<|control_79|>",
656
+ "lstrip": false,
657
+ "normalized": false,
658
+ "rstrip": false,
659
+ "single_word": false,
660
+ "special": true
661
+ },
662
+ "32079": {
663
+ "content": "<|control_80|>",
664
+ "lstrip": false,
665
+ "normalized": false,
666
+ "rstrip": false,
667
+ "single_word": false,
668
+ "special": true
669
+ },
670
+ "32080": {
671
+ "content": "<|control_81|>",
672
+ "lstrip": false,
673
+ "normalized": false,
674
+ "rstrip": false,
675
+ "single_word": false,
676
+ "special": true
677
+ },
678
+ "32081": {
679
+ "content": "<|control_82|>",
680
+ "lstrip": false,
681
+ "normalized": false,
682
+ "rstrip": false,
683
+ "single_word": false,
684
+ "special": true
685
+ },
686
+ "32082": {
687
+ "content": "<|control_83|>",
688
+ "lstrip": false,
689
+ "normalized": false,
690
+ "rstrip": false,
691
+ "single_word": false,
692
+ "special": true
693
+ },
694
+ "32083": {
695
+ "content": "<|control_84|>",
696
+ "lstrip": false,
697
+ "normalized": false,
698
+ "rstrip": false,
699
+ "single_word": false,
700
+ "special": true
701
+ },
702
+ "32084": {
703
+ "content": "<|control_85|>",
704
+ "lstrip": false,
705
+ "normalized": false,
706
+ "rstrip": false,
707
+ "single_word": false,
708
+ "special": true
709
+ },
710
+ "32085": {
711
+ "content": "<|control_86|>",
712
+ "lstrip": false,
713
+ "normalized": false,
714
+ "rstrip": false,
715
+ "single_word": false,
716
+ "special": true
717
+ },
718
+ "32086": {
719
+ "content": "<|control_87|>",
720
+ "lstrip": false,
721
+ "normalized": false,
722
+ "rstrip": false,
723
+ "single_word": false,
724
+ "special": true
725
+ },
726
+ "32087": {
727
+ "content": "<|control_88|>",
728
+ "lstrip": false,
729
+ "normalized": false,
730
+ "rstrip": false,
731
+ "single_word": false,
732
+ "special": true
733
+ },
734
+ "32088": {
735
+ "content": "<|control_89|>",
736
+ "lstrip": false,
737
+ "normalized": false,
738
+ "rstrip": false,
739
+ "single_word": false,
740
+ "special": true
741
+ },
742
+ "32089": {
743
+ "content": "<|control_90|>",
744
+ "lstrip": false,
745
+ "normalized": false,
746
+ "rstrip": false,
747
+ "single_word": false,
748
+ "special": true
749
+ },
750
+ "32090": {
751
+ "content": "<|control_91|>",
752
+ "lstrip": false,
753
+ "normalized": false,
754
+ "rstrip": false,
755
+ "single_word": false,
756
+ "special": true
757
+ },
758
+ "32091": {
759
+ "content": "<|control_92|>",
760
+ "lstrip": false,
761
+ "normalized": false,
762
+ "rstrip": false,
763
+ "single_word": false,
764
+ "special": true
765
+ },
766
+ "32092": {
767
+ "content": "<|control_93|>",
768
+ "lstrip": false,
769
+ "normalized": false,
770
+ "rstrip": false,
771
+ "single_word": false,
772
+ "special": true
773
+ },
774
+ "32093": {
775
+ "content": "<|control_94|>",
776
+ "lstrip": false,
777
+ "normalized": false,
778
+ "rstrip": false,
779
+ "single_word": false,
780
+ "special": true
781
+ },
782
+ "32094": {
783
+ "content": "<|control_95|>",
784
+ "lstrip": false,
785
+ "normalized": false,
786
+ "rstrip": false,
787
+ "single_word": false,
788
+ "special": true
789
+ },
790
+ "32095": {
791
+ "content": "<|control_96|>",
792
+ "lstrip": false,
793
+ "normalized": false,
794
+ "rstrip": false,
795
+ "single_word": false,
796
+ "special": true
797
+ },
798
+ "32096": {
799
+ "content": "<|control_97|>",
800
+ "lstrip": false,
801
+ "normalized": false,
802
+ "rstrip": false,
803
+ "single_word": false,
804
+ "special": true
805
+ },
806
+ "32097": {
807
+ "content": "<|control_98|>",
808
+ "lstrip": false,
809
+ "normalized": false,
810
+ "rstrip": false,
811
+ "single_word": false,
812
+ "special": true
813
+ },
814
+ "32098": {
815
+ "content": "<|control_99|>",
816
+ "lstrip": false,
817
+ "normalized": false,
818
+ "rstrip": false,
819
+ "single_word": false,
820
+ "special": true
821
+ },
822
+ "32099": {
823
+ "content": "<|control_100|>",
824
+ "lstrip": false,
825
+ "normalized": false,
826
+ "rstrip": false,
827
+ "single_word": false,
828
+ "special": true
829
+ },
830
+ "32100": {
831
+ "content": "<|control_101|>",
832
+ "lstrip": false,
833
+ "normalized": false,
834
+ "rstrip": false,
835
+ "single_word": false,
836
+ "special": true
837
+ },
838
+ "32101": {
839
+ "content": "<|control_102|>",
840
+ "lstrip": false,
841
+ "normalized": false,
842
+ "rstrip": false,
843
+ "single_word": false,
844
+ "special": true
845
+ },
846
+ "32102": {
847
+ "content": "<|control_103|>",
848
+ "lstrip": false,
849
+ "normalized": false,
850
+ "rstrip": false,
851
+ "single_word": false,
852
+ "special": true
853
+ },
854
+ "32103": {
855
+ "content": "<|control_104|>",
856
+ "lstrip": false,
857
+ "normalized": false,
858
+ "rstrip": false,
859
+ "single_word": false,
860
+ "special": true
861
+ },
862
+ "32104": {
863
+ "content": "<|control_105|>",
864
+ "lstrip": false,
865
+ "normalized": false,
866
+ "rstrip": false,
867
+ "single_word": false,
868
+ "special": true
869
+ },
870
+ "32105": {
871
+ "content": "<|control_106|>",
872
+ "lstrip": false,
873
+ "normalized": false,
874
+ "rstrip": false,
875
+ "single_word": false,
876
+ "special": true
877
+ },
878
+ "32106": {
879
+ "content": "<|control_107|>",
880
+ "lstrip": false,
881
+ "normalized": false,
882
+ "rstrip": false,
883
+ "single_word": false,
884
+ "special": true
885
+ },
886
+ "32107": {
887
+ "content": "<|control_108|>",
888
+ "lstrip": false,
889
+ "normalized": false,
890
+ "rstrip": false,
891
+ "single_word": false,
892
+ "special": true
893
+ },
894
+ "32108": {
895
+ "content": "<|control_109|>",
896
+ "lstrip": false,
897
+ "normalized": false,
898
+ "rstrip": false,
899
+ "single_word": false,
900
+ "special": true
901
+ },
902
+ "32109": {
903
+ "content": "<|control_110|>",
904
+ "lstrip": false,
905
+ "normalized": false,
906
+ "rstrip": false,
907
+ "single_word": false,
908
+ "special": true
909
+ },
910
+ "32110": {
911
+ "content": "<|control_111|>",
912
+ "lstrip": false,
913
+ "normalized": false,
914
+ "rstrip": false,
915
+ "single_word": false,
916
+ "special": true
917
+ },
918
+ "32111": {
919
+ "content": "<|control_112|>",
920
+ "lstrip": false,
921
+ "normalized": false,
922
+ "rstrip": false,
923
+ "single_word": false,
924
+ "special": true
925
+ },
926
+ "32112": {
927
+ "content": "<|control_113|>",
928
+ "lstrip": false,
929
+ "normalized": false,
930
+ "rstrip": false,
931
+ "single_word": false,
932
+ "special": true
933
+ },
934
+ "32113": {
935
+ "content": "<|control_114|>",
936
+ "lstrip": false,
937
+ "normalized": false,
938
+ "rstrip": false,
939
+ "single_word": false,
940
+ "special": true
941
+ },
942
+ "32114": {
943
+ "content": "<|control_115|>",
944
+ "lstrip": false,
945
+ "normalized": false,
946
+ "rstrip": false,
947
+ "single_word": false,
948
+ "special": true
949
+ },
950
+ "32115": {
951
+ "content": "<|control_116|>",
952
+ "lstrip": false,
953
+ "normalized": false,
954
+ "rstrip": false,
955
+ "single_word": false,
956
+ "special": true
957
+ },
958
+ "32116": {
959
+ "content": "<|control_117|>",
960
+ "lstrip": false,
961
+ "normalized": false,
962
+ "rstrip": false,
963
+ "single_word": false,
964
+ "special": true
965
+ },
966
+ "32117": {
967
+ "content": "<|control_118|>",
968
+ "lstrip": false,
969
+ "normalized": false,
970
+ "rstrip": false,
971
+ "single_word": false,
972
+ "special": true
973
+ },
974
+ "32118": {
975
+ "content": "<|control_119|>",
976
+ "lstrip": false,
977
+ "normalized": false,
978
+ "rstrip": false,
979
+ "single_word": false,
980
+ "special": true
981
+ },
982
+ "32119": {
983
+ "content": "<|control_120|>",
984
+ "lstrip": false,
985
+ "normalized": false,
986
+ "rstrip": false,
987
+ "single_word": false,
988
+ "special": true
989
+ },
990
+ "32120": {
991
+ "content": "<|control_121|>",
992
+ "lstrip": false,
993
+ "normalized": false,
994
+ "rstrip": false,
995
+ "single_word": false,
996
+ "special": true
997
+ },
998
+ "32121": {
999
+ "content": "<|control_122|>",
1000
+ "lstrip": false,
1001
+ "normalized": false,
1002
+ "rstrip": false,
1003
+ "single_word": false,
1004
+ "special": true
1005
+ },
1006
+ "32122": {
1007
+ "content": "<|control_123|>",
1008
+ "lstrip": false,
1009
+ "normalized": false,
1010
+ "rstrip": false,
1011
+ "single_word": false,
1012
+ "special": true
1013
+ },
1014
+ "32123": {
1015
+ "content": "<|control_124|>",
1016
+ "lstrip": false,
1017
+ "normalized": false,
1018
+ "rstrip": false,
1019
+ "single_word": false,
1020
+ "special": true
1021
+ },
1022
+ "32124": {
1023
+ "content": "<|control_125|>",
1024
+ "lstrip": false,
1025
+ "normalized": false,
1026
+ "rstrip": false,
1027
+ "single_word": false,
1028
+ "special": true
1029
+ },
1030
+ "32125": {
1031
+ "content": "<|control_126|>",
1032
+ "lstrip": false,
1033
+ "normalized": false,
1034
+ "rstrip": false,
1035
+ "single_word": false,
1036
+ "special": true
1037
+ },
1038
+ "32126": {
1039
+ "content": "<|control_127|>",
1040
+ "lstrip": false,
1041
+ "normalized": false,
1042
+ "rstrip": false,
1043
+ "single_word": false,
1044
+ "special": true
1045
+ },
1046
+ "32127": {
1047
+ "content": "<|control_128|>",
1048
+ "lstrip": false,
1049
+ "normalized": false,
1050
+ "rstrip": false,
1051
+ "single_word": false,
1052
+ "special": true
1053
+ }
1054
+ },
1055
+ "additional_special_tokens": [
1056
+ "<|im_start|>",
1057
+ "<|im_end|>",
1058
+ "<|function_list|>",
1059
+ "<|function_output|>",
1060
+ "<|function_call|>",
1061
+ "<|control_6|>",
1062
+ "<|control_7|>",
1063
+ "<|control_8|>",
1064
+ "<|control_9|>",
1065
+ "<|control_10|>",
1066
+ "<|control_11|>",
1067
+ "<|control_12|>",
1068
+ "<|control_13|>",
1069
+ "<|control_14|>",
1070
+ "<|control_15|>",
1071
+ "<|control_16|>",
1072
+ "<|control_17|>",
1073
+ "<|control_18|>",
1074
+ "<|control_19|>",
1075
+ "<|control_20|>",
1076
+ "<|control_21|>",
1077
+ "<|control_22|>",
1078
+ "<|control_23|>",
1079
+ "<|control_24|>",
1080
+ "<|control_25|>",
1081
+ "<|control_26|>",
1082
+ "<|control_27|>",
1083
+ "<|control_28|>",
1084
+ "<|control_29|>",
1085
+ "<|control_30|>",
1086
+ "<|control_31|>",
1087
+ "<|control_32|>",
1088
+ "<|control_33|>",
1089
+ "<|control_34|>",
1090
+ "<|control_35|>",
1091
+ "<|control_36|>",
1092
+ "<|control_37|>",
1093
+ "<|control_38|>",
1094
+ "<|control_39|>",
1095
+ "<|control_40|>",
1096
+ "<|control_41|>",
1097
+ "<|control_42|>",
1098
+ "<|control_43|>",
1099
+ "<|control_44|>",
1100
+ "<|control_45|>",
1101
+ "<|control_46|>",
1102
+ "<|control_47|>",
1103
+ "<|control_48|>",
1104
+ "<|control_49|>",
1105
+ "<|control_50|>",
1106
+ "<|control_51|>",
1107
+ "<|control_52|>",
1108
+ "<|control_53|>",
1109
+ "<|control_54|>",
1110
+ "<|control_55|>",
1111
+ "<|control_56|>",
1112
+ "<|control_57|>",
1113
+ "<|control_58|>",
1114
+ "<|control_59|>",
1115
+ "<|control_60|>",
1116
+ "<|control_61|>",
1117
+ "<|control_62|>",
1118
+ "<|control_63|>",
1119
+ "<|control_64|>",
1120
+ "<|control_65|>",
1121
+ "<|control_66|>",
1122
+ "<|control_67|>",
1123
+ "<|control_68|>",
1124
+ "<|control_69|>",
1125
+ "<|control_70|>",
1126
+ "<|control_71|>",
1127
+ "<|control_72|>",
1128
+ "<|control_73|>",
1129
+ "<|control_74|>",
1130
+ "<|control_75|>",
1131
+ "<|control_76|>",
1132
+ "<|control_77|>",
1133
+ "<|control_78|>",
1134
+ "<|control_79|>",
1135
+ "<|control_80|>",
1136
+ "<|control_81|>",
1137
+ "<|control_82|>",
1138
+ "<|control_83|>",
1139
+ "<|control_84|>",
1140
+ "<|control_85|>",
1141
+ "<|control_86|>",
1142
+ "<|control_87|>",
1143
+ "<|control_88|>",
1144
+ "<|control_89|>",
1145
+ "<|control_90|>",
1146
+ "<|control_91|>",
1147
+ "<|control_92|>",
1148
+ "<|control_93|>",
1149
+ "<|control_94|>",
1150
+ "<|control_95|>",
1151
+ "<|control_96|>",
1152
+ "<|control_97|>",
1153
+ "<|control_98|>",
1154
+ "<|control_99|>",
1155
+ "<|control_100|>",
1156
+ "<|control_101|>",
1157
+ "<|control_102|>",
1158
+ "<|control_103|>",
1159
+ "<|control_104|>",
1160
+ "<|control_105|>",
1161
+ "<|control_106|>",
1162
+ "<|control_107|>",
1163
+ "<|control_108|>",
1164
+ "<|control_109|>",
1165
+ "<|control_110|>",
1166
+ "<|control_111|>",
1167
+ "<|control_112|>",
1168
+ "<|control_113|>",
1169
+ "<|control_114|>",
1170
+ "<|control_115|>",
1171
+ "<|control_116|>",
1172
+ "<|control_117|>",
1173
+ "<|control_118|>",
1174
+ "<|control_119|>",
1175
+ "<|control_120|>",
1176
+ "<|control_121|>",
1177
+ "<|control_122|>",
1178
+ "<|control_123|>",
1179
+ "<|control_124|>",
1180
+ "<|control_125|>",
1181
+ "<|control_126|>",
1182
+ "<|control_127|>",
1183
+ "<|control_128|>"
1184
+ ],
1185
+ "bos_token": "<s>",
1186
+ "chat_template": "{{bos_token}}{% for message in messages %}{{'<|im_start|>' + message['role'] + '\n' + message['content'] + '<|im_end|>' + '\n'}}{% endfor %}{% if add_generation_prompt %}{{ '<|im_start|>assistant\n' }}{% endif %}",
1187
+ "clean_up_tokenization_spaces": true,
1188
+ "eos_token": "<|im_end|>",
1189
+ "legacy": true,
1190
+ "model_max_length": 1000000000000000019884624838656,
1191
+ "pad_token": "</s>",
1192
+ "sp_model_kwargs": {},
1193
+ "spaces_between_special_tokens": false,
1194
+ "tokenizer_class": "LlamaTokenizer",
1195
+ "unk_token": "<unk>",
1196
+ "use_default_system_prompt": false
1197
+ }
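The `chat_template` entry above is a ChatML-style Jinja template: it emits `bos_token` once, wraps every message in `<|im_start|>` and `<|im_end|>`, and optionally opens an assistant turn. A minimal plain-Python sketch of the same logic follows. The function name `render_chat` is hypothetical and used only for illustration. With the Transformers library, the tokenizer's `apply_chat_template` method renders the template directly.

```python
# Token strings copied from the tokenizer config above.
BOS_TOKEN = "<s>"
IM_START = "<|im_start|>"
IM_END = "<|im_end|>"


def render_chat(messages, add_generation_prompt=False, bos_token=BOS_TOKEN):
    """Mirror the Jinja chat_template in plain Python (illustrative helper)."""
    # {{bos_token}} is emitted once, before any message.
    parts = [bos_token]
    # {% for message in messages %} ... {% endfor %}
    for message in messages:
        parts.append(
            IM_START + message["role"] + "\n" + message["content"] + IM_END + "\n"
        )
    # {% if add_generation_prompt %} opens the assistant turn for generation.
    if add_generation_prompt:
        parts.append(IM_START + "assistant\n")
    return "".join(parts)


if __name__ == "__main__":
    prompt = render_chat(
        [{"role": "user", "content": "Co przedstawia polskie godło?"}],
        add_generation_prompt=True,
    )
    print(prompt)
```

Because `eos_token` is set to `<|im_end|>`, generation is expected to stop when the model closes its own turn with that token.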