adamo1139 committed on
Commit
f516e57
·
verified ·
1 Parent(s): 912be65

Add files using upload-large-folder tool

.gitattributes CHANGED
@@ -33,3 +33,5 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
33
  *.zip filter=lfs diff=lfs merge=lfs -text
34
  *.zst filter=lfs diff=lfs merge=lfs -text
35
  *tfevents* filter=lfs diff=lfs merge=lfs -text
36
+ tekken.json filter=lfs diff=lfs merge=lfs -text
37
+ tokenizer.json filter=lfs diff=lfs merge=lfs -text
README.md ADDED
@@ -0,0 +1,202 @@
1
+ ---
2
+ language:
3
+ - en
4
+ - fr
5
+ - de
6
+ - es
7
+ - it
8
+ - pt
9
+ - zh
10
+ - ja
11
+ - ru
12
+ - ko
13
+ license: apache-2.0
14
+ library_name: vllm
15
+ base_model:
16
+ - mistralai/Mistral-Small-24B-Base-2501
17
+ extra_gated_description: If you want to learn more about how we process your personal
18
+ data, please read our <a href="https://mistral.ai/terms/">Privacy Policy</a>.
19
+ ---
20
+
21
+ # Model Card for Mistral-Small-24B-Instruct-2501
22
+
23
+ Mistral Small 3 (2501) sets a new benchmark in the "small" Large Language Models category below 70B, boasting 24B parameters and achieving state-of-the-art capabilities comparable to larger models!
24
+ This model is an instruction-fine-tuned version of the base model: [Mistral-Small-24B-Base-2501](https://huggingface.co/mistralai/Mistral-Small-24B-Base-2501).
25
+
26
+ Mistral Small can be deployed locally and is exceptionally "knowledge-dense", fitting in a single RTX 4090 or a 32GB RAM MacBook once quantized.
27
+ Perfect for:
28
+ - Fast response conversational agents.
29
+ - Low latency function calling.
30
+ - Subject matter experts via fine-tuning.
31
+ - Local inference for hobbyists and organizations handling sensitive data.
32
+
33
+ For enterprises that need specialized capabilities (increased context, particular modalities, domain specific knowledge, etc.), we will be releasing commercial models beyond what Mistral AI contributes to the community.
34
+
35
+ This release demonstrates our commitment to open source, serving as a strong base model.
36
+
37
+ Learn more about Mistral Small in our [blog post](https://mistral.ai/news/mistral-small-3/).
38
+
39
+ ## Key Features
40
+ - **Multilingual:** Supports dozens of languages, including English, French, German, Spanish, Italian, Chinese, Japanese, Korean, Portuguese, Dutch, and Polish.
41
+ - **Agent-Centric:** Offers best-in-class agentic capabilities with native function calling and JSON outputting.
42
+ - **Advanced Reasoning:** State-of-the-art conversational and reasoning capabilities.
43
+ - **Apache 2.0 License:** Open license allowing usage and modification for both commercial and non-commercial purposes.
44
+ - **Context Window:** A 32k context window.
45
+ - **System Prompt:** Maintains strong adherence and support for system prompts.
46
+ - **Tokenizer:** Utilizes a Tekken tokenizer with a 131k vocabulary size.
47
+
48
+ ### Basic Instruct Template (V7-Tekken)
49
+
50
+ ```
51
+ <s>[SYSTEM_PROMPT]<system prompt>[/SYSTEM_PROMPT][INST]<user message>[/INST]<assistant response></s>[INST]<user message>[/INST]
52
+ ```
53
+ *`<system prompt>`, `<user message>` and `<assistant response>` are placeholders.*
54
+
55
+ ***Please make sure to use [mistral-common](https://github.com/mistralai/mistral-common) as the source of truth***
56
+
57
+ ## Usage
58
+
59
+ The model can be used with the following frameworks:
60
+ - [`vllm`](https://github.com/vllm-project/vllm): See [here](#vLLM)
61
+ - [`transformers`](https://github.com/huggingface/transformers): See [here](#Transformers)
62
+
63
+ ### vLLM
64
+
65
+ We recommend using this model with the [vLLM library](https://github.com/vllm-project/vllm)
66
+ to implement production-ready inference pipelines.
67
+
68
+ **_Installation_**
69
+
70
+ Make sure you install [`vLLM >= 0.6.4`](https://github.com/vllm-project/vllm/releases/tag/v0.6.4):
71
+
72
+ ```
73
+ pip install --upgrade vllm
74
+ ```
75
+
76
+ Also make sure you have [`mistral_common >= 1.5.2`](https://github.com/mistralai/mistral-common/releases/tag/v1.5.2) installed:
77
+
78
+ ```
79
+ pip install --upgrade mistral_common
80
+ ```
81
+
82
+ You can also use a ready-to-go [docker image](https://github.com/vllm-project/vllm/blob/main/Dockerfile) or pull one from the [docker hub](https://hub.docker.com/layers/vllm/vllm-openai/latest/images/sha256-de9032a92ffea7b5c007dad80b38fd44aac11eddc31c435f8e52f3b7404bbf39).
83
+
84
+ #### Server
85
+
86
+ We recommend that you use Mistral-Small-24B-Instruct-2501 in a server/client setting.
87
+
88
+ 1. Spin up a server:
89
+
90
+ ```
91
+ vllm serve mistralai/Mistral-Small-24B-Instruct-2501 --tokenizer_mode mistral --config_format mistral --load_format mistral --enable-auto-tool-choice
92
+ ```
93
+
94
+ **Note:** Running Mistral-Small-24B-Instruct-2501 on GPU requires 60 GB of GPU RAM.
95
+
96
+
97
+ 2. To query the server, you can use a simple Python snippet.
98
+
99
+ ```py
100
+ import requests
101
+ import json
102
+ from datetime import datetime, timedelta
103
+
104
+ url = "http://<your-server>:8000/v1/chat/completions"
105
+ headers = {"Content-Type": "application/json", "Authorization": "Bearer token"}
106
+
107
+ model = "mistralai/Mistral-Small-24B-Instruct-2501"
108
+
109
+ messages = [
110
+ {
111
+ "role": "system",
112
+ "content": "You are a conversational agent that always answers straight to the point, always end your accurate response with an ASCII drawing of a cat."
113
+ },
114
+ {
115
+ "role": "user",
116
+ "content": "Give me 5 non-formal ways to say 'See you later' in French."
117
+ },
118
+ ]
119
+
120
+ data = {"model": model, "messages": messages}
121
+
122
+ response = requests.post(url, headers=headers, data=json.dumps(data))
123
+ print(response.json()["choices"][0]["message"]["content"])
124
+
125
+ # Sure, here are five non-formal ways to say "See you later" in French:
126
+ #
127
+ # 1. À plus tard
128
+ # 2. À plus
129
+ # 3. Salut
130
+ # 4. À toute
131
+ # 5. Bisous
132
+ #
133
+ # ```
134
+ # /\_/\
135
+ # ( o.o )
136
+ # > ^ <
137
+ # ```
138
+ ```
139
+
140
+ #### Offline
141
+
142
+ ```py
143
+ from vllm import LLM
144
+ from vllm.sampling_params import SamplingParams
145
+ from datetime import datetime, timedelta
146
+
147
+ SYSTEM_PROMPT = "You are a conversational agent that always answers straight to the point, always end your accurate response with an ASCII drawing of a cat."
148
+
149
+ user_prompt = "Give me 5 non-formal ways to say 'See you later' in French."
150
+
151
+ messages = [
152
+ {
153
+ "role": "system",
154
+ "content": SYSTEM_PROMPT
155
+ },
156
+ {
157
+ "role": "user",
158
+ "content": user_prompt
159
+ },
160
+ ]
161
+
162
+ # note that running this model on GPU requires over 60 GB of GPU RAM
163
+ llm = LLM(model="mistralai/Mistral-Small-24B-Instruct-2501", tokenizer_mode="mistral", tensor_parallel_size=8)
164
+
165
+ sampling_params = SamplingParams(max_tokens=512)
166
+
167
+ outputs = llm.chat(messages, sampling_params=sampling_params)
168
+
169
+ print(outputs[0].outputs[0].text)
170
+ # Sure, here are five non-formal ways to say "See you later" in French:
171
+ #
172
+ # 1. À plus tard
173
+ # 2. À plus
174
+ # 3. Salut
175
+ # 4. À toute
176
+ # 5. Bisous
177
+ #
178
+ # ```
179
+ # /\_/\
180
+ # ( o.o )
181
+ # > ^ <
182
+ # ```
183
+ ```
184
+
185
+ ### Transformers
186
+
187
+ If you want to use Hugging Face transformers to generate text, you can do something like this.
188
+
189
+ ```py
190
+ from transformers import pipeline
191
+
192
+ messages = [
193
+ {"role": "system", "content": "You are a conversational agent that always answers straight to the point, always end your accurate response with an ASCII drawing of a cat."},
194
+ {"role": "user", "content": "Give me 5 non-formal ways to say 'See you later' in French."},
195
+ ]
196
+ chatbot = pipeline("text-generation", model="mistralai/Mistral-Small-24B-Instruct-2501", max_new_tokens=256)
197
+ chatbot(messages)
198
+ ```
199
+
200
+ ## The Mistral AI Team
201
+
202
+ Albert Jiang, Alexandre Sablayrolles, Alexis Tacnet, Alok Kothari, Antoine Roux, Arthur Mensch, Audrey Herblin-Stoop, Augustin Garreau, Austin Birky, Bam4d, Baptiste Bout, Baudouin de Monicault, Blanche Savary, Carole Rambaud, Caroline Feldman, Devendra Singh Chaplot, Diego de las Casas, Diogo Costa, Eleonore Arcelin, Emma Bou Hanna, Etienne Metzger, Gaspard Blanchet, Gianna Lengyel, Guillaume Bour, Guillaume Lample, Harizo Rajaona, Henri Roussez, Hichem Sattouf, Ian Mack, Jean-Malo Delignon, Jessica Chudnovsky, Justus Murke, Kartik Khandelwal, Lawrence Stewart, Louis Martin, Louis Ternon, Lucile Saulnier, Lélio Renard Lavaud, Margaret Jennings, Marie Pellat, Marie Torelli, Marie-Anne Lachaux, Marjorie Janiewicz, Mickaël Seznec, Nicolas Schuhl, Niklas Muhs, Olivier de Garrigues, Patrick von Platen, Paul Jacob, Pauline Buche, Pavan Kumar Reddy, Perry Savas, Pierre Stock, Romain Sauvestre, Sagar Vaze, Sandeep Subramanian, Saurabh Garg, Sophia Yang, Szymon Antoniak, Teven Le Scao, Thibault Schueller, Thibaut Lavril, Thomas Wang, Théophile Gervet, Timothée Lacroix, Valera Nemychnikova, Wendy Shang, William El Sayed, William Marshall
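As a rough illustration of the V7-Tekken layout documented in the README above, the sketch below assembles the prompt text from a list of chat messages. The helper name `render_v7_tekken` is hypothetical and is not part of mistral-common. It only concatenates the markers as plain text, whereas a real tokenizer handles them as dedicated control tokens, so mistral-common remains the source of truth.

```python
def render_v7_tekken(messages):
    """Hypothetical helper: text-only rendering of the V7-Tekken layout."""
    parts = ["<s>"]  # the sequence starts with the BOS marker
    for message in messages:
        role, content = message["role"], message["content"]
        if role == "system":
            parts.append(f"[SYSTEM_PROMPT]{content}[/SYSTEM_PROMPT]")
        elif role == "user":
            parts.append(f"[INST]{content}[/INST]")
        elif role == "assistant":
            # each completed assistant turn is closed with the EOS marker
            parts.append(f"{content}</s>")
        else:
            raise ValueError(f"unsupported role: {role!r}")
    return "".join(parts)


example = render_v7_tekken([
    {"role": "system", "content": "Be brief."},
    {"role": "user", "content": "Hi"},
    {"role": "assistant", "content": "Hello!"},
    {"role": "user", "content": "Bye"},
])
print(example)
# <s>[SYSTEM_PROMPT]Be brief.[/SYSTEM_PROMPT][INST]Hi[/INST]Hello!</s>[INST]Bye[/INST]
```

The string ends on `[/INST]`, which is the point where the model would be expected to continue with the next assistant response.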
config.json ADDED
@@ -0,0 +1,26 @@
1
+ {
2
+ "architectures": [
3
+ "MistralForCausalLM"
4
+ ],
5
+ "attention_dropout": 0.0,
6
+ "bos_token_id": 1,
7
+ "eos_token_id": 2,
8
+ "head_dim": 128,
9
+ "hidden_act": "silu",
10
+ "hidden_size": 5120,
11
+ "initializer_range": 0.02,
12
+ "intermediate_size": 32768,
13
+ "max_position_embeddings": 32768,
14
+ "model_type": "mistral",
15
+ "num_attention_heads": 32,
16
+ "num_hidden_layers": 40,
17
+ "num_key_value_heads": 8,
18
+ "rms_norm_eps": 1e-05,
19
+ "rope_theta": 100000000.0,
20
+ "sliding_window": null,
21
+ "tie_word_embeddings": false,
22
+ "torch_dtype": "bfloat16",
23
+ "transformers_version": "4.49.0.dev0",
24
+ "use_cache": true,
25
+ "vocab_size": 131072
26
+ }
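The values in this config.json are enough to estimate the parameter count and the bfloat16 weight footprint by hand. The sketch below does that arithmetic under two assumptions that the config does not state outright: the projection layers carry no bias terms, and there is a single final RMSNorm weight after the last layer. It also estimates the key/value cache per token, assuming the cache is stored in bfloat16 as well. Actual serving memory will be higher because of activations and framework overhead.

```python
# Values copied from config.json in this commit.
hidden_size = 5120
intermediate_size = 32768
num_hidden_layers = 40
num_attention_heads = 32
num_key_value_heads = 8
head_dim = 128
vocab_size = 131072
max_position_embeddings = 32768
bytes_per_value = 2  # torch_dtype is bfloat16

q_out = num_attention_heads * head_dim     # width of q_proj output
kv_out = num_key_value_heads * head_dim    # width of k_proj / v_proj output (grouped-query attention)

attention = hidden_size * q_out            # q_proj
attention += 2 * hidden_size * kv_out      # k_proj and v_proj
attention += q_out * hidden_size           # o_proj
mlp = 3 * hidden_size * intermediate_size  # gate_proj, up_proj, down_proj
norms = 2 * hidden_size                    # input and post-attention layernorm
per_layer = attention + mlp + norms

embeddings = 2 * vocab_size * hidden_size  # embed_tokens plus untied lm_head
final_norm = hidden_size                   # assumption: one final norm weight

total_params = num_hidden_layers * per_layer + embeddings + final_norm
weight_bytes = total_params * bytes_per_value

# K and V for every layer, assuming a bfloat16 cache.
kv_bytes_per_token = 2 * num_hidden_layers * kv_out * bytes_per_value
kv_bytes_full_context = kv_bytes_per_token * max_position_embeddings

print(total_params)           # 23572403200
print(weight_bytes)           # 47144806400
print(kv_bytes_per_token)     # 163840
print(kv_bytes_full_context)  # 5368709120
```

The computed weight size equals the `total_size` of 47144806400 reported in model.safetensors.index.json in this commit, which suggests the two assumptions hold for this checkpoint.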
generation_config.json ADDED
@@ -0,0 +1,6 @@
1
+ {
2
+ "_from_model_config": true,
3
+ "bos_token_id": 1,
4
+ "eos_token_id": 2,
5
+ "transformers_version": "4.49.0.dev0"
6
+ }
model-00001-of-00010.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:75a14c708eea501700a723dc74bc886cf36a1393686a3fb098ee106b160da32f
3
+ size 4781571736
model-00002-of-00010.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:1ff40fbfd9e042b7dab3f3c9442f870a4701f53e394dda769807a160ba40f32a
3
+ size 4781592784
model-00003-of-00010.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:4cc2d059fded71efd2947a414f32053b4ed3fa84383edf97b6d91fd9f04e4235
3
+ size 4781592800
model-00004-of-00010.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:aa0e9acacf161c45ae0d71ca3f7e4ec9ee55dae2153398da52f81ee4f9e1b8d2
3
+ size 4886471600
model-00005-of-00010.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:dafb696763d31a1fda58010b73ecc05c19d395da8ec2c24aa9c41da33f2230d3
3
+ size 4781592824
model-00006-of-00010.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:d9a433b19fd4d6986660a616a0d6fc7d02d9e8c0ab3c9b98940217ee6bd4e053
3
+ size 4781592816
model-00007-of-00010.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:5ac5c7b042491f917016c3e9635583177058f736be5fa315019b959fc3c43b63
3
+ size 4886471600
model-00008-of-00010.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:3c460c5b957ab3ac81f03bb20c0348820225dd9b819fa4487ae733b1e696e573
3
+ size 4781592824
model-00009-of-00010.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:304f66de1aeb55f0b4e1181885e3d15b65b485d5ce5c93b4adcdf7dd2c2d8cc5
3
+ size 4781592816
model-00010-of-00010.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:c4c3bcedc02f4dd7e04c8b0fe1199f4f27de7a37790d1510a8772ffe05093543
3
+ size 3900777072
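A quick consistency check on the ten Git LFS pointers above is to add up their `size` fields and compare the result with the `total_size` recorded in model.safetensors.index.json in this commit. The sketch below does that with the numbers copied from the diff. The variable names are illustrative only, and the reading of the small surplus as per-file safetensors header data is an assumption rather than something the commit states.

```python
# Sizes in bytes, copied from the LFS pointer files for shards 1 to 10.
shard_sizes = [
    4781571736,
    4781592784,
    4781592800,
    4886471600,
    4781592824,
    4781592816,
    4886471600,
    4781592824,
    4781592816,
    3900777072,
]

# "total_size" from model.safetensors.index.json (tensor data only).
index_total_size = 47144806400

on_disk_bytes = sum(shard_sizes)
overhead_bytes = on_disk_bytes - index_total_size

print(on_disk_bytes)   # 47144848872
print(overhead_bytes)  # 42472
```

The files on disk are about 42 KB larger than the tensor data alone, which would be consistent with a few kilobytes of header per shard.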
model.safetensors.index.json ADDED
@@ -0,0 +1,370 @@
1
+ {
2
+ "metadata": {
3
+ "total_size": 47144806400
4
+ },
5
+ "weight_map": {
6
+ "lm_head.weight": "model-00010-of-00010.safetensors",
7
+ "model.embed_tokens.weight": "model-00001-of-00010.safetensors",
8
+ "model.layers.0.input_layernorm.weight": "model-00001-of-00010.safetensors",
9
+ "model.layers.0.mlp.down_proj.weight": "model-00001-of-00010.safetensors",
10
+ "model.layers.0.mlp.gate_proj.weight": "model-00001-of-00010.safetensors",
11
+ "model.layers.0.mlp.up_proj.weight": "model-00001-of-00010.safetensors",
12
+ "model.layers.0.post_attention_layernorm.weight": "model-00001-of-00010.safetensors",
13
+ "model.layers.0.self_attn.k_proj.weight": "model-00001-of-00010.safetensors",
14
+ "model.layers.0.self_attn.o_proj.weight": "model-00001-of-00010.safetensors",
15
+ "model.layers.0.self_attn.q_proj.weight": "model-00001-of-00010.safetensors",
16
+ "model.layers.0.self_attn.v_proj.weight": "model-00001-of-00010.safetensors",
17
+ "model.layers.1.input_layernorm.weight": "model-00001-of-00010.safetensors",
18
+ "model.layers.1.mlp.down_proj.weight": "model-00001-of-00010.safetensors",
19
+ "model.layers.1.mlp.gate_proj.weight": "model-00001-of-00010.safetensors",
20
+ "model.layers.1.mlp.up_proj.weight": "model-00001-of-00010.safetensors",
21
+ "model.layers.1.post_attention_layernorm.weight": "model-00001-of-00010.safetensors",
22
+ "model.layers.1.self_attn.k_proj.weight": "model-00001-of-00010.safetensors",
23
+ "model.layers.1.self_attn.o_proj.weight": "model-00001-of-00010.safetensors",
24
+ "model.layers.1.self_attn.q_proj.weight": "model-00001-of-00010.safetensors",
25
+ "model.layers.1.self_attn.v_proj.weight": "model-00001-of-00010.safetensors",
26
+ "model.layers.10.input_layernorm.weight": "model-00003-of-00010.safetensors",
27
+ "model.layers.10.mlp.down_proj.weight": "model-00003-of-00010.safetensors",
28
+ "model.layers.10.mlp.gate_proj.weight": "model-00003-of-00010.safetensors",
29
+ "model.layers.10.mlp.up_proj.weight": "model-00003-of-00010.safetensors",
30
+ "model.layers.10.post_attention_layernorm.weight": "model-00003-of-00010.safetensors",
31
+ "model.layers.10.self_attn.k_proj.weight": "model-00003-of-00010.safetensors",
32
+ "model.layers.10.self_attn.o_proj.weight": "model-00003-of-00010.safetensors",
33
+ "model.layers.10.self_attn.q_proj.weight": "model-00003-of-00010.safetensors",
34
+ "model.layers.10.self_attn.v_proj.weight": "model-00003-of-00010.safetensors",
35
+ "model.layers.11.input_layernorm.weight": "model-00004-of-00010.safetensors",
36
+ "model.layers.11.mlp.down_proj.weight": "model-00004-of-00010.safetensors",
37
+ "model.layers.11.mlp.gate_proj.weight": "model-00003-of-00010.safetensors",
38
+ "model.layers.11.mlp.up_proj.weight": "model-00003-of-00010.safetensors",
39
+ "model.layers.11.post_attention_layernorm.weight": "model-00004-of-00010.safetensors",
40
+ "model.layers.11.self_attn.k_proj.weight": "model-00003-of-00010.safetensors",
41
+ "model.layers.11.self_attn.o_proj.weight": "model-00003-of-00010.safetensors",
42
+ "model.layers.11.self_attn.q_proj.weight": "model-00003-of-00010.safetensors",
43
+ "model.layers.11.self_attn.v_proj.weight": "model-00003-of-00010.safetensors",
44
+ "model.layers.12.input_layernorm.weight": "model-00004-of-00010.safetensors",
45
+ "model.layers.12.mlp.down_proj.weight": "model-00004-of-00010.safetensors",
46
+ "model.layers.12.mlp.gate_proj.weight": "model-00004-of-00010.safetensors",
47
+ "model.layers.12.mlp.up_proj.weight": "model-00004-of-00010.safetensors",
48
+ "model.layers.12.post_attention_layernorm.weight": "model-00004-of-00010.safetensors",
49
+ "model.layers.12.self_attn.k_proj.weight": "model-00004-of-00010.safetensors",
50
+ "model.layers.12.self_attn.o_proj.weight": "model-00004-of-00010.safetensors",
51
+ "model.layers.12.self_attn.q_proj.weight": "model-00004-of-00010.safetensors",
52
+ "model.layers.12.self_attn.v_proj.weight": "model-00004-of-00010.safetensors",
53
+ "model.layers.13.input_layernorm.weight": "model-00004-of-00010.safetensors",
54
+ "model.layers.13.mlp.down_proj.weight": "model-00004-of-00010.safetensors",
55
+ "model.layers.13.mlp.gate_proj.weight": "model-00004-of-00010.safetensors",
56
+ "model.layers.13.mlp.up_proj.weight": "model-00004-of-00010.safetensors",
57
+ "model.layers.13.post_attention_layernorm.weight": "model-00004-of-00010.safetensors",
58
+ "model.layers.13.self_attn.k_proj.weight": "model-00004-of-00010.safetensors",
59
+ "model.layers.13.self_attn.o_proj.weight": "model-00004-of-00010.safetensors",
60
+ "model.layers.13.self_attn.q_proj.weight": "model-00004-of-00010.safetensors",
61
+ "model.layers.13.self_attn.v_proj.weight": "model-00004-of-00010.safetensors",
62
+ "model.layers.14.input_layernorm.weight": "model-00004-of-00010.safetensors",
63
+ "model.layers.14.mlp.down_proj.weight": "model-00004-of-00010.safetensors",
64
+ "model.layers.14.mlp.gate_proj.weight": "model-00004-of-00010.safetensors",
65
+ "model.layers.14.mlp.up_proj.weight": "model-00004-of-00010.safetensors",
66
+ "model.layers.14.post_attention_layernorm.weight": "model-00004-of-00010.safetensors",
67
+ "model.layers.14.self_attn.k_proj.weight": "model-00004-of-00010.safetensors",
68
+ "model.layers.14.self_attn.o_proj.weight": "model-00004-of-00010.safetensors",
69
+ "model.layers.14.self_attn.q_proj.weight": "model-00004-of-00010.safetensors",
70
+ "model.layers.14.self_attn.v_proj.weight": "model-00004-of-00010.safetensors",
71
+ "model.layers.15.input_layernorm.weight": "model-00004-of-00010.safetensors",
72
+ "model.layers.15.mlp.down_proj.weight": "model-00004-of-00010.safetensors",
73
+ "model.layers.15.mlp.gate_proj.weight": "model-00004-of-00010.safetensors",
74
+ "model.layers.15.mlp.up_proj.weight": "model-00004-of-00010.safetensors",
75
+ "model.layers.15.post_attention_layernorm.weight": "model-00004-of-00010.safetensors",
76
+ "model.layers.15.self_attn.k_proj.weight": "model-00004-of-00010.safetensors",
77
+ "model.layers.15.self_attn.o_proj.weight": "model-00004-of-00010.safetensors",
78
+ "model.layers.15.self_attn.q_proj.weight": "model-00004-of-00010.safetensors",
79
+ "model.layers.15.self_attn.v_proj.weight": "model-00004-of-00010.safetensors",
80
+ "model.layers.16.input_layernorm.weight": "model-00005-of-00010.safetensors",
81
+ "model.layers.16.mlp.down_proj.weight": "model-00005-of-00010.safetensors",
82
+ "model.layers.16.mlp.gate_proj.weight": "model-00005-of-00010.safetensors",
83
+ "model.layers.16.mlp.up_proj.weight": "model-00005-of-00010.safetensors",
84
+ "model.layers.16.post_attention_layernorm.weight": "model-00005-of-00010.safetensors",
85
+ "model.layers.16.self_attn.k_proj.weight": "model-00004-of-00010.safetensors",
86
+ "model.layers.16.self_attn.o_proj.weight": "model-00004-of-00010.safetensors",
87
+ "model.layers.16.self_attn.q_proj.weight": "model-00004-of-00010.safetensors",
88
+ "model.layers.16.self_attn.v_proj.weight": "model-00004-of-00010.safetensors",
89
+ "model.layers.17.input_layernorm.weight": "model-00005-of-00010.safetensors",
90
+ "model.layers.17.mlp.down_proj.weight": "model-00005-of-00010.safetensors",
91
+ "model.layers.17.mlp.gate_proj.weight": "model-00005-of-00010.safetensors",
92
+ "model.layers.17.mlp.up_proj.weight": "model-00005-of-00010.safetensors",
93
+ "model.layers.17.post_attention_layernorm.weight": "model-00005-of-00010.safetensors",
94
+ "model.layers.17.self_attn.k_proj.weight": "model-00005-of-00010.safetensors",
95
+ "model.layers.17.self_attn.o_proj.weight": "model-00005-of-00010.safetensors",
96
+ "model.layers.17.self_attn.q_proj.weight": "model-00005-of-00010.safetensors",
97
+ "model.layers.17.self_attn.v_proj.weight": "model-00005-of-00010.safetensors",
98
+ "model.layers.18.input_layernorm.weight": "model-00005-of-00010.safetensors",
99
+ "model.layers.18.mlp.down_proj.weight": "model-00005-of-00010.safetensors",
100
+ "model.layers.18.mlp.gate_proj.weight": "model-00005-of-00010.safetensors",
101
+ "model.layers.18.mlp.up_proj.weight": "model-00005-of-00010.safetensors",
102
+ "model.layers.18.post_attention_layernorm.weight": "model-00005-of-00010.safetensors",
103
+ "model.layers.18.self_attn.k_proj.weight": "model-00005-of-00010.safetensors",
104
+ "model.layers.18.self_attn.o_proj.weight": "model-00005-of-00010.safetensors",
105
+ "model.layers.18.self_attn.q_proj.weight": "model-00005-of-00010.safetensors",
106
+ "model.layers.18.self_attn.v_proj.weight": "model-00005-of-00010.safetensors",
107
+ "model.layers.19.input_layernorm.weight": "model-00005-of-00010.safetensors",
108
+ "model.layers.19.mlp.down_proj.weight": "model-00005-of-00010.safetensors",
109
+ "model.layers.19.mlp.gate_proj.weight": "model-00005-of-00010.safetensors",
110
+ "model.layers.19.mlp.up_proj.weight": "model-00005-of-00010.safetensors",
111
+ "model.layers.19.post_attention_layernorm.weight": "model-00005-of-00010.safetensors",
112
+ "model.layers.19.self_attn.k_proj.weight": "model-00005-of-00010.safetensors",
113
+ "model.layers.19.self_attn.o_proj.weight": "model-00005-of-00010.safetensors",
114
+ "model.layers.19.self_attn.q_proj.weight": "model-00005-of-00010.safetensors",
115
+ "model.layers.19.self_attn.v_proj.weight": "model-00005-of-00010.safetensors",
116
+ "model.layers.2.input_layernorm.weight": "model-00001-of-00010.safetensors",
117
+ "model.layers.2.mlp.down_proj.weight": "model-00001-of-00010.safetensors",
118
+ "model.layers.2.mlp.gate_proj.weight": "model-00001-of-00010.safetensors",
119
+ "model.layers.2.mlp.up_proj.weight": "model-00001-of-00010.safetensors",
120
+ "model.layers.2.post_attention_layernorm.weight": "model-00001-of-00010.safetensors",
121
+ "model.layers.2.self_attn.k_proj.weight": "model-00001-of-00010.safetensors",
122
+ "model.layers.2.self_attn.o_proj.weight": "model-00001-of-00010.safetensors",
123
+ "model.layers.2.self_attn.q_proj.weight": "model-00001-of-00010.safetensors",
124
+ "model.layers.2.self_attn.v_proj.weight": "model-00001-of-00010.safetensors",
125
+ "model.layers.20.input_layernorm.weight": "model-00006-of-00010.safetensors",
126
+ "model.layers.20.mlp.down_proj.weight": "model-00006-of-00010.safetensors",
127
+ "model.layers.20.mlp.gate_proj.weight": "model-00005-of-00010.safetensors",
128
+ "model.layers.20.mlp.up_proj.weight": "model-00006-of-00010.safetensors",
129
+ "model.layers.20.post_attention_layernorm.weight": "model-00006-of-00010.safetensors",
130
+ "model.layers.20.self_attn.k_proj.weight": "model-00005-of-00010.safetensors",
131
+ "model.layers.20.self_attn.o_proj.weight": "model-00005-of-00010.safetensors",
132
+ "model.layers.20.self_attn.q_proj.weight": "model-00005-of-00010.safetensors",
133
+ "model.layers.20.self_attn.v_proj.weight": "model-00005-of-00010.safetensors",
134
+ "model.layers.21.input_layernorm.weight": "model-00006-of-00010.safetensors",
135
+ "model.layers.21.mlp.down_proj.weight": "model-00006-of-00010.safetensors",
136
+ "model.layers.21.mlp.gate_proj.weight": "model-00006-of-00010.safetensors",
137
+ "model.layers.21.mlp.up_proj.weight": "model-00006-of-00010.safetensors",
138
+ "model.layers.21.post_attention_layernorm.weight": "model-00006-of-00010.safetensors",
139
+ "model.layers.21.self_attn.k_proj.weight": "model-00006-of-00010.safetensors",
140
+ "model.layers.21.self_attn.o_proj.weight": "model-00006-of-00010.safetensors",
141
+ "model.layers.21.self_attn.q_proj.weight": "model-00006-of-00010.safetensors",
142
+ "model.layers.21.self_attn.v_proj.weight": "model-00006-of-00010.safetensors",
143
+ "model.layers.22.input_layernorm.weight": "model-00006-of-00010.safetensors",
144
+ "model.layers.22.mlp.down_proj.weight": "model-00006-of-00010.safetensors",
145
+ "model.layers.22.mlp.gate_proj.weight": "model-00006-of-00010.safetensors",
146
+ "model.layers.22.mlp.up_proj.weight": "model-00006-of-00010.safetensors",
147
+ "model.layers.22.post_attention_layernorm.weight": "model-00006-of-00010.safetensors",
148
+ "model.layers.22.self_attn.k_proj.weight": "model-00006-of-00010.safetensors",
149
+ "model.layers.22.self_attn.o_proj.weight": "model-00006-of-00010.safetensors",
150
+ "model.layers.22.self_attn.q_proj.weight": "model-00006-of-00010.safetensors",
151
+ "model.layers.22.self_attn.v_proj.weight": "model-00006-of-00010.safetensors",
152
+ "model.layers.23.input_layernorm.weight": "model-00006-of-00010.safetensors",
153
+ "model.layers.23.mlp.down_proj.weight": "model-00006-of-00010.safetensors",
154
+ "model.layers.23.mlp.gate_proj.weight": "model-00006-of-00010.safetensors",
155
+ "model.layers.23.mlp.up_proj.weight": "model-00006-of-00010.safetensors",
156
+ "model.layers.23.post_attention_layernorm.weight": "model-00006-of-00010.safetensors",
157
+ "model.layers.23.self_attn.k_proj.weight": "model-00006-of-00010.safetensors",
158
+ "model.layers.23.self_attn.o_proj.weight": "model-00006-of-00010.safetensors",
159
+ "model.layers.23.self_attn.q_proj.weight": "model-00006-of-00010.safetensors",
160
+ "model.layers.23.self_attn.v_proj.weight": "model-00006-of-00010.safetensors",
161
+ "model.layers.24.input_layernorm.weight": "model-00007-of-00010.safetensors",
162
+ "model.layers.24.mlp.down_proj.weight": "model-00007-of-00010.safetensors",
163
+ "model.layers.24.mlp.gate_proj.weight": "model-00006-of-00010.safetensors",
164
+ "model.layers.24.mlp.up_proj.weight": "model-00006-of-00010.safetensors",
165
+ "model.layers.24.post_attention_layernorm.weight": "model-00007-of-00010.safetensors",
166
+ "model.layers.24.self_attn.k_proj.weight": "model-00006-of-00010.safetensors",
167
+ "model.layers.24.self_attn.o_proj.weight": "model-00006-of-00010.safetensors",
168
+ "model.layers.24.self_attn.q_proj.weight": "model-00006-of-00010.safetensors",
169
+ "model.layers.24.self_attn.v_proj.weight": "model-00006-of-00010.safetensors",
170
+ "model.layers.25.input_layernorm.weight": "model-00007-of-00010.safetensors",
171
+ "model.layers.25.mlp.down_proj.weight": "model-00007-of-00010.safetensors",
172
+ "model.layers.25.mlp.gate_proj.weight": "model-00007-of-00010.safetensors",
173
+ "model.layers.25.mlp.up_proj.weight": "model-00007-of-00010.safetensors",
174
+ "model.layers.25.post_attention_layernorm.weight": "model-00007-of-00010.safetensors",
175
+ "model.layers.25.self_attn.k_proj.weight": "model-00007-of-00010.safetensors",
176
+ "model.layers.25.self_attn.o_proj.weight": "model-00007-of-00010.safetensors",
177
+ "model.layers.25.self_attn.q_proj.weight": "model-00007-of-00010.safetensors",
178
+ "model.layers.25.self_attn.v_proj.weight": "model-00007-of-00010.safetensors",
179
+ "model.layers.26.input_layernorm.weight": "model-00007-of-00010.safetensors",
180
+ "model.layers.26.mlp.down_proj.weight": "model-00007-of-00010.safetensors",
181
+ "model.layers.26.mlp.gate_proj.weight": "model-00007-of-00010.safetensors",
182
+ "model.layers.26.mlp.up_proj.weight": "model-00007-of-00010.safetensors",
183
+ "model.layers.26.post_attention_layernorm.weight": "model-00007-of-00010.safetensors",
184
+ "model.layers.26.self_attn.k_proj.weight": "model-00007-of-00010.safetensors",
185
+ "model.layers.26.self_attn.o_proj.weight": "model-00007-of-00010.safetensors",
186
+ "model.layers.26.self_attn.q_proj.weight": "model-00007-of-00010.safetensors",
187
+ "model.layers.26.self_attn.v_proj.weight": "model-00007-of-00010.safetensors",
188
+ "model.layers.27.input_layernorm.weight": "model-00007-of-00010.safetensors",
189
+ "model.layers.27.mlp.down_proj.weight": "model-00007-of-00010.safetensors",
190
+ "model.layers.27.mlp.gate_proj.weight": "model-00007-of-00010.safetensors",
191
+ "model.layers.27.mlp.up_proj.weight": "model-00007-of-00010.safetensors",
192
+ "model.layers.27.post_attention_layernorm.weight": "model-00007-of-00010.safetensors",
193
+ "model.layers.27.self_attn.k_proj.weight": "model-00007-of-00010.safetensors",
194
+ "model.layers.27.self_attn.o_proj.weight": "model-00007-of-00010.safetensors",
195
+ "model.layers.27.self_attn.q_proj.weight": "model-00007-of-00010.safetensors",
196
+ "model.layers.27.self_attn.v_proj.weight": "model-00007-of-00010.safetensors",
197
+ "model.layers.28.input_layernorm.weight": "model-00007-of-00010.safetensors",
198
+ "model.layers.28.mlp.down_proj.weight": "model-00007-of-00010.safetensors",
199
+ "model.layers.28.mlp.gate_proj.weight": "model-00007-of-00010.safetensors",
200
+ "model.layers.28.mlp.up_proj.weight": "model-00007-of-00010.safetensors",
201
+ "model.layers.28.post_attention_layernorm.weight": "model-00007-of-00010.safetensors",
202
+ "model.layers.28.self_attn.k_proj.weight": "model-00007-of-00010.safetensors",
+ "model.layers.28.self_attn.o_proj.weight": "model-00007-of-00010.safetensors",
+ "model.layers.28.self_attn.q_proj.weight": "model-00007-of-00010.safetensors",
+ "model.layers.28.self_attn.v_proj.weight": "model-00007-of-00010.safetensors",
+ "model.layers.29.input_layernorm.weight": "model-00008-of-00010.safetensors",
+ "model.layers.29.mlp.down_proj.weight": "model-00008-of-00010.safetensors",
+ "model.layers.29.mlp.gate_proj.weight": "model-00008-of-00010.safetensors",
+ "model.layers.29.mlp.up_proj.weight": "model-00008-of-00010.safetensors",
+ "model.layers.29.post_attention_layernorm.weight": "model-00008-of-00010.safetensors",
+ "model.layers.29.self_attn.k_proj.weight": "model-00007-of-00010.safetensors",
+ "model.layers.29.self_attn.o_proj.weight": "model-00007-of-00010.safetensors",
+ "model.layers.29.self_attn.q_proj.weight": "model-00007-of-00010.safetensors",
+ "model.layers.29.self_attn.v_proj.weight": "model-00007-of-00010.safetensors",
+ "model.layers.3.input_layernorm.weight": "model-00002-of-00010.safetensors",
+ "model.layers.3.mlp.down_proj.weight": "model-00002-of-00010.safetensors",
+ "model.layers.3.mlp.gate_proj.weight": "model-00002-of-00010.safetensors",
+ "model.layers.3.mlp.up_proj.weight": "model-00002-of-00010.safetensors",
+ "model.layers.3.post_attention_layernorm.weight": "model-00002-of-00010.safetensors",
+ "model.layers.3.self_attn.k_proj.weight": "model-00001-of-00010.safetensors",
+ "model.layers.3.self_attn.o_proj.weight": "model-00001-of-00010.safetensors",
+ "model.layers.3.self_attn.q_proj.weight": "model-00001-of-00010.safetensors",
+ "model.layers.3.self_attn.v_proj.weight": "model-00001-of-00010.safetensors",
+ "model.layers.30.input_layernorm.weight": "model-00008-of-00010.safetensors",
+ "model.layers.30.mlp.down_proj.weight": "model-00008-of-00010.safetensors",
+ "model.layers.30.mlp.gate_proj.weight": "model-00008-of-00010.safetensors",
+ "model.layers.30.mlp.up_proj.weight": "model-00008-of-00010.safetensors",
+ "model.layers.30.post_attention_layernorm.weight": "model-00008-of-00010.safetensors",
+ "model.layers.30.self_attn.k_proj.weight": "model-00008-of-00010.safetensors",
+ "model.layers.30.self_attn.o_proj.weight": "model-00008-of-00010.safetensors",
+ "model.layers.30.self_attn.q_proj.weight": "model-00008-of-00010.safetensors",
+ "model.layers.30.self_attn.v_proj.weight": "model-00008-of-00010.safetensors",
+ "model.layers.31.input_layernorm.weight": "model-00008-of-00010.safetensors",
+ "model.layers.31.mlp.down_proj.weight": "model-00008-of-00010.safetensors",
+ "model.layers.31.mlp.gate_proj.weight": "model-00008-of-00010.safetensors",
+ "model.layers.31.mlp.up_proj.weight": "model-00008-of-00010.safetensors",
+ "model.layers.31.post_attention_layernorm.weight": "model-00008-of-00010.safetensors",
+ "model.layers.31.self_attn.k_proj.weight": "model-00008-of-00010.safetensors",
+ "model.layers.31.self_attn.o_proj.weight": "model-00008-of-00010.safetensors",
+ "model.layers.31.self_attn.q_proj.weight": "model-00008-of-00010.safetensors",
+ "model.layers.31.self_attn.v_proj.weight": "model-00008-of-00010.safetensors",
+ "model.layers.32.input_layernorm.weight": "model-00008-of-00010.safetensors",
+ "model.layers.32.mlp.down_proj.weight": "model-00008-of-00010.safetensors",
+ "model.layers.32.mlp.gate_proj.weight": "model-00008-of-00010.safetensors",
+ "model.layers.32.mlp.up_proj.weight": "model-00008-of-00010.safetensors",
+ "model.layers.32.post_attention_layernorm.weight": "model-00008-of-00010.safetensors",
+ "model.layers.32.self_attn.k_proj.weight": "model-00008-of-00010.safetensors",
+ "model.layers.32.self_attn.o_proj.weight": "model-00008-of-00010.safetensors",
+ "model.layers.32.self_attn.q_proj.weight": "model-00008-of-00010.safetensors",
+ "model.layers.32.self_attn.v_proj.weight": "model-00008-of-00010.safetensors",
+ "model.layers.33.input_layernorm.weight": "model-00009-of-00010.safetensors",
+ "model.layers.33.mlp.down_proj.weight": "model-00009-of-00010.safetensors",
+ "model.layers.33.mlp.gate_proj.weight": "model-00008-of-00010.safetensors",
+ "model.layers.33.mlp.up_proj.weight": "model-00009-of-00010.safetensors",
+ "model.layers.33.post_attention_layernorm.weight": "model-00009-of-00010.safetensors",
+ "model.layers.33.self_attn.k_proj.weight": "model-00008-of-00010.safetensors",
+ "model.layers.33.self_attn.o_proj.weight": "model-00008-of-00010.safetensors",
+ "model.layers.33.self_attn.q_proj.weight": "model-00008-of-00010.safetensors",
+ "model.layers.33.self_attn.v_proj.weight": "model-00008-of-00010.safetensors",
+ "model.layers.34.input_layernorm.weight": "model-00009-of-00010.safetensors",
+ "model.layers.34.mlp.down_proj.weight": "model-00009-of-00010.safetensors",
+ "model.layers.34.mlp.gate_proj.weight": "model-00009-of-00010.safetensors",
+ "model.layers.34.mlp.up_proj.weight": "model-00009-of-00010.safetensors",
+ "model.layers.34.post_attention_layernorm.weight": "model-00009-of-00010.safetensors",
+ "model.layers.34.self_attn.k_proj.weight": "model-00009-of-00010.safetensors",
+ "model.layers.34.self_attn.o_proj.weight": "model-00009-of-00010.safetensors",
+ "model.layers.34.self_attn.q_proj.weight": "model-00009-of-00010.safetensors",
+ "model.layers.34.self_attn.v_proj.weight": "model-00009-of-00010.safetensors",
+ "model.layers.35.input_layernorm.weight": "model-00009-of-00010.safetensors",
+ "model.layers.35.mlp.down_proj.weight": "model-00009-of-00010.safetensors",
+ "model.layers.35.mlp.gate_proj.weight": "model-00009-of-00010.safetensors",
+ "model.layers.35.mlp.up_proj.weight": "model-00009-of-00010.safetensors",
+ "model.layers.35.post_attention_layernorm.weight": "model-00009-of-00010.safetensors",
+ "model.layers.35.self_attn.k_proj.weight": "model-00009-of-00010.safetensors",
+ "model.layers.35.self_attn.o_proj.weight": "model-00009-of-00010.safetensors",
+ "model.layers.35.self_attn.q_proj.weight": "model-00009-of-00010.safetensors",
+ "model.layers.35.self_attn.v_proj.weight": "model-00009-of-00010.safetensors",
+ "model.layers.36.input_layernorm.weight": "model-00009-of-00010.safetensors",
+ "model.layers.36.mlp.down_proj.weight": "model-00009-of-00010.safetensors",
+ "model.layers.36.mlp.gate_proj.weight": "model-00009-of-00010.safetensors",
+ "model.layers.36.mlp.up_proj.weight": "model-00009-of-00010.safetensors",
+ "model.layers.36.post_attention_layernorm.weight": "model-00009-of-00010.safetensors",
+ "model.layers.36.self_attn.k_proj.weight": "model-00009-of-00010.safetensors",
+ "model.layers.36.self_attn.o_proj.weight": "model-00009-of-00010.safetensors",
+ "model.layers.36.self_attn.q_proj.weight": "model-00009-of-00010.safetensors",
+ "model.layers.36.self_attn.v_proj.weight": "model-00009-of-00010.safetensors",
+ "model.layers.37.input_layernorm.weight": "model-00010-of-00010.safetensors",
+ "model.layers.37.mlp.down_proj.weight": "model-00010-of-00010.safetensors",
+ "model.layers.37.mlp.gate_proj.weight": "model-00009-of-00010.safetensors",
+ "model.layers.37.mlp.up_proj.weight": "model-00009-of-00010.safetensors",
+ "model.layers.37.post_attention_layernorm.weight": "model-00010-of-00010.safetensors",
+ "model.layers.37.self_attn.k_proj.weight": "model-00009-of-00010.safetensors",
+ "model.layers.37.self_attn.o_proj.weight": "model-00009-of-00010.safetensors",
+ "model.layers.37.self_attn.q_proj.weight": "model-00009-of-00010.safetensors",
+ "model.layers.37.self_attn.v_proj.weight": "model-00009-of-00010.safetensors",
+ "model.layers.38.input_layernorm.weight": "model-00010-of-00010.safetensors",
+ "model.layers.38.mlp.down_proj.weight": "model-00010-of-00010.safetensors",
+ "model.layers.38.mlp.gate_proj.weight": "model-00010-of-00010.safetensors",
+ "model.layers.38.mlp.up_proj.weight": "model-00010-of-00010.safetensors",
+ "model.layers.38.post_attention_layernorm.weight": "model-00010-of-00010.safetensors",
+ "model.layers.38.self_attn.k_proj.weight": "model-00010-of-00010.safetensors",
+ "model.layers.38.self_attn.o_proj.weight": "model-00010-of-00010.safetensors",
+ "model.layers.38.self_attn.q_proj.weight": "model-00010-of-00010.safetensors",
+ "model.layers.38.self_attn.v_proj.weight": "model-00010-of-00010.safetensors",
+ "model.layers.39.input_layernorm.weight": "model-00010-of-00010.safetensors",
+ "model.layers.39.mlp.down_proj.weight": "model-00010-of-00010.safetensors",
+ "model.layers.39.mlp.gate_proj.weight": "model-00010-of-00010.safetensors",
+ "model.layers.39.mlp.up_proj.weight": "model-00010-of-00010.safetensors",
+ "model.layers.39.post_attention_layernorm.weight": "model-00010-of-00010.safetensors",
+ "model.layers.39.self_attn.k_proj.weight": "model-00010-of-00010.safetensors",
+ "model.layers.39.self_attn.o_proj.weight": "model-00010-of-00010.safetensors",
+ "model.layers.39.self_attn.q_proj.weight": "model-00010-of-00010.safetensors",
+ "model.layers.39.self_attn.v_proj.weight": "model-00010-of-00010.safetensors",
+ "model.layers.4.input_layernorm.weight": "model-00002-of-00010.safetensors",
+ "model.layers.4.mlp.down_proj.weight": "model-00002-of-00010.safetensors",
+ "model.layers.4.mlp.gate_proj.weight": "model-00002-of-00010.safetensors",
+ "model.layers.4.mlp.up_proj.weight": "model-00002-of-00010.safetensors",
+ "model.layers.4.post_attention_layernorm.weight": "model-00002-of-00010.safetensors",
+ "model.layers.4.self_attn.k_proj.weight": "model-00002-of-00010.safetensors",
+ "model.layers.4.self_attn.o_proj.weight": "model-00002-of-00010.safetensors",
+ "model.layers.4.self_attn.q_proj.weight": "model-00002-of-00010.safetensors",
+ "model.layers.4.self_attn.v_proj.weight": "model-00002-of-00010.safetensors",
+ "model.layers.5.input_layernorm.weight": "model-00002-of-00010.safetensors",
+ "model.layers.5.mlp.down_proj.weight": "model-00002-of-00010.safetensors",
+ "model.layers.5.mlp.gate_proj.weight": "model-00002-of-00010.safetensors",
+ "model.layers.5.mlp.up_proj.weight": "model-00002-of-00010.safetensors",
+ "model.layers.5.post_attention_layernorm.weight": "model-00002-of-00010.safetensors",
+ "model.layers.5.self_attn.k_proj.weight": "model-00002-of-00010.safetensors",
+ "model.layers.5.self_attn.o_proj.weight": "model-00002-of-00010.safetensors",
+ "model.layers.5.self_attn.q_proj.weight": "model-00002-of-00010.safetensors",
+ "model.layers.5.self_attn.v_proj.weight": "model-00002-of-00010.safetensors",
+ "model.layers.6.input_layernorm.weight": "model-00002-of-00010.safetensors",
+ "model.layers.6.mlp.down_proj.weight": "model-00002-of-00010.safetensors",
+ "model.layers.6.mlp.gate_proj.weight": "model-00002-of-00010.safetensors",
+ "model.layers.6.mlp.up_proj.weight": "model-00002-of-00010.safetensors",
+ "model.layers.6.post_attention_layernorm.weight": "model-00002-of-00010.safetensors",
+ "model.layers.6.self_attn.k_proj.weight": "model-00002-of-00010.safetensors",
+ "model.layers.6.self_attn.o_proj.weight": "model-00002-of-00010.safetensors",
+ "model.layers.6.self_attn.q_proj.weight": "model-00002-of-00010.safetensors",
+ "model.layers.6.self_attn.v_proj.weight": "model-00002-of-00010.safetensors",
+ "model.layers.7.input_layernorm.weight": "model-00003-of-00010.safetensors",
+ "model.layers.7.mlp.down_proj.weight": "model-00003-of-00010.safetensors",
+ "model.layers.7.mlp.gate_proj.weight": "model-00002-of-00010.safetensors",
+ "model.layers.7.mlp.up_proj.weight": "model-00003-of-00010.safetensors",
+ "model.layers.7.post_attention_layernorm.weight": "model-00003-of-00010.safetensors",
+ "model.layers.7.self_attn.k_proj.weight": "model-00002-of-00010.safetensors",
+ "model.layers.7.self_attn.o_proj.weight": "model-00002-of-00010.safetensors",
+ "model.layers.7.self_attn.q_proj.weight": "model-00002-of-00010.safetensors",
+ "model.layers.7.self_attn.v_proj.weight": "model-00002-of-00010.safetensors",
+ "model.layers.8.input_layernorm.weight": "model-00003-of-00010.safetensors",
+ "model.layers.8.mlp.down_proj.weight": "model-00003-of-00010.safetensors",
+ "model.layers.8.mlp.gate_proj.weight": "model-00003-of-00010.safetensors",
+ "model.layers.8.mlp.up_proj.weight": "model-00003-of-00010.safetensors",
+ "model.layers.8.post_attention_layernorm.weight": "model-00003-of-00010.safetensors",
+ "model.layers.8.self_attn.k_proj.weight": "model-00003-of-00010.safetensors",
+ "model.layers.8.self_attn.o_proj.weight": "model-00003-of-00010.safetensors",
+ "model.layers.8.self_attn.q_proj.weight": "model-00003-of-00010.safetensors",
+ "model.layers.8.self_attn.v_proj.weight": "model-00003-of-00010.safetensors",
+ "model.layers.9.input_layernorm.weight": "model-00003-of-00010.safetensors",
+ "model.layers.9.mlp.down_proj.weight": "model-00003-of-00010.safetensors",
+ "model.layers.9.mlp.gate_proj.weight": "model-00003-of-00010.safetensors",
+ "model.layers.9.mlp.up_proj.weight": "model-00003-of-00010.safetensors",
+ "model.layers.9.post_attention_layernorm.weight": "model-00003-of-00010.safetensors",
+ "model.layers.9.self_attn.k_proj.weight": "model-00003-of-00010.safetensors",
+ "model.layers.9.self_attn.o_proj.weight": "model-00003-of-00010.safetensors",
+ "model.layers.9.self_attn.q_proj.weight": "model-00003-of-00010.safetensors",
+ "model.layers.9.self_attn.v_proj.weight": "model-00003-of-00010.safetensors",
+ "model.norm.weight": "model-00010-of-00010.safetensors"
+ }
+ }
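
The weight map above assigns each tensor name to one of ten shard files, and a few layers (7, 33, and 37 in the listing) have their tensors split across two shards. A minimal Python sketch of how such a map could be inspected follows. It assumes the index is a JSON object with a `weight_map` key, as the closing braces above suggest; the helper names `tensors_by_shard` and `layers_spanning_shards` are hypothetical, and the sample entries are a small subset copied from the listing rather than the full file.

```python
import re
from collections import defaultdict


def tensors_by_shard(index: dict) -> dict:
    """Group tensor names by the shard file that stores them."""
    shards = defaultdict(list)
    for name, shard in index["weight_map"].items():
        shards[shard].append(name)
    return {shard: sorted(names) for shard, names in sorted(shards.items())}


def layers_spanning_shards(index: dict) -> dict:
    """Return {layer number: sorted shard files} for layers stored in more than one shard."""
    per_layer = defaultdict(set)
    for name, shard in index["weight_map"].items():
        match = re.match(r"model\.layers\.(\d+)\.", name)
        if match:  # skips non-layer tensors such as model.norm.weight
            per_layer[int(match.group(1))].add(shard)
    return {layer: sorted(s) for layer, s in sorted(per_layer.items()) if len(s) > 1}


# Illustrative subset of the weight map shown above (not the complete index).
sample_index = {
    "weight_map": {
        "model.layers.33.mlp.gate_proj.weight": "model-00008-of-00010.safetensors",
        "model.layers.33.mlp.up_proj.weight": "model-00009-of-00010.safetensors",
        "model.layers.34.self_attn.q_proj.weight": "model-00009-of-00010.safetensors",
        "model.norm.weight": "model-00010-of-00010.safetensors",
    }
}

grouped = tensors_by_shard(sample_index)
split_layers = layers_spanning_shards(sample_index)
```

A loader that reads one shard at a time would need both shards open to assemble a layer reported by `layers_spanning_shards`.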
params.json ADDED
@@ -0,0 +1,12 @@
+ {
+ "dim": 5120,
+ "n_layers": 40,
+ "head_dim": 128,
+ "hidden_dim": 32768,
+ "n_heads": 32,
+ "n_kv_heads": 8,
+ "norm_eps": 1e-05,
+ "vocab_size": 131072,
+ "rope_theta": 100000000.0,
+ "max_seq_len": 32768
+ }
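
The values in `params.json` imply the projection shapes and a rough parameter count. The Python sketch below works through that arithmetic under stated assumptions: a Llama-style decoder layer holding exactly the tensors named in the weight map (`q_proj`, `k_proj`, `v_proj`, `o_proj`, `gate_proj`, `up_proj`, `down_proj`, two layer norms), no bias terms, and an output head that is not tied to the input embedding. None of these assumptions is confirmed by this diff, so the total is an estimate rather than an official figure.

```python
# Values copied from params.json above.
params = {
    "dim": 5120,
    "n_layers": 40,
    "head_dim": 128,
    "hidden_dim": 32768,
    "n_heads": 32,
    "n_kv_heads": 8,
    "vocab_size": 131072,
}

dim = params["dim"]
q_width = params["n_heads"] * params["head_dim"]       # output width of q_proj
kv_width = params["n_kv_heads"] * params["head_dim"]   # output width of k_proj and v_proj
gqa_group = params["n_heads"] // params["n_kv_heads"]  # query heads sharing one KV head

# Assumed per-layer tensors (bias-free): q, k, v, o projections, gated MLP, two norms.
attn = dim * q_width + 2 * dim * kv_width + q_width * dim
mlp = 3 * dim * params["hidden_dim"]
norms = 2 * dim
per_layer = attn + mlp + norms

# Assumed: separate input embedding and output head, plus the final norm.
embeddings = 2 * params["vocab_size"] * dim
total = params["n_layers"] * per_layer + embeddings + dim

print(f"q width {q_width}, kv width {kv_width}, GQA group size {gqa_group}")
print(f"estimated parameters: {total / 1e9:.2f}B")
```

Note that `n_heads * head_dim` is 4096 rather than `dim` (5120), so under these assumptions `q_proj` and `o_proj` are not square matrices.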
special_tokens_map.json ADDED
The diff for this file is too large to render. See raw diff
 
tekken.json ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:c4b90a968dbc67ef3975129d0b78a2e3cbb6bea340ab9205f22e8a0308b1ffc5
+ size 14801223
tokenizer.json ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:b76085f9923309d873994d444989f7eb6ec074b06f25b58f1e8d7b7741070949
+ size 17078037
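
The three-line entries for `tekken.json` and `tokenizer.json` are Git LFS pointer files: the repository stores only the spec version, a SHA-256 object ID, and the byte size, while the real file lives in LFS storage. A minimal Python sketch of reading such a pointer follows. The function name `parse_lfs_pointer` is hypothetical, and the sketch assumes the simple `key value` line layout shown above with no extension lines.

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse a Git LFS pointer made of 'key value' lines into a small dict."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")  # split on the first space only
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": algo,
        "digest": digest,
        "size": int(fields["size"]),  # size of the real file in bytes
    }


# Pointer text copied from the tekken.json entry above.
tekken_pointer = """\
version https://git-lfs.github.com/spec/v1
oid sha256:c4b90a968dbc67ef3975129d0b78a2e3cbb6bea340ab9205f22e8a0308b1ffc5
size 14801223
"""

info = parse_lfs_pointer(tekken_pointer)
print(info["hash_algo"], info["size"])
```

After downloading the real file, comparing its SHA-256 digest and byte length against `info["digest"]` and `info["size"]` is one way to check that the download is complete.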
tokenizer_config.json ADDED
The diff for this file is too large to render. See raw diff