Add files using upload-large-folder tool
- README.md +274 -0
- config.json +36 -0
- config.yml +228 -0
- generation_config.json +6 -0
- measurements.json +0 -0
- model.safetensors.index.json +802 -0
- output-00001-of-00006.safetensors +3 -0
- output-00002-of-00006.safetensors +3 -0
- output-00003-of-00006.safetensors +3 -0
- output-00004-of-00006.safetensors +3 -0
- output-00005-of-00006.safetensors +3 -0
- output-00006-of-00006.safetensors +3 -0
- params.json +11 -0
- test.py +26 -0
- tokenizer.json +0 -0
- tokenizer.model +3 -0
- tokenizer.model.v3 +0 -0
- tokenizer_config.json +0 -0
- upload.py +45 -0
README.md
ADDED
@@ -0,0 +1,274 @@
---
license: other
license_name: mrl
license_link: https://mistral.ai/licenses/MRL-0.1.md
language:
- en
- fr
- de
- es
- it
- pt
- zh
- ja
- ru
- ko

extra_gated_description: If you want to learn more about how we process your personal data, please read our <a href="https://mistral.ai/terms/">Privacy Policy</a>.
---

# Model Card for Mistral-Large-Instruct-2407

Mistral-Large-Instruct-2407 is an advanced dense Large Language Model (LLM) of 123B parameters with state-of-the-art reasoning, knowledge and coding capabilities.

For more details about this model please refer to our release [blog post](https://mistral.ai/news/mistral-large-2407/).

## Key features
- **Multi-lingual by design:** Dozens of languages supported, including English, French, German, Spanish, Italian, Chinese, Japanese, Korean, Portuguese, Dutch and Polish.
- **Proficient in coding:** Trained on 80+ coding languages such as Python, Java, C, C++, JavaScript, and Bash. Also trained on more specific languages such as Swift and Fortran.
- **Agentic-centric:** Best-in-class agentic capabilities with native function calling and JSON output.
- **Advanced Reasoning:** State-of-the-art mathematical and reasoning capabilities.
- **Mistral Research License:** Allows usage and modification for research and non-commercial use.
- **Large Context:** A large 128k context window.

## Metrics

### Base Pretrained Benchmarks

| Benchmark | Score |
| --- | --- |
| MMLU | 84.0% |

### Base Pretrained Multilingual Benchmarks (MMLU)

| Language | Score |
| --- | --- |
| French | 82.8% |
| German | 81.6% |
| Spanish | 82.7% |
| Italian | 82.7% |
| Dutch | 80.7% |
| Portuguese | 81.6% |
| Russian | 79.0% |
| Korean | 60.1% |
| Japanese | 78.8% |
| Chinese | 74.8% |

### Instruction Benchmarks

| Benchmark | Score |
| --- | --- |
| MT Bench | 8.63 |
| Wild Bench | 56.3 |
| Arena Hard | 73.2 |

### Code & Reasoning Benchmarks

| Benchmark | Score |
| --- | --- |
| HumanEval | 92% |
| HumanEval Plus | 87% |
| MBPP Base | 80% |
| MBPP Plus | 69% |

### Math Benchmarks

| Benchmark | Score |
| --- | --- |
| GSM8K | 93% |
| Math Instruct (0-shot, no CoT) | 70% |
| Math Instruct (0-shot, CoT) | 71.5% |

## Usage

The model can be used with two different frameworks:

- [`mistral_inference`](https://github.com/mistralai/mistral-inference): See [here](#mistral-inference)
- [`transformers`](https://github.com/huggingface/transformers): See [here](#transformers)

### Mistral Inference

#### Install

It is recommended to use `mistralai/Mistral-Large-Instruct-2407` with [mistral-inference](https://github.com/mistralai/mistral-inference). For HF transformers code snippets, please keep scrolling.

```
pip install mistral_inference
```

#### Download

```py
from huggingface_hub import snapshot_download
from pathlib import Path

# Local folder that will hold the downloaded weights and tokenizer
mistral_models_path = Path.home().joinpath('mistral_models', 'Large')
mistral_models_path.mkdir(parents=True, exist_ok=True)

snapshot_download(repo_id="mistralai/Mistral-Large-Instruct-2407", allow_patterns=["params.json", "consolidated-*.safetensors", "tokenizer.model.v3"], local_dir=mistral_models_path)
```

#### Chat

After installing `mistral_inference`, a `mistral-chat` CLI command should be available in your environment.
Given the size of this model, you will need a node with several GPUs (more than 300 GB of cumulative VRAM).
If you have 8 GPUs on your machine, you can chat with the model using:

```
torchrun --nproc-per-node 8 --no-python mistral-chat $HOME/mistral_models/Large --instruct --max_tokens 256 --temperature 0.7
```

For example, try a prompt like:
```
How expensive would it be to ask a window cleaner to clean all windows in Paris. Make a reasonable guess in US Dollar.
```

#### Instruction following

```py
from mistral_inference.transformer import Transformer
from mistral_inference.generate import generate

from mistral_common.tokens.tokenizers.mistral import MistralTokenizer
from mistral_common.protocol.instruct.messages import UserMessage
from mistral_common.protocol.instruct.request import ChatCompletionRequest

# mistral_models_path is the folder created in the Download step
tokenizer = MistralTokenizer.from_file(f"{mistral_models_path}/tokenizer.model.v3")
model = Transformer.from_folder(mistral_models_path)

prompt = "How expensive would it be to ask a window cleaner to clean all windows in Paris. Make a reasonable guess in US Dollar."

completion_request = ChatCompletionRequest(messages=[UserMessage(content=prompt)])

tokens = tokenizer.encode_chat_completion(completion_request).tokens

out_tokens, _ = generate([tokens], model, max_tokens=64, temperature=0.7, eos_id=tokenizer.instruct_tokenizer.tokenizer.eos_id)
result = tokenizer.decode(out_tokens[0])

print(result)
```

#### Function calling

```py
from mistral_common.protocol.instruct.tool_calls import Function, Tool
from mistral_inference.transformer import Transformer
from mistral_inference.generate import generate

from mistral_common.tokens.tokenizers.mistral import MistralTokenizer
from mistral_common.protocol.instruct.messages import UserMessage
from mistral_common.protocol.instruct.request import ChatCompletionRequest


# mistral_models_path is the folder created in the Download step
tokenizer = MistralTokenizer.from_file(f"{mistral_models_path}/tokenizer.model.v3")
model = Transformer.from_folder(mistral_models_path)

completion_request = ChatCompletionRequest(
    tools=[
        Tool(
            function=Function(
                name="get_current_weather",
                description="Get the current weather",
                parameters={
                    "type": "object",
                    "properties": {
                        "location": {
                            "type": "string",
                            "description": "The city and state, e.g. San Francisco, CA",
                        },
                        "format": {
                            "type": "string",
                            "enum": ["celsius", "fahrenheit"],
                            "description": "The temperature unit to use. Infer this from the user's location.",
                        },
                    },
                    "required": ["location", "format"],
                },
            )
        )
    ],
    messages=[
        UserMessage(content="What's the weather like today in Paris?"),
    ],
)

tokens = tokenizer.encode_chat_completion(completion_request).tokens

out_tokens, _ = generate([tokens], model, max_tokens=256, temperature=0.7, eos_id=tokenizer.instruct_tokenizer.tokenizer.eos_id)
result = tokenizer.decode(out_tokens[0])

print(result)
```

### Transformers

If you want to use Hugging Face `transformers` to generate text, you can do something like this:

```py
from transformers import pipeline

messages = [
    {"role": "system", "content": "You are a pirate chatbot who always responds in pirate speak!"},
    {"role": "user", "content": "Who are you?"},
]
chatbot = pipeline("text-generation", model="mistralai/Mistral-Large-Instruct-2407")
chatbot(messages)
```

## Function calling with `transformers`

To use this example, you'll need `transformers` version 4.42.0 or higher. Please see the
[function calling guide](https://huggingface.co/docs/transformers/main/chat_templating#advanced-tool-use--function-calling)
in the `transformers` docs for more information.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "mistralai/Mistral-Large-Instruct-2407"
tokenizer = AutoTokenizer.from_pretrained(model_id)

def get_current_weather(location: str, format: str):
    """
    Get the current weather

    Args:
        location: The city and state, e.g. San Francisco, CA
        format: The temperature unit to use. Infer this from the user's location. (choices: ["celsius", "fahrenheit"])
    """
    pass

conversation = [{"role": "user", "content": "What's the weather like in Paris?"}]
tools = [get_current_weather]

# format and tokenize the tool use prompt
inputs = tokenizer.apply_chat_template(
    conversation,
    tools=tools,
    add_generation_prompt=True,
    return_dict=True,
    return_tensors="pt",
)

model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16, device_map="auto")

# move the tokenized inputs to the same device as the model
inputs = inputs.to(model.device)
outputs = model.generate(**inputs, max_new_tokens=1000)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Note that, for reasons of space, this example does not show a complete cycle of calling a tool and adding the tool call and tool
results to the chat history so that the model can use them in its next generation. For a full tool calling example, please
see the [function calling guide](https://huggingface.co/docs/transformers/main/chat_templating#advanced-tool-use--function-calling),
and note that Mistral **does** use tool call IDs, so these must be included in your tool calls and tool results. They should be
exactly 9 alphanumeric characters.

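As a rough illustration of the ID requirement above, the sketch below generates a 9-character alphanumeric ID and appends one tool call plus its result to a chat history. The helper names `new_tool_call_id` and `append_tool_round` are hypothetical, the weather values are invented, and the message layout is an assumption based on the `transformers` function calling guide, so check it against the guide before relying on it.

```python
import json
import secrets
import string

ALPHANUMERIC = string.ascii_letters + string.digits


def new_tool_call_id(length: int = 9) -> str:
    """Return a random ID made only of ASCII letters and digits."""
    return "".join(secrets.choice(ALPHANUMERIC) for _ in range(length))


def append_tool_round(conversation, name, arguments, result):
    """Append one assistant tool call and its matching tool result in place."""
    call_id = new_tool_call_id()
    # Assumed layout: the assistant turn records the call and its ID.
    conversation.append({
        "role": "assistant",
        "tool_calls": [{
            "type": "function",
            "id": call_id,
            "function": {"name": name, "arguments": arguments},
        }],
    })
    # The tool turn repeats the same ID so the result can be matched to the call.
    conversation.append({
        "role": "tool",
        "tool_call_id": call_id,
        "name": name,
        "content": json.dumps(result),
    })
    return call_id


conversation = [{"role": "user", "content": "What's the weather like in Paris?"}]
# The weather values below are made up for illustration.
call_id = append_tool_round(
    conversation,
    name="get_current_weather",
    arguments={"location": "Paris, France", "format": "celsius"},
    result={"temperature": 22.0, "unit": "celsius"},
)
print(call_id, len(conversation))
```

The extended `conversation` could then be passed to `tokenizer.apply_chat_template` again, in the same way as the first call in the example above.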
## Limitations

The Mistral Large model is a quick demonstration that the base model can be easily fine-tuned to achieve compelling performance.
It does not have any moderation mechanisms. We're looking forward to engaging with the community on ways to
make the model finely respect guardrails, allowing for deployment in environments requiring moderated outputs.

## The Mistral AI Team

Albert Jiang, Alexandre Sablayrolles, Alexis Tacnet, Alok Kothari, Antoine Roux, Arthur Mensch, Audrey Herblin-Stoop, Augustin Garreau, Austin Birky, Bam4d, Baptiste Bout, Baudouin de Monicault, Blanche Savary, Carole Rambaud, Caroline Feldman, Devendra Singh Chaplot, Diego de las Casas, Diogo Costa, Eleonore Arcelin, Emma Bou Hanna, Etienne Metzger, Gaspard Blanchet, Gianna Lengyel, Guillaume Bour, Guillaume Lample, Harizo Rajaona, Henri Roussez, Hichem Sattouf, Ian Mack, Jean-Malo Delignon, Jessica Chudnovsky, Justus Murke, Kartik Khandelwal, Lawrence Stewart, Louis Martin, Louis Ternon, Lucile Saulnier, Lélio Renard Lavaud, Margaret Jennings, Marie Pellat, Marie Torelli, Marie-Anne Lachaux, Marjorie Janiewicz, Mickaël Seznec, Nicolas Schuhl, Niklas Muhs, Olivier de Garrigues, Patrick von Platen, Paul Jacob, Pauline Buche, Pavan Kumar Reddy, Perry Savas, Pierre Stock, Romain Sauvestre, Sagar Vaze, Sandeep Subramanian, Saurabh Garg, Sophia Yang, Szymon Antoniak, Teven Le Scao, Thibault Schueller, Thibaut Lavril, Thomas Wang, Théophile Gervet, Timothée Lacroix, Valera Nemychnikova, Wendy Shang, William El Sayed, William Marshall
config.json
ADDED
@@ -0,0 +1,36 @@
{
  "architectures": [
    "MistralForCausalLM"
  ],
  "attention_dropout": 0.0,
  "bos_token_id": 1,
  "eos_token_id": 2,
  "hidden_act": "silu",
  "hidden_size": 12288,
  "initializer_range": 0.02,
  "intermediate_size": 28672,
  "max_position_embeddings": 131072,
  "model_type": "mistral",
  "num_attention_heads": 96,
  "num_hidden_layers": 88,
  "num_key_value_heads": 8,
  "rms_norm_eps": 1e-05,
  "rope_theta": 1000000.0,
  "sliding_window": null,
  "tie_word_embeddings": false,
  "torch_dtype": "bfloat16",
  "transformers_version": "4.42.3",
  "use_cache": true,
  "vocab_size": 32768,
  "quantization_config": {
    "quant_method": "exl2",
    "version": "0.2.1",
    "bits": 2.85,
    "head_bits": 6,
    "calibration": {
      "rows": 115,
      "length": 2048,
      "dataset": "(default)"
    }
  }
}
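The numbers in this config allow a back-of-the-envelope memory estimate. The sketch below is only an approximation under stated assumptions: the parameter count is inferred from the `total_size` in `model.safetensors.index.json` at 2 bytes per bf16 weight, the 2.85 bits-per-weight average is applied to every parameter (the 6-bit head is ignored), and the cache estimate ignores any quantization metadata. The 32768-token context is the `max_seq_len` set in the accompanying `config.yml`.

```python
# Values copied from config.json.
hidden_size = 12288
num_attention_heads = 96
num_hidden_layers = 88
num_key_value_heads = 8
bits_per_weight = 2.85

# Assumption: total_size from model.safetensors.index.json counts bf16 weights (2 bytes each).
approx_params = 245220139008 // 2

# Each attention head works on hidden_size / num_attention_heads dimensions.
head_dim = hidden_size // num_attention_heads  # 128

# Keys and values (factor 2), per layer, per KV head, one element per head dimension.
kv_elements_per_token = 2 * num_hidden_layers * num_key_value_heads * head_dim


def kv_cache_gib(tokens: int, bits_per_element: int) -> float:
    """Approximate KV cache size in GiB, ignoring quantization metadata."""
    return kv_elements_per_token * tokens * bits_per_element / 8 / 2**30


# Rough size of the quantized weights in GB (10^9 bytes).
weights_gb = approx_params * bits_per_weight / 8 / 1e9

print(f"head_dim: {head_dim}")
print(f"approx. quantized weights: {weights_gb:.1f} GB")
print(f"FP16 cache at 32768 tokens: {kv_cache_gib(32768, 16):.2f} GiB")
print(f"4-bit cache at 32768 tokens: {kv_cache_gib(32768, 4):.2f} GiB")
```

Under these assumptions the grouped-query layout (8 key/value heads against 96 attention heads) keeps the cache small relative to the weights, which is why a 4-bit cache mode mostly matters at long context lengths.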
config.yml
ADDED
@@ -0,0 +1,228 @@
# Sample YAML file for configuration.
# Comment and uncomment values as needed. Every value has a default within the application.
# This file serves as a drop-in for config.yml

# Unless specified in the comments, DO NOT put these options in quotes!
# You can use https://www.yamllint.com/ if you want to check your YAML formatting.

# Options for networking
network:
  # The IP to host on (default: 127.0.0.1).
  # Use 0.0.0.0 to expose on all network adapters
  host: 0.0.0.0

  # The port to host on (default: 5000)
  port: 5000

  # Disable HTTP token authentication with requests
  # WARNING: This will make your instance vulnerable!
  # Turn on this option if you are ONLY connecting from localhost
  disable_auth: False

  # Send tracebacks over the API to clients (default: False)
  # NOTE: Only enable this for debug purposes
  send_tracebacks: False

  # Select API servers to enable (default: ["OAI"])
  # Possible values: OAI
  api_servers: ["OAI"]

# Options for logging
logging:
  # Enable prompt logging (default: False)
  prompt: False

  # Enable generation parameter logging (default: False)
  generation_params: False

  # Enable request logging (default: False)
  # NOTE: Only use this for debugging!
  requests: False

# Options for sampling
sampling:
  # Override preset name. Find this in the sampler-overrides folder (default: None)
  # This overrides default fallbacks for sampler values that are passed to the API
  # Server-side overrides are NOT needed by default
  # WARNING: Using this can result in a generation speed penalty
  #override_preset:

# Options for development and experimentation
developer:
  # Skips exllamav2 version check (default: False)
  # It's highly recommended to update your dependencies rather than enabling this flag
  # WARNING: Don't set this unless you know what you're doing!
  #unsafe_launch: False

  # Disable all request streaming (default: False)
  # A kill switch for turning off SSE in the API server
  #disable_request_streaming: False

  # Enable the torch CUDA malloc backend (default: False)
  # This can save a few MBs of VRAM, but has a risk of errors. Use at your own risk.
  cuda_malloc_backend: True

  # Enable Uvloop or Winloop (default: False)
  # Make the program utilize a faster async event loop which can improve performance
  # NOTE: It's recommended to enable this, but if something breaks, turn this off.
  uvloop: True

  # Set process to use a higher priority
  # For realtime process priority, run as administrator or sudo
  # Otherwise, the priority will be set to high
  realtime_process_priority: True

# Options for model overrides and loading
# Please read the comments to understand how arguments are handled between initial and API loads
model:
  # Overrides the directory to look for models (default: models)
  # Windows users, DO NOT put this path in quotes! This directory will be invalid otherwise.
  model_dir: models

  # Sends dummy model names when the models endpoint is queried
  # Enable this if the program is looking for a specific OAI model
  #use_dummy_models: False

  # An initial model to load. Make sure the model is located in the model directory!
  # A model can be loaded later via the API.
  # REQUIRED: This must be filled out to load a model on startup!
  model_name: Mistral-Large-Instruct-2407_exl2_2.85bpw

  # The below parameters only apply for initial loads
  # All API based loads do NOT inherit these settings unless specified in use_as_default

  # Names of args to use as a default fallback for API load requests (default: [])
  # For example, if you always want cache_mode to be Q4 instead of on the initial model load,
  # Add "cache_mode" to this array
  # Ex. ["max_seq_len", "cache_mode"]
  #use_as_default: []

  # The below parameters apply only if model_name is set

  # Max sequence length (default: Empty)
  # Fetched from the model's base sequence length in config.json by default
  max_seq_len: 32768

  # Overrides base model context length (default: Empty)
  # WARNING: Don't set this unless you know what you're doing!
  # Again, do NOT use this for configuring context length, use max_seq_len above ^
  # Only use this if the model's base sequence length in config.json is incorrect (ex. Mistral 7B)
  #override_base_seq_len:

  # Load model with tensor parallelism
  # If a GPU split isn't provided, the TP loader will fallback to autosplit
  # Enabling ignores the gpu_split_auto and autosplit_reserve values
  #tensor_parallel: True

  # Automatically allocate resources to GPUs (default: True)
  # NOTE: Not parsed for single GPU users
  gpu_split_auto: True

  # Reserve VRAM used for autosplit loading (default: 96 MB on GPU 0)
  # This is represented as an array of MB per GPU used
  autosplit_reserve: [0]

  # An integer array of GBs of vram to split between GPUs (default: [])
  # Used with tensor parallelism
  # NOTE: Not parsed for single GPU users
  #gpu_split: [20.6, 24]

  # Rope scale (default: 1.0)
  # Same thing as compress_pos_emb
  # Only use if your model was trained on long context with rope (check config.json)
  # Leave blank to pull the value from the model
  #rope_scale: 1.0

  # Rope alpha (default: 1.0)
  # Same thing as alpha_value
  # Leave blank to automatically calculate alpha
  #rope_alpha: 1.0

  # Enable different cache modes for VRAM savings (slight performance hit).
  # Possible values FP16, Q8, Q6, Q4. (default: FP16)
  cache_mode: Q4

  # Size of the prompt cache to allocate (default: max_seq_len)
  # This must be a multiple of 256. A larger cache uses more VRAM, but allows for more prompts to be processed at once.
  # NOTE: Cache size should not be less than max_seq_len.
  # For CFG, set this to 2 * max_seq_len to make room for both positive and negative prompts.
  # cache_size:

  # Chunk size for prompt ingestion. A lower value reduces VRAM usage at the cost of ingestion speed (default: 2048)
  # NOTE: Effects vary depending on the model. An ideal value is between 512 and 4096
  chunk_size: 1024

  # Set the maximum amount of prompts to process at one time (default: None/Automatic)
  # This will be automatically calculated if left blank.
  # A max batch size of 1 processes prompts one at a time.
  # NOTE: Only available for Nvidia ampere (30 series) and above GPUs
  #max_batch_size:

  # Set the prompt template for this model. If empty, attempts to look for the model's chat template. (default: None)
  # If a model contains multiple templates in its tokenizer_config.json, set prompt_template to the name
  # of the template you want to use.
  # NOTE: Only works with chat completion message lists!
  #prompt_template:

  # Number of experts to use PER TOKEN. Fetched from the model's config.json if not specified (default: Empty)
  # WARNING: Don't set this unless you know what you're doing!
  # NOTE: For MoE models (ex. Mixtral) only!
  #num_experts_per_token:

  # Enables fasttensors to possibly increase model loading speeds (default: False)
  fasttensors: true

  # Options for draft models (speculative decoding). This will use more VRAM!
  #draft:
    # Overrides the directory to look for draft (default: models)
    #draft_model_dir: models

    # An initial draft model to load. Make sure this model is located in the model directory!
    # A draft model can be loaded later via the API.
    #draft_model_name: A model name

    # The below parameters only apply for initial loads
    # All API based loads do NOT inherit these settings unless specified in use_as_default

    # Rope scale for draft models (default: 1.0)
    # Same thing as compress_pos_emb
    # Only use if your draft model was trained on long context with rope (check config.json)
    #draft_rope_scale: 1.0

    # Rope alpha for draft model (default: 1.0)
    # Same thing as alpha_value
    # Leave blank to automatically calculate alpha value
    #draft_rope_alpha: 1.0

    # Enable different draft model cache modes for VRAM savings (slight performance hit).
    # Possible values FP16, Q8, Q6, Q4. (default: FP16)
    #draft_cache_mode: FP16

  # Options for loras
  #lora:
    # Overrides the directory to look for loras (default: loras)
    #lora_dir: loras

    # List of loras to load and associated scaling factors (default: 1.0). Comment out unused entries or add more rows as needed.
    #loras:
    #- name: lora1
    #  scaling: 1.0

# Options for embedding models and loading.
# NOTE: Embeddings requires the "extras" feature to be installed
# Install it via "pip install .[extras]"
embeddings:
  # Overrides directory to look for embedding models (default: models)
  embedding_model_dir: models

  # Device to load embedding models on (default: cpu)
  # Possible values: cpu, auto, cuda
  # NOTE: It's recommended to load embedding models on the CPU.
  # If you'd like to load on an AMD gpu, set this value to "cuda" as well.
  embeddings_device: cpu

  # The below parameters only apply for initial loads
  # All API based loads do NOT inherit these settings unless specified in use_as_default

  # An initial embedding model to load on the infinity backend (default: None)
  embedding_model_name:
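The comments on `cache_size` in this file state three constraints: the value must be a multiple of 256, it should not be less than `max_seq_len`, and for CFG it should be 2 * `max_seq_len`. A minimal sketch of a pre-flight check for those rules follows. The function name `check_cache_size` is hypothetical and not part of any application, and treating the CFG rule as "at least 2 * max_seq_len" is an interpretation of the comment rather than documented behavior.

```python
def check_cache_size(cache_size: int, max_seq_len: int, use_cfg: bool = False) -> list:
    """Return a list of problems with a proposed cache_size; an empty list means OK."""
    problems = []
    # Rule 1: the cache must be allocated in multiples of 256.
    if cache_size % 256 != 0:
        problems.append("cache_size must be a multiple of 256")
    # Rule 2: the cache should hold at least one full-length sequence.
    if cache_size < max_seq_len:
        problems.append("cache_size should not be less than max_seq_len")
    # Rule 3 (assumed reading): CFG needs room for positive and negative prompts.
    if use_cfg and cache_size < 2 * max_seq_len:
        problems.append("for CFG, cache_size should be 2 * max_seq_len")
    return problems


# max_seq_len is 32768 in the config above; cache_size defaults to max_seq_len.
print(check_cache_size(32768, 32768))
print(check_cache_size(32768, 32768, use_cfg=True))
print(check_cache_size(65536, 32768, use_cfg=True))
```

Returning a list of messages rather than raising on the first failure lets a caller report every violated rule at once.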
generation_config.json
ADDED
@@ -0,0 +1,6 @@
{
  "_from_model_config": true,
  "bos_token_id": 1,
  "eos_token_id": 2,
  "transformers_version": "4.42.3"
}
measurements.json
ADDED
The diff for this file is too large to render.
model.safetensors.index.json
ADDED
@@ -0,0 +1,802 @@
{
  "metadata": {
    "total_size": 245220139008
  },
  "weight_map": {
    "lm_head.weight": "model-00051-of-00051.safetensors",
    "model.embed_tokens.weight": "model-00001-of-00051.safetensors",
    "model.layers.0.input_layernorm.weight": "model-00001-of-00051.safetensors",
    "model.layers.0.mlp.down_proj.weight": "model-00001-of-00051.safetensors",
    "model.layers.0.mlp.gate_proj.weight": "model-00001-of-00051.safetensors",
    "model.layers.0.mlp.up_proj.weight": "model-00001-of-00051.safetensors",
    "model.layers.0.post_attention_layernorm.weight": "model-00001-of-00051.safetensors",
    "model.layers.0.self_attn.k_proj.weight": "model-00001-of-00051.safetensors",
    "model.layers.0.self_attn.o_proj.weight": "model-00001-of-00051.safetensors",
    "model.layers.0.self_attn.q_proj.weight": "model-00001-of-00051.safetensors",
    "model.layers.0.self_attn.v_proj.weight": "model-00001-of-00051.safetensors",
    "model.layers.1.input_layernorm.weight": "model-00002-of-00051.safetensors",
    "model.layers.1.mlp.down_proj.weight": "model-00002-of-00051.safetensors",
    "model.layers.1.mlp.gate_proj.weight": "model-00001-of-00051.safetensors",
    "model.layers.1.mlp.up_proj.weight": "model-00002-of-00051.safetensors",
    "model.layers.1.post_attention_layernorm.weight": "model-00002-of-00051.safetensors",
    "model.layers.1.self_attn.k_proj.weight": "model-00001-of-00051.safetensors",
    "model.layers.1.self_attn.o_proj.weight": "model-00001-of-00051.safetensors",
    "model.layers.1.self_attn.q_proj.weight": "model-00001-of-00051.safetensors",
    "model.layers.1.self_attn.v_proj.weight": "model-00001-of-00051.safetensors",
    "model.layers.10.input_layernorm.weight": "model-00007-of-00051.safetensors",
    "model.layers.10.mlp.down_proj.weight": "model-00007-of-00051.safetensors",
    "model.layers.10.mlp.gate_proj.weight": "model-00007-of-00051.safetensors",
    "model.layers.10.mlp.up_proj.weight": "model-00007-of-00051.safetensors",
    "model.layers.10.post_attention_layernorm.weight": "model-00007-of-00051.safetensors",
    "model.layers.10.self_attn.k_proj.weight": "model-00006-of-00051.safetensors",
    "model.layers.10.self_attn.o_proj.weight": "model-00006-of-00051.safetensors",
    "model.layers.10.self_attn.q_proj.weight": "model-00006-of-00051.safetensors",
    "model.layers.10.self_attn.v_proj.weight": "model-00006-of-00051.safetensors",
    "model.layers.11.input_layernorm.weight": "model-00007-of-00051.safetensors",
    "model.layers.11.mlp.down_proj.weight": "model-00007-of-00051.safetensors",
    "model.layers.11.mlp.gate_proj.weight": "model-00007-of-00051.safetensors",
    "model.layers.11.mlp.up_proj.weight": "model-00007-of-00051.safetensors",
    "model.layers.11.post_attention_layernorm.weight": "model-00007-of-00051.safetensors",
    "model.layers.11.self_attn.k_proj.weight": "model-00007-of-00051.safetensors",
    "model.layers.11.self_attn.o_proj.weight": "model-00007-of-00051.safetensors",
    "model.layers.11.self_attn.q_proj.weight": "model-00007-of-00051.safetensors",
|
43 |
+
"model.layers.11.self_attn.v_proj.weight": "model-00007-of-00051.safetensors",
|
44 |
+
"model.layers.12.input_layernorm.weight": "model-00008-of-00051.safetensors",
|
45 |
+
"model.layers.12.mlp.down_proj.weight": "model-00008-of-00051.safetensors",
|
46 |
+
"model.layers.12.mlp.gate_proj.weight": "model-00008-of-00051.safetensors",
|
47 |
+
"model.layers.12.mlp.up_proj.weight": "model-00008-of-00051.safetensors",
|
48 |
+
"model.layers.12.post_attention_layernorm.weight": "model-00008-of-00051.safetensors",
|
49 |
+
"model.layers.12.self_attn.k_proj.weight": "model-00008-of-00051.safetensors",
|
50 |
+
"model.layers.12.self_attn.o_proj.weight": "model-00008-of-00051.safetensors",
|
51 |
+
"model.layers.12.self_attn.q_proj.weight": "model-00008-of-00051.safetensors",
|
52 |
+
"model.layers.12.self_attn.v_proj.weight": "model-00008-of-00051.safetensors",
|
53 |
+
"model.layers.13.input_layernorm.weight": "model-00009-of-00051.safetensors",
|
54 |
+
"model.layers.13.mlp.down_proj.weight": "model-00009-of-00051.safetensors",
|
55 |
+
"model.layers.13.mlp.gate_proj.weight": "model-00008-of-00051.safetensors",
|
56 |
+
"model.layers.13.mlp.up_proj.weight": "model-00008-of-00051.safetensors",
|
57 |
+
"model.layers.13.post_attention_layernorm.weight": "model-00009-of-00051.safetensors",
|
58 |
+
"model.layers.13.self_attn.k_proj.weight": "model-00008-of-00051.safetensors",
|
59 |
+
"model.layers.13.self_attn.o_proj.weight": "model-00008-of-00051.safetensors",
|
60 |
+
"model.layers.13.self_attn.q_proj.weight": "model-00008-of-00051.safetensors",
|
61 |
+
"model.layers.13.self_attn.v_proj.weight": "model-00008-of-00051.safetensors",
|
62 |
+
"model.layers.14.input_layernorm.weight": "model-00009-of-00051.safetensors",
|
63 |
+
"model.layers.14.mlp.down_proj.weight": "model-00009-of-00051.safetensors",
|
64 |
+
"model.layers.14.mlp.gate_proj.weight": "model-00009-of-00051.safetensors",
|
65 |
+
"model.layers.14.mlp.up_proj.weight": "model-00009-of-00051.safetensors",
|
66 |
+
"model.layers.14.post_attention_layernorm.weight": "model-00009-of-00051.safetensors",
|
67 |
+
"model.layers.14.self_attn.k_proj.weight": "model-00009-of-00051.safetensors",
|
68 |
+
"model.layers.14.self_attn.o_proj.weight": "model-00009-of-00051.safetensors",
|
69 |
+
"model.layers.14.self_attn.q_proj.weight": "model-00009-of-00051.safetensors",
|
70 |
+
"model.layers.14.self_attn.v_proj.weight": "model-00009-of-00051.safetensors",
|
71 |
+
"model.layers.15.input_layernorm.weight": "model-00010-of-00051.safetensors",
|
72 |
+
"model.layers.15.mlp.down_proj.weight": "model-00010-of-00051.safetensors",
|
73 |
+
"model.layers.15.mlp.gate_proj.weight": "model-00009-of-00051.safetensors",
|
74 |
+
"model.layers.15.mlp.up_proj.weight": "model-00010-of-00051.safetensors",
|
75 |
+
"model.layers.15.post_attention_layernorm.weight": "model-00010-of-00051.safetensors",
|
76 |
+
"model.layers.15.self_attn.k_proj.weight": "model-00009-of-00051.safetensors",
|
77 |
+
"model.layers.15.self_attn.o_proj.weight": "model-00009-of-00051.safetensors",
|
78 |
+
"model.layers.15.self_attn.q_proj.weight": "model-00009-of-00051.safetensors",
|
79 |
+
"model.layers.15.self_attn.v_proj.weight": "model-00009-of-00051.safetensors",
|
80 |
+
"model.layers.16.input_layernorm.weight": "model-00010-of-00051.safetensors",
|
81 |
+
"model.layers.16.mlp.down_proj.weight": "model-00010-of-00051.safetensors",
|
82 |
+
"model.layers.16.mlp.gate_proj.weight": "model-00010-of-00051.safetensors",
|
83 |
+
"model.layers.16.mlp.up_proj.weight": "model-00010-of-00051.safetensors",
|
84 |
+
"model.layers.16.post_attention_layernorm.weight": "model-00010-of-00051.safetensors",
|
85 |
+
"model.layers.16.self_attn.k_proj.weight": "model-00010-of-00051.safetensors",
|
86 |
+
"model.layers.16.self_attn.o_proj.weight": "model-00010-of-00051.safetensors",
|
87 |
+
"model.layers.16.self_attn.q_proj.weight": "model-00010-of-00051.safetensors",
|
88 |
+
"model.layers.16.self_attn.v_proj.weight": "model-00010-of-00051.safetensors",
|
89 |
+
"model.layers.17.input_layernorm.weight": "model-00011-of-00051.safetensors",
|
90 |
+
"model.layers.17.mlp.down_proj.weight": "model-00011-of-00051.safetensors",
|
91 |
+
"model.layers.17.mlp.gate_proj.weight": "model-00011-of-00051.safetensors",
|
92 |
+
"model.layers.17.mlp.up_proj.weight": "model-00011-of-00051.safetensors",
|
93 |
+
"model.layers.17.post_attention_layernorm.weight": "model-00011-of-00051.safetensors",
|
94 |
+
"model.layers.17.self_attn.k_proj.weight": "model-00010-of-00051.safetensors",
|
95 |
+
"model.layers.17.self_attn.o_proj.weight": "model-00010-of-00051.safetensors",
|
96 |
+
"model.layers.17.self_attn.q_proj.weight": "model-00010-of-00051.safetensors",
|
97 |
+
"model.layers.17.self_attn.v_proj.weight": "model-00010-of-00051.safetensors",
|
98 |
+
"model.layers.18.input_layernorm.weight": "model-00011-of-00051.safetensors",
|
99 |
+
"model.layers.18.mlp.down_proj.weight": "model-00011-of-00051.safetensors",
|
100 |
+
"model.layers.18.mlp.gate_proj.weight": "model-00011-of-00051.safetensors",
|
101 |
+
"model.layers.18.mlp.up_proj.weight": "model-00011-of-00051.safetensors",
|
102 |
+
"model.layers.18.post_attention_layernorm.weight": "model-00011-of-00051.safetensors",
|
103 |
+
"model.layers.18.self_attn.k_proj.weight": "model-00011-of-00051.safetensors",
|
104 |
+
"model.layers.18.self_attn.o_proj.weight": "model-00011-of-00051.safetensors",
|
105 |
+
"model.layers.18.self_attn.q_proj.weight": "model-00011-of-00051.safetensors",
|
106 |
+
"model.layers.18.self_attn.v_proj.weight": "model-00011-of-00051.safetensors",
|
107 |
+
"model.layers.19.input_layernorm.weight": "model-00012-of-00051.safetensors",
|
108 |
+
"model.layers.19.mlp.down_proj.weight": "model-00012-of-00051.safetensors",
|
109 |
+
"model.layers.19.mlp.gate_proj.weight": "model-00012-of-00051.safetensors",
|
110 |
+
"model.layers.19.mlp.up_proj.weight": "model-00012-of-00051.safetensors",
|
111 |
+
"model.layers.19.post_attention_layernorm.weight": "model-00012-of-00051.safetensors",
|
112 |
+
"model.layers.19.self_attn.k_proj.weight": "model-00012-of-00051.safetensors",
|
113 |
+
"model.layers.19.self_attn.o_proj.weight": "model-00012-of-00051.safetensors",
|
114 |
+
"model.layers.19.self_attn.q_proj.weight": "model-00012-of-00051.safetensors",
|
115 |
+
"model.layers.19.self_attn.v_proj.weight": "model-00012-of-00051.safetensors",
|
116 |
+
"model.layers.2.input_layernorm.weight": "model-00002-of-00051.safetensors",
|
117 |
+
"model.layers.2.mlp.down_proj.weight": "model-00002-of-00051.safetensors",
|
118 |
+
"model.layers.2.mlp.gate_proj.weight": "model-00002-of-00051.safetensors",
|
119 |
+
"model.layers.2.mlp.up_proj.weight": "model-00002-of-00051.safetensors",
|
120 |
+
"model.layers.2.post_attention_layernorm.weight": "model-00002-of-00051.safetensors",
|
121 |
+
"model.layers.2.self_attn.k_proj.weight": "model-00002-of-00051.safetensors",
|
122 |
+
"model.layers.2.self_attn.o_proj.weight": "model-00002-of-00051.safetensors",
|
123 |
+
"model.layers.2.self_attn.q_proj.weight": "model-00002-of-00051.safetensors",
|
124 |
+
"model.layers.2.self_attn.v_proj.weight": "model-00002-of-00051.safetensors",
|
125 |
+
"model.layers.20.input_layernorm.weight": "model-00013-of-00051.safetensors",
|
126 |
+
"model.layers.20.mlp.down_proj.weight": "model-00013-of-00051.safetensors",
|
127 |
+
"model.layers.20.mlp.gate_proj.weight": "model-00012-of-00051.safetensors",
|
128 |
+
"model.layers.20.mlp.up_proj.weight": "model-00012-of-00051.safetensors",
|
129 |
+
"model.layers.20.post_attention_layernorm.weight": "model-00013-of-00051.safetensors",
|
130 |
+
"model.layers.20.self_attn.k_proj.weight": "model-00012-of-00051.safetensors",
|
131 |
+
"model.layers.20.self_attn.o_proj.weight": "model-00012-of-00051.safetensors",
|
132 |
+
"model.layers.20.self_attn.q_proj.weight": "model-00012-of-00051.safetensors",
|
133 |
+
"model.layers.20.self_attn.v_proj.weight": "model-00012-of-00051.safetensors",
|
134 |
+
"model.layers.21.input_layernorm.weight": "model-00013-of-00051.safetensors",
|
135 |
+
"model.layers.21.mlp.down_proj.weight": "model-00013-of-00051.safetensors",
|
136 |
+
"model.layers.21.mlp.gate_proj.weight": "model-00013-of-00051.safetensors",
|
137 |
+
"model.layers.21.mlp.up_proj.weight": "model-00013-of-00051.safetensors",
|
138 |
+
"model.layers.21.post_attention_layernorm.weight": "model-00013-of-00051.safetensors",
|
139 |
+
"model.layers.21.self_attn.k_proj.weight": "model-00013-of-00051.safetensors",
|
140 |
+
"model.layers.21.self_attn.o_proj.weight": "model-00013-of-00051.safetensors",
|
141 |
+
"model.layers.21.self_attn.q_proj.weight": "model-00013-of-00051.safetensors",
|
142 |
+
"model.layers.21.self_attn.v_proj.weight": "model-00013-of-00051.safetensors",
|
143 |
+
"model.layers.22.input_layernorm.weight": "model-00014-of-00051.safetensors",
|
144 |
+
"model.layers.22.mlp.down_proj.weight": "model-00014-of-00051.safetensors",
|
145 |
+
"model.layers.22.mlp.gate_proj.weight": "model-00013-of-00051.safetensors",
|
146 |
+
"model.layers.22.mlp.up_proj.weight": "model-00014-of-00051.safetensors",
|
147 |
+
"model.layers.22.post_attention_layernorm.weight": "model-00014-of-00051.safetensors",
|
148 |
+
"model.layers.22.self_attn.k_proj.weight": "model-00013-of-00051.safetensors",
|
149 |
+
"model.layers.22.self_attn.o_proj.weight": "model-00013-of-00051.safetensors",
|
150 |
+
"model.layers.22.self_attn.q_proj.weight": "model-00013-of-00051.safetensors",
|
151 |
+
"model.layers.22.self_attn.v_proj.weight": "model-00013-of-00051.safetensors",
|
152 |
+
"model.layers.23.input_layernorm.weight": "model-00014-of-00051.safetensors",
|
153 |
+
"model.layers.23.mlp.down_proj.weight": "model-00014-of-00051.safetensors",
|
154 |
+
"model.layers.23.mlp.gate_proj.weight": "model-00014-of-00051.safetensors",
|
155 |
+
"model.layers.23.mlp.up_proj.weight": "model-00014-of-00051.safetensors",
|
156 |
+
"model.layers.23.post_attention_layernorm.weight": "model-00014-of-00051.safetensors",
|
157 |
+
"model.layers.23.self_attn.k_proj.weight": "model-00014-of-00051.safetensors",
|
158 |
+
"model.layers.23.self_attn.o_proj.weight": "model-00014-of-00051.safetensors",
|
159 |
+
"model.layers.23.self_attn.q_proj.weight": "model-00014-of-00051.safetensors",
|
160 |
+
"model.layers.23.self_attn.v_proj.weight": "model-00014-of-00051.safetensors",
|
161 |
+
"model.layers.24.input_layernorm.weight": "model-00015-of-00051.safetensors",
|
162 |
+
"model.layers.24.mlp.down_proj.weight": "model-00015-of-00051.safetensors",
|
163 |
+
"model.layers.24.mlp.gate_proj.weight": "model-00015-of-00051.safetensors",
|
164 |
+
"model.layers.24.mlp.up_proj.weight": "model-00015-of-00051.safetensors",
|
165 |
+
"model.layers.24.post_attention_layernorm.weight": "model-00015-of-00051.safetensors",
|
166 |
+
"model.layers.24.self_attn.k_proj.weight": "model-00014-of-00051.safetensors",
|
167 |
+
"model.layers.24.self_attn.o_proj.weight": "model-00014-of-00051.safetensors",
|
168 |
+
"model.layers.24.self_attn.q_proj.weight": "model-00014-of-00051.safetensors",
|
169 |
+
"model.layers.24.self_attn.v_proj.weight": "model-00014-of-00051.safetensors",
|
170 |
+
"model.layers.25.input_layernorm.weight": "model-00015-of-00051.safetensors",
|
171 |
+
"model.layers.25.mlp.down_proj.weight": "model-00015-of-00051.safetensors",
|
172 |
+
"model.layers.25.mlp.gate_proj.weight": "model-00015-of-00051.safetensors",
|
173 |
+
"model.layers.25.mlp.up_proj.weight": "model-00015-of-00051.safetensors",
|
174 |
+
"model.layers.25.post_attention_layernorm.weight": "model-00015-of-00051.safetensors",
|
175 |
+
"model.layers.25.self_attn.k_proj.weight": "model-00015-of-00051.safetensors",
|
176 |
+
"model.layers.25.self_attn.o_proj.weight": "model-00015-of-00051.safetensors",
|
177 |
+
"model.layers.25.self_attn.q_proj.weight": "model-00015-of-00051.safetensors",
|
178 |
+
"model.layers.25.self_attn.v_proj.weight": "model-00015-of-00051.safetensors",
|
179 |
+
"model.layers.26.input_layernorm.weight": "model-00016-of-00051.safetensors",
|
180 |
+
"model.layers.26.mlp.down_proj.weight": "model-00016-of-00051.safetensors",
|
181 |
+
"model.layers.26.mlp.gate_proj.weight": "model-00016-of-00051.safetensors",
|
182 |
+
"model.layers.26.mlp.up_proj.weight": "model-00016-of-00051.safetensors",
|
183 |
+
"model.layers.26.post_attention_layernorm.weight": "model-00016-of-00051.safetensors",
|
184 |
+
"model.layers.26.self_attn.k_proj.weight": "model-00016-of-00051.safetensors",
|
185 |
+
"model.layers.26.self_attn.o_proj.weight": "model-00016-of-00051.safetensors",
|
186 |
+
"model.layers.26.self_attn.q_proj.weight": "model-00016-of-00051.safetensors",
|
187 |
+
"model.layers.26.self_attn.v_proj.weight": "model-00016-of-00051.safetensors",
|
188 |
+
"model.layers.27.input_layernorm.weight": "model-00017-of-00051.safetensors",
|
189 |
+
"model.layers.27.mlp.down_proj.weight": "model-00017-of-00051.safetensors",
|
190 |
+
"model.layers.27.mlp.gate_proj.weight": "model-00016-of-00051.safetensors",
|
191 |
+
"model.layers.27.mlp.up_proj.weight": "model-00016-of-00051.safetensors",
|
192 |
+
"model.layers.27.post_attention_layernorm.weight": "model-00017-of-00051.safetensors",
|
193 |
+
"model.layers.27.self_attn.k_proj.weight": "model-00016-of-00051.safetensors",
|
194 |
+
"model.layers.27.self_attn.o_proj.weight": "model-00016-of-00051.safetensors",
|
195 |
+
"model.layers.27.self_attn.q_proj.weight": "model-00016-of-00051.safetensors",
|
196 |
+
"model.layers.27.self_attn.v_proj.weight": "model-00016-of-00051.safetensors",
|
197 |
+
"model.layers.28.input_layernorm.weight": "model-00017-of-00051.safetensors",
|
198 |
+
"model.layers.28.mlp.down_proj.weight": "model-00017-of-00051.safetensors",
|
199 |
+
"model.layers.28.mlp.gate_proj.weight": "model-00017-of-00051.safetensors",
|
200 |
+
"model.layers.28.mlp.up_proj.weight": "model-00017-of-00051.safetensors",
|
201 |
+
"model.layers.28.post_attention_layernorm.weight": "model-00017-of-00051.safetensors",
|
202 |
+
"model.layers.28.self_attn.k_proj.weight": "model-00017-of-00051.safetensors",
|
203 |
+
"model.layers.28.self_attn.o_proj.weight": "model-00017-of-00051.safetensors",
|
204 |
+
"model.layers.28.self_attn.q_proj.weight": "model-00017-of-00051.safetensors",
|
205 |
+
"model.layers.28.self_attn.v_proj.weight": "model-00017-of-00051.safetensors",
|
206 |
+
"model.layers.29.input_layernorm.weight": "model-00018-of-00051.safetensors",
|
207 |
+
"model.layers.29.mlp.down_proj.weight": "model-00018-of-00051.safetensors",
|
208 |
+
"model.layers.29.mlp.gate_proj.weight": "model-00017-of-00051.safetensors",
|
209 |
+
"model.layers.29.mlp.up_proj.weight": "model-00018-of-00051.safetensors",
|
210 |
+
"model.layers.29.post_attention_layernorm.weight": "model-00018-of-00051.safetensors",
|
211 |
+
"model.layers.29.self_attn.k_proj.weight": "model-00017-of-00051.safetensors",
|
212 |
+
"model.layers.29.self_attn.o_proj.weight": "model-00017-of-00051.safetensors",
|
213 |
+
"model.layers.29.self_attn.q_proj.weight": "model-00017-of-00051.safetensors",
|
214 |
+
"model.layers.29.self_attn.v_proj.weight": "model-00017-of-00051.safetensors",
|
215 |
+
"model.layers.3.input_layernorm.weight": "model-00003-of-00051.safetensors",
|
216 |
+
"model.layers.3.mlp.down_proj.weight": "model-00003-of-00051.safetensors",
|
217 |
+
"model.layers.3.mlp.gate_proj.weight": "model-00003-of-00051.safetensors",
|
218 |
+
"model.layers.3.mlp.up_proj.weight": "model-00003-of-00051.safetensors",
|
219 |
+
"model.layers.3.post_attention_layernorm.weight": "model-00003-of-00051.safetensors",
|
220 |
+
"model.layers.3.self_attn.k_proj.weight": "model-00002-of-00051.safetensors",
|
221 |
+
"model.layers.3.self_attn.o_proj.weight": "model-00002-of-00051.safetensors",
|
222 |
+
"model.layers.3.self_attn.q_proj.weight": "model-00002-of-00051.safetensors",
|
223 |
+
"model.layers.3.self_attn.v_proj.weight": "model-00002-of-00051.safetensors",
|
224 |
+
"model.layers.30.input_layernorm.weight": "model-00018-of-00051.safetensors",
|
225 |
+
"model.layers.30.mlp.down_proj.weight": "model-00018-of-00051.safetensors",
|
226 |
+
"model.layers.30.mlp.gate_proj.weight": "model-00018-of-00051.safetensors",
|
227 |
+
"model.layers.30.mlp.up_proj.weight": "model-00018-of-00051.safetensors",
|
228 |
+
"model.layers.30.post_attention_layernorm.weight": "model-00018-of-00051.safetensors",
|
229 |
+
"model.layers.30.self_attn.k_proj.weight": "model-00018-of-00051.safetensors",
|
230 |
+
"model.layers.30.self_attn.o_proj.weight": "model-00018-of-00051.safetensors",
|
231 |
+
"model.layers.30.self_attn.q_proj.weight": "model-00018-of-00051.safetensors",
|
232 |
+
"model.layers.30.self_attn.v_proj.weight": "model-00018-of-00051.safetensors",
|
233 |
+
"model.layers.31.input_layernorm.weight": "model-00019-of-00051.safetensors",
|
234 |
+
"model.layers.31.mlp.down_proj.weight": "model-00019-of-00051.safetensors",
|
235 |
+
"model.layers.31.mlp.gate_proj.weight": "model-00019-of-00051.safetensors",
|
236 |
+
"model.layers.31.mlp.up_proj.weight": "model-00019-of-00051.safetensors",
|
237 |
+
"model.layers.31.post_attention_layernorm.weight": "model-00019-of-00051.safetensors",
|
238 |
+
"model.layers.31.self_attn.k_proj.weight": "model-00018-of-00051.safetensors",
|
239 |
+
"model.layers.31.self_attn.o_proj.weight": "model-00018-of-00051.safetensors",
|
240 |
+
"model.layers.31.self_attn.q_proj.weight": "model-00018-of-00051.safetensors",
|
241 |
+
"model.layers.31.self_attn.v_proj.weight": "model-00018-of-00051.safetensors",
|
242 |
+
"model.layers.32.input_layernorm.weight": "model-00019-of-00051.safetensors",
|
243 |
+
"model.layers.32.mlp.down_proj.weight": "model-00019-of-00051.safetensors",
|
244 |
+
"model.layers.32.mlp.gate_proj.weight": "model-00019-of-00051.safetensors",
|
245 |
+
"model.layers.32.mlp.up_proj.weight": "model-00019-of-00051.safetensors",
|
246 |
+
"model.layers.32.post_attention_layernorm.weight": "model-00019-of-00051.safetensors",
|
247 |
+
"model.layers.32.self_attn.k_proj.weight": "model-00019-of-00051.safetensors",
|
248 |
+
"model.layers.32.self_attn.o_proj.weight": "model-00019-of-00051.safetensors",
|
249 |
+
"model.layers.32.self_attn.q_proj.weight": "model-00019-of-00051.safetensors",
|
250 |
+
"model.layers.32.self_attn.v_proj.weight": "model-00019-of-00051.safetensors",
|
251 |
+
"model.layers.33.input_layernorm.weight": "model-00020-of-00051.safetensors",
|
252 |
+
"model.layers.33.mlp.down_proj.weight": "model-00020-of-00051.safetensors",
|
253 |
+
"model.layers.33.mlp.gate_proj.weight": "model-00020-of-00051.safetensors",
|
254 |
+
"model.layers.33.mlp.up_proj.weight": "model-00020-of-00051.safetensors",
|
255 |
+
"model.layers.33.post_attention_layernorm.weight": "model-00020-of-00051.safetensors",
|
256 |
+
"model.layers.33.self_attn.k_proj.weight": "model-00020-of-00051.safetensors",
|
257 |
+
"model.layers.33.self_attn.o_proj.weight": "model-00020-of-00051.safetensors",
|
258 |
+
"model.layers.33.self_attn.q_proj.weight": "model-00020-of-00051.safetensors",
|
259 |
+
"model.layers.33.self_attn.v_proj.weight": "model-00020-of-00051.safetensors",
|
260 |
+
"model.layers.34.input_layernorm.weight": "model-00021-of-00051.safetensors",
|
261 |
+
"model.layers.34.mlp.down_proj.weight": "model-00021-of-00051.safetensors",
|
262 |
+
"model.layers.34.mlp.gate_proj.weight": "model-00020-of-00051.safetensors",
|
263 |
+
"model.layers.34.mlp.up_proj.weight": "model-00020-of-00051.safetensors",
|
264 |
+
"model.layers.34.post_attention_layernorm.weight": "model-00021-of-00051.safetensors",
|
265 |
+
"model.layers.34.self_attn.k_proj.weight": "model-00020-of-00051.safetensors",
|
266 |
+
"model.layers.34.self_attn.o_proj.weight": "model-00020-of-00051.safetensors",
|
267 |
+
"model.layers.34.self_attn.q_proj.weight": "model-00020-of-00051.safetensors",
|
268 |
+
"model.layers.34.self_attn.v_proj.weight": "model-00020-of-00051.safetensors",
|
269 |
+
"model.layers.35.input_layernorm.weight": "model-00021-of-00051.safetensors",
|
270 |
+
"model.layers.35.mlp.down_proj.weight": "model-00021-of-00051.safetensors",
|
271 |
+
"model.layers.35.mlp.gate_proj.weight": "model-00021-of-00051.safetensors",
|
272 |
+
"model.layers.35.mlp.up_proj.weight": "model-00021-of-00051.safetensors",
|
273 |
+
"model.layers.35.post_attention_layernorm.weight": "model-00021-of-00051.safetensors",
|
274 |
+
"model.layers.35.self_attn.k_proj.weight": "model-00021-of-00051.safetensors",
|
275 |
+
"model.layers.35.self_attn.o_proj.weight": "model-00021-of-00051.safetensors",
|
276 |
+
"model.layers.35.self_attn.q_proj.weight": "model-00021-of-00051.safetensors",
|
277 |
+
"model.layers.35.self_attn.v_proj.weight": "model-00021-of-00051.safetensors",
|
278 |
+
"model.layers.36.input_layernorm.weight": "model-00022-of-00051.safetensors",
|
279 |
+
"model.layers.36.mlp.down_proj.weight": "model-00022-of-00051.safetensors",
|
280 |
+
"model.layers.36.mlp.gate_proj.weight": "model-00021-of-00051.safetensors",
|
281 |
+
"model.layers.36.mlp.up_proj.weight": "model-00022-of-00051.safetensors",
|
282 |
+
"model.layers.36.post_attention_layernorm.weight": "model-00022-of-00051.safetensors",
|
283 |
+
"model.layers.36.self_attn.k_proj.weight": "model-00021-of-00051.safetensors",
|
284 |
+
"model.layers.36.self_attn.o_proj.weight": "model-00021-of-00051.safetensors",
|
285 |
+
"model.layers.36.self_attn.q_proj.weight": "model-00021-of-00051.safetensors",
|
286 |
+
"model.layers.36.self_attn.v_proj.weight": "model-00021-of-00051.safetensors",
|
287 |
+
"model.layers.37.input_layernorm.weight": "model-00022-of-00051.safetensors",
|
288 |
+
"model.layers.37.mlp.down_proj.weight": "model-00022-of-00051.safetensors",
|
289 |
+
"model.layers.37.mlp.gate_proj.weight": "model-00022-of-00051.safetensors",
|
290 |
+
"model.layers.37.mlp.up_proj.weight": "model-00022-of-00051.safetensors",
|
291 |
+
"model.layers.37.post_attention_layernorm.weight": "model-00022-of-00051.safetensors",
|
292 |
+
"model.layers.37.self_attn.k_proj.weight": "model-00022-of-00051.safetensors",
|
293 |
+
"model.layers.37.self_attn.o_proj.weight": "model-00022-of-00051.safetensors",
|
294 |
+
"model.layers.37.self_attn.q_proj.weight": "model-00022-of-00051.safetensors",
|
295 |
+
"model.layers.37.self_attn.v_proj.weight": "model-00022-of-00051.safetensors",
|
296 |
+
"model.layers.38.input_layernorm.weight": "model-00023-of-00051.safetensors",
|
297 |
+
"model.layers.38.mlp.down_proj.weight": "model-00023-of-00051.safetensors",
|
298 |
+
"model.layers.38.mlp.gate_proj.weight": "model-00023-of-00051.safetensors",
|
299 |
+
"model.layers.38.mlp.up_proj.weight": "model-00023-of-00051.safetensors",
|
300 |
+
"model.layers.38.post_attention_layernorm.weight": "model-00023-of-00051.safetensors",
|
301 |
+
"model.layers.38.self_attn.k_proj.weight": "model-00022-of-00051.safetensors",
|
302 |
+
"model.layers.38.self_attn.o_proj.weight": "model-00022-of-00051.safetensors",
|
303 |
+
"model.layers.38.self_attn.q_proj.weight": "model-00022-of-00051.safetensors",
|
304 |
+
"model.layers.38.self_attn.v_proj.weight": "model-00022-of-00051.safetensors",
|
305 |
+
"model.layers.39.input_layernorm.weight": "model-00023-of-00051.safetensors",
|
306 |
+
"model.layers.39.mlp.down_proj.weight": "model-00023-of-00051.safetensors",
|
307 |
+
"model.layers.39.mlp.gate_proj.weight": "model-00023-of-00051.safetensors",
|
308 |
+
"model.layers.39.mlp.up_proj.weight": "model-00023-of-00051.safetensors",
|
309 |
+
"model.layers.39.post_attention_layernorm.weight": "model-00023-of-00051.safetensors",
|
310 |
+
"model.layers.39.self_attn.k_proj.weight": "model-00023-of-00051.safetensors",
|
311 |
+
"model.layers.39.self_attn.o_proj.weight": "model-00023-of-00051.safetensors",
|
312 |
+
"model.layers.39.self_attn.q_proj.weight": "model-00023-of-00051.safetensors",
|
313 |
+
"model.layers.39.self_attn.v_proj.weight": "model-00023-of-00051.safetensors",
|
314 |
+
"model.layers.4.input_layernorm.weight": "model-00003-of-00051.safetensors",
|
315 |
+
"model.layers.4.mlp.down_proj.weight": "model-00003-of-00051.safetensors",
|
316 |
+
"model.layers.4.mlp.gate_proj.weight": "model-00003-of-00051.safetensors",
|
317 |
+
"model.layers.4.mlp.up_proj.weight": "model-00003-of-00051.safetensors",
|
318 |
+
"model.layers.4.post_attention_layernorm.weight": "model-00003-of-00051.safetensors",
|
319 |
+
"model.layers.4.self_attn.k_proj.weight": "model-00003-of-00051.safetensors",
|
320 |
+
"model.layers.4.self_attn.o_proj.weight": "model-00003-of-00051.safetensors",
|
321 |
+
"model.layers.4.self_attn.q_proj.weight": "model-00003-of-00051.safetensors",
|
322 |
+
"model.layers.4.self_attn.v_proj.weight": "model-00003-of-00051.safetensors",
|
323 |
+
"model.layers.40.input_layernorm.weight": "model-00024-of-00051.safetensors",
|
324 |
+
"model.layers.40.mlp.down_proj.weight": "model-00024-of-00051.safetensors",
|
325 |
+
"model.layers.40.mlp.gate_proj.weight": "model-00024-of-00051.safetensors",
|
326 |
+
"model.layers.40.mlp.up_proj.weight": "model-00024-of-00051.safetensors",
|
327 |
+
"model.layers.40.post_attention_layernorm.weight": "model-00024-of-00051.safetensors",
|
328 |
+
"model.layers.40.self_attn.k_proj.weight": "model-00024-of-00051.safetensors",
|
329 |
+
"model.layers.40.self_attn.o_proj.weight": "model-00024-of-00051.safetensors",
|
330 |
+
"model.layers.40.self_attn.q_proj.weight": "model-00024-of-00051.safetensors",
|
331 |
+
"model.layers.40.self_attn.v_proj.weight": "model-00024-of-00051.safetensors",
|
332 |
+
"model.layers.41.input_layernorm.weight": "model-00025-of-00051.safetensors",
|
333 |
+
"model.layers.41.mlp.down_proj.weight": "model-00025-of-00051.safetensors",
|
334 |
+
"model.layers.41.mlp.gate_proj.weight": "model-00024-of-00051.safetensors",
|
335 |
+
"model.layers.41.mlp.up_proj.weight": "model-00024-of-00051.safetensors",
|
336 |
+
"model.layers.41.post_attention_layernorm.weight": "model-00025-of-00051.safetensors",
|
337 |
+
"model.layers.41.self_attn.k_proj.weight": "model-00024-of-00051.safetensors",
|
338 |
+
"model.layers.41.self_attn.o_proj.weight": "model-00024-of-00051.safetensors",
|
339 |
+
"model.layers.41.self_attn.q_proj.weight": "model-00024-of-00051.safetensors",
|
340 |
+
"model.layers.41.self_attn.v_proj.weight": "model-00024-of-00051.safetensors",
|
341 |
+
"model.layers.42.input_layernorm.weight": "model-00025-of-00051.safetensors",
|
342 |
+
"model.layers.42.mlp.down_proj.weight": "model-00025-of-00051.safetensors",
|
343 |
+
"model.layers.42.mlp.gate_proj.weight": "model-00025-of-00051.safetensors",
|
344 |
+
"model.layers.42.mlp.up_proj.weight": "model-00025-of-00051.safetensors",
|
345 |
+
"model.layers.42.post_attention_layernorm.weight": "model-00025-of-00051.safetensors",
|
346 |
+
"model.layers.42.self_attn.k_proj.weight": "model-00025-of-00051.safetensors",
|
347 |
+
"model.layers.42.self_attn.o_proj.weight": "model-00025-of-00051.safetensors",
|
348 |
+
"model.layers.42.self_attn.q_proj.weight": "model-00025-of-00051.safetensors",
|
349 |
+
"model.layers.42.self_attn.v_proj.weight": "model-00025-of-00051.safetensors",
|
350 |
+
"model.layers.43.input_layernorm.weight": "model-00026-of-00051.safetensors",
|
351 |
+
"model.layers.43.mlp.down_proj.weight": "model-00026-of-00051.safetensors",
|
352 |
+
"model.layers.43.mlp.gate_proj.weight": "model-00025-of-00051.safetensors",
|
353 |
+
"model.layers.43.mlp.up_proj.weight": "model-00026-of-00051.safetensors",
|
354 |
+
"model.layers.43.post_attention_layernorm.weight": "model-00026-of-00051.safetensors",
|
355 |
+
"model.layers.43.self_attn.k_proj.weight": "model-00025-of-00051.safetensors",
|
356 |
+
"model.layers.43.self_attn.o_proj.weight": "model-00025-of-00051.safetensors",
|
357 |
+
"model.layers.43.self_attn.q_proj.weight": "model-00025-of-00051.safetensors",
|
358 |
+
"model.layers.43.self_attn.v_proj.weight": "model-00025-of-00051.safetensors",
|
359 |
+
"model.layers.44.input_layernorm.weight": "model-00026-of-00051.safetensors",
|
360 |
+
"model.layers.44.mlp.down_proj.weight": "model-00026-of-00051.safetensors",
|
361 |
+
"model.layers.44.mlp.gate_proj.weight": "model-00026-of-00051.safetensors",
|
362 |
+
"model.layers.44.mlp.up_proj.weight": "model-00026-of-00051.safetensors",
|
363 |
+
"model.layers.44.post_attention_layernorm.weight": "model-00026-of-00051.safetensors",
|
364 |
+
"model.layers.44.self_attn.k_proj.weight": "model-00026-of-00051.safetensors",
|
365 |
+
"model.layers.44.self_attn.o_proj.weight": "model-00026-of-00051.safetensors",
|
366 |
+
"model.layers.44.self_attn.q_proj.weight": "model-00026-of-00051.safetensors",
|
367 |
+
"model.layers.44.self_attn.v_proj.weight": "model-00026-of-00051.safetensors",
|
368 |
+
"model.layers.45.input_layernorm.weight": "model-00027-of-00051.safetensors",
|
369 |
+
"model.layers.45.mlp.down_proj.weight": "model-00027-of-00051.safetensors",
|
370 |
+
"model.layers.45.mlp.gate_proj.weight": "model-00027-of-00051.safetensors",
|
371 |
+
"model.layers.45.mlp.up_proj.weight": "model-00027-of-00051.safetensors",
|
372 |
+
"model.layers.45.post_attention_layernorm.weight": "model-00027-of-00051.safetensors",
|
373 |
+
"model.layers.45.self_attn.k_proj.weight": "model-00026-of-00051.safetensors",
|
374 |
+
"model.layers.45.self_attn.o_proj.weight": "model-00026-of-00051.safetensors",
|
375 |
+
"model.layers.45.self_attn.q_proj.weight": "model-00026-of-00051.safetensors",
|
376 |
+
"model.layers.45.self_attn.v_proj.weight": "model-00026-of-00051.safetensors",
"model.layers.46.input_layernorm.weight": "model-00027-of-00051.safetensors",
"model.layers.46.mlp.down_proj.weight": "model-00027-of-00051.safetensors",
"model.layers.46.mlp.gate_proj.weight": "model-00027-of-00051.safetensors",
"model.layers.46.mlp.up_proj.weight": "model-00027-of-00051.safetensors",
"model.layers.46.post_attention_layernorm.weight": "model-00027-of-00051.safetensors",
"model.layers.46.self_attn.k_proj.weight": "model-00027-of-00051.safetensors",
"model.layers.46.self_attn.o_proj.weight": "model-00027-of-00051.safetensors",
"model.layers.46.self_attn.q_proj.weight": "model-00027-of-00051.safetensors",
"model.layers.46.self_attn.v_proj.weight": "model-00027-of-00051.safetensors",
"model.layers.47.input_layernorm.weight": "model-00028-of-00051.safetensors",
"model.layers.47.mlp.down_proj.weight": "model-00028-of-00051.safetensors",
"model.layers.47.mlp.gate_proj.weight": "model-00028-of-00051.safetensors",
"model.layers.47.mlp.up_proj.weight": "model-00028-of-00051.safetensors",
"model.layers.47.post_attention_layernorm.weight": "model-00028-of-00051.safetensors",
"model.layers.47.self_attn.k_proj.weight": "model-00028-of-00051.safetensors",
"model.layers.47.self_attn.o_proj.weight": "model-00028-of-00051.safetensors",
"model.layers.47.self_attn.q_proj.weight": "model-00028-of-00051.safetensors",
"model.layers.47.self_attn.v_proj.weight": "model-00028-of-00051.safetensors",
"model.layers.48.input_layernorm.weight": "model-00029-of-00051.safetensors",
"model.layers.48.mlp.down_proj.weight": "model-00029-of-00051.safetensors",
"model.layers.48.mlp.gate_proj.weight": "model-00028-of-00051.safetensors",
"model.layers.48.mlp.up_proj.weight": "model-00028-of-00051.safetensors",
"model.layers.48.post_attention_layernorm.weight": "model-00029-of-00051.safetensors",
"model.layers.48.self_attn.k_proj.weight": "model-00028-of-00051.safetensors",
"model.layers.48.self_attn.o_proj.weight": "model-00028-of-00051.safetensors",
"model.layers.48.self_attn.q_proj.weight": "model-00028-of-00051.safetensors",
"model.layers.48.self_attn.v_proj.weight": "model-00028-of-00051.safetensors",
"model.layers.49.input_layernorm.weight": "model-00029-of-00051.safetensors",
"model.layers.49.mlp.down_proj.weight": "model-00029-of-00051.safetensors",
"model.layers.49.mlp.gate_proj.weight": "model-00029-of-00051.safetensors",
"model.layers.49.mlp.up_proj.weight": "model-00029-of-00051.safetensors",
"model.layers.49.post_attention_layernorm.weight": "model-00029-of-00051.safetensors",
"model.layers.49.self_attn.k_proj.weight": "model-00029-of-00051.safetensors",
"model.layers.49.self_attn.o_proj.weight": "model-00029-of-00051.safetensors",
"model.layers.49.self_attn.q_proj.weight": "model-00029-of-00051.safetensors",
"model.layers.49.self_attn.v_proj.weight": "model-00029-of-00051.safetensors",
"model.layers.5.input_layernorm.weight": "model-00004-of-00051.safetensors",
"model.layers.5.mlp.down_proj.weight": "model-00004-of-00051.safetensors",
"model.layers.5.mlp.gate_proj.weight": "model-00004-of-00051.safetensors",
"model.layers.5.mlp.up_proj.weight": "model-00004-of-00051.safetensors",
"model.layers.5.post_attention_layernorm.weight": "model-00004-of-00051.safetensors",
"model.layers.5.self_attn.k_proj.weight": "model-00004-of-00051.safetensors",
"model.layers.5.self_attn.o_proj.weight": "model-00004-of-00051.safetensors",
"model.layers.5.self_attn.q_proj.weight": "model-00004-of-00051.safetensors",
"model.layers.5.self_attn.v_proj.weight": "model-00004-of-00051.safetensors",
"model.layers.50.input_layernorm.weight": "model-00030-of-00051.safetensors",
"model.layers.50.mlp.down_proj.weight": "model-00030-of-00051.safetensors",
"model.layers.50.mlp.gate_proj.weight": "model-00029-of-00051.safetensors",
"model.layers.50.mlp.up_proj.weight": "model-00030-of-00051.safetensors",
"model.layers.50.post_attention_layernorm.weight": "model-00030-of-00051.safetensors",
"model.layers.50.self_attn.k_proj.weight": "model-00029-of-00051.safetensors",
"model.layers.50.self_attn.o_proj.weight": "model-00029-of-00051.safetensors",
"model.layers.50.self_attn.q_proj.weight": "model-00029-of-00051.safetensors",
"model.layers.50.self_attn.v_proj.weight": "model-00029-of-00051.safetensors",
"model.layers.51.input_layernorm.weight": "model-00030-of-00051.safetensors",
"model.layers.51.mlp.down_proj.weight": "model-00030-of-00051.safetensors",
"model.layers.51.mlp.gate_proj.weight": "model-00030-of-00051.safetensors",
"model.layers.51.mlp.up_proj.weight": "model-00030-of-00051.safetensors",
"model.layers.51.post_attention_layernorm.weight": "model-00030-of-00051.safetensors",
"model.layers.51.self_attn.k_proj.weight": "model-00030-of-00051.safetensors",
"model.layers.51.self_attn.o_proj.weight": "model-00030-of-00051.safetensors",
"model.layers.51.self_attn.q_proj.weight": "model-00030-of-00051.safetensors",
"model.layers.51.self_attn.v_proj.weight": "model-00030-of-00051.safetensors",
"model.layers.52.input_layernorm.weight": "model-00031-of-00051.safetensors",
"model.layers.52.mlp.down_proj.weight": "model-00031-of-00051.safetensors",
"model.layers.52.mlp.gate_proj.weight": "model-00031-of-00051.safetensors",
"model.layers.52.mlp.up_proj.weight": "model-00031-of-00051.safetensors",
"model.layers.52.post_attention_layernorm.weight": "model-00031-of-00051.safetensors",
"model.layers.52.self_attn.k_proj.weight": "model-00030-of-00051.safetensors",
"model.layers.52.self_attn.o_proj.weight": "model-00030-of-00051.safetensors",
"model.layers.52.self_attn.q_proj.weight": "model-00030-of-00051.safetensors",
"model.layers.52.self_attn.v_proj.weight": "model-00030-of-00051.safetensors",
"model.layers.53.input_layernorm.weight": "model-00031-of-00051.safetensors",
"model.layers.53.mlp.down_proj.weight": "model-00031-of-00051.safetensors",
"model.layers.53.mlp.gate_proj.weight": "model-00031-of-00051.safetensors",
"model.layers.53.mlp.up_proj.weight": "model-00031-of-00051.safetensors",
"model.layers.53.post_attention_layernorm.weight": "model-00031-of-00051.safetensors",
"model.layers.53.self_attn.k_proj.weight": "model-00031-of-00051.safetensors",
"model.layers.53.self_attn.o_proj.weight": "model-00031-of-00051.safetensors",
"model.layers.53.self_attn.q_proj.weight": "model-00031-of-00051.safetensors",
"model.layers.53.self_attn.v_proj.weight": "model-00031-of-00051.safetensors",
"model.layers.54.input_layernorm.weight": "model-00032-of-00051.safetensors",
"model.layers.54.mlp.down_proj.weight": "model-00032-of-00051.safetensors",
"model.layers.54.mlp.gate_proj.weight": "model-00032-of-00051.safetensors",
"model.layers.54.mlp.up_proj.weight": "model-00032-of-00051.safetensors",
"model.layers.54.post_attention_layernorm.weight": "model-00032-of-00051.safetensors",
"model.layers.54.self_attn.k_proj.weight": "model-00032-of-00051.safetensors",
"model.layers.54.self_attn.o_proj.weight": "model-00032-of-00051.safetensors",
"model.layers.54.self_attn.q_proj.weight": "model-00032-of-00051.safetensors",
"model.layers.54.self_attn.v_proj.weight": "model-00032-of-00051.safetensors",
"model.layers.55.input_layernorm.weight": "model-00033-of-00051.safetensors",
"model.layers.55.mlp.down_proj.weight": "model-00033-of-00051.safetensors",
"model.layers.55.mlp.gate_proj.weight": "model-00032-of-00051.safetensors",
"model.layers.55.mlp.up_proj.weight": "model-00032-of-00051.safetensors",
"model.layers.55.post_attention_layernorm.weight": "model-00033-of-00051.safetensors",
"model.layers.55.self_attn.k_proj.weight": "model-00032-of-00051.safetensors",
"model.layers.55.self_attn.o_proj.weight": "model-00032-of-00051.safetensors",
"model.layers.55.self_attn.q_proj.weight": "model-00032-of-00051.safetensors",
"model.layers.55.self_attn.v_proj.weight": "model-00032-of-00051.safetensors",
"model.layers.56.input_layernorm.weight": "model-00033-of-00051.safetensors",
"model.layers.56.mlp.down_proj.weight": "model-00033-of-00051.safetensors",
"model.layers.56.mlp.gate_proj.weight": "model-00033-of-00051.safetensors",
"model.layers.56.mlp.up_proj.weight": "model-00033-of-00051.safetensors",
"model.layers.56.post_attention_layernorm.weight": "model-00033-of-00051.safetensors",
"model.layers.56.self_attn.k_proj.weight": "model-00033-of-00051.safetensors",
"model.layers.56.self_attn.o_proj.weight": "model-00033-of-00051.safetensors",
"model.layers.56.self_attn.q_proj.weight": "model-00033-of-00051.safetensors",
"model.layers.56.self_attn.v_proj.weight": "model-00033-of-00051.safetensors",
"model.layers.57.input_layernorm.weight": "model-00034-of-00051.safetensors",
"model.layers.57.mlp.down_proj.weight": "model-00034-of-00051.safetensors",
"model.layers.57.mlp.gate_proj.weight": "model-00033-of-00051.safetensors",
"model.layers.57.mlp.up_proj.weight": "model-00034-of-00051.safetensors",
"model.layers.57.post_attention_layernorm.weight": "model-00034-of-00051.safetensors",
"model.layers.57.self_attn.k_proj.weight": "model-00033-of-00051.safetensors",
"model.layers.57.self_attn.o_proj.weight": "model-00033-of-00051.safetensors",
"model.layers.57.self_attn.q_proj.weight": "model-00033-of-00051.safetensors",
"model.layers.57.self_attn.v_proj.weight": "model-00033-of-00051.safetensors",
"model.layers.58.input_layernorm.weight": "model-00034-of-00051.safetensors",
"model.layers.58.mlp.down_proj.weight": "model-00034-of-00051.safetensors",
"model.layers.58.mlp.gate_proj.weight": "model-00034-of-00051.safetensors",
"model.layers.58.mlp.up_proj.weight": "model-00034-of-00051.safetensors",
"model.layers.58.post_attention_layernorm.weight": "model-00034-of-00051.safetensors",
"model.layers.58.self_attn.k_proj.weight": "model-00034-of-00051.safetensors",
"model.layers.58.self_attn.o_proj.weight": "model-00034-of-00051.safetensors",
"model.layers.58.self_attn.q_proj.weight": "model-00034-of-00051.safetensors",
"model.layers.58.self_attn.v_proj.weight": "model-00034-of-00051.safetensors",
"model.layers.59.input_layernorm.weight": "model-00035-of-00051.safetensors",
"model.layers.59.mlp.down_proj.weight": "model-00035-of-00051.safetensors",
"model.layers.59.mlp.gate_proj.weight": "model-00035-of-00051.safetensors",
"model.layers.59.mlp.up_proj.weight": "model-00035-of-00051.safetensors",
"model.layers.59.post_attention_layernorm.weight": "model-00035-of-00051.safetensors",
"model.layers.59.self_attn.k_proj.weight": "model-00034-of-00051.safetensors",
"model.layers.59.self_attn.o_proj.weight": "model-00034-of-00051.safetensors",
"model.layers.59.self_attn.q_proj.weight": "model-00034-of-00051.safetensors",
"model.layers.59.self_attn.v_proj.weight": "model-00034-of-00051.safetensors",
"model.layers.6.input_layernorm.weight": "model-00005-of-00051.safetensors",
"model.layers.6.mlp.down_proj.weight": "model-00005-of-00051.safetensors",
"model.layers.6.mlp.gate_proj.weight": "model-00004-of-00051.safetensors",
"model.layers.6.mlp.up_proj.weight": "model-00004-of-00051.safetensors",
"model.layers.6.post_attention_layernorm.weight": "model-00005-of-00051.safetensors",
"model.layers.6.self_attn.k_proj.weight": "model-00004-of-00051.safetensors",
"model.layers.6.self_attn.o_proj.weight": "model-00004-of-00051.safetensors",
"model.layers.6.self_attn.q_proj.weight": "model-00004-of-00051.safetensors",
"model.layers.6.self_attn.v_proj.weight": "model-00004-of-00051.safetensors",
"model.layers.60.input_layernorm.weight": "model-00035-of-00051.safetensors",
"model.layers.60.mlp.down_proj.weight": "model-00035-of-00051.safetensors",
"model.layers.60.mlp.gate_proj.weight": "model-00035-of-00051.safetensors",
"model.layers.60.mlp.up_proj.weight": "model-00035-of-00051.safetensors",
"model.layers.60.post_attention_layernorm.weight": "model-00035-of-00051.safetensors",
"model.layers.60.self_attn.k_proj.weight": "model-00035-of-00051.safetensors",
"model.layers.60.self_attn.o_proj.weight": "model-00035-of-00051.safetensors",
"model.layers.60.self_attn.q_proj.weight": "model-00035-of-00051.safetensors",
"model.layers.60.self_attn.v_proj.weight": "model-00035-of-00051.safetensors",
"model.layers.61.input_layernorm.weight": "model-00036-of-00051.safetensors",
"model.layers.61.mlp.down_proj.weight": "model-00036-of-00051.safetensors",
"model.layers.61.mlp.gate_proj.weight": "model-00036-of-00051.safetensors",
"model.layers.61.mlp.up_proj.weight": "model-00036-of-00051.safetensors",
"model.layers.61.post_attention_layernorm.weight": "model-00036-of-00051.safetensors",
"model.layers.61.self_attn.k_proj.weight": "model-00036-of-00051.safetensors",
"model.layers.61.self_attn.o_proj.weight": "model-00036-of-00051.safetensors",
"model.layers.61.self_attn.q_proj.weight": "model-00036-of-00051.safetensors",
"model.layers.61.self_attn.v_proj.weight": "model-00036-of-00051.safetensors",
"model.layers.62.input_layernorm.weight": "model-00037-of-00051.safetensors",
"model.layers.62.mlp.down_proj.weight": "model-00037-of-00051.safetensors",
"model.layers.62.mlp.gate_proj.weight": "model-00036-of-00051.safetensors",
"model.layers.62.mlp.up_proj.weight": "model-00036-of-00051.safetensors",
"model.layers.62.post_attention_layernorm.weight": "model-00037-of-00051.safetensors",
"model.layers.62.self_attn.k_proj.weight": "model-00036-of-00051.safetensors",
"model.layers.62.self_attn.o_proj.weight": "model-00036-of-00051.safetensors",
"model.layers.62.self_attn.q_proj.weight": "model-00036-of-00051.safetensors",
"model.layers.62.self_attn.v_proj.weight": "model-00036-of-00051.safetensors",
"model.layers.63.input_layernorm.weight": "model-00037-of-00051.safetensors",
"model.layers.63.mlp.down_proj.weight": "model-00037-of-00051.safetensors",
"model.layers.63.mlp.gate_proj.weight": "model-00037-of-00051.safetensors",
"model.layers.63.mlp.up_proj.weight": "model-00037-of-00051.safetensors",
"model.layers.63.post_attention_layernorm.weight": "model-00037-of-00051.safetensors",
"model.layers.63.self_attn.k_proj.weight": "model-00037-of-00051.safetensors",
"model.layers.63.self_attn.o_proj.weight": "model-00037-of-00051.safetensors",
"model.layers.63.self_attn.q_proj.weight": "model-00037-of-00051.safetensors",
"model.layers.63.self_attn.v_proj.weight": "model-00037-of-00051.safetensors",
"model.layers.64.input_layernorm.weight": "model-00038-of-00051.safetensors",
"model.layers.64.mlp.down_proj.weight": "model-00038-of-00051.safetensors",
"model.layers.64.mlp.gate_proj.weight": "model-00037-of-00051.safetensors",
"model.layers.64.mlp.up_proj.weight": "model-00038-of-00051.safetensors",
"model.layers.64.post_attention_layernorm.weight": "model-00038-of-00051.safetensors",
"model.layers.64.self_attn.k_proj.weight": "model-00037-of-00051.safetensors",
"model.layers.64.self_attn.o_proj.weight": "model-00037-of-00051.safetensors",
"model.layers.64.self_attn.q_proj.weight": "model-00037-of-00051.safetensors",
"model.layers.64.self_attn.v_proj.weight": "model-00037-of-00051.safetensors",
"model.layers.65.input_layernorm.weight": "model-00038-of-00051.safetensors",
"model.layers.65.mlp.down_proj.weight": "model-00038-of-00051.safetensors",
"model.layers.65.mlp.gate_proj.weight": "model-00038-of-00051.safetensors",
"model.layers.65.mlp.up_proj.weight": "model-00038-of-00051.safetensors",
"model.layers.65.post_attention_layernorm.weight": "model-00038-of-00051.safetensors",
"model.layers.65.self_attn.k_proj.weight": "model-00038-of-00051.safetensors",
"model.layers.65.self_attn.o_proj.weight": "model-00038-of-00051.safetensors",
"model.layers.65.self_attn.q_proj.weight": "model-00038-of-00051.safetensors",
"model.layers.65.self_attn.v_proj.weight": "model-00038-of-00051.safetensors",
"model.layers.66.input_layernorm.weight": "model-00039-of-00051.safetensors",
"model.layers.66.mlp.down_proj.weight": "model-00039-of-00051.safetensors",
"model.layers.66.mlp.gate_proj.weight": "model-00039-of-00051.safetensors",
"model.layers.66.mlp.up_proj.weight": "model-00039-of-00051.safetensors",
"model.layers.66.post_attention_layernorm.weight": "model-00039-of-00051.safetensors",
"model.layers.66.self_attn.k_proj.weight": "model-00038-of-00051.safetensors",
"model.layers.66.self_attn.o_proj.weight": "model-00038-of-00051.safetensors",
"model.layers.66.self_attn.q_proj.weight": "model-00038-of-00051.safetensors",
"model.layers.66.self_attn.v_proj.weight": "model-00038-of-00051.safetensors",
"model.layers.67.input_layernorm.weight": "model-00039-of-00051.safetensors",
"model.layers.67.mlp.down_proj.weight": "model-00039-of-00051.safetensors",
"model.layers.67.mlp.gate_proj.weight": "model-00039-of-00051.safetensors",
"model.layers.67.mlp.up_proj.weight": "model-00039-of-00051.safetensors",
"model.layers.67.post_attention_layernorm.weight": "model-00039-of-00051.safetensors",
"model.layers.67.self_attn.k_proj.weight": "model-00039-of-00051.safetensors",
"model.layers.67.self_attn.o_proj.weight": "model-00039-of-00051.safetensors",
"model.layers.67.self_attn.q_proj.weight": "model-00039-of-00051.safetensors",
"model.layers.67.self_attn.v_proj.weight": "model-00039-of-00051.safetensors",
"model.layers.68.input_layernorm.weight": "model-00040-of-00051.safetensors",
"model.layers.68.mlp.down_proj.weight": "model-00040-of-00051.safetensors",
"model.layers.68.mlp.gate_proj.weight": "model-00040-of-00051.safetensors",
"model.layers.68.mlp.up_proj.weight": "model-00040-of-00051.safetensors",
"model.layers.68.post_attention_layernorm.weight": "model-00040-of-00051.safetensors",
"model.layers.68.self_attn.k_proj.weight": "model-00040-of-00051.safetensors",
"model.layers.68.self_attn.o_proj.weight": "model-00040-of-00051.safetensors",
"model.layers.68.self_attn.q_proj.weight": "model-00040-of-00051.safetensors",
"model.layers.68.self_attn.v_proj.weight": "model-00040-of-00051.safetensors",
"model.layers.69.input_layernorm.weight": "model-00041-of-00051.safetensors",
"model.layers.69.mlp.down_proj.weight": "model-00041-of-00051.safetensors",
"model.layers.69.mlp.gate_proj.weight": "model-00040-of-00051.safetensors",
"model.layers.69.mlp.up_proj.weight": "model-00040-of-00051.safetensors",
"model.layers.69.post_attention_layernorm.weight": "model-00041-of-00051.safetensors",
"model.layers.69.self_attn.k_proj.weight": "model-00040-of-00051.safetensors",
"model.layers.69.self_attn.o_proj.weight": "model-00040-of-00051.safetensors",
"model.layers.69.self_attn.q_proj.weight": "model-00040-of-00051.safetensors",
"model.layers.69.self_attn.v_proj.weight": "model-00040-of-00051.safetensors",
"model.layers.7.input_layernorm.weight": "model-00005-of-00051.safetensors",
"model.layers.7.mlp.down_proj.weight": "model-00005-of-00051.safetensors",
"model.layers.7.mlp.gate_proj.weight": "model-00005-of-00051.safetensors",
"model.layers.7.mlp.up_proj.weight": "model-00005-of-00051.safetensors",
"model.layers.7.post_attention_layernorm.weight": "model-00005-of-00051.safetensors",
"model.layers.7.self_attn.k_proj.weight": "model-00005-of-00051.safetensors",
"model.layers.7.self_attn.o_proj.weight": "model-00005-of-00051.safetensors",
"model.layers.7.self_attn.q_proj.weight": "model-00005-of-00051.safetensors",
"model.layers.7.self_attn.v_proj.weight": "model-00005-of-00051.safetensors",
"model.layers.70.input_layernorm.weight": "model-00041-of-00051.safetensors",
"model.layers.70.mlp.down_proj.weight": "model-00041-of-00051.safetensors",
"model.layers.70.mlp.gate_proj.weight": "model-00041-of-00051.safetensors",
"model.layers.70.mlp.up_proj.weight": "model-00041-of-00051.safetensors",
"model.layers.70.post_attention_layernorm.weight": "model-00041-of-00051.safetensors",
"model.layers.70.self_attn.k_proj.weight": "model-00041-of-00051.safetensors",
"model.layers.70.self_attn.o_proj.weight": "model-00041-of-00051.safetensors",
"model.layers.70.self_attn.q_proj.weight": "model-00041-of-00051.safetensors",
"model.layers.70.self_attn.v_proj.weight": "model-00041-of-00051.safetensors",
"model.layers.71.input_layernorm.weight": "model-00042-of-00051.safetensors",
"model.layers.71.mlp.down_proj.weight": "model-00042-of-00051.safetensors",
"model.layers.71.mlp.gate_proj.weight": "model-00041-of-00051.safetensors",
"model.layers.71.mlp.up_proj.weight": "model-00042-of-00051.safetensors",
"model.layers.71.post_attention_layernorm.weight": "model-00042-of-00051.safetensors",
"model.layers.71.self_attn.k_proj.weight": "model-00041-of-00051.safetensors",
"model.layers.71.self_attn.o_proj.weight": "model-00041-of-00051.safetensors",
"model.layers.71.self_attn.q_proj.weight": "model-00041-of-00051.safetensors",
"model.layers.71.self_attn.v_proj.weight": "model-00041-of-00051.safetensors",
"model.layers.72.input_layernorm.weight": "model-00042-of-00051.safetensors",
"model.layers.72.mlp.down_proj.weight": "model-00042-of-00051.safetensors",
"model.layers.72.mlp.gate_proj.weight": "model-00042-of-00051.safetensors",
"model.layers.72.mlp.up_proj.weight": "model-00042-of-00051.safetensors",
"model.layers.72.post_attention_layernorm.weight": "model-00042-of-00051.safetensors",
"model.layers.72.self_attn.k_proj.weight": "model-00042-of-00051.safetensors",
"model.layers.72.self_attn.o_proj.weight": "model-00042-of-00051.safetensors",
"model.layers.72.self_attn.q_proj.weight": "model-00042-of-00051.safetensors",
"model.layers.72.self_attn.v_proj.weight": "model-00042-of-00051.safetensors",
"model.layers.73.input_layernorm.weight": "model-00043-of-00051.safetensors",
"model.layers.73.mlp.down_proj.weight": "model-00043-of-00051.safetensors",
"model.layers.73.mlp.gate_proj.weight": "model-00043-of-00051.safetensors",
"model.layers.73.mlp.up_proj.weight": "model-00043-of-00051.safetensors",
"model.layers.73.post_attention_layernorm.weight": "model-00043-of-00051.safetensors",
"model.layers.73.self_attn.k_proj.weight": "model-00042-of-00051.safetensors",
"model.layers.73.self_attn.o_proj.weight": "model-00042-of-00051.safetensors",
"model.layers.73.self_attn.q_proj.weight": "model-00042-of-00051.safetensors",
"model.layers.73.self_attn.v_proj.weight": "model-00042-of-00051.safetensors",
"model.layers.74.input_layernorm.weight": "model-00043-of-00051.safetensors",
"model.layers.74.mlp.down_proj.weight": "model-00043-of-00051.safetensors",
"model.layers.74.mlp.gate_proj.weight": "model-00043-of-00051.safetensors",
"model.layers.74.mlp.up_proj.weight": "model-00043-of-00051.safetensors",
"model.layers.74.post_attention_layernorm.weight": "model-00043-of-00051.safetensors",
"model.layers.74.self_attn.k_proj.weight": "model-00043-of-00051.safetensors",
"model.layers.74.self_attn.o_proj.weight": "model-00043-of-00051.safetensors",
"model.layers.74.self_attn.q_proj.weight": "model-00043-of-00051.safetensors",
"model.layers.74.self_attn.v_proj.weight": "model-00043-of-00051.safetensors",
"model.layers.75.input_layernorm.weight": "model-00044-of-00051.safetensors",
"model.layers.75.mlp.down_proj.weight": "model-00044-of-00051.safetensors",
"model.layers.75.mlp.gate_proj.weight": "model-00044-of-00051.safetensors",
"model.layers.75.mlp.up_proj.weight": "model-00044-of-00051.safetensors",
"model.layers.75.post_attention_layernorm.weight": "model-00044-of-00051.safetensors",
"model.layers.75.self_attn.k_proj.weight": "model-00044-of-00051.safetensors",
"model.layers.75.self_attn.o_proj.weight": "model-00044-of-00051.safetensors",
"model.layers.75.self_attn.q_proj.weight": "model-00044-of-00051.safetensors",
"model.layers.75.self_attn.v_proj.weight": "model-00044-of-00051.safetensors",
"model.layers.76.input_layernorm.weight": "model-00045-of-00051.safetensors",
"model.layers.76.mlp.down_proj.weight": "model-00045-of-00051.safetensors",
"model.layers.76.mlp.gate_proj.weight": "model-00044-of-00051.safetensors",
"model.layers.76.mlp.up_proj.weight": "model-00044-of-00051.safetensors",
"model.layers.76.post_attention_layernorm.weight": "model-00045-of-00051.safetensors",
"model.layers.76.self_attn.k_proj.weight": "model-00044-of-00051.safetensors",
"model.layers.76.self_attn.o_proj.weight": "model-00044-of-00051.safetensors",
"model.layers.76.self_attn.q_proj.weight": "model-00044-of-00051.safetensors",
"model.layers.76.self_attn.v_proj.weight": "model-00044-of-00051.safetensors",
"model.layers.77.input_layernorm.weight": "model-00045-of-00051.safetensors",
"model.layers.77.mlp.down_proj.weight": "model-00045-of-00051.safetensors",
"model.layers.77.mlp.gate_proj.weight": "model-00045-of-00051.safetensors",
"model.layers.77.mlp.up_proj.weight": "model-00045-of-00051.safetensors",
"model.layers.77.post_attention_layernorm.weight": "model-00045-of-00051.safetensors",
"model.layers.77.self_attn.k_proj.weight": "model-00045-of-00051.safetensors",
"model.layers.77.self_attn.o_proj.weight": "model-00045-of-00051.safetensors",
"model.layers.77.self_attn.q_proj.weight": "model-00045-of-00051.safetensors",
"model.layers.77.self_attn.v_proj.weight": "model-00045-of-00051.safetensors",
"model.layers.78.input_layernorm.weight": "model-00046-of-00051.safetensors",
"model.layers.78.mlp.down_proj.weight": "model-00046-of-00051.safetensors",
"model.layers.78.mlp.gate_proj.weight": "model-00045-of-00051.safetensors",
"model.layers.78.mlp.up_proj.weight": "model-00046-of-00051.safetensors",
"model.layers.78.post_attention_layernorm.weight": "model-00046-of-00051.safetensors",
"model.layers.78.self_attn.k_proj.weight": "model-00045-of-00051.safetensors",
"model.layers.78.self_attn.o_proj.weight": "model-00045-of-00051.safetensors",
"model.layers.78.self_attn.q_proj.weight": "model-00045-of-00051.safetensors",
"model.layers.78.self_attn.v_proj.weight": "model-00045-of-00051.safetensors",
"model.layers.79.input_layernorm.weight": "model-00046-of-00051.safetensors",
"model.layers.79.mlp.down_proj.weight": "model-00046-of-00051.safetensors",
"model.layers.79.mlp.gate_proj.weight": "model-00046-of-00051.safetensors",
"model.layers.79.mlp.up_proj.weight": "model-00046-of-00051.safetensors",
"model.layers.79.post_attention_layernorm.weight": "model-00046-of-00051.safetensors",
"model.layers.79.self_attn.k_proj.weight": "model-00046-of-00051.safetensors",
"model.layers.79.self_attn.o_proj.weight": "model-00046-of-00051.safetensors",
"model.layers.79.self_attn.q_proj.weight": "model-00046-of-00051.safetensors",
"model.layers.79.self_attn.v_proj.weight": "model-00046-of-00051.safetensors",
"model.layers.8.input_layernorm.weight": "model-00006-of-00051.safetensors",
"model.layers.8.mlp.down_proj.weight": "model-00006-of-00051.safetensors",
"model.layers.8.mlp.gate_proj.weight": "model-00005-of-00051.safetensors",
"model.layers.8.mlp.up_proj.weight": "model-00006-of-00051.safetensors",
"model.layers.8.post_attention_layernorm.weight": "model-00006-of-00051.safetensors",
"model.layers.8.self_attn.k_proj.weight": "model-00005-of-00051.safetensors",
"model.layers.8.self_attn.o_proj.weight": "model-00005-of-00051.safetensors",
"model.layers.8.self_attn.q_proj.weight": "model-00005-of-00051.safetensors",
"model.layers.8.self_attn.v_proj.weight": "model-00005-of-00051.safetensors",
"model.layers.80.input_layernorm.weight": "model-00047-of-00051.safetensors",
"model.layers.80.mlp.down_proj.weight": "model-00047-of-00051.safetensors",
"model.layers.80.mlp.gate_proj.weight": "model-00047-of-00051.safetensors",
"model.layers.80.mlp.up_proj.weight": "model-00047-of-00051.safetensors",
"model.layers.80.post_attention_layernorm.weight": "model-00047-of-00051.safetensors",
"model.layers.80.self_attn.k_proj.weight": "model-00046-of-00051.safetensors",
"model.layers.80.self_attn.o_proj.weight": "model-00046-of-00051.safetensors",
"model.layers.80.self_attn.q_proj.weight": "model-00046-of-00051.safetensors",
"model.layers.80.self_attn.v_proj.weight": "model-00046-of-00051.safetensors",
"model.layers.81.input_layernorm.weight": "model-00047-of-00051.safetensors",
"model.layers.81.mlp.down_proj.weight": "model-00047-of-00051.safetensors",
"model.layers.81.mlp.gate_proj.weight": "model-00047-of-00051.safetensors",
"model.layers.81.mlp.up_proj.weight": "model-00047-of-00051.safetensors",
"model.layers.81.post_attention_layernorm.weight": "model-00047-of-00051.safetensors",
"model.layers.81.self_attn.k_proj.weight": "model-00047-of-00051.safetensors",
"model.layers.81.self_attn.o_proj.weight": "model-00047-of-00051.safetensors",
"model.layers.81.self_attn.q_proj.weight": "model-00047-of-00051.safetensors",
"model.layers.81.self_attn.v_proj.weight": "model-00047-of-00051.safetensors",
"model.layers.82.input_layernorm.weight": "model-00048-of-00051.safetensors",
"model.layers.82.mlp.down_proj.weight": "model-00048-of-00051.safetensors",
"model.layers.82.mlp.gate_proj.weight": "model-00048-of-00051.safetensors",
"model.layers.82.mlp.up_proj.weight": "model-00048-of-00051.safetensors",
"model.layers.82.post_attention_layernorm.weight": "model-00048-of-00051.safetensors",
"model.layers.82.self_attn.k_proj.weight": "model-00048-of-00051.safetensors",
"model.layers.82.self_attn.o_proj.weight": "model-00048-of-00051.safetensors",
"model.layers.82.self_attn.q_proj.weight": "model-00048-of-00051.safetensors",
"model.layers.82.self_attn.v_proj.weight": "model-00048-of-00051.safetensors",
|
746 |
+
"model.layers.83.input_layernorm.weight": "model-00049-of-00051.safetensors",
|
747 |
+
"model.layers.83.mlp.down_proj.weight": "model-00049-of-00051.safetensors",
|
748 |
+
"model.layers.83.mlp.gate_proj.weight": "model-00048-of-00051.safetensors",
|
749 |
+
"model.layers.83.mlp.up_proj.weight": "model-00048-of-00051.safetensors",
|
750 |
+
"model.layers.83.post_attention_layernorm.weight": "model-00049-of-00051.safetensors",
|
751 |
+
"model.layers.83.self_attn.k_proj.weight": "model-00048-of-00051.safetensors",
|
752 |
+
"model.layers.83.self_attn.o_proj.weight": "model-00048-of-00051.safetensors",
|
753 |
+
"model.layers.83.self_attn.q_proj.weight": "model-00048-of-00051.safetensors",
|
754 |
+
"model.layers.83.self_attn.v_proj.weight": "model-00048-of-00051.safetensors",
|
755 |
+
"model.layers.84.input_layernorm.weight": "model-00049-of-00051.safetensors",
|
756 |
+
"model.layers.84.mlp.down_proj.weight": "model-00049-of-00051.safetensors",
|
757 |
+
"model.layers.84.mlp.gate_proj.weight": "model-00049-of-00051.safetensors",
|
758 |
+
"model.layers.84.mlp.up_proj.weight": "model-00049-of-00051.safetensors",
|
759 |
+
"model.layers.84.post_attention_layernorm.weight": "model-00049-of-00051.safetensors",
|
760 |
+
"model.layers.84.self_attn.k_proj.weight": "model-00049-of-00051.safetensors",
|
761 |
+
"model.layers.84.self_attn.o_proj.weight": "model-00049-of-00051.safetensors",
|
762 |
+
"model.layers.84.self_attn.q_proj.weight": "model-00049-of-00051.safetensors",
|
763 |
+
"model.layers.84.self_attn.v_proj.weight": "model-00049-of-00051.safetensors",
|
764 |
+
"model.layers.85.input_layernorm.weight": "model-00050-of-00051.safetensors",
|
765 |
+
"model.layers.85.mlp.down_proj.weight": "model-00050-of-00051.safetensors",
|
766 |
+
"model.layers.85.mlp.gate_proj.weight": "model-00049-of-00051.safetensors",
|
767 |
+
"model.layers.85.mlp.up_proj.weight": "model-00050-of-00051.safetensors",
|
768 |
+
"model.layers.85.post_attention_layernorm.weight": "model-00050-of-00051.safetensors",
|
769 |
+
"model.layers.85.self_attn.k_proj.weight": "model-00049-of-00051.safetensors",
|
770 |
+
"model.layers.85.self_attn.o_proj.weight": "model-00049-of-00051.safetensors",
|
771 |
+
"model.layers.85.self_attn.q_proj.weight": "model-00049-of-00051.safetensors",
|
772 |
+
"model.layers.85.self_attn.v_proj.weight": "model-00049-of-00051.safetensors",
|
773 |
+
"model.layers.86.input_layernorm.weight": "model-00050-of-00051.safetensors",
|
774 |
+
"model.layers.86.mlp.down_proj.weight": "model-00050-of-00051.safetensors",
|
775 |
+
"model.layers.86.mlp.gate_proj.weight": "model-00050-of-00051.safetensors",
|
776 |
+
"model.layers.86.mlp.up_proj.weight": "model-00050-of-00051.safetensors",
|
777 |
+
"model.layers.86.post_attention_layernorm.weight": "model-00050-of-00051.safetensors",
|
778 |
+
"model.layers.86.self_attn.k_proj.weight": "model-00050-of-00051.safetensors",
|
779 |
+
"model.layers.86.self_attn.o_proj.weight": "model-00050-of-00051.safetensors",
|
780 |
+
"model.layers.86.self_attn.q_proj.weight": "model-00050-of-00051.safetensors",
|
781 |
+
"model.layers.86.self_attn.v_proj.weight": "model-00050-of-00051.safetensors",
|
782 |
+
"model.layers.87.input_layernorm.weight": "model-00051-of-00051.safetensors",
|
783 |
+
"model.layers.87.mlp.down_proj.weight": "model-00051-of-00051.safetensors",
|
784 |
+
"model.layers.87.mlp.gate_proj.weight": "model-00051-of-00051.safetensors",
|
785 |
+
"model.layers.87.mlp.up_proj.weight": "model-00051-of-00051.safetensors",
|
786 |
+
"model.layers.87.post_attention_layernorm.weight": "model-00051-of-00051.safetensors",
|
787 |
+
"model.layers.87.self_attn.k_proj.weight": "model-00050-of-00051.safetensors",
|
788 |
+
"model.layers.87.self_attn.o_proj.weight": "model-00050-of-00051.safetensors",
|
789 |
+
"model.layers.87.self_attn.q_proj.weight": "model-00050-of-00051.safetensors",
|
790 |
+
"model.layers.87.self_attn.v_proj.weight": "model-00050-of-00051.safetensors",
|
791 |
+
"model.layers.9.input_layernorm.weight": "model-00006-of-00051.safetensors",
|
792 |
+
"model.layers.9.mlp.down_proj.weight": "model-00006-of-00051.safetensors",
|
793 |
+
"model.layers.9.mlp.gate_proj.weight": "model-00006-of-00051.safetensors",
|
794 |
+
"model.layers.9.mlp.up_proj.weight": "model-00006-of-00051.safetensors",
|
795 |
+
"model.layers.9.post_attention_layernorm.weight": "model-00006-of-00051.safetensors",
|
796 |
+
"model.layers.9.self_attn.k_proj.weight": "model-00006-of-00051.safetensors",
|
797 |
+
"model.layers.9.self_attn.o_proj.weight": "model-00006-of-00051.safetensors",
|
798 |
+
"model.layers.9.self_attn.q_proj.weight": "model-00006-of-00051.safetensors",
|
799 |
+
"model.layers.9.self_attn.v_proj.weight": "model-00006-of-00051.safetensors",
|
800 |
+
"model.norm.weight": "model-00051-of-00051.safetensors"
|
801 |
+
}
|
802 |
+
}
|
output-00001-of-00006.safetensors
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:904ebc593a37172a1859ba3ab4f1600f7cfd239059abf5340d42f8b5bb344e38
+size 8542303534
output-00002-of-00006.safetensors
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:ec9329d43017108b0f356fa95821ffb6905100bf1474c59601ccefa2d8efa7d9
+size 8444257302
output-00003-of-00006.safetensors
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:d8b6888f7e80666294ec77a48ddd01f5678fa8be0166a4e0e1ac2a6849cc0e47
+size 8477914022
output-00004-of-00006.safetensors
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:93aace360a0720cbffcfe11818f5ed7ca566853b20e47b65091f73e384da5964
+size 8571883530
output-00005-of-00006.safetensors
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:28912fa047378865f68eb97882411891bcbf8688a059713e54bf211a1cf354ea
+size 8506114580
output-00006-of-00006.safetensors
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:81b778f7aa381cfc1344f832bf850cedfe24dd56dc8711842750eb11e75cf3a4
+size 1967327076
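Each `output-*.safetensors` entry above is a Git LFS pointer rather than the weights themselves: three `key value` lines giving the spec version, a SHA-256 object ID, and the size in bytes. A minimal sketch of how a downloaded shard could be checked against its pointer is shown here. The helper names `parse_lfs_pointer` and `matches_pointer` are hypothetical and are not part of this repository.

```python
import hashlib
from pathlib import Path


def parse_lfs_pointer(text: str) -> dict:
    """Parse the 'key value' lines of a Git LFS pointer file (hypothetical helper)."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(path: Path, pointer: dict, chunk_size: int = 1 << 20) -> bool:
    """Return True if the file's byte size and hash both equal the pointer's."""
    # Cheap check first: a size mismatch avoids hashing a multi-gigabyte file.
    if path.stat().st_size != pointer["size"]:
        return False
    hasher = hashlib.new(pointer["algo"])
    with path.open("rb") as f:
        # Stream in chunks so the whole shard never has to sit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest() == pointer["digest"]
```

Checking the size before hashing is a design choice for speed only; the hash comparison is what actually establishes that the contents match.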
params.json
ADDED
@@ -0,0 +1,11 @@
+{
+    "dim": 12288,
+    "n_layers": 88,
+    "head_dim": 128,
+    "hidden_dim": 28672,
+    "n_heads": 96,
+    "n_kv_heads": 8,
+    "norm_eps": 1e-05,
+    "vocab_size": 32768,
+    "rope_theta": 1000000.0
+}
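The values in `params.json` are enough to estimate the size of the unquantized model. The sketch below counts weights under assumptions that are not stated in the file itself: a Llama-style decoder layer (the `q_proj`/`k_proj`/`v_proj`/`o_proj`, `gate_proj`/`up_proj`/`down_proj` and two layer-norm tensors named in the weight index), no bias terms, and an output head that is not tied to the input embedding. The function name `count_parameters` is hypothetical.

```python
import json

# Values copied from params.json in this commit.
PARAMS = json.loads("""
{"dim": 12288, "n_layers": 88, "head_dim": 128, "hidden_dim": 28672,
 "n_heads": 96, "n_kv_heads": 8, "norm_eps": 1e-05,
 "vocab_size": 32768, "rope_theta": 1000000.0}
""")


def count_parameters(p: dict, tied_embeddings: bool = False) -> int:
    """Estimate the weight count of a bias-free, Llama-style decoder (assumed layout)."""
    dim, hidden = p["dim"], p["hidden_dim"]
    q_out = p["n_heads"] * p["head_dim"]      # width of the query projection
    kv_out = p["n_kv_heads"] * p["head_dim"]  # width of each key/value projection (GQA)
    attn = dim * q_out + 2 * dim * kv_out + q_out * dim  # q, k, v, o projections
    mlp = 3 * dim * hidden                    # gate, up, down projections
    norms = 2 * dim                           # input and post-attention layer norms
    per_layer = attn + mlp + norms
    embed = p["vocab_size"] * dim
    head = 0 if tied_embeddings else embed    # assumption: separate output head
    return p["n_layers"] * per_layer + embed + head + dim  # + final norm


total = count_parameters(PARAMS)
print(f"{total:,} parameters (~{total / 1e9:.1f}B)")
```

Under these assumptions the estimate comes to roughly 122.6 billion weights. Passing `tied_embeddings=True` shows how much of that total the separate output head contributes.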
test.py
ADDED
@@ -0,0 +1,26 @@
+import json
+from typing import Dict
+
+from safetensors.torch import load_file, save_file
+from huggingface_hub import split_torch_state_dict_into_shards
+import torch
+import os
+
+def save_state_dict(state_dict: Dict[str, torch.Tensor], save_directory: str):
+    state_dict_split = split_torch_state_dict_into_shards(state_dict, filename_pattern='consolidated{suffix}.safetensors')
+    for filename, tensors in state_dict_split.filename_to_tensors.items():
+        shard = {tensor: state_dict[tensor] for tensor in tensors}
+        print("Saving", save_directory, filename)
+        save_file(shard, os.path.join(save_directory, filename))
+    if state_dict_split.is_sharded:
+        index = {
+            "metadata": state_dict_split.metadata,
+            "weight_map": state_dict_split.tensor_to_filename,
+        }
+        with open(os.path.join(save_directory, "consolidated.safetensors.index.json"), "w") as f:
+            f.write(json.dumps(index, indent=2))
+
+big_file = 'consolidated.safetensors'
+loaded = load_file(big_file)
+
+save_state_dict(loaded, save_directory=f'.')
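`test.py` writes an index whose `weight_map` maps each tensor name to the shard file that holds it. A minimal sketch of how such an index could be sanity-checked after sharding is shown here, using only the standard library. The helper names `tensors_by_shard` and `missing_shards` are hypothetical, and the check assumes the shard files sit in the same folder as the index.

```python
import json
from collections import defaultdict
from pathlib import Path


def tensors_by_shard(index: dict) -> dict:
    """Invert a weight_map: shard filename -> list of tensor names stored in it."""
    grouped = defaultdict(list)
    for tensor_name, filename in index["weight_map"].items():
        grouped[filename].append(tensor_name)
    return dict(grouped)


def missing_shards(index_path: Path) -> list:
    """List shard files named in the index that are absent next to it."""
    index = json.loads(index_path.read_text())
    folder = index_path.parent
    return sorted(
        name for name in tensors_by_shard(index) if not (folder / name).exists()
    )
```

An empty list from `missing_shards` only shows that every referenced file exists; it does not verify that each file actually contains the tensors the index assigns to it.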
tokenizer.json
ADDED
The diff for this file is too large to render.
tokenizer.model
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:59f95e28944c062244741268596badc900df86c7f5ded05088d2da22a7379e06
+size 587583
tokenizer.model.v3
ADDED
Binary file (588 kB).
tokenizer_config.json
ADDED
The diff for this file is too large to render.
upload.py
ADDED
@@ -0,0 +1,45 @@
+from huggingface_hub import HfApi
+from pathlib import Path
+
+# Define the parameters for uploading
+repo_id = "DBMe/Mistral-Large-Instruct-2407-2.85bpw-h6-exl2" # Replace with your actual repo ID
+folder_path = "/home/asusws-x570-ace/programs/tabbyAPI-new/models/Mistral-Large-Instruct-2407_exl2_2.85bpw/" # Replace with your folder path
+repo_type = "model" # Change to "dataset" or "space" if applicable
+revision = "main" # Optional: specify the branch or use "main"
+private = False # Set to True if the repository should be private
+allow_patterns = None # Optional: specify patterns of files to include
+ignore_patterns = None # Optional: specify patterns of files to exclude
+num_workers = 4 # Set based on your system; lower if your internet is unstable
+print_report = True # Enable progress reporting
+print_report_every = 60 # Report frequency in seconds
+
+# Initialize the Hugging Face API client
+api = HfApi()
+
+# Function to upload the folder in a resumable manner
+def upload_resumable():
+    try:
+        print("Starting upload process...")
+
+        # Perform the upload with the provided parameters
+        api.upload_large_folder(
+            repo_id=repo_id,
+            folder_path=Path(folder_path),
+            repo_type=repo_type,
+            revision=revision,
+            private=private,
+            allow_patterns=allow_patterns,
+            ignore_patterns=ignore_patterns,
+            num_workers=num_workers,
+            print_report=print_report,
+            print_report_every=print_report_every,
+        )
+
+        print("Upload completed successfully!")
+
+    except Exception as e:
+        print(f"Upload interrupted due to error: {e}")
+        print("You can resume the upload by running the script again.")
+
+# Call the function to start the upload
+upload_resumable()