DBMe committed on
Commit
7f77f88
1 Parent(s): c2a5391

Add files using upload-large-folder tool

README.md ADDED
@@ -0,0 +1,274 @@
+ ---
+ license: other
+ license_name: mrl
+ license_link: https://mistral.ai/licenses/MRL-0.1.md
+ language:
+ - en
+ - fr
+ - de
+ - es
+ - it
+ - pt
+ - zh
+ - ja
+ - ru
+ - ko
+
+ extra_gated_description: If you want to learn more about how we process your personal data, please read our <a href="https://mistral.ai/terms/">Privacy Policy</a>.
+ ---
+
+ # Model Card for Mistral-Large-Instruct-2407
+
+ Mistral-Large-Instruct-2407 is an advanced dense Large Language Model (LLM) of 123B parameters with state-of-the-art reasoning, knowledge and coding capabilities.
+
+ For more details about this model please refer to our release [blog post](https://mistral.ai/news/mistral-large-2407/).
+
+ ## Key features
+ - **Multi-lingual by design:** Dozens of languages supported, including English, French, German, Spanish, Italian, Chinese, Japanese, Korean, Portuguese, Dutch and Polish.
+ - **Proficient in coding:** Trained on 80+ coding languages such as Python, Java, C, C++, JavaScript, and Bash. Also trained on more specific languages such as Swift and Fortran.
+ - **Agentic-centric:** Best-in-class agentic capabilities with native function calling and JSON outputting.
+ - **Advanced Reasoning:** State-of-the-art mathematical and reasoning capabilities.
+ - **Mistral Research License:** Allows usage and modification for research and non-commercial usages.
+ - **Large Context:** A large 128k context window.
+
+ ## Metrics
+
+ ### Base Pretrained Benchmarks
+
+ | Benchmark | Score |
+ | --- | --- |
+ | MMLU | 84.0% |
+
+
+ ### Base Pretrained Multilingual Benchmarks (MMLU)
+ | Benchmark | Score |
+ | --- | --- |
+ | French | 82.8% |
+ | German | 81.6% |
+ | Spanish | 82.7% |
+ | Italian | 82.7% |
+ | Dutch | 80.7% |
+ | Portuguese | 81.6% |
+ | Russian | 79.0% |
+ | Korean | 60.1% |
+ | Japanese | 78.8% |
+ | Chinese | 74.8% |
+
+
+ ### Instruction Benchmarks
+
+ | Benchmark | Score |
+ | --- | --- |
+ | MT Bench | 8.63 |
+ | Wild Bench | 56.3 |
+ | Arena Hard| 73.2 |
+
+ ### Code & Reasoning Benchmarks
+ | Benchmark | Score |
+ | --- | --- |
+ | Human Eval | 92% |
+ | Human Eval Plus| 87% |
+ | MBPP Base| 80% |
+ | MBPP Plus| 69% |
+
+ ### Math Benchmarks
+
+ | Benchmark | Score |
+ | --- | --- |
+ | GSM8K | 93% |
+ | Math Instruct (0-shot, no CoT) | 70% |
+ | Math Instruct (0-shot, CoT)| 71.5% |
+
+ ## Usage
+
+ The model can be used with two different frameworks
+
+ - [`mistral_inference`](https://github.com/mistralai/mistral-inference): See [here](#mistral-inference)
+ - [`transformers`](https://github.com/huggingface/transformers): See [here](#transformers)
+
+ ### Mistral Inference
+
+ #### Install
+
+ It is recommended to use `mistralai/Mistral-Large-Instruct-2407` with [mistral-inference](https://github.com/mistralai/mistral-inference). For HF transformers code snippets, please keep scrolling.
+
+ ```
+ pip install mistral_inference
+ ```
+
+ #### Download
+
+ ```py
+ from huggingface_hub import snapshot_download
+ from pathlib import Path
+
+ mistral_models_path = Path.home().joinpath('mistral_models', 'Large')
+ mistral_models_path.mkdir(parents=True, exist_ok=True)
+
+ snapshot_download(repo_id="mistralai/Mistral-Large-Instruct-2407", allow_patterns=["params.json", "consolidated-*.safetensors", "tokenizer.model.v3"], local_dir=mistral_models_path)
+ ```
+
+ #### Chat
+
+ After installing `mistral_inference`, a `mistral-chat` CLI command should be available in your environment.
+ Given the size of this model, you will need a node with several GPUs (more than 300GB cumulated vRAM).
+ If you have 8 GPUs on your machine, you can chat with the model using
+
+ ```
+ torchrun --nproc-per-node 8 --no-python mistral-chat $HOME/mistral_models/Large --instruct --max_tokens 256 --temperature 0.7
+ ```
+
+ *E.g.* Try out something like:
+ ```
+ How expensive would it be to ask a window cleaner to clean all windows in Paris. Make a reasonable guess in US Dollar.
+ ```
+
+ #### Instruct following
+
+ ```py
+ from mistral_inference.transformer import Transformer
+ from mistral_inference.generate import generate
+
+ from mistral_common.tokens.tokenizers.mistral import MistralTokenizer
+ from mistral_common.protocol.instruct.messages import UserMessage
+ from mistral_common.protocol.instruct.request import ChatCompletionRequest
+
+ tokenizer = MistralTokenizer.from_file(f"{mistral_models_path}/tokenizer.model.v3")
+ model = Transformer.from_folder(mistral_models_path)
+
+ prompt = "How expensive would it be to ask a window cleaner to clean all windows in Paris. Make a reasonable guess in US Dollar."
+
+ completion_request = ChatCompletionRequest(messages=[UserMessage(content=prompt)])
+
+ tokens = tokenizer.encode_chat_completion(completion_request).tokens
+
+ out_tokens, _ = generate([tokens], model, max_tokens=64, temperature=0.7, eos_id=tokenizer.instruct_tokenizer.tokenizer.eos_id)
+ result = tokenizer.decode(out_tokens[0])
+
+ print(result)
+ ```
+
+ #### Function calling
+
+ ```py
+ from mistral_common.protocol.instruct.tool_calls import Function, Tool
+ from mistral_inference.transformer import Transformer
+ from mistral_inference.generate import generate
+
+ from mistral_common.tokens.tokenizers.mistral import MistralTokenizer
+ from mistral_common.protocol.instruct.messages import UserMessage
+ from mistral_common.protocol.instruct.request import ChatCompletionRequest
+
+
+ tokenizer = MistralTokenizer.from_file(f"{mistral_models_path}/tokenizer.model.v3")
+ model = Transformer.from_folder(mistral_models_path)
+
+ completion_request = ChatCompletionRequest(
+     tools=[
+         Tool(
+             function=Function(
+                 name="get_current_weather",
+                 description="Get the current weather",
+                 parameters={
+                     "type": "object",
+                     "properties": {
+                         "location": {
+                             "type": "string",
+                             "description": "The city and state, e.g. San Francisco, CA",
+                         },
+                         "format": {
+                             "type": "string",
+                             "enum": ["celsius", "fahrenheit"],
+                             "description": "The temperature unit to use. Infer this from the users location.",
+                         },
+                     },
+                     "required": ["location", "format"],
+                 },
+             )
+         )
+     ],
+     messages=[
+         UserMessage(content="What's the weather like today in Paris?"),
+     ],
+ )
+
+ tokens = tokenizer.encode_chat_completion(completion_request).tokens
+
+ out_tokens, _ = generate([tokens], model, max_tokens=256, temperature=0.7, eos_id=tokenizer.instruct_tokenizer.tokenizer.eos_id)
+ result = tokenizer.decode(out_tokens[0])
+
+ print(result)
+ ```
+
+ ### Transformers
+
+ If you want to use Hugging Face `transformers` to generate text, you can do something like this.
+
+ ```py
+ from transformers import pipeline
+
+ messages = [
+     {"role": "system", "content": "You are a pirate chatbot who always responds in pirate speak!"},
+     {"role": "user", "content": "Who are you?"},
+ ]
+ chatbot = pipeline("text-generation", model="mistralai/Mistral-Large-Instruct-2407")
+ chatbot(messages)
+ ```
+
+ ## Function calling with `transformers`
+
+ To use this example, you'll need `transformers` version 4.42.0 or higher. Please see the
+ [function calling guide](https://huggingface.co/docs/transformers/main/chat_templating#advanced-tool-use--function-calling)
+ in the `transformers` docs for more information.
+
+ ```python
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+ import torch
+
+ model_id = "mistralai/Mistral-Large-Instruct-2407"
+ tokenizer = AutoTokenizer.from_pretrained(model_id)
+
+ def get_current_weather(location: str, format: str):
+     """
+     Get the current weather
+
+     Args:
+         location: The city and state, e.g. San Francisco, CA
+         format: The temperature unit to use. Infer this from the users location. (choices: ["celsius", "fahrenheit"])
+     """
+     pass
+
+ conversation = [{"role": "user", "content": "What's the weather like in Paris?"}]
+ tools = [get_current_weather]
+
+ # format and tokenize the tool use prompt
+ inputs = tokenizer.apply_chat_template(
+     conversation,
+     tools=tools,
+     add_generation_prompt=True,
+     return_dict=True,
+     return_tensors="pt",
+ )
+
+ model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16, device_map="auto")
+
+ inputs.to(model.device)
+ outputs = model.generate(**inputs, max_new_tokens=1000)
+ print(tokenizer.decode(outputs[0], skip_special_tokens=True))
+ ```
+
+ Note that, for reasons of space, this example does not show a complete cycle of calling a tool and adding the tool call and tool
+ results to the chat history so that the model can use them in its next generation. For a full tool calling example, please
+ see the [function calling guide](https://huggingface.co/docs/transformers/main/chat_templating#advanced-tool-use--function-calling),
+ and note that Mistral **does** use tool call IDs, so these must be included in your tool calls and tool results. They should be
+ exactly 9 alphanumeric characters.
+
+ ## Limitations
+
+ The Mistral Large model is a quick demonstration that the base model can be easily fine-tuned to achieve compelling performance.
+ It does not have any moderation mechanisms. We're looking forward to engaging with the community on ways to
+ make the model finely respect guardrails, allowing for deployment in environments requiring moderated outputs.
+
+ ## The Mistral AI Team
+
+ Albert Jiang, Alexandre Sablayrolles, Alexis Tacnet, Alok Kothari, Antoine Roux, Arthur Mensch, Audrey Herblin-Stoop, Augustin Garreau, Austin Birky, Bam4d, Baptiste Bout, Baudouin de Monicault, Blanche Savary, Carole Rambaud, Caroline Feldman, Devendra Singh Chaplot, Diego de las Casas, Diogo Costa, Eleonore Arcelin, Emma Bou Hanna, Etienne Metzger, Gaspard Blanchet, Gianna Lengyel, Guillaume Bour, Guillaume Lample, Harizo Rajaona, Henri Roussez, Hichem Sattouf, Ian Mack, Jean-Malo Delignon, Jessica Chudnovsky, Justus Murke, Kartik Khandelwal, Lawrence Stewart, Louis Martin, Louis Ternon, Lucile Saulnier, Lélio Renard Lavaud, Margaret Jennings, Marie Pellat, Marie Torelli, Marie-Anne Lachaux, Marjorie Janiewicz, Mickaël Seznec, Nicolas Schuhl, Niklas Muhs, Olivier de Garrigues, Patrick von Platen, Paul Jacob, Pauline Buche, Pavan Kumar Reddy, Perry Savas, Pierre Stock, Romain Sauvestre, Sagar Vaze, Sandeep Subramanian, Saurabh Garg, Sophia Yang, Szymon Antoniak, Teven Le Scao, Thibault Schueller, Thibaut Lavril, Thomas Wang, Théophile Gervet, Timothée Lacroix, Valera Nemychnikova, Wendy Shang, William El Sayed, William Marshall
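
Reviewer note on the README's tool call ID requirement: the note says IDs must be exactly 9 alphanumeric characters and must appear in both the tool call and the tool result. The sketch below shows one way that could look when extending the chat history. The message layout is an assumption based on the `transformers` function calling guide, `make_tool_call_id` and `append_tool_round` are hypothetical helpers, and the weather value is a placeholder.

```python
import secrets
import string

ALPHABET = string.ascii_letters + string.digits


def make_tool_call_id(length: int = 9) -> str:
    """Return a random alphanumeric ID (hypothetical helper, 9 characters by default)."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def append_tool_round(conversation, name, arguments, result):
    """Append one assistant tool call and its tool result, sharing a single ID.

    The message layout is an assumption taken from the transformers
    function calling guide; it is not specified by this README.
    """
    call_id = make_tool_call_id()
    conversation.append({
        "role": "assistant",
        "tool_calls": [{
            "type": "function",
            "id": call_id,
            "function": {"name": name, "arguments": arguments},
        }],
    })
    conversation.append({
        "role": "tool",
        "tool_call_id": call_id,
        "name": name,
        "content": result,
    })
    return call_id


conversation = [{"role": "user", "content": "What's the weather like in Paris?"}]
call_id = append_tool_round(
    conversation,
    name="get_current_weather",
    arguments={"location": "Paris, France", "format": "celsius"},
    result="22.0",  # placeholder value, not real weather data
)
```

The extended `conversation` could then be passed back through `tokenizer.apply_chat_template` for the next generation step, as the function calling guide describes.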
config.json ADDED
@@ -0,0 +1,36 @@
+ {
+   "architectures": [
+     "MistralForCausalLM"
+   ],
+   "attention_dropout": 0.0,
+   "bos_token_id": 1,
+   "eos_token_id": 2,
+   "hidden_act": "silu",
+   "hidden_size": 12288,
+   "initializer_range": 0.02,
+   "intermediate_size": 28672,
+   "max_position_embeddings": 131072,
+   "model_type": "mistral",
+   "num_attention_heads": 96,
+   "num_hidden_layers": 88,
+   "num_key_value_heads": 8,
+   "rms_norm_eps": 1e-05,
+   "rope_theta": 1000000.0,
+   "sliding_window": null,
+   "tie_word_embeddings": false,
+   "torch_dtype": "bfloat16",
+   "transformers_version": "4.42.3",
+   "use_cache": true,
+   "vocab_size": 32768,
+   "quantization_config": {
+     "quant_method": "exl2",
+     "version": "0.2.1",
+     "bits": 2.85,
+     "head_bits": 6,
+     "calibration": {
+       "rows": 115,
+       "length": 2048,
+       "dataset": "(default)"
+     }
+   }
+ }
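
Reviewer note on `config.json`: the dimensions above are enough to sanity-check the "123B parameters" claim in the README. The sketch below is a back-of-the-envelope count, assuming the usual Mistral layer layout (no bias terms, two RMSNorm weights per layer, one final norm) and a head dimension of `hidden_size / num_attention_heads`, which the config does not state explicitly. The 2.85 bpw figure at the end is only a rough estimate, since it ignores that exl2 treats the embedding and head layers differently.

```python
# Values copied from the config.json in this commit.
hidden_size = 12288
intermediate_size = 28672
num_attention_heads = 96
num_key_value_heads = 8
num_hidden_layers = 88
vocab_size = 32768

# Assumption: head_dim is not listed in the config, so derive it.
head_dim = hidden_size // num_attention_heads
kv_dim = num_key_value_heads * head_dim

# Attention: q_proj and o_proj are square, k_proj and v_proj use the smaller GQA width.
attn = 2 * hidden_size * hidden_size + 2 * hidden_size * kv_dim
# MLP: gate_proj, up_proj and down_proj.
mlp = 3 * hidden_size * intermediate_size
# input_layernorm and post_attention_layernorm weights.
norms = 2 * hidden_size
per_layer = attn + mlp + norms

# tie_word_embeddings is false, so embed_tokens and lm_head are counted separately,
# plus one final norm vector.
total_params = num_hidden_layers * per_layer + 2 * vocab_size * hidden_size + hidden_size

bf16_bytes = 2 * total_params
approx_exl2_bytes = total_params * 2.85 / 8  # rough estimate only

print(f"parameters: {total_params:,}")
print(f"bf16 size:  {bf16_bytes:,} bytes")
print(f"2.85 bpw:   ~{approx_exl2_bytes / 2**30:.1f} GiB (approximate)")
```

Under these assumptions the bf16 byte count works out to 245220139008, the same value as `total_size` in `model.safetensors.index.json`.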
config.yml ADDED
@@ -0,0 +1,228 @@
+ # Sample YAML file for configuration.
+ # Comment and uncomment values as needed. Every value has a default within the application.
+ # This file serves to be a drop in for config.yml
+
+ # Unless specified in the comments, DO NOT put these options in quotes!
+ # You can use https://www.yamllint.com/ if you want to check your YAML formatting.
+
+ # Options for networking
+ network:
+   # The IP to host on (default: 127.0.0.1).
+   # Use 0.0.0.0 to expose on all network adapters
+   host: 0.0.0.0
+
+   # The port to host on (default: 5000)
+   port: 5000
+
+   # Disable HTTP token authentication with requests
+   # WARNING: This will make your instance vulnerable!
+   # Turn on this option if you are ONLY connecting from localhost
+   disable_auth: False
+
+   # Send tracebacks over the API to clients (default: False)
+   # NOTE: Only enable this for debug purposes
+   send_tracebacks: False
+
+   # Select API servers to enable (default: ["OAI"])
+   # Possible values: OAI
+   api_servers: ["OAI"]
+
+ # Options for logging
+ logging:
+   # Enable prompt logging (default: False)
+   prompt: False
+
+   # Enable generation parameter logging (default: False)
+   generation_params: False
+
+   # Enable request logging (default: False)
+   # NOTE: Only use this for debugging!
+   requests: False
+
+ # Options for sampling
+ sampling:
+   # Override preset name. Find this in the sampler-overrides folder (default: None)
+   # This overrides default fallbacks for sampler values that are passed to the API
+   # Server-side overrides are NOT needed by default
+   # WARNING: Using this can result in a generation speed penalty
+   #override_preset:
+
+ # Options for development and experimentation
+ developer:
+   # Skips exllamav2 version check (default: False)
+   # It's highly recommended to update your dependencies rather than enabling this flag
+   # WARNING: Don't set this unless you know what you're doing!
+   #unsafe_launch: False
+
+   # Disable all request streaming (default: False)
+   # A kill switch for turning off SSE in the API server
+   #disable_request_streaming: False
+
+   # Enable the torch CUDA malloc backend (default: False)
+   # This can save a few MBs of VRAM, but has a risk of errors. Use at your own risk.
+   cuda_malloc_backend: True
+
+   # Enable Uvloop or Winloop (default: False)
+   # Make the program utilize a faster async event loop which can improve performance
+   # NOTE: It's recommended to enable this, but if something breaks, turn this off.
+   uvloop: True
+
+   # Set process to use a higher priority
+   # For realtime process priority, run as administrator or sudo
+   # Otherwise, the priority will be set to high
+   realtime_process_priority: True
+
+ # Options for model overrides and loading
+ # Please read the comments to understand how arguments are handled between initial and API loads
+ model:
+   # Overrides the directory to look for models (default: models)
+   # Windows users, DO NOT put this path in quotes! This directory will be invalid otherwise.
+   model_dir: models
+
+   # Sends dummy model names when the models endpoint is queried
+   # Enable this if the program is looking for a specific OAI model
+   #use_dummy_models: False
+
+   # An initial model to load. Make sure the model is located in the model directory!
+   # A model can be loaded later via the API.
+   # REQUIRED: This must be filled out to load a model on startup!
+   model_name: Mistral-Large-Instruct-2407_exl2_2.85bpw
+
+   # The below parameters only apply for initial loads
+   # All API based loads do NOT inherit these settings unless specified in use_as_default
+
+   # Names of args to use as a default fallback for API load requests (default: [])
+   # For example, if you always want cache_mode to be Q4 instead of on the initial model load,
+   # Add "cache_mode" to this array
+   # Ex. ["max_seq_len", "cache_mode"]
+   #use_as_default: []
+
+   # The below parameters apply only if model_name is set
+
+   # Max sequence length (default: Empty)
+   # Fetched from the model's base sequence length in config.json by default
+   max_seq_len: 32768
+
+   # Overrides base model context length (default: Empty)
+   # WARNING: Don't set this unless you know what you're doing!
+   # Again, do NOT use this for configuring context length, use max_seq_len above ^
+   # Only use this if the model's base sequence length in config.json is incorrect (ex. Mistral 7B)
+   #override_base_seq_len:
+
+   # Load model with tensor parallelism
+   # If a GPU split isn't provided, the TP loader will fallback to autosplit
+   # Enabling ignores the gpu_split_auto and autosplit_reserve values
+   #tensor_parallel: True
+
+   # Automatically allocate resources to GPUs (default: True)
+   # NOTE: Not parsed for single GPU users
+   gpu_split_auto: True
+
+   # Reserve VRAM used for autosplit loading (default: 96 MB on GPU 0)
+   # This is represented as an array of MB per GPU used
+   autosplit_reserve: [0]
+
+   # An integer array of GBs of vram to split between GPUs (default: [])
+   # Used with tensor parallelism
+   # NOTE: Not parsed for single GPU users
+   #gpu_split: [20.6, 24]
+
+   # Rope scale (default: 1.0)
+   # Same thing as compress_pos_emb
+   # Only use if your model was trained on long context with rope (check config.json)
+   # Leave blank to pull the value from the model
+   #rope_scale: 1.0
+
+   # Rope alpha (default: 1.0)
+   # Same thing as alpha_value
+   # Leave blank to automatically calculate alpha
+   #rope_alpha: 1.0
+
+   # Enable different cache modes for VRAM savings (slight performance hit).
+   # Possible values FP16, Q8, Q6, Q4. (default: FP16)
+   cache_mode: Q4
+
+   # Size of the prompt cache to allocate (default: max_seq_len)
+   # This must be a multiple of 256. A larger cache uses more VRAM, but allows for more prompts to be processed at once.
+   # NOTE: Cache size should not be less than max_seq_len.
+   # For CFG, set this to 2 * max_seq_len to make room for both positive and negative prompts.
+   # cache_size:
+
+   # Chunk size for prompt ingestion. A lower value reduces VRAM usage at the cost of ingestion speed (default: 2048)
+   # NOTE: Effects vary depending on the model. An ideal value is between 512 and 4096
+   chunk_size: 1024
+
+   # Set the maximum amount of prompts to process at one time (default: None/Automatic)
+   # This will be automatically calculated if left blank.
+   # A max batch size of 1 processes prompts one at a time.
+   # NOTE: Only available for Nvidia ampere (30 series) and above GPUs
+   #max_batch_size:
+
+   # Set the prompt template for this model. If empty, attempts to look for the model's chat template. (default: None)
+   # If a model contains multiple templates in its tokenizer_config.json, set prompt_template to the name
+   # of the template you want to use.
+   # NOTE: Only works with chat completion message lists!
+   #prompt_template:
+
+   # Number of experts to use PER TOKEN. Fetched from the model's config.json if not specified (default: Empty)
+   # WARNING: Don't set this unless you know what you're doing!
+   # NOTE: For MoE models (ex. Mixtral) only!
+   #num_experts_per_token:
+
+   # Enables fasttensors to possibly increase model loading speeds (default: False)
+   fasttensors: true
+
+   # Options for draft models (speculative decoding). This will use more VRAM!
+   #draft:
+     # Overrides the directory to look for draft (default: models)
+     #draft_model_dir: models
+
+     # An initial draft model to load. Make sure this model is located in the model directory!
+     # A draft model can be loaded later via the API.
+     #draft_model_name: A model name
+
+     # The below parameters only apply for initial loads
+     # All API based loads do NOT inherit these settings unless specified in use_as_default
+
+     # Rope scale for draft models (default: 1.0)
+     # Same thing as compress_pos_emb
+     # Only use if your draft model was trained on long context with rope (check config.json)
+     #draft_rope_scale: 1.0
+
+     # Rope alpha for draft model (default: 1.0)
+     # Same thing as alpha_value
+     # Leave blank to automatically calculate alpha value
+     #draft_rope_alpha: 1.0
+
+     # Enable different draft model cache modes for VRAM savings (slight performance hit).
+     # Possible values FP16, Q8, Q6, Q4. (default: FP16)
+     #draft_cache_mode: FP16
+
+   # Options for loras
+   #lora:
+     # Overrides the directory to look for loras (default: loras)
+     #lora_dir: loras
+
+     # List of loras to load and associated scaling factors (default: 1.0). Comment out unused entries or add more rows as needed.
+     #loras:
+     #- name: lora1
+     #  scaling: 1.0
+
+ # Options for embedding models and loading.
+ # NOTE: Embeddings requires the "extras" feature to be installed
+ # Install it via "pip install .[extras]"
+ embeddings:
+   # Overrides directory to look for embedding models (default: models)
+   embedding_model_dir: models
+
+   # Device to load embedding models on (default: cpu)
+   # Possible values: cpu, auto, cuda
+   # NOTE: It's recommended to load embedding models on the CPU.
+   # If you'd like to load on an AMD gpu, set this value to "cuda" as well.
+   embeddings_device: cpu
+
+   # The below parameters only apply for initial loads
+   # All API based loads do NOT inherit these settings unless specified in use_as_default
+
+   # An initial embedding model to load on the infinity backend (default: None)
+   embedding_model_name:
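
Reviewer note on `config.yml`: the file sets `max_seq_len: 32768` and `cache_mode: Q4`, and its comments say the cache size defaults to `max_seq_len`. The sketch below estimates what that choice might save, using the layer and head counts from `config.json`. It assumes a head dimension of `hidden_size / num_attention_heads` and treats a Q4 cache as exactly 4 bits per element; a real quantized cache also stores scaling data, so actual usage is likely somewhat higher.

```python
# Values copied from config.json and config.yml in this commit.
num_hidden_layers = 88
num_key_value_heads = 8
head_dim = 12288 // 96  # assumption: hidden_size / num_attention_heads
max_seq_len = 32768


def kv_cache_bytes(seq_len: int, bits_per_element: int) -> float:
    """Estimate KV cache size: one key and one value vector per layer per token."""
    elements = 2 * num_hidden_layers * num_key_value_heads * head_dim * seq_len
    return elements * bits_per_element / 8


fp16_gib = kv_cache_bytes(max_seq_len, 16) / 2**30
q4_gib = kv_cache_bytes(max_seq_len, 4) / 2**30  # ignores quantization scale overhead

print(f"FP16 cache at {max_seq_len} tokens: {fp16_gib:.2f} GiB")
print(f"Q4 cache at {max_seq_len} tokens:   {q4_gib:.2f} GiB (lower bound)")
```

Under these assumptions the FP16 cache comes to 11 GiB and the Q4 cache to roughly a quarter of that.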
generation_config.json ADDED
@@ -0,0 +1,6 @@
+ {
+   "_from_model_config": true,
+   "bos_token_id": 1,
+   "eos_token_id": 2,
+   "transformers_version": "4.42.3"
+ }
measurements.json ADDED
The diff for this file is too large to render. See raw diff
 
model.safetensors.index.json ADDED
@@ -0,0 +1,802 @@
+ {
+   "metadata": {
+     "total_size": 245220139008
+   },
+   "weight_map": {
+     "lm_head.weight": "model-00051-of-00051.safetensors",
+     "model.embed_tokens.weight": "model-00001-of-00051.safetensors",
+     "model.layers.0.input_layernorm.weight": "model-00001-of-00051.safetensors",
+     "model.layers.0.mlp.down_proj.weight": "model-00001-of-00051.safetensors",
+     "model.layers.0.mlp.gate_proj.weight": "model-00001-of-00051.safetensors",
+     "model.layers.0.mlp.up_proj.weight": "model-00001-of-00051.safetensors",
+     "model.layers.0.post_attention_layernorm.weight": "model-00001-of-00051.safetensors",
+     "model.layers.0.self_attn.k_proj.weight": "model-00001-of-00051.safetensors",
+     "model.layers.0.self_attn.o_proj.weight": "model-00001-of-00051.safetensors",
+     "model.layers.0.self_attn.q_proj.weight": "model-00001-of-00051.safetensors",
+     "model.layers.0.self_attn.v_proj.weight": "model-00001-of-00051.safetensors",
+     "model.layers.1.input_layernorm.weight": "model-00002-of-00051.safetensors",
+     "model.layers.1.mlp.down_proj.weight": "model-00002-of-00051.safetensors",
+     "model.layers.1.mlp.gate_proj.weight": "model-00001-of-00051.safetensors",
+     "model.layers.1.mlp.up_proj.weight": "model-00002-of-00051.safetensors",
+     "model.layers.1.post_attention_layernorm.weight": "model-00002-of-00051.safetensors",
+     "model.layers.1.self_attn.k_proj.weight": "model-00001-of-00051.safetensors",
+     "model.layers.1.self_attn.o_proj.weight": "model-00001-of-00051.safetensors",
+     "model.layers.1.self_attn.q_proj.weight": "model-00001-of-00051.safetensors",
+     "model.layers.1.self_attn.v_proj.weight": "model-00001-of-00051.safetensors",
+     "model.layers.10.input_layernorm.weight": "model-00007-of-00051.safetensors",
+     "model.layers.10.mlp.down_proj.weight": "model-00007-of-00051.safetensors",
+     "model.layers.10.mlp.gate_proj.weight": "model-00007-of-00051.safetensors",
+     "model.layers.10.mlp.up_proj.weight": "model-00007-of-00051.safetensors",
+     "model.layers.10.post_attention_layernorm.weight": "model-00007-of-00051.safetensors",
+     "model.layers.10.self_attn.k_proj.weight": "model-00006-of-00051.safetensors",
+     "model.layers.10.self_attn.o_proj.weight": "model-00006-of-00051.safetensors",
+     "model.layers.10.self_attn.q_proj.weight": "model-00006-of-00051.safetensors",
+     "model.layers.10.self_attn.v_proj.weight": "model-00006-of-00051.safetensors",
+     "model.layers.11.input_layernorm.weight": "model-00007-of-00051.safetensors",
+     "model.layers.11.mlp.down_proj.weight": "model-00007-of-00051.safetensors",
+     "model.layers.11.mlp.gate_proj.weight": "model-00007-of-00051.safetensors",
+     "model.layers.11.mlp.up_proj.weight": "model-00007-of-00051.safetensors",
+     "model.layers.11.post_attention_layernorm.weight": "model-00007-of-00051.safetensors",
+     "model.layers.11.self_attn.k_proj.weight": "model-00007-of-00051.safetensors",
+     "model.layers.11.self_attn.o_proj.weight": "model-00007-of-00051.safetensors",
+     "model.layers.11.self_attn.q_proj.weight": "model-00007-of-00051.safetensors",
+     "model.layers.11.self_attn.v_proj.weight": "model-00007-of-00051.safetensors",
+     "model.layers.12.input_layernorm.weight": "model-00008-of-00051.safetensors",
+     "model.layers.12.mlp.down_proj.weight": "model-00008-of-00051.safetensors",
+     "model.layers.12.mlp.gate_proj.weight": "model-00008-of-00051.safetensors",
+     "model.layers.12.mlp.up_proj.weight": "model-00008-of-00051.safetensors",
+     "model.layers.12.post_attention_layernorm.weight": "model-00008-of-00051.safetensors",
+     "model.layers.12.self_attn.k_proj.weight": "model-00008-of-00051.safetensors",
+     "model.layers.12.self_attn.o_proj.weight": "model-00008-of-00051.safetensors",
+     "model.layers.12.self_attn.q_proj.weight": "model-00008-of-00051.safetensors",
+     "model.layers.12.self_attn.v_proj.weight": "model-00008-of-00051.safetensors",
+     "model.layers.13.input_layernorm.weight": "model-00009-of-00051.safetensors",
+     "model.layers.13.mlp.down_proj.weight": "model-00009-of-00051.safetensors",
+     "model.layers.13.mlp.gate_proj.weight": "model-00008-of-00051.safetensors",
+     "model.layers.13.mlp.up_proj.weight": "model-00008-of-00051.safetensors",
+     "model.layers.13.post_attention_layernorm.weight": "model-00009-of-00051.safetensors",
+     "model.layers.13.self_attn.k_proj.weight": "model-00008-of-00051.safetensors",
+     "model.layers.13.self_attn.o_proj.weight": "model-00008-of-00051.safetensors",
+     "model.layers.13.self_attn.q_proj.weight": "model-00008-of-00051.safetensors",
+     "model.layers.13.self_attn.v_proj.weight": "model-00008-of-00051.safetensors",
+     "model.layers.14.input_layernorm.weight": "model-00009-of-00051.safetensors",
+     "model.layers.14.mlp.down_proj.weight": "model-00009-of-00051.safetensors",
+     "model.layers.14.mlp.gate_proj.weight": "model-00009-of-00051.safetensors",
+     "model.layers.14.mlp.up_proj.weight": "model-00009-of-00051.safetensors",
+     "model.layers.14.post_attention_layernorm.weight": "model-00009-of-00051.safetensors",
+     "model.layers.14.self_attn.k_proj.weight": "model-00009-of-00051.safetensors",
+     "model.layers.14.self_attn.o_proj.weight": "model-00009-of-00051.safetensors",
+     "model.layers.14.self_attn.q_proj.weight": "model-00009-of-00051.safetensors",
+     "model.layers.14.self_attn.v_proj.weight": "model-00009-of-00051.safetensors",
+     "model.layers.15.input_layernorm.weight": "model-00010-of-00051.safetensors",
+     "model.layers.15.mlp.down_proj.weight": "model-00010-of-00051.safetensors",
+     "model.layers.15.mlp.gate_proj.weight": "model-00009-of-00051.safetensors",
+     "model.layers.15.mlp.up_proj.weight": "model-00010-of-00051.safetensors",
+     "model.layers.15.post_attention_layernorm.weight": "model-00010-of-00051.safetensors",
+     "model.layers.15.self_attn.k_proj.weight": "model-00009-of-00051.safetensors",
+     "model.layers.15.self_attn.o_proj.weight": "model-00009-of-00051.safetensors",
+     "model.layers.15.self_attn.q_proj.weight": "model-00009-of-00051.safetensors",
+     "model.layers.15.self_attn.v_proj.weight": "model-00009-of-00051.safetensors",
+     "model.layers.16.input_layernorm.weight": "model-00010-of-00051.safetensors",
+     "model.layers.16.mlp.down_proj.weight": "model-00010-of-00051.safetensors",
+     "model.layers.16.mlp.gate_proj.weight": "model-00010-of-00051.safetensors",
+     "model.layers.16.mlp.up_proj.weight": "model-00010-of-00051.safetensors",
+     "model.layers.16.post_attention_layernorm.weight": "model-00010-of-00051.safetensors",
+     "model.layers.16.self_attn.k_proj.weight": "model-00010-of-00051.safetensors",
+     "model.layers.16.self_attn.o_proj.weight": "model-00010-of-00051.safetensors",
+     "model.layers.16.self_attn.q_proj.weight": "model-00010-of-00051.safetensors",
+     "model.layers.16.self_attn.v_proj.weight": "model-00010-of-00051.safetensors",
+     "model.layers.17.input_layernorm.weight": "model-00011-of-00051.safetensors",
+     "model.layers.17.mlp.down_proj.weight": "model-00011-of-00051.safetensors",
+     "model.layers.17.mlp.gate_proj.weight": "model-00011-of-00051.safetensors",
+     "model.layers.17.mlp.up_proj.weight": "model-00011-of-00051.safetensors",
+     "model.layers.17.post_attention_layernorm.weight": "model-00011-of-00051.safetensors",
+     "model.layers.17.self_attn.k_proj.weight": "model-00010-of-00051.safetensors",
+     "model.layers.17.self_attn.o_proj.weight": "model-00010-of-00051.safetensors",
+     "model.layers.17.self_attn.q_proj.weight": "model-00010-of-00051.safetensors",
+     "model.layers.17.self_attn.v_proj.weight": "model-00010-of-00051.safetensors",
+     "model.layers.18.input_layernorm.weight": "model-00011-of-00051.safetensors",
+     "model.layers.18.mlp.down_proj.weight": "model-00011-of-00051.safetensors",
+     "model.layers.18.mlp.gate_proj.weight": "model-00011-of-00051.safetensors",
+     "model.layers.18.mlp.up_proj.weight": "model-00011-of-00051.safetensors",
+     "model.layers.18.post_attention_layernorm.weight": "model-00011-of-00051.safetensors",
+     "model.layers.18.self_attn.k_proj.weight": "model-00011-of-00051.safetensors",
+     "model.layers.18.self_attn.o_proj.weight": "model-00011-of-00051.safetensors",
+     "model.layers.18.self_attn.q_proj.weight": "model-00011-of-00051.safetensors",
+     "model.layers.18.self_attn.v_proj.weight": "model-00011-of-00051.safetensors",
+     "model.layers.19.input_layernorm.weight": "model-00012-of-00051.safetensors",
+     "model.layers.19.mlp.down_proj.weight": "model-00012-of-00051.safetensors",
109
+ "model.layers.19.mlp.gate_proj.weight": "model-00012-of-00051.safetensors",
110
+ "model.layers.19.mlp.up_proj.weight": "model-00012-of-00051.safetensors",
111
+ "model.layers.19.post_attention_layernorm.weight": "model-00012-of-00051.safetensors",
112
+ "model.layers.19.self_attn.k_proj.weight": "model-00012-of-00051.safetensors",
113
+ "model.layers.19.self_attn.o_proj.weight": "model-00012-of-00051.safetensors",
114
+ "model.layers.19.self_attn.q_proj.weight": "model-00012-of-00051.safetensors",
115
+ "model.layers.19.self_attn.v_proj.weight": "model-00012-of-00051.safetensors",
116
+ "model.layers.2.input_layernorm.weight": "model-00002-of-00051.safetensors",
117
+ "model.layers.2.mlp.down_proj.weight": "model-00002-of-00051.safetensors",
118
+ "model.layers.2.mlp.gate_proj.weight": "model-00002-of-00051.safetensors",
119
+ "model.layers.2.mlp.up_proj.weight": "model-00002-of-00051.safetensors",
120
+ "model.layers.2.post_attention_layernorm.weight": "model-00002-of-00051.safetensors",
121
+ "model.layers.2.self_attn.k_proj.weight": "model-00002-of-00051.safetensors",
122
+ "model.layers.2.self_attn.o_proj.weight": "model-00002-of-00051.safetensors",
123
+ "model.layers.2.self_attn.q_proj.weight": "model-00002-of-00051.safetensors",
124
+ "model.layers.2.self_attn.v_proj.weight": "model-00002-of-00051.safetensors",
125
+ "model.layers.20.input_layernorm.weight": "model-00013-of-00051.safetensors",
126
+ "model.layers.20.mlp.down_proj.weight": "model-00013-of-00051.safetensors",
127
+ "model.layers.20.mlp.gate_proj.weight": "model-00012-of-00051.safetensors",
128
+ "model.layers.20.mlp.up_proj.weight": "model-00012-of-00051.safetensors",
129
+ "model.layers.20.post_attention_layernorm.weight": "model-00013-of-00051.safetensors",
130
+ "model.layers.20.self_attn.k_proj.weight": "model-00012-of-00051.safetensors",
131
+ "model.layers.20.self_attn.o_proj.weight": "model-00012-of-00051.safetensors",
132
+ "model.layers.20.self_attn.q_proj.weight": "model-00012-of-00051.safetensors",
133
+ "model.layers.20.self_attn.v_proj.weight": "model-00012-of-00051.safetensors",
134
+ "model.layers.21.input_layernorm.weight": "model-00013-of-00051.safetensors",
135
+ "model.layers.21.mlp.down_proj.weight": "model-00013-of-00051.safetensors",
136
+ "model.layers.21.mlp.gate_proj.weight": "model-00013-of-00051.safetensors",
137
+ "model.layers.21.mlp.up_proj.weight": "model-00013-of-00051.safetensors",
138
+ "model.layers.21.post_attention_layernorm.weight": "model-00013-of-00051.safetensors",
139
+ "model.layers.21.self_attn.k_proj.weight": "model-00013-of-00051.safetensors",
140
+ "model.layers.21.self_attn.o_proj.weight": "model-00013-of-00051.safetensors",
141
+ "model.layers.21.self_attn.q_proj.weight": "model-00013-of-00051.safetensors",
142
+ "model.layers.21.self_attn.v_proj.weight": "model-00013-of-00051.safetensors",
143
+ "model.layers.22.input_layernorm.weight": "model-00014-of-00051.safetensors",
144
+ "model.layers.22.mlp.down_proj.weight": "model-00014-of-00051.safetensors",
145
+ "model.layers.22.mlp.gate_proj.weight": "model-00013-of-00051.safetensors",
146
+ "model.layers.22.mlp.up_proj.weight": "model-00014-of-00051.safetensors",
147
+ "model.layers.22.post_attention_layernorm.weight": "model-00014-of-00051.safetensors",
148
+ "model.layers.22.self_attn.k_proj.weight": "model-00013-of-00051.safetensors",
149
+ "model.layers.22.self_attn.o_proj.weight": "model-00013-of-00051.safetensors",
150
+ "model.layers.22.self_attn.q_proj.weight": "model-00013-of-00051.safetensors",
151
+ "model.layers.22.self_attn.v_proj.weight": "model-00013-of-00051.safetensors",
152
+ "model.layers.23.input_layernorm.weight": "model-00014-of-00051.safetensors",
153
+ "model.layers.23.mlp.down_proj.weight": "model-00014-of-00051.safetensors",
154
+ "model.layers.23.mlp.gate_proj.weight": "model-00014-of-00051.safetensors",
155
+ "model.layers.23.mlp.up_proj.weight": "model-00014-of-00051.safetensors",
156
+ "model.layers.23.post_attention_layernorm.weight": "model-00014-of-00051.safetensors",
157
+ "model.layers.23.self_attn.k_proj.weight": "model-00014-of-00051.safetensors",
158
+ "model.layers.23.self_attn.o_proj.weight": "model-00014-of-00051.safetensors",
159
+ "model.layers.23.self_attn.q_proj.weight": "model-00014-of-00051.safetensors",
160
+ "model.layers.23.self_attn.v_proj.weight": "model-00014-of-00051.safetensors",
161
+ "model.layers.24.input_layernorm.weight": "model-00015-of-00051.safetensors",
162
+ "model.layers.24.mlp.down_proj.weight": "model-00015-of-00051.safetensors",
163
+ "model.layers.24.mlp.gate_proj.weight": "model-00015-of-00051.safetensors",
164
+ "model.layers.24.mlp.up_proj.weight": "model-00015-of-00051.safetensors",
165
+ "model.layers.24.post_attention_layernorm.weight": "model-00015-of-00051.safetensors",
166
+ "model.layers.24.self_attn.k_proj.weight": "model-00014-of-00051.safetensors",
167
+ "model.layers.24.self_attn.o_proj.weight": "model-00014-of-00051.safetensors",
168
+ "model.layers.24.self_attn.q_proj.weight": "model-00014-of-00051.safetensors",
169
+ "model.layers.24.self_attn.v_proj.weight": "model-00014-of-00051.safetensors",
170
+ "model.layers.25.input_layernorm.weight": "model-00015-of-00051.safetensors",
171
+ "model.layers.25.mlp.down_proj.weight": "model-00015-of-00051.safetensors",
172
+ "model.layers.25.mlp.gate_proj.weight": "model-00015-of-00051.safetensors",
173
+ "model.layers.25.mlp.up_proj.weight": "model-00015-of-00051.safetensors",
174
+ "model.layers.25.post_attention_layernorm.weight": "model-00015-of-00051.safetensors",
175
+ "model.layers.25.self_attn.k_proj.weight": "model-00015-of-00051.safetensors",
176
+ "model.layers.25.self_attn.o_proj.weight": "model-00015-of-00051.safetensors",
177
+ "model.layers.25.self_attn.q_proj.weight": "model-00015-of-00051.safetensors",
178
+ "model.layers.25.self_attn.v_proj.weight": "model-00015-of-00051.safetensors",
179
+ "model.layers.26.input_layernorm.weight": "model-00016-of-00051.safetensors",
180
+ "model.layers.26.mlp.down_proj.weight": "model-00016-of-00051.safetensors",
181
+ "model.layers.26.mlp.gate_proj.weight": "model-00016-of-00051.safetensors",
182
+ "model.layers.26.mlp.up_proj.weight": "model-00016-of-00051.safetensors",
183
+ "model.layers.26.post_attention_layernorm.weight": "model-00016-of-00051.safetensors",
184
+ "model.layers.26.self_attn.k_proj.weight": "model-00016-of-00051.safetensors",
185
+ "model.layers.26.self_attn.o_proj.weight": "model-00016-of-00051.safetensors",
186
+ "model.layers.26.self_attn.q_proj.weight": "model-00016-of-00051.safetensors",
187
+ "model.layers.26.self_attn.v_proj.weight": "model-00016-of-00051.safetensors",
188
+ "model.layers.27.input_layernorm.weight": "model-00017-of-00051.safetensors",
189
+ "model.layers.27.mlp.down_proj.weight": "model-00017-of-00051.safetensors",
190
+ "model.layers.27.mlp.gate_proj.weight": "model-00016-of-00051.safetensors",
191
+ "model.layers.27.mlp.up_proj.weight": "model-00016-of-00051.safetensors",
192
+ "model.layers.27.post_attention_layernorm.weight": "model-00017-of-00051.safetensors",
193
+ "model.layers.27.self_attn.k_proj.weight": "model-00016-of-00051.safetensors",
194
+ "model.layers.27.self_attn.o_proj.weight": "model-00016-of-00051.safetensors",
195
+ "model.layers.27.self_attn.q_proj.weight": "model-00016-of-00051.safetensors",
196
+ "model.layers.27.self_attn.v_proj.weight": "model-00016-of-00051.safetensors",
197
+ "model.layers.28.input_layernorm.weight": "model-00017-of-00051.safetensors",
198
+ "model.layers.28.mlp.down_proj.weight": "model-00017-of-00051.safetensors",
199
+ "model.layers.28.mlp.gate_proj.weight": "model-00017-of-00051.safetensors",
200
+ "model.layers.28.mlp.up_proj.weight": "model-00017-of-00051.safetensors",
201
+ "model.layers.28.post_attention_layernorm.weight": "model-00017-of-00051.safetensors",
202
+ "model.layers.28.self_attn.k_proj.weight": "model-00017-of-00051.safetensors",
203
+ "model.layers.28.self_attn.o_proj.weight": "model-00017-of-00051.safetensors",
204
+ "model.layers.28.self_attn.q_proj.weight": "model-00017-of-00051.safetensors",
205
+ "model.layers.28.self_attn.v_proj.weight": "model-00017-of-00051.safetensors",
206
+ "model.layers.29.input_layernorm.weight": "model-00018-of-00051.safetensors",
207
+ "model.layers.29.mlp.down_proj.weight": "model-00018-of-00051.safetensors",
208
+ "model.layers.29.mlp.gate_proj.weight": "model-00017-of-00051.safetensors",
209
+ "model.layers.29.mlp.up_proj.weight": "model-00018-of-00051.safetensors",
210
+ "model.layers.29.post_attention_layernorm.weight": "model-00018-of-00051.safetensors",
211
+ "model.layers.29.self_attn.k_proj.weight": "model-00017-of-00051.safetensors",
212
+ "model.layers.29.self_attn.o_proj.weight": "model-00017-of-00051.safetensors",
213
+ "model.layers.29.self_attn.q_proj.weight": "model-00017-of-00051.safetensors",
214
+ "model.layers.29.self_attn.v_proj.weight": "model-00017-of-00051.safetensors",
215
+ "model.layers.3.input_layernorm.weight": "model-00003-of-00051.safetensors",
216
+ "model.layers.3.mlp.down_proj.weight": "model-00003-of-00051.safetensors",
217
+ "model.layers.3.mlp.gate_proj.weight": "model-00003-of-00051.safetensors",
218
+ "model.layers.3.mlp.up_proj.weight": "model-00003-of-00051.safetensors",
219
+ "model.layers.3.post_attention_layernorm.weight": "model-00003-of-00051.safetensors",
220
+ "model.layers.3.self_attn.k_proj.weight": "model-00002-of-00051.safetensors",
221
+ "model.layers.3.self_attn.o_proj.weight": "model-00002-of-00051.safetensors",
222
+ "model.layers.3.self_attn.q_proj.weight": "model-00002-of-00051.safetensors",
223
+ "model.layers.3.self_attn.v_proj.weight": "model-00002-of-00051.safetensors",
224
+ "model.layers.30.input_layernorm.weight": "model-00018-of-00051.safetensors",
225
+ "model.layers.30.mlp.down_proj.weight": "model-00018-of-00051.safetensors",
226
+ "model.layers.30.mlp.gate_proj.weight": "model-00018-of-00051.safetensors",
227
+ "model.layers.30.mlp.up_proj.weight": "model-00018-of-00051.safetensors",
228
+ "model.layers.30.post_attention_layernorm.weight": "model-00018-of-00051.safetensors",
229
+ "model.layers.30.self_attn.k_proj.weight": "model-00018-of-00051.safetensors",
230
+ "model.layers.30.self_attn.o_proj.weight": "model-00018-of-00051.safetensors",
231
+ "model.layers.30.self_attn.q_proj.weight": "model-00018-of-00051.safetensors",
232
+ "model.layers.30.self_attn.v_proj.weight": "model-00018-of-00051.safetensors",
233
+ "model.layers.31.input_layernorm.weight": "model-00019-of-00051.safetensors",
234
+ "model.layers.31.mlp.down_proj.weight": "model-00019-of-00051.safetensors",
235
+ "model.layers.31.mlp.gate_proj.weight": "model-00019-of-00051.safetensors",
236
+ "model.layers.31.mlp.up_proj.weight": "model-00019-of-00051.safetensors",
237
+ "model.layers.31.post_attention_layernorm.weight": "model-00019-of-00051.safetensors",
238
+ "model.layers.31.self_attn.k_proj.weight": "model-00018-of-00051.safetensors",
239
+ "model.layers.31.self_attn.o_proj.weight": "model-00018-of-00051.safetensors",
240
+ "model.layers.31.self_attn.q_proj.weight": "model-00018-of-00051.safetensors",
241
+ "model.layers.31.self_attn.v_proj.weight": "model-00018-of-00051.safetensors",
242
+ "model.layers.32.input_layernorm.weight": "model-00019-of-00051.safetensors",
243
+ "model.layers.32.mlp.down_proj.weight": "model-00019-of-00051.safetensors",
244
+ "model.layers.32.mlp.gate_proj.weight": "model-00019-of-00051.safetensors",
245
+ "model.layers.32.mlp.up_proj.weight": "model-00019-of-00051.safetensors",
246
+ "model.layers.32.post_attention_layernorm.weight": "model-00019-of-00051.safetensors",
247
+ "model.layers.32.self_attn.k_proj.weight": "model-00019-of-00051.safetensors",
248
+ "model.layers.32.self_attn.o_proj.weight": "model-00019-of-00051.safetensors",
249
+ "model.layers.32.self_attn.q_proj.weight": "model-00019-of-00051.safetensors",
250
+ "model.layers.32.self_attn.v_proj.weight": "model-00019-of-00051.safetensors",
251
+ "model.layers.33.input_layernorm.weight": "model-00020-of-00051.safetensors",
252
+ "model.layers.33.mlp.down_proj.weight": "model-00020-of-00051.safetensors",
253
+ "model.layers.33.mlp.gate_proj.weight": "model-00020-of-00051.safetensors",
254
+ "model.layers.33.mlp.up_proj.weight": "model-00020-of-00051.safetensors",
255
+ "model.layers.33.post_attention_layernorm.weight": "model-00020-of-00051.safetensors",
256
+ "model.layers.33.self_attn.k_proj.weight": "model-00020-of-00051.safetensors",
257
+ "model.layers.33.self_attn.o_proj.weight": "model-00020-of-00051.safetensors",
258
+ "model.layers.33.self_attn.q_proj.weight": "model-00020-of-00051.safetensors",
259
+ "model.layers.33.self_attn.v_proj.weight": "model-00020-of-00051.safetensors",
260
+ "model.layers.34.input_layernorm.weight": "model-00021-of-00051.safetensors",
261
+ "model.layers.34.mlp.down_proj.weight": "model-00021-of-00051.safetensors",
262
+ "model.layers.34.mlp.gate_proj.weight": "model-00020-of-00051.safetensors",
263
+ "model.layers.34.mlp.up_proj.weight": "model-00020-of-00051.safetensors",
264
+ "model.layers.34.post_attention_layernorm.weight": "model-00021-of-00051.safetensors",
265
+ "model.layers.34.self_attn.k_proj.weight": "model-00020-of-00051.safetensors",
266
+ "model.layers.34.self_attn.o_proj.weight": "model-00020-of-00051.safetensors",
267
+ "model.layers.34.self_attn.q_proj.weight": "model-00020-of-00051.safetensors",
268
+ "model.layers.34.self_attn.v_proj.weight": "model-00020-of-00051.safetensors",
269
+ "model.layers.35.input_layernorm.weight": "model-00021-of-00051.safetensors",
270
+ "model.layers.35.mlp.down_proj.weight": "model-00021-of-00051.safetensors",
271
+ "model.layers.35.mlp.gate_proj.weight": "model-00021-of-00051.safetensors",
272
+ "model.layers.35.mlp.up_proj.weight": "model-00021-of-00051.safetensors",
273
+ "model.layers.35.post_attention_layernorm.weight": "model-00021-of-00051.safetensors",
274
+ "model.layers.35.self_attn.k_proj.weight": "model-00021-of-00051.safetensors",
275
+ "model.layers.35.self_attn.o_proj.weight": "model-00021-of-00051.safetensors",
276
+ "model.layers.35.self_attn.q_proj.weight": "model-00021-of-00051.safetensors",
277
+ "model.layers.35.self_attn.v_proj.weight": "model-00021-of-00051.safetensors",
278
+ "model.layers.36.input_layernorm.weight": "model-00022-of-00051.safetensors",
279
+ "model.layers.36.mlp.down_proj.weight": "model-00022-of-00051.safetensors",
280
+ "model.layers.36.mlp.gate_proj.weight": "model-00021-of-00051.safetensors",
281
+ "model.layers.36.mlp.up_proj.weight": "model-00022-of-00051.safetensors",
282
+ "model.layers.36.post_attention_layernorm.weight": "model-00022-of-00051.safetensors",
283
+ "model.layers.36.self_attn.k_proj.weight": "model-00021-of-00051.safetensors",
284
+ "model.layers.36.self_attn.o_proj.weight": "model-00021-of-00051.safetensors",
285
+ "model.layers.36.self_attn.q_proj.weight": "model-00021-of-00051.safetensors",
286
+ "model.layers.36.self_attn.v_proj.weight": "model-00021-of-00051.safetensors",
287
+ "model.layers.37.input_layernorm.weight": "model-00022-of-00051.safetensors",
288
+ "model.layers.37.mlp.down_proj.weight": "model-00022-of-00051.safetensors",
289
+ "model.layers.37.mlp.gate_proj.weight": "model-00022-of-00051.safetensors",
290
+ "model.layers.37.mlp.up_proj.weight": "model-00022-of-00051.safetensors",
291
+ "model.layers.37.post_attention_layernorm.weight": "model-00022-of-00051.safetensors",
292
+ "model.layers.37.self_attn.k_proj.weight": "model-00022-of-00051.safetensors",
293
+ "model.layers.37.self_attn.o_proj.weight": "model-00022-of-00051.safetensors",
294
+ "model.layers.37.self_attn.q_proj.weight": "model-00022-of-00051.safetensors",
295
+ "model.layers.37.self_attn.v_proj.weight": "model-00022-of-00051.safetensors",
296
+ "model.layers.38.input_layernorm.weight": "model-00023-of-00051.safetensors",
297
+ "model.layers.38.mlp.down_proj.weight": "model-00023-of-00051.safetensors",
298
+ "model.layers.38.mlp.gate_proj.weight": "model-00023-of-00051.safetensors",
299
+ "model.layers.38.mlp.up_proj.weight": "model-00023-of-00051.safetensors",
300
+ "model.layers.38.post_attention_layernorm.weight": "model-00023-of-00051.safetensors",
301
+ "model.layers.38.self_attn.k_proj.weight": "model-00022-of-00051.safetensors",
302
+ "model.layers.38.self_attn.o_proj.weight": "model-00022-of-00051.safetensors",
303
+ "model.layers.38.self_attn.q_proj.weight": "model-00022-of-00051.safetensors",
304
+ "model.layers.38.self_attn.v_proj.weight": "model-00022-of-00051.safetensors",
305
+ "model.layers.39.input_layernorm.weight": "model-00023-of-00051.safetensors",
306
+ "model.layers.39.mlp.down_proj.weight": "model-00023-of-00051.safetensors",
307
+ "model.layers.39.mlp.gate_proj.weight": "model-00023-of-00051.safetensors",
308
+ "model.layers.39.mlp.up_proj.weight": "model-00023-of-00051.safetensors",
309
+ "model.layers.39.post_attention_layernorm.weight": "model-00023-of-00051.safetensors",
310
+ "model.layers.39.self_attn.k_proj.weight": "model-00023-of-00051.safetensors",
311
+ "model.layers.39.self_attn.o_proj.weight": "model-00023-of-00051.safetensors",
312
+ "model.layers.39.self_attn.q_proj.weight": "model-00023-of-00051.safetensors",
313
+ "model.layers.39.self_attn.v_proj.weight": "model-00023-of-00051.safetensors",
314
+ "model.layers.4.input_layernorm.weight": "model-00003-of-00051.safetensors",
315
+ "model.layers.4.mlp.down_proj.weight": "model-00003-of-00051.safetensors",
316
+ "model.layers.4.mlp.gate_proj.weight": "model-00003-of-00051.safetensors",
317
+ "model.layers.4.mlp.up_proj.weight": "model-00003-of-00051.safetensors",
318
+ "model.layers.4.post_attention_layernorm.weight": "model-00003-of-00051.safetensors",
319
+ "model.layers.4.self_attn.k_proj.weight": "model-00003-of-00051.safetensors",
320
+ "model.layers.4.self_attn.o_proj.weight": "model-00003-of-00051.safetensors",
321
+ "model.layers.4.self_attn.q_proj.weight": "model-00003-of-00051.safetensors",
322
+ "model.layers.4.self_attn.v_proj.weight": "model-00003-of-00051.safetensors",
323
+ "model.layers.40.input_layernorm.weight": "model-00024-of-00051.safetensors",
324
+ "model.layers.40.mlp.down_proj.weight": "model-00024-of-00051.safetensors",
325
+ "model.layers.40.mlp.gate_proj.weight": "model-00024-of-00051.safetensors",
326
+ "model.layers.40.mlp.up_proj.weight": "model-00024-of-00051.safetensors",
327
+ "model.layers.40.post_attention_layernorm.weight": "model-00024-of-00051.safetensors",
328
+ "model.layers.40.self_attn.k_proj.weight": "model-00024-of-00051.safetensors",
329
+ "model.layers.40.self_attn.o_proj.weight": "model-00024-of-00051.safetensors",
330
+ "model.layers.40.self_attn.q_proj.weight": "model-00024-of-00051.safetensors",
331
+ "model.layers.40.self_attn.v_proj.weight": "model-00024-of-00051.safetensors",
332
+ "model.layers.41.input_layernorm.weight": "model-00025-of-00051.safetensors",
333
+ "model.layers.41.mlp.down_proj.weight": "model-00025-of-00051.safetensors",
334
+ "model.layers.41.mlp.gate_proj.weight": "model-00024-of-00051.safetensors",
335
+ "model.layers.41.mlp.up_proj.weight": "model-00024-of-00051.safetensors",
336
+ "model.layers.41.post_attention_layernorm.weight": "model-00025-of-00051.safetensors",
337
+ "model.layers.41.self_attn.k_proj.weight": "model-00024-of-00051.safetensors",
338
+ "model.layers.41.self_attn.o_proj.weight": "model-00024-of-00051.safetensors",
339
+ "model.layers.41.self_attn.q_proj.weight": "model-00024-of-00051.safetensors",
340
+ "model.layers.41.self_attn.v_proj.weight": "model-00024-of-00051.safetensors",
341
+ "model.layers.42.input_layernorm.weight": "model-00025-of-00051.safetensors",
342
+ "model.layers.42.mlp.down_proj.weight": "model-00025-of-00051.safetensors",
343
+ "model.layers.42.mlp.gate_proj.weight": "model-00025-of-00051.safetensors",
344
+ "model.layers.42.mlp.up_proj.weight": "model-00025-of-00051.safetensors",
345
+ "model.layers.42.post_attention_layernorm.weight": "model-00025-of-00051.safetensors",
346
+ "model.layers.42.self_attn.k_proj.weight": "model-00025-of-00051.safetensors",
347
+ "model.layers.42.self_attn.o_proj.weight": "model-00025-of-00051.safetensors",
348
+ "model.layers.42.self_attn.q_proj.weight": "model-00025-of-00051.safetensors",
349
+ "model.layers.42.self_attn.v_proj.weight": "model-00025-of-00051.safetensors",
350
+ "model.layers.43.input_layernorm.weight": "model-00026-of-00051.safetensors",
351
+ "model.layers.43.mlp.down_proj.weight": "model-00026-of-00051.safetensors",
352
+ "model.layers.43.mlp.gate_proj.weight": "model-00025-of-00051.safetensors",
353
+ "model.layers.43.mlp.up_proj.weight": "model-00026-of-00051.safetensors",
354
+ "model.layers.43.post_attention_layernorm.weight": "model-00026-of-00051.safetensors",
355
+ "model.layers.43.self_attn.k_proj.weight": "model-00025-of-00051.safetensors",
356
+ "model.layers.43.self_attn.o_proj.weight": "model-00025-of-00051.safetensors",
357
+ "model.layers.43.self_attn.q_proj.weight": "model-00025-of-00051.safetensors",
358
+ "model.layers.43.self_attn.v_proj.weight": "model-00025-of-00051.safetensors",
359
+ "model.layers.44.input_layernorm.weight": "model-00026-of-00051.safetensors",
360
+ "model.layers.44.mlp.down_proj.weight": "model-00026-of-00051.safetensors",
361
+ "model.layers.44.mlp.gate_proj.weight": "model-00026-of-00051.safetensors",
362
+ "model.layers.44.mlp.up_proj.weight": "model-00026-of-00051.safetensors",
363
+ "model.layers.44.post_attention_layernorm.weight": "model-00026-of-00051.safetensors",
364
+ "model.layers.44.self_attn.k_proj.weight": "model-00026-of-00051.safetensors",
365
+ "model.layers.44.self_attn.o_proj.weight": "model-00026-of-00051.safetensors",
366
+ "model.layers.44.self_attn.q_proj.weight": "model-00026-of-00051.safetensors",
367
+ "model.layers.44.self_attn.v_proj.weight": "model-00026-of-00051.safetensors",
368
+ "model.layers.45.input_layernorm.weight": "model-00027-of-00051.safetensors",
369
+ "model.layers.45.mlp.down_proj.weight": "model-00027-of-00051.safetensors",
370
+ "model.layers.45.mlp.gate_proj.weight": "model-00027-of-00051.safetensors",
371
+ "model.layers.45.mlp.up_proj.weight": "model-00027-of-00051.safetensors",
372
+ "model.layers.45.post_attention_layernorm.weight": "model-00027-of-00051.safetensors",
373
+ "model.layers.45.self_attn.k_proj.weight": "model-00026-of-00051.safetensors",
374
+ "model.layers.45.self_attn.o_proj.weight": "model-00026-of-00051.safetensors",
375
+ "model.layers.45.self_attn.q_proj.weight": "model-00026-of-00051.safetensors",
376
+ "model.layers.45.self_attn.v_proj.weight": "model-00026-of-00051.safetensors",
377
+ "model.layers.46.input_layernorm.weight": "model-00027-of-00051.safetensors",
378
+ "model.layers.46.mlp.down_proj.weight": "model-00027-of-00051.safetensors",
379
+ "model.layers.46.mlp.gate_proj.weight": "model-00027-of-00051.safetensors",
380
+ "model.layers.46.mlp.up_proj.weight": "model-00027-of-00051.safetensors",
381
+ "model.layers.46.post_attention_layernorm.weight": "model-00027-of-00051.safetensors",
382
+ "model.layers.46.self_attn.k_proj.weight": "model-00027-of-00051.safetensors",
383
+ "model.layers.46.self_attn.o_proj.weight": "model-00027-of-00051.safetensors",
384
+ "model.layers.46.self_attn.q_proj.weight": "model-00027-of-00051.safetensors",
385
+ "model.layers.46.self_attn.v_proj.weight": "model-00027-of-00051.safetensors",
386
+ "model.layers.47.input_layernorm.weight": "model-00028-of-00051.safetensors",
387
+ "model.layers.47.mlp.down_proj.weight": "model-00028-of-00051.safetensors",
388
+ "model.layers.47.mlp.gate_proj.weight": "model-00028-of-00051.safetensors",
389
+ "model.layers.47.mlp.up_proj.weight": "model-00028-of-00051.safetensors",
390
+ "model.layers.47.post_attention_layernorm.weight": "model-00028-of-00051.safetensors",
391
+ "model.layers.47.self_attn.k_proj.weight": "model-00028-of-00051.safetensors",
392
+ "model.layers.47.self_attn.o_proj.weight": "model-00028-of-00051.safetensors",
393
+ "model.layers.47.self_attn.q_proj.weight": "model-00028-of-00051.safetensors",
394
+ "model.layers.47.self_attn.v_proj.weight": "model-00028-of-00051.safetensors",
395
+ "model.layers.48.input_layernorm.weight": "model-00029-of-00051.safetensors",
396
+ "model.layers.48.mlp.down_proj.weight": "model-00029-of-00051.safetensors",
397
+ "model.layers.48.mlp.gate_proj.weight": "model-00028-of-00051.safetensors",
398
+ "model.layers.48.mlp.up_proj.weight": "model-00028-of-00051.safetensors",
399
+ "model.layers.48.post_attention_layernorm.weight": "model-00029-of-00051.safetensors",
400
+ "model.layers.48.self_attn.k_proj.weight": "model-00028-of-00051.safetensors",
401
+ "model.layers.48.self_attn.o_proj.weight": "model-00028-of-00051.safetensors",
402
+ "model.layers.48.self_attn.q_proj.weight": "model-00028-of-00051.safetensors",
403
+ "model.layers.48.self_attn.v_proj.weight": "model-00028-of-00051.safetensors",
404
+ "model.layers.49.input_layernorm.weight": "model-00029-of-00051.safetensors",
405
+ "model.layers.49.mlp.down_proj.weight": "model-00029-of-00051.safetensors",
406
+ "model.layers.49.mlp.gate_proj.weight": "model-00029-of-00051.safetensors",
407
+ "model.layers.49.mlp.up_proj.weight": "model-00029-of-00051.safetensors",
408
+ "model.layers.49.post_attention_layernorm.weight": "model-00029-of-00051.safetensors",
409
+ "model.layers.49.self_attn.k_proj.weight": "model-00029-of-00051.safetensors",
410
+ "model.layers.49.self_attn.o_proj.weight": "model-00029-of-00051.safetensors",
411
+ "model.layers.49.self_attn.q_proj.weight": "model-00029-of-00051.safetensors",
412
+ "model.layers.49.self_attn.v_proj.weight": "model-00029-of-00051.safetensors",
413
+ "model.layers.5.input_layernorm.weight": "model-00004-of-00051.safetensors",
414
+ "model.layers.5.mlp.down_proj.weight": "model-00004-of-00051.safetensors",
415
+ "model.layers.5.mlp.gate_proj.weight": "model-00004-of-00051.safetensors",
416
+ "model.layers.5.mlp.up_proj.weight": "model-00004-of-00051.safetensors",
417
+ "model.layers.5.post_attention_layernorm.weight": "model-00004-of-00051.safetensors",
418
+ "model.layers.5.self_attn.k_proj.weight": "model-00004-of-00051.safetensors",
419
+ "model.layers.5.self_attn.o_proj.weight": "model-00004-of-00051.safetensors",
420
+ "model.layers.5.self_attn.q_proj.weight": "model-00004-of-00051.safetensors",
421
+ "model.layers.5.self_attn.v_proj.weight": "model-00004-of-00051.safetensors",
422
+ "model.layers.50.input_layernorm.weight": "model-00030-of-00051.safetensors",
423
+ "model.layers.50.mlp.down_proj.weight": "model-00030-of-00051.safetensors",
424
+ "model.layers.50.mlp.gate_proj.weight": "model-00029-of-00051.safetensors",
425
+ "model.layers.50.mlp.up_proj.weight": "model-00030-of-00051.safetensors",
426
+ "model.layers.50.post_attention_layernorm.weight": "model-00030-of-00051.safetensors",
427
+ "model.layers.50.self_attn.k_proj.weight": "model-00029-of-00051.safetensors",
428
+ "model.layers.50.self_attn.o_proj.weight": "model-00029-of-00051.safetensors",
429
+ "model.layers.50.self_attn.q_proj.weight": "model-00029-of-00051.safetensors",
430
+ "model.layers.50.self_attn.v_proj.weight": "model-00029-of-00051.safetensors",
431
+ "model.layers.51.input_layernorm.weight": "model-00030-of-00051.safetensors",
432
+ "model.layers.51.mlp.down_proj.weight": "model-00030-of-00051.safetensors",
433
+ "model.layers.51.mlp.gate_proj.weight": "model-00030-of-00051.safetensors",
434
+ "model.layers.51.mlp.up_proj.weight": "model-00030-of-00051.safetensors",
435
+ "model.layers.51.post_attention_layernorm.weight": "model-00030-of-00051.safetensors",
436
+ "model.layers.51.self_attn.k_proj.weight": "model-00030-of-00051.safetensors",
437
+ "model.layers.51.self_attn.o_proj.weight": "model-00030-of-00051.safetensors",
438
+ "model.layers.51.self_attn.q_proj.weight": "model-00030-of-00051.safetensors",
439
+ "model.layers.51.self_attn.v_proj.weight": "model-00030-of-00051.safetensors",
440
+ "model.layers.52.input_layernorm.weight": "model-00031-of-00051.safetensors",
441
+ "model.layers.52.mlp.down_proj.weight": "model-00031-of-00051.safetensors",
442
+ "model.layers.52.mlp.gate_proj.weight": "model-00031-of-00051.safetensors",
443
+ "model.layers.52.mlp.up_proj.weight": "model-00031-of-00051.safetensors",
444
+ "model.layers.52.post_attention_layernorm.weight": "model-00031-of-00051.safetensors",
445
+ "model.layers.52.self_attn.k_proj.weight": "model-00030-of-00051.safetensors",
446
+ "model.layers.52.self_attn.o_proj.weight": "model-00030-of-00051.safetensors",
447
+ "model.layers.52.self_attn.q_proj.weight": "model-00030-of-00051.safetensors",
448
+ "model.layers.52.self_attn.v_proj.weight": "model-00030-of-00051.safetensors",
449
+ "model.layers.53.input_layernorm.weight": "model-00031-of-00051.safetensors",
450
+ "model.layers.53.mlp.down_proj.weight": "model-00031-of-00051.safetensors",
451
+ "model.layers.53.mlp.gate_proj.weight": "model-00031-of-00051.safetensors",
452
+ "model.layers.53.mlp.up_proj.weight": "model-00031-of-00051.safetensors",
453
+ "model.layers.53.post_attention_layernorm.weight": "model-00031-of-00051.safetensors",
454
+ "model.layers.53.self_attn.k_proj.weight": "model-00031-of-00051.safetensors",
455
+ "model.layers.53.self_attn.o_proj.weight": "model-00031-of-00051.safetensors",
456
+ "model.layers.53.self_attn.q_proj.weight": "model-00031-of-00051.safetensors",
457
+ "model.layers.53.self_attn.v_proj.weight": "model-00031-of-00051.safetensors",
458
+ "model.layers.54.input_layernorm.weight": "model-00032-of-00051.safetensors",
459
+ "model.layers.54.mlp.down_proj.weight": "model-00032-of-00051.safetensors",
460
+ "model.layers.54.mlp.gate_proj.weight": "model-00032-of-00051.safetensors",
461
+ "model.layers.54.mlp.up_proj.weight": "model-00032-of-00051.safetensors",
462
+ "model.layers.54.post_attention_layernorm.weight": "model-00032-of-00051.safetensors",
463
+ "model.layers.54.self_attn.k_proj.weight": "model-00032-of-00051.safetensors",
464
+ "model.layers.54.self_attn.o_proj.weight": "model-00032-of-00051.safetensors",
465
+ "model.layers.54.self_attn.q_proj.weight": "model-00032-of-00051.safetensors",
466
+ "model.layers.54.self_attn.v_proj.weight": "model-00032-of-00051.safetensors",
467
+ "model.layers.55.input_layernorm.weight": "model-00033-of-00051.safetensors",
468
+ "model.layers.55.mlp.down_proj.weight": "model-00033-of-00051.safetensors",
469
+ "model.layers.55.mlp.gate_proj.weight": "model-00032-of-00051.safetensors",
470
+ "model.layers.55.mlp.up_proj.weight": "model-00032-of-00051.safetensors",
471
+ "model.layers.55.post_attention_layernorm.weight": "model-00033-of-00051.safetensors",
472
+ "model.layers.55.self_attn.k_proj.weight": "model-00032-of-00051.safetensors",
473
+ "model.layers.55.self_attn.o_proj.weight": "model-00032-of-00051.safetensors",
474
+ "model.layers.55.self_attn.q_proj.weight": "model-00032-of-00051.safetensors",
475
+ "model.layers.55.self_attn.v_proj.weight": "model-00032-of-00051.safetensors",
476
+ "model.layers.56.input_layernorm.weight": "model-00033-of-00051.safetensors",
477
+ "model.layers.56.mlp.down_proj.weight": "model-00033-of-00051.safetensors",
478
+ "model.layers.56.mlp.gate_proj.weight": "model-00033-of-00051.safetensors",
479
+ "model.layers.56.mlp.up_proj.weight": "model-00033-of-00051.safetensors",
480
+ "model.layers.56.post_attention_layernorm.weight": "model-00033-of-00051.safetensors",
481
+ "model.layers.56.self_attn.k_proj.weight": "model-00033-of-00051.safetensors",
482
+ "model.layers.56.self_attn.o_proj.weight": "model-00033-of-00051.safetensors",
483
+ "model.layers.56.self_attn.q_proj.weight": "model-00033-of-00051.safetensors",
484
+ "model.layers.56.self_attn.v_proj.weight": "model-00033-of-00051.safetensors",
485
+ "model.layers.57.input_layernorm.weight": "model-00034-of-00051.safetensors",
486
+ "model.layers.57.mlp.down_proj.weight": "model-00034-of-00051.safetensors",
487
+ "model.layers.57.mlp.gate_proj.weight": "model-00033-of-00051.safetensors",
488
+ "model.layers.57.mlp.up_proj.weight": "model-00034-of-00051.safetensors",
489
+ "model.layers.57.post_attention_layernorm.weight": "model-00034-of-00051.safetensors",
490
+ "model.layers.57.self_attn.k_proj.weight": "model-00033-of-00051.safetensors",
491
+ "model.layers.57.self_attn.o_proj.weight": "model-00033-of-00051.safetensors",
492
+ "model.layers.57.self_attn.q_proj.weight": "model-00033-of-00051.safetensors",
493
+ "model.layers.57.self_attn.v_proj.weight": "model-00033-of-00051.safetensors",
494
+ "model.layers.58.input_layernorm.weight": "model-00034-of-00051.safetensors",
495
+ "model.layers.58.mlp.down_proj.weight": "model-00034-of-00051.safetensors",
496
+ "model.layers.58.mlp.gate_proj.weight": "model-00034-of-00051.safetensors",
497
+ "model.layers.58.mlp.up_proj.weight": "model-00034-of-00051.safetensors",
498
+ "model.layers.58.post_attention_layernorm.weight": "model-00034-of-00051.safetensors",
499
+ "model.layers.58.self_attn.k_proj.weight": "model-00034-of-00051.safetensors",
500
+ "model.layers.58.self_attn.o_proj.weight": "model-00034-of-00051.safetensors",
501
+ "model.layers.58.self_attn.q_proj.weight": "model-00034-of-00051.safetensors",
502
+ "model.layers.58.self_attn.v_proj.weight": "model-00034-of-00051.safetensors",
503
+ "model.layers.59.input_layernorm.weight": "model-00035-of-00051.safetensors",
504
+ "model.layers.59.mlp.down_proj.weight": "model-00035-of-00051.safetensors",
505
+ "model.layers.59.mlp.gate_proj.weight": "model-00035-of-00051.safetensors",
506
+ "model.layers.59.mlp.up_proj.weight": "model-00035-of-00051.safetensors",
507
+ "model.layers.59.post_attention_layernorm.weight": "model-00035-of-00051.safetensors",
508
+ "model.layers.59.self_attn.k_proj.weight": "model-00034-of-00051.safetensors",
509
+ "model.layers.59.self_attn.o_proj.weight": "model-00034-of-00051.safetensors",
510
+ "model.layers.59.self_attn.q_proj.weight": "model-00034-of-00051.safetensors",
511
+ "model.layers.59.self_attn.v_proj.weight": "model-00034-of-00051.safetensors",
512
+ "model.layers.6.input_layernorm.weight": "model-00005-of-00051.safetensors",
513
+ "model.layers.6.mlp.down_proj.weight": "model-00005-of-00051.safetensors",
514
+ "model.layers.6.mlp.gate_proj.weight": "model-00004-of-00051.safetensors",
515
+ "model.layers.6.mlp.up_proj.weight": "model-00004-of-00051.safetensors",
516
+ "model.layers.6.post_attention_layernorm.weight": "model-00005-of-00051.safetensors",
517
+ "model.layers.6.self_attn.k_proj.weight": "model-00004-of-00051.safetensors",
518
+ "model.layers.6.self_attn.o_proj.weight": "model-00004-of-00051.safetensors",
519
+ "model.layers.6.self_attn.q_proj.weight": "model-00004-of-00051.safetensors",
520
+ "model.layers.6.self_attn.v_proj.weight": "model-00004-of-00051.safetensors",
521
+ "model.layers.60.input_layernorm.weight": "model-00035-of-00051.safetensors",
522
+ "model.layers.60.mlp.down_proj.weight": "model-00035-of-00051.safetensors",
523
+ "model.layers.60.mlp.gate_proj.weight": "model-00035-of-00051.safetensors",
524
+ "model.layers.60.mlp.up_proj.weight": "model-00035-of-00051.safetensors",
525
+ "model.layers.60.post_attention_layernorm.weight": "model-00035-of-00051.safetensors",
526
+ "model.layers.60.self_attn.k_proj.weight": "model-00035-of-00051.safetensors",
527
+ "model.layers.60.self_attn.o_proj.weight": "model-00035-of-00051.safetensors",
528
+ "model.layers.60.self_attn.q_proj.weight": "model-00035-of-00051.safetensors",
529
+ "model.layers.60.self_attn.v_proj.weight": "model-00035-of-00051.safetensors",
530
+ "model.layers.61.input_layernorm.weight": "model-00036-of-00051.safetensors",
531
+ "model.layers.61.mlp.down_proj.weight": "model-00036-of-00051.safetensors",
532
+ "model.layers.61.mlp.gate_proj.weight": "model-00036-of-00051.safetensors",
533
+ "model.layers.61.mlp.up_proj.weight": "model-00036-of-00051.safetensors",
534
+ "model.layers.61.post_attention_layernorm.weight": "model-00036-of-00051.safetensors",
535
+ "model.layers.61.self_attn.k_proj.weight": "model-00036-of-00051.safetensors",
536
+ "model.layers.61.self_attn.o_proj.weight": "model-00036-of-00051.safetensors",
537
+ "model.layers.61.self_attn.q_proj.weight": "model-00036-of-00051.safetensors",
538
+ "model.layers.61.self_attn.v_proj.weight": "model-00036-of-00051.safetensors",
539
+ "model.layers.62.input_layernorm.weight": "model-00037-of-00051.safetensors",
540
+ "model.layers.62.mlp.down_proj.weight": "model-00037-of-00051.safetensors",
541
+ "model.layers.62.mlp.gate_proj.weight": "model-00036-of-00051.safetensors",
542
+ "model.layers.62.mlp.up_proj.weight": "model-00036-of-00051.safetensors",
543
+ "model.layers.62.post_attention_layernorm.weight": "model-00037-of-00051.safetensors",
544
+ "model.layers.62.self_attn.k_proj.weight": "model-00036-of-00051.safetensors",
545
+ "model.layers.62.self_attn.o_proj.weight": "model-00036-of-00051.safetensors",
546
+ "model.layers.62.self_attn.q_proj.weight": "model-00036-of-00051.safetensors",
547
+ "model.layers.62.self_attn.v_proj.weight": "model-00036-of-00051.safetensors",
548
+ "model.layers.63.input_layernorm.weight": "model-00037-of-00051.safetensors",
549
+ "model.layers.63.mlp.down_proj.weight": "model-00037-of-00051.safetensors",
550
+ "model.layers.63.mlp.gate_proj.weight": "model-00037-of-00051.safetensors",
551
+ "model.layers.63.mlp.up_proj.weight": "model-00037-of-00051.safetensors",
552
+ "model.layers.63.post_attention_layernorm.weight": "model-00037-of-00051.safetensors",
553
+ "model.layers.63.self_attn.k_proj.weight": "model-00037-of-00051.safetensors",
554
+ "model.layers.63.self_attn.o_proj.weight": "model-00037-of-00051.safetensors",
555
+ "model.layers.63.self_attn.q_proj.weight": "model-00037-of-00051.safetensors",
556
+ "model.layers.63.self_attn.v_proj.weight": "model-00037-of-00051.safetensors",
557
+ "model.layers.64.input_layernorm.weight": "model-00038-of-00051.safetensors",
558
+ "model.layers.64.mlp.down_proj.weight": "model-00038-of-00051.safetensors",
559
+ "model.layers.64.mlp.gate_proj.weight": "model-00037-of-00051.safetensors",
560
+ "model.layers.64.mlp.up_proj.weight": "model-00038-of-00051.safetensors",
561
+ "model.layers.64.post_attention_layernorm.weight": "model-00038-of-00051.safetensors",
562
+ "model.layers.64.self_attn.k_proj.weight": "model-00037-of-00051.safetensors",
563
+ "model.layers.64.self_attn.o_proj.weight": "model-00037-of-00051.safetensors",
564
+ "model.layers.64.self_attn.q_proj.weight": "model-00037-of-00051.safetensors",
565
+ "model.layers.64.self_attn.v_proj.weight": "model-00037-of-00051.safetensors",
566
+ "model.layers.65.input_layernorm.weight": "model-00038-of-00051.safetensors",
567
+ "model.layers.65.mlp.down_proj.weight": "model-00038-of-00051.safetensors",
568
+ "model.layers.65.mlp.gate_proj.weight": "model-00038-of-00051.safetensors",
569
+ "model.layers.65.mlp.up_proj.weight": "model-00038-of-00051.safetensors",
570
+ "model.layers.65.post_attention_layernorm.weight": "model-00038-of-00051.safetensors",
571
+ "model.layers.65.self_attn.k_proj.weight": "model-00038-of-00051.safetensors",
572
+ "model.layers.65.self_attn.o_proj.weight": "model-00038-of-00051.safetensors",
573
+ "model.layers.65.self_attn.q_proj.weight": "model-00038-of-00051.safetensors",
574
+ "model.layers.65.self_attn.v_proj.weight": "model-00038-of-00051.safetensors",
575
+ "model.layers.66.input_layernorm.weight": "model-00039-of-00051.safetensors",
576
+ "model.layers.66.mlp.down_proj.weight": "model-00039-of-00051.safetensors",
577
+ "model.layers.66.mlp.gate_proj.weight": "model-00039-of-00051.safetensors",
578
+ "model.layers.66.mlp.up_proj.weight": "model-00039-of-00051.safetensors",
579
+ "model.layers.66.post_attention_layernorm.weight": "model-00039-of-00051.safetensors",
580
+ "model.layers.66.self_attn.k_proj.weight": "model-00038-of-00051.safetensors",
581
+ "model.layers.66.self_attn.o_proj.weight": "model-00038-of-00051.safetensors",
582
+ "model.layers.66.self_attn.q_proj.weight": "model-00038-of-00051.safetensors",
583
+ "model.layers.66.self_attn.v_proj.weight": "model-00038-of-00051.safetensors",
584
+ "model.layers.67.input_layernorm.weight": "model-00039-of-00051.safetensors",
585
+ "model.layers.67.mlp.down_proj.weight": "model-00039-of-00051.safetensors",
586
+ "model.layers.67.mlp.gate_proj.weight": "model-00039-of-00051.safetensors",
587
+ "model.layers.67.mlp.up_proj.weight": "model-00039-of-00051.safetensors",
588
+ "model.layers.67.post_attention_layernorm.weight": "model-00039-of-00051.safetensors",
589
+ "model.layers.67.self_attn.k_proj.weight": "model-00039-of-00051.safetensors",
590
+ "model.layers.67.self_attn.o_proj.weight": "model-00039-of-00051.safetensors",
591
+ "model.layers.67.self_attn.q_proj.weight": "model-00039-of-00051.safetensors",
592
+ "model.layers.67.self_attn.v_proj.weight": "model-00039-of-00051.safetensors",
593
+ "model.layers.68.input_layernorm.weight": "model-00040-of-00051.safetensors",
594
+ "model.layers.68.mlp.down_proj.weight": "model-00040-of-00051.safetensors",
595
+ "model.layers.68.mlp.gate_proj.weight": "model-00040-of-00051.safetensors",
596
+ "model.layers.68.mlp.up_proj.weight": "model-00040-of-00051.safetensors",
597
+ "model.layers.68.post_attention_layernorm.weight": "model-00040-of-00051.safetensors",
598
+ "model.layers.68.self_attn.k_proj.weight": "model-00040-of-00051.safetensors",
599
+ "model.layers.68.self_attn.o_proj.weight": "model-00040-of-00051.safetensors",
600
+ "model.layers.68.self_attn.q_proj.weight": "model-00040-of-00051.safetensors",
601
+ "model.layers.68.self_attn.v_proj.weight": "model-00040-of-00051.safetensors",
602
+ "model.layers.69.input_layernorm.weight": "model-00041-of-00051.safetensors",
603
+ "model.layers.69.mlp.down_proj.weight": "model-00041-of-00051.safetensors",
604
+ "model.layers.69.mlp.gate_proj.weight": "model-00040-of-00051.safetensors",
605
+ "model.layers.69.mlp.up_proj.weight": "model-00040-of-00051.safetensors",
606
+ "model.layers.69.post_attention_layernorm.weight": "model-00041-of-00051.safetensors",
607
+ "model.layers.69.self_attn.k_proj.weight": "model-00040-of-00051.safetensors",
608
+ "model.layers.69.self_attn.o_proj.weight": "model-00040-of-00051.safetensors",
609
+ "model.layers.69.self_attn.q_proj.weight": "model-00040-of-00051.safetensors",
610
+ "model.layers.69.self_attn.v_proj.weight": "model-00040-of-00051.safetensors",
611
+ "model.layers.7.input_layernorm.weight": "model-00005-of-00051.safetensors",
612
+ "model.layers.7.mlp.down_proj.weight": "model-00005-of-00051.safetensors",
613
+ "model.layers.7.mlp.gate_proj.weight": "model-00005-of-00051.safetensors",
614
+ "model.layers.7.mlp.up_proj.weight": "model-00005-of-00051.safetensors",
615
+ "model.layers.7.post_attention_layernorm.weight": "model-00005-of-00051.safetensors",
616
+ "model.layers.7.self_attn.k_proj.weight": "model-00005-of-00051.safetensors",
617
+ "model.layers.7.self_attn.o_proj.weight": "model-00005-of-00051.safetensors",
618
+ "model.layers.7.self_attn.q_proj.weight": "model-00005-of-00051.safetensors",
619
+ "model.layers.7.self_attn.v_proj.weight": "model-00005-of-00051.safetensors",
620
+ "model.layers.70.input_layernorm.weight": "model-00041-of-00051.safetensors",
621
+ "model.layers.70.mlp.down_proj.weight": "model-00041-of-00051.safetensors",
622
+ "model.layers.70.mlp.gate_proj.weight": "model-00041-of-00051.safetensors",
623
+ "model.layers.70.mlp.up_proj.weight": "model-00041-of-00051.safetensors",
624
+ "model.layers.70.post_attention_layernorm.weight": "model-00041-of-00051.safetensors",
625
+ "model.layers.70.self_attn.k_proj.weight": "model-00041-of-00051.safetensors",
626
+ "model.layers.70.self_attn.o_proj.weight": "model-00041-of-00051.safetensors",
627
+ "model.layers.70.self_attn.q_proj.weight": "model-00041-of-00051.safetensors",
628
+ "model.layers.70.self_attn.v_proj.weight": "model-00041-of-00051.safetensors",
629
+ "model.layers.71.input_layernorm.weight": "model-00042-of-00051.safetensors",
630
+ "model.layers.71.mlp.down_proj.weight": "model-00042-of-00051.safetensors",
631
+ "model.layers.71.mlp.gate_proj.weight": "model-00041-of-00051.safetensors",
632
+ "model.layers.71.mlp.up_proj.weight": "model-00042-of-00051.safetensors",
633
+ "model.layers.71.post_attention_layernorm.weight": "model-00042-of-00051.safetensors",
634
+ "model.layers.71.self_attn.k_proj.weight": "model-00041-of-00051.safetensors",
635
+ "model.layers.71.self_attn.o_proj.weight": "model-00041-of-00051.safetensors",
636
+ "model.layers.71.self_attn.q_proj.weight": "model-00041-of-00051.safetensors",
637
+ "model.layers.71.self_attn.v_proj.weight": "model-00041-of-00051.safetensors",
638
+ "model.layers.72.input_layernorm.weight": "model-00042-of-00051.safetensors",
639
+ "model.layers.72.mlp.down_proj.weight": "model-00042-of-00051.safetensors",
640
+ "model.layers.72.mlp.gate_proj.weight": "model-00042-of-00051.safetensors",
641
+ "model.layers.72.mlp.up_proj.weight": "model-00042-of-00051.safetensors",
642
+ "model.layers.72.post_attention_layernorm.weight": "model-00042-of-00051.safetensors",
643
+ "model.layers.72.self_attn.k_proj.weight": "model-00042-of-00051.safetensors",
644
+ "model.layers.72.self_attn.o_proj.weight": "model-00042-of-00051.safetensors",
645
+ "model.layers.72.self_attn.q_proj.weight": "model-00042-of-00051.safetensors",
646
+ "model.layers.72.self_attn.v_proj.weight": "model-00042-of-00051.safetensors",
647
+ "model.layers.73.input_layernorm.weight": "model-00043-of-00051.safetensors",
648
+ "model.layers.73.mlp.down_proj.weight": "model-00043-of-00051.safetensors",
649
+ "model.layers.73.mlp.gate_proj.weight": "model-00043-of-00051.safetensors",
650
+ "model.layers.73.mlp.up_proj.weight": "model-00043-of-00051.safetensors",
651
+ "model.layers.73.post_attention_layernorm.weight": "model-00043-of-00051.safetensors",
652
+ "model.layers.73.self_attn.k_proj.weight": "model-00042-of-00051.safetensors",
653
+ "model.layers.73.self_attn.o_proj.weight": "model-00042-of-00051.safetensors",
654
+ "model.layers.73.self_attn.q_proj.weight": "model-00042-of-00051.safetensors",
655
+ "model.layers.73.self_attn.v_proj.weight": "model-00042-of-00051.safetensors",
656
+ "model.layers.74.input_layernorm.weight": "model-00043-of-00051.safetensors",
657
+ "model.layers.74.mlp.down_proj.weight": "model-00043-of-00051.safetensors",
658
+ "model.layers.74.mlp.gate_proj.weight": "model-00043-of-00051.safetensors",
659
+ "model.layers.74.mlp.up_proj.weight": "model-00043-of-00051.safetensors",
660
+ "model.layers.74.post_attention_layernorm.weight": "model-00043-of-00051.safetensors",
661
+ "model.layers.74.self_attn.k_proj.weight": "model-00043-of-00051.safetensors",
662
+ "model.layers.74.self_attn.o_proj.weight": "model-00043-of-00051.safetensors",
663
+ "model.layers.74.self_attn.q_proj.weight": "model-00043-of-00051.safetensors",
664
+ "model.layers.74.self_attn.v_proj.weight": "model-00043-of-00051.safetensors",
665
+ "model.layers.75.input_layernorm.weight": "model-00044-of-00051.safetensors",
666
+ "model.layers.75.mlp.down_proj.weight": "model-00044-of-00051.safetensors",
667
+ "model.layers.75.mlp.gate_proj.weight": "model-00044-of-00051.safetensors",
668
+ "model.layers.75.mlp.up_proj.weight": "model-00044-of-00051.safetensors",
669
+ "model.layers.75.post_attention_layernorm.weight": "model-00044-of-00051.safetensors",
670
+ "model.layers.75.self_attn.k_proj.weight": "model-00044-of-00051.safetensors",
671
+ "model.layers.75.self_attn.o_proj.weight": "model-00044-of-00051.safetensors",
672
+ "model.layers.75.self_attn.q_proj.weight": "model-00044-of-00051.safetensors",
673
+ "model.layers.75.self_attn.v_proj.weight": "model-00044-of-00051.safetensors",
674
+ "model.layers.76.input_layernorm.weight": "model-00045-of-00051.safetensors",
675
+ "model.layers.76.mlp.down_proj.weight": "model-00045-of-00051.safetensors",
676
+ "model.layers.76.mlp.gate_proj.weight": "model-00044-of-00051.safetensors",
677
+ "model.layers.76.mlp.up_proj.weight": "model-00044-of-00051.safetensors",
678
+ "model.layers.76.post_attention_layernorm.weight": "model-00045-of-00051.safetensors",
679
+ "model.layers.76.self_attn.k_proj.weight": "model-00044-of-00051.safetensors",
680
+ "model.layers.76.self_attn.o_proj.weight": "model-00044-of-00051.safetensors",
681
+ "model.layers.76.self_attn.q_proj.weight": "model-00044-of-00051.safetensors",
682
+ "model.layers.76.self_attn.v_proj.weight": "model-00044-of-00051.safetensors",
683
+ "model.layers.77.input_layernorm.weight": "model-00045-of-00051.safetensors",
684
+ "model.layers.77.mlp.down_proj.weight": "model-00045-of-00051.safetensors",
685
+ "model.layers.77.mlp.gate_proj.weight": "model-00045-of-00051.safetensors",
686
+ "model.layers.77.mlp.up_proj.weight": "model-00045-of-00051.safetensors",
687
+ "model.layers.77.post_attention_layernorm.weight": "model-00045-of-00051.safetensors",
688
+ "model.layers.77.self_attn.k_proj.weight": "model-00045-of-00051.safetensors",
689
+ "model.layers.77.self_attn.o_proj.weight": "model-00045-of-00051.safetensors",
690
+ "model.layers.77.self_attn.q_proj.weight": "model-00045-of-00051.safetensors",
691
+ "model.layers.77.self_attn.v_proj.weight": "model-00045-of-00051.safetensors",
692
+ "model.layers.78.input_layernorm.weight": "model-00046-of-00051.safetensors",
693
+ "model.layers.78.mlp.down_proj.weight": "model-00046-of-00051.safetensors",
694
+ "model.layers.78.mlp.gate_proj.weight": "model-00045-of-00051.safetensors",
695
+ "model.layers.78.mlp.up_proj.weight": "model-00046-of-00051.safetensors",
696
+ "model.layers.78.post_attention_layernorm.weight": "model-00046-of-00051.safetensors",
697
+ "model.layers.78.self_attn.k_proj.weight": "model-00045-of-00051.safetensors",
698
+ "model.layers.78.self_attn.o_proj.weight": "model-00045-of-00051.safetensors",
699
+ "model.layers.78.self_attn.q_proj.weight": "model-00045-of-00051.safetensors",
700
+ "model.layers.78.self_attn.v_proj.weight": "model-00045-of-00051.safetensors",
701
+ "model.layers.79.input_layernorm.weight": "model-00046-of-00051.safetensors",
702
+ "model.layers.79.mlp.down_proj.weight": "model-00046-of-00051.safetensors",
703
+ "model.layers.79.mlp.gate_proj.weight": "model-00046-of-00051.safetensors",
704
+ "model.layers.79.mlp.up_proj.weight": "model-00046-of-00051.safetensors",
705
+ "model.layers.79.post_attention_layernorm.weight": "model-00046-of-00051.safetensors",
706
+ "model.layers.79.self_attn.k_proj.weight": "model-00046-of-00051.safetensors",
707
+ "model.layers.79.self_attn.o_proj.weight": "model-00046-of-00051.safetensors",
708
+ "model.layers.79.self_attn.q_proj.weight": "model-00046-of-00051.safetensors",
709
+ "model.layers.79.self_attn.v_proj.weight": "model-00046-of-00051.safetensors",
710
+ "model.layers.8.input_layernorm.weight": "model-00006-of-00051.safetensors",
711
+ "model.layers.8.mlp.down_proj.weight": "model-00006-of-00051.safetensors",
712
+ "model.layers.8.mlp.gate_proj.weight": "model-00005-of-00051.safetensors",
713
+ "model.layers.8.mlp.up_proj.weight": "model-00006-of-00051.safetensors",
714
+ "model.layers.8.post_attention_layernorm.weight": "model-00006-of-00051.safetensors",
715
+ "model.layers.8.self_attn.k_proj.weight": "model-00005-of-00051.safetensors",
716
+ "model.layers.8.self_attn.o_proj.weight": "model-00005-of-00051.safetensors",
717
+ "model.layers.8.self_attn.q_proj.weight": "model-00005-of-00051.safetensors",
718
+ "model.layers.8.self_attn.v_proj.weight": "model-00005-of-00051.safetensors",
719
+ "model.layers.80.input_layernorm.weight": "model-00047-of-00051.safetensors",
720
+ "model.layers.80.mlp.down_proj.weight": "model-00047-of-00051.safetensors",
721
+ "model.layers.80.mlp.gate_proj.weight": "model-00047-of-00051.safetensors",
722
+ "model.layers.80.mlp.up_proj.weight": "model-00047-of-00051.safetensors",
723
+ "model.layers.80.post_attention_layernorm.weight": "model-00047-of-00051.safetensors",
724
+ "model.layers.80.self_attn.k_proj.weight": "model-00046-of-00051.safetensors",
725
+ "model.layers.80.self_attn.o_proj.weight": "model-00046-of-00051.safetensors",
726
+ "model.layers.80.self_attn.q_proj.weight": "model-00046-of-00051.safetensors",
727
+ "model.layers.80.self_attn.v_proj.weight": "model-00046-of-00051.safetensors",
728
+ "model.layers.81.input_layernorm.weight": "model-00047-of-00051.safetensors",
729
+ "model.layers.81.mlp.down_proj.weight": "model-00047-of-00051.safetensors",
730
+ "model.layers.81.mlp.gate_proj.weight": "model-00047-of-00051.safetensors",
731
+ "model.layers.81.mlp.up_proj.weight": "model-00047-of-00051.safetensors",
732
+ "model.layers.81.post_attention_layernorm.weight": "model-00047-of-00051.safetensors",
733
+ "model.layers.81.self_attn.k_proj.weight": "model-00047-of-00051.safetensors",
734
+ "model.layers.81.self_attn.o_proj.weight": "model-00047-of-00051.safetensors",
735
+ "model.layers.81.self_attn.q_proj.weight": "model-00047-of-00051.safetensors",
736
+ "model.layers.81.self_attn.v_proj.weight": "model-00047-of-00051.safetensors",
737
+ "model.layers.82.input_layernorm.weight": "model-00048-of-00051.safetensors",
738
+ "model.layers.82.mlp.down_proj.weight": "model-00048-of-00051.safetensors",
739
+ "model.layers.82.mlp.gate_proj.weight": "model-00048-of-00051.safetensors",
740
+ "model.layers.82.mlp.up_proj.weight": "model-00048-of-00051.safetensors",
741
+ "model.layers.82.post_attention_layernorm.weight": "model-00048-of-00051.safetensors",
742
+ "model.layers.82.self_attn.k_proj.weight": "model-00048-of-00051.safetensors",
743
+ "model.layers.82.self_attn.o_proj.weight": "model-00048-of-00051.safetensors",
744
+ "model.layers.82.self_attn.q_proj.weight": "model-00048-of-00051.safetensors",
745
+ "model.layers.82.self_attn.v_proj.weight": "model-00048-of-00051.safetensors",
746
+ "model.layers.83.input_layernorm.weight": "model-00049-of-00051.safetensors",
747
+ "model.layers.83.mlp.down_proj.weight": "model-00049-of-00051.safetensors",
748
+ "model.layers.83.mlp.gate_proj.weight": "model-00048-of-00051.safetensors",
749
+ "model.layers.83.mlp.up_proj.weight": "model-00048-of-00051.safetensors",
750
+ "model.layers.83.post_attention_layernorm.weight": "model-00049-of-00051.safetensors",
751
+ "model.layers.83.self_attn.k_proj.weight": "model-00048-of-00051.safetensors",
752
+ "model.layers.83.self_attn.o_proj.weight": "model-00048-of-00051.safetensors",
753
+ "model.layers.83.self_attn.q_proj.weight": "model-00048-of-00051.safetensors",
754
+ "model.layers.83.self_attn.v_proj.weight": "model-00048-of-00051.safetensors",
755
+ "model.layers.84.input_layernorm.weight": "model-00049-of-00051.safetensors",
756
+ "model.layers.84.mlp.down_proj.weight": "model-00049-of-00051.safetensors",
757
+ "model.layers.84.mlp.gate_proj.weight": "model-00049-of-00051.safetensors",
758
+ "model.layers.84.mlp.up_proj.weight": "model-00049-of-00051.safetensors",
759
+ "model.layers.84.post_attention_layernorm.weight": "model-00049-of-00051.safetensors",
760
+ "model.layers.84.self_attn.k_proj.weight": "model-00049-of-00051.safetensors",
761
+ "model.layers.84.self_attn.o_proj.weight": "model-00049-of-00051.safetensors",
762
+ "model.layers.84.self_attn.q_proj.weight": "model-00049-of-00051.safetensors",
763
+ "model.layers.84.self_attn.v_proj.weight": "model-00049-of-00051.safetensors",
764
+ "model.layers.85.input_layernorm.weight": "model-00050-of-00051.safetensors",
765
+ "model.layers.85.mlp.down_proj.weight": "model-00050-of-00051.safetensors",
766
+ "model.layers.85.mlp.gate_proj.weight": "model-00049-of-00051.safetensors",
767
+ "model.layers.85.mlp.up_proj.weight": "model-00050-of-00051.safetensors",
768
+ "model.layers.85.post_attention_layernorm.weight": "model-00050-of-00051.safetensors",
769
+ "model.layers.85.self_attn.k_proj.weight": "model-00049-of-00051.safetensors",
770
+ "model.layers.85.self_attn.o_proj.weight": "model-00049-of-00051.safetensors",
771
+ "model.layers.85.self_attn.q_proj.weight": "model-00049-of-00051.safetensors",
772
+ "model.layers.85.self_attn.v_proj.weight": "model-00049-of-00051.safetensors",
773
+ "model.layers.86.input_layernorm.weight": "model-00050-of-00051.safetensors",
774
+ "model.layers.86.mlp.down_proj.weight": "model-00050-of-00051.safetensors",
775
+ "model.layers.86.mlp.gate_proj.weight": "model-00050-of-00051.safetensors",
776
+ "model.layers.86.mlp.up_proj.weight": "model-00050-of-00051.safetensors",
777
+ "model.layers.86.post_attention_layernorm.weight": "model-00050-of-00051.safetensors",
778
+ "model.layers.86.self_attn.k_proj.weight": "model-00050-of-00051.safetensors",
779
+ "model.layers.86.self_attn.o_proj.weight": "model-00050-of-00051.safetensors",
780
+ "model.layers.86.self_attn.q_proj.weight": "model-00050-of-00051.safetensors",
781
+ "model.layers.86.self_attn.v_proj.weight": "model-00050-of-00051.safetensors",
782
+ "model.layers.87.input_layernorm.weight": "model-00051-of-00051.safetensors",
783
+ "model.layers.87.mlp.down_proj.weight": "model-00051-of-00051.safetensors",
784
+ "model.layers.87.mlp.gate_proj.weight": "model-00051-of-00051.safetensors",
785
+ "model.layers.87.mlp.up_proj.weight": "model-00051-of-00051.safetensors",
786
+ "model.layers.87.post_attention_layernorm.weight": "model-00051-of-00051.safetensors",
787
+ "model.layers.87.self_attn.k_proj.weight": "model-00050-of-00051.safetensors",
788
+ "model.layers.87.self_attn.o_proj.weight": "model-00050-of-00051.safetensors",
789
+ "model.layers.87.self_attn.q_proj.weight": "model-00050-of-00051.safetensors",
790
+ "model.layers.87.self_attn.v_proj.weight": "model-00050-of-00051.safetensors",
791
+ "model.layers.9.input_layernorm.weight": "model-00006-of-00051.safetensors",
792
+ "model.layers.9.mlp.down_proj.weight": "model-00006-of-00051.safetensors",
793
+ "model.layers.9.mlp.gate_proj.weight": "model-00006-of-00051.safetensors",
794
+ "model.layers.9.mlp.up_proj.weight": "model-00006-of-00051.safetensors",
795
+ "model.layers.9.post_attention_layernorm.weight": "model-00006-of-00051.safetensors",
796
+ "model.layers.9.self_attn.k_proj.weight": "model-00006-of-00051.safetensors",
797
+ "model.layers.9.self_attn.o_proj.weight": "model-00006-of-00051.safetensors",
798
+ "model.layers.9.self_attn.q_proj.weight": "model-00006-of-00051.safetensors",
799
+ "model.layers.9.self_attn.v_proj.weight": "model-00006-of-00051.safetensors",
800
+ "model.norm.weight": "model-00051-of-00051.safetensors"
801
+ }
802
+ }
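The `weight_map` above assigns each tensor name to the shard file that stores it, and a single layer can straddle two shards (layer 55 is split between shards 32 and 33). A minimal sketch of how such a map could be inspected follows; the helper names `tensors_by_shard` and `shards_for_layer` are hypothetical, and the sample dictionary copies only the layer 55 entries from the index.

```python
from collections import defaultdict
from typing import Dict, List

# Layer 55 entries copied from the index above (a sample, not the full map).
weight_map_sample: Dict[str, str] = {
    "model.layers.55.input_layernorm.weight": "model-00033-of-00051.safetensors",
    "model.layers.55.mlp.down_proj.weight": "model-00033-of-00051.safetensors",
    "model.layers.55.mlp.gate_proj.weight": "model-00032-of-00051.safetensors",
    "model.layers.55.mlp.up_proj.weight": "model-00032-of-00051.safetensors",
    "model.layers.55.post_attention_layernorm.weight": "model-00033-of-00051.safetensors",
    "model.layers.55.self_attn.k_proj.weight": "model-00032-of-00051.safetensors",
    "model.layers.55.self_attn.o_proj.weight": "model-00032-of-00051.safetensors",
    "model.layers.55.self_attn.q_proj.weight": "model-00032-of-00051.safetensors",
    "model.layers.55.self_attn.v_proj.weight": "model-00032-of-00051.safetensors",
}


def tensors_by_shard(weight_map: Dict[str, str]) -> Dict[str, List[str]]:
    """Invert the map: shard filename -> list of tensor names stored in it."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for tensor_name, filename in weight_map.items():
        grouped[filename].append(tensor_name)
    return dict(grouped)


def shards_for_layer(weight_map: Dict[str, str], layer: int) -> List[str]:
    """Return the sorted shard filenames needed to load one decoder layer."""
    prefix = f"model.layers.{layer}."
    return sorted({f for name, f in weight_map.items() if name.startswith(prefix)})


if __name__ == "__main__":
    print(shards_for_layer(weight_map_sample, 55))
```

With the full index loaded via `json.load`, the same helpers would show which shard files must be fetched before a given layer can be materialized.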
output-00001-of-00006.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:904ebc593a37172a1859ba3ab4f1600f7cfd239059abf5340d42f8b5bb344e38
+ size 8542303534
output-00002-of-00006.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:ec9329d43017108b0f356fa95821ffb6905100bf1474c59601ccefa2d8efa7d9
+ size 8444257302
output-00003-of-00006.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:d8b6888f7e80666294ec77a48ddd01f5678fa8be0166a4e0e1ac2a6849cc0e47
+ size 8477914022
output-00004-of-00006.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:93aace360a0720cbffcfe11818f5ed7ca566853b20e47b65091f73e384da5964
+ size 8571883530
output-00005-of-00006.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:28912fa047378865f68eb97882411891bcbf8688a059713e54bf211a1cf354ea
+ size 8506114580
output-00006-of-00006.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:81b778f7aa381cfc1344f832bf850cedfe24dd56dc8711842750eb11e75cf3a4
+ size 1967327076
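Each `output-*.safetensors` entry above is a Git LFS pointer: three `key value` lines giving the spec version, the SHA-256 object ID, and the size in bytes of the real file. The sketch below parses one pointer and totals the six shard sizes listed in this commit; `parse_lfs_pointer` is a hypothetical helper written for illustration, not part of any LFS tooling.

```python
from typing import Dict, Union


def parse_lfs_pointer(text: str) -> Dict[str, Union[str, int]]:
    """Parse the `key value` lines of a Git LFS pointer file."""
    fields: Dict[str, str] = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    hash_algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": hash_algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


# Sizes in bytes of the six quantized shards, copied from the pointers above.
SHARD_SIZES = [
    8542303534,
    8444257302,
    8477914022,
    8571883530,
    8506114580,
    1967327076,
]

if __name__ == "__main__":
    pointer = (
        "version https://git-lfs.github.com/spec/v1\n"
        "oid sha256:81b778f7aa381cfc1344f832bf850cedfe24dd56dc8711842750eb11e75cf3a4\n"
        "size 1967327076\n"
    )
    print(parse_lfs_pointer(pointer)["size"])
    print(f"total: {sum(SHARD_SIZES) / 1e9:.1f} GB")
```

The total comes to roughly 44.5 GB, which is in the range one would expect for a model of about 123B parameters stored at the 2.85 bits per weight named in the repository ID.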
params.json ADDED
@@ -0,0 +1,11 @@
+ {
+     "dim": 12288,
+     "n_layers": 88,
+     "head_dim": 128,
+     "hidden_dim": 28672,
+     "n_heads": 96,
+     "n_kv_heads": 8,
+     "norm_eps": 1e-05,
+     "vocab_size": 32768,
+     "rope_theta": 1000000.0
+ }
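The values in `params.json` are enough to estimate the parameter count quoted in the model card (123B). The sketch below is a back-of-the-envelope calculation under stated assumptions: a Llama-style decoder with grouped-query attention, a three-matrix gated MLP (matching the `gate_proj`, `up_proj`, and `down_proj` tensors in the weight map), no bias terms, and separate input embedding and output projection matrices. The function name `count_parameters` is hypothetical.

```python
# Values copied from params.json above.
PARAMS = {
    "dim": 12288,
    "n_layers": 88,
    "head_dim": 128,
    "hidden_dim": 28672,
    "n_heads": 96,
    "n_kv_heads": 8,
    "vocab_size": 32768,
}


def count_parameters(p: dict) -> int:
    """Estimate the weight count, assuming no biases and untied embeddings."""
    dim, head_dim = p["dim"], p["head_dim"]
    q_proj = dim * p["n_heads"] * head_dim       # query projection
    kv_proj = dim * p["n_kv_heads"] * head_dim   # key (or value) projection
    o_proj = p["n_heads"] * head_dim * dim       # attention output projection
    attention = q_proj + 2 * kv_proj + o_proj
    mlp = 3 * dim * p["hidden_dim"]              # gate, up, and down projections
    norms = 2 * dim                              # two norm weight vectors per layer
    per_layer = attention + mlp + norms
    embeddings = 2 * p["vocab_size"] * dim       # input embedding + output head
    final_norm = dim
    return p["n_layers"] * per_layer + embeddings + final_norm


if __name__ == "__main__":
    total = count_parameters(PARAMS)
    print(f"{total:,} parameters (~{total / 1e9:.1f}B)")
```

Under these assumptions the estimate is about 122.6B, consistent with the 123B figure in the README.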
test.py ADDED
@@ -0,0 +1,26 @@
+ import json
+ from typing import Dict
+
+ from safetensors.torch import load_file, save_file
+ from huggingface_hub import split_torch_state_dict_into_shards
+ import torch
+ import os
+
+ def save_state_dict(state_dict: Dict[str, torch.Tensor], save_directory: str):
+     state_dict_split = split_torch_state_dict_into_shards(state_dict, filename_pattern='consolidated{suffix}.safetensors')
+     for filename, tensors in state_dict_split.filename_to_tensors.items():
+         shard = {tensor: state_dict[tensor] for tensor in tensors}
+         print("Saving", save_directory, filename)
+         save_file(shard, os.path.join(save_directory, filename))
+     if state_dict_split.is_sharded:
+         index = {
+             "metadata": state_dict_split.metadata,
+             "weight_map": state_dict_split.tensor_to_filename,
+         }
+         with open(os.path.join(save_directory, "consolidated.safetensors.index.json"), "w") as f:
+             f.write(json.dumps(index, indent=2))
+
+ big_file = 'consolidated.safetensors'
+ loaded = load_file(big_file)
+
+ save_state_dict(loaded, save_directory='.')
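The script above delegates the actual splitting to `split_torch_state_dict_into_shards`. As a rough illustration of the idea, the sketch below implements a simple greedy split over tensor sizes and builds shard filenames from the same `consolidated{suffix}.safetensors` pattern. It is a simplified stand-in written for this page, not the library's implementation; `greedy_shards`, `shard_filenames`, and the `-00001-of-0000N` suffix format are assumptions modeled on the filenames seen elsewhere in this commit.

```python
from typing import Dict, List


def greedy_shards(sizes: Dict[str, int], max_shard_size: int) -> List[List[str]]:
    """Pack tensors in order, starting a new shard when the next one would overflow."""
    shards: List[List[str]] = [[]]
    current_size = 0
    for name, size in sizes.items():
        # Only roll over if the current shard already holds something, so a
        # single oversized tensor still gets a shard of its own.
        if shards[-1] and current_size + size > max_shard_size:
            shards.append([])
            current_size = 0
        shards[-1].append(name)
        current_size += size
    return shards


def shard_filenames(n_shards: int, pattern: str = "consolidated{suffix}.safetensors") -> List[str]:
    """Build filenames; a single shard gets an empty suffix (assumed convention)."""
    if n_shards == 1:
        return [pattern.format(suffix="")]
    return [
        pattern.format(suffix=f"-{i:05d}-of-{n_shards:05d}")
        for i in range(1, n_shards + 1)
    ]


if __name__ == "__main__":
    toy_sizes = {"a.weight": 4, "b.weight": 4, "c.weight": 4}  # sizes in bytes
    shards = greedy_shards(toy_sizes, max_shard_size=8)
    for filename, names in zip(shard_filenames(len(shards)), shards):
        print(filename, names)
```

Because tensors are packed in iteration order, a layer's weights can end up on either side of a shard boundary, which matches the layer-straddling pattern visible in the index file.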
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer.model ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:59f95e28944c062244741268596badc900df86c7f5ded05088d2da22a7379e06
+ size 587583
tokenizer.model.v3 ADDED
Binary file (588 kB). View file
 
tokenizer_config.json ADDED
The diff for this file is too large to render. See raw diff
 
upload.py ADDED
@@ -0,0 +1,45 @@
+ from huggingface_hub import HfApi
+ from pathlib import Path
+
+ # Define the parameters for uploading
+ repo_id = "DBMe/Mistral-Large-Instruct-2407-2.85bpw-h6-exl2"  # Replace with your actual repo ID
+ folder_path = "/home/asusws-x570-ace/programs/tabbyAPI-new/models/Mistral-Large-Instruct-2407_exl2_2.85bpw/"  # Replace with your folder path
+ repo_type = "model"  # Change to "dataset" or "space" if applicable
+ revision = "main"  # Optional: specify the branch or use "main"
+ private = False  # Set to True if the repository should be private
+ allow_patterns = None  # Optional: specify patterns of files to include
+ ignore_patterns = None  # Optional: specify patterns of files to exclude
+ num_workers = 4  # Set based on your system; lower if your internet is unstable
+ print_report = True  # Enable progress reporting
+ print_report_every = 60  # Report frequency in seconds
+
+ # Initialize the Hugging Face API client
+ api = HfApi()
+
+ # Function to upload the folder in a resumable manner
+ def upload_resumable():
+     try:
+         print("Starting upload process...")
+
+         # Perform the upload with the provided parameters
+         api.upload_large_folder(
+             repo_id=repo_id,
+             folder_path=Path(folder_path),
+             repo_type=repo_type,
+             revision=revision,
+             private=private,
+             allow_patterns=allow_patterns,
+             ignore_patterns=ignore_patterns,
+             num_workers=num_workers,
+             print_report=print_report,
+             print_report_every=print_report_every,
+         )
+
+         print("Upload completed successfully!")
+
+     except Exception as e:
+         print(f"Upload interrupted due to error: {e}")
+         print("You can resume the upload by running the script again.")
+
+ # Call the function to start the upload
+ upload_resumable()
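The script leaves `allow_patterns` and `ignore_patterns` set to `None`, so every file in the folder is uploaded, including helper scripts such as `test.py` and `upload.py`. The sketch below previews which paths a pair of glob patterns would select. It approximates the filtering with the standard library's `fnmatch` and is an assumption about the matching semantics, not the library's own code; `preview_selection` is a hypothetical helper.

```python
from fnmatch import fnmatch
from typing import Iterable, List, Optional


def preview_selection(
    paths: Iterable[str],
    allow_patterns: Optional[List[str]] = None,
    ignore_patterns: Optional[List[str]] = None,
) -> List[str]:
    """Return the paths kept after applying allow and ignore glob patterns."""
    selected = []
    for path in paths:
        # When allow patterns are given, a path must match at least one of them.
        if allow_patterns is not None and not any(fnmatch(path, p) for p in allow_patterns):
            continue
        # A path matching any ignore pattern is dropped.
        if ignore_patterns is not None and any(fnmatch(path, p) for p in ignore_patterns):
            continue
        selected.append(path)
    return selected


if __name__ == "__main__":
    files = [
        "README.md",
        "output-00001-of-00006.safetensors",
        "params.json",
        "test.py",
        "upload.py",
    ]
    print(preview_selection(files, ignore_patterns=["*.py"]))
```

Running a preview like this before the real upload is one way to check that local tooling is excluded from the published repository.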