h2m committed on
Commit
87bd093
1 Parent(s): dde2654

Upload folder using huggingface_hub

README.md ADDED
@@ -0,0 +1,67 @@
+ ---
+ license: apache-2.0
+ tags:
+ - merge
+ - moe
+ language:
+ - en
+ ---
+ ![image/webp](https://cdn.discordapp.com/attachments/1194008951805714563/1197228918097313903/OIG.png?ex=65ba8151&is=65a80c51&hm=465a1a8f7aa4c0002017987123951efed25cd8d87d91f5a9ecc30d8c04e88f46&)
+
+ # Burning-Bruce - 4x7b
+
+ We didn't start the **fire**.
+
+ This model is a Mixture of Experts (MoE) made with [mergekit](https://github.com/cg123/mergekit/tree/mixtral)
+
+ by Kquant03, Dontriskit and NeuralNovel
+
+ [Join our Discord!](https://discord.gg/Qge8Ds9C)
+
+
+ ## Models used:
+ - [leveldevai/TurdusBeagle-7B](https://huggingface.co/leveldevai/TurdusBeagle-7B) - base
+ - [leveldevai/TurdusBeagle-7B](https://huggingface.co/leveldevai/TurdusBeagle-7B) - expert #1
+ - [udkai/Turdus](https://huggingface.co/udkai/Turdus) - expert #2
+ - [nfaheem/Marcoroni-7b-DPO-Merge](https://huggingface.co/nfaheem/Marcoroni-7b-DPO-Merge) - expert #3
+ - [Toten5/Marcoroni-neural-chat-7B-v2](https://huggingface.co/Toten5/Marcoroni-neural-chat-7B-v2) - expert #4
+
+ # "[What is a Mixture of Experts (MoE)?](https://huggingface.co/blog/moe)"
+ ### (from the Hugging Face MoE blog post and the MistralAI papers...click the quoted question above to navigate to it directly.)
+
+ The scale of a model is one of the most important axes for better model quality. Given a fixed computing budget, training a larger model for fewer steps is better than training a smaller model for more steps.
+
+ Mixture of Experts enables models to be pretrained with far less compute, which means you can dramatically scale up the model or dataset size with the same compute budget as a dense model. In particular, a MoE model should achieve the same quality as its dense counterpart much faster during pretraining.
+
+ So, what exactly is a MoE? In the context of transformer models, a MoE consists of two main elements:
+
+ - Sparse MoE layers are used instead of dense feed-forward network (FFN) layers. MoE layers have a certain number of “experts” (e.g. 4 in this "frankenMoE"), where each expert is a neural network. In practice, the experts are FFNs, but they can also be more complex networks or even a MoE itself, leading to hierarchical MoEs!
+
+ - A gate network or router that determines which tokens are sent to which expert. For example, in the image below, the token “More” is sent to the second expert, and the token “Parameters” is sent to the first expert. A token can also be sent to more than one expert. How to route a token to an expert is one of the big decisions when working with MoEs - the router is composed of learned parameters and is pretrained at the same time as the rest of the network.
+
+ At every layer, for every token, a router network chooses two of the experts to process the token and combines their outputs additively.
+
+ ![image/png](https://cdn-uploads.huggingface.co/production/uploads/6589d7e6586088fd2784a12c/up_I0R2TQGjqTShZp_1Sz.png)
+
+ Switch Layer
+ MoE layer from the [Switch Transformers paper](https://arxiv.org/abs/2101.03961)
+
+ So, to recap, in MoEs we replace every FFN layer of the transformer model with an MoE layer, which is composed of a gate network and a certain number of experts.
+
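The routing described in the README (a router scores every expert, two are chosen per token, and their outputs are summed with gate weights) can be sketched in a few lines. This is a minimal, dependency-free illustration and not the code used by `MixtralForCausalLM`; the names `top2_moe`, `router_weights`, and the toy scaling "experts" are assumptions made for the example. The four experts and two experts per token mirror `num_local_experts: 4` and `num_experts_per_tok: 2` in `config.json`.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def top2_moe(token, router_weights, experts):
    """Route one token vector through the two highest-scoring experts."""
    # One router logit per expert: dot product of the token with that expert's gate row.
    logits = [sum(w * x for w, x in zip(row, token)) for row in router_weights]
    # Keep the indices of the two experts with the largest logits.
    top2 = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:2]
    # Renormalize the gate weights over just the selected experts.
    gates = softmax([logits[i] for i in top2])
    out = [0.0] * len(token)
    for gate, idx in zip(gates, top2):
        expert_out = experts[idx](token)
        out = [o + gate * e for o, e in zip(out, expert_out)]
    return out, top2


# Toy "experts": each one just scales the token (real experts are FFNs).
experts = [lambda v, s=s: [s * x for x in v] for s in (1.0, 2.0, 3.0, 4.0)]
# Hypothetical router weights: one row per expert, one column per hidden dimension.
router_weights = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, -1.0]]

output, chosen = top2_moe([0.5, 2.0], router_weights, experts)
print("experts chosen:", chosen)
print("combined output:", output)
```

Only the two selected experts are evaluated for the token, which is where the inference savings of a sparse MoE layer come from.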
+ Although MoEs provide benefits like efficient pretraining and faster inference compared to dense models, they also come with challenges:
+
+ - **Training:** MoEs enable significantly more compute-efficient pretraining, but they’ve historically struggled to generalize during fine-tuning, leading to overfitting.
+ - **Inference:** Although a MoE might have many parameters, only some of them are used during inference. This leads to much faster inference compared to a dense model with the same number of parameters. However, all parameters need to be loaded into memory, so memory requirements are high. For example, [given a MoE like Mixtral 8x7B](https://huggingface.co/blog/moe), we’ll need to have enough VRAM to hold a dense 47B parameter model. Why 47B parameters and not 8 x 7B = 56B? That’s because in MoE models, only the FFN layers are treated as individual experts, and the rest of the model parameters are shared. At the same time, assuming just two experts are being used per token, the inference speed (FLOPs) is like using a 12B model (as opposed to a 14B model), because it computes 2x7B matrix multiplications, but with some layers shared.
+
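The "47B, not 56B" arithmetic can be checked with a rough parameter count. The sketch below is an approximation under stated assumptions: the layer dimensions are the ones listed in this commit's `config.json`, Mixtral 8x7B is assumed to share those dimensions with 8 experts instead of 4, and the function name `mixtral_style_params` is made up for the example.

```python
def mixtral_style_params(num_experts, experts_used, hidden=4096, intermediate=14336,
                         layers=32, heads=32, kv_heads=8, vocab=32000):
    """Approximate (total, active-per-token) parameter counts for a Mixtral-style MoE."""
    head_dim = hidden // heads
    # Attention is shared by all experts: q and o are hidden x hidden,
    # k and v are smaller because of grouped-query attention.
    attention = 2 * hidden * hidden + 2 * hidden * kv_heads * head_dim
    # Each expert is a gated FFN with three weight matrices (w1, w2, w3).
    expert = 3 * hidden * intermediate
    # The router is one linear layer with one output per expert.
    router = hidden * num_experts
    # Two RMSNorm weight vectors per layer.
    norms = 2 * hidden
    shared_per_layer = attention + router + norms
    # Token embeddings, untied output head, and the final norm.
    outside_layers = 2 * vocab * hidden + hidden
    total = layers * (shared_per_layer + num_experts * expert) + outside_layers
    active = layers * (shared_per_layer + experts_used * expert) + outside_layers
    return total, active


total_8, active_8 = mixtral_style_params(num_experts=8, experts_used=2)
total_4, active_4 = mixtral_style_params(num_experts=4, experts_used=2)
print(f"8 experts: {total_8 / 1e9:.1f}B total, {active_8 / 1e9:.1f}B active per token")
print(f"4 experts: {total_4 / 1e9:.1f}B total, {active_4 / 1e9:.1f}B active per token")
# 8 experts: 46.7B total, 12.9B active per token
# 4 experts: 24.2B total, 12.9B active per token
```

Under these assumptions the 8-expert case lands near the 47B total and roughly 12B to 13B active figures quoted above, and the 4-expert layout of this repo comes out at about 24B parameters that must all be held in memory.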
+ If all our tokens are sent to just a few popular experts, training becomes inefficient. In normal MoE training, the gating network tends to converge to activating mostly the same few experts. This is self-reinforcing, as favored experts are trained more quickly and hence selected more often. To mitigate this, an auxiliary loss is added to encourage giving all experts equal importance, so that all experts receive a roughly equal number of training examples. A related concept is expert capacity, a threshold on how many tokens an expert can process. In transformers, the auxiliary loss is exposed via the `aux_loss` parameter.
+
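One common form of this auxiliary loss is the one from the Switch Transformers paper linked above: for N experts, N times the sum over experts of (fraction of tokens routed to the expert) times (mean router probability for the expert). The sketch below implements that top-1 formulation in plain Python as an illustration; it is an assumption that this matches the idea behind `router_aux_loss_coef` in `config.json`, and the transformers implementation for Mixtral differs in its details.

```python
def load_balancing_loss(router_probs):
    """Switch-Transformers-style auxiliary loss for top-1 routing.

    router_probs: one list of per-expert probabilities for each token.
    """
    num_tokens = len(router_probs)
    num_experts = len(router_probs[0])
    # f[i]: fraction of tokens whose highest-probability expert is i.
    f = [0.0] * num_experts
    for probs in router_probs:
        best = max(range(num_experts), key=lambda i: probs[i])
        f[best] += 1.0 / num_tokens
    # p[i]: mean router probability assigned to expert i across tokens.
    p = [sum(probs[i] for probs in router_probs) / num_tokens
         for i in range(num_experts)]
    return num_experts * sum(fi * pi for fi, pi in zip(f, p))


# Four tokens spread evenly over four experts.
balanced = [[0.7, 0.1, 0.1, 0.1], [0.1, 0.7, 0.1, 0.1],
            [0.1, 0.1, 0.7, 0.1], [0.1, 0.1, 0.1, 0.7]]
# Four tokens that all prefer the same expert.
collapsed = [[0.7, 0.1, 0.1, 0.1]] * 4

print("balanced loss:", load_balancing_loss(balanced))
print("collapsed loss:", load_balancing_loss(collapsed))
```

The loss is smallest when tokens are spread evenly and grows when the router collapses onto a few experts, so adding it (scaled by a small coefficient) to the training loss pushes the router toward balanced usage.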
+
+ ## "Wait...but you called this a frankenMoE?"
+ The difference between a MoE and a "frankenMoE" is that the router layers in a model like the one in this repo are not trained together with the experts.
+
+ Sponsored by: [Dontriskit](https://huggingface.co/h2m)
+
+ # Evals
+
+
+ *coming soon*
added_tokens.json ADDED
@@ -0,0 +1,4 @@
+ {
+   "<|im_end|>": 32000,
+   "<|im_start|>": 32001
+ }
config.json ADDED
@@ -0,0 +1,30 @@
+ {
+   "_name_or_path": "leveldevai/TurdusBeagle-7B",
+   "architectures": [
+     "MixtralForCausalLM"
+   ],
+   "attention_dropout": 0.0,
+   "bos_token_id": 1,
+   "eos_token_id": 2,
+   "hidden_act": "silu",
+   "hidden_size": 4096,
+   "initializer_range": 0.02,
+   "intermediate_size": 14336,
+   "max_position_embeddings": 32768,
+   "model_type": "mixtral",
+   "num_attention_heads": 32,
+   "num_experts_per_tok": 2,
+   "num_hidden_layers": 32,
+   "num_key_value_heads": 8,
+   "num_local_experts": 4,
+   "output_router_logits": false,
+   "rms_norm_eps": 1e-05,
+   "rope_theta": 10000.0,
+   "router_aux_loss_coef": 0.001,
+   "sliding_window": null,
+   "tie_word_embeddings": false,
+   "torch_dtype": "float16",
+   "transformers_version": "4.36.2",
+   "use_cache": false,
+   "vocab_size": 32000
+ }
measurement.json ADDED
The diff for this file is too large to render. See raw diff
 
mergekit_moe_config.yml ADDED
@@ -0,0 +1,21 @@
+
+ base_model: leveldevai/TurdusBeagle-7B
+ gate_mode: hidden
+ dtype: bfloat16
+ experts:
+   - source_model: leveldevai/TurdusBeagle-7B
+     positive_prompts:
+       - "Answer this question from the ARC (Argument Reasoning Comprehension)."
+       - "Use common sense and logical reasoning skills."
+   - source_model: udkai/Turdus
+     positive_prompts:
+       - "Answer this question from the Winogrande test."
+       - "Use advanced knowledge of culture and humanity"
+   - source_model: nfaheem/Marcoroni-7b-DPO-Merge
+     positive_prompts:
+       - "answer questions with realistic and correct answers...with the honest truth"
+   - source_model: Toten5/Marcoroni-neural-chat-7B-v2
+     positive_prompts:
+       - "Calculate the answer to this math problem"
+       - "My mathematical capabilities are strong, allowing me to handle complex mathematical queries"
+       - "solve for"
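With `gate_mode: hidden`, the router is not trained; the gate weights are derived from representations of each expert's `positive_prompts`. The sketch below illustrates that idea only and is not mergekit's code: `toy_embed` is a hypothetical stand-in for the base model's hidden states (it just counts letters), and `init_gates` and `route` are names invented for the example.

```python
def toy_embed(text, dim=8):
    """Hypothetical stand-in for a model hidden state: a normalized letter-count vector."""
    vec = [0.0] * dim
    for ch in text.lower():
        if ch.isalpha():
            vec[ord(ch) % dim] += 1.0
    norm = sum(v * v for v in vec) ** 0.5 or 1.0
    return [v / norm for v in vec]


def init_gates(expert_prompts, embed=toy_embed):
    """One gate vector per expert: the mean embedding of that expert's positive prompts."""
    gates = []
    for prompts in expert_prompts:
        embs = [embed(p) for p in prompts]
        gates.append([sum(col) / len(embs) for col in zip(*embs)])
    return gates


def route(text, gates, k=2, embed=toy_embed):
    """Score each expert by dot product with its gate vector and keep the top k."""
    h = embed(text)
    scores = [sum(g * x for g, x in zip(gate, h)) for gate in gates]
    return sorted(range(len(gates)), key=lambda i: scores[i], reverse=True)[:k]


# Positive prompts copied from mergekit_moe_config.yml, one list per expert.
expert_prompts = [
    ["Answer this question from the ARC (Argument Reasoning Comprehension).",
     "Use common sense and logical reasoning skills."],
    ["Answer this question from the Winogrande test.",
     "Use advanced knowledge of culture and humanity"],
    ["answer questions with realistic and correct answers...with the honest truth"],
    ["Calculate the answer to this math problem",
     "My mathematical capabilities are strong, allowing me to handle complex mathematical queries",
     "solve for"],
]

gates = init_gates(expert_prompts)
print("experts picked:", route("solve for x in 2x + 3 = 11", gates))
```

In the real merge the embeddings would come from the base model rather than letter counts, so the routing reflects what the prompts mean instead of how they are spelled.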
model.safetensors.index.json ADDED
@@ -0,0 +1 @@
+ {"metadata": {"mergekit_version": "0.0.3.2"}, "weight_map": {"model.embed_tokens.weight": "model-00001-of-00005.safetensors", "model.norm.weight": "model-00001-of-00005.safetensors", "lm_head.weight": "model-00001-of-00005.safetensors", "model.layers.0.input_layernorm.weight": "model-00001-of-00005.safetensors", "model.layers.1.input_layernorm.weight": "model-00001-of-00005.safetensors", "model.layers.2.input_layernorm.weight": "model-00001-of-00005.safetensors", "model.layers.3.input_layernorm.weight": "model-00001-of-00005.safetensors", "model.layers.4.input_layernorm.weight": "model-00001-of-00005.safetensors", "model.layers.5.input_layernorm.weight": "model-00001-of-00005.safetensors", "model.layers.6.input_layernorm.weight": "model-00001-of-00005.safetensors", "model.layers.7.input_layernorm.weight": "model-00001-of-00005.safetensors", "model.layers.8.input_layernorm.weight": "model-00001-of-00005.safetensors", "model.layers.9.input_layernorm.weight": "model-00001-of-00005.safetensors", "model.layers.10.input_layernorm.weight": "model-00001-of-00005.safetensors", "model.layers.11.input_layernorm.weight": "model-00001-of-00005.safetensors", "model.layers.12.input_layernorm.weight": "model-00001-of-00005.safetensors", "model.layers.13.input_layernorm.weight": "model-00001-of-00005.safetensors", "model.layers.14.input_layernorm.weight": "model-00001-of-00005.safetensors", "model.layers.15.input_layernorm.weight": "model-00001-of-00005.safetensors", "model.layers.16.input_layernorm.weight": "model-00001-of-00005.safetensors", "model.layers.17.input_layernorm.weight": "model-00001-of-00005.safetensors", "model.layers.18.input_layernorm.weight": "model-00001-of-00005.safetensors", "model.layers.19.input_layernorm.weight": "model-00001-of-00005.safetensors", "model.layers.20.input_layernorm.weight": "model-00001-of-00005.safetensors", "model.layers.21.input_layernorm.weight": "model-00001-of-00005.safetensors", "model.layers.22.input_layernorm.weight": 
"model-00001-of-00005.safetensors", "model.layers.23.input_layernorm.weight": "model-00001-of-00005.safetensors", "model.layers.24.input_layernorm.weight": "model-00001-of-00005.safetensors", "model.layers.25.input_layernorm.weight": "model-00001-of-00005.safetensors", "model.layers.26.input_layernorm.weight": "model-00001-of-00005.safetensors", "model.layers.27.input_layernorm.weight": "model-00001-of-00005.safetensors", "model.layers.28.input_layernorm.weight": "model-00001-of-00005.safetensors", "model.layers.29.input_layernorm.weight": "model-00001-of-00005.safetensors", "model.layers.30.input_layernorm.weight": "model-00001-of-00005.safetensors", "model.layers.31.input_layernorm.weight": "model-00001-of-00005.safetensors", "model.layers.0.block_sparse_moe.experts.0.w3.weight": "model-00001-of-00005.safetensors", "model.layers.0.block_sparse_moe.experts.1.w3.weight": "model-00001-of-00005.safetensors", "model.layers.0.block_sparse_moe.experts.2.w3.weight": "model-00001-of-00005.safetensors", "model.layers.0.block_sparse_moe.experts.3.w3.weight": "model-00001-of-00005.safetensors", "model.layers.1.block_sparse_moe.experts.0.w3.weight": "model-00001-of-00005.safetensors", "model.layers.1.block_sparse_moe.experts.1.w3.weight": "model-00001-of-00005.safetensors", "model.layers.1.block_sparse_moe.experts.2.w3.weight": "model-00001-of-00005.safetensors", "model.layers.1.block_sparse_moe.experts.3.w3.weight": "model-00001-of-00005.safetensors", "model.layers.2.block_sparse_moe.experts.0.w3.weight": "model-00001-of-00005.safetensors", "model.layers.2.block_sparse_moe.experts.1.w3.weight": "model-00001-of-00005.safetensors", "model.layers.2.block_sparse_moe.experts.2.w3.weight": "model-00001-of-00005.safetensors", "model.layers.2.block_sparse_moe.experts.3.w3.weight": "model-00001-of-00005.safetensors", "model.layers.3.block_sparse_moe.experts.0.w3.weight": "model-00001-of-00005.safetensors", "model.layers.3.block_sparse_moe.experts.1.w3.weight": 
"model-00001-of-00005.safetensors", "model.layers.3.block_sparse_moe.experts.2.w3.weight": "model-00001-of-00005.safetensors", "model.layers.3.block_sparse_moe.experts.3.w3.weight": "model-00001-of-00005.safetensors", "model.layers.4.block_sparse_moe.experts.0.w3.weight": "model-00001-of-00005.safetensors", "model.layers.4.block_sparse_moe.experts.1.w3.weight": "model-00001-of-00005.safetensors", "model.layers.4.block_sparse_moe.experts.2.w3.weight": "model-00001-of-00005.safetensors", "model.layers.4.block_sparse_moe.experts.3.w3.weight": "model-00001-of-00005.safetensors", "model.layers.5.block_sparse_moe.experts.0.w3.weight": "model-00001-of-00005.safetensors", "model.layers.5.block_sparse_moe.experts.1.w3.weight": "model-00001-of-00005.safetensors", "model.layers.5.block_sparse_moe.experts.2.w3.weight": "model-00001-of-00005.safetensors", "model.layers.5.block_sparse_moe.experts.3.w3.weight": "model-00001-of-00005.safetensors", "model.layers.6.block_sparse_moe.experts.0.w3.weight": "model-00001-of-00005.safetensors", "model.layers.6.block_sparse_moe.experts.1.w3.weight": "model-00001-of-00005.safetensors", "model.layers.6.block_sparse_moe.experts.2.w3.weight": "model-00001-of-00005.safetensors", "model.layers.6.block_sparse_moe.experts.3.w3.weight": "model-00001-of-00005.safetensors", "model.layers.7.block_sparse_moe.experts.0.w3.weight": "model-00001-of-00005.safetensors", "model.layers.7.block_sparse_moe.experts.1.w3.weight": "model-00001-of-00005.safetensors", "model.layers.7.block_sparse_moe.experts.2.w3.weight": "model-00001-of-00005.safetensors", "model.layers.7.block_sparse_moe.experts.3.w3.weight": "model-00001-of-00005.safetensors", "model.layers.8.block_sparse_moe.experts.0.w3.weight": "model-00001-of-00005.safetensors", "model.layers.8.block_sparse_moe.experts.1.w3.weight": "model-00001-of-00005.safetensors", "model.layers.8.block_sparse_moe.experts.2.w3.weight": "model-00001-of-00005.safetensors", 
"model.layers.8.block_sparse_moe.experts.3.w3.weight": "model-00001-of-00005.safetensors", "model.layers.9.block_sparse_moe.experts.0.w3.weight": "model-00001-of-00005.safetensors", "model.layers.9.block_sparse_moe.experts.1.w3.weight": "model-00001-of-00005.safetensors", "model.layers.9.block_sparse_moe.experts.2.w3.weight": "model-00001-of-00005.safetensors", "model.layers.9.block_sparse_moe.experts.3.w3.weight": "model-00001-of-00005.safetensors", "model.layers.10.block_sparse_moe.experts.0.w3.weight": "model-00001-of-00005.safetensors", "model.layers.10.block_sparse_moe.experts.1.w3.weight": "model-00001-of-00005.safetensors", "model.layers.10.block_sparse_moe.experts.2.w3.weight": "model-00001-of-00005.safetensors", "model.layers.10.block_sparse_moe.experts.3.w3.weight": "model-00001-of-00005.safetensors", "model.layers.11.block_sparse_moe.experts.0.w3.weight": "model-00001-of-00005.safetensors", "model.layers.11.block_sparse_moe.experts.1.w3.weight": "model-00001-of-00005.safetensors", "model.layers.11.block_sparse_moe.experts.2.w3.weight": "model-00001-of-00005.safetensors", "model.layers.11.block_sparse_moe.experts.3.w3.weight": "model-00001-of-00005.safetensors", "model.layers.12.block_sparse_moe.experts.0.w3.weight": "model-00001-of-00005.safetensors", "model.layers.12.block_sparse_moe.experts.1.w3.weight": "model-00001-of-00005.safetensors", "model.layers.12.block_sparse_moe.experts.2.w3.weight": "model-00001-of-00005.safetensors", "model.layers.12.block_sparse_moe.experts.3.w3.weight": "model-00001-of-00005.safetensors", "model.layers.13.block_sparse_moe.experts.0.w3.weight": "model-00001-of-00005.safetensors", "model.layers.13.block_sparse_moe.experts.1.w3.weight": "model-00001-of-00005.safetensors", "model.layers.13.block_sparse_moe.experts.2.w3.weight": "model-00001-of-00005.safetensors", "model.layers.13.block_sparse_moe.experts.3.w3.weight": "model-00001-of-00005.safetensors", "model.layers.14.block_sparse_moe.experts.0.w3.weight": 
"model-00001-of-00005.safetensors", "model.layers.14.block_sparse_moe.experts.1.w3.weight": "model-00001-of-00005.safetensors", "model.layers.14.block_sparse_moe.experts.2.w3.weight": "model-00001-of-00005.safetensors", "model.layers.14.block_sparse_moe.experts.3.w3.weight": "model-00001-of-00005.safetensors", "model.layers.15.block_sparse_moe.experts.0.w3.weight": "model-00001-of-00005.safetensors", "model.layers.15.block_sparse_moe.experts.1.w3.weight": "model-00001-of-00005.safetensors", "model.layers.15.block_sparse_moe.experts.2.w3.weight": "model-00001-of-00005.safetensors", "model.layers.15.block_sparse_moe.experts.3.w3.weight": "model-00001-of-00005.safetensors", "model.layers.16.block_sparse_moe.experts.0.w3.weight": "model-00001-of-00005.safetensors", "model.layers.16.block_sparse_moe.experts.1.w3.weight": "model-00001-of-00005.safetensors", "model.layers.16.block_sparse_moe.experts.2.w3.weight": "model-00001-of-00005.safetensors", "model.layers.16.block_sparse_moe.experts.3.w3.weight": "model-00001-of-00005.safetensors", "model.layers.17.block_sparse_moe.experts.0.w3.weight": "model-00001-of-00005.safetensors", "model.layers.17.block_sparse_moe.experts.1.w3.weight": "model-00001-of-00005.safetensors", "model.layers.17.block_sparse_moe.experts.2.w3.weight": "model-00001-of-00005.safetensors", "model.layers.17.block_sparse_moe.experts.3.w3.weight": "model-00001-of-00005.safetensors", "model.layers.18.block_sparse_moe.experts.0.w3.weight": "model-00001-of-00005.safetensors", "model.layers.18.block_sparse_moe.experts.1.w3.weight": "model-00001-of-00005.safetensors", "model.layers.18.block_sparse_moe.experts.2.w3.weight": "model-00001-of-00005.safetensors", "model.layers.18.block_sparse_moe.experts.3.w3.weight": "model-00001-of-00005.safetensors", "model.layers.19.block_sparse_moe.experts.0.w3.weight": "model-00001-of-00005.safetensors", "model.layers.19.block_sparse_moe.experts.1.w3.weight": "model-00001-of-00005.safetensors", 
"model.layers.19.block_sparse_moe.experts.2.w3.weight": "model-00001-of-00005.safetensors", "model.layers.19.block_sparse_moe.experts.3.w3.weight": "model-00001-of-00005.safetensors", "model.layers.20.block_sparse_moe.experts.0.w3.weight": "model-00002-of-00005.safetensors", "model.layers.20.block_sparse_moe.experts.1.w3.weight": "model-00002-of-00005.safetensors", "model.layers.20.block_sparse_moe.experts.2.w3.weight": "model-00002-of-00005.safetensors", "model.layers.20.block_sparse_moe.experts.3.w3.weight": "model-00002-of-00005.safetensors", "model.layers.21.block_sparse_moe.experts.0.w3.weight": "model-00002-of-00005.safetensors", "model.layers.21.block_sparse_moe.experts.1.w3.weight": "model-00002-of-00005.safetensors", "model.layers.21.block_sparse_moe.experts.2.w3.weight": "model-00002-of-00005.safetensors", "model.layers.21.block_sparse_moe.experts.3.w3.weight": "model-00002-of-00005.safetensors", "model.layers.22.block_sparse_moe.experts.0.w3.weight": "model-00002-of-00005.safetensors", "model.layers.22.block_sparse_moe.experts.1.w3.weight": "model-00002-of-00005.safetensors", "model.layers.22.block_sparse_moe.experts.2.w3.weight": "model-00002-of-00005.safetensors", "model.layers.22.block_sparse_moe.experts.3.w3.weight": "model-00002-of-00005.safetensors", "model.layers.23.block_sparse_moe.experts.0.w3.weight": "model-00002-of-00005.safetensors", "model.layers.23.block_sparse_moe.experts.1.w3.weight": "model-00002-of-00005.safetensors", "model.layers.23.block_sparse_moe.experts.2.w3.weight": "model-00002-of-00005.safetensors", "model.layers.23.block_sparse_moe.experts.3.w3.weight": "model-00002-of-00005.safetensors", "model.layers.24.block_sparse_moe.experts.0.w3.weight": "model-00002-of-00005.safetensors", "model.layers.24.block_sparse_moe.experts.1.w3.weight": "model-00002-of-00005.safetensors", "model.layers.24.block_sparse_moe.experts.2.w3.weight": "model-00002-of-00005.safetensors", "model.layers.24.block_sparse_moe.experts.3.w3.weight": 
"model-00002-of-00005.safetensors", "model.layers.25.block_sparse_moe.experts.0.w3.weight": "model-00002-of-00005.safetensors", "model.layers.25.block_sparse_moe.experts.1.w3.weight": "model-00002-of-00005.safetensors", "model.layers.25.block_sparse_moe.experts.2.w3.weight": "model-00002-of-00005.safetensors", "model.layers.25.block_sparse_moe.experts.3.w3.weight": "model-00002-of-00005.safetensors", "model.layers.26.block_sparse_moe.experts.0.w3.weight": "model-00002-of-00005.safetensors", "model.layers.26.block_sparse_moe.experts.1.w3.weight": "model-00002-of-00005.safetensors", "model.layers.26.block_sparse_moe.experts.2.w3.weight": "model-00002-of-00005.safetensors", "model.layers.26.block_sparse_moe.experts.3.w3.weight": "model-00002-of-00005.safetensors", "model.layers.27.block_sparse_moe.experts.0.w3.weight": "model-00002-of-00005.safetensors", "model.layers.27.block_sparse_moe.experts.1.w3.weight": "model-00002-of-00005.safetensors", "model.layers.27.block_sparse_moe.experts.2.w3.weight": "model-00002-of-00005.safetensors", "model.layers.27.block_sparse_moe.experts.3.w3.weight": "model-00002-of-00005.safetensors", "model.layers.28.block_sparse_moe.experts.0.w3.weight": "model-00002-of-00005.safetensors", "model.layers.28.block_sparse_moe.experts.1.w3.weight": "model-00002-of-00005.safetensors", "model.layers.28.block_sparse_moe.experts.2.w3.weight": "model-00002-of-00005.safetensors", "model.layers.28.block_sparse_moe.experts.3.w3.weight": "model-00002-of-00005.safetensors", "model.layers.29.block_sparse_moe.experts.0.w3.weight": "model-00002-of-00005.safetensors", "model.layers.29.block_sparse_moe.experts.1.w3.weight": "model-00002-of-00005.safetensors", "model.layers.29.block_sparse_moe.experts.2.w3.weight": "model-00002-of-00005.safetensors", "model.layers.29.block_sparse_moe.experts.3.w3.weight": "model-00002-of-00005.safetensors", "model.layers.30.block_sparse_moe.experts.0.w3.weight": "model-00002-of-00005.safetensors", 
"model.layers.30.block_sparse_moe.experts.1.w3.weight": "model-00002-of-00005.safetensors", "model.layers.30.block_sparse_moe.experts.2.w3.weight": "model-00002-of-00005.safetensors", "model.layers.30.block_sparse_moe.experts.3.w3.weight": "model-00002-of-00005.safetensors", "model.layers.31.block_sparse_moe.experts.0.w3.weight": "model-00002-of-00005.safetensors", "model.layers.31.block_sparse_moe.experts.1.w3.weight": "model-00002-of-00005.safetensors", "model.layers.31.block_sparse_moe.experts.2.w3.weight": "model-00002-of-00005.safetensors", "model.layers.31.block_sparse_moe.experts.3.w3.weight": "model-00002-of-00005.safetensors", "model.layers.0.block_sparse_moe.experts.0.w2.weight": "model-00002-of-00005.safetensors", "model.layers.0.block_sparse_moe.experts.1.w2.weight": "model-00002-of-00005.safetensors", "model.layers.0.block_sparse_moe.experts.2.w2.weight": "model-00002-of-00005.safetensors", "model.layers.0.block_sparse_moe.experts.3.w2.weight": "model-00002-of-00005.safetensors", "model.layers.1.block_sparse_moe.experts.0.w2.weight": "model-00002-of-00005.safetensors", "model.layers.1.block_sparse_moe.experts.1.w2.weight": "model-00002-of-00005.safetensors", "model.layers.1.block_sparse_moe.experts.2.w2.weight": "model-00002-of-00005.safetensors", "model.layers.1.block_sparse_moe.experts.3.w2.weight": "model-00002-of-00005.safetensors", "model.layers.2.block_sparse_moe.experts.0.w2.weight": "model-00002-of-00005.safetensors", "model.layers.2.block_sparse_moe.experts.1.w2.weight": "model-00002-of-00005.safetensors", "model.layers.2.block_sparse_moe.experts.2.w2.weight": "model-00002-of-00005.safetensors", "model.layers.2.block_sparse_moe.experts.3.w2.weight": "model-00002-of-00005.safetensors", "model.layers.3.block_sparse_moe.experts.0.w2.weight": "model-00002-of-00005.safetensors", "model.layers.3.block_sparse_moe.experts.1.w2.weight": "model-00002-of-00005.safetensors", "model.layers.3.block_sparse_moe.experts.2.w2.weight": 
"model-00002-of-00005.safetensors", "model.layers.3.block_sparse_moe.experts.3.w2.weight": "model-00002-of-00005.safetensors", "model.layers.4.block_sparse_moe.experts.0.w2.weight": "model-00002-of-00005.safetensors", "model.layers.4.block_sparse_moe.experts.1.w2.weight": "model-00002-of-00005.safetensors", "model.layers.4.block_sparse_moe.experts.2.w2.weight": "model-00002-of-00005.safetensors", "model.layers.4.block_sparse_moe.experts.3.w2.weight": "model-00002-of-00005.safetensors", "model.layers.5.block_sparse_moe.experts.0.w2.weight": "model-00002-of-00005.safetensors", "model.layers.5.block_sparse_moe.experts.1.w2.weight": "model-00002-of-00005.safetensors", "model.layers.5.block_sparse_moe.experts.2.w2.weight": "model-00002-of-00005.safetensors", "model.layers.5.block_sparse_moe.experts.3.w2.weight": "model-00002-of-00005.safetensors", "model.layers.6.block_sparse_moe.experts.0.w2.weight": "model-00002-of-00005.safetensors", "model.layers.6.block_sparse_moe.experts.1.w2.weight": "model-00002-of-00005.safetensors", "model.layers.6.block_sparse_moe.experts.2.w2.weight": "model-00002-of-00005.safetensors", "model.layers.6.block_sparse_moe.experts.3.w2.weight": "model-00002-of-00005.safetensors", "model.layers.7.block_sparse_moe.experts.0.w2.weight": "model-00002-of-00005.safetensors", "model.layers.7.block_sparse_moe.experts.1.w2.weight": "model-00002-of-00005.safetensors", "model.layers.7.block_sparse_moe.experts.2.w2.weight": "model-00002-of-00005.safetensors", "model.layers.7.block_sparse_moe.experts.3.w2.weight": "model-00002-of-00005.safetensors", "model.layers.8.block_sparse_moe.experts.0.w2.weight": "model-00002-of-00005.safetensors", "model.layers.8.block_sparse_moe.experts.1.w2.weight": "model-00002-of-00005.safetensors", "model.layers.8.block_sparse_moe.experts.2.w2.weight": "model-00002-of-00005.safetensors", "model.layers.8.block_sparse_moe.experts.3.w2.weight": "model-00002-of-00005.safetensors", 
"model.layers.9.block_sparse_moe.experts.0.w2.weight": "model-00002-of-00005.safetensors", "model.layers.9.block_sparse_moe.experts.1.w2.weight": "model-00003-of-00005.safetensors", "model.layers.9.block_sparse_moe.experts.2.w2.weight": "model-00003-of-00005.safetensors", "model.layers.9.block_sparse_moe.experts.3.w2.weight": "model-00003-of-00005.safetensors", "model.layers.10.block_sparse_moe.experts.0.w2.weight": "model-00003-of-00005.safetensors", "model.layers.10.block_sparse_moe.experts.1.w2.weight": "model-00003-of-00005.safetensors", "model.layers.10.block_sparse_moe.experts.2.w2.weight": "model-00003-of-00005.safetensors", "model.layers.10.block_sparse_moe.experts.3.w2.weight": "model-00003-of-00005.safetensors", "model.layers.11.block_sparse_moe.experts.0.w2.weight": "model-00003-of-00005.safetensors", "model.layers.11.block_sparse_moe.experts.1.w2.weight": "model-00003-of-00005.safetensors", "model.layers.11.block_sparse_moe.experts.2.w2.weight": "model-00003-of-00005.safetensors", "model.layers.11.block_sparse_moe.experts.3.w2.weight": "model-00003-of-00005.safetensors", "model.layers.12.block_sparse_moe.experts.0.w2.weight": "model-00003-of-00005.safetensors", "model.layers.12.block_sparse_moe.experts.1.w2.weight": "model-00003-of-00005.safetensors", "model.layers.12.block_sparse_moe.experts.2.w2.weight": "model-00003-of-00005.safetensors", "model.layers.12.block_sparse_moe.experts.3.w2.weight": "model-00003-of-00005.safetensors", "model.layers.13.block_sparse_moe.experts.0.w2.weight": "model-00003-of-00005.safetensors", "model.layers.13.block_sparse_moe.experts.1.w2.weight": "model-00003-of-00005.safetensors", "model.layers.13.block_sparse_moe.experts.2.w2.weight": "model-00003-of-00005.safetensors", "model.layers.13.block_sparse_moe.experts.3.w2.weight": "model-00003-of-00005.safetensors", "model.layers.14.block_sparse_moe.experts.0.w2.weight": "model-00003-of-00005.safetensors", "model.layers.14.block_sparse_moe.experts.1.w2.weight": 
"model-00003-of-00005.safetensors", "model.layers.14.block_sparse_moe.experts.2.w2.weight": "model-00003-of-00005.safetensors", "model.layers.14.block_sparse_moe.experts.3.w2.weight": "model-00003-of-00005.safetensors", "model.layers.15.block_sparse_moe.experts.0.w2.weight": "model-00003-of-00005.safetensors", "model.layers.15.block_sparse_moe.experts.1.w2.weight": "model-00003-of-00005.safetensors", "model.layers.15.block_sparse_moe.experts.2.w2.weight": "model-00003-of-00005.safetensors", "model.layers.15.block_sparse_moe.experts.3.w2.weight": "model-00003-of-00005.safetensors", "model.layers.16.block_sparse_moe.experts.0.w2.weight": "model-00003-of-00005.safetensors", "model.layers.16.block_sparse_moe.experts.1.w2.weight": "model-00003-of-00005.safetensors", "model.layers.16.block_sparse_moe.experts.2.w2.weight": "model-00003-of-00005.safetensors", "model.layers.16.block_sparse_moe.experts.3.w2.weight": "model-00003-of-00005.safetensors", "model.layers.17.block_sparse_moe.experts.0.w2.weight": "model-00003-of-00005.safetensors", "model.layers.17.block_sparse_moe.experts.1.w2.weight": "model-00003-of-00005.safetensors", "model.layers.17.block_sparse_moe.experts.2.w2.weight": "model-00003-of-00005.safetensors", "model.layers.17.block_sparse_moe.experts.3.w2.weight": "model-00003-of-00005.safetensors", "model.layers.18.block_sparse_moe.experts.0.w2.weight": "model-00003-of-00005.safetensors", "model.layers.18.block_sparse_moe.experts.1.w2.weight": "model-00003-of-00005.safetensors", "model.layers.18.block_sparse_moe.experts.2.w2.weight": "model-00003-of-00005.safetensors", "model.layers.18.block_sparse_moe.experts.3.w2.weight": "model-00003-of-00005.safetensors", "model.layers.19.block_sparse_moe.experts.0.w2.weight": "model-00003-of-00005.safetensors", "model.layers.19.block_sparse_moe.experts.1.w2.weight": "model-00003-of-00005.safetensors", "model.layers.19.block_sparse_moe.experts.2.w2.weight": "model-00003-of-00005.safetensors", 
"model.layers.19.block_sparse_moe.experts.3.w2.weight": "model-00003-of-00005.safetensors", "model.layers.20.block_sparse_moe.experts.0.w2.weight": "model-00003-of-00005.safetensors", "model.layers.20.block_sparse_moe.experts.1.w2.weight": "model-00003-of-00005.safetensors", "model.layers.20.block_sparse_moe.experts.2.w2.weight": "model-00003-of-00005.safetensors", "model.layers.20.block_sparse_moe.experts.3.w2.weight": "model-00003-of-00005.safetensors", "model.layers.21.block_sparse_moe.experts.0.w2.weight": "model-00003-of-00005.safetensors", "model.layers.21.block_sparse_moe.experts.1.w2.weight": "model-00003-of-00005.safetensors", "model.layers.21.block_sparse_moe.experts.2.w2.weight": "model-00003-of-00005.safetensors", "model.layers.21.block_sparse_moe.experts.3.w2.weight": "model-00003-of-00005.safetensors", "model.layers.22.block_sparse_moe.experts.0.w2.weight": "model-00003-of-00005.safetensors", "model.layers.22.block_sparse_moe.experts.1.w2.weight": "model-00003-of-00005.safetensors", "model.layers.22.block_sparse_moe.experts.2.w2.weight": "model-00003-of-00005.safetensors", "model.layers.22.block_sparse_moe.experts.3.w2.weight": "model-00003-of-00005.safetensors", "model.layers.23.block_sparse_moe.experts.0.w2.weight": "model-00003-of-00005.safetensors", "model.layers.23.block_sparse_moe.experts.1.w2.weight": "model-00003-of-00005.safetensors", "model.layers.23.block_sparse_moe.experts.2.w2.weight": "model-00003-of-00005.safetensors", "model.layers.23.block_sparse_moe.experts.3.w2.weight": "model-00003-of-00005.safetensors", "model.layers.24.block_sparse_moe.experts.0.w2.weight": "model-00003-of-00005.safetensors", "model.layers.24.block_sparse_moe.experts.1.w2.weight": "model-00003-of-00005.safetensors", "model.layers.24.block_sparse_moe.experts.2.w2.weight": "model-00003-of-00005.safetensors", "model.layers.24.block_sparse_moe.experts.3.w2.weight": "model-00003-of-00005.safetensors", "model.layers.25.block_sparse_moe.experts.0.w2.weight": 
"model-00003-of-00005.safetensors", "model.layers.25.block_sparse_moe.experts.1.w2.weight": "model-00003-of-00005.safetensors", "model.layers.25.block_sparse_moe.experts.2.w2.weight": "model-00003-of-00005.safetensors", "model.layers.25.block_sparse_moe.experts.3.w2.weight": "model-00003-of-00005.safetensors", "model.layers.26.block_sparse_moe.experts.0.w2.weight": "model-00003-of-00005.safetensors", "model.layers.26.block_sparse_moe.experts.1.w2.weight": "model-00003-of-00005.safetensors", "model.layers.26.block_sparse_moe.experts.2.w2.weight": "model-00003-of-00005.safetensors", "model.layers.26.block_sparse_moe.experts.3.w2.weight": "model-00003-of-00005.safetensors", "model.layers.27.block_sparse_moe.experts.0.w2.weight": "model-00003-of-00005.safetensors", "model.layers.27.block_sparse_moe.experts.1.w2.weight": "model-00003-of-00005.safetensors", "model.layers.27.block_sparse_moe.experts.2.w2.weight": "model-00003-of-00005.safetensors", "model.layers.27.block_sparse_moe.experts.3.w2.weight": "model-00003-of-00005.safetensors", "model.layers.28.block_sparse_moe.experts.0.w2.weight": "model-00003-of-00005.safetensors", "model.layers.28.block_sparse_moe.experts.1.w2.weight": "model-00003-of-00005.safetensors", "model.layers.28.block_sparse_moe.experts.2.w2.weight": "model-00003-of-00005.safetensors", "model.layers.28.block_sparse_moe.experts.3.w2.weight": "model-00003-of-00005.safetensors", "model.layers.29.block_sparse_moe.experts.0.w2.weight": "model-00003-of-00005.safetensors", "model.layers.29.block_sparse_moe.experts.1.w2.weight": "model-00003-of-00005.safetensors", "model.layers.29.block_sparse_moe.experts.2.w2.weight": "model-00003-of-00005.safetensors", "model.layers.29.block_sparse_moe.experts.3.w2.weight": "model-00003-of-00005.safetensors", "model.layers.30.block_sparse_moe.experts.0.w2.weight": "model-00003-of-00005.safetensors", "model.layers.30.block_sparse_moe.experts.1.w2.weight": "model-00003-of-00005.safetensors", 
"model.layers.30.block_sparse_moe.experts.2.w2.weight": "model-00004-of-00005.safetensors", "model.layers.30.block_sparse_moe.experts.3.w2.weight": "model-00004-of-00005.safetensors", "model.layers.31.block_sparse_moe.experts.0.w2.weight": "model-00004-of-00005.safetensors", "model.layers.31.block_sparse_moe.experts.1.w2.weight": "model-00004-of-00005.safetensors", "model.layers.31.block_sparse_moe.experts.2.w2.weight": "model-00004-of-00005.safetensors", "model.layers.31.block_sparse_moe.experts.3.w2.weight": "model-00004-of-00005.safetensors", "model.layers.0.block_sparse_moe.experts.0.w1.weight": "model-00004-of-00005.safetensors", "model.layers.0.block_sparse_moe.experts.1.w1.weight": "model-00004-of-00005.safetensors", "model.layers.0.block_sparse_moe.experts.2.w1.weight": "model-00004-of-00005.safetensors", "model.layers.0.block_sparse_moe.experts.3.w1.weight": "model-00004-of-00005.safetensors", "model.layers.1.block_sparse_moe.experts.0.w1.weight": "model-00004-of-00005.safetensors", "model.layers.1.block_sparse_moe.experts.1.w1.weight": "model-00004-of-00005.safetensors", "model.layers.1.block_sparse_moe.experts.2.w1.weight": "model-00004-of-00005.safetensors", "model.layers.1.block_sparse_moe.experts.3.w1.weight": "model-00004-of-00005.safetensors", "model.layers.2.block_sparse_moe.experts.0.w1.weight": "model-00004-of-00005.safetensors", "model.layers.2.block_sparse_moe.experts.1.w1.weight": "model-00004-of-00005.safetensors", "model.layers.2.block_sparse_moe.experts.2.w1.weight": "model-00004-of-00005.safetensors", "model.layers.2.block_sparse_moe.experts.3.w1.weight": "model-00004-of-00005.safetensors", "model.layers.3.block_sparse_moe.experts.0.w1.weight": "model-00004-of-00005.safetensors", "model.layers.3.block_sparse_moe.experts.1.w1.weight": "model-00004-of-00005.safetensors", "model.layers.3.block_sparse_moe.experts.2.w1.weight": "model-00004-of-00005.safetensors", "model.layers.3.block_sparse_moe.experts.3.w1.weight": 
"model-00004-of-00005.safetensors", "model.layers.4.block_sparse_moe.experts.0.w1.weight": "model-00004-of-00005.safetensors", "model.layers.4.block_sparse_moe.experts.1.w1.weight": "model-00004-of-00005.safetensors", "model.layers.4.block_sparse_moe.experts.2.w1.weight": "model-00004-of-00005.safetensors", "model.layers.4.block_sparse_moe.experts.3.w1.weight": "model-00004-of-00005.safetensors", "model.layers.5.block_sparse_moe.experts.0.w1.weight": "model-00004-of-00005.safetensors", "model.layers.5.block_sparse_moe.experts.1.w1.weight": "model-00004-of-00005.safetensors", "model.layers.5.block_sparse_moe.experts.2.w1.weight": "model-00004-of-00005.safetensors", "model.layers.5.block_sparse_moe.experts.3.w1.weight": "model-00004-of-00005.safetensors", "model.layers.6.block_sparse_moe.experts.0.w1.weight": "model-00004-of-00005.safetensors", "model.layers.6.block_sparse_moe.experts.1.w1.weight": "model-00004-of-00005.safetensors", "model.layers.6.block_sparse_moe.experts.2.w1.weight": "model-00004-of-00005.safetensors", "model.layers.6.block_sparse_moe.experts.3.w1.weight": "model-00004-of-00005.safetensors", "model.layers.7.block_sparse_moe.experts.0.w1.weight": "model-00004-of-00005.safetensors", "model.layers.7.block_sparse_moe.experts.1.w1.weight": "model-00004-of-00005.safetensors", "model.layers.7.block_sparse_moe.experts.2.w1.weight": "model-00004-of-00005.safetensors", "model.layers.7.block_sparse_moe.experts.3.w1.weight": "model-00004-of-00005.safetensors", "model.layers.8.block_sparse_moe.experts.0.w1.weight": "model-00004-of-00005.safetensors", "model.layers.8.block_sparse_moe.experts.1.w1.weight": "model-00004-of-00005.safetensors", "model.layers.8.block_sparse_moe.experts.2.w1.weight": "model-00004-of-00005.safetensors", "model.layers.8.block_sparse_moe.experts.3.w1.weight": "model-00004-of-00005.safetensors", "model.layers.9.block_sparse_moe.experts.0.w1.weight": "model-00004-of-00005.safetensors", 
"model.layers.9.block_sparse_moe.experts.1.w1.weight": "model-00004-of-00005.safetensors", "model.layers.9.block_sparse_moe.experts.2.w1.weight": "model-00004-of-00005.safetensors", "model.layers.9.block_sparse_moe.experts.3.w1.weight": "model-00004-of-00005.safetensors", "model.layers.10.block_sparse_moe.experts.0.w1.weight": "model-00004-of-00005.safetensors", "model.layers.10.block_sparse_moe.experts.1.w1.weight": "model-00004-of-00005.safetensors", "model.layers.10.block_sparse_moe.experts.2.w1.weight": "model-00004-of-00005.safetensors", "model.layers.10.block_sparse_moe.experts.3.w1.weight": "model-00004-of-00005.safetensors", "model.layers.11.block_sparse_moe.experts.0.w1.weight": "model-00004-of-00005.safetensors", "model.layers.11.block_sparse_moe.experts.1.w1.weight": "model-00004-of-00005.safetensors", "model.layers.11.block_sparse_moe.experts.2.w1.weight": "model-00004-of-00005.safetensors", "model.layers.11.block_sparse_moe.experts.3.w1.weight": "model-00004-of-00005.safetensors", "model.layers.12.block_sparse_moe.experts.0.w1.weight": "model-00004-of-00005.safetensors", "model.layers.12.block_sparse_moe.experts.1.w1.weight": "model-00004-of-00005.safetensors", "model.layers.12.block_sparse_moe.experts.2.w1.weight": "model-00004-of-00005.safetensors", "model.layers.12.block_sparse_moe.experts.3.w1.weight": "model-00004-of-00005.safetensors", "model.layers.13.block_sparse_moe.experts.0.w1.weight": "model-00004-of-00005.safetensors", "model.layers.13.block_sparse_moe.experts.1.w1.weight": "model-00004-of-00005.safetensors", "model.layers.13.block_sparse_moe.experts.2.w1.weight": "model-00004-of-00005.safetensors", "model.layers.13.block_sparse_moe.experts.3.w1.weight": "model-00004-of-00005.safetensors", "model.layers.14.block_sparse_moe.experts.0.w1.weight": "model-00004-of-00005.safetensors", "model.layers.14.block_sparse_moe.experts.1.w1.weight": "model-00004-of-00005.safetensors", "model.layers.14.block_sparse_moe.experts.2.w1.weight": 
"model-00004-of-00005.safetensors", "model.layers.14.block_sparse_moe.experts.3.w1.weight": "model-00004-of-00005.safetensors", "model.layers.15.block_sparse_moe.experts.0.w1.weight": "model-00004-of-00005.safetensors", "model.layers.15.block_sparse_moe.experts.1.w1.weight": "model-00004-of-00005.safetensors", "model.layers.15.block_sparse_moe.experts.2.w1.weight": "model-00004-of-00005.safetensors", "model.layers.15.block_sparse_moe.experts.3.w1.weight": "model-00004-of-00005.safetensors", "model.layers.16.block_sparse_moe.experts.0.w1.weight": "model-00004-of-00005.safetensors", "model.layers.16.block_sparse_moe.experts.1.w1.weight": "model-00004-of-00005.safetensors", "model.layers.16.block_sparse_moe.experts.2.w1.weight": "model-00004-of-00005.safetensors", "model.layers.16.block_sparse_moe.experts.3.w1.weight": "model-00004-of-00005.safetensors", "model.layers.17.block_sparse_moe.experts.0.w1.weight": "model-00004-of-00005.safetensors", "model.layers.17.block_sparse_moe.experts.1.w1.weight": "model-00004-of-00005.safetensors", "model.layers.17.block_sparse_moe.experts.2.w1.weight": "model-00004-of-00005.safetensors", "model.layers.17.block_sparse_moe.experts.3.w1.weight": "model-00004-of-00005.safetensors", "model.layers.18.block_sparse_moe.experts.0.w1.weight": "model-00004-of-00005.safetensors", "model.layers.18.block_sparse_moe.experts.1.w1.weight": "model-00004-of-00005.safetensors", "model.layers.18.block_sparse_moe.experts.2.w1.weight": "model-00004-of-00005.safetensors", "model.layers.18.block_sparse_moe.experts.3.w1.weight": "model-00004-of-00005.safetensors", "model.layers.19.block_sparse_moe.experts.0.w1.weight": "model-00004-of-00005.safetensors", "model.layers.19.block_sparse_moe.experts.1.w1.weight": "model-00004-of-00005.safetensors", "model.layers.19.block_sparse_moe.experts.2.w1.weight": "model-00004-of-00005.safetensors", "model.layers.19.block_sparse_moe.experts.3.w1.weight": "model-00005-of-00005.safetensors", 
"model.layers.20.block_sparse_moe.experts.0.w1.weight": "model-00005-of-00005.safetensors", "model.layers.20.block_sparse_moe.experts.1.w1.weight": "model-00005-of-00005.safetensors", "model.layers.20.block_sparse_moe.experts.2.w1.weight": "model-00005-of-00005.safetensors", "model.layers.20.block_sparse_moe.experts.3.w1.weight": "model-00005-of-00005.safetensors", "model.layers.21.block_sparse_moe.experts.0.w1.weight": "model-00005-of-00005.safetensors", "model.layers.21.block_sparse_moe.experts.1.w1.weight": "model-00005-of-00005.safetensors", "model.layers.21.block_sparse_moe.experts.2.w1.weight": "model-00005-of-00005.safetensors", "model.layers.21.block_sparse_moe.experts.3.w1.weight": "model-00005-of-00005.safetensors", "model.layers.22.block_sparse_moe.experts.0.w1.weight": "model-00005-of-00005.safetensors", "model.layers.22.block_sparse_moe.experts.1.w1.weight": "model-00005-of-00005.safetensors", "model.layers.22.block_sparse_moe.experts.2.w1.weight": "model-00005-of-00005.safetensors", "model.layers.22.block_sparse_moe.experts.3.w1.weight": "model-00005-of-00005.safetensors", "model.layers.23.block_sparse_moe.experts.0.w1.weight": "model-00005-of-00005.safetensors", "model.layers.23.block_sparse_moe.experts.1.w1.weight": "model-00005-of-00005.safetensors", "model.layers.23.block_sparse_moe.experts.2.w1.weight": "model-00005-of-00005.safetensors", "model.layers.23.block_sparse_moe.experts.3.w1.weight": "model-00005-of-00005.safetensors", "model.layers.24.block_sparse_moe.experts.0.w1.weight": "model-00005-of-00005.safetensors", "model.layers.24.block_sparse_moe.experts.1.w1.weight": "model-00005-of-00005.safetensors", "model.layers.24.block_sparse_moe.experts.2.w1.weight": "model-00005-of-00005.safetensors", "model.layers.24.block_sparse_moe.experts.3.w1.weight": "model-00005-of-00005.safetensors", "model.layers.25.block_sparse_moe.experts.0.w1.weight": "model-00005-of-00005.safetensors", "model.layers.25.block_sparse_moe.experts.1.w1.weight": 
"model-00005-of-00005.safetensors", "model.layers.25.block_sparse_moe.experts.2.w1.weight": "model-00005-of-00005.safetensors", "model.layers.25.block_sparse_moe.experts.3.w1.weight": "model-00005-of-00005.safetensors", "model.layers.26.block_sparse_moe.experts.0.w1.weight": "model-00005-of-00005.safetensors", "model.layers.26.block_sparse_moe.experts.1.w1.weight": "model-00005-of-00005.safetensors", "model.layers.26.block_sparse_moe.experts.2.w1.weight": "model-00005-of-00005.safetensors", "model.layers.26.block_sparse_moe.experts.3.w1.weight": "model-00005-of-00005.safetensors", "model.layers.27.block_sparse_moe.experts.0.w1.weight": "model-00005-of-00005.safetensors", "model.layers.27.block_sparse_moe.experts.1.w1.weight": "model-00005-of-00005.safetensors", "model.layers.27.block_sparse_moe.experts.2.w1.weight": "model-00005-of-00005.safetensors", "model.layers.27.block_sparse_moe.experts.3.w1.weight": "model-00005-of-00005.safetensors", "model.layers.28.block_sparse_moe.experts.0.w1.weight": "model-00005-of-00005.safetensors", "model.layers.28.block_sparse_moe.experts.1.w1.weight": "model-00005-of-00005.safetensors", "model.layers.28.block_sparse_moe.experts.2.w1.weight": "model-00005-of-00005.safetensors", "model.layers.28.block_sparse_moe.experts.3.w1.weight": "model-00005-of-00005.safetensors", "model.layers.29.block_sparse_moe.experts.0.w1.weight": "model-00005-of-00005.safetensors", "model.layers.29.block_sparse_moe.experts.1.w1.weight": "model-00005-of-00005.safetensors", "model.layers.29.block_sparse_moe.experts.2.w1.weight": "model-00005-of-00005.safetensors", "model.layers.29.block_sparse_moe.experts.3.w1.weight": "model-00005-of-00005.safetensors", "model.layers.30.block_sparse_moe.experts.0.w1.weight": "model-00005-of-00005.safetensors", "model.layers.30.block_sparse_moe.experts.1.w1.weight": "model-00005-of-00005.safetensors", "model.layers.30.block_sparse_moe.experts.2.w1.weight": "model-00005-of-00005.safetensors", 
"model.layers.30.block_sparse_moe.experts.3.w1.weight": "model-00005-of-00005.safetensors", "model.layers.31.block_sparse_moe.experts.0.w1.weight": "model-00005-of-00005.safetensors", "model.layers.31.block_sparse_moe.experts.1.w1.weight": "model-00005-of-00005.safetensors", "model.layers.31.block_sparse_moe.experts.2.w1.weight": "model-00005-of-00005.safetensors", "model.layers.31.block_sparse_moe.experts.3.w1.weight": "model-00005-of-00005.safetensors", "model.layers.0.post_attention_layernorm.weight": "model-00005-of-00005.safetensors", "model.layers.1.post_attention_layernorm.weight": "model-00005-of-00005.safetensors", "model.layers.2.post_attention_layernorm.weight": "model-00005-of-00005.safetensors", "model.layers.3.post_attention_layernorm.weight": "model-00005-of-00005.safetensors", "model.layers.4.post_attention_layernorm.weight": "model-00005-of-00005.safetensors", "model.layers.5.post_attention_layernorm.weight": "model-00005-of-00005.safetensors", "model.layers.6.post_attention_layernorm.weight": "model-00005-of-00005.safetensors", "model.layers.7.post_attention_layernorm.weight": "model-00005-of-00005.safetensors", "model.layers.8.post_attention_layernorm.weight": "model-00005-of-00005.safetensors", "model.layers.9.post_attention_layernorm.weight": "model-00005-of-00005.safetensors", "model.layers.10.post_attention_layernorm.weight": "model-00005-of-00005.safetensors", "model.layers.11.post_attention_layernorm.weight": "model-00005-of-00005.safetensors", "model.layers.12.post_attention_layernorm.weight": "model-00005-of-00005.safetensors", "model.layers.13.post_attention_layernorm.weight": "model-00005-of-00005.safetensors", "model.layers.14.post_attention_layernorm.weight": "model-00005-of-00005.safetensors", "model.layers.15.post_attention_layernorm.weight": "model-00005-of-00005.safetensors", "model.layers.16.post_attention_layernorm.weight": "model-00005-of-00005.safetensors", "model.layers.17.post_attention_layernorm.weight": 
"model-00005-of-00005.safetensors", "model.layers.18.post_attention_layernorm.weight": "model-00005-of-00005.safetensors", "model.layers.19.post_attention_layernorm.weight": "model-00005-of-00005.safetensors", "model.layers.20.post_attention_layernorm.weight": "model-00005-of-00005.safetensors", "model.layers.21.post_attention_layernorm.weight": "model-00005-of-00005.safetensors", "model.layers.22.post_attention_layernorm.weight": "model-00005-of-00005.safetensors", "model.layers.23.post_attention_layernorm.weight": "model-00005-of-00005.safetensors", "model.layers.24.post_attention_layernorm.weight": "model-00005-of-00005.safetensors", "model.layers.25.post_attention_layernorm.weight": "model-00005-of-00005.safetensors", "model.layers.26.post_attention_layernorm.weight": "model-00005-of-00005.safetensors", "model.layers.27.post_attention_layernorm.weight": "model-00005-of-00005.safetensors", "model.layers.28.post_attention_layernorm.weight": "model-00005-of-00005.safetensors", "model.layers.29.post_attention_layernorm.weight": "model-00005-of-00005.safetensors", "model.layers.30.post_attention_layernorm.weight": "model-00005-of-00005.safetensors", "model.layers.31.post_attention_layernorm.weight": "model-00005-of-00005.safetensors", "model.layers.0.self_attn.q_proj.weight": "model-00005-of-00005.safetensors", "model.layers.1.self_attn.q_proj.weight": "model-00005-of-00005.safetensors", "model.layers.2.self_attn.q_proj.weight": "model-00005-of-00005.safetensors", "model.layers.3.self_attn.q_proj.weight": "model-00005-of-00005.safetensors", "model.layers.4.self_attn.q_proj.weight": "model-00005-of-00005.safetensors", "model.layers.5.self_attn.q_proj.weight": "model-00005-of-00005.safetensors", "model.layers.6.self_attn.q_proj.weight": "model-00005-of-00005.safetensors", "model.layers.7.self_attn.q_proj.weight": "model-00005-of-00005.safetensors", "model.layers.8.self_attn.q_proj.weight": "model-00005-of-00005.safetensors", "model.layers.9.self_attn.q_proj.weight": 
"model-00005-of-00005.safetensors", "model.layers.10.self_attn.q_proj.weight": "model-00005-of-00005.safetensors", "model.layers.11.self_attn.q_proj.weight": "model-00005-of-00005.safetensors", "model.layers.12.self_attn.q_proj.weight": "model-00005-of-00005.safetensors", "model.layers.13.self_attn.q_proj.weight": "model-00005-of-00005.safetensors", "model.layers.14.self_attn.q_proj.weight": "model-00005-of-00005.safetensors", "model.layers.15.self_attn.q_proj.weight": "model-00005-of-00005.safetensors", "model.layers.16.self_attn.q_proj.weight": "model-00005-of-00005.safetensors", "model.layers.17.self_attn.q_proj.weight": "model-00005-of-00005.safetensors", "model.layers.18.self_attn.q_proj.weight": "model-00005-of-00005.safetensors", "model.layers.19.self_attn.q_proj.weight": "model-00005-of-00005.safetensors", "model.layers.20.self_attn.q_proj.weight": "model-00005-of-00005.safetensors", "model.layers.21.self_attn.q_proj.weight": "model-00005-of-00005.safetensors", "model.layers.22.self_attn.q_proj.weight": "model-00005-of-00005.safetensors", "model.layers.23.self_attn.q_proj.weight": "model-00005-of-00005.safetensors", "model.layers.24.self_attn.q_proj.weight": "model-00005-of-00005.safetensors", "model.layers.25.self_attn.q_proj.weight": "model-00005-of-00005.safetensors", "model.layers.26.self_attn.q_proj.weight": "model-00005-of-00005.safetensors", "model.layers.27.self_attn.q_proj.weight": "model-00005-of-00005.safetensors", "model.layers.28.self_attn.q_proj.weight": "model-00005-of-00005.safetensors", "model.layers.29.self_attn.q_proj.weight": "model-00005-of-00005.safetensors", "model.layers.30.self_attn.q_proj.weight": "model-00005-of-00005.safetensors", "model.layers.31.self_attn.q_proj.weight": "model-00005-of-00005.safetensors", "model.layers.0.self_attn.k_proj.weight": "model-00005-of-00005.safetensors", "model.layers.1.self_attn.k_proj.weight": "model-00005-of-00005.safetensors", "model.layers.2.self_attn.k_proj.weight": 
"model-00005-of-00005.safetensors", "model.layers.3.self_attn.k_proj.weight": "model-00005-of-00005.safetensors", "model.layers.4.self_attn.k_proj.weight": "model-00005-of-00005.safetensors", "model.layers.5.self_attn.k_proj.weight": "model-00005-of-00005.safetensors", "model.layers.6.self_attn.k_proj.weight": "model-00005-of-00005.safetensors", "model.layers.7.self_attn.k_proj.weight": "model-00005-of-00005.safetensors", "model.layers.8.self_attn.k_proj.weight": "model-00005-of-00005.safetensors", "model.layers.9.self_attn.k_proj.weight": "model-00005-of-00005.safetensors", "model.layers.10.self_attn.k_proj.weight": "model-00005-of-00005.safetensors", "model.layers.11.self_attn.k_proj.weight": "model-00005-of-00005.safetensors", "model.layers.12.self_attn.k_proj.weight": "model-00005-of-00005.safetensors", "model.layers.13.self_attn.k_proj.weight": "model-00005-of-00005.safetensors", "model.layers.14.self_attn.k_proj.weight": "model-00005-of-00005.safetensors", "model.layers.15.self_attn.k_proj.weight": "model-00005-of-00005.safetensors", "model.layers.16.self_attn.k_proj.weight": "model-00005-of-00005.safetensors", "model.layers.17.self_attn.k_proj.weight": "model-00005-of-00005.safetensors", "model.layers.18.self_attn.k_proj.weight": "model-00005-of-00005.safetensors", "model.layers.19.self_attn.k_proj.weight": "model-00005-of-00005.safetensors", "model.layers.20.self_attn.k_proj.weight": "model-00005-of-00005.safetensors", "model.layers.21.self_attn.k_proj.weight": "model-00005-of-00005.safetensors", "model.layers.22.self_attn.k_proj.weight": "model-00005-of-00005.safetensors", "model.layers.23.self_attn.k_proj.weight": "model-00005-of-00005.safetensors", "model.layers.24.self_attn.k_proj.weight": "model-00005-of-00005.safetensors", "model.layers.25.self_attn.k_proj.weight": "model-00005-of-00005.safetensors", "model.layers.26.self_attn.k_proj.weight": "model-00005-of-00005.safetensors", "model.layers.27.self_attn.k_proj.weight": 
"model-00005-of-00005.safetensors", "model.layers.28.self_attn.k_proj.weight": "model-00005-of-00005.safetensors", "model.layers.29.self_attn.k_proj.weight": "model-00005-of-00005.safetensors", "model.layers.30.self_attn.k_proj.weight": "model-00005-of-00005.safetensors", "model.layers.31.self_attn.k_proj.weight": "model-00005-of-00005.safetensors", "model.layers.0.self_attn.v_proj.weight": "model-00005-of-00005.safetensors", "model.layers.1.self_attn.v_proj.weight": "model-00005-of-00005.safetensors", "model.layers.2.self_attn.v_proj.weight": "model-00005-of-00005.safetensors", "model.layers.3.self_attn.v_proj.weight": "model-00005-of-00005.safetensors", "model.layers.4.self_attn.v_proj.weight": "model-00005-of-00005.safetensors", "model.layers.5.self_attn.v_proj.weight": "model-00005-of-00005.safetensors", "model.layers.6.self_attn.v_proj.weight": "model-00005-of-00005.safetensors", "model.layers.7.self_attn.v_proj.weight": "model-00005-of-00005.safetensors", "model.layers.8.self_attn.v_proj.weight": "model-00005-of-00005.safetensors", "model.layers.9.self_attn.v_proj.weight": "model-00005-of-00005.safetensors", "model.layers.10.self_attn.v_proj.weight": "model-00005-of-00005.safetensors", "model.layers.11.self_attn.v_proj.weight": "model-00005-of-00005.safetensors", "model.layers.12.self_attn.v_proj.weight": "model-00005-of-00005.safetensors", "model.layers.13.self_attn.v_proj.weight": "model-00005-of-00005.safetensors", "model.layers.14.self_attn.v_proj.weight": "model-00005-of-00005.safetensors", "model.layers.15.self_attn.v_proj.weight": "model-00005-of-00005.safetensors", "model.layers.16.self_attn.v_proj.weight": "model-00005-of-00005.safetensors", "model.layers.17.self_attn.v_proj.weight": "model-00005-of-00005.safetensors", "model.layers.18.self_attn.v_proj.weight": "model-00005-of-00005.safetensors", "model.layers.19.self_attn.v_proj.weight": "model-00005-of-00005.safetensors", "model.layers.20.self_attn.v_proj.weight": 
"model-00005-of-00005.safetensors", "model.layers.21.self_attn.v_proj.weight": "model-00005-of-00005.safetensors", "model.layers.22.self_attn.v_proj.weight": "model-00005-of-00005.safetensors", "model.layers.23.self_attn.v_proj.weight": "model-00005-of-00005.safetensors", "model.layers.24.self_attn.v_proj.weight": "model-00005-of-00005.safetensors", "model.layers.25.self_attn.v_proj.weight": "model-00005-of-00005.safetensors", "model.layers.26.self_attn.v_proj.weight": "model-00005-of-00005.safetensors", "model.layers.27.self_attn.v_proj.weight": "model-00005-of-00005.safetensors", "model.layers.28.self_attn.v_proj.weight": "model-00005-of-00005.safetensors", "model.layers.29.self_attn.v_proj.weight": "model-00005-of-00005.safetensors", "model.layers.30.self_attn.v_proj.weight": "model-00005-of-00005.safetensors", "model.layers.31.self_attn.v_proj.weight": "model-00005-of-00005.safetensors", "model.layers.0.self_attn.o_proj.weight": "model-00005-of-00005.safetensors", "model.layers.1.self_attn.o_proj.weight": "model-00005-of-00005.safetensors", "model.layers.2.self_attn.o_proj.weight": "model-00005-of-00005.safetensors", "model.layers.3.self_attn.o_proj.weight": "model-00005-of-00005.safetensors", "model.layers.4.self_attn.o_proj.weight": "model-00005-of-00005.safetensors", "model.layers.5.self_attn.o_proj.weight": "model-00005-of-00005.safetensors", "model.layers.6.self_attn.o_proj.weight": "model-00005-of-00005.safetensors", "model.layers.7.self_attn.o_proj.weight": "model-00005-of-00005.safetensors", "model.layers.8.self_attn.o_proj.weight": "model-00005-of-00005.safetensors", "model.layers.9.self_attn.o_proj.weight": "model-00005-of-00005.safetensors", "model.layers.10.self_attn.o_proj.weight": "model-00005-of-00005.safetensors", "model.layers.11.self_attn.o_proj.weight": "model-00005-of-00005.safetensors", "model.layers.12.self_attn.o_proj.weight": "model-00005-of-00005.safetensors", "model.layers.13.self_attn.o_proj.weight": 
"model-00005-of-00005.safetensors", "model.layers.14.self_attn.o_proj.weight": "model-00005-of-00005.safetensors", "model.layers.15.self_attn.o_proj.weight": "model-00005-of-00005.safetensors", "model.layers.16.self_attn.o_proj.weight": "model-00005-of-00005.safetensors", "model.layers.17.self_attn.o_proj.weight": "model-00005-of-00005.safetensors", "model.layers.18.self_attn.o_proj.weight": "model-00005-of-00005.safetensors", "model.layers.19.self_attn.o_proj.weight": "model-00005-of-00005.safetensors", "model.layers.20.self_attn.o_proj.weight": "model-00005-of-00005.safetensors", "model.layers.21.self_attn.o_proj.weight": "model-00005-of-00005.safetensors", "model.layers.22.self_attn.o_proj.weight": "model-00005-of-00005.safetensors", "model.layers.23.self_attn.o_proj.weight": "model-00005-of-00005.safetensors", "model.layers.24.self_attn.o_proj.weight": "model-00005-of-00005.safetensors", "model.layers.25.self_attn.o_proj.weight": "model-00005-of-00005.safetensors", "model.layers.26.self_attn.o_proj.weight": "model-00005-of-00005.safetensors", "model.layers.27.self_attn.o_proj.weight": "model-00005-of-00005.safetensors", "model.layers.28.self_attn.o_proj.weight": "model-00005-of-00005.safetensors", "model.layers.29.self_attn.o_proj.weight": "model-00005-of-00005.safetensors", "model.layers.30.self_attn.o_proj.weight": "model-00005-of-00005.safetensors", "model.layers.31.self_attn.o_proj.weight": "model-00005-of-00005.safetensors", "model.layers.0.block_sparse_moe.gate.weight": "model-00005-of-00005.safetensors", "model.layers.1.block_sparse_moe.gate.weight": "model-00005-of-00005.safetensors", "model.layers.2.block_sparse_moe.gate.weight": "model-00005-of-00005.safetensors", "model.layers.3.block_sparse_moe.gate.weight": "model-00005-of-00005.safetensors", "model.layers.4.block_sparse_moe.gate.weight": "model-00005-of-00005.safetensors", "model.layers.5.block_sparse_moe.gate.weight": "model-00005-of-00005.safetensors", 
"model.layers.6.block_sparse_moe.gate.weight": "model-00005-of-00005.safetensors", "model.layers.7.block_sparse_moe.gate.weight": "model-00005-of-00005.safetensors", "model.layers.8.block_sparse_moe.gate.weight": "model-00005-of-00005.safetensors", "model.layers.9.block_sparse_moe.gate.weight": "model-00005-of-00005.safetensors", "model.layers.10.block_sparse_moe.gate.weight": "model-00005-of-00005.safetensors", "model.layers.11.block_sparse_moe.gate.weight": "model-00005-of-00005.safetensors", "model.layers.12.block_sparse_moe.gate.weight": "model-00005-of-00005.safetensors", "model.layers.13.block_sparse_moe.gate.weight": "model-00005-of-00005.safetensors", "model.layers.14.block_sparse_moe.gate.weight": "model-00005-of-00005.safetensors", "model.layers.15.block_sparse_moe.gate.weight": "model-00005-of-00005.safetensors", "model.layers.16.block_sparse_moe.gate.weight": "model-00005-of-00005.safetensors", "model.layers.17.block_sparse_moe.gate.weight": "model-00005-of-00005.safetensors", "model.layers.18.block_sparse_moe.gate.weight": "model-00005-of-00005.safetensors", "model.layers.19.block_sparse_moe.gate.weight": "model-00005-of-00005.safetensors", "model.layers.20.block_sparse_moe.gate.weight": "model-00005-of-00005.safetensors", "model.layers.21.block_sparse_moe.gate.weight": "model-00005-of-00005.safetensors", "model.layers.22.block_sparse_moe.gate.weight": "model-00005-of-00005.safetensors", "model.layers.23.block_sparse_moe.gate.weight": "model-00005-of-00005.safetensors", "model.layers.24.block_sparse_moe.gate.weight": "model-00005-of-00005.safetensors", "model.layers.25.block_sparse_moe.gate.weight": "model-00005-of-00005.safetensors", "model.layers.26.block_sparse_moe.gate.weight": "model-00005-of-00005.safetensors", "model.layers.27.block_sparse_moe.gate.weight": "model-00005-of-00005.safetensors", "model.layers.28.block_sparse_moe.gate.weight": "model-00005-of-00005.safetensors", "model.layers.29.block_sparse_moe.gate.weight": 
"model-00005-of-00005.safetensors", "model.layers.30.block_sparse_moe.gate.weight": "model-00005-of-00005.safetensors", "model.layers.31.block_sparse_moe.gate.weight": "model-00005-of-00005.safetensors"}}
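The map above pairs each tensor name with the shard file that stores it. As a rough sketch (not part of the commit), the snippet below groups a handful of entries copied from that map by shard. The `weight_map_excerpt` variable and the `tensors_by_shard` helper are illustrative names. A real run would presumably load the full map from the index JSON, which is assumed here to follow the usual layout with a top-level `weight_map` key.

```python
from collections import defaultdict

# Excerpt copied from the map above; the full file has many more entries.
weight_map_excerpt = {
    "model.layers.30.block_sparse_moe.experts.1.w2.weight": "model-00003-of-00005.safetensors",
    "model.layers.30.block_sparse_moe.experts.2.w2.weight": "model-00004-of-00005.safetensors",
    "model.layers.19.block_sparse_moe.experts.2.w1.weight": "model-00004-of-00005.safetensors",
    "model.layers.19.block_sparse_moe.experts.3.w1.weight": "model-00005-of-00005.safetensors",
    "model.layers.31.block_sparse_moe.gate.weight": "model-00005-of-00005.safetensors",
}


def tensors_by_shard(weight_map):
    """Group tensor names by the shard file that holds them (hypothetical helper)."""
    shards = defaultdict(list)
    for tensor_name, shard_file in weight_map.items():
        shards[shard_file].append(tensor_name)
    return dict(shards)


grouped = tensors_by_shard(weight_map_excerpt)
for shard_file in sorted(grouped):
    print(shard_file, len(grouped[shard_file]))
```

The excerpt deliberately straddles two shard boundaries, which shows that tensors from the same layer (for example layer 30's `w2` experts) can land in different files.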
output-00001-of-00003.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:6937e207ec36acad91d9709ca83da03a355492bd882aad65ef9dad09cf41661a
+ size 8563151080
output-00002-of-00003.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:9552a426e4de91403cea3839e924a80db77bc7128f9ad007553f9d33c7b48e5a
+ size 8543524552
output-00003-of-00003.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:d83981add908cb57e9e8fc62a37f6a1d7b879b670d2c99457e0e5ec7d0a43320
+ size 7175147896
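Each of the three `output-*.safetensors` entries above is a Git LFS pointer: three `key value` lines giving the spec version, the SHA-256 object ID, and the payload size in bytes. The following is a minimal sketch, assuming only that line format, that parses the pointers shown above and adds up the sizes. `parse_lfs_pointer` is a hypothetical helper written for this illustration, not a Git LFS or `huggingface_hub` API.

```python
def parse_lfs_pointer(text):
    """Parse a Git LFS pointer file into a dict (hypothetical helper)."""
    fields = {}
    for line in text.strip().splitlines():
        # Each pointer line is "<key> <value>", split on the first space.
        key, _, value = line.partition(" ")
        fields[key] = value
    fields["size"] = int(fields["size"])
    return fields


# Pointer contents copied from the diff above.
pointers = {
    "output-00001-of-00003.safetensors": (
        "version https://git-lfs.github.com/spec/v1\n"
        "oid sha256:6937e207ec36acad91d9709ca83da03a355492bd882aad65ef9dad09cf41661a\n"
        "size 8563151080\n"
    ),
    "output-00002-of-00003.safetensors": (
        "version https://git-lfs.github.com/spec/v1\n"
        "oid sha256:9552a426e4de91403cea3839e924a80db77bc7128f9ad007553f9d33c7b48e5a\n"
        "size 8543524552\n"
    ),
    "output-00003-of-00003.safetensors": (
        "version https://git-lfs.github.com/spec/v1\n"
        "oid sha256:d83981add908cb57e9e8fc62a37f6a1d7b879b670d2c99457e0e5ec7d0a43320\n"
        "size 7175147896\n"
    ),
}

parsed = {name: parse_lfs_pointer(text) for name, text in pointers.items()}
total_bytes = sum(pointer["size"] for pointer in parsed.values())
print(f"{total_bytes} bytes (~{total_bytes / 1e9:.2f} GB)")
```

Summing the `size` fields gives a quick estimate of the download size before fetching the real files.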
special_tokens_map.json ADDED
@@ -0,0 +1,24 @@
+ {
+   "bos_token": {
+     "content": "<s>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "eos_token": {
+     "content": "<|im_end|>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "pad_token": "<s>",
+   "unk_token": {
+     "content": "<unk>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   }
+ }
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer.model ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:dadfd56d766715c61d2ef780a525ab43b8e6da4de6865bda3d95fdef5e134055
+ size 493443
tokenizer_config.json ADDED
@@ -0,0 +1,61 @@
+ {
+   "add_bos_token": true,
+   "add_eos_token": false,
+   "added_tokens_decoder": {
+     "0": {
+       "content": "<unk>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "1": {
+       "content": "<s>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "2": {
+       "content": "</s>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "32000": {
+       "content": "<|im_end|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "32001": {
+       "content": "<|im_start|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     }
+   },
+   "additional_special_tokens": [],
+   "bos_token": "<s>",
+   "chat_template": "{% for message in messages %}{{'<|im_start|>' + message['role'] + '\n' + message['content'] + '<|im_end|>' + '\n'}}{% endfor %}{% if add_generation_prompt %}{{ '<|im_start|>assistant\n' }}{% endif %}",
+   "clean_up_tokenization_spaces": false,
+   "eos_token": "<|im_end|>",
+   "legacy": true,
+   "model_max_length": 1000000000000000019884624838656,
+   "pad_token": "<s>",
+   "sp_model_kwargs": {},
+   "spaces_between_special_tokens": false,
+   "tokenizer_class": "LlamaTokenizer",
+   "trust_remote_code": false,
+   "unk_token": "<unk>",
+   "use_default_system_prompt": true,
+   "use_fast": true
+ }
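The `chat_template` entry above wraps every message in `<|im_start|>` and `<|im_end|>` markers and, when `add_generation_prompt` is set, opens an assistant turn. To make the resulting prompt layout concrete, here is a hand-written Python mirror of that Jinja template. It is a sketch for illustration only: `render_chatml` is a hypothetical function, not the tokenizer's own rendering code, and the example messages are made up.

```python
def render_chatml(messages, add_generation_prompt=False):
    """Mirror of the Jinja chat_template above (hypothetical helper)."""
    parts = []
    for message in messages:
        # Same concatenation as the template's loop body.
        parts.append(
            "<|im_start|>" + message["role"] + "\n"
            + message["content"] + "<|im_end|>" + "\n"
        )
    if add_generation_prompt:
        # Leave an open assistant turn for the model to complete.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


# Made-up conversation, used only to show the layout.
prompt = render_chatml(
    [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello!"},
    ],
    add_generation_prompt=True,
)
print(prompt)
```

Because `eos_token` is set to `<|im_end|>` in the same config, generation would be expected to stop at the end of the assistant turn that this prompt leaves open.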