Mohammad Bani Almarjeh committed on
Commit 52e5d4a
1 Parent(s): 99a33ef

update model

.gitattributes CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zip filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
+ *.csv filter=lfs diff=lfs merge=lfs -text
README.md DELETED
@@ -1,95 +0,0 @@
- ---
- license: apache-2.0
- inference: false
- tags:
- - generated_from_trainer
- - text-generation-inference
- model-index:
- - name: Mistral-7B-Banking-v2
-   results: []
- model_type: mistral
- pipeline_tag: text-generation
- widget:
- - messages:
-   - role: user
-     content: I want to close an online account
- ---
-
- # Mistral-7B-Banking-v2
-
- ## Model Description
-
- This model, "Mistral-7B-Banking-v2", is a fine-tuned version of [mistralai/Mistral-7B-Instruct-v0.2](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2), tailored specifically to the banking domain. It is optimized to answer questions and assist users with common banking transactions. It was trained on hybrid synthetic data generated with our NLP/NLG technology and our automated Data Labeling (DAL) tools.
-
- The goal of this model is to show that a generic verticalized model makes customization for a final use case much easier. For example, if you are "ACME Bank", you can create your own customized model by taking this fine-tuned model and running an additional fine-tuning pass on a small amount of your own data. An overview of this approach can be found at: [From General-Purpose LLMs to Verticalized Enterprise Models](https://www.bitext.com/blog/general-purpose-models-verticalized-enterprise-genai/)
-
- ## Intended Use
-
- - **Recommended applications**: This model is designed to be used as the first step in Bitext's two-step approach to LLM fine-tuning for building chatbots, virtual assistants, and copilots in the banking domain, providing customers with fast and accurate answers about their banking needs.
- - **Out-of-scope**: This model is not suited for questions unrelated to banking and should not be used to provide health, legal, or critical safety advice.
-
- ## Usage Example
-
- ```python
- from transformers import AutoModelForCausalLM, AutoTokenizer
-
- model = AutoModelForCausalLM.from_pretrained("bitext-llm/Mistral-7B-Banking-v2")
- tokenizer = AutoTokenizer.from_pretrained("bitext-llm/Mistral-7B-Banking-v2")
-
- inputs = tokenizer("<s>[INST] How can I transfer money to another account? [/INST]", return_tensors="pt")
- outputs = model.generate(inputs['input_ids'], max_new_tokens=256)
- print(tokenizer.decode(outputs[0], skip_special_tokens=True))
- ```
-
- ## Model Architecture
-
- This model uses the `MistralForCausalLM` architecture with a `LlamaTokenizer`, retaining the foundational capabilities of the base model while being specifically enhanced for banking-related interactions.
-
- ## Training Data
-
- The model was fine-tuned on a dataset covering a broad range of banking intents, including transactions such as balance checks, money transfers, and loan applications: 89 intents in total, each represented by approximately 1,000 examples. This comprehensive training helps the model address a wide spectrum of banking-related questions effectively. The dataset follows the same structured approach as our dataset published on Hugging Face as [bitext/Bitext-customer-support-llm-chatbot-training-dataset](https://huggingface.co/datasets/bitext/Bitext-customer-support-llm-chatbot-training-dataset), but with a focus on banking.
-
- ## Training Procedure
-
- ### Hyperparameters
-
- - **Optimizer**: AdamW
- - **Learning Rate**: 0.0002 with a cosine learning rate scheduler
- - **Epochs**: 4
- - **Batch Size**: 10
- - **Gradient Accumulation Steps**: 8
- - **Maximum Sequence Length**: 8192 tokens
-
- ### Environment
-
- - **Transformers**: 4.40.0.dev0
- - **Framework**: PyTorch 2.2.1+cu121
- - **Tokenizers**: 0.15.0
-
- ## Limitations and Bias
-
- - The model is trained for banking-specific contexts but may underperform in unrelated areas.
- - Potential biases in the training data could affect the neutrality of the responses; users are encouraged to evaluate responses critically.
-
- ## Ethical Considerations
-
- It is important to use this technology thoughtfully, ensuring it does not substitute for human judgment where necessary, especially in sensitive financial situations.
-
- ## Acknowledgments
-
- This model was developed and trained by Bitext using proprietary data and technology.
-
- ## License
-
- This model, "Mistral-7B-Banking-v2", is licensed under the Apache License 2.0 by Bitext Innovations International, Inc. This open-source license allows free use, modification, and distribution of the model, but requires that proper credit be given to Bitext.
-
- ### Key Points of the Apache 2.0 License
-
- - **Permissibility**: Users may use, modify, and distribute this software freely.
- - **Attribution**: You must provide proper credit to Bitext Innovations International, Inc. when using this model, in accordance with the original copyright notices and the license.
- - **Patent Grant**: The license includes a grant of patent rights from the contributors of the model.
- - **No Warranty**: The model is provided "as is" without warranties of any kind.
-
- You may view the full license text at [Apache License 2.0](http://www.apache.org/licenses/LICENSE-2.0).
-
- This licensing ensures the model can be used widely and freely while respecting the intellectual contributions of Bitext. For more detailed information or specific legal questions about using this license, please refer to the official license documentation linked above.
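Note that the tokenizer_config.json in this commit ships a `chat_template`, so the prompt in the deleted README's usage example can also be built with `apply_chat_template` instead of hand-written `[INST]` markers. A minimal sketch, assuming transformers >= 4.34 (where this API is available):

```python
# Sketch: build the prompt via the tokenizer's built-in chat template
# (see the chat_template field in tokenizer_config.json below) rather
# than writing "<s>[INST] ... [/INST]" by hand.
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("bitext-llm/Mistral-7B-Banking-v2")
tokenizer = AutoTokenizer.from_pretrained("bitext-llm/Mistral-7B-Banking-v2")

messages = [{"role": "user", "content": "How can I transfer money to another account?"}]
input_ids = tokenizer.apply_chat_template(messages, return_tensors="pt")

outputs = model.generate(input_ids, max_new_tokens=256)
# Decode only the newly generated tokens, not the prompt.
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```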
config.json DELETED
@@ -1,26 +0,0 @@
- {
-   "_name_or_path": "mistralai/Mistral-7B-Instruct-v0.2",
-   "architectures": [
-     "MistralForCausalLM"
-   ],
-   "attention_dropout": 0.0,
-   "bos_token_id": 1,
-   "eos_token_id": 2,
-   "hidden_act": "silu",
-   "hidden_size": 4096,
-   "initializer_range": 0.02,
-   "intermediate_size": 14336,
-   "max_position_embeddings": 32768,
-   "model_type": "mistral",
-   "num_attention_heads": 32,
-   "num_hidden_layers": 32,
-   "num_key_value_heads": 8,
-   "rms_norm_eps": 1e-05,
-   "rope_theta": 1000000.0,
-   "sliding_window": null,
-   "tie_word_embeddings": false,
-   "torch_dtype": "bfloat16",
-   "transformers_version": "4.40.0.dev0",
-   "use_cache": false,
-   "vocab_size": 32000
- }
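The dimensions in this config pin down the parameter count. A back-of-the-envelope check (a sketch; the variable names are mine, not from the repo):

```python
# Parameter count implied by the config above.
hidden, inter, n_layers, vocab = 4096, 14336, 32, 32000
n_heads, n_kv_heads = 32, 8
head_dim = hidden // n_heads  # 128

attn = 2 * hidden * hidden + 2 * hidden * n_kv_heads * head_dim  # q,o + k,v (GQA)
mlp = 3 * hidden * inter                                         # gate, up, down
norms = 2 * hidden                                               # two RMSNorms per layer
embed = 2 * vocab * hidden                 # embed_tokens + lm_head (untied)

total = n_layers * (attn + mlp + norms) + embed + hidden  # + final model.norm
print(total)      # 7241732096 parameters (~7.24B)
print(total * 2)  # 14483464192 bytes at 2 bytes/param (bfloat16) --
                  # exactly the total_size in the shard indexes below
```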
generation_config.json DELETED
@@ -1,7 +0,0 @@
- {
-   "_from_model_config": true,
-   "bos_token_id": 1,
-   "do_sample": true,
-   "eos_token_id": 2,
-   "transformers_version": "4.40.0.dev0"
- }
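This generation config enables `do_sample` but pins no temperature or top-p, so the Hugging Face defaults (temperature 1.0, top-p 1.0) apply. Callers can override these per call; a sketch with illustrative values, not ones from this repo:

```python
# Sketch: override sampling parameters at generation time.
outputs = model.generate(
    input_ids,
    do_sample=True,
    temperature=0.7,  # illustrative, not from this repo
    top_p=0.9,
    max_new_tokens=256,
)
```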
model-00001-of-00003.safetensors DELETED
@@ -1,3 +0,0 @@
- version https://git-lfs.github.com/spec/v1
- oid sha256:2a4a8ffcdfaa925c1fb067998de010bdbd382a9ddca24ab95118f251e418677f
- size 4943162336
model-00002-of-00003.safetensors DELETED
@@ -1,3 +0,0 @@
- version https://git-lfs.github.com/spec/v1
- oid sha256:1e0402539c103084db512a1f2d2e5fea81819a79ff883bf1faa113e46f9af90f
- size 4999819336
model-00003-of-00003.safetensors DELETED
@@ -1,3 +0,0 @@
- version https://git-lfs.github.com/spec/v1
- oid sha256:9faeca1e64a4c7fe2d99199e02c1e01d88aeb7f2702b42efc6fc9ad117952ed3
- size 4540516344
model.safetensors.index.json DELETED
@@ -1,298 +0,0 @@
- {
-   "metadata": {
-     "total_size": 14483464192
-   },
-   "weight_map": {
-     "lm_head.weight": "model-00003-of-00003.safetensors",
-     "model.embed_tokens.weight": "model-00001-of-00003.safetensors",
-     "model.layers.0.input_layernorm.weight": "model-00001-of-00003.safetensors",
-     "model.layers.0.mlp.down_proj.weight": "model-00001-of-00003.safetensors",
-     "model.layers.0.mlp.gate_proj.weight": "model-00001-of-00003.safetensors",
-     "model.layers.0.mlp.up_proj.weight": "model-00001-of-00003.safetensors",
-     "model.layers.0.post_attention_layernorm.weight": "model-00001-of-00003.safetensors",
-     "model.layers.0.self_attn.k_proj.weight": "model-00001-of-00003.safetensors",
-     "model.layers.0.self_attn.o_proj.weight": "model-00001-of-00003.safetensors",
-     "model.layers.0.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
-     "model.layers.0.self_attn.v_proj.weight": "model-00001-of-00003.safetensors",
-     "model.layers.1.input_layernorm.weight": "model-00001-of-00003.safetensors",
-     "model.layers.1.mlp.down_proj.weight": "model-00001-of-00003.safetensors",
-     "model.layers.1.mlp.gate_proj.weight": "model-00001-of-00003.safetensors",
-     "model.layers.1.mlp.up_proj.weight": "model-00001-of-00003.safetensors",
-     "model.layers.1.post_attention_layernorm.weight": "model-00001-of-00003.safetensors",
-     "model.layers.1.self_attn.k_proj.weight": "model-00001-of-00003.safetensors",
-     "model.layers.1.self_attn.o_proj.weight": "model-00001-of-00003.safetensors",
-     "model.layers.1.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
-     "model.layers.1.self_attn.v_proj.weight": "model-00001-of-00003.safetensors",
-     "model.layers.10.input_layernorm.weight": "model-00002-of-00003.safetensors",
-     "model.layers.10.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
-     "model.layers.10.mlp.gate_proj.weight": "model-00001-of-00003.safetensors",
-     "model.layers.10.mlp.up_proj.weight": "model-00001-of-00003.safetensors",
-     "model.layers.10.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
-     "model.layers.10.self_attn.k_proj.weight": "model-00001-of-00003.safetensors",
-     "model.layers.10.self_attn.o_proj.weight": "model-00001-of-00003.safetensors",
-     "model.layers.10.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
-     "model.layers.10.self_attn.v_proj.weight": "model-00001-of-00003.safetensors",
-     "model.layers.11.input_layernorm.weight": "model-00002-of-00003.safetensors",
-     "model.layers.11.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
-     "model.layers.11.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
-     "model.layers.11.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
-     "model.layers.11.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
-     "model.layers.11.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
-     "model.layers.11.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
-     "model.layers.11.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
-     "model.layers.11.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
-     "model.layers.12.input_layernorm.weight": "model-00002-of-00003.safetensors",
-     "model.layers.12.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
-     "model.layers.12.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
-     "model.layers.12.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
-     "model.layers.12.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
-     "model.layers.12.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
-     "model.layers.12.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
-     "model.layers.12.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
-     "model.layers.12.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
-     "model.layers.13.input_layernorm.weight": "model-00002-of-00003.safetensors",
-     "model.layers.13.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
-     "model.layers.13.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
-     "model.layers.13.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
-     "model.layers.13.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
-     "model.layers.13.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
-     "model.layers.13.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
-     "model.layers.13.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
-     "model.layers.13.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
-     "model.layers.14.input_layernorm.weight": "model-00002-of-00003.safetensors",
-     "model.layers.14.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
-     "model.layers.14.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
-     "model.layers.14.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
-     "model.layers.14.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
-     "model.layers.14.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
-     "model.layers.14.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
-     "model.layers.14.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
-     "model.layers.14.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
-     "model.layers.15.input_layernorm.weight": "model-00002-of-00003.safetensors",
-     "model.layers.15.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
-     "model.layers.15.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
-     "model.layers.15.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
-     "model.layers.15.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
-     "model.layers.15.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
-     "model.layers.15.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
-     "model.layers.15.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
-     "model.layers.15.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
-     "model.layers.16.input_layernorm.weight": "model-00002-of-00003.safetensors",
-     "model.layers.16.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
-     "model.layers.16.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
-     "model.layers.16.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
-     "model.layers.16.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
-     "model.layers.16.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
-     "model.layers.16.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
-     "model.layers.16.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
-     "model.layers.16.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
-     "model.layers.17.input_layernorm.weight": "model-00002-of-00003.safetensors",
-     "model.layers.17.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
-     "model.layers.17.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
-     "model.layers.17.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
-     "model.layers.17.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
-     "model.layers.17.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
-     "model.layers.17.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
-     "model.layers.17.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
-     "model.layers.17.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
-     "model.layers.18.input_layernorm.weight": "model-00002-of-00003.safetensors",
-     "model.layers.18.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
-     "model.layers.18.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
-     "model.layers.18.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
-     "model.layers.18.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
-     "model.layers.18.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
-     "model.layers.18.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
-     "model.layers.18.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
-     "model.layers.18.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
-     "model.layers.19.input_layernorm.weight": "model-00002-of-00003.safetensors",
-     "model.layers.19.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
-     "model.layers.19.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
-     "model.layers.19.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
-     "model.layers.19.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
-     "model.layers.19.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
-     "model.layers.19.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
-     "model.layers.19.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
-     "model.layers.19.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
-     "model.layers.2.input_layernorm.weight": "model-00001-of-00003.safetensors",
-     "model.layers.2.mlp.down_proj.weight": "model-00001-of-00003.safetensors",
-     "model.layers.2.mlp.gate_proj.weight": "model-00001-of-00003.safetensors",
-     "model.layers.2.mlp.up_proj.weight": "model-00001-of-00003.safetensors",
-     "model.layers.2.post_attention_layernorm.weight": "model-00001-of-00003.safetensors",
-     "model.layers.2.self_attn.k_proj.weight": "model-00001-of-00003.safetensors",
-     "model.layers.2.self_attn.o_proj.weight": "model-00001-of-00003.safetensors",
-     "model.layers.2.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
-     "model.layers.2.self_attn.v_proj.weight": "model-00001-of-00003.safetensors",
-     "model.layers.20.input_layernorm.weight": "model-00002-of-00003.safetensors",
-     "model.layers.20.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
-     "model.layers.20.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
-     "model.layers.20.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
-     "model.layers.20.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
-     "model.layers.20.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
-     "model.layers.20.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
-     "model.layers.20.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
-     "model.layers.20.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
-     "model.layers.21.input_layernorm.weight": "model-00002-of-00003.safetensors",
-     "model.layers.21.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
-     "model.layers.21.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
-     "model.layers.21.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
-     "model.layers.21.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
-     "model.layers.21.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
-     "model.layers.21.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
-     "model.layers.21.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
-     "model.layers.21.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
-     "model.layers.22.input_layernorm.weight": "model-00003-of-00003.safetensors",
-     "model.layers.22.mlp.down_proj.weight": "model-00003-of-00003.safetensors",
-     "model.layers.22.mlp.gate_proj.weight": "model-00003-of-00003.safetensors",
-     "model.layers.22.mlp.up_proj.weight": "model-00003-of-00003.safetensors",
-     "model.layers.22.post_attention_layernorm.weight": "model-00003-of-00003.safetensors",
-     "model.layers.22.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
-     "model.layers.22.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
-     "model.layers.22.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
-     "model.layers.22.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
-     "model.layers.23.input_layernorm.weight": "model-00003-of-00003.safetensors",
-     "model.layers.23.mlp.down_proj.weight": "model-00003-of-00003.safetensors",
-     "model.layers.23.mlp.gate_proj.weight": "model-00003-of-00003.safetensors",
-     "model.layers.23.mlp.up_proj.weight": "model-00003-of-00003.safetensors",
-     "model.layers.23.post_attention_layernorm.weight": "model-00003-of-00003.safetensors",
-     "model.layers.23.self_attn.k_proj.weight": "model-00003-of-00003.safetensors",
-     "model.layers.23.self_attn.o_proj.weight": "model-00003-of-00003.safetensors",
-     "model.layers.23.self_attn.q_proj.weight": "model-00003-of-00003.safetensors",
-     "model.layers.23.self_attn.v_proj.weight": "model-00003-of-00003.safetensors",
-     "model.layers.24.input_layernorm.weight": "model-00003-of-00003.safetensors",
-     "model.layers.24.mlp.down_proj.weight": "model-00003-of-00003.safetensors",
-     "model.layers.24.mlp.gate_proj.weight": "model-00003-of-00003.safetensors",
-     "model.layers.24.mlp.up_proj.weight": "model-00003-of-00003.safetensors",
-     "model.layers.24.post_attention_layernorm.weight": "model-00003-of-00003.safetensors",
-     "model.layers.24.self_attn.k_proj.weight": "model-00003-of-00003.safetensors",
-     "model.layers.24.self_attn.o_proj.weight": "model-00003-of-00003.safetensors",
-     "model.layers.24.self_attn.q_proj.weight": "model-00003-of-00003.safetensors",
-     "model.layers.24.self_attn.v_proj.weight": "model-00003-of-00003.safetensors",
-     "model.layers.25.input_layernorm.weight": "model-00003-of-00003.safetensors",
-     "model.layers.25.mlp.down_proj.weight": "model-00003-of-00003.safetensors",
-     "model.layers.25.mlp.gate_proj.weight": "model-00003-of-00003.safetensors",
-     "model.layers.25.mlp.up_proj.weight": "model-00003-of-00003.safetensors",
-     "model.layers.25.post_attention_layernorm.weight": "model-00003-of-00003.safetensors",
-     "model.layers.25.self_attn.k_proj.weight": "model-00003-of-00003.safetensors",
-     "model.layers.25.self_attn.o_proj.weight": "model-00003-of-00003.safetensors",
-     "model.layers.25.self_attn.q_proj.weight": "model-00003-of-00003.safetensors",
-     "model.layers.25.self_attn.v_proj.weight": "model-00003-of-00003.safetensors",
-     "model.layers.26.input_layernorm.weight": "model-00003-of-00003.safetensors",
-     "model.layers.26.mlp.down_proj.weight": "model-00003-of-00003.safetensors",
-     "model.layers.26.mlp.gate_proj.weight": "model-00003-of-00003.safetensors",
-     "model.layers.26.mlp.up_proj.weight": "model-00003-of-00003.safetensors",
-     "model.layers.26.post_attention_layernorm.weight": "model-00003-of-00003.safetensors",
-     "model.layers.26.self_attn.k_proj.weight": "model-00003-of-00003.safetensors",
-     "model.layers.26.self_attn.o_proj.weight": "model-00003-of-00003.safetensors",
-     "model.layers.26.self_attn.q_proj.weight": "model-00003-of-00003.safetensors",
-     "model.layers.26.self_attn.v_proj.weight": "model-00003-of-00003.safetensors",
-     "model.layers.27.input_layernorm.weight": "model-00003-of-00003.safetensors",
-     "model.layers.27.mlp.down_proj.weight": "model-00003-of-00003.safetensors",
-     "model.layers.27.mlp.gate_proj.weight": "model-00003-of-00003.safetensors",
-     "model.layers.27.mlp.up_proj.weight": "model-00003-of-00003.safetensors",
-     "model.layers.27.post_attention_layernorm.weight": "model-00003-of-00003.safetensors",
-     "model.layers.27.self_attn.k_proj.weight": "model-00003-of-00003.safetensors",
-     "model.layers.27.self_attn.o_proj.weight": "model-00003-of-00003.safetensors",
-     "model.layers.27.self_attn.q_proj.weight": "model-00003-of-00003.safetensors",
-     "model.layers.27.self_attn.v_proj.weight": "model-00003-of-00003.safetensors",
-     "model.layers.28.input_layernorm.weight": "model-00003-of-00003.safetensors",
-     "model.layers.28.mlp.down_proj.weight": "model-00003-of-00003.safetensors",
-     "model.layers.28.mlp.gate_proj.weight": "model-00003-of-00003.safetensors",
-     "model.layers.28.mlp.up_proj.weight": "model-00003-of-00003.safetensors",
-     "model.layers.28.post_attention_layernorm.weight": "model-00003-of-00003.safetensors",
-     "model.layers.28.self_attn.k_proj.weight": "model-00003-of-00003.safetensors",
-     "model.layers.28.self_attn.o_proj.weight": "model-00003-of-00003.safetensors",
-     "model.layers.28.self_attn.q_proj.weight": "model-00003-of-00003.safetensors",
-     "model.layers.28.self_attn.v_proj.weight": "model-00003-of-00003.safetensors",
-     "model.layers.29.input_layernorm.weight": "model-00003-of-00003.safetensors",
-     "model.layers.29.mlp.down_proj.weight": "model-00003-of-00003.safetensors",
-     "model.layers.29.mlp.gate_proj.weight": "model-00003-of-00003.safetensors",
-     "model.layers.29.mlp.up_proj.weight": "model-00003-of-00003.safetensors",
-     "model.layers.29.post_attention_layernorm.weight": "model-00003-of-00003.safetensors",
-     "model.layers.29.self_attn.k_proj.weight": "model-00003-of-00003.safetensors",
-     "model.layers.29.self_attn.o_proj.weight": "model-00003-of-00003.safetensors",
-     "model.layers.29.self_attn.q_proj.weight": "model-00003-of-00003.safetensors",
-     "model.layers.29.self_attn.v_proj.weight": "model-00003-of-00003.safetensors",
-     "model.layers.3.input_layernorm.weight": "model-00001-of-00003.safetensors",
-     "model.layers.3.mlp.down_proj.weight": "model-00001-of-00003.safetensors",
-     "model.layers.3.mlp.gate_proj.weight": "model-00001-of-00003.safetensors",
-     "model.layers.3.mlp.up_proj.weight": "model-00001-of-00003.safetensors",
-     "model.layers.3.post_attention_layernorm.weight": "model-00001-of-00003.safetensors",
-     "model.layers.3.self_attn.k_proj.weight": "model-00001-of-00003.safetensors",
-     "model.layers.3.self_attn.o_proj.weight": "model-00001-of-00003.safetensors",
-     "model.layers.3.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
-     "model.layers.3.self_attn.v_proj.weight": "model-00001-of-00003.safetensors",
-     "model.layers.30.input_layernorm.weight": "model-00003-of-00003.safetensors",
-     "model.layers.30.mlp.down_proj.weight": "model-00003-of-00003.safetensors",
-     "model.layers.30.mlp.gate_proj.weight": "model-00003-of-00003.safetensors",
-     "model.layers.30.mlp.up_proj.weight": "model-00003-of-00003.safetensors",
-     "model.layers.30.post_attention_layernorm.weight": "model-00003-of-00003.safetensors",
-     "model.layers.30.self_attn.k_proj.weight": "model-00003-of-00003.safetensors",
-     "model.layers.30.self_attn.o_proj.weight": "model-00003-of-00003.safetensors",
-     "model.layers.30.self_attn.q_proj.weight": "model-00003-of-00003.safetensors",
-     "model.layers.30.self_attn.v_proj.weight": "model-00003-of-00003.safetensors",
-     "model.layers.31.input_layernorm.weight": "model-00003-of-00003.safetensors",
-     "model.layers.31.mlp.down_proj.weight": "model-00003-of-00003.safetensors",
-     "model.layers.31.mlp.gate_proj.weight": "model-00003-of-00003.safetensors",
-     "model.layers.31.mlp.up_proj.weight": "model-00003-of-00003.safetensors",
-     "model.layers.31.post_attention_layernorm.weight": "model-00003-of-00003.safetensors",
-     "model.layers.31.self_attn.k_proj.weight": "model-00003-of-00003.safetensors",
-     "model.layers.31.self_attn.o_proj.weight": "model-00003-of-00003.safetensors",
-     "model.layers.31.self_attn.q_proj.weight": "model-00003-of-00003.safetensors",
-     "model.layers.31.self_attn.v_proj.weight": "model-00003-of-00003.safetensors",
-     "model.layers.4.input_layernorm.weight": "model-00001-of-00003.safetensors",
-     "model.layers.4.mlp.down_proj.weight": "model-00001-of-00003.safetensors",
-     "model.layers.4.mlp.gate_proj.weight": "model-00001-of-00003.safetensors",
-     "model.layers.4.mlp.up_proj.weight": "model-00001-of-00003.safetensors",
-     "model.layers.4.post_attention_layernorm.weight": "model-00001-of-00003.safetensors",
-     "model.layers.4.self_attn.k_proj.weight": "model-00001-of-00003.safetensors",
-     "model.layers.4.self_attn.o_proj.weight": "model-00001-of-00003.safetensors",
-     "model.layers.4.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
-     "model.layers.4.self_attn.v_proj.weight": "model-00001-of-00003.safetensors",
-     "model.layers.5.input_layernorm.weight": "model-00001-of-00003.safetensors",
-     "model.layers.5.mlp.down_proj.weight": "model-00001-of-00003.safetensors",
-     "model.layers.5.mlp.gate_proj.weight": "model-00001-of-00003.safetensors",
-     "model.layers.5.mlp.up_proj.weight": "model-00001-of-00003.safetensors",
-     "model.layers.5.post_attention_layernorm.weight": "model-00001-of-00003.safetensors",
-     "model.layers.5.self_attn.k_proj.weight": "model-00001-of-00003.safetensors",
-     "model.layers.5.self_attn.o_proj.weight": "model-00001-of-00003.safetensors",
-     "model.layers.5.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
-     "model.layers.5.self_attn.v_proj.weight": "model-00001-of-00003.safetensors",
-     "model.layers.6.input_layernorm.weight": "model-00001-of-00003.safetensors",
-     "model.layers.6.mlp.down_proj.weight": "model-00001-of-00003.safetensors",
-     "model.layers.6.mlp.gate_proj.weight": "model-00001-of-00003.safetensors",
-     "model.layers.6.mlp.up_proj.weight": "model-00001-of-00003.safetensors",
-     "model.layers.6.post_attention_layernorm.weight": "model-00001-of-00003.safetensors",
-     "model.layers.6.self_attn.k_proj.weight": "model-00001-of-00003.safetensors",
-     "model.layers.6.self_attn.o_proj.weight": "model-00001-of-00003.safetensors",
-     "model.layers.6.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
-     "model.layers.6.self_attn.v_proj.weight": "model-00001-of-00003.safetensors",
-     "model.layers.7.input_layernorm.weight": "model-00001-of-00003.safetensors",
-     "model.layers.7.mlp.down_proj.weight": "model-00001-of-00003.safetensors",
-     "model.layers.7.mlp.gate_proj.weight": "model-00001-of-00003.safetensors",
-     "model.layers.7.mlp.up_proj.weight": "model-00001-of-00003.safetensors",
-     "model.layers.7.post_attention_layernorm.weight": "model-00001-of-00003.safetensors",
-     "model.layers.7.self_attn.k_proj.weight": "model-00001-of-00003.safetensors",
-     "model.layers.7.self_attn.o_proj.weight": "model-00001-of-00003.safetensors",
-     "model.layers.7.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
-     "model.layers.7.self_attn.v_proj.weight": "model-00001-of-00003.safetensors",
-     "model.layers.8.input_layernorm.weight": "model-00001-of-00003.safetensors",
-     "model.layers.8.mlp.down_proj.weight": "model-00001-of-00003.safetensors",
-     "model.layers.8.mlp.gate_proj.weight": "model-00001-of-00003.safetensors",
-     "model.layers.8.mlp.up_proj.weight": "model-00001-of-00003.safetensors",
-     "model.layers.8.post_attention_layernorm.weight": "model-00001-of-00003.safetensors",
-     "model.layers.8.self_attn.k_proj.weight": "model-00001-of-00003.safetensors",
-     "model.layers.8.self_attn.o_proj.weight": "model-00001-of-00003.safetensors",
-     "model.layers.8.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
-     "model.layers.8.self_attn.v_proj.weight": "model-00001-of-00003.safetensors",
-     "model.layers.9.input_layernorm.weight": "model-00001-of-00003.safetensors",
-     "model.layers.9.mlp.down_proj.weight": "model-00001-of-00003.safetensors",
-     "model.layers.9.mlp.gate_proj.weight": "model-00001-of-00003.safetensors",
-     "model.layers.9.mlp.up_proj.weight": "model-00001-of-00003.safetensors",
-     "model.layers.9.post_attention_layernorm.weight": "model-00001-of-00003.safetensors",
-     "model.layers.9.self_attn.k_proj.weight": "model-00001-of-00003.safetensors",
-     "model.layers.9.self_attn.o_proj.weight": "model-00001-of-00003.safetensors",
-     "model.layers.9.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
-     "model.layers.9.self_attn.v_proj.weight": "model-00001-of-00003.safetensors",
-     "model.norm.weight": "model-00003-of-00003.safetensors"
-   }
- }
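The index maps each tensor name to the shard file that stores it; `transformers` consults it when loading, but it can also be read directly. A minimal sketch:

```python
import json

# Sketch: find which shard holds a given tensor, using the index above.
with open("model.safetensors.index.json") as f:
    index = json.load(f)

print(index["metadata"]["total_size"])        # 14483464192 (bytes)
print(index["weight_map"]["lm_head.weight"])  # model-00003-of-00003.safetensors
```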
pytorch_model-00001-of-00003.bin DELETED
@@ -1,3 +0,0 @@
- version https://git-lfs.github.com/spec/v1
- oid sha256:d5690374baa48df9e45b7d407216b74326fe9f04de7cd5bf7cb717d17937795c
- size 4943185632
pytorch_model-00002-of-00003.bin DELETED
@@ -1,3 +0,0 @@
- version https://git-lfs.github.com/spec/v1
- oid sha256:3ebc177b973beaca8664185b294a142b5d982e114c3f1da3f6e903ff80641204
- size 4999844744
pytorch_model-00003-of-00003.bin DELETED
@@ -1,3 +0,0 @@
- version https://git-lfs.github.com/spec/v1
- oid sha256:ca102869494a47080330ab03aa0af3d3eea0d5a96cd4b4eec350d77861afcef0
- size 4540537414
pytorch_model.bin.index.json DELETED
@@ -1,298 +0,0 @@
- {
-   "metadata": {
-     "total_size": 14483464192
-   },
-   "weight_map": {
-     "lm_head.weight": "pytorch_model-00003-of-00003.bin",
-     "model.embed_tokens.weight": "pytorch_model-00001-of-00003.bin",
-     "model.layers.0.input_layernorm.weight": "pytorch_model-00001-of-00003.bin",
-     "model.layers.0.mlp.down_proj.weight": "pytorch_model-00001-of-00003.bin",
-     "model.layers.0.mlp.gate_proj.weight": "pytorch_model-00001-of-00003.bin",
-     "model.layers.0.mlp.up_proj.weight": "pytorch_model-00001-of-00003.bin",
-     "model.layers.0.post_attention_layernorm.weight": "pytorch_model-00001-of-00003.bin",
-     "model.layers.0.self_attn.k_proj.weight": "pytorch_model-00001-of-00003.bin",
-     "model.layers.0.self_attn.o_proj.weight": "pytorch_model-00001-of-00003.bin",
-     "model.layers.0.self_attn.q_proj.weight": "pytorch_model-00001-of-00003.bin",
-     "model.layers.0.self_attn.v_proj.weight": "pytorch_model-00001-of-00003.bin",
-     "model.layers.1.input_layernorm.weight": "pytorch_model-00001-of-00003.bin",
-     "model.layers.1.mlp.down_proj.weight": "pytorch_model-00001-of-00003.bin",
-     "model.layers.1.mlp.gate_proj.weight": "pytorch_model-00001-of-00003.bin",
-     "model.layers.1.mlp.up_proj.weight": "pytorch_model-00001-of-00003.bin",
-     "model.layers.1.post_attention_layernorm.weight": "pytorch_model-00001-of-00003.bin",
-     "model.layers.1.self_attn.k_proj.weight": "pytorch_model-00001-of-00003.bin",
-     "model.layers.1.self_attn.o_proj.weight": "pytorch_model-00001-of-00003.bin",
-     "model.layers.1.self_attn.q_proj.weight": "pytorch_model-00001-of-00003.bin",
-     "model.layers.1.self_attn.v_proj.weight": "pytorch_model-00001-of-00003.bin",
-     "model.layers.10.input_layernorm.weight": "pytorch_model-00002-of-00003.bin",
-     "model.layers.10.mlp.down_proj.weight": "pytorch_model-00002-of-00003.bin",
-     "model.layers.10.mlp.gate_proj.weight": "pytorch_model-00001-of-00003.bin",
-     "model.layers.10.mlp.up_proj.weight": "pytorch_model-00001-of-00003.bin",
-     "model.layers.10.post_attention_layernorm.weight": "pytorch_model-00002-of-00003.bin",
-     "model.layers.10.self_attn.k_proj.weight": "pytorch_model-00001-of-00003.bin",
-     "model.layers.10.self_attn.o_proj.weight": "pytorch_model-00001-of-00003.bin",
-     "model.layers.10.self_attn.q_proj.weight": "pytorch_model-00001-of-00003.bin",
-     "model.layers.10.self_attn.v_proj.weight": "pytorch_model-00001-of-00003.bin",
-     "model.layers.11.input_layernorm.weight": "pytorch_model-00002-of-00003.bin",
-     "model.layers.11.mlp.down_proj.weight": "pytorch_model-00002-of-00003.bin",
-     "model.layers.11.mlp.gate_proj.weight": "pytorch_model-00002-of-00003.bin",
-     "model.layers.11.mlp.up_proj.weight": "pytorch_model-00002-of-00003.bin",
-     "model.layers.11.post_attention_layernorm.weight": "pytorch_model-00002-of-00003.bin",
-     "model.layers.11.self_attn.k_proj.weight": "pytorch_model-00002-of-00003.bin",
-     "model.layers.11.self_attn.o_proj.weight": "pytorch_model-00002-of-00003.bin",
-     "model.layers.11.self_attn.q_proj.weight": "pytorch_model-00002-of-00003.bin",
-     "model.layers.11.self_attn.v_proj.weight": "pytorch_model-00002-of-00003.bin",
-     "model.layers.12.input_layernorm.weight": "pytorch_model-00002-of-00003.bin",
-     "model.layers.12.mlp.down_proj.weight": "pytorch_model-00002-of-00003.bin",
-     "model.layers.12.mlp.gate_proj.weight": "pytorch_model-00002-of-00003.bin",
-     "model.layers.12.mlp.up_proj.weight": "pytorch_model-00002-of-00003.bin",
-     "model.layers.12.post_attention_layernorm.weight": "pytorch_model-00002-of-00003.bin",
-     "model.layers.12.self_attn.k_proj.weight": "pytorch_model-00002-of-00003.bin",
-     "model.layers.12.self_attn.o_proj.weight": "pytorch_model-00002-of-00003.bin",
-     "model.layers.12.self_attn.q_proj.weight": "pytorch_model-00002-of-00003.bin",
-     "model.layers.12.self_attn.v_proj.weight": "pytorch_model-00002-of-00003.bin",
-     "model.layers.13.input_layernorm.weight": "pytorch_model-00002-of-00003.bin",
-     "model.layers.13.mlp.down_proj.weight": "pytorch_model-00002-of-00003.bin",
-     "model.layers.13.mlp.gate_proj.weight": "pytorch_model-00002-of-00003.bin",
-     "model.layers.13.mlp.up_proj.weight": "pytorch_model-00002-of-00003.bin",
-     "model.layers.13.post_attention_layernorm.weight": "pytorch_model-00002-of-00003.bin",
-     "model.layers.13.self_attn.k_proj.weight": "pytorch_model-00002-of-00003.bin",
-     "model.layers.13.self_attn.o_proj.weight": "pytorch_model-00002-of-00003.bin",
-     "model.layers.13.self_attn.q_proj.weight": "pytorch_model-00002-of-00003.bin",
-     "model.layers.13.self_attn.v_proj.weight": "pytorch_model-00002-of-00003.bin",
-     "model.layers.14.input_layernorm.weight": "pytorch_model-00002-of-00003.bin",
-     "model.layers.14.mlp.down_proj.weight": "pytorch_model-00002-of-00003.bin",
-     "model.layers.14.mlp.gate_proj.weight": "pytorch_model-00002-of-00003.bin",
-     "model.layers.14.mlp.up_proj.weight": "pytorch_model-00002-of-00003.bin",
-     "model.layers.14.post_attention_layernorm.weight": "pytorch_model-00002-of-00003.bin",
-     "model.layers.14.self_attn.k_proj.weight": "pytorch_model-00002-of-00003.bin",
-     "model.layers.14.self_attn.o_proj.weight": "pytorch_model-00002-of-00003.bin",
-     "model.layers.14.self_attn.q_proj.weight": "pytorch_model-00002-of-00003.bin",
-     "model.layers.14.self_attn.v_proj.weight": "pytorch_model-00002-of-00003.bin",
-     "model.layers.15.input_layernorm.weight": "pytorch_model-00002-of-00003.bin",
-     "model.layers.15.mlp.down_proj.weight": "pytorch_model-00002-of-00003.bin",
-     "model.layers.15.mlp.gate_proj.weight": "pytorch_model-00002-of-00003.bin",
-     "model.layers.15.mlp.up_proj.weight": "pytorch_model-00002-of-00003.bin",
-     "model.layers.15.post_attention_layernorm.weight": "pytorch_model-00002-of-00003.bin",
-     "model.layers.15.self_attn.k_proj.weight": "pytorch_model-00002-of-00003.bin",
-     "model.layers.15.self_attn.o_proj.weight": "pytorch_model-00002-of-00003.bin",
-     "model.layers.15.self_attn.q_proj.weight": "pytorch_model-00002-of-00003.bin",
-     "model.layers.15.self_attn.v_proj.weight": "pytorch_model-00002-of-00003.bin",
-     "model.layers.16.input_layernorm.weight": "pytorch_model-00002-of-00003.bin",
-     "model.layers.16.mlp.down_proj.weight": "pytorch_model-00002-of-00003.bin",
-     "model.layers.16.mlp.gate_proj.weight": "pytorch_model-00002-of-00003.bin",
-     "model.layers.16.mlp.up_proj.weight": "pytorch_model-00002-of-00003.bin",
-     "model.layers.16.post_attention_layernorm.weight": "pytorch_model-00002-of-00003.bin",
-     "model.layers.16.self_attn.k_proj.weight": "pytorch_model-00002-of-00003.bin",
-     "model.layers.16.self_attn.o_proj.weight": "pytorch_model-00002-of-00003.bin",
-     "model.layers.16.self_attn.q_proj.weight": "pytorch_model-00002-of-00003.bin",
-     "model.layers.16.self_attn.v_proj.weight": "pytorch_model-00002-of-00003.bin",
-     "model.layers.17.input_layernorm.weight": "pytorch_model-00002-of-00003.bin",
-     "model.layers.17.mlp.down_proj.weight": "pytorch_model-00002-of-00003.bin",
-     "model.layers.17.mlp.gate_proj.weight": "pytorch_model-00002-of-00003.bin",
-     "model.layers.17.mlp.up_proj.weight": "pytorch_model-00002-of-00003.bin",
-     "model.layers.17.post_attention_layernorm.weight": "pytorch_model-00002-of-00003.bin",
-     "model.layers.17.self_attn.k_proj.weight": "pytorch_model-00002-of-00003.bin",
-     "model.layers.17.self_attn.o_proj.weight": "pytorch_model-00002-of-00003.bin",
-     "model.layers.17.self_attn.q_proj.weight": "pytorch_model-00002-of-00003.bin",
-     "model.layers.17.self_attn.v_proj.weight": "pytorch_model-00002-of-00003.bin",
-     "model.layers.18.input_layernorm.weight": "pytorch_model-00002-of-00003.bin",
-     "model.layers.18.mlp.down_proj.weight": "pytorch_model-00002-of-00003.bin",
-     "model.layers.18.mlp.gate_proj.weight": "pytorch_model-00002-of-00003.bin",
-     "model.layers.18.mlp.up_proj.weight": "pytorch_model-00002-of-00003.bin",
-     "model.layers.18.post_attention_layernorm.weight": "pytorch_model-00002-of-00003.bin",
-     "model.layers.18.self_attn.k_proj.weight": "pytorch_model-00002-of-00003.bin",
-     "model.layers.18.self_attn.o_proj.weight": "pytorch_model-00002-of-00003.bin",
-     "model.layers.18.self_attn.q_proj.weight": "pytorch_model-00002-of-00003.bin",
-     "model.layers.18.self_attn.v_proj.weight": "pytorch_model-00002-of-00003.bin",
-     "model.layers.19.input_layernorm.weight": "pytorch_model-00002-of-00003.bin",
-     "model.layers.19.mlp.down_proj.weight": "pytorch_model-00002-of-00003.bin",
-     "model.layers.19.mlp.gate_proj.weight": "pytorch_model-00002-of-00003.bin",
-     "model.layers.19.mlp.up_proj.weight": "pytorch_model-00002-of-00003.bin",
-     "model.layers.19.post_attention_layernorm.weight": "pytorch_model-00002-of-00003.bin",
-     "model.layers.19.self_attn.k_proj.weight": "pytorch_model-00002-of-00003.bin",
-     "model.layers.19.self_attn.o_proj.weight": "pytorch_model-00002-of-00003.bin",
-     "model.layers.19.self_attn.q_proj.weight": "pytorch_model-00002-of-00003.bin",
-     "model.layers.19.self_attn.v_proj.weight": "pytorch_model-00002-of-00003.bin",
-     "model.layers.2.input_layernorm.weight": "pytorch_model-00001-of-00003.bin",
-     "model.layers.2.mlp.down_proj.weight": "pytorch_model-00001-of-00003.bin",
-     "model.layers.2.mlp.gate_proj.weight": "pytorch_model-00001-of-00003.bin",
-     "model.layers.2.mlp.up_proj.weight": "pytorch_model-00001-of-00003.bin",
-     "model.layers.2.post_attention_layernorm.weight": "pytorch_model-00001-of-00003.bin",
-     "model.layers.2.self_attn.k_proj.weight": "pytorch_model-00001-of-00003.bin",
-     "model.layers.2.self_attn.o_proj.weight": "pytorch_model-00001-of-00003.bin",
-     "model.layers.2.self_attn.q_proj.weight": "pytorch_model-00001-of-00003.bin",
-     "model.layers.2.self_attn.v_proj.weight": "pytorch_model-00001-of-00003.bin",
-     "model.layers.20.input_layernorm.weight": "pytorch_model-00002-of-00003.bin",
-     "model.layers.20.mlp.down_proj.weight": "pytorch_model-00002-of-00003.bin",
-     "model.layers.20.mlp.gate_proj.weight": "pytorch_model-00002-of-00003.bin",
-     "model.layers.20.mlp.up_proj.weight": "pytorch_model-00002-of-00003.bin",
-     "model.layers.20.post_attention_layernorm.weight": "pytorch_model-00002-of-00003.bin",
-     "model.layers.20.self_attn.k_proj.weight": "pytorch_model-00002-of-00003.bin",
-     "model.layers.20.self_attn.o_proj.weight": "pytorch_model-00002-of-00003.bin",
-     "model.layers.20.self_attn.q_proj.weight": "pytorch_model-00002-of-00003.bin",
-     "model.layers.20.self_attn.v_proj.weight": "pytorch_model-00002-of-00003.bin",
-     "model.layers.21.input_layernorm.weight": "pytorch_model-00002-of-00003.bin",
-     "model.layers.21.mlp.down_proj.weight": "pytorch_model-00002-of-00003.bin",
-     "model.layers.21.mlp.gate_proj.weight": "pytorch_model-00002-of-00003.bin",
-     "model.layers.21.mlp.up_proj.weight": "pytorch_model-00002-of-00003.bin",
-     "model.layers.21.post_attention_layernorm.weight": "pytorch_model-00002-of-00003.bin",
-     "model.layers.21.self_attn.k_proj.weight": "pytorch_model-00002-of-00003.bin",
-     "model.layers.21.self_attn.o_proj.weight": "pytorch_model-00002-of-00003.bin",
-     "model.layers.21.self_attn.q_proj.weight": "pytorch_model-00002-of-00003.bin",
-     "model.layers.21.self_attn.v_proj.weight": "pytorch_model-00002-of-00003.bin",
-     "model.layers.22.input_layernorm.weight": "pytorch_model-00003-of-00003.bin",
-     "model.layers.22.mlp.down_proj.weight": "pytorch_model-00003-of-00003.bin",
-     "model.layers.22.mlp.gate_proj.weight": "pytorch_model-00003-of-00003.bin",
-     "model.layers.22.mlp.up_proj.weight": "pytorch_model-00003-of-00003.bin",
-     "model.layers.22.post_attention_layernorm.weight": "pytorch_model-00003-of-00003.bin",
-     "model.layers.22.self_attn.k_proj.weight": "pytorch_model-00002-of-00003.bin",
-     "model.layers.22.self_attn.o_proj.weight": "pytorch_model-00002-of-00003.bin",
-     "model.layers.22.self_attn.q_proj.weight": "pytorch_model-00002-of-00003.bin",
-     "model.layers.22.self_attn.v_proj.weight": "pytorch_model-00002-of-00003.bin",
-     "model.layers.23.input_layernorm.weight": "pytorch_model-00003-of-00003.bin",
-     "model.layers.23.mlp.down_proj.weight": "pytorch_model-00003-of-00003.bin",
-     "model.layers.23.mlp.gate_proj.weight": "pytorch_model-00003-of-00003.bin",
-     "model.layers.23.mlp.up_proj.weight": "pytorch_model-00003-of-00003.bin",
-     "model.layers.23.post_attention_layernorm.weight": "pytorch_model-00003-of-00003.bin",
-     "model.layers.23.self_attn.k_proj.weight": "pytorch_model-00003-of-00003.bin",
-     "model.layers.23.self_attn.o_proj.weight": "pytorch_model-00003-of-00003.bin",
-     "model.layers.23.self_attn.q_proj.weight": "pytorch_model-00003-of-00003.bin",
-     "model.layers.23.self_attn.v_proj.weight": "pytorch_model-00003-of-00003.bin",
-     "model.layers.24.input_layernorm.weight": "pytorch_model-00003-of-00003.bin",
-     "model.layers.24.mlp.down_proj.weight": "pytorch_model-00003-of-00003.bin",
-     "model.layers.24.mlp.gate_proj.weight": "pytorch_model-00003-of-00003.bin",
-     "model.layers.24.mlp.up_proj.weight": "pytorch_model-00003-of-00003.bin",
-     "model.layers.24.post_attention_layernorm.weight": "pytorch_model-00003-of-00003.bin",
-     "model.layers.24.self_attn.k_proj.weight": "pytorch_model-00003-of-00003.bin",
-     "model.layers.24.self_attn.o_proj.weight": "pytorch_model-00003-of-00003.bin",
-     "model.layers.24.self_attn.q_proj.weight": "pytorch_model-00003-of-00003.bin",
-     "model.layers.24.self_attn.v_proj.weight": "pytorch_model-00003-of-00003.bin",
-     "model.layers.25.input_layernorm.weight": "pytorch_model-00003-of-00003.bin",
-     "model.layers.25.mlp.down_proj.weight": "pytorch_model-00003-of-00003.bin",
-     "model.layers.25.mlp.gate_proj.weight": "pytorch_model-00003-of-00003.bin",
-     "model.layers.25.mlp.up_proj.weight": "pytorch_model-00003-of-00003.bin",
-     "model.layers.25.post_attention_layernorm.weight": "pytorch_model-00003-of-00003.bin",
-     "model.layers.25.self_attn.k_proj.weight": "pytorch_model-00003-of-00003.bin",
-     "model.layers.25.self_attn.o_proj.weight": "pytorch_model-00003-of-00003.bin",
-     "model.layers.25.self_attn.q_proj.weight": "pytorch_model-00003-of-00003.bin",
-     "model.layers.25.self_attn.v_proj.weight": "pytorch_model-00003-of-00003.bin",
-     "model.layers.26.input_layernorm.weight": "pytorch_model-00003-of-00003.bin",
-     "model.layers.26.mlp.down_proj.weight": "pytorch_model-00003-of-00003.bin",
-     "model.layers.26.mlp.gate_proj.weight": "pytorch_model-00003-of-00003.bin",
-     "model.layers.26.mlp.up_proj.weight": "pytorch_model-00003-of-00003.bin",
-     "model.layers.26.post_attention_layernorm.weight": "pytorch_model-00003-of-00003.bin",
-     "model.layers.26.self_attn.k_proj.weight": "pytorch_model-00003-of-00003.bin",
-     "model.layers.26.self_attn.o_proj.weight": "pytorch_model-00003-of-00003.bin",
-     "model.layers.26.self_attn.q_proj.weight": "pytorch_model-00003-of-00003.bin",
-     "model.layers.26.self_attn.v_proj.weight": "pytorch_model-00003-of-00003.bin",
-     "model.layers.27.input_layernorm.weight": "pytorch_model-00003-of-00003.bin",
-     "model.layers.27.mlp.down_proj.weight": "pytorch_model-00003-of-00003.bin",
-     "model.layers.27.mlp.gate_proj.weight": "pytorch_model-00003-of-00003.bin",
-     "model.layers.27.mlp.up_proj.weight": "pytorch_model-00003-of-00003.bin",
-     "model.layers.27.post_attention_layernorm.weight": "pytorch_model-00003-of-00003.bin",
-     "model.layers.27.self_attn.k_proj.weight": "pytorch_model-00003-of-00003.bin",
-     "model.layers.27.self_attn.o_proj.weight": "pytorch_model-00003-of-00003.bin",
-     "model.layers.27.self_attn.q_proj.weight": "pytorch_model-00003-of-00003.bin",
-     "model.layers.27.self_attn.v_proj.weight": "pytorch_model-00003-of-00003.bin",
-     "model.layers.28.input_layernorm.weight": "pytorch_model-00003-of-00003.bin",
-     "model.layers.28.mlp.down_proj.weight": "pytorch_model-00003-of-00003.bin",
-     "model.layers.28.mlp.gate_proj.weight": "pytorch_model-00003-of-00003.bin",
-     "model.layers.28.mlp.up_proj.weight": "pytorch_model-00003-of-00003.bin",
-     "model.layers.28.post_attention_layernorm.weight": "pytorch_model-00003-of-00003.bin",
-     "model.layers.28.self_attn.k_proj.weight": "pytorch_model-00003-of-00003.bin",
-     "model.layers.28.self_attn.o_proj.weight": "pytorch_model-00003-of-00003.bin",
-     "model.layers.28.self_attn.q_proj.weight": "pytorch_model-00003-of-00003.bin",
-     "model.layers.28.self_attn.v_proj.weight": "pytorch_model-00003-of-00003.bin",
-     "model.layers.29.input_layernorm.weight": "pytorch_model-00003-of-00003.bin",
-     "model.layers.29.mlp.down_proj.weight": "pytorch_model-00003-of-00003.bin",
-     "model.layers.29.mlp.gate_proj.weight": "pytorch_model-00003-of-00003.bin",
-     "model.layers.29.mlp.up_proj.weight": "pytorch_model-00003-of-00003.bin",
-     "model.layers.29.post_attention_layernorm.weight": "pytorch_model-00003-of-00003.bin",
-     "model.layers.29.self_attn.k_proj.weight": "pytorch_model-00003-of-00003.bin",
-     "model.layers.29.self_attn.o_proj.weight": "pytorch_model-00003-of-00003.bin",
-     "model.layers.29.self_attn.q_proj.weight": "pytorch_model-00003-of-00003.bin",
-     "model.layers.29.self_attn.v_proj.weight": "pytorch_model-00003-of-00003.bin",
-     "model.layers.3.input_layernorm.weight": "pytorch_model-00001-of-00003.bin",
-     "model.layers.3.mlp.down_proj.weight": "pytorch_model-00001-of-00003.bin",
-     "model.layers.3.mlp.gate_proj.weight": "pytorch_model-00001-of-00003.bin",
-     "model.layers.3.mlp.up_proj.weight": "pytorch_model-00001-of-00003.bin",
-     "model.layers.3.post_attention_layernorm.weight": "pytorch_model-00001-of-00003.bin",
-     "model.layers.3.self_attn.k_proj.weight": "pytorch_model-00001-of-00003.bin",
-     "model.layers.3.self_attn.o_proj.weight": "pytorch_model-00001-of-00003.bin",
-     "model.layers.3.self_attn.q_proj.weight": "pytorch_model-00001-of-00003.bin",
-     "model.layers.3.self_attn.v_proj.weight": "pytorch_model-00001-of-00003.bin",
-     "model.layers.30.input_layernorm.weight": "pytorch_model-00003-of-00003.bin",
-     "model.layers.30.mlp.down_proj.weight": "pytorch_model-00003-of-00003.bin",
-     "model.layers.30.mlp.gate_proj.weight": "pytorch_model-00003-of-00003.bin",
-     "model.layers.30.mlp.up_proj.weight": "pytorch_model-00003-of-00003.bin",
-     "model.layers.30.post_attention_layernorm.weight": "pytorch_model-00003-of-00003.bin",
-     "model.layers.30.self_attn.k_proj.weight": "pytorch_model-00003-of-00003.bin",
-     "model.layers.30.self_attn.o_proj.weight": "pytorch_model-00003-of-00003.bin",
-     "model.layers.30.self_attn.q_proj.weight": "pytorch_model-00003-of-00003.bin",
-     "model.layers.30.self_attn.v_proj.weight": "pytorch_model-00003-of-00003.bin",
-     "model.layers.31.input_layernorm.weight": "pytorch_model-00003-of-00003.bin",
-     "model.layers.31.mlp.down_proj.weight": "pytorch_model-00003-of-00003.bin",
-     "model.layers.31.mlp.gate_proj.weight": "pytorch_model-00003-of-00003.bin",
-     "model.layers.31.mlp.up_proj.weight": "pytorch_model-00003-of-00003.bin",
-     "model.layers.31.post_attention_layernorm.weight": "pytorch_model-00003-of-00003.bin",
-     "model.layers.31.self_attn.k_proj.weight": "pytorch_model-00003-of-00003.bin",
-     "model.layers.31.self_attn.o_proj.weight": "pytorch_model-00003-of-00003.bin",
-     "model.layers.31.self_attn.q_proj.weight": "pytorch_model-00003-of-00003.bin",
-     "model.layers.31.self_attn.v_proj.weight": "pytorch_model-00003-of-00003.bin",
-     "model.layers.4.input_layernorm.weight": "pytorch_model-00001-of-00003.bin",
-     "model.layers.4.mlp.down_proj.weight": "pytorch_model-00001-of-00003.bin",
-     "model.layers.4.mlp.gate_proj.weight": "pytorch_model-00001-of-00003.bin",
-     "model.layers.4.mlp.up_proj.weight": "pytorch_model-00001-of-00003.bin",
-     "model.layers.4.post_attention_layernorm.weight": "pytorch_model-00001-of-00003.bin",
-     "model.layers.4.self_attn.k_proj.weight": "pytorch_model-00001-of-00003.bin",
-     "model.layers.4.self_attn.o_proj.weight": "pytorch_model-00001-of-00003.bin",
-     "model.layers.4.self_attn.q_proj.weight": "pytorch_model-00001-of-00003.bin",
-     "model.layers.4.self_attn.v_proj.weight": "pytorch_model-00001-of-00003.bin",
-     "model.layers.5.input_layernorm.weight": "pytorch_model-00001-of-00003.bin",
-     "model.layers.5.mlp.down_proj.weight": "pytorch_model-00001-of-00003.bin",
-     "model.layers.5.mlp.gate_proj.weight": "pytorch_model-00001-of-00003.bin",
-     "model.layers.5.mlp.up_proj.weight": "pytorch_model-00001-of-00003.bin",
-     "model.layers.5.post_attention_layernorm.weight": "pytorch_model-00001-of-00003.bin",
-     "model.layers.5.self_attn.k_proj.weight": "pytorch_model-00001-of-00003.bin",
-     "model.layers.5.self_attn.o_proj.weight": "pytorch_model-00001-of-00003.bin",
-     "model.layers.5.self_attn.q_proj.weight": "pytorch_model-00001-of-00003.bin",
-     "model.layers.5.self_attn.v_proj.weight": "pytorch_model-00001-of-00003.bin",
-     "model.layers.6.input_layernorm.weight": "pytorch_model-00001-of-00003.bin",
-     "model.layers.6.mlp.down_proj.weight": "pytorch_model-00001-of-00003.bin",
-     "model.layers.6.mlp.gate_proj.weight": "pytorch_model-00001-of-00003.bin",
-     "model.layers.6.mlp.up_proj.weight": "pytorch_model-00001-of-00003.bin",
-     "model.layers.6.post_attention_layernorm.weight": "pytorch_model-00001-of-00003.bin",
-     "model.layers.6.self_attn.k_proj.weight": "pytorch_model-00001-of-00003.bin",
-     "model.layers.6.self_attn.o_proj.weight": "pytorch_model-00001-of-00003.bin",
-     "model.layers.6.self_attn.q_proj.weight": "pytorch_model-00001-of-00003.bin",
-     "model.layers.6.self_attn.v_proj.weight": "pytorch_model-00001-of-00003.bin",
-     "model.layers.7.input_layernorm.weight": "pytorch_model-00001-of-00003.bin",
-     "model.layers.7.mlp.down_proj.weight": "pytorch_model-00001-of-00003.bin",
-     "model.layers.7.mlp.gate_proj.weight": "pytorch_model-00001-of-00003.bin",
-     "model.layers.7.mlp.up_proj.weight": "pytorch_model-00001-of-00003.bin",
-     "model.layers.7.post_attention_layernorm.weight": "pytorch_model-00001-of-00003.bin",
-     "model.layers.7.self_attn.k_proj.weight": "pytorch_model-00001-of-00003.bin",
-     "model.layers.7.self_attn.o_proj.weight": "pytorch_model-00001-of-00003.bin",
-     "model.layers.7.self_attn.q_proj.weight": "pytorch_model-00001-of-00003.bin",
-     "model.layers.7.self_attn.v_proj.weight": "pytorch_model-00001-of-00003.bin",
-     "model.layers.8.input_layernorm.weight": "pytorch_model-00001-of-00003.bin",
-     "model.layers.8.mlp.down_proj.weight": "pytorch_model-00001-of-00003.bin",
-     "model.layers.8.mlp.gate_proj.weight": "pytorch_model-00001-of-00003.bin",
-     "model.layers.8.mlp.up_proj.weight": "pytorch_model-00001-of-00003.bin",
-     "model.layers.8.post_attention_layernorm.weight": "pytorch_model-00001-of-00003.bin",
-     "model.layers.8.self_attn.k_proj.weight": "pytorch_model-00001-of-00003.bin",
-     "model.layers.8.self_attn.o_proj.weight": "pytorch_model-00001-of-00003.bin",
-     "model.layers.8.self_attn.q_proj.weight": "pytorch_model-00001-of-00003.bin",
-     "model.layers.8.self_attn.v_proj.weight": "pytorch_model-00001-of-00003.bin",
-     "model.layers.9.input_layernorm.weight": "pytorch_model-00001-of-00003.bin",
-     "model.layers.9.mlp.down_proj.weight": "pytorch_model-00001-of-00003.bin",
-     "model.layers.9.mlp.gate_proj.weight": "pytorch_model-00001-of-00003.bin",
-     "model.layers.9.mlp.up_proj.weight": "pytorch_model-00001-of-00003.bin",
-     "model.layers.9.post_attention_layernorm.weight": "pytorch_model-00001-of-00003.bin",
-     "model.layers.9.self_attn.k_proj.weight": "pytorch_model-00001-of-00003.bin",
-     "model.layers.9.self_attn.o_proj.weight": "pytorch_model-00001-of-00003.bin",
-     "model.layers.9.self_attn.q_proj.weight": "pytorch_model-00001-of-00003.bin",
-     "model.layers.9.self_attn.v_proj.weight": "pytorch_model-00001-of-00003.bin",
-     "model.norm.weight": "pytorch_model-00003-of-00003.bin"
-   }
- }
special_tokens_map.json DELETED
@@ -1,24 +0,0 @@
- {
-   "bos_token": {
-     "content": "<s>",
-     "lstrip": false,
-     "normalized": false,
-     "rstrip": false,
-     "single_word": false
-   },
-   "eos_token": {
-     "content": "</s>",
-     "lstrip": false,
-     "normalized": false,
-     "rstrip": false,
-     "single_word": false
-   },
-   "pad_token": "</s>",
-   "unk_token": {
-     "content": "<unk>",
-     "lstrip": false,
-     "normalized": false,
-     "rstrip": false,
-     "single_word": false
-   }
- }
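Note that the pad token is mapped to `</s>` (the eos token) rather than a dedicated token. For batched generation with a decoder-only model, left padding is the usual companion setting; a sketch:

```python
# Sketch: batched inference with pad_token == eos_token, as configured above.
# Decoder-only models are normally padded on the left so generation starts
# immediately after each prompt.
tokenizer.padding_side = "left"
batch = tokenizer(
    ["How can I transfer money to another account?",
     "I want to close an online account"],
    padding=True,
    return_tensors="pt",
)
outputs = model.generate(**batch, max_new_tokens=128)
```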
tokenizer.model DELETED
@@ -1,3 +0,0 @@
- version https://git-lfs.github.com/spec/v1
- oid sha256:dadfd56d766715c61d2ef780a525ab43b8e6da4de6865bda3d95fdef5e134055
- size 493443
tokenizer_config.json DELETED
@@ -1,45 +0,0 @@
- {
-   "add_bos_token": true,
-   "add_eos_token": false,
-   "add_prefix_space": true,
-   "added_tokens_decoder": {
-     "0": {
-       "content": "<unk>",
-       "lstrip": false,
-       "normalized": false,
-       "rstrip": false,
-       "single_word": false,
-       "special": true
-     },
-     "1": {
-       "content": "<s>",
-       "lstrip": false,
-       "normalized": false,
-       "rstrip": false,
-       "single_word": false,
-       "special": true
-     },
-     "2": {
-       "content": "</s>",
-       "lstrip": false,
-       "normalized": false,
-       "rstrip": false,
-       "single_word": false,
-       "special": true
-     }
-   },
-   "additional_special_tokens": [],
-   "bos_token": "<s>",
-   "chat_template": "{{ bos_token }}{% for message in messages %}{% if (message['role'] == 'user') != (loop.index0 % 2 == 0) %}{{ raise_exception('Conversation roles must alternate user/assistant/user/assistant/...') }}{% endif %}{% if message['role'] == 'user' %}{{ '[INST] ' + message['content'] + ' [/INST]' }}{% elif message['role'] == 'assistant' %}{{ message['content'] + eos_token}}{% else %}{{ raise_exception('Only user and assistant roles are supported!') }}{% endif %}{% endfor %}",
-   "clean_up_tokenization_spaces": false,
-   "eos_token": "</s>",
-   "legacy": true,
-   "model_max_length": 1000000000000000019884624838656,
-   "pad_token": "</s>",
-   "sp_model_kwargs": {},
-   "spaces_between_special_tokens": false,
-   "tokenizer_class": "LlamaTokenizer",
-   "unk_token": "<unk>",
-   "use_default_system_prompt": false,
-   "use_fast": true
- }
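For reference, this is what the `chat_template` above renders for a short exchange; a sketch, with `tokenize=False` returning the raw prompt string:

```python
# Sketch: render the chat_template above to inspect the exact prompt format.
messages = [
    {"role": "user", "content": "Hi"},
    {"role": "assistant", "content": "Hello! How can I help?"},
    {"role": "user", "content": "I want to close an online account"},
]
print(tokenizer.apply_chat_template(messages, tokenize=False))
# -> <s>[INST] Hi [/INST]Hello! How can I help?</s>[INST] I want to close an online account [/INST]
```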