amezasor committed on
Commit
4ba1c97
·
verified ·
1 Parent(s): 0840c2a

instruct model - initial commit

Files changed (1)
  1. README.md +315 -3
README.md CHANGED
@@ -1,3 +1,315 @@
1
- ---
2
- license: apache-2.0
3
- ---
1
+ ---
2
+ pipeline_tag: text-generation
3
+ inference: false
4
+ license: apache-2.0
5
+ # datasets:
6
+ # metrics:
7
+ # - code_eval
8
+ library_name: transformers
9
+ tags:
10
+ - language
11
+ - granite-3.0
12
+ model-index:
13
+ - name: granite-3.0-3b-a800m-instruct
14
+ results:
15
+ - task:
16
+ type: text-generation
17
+ dataset:
18
+ type: human-exams
19
+ name: MMLU
20
+ metrics:
21
+ - name: pass@1
22
+ type: pass@1
23
+ value:
24
+ verified: false
25
+ - task:
26
+ type: text-generation
27
+ dataset:
28
+ type: human-exams
29
+ name: MMLU-Pro
30
+ metrics:
31
+ - name: pass@1
32
+ type: pass@1
33
+ value:
34
+ verified: false
35
+ - task:
36
+ type: text-generation
37
+ dataset:
38
+ type: human-exams
39
+ name: AGI-Eval
40
+ metrics:
41
+ - name: pass@1
42
+ type: pass@1
43
+ value:
44
+ verified: false
45
+ - task:
46
+ type: text-generation
47
+ dataset:
48
+ type: commonsense
49
+ name: WinoGrande
50
+ metrics:
51
+ - name: pass@1
52
+ type: pass@1
53
+ value:
54
+ verified: false
55
+ - task:
56
+ type: text-generation
57
+ dataset:
58
+ type: commonsense
59
+ name: OBQA
60
+ metrics:
61
+ - name: pass@1
62
+ type: pass@1
63
+ value:
64
+ verified: false
65
+ - task:
66
+ type: text-generation
67
+ dataset:
68
+ type: commonsense
69
+ name: SIQA
70
+ metrics:
71
+ - name: pass@1
72
+ type: pass@1
73
+ value:
74
+ verified: false
75
+ - task:
76
+ type: text-generation
77
+ dataset:
78
+ type: commonsense
79
+ name: PIQA
80
+ metrics:
81
+ - name: pass@1
82
+ type: pass@1
83
+ value:
84
+ verified: false
85
+ - task:
86
+ type: text-generation
87
+ dataset:
88
+ type: commonsense
89
+ name: Hellaswag
90
+ metrics:
91
+ - name: pass@1
92
+ type: pass@1
93
+ value:
94
+ verified: false
95
+ - task:
96
+ type: text-generation
97
+ dataset:
98
+ type: commonsense
99
+ name: TruthfulQA
100
+ metrics:
101
+ - name: pass@1
102
+ type: pass@1
103
+ value:
104
+ verified: false
105
+ - task:
106
+ type: text-generation
107
+ dataset:
108
+ type: reading-comprehension
109
+ name: BoolQ
110
+ metrics:
111
+ - name: pass@1
112
+ type: pass@1
113
+ value:
114
+ verified: false
115
+ - task:
116
+ type: text-generation
117
+ dataset:
118
+ type: reading-comprehension
119
+ name: SQuAD v2
120
+ metrics:
121
+ - name: pass@1
122
+ type: pass@1
123
+ value:
124
+ verified: false
125
+ - task:
126
+ type: text-generation
127
+ dataset:
128
+ type: reasoning
129
+ name: ARC-C
130
+ metrics:
131
+ - name: pass@1
132
+ type: pass@1
133
+ value:
134
+ verified: false
135
+ - task:
136
+ type: text-generation
137
+ dataset:
138
+ type: reasoning
139
+ name: GPQA
140
+ metrics:
141
+ - name: pass@1
142
+ type: pass@1
143
+ value:
144
+ verified: false
145
+ - task:
146
+ type: text-generation
147
+ dataset:
148
+ type: reasoning
149
+ name: BBH
150
+ metrics:
151
+ - name: pass@1
152
+ type: pass@1
153
+ value:
154
+ verified: false
155
+ - task:
156
+ type: text-generation
157
+ dataset:
158
+ type: code
159
+ name: HumanEval
160
+ metrics:
161
+ - name: pass@1
162
+ type: pass@1
163
+ value:
164
+ verified: false
165
+ - task:
166
+ type: text-generation
167
+ dataset:
168
+ type: code
169
+ name: MBPP
170
+ metrics:
171
+ - name: pass@1
172
+ type: pass@1
173
+ value:
174
+ verified: false
175
+ - task:
176
+ type: text-generation
177
+ dataset:
178
+ type: math
179
+ name: GSM8K
180
+ metrics:
181
+ - name: pass@1
182
+ type: pass@1
183
+ value:
184
+ verified: false
185
+ - task:
186
+ type: text-generation
187
+ dataset:
188
+ type: math
189
+ name: MATH
190
+ metrics:
191
+ - name: pass@1
192
+ type: pass@1
193
+ value:
194
+ verified: false
195
+ - task:
196
+ type: text-generation
197
+ dataset:
198
+ type: multilingual
199
+ name: MGSM
200
+ metrics:
201
+ - name: pass@1
202
+ type: pass@1
203
+ value:
204
+ verified: false
205
+ ---
206
+
207
+ <!-- ![image/png](https://cdn-uploads.huggingface.co/production/uploads/62cd5057674cdb524450093d/1hzxoPwqkBJXshKVVe6_9.png) -->
208
+
209
+ # Granite-3.0-3B-A800M-Instruct
210
+
211
+ ## Model Summary
212
+ **Granite-3.0-3B-A800M-Instruct** is a lightweight, open-source 3B-parameter model fine-tuned from *Granite-3.0-3B-A800M-Base-4K* on a combination of open-source instruction data with a **permissive license** and proprietary instruction data. This language model is designed to excel at instruction-following tasks such as summarization, problem-solving, text translation, reasoning, code tasks, function-calling, and more.
213
+
214
+ - **Developers:** IBM Research
215
+ - **GitHub Repository:** [ibm-granite/granite-language-models](https://github.com/ibm-granite/granite-language-models)
216
+ - **Website**: [Granite Docs](https://www.ibm.com/granite/docs/)
217
+ - **Paper:** [Granite Language Models](https://) <!-- TO DO: Update github repo link when it is ready -->
218
+ - **Release Date**: October 21st, 2024
219
+ - **License:** [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0).
220
+
221
+ ## Supported Languages
222
+ English, German, Spanish, French, Japanese, Portuguese, Arabic, Czech, Italian, Korean, Dutch, Chinese (Simplified)
223
+
224
+ ## Usage
225
+ ### Intended use
226
+ The model is designed to respond to general instructions and can be used to build AI assistants for multiple domains, including business applications.
227
+
228
+ ### Capabilities
229
+ * Summarization
230
+ * Text classification
231
+ * Text extraction
232
+ * Question-answering
233
+ * Retrieval Augmented Generation (RAG)
234
+ * Code-related tasks
235
+ * Function-calling
236
+ * Multilingual dialog use cases
237
+
238
+ ### Generation
239
+ This is a simple example of how to use the **Granite-3.0-3B-A800M-Instruct** model.
240
+
241
+ Install the following libraries:
242
+
243
+ ```shell
244
+ pip install torch torchvision torchaudio
245
+ pip install accelerate
246
+ pip install transformers
247
+ ```
248
+ Then, copy the snippet from the section that is relevant for your use case.
249
+
250
+ ```python
251
+ import torch
252
+ from transformers import AutoModelForCausalLM, AutoTokenizer
253
+
254
+ device = "auto"
255
+ model_path = "ibm-granite/granite-3.0-3b-a800m-instruct"
256
+ tokenizer = AutoTokenizer.from_pretrained(model_path)
257
+ # drop device_map if running on CPU
258
+ model = AutoModelForCausalLM.from_pretrained(model_path, device_map=device)
259
+ model.eval()
260
+ # change input text as desired
261
+ chat = [
262
+ { "role": "user", "content": "Please list one IBM Research laboratory located in the United States. You should only output its name and location." },
263
+ ]
264
+ chat = tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=True)
265
+ # tokenize the text
266
+ input_tokens = tokenizer(chat, return_tensors="pt").to(model.device)
267
+ # generate output tokens
268
+ output = model.generate(**input_tokens,
269
+ max_new_tokens=100)
270
+ # decode output tokens into text
271
+ output = tokenizer.batch_decode(output)
272
+ # print output
273
+ print(output)
274
+ ```
275
+
276
+ <!-- TO DO: function-calling-example
277
+ -->
278
+
279
+ <!-- ['<|start_of_role|>user<|end_of_role|>Please list one IBM Research laboratory located in the United States. You should only output its name and location.<|end_of_text|>\n<|start_of_role|>assistant<|end_of_role|>1. IBM Research - Almaden, San Jose, California<|end_of_text|>'] -->
280
+
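The decoded string returned by `batch_decode` contains the whole conversation, including the role markers visible in the sample output above. The following is a minimal sketch of how the assistant reply could be pulled out of such a string. It assumes the markers `<|start_of_role|>`, `<|end_of_role|>`, and `<|end_of_text|>` appear exactly as in that sample; the helper names `parse_turns` and `last_assistant_reply` are hypothetical and not part of `transformers`.

```python
import re
from typing import Dict, List, Optional

# Role markers copied from the sample output shown in this README.
# A turn ends at <|end_of_text|>, or at the end of the string if generation
# stopped early (for example because max_new_tokens was reached).
TURN_PATTERN = re.compile(
    r"<\|start_of_role\|>(?P<role>.*?)<\|end_of_role\|>"
    r"(?P<content>.*?)(?:<\|end_of_text\|>|\Z)",
    re.DOTALL,
)


def parse_turns(decoded: str) -> List[Dict[str, str]]:
    """Split one decoded string into a list of {"role", "content"} dicts."""
    return [
        {"role": m.group("role"), "content": m.group("content").strip()}
        for m in TURN_PATTERN.finditer(decoded)
    ]


def last_assistant_reply(decoded: str) -> Optional[str]:
    """Return the content of the final assistant turn, or None if absent."""
    replies = [t["content"] for t in parse_turns(decoded) if t["role"] == "assistant"]
    return replies[-1] if replies else None


# Sample string taken from the commented-out output in this README.
SAMPLE = (
    "<|start_of_role|>user<|end_of_role|>Please list one IBM Research laboratory "
    "located in the United States. You should only output its name and location."
    "<|end_of_text|>\n"
    "<|start_of_role|>assistant<|end_of_role|>1. IBM Research - Almaden, "
    "San Jose, California<|end_of_text|>"
)

print(last_assistant_reply(SAMPLE))
```

In the generation snippet, this would be applied to each element of the list returned by `tokenizer.batch_decode(output)`.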
281
+ ## Model Architecture
282
+ **Granite-3.0-3B-A800M-Instruct** is based on a decoder-only sparse Mixture of Experts (MoE) transformer architecture. Core components of this architecture are fine-grained experts, dropless token routing, and a load-balancing loss.
283
+
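To make the routing terms above concrete, here is a generic, standard-library-only sketch of top-k expert routing with an auxiliary load-balancing loss. It is not the Granite implementation: the expert count and top-k value are taken from the 3B MoE column of the table in this section, while the loss formula (number of experts times the sum over experts of assignment fraction times mean router probability) is an assumed, commonly used formulation, and all function names are hypothetical. "Dropless" is modelled simply by never discarding a token: every token is sent to all of its selected experts, with no capacity limit.

```python
import math
import random
from typing import List, Tuple

NUM_EXPERTS = 40  # "Number of Experts" for the 3B MoE column
TOP_K = 8         # "MoE TopK" for the 3B MoE column


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over one token's router logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def route_token(logits: List[float], top_k: int = TOP_K) -> List[Tuple[int, float]]:
    """Pick the top_k experts for one token and renormalise their weights."""
    probs = softmax(logits)
    chosen = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]
    norm = sum(probs[i] for i in chosen)
    return [(i, probs[i] / norm) for i in chosen]


def load_balancing_loss(all_logits: List[List[float]], top_k: int = TOP_K) -> float:
    """Assumed auxiliary loss: N * sum_i(f_i * P_i).

    f_i is the fraction of (token, expert) assignments that went to expert i,
    P_i is the mean router probability of expert i over all tokens.
    """
    num_tokens = len(all_logits)
    num_experts = len(all_logits[0])
    counts = [0] * num_experts
    mean_probs = [0.0] * num_experts
    for logits in all_logits:
        for i, p in enumerate(softmax(logits)):
            mean_probs[i] += p / num_tokens
        for i, _ in route_token(logits, top_k):
            counts[i] += 1  # dropless: every selected assignment is kept
    fractions = [c / (num_tokens * top_k) for c in counts]
    return num_experts * sum(f * p for f, p in zip(fractions, mean_probs))


# Illustrative router logits for a handful of tokens (random, not model output).
rng = random.Random(0)
demo_logits = [[rng.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)] for _ in range(16)]
print(route_token(demo_logits[0]))
print(load_balancing_loss(demo_logits))
```

When the router assigns equal probability to every expert, this loss evaluates to 1; it grows as routing concentrates on a few experts.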
284
+ | Model | 2B Dense | 8B Dense | 1B MoE | 3B MoE |
285
+ | :-------- | :--------| :--------| :--------| :-------- |
286
+ | Embedding size | 2048 | 4096 | 1024 | **1536** |
287
+ | Number of layers | 40 | 40 | 24 | **32** |
288
+ | Attention head size | 64 | 128 | 64 | **64** |
289
+ | Number of attention heads | 32 | 32 | 16 | **24** |
290
+ | Number of KV heads | 8 | 8 | 8 | **8** |
291
+ | MLP hidden size | 8192 | 12800 | 512 | **512** |
292
+ | MLP activation | SwiGLU | SwiGLU | SwiGLU | **SwiGLU** |
293
+ | Number of Experts | — | — | 32 | **40** |
294
+ | MoE TopK | — | — | 8 | **8** |
295
+ | Initialization std | 0.1 | 0.1 | 0.1 | **0.1** |
296
+ | Sequence Length | 4096 | 4096 | 4096 | **4096** |
297
+ | Position Embedding | RoPE | RoPE | RoPE | **RoPE** |
298
+ | # Parameters | 2.5B | 8.1B | 1.3B | **3.3B** |
299
+ | # Active Parameters | 2.5B | 8.1B | 400M | **800M** |
300
+ | # Training tokens | 12T | 12T | 10T | **10T** |
301
+
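The rows of the table are related by simple arithmetic: in every column, the number of attention heads times the attention head size equals the embedding size, and the number of attention heads is a multiple of the number of KV heads. The sketch below checks this using only the values from the table; the `Config` class and the helper names are hypothetical and only serve this illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Config:
    """Subset of one table column; field names are illustrative."""
    name: str
    embedding_size: int
    head_size: int
    num_heads: int
    num_kv_heads: int


# Values copied from the architecture table.
CONFIGS: List[Config] = [
    Config("2B Dense", 2048, 64, 32, 8),
    Config("8B Dense", 4096, 128, 32, 8),
    Config("1B MoE", 1024, 64, 16, 8),
    Config("3B MoE", 1536, 64, 24, 8),
]


def heads_match_embedding(cfg: Config) -> bool:
    """True when num_heads * head_size reproduces the embedding size."""
    return cfg.num_heads * cfg.head_size == cfg.embedding_size


def queries_per_kv_head(cfg: Config) -> int:
    """Number of attention (query) heads that share one KV head."""
    return cfg.num_heads // cfg.num_kv_heads


for cfg in CONFIGS:
    print(cfg.name, heads_match_embedding(cfg), queries_per_kv_head(cfg))
```

For the 3B MoE column, 24 heads of size 64 give the embedding size of 1536, and three attention heads share each of the 8 KV heads.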
302
+ <!-- TO DO: To be completed once the paper is ready; we may change the title to Supervised Finetuning -->
303
+ ## Training Data
304
+ This model is trained on a mix of open-source and proprietary datasets.
305
+ <!-- ### Instruction Datasets
306
+ * Language Instruction Datasets: We include high-quality datasets such as [TO DO: List of datasets]
307
+ * Synthetic Instruction Datasets: [TO DO: paragraph about synthetic data]
308
+ ### Processing
309
+ * [TO DO: Data annotation with MagPie pipeline: quality, duplicates] -->
310
+
311
+ ## Infrastructure
312
+ We train the Granite language models using IBM's supercomputing cluster, Blue Vela, which is outfitted with NVIDIA H100 GPUs. This cluster provides a scalable and efficient infrastructure for training our models over thousands of GPUs.
313
+
314
+ ## Ethical Considerations and Limitations
315
+ Granite instruct models are fine-tuned primarily on instruction-response pairs, mostly in English but also in German, Spanish, French, Japanese, Portuguese, Arabic, Czech, Italian, Korean, Dutch, and Chinese (Simplified). Because this model has been exposed to multilingual data, it can handle multilingual dialog use cases, with limited performance on non-English tasks. In such cases, introducing a small number of examples (few-shot) can help the model generate more accurate outputs. The model also inherits ethical considerations and limitations from its base model. For more information, please refer to the *[Granite-3.0-3B-A800M-Base-4K](https://huggingface.co/ibm-granite/granite-3.0-3b-a800m-base)* model card.
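
One possible way to supply the few-shot examples mentioned above is to place them as earlier user/assistant turns in the `chat` list that is passed to `tokenizer.apply_chat_template`. The sketch below only builds that list; the helper name `build_few_shot_chat`, the instruction text, and the Spanish example pairs are all made up for illustration and are not taken from the model's training or evaluation data.

```python
from typing import Dict, List, Sequence, Tuple


def build_few_shot_chat(
    instruction: str,
    examples: Sequence[Tuple[str, str]],
    query: str,
) -> List[Dict[str, str]]:
    """Build a chat list with worked examples placed before the real query.

    The instruction is attached to the first user turn only; each example
    becomes one user turn followed by one assistant turn.
    """
    chat: List[Dict[str, str]] = []
    for index, (example_input, example_output) in enumerate(examples):
        content = f"{instruction}\n\n{example_input}" if index == 0 else example_input
        chat.append({"role": "user", "content": content})
        chat.append({"role": "assistant", "content": example_output})
    final_content = query if examples else f"{instruction}\n\n{query}"
    chat.append({"role": "user", "content": final_content})
    return chat


# Hypothetical Spanish sentiment-classification examples.
chat = build_few_shot_chat(
    instruction="Clasifica el sentimiento de la frase como positivo o negativo.",
    examples=[
        ("Me encantó la película.", "positivo"),
        ("El servicio fue muy lento.", "negativo"),
    ],
    query="La comida estaba deliciosa.",
)
for turn in chat:
    print(turn["role"], ":", turn["content"])
```

The resulting list has the same shape as the `chat` variable in the generation example, so it can be passed to `apply_chat_template` in its place.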