svakhreev committed on
Commit
8188de0
·
1 Parent(s): c39bf99

Create README.md

  1. README.md +647 -0
README.md ADDED
@@ -0,0 +1,647 @@
1
+ ---
2
+ pipeline_tag: text-generation
3
+ inference: true
4
+ widget:
5
+ - text: 'def print_hello_world():'
6
+ example_title: Hello world
7
+ group: Python
8
+ license: bigscience-openrail-m
9
+ pretrain-datasets:
10
+ - books
11
+ - arxiv
12
+ - c4
13
+ - falcon-refinedweb
14
+ - wiki
15
+ - github-issues
16
+ - stack_markdown
17
+ - self-made dataset of permissive github code
18
+ datasets:
19
+ - bigcode/the-stack-dedup
20
+ - rombodawg/2XUNCENSORED_MegaCodeTraining188k
21
+ - bigcode/commitpackft
22
+ metrics:
23
+ - code_eval
24
+ library_name: transformers
25
+ tags:
26
+ - code
27
+ model-index:
28
+ - name: Refact-1.6B
29
+ results:
30
+ - task:
31
+ type: text-generation
32
+ dataset:
33
+ type: openai_humaneval
34
+ name: HumanEval
35
+ metrics:
36
+ - name: pass@1 (T=0.01)
37
+ type: pass@1
38
+ value: 32.0
39
+ verified: false
40
+ - name: pass@1 (T=0.2)
41
+ type: pass@1
42
+ value: 31.5
43
+ verified: false
44
+ - name: pass@10 (T=0.8)
45
+ type: pass@10
46
+ value: 53.0
47
+ verified: false
48
+ - name: pass@100 (T=0.8)
49
+ type: pass@100
50
+ value: 76.9
51
+ verified: false
52
+ - task:
53
+ type: text-generation
54
+ dataset:
55
+ type: bigcode/humanevalpack
56
+ name: HumanEvalSynthesize Python
57
+ metrics:
58
+ - name: pass@1 (T=0.2)
59
+ type: pass@1
60
+ value: 35.8
61
+ verified: false
62
+ - task:
63
+ type: text-generation
64
+ dataset:
65
+ type: bigcode/humanevalpack
66
+ name: HumanEvalSynthesize JavaScript
67
+ metrics:
68
+ - name: pass@1 (T=0.2)
69
+ type: pass@1
70
+ value: 31.6
71
+ verified: false
72
+ - task:
73
+ type: text-generation
74
+ dataset:
75
+ type: bigcode/humanevalpack
76
+ name: HumanEvalSynthesize Java
77
+ metrics:
78
+ - name: pass@1 (T=0.2)
79
+ type: pass@1
80
+ value: 29.1
81
+ verified: false
82
+ - task:
83
+ type: text-generation
84
+ dataset:
85
+ type: bigcode/humanevalpack
86
+ name: HumanEvalSynthesize Go
87
+ metrics:
88
+ - name: pass@1 (T=0.2)
89
+ type: pass@1
90
+ value: -1
91
+ verified: false
92
+ - task:
93
+ type: text-generation
94
+ dataset:
95
+ type: bigcode/humanevalpack
96
+ name: HumanEvalSynthesize C++
97
+ metrics:
98
+ - name: pass@1 (T=0.2)
99
+ type: pass@1
100
+ value: 26.3
101
+ verified: false
102
+ - task:
103
+ type: text-generation
104
+ dataset:
105
+ type: bigcode/humanevalpack
106
+ name: HumanEvalSynthesize Rust
107
+ metrics:
108
+ - name: pass@1 (T=0.2)
109
+ type: pass@1
110
+ value: -1
111
+ verified: false
112
+ - task:
113
+ type: text-generation
114
+ dataset:
115
+ type: bigcode/humanevalpack
116
+ name: HumanEvalSynthesize Average
117
+ metrics:
118
+ - name: pass@1 (T=0.2)
119
+ type: pass@1
120
+ value: -1
121
+ verified: false
122
+
123
+
124
+
125
+
126
+
127
+ - task:
128
+ type: text-generation
129
+ dataset:
130
+ type: bigcode/humanevalpack
131
+ name: HumanEvalFix Python
132
+ metrics:
133
+ - name: pass@1 (T=0.2)
134
+ type: pass@1
135
+ value: 23.6
136
+ verified: false
137
+ - task:
138
+ type: text-generation
139
+ dataset:
140
+ type: bigcode/humanevalpack
141
+ name: HumanEvalFix JavaScript
142
+ metrics:
143
+ - name: pass@1 (T=0.2)
144
+ type: pass@1
145
+ value: -1
146
+ verified: false
147
+ - task:
148
+ type: text-generation
149
+ dataset:
150
+ type: bigcode/humanevalpack
151
+ name: HumanEvalFix Java
152
+ metrics:
153
+ - name: pass@1 (T=0.2)
154
+ type: pass@1
155
+ value: -1
156
+ verified: false
157
+ - task:
158
+ type: text-generation
159
+ dataset:
160
+ type: bigcode/humanevalpack
161
+ name: HumanEvalFix Go
162
+ metrics:
163
+ - name: pass@1 (T=0.2)
164
+ type: pass@1
165
+ value: -1
166
+ verified: false
167
+ - task:
168
+ type: text-generation
169
+ dataset:
170
+ type: bigcode/humanevalpack
171
+ name: HumanEvalFix C++
172
+ metrics:
173
+ - name: pass@1 (T=0.2)
174
+ type: pass@1
175
+ value: -1
176
+ verified: false
177
+ - task:
178
+ type: text-generation
179
+ dataset:
180
+ type: bigcode/humanevalpack
181
+ name: HumanEvalFix Rust
182
+ metrics:
183
+ - name: pass@1 (T=0.2)
184
+ type: pass@1
185
+ value: -1
186
+ verified: false
187
+ - task:
188
+ type: text-generation
189
+ dataset:
190
+ type: bigcode/humanevalpack
191
+ name: HumanEvalFix Average
192
+ metrics:
193
+ - name: pass@1 (T=0.2)
194
+ type: pass@1
195
+ value: -1
196
+ verified: false
197
+
198
+
199
+
200
+
201
+
202
+
203
+ - task:
204
+ type: text-generation
205
+ dataset:
206
+ type: bigcode/humanevalpack
207
+ name: HumanEvalExplain Python
208
+ metrics:
209
+ - name: pass@1 (T=0.2)
210
+ type: pass@1
211
+ value: -1
212
+ verified: false
213
+ - task:
214
+ type: text-generation
215
+ dataset:
216
+ type: bigcode/humanevalpack
217
+ name: HumanEvalExplain JavaScript
218
+ metrics:
219
+ - name: pass@1 (T=0.2)
220
+ type: pass@1
221
+ value: -1
222
+ verified: false
223
+ - task:
224
+ type: text-generation
225
+ dataset:
226
+ type: bigcode/humanevalpack
227
+ name: HumanEvalExplain Java
228
+ metrics:
229
+ - name: pass@1 (T=0.2)
230
+ type: pass@1
231
+ value: -1
232
+ verified: false
233
+ - task:
234
+ type: text-generation
235
+ dataset:
236
+ type: bigcode/humanevalpack
237
+ name: HumanEvalExplain Go
238
+ metrics:
239
+ - name: pass@1 (T=0.2)
240
+ type: pass@1
241
+ value: -1
242
+ verified: false
243
+ - task:
244
+ type: text-generation
245
+ dataset:
246
+ type: bigcode/humanevalpack
247
+ name: HumanEvalExplain C++
248
+ metrics:
249
+ - name: pass@1 (T=0.2)
250
+ type: pass@1
251
+ value: -1
252
+ verified: false
253
+ - task:
254
+ type: text-generation
255
+ dataset:
256
+ type: bigcode/humanevalpack
257
+ name: HumanEvalExplain Rust
258
+ metrics:
259
+ - name: pass@1 (T=0.2)
260
+ type: pass@1
261
+ value: -1
262
+ verified: false
263
+ - task:
264
+ type: text-generation
265
+ dataset:
266
+ type: bigcode/humanevalpack
267
+ name: HumanEvalExplain Average
268
+ metrics:
269
+ - name: pass@1 (T=0.2)
270
+ type: pass@1
271
+ value: -1
272
+ verified: false
273
+
274
+
275
+ - task:
276
+ type: text-generation
277
+ dataset:
278
+ type: mbpp
279
+ name: MBPP
280
+ metrics:
281
+ - name: pass@1 (T=0.01)
282
+ type: pass@1
283
+ value: 31.15
284
+ verified: false
285
+ - task:
286
+ type: text-generation
287
+ dataset:
288
+ type: ds1000
289
+ name: DS-1000 (Overall Completion)
290
+ metrics:
291
+ - name: pass@1 (T=0.2)
292
+ type: pass@1
293
+ value: -1
294
+ verified: false
295
+ - task:
296
+ type: text-generation
297
+ dataset:
298
+ type: nuprl/MultiPL-E
299
+ name: MultiPL-HumanEval (C++)
300
+ metrics:
301
+ - name: pass@1 (T=0.2)
302
+ type: pass@1
303
+ value: 21.61
304
+ verified: false
305
+ - task:
306
+ type: text-generation
307
+ dataset:
308
+ type: nuprl/MultiPL-E
309
+ name: MultiPL-HumanEval (C#)
310
+ metrics:
311
+ - name: pass@1 (T=0.2)
312
+ type: pass@1
313
+ value: 13.91
314
+ verified: false
315
+ - task:
316
+ type: text-generation
317
+ dataset:
318
+ type: nuprl/MultiPL-E
319
+ name: MultiPL-HumanEval (D)
320
+ metrics:
321
+ - name: pass@1 (T=0.2)
322
+ type: pass@1
323
+ value: 9.5
324
+ verified: false
325
+ - task:
326
+ type: text-generation
327
+ dataset:
328
+ type: nuprl/MultiPL-E
329
+ name: MultiPL-HumanEval (Go)
330
+ metrics:
331
+ - name: pass@1 (T=0.2)
332
+ type: pass@1
333
+ value: 53.57
334
+ verified: false
335
+ - task:
336
+ type: text-generation
337
+ dataset:
338
+ type: nuprl/MultiPL-E
339
+ name: MultiPL-HumanEval (Java)
340
+ metrics:
341
+ - name: pass@1 (T=0.2)
342
+ type: pass@1
343
+ value: 21.58
344
+ verified: false
345
+ - task:
346
+ type: text-generation
347
+ dataset:
348
+ type: nuprl/MultiPL-E
349
+ name: MultiPL-HumanEval (Julia)
350
+ metrics:
351
+ - name: pass@1 (T=0.2)
352
+ type: pass@1
353
+ value: 13.75
354
+ verified: false
355
+ - task:
356
+ type: text-generation
357
+ dataset:
358
+ type: nuprl/MultiPL-E
359
+ name: MultiPL-HumanEval (JavaScript)
360
+ metrics:
361
+ - name: pass@1 (T=0.2)
362
+ type: pass@1
363
+ value: 26.88
364
+ verified: false
365
+ - task:
366
+ type: text-generation
367
+ dataset:
368
+ type: nuprl/MultiPL-E
369
+ name: MultiPL-HumanEval (Lua)
370
+ metrics:
371
+ - name: pass@1 (T=0.2)
372
+ type: pass@1
373
+ value: 15.26
374
+ verified: false
375
+ - task:
376
+ type: text-generation
377
+ dataset:
378
+ type: nuprl/MultiPL-E
379
+ name: MultiPL-HumanEval (PHP)
380
+ metrics:
381
+ - name: pass@1 (T=0.2)
382
+ type: pass@1
383
+ value: 23.04
384
+ verified: false
385
+ - task:
386
+ type: text-generation
387
+ dataset:
388
+ type: nuprl/MultiPL-E
389
+ name: MultiPL-HumanEval (Perl)
390
+ metrics:
391
+ - name: pass@1 (T=0.2)
392
+ type: pass@1
393
+ value: 12.1
394
+ verified: false
395
+ - task:
396
+ type: text-generation
397
+ dataset:
398
+ type: nuprl/MultiPL-E
399
+ name: MultiPL-HumanEval (Python)
400
+ metrics:
401
+ - name: pass@1 (T=0.2)
402
+ type: pass@1
403
+ value: 29.6
404
+ verified: false
405
+ - task:
406
+ type: text-generation
407
+ dataset:
408
+ type: nuprl/MultiPL-E
409
+ name: MultiPL-HumanEval (R)
410
+ metrics:
411
+ - name: pass@1 (T=0.2)
412
+ type: pass@1
413
+ value: 13.77
414
+ verified: false
415
+ - task:
416
+ type: text-generation
417
+ dataset:
418
+ type: nuprl/MultiPL-E
419
+ name: MultiPL-HumanEval (Ruby)
420
+ metrics:
421
+ - name: pass@1 (T=0.2)
422
+ type: pass@1
423
+ value: 12.68
424
+ verified: false
425
+ - task:
426
+ type: text-generation
427
+ dataset:
428
+ type: nuprl/MultiPL-E
429
+ name: MultiPL-HumanEval (Racket)
430
+ metrics:
431
+ - name: pass@1 (T=0.2)
432
+ type: pass@1
433
+ value: 4.29
434
+ verified: false
435
+ - task:
436
+ type: text-generation
437
+ dataset:
438
+ type: nuprl/MultiPL-E
439
+ name: MultiPL-HumanEval (Rust)
440
+ metrics:
441
+ - name: pass@1 (T=0.2)
442
+ type: pass@1
443
+ value: 19.54
444
+ verified: false
445
+ - task:
446
+ type: text-generation
447
+ dataset:
448
+ type: nuprl/MultiPL-E
449
+ name: MultiPL-HumanEval (Scala)
450
+ metrics:
451
+ - name: pass@1 (T=0.2)
452
+ type: pass@1
453
+ value: -1
454
+ verified: false
455
+ - task:
456
+ type: text-generation
457
+ dataset:
458
+ type: nuprl/MultiPL-E
459
+ name: MultiPL-HumanEval (Bash)
460
+ metrics:
461
+ - name: pass@1 (T=0.2)
462
+ type: pass@1
463
+ value: 5.7
464
+ verified: false
465
+ - task:
466
+ type: text-generation
467
+ dataset:
468
+ type: nuprl/MultiPL-E
469
+ name: MultiPL-HumanEval (Swift)
470
+ metrics:
471
+ - name: pass@1 (T=0.2)
472
+ type: pass@1
473
+ value: 0.1768
474
+ verified: false
475
+ - task:
476
+ type: text-generation
477
+ dataset:
478
+ type: nuprl/MultiPL-E
479
+ name: MultiPL-HumanEval (TypeScript)
480
+ metrics:
481
+ - name: pass@1 (T=0.2)
482
+ type: pass@1
483
+ value: -1
484
+ verified: false
485
+
486
+ language:
487
+ - en
488
+ ---
489
+
490
+ ![image/png](https://cdn-uploads.huggingface.co/production/uploads/643a9dd0c5f633a7fa7e804a/HkB0QYV0BbmB3ktMugbZy.png)
491
+
492
+
493
+ # Refact-1.6B
494
+
495
+ Finally, the model we started training alongside our blog post
496
+ [Applying Recent Innovations](https://refact.ai/blog/2023/applying-recent-innovations-to-train-model/) is ready 🎉
497
+
498
+ After fine-tuning on generated data, it beats ReplitCode v1 (3b), StableCode (3b) and many other models on HumanEval.
499
+ It comes within two points of StarCoder (33.6% pass@1), a model almost ten times its size!
500
+
501
+
502
+ Model | Size | HumanEval pass@1 | HumanEval pass@10 |
503
+ ----------------------|---------------|--------------------|--------------------|
504
+ DeciCoder-1b | 1b | 19.1% | |
505
+ <b>Refact-1.6-fim</b> | <b>1.6b</b> | <b>32.0%</b> | <b>53.0%</b> |
506
+ StableCode | 3b | 20.2% | 33.8% |
507
+ ReplitCode v1 | 3b | 21.9% | |
508
+ CodeLlama | 7b | 33.5% | 59.6% |
509
+ StarCoder | 15b | 33.6% | |
510
+
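The README does not say how the pass@k numbers were computed. A common choice is the unbiased estimator introduced with HumanEval, so the sketch below assumes that method. The function names and the sample data are hypothetical and only illustrate the arithmetic.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one problem.

    n: number of samples generated, c: number of samples that passed the tests.
    """
    if k > n:
        raise ValueError("k must not exceed the number of samples n")
    if n - c < k:
        # Fewer than k failing samples: every size-k subset contains a pass.
        return 1.0
    # 1 - P(all k drawn samples fail)
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results, k: int) -> float:
    """Average the per-problem estimate over (n, c) pairs."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


# Hypothetical outcomes for three problems (samples drawn, samples passing).
example = [(10, 3), (10, 0), (10, 10)]
print(round(mean_pass_at_k(example, 1), 4))
```

With one sample per problem and near-greedy decoding (the T=0.01 rows), this reduces to the plain fraction of problems solved.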
511
+ We believe it's the best model for practical code completion in your IDE because it's both smart and fast!
512
+ You can start using it right now by downloading the
513
+ [Refact plugin](https://refact.ai/). You can host the model yourself, too, using the
514
+ [open source docker container](https://github.com/smallcloudai/refact).
515
+
516
+ It's also multi-language (see MultiPL-HumanEval and the other metrics in the model-index metadata) and it works as a chat model (see the section below).
517
+
518
+
519
+ # Architecture
520
+
521
+ As described in more detail in the blog post, we used:
522
+
523
+ - [ALiBi](https://arxiv.org/abs/2108.12409) based attention
524
+ - [LayerNorm](https://arxiv.org/abs/1607.06450v1) instead of [RMSNorm](https://arxiv.org/pdf/1910.07467.pdf)
525
+ - [Multi Query Attention](https://arxiv.org/abs/1911.02150)
526
+
527
+ We also used the Lion optimizer, flash attention, and early dropout. None of this prevents you from running the model with standard tooling -- see the example below.
528
+
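To make the ALiBi item above concrete, here is a minimal sketch of the per-head linear bias that ALiBi adds to attention scores in place of positional embeddings. It follows the slope schedule from the ALiBi paper for a power-of-two head count. The head count and sequence length are illustrative assumptions, not the Refact-1.6B configuration, and the function names are hypothetical.

```python
def alibi_slopes(num_heads: int) -> list[float]:
    """Geometric slope sequence from the ALiBi paper (power-of-two head counts only)."""
    if num_heads <= 0 or num_heads & (num_heads - 1):
        raise ValueError("this sketch only handles a power-of-two number of heads")
    start = 2.0 ** (-8.0 / num_heads)
    return [start ** (i + 1) for i in range(num_heads)]


def alibi_bias(slope: float, seq_len: int) -> list[list[float]]:
    """Causal bias matrix for one head: rows are query positions, columns are key positions.

    Visible keys get a penalty proportional to their distance from the query;
    future keys are masked with -inf.
    """
    neg_inf = float("-inf")
    return [
        [-slope * (q - k) if k <= q else neg_inf for k in range(seq_len)]
        for q in range(seq_len)
    ]


slopes = alibi_slopes(8)            # assumed head count, for illustration only
bias = alibi_bias(slopes[0], 4)     # added to the q.k scores before softmax
for row in bias:
    print(row)
```

With multi-query attention the keys and values are shared across heads, but each query head still applies its own slope.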
529
+
530
+ # Pretraining
531
+
532
+ For the base model, we used our own dataset that contains code with permissive licenses only, and open text datasets.
533
+ Filtering is the key to the success of this model:
534
+
535
+ - We only used text in English
536
+ - We kept only topics related to computer science
537
+ - We applied heavy deduplication
538
+
539
+ The text-to-code proportion was 50:50, and the model was trained for 1.2T tokens.
540
+
541
+ We don't release the base model, because its Fill-in-the-Middle (FIM) completions tend to repeat themselves too much, so
542
+ its practical use is limited. But if you still want it, write us a message on Discord.
543
+
544
+
545
+ # Finetuning
546
+
547
+ We tested our hypothesis that chat data should boost base model performance in FIM and
548
+ regular left-to-right code completion. We found that making just 15% of the finetuning data open
549
+ [code](https://huggingface.co/datasets/bigcode/commitpackft)
550
+ [instruction-following](https://huggingface.co/datasets/rombodawg/2XUNCENSORED_MegaCodeTraining188k) datasets,
551
+ filtered for quality, improves almost all metrics.
552
+
553
+ Additionally, to improve FIM, we observed common failure modes, and prepared a synthetic dataset based on
554
+ [The Stack dedup v1.1](https://huggingface.co/datasets/bigcode/the-stack-dedup) to address them.
555
+
556
+ There is a distribution shift between typical code on the internet, and the code you write in your IDE.
557
+ The former is likely finished, so the model tries to come up with a suggestion that makes the code complete.
558
+ In your IDE, the code is likely half-written as you work on it, and no single addition can repair it
559
+ fully.
560
+
561
+ In practice, the model needs a tendency to stop after adding a couple of lines, and sometimes to write nothing
562
+ at all. We found that training it on empty completions, single-line completions, and multiline
563
+ completions that end at a smaller indent or at least at a newline makes it much more usable. This data
564
+ made up the remaining 85% of the finetuning dataset.
565
+
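One way to picture the completion shapes described above is a helper that cuts a continuation at the first line indented less than where the completion started. This is only a sketch of the idea under our own assumptions. It is not the pipeline used to build the synthetic dataset, and `truncate_at_dedent` is a hypothetical name.

```python
def indent_of(line: str) -> int:
    """Count leading spaces and tabs (each counted as one column in this sketch)."""
    return len(line) - len(line.lstrip(" \t"))


def truncate_at_dedent(continuation: str) -> str:
    """Build a completion target from the text that follows the cursor.

    Returns an empty completion when the next line is blank; otherwise keeps lines
    until the first non-blank line indented less than the first kept line.
    """
    lines = continuation.split("\n")
    if not lines[0].strip():
        return ""  # nothing useful to add: an empty completion
    base = indent_of(lines[0])
    kept = [lines[0]]
    for line in lines[1:]:
        if line.strip() and indent_of(line) < base:
            break  # the block ended: a smaller indent starts here
        kept.append(line)
    # Drop trailing blank lines so the target ends right after the block.
    while not kept[-1].strip():
        kept.pop()
    return "\n".join(kept) + "\n"


after_cursor = (
    "    total = 0\n"
    "    for x in xs:\n"
    "        total += x\n"
    "    return total\n"
    "\n"
    "def other():\n"
    "    pass\n"
)
print(truncate_at_dedent(after_cursor))
```

Here the target stops after `return total` instead of running on into the next function, which is the stopping behavior the paragraph above asks for.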
566
+ The final model is the result of several attempts to make it work as well as possible for code completion,
567
+ and to perform well on a wide range of metrics. The best attempt took 40B tokens.
568
+
569
+
570
+ # Limitations and Bias
571
+
572
+ The Refact-1.6B model was trained on text in English, although it has seen many more languages in
573
+ code comments. Its performance in languages other than English is lower.
574
+
575
+
576
+ # It Works As a Chat
577
+
578
+ The primary application of this model is code completion (infill) in multiple programming languages.
579
+ It also works quite well as a chat model.
580
+
581
+ HumanEval results using instruction following (chat) format, against models specialized for chat only:
582
+
583
+ Model | Size | pass@1 | pass@10 |
584
+ -----------------------|--------|----------|----------|
585
+ <b>Refact-1.6-fim</b> | 1.6b | 38.4% | 55.6% |
586
+ StableCode-instruct | 3b | 26.9% | 36.2% |
587
+ OctoGeeX | 6b | 44.7% | |
588
+ CodeLlama-instruct | 7b | 34.8% | 64.3% |
589
+ CodeLlama-instruct | 13b | 42.7% | 71.6% |
590
+ StarChat-β | 15b | 33.5% | |
591
+ OctoCoder | 15b | 46.2% | |
592
+
593
+
594
+ # Example
595
+
596
+ Fill-in-the-middle uses special tokens to identify the prefix/middle/suffix part of the input and output:
597
+
598
+ ```python
599
+ # pip install -q transformers
600
+ from transformers import AutoModelForCausalLM, AutoTokenizer
601
+
602
+ checkpoint = "smallcloudai/Refact-1.6B-fim"
603
+ device = "cuda" # for GPU usage or "cpu" for CPU usage
604
+
605
+ tokenizer = AutoTokenizer.from_pretrained(checkpoint)
606
+ model = AutoModelForCausalLM.from_pretrained(checkpoint, trust_remote_code=True).to(device)
607
+
608
+ prompt = '<fim_prefix>def print_hello_world():\n """<fim_suffix>\n print("Hello world!")<fim_middle>'
609
+
610
+ inputs = tokenizer.encode(prompt, return_tensors="pt").to(device)
611
+ outputs = model.generate(inputs, max_length=100, do_sample=True, temperature=0.2)  # temperature only takes effect when sampling
612
+ print("-"*80)
613
+ print(tokenizer.decode(outputs[0]))
614
+ ```
615
+
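The decoded output above still contains the prompt and the special tokens. The helpers below are a small sketch of assembling the FIM prompt from a prefix and suffix and pulling out only the generated middle. The helper names are hypothetical, and the `<|endoftext|>` end marker is an assumption; check the tokenizer's actual end-of-sequence token before relying on it.

```python
FIM_PREFIX = "<fim_prefix>"
FIM_SUFFIX = "<fim_suffix>"
FIM_MIDDLE = "<fim_middle>"


def build_fim_prompt(prefix: str, suffix: str) -> str:
    """Arrange code before and after the cursor in prefix/suffix/middle order."""
    return f"{FIM_PREFIX}{prefix}{FIM_SUFFIX}{suffix}{FIM_MIDDLE}"


def extract_middle(decoded: str, end_marker: str = "<|endoftext|>") -> str:
    """Return the text generated after <fim_middle>, cut at the (assumed) end marker."""
    _, found, middle = decoded.partition(FIM_MIDDLE)
    if not found:
        return ""  # the marker is missing, so there is no infill to return
    return middle.split(end_marker, 1)[0]


prompt = build_fim_prompt(
    'def print_hello_world():\n    """',
    '\n    print("Hello world!")',
)
# Stand-in for tokenizer.decode(outputs[0]); not real model output.
decoded = prompt + 'Print a greeting.\n    """<|endoftext|>'
print(repr(extract_middle(decoded)))
```

Splicing the result between the original prefix and suffix gives the completed source.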
616
+ # Chat Format
617
+
618
+ The same model also works as a chat model (experimental).
619
+
620
+ ```python
621
+ prompt_template = "<empty_output>SYSTEM {system}\n" \
622
+ "<empty_output>USER {query}\n" \
623
+ "<empty_output>ASSISTANT"
624
+ prompt = prompt_template.format(system="You are a programming assistant",
625
+ query="How do I sort a list in Python?")
626
+ ```
627
+
628
+ # Model Stats
629
+
630
+ - **Architecture:** LLaMA-like model with multi-query attention
631
+ - **Objectives:** Fill-in-the-Middle, Chat
632
+ - **Tokens context:** 4096
633
+ - **Pretraining tokens:** 1.2T
634
+ - **Finetuning tokens:** 40B
635
+ - **Precision:** bfloat16
636
+ - **GPUs:** 64 NVIDIA A5000
637
+ - **Training time:** 28 days
638
+
639
+
640
+ # License
641
+
642
+ The model is licensed under the BigScience OpenRAIL-M v1 license agreement.
643
+
644
+
645
+ # Citation
646
+
647
+ If you are using this model, please give a link to this page.