Text Generation
English
Eval Results
d-matrix-user committed on
Commit 3434f81 (1 parent: 4b07dd5)

commit tokenizer

BASELINE.yaml ADDED
@@ -0,0 +1,447 @@
1
+ model:
2
+ lm_head:
3
+ accum_format: SAME
4
+ approximation_function: NONE
5
+ input_format: SAME
6
+ instance: Linear
7
+ output_format: SAME
8
+ weight_format: SAME
9
+ weight_sparseness: DENSE
10
+ transformer.drop:
11
+ approximation_function: NONE
12
+ input_format: SAME
13
+ instance: Dropout
14
+ output_format: SAME
15
+ transformer.h.0.attn.attn_dropout:
16
+ approximation_function: NONE
17
+ input_format: SAME
18
+ instance: Dropout
19
+ output_format: SAME
20
+ transformer.h.0.attn.c_attn:
21
+ approximation_function: NONE
22
+ bias_format: SAME
23
+ input_format: SAME
24
+ instance: HFTransformersConv1D
25
+ output_format: SAME
26
+ weight_format: SAME
27
+ weight_sparseness: DENSE
28
+ transformer.h.0.attn.c_proj:
29
+ approximation_function: NONE
30
+ bias_format: SAME
31
+ input_format: SAME
32
+ instance: HFTransformersConv1D
33
+ output_format: SAME
34
+ weight_format: SAME
35
+ weight_sparseness: DENSE
36
+ transformer.h.0.attn.resid_dropout:
37
+ approximation_function: NONE
38
+ input_format: SAME
39
+ instance: Dropout
40
+ output_format: SAME
41
+ transformer.h.0.attn.softmax:
42
+ approximation_function: NONE
43
+ input_format: SAME
44
+ instance: Softmax
45
+ output_format: SAME
46
+ transformer.h.0.ln_1:
47
+ approximation_function: NONE
48
+ bias_format: SAME
49
+ input_format: SAME
50
+ instance: LayerNorm
51
+ output_format: SAME
52
+ weight_format: SAME
53
+ transformer.h.0.ln_2:
54
+ approximation_function: NONE
55
+ bias_format: SAME
56
+ input_format: SAME
57
+ instance: LayerNorm
58
+ output_format: SAME
59
+ weight_format: SAME
60
+ transformer.h.0.mlp.act:
61
+ approximation_function: NONE
62
+ input_format: SAME
63
+ instance: GELU
64
+ output_format: SAME
65
+ transformer.h.0.mlp.c_fc:
66
+ approximation_function: NONE
67
+ bias_format: SAME
68
+ input_format: SAME
69
+ instance: HFTransformersConv1D
70
+ output_format: SAME
71
+ weight_format: SAME
72
+ weight_sparseness: DENSE
73
+ transformer.h.0.mlp.c_proj:
74
+ approximation_function: NONE
75
+ bias_format: SAME
76
+ input_format: SAME
77
+ instance: HFTransformersConv1D
78
+ output_format: SAME
79
+ weight_format: SAME
80
+ weight_sparseness: DENSE
81
+ transformer.h.0.mlp.dropout:
82
+ approximation_function: NONE
83
+ input_format: SAME
84
+ instance: Dropout
85
+ output_format: SAME
86
+ transformer.h.1.attn.attn_dropout:
87
+ approximation_function: NONE
88
+ input_format: SAME
89
+ instance: Dropout
90
+ output_format: SAME
91
+ transformer.h.1.attn.c_attn:
92
+ approximation_function: NONE
93
+ bias_format: SAME
94
+ input_format: SAME
95
+ instance: HFTransformersConv1D
96
+ output_format: SAME
97
+ weight_format: SAME
98
+ weight_sparseness: DENSE
99
+ transformer.h.1.attn.c_proj:
100
+ approximation_function: NONE
101
+ bias_format: SAME
102
+ input_format: SAME
103
+ instance: HFTransformersConv1D
104
+ output_format: SAME
105
+ weight_format: SAME
106
+ weight_sparseness: DENSE
107
+ transformer.h.1.attn.resid_dropout:
108
+ approximation_function: NONE
109
+ input_format: SAME
110
+ instance: Dropout
111
+ output_format: SAME
112
+ transformer.h.1.attn.softmax:
113
+ approximation_function: NONE
114
+ input_format: SAME
115
+ instance: Softmax
116
+ output_format: SAME
117
+ transformer.h.1.ln_1:
118
+ approximation_function: NONE
119
+ bias_format: SAME
120
+ input_format: SAME
121
+ instance: LayerNorm
122
+ output_format: SAME
123
+ weight_format: SAME
124
+ transformer.h.1.ln_2:
125
+ approximation_function: NONE
126
+ bias_format: SAME
127
+ input_format: SAME
128
+ instance: LayerNorm
129
+ output_format: SAME
130
+ weight_format: SAME
131
+ transformer.h.1.mlp.act:
132
+ approximation_function: NONE
133
+ input_format: SAME
134
+ instance: GELU
135
+ output_format: SAME
136
+ transformer.h.1.mlp.c_fc:
137
+ approximation_function: NONE
138
+ bias_format: SAME
139
+ input_format: SAME
140
+ instance: HFTransformersConv1D
141
+ output_format: SAME
142
+ weight_format: SAME
143
+ weight_sparseness: DENSE
144
+ transformer.h.1.mlp.c_proj:
145
+ approximation_function: NONE
146
+ bias_format: SAME
147
+ input_format: SAME
148
+ instance: HFTransformersConv1D
149
+ output_format: SAME
150
+ weight_format: SAME
151
+ weight_sparseness: DENSE
152
+ transformer.h.1.mlp.dropout:
153
+ approximation_function: NONE
154
+ input_format: SAME
155
+ instance: Dropout
156
+ output_format: SAME
157
+ transformer.h.2.attn.attn_dropout:
158
+ approximation_function: NONE
159
+ input_format: SAME
160
+ instance: Dropout
161
+ output_format: SAME
162
+ transformer.h.2.attn.c_attn:
163
+ approximation_function: NONE
164
+ bias_format: SAME
165
+ input_format: SAME
166
+ instance: HFTransformersConv1D
167
+ output_format: SAME
168
+ weight_format: SAME
169
+ weight_sparseness: DENSE
170
+ transformer.h.2.attn.c_proj:
171
+ approximation_function: NONE
172
+ bias_format: SAME
173
+ input_format: SAME
174
+ instance: HFTransformersConv1D
175
+ output_format: SAME
176
+ weight_format: SAME
177
+ weight_sparseness: DENSE
178
+ transformer.h.2.attn.resid_dropout:
179
+ approximation_function: NONE
180
+ input_format: SAME
181
+ instance: Dropout
182
+ output_format: SAME
183
+ transformer.h.2.attn.softmax:
184
+ approximation_function: NONE
185
+ input_format: SAME
186
+ instance: Softmax
187
+ output_format: SAME
188
+ transformer.h.2.ln_1:
189
+ approximation_function: NONE
190
+ bias_format: SAME
191
+ input_format: SAME
192
+ instance: LayerNorm
193
+ output_format: SAME
194
+ weight_format: SAME
195
+ transformer.h.2.ln_2:
196
+ approximation_function: NONE
197
+ bias_format: SAME
198
+ input_format: SAME
199
+ instance: LayerNorm
200
+ output_format: SAME
201
+ weight_format: SAME
202
+ transformer.h.2.mlp.act:
203
+ approximation_function: NONE
204
+ input_format: SAME
205
+ instance: GELU
206
+ output_format: SAME
207
+ transformer.h.2.mlp.c_fc:
208
+ approximation_function: NONE
209
+ bias_format: SAME
210
+ input_format: SAME
211
+ instance: HFTransformersConv1D
212
+ output_format: SAME
213
+ weight_format: SAME
214
+ weight_sparseness: DENSE
215
+ transformer.h.2.mlp.c_proj:
216
+ approximation_function: NONE
217
+ bias_format: SAME
218
+ input_format: SAME
219
+ instance: HFTransformersConv1D
220
+ output_format: SAME
221
+ weight_format: SAME
222
+ weight_sparseness: DENSE
223
+ transformer.h.2.mlp.dropout:
224
+ approximation_function: NONE
225
+ input_format: SAME
226
+ instance: Dropout
227
+ output_format: SAME
228
+ transformer.h.3.attn.attn_dropout:
229
+ approximation_function: NONE
230
+ input_format: SAME
231
+ instance: Dropout
232
+ output_format: SAME
233
+ transformer.h.3.attn.c_attn:
234
+ approximation_function: NONE
235
+ bias_format: SAME
236
+ input_format: SAME
237
+ instance: HFTransformersConv1D
238
+ output_format: SAME
239
+ weight_format: SAME
240
+ weight_sparseness: DENSE
241
+ transformer.h.3.attn.c_proj:
242
+ approximation_function: NONE
243
+ bias_format: SAME
244
+ input_format: SAME
245
+ instance: HFTransformersConv1D
246
+ output_format: SAME
247
+ weight_format: SAME
248
+ weight_sparseness: DENSE
249
+ transformer.h.3.attn.resid_dropout:
250
+ approximation_function: NONE
251
+ input_format: SAME
252
+ instance: Dropout
253
+ output_format: SAME
254
+ transformer.h.3.attn.softmax:
255
+ approximation_function: NONE
256
+ input_format: SAME
257
+ instance: Softmax
258
+ output_format: SAME
259
+ transformer.h.3.ln_1:
260
+ approximation_function: NONE
261
+ bias_format: SAME
262
+ input_format: SAME
263
+ instance: LayerNorm
264
+ output_format: SAME
265
+ weight_format: SAME
266
+ transformer.h.3.ln_2:
267
+ approximation_function: NONE
268
+ bias_format: SAME
269
+ input_format: SAME
270
+ instance: LayerNorm
271
+ output_format: SAME
272
+ weight_format: SAME
273
+ transformer.h.3.mlp.act:
274
+ approximation_function: NONE
275
+ input_format: SAME
276
+ instance: GELU
277
+ output_format: SAME
278
+ transformer.h.3.mlp.c_fc:
279
+ approximation_function: NONE
280
+ bias_format: SAME
281
+ input_format: SAME
282
+ instance: HFTransformersConv1D
283
+ output_format: SAME
284
+ weight_format: SAME
285
+ weight_sparseness: DENSE
286
+ transformer.h.3.mlp.c_proj:
287
+ approximation_function: NONE
288
+ bias_format: SAME
289
+ input_format: SAME
290
+ instance: HFTransformersConv1D
291
+ output_format: SAME
292
+ weight_format: SAME
293
+ weight_sparseness: DENSE
294
+ transformer.h.3.mlp.dropout:
295
+ approximation_function: NONE
296
+ input_format: SAME
297
+ instance: Dropout
298
+ output_format: SAME
299
+ transformer.h.4.attn.attn_dropout:
300
+ approximation_function: NONE
301
+ input_format: SAME
302
+ instance: Dropout
303
+ output_format: SAME
304
+ transformer.h.4.attn.c_attn:
305
+ approximation_function: NONE
306
+ bias_format: SAME
307
+ input_format: SAME
308
+ instance: HFTransformersConv1D
309
+ output_format: SAME
310
+ weight_format: SAME
311
+ weight_sparseness: DENSE
312
+ transformer.h.4.attn.c_proj:
313
+ approximation_function: NONE
314
+ bias_format: SAME
315
+ input_format: SAME
316
+ instance: HFTransformersConv1D
317
+ output_format: SAME
318
+ weight_format: SAME
319
+ weight_sparseness: DENSE
320
+ transformer.h.4.attn.resid_dropout:
321
+ approximation_function: NONE
322
+ input_format: SAME
323
+ instance: Dropout
324
+ output_format: SAME
325
+ transformer.h.4.attn.softmax:
326
+ approximation_function: NONE
327
+ input_format: SAME
328
+ instance: Softmax
329
+ output_format: SAME
330
+ transformer.h.4.ln_1:
331
+ approximation_function: NONE
332
+ bias_format: SAME
333
+ input_format: SAME
334
+ instance: LayerNorm
335
+ output_format: SAME
336
+ weight_format: SAME
337
+ transformer.h.4.ln_2:
338
+ approximation_function: NONE
339
+ bias_format: SAME
340
+ input_format: SAME
341
+ instance: LayerNorm
342
+ output_format: SAME
343
+ weight_format: SAME
344
+ transformer.h.4.mlp.act:
345
+ approximation_function: NONE
346
+ input_format: SAME
347
+ instance: GELU
348
+ output_format: SAME
349
+ transformer.h.4.mlp.c_fc:
350
+ approximation_function: NONE
351
+ bias_format: SAME
352
+ input_format: SAME
353
+ instance: HFTransformersConv1D
354
+ output_format: SAME
355
+ weight_format: SAME
356
+ weight_sparseness: DENSE
357
+ transformer.h.4.mlp.c_proj:
358
+ approximation_function: NONE
359
+ bias_format: SAME
360
+ input_format: SAME
361
+ instance: HFTransformersConv1D
362
+ output_format: SAME
363
+ weight_format: SAME
364
+ weight_sparseness: DENSE
365
+ transformer.h.4.mlp.dropout:
366
+ approximation_function: NONE
367
+ input_format: SAME
368
+ instance: Dropout
369
+ output_format: SAME
370
+ transformer.h.5.attn.attn_dropout:
371
+ approximation_function: NONE
372
+ input_format: SAME
373
+ instance: Dropout
374
+ output_format: SAME
375
+ transformer.h.5.attn.c_attn:
376
+ approximation_function: NONE
377
+ bias_format: SAME
378
+ input_format: SAME
379
+ instance: HFTransformersConv1D
380
+ output_format: SAME
381
+ weight_format: SAME
382
+ weight_sparseness: DENSE
383
+ transformer.h.5.attn.c_proj:
384
+ approximation_function: NONE
385
+ bias_format: SAME
386
+ input_format: SAME
387
+ instance: HFTransformersConv1D
388
+ output_format: SAME
389
+ weight_format: SAME
390
+ weight_sparseness: DENSE
391
+ transformer.h.5.attn.resid_dropout:
392
+ approximation_function: NONE
393
+ input_format: SAME
394
+ instance: Dropout
395
+ output_format: SAME
396
+ transformer.h.5.attn.softmax:
397
+ approximation_function: NONE
398
+ input_format: SAME
399
+ instance: Softmax
400
+ output_format: SAME
401
+ transformer.h.5.ln_1:
402
+ approximation_function: NONE
403
+ bias_format: SAME
404
+ input_format: SAME
405
+ instance: LayerNorm
406
+ output_format: SAME
407
+ weight_format: SAME
408
+ transformer.h.5.ln_2:
409
+ approximation_function: NONE
410
+ bias_format: SAME
411
+ input_format: SAME
412
+ instance: LayerNorm
413
+ output_format: SAME
414
+ weight_format: SAME
415
+ transformer.h.5.mlp.act:
416
+ approximation_function: NONE
417
+ input_format: SAME
418
+ instance: GELU
419
+ output_format: SAME
420
+ transformer.h.5.mlp.c_fc:
421
+ approximation_function: NONE
422
+ bias_format: SAME
423
+ input_format: SAME
424
+ instance: HFTransformersConv1D
425
+ output_format: SAME
426
+ weight_format: SAME
427
+ weight_sparseness: DENSE
428
+ transformer.h.5.mlp.c_proj:
429
+ approximation_function: NONE
430
+ bias_format: SAME
431
+ input_format: SAME
432
+ instance: HFTransformersConv1D
433
+ output_format: SAME
434
+ weight_format: SAME
435
+ weight_sparseness: DENSE
436
+ transformer.h.5.mlp.dropout:
437
+ approximation_function: NONE
438
+ input_format: SAME
439
+ instance: Dropout
440
+ output_format: SAME
441
+ transformer.ln_f:
442
+ approximation_function: NONE
443
+ bias_format: SAME
444
+ input_format: SAME
445
+ instance: LayerNorm
446
+ output_format: SAME
447
+ weight_format: SAME
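
Note on BASELINE.yaml: the file is regular enough that its shape can be regenerated from a short description of the module tree. The sketch below is only an illustration of that structure, not the tool that produced the file. The helper names (`entry`, `baseline_modules`, `render`) are hypothetical, and the two-space nesting under `model:` is an assumption about the original indentation.

```python
# Hypothetical sketch: rebuild a BASELINE-style config for a 6-block GPT-2 layout.
# Helper names and the nesting depth are assumptions, not taken from the commit.

def entry(instance, weight=False, bias=False, sparse=False, accum=False):
    """Return the per-module fields, with every format left at SAME / NONE."""
    fields = {
        "approximation_function": "NONE",
        "input_format": "SAME",
        "instance": instance,
        "output_format": "SAME",
    }
    if accum:
        fields["accum_format"] = "SAME"
    if bias:
        fields["bias_format"] = "SAME"
    if weight:
        fields["weight_format"] = "SAME"
    if sparse:
        fields["weight_sparseness"] = "DENSE"
    # The file lists keys alphabetically, so sort to match.
    return dict(sorted(fields.items()))


def baseline_modules(n_layers=6):
    """Map each module name seen in the file to its fields."""
    conv = dict(weight=True, bias=True, sparse=True)
    norm = dict(weight=True, bias=True)
    mods = {
        "lm_head": entry("Linear", weight=True, sparse=True, accum=True),
        "transformer.drop": entry("Dropout"),
    }
    for i in range(n_layers):
        p = f"transformer.h.{i}"
        mods[f"{p}.attn.attn_dropout"] = entry("Dropout")
        mods[f"{p}.attn.c_attn"] = entry("HFTransformersConv1D", **conv)
        mods[f"{p}.attn.c_proj"] = entry("HFTransformersConv1D", **conv)
        mods[f"{p}.attn.resid_dropout"] = entry("Dropout")
        mods[f"{p}.attn.softmax"] = entry("Softmax")
        mods[f"{p}.ln_1"] = entry("LayerNorm", **norm)
        mods[f"{p}.ln_2"] = entry("LayerNorm", **norm)
        mods[f"{p}.mlp.act"] = entry("GELU")
        mods[f"{p}.mlp.c_fc"] = entry("HFTransformersConv1D", **conv)
        mods[f"{p}.mlp.c_proj"] = entry("HFTransformersConv1D", **conv)
        mods[f"{p}.mlp.dropout"] = entry("Dropout")
    mods["transformer.ln_f"] = entry("LayerNorm", **norm)
    return mods


def render(mods):
    """Render the mapping as YAML-like text with `model:` as the root key."""
    lines = ["model:"]
    for name, fields in mods.items():
        lines.append(f"  {name}:")
        lines.extend(f"    {key}: {value}" for key, value in fields.items())
    return "\n".join(lines)


modules = baseline_modules()
text = render(modules)
print(len(modules), "modules,", len(text.splitlines()), "lines")
```

The rendered line count can be compared against the `+1,447` in the hunk header as a quick sanity check on the layout.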
FALLBACK.yaml ADDED
@@ -0,0 +1,447 @@
1
+ model:
2
+ lm_head:
3
+ accum_format: SAME
4
+ approximation_function: NONE
5
+ input_format: SAME
6
+ instance: Linear
7
+ output_format: SAME
8
+ weight_format: SAME
9
+ weight_sparseness: DENSE
10
+ transformer.drop:
11
+ approximation_function: NONE
12
+ input_format: SAME
13
+ instance: Dropout
14
+ output_format: SAME
15
+ transformer.h.0.attn.attn_dropout:
16
+ approximation_function: NONE
17
+ input_format: SAME
18
+ instance: Dropout
19
+ output_format: BFP[8|8]{64,-1}(SN)
20
+ transformer.h.0.attn.c_attn:
21
+ approximation_function: NONE
22
+ bias_format: SAME
23
+ input_format: BFP[8|8]{64,-1}(SN)
24
+ instance: HFTransformersConv1D
25
+ output_format: BFP[8|8]{64,-1}(SN)
26
+ weight_format: BFP[8|8]{64,0}(SN)
27
+ weight_sparseness: DENSE
28
+ transformer.h.0.attn.c_proj:
29
+ approximation_function: NONE
30
+ bias_format: SAME
31
+ input_format: BFP[8|8]{64,-1}(SN)
32
+ instance: HFTransformersConv1D
33
+ output_format: SAME
34
+ weight_format: BFP[8|8]{64,0}(SN)
35
+ weight_sparseness: DENSE
36
+ transformer.h.0.attn.resid_dropout:
37
+ approximation_function: NONE
38
+ input_format: SAME
39
+ instance: Dropout
40
+ output_format: SAME
41
+ transformer.h.0.attn.softmax:
42
+ approximation_function: SOFTMAX(base2,float16)
43
+ input_format: SAME
44
+ instance: Softmax
45
+ output_format: SAME
46
+ transformer.h.0.ln_1:
47
+ approximation_function: LAYERNORM(fallback,4,float16)
48
+ bias_format: SAME
49
+ input_format: SAME
50
+ instance: LayerNorm
51
+ output_format: SAME
52
+ weight_format: SAME
53
+ transformer.h.0.ln_2:
54
+ approximation_function: LAYERNORM(fallback,4,float16)
55
+ bias_format: SAME
56
+ input_format: SAME
57
+ instance: LayerNorm
58
+ output_format: SAME
59
+ weight_format: SAME
60
+ transformer.h.0.mlp.act:
61
+ approximation_function: GELU(vsimd)
62
+ input_format: SAME
63
+ instance: GELU
64
+ output_format: SAME
65
+ transformer.h.0.mlp.c_fc:
66
+ approximation_function: NONE
67
+ bias_format: SAME
68
+ input_format: BFP[8|8]{64,-1}(SN)
69
+ instance: HFTransformersConv1D
70
+ output_format: SAME
71
+ weight_format: BFP[8|8]{64,0}(SN)
72
+ weight_sparseness: DENSE
73
+ transformer.h.0.mlp.c_proj:
74
+ approximation_function: NONE
75
+ bias_format: SAME
76
+ input_format: BFP[8|8]{64,-1}(SN)
77
+ instance: HFTransformersConv1D
78
+ output_format: SAME
79
+ weight_format: BFP[8|8]{64,0}(SN)
80
+ weight_sparseness: DENSE
81
+ transformer.h.0.mlp.dropout:
82
+ approximation_function: NONE
83
+ input_format: SAME
84
+ instance: Dropout
85
+ output_format: SAME
86
+ transformer.h.1.attn.attn_dropout:
87
+ approximation_function: NONE
88
+ input_format: SAME
89
+ instance: Dropout
90
+ output_format: BFP[8|8]{64,-1}(SN)
91
+ transformer.h.1.attn.c_attn:
92
+ approximation_function: NONE
93
+ bias_format: SAME
94
+ input_format: BFP[8|8]{64,-1}(SN)
95
+ instance: HFTransformersConv1D
96
+ output_format: BFP[8|8]{64,-1}(SN)
97
+ weight_format: BFP[8|8]{64,0}(SN)
98
+ weight_sparseness: DENSE
99
+ transformer.h.1.attn.c_proj:
100
+ approximation_function: NONE
101
+ bias_format: SAME
102
+ input_format: BFP[8|8]{64,-1}(SN)
103
+ instance: HFTransformersConv1D
104
+ output_format: SAME
105
+ weight_format: BFP[8|8]{64,0}(SN)
106
+ weight_sparseness: DENSE
107
+ transformer.h.1.attn.resid_dropout:
108
+ approximation_function: NONE
109
+ input_format: SAME
110
+ instance: Dropout
111
+ output_format: SAME
112
+ transformer.h.1.attn.softmax:
113
+ approximation_function: SOFTMAX(base2,float16)
114
+ input_format: SAME
115
+ instance: Softmax
116
+ output_format: SAME
117
+ transformer.h.1.ln_1:
118
+ approximation_function: LAYERNORM(fallback,4,float16)
119
+ bias_format: SAME
120
+ input_format: SAME
121
+ instance: LayerNorm
122
+ output_format: SAME
123
+ weight_format: SAME
124
+ transformer.h.1.ln_2:
125
+ approximation_function: LAYERNORM(fallback,4,float16)
126
+ bias_format: SAME
127
+ input_format: SAME
128
+ instance: LayerNorm
129
+ output_format: SAME
130
+ weight_format: SAME
131
+ transformer.h.1.mlp.act:
132
+ approximation_function: GELU(vsimd)
133
+ input_format: SAME
134
+ instance: GELU
135
+ output_format: SAME
136
+ transformer.h.1.mlp.c_fc:
137
+ approximation_function: NONE
138
+ bias_format: SAME
139
+ input_format: BFP[8|8]{64,-1}(SN)
140
+ instance: HFTransformersConv1D
141
+ output_format: SAME
142
+ weight_format: BFP[8|8]{64,0}(SN)
143
+ weight_sparseness: DENSE
144
+ transformer.h.1.mlp.c_proj:
145
+ approximation_function: NONE
146
+ bias_format: SAME
147
+ input_format: BFP[8|8]{64,-1}(SN)
148
+ instance: HFTransformersConv1D
149
+ output_format: SAME
150
+ weight_format: BFP[8|8]{64,0}(SN)
151
+ weight_sparseness: DENSE
152
+ transformer.h.1.mlp.dropout:
153
+ approximation_function: NONE
154
+ input_format: SAME
155
+ instance: Dropout
156
+ output_format: SAME
157
+ transformer.h.2.attn.attn_dropout:
158
+ approximation_function: NONE
159
+ input_format: SAME
160
+ instance: Dropout
161
+ output_format: BFP[8|8]{64,-1}(SN)
162
+ transformer.h.2.attn.c_attn:
163
+ approximation_function: NONE
164
+ bias_format: SAME
165
+ input_format: BFP[8|8]{64,-1}(SN)
166
+ instance: HFTransformersConv1D
167
+ output_format: BFP[8|8]{64,-1}(SN)
168
+ weight_format: BFP[8|8]{64,0}(SN)
169
+ weight_sparseness: DENSE
170
+ transformer.h.2.attn.c_proj:
171
+ approximation_function: NONE
172
+ bias_format: SAME
173
+ input_format: BFP[8|8]{64,-1}(SN)
174
+ instance: HFTransformersConv1D
175
+ output_format: SAME
176
+ weight_format: BFP[8|8]{64,0}(SN)
177
+ weight_sparseness: DENSE
178
+ transformer.h.2.attn.resid_dropout:
179
+ approximation_function: NONE
180
+ input_format: SAME
181
+ instance: Dropout
182
+ output_format: SAME
183
+ transformer.h.2.attn.softmax:
184
+ approximation_function: SOFTMAX(base2,float16)
185
+ input_format: SAME
186
+ instance: Softmax
187
+ output_format: SAME
188
+ transformer.h.2.ln_1:
189
+ approximation_function: LAYERNORM(fallback,4,float16)
190
+ bias_format: SAME
191
+ input_format: SAME
192
+ instance: LayerNorm
193
+ output_format: SAME
194
+ weight_format: SAME
195
+ transformer.h.2.ln_2:
196
+ approximation_function: LAYERNORM(fallback,4,float16)
197
+ bias_format: SAME
198
+ input_format: SAME
199
+ instance: LayerNorm
200
+ output_format: SAME
201
+ weight_format: SAME
202
+ transformer.h.2.mlp.act:
203
+ approximation_function: GELU(vsimd)
204
+ input_format: SAME
205
+ instance: GELU
206
+ output_format: SAME
207
+ transformer.h.2.mlp.c_fc:
208
+ approximation_function: NONE
209
+ bias_format: SAME
210
+ input_format: BFP[8|8]{64,-1}(SN)
211
+ instance: HFTransformersConv1D
212
+ output_format: SAME
213
+ weight_format: BFP[8|8]{64,0}(SN)
214
+ weight_sparseness: DENSE
215
+ transformer.h.2.mlp.c_proj:
216
+ approximation_function: NONE
217
+ bias_format: SAME
218
+ input_format: BFP[8|8]{64,-1}(SN)
219
+ instance: HFTransformersConv1D
220
+ output_format: SAME
221
+ weight_format: BFP[8|8]{64,0}(SN)
222
+ weight_sparseness: DENSE
223
+ transformer.h.2.mlp.dropout:
224
+ approximation_function: NONE
225
+ input_format: SAME
226
+ instance: Dropout
227
+ output_format: SAME
228
+ transformer.h.3.attn.attn_dropout:
229
+ approximation_function: NONE
230
+ input_format: SAME
231
+ instance: Dropout
232
+ output_format: BFP[8|8]{64,-1}(SN)
233
+ transformer.h.3.attn.c_attn:
234
+ approximation_function: NONE
235
+ bias_format: SAME
236
+ input_format: BFP[8|8]{64,-1}(SN)
237
+ instance: HFTransformersConv1D
238
+ output_format: BFP[8|8]{64,-1}(SN)
239
+ weight_format: BFP[8|8]{64,0}(SN)
240
+ weight_sparseness: DENSE
241
+ transformer.h.3.attn.c_proj:
242
+ approximation_function: NONE
243
+ bias_format: SAME
244
+ input_format: BFP[8|8]{64,-1}(SN)
245
+ instance: HFTransformersConv1D
246
+ output_format: SAME
247
+ weight_format: BFP[8|8]{64,0}(SN)
248
+ weight_sparseness: DENSE
249
+ transformer.h.3.attn.resid_dropout:
250
+ approximation_function: NONE
251
+ input_format: SAME
252
+ instance: Dropout
253
+ output_format: SAME
254
+ transformer.h.3.attn.softmax:
255
+ approximation_function: SOFTMAX(base2,float16)
256
+ input_format: SAME
257
+ instance: Softmax
258
+ output_format: SAME
259
+ transformer.h.3.ln_1:
260
+ approximation_function: LAYERNORM(fallback,4,float16)
261
+ bias_format: SAME
262
+ input_format: SAME
263
+ instance: LayerNorm
264
+ output_format: SAME
265
+ weight_format: SAME
266
+ transformer.h.3.ln_2:
267
+ approximation_function: LAYERNORM(fallback,4,float16)
268
+ bias_format: SAME
269
+ input_format: SAME
270
+ instance: LayerNorm
271
+ output_format: SAME
272
+ weight_format: SAME
273
+ transformer.h.3.mlp.act:
274
+ approximation_function: GELU(vsimd)
275
+ input_format: SAME
276
+ instance: GELU
277
+ output_format: SAME
278
+ transformer.h.3.mlp.c_fc:
279
+ approximation_function: NONE
280
+ bias_format: SAME
281
+ input_format: BFP[8|8]{64,-1}(SN)
282
+ instance: HFTransformersConv1D
283
+ output_format: SAME
284
+ weight_format: BFP[8|8]{64,0}(SN)
285
+ weight_sparseness: DENSE
286
+ transformer.h.3.mlp.c_proj:
287
+ approximation_function: NONE
288
+ bias_format: SAME
289
+ input_format: BFP[8|8]{64,-1}(SN)
290
+ instance: HFTransformersConv1D
291
+ output_format: SAME
292
+ weight_format: BFP[8|8]{64,0}(SN)
293
+ weight_sparseness: DENSE
294
+ transformer.h.3.mlp.dropout:
295
+ approximation_function: NONE
296
+ input_format: SAME
297
+ instance: Dropout
298
+ output_format: SAME
299
+ transformer.h.4.attn.attn_dropout:
300
+ approximation_function: NONE
301
+ input_format: SAME
302
+ instance: Dropout
303
+ output_format: BFP[8|8]{64,-1}(SN)
304
+ transformer.h.4.attn.c_attn:
305
+ approximation_function: NONE
306
+ bias_format: SAME
307
+ input_format: BFP[8|8]{64,-1}(SN)
308
+ instance: HFTransformersConv1D
309
+ output_format: BFP[8|8]{64,-1}(SN)
310
+ weight_format: BFP[8|8]{64,0}(SN)
311
+ weight_sparseness: DENSE
312
+ transformer.h.4.attn.c_proj:
313
+ approximation_function: NONE
314
+ bias_format: SAME
315
+ input_format: BFP[8|8]{64,-1}(SN)
316
+ instance: HFTransformersConv1D
317
+ output_format: SAME
318
+ weight_format: BFP[8|8]{64,0}(SN)
319
+ weight_sparseness: DENSE
320
+ transformer.h.4.attn.resid_dropout:
321
+ approximation_function: NONE
322
+ input_format: SAME
323
+ instance: Dropout
324
+ output_format: SAME
325
+ transformer.h.4.attn.softmax:
326
+ approximation_function: SOFTMAX(base2,float16)
327
+ input_format: SAME
328
+ instance: Softmax
329
+ output_format: SAME
330
+ transformer.h.4.ln_1:
331
+ approximation_function: LAYERNORM(fallback,4,float16)
332
+ bias_format: SAME
333
+ input_format: SAME
334
+ instance: LayerNorm
335
+ output_format: SAME
336
+ weight_format: SAME
337
+ transformer.h.4.ln_2:
338
+ approximation_function: LAYERNORM(fallback,4,float16)
339
+ bias_format: SAME
340
+ input_format: SAME
341
+ instance: LayerNorm
342
+ output_format: SAME
343
+ weight_format: SAME
344
+ transformer.h.4.mlp.act:
345
+ approximation_function: GELU(vsimd)
346
+ input_format: SAME
347
+ instance: GELU
348
+ output_format: SAME
349
+ transformer.h.4.mlp.c_fc:
350
+ approximation_function: NONE
351
+ bias_format: SAME
352
+ input_format: BFP[8|8]{64,-1}(SN)
353
+ instance: HFTransformersConv1D
354
+ output_format: SAME
355
+ weight_format: BFP[8|8]{64,0}(SN)
356
+ weight_sparseness: DENSE
357
+ transformer.h.4.mlp.c_proj:
358
+ approximation_function: NONE
359
+ bias_format: SAME
360
+ input_format: BFP[8|8]{64,-1}(SN)
361
+ instance: HFTransformersConv1D
362
+ output_format: SAME
363
+ weight_format: BFP[8|8]{64,0}(SN)
364
+ weight_sparseness: DENSE
365
+ transformer.h.4.mlp.dropout:
366
+ approximation_function: NONE
367
+ input_format: SAME
368
+ instance: Dropout
369
+ output_format: SAME
370
+ transformer.h.5.attn.attn_dropout:
371
+ approximation_function: NONE
372
+ input_format: SAME
373
+ instance: Dropout
374
+ output_format: BFP[8|8]{64,-1}(SN)
375
+ transformer.h.5.attn.c_attn:
376
+ approximation_function: NONE
377
+ bias_format: SAME
378
+ input_format: BFP[8|8]{64,-1}(SN)
379
+ instance: HFTransformersConv1D
380
+ output_format: BFP[8|8]{64,-1}(SN)
381
+ weight_format: BFP[8|8]{64,0}(SN)
382
+ weight_sparseness: DENSE
383
+ transformer.h.5.attn.c_proj:
384
+ approximation_function: NONE
385
+ bias_format: SAME
386
+ input_format: BFP[8|8]{64,-1}(SN)
387
+ instance: HFTransformersConv1D
388
+ output_format: SAME
389
+ weight_format: BFP[8|8]{64,0}(SN)
390
+ weight_sparseness: DENSE
391
+ transformer.h.5.attn.resid_dropout:
392
+ approximation_function: NONE
393
+ input_format: SAME
394
+ instance: Dropout
395
+ output_format: SAME
396
+ transformer.h.5.attn.softmax:
397
+ approximation_function: SOFTMAX(base2,float16)
398
+ input_format: SAME
399
+ instance: Softmax
400
+ output_format: SAME
401
+ transformer.h.5.ln_1:
402
+ approximation_function: LAYERNORM(fallback,4,float16)
403
+ bias_format: SAME
404
+ input_format: SAME
405
+ instance: LayerNorm
406
+ output_format: SAME
407
+ weight_format: SAME
408
+ transformer.h.5.ln_2:
409
+ approximation_function: LAYERNORM(fallback,4,float16)
410
+ bias_format: SAME
411
+ input_format: SAME
412
+ instance: LayerNorm
413
+ output_format: SAME
414
+ weight_format: SAME
415
+ transformer.h.5.mlp.act:
416
+ approximation_function: GELU(vsimd)
417
+ input_format: SAME
418
+ instance: GELU
419
+ output_format: SAME
420
+ transformer.h.5.mlp.c_fc:
421
+ approximation_function: NONE
422
+ bias_format: SAME
423
+ input_format: BFP[8|8]{64,-1}(SN)
424
+ instance: HFTransformersConv1D
425
+ output_format: SAME
426
+ weight_format: BFP[8|8]{64,0}(SN)
427
+ weight_sparseness: DENSE
428
+ transformer.h.5.mlp.c_proj:
429
+ approximation_function: NONE
430
+ bias_format: SAME
431
+ input_format: BFP[8|8]{64,-1}(SN)
432
+ instance: HFTransformersConv1D
433
+ output_format: SAME
434
+ weight_format: BFP[8|8]{64,0}(SN)
435
+ weight_sparseness: DENSE
436
+ transformer.h.5.mlp.dropout:
437
+ approximation_function: NONE
438
+ input_format: SAME
439
+ instance: Dropout
440
+ output_format: SAME
441
+ transformer.ln_f:
442
+ approximation_function: LAYERNORM(fallback,4,float16)
443
+ bias_format: SAME
444
+ input_format: SAME
445
+ instance: LayerNorm
446
+ output_format: SAME
447
+ weight_format: SAME
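
Note on FALLBACK.yaml: unlike BASELINE.yaml, this file replaces `SAME` with strings such as `BFP[8|8]{64,-1}(SN)` on the `HFTransformersConv1D` inputs and weights. The sketch below only splits such a string into its parts with a regular expression. The field names (`precision`, `exponent_bits`, `block_size`, `block_dim`, `mode`) are my reading of the notation and are not confirmed anywhere in this commit, so treat them as assumptions.

```python
import re
from typing import NamedTuple, Optional


class BFPFormat(NamedTuple):
    # Field names are assumed interpretations of the notation, not an official schema.
    precision: int      # first number inside [...]
    exponent_bits: int  # second number inside [...]
    block_size: int     # first number inside {...}
    block_dim: int      # second number inside {...}, may be negative
    mode: str           # text inside (...), e.g. "SN"


_BFP_PATTERN = re.compile(r"BFP\[(\d+)\|(\d+)\]\{(\d+),(-?\d+)\}\((\w+)\)")


def parse_format(spec: str) -> Optional[BFPFormat]:
    """Parse a format string from the config; return None for the SAME placeholder."""
    if spec == "SAME":
        return None
    match = _BFP_PATTERN.fullmatch(spec)
    if match is None:
        raise ValueError(f"unrecognised format string: {spec!r}")
    precision, exponent_bits, block_size, block_dim, mode = match.groups()
    return BFPFormat(int(precision), int(exponent_bits),
                     int(block_size), int(block_dim), mode)


activation = parse_format("BFP[8|8]{64,-1}(SN)")
weight = parse_format("BFP[8|8]{64,0}(SN)")
print(activation)
print(weight)
```

In this file the activation and weight strings differ only in the second number inside the braces (`-1` versus `0`).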
config.json ADDED
@@ -0,0 +1 @@
+ {"model_type": "gpt", "architectures": ["GPT2LMHeadModel"]}
merges.txt ADDED
The diff for this file is too large to render. See raw diff
 
pytorch_model.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:6defede097d338ec69a958c71b91bc74eedcc10368cd42d84da8638c73833892
+ size 334205321
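
Note on pytorch_model.bin: what the commit adds here is a Git LFS pointer, a three-line text stub that stands in for the real weights. A minimal sketch of reading such a pointer follows; the function name `parse_lfs_pointer` is hypothetical, and the pointer text is copied from the diff above.

```python
# Pointer text as added by this commit (the leading "+" diff markers removed).
POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:6defede097d338ec69a958c71b91bc74eedcc10368cd42d84da8638c73833892
size 334205321
"""


def parse_lfs_pointer(text):
    """Split a Git LFS pointer into its key/value fields (hypothetical helper)."""
    fields = dict(line.split(" ", 1) for line in text.splitlines() if line)
    algorithm, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algorithm": algorithm,
        "digest": digest,
        "size_bytes": int(fields["size"]),
    }


info = parse_lfs_pointer(POINTER)
print(info["algorithm"], info["digest"][:12], f'{info["size_bytes"] / 2**20:.1f} MiB')
```

The `size` field is the byte count of the real file, so it can be checked against a download before the SHA-256 digest is verified.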
special_tokens_map.json ADDED
@@ -0,0 +1,5 @@
+ {
+   "bos_token": "<|endoftext|>",
+   "eos_token": "<|endoftext|>",
+   "unk_token": "<|endoftext|>"
+ }
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1,9 @@
+ {
+   "add_prefix_space": false,
+   "bos_token": "<|endoftext|>",
+   "clean_up_tokenization_spaces": true,
+   "eos_token": "<|endoftext|>",
+   "model_max_length": 1024,
+   "tokenizer_class": "GPT2Tokenizer",
+   "unk_token": "<|endoftext|>"
+ }
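
Note on the tokenizer files: special_tokens_map.json and tokenizer_config.json both declare the same three special tokens, so a small consistency check is cheap to run. The sketch below uses only the standard `json` module with the file contents inlined from the diff above; the helper name `special_token_mismatches` is hypothetical.

```python
import json

# Contents copied from the two files added in this commit.
SPECIAL_TOKENS_MAP = json.loads("""
{
  "bos_token": "<|endoftext|>",
  "eos_token": "<|endoftext|>",
  "unk_token": "<|endoftext|>"
}
""")

TOKENIZER_CONFIG = json.loads("""
{
  "add_prefix_space": false,
  "bos_token": "<|endoftext|>",
  "clean_up_tokenization_spaces": true,
  "eos_token": "<|endoftext|>",
  "model_max_length": 1024,
  "tokenizer_class": "GPT2Tokenizer",
  "unk_token": "<|endoftext|>"
}
""")


def special_token_mismatches(special_map, tokenizer_config):
    """Return {key: (map_value, config_value)} for tokens that disagree."""
    return {
        key: (value, tokenizer_config.get(key))
        for key, value in special_map.items()
        if tokenizer_config.get(key) != value
    }


mismatches = special_token_mismatches(SPECIAL_TOKENS_MAP, TOKENIZER_CONFIG)
print("mismatches:", mismatches)
print("distinct special tokens:", set(SPECIAL_TOKENS_MAP.values()))
```

Both files use `<|endoftext|>` for the beginning-of-sequence, end-of-sequence, and unknown tokens, so the check reports no mismatches for this commit.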
vocab.json ADDED
The diff for this file is too large to render. See raw diff