disarmyouwitha committed on
Commit 86064b3 · 1 Parent(s): 2a92bce

Upload 2 files

koala-13B-4bit_ooba_cuda_fast.txt ADDED
@@ -0,0 +1,762 @@
1
+ python test_benchmark_inference.py -dbg -d ~/llm_models/koala-13B-GPTQ
2
+ -- Loading model
3
+ -- Tokenizer: /home/nap/llm_models/koala-13B-GPTQ/tokenizer.model
4
+ -- Model config: /home/nap/llm_models/koala-13B-GPTQ/config.json
5
+ -- Model: /home/nap/llm_models/koala-13B-GPTQ/koala-13B-4bit_ooba_cuda_fast.safetensors
6
+ -- Sequence length: 2048
7
+ -- Options: ['attention: switched', 'matmul: switched', 'mlp: switched', 'debug']
8
+ !! Available CUDA devices:
9
+ " !! - cuda:0: NVIDIA GeForce RTX 4090
10
+ " !! - cuda:1: NVIDIA RTX A6000
11
+ !! Loading safetensors file: /home/nap/llm_models/koala-13B-GPTQ/koala-13B-4bit_ooba_cuda_fast.safetensors
12
+ !! Begin load tensors
13
+ !! - lm_head.weight read: device: cpu, shape: [32000, 5120], dtype: float16
14
+ !! - lm_head.weight map: device: cuda:0, shape: [32000, 5120], dtype: float16, min: -0.316406, max: 0.361328, std: 0.020935
15
+ !! - model.embed_tokens.weight read: device: cpu, shape: [32000, 5120], dtype: float16
16
+ !! - model.embed_tokens.weight map: device: cpu, shape: [32000, 5120], dtype: float16
17
+ !! - model.layers.0.input_layernorm.weight read: device: cpu, shape: [5120], dtype: float16
18
+ !! - model.layers.0.input_layernorm.weight map: device: cuda:0, shape: [5120], dtype: float16, min: -0.002060, max: 0.742188, std: 0.045593
19
+ !! - model.layers.0.mlp.down_proj.qweight read: device: cpu, shape: [1728, 5120], dtype: int32
20
+ !! - model.layers.0.mlp.down_proj.qweight map: device: cuda:0, shape: [1728, 5120], dtype: int32, min: -2147416079, max: 2147375608
21
+ !! - model.layers.0.mlp.down_proj.qzeros read: device: cpu, shape: [108, 640], dtype: int32
22
+ !! - model.layers.0.mlp.down_proj.qzeros map: device: cuda:0, shape: [108, 640], dtype: int32, min: -2106165417, max: 2089191031
23
+ !! - model.layers.0.mlp.down_proj.scales read: device: cpu, shape: [108, 5120], dtype: float32
24
+ !! - model.layers.0.mlp.down_proj.scales map: device: cuda:0, shape: [108, 5120], dtype: float16, min: 0.003326, max: 0.099487, std: 0.001260
25
+ !! - model.layers.0.mlp.gate_proj.qweight read: device: cpu, shape: [640, 13824], dtype: int32
26
+ !! - model.layers.0.mlp.gate_proj.qweight map: device: cuda:0, shape: [640, 13824], dtype: int32, min: -2147459474, max: 2147466163
27
+ !! - model.layers.0.mlp.gate_proj.qzeros read: device: cpu, shape: [40, 1728], dtype: int32
28
+ !! - model.layers.0.mlp.gate_proj.qzeros map: device: cuda:0, shape: [40, 1728], dtype: int32, min: -2125109368, max: 2089248375
29
+ !! - model.layers.0.mlp.gate_proj.scales read: device: cpu, shape: [40, 13824], dtype: float32
30
+ !! - model.layers.0.mlp.gate_proj.scales map: device: cuda:0, shape: [40, 13824], dtype: float16, min: 0.002777, max: 0.060303, std: 0.000990
31
+ !! - model.layers.0.mlp.up_proj.qweight read: device: cpu, shape: [640, 13824], dtype: int32
32
+ !! - model.layers.0.mlp.up_proj.qweight map: device: cuda:0, shape: [640, 13824], dtype: int32, min: -2147474830, max: 2147437148
33
+ !! - model.layers.0.mlp.up_proj.qzeros read: device: cpu, shape: [40, 1728], dtype: int32
34
+ !! - model.layers.0.mlp.up_proj.qzeros map: device: cuda:0, shape: [40, 1728], dtype: int32, min: -2107213722, max: 2089121671
35
+ !! - model.layers.0.mlp.up_proj.scales read: device: cpu, shape: [40, 13824], dtype: float32
36
+ !! - model.layers.0.mlp.up_proj.scales map: device: cuda:0, shape: [40, 13824], dtype: float16, min: 0.002075, max: 0.040131, std: 0.000730
37
+ !! - model.layers.0.post_attention_layernorm.weight read: device: cpu, shape: [5120], dtype: float16
38
+ !! - model.layers.0.post_attention_layernorm.weight map: device: cuda:0, shape: [5120], dtype: float16, min: -0.035889, max: 0.361328, std: 0.016113
39
+ !! - model.layers.0.self_attn.k_proj.qweight read: device: cpu, shape: [640, 5120], dtype: int32
40
+ !! - model.layers.0.self_attn.k_proj.qweight map: device: cuda:0, shape: [640, 5120], dtype: int32, min: -2147305928, max: 2147337675
41
+ !! - model.layers.0.self_attn.k_proj.qzeros read: device: cpu, shape: [40, 640], dtype: int32
42
+ !! - model.layers.0.self_attn.k_proj.qzeros map: device: cuda:0, shape: [40, 640], dtype: int32, min: -2128119278, max: 2092336937
43
+ !! - model.layers.0.self_attn.k_proj.scales read: device: cpu, shape: [40, 5120], dtype: float32
44
+ !! - model.layers.0.self_attn.k_proj.scales map: device: cuda:0, shape: [40, 5120], dtype: float16, min: 0.001449, max: 0.082703, std: 0.005592
45
+ !! - model.layers.0.self_attn.o_proj.qweight read: device: cpu, shape: [640, 5120], dtype: int32
46
+ !! - model.layers.0.self_attn.o_proj.qweight map: device: cuda:0, shape: [640, 5120], dtype: int32, min: -2147453144, max: 2147375548
47
+ !! - model.layers.0.self_attn.o_proj.qzeros read: device: cpu, shape: [40, 640], dtype: int32
48
+ !! - model.layers.0.self_attn.o_proj.qzeros map: device: cuda:0, shape: [40, 640], dtype: int32, min: -2107209387, max: 2071422582
49
+ !! - model.layers.0.self_attn.o_proj.scales read: device: cpu, shape: [40, 5120], dtype: float32
50
+ !! - model.layers.0.self_attn.o_proj.scales map: device: cuda:0, shape: [40, 5120], dtype: float16, min: 0.001521, max: 0.089478, std: 0.001425
51
+ !! - model.layers.0.self_attn.q_proj.qweight read: device: cpu, shape: [640, 5120], dtype: int32
52
+ !! - model.layers.0.self_attn.q_proj.qweight map: device: cuda:0, shape: [640, 5120], dtype: int32, min: -2147399309, max: 2147314245
53
+ !! - model.layers.0.self_attn.q_proj.qzeros read: device: cpu, shape: [40, 640], dtype: int32
54
+ !! - model.layers.0.self_attn.q_proj.qzeros map: device: cuda:0, shape: [40, 640], dtype: int32, min: -2128450726, max: 2092123285
55
+ !! - model.layers.0.self_attn.q_proj.scales read: device: cpu, shape: [40, 5120], dtype: float32
56
+ !! - model.layers.0.self_attn.q_proj.scales map: device: cuda:0, shape: [40, 5120], dtype: float16, min: 0.001049, max: 0.095764, std: 0.005581
57
+ !! - model.layers.0.self_attn.v_proj.qweight read: device: cpu, shape: [640, 5120], dtype: int32
58
+ !! - model.layers.0.self_attn.v_proj.qweight map: device: cuda:0, shape: [640, 5120], dtype: int32, min: -2147441095, max: 2147387755
59
+ !! - model.layers.0.self_attn.v_proj.qzeros read: device: cpu, shape: [40, 640], dtype: int32
60
+ !! - model.layers.0.self_attn.v_proj.qzeros map: device: cuda:0, shape: [40, 640], dtype: int32, min: -2091420041, max: 2071422327
61
+ !! - model.layers.0.self_attn.v_proj.scales read: device: cpu, shape: [40, 5120], dtype: float32
62
+ !! - model.layers.0.self_attn.v_proj.scales map: device: cuda:0, shape: [40, 5120], dtype: float16, min: 0.001673, max: 0.015762, std: 0.001489
63
+ !! - model.norm.weight read: device: cpu, shape: [5120], dtype: float16
64
+ !! - model.norm.weight map: device: cuda:0, shape: [5120], dtype: float16, min: 0.018066, max: 2.093750, std: 0.073120
65
+ !! Computing RoPE table for seq length: 2048
66
+ !! - stored for device: cuda:0
67
+ ** Time, Load model: 1.58 seconds
68
+ -- Groupsize (inferred): 128
69
+ -- Act-order (inferred): no
70
+ ** VRAM, Model: [cuda:0] 6,683.17 MB - [cuda:1] 0.00 MB
71
+ !! Inference, debug pass
72
+ !! Begin forward pass
73
+ !! Moving input_ids from cuda:0 to cpu
74
+ !! Built initial hidden state: device: cpu, shape: [1, 1920, 5120], dtype: float16, min: -0.110840, max: 0.124512, std: 0.018784
75
+ !! Prepared buffer for device: cuda:0
76
+ !! - attn_mask: device: cuda:0, shape: [1, 1, 1920, 1920], dtype: float16, min: -65504.000000, max: 0.000000, std: 32752.000000
77
+ !! Moving hidden_states from cpu to cuda:0
78
+ !! Begin decoder 0
79
+ !! Begin self-attention
80
+ !! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -0.110840, max: 0.124512, std: 0.018784
81
+ !! - attn_mask: device: cuda:0, shape: [1, 1, 1920, 1920], dtype: float16, min: -65504.000000, max: 0.000000, std: 32752.000000
82
+ !! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: -0.002060, max: 0.742188, std: 0.045593 eps: 0.00000100
83
+ !! - self_attn.q_proj: cuda:0 [Q] scales min/max/std: 0.001049/0.095764/0.005581
84
+ !! - self_attn.k_proj: cuda:0 [Q] scales min/max/std: 0.001449/0.082703/0.005592
85
+ !! - self_attn.v_proj: cuda:0 [Q] scales min/max/std: 0.001673/0.015762/0.001489
86
+ !! - self_attn.o_proj: cuda:0 [Q] scales min/max/std: 0.001521/0.089478/0.001425
87
+ !! - cache device: cuda:0, seq_len: 0
88
+ !! Begin MLP
89
+ !! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -1.126953, max: 1.317383, std: 0.035309
90
+ !! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: -0.035889, max: 0.361328, std: 0.016113 eps: 0.00000100
91
+ !! - mlp.gate_proj: cuda:0 [Q] scales min/max/std: 0.002777/0.060303/0.000990
92
+ !! - mlp.up_proj: cuda:0 [Q] scales min/max/std: 0.002075/0.040131/0.000730
93
+ !! - mlp.down_proj: cuda:0 [Q] scales min/max/std: 0.003326/0.099487/0.001260
94
+ !! - method: normal
95
+ !! Begin decoder 1
96
+ !! Begin self-attention
97
+ !! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -10.023438, max: 37.812500, std: 0.116089
98
+ !! - attn_mask: device: cuda:0, shape: [1, 1, 1920, 1920], dtype: float16, min: -65504.000000, max: 0.000000, std: 32752.000000
99
+ !! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.012146, max: 0.326172, std: 0.022308 eps: 0.00000100
100
+ !! - self_attn.q_proj: cuda:0 [Q] scales min/max/std: 0.001299/0.042847/0.005116
101
+ !! - self_attn.k_proj: cuda:0 [Q] scales min/max/std: 0.001262/0.056030/0.005295
102
+ !! - self_attn.v_proj: cuda:0 [Q] scales min/max/std: 0.001407/0.011436/0.001119
103
+ !! - self_attn.o_proj: cuda:0 [Q] scales min/max/std: 0.001063/0.086609/0.001472
104
+ !! - cache device: cuda:0, seq_len: 0
105
+ !! Begin MLP
106
+ !! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -9.671875, max: 35.968750, std: 0.113708
107
+ !! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.003036, max: 0.166016, std: 0.010605 eps: 0.00000100
108
+ !! - mlp.gate_proj: cuda:0 [Q] scales min/max/std: 0.003960/0.075562/0.001144
109
+ !! - mlp.up_proj: cuda:0 [Q] scales min/max/std: 0.003387/0.035187/0.000851
110
+ !! - mlp.down_proj: cuda:0 [Q] scales min/max/std: 0.002762/0.120483/0.001154
111
+ !! - method: normal
112
+ !! Begin decoder 2
113
+ !! Begin self-attention
114
+ !! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -12.234375, max: 33.375000, std: 0.152710
115
+ !! - attn_mask: device: cuda:0, shape: [1, 1, 1920, 1920], dtype: float16, min: -65504.000000, max: 0.000000, std: 32752.000000
116
+ !! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.057617, max: 0.369141, std: 0.015396 eps: 0.00000100
117
+ !! - self_attn.q_proj: cuda:0 [Q] scales min/max/std: 0.002361/0.074585/0.003971
118
+ !! - self_attn.k_proj: cuda:0 [Q] scales min/max/std: 0.001963/0.050629/0.004532
119
+ !! - self_attn.v_proj: cuda:0 [Q] scales min/max/std: 0.002445/0.020309/0.000759
120
+ !! - self_attn.o_proj: cuda:0 [Q] scales min/max/std: 0.002083/0.110596/0.001124
121
+ !! - cache device: cuda:0, seq_len: 0
122
+ !! Begin MLP
123
+ !! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -12.398438, max: 29.312500, std: 0.156738
124
+ !! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.014099, max: 0.161133, std: 0.011726 eps: 0.00000100
125
+ !! - mlp.gate_proj: cuda:0 [Q] scales min/max/std: 0.002787/0.087097/0.001152
126
+ !! - mlp.up_proj: cuda:0 [Q] scales min/max/std: 0.003202/0.043213/0.000878
127
+ !! - mlp.down_proj: cuda:0 [Q] scales min/max/std: 0.002434/0.133301/0.001044
128
+ !! - method: normal
129
+ !! Begin decoder 3
130
+ !! Begin self-attention
131
+ !! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -933.000000, max: 26.562500, std: 0.348877
132
+ !! - attn_mask: device: cuda:0, shape: [1, 1, 1920, 1920], dtype: float16, min: -65504.000000, max: 0.000000, std: 32752.000000
133
+ !! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.033203, max: 0.445312, std: 0.016769 eps: 0.00000100
134
+ !! - self_attn.q_proj: cuda:0 [Q] scales min/max/std: 0.002218/0.064087/0.003193
135
+ !! - self_attn.k_proj: cuda:0 [Q] scales min/max/std: 0.001682/0.047546/0.003334
136
+ !! - self_attn.v_proj: cuda:0 [Q] scales min/max/std: 0.002258/0.013161/0.000889
137
+ !! - self_attn.o_proj: cuda:0 [Q] scales min/max/std: 0.001929/0.086182/0.001017
138
+ !! - cache device: cuda:0, seq_len: 0
139
+ !! Begin MLP
140
+ !! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -933.000000, max: 27.828125, std: 0.353027
141
+ !! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.020508, max: 0.185547, std: 0.012711 eps: 0.00000100
142
+ !! - mlp.gate_proj: cuda:0 [Q] scales min/max/std: 0.002598/0.055603/0.001158
143
+ !! - mlp.up_proj: cuda:0 [Q] scales min/max/std: 0.002819/0.043365/0.000893
144
+ !! - mlp.down_proj: cuda:0 [Q] scales min/max/std: 0.002668/0.083008/0.000952
145
+ !! - method: normal
146
+ !! Begin decoder 4
147
+ !! Begin self-attention
148
+ !! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -935.000000, max: 26.937500, std: 0.376465
149
+ !! - attn_mask: device: cuda:0, shape: [1, 1, 1920, 1920], dtype: float16, min: -65504.000000, max: 0.000000, std: 32752.000000
150
+ !! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.036621, max: 0.458984, std: 0.017136 eps: 0.00000100
151
+ !! - self_attn.q_proj: cuda:0 [Q] scales min/max/std: 0.002357/0.124084/0.003180
152
+ !! - self_attn.k_proj: cuda:0 [Q] scales min/max/std: 0.001328/0.042419/0.003229
153
+ !! - self_attn.v_proj: cuda:0 [Q] scales min/max/std: 0.002598/0.018280/0.000826
154
+ !! - self_attn.o_proj: cuda:0 [Q] scales min/max/std: 0.001725/0.085449/0.000918
155
+ !! - cache device: cuda:0, seq_len: 0
156
+ !! Begin MLP
157
+ !! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -935.000000, max: 30.609375, std: 0.397705
158
+ !! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.025391, max: 0.200195, std: 0.012398 eps: 0.00000100
159
+ !! - mlp.gate_proj: cuda:0 [Q] scales min/max/std: 0.003830/0.047241/0.001214
160
+ !! - mlp.up_proj: cuda:0 [Q] scales min/max/std: 0.003572/0.041473/0.000900
161
+ !! - mlp.down_proj: cuda:0 [Q] scales min/max/std: 0.002481/0.095337/0.000922
162
+ !! - method: normal
163
+ !! Begin decoder 5
164
+ !! Begin self-attention
165
+ !! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -935.500000, max: 28.265625, std: 0.410889
166
+ !! - attn_mask: device: cuda:0, shape: [1, 1, 1920, 1920], dtype: float16, min: -65504.000000, max: 0.000000, std: 32752.000000
167
+ !! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.033203, max: 0.492188, std: 0.019684 eps: 0.00000100
168
+ !! - self_attn.q_proj: cuda:0 [Q] scales min/max/std: 0.001987/0.102661/0.003073
169
+ !! - self_attn.k_proj: cuda:0 [Q] scales min/max/std: 0.001550/0.035492/0.003050
170
+ !! - self_attn.v_proj: cuda:0 [Q] scales min/max/std: 0.002256/0.016541/0.000906
171
+ !! - self_attn.o_proj: cuda:0 [Q] scales min/max/std: 0.002275/0.106079/0.001011
172
+ !! - cache device: cuda:0, seq_len: 0
173
+ !! Begin MLP
174
+ !! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -935.500000, max: 32.062500, std: 0.428223
175
+ !! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.042236, max: 0.211914, std: 0.011848 eps: 0.00000100
176
+ !! - mlp.gate_proj: cuda:0 [Q] scales min/max/std: 0.002773/0.047150/0.001265
177
+ !! - mlp.up_proj: cuda:0 [Q] scales min/max/std: 0.001515/0.041870/0.000920
178
+ !! - mlp.down_proj: cuda:0 [Q] scales min/max/std: 0.002594/0.062195/0.000935
179
+ !! - method: normal
180
+ !! Begin decoder 6
181
+ !! Begin self-attention
182
+ !! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -937.000000, max: 29.000000, std: 0.451904
183
+ !! - attn_mask: device: cuda:0, shape: [1, 1, 1920, 1920], dtype: float16, min: -65504.000000, max: 0.000000, std: 32752.000000
184
+ !! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.067871, max: 0.558594, std: 0.019913 eps: 0.00000100
185
+ !! - self_attn.q_proj: cuda:0 [Q] scales min/max/std: 0.002136/0.046173/0.003099
186
+ !! - self_attn.k_proj: cuda:0 [Q] scales min/max/std: 0.001863/0.033478/0.003153
187
+ !! - self_attn.v_proj: cuda:0 [Q] scales min/max/std: 0.002909/0.020889/0.000928
188
+ !! - self_attn.o_proj: cuda:0 [Q] scales min/max/std: 0.001761/0.096313/0.001001
189
+ !! - cache device: cuda:0, seq_len: 0
190
+ !! Begin MLP
191
+ !! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -937.500000, max: 27.984375, std: 0.468262
192
+ !! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.038574, max: 0.244141, std: 0.012810 eps: 0.00000100
193
+ !! - mlp.gate_proj: cuda:0 [Q] scales min/max/std: 0.003599/0.058990/0.001412
194
+ !! - mlp.up_proj: cuda:0 [Q] scales min/max/std: 0.003576/0.044037/0.000947
195
+ !! - mlp.down_proj: cuda:0 [Q] scales min/max/std: 0.002380/0.090454/0.001029
196
+ !! - method: normal
197
+ !! Begin decoder 7
198
+ !! Begin self-attention
199
+ !! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -1071.000000, max: 78.437500, std: 0.518066
200
+ !! - attn_mask: device: cuda:0, shape: [1, 1, 1920, 1920], dtype: float16, min: -65504.000000, max: 0.000000, std: 32752.000000
201
+ !! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: -0.010315, max: 0.609375, std: 0.018875 eps: 0.00000100
202
+ !! - self_attn.q_proj: cuda:0 [Q] scales min/max/std: 0.002357/0.038116/0.002750
203
+ !! - self_attn.k_proj: cuda:0 [Q] scales min/max/std: 0.002035/0.030289/0.002897
204
+ !! - self_attn.v_proj: cuda:0 [Q] scales min/max/std: 0.002699/0.013130/0.000939
205
+ !! - self_attn.o_proj: cuda:0 [Q] scales min/max/std: 0.001756/0.065430/0.000955
206
+ !! - cache device: cuda:0, seq_len: 0
207
+ !! Begin MLP
208
+ !! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -1071.000000, max: 78.625000, std: 0.557129
209
+ !! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.043701, max: 0.222656, std: 0.011360 eps: 0.00000100
210
+ !! - mlp.gate_proj: cuda:0 [Q] scales min/max/std: 0.003187/0.053528/0.001369
211
+ !! - mlp.up_proj: cuda:0 [Q] scales min/max/std: 0.003983/0.029083/0.000935
212
+ !! - mlp.down_proj: cuda:0 [Q] scales min/max/std: 0.002668/0.070984/0.000947
213
+ !! - method: normal
214
+ !! Begin decoder 8
215
+ !! Begin self-attention
216
+ !! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -1071.000000, max: 78.687500, std: 0.583008
217
+ !! - attn_mask: device: cuda:0, shape: [1, 1, 1920, 1920], dtype: float16, min: -65504.000000, max: 0.000000, std: 32752.000000
218
+ !! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.007812, max: 0.617188, std: 0.021469 eps: 0.00000100
219
+ !! - self_attn.q_proj: cuda:0 [Q] scales min/max/std: 0.002020/0.036896/0.003115
220
+ !! - self_attn.k_proj: cuda:0 [Q] scales min/max/std: 0.001634/0.027725/0.003042
221
+ !! - self_attn.v_proj: cuda:0 [Q] scales min/max/std: 0.003176/0.019165/0.000947
222
+ !! - self_attn.o_proj: cuda:0 [Q] scales min/max/std: 0.001910/0.084106/0.000935
223
+ !! - cache device: cuda:0, seq_len: 0
224
+ !! Begin MLP
225
+ !! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -1071.000000, max: 78.812500, std: 0.605469
226
+ !! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.033203, max: 0.228516, std: 0.012070 eps: 0.00000100
227
+ !! - mlp.gate_proj: cuda:0 [Q] scales min/max/std: 0.003246/0.053589/0.001263
228
+ !! - mlp.up_proj: cuda:0 [Q] scales min/max/std: 0.001094/0.036316/0.000944
229
+ !! - mlp.down_proj: cuda:0 [Q] scales min/max/std: 0.002659/0.075378/0.000929
230
+ !! - method: normal
231
+ !! Begin decoder 9
232
+ !! Begin self-attention
233
+ !! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -1071.000000, max: 79.000000, std: 0.611816
234
+ !! - attn_mask: device: cuda:0, shape: [1, 1, 1920, 1920], dtype: float16, min: -65504.000000, max: 0.000000, std: 32752.000000
235
+ !! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.003876, max: 0.664062, std: 0.020859 eps: 0.00000100
236
+ !! - self_attn.q_proj: cuda:0 [Q] scales min/max/std: 0.002146/0.038910/0.002712
237
+ !! - self_attn.k_proj: cuda:0 [Q] scales min/max/std: 0.001664/0.032074/0.002876
238
+ !! - self_attn.v_proj: cuda:0 [Q] scales min/max/std: 0.003122/0.015617/0.000871
239
+ !! - self_attn.o_proj: cuda:0 [Q] scales min/max/std: 0.001311/0.095337/0.000900
240
+ !! - cache device: cuda:0, seq_len: 0
241
+ !! Begin MLP
242
+ !! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -1071.000000, max: 79.062500, std: 0.624023
243
+ !! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.049805, max: 0.238281, std: 0.011787 eps: 0.00000100
244
+ !! - mlp.gate_proj: cuda:0 [Q] scales min/max/std: 0.003241/0.061310/0.001322
245
+ !! - mlp.up_proj: cuda:0 [Q] scales min/max/std: 0.003132/0.040771/0.000956
246
+ !! - mlp.down_proj: cuda:0 [Q] scales min/max/std: 0.002480/0.081299/0.000928
247
+ !! - method: normal
248
+ !! Begin decoder 10
249
+ !! Begin self-attention
250
+ !! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -1071.000000, max: 79.125000, std: 0.634277
251
+ !! - attn_mask: device: cuda:0, shape: [1, 1, 1920, 1920], dtype: float16, min: -65504.000000, max: 0.000000, std: 32752.000000
252
+ !! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.002594, max: 0.703125, std: 0.021515 eps: 0.00000100
253
+ !! - self_attn.q_proj: cuda:0 [Q] scales min/max/std: 0.002222/0.033997/0.002638
254
+ !! - self_attn.k_proj: cuda:0 [Q] scales min/max/std: 0.001856/0.029907/0.002831
255
+ !! - self_attn.v_proj: cuda:0 [Q] scales min/max/std: 0.003365/0.014862/0.000932
256
+ !! - self_attn.o_proj: cuda:0 [Q] scales min/max/std: 0.001518/0.084351/0.000958
257
+ !! - cache device: cuda:0, seq_len: 0
258
+ !! Begin MLP
259
+ !! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -1071.000000, max: 79.187500, std: 0.652344
260
+ !! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.053955, max: 0.245117, std: 0.011978 eps: 0.00000100
261
+ !! - mlp.gate_proj: cuda:0 [Q] scales min/max/std: 0.003246/0.042297/0.001295
262
+ !! - mlp.up_proj: cuda:0 [Q] scales min/max/std: 0.003368/0.040710/0.000970
263
+ !! - mlp.down_proj: cuda:0 [Q] scales min/max/std: 0.002800/0.089050/0.000934
264
+ !! - method: normal
265
+ !! Begin decoder 11
266
+ !! Begin self-attention
267
+ !! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -1071.000000, max: 79.250000, std: 0.668457
268
+ !! - attn_mask: device: cuda:0, shape: [1, 1, 1920, 1920], dtype: float16, min: -65504.000000, max: 0.000000, std: 32752.000000
269
+ !! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.007355, max: 0.687500, std: 0.021606 eps: 0.00000100
270
+ !! - self_attn.q_proj: cuda:0 [Q] scales min/max/std: 0.002106/0.034271/0.002579
271
+ !! - self_attn.k_proj: cuda:0 [Q] scales min/max/std: 0.002033/0.028885/0.002792
272
+ !! - self_attn.v_proj: cuda:0 [Q] scales min/max/std: 0.003374/0.014481/0.000937
273
+ !! - self_attn.o_proj: cuda:0 [Q] scales min/max/std: 0.001925/0.075500/0.000946
274
+ !! - cache device: cuda:0, seq_len: 0
275
+ !! Begin MLP
276
+ !! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -1071.000000, max: 79.312500, std: 0.690430
277
+ !! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.054443, max: 0.251953, std: 0.011749 eps: 0.00000100
278
+ !! - mlp.gate_proj: cuda:0 [Q] scales min/max/std: 0.003128/0.051086/0.001299
279
+ !! - mlp.up_proj: cuda:0 [Q] scales min/max/std: 0.001537/0.041565/0.000993
280
+ !! - mlp.down_proj: cuda:0 [Q] scales min/max/std: 0.003239/0.079163/0.000940
281
+ !! - method: normal
282
+ !! Begin decoder 12
283
+ !! Begin self-attention
284
+ !! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -1071.000000, max: 79.375000, std: 0.723145
285
+ !! - attn_mask: device: cuda:0, shape: [1, 1, 1920, 1920], dtype: float16, min: -65504.000000, max: 0.000000, std: 32752.000000
286
+ !! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.014771, max: 0.664062, std: 0.020920 eps: 0.00000100
287
+ !! - self_attn.q_proj: cuda:0 [Q] scales min/max/std: 0.002449/0.034271/0.002655
288
+ !! - self_attn.k_proj: cuda:0 [Q] scales min/max/std: 0.002136/0.032806/0.002867
289
+ !! - self_attn.v_proj: cuda:0 [Q] scales min/max/std: 0.003397/0.019394/0.000961
290
+ !! - self_attn.o_proj: cuda:0 [Q] scales min/max/std: 0.001609/0.057343/0.000999
291
+ !! - cache device: cuda:0, seq_len: 0
292
+ !! Begin MLP
293
+ !! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -1072.000000, max: 79.500000, std: 0.749023
294
+ !! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.056396, max: 0.249023, std: 0.012207 eps: 0.00000100
295
+ !! - mlp.gate_proj: cuda:0 [Q] scales min/max/std: 0.003019/0.043274/0.001330
296
+ !! - mlp.up_proj: cuda:0 [Q] scales min/max/std: 0.002712/0.043762/0.001000
297
+ !! - mlp.down_proj: cuda:0 [Q] scales min/max/std: 0.003359/0.118286/0.000953
298
+ !! - method: normal
299
+ !! Begin decoder 13
300
+ !! Begin self-attention
301
+ !! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -1072.000000, max: 79.562500, std: 0.785645
302
+ !! - attn_mask: device: cuda:0, shape: [1, 1, 1920, 1920], dtype: float16, min: -65504.000000, max: 0.000000, std: 32752.000000
303
+ !! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.031982, max: 0.687500, std: 0.021698 eps: 0.00000100
304
+ !! - self_attn.q_proj: cuda:0 [Q] scales min/max/std: 0.002420/0.034241/0.002577
305
+ !! - self_attn.k_proj: cuda:0 [Q] scales min/max/std: 0.002388/0.034241/0.002741
306
+ !! - self_attn.v_proj: cuda:0 [Q] scales min/max/std: 0.003078/0.015854/0.000962
307
+ !! - self_attn.o_proj: cuda:0 [Q] scales min/max/std: 0.002022/0.078918/0.000970
308
+ !! - cache device: cuda:0, seq_len: 0
309
+ !! Begin MLP
310
+ !! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -1072.000000, max: 79.687500, std: 0.807617
311
+ !! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.051025, max: 0.265625, std: 0.012978 eps: 0.00000100
312
+ !! - mlp.gate_proj: cuda:0 [Q] scales min/max/std: 0.003170/0.036652/0.001327
313
+ !! - mlp.up_proj: cuda:0 [Q] scales min/max/std: 0.004108/0.028717/0.000996
314
+ !! - mlp.down_proj: cuda:0 [Q] scales min/max/std: 0.002531/0.052429/0.000926
315
+ !! - method: normal
316
+ !! Begin decoder 14
317
+ !! Begin self-attention
318
+ !! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -1072.000000, max: 79.750000, std: 0.848633
319
+ !! - attn_mask: device: cuda:0, shape: [1, 1, 1920, 1920], dtype: float16, min: -65504.000000, max: 0.000000, std: 32752.000000
320
+ !! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.025879, max: 0.691406, std: 0.021164 eps: 0.00000100
321
+ !! - self_attn.q_proj: cuda:0 [Q] scales min/max/std: 0.002115/0.035156/0.002348
322
+ !! - self_attn.k_proj: cuda:0 [Q] scales min/max/std: 0.001471/0.031067/0.002569
323
+ !! - self_attn.v_proj: cuda:0 [Q] scales min/max/std: 0.003618/0.020035/0.000957
324
+ !! - self_attn.o_proj: cuda:0 [Q] scales min/max/std: 0.001540/0.086060/0.000992
325
+ !! - cache device: cuda:0, seq_len: 0
326
+ !! Begin MLP
327
+ !! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -1072.000000, max: 79.875000, std: 0.866699
328
+ !! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.055420, max: 0.273438, std: 0.013245 eps: 0.00000100
329
+ !! - mlp.gate_proj: cuda:0 [Q] scales min/max/std: 0.003336/0.032928/0.001335
330
+ !! - mlp.up_proj: cuda:0 [Q] scales min/max/std: 0.003906/0.045197/0.000993
331
+ !! - mlp.down_proj: cuda:0 [Q] scales min/max/std: 0.002605/0.088013/0.000936
332
+ !! - method: normal
333
+ !! Begin decoder 15
334
+ !! Begin self-attention
335
+ !! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -1072.000000, max: 79.937500, std: 0.917480
336
+ !! - attn_mask: device: cuda:0, shape: [1, 1, 1920, 1920], dtype: float16, min: -65504.000000, max: 0.000000, std: 32752.000000
337
+ !! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.031494, max: 0.679688, std: 0.020615 eps: 0.00000100
338
+ !! - self_attn.q_proj: cuda:0 [Q] scales min/max/std: 0.002296/0.038727/0.002529
339
+ !! - self_attn.k_proj: cuda:0 [Q] scales min/max/std: 0.002375/0.030533/0.002689
340
+ !! - self_attn.v_proj: cuda:0 [Q] scales min/max/std: 0.003328/0.015869/0.000980
341
+ !! - self_attn.o_proj: cuda:0 [Q] scales min/max/std: 0.001546/0.124634/0.001021
342
+ !! - cache device: cuda:0, seq_len: 0
343
+ !! Begin MLP
344
+ !! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -1072.000000, max: 79.937500, std: 0.946777
345
+ !! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.040039, max: 0.291016, std: 0.014809 eps: 0.00000100
346
+ !! - mlp.gate_proj: cuda:0 [Q] scales min/max/std: 0.003687/0.051025/0.001274
347
+ !! - mlp.up_proj: cuda:0 [Q] scales min/max/std: 0.004307/0.041656/0.000965
348
+ !! - mlp.down_proj: cuda:0 [Q] scales min/max/std: 0.002167/0.078613/0.000919
349
+ !! - method: normal
350
+ !! Begin decoder 16
351
+ !! Begin self-attention
352
+ !! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -1072.000000, max: 79.937500, std: 0.994141
353
+ !! - attn_mask: device: cuda:0, shape: [1, 1, 1920, 1920], dtype: float16, min: -65504.000000, max: 0.000000, std: 32752.000000
354
+ !! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.012573, max: 0.652344, std: 0.020477 eps: 0.00000100
355
+ !! - self_attn.q_proj: cuda:0 [Q] scales min/max/std: 0.002371/0.034912/0.002207
356
+ !! - self_attn.k_proj: cuda:0 [Q] scales min/max/std: 0.001926/0.029617/0.002392
357
+ !! - self_attn.v_proj: cuda:0 [Q] scales min/max/std: 0.003460/0.018524/0.000947
358
+ !! - self_attn.o_proj: cuda:0 [Q] scales min/max/std: 0.001738/0.051270/0.000971
359
+ !! - cache device: cuda:0, seq_len: 0
360
+ !! Begin MLP
361
+ !! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -1072.000000, max: 80.000000, std: 1.007812
362
+ !! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.045898, max: 0.298828, std: 0.015106 eps: 0.00000100
363
+ !! - mlp.gate_proj: cuda:0 [Q] scales min/max/std: 0.003387/0.036011/0.001249
364
+ !! - mlp.up_proj: cuda:0 [Q] scales min/max/std: 0.003696/0.035187/0.000964
365
+ !! - mlp.down_proj: cuda:0 [Q] scales min/max/std: 0.002268/0.065063/0.000917
366
+ !! - method: normal
367
+ !! Begin decoder 17
368
+ !! Begin self-attention
369
+ !! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -1072.000000, max: 80.000000, std: 1.063477
370
+ !! - attn_mask: device: cuda:0, shape: [1, 1, 1920, 1920], dtype: float16, min: -65504.000000, max: 0.000000, std: 32752.000000
371
+ !! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.025146, max: 0.722656, std: 0.021576 eps: 0.00000100
372
+ !! - self_attn.q_proj: cuda:0 [Q] scales min/max/std: 0.002331/0.036224/0.002277
373
+ !! - self_attn.k_proj: cuda:0 [Q] scales min/max/std: 0.001755/0.030884/0.002550
374
+ !! - self_attn.v_proj: cuda:0 [Q] scales min/max/std: 0.003754/0.020874/0.000970
375
+ !! - self_attn.o_proj: cuda:0 [Q] scales min/max/std: 0.001672/0.116455/0.001009
376
+ !! - cache device: cuda:0, seq_len: 0
377
+ !! Begin MLP
378
+ !! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -1072.000000, max: 80.125000, std: 1.102539
379
+ !! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.042969, max: 0.310547, std: 0.015625 eps: 0.00000100
380
+ !! - mlp.gate_proj: cuda:0 [Q] scales min/max/std: 0.003586/0.035492/0.001222
381
+ !! - mlp.up_proj: cuda:0 [Q] scales min/max/std: 0.004265/0.044525/0.000955
382
+ !! - mlp.down_proj: cuda:0 [Q] scales min/max/std: 0.002222/0.067993/0.000917
383
+ !! - method: normal
384
+ !! Begin decoder 18
385
+ !! Begin self-attention
386
+ !! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -1072.000000, max: 80.062500, std: 1.158203
387
+ !! - attn_mask: device: cuda:0, shape: [1, 1, 1920, 1920], dtype: float16, min: -65504.000000, max: 0.000000, std: 32752.000000
388
+ !! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.029907, max: 0.738281, std: 0.022064 eps: 0.00000100
389
+ !! - self_attn.q_proj: cuda:0 [Q] scales min/max/std: 0.002323/0.033447/0.002235
390
+ !! - self_attn.k_proj: cuda:0 [Q] scales min/max/std: 0.001904/0.030121/0.002382
391
+ !! - self_attn.v_proj: cuda:0 [Q] scales min/max/std: 0.004002/0.014252/0.000932
392
+ !! - self_attn.o_proj: cuda:0 [Q] scales min/max/std: 0.001740/0.083801/0.000958
393
+ !! - cache device: cuda:0, seq_len: 0
394
+ !! Begin MLP
395
+ !! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -1072.000000, max: 80.125000, std: 1.192383
396
+ !! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.048584, max: 0.318359, std: 0.015625 eps: 0.00000100
397
+ !! - mlp.gate_proj: cuda:0 [Q] scales min/max/std: 0.003035/0.034271/0.001252
398
+ !! - mlp.up_proj: cuda:0 [Q] scales min/max/std: 0.003998/0.045654/0.000957
399
+ !! - mlp.down_proj: cuda:0 [Q] scales min/max/std: 0.002491/0.084534/0.000911
400
+ !! - method: normal
401
+ !! Begin decoder 19
402
+ !! Begin self-attention
403
+ !! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -1072.000000, max: 80.125000, std: 1.264648
404
+ !! - attn_mask: device: cuda:0, shape: [1, 1, 1920, 1920], dtype: float16, min: -65504.000000, max: 0.000000, std: 32752.000000
405
+ !! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.024170, max: 0.753906, std: 0.022308 eps: 0.00000100
406
+ !! - self_attn.q_proj: cuda:0 [Q] scales min/max/std: 0.002134/0.031494/0.002193
407
+ !! - self_attn.k_proj: cuda:0 [Q] scales min/max/std: 0.001934/0.030380/0.002371
408
+ !! - self_attn.v_proj: cuda:0 [Q] scales min/max/std: 0.003841/0.015404/0.000981
409
+ !! - self_attn.o_proj: cuda:0 [Q] scales min/max/std: 0.001974/0.084167/0.001057
410
+ !! - cache device: cuda:0, seq_len: 0
411
+ !! Begin MLP
412
+ !! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -1072.000000, max: 80.125000, std: 1.292969
413
+ !! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.033936, max: 0.347656, std: 0.016785 eps: 0.00000100
414
+ !! - mlp.gate_proj: cuda:0 [Q] scales min/max/std: 0.003767/0.040405/0.001213
415
+ !! - mlp.up_proj: cuda:0 [Q] scales min/max/std: 0.004185/0.043823/0.000943
416
+ !! - mlp.down_proj: cuda:0 [Q] scales min/max/std: 0.002474/0.062683/0.000900
417
+ !! - method: normal
418
+ !! Begin decoder 20
419
+ !! Begin self-attention
420
+ !! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -1072.000000, max: 80.062500, std: 1.365234
421
+ !! - attn_mask: device: cuda:0, shape: [1, 1, 1920, 1920], dtype: float16, min: -65504.000000, max: 0.000000, std: 32752.000000
422
+ !! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.037354, max: 0.757812, std: 0.022324 eps: 0.00000100
423
+ !! - self_attn.q_proj: cuda:0 [Q] scales min/max/std: 0.002235/0.035187/0.002100
424
+ !! - self_attn.k_proj: cuda:0 [Q] scales min/max/std: 0.002291/0.032471/0.002190
425
+ !! - self_attn.v_proj: cuda:0 [Q] scales min/max/std: 0.003658/0.014191/0.001044
426
+ !! - self_attn.o_proj: cuda:0 [Q] scales min/max/std: 0.001817/0.078064/0.001065
427
+ !! - cache device: cuda:0, seq_len: 0
428
+ !! Begin MLP
429
+ !! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -1073.000000, max: 80.000000, std: 1.402344
430
+ !! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.048096, max: 0.345703, std: 0.016815 eps: 0.00000100
431
+ !! - mlp.gate_proj: cuda:0 [Q] scales min/max/std: 0.003643/0.044281/0.001211
432
+ !! - mlp.up_proj: cuda:0 [Q] scales min/max/std: 0.004276/0.048615/0.000933
433
+ !! - mlp.down_proj: cuda:0 [Q] scales min/max/std: 0.002605/0.067444/0.000911
434
+ !! - method: normal
435
+ !! Begin decoder 21
436
+ !! Begin self-attention
437
+ !! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -1073.000000, max: 80.000000, std: 1.490234
438
+ !! - attn_mask: device: cuda:0, shape: [1, 1, 1920, 1920], dtype: float16, min: -65504.000000, max: 0.000000, std: 32752.000000
439
+ !! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.037598, max: 0.796875, std: 0.023514 eps: 0.00000100
440
+ !! - self_attn.q_proj: cuda:0 [Q] scales min/max/std: 0.002506/0.043945/0.002247
441
+ !! - self_attn.k_proj: cuda:0 [Q] scales min/max/std: 0.002028/0.031616/0.002365
442
+ !! - self_attn.v_proj: cuda:0 [Q] scales min/max/std: 0.004189/0.014427/0.001028
443
+ !! - self_attn.o_proj: cuda:0 [Q] scales min/max/std: 0.001978/0.039856/0.001017
444
+ !! - cache device: cuda:0, seq_len: 0
445
+ !! Begin MLP
446
+ !! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -1073.000000, max: 80.000000, std: 1.533203
447
+ !! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.044922, max: 0.347656, std: 0.017212 eps: 0.00000100
448
+ !! - mlp.gate_proj: cuda:0 [Q] scales min/max/std: 0.003614/0.052155/0.001178
449
+ !! - mlp.up_proj: cuda:0 [Q] scales min/max/std: 0.004387/0.032867/0.000925
450
+ !! - mlp.down_proj: cuda:0 [Q] scales min/max/std: 0.002708/0.063232/0.000911
451
+ !! - method: normal
452
+ !! Begin decoder 22
453
+ !! Begin self-attention
454
+ !! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -1073.000000, max: 80.000000, std: 1.623047
455
+ !! - attn_mask: device: cuda:0, shape: [1, 1, 1920, 1920], dtype: float16, min: -65504.000000, max: 0.000000, std: 32752.000000
456
+ !! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.037354, max: 0.753906, std: 0.022934 eps: 0.00000100
457
+ !! - self_attn.q_proj: cuda:0 [Q] scales min/max/std: 0.002468/0.036316/0.002068
458
+ !! - self_attn.k_proj: cuda:0 [Q] scales min/max/std: 0.002302/0.030502/0.002201
459
+ !! - self_attn.v_proj: cuda:0 [Q] scales min/max/std: 0.003658/0.014572/0.000998
460
+ !! - self_attn.o_proj: cuda:0 [Q] scales min/max/std: 0.002260/0.096069/0.001020
461
+ !! - cache device: cuda:0, seq_len: 0
462
+ !! Begin MLP
463
+ !! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -1073.000000, max: 80.000000, std: 1.687500
464
+ !! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.045654, max: 0.361328, std: 0.018143 eps: 0.00000100
465
+ !! - mlp.gate_proj: cuda:0 [Q] scales min/max/std: 0.003679/0.035217/0.001136
466
+ !! - mlp.up_proj: cuda:0 [Q] scales min/max/std: 0.004360/0.036133/0.000911
467
+ !! - mlp.down_proj: cuda:0 [Q] scales min/max/std: 0.002403/0.078796/0.000916
468
+ !! - method: normal
469
+ !! Begin decoder 23
470
+ !! Begin self-attention
471
+ !! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -1072.000000, max: 80.000000, std: 1.782227
472
+ !! - attn_mask: device: cuda:0, shape: [1, 1, 1920, 1920], dtype: float16, min: -65504.000000, max: 0.000000, std: 32752.000000
473
+ !! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.033691, max: 0.792969, std: 0.024429 eps: 0.00000100
474
+ !! - self_attn.q_proj: cuda:0 [Q] scales min/max/std: 0.002359/0.034546/0.002054
475
+ !! - self_attn.k_proj: cuda:0 [Q] scales min/max/std: 0.002043/0.033936/0.002104
476
+ !! - self_attn.v_proj: cuda:0 [Q] scales min/max/std: 0.004379/0.013702/0.000979
477
+ !! - self_attn.o_proj: cuda:0 [Q] scales min/max/std: 0.001885/0.075256/0.000995
478
+ !! - cache device: cuda:0, seq_len: 0
479
+ !! Begin MLP
480
+ !! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -1072.000000, max: 79.875000, std: 1.843750
481
+ !! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.031982, max: 0.367188, std: 0.019226 eps: 0.00000100
482
+ !! - mlp.gate_proj: cuda:0 [Q] scales min/max/std: 0.003729/0.050964/0.001107
483
+ !! - mlp.up_proj: cuda:0 [Q] scales min/max/std: 0.004387/0.036224/0.000899
484
+ !! - mlp.down_proj: cuda:0 [Q] scales min/max/std: 0.002159/0.082642/0.000899
485
+ !! - method: normal
486
+ !! Begin decoder 24
487
+ !! Begin self-attention
488
+ !! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -1071.000000, max: 80.000000, std: 1.940430
489
+ !! - attn_mask: device: cuda:0, shape: [1, 1, 1920, 1920], dtype: float16, min: -65504.000000, max: 0.000000, std: 32752.000000
490
+ !! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.051758, max: 0.812500, std: 0.025452 eps: 0.00000100
491
+ !! - self_attn.q_proj: cuda:0 [Q] scales min/max/std: 0.002163/0.037628/0.002060
492
+ !! - self_attn.k_proj: cuda:0 [Q] scales min/max/std: 0.002029/0.031433/0.002123
493
+ !! - self_attn.v_proj: cuda:0 [Q] scales min/max/std: 0.003849/0.016617/0.000987
494
+ !! - self_attn.o_proj: cuda:0 [Q] scales min/max/std: 0.001784/0.109741/0.001011
495
+ !! - cache device: cuda:0, seq_len: 0
496
+ !! Begin MLP
497
+ !! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -1071.000000, max: 87.187500, std: 1.993164
498
+ !! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.029419, max: 0.382812, std: 0.020203 eps: 0.00000100
499
+ !! - mlp.gate_proj: cuda:0 [Q] scales min/max/std: 0.003664/0.039459/0.001067
500
+ !! - mlp.up_proj: cuda:0 [Q] scales min/max/std: 0.004559/0.033142/0.000891
501
+ !! - mlp.down_proj: cuda:0 [Q] scales min/max/std: 0.002037/0.088379/0.000898
502
+ !! - method: normal
503
+ !! Begin decoder 25
504
+ !! Begin self-attention
505
+ !! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -1071.000000, max: 88.312500, std: 2.072266
506
+ !! - attn_mask: device: cuda:0, shape: [1, 1, 1920, 1920], dtype: float16, min: -65504.000000, max: 0.000000, std: 32752.000000
507
+ !! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.043213, max: 0.816406, std: 0.024796 eps: 0.00000100
508
+ !! - self_attn.q_proj: cuda:0 [Q] scales min/max/std: 0.001895/0.034515/0.002041
509
+ !! - self_attn.k_proj: cuda:0 [Q] scales min/max/std: 0.001381/0.040314/0.002146
510
+ !! - self_attn.v_proj: cuda:0 [Q] scales min/max/std: 0.003727/0.015511/0.001091
511
+ !! - self_attn.o_proj: cuda:0 [Q] scales min/max/std: 0.002243/0.103149/0.001124
512
+ !! - cache device: cuda:0, seq_len: 0
513
+ !! Begin MLP
514
+ !! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -1071.000000, max: 98.000000, std: 2.152344
515
+ !! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.029663, max: 0.404297, std: 0.020950 eps: 0.00000100
516
+ !! - mlp.gate_proj: cuda:0 [Q] scales min/max/std: 0.003717/0.032501/0.001052
517
+ !! - mlp.up_proj: cuda:0 [Q] scales min/max/std: 0.004433/0.026627/0.000883
518
+ !! - mlp.down_proj: cuda:0 [Q] scales min/max/std: 0.002089/0.068298/0.000892
519
+ !! - method: normal
520
+ !! Begin decoder 26
521
+ !! Begin self-attention
522
+ !! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -1070.000000, max: 101.000000, std: 2.234375
523
+ !! - attn_mask: device: cuda:0, shape: [1, 1, 1920, 1920], dtype: float16, min: -65504.000000, max: 0.000000, std: 32752.000000
524
+ !! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.038818, max: 0.875000, std: 0.026947 eps: 0.00000100
525
+ !! - self_attn.q_proj: cuda:0 [Q] scales min/max/std: 0.002312/0.030716/0.001928
526
+ !! - self_attn.k_proj: cuda:0 [Q] scales min/max/std: 0.002153/0.033234/0.002005
527
+ !! - self_attn.v_proj: cuda:0 [Q] scales min/max/std: 0.004166/0.014450/0.000995
528
+ !! - self_attn.o_proj: cuda:0 [Q] scales min/max/std: 0.002365/0.091187/0.001030
529
+ !! - cache device: cuda:0, seq_len: 0
530
+ !! Begin MLP
531
+ !! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -1070.000000, max: 105.500000, std: 2.265625
532
+ !! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.030518, max: 0.400391, std: 0.021332 eps: 0.00000100
533
+ !! - mlp.gate_proj: cuda:0 [Q] scales min/max/std: 0.004192/0.032410/0.001042
534
+ !! - mlp.up_proj: cuda:0 [Q] scales min/max/std: 0.004314/0.036591/0.000883
535
+ !! - mlp.down_proj: cuda:0 [Q] scales min/max/std: 0.002001/0.074585/0.000899
536
+ !! - method: normal
537
+ !! Begin decoder 27
538
+ !! Begin self-attention
539
+ !! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -1069.000000, max: 108.812500, std: 2.341797
540
+ !! - attn_mask: device: cuda:0, shape: [1, 1, 1920, 1920], dtype: float16, min: -65504.000000, max: 0.000000, std: 32752.000000
541
+ !! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.044922, max: 0.906250, std: 0.027390 eps: 0.00000100
542
+ !! - self_attn.q_proj: cuda:0 [Q] scales min/max/std: 0.002163/0.037323/0.002039
543
+ !! - self_attn.k_proj: cuda:0 [Q] scales min/max/std: 0.002100/0.032104/0.002142
544
+ !! - self_attn.v_proj: cuda:0 [Q] scales min/max/std: 0.004280/0.019775/0.000985
545
+ !! - self_attn.o_proj: cuda:0 [Q] scales min/max/std: 0.002172/0.070496/0.001004
546
+ !! - cache device: cuda:0, seq_len: 0
547
+ !! Begin MLP
548
+ !! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -1069.000000, max: 115.812500, std: 2.398438
549
+ !! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.034180, max: 0.406250, std: 0.021439 eps: 0.00000100
550
+ !! - mlp.gate_proj: cuda:0 [Q] scales min/max/std: 0.004284/0.040131/0.001047
551
+ !! - mlp.up_proj: cuda:0 [Q] scales min/max/std: 0.004375/0.046295/0.000883
552
+ !! - mlp.down_proj: cuda:0 [Q] scales min/max/std: 0.002033/0.049622/0.000891
553
+ !! - method: normal
554
+ !! Begin decoder 28
555
+ !! Begin self-attention
556
+ !! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -1068.000000, max: 119.062500, std: 2.470703
557
+ !! - attn_mask: device: cuda:0, shape: [1, 1, 1920, 1920], dtype: float16, min: -65504.000000, max: 0.000000, std: 32752.000000
558
+ !! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.038818, max: 0.937500, std: 0.027420 eps: 0.00000100
559
+ !! - self_attn.q_proj: cuda:0 [Q] scales min/max/std: 0.002270/0.045990/0.002008
560
+ !! - self_attn.k_proj: cuda:0 [Q] scales min/max/std: 0.002068/0.035706/0.002039
561
+ !! - self_attn.v_proj: cuda:0 [Q] scales min/max/std: 0.003502/0.013725/0.001108
562
+ !! - self_attn.o_proj: cuda:0 [Q] scales min/max/std: 0.002235/0.154175/0.001218
563
+ !! - cache device: cuda:0, seq_len: 0
564
+ !! Begin MLP
565
+ !! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -1068.000000, max: 132.000000, std: 2.578125
566
+ !! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.022705, max: 0.423828, std: 0.022003 eps: 0.00000100
567
+ !! - mlp.gate_proj: cuda:0 [Q] scales min/max/std: 0.004471/0.042694/0.001054
568
+ !! - mlp.up_proj: cuda:0 [Q] scales min/max/std: 0.004562/0.022446/0.000878
569
+ !! - mlp.down_proj: cuda:0 [Q] scales min/max/std: 0.001733/0.056427/0.000884
570
+ !! - method: normal
571
+ !! Begin decoder 29
572
+ !! Begin self-attention
573
+ !! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -1067.000000, max: 134.750000, std: 2.632812
574
+ !! - attn_mask: device: cuda:0, shape: [1, 1, 1920, 1920], dtype: float16, min: -65504.000000, max: 0.000000, std: 32752.000000
575
+ !! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.003403, max: 0.957031, std: 0.027893 eps: 0.00000100
576
+ !! - self_attn.q_proj: cuda:0 [Q] scales min/max/std: 0.002245/0.032928/0.001910
577
+ !! - self_attn.k_proj: cuda:0 [Q] scales min/max/std: 0.002039/0.030350/0.001957
578
+ !! - self_attn.v_proj: cuda:0 [Q] scales min/max/std: 0.004120/0.014153/0.001067
579
+ !! - self_attn.o_proj: cuda:0 [Q] scales min/max/std: 0.002306/0.074097/0.001082
580
+ !! - cache device: cuda:0, seq_len: 0
581
+ !! Begin MLP
582
+ !! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -1067.000000, max: 138.375000, std: 2.666016
583
+ !! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.028442, max: 0.691406, std: 0.022568 eps: 0.00000100
584
+ !! - mlp.gate_proj: cuda:0 [Q] scales min/max/std: 0.004440/0.035675/0.001063
585
+ !! - mlp.up_proj: cuda:0 [Q] scales min/max/std: 0.004391/0.031128/0.000879
586
+ !! - mlp.down_proj: cuda:0 [Q] scales min/max/std: 0.001850/0.075684/0.000896
587
+ !! - method: normal
588
+ !! Begin decoder 30
589
+ !! Begin self-attention
590
+ !! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -1066.000000, max: 141.125000, std: 2.714844
591
+ !! - attn_mask: device: cuda:0, shape: [1, 1, 1920, 1920], dtype: float16, min: -65504.000000, max: 0.000000, std: 32752.000000
592
+ !! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.039062, max: 0.953125, std: 0.028458 eps: 0.00000100
593
+ !! - self_attn.q_proj: cuda:0 [Q] scales min/max/std: 0.002247/0.030197/0.001984
594
+ !! - self_attn.k_proj: cuda:0 [Q] scales min/max/std: 0.002272/0.032532/0.002090
595
+ !! - self_attn.v_proj: cuda:0 [Q] scales min/max/std: 0.002539/0.015915/0.001025
596
+ !! - self_attn.o_proj: cuda:0 [Q] scales min/max/std: 0.002310/0.092224/0.001046
597
+ !! - cache device: cuda:0, seq_len: 0
598
+ !! Begin MLP
599
+ !! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -1066.000000, max: 145.625000, std: 2.767578
600
+ !! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.013855, max: 0.443359, std: 0.021713 eps: 0.00000100
601
+ !! - mlp.gate_proj: cuda:0 [Q] scales min/max/std: 0.004665/0.045197/0.001092
602
+ !! - mlp.up_proj: cuda:0 [Q] scales min/max/std: 0.004078/0.036926/0.000885
603
+ !! - mlp.down_proj: cuda:0 [Q] scales min/max/std: 0.001813/0.072693/0.000899
604
+ !! - method: normal
605
+ !! Begin decoder 31
606
+ !! Begin self-attention
607
+ !! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -1064.000000, max: 151.000000, std: 2.847656
608
+ !! - attn_mask: device: cuda:0, shape: [1, 1, 1920, 1920], dtype: float16, min: -65504.000000, max: 0.000000, std: 32752.000000
609
+ !! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: -0.005127, max: 0.949219, std: 0.028824 eps: 0.00000100
610
+ !! - self_attn.q_proj: cuda:0 [Q] scales min/max/std: 0.002350/0.031052/0.001871
611
+ !! - self_attn.k_proj: cuda:0 [Q] scales min/max/std: 0.002193/0.030899/0.001905
612
+ !! - self_attn.v_proj: cuda:0 [Q] scales min/max/std: 0.004337/0.015503/0.001026
613
+ !! - self_attn.o_proj: cuda:0 [Q] scales min/max/std: 0.002642/0.092957/0.001069
614
+ !! - cache device: cuda:0, seq_len: 0
615
+ !! Begin MLP
616
+ !! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -1064.000000, max: 160.750000, std: 2.914062
617
+ !! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.015198, max: 0.449219, std: 0.022018 eps: 0.00000100
618
+ !! - mlp.gate_proj: cuda:0 [Q] scales min/max/std: 0.004639/0.031525/0.001118
619
+ !! - mlp.up_proj: cuda:0 [Q] scales min/max/std: 0.004658/0.035858/0.000885
620
+ !! - mlp.down_proj: cuda:0 [Q] scales min/max/std: 0.001880/0.045258/0.000892
621
+ !! - method: normal
622
+ !! Begin decoder 32
623
+ !! Begin self-attention
624
+ !! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -1062.000000, max: 162.625000, std: 2.964844
625
+ !! - attn_mask: device: cuda:0, shape: [1, 1, 1920, 1920], dtype: float16, min: -65504.000000, max: 0.000000, std: 32752.000000
626
+ !! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.002731, max: 0.898438, std: 0.028946 eps: 0.00000100
627
+ !! - self_attn.q_proj: cuda:0 [Q] scales min/max/std: 0.002439/0.031342/0.001923
628
+ !! - self_attn.k_proj: cuda:0 [Q] scales min/max/std: 0.001745/0.039093/0.001959
629
+ !! - self_attn.v_proj: cuda:0 [Q] scales min/max/std: 0.003937/0.014107/0.001027
630
+ !! - self_attn.o_proj: cuda:0 [Q] scales min/max/std: 0.002518/0.113953/0.001073
631
+ !! - cache device: cuda:0, seq_len: 0
632
+ !! Begin MLP
633
+ !! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -1062.000000, max: 167.125000, std: 3.007812
634
+ !! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.003967, max: 0.746094, std: 0.022736 eps: 0.00000100
635
+ !! - mlp.gate_proj: cuda:0 [Q] scales min/max/std: 0.004223/0.046234/0.001122
636
+ !! - mlp.up_proj: cuda:0 [Q] scales min/max/std: 0.004738/0.031342/0.000886
637
+ !! - mlp.down_proj: cuda:0 [Q] scales min/max/std: 0.001719/0.055420/0.000911
638
+ !! - method: normal
639
+ !! Begin decoder 33
640
+ !! Begin self-attention
641
+ !! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -1059.000000, max: 172.125000, std: 3.062500
642
+ !! - attn_mask: device: cuda:0, shape: [1, 1, 1920, 1920], dtype: float16, min: -65504.000000, max: 0.000000, std: 32752.000000
643
+ !! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.033203, max: 0.910156, std: 0.029999 eps: 0.00000100
644
+ !! - self_attn.q_proj: cuda:0 [Q] scales min/max/std: 0.002224/0.034576/0.001955
645
+ !! - self_attn.k_proj: cuda:0 [Q] scales min/max/std: 0.002178/0.034698/0.001965
646
+ !! - self_attn.v_proj: cuda:0 [Q] scales min/max/std: 0.003191/0.017090/0.001073
647
+ !! - self_attn.o_proj: cuda:0 [Q] scales min/max/std: 0.002516/0.098511/0.001093
648
+ !! - cache device: cuda:0, seq_len: 0
649
+ !! Begin MLP
650
+ !! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -1058.000000, max: 174.375000, std: 3.101562
651
+ !! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.021729, max: 0.457031, std: 0.021973 eps: 0.00000100
652
+ !! - mlp.gate_proj: cuda:0 [Q] scales min/max/std: 0.004440/0.058960/0.001143
653
+ !! - mlp.up_proj: cuda:0 [Q] scales min/max/std: 0.004120/0.027802/0.000899
654
+ !! - mlp.down_proj: cuda:0 [Q] scales min/max/std: 0.001822/0.089966/0.000950
655
+ !! - method: normal
656
+ !! Begin decoder 34
657
+ !! Begin self-attention
658
+ !! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -1054.000000, max: 176.625000, std: 3.140625
659
+ !! - attn_mask: device: cuda:0, shape: [1, 1, 1920, 1920], dtype: float16, min: -65504.000000, max: 0.000000, std: 32752.000000
660
+ !! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.038086, max: 0.953125, std: 0.030441 eps: 0.00000100
661
+ !! - self_attn.q_proj: cuda:0 [Q] scales min/max/std: 0.002279/0.033783/0.001966
662
+ !! - self_attn.k_proj: cuda:0 [Q] scales min/max/std: 0.002062/0.031311/0.002022
663
+ !! - self_attn.v_proj: cuda:0 [Q] scales min/max/std: 0.003651/0.016846/0.001222
664
+ !! - self_attn.o_proj: cuda:0 [Q] scales min/max/std: 0.002913/0.079651/0.001315
665
+ !! - cache device: cuda:0, seq_len: 0
666
+ !! Begin MLP
667
+ !! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -1053.000000, max: 179.500000, std: 3.205078
668
+ !! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.029907, max: 0.460938, std: 0.021744 eps: 0.00000100
669
+ !! - mlp.gate_proj: cuda:0 [Q] scales min/max/std: 0.004433/0.036102/0.001138
670
+ !! - mlp.up_proj: cuda:0 [Q] scales min/max/std: 0.004498/0.028717/0.000901
671
+ !! - mlp.down_proj: cuda:0 [Q] scales min/max/std: 0.001920/0.123169/0.001141
672
+ !! - method: normal
673
+ !! Begin decoder 35
674
+ !! Begin self-attention
675
+ !! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -1023.000000, max: 183.500000, std: 3.283203
676
+ !! - attn_mask: device: cuda:0, shape: [1, 1, 1920, 1920], dtype: float16, min: -65504.000000, max: 0.000000, std: 32752.000000
677
+ !! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.040283, max: 0.917969, std: 0.029037 eps: 0.00000100
678
+ !! - self_attn.q_proj: cuda:0 [Q] scales min/max/std: 0.002428/0.032837/0.001951
679
+ !! - self_attn.k_proj: cuda:0 [Q] scales min/max/std: 0.002157/0.030807/0.002024
680
+ !! - self_attn.v_proj: cuda:0 [Q] scales min/max/std: 0.003971/0.013626/0.001038
681
+ !! - self_attn.o_proj: cuda:0 [Q] scales min/max/std: 0.003014/0.090149/0.001112
682
+ !! - cache device: cuda:0, seq_len: 0
683
+ !! Begin MLP
684
+ !! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -1021.500000, max: 188.875000, std: 3.333984
685
+ !! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.030151, max: 0.468750, std: 0.021896 eps: 0.00000100
686
+ !! - mlp.gate_proj: cuda:0 [Q] scales min/max/std: 0.002829/0.039459/0.001129
687
+ !! - mlp.up_proj: cuda:0 [Q] scales min/max/std: 0.004147/0.044250/0.000917
688
+ !! - mlp.down_proj: cuda:0 [Q] scales min/max/std: 0.002134/0.148560/0.001385
689
+ !! - method: normal
690
+ !! Begin decoder 36
691
+ !! Begin self-attention
692
+ !! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -957.000000, max: 190.625000, std: 3.396484
693
+ !! - attn_mask: device: cuda:0, shape: [1, 1, 1920, 1920], dtype: float16, min: -65504.000000, max: 0.000000, std: 32752.000000
694
+ !! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: -0.004456, max: 0.941406, std: 0.031082 eps: 0.00000100
695
+ !! - self_attn.q_proj: cuda:0 [Q] scales min/max/std: 0.001844/0.032776/0.001974
696
+ !! - self_attn.k_proj: cuda:0 [Q] scales min/max/std: 0.001781/0.031769/0.002085
697
+ !! - self_attn.v_proj: cuda:0 [Q] scales min/max/std: 0.004047/0.016876/0.001062
698
+ !! - self_attn.o_proj: cuda:0 [Q] scales min/max/std: 0.002771/0.059174/0.001117
699
+ !! - cache device: cuda:0, seq_len: 0
700
+ !! Begin MLP
701
+ !! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -956.500000, max: 191.375000, std: 3.433594
702
+ !! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.050537, max: 0.839844, std: 0.022324 eps: 0.00000100
703
+ !! - mlp.gate_proj: cuda:0 [Q] scales min/max/std: 0.004131/0.048218/0.001153
704
+ !! - mlp.up_proj: cuda:0 [Q] scales min/max/std: 0.004238/0.036469/0.000927
705
+ !! - mlp.down_proj: cuda:0 [Q] scales min/max/std: 0.002237/0.148193/0.001454
706
+ !! - method: normal
707
+ !! Begin decoder 37
708
+ !! Begin self-attention
709
+ !! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -895.000000, max: 188.875000, std: 3.443359
710
+ !! - attn_mask: device: cuda:0, shape: [1, 1, 1920, 1920], dtype: float16, min: -65504.000000, max: 0.000000, std: 32752.000000
711
+ !! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: -0.002762, max: 1.054688, std: 0.032867 eps: 0.00000100
712
+ !! - self_attn.q_proj: cuda:0 [Q] scales min/max/std: 0.002245/0.036652/0.001965
713
+ !! - self_attn.k_proj: cuda:0 [Q] scales min/max/std: 0.001849/0.033752/0.002066
714
+ !! - self_attn.v_proj: cuda:0 [Q] scales min/max/std: 0.003832/0.017563/0.001212
715
+ !! - self_attn.o_proj: cuda:0 [Q] scales min/max/std: 0.003330/0.115906/0.001400
716
+ !! - cache device: cuda:0, seq_len: 0
717
+ !! Begin MLP
718
+ !! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -891.500000, max: 191.125000, std: 3.550781
719
+ !! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.066406, max: 0.593750, std: 0.021439 eps: 0.00000100
720
+ !! - mlp.gate_proj: cuda:0 [Q] scales min/max/std: 0.003469/0.083496/0.001222
721
+ !! - mlp.up_proj: cuda:0 [Q] scales min/max/std: 0.003468/0.034821/0.000952
722
+ !! - mlp.down_proj: cuda:0 [Q] scales min/max/std: 0.002447/0.204346/0.002012
723
+ !! - method: normal
724
+ !! Begin decoder 38
725
+ !! Begin self-attention
726
+ !! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -595.000000, max: 182.000000, std: 3.615234
727
+ !! - attn_mask: device: cuda:0, shape: [1, 1, 1920, 1920], dtype: float16, min: -65504.000000, max: 0.000000, std: 32752.000000
728
+ !! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.097656, max: 1.039062, std: 0.031891 eps: 0.00000100
729
+ !! - self_attn.q_proj: cuda:0 [Q] scales min/max/std: 0.002375/0.045197/0.001980
730
+ !! - self_attn.k_proj: cuda:0 [Q] scales min/max/std: 0.002171/0.030624/0.001997
731
+ !! - self_attn.v_proj: cuda:0 [Q] scales min/max/std: 0.003450/0.017731/0.001331
732
+ !! - self_attn.o_proj: cuda:0 [Q] scales min/max/std: 0.003344/0.227539/0.001991
733
+ !! - cache device: cuda:0, seq_len: 0
734
+ !! Begin MLP
735
+ !! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -95.500000, max: 199.750000, std: 3.875000
736
+ !! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.087891, max: 0.498047, std: 0.020370 eps: 0.00000100
737
+ !! - mlp.gate_proj: cuda:0 [Q] scales min/max/std: 0.004387/0.031525/0.001246
738
+ !! - mlp.up_proj: cuda:0 [Q] scales min/max/std: 0.002453/0.059601/0.001083
739
+ !! - mlp.down_proj: cuda:0 [Q] scales min/max/std: 0.003397/0.199585/0.001426
740
+ !! - method: normal
741
+ !! Begin decoder 39
742
+ !! Begin self-attention
743
+ !! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -172.500000, max: 207.375000, std: 4.148438
744
+ !! - attn_mask: device: cuda:0, shape: [1, 1, 1920, 1920], dtype: float16, min: -65504.000000, max: 0.000000, std: 32752.000000
745
+ !! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.002625, max: 0.957031, std: 0.032471 eps: 0.00000100
746
+ !! - self_attn.q_proj: cuda:0 [Q] scales min/max/std: 0.002300/0.047607/0.002197
747
+ !! - self_attn.k_proj: cuda:0 [Q] scales min/max/std: 0.002066/0.033020/0.002274
748
+ !! - self_attn.v_proj: cuda:0 [Q] scales min/max/std: 0.002975/0.016586/0.001257
749
+ !! - self_attn.o_proj: cuda:0 [Q] scales min/max/std: 0.003019/0.146851/0.001698
750
+ !! - cache device: cuda:0, seq_len: 0
751
+ !! Begin MLP
752
+ !! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -152.500000, max: 230.750000, std: 4.437500
753
+ !! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.109863, max: 0.648438, std: 0.025543 eps: 0.00000100
754
+ !! - mlp.gate_proj: cuda:0 [Q] scales min/max/std: 0.002789/0.032501/0.001303
755
+ !! - mlp.up_proj: cuda:0 [Q] scales min/max/std: 0.002787/0.085999/0.001245
756
+ !! - mlp.down_proj: cuda:0 [Q] scales min/max/std: 0.004478/0.175049/0.001831
757
+ !! - method: normal
758
+ !! pre norm, hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -191.250000, max: 691.500000, std: 6.925781
759
+ !! pre lm_head, hidden_states: device: cuda:0, shape: [1, 1, 5120], dtype: float16, min: -18.781250, max: 24.484375, std: 1.285156
760
+ !! logits: device: cuda:0, shape: [1, 1, 32000], dtype: float16, min: -11.500000, max: 10.296875, std: 2.232422
761
+ !! Moving logits from cuda:0 to cpu
762
+ ** Time, Inference: 0.86 seconds
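A quick way to compare the two exports in this commit is to pull the "** Time, ..." summary lines out of each uploaded log. The snippet below is a minimal illustrative sketch, not part of test_benchmark_inference.py; it assumes both .txt files sit in the working directory and keep the exact line format shown above.

import re
from pathlib import Path

# Illustrative only: parse "** Time, <label>: <seconds> seconds" lines
# from the two uploaded debug logs and print them side by side.
TIME_RE = re.compile(r"\*\* Time, (.+?): ([0-9.]+) seconds")

def timings(path):
    # Return {label: seconds} for every timing summary line in a log.
    out = {}
    for line in Path(path).read_text().splitlines():
        m = TIME_RE.search(line)
        if m:
            out[m.group(1)] = float(m.group(2))
    return out

fast = timings("koala-13B-4bit_ooba_cuda_fast.txt")
slow = timings("koala-13B-4bit_qwop_cuda_slow.txt")
for label in sorted(fast.keys() & slow.keys()):
    print(f"{label}: {fast[label]:.2f}s (fast) vs {slow[label]:.2f}s (slow)")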
koala-13B-4bit_qwop_cuda_slow.txt ADDED
@@ -0,0 +1,776 @@
1
+ python test_benchmark_inference.py -dbg -d ~/llm_models/koala-13B-GPTQ
2
+ -- Loading model
3
+ -- Tokenizer: /home/nap/llm_models/koala-13B-GPTQ/tokenizer.model
4
+ -- Model config: /home/nap/llm_models/koala-13B-GPTQ/config.json
5
+ -- Model: /home/nap/llm_models/koala-13B-GPTQ/koala-13B-4bit_qwop_cuda_slow.safetensors
6
+ -- Sequence length: 2048
7
+ -- Options: ['attention: switched', 'matmul: switched', 'mlp: switched', 'debug']
8
+ !! Available CUDA devices:
9
+ " !! - cuda:0: NVIDIA GeForce RTX 4090
10
+ " !! - cuda:1: NVIDIA RTX A6000
11
+ !! Loading safetensors file: /home/nap/llm_models/koala-13B-GPTQ/koala-13B-4bit_qwop_cuda_slow.safetensors
12
+ !! Begin load tensors
13
+ !! - lm_head.weight read: device: cpu, shape: [32000, 5120], dtype: float16
14
+ !! - lm_head.weight map: device: cuda:0, shape: [32000, 5120], dtype: float16, min: -0.316406, max: 0.361328, std: 0.020935
15
+ !! - model.embed_tokens.weight read: device: cpu, shape: [32000, 5120], dtype: float16
16
+ !! - model.embed_tokens.weight map: device: cpu, shape: [32000, 5120], dtype: float16
17
+ !! - model.layers.0.input_layernorm.weight read: device: cpu, shape: [5120], dtype: float16
18
+ !! - model.layers.0.input_layernorm.weight map: device: cuda:0, shape: [5120], dtype: float16, min: -0.002060, max: 0.742188, std: 0.045593
19
+ !! - model.layers.0.mlp.down_proj.g_idx read: device: cpu, shape: [13824], dtype: int32
20
+ !! - model.layers.0.mlp.down_proj.g_idx map: device: cuda:0, shape: [13824], dtype: int32, min: 0, max: 107
21
+ !! - model.layers.0.mlp.down_proj.qweight read: device: cpu, shape: [1728, 5120], dtype: int32
22
+ !! - model.layers.0.mlp.down_proj.qweight map: device: cuda:0, shape: [1728, 5120], dtype: int32, min: -2147416079, max: 2147375608
23
+ !! - model.layers.0.mlp.down_proj.qzeros read: device: cpu, shape: [108, 640], dtype: int32
24
+ !! - model.layers.0.mlp.down_proj.qzeros map: device: cuda:0, shape: [108, 640], dtype: int32, min: -2106165417, max: 2089191031
25
+ !! - model.layers.0.mlp.down_proj.scales read: device: cpu, shape: [108, 5120], dtype: float16
26
+ !! - model.layers.0.mlp.down_proj.scales map: device: cuda:0, shape: [108, 5120], dtype: float16, min: 0.003326, max: 0.099487, std: 0.001260
27
+ !! - model.layers.0.mlp.gate_proj.g_idx read: device: cpu, shape: [5120], dtype: int32
28
+ !! - model.layers.0.mlp.gate_proj.g_idx map: device: cuda:0, shape: [5120], dtype: int32, min: 0, max: 39
29
+ !! - model.layers.0.mlp.gate_proj.qweight read: device: cpu, shape: [640, 13824], dtype: int32
30
+ !! - model.layers.0.mlp.gate_proj.qweight map: device: cuda:0, shape: [640, 13824], dtype: int32, min: -2147459474, max: 2147466163
31
+ !! - model.layers.0.mlp.gate_proj.qzeros read: device: cpu, shape: [40, 1728], dtype: int32
32
+ !! - model.layers.0.mlp.gate_proj.qzeros map: device: cuda:0, shape: [40, 1728], dtype: int32, min: -2125109368, max: 2089248375
33
+ !! - model.layers.0.mlp.gate_proj.scales read: device: cpu, shape: [40, 13824], dtype: float16
34
+ !! - model.layers.0.mlp.gate_proj.scales map: device: cuda:0, shape: [40, 13824], dtype: float16, min: 0.002777, max: 0.060303, std: 0.000990
35
+ !! - model.layers.0.mlp.up_proj.g_idx read: device: cpu, shape: [5120], dtype: int32
36
+ !! - model.layers.0.mlp.up_proj.g_idx map: device: cuda:0, shape: [5120], dtype: int32, min: 0, max: 39
37
+ !! - model.layers.0.mlp.up_proj.qweight read: device: cpu, shape: [640, 13824], dtype: int32
38
+ !! - model.layers.0.mlp.up_proj.qweight map: device: cuda:0, shape: [640, 13824], dtype: int32, min: -2147474830, max: 2147437148
39
+ !! - model.layers.0.mlp.up_proj.qzeros read: device: cpu, shape: [40, 1728], dtype: int32
40
+ !! - model.layers.0.mlp.up_proj.qzeros map: device: cuda:0, shape: [40, 1728], dtype: int32, min: -2107213722, max: 2089121671
41
+ !! - model.layers.0.mlp.up_proj.scales read: device: cpu, shape: [40, 13824], dtype: float16
42
+ !! - model.layers.0.mlp.up_proj.scales map: device: cuda:0, shape: [40, 13824], dtype: float16, min: 0.002075, max: 0.040131, std: 0.000730
43
+ !! - model.layers.0.post_attention_layernorm.weight read: device: cpu, shape: [5120], dtype: float16
44
+ !! - model.layers.0.post_attention_layernorm.weight map: device: cuda:0, shape: [5120], dtype: float16, min: -0.035889, max: 0.361328, std: 0.016113
45
+ !! - model.layers.0.self_attn.k_proj.g_idx read: device: cpu, shape: [5120], dtype: int32
46
+ !! - model.layers.0.self_attn.k_proj.g_idx map: device: cuda:0, shape: [5120], dtype: int32, min: 0, max: 39
47
+ !! - model.layers.0.self_attn.k_proj.qweight read: device: cpu, shape: [640, 5120], dtype: int32
48
+ !! - model.layers.0.self_attn.k_proj.qweight map: device: cuda:0, shape: [640, 5120], dtype: int32, min: -2147305928, max: 2147337675
49
+ !! - model.layers.0.self_attn.k_proj.qzeros read: device: cpu, shape: [40, 640], dtype: int32
50
+ !! - model.layers.0.self_attn.k_proj.qzeros map: device: cuda:0, shape: [40, 640], dtype: int32, min: -2128119278, max: 2092336937
51
+ !! - model.layers.0.self_attn.k_proj.scales read: device: cpu, shape: [40, 5120], dtype: float16
52
+ !! - model.layers.0.self_attn.k_proj.scales map: device: cuda:0, shape: [40, 5120], dtype: float16, min: 0.001449, max: 0.082703, std: 0.005592
53
+ !! - model.layers.0.self_attn.o_proj.g_idx read: device: cpu, shape: [5120], dtype: int32
54
+ !! - model.layers.0.self_attn.o_proj.g_idx map: device: cuda:0, shape: [5120], dtype: int32, min: 0, max: 39
55
+ !! - model.layers.0.self_attn.o_proj.qweight read: device: cpu, shape: [640, 5120], dtype: int32
56
+ !! - model.layers.0.self_attn.o_proj.qweight map: device: cuda:0, shape: [640, 5120], dtype: int32, min: -2147453144, max: 2147375548
57
+ !! - model.layers.0.self_attn.o_proj.qzeros read: device: cpu, shape: [40, 640], dtype: int32
58
+ !! - model.layers.0.self_attn.o_proj.qzeros map: device: cuda:0, shape: [40, 640], dtype: int32, min: -2107209387, max: 2071422582
59
+ !! - model.layers.0.self_attn.o_proj.scales read: device: cpu, shape: [40, 5120], dtype: float16
60
+ !! - model.layers.0.self_attn.o_proj.scales map: device: cuda:0, shape: [40, 5120], dtype: float16, min: 0.001521, max: 0.089478, std: 0.001425
61
+ !! - model.layers.0.self_attn.q_proj.g_idx read: device: cpu, shape: [5120], dtype: int32
62
+ !! - model.layers.0.self_attn.q_proj.g_idx map: device: cuda:0, shape: [5120], dtype: int32, min: 0, max: 39
63
+ !! - model.layers.0.self_attn.q_proj.qweight read: device: cpu, shape: [640, 5120], dtype: int32
64
+ !! - model.layers.0.self_attn.q_proj.qweight map: device: cuda:0, shape: [640, 5120], dtype: int32, min: -2147399309, max: 2147314245
65
+ !! - model.layers.0.self_attn.q_proj.qzeros read: device: cpu, shape: [40, 640], dtype: int32
66
+ !! - model.layers.0.self_attn.q_proj.qzeros map: device: cuda:0, shape: [40, 640], dtype: int32, min: -2128450726, max: 2092123285
67
+ !! - model.layers.0.self_attn.q_proj.scales read: device: cpu, shape: [40, 5120], dtype: float16
68
+ !! - model.layers.0.self_attn.q_proj.scales map: device: cuda:0, shape: [40, 5120], dtype: float16, min: 0.001049, max: 0.095764, std: 0.005581
69
+ !! - model.layers.0.self_attn.v_proj.g_idx read: device: cpu, shape: [5120], dtype: int32
70
+ !! - model.layers.0.self_attn.v_proj.g_idx map: device: cuda:0, shape: [5120], dtype: int32, min: 0, max: 39
71
+ !! - model.layers.0.self_attn.v_proj.qweight read: device: cpu, shape: [640, 5120], dtype: int32
72
+ !! - model.layers.0.self_attn.v_proj.qweight map: device: cuda:0, shape: [640, 5120], dtype: int32, min: -2147441095, max: 2147387755
73
+ !! - model.layers.0.self_attn.v_proj.qzeros read: device: cpu, shape: [40, 640], dtype: int32
74
+ !! - model.layers.0.self_attn.v_proj.qzeros map: device: cuda:0, shape: [40, 640], dtype: int32, min: -2091420041, max: 2071422327
75
+ !! - model.layers.0.self_attn.v_proj.scales read: device: cpu, shape: [40, 5120], dtype: float16
76
+ !! - model.layers.0.self_attn.v_proj.scales map: device: cuda:0, shape: [40, 5120], dtype: float16, min: 0.001673, max: 0.015762, std: 0.001489
77
+ !! - model.norm.weight read: device: cpu, shape: [5120], dtype: float16
78
+ !! - model.norm.weight map: device: cuda:0, shape: [5120], dtype: float16, min: 0.018066, max: 2.093750, std: 0.073120
79
+ !! Computing RoPE table for seq length: 2048
80
+ !! - stored for device: cuda:0
81
+ ** Time, Load model: 3.72 seconds
82
+ -- Groupsize (inferred): 128
83
+ -- Act-order (inferred): yes
84
+ ** VRAM, Model: [cuda:0] 6,689.96 MB - [cuda:1] 0.00 MB
85
+ !! Inference, debug pass
86
+ !! Begin forward pass
87
+ !! Moving input_ids from cuda:0 to cpu
88
+ !! Built initial hidden state: device: cpu, shape: [1, 1920, 5120], dtype: float16, min: -0.117676, max: 0.114746, std: 0.018738
89
+ !! Prepared buffer for device: cuda:0
90
+ !! - attn_mask: device: cuda:0, shape: [1, 1, 1920, 1920], dtype: float16, min: -65504.000000, max: 0.000000, std: 32752.000000
91
+ !! Moving hidden_states from cpu to cuda:0
92
+ !! Begin decoder 0
93
+ !! Begin self-attention
94
+ !! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -0.117676, max: 0.114746, std: 0.018738
95
+ !! - attn_mask: device: cuda:0, shape: [1, 1, 1920, 1920], dtype: float16, min: -65504.000000, max: 0.000000, std: 32752.000000
96
+ !! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: -0.002060, max: 0.742188, std: 0.045593 eps: 0.00000100
97
+ !! - self_attn.q_proj: cuda:0 [Q,x_map] scales min/max/std: 0.001049/0.095764/0.005581
98
+ !! - self_attn.k_proj: cuda:0 [Q,x_map] scales min/max/std: 0.001449/0.082703/0.005592
99
+ !! - self_attn.v_proj: cuda:0 [Q,x_map] scales min/max/std: 0.001673/0.015762/0.001489
100
+ !! - self_attn.o_proj: cuda:0 [Q,x_map] scales min/max/std: 0.001521/0.089478/0.001425
101
+ !! - cache device: cuda:0, seq_len: 0
102
+ !! Begin MLP
103
+ !! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -1.013672, max: 1.294922, std: 0.035309
104
+ !! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: -0.035889, max: 0.361328, std: 0.016113 eps: 0.00000100
105
+ !! - mlp.gate_proj: cuda:0 [Q,x_map] scales min/max/std: 0.002777/0.060303/0.000990
106
+ !! - mlp.up_proj: cuda:0 [Q,x_map] scales min/max/std: 0.002075/0.040131/0.000730
107
+ !! - mlp.down_proj: cuda:0 [Q,x_map] scales min/max/std: 0.003326/0.099487/0.001260
108
+ !! - method: normal
109
+ !! Begin decoder 1
110
+ !! Begin self-attention
111
+ !! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -9.375000, max: 35.843750, std: 0.119446
112
+ !! - attn_mask: device: cuda:0, shape: [1, 1, 1920, 1920], dtype: float16, min: -65504.000000, max: 0.000000, std: 32752.000000
113
+ !! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.012146, max: 0.326172, std: 0.022308 eps: 0.00000100
114
+ !! - self_attn.q_proj: cuda:0 [Q,x_map] scales min/max/std: 0.001299/0.042847/0.005116
115
+ !! - self_attn.k_proj: cuda:0 [Q,x_map] scales min/max/std: 0.001262/0.056030/0.005295
116
+ !! - self_attn.v_proj: cuda:0 [Q,x_map] scales min/max/std: 0.001407/0.011436/0.001119
117
+ !! - self_attn.o_proj: cuda:0 [Q,x_map] scales min/max/std: 0.001063/0.086609/0.001472
118
+ !! - cache device: cuda:0, seq_len: 0
119
+ !! Begin MLP
120
+ !! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -9.039062, max: 33.656250, std: 0.116211
121
+ !! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.003036, max: 0.166016, std: 0.010605 eps: 0.00000100
122
+ !! - mlp.gate_proj: cuda:0 [Q,x_map] scales min/max/std: 0.003960/0.075562/0.001144
123
+ !! - mlp.up_proj: cuda:0 [Q,x_map] scales min/max/std: 0.003387/0.035187/0.000851
124
+ !! - mlp.down_proj: cuda:0 [Q,x_map] scales min/max/std: 0.002762/0.120483/0.001154
125
+ !! - method: normal
126
+ !! Begin decoder 2
127
+ !! Begin self-attention
128
+ !! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -11.812500, max: 30.734375, std: 0.155029
129
+ !! - attn_mask: device: cuda:0, shape: [1, 1, 1920, 1920], dtype: float16, min: -65504.000000, max: 0.000000, std: 32752.000000
130
+ !! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.057617, max: 0.369141, std: 0.015396 eps: 0.00000100
131
+ !! - self_attn.q_proj: cuda:0 [Q,x_map] scales min/max/std: 0.002361/0.074585/0.003971
132
+ !! - self_attn.k_proj: cuda:0 [Q,x_map] scales min/max/std: 0.001963/0.050629/0.004532
133
+ !! - self_attn.v_proj: cuda:0 [Q,x_map] scales min/max/std: 0.002445/0.020309/0.000759
134
+ !! - self_attn.o_proj: cuda:0 [Q,x_map] scales min/max/std: 0.002083/0.110596/0.001124
135
+ !! - cache device: cuda:0, seq_len: 0
136
+ !! Begin MLP
137
+ !! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -11.648438, max: 26.859375, std: 0.158203
138
+ !! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.014099, max: 0.161133, std: 0.011726 eps: 0.00000100
139
+ !! - mlp.gate_proj: cuda:0 [Q,x_map] scales min/max/std: 0.002787/0.087097/0.001152
140
+ !! - mlp.up_proj: cuda:0 [Q,x_map] scales min/max/std: 0.003202/0.043213/0.000878
141
+ !! - mlp.down_proj: cuda:0 [Q,x_map] scales min/max/std: 0.002434/0.133301/0.001044
142
+ !! - method: normal
143
+ !! Begin decoder 3
144
+ !! Begin self-attention
145
+ !! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -890.500000, max: 24.171875, std: 0.338135
146
+ !! - attn_mask: device: cuda:0, shape: [1, 1, 1920, 1920], dtype: float16, min: -65504.000000, max: 0.000000, std: 32752.000000
147
+ !! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.033203, max: 0.445312, std: 0.016769 eps: 0.00000100
148
+ !! - self_attn.q_proj: cuda:0 [Q,x_map] scales min/max/std: 0.002218/0.064087/0.003193
149
+ !! - self_attn.k_proj: cuda:0 [Q,x_map] scales min/max/std: 0.001682/0.047546/0.003334
150
+ !! - self_attn.v_proj: cuda:0 [Q,x_map] scales min/max/std: 0.002258/0.013161/0.000889
151
+ !! - self_attn.o_proj: cuda:0 [Q,x_map] scales min/max/std: 0.001929/0.086182/0.001017
152
+ !! - cache device: cuda:0, seq_len: 0
153
+ !! Begin MLP
154
+ !! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -890.500000, max: 25.640625, std: 0.342529
155
+ !! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.020508, max: 0.185547, std: 0.012711 eps: 0.00000100
156
+ !! - mlp.gate_proj: cuda:0 [Q,x_map] scales min/max/std: 0.002598/0.055603/0.001158
157
+ !! - mlp.up_proj: cuda:0 [Q,x_map] scales min/max/std: 0.002819/0.043365/0.000893
158
+ !! - mlp.down_proj: cuda:0 [Q,x_map] scales min/max/std: 0.002668/0.083008/0.000952
159
+ !! - method: normal
160
+ !! Begin decoder 4
161
+ !! Begin self-attention
162
+ !! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -892.500000, max: 24.625000, std: 0.366211
163
+ !! - attn_mask: device: cuda:0, shape: [1, 1, 1920, 1920], dtype: float16, min: -65504.000000, max: 0.000000, std: 32752.000000
164
+ !! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.036621, max: 0.458984, std: 0.017136 eps: 0.00000100
165
+ !! - self_attn.q_proj: cuda:0 [Q,x_map] scales min/max/std: 0.002357/0.124084/0.003180
166
+ !! - self_attn.k_proj: cuda:0 [Q,x_map] scales min/max/std: 0.001328/0.042419/0.003229
167
+ !! - self_attn.v_proj: cuda:0 [Q,x_map] scales min/max/std: 0.002598/0.018280/0.000826
168
+ !! - self_attn.o_proj: cuda:0 [Q,x_map] scales min/max/std: 0.001725/0.085449/0.000918
169
+ !! - cache device: cuda:0, seq_len: 0
170
+ !! Begin MLP
171
+ !! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -892.500000, max: 28.000000, std: 0.385742
172
+ !! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.025391, max: 0.200195, std: 0.012398 eps: 0.00000100
173
+ !! - mlp.gate_proj: cuda:0 [Q,x_map] scales min/max/std: 0.003830/0.047241/0.001214
174
+ !! - mlp.up_proj: cuda:0 [Q,x_map] scales min/max/std: 0.003572/0.041473/0.000900
175
+ !! - mlp.down_proj: cuda:0 [Q,x_map] scales min/max/std: 0.002481/0.095337/0.000922
176
+ !! - method: normal
177
+ !! Begin decoder 5
178
+ !! Begin self-attention
179
+ !! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -893.000000, max: 25.609375, std: 0.400879
180
+ !! - attn_mask: device: cuda:0, shape: [1, 1, 1920, 1920], dtype: float16, min: -65504.000000, max: 0.000000, std: 32752.000000
181
+ !! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.033203, max: 0.492188, std: 0.019684 eps: 0.00000100
182
+ !! - self_attn.q_proj: cuda:0 [Q,x_map] scales min/max/std: 0.001987/0.102661/0.003073
183
+ !! - self_attn.k_proj: cuda:0 [Q,x_map] scales min/max/std: 0.001550/0.035492/0.003050
184
+ !! - self_attn.v_proj: cuda:0 [Q,x_map] scales min/max/std: 0.002256/0.016541/0.000906
185
+ !! - self_attn.o_proj: cuda:0 [Q,x_map] scales min/max/std: 0.002275/0.106079/0.001011
186
+ !! - cache device: cuda:0, seq_len: 0
187
+ !! Begin MLP
188
+ !! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -893.000000, max: 29.265625, std: 0.418213
189
+ !! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.042236, max: 0.211914, std: 0.011848 eps: 0.00000100
190
+ !! - mlp.gate_proj: cuda:0 [Q,x_map] scales min/max/std: 0.002773/0.047150/0.001265
191
+ !! - mlp.up_proj: cuda:0 [Q,x_map] scales min/max/std: 0.001515/0.041870/0.000920
192
+ !! - mlp.down_proj: cuda:0 [Q,x_map] scales min/max/std: 0.002594/0.062195/0.000935
193
+ !! - method: normal
194
+ !! Begin decoder 6
195
+ !! Begin self-attention
196
+ !! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -894.500000, max: 26.140625, std: 0.445312
197
+ !! - attn_mask: device: cuda:0, shape: [1, 1, 1920, 1920], dtype: float16, min: -65504.000000, max: 0.000000, std: 32752.000000
198
+ !! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.067871, max: 0.558594, std: 0.019913 eps: 0.00000100
199
+ !! - self_attn.q_proj: cuda:0 [Q,x_map] scales min/max/std: 0.002136/0.046173/0.003099
200
+ !! - self_attn.k_proj: cuda:0 [Q,x_map] scales min/max/std: 0.001863/0.033478/0.003153
201
+ !! - self_attn.v_proj: cuda:0 [Q,x_map] scales min/max/std: 0.002909/0.020889/0.000928
202
+ !! - self_attn.o_proj: cuda:0 [Q,x_map] scales min/max/std: 0.001761/0.096313/0.001001
203
+ !! - cache device: cuda:0, seq_len: 0
204
+ !! Begin MLP
205
+ !! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -895.000000, max: 25.453125, std: 0.462891
206
+ !! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.038574, max: 0.244141, std: 0.012810 eps: 0.00000100
207
+ !! - mlp.gate_proj: cuda:0 [Q,x_map] scales min/max/std: 0.003599/0.058990/0.001412
208
+ !! - mlp.up_proj: cuda:0 [Q,x_map] scales min/max/std: 0.003576/0.044037/0.000947
209
+ !! - mlp.down_proj: cuda:0 [Q,x_map] scales min/max/std: 0.002380/0.090454/0.001029
210
+ !! - method: normal
211
+ !! Begin decoder 7
212
+ !! Begin self-attention
213
+ !! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -1028.000000, max: 79.125000, std: 0.513672
214
+ !! - attn_mask: device: cuda:0, shape: [1, 1, 1920, 1920], dtype: float16, min: -65504.000000, max: 0.000000, std: 32752.000000
215
+ !! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: -0.010315, max: 0.609375, std: 0.018875 eps: 0.00000100
216
+ !! - self_attn.q_proj: cuda:0 [Q,x_map] scales min/max/std: 0.002357/0.038116/0.002750
217
+ !! - self_attn.k_proj: cuda:0 [Q,x_map] scales min/max/std: 0.002035/0.030289/0.002897
218
+ !! - self_attn.v_proj: cuda:0 [Q,x_map] scales min/max/std: 0.002699/0.013130/0.000939
219
+ !! - self_attn.o_proj: cuda:0 [Q,x_map] scales min/max/std: 0.001756/0.065430/0.000955
220
+ !! - cache device: cuda:0, seq_len: 0
221
+ !! Begin MLP
222
+ !! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -1028.000000, max: 79.312500, std: 0.554688
223
+ !! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.043701, max: 0.222656, std: 0.011360 eps: 0.00000100
224
+ !! - mlp.gate_proj: cuda:0 [Q,x_map] scales min/max/std: 0.003187/0.053528/0.001369
225
+ !! - mlp.up_proj: cuda:0 [Q,x_map] scales min/max/std: 0.003983/0.029083/0.000935
226
+ !! - mlp.down_proj: cuda:0 [Q,x_map] scales min/max/std: 0.002668/0.070984/0.000947
227
+ !! - method: normal
228
+ !! Begin decoder 8
229
+ !! Begin self-attention
230
+ !! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -1028.000000, max: 79.375000, std: 0.583008
231
+ !! - attn_mask: device: cuda:0, shape: [1, 1, 1920, 1920], dtype: float16, min: -65504.000000, max: 0.000000, std: 32752.000000
232
+ !! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.007812, max: 0.617188, std: 0.021469 eps: 0.00000100
233
+ !! - self_attn.q_proj: cuda:0 [Q,x_map] scales min/max/std: 0.002020/0.036896/0.003115
234
+ !! - self_attn.k_proj: cuda:0 [Q,x_map] scales min/max/std: 0.001634/0.027725/0.003042
235
+ !! - self_attn.v_proj: cuda:0 [Q,x_map] scales min/max/std: 0.003176/0.019165/0.000947
236
+ !! - self_attn.o_proj: cuda:0 [Q,x_map] scales min/max/std: 0.001910/0.084106/0.000935
237
+ !! - cache device: cuda:0, seq_len: 0
238
+ !! Begin MLP
239
+ !! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -1028.000000, max: 79.500000, std: 0.605469
240
+ !! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.033203, max: 0.228516, std: 0.012070 eps: 0.00000100
241
+ !! - mlp.gate_proj: cuda:0 [Q,x_map] scales min/max/std: 0.003246/0.053589/0.001263
242
+ !! - mlp.up_proj: cuda:0 [Q,x_map] scales min/max/std: 0.001094/0.036316/0.000944
243
+ !! - mlp.down_proj: cuda:0 [Q,x_map] scales min/max/std: 0.002659/0.075378/0.000929
244
+ !! - method: normal
245
+ !! Begin decoder 9
246
+ !! Begin self-attention
247
+ !! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -1028.000000, max: 79.687500, std: 0.612305
248
+ !! - attn_mask: device: cuda:0, shape: [1, 1, 1920, 1920], dtype: float16, min: -65504.000000, max: 0.000000, std: 32752.000000
249
+ !! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.003876, max: 0.664062, std: 0.020859 eps: 0.00000100
250
+ !! - self_attn.q_proj: cuda:0 [Q,x_map] scales min/max/std: 0.002146/0.038910/0.002712
251
+ !! - self_attn.k_proj: cuda:0 [Q,x_map] scales min/max/std: 0.001664/0.032074/0.002876
252
+ !! - self_attn.v_proj: cuda:0 [Q,x_map] scales min/max/std: 0.003122/0.015617/0.000871
253
+ !! - self_attn.o_proj: cuda:0 [Q,x_map] scales min/max/std: 0.001311/0.095337/0.000900
254
+ !! - cache device: cuda:0, seq_len: 0
255
+ !! Begin MLP
256
+ !! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -1028.000000, max: 79.750000, std: 0.625000
257
+ !! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.049805, max: 0.238281, std: 0.011787 eps: 0.00000100
258
+ !! - mlp.gate_proj: cuda:0 [Q,x_map] scales min/max/std: 0.003241/0.061310/0.001322
259
+ !! - mlp.up_proj: cuda:0 [Q,x_map] scales min/max/std: 0.003132/0.040771/0.000956
260
+ !! - mlp.down_proj: cuda:0 [Q,x_map] scales min/max/std: 0.002480/0.081299/0.000928
261
+ !! - method: normal
262
+ !! Begin decoder 10
263
+ !! Begin self-attention
264
+ !! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -1028.000000, max: 79.812500, std: 0.635742
265
+ !! - attn_mask: device: cuda:0, shape: [1, 1, 1920, 1920], dtype: float16, min: -65504.000000, max: 0.000000, std: 32752.000000
266
+ !! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.002594, max: 0.703125, std: 0.021515 eps: 0.00000100
267
+ !! - self_attn.q_proj: cuda:0 [Q,x_map] scales min/max/std: 0.002222/0.033997/0.002638
268
+ !! - self_attn.k_proj: cuda:0 [Q,x_map] scales min/max/std: 0.001856/0.029907/0.002831
269
+ !! - self_attn.v_proj: cuda:0 [Q,x_map] scales min/max/std: 0.003365/0.014862/0.000932
270
+ !! - self_attn.o_proj: cuda:0 [Q,x_map] scales min/max/std: 0.001518/0.084351/0.000958
271
+ !! - cache device: cuda:0, seq_len: 0
272
+ !! Begin MLP
273
+ !! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -1028.000000, max: 79.875000, std: 0.654785
274
+ !! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.053955, max: 0.245117, std: 0.011978 eps: 0.00000100
275
+ !! - mlp.gate_proj: cuda:0 [Q,x_map] scales min/max/std: 0.003246/0.042297/0.001295
276
+ !! - mlp.up_proj: cuda:0 [Q,x_map] scales min/max/std: 0.003368/0.040710/0.000970
277
+ !! - mlp.down_proj: cuda:0 [Q,x_map] scales min/max/std: 0.002800/0.089050/0.000934
278
+ !! - method: normal
279
+ !! Begin decoder 11
280
+ !! Begin self-attention
281
+ !! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -1028.000000, max: 79.937500, std: 0.669922
282
+ !! - attn_mask: device: cuda:0, shape: [1, 1, 1920, 1920], dtype: float16, min: -65504.000000, max: 0.000000, std: 32752.000000
283
+ !! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.007355, max: 0.687500, std: 0.021606 eps: 0.00000100
284
+ !! - self_attn.q_proj: cuda:0 [Q,x_map] scales min/max/std: 0.002106/0.034271/0.002579
285
+ !! - self_attn.k_proj: cuda:0 [Q,x_map] scales min/max/std: 0.002033/0.028885/0.002792
286
+ !! - self_attn.v_proj: cuda:0 [Q,x_map] scales min/max/std: 0.003374/0.014481/0.000937
287
+ !! - self_attn.o_proj: cuda:0 [Q,x_map] scales min/max/std: 0.001925/0.075500/0.000946
288
+ !! - cache device: cuda:0, seq_len: 0
289
+ !! Begin MLP
290
+ !! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -1028.000000, max: 80.000000, std: 0.694336
291
+ !! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.054443, max: 0.251953, std: 0.011749 eps: 0.00000100
292
+ !! - mlp.gate_proj: cuda:0 [Q,x_map] scales min/max/std: 0.003128/0.051086/0.001299
293
+ !! - mlp.up_proj: cuda:0 [Q,x_map] scales min/max/std: 0.001537/0.041565/0.000993
294
+ !! - mlp.down_proj: cuda:0 [Q,x_map] scales min/max/std: 0.003239/0.079163/0.000940
295
+ !! - method: normal
296
+ !! Begin decoder 12
297
+ !! Begin self-attention
298
+ !! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -1028.000000, max: 80.062500, std: 0.726074
299
+ !! - attn_mask: device: cuda:0, shape: [1, 1, 1920, 1920], dtype: float16, min: -65504.000000, max: 0.000000, std: 32752.000000
300
+ !! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.014771, max: 0.664062, std: 0.020920 eps: 0.00000100
301
+ !! - self_attn.q_proj: cuda:0 [Q,x_map] scales min/max/std: 0.002449/0.034271/0.002655
302
+ !! - self_attn.k_proj: cuda:0 [Q,x_map] scales min/max/std: 0.002136/0.032806/0.002867
303
+ !! - self_attn.v_proj: cuda:0 [Q,x_map] scales min/max/std: 0.003397/0.019394/0.000961
304
+ !! - self_attn.o_proj: cuda:0 [Q,x_map] scales min/max/std: 0.001609/0.057343/0.000999
305
+ !! - cache device: cuda:0, seq_len: 0
306
+ !! Begin MLP
307
+ !! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -1029.000000, max: 80.250000, std: 0.751953
308
+ !! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.056396, max: 0.249023, std: 0.012207 eps: 0.00000100
309
+ !! - mlp.gate_proj: cuda:0 [Q,x_map] scales min/max/std: 0.003019/0.043274/0.001330
310
+ !! - mlp.up_proj: cuda:0 [Q,x_map] scales min/max/std: 0.002712/0.043762/0.001000
311
+ !! - mlp.down_proj: cuda:0 [Q,x_map] scales min/max/std: 0.003359/0.118286/0.000953
312
+ !! - method: normal
313
+ !! Begin decoder 13
314
+ !! Begin self-attention
315
+ !! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -1029.000000, max: 80.312500, std: 0.787598
316
+ !! - attn_mask: device: cuda:0, shape: [1, 1, 1920, 1920], dtype: float16, min: -65504.000000, max: 0.000000, std: 32752.000000
317
+ !! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.031982, max: 0.687500, std: 0.021698 eps: 0.00000100
318
+ !! - self_attn.q_proj: cuda:0 [Q,x_map] scales min/max/std: 0.002420/0.034241/0.002577
319
+ !! - self_attn.k_proj: cuda:0 [Q,x_map] scales min/max/std: 0.002388/0.034241/0.002741
320
+ !! - self_attn.v_proj: cuda:0 [Q,x_map] scales min/max/std: 0.003078/0.015854/0.000962
321
+ !! - self_attn.o_proj: cuda:0 [Q,x_map] scales min/max/std: 0.002022/0.078918/0.000970
322
+ !! - cache device: cuda:0, seq_len: 0
323
+ !! Begin MLP
324
+ !! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -1029.000000, max: 80.500000, std: 0.809570
325
+ !! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.051025, max: 0.265625, std: 0.012978 eps: 0.00000100
326
+ !! - mlp.gate_proj: cuda:0 [Q,x_map] scales min/max/std: 0.003170/0.036652/0.001327
327
+ !! - mlp.up_proj: cuda:0 [Q,x_map] scales min/max/std: 0.004108/0.028717/0.000996
328
+ !! - mlp.down_proj: cuda:0 [Q,x_map] scales min/max/std: 0.002531/0.052429/0.000926
329
+ !! - method: normal
330
+ !! Begin decoder 14
331
+ !! Begin self-attention
332
+ !! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -1029.000000, max: 80.562500, std: 0.849121
333
+ !! - attn_mask: device: cuda:0, shape: [1, 1, 1920, 1920], dtype: float16, min: -65504.000000, max: 0.000000, std: 32752.000000
334
+ !! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.025879, max: 0.691406, std: 0.021164 eps: 0.00000100
335
+ !! - self_attn.q_proj: cuda:0 [Q,x_map] scales min/max/std: 0.002115/0.035156/0.002348
336
+ !! - self_attn.k_proj: cuda:0 [Q,x_map] scales min/max/std: 0.001471/0.031067/0.002569
337
+ !! - self_attn.v_proj: cuda:0 [Q,x_map] scales min/max/std: 0.003618/0.020035/0.000957
338
+ !! - self_attn.o_proj: cuda:0 [Q,x_map] scales min/max/std: 0.001540/0.086060/0.000992
339
+ !! - cache device: cuda:0, seq_len: 0
340
+ !! Begin MLP
341
+ !! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -1029.000000, max: 80.687500, std: 0.866699
342
+ !! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.055420, max: 0.273438, std: 0.013245 eps: 0.00000100
343
+ !! - mlp.gate_proj: cuda:0 [Q,x_map] scales min/max/std: 0.003336/0.032928/0.001335
344
+ !! - mlp.up_proj: cuda:0 [Q,x_map] scales min/max/std: 0.003906/0.045197/0.000993
345
+ !! - mlp.down_proj: cuda:0 [Q,x_map] scales min/max/std: 0.002605/0.088013/0.000936
346
+ !! - method: normal
347
+ !! Begin decoder 15
348
+ !! Begin self-attention
349
+ !! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -1029.000000, max: 80.687500, std: 0.916016
350
+ !! - attn_mask: device: cuda:0, shape: [1, 1, 1920, 1920], dtype: float16, min: -65504.000000, max: 0.000000, std: 32752.000000
351
+ !! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.031494, max: 0.679688, std: 0.020615 eps: 0.00000100
352
+ !! - self_attn.q_proj: cuda:0 [Q,x_map] scales min/max/std: 0.002296/0.038727/0.002529
353
+ !! - self_attn.k_proj: cuda:0 [Q,x_map] scales min/max/std: 0.002375/0.030533/0.002689
354
+ !! - self_attn.v_proj: cuda:0 [Q,x_map] scales min/max/std: 0.003328/0.015869/0.000980
355
+ !! - self_attn.o_proj: cuda:0 [Q,x_map] scales min/max/std: 0.001546/0.124634/0.001021
356
+ !! - cache device: cuda:0, seq_len: 0
357
+ !! Begin MLP
358
+ !! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -1029.000000, max: 80.750000, std: 0.945801
359
+ !! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.040039, max: 0.291016, std: 0.014809 eps: 0.00000100
360
+ !! - mlp.gate_proj: cuda:0 [Q,x_map] scales min/max/std: 0.003687/0.051025/0.001274
361
+ !! - mlp.up_proj: cuda:0 [Q,x_map] scales min/max/std: 0.004307/0.041656/0.000965
362
+ !! - mlp.down_proj: cuda:0 [Q,x_map] scales min/max/std: 0.002167/0.078613/0.000919
363
+ !! - method: normal
364
+ !! Begin decoder 16
365
+ !! Begin self-attention
366
+ !! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -1029.000000, max: 80.750000, std: 0.993164
367
+ !! - attn_mask: device: cuda:0, shape: [1, 1, 1920, 1920], dtype: float16, min: -65504.000000, max: 0.000000, std: 32752.000000
368
+ !! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.012573, max: 0.652344, std: 0.020477 eps: 0.00000100
369
+ !! - self_attn.q_proj: cuda:0 [Q,x_map] scales min/max/std: 0.002371/0.034912/0.002207
370
+ !! - self_attn.k_proj: cuda:0 [Q,x_map] scales min/max/std: 0.001926/0.029617/0.002392
371
+ !! - self_attn.v_proj: cuda:0 [Q,x_map] scales min/max/std: 0.003460/0.018524/0.000947
372
+ !! - self_attn.o_proj: cuda:0 [Q,x_map] scales min/max/std: 0.001738/0.051270/0.000971
373
+ !! - cache device: cuda:0, seq_len: 0
374
+ !! Begin MLP
375
+ !! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -1029.000000, max: 80.812500, std: 1.004883
376
+ !! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.045898, max: 0.298828, std: 0.015106 eps: 0.00000100
377
+ !! - mlp.gate_proj: cuda:0 [Q,x_map] scales min/max/std: 0.003387/0.036011/0.001249
378
+ !! - mlp.up_proj: cuda:0 [Q,x_map] scales min/max/std: 0.003696/0.035187/0.000964
379
+ !! - mlp.down_proj: cuda:0 [Q,x_map] scales min/max/std: 0.002268/0.065063/0.000917
380
+ !! - method: normal
381
+ !! Begin decoder 17
382
+ !! Begin self-attention
383
+ !! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -1029.000000, max: 80.812500, std: 1.059570
384
+ !! - attn_mask: device: cuda:0, shape: [1, 1, 1920, 1920], dtype: float16, min: -65504.000000, max: 0.000000, std: 32752.000000
385
+ !! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.025146, max: 0.722656, std: 0.021576 eps: 0.00000100
386
+ !! - self_attn.q_proj: cuda:0 [Q,x_map] scales min/max/std: 0.002331/0.036224/0.002277
387
+ !! - self_attn.k_proj: cuda:0 [Q,x_map] scales min/max/std: 0.001755/0.030884/0.002550
388
+ !! - self_attn.v_proj: cuda:0 [Q,x_map] scales min/max/std: 0.003754/0.020874/0.000970
389
+ !! - self_attn.o_proj: cuda:0 [Q,x_map] scales min/max/std: 0.001672/0.116455/0.001009
390
+ !! - cache device: cuda:0, seq_len: 0
391
+ !! Begin MLP
392
+ !! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -1029.000000, max: 80.937500, std: 1.098633
393
+ !! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.042969, max: 0.310547, std: 0.015625 eps: 0.00000100
394
+ !! - mlp.gate_proj: cuda:0 [Q,x_map] scales min/max/std: 0.003586/0.035492/0.001222
395
+ !! - mlp.up_proj: cuda:0 [Q,x_map] scales min/max/std: 0.004265/0.044525/0.000955
396
+ !! - mlp.down_proj: cuda:0 [Q,x_map] scales min/max/std: 0.002222/0.067993/0.000917
397
+ !! - method: normal
398
+ !! Begin decoder 18
399
+ !! Begin self-attention
400
+ !! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -1029.000000, max: 80.875000, std: 1.152344
401
+ !! - attn_mask: device: cuda:0, shape: [1, 1, 1920, 1920], dtype: float16, min: -65504.000000, max: 0.000000, std: 32752.000000
402
+ !! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.029907, max: 0.738281, std: 0.022064 eps: 0.00000100
403
+ !! - self_attn.q_proj: cuda:0 [Q,x_map] scales min/max/std: 0.002323/0.033447/0.002235
404
+ !! - self_attn.k_proj: cuda:0 [Q,x_map] scales min/max/std: 0.001904/0.030121/0.002382
405
+ !! - self_attn.v_proj: cuda:0 [Q,x_map] scales min/max/std: 0.004002/0.014252/0.000932
406
+ !! - self_attn.o_proj: cuda:0 [Q,x_map] scales min/max/std: 0.001740/0.083801/0.000958
407
+ !! - cache device: cuda:0, seq_len: 0
408
+ !! Begin MLP
409
+ !! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -1029.000000, max: 80.937500, std: 1.186523
410
+ !! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.048584, max: 0.318359, std: 0.015625 eps: 0.00000100
411
+ !! - mlp.gate_proj: cuda:0 [Q,x_map] scales min/max/std: 0.003035/0.034271/0.001252
412
+ !! - mlp.up_proj: cuda:0 [Q,x_map] scales min/max/std: 0.003998/0.045654/0.000957
413
+ !! - mlp.down_proj: cuda:0 [Q,x_map] scales min/max/std: 0.002491/0.084534/0.000911
414
+ !! - method: normal
415
+ !! Begin decoder 19
416
+ !! Begin self-attention
417
+ !! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -1029.000000, max: 80.937500, std: 1.258789
418
+ !! - attn_mask: device: cuda:0, shape: [1, 1, 1920, 1920], dtype: float16, min: -65504.000000, max: 0.000000, std: 32752.000000
419
+ !! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.024170, max: 0.753906, std: 0.022308 eps: 0.00000100
420
+ !! - self_attn.q_proj: cuda:0 [Q,x_map] scales min/max/std: 0.002134/0.031494/0.002193
421
+ !! - self_attn.k_proj: cuda:0 [Q,x_map] scales min/max/std: 0.001934/0.030380/0.002371
422
+ !! - self_attn.v_proj: cuda:0 [Q,x_map] scales min/max/std: 0.003841/0.015404/0.000981
423
+ !! - self_attn.o_proj: cuda:0 [Q,x_map] scales min/max/std: 0.001974/0.084167/0.001057
424
+ !! - cache device: cuda:0, seq_len: 0
425
+ !! Begin MLP
426
+ !! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -1029.000000, max: 80.937500, std: 1.287109
427
+ !! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.033936, max: 0.347656, std: 0.016785 eps: 0.00000100
428
+ !! - mlp.gate_proj: cuda:0 [Q,x_map] scales min/max/std: 0.003767/0.040405/0.001213
429
+ !! - mlp.up_proj: cuda:0 [Q,x_map] scales min/max/std: 0.004185/0.043823/0.000943
430
+ !! - mlp.down_proj: cuda:0 [Q,x_map] scales min/max/std: 0.002474/0.062683/0.000900
431
+ !! - method: normal
432
+ !! Begin decoder 20
433
+ !! Begin self-attention
434
+ !! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -1029.000000, max: 80.875000, std: 1.358398
435
+ !! - attn_mask: device: cuda:0, shape: [1, 1, 1920, 1920], dtype: float16, min: -65504.000000, max: 0.000000, std: 32752.000000
436
+ !! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.037354, max: 0.757812, std: 0.022324 eps: 0.00000100
437
+ !! - self_attn.q_proj: cuda:0 [Q,x_map] scales min/max/std: 0.002235/0.035187/0.002100
438
+ !! - self_attn.k_proj: cuda:0 [Q,x_map] scales min/max/std: 0.002291/0.032471/0.002190
439
+ !! - self_attn.v_proj: cuda:0 [Q,x_map] scales min/max/std: 0.003658/0.014191/0.001044
440
+ !! - self_attn.o_proj: cuda:0 [Q,x_map] scales min/max/std: 0.001817/0.078064/0.001065
441
+ !! - cache device: cuda:0, seq_len: 0
442
+ !! Begin MLP
443
+ !! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -1029.000000, max: 80.875000, std: 1.393555
444
+ !! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.048096, max: 0.345703, std: 0.016815 eps: 0.00000100
445
+ !! - mlp.gate_proj: cuda:0 [Q,x_map] scales min/max/std: 0.003643/0.044281/0.001211
446
+ !! - mlp.up_proj: cuda:0 [Q,x_map] scales min/max/std: 0.004276/0.048615/0.000933
447
+ !! - mlp.down_proj: cuda:0 [Q,x_map] scales min/max/std: 0.002605/0.067444/0.000911
448
+ !! - method: normal
449
+ !! Begin decoder 21
450
+ !! Begin self-attention
451
+ !! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -1029.000000, max: 80.875000, std: 1.483398
452
+ !! - attn_mask: device: cuda:0, shape: [1, 1, 1920, 1920], dtype: float16, min: -65504.000000, max: 0.000000, std: 32752.000000
453
+ !! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.037598, max: 0.796875, std: 0.023514 eps: 0.00000100
454
+ !! - self_attn.q_proj: cuda:0 [Q,x_map] scales min/max/std: 0.002506/0.043945/0.002247
455
+ !! - self_attn.k_proj: cuda:0 [Q,x_map] scales min/max/std: 0.002028/0.031616/0.002365
456
+ !! - self_attn.v_proj: cuda:0 [Q,x_map] scales min/max/std: 0.004189/0.014427/0.001028
457
+ !! - self_attn.o_proj: cuda:0 [Q,x_map] scales min/max/std: 0.001978/0.039856/0.001017
458
+ !! - cache device: cuda:0, seq_len: 0
459
+ !! Begin MLP
460
+ !! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -1029.000000, max: 80.875000, std: 1.525391
461
+ !! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.044922, max: 0.347656, std: 0.017212 eps: 0.00000100
462
+ !! - mlp.gate_proj: cuda:0 [Q,x_map] scales min/max/std: 0.003614/0.052155/0.001178
463
+ !! - mlp.up_proj: cuda:0 [Q,x_map] scales min/max/std: 0.004387/0.032867/0.000925
464
+ !! - mlp.down_proj: cuda:0 [Q,x_map] scales min/max/std: 0.002708/0.063232/0.000911
465
+ !! - method: normal
466
+ !! Begin decoder 22
467
+ !! Begin self-attention
468
+ !! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -1029.000000, max: 80.875000, std: 1.616211
469
+ !! - attn_mask: device: cuda:0, shape: [1, 1, 1920, 1920], dtype: float16, min: -65504.000000, max: 0.000000, std: 32752.000000
470
+ !! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.037354, max: 0.753906, std: 0.022934 eps: 0.00000100
471
+ !! - self_attn.q_proj: cuda:0 [Q,x_map] scales min/max/std: 0.002468/0.036316/0.002068
472
+ !! - self_attn.k_proj: cuda:0 [Q,x_map] scales min/max/std: 0.002302/0.030502/0.002201
473
+ !! - self_attn.v_proj: cuda:0 [Q,x_map] scales min/max/std: 0.003658/0.014572/0.000998
474
+ !! - self_attn.o_proj: cuda:0 [Q,x_map] scales min/max/std: 0.002260/0.096069/0.001020
475
+ !! - cache device: cuda:0, seq_len: 0
476
+ !! Begin MLP
477
+ !! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -1029.000000, max: 80.875000, std: 1.678711
478
+ !! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.045654, max: 0.361328, std: 0.018143 eps: 0.00000100
479
+ !! - mlp.gate_proj: cuda:0 [Q,x_map] scales min/max/std: 0.003679/0.035217/0.001136
480
+ !! - mlp.up_proj: cuda:0 [Q,x_map] scales min/max/std: 0.004360/0.036133/0.000911
481
+ !! - mlp.down_proj: cuda:0 [Q,x_map] scales min/max/std: 0.002403/0.078796/0.000916
482
+ !! - method: normal
483
+ !! Begin decoder 23
484
+ !! Begin self-attention
485
+ !! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -1028.000000, max: 80.875000, std: 1.774414
486
+ !! - attn_mask: device: cuda:0, shape: [1, 1, 1920, 1920], dtype: float16, min: -65504.000000, max: 0.000000, std: 32752.000000
487
+ !! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.033691, max: 0.792969, std: 0.024429 eps: 0.00000100
488
+ !! - self_attn.q_proj: cuda:0 [Q,x_map] scales min/max/std: 0.002359/0.034546/0.002054
489
+ !! - self_attn.k_proj: cuda:0 [Q,x_map] scales min/max/std: 0.002043/0.033936/0.002104
490
+ !! - self_attn.v_proj: cuda:0 [Q,x_map] scales min/max/std: 0.004379/0.013702/0.000979
491
+ !! - self_attn.o_proj: cuda:0 [Q,x_map] scales min/max/std: 0.001885/0.075256/0.000995
492
+ !! - cache device: cuda:0, seq_len: 0
493
+ !! Begin MLP
494
+ !! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -1028.000000, max: 80.812500, std: 1.833008
495
+ !! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.031982, max: 0.367188, std: 0.019226 eps: 0.00000100
496
+ !! - mlp.gate_proj: cuda:0 [Q,x_map] scales min/max/std: 0.003729/0.050964/0.001107
497
+ !! - mlp.up_proj: cuda:0 [Q,x_map] scales min/max/std: 0.004387/0.036224/0.000899
498
+ !! - mlp.down_proj: cuda:0 [Q,x_map] scales min/max/std: 0.002159/0.082642/0.000899
499
+ !! - method: normal
500
+ !! Begin decoder 24
501
+ !! Begin self-attention
502
+ !! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -1027.000000, max: 80.937500, std: 1.931641
503
+ !! - attn_mask: device: cuda:0, shape: [1, 1, 1920, 1920], dtype: float16, min: -65504.000000, max: 0.000000, std: 32752.000000
504
+ !! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.051758, max: 0.812500, std: 0.025452 eps: 0.00000100
505
+ !! - self_attn.q_proj: cuda:0 [Q,x_map] scales min/max/std: 0.002163/0.037628/0.002060
506
+ !! - self_attn.k_proj: cuda:0 [Q,x_map] scales min/max/std: 0.002029/0.031433/0.002123
507
+ !! - self_attn.v_proj: cuda:0 [Q,x_map] scales min/max/std: 0.003849/0.016617/0.000987
508
+ !! - self_attn.o_proj: cuda:0 [Q,x_map] scales min/max/std: 0.001784/0.109741/0.001011
509
+ !! - cache device: cuda:0, seq_len: 0
510
+ !! Begin MLP
511
+ !! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -1027.000000, max: 82.437500, std: 1.982422
512
+ !! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.029419, max: 0.382812, std: 0.020203 eps: 0.00000100
513
+ !! - mlp.gate_proj: cuda:0 [Q,x_map] scales min/max/std: 0.003664/0.039459/0.001067
514
+ !! - mlp.up_proj: cuda:0 [Q,x_map] scales min/max/std: 0.004559/0.033142/0.000891
515
+ !! - mlp.down_proj: cuda:0 [Q,x_map] scales min/max/std: 0.002037/0.088379/0.000898
516
+ !! - method: normal
517
+ !! Begin decoder 25
518
+ !! Begin self-attention
519
+ !! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -1027.000000, max: 85.312500, std: 2.062500
520
+ !! - attn_mask: device: cuda:0, shape: [1, 1, 1920, 1920], dtype: float16, min: -65504.000000, max: 0.000000, std: 32752.000000
521
+ !! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.043213, max: 0.816406, std: 0.024796 eps: 0.00000100
522
+ !! - self_attn.q_proj: cuda:0 [Q,x_map] scales min/max/std: 0.001895/0.034515/0.002041
523
+ !! - self_attn.k_proj: cuda:0 [Q,x_map] scales min/max/std: 0.001381/0.040314/0.002146
524
+ !! - self_attn.v_proj: cuda:0 [Q,x_map] scales min/max/std: 0.003727/0.015511/0.001091
525
+ !! - self_attn.o_proj: cuda:0 [Q,x_map] scales min/max/std: 0.002243/0.103149/0.001124
526
+ !! - cache device: cuda:0, seq_len: 0
527
+ !! Begin MLP
528
+ !! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -1027.000000, max: 93.312500, std: 2.140625
529
+ !! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.029663, max: 0.404297, std: 0.020950 eps: 0.00000100
530
+ !! - mlp.gate_proj: cuda:0 [Q,x_map] scales min/max/std: 0.003717/0.032501/0.001052
531
+ !! - mlp.up_proj: cuda:0 [Q,x_map] scales min/max/std: 0.004433/0.026627/0.000883
532
+ !! - mlp.down_proj: cuda:0 [Q,x_map] scales min/max/std: 0.002089/0.068298/0.000892
533
+ !! - method: normal
534
+ !! Begin decoder 26
535
+ !! Begin self-attention
536
+ !! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -1026.000000, max: 98.375000, std: 2.222656
537
+ !! - attn_mask: device: cuda:0, shape: [1, 1, 1920, 1920], dtype: float16, min: -65504.000000, max: 0.000000, std: 32752.000000
538
+ !! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.038818, max: 0.875000, std: 0.026947 eps: 0.00000100
539
+ !! - self_attn.q_proj: cuda:0 [Q,x_map] scales min/max/std: 0.002312/0.030716/0.001928
540
+ !! - self_attn.k_proj: cuda:0 [Q,x_map] scales min/max/std: 0.002153/0.033234/0.002005
541
+ !! - self_attn.v_proj: cuda:0 [Q,x_map] scales min/max/std: 0.004166/0.014450/0.000995
542
+ !! - self_attn.o_proj: cuda:0 [Q,x_map] scales min/max/std: 0.002365/0.091187/0.001030
543
+ !! - cache device: cuda:0, seq_len: 0
544
+ !! Begin MLP
545
+ !! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -1026.000000, max: 103.250000, std: 2.253906
546
+ !! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.030518, max: 0.400391, std: 0.021332 eps: 0.00000100
547
+ !! - mlp.gate_proj: cuda:0 [Q,x_map] scales min/max/std: 0.004192/0.032410/0.001042
548
+ !! - mlp.up_proj: cuda:0 [Q,x_map] scales min/max/std: 0.004314/0.036591/0.000883
549
+ !! - mlp.down_proj: cuda:0 [Q,x_map] scales min/max/std: 0.002001/0.074585/0.000899
550
+ !! - method: normal
551
+ !! Begin decoder 27
552
+ !! Begin self-attention
553
+ !! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -1025.000000, max: 106.812500, std: 2.332031
554
+ !! - attn_mask: device: cuda:0, shape: [1, 1, 1920, 1920], dtype: float16, min: -65504.000000, max: 0.000000, std: 32752.000000
555
+ !! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.044922, max: 0.906250, std: 0.027390 eps: 0.00000100
556
+ !! - self_attn.q_proj: cuda:0 [Q,x_map] scales min/max/std: 0.002163/0.037323/0.002039
557
+ !! - self_attn.k_proj: cuda:0 [Q,x_map] scales min/max/std: 0.002100/0.032104/0.002142
558
+ !! - self_attn.v_proj: cuda:0 [Q,x_map] scales min/max/std: 0.004280/0.019775/0.000985
559
+ !! - self_attn.o_proj: cuda:0 [Q,x_map] scales min/max/std: 0.002172/0.070496/0.001004
560
+ !! - cache device: cuda:0, seq_len: 0
561
+ !! Begin MLP
562
+ !! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -1025.000000, max: 113.375000, std: 2.388672
563
+ !! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.034180, max: 0.406250, std: 0.021439 eps: 0.00000100
564
+ !! - mlp.gate_proj: cuda:0 [Q,x_map] scales min/max/std: 0.004284/0.040131/0.001047
565
+ !! - mlp.up_proj: cuda:0 [Q,x_map] scales min/max/std: 0.004375/0.046295/0.000883
566
+ !! - mlp.down_proj: cuda:0 [Q,x_map] scales min/max/std: 0.002033/0.049622/0.000891
567
+ !! - method: normal
568
+ !! Begin decoder 28
569
+ !! Begin self-attention
570
+ !! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -1024.000000, max: 116.187500, std: 2.458984
571
+ !! - attn_mask: device: cuda:0, shape: [1, 1, 1920, 1920], dtype: float16, min: -65504.000000, max: 0.000000, std: 32752.000000
572
+ !! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.038818, max: 0.937500, std: 0.027420 eps: 0.00000100
573
+ !! - self_attn.q_proj: cuda:0 [Q,x_map] scales min/max/std: 0.002270/0.045990/0.002008
574
+ !! - self_attn.k_proj: cuda:0 [Q,x_map] scales min/max/std: 0.002068/0.035706/0.002039
575
+ !! - self_attn.v_proj: cuda:0 [Q,x_map] scales min/max/std: 0.003502/0.013725/0.001108
576
+ !! - self_attn.o_proj: cuda:0 [Q,x_map] scales min/max/std: 0.002235/0.154175/0.001218
577
+ !! - cache device: cuda:0, seq_len: 0
578
+ !! Begin MLP
579
+ !! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -1024.000000, max: 128.750000, std: 2.568359
580
+ !! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.022705, max: 0.423828, std: 0.022003 eps: 0.00000100
581
+ !! - mlp.gate_proj: cuda:0 [Q,x_map] scales min/max/std: 0.004471/0.042694/0.001054
582
+ !! - mlp.up_proj: cuda:0 [Q,x_map] scales min/max/std: 0.004562/0.022446/0.000878
583
+ !! - mlp.down_proj: cuda:0 [Q,x_map] scales min/max/std: 0.001733/0.056427/0.000884
584
+ !! - method: normal
585
+ !! Begin decoder 29
586
+ !! Begin self-attention
587
+ !! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -1023.000000, max: 131.500000, std: 2.623047
588
+ !! - attn_mask: device: cuda:0, shape: [1, 1, 1920, 1920], dtype: float16, min: -65504.000000, max: 0.000000, std: 32752.000000
589
+ !! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.003403, max: 0.957031, std: 0.027893 eps: 0.00000100
590
+ !! - self_attn.q_proj: cuda:0 [Q,x_map] scales min/max/std: 0.002245/0.032928/0.001910
591
+ !! - self_attn.k_proj: cuda:0 [Q,x_map] scales min/max/std: 0.002039/0.030350/0.001957
592
+ !! - self_attn.v_proj: cuda:0 [Q,x_map] scales min/max/std: 0.004120/0.014153/0.001067
593
+ !! - self_attn.o_proj: cuda:0 [Q,x_map] scales min/max/std: 0.002306/0.074097/0.001082
594
+ !! - cache device: cuda:0, seq_len: 0
595
+ !! Begin MLP
596
+ !! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -1023.000000, max: 135.375000, std: 2.656250
597
+ !! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.028442, max: 0.691406, std: 0.022568 eps: 0.00000100
598
+ !! - mlp.gate_proj: cuda:0 [Q,x_map] scales min/max/std: 0.004440/0.035675/0.001063
599
+ !! - mlp.up_proj: cuda:0 [Q,x_map] scales min/max/std: 0.004391/0.031128/0.000879
600
+ !! - mlp.down_proj: cuda:0 [Q,x_map] scales min/max/std: 0.001850/0.075684/0.000896
601
+ !! - method: normal
602
+ !! Begin decoder 30
603
+ !! Begin self-attention
604
+ !! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -1022.000000, max: 138.750000, std: 2.707031
605
+ !! - attn_mask: device: cuda:0, shape: [1, 1, 1920, 1920], dtype: float16, min: -65504.000000, max: 0.000000, std: 32752.000000
606
+ !! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.039062, max: 0.953125, std: 0.028458 eps: 0.00000100
607
+ !! - self_attn.q_proj: cuda:0 [Q,x_map] scales min/max/std: 0.002247/0.030197/0.001984
608
+ !! - self_attn.k_proj: cuda:0 [Q,x_map] scales min/max/std: 0.002272/0.032532/0.002090
609
+ !! - self_attn.v_proj: cuda:0 [Q,x_map] scales min/max/std: 0.002539/0.015915/0.001025
610
+ !! - self_attn.o_proj: cuda:0 [Q,x_map] scales min/max/std: 0.002310/0.092224/0.001046
611
+ !! - cache device: cuda:0, seq_len: 0
612
+ !! Begin MLP
613
+ !! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -1022.000000, max: 145.125000, std: 2.757812
614
+ !! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.013855, max: 0.443359, std: 0.021713 eps: 0.00000100
615
+ !! - mlp.gate_proj: cuda:0 [Q,x_map] scales min/max/std: 0.004665/0.045197/0.001092
616
+ !! - mlp.up_proj: cuda:0 [Q,x_map] scales min/max/std: 0.004078/0.036926/0.000885
617
+ !! - mlp.down_proj: cuda:0 [Q,x_map] scales min/max/std: 0.001813/0.072693/0.000899
618
+ !! - method: normal
619
+ !! Begin decoder 31
620
+ !! Begin self-attention
621
+ !! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -1020.500000, max: 151.500000, std: 2.837891
622
+ !! - attn_mask: device: cuda:0, shape: [1, 1, 1920, 1920], dtype: float16, min: -65504.000000, max: 0.000000, std: 32752.000000
623
+ !! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: -0.005127, max: 0.949219, std: 0.028824 eps: 0.00000100
624
+ !! - self_attn.q_proj: cuda:0 [Q,x_map] scales min/max/std: 0.002350/0.031052/0.001871
625
+ !! - self_attn.k_proj: cuda:0 [Q,x_map] scales min/max/std: 0.002193/0.030899/0.001905
626
+ !! - self_attn.v_proj: cuda:0 [Q,x_map] scales min/max/std: 0.004337/0.015503/0.001026
627
+ !! - self_attn.o_proj: cuda:0 [Q,x_map] scales min/max/std: 0.002642/0.092957/0.001069
628
+ !! - cache device: cuda:0, seq_len: 0
629
+ !! Begin MLP
630
+ !! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -1020.500000, max: 163.125000, std: 2.910156
631
+ !! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.015198, max: 0.449219, std: 0.022018 eps: 0.00000100
632
+ !! - mlp.gate_proj: cuda:0 [Q,x_map] scales min/max/std: 0.004639/0.031525/0.001118
633
+ !! - mlp.up_proj: cuda:0 [Q,x_map] scales min/max/std: 0.004658/0.035858/0.000885
634
+ !! - mlp.down_proj: cuda:0 [Q,x_map] scales min/max/std: 0.001880/0.045258/0.000892
635
+ !! - method: normal
636
+ !! Begin decoder 32
637
+ !! Begin self-attention
638
+ !! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -1019.000000, max: 165.250000, std: 2.960938
639
+ !! - attn_mask: device: cuda:0, shape: [1, 1, 1920, 1920], dtype: float16, min: -65504.000000, max: 0.000000, std: 32752.000000
640
+ !! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.002731, max: 0.898438, std: 0.028946 eps: 0.00000100
641
+ !! - self_attn.q_proj: cuda:0 [Q,x_map] scales min/max/std: 0.002439/0.031342/0.001923
642
+ !! - self_attn.k_proj: cuda:0 [Q,x_map] scales min/max/std: 0.001745/0.039093/0.001959
643
+ !! - self_attn.v_proj: cuda:0 [Q,x_map] scales min/max/std: 0.003937/0.014107/0.001027
644
+ !! - self_attn.o_proj: cuda:0 [Q,x_map] scales min/max/std: 0.002518/0.113953/0.001073
645
+ !! - cache device: cuda:0, seq_len: 0
646
+ !! Begin MLP
647
+ !! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -1019.000000, max: 170.125000, std: 3.003906
648
+ !! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.003967, max: 0.746094, std: 0.022736 eps: 0.00000100
649
+ !! - mlp.gate_proj: cuda:0 [Q,x_map] scales min/max/std: 0.004223/0.046234/0.001122
650
+ !! - mlp.up_proj: cuda:0 [Q,x_map] scales min/max/std: 0.004738/0.031342/0.000886
651
+ !! - mlp.down_proj: cuda:0 [Q,x_map] scales min/max/std: 0.001719/0.055420/0.000911
652
+ !! - method: normal
653
+ !! Begin decoder 33
654
+ !! Begin self-attention
655
+ !! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -1016.500000, max: 172.750000, std: 3.056641
656
+ !! - attn_mask: device: cuda:0, shape: [1, 1, 1920, 1920], dtype: float16, min: -65504.000000, max: 0.000000, std: 32752.000000
657
+ !! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.033203, max: 0.910156, std: 0.029999 eps: 0.00000100
658
+ !! - self_attn.q_proj: cuda:0 [Q,x_map] scales min/max/std: 0.002224/0.034576/0.001955
659
+ !! - self_attn.k_proj: cuda:0 [Q,x_map] scales min/max/std: 0.002178/0.034698/0.001965
660
+ !! - self_attn.v_proj: cuda:0 [Q,x_map] scales min/max/std: 0.003191/0.017090/0.001073
661
+ !! - self_attn.o_proj: cuda:0 [Q,x_map] scales min/max/std: 0.002516/0.098511/0.001093
662
+ !! - cache device: cuda:0, seq_len: 0
663
+ !! Begin MLP
664
+ !! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -1015.500000, max: 177.375000, std: 3.095703
665
+ !! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.021729, max: 0.457031, std: 0.021973 eps: 0.00000100
666
+ !! - mlp.gate_proj: cuda:0 [Q,x_map] scales min/max/std: 0.004440/0.058960/0.001143
667
+ !! - mlp.up_proj: cuda:0 [Q,x_map] scales min/max/std: 0.004120/0.027802/0.000899
668
+ !! - mlp.down_proj: cuda:0 [Q,x_map] scales min/max/std: 0.001822/0.089966/0.000950
669
+ !! - method: normal
670
+ !! Begin decoder 34
671
+ !! Begin self-attention
672
+ !! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -1012.000000, max: 178.875000, std: 3.134766
673
+ !! - attn_mask: device: cuda:0, shape: [1, 1, 1920, 1920], dtype: float16, min: -65504.000000, max: 0.000000, std: 32752.000000
674
+ !! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.038086, max: 0.953125, std: 0.030441 eps: 0.00000100
675
+ !! - self_attn.q_proj: cuda:0 [Q,x_map] scales min/max/std: 0.002279/0.033783/0.001966
676
+ !! - self_attn.k_proj: cuda:0 [Q,x_map] scales min/max/std: 0.002062/0.031311/0.002022
677
+ !! - self_attn.v_proj: cuda:0 [Q,x_map] scales min/max/std: 0.003651/0.016846/0.001222
678
+ !! - self_attn.o_proj: cuda:0 [Q,x_map] scales min/max/std: 0.002913/0.079651/0.001315
679
+ !! - cache device: cuda:0, seq_len: 0
680
+ !! Begin MLP
681
+ !! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -1011.000000, max: 181.750000, std: 3.199219
682
+ !! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.029907, max: 0.460938, std: 0.021744 eps: 0.00000100
683
+ !! - mlp.gate_proj: cuda:0 [Q,x_map] scales min/max/std: 0.004433/0.036102/0.001138
684
+ !! - mlp.up_proj: cuda:0 [Q,x_map] scales min/max/std: 0.004498/0.028717/0.000901
685
+ !! - mlp.down_proj: cuda:0 [Q,x_map] scales min/max/std: 0.001920/0.123169/0.001141
686
+ !! - method: normal
687
+ !! Begin decoder 35
688
+ !! Begin self-attention
689
+ !! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -982.000000, max: 186.500000, std: 3.277344
690
+ !! - attn_mask: device: cuda:0, shape: [1, 1, 1920, 1920], dtype: float16, min: -65504.000000, max: 0.000000, std: 32752.000000
691
+ !! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.040283, max: 0.917969, std: 0.029037 eps: 0.00000100
692
+ !! - self_attn.q_proj: cuda:0 [Q,x_map] scales min/max/std: 0.002428/0.032837/0.001951
693
+ !! - self_attn.k_proj: cuda:0 [Q,x_map] scales min/max/std: 0.002157/0.030807/0.002024
694
+ !! - self_attn.v_proj: cuda:0 [Q,x_map] scales min/max/std: 0.003971/0.013626/0.001038
695
+ !! - self_attn.o_proj: cuda:0 [Q,x_map] scales min/max/std: 0.003014/0.090149/0.001112
696
+ !! - cache device: cuda:0, seq_len: 0
697
+ !! Begin MLP
698
+ !! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -981.000000, max: 191.500000, std: 3.328125
699
+ !! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.030151, max: 0.468750, std: 0.021896 eps: 0.00000100
700
+ !! - mlp.gate_proj: cuda:0 [Q,x_map] scales min/max/std: 0.002829/0.039459/0.001129
701
+ !! - mlp.up_proj: cuda:0 [Q,x_map] scales min/max/std: 0.004147/0.044250/0.000917
702
+ !! - mlp.down_proj: cuda:0 [Q,x_map] scales min/max/std: 0.002134/0.148560/0.001385
703
+ !! - method: normal
704
+ !! Begin decoder 36
705
+ !! Begin self-attention
706
+ !! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -919.500000, max: 191.500000, std: 3.392578
707
+ !! - attn_mask: device: cuda:0, shape: [1, 1, 1920, 1920], dtype: float16, min: -65504.000000, max: 0.000000, std: 32752.000000
708
+ !! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: -0.004456, max: 0.941406, std: 0.031082 eps: 0.00000100
709
+ !! - self_attn.q_proj: cuda:0 [Q,x_map] scales min/max/std: 0.001844/0.032776/0.001974
710
+ !! - self_attn.k_proj: cuda:0 [Q,x_map] scales min/max/std: 0.001781/0.031769/0.002085
711
+ !! - self_attn.v_proj: cuda:0 [Q,x_map] scales min/max/std: 0.004047/0.016876/0.001062
712
+ !! - self_attn.o_proj: cuda:0 [Q,x_map] scales min/max/std: 0.002771/0.059174/0.001117
713
+ !! - cache device: cuda:0, seq_len: 0
714
+ !! Begin MLP
715
+ !! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -919.000000, max: 193.875000, std: 3.429688
716
+ !! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.050537, max: 0.839844, std: 0.022324 eps: 0.00000100
717
+ !! - mlp.gate_proj: cuda:0 [Q,x_map] scales min/max/std: 0.004131/0.048218/0.001153
718
+ !! - mlp.up_proj: cuda:0 [Q,x_map] scales min/max/std: 0.004238/0.036469/0.000927
719
+ !! - mlp.down_proj: cuda:0 [Q,x_map] scales min/max/std: 0.002237/0.148193/0.001454
720
+ !! - method: normal
721
+ !! Begin decoder 37
722
+ !! Begin self-attention
723
+ !! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -861.000000, max: 191.125000, std: 3.441406
724
+ !! - attn_mask: device: cuda:0, shape: [1, 1, 1920, 1920], dtype: float16, min: -65504.000000, max: 0.000000, std: 32752.000000
725
+ !! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: -0.002762, max: 1.054688, std: 0.032867 eps: 0.00000100
726
+ !! - self_attn.q_proj: cuda:0 [Q,x_map] scales min/max/std: 0.002245/0.036652/0.001965
727
+ !! - self_attn.k_proj: cuda:0 [Q,x_map] scales min/max/std: 0.001849/0.033752/0.002066
728
+ !! - self_attn.v_proj: cuda:0 [Q,x_map] scales min/max/std: 0.003832/0.017563/0.001212
729
+ !! - self_attn.o_proj: cuda:0 [Q,x_map] scales min/max/std: 0.003330/0.115906/0.001400
730
+ !! - cache device: cuda:0, seq_len: 0
731
+ !! Begin MLP
732
+ !! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -857.500000, max: 195.500000, std: 3.544922
733
+ !! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.066406, max: 0.593750, std: 0.021439 eps: 0.00000100
734
+ !! - mlp.gate_proj: cuda:0 [Q,x_map] scales min/max/std: 0.003469/0.083496/0.001222
735
+ !! - mlp.up_proj: cuda:0 [Q,x_map] scales min/max/std: 0.003468/0.034821/0.000952
736
+ !! - mlp.down_proj: cuda:0 [Q,x_map] scales min/max/std: 0.002447/0.204346/0.002012
737
+ !! - method: normal
738
+ !! Begin decoder 38
739
+ !! Begin self-attention
740
+ !! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -580.000000, max: 195.125000, std: 3.591797
741
+ !! - attn_mask: device: cuda:0, shape: [1, 1, 1920, 1920], dtype: float16, min: -65504.000000, max: 0.000000, std: 32752.000000
742
+ !! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.097656, max: 1.039062, std: 0.031891 eps: 0.00000100
743
+ !! - self_attn.q_proj: cuda:0 [Q,x_map] scales min/max/std: 0.002375/0.045197/0.001980
744
+ !! - self_attn.k_proj: cuda:0 [Q,x_map] scales min/max/std: 0.002171/0.030624/0.001997
745
+ !! - self_attn.v_proj: cuda:0 [Q,x_map] scales min/max/std: 0.003450/0.017731/0.001331
746
+ !! - self_attn.o_proj: cuda:0 [Q,x_map] scales min/max/std: 0.003344/0.227539/0.001991
747
+ !! - cache device: cuda:0, seq_len: 0
748
+ !! Begin MLP
749
+ !! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -108.500000, max: 203.750000, std: 3.845703
750
+ !! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.087891, max: 0.498047, std: 0.020370 eps: 0.00000100
751
+ !! - mlp.gate_proj: cuda:0 [Q,x_map] scales min/max/std: 0.004387/0.031525/0.001246
752
+ !! - mlp.up_proj: cuda:0 [Q,x_map] scales min/max/std: 0.002453/0.059601/0.001083
753
+ !! - mlp.down_proj: cuda:0 [Q,x_map] scales min/max/std: 0.003397/0.199585/0.001426
754
+ !! - method: normal
755
+ !! Begin decoder 39
756
+ !! Begin self-attention
757
+ !! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -168.000000, max: 226.125000, std: 4.089844
758
+ !! - attn_mask: device: cuda:0, shape: [1, 1, 1920, 1920], dtype: float16, min: -65504.000000, max: 0.000000, std: 32752.000000
759
+ !! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.002625, max: 0.957031, std: 0.032471 eps: 0.00000100
760
+ !! - self_attn.q_proj: cuda:0 [Q,x_map] scales min/max/std: 0.002300/0.047607/0.002197
761
+ !! - self_attn.k_proj: cuda:0 [Q,x_map] scales min/max/std: 0.002066/0.033020/0.002274
762
+ !! - self_attn.v_proj: cuda:0 [Q,x_map] scales min/max/std: 0.002975/0.016586/0.001257
763
+ !! - self_attn.o_proj: cuda:0 [Q,x_map] scales min/max/std: 0.003019/0.146851/0.001698
764
+ !! - cache device: cuda:0, seq_len: 0
765
+ !! Begin MLP
766
+ !! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -144.375000, max: 229.500000, std: 4.367188
767
+ !! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.109863, max: 0.648438, std: 0.025543 eps: 0.00000100
768
+ !! - mlp.gate_proj: cuda:0 [Q,x_map] scales min/max/std: 0.002789/0.032501/0.001303
769
+ !! - mlp.up_proj: cuda:0 [Q,x_map] scales min/max/std: 0.002787/0.085999/0.001245
770
+ !! - mlp.down_proj: cuda:0 [Q,x_map] scales min/max/std: 0.004478/0.175049/0.001831
771
+ !! - method: normal
772
+ !! pre norm, hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -198.250000, max: 719.000000, std: 6.828125
773
+ !! pre lm_head, hidden_states: device: cuda:0, shape: [1, 1, 5120], dtype: float16, min: -13.359375, max: 17.625000, std: 1.145508
774
+ !! logits: device: cuda:0, shape: [1, 1, 32000], dtype: float16, min: -11.101562, max: 10.367188, std: 2.171875
775
+ !! Moving logits from cuda:0 to cpu
776
+ ** Time, Inference: 0.93 seconds
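
The `!! - name: device / shape / dtype / min / max / std` lines above appear to be per-tensor debug summaries printed during a single forward pass over a 1920-token sequence (hidden states [1, 1920, 5120], empty cache). Below is a minimal sketch of how such a summary could be reproduced in PyTorch; this is an illustrative assumption, not code from test_benchmark_inference.py, and the helper name debug_stats is hypothetical. The attn_mask extremes logged above match float16 limits: torch.finfo(torch.float16).min is -65504.0.

import torch

def debug_stats(name, t):
    # Print the same fields seen in this log: device, shape, dtype, min, max, std.
    t_f = t.float()  # compute stats in float32 for stability
    print(f" !! - {name}: device: {t.device}, shape: {list(t.shape)}, "
          f"dtype: {t.dtype}, min: {t_f.min().item():.6f}, "
          f"max: {t_f.max().item():.6f}, std: {t_f.std().item():.6f}")

# Example: build a causal attention mask shaped like the one logged above,
# with -65504.0 (the float16 minimum) at masked positions and 0.0 elsewhere.
seq_len = 1920
mask = torch.full((1, 1, seq_len, seq_len), torch.finfo(torch.float16).min, dtype=torch.float16)
mask = torch.triu(mask, diagonal=1)  # only strictly-upper-triangle positions stay masked
debug_stats("attn_mask", mask)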