Commit 86064b3 by disarmyouwitha (parent: 2a92bce)
Upload 2 files
- koala-13B-4bit_ooba_cuda_fast.txt +762 -0
- koala-13B-4bit_qwop_cuda_slow.txt +776 -0
koala-13B-4bit_ooba_cuda_fast.txt
ADDED
@@ -0,0 +1,762 @@
python test_benchmark_inference.py -dbg -d ~/llm_models/koala-13B-GPTQ
-- Loading model
-- Tokenizer: /home/nap/llm_models/koala-13B-GPTQ/tokenizer.model
-- Model config: /home/nap/llm_models/koala-13B-GPTQ/config.json
-- Model: /home/nap/llm_models/koala-13B-GPTQ/koala-13B-4bit_ooba_cuda_fast.safetensors
-- Sequence length: 2048
-- Options: ['attention: switched', 'matmul: switched', 'mlp: switched', 'debug']
!! Available CUDA devices:
!! - cuda:0: NVIDIA GeForce RTX 4090
!! - cuda:1: NVIDIA RTX A6000
!! Loading safetensors file: /home/nap/llm_models/koala-13B-GPTQ/koala-13B-4bit_ooba_cuda_fast.safetensors
!! Begin load tensors
!! - lm_head.weight read: device: cpu, shape: [32000, 5120], dtype: float16
!! - lm_head.weight map: device: cuda:0, shape: [32000, 5120], dtype: float16, min: -0.316406, max: 0.361328, std: 0.020935
!! - model.embed_tokens.weight read: device: cpu, shape: [32000, 5120], dtype: float16
!! - model.embed_tokens.weight map: device: cpu, shape: [32000, 5120], dtype: float16
!! - model.layers.0.input_layernorm.weight read: device: cpu, shape: [5120], dtype: float16
!! - model.layers.0.input_layernorm.weight map: device: cuda:0, shape: [5120], dtype: float16, min: -0.002060, max: 0.742188, std: 0.045593
!! - model.layers.0.mlp.down_proj.qweight read: device: cpu, shape: [1728, 5120], dtype: int32
!! - model.layers.0.mlp.down_proj.qweight map: device: cuda:0, shape: [1728, 5120], dtype: int32, min: -2147416079, max: 2147375608
!! - model.layers.0.mlp.down_proj.qzeros read: device: cpu, shape: [108, 640], dtype: int32
!! - model.layers.0.mlp.down_proj.qzeros map: device: cuda:0, shape: [108, 640], dtype: int32, min: -2106165417, max: 2089191031
!! - model.layers.0.mlp.down_proj.scales read: device: cpu, shape: [108, 5120], dtype: float32
!! - model.layers.0.mlp.down_proj.scales map: device: cuda:0, shape: [108, 5120], dtype: float16, min: 0.003326, max: 0.099487, std: 0.001260
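
Note: the qweight/qzeros/scales triple above is the standard 4-bit GPTQ layout. A minimal sketch of the shape arithmetic, assuming the usual packing of eight 4-bit values per int32 (the concrete numbers are the mlp.down_proj shapes printed above; this is not the loader's actual code):

    # Sketch: why down_proj's GPTQ tensors have these shapes (4-bit, group 128).
    in_features = 13824            # down_proj input = 13B MLP intermediate size
    out_features = 5120            # hidden size
    bits, groupsize = 4, 128
    pack = 32 // bits              # 8 quantized values packed per int32
    groups = in_features // groupsize                            # 108
    assert (in_features // pack, out_features) == (1728, 5120)   # qweight
    assert (groups, out_features // pack) == (108, 640)          # qzeros
    assert (groups, out_features) == (108, 5120)                 # scales
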
!! - model.layers.0.mlp.gate_proj.qweight read: device: cpu, shape: [640, 13824], dtype: int32
!! - model.layers.0.mlp.gate_proj.qweight map: device: cuda:0, shape: [640, 13824], dtype: int32, min: -2147459474, max: 2147466163
!! - model.layers.0.mlp.gate_proj.qzeros read: device: cpu, shape: [40, 1728], dtype: int32
!! - model.layers.0.mlp.gate_proj.qzeros map: device: cuda:0, shape: [40, 1728], dtype: int32, min: -2125109368, max: 2089248375
!! - model.layers.0.mlp.gate_proj.scales read: device: cpu, shape: [40, 13824], dtype: float32
!! - model.layers.0.mlp.gate_proj.scales map: device: cuda:0, shape: [40, 13824], dtype: float16, min: 0.002777, max: 0.060303, std: 0.000990
!! - model.layers.0.mlp.up_proj.qweight read: device: cpu, shape: [640, 13824], dtype: int32
!! - model.layers.0.mlp.up_proj.qweight map: device: cuda:0, shape: [640, 13824], dtype: int32, min: -2147474830, max: 2147437148
!! - model.layers.0.mlp.up_proj.qzeros read: device: cpu, shape: [40, 1728], dtype: int32
!! - model.layers.0.mlp.up_proj.qzeros map: device: cuda:0, shape: [40, 1728], dtype: int32, min: -2107213722, max: 2089121671
!! - model.layers.0.mlp.up_proj.scales read: device: cpu, shape: [40, 13824], dtype: float32
!! - model.layers.0.mlp.up_proj.scales map: device: cuda:0, shape: [40, 13824], dtype: float16, min: 0.002075, max: 0.040131, std: 0.000730
!! - model.layers.0.post_attention_layernorm.weight read: device: cpu, shape: [5120], dtype: float16
!! - model.layers.0.post_attention_layernorm.weight map: device: cuda:0, shape: [5120], dtype: float16, min: -0.035889, max: 0.361328, std: 0.016113
!! - model.layers.0.self_attn.k_proj.qweight read: device: cpu, shape: [640, 5120], dtype: int32
!! - model.layers.0.self_attn.k_proj.qweight map: device: cuda:0, shape: [640, 5120], dtype: int32, min: -2147305928, max: 2147337675
!! - model.layers.0.self_attn.k_proj.qzeros read: device: cpu, shape: [40, 640], dtype: int32
!! - model.layers.0.self_attn.k_proj.qzeros map: device: cuda:0, shape: [40, 640], dtype: int32, min: -2128119278, max: 2092336937
!! - model.layers.0.self_attn.k_proj.scales read: device: cpu, shape: [40, 5120], dtype: float32
!! - model.layers.0.self_attn.k_proj.scales map: device: cuda:0, shape: [40, 5120], dtype: float16, min: 0.001449, max: 0.082703, std: 0.005592
!! - model.layers.0.self_attn.o_proj.qweight read: device: cpu, shape: [640, 5120], dtype: int32
!! - model.layers.0.self_attn.o_proj.qweight map: device: cuda:0, shape: [640, 5120], dtype: int32, min: -2147453144, max: 2147375548
!! - model.layers.0.self_attn.o_proj.qzeros read: device: cpu, shape: [40, 640], dtype: int32
!! - model.layers.0.self_attn.o_proj.qzeros map: device: cuda:0, shape: [40, 640], dtype: int32, min: -2107209387, max: 2071422582
!! - model.layers.0.self_attn.o_proj.scales read: device: cpu, shape: [40, 5120], dtype: float32
!! - model.layers.0.self_attn.o_proj.scales map: device: cuda:0, shape: [40, 5120], dtype: float16, min: 0.001521, max: 0.089478, std: 0.001425
!! - model.layers.0.self_attn.q_proj.qweight read: device: cpu, shape: [640, 5120], dtype: int32
!! - model.layers.0.self_attn.q_proj.qweight map: device: cuda:0, shape: [640, 5120], dtype: int32, min: -2147399309, max: 2147314245
!! - model.layers.0.self_attn.q_proj.qzeros read: device: cpu, shape: [40, 640], dtype: int32
!! - model.layers.0.self_attn.q_proj.qzeros map: device: cuda:0, shape: [40, 640], dtype: int32, min: -2128450726, max: 2092123285
!! - model.layers.0.self_attn.q_proj.scales read: device: cpu, shape: [40, 5120], dtype: float32
!! - model.layers.0.self_attn.q_proj.scales map: device: cuda:0, shape: [40, 5120], dtype: float16, min: 0.001049, max: 0.095764, std: 0.005581
!! - model.layers.0.self_attn.v_proj.qweight read: device: cpu, shape: [640, 5120], dtype: int32
!! - model.layers.0.self_attn.v_proj.qweight map: device: cuda:0, shape: [640, 5120], dtype: int32, min: -2147441095, max: 2147387755
!! - model.layers.0.self_attn.v_proj.qzeros read: device: cpu, shape: [40, 640], dtype: int32
!! - model.layers.0.self_attn.v_proj.qzeros map: device: cuda:0, shape: [40, 640], dtype: int32, min: -2091420041, max: 2071422327
!! - model.layers.0.self_attn.v_proj.scales read: device: cpu, shape: [40, 5120], dtype: float32
!! - model.layers.0.self_attn.v_proj.scales map: device: cuda:0, shape: [40, 5120], dtype: float16, min: 0.001673, max: 0.015762, std: 0.001489
!! - model.norm.weight read: device: cpu, shape: [5120], dtype: float16
!! - model.norm.weight map: device: cuda:0, shape: [5120], dtype: float16, min: 0.018066, max: 2.093750, std: 0.073120
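
Note: the read/map pairs above can be reproduced with the safetensors API; a minimal sketch (illustrative only, not the benchmark script's actual loader; in the real run embed_tokens deliberately stays on cpu):

    import torch
    from safetensors import safe_open

    path = "/home/nap/llm_models/koala-13B-GPTQ/koala-13B-4bit_ooba_cuda_fast.safetensors"
    with safe_open(path, framework="pt", device="cpu") as f:
        for key in sorted(f.keys()):
            t = f.get_tensor(key)            # "read" line: tensor still on cpu
            print(f"{key} read: {t.device}, {list(t.shape)}, {t.dtype}")
            t = t.to("cuda:0")               # "map" line: moved to its device
            if t.is_floating_point():
                print(f"{key} map: min {t.min():.6f}, max {t.max():.6f}, std {t.std():.6f}")
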
!! Computing RoPE table for seq length: 2048
!! - stored for device: cuda:0
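
Note: the table stored per device is the usual rotary-embedding cos/sin cache. A sketch under standard LLaMA-13B parameters (head_dim 128, base 10000 are assumptions; only seq length 2048 is printed in the log):

    import torch

    head_dim, seq_len, base = 128, 2048, 10000.0
    inv_freq = 1.0 / base ** (torch.arange(0, head_dim, 2, dtype=torch.float32) / head_dim)
    pos = torch.arange(seq_len, dtype=torch.float32)
    freqs = torch.outer(pos, inv_freq)          # [2048, 64] angles
    emb = torch.cat((freqs, freqs), dim=-1)     # [2048, 128]
    rope_cos, rope_sin = emb.cos(), emb.sin()   # cached once per device
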
** Time, Load model: 1.58 seconds
-- Groupsize (inferred): 128
-- Act-order (inferred): no
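
Note: both inferred values follow from the tensor shapes alone; a worked sketch using the down_proj shapes loaded earlier:

    # Each row of scales/qzeros covers one group of input features.
    in_features, scales_rows = 13824, 108       # mlp.down_proj: scales [108, 5120]
    assert in_features // scales_rows == 128    # -> "Groupsize (inferred): 128"
    # Act-order checkpoints typically also carry a g_idx tensor; none was
    # read above, hence "Act-order (inferred): no".
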
** VRAM, Model: [cuda:0] 6,683.17 MB - [cuda:1] 0.00 MB
!! Inference, debug pass
!! Begin forward pass
!! Moving input_ids from cuda:0 to cpu
!! Built initial hidden state: device: cpu, shape: [1, 1920, 5120], dtype: float16, min: -0.110840, max: 0.124512, std: 0.018784
!! Prepared buffer for device: cuda:0
!! - attn_mask: device: cuda:0, shape: [1, 1, 1920, 1920], dtype: float16, min: -65504.000000, max: 0.000000, std: 32752.000000
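
Note: the attn_mask stats (min -65504, max 0) describe an ordinary causal mask in float16, where -65504 is the most negative normal float16 value; a minimal sketch:

    import torch

    seq_len = 1920
    # Zeros on/below the diagonal, -65504 above it, so future positions
    # receive ~zero weight after softmax.
    neg = torch.full((seq_len, seq_len), -65504.0, dtype=torch.float16)
    attn_mask = torch.triu(neg, diagonal=1)[None, None]     # [1, 1, 1920, 1920]
    print(attn_mask.min().item(), attn_mask.max().item())   # -65504.0 0.0
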
!! Moving hidden_states from cpu to cuda:0
!! Begin decoder 0
!! Begin self-attention
!! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -0.110840, max: 0.124512, std: 0.018784
!! - attn_mask: device: cuda:0, shape: [1, 1, 1920, 1920], dtype: float16, min: -65504.000000, max: 0.000000, std: 32752.000000
!! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: -0.002060, max: 0.742188, std: 0.045593 eps: 0.00000100
!! - self_attn.q_proj: cuda:0 [Q] scales min/max/std: 0.001049/0.095764/0.005581
!! - self_attn.k_proj: cuda:0 [Q] scales min/max/std: 0.001449/0.082703/0.005592
!! - self_attn.v_proj: cuda:0 [Q] scales min/max/std: 0.001673/0.015762/0.001489
!! - self_attn.o_proj: cuda:0 [Q] scales min/max/std: 0.001521/0.089478/0.001425
!! - cache device: cuda:0, seq_len: 0
!! Begin MLP
!! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -1.126953, max: 1.317383, std: 0.035309
!! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: -0.035889, max: 0.361328, std: 0.016113 eps: 0.00000100
!! - mlp.gate_proj: cuda:0 [Q] scales min/max/std: 0.002777/0.060303/0.000990
!! - mlp.up_proj: cuda:0 [Q] scales min/max/std: 0.002075/0.040131/0.000730
!! - mlp.down_proj: cuda:0 [Q] scales min/max/std: 0.003326/0.099487/0.001260
!! - method: normal
!! Begin decoder 1
!! Begin self-attention
!! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -10.023438, max: 37.812500, std: 0.116089
!! - attn_mask: device: cuda:0, shape: [1, 1, 1920, 1920], dtype: float16, min: -65504.000000, max: 0.000000, std: 32752.000000
!! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.012146, max: 0.326172, std: 0.022308 eps: 0.00000100
!! - self_attn.q_proj: cuda:0 [Q] scales min/max/std: 0.001299/0.042847/0.005116
!! - self_attn.k_proj: cuda:0 [Q] scales min/max/std: 0.001262/0.056030/0.005295
!! - self_attn.v_proj: cuda:0 [Q] scales min/max/std: 0.001407/0.011436/0.001119
!! - self_attn.o_proj: cuda:0 [Q] scales min/max/std: 0.001063/0.086609/0.001472
!! - cache device: cuda:0, seq_len: 0
!! Begin MLP
!! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -9.671875, max: 35.968750, std: 0.113708
!! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.003036, max: 0.166016, std: 0.010605 eps: 0.00000100
!! - mlp.gate_proj: cuda:0 [Q] scales min/max/std: 0.003960/0.075562/0.001144
!! - mlp.up_proj: cuda:0 [Q] scales min/max/std: 0.003387/0.035187/0.000851
!! - mlp.down_proj: cuda:0 [Q] scales min/max/std: 0.002762/0.120483/0.001154
!! - method: normal
!! Begin decoder 2
!! Begin self-attention
!! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -12.234375, max: 33.375000, std: 0.152710
!! - attn_mask: device: cuda:0, shape: [1, 1, 1920, 1920], dtype: float16, min: -65504.000000, max: 0.000000, std: 32752.000000
!! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.057617, max: 0.369141, std: 0.015396 eps: 0.00000100
!! - self_attn.q_proj: cuda:0 [Q] scales min/max/std: 0.002361/0.074585/0.003971
!! - self_attn.k_proj: cuda:0 [Q] scales min/max/std: 0.001963/0.050629/0.004532
!! - self_attn.v_proj: cuda:0 [Q] scales min/max/std: 0.002445/0.020309/0.000759
!! - self_attn.o_proj: cuda:0 [Q] scales min/max/std: 0.002083/0.110596/0.001124
!! - cache device: cuda:0, seq_len: 0
!! Begin MLP
!! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -12.398438, max: 29.312500, std: 0.156738
!! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.014099, max: 0.161133, std: 0.011726 eps: 0.00000100
!! - mlp.gate_proj: cuda:0 [Q] scales min/max/std: 0.002787/0.087097/0.001152
!! - mlp.up_proj: cuda:0 [Q] scales min/max/std: 0.003202/0.043213/0.000878
!! - mlp.down_proj: cuda:0 [Q] scales min/max/std: 0.002434/0.133301/0.001044
!! - method: normal
!! Begin decoder 3
!! Begin self-attention
!! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -933.000000, max: 26.562500, std: 0.348877
!! - attn_mask: device: cuda:0, shape: [1, 1, 1920, 1920], dtype: float16, min: -65504.000000, max: 0.000000, std: 32752.000000
!! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.033203, max: 0.445312, std: 0.016769 eps: 0.00000100
!! - self_attn.q_proj: cuda:0 [Q] scales min/max/std: 0.002218/0.064087/0.003193
!! - self_attn.k_proj: cuda:0 [Q] scales min/max/std: 0.001682/0.047546/0.003334
!! - self_attn.v_proj: cuda:0 [Q] scales min/max/std: 0.002258/0.013161/0.000889
!! - self_attn.o_proj: cuda:0 [Q] scales min/max/std: 0.001929/0.086182/0.001017
!! - cache device: cuda:0, seq_len: 0
!! Begin MLP
!! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -933.000000, max: 27.828125, std: 0.353027
!! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.020508, max: 0.185547, std: 0.012711 eps: 0.00000100
!! - mlp.gate_proj: cuda:0 [Q] scales min/max/std: 0.002598/0.055603/0.001158
!! - mlp.up_proj: cuda:0 [Q] scales min/max/std: 0.002819/0.043365/0.000893
!! - mlp.down_proj: cuda:0 [Q] scales min/max/std: 0.002668/0.083008/0.000952
!! - method: normal
!! Begin decoder 4
!! Begin self-attention
!! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -935.000000, max: 26.937500, std: 0.376465
!! - attn_mask: device: cuda:0, shape: [1, 1, 1920, 1920], dtype: float16, min: -65504.000000, max: 0.000000, std: 32752.000000
!! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.036621, max: 0.458984, std: 0.017136 eps: 0.00000100
!! - self_attn.q_proj: cuda:0 [Q] scales min/max/std: 0.002357/0.124084/0.003180
!! - self_attn.k_proj: cuda:0 [Q] scales min/max/std: 0.001328/0.042419/0.003229
!! - self_attn.v_proj: cuda:0 [Q] scales min/max/std: 0.002598/0.018280/0.000826
!! - self_attn.o_proj: cuda:0 [Q] scales min/max/std: 0.001725/0.085449/0.000918
!! - cache device: cuda:0, seq_len: 0
!! Begin MLP
!! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -935.000000, max: 30.609375, std: 0.397705
!! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.025391, max: 0.200195, std: 0.012398 eps: 0.00000100
!! - mlp.gate_proj: cuda:0 [Q] scales min/max/std: 0.003830/0.047241/0.001214
!! - mlp.up_proj: cuda:0 [Q] scales min/max/std: 0.003572/0.041473/0.000900
!! - mlp.down_proj: cuda:0 [Q] scales min/max/std: 0.002481/0.095337/0.000922
!! - method: normal
!! Begin decoder 5
!! Begin self-attention
!! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -935.500000, max: 28.265625, std: 0.410889
!! - attn_mask: device: cuda:0, shape: [1, 1, 1920, 1920], dtype: float16, min: -65504.000000, max: 0.000000, std: 32752.000000
!! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.033203, max: 0.492188, std: 0.019684 eps: 0.00000100
!! - self_attn.q_proj: cuda:0 [Q] scales min/max/std: 0.001987/0.102661/0.003073
!! - self_attn.k_proj: cuda:0 [Q] scales min/max/std: 0.001550/0.035492/0.003050
!! - self_attn.v_proj: cuda:0 [Q] scales min/max/std: 0.002256/0.016541/0.000906
!! - self_attn.o_proj: cuda:0 [Q] scales min/max/std: 0.002275/0.106079/0.001011
!! - cache device: cuda:0, seq_len: 0
!! Begin MLP
!! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -935.500000, max: 32.062500, std: 0.428223
!! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.042236, max: 0.211914, std: 0.011848 eps: 0.00000100
!! - mlp.gate_proj: cuda:0 [Q] scales min/max/std: 0.002773/0.047150/0.001265
!! - mlp.up_proj: cuda:0 [Q] scales min/max/std: 0.001515/0.041870/0.000920
!! - mlp.down_proj: cuda:0 [Q] scales min/max/std: 0.002594/0.062195/0.000935
!! - method: normal
!! Begin decoder 6
!! Begin self-attention
!! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -937.000000, max: 29.000000, std: 0.451904
!! - attn_mask: device: cuda:0, shape: [1, 1, 1920, 1920], dtype: float16, min: -65504.000000, max: 0.000000, std: 32752.000000
!! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.067871, max: 0.558594, std: 0.019913 eps: 0.00000100
!! - self_attn.q_proj: cuda:0 [Q] scales min/max/std: 0.002136/0.046173/0.003099
!! - self_attn.k_proj: cuda:0 [Q] scales min/max/std: 0.001863/0.033478/0.003153
!! - self_attn.v_proj: cuda:0 [Q] scales min/max/std: 0.002909/0.020889/0.000928
!! - self_attn.o_proj: cuda:0 [Q] scales min/max/std: 0.001761/0.096313/0.001001
!! - cache device: cuda:0, seq_len: 0
!! Begin MLP
!! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -937.500000, max: 27.984375, std: 0.468262
!! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.038574, max: 0.244141, std: 0.012810 eps: 0.00000100
!! - mlp.gate_proj: cuda:0 [Q] scales min/max/std: 0.003599/0.058990/0.001412
!! - mlp.up_proj: cuda:0 [Q] scales min/max/std: 0.003576/0.044037/0.000947
!! - mlp.down_proj: cuda:0 [Q] scales min/max/std: 0.002380/0.090454/0.001029
!! - method: normal
!! Begin decoder 7
!! Begin self-attention
!! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -1071.000000, max: 78.437500, std: 0.518066
!! - attn_mask: device: cuda:0, shape: [1, 1, 1920, 1920], dtype: float16, min: -65504.000000, max: 0.000000, std: 32752.000000
!! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: -0.010315, max: 0.609375, std: 0.018875 eps: 0.00000100
!! - self_attn.q_proj: cuda:0 [Q] scales min/max/std: 0.002357/0.038116/0.002750
!! - self_attn.k_proj: cuda:0 [Q] scales min/max/std: 0.002035/0.030289/0.002897
!! - self_attn.v_proj: cuda:0 [Q] scales min/max/std: 0.002699/0.013130/0.000939
!! - self_attn.o_proj: cuda:0 [Q] scales min/max/std: 0.001756/0.065430/0.000955
!! - cache device: cuda:0, seq_len: 0
!! Begin MLP
!! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -1071.000000, max: 78.625000, std: 0.557129
!! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.043701, max: 0.222656, std: 0.011360 eps: 0.00000100
!! - mlp.gate_proj: cuda:0 [Q] scales min/max/std: 0.003187/0.053528/0.001369
!! - mlp.up_proj: cuda:0 [Q] scales min/max/std: 0.003983/0.029083/0.000935
!! - mlp.down_proj: cuda:0 [Q] scales min/max/std: 0.002668/0.070984/0.000947
!! - method: normal
!! Begin decoder 8
!! Begin self-attention
!! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -1071.000000, max: 78.687500, std: 0.583008
!! - attn_mask: device: cuda:0, shape: [1, 1, 1920, 1920], dtype: float16, min: -65504.000000, max: 0.000000, std: 32752.000000
!! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.007812, max: 0.617188, std: 0.021469 eps: 0.00000100
!! - self_attn.q_proj: cuda:0 [Q] scales min/max/std: 0.002020/0.036896/0.003115
!! - self_attn.k_proj: cuda:0 [Q] scales min/max/std: 0.001634/0.027725/0.003042
!! - self_attn.v_proj: cuda:0 [Q] scales min/max/std: 0.003176/0.019165/0.000947
!! - self_attn.o_proj: cuda:0 [Q] scales min/max/std: 0.001910/0.084106/0.000935
!! - cache device: cuda:0, seq_len: 0
!! Begin MLP
!! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -1071.000000, max: 78.812500, std: 0.605469
!! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.033203, max: 0.228516, std: 0.012070 eps: 0.00000100
!! - mlp.gate_proj: cuda:0 [Q] scales min/max/std: 0.003246/0.053589/0.001263
!! - mlp.up_proj: cuda:0 [Q] scales min/max/std: 0.001094/0.036316/0.000944
!! - mlp.down_proj: cuda:0 [Q] scales min/max/std: 0.002659/0.075378/0.000929
!! - method: normal
!! Begin decoder 9
!! Begin self-attention
!! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -1071.000000, max: 79.000000, std: 0.611816
!! - attn_mask: device: cuda:0, shape: [1, 1, 1920, 1920], dtype: float16, min: -65504.000000, max: 0.000000, std: 32752.000000
!! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.003876, max: 0.664062, std: 0.020859 eps: 0.00000100
!! - self_attn.q_proj: cuda:0 [Q] scales min/max/std: 0.002146/0.038910/0.002712
!! - self_attn.k_proj: cuda:0 [Q] scales min/max/std: 0.001664/0.032074/0.002876
!! - self_attn.v_proj: cuda:0 [Q] scales min/max/std: 0.003122/0.015617/0.000871
!! - self_attn.o_proj: cuda:0 [Q] scales min/max/std: 0.001311/0.095337/0.000900
!! - cache device: cuda:0, seq_len: 0
!! Begin MLP
!! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -1071.000000, max: 79.062500, std: 0.624023
!! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.049805, max: 0.238281, std: 0.011787 eps: 0.00000100
!! - mlp.gate_proj: cuda:0 [Q] scales min/max/std: 0.003241/0.061310/0.001322
!! - mlp.up_proj: cuda:0 [Q] scales min/max/std: 0.003132/0.040771/0.000956
!! - mlp.down_proj: cuda:0 [Q] scales min/max/std: 0.002480/0.081299/0.000928
!! - method: normal
!! Begin decoder 10
!! Begin self-attention
!! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -1071.000000, max: 79.125000, std: 0.634277
!! - attn_mask: device: cuda:0, shape: [1, 1, 1920, 1920], dtype: float16, min: -65504.000000, max: 0.000000, std: 32752.000000
!! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.002594, max: 0.703125, std: 0.021515 eps: 0.00000100
!! - self_attn.q_proj: cuda:0 [Q] scales min/max/std: 0.002222/0.033997/0.002638
!! - self_attn.k_proj: cuda:0 [Q] scales min/max/std: 0.001856/0.029907/0.002831
!! - self_attn.v_proj: cuda:0 [Q] scales min/max/std: 0.003365/0.014862/0.000932
!! - self_attn.o_proj: cuda:0 [Q] scales min/max/std: 0.001518/0.084351/0.000958
!! - cache device: cuda:0, seq_len: 0
!! Begin MLP
!! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -1071.000000, max: 79.187500, std: 0.652344
!! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.053955, max: 0.245117, std: 0.011978 eps: 0.00000100
!! - mlp.gate_proj: cuda:0 [Q] scales min/max/std: 0.003246/0.042297/0.001295
!! - mlp.up_proj: cuda:0 [Q] scales min/max/std: 0.003368/0.040710/0.000970
!! - mlp.down_proj: cuda:0 [Q] scales min/max/std: 0.002800/0.089050/0.000934
!! - method: normal
!! Begin decoder 11
!! Begin self-attention
!! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -1071.000000, max: 79.250000, std: 0.668457
!! - attn_mask: device: cuda:0, shape: [1, 1, 1920, 1920], dtype: float16, min: -65504.000000, max: 0.000000, std: 32752.000000
!! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.007355, max: 0.687500, std: 0.021606 eps: 0.00000100
!! - self_attn.q_proj: cuda:0 [Q] scales min/max/std: 0.002106/0.034271/0.002579
!! - self_attn.k_proj: cuda:0 [Q] scales min/max/std: 0.002033/0.028885/0.002792
!! - self_attn.v_proj: cuda:0 [Q] scales min/max/std: 0.003374/0.014481/0.000937
!! - self_attn.o_proj: cuda:0 [Q] scales min/max/std: 0.001925/0.075500/0.000946
!! - cache device: cuda:0, seq_len: 0
!! Begin MLP
!! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -1071.000000, max: 79.312500, std: 0.690430
!! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.054443, max: 0.251953, std: 0.011749 eps: 0.00000100
!! - mlp.gate_proj: cuda:0 [Q] scales min/max/std: 0.003128/0.051086/0.001299
!! - mlp.up_proj: cuda:0 [Q] scales min/max/std: 0.001537/0.041565/0.000993
!! - mlp.down_proj: cuda:0 [Q] scales min/max/std: 0.003239/0.079163/0.000940
!! - method: normal
!! Begin decoder 12
!! Begin self-attention
!! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -1071.000000, max: 79.375000, std: 0.723145
!! - attn_mask: device: cuda:0, shape: [1, 1, 1920, 1920], dtype: float16, min: -65504.000000, max: 0.000000, std: 32752.000000
!! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.014771, max: 0.664062, std: 0.020920 eps: 0.00000100
!! - self_attn.q_proj: cuda:0 [Q] scales min/max/std: 0.002449/0.034271/0.002655
!! - self_attn.k_proj: cuda:0 [Q] scales min/max/std: 0.002136/0.032806/0.002867
!! - self_attn.v_proj: cuda:0 [Q] scales min/max/std: 0.003397/0.019394/0.000961
!! - self_attn.o_proj: cuda:0 [Q] scales min/max/std: 0.001609/0.057343/0.000999
!! - cache device: cuda:0, seq_len: 0
!! Begin MLP
!! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -1072.000000, max: 79.500000, std: 0.749023
!! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.056396, max: 0.249023, std: 0.012207 eps: 0.00000100
!! - mlp.gate_proj: cuda:0 [Q] scales min/max/std: 0.003019/0.043274/0.001330
!! - mlp.up_proj: cuda:0 [Q] scales min/max/std: 0.002712/0.043762/0.001000
!! - mlp.down_proj: cuda:0 [Q] scales min/max/std: 0.003359/0.118286/0.000953
!! - method: normal
!! Begin decoder 13
!! Begin self-attention
!! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -1072.000000, max: 79.562500, std: 0.785645
!! - attn_mask: device: cuda:0, shape: [1, 1, 1920, 1920], dtype: float16, min: -65504.000000, max: 0.000000, std: 32752.000000
!! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.031982, max: 0.687500, std: 0.021698 eps: 0.00000100
!! - self_attn.q_proj: cuda:0 [Q] scales min/max/std: 0.002420/0.034241/0.002577
!! - self_attn.k_proj: cuda:0 [Q] scales min/max/std: 0.002388/0.034241/0.002741
!! - self_attn.v_proj: cuda:0 [Q] scales min/max/std: 0.003078/0.015854/0.000962
!! - self_attn.o_proj: cuda:0 [Q] scales min/max/std: 0.002022/0.078918/0.000970
!! - cache device: cuda:0, seq_len: 0
!! Begin MLP
!! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -1072.000000, max: 79.687500, std: 0.807617
!! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.051025, max: 0.265625, std: 0.012978 eps: 0.00000100
!! - mlp.gate_proj: cuda:0 [Q] scales min/max/std: 0.003170/0.036652/0.001327
!! - mlp.up_proj: cuda:0 [Q] scales min/max/std: 0.004108/0.028717/0.000996
!! - mlp.down_proj: cuda:0 [Q] scales min/max/std: 0.002531/0.052429/0.000926
!! - method: normal
!! Begin decoder 14
!! Begin self-attention
!! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -1072.000000, max: 79.750000, std: 0.848633
!! - attn_mask: device: cuda:0, shape: [1, 1, 1920, 1920], dtype: float16, min: -65504.000000, max: 0.000000, std: 32752.000000
!! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.025879, max: 0.691406, std: 0.021164 eps: 0.00000100
!! - self_attn.q_proj: cuda:0 [Q] scales min/max/std: 0.002115/0.035156/0.002348
!! - self_attn.k_proj: cuda:0 [Q] scales min/max/std: 0.001471/0.031067/0.002569
!! - self_attn.v_proj: cuda:0 [Q] scales min/max/std: 0.003618/0.020035/0.000957
!! - self_attn.o_proj: cuda:0 [Q] scales min/max/std: 0.001540/0.086060/0.000992
!! - cache device: cuda:0, seq_len: 0
!! Begin MLP
!! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -1072.000000, max: 79.875000, std: 0.866699
!! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.055420, max: 0.273438, std: 0.013245 eps: 0.00000100
!! - mlp.gate_proj: cuda:0 [Q] scales min/max/std: 0.003336/0.032928/0.001335
!! - mlp.up_proj: cuda:0 [Q] scales min/max/std: 0.003906/0.045197/0.000993
!! - mlp.down_proj: cuda:0 [Q] scales min/max/std: 0.002605/0.088013/0.000936
!! - method: normal
!! Begin decoder 15
!! Begin self-attention
!! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -1072.000000, max: 79.937500, std: 0.917480
!! - attn_mask: device: cuda:0, shape: [1, 1, 1920, 1920], dtype: float16, min: -65504.000000, max: 0.000000, std: 32752.000000
!! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.031494, max: 0.679688, std: 0.020615 eps: 0.00000100
!! - self_attn.q_proj: cuda:0 [Q] scales min/max/std: 0.002296/0.038727/0.002529
!! - self_attn.k_proj: cuda:0 [Q] scales min/max/std: 0.002375/0.030533/0.002689
!! - self_attn.v_proj: cuda:0 [Q] scales min/max/std: 0.003328/0.015869/0.000980
!! - self_attn.o_proj: cuda:0 [Q] scales min/max/std: 0.001546/0.124634/0.001021
!! - cache device: cuda:0, seq_len: 0
!! Begin MLP
!! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -1072.000000, max: 79.937500, std: 0.946777
!! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.040039, max: 0.291016, std: 0.014809 eps: 0.00000100
!! - mlp.gate_proj: cuda:0 [Q] scales min/max/std: 0.003687/0.051025/0.001274
!! - mlp.up_proj: cuda:0 [Q] scales min/max/std: 0.004307/0.041656/0.000965
!! - mlp.down_proj: cuda:0 [Q] scales min/max/std: 0.002167/0.078613/0.000919
!! - method: normal
!! Begin decoder 16
!! Begin self-attention
!! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -1072.000000, max: 79.937500, std: 0.994141
!! - attn_mask: device: cuda:0, shape: [1, 1, 1920, 1920], dtype: float16, min: -65504.000000, max: 0.000000, std: 32752.000000
!! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.012573, max: 0.652344, std: 0.020477 eps: 0.00000100
!! - self_attn.q_proj: cuda:0 [Q] scales min/max/std: 0.002371/0.034912/0.002207
!! - self_attn.k_proj: cuda:0 [Q] scales min/max/std: 0.001926/0.029617/0.002392
!! - self_attn.v_proj: cuda:0 [Q] scales min/max/std: 0.003460/0.018524/0.000947
!! - self_attn.o_proj: cuda:0 [Q] scales min/max/std: 0.001738/0.051270/0.000971
!! - cache device: cuda:0, seq_len: 0
!! Begin MLP
!! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -1072.000000, max: 80.000000, std: 1.007812
!! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.045898, max: 0.298828, std: 0.015106 eps: 0.00000100
!! - mlp.gate_proj: cuda:0 [Q] scales min/max/std: 0.003387/0.036011/0.001249
!! - mlp.up_proj: cuda:0 [Q] scales min/max/std: 0.003696/0.035187/0.000964
!! - mlp.down_proj: cuda:0 [Q] scales min/max/std: 0.002268/0.065063/0.000917
!! - method: normal
!! Begin decoder 17
!! Begin self-attention
!! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -1072.000000, max: 80.000000, std: 1.063477
!! - attn_mask: device: cuda:0, shape: [1, 1, 1920, 1920], dtype: float16, min: -65504.000000, max: 0.000000, std: 32752.000000
!! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.025146, max: 0.722656, std: 0.021576 eps: 0.00000100
!! - self_attn.q_proj: cuda:0 [Q] scales min/max/std: 0.002331/0.036224/0.002277
!! - self_attn.k_proj: cuda:0 [Q] scales min/max/std: 0.001755/0.030884/0.002550
!! - self_attn.v_proj: cuda:0 [Q] scales min/max/std: 0.003754/0.020874/0.000970
!! - self_attn.o_proj: cuda:0 [Q] scales min/max/std: 0.001672/0.116455/0.001009
!! - cache device: cuda:0, seq_len: 0
!! Begin MLP
!! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -1072.000000, max: 80.125000, std: 1.102539
!! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.042969, max: 0.310547, std: 0.015625 eps: 0.00000100
!! - mlp.gate_proj: cuda:0 [Q] scales min/max/std: 0.003586/0.035492/0.001222
!! - mlp.up_proj: cuda:0 [Q] scales min/max/std: 0.004265/0.044525/0.000955
!! - mlp.down_proj: cuda:0 [Q] scales min/max/std: 0.002222/0.067993/0.000917
!! - method: normal
!! Begin decoder 18
!! Begin self-attention
!! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -1072.000000, max: 80.062500, std: 1.158203
!! - attn_mask: device: cuda:0, shape: [1, 1, 1920, 1920], dtype: float16, min: -65504.000000, max: 0.000000, std: 32752.000000
!! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.029907, max: 0.738281, std: 0.022064 eps: 0.00000100
!! - self_attn.q_proj: cuda:0 [Q] scales min/max/std: 0.002323/0.033447/0.002235
!! - self_attn.k_proj: cuda:0 [Q] scales min/max/std: 0.001904/0.030121/0.002382
!! - self_attn.v_proj: cuda:0 [Q] scales min/max/std: 0.004002/0.014252/0.000932
!! - self_attn.o_proj: cuda:0 [Q] scales min/max/std: 0.001740/0.083801/0.000958
!! - cache device: cuda:0, seq_len: 0
!! Begin MLP
!! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -1072.000000, max: 80.125000, std: 1.192383
!! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.048584, max: 0.318359, std: 0.015625 eps: 0.00000100
!! - mlp.gate_proj: cuda:0 [Q] scales min/max/std: 0.003035/0.034271/0.001252
!! - mlp.up_proj: cuda:0 [Q] scales min/max/std: 0.003998/0.045654/0.000957
!! - mlp.down_proj: cuda:0 [Q] scales min/max/std: 0.002491/0.084534/0.000911
!! - method: normal
!! Begin decoder 19
!! Begin self-attention
!! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -1072.000000, max: 80.125000, std: 1.264648
!! - attn_mask: device: cuda:0, shape: [1, 1, 1920, 1920], dtype: float16, min: -65504.000000, max: 0.000000, std: 32752.000000
!! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.024170, max: 0.753906, std: 0.022308 eps: 0.00000100
!! - self_attn.q_proj: cuda:0 [Q] scales min/max/std: 0.002134/0.031494/0.002193
!! - self_attn.k_proj: cuda:0 [Q] scales min/max/std: 0.001934/0.030380/0.002371
!! - self_attn.v_proj: cuda:0 [Q] scales min/max/std: 0.003841/0.015404/0.000981
!! - self_attn.o_proj: cuda:0 [Q] scales min/max/std: 0.001974/0.084167/0.001057
!! - cache device: cuda:0, seq_len: 0
!! Begin MLP
!! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -1072.000000, max: 80.125000, std: 1.292969
!! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.033936, max: 0.347656, std: 0.016785 eps: 0.00000100
!! - mlp.gate_proj: cuda:0 [Q] scales min/max/std: 0.003767/0.040405/0.001213
!! - mlp.up_proj: cuda:0 [Q] scales min/max/std: 0.004185/0.043823/0.000943
!! - mlp.down_proj: cuda:0 [Q] scales min/max/std: 0.002474/0.062683/0.000900
!! - method: normal
!! Begin decoder 20
!! Begin self-attention
!! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -1072.000000, max: 80.062500, std: 1.365234
!! - attn_mask: device: cuda:0, shape: [1, 1, 1920, 1920], dtype: float16, min: -65504.000000, max: 0.000000, std: 32752.000000
!! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.037354, max: 0.757812, std: 0.022324 eps: 0.00000100
!! - self_attn.q_proj: cuda:0 [Q] scales min/max/std: 0.002235/0.035187/0.002100
!! - self_attn.k_proj: cuda:0 [Q] scales min/max/std: 0.002291/0.032471/0.002190
!! - self_attn.v_proj: cuda:0 [Q] scales min/max/std: 0.003658/0.014191/0.001044
!! - self_attn.o_proj: cuda:0 [Q] scales min/max/std: 0.001817/0.078064/0.001065
!! - cache device: cuda:0, seq_len: 0
!! Begin MLP
!! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -1073.000000, max: 80.000000, std: 1.402344
!! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.048096, max: 0.345703, std: 0.016815 eps: 0.00000100
!! - mlp.gate_proj: cuda:0 [Q] scales min/max/std: 0.003643/0.044281/0.001211
!! - mlp.up_proj: cuda:0 [Q] scales min/max/std: 0.004276/0.048615/0.000933
!! - mlp.down_proj: cuda:0 [Q] scales min/max/std: 0.002605/0.067444/0.000911
!! - method: normal
!! Begin decoder 21
!! Begin self-attention
!! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -1073.000000, max: 80.000000, std: 1.490234
!! - attn_mask: device: cuda:0, shape: [1, 1, 1920, 1920], dtype: float16, min: -65504.000000, max: 0.000000, std: 32752.000000
!! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.037598, max: 0.796875, std: 0.023514 eps: 0.00000100
!! - self_attn.q_proj: cuda:0 [Q] scales min/max/std: 0.002506/0.043945/0.002247
!! - self_attn.k_proj: cuda:0 [Q] scales min/max/std: 0.002028/0.031616/0.002365
!! - self_attn.v_proj: cuda:0 [Q] scales min/max/std: 0.004189/0.014427/0.001028
!! - self_attn.o_proj: cuda:0 [Q] scales min/max/std: 0.001978/0.039856/0.001017
!! - cache device: cuda:0, seq_len: 0
!! Begin MLP
!! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -1073.000000, max: 80.000000, std: 1.533203
!! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.044922, max: 0.347656, std: 0.017212 eps: 0.00000100
!! - mlp.gate_proj: cuda:0 [Q] scales min/max/std: 0.003614/0.052155/0.001178
!! - mlp.up_proj: cuda:0 [Q] scales min/max/std: 0.004387/0.032867/0.000925
!! - mlp.down_proj: cuda:0 [Q] scales min/max/std: 0.002708/0.063232/0.000911
!! - method: normal
!! Begin decoder 22
!! Begin self-attention
!! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -1073.000000, max: 80.000000, std: 1.623047
!! - attn_mask: device: cuda:0, shape: [1, 1, 1920, 1920], dtype: float16, min: -65504.000000, max: 0.000000, std: 32752.000000
!! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.037354, max: 0.753906, std: 0.022934 eps: 0.00000100
!! - self_attn.q_proj: cuda:0 [Q] scales min/max/std: 0.002468/0.036316/0.002068
!! - self_attn.k_proj: cuda:0 [Q] scales min/max/std: 0.002302/0.030502/0.002201
!! - self_attn.v_proj: cuda:0 [Q] scales min/max/std: 0.003658/0.014572/0.000998
!! - self_attn.o_proj: cuda:0 [Q] scales min/max/std: 0.002260/0.096069/0.001020
!! - cache device: cuda:0, seq_len: 0
!! Begin MLP
!! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -1073.000000, max: 80.000000, std: 1.687500
!! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.045654, max: 0.361328, std: 0.018143 eps: 0.00000100
!! - mlp.gate_proj: cuda:0 [Q] scales min/max/std: 0.003679/0.035217/0.001136
!! - mlp.up_proj: cuda:0 [Q] scales min/max/std: 0.004360/0.036133/0.000911
!! - mlp.down_proj: cuda:0 [Q] scales min/max/std: 0.002403/0.078796/0.000916
!! - method: normal
!! Begin decoder 23
!! Begin self-attention
!! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -1072.000000, max: 80.000000, std: 1.782227
!! - attn_mask: device: cuda:0, shape: [1, 1, 1920, 1920], dtype: float16, min: -65504.000000, max: 0.000000, std: 32752.000000
!! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.033691, max: 0.792969, std: 0.024429 eps: 0.00000100
!! - self_attn.q_proj: cuda:0 [Q] scales min/max/std: 0.002359/0.034546/0.002054
!! - self_attn.k_proj: cuda:0 [Q] scales min/max/std: 0.002043/0.033936/0.002104
!! - self_attn.v_proj: cuda:0 [Q] scales min/max/std: 0.004379/0.013702/0.000979
!! - self_attn.o_proj: cuda:0 [Q] scales min/max/std: 0.001885/0.075256/0.000995
!! - cache device: cuda:0, seq_len: 0
!! Begin MLP
!! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -1072.000000, max: 79.875000, std: 1.843750
!! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.031982, max: 0.367188, std: 0.019226 eps: 0.00000100
!! - mlp.gate_proj: cuda:0 [Q] scales min/max/std: 0.003729/0.050964/0.001107
!! - mlp.up_proj: cuda:0 [Q] scales min/max/std: 0.004387/0.036224/0.000899
!! - mlp.down_proj: cuda:0 [Q] scales min/max/std: 0.002159/0.082642/0.000899
!! - method: normal
!! Begin decoder 24
!! Begin self-attention
!! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -1071.000000, max: 80.000000, std: 1.940430
!! - attn_mask: device: cuda:0, shape: [1, 1, 1920, 1920], dtype: float16, min: -65504.000000, max: 0.000000, std: 32752.000000
!! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.051758, max: 0.812500, std: 0.025452 eps: 0.00000100
!! - self_attn.q_proj: cuda:0 [Q] scales min/max/std: 0.002163/0.037628/0.002060
!! - self_attn.k_proj: cuda:0 [Q] scales min/max/std: 0.002029/0.031433/0.002123
!! - self_attn.v_proj: cuda:0 [Q] scales min/max/std: 0.003849/0.016617/0.000987
!! - self_attn.o_proj: cuda:0 [Q] scales min/max/std: 0.001784/0.109741/0.001011
!! - cache device: cuda:0, seq_len: 0
!! Begin MLP
!! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -1071.000000, max: 87.187500, std: 1.993164
!! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.029419, max: 0.382812, std: 0.020203 eps: 0.00000100
!! - mlp.gate_proj: cuda:0 [Q] scales min/max/std: 0.003664/0.039459/0.001067
!! - mlp.up_proj: cuda:0 [Q] scales min/max/std: 0.004559/0.033142/0.000891
!! - mlp.down_proj: cuda:0 [Q] scales min/max/std: 0.002037/0.088379/0.000898
!! - method: normal
!! Begin decoder 25
!! Begin self-attention
!! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -1071.000000, max: 88.312500, std: 2.072266
!! - attn_mask: device: cuda:0, shape: [1, 1, 1920, 1920], dtype: float16, min: -65504.000000, max: 0.000000, std: 32752.000000
!! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.043213, max: 0.816406, std: 0.024796 eps: 0.00000100
!! - self_attn.q_proj: cuda:0 [Q] scales min/max/std: 0.001895/0.034515/0.002041
!! - self_attn.k_proj: cuda:0 [Q] scales min/max/std: 0.001381/0.040314/0.002146
!! - self_attn.v_proj: cuda:0 [Q] scales min/max/std: 0.003727/0.015511/0.001091
!! - self_attn.o_proj: cuda:0 [Q] scales min/max/std: 0.002243/0.103149/0.001124
!! - cache device: cuda:0, seq_len: 0
!! Begin MLP
!! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -1071.000000, max: 98.000000, std: 2.152344
!! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.029663, max: 0.404297, std: 0.020950 eps: 0.00000100
!! - mlp.gate_proj: cuda:0 [Q] scales min/max/std: 0.003717/0.032501/0.001052
!! - mlp.up_proj: cuda:0 [Q] scales min/max/std: 0.004433/0.026627/0.000883
!! - mlp.down_proj: cuda:0 [Q] scales min/max/std: 0.002089/0.068298/0.000892
!! - method: normal
520 |
+
!! Begin decoder 26
|
521 |
+
!! Begin self-attention
|
522 |
+
!! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -1070.000000, max: 101.000000, std: 2.234375
|
523 |
+
!! - attn_mask: device: cuda:0, shape: [1, 1, 1920, 1920], dtype: float16, min: -65504.000000, max: 0.000000, std: 32752.000000
|
524 |
+
!! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.038818, max: 0.875000, std: 0.026947 eps: 0.00000100
|
525 |
+
!! - self_attn.q_proj: cuda:0 [Q] scales min/max/std: 0.002312/0.030716/0.001928
|
526 |
+
!! - self_attn.k_proj: cuda:0 [Q] scales min/max/std: 0.002153/0.033234/0.002005
|
527 |
+
!! - self_attn.v_proj: cuda:0 [Q] scales min/max/std: 0.004166/0.014450/0.000995
|
528 |
+
!! - self_attn.o_proj: cuda:0 [Q] scales min/max/std: 0.002365/0.091187/0.001030
|
529 |
+
!! - cache device: cuda:0, seq_len: 0
|
530 |
+
!! Begin MLP
|
531 |
+
!! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -1070.000000, max: 105.500000, std: 2.265625
|
532 |
+
!! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.030518, max: 0.400391, std: 0.021332 eps: 0.00000100
|
533 |
+
!! - mlp.gate_proj: cuda:0 [Q] scales min/max/std: 0.004192/0.032410/0.001042
|
534 |
+
!! - mlp.up_proj: cuda:0 [Q] scales min/max/std: 0.004314/0.036591/0.000883
|
535 |
+
!! - mlp.down_proj: cuda:0 [Q] scales min/max/std: 0.002001/0.074585/0.000899
|
536 |
+
!! - method: normal
|
537 |
+
!! Begin decoder 27
|
538 |
+
!! Begin self-attention
|
539 |
+
!! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -1069.000000, max: 108.812500, std: 2.341797
|
540 |
+
!! - attn_mask: device: cuda:0, shape: [1, 1, 1920, 1920], dtype: float16, min: -65504.000000, max: 0.000000, std: 32752.000000
|
541 |
+
!! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.044922, max: 0.906250, std: 0.027390 eps: 0.00000100
|
542 |
+
!! - self_attn.q_proj: cuda:0 [Q] scales min/max/std: 0.002163/0.037323/0.002039
|
543 |
+
!! - self_attn.k_proj: cuda:0 [Q] scales min/max/std: 0.002100/0.032104/0.002142
|
544 |
+
!! - self_attn.v_proj: cuda:0 [Q] scales min/max/std: 0.004280/0.019775/0.000985
|
545 |
+
!! - self_attn.o_proj: cuda:0 [Q] scales min/max/std: 0.002172/0.070496/0.001004
|
546 |
+
!! - cache device: cuda:0, seq_len: 0
|
547 |
+
!! Begin MLP
|
548 |
+
!! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -1069.000000, max: 115.812500, std: 2.398438
|
549 |
+
!! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.034180, max: 0.406250, std: 0.021439 eps: 0.00000100
|
550 |
+
!! - mlp.gate_proj: cuda:0 [Q] scales min/max/std: 0.004284/0.040131/0.001047
|
551 |
+
!! - mlp.up_proj: cuda:0 [Q] scales min/max/std: 0.004375/0.046295/0.000883
|
552 |
+
!! - mlp.down_proj: cuda:0 [Q] scales min/max/std: 0.002033/0.049622/0.000891
|
553 |
+
!! - method: normal
|
554 |
+
!! Begin decoder 28
|
555 |
+
!! Begin self-attention
|
556 |
+
!! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -1068.000000, max: 119.062500, std: 2.470703
|
557 |
+
!! - attn_mask: device: cuda:0, shape: [1, 1, 1920, 1920], dtype: float16, min: -65504.000000, max: 0.000000, std: 32752.000000
|
558 |
+
!! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.038818, max: 0.937500, std: 0.027420 eps: 0.00000100
|
559 |
+
!! - self_attn.q_proj: cuda:0 [Q] scales min/max/std: 0.002270/0.045990/0.002008
|
560 |
+
!! - self_attn.k_proj: cuda:0 [Q] scales min/max/std: 0.002068/0.035706/0.002039
|
561 |
+
!! - self_attn.v_proj: cuda:0 [Q] scales min/max/std: 0.003502/0.013725/0.001108
|
562 |
+
!! - self_attn.o_proj: cuda:0 [Q] scales min/max/std: 0.002235/0.154175/0.001218
|
563 |
+
!! - cache device: cuda:0, seq_len: 0
|
564 |
+
!! Begin MLP
|
565 |
+
!! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -1068.000000, max: 132.000000, std: 2.578125
|
566 |
+
!! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.022705, max: 0.423828, std: 0.022003 eps: 0.00000100
|
567 |
+
!! - mlp.gate_proj: cuda:0 [Q] scales min/max/std: 0.004471/0.042694/0.001054
|
568 |
+
!! - mlp.up_proj: cuda:0 [Q] scales min/max/std: 0.004562/0.022446/0.000878
|
569 |
+
!! - mlp.down_proj: cuda:0 [Q] scales min/max/std: 0.001733/0.056427/0.000884
|
570 |
+
!! - method: normal
|
571 |
+
!! Begin decoder 29
|
572 |
+
!! Begin self-attention
|
573 |
+
!! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -1067.000000, max: 134.750000, std: 2.632812
|
574 |
+
!! - attn_mask: device: cuda:0, shape: [1, 1, 1920, 1920], dtype: float16, min: -65504.000000, max: 0.000000, std: 32752.000000
|
575 |
+
!! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.003403, max: 0.957031, std: 0.027893 eps: 0.00000100
|
576 |
+
!! - self_attn.q_proj: cuda:0 [Q] scales min/max/std: 0.002245/0.032928/0.001910
|
577 |
+
!! - self_attn.k_proj: cuda:0 [Q] scales min/max/std: 0.002039/0.030350/0.001957
|
578 |
+
!! - self_attn.v_proj: cuda:0 [Q] scales min/max/std: 0.004120/0.014153/0.001067
|
579 |
+
!! - self_attn.o_proj: cuda:0 [Q] scales min/max/std: 0.002306/0.074097/0.001082
|
580 |
+
!! - cache device: cuda:0, seq_len: 0
|
581 |
+
!! Begin MLP
|
582 |
+
!! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -1067.000000, max: 138.375000, std: 2.666016
|
583 |
+
!! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.028442, max: 0.691406, std: 0.022568 eps: 0.00000100
|
584 |
+
!! - mlp.gate_proj: cuda:0 [Q] scales min/max/std: 0.004440/0.035675/0.001063
|
585 |
+
!! - mlp.up_proj: cuda:0 [Q] scales min/max/std: 0.004391/0.031128/0.000879
|
586 |
+
!! - mlp.down_proj: cuda:0 [Q] scales min/max/std: 0.001850/0.075684/0.000896
|
587 |
+
!! - method: normal
|
588 |
+
!! Begin decoder 30
!! Begin self-attention
!! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -1066.000000, max: 141.125000, std: 2.714844
!! - attn_mask: device: cuda:0, shape: [1, 1, 1920, 1920], dtype: float16, min: -65504.000000, max: 0.000000, std: 32752.000000
!! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.039062, max: 0.953125, std: 0.028458 eps: 0.00000100
!! - self_attn.q_proj: cuda:0 [Q] scales min/max/std: 0.002247/0.030197/0.001984
!! - self_attn.k_proj: cuda:0 [Q] scales min/max/std: 0.002272/0.032532/0.002090
!! - self_attn.v_proj: cuda:0 [Q] scales min/max/std: 0.002539/0.015915/0.001025
!! - self_attn.o_proj: cuda:0 [Q] scales min/max/std: 0.002310/0.092224/0.001046
!! - cache device: cuda:0, seq_len: 0
!! Begin MLP
!! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -1066.000000, max: 145.625000, std: 2.767578
!! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.013855, max: 0.443359, std: 0.021713 eps: 0.00000100
!! - mlp.gate_proj: cuda:0 [Q] scales min/max/std: 0.004665/0.045197/0.001092
!! - mlp.up_proj: cuda:0 [Q] scales min/max/std: 0.004078/0.036926/0.000885
!! - mlp.down_proj: cuda:0 [Q] scales min/max/std: 0.001813/0.072693/0.000899
!! - method: normal
!! Begin decoder 31
!! Begin self-attention
!! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -1064.000000, max: 151.000000, std: 2.847656
!! - attn_mask: device: cuda:0, shape: [1, 1, 1920, 1920], dtype: float16, min: -65504.000000, max: 0.000000, std: 32752.000000
!! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: -0.005127, max: 0.949219, std: 0.028824 eps: 0.00000100
!! - self_attn.q_proj: cuda:0 [Q] scales min/max/std: 0.002350/0.031052/0.001871
!! - self_attn.k_proj: cuda:0 [Q] scales min/max/std: 0.002193/0.030899/0.001905
!! - self_attn.v_proj: cuda:0 [Q] scales min/max/std: 0.004337/0.015503/0.001026
!! - self_attn.o_proj: cuda:0 [Q] scales min/max/std: 0.002642/0.092957/0.001069
!! - cache device: cuda:0, seq_len: 0
!! Begin MLP
!! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -1064.000000, max: 160.750000, std: 2.914062
!! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.015198, max: 0.449219, std: 0.022018 eps: 0.00000100
!! - mlp.gate_proj: cuda:0 [Q] scales min/max/std: 0.004639/0.031525/0.001118
!! - mlp.up_proj: cuda:0 [Q] scales min/max/std: 0.004658/0.035858/0.000885
!! - mlp.down_proj: cuda:0 [Q] scales min/max/std: 0.001880/0.045258/0.000892
!! - method: normal
!! Begin decoder 32
!! Begin self-attention
!! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -1062.000000, max: 162.625000, std: 2.964844
!! - attn_mask: device: cuda:0, shape: [1, 1, 1920, 1920], dtype: float16, min: -65504.000000, max: 0.000000, std: 32752.000000
!! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.002731, max: 0.898438, std: 0.028946 eps: 0.00000100
!! - self_attn.q_proj: cuda:0 [Q] scales min/max/std: 0.002439/0.031342/0.001923
!! - self_attn.k_proj: cuda:0 [Q] scales min/max/std: 0.001745/0.039093/0.001959
!! - self_attn.v_proj: cuda:0 [Q] scales min/max/std: 0.003937/0.014107/0.001027
!! - self_attn.o_proj: cuda:0 [Q] scales min/max/std: 0.002518/0.113953/0.001073
!! - cache device: cuda:0, seq_len: 0
!! Begin MLP
!! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -1062.000000, max: 167.125000, std: 3.007812
!! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.003967, max: 0.746094, std: 0.022736 eps: 0.00000100
!! - mlp.gate_proj: cuda:0 [Q] scales min/max/std: 0.004223/0.046234/0.001122
!! - mlp.up_proj: cuda:0 [Q] scales min/max/std: 0.004738/0.031342/0.000886
!! - mlp.down_proj: cuda:0 [Q] scales min/max/std: 0.001719/0.055420/0.000911
!! - method: normal
!! Begin decoder 33
!! Begin self-attention
!! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -1059.000000, max: 172.125000, std: 3.062500
!! - attn_mask: device: cuda:0, shape: [1, 1, 1920, 1920], dtype: float16, min: -65504.000000, max: 0.000000, std: 32752.000000
!! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.033203, max: 0.910156, std: 0.029999 eps: 0.00000100
!! - self_attn.q_proj: cuda:0 [Q] scales min/max/std: 0.002224/0.034576/0.001955
!! - self_attn.k_proj: cuda:0 [Q] scales min/max/std: 0.002178/0.034698/0.001965
!! - self_attn.v_proj: cuda:0 [Q] scales min/max/std: 0.003191/0.017090/0.001073
!! - self_attn.o_proj: cuda:0 [Q] scales min/max/std: 0.002516/0.098511/0.001093
!! - cache device: cuda:0, seq_len: 0
!! Begin MLP
!! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -1058.000000, max: 174.375000, std: 3.101562
!! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.021729, max: 0.457031, std: 0.021973 eps: 0.00000100
!! - mlp.gate_proj: cuda:0 [Q] scales min/max/std: 0.004440/0.058960/0.001143
!! - mlp.up_proj: cuda:0 [Q] scales min/max/std: 0.004120/0.027802/0.000899
!! - mlp.down_proj: cuda:0 [Q] scales min/max/std: 0.001822/0.089966/0.000950
!! - method: normal
!! Begin decoder 34
!! Begin self-attention
!! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -1054.000000, max: 176.625000, std: 3.140625
!! - attn_mask: device: cuda:0, shape: [1, 1, 1920, 1920], dtype: float16, min: -65504.000000, max: 0.000000, std: 32752.000000
!! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.038086, max: 0.953125, std: 0.030441 eps: 0.00000100
!! - self_attn.q_proj: cuda:0 [Q] scales min/max/std: 0.002279/0.033783/0.001966
!! - self_attn.k_proj: cuda:0 [Q] scales min/max/std: 0.002062/0.031311/0.002022
!! - self_attn.v_proj: cuda:0 [Q] scales min/max/std: 0.003651/0.016846/0.001222
!! - self_attn.o_proj: cuda:0 [Q] scales min/max/std: 0.002913/0.079651/0.001315
!! - cache device: cuda:0, seq_len: 0
!! Begin MLP
!! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -1053.000000, max: 179.500000, std: 3.205078
!! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.029907, max: 0.460938, std: 0.021744 eps: 0.00000100
!! - mlp.gate_proj: cuda:0 [Q] scales min/max/std: 0.004433/0.036102/0.001138
!! - mlp.up_proj: cuda:0 [Q] scales min/max/std: 0.004498/0.028717/0.000901
!! - mlp.down_proj: cuda:0 [Q] scales min/max/std: 0.001920/0.123169/0.001141
!! - method: normal
!! Begin decoder 35
!! Begin self-attention
!! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -1023.000000, max: 183.500000, std: 3.283203
!! - attn_mask: device: cuda:0, shape: [1, 1, 1920, 1920], dtype: float16, min: -65504.000000, max: 0.000000, std: 32752.000000
!! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.040283, max: 0.917969, std: 0.029037 eps: 0.00000100
!! - self_attn.q_proj: cuda:0 [Q] scales min/max/std: 0.002428/0.032837/0.001951
!! - self_attn.k_proj: cuda:0 [Q] scales min/max/std: 0.002157/0.030807/0.002024
!! - self_attn.v_proj: cuda:0 [Q] scales min/max/std: 0.003971/0.013626/0.001038
!! - self_attn.o_proj: cuda:0 [Q] scales min/max/std: 0.003014/0.090149/0.001112
!! - cache device: cuda:0, seq_len: 0
!! Begin MLP
!! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -1021.500000, max: 188.875000, std: 3.333984
!! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.030151, max: 0.468750, std: 0.021896 eps: 0.00000100
!! - mlp.gate_proj: cuda:0 [Q] scales min/max/std: 0.002829/0.039459/0.001129
!! - mlp.up_proj: cuda:0 [Q] scales min/max/std: 0.004147/0.044250/0.000917
!! - mlp.down_proj: cuda:0 [Q] scales min/max/std: 0.002134/0.148560/0.001385
!! - method: normal
!! Begin decoder 36
!! Begin self-attention
!! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -957.000000, max: 190.625000, std: 3.396484
!! - attn_mask: device: cuda:0, shape: [1, 1, 1920, 1920], dtype: float16, min: -65504.000000, max: 0.000000, std: 32752.000000
!! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: -0.004456, max: 0.941406, std: 0.031082 eps: 0.00000100
!! - self_attn.q_proj: cuda:0 [Q] scales min/max/std: 0.001844/0.032776/0.001974
!! - self_attn.k_proj: cuda:0 [Q] scales min/max/std: 0.001781/0.031769/0.002085
!! - self_attn.v_proj: cuda:0 [Q] scales min/max/std: 0.004047/0.016876/0.001062
!! - self_attn.o_proj: cuda:0 [Q] scales min/max/std: 0.002771/0.059174/0.001117
!! - cache device: cuda:0, seq_len: 0
!! Begin MLP
!! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -956.500000, max: 191.375000, std: 3.433594
!! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.050537, max: 0.839844, std: 0.022324 eps: 0.00000100
!! - mlp.gate_proj: cuda:0 [Q] scales min/max/std: 0.004131/0.048218/0.001153
!! - mlp.up_proj: cuda:0 [Q] scales min/max/std: 0.004238/0.036469/0.000927
!! - mlp.down_proj: cuda:0 [Q] scales min/max/std: 0.002237/0.148193/0.001454
!! - method: normal
!! Begin decoder 37
!! Begin self-attention
!! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -895.000000, max: 188.875000, std: 3.443359
!! - attn_mask: device: cuda:0, shape: [1, 1, 1920, 1920], dtype: float16, min: -65504.000000, max: 0.000000, std: 32752.000000
!! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: -0.002762, max: 1.054688, std: 0.032867 eps: 0.00000100
!! - self_attn.q_proj: cuda:0 [Q] scales min/max/std: 0.002245/0.036652/0.001965
!! - self_attn.k_proj: cuda:0 [Q] scales min/max/std: 0.001849/0.033752/0.002066
!! - self_attn.v_proj: cuda:0 [Q] scales min/max/std: 0.003832/0.017563/0.001212
!! - self_attn.o_proj: cuda:0 [Q] scales min/max/std: 0.003330/0.115906/0.001400
!! - cache device: cuda:0, seq_len: 0
!! Begin MLP
!! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -891.500000, max: 191.125000, std: 3.550781
!! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.066406, max: 0.593750, std: 0.021439 eps: 0.00000100
!! - mlp.gate_proj: cuda:0 [Q] scales min/max/std: 0.003469/0.083496/0.001222
!! - mlp.up_proj: cuda:0 [Q] scales min/max/std: 0.003468/0.034821/0.000952
!! - mlp.down_proj: cuda:0 [Q] scales min/max/std: 0.002447/0.204346/0.002012
!! - method: normal
!! Begin decoder 38
!! Begin self-attention
!! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -595.000000, max: 182.000000, std: 3.615234
!! - attn_mask: device: cuda:0, shape: [1, 1, 1920, 1920], dtype: float16, min: -65504.000000, max: 0.000000, std: 32752.000000
!! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.097656, max: 1.039062, std: 0.031891 eps: 0.00000100
!! - self_attn.q_proj: cuda:0 [Q] scales min/max/std: 0.002375/0.045197/0.001980
!! - self_attn.k_proj: cuda:0 [Q] scales min/max/std: 0.002171/0.030624/0.001997
!! - self_attn.v_proj: cuda:0 [Q] scales min/max/std: 0.003450/0.017731/0.001331
!! - self_attn.o_proj: cuda:0 [Q] scales min/max/std: 0.003344/0.227539/0.001991
!! - cache device: cuda:0, seq_len: 0
!! Begin MLP
!! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -95.500000, max: 199.750000, std: 3.875000
!! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.087891, max: 0.498047, std: 0.020370 eps: 0.00000100
!! - mlp.gate_proj: cuda:0 [Q] scales min/max/std: 0.004387/0.031525/0.001246
!! - mlp.up_proj: cuda:0 [Q] scales min/max/std: 0.002453/0.059601/0.001083
!! - mlp.down_proj: cuda:0 [Q] scales min/max/std: 0.003397/0.199585/0.001426
!! - method: normal
!! Begin decoder 39
!! Begin self-attention
!! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -172.500000, max: 207.375000, std: 4.148438
!! - attn_mask: device: cuda:0, shape: [1, 1, 1920, 1920], dtype: float16, min: -65504.000000, max: 0.000000, std: 32752.000000
!! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.002625, max: 0.957031, std: 0.032471 eps: 0.00000100
!! - self_attn.q_proj: cuda:0 [Q] scales min/max/std: 0.002300/0.047607/0.002197
!! - self_attn.k_proj: cuda:0 [Q] scales min/max/std: 0.002066/0.033020/0.002274
!! - self_attn.v_proj: cuda:0 [Q] scales min/max/std: 0.002975/0.016586/0.001257
!! - self_attn.o_proj: cuda:0 [Q] scales min/max/std: 0.003019/0.146851/0.001698
!! - cache device: cuda:0, seq_len: 0
!! Begin MLP
!! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -152.500000, max: 230.750000, std: 4.437500
!! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.109863, max: 0.648438, std: 0.025543 eps: 0.00000100
!! - mlp.gate_proj: cuda:0 [Q] scales min/max/std: 0.002789/0.032501/0.001303
!! - mlp.up_proj: cuda:0 [Q] scales min/max/std: 0.002787/0.085999/0.001245
!! - mlp.down_proj: cuda:0 [Q] scales min/max/std: 0.004478/0.175049/0.001831
!! - method: normal
!! pre norm, hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -191.250000, max: 691.500000, std: 6.925781
!! pre lm_head, hidden_states: device: cuda:0, shape: [1, 1, 5120], dtype: float16, min: -18.781250, max: 24.484375, std: 1.285156
!! logits: device: cuda:0, shape: [1, 1, 32000], dtype: float16, min: -11.500000, max: 10.296875, std: 2.232422
!! Moving logits from cuda:0 to cpu
** Time, Inference: 0.86 seconds
koala-13B-4bit_qwop_cuda_slow.txt
ADDED
@@ -0,0 +1,776 @@
python test_benchmark_inference.py -dbg -d ~/llm_models/koala-13B-GPTQ
-- Loading model
-- Tokenizer: /home/nap/llm_models/koala-13B-GPTQ/tokenizer.model
-- Model config: /home/nap/llm_models/koala-13B-GPTQ/config.json
-- Model: /home/nap/llm_models/koala-13B-GPTQ/koala-13B-4bit_qwop_cuda_slow.safetensors
-- Sequence length: 2048
-- Options: ['attention: switched', 'matmul: switched', 'mlp: switched', 'debug']
!! Available CUDA devices:
!! - cuda:0: NVIDIA GeForce RTX 4090
!! - cuda:1: NVIDIA RTX A6000
!! Loading safetensors file: /home/nap/llm_models/koala-13B-GPTQ/koala-13B-4bit_qwop_cuda_slow.safetensors
!! Begin load tensors
!! - lm_head.weight read: device: cpu, shape: [32000, 5120], dtype: float16
!! - lm_head.weight map: device: cuda:0, shape: [32000, 5120], dtype: float16, min: -0.316406, max: 0.361328, std: 0.020935
!! - model.embed_tokens.weight read: device: cpu, shape: [32000, 5120], dtype: float16
!! - model.embed_tokens.weight map: device: cpu, shape: [32000, 5120], dtype: float16
!! - model.layers.0.input_layernorm.weight read: device: cpu, shape: [5120], dtype: float16
!! - model.layers.0.input_layernorm.weight map: device: cuda:0, shape: [5120], dtype: float16, min: -0.002060, max: 0.742188, std: 0.045593
!! - model.layers.0.mlp.down_proj.g_idx read: device: cpu, shape: [13824], dtype: int32
!! - model.layers.0.mlp.down_proj.g_idx map: device: cuda:0, shape: [13824], dtype: int32, min: 0, max: 107
!! - model.layers.0.mlp.down_proj.qweight read: device: cpu, shape: [1728, 5120], dtype: int32
!! - model.layers.0.mlp.down_proj.qweight map: device: cuda:0, shape: [1728, 5120], dtype: int32, min: -2147416079, max: 2147375608
!! - model.layers.0.mlp.down_proj.qzeros read: device: cpu, shape: [108, 640], dtype: int32
!! - model.layers.0.mlp.down_proj.qzeros map: device: cuda:0, shape: [108, 640], dtype: int32, min: -2106165417, max: 2089191031
!! - model.layers.0.mlp.down_proj.scales read: device: cpu, shape: [108, 5120], dtype: float16
!! - model.layers.0.mlp.down_proj.scales map: device: cuda:0, shape: [108, 5120], dtype: float16, min: 0.003326, max: 0.099487, std: 0.001260
!! - model.layers.0.mlp.gate_proj.g_idx read: device: cpu, shape: [5120], dtype: int32
!! - model.layers.0.mlp.gate_proj.g_idx map: device: cuda:0, shape: [5120], dtype: int32, min: 0, max: 39
!! - model.layers.0.mlp.gate_proj.qweight read: device: cpu, shape: [640, 13824], dtype: int32
!! - model.layers.0.mlp.gate_proj.qweight map: device: cuda:0, shape: [640, 13824], dtype: int32, min: -2147459474, max: 2147466163
!! - model.layers.0.mlp.gate_proj.qzeros read: device: cpu, shape: [40, 1728], dtype: int32
!! - model.layers.0.mlp.gate_proj.qzeros map: device: cuda:0, shape: [40, 1728], dtype: int32, min: -2125109368, max: 2089248375
!! - model.layers.0.mlp.gate_proj.scales read: device: cpu, shape: [40, 13824], dtype: float16
!! - model.layers.0.mlp.gate_proj.scales map: device: cuda:0, shape: [40, 13824], dtype: float16, min: 0.002777, max: 0.060303, std: 0.000990
!! - model.layers.0.mlp.up_proj.g_idx read: device: cpu, shape: [5120], dtype: int32
!! - model.layers.0.mlp.up_proj.g_idx map: device: cuda:0, shape: [5120], dtype: int32, min: 0, max: 39
!! - model.layers.0.mlp.up_proj.qweight read: device: cpu, shape: [640, 13824], dtype: int32
!! - model.layers.0.mlp.up_proj.qweight map: device: cuda:0, shape: [640, 13824], dtype: int32, min: -2147474830, max: 2147437148
!! - model.layers.0.mlp.up_proj.qzeros read: device: cpu, shape: [40, 1728], dtype: int32
!! - model.layers.0.mlp.up_proj.qzeros map: device: cuda:0, shape: [40, 1728], dtype: int32, min: -2107213722, max: 2089121671
!! - model.layers.0.mlp.up_proj.scales read: device: cpu, shape: [40, 13824], dtype: float16
!! - model.layers.0.mlp.up_proj.scales map: device: cuda:0, shape: [40, 13824], dtype: float16, min: 0.002075, max: 0.040131, std: 0.000730
!! - model.layers.0.post_attention_layernorm.weight read: device: cpu, shape: [5120], dtype: float16
!! - model.layers.0.post_attention_layernorm.weight map: device: cuda:0, shape: [5120], dtype: float16, min: -0.035889, max: 0.361328, std: 0.016113
!! - model.layers.0.self_attn.k_proj.g_idx read: device: cpu, shape: [5120], dtype: int32
!! - model.layers.0.self_attn.k_proj.g_idx map: device: cuda:0, shape: [5120], dtype: int32, min: 0, max: 39
!! - model.layers.0.self_attn.k_proj.qweight read: device: cpu, shape: [640, 5120], dtype: int32
!! - model.layers.0.self_attn.k_proj.qweight map: device: cuda:0, shape: [640, 5120], dtype: int32, min: -2147305928, max: 2147337675
!! - model.layers.0.self_attn.k_proj.qzeros read: device: cpu, shape: [40, 640], dtype: int32
!! - model.layers.0.self_attn.k_proj.qzeros map: device: cuda:0, shape: [40, 640], dtype: int32, min: -2128119278, max: 2092336937
!! - model.layers.0.self_attn.k_proj.scales read: device: cpu, shape: [40, 5120], dtype: float16
!! - model.layers.0.self_attn.k_proj.scales map: device: cuda:0, shape: [40, 5120], dtype: float16, min: 0.001449, max: 0.082703, std: 0.005592
!! - model.layers.0.self_attn.o_proj.g_idx read: device: cpu, shape: [5120], dtype: int32
!! - model.layers.0.self_attn.o_proj.g_idx map: device: cuda:0, shape: [5120], dtype: int32, min: 0, max: 39
!! - model.layers.0.self_attn.o_proj.qweight read: device: cpu, shape: [640, 5120], dtype: int32
!! - model.layers.0.self_attn.o_proj.qweight map: device: cuda:0, shape: [640, 5120], dtype: int32, min: -2147453144, max: 2147375548
!! - model.layers.0.self_attn.o_proj.qzeros read: device: cpu, shape: [40, 640], dtype: int32
!! - model.layers.0.self_attn.o_proj.qzeros map: device: cuda:0, shape: [40, 640], dtype: int32, min: -2107209387, max: 2071422582
!! - model.layers.0.self_attn.o_proj.scales read: device: cpu, shape: [40, 5120], dtype: float16
!! - model.layers.0.self_attn.o_proj.scales map: device: cuda:0, shape: [40, 5120], dtype: float16, min: 0.001521, max: 0.089478, std: 0.001425
!! - model.layers.0.self_attn.q_proj.g_idx read: device: cpu, shape: [5120], dtype: int32
!! - model.layers.0.self_attn.q_proj.g_idx map: device: cuda:0, shape: [5120], dtype: int32, min: 0, max: 39
!! - model.layers.0.self_attn.q_proj.qweight read: device: cpu, shape: [640, 5120], dtype: int32
!! - model.layers.0.self_attn.q_proj.qweight map: device: cuda:0, shape: [640, 5120], dtype: int32, min: -2147399309, max: 2147314245
!! - model.layers.0.self_attn.q_proj.qzeros read: device: cpu, shape: [40, 640], dtype: int32
!! - model.layers.0.self_attn.q_proj.qzeros map: device: cuda:0, shape: [40, 640], dtype: int32, min: -2128450726, max: 2092123285
!! - model.layers.0.self_attn.q_proj.scales read: device: cpu, shape: [40, 5120], dtype: float16
!! - model.layers.0.self_attn.q_proj.scales map: device: cuda:0, shape: [40, 5120], dtype: float16, min: 0.001049, max: 0.095764, std: 0.005581
!! - model.layers.0.self_attn.v_proj.g_idx read: device: cpu, shape: [5120], dtype: int32
!! - model.layers.0.self_attn.v_proj.g_idx map: device: cuda:0, shape: [5120], dtype: int32, min: 0, max: 39
!! - model.layers.0.self_attn.v_proj.qweight read: device: cpu, shape: [640, 5120], dtype: int32
!! - model.layers.0.self_attn.v_proj.qweight map: device: cuda:0, shape: [640, 5120], dtype: int32, min: -2147441095, max: 2147387755
!! - model.layers.0.self_attn.v_proj.qzeros read: device: cpu, shape: [40, 640], dtype: int32
!! - model.layers.0.self_attn.v_proj.qzeros map: device: cuda:0, shape: [40, 640], dtype: int32, min: -2091420041, max: 2071422327
!! - model.layers.0.self_attn.v_proj.scales read: device: cpu, shape: [40, 5120], dtype: float16
!! - model.layers.0.self_attn.v_proj.scales map: device: cuda:0, shape: [40, 5120], dtype: float16, min: 0.001673, max: 0.015762, std: 0.001489
!! - model.norm.weight read: device: cpu, shape: [5120], dtype: float16
!! - model.norm.weight map: device: cuda:0, shape: [5120], dtype: float16, min: 0.018066, max: 2.093750, std: 0.073120
!! Computing RoPE table for seq length: 2048
!! - stored for device: cuda:0
** Time, Load model: 3.72 seconds
-- Groupsize (inferred): 128
-- Act-order (inferred): yes
** VRAM, Model: [cuda:0] 6,689.96 MB - [cuda:1] 0.00 MB
!! Inference, debug pass
!! Begin forward pass
!! Moving input_ids from cuda:0 to cpu
!! Built initial hidden state: device: cpu, shape: [1, 1920, 5120], dtype: float16, min: -0.117676, max: 0.114746, std: 0.018738
!! Prepared buffer for device: cuda:0
!! - attn_mask: device: cuda:0, shape: [1, 1, 1920, 1920], dtype: float16, min: -65504.000000, max: 0.000000, std: 32752.000000
!! Moving hidden_states from cpu to cuda:0
!! Begin decoder 0
!! Begin self-attention
!! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -0.117676, max: 0.114746, std: 0.018738
!! - attn_mask: device: cuda:0, shape: [1, 1, 1920, 1920], dtype: float16, min: -65504.000000, max: 0.000000, std: 32752.000000
!! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: -0.002060, max: 0.742188, std: 0.045593 eps: 0.00000100
!! - self_attn.q_proj: cuda:0 [Q,x_map] scales min/max/std: 0.001049/0.095764/0.005581
!! - self_attn.k_proj: cuda:0 [Q,x_map] scales min/max/std: 0.001449/0.082703/0.005592
!! - self_attn.v_proj: cuda:0 [Q,x_map] scales min/max/std: 0.001673/0.015762/0.001489
!! - self_attn.o_proj: cuda:0 [Q,x_map] scales min/max/std: 0.001521/0.089478/0.001425
!! - cache device: cuda:0, seq_len: 0
!! Begin MLP
!! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -1.013672, max: 1.294922, std: 0.035309
!! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: -0.035889, max: 0.361328, std: 0.016113 eps: 0.00000100
!! - mlp.gate_proj: cuda:0 [Q,x_map] scales min/max/std: 0.002777/0.060303/0.000990
!! - mlp.up_proj: cuda:0 [Q,x_map] scales min/max/std: 0.002075/0.040131/0.000730
!! - mlp.down_proj: cuda:0 [Q,x_map] scales min/max/std: 0.003326/0.099487/0.001260
!! - method: normal
!! Begin decoder 1
!! Begin self-attention
!! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -9.375000, max: 35.843750, std: 0.119446
!! - attn_mask: device: cuda:0, shape: [1, 1, 1920, 1920], dtype: float16, min: -65504.000000, max: 0.000000, std: 32752.000000
!! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.012146, max: 0.326172, std: 0.022308 eps: 0.00000100
!! - self_attn.q_proj: cuda:0 [Q,x_map] scales min/max/std: 0.001299/0.042847/0.005116
!! - self_attn.k_proj: cuda:0 [Q,x_map] scales min/max/std: 0.001262/0.056030/0.005295
!! - self_attn.v_proj: cuda:0 [Q,x_map] scales min/max/std: 0.001407/0.011436/0.001119
!! - self_attn.o_proj: cuda:0 [Q,x_map] scales min/max/std: 0.001063/0.086609/0.001472
!! - cache device: cuda:0, seq_len: 0
!! Begin MLP
!! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -9.039062, max: 33.656250, std: 0.116211
!! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.003036, max: 0.166016, std: 0.010605 eps: 0.00000100
!! - mlp.gate_proj: cuda:0 [Q,x_map] scales min/max/std: 0.003960/0.075562/0.001144
!! - mlp.up_proj: cuda:0 [Q,x_map] scales min/max/std: 0.003387/0.035187/0.000851
!! - mlp.down_proj: cuda:0 [Q,x_map] scales min/max/std: 0.002762/0.120483/0.001154
!! - method: normal
!! Begin decoder 2
!! Begin self-attention
!! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -11.812500, max: 30.734375, std: 0.155029
!! - attn_mask: device: cuda:0, shape: [1, 1, 1920, 1920], dtype: float16, min: -65504.000000, max: 0.000000, std: 32752.000000
!! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.057617, max: 0.369141, std: 0.015396 eps: 0.00000100
!! - self_attn.q_proj: cuda:0 [Q,x_map] scales min/max/std: 0.002361/0.074585/0.003971
!! - self_attn.k_proj: cuda:0 [Q,x_map] scales min/max/std: 0.001963/0.050629/0.004532
!! - self_attn.v_proj: cuda:0 [Q,x_map] scales min/max/std: 0.002445/0.020309/0.000759
!! - self_attn.o_proj: cuda:0 [Q,x_map] scales min/max/std: 0.002083/0.110596/0.001124
!! - cache device: cuda:0, seq_len: 0
!! Begin MLP
!! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -11.648438, max: 26.859375, std: 0.158203
!! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.014099, max: 0.161133, std: 0.011726 eps: 0.00000100
!! - mlp.gate_proj: cuda:0 [Q,x_map] scales min/max/std: 0.002787/0.087097/0.001152
!! - mlp.up_proj: cuda:0 [Q,x_map] scales min/max/std: 0.003202/0.043213/0.000878
!! - mlp.down_proj: cuda:0 [Q,x_map] scales min/max/std: 0.002434/0.133301/0.001044
!! - method: normal
!! Begin decoder 3
!! Begin self-attention
!! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -890.500000, max: 24.171875, std: 0.338135
!! - attn_mask: device: cuda:0, shape: [1, 1, 1920, 1920], dtype: float16, min: -65504.000000, max: 0.000000, std: 32752.000000
!! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.033203, max: 0.445312, std: 0.016769 eps: 0.00000100
!! - self_attn.q_proj: cuda:0 [Q,x_map] scales min/max/std: 0.002218/0.064087/0.003193
!! - self_attn.k_proj: cuda:0 [Q,x_map] scales min/max/std: 0.001682/0.047546/0.003334
!! - self_attn.v_proj: cuda:0 [Q,x_map] scales min/max/std: 0.002258/0.013161/0.000889
!! - self_attn.o_proj: cuda:0 [Q,x_map] scales min/max/std: 0.001929/0.086182/0.001017
!! - cache device: cuda:0, seq_len: 0
!! Begin MLP
!! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -890.500000, max: 25.640625, std: 0.342529
!! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.020508, max: 0.185547, std: 0.012711 eps: 0.00000100
!! - mlp.gate_proj: cuda:0 [Q,x_map] scales min/max/std: 0.002598/0.055603/0.001158
!! - mlp.up_proj: cuda:0 [Q,x_map] scales min/max/std: 0.002819/0.043365/0.000893
!! - mlp.down_proj: cuda:0 [Q,x_map] scales min/max/std: 0.002668/0.083008/0.000952
!! - method: normal
!! Begin decoder 4
!! Begin self-attention
!! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -892.500000, max: 24.625000, std: 0.366211
!! - attn_mask: device: cuda:0, shape: [1, 1, 1920, 1920], dtype: float16, min: -65504.000000, max: 0.000000, std: 32752.000000
!! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.036621, max: 0.458984, std: 0.017136 eps: 0.00000100
!! - self_attn.q_proj: cuda:0 [Q,x_map] scales min/max/std: 0.002357/0.124084/0.003180
!! - self_attn.k_proj: cuda:0 [Q,x_map] scales min/max/std: 0.001328/0.042419/0.003229
!! - self_attn.v_proj: cuda:0 [Q,x_map] scales min/max/std: 0.002598/0.018280/0.000826
!! - self_attn.o_proj: cuda:0 [Q,x_map] scales min/max/std: 0.001725/0.085449/0.000918
!! - cache device: cuda:0, seq_len: 0
!! Begin MLP
!! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -892.500000, max: 28.000000, std: 0.385742
!! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.025391, max: 0.200195, std: 0.012398 eps: 0.00000100
!! - mlp.gate_proj: cuda:0 [Q,x_map] scales min/max/std: 0.003830/0.047241/0.001214
!! - mlp.up_proj: cuda:0 [Q,x_map] scales min/max/std: 0.003572/0.041473/0.000900
!! - mlp.down_proj: cuda:0 [Q,x_map] scales min/max/std: 0.002481/0.095337/0.000922
!! - method: normal
!! Begin decoder 5
!! Begin self-attention
!! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -893.000000, max: 25.609375, std: 0.400879
!! - attn_mask: device: cuda:0, shape: [1, 1, 1920, 1920], dtype: float16, min: -65504.000000, max: 0.000000, std: 32752.000000
!! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.033203, max: 0.492188, std: 0.019684 eps: 0.00000100
!! - self_attn.q_proj: cuda:0 [Q,x_map] scales min/max/std: 0.001987/0.102661/0.003073
!! - self_attn.k_proj: cuda:0 [Q,x_map] scales min/max/std: 0.001550/0.035492/0.003050
!! - self_attn.v_proj: cuda:0 [Q,x_map] scales min/max/std: 0.002256/0.016541/0.000906
!! - self_attn.o_proj: cuda:0 [Q,x_map] scales min/max/std: 0.002275/0.106079/0.001011
!! - cache device: cuda:0, seq_len: 0
!! Begin MLP
!! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -893.000000, max: 29.265625, std: 0.418213
!! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.042236, max: 0.211914, std: 0.011848 eps: 0.00000100
!! - mlp.gate_proj: cuda:0 [Q,x_map] scales min/max/std: 0.002773/0.047150/0.001265
!! - mlp.up_proj: cuda:0 [Q,x_map] scales min/max/std: 0.001515/0.041870/0.000920
!! - mlp.down_proj: cuda:0 [Q,x_map] scales min/max/std: 0.002594/0.062195/0.000935
!! - method: normal
!! Begin decoder 6
!! Begin self-attention
!! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -894.500000, max: 26.140625, std: 0.445312
!! - attn_mask: device: cuda:0, shape: [1, 1, 1920, 1920], dtype: float16, min: -65504.000000, max: 0.000000, std: 32752.000000
!! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.067871, max: 0.558594, std: 0.019913 eps: 0.00000100
!! - self_attn.q_proj: cuda:0 [Q,x_map] scales min/max/std: 0.002136/0.046173/0.003099
!! - self_attn.k_proj: cuda:0 [Q,x_map] scales min/max/std: 0.001863/0.033478/0.003153
!! - self_attn.v_proj: cuda:0 [Q,x_map] scales min/max/std: 0.002909/0.020889/0.000928
!! - self_attn.o_proj: cuda:0 [Q,x_map] scales min/max/std: 0.001761/0.096313/0.001001
!! - cache device: cuda:0, seq_len: 0
!! Begin MLP
!! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -895.000000, max: 25.453125, std: 0.462891
!! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.038574, max: 0.244141, std: 0.012810 eps: 0.00000100
!! - mlp.gate_proj: cuda:0 [Q,x_map] scales min/max/std: 0.003599/0.058990/0.001412
!! - mlp.up_proj: cuda:0 [Q,x_map] scales min/max/std: 0.003576/0.044037/0.000947
!! - mlp.down_proj: cuda:0 [Q,x_map] scales min/max/std: 0.002380/0.090454/0.001029
!! - method: normal
!! Begin decoder 7
!! Begin self-attention
!! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -1028.000000, max: 79.125000, std: 0.513672
!! - attn_mask: device: cuda:0, shape: [1, 1, 1920, 1920], dtype: float16, min: -65504.000000, max: 0.000000, std: 32752.000000
!! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: -0.010315, max: 0.609375, std: 0.018875 eps: 0.00000100
!! - self_attn.q_proj: cuda:0 [Q,x_map] scales min/max/std: 0.002357/0.038116/0.002750
!! - self_attn.k_proj: cuda:0 [Q,x_map] scales min/max/std: 0.002035/0.030289/0.002897
!! - self_attn.v_proj: cuda:0 [Q,x_map] scales min/max/std: 0.002699/0.013130/0.000939
!! - self_attn.o_proj: cuda:0 [Q,x_map] scales min/max/std: 0.001756/0.065430/0.000955
!! - cache device: cuda:0, seq_len: 0
!! Begin MLP
!! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -1028.000000, max: 79.312500, std: 0.554688
!! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.043701, max: 0.222656, std: 0.011360 eps: 0.00000100
!! - mlp.gate_proj: cuda:0 [Q,x_map] scales min/max/std: 0.003187/0.053528/0.001369
!! - mlp.up_proj: cuda:0 [Q,x_map] scales min/max/std: 0.003983/0.029083/0.000935
!! - mlp.down_proj: cuda:0 [Q,x_map] scales min/max/std: 0.002668/0.070984/0.000947
!! - method: normal
!! Begin decoder 8
!! Begin self-attention
!! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -1028.000000, max: 79.375000, std: 0.583008
!! - attn_mask: device: cuda:0, shape: [1, 1, 1920, 1920], dtype: float16, min: -65504.000000, max: 0.000000, std: 32752.000000
!! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.007812, max: 0.617188, std: 0.021469 eps: 0.00000100
!! - self_attn.q_proj: cuda:0 [Q,x_map] scales min/max/std: 0.002020/0.036896/0.003115
!! - self_attn.k_proj: cuda:0 [Q,x_map] scales min/max/std: 0.001634/0.027725/0.003042
!! - self_attn.v_proj: cuda:0 [Q,x_map] scales min/max/std: 0.003176/0.019165/0.000947
!! - self_attn.o_proj: cuda:0 [Q,x_map] scales min/max/std: 0.001910/0.084106/0.000935
!! - cache device: cuda:0, seq_len: 0
!! Begin MLP
!! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -1028.000000, max: 79.500000, std: 0.605469
!! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.033203, max: 0.228516, std: 0.012070 eps: 0.00000100
!! - mlp.gate_proj: cuda:0 [Q,x_map] scales min/max/std: 0.003246/0.053589/0.001263
!! - mlp.up_proj: cuda:0 [Q,x_map] scales min/max/std: 0.001094/0.036316/0.000944
!! - mlp.down_proj: cuda:0 [Q,x_map] scales min/max/std: 0.002659/0.075378/0.000929
!! - method: normal
!! Begin decoder 9
!! Begin self-attention
!! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -1028.000000, max: 79.687500, std: 0.612305
!! - attn_mask: device: cuda:0, shape: [1, 1, 1920, 1920], dtype: float16, min: -65504.000000, max: 0.000000, std: 32752.000000
!! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.003876, max: 0.664062, std: 0.020859 eps: 0.00000100
!! - self_attn.q_proj: cuda:0 [Q,x_map] scales min/max/std: 0.002146/0.038910/0.002712
!! - self_attn.k_proj: cuda:0 [Q,x_map] scales min/max/std: 0.001664/0.032074/0.002876
!! - self_attn.v_proj: cuda:0 [Q,x_map] scales min/max/std: 0.003122/0.015617/0.000871
!! - self_attn.o_proj: cuda:0 [Q,x_map] scales min/max/std: 0.001311/0.095337/0.000900
!! - cache device: cuda:0, seq_len: 0
!! Begin MLP
!! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -1028.000000, max: 79.750000, std: 0.625000
!! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.049805, max: 0.238281, std: 0.011787 eps: 0.00000100
!! - mlp.gate_proj: cuda:0 [Q,x_map] scales min/max/std: 0.003241/0.061310/0.001322
!! - mlp.up_proj: cuda:0 [Q,x_map] scales min/max/std: 0.003132/0.040771/0.000956
!! - mlp.down_proj: cuda:0 [Q,x_map] scales min/max/std: 0.002480/0.081299/0.000928
!! - method: normal
!! Begin decoder 10
!! Begin self-attention
!! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -1028.000000, max: 79.812500, std: 0.635742
!! - attn_mask: device: cuda:0, shape: [1, 1, 1920, 1920], dtype: float16, min: -65504.000000, max: 0.000000, std: 32752.000000
!! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.002594, max: 0.703125, std: 0.021515 eps: 0.00000100
!! - self_attn.q_proj: cuda:0 [Q,x_map] scales min/max/std: 0.002222/0.033997/0.002638
!! - self_attn.k_proj: cuda:0 [Q,x_map] scales min/max/std: 0.001856/0.029907/0.002831
!! - self_attn.v_proj: cuda:0 [Q,x_map] scales min/max/std: 0.003365/0.014862/0.000932
!! - self_attn.o_proj: cuda:0 [Q,x_map] scales min/max/std: 0.001518/0.084351/0.000958
!! - cache device: cuda:0, seq_len: 0
!! Begin MLP
!! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -1028.000000, max: 79.875000, std: 0.654785
!! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.053955, max: 0.245117, std: 0.011978 eps: 0.00000100
!! - mlp.gate_proj: cuda:0 [Q,x_map] scales min/max/std: 0.003246/0.042297/0.001295
!! - mlp.up_proj: cuda:0 [Q,x_map] scales min/max/std: 0.003368/0.040710/0.000970
!! - mlp.down_proj: cuda:0 [Q,x_map] scales min/max/std: 0.002800/0.089050/0.000934
!! - method: normal
!! Begin decoder 11
!! Begin self-attention
!! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -1028.000000, max: 79.937500, std: 0.669922
!! - attn_mask: device: cuda:0, shape: [1, 1, 1920, 1920], dtype: float16, min: -65504.000000, max: 0.000000, std: 32752.000000
!! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.007355, max: 0.687500, std: 0.021606 eps: 0.00000100
!! - self_attn.q_proj: cuda:0 [Q,x_map] scales min/max/std: 0.002106/0.034271/0.002579
!! - self_attn.k_proj: cuda:0 [Q,x_map] scales min/max/std: 0.002033/0.028885/0.002792
!! - self_attn.v_proj: cuda:0 [Q,x_map] scales min/max/std: 0.003374/0.014481/0.000937
!! - self_attn.o_proj: cuda:0 [Q,x_map] scales min/max/std: 0.001925/0.075500/0.000946
!! - cache device: cuda:0, seq_len: 0
!! Begin MLP
!! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -1028.000000, max: 80.000000, std: 0.694336
!! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.054443, max: 0.251953, std: 0.011749 eps: 0.00000100
!! - mlp.gate_proj: cuda:0 [Q,x_map] scales min/max/std: 0.003128/0.051086/0.001299
!! - mlp.up_proj: cuda:0 [Q,x_map] scales min/max/std: 0.001537/0.041565/0.000993
!! - mlp.down_proj: cuda:0 [Q,x_map] scales min/max/std: 0.003239/0.079163/0.000940
!! - method: normal
!! Begin decoder 12
!! Begin self-attention
!! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -1028.000000, max: 80.062500, std: 0.726074
!! - attn_mask: device: cuda:0, shape: [1, 1, 1920, 1920], dtype: float16, min: -65504.000000, max: 0.000000, std: 32752.000000
!! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.014771, max: 0.664062, std: 0.020920 eps: 0.00000100
!! - self_attn.q_proj: cuda:0 [Q,x_map] scales min/max/std: 0.002449/0.034271/0.002655
!! - self_attn.k_proj: cuda:0 [Q,x_map] scales min/max/std: 0.002136/0.032806/0.002867
!! - self_attn.v_proj: cuda:0 [Q,x_map] scales min/max/std: 0.003397/0.019394/0.000961
!! - self_attn.o_proj: cuda:0 [Q,x_map] scales min/max/std: 0.001609/0.057343/0.000999
!! - cache device: cuda:0, seq_len: 0
!! Begin MLP
!! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -1029.000000, max: 80.250000, std: 0.751953
!! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.056396, max: 0.249023, std: 0.012207 eps: 0.00000100
!! - mlp.gate_proj: cuda:0 [Q,x_map] scales min/max/std: 0.003019/0.043274/0.001330
!! - mlp.up_proj: cuda:0 [Q,x_map] scales min/max/std: 0.002712/0.043762/0.001000
!! - mlp.down_proj: cuda:0 [Q,x_map] scales min/max/std: 0.003359/0.118286/0.000953
!! - method: normal
!! Begin decoder 13
!! Begin self-attention
!! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -1029.000000, max: 80.312500, std: 0.787598
!! - attn_mask: device: cuda:0, shape: [1, 1, 1920, 1920], dtype: float16, min: -65504.000000, max: 0.000000, std: 32752.000000
!! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.031982, max: 0.687500, std: 0.021698 eps: 0.00000100
!! - self_attn.q_proj: cuda:0 [Q,x_map] scales min/max/std: 0.002420/0.034241/0.002577
!! - self_attn.k_proj: cuda:0 [Q,x_map] scales min/max/std: 0.002388/0.034241/0.002741
!! - self_attn.v_proj: cuda:0 [Q,x_map] scales min/max/std: 0.003078/0.015854/0.000962
!! - self_attn.o_proj: cuda:0 [Q,x_map] scales min/max/std: 0.002022/0.078918/0.000970
!! - cache device: cuda:0, seq_len: 0
!! Begin MLP
!! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -1029.000000, max: 80.500000, std: 0.809570
!! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.051025, max: 0.265625, std: 0.012978 eps: 0.00000100
!! - mlp.gate_proj: cuda:0 [Q,x_map] scales min/max/std: 0.003170/0.036652/0.001327
!! - mlp.up_proj: cuda:0 [Q,x_map] scales min/max/std: 0.004108/0.028717/0.000996
!! - mlp.down_proj: cuda:0 [Q,x_map] scales min/max/std: 0.002531/0.052429/0.000926
!! - method: normal
!! Begin decoder 14
!! Begin self-attention
!! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -1029.000000, max: 80.562500, std: 0.849121
!! - attn_mask: device: cuda:0, shape: [1, 1, 1920, 1920], dtype: float16, min: -65504.000000, max: 0.000000, std: 32752.000000
!! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.025879, max: 0.691406, std: 0.021164 eps: 0.00000100
!! - self_attn.q_proj: cuda:0 [Q,x_map] scales min/max/std: 0.002115/0.035156/0.002348
!! - self_attn.k_proj: cuda:0 [Q,x_map] scales min/max/std: 0.001471/0.031067/0.002569
!! - self_attn.v_proj: cuda:0 [Q,x_map] scales min/max/std: 0.003618/0.020035/0.000957
!! - self_attn.o_proj: cuda:0 [Q,x_map] scales min/max/std: 0.001540/0.086060/0.000992
!! - cache device: cuda:0, seq_len: 0
!! Begin MLP
!! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -1029.000000, max: 80.687500, std: 0.866699
!! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.055420, max: 0.273438, std: 0.013245 eps: 0.00000100
!! - mlp.gate_proj: cuda:0 [Q,x_map] scales min/max/std: 0.003336/0.032928/0.001335
!! - mlp.up_proj: cuda:0 [Q,x_map] scales min/max/std: 0.003906/0.045197/0.000993
!! - mlp.down_proj: cuda:0 [Q,x_map] scales min/max/std: 0.002605/0.088013/0.000936
!! - method: normal
!! Begin decoder 15
!! Begin self-attention
!! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -1029.000000, max: 80.687500, std: 0.916016
!! - attn_mask: device: cuda:0, shape: [1, 1, 1920, 1920], dtype: float16, min: -65504.000000, max: 0.000000, std: 32752.000000
!! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.031494, max: 0.679688, std: 0.020615 eps: 0.00000100
!! - self_attn.q_proj: cuda:0 [Q,x_map] scales min/max/std: 0.002296/0.038727/0.002529
!! - self_attn.k_proj: cuda:0 [Q,x_map] scales min/max/std: 0.002375/0.030533/0.002689
!! - self_attn.v_proj: cuda:0 [Q,x_map] scales min/max/std: 0.003328/0.015869/0.000980
!! - self_attn.o_proj: cuda:0 [Q,x_map] scales min/max/std: 0.001546/0.124634/0.001021
!! - cache device: cuda:0, seq_len: 0
!! Begin MLP
!! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -1029.000000, max: 80.750000, std: 0.945801
!! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.040039, max: 0.291016, std: 0.014809 eps: 0.00000100
!! - mlp.gate_proj: cuda:0 [Q,x_map] scales min/max/std: 0.003687/0.051025/0.001274
!! - mlp.up_proj: cuda:0 [Q,x_map] scales min/max/std: 0.004307/0.041656/0.000965
!! - mlp.down_proj: cuda:0 [Q,x_map] scales min/max/std: 0.002167/0.078613/0.000919
!! - method: normal
!! Begin decoder 16
!! Begin self-attention
!! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -1029.000000, max: 80.750000, std: 0.993164
!! - attn_mask: device: cuda:0, shape: [1, 1, 1920, 1920], dtype: float16, min: -65504.000000, max: 0.000000, std: 32752.000000
!! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.012573, max: 0.652344, std: 0.020477 eps: 0.00000100
!! - self_attn.q_proj: cuda:0 [Q,x_map] scales min/max/std: 0.002371/0.034912/0.002207
!! - self_attn.k_proj: cuda:0 [Q,x_map] scales min/max/std: 0.001926/0.029617/0.002392
!! - self_attn.v_proj: cuda:0 [Q,x_map] scales min/max/std: 0.003460/0.018524/0.000947
!! - self_attn.o_proj: cuda:0 [Q,x_map] scales min/max/std: 0.001738/0.051270/0.000971
!! - cache device: cuda:0, seq_len: 0
!! Begin MLP
!! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -1029.000000, max: 80.812500, std: 1.004883
!! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.045898, max: 0.298828, std: 0.015106 eps: 0.00000100
!! - mlp.gate_proj: cuda:0 [Q,x_map] scales min/max/std: 0.003387/0.036011/0.001249
!! - mlp.up_proj: cuda:0 [Q,x_map] scales min/max/std: 0.003696/0.035187/0.000964
!! - mlp.down_proj: cuda:0 [Q,x_map] scales min/max/std: 0.002268/0.065063/0.000917
!! - method: normal
!! Begin decoder 17
!! Begin self-attention
!! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -1029.000000, max: 80.812500, std: 1.059570
!! - attn_mask: device: cuda:0, shape: [1, 1, 1920, 1920], dtype: float16, min: -65504.000000, max: 0.000000, std: 32752.000000
!! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.025146, max: 0.722656, std: 0.021576 eps: 0.00000100
!! - self_attn.q_proj: cuda:0 [Q,x_map] scales min/max/std: 0.002331/0.036224/0.002277
!! - self_attn.k_proj: cuda:0 [Q,x_map] scales min/max/std: 0.001755/0.030884/0.002550
!! - self_attn.v_proj: cuda:0 [Q,x_map] scales min/max/std: 0.003754/0.020874/0.000970
!! - self_attn.o_proj: cuda:0 [Q,x_map] scales min/max/std: 0.001672/0.116455/0.001009
!! - cache device: cuda:0, seq_len: 0
!! Begin MLP
!! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -1029.000000, max: 80.937500, std: 1.098633
!! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.042969, max: 0.310547, std: 0.015625 eps: 0.00000100
!! - mlp.gate_proj: cuda:0 [Q,x_map] scales min/max/std: 0.003586/0.035492/0.001222
!! - mlp.up_proj: cuda:0 [Q,x_map] scales min/max/std: 0.004265/0.044525/0.000955
!! - mlp.down_proj: cuda:0 [Q,x_map] scales min/max/std: 0.002222/0.067993/0.000917
!! - method: normal
!! Begin decoder 18
!! Begin self-attention
!! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -1029.000000, max: 80.875000, std: 1.152344
!! - attn_mask: device: cuda:0, shape: [1, 1, 1920, 1920], dtype: float16, min: -65504.000000, max: 0.000000, std: 32752.000000
!! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.029907, max: 0.738281, std: 0.022064 eps: 0.00000100
!! - self_attn.q_proj: cuda:0 [Q,x_map] scales min/max/std: 0.002323/0.033447/0.002235
!! - self_attn.k_proj: cuda:0 [Q,x_map] scales min/max/std: 0.001904/0.030121/0.002382
!! - self_attn.v_proj: cuda:0 [Q,x_map] scales min/max/std: 0.004002/0.014252/0.000932
!! - self_attn.o_proj: cuda:0 [Q,x_map] scales min/max/std: 0.001740/0.083801/0.000958
!! - cache device: cuda:0, seq_len: 0
!! Begin MLP
!! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -1029.000000, max: 80.937500, std: 1.186523
!! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.048584, max: 0.318359, std: 0.015625 eps: 0.00000100
!! - mlp.gate_proj: cuda:0 [Q,x_map] scales min/max/std: 0.003035/0.034271/0.001252
!! - mlp.up_proj: cuda:0 [Q,x_map] scales min/max/std: 0.003998/0.045654/0.000957
!! - mlp.down_proj: cuda:0 [Q,x_map] scales min/max/std: 0.002491/0.084534/0.000911
!! - method: normal
!! Begin decoder 19
!! Begin self-attention
!! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -1029.000000, max: 80.937500, std: 1.258789
!! - attn_mask: device: cuda:0, shape: [1, 1, 1920, 1920], dtype: float16, min: -65504.000000, max: 0.000000, std: 32752.000000
!! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.024170, max: 0.753906, std: 0.022308 eps: 0.00000100
!! - self_attn.q_proj: cuda:0 [Q,x_map] scales min/max/std: 0.002134/0.031494/0.002193
!! - self_attn.k_proj: cuda:0 [Q,x_map] scales min/max/std: 0.001934/0.030380/0.002371
!! - self_attn.v_proj: cuda:0 [Q,x_map] scales min/max/std: 0.003841/0.015404/0.000981
!! - self_attn.o_proj: cuda:0 [Q,x_map] scales min/max/std: 0.001974/0.084167/0.001057
!! - cache device: cuda:0, seq_len: 0
!! Begin MLP
!! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -1029.000000, max: 80.937500, std: 1.287109
!! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.033936, max: 0.347656, std: 0.016785 eps: 0.00000100
!! - mlp.gate_proj: cuda:0 [Q,x_map] scales min/max/std: 0.003767/0.040405/0.001213
!! - mlp.up_proj: cuda:0 [Q,x_map] scales min/max/std: 0.004185/0.043823/0.000943
!! - mlp.down_proj: cuda:0 [Q,x_map] scales min/max/std: 0.002474/0.062683/0.000900
!! - method: normal
!! Begin decoder 20
|
433 |
+
!! Begin self-attention
|
434 |
+
!! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -1029.000000, max: 80.875000, std: 1.358398
|
435 |
+
!! - attn_mask: device: cuda:0, shape: [1, 1, 1920, 1920], dtype: float16, min: -65504.000000, max: 0.000000, std: 32752.000000
|
436 |
+
!! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.037354, max: 0.757812, std: 0.022324 eps: 0.00000100
|
437 |
+
!! - self_attn.q_proj: cuda:0 [Q,x_map] scales min/max/std: 0.002235/0.035187/0.002100
|
438 |
+
!! - self_attn.k_proj: cuda:0 [Q,x_map] scales min/max/std: 0.002291/0.032471/0.002190
|
439 |
+
!! - self_attn.v_proj: cuda:0 [Q,x_map] scales min/max/std: 0.003658/0.014191/0.001044
|
440 |
+
!! - self_attn.o_proj: cuda:0 [Q,x_map] scales min/max/std: 0.001817/0.078064/0.001065
|
441 |
+
!! - cache device: cuda:0, seq_len: 0
|
442 |
+
!! Begin MLP
|
443 |
+
!! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -1029.000000, max: 80.875000, std: 1.393555
|
444 |
+
!! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.048096, max: 0.345703, std: 0.016815 eps: 0.00000100
|
445 |
+
!! - mlp.gate_proj: cuda:0 [Q,x_map] scales min/max/std: 0.003643/0.044281/0.001211
|
446 |
+
!! - mlp.up_proj: cuda:0 [Q,x_map] scales min/max/std: 0.004276/0.048615/0.000933
|
447 |
+
!! - mlp.down_proj: cuda:0 [Q,x_map] scales min/max/std: 0.002605/0.067444/0.000911
|
448 |
+
!! - method: normal
|
449 |
+
!! Begin decoder 21
|
450 |
+
!! Begin self-attention
|
451 |
+
!! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -1029.000000, max: 80.875000, std: 1.483398
|
452 |
+
!! - attn_mask: device: cuda:0, shape: [1, 1, 1920, 1920], dtype: float16, min: -65504.000000, max: 0.000000, std: 32752.000000
|
453 |
+
!! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.037598, max: 0.796875, std: 0.023514 eps: 0.00000100
|
454 |
+
!! - self_attn.q_proj: cuda:0 [Q,x_map] scales min/max/std: 0.002506/0.043945/0.002247
|
455 |
+
!! - self_attn.k_proj: cuda:0 [Q,x_map] scales min/max/std: 0.002028/0.031616/0.002365
|
456 |
+
!! - self_attn.v_proj: cuda:0 [Q,x_map] scales min/max/std: 0.004189/0.014427/0.001028
|
457 |
+
!! - self_attn.o_proj: cuda:0 [Q,x_map] scales min/max/std: 0.001978/0.039856/0.001017
|
458 |
+
!! - cache device: cuda:0, seq_len: 0
|
459 |
+
!! Begin MLP
|
460 |
+
!! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -1029.000000, max: 80.875000, std: 1.525391
|
461 |
+
!! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.044922, max: 0.347656, std: 0.017212 eps: 0.00000100
|
462 |
+
!! - mlp.gate_proj: cuda:0 [Q,x_map] scales min/max/std: 0.003614/0.052155/0.001178
|
463 |
+
!! - mlp.up_proj: cuda:0 [Q,x_map] scales min/max/std: 0.004387/0.032867/0.000925
|
464 |
+
!! - mlp.down_proj: cuda:0 [Q,x_map] scales min/max/std: 0.002708/0.063232/0.000911
|
465 |
+
!! - method: normal
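Every attn_mask line in this trace reports the same statistics: min -65504.0 (the most negative finite float16 value) and max 0.0, which is what a causal mask looks like when stored additively. A minimal sketch of such a mask, assuming this construction (exllama's exact code may differ):

import torch

seq_len = 1920
neg = torch.finfo(torch.float16).min             # -65504.0
mask = torch.full((1, 1, seq_len, seq_len), neg, dtype=torch.float16)
mask = torch.triu(mask, diagonal=1)              # 0 on and below the diagonal
print(mask.min().item(), mask.max().item())      # -65504.0 0.0

Adding this mask to the attention scores before the softmax zeroes out the probability of attending to future positions.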
!! Begin decoder 22
!! Begin self-attention
!! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -1029.000000, max: 80.875000, std: 1.616211
!! - attn_mask: device: cuda:0, shape: [1, 1, 1920, 1920], dtype: float16, min: -65504.000000, max: 0.000000, std: 32752.000000
!! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.037354, max: 0.753906, std: 0.022934 eps: 0.00000100
!! - self_attn.q_proj: cuda:0 [Q,x_map] scales min/max/std: 0.002468/0.036316/0.002068
!! - self_attn.k_proj: cuda:0 [Q,x_map] scales min/max/std: 0.002302/0.030502/0.002201
!! - self_attn.v_proj: cuda:0 [Q,x_map] scales min/max/std: 0.003658/0.014572/0.000998
!! - self_attn.o_proj: cuda:0 [Q,x_map] scales min/max/std: 0.002260/0.096069/0.001020
!! - cache device: cuda:0, seq_len: 0
!! Begin MLP
!! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -1029.000000, max: 80.875000, std: 1.678711
!! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.045654, max: 0.361328, std: 0.018143 eps: 0.00000100
!! - mlp.gate_proj: cuda:0 [Q,x_map] scales min/max/std: 0.003679/0.035217/0.001136
!! - mlp.up_proj: cuda:0 [Q,x_map] scales min/max/std: 0.004360/0.036133/0.000911
!! - mlp.down_proj: cuda:0 [Q,x_map] scales min/max/std: 0.002403/0.078796/0.000916
!! - method: normal
!! Begin decoder 23
!! Begin self-attention
!! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -1028.000000, max: 80.875000, std: 1.774414
!! - attn_mask: device: cuda:0, shape: [1, 1, 1920, 1920], dtype: float16, min: -65504.000000, max: 0.000000, std: 32752.000000
!! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.033691, max: 0.792969, std: 0.024429 eps: 0.00000100
!! - self_attn.q_proj: cuda:0 [Q,x_map] scales min/max/std: 0.002359/0.034546/0.002054
!! - self_attn.k_proj: cuda:0 [Q,x_map] scales min/max/std: 0.002043/0.033936/0.002104
!! - self_attn.v_proj: cuda:0 [Q,x_map] scales min/max/std: 0.004379/0.013702/0.000979
!! - self_attn.o_proj: cuda:0 [Q,x_map] scales min/max/std: 0.001885/0.075256/0.000995
!! - cache device: cuda:0, seq_len: 0
!! Begin MLP
!! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -1028.000000, max: 80.812500, std: 1.833008
!! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.031982, max: 0.367188, std: 0.019226 eps: 0.00000100
!! - mlp.gate_proj: cuda:0 [Q,x_map] scales min/max/std: 0.003729/0.050964/0.001107
!! - mlp.up_proj: cuda:0 [Q,x_map] scales min/max/std: 0.004387/0.036224/0.000899
!! - mlp.down_proj: cuda:0 [Q,x_map] scales min/max/std: 0.002159/0.082642/0.000899
!! - method: normal
!! Begin decoder 24
!! Begin self-attention
!! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -1027.000000, max: 80.937500, std: 1.931641
!! - attn_mask: device: cuda:0, shape: [1, 1, 1920, 1920], dtype: float16, min: -65504.000000, max: 0.000000, std: 32752.000000
!! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.051758, max: 0.812500, std: 0.025452 eps: 0.00000100
!! - self_attn.q_proj: cuda:0 [Q,x_map] scales min/max/std: 0.002163/0.037628/0.002060
!! - self_attn.k_proj: cuda:0 [Q,x_map] scales min/max/std: 0.002029/0.031433/0.002123
!! - self_attn.v_proj: cuda:0 [Q,x_map] scales min/max/std: 0.003849/0.016617/0.000987
!! - self_attn.o_proj: cuda:0 [Q,x_map] scales min/max/std: 0.001784/0.109741/0.001011
!! - cache device: cuda:0, seq_len: 0
!! Begin MLP
!! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -1027.000000, max: 82.437500, std: 1.982422
!! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.029419, max: 0.382812, std: 0.020203 eps: 0.00000100
!! - mlp.gate_proj: cuda:0 [Q,x_map] scales min/max/std: 0.003664/0.039459/0.001067
!! - mlp.up_proj: cuda:0 [Q,x_map] scales min/max/std: 0.004559/0.033142/0.000891
!! - mlp.down_proj: cuda:0 [Q,x_map] scales min/max/std: 0.002037/0.088379/0.000898
!! - method: normal
!! Begin decoder 25
!! Begin self-attention
!! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -1027.000000, max: 85.312500, std: 2.062500
!! - attn_mask: device: cuda:0, shape: [1, 1, 1920, 1920], dtype: float16, min: -65504.000000, max: 0.000000, std: 32752.000000
!! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.043213, max: 0.816406, std: 0.024796 eps: 0.00000100
!! - self_attn.q_proj: cuda:0 [Q,x_map] scales min/max/std: 0.001895/0.034515/0.002041
!! - self_attn.k_proj: cuda:0 [Q,x_map] scales min/max/std: 0.001381/0.040314/0.002146
!! - self_attn.v_proj: cuda:0 [Q,x_map] scales min/max/std: 0.003727/0.015511/0.001091
!! - self_attn.o_proj: cuda:0 [Q,x_map] scales min/max/std: 0.002243/0.103149/0.001124
!! - cache device: cuda:0, seq_len: 0
!! Begin MLP
!! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -1027.000000, max: 93.312500, std: 2.140625
!! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.029663, max: 0.404297, std: 0.020950 eps: 0.00000100
!! - mlp.gate_proj: cuda:0 [Q,x_map] scales min/max/std: 0.003717/0.032501/0.001052
!! - mlp.up_proj: cuda:0 [Q,x_map] scales min/max/std: 0.004433/0.026627/0.000883
!! - mlp.down_proj: cuda:0 [Q,x_map] scales min/max/std: 0.002089/0.068298/0.000892
!! - method: normal
!! Begin decoder 26
!! Begin self-attention
!! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -1026.000000, max: 98.375000, std: 2.222656
!! - attn_mask: device: cuda:0, shape: [1, 1, 1920, 1920], dtype: float16, min: -65504.000000, max: 0.000000, std: 32752.000000
!! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.038818, max: 0.875000, std: 0.026947 eps: 0.00000100
!! - self_attn.q_proj: cuda:0 [Q,x_map] scales min/max/std: 0.002312/0.030716/0.001928
!! - self_attn.k_proj: cuda:0 [Q,x_map] scales min/max/std: 0.002153/0.033234/0.002005
!! - self_attn.v_proj: cuda:0 [Q,x_map] scales min/max/std: 0.004166/0.014450/0.000995
!! - self_attn.o_proj: cuda:0 [Q,x_map] scales min/max/std: 0.002365/0.091187/0.001030
!! - cache device: cuda:0, seq_len: 0
!! Begin MLP
!! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -1026.000000, max: 103.250000, std: 2.253906
!! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.030518, max: 0.400391, std: 0.021332 eps: 0.00000100
!! - mlp.gate_proj: cuda:0 [Q,x_map] scales min/max/std: 0.004192/0.032410/0.001042
!! - mlp.up_proj: cuda:0 [Q,x_map] scales min/max/std: 0.004314/0.036591/0.000883
!! - mlp.down_proj: cuda:0 [Q,x_map] scales min/max/std: 0.002001/0.074585/0.000899
!! - method: normal
!! Begin decoder 27
!! Begin self-attention
!! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -1025.000000, max: 106.812500, std: 2.332031
!! - attn_mask: device: cuda:0, shape: [1, 1, 1920, 1920], dtype: float16, min: -65504.000000, max: 0.000000, std: 32752.000000
!! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.044922, max: 0.906250, std: 0.027390 eps: 0.00000100
!! - self_attn.q_proj: cuda:0 [Q,x_map] scales min/max/std: 0.002163/0.037323/0.002039
!! - self_attn.k_proj: cuda:0 [Q,x_map] scales min/max/std: 0.002100/0.032104/0.002142
!! - self_attn.v_proj: cuda:0 [Q,x_map] scales min/max/std: 0.004280/0.019775/0.000985
!! - self_attn.o_proj: cuda:0 [Q,x_map] scales min/max/std: 0.002172/0.070496/0.001004
!! - cache device: cuda:0, seq_len: 0
!! Begin MLP
!! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -1025.000000, max: 113.375000, std: 2.388672
!! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.034180, max: 0.406250, std: 0.021439 eps: 0.00000100
!! - mlp.gate_proj: cuda:0 [Q,x_map] scales min/max/std: 0.004284/0.040131/0.001047
!! - mlp.up_proj: cuda:0 [Q,x_map] scales min/max/std: 0.004375/0.046295/0.000883
!! - mlp.down_proj: cuda:0 [Q,x_map] scales min/max/std: 0.002033/0.049622/0.000891
!! - method: normal
!! Begin decoder 28
!! Begin self-attention
!! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -1024.000000, max: 116.187500, std: 2.458984
!! - attn_mask: device: cuda:0, shape: [1, 1, 1920, 1920], dtype: float16, min: -65504.000000, max: 0.000000, std: 32752.000000
!! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.038818, max: 0.937500, std: 0.027420 eps: 0.00000100
!! - self_attn.q_proj: cuda:0 [Q,x_map] scales min/max/std: 0.002270/0.045990/0.002008
!! - self_attn.k_proj: cuda:0 [Q,x_map] scales min/max/std: 0.002068/0.035706/0.002039
!! - self_attn.v_proj: cuda:0 [Q,x_map] scales min/max/std: 0.003502/0.013725/0.001108
!! - self_attn.o_proj: cuda:0 [Q,x_map] scales min/max/std: 0.002235/0.154175/0.001218
!! - cache device: cuda:0, seq_len: 0
!! Begin MLP
!! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -1024.000000, max: 128.750000, std: 2.568359
!! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.022705, max: 0.423828, std: 0.022003 eps: 0.00000100
!! - mlp.gate_proj: cuda:0 [Q,x_map] scales min/max/std: 0.004471/0.042694/0.001054
!! - mlp.up_proj: cuda:0 [Q,x_map] scales min/max/std: 0.004562/0.022446/0.000878
!! - mlp.down_proj: cuda:0 [Q,x_map] scales min/max/std: 0.001733/0.056427/0.000884
!! - method: normal
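The [Q,x_map] rows summarize each 4-bit quantized matrix; the three numbers appear to be the min/max/std of its float16 dequantization scales (one scale per quantization group). A hypothetical helper that would print a line in the same format; scale_stats and the tensor below are illustrative stand-ins, not part of the test script:

import torch

def scale_stats(name, scales):
    # scales: float16 tensor of per-group GPTQ dequantization scales
    s = scales.float()
    print(f"!! - {name}: {scales.device} [Q,x_map] scales min/max/std: "
          f"{s.min().item():.6f}/{s.max().item():.6f}/{s.std().item():.6f}")

fake = (torch.rand(40, 5120) * 0.088 + 0.002).half()   # fake data in the observed range
scale_stats("mlp.down_proj", fake)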
!! Begin decoder 29
!! Begin self-attention
!! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -1023.000000, max: 131.500000, std: 2.623047
!! - attn_mask: device: cuda:0, shape: [1, 1, 1920, 1920], dtype: float16, min: -65504.000000, max: 0.000000, std: 32752.000000
!! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.003403, max: 0.957031, std: 0.027893 eps: 0.00000100
!! - self_attn.q_proj: cuda:0 [Q,x_map] scales min/max/std: 0.002245/0.032928/0.001910
!! - self_attn.k_proj: cuda:0 [Q,x_map] scales min/max/std: 0.002039/0.030350/0.001957
!! - self_attn.v_proj: cuda:0 [Q,x_map] scales min/max/std: 0.004120/0.014153/0.001067
!! - self_attn.o_proj: cuda:0 [Q,x_map] scales min/max/std: 0.002306/0.074097/0.001082
!! - cache device: cuda:0, seq_len: 0
!! Begin MLP
!! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -1023.000000, max: 135.375000, std: 2.656250
!! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.028442, max: 0.691406, std: 0.022568 eps: 0.00000100
!! - mlp.gate_proj: cuda:0 [Q,x_map] scales min/max/std: 0.004440/0.035675/0.001063
!! - mlp.up_proj: cuda:0 [Q,x_map] scales min/max/std: 0.004391/0.031128/0.000879
!! - mlp.down_proj: cuda:0 [Q,x_map] scales min/max/std: 0.001850/0.075684/0.000896
!! - method: normal
!! Begin decoder 30
!! Begin self-attention
!! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -1022.000000, max: 138.750000, std: 2.707031
!! - attn_mask: device: cuda:0, shape: [1, 1, 1920, 1920], dtype: float16, min: -65504.000000, max: 0.000000, std: 32752.000000
!! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.039062, max: 0.953125, std: 0.028458 eps: 0.00000100
!! - self_attn.q_proj: cuda:0 [Q,x_map] scales min/max/std: 0.002247/0.030197/0.001984
!! - self_attn.k_proj: cuda:0 [Q,x_map] scales min/max/std: 0.002272/0.032532/0.002090
!! - self_attn.v_proj: cuda:0 [Q,x_map] scales min/max/std: 0.002539/0.015915/0.001025
!! - self_attn.o_proj: cuda:0 [Q,x_map] scales min/max/std: 0.002310/0.092224/0.001046
!! - cache device: cuda:0, seq_len: 0
!! Begin MLP
!! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -1022.000000, max: 145.125000, std: 2.757812
!! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.013855, max: 0.443359, std: 0.021713 eps: 0.00000100
!! - mlp.gate_proj: cuda:0 [Q,x_map] scales min/max/std: 0.004665/0.045197/0.001092
!! - mlp.up_proj: cuda:0 [Q,x_map] scales min/max/std: 0.004078/0.036926/0.000885
!! - mlp.down_proj: cuda:0 [Q,x_map] scales min/max/std: 0.001813/0.072693/0.000899
!! - method: normal
!! Begin decoder 31
!! Begin self-attention
!! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -1020.500000, max: 151.500000, std: 2.837891
!! - attn_mask: device: cuda:0, shape: [1, 1, 1920, 1920], dtype: float16, min: -65504.000000, max: 0.000000, std: 32752.000000
!! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: -0.005127, max: 0.949219, std: 0.028824 eps: 0.00000100
!! - self_attn.q_proj: cuda:0 [Q,x_map] scales min/max/std: 0.002350/0.031052/0.001871
!! - self_attn.k_proj: cuda:0 [Q,x_map] scales min/max/std: 0.002193/0.030899/0.001905
!! - self_attn.v_proj: cuda:0 [Q,x_map] scales min/max/std: 0.004337/0.015503/0.001026
!! - self_attn.o_proj: cuda:0 [Q,x_map] scales min/max/std: 0.002642/0.092957/0.001069
!! - cache device: cuda:0, seq_len: 0
!! Begin MLP
!! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -1020.500000, max: 163.125000, std: 2.910156
!! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.015198, max: 0.449219, std: 0.022018 eps: 0.00000100
!! - mlp.gate_proj: cuda:0 [Q,x_map] scales min/max/std: 0.004639/0.031525/0.001118
!! - mlp.up_proj: cuda:0 [Q,x_map] scales min/max/std: 0.004658/0.035858/0.000885
!! - mlp.down_proj: cuda:0 [Q,x_map] scales min/max/std: 0.001880/0.045258/0.000892
!! - method: normal
!! Begin decoder 32
!! Begin self-attention
!! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -1019.000000, max: 165.250000, std: 2.960938
!! - attn_mask: device: cuda:0, shape: [1, 1, 1920, 1920], dtype: float16, min: -65504.000000, max: 0.000000, std: 32752.000000
!! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.002731, max: 0.898438, std: 0.028946 eps: 0.00000100
!! - self_attn.q_proj: cuda:0 [Q,x_map] scales min/max/std: 0.002439/0.031342/0.001923
!! - self_attn.k_proj: cuda:0 [Q,x_map] scales min/max/std: 0.001745/0.039093/0.001959
!! - self_attn.v_proj: cuda:0 [Q,x_map] scales min/max/std: 0.003937/0.014107/0.001027
!! - self_attn.o_proj: cuda:0 [Q,x_map] scales min/max/std: 0.002518/0.113953/0.001073
!! - cache device: cuda:0, seq_len: 0
!! Begin MLP
!! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -1019.000000, max: 170.125000, std: 3.003906
!! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.003967, max: 0.746094, std: 0.022736 eps: 0.00000100
!! - mlp.gate_proj: cuda:0 [Q,x_map] scales min/max/std: 0.004223/0.046234/0.001122
!! - mlp.up_proj: cuda:0 [Q,x_map] scales min/max/std: 0.004738/0.031342/0.000886
!! - mlp.down_proj: cuda:0 [Q,x_map] scales min/max/std: 0.001719/0.055420/0.000911
!! - method: normal
!! Begin decoder 33
!! Begin self-attention
!! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -1016.500000, max: 172.750000, std: 3.056641
!! - attn_mask: device: cuda:0, shape: [1, 1, 1920, 1920], dtype: float16, min: -65504.000000, max: 0.000000, std: 32752.000000
!! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.033203, max: 0.910156, std: 0.029999 eps: 0.00000100
!! - self_attn.q_proj: cuda:0 [Q,x_map] scales min/max/std: 0.002224/0.034576/0.001955
!! - self_attn.k_proj: cuda:0 [Q,x_map] scales min/max/std: 0.002178/0.034698/0.001965
!! - self_attn.v_proj: cuda:0 [Q,x_map] scales min/max/std: 0.003191/0.017090/0.001073
!! - self_attn.o_proj: cuda:0 [Q,x_map] scales min/max/std: 0.002516/0.098511/0.001093
!! - cache device: cuda:0, seq_len: 0
!! Begin MLP
!! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -1015.500000, max: 177.375000, std: 3.095703
!! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.021729, max: 0.457031, std: 0.021973 eps: 0.00000100
!! - mlp.gate_proj: cuda:0 [Q,x_map] scales min/max/std: 0.004440/0.058960/0.001143
!! - mlp.up_proj: cuda:0 [Q,x_map] scales min/max/std: 0.004120/0.027802/0.000899
!! - mlp.down_proj: cuda:0 [Q,x_map] scales min/max/std: 0.001822/0.089966/0.000950
!! - method: normal
!! Begin decoder 34
!! Begin self-attention
!! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -1012.000000, max: 178.875000, std: 3.134766
!! - attn_mask: device: cuda:0, shape: [1, 1, 1920, 1920], dtype: float16, min: -65504.000000, max: 0.000000, std: 32752.000000
!! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.038086, max: 0.953125, std: 0.030441 eps: 0.00000100
!! - self_attn.q_proj: cuda:0 [Q,x_map] scales min/max/std: 0.002279/0.033783/0.001966
!! - self_attn.k_proj: cuda:0 [Q,x_map] scales min/max/std: 0.002062/0.031311/0.002022
!! - self_attn.v_proj: cuda:0 [Q,x_map] scales min/max/std: 0.003651/0.016846/0.001222
!! - self_attn.o_proj: cuda:0 [Q,x_map] scales min/max/std: 0.002913/0.079651/0.001315
!! - cache device: cuda:0, seq_len: 0
!! Begin MLP
!! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -1011.000000, max: 181.750000, std: 3.199219
!! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.029907, max: 0.460938, std: 0.021744 eps: 0.00000100
!! - mlp.gate_proj: cuda:0 [Q,x_map] scales min/max/std: 0.004433/0.036102/0.001138
!! - mlp.up_proj: cuda:0 [Q,x_map] scales min/max/std: 0.004498/0.028717/0.000901
!! - mlp.down_proj: cuda:0 [Q,x_map] scales min/max/std: 0.001920/0.123169/0.001141
!! - method: normal
!! Begin decoder 35
!! Begin self-attention
!! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -982.000000, max: 186.500000, std: 3.277344
!! - attn_mask: device: cuda:0, shape: [1, 1, 1920, 1920], dtype: float16, min: -65504.000000, max: 0.000000, std: 32752.000000
!! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.040283, max: 0.917969, std: 0.029037 eps: 0.00000100
!! - self_attn.q_proj: cuda:0 [Q,x_map] scales min/max/std: 0.002428/0.032837/0.001951
!! - self_attn.k_proj: cuda:0 [Q,x_map] scales min/max/std: 0.002157/0.030807/0.002024
!! - self_attn.v_proj: cuda:0 [Q,x_map] scales min/max/std: 0.003971/0.013626/0.001038
!! - self_attn.o_proj: cuda:0 [Q,x_map] scales min/max/std: 0.003014/0.090149/0.001112
!! - cache device: cuda:0, seq_len: 0
!! Begin MLP
!! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -981.000000, max: 191.500000, std: 3.328125
!! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.030151, max: 0.468750, std: 0.021896 eps: 0.00000100
!! - mlp.gate_proj: cuda:0 [Q,x_map] scales min/max/std: 0.002829/0.039459/0.001129
!! - mlp.up_proj: cuda:0 [Q,x_map] scales min/max/std: 0.004147/0.044250/0.000917
!! - mlp.down_proj: cuda:0 [Q,x_map] scales min/max/std: 0.002134/0.148560/0.001385
!! - method: normal
!! Begin decoder 36
!! Begin self-attention
!! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -919.500000, max: 191.500000, std: 3.392578
!! - attn_mask: device: cuda:0, shape: [1, 1, 1920, 1920], dtype: float16, min: -65504.000000, max: 0.000000, std: 32752.000000
!! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: -0.004456, max: 0.941406, std: 0.031082 eps: 0.00000100
!! - self_attn.q_proj: cuda:0 [Q,x_map] scales min/max/std: 0.001844/0.032776/0.001974
!! - self_attn.k_proj: cuda:0 [Q,x_map] scales min/max/std: 0.001781/0.031769/0.002085
!! - self_attn.v_proj: cuda:0 [Q,x_map] scales min/max/std: 0.004047/0.016876/0.001062
!! - self_attn.o_proj: cuda:0 [Q,x_map] scales min/max/std: 0.002771/0.059174/0.001117
!! - cache device: cuda:0, seq_len: 0
!! Begin MLP
!! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -919.000000, max: 193.875000, std: 3.429688
!! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.050537, max: 0.839844, std: 0.022324 eps: 0.00000100
!! - mlp.gate_proj: cuda:0 [Q,x_map] scales min/max/std: 0.004131/0.048218/0.001153
!! - mlp.up_proj: cuda:0 [Q,x_map] scales min/max/std: 0.004238/0.036469/0.000927
!! - mlp.down_proj: cuda:0 [Q,x_map] scales min/max/std: 0.002237/0.148193/0.001454
!! - method: normal
!! Begin decoder 37
!! Begin self-attention
!! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -861.000000, max: 191.125000, std: 3.441406
!! - attn_mask: device: cuda:0, shape: [1, 1, 1920, 1920], dtype: float16, min: -65504.000000, max: 0.000000, std: 32752.000000
!! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: -0.002762, max: 1.054688, std: 0.032867 eps: 0.00000100
!! - self_attn.q_proj: cuda:0 [Q,x_map] scales min/max/std: 0.002245/0.036652/0.001965
!! - self_attn.k_proj: cuda:0 [Q,x_map] scales min/max/std: 0.001849/0.033752/0.002066
!! - self_attn.v_proj: cuda:0 [Q,x_map] scales min/max/std: 0.003832/0.017563/0.001212
!! - self_attn.o_proj: cuda:0 [Q,x_map] scales min/max/std: 0.003330/0.115906/0.001400
!! - cache device: cuda:0, seq_len: 0
!! Begin MLP
!! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -857.500000, max: 195.500000, std: 3.544922
!! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.066406, max: 0.593750, std: 0.021439 eps: 0.00000100
!! - mlp.gate_proj: cuda:0 [Q,x_map] scales min/max/std: 0.003469/0.083496/0.001222
!! - mlp.up_proj: cuda:0 [Q,x_map] scales min/max/std: 0.003468/0.034821/0.000952
!! - mlp.down_proj: cuda:0 [Q,x_map] scales min/max/std: 0.002447/0.204346/0.002012
!! - method: normal
!! Begin decoder 38
!! Begin self-attention
!! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -580.000000, max: 195.125000, std: 3.591797
!! - attn_mask: device: cuda:0, shape: [1, 1, 1920, 1920], dtype: float16, min: -65504.000000, max: 0.000000, std: 32752.000000
!! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.097656, max: 1.039062, std: 0.031891 eps: 0.00000100
!! - self_attn.q_proj: cuda:0 [Q,x_map] scales min/max/std: 0.002375/0.045197/0.001980
!! - self_attn.k_proj: cuda:0 [Q,x_map] scales min/max/std: 0.002171/0.030624/0.001997
!! - self_attn.v_proj: cuda:0 [Q,x_map] scales min/max/std: 0.003450/0.017731/0.001331
!! - self_attn.o_proj: cuda:0 [Q,x_map] scales min/max/std: 0.003344/0.227539/0.001991
!! - cache device: cuda:0, seq_len: 0
!! Begin MLP
!! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -108.500000, max: 203.750000, std: 3.845703
!! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.087891, max: 0.498047, std: 0.020370 eps: 0.00000100
!! - mlp.gate_proj: cuda:0 [Q,x_map] scales min/max/std: 0.004387/0.031525/0.001246
!! - mlp.up_proj: cuda:0 [Q,x_map] scales min/max/std: 0.002453/0.059601/0.001083
!! - mlp.down_proj: cuda:0 [Q,x_map] scales min/max/std: 0.003397/0.199585/0.001426
!! - method: normal
!! Begin decoder 39
!! Begin self-attention
!! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -168.000000, max: 226.125000, std: 4.089844
!! - attn_mask: device: cuda:0, shape: [1, 1, 1920, 1920], dtype: float16, min: -65504.000000, max: 0.000000, std: 32752.000000
!! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.002625, max: 0.957031, std: 0.032471 eps: 0.00000100
!! - self_attn.q_proj: cuda:0 [Q,x_map] scales min/max/std: 0.002300/0.047607/0.002197
!! - self_attn.k_proj: cuda:0 [Q,x_map] scales min/max/std: 0.002066/0.033020/0.002274
!! - self_attn.v_proj: cuda:0 [Q,x_map] scales min/max/std: 0.002975/0.016586/0.001257
!! - self_attn.o_proj: cuda:0 [Q,x_map] scales min/max/std: 0.003019/0.146851/0.001698
!! - cache device: cuda:0, seq_len: 0
!! Begin MLP
!! - hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -144.375000, max: 229.500000, std: 4.367188
!! - layernorm.weight: device: cuda:0, shape: [5120], dtype: float16, min: 0.109863, max: 0.648438, std: 0.025543 eps: 0.00000100
!! - mlp.gate_proj: cuda:0 [Q,x_map] scales min/max/std: 0.002789/0.032501/0.001303
!! - mlp.up_proj: cuda:0 [Q,x_map] scales min/max/std: 0.002787/0.085999/0.001245
!! - mlp.down_proj: cuda:0 [Q,x_map] scales min/max/std: 0.004478/0.175049/0.001831
!! - method: normal
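Decoder 39 is the last layer, so the lines that follow leave the decoder stack: a final norm over the full sequence, the LM head applied to the last position only (hence the [1, 1, 5120] shape), and the resulting [1, 1, 32000] logits moved to the CPU. A rough sketch of those closing steps, with assumed names and the same RMSNorm form as above:

import torch
import torch.nn as nn

dim, vocab = 5120, 32000

def rms_norm(x, weight, eps=1e-6):
    var = x.float().pow(2).mean(-1, keepdim=True)
    return (x.float() * torch.rsqrt(var + eps)).type_as(x) * weight

weight = torch.ones(dim)
lm_head = nn.Linear(dim, vocab, bias=False)

hidden = torch.randn(1, 1920, dim)
hidden = rms_norm(hidden, weight)   # "pre norm"
last = hidden[:, -1:, :]            # "pre lm_head": only the last token feeds the head
logits = lm_head(last)              # shape [1, 1, 32000], one score per vocab entry
logits = logits.to("cpu")           # "Moving logits from cuda:0 to cpu" for sampling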
!! pre norm, hidden_states: device: cuda:0, shape: [1, 1920, 5120], dtype: float16, min: -198.250000, max: 719.000000, std: 6.828125
!! pre lm_head, hidden_states: device: cuda:0, shape: [1, 1, 5120], dtype: float16, min: -13.359375, max: 17.625000, std: 1.145508
!! logits: device: cuda:0, shape: [1, 1, 32000], dtype: float16, min: -11.101562, max: 10.367188, std: 2.171875
!! Moving logits from cuda:0 to cpu
** Time, Inference: 0.93 seconds
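For reference, each tensor line in this trace could be produced by a helper along these lines (a hypothetical reimplementation for illustration, not exllama's own debug code):

import torch

def dump(name, t):
    # print device, shape, dtype, and float-precision min/max/std for a tensor
    f = t.float()
    print(f"!! - {name}: device: {t.device}, shape: {list(t.shape)}, "
          f"dtype: {str(t.dtype).replace('torch.', '')}, "
          f"min: {f.min().item():.6f}, max: {f.max().item():.6f}, "
          f"std: {f.std().item():.6f}")

hidden_states = torch.randn(1, 1920, 5120).half()
dump("hidden_states", hidden_states)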