pranavajay committed
Commit d7c83e8 · verified · 1 parent: c4437e7

Upload log.txt with huggingface_hub

Files changed (1)
  1. log.txt +488 -0
log.txt ADDED
@@ -0,0 +1,488 @@
1
+ Tensor 'context_embedder.bias' has different shapes: Model 1: torch.Size([2150]), Model 2: torch.Size([3072])
2
+ Tensor 'context_embedder.weight' has different shapes: Model 1: torch.Size([2150, 4096]), Model 2: torch.Size([3072, 4096])
3
+ Tensor 'time_text_embed.guidance_embedder.linear_1.bias' has different shapes: Model 1: torch.Size([2150]), Model 2: torch.Size([3072])
4
+ Tensor 'time_text_embed.guidance_embedder.linear_1.weight' has different shapes: Model 1: torch.Size([2150, 256]), Model 2: torch.Size([3072, 256])
5
+ Tensor 'time_text_embed.guidance_embedder.linear_2.bias' has different shapes: Model 1: torch.Size([2150]), Model 2: torch.Size([3072])
6
+ Tensor 'time_text_embed.guidance_embedder.linear_2.weight' has different shapes: Model 1: torch.Size([2150, 3072]), Model 2: torch.Size([3072, 3072])
7
+ Tensor 'time_text_embed.text_embedder.linear_1.bias' has different shapes: Model 1: torch.Size([2150]), Model 2: torch.Size([3072])
8
+ Tensor 'time_text_embed.text_embedder.linear_1.weight' has different shapes: Model 1: torch.Size([2150, 768]), Model 2: torch.Size([3072, 768])
9
+ Tensor 'time_text_embed.text_embedder.linear_2.bias' has different shapes: Model 1: torch.Size([2150]), Model 2: torch.Size([3072])
10
+ Tensor 'time_text_embed.text_embedder.linear_2.weight' has different shapes: Model 1: torch.Size([2150, 3072]), Model 2: torch.Size([3072, 3072])
11
+ Tensor 'time_text_embed.timestep_embedder.linear_1.bias' has different shapes: Model 1: torch.Size([2150]), Model 2: torch.Size([3072])
12
+ Tensor 'time_text_embed.timestep_embedder.linear_1.weight' has different shapes: Model 1: torch.Size([2150, 256]), Model 2: torch.Size([3072, 256])
13
+ Tensor 'time_text_embed.timestep_embedder.linear_2.bias' has different shapes: Model 1: torch.Size([2150]), Model 2: torch.Size([3072])
14
+ Tensor 'time_text_embed.timestep_embedder.linear_2.weight' has different shapes: Model 1: torch.Size([2150, 3072]), Model 2: torch.Size([3072, 3072])
15
+ Tensor 'transformer_blocks.0.attn.add_k_proj.bias' has different shapes: Model 1: torch.Size([2150]), Model 2: torch.Size([3072])
16
+ Tensor 'transformer_blocks.0.attn.add_k_proj.weight' has different shapes: Model 1: torch.Size([2150, 3072]), Model 2: torch.Size([3072, 3072])
17
+ Tensor 'transformer_blocks.0.attn.add_q_proj.bias' has different shapes: Model 1: torch.Size([2150]), Model 2: torch.Size([3072])
18
+ Tensor 'transformer_blocks.0.attn.add_q_proj.weight' has different shapes: Model 1: torch.Size([2150, 3072]), Model 2: torch.Size([3072, 3072])
19
+ Tensor 'transformer_blocks.0.attn.add_v_proj.bias' has different shapes: Model 1: torch.Size([2150]), Model 2: torch.Size([3072])
20
+ Tensor 'transformer_blocks.0.attn.add_v_proj.weight' has different shapes: Model 1: torch.Size([2150, 3072]), Model 2: torch.Size([3072, 3072])
21
+ Tensor 'transformer_blocks.0.attn.norm_added_k.weight' has different shapes: Model 1: torch.Size([89]), Model 2: torch.Size([128])
22
+ Tensor 'transformer_blocks.0.attn.norm_added_q.weight' has different shapes: Model 1: torch.Size([89]), Model 2: torch.Size([128])
23
+ Tensor 'transformer_blocks.0.attn.norm_k.weight' has different shapes: Model 1: torch.Size([89]), Model 2: torch.Size([128])
24
+ Tensor 'transformer_blocks.0.attn.norm_q.weight' has different shapes: Model 1: torch.Size([89]), Model 2: torch.Size([128])
25
+ Tensor 'transformer_blocks.0.attn.to_add_out.bias' has different shapes: Model 1: torch.Size([2150]), Model 2: torch.Size([3072])
26
+ Tensor 'transformer_blocks.0.attn.to_add_out.weight' has different shapes: Model 1: torch.Size([2150, 3072]), Model 2: torch.Size([3072, 3072])
27
+ Tensor 'transformer_blocks.0.attn.to_k.bias' has different shapes: Model 1: torch.Size([2150]), Model 2: torch.Size([3072])
28
+ Tensor 'transformer_blocks.0.attn.to_k.weight' has different shapes: Model 1: torch.Size([2150, 3072]), Model 2: torch.Size([3072, 3072])
29
+ Tensor 'transformer_blocks.0.attn.to_out.0.bias' has different shapes: Model 1: torch.Size([2150]), Model 2: torch.Size([3072])
30
+ Tensor 'transformer_blocks.0.attn.to_out.0.weight' has different shapes: Model 1: torch.Size([2150, 3072]), Model 2: torch.Size([3072, 3072])
31
+ Tensor 'transformer_blocks.0.attn.to_q.bias' has different shapes: Model 1: torch.Size([2150]), Model 2: torch.Size([3072])
32
+ Tensor 'transformer_blocks.0.attn.to_q.weight' has different shapes: Model 1: torch.Size([2150, 3072]), Model 2: torch.Size([3072, 3072])
33
+ Tensor 'transformer_blocks.0.attn.to_v.bias' has different shapes: Model 1: torch.Size([2150]), Model 2: torch.Size([3072])
34
+ Tensor 'transformer_blocks.0.attn.to_v.weight' has different shapes: Model 1: torch.Size([2150, 3072]), Model 2: torch.Size([3072, 3072])
35
+ Tensor 'transformer_blocks.0.ff.net.0.proj.bias' has different shapes: Model 1: torch.Size([8601]), Model 2: torch.Size([12288])
36
+ Tensor 'transformer_blocks.0.ff.net.0.proj.weight' has different shapes: Model 1: torch.Size([8601, 3072]), Model 2: torch.Size([12288, 3072])
37
+ Tensor 'transformer_blocks.0.ff.net.2.bias' has different shapes: Model 1: torch.Size([2150]), Model 2: torch.Size([3072])
38
+ Tensor 'transformer_blocks.0.ff.net.2.weight' has different shapes: Model 1: torch.Size([2150, 12288]), Model 2: torch.Size([3072, 12288])
39
+ Tensor 'transformer_blocks.0.ff_context.net.0.proj.bias' has different shapes: Model 1: torch.Size([8601]), Model 2: torch.Size([12288])
40
+ Tensor 'transformer_blocks.0.ff_context.net.0.proj.weight' has different shapes: Model 1: torch.Size([8601, 3072]), Model 2: torch.Size([12288, 3072])
41
+ Tensor 'transformer_blocks.0.ff_context.net.2.bias' has different shapes: Model 1: torch.Size([2150]), Model 2: torch.Size([3072])
42
+ Tensor 'transformer_blocks.0.ff_context.net.2.weight' has different shapes: Model 1: torch.Size([2150, 12288]), Model 2: torch.Size([3072, 12288])
43
+ Tensor 'transformer_blocks.0.norm1.linear.bias' has different shapes: Model 1: torch.Size([12902]), Model 2: torch.Size([18432])
44
+ Tensor 'transformer_blocks.0.norm1.linear.weight' has different shapes: Model 1: torch.Size([12902, 3072]), Model 2: torch.Size([18432, 3072])
45
+ Tensor 'transformer_blocks.0.norm1_context.linear.bias' has different shapes: Model 1: torch.Size([12902]), Model 2: torch.Size([18432])
46
+ Tensor 'transformer_blocks.0.norm1_context.linear.weight' has different shapes: Model 1: torch.Size([12902, 3072]), Model 2: torch.Size([18432, 3072])
47
+ Tensor 'transformer_blocks.1.attn.add_k_proj.bias' has different shapes: Model 1: torch.Size([2150]), Model 2: torch.Size([3072])
48
+ Tensor 'transformer_blocks.1.attn.add_k_proj.weight' has different shapes: Model 1: torch.Size([2150, 3072]), Model 2: torch.Size([3072, 3072])
49
+ Tensor 'transformer_blocks.1.attn.add_q_proj.bias' has different shapes: Model 1: torch.Size([2150]), Model 2: torch.Size([3072])
50
+ Tensor 'transformer_blocks.1.attn.add_q_proj.weight' has different shapes: Model 1: torch.Size([2150, 3072]), Model 2: torch.Size([3072, 3072])
51
+ Tensor 'transformer_blocks.1.attn.add_v_proj.bias' has different shapes: Model 1: torch.Size([2150]), Model 2: torch.Size([3072])
52
+ Tensor 'transformer_blocks.1.attn.add_v_proj.weight' has different shapes: Model 1: torch.Size([2150, 3072]), Model 2: torch.Size([3072, 3072])
53
+ Tensor 'transformer_blocks.1.attn.norm_added_k.weight' has different shapes: Model 1: torch.Size([89]), Model 2: torch.Size([128])
54
+ Tensor 'transformer_blocks.1.attn.norm_added_q.weight' has different shapes: Model 1: torch.Size([89]), Model 2: torch.Size([128])
55
+ Tensor 'transformer_blocks.1.attn.norm_k.weight' has different shapes: Model 1: torch.Size([89]), Model 2: torch.Size([128])
56
+ Tensor 'transformer_blocks.1.attn.norm_q.weight' has different shapes: Model 1: torch.Size([89]), Model 2: torch.Size([128])
57
+ Tensor 'transformer_blocks.1.attn.to_add_out.bias' has different shapes: Model 1: torch.Size([2150]), Model 2: torch.Size([3072])
58
+ Tensor 'transformer_blocks.1.attn.to_add_out.weight' has different shapes: Model 1: torch.Size([2150, 3072]), Model 2: torch.Size([3072, 3072])
59
+ Tensor 'transformer_blocks.1.attn.to_k.bias' has different shapes: Model 1: torch.Size([2150]), Model 2: torch.Size([3072])
60
+ Tensor 'transformer_blocks.1.attn.to_k.weight' has different shapes: Model 1: torch.Size([2150, 3072]), Model 2: torch.Size([3072, 3072])
61
+ Tensor 'transformer_blocks.1.attn.to_out.0.bias' has different shapes: Model 1: torch.Size([2150]), Model 2: torch.Size([3072])
62
+ Tensor 'transformer_blocks.1.attn.to_out.0.weight' has different shapes: Model 1: torch.Size([2150, 3072]), Model 2: torch.Size([3072, 3072])
63
+ Tensor 'transformer_blocks.1.attn.to_q.bias' has different shapes: Model 1: torch.Size([2150]), Model 2: torch.Size([3072])
64
+ Tensor 'transformer_blocks.1.attn.to_q.weight' has different shapes: Model 1: torch.Size([2150, 3072]), Model 2: torch.Size([3072, 3072])
65
+ Tensor 'transformer_blocks.1.attn.to_v.bias' has different shapes: Model 1: torch.Size([2150]), Model 2: torch.Size([3072])
66
+ Tensor 'transformer_blocks.1.attn.to_v.weight' has different shapes: Model 1: torch.Size([2150, 3072]), Model 2: torch.Size([3072, 3072])
67
+ Tensor 'transformer_blocks.1.ff.net.0.proj.bias' has different shapes: Model 1: torch.Size([8601]), Model 2: torch.Size([12288])
68
+ Tensor 'transformer_blocks.1.ff.net.0.proj.weight' has different shapes: Model 1: torch.Size([8601, 3072]), Model 2: torch.Size([12288, 3072])
69
+ Tensor 'transformer_blocks.1.ff.net.2.bias' has different shapes: Model 1: torch.Size([2150]), Model 2: torch.Size([3072])
70
+ Tensor 'transformer_blocks.1.ff.net.2.weight' has different shapes: Model 1: torch.Size([2150, 12288]), Model 2: torch.Size([3072, 12288])
71
+ Tensor 'transformer_blocks.1.ff_context.net.0.proj.bias' has different shapes: Model 1: torch.Size([8601]), Model 2: torch.Size([12288])
72
+ Tensor 'transformer_blocks.1.ff_context.net.0.proj.weight' has different shapes: Model 1: torch.Size([8601, 3072]), Model 2: torch.Size([12288, 3072])
73
+ Tensor 'transformer_blocks.1.ff_context.net.2.bias' has different shapes: Model 1: torch.Size([2150]), Model 2: torch.Size([3072])
74
+ Tensor 'transformer_blocks.1.ff_context.net.2.weight' has different shapes: Model 1: torch.Size([2150, 12288]), Model 2: torch.Size([3072, 12288])
75
+ Tensor 'transformer_blocks.1.norm1.linear.bias' has different shapes: Model 1: torch.Size([12902]), Model 2: torch.Size([18432])
76
+ Tensor 'transformer_blocks.1.norm1.linear.weight' has different shapes: Model 1: torch.Size([12902, 3072]), Model 2: torch.Size([18432, 3072])
77
+ Tensor 'transformer_blocks.1.norm1_context.linear.bias' has different shapes: Model 1: torch.Size([12902]), Model 2: torch.Size([18432])
78
+ Tensor 'transformer_blocks.1.norm1_context.linear.weight' has different shapes: Model 1: torch.Size([12902, 3072]), Model 2: torch.Size([18432, 3072])
79
+ Tensor 'transformer_blocks.10.attn.add_k_proj.bias' has different shapes: Model 1: torch.Size([2150]), Model 2: torch.Size([3072])
80
+ Tensor 'transformer_blocks.10.attn.add_k_proj.weight' has different shapes: Model 1: torch.Size([2150, 3072]), Model 2: torch.Size([3072, 3072])
81
+ Tensor 'transformer_blocks.10.attn.add_q_proj.bias' has different shapes: Model 1: torch.Size([2150]), Model 2: torch.Size([3072])
82
+ Tensor 'transformer_blocks.10.attn.add_q_proj.weight' has different shapes: Model 1: torch.Size([2150, 3072]), Model 2: torch.Size([3072, 3072])
83
+ Tensor 'transformer_blocks.10.attn.add_v_proj.bias' has different shapes: Model 1: torch.Size([2150]), Model 2: torch.Size([3072])
84
+ Tensor 'transformer_blocks.10.attn.add_v_proj.weight' has different shapes: Model 1: torch.Size([2150, 3072]), Model 2: torch.Size([3072, 3072])
85
+ Tensor 'transformer_blocks.10.attn.norm_added_k.weight' has different shapes: Model 1: torch.Size([89]), Model 2: torch.Size([128])
86
+ Tensor 'transformer_blocks.10.attn.norm_added_q.weight' has different shapes: Model 1: torch.Size([89]), Model 2: torch.Size([128])
87
+ Tensor 'transformer_blocks.10.attn.norm_k.weight' has different shapes: Model 1: torch.Size([89]), Model 2: torch.Size([128])
88
+ Tensor 'transformer_blocks.10.attn.norm_q.weight' has different shapes: Model 1: torch.Size([89]), Model 2: torch.Size([128])
89
+ Tensor 'transformer_blocks.10.attn.to_add_out.bias' has different shapes: Model 1: torch.Size([2150]), Model 2: torch.Size([3072])
90
+ Tensor 'transformer_blocks.10.attn.to_add_out.weight' has different shapes: Model 1: torch.Size([2150, 3072]), Model 2: torch.Size([3072, 3072])
91
+ Tensor 'transformer_blocks.10.attn.to_k.bias' has different shapes: Model 1: torch.Size([2150]), Model 2: torch.Size([3072])
92
+ Tensor 'transformer_blocks.10.attn.to_k.weight' has different shapes: Model 1: torch.Size([2150, 3072]), Model 2: torch.Size([3072, 3072])
93
+ Tensor 'transformer_blocks.10.attn.to_out.0.bias' has different shapes: Model 1: torch.Size([2150]), Model 2: torch.Size([3072])
94
+ Tensor 'transformer_blocks.10.attn.to_out.0.weight' has different shapes: Model 1: torch.Size([2150, 3072]), Model 2: torch.Size([3072, 3072])
95
+ Tensor 'transformer_blocks.10.attn.to_q.bias' has different shapes: Model 1: torch.Size([2150]), Model 2: torch.Size([3072])
96
+ Tensor 'transformer_blocks.10.attn.to_q.weight' has different shapes: Model 1: torch.Size([2150, 3072]), Model 2: torch.Size([3072, 3072])
97
+ Tensor 'transformer_blocks.10.attn.to_v.bias' has different shapes: Model 1: torch.Size([2150]), Model 2: torch.Size([3072])
98
+ Tensor 'transformer_blocks.10.attn.to_v.weight' has different shapes: Model 1: torch.Size([2150, 3072]), Model 2: torch.Size([3072, 3072])
99
+ Tensor 'transformer_blocks.10.ff.net.0.proj.bias' has different shapes: Model 1: torch.Size([8601]), Model 2: torch.Size([12288])
100
+ Tensor 'transformer_blocks.10.ff.net.0.proj.weight' has different shapes: Model 1: torch.Size([8601, 3072]), Model 2: torch.Size([12288, 3072])
101
+ Tensor 'transformer_blocks.10.ff.net.2.bias' has different shapes: Model 1: torch.Size([2150]), Model 2: torch.Size([3072])
102
+ Tensor 'transformer_blocks.10.ff.net.2.weight' has different shapes: Model 1: torch.Size([2150, 12288]), Model 2: torch.Size([3072, 12288])
103
+ Tensor 'transformer_blocks.10.ff_context.net.0.proj.bias' has different shapes: Model 1: torch.Size([8601]), Model 2: torch.Size([12288])
104
+ Tensor 'transformer_blocks.10.ff_context.net.0.proj.weight' has different shapes: Model 1: torch.Size([8601, 3072]), Model 2: torch.Size([12288, 3072])
105
+ Tensor 'transformer_blocks.10.ff_context.net.2.bias' has different shapes: Model 1: torch.Size([2150]), Model 2: torch.Size([3072])
106
+ Tensor 'transformer_blocks.10.ff_context.net.2.weight' has different shapes: Model 1: torch.Size([2150, 12288]), Model 2: torch.Size([3072, 12288])
107
+ Tensor 'transformer_blocks.10.norm1.linear.bias' has different shapes: Model 1: torch.Size([12902]), Model 2: torch.Size([18432])
108
+ Tensor 'transformer_blocks.10.norm1.linear.weight' has different shapes: Model 1: torch.Size([12902, 3072]), Model 2: torch.Size([18432, 3072])
109
+ Tensor 'transformer_blocks.10.norm1_context.linear.bias' has different shapes: Model 1: torch.Size([12902]), Model 2: torch.Size([18432])
110
+ Tensor 'transformer_blocks.10.norm1_context.linear.weight' has different shapes: Model 1: torch.Size([12902, 3072]), Model 2: torch.Size([18432, 3072])
111
+ Tensor 'transformer_blocks.11.attn.add_k_proj.bias' has different shapes: Model 1: torch.Size([2150]), Model 2: torch.Size([3072])
112
+ Tensor 'transformer_blocks.11.attn.add_k_proj.weight' has different shapes: Model 1: torch.Size([2150, 3072]), Model 2: torch.Size([3072, 3072])
113
+ Tensor 'transformer_blocks.11.attn.add_q_proj.bias' has different shapes: Model 1: torch.Size([2150]), Model 2: torch.Size([3072])
114
+ Tensor 'transformer_blocks.11.attn.add_q_proj.weight' has different shapes: Model 1: torch.Size([2150, 3072]), Model 2: torch.Size([3072, 3072])
115
+ Tensor 'transformer_blocks.11.attn.add_v_proj.bias' has different shapes: Model 1: torch.Size([2150]), Model 2: torch.Size([3072])
116
+ Tensor 'transformer_blocks.11.attn.add_v_proj.weight' has different shapes: Model 1: torch.Size([2150, 3072]), Model 2: torch.Size([3072, 3072])
117
+ Tensor 'transformer_blocks.11.attn.norm_added_k.weight' has different shapes: Model 1: torch.Size([89]), Model 2: torch.Size([128])
118
+ Tensor 'transformer_blocks.11.attn.norm_added_q.weight' has different shapes: Model 1: torch.Size([89]), Model 2: torch.Size([128])
119
+ Tensor 'transformer_blocks.11.attn.norm_k.weight' has different shapes: Model 1: torch.Size([89]), Model 2: torch.Size([128])
120
+ Tensor 'transformer_blocks.11.attn.norm_q.weight' has different shapes: Model 1: torch.Size([89]), Model 2: torch.Size([128])
121
+ Tensor 'transformer_blocks.11.attn.to_add_out.bias' has different shapes: Model 1: torch.Size([2150]), Model 2: torch.Size([3072])
122
+ Tensor 'transformer_blocks.11.attn.to_add_out.weight' has different shapes: Model 1: torch.Size([2150, 3072]), Model 2: torch.Size([3072, 3072])
123
+ Tensor 'transformer_blocks.11.attn.to_k.bias' has different shapes: Model 1: torch.Size([2150]), Model 2: torch.Size([3072])
124
+ Tensor 'transformer_blocks.11.attn.to_k.weight' has different shapes: Model 1: torch.Size([2150, 3072]), Model 2: torch.Size([3072, 3072])
125
+ Tensor 'transformer_blocks.11.attn.to_out.0.bias' has different shapes: Model 1: torch.Size([2150]), Model 2: torch.Size([3072])
126
+ Tensor 'transformer_blocks.11.attn.to_out.0.weight' has different shapes: Model 1: torch.Size([2150, 3072]), Model 2: torch.Size([3072, 3072])
127
+ Tensor 'transformer_blocks.11.attn.to_q.bias' has different shapes: Model 1: torch.Size([2150]), Model 2: torch.Size([3072])
128
+ Tensor 'transformer_blocks.11.attn.to_q.weight' has different shapes: Model 1: torch.Size([2150, 3072]), Model 2: torch.Size([3072, 3072])
129
+ Tensor 'transformer_blocks.11.attn.to_v.bias' has different shapes: Model 1: torch.Size([2150]), Model 2: torch.Size([3072])
130
+ Tensor 'transformer_blocks.11.attn.to_v.weight' has different shapes: Model 1: torch.Size([2150, 3072]), Model 2: torch.Size([3072, 3072])
131
+ Tensor 'transformer_blocks.11.ff.net.0.proj.bias' has different shapes: Model 1: torch.Size([8601]), Model 2: torch.Size([12288])
132
+ Tensor 'transformer_blocks.11.ff.net.0.proj.weight' has different shapes: Model 1: torch.Size([8601, 3072]), Model 2: torch.Size([12288, 3072])
133
+ Tensor 'transformer_blocks.11.ff.net.2.bias' has different shapes: Model 1: torch.Size([2150]), Model 2: torch.Size([3072])
134
+ Tensor 'transformer_blocks.11.ff.net.2.weight' has different shapes: Model 1: torch.Size([2150, 12288]), Model 2: torch.Size([3072, 12288])
135
+ Tensor 'transformer_blocks.11.ff_context.net.0.proj.bias' has different shapes: Model 1: torch.Size([8601]), Model 2: torch.Size([12288])
136
+ Tensor 'transformer_blocks.11.ff_context.net.0.proj.weight' has different shapes: Model 1: torch.Size([8601, 3072]), Model 2: torch.Size([12288, 3072])
137
+ Tensor 'transformer_blocks.11.ff_context.net.2.bias' has different shapes: Model 1: torch.Size([2150]), Model 2: torch.Size([3072])
138
+ Tensor 'transformer_blocks.11.ff_context.net.2.weight' has different shapes: Model 1: torch.Size([2150, 12288]), Model 2: torch.Size([3072, 12288])
139
+ Tensor 'transformer_blocks.11.norm1.linear.bias' has different shapes: Model 1: torch.Size([12902]), Model 2: torch.Size([18432])
140
+ Tensor 'transformer_blocks.11.norm1.linear.weight' has different shapes: Model 1: torch.Size([12902, 3072]), Model 2: torch.Size([18432, 3072])
141
+ Tensor 'transformer_blocks.11.norm1_context.linear.bias' has different shapes: Model 1: torch.Size([12902]), Model 2: torch.Size([18432])
142
+ Tensor 'transformer_blocks.11.norm1_context.linear.weight' has different shapes: Model 1: torch.Size([12902, 3072]), Model 2: torch.Size([18432, 3072])
143
+ Tensor 'transformer_blocks.12.attn.add_k_proj.bias' has different shapes: Model 1: torch.Size([2150]), Model 2: torch.Size([3072])
144
+ Tensor 'transformer_blocks.12.attn.add_k_proj.weight' has different shapes: Model 1: torch.Size([2150, 3072]), Model 2: torch.Size([3072, 3072])
145
+ Tensor 'transformer_blocks.12.attn.add_q_proj.bias' has different shapes: Model 1: torch.Size([2150]), Model 2: torch.Size([3072])
146
+ Tensor 'transformer_blocks.12.attn.add_q_proj.weight' has different shapes: Model 1: torch.Size([2150, 3072]), Model 2: torch.Size([3072, 3072])
147
+ Tensor 'transformer_blocks.12.attn.add_v_proj.bias' has different shapes: Model 1: torch.Size([2150]), Model 2: torch.Size([3072])
148
+ Tensor 'transformer_blocks.12.attn.add_v_proj.weight' has different shapes: Model 1: torch.Size([2150, 3072]), Model 2: torch.Size([3072, 3072])
149
+ Tensor 'transformer_blocks.12.attn.norm_added_k.weight' has different shapes: Model 1: torch.Size([89]), Model 2: torch.Size([128])
150
+ Tensor 'transformer_blocks.12.attn.norm_added_q.weight' has different shapes: Model 1: torch.Size([89]), Model 2: torch.Size([128])
151
+ Tensor 'transformer_blocks.12.attn.norm_k.weight' has different shapes: Model 1: torch.Size([89]), Model 2: torch.Size([128])
152
+ Tensor 'transformer_blocks.12.attn.norm_q.weight' has different shapes: Model 1: torch.Size([89]), Model 2: torch.Size([128])
153
+ Tensor 'transformer_blocks.12.attn.to_add_out.bias' has different shapes: Model 1: torch.Size([2150]), Model 2: torch.Size([3072])
154
+ Tensor 'transformer_blocks.12.attn.to_add_out.weight' has different shapes: Model 1: torch.Size([2150, 3072]), Model 2: torch.Size([3072, 3072])
155
+ Tensor 'transformer_blocks.12.attn.to_k.bias' has different shapes: Model 1: torch.Size([2150]), Model 2: torch.Size([3072])
156
+ Tensor 'transformer_blocks.12.attn.to_k.weight' has different shapes: Model 1: torch.Size([2150, 3072]), Model 2: torch.Size([3072, 3072])
157
+ Tensor 'transformer_blocks.12.attn.to_out.0.bias' has different shapes: Model 1: torch.Size([2150]), Model 2: torch.Size([3072])
158
+ Tensor 'transformer_blocks.12.attn.to_out.0.weight' has different shapes: Model 1: torch.Size([2150, 3072]), Model 2: torch.Size([3072, 3072])
159
+ Tensor 'transformer_blocks.12.attn.to_q.bias' has different shapes: Model 1: torch.Size([2150]), Model 2: torch.Size([3072])
160
+ Tensor 'transformer_blocks.12.attn.to_q.weight' has different shapes: Model 1: torch.Size([2150, 3072]), Model 2: torch.Size([3072, 3072])
161
+ Tensor 'transformer_blocks.12.attn.to_v.bias' has different shapes: Model 1: torch.Size([2150]), Model 2: torch.Size([3072])
162
+ Tensor 'transformer_blocks.12.attn.to_v.weight' has different shapes: Model 1: torch.Size([2150, 3072]), Model 2: torch.Size([3072, 3072])
163
+ Tensor 'transformer_blocks.12.ff.net.0.proj.bias' has different shapes: Model 1: torch.Size([8601]), Model 2: torch.Size([12288])
164
+ Tensor 'transformer_blocks.12.ff.net.0.proj.weight' has different shapes: Model 1: torch.Size([8601, 3072]), Model 2: torch.Size([12288, 3072])
165
+ Tensor 'transformer_blocks.12.ff.net.2.bias' has different shapes: Model 1: torch.Size([2150]), Model 2: torch.Size([3072])
166
+ Tensor 'transformer_blocks.12.ff.net.2.weight' has different shapes: Model 1: torch.Size([2150, 12288]), Model 2: torch.Size([3072, 12288])
167
+ Tensor 'transformer_blocks.12.ff_context.net.0.proj.bias' has different shapes: Model 1: torch.Size([8601]), Model 2: torch.Size([12288])
168
+ Tensor 'transformer_blocks.12.ff_context.net.0.proj.weight' has different shapes: Model 1: torch.Size([8601, 3072]), Model 2: torch.Size([12288, 3072])
169
+ Tensor 'transformer_blocks.12.ff_context.net.2.bias' has different shapes: Model 1: torch.Size([2150]), Model 2: torch.Size([3072])
170
+ Tensor 'transformer_blocks.12.ff_context.net.2.weight' has different shapes: Model 1: torch.Size([2150, 12288]), Model 2: torch.Size([3072, 12288])
171
+ Tensor 'transformer_blocks.12.norm1.linear.bias' has different shapes: Model 1: torch.Size([12902]), Model 2: torch.Size([18432])
172
+ Tensor 'transformer_blocks.12.norm1.linear.weight' has different shapes: Model 1: torch.Size([12902, 3072]), Model 2: torch.Size([18432, 3072])
173
+ Tensor 'transformer_blocks.12.norm1_context.linear.bias' has different shapes: Model 1: torch.Size([12902]), Model 2: torch.Size([18432])
174
+ Tensor 'transformer_blocks.12.norm1_context.linear.weight' has different shapes: Model 1: torch.Size([12902, 3072]), Model 2: torch.Size([18432, 3072])
175
+ Tensor 'transformer_blocks.13.attn.add_k_proj.bias' has different shapes: Model 1: torch.Size([2150]), Model 2: torch.Size([3072])
176
+ Tensor 'transformer_blocks.13.attn.add_k_proj.weight' has different shapes: Model 1: torch.Size([2150, 3072]), Model 2: torch.Size([3072, 3072])
177
+ Tensor 'transformer_blocks.13.attn.add_q_proj.bias' has different shapes: Model 1: torch.Size([2150]), Model 2: torch.Size([3072])
178
+ Tensor 'transformer_blocks.13.attn.add_q_proj.weight' has different shapes: Model 1: torch.Size([2150, 3072]), Model 2: torch.Size([3072, 3072])
179
+ Tensor 'transformer_blocks.13.attn.add_v_proj.bias' has different shapes: Model 1: torch.Size([2150]), Model 2: torch.Size([3072])
180
+ Tensor 'transformer_blocks.13.attn.add_v_proj.weight' has different shapes: Model 1: torch.Size([2150, 3072]), Model 2: torch.Size([3072, 3072])
181
+ Tensor 'transformer_blocks.13.attn.norm_added_k.weight' has different shapes: Model 1: torch.Size([89]), Model 2: torch.Size([128])
182
+ Tensor 'transformer_blocks.13.attn.norm_added_q.weight' has different shapes: Model 1: torch.Size([89]), Model 2: torch.Size([128])
183
+ Tensor 'transformer_blocks.13.attn.norm_k.weight' has different shapes: Model 1: torch.Size([89]), Model 2: torch.Size([128])
184
+ Tensor 'transformer_blocks.13.attn.norm_q.weight' has different shapes: Model 1: torch.Size([89]), Model 2: torch.Size([128])
185
+ Tensor 'transformer_blocks.13.attn.to_add_out.bias' has different shapes: Model 1: torch.Size([2150]), Model 2: torch.Size([3072])
186
+ Tensor 'transformer_blocks.13.attn.to_add_out.weight' has different shapes: Model 1: torch.Size([2150, 3072]), Model 2: torch.Size([3072, 3072])
187
+ Tensor 'transformer_blocks.13.attn.to_k.bias' has different shapes: Model 1: torch.Size([2150]), Model 2: torch.Size([3072])
188
+ Tensor 'transformer_blocks.13.attn.to_k.weight' has different shapes: Model 1: torch.Size([2150, 3072]), Model 2: torch.Size([3072, 3072])
189
+ Tensor 'transformer_blocks.13.attn.to_out.0.bias' has different shapes: Model 1: torch.Size([2150]), Model 2: torch.Size([3072])
190
+ Tensor 'transformer_blocks.13.attn.to_out.0.weight' has different shapes: Model 1: torch.Size([2150, 3072]), Model 2: torch.Size([3072, 3072])
191
+ Tensor 'transformer_blocks.13.attn.to_q.bias' has different shapes: Model 1: torch.Size([2150]), Model 2: torch.Size([3072])
192
+ Tensor 'transformer_blocks.13.attn.to_q.weight' has different shapes: Model 1: torch.Size([2150, 3072]), Model 2: torch.Size([3072, 3072])
193
+ Tensor 'transformer_blocks.13.attn.to_v.bias' has different shapes: Model 1: torch.Size([2150]), Model 2: torch.Size([3072])
194
+ Tensor 'transformer_blocks.13.attn.to_v.weight' has different shapes: Model 1: torch.Size([2150, 3072]), Model 2: torch.Size([3072, 3072])
195
+ Tensor 'transformer_blocks.13.ff.net.0.proj.bias' has different shapes: Model 1: torch.Size([8601]), Model 2: torch.Size([12288])
196
+ Tensor 'transformer_blocks.13.ff.net.0.proj.weight' has different shapes: Model 1: torch.Size([8601, 3072]), Model 2: torch.Size([12288, 3072])
197
+ Tensor 'transformer_blocks.13.ff.net.2.bias' has different shapes: Model 1: torch.Size([2150]), Model 2: torch.Size([3072])
198
+ Tensor 'transformer_blocks.13.ff.net.2.weight' has different shapes: Model 1: torch.Size([2150, 12288]), Model 2: torch.Size([3072, 12288])
199
+ Tensor 'transformer_blocks.13.ff_context.net.0.proj.bias' has different shapes: Model 1: torch.Size([8601]), Model 2: torch.Size([12288])
200
+ Tensor 'transformer_blocks.13.ff_context.net.0.proj.weight' has different shapes: Model 1: torch.Size([8601, 3072]), Model 2: torch.Size([12288, 3072])
201
+ Tensor 'transformer_blocks.13.ff_context.net.2.bias' has different shapes: Model 1: torch.Size([2150]), Model 2: torch.Size([3072])
202
+ Tensor 'transformer_blocks.13.ff_context.net.2.weight' has different shapes: Model 1: torch.Size([2150, 12288]), Model 2: torch.Size([3072, 12288])
203
+ Tensor 'transformer_blocks.13.norm1.linear.bias' has different shapes: Model 1: torch.Size([12902]), Model 2: torch.Size([18432])
204
+ Tensor 'transformer_blocks.13.norm1.linear.weight' has different shapes: Model 1: torch.Size([12902, 3072]), Model 2: torch.Size([18432, 3072])
205
+ Tensor 'transformer_blocks.13.norm1_context.linear.bias' has different shapes: Model 1: torch.Size([12902]), Model 2: torch.Size([18432])
206
+ Tensor 'transformer_blocks.13.norm1_context.linear.weight' has different shapes: Model 1: torch.Size([12902, 3072]), Model 2: torch.Size([18432, 3072])
207
Tensor 'transformer_blocks.14.attn.add_k_proj.bias' has different shapes: Model 1: torch.Size([2150]), Model 2: torch.Size([3072])
Tensor 'transformer_blocks.14.attn.add_k_proj.weight' has different shapes: Model 1: torch.Size([2150, 3072]), Model 2: torch.Size([3072, 3072])
Tensor 'transformer_blocks.14.attn.add_q_proj.bias' has different shapes: Model 1: torch.Size([2150]), Model 2: torch.Size([3072])
Tensor 'transformer_blocks.14.attn.add_q_proj.weight' has different shapes: Model 1: torch.Size([2150, 3072]), Model 2: torch.Size([3072, 3072])
Tensor 'transformer_blocks.14.attn.add_v_proj.bias' has different shapes: Model 1: torch.Size([2150]), Model 2: torch.Size([3072])
Tensor 'transformer_blocks.14.attn.add_v_proj.weight' has different shapes: Model 1: torch.Size([2150, 3072]), Model 2: torch.Size([3072, 3072])
Tensor 'transformer_blocks.14.attn.norm_added_k.weight' has different shapes: Model 1: torch.Size([89]), Model 2: torch.Size([128])
Tensor 'transformer_blocks.14.attn.norm_added_q.weight' has different shapes: Model 1: torch.Size([89]), Model 2: torch.Size([128])
Tensor 'transformer_blocks.14.attn.norm_k.weight' has different shapes: Model 1: torch.Size([89]), Model 2: torch.Size([128])
Tensor 'transformer_blocks.14.attn.norm_q.weight' has different shapes: Model 1: torch.Size([89]), Model 2: torch.Size([128])
Tensor 'transformer_blocks.14.attn.to_add_out.bias' has different shapes: Model 1: torch.Size([2150]), Model 2: torch.Size([3072])
Tensor 'transformer_blocks.14.attn.to_add_out.weight' has different shapes: Model 1: torch.Size([2150, 3072]), Model 2: torch.Size([3072, 3072])
Tensor 'transformer_blocks.14.attn.to_k.bias' has different shapes: Model 1: torch.Size([2150]), Model 2: torch.Size([3072])
Tensor 'transformer_blocks.14.attn.to_k.weight' has different shapes: Model 1: torch.Size([2150, 3072]), Model 2: torch.Size([3072, 3072])
Tensor 'transformer_blocks.14.attn.to_out.0.bias' has different shapes: Model 1: torch.Size([2150]), Model 2: torch.Size([3072])
Tensor 'transformer_blocks.14.attn.to_out.0.weight' has different shapes: Model 1: torch.Size([2150, 3072]), Model 2: torch.Size([3072, 3072])
Tensor 'transformer_blocks.14.attn.to_q.bias' has different shapes: Model 1: torch.Size([2150]), Model 2: torch.Size([3072])
Tensor 'transformer_blocks.14.attn.to_q.weight' has different shapes: Model 1: torch.Size([2150, 3072]), Model 2: torch.Size([3072, 3072])
Tensor 'transformer_blocks.14.attn.to_v.bias' has different shapes: Model 1: torch.Size([2150]), Model 2: torch.Size([3072])
Tensor 'transformer_blocks.14.attn.to_v.weight' has different shapes: Model 1: torch.Size([2150, 3072]), Model 2: torch.Size([3072, 3072])
Tensor 'transformer_blocks.14.norm1.linear.bias' has different shapes: Model 1: torch.Size([12902]), Model 2: torch.Size([18432])
Tensor 'transformer_blocks.14.norm1.linear.weight' has different shapes: Model 1: torch.Size([12902, 3072]), Model 2: torch.Size([18432, 3072])
Tensor 'transformer_blocks.14.norm1_context.linear.bias' has different shapes: Model 1: torch.Size([12902]), Model 2: torch.Size([18432])
Tensor 'transformer_blocks.14.norm1_context.linear.weight' has different shapes: Model 1: torch.Size([12902, 3072]), Model 2: torch.Size([18432, 3072])
Tensor 'transformer_blocks.2.attn.add_k_proj.bias' has different shapes: Model 1: torch.Size([2150]), Model 2: torch.Size([3072])
Tensor 'transformer_blocks.2.attn.add_k_proj.weight' has different shapes: Model 1: torch.Size([2150, 3072]), Model 2: torch.Size([3072, 3072])
Tensor 'transformer_blocks.2.attn.add_q_proj.bias' has different shapes: Model 1: torch.Size([2150]), Model 2: torch.Size([3072])
Tensor 'transformer_blocks.2.attn.add_q_proj.weight' has different shapes: Model 1: torch.Size([2150, 3072]), Model 2: torch.Size([3072, 3072])
Tensor 'transformer_blocks.2.attn.add_v_proj.bias' has different shapes: Model 1: torch.Size([2150]), Model 2: torch.Size([3072])
Tensor 'transformer_blocks.2.attn.add_v_proj.weight' has different shapes: Model 1: torch.Size([2150, 3072]), Model 2: torch.Size([3072, 3072])
Tensor 'transformer_blocks.2.attn.norm_added_k.weight' has different shapes: Model 1: torch.Size([89]), Model 2: torch.Size([128])
Tensor 'transformer_blocks.2.attn.norm_added_q.weight' has different shapes: Model 1: torch.Size([89]), Model 2: torch.Size([128])
Tensor 'transformer_blocks.2.attn.norm_k.weight' has different shapes: Model 1: torch.Size([89]), Model 2: torch.Size([128])
Tensor 'transformer_blocks.2.attn.norm_q.weight' has different shapes: Model 1: torch.Size([89]), Model 2: torch.Size([128])
Tensor 'transformer_blocks.2.attn.to_add_out.bias' has different shapes: Model 1: torch.Size([2150]), Model 2: torch.Size([3072])
Tensor 'transformer_blocks.2.attn.to_add_out.weight' has different shapes: Model 1: torch.Size([2150, 3072]), Model 2: torch.Size([3072, 3072])
Tensor 'transformer_blocks.2.attn.to_k.bias' has different shapes: Model 1: torch.Size([2150]), Model 2: torch.Size([3072])
Tensor 'transformer_blocks.2.attn.to_k.weight' has different shapes: Model 1: torch.Size([2150, 3072]), Model 2: torch.Size([3072, 3072])
Tensor 'transformer_blocks.2.attn.to_out.0.bias' has different shapes: Model 1: torch.Size([2150]), Model 2: torch.Size([3072])
Tensor 'transformer_blocks.2.attn.to_out.0.weight' has different shapes: Model 1: torch.Size([2150, 3072]), Model 2: torch.Size([3072, 3072])
Tensor 'transformer_blocks.2.attn.to_q.bias' has different shapes: Model 1: torch.Size([2150]), Model 2: torch.Size([3072])
Tensor 'transformer_blocks.2.attn.to_q.weight' has different shapes: Model 1: torch.Size([2150, 3072]), Model 2: torch.Size([3072, 3072])
Tensor 'transformer_blocks.2.attn.to_v.bias' has different shapes: Model 1: torch.Size([2150]), Model 2: torch.Size([3072])
Tensor 'transformer_blocks.2.attn.to_v.weight' has different shapes: Model 1: torch.Size([2150, 3072]), Model 2: torch.Size([3072, 3072])
Tensor 'transformer_blocks.2.ff.net.0.proj.bias' has different shapes: Model 1: torch.Size([8601]), Model 2: torch.Size([12288])
Tensor 'transformer_blocks.2.ff.net.0.proj.weight' has different shapes: Model 1: torch.Size([8601, 3072]), Model 2: torch.Size([12288, 3072])
Tensor 'transformer_blocks.2.ff.net.2.bias' has different shapes: Model 1: torch.Size([2150]), Model 2: torch.Size([3072])
Tensor 'transformer_blocks.2.ff.net.2.weight' has different shapes: Model 1: torch.Size([2150, 12288]), Model 2: torch.Size([3072, 12288])
Tensor 'transformer_blocks.2.ff_context.net.0.proj.bias' has different shapes: Model 1: torch.Size([8601]), Model 2: torch.Size([12288])
Tensor 'transformer_blocks.2.ff_context.net.0.proj.weight' has different shapes: Model 1: torch.Size([8601, 3072]), Model 2: torch.Size([12288, 3072])
Tensor 'transformer_blocks.2.ff_context.net.2.bias' has different shapes: Model 1: torch.Size([2150]), Model 2: torch.Size([3072])
Tensor 'transformer_blocks.2.ff_context.net.2.weight' has different shapes: Model 1: torch.Size([2150, 12288]), Model 2: torch.Size([3072, 12288])
Tensor 'transformer_blocks.2.norm1.linear.bias' has different shapes: Model 1: torch.Size([12902]), Model 2: torch.Size([18432])
Tensor 'transformer_blocks.2.norm1.linear.weight' has different shapes: Model 1: torch.Size([12902, 3072]), Model 2: torch.Size([18432, 3072])
Tensor 'transformer_blocks.2.norm1_context.linear.bias' has different shapes: Model 1: torch.Size([12902]), Model 2: torch.Size([18432])
Tensor 'transformer_blocks.2.norm1_context.linear.weight' has different shapes: Model 1: torch.Size([12902, 3072]), Model 2: torch.Size([18432, 3072])
Tensor 'transformer_blocks.3.attn.add_k_proj.bias' has different shapes: Model 1: torch.Size([2150]), Model 2: torch.Size([3072])
Tensor 'transformer_blocks.3.attn.add_k_proj.weight' has different shapes: Model 1: torch.Size([2150, 3072]), Model 2: torch.Size([3072, 3072])
Tensor 'transformer_blocks.3.attn.add_q_proj.bias' has different shapes: Model 1: torch.Size([2150]), Model 2: torch.Size([3072])
Tensor 'transformer_blocks.3.attn.add_q_proj.weight' has different shapes: Model 1: torch.Size([2150, 3072]), Model 2: torch.Size([3072, 3072])
Tensor 'transformer_blocks.3.attn.add_v_proj.bias' has different shapes: Model 1: torch.Size([2150]), Model 2: torch.Size([3072])
Tensor 'transformer_blocks.3.attn.add_v_proj.weight' has different shapes: Model 1: torch.Size([2150, 3072]), Model 2: torch.Size([3072, 3072])
Tensor 'transformer_blocks.3.attn.norm_added_k.weight' has different shapes: Model 1: torch.Size([89]), Model 2: torch.Size([128])
Tensor 'transformer_blocks.3.attn.norm_added_q.weight' has different shapes: Model 1: torch.Size([89]), Model 2: torch.Size([128])
Tensor 'transformer_blocks.3.attn.norm_k.weight' has different shapes: Model 1: torch.Size([89]), Model 2: torch.Size([128])
Tensor 'transformer_blocks.3.attn.norm_q.weight' has different shapes: Model 1: torch.Size([89]), Model 2: torch.Size([128])
Tensor 'transformer_blocks.3.attn.to_add_out.bias' has different shapes: Model 1: torch.Size([2150]), Model 2: torch.Size([3072])
Tensor 'transformer_blocks.3.attn.to_add_out.weight' has different shapes: Model 1: torch.Size([2150, 3072]), Model 2: torch.Size([3072, 3072])
Tensor 'transformer_blocks.3.attn.to_k.bias' has different shapes: Model 1: torch.Size([2150]), Model 2: torch.Size([3072])
Tensor 'transformer_blocks.3.attn.to_k.weight' has different shapes: Model 1: torch.Size([2150, 3072]), Model 2: torch.Size([3072, 3072])
Tensor 'transformer_blocks.3.attn.to_out.0.bias' has different shapes: Model 1: torch.Size([2150]), Model 2: torch.Size([3072])
Tensor 'transformer_blocks.3.attn.to_out.0.weight' has different shapes: Model 1: torch.Size([2150, 3072]), Model 2: torch.Size([3072, 3072])
Tensor 'transformer_blocks.3.attn.to_q.bias' has different shapes: Model 1: torch.Size([2150]), Model 2: torch.Size([3072])
Tensor 'transformer_blocks.3.attn.to_q.weight' has different shapes: Model 1: torch.Size([2150, 3072]), Model 2: torch.Size([3072, 3072])
Tensor 'transformer_blocks.3.attn.to_v.bias' has different shapes: Model 1: torch.Size([2150]), Model 2: torch.Size([3072])
Tensor 'transformer_blocks.3.attn.to_v.weight' has different shapes: Model 1: torch.Size([2150, 3072]), Model 2: torch.Size([3072, 3072])
Tensor 'transformer_blocks.3.ff.net.0.proj.bias' has different shapes: Model 1: torch.Size([8601]), Model 2: torch.Size([12288])
Tensor 'transformer_blocks.3.ff.net.0.proj.weight' has different shapes: Model 1: torch.Size([8601, 3072]), Model 2: torch.Size([12288, 3072])
Tensor 'transformer_blocks.3.ff.net.2.bias' has different shapes: Model 1: torch.Size([2150]), Model 2: torch.Size([3072])
Tensor 'transformer_blocks.3.ff.net.2.weight' has different shapes: Model 1: torch.Size([2150, 12288]), Model 2: torch.Size([3072, 12288])
Tensor 'transformer_blocks.3.ff_context.net.0.proj.bias' has different shapes: Model 1: torch.Size([8601]), Model 2: torch.Size([12288])
Tensor 'transformer_blocks.3.ff_context.net.0.proj.weight' has different shapes: Model 1: torch.Size([8601, 3072]), Model 2: torch.Size([12288, 3072])
Tensor 'transformer_blocks.3.ff_context.net.2.bias' has different shapes: Model 1: torch.Size([2150]), Model 2: torch.Size([3072])
Tensor 'transformer_blocks.3.ff_context.net.2.weight' has different shapes: Model 1: torch.Size([2150, 12288]), Model 2: torch.Size([3072, 12288])
Tensor 'transformer_blocks.3.norm1.linear.bias' has different shapes: Model 1: torch.Size([12902]), Model 2: torch.Size([18432])
Tensor 'transformer_blocks.3.norm1.linear.weight' has different shapes: Model 1: torch.Size([12902, 3072]), Model 2: torch.Size([18432, 3072])
Tensor 'transformer_blocks.3.norm1_context.linear.bias' has different shapes: Model 1: torch.Size([12902]), Model 2: torch.Size([18432])
Tensor 'transformer_blocks.3.norm1_context.linear.weight' has different shapes: Model 1: torch.Size([12902, 3072]), Model 2: torch.Size([18432, 3072])
Tensor 'transformer_blocks.4.attn.add_k_proj.bias' has different shapes: Model 1: torch.Size([2150]), Model 2: torch.Size([3072])
Tensor 'transformer_blocks.4.attn.add_k_proj.weight' has different shapes: Model 1: torch.Size([2150, 3072]), Model 2: torch.Size([3072, 3072])
Tensor 'transformer_blocks.4.attn.add_q_proj.bias' has different shapes: Model 1: torch.Size([2150]), Model 2: torch.Size([3072])
Tensor 'transformer_blocks.4.attn.add_q_proj.weight' has different shapes: Model 1: torch.Size([2150, 3072]), Model 2: torch.Size([3072, 3072])
Tensor 'transformer_blocks.4.attn.add_v_proj.bias' has different shapes: Model 1: torch.Size([2150]), Model 2: torch.Size([3072])
Tensor 'transformer_blocks.4.attn.add_v_proj.weight' has different shapes: Model 1: torch.Size([2150, 3072]), Model 2: torch.Size([3072, 3072])
Tensor 'transformer_blocks.4.attn.norm_added_k.weight' has different shapes: Model 1: torch.Size([89]), Model 2: torch.Size([128])
Tensor 'transformer_blocks.4.attn.norm_added_q.weight' has different shapes: Model 1: torch.Size([89]), Model 2: torch.Size([128])
Tensor 'transformer_blocks.4.attn.norm_k.weight' has different shapes: Model 1: torch.Size([89]), Model 2: torch.Size([128])
Tensor 'transformer_blocks.4.attn.norm_q.weight' has different shapes: Model 1: torch.Size([89]), Model 2: torch.Size([128])
Tensor 'transformer_blocks.4.attn.to_add_out.bias' has different shapes: Model 1: torch.Size([2150]), Model 2: torch.Size([3072])
Tensor 'transformer_blocks.4.attn.to_add_out.weight' has different shapes: Model 1: torch.Size([2150, 3072]), Model 2: torch.Size([3072, 3072])
Tensor 'transformer_blocks.4.attn.to_k.bias' has different shapes: Model 1: torch.Size([2150]), Model 2: torch.Size([3072])
Tensor 'transformer_blocks.4.attn.to_k.weight' has different shapes: Model 1: torch.Size([2150, 3072]), Model 2: torch.Size([3072, 3072])
Tensor 'transformer_blocks.4.attn.to_out.0.bias' has different shapes: Model 1: torch.Size([2150]), Model 2: torch.Size([3072])
Tensor 'transformer_blocks.4.attn.to_out.0.weight' has different shapes: Model 1: torch.Size([2150, 3072]), Model 2: torch.Size([3072, 3072])
Tensor 'transformer_blocks.4.attn.to_q.bias' has different shapes: Model 1: torch.Size([2150]), Model 2: torch.Size([3072])
Tensor 'transformer_blocks.4.attn.to_q.weight' has different shapes: Model 1: torch.Size([2150, 3072]), Model 2: torch.Size([3072, 3072])
Tensor 'transformer_blocks.4.attn.to_v.bias' has different shapes: Model 1: torch.Size([2150]), Model 2: torch.Size([3072])
Tensor 'transformer_blocks.4.attn.to_v.weight' has different shapes: Model 1: torch.Size([2150, 3072]), Model 2: torch.Size([3072, 3072])
Tensor 'transformer_blocks.4.ff.net.0.proj.bias' has different shapes: Model 1: torch.Size([8601]), Model 2: torch.Size([12288])
Tensor 'transformer_blocks.4.ff.net.0.proj.weight' has different shapes: Model 1: torch.Size([8601, 3072]), Model 2: torch.Size([12288, 3072])
Tensor 'transformer_blocks.4.ff.net.2.bias' has different shapes: Model 1: torch.Size([2150]), Model 2: torch.Size([3072])
Tensor 'transformer_blocks.4.ff.net.2.weight' has different shapes: Model 1: torch.Size([2150, 12288]), Model 2: torch.Size([3072, 12288])
Tensor 'transformer_blocks.4.ff_context.net.0.proj.bias' has different shapes: Model 1: torch.Size([8601]), Model 2: torch.Size([12288])
Tensor 'transformer_blocks.4.ff_context.net.0.proj.weight' has different shapes: Model 1: torch.Size([8601, 3072]), Model 2: torch.Size([12288, 3072])
Tensor 'transformer_blocks.4.ff_context.net.2.bias' has different shapes: Model 1: torch.Size([2150]), Model 2: torch.Size([3072])
Tensor 'transformer_blocks.4.ff_context.net.2.weight' has different shapes: Model 1: torch.Size([2150, 12288]), Model 2: torch.Size([3072, 12288])
Tensor 'transformer_blocks.4.norm1.linear.bias' has different shapes: Model 1: torch.Size([12902]), Model 2: torch.Size([18432])
Tensor 'transformer_blocks.4.norm1.linear.weight' has different shapes: Model 1: torch.Size([12902, 3072]), Model 2: torch.Size([18432, 3072])
Tensor 'transformer_blocks.4.norm1_context.linear.bias' has different shapes: Model 1: torch.Size([12902]), Model 2: torch.Size([18432])
Tensor 'transformer_blocks.4.norm1_context.linear.weight' has different shapes: Model 1: torch.Size([12902, 3072]), Model 2: torch.Size([18432, 3072])
Tensor 'transformer_blocks.5.attn.add_k_proj.bias' has different shapes: Model 1: torch.Size([2150]), Model 2: torch.Size([3072])
Tensor 'transformer_blocks.5.attn.add_k_proj.weight' has different shapes: Model 1: torch.Size([2150, 3072]), Model 2: torch.Size([3072, 3072])
Tensor 'transformer_blocks.5.attn.add_q_proj.bias' has different shapes: Model 1: torch.Size([2150]), Model 2: torch.Size([3072])
Tensor 'transformer_blocks.5.attn.add_q_proj.weight' has different shapes: Model 1: torch.Size([2150, 3072]), Model 2: torch.Size([3072, 3072])
Tensor 'transformer_blocks.5.attn.add_v_proj.bias' has different shapes: Model 1: torch.Size([2150]), Model 2: torch.Size([3072])
Tensor 'transformer_blocks.5.attn.add_v_proj.weight' has different shapes: Model 1: torch.Size([2150, 3072]), Model 2: torch.Size([3072, 3072])
Tensor 'transformer_blocks.5.attn.norm_added_k.weight' has different shapes: Model 1: torch.Size([89]), Model 2: torch.Size([128])
Tensor 'transformer_blocks.5.attn.norm_added_q.weight' has different shapes: Model 1: torch.Size([89]), Model 2: torch.Size([128])
Tensor 'transformer_blocks.5.attn.norm_k.weight' has different shapes: Model 1: torch.Size([89]), Model 2: torch.Size([128])
Tensor 'transformer_blocks.5.attn.norm_q.weight' has different shapes: Model 1: torch.Size([89]), Model 2: torch.Size([128])
Tensor 'transformer_blocks.5.attn.to_add_out.bias' has different shapes: Model 1: torch.Size([2150]), Model 2: torch.Size([3072])
Tensor 'transformer_blocks.5.attn.to_add_out.weight' has different shapes: Model 1: torch.Size([2150, 3072]), Model 2: torch.Size([3072, 3072])
Tensor 'transformer_blocks.5.attn.to_k.bias' has different shapes: Model 1: torch.Size([2150]), Model 2: torch.Size([3072])
Tensor 'transformer_blocks.5.attn.to_k.weight' has different shapes: Model 1: torch.Size([2150, 3072]), Model 2: torch.Size([3072, 3072])
Tensor 'transformer_blocks.5.attn.to_out.0.bias' has different shapes: Model 1: torch.Size([2150]), Model 2: torch.Size([3072])
Tensor 'transformer_blocks.5.attn.to_out.0.weight' has different shapes: Model 1: torch.Size([2150, 3072]), Model 2: torch.Size([3072, 3072])
Tensor 'transformer_blocks.5.attn.to_q.bias' has different shapes: Model 1: torch.Size([2150]), Model 2: torch.Size([3072])
Tensor 'transformer_blocks.5.attn.to_q.weight' has different shapes: Model 1: torch.Size([2150, 3072]), Model 2: torch.Size([3072, 3072])
Tensor 'transformer_blocks.5.attn.to_v.bias' has different shapes: Model 1: torch.Size([2150]), Model 2: torch.Size([3072])
Tensor 'transformer_blocks.5.attn.to_v.weight' has different shapes: Model 1: torch.Size([2150, 3072]), Model 2: torch.Size([3072, 3072])
Tensor 'transformer_blocks.5.ff.net.0.proj.bias' has different shapes: Model 1: torch.Size([8601]), Model 2: torch.Size([12288])
Tensor 'transformer_blocks.5.ff.net.0.proj.weight' has different shapes: Model 1: torch.Size([8601, 3072]), Model 2: torch.Size([12288, 3072])
Tensor 'transformer_blocks.5.ff.net.2.bias' has different shapes: Model 1: torch.Size([2150]), Model 2: torch.Size([3072])
Tensor 'transformer_blocks.5.ff.net.2.weight' has different shapes: Model 1: torch.Size([2150, 12288]), Model 2: torch.Size([3072, 12288])
Tensor 'transformer_blocks.5.ff_context.net.0.proj.bias' has different shapes: Model 1: torch.Size([8601]), Model 2: torch.Size([12288])
Tensor 'transformer_blocks.5.ff_context.net.0.proj.weight' has different shapes: Model 1: torch.Size([8601, 3072]), Model 2: torch.Size([12288, 3072])
Tensor 'transformer_blocks.5.ff_context.net.2.bias' has different shapes: Model 1: torch.Size([2150]), Model 2: torch.Size([3072])
Tensor 'transformer_blocks.5.ff_context.net.2.weight' has different shapes: Model 1: torch.Size([2150, 12288]), Model 2: torch.Size([3072, 12288])
Tensor 'transformer_blocks.5.norm1.linear.bias' has different shapes: Model 1: torch.Size([12902]), Model 2: torch.Size([18432])
Tensor 'transformer_blocks.5.norm1.linear.weight' has different shapes: Model 1: torch.Size([12902, 3072]), Model 2: torch.Size([18432, 3072])
Tensor 'transformer_blocks.5.norm1_context.linear.bias' has different shapes: Model 1: torch.Size([12902]), Model 2: torch.Size([18432])
Tensor 'transformer_blocks.5.norm1_context.linear.weight' has different shapes: Model 1: torch.Size([12902, 3072]), Model 2: torch.Size([18432, 3072])
Tensor 'transformer_blocks.6.attn.add_k_proj.bias' has different shapes: Model 1: torch.Size([2150]), Model 2: torch.Size([3072])
Tensor 'transformer_blocks.6.attn.add_k_proj.weight' has different shapes: Model 1: torch.Size([2150, 3072]), Model 2: torch.Size([3072, 3072])
Tensor 'transformer_blocks.6.attn.add_q_proj.bias' has different shapes: Model 1: torch.Size([2150]), Model 2: torch.Size([3072])
Tensor 'transformer_blocks.6.attn.add_q_proj.weight' has different shapes: Model 1: torch.Size([2150, 3072]), Model 2: torch.Size([3072, 3072])
Tensor 'transformer_blocks.6.attn.add_v_proj.bias' has different shapes: Model 1: torch.Size([2150]), Model 2: torch.Size([3072])
Tensor 'transformer_blocks.6.attn.add_v_proj.weight' has different shapes: Model 1: torch.Size([2150, 3072]), Model 2: torch.Size([3072, 3072])
Tensor 'transformer_blocks.6.attn.norm_added_k.weight' has different shapes: Model 1: torch.Size([89]), Model 2: torch.Size([128])
Tensor 'transformer_blocks.6.attn.norm_added_q.weight' has different shapes: Model 1: torch.Size([89]), Model 2: torch.Size([128])
Tensor 'transformer_blocks.6.attn.norm_k.weight' has different shapes: Model 1: torch.Size([89]), Model 2: torch.Size([128])
Tensor 'transformer_blocks.6.attn.norm_q.weight' has different shapes: Model 1: torch.Size([89]), Model 2: torch.Size([128])
Tensor 'transformer_blocks.6.attn.to_add_out.bias' has different shapes: Model 1: torch.Size([2150]), Model 2: torch.Size([3072])
Tensor 'transformer_blocks.6.attn.to_add_out.weight' has different shapes: Model 1: torch.Size([2150, 3072]), Model 2: torch.Size([3072, 3072])
Tensor 'transformer_blocks.6.attn.to_k.bias' has different shapes: Model 1: torch.Size([2150]), Model 2: torch.Size([3072])
Tensor 'transformer_blocks.6.attn.to_k.weight' has different shapes: Model 1: torch.Size([2150, 3072]), Model 2: torch.Size([3072, 3072])
Tensor 'transformer_blocks.6.attn.to_out.0.bias' has different shapes: Model 1: torch.Size([2150]), Model 2: torch.Size([3072])
Tensor 'transformer_blocks.6.attn.to_out.0.weight' has different shapes: Model 1: torch.Size([2150, 3072]), Model 2: torch.Size([3072, 3072])
Tensor 'transformer_blocks.6.attn.to_q.bias' has different shapes: Model 1: torch.Size([2150]), Model 2: torch.Size([3072])
Tensor 'transformer_blocks.6.attn.to_q.weight' has different shapes: Model 1: torch.Size([2150, 3072]), Model 2: torch.Size([3072, 3072])
Tensor 'transformer_blocks.6.attn.to_v.bias' has different shapes: Model 1: torch.Size([2150]), Model 2: torch.Size([3072])
Tensor 'transformer_blocks.6.attn.to_v.weight' has different shapes: Model 1: torch.Size([2150, 3072]), Model 2: torch.Size([3072, 3072])
Tensor 'transformer_blocks.6.ff.net.0.proj.bias' has different shapes: Model 1: torch.Size([8601]), Model 2: torch.Size([12288])
Tensor 'transformer_blocks.6.ff.net.0.proj.weight' has different shapes: Model 1: torch.Size([8601, 3072]), Model 2: torch.Size([12288, 3072])
Tensor 'transformer_blocks.6.ff.net.2.bias' has different shapes: Model 1: torch.Size([2150]), Model 2: torch.Size([3072])
Tensor 'transformer_blocks.6.ff.net.2.weight' has different shapes: Model 1: torch.Size([2150, 12288]), Model 2: torch.Size([3072, 12288])
Tensor 'transformer_blocks.6.ff_context.net.0.proj.bias' has different shapes: Model 1: torch.Size([8601]), Model 2: torch.Size([12288])
Tensor 'transformer_blocks.6.ff_context.net.0.proj.weight' has different shapes: Model 1: torch.Size([8601, 3072]), Model 2: torch.Size([12288, 3072])
Tensor 'transformer_blocks.6.ff_context.net.2.bias' has different shapes: Model 1: torch.Size([2150]), Model 2: torch.Size([3072])
Tensor 'transformer_blocks.6.ff_context.net.2.weight' has different shapes: Model 1: torch.Size([2150, 12288]), Model 2: torch.Size([3072, 12288])
Tensor 'transformer_blocks.6.norm1.linear.bias' has different shapes: Model 1: torch.Size([12902]), Model 2: torch.Size([18432])
Tensor 'transformer_blocks.6.norm1.linear.weight' has different shapes: Model 1: torch.Size([12902, 3072]), Model 2: torch.Size([18432, 3072])
Tensor 'transformer_blocks.6.norm1_context.linear.bias' has different shapes: Model 1: torch.Size([12902]), Model 2: torch.Size([18432])
Tensor 'transformer_blocks.6.norm1_context.linear.weight' has different shapes: Model 1: torch.Size([12902, 3072]), Model 2: torch.Size([18432, 3072])
Tensor 'transformer_blocks.7.attn.add_k_proj.bias' has different shapes: Model 1: torch.Size([2150]), Model 2: torch.Size([3072])
Tensor 'transformer_blocks.7.attn.add_k_proj.weight' has different shapes: Model 1: torch.Size([2150, 3072]), Model 2: torch.Size([3072, 3072])
Tensor 'transformer_blocks.7.attn.add_q_proj.bias' has different shapes: Model 1: torch.Size([2150]), Model 2: torch.Size([3072])
Tensor 'transformer_blocks.7.attn.add_q_proj.weight' has different shapes: Model 1: torch.Size([2150, 3072]), Model 2: torch.Size([3072, 3072])
Tensor 'transformer_blocks.7.attn.add_v_proj.bias' has different shapes: Model 1: torch.Size([2150]), Model 2: torch.Size([3072])
Tensor 'transformer_blocks.7.attn.add_v_proj.weight' has different shapes: Model 1: torch.Size([2150, 3072]), Model 2: torch.Size([3072, 3072])
Tensor 'transformer_blocks.7.attn.norm_added_k.weight' has different shapes: Model 1: torch.Size([89]), Model 2: torch.Size([128])
Tensor 'transformer_blocks.7.attn.norm_added_q.weight' has different shapes: Model 1: torch.Size([89]), Model 2: torch.Size([128])
Tensor 'transformer_blocks.7.attn.norm_k.weight' has different shapes: Model 1: torch.Size([89]), Model 2: torch.Size([128])
Tensor 'transformer_blocks.7.attn.norm_q.weight' has different shapes: Model 1: torch.Size([89]), Model 2: torch.Size([128])
Tensor 'transformer_blocks.7.attn.to_add_out.bias' has different shapes: Model 1: torch.Size([2150]), Model 2: torch.Size([3072])
Tensor 'transformer_blocks.7.attn.to_add_out.weight' has different shapes: Model 1: torch.Size([2150, 3072]), Model 2: torch.Size([3072, 3072])
Tensor 'transformer_blocks.7.attn.to_k.bias' has different shapes: Model 1: torch.Size([2150]), Model 2: torch.Size([3072])
Tensor 'transformer_blocks.7.attn.to_k.weight' has different shapes: Model 1: torch.Size([2150, 3072]), Model 2: torch.Size([3072, 3072])
Tensor 'transformer_blocks.7.attn.to_out.0.bias' has different shapes: Model 1: torch.Size([2150]), Model 2: torch.Size([3072])
Tensor 'transformer_blocks.7.attn.to_out.0.weight' has different shapes: Model 1: torch.Size([2150, 3072]), Model 2: torch.Size([3072, 3072])
Tensor 'transformer_blocks.7.attn.to_q.bias' has different shapes: Model 1: torch.Size([2150]), Model 2: torch.Size([3072])
Tensor 'transformer_blocks.7.attn.to_q.weight' has different shapes: Model 1: torch.Size([2150, 3072]), Model 2: torch.Size([3072, 3072])
Tensor 'transformer_blocks.7.attn.to_v.bias' has different shapes: Model 1: torch.Size([2150]), Model 2: torch.Size([3072])
Tensor 'transformer_blocks.7.attn.to_v.weight' has different shapes: Model 1: torch.Size([2150, 3072]), Model 2: torch.Size([3072, 3072])
Tensor 'transformer_blocks.7.ff.net.0.proj.bias' has different shapes: Model 1: torch.Size([8601]), Model 2: torch.Size([12288])
Tensor 'transformer_blocks.7.ff.net.0.proj.weight' has different shapes: Model 1: torch.Size([8601, 3072]), Model 2: torch.Size([12288, 3072])
Tensor 'transformer_blocks.7.ff.net.2.bias' has different shapes: Model 1: torch.Size([2150]), Model 2: torch.Size([3072])
Tensor 'transformer_blocks.7.ff.net.2.weight' has different shapes: Model 1: torch.Size([2150, 12288]), Model 2: torch.Size([3072, 12288])
Tensor 'transformer_blocks.7.ff_context.net.0.proj.bias' has different shapes: Model 1: torch.Size([8601]), Model 2: torch.Size([12288])
Tensor 'transformer_blocks.7.ff_context.net.0.proj.weight' has different shapes: Model 1: torch.Size([8601, 3072]), Model 2: torch.Size([12288, 3072])
Tensor 'transformer_blocks.7.ff_context.net.2.bias' has different shapes: Model 1: torch.Size([2150]), Model 2: torch.Size([3072])
Tensor 'transformer_blocks.7.ff_context.net.2.weight' has different shapes: Model 1: torch.Size([2150, 12288]), Model 2: torch.Size([3072, 12288])
Tensor 'transformer_blocks.7.norm1.linear.bias' has different shapes: Model 1: torch.Size([12902]), Model 2: torch.Size([18432])
Tensor 'transformer_blocks.7.norm1.linear.weight' has different shapes: Model 1: torch.Size([12902, 3072]), Model 2: torch.Size([18432, 3072])
Tensor 'transformer_blocks.7.norm1_context.linear.bias' has different shapes: Model 1: torch.Size([12902]), Model 2: torch.Size([18432])
Tensor 'transformer_blocks.7.norm1_context.linear.weight' has different shapes: Model 1: torch.Size([12902, 3072]), Model 2: torch.Size([18432, 3072])
Tensor 'transformer_blocks.8.attn.add_k_proj.bias' has different shapes: Model 1: torch.Size([2150]), Model 2: torch.Size([3072])
Tensor 'transformer_blocks.8.attn.add_k_proj.weight' has different shapes: Model 1: torch.Size([2150, 3072]), Model 2: torch.Size([3072, 3072])
Tensor 'transformer_blocks.8.attn.add_q_proj.bias' has different shapes: Model 1: torch.Size([2150]), Model 2: torch.Size([3072])
Tensor 'transformer_blocks.8.attn.add_q_proj.weight' has different shapes: Model 1: torch.Size([2150, 3072]), Model 2: torch.Size([3072, 3072])
Tensor 'transformer_blocks.8.attn.add_v_proj.bias' has different shapes: Model 1: torch.Size([2150]), Model 2: torch.Size([3072])
Tensor 'transformer_blocks.8.attn.add_v_proj.weight' has different shapes: Model 1: torch.Size([2150, 3072]), Model 2: torch.Size([3072, 3072])
Tensor 'transformer_blocks.8.attn.norm_added_k.weight' has different shapes: Model 1: torch.Size([89]), Model 2: torch.Size([128])
Tensor 'transformer_blocks.8.attn.norm_added_q.weight' has different shapes: Model 1: torch.Size([89]), Model 2: torch.Size([128])
Tensor 'transformer_blocks.8.attn.norm_k.weight' has different shapes: Model 1: torch.Size([89]), Model 2: torch.Size([128])
Tensor 'transformer_blocks.8.attn.norm_q.weight' has different shapes: Model 1: torch.Size([89]), Model 2: torch.Size([128])
Tensor 'transformer_blocks.8.attn.to_add_out.bias' has different shapes: Model 1: torch.Size([2150]), Model 2: torch.Size([3072])
Tensor 'transformer_blocks.8.attn.to_add_out.weight' has different shapes: Model 1: torch.Size([2150, 3072]), Model 2: torch.Size([3072, 3072])
Tensor 'transformer_blocks.8.attn.to_k.bias' has different shapes: Model 1: torch.Size([2150]), Model 2: torch.Size([3072])
Tensor 'transformer_blocks.8.attn.to_k.weight' has different shapes: Model 1: torch.Size([2150, 3072]), Model 2: torch.Size([3072, 3072])
Tensor 'transformer_blocks.8.attn.to_out.0.bias' has different shapes: Model 1: torch.Size([2150]), Model 2: torch.Size([3072])
Tensor 'transformer_blocks.8.attn.to_out.0.weight' has different shapes: Model 1: torch.Size([2150, 3072]), Model 2: torch.Size([3072, 3072])
Tensor 'transformer_blocks.8.attn.to_q.bias' has different shapes: Model 1: torch.Size([2150]), Model 2: torch.Size([3072])
Tensor 'transformer_blocks.8.attn.to_q.weight' has different shapes: Model 1: torch.Size([2150, 3072]), Model 2: torch.Size([3072, 3072])
Tensor 'transformer_blocks.8.attn.to_v.bias' has different shapes: Model 1: torch.Size([2150]), Model 2: torch.Size([3072])
Tensor 'transformer_blocks.8.attn.to_v.weight' has different shapes: Model 1: torch.Size([2150, 3072]), Model 2: torch.Size([3072, 3072])
Tensor 'transformer_blocks.8.ff.net.0.proj.bias' has different shapes: Model 1: torch.Size([8601]), Model 2: torch.Size([12288])
Tensor 'transformer_blocks.8.ff.net.0.proj.weight' has different shapes: Model 1: torch.Size([8601, 3072]), Model 2: torch.Size([12288, 3072])
Tensor 'transformer_blocks.8.ff.net.2.bias' has different shapes: Model 1: torch.Size([2150]), Model 2: torch.Size([3072])
Tensor 'transformer_blocks.8.ff.net.2.weight' has different shapes: Model 1: torch.Size([2150, 12288]), Model 2: torch.Size([3072, 12288])
Tensor 'transformer_blocks.8.ff_context.net.0.proj.bias' has different shapes: Model 1: torch.Size([8601]), Model 2: torch.Size([12288])
Tensor 'transformer_blocks.8.ff_context.net.0.proj.weight' has different shapes: Model 1: torch.Size([8601, 3072]), Model 2: torch.Size([12288, 3072])
Tensor 'transformer_blocks.8.ff_context.net.2.bias' has different shapes: Model 1: torch.Size([2150]), Model 2: torch.Size([3072])
Tensor 'transformer_blocks.8.ff_context.net.2.weight' has different shapes: Model 1: torch.Size([2150, 12288]), Model 2: torch.Size([3072, 12288])
Tensor 'transformer_blocks.8.norm1.linear.bias' has different shapes: Model 1: torch.Size([12902]), Model 2: torch.Size([18432])
Tensor 'transformer_blocks.8.norm1.linear.weight' has different shapes: Model 1: torch.Size([12902, 3072]), Model 2: torch.Size([18432, 3072])
Tensor 'transformer_blocks.8.norm1_context.linear.bias' has different shapes: Model 1: torch.Size([12902]), Model 2: torch.Size([18432])
Tensor 'transformer_blocks.8.norm1_context.linear.weight' has different shapes: Model 1: torch.Size([12902, 3072]), Model 2: torch.Size([18432, 3072])
+ Tensor 'transformer_blocks.9.attn.add_k_proj.bias' has different shapes: Model 1: torch.Size([2150]), Model 2: torch.Size([3072])
456
+ Tensor 'transformer_blocks.9.attn.add_k_proj.weight' has different shapes: Model 1: torch.Size([2150, 3072]), Model 2: torch.Size([3072, 3072])
457
+ Tensor 'transformer_blocks.9.attn.add_q_proj.bias' has different shapes: Model 1: torch.Size([2150]), Model 2: torch.Size([3072])
458
+ Tensor 'transformer_blocks.9.attn.add_q_proj.weight' has different shapes: Model 1: torch.Size([2150, 3072]), Model 2: torch.Size([3072, 3072])
459
+ Tensor 'transformer_blocks.9.attn.add_v_proj.bias' has different shapes: Model 1: torch.Size([2150]), Model 2: torch.Size([3072])
460
+ Tensor 'transformer_blocks.9.attn.add_v_proj.weight' has different shapes: Model 1: torch.Size([2150, 3072]), Model 2: torch.Size([3072, 3072])
461
+ Tensor 'transformer_blocks.9.attn.norm_added_k.weight' has different shapes: Model 1: torch.Size([89]), Model 2: torch.Size([128])
462
+ Tensor 'transformer_blocks.9.attn.norm_added_q.weight' has different shapes: Model 1: torch.Size([89]), Model 2: torch.Size([128])
463
+ Tensor 'transformer_blocks.9.attn.norm_k.weight' has different shapes: Model 1: torch.Size([89]), Model 2: torch.Size([128])
464
+ Tensor 'transformer_blocks.9.attn.norm_q.weight' has different shapes: Model 1: torch.Size([89]), Model 2: torch.Size([128])
465
+ Tensor 'transformer_blocks.9.attn.to_add_out.bias' has different shapes: Model 1: torch.Size([2150]), Model 2: torch.Size([3072])
466
+ Tensor 'transformer_blocks.9.attn.to_add_out.weight' has different shapes: Model 1: torch.Size([2150, 3072]), Model 2: torch.Size([3072, 3072])
467
+ Tensor 'transformer_blocks.9.attn.to_k.bias' has different shapes: Model 1: torch.Size([2150]), Model 2: torch.Size([3072])
468
+ Tensor 'transformer_blocks.9.attn.to_k.weight' has different shapes: Model 1: torch.Size([2150, 3072]), Model 2: torch.Size([3072, 3072])
469
+ Tensor 'transformer_blocks.9.attn.to_out.0.bias' has different shapes: Model 1: torch.Size([2150]), Model 2: torch.Size([3072])
470
+ Tensor 'transformer_blocks.9.attn.to_out.0.weight' has different shapes: Model 1: torch.Size([2150, 3072]), Model 2: torch.Size([3072, 3072])
471
+ Tensor 'transformer_blocks.9.attn.to_q.bias' has different shapes: Model 1: torch.Size([2150]), Model 2: torch.Size([3072])
472
+ Tensor 'transformer_blocks.9.attn.to_q.weight' has different shapes: Model 1: torch.Size([2150, 3072]), Model 2: torch.Size([3072, 3072])
473
+ Tensor 'transformer_blocks.9.attn.to_v.bias' has different shapes: Model 1: torch.Size([2150]), Model 2: torch.Size([3072])
474
+ Tensor 'transformer_blocks.9.attn.to_v.weight' has different shapes: Model 1: torch.Size([2150, 3072]), Model 2: torch.Size([3072, 3072])
475
+ Tensor 'transformer_blocks.9.ff.net.0.proj.bias' has different shapes: Model 1: torch.Size([8601]), Model 2: torch.Size([12288])
476
+ Tensor 'transformer_blocks.9.ff.net.0.proj.weight' has different shapes: Model 1: torch.Size([8601, 3072]), Model 2: torch.Size([12288, 3072])
477
+ Tensor 'transformer_blocks.9.ff.net.2.bias' has different shapes: Model 1: torch.Size([2150]), Model 2: torch.Size([3072])
478
+ Tensor 'transformer_blocks.9.ff.net.2.weight' has different shapes: Model 1: torch.Size([2150, 12288]), Model 2: torch.Size([3072, 12288])
479
+ Tensor 'transformer_blocks.9.ff_context.net.0.proj.bias' has different shapes: Model 1: torch.Size([8601]), Model 2: torch.Size([12288])
480
+ Tensor 'transformer_blocks.9.ff_context.net.0.proj.weight' has different shapes: Model 1: torch.Size([8601, 3072]), Model 2: torch.Size([12288, 3072])
481
+ Tensor 'transformer_blocks.9.ff_context.net.2.bias' has different shapes: Model 1: torch.Size([2150]), Model 2: torch.Size([3072])
482
+ Tensor 'transformer_blocks.9.ff_context.net.2.weight' has different shapes: Model 1: torch.Size([2150, 12288]), Model 2: torch.Size([3072, 12288])
483
+ Tensor 'transformer_blocks.9.norm1.linear.bias' has different shapes: Model 1: torch.Size([12902]), Model 2: torch.Size([18432])
484
+ Tensor 'transformer_blocks.9.norm1.linear.weight' has different shapes: Model 1: torch.Size([12902, 3072]), Model 2: torch.Size([18432, 3072])
485
+ Tensor 'transformer_blocks.9.norm1_context.linear.bias' has different shapes: Model 1: torch.Size([12902]), Model 2: torch.Size([18432])
486
+ Tensor 'transformer_blocks.9.norm1_context.linear.weight' has different shapes: Model 1: torch.Size([12902, 3072]), Model 2: torch.Size([18432, 3072])
487
+ Tensor 'x_embedder.bias' has different shapes: Model 1: torch.Size([2150]), Model 2: torch.Size([3072])
488
+ Tensor 'x_embedder.weight' has different shapes: Model 1: torch.Size([2150, 64]), Model 2: torch.Size([3072, 64])
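The lines above could be produced by a simple comparison of two checkpoints' tensor shapes. Below is a minimal, hypothetical sketch of such a comparator: it assumes each model is given as a plain mapping from tensor name to shape tuple (e.g. obtained from a state dict via `{name: tuple(t.shape) for name, t in sd.items()}`); the function name `diff_shapes` and the example entries are illustrative, not taken from any actual tool.

```python
def diff_shapes(model1_shapes, model2_shapes):
    """Report tensors present in both mappings whose shapes differ.

    Each argument maps tensor name -> shape tuple. Returns one line per
    mismatch, formatted like the log above.
    """
    lines = []
    for name in sorted(set(model1_shapes) & set(model2_shapes)):
        s1, s2 = model1_shapes[name], model2_shapes[name]
        if tuple(s1) != tuple(s2):
            lines.append(
                f"Tensor '{name}' has different shapes: "
                f"Model 1: torch.Size({list(s1)}), Model 2: torch.Size({list(s2)})"
            )
    return lines


# Illustrative entries mirroring the last two log lines:
m1 = {"x_embedder.bias": (2150,), "x_embedder.weight": (2150, 64)}
m2 = {"x_embedder.bias": (3072,), "x_embedder.weight": (3072, 64)}
for line in diff_shapes(m1, m2):
    print(line)
```

For real safetensors files, the shape mappings could be built without loading weights by opening each file with `safetensors.safe_open` and reading shapes from its metadata.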