wzkariampuzha committed on
Commit
e2eee02
·
1 Parent(s): eea0e9e

Upload config.yaml
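
The uploaded file is a Weights & Biases run config. As a minimal sketch for inspecting it, the snippet below lists the metric names recorded under `m:` and collapses the `gradients/...` entries to the parameters they refer to. It assumes that key `1` in each `m:` entry carries the metric name and that `\.` is an escaped dot inside a parameter name; both are readings of this file, not documented behavior. The helper names `metric_names` and `tracked_parameters` are hypothetical.

```python
import re

# Matches list entries such as "- 1: gradients/classifier\.bias._type".
METRIC_LINE = re.compile(r"^\s*-\s*1:\s*(\S+)\s*$")


def metric_names(config_text):
    """Return metric names found in `- 1: <name>` entries, in file order."""
    names = []
    for line in config_text.splitlines():
        line = line.lstrip("+")  # tolerate diff-style "+" prefixes
        match = METRIC_LINE.match(line)
        if match:
            # Assumption: "\." is an escaped dot inside a parameter name.
            names.append(match.group(1).replace("\\.", "."))
    return names


def tracked_parameters(names):
    """Collapse gradients/<param>.<suffix> entries to unique parameter names."""
    params = []
    for name in names:
        if not name.startswith("gradients/"):
            continue
        param = name[len("gradients/"):]
        for suffix in ("._type", ".values", ".bins"):
            if param.endswith(suffix):
                param = param[: -len(suffix)]
                break
        if param not in params:
            params.append(param)
    return params


if __name__ == "__main__":
    sample = r"""
m:
- 1: train/global_step
  6:
  - 3
- 1: gradients/classifier\.bias._type
  5: 1
  6:
  - 1
- 1: gradients/classifier\.bias.values
  5: 1
  6:
  - 1
"""
    print(tracked_parameters(metric_names(sample)))
```

The sketch scans lines with a regular expression rather than parsing YAML, so it also works on text copied from a diff view.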

Files changed (1)
  1. config.yaml +2929 -0
config.yaml ADDED
@@ -0,0 +1,2929 @@
+ wandb_version: 1
+
+ _n_gpu:
+   desc: null
+   value: 1
+ _name_or_path:
+   desc: null
+   value: dmis-lab/biobert-base-cased-v1.1
+ _wandb:
+   desc: null
+   value:
+     cli_version: 0.12.2
+     framework: huggingface
+     huggingface_version: 4.10.0
+     is_jupyter_run: false
+     is_kaggle_kernel: false
+     m:
+     - 1: train/global_step
+       6:
+       - 3
+     - 1: gradients/classifier\.bias._type
+       5: 1
+       6:
+       - 1
+     - 1: gradients/classifier\.bias.values
+       5: 1
+       6:
+       - 1
+     - 1: gradients/classifier\.bias.bins
+       5: 1
+       6:
+       - 1
+     - 1: gradients/classifier\.weight._type
+       5: 1
+       6:
+       - 1
+     - 1: gradients/classifier\.weight.values
+       5: 1
+       6:
+       - 1
+     - 1: gradients/classifier\.weight.bins
+       5: 1
+       6:
+       - 1
+     - 1: gradients/bert\.encoder\.layer\.11\.output\.LayerNorm\.weight._type
+       5: 1
+       6:
+       - 1
+     - 1: gradients/bert\.encoder\.layer\.11\.output\.LayerNorm\.weight.values
+       5: 1
+       6:
+       - 1
+     - 1: gradients/bert\.encoder\.layer\.11\.output\.LayerNorm\.weight.bins
+       5: 1
+       6:
+       - 1
+     - 1: gradients/bert\.encoder\.layer\.11\.output\.LayerNorm\.bias._type
+       5: 1
+       6:
+       - 1
+     - 1: gradients/bert\.encoder\.layer\.11\.output\.LayerNorm\.bias.values
+       5: 1
+       6:
+       - 1
+     - 1: gradients/bert\.encoder\.layer\.11\.output\.LayerNorm\.bias.bins
+       5: 1
+       6:
+       - 1
+     - 1: gradients/bert\.encoder\.layer\.11\.output\.dense\.bias._type
+       5: 1
+       6:
+       - 1
+     - 1: gradients/bert\.encoder\.layer\.11\.output\.dense\.bias.values
+       5: 1
+       6:
+       - 1
+     - 1: gradients/bert\.encoder\.layer\.11\.output\.dense\.bias.bins
+       5: 1
+       6:
+       - 1
+     - 1: gradients/bert\.encoder\.layer\.11\.output\.dense\.weight._type
+       5: 1
+       6:
+       - 1
+     - 1: gradients/bert\.encoder\.layer\.11\.output\.dense\.weight.values
+       5: 1
+       6:
+       - 1
+     - 1: gradients/bert\.encoder\.layer\.11\.output\.dense\.weight.bins
+       5: 1
+       6:
+       - 1
+     - 1: gradients/bert\.encoder\.layer\.11\.intermediate\.dense\.bias._type
+       5: 1
+       6:
+       - 1
+     - 1: gradients/bert\.encoder\.layer\.11\.intermediate\.dense\.bias.values
+       5: 1
+       6:
+       - 1
+     - 1: gradients/bert\.encoder\.layer\.11\.intermediate\.dense\.bias.bins
+       5: 1
+       6:
+       - 1
+     - 1: gradients/bert\.encoder\.layer\.11\.intermediate\.dense\.weight._type
+       5: 1
+       6:
+       - 1
+     - 1: gradients/bert\.encoder\.layer\.11\.intermediate\.dense\.weight.values
+       5: 1
+       6:
+       - 1
+     - 1: gradients/bert\.encoder\.layer\.11\.intermediate\.dense\.weight.bins
+       5: 1
+       6:
+       - 1
+     - 1: gradients/bert\.encoder\.layer\.11\.attention\.output\.LayerNorm\.weight._type
+       5: 1
+       6:
+       - 1
+     - 1: gradients/bert\.encoder\.layer\.11\.attention\.output\.LayerNorm\.weight.values
+       5: 1
+       6:
+       - 1
+     - 1: gradients/bert\.encoder\.layer\.11\.attention\.output\.LayerNorm\.weight.bins
+       5: 1
+       6:
+       - 1
+     - 1: gradients/bert\.encoder\.layer\.11\.attention\.output\.LayerNorm\.bias._type
+       5: 1
+       6:
+       - 1
+     - 1: gradients/bert\.encoder\.layer\.11\.attention\.output\.LayerNorm\.bias.values
+       5: 1
+       6:
+       - 1
+     - 1: gradients/bert\.encoder\.layer\.11\.attention\.output\.LayerNorm\.bias.bins
+       5: 1
+       6:
+       - 1
+     - 1: gradients/bert\.encoder\.layer\.11\.attention\.output\.dense\.bias._type
+       5: 1
+       6:
+       - 1
+     - 1: gradients/bert\.encoder\.layer\.11\.attention\.output\.dense\.bias.values
+       5: 1
+       6:
+       - 1
+     - 1: gradients/bert\.encoder\.layer\.11\.attention\.output\.dense\.bias.bins
+       5: 1
+       6:
+       - 1
+     - 1: gradients/bert\.encoder\.layer\.11\.attention\.output\.dense\.weight._type
+       5: 1
+       6:
+       - 1
+     - 1: gradients/bert\.encoder\.layer\.11\.attention\.output\.dense\.weight.values
+       5: 1
+       6:
+       - 1
+     - 1: gradients/bert\.encoder\.layer\.11\.attention\.output\.dense\.weight.bins
+       5: 1
+       6:
+       - 1
+     - 1: gradients/bert\.encoder\.layer\.11\.attention\.self\.value\.bias._type
+       5: 1
+       6:
+       - 1
+     - 1: gradients/bert\.encoder\.layer\.11\.attention\.self\.value\.bias.values
+       5: 1
+       6:
+       - 1
+     - 1: gradients/bert\.encoder\.layer\.11\.attention\.self\.value\.bias.bins
+       5: 1
+       6:
+       - 1
+     - 1: gradients/bert\.encoder\.layer\.11\.attention\.self\.value\.weight._type
+       5: 1
+       6:
+       - 1
+     - 1: gradients/bert\.encoder\.layer\.11\.attention\.self\.value\.weight.values
+       5: 1
+       6:
+       - 1
+     - 1: gradients/bert\.encoder\.layer\.11\.attention\.self\.value\.weight.bins
+       5: 1
+       6:
+       - 1
+     - 1: gradients/bert\.encoder\.layer\.11\.attention\.self\.key\.bias._type
+       5: 1
+       6:
+       - 1
+     - 1: gradients/bert\.encoder\.layer\.11\.attention\.self\.key\.bias.values
+       5: 1
+       6:
+       - 1
+     - 1: gradients/bert\.encoder\.layer\.11\.attention\.self\.key\.bias.bins
+       5: 1
+       6:
+       - 1
+     - 1: gradients/bert\.encoder\.layer\.11\.attention\.self\.key\.weight._type
+       5: 1
+       6:
+       - 1
+     - 1: gradients/bert\.encoder\.layer\.11\.attention\.self\.key\.weight.values
+       5: 1
+       6:
+       - 1
+     - 1: gradients/bert\.encoder\.layer\.11\.attention\.self\.key\.weight.bins
+       5: 1
+       6:
+       - 1
+     - 1: gradients/bert\.encoder\.layer\.11\.attention\.self\.query\.bias._type
+       5: 1
+       6:
+       - 1
+     - 1: gradients/bert\.encoder\.layer\.11\.attention\.self\.query\.bias.values
+       5: 1
+       6:
+       - 1
+     - 1: gradients/bert\.encoder\.layer\.11\.attention\.self\.query\.bias.bins
+       5: 1
+       6:
+       - 1
+     - 1: gradients/bert\.encoder\.layer\.11\.attention\.self\.query\.weight._type
+       5: 1
+       6:
+       - 1
+     - 1: gradients/bert\.encoder\.layer\.11\.attention\.self\.query\.weight.values
+       5: 1
+       6:
+       - 1
+     - 1: gradients/bert\.encoder\.layer\.11\.attention\.self\.query\.weight.bins
+       5: 1
+       6:
+       - 1
+     - 1: gradients/bert\.encoder\.layer\.10\.output\.LayerNorm\.weight._type
+       5: 1
+       6:
+       - 1
+     - 1: gradients/bert\.encoder\.layer\.10\.output\.LayerNorm\.weight.values
+       5: 1
+       6:
+       - 1
+     - 1: gradients/bert\.encoder\.layer\.10\.output\.LayerNorm\.weight.bins
+       5: 1
+       6:
+       - 1
+     - 1: gradients/bert\.encoder\.layer\.10\.output\.LayerNorm\.bias._type
+       5: 1
+       6:
+       - 1
+     - 1: gradients/bert\.encoder\.layer\.10\.output\.LayerNorm\.bias.values
+       5: 1
+       6:
+       - 1
+     - 1: gradients/bert\.encoder\.layer\.10\.output\.LayerNorm\.bias.bins
+       5: 1
+       6:
+       - 1
+     - 1: gradients/bert\.encoder\.layer\.10\.output\.dense\.bias._type
+       5: 1
+       6:
+       - 1
+     - 1: gradients/bert\.encoder\.layer\.10\.output\.dense\.bias.values
+       5: 1
+       6:
+       - 1
+     - 1: gradients/bert\.encoder\.layer\.10\.output\.dense\.bias.bins
+       5: 1
+       6:
+       - 1
+     - 1: gradients/bert\.encoder\.layer\.10\.output\.dense\.weight._type
+       5: 1
+       6:
+       - 1
+     - 1: gradients/bert\.encoder\.layer\.10\.output\.dense\.weight.values
+       5: 1
+       6:
+       - 1
+     - 1: gradients/bert\.encoder\.layer\.10\.output\.dense\.weight.bins
+       5: 1
+       6:
+       - 1
+     - 1: gradients/bert\.encoder\.layer\.10\.intermediate\.dense\.bias._type
+       5: 1
+       6:
+       - 1
+     - 1: gradients/bert\.encoder\.layer\.10\.intermediate\.dense\.bias.values
+       5: 1
+       6:
+       - 1
+     - 1: gradients/bert\.encoder\.layer\.10\.intermediate\.dense\.bias.bins
+       5: 1
+       6:
+       - 1
+     - 1: gradients/bert\.encoder\.layer\.10\.intermediate\.dense\.weight._type
+       5: 1
+       6:
+       - 1
+     - 1: gradients/bert\.encoder\.layer\.10\.intermediate\.dense\.weight.values
+       5: 1
+       6:
+       - 1
+     - 1: gradients/bert\.encoder\.layer\.10\.intermediate\.dense\.weight.bins
+       5: 1
+       6:
+       - 1
+     - 1: gradients/bert\.encoder\.layer\.10\.attention\.output\.LayerNorm\.weight._type
+       5: 1
+       6:
+       - 1
+     - 1: gradients/bert\.encoder\.layer\.10\.attention\.output\.LayerNorm\.weight.values
+       5: 1
+       6:
+       - 1
+     - 1: gradients/bert\.encoder\.layer\.10\.attention\.output\.LayerNorm\.weight.bins
+       5: 1
+       6:
+       - 1
+     - 1: gradients/bert\.encoder\.layer\.10\.attention\.output\.LayerNorm\.bias._type
+       5: 1
+       6:
+       - 1
+     - 1: gradients/bert\.encoder\.layer\.10\.attention\.output\.LayerNorm\.bias.values
+       5: 1
+       6:
+       - 1
+     - 1: gradients/bert\.encoder\.layer\.10\.attention\.output\.LayerNorm\.bias.bins
+       5: 1
+       6:
+       - 1
+     - 1: gradients/bert\.encoder\.layer\.10\.attention\.output\.dense\.bias._type
+       5: 1
+       6:
+       - 1
+     - 1: gradients/bert\.encoder\.layer\.10\.attention\.output\.dense\.bias.values
+       5: 1
+       6:
+       - 1
+     - 1: gradients/bert\.encoder\.layer\.10\.attention\.output\.dense\.bias.bins
+       5: 1
+       6:
+       - 1
+     - 1: gradients/bert\.encoder\.layer\.10\.attention\.output\.dense\.weight._type
+       5: 1
+       6:
+       - 1
+     - 1: gradients/bert\.encoder\.layer\.10\.attention\.output\.dense\.weight.values
+       5: 1
+       6:
+       - 1
+     - 1: gradients/bert\.encoder\.layer\.10\.attention\.output\.dense\.weight.bins
+       5: 1
+       6:
+       - 1
+     - 1: gradients/bert\.encoder\.layer\.10\.attention\.self\.value\.bias._type
+       5: 1
+       6:
+       - 1
+     - 1: gradients/bert\.encoder\.layer\.10\.attention\.self\.value\.bias.values
+       5: 1
+       6:
+       - 1
+     - 1: gradients/bert\.encoder\.layer\.10\.attention\.self\.value\.bias.bins
+       5: 1
+       6:
+       - 1
+     - 1: gradients/bert\.encoder\.layer\.10\.attention\.self\.value\.weight._type
+       5: 1
+       6:
+       - 1
+     - 1: gradients/bert\.encoder\.layer\.10\.attention\.self\.value\.weight.values
+       5: 1
+       6:
+       - 1
+     - 1: gradients/bert\.encoder\.layer\.10\.attention\.self\.value\.weight.bins
+       5: 1
+       6:
+       - 1
+     - 1: gradients/bert\.encoder\.layer\.10\.attention\.self\.key\.bias._type
+       5: 1
+       6:
+       - 1
+     - 1: gradients/bert\.encoder\.layer\.10\.attention\.self\.key\.bias.values
+       5: 1
+       6:
+       - 1
+     - 1: gradients/bert\.encoder\.layer\.10\.attention\.self\.key\.bias.bins
+       5: 1
+       6:
+       - 1
+     - 1: gradients/bert\.encoder\.layer\.10\.attention\.self\.key\.weight._type
+       5: 1
+       6:
+       - 1
+     - 1: gradients/bert\.encoder\.layer\.10\.attention\.self\.key\.weight.values
+       5: 1
+       6:
+       - 1
+     - 1: gradients/bert\.encoder\.layer\.10\.attention\.self\.key\.weight.bins
+       5: 1
+       6:
+       - 1
+     - 1: gradients/bert\.encoder\.layer\.10\.attention\.self\.query\.bias._type
+       5: 1
+       6:
+       - 1
+     - 1: gradients/bert\.encoder\.layer\.10\.attention\.self\.query\.bias.values
+       5: 1
+       6:
+       - 1
+     - 1: gradients/bert\.encoder\.layer\.10\.attention\.self\.query\.bias.bins
+       5: 1
+       6:
+       - 1
+     - 1: gradients/bert\.encoder\.layer\.10\.attention\.self\.query\.weight._type
+       5: 1
+       6:
+       - 1
+     - 1: gradients/bert\.encoder\.layer\.10\.attention\.self\.query\.weight.values
+       5: 1
+       6:
+       - 1
+     - 1: gradients/bert\.encoder\.layer\.10\.attention\.self\.query\.weight.bins
+       5: 1
+       6:
+       - 1
+     - 1: gradients/bert\.encoder\.layer\.9\.output\.LayerNorm\.weight._type
+       5: 1
+       6:
+       - 1
+     - 1: gradients/bert\.encoder\.layer\.9\.output\.LayerNorm\.weight.values
+       5: 1
+       6:
+       - 1
+     - 1: gradients/bert\.encoder\.layer\.9\.output\.LayerNorm\.weight.bins
+       5: 1
+       6:
+       - 1
+     - 1: gradients/bert\.encoder\.layer\.9\.output\.LayerNorm\.bias._type
+       5: 1
+       6:
+       - 1
+     - 1: gradients/bert\.encoder\.layer\.9\.output\.LayerNorm\.bias.values
+       5: 1
+       6:
+       - 1
+     - 1: gradients/bert\.encoder\.layer\.9\.output\.LayerNorm\.bias.bins
+       5: 1
+       6:
+       - 1
+     - 1: gradients/bert\.encoder\.layer\.9\.output\.dense\.bias._type
+       5: 1
+       6:
+       - 1
+     - 1: gradients/bert\.encoder\.layer\.9\.output\.dense\.bias.values
+       5: 1
+       6:
+       - 1
+     - 1: gradients/bert\.encoder\.layer\.9\.output\.dense\.bias.bins
+       5: 1
+       6:
+       - 1
+     - 1: gradients/bert\.encoder\.layer\.9\.output\.dense\.weight._type
+       5: 1
+       6:
+       - 1
+     - 1: gradients/bert\.encoder\.layer\.9\.output\.dense\.weight.values
+       5: 1
+       6:
+       - 1
+     - 1: gradients/bert\.encoder\.layer\.9\.output\.dense\.weight.bins
+       5: 1
+       6:
+       - 1
+     - 1: gradients/bert\.encoder\.layer\.9\.intermediate\.dense\.bias._type
+       5: 1
+       6:
+       - 1
+     - 1: gradients/bert\.encoder\.layer\.9\.intermediate\.dense\.bias.values
+       5: 1
+       6:
+       - 1
+     - 1: gradients/bert\.encoder\.layer\.9\.intermediate\.dense\.bias.bins
+       5: 1
+       6:
+       - 1
+     - 1: gradients/bert\.encoder\.layer\.9\.intermediate\.dense\.weight._type
+       5: 1
+       6:
+       - 1
+     - 1: gradients/bert\.encoder\.layer\.9\.intermediate\.dense\.weight.values
+       5: 1
+       6:
+       - 1
+     - 1: gradients/bert\.encoder\.layer\.9\.intermediate\.dense\.weight.bins
+       5: 1
+       6:
+       - 1
+     - 1: gradients/bert\.encoder\.layer\.9\.attention\.output\.LayerNorm\.weight._type
+       5: 1
+       6:
+       - 1
+     - 1: gradients/bert\.encoder\.layer\.9\.attention\.output\.LayerNorm\.weight.values
+       5: 1
+       6:
+       - 1
+     - 1: gradients/bert\.encoder\.layer\.9\.attention\.output\.LayerNorm\.weight.bins
+       5: 1
+       6:
+       - 1
+     - 1: gradients/bert\.encoder\.layer\.9\.attention\.output\.LayerNorm\.bias._type
+       5: 1
+       6:
+       - 1
+     - 1: gradients/bert\.encoder\.layer\.9\.attention\.output\.LayerNorm\.bias.values
+       5: 1
+       6:
+       - 1
+     - 1: gradients/bert\.encoder\.layer\.9\.attention\.output\.LayerNorm\.bias.bins
+       5: 1
+       6:
+       - 1
+     - 1: gradients/bert\.encoder\.layer\.9\.attention\.output\.dense\.bias._type
+       5: 1
+       6:
+       - 1
+     - 1: gradients/bert\.encoder\.layer\.9\.attention\.output\.dense\.bias.values
+       5: 1
+       6:
+       - 1
+     - 1: gradients/bert\.encoder\.layer\.9\.attention\.output\.dense\.bias.bins
+       5: 1
+       6:
+       - 1
+     - 1: gradients/bert\.encoder\.layer\.9\.attention\.output\.dense\.weight._type
+       5: 1
+       6:
+       - 1
+     - 1: gradients/bert\.encoder\.layer\.9\.attention\.output\.dense\.weight.values
+       5: 1
+       6:
+       - 1
+     - 1: gradients/bert\.encoder\.layer\.9\.attention\.output\.dense\.weight.bins
+       5: 1
+       6:
+       - 1
+     - 1: gradients/bert\.encoder\.layer\.9\.attention\.self\.value\.bias._type
+       5: 1
+       6:
+       - 1
+     - 1: gradients/bert\.encoder\.layer\.9\.attention\.self\.value\.bias.values
+       5: 1
+       6:
+       - 1
+     - 1: gradients/bert\.encoder\.layer\.9\.attention\.self\.value\.bias.bins
+       5: 1
+       6:
+       - 1
+     - 1: gradients/bert\.encoder\.layer\.9\.attention\.self\.value\.weight._type
+       5: 1
+       6:
+       - 1
+     - 1: gradients/bert\.encoder\.layer\.9\.attention\.self\.value\.weight.values
+       5: 1
+       6:
+       - 1
+     - 1: gradients/bert\.encoder\.layer\.9\.attention\.self\.value\.weight.bins
+       5: 1
+       6:
+       - 1
+     - 1: gradients/bert\.encoder\.layer\.9\.attention\.self\.key\.bias._type
+       5: 1
+       6:
+       - 1
+     - 1: gradients/bert\.encoder\.layer\.9\.attention\.self\.key\.bias.values
+       5: 1
+       6:
+       - 1
+     - 1: gradients/bert\.encoder\.layer\.9\.attention\.self\.key\.bias.bins
+       5: 1
+       6:
+       - 1
+     - 1: gradients/bert\.encoder\.layer\.9\.attention\.self\.key\.weight._type
+       5: 1
+       6:
+       - 1
+     - 1: gradients/bert\.encoder\.layer\.9\.attention\.self\.key\.weight.values
+       5: 1
+       6:
+       - 1
+     - 1: gradients/bert\.encoder\.layer\.9\.attention\.self\.key\.weight.bins
+       5: 1
+       6:
+       - 1
+     - 1: gradients/bert\.encoder\.layer\.9\.attention\.self\.query\.bias._type
+       5: 1
+       6:
+       - 1
+     - 1: gradients/bert\.encoder\.layer\.9\.attention\.self\.query\.bias.values
+       5: 1
+       6:
+       - 1
+     - 1: gradients/bert\.encoder\.layer\.9\.attention\.self\.query\.bias.bins
+       5: 1
+       6:
+       - 1
+     - 1: gradients/bert\.encoder\.layer\.9\.attention\.self\.query\.weight._type
+       5: 1
+       6:
+       - 1
+     - 1: gradients/bert\.encoder\.layer\.9\.attention\.self\.query\.weight.values
+       5: 1
+       6:
+       - 1
+     - 1: gradients/bert\.encoder\.layer\.9\.attention\.self\.query\.weight.bins
+       5: 1
+       6:
+       - 1
+     - 1: gradients/bert\.encoder\.layer\.8\.output\.LayerNorm\.weight._type
+       5: 1
+       6:
+       - 1
+     - 1: gradients/bert\.encoder\.layer\.8\.output\.LayerNorm\.weight.values
+       5: 1
+       6:
+       - 1
+     - 1: gradients/bert\.encoder\.layer\.8\.output\.LayerNorm\.weight.bins
+       5: 1
+       6:
+       - 1
+     - 1: gradients/bert\.encoder\.layer\.8\.output\.LayerNorm\.bias._type
+       5: 1
+       6:
+       - 1
+     - 1: gradients/bert\.encoder\.layer\.8\.output\.LayerNorm\.bias.values
+       5: 1
+       6:
+       - 1
+     - 1: gradients/bert\.encoder\.layer\.8\.output\.LayerNorm\.bias.bins
+       5: 1
+       6:
+       - 1
+     - 1: gradients/bert\.encoder\.layer\.8\.output\.dense\.bias._type
+       5: 1
+       6:
+       - 1
+     - 1: gradients/bert\.encoder\.layer\.8\.output\.dense\.bias.values
+       5: 1
+       6:
+       - 1
+     - 1: gradients/bert\.encoder\.layer\.8\.output\.dense\.bias.bins
+       5: 1
+       6:
+       - 1
+     - 1: gradients/bert\.encoder\.layer\.8\.output\.dense\.weight._type
+       5: 1
+       6:
+       - 1
+     - 1: gradients/bert\.encoder\.layer\.8\.output\.dense\.weight.values
+       5: 1
+       6:
+       - 1
+     - 1: gradients/bert\.encoder\.layer\.8\.output\.dense\.weight.bins
+       5: 1
+       6:
+       - 1
+     - 1: gradients/bert\.encoder\.layer\.8\.intermediate\.dense\.bias._type
+       5: 1
+       6:
+       - 1
+     - 1: gradients/bert\.encoder\.layer\.8\.intermediate\.dense\.bias.values
+       5: 1
+       6:
+       - 1
+     - 1: gradients/bert\.encoder\.layer\.8\.intermediate\.dense\.bias.bins
+       5: 1
+       6:
+       - 1
+     - 1: gradients/bert\.encoder\.layer\.8\.intermediate\.dense\.weight._type
+       5: 1
+       6:
+       - 1
+     - 1: gradients/bert\.encoder\.layer\.8\.intermediate\.dense\.weight.values
+       5: 1
+       6:
+       - 1
+     - 1: gradients/bert\.encoder\.layer\.8\.intermediate\.dense\.weight.bins
+       5: 1
+       6:
+       - 1
+     - 1: gradients/bert\.encoder\.layer\.8\.attention\.output\.LayerNorm\.weight._type
+       5: 1
+       6:
+       - 1
+     - 1: gradients/bert\.encoder\.layer\.8\.attention\.output\.LayerNorm\.weight.values
+       5: 1
+       6:
+       - 1
+     - 1: gradients/bert\.encoder\.layer\.8\.attention\.output\.LayerNorm\.weight.bins
+       5: 1
+       6:
+       - 1
+     - 1: gradients/bert\.encoder\.layer\.8\.attention\.output\.LayerNorm\.bias._type
+       5: 1
+       6:
+       - 1
+     - 1: gradients/bert\.encoder\.layer\.8\.attention\.output\.LayerNorm\.bias.values
+       5: 1
+       6:
+       - 1
+     - 1: gradients/bert\.encoder\.layer\.8\.attention\.output\.LayerNorm\.bias.bins
+       5: 1
+       6:
+       - 1
+     - 1: gradients/bert\.encoder\.layer\.8\.attention\.output\.dense\.bias._type
+       5: 1
+       6:
+       - 1
+     - 1: gradients/bert\.encoder\.layer\.8\.attention\.output\.dense\.bias.values
+       5: 1
+       6:
+       - 1
+     - 1: gradients/bert\.encoder\.layer\.8\.attention\.output\.dense\.bias.bins
+       5: 1
+       6:
+       - 1
+     - 1: gradients/bert\.encoder\.layer\.8\.attention\.output\.dense\.weight._type
+       5: 1
+       6:
+       - 1
+     - 1: gradients/bert\.encoder\.layer\.8\.attention\.output\.dense\.weight.values
+       5: 1
+       6:
+       - 1
+     - 1: gradients/bert\.encoder\.layer\.8\.attention\.output\.dense\.weight.bins
+       5: 1
+       6:
+       - 1
+     - 1: gradients/bert\.encoder\.layer\.8\.attention\.self\.value\.bias._type
+       5: 1
+       6:
+       - 1
+     - 1: gradients/bert\.encoder\.layer\.8\.attention\.self\.value\.bias.values
+       5: 1
+       6:
+       - 1
+     - 1: gradients/bert\.encoder\.layer\.8\.attention\.self\.value\.bias.bins
+       5: 1
+       6:
+       - 1
+     - 1: gradients/bert\.encoder\.layer\.8\.attention\.self\.value\.weight._type
+       5: 1
+       6:
+       - 1
+     - 1: gradients/bert\.encoder\.layer\.8\.attention\.self\.value\.weight.values
+       5: 1
+       6:
+       - 1
+     - 1: gradients/bert\.encoder\.layer\.8\.attention\.self\.value\.weight.bins
+       5: 1
+       6:
+       - 1
+     - 1: gradients/bert\.encoder\.layer\.8\.attention\.self\.key\.bias._type
+       5: 1
+       6:
+       - 1
+     - 1: gradients/bert\.encoder\.layer\.8\.attention\.self\.key\.bias.values
+       5: 1
+       6:
+       - 1
+     - 1: gradients/bert\.encoder\.layer\.8\.attention\.self\.key\.bias.bins
+       5: 1
+       6:
+       - 1
+     - 1: gradients/bert\.encoder\.layer\.8\.attention\.self\.key\.weight._type
+       5: 1
+       6:
+       - 1
+     - 1: gradients/bert\.encoder\.layer\.8\.attention\.self\.key\.weight.values
+       5: 1
+       6:
+       - 1
+     - 1: gradients/bert\.encoder\.layer\.8\.attention\.self\.key\.weight.bins
+       5: 1
+       6:
+       - 1
+     - 1: gradients/bert\.encoder\.layer\.8\.attention\.self\.query\.bias._type
+       5: 1
+       6:
+       - 1
+     - 1: gradients/bert\.encoder\.layer\.8\.attention\.self\.query\.bias.values
+       5: 1
+       6:
+       - 1
+     - 1: gradients/bert\.encoder\.layer\.8\.attention\.self\.query\.bias.bins
+       5: 1
+       6:
+       - 1
+     - 1: gradients/bert\.encoder\.layer\.8\.attention\.self\.query\.weight._type
+       5: 1
+       6:
+       - 1
+     - 1: gradients/bert\.encoder\.layer\.8\.attention\.self\.query\.weight.values
+       5: 1
+       6:
+       - 1
+     - 1: gradients/bert\.encoder\.layer\.8\.attention\.self\.query\.weight.bins
+       5: 1
+       6:
+       - 1
+     - 1: gradients/bert\.encoder\.layer\.7\.output\.LayerNorm\.weight._type
+       5: 1
+       6:
+       - 1
+     - 1: gradients/bert\.encoder\.layer\.7\.output\.LayerNorm\.weight.values
+       5: 1
+       6:
+       - 1
+     - 1: gradients/bert\.encoder\.layer\.7\.output\.LayerNorm\.weight.bins
+       5: 1
+       6:
+       - 1
+     - 1: gradients/bert\.encoder\.layer\.7\.output\.LayerNorm\.bias._type
+       5: 1
+       6:
+       - 1
+     - 1: gradients/bert\.encoder\.layer\.7\.output\.LayerNorm\.bias.values
+       5: 1
+       6:
+       - 1
+     - 1: gradients/bert\.encoder\.layer\.7\.output\.LayerNorm\.bias.bins
+       5: 1
+       6:
+       - 1
+     - 1: gradients/bert\.encoder\.layer\.7\.output\.dense\.bias._type
+       5: 1
+       6:
+       - 1
+     - 1: gradients/bert\.encoder\.layer\.7\.output\.dense\.bias.values
+       5: 1
+       6:
+       - 1
+     - 1: gradients/bert\.encoder\.layer\.7\.output\.dense\.bias.bins
+       5: 1
+       6:
+       - 1
+     - 1: gradients/bert\.encoder\.layer\.7\.output\.dense\.weight._type
+       5: 1
+       6:
+       - 1
+     - 1: gradients/bert\.encoder\.layer\.7\.output\.dense\.weight.values
+       5: 1
+       6:
+       - 1
+     - 1: gradients/bert\.encoder\.layer\.7\.output\.dense\.weight.bins
+       5: 1
+       6:
+       - 1
+     - 1: gradients/bert\.encoder\.layer\.7\.intermediate\.dense\.bias._type
+       5: 1
+       6:
+       - 1
+     - 1: gradients/bert\.encoder\.layer\.7\.intermediate\.dense\.bias.values
+       5: 1
+       6:
+       - 1
+     - 1: gradients/bert\.encoder\.layer\.7\.intermediate\.dense\.bias.bins
+       5: 1
+       6:
+       - 1
+     - 1: gradients/bert\.encoder\.layer\.7\.intermediate\.dense\.weight._type
+       5: 1
+       6:
+       - 1
+     - 1: gradients/bert\.encoder\.layer\.7\.intermediate\.dense\.weight.values
+       5: 1
+       6:
+       - 1
+     - 1: gradients/bert\.encoder\.layer\.7\.intermediate\.dense\.weight.bins
+       5: 1
+       6:
+       - 1
+     - 1: gradients/bert\.encoder\.layer\.7\.attention\.output\.LayerNorm\.weight._type
+       5: 1
+       6:
+       - 1
+     - 1: gradients/bert\.encoder\.layer\.7\.attention\.output\.LayerNorm\.weight.values
+       5: 1
+       6:
+       - 1
+     - 1: gradients/bert\.encoder\.layer\.7\.attention\.output\.LayerNorm\.weight.bins
+       5: 1
+       6:
+       - 1
+     - 1: gradients/bert\.encoder\.layer\.7\.attention\.output\.LayerNorm\.bias._type
+       5: 1
+       6:
+       - 1
+     - 1: gradients/bert\.encoder\.layer\.7\.attention\.output\.LayerNorm\.bias.values
+       5: 1
+       6:
+       - 1
+     - 1: gradients/bert\.encoder\.layer\.7\.attention\.output\.LayerNorm\.bias.bins
+       5: 1
+       6:
+       - 1
+     - 1: gradients/bert\.encoder\.layer\.7\.attention\.output\.dense\.bias._type
+       5: 1
+       6:
+       - 1
+     - 1: gradients/bert\.encoder\.layer\.7\.attention\.output\.dense\.bias.values
+       5: 1
+       6:
+       - 1
+     - 1: gradients/bert\.encoder\.layer\.7\.attention\.output\.dense\.bias.bins
+       5: 1
+       6:
+       - 1
+     - 1: gradients/bert\.encoder\.layer\.7\.attention\.output\.dense\.weight._type
+       5: 1
+       6:
+       - 1
+     - 1: gradients/bert\.encoder\.layer\.7\.attention\.output\.dense\.weight.values
+       5: 1
+       6:
+       - 1
+     - 1: gradients/bert\.encoder\.layer\.7\.attention\.output\.dense\.weight.bins
+       5: 1
+       6:
+       - 1
+     - 1: gradients/bert\.encoder\.layer\.7\.attention\.self\.value\.bias._type
+       5: 1
+       6:
+       - 1
+     - 1: gradients/bert\.encoder\.layer\.7\.attention\.self\.value\.bias.values
+       5: 1
+       6:
+       - 1
+     - 1: gradients/bert\.encoder\.layer\.7\.attention\.self\.value\.bias.bins
+       5: 1
+       6:
+       - 1
+     - 1: gradients/bert\.encoder\.layer\.7\.attention\.self\.value\.weight._type
+       5: 1
+       6:
+       - 1
+     - 1: gradients/bert\.encoder\.layer\.7\.attention\.self\.value\.weight.values
+       5: 1
+       6:
+       - 1
+     - 1: gradients/bert\.encoder\.layer\.7\.attention\.self\.value\.weight.bins
+       5: 1
+       6:
+       - 1
+     - 1: gradients/bert\.encoder\.layer\.7\.attention\.self\.key\.bias._type
+       5: 1
+       6:
+       - 1
+     - 1: gradients/bert\.encoder\.layer\.7\.attention\.self\.key\.bias.values
+       5: 1
+       6:
+       - 1
+     - 1: gradients/bert\.encoder\.layer\.7\.attention\.self\.key\.bias.bins
+       5: 1
+       6:
+       - 1
+     - 1: gradients/bert\.encoder\.layer\.7\.attention\.self\.key\.weight._type
+       5: 1
+       6:
+       - 1
+     - 1: gradients/bert\.encoder\.layer\.7\.attention\.self\.key\.weight.values
+       5: 1
+       6:
+       - 1
+     - 1: gradients/bert\.encoder\.layer\.7\.attention\.self\.key\.weight.bins
+       5: 1
+       6:
+       - 1
+     - 1: gradients/bert\.encoder\.layer\.7\.attention\.self\.query\.bias._type
+       5: 1
+       6:
+       - 1
+     - 1: gradients/bert\.encoder\.layer\.7\.attention\.self\.query\.bias.values
+       5: 1
+       6:
+       - 1
+     - 1: gradients/bert\.encoder\.layer\.7\.attention\.self\.query\.bias.bins
+       5: 1
+       6:
+       - 1
+     - 1: gradients/bert\.encoder\.layer\.7\.attention\.self\.query\.weight._type
+       5: 1
+       6:
+       - 1
+     - 1: gradients/bert\.encoder\.layer\.7\.attention\.self\.query\.weight.values
+       5: 1
+       6:
+       - 1
+     - 1: gradients/bert\.encoder\.layer\.7\.attention\.self\.query\.weight.bins
+       5: 1
+       6:
+       - 1
1005
+ - 1: gradients/bert\.encoder\.layer\.6\.output\.LayerNorm\.weight._type
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.6\.output\.LayerNorm\.weight.values
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.6\.output\.LayerNorm\.weight.bins
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.6\.output\.LayerNorm\.bias._type
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.6\.output\.LayerNorm\.bias.values
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.6\.output\.LayerNorm\.bias.bins
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.6\.output\.dense\.bias._type
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.6\.output\.dense\.bias.values
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.6\.output\.dense\.bias.bins
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.6\.output\.dense\.weight._type
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.6\.output\.dense\.weight.values
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.6\.output\.dense\.weight.bins
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.6\.intermediate\.dense\.bias._type
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.6\.intermediate\.dense\.bias.values
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.6\.intermediate\.dense\.bias.bins
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.6\.intermediate\.dense\.weight._type
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.6\.intermediate\.dense\.weight.values
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.6\.intermediate\.dense\.weight.bins
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.6\.attention\.output\.LayerNorm\.weight._type
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.6\.attention\.output\.LayerNorm\.weight.values
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.6\.attention\.output\.LayerNorm\.weight.bins
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.6\.attention\.output\.LayerNorm\.bias._type
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.6\.attention\.output\.LayerNorm\.bias.values
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.6\.attention\.output\.LayerNorm\.bias.bins
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.6\.attention\.output\.dense\.bias._type
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.6\.attention\.output\.dense\.bias.values
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.6\.attention\.output\.dense\.bias.bins
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.6\.attention\.output\.dense\.weight._type
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.6\.attention\.output\.dense\.weight.values
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.6\.attention\.output\.dense\.weight.bins
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.6\.attention\.self\.value\.bias._type
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.6\.attention\.self\.value\.bias.values
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.6\.attention\.self\.value\.bias.bins
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.6\.attention\.self\.value\.weight._type
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.6\.attention\.self\.value\.weight.values
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.6\.attention\.self\.value\.weight.bins
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.6\.attention\.self\.key\.bias._type
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.6\.attention\.self\.key\.bias.values
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.6\.attention\.self\.key\.bias.bins
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.6\.attention\.self\.key\.weight._type
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.6\.attention\.self\.key\.weight.values
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.6\.attention\.self\.key\.weight.bins
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.6\.attention\.self\.query\.bias._type
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.6\.attention\.self\.query\.bias.values
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.6\.attention\.self\.query\.bias.bins
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.6\.attention\.self\.query\.weight._type
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.6\.attention\.self\.query\.weight.values
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.6\.attention\.self\.query\.weight.bins
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.5\.output\.LayerNorm\.weight._type
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.5\.output\.LayerNorm\.weight.values
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.5\.output\.LayerNorm\.weight.bins
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.5\.output\.LayerNorm\.bias._type
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.5\.output\.LayerNorm\.bias.values
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.5\.output\.LayerNorm\.bias.bins
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.5\.output\.dense\.bias._type
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.5\.output\.dense\.bias.values
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.5\.output\.dense\.bias.bins
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.5\.output\.dense\.weight._type
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.5\.output\.dense\.weight.values
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.5\.output\.dense\.weight.bins
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.5\.intermediate\.dense\.bias._type
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.5\.intermediate\.dense\.bias.values
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.5\.intermediate\.dense\.bias.bins
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.5\.intermediate\.dense\.weight._type
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.5\.intermediate\.dense\.weight.values
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.5\.intermediate\.dense\.weight.bins
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.5\.attention\.output\.LayerNorm\.weight._type
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.5\.attention\.output\.LayerNorm\.weight.values
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.5\.attention\.output\.LayerNorm\.weight.bins
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.5\.attention\.output\.LayerNorm\.bias._type
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.5\.attention\.output\.LayerNorm\.bias.values
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.5\.attention\.output\.LayerNorm\.bias.bins
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.5\.attention\.output\.dense\.bias._type
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.5\.attention\.output\.dense\.bias.values
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.5\.attention\.output\.dense\.bias.bins
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.5\.attention\.output\.dense\.weight._type
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.5\.attention\.output\.dense\.weight.values
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.5\.attention\.output\.dense\.weight.bins
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.5\.attention\.self\.value\.bias._type
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.5\.attention\.self\.value\.bias.values
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.5\.attention\.self\.value\.bias.bins
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.5\.attention\.self\.value\.weight._type
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.5\.attention\.self\.value\.weight.values
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.5\.attention\.self\.value\.weight.bins
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.5\.attention\.self\.key\.bias._type
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.5\.attention\.self\.key\.bias.values
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.5\.attention\.self\.key\.bias.bins
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.5\.attention\.self\.key\.weight._type
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.5\.attention\.self\.key\.weight.values
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.5\.attention\.self\.key\.weight.bins
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.5\.attention\.self\.query\.bias._type
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.5\.attention\.self\.query\.bias.values
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.5\.attention\.self\.query\.bias.bins
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.5\.attention\.self\.query\.weight._type
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.5\.attention\.self\.query\.weight.values
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.5\.attention\.self\.query\.weight.bins
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.4\.output\.LayerNorm\.weight._type
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.4\.output\.LayerNorm\.weight.values
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.4\.output\.LayerNorm\.weight.bins
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.4\.output\.LayerNorm\.bias._type
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.4\.output\.LayerNorm\.bias.values
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.4\.output\.LayerNorm\.bias.bins
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.4\.output\.dense\.bias._type
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.4\.output\.dense\.bias.values
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.4\.output\.dense\.bias.bins
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.4\.output\.dense\.weight._type
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.4\.output\.dense\.weight.values
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.4\.output\.dense\.weight.bins
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.4\.intermediate\.dense\.bias._type
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.4\.intermediate\.dense\.bias.values
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.4\.intermediate\.dense\.bias.bins
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.4\.intermediate\.dense\.weight._type
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.4\.intermediate\.dense\.weight.values
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.4\.intermediate\.dense\.weight.bins
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.4\.attention\.output\.LayerNorm\.weight._type
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.4\.attention\.output\.LayerNorm\.weight.values
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.4\.attention\.output\.LayerNorm\.weight.bins
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.4\.attention\.output\.LayerNorm\.bias._type
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.4\.attention\.output\.LayerNorm\.bias.values
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.4\.attention\.output\.LayerNorm\.bias.bins
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.4\.attention\.output\.dense\.bias._type
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.4\.attention\.output\.dense\.bias.values
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.4\.attention\.output\.dense\.bias.bins
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.4\.attention\.output\.dense\.weight._type
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.4\.attention\.output\.dense\.weight.values
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.4\.attention\.output\.dense\.weight.bins
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.4\.attention\.self\.value\.bias._type
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.4\.attention\.self\.value\.bias.values
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.4\.attention\.self\.value\.bias.bins
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.4\.attention\.self\.value\.weight._type
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.4\.attention\.self\.value\.weight.values
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.4\.attention\.self\.value\.weight.bins
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.4\.attention\.self\.key\.bias._type
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.4\.attention\.self\.key\.bias.values
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.4\.attention\.self\.key\.bias.bins
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.4\.attention\.self\.key\.weight._type
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.4\.attention\.self\.key\.weight.values
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.4\.attention\.self\.key\.weight.bins
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.4\.attention\.self\.query\.bias._type
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.4\.attention\.self\.query\.bias.values
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.4\.attention\.self\.query\.bias.bins
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.4\.attention\.self\.query\.weight._type
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.4\.attention\.self\.query\.weight.values
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.4\.attention\.self\.query\.weight.bins
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.3\.output\.LayerNorm\.weight._type
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.3\.output\.LayerNorm\.weight.values
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.3\.output\.LayerNorm\.weight.bins
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.3\.output\.LayerNorm\.bias._type
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.3\.output\.LayerNorm\.bias.values
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.3\.output\.LayerNorm\.bias.bins
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.3\.output\.dense\.bias._type
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.3\.output\.dense\.bias.values
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.3\.output\.dense\.bias.bins
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.3\.output\.dense\.weight._type
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.3\.output\.dense\.weight.values
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.3\.output\.dense\.weight.bins
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.3\.intermediate\.dense\.bias._type
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.3\.intermediate\.dense\.bias.values
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.3\.intermediate\.dense\.bias.bins
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.3\.intermediate\.dense\.weight._type
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.3\.intermediate\.dense\.weight.values
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.3\.intermediate\.dense\.weight.bins
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.3\.attention\.output\.LayerNorm\.weight._type
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.3\.attention\.output\.LayerNorm\.weight.values
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.3\.attention\.output\.LayerNorm\.weight.bins
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.3\.attention\.output\.LayerNorm\.bias._type
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.3\.attention\.output\.LayerNorm\.bias.values
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.3\.attention\.output\.LayerNorm\.bias.bins
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.3\.attention\.output\.dense\.bias._type
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.3\.attention\.output\.dense\.bias.values
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.3\.attention\.output\.dense\.bias.bins
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.3\.attention\.output\.dense\.weight._type
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.3\.attention\.output\.dense\.weight.values
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.3\.attention\.output\.dense\.weight.bins
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.3\.attention\.self\.value\.bias._type
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.3\.attention\.self\.value\.bias.values
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.3\.attention\.self\.value\.bias.bins
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.3\.attention\.self\.value\.weight._type
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.3\.attention\.self\.value\.weight.values
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.3\.attention\.self\.value\.weight.bins
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.3\.attention\.self\.key\.bias._type
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.3\.attention\.self\.key\.bias.values
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.3\.attention\.self\.key\.bias.bins
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.3\.attention\.self\.key\.weight._type
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.3\.attention\.self\.key\.weight.values
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.3\.attention\.self\.key\.weight.bins
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.3\.attention\.self\.query\.bias._type
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.3\.attention\.self\.query\.bias.values
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.3\.attention\.self\.query\.bias.bins
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.3\.attention\.self\.query\.weight._type
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.3\.attention\.self\.query\.weight.values
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.3\.attention\.self\.query\.weight.bins
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.2\.output\.LayerNorm\.weight._type
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.2\.output\.LayerNorm\.weight.values
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.2\.output\.LayerNorm\.weight.bins
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.2\.output\.LayerNorm\.bias._type
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.2\.output\.LayerNorm\.bias.values
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.2\.output\.LayerNorm\.bias.bins
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.2\.output\.dense\.bias._type
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.2\.output\.dense\.bias.values
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.2\.output\.dense\.bias.bins
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.2\.output\.dense\.weight._type
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.2\.output\.dense\.weight.values
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.2\.output\.dense\.weight.bins
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.2\.intermediate\.dense\.bias._type
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.2\.intermediate\.dense\.bias.values
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.2\.intermediate\.dense\.bias.bins
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.2\.intermediate\.dense\.weight._type
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.2\.intermediate\.dense\.weight.values
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.2\.intermediate\.dense\.weight.bins
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.2\.attention\.output\.LayerNorm\.weight._type
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.2\.attention\.output\.LayerNorm\.weight.values
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.2\.attention\.output\.LayerNorm\.weight.bins
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.2\.attention\.output\.LayerNorm\.bias._type
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.2\.attention\.output\.LayerNorm\.bias.values
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.2\.attention\.output\.LayerNorm\.bias.bins
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.2\.attention\.output\.dense\.bias._type
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.2\.attention\.output\.dense\.bias.values
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.2\.attention\.output\.dense\.bias.bins
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.2\.attention\.output\.dense\.weight._type
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.2\.attention\.output\.dense\.weight.values
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.2\.attention\.output\.dense\.weight.bins
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.2\.attention\.self\.value\.bias._type
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.2\.attention\.self\.value\.bias.values
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.2\.attention\.self\.value\.bias.bins
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.2\.attention\.self\.value\.weight._type
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.2\.attention\.self\.value\.weight.values
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.2\.attention\.self\.value\.weight.bins
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.2\.attention\.self\.key\.bias._type
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.2\.attention\.self\.key\.bias.values
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.2\.attention\.self\.key\.bias.bins
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.2\.attention\.self\.key\.weight._type
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.2\.attention\.self\.key\.weight.values
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.2\.attention\.self\.key\.weight.bins
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.2\.attention\.self\.query\.bias._type
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.2\.attention\.self\.query\.bias.values
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.2\.attention\.self\.query\.bias.bins
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.2\.attention\.self\.query\.weight._type
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.2\.attention\.self\.query\.weight.values
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.2\.attention\.self\.query\.weight.bins
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.1\.output\.LayerNorm\.weight._type
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.1\.output\.LayerNorm\.weight.values
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.1\.output\.LayerNorm\.weight.bins
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.1\.output\.LayerNorm\.bias._type
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.1\.output\.LayerNorm\.bias.values
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.1\.output\.LayerNorm\.bias.bins
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.1\.output\.dense\.bias._type
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.1\.output\.dense\.bias.values
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.1\.output\.dense\.bias.bins
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.1\.output\.dense\.weight._type
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.1\.output\.dense\.weight.values
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.1\.output\.dense\.weight.bins
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.1\.intermediate\.dense\.bias._type
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.1\.intermediate\.dense\.bias.values
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.1\.intermediate\.dense\.bias.bins
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.1\.intermediate\.dense\.weight._type
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.1\.intermediate\.dense\.weight.values
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.1\.intermediate\.dense\.weight.bins
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.1\.attention\.output\.LayerNorm\.weight._type
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.1\.attention\.output\.LayerNorm\.weight.values
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.1\.attention\.output\.LayerNorm\.weight.bins
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.1\.attention\.output\.LayerNorm\.bias._type
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.1\.attention\.output\.LayerNorm\.bias.values
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.1\.attention\.output\.LayerNorm\.bias.bins
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.1\.attention\.output\.dense\.bias._type
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.1\.attention\.output\.dense\.bias.values
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.1\.attention\.output\.dense\.bias.bins
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.1\.attention\.output\.dense\.weight._type
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.1\.attention\.output\.dense\.weight.values
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.1\.attention\.output\.dense\.weight.bins
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.1\.attention\.self\.value\.bias._type
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.1\.attention\.self\.value\.bias.values
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.1\.attention\.self\.value\.bias.bins
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.1\.attention\.self\.value\.weight._type
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.1\.attention\.self\.value\.weight.values
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.1\.attention\.self\.value\.weight.bins
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.1\.attention\.self\.key\.bias._type
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.1\.attention\.self\.key\.bias.values
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.1\.attention\.self\.key\.bias.bins
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.1\.attention\.self\.key\.weight._type
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.1\.attention\.self\.key\.weight.values
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.1\.attention\.self\.key\.weight.bins
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.1\.attention\.self\.query\.bias._type
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.1\.attention\.self\.query\.bias.values
+ 5: 1
+ 6:
+ - 1
+ - 1: gradients/bert\.encoder\.layer\.1\.attention\.self\.query\.bias.bins
2142
+ 5: 1
2143
+ 6:
2144
+ - 1
2145
+ - 1: gradients/bert\.encoder\.layer\.1\.attention\.self\.query\.weight._type
2146
+ 5: 1
2147
+ 6:
2148
+ - 1
2149
+ - 1: gradients/bert\.encoder\.layer\.1\.attention\.self\.query\.weight.values
2150
+ 5: 1
2151
+ 6:
2152
+ - 1
2153
+ - 1: gradients/bert\.encoder\.layer\.1\.attention\.self\.query\.weight.bins
2154
+ 5: 1
2155
+ 6:
2156
+ - 1
2157
+ - 1: gradients/bert\.encoder\.layer\.0\.output\.LayerNorm\.weight._type
2158
+ 5: 1
2159
+ 6:
2160
+ - 1
2161
+ - 1: gradients/bert\.encoder\.layer\.0\.output\.LayerNorm\.weight.values
2162
+ 5: 1
2163
+ 6:
2164
+ - 1
2165
+ - 1: gradients/bert\.encoder\.layer\.0\.output\.LayerNorm\.weight.bins
2166
+ 5: 1
2167
+ 6:
2168
+ - 1
2169
+ - 1: gradients/bert\.encoder\.layer\.0\.output\.LayerNorm\.bias._type
2170
+ 5: 1
2171
+ 6:
2172
+ - 1
2173
+ - 1: gradients/bert\.encoder\.layer\.0\.output\.LayerNorm\.bias.values
2174
+ 5: 1
2175
+ 6:
2176
+ - 1
2177
+ - 1: gradients/bert\.encoder\.layer\.0\.output\.LayerNorm\.bias.bins
2178
+ 5: 1
2179
+ 6:
2180
+ - 1
2181
+ - 1: gradients/bert\.encoder\.layer\.0\.output\.dense\.bias._type
2182
+ 5: 1
2183
+ 6:
2184
+ - 1
2185
+ - 1: gradients/bert\.encoder\.layer\.0\.output\.dense\.bias.values
2186
+ 5: 1
2187
+ 6:
2188
+ - 1
2189
+ - 1: gradients/bert\.encoder\.layer\.0\.output\.dense\.bias.bins
2190
+ 5: 1
2191
+ 6:
2192
+ - 1
2193
+ - 1: gradients/bert\.encoder\.layer\.0\.output\.dense\.weight._type
2194
+ 5: 1
2195
+ 6:
2196
+ - 1
2197
+ - 1: gradients/bert\.encoder\.layer\.0\.output\.dense\.weight.values
2198
+ 5: 1
2199
+ 6:
2200
+ - 1
2201
+ - 1: gradients/bert\.encoder\.layer\.0\.output\.dense\.weight.bins
2202
+ 5: 1
2203
+ 6:
2204
+ - 1
2205
+ - 1: gradients/bert\.encoder\.layer\.0\.intermediate\.dense\.bias._type
2206
+ 5: 1
2207
+ 6:
2208
+ - 1
2209
+ - 1: gradients/bert\.encoder\.layer\.0\.intermediate\.dense\.bias.values
2210
+ 5: 1
2211
+ 6:
2212
+ - 1
2213
+ - 1: gradients/bert\.encoder\.layer\.0\.intermediate\.dense\.bias.bins
2214
+ 5: 1
2215
+ 6:
2216
+ - 1
2217
+ - 1: gradients/bert\.encoder\.layer\.0\.intermediate\.dense\.weight._type
2218
+ 5: 1
2219
+ 6:
2220
+ - 1
2221
+ - 1: gradients/bert\.encoder\.layer\.0\.intermediate\.dense\.weight.values
2222
+ 5: 1
2223
+ 6:
2224
+ - 1
2225
+ - 1: gradients/bert\.encoder\.layer\.0\.intermediate\.dense\.weight.bins
2226
+ 5: 1
2227
+ 6:
2228
+ - 1
2229
+ - 1: gradients/bert\.encoder\.layer\.0\.attention\.output\.LayerNorm\.weight._type
2230
+ 5: 1
2231
+ 6:
2232
+ - 1
2233
+ - 1: gradients/bert\.encoder\.layer\.0\.attention\.output\.LayerNorm\.weight.values
2234
+ 5: 1
2235
+ 6:
2236
+ - 1
2237
+ - 1: gradients/bert\.encoder\.layer\.0\.attention\.output\.LayerNorm\.weight.bins
2238
+ 5: 1
2239
+ 6:
2240
+ - 1
2241
+ - 1: gradients/bert\.encoder\.layer\.0\.attention\.output\.LayerNorm\.bias._type
2242
+ 5: 1
2243
+ 6:
2244
+ - 1
2245
+ - 1: gradients/bert\.encoder\.layer\.0\.attention\.output\.LayerNorm\.bias.values
2246
+ 5: 1
2247
+ 6:
2248
+ - 1
2249
+ - 1: gradients/bert\.encoder\.layer\.0\.attention\.output\.LayerNorm\.bias.bins
2250
+ 5: 1
2251
+ 6:
2252
+ - 1
2253
+ - 1: gradients/bert\.encoder\.layer\.0\.attention\.output\.dense\.bias._type
2254
+ 5: 1
2255
+ 6:
2256
+ - 1
2257
+ - 1: gradients/bert\.encoder\.layer\.0\.attention\.output\.dense\.bias.values
2258
+ 5: 1
2259
+ 6:
2260
+ - 1
2261
+ - 1: gradients/bert\.encoder\.layer\.0\.attention\.output\.dense\.bias.bins
2262
+ 5: 1
2263
+ 6:
2264
+ - 1
2265
+ - 1: gradients/bert\.encoder\.layer\.0\.attention\.output\.dense\.weight._type
2266
+ 5: 1
2267
+ 6:
2268
+ - 1
2269
+ - 1: gradients/bert\.encoder\.layer\.0\.attention\.output\.dense\.weight.values
2270
+ 5: 1
2271
+ 6:
2272
+ - 1
2273
+ - 1: gradients/bert\.encoder\.layer\.0\.attention\.output\.dense\.weight.bins
2274
+ 5: 1
2275
+ 6:
2276
+ - 1
2277
+ - 1: gradients/bert\.encoder\.layer\.0\.attention\.self\.value\.bias._type
2278
+ 5: 1
2279
+ 6:
2280
+ - 1
2281
+ - 1: gradients/bert\.encoder\.layer\.0\.attention\.self\.value\.bias.values
2282
+ 5: 1
2283
+ 6:
2284
+ - 1
2285
+ - 1: gradients/bert\.encoder\.layer\.0\.attention\.self\.value\.bias.bins
2286
+ 5: 1
2287
+ 6:
2288
+ - 1
2289
+ - 1: gradients/bert\.encoder\.layer\.0\.attention\.self\.value\.weight._type
2290
+ 5: 1
2291
+ 6:
2292
+ - 1
2293
+ - 1: gradients/bert\.encoder\.layer\.0\.attention\.self\.value\.weight.values
2294
+ 5: 1
2295
+ 6:
2296
+ - 1
2297
+ - 1: gradients/bert\.encoder\.layer\.0\.attention\.self\.value\.weight.bins
2298
+ 5: 1
2299
+ 6:
2300
+ - 1
2301
+ - 1: gradients/bert\.encoder\.layer\.0\.attention\.self\.key\.bias._type
2302
+ 5: 1
2303
+ 6:
2304
+ - 1
2305
+ - 1: gradients/bert\.encoder\.layer\.0\.attention\.self\.key\.bias.values
2306
+ 5: 1
2307
+ 6:
2308
+ - 1
2309
+ - 1: gradients/bert\.encoder\.layer\.0\.attention\.self\.key\.bias.bins
2310
+ 5: 1
2311
+ 6:
2312
+ - 1
2313
+ - 1: gradients/bert\.encoder\.layer\.0\.attention\.self\.key\.weight._type
2314
+ 5: 1
2315
+ 6:
2316
+ - 1
2317
+ - 1: gradients/bert\.encoder\.layer\.0\.attention\.self\.key\.weight.values
2318
+ 5: 1
2319
+ 6:
2320
+ - 1
2321
+ - 1: gradients/bert\.encoder\.layer\.0\.attention\.self\.key\.weight.bins
2322
+ 5: 1
2323
+ 6:
2324
+ - 1
2325
+ - 1: gradients/bert\.encoder\.layer\.0\.attention\.self\.query\.bias._type
2326
+ 5: 1
2327
+ 6:
2328
+ - 1
2329
+ - 1: gradients/bert\.encoder\.layer\.0\.attention\.self\.query\.bias.values
2330
+ 5: 1
2331
+ 6:
2332
+ - 1
2333
+ - 1: gradients/bert\.encoder\.layer\.0\.attention\.self\.query\.bias.bins
2334
+ 5: 1
2335
+ 6:
2336
+ - 1
2337
+ - 1: gradients/bert\.encoder\.layer\.0\.attention\.self\.query\.weight._type
2338
+ 5: 1
2339
+ 6:
2340
+ - 1
2341
+ - 1: gradients/bert\.encoder\.layer\.0\.attention\.self\.query\.weight.values
2342
+ 5: 1
2343
+ 6:
2344
+ - 1
2345
+ - 1: gradients/bert\.encoder\.layer\.0\.attention\.self\.query\.weight.bins
2346
+ 5: 1
2347
+ 6:
2348
+ - 1
2349
+ - 1: gradients/bert\.embeddings\.LayerNorm\.weight._type
2350
+ 5: 1
2351
+ 6:
2352
+ - 1
2353
+ - 1: gradients/bert\.embeddings\.LayerNorm\.weight.values
2354
+ 5: 1
2355
+ 6:
2356
+ - 1
2357
+ - 1: gradients/bert\.embeddings\.LayerNorm\.weight.bins
2358
+ 5: 1
2359
+ 6:
2360
+ - 1
2361
+ - 1: gradients/bert\.embeddings\.LayerNorm\.bias._type
2362
+ 5: 1
2363
+ 6:
2364
+ - 1
2365
+ - 1: gradients/bert\.embeddings\.LayerNorm\.bias.values
2366
+ 5: 1
2367
+ 6:
2368
+ - 1
2369
+ - 1: gradients/bert\.embeddings\.LayerNorm\.bias.bins
2370
+ 5: 1
2371
+ 6:
2372
+ - 1
2373
+ - 1: gradients/bert\.embeddings\.position_embeddings\.weight._type
2374
+ 5: 1
2375
+ 6:
2376
+ - 1
2377
+ - 1: gradients/bert\.embeddings\.position_embeddings\.weight.values
2378
+ 5: 1
2379
+ 6:
2380
+ - 1
2381
+ - 1: gradients/bert\.embeddings\.position_embeddings\.weight.bins
2382
+ 5: 1
2383
+ 6:
2384
+ - 1
2385
+ - 1: gradients/bert\.embeddings\.token_type_embeddings\.weight._type
2386
+ 5: 1
2387
+ 6:
2388
+ - 1
2389
+ - 1: gradients/bert\.embeddings\.token_type_embeddings\.weight.values
2390
+ 5: 1
2391
+ 6:
2392
+ - 1
2393
+ - 1: gradients/bert\.embeddings\.token_type_embeddings\.weight.bins
2394
+ 5: 1
2395
+ 6:
2396
+ - 1
2397
+ - 1: gradients/bert\.embeddings\.word_embeddings\.weight._type
2398
+ 5: 1
2399
+ 6:
2400
+ - 1
2401
+ - 1: gradients/bert\.embeddings\.word_embeddings\.weight.values
2402
+ 5: 1
2403
+ 6:
2404
+ - 1
2405
+ - 1: gradients/bert\.embeddings\.word_embeddings\.weight.bins
2406
+ 5: 1
2407
+ 6:
2408
+ - 1
2409
+ - 1: train/loss
2410
+ 5: 1
2411
+ 6:
2412
+ - 1
2413
+ - 1: train/learning_rate
2414
+ 5: 1
2415
+ 6:
2416
+ - 1
2417
+ - 1: train/epoch
2418
+ 5: 1
2419
+ 6:
2420
+ - 1
2421
+ - 1: train/train_runtime
2422
+ 5: 1
2423
+ 6:
2424
+ - 1
2425
+ - 1: train/train_samples_per_second
2426
+ 5: 1
2427
+ 6:
2428
+ - 1
2429
+ - 1: train/train_steps_per_second
2430
+ 5: 1
2431
+ 6:
2432
+ - 1
2433
+ - 1: train/total_flos
2434
+ 5: 1
2435
+ 6:
2436
+ - 1
2437
+ - 1: train/train_loss
2438
+ 5: 1
2439
+ 6:
2440
+ - 1
2441
+ - 1: eval/loss
2442
+ 5: 1
2443
+ 6:
2444
+ - 1
2445
+ - 1: eval/precision
2446
+ 5: 1
2447
+ 6:
2448
+ - 1
2449
+ - 1: eval/recall
2450
+ 5: 1
2451
+ 6:
2452
+ - 1
2453
+ - 1: eval/f1
2454
+ 5: 1
2455
+ 6:
2456
+ - 1
2457
+ - 1: eval/runtime
2458
+ 5: 1
2459
+ 6:
2460
+ - 1
2461
+ - 1: eval/samples_per_second
2462
+ 5: 1
2463
+ 6:
2464
+ - 1
2465
+ - 1: eval/steps_per_second
2466
+ 5: 1
2467
+ 6:
2468
+ - 1
2469
+ python_version: 3.6.8
2470
+ start_time: 1631810254
2471
+ t:
2472
+ 1:
2473
+ - 1
2474
+ - 2
2475
+ - 3
2476
+ - 5
2477
+ - 11
2478
+ 2:
2479
+ - 1
2480
+ - 2
2481
+ - 3
2482
+ - 5
2483
+ - 11
2484
+ 3:
2485
+ - 1
2486
+ - 7
2487
+ - 13
2488
+ 4: 3.6.8
2489
+ 5: 0.12.2
2490
+ 6: 4.10.0
2491
+ 8:
2492
+ - 5
2493
+ adafactor:
2494
+ desc: null
2495
+ value: false
2496
+ adam_beta1:
2497
+ desc: null
2498
+ value: 0.9
2499
+ adam_beta2:
2500
+ desc: null
2501
+ value: 0.999
2502
+ adam_epsilon:
2503
+ desc: null
2504
+ value: 1.0e-08
2505
+ add_cross_attention:
2506
+ desc: null
2507
+ value: false
2508
+ architectures:
2509
+ desc: null
2510
+ value: null
2511
+ attention_probs_dropout_prob:
2512
+ desc: null
2513
+ value: 0.1
2514
+ bad_words_ids:
2515
+ desc: null
2516
+ value: null
2517
+ bos_token_id:
2518
+ desc: null
2519
+ value: null
2520
+ chunk_size_feed_forward:
2521
+ desc: null
2522
+ value: 0
2523
+ classifier_dropout:
2524
+ desc: null
2525
+ value: null
2526
+ dataloader_drop_last:
2527
+ desc: null
2528
+ value: false
2529
+ dataloader_num_workers:
2530
+ desc: null
2531
+ value: 0
2532
+ dataloader_pin_memory:
2533
+ desc: null
2534
+ value: true
2535
+ ddp_find_unused_parameters:
2536
+ desc: null
2537
+ value: None
2538
+ debug:
2539
+ desc: null
2540
+ value: '[]'
2541
+ decoder_start_token_id:
2542
+ desc: null
2543
+ value: null
2544
+ deepspeed:
2545
+ desc: null
2546
+ value: None
2547
+ disable_tqdm:
2548
+ desc: null
2549
+ value: false
2550
+ diversity_penalty:
2551
+ desc: null
2552
+ value: 0.0
2553
+ do_eval:
2554
+ desc: null
2555
+ value: true
2556
+ do_predict:
2557
+ desc: null
2558
+ value: true
2559
+ do_sample:
2560
+ desc: null
2561
+ value: false
2562
+ do_train:
2563
+ desc: null
2564
+ value: true
2565
+ early_stopping:
2566
+ desc: null
2567
+ value: false
2568
+ encoder_no_repeat_ngram_size:
2569
+ desc: null
2570
+ value: 0
2571
+ eos_token_id:
2572
+ desc: null
2573
+ value: null
2574
+ eval_accumulation_steps:
2575
+ desc: null
2576
+ value: None
2577
+ eval_batch_size:
2578
+ desc: null
2579
+ value: 8
2580
+ eval_steps:
2581
+ desc: null
2582
+ value: None
2583
+ evaluation_strategy:
2584
+ desc: null
2585
+ value: 'no'
2586
+ finetuning_task:
2587
+ desc: null
2588
+ value: null
2589
+ forced_bos_token_id:
2590
+ desc: null
2591
+ value: null
2592
+ forced_eos_token_id:
2593
+ desc: null
2594
+ value: null
2595
+ fp16:
2596
+ desc: null
2597
+ value: false
2598
+ fp16_backend:
2599
+ desc: null
2600
+ value: auto
2601
+ fp16_full_eval:
2602
+ desc: null
2603
+ value: false
2604
+ fp16_opt_level:
2605
+ desc: null
2606
+ value: O1
2607
+ gradient_accumulation_steps:
2608
+ desc: null
2609
+ value: 1
2610
+ gradient_checkpointing:
2611
+ desc: null
2612
+ value: false
2613
+ greater_is_better:
2614
+ desc: null
2615
+ value: None
2616
+ group_by_length:
2617
+ desc: null
2618
+ value: false
2619
+ hidden_act:
2620
+ desc: null
2621
+ value: gelu
2622
+ hidden_dropout_prob:
2623
+ desc: null
2624
+ value: 0.1
2625
+ hidden_size:
2626
+ desc: null
2627
+ value: 768
2628
+ id2label:
2629
+ desc: null
2630
+ value:
2631
+ '0': B-EPI
2632
+ '1': B-LOC
2633
+ '2': B-STAT
2634
+ '3': I-EPI
2635
+ '4': I-LOC
2636
+ '5': I-STAT
2637
+ '6': O
2638
+ ignore_data_skip:
2639
+ desc: null
2640
+ value: false
2641
+ initializer_range:
2642
+ desc: null
2643
+ value: 0.02
2644
+ intermediate_size:
2645
+ desc: null
2646
+ value: 3072
2647
+ is_decoder:
2648
+ desc: null
2649
+ value: false
2650
+ is_encoder_decoder:
2651
+ desc: null
2652
+ value: false
2653
+ label2id:
2654
+ desc: null
2655
+ value:
2656
+ B-EPI: 0
2657
+ B-LOC: 1
2658
+ B-STAT: 2
2659
+ I-EPI: 3
2660
+ I-LOC: 4
2661
+ I-STAT: 5
2662
+ O: 6
2663
+ label_names:
2664
+ desc: null
2665
+ value: None
2666
+ label_smoothing_factor:
2667
+ desc: null
2668
+ value: 0.0
2669
+ layer_norm_eps:
2670
+ desc: null
2671
+ value: 1.0e-12
2672
+ learning_rate:
2673
+ desc: null
2674
+ value: 5.0e-05
2675
+ length_column_name:
2676
+ desc: null
2677
+ value: length
2678
+ length_penalty:
2679
+ desc: null
2680
+ value: 1.0
2681
+ load_best_model_at_end:
2682
+ desc: null
2683
+ value: false
2684
+ local_rank:
2685
+ desc: null
2686
+ value: -1
2687
+ log_level:
2688
+ desc: null
2689
+ value: -1
2690
+ log_level_replica:
2691
+ desc: null
2692
+ value: -1
2693
+ log_on_each_node:
2694
+ desc: null
2695
+ value: true
2696
+ logging_dir:
2697
+ desc: null
2698
+ value: ./resultsV3.2/runs/Sep16_16-37-25_ordr-neo4j-dev-ec2-04
2699
+ logging_first_step:
2700
+ desc: null
2701
+ value: false
2702
+ logging_steps:
2703
+ desc: null
2704
+ value: 500
2705
+ logging_strategy:
2706
+ desc: null
2707
+ value: steps
2708
+ lr_scheduler_type:
2709
+ desc: null
2710
+ value: linear
2711
+ max_grad_norm:
2712
+ desc: null
2713
+ value: 1.0
2714
+ max_length:
2715
+ desc: null
2716
+ value: 20
2717
+ max_position_embeddings:
2718
+ desc: null
2719
+ value: 512
2720
+ max_steps:
2721
+ desc: null
2722
+ value: -1
2723
+ metric_for_best_model:
2724
+ desc: null
2725
+ value: None
2726
+ min_length:
2727
+ desc: null
2728
+ value: 0
2729
+ model_type:
2730
+ desc: null
2731
+ value: bert
2732
+ mp_parameters:
2733
+ desc: null
2734
+ value: ''
2735
+ no_cuda:
2736
+ desc: null
2737
+ value: false
2738
+ no_repeat_ngram_size:
2739
+ desc: null
2740
+ value: 0
2741
+ num_attention_heads:
2742
+ desc: null
2743
+ value: 12
2744
+ num_beam_groups:
2745
+ desc: null
2746
+ value: 1
2747
+ num_beams:
2748
+ desc: null
2749
+ value: 1
2750
+ num_hidden_layers:
2751
+ desc: null
2752
+ value: 12
2753
+ num_return_sequences:
2754
+ desc: null
2755
+ value: 1
2756
+ num_train_epochs:
2757
+ desc: null
2758
+ value: 30.0
2759
+ output_attentions:
2760
+ desc: null
2761
+ value: false
2762
+ output_dir:
2763
+ desc: null
2764
+ value: ./resultsV3.2
2765
+ output_hidden_states:
2766
+ desc: null
2767
+ value: false
2768
+ output_scores:
2769
+ desc: null
2770
+ value: false
2771
+ overwrite_output_dir:
2772
+ desc: null
2773
+ value: true
2774
+ pad_token_id:
2775
+ desc: null
2776
+ value: 0
2777
+ past_index:
2778
+ desc: null
2779
+ value: -1
2780
+ per_device_eval_batch_size:
2781
+ desc: null
2782
+ value: 8
2783
+ per_device_train_batch_size:
2784
+ desc: null
2785
+ value: 16
2786
+ per_gpu_eval_batch_size:
2787
+ desc: null
2788
+ value: None
2789
+ per_gpu_train_batch_size:
2790
+ desc: null
2791
+ value: None
2792
+ position_embedding_type:
2793
+ desc: null
2794
+ value: absolute
2795
+ prediction_loss_only:
2796
+ desc: null
2797
+ value: false
2798
+ prefix:
2799
+ desc: null
2800
+ value: null
2801
+ problem_type:
2802
+ desc: null
2803
+ value: null
2804
+ pruned_heads:
2805
+ desc: null
2806
+ value: {}
2807
+ push_to_hub:
2808
+ desc: null
2809
+ value: false
2810
+ push_to_hub_model_id:
2811
+ desc: null
2812
+ value: resultsV3.2
2813
+ push_to_hub_organization:
2814
+ desc: null
2815
+ value: None
2816
+ push_to_hub_token:
2817
+ desc: null
2818
+ value: None
2819
+ remove_invalid_values:
2820
+ desc: null
2821
+ value: false
2822
+ remove_unused_columns:
2823
+ desc: null
2824
+ value: true
2825
+ repetition_penalty:
2826
+ desc: null
2827
+ value: 1.0
2828
+ report_to:
2829
+ desc: null
2830
+ value: '[''tensorboard'', ''wandb'']'
2831
+ resume_from_checkpoint:
2832
+ desc: null
2833
+ value: None
2834
+ return_dict:
2835
+ desc: null
2836
+ value: true
2837
+ return_dict_in_generate:
2838
+ desc: null
2839
+ value: false
2840
+ run_name:
2841
+ desc: null
2842
+ value: ./resultsV3.2
2843
+ save_on_each_node:
2844
+ desc: null
2845
+ value: false
2846
+ save_steps:
2847
+ desc: null
2848
+ value: 2500
2849
+ save_strategy:
2850
+ desc: null
2851
+ value: steps
2852
+ save_total_limit:
2853
+ desc: null
2854
+ value: None
2855
+ seed:
2856
+ desc: null
2857
+ value: 1
2858
+ sep_token_id:
2859
+ desc: null
2860
+ value: null
2861
+ sharded_ddp:
2862
+ desc: null
2863
+ value: '[]'
2864
+ skip_memory_metrics:
2865
+ desc: null
2866
+ value: true
2867
+ task_specific_params:
2868
+ desc: null
2869
+ value: null
2870
+ temperature:
2871
+ desc: null
2872
+ value: 1.0
2873
+ tie_encoder_decoder:
2874
+ desc: null
2875
+ value: false
2876
+ tie_word_embeddings:
2877
+ desc: null
2878
+ value: true
2879
+ tokenizer_class:
2880
+ desc: null
2881
+ value: null
2882
+ top_k:
2883
+ desc: null
2884
+ value: 50
2885
+ top_p:
2886
+ desc: null
2887
+ value: 1.0
2888
+ torch_dtype:
2889
+ desc: null
2890
+ value: null
2891
+ torchscript:
2892
+ desc: null
2893
+ value: false
2894
+ tpu_metrics_debug:
2895
+ desc: null
2896
+ value: false
2897
+ tpu_num_cores:
2898
+ desc: null
2899
+ value: None
2900
+ train_batch_size:
2901
+ desc: null
2902
+ value: 16
2903
+ transformers_version:
2904
+ desc: null
2905
+ value: 4.10.0
2906
+ type_vocab_size:
2907
+ desc: null
2908
+ value: 2
2909
+ use_bfloat16:
2910
+ desc: null
2911
+ value: false
2912
+ use_cache:
2913
+ desc: null
2914
+ value: true
2915
+ use_legacy_prediction_loop:
2916
+ desc: null
2917
+ value: false
2918
+ vocab_size:
2919
+ desc: null
2920
+ value: 28996
2921
+ warmup_ratio:
2922
+ desc: null
2923
+ value: 0.0
2924
+ warmup_steps:
2925
+ desc: null
2926
+ value: 0
2927
+ weight_decay:
2928
+ desc: null
2929
+ value: 0.05
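
Every hyperparameter in the file above is stored as a small mapping with a `desc` and a `value` field, and several unset options appear as the string `None` rather than YAML `null`. A minimal sketch of how such a file might be flattened after parsing is given below. It assumes the YAML has already been loaded into a Python dict by a loader of your choice. The helper names `flatten_wandb_config` and `labels_are_inverse` are hypothetical, and so is the `INTERNAL_KEYS` set, which assumes the bookkeeping entries sit under top-level keys named `_wandb` and `wandb_version`. The `sample` dict copies only a few values from the file.

```python
from typing import Any, Dict

# Assumption: bookkeeping entries live under these top-level keys.
INTERNAL_KEYS = {"_wandb", "wandb_version"}


def flatten_wandb_config(raw: Dict[str, Any]) -> Dict[str, Any]:
    """Collapse {key: {"desc": ..., "value": v}} entries into {key: v}."""
    flat: Dict[str, Any] = {}
    for key, entry in raw.items():
        if key in INTERNAL_KEYS:
            continue
        if isinstance(entry, dict) and "value" in entry:
            value = entry["value"]
            # Some fields hold the string 'None' instead of YAML null.
            flat[key] = None if value == "None" else value
        else:
            flat[key] = entry
    return flat


def labels_are_inverse(config: Dict[str, Any]) -> bool:
    """Check that label2id is the exact inverse of id2label."""
    inverted = {label: int(idx) for idx, label in config["id2label"].items()}
    return inverted == config["label2id"]


# A few entries copied from the config above, in already-parsed form.
sample = {
    "learning_rate": {"desc": None, "value": 5.0e-05},
    "eval_steps": {"desc": None, "value": "None"},
    "id2label": {"desc": None, "value": {"0": "B-EPI", "1": "B-LOC", "6": "O"}},
    "label2id": {"desc": None, "value": {"B-EPI": 0, "B-LOC": 1, "O": 6}},
}

config = flatten_wandb_config(sample)
```

Here `config["eval_steps"]` becomes a real `None`, and `labels_are_inverse(config)` confirms that the two label maps agree, which is worth checking before reusing the labels for token classification.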