arelius committed
Commit 8efd0d1
1 Parent(s): abff7f1

Upload 8 files

minimal files to run model from most recent chkpt

README.md CHANGED
@@ -1,3 +1,207 @@
  ---
- license: apache-2.0
+ library_name: peft
+ base_model: microsoft/BioGPT
  ---
+
+ # Model Card for Model ID
+
+ <!-- Provide a quick summary of what the model is/does. -->
+
+
+
+ ## Model Details
+
+ ### Model Description
+
+ <!-- Provide a longer summary of what this model is. -->
+
+
+
+ - **Developed by:** [More Information Needed]
+ - **Shared by [optional]:** [More Information Needed]
+ - **Model type:** [More Information Needed]
+ - **Language(s) (NLP):** [More Information Needed]
+ - **License:** [More Information Needed]
+ - **Finetuned from model [optional]:** [More Information Needed]
+
+ ### Model Sources [optional]
+
+ <!-- Provide the basic links for the model. -->
+
+ - **Repository:** [More Information Needed]
+ - **Paper [optional]:** [More Information Needed]
+ - **Demo [optional]:** [More Information Needed]
+
+ ## Uses
+
+ <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+
+ ### Direct Use
+
+ <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+
+ [More Information Needed]
+
+ ### Downstream Use [optional]
+
+ <!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+
+ [More Information Needed]
+
+ ### Out-of-Scope Use
+
+ <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+
+ [More Information Needed]
+
+ ## Bias, Risks, and Limitations
+
+ <!-- This section is meant to convey both technical and sociotechnical limitations. -->
+
+ [More Information Needed]
+
+ ### Recommendations
+
+ <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+
+ Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+
+ ## How to Get Started with the Model
+
+ Use the code below to get started with the model.
+
+ [More Information Needed]
+
+ ## Training Details
+
+ ### Training Data
+
+ <!-- This should link to a Data Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+
+ [More Information Needed]
+
+ ### Training Procedure
+
+ <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+
+ #### Preprocessing [optional]
+
+ [More Information Needed]
+
+
+ #### Training Hyperparameters
+
+ - **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+
+ #### Speeds, Sizes, Times [optional]
+
+ <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+
+ [More Information Needed]
+
+ ## Evaluation
+
+ <!-- This section describes the evaluation protocols and provides the results. -->
+
+ ### Testing Data, Factors & Metrics
+
+ #### Testing Data
+
+ <!-- This should link to a Data Card if possible. -->
+
+ [More Information Needed]
+
+ #### Factors
+
+ <!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+
+ [More Information Needed]
+
+ #### Metrics
+
+ <!-- These are the evaluation metrics being used, ideally with a description of why. -->
+
+ [More Information Needed]
+
+ ### Results
+
+ [More Information Needed]
+
+ #### Summary
+
+
+
+ ## Model Examination [optional]
+
+ <!-- Relevant interpretability work for the model goes here -->
+
+ [More Information Needed]
+
+ ## Environmental Impact
+
+ <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+
+ Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+
+ - **Hardware Type:** [More Information Needed]
+ - **Hours used:** [More Information Needed]
+ - **Cloud Provider:** [More Information Needed]
+ - **Compute Region:** [More Information Needed]
+ - **Carbon Emitted:** [More Information Needed]
+
+ ## Technical Specifications [optional]
+
+ ### Model Architecture and Objective
+
+ [More Information Needed]
+
+ ### Compute Infrastructure
+
+ [More Information Needed]
+
+ #### Hardware
+
+ [More Information Needed]
+
+ #### Software
+
+ [More Information Needed]
+
+ ## Citation [optional]
+
+ <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+
+ **BibTeX:**
+
+ [More Information Needed]
+
+ **APA:**
+
+ [More Information Needed]
+
+ ## Glossary [optional]
+
+ <!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+
+ [More Information Needed]
+
+ ## More Information [optional]
+
+ [More Information Needed]
+
+ ## Model Card Authors [optional]
+
+ [More Information Needed]
+
+ ## Model Card Contact
+
+ [More Information Needed]
+
+
+ ## Training procedure
+
+
+ ### Framework versions
+
+
+ - PEFT 0.6.0
adapter_config.json ADDED
@@ -0,0 +1,22 @@
+ {
+ "alpha_pattern": {},
+ "auto_mapping": null,
+ "base_model_name_or_path": "microsoft/BioGPT",
+ "bias": "none",
+ "fan_in_fan_out": false,
+ "inference_mode": true,
+ "init_lora_weights": true,
+ "layers_pattern": null,
+ "layers_to_transform": null,
+ "lora_alpha": 16,
+ "lora_dropout": 0.1,
+ "modules_to_save": null,
+ "peft_type": "LORA",
+ "r": 4,
+ "rank_pattern": {},
+ "revision": null,
+ "target_modules": [
+ "q_proj"
+ ],
+ "task_type": "CAUSAL_LM"
+ }
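
The adapter_config.json above pins down the LoRA setup: base model microsoft/BioGPT, rank r=4, lora_alpha=16, dropout 0.1, with only the q_proj projections adapted for causal language modeling. As a stand-in for the README's empty "How to Get Started" section, here is a minimal loading sketch, assuming a local checkout of this repo with the adapter weights laid out where PEFT 0.6.0 expects them; the path below is a placeholder, not taken from this commit:

```python
# Minimal sketch (assumption): attach the uploaded LoRA adapter to base BioGPT.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "microsoft/BioGPT"        # from adapter_config.json
adapter_path = "path/to/this-repo"  # placeholder for a local checkout or Hub id

tokenizer = AutoTokenizer.from_pretrained(base_id)
base_model = AutoModelForCausalLM.from_pretrained(base_id)

# PeftModel reads adapter_config.json and the adapter weights from adapter_path,
# then injects the rank-4 LoRA matrices into the q_proj layers.
model = PeftModel.from_pretrained(base_model, adapter_path)
model.eval()

inputs = tokenizer("Aspirin is commonly used to treat", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```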
optimizer.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:b6b989853110df26897aa38c114d0f24e60e775afbe6d222a785a6b305aa30cc
+ size 1613562
pytorch_model.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:62bfeb62df5d53684740a33db572502c144dea131c951f4cb8ad301b5c957f93
+ size 803902
rng_state.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:f3f001e11b281491b5c213dd732aa2ff767ac6bd0f42be1dbcc4b40ffd598415
+ size 14244
scheduler.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:3e3008979cba7c2af0bd74a2f3e180bbfc246eec8835fb7def4487224cb95e10
+ size 1064
trainer_state.json ADDED
@@ -0,0 +1,1169 @@
1
+ {
2
+ "best_metric": null,
3
+ "best_model_checkpoint": null,
4
+ "epoch": 2.988236782190741,
5
+ "eval_steps": 500,
6
+ "global_step": 94500,
7
+ "is_hyper_param_search": false,
8
+ "is_local_process_zero": true,
9
+ "is_world_process_zero": true,
10
+ "log_history": [
11
+ {
12
+ "epoch": 0.02,
13
+ "learning_rate": 1.9894594822497684e-05,
14
+ "loss": 3.6148,
15
+ "step": 500
16
+ },
17
+ {
18
+ "epoch": 0.03,
19
+ "learning_rate": 1.9789189644995362e-05,
20
+ "loss": 3.3107,
21
+ "step": 1000
22
+ },
23
+ {
24
+ "epoch": 0.05,
25
+ "learning_rate": 1.9683784467493044e-05,
26
+ "loss": 3.2487,
27
+ "step": 1500
28
+ },
29
+ {
30
+ "epoch": 0.06,
31
+ "learning_rate": 1.9578379289990726e-05,
32
+ "loss": 3.1954,
33
+ "step": 2000
34
+ },
35
+ {
36
+ "epoch": 0.08,
37
+ "learning_rate": 1.947297411248841e-05,
38
+ "loss": 3.1585,
39
+ "step": 2500
40
+ },
41
+ {
42
+ "epoch": 0.09,
43
+ "learning_rate": 1.9367568934986087e-05,
44
+ "loss": 3.1312,
45
+ "step": 3000
46
+ },
47
+ {
48
+ "epoch": 0.11,
49
+ "learning_rate": 1.926216375748377e-05,
50
+ "loss": 3.0709,
51
+ "step": 3500
52
+ },
53
+ {
54
+ "epoch": 0.13,
55
+ "learning_rate": 1.9156758579981448e-05,
56
+ "loss": 3.0622,
57
+ "step": 4000
58
+ },
59
+ {
60
+ "epoch": 0.14,
61
+ "learning_rate": 1.905135340247913e-05,
62
+ "loss": 3.0556,
63
+ "step": 4500
64
+ },
65
+ {
66
+ "epoch": 0.16,
67
+ "learning_rate": 1.8945948224976815e-05,
68
+ "loss": 3.002,
69
+ "step": 5000
70
+ },
71
+ {
72
+ "epoch": 0.17,
73
+ "learning_rate": 1.8840543047474494e-05,
74
+ "loss": 3.0279,
75
+ "step": 5500
76
+ },
77
+ {
78
+ "epoch": 0.19,
79
+ "learning_rate": 1.8735137869972176e-05,
80
+ "loss": 2.9804,
81
+ "step": 6000
82
+ },
83
+ {
84
+ "epoch": 0.21,
85
+ "learning_rate": 1.8629732692469854e-05,
86
+ "loss": 2.988,
87
+ "step": 6500
88
+ },
89
+ {
90
+ "epoch": 0.22,
91
+ "learning_rate": 1.8524327514967536e-05,
92
+ "loss": 2.9664,
93
+ "step": 7000
94
+ },
95
+ {
96
+ "epoch": 0.24,
97
+ "learning_rate": 1.841892233746522e-05,
98
+ "loss": 2.9649,
99
+ "step": 7500
100
+ },
101
+ {
102
+ "epoch": 0.25,
103
+ "learning_rate": 1.83135171599629e-05,
104
+ "loss": 2.9827,
105
+ "step": 8000
106
+ },
107
+ {
108
+ "epoch": 0.27,
109
+ "learning_rate": 1.820811198246058e-05,
110
+ "loss": 2.9541,
111
+ "step": 8500
112
+ },
113
+ {
114
+ "epoch": 0.28,
115
+ "learning_rate": 1.810270680495826e-05,
116
+ "loss": 2.955,
117
+ "step": 9000
118
+ },
119
+ {
120
+ "epoch": 0.3,
121
+ "learning_rate": 1.7997301627455943e-05,
122
+ "loss": 2.9489,
123
+ "step": 9500
124
+ },
125
+ {
126
+ "epoch": 0.32,
127
+ "learning_rate": 1.7891896449953622e-05,
128
+ "loss": 2.9415,
129
+ "step": 10000
130
+ },
131
+ {
132
+ "epoch": 0.33,
133
+ "learning_rate": 1.7786491272451304e-05,
134
+ "loss": 2.9453,
135
+ "step": 10500
136
+ },
137
+ {
138
+ "epoch": 0.35,
139
+ "learning_rate": 1.7681086094948986e-05,
140
+ "loss": 2.9273,
141
+ "step": 11000
142
+ },
143
+ {
144
+ "epoch": 0.36,
145
+ "learning_rate": 1.7575680917446668e-05,
146
+ "loss": 2.935,
147
+ "step": 11500
148
+ },
149
+ {
150
+ "epoch": 0.38,
151
+ "learning_rate": 1.7470275739944346e-05,
152
+ "loss": 2.938,
153
+ "step": 12000
154
+ },
155
+ {
156
+ "epoch": 0.4,
157
+ "learning_rate": 1.736487056244203e-05,
158
+ "loss": 2.9441,
159
+ "step": 12500
160
+ },
161
+ {
162
+ "epoch": 0.41,
163
+ "learning_rate": 1.7259465384939707e-05,
164
+ "loss": 2.9091,
165
+ "step": 13000
166
+ },
167
+ {
168
+ "epoch": 0.43,
169
+ "learning_rate": 1.715406020743739e-05,
170
+ "loss": 2.9037,
171
+ "step": 13500
172
+ },
173
+ {
174
+ "epoch": 0.44,
175
+ "learning_rate": 1.704865502993507e-05,
176
+ "loss": 2.9112,
177
+ "step": 14000
178
+ },
179
+ {
180
+ "epoch": 0.46,
181
+ "learning_rate": 1.6943249852432753e-05,
182
+ "loss": 2.88,
183
+ "step": 14500
184
+ },
185
+ {
186
+ "epoch": 0.47,
187
+ "learning_rate": 1.6837844674930435e-05,
188
+ "loss": 2.9045,
189
+ "step": 15000
190
+ },
191
+ {
192
+ "epoch": 0.49,
193
+ "learning_rate": 1.6732439497428114e-05,
194
+ "loss": 2.9221,
195
+ "step": 15500
196
+ },
197
+ {
198
+ "epoch": 0.51,
199
+ "learning_rate": 1.6627034319925796e-05,
200
+ "loss": 2.9079,
201
+ "step": 16000
202
+ },
203
+ {
204
+ "epoch": 0.52,
205
+ "learning_rate": 1.6521629142423478e-05,
206
+ "loss": 2.8972,
207
+ "step": 16500
208
+ },
209
+ {
210
+ "epoch": 0.54,
211
+ "learning_rate": 1.641622396492116e-05,
212
+ "loss": 2.8958,
213
+ "step": 17000
214
+ },
215
+ {
216
+ "epoch": 0.55,
217
+ "learning_rate": 1.631081878741884e-05,
218
+ "loss": 2.9097,
219
+ "step": 17500
220
+ },
221
+ {
222
+ "epoch": 0.57,
223
+ "learning_rate": 1.620541360991652e-05,
224
+ "loss": 2.9103,
225
+ "step": 18000
226
+ },
227
+ {
228
+ "epoch": 0.58,
229
+ "learning_rate": 1.6100008432414203e-05,
230
+ "loss": 2.9001,
231
+ "step": 18500
232
+ },
233
+ {
234
+ "epoch": 0.6,
235
+ "learning_rate": 1.599460325491188e-05,
236
+ "loss": 2.8897,
237
+ "step": 19000
238
+ },
239
+ {
240
+ "epoch": 0.62,
241
+ "learning_rate": 1.5889198077409563e-05,
242
+ "loss": 2.9011,
243
+ "step": 19500
244
+ },
245
+ {
246
+ "epoch": 0.63,
247
+ "learning_rate": 1.5783792899907245e-05,
248
+ "loss": 2.8966,
249
+ "step": 20000
250
+ },
251
+ {
252
+ "epoch": 0.65,
253
+ "learning_rate": 1.5678387722404927e-05,
254
+ "loss": 2.8786,
255
+ "step": 20500
256
+ },
257
+ {
258
+ "epoch": 0.66,
259
+ "learning_rate": 1.5572982544902606e-05,
260
+ "loss": 2.8719,
261
+ "step": 21000
262
+ },
263
+ {
264
+ "epoch": 0.68,
265
+ "learning_rate": 1.5467577367400288e-05,
266
+ "loss": 2.8824,
267
+ "step": 21500
268
+ },
269
+ {
270
+ "epoch": 0.7,
271
+ "learning_rate": 1.5362172189897967e-05,
272
+ "loss": 2.8859,
273
+ "step": 22000
274
+ },
275
+ {
276
+ "epoch": 0.71,
277
+ "learning_rate": 1.525676701239565e-05,
278
+ "loss": 2.8539,
279
+ "step": 22500
280
+ },
281
+ {
282
+ "epoch": 0.73,
283
+ "learning_rate": 1.5151361834893332e-05,
284
+ "loss": 2.8895,
285
+ "step": 23000
286
+ },
287
+ {
288
+ "epoch": 0.74,
289
+ "learning_rate": 1.5045956657391013e-05,
290
+ "loss": 2.869,
291
+ "step": 23500
292
+ },
293
+ {
294
+ "epoch": 0.76,
295
+ "learning_rate": 1.4940551479888695e-05,
296
+ "loss": 2.8888,
297
+ "step": 24000
298
+ },
299
+ {
300
+ "epoch": 0.77,
301
+ "learning_rate": 1.4835146302386373e-05,
302
+ "loss": 2.8637,
303
+ "step": 24500
304
+ },
305
+ {
306
+ "epoch": 0.79,
307
+ "learning_rate": 1.4729741124884055e-05,
308
+ "loss": 2.8905,
309
+ "step": 25000
310
+ },
311
+ {
312
+ "epoch": 0.81,
313
+ "learning_rate": 1.4624335947381736e-05,
314
+ "loss": 2.8916,
315
+ "step": 25500
316
+ },
317
+ {
318
+ "epoch": 0.82,
319
+ "learning_rate": 1.4518930769879418e-05,
320
+ "loss": 2.8801,
321
+ "step": 26000
322
+ },
323
+ {
324
+ "epoch": 0.84,
325
+ "learning_rate": 1.4413525592377098e-05,
326
+ "loss": 2.8854,
327
+ "step": 26500
328
+ },
329
+ {
330
+ "epoch": 0.85,
331
+ "learning_rate": 1.430812041487478e-05,
332
+ "loss": 2.8838,
333
+ "step": 27000
334
+ },
335
+ {
336
+ "epoch": 0.87,
337
+ "learning_rate": 1.4202715237372462e-05,
338
+ "loss": 2.8701,
339
+ "step": 27500
340
+ },
341
+ {
342
+ "epoch": 0.89,
343
+ "learning_rate": 1.4097310059870142e-05,
344
+ "loss": 2.8749,
345
+ "step": 28000
346
+ },
347
+ {
348
+ "epoch": 0.9,
349
+ "learning_rate": 1.3991904882367824e-05,
350
+ "loss": 2.8488,
351
+ "step": 28500
352
+ },
353
+ {
354
+ "epoch": 0.92,
355
+ "learning_rate": 1.3886499704865503e-05,
356
+ "loss": 2.8685,
357
+ "step": 29000
358
+ },
359
+ {
360
+ "epoch": 0.93,
361
+ "learning_rate": 1.3781094527363185e-05,
362
+ "loss": 2.8526,
363
+ "step": 29500
364
+ },
365
+ {
366
+ "epoch": 0.95,
367
+ "learning_rate": 1.3675689349860865e-05,
368
+ "loss": 2.8521,
369
+ "step": 30000
370
+ },
371
+ {
372
+ "epoch": 0.96,
373
+ "learning_rate": 1.3570284172358547e-05,
374
+ "loss": 2.8803,
375
+ "step": 30500
376
+ },
377
+ {
378
+ "epoch": 0.98,
379
+ "learning_rate": 1.3464878994856228e-05,
380
+ "loss": 2.8803,
381
+ "step": 31000
382
+ },
383
+ {
384
+ "epoch": 1.0,
385
+ "learning_rate": 1.335947381735391e-05,
386
+ "loss": 2.8807,
387
+ "step": 31500
388
+ },
389
+ {
390
+ "epoch": 1.0,
391
+ "eval_loss": 2.767564058303833,
392
+ "eval_runtime": 6452.3126,
393
+ "eval_samples_per_second": 39.209,
394
+ "eval_steps_per_second": 4.901,
395
+ "step": 31624
396
+ },
397
+ {
398
+ "epoch": 1.01,
399
+ "learning_rate": 1.3254068639851592e-05,
400
+ "loss": 2.8679,
401
+ "step": 32000
402
+ },
403
+ {
404
+ "epoch": 1.03,
405
+ "learning_rate": 1.3148663462349272e-05,
406
+ "loss": 2.874,
407
+ "step": 32500
408
+ },
409
+ {
410
+ "epoch": 1.04,
411
+ "learning_rate": 1.3043258284846954e-05,
412
+ "loss": 2.8517,
413
+ "step": 33000
414
+ },
415
+ {
416
+ "epoch": 1.06,
417
+ "learning_rate": 1.2937853107344633e-05,
418
+ "loss": 2.8499,
419
+ "step": 33500
420
+ },
421
+ {
422
+ "epoch": 1.08,
423
+ "learning_rate": 1.2832447929842315e-05,
424
+ "loss": 2.8693,
425
+ "step": 34000
426
+ },
427
+ {
428
+ "epoch": 1.09,
429
+ "learning_rate": 1.2727042752339995e-05,
430
+ "loss": 2.8738,
431
+ "step": 34500
432
+ },
433
+ {
434
+ "epoch": 1.11,
435
+ "learning_rate": 1.2621637574837677e-05,
436
+ "loss": 2.8282,
437
+ "step": 35000
438
+ },
439
+ {
440
+ "epoch": 1.12,
441
+ "learning_rate": 1.2516232397335358e-05,
442
+ "loss": 2.8402,
443
+ "step": 35500
444
+ },
445
+ {
446
+ "epoch": 1.14,
447
+ "learning_rate": 1.241082721983304e-05,
448
+ "loss": 2.8686,
449
+ "step": 36000
450
+ },
451
+ {
452
+ "epoch": 1.15,
453
+ "learning_rate": 1.2305422042330722e-05,
454
+ "loss": 2.8629,
455
+ "step": 36500
456
+ },
457
+ {
458
+ "epoch": 1.17,
459
+ "learning_rate": 1.2200016864828402e-05,
460
+ "loss": 2.8643,
461
+ "step": 37000
462
+ },
463
+ {
464
+ "epoch": 1.19,
465
+ "learning_rate": 1.2094611687326084e-05,
466
+ "loss": 2.8254,
467
+ "step": 37500
468
+ },
469
+ {
470
+ "epoch": 1.2,
471
+ "learning_rate": 1.1989206509823763e-05,
472
+ "loss": 2.8569,
473
+ "step": 38000
474
+ },
475
+ {
476
+ "epoch": 1.22,
477
+ "learning_rate": 1.1883801332321445e-05,
478
+ "loss": 2.8581,
479
+ "step": 38500
480
+ },
481
+ {
482
+ "epoch": 1.23,
483
+ "learning_rate": 1.1778396154819125e-05,
484
+ "loss": 2.8454,
485
+ "step": 39000
486
+ },
487
+ {
488
+ "epoch": 1.25,
489
+ "learning_rate": 1.1672990977316807e-05,
490
+ "loss": 2.8462,
491
+ "step": 39500
492
+ },
493
+ {
494
+ "epoch": 1.26,
495
+ "learning_rate": 1.1567585799814487e-05,
496
+ "loss": 2.865,
497
+ "step": 40000
498
+ },
499
+ {
500
+ "epoch": 1.28,
501
+ "learning_rate": 1.146218062231217e-05,
502
+ "loss": 2.8448,
503
+ "step": 40500
504
+ },
505
+ {
506
+ "epoch": 1.3,
507
+ "learning_rate": 1.1356775444809851e-05,
508
+ "loss": 2.8366,
509
+ "step": 41000
510
+ },
511
+ {
512
+ "epoch": 1.31,
513
+ "learning_rate": 1.1251370267307532e-05,
514
+ "loss": 2.8472,
515
+ "step": 41500
516
+ },
517
+ {
518
+ "epoch": 1.33,
519
+ "learning_rate": 1.1145965089805214e-05,
520
+ "loss": 2.8406,
521
+ "step": 42000
522
+ },
523
+ {
524
+ "epoch": 1.34,
525
+ "learning_rate": 1.1040559912302892e-05,
526
+ "loss": 2.8612,
527
+ "step": 42500
528
+ },
529
+ {
530
+ "epoch": 1.36,
531
+ "learning_rate": 1.0935154734800574e-05,
532
+ "loss": 2.8621,
533
+ "step": 43000
534
+ },
535
+ {
536
+ "epoch": 1.38,
537
+ "learning_rate": 1.0829749557298255e-05,
538
+ "loss": 2.8466,
539
+ "step": 43500
540
+ },
541
+ {
542
+ "epoch": 1.39,
543
+ "learning_rate": 1.0724344379795937e-05,
544
+ "loss": 2.8412,
545
+ "step": 44000
546
+ },
547
+ {
548
+ "epoch": 1.41,
549
+ "learning_rate": 1.0618939202293617e-05,
550
+ "loss": 2.8483,
551
+ "step": 44500
552
+ },
553
+ {
554
+ "epoch": 1.42,
555
+ "learning_rate": 1.0513534024791299e-05,
556
+ "loss": 2.8502,
557
+ "step": 45000
558
+ },
559
+ {
560
+ "epoch": 1.44,
561
+ "learning_rate": 1.0408128847288981e-05,
562
+ "loss": 2.8451,
563
+ "step": 45500
564
+ },
565
+ {
566
+ "epoch": 1.45,
567
+ "learning_rate": 1.0302723669786661e-05,
568
+ "loss": 2.8406,
569
+ "step": 46000
570
+ },
571
+ {
572
+ "epoch": 1.47,
573
+ "learning_rate": 1.0197318492284343e-05,
574
+ "loss": 2.8499,
575
+ "step": 46500
576
+ },
577
+ {
578
+ "epoch": 1.49,
579
+ "learning_rate": 1.0091913314782022e-05,
580
+ "loss": 2.8533,
581
+ "step": 47000
582
+ },
583
+ {
584
+ "epoch": 1.5,
585
+ "learning_rate": 9.986508137279704e-06,
586
+ "loss": 2.8233,
587
+ "step": 47500
588
+ },
589
+ {
590
+ "epoch": 1.52,
591
+ "learning_rate": 9.881102959777386e-06,
592
+ "loss": 2.8373,
593
+ "step": 48000
594
+ },
595
+ {
596
+ "epoch": 1.53,
597
+ "learning_rate": 9.775697782275066e-06,
598
+ "loss": 2.8606,
599
+ "step": 48500
600
+ },
601
+ {
602
+ "epoch": 1.55,
603
+ "learning_rate": 9.670292604772747e-06,
604
+ "loss": 2.8385,
605
+ "step": 49000
606
+ },
607
+ {
608
+ "epoch": 1.57,
609
+ "learning_rate": 9.564887427270429e-06,
610
+ "loss": 2.8455,
611
+ "step": 49500
612
+ },
613
+ {
614
+ "epoch": 1.58,
615
+ "learning_rate": 9.459482249768109e-06,
616
+ "loss": 2.8701,
617
+ "step": 50000
618
+ },
619
+ {
620
+ "epoch": 1.6,
621
+ "learning_rate": 9.354077072265791e-06,
622
+ "loss": 2.8277,
623
+ "step": 50500
624
+ },
625
+ {
626
+ "epoch": 1.61,
627
+ "learning_rate": 9.248671894763471e-06,
628
+ "loss": 2.8681,
629
+ "step": 51000
630
+ },
631
+ {
632
+ "epoch": 1.63,
633
+ "learning_rate": 9.143266717261152e-06,
634
+ "loss": 2.8445,
635
+ "step": 51500
636
+ },
637
+ {
638
+ "epoch": 1.64,
639
+ "learning_rate": 9.037861539758834e-06,
640
+ "loss": 2.8545,
641
+ "step": 52000
642
+ },
643
+ {
644
+ "epoch": 1.66,
645
+ "learning_rate": 8.932456362256516e-06,
646
+ "loss": 2.8397,
647
+ "step": 52500
648
+ },
649
+ {
650
+ "epoch": 1.68,
651
+ "learning_rate": 8.827051184754196e-06,
652
+ "loss": 2.8442,
653
+ "step": 53000
654
+ },
655
+ {
656
+ "epoch": 1.69,
657
+ "learning_rate": 8.721646007251876e-06,
658
+ "loss": 2.8288,
659
+ "step": 53500
660
+ },
661
+ {
662
+ "epoch": 1.71,
663
+ "learning_rate": 8.616240829749558e-06,
664
+ "loss": 2.8278,
665
+ "step": 54000
666
+ },
667
+ {
668
+ "epoch": 1.72,
669
+ "learning_rate": 8.510835652247239e-06,
670
+ "loss": 2.835,
671
+ "step": 54500
672
+ },
673
+ {
674
+ "epoch": 1.74,
675
+ "learning_rate": 8.40543047474492e-06,
676
+ "loss": 2.8464,
677
+ "step": 55000
678
+ },
679
+ {
680
+ "epoch": 1.75,
681
+ "learning_rate": 8.300025297242601e-06,
682
+ "loss": 2.8251,
683
+ "step": 55500
684
+ },
685
+ {
686
+ "epoch": 1.77,
687
+ "learning_rate": 8.194620119740281e-06,
688
+ "loss": 2.8414,
689
+ "step": 56000
690
+ },
691
+ {
692
+ "epoch": 1.79,
693
+ "learning_rate": 8.089214942237964e-06,
694
+ "loss": 2.8335,
695
+ "step": 56500
696
+ },
697
+ {
698
+ "epoch": 1.8,
699
+ "learning_rate": 7.983809764735646e-06,
700
+ "loss": 2.8405,
701
+ "step": 57000
702
+ },
703
+ {
704
+ "epoch": 1.82,
705
+ "learning_rate": 7.878404587233326e-06,
706
+ "loss": 2.835,
707
+ "step": 57500
708
+ },
709
+ {
710
+ "epoch": 1.83,
711
+ "learning_rate": 7.772999409731006e-06,
712
+ "loss": 2.8239,
713
+ "step": 58000
714
+ },
715
+ {
716
+ "epoch": 1.85,
717
+ "learning_rate": 7.667594232228688e-06,
718
+ "loss": 2.8322,
719
+ "step": 58500
720
+ },
721
+ {
722
+ "epoch": 1.87,
723
+ "learning_rate": 7.5621890547263685e-06,
724
+ "loss": 2.8366,
725
+ "step": 59000
726
+ },
727
+ {
728
+ "epoch": 1.88,
729
+ "learning_rate": 7.45678387722405e-06,
730
+ "loss": 2.8391,
731
+ "step": 59500
732
+ },
733
+ {
734
+ "epoch": 1.9,
735
+ "learning_rate": 7.351378699721731e-06,
736
+ "loss": 2.8396,
737
+ "step": 60000
738
+ },
739
+ {
740
+ "epoch": 1.91,
741
+ "learning_rate": 7.245973522219411e-06,
742
+ "loss": 2.824,
743
+ "step": 60500
744
+ },
745
+ {
746
+ "epoch": 1.93,
747
+ "learning_rate": 7.140568344717092e-06,
748
+ "loss": 2.8282,
749
+ "step": 61000
750
+ },
751
+ {
752
+ "epoch": 1.94,
753
+ "learning_rate": 7.035163167214774e-06,
754
+ "loss": 2.8341,
755
+ "step": 61500
756
+ },
757
+ {
758
+ "epoch": 1.96,
759
+ "learning_rate": 6.929757989712456e-06,
760
+ "loss": 2.8247,
761
+ "step": 62000
762
+ },
763
+ {
764
+ "epoch": 1.98,
765
+ "learning_rate": 6.824352812210137e-06,
766
+ "loss": 2.837,
767
+ "step": 62500
768
+ },
769
+ {
770
+ "epoch": 1.99,
771
+ "learning_rate": 6.718947634707817e-06,
772
+ "loss": 2.8199,
773
+ "step": 63000
774
+ },
775
+ {
776
+ "epoch": 2.0,
777
+ "eval_loss": 2.7402868270874023,
778
+ "eval_runtime": 6430.6551,
779
+ "eval_samples_per_second": 39.341,
780
+ "eval_steps_per_second": 4.918,
781
+ "step": 63248
782
+ },
783
+ {
784
+ "epoch": 2.01,
785
+ "learning_rate": 6.613542457205498e-06,
786
+ "loss": 2.8177,
787
+ "step": 63500
788
+ },
789
+ {
790
+ "epoch": 2.02,
791
+ "learning_rate": 6.5081372797031795e-06,
792
+ "loss": 2.8361,
793
+ "step": 64000
794
+ },
795
+ {
796
+ "epoch": 2.04,
797
+ "learning_rate": 6.402732102200861e-06,
798
+ "loss": 2.8109,
799
+ "step": 64500
800
+ },
801
+ {
802
+ "epoch": 2.06,
803
+ "learning_rate": 6.297326924698541e-06,
804
+ "loss": 2.8399,
805
+ "step": 65000
806
+ },
807
+ {
808
+ "epoch": 2.07,
809
+ "learning_rate": 6.191921747196222e-06,
810
+ "loss": 2.8311,
811
+ "step": 65500
812
+ },
813
+ {
814
+ "epoch": 2.09,
815
+ "learning_rate": 6.086516569693904e-06,
816
+ "loss": 2.8227,
817
+ "step": 66000
818
+ },
819
+ {
820
+ "epoch": 2.1,
821
+ "learning_rate": 5.981111392191585e-06,
822
+ "loss": 2.8347,
823
+ "step": 66500
824
+ },
825
+ {
826
+ "epoch": 2.12,
827
+ "learning_rate": 5.8757062146892665e-06,
828
+ "loss": 2.8259,
829
+ "step": 67000
830
+ },
831
+ {
832
+ "epoch": 2.13,
833
+ "learning_rate": 5.770301037186947e-06,
834
+ "loss": 2.8554,
835
+ "step": 67500
836
+ },
837
+ {
838
+ "epoch": 2.15,
839
+ "learning_rate": 5.664895859684628e-06,
840
+ "loss": 2.8385,
841
+ "step": 68000
842
+ },
843
+ {
844
+ "epoch": 2.17,
845
+ "learning_rate": 5.559490682182309e-06,
846
+ "loss": 2.8042,
847
+ "step": 68500
848
+ },
849
+ {
850
+ "epoch": 2.18,
851
+ "learning_rate": 5.45408550467999e-06,
852
+ "loss": 2.8222,
853
+ "step": 69000
854
+ },
855
+ {
856
+ "epoch": 2.2,
857
+ "learning_rate": 5.348680327177671e-06,
858
+ "loss": 2.8209,
859
+ "step": 69500
860
+ },
861
+ {
862
+ "epoch": 2.21,
863
+ "learning_rate": 5.243275149675352e-06,
864
+ "loss": 2.836,
865
+ "step": 70000
866
+ },
867
+ {
868
+ "epoch": 2.23,
869
+ "learning_rate": 5.137869972173034e-06,
870
+ "loss": 2.8291,
871
+ "step": 70500
872
+ },
873
+ {
874
+ "epoch": 2.25,
875
+ "learning_rate": 5.032464794670715e-06,
876
+ "loss": 2.8257,
877
+ "step": 71000
878
+ },
879
+ {
880
+ "epoch": 2.26,
881
+ "learning_rate": 4.927059617168396e-06,
882
+ "loss": 2.8353,
883
+ "step": 71500
884
+ },
885
+ {
886
+ "epoch": 2.28,
887
+ "learning_rate": 4.8216544396660766e-06,
888
+ "loss": 2.8184,
889
+ "step": 72000
890
+ },
891
+ {
892
+ "epoch": 2.29,
893
+ "learning_rate": 4.716249262163758e-06,
894
+ "loss": 2.8308,
895
+ "step": 72500
896
+ },
897
+ {
898
+ "epoch": 2.31,
899
+ "learning_rate": 4.610844084661439e-06,
900
+ "loss": 2.8321,
901
+ "step": 73000
902
+ },
903
+ {
904
+ "epoch": 2.32,
905
+ "learning_rate": 4.50543890715912e-06,
906
+ "loss": 2.8366,
907
+ "step": 73500
908
+ },
909
+ {
910
+ "epoch": 2.34,
911
+ "learning_rate": 4.400033729656801e-06,
912
+ "loss": 2.8133,
913
+ "step": 74000
914
+ },
915
+ {
916
+ "epoch": 2.36,
917
+ "learning_rate": 4.2946285521544825e-06,
918
+ "loss": 2.8063,
919
+ "step": 74500
920
+ },
921
+ {
922
+ "epoch": 2.37,
923
+ "learning_rate": 4.189223374652164e-06,
924
+ "loss": 2.8334,
925
+ "step": 75000
926
+ },
927
+ {
928
+ "epoch": 2.39,
929
+ "learning_rate": 4.083818197149844e-06,
930
+ "loss": 2.8103,
931
+ "step": 75500
932
+ },
933
+ {
934
+ "epoch": 2.4,
935
+ "learning_rate": 3.978413019647526e-06,
936
+ "loss": 2.8221,
937
+ "step": 76000
938
+ },
939
+ {
940
+ "epoch": 2.42,
941
+ "learning_rate": 3.873007842145206e-06,
942
+ "loss": 2.817,
943
+ "step": 76500
944
+ },
945
+ {
946
+ "epoch": 2.43,
947
+ "learning_rate": 3.7676026646428875e-06,
948
+ "loss": 2.8334,
949
+ "step": 77000
950
+ },
951
+ {
952
+ "epoch": 2.45,
953
+ "learning_rate": 3.6621974871405687e-06,
954
+ "loss": 2.8423,
955
+ "step": 77500
956
+ },
957
+ {
958
+ "epoch": 2.47,
959
+ "learning_rate": 3.5567923096382494e-06,
960
+ "loss": 2.8247,
961
+ "step": 78000
962
+ },
963
+ {
964
+ "epoch": 2.48,
965
+ "learning_rate": 3.451387132135931e-06,
966
+ "loss": 2.8373,
967
+ "step": 78500
968
+ },
969
+ {
970
+ "epoch": 2.5,
971
+ "learning_rate": 3.3459819546336118e-06,
972
+ "loss": 2.8324,
973
+ "step": 79000
974
+ },
975
+ {
976
+ "epoch": 2.51,
977
+ "learning_rate": 3.240576777131293e-06,
978
+ "loss": 2.8312,
979
+ "step": 79500
980
+ },
981
+ {
982
+ "epoch": 2.53,
983
+ "learning_rate": 3.1351715996289737e-06,
984
+ "loss": 2.8452,
985
+ "step": 80000
986
+ },
987
+ {
988
+ "epoch": 2.55,
989
+ "learning_rate": 3.029766422126655e-06,
990
+ "loss": 2.8331,
991
+ "step": 80500
992
+ },
993
+ {
994
+ "epoch": 2.56,
995
+ "learning_rate": 2.9243612446243365e-06,
996
+ "loss": 2.8209,
997
+ "step": 81000
998
+ },
999
+ {
1000
+ "epoch": 2.58,
1001
+ "learning_rate": 2.8189560671220172e-06,
1002
+ "loss": 2.8289,
1003
+ "step": 81500
1004
+ },
1005
+ {
1006
+ "epoch": 2.59,
1007
+ "learning_rate": 2.7135508896196984e-06,
1008
+ "loss": 2.8415,
1009
+ "step": 82000
1010
+ },
1011
+ {
1012
+ "epoch": 2.61,
1013
+ "learning_rate": 2.608145712117379e-06,
1014
+ "loss": 2.8256,
1015
+ "step": 82500
1016
+ },
1017
+ {
1018
+ "epoch": 2.62,
1019
+ "learning_rate": 2.5027405346150608e-06,
1020
+ "loss": 2.8257,
1021
+ "step": 83000
1022
+ },
1023
+ {
1024
+ "epoch": 2.64,
1025
+ "learning_rate": 2.3973353571127415e-06,
1026
+ "loss": 2.833,
1027
+ "step": 83500
1028
+ },
1029
+ {
1030
+ "epoch": 2.66,
1031
+ "learning_rate": 2.2919301796104227e-06,
1032
+ "loss": 2.8348,
1033
+ "step": 84000
1034
+ },
1035
+ {
1036
+ "epoch": 2.67,
1037
+ "learning_rate": 2.186525002108104e-06,
1038
+ "loss": 2.8292,
1039
+ "step": 84500
1040
+ },
1041
+ {
1042
+ "epoch": 2.69,
1043
+ "learning_rate": 2.081119824605785e-06,
1044
+ "loss": 2.8259,
1045
+ "step": 85000
1046
+ },
1047
+ {
1048
+ "epoch": 2.7,
1049
+ "learning_rate": 1.9757146471034658e-06,
1050
+ "loss": 2.8319,
1051
+ "step": 85500
1052
+ },
1053
+ {
1054
+ "epoch": 2.72,
1055
+ "learning_rate": 1.870309469601147e-06,
1056
+ "loss": 2.8094,
1057
+ "step": 86000
1058
+ },
1059
+ {
1060
+ "epoch": 2.74,
1061
+ "learning_rate": 1.764904292098828e-06,
1062
+ "loss": 2.8276,
1063
+ "step": 86500
1064
+ },
1065
+ {
1066
+ "epoch": 2.75,
1067
+ "learning_rate": 1.659499114596509e-06,
1068
+ "loss": 2.8325,
1069
+ "step": 87000
1070
+ },
1071
+ {
1072
+ "epoch": 2.77,
1073
+ "learning_rate": 1.55409393709419e-06,
1074
+ "loss": 2.8247,
1075
+ "step": 87500
1076
+ },
1077
+ {
1078
+ "epoch": 2.78,
1079
+ "learning_rate": 1.4486887595918715e-06,
1080
+ "loss": 2.8363,
1081
+ "step": 88000
1082
+ },
1083
+ {
1084
+ "epoch": 2.8,
1085
+ "learning_rate": 1.3432835820895524e-06,
1086
+ "loss": 2.8343,
1087
+ "step": 88500
1088
+ },
1089
+ {
1090
+ "epoch": 2.81,
1091
+ "learning_rate": 1.2378784045872334e-06,
1092
+ "loss": 2.8325,
1093
+ "step": 89000
1094
+ },
1095
+ {
1096
+ "epoch": 2.83,
1097
+ "learning_rate": 1.1324732270849146e-06,
1098
+ "loss": 2.8316,
1099
+ "step": 89500
1100
+ },
1101
+ {
1102
+ "epoch": 2.85,
1103
+ "learning_rate": 1.0270680495825955e-06,
1104
+ "loss": 2.8232,
1105
+ "step": 90000
1106
+ },
1107
+ {
1108
+ "epoch": 2.86,
1109
+ "learning_rate": 9.216628720802767e-07,
1110
+ "loss": 2.8241,
1111
+ "step": 90500
1112
+ },
1113
+ {
1114
+ "epoch": 2.88,
1115
+ "learning_rate": 8.162576945779577e-07,
1116
+ "loss": 2.8203,
1117
+ "step": 91000
1118
+ },
1119
+ {
1120
+ "epoch": 2.89,
1121
+ "learning_rate": 7.108525170756387e-07,
1122
+ "loss": 2.8356,
1123
+ "step": 91500
1124
+ },
1125
+ {
1126
+ "epoch": 2.91,
1127
+ "learning_rate": 6.054473395733199e-07,
1128
+ "loss": 2.8509,
1129
+ "step": 92000
1130
+ },
1131
+ {
1132
+ "epoch": 2.92,
1133
+ "learning_rate": 5.00042162071001e-07,
1134
+ "loss": 2.8315,
1135
+ "step": 92500
1136
+ },
1137
+ {
1138
+ "epoch": 2.94,
1139
+ "learning_rate": 3.9463698456868205e-07,
1140
+ "loss": 2.8223,
1141
+ "step": 93000
1142
+ },
1143
+ {
1144
+ "epoch": 2.96,
1145
+ "learning_rate": 2.892318070663631e-07,
1146
+ "loss": 2.8368,
1147
+ "step": 93500
1148
+ },
1149
+ {
1150
+ "epoch": 2.97,
1151
+ "learning_rate": 1.838266295640442e-07,
1152
+ "loss": 2.8271,
1153
+ "step": 94000
1154
+ },
1155
+ {
1156
+ "epoch": 2.99,
1157
+ "learning_rate": 7.842145206172527e-08,
1158
+ "loss": 2.8375,
1159
+ "step": 94500
1160
+ }
1161
+ ],
1162
+ "logging_steps": 500,
1163
+ "max_steps": 94872,
1164
+ "num_train_epochs": 3,
1165
+ "save_steps": 500,
1166
+ "total_flos": 7.025524751644754e+17,
1167
+ "trial_name": null,
1168
+ "trial_params": null
1169
+ }
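
trainer_state.json records the Trainer's running log: training loss and learning rate every 500 steps (logging_steps: 500) and an eval_loss at the end of each of the first two epochs, with 94,500 of 94,872 planned steps completed at this checkpoint. A small sketch, assuming the file is read from a local checkout, of pulling those curves out of log_history for inspection:

```python
# Sketch (assumption): extract training/eval loss curves from trainer_state.json.
import json

with open("trainer_state.json") as f:
    state = json.load(f)

train = [(e["step"], e["loss"]) for e in state["log_history"] if "loss" in e]
evals = [(e["step"], e["eval_loss"]) for e in state["log_history"] if "eval_loss" in e]

print(f"{state['global_step']} of {state['max_steps']} steps, "
      f"{state['num_train_epochs']} epochs planned")
print("last logged training loss:", train[-1])  # (step, loss) pair
print("eval losses by step:", evals)            # epoch-end evaluations
```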
training_args.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:016b9b790bf64d70e454e68cc09880a3a450d4e85678979f995976030c8bf78c
+ size 4536