Update README.md
README.md
CHANGED
@@ -99,7 +99,7 @@ https://huggingface.co/datasets/isaacchung/hotpotqa-dev-raft-subset
 
 #### Training Hyperparameters
 
-<!-- - **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+<!-- - **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
 
 Model loaded:
 ```python
@@ -160,11 +160,27 @@ trainer = SFTTrainer(
 )
 ```
 
-
+#### Speeds, Sizes, Times [optional]
 
 <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
 
-
+- train_runtime: 1148.4436
+- train_samples_per_second: 0.392
+- train_steps_per_second: 0.065
+- train_loss: 0.5639963404337565
+- epoch: 3.0
+
+#### Training Loss
+
+```
+{'loss': 1.0092, 'grad_norm': 0.27965569496154785, 'learning_rate': 0.0002, 'epoch': 0.4}
+{'loss': 0.695, 'grad_norm': 0.17789314687252045, 'learning_rate': 0.0002, 'epoch': 0.8}
+{'loss': 0.6747, 'grad_norm': 0.13655725121498108, 'learning_rate': 0.0002, 'epoch': 1.2}
+{'loss': 0.508, 'grad_norm': 0.14653471112251282, 'learning_rate': 0.0002, 'epoch': 1.6}
+{'loss': 0.4961, 'grad_norm': 0.14873674511909485, 'learning_rate': 0.0002, 'epoch': 2.0}
+{'loss': 0.3509, 'grad_norm': 0.1657964587211609, 'learning_rate': 0.0002, 'epoch': 2.4}
+{'loss': 0.3321, 'grad_norm': 0.1634644716978073, 'learning_rate': 0.0002, 'epoch': 2.8}
+```
 
 <!-- ## Evaluation -->
 
@@ -207,29 +223,29 @@ trainer = SFTTrainer(
 <!-- ## Environmental Impact -->
 
 <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
-
-Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+
+<!-- Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
 
 - **Hardware Type:** [More Information Needed]
 - **Hours used:** [More Information Needed]
 - **Cloud Provider:** [More Information Needed]
 - **Compute Region:** [More Information Needed]
-- **Carbon Emitted:** [More Information Needed]
+- **Carbon Emitted:** [More Information Needed] -->
 
 ## Technical Specifications [optional]
 
-### Model Architecture and Objective
+<!-- ### Model Architecture and Objective -->
 
-[More Information Needed]
+<!-- [More Information Needed] -->
 
 ### Compute Infrastructure
 
-[More Information Needed]
+<!-- [More Information Needed] -->
 
 #### Hardware
 
-
-
+- 1x NVIDIA RTX 6000 Ada
+
 <!-- #### Software
 
 [More Information Needed]
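
The throughput figures and loss log added in this change can be cross-checked with a few lines of arithmetic. The sketch below is not part of the README; it assumes the reported per-second rates are averages over the whole run, and the names `estimate_run_shape` and `parse_loss_log` are hypothetical helpers introduced only for illustration. Because the rates are rounded to three decimals, the derived totals are approximations.

```python
import ast

# Values copied from the "Speeds, Sizes, Times" list added in this change.
TRAIN_RUNTIME_S = 1148.4436
SAMPLES_PER_SECOND = 0.392
STEPS_PER_SECOND = 0.065
EPOCHS = 3.0

# Log lines copied from the "Training Loss" block added in this change.
LOSS_LOG = """\
{'loss': 1.0092, 'grad_norm': 0.27965569496154785, 'learning_rate': 0.0002, 'epoch': 0.4}
{'loss': 0.695, 'grad_norm': 0.17789314687252045, 'learning_rate': 0.0002, 'epoch': 0.8}
{'loss': 0.6747, 'grad_norm': 0.13655725121498108, 'learning_rate': 0.0002, 'epoch': 1.2}
{'loss': 0.508, 'grad_norm': 0.14653471112251282, 'learning_rate': 0.0002, 'epoch': 1.6}
{'loss': 0.4961, 'grad_norm': 0.14873674511909485, 'learning_rate': 0.0002, 'epoch': 2.0}
{'loss': 0.3509, 'grad_norm': 0.1657964587211609, 'learning_rate': 0.0002, 'epoch': 2.4}
{'loss': 0.3321, 'grad_norm': 0.1634644716978073, 'learning_rate': 0.0002, 'epoch': 2.8}
"""


def estimate_run_shape(runtime_s, samples_per_s, steps_per_s, epochs):
    """Back out approximate totals from the rounded throughput figures."""
    total_samples = runtime_s * samples_per_s
    total_steps = runtime_s * steps_per_s
    return {
        "total_samples": round(total_samples),
        "total_steps": round(total_steps),
        "samples_per_epoch": round(total_samples / epochs),
        "samples_per_step": round(samples_per_s / steps_per_s, 2),
    }


def parse_loss_log(text):
    """Parse one Python dict literal per non-empty line."""
    return [ast.literal_eval(line) for line in text.splitlines() if line.strip()]


shape = estimate_run_shape(
    TRAIN_RUNTIME_S, SAMPLES_PER_SECOND, STEPS_PER_SECOND, EPOCHS
)
records = parse_loss_log(LOSS_LOG)
losses = [record["loss"] for record in records]

print(shape)
print(f"logged points: {len(records)}, loss {losses[0]} -> {losses[-1]}")
```

Under these assumptions the run works out to roughly 450 samples seen over about 75 optimizer steps, or about 150 samples per epoch and about 6 samples per step. The actual batch size and gradient accumulation settings come from the training configuration, which this diff does not show.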