Update README.md
README.md CHANGED
@@ -181,7 +181,7 @@ TODO
 #### Training Hyperparameters
 
 The following hyperparameters were used during training:
-- learning_rate: 0.
+- learning_rate: 0.0005
 - train_batch_size: 1024
 - eval_batch_size: 256
 - seed: 42
@@ -190,7 +190,7 @@ The following hyperparameters were used during training:
 - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
 - lr_scheduler_type: cosine
 - lr_scheduler_warmup_steps: 1000
-- num_epochs:
+- num_epochs: 5
 
 ## Evaluation
 
@@ -204,7 +204,7 @@ The architecture of this model is [Mixtral](https://huggingface.co/docs/transfor
 
 ### Compute Infrastructure
 
-
+Server in a university laboratory
 
 #### Hardware
 
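The hyperparameters filled in by this commit (learning_rate 0.0005, lr_scheduler_type cosine, lr_scheduler_warmup_steps 1000) fully determine the shape of the learning-rate curve except for the total step count, which the card does not state. A minimal sketch of that schedule, with `total_steps` as an assumed placeholder:

```python
import math

def lr_at_step(step, base_lr=5e-4, warmup_steps=1000, total_steps=100_000):
    """Linear warmup followed by cosine decay to zero.

    base_lr and warmup_steps match the model card; total_steps is a
    hypothetical value, since the card does not report the step count.
    """
    if step < warmup_steps:
        # Linear warmup: LR ramps from 0 up to base_lr over warmup_steps.
        return base_lr * step / warmup_steps
    # Cosine decay: LR falls from base_lr at the end of warmup to 0
    # at total_steps.
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

print(lr_at_step(500))    # mid-warmup: 0.00025
print(lr_at_step(1000))   # peak after warmup: 0.0005
```

The peak learning rate is reached exactly once, at the end of warmup, and every later step decays it smoothly toward zero.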