Commit 7dc8469 · Filip committed "small update" · Parent: 2bbdffc
README.md CHANGED
@@ -36,7 +36,8 @@ Both models used the same hyperparameters during training.
 `lora_alpha=16`: Scaling factor for the low-rank matrices' contribution. Higher increases their influence and speeds up convergence but risks instability/overfitting; lower gives a smaller effect but may require more training steps.
 
 `lora_dropout=0`: Probability of zeroing out elements in the low-rank matrices for regularization. Higher gives more regularization but may slow training and degrade performance.\
-
+
+`per_device_train_batch_size=2`: Batch size per device (GPU/TPU core or CPU).
 
 `gradient_accumulation_steps=4`: The number of steps to accumulate gradients before performing a backpropagation update. Higher accumulates gradients over multiple steps, effectively increasing the batch size without requiring extra memory, which can improve training stability and convergence with a large model on limited hardware.
 
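For reference, here is a minimal sketch of where these hyperparameters plug into a typical `peft` + `transformers` training setup. The LoRA rank `r=16`, the `task_type`, and the `output_dir` are illustrative assumptions; they do not appear in this commit.

```python
from peft import LoraConfig
from transformers import TrainingArguments

# LoRA adapter settings described above.
lora_config = LoraConfig(
    r=16,                  # rank of the low-rank matrices (assumed; not stated in this diff)
    lora_alpha=16,         # scaling factor for the low-rank matrices' contribution
    lora_dropout=0,        # probability of zeroing out elements for regularization
    task_type="CAUSAL_LM", # assumed task type
)

# Trainer settings described above.
training_args = TrainingArguments(
    output_dir="outputs",           # hypothetical output path
    per_device_train_batch_size=2,  # batch size per GPU/TPU core or CPU
    gradient_accumulation_steps=4,  # accumulate gradients over 4 forward passes
)
```

With this setup, each optimizer update effectively sees `per_device_train_batch_size × gradient_accumulation_steps` = 2 × 4 = 8 samples per device, while only a batch of 2 is held in memory at any time.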