Filip committed
Commit 7dc8469 · 1 Parent(s): 2bbdffc

small update

Files changed (1)
  1. README.md +2 -1
README.md CHANGED
@@ -36,7 +36,8 @@ Both models used the same hyperparameters during training.
 `lora_alpha=16`: Scaling factor for low-rank matrices' contribution. Higher increases influence, speeds up convergence, risks instability/overfitting. Lower gives small effect, but may require more training steps.
 
 `lora_dropout=0`: Probability of zeroing out elements in low-rank matrices for regularization. Higher gives more regularization but may slow training and degrade performance.\
-`per_device_train_batch_size=2`:
+
+`per_device_train_batch_size=2`: Batch size per device (GPU/TPU or core/CPU)
 
 `gradient_accumulation_steps=4`: The number of steps to accumulate gradients before performing a backpropagation update. Higher accumulates gradients over multiple steps, increasing the batch size without requiring additional memory. Can improve training stability and convergence if you have a large model and limited hardware.
 
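Taken together, these settings describe a standard PEFT/LoRA fine-tuning configuration. Below is a minimal sketch, not the repository's actual training script, of how the hyperparameters above could be expressed with `peft.LoraConfig` and `transformers.TrainingArguments`; only `lora_alpha`, `lora_dropout`, `per_device_train_batch_size`, and `gradient_accumulation_steps` come from the README, while `r`, `task_type`, and `output_dir` are placeholder assumptions.

```python
# Minimal sketch, not the repository's actual training script.
# Only lora_alpha, lora_dropout, per_device_train_batch_size and
# gradient_accumulation_steps come from the README; everything else
# (r, task_type, output_dir) is an assumed placeholder.
from peft import LoraConfig
from transformers import TrainingArguments

lora_config = LoraConfig(
    r=16,                   # rank of the low-rank matrices (assumed, not stated above)
    lora_alpha=16,          # scaling factor for the low-rank matrices' contribution
    lora_dropout=0.0,       # probability of zeroing out elements in the low-rank matrices
    task_type="CAUSAL_LM",  # assumed task type
)

training_args = TrainingArguments(
    output_dir="outputs",            # placeholder output directory
    per_device_train_batch_size=2,   # batch size per device (GPU/TPU core/CPU)
    gradient_accumulation_steps=4,   # gradients accumulated over 4 steps before each update
)
```

With these two values, each device processes 2 examples per step and applies a weight update every 4 steps, giving an effective batch size of 2 × 4 = 8 per device.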