---
license: apache-2.0
---

**Hyperparameters:**

- learning_rate: 2e-5
- weight_decay: 0.01
- per_device_train_batch_size: 8
- per_device_eval_batch_size: 8
- gradient_accumulation_steps: 1
- eval_steps: 6000
- max_length: 512
- num_epochs: 2

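
To make the settings easier to reuse, they can be collected into one mapping. The sketch below is an illustration only: it assumes the run used the Hugging Face `Trainer`, so the keys follow `transformers.TrainingArguments` parameter names (with the epoch count mapped to `num_train_epochs`), and it assumes `max_length` is a tokenizer truncation setting. This card does not state the hardware, so `num_devices` is a hypothetical parameter, and the helper functions are not part of any library.

```python
# Hyperparameters from this card, keyed by the (assumed) TrainingArguments names.
training_kwargs = {
    "learning_rate": 2e-5,
    "weight_decay": 0.01,
    "per_device_train_batch_size": 8,
    "per_device_eval_batch_size": 8,
    "gradient_accumulation_steps": 1,
    "eval_steps": 6000,
    "num_train_epochs": 2,
}

# Assumed to be applied at tokenization time rather than passed to the trainer.
MAX_LENGTH = 512


def effective_batch_size(kwargs: dict, num_devices: int = 1) -> int:
    """Examples consumed per optimizer step across all devices."""
    return (
        kwargs["per_device_train_batch_size"]
        * kwargs["gradient_accumulation_steps"]
        * num_devices
    )


def examples_seen(kwargs: dict, step: int, num_devices: int = 1) -> int:
    """Training examples processed after `step` optimizer steps."""
    return step * effective_batch_size(kwargs, num_devices)


if __name__ == "__main__":
    # num_devices=1 is a placeholder; the card does not state the hardware.
    print(effective_batch_size(training_kwargs))
    print(examples_seen(training_kwargs, step=48000))
```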
**Dataset version:**

- "craffel/tasky_or_not", "10xp3_10xc4", "15f88c8"

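
One possible way to pin the same data is sketched below. It rests on assumptions this card does not confirm: that the three values are, in order, the Hub dataset path, the configuration name, and a shortened commit hash, and that the `datasets` library is used. The Hub may require the full commit hash rather than the short form, and `load_pinned_dataset` is a hypothetical helper name.

```python
DATASET_PATH = "craffel/tasky_or_not"
DATASET_CONFIG = "10xp3_10xc4"  # assumed to be the configuration name
DATASET_REVISION = "15f88c8"    # assumed to be a shortened commit hash


def load_pinned_dataset():
    """Load the dataset at the pinned revision.

    Requires the `datasets` package and network access. The import is
    done lazily so this module can be imported without `datasets`.
    """
    from datasets import load_dataset

    return load_dataset(DATASET_PATH, DATASET_CONFIG, revision=DATASET_REVISION)
```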
**Checkpoint:**

- 48000 steps

**Results on Validation set:**

| Step  | Training Loss | Validation Loss | Accuracy | Precision | Recall   | F1       |
|-------|---------------|-----------------|----------|-----------|----------|----------|
| 6000  | 0.031900      | 0.163412        | 0.982194 | 0.999211  | 0.980462 | 0.989748 |
| 12000 | 0.014700      | 0.106132        | 0.976666 | 0.999639  | 0.973733 | 0.986516 |
| 18000 | 0.010700      | 0.043012        | 0.995743 | 0.999223  | 0.995918 | 0.997568 |
| 24000 | 0.007400      | 0.095047        | 0.984724 | 0.999857  | 0.982714 | 0.991211 |
| 30000 | 0.004100      | 0.087274        | 0.990400 | 0.999829  | 0.989217 | 0.994495 |
| 36000 | 0.003100      | 0.162909        | 0.981972 | 1.000000  | 0.979434 | 0.989610 |
| 42000 | 0.002200      | 0.148721        | 0.980454 | 0.999986  | 0.977717 | 0.988726 |
| 48000 | 0.001000      | 0.094455        | 0.990437 | 0.999943  | 0.989147 | 0.994516 |
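
The F1 column can be cross-checked against the precision and recall columns using F1 = 2PR / (P + R). The sketch below copies the rows of the results table and recomputes F1 for each; it also picks out the evaluated step with the lowest validation loss and the one with the highest F1. The helper names are illustrative, and the tolerance is an assumption chosen to absorb the six-decimal rounding in the table.

```python
# (step, training_loss, validation_loss, accuracy, precision, recall, f1)
ROWS = [
    (6000, 0.031900, 0.163412, 0.982194, 0.999211, 0.980462, 0.989748),
    (12000, 0.014700, 0.106132, 0.976666, 0.999639, 0.973733, 0.986516),
    (18000, 0.010700, 0.043012, 0.995743, 0.999223, 0.995918, 0.997568),
    (24000, 0.007400, 0.095047, 0.984724, 0.999857, 0.982714, 0.991211),
    (30000, 0.004100, 0.087274, 0.990400, 0.999829, 0.989217, 0.994495),
    (36000, 0.003100, 0.162909, 0.981972, 1.000000, 0.979434, 0.989610),
    (42000, 0.002200, 0.148721, 0.980454, 0.999986, 0.977717, 0.988726),
    (48000, 0.001000, 0.094455, 0.990437, 0.999943, 0.989147, 0.994516),
]


def f1_from(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def max_f1_deviation(rows) -> float:
    """Largest gap between the reported F1 and the recomputed F1."""
    return max(abs(f1_from(row[4], row[5]) - row[6]) for row in rows)


def step_with_lowest_validation_loss(rows) -> int:
    return min(rows, key=lambda row: row[2])[0]


def step_with_highest_f1(rows) -> int:
    return max(rows, key=lambda row: row[6])[0]


if __name__ == "__main__":
    print(max_f1_deviation(ROWS) < 1e-5)
    print(step_with_lowest_validation_loss(ROWS))
    print(step_with_highest_f1(ROWS))
```

On these rows, the reported F1 values agree with the recomputed ones within rounding, and both selections point to step 18000.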