```
***** Running training *****
  Num examples = 6004
  Num Epochs = 14
  Instantaneous batch size per device = 16
  Total train batch size (w. parallel, distributed & accumulation) = 16
  Gradient Accumulation steps = 1
  Total optimization steps = 5000
```
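For reference, a minimal sketch of the `TrainingArguments` that would produce a schedule like this, assuming a standard Hugging Face `Trainer` setup. Only the numeric values come from the log; `output_dir` is a placeholder, and the evaluation and logging intervals are inferred from the steps at which the metrics and training loss appear below:

```python
from transformers import TrainingArguments

# Hypothetical configuration; only the numbers are taken from the log above.
training_args = TrainingArguments(
    output_dir="ner-model",            # placeholder, not from the original run
    max_steps=5000,                    # "Total optimization steps = 5000"
    per_device_train_batch_size=16,    # "Instantaneous batch size per device = 16"
    gradient_accumulation_steps=1,     # "Gradient Accumulation steps = 1"
    evaluation_strategy="steps",       # renamed to eval_strategy in newer transformers
    eval_steps=100,                    # metrics are reported every 100 steps
    logging_steps=500,                 # training loss reads "No log" until step 500
)
```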
[2500/5000 12:15 < 12:15, 3.40 it/s, Epoch 6/14]

| Step | Training Loss | Validation Loss | Precision | Recall | F1 | Accuracy |
|------|---------------|-----------------|-----------|--------|----|----------|
| 100  | No log   | 0.247325 | 0.912333 | 0.925744 | 0.918990 | 0.960895 |
| 200  | No log   | 0.171694 | 0.930514 | 0.928760 | 0.929636 | 0.963143 |
| 300  | No log   | 0.132045 | 0.935375 | 0.943837 | 0.939587 | 0.970515 |
| 400  | No log   | 0.142074 | 0.936490 | 0.939314 | 0.937900 | 0.968141 |
| 500  | 0.245500 | 0.105783 | 0.949794 | 0.955522 | 0.952649 | 0.975887 |
| 600  | 0.245500 | 0.107380 | 0.948120 | 0.950622 | 0.949369 | 0.973138 |
| 700  | 0.245500 | 0.111011 | 0.951504 | 0.954014 | 0.952757 | 0.972889 |
| 800  | 0.245500 | 0.093002 | 0.947999 | 0.955145 | 0.951558 | 0.975387 |
| 900  | 0.245500 | 0.100926 | 0.956193 | 0.954391 | 0.955291 | 0.976262 |
| 1000 | 0.086800 | 0.090775 | 0.955263 | 0.957784 | 0.956522 | 0.976637 |
| 1100 | 0.086800 | 0.099250 | 0.953829 | 0.957784 | 0.955802 | 0.976137 |
| 1200 | 0.086800 | 0.088502 | 0.952327 | 0.956276 | 0.954298 | 0.976762 |
| 1300 | 0.086800 | 0.094135 | 0.957078 | 0.958161 | 0.957619 | 0.977011 |
| 1400 | 0.086800 | 0.099687 | 0.957768 | 0.957407 | 0.957587 | 0.975887 |
| 1500 | 0.056000 | 0.108563 | 0.958930 | 0.959291 | 0.959111 | 0.974888 |
| 1600 | 0.056000 | 0.101031 | 0.957784 | 0.957784 | 0.957784 | 0.976262 |
| 1700 | 0.056000 | 0.099654 | 0.960135 | 0.962307 | 0.961220 | 0.978386 |
| 1800 | 0.056000 | 0.106387 | 0.954118 | 0.956276 | 0.955196 | 0.975512 |
| 1900 | 0.056000 | 0.096317 | 0.953846 | 0.958161 | 0.955998 | 0.975762 |
| 2000 | 0.040000 | 0.094224 | 0.959444 | 0.963061 | 0.961249 | 0.977761 |
| 2100 | 0.040000 | 0.110398 | 0.956669 | 0.957030 | 0.956849 | 0.975262 |
| 2200 | 0.040000 | 0.096151 | 0.955706 | 0.959668 | 0.957683 | 0.977386 |
| 2300 | 0.040000 | 0.108148 | 0.945149 | 0.954768 | 0.949934 | 0.974513 |
| 2400 | 0.040000 | 0.109966 | 0.950991 | 0.958161 | 0.954563 | 0.976637 |
| 2500 | 0.030900 | 0.117515 | 0.947921 | 0.953637 | 0.950770 | 0.973888 |
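The Precision, Recall, F1, and Accuracy columns match the overall scores reported by `seqeval`, which suggests the usual token-classification `compute_metrics` callback. A sketch under that assumption; `label_list` is a hypothetical variable holding the dataset's label names, shown here with an example tag set:

```python
import numpy as np
import evaluate

metric = evaluate.load("seqeval")

# Example label set; replace with the dataset's own labels.
label_list = ["O", "B-PER", "I-PER", "B-ORG", "I-ORG", "B-LOC", "I-LOC"]

def compute_metrics(eval_preds):
    logits, labels = eval_preds
    predictions = np.argmax(logits, axis=-1)

    # Drop special tokens, which the collator marks with label -100.
    true_labels = [
        [label_list[l] for l in label_row if l != -100]
        for label_row in labels
    ]
    true_predictions = [
        [label_list[p] for p, l in zip(pred_row, label_row) if l != -100]
        for pred_row, label_row in zip(predictions, labels)
    ]

    results = metric.compute(predictions=true_predictions, references=true_labels)
    return {
        "precision": results["overall_precision"],
        "recall": results["overall_recall"],
        "f1": results["overall_f1"],
        "accuracy": results["overall_accuracy"],
    }
```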