HueyNemud
bdd207b
***** Running training *****
Num examples = 6004
Num Epochs = 14
Instantaneous batch size per device = 16
Total train batch size (w. parallel, distributed & accumulation) = 16
Gradient Accumulation steps = 1
Total optimization steps = 5000
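The epoch count follows from the example count and batch size. A quick sanity check of the numbers above (pure arithmetic, not part of the original log):

```python
import math

num_examples = 6004
per_device_batch = 16  # no parallelism or gradient accumulation, so this is the effective batch
steps_per_epoch = math.ceil(num_examples / per_device_batch)  # 376 update steps per epoch
max_steps = 5000
num_epochs = math.ceil(max_steps / steps_per_epoch)  # 14, matching "Num Epochs = 14"
print(steps_per_epoch, num_epochs)
```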
[2500/5000 12:15 < 12:15, 3.40 it/s, Epoch 6/14]
Step Training Loss Validation Loss Precision Recall F1 Accuracy
100 No log 0.247325 0.912333 0.925744 0.918990 0.960895
200 No log 0.171694 0.930514 0.928760 0.929636 0.963143
300 No log 0.132045 0.935375 0.943837 0.939587 0.970515
400 No log 0.142074 0.936490 0.939314 0.937900 0.968141
500 0.245500 0.105783 0.949794 0.955522 0.952649 0.975887
600 0.245500 0.107380 0.948120 0.950622 0.949369 0.973138
700 0.245500 0.111011 0.951504 0.954014 0.952757 0.972889
800 0.245500 0.093002 0.947999 0.955145 0.951558 0.975387
900 0.245500 0.100926 0.956193 0.954391 0.955291 0.976262
1000 0.086800 0.090775 0.955263 0.957784 0.956522 0.976637
1100 0.086800 0.099250 0.953829 0.957784 0.955802 0.976137
1200 0.086800 0.088502 0.952327 0.956276 0.954298 0.976762
1300 0.086800 0.094135 0.957078 0.958161 0.957619 0.977011
1400 0.086800 0.099687 0.957768 0.957407 0.957587 0.975887
1500 0.056000 0.108563 0.958930 0.959291 0.959111 0.974888
1600 0.056000 0.101031 0.957784 0.957784 0.957784 0.976262
1700 0.056000 0.099654 0.960135 0.962307 0.961220 0.978386
1800 0.056000 0.106387 0.954118 0.956276 0.955196 0.975512
1900 0.056000 0.096317 0.953846 0.958161 0.955998 0.975762
2000 0.040000 0.094224 0.959444 0.963061 0.961249 0.977761
2100 0.040000 0.110398 0.956669 0.957030 0.956849 0.975262
2200 0.040000 0.096151 0.955706 0.959668 0.957683 0.977386
2300 0.040000 0.108148 0.945149 0.954768 0.949934 0.974513
2400 0.040000 0.109966 0.950991 0.958161 0.954563 0.976637
2500 0.030900 0.117515 0.947921 0.953637 0.950770 0.973888
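The evaluation cadence (metrics every 100 steps) and the "No log" entries in the Training Loss column (loss reported only every 500 steps) are consistent with a `transformers` Trainer configuration along the following lines. This is a hypothetical reconstruction inferred from the log, not the author's actual training script; `output_dir` is a placeholder:

```python
from transformers import TrainingArguments

# Settings inferred from the log: batch size 16, 5000 optimization steps,
# eval every 100 steps, training loss logged every 500 steps.
args = TrainingArguments(
    output_dir="out",                 # placeholder path
    per_device_train_batch_size=16,
    gradient_accumulation_steps=1,
    max_steps=5000,
    evaluation_strategy="steps",
    eval_steps=100,
    logging_steps=500,                # explains the "No log" rows between logging steps
)
```

With `logging_steps=500`, the Trainer repeats the last logged training loss (or "No log" before the first one) on every intermediate evaluation row, which matches the table above.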