Why FP32?
#10, opened by imoc
At this model size, I doubt there would be any visible performance degradation even at FP8. The FP32 weights took too long to download and take up a lot of storage. Maybe switch "back" to BF16/FP16?
Sorry about that. The training was done in FP32 and I forgot to convert the weights to bf16. It seems too late to change now, given how large the files are!
Nice, I thought there might be some special reason.
imoc changed discussion status to closed
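For a rough sense of the download and storage difference discussed above, here is a minimal sketch that estimates weight size per format. The byte widths are the standard ones (4 for FP32, 2 for BF16/FP16, 1 for FP8). The parameter count and the helper name `checkpoint_size_gib` are hypothetical, since the thread does not state the model's size.

```python
# Bytes used to store one parameter in each format (standard widths).
BYTES_PER_PARAM = {"fp32": 4, "bf16": 2, "fp16": 2, "fp8": 1}


def checkpoint_size_gib(num_params: int, dtype: str) -> float:
    """Approximate weight size in GiB, ignoring metadata and non-weight files."""
    return num_params * BYTES_PER_PARAM[dtype] / 2**30


# Hypothetical parameter count, chosen only for illustration;
# the real model's size is not given in this thread.
num_params = 1_000_000_000

for dtype in BYTES_PER_PARAM:
    print(f"{dtype}: {checkpoint_size_gib(num_params, dtype):.2f} GiB")
```

Whatever the actual parameter count, the ratio is fixed: FP32 weights take twice the space of BF16/FP16 and four times the space of FP8.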