Why FP32?

by imoc - opened

At this size I doubt there is any visible performance degradation even at FP8. The FP32 weights take too long to download and too much space to store. Maybe switch "back" to BF16/FP16?
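
For a rough sense of the storage difference being discussed, here is a minimal sketch. The parameter count is a hypothetical placeholder, not this model's actual size, and the helper name `checkpoint_size_gb` is made up for illustration. It only counts raw weight bytes and ignores file metadata.

```python
# Bytes used to store one parameter in each numeric format.
BYTES_PER_PARAM = {"fp32": 4, "bf16": 2, "fp16": 2, "fp8": 1}


def checkpoint_size_gb(num_params: int, dtype: str) -> float:
    """Approximate raw weight size in GB (10^9 bytes) for a given format."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


# Hypothetical parameter count, chosen only for illustration.
example_params = 1_000_000_000

for dtype in BYTES_PER_PARAM:
    print(f"{dtype}: {checkpoint_size_gb(example_params, dtype):.1f} GB")
```

Under these assumptions, BF16/FP16 weights take half the space of FP32, and FP8 takes a quarter.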

Sorry about that. The training was done this way and I forgot to convert it to bf16. It seems too late now, given how large the size is! 😆

Nice, I thought there might be some special reason.

imoc changed discussion status to closed
