RE-ADD float32 please!!!
A lot of us prefer to use float32 versions of these kinds of models. The reason is that if a model is originally in float32, we can convert it at runtime (e.g. using bitsandbytes) into either float16 or bfloat16, depending on whether a user's GPU is CUDA compute capability 8.0 or above (as required for native bfloat16 support). Having it in float32 also lets users run it on a CPU natively, rather than having to convert a bfloat16 model to float32 at runtime (since vanilla CPU usage requires float32).
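For illustration, here's a minimal sketch of that runtime-downcast workflow, assuming a float32 checkpoint were available (the repo id below is just a placeholder, not a real one):

```python
# Minimal sketch, assuming PyTorch + transformers and a hypothetical float32
# checkpoint at MODEL_ID (placeholder repo id; substitute a real one).
import torch
from transformers import AutoModelForCausalLM

MODEL_ID = "ibm-granite/<float32-checkpoint>"  # placeholder

if torch.cuda.is_available():
    # Downcast the float32 weights to whichever half precision the GPU supports.
    dtype = torch.bfloat16 if torch.cuda.is_bf16_supported() else torch.float16
    device = "cuda"
else:
    # Plain CPU inference stays in float32; no conversion needed.
    dtype = torch.float32
    device = "cpu"

model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype=dtype).to(device)
```

Starting from a bfloat16-only checkpoint, the CPU path has to upcast instead, and the float16 path has to cross floating-point formats.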
Any time a model that was trained in float32 is distributed only in bfloat16 or float16, precision is lost that can't be recovered by upcasting back to float32... Likewise, if a model is converted from bfloat16 to float16 (or vice versa) there is accuracy loss, since the two precisions use different floating-point formats...
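A quick PyTorch sketch (just an illustration, not tied to any particular checkpoint) makes the rounding visible:

```python
# float16 and bfloat16 split their 16 bits differently (float16: 5 exponent /
# 10 mantissa bits; bfloat16: 8 exponent / 7 mantissa bits), so every conversion
# between them rounds values onto a different grid.
import torch

x = torch.rand(1000, dtype=torch.float32)

as_fp16 = x.to(torch.float16).to(torch.float32)
as_bf16 = x.to(torch.bfloat16).to(torch.float32)
fp16_to_bf16 = x.to(torch.float16).to(torch.bfloat16).to(torch.float32)

print("float32 -> float16  max abs error:", (x - as_fp16).abs().max().item())
print("float32 -> bfloat16 max abs error:", (x - as_bf16).abs().max().item())
print("float16 -> bfloat16 max abs error:", (as_fp16 - fp16_to_bf16).abs().max().item())
```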
In summary...there are still use cases for keeping the model in float32. What I'm asking for is:
(1) Keep float32 versions on the repository for this model and any other Granite models...this is IN ADDITION to float16, bfloat16, or whatever other precisions you want to have...
(2) This allows people to download specific versions that they want to use.
Hi @ctranslate2-4you, thanks for the feedback. We chose BF16 for all Granite models as the best tradeoff of size and precision: it can be further converted/quantized to other dtypes while not requiring significant amounts of bandwidth and disk space to download and run locally. While you are certainly correct that publishing only BF16 limits some use cases, the decision has been made to stick with it and avoid the complexity of publishing multiple sets of weight files for each model (and managing the synchronization between them).
For others finding this: if there is significant interest in full float32-precision checkpoints, we can certainly revisit, but for now there are no plans to post weights in both precisions.