type mismatch

#6
by michaelfeil - opened

HIT-TMG/KaLM-embedding-multilingual-mini-instruct-v1.5 specifies float32 as torch_dtype in its config.json. The Qwen base models are set to bfloat16.

Most frameworks will autocast float32 to float16. This will likely reduce quality, as the Qwen models were trained in bf16 and are sensitive to the precision change.
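
For reference, the mismatch can be checked directly from the configs; a minimal sketch, assuming the standard transformers AutoConfig API and Qwen/Qwen2-0.5B as the base checkpoint (substitute the actual base model if it differs):

```python
from transformers import AutoConfig

# Compare the torch_dtype declared in config.json of the embedding model
# with the one declared by the Qwen base checkpoint.
kalm_cfg = AutoConfig.from_pretrained("HIT-TMG/KaLM-embedding-multilingual-mini-instruct-v1.5")
qwen_cfg = AutoConfig.from_pretrained("Qwen/Qwen2-0.5B")

print(kalm_cfg.torch_dtype)  # torch.float32 (as reported above)
print(qwen_cfg.torch_dtype)  # torch.bfloat16
```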

HITsz-Text Machine Group org

HIT-TMG/KaLM-embedding-multilingual-mini-instruct-v1.5 specifies float32 as torch_dtype in its config.json. The Qwen base models are set to bfloat16.

Thank you for your meticulous observations and suggestions.
We have identified that the inconsistency in torch_dtype within config.json originates from the function used to export the sentence_transformer in the training framework.
In other words, although our model was trained using bfloat16, the parameters were inadvertently saved as float32. However, specifying torch_dtype when loading the model for inference should be a viable workaround.

We will promptly verify the potential impact of this torch_dtype on the model's performance to determine whether modifications to the repository are necessary.
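
For reference, a minimal sketch of the load-time override described above, assuming the standard transformers torch_dtype argument (not the authors' exact loading code):

```python
import torch
from transformers import AutoModel, AutoTokenizer

model_id = "HIT-TMG/KaLM-embedding-multilingual-mini-instruct-v1.5"

# Override the float32 recorded in config.json and load the weights
# directly in bfloat16, matching the training precision.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModel.from_pretrained(model_id, torch_dtype=torch.bfloat16)

print(next(model.parameters()).dtype)  # torch.bfloat16
```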

YanshekWoo pinned discussion

E.g., trt-llm will autocast according to the following rules:
bfloat16 -> bfloat16
float16 -> float16
float32 -> float16

A manual configuration is not always feasible, and is often ignored.

HITsz-Text Machine Group org

E.g., trt-llm will autocast according to the following rules:
bfloat16 -> bfloat16
float16 -> float16
float32 -> float16

A manual configuration is not always feasible, and is often ignored.

Thank you for sharing this.
For other frameworks, such as sentence-transformers, we will need to conduct comprehensive testing when we have the time.
At present, it seems that this difference in precision does not significantly affect performance.

HITsz-Text Machine Group org

Comparison: BF16 vs FP32 on CMTEB

We conducted rapid benchmarking of sentence-transformers (via transformers) on CMTEB with different precision settings. The model was loaded using:

```python
model = SentenceTransformer(model_name_or_path, model_kwargs={"torch_dtype": "bfloat16"})
```

Results
Minimal performance difference between BF16 and FP32:

| Data Type | Retrieval | STS | PairClassification | Classification | Reranking | Clustering | Avg |
|-----------|-----------|-----|--------------------|----------------|-----------|------------|-----|
| FP32 | 70.11 | 51.57 | 72.94 | 70.94 | 64.38 | 57.32 | 64.13 |
| BF16 | 70.14 | 51.58 | 72.94 | 70.90 | 64.30 | 57.31 | 64.12 |

Conclusion
However, the parameter precision does significantly impact computational efficiency.
We are considering whether to push an update (a new version or a separate repository) that stores the model parameters in bf16. This would help avoid the unnecessary extra inference cost that arises when users overlook the manual dtype setting.
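
If such an update is pushed, a minimal sketch of the re-export, assuming sentence-transformers' model_kwargs and save() APIs (the target repository name is hypothetical):

```python
import torch
from sentence_transformers import SentenceTransformer

# Load the current float32 checkpoint directly in bfloat16 ...
model = SentenceTransformer(
    "HIT-TMG/KaLM-embedding-multilingual-mini-instruct-v1.5",
    model_kwargs={"torch_dtype": torch.bfloat16},
)

# ... and save it back, so the stored weights (and, with recent
# transformers versions, the torch_dtype in config.json) reflect bf16.
model.save("KaLM-embedding-multilingual-mini-instruct-v1.5-bf16")

# Publishing would then be a push_to_hub() call; the repo id below is hypothetical.
# model.push_to_hub("HIT-TMG/KaLM-embedding-multilingual-mini-instruct-v1.5-bf16")
```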

Cool, thanks! I just wanted to clarify that most frameworks (trt-llm/vllm/sglang) will likely "autocast" float32 to float16 (and not bfloat16) for performance reasons.
