type mismatch
The config.json of "HIT-TMG/KaLM-embedding-multilingual-mini-instruct-v1.5" specifies float32 as torch_dtype. The Qwen models are set to bfloat16.
Most frameworks will autocast float32 to float16. This will likely reduce quality, as the Qwen models are sensitive to bf16.
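For reference, a minimal sketch (assuming the standard transformers API; the expected outputs simply reflect the mismatch described above) of how to observe the declared dtype:

```python
# Quick check of the declared dtype vs. what gets loaded by default. Sketch only.
from transformers import AutoConfig, AutoModel

model_id = "HIT-TMG/KaLM-embedding-multilingual-mini-instruct-v1.5"

config = AutoConfig.from_pretrained(model_id)
print(config.torch_dtype)  # torch.float32, as declared in config.json

# Without an explicit torch_dtype, the weights are loaded as stored (float32).
model = AutoModel.from_pretrained(model_id)
print(next(model.parameters()).dtype)  # torch.float32
```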
Thank you for your meticulous observations and suggestions.
We have identified that the inconsistency in torch_dtype within config.json originates from the function used to export the sentence_transformer in the training framework.
In other words, although our model was trained using bfloat16, the parameters were inadvertently saved as float32. However, specifying torch_dtype during inference loading should be a viable solution.
We will promptly verify the potential impact of the torch_dtype setting on the model's performance to determine whether modifications to the repository are necessary.
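For example, with plain transformers the workaround would look roughly like this (a sketch, not code from our training or evaluation pipeline):

```python
# Cast the float32 checkpoint to bfloat16 at load time, matching the training dtype.
import torch
from transformers import AutoModel

model_id = "HIT-TMG/KaLM-embedding-multilingual-mini-instruct-v1.5"

model = AutoModel.from_pretrained(model_id, torch_dtype=torch.bfloat16)
print(next(model.parameters()).dtype)  # torch.bfloat16
```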
E.g. trt-llm will autocast according to the following rules:
- bfloat16 -> bfloat16
- float16 -> float16
- float32 -> float16

Manual configuration is often not feasible, or is simply ignored.
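In other words, the policy boils down to a fixed mapping like the following (a toy sketch, not actual trt-llm code):

```python
import torch

# Declared checkpoint dtype -> dtype the engine actually runs in.
# Note that float32 is narrowed to float16, not bfloat16.
AUTOCAST_RULES = {
    torch.bfloat16: torch.bfloat16,
    torch.float16: torch.float16,
    torch.float32: torch.float16,
}

print(AUTOCAST_RULES[torch.float32])  # torch.float16 -> lossy for a bf16-trained model
```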
Thank you for sharing.
For other frameworks, such as sentence-transformers, we will need to conduct comprehensive testing when we have the time.
At present, it seems that the precision setting does not significantly affect performance.
Comparison: BF16 vs FP32 on CMTEB
We conducted a quick benchmark of sentence-transformers (via transformers) on CMTEB with different precision settings. The model was loaded using:
```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer(model_name_or_path, model_kwargs={"torch_dtype": "bfloat16"})
```
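Continuing from the snippet above, embeddings are then produced in the usual way (illustrative sentences only; the actual evaluation uses the CMTEB task data):

```python
# Sanity check that encoding works as usual under bf16 compute.
sentences = ["今天天气怎么样?", "What is the weather like today?"]
embeddings = model.encode(sentences, normalize_embeddings=True)
print(embeddings.shape)
```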
Results
Minimal performance difference between BF16 and FP32:
| Data Type | Retrieval | STS | PairClassification | Classification | Reranking | Clustering | Avg |
|---|---|---|---|---|---|---|---|
| FP32 | 70.11 | 51.57 | 72.94 | 70.94 | 64.38 | 57.32 | 64.13 |
| BF16 | 70.14 | 51.58 | 72.94 | 70.90 | 64.30 | 57.31 | 64.12 |
Conclusion
Precision has minimal impact on benchmark quality; however, it significantly affects computational efficiency.
We are considering whether to push an update (a new version or a separate repository) that stores the model parameters in bf16 precision. This would help avoid unnecessary additional inference costs when users overlook the manual dtype setting.
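A minimal sketch of what such an update could involve (assuming sentence-transformers; the output path is hypothetical):

```python
# Re-export the checkpoint with bfloat16 weights so that config.json and the stored
# parameters both reflect bf16, and downstream frameworks pick it up by default.
import torch
from sentence_transformers import SentenceTransformer

model_id = "HIT-TMG/KaLM-embedding-multilingual-mini-instruct-v1.5"
output_dir = "kalm-embedding-mini-instruct-v1.5-bf16"  # hypothetical local path

model = SentenceTransformer(model_id, model_kwargs={"torch_dtype": torch.bfloat16})
model.save(output_dir)
```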
Cool, thanks! I just wanted to clarify that most frameworks (trt-llm/vllm/sglang) will likely "autocast" float32 to float16 (and not bfloat16) for performance reasons.