Getting different results for the same examples provided in sample

#17
by sramakintel - opened

I tried the sentence-transformers code implementation with the exact same queries and docs inputs, but my results are very different. I am running this on CPU, so I removed .cuda() and also removed trust_remote_code=True during model download, because it expects CUDA paths.

tensor([[0.3364, 0.2758],
[0.2444, 0.2929]])

Hello!

trust_remote_code=True is still required, so that the model's custom code (https://huggingface.co/dunzhang/stella_en_1.5B_v5/blob/main/modeling_qwen.py) runs instead of the default code for models with the Qwen architecture.
You should get equivalent results when you re-enable that option.
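
For reference, a minimal sketch of re-enabling the option while staying on CPU. It assumes a sentence-transformers version whose constructor accepts trust_remote_code and device; loader_kwargs and load_model are hypothetical helper names used only for illustration, and whether the custom modeling code then actually runs on CPU is not confirmed here.

```python
# Repository id taken from the modeling_qwen.py link above.
MODEL_ID = "dunzhang/stella_en_1.5B_v5"


def loader_kwargs(device: str = "cpu") -> dict:
    """Build the keyword arguments for SentenceTransformer (hypothetical helper).

    trust_remote_code=True lets the repository's custom modeling code run
    instead of the default Qwen implementation.
    """
    return {"trust_remote_code": True, "device": device}


def load_model(device: str = "cpu"):
    """Load the model with remote code enabled (downloads the weights)."""
    # Imported lazily so the helper above can be used without the package.
    from sentence_transformers import SentenceTransformer

    return SentenceTransformer(MODEL_ID, **loader_kwargs(device))
```

Calling load_model() with no arguments targets the CPU; pass "cuda" to load onto a GPU instead.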

  • Tom Aarsen

Thanks for the response. So, as I understand it, the sample would need GPUs to get the desired result. Is that correct?

No, my apologies. The snippet from the model card without .cuda() should give the desired results on CPU.

Edit: I just realised that perhaps the custom modeling code does not work on CPU, due to the flash-attn requirements.
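
A quick way to check the environment before loading, as a rough sketch: it only reports whether the flash_attn package is importable and whether PyTorch sees a CUDA device. The function names are hypothetical, and the check says nothing about whether the custom modeling code has a fallback path without flash-attn.

```python
import importlib.util


def flash_attn_installed() -> bool:
    """Return True if the flash_attn package can be found in this environment."""
    return importlib.util.find_spec("flash_attn") is not None


def cuda_available() -> bool:
    """Return True if PyTorch is installed and reports a usable CUDA device."""
    try:
        import torch
    except ImportError:
        return False
    return bool(torch.cuda.is_available())


if __name__ == "__main__":
    print("flash_attn installed:", flash_attn_installed())
    print("CUDA available:", cuda_available())
```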

  • Tom Aarsen

Yes, flash-attn is not supported on CPUs, and it is a requirement even for the model card sample.
