Does this model only work on GPU?

#16 opened by xPurity

Hi,
I am fairly new to experimenting with embedding models in my RAG app. Until now I never had problems swapping out the model and re-indexing my documents, but now I am restricted to a CPU-only device. I can't get this model to work on it, so I just wanted to ask whether this is possible at all.
I'm using the sentence-transformers Python package and try to load the model without calling .cuda(), but I still run into an error because installing the flash-attn package fails.
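
In case a sketch helps: flash-attn only builds against CUDA, so on a CPU-only machine the usual workaround is to request a different attention implementation instead of installing it. A minimal sketch, assuming a recent sentence-transformers release that supports model_kwargs; the repo id is a placeholder, not the actual model:

```python
import torch
from sentence_transformers import SentenceTransformer

# "org/embedding-model" is a placeholder; substitute the real model id.
model = SentenceTransformer(
    "org/embedding-model",
    device="cpu",
    trust_remote_code=True,  # only needed if the model ships custom modeling code
    model_kwargs={
        "attn_implementation": "eager",  # avoid the flash-attn dependency
        "torch_dtype": torch.float32,    # fp16 is rarely faster on CPU
    },
)

embeddings = model.encode(["A quick CPU-only smoke test."])
print(embeddings.shape)
```

Whether this works depends on the model's code actually honoring attn_implementation; if it hard-requires flash-attn, the ONNX route below is the fallback.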

Hi,
I have tried the ONNX and quantized versions, which use the onnxruntime package and work on a CPU-only device. However, it is very slow because the tokenizer pads every batch to a length of 512 even when the inputs are shorter, so you'll have to change the padding strategy in the tokenizer config:
"strategy": "BatchLongest"
