How to quantize bge-m3 model
#1 by xiaomeng12 - opened
Hello, I want to quantize the bge-m3 model. Could you share some details about your method? Thank you.
Sorry, I was just experimenting with torch 2's dynamic quantization and tracing; I have no idea whether this actually works on bge-m3, but here is the procedure:
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("BAAI/bge-m3")
model = AutoModel.from_pretrained("BAAI/bge-m3")  # model from HF

# dynamically quantize the Linear layers to int8
quantized_model = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)

encoded_input = tokenizer("hello world", return_tensors="pt")
dummy_input = (encoded_input['input_ids'], encoded_input['attention_mask'])
traced_model = torch.jit.trace(quantized_model, dummy_input, strict=False)
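The quantize-then-trace mechanics themselves can be sanity-checked without downloading bge-m3, using a tiny stand-in module (the `Tiny` class below is hypothetical, just to exercise the same torch APIs):

```python
import torch

class Tiny(torch.nn.Module):
    """Hypothetical two-layer net standing in for a real transformer."""
    def __init__(self):
        super().__init__()
        self.fc1 = torch.nn.Linear(8, 16)
        self.fc2 = torch.nn.Linear(16, 4)

    def forward(self, x):
        return self.fc2(torch.relu(self.fc1(x)))

model = Tiny().eval()

# dynamic quantization swaps each nn.Linear for a quantized counterpart
qmodel = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)

# tracing the quantized model with a sample input works the same way
dummy = torch.randn(2, 8)
traced = torch.jit.trace(qmodel, dummy)
out = traced(dummy)
```

If `torch.jit.trace` succeeds here but fails on bge-m3, the problem is model-specific (e.g. data-dependent control flow), not the quantization step itself.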
If you want, here's one that I know works: it's based on multilingual-e5-small, optimized and quantized using optimum: https://huggingface.co/georgechang8/multilingual-e5-small-onnx-opt-q/tree/main
IIRC the same procedure does not work on bge-m3 unfortunately.
Thanks for your help.
xiaomeng12 changed discussion status to closed