How to quantize the bge-m3 model

#1
by xiaomeng12 - opened

Hello, I want to quantize the bge-m3 model, could you tell me some details about your method ? Thank you.

Sorry, I was just trying out torch 2's quantization and tracing. I have no idea whether this actually works, but here is the procedure:

import torch
from transformers import AutoModel, AutoTokenizer

model = AutoModel.from_pretrained("BAAI/bge-m3")  # model from HF
tokenizer = AutoTokenizer.from_pretrained("BAAI/bge-m3")
# dynamically quantize the Linear layers to int8
quantized_model = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)
# trace with a tokenized example input
encoded_input = tokenizer("A dummy sentence.", return_tensors="pt")
dummy_input = (encoded_input['input_ids'], encoded_input['attention_mask'])
traced_model = torch.jit.trace(quantized_model, dummy_input, strict=False)
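To see what `quantize_dynamic` does without downloading any model, here is a minimal, self-contained sketch on a plain `nn.Linear` stack (the shapes and tolerance are illustrative, not from the thread):

```python
import torch
import torch.nn as nn

# Tiny stand-in model; dynamic quantization swaps each nn.Linear
# for a dynamically int8-quantized equivalent at inference time.
model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 4))
model.eval()

quantized = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

x = torch.randn(1, 16)
with torch.no_grad():
    ref = model(x)       # fp32 output
    out = quantized(x)   # int8-weight output

# The first layer is now a DynamicQuantizedLinear, and the
# quantized output stays close to the fp32 reference.
print(quantized[0])
print(torch.allclose(ref, out, atol=1e-1))
```

The same call works on a Hugging Face model because its attention and feed-forward blocks are built from `nn.Linear` modules; only the weights are stored in int8, while activations are quantized on the fly.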

If you want, here's one that I know works, which is based on multilingual-e5-small, optimized and quantized using optimum: https://huggingface.co/georgechang8/multilingual-e5-small-onnx-opt-q/tree/main
IIRC the same procedure unfortunately does not work on bge-m3.

Thanks for your help.

xiaomeng12 changed discussion status to closed
