NF4 model inference

#8
by Melody32768 - opened

Hello. For the NF4 model, I see NF4 quantization listed in the weights section. Will the input also be quantized to NF4 when running inference on the model? And is FP4 the same?
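To make the question concrete, here is a minimal sketch of *weight-only* NF4 quantization. It assumes the scheme commonly used by bitsandbytes/QLoRA: weights are stored as 4-bit indices into a fixed 16-value table with one absmax scale per block, and are dequantized back to floating point before the matmul, while the input is never quantized. Whether this particular model's inference path works exactly this way is an assumption. The helper names (`quantize_nf4`, `dequantize_nf4`, `linear_nf4`) are hypothetical and are not the bitsandbytes API.

```python
# Sketch of weight-only NF4 quantization (hypothetical helpers, not the bitsandbytes API).
# Assumption: blockwise absmax scaling and the 16-value NF4 table from QLoRA/bitsandbytes.

NF4_TABLE = [
    -1.0, -0.6961928009986877, -0.5250730514526367, -0.39491748809814453,
    -0.28444138169288635, -0.18477343022823334, -0.09105003625154495, 0.0,
    0.07958029955625534, 0.16093020141124725, 0.24611230194568634, 0.33791524171829224,
    0.44070982933044434, 0.5626170039176941, 0.7229568362236023, 1.0,
]


def quantize_nf4(weights, block_size=64):
    """Return (4-bit codes, per-block absmax scales) for a flat list of weights."""
    codes, scales = [], []
    for start in range(0, len(weights), block_size):
        block = weights[start:start + block_size]
        absmax = max(abs(w) for w in block) or 1.0  # avoid dividing by zero on all-zero blocks
        scales.append(absmax)
        for w in block:
            x = w / absmax  # normalize into [-1, 1]
            # Store the index (0..15) of the nearest table entry; this is the 4-bit code.
            codes.append(min(range(16), key=lambda i: abs(NF4_TABLE[i] - x)))
    return codes, scales


def dequantize_nf4(codes, scales, block_size=64):
    """Map 4-bit codes back to floats using the table and the block's scale."""
    return [NF4_TABLE[c] * scales[i // block_size] for i, c in enumerate(codes)]


def linear_nf4(x, codes, scales, out_features, in_features, block_size=64):
    """Compute y = W x where only W is stored in NF4; the input x stays in floating point."""
    w = dequantize_nf4(codes, scales, block_size)  # weights are dequantized for the matmul
    return [
        sum(w[r * in_features + c] * x[c] for c in range(in_features))
        for r in range(out_features)
    ]


# Usage: a 2x2 weight matrix (row-major) applied to a float input.
codes, scales = quantize_nf4([1.0, -1.0, 0.0, 1.0], block_size=4)
y = linear_nf4([0.3, 0.2], codes, scales, out_features=2, in_features=2, block_size=4)
```

Under the same assumption, FP4 would plug a different 16-value table into the same weight-only scheme, so the input handling would not change; only the mapping from codes to weight values would.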
