NF4 model inference
#8, opened by Melody32768
Hello. For the NF4 model, I see that the weights are quantized to NF4. Will the input also be quantized to NF4 during inference? Is it the same for FP4?