using quants with pipeline
#1
by
supercharge19
- opened
Is it possible to use a quantized version of a model through Hugging Face's (transformers) pipeline? Can a model be loaded as int4 or even fp4 (instead of fp16, as with this model) through pipeline? If so, how will the model behave, and how much will accuracy/output quality degrade when quantization is used through pipeline?