using quants with pipeline
#1
by
supercharge19
- opened
Is it possible to use a quantized version of a model through Hugging Face's (transformers) pipeline? Can a model be loaded as int4 or even fp4 (instead of fp16, as with this model) through pipeline? If so, how will the model behave, and how much will accuracy/output quality degrade when quantization is used through pipeline?