Problem with local quantized deployment of the Baichuan large model

#34
by Jason123321123 - opened

```
raise ValueError(
ValueError:
Some modules are dispatched on the CPU or the disk. Make sure you have enough GPU RAM to fit
the quantized model. If you want to dispatch the model on the CPU or the disk while keeping
these modules in 32-bit, you need to set load_in_8bit_fp32_cpu_offload=True and pass a custom
device_map to from_pretrained. Check
https://huggingface.co/docs/transformers/main/en/main_classes/quantization#offload-between-cpu-and-gpu
for more details.
```

Running the model after deployment raises this error. The core of the problem is that the model tried to load onto the GPU, but because GPU memory was insufficient, some modules were dispatched to the CPU or disk. This typically happens when loading a very large model that cannot fully fit into the GPU's RAM.
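Following the advice in the error message, one workaround is to pass a custom `device_map` that explicitly places the overflow modules on the CPU, and enable fp32 CPU offload for the 8-bit quantized load. A minimal sketch is below; the checkpoint name (`baichuan-inc/Baichuan-13B-Chat`) and the particular module-to-device split are assumptions you would adjust to your own model and VRAM. Note that in recent `transformers` versions the flag from the error message is expressed through `BitsAndBytesConfig` as `llm_int8_enable_fp32_cpu_offload`:

```python
# Sketch of 8-bit loading with fp32 CPU offload, per the error message's advice.
# The checkpoint name and layer split below are illustrative assumptions.

# A custom device_map pins the heavy transformer blocks on GPU 0 and
# offloads the remaining modules (kept in 32-bit) to the CPU.
device_map = {
    "model.embed_tokens": 0,   # GPU 0
    "model.layers": 0,         # GPU 0
    "model.norm": "cpu",       # offloaded, stays in fp32
    "lm_head": "cpu",          # offloaded, stays in fp32
}

def load_with_cpu_offload(model_name: str = "baichuan-inc/Baichuan-13B-Chat"):
    """Load the model quantized to 8-bit on GPU, offloading the rest to CPU in fp32."""
    from transformers import AutoModelForCausalLM, BitsAndBytesConfig

    quant_config = BitsAndBytesConfig(
        load_in_8bit=True,
        llm_int8_enable_fp32_cpu_offload=True,  # keep offloaded modules in 32-bit
    )
    return AutoModelForCausalLM.from_pretrained(
        model_name,
        quantization_config=quant_config,
        device_map=device_map,
        trust_remote_code=True,  # Baichuan checkpoints ship custom modeling code
    )
```

If even the quantized layers do not fit on the GPU, the alternative is to move more entries of `device_map` to `"cpu"` (at a latency cost) or to use a smaller checkpoint.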
