How much CUDA memory is needed to load GLM?

#11 opened by FearandDreams

As the title says: I'm using an A5000 with 24 GB of VRAM, confirmed to be otherwise unoccupied, and I hit CUDA out of memory at "model = AutoModelForCausalLM.from_pretrained(...)".

Reference: https://github.com/THUDM/GLM-4/blob/main/basic_demo/README.md#glm-4v-9b
"the model requires at least 28GB of VRAM when using bf16."

Btw, the model name "glm-4v-9b" may be misleading: it suggests the model has only 9B parameters, but it actually has 13B (see https://github.com/THUDM/CogVLM2?tab=readme-ov-file#recent-updates).
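A rough sanity check on the 28 GB figure: 13B parameters × 2 bytes per parameter in bf16 is already about 26 GB for the weights alone, before activations and the KV cache.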

Knowledge Engineering Group (KEG) & Data Mining at Tsinghua University org

It needs 28 GB of VRAM. You can load it in 4-bit instead.

How do I load it in 4-bit? What do I need to change?

Knowledge Engineering Group (KEG) & Data Mining at Tsinghua University org

from transformers import AutoModelForCausalLM, BitsAndBytesConfig

model = AutoModelForCausalLM.from_pretrained(
    MODEL_PATH,
    trust_remote_code=True,
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),
    low_cpu_mem_usage=True,
).eval()
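As a rough estimate: at 4 bits per weight, 13B parameters come to roughly 6.5 GB of weights, which fits comfortably on a 24 GB card.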

Thanks! But after that I ran into an error:
return F.conv2d(input, weight, bias, self.stride,
RuntimeError: Input type (c10::BFloat16) and bias type (c10::Half) should be the same
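A minimal sketch of one possible fix, assuming the mismatch comes from bitsandbytes defaulting its 4-bit compute dtype to float16 while the rest of the model runs in bfloat16; setting bnb_4bit_compute_dtype (and torch_dtype) to bfloat16 keeps the dtypes consistent:

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Sketch: force the 4-bit compute dtype to bf16 so the quantized layers
# match the model's bf16 activations (the default compute dtype is fp16).
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_PATH,
    trust_remote_code=True,
    quantization_config=quant_config,
    torch_dtype=torch.bfloat16,
    low_cpu_mem_usage=True,
).eval()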

For a single machine with multiple GPUs, how should the code be changed? Our machine has two cards with 22 GiB each, but we also hit an Out of Memory error when running, and it turns out only one card is being used.

Knowledge Engineering Group (KEG) & Data Mining at Tsinghua University org

The latest code already supports this: with `auto`, the model is distributed across GPUs automatically.
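For reference, a minimal sketch of multi-GPU loading, assuming the updated demo relies on Transformers' device_map="auto" (which requires the accelerate package) to shard the weights across both cards:

from transformers import AutoModelForCausalLM

# Sketch: device_map="auto" lets accelerate place layers across all
# visible GPUs, so two 22 GiB cards can together hold the bf16 weights.
model = AutoModelForCausalLM.from_pretrained(
    MODEL_PATH,
    trust_remote_code=True,
    device_map="auto",
    low_cpu_mem_usage=True,
).eval()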

zRzRzRzRzRzRzR changed discussion status to closed
