How much CUDA memory is needed to load GLM?

#11 opened by FearandDreams

As the title says: I'm using an A5000 with 24 GB of VRAM, confirmed to be otherwise unoccupied, and I hit CUDA out of memory at "model = AutoModelForCausalLM.from_pretrained(...)".

Reference: https://github.com/THUDM/GLM-4/blob/main/basic_demo/README.md#glm-4v-9b
"the model requires at least 28GB of VRAM when using bf16."

Btw, the model name "glm-4v-9b" may be misleading: it suggests the model has only 9B parameters, but it actually has 13B (see https://github.com/THUDM/CogVLM2?tab=readme-ov-file#recent-updates).
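A rough sanity check on the 28 GB figure: 13B parameters × 2 bytes per parameter in bf16 is already about 26 GB for the weights alone, before activations and the KV cache.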

Knowledge Engineering Group (KEG) & Data Mining at Tsinghua University org

It needs 28 GB of VRAM. You can load it in 4-bit instead.

How do I load it in 4-bit? What do I need to change?

Knowledge Engineering Group (KEG) & Data Mining at Tsinghua University org

from transformers import AutoModelForCausalLM, BitsAndBytesConfig

model = AutoModelForCausalLM.from_pretrained(
    MODEL_PATH,
    trust_remote_code=True,
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),
    low_cpu_mem_usage=True,
).eval()
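As a rough estimate: at 4 bits per weight, 13B parameters come to roughly 6.5 GB of weights, which fits comfortably on a 24 GB card.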

Thanks! But after that I ran into an error:
return F.conv2d(input, weight, bias, self.stride,
RuntimeError: Input type (c10::BFloat16) and bias type (c10::Half) should be the same
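A minimal sketch of one possible fix, assuming the mismatch comes from bitsandbytes defaulting its 4-bit compute dtype to float16 while the rest of the model runs in bfloat16; setting bnb_4bit_compute_dtype (and torch_dtype) to bfloat16 keeps the dtypes consistent:

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Sketch: force the 4-bit compute dtype to bf16 so the quantized layers
# match the model's bf16 activations (the default compute dtype is fp16).
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_PATH,
    trust_remote_code=True,
    quantization_config=quant_config,
    torch_dtype=torch.bfloat16,
    low_cpu_mem_usage=True,
).eval()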

For a single machine with multiple GPUs, how should the code be changed? Our machine has two cards with 22 GiB each, but we also hit an Out of Memory error when running, and it turns out only one card is being used.

Knowledge Engineering Group (KEG) & Data Mining at Tsinghua University org

The latest code already supports this: with `auto`, the model is distributed across GPUs automatically.
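For reference, a minimal sketch of multi-GPU loading, assuming the updated demo relies on Transformers' device_map="auto" (which requires the accelerate package) to shard the weights across both cards:

from transformers import AutoModelForCausalLM

# Sketch: device_map="auto" lets accelerate place layers across all
# visible GPUs, so two 22 GiB cards can together hold the bf16 weights.
model = AutoModelForCausalLM.from_pretrained(
    MODEL_PATH,
    trust_remote_code=True,
    device_map="auto",
    low_cpu_mem_usage=True,
).eval()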

zRzRzRzRzRzRzR changed discussion status to closed
