CUDA out of memory when calling forward directly on glm-4v-9b
@zRzRzRzRzRzRzR Hi, when I use the demo's `outputs = self.basemodel.generate(**inputs, **gen_kwargs)` everything works fine, but when I call `output = self.base_model.forward(input_ids=tokens, images=image_tensor, return_dict=True)` myself I get CUDA out of memory. I want to extract the logits from the output; is there another way to do this? The GPU has 48 GB of memory.
Do you have complete test code? I'd like to reproduce this.
@zRzRzRzRzRzRzR Hi, the test code is below (the key part is the forward call):
import torch
from PIL import Image
from transformers import AutoModelForCausalLM, AutoTokenizer, TextIteratorStreamer, BitsAndBytesConfig
from huggingface_hub import login
login(token="your token")
import os
os.environ["TRANSFORMERS_CACHE"] = "enter if needed"
device = "cuda:3" if torch.cuda.is_available() else "cpu"
tokenizer = AutoTokenizer.from_pretrained("THUDM/glm-4v-9b", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    "THUDM/glm-4v-9b",
    torch_dtype=torch.bfloat16,
    low_cpu_mem_usage=True,
    trust_remote_code=True
).to(device)
image = Image.open("test2.jpg").convert('RGB')
image_tensor = tokenizer.apply_chat_template(
    [{"role": "user", "image": image}],
    add_generation_prompt=True, tokenize=True, return_tensors="pt",
    return_dict=True)["images"]
prompt = "hello"
prompt_tokens = tokenizer.encode(prompt)
prompt_tokens = torch.tensor([prompt_tokens], device=device)
# clone with requires_grad, intended for taking gradients w.r.t. the image
image_tensor_to_update = image_tensor.clone().detach().requires_grad_(True)
with torch.cuda.amp.autocast():
    tokens = prompt_tokens.to(device)
    image_tensor = image_tensor.to(device)
    output = model.forward(
        input_ids=tokens,
        images=image_tensor,
        return_dict=True
    )
    logits = output.logits
@zRzRzRzRzRzRzR Hi, I still haven't resolved the OOM, but if I wrap the forward call in no_grad the OOM goes away, so the problem seems to be with the gradients.
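That observation is consistent with how autograd works: inside `torch.no_grad()` no graph is recorded, so intermediate activations are not kept around for a backward pass. A minimal sketch of the effect, using a tiny stand-in module rather than glm-4v-9b itself:

```python
import torch

# Tiny stand-in model; the point is the autograd behavior, not the architecture.
model = torch.nn.Linear(8, 8)
x = torch.randn(2, 8)

with torch.no_grad():      # or torch.inference_mode()
    out = model(x)         # no autograd graph is recorded for this call

print(out.requires_grad)   # False: nothing is retained for backward
```

If all you need is the logits (no backprop at all), this is the cheapest fix; `torch.inference_mode()` is a slightly stricter variant of the same idea.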
@zRzRzRzRzRzRzR I also tried this on an A100 and it still OOMs.
I've already gotten fine-tuning of this model working on my end; for 4v-9b with full parameters and activations, 80 GB is definitely not enough.
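A rough back-of-envelope supports this. Assuming bf16 weights and gradients plus fp32 Adam moment states (these byte counts are standard assumptions, not measured numbers for this model):

```python
# Back-of-envelope memory estimate for full-parameter Adam training
# of a 9B-parameter model, before counting any activations.
params = 9e9
weights_gb = params * 2 / 1e9      # bf16 weights: 2 bytes/param -> 18 GB
grads_gb   = params * 2 / 1e9      # bf16 gradients             -> 18 GB
optim_gb   = params * 4 * 2 / 1e9  # fp32 Adam m and v states   -> 72 GB
total_gb = weights_gb + grads_gb + optim_gb
print(total_gb)                    # ~108 GB, already over 80 GB
```

Activation memory comes on top of this, so a single 80 GB card cannot hold full-parameter training state.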
@zRzRzRzRzRzRzR Thanks for the reply! But I don't need gradients for all the parameters. If I only need gradients with respect to the input, how should I change the code?
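One common pattern for input-only gradients is to freeze every parameter (so autograd allocates no parameter gradients) and make the input tensor the only leaf that requires grad. A minimal sketch with a tiny stand-in module (`net` is hypothetical, not the real model):

```python
import torch

# Stand-in for the model; freeze all parameters so no parameter
# gradients are allocated during backward.
net = torch.nn.Linear(8, 4)
for p in net.parameters():
    p.requires_grad_(False)

# The input is the only leaf tensor that requires grad.
x = torch.randn(2, 8, requires_grad=True)
out = net(x)
out.sum().backward()       # backward through a scalar reduction of the logits

print(x.grad is not None)                             # True
print(all(p.grad is None for p in net.parameters()))  # True
```

For glm-4v-9b the analogue would be freezing `model.parameters()`, passing the `requires_grad_(True)` image tensor as `images=`, and calling `backward()` on a scalar derived from `output.logits`. Note that activations along the path from the input to the logits still have to be stored for backward, so this mainly saves the parameter-gradient memory; if activation memory remains the bottleneck, gradient checkpointing (e.g. `model.gradient_checkpointing_enable()`, if this trust_remote_code model supports it) may also be needed.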