Memory leak. #2
opened by Yurkoff
That's a code problem, nothing to do with the model itself, and no one can help with it without seeing the code being run.
My code:
import torch
from transformers import LlamaTokenizerFast, LlamaForCausalLM

model_dir = "..."   # placeholder: path to the local model directory
prompts = ["..."]   # placeholder: list of input prompts

tokenizer = LlamaTokenizerFast.from_pretrained(model_dir)
model = LlamaForCausalLM.from_pretrained(
    model_dir,
    load_in_8bit=True,
    device_map='sequential',
    torch_dtype=torch.float16,
    low_cpu_mem_usage=True,
)

inputs = tokenizer(prompts)
output_ids = model.generate(
    torch.as_tensor(inputs.input_ids).to(model.device),
    do_sample=True,
    temperature=0.8,
    max_new_tokens=512,
    top_p=0.95,
    # synced_gpus=True,
)
results = tokenizer.batch_decode(
    output_ids,
    skip_special_tokens=True,
    clean_up_tokenization_spaces=False,
)[0]
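As an aside, a minimal sketch for confirming a leak like this, assuming a single CUDA device; the loop is a hypothetical wrapper around the generate() call above, not part of the original report:

import torch

def report_gpu_memory(tag):
    # Allocated vs. reserved bytes on the current CUDA device, in MiB.
    alloc = torch.cuda.memory_allocated() / 2**20
    reserved = torch.cuda.memory_reserved() / 2**20
    print(f"{tag}: allocated={alloc:.1f} MiB, reserved={reserved:.1f} MiB")

# If 'allocated' keeps climbing across iterations even after deleting
# the outputs, something is holding references to old tensors.
for step in range(5):
    output_ids = model.generate(
        torch.as_tensor(inputs.input_ids).to(model.device),
        max_new_tokens=512,
    )
    del output_ids
    report_gpu_memory(f"after step {step}")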
Versions of my packages:
torch==2.0.1+cu118; sys_platform == 'linux'
torchvision==0.15.2+cu118; sys_platform == 'linux'
torchtext==0.15.2; sys_platform == 'linux'
torchaudio==2.0.2+cu118; sys_platform == 'linux'
psutil==5.9.5
requests==2.31.0
captum==0.6.0
packaging==23.1
pynvml==11.4.1
pyyaml==6.0
nvgpu
cython==0.29.34
wheel==0.40.0
pillow==9.3.0
numpy==1.24.3
torchserve==0.7.1
torch-model-archiver==0.7.1
transformers==4.31.0
tokenizers==0.13.3
sentencepiece==0.1.99
bitsandbytes==0.41.1
accelerate==0.21.0
scipy==1.10.1
I solved the problem. After each inference I call:

import gc

gc.collect()
torch.cuda.empty_cache()
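For context, a sketch of how that cleanup slots into the generation code above; generate_once is a hypothetical wrapper, not from the original report:

import gc
import torch

def generate_once(model, tokenizer, prompt):
    # One inference followed by the cleanup described above.
    inputs = tokenizer([prompt])
    output_ids = model.generate(
        torch.as_tensor(inputs.input_ids).to(model.device),
        do_sample=True,
        temperature=0.8,
        max_new_tokens=512,
        top_p=0.95,
    )
    text = tokenizer.batch_decode(
        output_ids,
        skip_special_tokens=True,
        clean_up_tokenization_spaces=False,
    )[0]
    # Drop the last Python references to the GPU tensors, collect, then
    # release the allocator's unused cached blocks back to the driver.
    del output_ids
    gc.collect()
    torch.cuda.empty_cache()
    return text

Note that torch.cuda.empty_cache() only returns cached, unused blocks to the driver; the gc.collect() beforehand is what frees lingering Python-side references so those blocks become unused in the first place.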
Yurkoff changed discussion status to closed.