Colab crashes with out-of-memory while loading Llama-3-8B
Hello Brilliant Minds,
I am trying to load Llama-3-8B in Google Colab using transformers with accelerate, but the session crashes every time with an error that the RAM is fully utilized. I used the T4 GPU as the hardware accelerator, but with no success. Can anyone please suggest something? I am able to run the model locally through Ollama.
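For reference, the load is essentially the default from_pretrained path, roughly like this (my exact cell may differ slightly):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# this is the load that exhausts Colab's RAM
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3-8B", torch_dtype=torch.float16, device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B")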
Regards,
Fellow Learner
Can you share what your resources look like, along with any errors the system spits out?
@sayanroy07
Try quantization. It worked for me when I used the bitsandbytes library on a T4. In float16, an 8B model needs roughly 16 GB just for the weights, which already exceeds the T4's ~15 GB of VRAM; 4-bit quantization brings that down to around 5-6 GB.
Hi, this has been resolved now. I just had to restart the kernel and the Colab session, and it works now. Thanks all!
Happens to me too, and restarting the kernel doesn't work. I tried another model, mistralai/Mistral-7B-Instruct-v0.3, which also crashes Google Colab. Both models work locally using Ollama.
@ceefour here's an example of how I made it work on Colab:

!pip install -q -U transformers accelerate bitsandbytes

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig

# 4-bit NF4 quantization cuts the 8B weights to roughly 5-6 GB;
# use float16 for compute, since the T4 has no native bfloat16 support
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True, bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4", bnb_4bit_compute_dtype=torch.float16,
)

# device_map="auto" lets accelerate place the layers on the GPU
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3-8B", quantization_config=bnb_config, device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B")
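A quick sanity check after loading (the prompt is just an example):

# generate a short continuation to confirm the model fits and runs
prompt = "The three primary colors are"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))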