Colab crashes with an out-of-memory error while loading Llama-3-8B

#50
by sayanroy07 - opened

Hello Brilliant Minds,

I am trying to load Llama-3-8B in Google Colab using transformers with accelerate, but the session crashes every time with an error that the RAM is fully utilized. I have tried the T4 GPU as the hardware accelerator, but with no success. Can anyone please suggest something? I am able to run the model locally through Ollama.
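For context, an 8B model in fp16 needs roughly 8B params × 2 bytes ≈ 16 GB for the weights alone, which exceeds both the T4's ~15 GB of VRAM and Colab's ~12 GB of system RAM. Below is a minimal sketch of the kind of plain load that hits this wall (an illustrative reconstruction, not necessarily the exact notebook code):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# fp16 weights alone are ~16 GB for an 8B model: too much for a single T4,
# and anything accelerate offloads to the CPU will exhaust Colab's system RAM
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3-8B",
    torch_dtype=torch.float16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B")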

Regards,
Fellow Learner

Can you share what your resources look like, along with any errors the system spits out?

@sayanroy07 try quantization.
It worked for me when I used the bitsandbytes library on a T4.

Hi, this has been resolved now. I just had to restart the kernel and the Colab session, and it works now. Thanks, all.

sayanroy07 changed discussion status to closed

This happens to me too, and restarting the kernel doesn't help. I tried another model, mistralai/Mistral-7B-Instruct-v0.3, which also crashes Google Colab. Both models work locally using Ollama.

@ceefour here's an example of how I made it work on Colab:

!pip install -q -U transformers accelerate bitsandbytes

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig

# Quantize the weights to 4-bit so the 8B model fits in the T4's ~15 GB of VRAM
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    # The T4 has no native bfloat16 support, so compute in float16 instead
    bnb_4bit_compute_dtype=torch.float16,
)

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3-8B",
    quantization_config=bnb_config,
    device_map="auto",  # let accelerate place the quantized weights on the GPU
)
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B")
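To sanity-check the quantized model, you can run a short generation; the prompt and generation settings below are just placeholders:

inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

The same config should also fit a 7B model like mistralai/Mistral-7B-Instruct-v0.3 on a T4; only the model id changes.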

Thank you @not-lain for your tip! I will try it.
