
Memory crash in Google Colab free tier

#5
by sudhir2016 - opened

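Tried this code:

```python
import hf_olmo  # from the ai2-olmo package; needed so transformers can load this checkpoint
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# 4-bit loading needs the bitsandbytes package and a CUDA GPU
olmo = AutoModelForCausalLM.from_pretrained(
    "allenai/OLMo-7B",
    torch_dtype=torch.float16,
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("allenai/OLMo-7B")
message = ["Language modeling is "]
inputs = tokenizer(message, return_tensors="pt", return_token_type_ids=False)

# optional verifying cuda
inputs = {k: v.to("cuda") for k, v in inputs.items()}
# device_map="auto" already places the model, so no olmo.to("cuda") here;
# some transformers/bitsandbytes versions reject .to() on 4-bit models
response = olmo.generate(**inputs, max_new_tokens=100, do_sample=True, top_k=50, top_p=0.95)
print(tokenizer.batch_decode(response, skip_special_tokens=True)[0])
```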

Got a memory crash on the free-tier Colab due to low RAM.

Yeah, 7B will be hard to fit in 12-16 GB of memory. Maybe try the 1B model? https://huggingface.co/allenai/OLMo-1B
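A quick back-of-the-envelope check shows why: the weights alone take roughly (number of parameters) × (bytes per parameter). The sketch below assumes nominal counts of 7e9 and 1e9 parameters (the real OLMo counts differ slightly) and counts weights only, so activations, the KV cache during generation, and any temporary overhead while loading the checkpoint come on top of these numbers.

```python
# Weight-only memory estimate: parameters x bytes per parameter.
# Assumption: nominal parameter counts (7e9, 1e9); actual OLMo counts differ slightly.
BYTES_PER_PARAM = {
    "float32": 4.0,
    "float16": 2.0,
    "int8": 1.0,
    "4bit": 0.5,
}


def weight_gib(num_params: float, dtype: str) -> float:
    """Size of the weights alone in GiB (excludes activations, KV cache, loading overhead)."""
    return num_params * BYTES_PER_PARAM[dtype] / 2**30


if __name__ == "__main__":
    for name, n in [("7B", 7e9), ("1B", 1e9)]:
        row = ", ".join(f"{d}: {weight_gib(n, d):.1f} GiB" for d in BYTES_PER_PARAM)
        print(f"{name} -> {row}")
```

For 7B in float16 this comes out at about 13 GiB, already close to the whole 12-16 GB budget before anything else is counted, while the 1B model stays under 2 GiB even in float16.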

natolambert changed discussion status to closed
