Run model in Colab using 8-bit

#8 opened by kabalanresearch

I'm trying to run the model using the 8-bit library:

import torch
from transformers import T5ForConditionalGeneration

model = T5ForConditionalGeneration.from_pretrained("google/flan-t5-xxl", device_map="auto", torch_dtype=torch.bfloat16, load_in_8bit=True)

The model loads and returns output, but the output is some kind of gibberish.
Has anyone had success with the 8-bit library?
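For reference, here is a minimal sketch of how I'm checking the output (the prompt and generation settings are just examples; bitsandbytes and accelerate need to be installed):

```python
import torch
from transformers import AutoTokenizer, T5ForConditionalGeneration

tokenizer = AutoTokenizer.from_pretrained("google/flan-t5-xxl")
model = T5ForConditionalGeneration.from_pretrained(
    "google/flan-t5-xxl",
    device_map="auto",
    torch_dtype=torch.bfloat16,
    load_in_8bit=True,
)

# Simple prompt to sanity-check the output; with the xxl checkpoint in 8-bit
# this is where I see gibberish instead of a sensible answer.
inputs = tokenizer("Translate English to German: How old are you?", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```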

Google org

This is expected, as float16 does not work on this model either. We are investigating this!

Also, note that this happens only for the xxl model; for other models, the int8 quantization works as expected.

I tested the xl one using float16 and int8, and it does not work as expected (gibberish). However, it works like a charm in fp32.
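For comparison, the fp32 load is just the plain from_pretrained call; this is only a sketch, and note that the xl weights need roughly 12 GB of memory at that precision:

```python
import torch
from transformers import T5ForConditionalGeneration

# fp32 is the default dtype, so torch_dtype can also be omitted here.
model = T5ForConditionalGeneration.from_pretrained(
    "google/flan-t5-xl",
    device_map="auto",
    torch_dtype=torch.float32,
)
```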

@mrm8488 can you please post your model config?

It is the config you can find in the repo: https://huggingface.co/google/flan-t5-xl/blob/main/config.json

Has anyone here been able to run Flan-T5-XL on Colab? I tried 8-bit and got junk results.

Can you try with the recent release of transformers (pip install -U transformers) and use 4-bit instead (just pass load_in_4bit=True)?
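A minimal sketch of that suggestion, assuming transformers >= 4.30 and a recent bitsandbytes (the generation check is the same as above):

```python
from transformers import T5ForConditionalGeneration

# load_in_4bit requires transformers >= 4.30 and a recent bitsandbytes release.
model = T5ForConditionalGeneration.from_pretrained(
    "google/flan-t5-xxl",
    device_map="auto",
    load_in_4bit=True,
)
```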
