Why so large?

#12
by takeraparterer - opened

This isn't even a 3B model, so why is it 10GB???

Google org
edited Aug 1

Hi there! This is a 2.6B-parameter model, so storing the weights in bfloat16 precision (as it currently is) works out to 2.6B parameters × 2 bytes per parameter ≈ 5.2 GB, as seen in the Files tab.
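The back-of-the-envelope arithmetic above can be checked directly (2 bytes per parameter for bfloat16, 4 bytes for float32, PyTorch's default):

```python
# Weight-memory estimate using the figures from the discussion.
params = 2.6e9               # 2.6B parameters
bf16_bytes = params * 2      # bfloat16: 2 bytes per parameter
fp32_bytes = params * 4      # float32 (PyTorch default): 4 bytes per parameter

print(f"bfloat16: {bf16_bytes / 1e9:.1f} GB")  # → bfloat16: 5.2 GB
print(f"float32:  {fp32_bytes / 1e9:.1f} GB")  # → float32:  10.4 GB
```

This is why the files on the Hub total ~5.2 GB, while a float32 load needs roughly twice that.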


When loading the weights, be sure to set torch_dtype=torch.bfloat16; otherwise they will be loaded as float32 (4 bytes per parameter) and the model will take ~10.4 GB of RAM/VRAM.
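A minimal sketch of the dtype difference, using a small tensor as a stand-in for model weights (the tensor shape here is illustrative, not from the thread):

```python
import torch

# A small stand-in for a weight matrix; real model weights behave the same way.
w_fp32 = torch.zeros(1000, 1000, dtype=torch.float32)
w_bf16 = w_fp32.to(torch.bfloat16)

print(w_fp32.element_size())  # → 4 (bytes per parameter in float32)
print(w_bf16.element_size())  # → 2 (bytes per parameter in bfloat16)
```

The same halving applies model-wide, which is the gap between ~10.4 GB and ~5.2 GB for a 2.6B-parameter model.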

takeraparterer changed discussion status to closed
