Why so large? #12
by takeraparterer - opened
This isn't even a 3B model, why is it 10GB???
Hi there! This is a 2.6B parameter model, so storing the weights in bfloat16 precision (as it currently is) takes 2.6B parameters × 2 bytes per parameter ≈ 5.2 GB, as seen in the Files tab.
When loading the weights, be sure to set `torch_dtype=torch.bfloat16`; otherwise PyTorch defaults to float32 (4 bytes per parameter) and the model will take ~10.4 GB of RAM/VRAM.
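A quick back-of-envelope sketch of the arithmetic above, assuming a 2.6B parameter count and counting only the raw weight storage (optimizer state, activations, and framework overhead are ignored). In practice you would pass `torch_dtype=torch.bfloat16` to `AutoModelForCausalLM.from_pretrained` to get the smaller footprint.

```python
# Rough memory footprint of model weights by dtype.
# NUM_PARAMS is taken from the reply above (2.6B parameters).

NUM_PARAMS = 2.6e9

BYTES_PER_PARAM = {
    "float32": 4,   # PyTorch default when torch_dtype is not set
    "bfloat16": 2,  # precision the checkpoint is stored in
}

def weight_size_gb(dtype: str, num_params: float = NUM_PARAMS) -> float:
    """Approximate size of the weights in gigabytes for a given dtype."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9

for dtype in BYTES_PER_PARAM:
    print(f"{dtype}: ~{weight_size_gb(dtype):.1f} GB")
# float32: ~10.4 GB, bfloat16: ~5.2 GB
```

This matches the numbers in the thread: the ~10 GB figure is the float32 footprint, and loading in bfloat16 halves it to ~5.2 GB.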
thanks
takeraparterer changed discussion status to closed