HF model (16bit?)

#6
by bdambrosio - opened

I know, everyone wants everything. But I'm running Llama-2-70B from the HF fp16 weights (using Dettmers' bitsandbytes in 4-bit) and it works wonderfully. I'd love to try this one the same way.
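For reference, this is roughly the loading pattern I mean: a minimal sketch of on-the-fly 4-bit quantization with bitsandbytes via transformers. The exact quantization settings here (NF4, double quantization, fp16 compute) are illustrative choices, not ones anyone stated in this thread:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# Illustrative 4-bit settings; this thread doesn't specify which were used.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",           # NF4 is the common default
    bnb_4bit_use_double_quant=True,      # also quantize the quantization constants
    bnb_4bit_compute_dtype=torch.float16,
)

model_id = "meta-llama/Llama-2-70b-hf"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",   # spread layers across available GPU(s) and CPU RAM
)
```

The source checkpoint's dtype shouldn't matter much here: whether it's stored as fp16 or fp32, the weights are quantized to 4-bit as they're loaded.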

Good to hear. I didn't release an fp16 because I already linked to the original Stability AI model:

[screenshot: README section linking to the original Stability AI model]

But now that I look at it again, I realise it's actually in fp32. Is that what you meant? You'd like an fp16 to save the disk space of downloading their fp32? Because I believe you can load an fp32 with bitsandbytes just like an fp16.

For now, I've updated my README to reflect the fact that it's actually fp32, not fp16.

OK, thanks. I'm downloading the fp32 now. It's huge; I don't understand enough to know whether it will be a problem to load (I only have 128GB RAM, 48GB VRAM), but we'll see.

bdambrosio changed discussion status to closed

I think it should load fine, as it'll still be in 4-bit after the conversion. The only downside is having to store twice as much data on disk as with an fp16, and waiting longer for it to download.
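To put rough numbers on that (assuming ~70B parameters, which is a ballpark assumption, not a figure from this thread), disk size is roughly parameter count times bytes per parameter:

```python
# Back-of-the-envelope checkpoint sizes, assuming ~70e9 parameters
# (an assumption, not a figure stated in this thread).
params = 70e9

for name, bytes_per_param in [("fp32", 4), ("fp16", 2), ("4-bit", 0.5)]:
    gb = params * bytes_per_param / 1e9
    print(f"{name}: ~{gb:.0f} GB")

# fp32:  ~280 GB  (what's being downloaded)
# fp16:  ~140 GB  (half the disk space)
# 4-bit: ~35 GB   (the in-memory footprint after quantization, either way)
```

So the 48GB VRAM + 128GB RAM setup should be fine for the 4-bit result; the fp32 source only costs disk space and download time.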

I will see about making an fp16. I'll try PRing it to them first rather than making my own, and if they don't want it I'll release it myself.
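The conversion itself is just a load-and-resave. Here's a minimal sketch using standard transformers calls, where the model ID and output path are hypothetical placeholders:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

src = "stabilityai/original-fp32-model"   # hypothetical placeholder ID
dst = "./model-fp16"

# torch_dtype=torch.float16 casts the fp32 weights to fp16 as they load;
# low_cpu_mem_usage avoids materializing a second full copy in RAM.
model = AutoModelForCausalLM.from_pretrained(
    src, torch_dtype=torch.float16, low_cpu_mem_usage=True
)
model.save_pretrained(dst, safe_serialization=True)  # write .safetensors shards

AutoTokenizer.from_pretrained(src).save_pretrained(dst)
```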

And thanks very much for the Patreon subscription!

You are such a massive resource for the open-source LLM community. How could I not!
If I can load the fp32 I'll let you know; it's still downloading, even over 1Gbps+ fiber.

OK, loading the fp32 was no problem! Yay, thanks.
