.pt version uses 2gb less VRAM for me than the non-groupsized .safetensors

#10
by Monero - opened

I'm using KoboldAI with an RX 6800 XT and a Vega 64 combined for 24 GB of VRAM on Linux Mint.

I've noticed the safetensors versions use significantly more VRAM than the .pt version.

For comparison, with the same prompt and context tokens, the 30b-int4.pt model totals 22,061 MB, while the no-groupsize safetensors version uses 24,146 MB.
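
(For anyone trying to reproduce numbers like these, here is a minimal sketch of reading per-GPU usage out of PyTorch itself; the `torch.cuda` calls also work on ROCm builds of PyTorch, and the helper name is just illustrative:)

```python
import torch

def report_vram() -> None:
    """Illustrative helper: print VRAM usage per visible GPU after loading a model."""
    for i in range(torch.cuda.device_count()):
        free, total = torch.cuda.mem_get_info(i)     # driver-level free/total bytes
        allocated = torch.cuda.memory_allocated(i)   # bytes held by live tensors
        reserved = torch.cuda.memory_reserved(i)     # bytes kept by the caching allocator
        print(f"GPU {i}: {allocated / 2**20:,.0f} MiB allocated, "
              f"{reserved / 2**20:,.0f} MiB reserved, "
              f"{(total - free) / 2**20:,.0f} MiB used in total")

report_vram()
```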

Is there any way to quantize the new one so that it doesn't use as much VRAM as it does now?
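
(Not the exact procedure used for these files, but as an illustration of how a model can be re-quantized with a chosen group size, here is a rough sketch using the AutoGPTQ library; paths are placeholders, the calibration text is a stand-in for a real calibration set, and argument names may differ between versions:)

```python
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig

model_dir = "path/to/llama-30b-hf"        # placeholder: full-precision HF checkpoint
out_dir = "path/to/llama-30b-4bit-128g"   # placeholder: output directory

tokenizer = AutoTokenizer.from_pretrained(model_dir, use_fast=True)
# A real run would use a proper calibration set (several hundred samples).
examples = [tokenizer("This is placeholder calibration text.")]

# group_size=128 trades a bit of extra VRAM/file size for accuracy;
# group_size=-1 would produce a no-groupsize quantization.
quantize_config = BaseQuantizeConfig(bits=4, group_size=128, desc_act=False)

model = AutoGPTQForCausalLM.from_pretrained(model_dir, quantize_config)
model.quantize(examples)
model.save_quantized(out_dir, use_safetensors=True)
```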

I'm not sure how that's the case, as I was maxing out my VRAM with the original version at max context. I haven't used KoboldAI in a while now, since I hadn't heard that they supported 4-bit (or were they working on implementing it?). Maybe some of the model is being offloaded to swap; honestly, I have no clue.

Hello elinas, and thank you very much for your work. When you say you were maxing out your VRAM at max context, do you mean on a 24 GiB card? I'm having issues and need to limit tokens to about 1k for it to fit on my 3090. Can you explain how you are using the model? I am using the johnsmit0031 repo.

Thank you very much.
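
(A possible explanation for the ~1k-token limit: the KV cache grows linearly with context length regardless of how the weights are quantized. A back-of-the-envelope estimate, assuming a LLaMA-30B-class model with 60 layers, hidden size 6656, and an fp16 cache:)

```python
# Rough KV-cache estimate for a LLaMA-30B-class model.
# Assumed figures: 60 layers, hidden size 6656, fp16 (2-byte) cache entries.
layers, hidden, bytes_per_elem = 60, 6656, 2
per_token = 2 * layers * hidden * bytes_per_elem   # keys + values across all layers

for context in (1024, 2048):
    print(f"{context} tokens -> ~{context * per_token / 2**30:.1f} GiB of KV cache")
# ~1.5 GiB at 1024 tokens vs. ~3.0 GiB at 2048, on top of the quantized weights
# themselves and activation/fragmentation overhead, which is why trimming context
# can make the difference between fitting and not fitting on a 24 GB card.
```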

Can you explain how you are using the model? I am using the johnsmit0031 repo.

I'm not really familiar with that repo (other than that it promises 4-bit training and LoRAs), and I only have 2 official options, 3 if you use the KoboldAI 4-bit version.
