The f16 with 32k ctx fits nicely in 24GB VRAM
Just what the title says. In early, limited testing, the f16 model at the full 32k ctx fully offloads on my 3090 Ti with 24GB VRAM, giving ~50 tok/s inference. Most impressive 8B I've personally tried so far!
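Something like the following should reproduce the full offload (a minimal sketch only, assuming llama-cpp-python as the runtime; the GGUF filename is just a placeholder):

```python
from llama_cpp import Llama

# Full GPU offload of the f16 GGUF with a 32k context window.
llm = Llama(
    model_path="model-f16.gguf",  # placeholder filename
    n_gpu_layers=-1,              # offload every layer to the GPU
    n_ctx=32768,                  # full 32k context
)

out = llm("Write a haiku about VRAM.", max_tokens=64)
print(out["choices"][0]["text"])
```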
This is great! I wouldn't have thought the 32k context in f16 could fit in 24GB VRAM! Thanks for sharing, it helps others calculate how much they can offload to the GPU.
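For reference, here's a rough back-of-the-envelope estimate of why it fits (a sketch; the architecture numbers are my assumption based on Llama-3-8B, and real usage adds compute/scratch buffers on top):

```python
# Assumed Llama-3-8B architecture: ~8.03B params, 32 layers,
# 8 KV heads (GQA), head dim 128, f16 weights and f16 KV cache.
params          = 8.03e9
bytes_per_param = 2        # f16 weights
n_layers        = 32
n_kv_heads      = 8
head_dim        = 128
n_ctx           = 32768
bytes_per_elem  = 2        # f16 KV cache

weights_gb = params * bytes_per_param / 1e9
# K and V each store n_kv_heads * head_dim values per layer per token.
kv_cache_gb = n_ctx * n_layers * n_kv_heads * head_dim * 2 * bytes_per_elem / 1e9

print(f"weights:  ~{weights_gb:.1f} GB")   # ~16.1 GB
print(f"KV cache: ~{kv_cache_gb:.1f} GB")  # ~4.3 GB
print(f"total:    ~{weights_gb + kv_cache_gb:.1f} GB + compute buffers")
```

That lands around 20.4 GB before overhead, which leaves some headroom on a 24GB card.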
If the model is unaligned then it is probably the best one, because I tried dolphin-2.9 and that is horribly slow. Normally my machine outputs 2.3 tokens per second (on CPU), but dolphin was doing 0.3 tokens per second with 4-bit quants, and the context size wasn't very long either (just 512, or perhaps even shorter), yet it was still outputting tokens at the slowest rate. I hope this one is normal or even better for speed.
@supercharge19
out of curiosity, since you seem more knowledgeable on the matter, isn't dolphin-2.9 also a fine-tune of Llama-3-8B? Is it possible to get different speeds from fine-tunes of the same base model (chat template, fine-tuning technique, etc.)?
Don't be humble, please. Dolphin-2.9 is indeed fine-tuned on the same base model (llama-3-8b), but I've heard that (for Mistral) some fine-tunes ended up slower than the original, or at least the usable context window got shorter, i.e. quality suffered at the same context length. I don't think other models (like Llama-based ones) would be any different.
Downloaded this one; will test it and return with results later.
Speed is good at 2.23 tokens per second, but generation quality sucks (actually there is a minor fault; opening a separate discussion for that now).