Smaller version for home user GPUs
Are you planning to release a smaller version of V3 that could run on a 24GB GPU?
a 70b model would be pretty good
that's something most home users can't run haha
maybe a 32B MoE model
+1
Would love to see DeepSeek-V3-Lite
16B or 27B version would be just wonderful to have
+1 here
16b would be fine, just like deepseek v2 lite
I was saying a 70B model made up of 500M-1500M-param parts/smaller models; that would be roughly 140 down to about 47 parts. It would only use around 235-350GB of VRAM, but you could quantize it to FP4 and it would probably use a fraction of that.
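Quick back-of-envelope math for that layout (a sketch; the bytes-per-weight figures are standard, but the part sizes and everything else here are assumptions, and activation/KV-cache memory is ignored). Notably, the 235-350GB range quoted above lines up with fp32 weights, and fp4 would cut that by about 8x, not half:

```python
# Back-of-envelope sizing for a 70B model split into 500M-1500M-param parts.
# Assumptions (mine, not from the thread): 70e9 total parameters, standard
# bytes-per-weight for fp32/fp16/fp4, no activation or KV-cache overhead.

TOTAL_PARAMS = 70e9

def n_parts(params_per_part):
    """How many equal-size parts make up the full model."""
    return round(TOTAL_PARAMS / params_per_part)

def weight_gb(bytes_per_param):
    """Weight memory in decimal GB for the whole model."""
    return TOTAL_PARAMS * bytes_per_param / 1e9

print(n_parts(500e6), n_parts(1500e6))  # 140 and 47 parts
print(weight_gb(4))    # fp32: 280 GB -- inside the 235-350 GB range above
print(weight_gb(2))    # fp16: 140 GB
print(weight_gb(0.5))  # fp4:   35 GB (an eighth of fp32)
```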
Isn't that kinda useless? It would literally make it dumber, on par with models like Qwen or Mistral Large (122B, which uses ~130GB RAM in Q8), and in my tests those are really bad at coding. 70B models use about 90GB RAM in the best Q8 GGUF (usually RAM = storage size + 10%).
The best coding performance I've seen is DeepSeek V2.5-1210 (235B), which in Q8 uses 277GB RAM. Going lower in quantization or size defeats the point.
V3 in Q5_K_M uses 502GB RAM; I'm testing it right now. Q5 degrades it, and its coding performance is worse than V2.5 at Q8 (235B), but it's a good model, on ChatGPT-4 level.
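The memory figures above follow a simple rule of thumb, sketched below. The bits-per-weight averages for Q8_0 and Q5_K_M are my approximations of typical llama.cpp values, and DeepSeek-V3's 671B total parameter count is an assumption on my part:

```python
# Rough GGUF memory estimate: params * bits-per-weight / 8, plus ~10% runtime
# overhead (the "RAM = storage size + 10%" rule of thumb from the thread).
# Approximate average bits-per-weight (my assumption):
#   Q8_0 ~ 8.5 bpw, Q5_K_M ~ 5.7 bpw.

def gguf_ram_gb(n_params, bits_per_weight, overhead=0.10):
    file_gb = n_params * bits_per_weight / 8 / 1e9  # weights on disk, decimal GB
    return file_gb * (1 + overhead)                 # resident RAM while running

print(round(gguf_ram_gb(235e9, 8.5)))  # V2.5 at Q8: ~275 GB (thread reports 277)
print(round(gguf_ram_gb(671e9, 5.7)))  # V3 at Q5_K_M: ~526 GB (thread reports 502)
```

Both estimates land within a few percent of the numbers reported in the thread, which suggests the rule of thumb is reasonable for ballpark planning.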
If you want to run THAT, forget consumer hardware. I'm running V3 super cheaply on a 10-year-old Gigabyte server board with 12 RAM slots (except that a proper CPU fan for such a board still costs $100, enterprise "tax"). I just watched the latest motherboard presentations from CES: they're offering at most 256GB for consumers. Yes, new DDR5 speed records were set, but they're literally showing that consumer hardware won't be built with any headroom for the future (on that new motherboard you could try to fit V2.5 at the worse Q6 quality, not Q8; I've tested on a board with 256GB RAM, and it's not enough). Boards with 12 RAM slots or CPUs with 100 cores are enterprise toys; you can search for used ones. Forget consumer hardware, except maybe GPUs, which can offload some of the work.
The consumer market needs COMPLETELY NEW hardware. NVMe is already slow just for loading models approaching half a terabyte; PCIe bandwidth is very slow and is the main bottleneck in a CPU+GPU combo (no wonder Nvidia uses optical links for that); server RAM in LRDIMMs runs incredibly hot (easily hitting 90°C). There are many problems, and I don't see manufacturers even wanting to solve them. None of this is accessible to ordinary users.
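To put rough numbers on the bandwidth point (a sketch; the drive and PCIe speeds are nominal peak figures I'm assuming, and the 100 GB-per-token offload volume is purely illustrative, not measured):

```python
# Why disk and PCIe speeds hurt with ~half-terabyte models.
# All link speeds below are nominal peak figures (my assumptions);
# real-world throughput is lower, so real times are worse.

def seconds_to_move(gb, gb_per_s):
    """Time to move `gb` gigabytes over a link with the given throughput."""
    return gb / gb_per_s

MODEL_GB = 500  # roughly the half-terabyte quant discussed above

# Loading from disk once:
print(seconds_to_move(MODEL_GB, 7.0))   # PCIe 4.0 x4 NVMe, ~7 GB/s: ~71 s
print(seconds_to_move(MODEL_GB, 0.55))  # SATA SSD, ~0.55 GB/s: ~15 min

# CPU<->GPU offload: if (hypothetically) 100 GB of weights had to cross
# PCIe every token, a ~32 GB/s PCIe 4.0 x16 link alone would cap you
# at roughly 3 s/token.
print(seconds_to_move(100, 32.0))
```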