Smaller version for home user GPUs
Are you planning to release a smaller version of V3 that could run on a 24GB GPU?
a 70b model would be pretty good
that's something most home users can't run haha
maybe a 32B MoE model
+1
Would love to see DeepSeek-V3-Lite
16B or 27B version would be just wonderful to have
+1 here
16b would be fine, just like deepseek v2 lite
I was saying a 70B model made up of 500M-1500M-param parts/smaller models; that would be roughly 140 down to about 47 parts. It would only use around 235-350GB of VRAM, but you could quantize it to FP4 and it would probably use a fraction of that.
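Quick back-of-envelope math for that layout (a sketch; the bytes-per-weight figures are standard, but the part sizes and everything else here are assumptions, and activation/KV-cache memory is ignored). Notably, the 235-350GB range quoted above lines up with fp32 weights, and fp4 would cut that by about 8x, not half:

```python
# Back-of-envelope sizing for a 70B model split into 500M-1500M-param parts.
# Assumptions (mine, not from the thread): 70e9 total parameters, standard
# bytes-per-weight for fp32/fp16/fp4, no activation or KV-cache overhead.

TOTAL_PARAMS = 70e9

def n_parts(params_per_part):
    """How many equal-size parts make up the full model."""
    return round(TOTAL_PARAMS / params_per_part)

def weight_gb(bytes_per_param):
    """Weight memory in decimal GB for the whole model."""
    return TOTAL_PARAMS * bytes_per_param / 1e9

print(n_parts(500e6), n_parts(1500e6))  # 140 and 47 parts
print(weight_gb(4))    # fp32: 280 GB -- inside the 235-350 GB range above
print(weight_gb(2))    # fp16: 140 GB
print(weight_gb(0.5))  # fp4:   35 GB (an eighth of fp32)
```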
Isn't that kinda useless? It would literally make it dumber, on par with models like Qwen or Mistral Large (122B, which uses ~130GB RAM in Q8), and in my tests those are really bad at coding. 70B models use about 90GB RAM in the best Q8 GGUF (usually RAM = storage size + 10%).
The best coding performance I've seen is DeepSeek V2.5-1210 (235B), which in Q8 uses 277GB RAM. Going lower in quantization or size defeats the point.
V3 in Q5_K_M uses 502GB RAM; I'm testing it right now. Q5 degrades it, and its coding performance is worse than V2.5 at Q8 (235B), but it's a good model, on ChatGPT-4 level.
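The memory figures above follow a simple rule of thumb, sketched below. The bits-per-weight averages for Q8_0 and Q5_K_M are my approximations of typical llama.cpp values, and DeepSeek-V3's 671B total parameter count is an assumption on my part:

```python
# Rough GGUF memory estimate: params * bits-per-weight / 8, plus ~10% runtime
# overhead (the "RAM = storage size + 10%" rule of thumb from the thread).
# Approximate average bits-per-weight (my assumption):
#   Q8_0 ~ 8.5 bpw, Q5_K_M ~ 5.7 bpw.

def gguf_ram_gb(n_params, bits_per_weight, overhead=0.10):
    file_gb = n_params * bits_per_weight / 8 / 1e9  # weights on disk, decimal GB
    return file_gb * (1 + overhead)                 # resident RAM while running

print(round(gguf_ram_gb(235e9, 8.5)))  # V2.5 at Q8: ~275 GB (thread reports 277)
print(round(gguf_ram_gb(671e9, 5.7)))  # V3 at Q5_K_M: ~526 GB (thread reports 502)
```

Both estimates land within a few percent of the numbers reported in the thread, which suggests the rule of thumb is reasonable for ballpark planning.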
If you want to run THAT, forget consumer hardware. I'm running V3 super cheaply on a 10-year-old Gigabyte server board with 12 RAM slots (except that a proper CPU fan for such a board still costs $100, enterprise "tax"). I just watched the latest motherboard presentations from CES: they're offering at most 256GB for consumers. Yes, new DDR5 speed records were set, but they're literally showing that consumer hardware won't be built with any headroom for the future (on that new motherboard you could try to fit V2.5 at the worse Q6 quality, not Q8; I've tested on a board with 256GB RAM, and it's not enough). Boards with 12 RAM slots or CPUs with 100 cores are enterprise toys; you can search for used ones. Forget consumer hardware, except maybe GPUs, which can offload some of the work.
The consumer market needs COMPLETELY NEW hardware. NVMe is already slow just for loading models approaching half a terabyte; PCIe bandwidth is very slow and is the main bottleneck in a CPU+GPU combo (no wonder Nvidia uses optical links for that); server RAM in LRDIMMs runs incredibly hot (easily hitting 90°C). There are many problems, and I don't see manufacturers even wanting to solve them. None of this is accessible to ordinary users.
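To put rough numbers on the bandwidth point (a sketch; the drive and PCIe speeds are nominal peak figures I'm assuming, and the 100 GB-per-token offload volume is purely illustrative, not measured):

```python
# Why disk and PCIe speeds hurt with ~half-terabyte models.
# All link speeds below are nominal peak figures (my assumptions);
# real-world throughput is lower, so real times are worse.

def seconds_to_move(gb, gb_per_s):
    """Time to move `gb` gigabytes over a link with the given throughput."""
    return gb / gb_per_s

MODEL_GB = 500  # roughly the half-terabyte quant discussed above

# Loading from disk once:
print(seconds_to_move(MODEL_GB, 7.0))   # PCIe 4.0 x4 NVMe, ~7 GB/s: ~71 s
print(seconds_to_move(MODEL_GB, 0.55))  # SATA SSD, ~0.55 GB/s: ~15 min

# CPU<->GPU offload: if (hypothetically) 100 GB of weights had to cross
# PCIe every token, a ~32 GB/s PCIe 4.0 x16 link alone would cap you
# at roughly 3 s/token.
print(seconds_to_move(100, 32.0))
```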