Max cpu/D_AU?
#2, opened by Utochi
Just a simple question: what's the difference between the two types in the files?
In simpler terms than what's in the model card, please, for us kinda clueless people.
The "max cpu" versions offload part of the model onto the CPU; this results in less VRAM usage, but also fewer tokens per second.
This optional quant also uses the CPU for "math", which is slightly more accurate than GPU (video card) math.
The result is slightly better instruction following and output generation.
For creative usage:
This results in greater nuance and stronger connections between concepts, details, characters and the "world".
For problem solving:
Greater chance the model will both understand your problem better and craft a better, more accurate answer.
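
To make the VRAM trade-off above a bit more concrete, here is a rough Python sketch of the bookkeeping. Everything in it is an assumption for illustration: the tensor names, the sizes in GB, and the choice of which tensor stays on the CPU are made up, not read from any of the actual files, and `memory_split` is a hypothetical helper, not part of any library.

```python
# Hypothetical tensor sizes in GB. These numbers are illustrative only
# and are NOT measured from any real quant file.
TENSOR_SIZES_GB = {
    "token_embeddings": 1.0,
    "transformer_layers": 6.0,
    "output_head": 1.0,
}


def memory_split(tensor_sizes_gb, cpu_tensors):
    """Return (vram_gb, ram_gb) when the tensors named in cpu_tensors stay on the CPU."""
    ram = sum(size for name, size in tensor_sizes_gb.items() if name in cpu_tensors)
    vram = sum(tensor_sizes_gb.values()) - ram
    return vram, ram


# Everything on the video card versus one tensor kept on the CPU
# (which tensor is offloaded here is an assumption).
all_gpu = memory_split(TENSOR_SIZES_GB, cpu_tensors=set())
part_cpu = memory_split(TENSOR_SIZES_GB, cpu_tensors={"token_embeddings"})

print("all on GPU:  VRAM %.1f GB, RAM %.1f GB" % all_gpu)
print("part on CPU: VRAM %.1f GB, RAM %.1f GB" % part_cpu)
```

The sketch only covers the memory side: whatever is kept on the CPU no longer counts against VRAM. It does not model the speed cost, which depends on your CPU and RAM.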