Max cpu/D_AU?

#2
by Utochi - opened

Just a simple question: what's the difference between the two types in the files?

In simpler terms than what's in the model card, please, for us kinda clueless people.

The "max cpu" versions offload part of the model onto the CPU; this results in less VRAM usage, but also fewer tokens per second.
This optional quant also uses the CPU for "math", which is slightly more accurate than "GPU" (video card) math.
The result is slightly better instruction following and output generation.

For creative usage:
This results in greater nuance, as well as stronger connections between concepts, details, character and "world".

For problem solving:
Greater chance the model will both understand your problem better and craft a better, more accurate answer.
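
To make the VRAM side of that trade-off concrete, here is a rough back-of-the-envelope sketch in Python. It is only an illustration: the function name `split_memory`, the 8 GB file size, and the 32-layer count are made-up numbers, and it assumes every layer is the same size, which real model files do not guarantee. It does not describe how these particular quants are built.

```python
def split_memory(total_gb: float, n_layers: int, gpu_layers: int) -> tuple[float, float]:
    """Estimate (vram_gb, ram_gb) when only some layers sit on the GPU.

    Assumption: all layers are the same size, so memory splits evenly per layer.
    """
    if not 0 <= gpu_layers <= n_layers:
        raise ValueError("gpu_layers must be between 0 and n_layers")
    per_layer_gb = total_gb / n_layers
    vram_gb = per_layer_gb * gpu_layers
    return vram_gb, total_gb - vram_gb


# Hypothetical 8 GB model with 32 layers.
everything_on_gpu = split_memory(8.0, 32, 32)
part_on_cpu = split_memory(8.0, 32, 24)

print("all on GPU :", everything_on_gpu)
print("8 layers on CPU:", part_on_cpu)
```

The more of the model that sits on the CPU side, the less VRAM you need, but the parts running on the CPU are slower, which is where the drop in tokens per second comes from.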
