
NEO CLASS Ultra Quants for : TinyLlama-1.1B-Chat-v1.0-Ultra-NEO-V1-Imatrix-GGUF

The NEO Class tech was created after countless investigations and over 120 lab experiments, backed by real-world testing and qualitative results.

NEO Class results:

Better overall function, instruction following, output quality and stronger connections to ideas, concepts and the world in general.

In addition, quants now operate above their "grade", so to speak:

For example, Q4/IQ4 quants operate at Q5KM/Q6 levels, and Q3/IQ3 quants operate at Q4KM/Q5 levels.

The NEO Class Imatrix IQ4XS quant shows a perplexity drop of 591 points versus the regular IQ4XS quant (lower perplexity is better).
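As a reminder of why lower is better: perplexity is the exponential of the mean negative log-likelihood per token, so a smaller value means the model assigns higher probability to the reference text. A minimal sketch of the standard definition (the loss values below are illustrative, not measured results from this model):

```python
import math

def perplexity(nll_per_token):
    # Perplexity = exp(mean negative log-likelihood per token).
    return math.exp(sum(nll_per_token) / len(nll_per_token))

# Hypothetical per-token losses for two quants of the same model:
regular_quant = [2.1, 2.4, 2.3, 2.2]
imatrix_quant = [2.0, 2.2, 2.1, 2.1]

# The quant with lower average loss has lower perplexity.
assert perplexity(imatrix_quant) < perplexity(regular_quant)
```

This is the same quantity llama.cpp's perplexity tool reports when comparing quants against full precision.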

For experimental "X" quants of this model please go here:

[ https://huggingface.co/DavidAU/TinyLlama-1.1B-Chat-v1.0-Ultra-NEO-V1-X-Imatrix-GGUF ]

Model Notes:

Maximum context is 2k. Please see the original model maker's page for details and usage information for this model.

Special thanks to the model creators at TinyLlama for making such a fantastic model:

[ https://huggingface.co/TinyLlama/TinyLlama-1.1B-Chat-v1.0 ]
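If your GGUF runtime does not auto-apply a chat template, the original TinyLlama-1.1B-Chat-v1.0 was fine-tuned with the Zephyr-style template, and prompts should be formatted accordingly. A minimal sketch (verify the exact template against the original model page; the helper name here is our own):

```python
def build_chat_prompt(system_msg: str, user_msg: str) -> str:
    # Zephyr-style chat template used by TinyLlama-1.1B-Chat-v1.0:
    # each turn is tagged and closed with the </s> end-of-sequence token,
    # and the prompt ends with an open assistant turn for generation.
    return (
        f"<|system|>\n{system_msg}</s>\n"
        f"<|user|>\n{user_msg}</s>\n"
        f"<|assistant|>\n"
    )

prompt = build_chat_prompt("You are a helpful assistant.", "What is a GGUF quant?")
```

Pass the resulting string as the raw prompt when running the quant in llama.cpp or a compatible runtime.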

Highest Quality Settings / Optimal Operation Guide / Parameters and Samplers

This is a "Class 2" model:

For all settings used for this model (including specifics for its "class"), example generations, and an advanced settings guide that addresses common model issues and covers methods to improve performance across all use cases, including chat and roleplay, please see:

[ https://huggingface.co/DavidAU/Maximizing-Model-Performance-All-Quants-Types-And-Full-Precision-by-Samplers_Parameters ]

Format: GGUF. Model size: 1.1B params. Architecture: llama.

Quants provided: 1-bit, 2-bit, 3-bit, 4-bit, 5-bit, 6-bit, and 8-bit.

