Rombos-LLM-V2.6-Nemotron-70b by Rombodawg


ExLlamaV2 Quantization

Quantized with ExLlamaV2 v0.2.3

Available quantizations:

- 2.2 bits per weight
- 4.65 bits per weight
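
These branches can be loaded directly with the exllamav2 Python package. Below is a minimal loading-and-generation sketch, assuming the package is installed (a version matching the v0.2.3 quantizer) and that one of the quantized branches has been downloaded locally; the directory path, prompt, and sampling settings are illustrative placeholders, not part of this release.

```python
# Minimal ExLlamaV2 loading sketch (hypothetical local path, illustrative settings)
from exllamav2 import ExLlamaV2, ExLlamaV2Config, ExLlamaV2Cache, ExLlamaV2Tokenizer
from exllamav2.generator import ExLlamaV2BaseGenerator, ExLlamaV2Sampler

# Hypothetical path to a downloaded quant branch (e.g. the 4.65 bpw version)
model_dir = "./Rombos-LLM-V2.6-Nemotron-70b-exl2-4.65bpw"

config = ExLlamaV2Config()
config.model_dir = model_dir
config.prepare()

model = ExLlamaV2(config)
cache = ExLlamaV2Cache(model, lazy=True)   # allocate the cache lazily so autosplit can size it
model.load_autosplit(cache)                # split the 70B weights across available GPUs

tokenizer = ExLlamaV2Tokenizer(config)
generator = ExLlamaV2BaseGenerator(model, cache, tokenizer)

settings = ExLlamaV2Sampler.Settings()
settings.temperature = 0.7
settings.top_p = 0.9

output = generator.generate_simple("Explain continuous finetuning in one paragraph.", settings, 200)
print(output)
```

As a general rule, the 2.2 bpw branch trades output quality for a much smaller memory footprint, while the 4.65 bpw branch stays closer to full-precision quality; which one fits depends on available VRAM.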



I applied the final step of my continuous finetuning method to the Nemotron-70b model from NVIDIA. More details below:

Quants: (Coming Soon)

Open-LLM-Leaderboard scores: (Coming Soon)
