Nemotron-Mini-4B-Instruct-GGUF Q8
This quantized GGUF model was created using llama.cpp.
Original model: https://huggingface.co/nvidia/Nemotron-Mini-4B-Instruct
You can run this model in LM Studio.
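"Q8" in the title most likely refers to llama.cpp's Q8_0 type. The card does not name the exact variant, so treat that as an assumption. Q8_0 splits each weight tensor into blocks of 32 values and stores one scale per block (scale = largest magnitude / 127) plus 32 signed 8-bit integers. The minimal sketch below reproduces that scheme in plain Python. It simplifies two details: the scale is kept as a Python float rather than fp16, and Python's `round()` uses round-half-to-even, which can differ from llama.cpp at exact ties.

```python
BLOCK_SIZE = 32  # Q8_0 quantizes weights in blocks of 32 values


def quantize_q8_0(values):
    """Split values into blocks; return a list of (scale, int8 quants) pairs."""
    blocks = []
    for start in range(0, len(values), BLOCK_SIZE):
        block = values[start:start + BLOCK_SIZE]
        amax = max(abs(v) for v in block)
        # The largest magnitude in the block maps to +/-127.
        scale = amax / 127.0
        inv_scale = 1.0 / scale if scale else 0.0
        quants = [int(round(v * inv_scale)) for v in block]
        blocks.append((scale, quants))
    return blocks


def dequantize_q8_0(blocks):
    """Reconstruct approximate floats as quant * scale for every block."""
    out = []
    for scale, quants in blocks:
        out.extend(q * scale for q in quants)
    return out


weights = [0.5, -1.27, 0.0, 0.01] + [0.1 * i for i in range(40)]
blocks = quantize_q8_0(weights)
restored = dequantize_q8_0(blocks)
```

Each restored weight is within half a block scale of the original. In the real format, 32 one-byte quants plus a 2-byte fp16 scale take 34 bytes, or about 8.5 bits per weight.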
Model Overview
Nemotron-Mini-4B-Instruct is a model for generating responses for roleplaying, retrieval-augmented generation (RAG), and function calling. It is a small language model (SLM) optimized through distillation, pruning, and quantization for speed and on-device deployment. It is a fine-tuned version of nvidia/Minitron-4B-Base, which NVIDIA pruned and distilled from Nemotron-4 15B using its LLM compression technique. This instruct model is optimized for roleplay, RAG question answering, and function calling in English. It supports a context length of 4,096 tokens. This model is ready for commercial use.
Model Developer: NVIDIA
Model Dates: Nemotron-Mini-4B-Instruct was trained between February 2024 and August 2024.
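Because the context window is limited to 4,096 tokens, multi-turn roleplay or RAG sessions eventually have to drop older material. The following is a minimal sketch of one common strategy: keep the newest turns that fit, leaving room for the reply. `count_tokens` is a hypothetical stand-in for the model's real tokenizer, and the 512-token reply reserve is an arbitrary illustrative value.

```python
MAX_CONTEXT_TOKENS = 4096  # context length stated in the model card


def fit_history(turns, count_tokens, reserve_for_reply=512):
    """Keep the most recent turns that fit in the context window.

    count_tokens is a hypothetical callable standing in for the model's tokenizer;
    reserve_for_reply leaves room for the tokens the model will generate.
    """
    budget = MAX_CONTEXT_TOKENS - reserve_for_reply
    kept, used = [], 0
    for turn in reversed(turns):  # walk from newest to oldest
        cost = count_tokens(turn)
        if used + cost > budget:
            break
        kept.append(turn)
        used += cost
    kept.reverse()  # restore chronological order
    return kept


# Crude whitespace "tokenizer" used only for illustration.
turns = ["a " * 2000, "b " * 1500, "c " * 1000]
kept = fit_history(turns, lambda t: len(t.split()))
```

Here the budget is 3,584 tokens, so the two newest turns (2,500 tokens) are kept and the oldest one is dropped.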
License
NVIDIA Community Model License
Model Architecture
Nemotron-Mini-4B-Instruct uses a model embedding size of 3072, 32 attention heads, and an MLP intermediate dimension of 9216. It also uses Grouped-Query Attention (GQA) and Rotary Position Embeddings (RoPE).
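Under Grouped-Query Attention, several query heads share one key/value head, which shrinks the KV cache. The sketch below shows that mapping and the resulting cache saving. The card gives the embedding size and query-head count but not the number of KV heads, so `NUM_KV_HEADS = 8` is a placeholder assumption. The head size is also derived naively as 3072 / 32; the actual value should be read from the model's configuration.

```python
EMBED_DIM = 3072       # from the model card
NUM_QUERY_HEADS = 32   # from the model card
NUM_KV_HEADS = 8       # assumption: the card does not state the KV head count

# Naive head size; the real value should be read from the model's config.
NAIVE_HEAD_DIM = EMBED_DIM // NUM_QUERY_HEADS  # 96


def kv_head_for(query_head, n_query=NUM_QUERY_HEADS, n_kv=NUM_KV_HEADS):
    """Index of the key/value head that a query head attends with under GQA."""
    if n_query % n_kv:
        raise ValueError("query heads must split evenly into KV groups")
    group_size = n_query // n_kv  # query heads sharing one KV head
    return query_head // group_size


def kv_cache_elems_per_token_per_layer(n_kv, head_dim):
    """Cached elements per token per layer: one key and one value per KV head."""
    return 2 * n_kv * head_dim


groups = [kv_head_for(h) for h in range(NUM_QUERY_HEADS)]
gqa = kv_cache_elems_per_token_per_layer(NUM_KV_HEADS, NAIVE_HEAD_DIM)
mha = kv_cache_elems_per_token_per_layer(NUM_QUERY_HEADS, NAIVE_HEAD_DIM)
```

With these assumed numbers, every 4 query heads share one KV head and the KV cache is 4 times smaller than with standard multi-head attention (1,536 versus 6,144 elements per token per layer).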
Architecture Type: Transformer Decoder (auto-regressive language model)
Network Architecture: Nemotron-4
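The Rotary Position Embeddings mentioned above encode position by rotating pairs of dimensions in each query and key vector by an angle proportional to the token's position. As a result, attention scores depend only on the relative offset between tokens. The sketch below shows the idea under assumptions not taken from this card: the common base of 10000 and the "adjacent pairs" convention. Implementations differ in how they pair dimensions and in whether they rotate the whole head, so this is illustrative rather than Nemotron's exact implementation.

```python
import math


def apply_rope(vec, position, base=10000.0):
    """Rotate each pair (vec[2i], vec[2i+1]) by position * base**(-2i/d)."""
    d = len(vec)
    if d % 2:
        raise ValueError("RoPE needs an even dimension")
    out = list(vec)
    for i in range(d // 2):
        angle = position * base ** (-2 * i / d)
        c, s = math.cos(angle), math.sin(angle)
        x0, x1 = vec[2 * i], vec[2 * i + 1]
        out[2 * i] = x0 * c - x1 * s
        out[2 * i + 1] = x0 * s + x1 * c
    return out


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


q = [1.0, 2.0, 3.0, 4.0]
k = [0.5, -1.0, 2.0, 0.3]
# The score depends only on the offset between positions (5 - 3 == 12 - 10).
score_a = dot(apply_rope(q, 5), apply_rope(k, 3))
score_b = dot(apply_rope(q, 12), apply_rope(k, 10))
```

Because each pair is only rotated, vector norms are unchanged and position 0 leaves the vector untouched.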
Prompt Format:
We recommend using the following prompt template, which was used to fine-tune the model. The model may not perform optimally without it.
Single Turn
```
<extra_id_0>System
{system prompt}
<extra_id_1>User
{prompt}
<extra_id_1>Assistant\n
```
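When calling the GGUF file through a raw-completion interface, the single-turn template has to be assembled by hand. The following is a minimal sketch that joins the template lines exactly as printed in the single-turn template, with one newline between lines. The trailing `\n` after `Assistant` is read as a literal newline, after which the model starts its reply. The function name is hypothetical.

```python
def build_single_turn_prompt(system_prompt, user_prompt):
    """Assemble a single-turn prompt following the template in this card."""
    return (
        "<extra_id_0>System\n"
        f"{system_prompt}\n"
        "<extra_id_1>User\n"
        f"{user_prompt}\n"
        "<extra_id_1>Assistant\n"  # generation continues from here
    )


prompt = build_single_turn_prompt("You are a helpful assistant.", "What is GGUF?")
```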
Tool Use
```
<extra_id_0>System
{system prompt}
<tool> ... </tool>
<context> ... </context>
<extra_id_1>User
{prompt}
<extra_id_1>Assistant
<toolcall> ... </toolcall>
<extra_id_1>Tool
{tool response}
<extra_id_1>Assistant\n
```
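The tool-use template is a two-step exchange. First the model answers the user turn with a `<toolcall>` block. The application then runs the tool and appends a `Tool` turn before reopening the `Assistant` turn. The card shows `<tool> ... </tool>` and `<toolcall> ... </toolcall>` without specifying what goes inside them. This sketch assumes JSON tool definitions, which is a guess, and returns the tool-call payload as raw text. All function names and the `get_weather` tool are hypothetical.

```python
import json
import re

TOOLCALL_RE = re.compile(r"<toolcall>(.*?)</toolcall>", re.DOTALL)


def build_tool_prompt(system_prompt, tools, context, user_prompt):
    """First turn of the tool-use template: system, tools, context, user."""
    return (
        "<extra_id_0>System\n"
        f"{system_prompt}\n"
        f"<tool> {json.dumps(tools)} </tool>\n"  # assumption: JSON tool specs
        f"<context> {context} </context>\n"
        "<extra_id_1>User\n"
        f"{user_prompt}\n"
        "<extra_id_1>Assistant\n"
    )


def extract_toolcall(model_output):
    """Return the text between <toolcall> tags, or None if there is no call."""
    match = TOOLCALL_RE.search(model_output)
    return match.group(1).strip() if match else None


def continue_with_tool_result(prompt, toolcall_text, tool_response):
    """Append the model's tool call and the tool's reply, then reopen the Assistant turn."""
    return (
        prompt
        + f"<toolcall> {toolcall_text} </toolcall>\n"
        + "<extra_id_1>Tool\n"
        + f"{tool_response}\n"
        + "<extra_id_1>Assistant\n"
    )


prompt = build_tool_prompt("You can call tools.", [{"name": "get_weather"}], "none", "Weather?")
call = extract_toolcall('<toolcall> {"name": "get_weather"} </toolcall>')
follow_up = continue_with_tool_result(prompt, call, "sunny")
```

The non-greedy `.*?` with `re.DOTALL` captures only the first tool call, even if it spans several lines.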