Finetune Llama 3.2, NVIDIA Nemotron, Mistral 2-5x faster with 70% less memory via Unsloth!
We have a free Google Colab Tesla T4 notebook for Llama 3.2 (3B) here: https://colab.research.google.com/drive/1Ys44kVvmeZtnICzWz0xgpRnrIOjZAuxp?usp=sharing
unsloth/Llama-3.1-Nemotron-70B-Instruct-bnb-4bit
For more details on the model, please see NVIDIA's original model card.
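As a rough sketch of how this checkpoint might be loaded for finetuning: the snippet below assumes the `unsloth` package is installed and a CUDA GPU with enough memory for a 4-bit 70B model is available. The helper names `build_load_kwargs` and `load_model` and the default `max_seq_length` of 2048 are illustrative choices, not part of the model card.

```python
# Minimal sketch, not an official recipe. Assumes `pip install unsloth`
# and a CUDA GPU large enough for a 4-bit 70B checkpoint.

MODEL_NAME = "unsloth/Llama-3.1-Nemotron-70B-Instruct-bnb-4bit"


def build_load_kwargs(max_seq_length: int = 2048) -> dict:
    """Collect the arguments passed to FastLanguageModel.from_pretrained.

    max_seq_length=2048 is an assumed default; pick what your data needs.
    """
    return {
        "model_name": MODEL_NAME,
        "max_seq_length": max_seq_length,
        "dtype": None,          # let Unsloth pick float16 or bfloat16
        "load_in_4bit": True,   # the checkpoint is already bitsandbytes 4-bit
    }


def load_model(max_seq_length: int = 2048):
    """Load the model and tokenizer. Requires a GPU environment."""
    # Imported lazily so the helper above can be used without a GPU.
    from unsloth import FastLanguageModel

    model, tokenizer = FastLanguageModel.from_pretrained(
        **build_load_kwargs(max_seq_length)
    )
    return model, tokenizer


# Usage (on a suitable GPU machine):
#   model, tokenizer = load_model()
```

Keeping the arguments in a plain dictionary makes it easy to inspect or log the configuration before starting an expensive download.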
✨ Finetune for Free
All notebooks are beginner friendly! Add your dataset, click "Run All", and you'll get a 2x faster finetuned model that can be exported to GGUF or vLLM, or uploaded to Hugging Face.
| Unsloth supports | Free Notebooks | Performance | Memory use |
|---|---|---|---|
| Llama-3.2 (3B) | ▶️ Start on Colab | 2.4x faster | 58% less |
| Llama-3.1 (8B) | ▶️ Start on Colab | 2.4x faster | 58% less |
| Phi-3.5 (mini) | ▶️ Start on Colab | 2x faster | 50% less |
| Gemma 2 (9B) | ▶️ Start on Colab | 2.4x faster | 58% less |
| Mistral (7B) | ▶️ Start on Colab | 2.2x faster | 62% less |
| DPO - Zephyr | ▶️ Start on Colab | 1.9x faster | 19% less |
- This conversational notebook is useful for ShareGPT ChatML / Vicuna templates.
- This text completion notebook is for raw text. This DPO notebook replicates Zephyr.
- * Kaggle has 2x T4s, but we use 1. Due to overhead, 1x T4 is 5x faster.
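To read the "Performance" and "Memory use" columns in concrete terms, the sketch below converts a speedup factor and a memory reduction percentage into projected figures. The function name `projected_run` and the baseline of 10 hours and 16 GB are made-up illustrative inputs, not measurements from the notebooks.

```python
def projected_run(baseline_hours, baseline_memory_gb, speedup, memory_reduction_pct):
    """Project training time and memory from a baseline run.

    "Nx faster" divides the baseline time by N.
    "P% less" memory keeps (100 - P)% of the baseline memory.
    """
    hours = baseline_hours / speedup
    memory_gb = baseline_memory_gb * (1 - memory_reduction_pct / 100)
    return hours, memory_gb


# Hypothetical baseline: 10 hours and 16 GB, with the Llama-3.2 (3B) row's
# figures of 2.4x faster and 58% less memory.
hours, memory_gb = projected_run(10.0, 16.0, speedup=2.4, memory_reduction_pct=58)
print(f"{hours:.2f} h, {memory_gb:.2f} GB")  # prints: 4.17 h, 6.72 GB
```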
Special Thanks
A huge thank you to the Meta and Llama team for creating these models, and to NVIDIA for fine-tuning and releasing them.
Model tree for unsloth/Llama-3.1-Nemotron-70B-Instruct-bnb-4bit
Base model
meta-llama/Llama-3.1-70B
Finetuned
meta-llama/Llama-3.1-70B-Instruct