metadata
title: quantized-LLM comparison
emoji: 💬
colorFrom: yellow
colorTo: purple
sdk: gradio
sdk_version: 5.0.1
app_file: app.py
pinned: false
short_description: Fine-tuned Llama-3.2-1B-Instruct with different quantizations
An example chatbot using Gradio, huggingface_hub, and the Hugging Face Inference API.
HuggingFace Space with Quantized LLMs
Baseline model: Llama-3.2-1B-Instruct with 4-bit quantization
Training infrastructure:
- Google Colab with NVIDIA Tesla T4 GPU
- Finetuning with parameter-efficient finetuning (PEFT) via low-rank adaptation (LoRA), using Unsloth and HuggingFace's supervised finetuning libraries.
- Weights & Biases for training monitoring and model checkpointing, with a checkpoint saved every 10 steps.
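
To illustrate why LoRA fits on a single T4, here is a minimal sketch comparing the number of trainable parameters for one linear layer under full finetuning and under LoRA. The layer dimensions and the rank are hypothetical values chosen for illustration; they are not taken from this Space's training configuration.

```python
def lora_param_counts(d_in: int, d_out: int, rank: int) -> tuple[int, int]:
    """Return (full, lora) trainable-parameter counts for one linear layer."""
    # Full finetuning updates every entry of the d_out x d_in weight matrix.
    full = d_in * d_out
    # LoRA freezes the weight and trains two low-rank factors instead:
    # A with shape (rank, d_in) and B with shape (d_out, rank).
    lora = rank * (d_in + d_out)
    return full, lora


# Hypothetical layer size and rank, for illustration only.
full, lora = lora_param_counts(d_in=2048, d_out=2048, rank=16)
print(f"full: {full:,}  LoRA: {lora:,}  ratio: {lora / full:.2%}")
```

The LoRA count grows linearly with the rank, so a small rank keeps both the optimizer state and the saved checkpoints small.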
Finetuning details
Datasets: