vietphuon
/

Llama-3.2-1B-Instruct-alpaca-then-quizgen-16bit

Text Generation

text-generation-inference

Inference Endpoints

Model card Files Files and versions Community

Edit model card

FINAL BENCHMARKING

Time to First Token (TTFT): 0.001s
Time Per Output Token (TPOT): 33.26ms/token
Throughput (token/s): 30.88token/s
Average Token Latency (ms/token): 33.33ms/token
Total Generation Time: 13.966s
Input Tokenization Time: 0.011s
Input Tokens: 1909
Output Tokens: 420
Total Tokens: 2329
Memory Usage (GPU): 3.38GB

Uploaded model

Developed by: vietphuon
License: apache-2.0
Finetuned from model : unsloth/Llama-3.2-1B-Instruct-bnb-4bit

This llama model was trained 2x faster with Unsloth and Huggingface's TRL library.

Downloads last month: 208

Inference Examples

Text Generation

This model does not have enough activity to be deployed to Inference API (serverless) yet. Increase its social visibility and check back later, or deploy to Inference Endpoints (dedicated) instead.

Model tree for vietphuon/Llama-3.2-1B-Instruct-alpaca-then-quizgen-16bit

Base model

meta-llama/Llama-3.2-1B-Instruct

Quantized

unsloth/Llama-3.2-1B-Instruct-bnb-4bit

Finetuned

(105)

this model

Collection including vietphuon/Llama-3.2-1B-Instruct-alpaca-then-quizgen-16bit

Released fine-tuned QuizGen models

Most current fine-tuned and tested models for Quizgen downstream task from Rockship Co. • 4 items • Updated 27 days ago