mgoin's picture
Create README.md (#1)
2639f40 verified
|
raw
history blame
1.55 kB
metadata
tags:
  - fp8
  - vllm

Meta-Llama-3-70B-Instruct-FP8

Model Overview

Meta-Llama-3-70B-Instruct quantized to FP8 weights and activations using per-tensor quantization, ready for inference with vLLM >= 0.5.0.

Usage and Creation

Produced using AutoFP8 with calibration samples from ultrachat.

Evaluation

Open LLM Leaderboard evaluation scores

Meta-Llama-3-70B-Instruct Meta-Llama-3-70B-Instruct-FP8
(this model)
arc-c
25-shot
72.69 72.61
hellaswag
10-shot
85.50 85.41
mmlu
5-shot
80.18 80.06
truthfulqa
0-shot
62.90 62.73
winogrande
5-shot
83.34 83.03
gsm8k
5-shot
92.49 91.12
Average
Accuracy
79.51 79.16
Recovery 100% 99.55%