Benchmark results compared to the original FP8 / INT4 quants, etc.?
Can you evaluate this FP4 quant? How does it compare to others out there?
Do we have a perplexity (PPL) score for the FP4 data type? Thank you very much.
Yes, we can evaluate this FP4 model; see this post for MMLU and cost per token. For more benchmark results, please stay tuned.
@zhiyucheng Thank you. I noticed the phrase "99.8% of FP8 on MMLU", but I didn't see that result collected on any official leaderboard. Usually in lm-evaluation-harness runs, we report a weighted score over the major tasks, not only general-knowledge testing.
For R1, reasoning is a very important aspect we would like to see measured.
It would be great to have the lm-evaluation-harness version used, snapshots of the evaluation runs, and the FP4 conversion scripts and methods.
I have always cared about FP4, not only because FP4 multiplication may be dramatically different from FP8 (only 16 representable values), but also because of the potential speedup on the current architecture. How would it look with WGMMA on Hopper tensor cores? A lot of questions.
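Since a 4-bit float has only 16 code points, the entire value set can be enumerated directly. A minimal sketch, assuming the common E2M1 layout (1 sign bit, 2 exponent bits, 1 mantissa bit, exponent bias 1, as in the OCP microscaling element format); the function name is mine, not from any library:

```python
def fp4_e2m1_value(code: int) -> float:
    """Decode a 4-bit E2M1 code (0..15) to its real value."""
    sign = -1.0 if (code >> 3) & 1 else 1.0
    exp = (code >> 1) & 0b11          # 2-bit exponent field
    man = code & 1                    # 1-bit mantissa field
    if exp == 0:                      # subnormal: no implicit leading 1
        return sign * man * 0.5
    return sign * (1.0 + 0.5 * man) * 2.0 ** (exp - 1)

# All 16 codes; the positive half is 0, 0.5, 1, 1.5, 2, 3, 4, 6
values = [fp4_e2m1_value(c) for c in range(16)]
```

With only 15 distinct values (±0 coincide), an FP4 multiply could in principle be handled by tiny lookup tables rather than a full multiplier, which is part of why the hardware story differs so much from FP8.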
Hello @llama-ultra-producer ,
Thank you for your interest and for all the questions!
If you're looking for more details on FP4 quantization, please take a look at our GitHub page: https://github.com/NVIDIA/TensorRT-Model-Optimizer
We are open source, so feel free to dig in if you're interested, and don't hesitate to try things out.
Please stay tuned for more detailed benchmark results.