Lighteval

🤗 Lighteval is your all-in-one toolkit for evaluating LLMs across multiple backends (transformers, tgi, vllm, or nanotron) with ease. Dive deep into your model’s performance by saving and exploring detailed, sample-by-sample results to debug and see how your models stack up.

Customization at your fingertips: effortlessly create new tasks and metrics tailored to your needs, or browse all our existing tasks and metrics.

Seamlessly experiment, benchmark, and store your results on the Hugging Face Hub, S3, or locally.
