TITLE = """<h1 align="center" id="space-title">🤗 LLM-Perf Leaderboard 🏋️</h1>"""
INTRODUCTION_TEXT = f"""
The 🤗 LLM-Perf Leaderboard 🏋️ aims to benchmark the performance (latency, throughput, memory & energy) of Large Language Models (LLMs) across different hardware, backends and optimizations using [Optimum-Benchmark](https://github.com/huggingface/optimum-benchmark) and [Optimum](https://github.com/huggingface/optimum) flavors.
Anyone from the community can request a model or a hardware/backend/optimization configuration for automated benchmarking:
- Model evaluation requests should be made in the [🤗 Open LLM Leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard) and will be added to the [🤗 LLM Performance Leaderboard 🏋️](https://huggingface.co/spaces/optimum/llm-perf-leaderboard) automatically.
- Hardware/Backend/Optimization performance requests should be made in the [community discussions](https://huggingface.co/spaces/optimum/llm-perf-leaderboard/discussions) to assess their relevance and feasibility.
"""
ABOUT_TEXT = """<h3>About the 🤗 LLM-Perf Leaderboard 🏋️</h3>
<ul>
<li>To avoid communication-dependent results, only one GPU is used.</li>
<li>Score is the average evaluation score obtained from the <a href="https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard">🤗 Open LLM Leaderboard</a>.</li>
<li>LLMs are run with a batch size of 1 and a prompt of 256 tokens, generating 1000 new tokens.</li>
<li>Peak memory is measured in MB during the generate pass using Py3NVML, while ensuring the GPU is isolated.</li>
<li>Energy consumption is measured in kWh using CodeCarbon, taking into account the GPU, CPU, RAM and location of the machine.</li>
<li>Each (Model Type, Weight Class) pair is represented by its best-scoring model. This LLM is the one used for all hardware/backend/optimization experiments.</li>
</ul>
"""
EXAMPLE_CONFIG_TEXT = """
Here's an example of the configuration file used to benchmark the models with Optimum-Benchmark:
```yaml
defaults:
  - backend: pytorch # default backend
  - benchmark: inference # default benchmark
  - experiment # inheriting from experiment config
  - _self_ # for hydra 1.1 compatibility
  - override hydra/job_logging: colorlog # colorful logging
  - override hydra/hydra_logging: colorlog # colorful logging

hydra:
  run:
    dir: llm-experiments/{experiment_name}
  job:
    chdir: true

experiment_name: {experiment_name}

model: {model}

device: cuda

backend:
  no_weights: true
  torch_dtype: float16
  bettertransformer: true
  quantization_scheme: gptq

benchmark:
  memory: true
  energy: true
  new_tokens: 1000
  input_shapes:
    batch_size: 1
    sequence_length: 256
```
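
The `{experiment_name}` and `{model}` fields above are placeholders. As a minimal sketch, assuming they are filled with Python's `str.format` before the file is handed to Optimum-Benchmark (the actual templating step is not shown here), rendering an abridged version of the template could look like this. `CONFIG_TEMPLATE`, `render_config` and the example values are hypothetical.

```python
# Minimal sketch, assuming the {experiment_name} and {model} placeholders
# are filled with str.format. CONFIG_TEMPLATE and render_config are
# hypothetical names; the template is abridged to the placeholder lines.
CONFIG_TEMPLATE = '''hydra:
  run:
    dir: llm-experiments/{experiment_name}
experiment_name: {experiment_name}
model: {model}
device: cuda
'''


def render_config(experiment_name: str, model: str) -> str:
    # str.format substitutes every occurrence of each named placeholder
    return CONFIG_TEMPLATE.format(experiment_name=experiment_name, model=model)


if __name__ == "__main__":
    # Example values only: any experiment name and Hub model id would do
    print(render_config("my_experiment", "gpt2"))
```

Because `str.format` treats every `{...}` as a field, any literal braces later added to such a template would need to be doubled (`{{` and `}}`).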
"""
CITATION_BUTTON_LABEL = "Copy the following snippet to cite these results."
CITATION_BUTTON_TEXT = r"""@misc{llm-perf-leaderboard,
  author = {Ilyas Moutawwakil and Régis Pierrard},
  title = {LLM-Perf Leaderboard},
  year = {2023},
  publisher = {Hugging Face},
  howpublished = "\url{https://huggingface.co/spaces/optimum/llm-perf-leaderboard}",
}
@software{optimum-benchmark,
  author = {Ilyas Moutawwakil and Régis Pierrard},
  publisher = {Hugging Face},
  title = {Optimum-Benchmark: A framework for benchmarking the performance of Transformers models with different hardwares, backends and optimizations.},
}
"""