TITLE = """
🤗 LLM-Perf Leaderboard 🏋️
"""
INTRODUCTION_TEXT = f"""
The 🤗 LLM-Perf Leaderboard 🏋️ aims to benchmark the performance (latency, throughput, memory & energy) of Large Language Models (LLMs) on different hardware, backends and optimizations using [Optimum-Benchmark](https://github.com/huggingface/optimum-benchmark) and [Optimum](https://github.com/huggingface/optimum) flavors.
Anyone from the community can request a model or a hardware/backend/optimization configuration for automated benchmarking:
- Model evaluation requests should be made in the [🤗 Open LLM Leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard) and will be added to the [🤗 LLM-Perf Leaderboard 🏋️](https://huggingface.co/spaces/optimum/llm-perf-leaderboard) automatically.
- Hardware/Backend/Optimization performance requests should be made in the [community discussions](https://huggingface.co/spaces/optimum/llm-perf-leaderboard/discussions) to assess their relevance and feasibility.
"""
ABOUT_TEXT = """About the 🤗 LLM-Perf Leaderboard 🏋️
- To avoid communication-dependent results, only one GPU is used.
- Score is the average evaluation score obtained from the 🤗 Open LLM Leaderboard.
- LLMs run on a singleton batch (batch size of 1) with a prompt size of 256, generating 1000 tokens.
- Peak memory is measured in MB during the generate pass using Py3NVML, while ensuring the GPU is isolated (see the sketch after this list).
- Energy consumption is measured in kWh using CodeCarbon, taking into account the GPU, CPU, RAM and the location of the machine.
- Each (Model Type, Weight Class) pair is represented by its best-scoring model; this model is then used for all the hardware/backend/optimization experiments.
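For illustration, here is a minimal sketch of how peak memory and energy can be tracked around a generate pass with Py3NVML and CodeCarbon. This is not the leaderboard's actual harness (Optimum-Benchmark handles this internally); the model, polling interval and device index are placeholder assumptions:
```python
# Minimal sketch (not the leaderboard's harness) of tracking peak GPU memory
# with Py3NVML and energy/emissions with CodeCarbon around a generate pass.
import threading
import time

from codecarbon import EmissionsTracker
from py3nvml import py3nvml
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder model; the leaderboard uses the best-scoring LLM per weight class.
model = AutoModelForCausalLM.from_pretrained("gpt2").to("cuda")
tokenizer = AutoTokenizer.from_pretrained("gpt2")
inputs = tokenizer("Hello there", return_tensors="pt").to("cuda")

py3nvml.nvmlInit()
handle = py3nvml.nvmlDeviceGetHandleByIndex(0)  # the single, isolated GPU

stop_polling = threading.Event()
peak = {"mb": 0.0}

def poll_memory():
    # Sample used GPU memory until generation finishes, keeping the max.
    while not stop_polling.is_set():
        info = py3nvml.nvmlDeviceGetMemoryInfo(handle)
        peak["mb"] = max(peak["mb"], info.used / 1e6)  # bytes -> MB
        time.sleep(0.01)

poller = threading.Thread(target=poll_memory)
tracker = EmissionsTracker(log_level="error")  # accounts for GPU, CPU and RAM

poller.start()
tracker.start()
model.generate(**inputs, max_new_tokens=1000)  # the measured generate pass
emissions = tracker.stop()  # returns kgCO2eq; energy (kWh) is logged to emissions.csv
stop_polling.set()
poller.join()
py3nvml.nvmlShutdown()

print(f"peak memory: {peak['mb']:.0f} MB | emissions: {emissions:.6f} kgCO2eq")
```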
"""
EXAMPLE_CONFIG_TEXT = """
Here's an example of the configuration file used to benchmark the models with Optimum-Benchmark:
```yaml
defaults:
  - backend: pytorch # default backend
  - benchmark: inference # default benchmark
  - experiment # inheriting from experiment config
  - _self_ # for hydra 1.1 compatibility
  - override hydra/job_logging: colorlog # colorful logging
  - override hydra/hydra_logging: colorlog # colorful logging

hydra:
  run:
    dir: llm-experiments/{experiment_name}
  job:
    chdir: true

experiment_name: {experiment_name}
model: {model}
device: cuda

backend:
  no_weights: true
  torch_dtype: float16
  bettertransformer: true
  quantization_scheme: gptq

benchmark:
  memory: true
  energy: true
  new_tokens: 1000
  input_shapes:
    batch_size: 1
    sequence_length: 256
```
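In practice, such a config is launched through Optimum-Benchmark's Hydra-based CLI, and any key can be overridden from the command line. As a minimal sketch of how the composed config maps to plain dotted keys, assuming the file above is saved as `llm_experiment.yaml` (a hypothetical name) with its `{...}` placeholders filled in, Hydra's underlying OmegaConf library can load and override it:
```python
# Minimal sketch: load the benchmark config and override keys with a dotlist,
# the same key paths Hydra accepts as command-line overrides.
# "llm_experiment.yaml" is a hypothetical file name for the config above.
from omegaconf import OmegaConf

cfg = OmegaConf.load("llm_experiment.yaml")
cfg.merge_with_dotlist([
    "backend.torch_dtype=float32",  # e.g. benchmark in full precision
    "benchmark.new_tokens=512",     # e.g. generate fewer tokens
])
print(OmegaConf.to_yaml(cfg))
```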
"""
CITATION_BUTTON_LABEL = "Copy the following snippet to cite these results."
CITATION_BUTTON_TEXT = r"""@misc{llm-perf-leaderboard,
  author = {Ilyas Moutawwakil and Régis Pierrard},
  title = {LLM-Perf Leaderboard},
  year = {2023},
  publisher = {Hugging Face},
  howpublished = "\url{https://huggingface.co/spaces/optimum/llm-perf-leaderboard}",
}
@software{optimum-benchmark,
  author = {Ilyas Moutawwakil and Régis Pierrard},
  publisher = {Hugging Face},
  title = {Optimum-Benchmark: A framework for benchmarking the performance of Transformers models with different hardwares, backends and optimizations.},
}
"""