Arnav Chavan
title: Edge LLM Leaderboard
emoji: 🌖
colorFrom: red
colorTo: blue
sdk: gradio
sdk_version: 5.8.0
app_file: app.py
pinned: true
license: apache-2.0
tags:
  - edge llm leaderboard
  - llm edge leaderboard
  - llm
  - edge
  - leaderboard

Edge LLM Leaderboard

📝 About

The Edge LLM Leaderboard gauges the practical performance and quality of edge LLMs. It benchmarks the performance (throughput and memory) of Large Language Models (LLMs) on edge hardware, starting with a Raspberry Pi 5 (8 GB) built around the Arm Cortex-A76 CPU.

Anyone in the community can request a new base model, or a new edge hardware/backend/optimization configuration, for automated benchmarking:

  • Model evaluation requests will be opened soon. In the meantime, feel free to send requests by email to arnav[dot]chavan[@]nyunai[dot]com

✍️ Details

  • To avoid multi-thread discrepancies, all 4 CPU threads are used on the Pi 5.
  • LLMs are run with a batch size of 1, a prompt of 512 tokens, and 128 generated tokens.

All of our throughput benchmarks are run with a single tool, llama-bench, which is built on llama.cpp, to ensure reproducibility and consistency.
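As a rough illustration, the settings in this section (4 threads, a 512-token prompt, 128 generated tokens) correspond to llama-bench's `-t`, `-p` and `-n` options. The sketch below only assembles, and optionally runs, such a command. The binary location `./llama-bench`, the model path, and the helper names are hypothetical placeholders, and the leaderboard's actual invocation may pass additional options.

```python
import shlex
import subprocess

# Settings stated in the Details section: 4 threads on the Pi 5,
# a 512-token prompt and 128 generated tokens.
THREADS = 4
PROMPT_TOKENS = 512
GEN_TOKENS = 128


def build_llama_bench_cmd(model_path: str, binary: str = "./llama-bench") -> list[str]:
    """Assemble a llama-bench command line for one GGUF model.

    `binary` and `model_path` are hypothetical placeholders; point them at
    your own llama.cpp build and model file.
    """
    return [
        binary,
        "-m", model_path,          # model file to benchmark
        "-t", str(THREADS),        # number of CPU threads
        "-p", str(PROMPT_TOKENS),  # prompt (prefill) length in tokens
        "-n", str(GEN_TOKENS),     # number of tokens to generate
        "-o", "json",              # machine-readable output
    ]


def run_llama_bench(model_path: str, binary: str = "./llama-bench") -> str:
    """Run llama-bench and return its raw JSON output as text."""
    cmd = build_llama_bench_cmd(model_path, binary)
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout


if __name__ == "__main__":
    # Print the command instead of running it, so the sketch works without llama.cpp.
    print(shlex.join(build_llama_bench_cmd("models/example-model.gguf")))
```

Keeping the settings as module-level constants makes it easy to check that every model is benchmarked under the same configuration.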

🏆 Ranking Models

We measure model quality with zero-shot MMLU via llama-perplexity and, alongside it, focus on the metrics most relevant to edge applications:

  1. Prefill Latency (Time to First Token, TTFT): Measures the time taken to process the prompt and produce the first token. A low TTFT ensures a smooth user experience, especially for real-time interactions in edge use cases.

  2. Decode Latency (Generation Speed): Indicates how quickly subsequent tokens are generated, which is critical for real-time tasks such as transcription or extended dialogue sessions.

  3. Model Size: Smaller models are better suited to edge devices, whose secondary storage is limited compared to cloud or GPU systems, making efficient deployment possible.

These metrics collectively address the unique challenges of deploying LLMs on edge devices, balancing performance, responsiveness, and memory constraints.
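This README does not spell out how the latency metrics are derived. One plausible sketch, assuming you start from the prefill and decode throughputs (tokens per second) that llama-bench reports, is to approximate TTFT as the prompt length divided by prefill throughput, decode latency as the inverse of decode throughput, and model size as the size of the model file. The function names and the example numbers are hypothetical, not leaderboard results.

```python
import os
from dataclasses import dataclass


@dataclass
class EdgeMetrics:
    ttft_s: float               # approx. prefill latency for the whole prompt, seconds
    decode_ms_per_token: float  # approx. time per generated token, milliseconds
    model_size_gb: float        # size of the model file on disk, GB (10^9 bytes)


def edge_metrics(prefill_tps: float, decode_tps: float, model_bytes: int,
                 prompt_tokens: int = 512) -> EdgeMetrics:
    """Convert throughput numbers (tokens/s) into latency-style metrics.

    Assumption: TTFT is approximated as prompt_tokens / prefill throughput,
    i.e. the time to process the prompt, ignoring the first decode step.
    """
    if prefill_tps <= 0 or decode_tps <= 0:
        raise ValueError("throughputs must be positive")
    return EdgeMetrics(
        ttft_s=prompt_tokens / prefill_tps,
        decode_ms_per_token=1000.0 / decode_tps,
        model_size_gb=model_bytes / 1e9,
    )


def file_size_bytes(path: str) -> int:
    """Size of a model file (e.g. a GGUF file) in bytes."""
    return os.path.getsize(path)


if __name__ == "__main__":
    # Hypothetical numbers, not leaderboard results.
    m = edge_metrics(prefill_tps=64.0, decode_tps=8.0, model_bytes=2_000_000_000)
    print(m)
```

With these hypothetical inputs, a 512-token prompt at 64 tokens/s gives a TTFT of 8.0 s, and 8 tokens/s of decoding gives 125.0 ms per generated token.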

🏃 How to run locally

To run the Edge LLM Leaderboard locally on your machine, follow these steps:

1. Clone the Repository

First, clone the repository to your local machine:

git clone https://huggingface.co/spaces/nyunai/edge-llm-leaderboard
cd edge-llm-leaderboard

2. Install the Required Dependencies

Install the necessary Python packages listed in the requirements.txt file:

pip install -r requirements.txt

3. Run the Application

You can run the Gradio application in one of the following ways:

  • Option 1, using Python: python app.py
  • Option 2, using the Gradio CLI (with hot reload): gradio app.py

4. Access the Application

Once the application is running, you can access it locally in your web browser at http://127.0.0.1:7860/
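If you start the app from a script (for example, in CI), a small polling helper can confirm that the server is answering at this address before you open the browser. The sketch below uses only the Python standard library. It is not part of the repository, and the function names `is_app_up` and `wait_for_app` are hypothetical.

```python
import time
import urllib.error
import urllib.request


def is_app_up(url: str = "http://127.0.0.1:7860/", timeout: float = 2.0) -> bool:
    """Return True if `url` responds with a non-error HTTP status, else False."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 400
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or an HTTP error status.
        return False


def wait_for_app(url: str = "http://127.0.0.1:7860/", attempts: int = 30,
                 delay_s: float = 1.0) -> bool:
    """Poll `url` until it responds or `attempts` run out."""
    for _ in range(attempts):
        if is_app_up(url):
            return True
        time.sleep(delay_s)
    return False
```

For example, call `wait_for_app()` right after launching `python app.py` in the background; it returns True once the Gradio server responds, or False after about 30 seconds.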