---
title: Edge LLM Leaderboard
emoji: 🌖
colorFrom: red
colorTo: blue
sdk: gradio
sdk_version: 5.8.0
app_file: app.py
pinned: true
license: apache-2.0
tags: [edge llm leaderboard, llm edge leaderboard, llm, edge, leaderboard]
---

# Edge LLM Leaderboard

## 📝 About

The Edge LLM Leaderboard gauges the practical performance and quality of edge LLMs. Its aim is to benchmark the performance (throughput and memory) of Large Language Models (LLMs) on edge hardware, starting with a Raspberry Pi 5 (8 GB) based on the ARM Cortex-A76 CPU.

Anyone from the community can request a new base model or edge hardware/backend/optimization configuration for automated benchmarking:

- Model evaluation requests will go live soon; in the meantime, feel free to email arnav[dot]chavan[@]nyunai[dot]com

## ✍️ Details

- To avoid multi-thread discrepancies, all 4 threads are used on the Pi 5.
- LLMs are run with a batch size of 1, a prompt size of 512 tokens, and 128 generated tokens.

All of our throughput benchmarks are run with a single tool, [llama-bench](https://github.com/ggerganov/llama.cpp/tree/master/examples/llama-bench), built on [llama.cpp](https://github.com/ggerganov/llama.cpp), to guarantee reproducibility and consistency.

## 🏆 Ranking Models

We use MMLU (zero-shot) via [llama-perplexity](https://github.com/ggerganov/llama.cpp/tree/master/examples/perplexity) to evaluate model quality, and focus on key performance metrics relevant to edge applications:

1. **Prefill Latency (Time to First Token, TTFT):** Measures the time to generate the first token. Low TTFT ensures a smooth user experience, especially for real-time interactions in edge use cases.
2. **Decode Latency (Generation Speed):** Indicates the speed of generating subsequent tokens, critical for real-time tasks like transcription or extended dialogue sessions.
3. **Model Size:** Smaller models are better suited for edge devices, which have limited secondary storage compared to cloud or GPU systems, making efficient deployment possible.

These metrics collectively address the unique challenges of deploying LLMs on edge devices, balancing performance, responsiveness, and memory constraints.

## 🏃 How to run locally

To run the Edge LLM Leaderboard locally on your machine, follow these steps:

### 1. Clone the Repository

First, clone the repository to your local machine:

```bash
git clone https://huggingface.co/spaces/nyunai/edge-llm-leaderboard
cd edge-llm-leaderboard
```

### 2. Install the Required Dependencies

Install the necessary Python packages listed in the `requirements.txt` file:

```bash
pip install -r requirements.txt
```

### 3. Run the Application

You can run the Gradio application in one of the following ways:

- Option 1: Using Python

  ```bash
  python app.py
  ```

- Option 2: Using the Gradio CLI (includes hot-reload)

  ```bash
  gradio app.py
  ```

### 4. Access the Application

Once the application is running, you can access it locally in your web browser at http://127.0.0.1:7860/
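If you want to reproduce the throughput measurements described in the Details section on your own hardware, they map to a `llama-bench` invocation along these lines. This is a sketch, not the leaderboard's exact harness: the model path is hypothetical, while `-p` (prompt tokens), `-n` (generated tokens), and `-t` (threads) are standard `llama-bench` flags.

```shell
# Hypothetical GGUF model path; adjust to your own quantized model.
# -p 512: prompt size of 512 tokens (prefill)
# -n 128: generate 128 tokens (decode)
# -t 4:   use all 4 cores of the Pi 5's Cortex-A76
./llama-bench -m models/model-q4_0.gguf -p 512 -n 128 -t 4
```

`llama-bench` reports prefill (pp) and generation (tg) throughput in tokens per second, which correspond to the Prefill Latency and Decode Latency metrics above.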