---
title: WhisperKit Benchmarks
emoji: 🏆
colorFrom: green
colorTo: indigo
sdk: gradio
app_file: main.py
license: mit
---

## Prerequisites

Ensure you have the following software installed:

- Python 3.10 or higher
- pip (Python package installer)

## Installation

1. **Clone the repository**:

   ```sh
   git clone https://github.com/argmaxinc/model-performance-dashboard.git
   cd model-performance-dashboard
   ```

2. **Create a virtual environment**:

   ```sh
   python -m venv venv
   source venv/bin/activate
   ```

3. **Install required packages**:

   ```sh
   pip install -r requirements.txt
   ```

## Usage

1. **Run the application**:

   ```sh
   gradio main.py
   ```

2. **Access the application**:

   After running `main.py`, a local server starts and an interface URL is printed in the terminal. Open the URL in your web browser to interact with the Argmax Benchmark dashboard.

## Data Generation

The data generation process involves three main scripts: `performance_generate.py`, `multilingual_generate.py`, and `quality_generate.py`. Each script updates a specific aspect of the benchmark data.

1. **Performance Data Update (`performance_generate.py`)**:
   - Downloads benchmark data from the [WhisperKit Evals Dataset](https://huggingface.co/datasets/argmaxinc/whisperkit-evals-dataset).
   - Processes the data to extract performance metrics for various models, devices, and operating systems.
   - Calculates metrics such as speed and tokens per second for long-form and short-form data.
   - Saves the results in `performance_data.json` and `support_data.csv`.

2. **Multilingual Data Update (`multilingual_generate.py`)**:
   - Downloads multilingual evaluation data from the [WhisperKit Multilingual Evals Dataset](https://huggingface.co/datasets/argmaxinc/whisperkit-evals-multilingual).
   - Processes the data to generate confusion matrices for language detection (illustrated at the end of this README).
   - Calculates metrics for both forced and unforced language detection scenarios.
   - Saves the results in `multilingual_confusion_matrices.json` and `multilingual_results.csv`.

3. **Quality Data Update (`quality_generate.py`)**:
   - Downloads quality evaluation data from [WhisperKit Evals](https://huggingface.co/datasets/argmaxinc/whisperkit-evals).
   - Processes the data to calculate Word Error Rate (WER) and Quality of Inference (QoI) metrics for each dataset (a short WER illustration appears at the end of this README).
   - Saves the results in `quality_data.json`.

## Data Update

To update the dashboard with the latest data from our Hugging Face datasets, run:

```sh
make use-huggingface-data
```

Alternatively, you can use our on-device testing code [TODO:INSERT_LINK_TO_OS_TEST_CODE] on your device to update the dashboard with your own data. After generating the Xcode data, place the resulting `.json` files in the `whisperkit-evals/xcresults/benchmark_data` directory, then run:

```sh
make use-local-data
```
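
Whichever update path you use, it can help to sanity-check the generated files before relaunching the dashboard. The sketch below is illustrative only: it assumes the JSON outputs listed above sit in the current working directory, which may differ in your checkout.

```python
import json
from pathlib import Path

# File names come from the generation scripts described above; their
# location relative to the repo root is an assumption, so adjust as needed.
GENERATED_FILES = [
    Path("performance_data.json"),
    Path("multilingual_confusion_matrices.json"),
    Path("quality_data.json"),
]

for path in GENERATED_FILES:
    if not path.exists():
        print(f"missing: {path} (run the matching *_generate.py script)")
        continue
    with path.open() as f:
        data = json.load(f)
    # The exact schema is defined by the generation scripts; this only
    # reports top-level structure as a quick sanity check.
    if isinstance(data, dict):
        print(f"{path}: dict with {len(data)} top-level keys")
    elif isinstance(data, list):
        print(f"{path}: list with {len(data)} entries")
    else:
        print(f"{path}: {type(data).__name__}")
```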
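
For reference, the Word Error Rate reported by `quality_generate.py` is the word-level edit distance (substitutions + deletions + insertions) between a reference transcript and the model's hypothesis, divided by the number of reference words. The snippet below only illustrates the metric using the `jiwer` package; it is not the implementation used by the script, and the transcripts are made up.

```python
import jiwer  # pip install jiwer

reference = "the quick brown fox jumps over the lazy dog"   # ground-truth transcript
hypothesis = "the quick brown fox jumped over a lazy dog"   # hypothetical model output

# Two substitutions ("jumps" -> "jumped", "the" -> "a") out of nine
# reference words, so the expected WER is 2/9 ~= 0.222.
print(f"WER: {jiwer.wer(reference, hypothesis):.3f}")
```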
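
Similarly, the language-detection confusion matrices built by `multilingual_generate.py` count, for each true language, how often each language was detected. The sketch below uses scikit-learn with made-up labels and results; the script's actual inputs and language set are not shown here.

```python
from sklearn.metrics import confusion_matrix

# Hypothetical detection results: parallel lists of (true, detected) languages.
true_langs = ["en", "en", "de", "de", "fr", "fr", "fr"]
detected_langs = ["en", "en", "de", "en", "fr", "fr", "de"]

labels = ["en", "de", "fr"]
# Rows are true languages, columns are detected languages.
matrix = confusion_matrix(true_langs, detected_langs, labels=labels)
for lang, row in zip(labels, matrix):
    print(lang, row)
```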