	---
	title: GPU Poor LLM Arena
	emoji: 🏆
	colorFrom: blue
	colorTo: purple
	sdk: gradio
	sdk_version: 5.1.0
	app_file: app.py
	pinned: false
	license: mit
	short_description: 'Compact LLM Battle Arena: Frugal AI Face-Off!'
	---

	# 🏆 GPU-Poor LLM Gladiator Arena 🏆

	Welcome to the GPU-Poor LLM Gladiator Arena, where frugal meets fabulous in the world of AI! This project pits compact language models (maxing out at 9B parameters) against each other in a battle of wits and words.

	## 🤔 Starting from "Why?"

	In recent months, we've seen a lot of these "tiny" models released, and some of them are genuinely impressive.

	- Gradio Exploration: This project is my playground for experimenting with Gradio app development and learning how to build interactive AI interfaces.

	- Tiny Model Evaluation: I wanted to develop a personal (and now public) stats system for evaluating tiny language models. It's not too serious, but it provides valuable insights into the capabilities of these compact powerhouses.

	- Accessibility: Built on Ollama, this arena allows pretty much anyone to experiment with these models themselves. No need for expensive GPUs or cloud services!

	- Pure Fun: At its core, this project is about having fun with AI. It's a lighthearted way to explore and compare different models. So, haters, feel free to chill: we're just here for a good time!


	## 🌟 Features

	- Battle Arena: Pit two mystery models against each other and decide which pint-sized powerhouse reigns supreme.
	- Leaderboard: Track the performance of different models over time with a score that weighs win rate against the number of battles fought.
	- Performance Chart: Visualize model performance with interactive charts.
	- Privacy-Focused: Uses a local Ollama API, avoiding pricey commercial APIs and keeping data close to home.
	- Customizable: Easy to add new models and prompts.

	## 🚀 Getting Started

	### Prerequisites

	- Python 3.10+ (required by Gradio 5.x; this Space pins `sdk_version: 5.1.0`)
	- Gradio
	- Plotly
	- Requests
	- Ollama (running locally or on a remote server)

	### Installation

	1. Clone the repository:
	```
	git clone https://huggingface.co/spaces/k-mktr/gpu-poor-llm-arena.git
	cd gpu-poor-llm-arena
	```

	2. Install the required packages:
	```
	pip install gradio plotly requests
	```

	3. Make sure an Ollama server is running, either locally or on a remote host (configurable via `API_URL` in `arena_config.py`).

	4. Run the application:
	```
	python app.py
	```
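
Before launching the app, it can help to confirm that the Ollama server from step 3 is reachable and has models pulled. The sketch below queries Ollama's native `GET /api/tags` endpoint directly, independent of how `app.py` talks to the server. It assumes Ollama's default local address (`http://localhost:11434`); `list_local_models` and `parse_model_names` are illustrative helper names, not part of this project.

```python
import json
import urllib.request

# Assumption: Ollama's default local address; change it if your server runs elsewhere.
OLLAMA_URL = "http://localhost:11434"


def parse_model_names(payload: str) -> list[str]:
    """Extract model names from the JSON body returned by Ollama's /api/tags endpoint."""
    data = json.loads(payload)
    return [model["name"] for model in data.get("models", [])]


def list_local_models(base_url: str = OLLAMA_URL, timeout: float = 5.0) -> list[str]:
    """Return the names of models the server has pulled; raises OSError if unreachable."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
        return parse_model_names(resp.read().decode("utf-8"))


if __name__ == "__main__":
    try:
        print("Ollama is up. Models:", list_local_models())
    except OSError as err:  # urllib's URLError is a subclass of OSError
        print("Could not reach Ollama:", err)
```

An empty model list means the server is running but you still need to `ollama pull` some compact models.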

	## 🎮 How to Use

	1. Open the application in your web browser (typically at `http://localhost:7860`).
	2. In the "Battle Arena" tab:
	   - Enter a prompt or use the random prompt generator (🎲 button).
	   - Click "Generate Responses" to see outputs from two random models.
	   - Vote for the better response.
	3. Check the "Leaderboard" tab to see overall model performance.
	4. View the "Performance Chart" tab for a visual representation of model wins and losses.
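
The battle flow above can be sketched roughly as follows. This is a hypothetical illustration, not the code in `app.py`: `MODELS` and `PROMPTS` stand in for `APPROVED_MODELS` and `example_prompts`, the model names assume Ollama-style tags, and `random_prompt`, `pick_contenders`, and `resolve_vote` are made-up helper names.

```python
import random

# Hypothetical stand-ins for APPROVED_MODELS and example_prompts in arena_config.py.
MODELS = ["llama3.2:1b", "gemma2:2b", "qwen2.5:1.5b", "phi3.5:3.8b"]
PROMPTS = ["Explain recursion in two sentences.", "Write a haiku about a tiny GPU."]


def random_prompt(prompts=PROMPTS, rng=random):
    """Roughly what the 🎲 button does: draw one prompt at random."""
    return rng.choice(prompts)


def pick_contenders(models=MODELS, rng=random):
    """Draw two distinct models; the UI shows them only as 'A' and 'B'."""
    if len(models) < 2:
        raise ValueError("a battle needs at least two models")
    model_a, model_b = rng.sample(models, 2)
    return model_a, model_b


def resolve_vote(contenders, vote):
    """Turn a vote for 'A' or 'B' into a (winner, loser) pair."""
    model_a, model_b = contenders
    if vote == "A":
        return model_a, model_b
    if vote == "B":
        return model_b, model_a
    raise ValueError("vote must be 'A' or 'B'")


if __name__ == "__main__":
    rng = random.Random(7)  # seeded so the demo is repeatable
    print("Prompt:", random_prompt(rng=rng))
    winner, loser = resolve_vote(pick_contenders(rng=rng), "A")
    print(f"Winner: {winner}, loser: {loser}")
```

Sampling without replacement (`rng.sample`) guarantees a model never battles itself.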

	## 🛠 Configuration

	You can customize the arena by modifying the `arena_config.py` file:

	- Add or remove models from the `APPROVED_MODELS` list.
	- Adjust the `API_URL` and `API_KEY` if needed.
	- Customize `example_prompts` for more variety in random prompts.
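
For orientation, a minimal `arena_config.py` might look like the sketch below. Only the names `APPROVED_MODELS`, `API_URL`, `API_KEY`, and `example_prompts` come from this README; the values, the plain list-of-strings structure, and the specific Ollama tags are assumptions, and `http://localhost:11434` is simply Ollama's default local address. Check the real file before copying anything.

```python
# arena_config.py: hypothetical sketch; the real file's structure and values may differ.

# Assumption: models are listed by their Ollama tags.
APPROVED_MODELS = [
    "llama3.2:1b",
    "llama3.2:3b",
    "gemma2:2b",
    "qwen2.5:1.5b",
]

# Assumption: a local Ollama server on its default port.
API_URL = "http://localhost:11434"
API_KEY = ""  # assumed unnecessary for a plain local server; set it if a proxy requires one

# Prompts the random prompt generator can draw from.
example_prompts = [
    "Explain recursion to a ten-year-old.",
    "Write a haiku about a GPU-poor developer.",
    "What are the pros and cons of remote work?",
]
```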

	## 📊 Leaderboard

	The leaderboard data is stored in `leaderboard.json`. This file is automatically updated after each battle.
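
The README doesn't document the layout of `leaderboard.json`, so the following is only a sketch of how a per-battle update could work. It assumes a hypothetical schema mapping each model name to its `wins` and `losses` counts; `load_leaderboard`, `record_battle`, and `save_leaderboard` are illustrative names.

```python
import json
from pathlib import Path

# Hypothetical schema: {"model-name": {"wins": int, "losses": int}, ...}
LEADERBOARD_PATH = Path("leaderboard.json")


def load_leaderboard(path: Path = LEADERBOARD_PATH) -> dict:
    """Read the leaderboard, starting empty if the file doesn't exist yet."""
    if not path.exists():
        return {}
    return json.loads(path.read_text(encoding="utf-8"))


def record_battle(leaderboard: dict, winner: str, loser: str) -> dict:
    """Add one win to the winner and one loss to the loser, in place."""
    for model in (winner, loser):
        leaderboard.setdefault(model, {"wins": 0, "losses": 0})
    leaderboard[winner]["wins"] += 1
    leaderboard[loser]["losses"] += 1
    return leaderboard


def save_leaderboard(leaderboard: dict, path: Path = LEADERBOARD_PATH) -> None:
    """Write the updated leaderboard back to disk as pretty-printed JSON."""
    path.write_text(json.dumps(leaderboard, indent=2), encoding="utf-8")
```

A vote handler would then call `load_leaderboard`, `record_battle`, and `save_leaderboard` in sequence.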

	### Scoring System

	We rank models with a score that accounts for both win rate and experience:

	1. We calculate a score for each model using the formula:
	```
	score = win_rate * (1 - 1 / (total_battles + 1))
	```
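	This formula balances win rate against the number of battles: the factor `1 - 1 / (total_battles + 1)` is 0.5 after a single battle and approaches 1 as battles accumulate, so a high win rate earned over only a few battles is discounted.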

	2. We sort the results primarily by this score and secondarily by the total number of battles, so models with equal scores are ordered by experience (more battles rank higher).

	3. The leaderboard displays this calculated score alongside wins, losses, and other statistics.

	4. The ranking is based on this score rather than on the raw number of wins.
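
Assuming `win_rate` is `wins / total_battles` (for at least one battle), the formula collapses to a simpler form that is easy to check by hand. Writing $w$ for wins and $n$ for `total_battles`:

$$
\text{score} = \frac{w}{n}\left(1 - \frac{1}{n+1}\right) = \frac{w}{n}\cdot\frac{n}{n+1} = \frac{w}{n+1}
$$

In other words, every model is scored as if it had one extra loss. A model with a single win in a single battle scores $1/2 = 0.5$, while a model with 15 wins in 20 battles scores $15/21 \approx 0.714$ and ranks higher despite its lower win rate (0.75 vs. 1.0).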

	This approach provides a fairer ranking system that considers both performance (win rate) and experience (total battles). Models that maintain a high win rate over many battles will be ranked higher than those with fewer battles or lower win rates.
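
Putting the four steps together, here is a minimal Python sketch of the ranking. `ModelStats` and `rank` are hypothetical names, and a model with zero battles is assumed to have a win rate of 0.

```python
from dataclasses import dataclass


@dataclass
class ModelStats:
    name: str
    wins: int
    losses: int

    @property
    def total_battles(self) -> int:
        return self.wins + self.losses

    @property
    def win_rate(self) -> float:
        # Assumption: a model that hasn't battled yet gets a win rate of 0.
        return self.wins / self.total_battles if self.total_battles else 0.0

    @property
    def score(self) -> float:
        # score = win_rate * (1 - 1 / (total_battles + 1)), as described above
        return self.win_rate * (1 - 1 / (self.total_battles + 1))


def rank(models: list[ModelStats]) -> list[ModelStats]:
    """Sort by score, then by total battles, both descending."""
    return sorted(models, key=lambda m: (m.score, m.total_battles), reverse=True)


if __name__ == "__main__":
    board = [
        ModelStats("model-a", wins=1, losses=0),
        ModelStats("model-b", wins=15, losses=5),
        ModelStats("model-c", wins=3, losses=3),
    ]
    for position, m in enumerate(rank(board), start=1):
        print(f"{position}. {m.name}: score={m.score:.3f} ({m.wins}W/{m.losses}L)")
```

With this sample data, `model-b` (15W/5L) ranks first, ahead of `model-a` (1W/0L), because a single lucky win is heavily discounted; `model-c` (3W/3L) comes last.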

	## 🤖 Models

	The arena currently supports various compact models, including:

	- LLaMA 3.2 (1B and 3B versions)
	- LLaMA 3.1 (8B version)
	- Gemma 2 (2B and 9B versions)
	- Qwen 2.5 (0.5B, 1.5B, 3B, and 7B versions)
	- Mistral 0.3 (7B version)
	- Phi 3.5 (3.8B version)
	- Hermes 3 (8B version)
	- Aya 23 (8B version)

	## 🤝 Contributing

	Contributions are welcome! Feel free to suggest any compact model that Ollama supports; some results have already been quite surprising.

	## 📜 License

	This project is open-source and available under the MIT License.

	## 🙏 Acknowledgements

	- Thanks to the Ollama team for such an amazing tool.
	- Shoutout to all the AI researchers and compact language model teams for making this frugal AI arena possible!

	Enjoy the battles in the GPU-Poor LLM Gladiator Arena! May the best compact model win! 🏆