Spaces: junior-labs/llm-colosseum

+import gradio as gr
+import pandas as pd
+# Load the results; skipinitialspace drops the space that follows each comma in the CSV
+elo_df = pd.read_csv("elo-20240326.csv", skipinitialspace=True)
+text_description = """
+# 🤼 LLM Colosseum Leaderboard
+LLM Colosseum is a new way to assess the relative performance of LLMs. We have them play Street Fighter III against each other, and we use the results to calculate their Elo ratings.
+Watch a demo of LLMs playing Street Fighter III [here](https://youtu.be/Kk8foX3dm2I).
+More info in the LLM Colosseum GitHub [repository](https://github.com/OpenGenerativeAI/llm-colosseum).
+"""
+with gr.Blocks(
+    title="LLM Colosseum Leaderboard",
+) as demo:
+    gr.Markdown(text_description)
+    gr.Dataframe(value=elo_df, interactive=False)
+demo.launch()
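
The leaderboard description says match results are used to calculate Elo ratings, but this commit only loads precomputed numbers and does not show the calculation. As a rough illustration, here is a minimal sketch of a standard Elo update for a single match. The K-factor of 32, the 400-point scale, and the function names `expected_score` and `update_elo` are assumptions for illustration, not taken from the LLM Colosseum code.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that player A beats player B under the standard Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))


def update_elo(
    rating_a: float,
    rating_b: float,
    score_a: float,
    k: float = 32.0,  # assumed K-factor; the project may use a different value
) -> tuple[float, float]:
    """Return the new ratings after one match.

    score_a is 1.0 if A won, 0.0 if A lost, and 0.5 for a draw.
    """
    exp_a = expected_score(rating_a, rating_b)
    exp_b = 1.0 - exp_a
    new_a = rating_a + k * (score_a - exp_a)
    new_b = rating_b + k * ((1.0 - score_a) - exp_b)
    return new_a, new_b


if __name__ == "__main__":
    # Two evenly rated models; A wins the match.
    print(update_elo(1500, 1500, 1.0))
    # Expected score of the top-rated model in the CSV against the lowest-rated one.
    print(expected_score(1776, 1231))
```

Under these assumptions, the rating gap between 1776 and 1231 corresponds to an expected score of about 0.96 for the higher-rated model, so an upset by the lower-rated model would move both ratings much more than the expected result would.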

elo-20240326.csv ADDED

+Model,Organization,Colosseum Elo
+gpt-3.5-turbo-0125, openai,1776
+mistral-small-latest, mistral,1586
+gpt-4-1106-preview, openai,1585
+gpt-4, openai,1517
+gpt-4-turbo-preview, openai,1509
+gpt-4-0125-preview, openai,1439
+mistral-medium-latest, mistral,1356
+mistral-large-latest, mistral,1231
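
Each row in this file has a space after the comma that precedes the organization name. With pandas' default parsing that space becomes part of the value, which can matter for later filtering or grouping by organization. The sketch below uses two rows copied from the file in an in-memory buffer (so it runs without the CSV on disk) to show the difference that `skipinitialspace=True` makes.

```python
import io

import pandas as pd

# Two rows copied from elo-20240326.csv, kept in memory for the example.
SAMPLE = """Model,Organization,Colosseum Elo
gpt-3.5-turbo-0125, openai,1776
mistral-small-latest, mistral,1586
"""

# Default parsing keeps the leading space in the Organization column.
raw = pd.read_csv(io.StringIO(SAMPLE))

# skipinitialspace=True drops whitespace that directly follows a delimiter.
clean = pd.read_csv(io.StringIO(SAMPLE), skipinitialspace=True)

if __name__ == "__main__":
    print(raw["Organization"].tolist())
    print(clean["Organization"].tolist())
```

An alternative with the same effect is to strip the column after loading, for example with `df["Organization"].str.strip()`.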

requirements.txt ADDED

+aiofiles==23.2.1
+altair==5.2.0
+annotated-types==0.6.0
+anyio==4.3.0
+attrs==23.2.0
+certifi==2024.2.2
+charset-normalizer==3.3.2
+click==8.1.7
+colorama==0.4.6
+contourpy==1.2.0
+cycler==0.12.1
+fastapi==0.110.0
+ffmpy==0.3.2
+filelock==3.13.3
+fonttools==4.50.0
+fsspec==2024.3.1
+gradio==4.23.0
+gradio_client==0.14.0
+h11==0.14.0
+httpcore==1.0.4
+httpx==0.27.0
+huggingface-hub==0.22.1
+idna==3.6
+importlib_resources==6.4.0
+Jinja2==3.1.3
+jsonschema==4.21.1
+jsonschema-specifications==2023.12.1
+kiwisolver==1.4.5
+markdown-it-py==3.0.0
+MarkupSafe==2.1.5
+matplotlib==3.8.3
+mdurl==0.1.2
+numpy==1.26.4
+orjson==3.9.15
+packaging==24.0
+pandas==2.2.1
+pillow==10.2.0
+pydantic==2.6.4
+pydantic_core==2.16.3
+pydub==0.25.1
+Pygments==2.17.2
+pyparsing==3.1.2
+python-dateutil==2.9.0.post0
+python-multipart==0.0.9
+pytz==2024.1
+PyYAML==6.0.1
+referencing==0.34.0
+requests==2.31.0
+rich==13.7.1
+rpds-py==0.18.0
+ruff==0.3.4
+semantic-version==2.10.0
+shellingham==1.5.4
+six==1.16.0
+sniffio==1.3.1
+starlette==0.36.3
+tomlkit==0.12.0
+toolz==0.12.1
+tqdm==4.66.2
+typer==0.10.0
+typing_extensions==4.10.0
+tzdata==2024.1
+urllib3==2.2.1
+uvicorn==0.29.0
+websockets==11.0.3