SE-Arena/Software-Engineering-Arena

zhiminy committed 28 days ago

Commit ea8b371 (1 parent: ab10f2f)

add grok models

Files changed (2)

app.py

@@ -519,7 +519,7 @@ with gr.Blocks() as app:
             # ⚔️ Software Engineering (SE) Arena: Explore and Test the Best SE Chatbots with Long-Context Interactions
             ## 📜How It Works
-            - **Blind Comparison**: Submit a SE-related query to two anonymous chatbots randomly selected from up to {len(available_models)} top models, including OpenAI-o3, Gemini-2.0, Claude-3.5, Deepseek-r1, Mistral-large, Llama-3.3, Qwen-2.5, and others.
+            - **Blind Comparison**: Submit a SE-related query to two anonymous chatbots randomly selected from up to {len(available_models)} top models, including OpenAI-o3, Grok-2, Gemini-2.0, Claude-3.7, Deepseek-r1, Mistral-large, Llama-3.3, Qwen-2.5, and others.
             - **Interactive Voting**: Engage in multi-turn dialogues with both chatbots and compare their responses. You can continue the conversation until you confidently choose the better model.
             - **Fair Play Rules**: Votes are counted only if chatbot identities remain anonymous. Revealing a chatbot's identity disqualifies the session.
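
The blind comparison described in the changed line could be sketched roughly as below. This is an illustration only, not the code in app.py: `pick_anonymous_pair` is a hypothetical helper, the "Model A"/"Model B" labels are assumed, and the short model list is made up from names that appear in this diff.

```python
import random


def pick_anonymous_pair(available_models, rng=random):
    """Pick two distinct models and hide them behind neutral labels.

    Hypothetical helper: the real selection logic in app.py may differ.
    """
    if len(available_models) < 2:
        raise ValueError("need at least two models for a comparison")
    # sample() draws without replacement, so the two models always differ.
    left, right = rng.sample(available_models, 2)
    return {"Model A": left, "Model B": right}


# Illustrative list only; the actual available_models is built elsewhere.
available_models = ["grok-2-1212", "claude-3-7-sonnet-latest", "deepseek-r1"]
pair = pick_anonymous_pair(available_models)
```

Only the neutral labels would be shown to the user while voting; the mapping back to real model names stays on the server side until the vote is cast.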

context_window.json

@@ -5,13 +5,15 @@
     "gpt-4o-mini": 128000,
     "claude-3-5-haiku-20241022" : 200000,
     "claude-3-5-sonnet-20241022" : 200000,
-    "claude-3-opus-20240229" : 200000,
+    "claude-3-7-sonnet-latest": 200000,
+    "claude-3-opus-20240229": 200000,
     "deepseek-chat": 64000,
     "deepseek-r1": 64000,
     "gemini-1.5-flash": 1048576,
     "gemini-1.5-pro": 2097152,
     "gemini-2.0-flash-lite-preview": 1048576,
     "gemini-2.0-pro-exp": 2097152,
+    "grok-2-1212": 131072,
     "llama-3.1-8b": 128000,
     "llama-3.1-405b": 128000,
     "llama-3.1-70b": 128000,
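
A mapping like this one is typically consulted before a request is sent, to check that a conversation still fits the target model. The sketch below assumes that use; `fits_context` is a hypothetical helper, the token count is assumed to come from some tokenizer not shown here, and the inline JSON is a three-entry excerpt of the values in this diff rather than the full file.

```python
import json

# Excerpt of the mapping shown in this diff; the real file has more entries.
CONTEXT_WINDOW_JSON = """
{
    "claude-3-7-sonnet-latest": 200000,
    "deepseek-r1": 64000,
    "grok-2-1212": 131072
}
"""


def fits_context(model, token_count, context_window):
    """Return True if token_count is within the model's context window.

    Hypothetical helper: how app.py actually uses the mapping is not
    shown in this commit.
    """
    if model not in context_window:
        raise KeyError(f"no context window recorded for {model!r}")
    return token_count <= context_window[model]


context_window = json.loads(CONTEXT_WINDOW_JSON)
```

For example, `fits_context("deepseek-r1", 70000, context_window)` is false under these values, since 70000 exceeds the 64000-token entry.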