Update common.py
common.py CHANGED
@@ -47,18 +47,34 @@ EVAL_DESCRIPTION = """
 - Examples (Optional)
 """
 
-DEFAULT_EVAL_PROMPT = """You are assessing a chat bot response to a user's input
+DEFAULT_EVAL_PROMPT = """You are assessing a chat bot response to a user's input. Your evaluation should focus on the helpfulness of the response given the user's instructions. Do not allow the length of the response to influence your evaluation. Be objective as possible and give a brief explanation for your score.
 
-
-Score 1: The response
-Score 2: The response
-Score 3: The response
-Score 4: The response
-Score 5: The response
+Scoring Rubric:
+Score 1: The response is unhelpful, providing irrelevant or incorrect content that does not address the request.
+Score 2: The response is partially helpful, missing key elements or including minor inaccuracies, and lacks depth in addressing the request.
+Score 3: The response is adequately helpful, correctly addressing the main request with relevant information and some depth.
+Score 4: The response is very helpful, addressing the request thoroughly with accurate and detailed content, but may lack a minor aspect of helpfulness.
+Score 5: The response is exceptionally helpful, providing precise, comprehensive content that fully resolves the request with insight and clarity.
 
 [User Query]: {{input}}
 
-[Response]: {{response}}"""
+[AI Response]: {{response}}"""
+
+# Split the eval prompt into editable and fixed parts
+DEFAULT_EVAL_PROMPT_EDITABLE = """You are assessing a chat bot response to a user's input. Your evaluation should focus on the helpfulness of the response given the user's instructions. Do not allow the length of the response to influence your evaluation. Be objective as possible and give a brief explanation for your score.
+
+Scoring Rubric:
+Score 1: The response is unhelpful, providing irrelevant or incorrect content that does not address the request.
+Score 2: The response is partially helpful, missing key elements or including minor inaccuracies, and lacks depth in addressing the request.
+Score 3: The response is adequately helpful, correctly addressing the main request with relevant information and some depth.
+Score 4: The response is very helpful, addressing the request thoroughly with accurate and detailed content, but may lack a minor aspect of helpfulness.
+Score 5: The response is exceptionally helpful, providing precise, comprehensive content that fully resolves the request with insight and clarity."""
+
+# Fixed suffix that will always be appended
+FIXED_EVAL_SUFFIX = """
+[User Query]: {{input}}
+
+[AI Response]: {{response}}"""
 
 # Default Variable Values
 DEFAULT_INPUT = """Which of these animals is least likely to be found in a rainforest?"
@@ -127,19 +143,24 @@ Judge Arena is specifically designed to assess AI models that function as evaluators
 # FAQ
 
 **Isn't this the same as Chatbot Arena?**
+
 We are big fans of what the LMSYS team have done with Chatbot Arena and fully credit them for the inspiration to develop this. We were looking for a dynamic leaderboard that graded on AI judge capabilities and didn't manage to find one, so we created Judge Arena. This UI is designed especially for evals; to match the format of the model-based eval prompts that you would use in your LLM evaluation / monitoring tool.
 
 **Why should I trust this leaderboard?**
-
+
+We have listed out our efforts to be fully transparent in the policies above. All of the code for this leaderboard is open-source and can be found on our [Github](https://github.com/atla-ai/judge-arena). Check out our [blog](https://www.atla-ai.com/blog) to stay up to date as we analyse the results from the leaderboard.
 
 **Who funds this effort?**
+
 Atla currently funds this out of our own pocket. We are looking for API credits (with no strings attached) to support this effort - please get in touch if you or someone you know might be able to help.
 
 **What is Atla working on?**
+
 We are training a general-purpose evaluator that you will soon be able to run in this Judge Arena. Our next step will be to open-source a powerful model that the community can use to run fast and accurate evaluations.
 <br><br>
 # Get in touch
-
+We’d love to hear your feedback! For general feature requests or to submit / suggest new models to add to the arena, please open up a discussion in the [community](https://huggingface.co/spaces/AtlaAI/judge-arena/discussions) tab. You can also contact us directly on [X](https://x.com/Atla_AI) or [Discord](https://discord.gg/yNpUAMqs).
+\nPlease file any issues on our [Github](https://github.com/atla-ai/judge-arena)."""
 
 
 
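The commit splits the default eval prompt into an editable body (`DEFAULT_EVAL_PROMPT_EDITABLE`) and a fixed suffix (`FIXED_EVAL_SUFFIX`) that carries the `{{input}}` / `{{response}}` placeholders, but the diff itself does not show how the two pieces are recombined at evaluation time. The sketch below is one plausible assembly under that assumption: the suffix is appended to the (possibly user-edited) body and the placeholders are filled by plain string replacement. The `build_eval_prompt` helper and the substitution step are illustrative assumptions, not part of the commit.

```python
# Illustrative sketch only -- this helper is NOT part of the commit.
# The constant names and {{input}} / {{response}} placeholders come from the
# diff above; the string bodies are abridged here for brevity.

DEFAULT_EVAL_PROMPT_EDITABLE = """You are assessing a chat bot response to a user's input. ...

Scoring Rubric:
Score 1: ...
Score 5: ..."""

FIXED_EVAL_SUFFIX = """
[User Query]: {{input}}

[AI Response]: {{response}}"""


def build_eval_prompt(editable_body: str, user_input: str, response: str) -> str:
    """Append the always-present suffix to the (possibly user-edited) prompt body,
    then fill the {{input}} / {{response}} placeholders via plain string replacement
    (an assumption; the app may use a different templating step)."""
    prompt = editable_body + FIXED_EVAL_SUFFIX
    return prompt.replace("{{input}}", user_input).replace("{{response}}", response)


if __name__ == "__main__":
    print(build_eval_prompt(
        DEFAULT_EVAL_PROMPT_EDITABLE,
        "Which of these animals is least likely to be found in a rainforest?",
        "A polar bear, since it lives on Arctic sea ice rather than in rainforests.",
    ))
```

Keeping the query/response scaffold in a fixed suffix that is always appended lets users edit the grading criteria freely without breaking the placeholders the arena substitutes at runtime.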