kaikaidai committed on
Commit 08422fa · verified · 1 Parent(s): 0f79b0c

Removed prompt tags in Atla template
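
The removed tags are the Llama 3 style header and end-of-turn markers that were hard-coded at the start and end of ATLA_PROMPT. If a caller still needs them (for example, for a raw completion endpoint), one possible approach is to add them at request time. The sketch below is illustrative only: `wrap_with_chat_tags` is a hypothetical helper that is not part of this commit, and the tag strings are copied from the removed lines.

```python
# Tag strings copied from the lines this commit removed from ATLA_PROMPT.
USER_PREFIX = "<|begin_of_text|><|start_header_id|>user<|end_header_id|> "
ASSISTANT_SUFFIX = "\n<|eot_id|><|start_header_id|>assistant<|end_header_id|>"


def wrap_with_chat_tags(prompt: str) -> str:
    """Hypothetical helper: re-add the chat tags around an already formatted prompt."""
    return f"{USER_PREFIX}{prompt}{ASSISTANT_SUFFIX}"


# Stand-in for the tail of the tag-free template (the real one is much longer).
plain_prompt = "Score 5: {score5_desc}".format(score5_desc="Fully correct.")
wrapped = wrap_with_chat_tags(plain_prompt)
```

Keeping the template free of tags lets the same text be sent as an ordinary user message, with the wrapping applied only where a backend requires it.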

Files changed (1)
  1. prompts.py +57 -58
prompts.py CHANGED
@@ -90,64 +90,10 @@ Score 5: {score5_desc}
90
  ###Feedback:
91
  """
92
 
93
- # Define the Flow Judge prompt
94
- FLOW_JUDGE_PROMPT = """# GOAL
95
- Your job is to evaluate a task carried out by an AI system powered by a large \
96
- language model.
97
-
98
- You will be provided with the inputs and output of the task, as well as the evaluation criteria \
99
- and scoring rubric. Your task is to evaluate the output of the AI system based on the evaluation \
100
- criteria and scoring rubric provided.
101
-
102
- # INPUT
103
- Below are the inputs required for performing the task:
104
- <inputs>
105
- {INPUTS}
106
- </inputs>
107
-
108
- # OUTPUT
109
- Below is the output of the task:
110
- <output>
111
- {OUTPUT}
112
- </output>
113
-
114
- # EVALUATION CRITERIA AND SCORING RUBRIC
115
- Here are the evaluation criteria and the rubric that you need to use for evaluating the task:
116
- <evaluation_criteria>
117
- {EVALUATION_CRITERIA}
118
- </evaluation_criteria>
119
-
120
- <scoring_rubric>
121
- {RUBRIC}
122
- </scoring_rubric>
123
-
124
- # INSTRUCTIONS FOR THE EVALUATION
125
- 1. Understand the task and criteria: Familiarize yourself with the task to be evaluated. \
126
- Review the evaluation criteria and scoring rubric to understand the different levels of \
127
- performance and the descriptions for each score.
128
- 2. Review the inputs and output: Look at the inputs provided for the task. Examine the output \
129
- generated from completing the task.
130
- 3. Compare output to score descriptions: Compare the output against the criteria and score \
131
- descriptions in the scoring rubric. For each criterion,decide which description best matches the \
132
- output.
133
- 4. After comparing the output to the score descriptions, pay attention to the small details that \
134
- might impact the final score that you assign. Sometimes a small difference can dictate the final \
135
- score.
136
- 5. Write verbal feedback justifying your evaluation that includes a detailed rationale, referring \
137
- to specific aspects of the output and comparing them to the rubric.
138
- 6. Assign a final score based on the scoring rubric.
139
-
140
- ## FORMAT FOR THE EVALUATION
141
- - Write the verbal feedback inside <feedback> tags without any additional surrounding text.
142
- - Write the numeric score inside <score> tags, without any additional surrounding text and always \
143
- after the feedback.
144
-
145
- Please accurately evaluate the task. Strictly adhere to the evaluation criteria and rubric."""
146
-
147
  # Judge system prompt for non-Prometheus models
148
  JUDGE_SYSTEM_PROMPT = """Please act as an impartial judge and evaluate based on the user's instruction. Your output format should strictly adhere to JSON as follows: {"feedback": "<write feedback>", "result": <numerical score>}. Ensure the output is valid JSON, without additional formatting or explanations."""
149
 
150
- ATLA_PROMPT = """<|begin_of_text|><|start_header_id|>user<|end_header_id|> You are tasked with evaluating a response based on a given instruction (which may contain an Input) and a scoring rubric that serve as the evaluation standard. Provide a comprehensive feedback on the response quality strictly adhering to the scoring rubric, without any general evaluation. Follow this with a score between 1 and 5, referring to the scoring rubric. Avoid generating any additional opening, closing, or explanations.
151
  Here are some rules of the evaluation:
152
  (1) You should prioritize evaluating whether the response satisfies the provided rubric. The basis of your score should depend exactly on the rubric. However, the response does not need to explicitly address points raised in the rubric. Rather, evaluate the response based on the criteria outlined in the rubric.
153
 
@@ -174,8 +120,7 @@ ATLA_PROMPT = """<|begin_of_text|><|start_header_id|>user<|end_header_id|> You a
174
  Score 2: {score2_desc}
175
  Score 3: {score3_desc}
176
  Score 4: {score4_desc}
177
- Score 5: {score5_desc}
178
- <|eot_id|><|start_header_id|>assistant<|end_header_id|>"""
179
 
180
  ATLA_PROMPT_WITH_REFERENCE = """You are tasked with evaluating a response based on a given instruction (which may contain an Input) and a scoring rubric and reference answer that serve as the evaluation standard. Provide a comprehensive feedback on the response quality strictly adhering to the scoring rubric, without any general evaluation. Follow this with a score between 1 and 5, referring to the scoring rubric. Avoid generating any additional opening, closing, or explanations.
181
 
@@ -208,4 +153,58 @@ ATLA_PROMPT_WITH_REFERENCE = """You are tasked with evaluating a response based
208
  Score 5: {score5_desc}
209
 
210
  Reference answer:
211
- {ground_truth_input}"""
90
  ###Feedback:
91
  """
92
 
93
  # Judge system prompt for non-Prometheus models
94
  JUDGE_SYSTEM_PROMPT = """Please act as an impartial judge and evaluate based on the user's instruction. Your output format should strictly adhere to JSON as follows: {"feedback": "<write feedback>", "result": <numerical score>}. Ensure the output is valid JSON, without additional formatting or explanations."""
95
 
96
+ ATLA_PROMPT = """You are tasked with evaluating a response based on a given instruction (which may contain an Input) and a scoring rubric that serve as the evaluation standard. Provide a comprehensive feedback on the response quality strictly adhering to the scoring rubric, without any general evaluation. Follow this with a score between 1 and 5, referring to the scoring rubric. Avoid generating any additional opening, closing, or explanations.
97
  Here are some rules of the evaluation:
98
  (1) You should prioritize evaluating whether the response satisfies the provided rubric. The basis of your score should depend exactly on the rubric. However, the response does not need to explicitly address points raised in the rubric. Rather, evaluate the response based on the criteria outlined in the rubric.
99
 
 
120
  Score 2: {score2_desc}
121
  Score 3: {score3_desc}
122
  Score 4: {score4_desc}
123
+ Score 5: {score5_desc}"""
124
 
125
  ATLA_PROMPT_WITH_REFERENCE = """You are tasked with evaluating a response based on a given instruction (which may contain an Input) and a scoring rubric and reference answer that serve as the evaluation standard. Provide a comprehensive feedback on the response quality strictly adhering to the scoring rubric, without any general evaluation. Follow this with a score between 1 and 5, referring to the scoring rubric. Avoid generating any additional opening, closing, or explanations.
126
 
 
153
  Score 5: {score5_desc}
154
 
155
  Reference answer:
156
+ {ground_truth_input}"""
157
+
158
+ # Define the Flow Judge prompt
159
+ FLOW_JUDGE_PROMPT = """# GOAL
160
+ Your job is to evaluate a task carried out by an AI system powered by a large \
161
+ language model.
162
+
163
+ You will be provided with the inputs and output of the task, as well as the evaluation criteria \
164
+ and scoring rubric. Your task is to evaluate the output of the AI system based on the evaluation \
165
+ criteria and scoring rubric provided.
166
+
167
+ # INPUT
168
+ Below are the inputs required for performing the task:
169
+ <inputs>
170
+ {INPUTS}
171
+ </inputs>
172
+
173
+ # OUTPUT
174
+ Below is the output of the task:
175
+ <output>
176
+ {OUTPUT}
177
+ </output>
178
+
179
+ # EVALUATION CRITERIA AND SCORING RUBRIC
180
+ Here are the evaluation criteria and the rubric that you need to use for evaluating the task:
181
+ <evaluation_criteria>
182
+ {EVALUATION_CRITERIA}
183
+ </evaluation_criteria>
184
+
185
+ <scoring_rubric>
186
+ {RUBRIC}
187
+ </scoring_rubric>
188
+
189
+ # INSTRUCTIONS FOR THE EVALUATION
190
+ 1. Understand the task and criteria: Familiarize yourself with the task to be evaluated. \
191
+ Review the evaluation criteria and scoring rubric to understand the different levels of \
192
+ performance and the descriptions for each score.
193
+ 2. Review the inputs and output: Look at the inputs provided for the task. Examine the output \
194
+ generated from completing the task.
195
+ 3. Compare output to score descriptions: Compare the output against the criteria and score \
196
+ descriptions in the scoring rubric. For each criterion,decide which description best matches the \
197
+ output.
198
+ 4. After comparing the output to the score descriptions, pay attention to the small details that \
199
+ might impact the final score that you assign. Sometimes a small difference can dictate the final \
200
+ score.
201
+ 5. Write verbal feedback justifying your evaluation that includes a detailed rationale, referring \
202
+ to specific aspects of the output and comparing them to the rubric.
203
+ 6. Assign a final score based on the scoring rubric.
204
+
205
+ ## FORMAT FOR THE EVALUATION
206
+ - Write the verbal feedback inside <feedback> tags without any additional surrounding text.
207
+ - Write the numeric score inside <score> tags, without any additional surrounding text and always \
208
+ after the feedback.
209
+
210
+ Please accurately evaluate the task. Strictly adhere to the evaluation criteria and rubric."""
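
FLOW_JUDGE_PROMPT asks the model to put its feedback inside <feedback> tags and the numeric score inside <score> tags, with the score after the feedback. A minimal sketch of how a caller might extract both follows; `parse_flow_judge_output` and the sample reply are hypothetical and do not appear in prompts.py.

```python
import re
from typing import Tuple


def parse_flow_judge_output(text: str) -> Tuple[str, int]:
    """Extract (feedback, score) from a reply that follows the tag format above."""
    feedback = re.search(r"<feedback>(.*?)</feedback>", text, re.DOTALL)
    score = re.search(r"<score>\s*(\d+)\s*</score>", text)
    if feedback is None or score is None:
        # The model did not follow the requested format; let the caller decide what to do.
        raise ValueError("expected <feedback> and <score> tags in the judge output")
    return feedback.group(1).strip(), int(score.group(1))


# Made-up reply, used only to exercise the parser.
sample = (
    "<feedback>\nThe answer is accurate but omits units.\n</feedback>\n"
    "<score>\n4\n</score>"
)
feedback, score = parse_flow_judge_output(sample)
```

Raising on a missing tag, rather than returning a default score, keeps malformed replies from being silently counted as valid evaluations.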