Update README.md
README.md
CHANGED
@@ -54,9 +54,9 @@ To achieve optimal performance, we recommend the following settings:
 
 3. Standardize output format: We recommend using hints to standardize model outputs when benchmarking.
 
-a. Math questions: Add a statement ```Please reason step by step, and put your final answer within \\boxed{}.``` to the prompt.
+a. Math questions: Add a statement "```Please reason step by step, and put your final answer within \\boxed{}.```" to the prompt.
 
-b. Code problems: Add
+b. Code problems: Add "### Format: Read the inputs from stdin solve the problem and write the answer to stdout. Enclose your code within delimiters as follows.\n \```python\n# YOUR CODE HERE\n\```\n### Answer: (use the provided format with backticks)" to the prompt.
 
 4. In particular, we use ```latex2sympy2``` and ```sympy``` to assist in judging complex Latex formats for the Math500 evaluation script. For all datasets, we generate 64 responses per query to estimate pass@1.
 
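For readers applying the changed settings, here is a minimal Python sketch of how the two hints might be appended to a prompt and how pass@1 could be estimated from repeated samples. It is not taken from the repository: the names `build_prompt`, `pass_at_1`, `MATH_HINT`, and `CODE_HINT` are hypothetical, the `\n` sequences in the README text are assumed to stand for newlines, `\\boxed{}` is assumed to mean a single-backslash `\boxed{}`, and the escaped backticks are assumed to stand for a literal code fence. pass@1 is taken here as the fraction of sampled responses judged correct, which is the usual estimate when k = 1.

```python
# Hypothetical helpers illustrating the README recommendations; not the authors' code.
from typing import Iterable

# Three backticks, built indirectly so this example does not contain a literal fence.
FENCE = "`" * 3

# Assumption: the README's "\\boxed{}" denotes a single-backslash \boxed{}.
MATH_HINT = r"Please reason step by step, and put your final answer within \boxed{}."

# Assumption: "\n" in the README text denotes a newline in the actual prompt.
CODE_HINT = (
    "### Format: Read the inputs from stdin solve the problem and write "
    "the answer to stdout. Enclose your code within delimiters as follows.\n"
    f"{FENCE}python\n# YOUR CODE HERE\n{FENCE}\n"
    "### Answer: (use the provided format with backticks)"
)


def build_prompt(question: str, kind: str) -> str:
    """Append the formatting hint for a 'math' or 'code' question."""
    hints = {"math": MATH_HINT, "code": CODE_HINT}
    if kind not in hints:
        raise ValueError(f"unknown question kind: {kind!r}")
    return f"{question}\n\n{hints[kind]}"


def pass_at_1(correct_flags: Iterable[bool]) -> float:
    """Estimate pass@1 for one query as the fraction of correct samples."""
    flags = [bool(flag) for flag in correct_flags]
    if not flags:
        raise ValueError("at least one sampled response is required")
    return sum(flags) / len(flags)
```

With 64 sampled responses per query, `pass_at_1` would receive 64 correctness flags for that query, and the per-query values would then be averaged over the dataset.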