Terry Zhuo committed
Commit · 2a1a6c1
1 Parent(s): 7a7f67a
big update
app.py CHANGED
@@ -350,9 +350,9 @@ with main_block as demo:
  gr.Markdown(
  """
  **Notes:**
- -
- - <u>Hard</u>: A subset of ~150 BigCodeBench tasks which is more user-facing and challenging.
- - <u>Full</u>: The full set of 1140 BigCodeBench tasks.
+ - _Hard Set_ vs _Full Set_:
+ - <u>Hard Set</u>: A subset of ~150 BigCodeBench tasks which is more user-facing and challenging.
+ - <u>Full Set</u>: The full set of 1140 BigCodeBench tasks.
  - _Complete_ vs _Instruct_:
  - <u>Complete</u>: Code Completion based on the (verbose) structured docstring. This split tests if the models are good at coding.
  - <u>Instruct</u> (🔥Vibe Check🔥): Code Generation based on the (less verbose) NL-oriented instructions. This split tests if the models are really capable enough to understand human intents to code.
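The renamed labels in this hunk are typed by hand inside one Markdown string, so a rename has to be repeated in the heading and in each bullet. As a rough sketch only, the same notes could be generated from a small mapping. The names `NOTES` and `render_notes` are hypothetical and do not appear in app.py, the four-space nesting is an assumption (the scraped diff lost the original indentation), and the "(🔥Vibe Check🔥)" tag on the Instruct bullet is left out for brevity.

```python
# Hypothetical helper, not part of app.py: build the "Notes" Markdown
# from a mapping so each label is written in exactly one place.
NOTES = {
    "_Hard Set_ vs _Full Set_": {
        "Hard Set": "A subset of ~150 BigCodeBench tasks which is more "
                    "user-facing and challenging.",
        "Full Set": "The full set of 1140 BigCodeBench tasks.",
    },
    "_Complete_ vs _Instruct_": {
        "Complete": "Code Completion based on the (verbose) structured docstring.",
        "Instruct": "Code Generation based on the (less verbose) "
                    "NL-oriented instructions.",
    },
}


def render_notes(notes):
    """Return a nested Markdown bullet list for the given notes mapping."""
    lines = ["**Notes:**"]
    for heading, entries in notes.items():
        lines.append(f"- {heading}:")
        for label, description in entries.items():
            # Indentation depth is an assumption; the diff does not show it.
            lines.append(f"    - <u>{label}</u>: {description}")
    return "\n".join(lines)


NOTES_MARKDOWN = render_notes(NOTES)
```

The resulting `NOTES_MARKDOWN` string could then be passed to `gr.Markdown(...)` in place of the literal, assuming the surrounding layout code stays as it is.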