arjunguha committed
Commit: 8cbb73d
Parent: 99a9bf4

Describe MultiPL-E

Files changed (1):
  1. app.py +16 -2
app.py CHANGED
@@ -44,7 +44,7 @@ df[['Language', 'Model']] = df['Dataset'].apply(extract_info)
 # Create a dictionary to map models to friendly names
 model_to_friendly = {
     "starcoder2_15b": "StarCoder2-15B",
-    "deepseekcoder_v2lite": "DeepSeekCoder2-Lite"
+    "deepseekcoder_v2lite_base": "DeepSeekCoder2-Lite-Base"
 }

 # Function to get friendly name or original name if not in the dictionary
@@ -83,7 +83,21 @@ def get_initial_table():

 # Create the Gradio interface
 with gr.Blocks() as app:
-    gr.Markdown("# Model Leaderboard")
+    gr.Markdown("""
+    # MultiPL-E Results
+
+    [MultiPL-E](https://huggingface.co/datasets/nuprl/MultiPL-E) is a dataset for
+    evaluating large language models for code generation that supports several
+    programming languages. It takes the OpenAI HumanEval and the Mostly Basic
+    Python Programs (MBPP) benchmarks and uses little compilers to translate them
+    to other languages. It is easy to add support for new languages and benchmarks.
+
+    This table shows how some recent Code LLMs perform on MultiPL-HumanEval.
+
+    We use the MultiPL-E 3.0 problems, which incorporate several fixes and
+    support several new programming languages.
+
+    """)

     with gr.Row():
         language_checkboxes = gr.CheckboxGroup(
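
The comment kept as context at the end of the first hunk refers to a lookup helper whose body falls outside the diff. As a minimal sketch of that lookup-with-fallback pattern (the name get_friendly_name is hypothetical, not taken from the commit):

    # Hypothetical helper matching the comment in the first hunk; the actual
    # function body is not shown in this diff, so this is only a sketch.
    def get_friendly_name(model: str) -> str:
        # dict.get falls back to the raw identifier when no friendly
        # name has been registered in model_to_friendly.
        return model_to_friendly.get(model, model)

With the updated dictionary, get_friendly_name("deepseekcoder_v2lite_base") returns "DeepSeekCoder2-Lite-Base", while an unmapped identifier is passed through unchanged.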
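
The new header text links to the MultiPL-E dataset on the Hugging Face Hub. For readers who want the underlying problems, a minimal sketch of loading one translated HumanEval split with the datasets library; the configuration name "humaneval-lua" and the record fields are assumptions based on the dataset card, not part of this commit:

    from datasets import load_dataset

    # MultiPL-E publishes one configuration per benchmark/language pair;
    # "humaneval-lua" (assumed name) is HumanEval translated to Lua.
    problems = load_dataset("nuprl/MultiPL-E", "humaneval-lua", split="test")

    # Each record is assumed to carry a language-specific prompt and the
    # tests used to check generated completions.
    first = problems[0]
    print(first["name"])
    print(first["prompt"])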