Yotam-Perlitz committed · Commit 282d506 · Parent(s): f5e30e4
add citations
Signed-off-by: Yotam-Perlitz <y.perlitz@ibm.com>
app.py
CHANGED
@@ -494,7 +494,8 @@ with st.expander(label="Model scored by the aggragate"):
     )
 
 with st.expander(label="Citations"):
-    st.code(
+    st.code(
+        r"""
 @misc{liu2023agentbenchevaluatingllmsagents,
       title={AgentBench: Evaluating LLMs as Agents},
       author={Xiao Liu and Hao Yu and Hanchen Zhang and Yifan Xu and Xuanyu Lei and Hanyu Lai and Yu Gu and Hangliang Ding and Kaiwen Men and Kejuan Yang and Shudan Zhang and Xiang Deng and Aohan Zeng and Zhengxiao Du and Chenhui Zhang and Sheng Shen and Tianjun Zhang and Yu Su and Huan Sun and Minlie Huang and Yuxiao Dong and Jie Tang},
@@ -503,7 +504,7 @@ with st.expander(label="Citations"):
       archivePrefix={arXiv},
       primaryClass={cs.AI},
       url={https://arxiv.org/abs/2308.03688},
-}
+}
 
 @software{Li_AlpacaEval_An_Automatic_2023,
     author = {Li, Xuechen and Zhang, Tianyi and Dubois, Yann and Taori, Rohan and Gulrajani, Ishaan and Guestrin, Carlos and Liang, Percy and Hashimoto, Tatsunori B.},
@@ -755,7 +756,8 @@ with st.expander(label="Citations"):
     year={2024}
 }
 
-    """
+        """
+    )
 
     st.markdown(
         "BenchBench-Leaderboard complements our study, where we analyzed over 40 prominent benchmarks and introduced standardized practices to enhance the robustness and validity of benchmark evaluations through the [BenchBench Python package](#). "