---
title: README
emoji: 🔥
colorFrom: blue
colorTo: purple
sdk: static
pinned: false
---
# EvalPlus: Rigorous Evaluation of LLMs for Code Generation

## About
EvalPlus evaluates LLM-generated code along two dimensions (a usage sketch follows this list):
- Code Correctness: HumanEval+ and MBPP+
- Code Efficiency: EvalPerf
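
As a minimal sketch of how these datasets are consumed (assuming the `evalplus` PyPI package; `generate_solution` is a hypothetical stand-in for your model), solutions can be serialized into the `samples.jsonl` format the evaluator expects:

```python
from evalplus.data import get_human_eval_plus, write_jsonl

def generate_solution(prompt: str) -> str:
    # Hypothetical stand-in: query your LLM here and return a
    # complete solution for the given HumanEval+ prompt.
    raise NotImplementedError

# One record per HumanEval+ task, written in the JSONL format
# consumed by the EvalPlus evaluator.
samples = [
    dict(task_id=task_id, solution=generate_solution(problem["prompt"]))
    for task_id, problem in get_human_eval_plus().items()
]
write_jsonl("samples.jsonl", samples)
```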
## Resources
- 💻 GitHub Repo: [evalplus/evalplus](https://github.com/evalplus/evalplus)
- Leaderboard: [evalplus.github.io](https://evalplus.github.io)
- Papers: [EvalPlus@NeurIPS'23](https://openreview.net/forum?id=1qvx610Cu7), [EvalPerf@COLM'24](https://openreview.net/forum?id=IBCBMeAhmC)
- Python Package: [evalplus on PyPI](https://pypi.org/project/evalplus/) (see the install sketch after this list)
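
To reproduce scores locally, a minimal sketch (assuming the documented `evalplus.evaluate` command-line entry point, scoring the `samples.jsonl` produced above):

```bash
pip install --upgrade evalplus

# Score the generated samples against HumanEval+ (use --dataset mbpp for MBPP+).
evalplus.evaluate --dataset humaneval --samples samples.jsonl
```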
## Citations

```bibtex
@inproceedings{evalplus,
  title = {Is Your Code Generated by Chat{GPT} Really Correct? Rigorous Evaluation of Large Language Models for Code Generation},
  author = {Liu, Jiawei and Xia, Chunqiu Steven and Wang, Yuyao and Zhang, Lingming},
  booktitle = {Thirty-seventh Conference on Neural Information Processing Systems},
  year = {2023},
  url = {https://openreview.net/forum?id=1qvx610Cu7},
}

@inproceedings{evalperf,
  title = {Evaluating Language Models for Efficient Code Generation},
  author = {Liu, Jiawei and Xie, Songrun and Wang, Junhao and Wei, Yuxiang and Ding, Yifeng and Zhang, Lingming},
  booktitle = {First Conference on Language Modeling},
  year = {2024},
  url = {https://openreview.net/forum?id=IBCBMeAhmC},
}
```