ImageReward: Learning and Evaluating Human Preferences for Text-to-Image Generation Paper • 2304.05977 • Published Apr 12, 2023 • 1
Black-Box Prompt Optimization: Aligning Large Language Models without Model Training Paper • 2311.04155 • Published Nov 7, 2023 • 1
CritiqueLLM: Scaling LLM-as-Critic for Effective and Explainable Evaluation of Large Language Model Generation Paper • 2311.18702 • Published Nov 30, 2023
AlignBench: Benchmarking Chinese Alignment of Large Language Models Paper • 2311.18743 • Published Nov 30, 2023 • 1
GLM: General Language Model Pretraining with Autoregressive Blank Infilling Paper • 2103.10360 • Published Mar 18, 2021 • 2
P-Tuning v2: Prompt Tuning Can Be Comparable to Fine-tuning Universally Across Scales and Tasks Paper • 2110.07602 • Published Oct 14, 2021
SafetyBench: Evaluating the Safety of Large Language Models with Multiple Choice Questions Paper • 2309.07045 • Published Sep 13, 2023
LongBench: A Bilingual, Multitask Benchmark for Long Context Understanding Paper • 2308.14508 • Published Aug 28, 2023 • 2
Middleware for LLMs: Tools Are Instrumental for Language Agents in Complex Environments Paper • 2402.14672 • Published Feb 22
Extensive Self-Contrast Enables Feedback-Free Language Model Alignment Paper • 2404.00604 • Published Mar 31
ChatGLM-Math: Improving Math Problem-Solving in Large Language Models with a Self-Critique Pipeline Paper • 2404.02893 • Published Apr 3 • 20