Evaluating Language Models as Synthetic Data Generators Paper • 2412.03679 • Published 18 days ago • 43
The BiGGen Bench: A Principled Benchmark for Fine-grained Evaluation of Language Models with Language Models Paper • 2406.05761 • Published Jun 9 • 2
Prometheus 2: An Open Source Language Model Specialized in Evaluating Other Language Models Paper • 2405.01535 • Published May 2 • 119
CLIcK: A Benchmark Dataset of Cultural and Linguistic Intelligence in Korean Paper • 2403.06412 • Published Mar 11 • 3