Bridging the Data Provenance Gap Across Text, Speech and Video Paper • 2412.17847 • Published 15 days ago • 7
Revisiting In-Context Learning with Long Context Language Models Paper • 2412.16926 • Published 12 days ago • 26
MAmmoTH-VL: Eliciting Multimodal Reasoning with Instruction Tuning at Scale Paper • 2412.05237 • Published 28 days ago • 46
Evaluating Language Models as Synthetic Data Generators Paper • 2412.03679 • Published 30 days ago • 45
view article Article Navigating Korean LLM Research #2: Evaluation Tools By amphora • Oct 23, 2024 • 7
Pangea: A Fully Open Multilingual Multimodal LLM for 39 Languages Paper • 2410.16153 • Published Oct 21, 2024 • 44
Web Agents with World Models: Learning and Leveraging Environment Dynamics in Web Navigation Paper • 2410.13232 • Published Oct 17, 2024 • 41
Coffee-Gym: An Environment for Evaluating and Improving Natural Language Feedback on Erroneous Code Paper • 2409.19715 • Published Sep 29, 2024 • 9
Consent in Crisis: The Rapid Decline of the AI Data Commons Paper • 2407.14933 • Published Jul 20, 2024 • 12
The BiGGen Bench: A Principled Benchmark for Fine-grained Evaluation of Language Models with Language Models Paper • 2406.05761 • Published Jun 9, 2024 • 2
Aligning to Thousands of Preferences via System Message Generalization Paper • 2405.17977 • Published May 28, 2024 • 7
Prometheus 2: An Open Source Language Model Specialized in Evaluating Other Language Models Paper • 2405.01535 • Published May 2, 2024 • 119
Self-Explore to Avoid the Pit: Improving the Reasoning Capabilities of Language Models with Fine-grained Rewards Paper • 2404.10346 • Published Apr 16, 2024 • 1
The CoT Collection: Improving Zero-shot and Few-shot Learning of Language Models via Chain-of-Thought Fine-Tuning Paper • 2305.14045 • Published May 23, 2023 • 5
Language Models as Compilers: Simulating Pseudocode Execution Improves Algorithmic Reasoning in Language Models Paper • 2404.02575 • Published Apr 3, 2024 • 48
LangBridge: Multilingual Reasoning Without Multilingual Supervision Paper • 2401.10695 • Published Jan 19, 2024 • 5
Prometheus-Vision: Vision-Language Model as a Judge for Fine-Grained Evaluation Paper • 2401.06591 • Published Jan 12, 2024 • 3