TÜLU 3: Pushing Frontiers in Open Language Model Post-Training Paper • 2411.15124 • Published Nov 22, 2024 • 56
TÜLU 3: Pushing Frontiers in Open Language Model Post-Training Paper • 2411.15124 • Published Nov 22, 2024 • 56
TÜLU 3: Pushing Frontiers in Open Language Model Post-Training Paper • 2411.15124 • Published Nov 22, 2024 • 56
SafetyAnalyst: Interpretable, transparent, and steerable LLM safety moderation Paper • 2410.16665 • Published Oct 22, 2024
On Memorization of Large Language Models in Logical Reasoning Paper • 2410.23123 • Published Oct 30, 2024 • 18
Hybrid Preferences: Learning to Route Instances for Human vs. AI Feedback Paper • 2410.19133 • Published Oct 24, 2024 • 11
Hybrid Preferences: Learning to Route Instances for Human vs. AI Feedback Paper • 2410.19133 • Published Oct 24, 2024 • 11
Political Compass or Spinning Arrow? Towards More Meaningful Evaluations for Values and Opinions in Large Language Models Paper • 2402.16786 • Published Feb 26, 2024
QASem Parsing: Text-to-text Modeling of QA-based Semantics Paper • 2205.11413 • Published May 23, 2022
Unpacking DPO and PPO: Disentangling Best Practices for Learning from Preference Feedback Paper • 2406.09279 • Published Jun 13, 2024 • 2
The Art of Saying No: Contextual Noncompliance in Language Models Paper • 2407.12043 • Published Jul 2, 2024 • 4
MEGA-Bench: Scaling Multimodal Evaluation to over 500 Real-World Tasks Paper • 2410.10563 • Published Oct 14, 2024 • 38
WildVis: Open Source Visualizer for Million-Scale Chat Logs in the Wild Paper • 2409.03753 • Published Sep 5, 2024 • 18
WildTeaming at Scale: From In-the-Wild Jailbreaks to (Adversarially) Safer Language Models Paper • 2406.18510 • Published Jun 26, 2024 • 8
WildGuard: Open One-Stop Moderation Tools for Safety Risks, Jailbreaks, and Refusals of LLMs Paper • 2406.18495 • Published Jun 26, 2024 • 12
MantisScore: Building Automatic Metrics to Simulate Fine-grained Human Feedback for Video Generation Paper • 2406.15252 • Published Jun 21, 2024 • 14
MantisScore: Building Automatic Metrics to Simulate Fine-grained Human Feedback for Video Generation Paper • 2406.15252 • Published Jun 21, 2024 • 14
WildVision: Evaluating Vision-Language Models in the Wild with Human Preferences Paper • 2406.11069 • Published Jun 16, 2024 • 13
WildVision: Evaluating Vision-Language Models in the Wild with Human Preferences Paper • 2406.11069 • Published Jun 16, 2024 • 13
Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing Paper • 2406.08464 • Published Jun 12, 2024 • 65