Optimizing Test-Time Compute via Meta Reinforcement Fine-Tuning
Paper
•
2503.07572
•
Published
•
31
Aligning LLMs to be helpful, honest, harmless, and huggy (H4)
transformers
in dedicated releases!v4.49.0-SmolVLM-2
and v4.49.0-SigLIP-2
.