- RobustFT: Robust Supervised Fine-tuning for Large Language Models under Noisy Response
  Paper • 2412.14922 • Published • 77
- B-STaR: Monitoring and Balancing Exploration and Exploitation in Self-Taught Reasoners
  Paper • 2412.17256 • Published • 35
- Deliberation in Latent Space via Differentiable Cache Augmentation
  Paper • 2412.17747 • Published • 25
- Outcome-Refining Process Supervision for Code Generation
  Paper • 2412.15118 • Published • 14
Robin Williams PRO
bfuzzy1
AI & ML interests
None yet
Recent Activity
updated a collection about 21 hours ago
Agents
updated a collection about 21 hours ago
Nifty
upvoted a paper about 21 hours ago
Ensembling Large Language Models with Process Reward-Guided Tree Search for Better Complex Reasoning
Organizations
None yet
Collections
11
llambses-1 models