Jen Wei
bird-of-paradise
0 followers · 4 following
AI & ML interests
None yet
Recent Activity
New activity about 12 hours ago in HuggingFaceH4/blogpost-scaling-test-time-compute: Questions about Verifier Development, Search as Data Generation Tool, and Model Family Alignment
Reacted to lewtun's post with 🔥 about 17 hours ago:
This paper (https://huggingface.co/papers/2412.18925) has a really interesting recipe for inducing o1-like behaviour in Llama models:

* Iteratively sample CoTs from the model, using a mix of different search strategies. This gives you something like Stream of Search via prompting.
* Verify correctness of each CoT using GPT-4o (needed because exact match doesn't work well in medicine, where there are lots of aliases).
* Use GPT-4o to reformat the concatenated CoTs into a single stream that includes smooth transitions like "hmm, wait" etc. that one sees in o1.
* Use the resulting data for SFT & RL.
* Use sparse rewards from GPT-4o to guide RL training.

They find RL gives an average ~3 point boost across medical benchmarks, and SFT on this data already gives a strong improvement. Applying this strategy to other domains could be quite promising, provided the training data can be formulated with verifiable problems!
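The recipe in the post above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the strategy names, the `generate`, `verify`, and `merge` callables (stand-ins for the Llama sampler, the GPT-4o verifier, and the GPT-4o reformatter), and the 1.0/0.0 reward values are all hypothetical.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical search-strategy labels; the post only says "a mix of
# different search strategies" without naming them.
STRATEGIES = ["backtrack", "explore_new_path", "verify", "correct"]


def build_sft_example(
    question: str,
    answer: str,
    generate: Callable[[str, List[str], str], str],  # stand-in for the policy model
    verify: Callable[[str, str], bool],               # stand-in for the LLM verifier
    merge: Callable[[List[str]], str],                # stand-in for the LLM reformatter
    max_attempts: int = 4,
) -> Optional[Dict[str, str]]:
    """Sample CoTs until one is verified, then merge all attempts into one stream."""
    attempts: List[str] = []
    for i in range(max_attempts):
        # First attempt is unguided; later ones cycle through the strategies.
        strategy = "initial" if i == 0 else STRATEGIES[(i - 1) % len(STRATEGIES)]
        cot = generate(question, list(attempts), strategy)
        attempts.append(cot)
        if verify(cot, answer):
            # Failed attempts are kept so the merged trace shows the search.
            return {"question": question, "response": merge(attempts)}
    return None  # no verified CoT within budget: drop the problem


def sparse_reward(response: str, answer: str, verify: Callable[[str, str], bool]) -> float:
    """Assumed binary reward for RL: 1.0 if the verifier accepts, else 0.0."""
    return 1.0 if verify(response, answer) else 0.0
```

Examples returned by `build_sft_example` would feed the SFT stage, while `sparse_reward` would score sampled responses during RL. Keeping the verifier behind a callable reflects the post's point that exact match is unreliable when answers have many aliases.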
Liked a Space 17 days ago: HuggingFaceH4/blogpost-scaling-test-time-compute
Organizations
None yet
models
None public yet
datasets
1
bird-of-paradise/transformer-from-scratch-tutorial
Updated 17 days ago • 81 • 3