CoffeeBliss
1 follower · 2 following
AI & ML interests
None yet
Recent Activity
Replied to lewtun's post (about 2 months ago):
This paper (https://huggingface.co/papers/2412.18925) has a really interesting recipe for inducing o1-like behaviour in Llama models:

* Iteratively sample CoTs from the model, using a mix of different search strategies. This gives you something like Stream of Search via prompting.
* Verify the correctness of each CoT using GPT-4o (needed because exact match doesn't work well in medicine, where there are lots of aliases).
* Use GPT-4o to reformat the concatenated CoTs into a single stream that includes smooth transitions like "hmm, wait" etc. that one sees in o1.
* Use the resulting data for SFT & RL.
* Use sparse rewards from GPT-4o to guide RL training.

They find RL gives an average ~3-point boost across medical benchmarks, and SFT on this data already gives a strong improvement. Applying this strategy to other domains could be quite promising, provided the training data can be formulated with verifiable problems!
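A minimal sketch of what that data-generation loop could look like, purely illustrative and not the paper's implementation: `sample_cot()` is a hypothetical placeholder for the Llama sampling step, and the GPT-4o judge/rewrite calls use the standard OpenAI chat completions API (assumes an `OPENAI_API_KEY` is set).

```python
"""Sketch of the search-and-verify CoT pipeline described in the post above.
Not the paper's code: sample_cot() is a hypothetical placeholder for Llama
sampling; GPT-4o verification/rewriting uses the OpenAI chat completions API.
"""
import random

from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

SEARCH_STRATEGIES = ["explore new path", "backtrack", "verify", "correct"]


def sample_cot(problem: str, strategy: str, history: list[str]) -> str:
    """Placeholder: sample one chain-of-thought attempt from the Llama policy."""
    raise NotImplementedError("plug in your Llama sampling code (transformers/vLLM) here")


def gpt4o(prompt: str) -> str:
    """Single-turn GPT-4o call, used for both verification and rewriting."""
    resp = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content


def verify(problem: str, answer: str, reference: str) -> bool:
    """LLM-as-judge check; exact match fails in medicine because of aliases."""
    verdict = gpt4o(
        f"Problem: {problem}\nProposed answer: {answer}\nReference answer: {reference}\n"
        "Do the two answers agree? Reply YES or NO."
    )
    return verdict.strip().upper().startswith("YES")


def build_long_cot(problem: str, reference: str, max_attempts: int = 8) -> str | None:
    """Iteratively sample CoTs with mixed search strategies until one verifies,
    then have GPT-4o rewrite the concatenated attempts into one o1-style stream."""
    history: list[str] = []
    for _ in range(max_attempts):
        attempt = sample_cot(problem, random.choice(SEARCH_STRATEGIES), history)
        history.append(attempt)
        if verify(problem, attempt, reference):
            return gpt4o(
                "Rewrite the following search attempts as one coherent reasoning stream "
                "with natural transitions such as 'hmm, wait':\n\n" + "\n\n".join(history)
            )
    return None  # no attempt verified -> drop this problem from the SFT set


def sparse_reward(problem: str, answer: str, reference: str) -> float:
    """Sparse reward for the RL stage, driven by the same GPT-4o verifier
    (a simple 1/0 scheme here; the paper's exact reward shaping may differ)."""
    return 1.0 if verify(problem, answer, reference) else 0.0
```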
Reacted with 🔥 to lewtun's post (about 2 months ago), the same post quoted above.
Liked a model (about 2 months ago): bartowski/HuatuoGPT-o1-8B-GGUF
Organizations
None yet
CoffeeBliss's activity
Liked 2 models (about 2 months ago):
- bartowski/HuatuoGPT-o1-8B-GGUF · Text Generation · Updated Dec 31, 2024 · 3.21k downloads · 7 likes
- FreedomIntelligence/HuatuoGPT-o1-8B · Text Generation · Updated Dec 30, 2024 · 1.43k downloads · 33 likes
Liked a dataset (about 2 months ago):
- yulan-team/YuLan-Mini-Datasets · Updated Dec 29, 2024 · 1.07k downloads · 9 likes
Liked a model (about 2 months ago):
- yulan-team/YuLan-Mini · Text Generation · Updated 12 days ago · 889 downloads · 35 likes
Liked 2 models (5 months ago):
- meta-llama/Llama-3.2-11B-Vision-Instruct · Image-Text-to-Text · Updated Dec 4, 2024 · 1.63M downloads · 1.31k likes
- meta-llama/Llama-3.2-1B-Instruct · Text Generation · Updated Oct 24, 2024 · 1.7M downloads · 756 likes
Liked 2 models (6 months ago):
- openbmb/MiniCPM-V-2_6 · Image-Text-to-Text · Updated Jan 15 · 130k downloads · 929 likes
- openbmb/MiniCPM-V-2_6-gguf · Updated 17 days ago · 7.61k downloads · 151 likes