- Regressing the Relative Future: Efficient Policy Optimization for Multi-turn RLHF
  Paper • 2410.04612 • Published
- Cornell-AGI/REFUEL-Llama-3-Armo-iter_1
  Updated • 4
- Cornell-AGI/REFUEL-Llama-3-Armo-iter_2
  Updated • 4
- Cornell-AGI/REFUEL-Ultrainteract-Llama-3-Armo-iter_1
  Viewer • Updated • 64.6k • 59 • 2
![](https://cdn-avatars.huggingface.co/v1/production/uploads/652eec0aabc673c4204c459e/yhRe3kTFMWCPWb5PvV9_N.png)
Cornell-AGI
university
AI & ML interests
Reinforcement Learning from Human Feedback
Collections: 2
Models: 8
- Cornell-AGI/REFUEL-Llama-3-Armo-iter_2
  Updated • 4
- Cornell-AGI/REFUEL-Llama-3-Armo-iter_1
  Updated • 4
- Cornell-AGI/REBEL-Llama-3-Armo-iter_3
  Updated • 10 • 2
- Cornell-AGI/REBEL-Llama-3-Armo-iter_2
  Updated • 5 • 1
- Cornell-AGI/REBEL-Llama-3-Armo-iter_1
  Updated • 9 • 1
- Cornell-AGI/REBEL-Llama-3-epoch_2
  Text Generation • Updated • 13 • 3
- Cornell-AGI/REBEL-Llama-3
  Text Generation • Updated • 11 • 1
- Cornell-AGI/REBEL-OpenChat-3.5
  Text Generation • Updated • 6 • 1
Datasets: 9
- Cornell-AGI/amazon_movie_tv_item_mxbai
  Viewer • Updated • 10.5k • 47
- Cornell-AGI/amazon_movie_tv_llama_mxbai
  Viewer • Updated • 17.1k • 43
- Cornell-AGI/REFUEL-Ultrainteract-Llama-3-Armo-iter_2
  Viewer • Updated • 116k • 43 • 1
- Cornell-AGI/REFUEL-Ultrainteract-Llama-3-Armo-iter_1
  Viewer • Updated • 64.6k • 59 • 2
- Cornell-AGI/REFUEL-UltraInteract-setting-two
  Viewer • Updated • 106k • 32 • 1
- Cornell-AGI/REFUEL-hh-setting-two
  Viewer • Updated • 165k • 55
- Cornell-AGI/Ultrafeedback-Llama-3-Armo-iter_1
  Viewer • Updated • 56.1k • 55
- Cornell-AGI/Ultrafeedback-Llama-3-Armo-iter_3
  Viewer • Updated • 44.6k • 58 • 1
- Cornell-AGI/Ultrafeedback-Llama-3-Armo-iter_2
  Viewer • Updated • 55.1k • 37