Sahand Rezaei-Shoshtari
sahandrez
https://sahandrez.github.io/
AI & ML interests
Reinforcement Learning
Organizations
None yet
models
7
sahandrez/rloo-paired-zephyr-7b-sft-qlora-ultrafeedback-binarized-20241017-165205 • Updated 2 days ago
sahandrez/rloo-paired-zephyr-7b-sft-qlora-ultrafeedback-binarized-20241016-204436 • Updated 3 days ago
sahandrez/rloo-paired-zephyr-7b-sft-qlora-ultrafeedback-binarized-20241015-144325 • Updated 4 days ago
sahandrez/rloo-paired-zephyr-7b-sft-qlora-ultrafeedback-binarized-20241015-132136 • Updated 5 days ago
sahandrez/pairwise-reward-sft-zephyr-7b-sft-qlora-ultrafeedback • Updated 6 days ago • 14
sahandrez/pairwise-reward-zephyr-7b-sft-qlora-ultrafeedback • Updated 7 days ago • 26
sahandrez/sft-zephyr-7b-sft-qlora-ultrafeedback • Updated 9 days ago • 66
datasets
2
sahandrez/ultrafeedback_kto • Viewer • Updated 27 days ago • 126k • 13
sahandrez/ultrafeedback_unpaired • Viewer • Updated about 1 month ago • 126k • 6