SFT & Reward Models used in the experiments of the ICML 2024 paper "Towards Efficient Exact Optimization of Language Model Alignment"
Haozhe Ji
ehzoah
AI & ML interests
language modeling, text generation
Recent Activity
updated a dataset 5 days ago: ehzoah/uf-ArmoRM-Llama3-8B-v0.1
New activity 5 days ago in ehzoah/uf-ArmoRM-Llama3-8B-v0.1: Librarian Bot: Add language metadata for dataset
updated a model 11 days ago: ehzoah/RM-Llama-3.2-1B_UltraFeedback-ArmoRM
Collections: 1
Papers: 1
Models: 7
ehzoah/RM-Llama-3.2-1B_UltraFeedback-ArmoRM • 6
ehzoah/Llama-3.2-1B-sft-full • Text Generation • 23 • 1
ehzoah/pythia-1.4b-sft-full
ehzoah/exo-hh-reward-model
ehzoah/exo-imdb-sft-model • Text Generation • 13
ehzoah/exo-imdb-reward-model • Text Generation • 15
ehzoah/exo-hh-sft-model