SFT & Reward Models used in the experiments of the ICML 2024 paper "Towards Efficient Exact Optimization of Language Model Alignment"
Haozhe Ji
ehzoah
AI & ML interests
language modeling, text generation
Recent Activity
Updated a dataset about 1 month ago: ehzoah/uf-ArmoRM-Llama3-8B-v0.1
New activity about 1 month ago: ehzoah/uf-ArmoRM-Llama3-8B-v0.1: Librarian Bot: Add language metadata for dataset
Updated a model about 1 month ago: ehzoah/RM-Llama-3.2-1B_UltraFeedback-ArmoRM
Collections: 1
Papers: 1
Models: 7
ehzoah/RM-Llama-3.2-1B_UltraFeedback-ArmoRM • 53
ehzoah/Llama-3.2-1B-sft-full • Text Generation • 45
ehzoah/pythia-1.4b-sft-full
ehzoah/exo-hh-reward-model
ehzoah/exo-imdb-sft-model • Text Generation • 7
ehzoah/exo-imdb-reward-model • Text Generation • 10
ehzoah/exo-hh-sft-model