Model Details

This is the official implementation of the ODIN-ppo-L230-7B model, a chat assistant trained by fine-tuning LLaMA on the Open-Assistant dataset with PPO. "L230" indicates that the model's average output length on the LIMA test set is about 230. ODIN is the reward model used during training.
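
To make the "L230" naming concrete, here is a minimal sketch of how such a length tag could be derived from a set of model responses. The card does not state the unit of length, so this sketch assumes whitespace-separated words; tokenizer tokens would be an equally plausible choice. The function names `average_output_length` and `length_tag` are hypothetical and are not part of the ODIN release.

```python
from statistics import mean


def average_output_length(responses):
    """Return the mean response length.

    Length is counted in whitespace-separated words, which is an
    assumption: the model card does not say which unit it uses.
    """
    if not responses:
        raise ValueError("responses must be a non-empty list of strings")
    return mean(len(response.split()) for response in responses)


def length_tag(responses):
    """Format a hypothetical 'L<N>' tag from the rounded mean length."""
    return f"L{round(average_output_length(responses))}"
```

Calling `length_tag` on the model's responses to the LIMA test prompts would, under these assumptions, yield a tag close to `L230` for this checkpoint.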

Model Description

Model Sources

Collection including Lichang-Chen/ODIN-ppo-L230-best