Difference from alpaca-farm-ppo-sim-gpt4-20k-wdiff
#1, opened by robkirk
Hi, what's the difference between this model and https://huggingface.co/tatsu-lab/alpaca-farm-ppo-sim-gpt4-20k-wdiff ?
Great question. ppo-sim refers to the PPO model trained on the standard AlpacaFarm simulation preference data (randomizing over prompts/API LLMs and injecting label noise). ppo-sim-gpt4-20k refers to the PPO model trained with a single prompt/API LLM (gpt4). Mapping to the paper (https://arxiv.org/pdf/2305.14387.pdf): ppo-sim is the PPO model in Table 2 (left column) and also the final row in Table 4; ppo-sim-gpt4-20k is the third row in Table 4.
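To make the distinction concrete, here is a rough sketch of the two labeling setups described above. It is not the AlpacaFarm implementation: the function `simulated_label`, the annotator names in `pool`, and the flip probability of 0.25 are all hypothetical and chosen only for illustration. It also assumes the single-annotator setup adds no label noise, which this thread does not state.

```python
import random


def simulated_label(true_pref, annotators, flip_prob, rng):
    """Return (annotator_name, label) for one pairwise comparison.

    true_pref:  0 or 1, the label a noiseless annotator would give (hypothetical).
    annotators: list of annotator names to sample from.
    flip_prob:  probability of flipping the label (label noise).
    """
    annotator = rng.choice(annotators)   # randomize over prompts/API LLMs
    label = true_pref
    if rng.random() < flip_prob:         # inject label noise
        label = 1 - label
    return annotator, label


rng = random.Random(0)

# ppo-sim style: a pool of annotators plus label noise (illustrative values).
pool = ["annotator_a", "annotator_b", "annotator_c"]
noisy = [simulated_label(1, pool, 0.25, rng) for _ in range(1000)]

# ppo-sim-gpt4-20k style: a single annotator; no noise is an assumption here.
single = [simulated_label(1, ["gpt4"], 0.0, rng) for _ in range(1000)]

flipped = sum(1 for _, label in noisy if label == 0)
print(f"pooled annotators: {flipped} of {len(noisy)} labels flipped")
print(f"single annotator:  {sum(1 for _, l in single if l == 0)} labels flipped")
```

In this sketch the only differences between the two setups are the size of the annotator pool and whether labels are randomly flipped, which mirrors the contrast drawn in the answer.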
rtaori changed discussion status to closed