Difference from alpaca-farm-ppo-sim-gpt4-20k-wdiff

by robkirk - opened Jul 27, 2023

Discussion

robkirk

Jul 27, 2023

Hi, what's the difference between this model and https://huggingface.co/tatsu-lab/alpaca-farm-ppo-sim-gpt4-20k-wdiff ?

rtaori

Tatsu Lab org Jul 27, 2023

Great question. ppo-sim refers to the PPO model trained on the standard AlpacaFarm simulation preference data (randomizing over prompts/API LLMs and injecting label noise). ppo-sim-gpt4-20k refers to the PPO model trained with a single prompt/API LLM (gpt4). Mapping to the paper https://arxiv.org/pdf/2305.14387.pdf, ppo-sim is the PPO model in Table 2 (left column) and also the final row in Table 4. ppo-sim-gpt4-20k is the third row in Table 4.

rtaori changed discussion status to closed Jul 27, 2023
