chujiezheng/zephyr_0.05
Text Generation
Note zephyr-7b-sft-full trained by DPO with 5% UltraFeedback data
Note zephyr-7b-sft-full trained by DPO with 10% UltraFeedback data
Note alpha = 8.0
Note zephyr-7b-sft-full trained by DPO with 20% UltraFeedback data
Note alpha = 2.5
Note zephyr-7b-sft-full trained by DPO with 40% UltraFeedback data
Note zephyr-7b-sft-full trained by DPO with 20% UltraFeedback data and 2x the learning rate
Note zephyr-7b-sft-full trained by DPO with 20% UltraFeedback data and 3x the learning rate
Note zephyr-7b-sft-full trained by DPO with 20% UltraFeedback data and 2x the epochs
Note zephyr-7b-sft-full trained by DPO with 20% UltraFeedback data and 3x the epochs