takedakoji00/Llama-3.1-8B-Instruct-custom-qg-full_20250219-7th_random_pad_is_eos_offline_nav_2nd Reinforcement Learning • Updated 1 day ago • 3
takedakoji00/Llama-3.1-8B-Instruct-custom-qg-full_20250219-7th_random_pad_is_eos_ppo_3rd Reinforcement Learning • Updated about 15 hours ago • 16