Back to Basics: Revisiting REINFORCE Style Optimization for Learning from Human Feedback in LLMs Paper • 2402.14740 • Published Feb 22 • 6
LongLoRA: Efficient Fine-tuning of Long-Context Large Language Models Paper • 2309.12307 • Published Sep 21, 2023 • 84