
tsessk/llm-course-hw2-dpo
Text Generation
LLM course @ HSE and VK LLM
A collection of SmolLM-135M models fine-tuned with DPO, PPO, and reward modeling to enhance human-like expressiveness.