Model Description
I fine-tuned the Llama-3.2-1B-Instruct model with Direct Preference Optimization (DPO) on both the Anthropic HH-RLHF and Magpie-Pro-DPO datasets. I formatted both datasets to meet DPO requirements and structured them in a ready-to-apply chat-template format.
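As a rough illustration of that formatting step, the sketch below converts one preference record into the prompt/chosen/rejected message lists that DPO trainers (e.g. TRL's `DPOTrainer`) accept in conversational form. The field names on the input record are illustrative, not the datasets' exact schema.

```python
def to_dpo_example(record):
    """Turn a raw preference record (illustrative field names) into the
    prompt/chosen/rejected message-list triple used for DPO training."""
    return {
        "prompt": [{"role": "user", "content": record["prompt"]}],
        "chosen": [{"role": "assistant", "content": record["chosen"]}],
        "rejected": [{"role": "assistant", "content": record["rejected"]}],
    }

sample = {
    "prompt": "How do I brew green tea?",
    "chosen": "Steep the leaves at about 80 °C for two minutes.",
    "rejected": "Just boil it however.",
}
dpo_sample = to_dpo_example(sample)
```

Because each field is already a list of role/content messages, the tokenizer's chat template can be applied to it directly at training time.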
Additionally, I modified the tokenizer's chat template to save tokens by removing the automatically injected date lines, such as "Cutting Knowledge Date: December 2023," from the system message.
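A minimal sketch of that template edit, assuming the date lines appear as standalone lines in the Jinja chat template: it filters them out of the template string, which can then be assigned back to `tokenizer.chat_template`. This is not the card author's exact code.

```python
def strip_date_lines(chat_template: str) -> str:
    """Drop template lines that render the knowledge-cutoff or today's date,
    so they are no longer prepended to every system message."""
    kept = [
        line for line in chat_template.splitlines()
        if "Cutting Knowledge Date" not in line and "Today Date" not in line
    ]
    return "\n".join(kept)

# With a real tokenizer this would be used roughly as:
#   tok = AutoTokenizer.from_pretrained("meta-llama/Llama-3.2-1B-Instruct")
#   tok.chat_template = strip_date_lines(tok.chat_template)
```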