---
base_model: unsloth/qwen2.5-14b-bnb-4bit
tags:
- text-generation-inference
- transformers
- unsloth
- qwen2
- trl
- sft
license: apache-2.0
language:
- en
datasets:
- qingy2024/QwQ-LongCoT-Verified-130K
---
# Uploaded model
- Developed by: qingy2024
- License: apache-2.0
- Finetuned from model: unsloth/qwen2.5-14b-bnb-4bit
This model is a fine-tuned version of Qwen 2.5-14B, trained on QwQ 32B Preview's responses to questions from the NuminaMathCoT dataset.
Note: This model uses the standard ChatML template.
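
Since the model follows the standard ChatML template, prompts can be built with `tokenizer.apply_chat_template`. The snippet below is a minimal inference sketch, not an official usage example: the repo id is a placeholder, and the prompt and generation settings are assumptions.

```python
# Minimal inference sketch (assumes the merged model loads directly with transformers).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "qingy2024/<this-model>"  # placeholder: replace with this repo's actual id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# Standard ChatML template, so no custom chat template is needed.
messages = [
    {"role": "user", "content": "Prove that the sum of two even integers is even."},
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=1024)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```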
At 500 steps the loss had plateaued, so I stopped training to avoid overfitting.
## Training Details
- Base Model: Qwen 2.5-14B
- Fine-Tuning Dataset: Verified subset of NuminaMathCoT, judged with Qwen 2.5 3B Instruct (the `sharegpt-verified-cleaned` subset of my dataset).
- QLoRA Configuration (mirrored in the sketch below):
  - Rank: 32
  - Rank Stabilization: Enabled
- Optimization Settings:
  - Batch Size: 8
  - Gradient Accumulation Steps: 2 (Effective Batch Size: 16)
  - Warm-Up Steps: 5
  - Weight Decay: 0.01
  - Training Steps: 500
- Hardware: NVIDIA A100 (80 GB)
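
For reference, here is a hedged configuration sketch that mirrors the settings above with Unsloth + TRL. The sequence length, learning rate, LoRA alpha and target modules, the dataset configuration name, and the record field names are assumptions, not values recorded in this card.

```python
# Hedged training sketch mirroring the settings listed above (Unsloth + TRL QLoRA SFT).
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer
from unsloth import FastLanguageModel

max_seq_length = 4096  # assumption: not stated in the card

model, tokenizer = FastLanguageModel.from_pretrained(
    "unsloth/qwen2.5-14b-bnb-4bit",  # 4-bit base model from the card
    max_seq_length=max_seq_length,
    load_in_4bit=True,
)

# QLoRA adapters: rank 32 with rank stabilization (rsLoRA), as listed above.
model = FastLanguageModel.get_peft_model(
    model,
    r=32,
    lora_alpha=32,  # assumption: alpha is not stated in the card
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],  # assumption
    use_rslora=True,  # "Rank Stabilization: Enabled"
)

# Assumption: the verified subset is exposed as a dataset configuration holding
# ShareGPT-style "conversations" records.
dataset = load_dataset(
    "qingy2024/QwQ-LongCoT-Verified-130K", "sharegpt-verified-cleaned", split="train"
)

ROLE = {"system": "system", "human": "user", "gpt": "assistant"}

def to_text(example):
    # Render each conversation into a single string with the ChatML template.
    messages = [{"role": ROLE[m["from"]], "content": m["value"]}
                for m in example["conversations"]]
    return {"text": tokenizer.apply_chat_template(messages, tokenize=False)}

dataset = dataset.map(to_text, remove_columns=dataset.column_names)

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    args=SFTConfig(
        dataset_text_field="text",
        max_seq_length=max_seq_length,
        per_device_train_batch_size=8,   # Batch Size: 8
        gradient_accumulation_steps=2,   # effective batch size 16
        warmup_steps=5,
        weight_decay=0.01,
        max_steps=500,                   # stopped at 500 steps
        learning_rate=2e-4,              # assumption: not stated in the card
        output_dir="outputs",
    ),
)
trainer.train()
```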