Qwen2.5-0.5B Fine-Tuned on GSM8K with DeepSeek Augmentation
Model Overview
This model is a fine-tuned version of Qwen2.5-0.5B, specifically trained for mathematical reasoning tasks using the GSM8K dataset, with additional Chain-of-Thought (CoT) reasoning augmentation from DeepSeek-V3. The model has been fine-tuned to generate detailed step-by-step solutions to grade school math problems, ensuring better logical reasoning and interpretability.
Key Features
- Base Model: Qwen/Qwen2.5-0.5B
- Fine-Tuned On: eagle0504/openai-gsm8k-enhanced-using-together-ai-deepseek-train8k-test1k-v1
- Optimized for: Mathematical problem-solving & step-by-step reasoning
- Fine-tuned with: LoRA (Low-Rank Adaptation) for parameter-efficient training
- Chain-of-Thought (CoT): Generates clear and structured reasoning for each problem
- Inference-ready: Available on the Hugging Face Hub
Model Details
Description
- Developed by: [Your Name or Organization]
- Funded by: [Optional: Mention if funded]
- Shared by: Hugging Face Hub
- Model Type: Causal Language Model (Text Generation)
- Languages: English (en)
- License: MIT License
- Fine-tuned from: Qwen/Qwen2.5-0.5B
Model Repository
- Hugging Face Model Page: Fine-tuned Qwen2.5-0.5B
How to Load & Use This Model
You can load this model with the Hugging Face `transformers` library as follows:
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Define the model repo ID (replace with the actual HF repo)
model_name = "your-repo-id"

# Load tokenizer and model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Move the model to GPU if available
device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)

# Example inference
question = "Natalia sold clips to 48 of her friends in April, and then she sold half as many clips in May. How many clips did Natalia sell altogether in April and May?"
inputs = tokenizer(question, return_tensors="pt").to(device)
output = model.generate(**inputs, max_new_tokens=200)  # generate up to 200 new tokens

# Decode and print the response
print(tokenizer.decode(output[0], skip_special_tokens=True))
```
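Alternatively, the high-level pipeline API offers a shorter path to the same result; the snippet below is a minimal sketch that reuses the same placeholder repo ID with a toy question.

```python
from transformers import pipeline

# Text-generation pipeline pointing at the same placeholder repo ID as above
generator = pipeline("text-generation", model="your-repo-id")

question = "A baker made 24 cookies and sold half of them. How many cookies are left?"
result = generator(question, max_new_tokens=200)
print(result[0]["generated_text"])
```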
Training Details
Training Data
The model was fine-tuned on the GSM8K dataset, specifically the augmented dataset:
eagle0504/openai-gsm8k-enhanced-using-together-ai-deepseek-train8k-test1k-v1
This dataset contains:
- 8K training samples (`train` split)
- 1K testing samples (`test` split)
- Features: `"question"`, `"answer"`, and `"cot"` (Chain-of-Thought)
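As a quick sanity check, the dataset can be loaded directly with the `datasets` library; the snippet below is a minimal sketch that assumes the split names and column names listed above.

```python
from datasets import load_dataset

# Load the augmented GSM8K dataset (8K train / 1K test samples)
dataset = load_dataset(
    "eagle0504/openai-gsm8k-enhanced-using-together-ai-deepseek-train8k-test1k-v1"
)
print(dataset)

# Inspect one training example: question, Chain-of-Thought, and final answer
example = dataset["train"][0]
print(example["question"])
print(example["cot"])
print(example["answer"])
```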
Training Procedure
- Preprocessing: Each question was formatted using a prompt template to encourage step-by-step reasoning.
- Training Framework: Used `transformers`, `trl`, and `unsloth` for efficient fine-tuning.
- Fine-Tuning Strategy: LoRA (Low-Rank Adaptation), as sketched after this list
  - Applied to the query and value projection layers (`q_proj`, `v_proj`)
  - LoRA hyperparameters: `r=8`, `lora_alpha=16`, `lora_dropout=0.1`
- Optimization:
  - Mixed precision training (`fp16`)
  - Batch size: 16
  - Gradient accumulation: 1
  - Learning rate: 2e-4
- Training Time: Approx. 10,446 seconds (~3 hours)
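The full training script is not reproduced in this card; the sketch below only illustrates the LoRA setup described above using `peft`, and the prompt template shown is a hypothetical placeholder rather than the exact template used during fine-tuning.

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

# Hypothetical prompt template; the exact template used in training is not published here
def format_prompt(question: str) -> str:
    return (
        "Solve the following grade school math problem step by step.\n"
        f"Question: {question}\n"
        "Answer:"
    )

# LoRA configuration matching the hyperparameters listed above
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.1,
    target_modules=["q_proj", "v_proj"],  # query and value projections only
    task_type="CAUSAL_LM",
)

base_model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-0.5B")
model = get_peft_model(base_model, lora_config)
model.print_trainable_parameters()  # only the LoRA adapter weights are trainable
```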
Performance & Evaluation
Training Performance
| Step | Loss   | Grad Norm | Learning Rate | Epoch  |
|------|--------|-----------|---------------|--------|
| 10   | 2.1319 | 3.656     | 2e-4          | 0.0107 |
| 1000 | 0.2013 | 0.328     | 2.3e-7        | 9.98   |
| 9340 | 0.2048 | 0.341     | 2.1e-8        | 9.99   |
Testing & Expected Results
The model was evaluated on the 1K test samples and showed strong accuracy in multi-step problem-solving.
Example expected response:
```
To solve the problem, we first find the clips sold in May:
Clips in May = 48 / 2 = 24
Next, we find the total:
Total Clips = 48 + 24 = 72
#### Answer: 72
```
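The exact evaluation script is not included in this card; the snippet below is a minimal sketch for scoring generations in this format, assuming the final answer always follows a `#### Answer:` marker as in the example above.

```python
import re

def extract_answer(text):
    """Return the number following the '#### Answer:' marker, or None if absent."""
    match = re.search(r"####\s*Answer:\s*(-?[\d.,]+)", text)
    return match.group(1).replace(",", "") if match else None

# Example: compare one generated solution against the reference answer
generated = (
    "Clips in May = 48 / 2 = 24\n"
    "Total Clips = 48 + 24 = 72\n"
    "#### Answer: 72"
)
print(extract_answer(generated) == "72")  # True
```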
Bias, Risks, and Limitations
Potential Risks
- May hallucinate incorrect reasoning steps if prompts are unclear.
- Could struggle with complex mathematical problems outside its training data.
- Limited generalization to non-math reasoning tasks.
Recommendations
- If using this model for critical applications, verify outputs with human review.
- For better performance, fine-tune on larger datasets with real-world numerical reasoning.
Environmental Impact
Estimated Carbon Emissions:
- Hardware Used: NVIDIA A100 GPU
- Training Time: ~3 hours
- Estimated CO2 Emitted: ~5.6 kg CO2eq (using ML Impact Calculator)
Citation
If you use this model in your research, please cite it as:
```bibtex
@misc{Upcoming,
  title={Upcoming},
  author={Yiqiao},
  year={2025}
}
```