Qwen2.5-0.5B Synthia Fine-tuned Model

This is a fine-tuned version of Qwen/Qwen2.5-0.5B on the Synthia-v1.5-II dataset, optimized for conversational AI and instruction following.

Built with Axolotl

Model Description

This model builds on the Qwen2.5-0.5B base model, which features:

  • 490M parameters (360M non-embedding parameters)
  • 24 transformer layers
  • 14 attention heads for queries and 2 for key/values (GQA architecture)
  • Support for 32,768 context length
  • RoPE positional embeddings, SwiGLU activations, and RMSNorm
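To make the GQA figures above concrete, the following sketch estimates the key/value cache size at the full context length and compares it with a hypothetical layout in which every query head had its own key/value head. The layer and head counts come from the list above; the head dimension of 64 and the 2-byte BF16 cache entries are assumptions that this card does not state.

```python
# Figures taken from the model description above.
NUM_LAYERS = 24
NUM_Q_HEADS = 14
NUM_KV_HEADS = 2
MAX_CONTEXT = 32_768

# Assumptions (not stated in this card): per-head dimension and cache dtype.
HEAD_DIM = 64
BYTES_PER_VALUE = 2  # BF16

# With GQA, each key/value head is shared by a group of query heads.
queries_per_kv_head = NUM_Q_HEADS // NUM_KV_HEADS


def kv_cache_bytes(num_tokens: int, kv_heads: int = NUM_KV_HEADS) -> int:
    """Bytes needed to cache keys and values for one sequence."""
    # Factor 2 covers the separate key and value tensors in every layer.
    return 2 * NUM_LAYERS * kv_heads * HEAD_DIM * BYTES_PER_VALUE * num_tokens


gqa_bytes = kv_cache_bytes(MAX_CONTEXT)
mha_bytes = kv_cache_bytes(MAX_CONTEXT, kv_heads=NUM_Q_HEADS)

print(f"query heads per KV head: {queries_per_kv_head}")
print(f"GQA cache at full context: {gqa_bytes / 2**20:.0f} MiB")
print(f"one KV head per query head: {mha_bytes / 2**20:.0f} MiB")
```

Under these assumptions the cache is seven times smaller than it would be with one key/value head per query head, which is the main practical benefit of GQA for long-context inference.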

The model has been fine-tuned on the Synthia-v1.5-II dataset, which is designed to enhance instruction following and conversational abilities. Training used a conservative setup (a 1e-05 learning rate with a cosine schedule, listed under Training Hyperparameters) intended to preserve the base model's capabilities while improving natural dialogue and instruction following.

Intended Uses & Limitations

This model is intended for:

  • Conversational AI applications
  • Instruction following tasks
  • Coherent text generation
  • Multi-turn dialogue systems
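A minimal inference sketch for these use cases, using the Hugging Face Transformers library. It assumes the repository ID artificialguybr/QWEN-2.5-0.5B-Synthia-II and the single-turn "[INST] ... [/INST]" prompt format described under Training and Evaluation Data. The helper names `build_prompt` and `generate_reply` and the default of 256 new tokens are illustrative choices, not part of the model's documented interface.

```python
MODEL_ID = "artificialguybr/QWEN-2.5-0.5B-Synthia-II"


def build_prompt(instruction: str) -> str:
    """Wrap an instruction in the format used during fine-tuning."""
    return f"[INST] {instruction.strip()} [/INST]"


def generate_reply(instruction: str, max_new_tokens: int = 256) -> str:
    """Download the model on first use and generate a single reply."""
    # Imported here so build_prompt stays usable without Transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID)

    inputs = tokenizer(build_prompt(instruction), return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)

    # Drop the prompt tokens so that only the newly generated text is returned.
    new_tokens = output_ids[0, inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)


# Example call (downloads the weights on first run):
# print(generate_reply("Explain gradient accumulation in two sentences."))
```

How multi-turn conversations should be concatenated is not specified in this card, so the sketch covers a single turn only.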

Limitations:

  • The context window is the 32,768 tokens inherited from the base model; fine-tuning used a sequence length of 4,096, so behavior on longer inputs may differ
  • As a 0.5B parameter model, it may not match larger models in complex reasoning tasks
  • Performance in non-English languages may be limited
  • Users should be aware of potential biases present in the training data

Training and Evaluation Data

The model was fine-tuned on the Synthia-v1.5-II dataset, which is specifically designed for instruction-following and conversational AI. The data was split and formatted as follows:

  • 95% of data for training
  • 5% for validation
  • Instruction format: "[INST] {instruction} [/INST]"
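The split and prompt format above can be sketched as follows. This is an illustration only: the shuffle, the seed, and the function names are assumptions, and the actual split was presumably produced by the training framework rather than by this code.

```python
import random


def format_instruction(instruction: str) -> str:
    """Apply the instruction format listed above."""
    return f"[INST] {instruction} [/INST]"


def split_train_val(records, val_fraction=0.05, seed=42):
    """Shuffle records and hold out a fraction for validation.

    The seed and the shuffle are assumptions made for reproducibility;
    the card only states the 95% / 5% proportions.
    """
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_val = round(len(shuffled) * val_fraction)
    return shuffled[n_val:], shuffled[:n_val]


# Toy data standing in for the real dataset.
examples = [f"Question {i}" for i in range(100)]
train, val = split_train_val(examples)

print(len(train), len(val))
print(format_instruction(train[0]))
```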

Training Procedure

Training Hyperparameters

Key hyperparameters:

  • Learning rate: 1e-05
  • Effective batch size: 40 (micro-batch size 5 × 8 gradient accumulation steps)
  • Training epochs: 3
  • Optimizer: AdamW (β1=0.9, β2=0.999, ε=1e-8)
  • Learning rate scheduler: Cosine with 100 warmup steps
  • Sequence length: 4096
  • Sample packing: Enabled
  • Mixed precision: BF16
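The effective batch size and the learning rate schedule implied by these settings can be sketched as below. The sketch assumes linear warmup from zero, cosine decay to zero, and a total of 672 optimizer steps (the figure given under Training Results). The exact schedule shape used by the training framework may differ.

```python
import math

# Values from the hyperparameter list above.
PEAK_LR = 1e-5
WARMUP_STEPS = 100
MICRO_BATCH_SIZE = 5
GRAD_ACCUM_STEPS = 8

# From the Training Results section of this card.
TOTAL_STEPS = 672

effective_batch_size = MICRO_BATCH_SIZE * GRAD_ACCUM_STEPS


def lr_at(step: int) -> float:
    """Learning rate at an optimizer step (assumed schedule shape)."""
    if step < WARMUP_STEPS:
        # Assumed linear warmup from 0 to the peak learning rate.
        return PEAK_LR * step / WARMUP_STEPS
    # Assumed cosine decay from the peak to 0 over the remaining steps.
    progress = (step - WARMUP_STEPS) / max(1, TOTAL_STEPS - WARMUP_STEPS)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * min(1.0, progress)))


print(f"effective batch size: {effective_batch_size}")
for step in (0, 50, 100, 386, 672):
    print(f"step {step:>3}: lr = {lr_at(step):.2e}")
```

With 100 warmup steps out of 672, roughly the first 15% of training runs below the peak learning rate.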

Training Results

The model was trained for 672 steps over 3 epochs, showing consistent improvement throughout the training process.
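As a rough sanity check, the step count can be combined with the batch size and sequence length from the hyperparameters to bound how much data each epoch covered. The sketch assumes that the batch size of 40 is the global batch size and that every packed sequence is completely full, so the token figures are upper bounds rather than measured values.

```python
TOTAL_STEPS = 672
EPOCHS = 3
EFFECTIVE_BATCH_SIZE = 40   # 5 micro-batch x 8 accumulation steps
SEQUENCE_LENGTH = 4096      # tokens per packed sequence

steps_per_epoch = TOTAL_STEPS // EPOCHS
sequences_per_epoch = steps_per_epoch * EFFECTIVE_BATCH_SIZE

# Upper bounds: real packed sequences usually contain some padding.
max_tokens_per_epoch = sequences_per_epoch * SEQUENCE_LENGTH
max_tokens_total = max_tokens_per_epoch * EPOCHS

print(f"steps per epoch: {steps_per_epoch}")
print(f"packed sequences per epoch: {sequences_per_epoch}")
print(f"tokens per epoch (upper bound): {max_tokens_per_epoch:,}")
print(f"tokens over all epochs (upper bound): {max_tokens_total:,}")
```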

Framework Versions

  • Transformers 4.46.0
  • PyTorch 2.3.1+cu121
  • Datasets 3.0.1
  • Tokenizers 0.20.1

Citation

If you use this model, please cite both the original Qwen2.5 work and this fine-tuned version.

Model tree for artificialguybr/QWEN-2.5-0.5B-Synthia-II

  • Base model: Qwen/Qwen2.5-0.5B
  • Training dataset: Synthia-v1.5-II