Qwen2.5-Think-KTO v0.1: A Reasoning-Enhanced Language Model

NOTE: This model is currently undertrained and needs some coaxing to output <think>...</think> tags.

What's New in v0.1

This initial release enhances the base Qwen2.5-7B model's reasoning capabilities using Kahneman-Tversky Optimization (KTO). The model is trained on binary feedback signals indicating whether a given output is desirable or undesirable for its input.
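For reference, KTO (Ethayarajh et al., 2024) scores each output with a Kahneman-Tversky-style value function measured against a reference point, weighting desirable and undesirable examples asymmetrically. The sketch below is a rough restatement of the paper's objective, not this model's exact training code:

r(x, y) = log[ pi_theta(y | x) / pi_ref(y | x) ]     # implied reward vs. the reference policy
z0      = KL(pi_theta || pi_ref)                     # reference point
v(x, y) = lambda_D * sigmoid(beta * (r(x, y) - z0))  # if (x, y) is desirable
        = lambda_U * sigmoid(beta * (z0 - r(x, y)))  # if (x, y) is undesirable
L_KTO   = E[ lambda_y - v(x, y) ]                    # loss minimized during training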

How It Works

The model generates responses using a simple thought-then-answer format:

<think>
Let me approach this step by step...
First, we need to consider X...
Then, looking at Y...
Finally, Z leads us to...
</think>

[final answer based on thought process]
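To consume these outputs programmatically, you can split the reasoning from the final answer with a small parser. Here is a minimal sketch (parse_response is a hypothetical helper, not part of the model's API):

import re

def parse_response(text: str):
    """Split a generation into (thoughts, answer)."""
    match = re.search(r"<think>(.*?)</think>", text, re.DOTALL)
    if match is None:
        # v0.1 sometimes omits the tags (see the note above); treat it all as the answer.
        return None, text.strip()
    thoughts = match.group(1).strip()
    answer = text[match.end():].strip()
    return thoughts, answer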

Technical Details

Base Architecture

  • Base Model: Qwen2.5-7B
  • Training Approach: Kahneman-Tversky Optimization (KTO)
  • Dataset: Binary feedback signals (desirable/undesirable outputs; see the schema sketch after this list)
  • Quality Control: Programmatic validation
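
The card does not publish the dataset schema. If training used TRL's KTOTrainer, each record would pair a prompt and completion with a boolean desirability label, along these lines (hypothetical examples):

{"prompt": "What is 2 + 2?", "completion": "<think>\n2 + 2 = 4\n</think>\n\n4", "label": true}
{"prompt": "What is 2 + 2?", "completion": "5", "label": false}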

Training Parameters

  • Optimization:
    • Learning Rate: 5e-6
    • Scheduler: Cosine with 0.1 warmup ratio
    • Optimizer: AdamW 8-bit
    • Batch Size: 5 per device
    • Gradient Accumulation Steps: 1
    • Number of Epochs: 3
  • Model Config:
    • Max Length: 3746
    • Max Prompt Length: 364
    • Attention Implementation: Flash Attention 2
    • Gradient Checkpointing: Enabled
  • Infrastructure:
    • Accelerate for distributed training
    • Wandb logging
    • Liger Kernel optimizations enabled
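
No training framework is named on the card. Assuming a TRL-based setup, the parameters above would map onto KTOConfig roughly as follows; this is a sketch, not the actual training script:

from trl import KTOConfig

# Hypothetical mapping of the listed hyperparameters onto TRL's KTOConfig.
training_args = KTOConfig(
    output_dir="qwen2.5-think-kto",   # hypothetical path
    learning_rate=5e-6,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    optim="adamw_bnb_8bit",           # AdamW 8-bit
    per_device_train_batch_size=5,
    gradient_accumulation_steps=1,
    num_train_epochs=3,
    max_length=3746,
    max_prompt_length=364,
    gradient_checkpointing=True,
    use_liger_kernel=True,            # Liger Kernel optimizations
    report_to="wandb",
)

Flash Attention 2 is selected at model load time rather than in the config, e.g. AutoModelForCausalLM.from_pretrained(..., attn_implementation="flash_attention_2").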

What's It Good For?

✅ Tasks requiring natural thought processes
✅ Scenarios where binary feedback is available
✅ Problems benefiting from human-like reasoning
✅ Applications needing clear thought-to-answer progression

Limitations

  • Bounded by the capabilities of the Qwen2.5-7B base model
  • May not generalize beyond its training distribution
  • First version, with room for improvement
  • Performance on non-reasoning tasks is unchanged
  • Limited by the quality of the binary feedback signal

Example Usage

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ericflo/Qwen2.5-Think-KTO-v0.1"

# The published weights are BF16; device_map="auto" places layers on available devices.
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

prompt = "What are the implications of Moore's Law slowing down?"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(model.device)

# max_new_tokens bounds the continuation; max_length would also count the prompt tokens.
output = model.generate(input_ids, max_new_tokens=512)
response = tokenizer.decode(output[0], skip_special_tokens=True)
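
Because v0.1 is undertrained (see the note at the top), it may skip the <think> tags. One workaround is to prefill the opening tag so generation continues inside the reasoning block; this is a suggested nudge, not documented behavior:

# Seed the opening tag to nudge the model into the reasoning format.
input_ids = tokenizer(prompt + "\n<think>\n", return_tensors="pt").input_ids.to(model.device)
output = model.generate(input_ids, max_new_tokens=512)
response = tokenizer.decode(output[0], skip_special_tokens=True)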

Citation

@misc{qwen25-think-kto,
  title={Qwen2.5-Think-KTO: Enhanced Reasoning Through Human-Aware Learning},
  author={Florenzano, Eric},
  year={2024},
  howpublished={\url{https://huggingface.co/ericflo/Qwen2.5-Think-KTO-v0.1}}
}

Acknowledgments

This model builds on the Qwen2.5-7B base model and implements the KTO approach developed by Ethayarajh et al. (2024). Special thanks to the authors of the KTO paper and the broader AI research community for their contributions to model alignment techniques.
