Arabic QwQ 32B Preview:

This is the Arabic version of QwQ 32B Preview, fine-tuned for Arabic reasoning tasks with a primary focus on math. It is trained on a dataset of questions paired with step-by-step solutions and detailed chains of thought that lead to the correct answers.

Arabic-QwQ-32B-Preview is fine-tuned for Arabic reasoning tasks using Unsloth and the newly introduced Arabic_Reasoning_Dataset.

Training Overview:

We fine-tuned a pre-trained language model to improve its reasoning capabilities on Arabic datasets. The model leverages advanced techniques like LoRA (Low-Rank Adaptation) for parameter-efficient fine-tuning.

Key Features

⏺ 4-bit Quantization: Reduces memory usage and speeds up training.

⏺ Gradient Checkpointing: Saves VRAM and supports long contexts.

⏺ Early Stopping: Prevents overfitting during training.

⏺ Evaluation Strategy: Regular checkpoints and validation to monitor performance.
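To give a rough sense of why 4-bit quantization matters at this scale, the sketch below estimates the memory needed for the model weights alone. The parameter count is an assumption taken from the nominal "32B" in the model name, and the estimate ignores activations, optimizer state, LoRA adapters, and quantization overhead, so treat the numbers as a back-of-the-envelope guide rather than a measurement.

```python
def weight_memory_gb(num_params: int, bits_per_param: int) -> float:
    """Approximate memory for the model weights alone, in GB (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


# Assumption: nominal parameter count implied by "32B" in the model name.
NOMINAL_PARAMS = 32_000_000_000

print(weight_memory_gb(NOMINAL_PARAMS, 16))  # 16-bit weights -> 64.0
print(weight_memory_gb(NOMINAL_PARAMS, 4))   # 4-bit weights  -> 16.0
```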

Dataset

🔹 Training Source: Omartificial-Intelligence-Space/Arabic_Reasoning_Dataset with 10,000 samples.

🔹 Description: Contains instruction-answer pairs for reasoning tasks in Arabic.

🔹 Validation Source: MohammedNasser/Arabic_Reasoning_Instruct_QA

🔹 Description: Contains reasoning challenges to validate model performance.

Preprocessing

Each dataset is formatted using the alpaca_prompt template:

Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
{instruction}

### Response:
{output}
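A minimal sketch of how a single record could be rendered with this template is shown below. The field names `instruction` and `output` are assumptions based on the template placeholders, and appending an end-of-sequence token is a common convention in Alpaca-style fine-tuning that this card does not confirm; `format_example` is a hypothetical helper, not part of any library.

```python
ALPACA_PROMPT = """Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
{instruction}

### Response:
{output}"""


def format_example(example: dict, eos_token: str = "") -> str:
    """Render one record with the template (hypothetical helper).

    Assumes the record has "instruction" and "output" fields. The EOS token
    is appended so the model can learn where a response ends; pass the
    tokenizer's own EOS token in practice.
    """
    text = ALPACA_PROMPT.format(
        instruction=example["instruction"],
        output=example["output"],
    )
    return text + eos_token


sample = {"instruction": "ما هو 2 + 2؟", "output": "4"}
print(format_example(sample, eos_token="</s>"))  # "</s>" is a placeholder EOS token
```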

Fine-Tuning Configuration

Model

▪️ Base Model: Qwen/QwQ-32B-Preview

▪️ Optimization: LoRA with the following parameters:

▪️ Rank r: 16

▪️ LoRA alpha: 16

▪️ Dropout: 0

▪️ Gradient checkpointing: "unsloth" for long contexts.
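The arithmetic below illustrates what these LoRA settings imply for a single weight matrix. LoRA replaces a full update of a `d_out x d_in` matrix with two low-rank factors, so the trainable parameters per adapted matrix are `r * (d_in + d_out)`, and the update is scaled by `alpha / r`. The 4096 x 4096 layer shape is purely illustrative and is not taken from the QwQ architecture.

```python
def lora_trainable_params(d_in: int, d_out: int, r: int) -> int:
    """Trainable parameters LoRA adds for one weight matrix: A is (r x d_in), B is (d_out x r)."""
    return r * (d_in + d_out)


def lora_scaling(alpha: float, r: int) -> float:
    """Scaling factor applied to the low-rank update B @ A."""
    return alpha / r


# Values from this card: r = 16, alpha = 16.
R, ALPHA = 16, 16

# Assumption: an illustrative 4096 x 4096 projection, not the real layer shape.
d_in = d_out = 4096

adapter = lora_trainable_params(d_in, d_out, R)
full = d_in * d_out
print(adapter)                # 131072
print(f"{adapter / full:.2%}")  # share of the full matrix that is trainable
print(lora_scaling(ALPHA, R))  # 1.0
```

With alpha equal to r, the scaling factor is 1, so the low-rank update is applied without extra amplification or damping.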

Training Arguments

▪️ Batch Size: 8 (per device)

▪️ Gradient Accumulation Steps: 2

▪️ Epochs: 3

▪️ Learning Rate: 2e-4

▪️ Optimizer: adamw_8bit

▪️ Scheduler: Linear

▪️ FP16/BF16: Enabled based on hardware support.
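As a rough worked example, the training arguments above can be turned into an effective batch size and a total number of optimizer steps. The sketch assumes a single GPU, that all 10,000 training samples are used, and that the linear scheduler decays to zero with no warmup; none of these details are stated in this card, and early stopping may end training sooner.

```python
import math

# Values from this card.
PER_DEVICE_BATCH = 8
GRAD_ACCUM_STEPS = 2
EPOCHS = 3
BASE_LR = 2e-4
NUM_SAMPLES = 10_000

# Assumption: training on a single device.
NUM_DEVICES = 1

EFFECTIVE_BATCH = PER_DEVICE_BATCH * GRAD_ACCUM_STEPS * NUM_DEVICES
steps_per_epoch = math.ceil(NUM_SAMPLES / EFFECTIVE_BATCH)
total_steps = steps_per_epoch * EPOCHS


def linear_lr(step: int, total: int, base_lr: float = BASE_LR) -> float:
    """Linear decay from base_lr to 0 (assumes no warmup)."""
    return base_lr * max(0.0, 1 - step / total)


print(EFFECTIVE_BATCH)   # 16
print(steps_per_epoch)   # 625
print(total_steps)       # 1875
print(linear_lr(0, total_steps), linear_lr(total_steps, total_steps))
```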

Usage

pip install unsloth

from unsloth import FastLanguageModel
import torch
max_seq_length = 2048 # Choose any! We auto support RoPE Scaling internally!
dtype = None # None for auto detection. Float16 for Tesla T4, V100, Bfloat16 for Ampere+
load_in_4bit = True # Use 4bit quantization to reduce memory usage. Can be False.

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "Omartificial-Intelligence-Space/Arabic-QWQ-32B-Preview",
    max_seq_length = max_seq_length,
    dtype = dtype,
    load_in_4bit = load_in_4bit,
    # token = "hf_...", # use one if using gated models like meta-llama/Llama-2-7b-hf
)

prompt = """Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
{}

### Response:
{}"""

# `prompt` follows the same Alpaca-style template described in the Preprocessing section
FastLanguageModel.for_inference(model) # Enable native 2x faster inference
inputs = tokenizer(
[
    prompt.format(
        "YOUR INSTRUCTION", # instruction
        "", # output - leave this blank for generation!
    )
], return_tensors = "pt").to("cuda")

outputs = model.generate(**inputs, max_new_tokens = 256, use_cache = True)
# The decoded text contains the prompt followed by the generated response
print(tokenizer.batch_decode(outputs)[0])
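Because the decoded output includes the full prompt, it can help to keep only the text after the response marker. The helper below is a hypothetical post-processing sketch that is not part of Unsloth or Transformers; the `<|im_end|>` token in the example is only a stand-in for whatever special tokens appear in the decoded text.

```python
def extract_response(
    decoded: str,
    marker: str = "### Response:",
    special_tokens: tuple[str, ...] = (),
) -> str:
    """Return the text after the first response marker (hypothetical helper).

    If the marker is missing, the whole decoded string is returned. Any
    listed special tokens are removed and surrounding whitespace is stripped.
    """
    _, found, tail = decoded.partition(marker)
    text = tail if found else decoded
    for token in special_tokens:
        text = text.replace(token, "")
    return text.strip()


decoded = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\nما هو 2 + 2؟\n\n"
    "### Response:\nالجواب هو 4<|im_end|>"
)
print(extract_response(decoded, special_tokens=("<|im_end|>",)))
```

Alternatively, passing `skip_special_tokens=True` to `tokenizer.batch_decode` removes special tokens at decode time, leaving only the marker split to do.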

Results and Comparison

The Qwen/QwQ-32B-Preview base model is multilingual and supports Arabic, but its performance on Arabic reasoning tasks is inconsistent compared with its stronger default capabilities in English. In our observations, the model often needs explicit, structured prompting to generate coherent Arabic responses, and even then its Arabic reasoning can be limited. To address this, we fine-tuned the model on targeted Arabic reasoning datasets and task-specific instructions to improve its understanding of, and alignment with, Arabic language tasks. This adaptation illustrates the need for language-specific adjustments when optimizing multilingual models for underrepresented languages like Arabic.

We analyzed the outputs of the Arabic-QwQ and QwQ-Preview models to better understand the impact of fine-tuning on performance, particularly for Arabic language tasks.

An example illustrating how base models generate Chinese responses when provided with an Arabic question:

The fine-tuned model effectively resolves the issue of unintended Chinese interference in the responses, delivering clear and accurate answers in Arabic.

An example demonstrating how base models sometimes respond in English unless explicitly instructed to answer in Arabic. In contrast, the fine-tuned model seamlessly responds in Arabic without requiring additional instructions, simply by providing the question in Arabic.

Although the base model provides the correct answer, it is often in English, making it challenging for Arabic users to understand unless they are proficient in English.

At times, the base models provide additional context, resulting in unnecessarily lengthy answers. The fine-tuned model addresses this issue by focusing on delivering concise, straightforward solutions without extra context.

There are instances where both the base and fine-tuned models answer the query well, showing that both can comprehend the question and respond accurately. However, the fine-tuned model aligns more closely with the requirements of Arabic users.

How to Use

To utilize the Arabic-QwQ model effectively:

Use Unsloth for Faster Inference
We recommend using Unsloth to load and perform inference with the model. This method is optimized for speed and offers better performance compared to traditional loading methods.

Incorporate Prompt Templates for Structured Instructions
For more specific instructions or complex tasks, use a prompt template to guide the model's responses. For example, structure your input like:

     prompt = """Below is an instruction that describes a task. Write a response that appropriately completes the request.
     
     ### Instruction:
     {}
     
     ### Response:
     {}"""

Acknowledgments

We would like to express our gratitude to Prince Sultan University for their support in the development and fine-tuning of this model. Their contributions were invaluable in making this work possible.

Citation

If you use this model in your research or application, please cite it as follows:

@misc{Arabic_QWQ,
  author = {Omer Nacar},
  title = {Arabic-QwQ: Fine-tuned QwQ LLM for Arabic Reasoning and Understanding},
  year = {2024},
  url = {https://huggingface.co/Omartificial-Intelligence-Space/Arabic-QWQ-32B-Preview},
  institution = {Prince Sultan University},
  note = {Fine-tuned version of the QwQ-32B model for Arabic-specific tasks.}
}

This work is built upon the great work done by Qwen:

@misc{qwq-32b-preview,
    title = {QwQ: Reflect Deeply on the Boundaries of the Unknown},
    url = {https://qwenlm.github.io/blog/qwq-32b-preview/},
    author = {Qwen Team},
    month = {November},
    year = {2024}
}

@article{qwen2,
      title={Qwen2 Technical Report}, 
      author={An Yang and Baosong Yang and Binyuan Hui and Bo Zheng and Bowen Yu and Chang Zhou and Chengpeng Li and Chengyuan Li and Dayiheng Liu and Fei Huang and Guanting Dong and Haoran Wei and Huan Lin and Jialong Tang and Jialin Wang and Jian Yang and Jianhong Tu and Jianwei Zhang and Jianxin Ma and Jin Xu and Jingren Zhou and Jinze Bai and Jinzheng He and Junyang Lin and Kai Dang and Keming Lu and Keqin Chen and Kexin Yang and Mei Li and Mingfeng Xue and Na Ni and Pei Zhang and Peng Wang and Ru Peng and Rui Men and Ruize Gao and Runji Lin and Shijie Wang and Shuai Bai and Sinan Tan and Tianhang Zhu and Tianhao Li and Tianyu Liu and Wenbin Ge and Xiaodong Deng and Xiaohuan Zhou and Xingzhang Ren and Xinyu Zhang and Xipin Wei and Xuancheng Ren and Yang Fan and Yang Yao and Yichang Zhang and Yu Wan and Yunfei Chu and Yuqiong Liu and Zeyu Cui and Zhenru Zhang and Zhihao Fan},
      journal={arXiv preprint arXiv:2407.10671},
      year={2024}
}
