Chess Reasoner Model

This is a LoRA adapter for Qwen2.5-14B-Instruct, fine-tuned to analyze chess positions and suggest moves. The model was trained using GRPO (Group Relative Policy Optimization) to maximize move quality as evaluated by Stockfish.

Model Details

  • Base Model: Qwen/Qwen2.5-14B-Instruct
  • Training Method: LoRA with GRPO
  • Training Data: Generated chess positions with Stockfish evaluations
  • Input Format: Chess positions in FEN notation with ASCII board visualization
  • Output Format:
    <think>
    Analysis of the position
    </think>
    <move>
    Chosen move in UCI format (e.g., e2e4)
    </move>
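
Because the response follows a fixed tag layout, the analysis and the move can be pulled out with two regular expressions. The following is a minimal sketch, not the parser used during training; the function name parse_response and the choice to return None for a malformed move are assumptions made for illustration.

```python
import re
from typing import Optional, Tuple

# UCI move: from-square, to-square, optional promotion piece (e.g. e2e4, e7e8q).
UCI_RE = re.compile(r"^[a-h][1-8][a-h][1-8][qrbn]?$")
THINK_TAG_RE = re.compile(r"<think>\s*(.*?)\s*</think>", re.DOTALL)
MOVE_TAG_RE = re.compile(r"<move>\s*(.*?)\s*</move>", re.DOTALL)


def parse_response(text: str) -> Tuple[Optional[str], Optional[str]]:
    """Return (analysis, move); each is None if missing or malformed."""
    think = THINK_TAG_RE.search(text)
    move = MOVE_TAG_RE.search(text)
    analysis = think.group(1) if think else None
    uci = move.group(1) if move else None
    if uci is not None and not UCI_RE.match(uci):
        uci = None  # syntactically invalid; legality needs a separate check
    return analysis, uci
```

This only checks that the move is well-formed UCI. Whether the move is legal in the given position still has to be verified against the board.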
    

Usage

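from unsloth import FastLanguageModel
from peft import set_peft_model_state_dict
from safetensors.torch import load_file

# Load the 4-bit quantized base model
# (fast_inference=True uses Unsloth's vLLM backend, so vLLM must be installed)
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="Qwen/Qwen2.5-14B-Instruct",
    max_seq_length=1024,
    load_in_4bit=True,
    fast_inference=True,
)

# Attach LoRA adapters; rank and target modules must match the saved adapter
model = FastLanguageModel.get_peft_model(
    model,
    r=64,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"],
    lora_alpha=64,
)

# Load the LoRA weights into the adapter layers. The file contains only LoRA
# tensors, so a plain model.load_state_dict() would fail on missing base-model keys.
state_dict = load_file("adapter_model.safetensors")
set_peft_model_state_dict(model, state_dict)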
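
The model expects a FEN string together with an ASCII rendering of the board. The exact prompt template used in training is not documented here, so the sketch below is an assumption: fen_to_ascii and build_prompt are hypothetical helpers, and the wording and layout of the prompt are illustrative only.

```python
def fen_to_ascii(fen: str) -> str:
    """Render the piece-placement field of a FEN string as an 8x8 text board."""
    placement = fen.split()[0]
    rows = []
    for rank in placement.split("/"):
        squares = []
        for ch in rank:
            if ch.isdigit():
                squares.extend("." * int(ch))  # a digit encodes that many empty squares
            else:
                squares.append(ch)
        if len(squares) != 8:
            raise ValueError(f"bad rank in FEN: {rank!r}")
        rows.append(" ".join(squares))
    if len(rows) != 8:
        raise ValueError("FEN must describe 8 ranks")
    return "\n".join(rows)


def build_prompt(fen: str) -> str:
    """Hypothetical prompt layout; the training template may differ."""
    return f"FEN: {fen}\n\nBoard:\n{fen_to_ascii(fen)}\n\nChoose the best move."
```

Uppercase letters are White's pieces, lowercase letters are Black's, and rank 8 is printed first, following the order of the FEN placement field.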

Training

The model was trained on chess positions with rewards based on move quality as evaluated by Stockfish. The reward function considers:

  • Move legality
  • Move syntax
  • Move quality (compared to Stockfish's top 3 moves)
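
The three criteria above can be combined into a single scalar reward, which GRPO then normalizes within each group of completions sampled for the same position. The sketch below illustrates that idea under stated assumptions. The weights, the function names, and the inputs (legal_moves as a set of UCI strings, top_moves as Stockfish's ranked top moves) are all hypothetical and are not the values used to train this model.

```python
import re
from statistics import mean, pstdev
from typing import List, Sequence, Set

UCI_RE = re.compile(r"^[a-h][1-8][a-h][1-8][qrbn]?$")


def move_reward(move: str, legal_moves: Set[str], top_moves: Sequence[str]) -> float:
    """Score one move. All weights are illustrative assumptions."""
    if not UCI_RE.match(move):
        return -1.0  # syntax: not a well-formed UCI move
    if move not in legal_moves:
        return -0.5  # legality: well-formed but illegal in this position
    reward = 0.2  # baseline for any legal move
    top3 = list(top_moves)[:3]
    if move in top3:
        # quality: more credit the higher the move ranks among the top 3
        reward += (1.0, 0.6, 0.3)[top3.index(move)]
    return reward


def group_relative_advantages(rewards: Sequence[float], eps: float = 1e-6) -> List[float]:
    """GRPO-style advantages: center and scale rewards within one sampling group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std here; implementations differ
    return [(r - mu) / (sigma + eps) for r in rewards]
```

A completion is reinforced only if it scores better than the other completions for the same position, so a group in which every sample earns the same reward contributes no learning signal.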

License

This model inherits the license of the base Qwen2.5-14B-Instruct model.
