Skywork-R1V

🌐 Homepage | 📖 Technical Report | 💻 GitHub


1. Model Introduction

| Model Name | Vision Encoder | Language Model | HF Link |
|---|---|---|---|
| Skywork-R1V-38B | InternViT-6B-448px-V2_5 | deepseek-ai/DeepSeek-R1-Distill-Qwen-32B | 🤗 Link |
| Skywork-R1V-38B-qwq | InternViT-6B-448px-V2_5 | Qwen/QwQ-32B | - |

2. Features

  • Visual Chain-of-Thought: Enables multi-step logical reasoning on visual inputs, breaking down complex image-based problems into manageable steps.
  • Mathematical & Scientific Analysis: Capable of solving visual math problems and interpreting scientific/medical imagery with high precision.
  • Cross-Modal Understanding: Seamlessly integrates text and images for richer, context-aware comprehension.
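To make the chain-of-thought feature concrete, here is a minimal, hypothetical sketch of how a step-by-step visual reasoning prompt might be assembled. The `<image>` placeholder and the instruction wording are assumptions for illustration; the actual prompt template used by Skywork-R1V is defined in the repository's inference script.

```python
def build_cot_prompt(question: str) -> str:
    """Assemble an illustrative chain-of-thought prompt for a visual question.

    The <image> placeholder and step-by-step instruction are assumptions;
    the real template lives in the Skywork-R1V inference script.
    """
    return (
        "<image>\n"
        f"Question: {question}\n"
        "Let's reason through the image step by step before answering."
    )

print(build_cot_prompt("What is the area of the shaded triangle?"))
```

The key idea is that the model is explicitly asked to decompose the visual problem before committing to an answer, which is what "Visual Chain-of-Thought" refers to above.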

3. Evaluation



Comparison with Larger-Scale Open-Source and Closed-Source Models

| Category | Benchmark | QwQ-32B-Preview (LLM) | InternVL-2.5-38B (VLM) | VILA 1.5-40B (VLM) | InternVL2-40B (VLM) | Skywork-R1V-38B (VLM) |
|---|---|---|---|---|---|---|
| Reasoning | MATH-500 | 90.6 | - | - | - | 94.0 |
| Reasoning | AIME 2024 | 50.0 | - | - | - | 72.0 |
| Reasoning | GPQA | 54.5 | - | - | - | 61.6 |
| Vision | MathVista(mini) | - | 71.9 | 49.5 | 63.7 | 67.5 |
| Vision | MMMU(Val) | - | 63.9 | 55.1 | 55.2 | 69.0 |


Evaluation results of state-of-the-art LLMs and VLMs

| Model | MATH-500 (pass@1) | AIME 2024 (pass@1) | GPQA (pass@1) | MathVista(mini) (pass@1) | MMMU(Val) (pass@1) |
|---|---|---|---|---|---|
| Qwen2.5-72B-Instruct | 80.0 | 23.3 | 49.0 | - | - |
| Deepseek V3 | 90.2 | 39.2 | 59.1 | - | - |
| Deepseek R1 | 97.3 | 79.8 | 71.5 | - | - |
| Claude 3.5 Sonnet | 78.3 | 16.0 | 65.0 | 65.3 | 66.4 |
| GPT-4o | 74.6 | 9.3 | 49.9 | 63.8 | 69.1 |
| Kimi k1.5 | 96.2 | 77.5 | - | 74.9 | 70.0 |
| Qwen2.5-VL-72B-Instruct | - | - | - | 74.8 | 70.2 |
| LLaVA-Onevision-72B | - | - | - | 67.5 | 56.8 |
| InternVL2-Llama3-76B | - | - | - | 65.5 | 62.7 |
| InternVL2.5-78B | - | - | - | 72.3 | 70.1 |
| Skywork-R1V-38B | 94.0 | 72.0 | 61.6 | 67.5 | 69.0 |
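For a quick aggregate view, Skywork-R1V-38B's per-category averages can be computed directly from the table above (the numbers below are copied from the table; the averaging is our own quick sketch, not an official metric):

```python
# Skywork-R1V-38B pass@1 scores, copied from the evaluation table above.
reasoning = {"MATH-500": 94.0, "AIME 2024": 72.0, "GPQA": 61.6}
vision = {"MathVista(mini)": 67.5, "MMMU(Val)": 69.0}

avg_reasoning = sum(reasoning.values()) / len(reasoning)
avg_vision = sum(vision.values()) / len(vision)
print(f"Average reasoning pass@1: {avg_reasoning:.1f}")  # 75.9
print(f"Average vision pass@1: {avg_vision:.2f}")        # 68.25
```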

4. Usage

Step 1. Clone the Repository

First, clone the repository to your local machine:

```shell
git clone https://github.com/SkyworkAI/Skywork-R1V.git
cd Skywork-R1V/inference
```

Step 2. Set Up the Environment

```shell
pip install -r requirements.txt
pip install flash-attn --no-build-isolation
```

Step 3. Run the Inference Script

Prepare your images and question, then pass them to `inference_with_transformers.py`:

```shell
CUDA_VISIBLE_DEVICES="0,1" python inference_with_transformers.py \
    --model_path path \
    --image_paths image1_path \
    --question "your question"
```
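The command-line flags above map onto an argument parser inside the inference script. A minimal sketch of how such a parser might look (the actual script may define additional options; the structure below is an assumption based only on the invocation shown above):

```python
import argparse

def parse_args(argv=None):
    # Flags mirror the example invocation; the real script may differ.
    parser = argparse.ArgumentParser(description="Skywork-R1V inference")
    parser.add_argument("--model_path", required=True,
                        help="Local path or HF repo id of the model")
    parser.add_argument("--image_paths", nargs="+", required=True,
                        help="One or more image files to reason over")
    parser.add_argument("--question", required=True,
                        help="The question to ask about the image(s)")
    return parser.parse_args(argv)

args = parse_args(["--model_path", "Skywork/Skywork-R1V-38B",
                   "--image_paths", "demo.png",
                   "--question", "What does the chart show?"])
print(args.model_path, args.image_paths, args.question)
```

Note that `--image_paths` accepts multiple files, which is why the flag name is plural in the example invocation.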

5. Citation

If you use Skywork-R1V in your research, please cite:

```bibtex
@article{skywork2025r1v,
  title     = {Skywork R1V: Pioneering Multimodal Reasoning with Chain-of-Thought},
  author    = {Yi Peng and Chris and Xiaokun Wang and Yichen Wei and Jiangbo Pei and Weijie Qiu and Ai Jian and Yunzhuo Hao and Jiachun Pan and Tianyidan Xie and Li Ge and Rongxian Zhuang and Xuchen Song and Yang Liu and Yahui Zhou},
  year      = {2025},
  journal   = {https://github.com/SkyworkAI/Skywork-R1V/blob/main/Skywork_R1V.pdf},
  url       = {https://huggingface.co/Skywork/Skywork-R1V-38B}
}
```

This project is released under an open-source license.
