Skywork-R1V

🌐 Homepage | 📖 Technical Report | 💻 GitHub


1. Model Introduction

| Model Name | Vision Encoder | Language Model | HF Link |
|---|---|---|---|
| Skywork-R1V-38B | InternViT-6B-448px-V2_5 | deepseek-ai/DeepSeek-R1-Distill-Qwen-32B | 🤗 Link |
| Skywork-R1V-38B-qwq | InternViT-6B-448px-V2_5 | Qwen/QwQ-32B | - |

2. Features

  • Visual Chain-of-Thought: Enables multi-step logical reasoning on visual inputs, breaking down complex image-based problems into manageable steps.
  • Mathematical & Scientific Analysis: Capable of solving visual math problems and interpreting scientific/medical imagery with high precision.
  • Cross-Modal Understanding: Seamlessly integrates text and images for richer, context-aware comprehension.
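To make the chain-of-thought feature concrete, here is a minimal, hypothetical sketch of how a step-by-step visual reasoning prompt might be assembled. The `<image>` placeholder and the instruction wording are assumptions for illustration; the actual prompt template used by Skywork-R1V is defined in the repository's inference script.

```python
def build_cot_prompt(question: str) -> str:
    """Assemble an illustrative chain-of-thought prompt for a visual question.

    The <image> placeholder and step-by-step instruction are assumptions;
    the real template lives in the Skywork-R1V inference script.
    """
    return (
        "<image>\n"
        f"Question: {question}\n"
        "Let's reason through the image step by step before answering."
    )

print(build_cot_prompt("What is the area of the shaded triangle?"))
```

The key idea is that the model is explicitly asked to decompose the visual problem before committing to an answer, which is what "Visual Chain-of-Thought" refers to above.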

3. Evaluation



Comparison with Larger-Scale Open-Source and Closed-Source Models

| Category | Benchmark | QwQ-32B-Preview (LLM) | InternVL-2.5-38B (VLM) | VILA 1.5-40B (VLM) | InternVL2-40B (VLM) | Skywork-R1V-38B (VLM) |
|---|---|---|---|---|---|---|
| Reasoning | MATH-500 | 90.6 | - | - | - | 94.0 |
| Reasoning | AIME 2024 | 50.0 | - | - | - | 72.0 |
| Reasoning | GPQA | 54.5 | - | - | - | 61.6 |
| Vision | MathVista(mini) | - | 71.9 | 49.5 | 63.7 | 67.5 |
| Vision | MMMU(Val) | - | 63.9 | 55.1 | 55.2 | 69.0 |


Evaluation results of state-of-the-art LLMs and VLMs

| Model | MATH-500 (pass@1) | AIME 2024 (pass@1) | GPQA (pass@1) | MathVista(mini) (pass@1) | MMMU(Val) (pass@1) |
|---|---|---|---|---|---|
| Qwen2.5-72B-Instruct | 80.0 | 23.3 | 49.0 | - | - |
| Deepseek V3 | 90.2 | 39.2 | 59.1 | - | - |
| Deepseek R1 | 97.3 | 79.8 | 71.5 | - | - |
| Claude 3.5 Sonnet | 78.3 | 16.0 | 65.0 | 65.3 | 66.4 |
| GPT-4o | 74.6 | 9.3 | 49.9 | 63.8 | 69.1 |
| Kimi k1.5 | 96.2 | 77.5 | - | 74.9 | 70.0 |
| Qwen2.5-VL-72B-Instruct | - | - | - | 74.8 | 70.2 |
| LLaVA-Onevision-72B | - | - | - | 67.5 | 56.8 |
| InternVL2-Llama3-76B | - | - | - | 65.5 | 62.7 |
| InternVL2.5-78B | - | - | - | 72.3 | 70.1 |
| Skywork-R1V-38B | 94.0 | 72.0 | 61.6 | 67.5 | 69.0 |
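For a quick aggregate view, Skywork-R1V-38B's per-category averages can be computed directly from the table above (the numbers below are copied from the table; the averaging is our own quick sketch, not an official metric):

```python
# Skywork-R1V-38B pass@1 scores, copied from the evaluation table above.
reasoning = {"MATH-500": 94.0, "AIME 2024": 72.0, "GPQA": 61.6}
vision = {"MathVista(mini)": 67.5, "MMMU(Val)": 69.0}

avg_reasoning = sum(reasoning.values()) / len(reasoning)
avg_vision = sum(vision.values()) / len(vision)
print(f"Average reasoning pass@1: {avg_reasoning:.1f}")  # 75.9
print(f"Average vision pass@1: {avg_vision:.2f}")        # 68.25
```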

4. Usage

Step 1. Clone the Repository

First, clone the repository to your local machine:

```shell
git clone https://github.com/SkyworkAI/Skywork-R1V.git
cd Skywork-R1V/inference
```

Step 2. Set Up the Environment

```shell
pip install -r requirements.txt
pip install flash-attn --no-build-isolation
```

Step 3. Run the Inference Script

Prepare your images and question, then pass them to `inference_with_transformers.py`:

```shell
CUDA_VISIBLE_DEVICES="0,1" python inference_with_transformers.py \
    --model_path path \
    --image_paths image1_path \
    --question "your question"
```
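The command-line flags above map onto an argument parser inside the inference script. A minimal sketch of how such a parser might look (the actual script may define additional options; the structure below is an assumption based only on the invocation shown above):

```python
import argparse

def parse_args(argv=None):
    # Flags mirror the example invocation; the real script may differ.
    parser = argparse.ArgumentParser(description="Skywork-R1V inference")
    parser.add_argument("--model_path", required=True,
                        help="Local path or HF repo id of the model")
    parser.add_argument("--image_paths", nargs="+", required=True,
                        help="One or more image files to reason over")
    parser.add_argument("--question", required=True,
                        help="The question to ask about the image(s)")
    return parser.parse_args(argv)

args = parse_args(["--model_path", "Skywork/Skywork-R1V-38B",
                   "--image_paths", "demo.png",
                   "--question", "What does the chart show?"])
print(args.model_path, args.image_paths, args.question)
```

Note that `--image_paths` accepts multiple files, which is why the flag name is plural in the example invocation.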

5. Citation

If you use Skywork-R1V in your research, please cite:

```bibtex
@article{skywork2025r1v,
  title     = {Skywork R1V: Pioneering Multimodal Reasoning with Chain-of-Thought},
  author    = {Yi Peng and Chris and Xiaokun Wang and Yichen Wei and Jiangbo Pei and Weijie Qiu and Ai Jian and Yunzhuo Hao and Jiachun Pan and Tianyidan Xie and Li Ge and Rongxian Zhuang and Xuchen Song and Yang Liu and Yahui Zhou},
  year      = {2025},
  journal   = {https://github.com/SkyworkAI/Skywork-R1V/blob/main/Skywork_R1V.pdf},
  url       = {https://huggingface.co/Skywork/Skywork-R1V-38B}
}
```

This project is released under an open-source license.
