Skywork-R1V
Model Name | Vision Encoder | Language Model | HF Link |
---|---|---|---|
Skywork-R1V-38B | InternViT-6B-448px-V2_5 | deepseek-ai/DeepSeek-R1-Distill-Qwen-32B | 🤗 Link |
Skywork-R1V-38B-qwq | InternViT-6B-448px-V2_5 | Qwen/QwQ-32B | - |
Category | Benchmark | QwQ-32B-Preview (LLM) | InternVL-2.5-38B (VLM) | VILA 1.5-40B (VLM) | InternVL2-40B (VLM) | Skywork-R1V-38B (VLM) |
---|---|---|---|---|---|---|
Reasoning | MATH-500 | 90.6 | - | - | - | 94.0 |
Reasoning | AIME 2024 | 50.0 | - | - | - | 72.0 |
Reasoning | GPQA | 54.5 | - | - | - | 61.6 |
Vision | MathVista(mini) | - | 71.9 | 49.5 | 63.7 | 67.5 |
Vision | MMMU(Val) | - | 63.9 | 55.1 | 55.2 | 69.0 |

All benchmark scores are pass@1. The Vision column marks whether the model accepts image input.

Model | Vision | Reasoning: MATH-500 | Reasoning: AIME 2024 | Reasoning: GPQA | Vision: MathVista(mini) | Vision: MMMU(Val) |
---|---|---|---|---|---|---|
Qwen2.5-72B-Instruct | ❌ | 80.0 | 23.3 | 49.0 | - | - |
Deepseek V3 | ❌ | 90.2 | 39.2 | 59.1 | - | - |
Deepseek R1 | ❌ | 97.3 | 79.8 | 71.5 | - | - |
Claude 3.5 Sonnet | ✅ | 78.3 | 16.0 | 65.0 | 65.3 | 66.4 |
GPT-4o | ✅ | 74.6 | 9.3 | 49.9 | 63.8 | 69.1 |
Kimi k1.5 | ✅ | 96.2 | 77.5 | - | 74.9 | 70.0 |
Qwen2.5-VL-72B-Instruct | ✅ | - | - | - | 74.8 | 70.2 |
LLaVA-Onevision-72B | ✅ | - | - | - | 67.5 | 56.8 |
InternVL2-Llama3-76B | ✅ | - | - | - | 65.5 | 62.7 |
InternVL2.5-78B | ✅ | - | - | - | 72.3 | 70.1 |
Skywork-R1V-38B | ✅ | 94.0 | 72.0 | 61.6 | 67.5 | 69.0 |

First, clone the repository to your local machine:
git clone https://github.com/SkyworkAI/Skywork-R1V.git
cd Skywork-R1V/inference
pip install -r requirements.txt
pip install flash-attn --no-build-isolation
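
Before running inference, it can help to confirm that the installation succeeded. The following is a minimal sketch, not part of the repository: `check_packages` is a hypothetical helper, and the module list is an assumption (`flash_attn` is the import name of the `flash-attn` package installed above; `torch` and `transformers` are guesses at what `requirements.txt` pulls in).

```python
import importlib.util
from typing import Dict, Iterable


def check_packages(names: Iterable[str]) -> Dict[str, bool]:
    """Map each top-level module name to whether it is importable here."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


# Assumption: torch and transformers are guessed dependencies;
# flash_attn is the module installed by `pip install flash-attn`.
REQUIRED = ("torch", "transformers", "flash_attn")

if __name__ == "__main__":
    for name, ok in check_packages(REQUIRED).items():
        print(f"{name}: {'ok' if ok else 'MISSING'}")
```

Any module reported as missing points at an install step that should be rerun before launching the inference script.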
Prepare your images and questions, then pass them to inference_with_transformers.py on the command line:
CUDA_VISIBLE_DEVICES="0,1" python inference_with_transformers.py \
--model_path path \
--image_paths image1_path \
--question "your question"
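
For scripted or batch use, the same command can be assembled from Python. This is a minimal sketch under stated assumptions: `build_inference_command` and `run_inference` are hypothetical helpers, not part of the repository, and the sketch assumes `--image_paths` accepts several space-separated paths, as its plural name suggests. Only the three flags shown in the command above are used.

```python
import os
import shlex
import subprocess
from typing import List, Sequence


def build_inference_command(
    model_path: str,
    image_paths: Sequence[str],
    question: str,
    script: str = "inference_with_transformers.py",
) -> List[str]:
    """Build the argument list for the inference script shown above."""
    if not image_paths:
        raise ValueError("at least one image path is required")
    # Assumption: --image_paths takes one or more space-separated paths.
    return [
        "python", script,
        "--model_path", model_path,
        "--image_paths", *image_paths,
        "--question", question,
    ]


def run_inference(cmd: List[str], gpus: str = "0,1") -> subprocess.CompletedProcess:
    """Run the command with CUDA_VISIBLE_DEVICES set, as in the shell example."""
    env = dict(os.environ, CUDA_VISIBLE_DEVICES=gpus)
    return subprocess.run(cmd, env=env, check=True)


def as_shell(cmd: List[str]) -> str:
    """Render the argument list as a safely quoted shell string for logging."""
    return shlex.join(cmd)
```

Calling `run_inference(build_inference_command("path", ["image1_path"], "your question"))` from the inference directory is intended to mirror the shell command above. Passing an argument list rather than a single string avoids quoting problems when the question contains spaces or quotes.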
If you use Skywork-R1V in your research, please cite:
@article{skywork2025r1v,
title = {Skywork R1V: Pioneering Multimodal Reasoning with Chain-of-Thought},
author = {Yi Peng and Chris and Xiaokun Wang and Yichen Wei and Jiangbo Pei and Weijie Qiu and Ai Jian and Yunzhuo Hao and Jiachun Pan and Tianyidan Xie and Li Ge and Rongxian Zhuang and Xuchen Song and Yang Liu and Yahui Zhou},
year = {2025},
journal = {https://github.com/SkyworkAI/Skywork-R1V/blob/main/Skywork_R1V.pdf},
url = {https://huggingface.co/Skywork/Skywork-R1V-38B}
}
This project is released under an open-source license.