doubility123 committed
Commit
7daa9c5
1 Parent(s): 46a21c1

Update README.md

Files changed (1)
  1. README.md +143 -5
README.md CHANGED
---
license: other
license_name: deepseek
license_link: https://github.com/deepseek-ai/DeepSeek-LLM/blob/HEAD/LICENSE-MODEL
---

## 1. Introduction

Introducing DeepSeek-VL2, an advanced series of large Mixture-of-Experts (MoE) Vision-Language Models that significantly improves upon its predecessor, DeepSeek-VL. DeepSeek-VL2 demonstrates superior capabilities across various tasks, including but not limited to visual question answering, optical character recognition, document/table/chart understanding, and visual grounding. Our model series is composed of three variants: DeepSeek-VL2-Tiny, DeepSeek-VL2-Small, and DeepSeek-VL2, with 1.0B, 2.8B, and 4.5B activated parameters respectively.
DeepSeek-VL2 achieves competitive or state-of-the-art performance with similar or fewer activated parameters compared to existing open-source dense and MoE-based models.

[DeepSeek-VL2: Mixture-of-Experts Vision-Language Models for Advanced Multimodal Understanding]()

[**GitHub Repository**](https://github.com/deepseek-ai/DeepSeek-VL2)

Zhiyu Wu*, Xiaokang Chen*, Zizheng Pan*, Xingchao Liu*, Wen Liu**, Damai Dai, Huazuo Gao, Yiyang Ma, Chengyue Wu, Bingxuan Wang, Zhenda Xie, Yu Wu, Kai Hu, Jiawei Wang, Yaofeng Sun, Yukun Li, Yishi Piao, Kang Guan, Aixin Liu, Xin Xie, Yuxiang You, Kai Dong, Xingkai Yu, Haowei Zhang, Liang Zhao, Yisong Wang, Chong Ruan*** (* Equal Contribution, ** Project Lead, *** Corresponding Author)

![](https://github.com/deepseek-ai/DeepSeek-VL2/tree/main/images/vl2_teaser.jpeg)

## 2. Release

✅ <b>2024-12-13</b>: DeepSeek-VL2 family released, including <code>DeepSeek-VL2-tiny</code>, <code>DeepSeek-VL2-small</code>, and <code>DeepSeek-VL2</code>.

## 3. Model Summary

DeepSeek-VL2-tiny is built on DeepSeekMoE-3B.

## 4. Quick Start

### Installation

Clone the [GitHub repository](https://github.com/deepseek-ai/DeepSeek-VL2) and, with a `Python >= 3.8` environment, install the necessary dependencies by running the following command from the repository root:

```shell
pip install -e .
```
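
If the editable install succeeded, the core imports should resolve. The snippet below is a minimal, optional sanity check (it only assumes the package layout used in the inference example that follows):

```python
# Optional post-install sanity check: confirm the package and a GPU-enabled
# PyTorch build are available before running the full inference example.
import torch
from deepseek_vl.models import DeepseekVLV2Processor, DeepseekVLV2ForCausalLM

print("torch:", torch.__version__, "| CUDA available:", torch.cuda.is_available())
print(DeepseekVLV2Processor.__name__, "and", DeepseekVLV2ForCausalLM.__name__, "imported successfully.")
```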

### Simple Inference Example

```python
import torch
from transformers import AutoModelForCausalLM

from deepseek_vl.models import DeepseekVLV2Processor, DeepseekVLV2ForCausalLM
from deepseek_vl.utils.io import load_pil_images


# specify the path to the model
model_path = "deepseek-ai/deepseek-vl2-small"
vl_chat_processor: DeepseekVLV2Processor = DeepseekVLV2Processor.from_pretrained(model_path)
tokenizer = vl_chat_processor.tokenizer

vl_gpt: DeepseekVLV2ForCausalLM = AutoModelForCausalLM.from_pretrained(model_path, trust_remote_code=True)
vl_gpt = vl_gpt.to(torch.bfloat16).cuda().eval()

## single image conversation example
conversation = [
    {
        "role": "<|User|>",
        "content": "<image>\n<|ref|>The giraffe at the back.<|/ref|>.",
        "images": ["./images/visual_grounding.jpeg"],
    },
    {"role": "<|Assistant|>", "content": ""},
]

## multiple images (or in-context learning) conversation example
## (uses the same role tags and <image> placeholder as the single-image example above)
# conversation = [
#     {
#         "role": "<|User|>",
#         "content": "<image>A dog wearing nothing in the foreground, "
#                    "<image>a dog wearing a santa hat, "
#                    "<image>a dog wearing a wizard outfit, and "
#                    "<image>what's the dog wearing?",
#         "images": [
#             "images/dog_a.png",
#             "images/dog_b.png",
#             "images/dog_c.png",
#             "images/dog_d.png",
#         ],
#     },
#     {"role": "<|Assistant|>", "content": ""}
# ]

# load images and prepare for inputs
pil_images = load_pil_images(conversation)
prepare_inputs = vl_chat_processor(
    conversations=conversation,
    images=pil_images,
    force_batchify=True,
    system_prompt=""
).to(vl_gpt.device)

# run image encoder to get the image embeddings
inputs_embeds = vl_gpt.prepare_inputs_embeds(**prepare_inputs)

# run the model to get the response
outputs = vl_gpt.language_model.generate(
    inputs_embeds=inputs_embeds,
    attention_mask=prepare_inputs.attention_mask,
    pad_token_id=tokenizer.eos_token_id,
    bos_token_id=tokenizer.bos_token_id,
    eos_token_id=tokenizer.eos_token_id,
    max_new_tokens=512,
    do_sample=False,
    use_cache=True
)

answer = tokenizer.decode(outputs[0].cpu().tolist(), skip_special_tokens=True)
print(f"{prepare_inputs['sft_format'][0]}", answer)
```
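
For repeated queries, the steps above can be wrapped in a small helper. The sketch below only refactors the calls already shown (`load_pil_images`, the processor, `prepare_inputs_embeds`, and `language_model.generate`) and assumes `vl_chat_processor`, `tokenizer`, and `vl_gpt` from the previous block are already loaded; the function name `chat_once` is illustrative, not part of the library.

```python
# Convenience wrapper around the steps shown above (illustrative sketch;
# `chat_once` is not part of the DeepSeek-VL2 API). Assumes `vl_chat_processor`,
# `tokenizer`, and `vl_gpt` from the previous example are in scope.
def chat_once(conversation, max_new_tokens=512):
    # load images referenced by the conversation and batchify the inputs
    pil_images = load_pil_images(conversation)
    prepare_inputs = vl_chat_processor(
        conversations=conversation,
        images=pil_images,
        force_batchify=True,
        system_prompt=""
    ).to(vl_gpt.device)

    # encode images, then generate greedily from the fused embeddings
    inputs_embeds = vl_gpt.prepare_inputs_embeds(**prepare_inputs)
    outputs = vl_gpt.language_model.generate(
        inputs_embeds=inputs_embeds,
        attention_mask=prepare_inputs.attention_mask,
        pad_token_id=tokenizer.eos_token_id,
        bos_token_id=tokenizer.bos_token_id,
        eos_token_id=tokenizer.eos_token_id,
        max_new_tokens=max_new_tokens,
        do_sample=False,
        use_cache=True
    )
    return tokenizer.decode(outputs[0].cpu().tolist(), skip_special_tokens=True)


# Example: a plain single-image question, formatted like the example above.
conversation = [
    {
        "role": "<|User|>",
        "content": "<image>\nDescribe this image.",
        "images": ["./images/visual_grounding.jpeg"],
    },
    {"role": "<|Assistant|>", "content": ""},
]
print(chat_once(conversation))
```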

### Gradio Demo (TODO)

## 5. License

This code repository is licensed under the [MIT License](./LICENSE-CODE). The use of the DeepSeek-VL2 models is subject to the [DeepSeek Model License](./LICENSE-MODEL). The DeepSeek-VL2 series supports commercial use.

## 6. Citation

```bibtex
@misc{wu2024deepseekvl2,
      title={DeepSeek-VL2: Mixture-of-Experts Vision-Language Models for Advanced Multimodal Understanding},
      author={Zhiyu Wu and Xiaokang Chen and Zizheng Pan and Xingchao Liu and Wen Liu and Damai Dai and Huazuo Gao and Yiyang Ma and Chengyue Wu and Bingxuan Wang and Zhenda Xie and Yu Wu and Kai Hu and Jiawei Wang and Yaofeng Sun and Yukun Li and Yishi Piao and Kang Guan and Aixin Liu and Xin Xie and Yuxiang You and Kai Dong and Xingkai Yu and Haowei Zhang and Liang Zhao and Yisong Wang and Chong Ruan},
      year={2024},
}

@misc{lu2024deepseekvl,
      title={DeepSeek-VL: Towards Real-World Vision-Language Understanding},
      author={Haoyu Lu and Wen Liu and Bo Zhang and Bingxuan Wang and Kai Dong and Bo Liu and Jingxiang Sun and Tongzheng Ren and Zhuoshu Li and Hao Yang and Yaofeng Sun and Chengqi Deng and Hanwei Xu and Zhenda Xie and Chong Ruan},
      year={2024},
      eprint={2403.05525},
      archivePrefix={arXiv},
      primaryClass={cs.AI}
}
```

## 7. Contact

If you have any questions, please raise an issue or contact us at [service@deepseek.com](mailto:service@deepseek.com).