update
README.md
CHANGED
@@ -18,7 +18,7 @@ inference: false
 <br>
 
 <p align="center">
-Qwen-VL <a href="https://modelscope.cn/models/qwen/Qwen-VL/summary">🤖 <a> | <a href="https://huggingface.co/Qwen/Qwen-VL">🤗</a>  | Qwen-VL-Chat <a href="https://modelscope.cn/models/qwen/Qwen-VL-Chat/summary">🤖 <a>| <a href="https://huggingface.co/Qwen/Qwen-VL-Chat">🤗</a>  |  <a href="https://modelscope.cn/studios/qwen/Qwen-VL-Chat-Demo/summary">Demo</a>  |  <a href="https://github.com/QwenLM/Qwen-VL/blob/
+Qwen-VL <a href="https://modelscope.cn/models/qwen/Qwen-VL/summary">🤖 <a> | <a href="https://huggingface.co/Qwen/Qwen-VL">🤗</a>  | Qwen-VL-Chat <a href="https://modelscope.cn/models/qwen/Qwen-VL-Chat/summary">🤖 <a>| <a href="https://huggingface.co/Qwen/Qwen-VL-Chat">🤗</a>  |  <a href="https://modelscope.cn/studios/qwen/Qwen-VL-Chat-Demo/summary">Demo</a>  |  <a href="https://github.com/QwenLM/Qwen-VL/blob/master/visual_memo.md">Report</a>   |   <a href="https://discord.gg/9bjvspyu">Discord</a>
 
 </p>
 <br>
@@ -41,13 +41,13 @@ inference: false
 - Qwen-VL: Qwen-VL uses the pre-trained Qwen-7B model to initialize the language model and [Openclip ViT-bigG](https://github.com/mlfoundations/open_clip) to initialize the visual encoder, adds a single randomly initialized cross-attention layer between them, and is trained on about 1.5B image-text pairs. The final image input resolution is 448.
 - Qwen-VL-Chat: Building on Qwen-VL, we use alignment techniques to create Qwen-VL-Chat, an LLM-based visual AI assistant. Its training data covers the plain-text SFT data of Qwen-7B, SFT data from open-source LVLMs, synthetic data, and human-annotated image-text alignment data.
 
-For more information about the model, please click the [link](visual_memo.md) to read our technical memo.
+For more information about the model, please click the [link](https://github.com/QwenLM/Qwen-VL/blob/master/visual_memo.md) to read our technical memo.
 
 We release two models of the Qwen-VL series:
 - Qwen-VL: The pre-trained LVLM model uses Qwen-7B as the initialization of the LLM and [Openclip ViT-bigG](https://github.com/mlfoundations/open_clip) as the initialization of the visual encoder, and connects them with a randomly initialized cross-attention layer. Qwen-VL was trained on about 1.5B image-text pairs. The final image input resolution is 448.
 - Qwen-VL-Chat: A multimodal LLM-based AI assistant, which is trained with alignment techniques.
 
-For more details about Qwen-VL, please refer to our [technical memo](visual_memo.md).
+For more details about Qwen-VL, please refer to our [technical memo](https://github.com/QwenLM/Qwen-VL/blob/master/visual_memo.md).
 
 ## Evaluation
 