LinhIcey committed on
Commit
d54fbe3
·
1 Parent(s): 9958185

Update README.md

  1. README.md +25 -1
README.md CHANGED
@@ -17,4 +17,28 @@ metrics:
17
 
18
  ## 简介 Brief Introduction
19
 
20
- Ziya-Visual
20
+ **Lyrics** 是IDEA CCNL研发的大规模视觉语言模型(Large Vision Language Model, LVLM)。Lyrics在预训练(视觉语言的表征对齐)和指令微调(视觉到语言的生成学习)的两阶段训练过程中,构建了视觉细化器来提取局部视觉特征和具化的空间表征,其由图像标记(RAM)、目标检测(Grounding DINO)和语义分割(SAM)模块组成。
21
+
22
+ Lyrics 可以以图像、文本、视觉对象作为输入,并以文本和视觉对象的空间表征作为输出。Lyrics模型具有强大的细粒度视觉特征提取和理解能力,能够完成各种以视觉为中心的任务,包括多回合视觉对话、视觉场景理解和推理、基于常识的图像描述、指向性问答。
23
+
24
+ **Lyrics** is a Large Vision Language Model (LVLM) developed by IDEA CCNL. Across its two training stages, pre-training (vision-language representation alignment) and instruction fine-tuning (vision-to-language generative learning), Lyrics builds a visual refiner that extracts local visual features and embodied spatial representations. The visual refiner consists of image tagging (RAM), object detection (Grounding DINO), and semantic segmentation (SAM) modules.
25
+
26
+ Lyrics takes images, text, and visual objects as input and produces text and spatial representations of visual objects as output. The model has strong fine-grained visual feature extraction and understanding capabilities and handles a range of vision-centric tasks, including multi-turn visual conversation, visual scene understanding and reasoning, commonsense-grounded image description, and referential dialogue.
27
+
28
+
29
+ ## 安装要求 (Requirements)
30
+
31
+ * python 3.8及以上版本
32
+ * pytorch 1.12及以上版本
33
+ * 建议使用CUDA 11.3及以上(GPU用户需考虑此选项)
34
+ * Python 3.8 or later
35
+ * PyTorch 1.12 or later
36
+ * CUDA 11.3 or later is recommended (GPU users only)
37
+
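The version requirements listed above can be checked before installation with a short script. This is a minimal sketch and not part of the Lyrics repository: the helper names `version_tuple` and `check_environment` are hypothetical, and it assumes that comparing only the major and minor version numbers is enough.

```python
import re
import sys

# Minimum versions taken from the requirements list above.
MIN_PYTHON = (3, 8)
MIN_TORCH = (1, 12)
MIN_CUDA = (11, 3)


def version_tuple(text):
    """Return the leading 'major.minor' of a version string as a tuple of ints."""
    match = re.match(r"(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"unrecognized version string: {text!r}")
    return int(match.group(1)), int(match.group(2))


def check_environment():
    """Collect readable problems; an empty list means every check passed."""
    problems = []
    if sys.version_info[:2] < MIN_PYTHON:
        problems.append("Python 3.8 or later is required")
    try:
        import torch
    except ImportError:
        problems.append("PyTorch is not installed")
        return problems
    if version_tuple(torch.__version__) < MIN_TORCH:
        problems.append("PyTorch 1.12 or later is required")
    # torch.version.cuda is None on CPU-only builds; CUDA is only a
    # recommendation, so a missing CUDA build is not reported as a problem.
    cuda = torch.version.cuda
    if cuda is not None and version_tuple(cuda) < MIN_CUDA:
        problems.append("CUDA 11.3 or later is recommended")
    return problems


if __name__ == "__main__":
    for line in check_environment() or ["environment looks fine"]:
        print(line)
```

Run as a script, it prints one line per unmet requirement, or a single confirmation line when nothing is missing.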
38
+ ### 零样本图像描述 & 通用视觉问答 (Zero-shot Image Captioning & General VQA)
39
+ ![](assets/image_caption_vqa.jpg)
40
+
41
+ - 在 Image Captioning 中,Lyrics 在 COCO, Nocaps (0-shot) 和 Flickr30K (0-shot) 数据集上超过了同等规模的 LVLM 模型,取得了 **SOTA** 的结果。
42
+ - 在 General VQA 中,Lyrics 在四个数据集取得了 **SOTA** 的结果,并在 Vizwiz 数据集上与 Qwen-VL 旗鼓相当。
43
+ - In image captioning, Lyrics outperforms LVLMs of comparable size on the COCO, Nocaps (0-shot), and Flickr30K (0-shot) datasets, achieving **SOTA** results.
44
+ - In general VQA, Lyrics achieves **SOTA** results on four datasets and performs on par with Qwen-VL on the VizWiz dataset.