Ivy1997 committed
Commit fbf5dc7 · verified · 1 Parent(s): 969265b

Update README.md

Files changed (1):
  1. README.md +8 -6
README.md CHANGED

@@ -9,11 +9,11 @@ tags:
 
 ![logo.jpg](logo.jpg)
 
-[Ivy\-VL] is a lightweight multimodal model with only 3B parameters. It accepts both image and text inputs to generate text outputs.
+`Ivy-VL` is a lightweight multimodal model with only 3B parameters. It accepts both image and text inputs to generate text outputs.
 
-Thanks to its lightweight design, it can be deployed on edge devices such as AI glasses and smartphones, offering low memory usage and high speed while maintaining strong performance on multimodal tasks. Some well-known small models include PaliGemma 3B, Moondream2, Qwen2VL, InternVL2, and InternVL2.5. Ivy-VL outperforms them on multiple benchmarks.
+Thanks to its lightweight design, it can be deployed on edge devices such as AI glasses and smartphones, offering low memory usage and high speed while maintaining strong performance on multimodal tasks. Some well-known small models include [PaliGemma 3B](https://huggingface.co/google/paligemma-3b-mix-448), [Moondream2](https://huggingface.co/vikhyatk/moondream2), [Qwen2-VL-2B](https://huggingface.co/Qwen/Qwen2-VL-2B), [InternVL2-2B](https://huggingface.co/OpenGVLab/InternVL2-2B), and [InternVL2_5-2B](https://huggingface.co/OpenGVLab/InternVL2_5-2B). Ivy-VL outperforms them on multiple benchmarks.
 
-The model is built upon the [Qwen/Qwen2.5-3B-Instruct](https://huggingface.co/Qwen/Qwen2.5-3B-Instruct) language model, with [google/siglip-so400m-patch14-384](https://huggingface.co/google/siglip-so400m-patch14-384) serving as the vision encoder.
+The model is built upon the [`Qwen/Qwen2.5-3B-Instruct`](https://huggingface.co/Qwen/Qwen2.5-3B-Instruct) language model, with [`google/siglip-so400m-patch14-384`](https://huggingface.co/google/siglip-so400m-patch14-384) serving as the vision encoder.
 
 # Model Summary:
 
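Both repositories named in the hunk above are standard Hugging Face checkpoints, so the two halves of the architecture can be inspected independently. The snippet below is an illustrative sketch only, not code from the Ivy-VL repository: it loads the published language backbone and vision encoder with `transformers`, and Ivy-VL's own multimodal projector weights are not reconstructed here.

```python
# Illustrative only: instantiate the two published components that Ivy-VL is
# reported to build on. This does NOT reproduce the assembled Ivy-VL model.
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    SiglipImageProcessor,
    SiglipVisionModel,
)

# Language backbone: Qwen2.5-3B-Instruct
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-3B-Instruct")
language_model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-3B-Instruct", torch_dtype="auto"
)

# Vision encoder: SigLIP so400m, 384-pixel input, patch size 14
image_processor = SiglipImageProcessor.from_pretrained("google/siglip-so400m-patch14-384")
vision_encoder = SiglipVisionModel.from_pretrained("google/siglip-so400m-patch14-384")

# Hidden sizes that the multimodal projector has to bridge
print(language_model.config.hidden_size, vision_encoder.config.hidden_size)
```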
@@ -50,7 +50,7 @@ import warnings
 
 warnings.filterwarnings("ignore")
 
-pretrained = "AI-Safeguard/Ivy-VL"
+pretrained = "AI-Safeguard/Ivy-VL-llava"
 
 model_name = "llava_qwen"
 device = "cuda"
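The `pretrained` string changed in this hunk is the repository id that the README's quick-start snippet (only partially visible in the diff) passes to a LLaVA-style loader, as suggested by `model_name = "llava_qwen"`. Below is a minimal sketch of that loading step, assuming the LLaVA-NeXT `llava` package and its `load_pretrained_model` builder; the exact call in the README may differ.

```python
# Minimal sketch, assuming the LLaVA-NeXT `llava` package; the README's full
# snippet is outside this hunk, so the exact arguments here are illustrative.
from llava.model.builder import load_pretrained_model

pretrained = "AI-Safeguard/Ivy-VL-llava"  # repo id introduced by this commit
model_name = "llava_qwen"                 # routes to the Qwen-based LLaVA architecture
device = "cuda"
device_map = "auto"

# Returns the tokenizer, the assembled vision-language model, the image
# processor, and the maximum sequence length.
tokenizer, model, image_processor, max_length = load_pretrained_model(
    pretrained, None, model_name, device_map=device_map
)
model.eval()
```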
@@ -97,16 +97,18 @@ print(text_outputs)
 # Future Plan:
 
 * We plan to release more versions of LLMs in different sizes.
-
+
 * We will focus on improving the performance of the video modality.
 
 
 # Citation:
 
+If you find our work helpful, feel free to give us a cite.
+
 ```plaintext
 @misc{ivy2024ivy-vl,
 title={Ivy-VL:Compact Vision-Language Models Achieving SOTA with Optimal Data},
-url={https://huggingface.co/AI-Safeguard/Ivy-VL},
+url={https://huggingface.co/AI-Safeguard/Ivy-VL-llava},
 author={Ivy Zhang,Jenny N,Theresa Yu and David Qiu},
 month={December},
 year={2024}
 