Update README.md
README.md
CHANGED
@@ -26,9 +26,9 @@ The fundamental concept behind HelpingAI-Vision is to generate one token embeddi
For every crop of the image, an embedding is generated using the full SigLIP encoder (size [1, 1152]). Subsequently, all N embeddings undergo processing through the LLaVA adapter, resulting in a token embedding of size [N, 2560]. Currently, these tokens lack explicit information about their position in the original image, with plans to incorporate positional information in a later update.

- HelpingAI-Vision was fine-tuned from
+ HelpingAI-Vision was fine-tuned from MC-LLaVA-3b.

- The model adopts the ChatML prompt format, suggesting its potential application in chat-based scenarios. If you have specific queries or would like further details, feel free
+ The model adopts the ChatML prompt format, suggesting its potential application in chat-based scenarios. If you have specific queries or would like further details, feel free to ask.

```
<|im_start|>system
You are Vortex, a helpful AI assistant.<|im_end|>
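
The context paragraph in the hunk above describes the image path: each crop is encoded by SigLIP into a [1, 1152] vector, and the stacked N crop embeddings are projected by the LLaVA adapter into N tokens of size 2560. Below is a minimal PyTorch sketch of that shape flow; the module names, the two-layer MLP adapter, and the random stand-in for the SigLIP encoder are illustrative assumptions, not the actual HelpingAI-Vision / MC-LLaVA-3b implementation.

```python
import torch
import torch.nn as nn

SIGLIP_DIM = 1152   # per-crop output size of the SigLIP encoder: [1, 1152]
LLM_DIM = 2560      # token embedding size expected by the language model

class LlavaAdapter(nn.Module):
    """Projects per-crop SigLIP embeddings into the LLM token space (assumed 2-layer MLP)."""
    def __init__(self, in_dim: int = SIGLIP_DIM, out_dim: int = LLM_DIM):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(in_dim, out_dim),
            nn.GELU(),
            nn.Linear(out_dim, out_dim),
        )

    def forward(self, crop_embeddings: torch.Tensor) -> torch.Tensor:
        # crop_embeddings: [N, 1152] -> image tokens: [N, 2560]
        return self.proj(crop_embeddings)

def encode_crops(crops: list[torch.Tensor]) -> torch.Tensor:
    # Placeholder for the full SigLIP encoder: one [1, 1152] embedding per crop.
    return torch.randn(len(crops), SIGLIP_DIM)

crops = [torch.empty(3, 384, 384) for _ in range(6)]   # N = 6 image crops
image_tokens = LlavaAdapter()(encode_crops(crops))
print(image_tokens.shape)  # torch.Size([6, 2560]) -- one token per crop, no positional info yet
```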
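
The hunk cuts off after the system line, so the full prompt template is not visible here. The sketch below assembles a prompt using the standard ChatML turn layout; the user/assistant turns are an assumption about how the remaining template looks, not the model card's exact snippet.

```python
# Generic ChatML prompt assembly. The system text comes from the README snippet above;
# the user/assistant turn structure is the standard ChatML layout and is assumed,
# since the diff hunk ends before the rest of the template.
def build_chatml_prompt(system: str, user: str) -> str:
    return (
        f"<|im_start|>system\n{system}<|im_end|>\n"
        f"<|im_start|>user\n{user}<|im_end|>\n"
        f"<|im_start|>assistant\n"
    )

print(build_chatml_prompt("You are Vortex, a helpful AI assistant.", "Describe this image."))
```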