Abhaykoul committed
Commit
eafa22b
1 Parent(s): f986d1d

Update README.md

Files changed (1)
  1. README.md +2 -2
README.md CHANGED
@@ -26,9 +26,9 @@ The fundamental concept behind HelpingAI-Vision is to generate one token embeddi

 For every crop of the image, an embedding is generated using the full SigLIP encoder (size [1, 1152]). Subsequently, all N embeddings undergo processing through the LLaVA adapter, resulting in a token embedding of size [N, 2560]. Currently, these tokens lack explicit information about their position in the original image, with plans to incorporate positional information in a later update.

- HelpingAI-Vision was fine-tuned from Dolphin 2.6 Phi, leveraging the vision tower from SigLIP 400M. The training process had a context length of 1200 tokens, determined by the limitations of the L4 GPUs used.
+ HelpingAI-Vision was fine-tuned from MC-LLaVA-3b.

- The model adopts the ChatML prompt format, suggesting its potential application in chat-based scenarios. If you have specific queries or would like further details, feel free
+ The model adopts the ChatML prompt format, suggesting its potential application in chat-based scenarios. If you have specific queries or would like further details, feel free to ask.

 ```
 <|im_start|>system
 You are Vortex, a helpful AI assistant.<|im_end|>
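
The crop-embedding pipeline described in the README excerpt above (each crop encoded by SigLIP into a [1, 1152] vector, then projected by the LLaVA adapter into [N, 2560] token embeddings) can be illustrated with a minimal PyTorch sketch. The module names and the two-layer MLP projector below are assumptions for illustration (a common LLaVA-style adapter shape), not HelpingAI-Vision's actual implementation; only the 1152 and 2560 dimensions come from the README.

```python
import torch
import torch.nn as nn

class LLaVAStyleAdapter(nn.Module):
    """Illustrative LLaVA-style projector: maps SigLIP crop embeddings
    ([N, 1152]) to LLM token embeddings ([N, 2560])."""
    def __init__(self, vision_dim: int = 1152, llm_dim: int = 2560):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(vision_dim, llm_dim),
            nn.GELU(),
            nn.Linear(llm_dim, llm_dim),
        )

    def forward(self, crop_embeddings: torch.Tensor) -> torch.Tensor:
        # crop_embeddings: [N, 1152] -> image token embeddings: [N, 2560]
        return self.proj(crop_embeddings)

# Stand-in for per-crop SigLIP features ([1, 1152] each, stacked to [N, 1152]).
num_crops = 4
crop_embeddings = torch.randn(num_crops, 1152)

adapter = LLaVAStyleAdapter()
image_tokens = adapter(crop_embeddings)
print(image_tokens.shape)  # torch.Size([4, 2560])
```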