Commit 260330d (visheratin): Update README.md
1 parent: 7c5d988

README.md CHANGED
@@ -21,8 +21,9 @@ LLaVA-3b is a model fine-tuned from [Dolphin 2.6 Phi](https://huggingface.co/cog
 [SigLIP 400M](https://huggingface.co/timm/ViT-SO400M-14-SigLIP-384). There are a couple of things different from the original LLaVA architecture:
 
 1. Multiple image tokens. The multimodal projector generates embeddings of shape [5, 2560] instead of [1, 2560] for images. The idea is that using more tokens
-allows to get more info from the image into the language model.
-2. The model uses the output from the latest layer of the vision encoder instead of intermediate one.
+allows us to get more info from the image into the language model.
+2. The model uses the output from the latest layer of the vision encoder instead of the intermediate one.
+3. The context length during training was 1200 tokens, as the L4 GPUs I used didn't allow me to get more.
 
 As Dolphin 2.6 Phi, LLaVA-3b uses ChatML prompt format:
 
@@ -111,7 +112,12 @@ output = model.generate(**inputs, max_new_tokens=200, do_sample=True, top_p=0.5,
 ```
 
 ## License
-
+
+This model is based on Phi-2 and is governed by Microsoft's research license, which prohibits commercial use.
+
+## Acknowledgments
+
+Thanks to [ML Collective](https://mlcollective.org/) for providing credits for computing resources.
 
 **Where to send questions or comments about the model:**
 
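The diff describes the projector change only in prose. As a concrete illustration of point 1, here is a minimal PyTorch sketch of a projector that maps one pooled vision embedding to five language-model tokens of width 2560. Apart from the [5, 2560] output shape stated in the README, everything here (the class name, the 1152-dim SigLIP feature size, the single linear layer) is an assumption, not LLaVA-3b's actual implementation.

```python
import torch
import torch.nn as nn


class MultiTokenProjector(nn.Module):
    """Hypothetical sketch: maps one vision embedding to several LLM tokens.

    The real LLaVA-3b projector may be structured differently; only the
    [5, 2560] output shape comes from the README.
    """

    def __init__(self, vision_dim: int = 1152, llm_dim: int = 2560, num_tokens: int = 5):
        super().__init__()
        self.num_tokens = num_tokens
        self.llm_dim = llm_dim
        # One wide projection, reshaped into num_tokens separate embeddings.
        self.proj = nn.Linear(vision_dim, num_tokens * llm_dim)

    def forward(self, vision_features: torch.Tensor) -> torch.Tensor:
        # vision_features: [batch, vision_dim], taken from the *last* layer of
        # the vision encoder (point 2 in the README), not an intermediate one.
        out = self.proj(vision_features)
        return out.view(-1, self.num_tokens, self.llm_dim)  # [batch, 5, 2560]


projector = MultiTokenProjector()
dummy = torch.randn(1, 1152)  # pooled SigLIP SO400M feature; the dim is an assumption
print(projector(dummy).shape)  # torch.Size([1, 5, 2560])
```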
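The excerpt cuts off before the prompt template itself. For reference, ChatML (the format Dolphin 2.6 Phi uses) wraps every turn in `<|im_start|>` and `<|im_end|>` markers, roughly as below. The system text shown is an illustrative assumption, and where the five image-token embeddings are spliced into the sequence is model-specific and not shown in this excerpt.

```
<|im_start|>system
You are a helpful assistant.<|im_end|>
<|im_start|>user
What is shown in this image?<|im_end|>
<|im_start|>assistant
```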