Update README.md
README.md
@@ -116,6 +116,7 @@ print(text_outputs)
 - **Mid Stage:** A mixture of 4.7M high-quality synthetic data, 1 epoch, full model
 - **Final-Image Stage:** A mixture of 3.6M single-image data, 1 epoch, full model
 - **OneVision Stage:** A mixture of 1.6M single-image/multi-image/video data, 1 epoch, full model
+- **Critic / Preference Learning Stage:** 9.4K question-image inputs from [LLaVA-RLHF](https://llava-rlhf.github.io/) with self-generated responses, reward signal from [llava-critic-7b](https://huggingface.co/lmms-lab/llava-critic-7b), iterative DPO for 3 epochs, full model
 - **Precision:** bfloat16
 
 ## Hardware & Software
@@ -130,4 +131,14 @@ print(text_outputs)
 @article{li2024llavaonevision,
   title={LLaVA-OneVision},
 }
+
+@article{xiong2024llavacritic,
+  title={LLaVA-Critic: Learning to Evaluate Multimodal Models},
+  author={Xiong, Tianyi and Wang, Xiyao and Guo, Dong and Ye, Qinghao and Fan, Haoqi and Gu, Quanquan and Huang, Heng and Li, Chunyuan},
+  year={2024},
+  eprint={2410.02712},
+  archivePrefix={arXiv},
+  primaryClass={cs.CV},
+  url={https://arxiv.org/abs/2410.02712},
+}
 ```
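The critic / preference learning stage added above pairs self-generated responses with a reward signal from a critic model, then runs iterative DPO. As a minimal sketch of the data-preparation step, the snippet below ranks candidate responses by critic reward and keeps the best as "chosen" and the worst as "rejected" for a DPO preference pair. Note that `score_with_critic` is a hypothetical stand-in (here a dummy length-based score), not the actual llava-critic-7b scoring code.

```python
def score_with_critic(question, image, response):
    """Hypothetical stand-in for scoring a response with llava-critic-7b.

    A real critic would return a scalar reward from the model; here we use
    response length as a dummy signal purely for illustration.
    """
    return len(response)


def build_preference_pairs(samples):
    """Turn (question, image, responses) triples into DPO preference pairs.

    For each sample, the self-generated responses are ranked by critic
    reward; the highest-scoring response becomes "chosen" and the
    lowest-scoring one becomes "rejected".
    """
    pairs = []
    for question, image, responses in samples:
        ranked = sorted(
            responses, key=lambda r: score_with_critic(question, image, r)
        )
        pairs.append(
            {
                "question": question,
                "image": image,
                "chosen": ranked[-1],   # highest critic reward
                "rejected": ranked[0],  # lowest critic reward
            }
        )
    return pairs


# Illustrative input: one question-image pair with two self-generated responses.
samples = [
    ("What is shown?", "img_0001", ["A cat.", "A cat sitting on a red sofa."]),
]
pairs = build_preference_pairs(samples)
print(pairs[0]["chosen"], "|", pairs[0]["rejected"])
```

In the iterative variant described in the diff, this pair-building step would be repeated each epoch on fresh self-generated responses before the next DPO update.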