Update README.md
README.md
@@ -116,6 +116,7 @@ print(text_outputs)
 - **Mid Stage:** A mixture of 4.7M high-quality synthetic data, 1 epoch, full model
 - **Final-Image Stage:** A mixture of 3.6M single-image data, 1 epoch, full model
 - **OneVision Stage:** A mixture of 1.6M single-image/multi-image/video data, 1 epoch, full model
+- **Critic / Preference Learning Stage:** 9.4K question-image inputs from [LLaVA-RLHF](https://llava-rlhf.github.io/) with self-generated responses, reward signal from [llava-critic-7b](https://huggingface.co/lmms-lab/llava-critic-7b), iterative DPO for 3 epochs, full model
 - **Precision:** bfloat16
 
 ## Hardware & Software
@@ -130,4 +131,14 @@ print(text_outputs)
 @article{li2024llavaonevision,
   title={LLaVA-OneVision},
 }
+
+@article{xiong2024llavacritic,
+  title={LLaVA-Critic: Learning to Evaluate Multimodal Models},
+  author={Xiong, Tianyi and Wang, Xiyao and Guo, Dong and Ye, Qinghao and Fan, Haoqi and Gu, Quanquan and Huang, Heng and Li, Chunyuan},
+  year={2024},
+  eprint={2410.02712},
+  archivePrefix={arXiv},
+  primaryClass={cs.CV},
+  url={https://arxiv.org/abs/2410.02712},
+}
 ```
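The critic / preference learning stage added above pairs self-generated responses with a reward signal from a critic model, then runs iterative DPO. As a minimal sketch of the data-preparation step, the snippet below ranks candidate responses by critic reward and keeps the best as "chosen" and the worst as "rejected" for a DPO preference pair. Note that `score_with_critic` is a hypothetical stand-in (here a dummy length-based score), not the actual llava-critic-7b scoring code.

```python
def score_with_critic(question, image, response):
    """Hypothetical stand-in for scoring a response with llava-critic-7b.

    A real critic would return a scalar reward from the model; here we use
    response length as a dummy signal purely for illustration.
    """
    return len(response)


def build_preference_pairs(samples):
    """Turn (question, image, responses) triples into DPO preference pairs.

    For each sample, the self-generated responses are ranked by critic
    reward; the highest-scoring response becomes "chosen" and the
    lowest-scoring one becomes "rejected".
    """
    pairs = []
    for question, image, responses in samples:
        ranked = sorted(
            responses, key=lambda r: score_with_critic(question, image, r)
        )
        pairs.append(
            {
                "question": question,
                "image": image,
                "chosen": ranked[-1],   # highest critic reward
                "rejected": ranked[0],  # lowest critic reward
            }
        )
    return pairs


# Illustrative input: one question-image pair with two self-generated responses.
samples = [
    ("What is shown?", "img_0001", ["A cat.", "A cat sitting on a red sofa."]),
]
pairs = build_preference_pairs(samples)
print(pairs[0]["chosen"], "|", pairs[0]["rejected"])
```

In the iterative variant described in the diff, this pair-building step would be repeated each epoch on fresh self-generated responses before the next DPO update.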