---
pipeline_tag: image-text-to-text
inference: false
license: apache-2.0
---

<br>
<br>

# LLaVA-Reasoner Model Card

## Model details

**Model type:**
LLaVA-Reasoner is an open-source vision-language model for images, fine-tuned on chain-of-thought (CoT) reasoning data distilled from GPT-4o.

This model is the **SFT-preview** version.

Base LLM: [Lin-Chen/open-llava-next-llama3-8b](https://huggingface.co/Lin-Chen/open-llava-next-llama3-8b)

**Model date:**
Trained in September 2024.

**Paper or resources for more information:**

Paper: https://arxiv.org/abs/2410.16198

Code: https://github.com/RifleZhang/LLaVA-Reasoner-DPO/tree/main

## License

This model follows the [Lin-Chen/open-llava-next-llama3-8b](https://huggingface.co/Lin-Chen/open-llava-next-llama3-8b) license.

**Where to send questions or comments about the model:**
https://github.com/RifleZhang/LLaVA-Reasoner-DPO/issues

## Intended use

**Primary intended uses:**
Chain-of-thought (CoT) reasoning over images.

**Primary intended users:**
Researchers in artificial intelligence, large multimodal models, and related areas.
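
For the CoT reasoning use case listed above, a small helper for building prompts and separating the rationale from the final answer may be useful. This card does not specify a prompt or output format, so the instruction string and the `Answer:` marker below are assumptions made purely for illustration; the code repository is the place to check for the prompts the authors actually use. The sketch only handles text and does not load or call the model.

```python
import re
from typing import Tuple

# Hypothetical instruction text: the prompt format is NOT specified in this card.
COT_INSTRUCTION = (
    "Think step by step, then give the final answer on a new line "
    "starting with 'Answer:'."
)

# Matches a line of the (assumed) form "Answer: <text>", case-insensitively.
ANSWER_RE = re.compile(r"^[ \t]*answer:[ \t]*(.*)$", re.IGNORECASE | re.MULTILINE)


def build_cot_prompt(question: str) -> str:
    """Append the assumed CoT instruction to a question about an image."""
    return f"{question.strip()}\n{COT_INSTRUCTION}"


def split_cot_response(response: str) -> Tuple[str, str]:
    """Split generated text into (rationale, final_answer).

    Uses the last "Answer:" line as the final answer. If no such line
    exists, the whole response is returned as the answer with an empty
    rationale.
    """
    matches = list(ANSWER_RE.finditer(response))
    if not matches:
        return "", response.strip()
    last = matches[-1]
    rationale = response[: last.start()].strip()
    return rationale, last.group(1).strip()
```

Keeping the rationale and the answer separate makes it possible to score the short answer against a benchmark label while still inspecting the reasoning text.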

## Training dataset

The model is trained on the [ShareGPT4o-Reasoning](https://huggingface.co/datasets/Share4oReasoning/sft_data) dataset.
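
One possible way to fetch the dataset files locally is sketched below with `snapshot_download` from the `huggingface_hub` package. The repository id is taken from the link above; the function names and the `sft_data` target directory are arbitrary choices for this example, and the file layout inside the repository is not described in this card, so inspect the downloaded folder before relying on any particular file.

```python
# Repository id taken from the dataset link in this card.
DATASET_REPO_ID = "Share4oReasoning/sft_data"


def dataset_url(repo_id: str = DATASET_REPO_ID) -> str:
    """Return the Hugging Face Hub page for a dataset repository."""
    return f"https://huggingface.co/datasets/{repo_id}"


def download_sft_data(local_dir: str = "sft_data") -> str:
    """Download all files of the dataset repository and return the local path.

    Requires `pip install huggingface_hub` and network access. The import is
    done lazily so that this module can be imported without the package.
    """
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id=DATASET_REPO_ID,
        repo_type="dataset",  # the repository is a dataset, not a model
        local_dir=local_dir,
    )
```

Calling `download_sft_data()` downloads the full repository, which may be large; `snapshot_download` also accepts `allow_patterns` to restrict the download to selected files.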

## Evaluation

Follow the instructions in https://github.com/RifleZhang/LLaVA-Reasoner-DPO/blob/main/README.md

## Citation

```
@article{zhang2024improve,
  title={Improve vision language model chain-of-thought reasoning},
  author={Zhang, Ruohong and Zhang, Bowen and Li, Yanghao and Zhang, Haotian and Sun, Zhiqing and Gan, Zhe and Yang, Yinfei and Pang, Ruoming and Yang, Yiming},
  journal={arXiv preprint arXiv:2410.16198},
  year={2024}
}
```