---
license: apache-2.0
datasets:
- openbmb/RLAIF-V-Dataset
language:
- en
---

# Model Card for RLAIF-V

[GitHub](https://github.com/RLHF-V/RLAIF-V)

**RLAIF-V-12B** is a model that exhibits trustworthiness surpassing GPT-4V. The model is built on the SFT version of OmniLMM-12B, one of the first versions of the MiniCPM-V series.

We utilize a novel framework, **RLAIF-V**, which **aligns MLLMs in a fully open-source paradigm**. This alignment framework maximally exploits open-source feedback from two key perspectives: **high-quality feedback data** and an **online feedback learning algorithm**.

## Model Details

### Key Features

* 🏅 **Super GPT-4V Trustworthiness via Open-source Feedback**: By learning from open-source AI feedback, RLAIF-V-12B achieves trustworthiness surpassing GPT-4V in both generative and discriminative tasks.
* 💪 **Maintaining Strong Performance on General Abilities**: On benchmarks of general abilities (e.g., LLaVABench, MMStar), RLAIF-V-12B also performs well.
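As a minimal loading sketch: the exact inference interface is defined by the GitHub repository above, but related MiniCPM-V / OmniLMM models follow the standard `transformers` `trust_remote_code` pattern. The repo id `openbmb/RLAIF-V-12B` and the dtype choice below are assumptions; consult the repository for the authoritative usage.

```python
# Sketch only: load RLAIF-V-12B assuming the common `trust_remote_code`
# pattern used by related MiniCPM-V / OmniLMM models. The repo id and
# dtype are assumptions; see the GitHub repository for exact usage.
import torch
from transformers import AutoModel, AutoTokenizer

model_path = "openbmb/RLAIF-V-12B"  # assumed Hugging Face repo id
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
model = AutoModel.from_pretrained(
    model_path,
    trust_remote_code=True,   # the model ships custom modeling code
    torch_dtype=torch.bfloat16,
).eval()
```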


### Examples


### Model Description

- **Related model:** [OmniLMM-12B](https://huggingface.co/openbmb/OmniLMM-12B)
- **Trained on data:** [RLAIF-V-Dataset](https://huggingface.co/datasets/HaoyeZhang/RLAIF-V-Dataset)