---
license: apache-2.0
datasets:
  - openbmb/RLAIF-V-Dataset
language:
  - en
---

Model Card for RLAIF-V

GitHub

RLAIF-V-12B is a model that exhibits super GPT-4V trustworthiness. It is built on the SFT version of OmniLMM-12B, one of the first models in the MiniCPM-V series.

We utilize a novel framework, RLAIF-V, which aligns MLLMs in a fully open-source paradigm. This alignment framework maximally exploits open-source feedback from two key perspectives: high-quality feedback data and an online feedback learning algorithm.
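For quick experimentation, below is a minimal inference sketch. It assumes the checkpoint is published as openbmb/RLAIF-V-12B and that its remote code exposes a chat-style interface similar to the OmniLMM / MiniCPM-V family; the exact method names and arguments should be verified against the model repository or the GitHub code linked above.

```python
# Minimal inference sketch (assumptions: checkpoint name "openbmb/RLAIF-V-12B"
# and a chat-style interface provided via trust_remote_code; verify against
# the model repository or the RLAIF-V GitHub code).
import torch
from PIL import Image
from transformers import AutoModel, AutoTokenizer

model = AutoModel.from_pretrained(
    "openbmb/RLAIF-V-12B",
    trust_remote_code=True,        # loads the custom model code from the repo
    torch_dtype=torch.bfloat16,
).eval().cuda()
tokenizer = AutoTokenizer.from_pretrained(
    "openbmb/RLAIF-V-12B", trust_remote_code=True
)

image = Image.open("example.jpg").convert("RGB")
msgs = [{"role": "user", "content": "Describe this image in detail."}]

# `chat` and its argument names are assumptions based on the MiniCPM-V family.
answer = model.chat(image=image, msgs=msgs, tokenizer=tokenizer)
print(answer)
```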

Model Details

Evaluation

  • 🏅 Super GPT-4V Trustworthiness via Open-source Feedback. By learning from open-source AI feedback, RLAIF-V-12B achieves super GPT-4V trustworthiness in both generative and discriminative tasks.
  • 💪 Maintaining Strong Performance on General Abilities. On benchmarks testing general abilities (e.g., LLaVA Bench, MMStar), RLAIF-V-12B also performs well.

[Figure: trustworthiness and general-ability benchmark results for RLAIF-V-12B]

Examples

[Figure: example outputs of RLAIF-V-12B]

Model Description