---
license: apache-2.0
datasets:
  - openbmb/RLAIF-V-Dataset
language:
  - en
---

Model Card for RLAIF-V

GitHub

RLAIF-V-12B is a model that exhibits super GPT-4V trustworthiness. It is built on the SFT version of OmniLMM-12B, one of the first models in the MiniCPM-V series.

We utilize a novel framework, RLAIF-V, which aligns MLLMs in a fully open-source paradigm. This alignment framework maximally exploits open-source feedback from two key perspectives: high-quality feedback data and an online feedback learning algorithm.
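For quick experimentation, below is a minimal inference sketch. It assumes the checkpoint is published as openbmb/RLAIF-V-12B and that its remote code exposes a chat-style interface similar to the OmniLMM / MiniCPM-V family; the exact method names and arguments should be verified against the model repository or the GitHub code linked above.

```python
# Minimal inference sketch (assumptions: checkpoint name "openbmb/RLAIF-V-12B"
# and a chat-style interface provided via trust_remote_code; verify against
# the model repository or the RLAIF-V GitHub code).
import torch
from PIL import Image
from transformers import AutoModel, AutoTokenizer

model = AutoModel.from_pretrained(
    "openbmb/RLAIF-V-12B",
    trust_remote_code=True,        # loads the custom model code from the repo
    torch_dtype=torch.bfloat16,
).eval().cuda()
tokenizer = AutoTokenizer.from_pretrained(
    "openbmb/RLAIF-V-12B", trust_remote_code=True
)

image = Image.open("example.jpg").convert("RGB")
msgs = [{"role": "user", "content": "Describe this image in detail."}]

# `chat` and its argument names are assumptions based on the MiniCPM-V family.
answer = model.chat(image=image, msgs=msgs, tokenizer=tokenizer)
print(answer)
```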

Model Details

Evaluation

  • 🏅 Super GPT-4V Trustworthiness via Open-source Feedback. By learning from open-source AI feedback, RLAIF-V-12B achieves super GPT-4V trustworthiness in both generative and discriminative tasks.
  • 💪 Maintaining Strong Performance on General Abilities. On benchmarks testing general abilities (e.g., LLaVA Bench, MMStar), RLAIF-V-12B also performs well.

[Figure: trustworthiness and general-ability benchmark results for RLAIF-V-12B]

Examples

[Figure: example outputs of RLAIF-V-12B]

Model Description