Towards Trustworthy MLLMs via Behavior Alignment from Fine-grained Correctional Human Feedback

1. Tsinghua University
2. National University of Singapore


Existing Multimodal Large Language Models (MLLMs) commonly suffer from serious hallucination problems, generating text that is not factually grounded in the associated images. Our RLHF-V framework enhances MLLM trustworthiness via behavior alignment from fine-grained correctional human feedback.

  • Fine-Grained and Diverse Human Preference Data: We collect 1.4K fine-grained human feedback samples comprising 3.7K segment-level corrections, covering hallucination types including objects (41.2%), positions (20.3%), numbers (16.5%), attributes (10%), actions (5.3%), and others (6.8%).
  • High Data Efficiency and Scalability: With just 1.4K annotated samples, we achieve a 34.8% reduction in model hallucinations. Moreover, the decrease in hallucinations becomes more significant as more data is used.
  • Enhanced Performance and Computational Efficiency: Our fine-grained correctional human feedback data can better credit the desired behavior, allowing efficient training in 1 hour on 8 A100 GPUs to achieve promising results.
  • Outstanding Trustworthiness without Compromising Helpfulness: Our model surpasses existing open-source MLLMs in reducing hallucination rates, mitigates hallucination from over-generalization, and maintains informativeness. Surprisingly, RLHF-V is even more resistant to the over-generalization problem than GPT-4V.
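
As a rough illustration of how a reduction figure like the one above can be computed, the sketch below assumes a response-level hallucination rate (the share of responses containing at least one hallucination) and assumes the reported percentage is a relative reduction; the page does not spell out either choice. The function names and the flag lists are hypothetical and are not results from RLHF-V.

```python
from typing import Sequence


def hallucination_rate(flags: Sequence[bool]) -> float:
    """Fraction of responses flagged as containing at least one hallucination."""
    if not flags:
        raise ValueError("need at least one response")
    return sum(flags) / len(flags)


def relative_reduction(rate_before: float, rate_after: float) -> float:
    """Relative drop in hallucination rate, as a fraction of the starting rate."""
    if rate_before <= 0:
        raise ValueError("starting rate must be positive")
    return (rate_before - rate_after) / rate_before


# Hypothetical annotations (NOT data from the paper): True = response hallucinates.
before_flags = [True] * 8 + [False] * 2   # 8 of 10 responses hallucinate
after_flags = [True] * 4 + [False] * 6    # 4 of 10 responses hallucinate

rate_before = hallucination_rate(before_flags)
rate_after = hallucination_rate(after_flags)
reduction = relative_reduction(rate_before, rate_after)
print(f"{rate_before:.1%} -> {rate_after:.1%}, relative reduction {reduction:.1%}")
```

Under this reading, a drop from 80% to 40% is a 50% relative reduction, even though the absolute difference is 40 percentage points.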


The proposed RLHF-V framework:

We collect 1.4K fine-grained dense feedback samples by asking human annotators to correct the hallucinated segments in model responses. Training takes only 1 hour on 8 A100 GPUs to obtain RLHF-V-13B, which is initialized from our RLHF-V_SFT-13B.
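
One way to picture segment-level correctional feedback is as a list of span replacements applied to the model's original response, which yields a (rejected, chosen) pair that differs only in the corrected segments. The sketch below is a minimal illustration under that assumption. The `Correction` record, the helper names, and the example sentence are hypothetical and do not come from the RLHF-V codebase or dataset.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Correction:
    start: int        # character offset where the hallucinated segment begins
    end: int          # character offset just past the hallucinated segment
    replacement: str  # annotator's corrected text for that segment


def locate(response: str, segment: str, replacement: str) -> Correction:
    """Build a Correction by finding the first occurrence of `segment`."""
    start = response.index(segment)  # raises ValueError if the segment is absent
    return Correction(start, start + len(segment), replacement)


def apply_corrections(response: str, corrections: List[Correction]) -> str:
    """Return the response with every hallucinated segment replaced."""
    pieces = []
    cursor = 0
    for c in sorted(corrections, key=lambda c: c.start):
        if c.start < cursor or c.end < c.start or c.end > len(response):
            raise ValueError("corrections must be in range and non-overlapping")
        pieces.append(response[cursor:c.start])  # untouched text before the span
        pieces.append(c.replacement)             # corrected segment
        cursor = c.end
    pieces.append(response[cursor:])             # untouched tail
    return "".join(pieces)


def to_preference_pair(response: str, corrections: List[Correction]) -> Dict[str, str]:
    """Original response is the rejected side; the corrected one is preferred."""
    return {"rejected": response, "chosen": apply_corrections(response, corrections)}


# Hypothetical example: the annotator fixes an object and an attribute.
response = "A man is holding a red umbrella."
corrections = [
    locate(response, "man", "woman"),
    locate(response, "red", "blue"),
]
pair = to_preference_pair(response, corrections)
print(pair["chosen"])
```

Because the two sides of the pair share all text outside the corrected spans, the difference between them points directly at the hallucinated segments. This is one plausible reading of how such feedback can "better credit the desired behavior".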

Illustration of the RLHF-V framework


Low hallucination rate while being informative:

Main experimental results of RLHF-V

Data-efficient and showing good scaling results:

More resistant to over-generalization:


  author      = {Tianyu Yu and Yuan Yao and Haoye Zhang and Taiwen He and Yifeng Han and Ganqu Cui and Jinyi Hu and Zhiyuan Liu and Hai-Tao Zheng and Maosong Sun and Tat-Seng Chua},
  title       = {RLHF-V: Towards Trustworthy MLLMs via Behavior Alignment from Fine-grained Correctional Human Feedback},
  journal      = {arxiv},
  year         = {2023},


  • Short-form QA: RLHF-V can give a more trustworthy answer in short-form QA.
  • Long-form QA: RLHF-V can generate informative image descriptions with fewer hallucinations.
  • Long-form QA: RLHF-V can provide detailed reasoning with fewer hallucinations.
  • Long-form QA: RLHF-V is more resistant to over-generalization.