RLHF-V

Towards Trustworthy MLLMs via Behavior Alignment from Fine-grained Correctional Human Feedback


1. Tsinghua University
2. National University of Singapore
*Correspondence


Abstract

Existing Multimodal Large Language Models (MLLMs) widely suffer from serious hallucination problems, generating text that is not factually grounded in the associated images. Our RLHF-V framework enhances MLLM trustworthiness via behavior alignment from fine-grained correctional human feedback.

  • Fine-Grained and Diverse Human Preference Data: We collect 1.4k pieces of fine-grained human feedback consisting of 3.7k segment-level corrections, covering hallucination types including objects (41.2%), positions (20.3%), numbers (16.5%), attributes (10%), actions (5.3%), and others (6.8%).
  • High Data Efficiency and Scalability: With just 1.4k annotated examples, we achieve a 34.8% reduction in model hallucinations. Moreover, the reduction in hallucinations becomes more significant as more data is used.
  • Enhanced Performance and Computational Efficiency: Our fine-grained correctional human feedback better credits the desired behavior, enabling efficient training: only 1 hour on 8 A100 GPUs is needed to achieve promising results.
  • Outstanding Trustworthiness without Compromising Helpfulness: Our model surpasses existing open-source MLLMs in reducing hallucination rates, mitigates hallucinations caused by over-generalization, and maintains informativeness. Surprisingly, RLHF-V is even more resistant to the over-generalization problem than GPT-4V.




Method

The proposed RLHF-V framework:

We collect 1.4k pieces of fine-grained dense feedback by asking human annotators to correct the hallucinated segments in model responses. Training takes only 1 hour on 8 A100 GPUs to obtain RLHF-V-13B, which is initialized from our RLHF-V_SFT-13B.

Illustration of the RLHF-V framework
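
As an illustration of the data format, below is a minimal Python sketch (not the released code or data schema) of how one piece of segment-level correctional feedback could be represented and converted into a (chosen, rejected) preference pair, where the human-corrected response is the preferred side. All class and field names here are hypothetical assumptions for illustration.

# Illustrative sketch only; the schema and names are assumptions, not the released RLHF-V code.
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class SegmentCorrection:
    # One annotator edit: a hallucinated span in the response and its replacement.
    start: int        # character offset where the hallucinated segment begins
    end: int          # end offset (exclusive)
    replacement: str  # corrected text supplied by the annotator


@dataclass
class CorrectionalFeedback:
    # One piece of fine-grained correctional feedback for an (image, prompt) pair.
    image_id: str
    prompt: str
    response: str                         # original model response
    corrections: List[SegmentCorrection]

    def corrected_response(self) -> str:
        # Apply corrections from the end of the string so earlier offsets stay valid.
        text = self.response
        for c in sorted(self.corrections, key=lambda c: c.start, reverse=True):
            text = text[:c.start] + c.replacement + text[c.end:]
        return text

    def to_preference_pair(self) -> Tuple[str, str]:
        # The corrected response is preferred over the original hallucinated one.
        return self.corrected_response(), self.response


if __name__ == "__main__":
    resp = "A man in a red shirt holds two umbrellas."
    feedback = CorrectionalFeedback(
        image_id="example_0",
        prompt="Describe the image.",
        response=resp,
        corrections=[
            SegmentCorrection(resp.index("red"), resp.index("red") + len("red shirt"), "blue shirt"),
            SegmentCorrection(resp.index("two"), len(resp), "one umbrella."),
        ],
    )
    chosen, rejected = feedback.to_preference_pair()
    print(chosen)    # A man in a blue shirt holds one umbrella.
    print(rejected)  # A man in a red shirt holds two umbrellas.
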



Highlights


Low hallucination rate while remaining informative:

Main experimental results of RLHF-V

Data-efficient with good scaling results:



More resistant to over-generalization:



BibTeX


@article{2023rlhf-v,
  author  = {Tianyu Yu and Yuan Yao and Haoye Zhang and Taiwen He and Yifeng Han and Ganqu Cui and Jinyi Hu and Zhiyuan Liu and Hai-Tao Zheng and Maosong Sun and Tat-Seng Chua},
  title   = {RLHF-V: Towards Trustworthy MLLMs via Behavior Alignment from Fine-grained Correctional Human Feedback},
  journal = {arXiv},
  year    = {2023},
}

Examples

  • Short-form QA: RLHF-V can give a more trustworthy answer in short-form QA.
  • Long-form QA: RLHF-V can generate informative image descriptions with fewer hallucinations.
  • Long-form QA: RLHF-V can provide detailed reasoning with fewer hallucinations.
  • Long-form QA: RLHF-V is more resistant to over-generalization.