Existing Multimodal Large Language Models (MLLMs) commonly suffer from serious hallucination problems, generating text that is not factually grounded in the associated images. Our RLHF-V framework enhances MLLM trustworthiness via behavior alignment from fine-grained correctional human feedback.
The proposed RLHF-V framework:
We collect 1.4k fine-grained correctional feedback examples by asking human annotators to correct the hallucinated segments in model responses. Training RLHF-V-13B, which is initialized from our RLHF-V_SFT-13B, takes only 1 hour on 8 A100 GPUs.
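As a rough illustration of how such segment-level corrections can drive alignment, the sketch below forms a preference pair from each (hallucinated response, human-corrected response) example and optimizes a DPO-style objective in which tokens inside the corrected segments are up-weighted. This is a minimal sketch under our own assumptions, not the released training code; the function names, the weight `gamma`, and the toy tensors are illustrative placeholders.

```python
# Minimal sketch: DPO-style preference learning over correctional feedback,
# with human-corrected segments up-weighted in the per-response log-probability.
import torch
import torch.nn.functional as F


def segment_weighted_logp(token_logps, corrected_mask, gamma=5.0):
    """Sum per-token log-probs, scaling tokens in human-corrected segments by gamma.

    token_logps:    (batch, seq_len) per-token log-probabilities of a response.
    corrected_mask: (batch, seq_len) bool, True where annotators rewrote the text.
    """
    weights = 1.0 + (gamma - 1.0) * corrected_mask.float()
    return (token_logps * weights).sum(dim=-1)


def dpo_loss(pol_chosen, pol_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Standard DPO loss on (weighted) response-level log-probabilities,
    comparing the trainable policy against a frozen reference model."""
    logits = beta * ((pol_chosen - ref_chosen) - (pol_rejected - ref_rejected))
    return -F.logsigmoid(logits).mean()


# Toy batch: 2 preference pairs, 6 tokens per response (placeholders only).
torch.manual_seed(0)
policy_chosen, policy_rejected = torch.randn(2, 6), torch.randn(2, 6)
ref_chosen, ref_rejected = torch.randn(2, 6), torch.randn(2, 6)
# True marks tokens inside the segments that human annotators corrected.
corrected = torch.tensor([[0, 0, 1, 1, 0, 0],
                          [0, 1, 1, 1, 0, 0]], dtype=torch.bool)

loss = dpo_loss(
    segment_weighted_logp(policy_chosen, corrected),
    segment_weighted_logp(policy_rejected, corrected),
    segment_weighted_logp(ref_chosen, corrected),
    segment_weighted_logp(ref_rejected, corrected),
)
print(loss.item())
```

Up-weighting only the corrected spans is one way to exploit the segment-level granularity of the feedback rather than treating each pair as a coarse, response-level preference.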
Low hallucination rate while being informative:
Data-efficient with good scaling results:
More resistant to over-generalization:
@article{2023rlhf-v,
  author  = {Tianyu Yu and Yuan Yao and Haoye Zhang and Taiwen He and Yifeng Han and Ganqu Cui and Jinyi Hu and Zhiyuan Liu and Hai-Tao Zheng and Maosong Sun and Tat-Seng Chua},
  title   = {RLHF-V: Towards Trustworthy MLLMs via Behavior Alignment from Fine-grained Correctional Human Feedback},
  journal = {arXiv preprint},
  year    = {2023},
}