Aligning Multimodal LLM with Human Preference: A Survey

Published: March 18, 2025 | arXiv ID: 2503.14504v2

By: Tao Yu, Yi-Fan Zhang, Chaoyou Fu and more

Potential Business Impact:

Makes AI understand pictures and sounds better.

Business Areas:

Natural Language Processing, Artificial Intelligence, Data and Analytics, Software

Large language models (LLMs) can handle a wide variety of general tasks with simple prompts, without the need for task-specific training. Multimodal Large Language Models (MLLMs), built upon LLMs, have demonstrated impressive potential in tackling complex tasks involving visual, auditory, and textual data. However, critical issues related to truthfulness, safety, o1-like reasoning, and alignment with human preference remain insufficiently addressed. This gap has spurred the emergence of various alignment algorithms, each targeting different application scenarios and optimization goals. Recent studies have shown that alignment algorithms are a powerful approach to resolving the aforementioned challenges. In this paper, we aim to provide a comprehensive and systematic review of alignment algorithms for MLLMs. Specifically, we explore four key aspects: (1) the application scenarios covered by alignment algorithms, including general image understanding, multi-image, video, and audio, and extended multimodal applications; (2) the core factors in constructing alignment datasets, including data sources, model responses, and preference annotations; (3) the benchmarks used to evaluate alignment algorithms; and (4) a discussion of potential future directions for the development of alignment algorithms. This work seeks to help researchers organize current advancements in the field and inspire better alignment methods. The project page of this paper is available at https://github.com/BradyFU/Awesome-Multimodal-Large-Language-Models/tree/Alignment.

A Survey on Personalized Alignment -- The Missing Piece for Large Language Models in Real-World Applications

Computation and Language

Teaches AI to be helpful and kind, your way.

21 Mar 2025 0

92%

Graph-MLLM: Harnessing Multimodal Large Language Models for Multimodal Graph Learning

Machine Learning (CS)

Helps computers understand pictures and words together.

12 Jun 2025 2

91%

A Survey on Personalized and Pluralistic Preference Alignment in Large Language Models

Computation and Language

Makes AI understand what you like best.

9 Apr 2025 0

Page Count

16 pages
