LaMP-Cap: Personalized Figure Caption Generation With Multimodal Figure Profiles
By: Ho Yin 'Sam' Ng, Ting-Yao Hsu, Aashish Anantha Ramakrishnan and more
Potential Business Impact:
Helps AI write figure captions like authors.
Figure captions are crucial for helping readers understand and remember a figure's key message. Many models have been developed to generate these captions, helping authors compose better-quality captions more easily. Yet authors almost always need to revise generic AI-generated captions to match their own writing style and the style of their domain, which highlights the need for personalization. Despite advances in language model personalization (LaMP), these technologies often focus on text-only settings and rarely address scenarios where both inputs and profiles are multimodal. This paper introduces LaMP-Cap, a dataset for personalized figure caption generation with multimodal figure profiles. For each target figure, LaMP-Cap provides not only the needed inputs, such as figure images, but also up to three other figures from the same document (each with its image, caption, and figure-mentioning paragraphs) as a profile to characterize the context. Experiments with four LLMs show that using profile information consistently helps generate captions closer to the original author-written ones. Ablation studies reveal that images in the profile are more helpful than figure-mentioning paragraphs, highlighting the advantage of using multimodal profiles over text-only ones.
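To make the profile structure described above concrete, here is a minimal Python sketch of what one record might look like. It is not the dataset's actual schema: the class names, field names, and the `profile_view` helper are all hypothetical, and the only details taken from the abstract are that a profile holds up to three other figures from the same document and that each carries an image, a caption, and figure-mentioning paragraphs. The helper's two switches loosely mirror the ablation idea of dropping profile images or paragraphs; keeping the caption in every variant is an assumption.

```python
from dataclasses import dataclass, field
from typing import Dict, List

MAX_PROFILE_FIGURES = 3  # the abstract states "up to three" profile figures


@dataclass
class ProfileFigure:
    """Another figure from the same document (field names are hypothetical)."""
    image_path: str
    caption: str
    paragraphs: List[str]  # figure-mentioning paragraphs


@dataclass
class TargetRecord:
    """A target figure plus its profile (this layout is an assumption)."""
    image_path: str
    profile: List[ProfileFigure] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Enforce the "up to three" limit from the abstract.
        if len(self.profile) > MAX_PROFILE_FIGURES:
            raise ValueError("a profile holds at most three figures")


def profile_view(
    record: TargetRecord,
    use_images: bool = True,
    use_paragraphs: bool = True,
) -> List[Dict[str, object]]:
    """Return the profile with selected components removed.

    The caption is always kept here; that choice is an assumption,
    not something the abstract specifies.
    """
    view: List[Dict[str, object]] = []
    for fig in record.profile:
        item: Dict[str, object] = {"caption": fig.caption}
        if use_images:
            item["image_path"] = fig.image_path
        if use_paragraphs:
            item["paragraphs"] = list(fig.paragraphs)
        view.append(item)
    return view
```

Calling `profile_view(record, use_images=False)` would give a text-only profile, while `use_paragraphs=False` keeps images and captions only, which is one way to set up the kind of comparison the ablation studies describe.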