ChatStitch: Visualizing Through Structures via Surround-View Unsupervised Deep Image Stitching with Collaborative LLM-Agents
By: Hao Liang, Zhipeng Dong, Kaixin Chen and more
Potential Business Impact:
Lets cars see hidden spots by talking.
Surround-view perception has garnered significant attention for its ability to enhance the perception capabilities of autonomous driving vehicles through the exchange of information with surrounding cameras. However, existing surround-view perception systems are limited by two problems: the inefficiency of a unidirectional interaction pattern with humans, and distortions in overlapping regions that propagate exponentially into non-overlapping areas. To address these challenges, this paper introduces ChatStitch, a surround-view human-machine co-perception system capable of unveiling obscured blind-spot information through natural language commands integrated with external digital assets. To dismantle the unidirectional interaction bottleneck, ChatStitch implements a cognitively grounded, closed-loop, multi-agent interaction framework based on Large Language Models. To suppress distortion propagation across overlapping boundaries, ChatStitch proposes SV-UDIS, a surround-view unsupervised deep image stitching method for the non-global-overlapping condition. We conducted extensive experiments on the open UDIS-D and MCOV-SLAM datasets and on our own real-world dataset. Specifically, our SV-UDIS method achieves state-of-the-art performance on the UDIS-D dataset for 3-, 4-, and 5-image stitching tasks, with PSNR improvements of 9%, 17%, and 21%, and SSIM improvements of 8%, 18%, and 26%, respectively.
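As a reading aid for the reported numbers, the sketch below shows the standard definition of PSNR and one plausible way a percentage improvement could be derived from it. The abstract does not say how the percentages were computed, so the relative-gain formula, the function names `psnr` and `relative_improvement`, and the sample values are assumptions for illustration only, not the authors' evaluation code.

```python
import math


def psnr(reference, test, max_value=255.0):
    """Peak signal-to-noise ratio in dB for two equally sized grayscale images.

    Images are given as nested lists of pixel intensities (rows of values).
    """
    ref = [p for row in reference for p in row]
    tst = [p for row in test for p in row]
    if not ref or len(ref) != len(tst):
        raise ValueError("images must be non-empty and the same size")
    # Mean squared error over all pixels.
    mse = sum((a - b) ** 2 for a, b in zip(ref, tst)) / len(ref)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


def relative_improvement(baseline, ours):
    """Assumed reading of 'X% improvement': gain relative to the baseline score."""
    return (ours - baseline) / baseline


if __name__ == "__main__":
    # Hypothetical scores chosen only to illustrate a 9% relative gain.
    print(f"{relative_improvement(20.0, 21.8):.0%}")
```

Under this reading, a method scoring 21.8 dB against a 20.0 dB baseline would be reported as a 9% PSNR improvement. SSIM improvements could be interpreted the same way, although SSIM itself is a structural similarity score rather than an error ratio in decibels.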