Towards a Multimodal Stream Processing System
By: Uélison Jean Lopes dos Santos, Alessandro Ferri, Szilard Nistor and more
Potential Business Impact:
Lets computers understand and react to many things at once.
In this paper, we present a vision for a new generation of multimodal streaming systems that embed MLLMs as first-class operators, enabling real-time query processing across multiple modalities. Achieving this is non-trivial: while recent work has integrated MLLMs into databases for multimodal queries, streaming systems require fundamentally different approaches due to their strict latency and throughput requirements. Our approach proposes novel optimizations at all levels, including logical, physical, and semantic query transformations that reduce model load to improve throughput while preserving accuracy. We demonstrate this with Samsara, a prototype leveraging such optimizations to improve performance by more than an order of magnitude. Moreover, we discuss a research roadmap that outlines open research challenges for building scalable and efficient multimodal stream processing systems.
Similar Papers
Beyond Relational: Semantic-Aware Multi-Modal Analytics with LLM-Native Query Optimization
Databases
Lets computers understand and answer questions from any data.
Research Challenges in Relational Database Management Systems for LLM Queries
Databases
Makes computer databases understand and use smart language.