From Videos to Indexed Knowledge Graphs -- Framework to Marry Methods for Multimodal Content Analysis and Understanding
By: Basem Rizk, Joel Walsh, Mark Core, and more
Potential Business Impact:
Helps computers understand videos better and keep learning.
Analysis of multi-modal content can be tricky and computationally expensive, and it requires significant engineering effort. A large body of work applies pre-trained models to static data, yet fusing these open-source models and methods with complex data such as video remains challenging. In this paper, we present a framework that enables efficient prototyping of pipelines for multi-modal content analysis. We craft a candidate recipe for a pipeline, marrying a set of pre-trained models, to convert videos into a temporal, semi-structured data format. We then translate this structure into a frame-level indexed knowledge graph representation that is queryable and supports continual learning, enabling the dynamic incorporation of new domain-specific knowledge through an interactive medium.
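The abstract does not spell out the graph schema, but a minimal sketch can make the final representation concrete. Assuming each pre-trained model (detector, captioner, ASR, etc.) emits per-frame observations, the toy Observation and FrameIndexedKG classes below (hypothetical names, not from the paper) fuse those observations into triples indexed by frame number, so the structure can answer both frame-level and entity-level queries:

```python
from dataclasses import dataclass

# Hypothetical per-frame annotation: each pre-trained model is assumed
# to emit labeled observations tied to a frame index.
@dataclass
class Observation:
    frame: int      # frame index within the video
    modality: str   # e.g., "vision", "audio", "text"
    subject: str
    relation: str
    obj: str

class FrameIndexedKG:
    """Toy frame-level indexed knowledge graph: triples plus a frame index."""
    def __init__(self):
        self.triples = []      # (subject, relation, object, frame, modality)
        self.frame_index = {}  # frame -> list of triple ids

    def add(self, ob: Observation) -> None:
        tid = len(self.triples)
        self.triples.append((ob.subject, ob.relation, ob.obj, ob.frame, ob.modality))
        self.frame_index.setdefault(ob.frame, []).append(tid)

    def query_frame(self, frame: int):
        """Return all triples observed at a given frame."""
        return [self.triples[t] for t in self.frame_index.get(frame, [])]

    def query_entity(self, entity: str):
        """Return the frames where an entity appears, enabling temporal queries."""
        return sorted({f for (s, r, o, f, m) in self.triples if entity in (s, o)})

# Usage: fuse outputs of two hypothetical models into one graph.
kg = FrameIndexedKG()
kg.add(Observation(frame=12, modality="vision", subject="person", relation="holds", obj="cup"))
kg.add(Observation(frame=12, modality="audio", subject="person", relation="says", obj="hello"))
print(kg.query_frame(12))      # everything observed at frame 12
print(kg.query_entity("person"))  # frames in which "person" appears
```

Supporting continual learning here would amount to calling add() with new observations as further domain-specific models or human annotations arrive; the frame index keeps such additions cheap.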
Similar Papers
Multi-modal video data-pipelines for machine learning with minimal human supervision
CV and Pattern Recognition
Lets computers understand videos and sounds together.
Effectively obtaining acoustic, visual and textual data from videos
Multimedia
Creates new data from videos for AI to learn from.
Unraveling Hidden Representations: A Multi-Modal Layer Analysis for Better Synthetic Content Forensics
Artificial Intelligence
Quickly spots fake pictures and sounds.