A Case for Computing on Unstructured Data
By: Mushtari Sadia, Amrita Roy Chowdhury, Ang Chen
Potential Business Impact:
Lets computers understand messy information like text and pictures.
Unstructured data, such as text, images, audio, and video, makes up the vast majority of the world's information, yet it remains poorly supported by traditional data systems that rely on structured formats for computation. We argue for a new paradigm, which we call computing on unstructured data, built around three stages: extraction of latent structure, transformation of this structure through data processing techniques, and projection back into unstructured formats. This bidirectional pipeline allows unstructured data to benefit from the analytical power of structured computation, while preserving the richness and accessibility of unstructured representations for human and AI consumption. We illustrate this paradigm through two use cases and present the research components that need to be developed in a new data system called MXFlow.
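The three stages named in the abstract can be illustrated with a toy sketch. Everything below is an assumption made for illustration: the sample notes, the regular expression, and the function names `extract`, `transform`, and `project` are hypothetical and are not taken from MXFlow or from the paper. A regular expression stands in for whatever extraction technique a real system would use.

```python
import re
from collections import defaultdict

# Hypothetical toy input: free-text notes (the unstructured side).
NOTES = [
    "Alice bought 3 apples.",
    "Bob bought 5 pears.",
    "Alice bought 2 pears.",
]

# Assumed sentence shape; a real extractor would be far more general.
PATTERN = re.compile(r"(?P<buyer>\w+) bought (?P<qty>\d+) (?P<item>\w+)\.")


def extract(texts):
    """Stage 1: pull latent structure (rows) out of free text."""
    rows = []
    for text in texts:
        match = PATTERN.fullmatch(text)
        if match:  # sentences that do not fit the pattern are skipped
            rows.append(
                {
                    "buyer": match["buyer"],
                    "qty": int(match["qty"]),
                    "item": match["item"],
                }
            )
    return rows


def transform(rows):
    """Stage 2: ordinary structured computation, here a group-by sum."""
    totals = defaultdict(int)
    for row in rows:
        totals[row["buyer"]] += row["qty"]
    return dict(totals)


def project(totals):
    """Stage 3: render the structured result back into text."""
    return [
        f"{buyer} bought {qty} items in total."
        for buyer, qty in sorted(totals.items())
    ]


summary = project(transform(extract(NOTES)))
for line in summary:
    print(line)
```

The point of the sketch is only the shape of the pipeline: text goes in, a structured intermediate is computed on, and text comes out. It says nothing about how the paper's system performs any of the three stages.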