A Feasibility Experiment on the Application of Predictive Coding to Instant Messaging Corpora
By: Thanasis Schoinas, Ghulam Qadir
Potential Business Impact:
Helps lawyers sort instant messages faster and cheaper.
Predictive coding, the term used in the legal industry for document classification using machine learning, presents additional challenges when the dataset comprises instant messages, due to their informal nature and small size. In this paper, we use a data management workflow to group messages into day chats, then apply feature selection and a logistic regression classifier to provide an economically feasible predictive coding solution. We also improve on the baseline model's performance through dimensionality reduction, with a focus on quantitative features. We test our methodology on an Instant Bloomberg dataset, which is rich in quantitative information. In parallel, we provide an example of the cost savings of our approach.
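The day-chat grouping step mentioned in the abstract might look roughly like the sketch below. The abstract does not specify the record layout or how messages are joined, so the `(room_id, iso_timestamp, text)` tuples, the function name `group_into_day_chats`, and the space-joined concatenation are all assumptions made for illustration; feature selection and the logistic regression classifier are not shown.

```python
from collections import defaultdict
from datetime import datetime


def group_into_day_chats(messages):
    """Merge messages that share a chat room and calendar day into one document.

    `messages` is an iterable of (room_id, iso_timestamp, text) tuples.
    This layout is hypothetical; the paper's actual schema is not given.
    """
    buckets = defaultdict(list)
    for room_id, iso_timestamp, text in messages:
        sent_at = datetime.fromisoformat(iso_timestamp)
        # Key each message by its room and the calendar date it was sent on.
        buckets[(room_id, sent_at.date())].append((sent_at, text))

    day_chats = {}
    for key, items in buckets.items():
        # Restore chronological order inside each day chat before joining.
        items.sort(key=lambda item: item[0])
        day_chats[key] = " ".join(text for _, text in items)
    return day_chats


# Made-up sample messages, for illustration only.
messages = [
    ("room-a", "2020-03-02T09:15:00", "bid 101.5 for 5mm"),
    ("room-a", "2020-03-02T09:14:00", "any axe in the 10y?"),
    ("room-a", "2020-03-03T08:01:00", "morning"),
    ("room-b", "2020-03-02T17:30:00", "lunch tomorrow?"),
]
day_chats = group_into_day_chats(messages)
for (room_id, day), text in sorted(day_chats.items()):
    print(room_id, day, text)
```

Each resulting day chat is a longer document than any single message, which is presumably what makes it a more workable unit for a text classifier than the short, informal messages on their own.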