M-CALLM: Multi-level Context Aware LLM Framework for Group Interaction Prediction
By: Diana Romero, Xin Gao, Daniel Khalkhali and more
Potential Business Impact:
Helps computers guess what groups will do together.
This paper explores how large language models can leverage multi-level contextual information to predict group coordination patterns in collaborative mixed reality environments. We demonstrate that encoding individual behavioral profiles, group structural properties, and temporal dynamics as natural language enables LLMs to break through the performance ceiling of statistical models. We build M-CALLM, a framework that transforms multimodal sensor streams into hierarchical context for LLM-based prediction, and evaluate three paradigms (zero-shot prompting, few-shot learning, and supervised fine-tuning) against statistical baselines in two modes: intervention mode (real-time prediction) and simulation mode (autoregressive forecasting). A head-to-head comparison on 16 groups (64 participants, ~25 hours) demonstrates that context-aware LLMs achieve 96% accuracy for conversation prediction, a 3.2x improvement over LSTM baselines, while maintaining sub-35ms latency. However, simulation mode reveals brittleness, with 83% degradation due to cascading errors. A deep dive into modality-specific performance shows that conversation depends on temporal patterns and proximity benefits from group structure (+6%), while shared attention fails completely (0% recall), exposing architectural limitations. We hope this work spawns new ideas for building intelligent collaborative sensing systems that balance semantic reasoning capabilities with fundamental constraints.
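As a rough illustration of what "encoding multi-level context as natural language" could look like, the sketch below turns individual, group, and temporal features into a text prompt. It is not the authors' implementation: the class names, the feature names (`speaking_ratio`, `mean_pairwise_distance_m`), and the prompt wording are all hypothetical and chosen only for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class IndividualProfile:
    participant_id: str
    speaking_ratio: float  # hypothetical feature: fraction of the last window spent speaking


@dataclass
class GroupContext:
    group_id: str
    mean_pairwise_distance_m: float  # hypothetical group-structure feature, in meters


def build_prompt(profiles: List[IndividualProfile], group: GroupContext,
                 recent_labels: List[str], target: str) -> str:
    """Render group, individual, and temporal context as plain text (illustrative only)."""
    # Group level: one sentence summarizing structure.
    lines = [f"Group {group.group_id}: mean pairwise distance "
             f"{group.mean_pairwise_distance_m:.1f} m."]
    # Individual level: one sentence per participant.
    for p in profiles:
        lines.append(f"Participant {p.participant_id} spoke for "
                     f"{p.speaking_ratio:.0%} of the last window.")
    # Temporal level: recent history of the target behavior.
    history = ", ".join(recent_labels) if recent_labels else "none"
    lines.append(f"Recent {target} states (oldest first): {history}.")
    lines.append(f"Question: will {target} occur in the next window? Answer yes or no.")
    return "\n".join(lines)


if __name__ == "__main__":
    demo = build_prompt(
        [IndividualProfile("P1", 0.4), IndividualProfile("P2", 0.1)],
        GroupContext("G1", 1.3),
        ["yes", "yes", "no"],
        "conversation",
    )
    print(demo)
```

In a simulation-style (autoregressive) setup, each predicted label would be appended to `recent_labels` before the next call, which is one way an early mistake can propagate into later predictions.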
Similar Papers
Context-Aware Human Behavior Prediction Using Multimodal Large Language Models: Challenges and Insights
Robotics
Helps robots understand what people will do.
Teaching LLMs to See and Guide: Context-Aware Real-Time Assistance in Augmented Reality
Human-Computer Interaction
Helps AR/VR assistants understand what you're doing.
Can MLLMs Read the Room? A Multimodal Benchmark for Assessing Deception in Multi-Party Social Interactions
CV and Pattern Recognition
Teaches AI to spot lies in videos.