
M-CALLM: Multi-level Context Aware LLM Framework for Group Interaction Prediction

Published: November 18, 2025 | arXiv ID: 2511.14661v1

By: Diana Romero, Xin Gao, Daniel Khalkhali and more

Potential Business Impact:

Helps computers guess what groups will do together.

Business Areas:

Natural Language Processing, Artificial Intelligence, Data and Analytics, Software

This paper explores how large language models can leverage multi-level contextual information to predict group coordination patterns in collaborative mixed reality environments. We demonstrate that encoding individual behavioral profiles, group structural properties, and temporal dynamics as natural language enables LLMs to break through the performance ceiling of statistical models. We build M-CALLM, a framework that transforms multimodal sensor streams into hierarchical context for LLM-based prediction, and evaluate three paradigms (zero-shot prompting, few-shot learning, and supervised fine-tuning) against statistical baselines across intervention mode (real-time prediction) and simulation mode (autoregressive forecasting). Head-to-head comparison on 16 groups (64 participants, ~25 hours) demonstrates that context-aware LLMs achieve 96% accuracy for conversation prediction, a 3.2x improvement over LSTM baselines, while maintaining sub-35ms latency. However, simulation mode reveals brittleness with 83% degradation due to cascading errors. Deep-dive into modality-specific performance shows conversation depends on temporal patterns, proximity benefits from group structure (+6%), while shared attention fails completely (0% recall), exposing architectural limitations. We hope this work spawns new ideas for building intelligent collaborative sensing systems that balance semantic reasoning capabilities with fundamental constraints.
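The contrast between intervention mode and simulation mode can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the window size, the "talk"/"idle" labels, and the toy `repeat_last` predictor (standing in for the LLM) are all assumptions made for illustration, and the numbers it produces are unrelated to the paper's results. It only shows why feeding predictions back as input lets one early mistake cascade.

```python
from typing import Callable, List, Sequence

# Hypothetical predictor type: maps a window of past labels to the next label.
Predictor = Callable[[Sequence[str]], str]


def intervention_mode(predict: Predictor, observed: Sequence[str], window: int = 3) -> List[str]:
    """Real-time style: every prediction is conditioned on true past labels."""
    return [predict(observed[t - window:t]) for t in range(window, len(observed))]


def simulation_mode(predict: Predictor, observed: Sequence[str], window: int = 3) -> List[str]:
    """Autoregressive style: after the seed window, predictions are fed back in."""
    history = list(observed[:window])
    preds: List[str] = []
    for _ in range(window, len(observed)):
        nxt = predict(history[-window:])
        preds.append(nxt)
        history.append(nxt)  # the model's own output becomes future input
    return preds


def accuracy(preds: Sequence[str], truth: Sequence[str]) -> float:
    return sum(p == t for p, t in zip(preds, truth)) / len(truth)


def repeat_last(history: Sequence[str]) -> str:
    """Toy stand-in for a learned model: predict that the last label persists."""
    return history[-1]


# Made-up label sequence for one group (assumption, not data from the paper).
observed = ["talk", "talk", "talk", "idle", "idle", "idle", "talk", "talk"]
truth = observed[3:]

live = intervention_mode(repeat_last, observed)
rollout = simulation_mode(repeat_last, observed)
print(accuracy(live, truth), accuracy(rollout, truth))
```

In this toy run the intervention-mode predictor recovers after each mistake because it sees the true labels again, whereas the simulated rollout never leaves its initial state once it stops receiving observations.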

Context-Aware Human Behavior Prediction Using Multimodal Large Language Models: Challenges and Insights

Robotics

Helps robots understand what people will do.

1 Apr 2025 0

89%

Teaching LLMs to See and Guide: Context-Aware Real-Time Assistance in Augmented Reality

Human-Computer Interaction

Helps AR/VR assistants understand what you're doing.

1 Nov 2025 0

89%

Can MLLMs Read the Room? A Multimodal Benchmark for Assessing Deception in Multi-Party Social Interactions

CV and Pattern Recognition

Teaches AI to spot lies in videos.

20 Nov 2025 1


Country of Origin

🇺🇸 United States

Page Count

14 pages
