ArtCognition: A Multimodal AI Framework for Affective State Sensing from Visual and Kinematic Drawing Cues

Published: January 7, 2026 | arXiv ID: 2601.04297v1

By: Behrad Binaei-Haghighi, Nafiseh Sadat Sajadi, Mehrad Liviyan and more

Potential Business Impact:

Drawings reveal feelings and thoughts automatically.

Business Areas:

Image Recognition Data and Analytics, Software

The objective assessment of human affective and psychological states presents a significant challenge, particularly through non-verbal channels. This paper introduces digital drawing as a rich and underexplored modality for affective sensing. We present a novel multimodal framework, named ArtCognition, for the automated analysis of the House-Tree-Person (HTP) test, a widely used psychological instrument. ArtCognition uniquely fuses two distinct data streams: static visual features from the final artwork, captured by computer vision models, and dynamic behavioral kinematic cues derived from the drawing process itself, such as stroke speed, pauses, and smoothness. To bridge the gap between low-level features and high-level psychological interpretation, we employ a Retrieval-Augmented Generation (RAG) architecture. This grounds the analysis in established psychological knowledge, enhancing explainability and reducing the potential for model hallucination. Our results demonstrate that the fusion of visual and behavioral kinematic cues provides a more nuanced assessment than either modality alone. We show significant correlations between the extracted multimodal features and standardized psychological metrics, validating the framework's potential as a scalable tool to support clinicians. This work contributes a new methodology for non-intrusive affective state assessment and opens new avenues for technology-assisted mental healthcare.
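As a rough illustration of the kinematic cues the abstract names (stroke speed, pauses, and smoothness), the sketch below computes simple per-stroke statistics from timestamped pen samples. It is not the authors' implementation: the `Sample` type, the `kinematic_features` function, the 0.3-second pause threshold, and the use of speed jitter as a smoothness proxy are all assumptions made for this example.

```python
import math
from dataclasses import dataclass


@dataclass
class Sample:
    """One pen sample (hypothetical format): position plus timestamp in seconds."""
    x: float
    y: float
    t: float


def kinematic_features(stroke, pause_threshold=0.3):
    """Return illustrative kinematic features for one stroke.

    pause_threshold (seconds) is an assumed cutoff: a gap between two
    consecutive samples longer than this counts as a pause.
    """
    speeds = []
    pauses = 0
    for a, b in zip(stroke, stroke[1:]):
        dt = b.t - a.t
        if dt <= 0:
            continue  # skip duplicate or out-of-order timestamps
        if dt > pause_threshold:
            pauses += 1
        # Segment speed: Euclidean distance divided by elapsed time.
        speeds.append(math.hypot(b.x - a.x, b.y - a.y) / dt)

    mean_speed = sum(speeds) / len(speeds) if speeds else 0.0

    # Assumed smoothness proxy: mean absolute change in speed between
    # consecutive segments. Lower values indicate a smoother stroke.
    if len(speeds) > 1:
        jitter = sum(abs(s2 - s1) for s1, s2 in zip(speeds, speeds[1:])) / (len(speeds) - 1)
    else:
        jitter = 0.0

    return {"mean_speed": mean_speed, "pause_count": pauses, "speed_jitter": jitter}


if __name__ == "__main__":
    stroke = [Sample(0, 0, 0.0), Sample(1, 0, 0.1), Sample(1, 0, 1.1), Sample(3, 0, 1.2)]
    print(kinematic_features(stroke))
```

Features like these could then be combined with visual features from the finished drawing; how ArtCognition actually performs that fusion is not specified in the abstract.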

Artificial Intelligence Can Emulate Human Normative Judgments on Emotional Visual Scenes

Human-Computer Interaction

AI learns to feel emotions from pictures and words.

24 Mar 2025

88%

Agent-Based Modular Learning for Multimodal Emotion Recognition in Human-Agent Systems

Machine Learning (CS)

Helps computers understand feelings from faces, voices, words.

2 Dec 2025

88%

Context-aware Multimodal AI Reveals Hidden Pathways in Five Centuries of Art Evolution

CV and Pattern Recognition

AI finds art's meaning from pictures and history.

15 Mar 2025

Page Count

12 pages
