Beyond Appearance: Transformer-based Person Identification from Conversational Dynamics

Published: October 6, 2025 | arXiv ID: 2510.04753v1

By: Masoumeh Chapariniya, Teodora Vukovic, Sarah Ebling and more

Potential Business Impact:

Identifies people by how they move and stand.

Business Areas:
Image Recognition, Data and Analytics, Software

This paper investigates the performance of transformer-based architectures for person identification in natural, face-to-face conversation scenarios. We implement and evaluate a two-stream framework that separately models spatial configurations and temporal motion patterns of 133 COCO WholeBody keypoints extracted from a subset of the CANDOR conversational corpus. Our experiments compare pre-trained models with training from scratch, investigate the use of velocity features, and introduce a multi-scale temporal transformer for hierarchical motion modeling. Results demonstrate that domain-specific training significantly outperforms transfer learning, and that spatial configurations carry more discriminative information than temporal dynamics. The spatial transformer achieves 95.74% accuracy, while the multi-scale temporal transformer achieves 93.90%. Feature-level fusion pushes performance to 98.03%, confirming that postural and dynamic information are complementary. These findings highlight the potential of transformer architectures for person identification in natural interactions and provide insights for future multimodal and cross-cultural studies.
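
To make the two-stream idea concrete, below is a minimal PyTorch sketch of a spatial stream over keypoint tokens, a temporal stream over velocity features, and feature-level fusion of the two embeddings. It is an illustration under stated assumptions, not the paper's implementation: the class name `TwoStreamKeypointID`, the embedding sizes, layer counts, mean-pooling, and the use of a single-scale temporal encoder (in place of the paper's multi-scale temporal transformer) are all hypothetical choices.

```python
import torch
import torch.nn as nn


class TwoStreamKeypointID(nn.Module):
    """Sketch of a two-stream transformer for person ID from 133 COCO
    WholeBody keypoints. Dimensions and depths are illustrative only."""

    def __init__(self, num_ids: int, num_kpts: int = 133, d_model: int = 128):
        super().__init__()
        # Spatial stream: each keypoint's (x, y) in a frame is one token.
        self.spatial_embed = nn.Linear(2, d_model)
        spatial_layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.spatial_enc = nn.TransformerEncoder(spatial_layer, num_layers=4)

        # Temporal stream: the flattened per-frame velocity vector is one token.
        self.temporal_embed = nn.Linear(num_kpts * 2, d_model)
        temporal_layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.temporal_enc = nn.TransformerEncoder(temporal_layer, num_layers=4)

        # Feature-level fusion: concatenate pooled stream embeddings, then classify.
        self.classifier = nn.Linear(2 * d_model, num_ids)

    def forward(self, kpts: torch.Tensor) -> torch.Tensor:
        # kpts: (batch, frames, 133, 2) normalized keypoint coordinates.
        b, t, k, _ = kpts.shape

        # Spatial stream: encode keypoint-token sequences per frame, then
        # pool over keypoints and frames.
        spatial_tokens = self.spatial_embed(kpts.reshape(b * t, k, 2))
        spatial_feat = self.spatial_enc(spatial_tokens).mean(dim=1)        # (b*t, d)
        spatial_feat = spatial_feat.reshape(b, t, -1).mean(dim=1)          # (b, d)

        # Velocity features: frame-to-frame keypoint displacements.
        vel = kpts[:, 1:] - kpts[:, :-1]                                   # (b, t-1, k, 2)
        temporal_tokens = self.temporal_embed(vel.reshape(b, t - 1, k * 2))
        temporal_feat = self.temporal_enc(temporal_tokens).mean(dim=1)     # (b, d)

        # Fuse postural and dynamic representations.
        return self.classifier(torch.cat([spatial_feat, temporal_feat], dim=-1))


# Usage with random data: 2 clips, 30 frames each, 133 keypoints per frame.
model = TwoStreamKeypointID(num_ids=10)
logits = model(torch.rand(2, 30, 133, 2))
print(logits.shape)  # torch.Size([2, 10])
```

The fusion step mirrors the abstract's finding that postural (spatial) and dynamic (temporal) cues are complementary: each stream is pooled to a fixed-size embedding and the concatenation feeds a shared identity classifier.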

Country of Origin
🇨🇭 Switzerland

Page Count
6 pages

Category
Computer Science:
CV and Pattern Recognition