SI-Bench: Benchmarking Social Intelligence of Large Language Models in Human-to-Human Conversations

Published: October 27, 2025 | arXiv ID: 2510.23182v1

By: Shuai Huang, Wenxuan Zhao, Jun Gao

Potential Business Impact:

Tests how well AI understands people talking.

Business Areas:

Natural Language Processing, Artificial Intelligence, Data and Analytics, Software

As large language models (LLMs) develop anthropomorphic abilities, they are increasingly being deployed as autonomous agents to interact with humans. However, evaluating their performance in realistic and complex social interactions remains a significant challenge. Most previous research built datasets through simulated agent-to-agent interactions, which fails to capture the authentic linguistic styles and relational dynamics found in real human conversations. To address this gap, we introduce SI-Bench, a novel benchmark designed to evaluate aspects of social intelligence in LLMs. Grounded in broad social science theories, SI-Bench contains 2,221 authentic multi-turn dialogues collected from a social networking application. We further selected a subset of 312 dialogues for manual annotation across 8 major models. The experiments show that SOTA models have surpassed the human expert in process reasoning under complex social situations, yet they still fall behind humans in reply quality. Moreover, introducing Chain-of-Thought (CoT) reasoning may degrade the performance of LLMs in social dialogue tasks. All datasets are openly available at https://github.com/SI-Bench/SI-Bench.git.

SocioBench: Modeling Human Behavior in Sociological Surveys with Large Language Models

Social and Information Networks

Helps computers understand how people think.

13 Oct 2025 1

91%

SocialEval: Evaluating Social Intelligence of Large Language Models

Computation and Language

Helps computers understand and act like people.

1 Jun 2025 1

91%

SimBench: Benchmarking the Ability of Large Language Models to Simulate Human Behaviors

Computation and Language

Tests if AI acts like real people.

20 Oct 2025 1

Repos / Data Links

github.com

Page Count

17 pages
