Large Language Model Agent Personality and Response Appropriateness: Evaluation by Human Linguistic Experts, LLM-as-Judge, and Natural Language Processing Model

Published: October 27, 2025 | arXiv ID: 2510.23875v1

By: Eswari Jayakumar, Niladri Sekhar Dash, Debasmita Mukherjee

Potential Business Impact:

Tests how well AI acts like a person.

Business Areas:

Natural Language Processing, Artificial Intelligence, Data and Analytics, Software

While Large Language Model (LLM)-based agents can be used to create highly engaging interactive applications by prompting them with personality traits and contextual data, effectively assessing their personalities has proven challenging. We address this gap with an interdisciplinary approach that combines agent development and linguistic analysis to assess the prompted personality of LLM-based agents in a poetry explanation task. We developed a novel, flexible question bank, informed by linguistic assessment criteria and human cognitive learning levels, that offers a more comprehensive evaluation than current methods. Our evaluation of agent responses with natural language processing models, other LLMs, and human experts illustrates the limitations of purely deep learning solutions and emphasizes the critical role of interdisciplinary design in agent development.

Can LLMs Generate Behaviors for Embodied Virtual Agents Based on Personality Traits?

Human-Computer Interaction

Makes computer characters act like real people.

27 Aug 2025

Multi-Agent LLM Judge: automatic personalized LLM judge design for evaluating natural language generation applications

Computation and Language

Helps computers judge writing better than people.

1 Apr 2025

Large Language Models are Highly Aligned with Human Ratings of Emotional Stimuli

Artificial Intelligence

AI understands feelings like people do.

19 Aug 2025

Country of Origin

🇨🇦 Canada

Page Count

10 pages
