Large Language Model Agent Personality and Response Appropriateness: Evaluation by Human Linguistic Experts, LLM-as-Judge, and Natural Language Processing Model
By: Eswari Jayakumar, Niladri Sekhar Dash, Debasmita Mukherjee
Potential Business Impact:
Tests how well AI acts like a person.
While Large Language Model (LLM)-based agents can power highly engaging interactive applications by prompting personality traits and contextual data, effectively assessing their personalities has proven challenging. We address this gap with a novel interdisciplinary approach that combines agent development and linguistic analysis to assess the prompted personality of LLM-based agents in a poetry-explanation task. We developed a flexible question bank, informed by linguistic assessment criteria and human cognitive learning levels, that offers a more comprehensive evaluation than current methods. By evaluating agent responses with natural language processing models, other LLMs, and human experts, our findings illustrate the limitations of purely deep-learning solutions and emphasize the critical role of interdisciplinary design in agent development.
Similar Papers
Can LLMs Generate Behaviors for Embodied Virtual Agents Based on Personality Traits?
Human-Computer Interaction
Makes computer characters act like real people.
Multi-Agent LLM Judge: automatic personalized LLM judge design for evaluating natural language generation applications
Computation and Language
Helps computers judge writing better than people.
Large Language Models are Highly Aligned with Human Ratings of Emotional Stimuli
Artificial Intelligence
AI understands feelings like people do.