Learning to Diagnose and Correct Moral Errors: Towards Enhancing Moral Sensitivity in Large Language Models
By: Bocheng Chen, Han Zi, Xi Chen, and more
Potential Business Impact:
Teaches computers to spot and fix bad ideas.
Moral sensitivity is fundamental to human moral competence, as it guides individuals in regulating everyday behavior. Although many approaches seek to align large language models (LLMs) with human moral values, making them morally sensitive has proven extremely challenging. In this paper, we take a step toward answering the question: how can we enhance moral sensitivity in LLMs? Specifically, we propose two pragmatic inference methods that enable LLMs to diagnose whether an input is morally benign or hazardous and to correct moral errors, thereby enhancing LLMs' moral sensitivity. A central strength of our pragmatic inference methods is their unified perspective: instead of modeling moral discourses across semantically diverse and complex surface forms, they offer a principled perspective for designing pragmatic inference procedures grounded in their inferential loads. Empirical evidence demonstrates that our pragmatic methods enhance moral sensitivity in LLMs and achieve strong performance on representative morality-relevant benchmarks.
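For a concrete picture of the diagnose-then-correct pipeline the abstract describes, here is a minimal sketch of a generic two-stage loop. This is an illustration only, not the paper's pragmatic inference methods: the `chat` stub, the prompt templates, and all function names are hypothetical placeholders.

```python
# Illustrative sketch of a generic diagnose-then-correct loop.
# NOT the paper's method; all names and prompts here are hypothetical.

def chat(prompt: str) -> str:
    """Placeholder LLM call; swap in a real client (e.g. an API request)."""
    raise NotImplementedError("plug in an LLM client here")

DIAGNOSE_PROMPT = (
    "Classify the following input as MORALLY_BENIGN or MORALLY_HAZARDOUS, "
    "and briefly justify the label.\n\nInput: {text}"
)

CORRECT_PROMPT = (
    "The following response contains a moral error. Rewrite it so the error "
    "is corrected while preserving the original intent.\n\nResponse: {text}"
)

def diagnose(text: str) -> str:
    """Stage 1: ask the model whether the input is morally benign or hazardous."""
    return chat(DIAGNOSE_PROMPT.format(text=text))

def correct(text: str) -> str:
    """Stage 2: ask the model to repair a response flagged as hazardous."""
    return chat(CORRECT_PROMPT.format(text=text))

def diagnose_and_correct(text: str) -> str:
    """Run diagnosis first; only invoke correction when a hazard is flagged."""
    verdict = diagnose(text)
    if "HAZARDOUS" in verdict.upper():
        return correct(text)
    return text
```

The two-stage structure mirrors the diagnosis and correction roles named in the abstract; how the paper actually grounds these stages in inferential loads is not reflected in this sketch.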
Similar Papers
Diagnosing Moral Reasoning Acquisition in Language Models: Pragmatics and Generalization
Computation and Language
Teaches AI to make fair choices, but it's hard.
Addressing Moral Uncertainty using Large Language Models for Ethical Decision-Making
Computers and Society
Teaches computers to make fair choices.