Breaking Up with Normatively Monolithic Agency with GRACE: A Reason-Based Neuro-Symbolic Architecture for Safe and Ethical AI Alignment
By: Felix Jahn, Yannic Muskalla, Lisa Dargasz and more
Potential Business Impact:
Keeps AI agents from taking unethical actions, however capable they are.
As AI agents become increasingly autonomous, widely deployed in consequential contexts, and efficacious in bringing about real-world impacts, ensuring that their decisions are not only instrumentally effective but also normatively aligned has become critical. We introduce a neuro-symbolic reason-based containment architecture, Governor for Reason-Aligned ContainmEnt (GRACE), that decouples normative reasoning from instrumental decision-making and can contain AI agents of virtually any design. GRACE restructures decision-making into three modules: a Moral Module (MM) that determines permissible macro actions via deontic logic-based reasoning; a Decision-Making Module (DMM) that encapsulates the target agent while selecting instrumentally optimal primitive actions in accordance with derived macro actions; and a Guard that monitors and enforces moral compliance. The MM uses a reason-based formalism providing a semantic foundation for deontic logic, enabling interpretability, contestability, and justifiability. Its symbolic representation enriches the DMM's informational context and supports formal verification and statistical guarantees of alignment enforced by the Guard. We demonstrate GRACE on an example of an LLM therapy assistant, showing how it enables stakeholders to understand, contest, and refine agent behavior.
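The three-module split described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: every name here (`Reason`, `moral_module`, `decision_module`, `guard`, `grace_step`, the macro and primitive action labels, and the `crisis` flag) is a hypothetical chosen for illustration, and a simple "highest-priority active reason wins" rule stands in for the paper's reason-based deontic formalism, which the abstract does not specify in detail.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, List

State = Dict[str, bool]
Agent = Callable[[State, FrozenSet[str]], str]

# Hypothetical macro actions and the primitive actions that realize them.
MACRO_ACTIONS: FrozenSet[str] = frozenset({"continue_session", "refer_to_human"})
PRIMITIVE_TO_MACRO: Dict[str, str] = {
    "ask_open_question": "continue_session",
    "give_hotline_number": "refer_to_human",
}


@dataclass(frozen=True)
class Reason:
    """A normative reason: when it applies, it favours one macro action."""
    name: str
    applies: Callable[[State], bool]
    favours: str
    priority: int


def moral_module(state: State, reasons: List[Reason]) -> FrozenSet[str]:
    """Toy MM: the highest-priority active reasons fix the permitted macro actions."""
    active = [r for r in reasons if r.applies(state)]
    if not active:
        return MACRO_ACTIONS  # no reason applies, so nothing is ruled out
    top = max(r.priority for r in active)
    return frozenset(r.favours for r in active if r.priority == top)


def decision_module(state: State, permitted: FrozenSet[str], agent: Agent) -> str:
    """Toy DMM: wraps the target agent and passes it the permitted macro actions."""
    return agent(state, permitted)


def guard(proposal: str, permitted: FrozenSet[str]) -> str:
    """Toy Guard: pass compliant proposals, otherwise substitute a permitted primitive."""
    if PRIMITIVE_TO_MACRO.get(proposal) in permitted:
        return proposal
    return next(p for p, m in sorted(PRIMITIVE_TO_MACRO.items()) if m in permitted)


def grace_step(state: State, reasons: List[Reason], agent: Agent) -> str:
    permitted = moral_module(state, reasons)
    proposal = decision_module(state, permitted, agent)
    return guard(proposal, permitted)


# Assumed example reasons for a therapy assistant; the content is illustrative only.
REASONS = [
    Reason("maintain_rapport", lambda s: True, "continue_session", priority=1),
    Reason("risk_of_harm", lambda s: s.get("crisis", False), "refer_to_human", priority=2),
]


def stubborn_agent(state: State, permitted: FrozenSet[str]) -> str:
    """A target agent that ignores the moral context and always keeps talking."""
    return "ask_open_question"


if __name__ == "__main__":
    print(grace_step({"crisis": False}, REASONS, stubborn_agent))
    print(grace_step({"crisis": True}, REASONS, stubborn_agent))
```

In this sketch the agent's proposal passes through unchanged when no crisis is flagged, and the guard overrides it when the higher-priority `risk_of_harm` reason is active. Because the reasons are explicit named objects, a stakeholder could inspect or reorder them, which is the kind of contestability the abstract refers to.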
Similar Papers
Transparent, Evaluable, and Accessible Data Agents: A Proof-of-Concept Framework
Artificial Intelligence
Lets anyone ask computers about business data.
Neuro-Symbolic Agents with Modal Logic for Autonomous Diagnostics
Artificial Intelligence
AI agents learn to fix complex machines using logic.
Neuro-Symbolic Agents with Modal Logic for Autonomous Diagnostics
Artificial Intelligence
AI agents learn to avoid impossible mistakes.