Behaviour Discovery and Attribution for Explainable Reinforcement Learning

Published: March 19, 2025 | arXiv ID: 2503.14973v2

By: Rishav Rishav, Somjit Nath, Vincent Michalski, and more

Potential Business Impact:

Shows why robots follow whole patterns of decisions, not just why they make a single choice.

Business Areas:
Machine Learning, Artificial Intelligence, Data and Analytics, Software

Building trust in reinforcement learning (RL) agents requires understanding why they make certain decisions, especially in high-stakes applications like robotics, healthcare, and finance. Existing explainability methods often focus on single states or entire trajectories, either providing only local, step-wise insights or attributing decisions to coarse, episode-level summaries. Both approaches miss the recurring strategies and temporally extended patterns that actually drive agent behavior across multiple decisions. We address this gap by proposing a fully offline, reward-free framework for behavior discovery and segmentation, enabling the attribution of actions to meaningful and interpretable behavior segments that capture recurring patterns appearing across multiple trajectories. Our method identifies coherent behavior clusters from state-action sequences and attributes individual actions to these clusters for fine-grained, behavior-centric explanations. Evaluations on four diverse offline RL environments show that our approach discovers meaningful behaviors and outperforms trajectory-level baselines in fidelity, human preference, and cluster coherence. Our code is publicly available.
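To make the core idea concrete, here is a minimal sketch of behavior discovery by clustering temporally extended state-action segments and attributing each action to its segment's cluster. Everything in it is an assumption for illustration: the window length, the flattened-window representation, and the use of k-means are stand-ins, since the paper's actual segmentation and clustering pipeline is not specified in this summary.

```python
# Hypothetical sketch: discover "behaviors" as clusters over short
# state-action windows pooled from offline trajectories, then attribute
# individual actions to those clusters. Not the paper's actual method.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Synthetic offline dataset: 20 trajectories of 50 steps each,
# with 4-dim states and 2-dim continuous actions (placeholder data).
trajectories = [
    (rng.normal(size=(50, 4)), rng.normal(size=(50, 2)))
    for _ in range(20)
]

WINDOW = 5  # length of a temporally extended segment (assumed)

def segment(states, actions, window=WINDOW):
    """Slice one trajectory into overlapping state-action windows."""
    pairs = np.concatenate([states, actions], axis=1)
    return np.stack([
        pairs[t:t + window].ravel()
        for t in range(len(pairs) - window + 1)
    ])

# Pool segments from every trajectory: fully offline and reward-free,
# since only states and actions are used.
segments = np.concatenate([segment(s, a) for s, a in trajectories])

# Discover recurring behaviors as clusters over the pooled segments.
kmeans = KMeans(n_clusters=6, n_init=10, random_state=0).fit(segments)

# Attribute each action to the behavior cluster of the window that
# starts at its timestep, giving a behavior-centric explanation.
states, actions = trajectories[0]
labels = kmeans.predict(segment(states, actions))
for t, behavior in enumerate(labels[:5]):
    print(f"step {t}: action attributed to behavior cluster {behavior}")
```

In a real pipeline one would likely replace the flattened windows with learned sequence embeddings, but the structure is the same: cluster once over segments from many trajectories, then explain each action by the recurring behavior it belongs to.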

Page Count
23 pages

Category
Computer Science:
Artificial Intelligence