Test-Time Learning and Inference-Time Deliberation for Efficiency-First Offline Reinforcement Learning in Care Coordination and Population Health Management
By: Sanjay Basu, Sadiq Y. Patel, Parth Sheth, and more
Potential Business Impact:
Helps care teams choose the best way to reach patients.
Care coordination and population health management programs serve large Medicaid and safety-net populations and must be auditable, efficient, and adaptable. While clinical risk for outreach modalities is typically low, time and opportunity costs differ substantially across text, phone, video, and in-person visits. We propose a lightweight offline reinforcement learning (RL) approach that augments trained policies with (i) test-time learning (TTL) via local neighborhood calibration, and (ii) inference-time deliberation (ITD) via a small Q-ensemble that incorporates predictive uncertainty and time/effort cost. The method exposes transparent dials for neighborhood size and uncertainty/cost penalties and preserves an auditable training pipeline. Evaluated on a de-identified operational dataset, TTL+ITD achieves stable value estimates with predictable efficiency trade-offs and supports subgroup auditing.
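The paper's implementation is not shown here, but a minimal Python sketch of how the two components could fit together follows, under stated assumptions. Every name in it (knn_calibrate, select_action, the k, lambda_u, and beta_c dials, and the per-modality costs) is a hypothetical illustration, not the authors' code: each outreach action is scored by a locally calibrated ensemble-mean Q-value, penalized by ensemble disagreement (a proxy for predictive uncertainty) and by a time/effort cost for that modality.

```python
import numpy as np

# Hypothetical sketch of TTL + ITD action selection; not the authors' code.
# Assumptions: each Q-ensemble member is a callable mapping a state vector to
# per-action Q-values, and an offline dataset of (state, return) pairs is
# available for local calibration.

ACTIONS = ["text", "phone", "video", "in_person"]
# Illustrative per-modality time/effort costs (arbitrary units).
ACTION_COST = np.array([0.1, 0.3, 0.5, 1.0])

def knn_calibrate(q_mean, state, dataset_states, dataset_returns, k=32, alpha=0.5):
    """Test-time learning: blend global Q-estimates with outcomes observed in
    the local neighborhood of the query state (a simple k-NN calibration)."""
    dists = np.linalg.norm(dataset_states - state, axis=1)
    neighbors = np.argsort(dists)[:k]
    local_return = dataset_returns[neighbors].mean()
    return (1 - alpha) * q_mean + alpha * local_return

def select_action(q_ensemble, state, dataset_states, dataset_returns,
                  k=32, lambda_u=0.5, beta_c=0.2):
    """Inference-time deliberation: score each action by the calibrated mean
    Q-value minus penalties for predictive uncertainty and time/effort cost.
    k, lambda_u, and beta_c play the role of the transparent 'dials'
    described in the abstract."""
    q_all = np.stack([q(state) for q in q_ensemble])  # (n_members, n_actions)
    q_mean = q_all.mean(axis=0)
    q_std = q_all.std(axis=0)  # ensemble disagreement as uncertainty
    q_cal = knn_calibrate(q_mean, state, dataset_states, dataset_returns, k)
    score = q_cal - lambda_u * q_std - beta_c * ACTION_COST
    return ACTIONS[int(np.argmax(score))]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy data standing in for the de-identified operational dataset.
    states = rng.normal(size=(500, 8))
    returns = rng.normal(size=500)
    # Toy linear Q-members; each gets its own fixed random weight matrix.
    q_ensemble = [
        (lambda s, w=rng.normal(size=(8, len(ACTIONS))): s @ w)
        for _ in range(5)
    ]
    print(select_action(q_ensemble, rng.normal(size=8), states, returns))
```

In this sketch, raising lambda_u makes the policy more conservative where the ensemble disagrees, raising beta_c shifts outreach toward cheaper modalities such as text, and k controls how local the test-time calibration is.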
Similar Papers
RoiRL: Efficient, Self-Supervised Reasoning with Offline Iterative Reinforcement Learning
Machine Learning (CS)
Makes AI smarter without needing constant human help.
Test-time Offline Reinforcement Learning on Goal-related Experience
Machine Learning (CS)
Teaches robots to learn new tasks faster.
Hybrid Adaptive Conformal Offline Reinforcement Learning for Fair Population Health Management
Machine Learning (CS)
Helps doctors help sick people safely and fairly.