CRG Score: A Distribution-Aware Clinical Metric for Radiology Report Generation
By: Ibrahim Ethem Hamamci, Sezgin Er, Suprosanna Shit and more
Potential Business Impact:
Helps judge AI-written radiology reports more fairly by scoring clinical correctness.
Evaluating long-context radiology report generation is challenging. NLG metrics fail to capture clinical correctness, while LLM-based metrics often lack generalizability. Clinical accuracy metrics are more relevant but are sensitive to class imbalance, frequently favoring trivial predictions. We propose the CRG Score, a distribution-aware and adaptable metric that evaluates only clinically relevant abnormalities explicitly described in reference reports. CRG supports both binary and structured labels (e.g., type, location) and can be paired with any LLM for feature extraction. By balancing penalties based on label distribution, it enables fairer, more robust evaluation and serves as a clinically aligned reward function.
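The class-imbalance problem mentioned in the abstract can be illustrated with a small sketch. This is not the CRG formula, which the abstract does not spell out. It uses ordinary balanced accuracy as a stand-in to show why a distribution-aware score treats a trivial "no abnormality anywhere" prediction differently from plain accuracy. The function names and the toy label matrix are hypothetical.

```python
from typing import List

Labels = List[List[int]]  # one row per report, one 0/1 entry per abnormality


def _pairs(ref: Labels, pred: Labels):
    """Flatten two label matrices into (reference, prediction) pairs."""
    return [(r, p) for ref_row, pred_row in zip(ref, pred)
            for r, p in zip(ref_row, pred_row)]


def accuracy(ref: Labels, pred: Labels) -> float:
    """Plain label accuracy: fraction of entries that match."""
    pairs = _pairs(ref, pred)
    return sum(r == p for r, p in pairs) / len(pairs)


def balanced_accuracy(ref: Labels, pred: Labels) -> float:
    """Mean of recall and specificity (a stand-in, NOT the CRG Score)."""
    pairs = _pairs(ref, pred)
    tp = sum(1 for r, p in pairs if r == 1 and p == 1)
    fn = sum(1 for r, p in pairs if r == 1 and p == 0)
    tn = sum(1 for r, p in pairs if r == 0 and p == 0)
    fp = sum(1 for r, p in pairs if r == 0 and p == 1)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative label")
    recall = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (recall + specificity) / 2


# Toy data (assumed): 4 reports x 5 abnormalities, only 2 of 20 labels positive.
reference = [
    [1, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0],
]
# Trivial model that never reports an abnormality.
trivial = [[0] * 5 for _ in reference]

print(accuracy(reference, trivial))           # 0.9
print(balanced_accuracy(reference, trivial))  # 0.5
```

In this toy case the trivial prediction scores 0.9 on plain accuracy while missing every abnormality, and 0.5 once positives and negatives are weighted equally. The CRG Score addresses the same failure mode with its own penalty scheme.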
Similar Papers
CTest-Metric: A Unified Framework to Assess Clinical Validity of Metrics for CT Report Generation
Computation and Language
Tests AI reports for doctor accuracy.
MRG-R1: Reinforcement Learning for Clinically Aligned Medical Report Generation
Computation and Language
Makes AI write correct medical reports from scans.
S-RRG-Bench: Structured Radiology Report Generation with Fine-Grained Evaluation Framework
CV and Pattern Recognition
Makes doctor reports clear and complete.