Score: 0

A Novel Multimodal RUL Framework for Remaining Useful Life Estimation with Layer-wise Explanations

Published: December 7, 2025 | arXiv ID: 2512.06708v1

By: Waleed Razzaq, Yun-Bo Zhao

Potential Business Impact:

Predicts when machine parts will break.

Business Areas:
Natural Language Processing Artificial Intelligence, Data and Analytics, Software

Estimating the Remaining Useful Life (RUL) of mechanical systems is pivotal in Prognostics and Health Management (PHM). Rolling-element bearings are among the most frequent causes of machinery failure, highlighting the need for robust RUL estimation methods. Existing approaches often suffer from poor generalization, lack of robustness, high data demands, and limited interpretability. This paper proposes a novel multimodal-RUL framework that jointly leverages image representations (ImR) and time-frequency representations (TFR) of multichannel, nonstationary vibration signals. The architecture comprises three branches: (1) an ImR branch and (2) a TFR branch, both employing multiple dilated convolutional blocks with residual connections to extract spatial degradation features; and (3) a fusion branch that concatenates these features and feeds them into an LSTM to model temporal degradation patterns. A multi-head attention mechanism subsequently emphasizes salient features, followed by linear layers for final RUL regression. To enable effective multimodal learning, vibration signals are converted into ImR via the Bresenham line algorithm and into TFR using Continuous Wavelet Transform. We also introduce multimodal Layer-wise Relevance Propagation (multimodal-LRP), a tailored explainability technique that significantly enhances model transparency. The approach is validated on the XJTU-SY and PRONOSTIA benchmark datasets. Results show that our method matches or surpasses state-of-the-art baselines under both seen and unseen operating conditions, while requiring ~28 % less training data on XJTU-SY and ~48 % less on PRONOSTIA. The model exhibits strong noise resilience, and multimodal-LRP visualizations confirm the interpretability and trustworthiness of predictions, making the framework highly suitable for real-world industrial deployment.

Country of Origin
🇨🇳 China

Page Count
25 pages

Category
Computer Science:
Machine Learning (CS)