Score: 0

A two-stream network with global-local feature fusion for bone age assessment

Published: December 20, 2025 | arXiv ID: 2512.18331v1

By: Qiong Lou, Han Yang, Fang Lu

Bone Age Assessment (BAA) is a widely used clinical technique that can accurately reflect an individual's growth and development level, as well as maturity. In recent years, although deep learning has advanced the field of bone age assessment, existing methods face challenges in efficiently balancing global features and local skeletal details. This study aims to develop an automated bone age assessment system based on a two-stream deep learning architecture to achieve higher accuracy in bone age assessment. We propose the BoNet+ model incorporating global and local feature extraction channels. A Transformer module is introduced into the global feature extraction channel to enhance the ability in extracting global features through multi-head self-attention mechanism. A RFAConv module is incorporated into the local feature extraction channel to generate adaptive attention maps within multiscale receptive fields, enhancing local feature extraction capabilities. Global and local features are concatenated along the channel dimension and optimized by an Inception-V3 network. The proposed method has been validated on the Radiological Society of North America (RSNA) and Radiological Hand Pose Estimation (RHPE) test datasets, achieving mean absolute errors (MAEs) of 3.81 and 5.65 months, respectively. These results are comparable to the state-of-the-art. The BoNet+ model reduces the clinical workload and achieves automatic, high-precision, and more objective bone age assessment.

Category
Computer Science:
CV and Pattern Recognition