Quantifying and Inducing Shape Bias in CNNs via Max-Pool Dilation

Published: January 9, 2026 | arXiv ID: 2601.05599v1

By: Takito Sawada, Akinori Iwata, Masahiro Okuda

Potential Business Impact:

Helps computers recognize shape-dominant images such as illustrations and sketches more accurately, even when little labeled training data is available.

Business Areas:
Image Recognition, Data and Analytics, Software

Convolutional Neural Networks (CNNs) are known to exhibit a strong texture bias, favoring local patterns over global shape information; this tendency is inherent to their convolutional architecture. While the bias is beneficial for texture-rich natural images, it often degrades performance on shape-dominant data such as illustrations and sketches. Although prior work has proposed shape-biased models to mitigate this issue, these approaches lack a quantitative metric for identifying which datasets would actually benefit from such modifications. To address this gap, we propose a data-driven metric that quantifies the shape-texture balance of a dataset by computing the Structural Similarity Index (SSIM) between each image's luminance channel and its L0-smoothed counterpart. Building on this metric, we introduce a computationally efficient adaptation method that promotes shape bias by modifying the dilation of max-pooling operations while keeping the convolutional weights frozen, so that only the final classification layer needs to be trained. Experimental results show that this approach consistently improves classification accuracy on shape-dominant datasets, particularly in low-data regimes where full fine-tuning is impractical.
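
To make the proposed metric concrete, the sketch below shows one plausible way to compute the per-image score: SSIM between an image's luminance channel and its L0-smoothed counterpart. It is a minimal sketch, assuming OpenCV's contrib module (cv2.ximgproc.l0Smooth) for L0 gradient smoothing and scikit-image for SSIM; the smoothing parameters, the averaging over the dataset, and the reading that higher SSIM indicates a more shape-dominant image are assumptions, not details taken from the paper.

```python
import cv2  # requires opencv-contrib-python for cv2.ximgproc
import numpy as np
from skimage.metrics import structural_similarity as ssim


def shape_score(image_bgr, lam=0.02, kappa=2.0):
    """Per-image shape/texture score: SSIM between the luminance channel
    and its L0-smoothed counterpart. lam/kappa are assumed defaults."""
    # Luminance (Y) channel of the image.
    y = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2YCrCb)[:, :, 0]
    # L0 gradient minimization removes fine texture while preserving edges.
    y_smooth = cv2.ximgproc.l0Smooth(y, None, lam, kappa)
    # Texture-heavy images change a lot under smoothing -> low SSIM;
    # shape-dominant images (flat regions, clean edges) -> high SSIM.
    return ssim(y, y_smooth, data_range=255)


def dataset_shape_score(images_bgr):
    """Aggregate the metric over a dataset by simple averaging (an assumption)."""
    return float(np.mean([shape_score(img) for img in images_bgr]))
```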
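The adaptation method could look roughly like the following PyTorch sketch: swap each max-pooling layer of a pretrained backbone for a copy with larger dilation, freeze all pretrained weights, and train only the final classification layer. The choice of VGG-16, a uniform dilation of 2, the number of classes, and the optimizer settings are illustrative assumptions rather than the authors' exact configuration.

```python
import torch
import torch.nn as nn
from torchvision import models


def dilate_maxpools(module, dilation=2):
    """Recursively replace every nn.MaxPool2d with a dilated copy (in place)."""
    for name, child in module.named_children():
        if isinstance(child, nn.MaxPool2d):
            setattr(module, name, nn.MaxPool2d(
                kernel_size=child.kernel_size,
                stride=child.stride,
                padding=child.padding,
                dilation=dilation,  # the only change: enlarged pooling dilation
                ceil_mode=child.ceil_mode,
            ))
        else:
            dilate_maxpools(child, dilation)


# Hypothetical usage on an ImageNet-pretrained VGG-16.
model = models.vgg16(weights="IMAGENET1K_V1")
dilate_maxpools(model.features, dilation=2)

# Freeze all pretrained weights, then replace the last layer so that
# only the new classifier head is trainable.
for p in model.parameters():
    p.requires_grad = False
num_classes = 10  # assumed size of the target (shape-dominant) label set
model.classifier[-1] = nn.Linear(model.classifier[-1].in_features, num_classes)

optimizer = torch.optim.SGD(model.classifier[-1].parameters(), lr=1e-3, momentum=0.9)
```

Because torchvision's VGG-16 passes its feature maps through an adaptive average pool before the classifier, the slightly smaller feature maps produced by the dilated pooling still feed the classifier without any shape mismatch.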

Page Count
4 pages

Category
Computer Science:
CV and Pattern Recognition