Quantifying and Inducing Shape Bias in CNNs via Max-Pool Dilation
By: Takito Sawada, Akinori Iwata, Masahiro Okuda
Potential Business Impact:
Helps computers see drawings better.
Convolutional Neural Networks (CNNs) are known to exhibit a strong texture bias, favoring local patterns over global shape information, a tendency inherent to their convolutional architecture. While this bias is beneficial for texture-rich natural images, it often degrades performance on shape-dominant data such as illustrations and sketches. Although prior work has proposed shape-biased models to mitigate this issue, these approaches lack a quantitative metric for identifying which datasets would actually benefit from such modifications. To address this gap, we propose a data-driven metric that quantifies the shape-texture balance of a dataset by computing the Structural Similarity Index (SSIM) between each image's luminance channel and its L0-smoothed counterpart. Building on this metric, we further introduce a computationally efficient adaptation method that promotes shape bias by modifying the dilation of max-pooling operations while keeping convolutional weights frozen. Experimental results show that this approach consistently improves classification accuracy on shape-dominant datasets, particularly in low-data regimes where full fine-tuning is impractical, since only the final classification layer requires training.
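The proposed metric can be sketched as follows. This is a hedged illustration rather than the authors' code: it substitutes a simple box blur for L0 gradient smoothing (a real implementation would use L0 gradient minimization, which removes texture while preserving strong edges) and uses a single global SSIM window instead of the usual sliding-window average; all function names are my own.

```python
import numpy as np

def luminance(rgb):
    # ITU-R BT.601 luma weights applied over the channel axis.
    return rgb @ np.array([0.299, 0.587, 0.114])

def box_smooth(img, k=5):
    # Stand-in for L0 smoothing: a k-by-k box blur over a 2-D array.
    pad = k // 2
    padded = np.pad(img, pad, mode="edge")
    out = np.zeros_like(img)
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out / (k * k)

def ssim_global(x, y, data_range=1.0):
    # Single-window SSIM (Wang et al.'s formula, no sliding window).
    c1, c2 = (0.01 * data_range) ** 2, (0.03 * data_range) ** 2
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cov + c2)) / \
           ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2))

def shape_texture_score(rgb):
    # High SSIM between the luminance channel and its smoothed version
    # means smoothing removed little detail -> shape-dominant image.
    # Low SSIM means heavy texture was stripped -> texture-dominant.
    y = luminance(rgb)
    return ssim_global(y, box_smooth(y))
```

In use, one would average this score over a dataset: a high mean flags shape-dominant data such as sketches, where the paper's dilation adaptation is expected to help.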
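The adaptation method relies on the dilation parameter of max pooling: with dilation d, each pooled output takes the maximum over taps spaced d pixels apart, enlarging the effective receptive field without touching any convolutional weights. A minimal numpy sketch of dilated max pooling over a single 2-D feature map (my own illustration, not the paper's implementation):

```python
import numpy as np

def max_pool2d(x, kernel=2, stride=2, dilation=1):
    # Max pooling over a 2-D array with dilated (spaced) taps.
    # Effective window span: dilation * (kernel - 1) + 1.
    span = dilation * (kernel - 1) + 1
    h = (x.shape[0] - span) // stride + 1
    w = (x.shape[1] - span) // stride + 1
    out = np.full((h, w), -np.inf)
    for i in range(h):
        for j in range(w):
            for ki in range(kernel):
                for kj in range(kernel):
                    out[i, j] = max(out[i, j],
                                    x[i * stride + ki * dilation,
                                      j * stride + kj * dilation])
    return out
```

In a pretrained network, the analogous change is to set a larger dilation on each pooling layer (in PyTorch, for example, `nn.MaxPool2d` accepts a `dilation` argument) and then retrain only the final classification layer, which is what makes the adaptation cheap in low-data regimes.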
Similar Papers
Promoting Shape Bias in CNNs: Frequency-Based and Contrastive Regularization for Corruption Robustness
CV and Pattern Recognition
Makes computers see objects even when they're blurry.
Learning Fourier shapes to probe the geometric world of deep neural networks
CV and Pattern Recognition
Teaches computers to see shapes, not just textures.
On the Relationship Between Double Descent of CNNs and Shape/Texture Bias Under Learning Process
CV and Pattern Recognition
Helps computers see better by understanding shapes and textures.