Unveiling Location-Specific Price Drivers: A Two-Stage Cluster Analysis for Interpretable House Price Predictions

Published: August 5, 2025 | arXiv ID: 2508.03156v1

By: Paul Gümmer, Julian Rosenberger, Mathias Kraus and more

Potential Business Impact:

Estimates house prices more accurately by grouping similar homes.

House price valuation remains challenging due to localized market variations. Existing approaches often rely on black-box machine learning models, which lack interpretability, or simplistic methods like linear regression (LR), which fail to capture market heterogeneity. To address this, we propose a machine learning approach that applies two-stage clustering, first grouping properties based on minimal location-based features before incorporating additional features. Each cluster is then modeled using either LR or a generalized additive model (GAM), balancing predictive performance with interpretability. Constructing and evaluating our models on 43,309 German house listings from 2023, we reduce mean absolute error by 36% for the GAM and 58% for LR compared to models without clustering. Additionally, graphical analyses unveil pattern shifts between clusters. These findings emphasize the importance of cluster-specific insights, enhancing interpretability and offering practical value for buyers, sellers, and real estate analysts seeking more reliable property valuations.
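
To make the pipeline described in the abstract concrete, here is a minimal sketch in plain Python. It is not the authors' implementation. Everything specific in it is an assumption made for illustration: the clustering algorithm (k-means with a deterministic start), the features (latitude and longitude in stage one, living area in stage two), the cluster counts, and the synthetic listings produced by the hypothetical helper `make_listings`. Only the LR variant is shown, not the GAM, and errors are measured in-sample.

```python
from statistics import mean

def dist2(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, iters=20):
    """Lloyd's k-means with deterministic farthest-point initialisation."""
    centers = [points[0]]
    while len(centers) < k:
        # Next centre: the point farthest from its nearest existing centre.
        centers.append(max(points, key=lambda p: min(dist2(p, c) for c in centers)))
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: dist2(p, centers[j])) for p in points]
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = tuple(mean(col) for col in zip(*members))
    return labels

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx if sxx else 0.0
    return my - b * mx, b

def make_listings():
    """Synthetic (lat, lon, area_m2, price) rows: two regions, two size segments."""
    rows = []
    for i in range(20):
        area = 50 + 10 * i
        jitter = (i % 5 - 2) * 0.01
        noise = ((i * 7) % 5 - 2) * 2000
        north = 3000 * area if area < 150 else 450_000 + 4500 * (area - 150)
        south = 6000 * area if area < 150 else 900_000 + 8000 * (area - 150)
        rows.append((52.5 + jitter, 13.4 - jitter, area, north + noise))
        rows.append((48.1 + jitter, 11.6 + jitter, area, south + noise))
    return rows

def two_stage_clusters(rows, k_loc=2, k_sub=2):
    """Stage 1: cluster on location only. Stage 2: split each location cluster by area."""
    loc = kmeans([(r[0], r[1]) for r in rows], k_loc)
    keys = [None] * len(rows)
    for c in set(loc):
        idx = [i for i, lab in enumerate(loc) if lab == c]
        sub = kmeans([(rows[i][2],) for i in idx], k_sub)
        for i, s in zip(idx, sub):
            keys[i] = (c, s)
    return keys

def evaluate(rows, keys):
    """In-sample mean absolute error when one LR (price ~ area) is fitted per key."""
    errors = []
    for key in set(keys):
        part = [r for r, k in zip(rows, keys) if k == key]
        a, b = fit_line([r[2] for r in part], [r[3] for r in part])
        errors += [abs(r[3] - (a + b * r[2])) for r in part]
    return mean(errors)

rows = make_listings()
keys = two_stage_clusters(rows)
mae_global = evaluate(rows, ["all"] * len(rows))   # single model, no clustering
mae_clustered = evaluate(rows, keys)               # one model per final cluster
improvement = (mae_global - mae_clustered) / mae_global
print(f"MAE without clustering: {mae_global:.0f}")
print(f"MAE with two-stage clustering: {mae_clustered:.0f}")
print(f"Relative MAE improvement: {improvement:.1%}")
```

The relative improvement computed at the end, (MAE without clustering minus MAE with clustering) divided by MAE without clustering, is one natural reading of the percentages quoted in the abstract. The numbers this toy data produces say nothing about the paper's results. Because each cluster keeps its own intercept and slope, the fitted coefficients can be compared across clusters, which is the kind of cluster-specific interpretation the abstract emphasizes.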

Country of Origin
🇩🇪 🇦🇹 Germany, Austria

Page Count
15 pages

Category
Computer Science:
Machine Learning (CS)