Retrieval-Augmented Generation for Natural Language Art Provenance Searches in the Getty Provenance Index

Published: August 26, 2025 | arXiv ID: 2508.19093v1

By: Mathew Henrickson

Potential Business Impact:

Helps researchers trace the ownership history of artworks by asking questions in natural language.

Business Areas:
Semantic Search, Internet Services

This research presents a Retrieval-Augmented Generation (RAG) framework for art provenance studies, focusing on the Getty Provenance Index. Provenance research establishes the ownership history of artworks, which is essential for verifying authenticity, supporting restitution and legal claims, and understanding the cultural and historical context of art objects. The process is complicated by fragmented, multilingual archival data that hinders efficient retrieval. Current search portals require precise metadata, limiting exploratory searches. Our method enables natural-language and multilingual searches through semantic retrieval and contextual summarization, reducing dependence on metadata structures. We assess RAG's capability to retrieve and summarize auction records using a 10,000-record sample from the Getty Provenance Index - German Sales. The results show this approach provides a scalable solution for navigating art market archives, offering a practical tool for historians and cultural heritage professionals conducting historically sensitive research.
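The retrieve-then-summarize flow described in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the records are invented placeholders rather than Getty Provenance Index data, the bag-of-words `embed` function is a simple stand-in for the multilingual semantic embedding model a real system would need, and `build_prompt` only assembles the text that would be passed to a language model for summarization. All function and variable names are hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical toy records; NOT actual Getty Provenance Index data.
RECORDS = [
    {"id": "r1", "text": "Landscape with river, oil on canvas, sold at auction in Berlin 1931"},
    {"id": "r2", "text": "Portrait of a merchant, oil on panel, sold at auction in Munich 1928"},
    {"id": "r3", "text": "Still life with flowers, watercolor, sold at auction in Berlin 1935"},
]


def embed(text):
    """Stand-in embedding: a bag-of-words term-count vector.

    Assumption: a real system would use a multilingual sentence
    embedding model here instead of raw token counts.
    """
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, records, k=2):
    """Return the k records most similar to the free-text query."""
    q = embed(query)
    ranked = sorted(records, key=lambda r: cosine(q, embed(r["text"])), reverse=True)
    return ranked[:k]


def build_prompt(query, hits):
    """Assemble retrieved records into a grounded summarization prompt."""
    context = "\n".join(f"[{r['id']}] {r['text']}" for r in hits)
    return f"Answer using only these records:\n{context}\n\nQuestion: {query}"


if __name__ == "__main__":
    question = "watercolor flowers Berlin"
    hits = retrieve(question, RECORDS)
    print(build_prompt(question, hits))
```

The query needs no structured metadata fields: ranking depends only on similarity between the question and the record text, which is the property the abstract contrasts with metadata-dependent search portals.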

Page Count
15 pages

Category
Computer Science:
Computation and Language