Score: 2

Culture Cartography: Mapping the Landscape of Cultural Knowledge

Published: October 31, 2025 | arXiv ID: 2510.27672v1

By: Caleb Ziems, William Held, Jane Yu and more

BigTech Affiliations: Stanford University

Potential Business Impact:

Helps language models learn culture-specific knowledge they currently lack.

Business Areas:
Mapping Services, Navigation and Mapping

To serve global users safely and productively, LLMs need culture-specific knowledge that might not be learned during pre-training. How do we find such knowledge that is (1) salient to in-group users, but (2) unknown to LLMs? The most common solutions are single-initiative: either researchers define challenging questions that users passively answer (traditional annotation), or users actively produce data that researchers structure as benchmarks (knowledge extraction). The process would benefit from mixed-initiative collaboration, where users guide the process to meaningfully reflect their cultures, and LLMs steer the process towards more challenging questions that meet the researcher's goals. We propose a mixed-initiative methodology called CultureCartography. Here, an LLM initializes annotation with questions for which it has low-confidence answers, making explicit both its prior knowledge and the gaps therein. This allows a human respondent to fill these gaps and steer the model towards salient topics through direct edits. We implement this methodology as a tool called CultureExplorer. Compared to a baseline where humans answer LLM-proposed questions, we find that CultureExplorer more effectively produces knowledge that leading models like DeepSeek R1 and GPT-4o are missing, even with web search. Fine-tuning on this data boosts the accuracy of Llama-3.1-8B by up to 19.2% on related culture benchmarks.
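
The abstract does not say how the LLM's confidence is measured, so the following is only a minimal sketch of the "start from low-confidence questions" step under a stated assumption: confidence is approximated by how often repeated samples of the model's answer agree. The function names, the 0.6 threshold, and the example data are all hypothetical and are not taken from the paper or from CultureExplorer.

```python
from collections import Counter


def agreement_confidence(sampled_answers):
    """Hypothetical confidence proxy: share of samples matching the most common answer."""
    if not sampled_answers:
        return 0.0
    normalized = [a.strip().lower() for a in sampled_answers]
    top_count = Counter(normalized).most_common(1)[0][1]
    return top_count / len(normalized)


def select_low_confidence(candidates, threshold=0.6):
    """Return (question, confidence) pairs below the threshold, least confident first.

    `candidates` maps each question to a list of answers sampled from the model.
    The threshold is an illustrative assumption, not a value from the paper.
    """
    scored = [(q, agreement_confidence(answers)) for q, answers in candidates.items()]
    low = [(q, c) for q, c in scored if c < threshold]
    return sorted(low, key=lambda pair: pair[1])


# Made-up data for illustration only.
example_candidates = {
    "q1": ["a", "a", "a", "b"],
    "q2": ["x", "y", "z", "w"],
    "q3": ["m", "n", "m", "n"],
}
low_confidence_questions = select_low_confidence(example_candidates)
```

In a mixed-initiative loop of the kind the abstract describes, the questions selected this way would be shown to a human respondent, who could answer them or edit them toward topics that are more salient to their culture.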

Country of Origin
🇺🇸 United States

Page Count
19 pages

Category
Computer Science:
Computation and Language