Utilizing citation index and synthetic quality measure to compare Wikipedia languages across various topics
By: Włodzimierz Lewoniewski, Krzysztof Węcel, Witold Abramowicz
Potential Business Impact:
Finds best Wikipedia articles across languages.
This study presents a comparative analysis of 55 Wikipedia language editions using a citation index together with a synthetic quality measure. We identified the most significant Wikipedia articles within distinct topical areas by selecting the top 10, top 25, and top 100 most cited articles for each topic in each language version. The citation index is built from wikilinks between Wikipedia articles within each language version; constructing it required processing 6.6 billion page-to-page link records. We then assigned each Wikipedia article a quality score, a synthetic measure scaled from 0 to 100. Because the score uses a single scale, article quality can be compared even between language versions that use different quality grading schemes. Our results highlight disparities among Wikipedia language editions, revealing strengths and gaps in content coverage and quality across topics.
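The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the `(language, source, target)` record layout, the topic mapping, and the choice to ignore self-links and break ties alphabetically are all assumptions made for illustration. The quality scores are taken as given inputs on the 0 to 100 scale, since the abstract does not state how they are computed.

```python
from collections import Counter, defaultdict

def citation_index(links):
    """Count incoming wikilinks per (language, target article).

    `links` is an iterable of (language, source, target) records;
    this record layout is a hypothetical one chosen for the sketch.
    """
    counts = Counter()
    for lang, source, target in links:
        if source != target:  # assumption: self-links are not counted
            counts[(lang, target)] += 1
    return counts

def top_cited(counts, topics, k):
    """Return the k most cited articles for each (language, topic) pair."""
    grouped = defaultdict(list)
    for (lang, title), n in counts.items():
        for topic in topics.get((lang, title), ()):
            grouped[(lang, topic)].append((n, title))
    return {
        key: [title for _, title in
              sorted(items, key=lambda item: (-item[0], item[1]))[:k]]
        for key, items in grouped.items()
    }

def mean_quality(top, quality):
    """Average the 0-100 quality score over each top-k list."""
    result = {}
    for (lang, topic), titles in top.items():
        scores = [quality[(lang, title)] for title in titles]
        result[(lang, topic)] = sum(scores) / len(scores)
    return result

# Toy data, invented purely for illustration.
links = [
    ("en", "A", "B"), ("en", "C", "B"), ("en", "A", "C"), ("en", "B", "B"),
    ("pl", "X", "Y"),
]
topics = {
    ("en", "B"): ["physics"],
    ("en", "C"): ["physics"],
    ("pl", "Y"): ["physics"],
}
quality = {("en", "B"): 80.0, ("en", "C"): 40.0, ("pl", "Y"): 55.0}

counts = citation_index(links)
top2 = top_cited(counts, topics, k=2)
print(mean_quality(top2, quality))
```

With real data, `k` would be set to 10, 25, or 100, and the per-topic averages could then be compared across language versions.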
Similar Papers
Factual Inconsistencies in Multilingual Wikipedia Tables
Computation and Language
Fixes Wikipedia facts across languages.
MultiWikiQA: A Reading Comprehension Benchmark in 300+ Languages
Computation and Language
Helps computers understand text in many languages.