Rakuten Data Release: A Large-Scale and Long-Term Reviews Corpus for Hotel Domain
By: Yuki Nakayama, Koki Hikichi, Yun Ching Liu and more
Potential Business Impact:
Helps hotels understand customer feedback better.
This paper presents a large-scale corpus of Rakuten Travel Reviews. The collection contains 7.3 million customer reviews spanning 16 years, from 2009 to 2024. Each record contains the review text, the accommodation's response to it, an anonymized reviewer ID, the review date, accommodation ID, plan ID, plan title, room type, room name, purpose, accompanying group, and user ratings for several aspect categories, as well as an overall score. We present statistical information about the corpus and use statistical approaches to examine the factors driving data drift between 2019 and 2024.
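The record layout described above can be pictured as a simple typed structure. The following is a minimal sketch, not the corpus's actual schema: every field name, every type, and the helper `mean_overall_score_by_year` are assumptions made for illustration, since the abstract lists only what each record contains. The per-year mean is just one rough way to look for shifts over time and is not the statistical approach used in the paper.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from datetime import date
from typing import Dict, Iterable, Optional


@dataclass
class ReviewRecord:
    """One review record. All field names and types are hypothetical."""
    review_text: str
    accommodation_response: Optional[str]  # assumed optional: a reply may be absent
    reviewer_id: str                       # anonymized reviewer ID
    review_date: date
    accommodation_id: str
    plan_id: str
    plan_title: str
    room_type: str
    room_name: str
    purpose: str
    accompanying_group: str
    # Aspect category names are not given in the abstract, so a free-form mapping is assumed.
    aspect_ratings: Dict[str, int] = field(default_factory=dict)
    overall_score: Optional[int] = None


def mean_overall_score_by_year(records: Iterable[ReviewRecord]) -> Dict[int, float]:
    """Average the overall score per calendar year, skipping records without a score."""
    totals = defaultdict(lambda: [0, 0])  # year -> [sum of scores, count]
    for record in records:
        if record.overall_score is None:
            continue
        bucket = totals[record.review_date.year]
        bucket[0] += record.overall_score
        bucket[1] += 1
    return {year: total / count for year, (total, count) in sorted(totals.items())}
```

Comparing the yearly means for 2019 and 2024 would give a first, coarse signal of drift in ratings; a fuller analysis would also compare the distributions of the other fields.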
Similar Papers
A Retail-Corpus for Aspect-Based Sentiment Analysis with Large Language Models
Computation and Language
Helps computers understand what people like about stores.
OpinioRAG: Towards Generating User-Centric Opinion Highlights from Large-scale Online Reviews
Computation and Language
Summarizes thousands of reviews into helpful highlights.
Data Augmentation for Fake Reviews Detection in Multiple Languages and Multiple Domains
Computation and Language
Finds fake online reviews better.