An Automated Grey Literature Extraction Tool for Software Engineering

Published: December 28, 2025 | arXiv ID: 2512.23066v1

By: Houcine Abdelkader Cherief, Brahim Mahmoudi, Zacharie Chenail-Larcher and more

Potential Business Impact:

Helps researchers find and rank practitioner-written software engineering sources that rarely appear in academic venues.

Business Areas:
Reading Apps, Apps, Software

Grey literature is essential to software engineering research because it captures practices and decisions that rarely appear in academic venues. However, collecting and assessing it at scale remains difficult: its sources, formats, and APIs are heterogeneous, which impedes reproducible, large-scale synthesis. To address this issue, we present GLiSE, a prompt-driven tool that turns a research topic prompt into platform-specific queries, gathers results from common software-engineering web sources (GitHub, Stack Overflow) and Google Search, and uses embedding-based semantic classifiers to filter and rank results according to their relevance. GLiSE is designed for reproducibility: all settings are configuration-based, and every generated query is accessible. In this paper, we (i) present the GLiSE tool, (ii) provide a curated dataset of software engineering grey-literature search results classified by semantic relevance to their originating search intent, and (iii) conduct an empirical study on the usability of our tool.
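
The filter-and-rank step described in the abstract can be illustrated with a minimal sketch. This is not GLiSE's implementation: the function names (`embed`, `cosine`, `filter_and_rank`), the threshold value, and the sample results are all hypothetical, and a toy bag-of-words vector stands in for the embedding model, which the abstract does not specify. The sketch only shows the general idea of scoring each search result against the originating search intent and keeping the most similar ones.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in for an embedding model: a bag-of-words count vector.

    Assumption: a real pipeline would call a learned embedding model here.
    """
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def filter_and_rank(intent, results, threshold=0.1):
    """Drop results below the (hypothetical) threshold, then sort by score."""
    intent_vec = embed(intent)
    scored = [(cosine(intent_vec, embed(r)), r) for r in results]
    kept = [(score, r) for score, r in scored if score >= threshold]
    return sorted(kept, key=lambda pair: pair[0], reverse=True)


# Hypothetical search intent and search results, for illustration only.
intent = "flaky tests in continuous integration"
results = [
    "How we reduced flaky tests in our continuous integration pipeline",
    "Best pizza recipes for the weekend",
    "Debugging flaky integration tests on GitHub Actions",
]

ranked = filter_and_rank(intent, results)
for score, title in ranked:
    print(f"{score:.2f}  {title}")
```

Here the unrelated result shares no terms with the intent, scores zero, and is filtered out, while the two relevant results are returned in order of similarity. Swapping `embed` for a learned embedding model would keep the rest of the sketch unchanged.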

Page Count
8 pages

Category
Computer Science:
Software Engineering