Score: 0

DataOps-driven CI/CD for analytics repositories

Published: November 15, 2025 | arXiv ID: 2511.12277v1

By: Dmytro Valiaiev

Potential Business Impact:

Makes data analysis more reliable and easier to check.

Business Areas:
Data Integration Data and Analytics, Information Technology, Software

The proliferation of SQL for data processing has often occurred without the rigor of traditional software development, leading to siloed efforts, logic replication, and increased risk. This ad-hoc approach hampers data governance and makes validation nearly impossible. Organizations are adopting DataOps, a methodology combining Agile, Lean, and DevOps principles to address these challenges to treat analytics pipelines as production systems. However, a standardized framework for implementing DataOps is lacking. This perspective proposes a qualitative design for a DataOps-aligned validation framework. It introduces a DataOps Controls Scorecard, derived from a multivocal literature review, which distills key concepts into twelve testable controls. These controls are then mapped to a modular, extensible CI/CD pipeline framework designed to govern a single source of truth (SOT) SQL repository. The framework consists of five stages: Lint, Optimize, Parse, Validate, and Observe, each containing specific, automated checks. A Requirements Traceability Matrix (RTM) demonstrates how each high-level control is enforced by concrete pipeline checks, ensuring qualitative completeness. This approach provides a structured mechanism for enhancing data quality, governance, and collaboration, allowing teams to scale analytics development with transparency and control.

Country of Origin
🇺🇸 United States

Page Count
12 pages

Category
Electrical Engineering and Systems Science:
Systems and Control