A Review and Analysis of a Parallel Approach for Decision Tree Learning from Large Data Streams

Published: May 17, 2025 | arXiv ID: 2505.11780v1

By: Zeinab Shiralizadeh

Potential Business Impact:

Teaches computers to learn from fast-moving information.

Business Areas:

Big Data, Data and Analytics

This work studies pdsCART, a parallel decision tree learning algorithm designed for scalable and efficient data analysis. The method has three core capabilities. First, it supports real-time learning from data streams, so trees can be constructed incrementally. Second, it processes high-volume streaming data in parallel, which makes it suitable for large-scale applications. Third, it integrates into the MapReduce framework, which keeps it compatible with distributed computing environments. In what follows, we present the algorithm's key components along with results highlighting its performance and scalability.
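The abstract does not spell out pdsCART's procedure, so the following is only a generic illustration of how decision-tree split statistics can be computed in a MapReduce style, not the paper's method. It assumes numeric features, a fixed set of candidate thresholds per feature, and Gini impurity as the split criterion; every name in it (`map_partition`, `reduce_counts`, `best_split`, `thresholds`) is hypothetical.

```python
from collections import Counter
from functools import reduce


def map_partition(rows, thresholds):
    """Map step: count (feature, threshold, side, label) occurrences in one partition."""
    counts = Counter()
    for features, label in rows:
        for f, value in enumerate(features):
            for t in thresholds[f]:
                side = "left" if value <= t else "right"
                counts[(f, t, side, label)] += 1
    return counts


def reduce_counts(a, b):
    """Reduce step: merge two partial count tables."""
    return a + b


def gini(label_counts):
    """Gini impurity of a label histogram."""
    total = sum(label_counts.values())
    if total == 0:
        return 0.0
    return 1.0 - sum((c / total) ** 2 for c in label_counts.values())


def best_split(counts):
    """Pick the (feature, threshold) pair with the lowest weighted Gini impurity."""
    by_split = {}
    for (f, t, side, label), c in counts.items():
        sides = by_split.setdefault((f, t), {"left": Counter(), "right": Counter()})
        sides[side][label] += c
    best = None
    for f, t in sorted(by_split):
        left, right = by_split[(f, t)]["left"], by_split[(f, t)]["right"]
        n_left, n_right = sum(left.values()), sum(right.values())
        n = n_left + n_right
        score = (n_left / n) * gini(left) + (n_right / n) * gini(right)
        if best is None or score < best[0]:
            best = (score, f, t)
    return best


# Two partitions standing in for chunks of a stream handled by separate workers.
partition1 = [((1.0,), "a"), ((2.0,), "a")]
partition2 = [((3.0,), "b"), ((4.0,), "b")]
thresholds = {0: [1.5, 2.5, 3.5]}  # assumed candidate thresholds for feature 0

partials = [map_partition(p, thresholds) for p in (partition1, partition2)]
merged = reduce(reduce_counts, partials)
print(best_split(merged))  # (0.0, 0, 2.5)
```

Because the partial count tables merge by simple addition, each partition can be processed independently and new batches can be folded in as they arrive, which is the property that makes this kind of statistic convenient for both parallel and incremental settings.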

Learning from the Past: Adaptive Parallelism Tuning for Stream Processing Systems

Distributed, Parallel, and Cluster Computing

Makes computer programs run faster by adjusting their parts.

16 Apr 2025

PDSP-Bench: A Benchmarking System for Parallel and Distributed Stream Processing

Distributed, Parallel, and Cluster Computing

Tests how fast computer programs process data.

14 Apr 2025

Declarative Data Pipeline for Large Scale ML Services

Distributed, Parallel, and Cluster Computing

Builds better computer programs faster and smarter.

20 Aug 2025

Page Count

7 pages
