High-Dimensional Data Processing: Benchmarking Machine Learning and Deep Learning Architectures in Local and Distributed Environments
By: Julian Rodriguez, Piotr Lopez, Emiliano Lerma, and more
Potential Business Impact:
Benchmarks machine learning and deep learning approaches for processing large, high-dimensional datasets, informing the choice between local and distributed infrastructure.
This document reports the practices and methodologies implemented during the Big Data course. It details the workflow, beginning with processing of the Epsilon dataset through both group and individual strategies, followed by text analysis and classification with RestMex and movie feature analysis with IMDb data. Finally, it describes the technical implementation of a distributed computing cluster with Apache Spark on Linux using Scala.
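The standalone cluster setup the abstract refers to can be sketched as a configuration fragment; the hostnames and the `$SPARK_HOME` location below are placeholders, not details taken from the report, and passwordless SSH between nodes is assumed:

```shell
# Minimal standalone Spark cluster sketch (assumes Spark is unpacked
# at $SPARK_HOME on every node).

# On the master node: list the worker hostnames (placeholders).
echo "worker1" >> "$SPARK_HOME/conf/workers"
echo "worker2" >> "$SPARK_HOME/conf/workers"

# Start the master, then all workers listed in conf/workers.
"$SPARK_HOME/sbin/start-master.sh"
"$SPARK_HOME/sbin/start-workers.sh"

# Open a Scala shell against the cluster (replace master-host).
"$SPARK_HOME/bin/spark-shell" --master spark://master-host:7077
```

The master's web UI (port 8080 by default) can then be used to confirm that both workers registered before submitting jobs.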