High-Dimensional Data Processing: Benchmarking Machine Learning and Deep Learning Architectures in Local and Distributed Environments
By: Julian Rodriguez, Piotr Lopez, Emiliano Lerma, and more
Potential Business Impact:
Benchmarks machine learning and deep learning approaches for processing large, high-dimensional datasets, informing the choice between local and distributed infrastructure.
This document reports the practices and methodologies implemented during the Big Data course. It details the workflow, beginning with processing of the Epsilon dataset through both group and individual strategies, followed by text analysis and classification with RestMex and movie feature analysis with IMDb data. Finally, it describes the technical implementation of a distributed computing cluster with Apache Spark on Linux using Scala.
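The standalone cluster setup the abstract refers to can be sketched as a configuration fragment; the hostnames and the `$SPARK_HOME` location below are placeholders, not details taken from the report, and passwordless SSH between nodes is assumed:

```shell
# Minimal standalone Spark cluster sketch (assumes Spark is unpacked
# at $SPARK_HOME on every node).

# On the master node: list the worker hostnames (placeholders).
echo "worker1" >> "$SPARK_HOME/conf/workers"
echo "worker2" >> "$SPARK_HOME/conf/workers"

# Start the master, then all workers listed in conf/workers.
"$SPARK_HOME/sbin/start-master.sh"
"$SPARK_HOME/sbin/start-workers.sh"

# Open a Scala shell against the cluster (replace master-host).
"$SPARK_HOME/bin/spark-shell" --master spark://master-host:7077
```

The master's web UI (port 8080 by default) can then be used to confirm that both workers registered before submitting jobs.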