Designing Co-operation in Systems of Hierarchical, Multi-objective Schedulers for Stream Processing
By: Animesh Dangwal, Yufeng Jiang, Charlie Arnold, and more
Potential Business Impact:
Lets computers process huge volumes of streaming data faster.
Stream processing is a computing paradigm that supports real-time data processing for a wide variety of applications. At Meta, it's used across the company for tasks such as deriving product insights, providing and improving user services, and enabling AI at scale for our ever-growing user base. Meta's current stream processing framework supports processing terabytes (TBs) of data in mere seconds. This is enabled by our efficient schedulers and multi-layered infrastructure, which allocate workloads across various compute resources, working together in hierarchies across different parts of the infrastructure. But with the ever-growing complexity of applications and user needs, areas of the infrastructure that previously required minimal load balancing must now become more robust and more proactive in responding to application load. In our work, we explore how to design and build such a system, focusing on load balancing over key compute resources and properties of these applications. We also show how to integrate new schedulers into the existing hierarchy, allowing multiple schedulers to work together and perform load balancing effectively at their own level of the infrastructure.
Similar Papers
Workload Schedulers -- Genesis, Algorithms and Differences
Distributed, Parallel, and Cluster Computing
Organizes computer tasks to run faster.
Scheduling Data-Intensive Workloads in Large-Scale Distributed Systems: Trends and Challenges
Distributed, Parallel, and Cluster Computing
Organizes big computer jobs to finish faster.
Declarative Data Pipeline for Large Scale ML Services
Distributed, Parallel, and Cluster Computing
Builds better computer programs faster and smarter.