Co-Locating Distributed Data-Parallel Jobs in Shared Clusters

Cluster resource utilization and makespan can often be improved significantly when data-parallel batch processing jobs share cluster resources without strict isolation. However, how much workloads benefit from resource sharing depends on the specific job combinations that run together, so co-locations should be selected with resource utilization and interference in mind.


Distributed data-parallel batch processing systems like MapReduce, Spark, and Flink are popular tools for analyzing large datasets using cluster resources. Resource management systems like YARN and Mesos, in turn, allow multiple data-parallel processing jobs to share cluster resources through temporary container reservations. Often, these containers do not strictly isolate resource usage, so that overall utilization remains high despite overprovisioned reservations and the typically fluctuating resource demands of long-running batch processing jobs. Consequently, it is often beneficial for resource utilization and makespan to co-locate jobs that stress different resources. Yet some combinations of jobs utilize resources significantly better and interfere less with each other on shared nodes than others, and which combinations these are is often not clear without actually running the specific jobs together in a particular cluster environment.


Recurring data processing jobs provide an opportunity to monitor and learn how specific jobs and job combinations utilize shared cluster resources. This knowledge can inform scheduling decisions. More specifically, our approach to scheduling recurring distributed dataflow jobs employs reinforcement learning to select jobs that stress different resources than the jobs already running on a cluster.

Our approach takes the resource utilization of and interference among co-located jobs into account. Furthermore, we continuously learn which combinations of jobs should be promoted or prevented when co-locating jobs on shared nodes. This way, changes in the behavior of recurring jobs (due to, for instance, modified input data or changed job parameters) are automatically reflected in our scheduler's model of co-location goodness over time.
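The learning described above can be illustrated with a minimal sketch. The class names, the epsilon-greedy selection, and the moving-average reward update below are our own illustrative assumptions, not the published algorithm: a scheduler observes a reward for each pair of co-located jobs (for instance, derived from combined resource utilization minus measured interference), keeps a per-pair goodness score, and preferably dispatches the queued job expected to complement the jobs already running.

```python
import random
from collections import defaultdict


class CoLocationModel:
    """Illustrative sketch (not the actual Mary/Hugo implementation):
    learns a goodness score per job pair from observed rewards and uses
    it to pick the next job to co-locate with the running jobs."""

    def __init__(self, alpha=0.2, epsilon=0.1):
        self.alpha = alpha      # step size of the incremental update
        self.epsilon = epsilon  # exploration probability
        self.goodness = defaultdict(float)  # (job_a, job_b) -> score

    def _key(self, a, b):
        # Co-location is symmetric, so store each pair in sorted order.
        return tuple(sorted((a, b)))

    def update(self, job_a, job_b, reward):
        # Move the stored score toward the newly observed reward; this
        # lets changed behavior of recurring jobs gradually override
        # outdated observations.
        k = self._key(job_a, job_b)
        self.goodness[k] += self.alpha * (reward - self.goodness[k])

    def select(self, queued_jobs, running_jobs):
        # Epsilon-greedy: occasionally explore a random job, otherwise
        # pick the queued job with the highest summed goodness with
        # respect to all currently running jobs.
        if random.random() < self.epsilon:
            return random.choice(queued_jobs)
        return max(
            queued_jobs,
            key=lambda job: sum(
                self.goodness[self._key(job, r)] for r in running_jobs
            ),
        )
```

With exploration disabled, the model deterministically prefers pairs that received positive rewards: after rewarding a CPU-bound job running next to an I/O-bound job and penalizing two CPU-bound jobs together, the I/O-bound job is selected.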


We developed and evaluated a family of cluster schedulers that co-locate data-parallel batch processing jobs on shared cluster resources using reinforcement learning, aiming to optimize makespan by selecting job combinations that exhibit high resource utilization and low interference. The first scheduler, which we call Mary, implements the core reinforcement learning algorithm and a measure of co-location goodness. The second scheduler, which we call Hugo, builds on Mary and adds offline grouping of jobs, providing a scheduling mechanism that efficiently generalizes from already monitored job combinations. The third scheduler, which we call Hugo*, adds bounded waiting, showing how additional scheduling requirements can be integrated. We implemented all our scheduler variants for YARN and used exemplary Flink and Spark jobs to demonstrate the impact on cluster resource utilization, makespan, and waiting times.
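To illustrate the idea behind Hugo's offline grouping, here is a deliberately simplified sketch. The grouping criterion below (assigning each job to the resource it stresses most) is a hypothetical stand-in for whatever grouping Hugo actually uses; the point is only that goodness values learned for one job can then generalize to all jobs in the same group, so fewer combinations must be observed before making informed decisions.

```python
def group_by_dominant_resource(usage_profiles):
    """Hypothetical grouping sketch: map each job to the resource it
    stresses most, based on monitored per-resource usage (values
    normalized to [0, 1]). Jobs in the same group are then treated
    alike when estimating co-location goodness."""
    groups = {}
    for job, profile in usage_profiles.items():
        # The dominant resource is the one with the highest usage.
        dominant = max(profile, key=profile.get)
        groups.setdefault(dominant, []).append(job)
    return groups
```

For example, with monitored profiles for three recurring jobs, a CPU-heavy Wordcount and PageRank would land in one group and an I/O-heavy Sort in another, so a goodness score learned for one CPU-heavy job applies to both.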



If you have any questions or are interested in collaborating with us on this topic, please get in touch with Lauritz!


This work has been supported through grants by the German Research Foundation as Stratosphere (DFG Research Unit FOR 1306) as well as by the German Ministry for Education and Research as Berlin Big Data Center BBDC (BMBF grant 01IS14013A) and Berlin Institute for the Foundations of Learning and Data BIFOLD (BMBF grant 01IS18025A).