Resource Allocation for Distributed Processing
Choosing and configuring cluster resources for distributed data processing jobs can be a challenging task.
Even expert users often do not fully understand system and workload dynamics, in part because complete information on all the factors that influence job performance is rarely available.
At the same time, it is important to configure cluster resources so that jobs execute without significant bottlenecks while meeting constraints on execution time and resource usage.
We, therefore, work on resource allocation methods and tools that take such requirements into account and utilize profiling, monitoring, and performance modeling to select adequate sets of resources.
Moreover, co-locating processing tasks with complementary resource demands on shared infrastructure can further increase resource utilization and job throughput.
Our research aims to answer the following questions for different data processing workloads: What kind of resources should be allocated for a job and its tasks? Which job should run next when resources become available? Where should a specific task be placed in a particular infrastructure? Should certain tasks be co-located on shared resources?
To answer these questions, we use monitoring data, profiling runs, different performance models, as well as scoring and optimization methods.
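As a minimal sketch of how profiling runs and a performance model could feed into resource selection, the following example fits a simple runtime model to a few profiling runs and picks the smallest scale-out predicted to meet a runtime target. The model form runtime(n) = a + b / n, the function names, and the sample numbers are assumptions made for illustration; they are not taken from the publications listed on this page.

```python
from typing import Iterable, Optional, Sequence, Tuple

# Hypothetical model: runtime(n) = a + b / n, where n is the number of nodes,
# a is a serial overhead, and b is the perfectly parallelizable work.
Model = Tuple[float, float]


def fit_runtime_model(profile: Sequence[Tuple[int, float]]) -> Model:
    """Fit a and b by ordinary least squares on x = 1 / n.

    `profile` holds (node_count, measured_runtime_seconds) pairs and must
    contain at least two distinct node counts.
    """
    xs = [1.0 / n for n, _ in profile]
    ys = [t for _, t in profile]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def predict_runtime(model: Model, nodes: int) -> float:
    """Predict the runtime in seconds for a given node count."""
    a, b = model
    return a + b / nodes


def smallest_scale_out(
    model: Model, candidates: Iterable[int], max_runtime: float
) -> Optional[int]:
    """Return the smallest candidate node count predicted to meet the
    runtime target, or None if no candidate does."""
    for n in sorted(candidates):
        if predict_runtime(model, n) <= max_runtime:
            return n
    return None


# Made-up profiling runs: (nodes, runtime in seconds).
profile_runs = [(2, 620.0), (4, 320.0), (8, 170.0)]
model = fit_runtime_model(profile_runs)
choice = smallest_scale_out(model, range(2, 17), max_runtime=150.0)
```

In practice, a model like this would be one of several candidates, and the selection step would typically also weigh cost and resource constraints rather than runtime alone.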
Ongoing Research
We currently work on multiple topics in this area:
Featured Publications
- Selecting Efficient Cluster Resources for Data Analytics: When and How to Allocate for In-Memory Processing?
Jonathan Will, Lauritz Thamsen, Dominik Scheinert, Odej Kao.
Proceedings of the 35th International Conference on Scientific and Statistical Database Management (SSDBM). ACM. 2023. [arXiv preprint]
- Lotaru: Locally estimating runtimes of scientific workflow tasks in heterogeneous clusters.
Jonathan Bader, Fabian Lehmann, Lauritz Thamsen, Jonathan Will, Ulf Leser, Odej Kao.
Proceedings of the 34th International Conference on Scientific and Statistical Database Management (SSDBM). ACM. 2022. [arXiv preprint]
- Bellamy: Reusing Performance Models for Distributed Dataflow Jobs Across Contexts.
Dominik Scheinert, Lauritz Thamsen, Houkun Zhu, Jonathan Will, Alexander Acker, Thorsten Wittkopp, and Odej Kao.
In the Proceedings of the 23rd IEEE International Conference on Cluster Computing (CLUSTER). IEEE. 2021. [arXiv preprint]
- C3O: Collaborative Cluster Configuration Optimization for Distributed Data Processing in Public Clouds.
Jonathan Will, Lauritz Thamsen, Dominik Scheinert, Jonathan Bader, and Odej Kao.
In the Proceedings of the 9th IEEE International Conference on Cloud Engineering (IC2E). IEEE. 2021. [arXiv preprint] [video]
- Let’s Wait Awhile: How Temporal Workload Shifting Can Reduce Carbon Emissions in the Cloud.
Philipp Wiesner, Ilja Behnke, Dominik Scheinert, Kordian Gontarska, and Lauritz Thamsen.
In the Proceedings of the 22nd International Middleware Conference (Middleware). ACM. 2021. [Open Access] [code]