Auto-Tuning of Scalable Systems
The performance and dependability of scalable distributed systems often depend on an adequate configuration for the specific workload, computing environment, and application requirements. However, systems often expose many configurable options, and each option can usually be set to a wide range of values. Moreover, as workloads and environments change dynamically, the optimal configuration of a system usually changes as well. We therefore work on novel mechanisms for automatically tuning the configurations of scalable distributed systems. Our goal is to support systems in efficiently and dynamically adjusting their configurations to variable workloads and node failures. Key techniques used for this include runtime profiling, performance modeling, optimization, and time series forecasting.
Ongoing Research
We currently work on multiple topics in this area:
People
Publications
- Magpie: Automatically Tuning Static Parameters for Distributed File Systems using Deep Reinforcement Learning. Houkun Zhu, Dominik Scheinert, Lauritz Thamsen, Kordian Gontarska, and Odej Kao. In the Proceedings of the 2022 IEEE International Conference on Cloud Engineering (IC2E), pp. 150-159. IEEE. 2022. [arXiv preprint]
- Rafiki: Task-level Capacity Planning in Distributed Stream Processing Systems. Benjamin J. J. Pfister, Wolf S. Lickefett, Jan Nitschke, Sumit Paul, Morgan K. Geldenhuys, Dominik Scheinert, Kordian Gontarska, and Lauritz Thamsen. To appear in the Proceedings of the Euro-Par 2021 Workshops (Euro-Par). Presented at the 3rd International Workshop on Parallel Programming Models in High-Performance Cloud (ParaMo). Springer. 2021. [Google Scholar]
- Evaluation of Load Prediction Techniques for Distributed Stream Processing. Kordian Gontarska, Morgan Geldenhuys, Dominik Scheinert, Philipp Wiesner, Andreas Polze, and Lauritz Thamsen. To appear in the Proceedings of the 9th IEEE International Conference on Cloud Engineering (IC2E). IEEE. 2021. [arXiv preprint]
- Chiron: Optimizing Fault Tolerance in QoS-aware Distributed Stream Processing Jobs. Morgan K. Geldenhuys, Lauritz Thamsen, and Odej Kao. In the Proceedings of the 2020 IEEE International Conference on Big Data (Big Data). IEEE. 2020. [arXiv preprint]
- Effectively Testing System Configurations of Critical IoT Analytics Pipelines. Morgan K. Geldenhuys, Lauritz Thamsen, Kain Kordian Gontarska, Felix Lorenz, and Odej Kao. In the Proceedings of the 2019 IEEE International Conference on Big Data (IEEE BigData). Presented at the Second International Workshop on the Internet of Things Data Analytics (IoTDA). IEEE. 2019. [Google Scholar]