FONDA: Foundations of Workflows for Large-Scale Scientific Data Analysis

FONDA (Foundations of Workflows for Large-Scale Scientific Data Analysis) is a new Collaborative Research Center (CRC) funded by the DFG. FONDA will investigate methods to support scientists who work with cluster infrastructures to analyze very large datasets.

Today, large-scale scientific data analysis is complicated by the need to select among the available computational resources and to hand-tune distributed processing jobs. These settings are not straightforward and are often platform-specific, yet they significantly affect runtimes and efficiency and lead to either platform lock-in or performance losses. In FONDA, we will develop new methods for profiling, performance modeling, and task placement that enable resource management systems to use the available cluster resources efficiently and thereby allow scientists to focus on the domain-specific challenges in their work.

We are part of the projects B1, B7, and T6. In B1, we focus on enabling carbon-aware execution of large-scale data analysis workflows by accurately modeling the behavior of computational jobs on heterogeneous infrastructures such as clusters or grids. Our methods range from infrastructure discovery to resource management and scheduling under uncertainty. In B7, we develop resource-efficient adaptive machine-learning workflows for Earth observation applications. In T6, the team provides a reusable and deployable reference software stack that makes FONDA methods and workflow technologies accessible, integrable, and easy to apply across domains.

Participating Organizations

Humboldt-Universität zu Berlin
Charité – Universitätsmedizin Berlin
Freie Universität Berlin
Max Delbrück Center for Molecular Medicine
Technische Universität Berlin
Universität Osnabrück
Universität Potsdam
Zuse-Institut Berlin
Fraunhofer Heinrich-Hertz-Institut, Berlin

Selected Publications

Tarema: Adaptive Resource Allocation for Scalable Scientific Workflows in Heterogeneous Clusters. Jonathan Bader, Lauritz Thamsen, Svetlana Kulagina, Jonathan Will, Henning Meyerhenke, and Odej Kao. Proceedings of the 2021 IEEE International Conference on Big Data (Big Data). IEEE. 2021. [arXiv]
Lotaru: Locally estimating runtimes of scientific workflow tasks in heterogeneous clusters. Jonathan Bader, Fabian Lehmann, Lauritz Thamsen, Jonathan Will, Ulf Leser, Odej Kao. Proceedings of the 34th International Conference on Scientific and Statistical Database Management. ACM. 2022. [arXiv]
Reshi: Recommending resources for scientific workflow tasks on heterogeneous infrastructures. Jonathan Bader, Fabian Lehmann, Alexander Groth, Lauritz Thamsen, Dominik Scheinert, Jonathan Will, Ulf Leser, Odej Kao. 2022 IEEE International Performance, Computing, and Communications Conference (IPCCC). IEEE. 2022. [arXiv]
Macaw: The machine learning magnetometer calibration workflow. Jonathan Bader, Kevin Styp-Rekowski, Leon Doehler, Soeren Becker, Odej Kao. 2022 IEEE International Conference on Data Mining Workshops (ICDMW). IEEE. 2022. [arXiv]
Towards Advanced Monitoring for Scientific Workflows. Jonathan Bader, Joel Witzke, Soeren Becker, Ansgar Lößer, Fabian Lehmann, Leon Doehler, Anh Duc Vu, Odej Kao. 2022 IEEE International Conference on Big Data (Big Data). IEEE. 2022. [arXiv]
Leveraging Reinforcement Learning for Task Resource Allocation in Scientific Workflows. Jonathan Bader, Nicolas Zunker, Soeren Becker, Odej Kao. 2022 IEEE International Conference on Big Data (Big Data). IEEE. 2022. [arXiv]
How Workflow Engines Should Talk to Resource Managers: A Proposal for a Common Workflow Scheduling Interface. Fabian Lehmann, Jonathan Bader, Friedrich Tschirpke, Lauritz Thamsen, Ulf Leser. 2023 23rd IEEE/ACM International Symposium on Cluster, Cloud and Internet Computing (CCGrid). IEEE/ACM. 2023. [arXiv]
Lotaru: Locally predicting workflow task runtimes for resource management on heterogeneous infrastructures. Jonathan Bader, Fabian Lehmann, Lauritz Thamsen, Ulf Leser, Odej Kao. 2024. Elsevier. [Elsevier]
Sizey: Memory-Efficient Execution of Scientific Workflow Tasks. Jonathan Bader, Fabian Skalski, Fabian Lehmann, Dominik Scheinert, Jonathan Will, Lauritz Thamsen, Odej Kao. 2024 IEEE International Conference on Cluster Computing (CLUSTER). IEEE. 2024. [arXiv]
Ponder: Online Prediction of Task Memory Requirements for Scientific Workflows. Fabian Lehmann, Jonathan Bader, Ninon De Mecquenem, Xing Wang, Vasilis Bountris, Florian Friederici, Ulf Leser, Lauritz Thamsen. 2024 IEEE 20th International Conference on e-Science (e-Science). IEEE. 2024. [arXiv]
KS+: Predicting Workflow Task Memory Usage Over Time. Jonathan Bader, Ansgar Lößer, Lauritz Thamsen, Björn Scheuermann, Odej Kao. 2024 IEEE 20th International Conference on e-Science (e-Science). IEEE. 2024. [arXiv]

Further project information