FONDA: Foundations of Workflows for Large-Scale Scientific Data Analysis

FONDA (Foundations of Workflows for Large-Scale Scientific Data Analysis) is a new Collaborative Research Center (CRC) funded by the DFG. FONDA will investigate methods to support scientists who work with cluster infrastructures to analyze very large datasets.

Today, large-scale scientific data analysis is complicated by the need to select among the available computational resources and to hand-tune distributed processing jobs. These settings are not straightforward and are often platform-specific, yet they have a significant impact on runtime and efficiency and lead to either platform lock-in or performance losses. In FONDA, we will develop new methods for profiling, performance modeling, and task placement that enable resource management systems to use the available cluster resources efficiently and thereby allow scientists to focus on the domain-specific challenges of their work.

We contribute to the projects B1, B7, and T6. In B1, our focus is on infrastructure discovery and description and on the creation of infrastructure-aware task execution profiles. In B7, we develop resource-efficient adaptive machine-learning workflows for Earth observation applications. In T6, the team provides a reusable and deployable reference software stack that makes FONDA methods and workflow technologies accessible, integrable, and easy to apply across domains.

Participating Organizations

Selected Publications

Further project information