Assignment:
As the amount of data available to researchers in fields ranging from bioinformatics to physics to remote sensing continues to grow, scientific workflow systems have become increasingly important. These systems play a critical role in creating and executing scalable data analysis pipelines. When designing a workflow, users must define the resources required for each task and ensure that sufficient resources are available on the intended cluster infrastructure. A critical problem is underestimating a task’s memory requirements, which causes the task to fail. To avoid such failures, users often over-allocate resources, which wastes cluster capacity and reduces overall throughput.
This bachelor’s thesis builds upon an existing dynamic memory prediction model. The challenge is to select the model’s parameters automatically and to improve its prediction performance by selecting appropriate offsetting and failure handling strategies.
The quality of the developed methods should be evaluated with traces from real-world workflow cluster executions.
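To make the terms "offsetting" and "failure handling" more concrete, the following minimal Python sketch shows one possible combination: a multiplicative offset on top of the predicted memory, and a retry that scales the request after each out-of-memory failure. It is an illustration only, not the existing prediction model. The function name, the parameters offset_factor and retry_factor, their default values, and the way wasted memory is counted are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class Allocation:
    attempts: int      # number of executions, including failed ones
    final_mb: float    # memory granted on the successful attempt
    wasted_mb: float   # hypothetical wastage metric, summed over all attempts


def allocate_with_retries(predicted_mb, actual_peak_mb,
                          offset_factor=1.1, retry_factor=2.0,
                          max_attempts=5):
    """Replay one task from a trace under an offset-and-retry strategy.

    predicted_mb:   memory estimate produced by some prediction model
    actual_peak_mb: peak memory the task really used, taken from a trace
    offset_factor:  assumed safety margin applied to the prediction
    retry_factor:   assumed growth of the request after each failure
    """
    # Offsetting: request more than the raw prediction to absorb errors.
    request_mb = predicted_mb * offset_factor
    wasted_mb = 0.0

    for attempt in range(1, max_attempts + 1):
        if request_mb >= actual_peak_mb:
            # Success: only the unused part of the allocation is wasted.
            wasted_mb += request_mb - actual_peak_mb
            return Allocation(attempt, request_mb, wasted_mb)
        # Failure handling: the failed attempt is counted as fully wasted
        # (an assumption of this sketch), then the request is scaled up.
        wasted_mb += request_mb
        request_mb *= retry_factor

    raise RuntimeError("task did not fit into memory within max_attempts")


if __name__ == "__main__":
    # Made-up numbers for illustration, not taken from a real trace.
    print(allocate_with_retries(predicted_mb=1000, actual_peak_mb=1050))
    print(allocate_with_retries(predicted_mb=1000, actual_peak_mb=2000))
```

Replaying every task of a recorded workflow execution through such a function and summing the wasted memory and the number of failed attempts is one simple way to compare different strategies against each other.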
Requirements: Knowledge of machine learning techniques and advanced Python skills.
Start: immediately
Contact: Jonathan Bader (jonathan.bader ∂ tu-berlin.de)