Context-Aware Performance Modeling for Resource Management of Distributed Dataflows

Overview:

Distributed data analysis pipelines play an important role in modern IT architectures. They process huge amounts of data, either continuously (stream processing) or at regular intervals (batch processing). However, allocating the right amount of resources to these data processing frameworks is a non-trivial task, especially as the users responsible for these systems may have specific objectives or constraints to adhere to. One promising approach is automated, data-driven performance modeling of such computing jobs in terms of their resource requirements, which assumes that sufficient historical performance data is available. In practice, such data is often scarce, which makes these approaches infeasible. Moreover, even when enough data is available, existing methods often fail to adapt to contextual changes.

Research Goal:

We aim to develop methods to better understand the surrounding, and often changing, execution context of a computing job, so that better overall performance can be achieved. From the user’s point of view, the benefit is not only that the application is stable and meets its goals, but also that it is more cost- and energy-efficient (more generally: resource-efficient). To this end, novel performance modeling methods for resource management should focus on detecting, understanding, and adapting to changes at the application level (e.g., data inputs, parameterizations) and at the infrastructure level (e.g., hardware characteristics, performance degradations) at runtime. In many cases, such a context-aware approach also lends itself to a collaborative scenario, in which users share performance data and thereby each improve their own performance modeling capabilities.

Required Skills:

If you find the above research area interesting and would like to contribute to it, that is a very good start. In addition, the minimum required skills are knowledge of virtualization and distributed computing systems, as well as experience with machine learning. You also need to read and understand the process of writing a thesis at our research group.

Start: Immediately

Contact: Dominik Scheinert (dominik.scheinert ∂ tu-berlin.de)