Fast Adapting Models in Distributed Stream Processing

Distributed Stream Processing (DSP) systems are used in a wide range of scenarios. We aim to help users apply state-of-the-art modeling approaches to resource management. Our goal is to let them adapt models to their particular use case as quickly as possible, with as little effort and data as feasible.

Motivation

DSP systems are widely used for tasks such as market research, real-time monitoring, financial trading, social media analysis, live media streaming, and many others. These use cases come with varying requirements and prerequisites: for example, the data load can change suddenly, fault tolerance may be important, or high latency may be acceptable. This variety makes it difficult to configure and maintain a system so that its main objectives (Service Level Agreements) are fulfilled while resource usage is kept as low as possible.

This difficulty leads to issues such as low resource utilization, high power consumption, and inconsistent quality of service. Such systems are usually configured by trained experts, because automatic configuration is hard and slow given the large parameter search space.

So far, approaches using machine learning have suffered from the cold-start problem, as a new model had to be trained from scratch for each new application. We intend to speed up the process of obtaining a working model by means of knowledge transfer.

Approach

The cold-start problem is commonly encountered in the machine learning domain, as the amount of available training data needs to be sufficiently large to train a sensible model. Several approaches address this problem, such as Transfer Learning, Similarity Functions, Gradient-Based Meta-Learning, and Few-Shot Learning.

Models that perform some task within a DSP system include, for example, a Reinforcement Learning (RL) model that configures the DSP system online, or a model that predicts the right level of parallelism to autoscale a system. Assuming that such models are similar to each other in their inputs and outputs, we can treat them as samples from a common distribution of models. While developing these models, we can then concentrate on this distribution and account for the shift between its individual members. Model-Agnostic Meta-Learning (MAML) consists of finding a set of similar tasks (e.g. RL models that configure systems) and training a model jointly on these tasks, such that the resulting model adapts to a new, unseen task with only a few data examples.
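The idea of training jointly on similar tasks so that a few examples suffice for adaptation can be illustrated with a minimal sketch. The code below is not the project's implementation. It uses the first-order approximation of MAML (the outer gradient is taken at the adapted parameters and second-order terms are ignored), and the task family is a purely hypothetical stand-in: each "task" is a linear relation y = a*x + b with its own a and b. All function names, value ranges, and learning rates are assumptions chosen for illustration.

```python
import random

def sample_task(rng):
    # Hypothetical task family: y = a*x + b with task-specific slope and offset.
    return rng.uniform(1.0, 3.0), rng.uniform(-1.0, 1.0)

def sample_data(task, rng, n):
    # Draw n noise-free (x, y) examples from one task.
    a, b = task
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    return [(x, a * x + b) for x in xs]

def loss(theta, data):
    # Mean squared error of the linear model w*x + c.
    w, c = theta
    return sum((w * x + c - y) ** 2 for x, y in data) / len(data)

def grad(theta, data):
    # Gradient of the mean squared error with respect to (w, c).
    w, c = theta
    n = len(data)
    gw = sum(2.0 * (w * x + c - y) * x for x, y in data) / n
    gc = sum(2.0 * (w * x + c - y) for x, y in data) / n
    return gw, gc

def adapt(theta, data, inner_lr=0.1, steps=1):
    # Inner loop: a few gradient steps on the small support set of one task.
    w, c = theta
    for _ in range(steps):
        gw, gc = grad((w, c), data)
        w, c = w - inner_lr * gw, c - inner_lr * gc
    return w, c

def meta_train(rng, meta_steps=1000, tasks_per_step=8, k_shot=5,
               inner_lr=0.1, outer_lr=0.01):
    # Outer loop: learn an initialization that adapts well across tasks.
    theta = (0.0, 0.0)
    for _ in range(meta_steps):
        sum_gw, sum_gc = 0.0, 0.0
        for _ in range(tasks_per_step):
            task = sample_task(rng)
            support = sample_data(task, rng, k_shot)
            query = sample_data(task, rng, k_shot)
            adapted = adapt(theta, support, inner_lr)
            # First-order approximation: query gradient at the adapted parameters.
            gw, gc = grad(adapted, query)
            sum_gw += gw
            sum_gc += gc
        theta = (theta[0] - outer_lr * sum_gw / tasks_per_step,
                 theta[1] - outer_lr * sum_gc / tasks_per_step)
    return theta

def evaluate(theta, rng, n_tasks=200, k_shot=5, inner_lr=0.1):
    # Average query loss on unseen tasks after one adaptation step from theta.
    total = 0.0
    for _ in range(n_tasks):
        task = sample_task(rng)
        support = sample_data(task, rng, k_shot)
        query = sample_data(task, rng, 50)
        total += loss(adapt(theta, support, inner_lr), query)
    return total / n_tasks

if __name__ == "__main__":
    meta_theta = meta_train(random.Random(0))
    print("meta-learned init:", meta_theta)
    print("loss after adapting from meta init:", evaluate(meta_theta, random.Random(1)))
    print("loss after adapting from zero init:", evaluate((0.0, 0.0), random.Random(1)))
```

In this toy setting, one adaptation step from the meta-learned initialization should yield a lower loss on unseen tasks than the same step from an untrained initialization, which is the effect that knowledge transfer aims for. A realistic DSP model would replace the linear model and the synthetic tasks with the actual models and workloads.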

Results

We evaluated a variety of approaches for predicting loads in DSP systems. The insights gained will be used to develop a fast-adapting, dynamic resource auto-scaling approach. Furthermore, we developed a Decision Support System for a tele-medical center, which helps classify patients into critical and non-critical cases and thus pre-filters them. A next step will be to enhance the model and make it rapidly adaptable to each new patient.

Publications

Evaluation of Load Prediction Techniques for Distributed Stream Processing. Kordian Gontarska, Morgan Geldenhuys, Dominik Scheinert, Philipp Wiesner, Andreas Polze and Lauritz Thamsen. Currently under review.
Predicting Medical Interventions from Vital Parameters: Towards a Decision Support System for Remote Patient Monitoring. Kordian Gontarska, Weronika Wrazen, Jossekin Beilharz, Robert Schmid, Lauritz Thamsen and Andreas Polze. In the Proceedings of the 2021 IEEE International Conference on Artificial Intelligence in Medicine (AIME’21). Springer. 2021. [arXiv preprint]
Effectively Testing System Configurations of Critical IoT Analytics Pipelines. Morgan K. Geldenhuys, Lauritz Thamsen, Kain Kordian Gontarska, Felix Lorenz, and Odej Kao. In the Proceedings of the 2019 IEEE International Conference on Big Data (IEEE BigData). Presented at the Second International Workshop on the Internet of Things Data Analytics (IoTDA). IEEE. 2019. [Google Scholar]

Contact

If you have any questions or are interested in collaborating with us on this topic, please get in touch with Kordian!

Acknowledgments

This work has been supported through grants by the German Federal Ministry for Economic Affairs and Energy (BMWi) as Telemed5000 (funding mark 01MD19014C) as part of the program “Smart Data”.