Distributed Stream Processing (DSP) systems are used in a wide variety of scenarios. We aim to help users apply state-of-the-art modeling approaches to resource management. Our goal is to let them adapt models to their particular use case as quickly as possible, with as little effort and data as feasible.
DSP systems are widely used for tasks such as market research, real-time monitoring, financial trading, social media analysis, live media streaming and many others. These use cases come with varying requirements and prerequisites: for example, the data load can change suddenly, fault tolerance may be important, or high latency may be acceptable. This variety makes it difficult to configure and maintain a system such that its main objectives (Service Level Agreements) are met while resource usage is kept as low as possible.
This difficulty leads to various issues such as low resource utilization, high power consumption, and inconsistent quality of service. Such systems are usually configured by trained experts, because automatic configuration is hard and slow due to the large parameter search space.
So far, approaches using machine learning have suffered from the cold-start problem, as a new model had to be trained from scratch for each new application. We intend to speed up the process of obtaining a working model by means of knowledge transfer.
The cold-start problem is common in the machine learning domain, because the amount of available training data needs to be sufficiently large to train a sensible model. Several approaches address this problem, such as Transfer Learning, Similarity Functions, Gradient-Based Meta-Learning, and Few-Shot Learning.
Consider the models that perform some task within a DSP system, such as a Reinforcement Learning (RL) model that configures the DSP system online or a model that predicts the right level of parallelization to autoscale a system. Assuming that these models are similar to each other in their inputs and outputs, we can treat them as samples from a distribution of models. We can then concentrate on this distribution and account for the shift between the individual models while developing them. Model-Agnostic Meta-Learning (MAML) consists of finding a set of similar tasks (e.g. RL models that configure systems) and training a model jointly on these tasks, such that the resulting model adapts to a new, unseen task with only a few data examples.
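The joint training idea behind Model-Agnostic Meta-Learning can be illustrated with a minimal sketch. It is not our actual implementation: the tasks are hypothetical toy regression problems (linear maps y = a*x + b with coefficients drawn around a shared mean) that stand in for a family of related DSP models, and all names and hyperparameters (sample_task, meta_train, alpha, beta, k) are assumptions made for illustration. Because the model is linear, the derivative through the single inner gradient step can be written in closed form, so no automatic differentiation library is needed.

```python
import numpy as np

def sample_task(rng):
    """Hypothetical task: y = a*x + b, with (a, b) drawn around a shared mean."""
    return rng.normal(2.0, 0.3), rng.normal(-1.0, 0.3)

def sample_data(task, n, rng):
    """Draw n noise-free examples from a task; X carries a bias column."""
    a, b = task
    x = rng.uniform(-1.0, 1.0, size=n)
    X = np.stack([x, np.ones(n)], axis=1)
    return X, a * x + b

def mse(theta, X, y):
    """Mean squared error of the linear model theta on (X, y)."""
    return float(np.mean((X @ theta - y) ** 2))

def grad(theta, X, y):
    """Gradient of the mean squared error with respect to theta."""
    return 2.0 / len(y) * X.T @ (X @ theta - y)

def adapt(theta, X, y, alpha):
    """Inner loop: one gradient step on a few support examples."""
    return theta - alpha * grad(theta, X, y)

def meta_train(steps=1000, tasks_per_step=8, k=5, alpha=0.1, beta=0.05, seed=0):
    """Outer loop: learn an initialization that adapts well after one step."""
    rng = np.random.default_rng(seed)
    theta = np.zeros(2)
    for _ in range(steps):
        meta_grad = np.zeros(2)
        for _ in range(tasks_per_step):
            task = sample_task(rng)
            Xs, ys = sample_data(task, k, rng)  # support set: used to adapt
            Xq, yq = sample_data(task, k, rng)  # query set: scores the adapted model
            theta_task = adapt(theta, Xs, ys, alpha)
            # Chain rule through the inner step. For a linear model with MSE,
            # d(theta_task)/d(theta) = I - alpha * H, where H is the (symmetric)
            # Hessian of the support loss.
            H = 2.0 / k * Xs.T @ Xs
            meta_grad += (np.eye(2) - alpha * H) @ grad(theta_task, Xq, yq)
        theta -= beta * meta_grad / tasks_per_step
    return theta
```

In this toy setting, the initialization returned by meta_train() should end up close to the mean of the task distribution, so a single call to adapt() with five examples from a new task should give a lower error than the same step taken from a zero initialization. For nonlinear models such as neural networks, the closed-form term would typically be replaced by automatic differentiation or by a first-order approximation.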
We evaluated a variety of approaches for predicting loads in DSP systems. The insights we gained will be used to develop a fast-adapting, dynamic resource auto-scaling approach. Furthermore, we developed a Decision Support System for a tele-medical center, which helps classify patients into critical and non-critical cases and thus pre-filters them. A next step will be to enhance the model and make it rapidly adaptable to each new patient.
If you have any questions or are interested in collaborating with us on this topic, please get in touch with Kordian!
This work has been supported through grants by the German Federal Ministry for Economic Affairs and Energy (BMWi) as Telemed5000 (funding mark 01MD19014C) as part of the program “Smart Data”.