AIOps Systems

Telecommunication service and network operators face rising expectations regarding availability, performance, and guaranteed quality of service (QoS). Modern IT infrastructures have become so complex that traditional IT administration procedures can no longer holistically ensure the dependability of these systems. In addition, the number of internet-connected devices and the volume of mobile traffic, and of internet traffic in general, are increasing rapidly. The result is highly distributed environments that not only add complexity through the sheer number of devices but also introduce new operational challenges and a paradoxical situation: an increasingly vulnerable infrastructure has a decisive impact on our everyday life, as it delivers crucial data for, e.g., autonomous driving, connected healthcare, and other critical processes.

We are developing frameworks that provide scalable systems for monitoring, hierarchical in-place data analytics, and predictive remediation workflows. Our aim is to increase the availability, resilience, and fault tolerance of highly distributed and possibly critical environments. To this end, we are researching how to apply AIOps methods in such environments while complying with the requirements they impose. Among other approaches, this includes gradually automating administrative processes, developing methods for profiling and scheduling machine learning workflows, improving the lifecycle from model training to deployment, and using decentralized peer-to-peer approaches to cope with the growing scale of Cloud, Edge, and Fog Computing environments.

In a nutshell, we are conducting research on the design, operation and maintenance of AI systems that combine machine learning workflows and sensing capabilities in order to automatically detect anomalous situations and act accordingly.

Ongoing Research

We currently work on:

Cloud Testbed for Failure Injection: in this project, we are building a large-scale testbed and collecting data for different failure types.