Advancing Log Anomaly Detection and Root Cause Analysis for Enhanced IT Reliability in AI Operations (AIOps)

This master’s thesis aims to improve the precision and effectiveness of log anomaly detection and root cause analysis in AIOps. Using machine learning and data analysis, the research seeks to strengthen IT reliability and operational efficiency by promptly identifying anomalies and uncovering their underlying causes in complex IT infrastructures.

Research Goal:

The principal goal of this research is to contribute to the field of AIOps by enhancing the capabilities of log anomaly detection and root cause analysis. The specific objectives are as follows:

  1. Design and implement novel algorithms and models for early and accurate log anomaly detection in IT systems.

  2. Create a robust methodology for automated root cause analysis, enabling the identification of the origin and impact of anomalies in AI-driven operations.

  3. Evaluate the proposed techniques on real-world datasets, demonstrating their efficacy, scalability, and applicability in AIOps environments.

Methodology:

  1. Data Acquisition: Gather extensive log data from diverse IT systems, encompassing various components such as servers, networks, and applications.

  2. Data Transformation: Preprocess and clean the data to eliminate noise, handle missing values, and convert logs into a suitable format for analysis.

  3. Anomaly Detection: Develop machine learning models and statistical approaches tailored to log anomaly detection, using techniques such as clustering, classification, and time-series analysis.

  4. Root Cause Analysis: Implement algorithms to perform automated root cause analysis on detected anomalies, tracing their origins and cascading effects within AI-driven operations.

  5. Evaluation: Assess the performance of the developed models and techniques through quantitative metrics such as precision, recall, and F1-score, complemented by case studies from real-world AIOps scenarios.
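
To make steps 2, 3, and 5 more concrete, the following is a minimal Python sketch, not the method to be developed in the thesis. It assumes a deliberately simple stand-in detector: log lines are reduced to templates by masking numbers and hex values, and a line is flagged as anomalous if its template was never seen in a set of normal logs. The function names, the masking rules, the `min_count` parameter, and the sample log lines are all hypothetical and chosen only for illustration. Root cause analysis (step 4) is not covered.

```python
import re
from collections import Counter


def to_template(line):
    """Step 2 (transformation): mask variable parts so similar messages share a template."""
    line = re.sub(r"0x[0-9a-fA-F]+", "<HEX>", line)  # mask hex values first
    line = re.sub(r"\d+", "<NUM>", line)             # then mask remaining digit runs
    return line.strip()


def fit_template_counts(normal_lines):
    """Count how often each template occurs in logs assumed to be normal."""
    return Counter(to_template(line) for line in normal_lines)


def is_anomalous(line, counts, min_count=1):
    """Step 3 (detection): flag a line whose template occurs fewer than min_count times."""
    return counts[to_template(line)] < min_count


def precision_recall_f1(y_true, y_pred):
    """Step 5 (evaluation): compute precision, recall, and F1 for boolean labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical sample data: logs assumed normal, plus labeled test lines.
normal_lines = [
    "Connection from 10.0.0.1 port 22",
    "Connection from 10.0.0.7 port 443",
    "Job 17 finished in 35 ms",
    "Job 18 finished in 41 ms",
]
labeled_test_lines = [
    ("Job 19 finished in 38 ms", False),
    ("Connection from 10.0.0.9 port 22", False),
    ("Kernel panic at 0xdeadbeef", True),
    ("Job 20 failed with exit code 1", True),
    ("Disk 3 latency warning", False),  # benign but unseen: a false positive here
]

counts = fit_template_counts(normal_lines)
y_true = [label for _, label in labeled_test_lines]
y_pred = [is_anomalous(line, counts) for line, _ in labeled_test_lines]
precision, recall, f1 = precision_recall_f1(y_true, y_pred)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

The benign but previously unseen "Disk" message is flagged as well, which illustrates why precision and recall are reported separately: a detector based on unseen templates can catch every true anomaly while still raising false alarms.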

Needed Skills:

To successfully undertake this master’s thesis, the following skills are essential:

  1. Data Wrangling: Proficiency in data preprocessing, feature engineering, and data transformation techniques to prepare logs for analysis.

  2. Machine Learning: Strong knowledge of machine learning algorithms, particularly for anomaly detection, and their application in AIOps environments.

  3. Programming: Proficiency in Python for algorithm implementation and data analysis.

  4. Statistical Analysis: Familiarity with statistical methods and tools for deriving insights from log data.

  5. AIOps Domain Knowledge: Understanding of AIOps concepts, AI-driven operations, and IT reliability principles.

  6. Data Visualization: Ability to create informative visualizations for effectively communicating findings.

  7. Research Skills: Strong research capabilities, including literature review, hypothesis formulation, and experimental design.

Start: Immediately

Contact: Thorsten Wittkopp (t.wittkopp ∂ tu-berlin.de)