Developers insert logging statements into source code to collect valuable runtime information about software systems. At runtime, these statements produce execution logs, which are usually appended to log files or written to the standard output. Logs play an important role in the daily tasks of developers and other software practitioners; they are often the only available resource for diagnosing field failures. Logs are textual data that record events at different levels of granularity, providing human-understandable clues about a failure and its type. For example, from several repetitions of the two consecutive log lines “l1: Interface changed state to up.” and “l2: Interface changed state to down.”, operators can detect a failure in the system, assign it the type “Interface Flapping”, and obtain clues about potential root causes (a bad cable connection is a common suspect in this example). A single log is composed of a static event template describing the event (e.g., “Interface changed state to <*>”) and parameters (e.g., up) carrying the variable event information. Other monitoring data, such as key performance indicator (KPI) metrics (e.g., memory utilization, I/O bytes), also provide clues for detecting failures. However, they do not give a verbose description of the failure type, which leaves failure identification incomplete. A sharp rise in the memory utilization curve, for instance, shows only that memory utilization increased; in isolation, it cannot tell why. This is a very active area of research to which we contribute extensively, as our publication list shows.
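The split of a log line into a static template and variable parameters, and the interface flapping example above, can be illustrated with a minimal Python sketch. It is not our production method: the helper names (`template_to_regex`, `extract_parameters`, `count_flaps`, `is_flapping`), the trailing period in the template, and the threshold of three up-to-down transitions are assumptions made for illustration only.

```python
import re

# Assumed template: literal text is static, <*> marks a parameter slot.
# The trailing period mirrors the example log lines above.
TEMPLATE = "Interface changed state to <*>."


def template_to_regex(template):
    """Turn a template with <*> slots into a compiled regular expression."""
    # Escape the static parts, then join them with a capture group per slot.
    parts = [re.escape(part) for part in template.split("<*>")]
    return re.compile("^" + "(.+?)".join(parts) + "$")


def extract_parameters(template, message):
    """Return the parameter values of a message, or None if it does not match."""
    match = template_to_regex(template).match(message)
    return list(match.groups()) if match else None


def count_flaps(states):
    """Count consecutive up -> down transitions in a sequence of states."""
    return sum(1 for prev, cur in zip(states, states[1:])
               if prev == "up" and cur == "down")


def is_flapping(messages, min_flaps=3):
    """Flag flapping when at least min_flaps up -> down transitions occur.

    min_flaps is a hypothetical threshold chosen for this sketch.
    """
    states = []
    for message in messages:
        params = extract_parameters(TEMPLATE, message)
        if params is not None:  # ignore lines that belong to other templates
            states.append(params[0])
    return count_flaps(states) >= min_flaps


if __name__ == "__main__":
    log = ["Interface changed state to up.",
           "Interface changed state to down."] * 3
    print(extract_parameters(TEMPLATE, log[0]))
    print(is_flapping(log))
```

A hand-written template is enough here because the event is known in advance; in practice, templates usually have to be recovered from raw logs by a log parser first.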
We currently work on:
Hardware Failure Prediction based on Event Logs, where we research new methods to anticipate upcoming failures in memory.