From Counters To Carbon: Learning Cross-Platform Power Models for Multi-Site Energy Profiling

Assignment:

As the amount of data available to researchers in fields ranging from bioinformatics to physics to remote sensing continues to grow, the importance of scientific workflow management systems has increased dramatically. These systems play a critical role in creating and executing scalable data analysis pipelines on high-performance compute infrastructures.

At the same time, growing concerns about carbon emissions and rising energy costs make energy efficiency a priority: improving it can reduce the need for expensive infrastructure overhauls or additional energy supply. In practice, ML workflows could be adapted, software optimized, or entire pipelines assigned to different machines based on energy availability and load patterns. Making these decisions, however, requires detailed insight into energy consumption.

In this thesis you will investigate how rich compute telemetry, such as performance counters, utilization signals and power sensors, enables data-driven power models that are both fine-grained and portable enough to compare machines across platforms. The core motivation is that reliable power estimation is a prerequisite for attributing energy to workloads in shared environments and for converting energy into carbon emissions using time-varying carbon-intensity signals of the electricity supply.
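To make the energy-to-carbon step concrete, here is a minimal sketch. It assumes evenly spaced power samples and a carbon-intensity series in gCO2e/kWh aligned to the same intervals. The function names and all numbers are hypothetical illustrations, not measurements or part of the assignment.

```python
JOULES_PER_KWH = 3.6e6  # 1 kWh = 3.6e6 J


def energy_kwh(power_watts, interval_s):
    """Integrate evenly spaced average-power samples (W) into energy (kWh)."""
    return sum(p * interval_s for p in power_watts) / JOULES_PER_KWH


def carbon_grams(power_watts, intensity_g_per_kwh, interval_s):
    """Weight each interval's energy by the carbon intensity of that interval."""
    if len(power_watts) != len(intensity_g_per_kwh):
        raise ValueError("power and intensity series must be aligned")
    total = 0.0
    for p, ci in zip(power_watts, intensity_g_per_kwh):
        total += (p * interval_s / JOULES_PER_KWH) * ci
    return total


# Hypothetical values: four 15-minute intervals for a single node.
power = [200.0, 250.0, 300.0, 250.0]      # average W per interval
intensity = [400.0, 380.0, 300.0, 320.0]  # gCO2e/kWh per interval

print(f"energy: {energy_kwh(power, 900):.3f} kWh")
print(f"carbon: {carbon_grams(power, intensity, 900):.2f} gCO2e")
```

Because the intensity varies over time, the same amount of energy can lead to different emissions depending on when and where it is consumed, which is why per-interval power estimates matter.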

Tasks:

During this thesis you will develop and evaluate machine learning models to predict power consumption based on telemetry data on heterogeneous infrastructures. You will analyze patterns in energy consumption across different workloads and platforms to identify opportunities for optimization and enable intelligent decision-making for resource management systems.
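As a rough illustration of what a telemetry-based power model can look like, the sketch below fits a simple linear baseline (power = idle + slope × utilization) with ordinary least squares and reports its mean absolute percentage error. This is only an assumed starting point, not the method to be developed in the thesis. The sample values and function names are hypothetical.

```python
def fit_linear_power_model(utilization, power_watts):
    """Ordinary least squares fit of power = idle + slope * utilization."""
    n = len(utilization)
    if n != len(power_watts) or n < 2:
        raise ValueError("need at least two aligned samples")
    mean_u = sum(utilization) / n
    mean_p = sum(power_watts) / n
    var_u = sum((u - mean_u) ** 2 for u in utilization)
    if var_u == 0:
        raise ValueError("utilization must vary to fit a slope")
    cov = sum((u - mean_u) * (p - mean_p)
              for u, p in zip(utilization, power_watts))
    slope = cov / var_u
    idle = mean_p - slope * mean_u
    return idle, slope


def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    return 100.0 * sum(abs(a - p) / a
                       for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical samples: CPU utilization (0..1) and measured node power (W).
util = [0.0, 0.25, 0.5, 0.75, 1.0]
watts = [100.0, 130.0, 155.0, 185.0, 210.0]

idle, slope = fit_linear_power_model(util, watts)
pred = [idle + slope * u for u in util]
print(f"idle={idle:.1f} W, slope={slope:.1f} W, MAPE={mape(watts, pred):.2f}%")
```

A single utilization feature rarely transfers well between machines. Richer inputs such as performance counters, and models that generalize across platforms, are where the actual work of the thesis lies.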

The quality of the developed methods should be evaluated on a compute cluster with real-world workflows and compared to existing state-of-the-art approaches.

Research Questions:

Requirements:

Knowledge of operating system concepts and machine learning techniques, as well as advanced programming skills (e.g. Go, Rust, Python), is required.

Start: Immediately

Contact: Niklas Fomin (fomin ∂ tu-berlin.de)

Note: For undergraduate students, the scope of the topic can be expanded or narrowed after consultation with the staff member.

Resources: