Most machine learning (ML) methods in use today train models centrally: all required training data is collected and processed in a common location, usually a highly optimized data center with very good energy efficiency. In many practical use cases, however, data cannot be collected centrally, for example due to security and privacy concerns. Federated learning (FL) is therefore increasingly used in these cases, as it enables distributed training of ML models without training data leaving the end devices or data silos. This comes at a cost: FL requires considerably more training rounds, and thus more computation, than centralized training, and it often runs on infrastructure that is far less energy efficient than modern GPU clusters. In practice, this often leads to a significant increase in energy consumption and associated CO2 emissions.
Carbon-aware computing is a new paradigm for reducing the carbon footprint of distributed systems. Its core idea is to better align the execution of computational workloads, and thus the energy consumption of IT infrastructure, with the availability of renewable energy. FL is a very promising use case for carbon-aware computing: FL training jobs offer some flexibility in both location and timing, so their execution can be adapted to variable solar and wind production.
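To illustrate the timing flexibility mentioned above, the following minimal sketch picks the start slot for a delay-tolerant job so that it runs in the window with the lowest average carbon intensity. It assumes a forecast is available as a plain list of values per time slot; the function name `best_start_slot` and the forecast numbers are hypothetical and chosen only for illustration, not taken from any of the referenced systems.

```python
from typing import Sequence


def best_start_slot(forecast: Sequence[float], duration: int) -> int:
    """Return the start index of the contiguous window of `duration` slots
    with the lowest total (and hence lowest mean) carbon intensity."""
    if duration <= 0 or duration > len(forecast):
        raise ValueError("duration must be between 1 and len(forecast)")
    window_sum = sum(forecast[:duration])
    best_sum, best_start = window_sum, 0
    for start in range(1, len(forecast) - duration + 1):
        # Slide the window by one slot: add the new slot, drop the oldest one.
        window_sum += forecast[start + duration - 1] - forecast[start - 1]
        if window_sum < best_sum:  # strict "<" keeps the earliest window on ties
            best_sum, best_start = window_sum, start
    return best_start


# Hypothetical hourly forecast in gCO2/kWh (made-up values, not real data).
forecast = [420, 390, 310, 180, 150, 170, 260, 380]
start = best_start_slot(forecast, duration=3)
print(f"Start the 3-hour job at slot {start}")
```

A real scheduler would also have to handle forecast uncertainty, deadlines, and the location dimension, which this sketch ignores.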
Potential research questions for theses lie at the intersection of distributed systems and ML:
How can carbon-awareness be integrated into existing FL strategies that primarily optimize for fast convergence in client selection, such as Oort? What is the impact of synchronous, semi-synchronous, and asynchronous approaches?
What is the impact of carbon-aware scheduling on the training process and how can negative effects be mitigated? For example, how does the seasonality of solar production and the accompanying periodic shift in data distributions affect training?
Can we make the developed approaches more tractable, resilient, and powerful using explainable artificial intelligence (XAI)?
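As a starting point for thinking about the first question, the sketch below shows one naive way to blend a client's training utility with the carbon intensity of its local power supply during client selection. This is not the selection algorithm of Oort or of any other referenced system: the `Client` fields, the linear scoring rule, and the parameters `alpha` and `max_intensity` are all assumptions made purely for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Client:
    name: str
    utility: float           # hypothetical statistical utility in [0, 1]
    carbon_intensity: float  # hypothetical grid carbon intensity in gCO2/kWh


def select_clients(clients: List[Client], k: int, alpha: float = 0.5,
                   max_intensity: float = 800.0) -> List[Client]:
    """Pick the k clients with the best trade-off between utility and carbon.

    alpha = 0 ignores carbon entirely; alpha = 1 ignores utility entirely.
    """
    def score(c: Client) -> float:
        # Normalize carbon intensity to [0, 1] so it is comparable to utility.
        penalty = min(c.carbon_intensity / max_intensity, 1.0)
        return (1 - alpha) * c.utility - alpha * penalty

    return sorted(clients, key=score, reverse=True)[:k]


# Made-up example clients, not real measurements.
clients = [
    Client("a", utility=0.9, carbon_intensity=700.0),
    Client("b", utility=0.6, carbon_intensity=100.0),
    Client("c", utility=0.8, carbon_intensity=400.0),
    Client("d", utility=0.3, carbon_intensity=50.0),
]
selected = [c.name for c in select_clients(clients, k=2)]
print(selected)
```

Such a fixed weighting biases selection toward clients in low-carbon regions, which is exactly the kind of effect on convergence and fairness that the questions above ask about.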
Work in this research area requires very good knowledge of machine learning and solid knowledge of distributed systems. In addition, please read and follow the information on how to conduct theses at our department.
Contact: Philipp Wiesner (wiesner ∂ tu-berlin.de)
H. B. McMahan, E. Moore, D. Ramage, S. Hampson, and B. A. y Arcas, “Communication-efficient learning of deep networks from decentralized data,” in AISTATS, 2017.
X. Qiu, T. Parcollet, J. Fernandez-Marques, P. P. B. de Gusmao, D. J. Beutel, T. Topal, A. Mathur, and N. D. Lane, “A first look into the carbon footprint of federated learning,” arXiv:2102.07627, 2021.
P. Wiesner, I. Behnke, D. Scheinert, K. Gontarska, and L. Thamsen, “Let’s wait awhile: How temporal workload shifting can reduce carbon emissions in the cloud,” in ACM/IFIP Middleware, 2021.
P. Wiesner, D. Scheinert, T. Wittkopp, L. Thamsen, and O. Kao, “Cucumber: Renewable-aware admission control for delay-tolerant cloud and edge workloads,” in Euro-Par, 2022.
P. Wiesner, R. Khalili, D. Grinwald, P. Agrawal, L. Thamsen, and O. Kao, “FedZero: Leveraging renewable excess energy in federated learning,” arXiv:2305.15092, 2023.
F. Lai, X. Zhu, H. V. Madhyastha, and M. Chowdhury, “Oort: Efficient federated learning via guided participant selection,” in USENIX OSDI, 2021.
Y. Jee Cho, J. Wang, and G. Joshi, “Towards understanding biased client selection in federated learning,” in AISTATS, 2022.
C. Zhu, Z. Xu, M. Chen, J. Konečný, A. Hard, and T. Goldstein, “Diurnal or nocturnal? Federated learning of multi-branch networks from periodically shifting distributions,” in ICLR, 2022.