Scheduling Distributed Stream Processing Tasks in Fog Computing Environments

Efficient distributed stream processing in fog computing environments needs to take data, task, and infrastructure characteristics into account.

Motivation

Low-latency processing of data streams from distributed sensors is becoming increasingly important for a growing number of IoT applications. In these environments, sensor data collected at the edge of the network is typically transmitted over several hops: from devices to intermediate edge and fog resources to clusters of cloud resources. Scheduling stream processing tasks onto all of these resources can significantly reduce application latencies and network congestion, but only if schedulers take the heterogeneity of processing resources, network topologies, and processing tasks into consideration.

Approach

Given the increasingly distributed and heterogeneous computing environments of the Internet of Things, the goal is to schedule distributed stream processing tasks effectively based on the capacities of nodes as well as network properties such as latencies and bandwidths. We therefore developed a scheduler for Apache Flink that incorporates monitoring data on the distributed infrastructure and on stream processing tasks to optimize task placements for low-latency stream processing. In addition, continuously monitoring and re-evaluating our objective function allows the scheduler to dynamically adjust placements in response to changes in workloads and computing environments. Taking the current ingestion rate, the size of task state, and snapshots into account makes it possible to migrate to better placements with relatively low overhead.
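To make the idea of latency-aware placement more concrete, the following is a minimal sketch in Python. It is not the scheduler described in the publication and does not use any Apache Flink API. The objective (sum of link latencies along the dataflow), the slot-based capacity model, the exhaustive search, and all names (Node, placement_cost, best_placement, should_migrate, max_pause_s) are assumptions made purely for illustration.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Node:
    name: str
    slots: int  # assumed capacity model: number of tasks the node can host


def placement_cost(placement, edges, latency_ms):
    """Assumed objective: total link latency over all edges of the dataflow."""
    return sum(latency_ms[placement[src]][placement[dst]] for src, dst in edges)


def best_placement(tasks, edges, nodes, latency_ms, pinned=None):
    """Exhaustively search placements that respect slot capacities.

    `pinned` maps tasks that cannot move (e.g. sources next to sensors)
    to node names. Only practical for very small dataflows.
    """
    pinned = pinned or {}
    free = [t for t in tasks if t not in pinned]
    best, best_cost = None, float("inf")
    for choice in product(nodes, repeat=len(free)):
        placement = dict(pinned)
        placement.update({t: n.name for t, n in zip(free, choice)})
        # Count tasks per node and skip placements that exceed capacity.
        used = {}
        for node_name in placement.values():
            used[node_name] = used.get(node_name, 0) + 1
        if any(used.get(n.name, 0) > n.slots for n in nodes):
            continue
        cost = placement_cost(placement, edges, latency_ms)
        if cost < best_cost:
            best, best_cost = placement, cost
    return best, best_cost


def should_migrate(current_cost, new_cost, state_mb, bandwidth_mbps, max_pause_s):
    """Hypothetical rule: migrate only if the new placement is cheaper and
    the task state can be transferred within an acceptable pause."""
    transfer_s = state_mb * 8 / bandwidth_mbps  # megabytes -> megabits
    return new_cost < current_cost and transfer_s <= max_pause_s


# Illustrative three-tier topology with made-up latencies in milliseconds.
nodes = [Node("edge", 1), Node("fog", 1), Node("cloud", 2)]
latency_ms = {
    "edge": {"edge": 0, "fog": 5, "cloud": 50},
    "fog": {"edge": 5, "fog": 0, "cloud": 40},
    "cloud": {"edge": 50, "fog": 40, "cloud": 0},
}
tasks = ["source", "filter", "sink"]
edges = [("source", "filter"), ("filter", "sink")]
placement, cost = best_placement(
    tasks, edges, nodes, latency_ms, pinned={"source": "edge", "sink": "cloud"}
)
print(placement, cost)
```

In this toy setup the edge node has a single slot that is taken by the pinned source, so the filter task ends up on the fog node, which gives a lower total link latency than placing it in the cloud. A realistic objective would also need to account for bandwidth, ingestion rates, and processing capacities, and would replace the exhaustive search with a heuristic or solver.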

Publications

Scheduling Stream Processing Tasks on Geo-Distributed Heterogeneous Resources. Gerrit Janßen, Ilya Verbitskiy, Thomas Renner, and Lauritz Thamsen. In the Proceedings of the 2018 IEEE International Conference on Big Data (IEEE BigData). Presented at the First International Workshop on the Internet of Things Data Analytics (IoTDA). IEEE. 2018.

Contact

If you have any questions or are interested in collaborating with us on this topic, please get in touch with Lauritz!

Acknowledgments

This work has been supported through grants by the German Ministry for Education and Research as Berlin Big Data Center BBDC (BMBF grant 01IS14013A) and Berlin Institute for the Foundations of Learning and Data BIFOLD (BMBF grant 01IS18025A).