In the era of Big Data, streaming applications such as social media, surveillance monitoring and real-time search generate large volumes of data, making efficient Data Stream Processing Systems (DSPSs) essential. An application processed by a DSPS is represented as a directed acyclic graph (DAG), where each vertex represents a task and each edge shows the data flow between two tasks. Task allocation is the assignment of the vertices of the DAG to physical compute nodes such that data movement between the nodes is minimised. Task placement has a significant impact on performance metrics such as data processing latency and system throughput. Finding an optimal task placement for stream processing systems is NP-hard, so approximate scheduling approaches are required to improve the performance of DSPSs.
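To make the task allocation objective concrete, the following Python sketch scores a placement by the total data rate crossing between compute nodes, and compares a naive round-robin placement with a simple greedy one. It is an illustration only and is not one of the schedulers presented in the talk. The task names, edge rates, per-node capacity and the functions `inter_node_traffic` and `greedy_placement` are all assumptions made for this example.

```python
from typing import Dict, List, Tuple

Edge = Tuple[str, str]


def inter_node_traffic(edges: Dict[Edge, float], placement: Dict[str, int]) -> float:
    """Sum the rate of every edge whose two tasks sit on different nodes."""
    return sum(rate for (u, v), rate in edges.items() if placement[u] != placement[v])


def greedy_placement(edges: Dict[Edge, float], tasks: List[str],
                     num_nodes: int, capacity: int) -> Dict[str, int]:
    """Illustrative heuristic: visit edges heaviest first and co-locate their
    endpoints while the node has room. Assumes num_nodes * capacity >= len(tasks)."""
    placement: Dict[str, int] = {}
    load = [0] * num_nodes  # number of tasks currently on each node

    def put(task: str, node: int) -> None:
        placement[task] = node
        load[node] += 1

    def emptiest() -> int:
        return min(range(num_nodes), key=lambda n: load[n])

    for (u, v), _rate in sorted(edges.items(), key=lambda kv: -kv[1]):
        for task, peer in ((u, v), (v, u)):
            if task in placement:
                continue
            node = placement.get(peer)
            # Fall back to the least loaded node if the peer is unplaced or its node is full.
            if node is None or load[node] >= capacity:
                node = emptiest()
            put(task, node)

    # Place any task that has no edges at all.
    for task in tasks:
        if task not in placement:
            put(task, emptiest())
    return placement


# Hypothetical five-task pipeline; rates are made-up tuples per second.
tasks = ["source", "parse", "filter", "count", "sink"]
edges = {
    ("source", "parse"): 100.0,
    ("parse", "filter"): 80.0,
    ("filter", "count"): 20.0,
    ("count", "sink"): 5.0,
}

round_robin = {task: i % 2 for i, task in enumerate(tasks)}
greedy = greedy_placement(edges, tasks, num_nodes=2, capacity=3)

print(inter_node_traffic(edges, round_robin))  # 205.0: every edge crosses nodes
print(inter_node_traffic(edges, greedy))       # 20.0: only filter -> count crosses
```

In this toy pipeline the round-robin placement separates every pair of communicating tasks, while the greedy placement keeps the three most heavily communicating tasks on one node. The greedy rule carries no optimality guarantee, which is consistent with the problem being NP-hard.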
In this talk, I will present our three proposed schedulers, each using a different heuristic partitioning approach to minimise inter-node communication on either homogeneous or heterogeneous clusters. I will demonstrate how each scheduler can efficiently assign groups of highly communicating tasks to compute nodes for real-world workloads.
Last modified: Thursday, 10-May-2018 10:18:39 NZST