Efficient distributed load balancing for parallel algorithms
Abstract
With the advent of massive parallel processing technology, exploiting the power
offered by hundreds, or even thousands of processors is all but a trivial task.
Computing by using multi-processor, multi-core or many-core adds a number of
additional challenges related to the cooperation and communication of multiple
processing units.
The uneven distribution of data among the various processors, i.e. the load
imbalance, represents one of the major problems in data parallel applications.
Without good load distribution strategies, we cannot reach good speedup, thus
good efficiency.
Load balancing strategies can be classified in several ways, according to the
methods used to balance workload. For instance, dynamic load balancing algorithms
make scheduling decisions during the execution and commonly results
in better performance compared to static approaches, where task assignment is
done before the execution.
Even more important is the difference between centralized and distributed
load balancing approaches. In fact, despite that centralized algorithms have
a wider vision of the computation, hence may exploit smarter balancing techniques,
they expose global synchronization and communication bottlenecks involving
the master node. This definitely does not assure scalability with the
number of processors.
This dissertation studies the impact of different load balancing strategies.
In particular, one of the key observations driving our work is that distributed
algorithms work better than centralized ones in the context of load balancing
for multi-processors (alike for multi-cores and many-cores as well).
We first show a centralized approach for load balancing, then we propose several
distributed approaches for problems having different parallelization, workload
distribution and communication pattern. We try to efficiently combine several
approaches to improve performance, in particular using predictive metrics
to obtain a per task compute-time estimation, using adaptive subdivision, improving
dynamic load balancing and addressing distributed balancing schemas.
The main challenge tackled on this thesis has been to combine all these approaches
together in new and efficient load balancing schemas.
We assess the proposed balancing techniques, starting from centralized approaches
to distributed ones, in distinctive real case scenarios: Mesh-like computation,
Parallel Ray Tracing, and Agent-based Simulations. Moreover, we
test our algorithms with parallel hardware such has cluster of workstations,
multi-core processors and exploiting SIMD vectorial instruction set.
Finally, we conclude the thesis with several remarks, about the impact of
distributed techniques, the effect of the communication pattern and workload
distribution, the use of cost estimation for adaptive partitioning, the trade-off
fast versus accuracy in prediction-based approaches, the effectiveness of work
stealing combined with sorting, and a non-trivial way to exploit hybrid CPUGPU
computations. [edited by author]