Exhaustion Dominated Performance: Methodology, Tools and Empirical Experiments
Licentiate thesis, 2008
The sensitivity of applications to insufficient resources in High Performance
Computing (HPC) clusters has been a longstanding concern for industry. In this
thesis we propose a black-box method for characterizing and analyzing engineering
simulation applications with respect to available hardware resources under resource
depletion. The underlying hypothesis is that one bottleneck dominates at any given
moment during execution. The method suggests that an engineering simulation
application's behavior under exhaustion can be explained as an approximately linear
dependency of execution time for the problem at hand.
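As an illustration of the linear-dependency claim, the following minimal sketch fits a straight line to execution times measured at several depletion levels and reports how well the line explains them. It assumes that the independent variable is the depleted fraction of a single resource, which the abstract does not state explicitly. The function names and the sample values in the usage note are hypothetical and are not taken from the thesis.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two (x, y) pairs of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def r_squared(xs, ys, slope, intercept):
    """Coefficient of determination of the fitted line (1.0 = perfect fit)."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 if ss_tot == 0 else 1.0 - ss_res / ss_tot
```

A call such as `fit_line([0.0, 0.2, 0.4, 0.6], times)`, where `times` holds the measured execution times in seconds (hypothetical input), returns the slope and intercept. An `r_squared` value close to 1 would be consistent with a single dominating bottleneck over that range.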
Verification of this hypothesis required accurate, non-intrusive measurement as well
as precise methods and tools for depleting hardware resources individually. These
efforts resulted in methods for depleting the available bandwidth, the available RAM
and the processor capacity of compute nodes, together with practical guidelines for
recognizing bottlenecks.
The bandwidth depletion method succeeded in decreasing the available bandwidth by
generating stable and robust network traffic, with only 0.5% deviation from the
desired target and a maximum variance of approximately 1%.
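The abstract does not describe how the traffic generator is implemented. The sketch below shows only the arithmetic that a constant-rate generator would need, under the assumption that bandwidth is depleted by sending fixed-size packets at regular intervals. It computes the inter-packet interval that consumes a chosen fraction of the link, and the relative deviation of a measured rate from the target. All names and parameters are hypothetical.

```python
def packet_interval(link_bps, fraction, packet_bytes):
    """Seconds between packets needed to occupy `fraction` of a link.

    link_bps     -- nominal link capacity in bits per second
    fraction     -- share of the link to consume, 0 < fraction <= 1
    packet_bytes -- size of each packet in bytes (assumed constant)
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    if link_bps <= 0 or packet_bytes <= 0:
        raise ValueError("link_bps and packet_bytes must be positive")
    target_bps = link_bps * fraction
    return (packet_bytes * 8) / target_bps


def relative_deviation(measured, target):
    """Absolute deviation of a measured rate from the target, in percent."""
    if target == 0:
        raise ValueError("target must be non-zero")
    return abs(measured - target) / abs(target) * 100.0
```

Under these assumptions, occupying half of a 1 Gbit/s link with 1250-byte packets requires one packet every 20 microseconds. A deviation check of this kind is what a figure such as the reported 0.5% would correspond to.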
Furthermore, the method for depleting processor capacity successfully generated
artificial CPU load and occupied as much as 90% of the processor capacity.
Verification of this tool confirmed its accuracy, with a median deviation of 0.11%.
The error across attempts at a target load ranged from 0.00% at minimum to 1.04% at
maximum.
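One common way to generate an artificial CPU load is a duty cycle that alternates busy-waiting and sleeping within a fixed period. The sketch below assumes that technique and is not the tool developed in the thesis. It also shows how the median, minimum and maximum deviation from a target load could be summarized. The names `duty_cycle`, `generate_load` and `deviation_summary` are hypothetical.

```python
import statistics
import time


def duty_cycle(target_load, period_s):
    """Split one period into (busy, idle) seconds for a load in [0, 1]."""
    if not 0.0 <= target_load <= 1.0:
        raise ValueError("target_load must be in [0, 1]")
    if period_s <= 0:
        raise ValueError("period_s must be positive")
    busy = target_load * period_s
    return busy, period_s - busy


def generate_load(target_load, duration_s, period_s=0.1):
    """Occupy roughly `target_load` of one core for `duration_s` seconds."""
    busy, idle = duty_cycle(target_load, period_s)
    end = time.perf_counter() + duration_s
    while time.perf_counter() < end:
        busy_until = time.perf_counter() + busy
        while time.perf_counter() < busy_until:
            pass  # busy-wait: burns CPU time on purpose
        if idle > 0:
            time.sleep(idle)  # yield the core for the rest of the period


def deviation_summary(targets, measured):
    """Median, minimum and maximum absolute deviation (same unit as input)."""
    if not targets or len(targets) != len(measured):
        raise ValueError("need equally long, non-empty sequences")
    devs = [abs(m - t) for t, m in zip(targets, measured)]
    return statistics.median(devs), min(devs), max(devs)
```

Calling `generate_load(0.9, 10.0)` would aim at 90% load on one core for ten seconds. The load actually achieved depends on the scheduler and timer resolution, so it should be measured independently and compared with the target using `deviation_summary`.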
Our experiments with HPL and Fluent, conducted while exhausting the RAM and the
available bandwidth of the computer cluster, confirmed that our proposed method for
analysis and recognition of bottlenecks is fully applicable and a viable option for
characterization in this domain.
Keywords: engineering simulation applications, measurement methodology, computer cluster, verification methods, resource exhaustion