Finding performance bugs with the TNO-benchmark suite.
 
Will J. A. Denissen, TNO-TPD, Delft, the Netherlands (den-wja@tpd.tno.nl),
Henk J. Sips, Delft University of Technology, Delft, the Netherlands (sips@cs.tudelft.nl)

HPF was designed to provide high performance on different parallel machines. In other words, performance should be portable between machines. Now that HPF has been out for a while, several HPF compilers are available for many parallel machines. But how does the generated code perform for each of these compilers? May an HPF programmer expect comparable performance between different HPF compilers, given the same program and the same parallel machine? An HPF compiler that does not achieve the best performance for a given example program is said to have a performance bug.

To get an idea of the performance portability between compilers, we have designed a special benchmark suite, called the TNO-benchmark suite. It consists of a set of HPF programs that test various aspects of efficient parallel code generation. The suite is built from a number of template programs that can generate test programs with different array sizes, alignments, distributions, and iteration spaces. The tests range from very simple assignments, called basic assignments, which ease the identification of the origin of possible performance bugs, to more complex assignments involving triangular iteration spaces, convex iteration spaces, coupled subscripts, indirection arrays, partial replication, and nested distributions.
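As an illustration of the template idea, the following Python sketch expands one template into several HPF test programs. It is not the suite's actual generator: the template text, the names `TEMPLATE` and `generate_tests`, and the chosen sizes and distributions are all assumptions made for this example.

```python
from itertools import product

# Hypothetical template for a "basic assignment" test:
# the initialization of a one-dimensional distributed array.
TEMPLATE = """\
PROGRAM init_test
  INTEGER, PARAMETER :: n = {size}
  REAL :: a(n)
!HPF$ PROCESSORS p({procs})
!HPF$ DISTRIBUTE a({dist}) ONTO p
  a = 0.0
END PROGRAM init_test
"""


def generate_tests(sizes, dists, procs=4):
    """Return a dict mapping a test name to generated HPF source text.

    One test program is produced for every (size, distribution) pair.
    """
    tests = {}
    for size, dist in product(sizes, dists):
        # Build a file-system friendly tag, e.g. "CYCLIC(4)" -> "cyclic4".
        tag = dist.lower().replace("(", "").replace(")", "")
        name = "init_n{}_{}".format(size, tag)
        tests[name] = TEMPLATE.format(size=size, dist=dist, procs=procs)
    return tests


# Two array sizes times three distributions gives six test programs.
tests = generate_tests([1000, 100000], ["BLOCK", "CYCLIC", "CYCLIC(4)"])
```

Each generated program differs only in the parameters under test, so a performance difference between two of them can be attributed to the array size or the distribution that changed.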

We have run the benchmark suite on three compilers: our prototype compiler [1] (which uses the efficient enumeration techniques described in [2]), the PGI HPF compiler, and the GMD Adaptor HPF compiler, with surprising results. For example, the simplest test, the initialization of an array, showed performance differences relative to our prototype compiler of a factor of 2 to 40 for the PGI HPF compiler and a factor of 30 to 400 for the GMD Adaptor HPF compiler. Similar results will be presented for other test programs.

Closer inspection reveals that most performance bugs originate in sub-optimal or insufficiently flexible enumeration and storage of distributed array elements.
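To make the notion of enumeration concrete, the Python sketch below contrasts two ways in which one processor can enumerate the elements it owns under a CYCLIC(k) distribution. It is a simplified illustration, not the technique of [2] and not the code generated by any of the tested compilers; the function names and the 0-based indexing are assumptions of this example.

```python
def owner(i, k, p):
    """Processor owning global index i (0-based) under CYCLIC(k) on p processors."""
    return (i // k) % p


def local_indices_scan(n, k, p, me):
    """Naive enumeration: test all n global indices for ownership."""
    return [i for i in range(n) if owner(i, k, p) == me]


def local_indices_direct(n, k, p, me):
    """Direct enumeration: visit only the blocks owned by processor `me`.

    The blocks of processor `me` start at me*k and repeat every p*k elements;
    the last block may be cut short by the array bound n.
    """
    result = []
    for start in range(me * k, n, p * k):
        result.extend(range(start, min(start + k, n)))
    return result
```

Both functions return the same index set, but the scanning version does work proportional to the whole array on every processor, while the direct version does work proportional to the locally owned part only.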

We believe the TNO-benchmark suite will help compiler builders to identify performance bugs, resulting in more portable performance between compilers. The suite will also help HPF users to profile the strengths and weaknesses of different compilers. HPF users will thus be able to select a compiler that suits their problem, or to modify their programs to exploit a compiler's strengths as much as possible.

  1. W.J.A. Denissen, "Design of an HPF Compiler: A compilation framework for a data-parallel language", Ph.D. Thesis, Delft University of Technology, 2000, ISBN 90-6464-197-8.
  2. C. van Reeuwijk, W.J.A. Denissen, H.J. Sips, E.M.R.M. Paalvast, "An Implementation Framework for HPF Distributed Arrays on Message-Passing Parallel Computer Systems", IEEE Transactions on Parallel and Distributed Systems, Vol. 7, No. 9, pp. 897-914, September 1996.