Implementation and Evaluation of an HPF Compiler for Vector Parallel Machines, HPF/SX V2

Hitoshi Murai, Yasuharu Hayashi and Kenji Suehiro
1st Computer Software Division, NEC Solutions
murai@hpc.bs1.fc.nec.co.jp

Takuya Araki
HPC Technology Group, NEC Laboratories
takuya@ccm.cl.nec.co.jp

Abstract

We are developing an HPF compiler for vector parallel machines called HPF/SX V2. In addition to the features of HPF 2.0 and HPF/JA, it provides some unique extensions. This presentation focuses on four of these features: 1) the ON directive of HPF 2.0, 2) the REFLECT and LOCAL directives of HPF/JA, 3) vectorization directives, and 4) automatic parallelization.

The ON directive, one of the HPF 2.0 approved extensions, is used to partition computations among processors. HPF/SX V2 supports a subset of it. When the compiler cannot determine the best computation partitioning, programmers can specify one by writing ON directives.
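To make "computation partitioning" concrete, the Python sketch below (a toy model, not HPF/SX V2 code) assumes a one-dimensional array of n elements distributed BLOCK-wise over p processors and assigns each loop iteration to the processor that owns a chosen element, which is the effect an ON HOME directive specifies. The helper names block_size, block_owner and partition_iterations are hypothetical, and indices are 0-based.

```python
# Hypothetical sketch: BLOCK distribution plus owner-based iteration
# partitioning in the spirit of an ON HOME(A(...)) directive. 0-based indices.

def block_size(n, p):
    """Block length for a BLOCK distribution of n elements over p processors."""
    return -(-n // p)  # ceiling(n / p)

def block_owner(j, n, p):
    """Processor (0-based) that owns element j under BLOCK distribution."""
    return j // block_size(n, p)

def partition_iterations(n, p, home):
    """Assign each iteration i of a loop over range(n) to the owner of
    element home(i), i.e. execute iteration i ON HOME(A(home(i)))."""
    work = [[] for _ in range(p)]
    for i in range(n):
        work[block_owner(home(i), n, p)].append(i)
    return work

# Default owner-computes on A(i) versus an explicit ON HOME(A(i+1)) with wraparound.
print(partition_iterations(8, 4, lambda i: i))            # [[0, 1], [2, 3], [4, 5], [6, 7]]
print(partition_iterations(8, 4, lambda i: (i + 1) % 8))  # [[0, 7], [1, 2], [3, 4], [5, 6]]
```

Changing only the `home` mapping moves iterations between processors without touching the data layout, which is the kind of choice the ON directive lets programmers make when the compiler's own choice is poor.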

The REFLECT and LOCAL directives are defined in the HPF/JA language specification. The REFLECT directive updates the shadow area of each processor, and the LOCAL directive is an assertion to the compiler that the remote data each processor accesses is available in its shadow area. Together they allow HPF programmers to specify shadow-area accesses directly and thereby reduce communication overhead.
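The following Python sketch is a simplified model of the shadow-area idea, not of the HPF/JA semantics or the HPF/SX V2 implementation. Each of p processors holds one block of a 1-D array plus one shadow cell on each side; a hypothetical `reflect` function refreshes the shadows from the neighbors' boundary elements, after which a three-point stencil can be computed from local data alone, which is what a LOCAL assertion lets the compiler rely on. The sketch assumes the array length is divisible by p.

```python
# Simplified model of shadow areas: shadow width 1, 1-D BLOCK distribution.

def distribute_with_shadow(a, p):
    """Split list a into p equal blocks, each padded with one shadow cell
    (None) on the left and one on the right."""
    n = len(a)
    assert n % p == 0, "sketch assumes len(a) is divisible by p"
    b = n // p
    return [[None] + list(a[k * b:(k + 1) * b]) + [None] for k in range(p)]

def reflect(blocks):
    """Model of REFLECT: copy the neighbours' boundary elements into each
    block's shadow cells (only owned cells are read, so order is irrelevant).
    Shadows at the two global ends are left untouched."""
    for k in range(len(blocks)):
        if k > 0:
            blocks[k][0] = blocks[k - 1][-2]   # last owned element of left neighbour
        if k < len(blocks) - 1:
            blocks[k][-1] = blocks[k + 1][1]   # first owned element of right neighbour

def local_smooth(blocks):
    """Three-point average computed by each block from local data only
    (owned + shadow cells). The two global end points are copied unchanged."""
    p = len(blocks)
    out = []
    for k, blk in enumerate(blocks):
        last = len(blk) - 2                    # index of the last owned cell
        res = []
        for j in range(1, last + 1):
            if (k == 0 and j == 1) or (k == p - 1 and j == last):
                res.append(blk[j])
            else:
                res.append((blk[j - 1] + blk[j] + blk[j + 1]) / 3)
        out.append(res)
    return out

def serial_smooth(a):
    """Reference result on the whole, undistributed array (assumes len(a) >= 2)."""
    return [a[0]] + [(a[i - 1] + a[i] + a[i + 1]) / 3
                     for i in range(1, len(a) - 1)] + [a[-1]]

a = [float(i * i) for i in range(8)]
blocks = distribute_with_shadow(a, 4)
reflect(blocks)
parallel = [x for blk in local_smooth(blocks) for x in blk]
assert parallel == serial_smooth(a)
```

Calling `local_smooth` without the preceding `reflect` fails in this model because the shadow cells still hold `None`; in a real program they could instead hold stale values, which is why the point at which the shadow area is updated matters.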

Increasing the vectorization ratio is essential to achieving good performance on vector machines. However, the transformations a compiler applies for parallelization may prevent vectorization and thereby degrade performance. HPF/SX V2 accepts vectorization directives that programmers insert into their HPF source code and uses them to generate more efficiently vectorized code.

When programmers enable automatic parallelization, the compiler tests data dependences among the arrays accessed in each loop, detects NEW variables, and parallelizes the loop where possible. This feature relieves programmers of the burden of inserting INDEPENDENT and NEW directives into their source code.
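As an illustration of the two analyses mentioned above, the Python sketch below implements a textbook dependence test for a loop that writes X(a*i + b) and reads X(c*i + d) (an exact distance check when a = c, otherwise the classic GCD test) and a simple NEW-variable check. These are standard techniques chosen for illustration; the abstract does not say which tests HPF/SX V2 uses, and the function names are hypothetical. Loop bounds are ignored, so the answers are conservative.

```python
from math import gcd

# Hypothetical sketch of two analyses an auto-parallelizer performs.
# Not the HPF/SX V2 algorithm; loop bounds are ignored, so results are conservative.

def may_carry_dependence(a, b, c, d):
    """Loop writes X(a*i + b) and reads X(c*i + d). Return True if a
    dependence between *different* iterations may exist, False if it is
    ruled out (ignoring loop bounds)."""
    if a == c:
        if a == 0:
            return b == d                    # same element touched in every iteration
        diff = d - b
        return diff != 0 and diff % a == 0   # distance (d - b) / a must be a nonzero integer
    # Different coefficients: GCD test, a*i1 - c*i2 = d - b needs gcd(a, c) | (d - b).
    return (d - b) % gcd(a, c) == 0

def new_candidates(body):
    """body: (op, name) scalar accesses of one iteration in execution order,
    op is 'r' or 'w'. A scalar whose first access is a write can be made NEW
    (assumes straight-line code and no use of the value after the loop)."""
    first = {}
    for op, name in body:
        first.setdefault(name, op)
    return sorted(name for name, op in first.items() if op == 'w')

# DO i: X(i) = X(i) + 1     -> no loop-carried dependence
print(may_carry_dependence(1, 0, 1, 0))    # False
# DO i: X(i) = X(i-1) + 1   -> recurrence, must stay sequential
print(may_carry_dependence(1, 0, 1, -1))   # True
# DO i: X(2*i) = X(2*i+1)   -> even elements written, odd elements read
print(may_carry_dependence(2, 0, 2, 1))    # False
# t = A(i) * 2; B(i) = t + s  -> t is a NEW candidate, s is only read
print(new_candidates([('w', 't'), ('r', 't'), ('r', 's')]))  # ['t']
```

A loop qualifies for parallel execution in this model only when no array pair may carry a dependence and every scalar written in the body is a NEW candidate (or handled separately, e.g. as a reduction).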

We compile several benchmark programs, such as SOR, PDE1 and a shallow water model, with the current version of HPF/SX V2 and evaluate their performance on the NEC SX-5. The preliminary results show speedups of 5 to 8 times in 8-CPU parallel execution, which indicates that the four features are very useful for vector parallel machines.
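As a reading aid, if the reported speedup is taken to be the ratio of execution times on 1 and 8 CPUs, T_1 and T_8 (the choice of 1-CPU baseline is an assumption here), the corresponding parallel efficiency E_8 follows directly:

```latex
S_8 = \frac{T_1}{T_8}, \qquad E_8 = \frac{S_8}{8}, \qquad
5 \le S_8 \le 8 \;\Longrightarrow\; 0.625 \le E_8 \le 1 .
```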