Integration and Performance Issues with the Implementation of HPF Intrinsic Functions

Matthijs van Waveren, Peter Harrison, and Cliff Addison
waveren@fecit.co.uk
Fujitsu European Centre for Information Technology Ltd.
2 Longwalk Road, Stockley Park
Uxbridge, Middlesex UB11 1AB, UK


Dave Orange and Norman Brown
NA Software Ltd.
Roscoe House, 62 Roscoe Street
Liverpool Merseyside L1 9DW, UK


Hidetoshi Iwashita
Compiler Technology Department
Middleware Division, Software Group
Fujitsu Ltd.
140 Miyamoto
Numazu-shi Shizuoka 410-0396, Japan

 

Implementing the HPF Library and the HPF versions of the Fortran 95 array transformational functions is challenging because every data type, data kind, array rank and input distribution must be supported. For instance, supporting COPY_SCATTER over the full range of data types, data kinds and array ranks would require more than 2 billion separate specific functions. To cope with this astronomical number of specific functions, we have developed a library generator that produces them from templates.
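To see why building one routine per combination in advance is impractical, the following sketch counts specific routines under assumed option counts. The parameters `type_kinds`, `dist_formats` and `n_array_args`, and the simplification that all array arguments share one rank, are illustrative assumptions. They are not the inputs behind the COPY_SCATTER figure. Only the maximum rank of 7 is taken from Fortran 95.

```python
def count_specializations(type_kinds: int, max_rank: int,
                          dist_formats: int, n_array_args: int) -> int:
    """Count specific routines needed if (by assumption) all array arguments
    share one rank, and each dimension of each argument may independently
    take any of `dist_formats` distribution formats."""
    return type_kinds * sum(
        dist_formats ** (rank * n_array_args)  # per-dimension choices for all args
        for rank in range(1, max_rank + 1)     # ranks 1..max_rank
    )

# Assumed values: 10 type/kind pairs, Fortran 95 ranks 1..7,
# 4 formats (BLOCK, CYCLIC, CYCLIC(n), *), 3 array arguments.
print(count_specializations(10, 7, 4, 3))
```

Even modest assumptions give a count far beyond what could be prebuilt. This is why a generator that emits only the combinations a program actually calls is attractive.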

The library generator is interfaced with the Fujitsu HPF compiler. When the compiler encounters a call to a library function in the user code, it invokes the library generator and passes it the name of the required function, together with the type, kind, rank, lower and upper bounds, and per-dimension distribution of every dummy argument. The library generator reads the corresponding template, generates the library code, and passes it back to the compiler. It also analyses the input distributions and feeds back to the compiler whether the input arrays must be redistributed or replicated on entry to the library function. We will discuss this feedback mechanism.
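A minimal sketch of this request/feedback exchange, using a Python stand-in for the generator. `DummyArg`, `SPECS`, `entry_action` and `generate` are hypothetical names. The toy template text, the `REPLICATED` marker, and the rule for choosing between redistribution and replication are assumptions made for illustration, not the Fujitsu interface.

```python
from dataclasses import dataclass
from string import Template

@dataclass(frozen=True)
class DummyArg:
    """Hypothetical descriptor of one argument as passed by the compiler."""
    name: str
    type_: str               # e.g. "REAL"
    kind: int                # e.g. 8
    lower: tuple[int, ...]   # per-dimension bounds (carried, unused in this toy)
    upper: tuple[int, ...]
    dist: tuple[str, ...]    # per-dimension format, e.g. ("BLOCK",)

    @property
    def rank(self) -> int:
        return len(self.dist)

# Toy template; "$$" escapes the literal "$" of the HPF directive prefix.
DOT_TEMPLATE = Template(
    "${TYPE}(KIND=${KIND}) FUNCTION ${NAME}(X, Y)\n"
    "  ${TYPE}(KIND=${KIND}), INTENT(IN) :: X(:), Y(:)\n"
    "!HPF$$ DISTRIBUTE (${DIST}) :: X, Y\n"
    "  ! ... local BLAS call and global reduction ...\n"
    "END FUNCTION ${NAME}\n"
)

# Hypothetical table: template plus the distribution it expects on entry.
SPECS = {"DOT_PRODUCT": (DOT_TEMPLATE, ("BLOCK",))}

def entry_action(actual: tuple[str, ...], required: tuple[str, ...]) -> str:
    """Feedback for one argument: what the compiler must do on entry."""
    if actual == required:
        return "none"
    if all(d == "REPLICATED" for d in required):
        return "replicate"
    return "redistribute"

def generate(name: str, args: list[DummyArg]) -> tuple[str, dict[str, str]]:
    """Instantiate the template and compute per-argument feedback."""
    template, required = SPECS[name]
    a = args[0]  # toy assumption: all arguments share type and kind
    code = template.substitute(
        NAME=f"{name}_{a.type_}{a.kind}",
        TYPE=a.type_, KIND=str(a.kind), DIST=", ".join(required))
    feedback = {arg.name: entry_action(arg.dist, required) for arg in args}
    return code, feedback

x = DummyArg("X", "REAL", 8, (1,), (1000,), ("BLOCK",))
y = DummyArg("Y", "REAL", 8, (1,), (1000,), ("CYCLIC",))
code, feedback = generate("DOT_PRODUCT", [x, y])
print(code)
print(feedback)  # {'X': 'none', 'Y': 'redistribute'}
```

Returning the feedback alongside the code keeps the decision about data movement with the generator, which knows what layout the template needs. The compiler only has to insert the requested redistribution or replication at the call site.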

The templates contain the following language items: Fortran 95 code, HPF 2.0 and JAHPF directives, CPP directives, and template parameters and macros. We will discuss which combinations of HPF 2.0 and JAHPF directives lead to high performance with the Fujitsu HPF compiler, taking the DOT_PRODUCT and MATMUL templates as examples. These functions call the highly efficient single-processor BLAS routines. The HPF code in the templates distributes or replicates the data so that each processor can apply these BLAS routines to its local data.
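The data-placement idea behind the DOT_PRODUCT and MATMUL templates can be sketched as a serial simulation in which each loop iteration plays the role of one processor. Several assumptions apply. Vectors and the rows of A are BLOCK-distributed with block size ceiling(n/p), as in HPF. B is replicated on every processor. The pure-Python `local_dot` and `local_gemm` are hypothetical stand-ins for single-processor BLAS routines such as DDOT and DGEMM. This is not the authors' template code.

```python
def block_bounds(n: int, p: int, i: int) -> tuple[int, int]:
    """0-based [lo, hi) owned by processor i when n elements are
    BLOCK-distributed over p processors (block size ceil(n/p))."""
    size = -(-n // p)
    lo = min(i * size, n)
    return lo, min(lo + size, n)

def local_dot(x: list[float], y: list[float]) -> float:
    """Stand-in for a single-processor BLAS dot product."""
    return sum(a * b for a, b in zip(x, y))

def local_gemm(a: list[list[float]], b: list[list[float]]) -> list[list[float]]:
    """Stand-in for a single-processor BLAS matrix multiply."""
    k, m = len(b), len(b[0]) if b else 0
    return [[sum(row[t] * b[t][j] for t in range(k)) for j in range(m)]
            for row in a]

def dist_dot(x: list[float], y: list[float], p: int) -> float:
    """x and y aligned and BLOCK-distributed: each processor computes a
    local partial sum; the only communication is the final global sum."""
    partials = []
    for i in range(p):
        lo, hi = block_bounds(len(x), p, i)
        partials.append(local_dot(x[lo:hi], y[lo:hi]))
    return sum(partials)

def dist_matmul(a: list[list[float]], b: list[list[float]], p: int) -> list[list[float]]:
    """A distributed (BLOCK, *), B replicated: each processor multiplies its
    row block of A by B, producing its row block of C without communication."""
    c = []
    for i in range(p):
        lo, hi = block_bounds(len(a), p, i)
        c.extend(local_gemm(a[lo:hi], b))
    return c
```

For example, `dist_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]], 2)` gives `[[19, 22], [43, 50]]`, the same result as a serial multiply. Replicating B removes communication from the multiplication phase at the cost of memory on every processor. It is one placement among several, chosen here for brevity.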