HUG2000 abstract : Saeri Lee

Logical Processor Reordering Toward

Efficient GEN_BLOCK Redistribution

Saeri Lee, Hyun-Gyoo Yook, Mi-Soon Koo and Myong-Soon Park

Dept. of Computer Science & Engineering, Korea University,

Sungbuk-ku, Seoul 136-701, Korea

{saeri, hyun, kms, myongsp}@iLab.korea.ac.kr

Abstract

Many data parallel programming languages, such as High Performance Fortran(HPF), Fortran D, Vienna Fortran, and High Performance C(HPC), provide compiler directives for programmers to specify array distribution. The “generalized” block distribution, GEN_BLOCK, which is supported in High Performance Fortran version 2, allows contiguous segments of an array, of possibly unequal sizes, to be mapped onto processors and is therefore useful for solving load balancing problems.

The array redistribution problem has recently received considerable attention. This interest is motivated largely by the HPF programming style, in which science applications are decomposed into phases. At each phase, there is an optimal distribution of the arrays onto the processor grid. Because the optimal distribution changes from phase to phase, the array redistribution is a critical operation. Since current GEN_BLOCK distribution may not be effective for dynamic load balancing at the next time, the redistribution between different GEN_BLOCKs is necessary.

The use of data redistribution represents a performance tradeoff between the expected higher efficiency of a new distribution for subsequent computation and the communication cost of redistributing the data among processor memories. Consequently, minimizing the execution time of data redistribution has obvious merit. Reducing the amount of data exchanged among processor memories is one possible way of reducing overall redistribution execution time.

This paper proposes logical processor reordering for minimizing the communication cost of redistributing the data among processor memories in the GEN_BLOCK redistribution. The aim of this study is to reduce overall redistribution execution time by causing an increasing amount of data to remain on the same processor after redistribution. The data remaining on the same processor following redistribution is called a Data Hit. This eliminates the exchange of data elements among processors. Another advantage is that Data Hits reduce the number of destination processors. Each of these factors reduces interprocessor communication overhead. This paper focuses on increasing the number and size of Data Hits. We propose three algorithms for logical processor reordering. The first one reorders the logical processors to increase Data Hits regardless of the message sizes exchanged. The second one enlarges the message size of Data Hits. The third one increases the number and size of Data Hits by combining the two algorithms.

Various experiments on CRAY T3E were performed to evaluate the proposed algorithms and compare typical GEN_BLOCK redistribution, which does not reorder logical processor numbers. MPI primitives were used for interprocessor communication. According to the experiments, Number_Oriented algorithm shows better performance than Size_Oriented algorithm, and Numsize_Oriented algorithm shows the best performance of the three algorithms. The logical processor reordering algorithms were found to perform better than typical GEN_BLOCK redistribution in all cases.