Parallelization of a Cartesian CFD Code in High Performance Fortran

Masahiro Nakao

Mitsubishi Heavy Industries, Ltd.

 

Introduction

Recent progress in CFD (Computational Fluid Dynamics) codes has made it possible to simulate more complex flow fields, such as unsteady flows. Solving these large-scale, computationally expensive problems requires faster and more cost-effective hardware. Conventional single-processor computers such as vector supercomputers, however, are already close to their speed limits. Parallel machines were introduced in the late 1980s as one way to overcome this limit. In theory, the speed of a parallel machine can increase in proportion to the number of processor nodes.

On the other hand, implementing a CFD code on a parallel computer requires much more effort than vectorizing it. From an engineering viewpoint, the parallel tuning needed for a conventional CFD code should be kept to a minimum, and the parallelized code should run on other parallel machines without machine-specific tuning.

In the present study, a two-dimensional Cartesian Euler/Navier-Stokes code is implemented on a parallel computer using High Performance Fortran (HPF). The efficiency of the parallelized program depends on the load balance across processor nodes; a fully parallelized code could achieve ideal load balance. In this study, however, the code is not fully parallelized. As a first trial, the purpose of the study is to determine how efficient the code becomes with simple tuning for parallel implementation in HPF, and what matters most in parallel implementation.
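The cost of leaving part of the code serial can be estimated with Amdahl's law, a standard model that is not part of the original study. The sketch below assumes that the parallelized portion, a fraction f of the single-node run time, divides perfectly across p nodes, so the speedup is 1/((1 - f) + f/p). It ignores communication and load imbalance, so it gives an upper bound. The fractions in the demo loop are hypothetical, not measurements of this code.

```python
def amdahl_speedup(parallel_fraction: float, nodes: int) -> float:
    """Upper-bound speedup when only `parallel_fraction` of the single-node
    run time is parallelized and that part divides perfectly over `nodes`."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must lie in [0, 1]")
    if nodes < 1:
        raise ValueError("nodes must be at least 1")
    serial = 1.0 - parallel_fraction  # share of run time left unparallelized
    return 1.0 / (serial + parallel_fraction / nodes)


def efficiency(parallel_fraction: float, nodes: int) -> float:
    """Parallel efficiency: speedup divided by node count (1.0 is ideal)."""
    return amdahl_speedup(parallel_fraction, nodes) / nodes


if __name__ == "__main__":
    # Hypothetical parallel fractions; not measured values from this study.
    for f in (1.0, 0.99, 0.95, 0.90):
        for p in (1, 8, 32, 128):
            print(f"f={f:.2f} p={p:3d} "
                  f"speedup={amdahl_speedup(f, p):7.2f} "
                  f"efficiency={efficiency(f, p):.2f}")
```

With f = 0.95, for example, the model caps the 128-node speedup at about 17.4 (efficiency about 0.14), which illustrates why the share of work left serial can matter as much as the node count.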

Code Description

The CFD code implemented on the parallel machine in this study is a Cartesian code developed by Mitsubishi Heavy Industries, Ltd. in 1998. The numerical method is based on an AUSM (Advection Upstream Splitting Method) type upwind scheme. Time integration uses the LU-ADI (Lower-Upper Alternating Direction Implicit) scheme. The program was originally written in standard FORTRAN 77 and fully vectorized for vector computers.
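The details of the code's own AUSM variant (its 2-D, viscous, and higher-order treatment) are not given here. As a hedged illustration of the flux-splitting idea only, the sketch below implements the original first-order AUSM interface flux of Liou and Steffen for the 1-D Euler equations. The ratio of specific heats GAMMA = 1.4, the (rho, u, p) state layout, and all function names are assumptions for illustration.

```python
import math

GAMMA = 1.4  # ratio of specific heats for air (assumed)


def _split_mach(m: float) -> tuple[float, float]:
    """Return (M+, M-) from the AUSM Mach-number splitting."""
    if abs(m) <= 1.0:
        return 0.25 * (m + 1.0) ** 2, -0.25 * (m - 1.0) ** 2
    return 0.5 * (m + abs(m)), 0.5 * (m - abs(m))


def _split_pressure(m: float, p: float) -> tuple[float, float]:
    """Return (p+, p-) from the AUSM pressure splitting."""
    if abs(m) <= 1.0:
        return (0.25 * p * (m + 1.0) ** 2 * (2.0 - m),
                0.25 * p * (m - 1.0) ** 2 * (2.0 + m))
    return (p, 0.0) if m > 0.0 else (0.0, p)


def ausm_flux(left, right):
    """First-order 1-D AUSM interface flux for states given as (rho, u, p).

    Returns the mass, momentum, and energy fluxes at the cell interface.
    """
    def parts(state):
        rho, u, p = state
        a = math.sqrt(GAMMA * p / rho)                      # speed of sound
        h = GAMMA / (GAMMA - 1.0) * p / rho + 0.5 * u * u   # total enthalpy H
        return rho, u, p, a, h

    rl, ul, pl, al, hl = parts(left)
    rr, ur, pr, ar, hr = parts(right)

    ml_plus, _ = _split_mach(ul / al)
    _, mr_minus = _split_mach(ur / ar)
    pl_plus, _ = _split_pressure(ul / al, pl)
    _, pr_minus = _split_pressure(ur / ar, pr)

    m_half = ml_plus + mr_minus  # interface Mach number
    p_half = pl_plus + pr_minus  # interface pressure

    # Upwind the convective vector (rho*a, rho*a*u, rho*a*H) on the sign of m_half.
    rho, u, a, h = (rl, ul, al, hl) if m_half >= 0.0 else (rr, ur, ar, hr)
    return (m_half * rho * a,
            m_half * rho * a * u + p_half,
            m_half * rho * a * h)
```

For identical left and right states the split Mach numbers and pressures sum back to M and p, so the flux reduces to the exact Euler flux (rho*u, rho*u^2 + p, rho*u*H). This consistency check is a quick way to test any flux-splitting routine.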

Machine Description

The parallel computer used in this paper is the NEC Cenju-3 at the NEC Parallel Processing Center. The machine has 128 processor nodes. Each node has 64 Mbytes of memory and a peak performance of 50 Mflops, giving an aggregate of 8 Gbytes of memory and a peak of 6.4 Gflops.

Computations

 

Parallelization is tried on test programs that consist of large DO loops with simple instructions. The performance of these programs improves almost linearly with the number of processors. The Cartesian CFD code is then parallelized, and computations are performed for the standard NACA0012 airfoil. The grid has about 22,000 points, and convergence requires about 2 hours of CPU time on a single vector processor.
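Large DO loops parallelize well in HPF when their arrays are spread evenly across nodes. The sketch below computes the index ranges that an HPF-style BLOCK distribution (as requested by an `!HPF$ DISTRIBUTE` directive) would assign, using the block size ceiling(N/P), and reports the resulting load imbalance. Treating the roughly 22,000 grid points as a single 1-D index space on the 128 Cenju-3 nodes is a simplifying assumption; the actual array layout and distributed dimensions of the code are not stated. The function names are hypothetical.

```python
def hpf_block_bounds(n: int, nprocs: int) -> list[tuple[int, int]]:
    """1-based inclusive index range owned by each processor under an
    HPF-style BLOCK distribution; an empty range has lo > hi."""
    if n < 0 or nprocs < 1:
        raise ValueError("need n >= 0 and nprocs >= 1")
    block = -(-n // nprocs)  # ceiling(n / nprocs) in integer arithmetic
    bounds = []
    for p in range(nprocs):
        lo = p * block + 1
        hi = min((p + 1) * block, n)
        bounds.append((lo, hi))
    return bounds


def load_imbalance(bounds: list[tuple[int, int]]) -> float:
    """Largest local element count divided by the mean (1.0 = perfect balance)."""
    counts = [max(0, hi - lo + 1) for lo, hi in bounds]
    mean = sum(counts) / len(counts)
    return max(counts) / mean if mean > 0 else 1.0


if __name__ == "__main__":
    # Hypothetical 1-D view: about 22,000 points over 128 nodes.
    b = hpf_block_bounds(22000, 128)
    print(b[0], b[-1], f"imbalance={load_imbalance(b):.4f}")
```

Here the first node owns points 1 to 172 and the last owns 21,845 to 22,000, so a plain BLOCK split is almost perfectly balanced. In practice the imbalance that limits efficiency more often comes from loops that are left serial or that do uneven work per point.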

Further applications on the parallel machine, together with the efficiency of the parallelized program compared with the single-processor code, will be presented in the final report.