Parallel Performance of Astrophysical Rotating Plasma Simulator

Ryoji Matsumoto, Mami Machida (Chiba University), Kenji E. Nakamura (JST), and Mitsuru R. Hayashi (NAOJ)


We have developed the Astrophysical Rotating Plasma Simulator (ARPS), with which we first carried out global three-dimensional (3D) magnetohydrodynamic (MHD) simulations of rotating plasmas surrounding a gravitating object. A typical grid size is 200 × 64 × 240 in cylindrical coordinates. The simulator consists of a 3D MHD engine, modules for additional physical processes such as resistivity, self-gravity, and heat conduction, a 3D visualizer based on AVS, and a graphical user interface. The simulation code has been applied to active astrophysical phenomena such as 1/f-noise-like sporadic X-ray time variations in black hole candidates, X-ray flares and outflows in star-forming regions, and the formation of collimated jets in active galactic nuclei.
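
To make the modular structure concrete, the following is a minimal, hypothetical Fortran sketch of how a core MHD update and optional physics modules might be combined in a single time step. All routine names, flags, and the toy grid size are invented for illustration and are not taken from the ARPS source.

    program arps_like_driver
      ! Hypothetical sketch: a core MHD update followed by optional physics
      ! modules, each switched on or off per run.  All names are invented.
      implicit none
      integer, parameter :: nr = 8, nphi = 4, nz = 8   ! toy cylindrical grid
      real(8) :: rho(nr,nphi,nz), eng(nr,nphi,nz)
      real(8) :: mom(3,nr,nphi,nz), bfld(3,nr,nphi,nz)
      logical :: use_resistivity = .true., use_self_gravity = .false., &
                 use_heat_conduction = .true.
      real(8) :: dt

      rho = 1.0d0; eng = 1.0d0; mom = 0.0d0; bfld = 0.0d0; dt = 1.0d-3
      call advance(dt)

    contains

      subroutine advance(dtstep)
        real(8), intent(in) :: dtstep
        call mhd_step(dtstep)                                 ! core ideal-MHD update
        if (use_resistivity)     call resistive_step(dtstep)  ! resistivity module
        if (use_self_gravity)    call self_gravity_step()     ! ICCG Poisson solver
        if (use_heat_conduction) call conduction_step(dtstep) ! BiCGStab conduction solver
      end subroutine advance

      ! Empty stubs standing in for the real engine and modules; in a real
      ! driver they would update rho, mom, bfld, eng via host association.
      subroutine mhd_step(dtstep)
        real(8), intent(in) :: dtstep
      end subroutine mhd_step
      subroutine resistive_step(dtstep)
        real(8), intent(in) :: dtstep
      end subroutine resistive_step
      subroutine self_gravity_step()
      end subroutine self_gravity_step
      subroutine conduction_step(dtstep)
        real(8), intent(in) :: dtstep
      end subroutine conduction_step
    end program arps_like_driver

In this sketch each physical process is applied as a separate operator after the core MHD update, which keeps the modules independent and easy to enable or disable per run.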

The MHD part of ARPS has been parallelized by domain decomposition. Using MPI, we achieved 90% parallel efficiency on the SR2201, CP-PACS, SR8000, VPP300, and VPP5000. When the number of PEs is larger than 100 (e.g., on CP-PACS), we adopt a two-dimensional decomposition. On the Fujitsu machines (VPP300 and VPP5000), we found that VPP-FORTRAN gives parallel performance comparable to that of MPI. Since HPF/JA provides the SHADOW/REFLECT directives, which correspond to the OVERLAP/OVERLAPFIX directives of VPP-FORTRAN, it is straightforward to port the MHD engine from VPP-FORTRAN to HPF/JA. We compare the parallel performance of the MHD engine written in MPI, VPP-FORTRAN, and HPF/JA. Furthermore, we report the parallel performance of the MHD code on the SR8000 at Chiba University obtained with Hitachi's Parallel Fortran (an HPF2 compiler).
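
As an illustration of the ghost-cell (overlap) exchange that the domain decomposition requires, the following is a minimal, self-contained Fortran/MPI sketch. The one-dimensional decomposition, array sizes, and variable names are assumptions made for brevity and do not reproduce the ARPS code.

    program halo_exchange_sketch
      ! Sketch of a ghost-cell (overlap) exchange for a 1D domain
      ! decomposition; array sizes and names are illustrative only.
      use mpi
      implicit none
      integer, parameter :: nx = 200, nzloc = 16, nghost = 2
      real(8) :: u(nx, 1-nghost:nzloc+nghost)
      integer :: ierr, rank, nprocs, left, right
      integer :: status(MPI_STATUS_SIZE)

      call MPI_Init(ierr)
      call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
      call MPI_Comm_size(MPI_COMM_WORLD, nprocs, ierr)

      u = real(rank, 8)                        ! dummy field data owned by this PE
      left  = rank - 1; if (left  < 0)       left  = MPI_PROC_NULL
      right = rank + 1; if (right >= nprocs) right = MPI_PROC_NULL

      ! Fill the left ghost zone with the right edge of the left neighbour,
      ! then the right ghost zone with the left edge of the right neighbour;
      ! this is the data movement that the overlap-region directives express.
      call MPI_Sendrecv(u(:, nzloc-nghost+1:nzloc), nx*nghost, MPI_DOUBLE_PRECISION, &
                        right, 0, u(:, 1-nghost:0), nx*nghost, MPI_DOUBLE_PRECISION, &
                        left, 0, MPI_COMM_WORLD, status, ierr)
      call MPI_Sendrecv(u(:, 1:nghost), nx*nghost, MPI_DOUBLE_PRECISION, &
                        left, 1, u(:, nzloc+1:nzloc+nghost), nx*nghost, MPI_DOUBLE_PRECISION, &
                        right, 1, MPI_COMM_WORLD, status, ierr)

      call MPI_Finalize(ierr)
    end program halo_exchange_sketch

In VPP-FORTRAN and HPF/JA the same data movement is expressed declaratively through the OVERLAP/OVERLAPFIX and SHADOW/REFLECT directives instead of explicit message passing, which is why the port between the two directive-based versions is straightforward.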

For the self-gravity and heat-conduction modules of ARPS, we adopt CG-type algorithms (ICCG for self-gravity and BiCGStab for heat conduction). We parallelized the preconditioner by using a localized incomplete Cholesky or incomplete LU (ILU) decomposition, which factorizes the original matrix incompletely using only the matrix elements stored on each PE. With this method, 70% parallel efficiency was obtained in the MPI codes. We are rewriting these modules in HPF/JA for the Fujitsu machines and in Parallel Fortran (HPF2) for the Hitachi machines. The parallel performance of these HPF modules will be presented.
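
The idea behind the localized incomplete factorization can be sketched with the serial Fortran code below, which uses a tridiagonal model matrix split into blocks; each block stands for the rows stored on one PE, and all names and sizes are illustrative assumptions rather than the actual ARPS solver.

    program local_ilu_sketch
      ! Localized incomplete factorization: each block is factorized using
      ! only its own entries, so the couplings between neighbouring blocks
      ! (i.e. between PEs) are dropped from the preconditioner.
      implicit none
      integer, parameter :: n = 16, nblk = 4, nloc = n / nblk
      real(8) :: diag(n), lower(n), r(n), z(n)
      integer :: b, i, i0, i1

      diag  =  2.0d0          ! tridiagonal model matrix: 2 on the diagonal,
      lower = -1.0d0          ! -1 on the sub/super-diagonals
      r = 1.0d0               ! residual vector to be preconditioned

      ! Incomplete (here block-local LDL^T) factorization, done independently
      ! inside each block with no inter-block data.
      do b = 1, nblk
         i0 = (b-1)*nloc + 1;  i1 = b*nloc
         do i = i0+1, i1
            diag(i) = diag(i) - lower(i)**2 / diag(i-1)
         end do
      end do

      ! Apply the preconditioner z = M^{-1} r by forward/backward substitution,
      ! again block by block (what each PE does with its local data).
      do b = 1, nblk
         i0 = (b-1)*nloc + 1;  i1 = b*nloc
         z(i0) = r(i0)
         do i = i0+1, i1
            z(i) = r(i) - lower(i)/diag(i-1) * z(i-1)
         end do
         z(i1) = z(i1) / diag(i1)
         do i = i1-1, i0, -1
            z(i) = (z(i) - lower(i+1) * z(i+1)) / diag(i)
         end do
      end do

      print '(a,4f10.5)', ' z(1:4) =', z(1:4)
    end program local_ilu_sketch

Because the entries coupling neighbouring blocks are dropped from the factorization, applying the preconditioner needs no inter-PE communication, at the cost of a preconditioner that is somewhat weaker than a global incomplete decomposition.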

This work is supported by the Japan Science and Technology Corporation (ACT-JST).