[lustre-discuss] LS-DYNA Software causing millions of sync RPCs: OpenMPI vs Intel oneAPI Parallel Studio

Rob Kudyba rk3199 at columbia.edu
Wed Jun 12 12:22:34 PDT 2024


Hello,

We have a user who runs the LS-DYNA software <https://lsdyna.ansys.com/> from
Ansys.

> LS-DYNA software from Livermore Software Technology Corporation is a
> general purpose structural and fluid analysis simulation software package
> capable of simulating complex real world problems. It is widely used in the
> automotive industry for crashworthiness analysis, occupant safety analysis,
> metal forming and much more. In most cases, LS-DYNA is being used in
> cluster environments as these environments provide better flexibility,
> scalability and efficiency for such simulations.


The user runs 3-5 day Slurm jobs that request a few compute nodes (usually
3), using these Slurm options:
#SBATCH --ntasks-per-node=32
#SBATCH --mem-per-cpu=4000M

module load intel-parallel-studio/2020 openmpi/gcc/64/4.1.1_cuda_11.0.3_aware

export LD_LIBRARY_PATH="/path/to/openmpi-4.1.1_ucx_cuda_11.0.3_support/lib:$LD_LIBRARY_PATH"

# OpenMPI
LSDYNA=/path/to/lsDyna/r1502/ls-dyna_mpp_d_R15_0_2_x64_centos79_ifort190_sse2_openmpi405
IFILE=./fsi.k
MEMORY=20000M
MEMORY2=250M
NCORES=96

mpirun -n 96 ${LSDYNA} I=${IFILE} MEMORY=${MEMORY} MEMORY2=${MEMORY2} NCPU=${NCORES}

Here are the stats from a recent job:
- user: {ops: 37124243, op: 1366026, cl: 1367353, mn: 1346609, ul: 1346112,
ga: 4251676, sa: 19548, gx: 8097, sy: 23465265, rd: 51877, wr: 3899999, pu:
1681}

The job is creating about 1.3 million files, but it is issuing many millions
of sync RPCs (almost 23.5M syncs for 3.9M writes), which is likely hurting
overall filesystem performance because it forces all of the other writers to
block while waiting for those syncs to complete.
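
For context on where counters like these come from: they can be read per job
from Lustre jobstats. A minimal sketch, assuming jobstats is enabled and keyed
on the Slurm job ID (the job ID below is a placeholder, and the get_param
calls have to run on the MDS/OSS nodes):

# On the clients: tag RPCs with the Slurm job ID so job_stats can attribute them
lctl set_param jobid_var=SLURM_JOB_ID

# On the MDS: per-job metadata counters, including the sync count
lctl get_param mdt.*.job_stats | grep -A 25 'job_id: *12345678'

# On each OSS: per-job read/write/sync counters
lctl get_param obdfilter.*.job_stats | grep -A 25 'job_id: *12345678'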

Based on the LS-DYNA best-practices white paper released by NVIDIA
<https://network.nvidia.com/pdf/whitepapers/wp_LS-DYNA_Best_Practices.pdf>,
page 7 (I'll leave the 'luster' typo and poor grammar in place):

> Due to the parallel file system capabilities and the usage of InfiniBand
> as the network for Lustre, using Lustre instead of the local disk increased
> LS-DYNA performance 20% in average for both HP MPI and Intel MPI. Intel MPI
> has native Lustre support (command line mpiexec -genv I_MPI_ADJUST_BCAST
> 5 -genv I_MPI_EXTRA_FILESYSTEM on -genv I_MPI_EXTRA_FILESYSTEM_LIST luster
> ).


A few weeks back I asked about those options on the list
<http://lists.lustre.org/pipermail/lustre-discuss-lustre.org/2024-April/019108.html>,
but have had no response to date.
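
For reference, the sketch below is roughly how I'd expect those settings to
look in our job script if we moved to the Intel MPI build of LS-DYNA
(LSDYNA_IMPI is a hypothetical path to that binary; the white paper's 'luster'
spelling is corrected to 'lustre', and newer Intel MPI releases may have
changed or dropped I_MPI_EXTRA_FILESYSTEM_LIST, so treat this as untested):

# Intel MPI with the white paper's Lustre hints (untested sketch)
export I_MPI_ADJUST_BCAST=5
export I_MPI_EXTRA_FILESYSTEM=on
export I_MPI_EXTRA_FILESYSTEM_LIST=lustre

mpiexec -n ${NCORES} ${LSDYNA_IMPI} I=${IFILE} MEMORY=${MEMORY} MEMORY2=${MEMORY2} NCPU=${NCORES}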

I will pass this on to the user, but could something else be causing those
high values seen in the job stats? We're running DDN ExaScaler 5.2.8 with
lustre-2.12.9_ddn26-1.el7.x86_64.

I don't think switching from OpenMPI to Intel MPI would improve performance
by much, but I'd appreciate some feedback.
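
One thing I may try in the meantime is attaching strace to a couple of the
LS-DYNA ranks to see which calls are actually issuing the syncs, along these
lines (<pid-of-rank> is a placeholder; this assumes strace is available on the
compute nodes):

# Count sync-family syscalls from one rank for ~60 seconds, then print a summary
timeout 60 strace -f -c -e trace=fsync,fdatasync,sync,syncfs -p <pid-of-rank>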

Thanks,

Rob