HPC/Applications/lammps

Benchmark

Using a sample workload from Sanket ("run9"), I tested various OpenMPI options on both node types.

LAMMPS performs best on gen2 nodes over InfiniBand without extra options (relative speed 59). On gen1 nodes, Ethernet (44) surprisingly outperforms InfiniBand (36 to 39).

Job tag	Node type	Interconnect	Additional OpenMPI options	Relative speed (1000 steps/3 hours)	Notes
gen1	gen1	IB	(none)	36
gen1srqpin	gen1	IB	-mca btl_openib_use_srq 1 -mca mpi_paffinity_alone 1	39
gen1eth	gen1	Ethernet	-mca btl self,tcp	44	fastest for gen1
gen2eth	gen2	Ethernet	-mca btl self,tcp	49
gen2srq	gen2	IB	-mca btl_openib_use_srq 1	59
gen2	gen2	IB	(none)	59	fastest for gen2
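
To make the comparison easier to read, the speeds can be normalized to the default InfiniBand run on the same node type. The following is a minimal sketch, not part of the original benchmark: the function name relative_to_default is made up for illustration, and the input is simply the job tag, node type and relative speed columns copied from the table above.

```shell
#!/bin/bash
# Hypothetical helper: print each job's speed relative to the default run
# (the row whose job tag equals its node type) on the same node type.
relative_to_default() {
    awk '{
             tag[NR] = $1; type[NR] = $2; speed[NR] = $3
             if ($1 == $2) base[$2] = $3      # default run for this node type
         }
         END {
             for (i = 1; i <= NR; i++)
                 printf "%s %.2f\n", tag[i], speed[i] / base[type[i]]
         }'
}

# Columns: job tag, node type, relative speed (values from the table above)
relative_to_default <<'EOF'
gen1 gen1 36
gen1srqpin gen1 39
gen1eth gen1 44
gen2eth gen2 49
gen2srq gen2 59
gen2 gen2 59
EOF
```

Read this way, Ethernet on gen1 comes out roughly 22% faster than the gen1 default, while Ethernet on gen2 is roughly 17% slower than the gen2 default.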

Sample job file gen1

#!/bin/bash
#PBS -l nodes=10:ppn=8:gen1
#PBS -l walltime=1:00:00:00
#PBS -N <jobname>
#PBS -A <account>
#
#PBS -o job.out
#PBS -e job.err
#PBS -m ea

# change into the directory from which qsub was run
cd $PBS_O_WORKDIR

mpirun  -machinefile  $PBS_NODEFILE \
        -np $(wc -l < $PBS_NODEFILE) \
        -mca btl self,tcp \
        lmp_openmpi < lammps.in > lammps.out 2> lammps.err

Sample job file gen2

#!/bin/bash
#PBS -l nodes=10:ppn=8:gen2
#PBS -l walltime=1:00:00:00
#PBS -N <jobname>
#PBS -A <account>
#
#PBS -o job.out
#PBS -e job.err
#PBS -m ea

# change into the directory from which qsub was run
cd $PBS_O_WORKDIR

mpirun  -machinefile  $PBS_NODEFILE \
        -np $(wc -l < $PBS_NODEFILE) \
        lmp_openmpi < lammps.in > lammps.out 2> lammps.err

OpenMP usage (experimental)

LAMMPS modules since 2012 are compiled with yes-user-omp, permitting multi-threaded runs, and in particular MPI/OpenMP hybrid parallel runs.

Use the following for the core of your job script:

#PBS -l naccesspolicy=SINGLEJOB
...
ppn_mpi=$( uniq -c $PBS_NODEFILE | awk '{print $1; exit}' )	# grab first (and usually only) ppn value of the job
ppn_phys=8										# number of cores on first execution node
#ppn_phys=12										# experimental: 3-to-2 oversubscription of cores.

export OMP_NUM_THREADS=$(( ppn_phys / ppn_mpi ))		# calculate number of threads available per MPI process; exported so that mpirun -x can pass it on

mpirun -x OMP_NUM_THREADS \
    -machinefile  $PBS_NODEFILE \
    -np $(wc -l < $PBS_NODEFILE) \
    lmp_openmpi -sf omp -in in.script
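
The thread calculation above can be tried outside a job with a stand-in node file. This is only a sketch: the function name threads_per_rank and the host names n001 and n002 are made up for illustration, and the temporary file mimics what $PBS_NODEFILE would presumably contain for a request like nodes=2:ppn=4.

```shell
#!/bin/bash
# Hypothetical helper: threads per MPI task = physical cores / MPI tasks per node.
threads_per_rank() {
    local nodefile=$1 ppn_phys=$2
    local ppn_mpi
    # The count for the first host name is the number of MPI tasks on that node.
    ppn_mpi=$(uniq -c "$nodefile" | awk '{print $1; exit}')
    echo $(( ppn_phys / ppn_mpi ))
}

# Stand-in for $PBS_NODEFILE: 2 nodes with 4 MPI tasks each (made-up host names).
nodefile=$(mktemp)
printf '%s\n' n001 n001 n001 n001 n002 n002 n002 n002 > "$nodefile"

threads_per_rank "$nodefile" 8     # prints 2
threads_per_rank "$nodefile" 12    # prints 3 (the experimental oversubscribed setting)

rm -f "$nodefile"
```

Note that the shell division is integer division, so a ppn value that does not divide ppn_phys evenly leaves some cores unused.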

LAMMPS echoes its parallelization scheme first thing in the output:

LAMMPS (10 Feb 2012)
  using 3 OpenMP thread(s) per MPI task
...
  2 by 2 by 2 MPI processor grid
Lattice spacing in x,y,z = 3.52 4.97803 4.97803
...

and near the end:

Loop time of 11.473 on 24 procs (8 MPI x 3 OpenMP) for 100 steps with 32000 atoms
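
When comparing runs, the "Loop time" line can be turned into a throughput figure. The following is a rough sketch, assuming the line keeps the layout shown above; the function name steps_per_second is made up for illustration and is not a LAMMPS tool.

```shell
#!/bin/bash
# Hypothetical helper: read LAMMPS output (files or stdin) and print
# steps per second for every "Loop time" line found.
steps_per_second() {
    awk '/^Loop time of/ {
             # The step count is the field just before the word "steps".
             for (i = 2; i <= NF; i++)
                 if ($i == "steps") steps = $(i - 1)
             printf "%.2f\n", steps / $4      # $4 is the loop time in seconds
         }' "$@"
}

echo "Loop time of 11.473 on 24 procs (8 MPI x 3 OpenMP) for 100 steps with 32000 atoms" \
    | steps_per_second     # prints 8.72
```

In a job directory this could be applied to the output file directly, for example steps_per_second lammps.out.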

To learn more:

HPC/Submitting_Jobs/Advanced node selection#Multithreading (OpenMP)
Hints on using the OMP package
Command-line options (explanation for -sf style or -suffix style)
