HPC/Applications/lammps: Difference between revisions

From CNM Wiki

Revision as of 06:02, February 9, 2012

== Benchmark ==

Using a sample workload from Sanket ("run9"), I tested various OpenMPI options on both node types.

LAMMPS performs best on gen2 nodes without extra options, and surprisingly well on gen1 nodes over plain Ethernet.

{| class="wikitable"
! Job tag !! Node type !! Interconnect !! Additional OpenMPI options !! Relative speed (1000 steps / 3 hours) !! Notes
|-
| gen1 || gen1 || IB || (none) || 36 ||
|-
| gen1srqpin || gen1 || IB || <code>-mca btl_openib_use_srq 1 -mca mpi_paffinity_alone 1</code> || 39 ||
|-
| gen1eth || gen1 || Ethernet || <code>-mca btl self,tcp</code> || 44 || fastest for gen1
|-
| gen2eth || gen2 || Ethernet || <code>-mca btl self,tcp</code> || 49 ||
|-
| gen2srq || gen2 || IB || <code>-mca btl_openib_use_srq 1</code> || 59 ||
|-
| gen2 || gen2 || IB || (none) || 59 || fastest for gen2
|}
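To compare runs like these yourself, the timing summary that LAMMPS prints at the end of a run is the easiest reference point. A minimal sketch, assuming the screen output was redirected to <code>lammps.out</code> as in the sample job files below, and that the summary has the usual "Loop time of ''seconds'' on ''procs'' procs for ''steps'' steps ..." form:

<syntaxhighlight lang="bash">
# Rough steps-per-hour figure from the LAMMPS timing summary.
# Field positions assume the line format
#   "Loop time of <seconds> on <procs> procs for <steps> steps with <atoms> atoms".
awk '/^Loop time/ { printf "%.0f steps/hour\n", $9 / $4 * 3600 }' lammps.out
</syntaxhighlight>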

=== Sample job file gen1 ===

<syntaxhighlight lang="bash">
#!/bin/bash
#PBS -l nodes=10:ppn=8:gen1
#PBS -l walltime=1:00:00:00
#PBS -N <jobname>
#PBS -A <account>
#
#PBS -o job.out
#PBS -e job.err
#PBS -m ea

# change into the directory where qsub was executed
cd $PBS_O_WORKDIR

mpirun  -machinefile  $PBS_NODEFILE \
        -np $(wc -l < $PBS_NODEFILE) \
        -mca btl self,tcp \
        lmp_openmpi < lammps.in > lammps.out 2> lammps.err
</syntaxhighlight>
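Saved under a name of your choice (<code>lammps_gen1.job</code> here is just a placeholder), the script is submitted with the usual Torque/PBS commands:

<syntaxhighlight lang="bash">
qsub lammps_gen1.job    # submit the job; prints the job ID
qstat -u $USER          # check the status of your jobs in the queue
</syntaxhighlight>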

=== Sample job file gen2 ===
This script is identical to the gen1 version except that no <code>-mca btl</code> option is passed, so the default InfiniBand transport is used (the fastest choice for gen2 in the table above).

<syntaxhighlight lang="bash">
#!/bin/bash
#PBS -l nodes=10:ppn=8:gen2
#PBS -l walltime=1:00:00:00
#PBS -N <jobname>
#PBS -A <account>
#
#PBS -o job.out
#PBS -e job.err
#PBS -m ea

# change into the directory where qsub was executed
cd $PBS_O_WORKDIR

mpirun  -machinefile  $PBS_NODEFILE \
        -np $(wc -l < $PBS_NODEFILE) \
        lmp_openmpi < lammps.in > lammps.out 2> lammps.err
</syntaxhighlight>

== OpenMP usage (experimental) ==

LAMMPS modules since 2012 are compiled with <code>yes-user-omp</code>, permitting multi-threaded runs, and in particular MPI/OpenMP hybrid parallel runs.

Reviewing [[HPC/Submitting_Jobs/Advanced node selection#Multithreading (OpenMP)]], it appears that the correct job script stub would be:

<syntaxhighlight lang="bash">
...
ppn_mpi=$( uniq -c $PBS_NODEFILE | awk '{print $1; exit}' )   # grab first (and usually only) ppn value of the job
ppn_phys=8                                                    # number of cores on first execution node
#ppn_phys=12                                                  # experimental: 3-to-2 oversubscription of cores

# calculate number of threads available per MPI process;
# export it so that "mpirun -x" finds it in the environment
export OMP_NUM_THREADS=$(( ppn_phys / ppn_mpi ))

mpirun -x OMP_NUM_THREADS \
    -machinefile  $PBS_NODEFILE \
    -np $(wc -l < $PBS_NODEFILE) \
    lmp_openmpi -sf omp -in in.script
</syntaxhighlight>
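As a concrete, purely hypothetical illustration of the arithmetic above: a job submitted with <code>-l nodes=2:ppn=4:gen1</code> places 4 MPI ranks on each 8-core node, so each rank gets 2 OpenMP threads, matching the "2 OpenMP thread(s) per MPI task" line in the output below.

<syntaxhighlight lang="bash">
# Hypothetical example of the thread calculation: 4 MPI ranks per 8-core node.
ppn_mpi=4
ppn_phys=8
echo $(( ppn_phys / ppn_mpi ))   # prints 2, i.e. 2 OpenMP threads per MPI task
</syntaxhighlight>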

LAMMPS echoes its parallelization scheme at the top of the output:

<pre>
LAMMPS (10 Feb 2012)
  using 2 OpenMP thread(s) per MPI task
Lattice spacing in x,y,z = 3.52 3.52 3.52
Created orthogonal box = (0 0 0) to (56.32 35.2 9.95606)
  2 by 1 by 1 MPI processor grid
Lattice spacing in x,y,z = 3.52 4.97803 4.97803
</pre>
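In a batch job the same information can be checked after the fact; assuming the output was captured in a file such as <code>lammps.out</code> (as in the sample job files above), a quick check could be:

<syntaxhighlight lang="bash">
# Confirm the OpenMP thread count and MPI grid the run actually used.
grep -E "OpenMP thread|MPI processor grid" lammps.out
</syntaxhighlight>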


To learn more:
* [[HPC/Submitting_Jobs/Advanced node selection#Multithreading (OpenMP)]]
* [http://lammps.sandia.gov/doc/Section_accelerate.html#acc_2 Hints on using the OMP package]
* [http://lammps.sandia.gov/doc/Section_start.html#start_7 Command-line options] (explanation for -sf ''style'' or -suffix ''style'')