HPC/Applications/lammps
Revision as of 06:02, February 9, 2012
Benchmark
Using a sample workload from Sanket ("run9"), I tested various Open MPI options on both node types.
LAMMPS performs best on gen2 nodes with no extra options, and surprisingly well on gen1 nodes over Ethernet.
| Job tag | Node type | Interconnect | Additional Open MPI options | Relative speed (1000 steps/3 hours) | Notes |
|---|---|---|---|---|---|
| gen1 | gen1 | IB | (none) | 36 | |
| gen1srqpin | gen1 | IB | -mca btl_openib_use_srq 1 -mca mpi_paffinity_alone 1 | 39 | |
| gen1eth | gen1 | Ethernet | -mca btl self,tcp | 44 | fastest for gen1 |
| gen2eth | gen2 | Ethernet | -mca btl self,tcp | 49 | |
| gen2srq | gen2 | IB | -mca btl_openib_use_srq 1 | 59 | |
| gen2 | gen2 | IB | (none) | 59 | fastest for gen2 |
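Since the fastest options differ per node type, a job script can pick the mpirun flags based on which generation it requests. This is a hypothetical sketch, not from the page; the `gen` variable and the `MPI_OPTS` name are assumptions, while the option strings come from the table above.

```shell
# Hypothetical helper: choose Open MPI options per node generation,
# following the benchmark table (Ethernet fastest for gen1, default IB for gen2).
gen=gen1   # assumption: set this to match the :gen1/:gen2 property in your -l nodes= request

case "$gen" in
  gen1) MPI_OPTS="-mca btl self,tcp" ;;   # force TCP/Ethernet transport
  gen2) MPI_OPTS="" ;;                    # default InfiniBand transport
esac

echo "mpirun $MPI_OPTS ..."
```

The chosen string would then be spliced into the mpirun line of the job scripts below.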
Sample job file gen1
#!/bin/bash
#PBS -l nodes=10:ppn=8:gen1
#PBS -l walltime=1:00:00:00
#PBS -N <jobname>
#PBS -A <account>
#
#PBS -o job.out
#PBS -e job.err
#PBS -m ea
# change into the directory where qsub will be executed
cd $PBS_O_WORKDIR
mpirun -machinefile $PBS_NODEFILE \
-np $(wc -l < $PBS_NODEFILE) \
-mca btl self,tcp \
lmp_openmpi < lammps.in > lammps.out 2> lammps.err
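The `-np $(wc -l < $PBS_NODEFILE)` idiom works because PBS writes one line per allocated core to `$PBS_NODEFILE`. A minimal sketch with a mock nodefile (the real file is created by PBS at job start) shows what the request `nodes=10:ppn=8` evaluates to:

```shell
# Build a mock $PBS_NODEFILE: 10 nodes x 8 cores = 80 lines, one per core.
PBS_NODEFILE=$(mktemp)
for node in $(seq 1 10); do
  for core in $(seq 1 8); do
    echo "node$node" >> "$PBS_NODEFILE"
  done
done

# Same expression as in the job script above.
np=$(wc -l < "$PBS_NODEFILE")
echo "mpirun would start $np MPI processes"

rm -f "$PBS_NODEFILE"
```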
Sample job file gen2
#!/bin/bash
#PBS -l nodes=10:ppn=8:gen2
#PBS -l walltime=1:00:00:00
#PBS -N <jobname>
#PBS -A <account>
#
#PBS -o job.out
#PBS -e job.err
#PBS -m ea
# change into the directory where qsub will be executed
cd $PBS_O_WORKDIR
mpirun -machinefile $PBS_NODEFILE \
-np $(wc -l < $PBS_NODEFILE) \
lmp_openmpi < lammps.in > lammps.out 2> lammps.err
OpenMP usage (experimental)
LAMMPS modules since 2012 are compiled with yes-user-omp, permitting multi-threaded runs, and in particular MPI/OpenMP hybrid parallel runs.
Use the following for the core of your job script:
...
ppn_mpi=$( uniq -c $PBS_NODEFILE | awk '{print $1; exit}' ) # grab first (and usually only) ppn value of the job
ppn_phys=8 # number of cores on first execution node
#ppn_phys=12 # experimental: 3-to-2 oversubscription of cores.
export OMP_NUM_THREADS=$(( ppn_phys / ppn_mpi ))   # threads per MPI process; export so mpirun -x can forward it
mpirun -x OMP_NUM_THREADS \
-machinefile $PBS_NODEFILE \
-np $(wc -l < $PBS_NODEFILE) \
lmp_openmpi -sf omp -in in.script
LAMMPS echoes its parallelization scheme at the top of the output:
LAMMPS (10 Feb 2012)
  using 2 OpenMP thread(s) per MPI task
Lattice spacing in x,y,z = 3.52 3.52 3.52
Created orthogonal box = (0 0 0) to (56.32 35.2 9.95606)
  2 by 1 by 1 MPI processor grid
Lattice spacing in x,y,z = 3.52 4.97803 4.97803
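The thread-count arithmetic above can be checked offline with a mock nodefile. This sketch assumes 2 nodes with 4 MPI ranks each (ppn=4) on 8-core hardware, which should yield 2 OpenMP threads per rank:

```shell
# Mock $PBS_NODEFILE for nodes=2:ppn=4 -- consecutive repeated hostnames,
# as PBS writes them, so `uniq -c` can count ranks per node.
PBS_NODEFILE=$(mktemp)
for i in 1 2 3 4; do echo node1 >> "$PBS_NODEFILE"; done
for i in 1 2 3 4; do echo node2 >> "$PBS_NODEFILE"; done

# Same extraction as the job script: count of the first (and usually only) ppn value.
ppn_mpi=$( uniq -c "$PBS_NODEFILE" | awk '{print $1; exit}' )
ppn_phys=8   # assumption: 8 physical cores per node, as on the gen1/gen2 nodes

OMP_NUM_THREADS=$(( ppn_phys / ppn_mpi ))
echo "$OMP_NUM_THREADS OpenMP thread(s) per MPI task"

rm -f "$PBS_NODEFILE"
```

Note that the division truncates, so a ppn that does not divide the core count evenly leaves some cores idle.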
To learn more:
- HPC/Submitting_Jobs/Advanced node selection#Multithreading (OpenMP)
- Hints on using the OMP package
- Command-line options (explanation for -sf style or -suffix style)