HPC/Applications/lammps: Difference between revisions

Latest revision as of 21:58, August 11, 2015

Binaries

Several LAMMPS binaries are provided by the LAMMPS module, giving you options to run with MPI alone or with MPI combined with one of two GPU packages. Earlier modules contained only one MPI binary. Binaries compiled with GPU support will not run on nodes without a GPU, because the CUDA libraries are deliberately installed only on GPU nodes. Moreover, a binary built with the USER-CUDA package will attempt to access the GPU by default [1].

Binary name	Description
`lmp_openmpi-main`	The baseline binary, containing the packages shown by `module help lammps`.
`lmp_openmpi`	The distribution's default name; synonym for `lmp_openmpi-main`.
`lmp_openmpi-gpu`	The package "gpu" and all packages from main.
`lmp_openmpi-user-cuda`	The package "user-cuda" and all packages from main.
`lmp_openmpi-jr`	A custom build for user J.R.

To use the *-gpu and *-user-cuda binaries, load the cuda module in addition to lammps.

module load cuda
module load lammps

To use any one of the binaries, simply name the appropriate one in the job file; full paths are neither necessary nor recommended.
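
A minimal sketch of how a job script might select one of the binaries from the table above by name. The helper functions `lammps_binary` and `needs_cuda` and the variable `variant` are hypothetical, introduced only for illustration, and `in.script` stands for your own input file. The commands are echoed rather than executed, so the sketch can be tried on any machine.

```shell
#!/bin/bash
# Hypothetical helper: map a short variant name to a binary name from the table above.
lammps_binary() {
    case "$1" in
        main|"")   echo "lmp_openmpi" ;;            # synonym for lmp_openmpi-main
        gpu)       echo "lmp_openmpi-gpu" ;;
        user-cuda) echo "lmp_openmpi-user-cuda" ;;
        *)         echo "unknown variant: $1" >&2; return 1 ;;
    esac
}

# Hypothetical helper: succeeds if the variant needs the cuda module as well.
needs_cuda() {
    case "$1" in
        gpu|user-cuda) return 0 ;;
        *)             return 1 ;;
    esac
}

variant="gpu"                       # assumption: chosen by the user
if needs_cuda "$variant"; then
    echo "module load cuda"
fi
echo "module load lammps"
# Only the bare binary name is used; no full path.
echo "mpirun $(lammps_binary "$variant") -in in.script"
```

In a real job file the echoed lines would be run directly; the point is only that the binary is referenced by name and that the cuda module accompanies the GPU variants.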

Library linking

Consult the LAMMPS documentation
Carbon-specifics: To point your compiler and linker to the installed LAMMPS module, always use the environment variable $LAMMPS_HOME, never full path names. Edit the makefile of your application and add settings similar to the following:

CPPFLAGS += -I${LAMMPS_HOME}/include
FPPFLAGS += -I${LAMMPS_HOME}/include
LDFLAGS += -L$(FFTW3_HOME) -L${LAMMPS_HOME}/lib -llammps -lfftw3 -limf

These settings refer to variables customarily used in makefiles for GNU Make. Your package might use different variables. Adapt as needed.

The library created with the -llammps link option provides the same LAMMPS package set as the main binary, and supports MPI and OpenMP. It is actually equivalent to using:

-llammps_mpi-main

This variant is available for both static (*.a) and dynamic (*.so) linkage but supports no GPUs.

To use GPU nodes, link with one of the following instead:

-llammps_mpi-gpu
-llammps_mpi-user-cuda

These variants are only available for static linkage.

Inspect a sample code that uses LAMMPS as a library at:

cd ${LAMMPS_HOME}/src/examples/COUPLE/simple
less README Makefile
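
The linking rules in this section can be summarized in a small shell sketch. The function `lammps_link_flag` is hypothetical and not part of the LAMMPS module; it only encodes the statement above that the GPU variants exist for static linkage only.

```shell
#!/bin/bash
# Hypothetical helper: print the link option for a LAMMPS library variant.
# Usage: lammps_link_flag <main|gpu|user-cuda> [static|dynamic]
lammps_link_flag() {
    local variant="$1" linkage="${2:-static}"
    case "$variant" in
        main)
            echo "-llammps_mpi-main" ;;          # equivalent to plain -llammps
        gpu|user-cuda)
            if [ "$linkage" != "static" ]; then
                echo "variant '$variant' is only available for static linkage" >&2
                return 1
            fi
            echo "-llammps_mpi-$variant" ;;
        *)
            echo "unknown variant: $variant" >&2
            return 1 ;;
    esac
}

# Example: print an LDFLAGS line for the GPU variant. The single quotes keep
# the variable references unexpanded so that make can expand them later.
echo 'LDFLAGS += -L$(FFTW3_HOME) -L${LAMMPS_HOME}/lib' "$(lammps_link_flag gpu static)" '-lfftw3 -limf'
```

Asking for `lammps_link_flag gpu dynamic` prints an error and returns a non-zero status, which mirrors the restriction on the GPU libraries.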

GPU support

LAMMPS offers two different packages for using GPUs, one official, the other user-contributed. Only one of these packages can be used for a run. The packages are fully documented in the following sections of the LAMMPS manual:

5. Accelerating LAMMPS performance

To use LAMMPS with GPUs on Carbon you must read and understand these sections. A summary and Carbon-specific details are given in the next section.

Using GPU packages

HPC/Applications/lammps/Package GPU
HPC/Applications/lammps/Package USER-CUDA
HPC/Applications/lammps/Package OMP – if you really want to.

Jobs on Carbon

For sample PBS scripts, consult these files:

$LAMMPS_HOME/sample.job
$LAMMPS_HOME/sample-hybrid.job

See also HPC/Submitting and Managing Jobs/Example Job Script#GPU nodes

Benchmark (pre-GPU version)

Using a sample workload from Sanket ("run9"), I tested various OpenMPI options on node types gen1 and gen2.

LAMMPS performs best on gen2 nodes without extra options, and pretty well on gen1 nodes over Ethernet (!).

Job tag	Node type	Interconnect	Additional OpenMPI options	Relative speed (1000 steps/3 hours)	Notes
gen1	gen1	IB	(none)	36
gen1srqpin	gen1	IB	-mca btl_openib_use_srq 1 -mca mpi_paffinity_alone 1	39
gen1eth	gen1	Ethernet	-mca btl self,tcp	44	fastest for gen1
gen2eth	gen2	Ethernet	-mca btl self,tcp	49
gen2srq	gen2	IB	-mca btl_openib_use_srq 1	59
gen2	gen2	IB	(none)	59	fastest for gen2
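
To put the table in perspective, the relative speeds can be compared pairwise. The sketch below copies the values from the table; the variable `speeds` and the function `speed_ratio` are hypothetical names used only for this illustration.

```shell
#!/bin/bash
# Relative speeds copied from the table above: job tag, then speed.
speeds="gen1 36
gen1srqpin 39
gen1eth 44
gen2eth 49
gen2srq 59
gen2 59"

# Hypothetical helper: print speed(tag1) / speed(tag2) with two decimals.
# Returns non-zero if either tag is not in the table.
speed_ratio() {
    echo "$speeds" | awk -v a="$1" -v b="$2" '
        $1 == a { sa = $2 }
        $1 == b { sb = $2 }
        END { if (sa && sb) printf "%.2f\n", sa / sb; else exit 1 }'
}

speed_ratio gen1eth gen1     # Ethernet vs. default IB on gen1
speed_ratio gen2 gen1eth     # fastest gen2 vs. fastest gen1
```

The first call prints 1.22 and the second 1.34, i.e. Ethernet gave roughly a 22% gain over default IB on gen1 for this workload, and the best gen2 run was about a third faster than the best gen1 run.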

Diagnostic for hybrid parallel runs

LAMMPS echoes its parallelization scheme first thing in the output:

LAMMPS (10 Feb 2012)
  using 4 OpenMP thread(s) per MPI task
...
  1 by 2 by 2 MPI processor grid
  104 atoms
...

and near the end:

Loop time of 124.809 on 16 procs (4 MPI x 4 OpenMP) for 30000 steps with 104 atoms

To see if OpenMP is really active, log into a compute node while a job is running and run top or psuser; the %CPU field should be about OMP_NUM_THREADS × 100%.

 PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND                                                                                             
8047 stern     25   0 4017m  33m 7540 R 401.8  0.1   1:41.60 lmp_openmpi                                                                                         
8044 stern     25   0 4017m  33m 7540 R 399.9  0.1   1:43.50 lmp_openmpi                                                                                         
4822 root      34  19     0    0    0 S  2.0  0.0 115:34.98 kipmi0
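
The two checks above can be scripted. This is a sketch only: the functions `parse_loop_line` and `expected_cpu` are hypothetical helpers, and the parsing assumes the "Loop time" line has exactly the form shown in the sample output above.

```shell
#!/bin/bash
# Hypothetical helper: read LAMMPS output on stdin and print
# "<procs> <mpi> <omp>" taken from the "Loop time" summary line.
parse_loop_line() {
    sed -n 's/^ *Loop time of [0-9.]* on \([0-9]*\) procs (\([0-9]*\) MPI x \([0-9]*\) OpenMP).*/\1 \2 \3/p'
}

# Hypothetical helper: approximate %CPU that top should report per process
# for a given number of OpenMP threads.
expected_cpu() {
    echo $(( $1 * 100 ))
}

# Sample line copied from the output shown above.
line='Loop time of 124.809 on 16 procs (4 MPI x 4 OpenMP) for 30000 steps with 104 atoms'

read -r procs mpi omp <<EOF
$(echo "$line" | parse_loop_line)
EOF

if [ "$procs" -eq $(( mpi * omp )) ]; then
    echo "consistent: $mpi MPI x $omp OpenMP = $procs procs"
fi
echo "expect about $(expected_cpu "$omp")% CPU per lmp_openmpi process"
```

For the sample line this reports 4 MPI x 4 OpenMP and an expected load of about 400%, which matches the 401.8 and 399.9 values in the top listing.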

References

HPC/Submitting_Jobs/Advanced node selection#Multithreading (OpenMP)
LAMMPS documentation for the OMP package
Command-line options (explanation for -sf style or -suffix style)

@@ Line 1: / Line 1: @@
-== Benchmark ==
+== Binaries ==
-Using a sample workload from Sanket ("run9"), I tested various OpenMPI options on both node types.
+Several LAMMPS binaries are provided by the LAMMPS module, giving you options to run with MPI and/or a couple of GPU packages.
+Earlier modules only contained one MPI binary.
+Binaries compiled with GPU support will not run on nodes without a GPU
+(CUDA libraries are deliberately only installed on GPU nodes.)
+Moreover, a binary built with the USER-CUDA package ''will'' attempt to access the GPU by default [http://lammps.sandia.gov/doc/Section_start.html#start_7].
+{| class="wikitable" cellpadding="5" style="text-align:left;"
+|-
+! Binary name !! Description
+|-
+| <code>'''lmp_openmpi-main'''</code>		|| The baseline binary, containing the packages shown by [[../#lammps | <code>module help lammps</code>]].
+|-
+| <code>'''lmp_openmpi'''</code>		|| The distribution's default name; synonym for <code>lmp_openmpi-main</code>.
+|-
+| <code>'''lmp_openmpi-gpu'''</code>		|| The package "gpu" and all packages from main.
+|-
+| <code>'''lmp_openmpi-user-cuda'''</code>	|| The package "user-cuda" and all packages from main.
+|-
+| <code>'''lmp_openmpi-jr'''</code>		|| A custom build for user J.R.
+|}
+To use the *-gpu and *-user-cuda binaries, load the [[../#cuda|<code>cuda</code>]] module in addition to lammps.
+ module load '''cuda'''
+ module load lammps
+To use any one of the binaries, simply name the appropriate one in the job file; full paths are neither necessary nor recommended.
+== Library linking ==
+* Consult the [http://lammps.sandia.gov/doc/Section_howto.html#library-interface-to-lammps LAMMPS documentation]
+* Carbon-specifics: To point your compiler and linker to the installed LAMMPS module, always use the environment variable <code>$LAMMPS_HOME</code>, never full path names. Edit the makefile of your application and add settings similar to the following:
+<source lang="make">
+CPPFLAGS += -I${LAMMPS_HOME}/include
+FPPFLAGS += -I${LAMMPS_HOME}/include
+LDFLAGS += -L$(FFTW3_HOME) -L${LAMMPS_HOME}/lib -llammps -lfftw3 -limf
+</source>
+: These settings refer to variables customarily used in makefiles for GNU Make. Your package might use different variables. Adapt as needed.
+* The library created with the <code>-llammps</code> link option provides the same LAMMPS package set as the main binary, and supports MPI and OpenMP. It is actually equivalent to using:
+ -llammps_mpi-main
+: This variant is available for both ''static'' (*.a) and ''dynamic'' (*.so) linkage but supports no GPUs.
+* To use GPU nodes, link with one of the following instead:
+ -llammps_mpi-gpu
+ -llammps_mpi-user-cuda
+: These variants are only available for ''static'' linkage.
+Inspect a sample code that uses LAMMPS as a library at:
+<source lang="bash">
+cd ${LAMMPS_HOME}/src/examples/COUPLE/simple
+less README Makefile
+</source>
+== GPU support ==
+LAMMPS offers ''two different'' packages for using GPUs, one official, the other user-contributed.
+Only one of these packages can be used for a run.
+The packages are fully documented in the following sections of the [http://lammps.sandia.gov/doc/Manual.html LAMMPS manual]:
+* [http://lammps.sandia.gov/doc/Section_accelerate.html 5. Accelerating LAMMPS performance]
+<!-- ** [http://lammps.sandia.gov/doc/Section_accelerate.html#acc_5 5.5 USER-OMP package] -->
+** [http://lammps.sandia.gov/doc/Section_accelerate.html#acc_6 5.6 GPU package]
+** [http://lammps.sandia.gov/doc/Section_accelerate.html#acc_7 5.7 USER-CUDA package]
+** [http://lammps.sandia.gov/doc/Section_accelerate.html#acc_8 5.8 Comparison of GPU and USER-CUDA packages]
+To use LAMMPS with GPUs on Carbon you must read and understand these sections. A summary and Carbon-specific details are given in the next section.
+=== Using GPU packages ===
+* [[../lammps/Package GPU]]
+* [[../lammps/Package USER-CUDA]]
+* [[../lammps/Package OMP]] – if you really want to.
+=== Jobs on Carbon ===
+For sample PBS scripts, consult these files:
+ $LAMMPS_HOME/sample.job
+ $LAMMPS_HOME/sample-hybrid.job
+* See also [[ HPC/Submitting and Managing Jobs/Example Job Script#GPU nodes]]
+== Benchmark (pre-GPU version) ==
+Using a sample workload from Sanket ("run9"), I tested various OpenMPI options on node types gen1 and gen2.
 LAMMPS performs best on gen2 nodes without extra options, and pretty well on gen1 nodes over ethernet(!).
-{| class="wikitable" cellpadding="5" style="text-align:center;  margin: 1em auto 1em 1em;"
+{| class="wikitable" style="text-align:center;  margin: 1em auto 1em 1em; padding: 5px;"
 |- style="background:#eee;"
 ! Job tag          || Node type  !! Interconnect !! Additional OpenMPI options              !! Relative speed<br>(1000 steps/3 hours) !! Notes
@@ Line 29: / Line 100: @@
 |}
-=== Sample job file gen1 ===
+=== Diagnostic for hybrid parallel runs ===
-<syntaxhighlight lang="bash">
+* LAMMPS echoes its parallelization scheme first thing in the output:
-#!/bin/bash
-#PBS -l nodes=10:ppn=8:gen1
-#PBS -l walltime=1:00:00:00
-#PBS -N <jobname>
-#PBS -A <account>
-#
-#PBS -o job.out
-#PBS -e job.err
-#PBS -m ea
-# change into the directory where qsub will be executed
-cd $PBS_O_WORKDIR
-mpirun  -machinefile  $PBS_NODEFILE \
-        -np $(wc -l < $PBS_NODEFILE) \
-        -mca btl self,tcp \
-        lmp_openmpi < lammps.in > lammps.out 2> lammps.err
-</syntaxhighlight>
-=== Sample job file gen2 ===
-<syntaxhighlight lang="bash">
-#!/bin/bash
-#PBS -l nodes=10:ppn=8:gen2
-#PBS -l walltime=1:00:00:00
-#PBS -N <jobname>
-#PBS -A <account>
-#
-#PBS -o job.out
-#PBS -e job.err
-#PBS -m ea
-# change into the directory where qsub will be executed
-cd $PBS_O_WORKDIR
-mpirun  -machinefile  $PBS_NODEFILE \
-        -np $(wc -l < $PBS_NODEFILE) \
-        lmp_openmpi < lammps.in > lammps.out 2> lammps.err
-</syntaxhighlight>
-== OpenMP usage (experimental) ==
-LAMMPS modules since 2012 are compiled with <code>yes-user-omp</code>, permitting multi-threaded runs, and in particular MPI/OpenMP hybrid parallel runs.
-Be careful not to
-Use the following for the core of your job script:
-<syntaxhighlight lang="bash">
-#PBS -l naccesspolicy=SINGLEJOB
-...
-ppn_mpi=$( uniq -c $PBS_NODEFILE | awk '{print $1; exit}' )	# grab first (and usually only) ppn value of the job
-ppn_phys=8										# number of cores on first execution node
-#ppn_phys=12										# experimental: 3-to-2 oversubscription of cores.
-OMP_NUM_THREADS=$(( ppn_phys / ppn_mpi ))			# calculate number of threads available per MPI process
-mpirun -x OMP_NUM_THREADS \
-    -machinefile  $PBS_NODEFILE \
-    -np $(wc -l < $PBS_NODEFILE) \
-    lmp_openmpi -sf omp -in in.script
-</syntaxhighlight>
-LAMMPS echoes it parallelization scheme first thing in the output:
   LAMMPS (10 Feb 2012)
-    '''using 3 OpenMP thread(s) per MPI task'''
+    '''using 4 OpenMP thread(s) per MPI task'''
   ...
   1 by 2 by 2 MPI processor grid
- Lattice spacing in x,y,z = 3.52 4.97803 4.97803
+  104 atoms
   ...
 and near the end:
-  Loop time of 11.473 on 24 procs ('''8 MPI x 3 OpenMP''') for 100 steps with 32000 atoms
+  Loop time of 124.809 on 16 procs ('''4 MPI x 4 OpenMP''') for 30000 steps with 104 atoms
+* To see if OpenMP is really active, log into a compute node while a job is running and run <code>top</code> or <code>psuser</code>; the <code>%CPU</code> field should be about <code>OMP_NUM_THREADS × 100%</code>.
+  PID USER      PR  NI  VIRT  RES  SHR S '''%CPU''' %MEM    TIME+  COMMAND
+ 8047 stern     25   0 4017m  33m 7540 R '''401.8'''  0.1   1:41.60 lmp_openmpi
+ 8044 stern     25   0 4017m  33m 7540 R '''399.9'''  0.1   1:43.50 lmp_openmpi
+ 4822 root      34  19     0    0    0 S  2.0  0.0 115:34.98 kipmi0
-To learn more:
+=== References ===
 * [[HPC/Submitting_Jobs/Advanced node selection#Multithreading (OpenMP)]]
-* [http://lammps.sandia.gov/doc/Section_accelerate.html#acc_2 Hints on using the OMP package]
+* [http://lammps.sandia.gov/doc/Section_accelerate.html#acc_2 LAMMPS documentation for the OMP package]
 * [http://lammps.sandia.gov/doc/Section_start.html#start_7 Command-line options] (explanation for -sf ''style'' or -suffix ''style'')
