HPC/Applications/lammps

From CNM Wiki
Revision as of 20:24, August 10, 2015 by Stern (talk | contribs)

Binaries

Several LAMMPS binaries are provided by the LAMMPS module, giving you the option to run with MPI alone or with MPI plus one of two GPU packages. Earlier modules contained only one MPI binary. Binaries compiled with GPU support will not run on nodes without a GPU, because the CUDA libraries are deliberately installed only on GPU nodes. Moreover, a binary built with the USER-CUDA package will attempt to access the GPU by default [1].

Binary name | Description
lmp_openmpi-main | The baseline binary, containing the packages shown by module help lammps.
lmp_openmpi | The distribution's default name; a synonym for lmp_openmpi-main.
lmp_openmpi-gpu | The package "gpu" and all packages from main.
lmp_openmpi-user-cuda | The package "user-cuda" and all packages from main.
lmp_openmpi-jr | A custom build for user J.R.

To use the *-gpu and *-user-cuda binaries, load the cuda module in addition to lammps.

module load cuda
module load lammps

To use any one of the binaries, simply name the appropriate one in the job file; full paths are neither necessary nor recommended.
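
As a minimal sketch of naming a binary in a job file, the helper below returns the requested binary name but falls back to the baseline binary when a GPU build is requested on a node that appears to have no GPU. The function name pick_lammps_binary, the GPU_PROBE variable, and the use of nvidia-smi as the GPU test are assumptions made for illustration and are not part of the Carbon setup; the input file name in.myrun is a placeholder.

```shell
#!/bin/bash
# Hypothetical helper: choose a LAMMPS binary name for the current node.
# Assumption: a GPU node can be recognized by the presence of the command
# named in GPU_PROBE (default: nvidia-smi). Adjust this to your site.
pick_lammps_binary() {
    local wanted="${1:-lmp_openmpi-main}"   # binary requested by the caller
    local probe="${GPU_PROBE:-nvidia-smi}"  # command expected only on GPU nodes

    case "$wanted" in
        *-gpu|*-user-cuda)
            # GPU builds need the CUDA libraries, which exist only on GPU nodes.
            if command -v "$probe" >/dev/null 2>&1; then
                echo "$wanted"
            else
                echo "no GPU detected, using lmp_openmpi-main" >&2
                echo "lmp_openmpi-main"
            fi
            ;;
        *)
            # CPU-only builds run on any node.
            echo "$wanted"
            ;;
    esac
}

# Usage in a job file, after "module load cuda" and "module load lammps":
#   mpirun "$(pick_lammps_binary lmp_openmpi-gpu)" -in in.myrun
```

The check runs only on the node that executes the job script, so it is a convenience rather than a guarantee for multi-node jobs.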

Library linking

CFLAGS += -I$(LAMMPS_HOME)/include
FFLAGS += -I$(LAMMPS_HOME)/include

LDFLAGS += -L$(LAMMPS_HOME)/lib -llammps


GPU support

LAMMPS offers two different packages for using GPUs, one official, the other user-contributed. Only one of these packages can be used for a run. The packages are fully documented in the following sections of the LAMMPS manual:

To use LAMMPS with GPUs on Carbon you must read and understand these sections. A summary and Carbon-specific details are given in the next section.

Using GPU packages

Jobs on Carbon

For sample PBS scripts, consult these files:

$LAMMPS_HOME/sample.job
$LAMMPS_HOME/sample-hybrid.job

Benchmark (pre-GPU version)

Using a sample workload from Sanket ("run9"), I tested various OpenMPI options on node types gen1 and gen2.

LAMMPS performs best on gen2 nodes over InfiniBand without extra options. On gen1 nodes it runs fastest over Ethernet(!).

Job tag | Node type | Interconnect | Additional OpenMPI options | Relative speed (1000 steps/3 hours) | Notes
gen1 | gen1 | IB | (none) | 36 |
gen1srqpin | gen1 | IB | -mca btl_openib_use_srq 1 -mca mpi_paffinity_alone 1 | 39 |
gen1eth | gen1 | Ethernet | -mca btl self,tcp | 44 | fastest for gen1
gen2eth | gen2 | Ethernet | -mca btl self,tcp | 49 |
gen2srq | gen2 | IB | -mca btl_openib_use_srq 1 | 59 |
gen2 | gen2 | IB | (none) | 59 | fastest for gen2
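
To compare the rows of the table, it can help to normalize each speed to the plain gen1 InfiniBand run. The following is a minimal sketch using awk; the function name relative_speedup is hypothetical, and the numbers are copied from the table.

```shell
# Print each job tag with its speed relative to a baseline tag.
# Input on stdin: lines of "<job tag> <relative speed>"; $1: baseline tag.
relative_speedup() {
    awk -v base="$1" '
        { tag[NR] = $1; speed[NR] = $2; if ($1 == base) ref = $2 }
        END {
            if (ref == "") { print "baseline not found" > "/dev/stderr"; exit 1 }
            for (i = 1; i <= NR; i++) printf "%s %.2f\n", tag[i], speed[i] / ref
        }'
}

# Speeds from the benchmark table, normalized to the gen1 run.
relative_speedup gen1 <<'EOF'
gen1 36
gen1srqpin 39
gen1eth 44
gen2eth 49
gen2srq 59
gen2 59
EOF
```

With these inputs, gen1eth comes out at 1.22 and gen2 at 1.64 relative to gen1.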

Diagnostic for hybrid parallel runs

  • LAMMPS echoes its parallelization scheme first thing in the output:
LAMMPS (10 Feb 2012)
  using 4 OpenMP thread(s) per MPI task
...
  1 by 2 by 2 MPI processor grid
  104 atoms
...

and near the end:

Loop time of 124.809 on 16 procs (4 MPI x 4 OpenMP) for 30000 steps with 104 atoms
  • To see if OpenMP is really active, log into a compute node while a job is running and run top or psuser. The %CPU field of each LAMMPS process should be about OMP_NUM_THREADS × 100%:
 PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND                                                                                             
8047 stern     25   0 4017m  33m 7540 R 401.8  0.1   1:41.60 lmp_openmpi                                                                                         
8044 stern     25   0 4017m  33m 7540 R 399.9  0.1   1:43.50 lmp_openmpi                                                                                         
4822 root      34  19     0    0    0 S  2.0  0.0 115:34.98 kipmi0
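
Both checks can be scripted. The sketch below extracts the MPI and OpenMP counts from the "Loop time" line of a LAMMPS log, and flags lmp_openmpi processes in top output whose %CPU falls well below OMP_NUM_THREADS × 100%. The function names and the 80% tolerance are assumptions chosen for illustration, and the column positions assume the top layout shown in the listing.

```shell
# Extract "<MPI tasks> <OpenMP threads>" from a LAMMPS log read on stdin.
# Assumes the "Loop time of ... (4 MPI x 4 OpenMP) ..." format shown above.
loop_layout() {
    sed -n 's/^Loop time of .*(\([0-9][0-9]*\) MPI x \([0-9][0-9]*\) OpenMP).*/\1 \2/p'
}

# Read top output on stdin and report each lmp_openmpi* process as
# "<PID> <%CPU> ok" or "<PID> <%CPU> low".
# $1: threads per MPI task (for example, $OMP_NUM_THREADS).
# Assumes %CPU is column 9 and COMMAND is column 12, as in the listing above.
check_omp_cpu() {
    awk -v threads="$1" '
        $12 ~ /^lmp_openmpi/ {
            expected = threads * 100
            # 0.8 is an arbitrary tolerance chosen for this sketch.
            status = ($9 >= 0.8 * expected) ? "ok" : "low"
            printf "%s %s %s\n", $1, $9, status
        }'
}
```

On a compute node, something like top -b -n 1 | check_omp_cpu "$OMP_NUM_THREADS" would apply the second check, provided top uses the default column order.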

References