HPC/Applications/lammps: Difference between revisions
Revision as of 22:37, November 14, 2012
Binaries
As of module version lammps/2012-10-10-3 (currently the default), several LAMMPS binaries are provided within one module. Binaries compiled with GPU support will not run on nodes without a GPU, because the CUDA libraries are deliberately installed only on GPU nodes. Moreover, a binary built with the USER-CUDA package will attempt to access the GPU by default [1].
Binary name | Description |
---|---|
lmp_openmpi-main | The baseline binary, containing the packages shown by module help lammps. |
lmp_openmpi | The distribution's default name; synonym for lmp_openmpi-main. |
lmp_openmpi-gpu | The package "gpu" and all packages from main. |
lmp_openmpi-user-cuda | The package "user-cuda" and all packages from main. |
lmp_openmpi-jr | A custom build for user J.R. |
To use the *-gpu and *-user-cuda binaries, load the cuda module in addition to lammps:
module load cuda
module load lammps
To use any one of the binaries, simply name the appropriate one in the job file; full paths are neither necessary nor recommended.
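As a minimal sketch of naming a binary in a job file, the helper below maps a package choice to the binary names from the table above. The function name lammps_binary and the input file in.lammps are hypothetical and chosen only for illustration; they are not part of the lammps module.

```shell
#!/bin/sh
# Hypothetical helper (not provided by the lammps module): map a package
# choice to one of the binary names listed in the table above.
lammps_binary() {
    case "$1" in
        main)      echo "lmp_openmpi-main" ;;
        gpu)       echo "lmp_openmpi-gpu" ;;
        user-cuda) echo "lmp_openmpi-user-cuda" ;;
        *)         echo "unknown package: $1" >&2; return 1 ;;
    esac
}

# Possible use inside a job file, after loading the cuda and lammps modules:
#   exe=$(lammps_binary gpu) || exit 1
#   command -v "$exe" >/dev/null || { echo "$exe not on PATH" >&2; exit 1; }
#   mpirun "$exe" -in in.lammps    # in.lammps is a placeholder input file
```

The command -v check only confirms that the bare name resolves through PATH, which is why no full path is needed once the module is loaded.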
GPU support
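LAMMPS offers two different packages for using GPUs, one official, the other user-contributed. Only one of these packages can be used for a run. The packages are fully documented in the LAMMPS manual, including these sections:
- 5.7 USER-CUDA package (http://lammps.sandia.gov/doc/Section_accelerate.html#acc_7)
- 5.8 Comparison of GPU and USER-CUDA packages (http://lammps.sandia.gov/doc/Section_accelerate.html#acc_8)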
To use LAMMPS with GPUs on Carbon you must read and understand these sections. A summary and Carbon-specific details are given in the next section.
Package-specific notes
- HPC/Applications/lammps/Package GPU
- HPC/Applications/lammps/Package USER-CUDA
- HPC/Applications/lammps/Package OMP
Jobs on Carbon
For sample PBS scripts, consult these files:
$LAMMPS_HOME/sample.job
$LAMMPS_HOME/sample-hybrid.job
- To request a GPU node for your job, use the following in the job file:
#PBS -l nodes=…:gpus=1
At the moment this is synonymous with, but preferable to:
#PBS -l nodes=…:gen3
- Each GPU node has 12 cores; if you submit jobs with :ppn < 12 and :gpus=1, the node may be shared with purely CPU jobs. Whether this causes interference for either job, and how much, has not yet been tested. See Advanced node selection to reserve entire nodes while controlling ppn for MPI or OpenMP.
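The resource request above could be assembled into a complete job file roughly as follows. This is a sketch, not a copy of the sample scripts in $LAMMPS_HOME: the helper name make_gpu_job, the job name, the node count, the ppn value, and the input file in.lammps are all example values chosen for illustration. How the GPU package is enabled in the LAMMPS input is covered in the package-specific notes and is not shown here.

```shell
#!/bin/sh
# Sketch: print a PBS job file that requests GPU nodes for LAMMPS.
# Node count, ppn, job name, and input file are illustrative values only.
make_gpu_job() {
    nodes=$1   # number of nodes, supplied by the caller
    ppn=$2     # cores per node; each GPU node has 12 cores
    printf '#!/bin/bash\n'
    printf '#PBS -N lammps-gpu\n'
    printf '#PBS -l nodes=%s:ppn=%s:gpus=1\n' "$nodes" "$ppn"
    # Quoted heredoc: the job body is printed verbatim, without expansion.
    cat <<'EOF'
cd "$PBS_O_WORKDIR"
module load cuda
module load lammps
# Name the binary only; no full path is needed.
mpirun lmp_openmpi-gpu -in in.lammps
EOF
}

# Example: one full GPU node.
make_gpu_job 1 12
```

Requesting ppn=12 together with gpus=1 takes all cores of a GPU node, which sidesteps the node-sharing question raised above.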
Benchmark (pre-GPU version)
Using a sample workload from Sanket ("run9"), I tested various OpenMPI options on node types gen1 and gen2.
LAMMPS performs best on gen2 nodes over IB without extra options; on gen1 nodes it is fastest over Ethernet(!).
Job tag | Node type | Interconnect | Additional OpenMPI options | Relative speed (1000 steps/3 hours) | Notes |
---|---|---|---|---|---|
gen1 | gen1 | IB | (none) | 36 | |
gen1srqpin | gen1 | IB | -mca btl_openib_use_srq 1 -mca mpi_paffinity_alone 1 | 39 | |
gen1eth | gen1 | Ethernet | -mca btl self,tcp | 44 | fastest for gen1 |
gen2eth | gen2 | Ethernet | -mca btl self,tcp | 49 | |
gen2srq | gen2 | IB | -mca btl_openib_use_srq 1 | 59 | |
gen2 | gen2 | IB | (none) | 59 | fastest for gen2 |
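To make the table easier to compare, the relative speeds can be normalized to the first row (gen1 over IB with no extra options). The sketch below does this with awk; the function name speedups is hypothetical, and the numbers are copied from the table rather than measured anew.

```shell
#!/bin/sh
# Relative speeds copied from the benchmark table (1000 steps per 3 hours;
# higher is faster). The first row, gen1 over IB without options, is the baseline.
speedups() {
    awk 'NR == 1 { base = $2 }
         { printf "%-10s %2d  %.2f\n", $1, $2, $2 / base }' <<'EOF'
gen1 36
gen1srqpin 39
gen1eth 44
gen2eth 49
gen2srq 59
gen2 59
EOF
}

speedups
```

The third column is the ratio to the baseline: gen1 over Ethernet comes out at about 1.22 and gen2 over IB at about 1.64.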
Diagnostic for hybrid parallel runs
- LAMMPS echoes its parallelization scheme first thing in the output:
LAMMPS (10 Feb 2012)
using 4 OpenMP thread(s) per MPI task
...
1 by 2 by 2 MPI processor grid
104 atoms
...
and near the end:
Loop time of 124.809 on 16 procs (4 MPI x 4 OpenMP) for 30000 steps with 104 atoms
- To see if OpenMP is really active, log into a compute node while a job is running and run top or psuser. The %CPU field should be about OMP_NUM_THREADS × 100%:
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
8047 stern 25 0 4017m 33m 7540 R 401.8 0.1 1:41.60 lmp_openmpi
8044 stern 25 0 4017m 33m 7540 R 399.9 0.1 1:43.50 lmp_openmpi
4822 root 34 19 0 0 0 S 2.0 0.0 115:34.98 kipmi0
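Both checks above can be scripted. The following sketch parses the "Loop time" line quoted earlier, confirms that MPI tasks times OpenMP threads equals the reported processor count, and derives the %CPU value to expect per process in top. The variable names are illustrative; in practice the line would be taken from the job's output (for example with grep '^Loop time'), whose file name depends on the job.

```shell
#!/bin/sh
# Sample line quoted from the LAMMPS output shown above.
line='Loop time of 124.809 on 16 procs (4 MPI x 4 OpenMP) for 30000 steps with 104 atoms'

# Extract total procs, MPI tasks, and OpenMP threads per task.
read -r procs mpi omp <<EOF
$(printf '%s\n' "$line" |
  sed -n 's/^Loop time of [0-9.]* on \([0-9]*\) procs (\([0-9]*\) MPI x \([0-9]*\) OpenMP).*/\1 \2 \3/p')
EOF

# The hybrid layout is consistent if MPI tasks x threads equals procs.
if [ "$((mpi * omp))" -eq "$procs" ]; then
    echo "layout ok: $mpi MPI x $omp OpenMP = $procs procs"
else
    echo "layout mismatch: $mpi x $omp != $procs" >&2
fi

# Each MPI task should show roughly (threads x 100) in top's %CPU column.
expected_cpu=$((omp * 100))
echo "expected %CPU per lmp_openmpi process: about $expected_cpu"
```

For the sample line this gives an expected value of about 400, consistent with the 401.8 and 399.9 readings in the top listing.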
References
- HPC/Submitting_Jobs/Advanced node selection#Multithreading (OpenMP)
- LAMMPS documentation for the OMP package
- Command-line options (explanation for -sf style or -suffix style)