HPC/Benchmarks/Generation 1 vs 2
Introduction
Earlier this year, we received 200 additional nodes with E5540 processors. Each node has 8 cores and supports hyperthreading (HT), a feature that allows 2 threads per core. This benchmark investigates the benefit of HT and suggests optimal values for the nodes and processors per node (ppn) parameters in PBS.
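To make the two PBS parameters concrete, here is a minimal Python sketch that formats a request in the `nodes=N:ppn=P` form used throughout this page and reports how many processes it implies. The helper names are hypothetical, and the sketch assumes 8 physical cores per node (matching the `ppn=8` setting) and that any `ppn` above that count relies on HT.

```python
CORES_PER_NODE = 8    # assumption: physical cores on one node (matches ppn=8)
THREADS_PER_CORE = 2  # hyperthreading allows 2 threads per core


def pbs_node_request(nodes: int, ppn: int) -> str:
    """Format a node request in the nodes=N:ppn=P form."""
    if nodes < 1 or ppn < 1:
        raise ValueError("nodes and ppn must both be at least 1")
    return f"nodes={nodes}:ppn={ppn}"


def total_processes(nodes: int, ppn: int) -> int:
    """Number of processes the request provides across all nodes."""
    return nodes * ppn


def needs_hyperthreading(ppn: int, cores_per_node: int = CORES_PER_NODE) -> bool:
    """True if ppn exceeds the physical core count, so HT threads are used."""
    if ppn > cores_per_node * THREADS_PER_CORE:
        raise ValueError("ppn exceeds the hardware threads of one node")
    return ppn > cores_per_node


if __name__ == "__main__":
    for nodes, ppn in [(1, 8), (4, 4), (1, 16)]:
        print(pbs_node_request(nodes, ppn),
              "processes:", total_processes(nodes, ppn),
              "HT needed:", needs_hyperthreading(ppn))
```

The resulting string is what would typically follow `-l` in a PBS job request; whether a given site accepts `ppn` values above the physical core count depends on its scheduler configuration.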
Test description
The test runs /opt/soft/vasp-4.6.35-mkl-8/bin/vasp
with the following workload (Credit: D. Shin, Northwestern Univ.).
INCAR
SYSTEM = Al12Mg17
ISTART = 0
ISMEAR = 1
SIGMA = 0.1
ISIF = 3
PREC = HIGH
IBRION = 2
LWAVE = .FALSE.
LCHARG = .FALSE.
LREAL = .TRUE.
ENCUT = 346
KPOINTS
KPOINTS file
0
Monkhorst-Pack
10 10 10
0 0 0
POSCAR
Al12Mg17
1.0000000000
-5.2719000000 5.2719000000 5.2719000000
5.2719000000 -5.2719000000 5.2719000000
5.2719000000 5.2719000000 -5.2719000000
12 17
Direct
0.3679000000 0.3679000000 0.1908000000 Al
0.1771000000 0.1771000000 0.8092000000 Al
0.6321000000 0.8229000000 0.0000000000 Al
0.8229000000 0.6321000000 0.0000000000 Al
0.1908000000 0.3679000000 0.3679000000 Al
0.3679000000 0.1908000000 0.3679000000 Al
0.0000000000 0.8229000000 0.6321000000 Al
0.1771000000 0.8092000000 0.1771000000 Al
0.8092000000 0.1771000000 0.1771000000 Al
0.8229000000 0.0000000000 0.6321000000 Al
0.0000000000 0.6321000000 0.8229000000 Al
0.6321000000 0.0000000000 0.8229000000 Al
0.3975000000 0.3975000000 0.7164000000 Mg
0.6811000000 0.6811000000 0.2836000000 Mg
0.6025000000 0.3189000000 0.0000000000 Mg
0.3189000000 0.6025000000 0.0000000000 Mg
0.7164000000 0.3975000000 0.3975000000 Mg
0.3975000000 0.7164000000 0.3975000000 Mg
0.0000000000 0.3189000000 0.6025000000 Mg
0.6811000000 0.2836000000 0.6811000000 Mg
0.2836000000 0.6811000000 0.6811000000 Mg
0.3189000000 0.0000000000 0.6025000000 Mg
0.0000000000 0.6025000000 0.3189000000 Mg
0.6025000000 0.0000000000 0.3189000000 Mg
0.6480000000 0.6480000000 0.6480000000 Mg
0.0000000000 0.0000000000 0.3520000000 Mg
0.3520000000 0.0000000000 0.0000000000 Mg
0.0000000000 0.3520000000 0.0000000000 Mg
0.0000000000 0.0000000000 0.0000000000 Mg
POTCAR
PAW_GGA Al 05Jan2001
3.00000000000000000
parameters from PSCTR are:
VRHFIN =Al: s2p1
LEXCH = 91
EATOM = 53.6910 eV, 3.9462 Ry
...
PAW_GGA Mg 05Jan2001
2.00000000000000000
parameters from PSCTR are:
VRHFIN =Mg: s2p0
LEXCH = 91
EATOM = 23.0823 eV, 1.6965 Ry
...
(abbreviated)
Data
- HPC/Generation-2 nodes/vasp/vasp.lst, grep-able
- media:Vasp.txt, tab-separated CSV
- media:Vasp.pdf, PDF
Observations
- 4-core runs give a high numerical throughput in each node type (run=01 to 04)
- gen2 nodes are fine for VASP with nodes=1:ppn=8; gen1 nodes are not (run=22 vs. 21)
- On gen2 nodes, adding more nodes allows for either the fastest run (run=54) or a run that is about 40% slower but has a better charge rate (run=52)
- Running two apps in a single job is mostly not worth the effort of managing them (run=04 vs. 22)
- HT allows for slightly better charge rates, but usually only with non-MPI jobs (or unsynced MPI jobs) (run=15, 25, 40, 55), and runtimes are nearly proportionately longer, making HT largely unattractive. This also holds for the only case tested for HT and napps=1 (run=50).
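The "40% slower" trade-off between run=54 and run=52 can be checked with a few lines of arithmetic. The sketch below uses the tmax and charge values reported for those two runs in the Recommendations table; the function names are hypothetical.

```python
# tmax and charge for the two gen2 runs, taken from the Recommendations table
RUN_54 = {"tmax": 237.59, "charge": 2.11}  # nodes=4:ppn=4, fastest run
RUN_52 = {"tmax": 329.10, "charge": 1.46}  # nodes=2:ppn=8


def relative_slowdown(fast: dict, slow: dict) -> float:
    """Fraction by which `slow` takes longer than `fast`."""
    return slow["tmax"] / fast["tmax"] - 1.0


def relative_charge_saving(expensive: dict, cheap: dict) -> float:
    """Fraction of the charge saved by choosing `cheap` over `expensive`."""
    return 1.0 - cheap["charge"] / expensive["charge"]


if __name__ == "__main__":
    print(f"run=52 is {relative_slowdown(RUN_54, RUN_52):.1%} slower than run=54")
    print(f"run=52 saves {relative_charge_saving(RUN_54, RUN_52):.1%} of the charge")
```

With these inputs the slowdown comes out a little under 40%, and the charge drops by roughly 30%.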
Recommendations
For the given workload, I recommend the following values for optimal performance with respect to each objective.
| Node type | Objective: time → min | Objective: charge → min | Objective: time × charge → min |
|---|---|---|---|
| gen1 | nodes=4:ppn=3 | nodes=1:ppn=4 | nodes=3:ppn=4 |
| | run=35 tmax=503.05 charge=4.47 | run=01 tmax=1138.83 charge=2.53 | run=33 tmax=544.74 charge=3.63 |
| gen2 | nodes=4:ppn=4 | nodes=1:ppn=8 | nodes=2:ppn=8 |
| | run=54 tmax=237.59 charge=2.11 | run=22 tmax=472.16 charge=1.05 | run=52 tmax=329.10 charge=1.46 |
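As a cross-check of the table, the following Python sketch picks the best of the six listed runs for each objective. It only covers the runs shown above, not the full data set, and the `Run` type and `best_run` helper are hypothetical names introduced for illustration.

```python
from typing import NamedTuple


class Run(NamedTuple):
    run: str
    node_type: str
    request: str
    tmax: float
    charge: float


# The six runs listed in the recommendations table
RUNS = [
    Run("35", "gen1", "nodes=4:ppn=3", 503.05, 4.47),
    Run("01", "gen1", "nodes=1:ppn=4", 1138.83, 2.53),
    Run("33", "gen1", "nodes=3:ppn=4", 544.74, 3.63),
    Run("54", "gen2", "nodes=4:ppn=4", 237.59, 2.11),
    Run("22", "gen2", "nodes=1:ppn=8", 472.16, 1.05),
    Run("52", "gen2", "nodes=2:ppn=8", 329.10, 1.46),
]

# Each objective maps a run to the quantity that should be minimized
OBJECTIVES = {
    "time": lambda r: r.tmax,
    "charge": lambda r: r.charge,
    "time x charge": lambda r: r.tmax * r.charge,
}


def best_run(runs, node_type: str, objective: str) -> Run:
    """Return the run of the given node type that minimizes the objective."""
    candidates = [r for r in runs if r.node_type == node_type]
    return min(candidates, key=OBJECTIVES[objective])


if __name__ == "__main__":
    for node_type in ("gen1", "gen2"):
        for objective in OBJECTIVES:
            r = best_run(RUNS, node_type, objective)
            print(f"{node_type} {objective}: run={r.run} {r.request}")
```

Treating time × charge as a plain product is one reading of the third column; with that reading the selected runs match the table for both node types.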