HPC/Benchmarks/Generation 1 vs 2

== Introduction ==
Earlier this year, we received 200 additional nodes with E5540 processors. Each of these nodes provides 8 cores and supports Hyper-Threading (HT), a feature that allows 2 hardware threads per core. This benchmark investigates the benefit of HT and suggests optimal values for the <code>nodes</code> and processors-per-node (<code>ppn</code>) parameters in PBS.
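For readers unfamiliar with the PBS syntax used throughout this page, here is a minimal sketch of a job script requesting one of the layouts from the Recommendations below. The walltime, executable name, and MPI launcher options are placeholders and will differ on a real installation; only the <code>nodes=…:ppn=…</code> resource string is taken from this page.

<pre>
#!/bin/bash
# Request 2 gen2 nodes with 8 MPI ranks per node (see Recommendations).
#PBS -l nodes=2:ppn=8
# Placeholder walltime; set to whatever the run actually needs.
#PBS -l walltime=1:00:00

# PBS starts the job in $HOME; change to the directory the job was submitted from.
cd $PBS_O_WORKDIR

# $PBS_NODEFILE lists the allocated hosts, one line per requested core.
# The launcher name and options are placeholders -- the exact syntax
# depends on the local MPI stack and VASP installation.
NP=$(wc -l < $PBS_NODEFILE)
mpirun -np $NP -machinefile $PBS_NODEFILE vasp
</pre>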
== Data ==
* [[media:Vasp.pdf]], PDF


== Observations ==
* 4-core runs give a high numerical throughput on each node type (run=01 to 04)
* gen2 nodes are fine for VASP with <code>nodes=1:ppn=8</code>; gen1 nodes are not (run=22 vs. 21)
* Adding more nodes allows for the fastest run (run=54); accepting a run about 40% slower gives a better charge rate (run=52)
* Running two apps in a single job is mostly not worth the effort of managing them (run=04 vs. 22); see the sketch after this list
* HT allows for slightly better charge rates, but usually only with non-MPI jobs (or unsynced MPI jobs) (run=15, 25, 40, 55), and runtimes are nearly proportionately longer, making HT largely unattractive. This also holds for the only case tested with HT and napps=1 (run=50).
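As a concrete illustration of the two-apps-per-job point above, the sketch below packs two independent 4-core VASP runs into a single <code>ppn=8</code> job. The subdirectory names, walltime, and launcher options are hypothetical; note that nothing here pins the two runs to disjoint cores, which is part of why managing such jobs is rarely worth the effort.

<pre>
#!/bin/bash
#PBS -l nodes=1:ppn=8
# Placeholder walltime.
#PBS -l walltime=4:00:00

cd $PBS_O_WORKDIR

# Run two independent 4-core calculations side by side (napps=2).
# The subdirectory names are hypothetical; each must contain its own VASP input.
( cd calc1 && mpirun -np 4 vasp > vasp.out 2>&1 ) &
( cd calc2 && mpirun -np 4 vasp > vasp.out 2>&1 ) &
# The job ends only when both background runs have finished.
wait
</pre>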
== Recommendations ==
'''For the given workload,''' I recommend the following values for optimal performance with respect to each objective.
{| class="wikitable" cellpadding="5" style="text-align:center;  margin: 1em auto 1em auto;"
|- style="background:#eee;"
! Node type
! colspan=3 | Objective
|- style="background:#eee;"
| width="100px" |
| width="200px" | time → min
| width="200px" | charge →  min
| width="200px" | time  × charge → min
|-
|  gen1  ||  <font color="blue"><code>nodes=4:ppn=3</code></font>  ||  <font color="blue"><code>nodes=1:ppn=4</code></font>  ||  <font color="blue"><code>nodes=3:ppn=4</code></font>
|-
|
| ''run=35<br>tmax=503.05<br>charge=4.47''
| ''run=01<br>tmax=1138.83<br>charge=2.53''
| ''run=33<br>tmax=544.74<br>charge=3.63''
|-
|  gen2  ||  <font color="blue"><code>nodes=4:ppn=4</code></font>  ||  <font color="blue"><code>nodes=1:ppn=8</code></font>  ||  <font color="blue"><code>nodes=2:ppn=8</code></font>
|-
|
| ''run=54<br>tmax=237.59<br>charge=2.11''
| ''run=22<br>tmax=472.16<br>charge=1.05''
| ''run=52<br>tmax=329.10<br>charge=1.46''
|-
|}
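The ''charge'' column is not defined on this page, but the tabulated values are consistent with full-node accounting at 8 cores per node and ''tmax'' in seconds; this is an inference from the numbers above, not a statement of the accounting policy:

<math>\text{charge} \approx \frac{\text{nodes} \times 8 \times t_\text{max}}{3600}</math> core-hours.

For example, run=54 (<code>nodes=4:ppn=4</code>, tmax=237.59) gives 4 × 8 × 237.59 / 3600 ≈ 2.11, matching the table; the same relation reproduces the other five entries to within rounding.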