HPC/Benchmarks/Generation 1 vs 2
== Introduction ==
Earlier this year, we received 200 additional nodes with E5540 processors. Each node provides 8 cores, and the processors support Hyperthreading (HT), a feature that allows 2 threads per core. This benchmark investigates the benefit of HT and suggests optimal values for the <code>nodes</code> and processors-per-node (<code>ppn</code>) parameters in PBS.
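For orientation, the <code>nodes</code> and <code>ppn</code> values are requested in the job script. The sketch below shows a minimal layout; the job name, the walltime, and the VASP launch line are placeholders rather than the scripts actually used for this benchmark.
<pre>
#!/bin/bash
#PBS -N vasp-bench
#PBS -l nodes=2:ppn=8          # 2 nodes with 8 processes per node = 16 MPI ranks
#PBS -l walltime=01:00:00      # placeholder walltime
cd $PBS_O_WORKDIR

# Start one MPI rank per requested core; the VASP binary name and the
# MPI launcher depend on the local installation.
mpirun -np 16 vasp > vasp.out
</pre>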
== Data ==
* [[HPC/Generation-2 nodes/vasp/vasp.lst]], grep-able
* [[media:Vasp.txt]], tab-separated CSV
* [[media:Vasp.pdf]], PDF
== Observations ==
* 4-core runs give high numerical throughput on each node type (run=01 to 04)
* gen2 nodes are fine for VASP with nodes=1:ppn=8; gen1 nodes are not (run=22 vs. 21)
* Adding more nodes allows for the fastest run (run=54), or a run about 40% slower with a better charge rate (run=52)
* Running two apps in a single job is mostly not worth the effort of managing them (run=04 vs. 22); see the sketch after this list
* HT allows for slightly better charge rates, but usually only with non-MPI jobs (or unsynced MPI jobs) (run=15, 25, 40, 55); since runtimes are nearly proportionately longer, HT is largely unattractive. This also holds for the only case tested with HT and napps=1 (run=50).
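''Two apps in a single job'' here means packing two independent VASP runs into one node allocation. A minimal sketch of what this involves is shown below; the directory names and the launch line are placeholders, not taken from the benchmark scripts.
<pre>
#!/bin/bash
#PBS -l nodes=1:ppn=8
#PBS -l walltime=02:00:00
cd $PBS_O_WORKDIR

# Run two independent 4-core VASP calculations side by side on one 8-core node;
# job1/ and job2/ are placeholder directories, each with its own input files.
( cd job1 && mpirun -np 4 vasp > vasp.out ) &
( cd job2 && mpirun -np 4 vasp > vasp.out ) &

# The job only ends when both apps are done, so the slower run determines
# the walltime; that is part of why this is rarely worth the management effort.
wait
</pre>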
== Recommendations ==
'''For the given workload,''' I recommend the following values for optimal performance with respect to the given objective.
{| class="wikitable" cellpadding="5" style="text-align:center; margin: 1em auto 1em auto;" | |||
|- style="background:#eee;" | |||
! Node type | |||
! colspan=3 | Objective | |||
|- style="background:#eee;" | |||
| width="100px" | | |||
| width="200px" | time → min | |||
| width="200px" | charge → min | |||
| width="200px" | time × charge → min | |||
|- | |||
| gen1 || <font color="blue"><code>nodes=4:ppn=3</code></font> || <font color="blue"><code>nodes=1:ppn=4</code></font> || <font color="blue"><code>nodes=3:ppn=4</code></font> | |||
|- | |||
| | |||
| ''run=35<br>tmax=503.05<br>charge=4.47'' | |||
| ''run=01<br>tmax=1138.83<br>charge=2.53'' | |||
| ''run=33<br>tmax=544.74<br>charge=3.63'' | |||
|- | |||
| gen2 || <font color="blue"><code>nodes=4:ppn=4</code></font> || <font color="blue"><code>nodes=1:ppn=8</code></font> || <font color="blue"><code>nodes=2:ppn=8</code></font> | |||
|- | |||
| | |||
| ''run=54<br>tmax=237.59<br>charge=2.11'' | |||
| ''run=22<br>tmax=472.16<br>charge=1.05'' | |||
| ''run=52<br>tmax=329.10<br>charge=1.46'' | |||
|- | |||
|} | |||
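To use a recommendation, request the corresponding layout at submission time. The example below picks the fastest gen2 configuration (run=54); the <code>gen2</code> node property, the walltime, and the job script name are placeholders, since how gen2 nodes are selected depends on the local PBS configuration.
<pre>
# 4 gen2 nodes with 4 MPI ranks each (run=54); ":gen2" is an assumed node property
qsub -l nodes=4:ppn=4:gen2 -l walltime=00:30:00 vasp-job.sh
</pre>
The charge values in the table appear consistent with whole-node accounting at 8 cores per node, i.e. charge ≈ nodes × 8 × tmax / 3600 core-hours with tmax in seconds; for run=54 this gives 4 × 8 × 237.59 / 3600 ≈ 2.11.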