HPC/Benchmarks/Generation 1 vs 2: Difference between revisions

Revision as of 21:08, March 14, 2010

Introduction

Earlier this year, we received 200 additional nodes with E5540 processors. Each node provides 8 cores and supports Hyper-Threading (HT), a feature that allows 2 threads per core. This benchmark investigates the benefit of HT and suggests optimal values for the nodes and processors per node (ppn) parameters in PBS.

Observations and conclusions

4-core runs give a high numerical throughput in each node type (run=01 to 04)
gen2 nodes are fine for VASP with nodes=1:ppn=8; gen1 nodes are not (run=22 vs. 21)
Adding more nodes yields either the fastest run (run=54) or a run that is 40% slower but has a better charge rate (run=52)
Running two apps in a single job is mostly not worth the effort of managing them (run=04 vs. 22)
HT allows for slightly better charge rates, but usually only with non-MPI jobs (or unsynced MPI jobs) (run=15, 25, 40, 55), and runtimes are nearly proportionately longer, making HT largely unattractive. This also holds for the only case tested for HT and napps=1 (run=50).
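
For orientation, a request such as nodes=1:ppn=8 might appear in a job script roughly as sketched below. This is a minimal sketch under assumptions, not the script used for these benchmarks: the job name, the helper function count_ranks, the fallback value, and the mpirun/vasp command line are all hypothetical, and the one-line-per-core layout of $PBS_NODEFILE is assumed to follow Torque-style PBS behavior.

```shell
#!/bin/bash
#PBS -l nodes=1:ppn=8
#PBS -N vasp_bench            # hypothetical job name

# Count the MPI ranks to start. Assumes a Torque-style $PBS_NODEFILE that
# lists each node once per requested core; falls back to 8 (matching ppn=8
# above) when the script is run outside a PBS job.
count_ranks() {
    if [ -n "${PBS_NODEFILE:-}" ] && [ -r "$PBS_NODEFILE" ]; then
        wc -l < "$PBS_NODEFILE" | tr -d ' '
    else
        echo 8
    fi
}

NP=$(count_ranks)

# Placeholder launch line: the actual MPI launcher and VASP binary name
# depend on the site installation, so it is only printed here.
echo "would run: mpirun -np $NP vasp"
```

Deriving the rank count from the node file rather than hard-coding it keeps the launch line in step with whatever nodes and ppn values the job actually requested.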

@@ Line 1: / Line 1: @@
 == Introduction ==
-Earlier this year, we received 200 additional nodes with E5540 processors.
+Earlier this year, we received 200 additional nodes with E5540 processors.  The processors have 8 cores, and support Hyperthreading, a feature which allows 2 threads per core.  This benchmark investigates the benefit of hyperthreading (HT), and suggests optimal values for the ''nodes'' and ''processors per node (ppn)'' parameters in PBS.
+== Observations and conclusions ==
+* 4-core runs give a high numerical throughput in each node type  (run=01 to 04)
+* gen2 nodes are fine for VASP with nodes=1:ppn=8; gen1 nodes are not (run=22 vs. 21)
+* Adding more nodes allows for the fastest run (run=54) or 40% slower and a better charge rate (run=52)
+* Running two apps in a single job is mostly not worth the effort of managing them (run=04 vs. 22)
+* HT allows for slightly better charge rates, but usually only with non-MPI jobs (or unsynced MPI jobs) (run=15, 25, 40, 55), and runtimes are nearly proportionately longer, making HT largely unattractive.  This also holds for the only case tested for HT and napps=1 (run=50).
