HPC/Hardware Details

From CNM Wiki



[[Image:HPC-Main-external.jpg|thumb|right|200px|Carbon Cluster]]

== User nodes ==

Carbon has several major hardware generations of nodes, named ''genN'' for short, with ''N'' an integer. Within some generations, nodes differ further by the amount of memory installed.

{| class="wikitable"
|+ Node types
|-
! Node names, types !! Node generation !! Node extra properties !! Node count !! Cores per node (max. ppn) !! Cores total, by type !! Account charge rate !! CPU model !! CPUs per node !! CPU nominal clock (GHz) !! Mem. per node (GB) !! Mem. per core (GB) !! GPU model !! GPUs per node !! VRAM per GPU (GB) !! Disk per node (GB) !! Year added !! Note
|-
! colspan="18" | Login
|-
| login5…6 || gen7a || gpus=2 || 2 || 16 || 32 || 1.0 || Xeon Silver 4125 || 2 || 2.50 || 192 || 12 || Tesla V100 || 2 || 32 || 250 || 2019 ||
|-
! colspan="18" | Compute
|-
| n421…460 || gen5 || || 40 || 16 || 640 || 1.0 || Xeon E5-2650 v4 || 2 || 2.10 || 128 || 8 || || || || 250 || 2017 ||
|-
| n461…476 || gen6 || || 16 || 16 || 256 || 1.0 || Xeon Silver 4110 || 2 || 2.10 || 96 || 6 || || || || 1000 || 2018 ||
|-
| n477…512 || gen6 || || 36 || 16 || 576 || 1.0 || Xeon Silver 4110 || 2 || 2.10 || 192 || 12 || || || || 1000 || 2018 ||
|-
| n513…534 || gen7 || gpus=2 || 22 || 32 || 704 || 1.5 || Xeon Gold 6226R || 2 || 2.90 || 192 || 6 || Tesla V100S || 2 || 32 || 250 || 2020 ||
|-
| n541…580 || gen8 || || 20 || 64 || 2560 || 1.0 || Xeon Gold 6430 || 2 || 2.10 || 1024 || 16 || || || || 420 || 2024 ||
|-
! Total !! !! !! 134 !! !! 4736 !! !! !! !! !! !! !! !! 48 !! !! !! !!
|}
* Compute time is charged as the product of cores reserved × wallclock time × charge rate; the charge rate accommodates nominal differences in CPU speed between generations (a worked example follows this list).
* gen7 nodes have two GPUs each; GPU usage is currently not "charged" (accounted for) separately.
* Virtual memory usage on a node may reach up to about 2× its physical memory size. Processes running under PBS may allocate that much vmem, but cannot practically use all of it because of swap-space size and bandwidth limits. If a node actively uses swap for more than a few minutes (which drastically slows down compute performance), the job is killed automatically.
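As an illustration of the charging rule above, here is a minimal sketch in Python. The charge rates are copied from the node-types table; the function name is ours and not part of any Carbon tooling.

<syntaxhighlight lang="python">
# Sketch: estimate the core-hours charged for a job, following the rule
# "cores reserved × wallclock time × charge rate" described above.
# Charge rates per generation are copied from the node-types table;
# this helper is illustrative only and not part of any Carbon tool.

CHARGE_RATE = {"gen5": 1.0, "gen6": 1.0, "gen7": 1.5, "gen8": 1.0}

def charged_core_hours(cores_reserved: int, walltime_hours: float, generation: str) -> float:
    """Core-hours deducted from the allocation for one job."""
    return cores_reserved * walltime_hours * CHARGE_RATE[generation]

# Example: two full gen7 nodes (32 cores each) for 12 hours of walltime
print(charged_core_hours(cores_reserved=2 * 32, walltime_hours=12, generation="gen7"))
# -> 1152.0
</syntaxhighlight>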

== Major CPU flags ==

CPU capabilities grow with each node generation. Executables can be compiled to take advantage of specific CPU capabilities; jobs using such executables must pass the qsub option <code>-l nodes=...:genX</code> to be directed to nodes that provide them. A small run-time check for individual flags is sketched after the table below.

Major CPU capability flags by node generation. For details, see the [https://en.wikipedia.org/wiki/CPUID CPUID instruction] article on Wikipedia, a StackExchange article, or <code>/usr/src/kernels/*/arch/x86/include/asm/cpufeatures.h</code> in the kernel sources.

{| class="wikitable"
! Flag name !! gen5 !! gen6 !! gen7 !! gen8
|-
| cat_l2 cdp_l2 cldemote gfni movdir64b movdiri pconfig sha_ni umip vaes vpclmulqdq || || || || x
|-
| avx512_bitalg || || || || x
|-
| avx512_vbmi2 || || || || x
|-
| avx512_vpopcntdq || || || || x
|-
| avx512ifma || || || || x
|-
| avx512vbmi || || || || x
|-
| avx512_vnni || || || x || x
|-
| mpx || || x || x ||
|-
| avx512bw || || x || x || x
|-
| avx512cd || || x || x || x
|-
| avx512dq || || x || x || x
|-
| avx512f || || x || x || x
|-
| avx512vl || || x || x || x
|-
| art clwb flush_l1d ibpb mba md_clear ospke pku ssbd stibp tsc_deadline_timer xgetbv1 xsavec || || x || x || x
|-
| 3dnowprefetch abm acpi aes aperfmperf apic arat arch_perfmon bmi1 bmi2 bts cat_l3 cdp_l3 cmov constant_tsc cqm cqm_llc cqm_mbm_local cqm_mbm_total cqm_occup_llc cx16 cx8 dca de ds_cpl dtes64 dtherm dts eagerfpu epb ept erms est f16c flexpriority fpu fsgsbase fxsr hle ht ida invpcid invpcid_single lahf_lm lm mca mce mmx monitor movbe msr mtrr nonstop_tsc nopl nx pae pat pbe pcid pclmulqdq pdcm pdpe1gb pebs pge pln pni popcnt pse pse36 pts rdrand rdseed rdt_a rdtscp rep_good rsb_ctxsw rtm sdbg sep smap smep smx ss sse sse2 sse4_1 ssse3 syscall tm tm2 tpr_shadow tsc tsc_adjust vme vmx vnmi vpid x2apic xsave xsaveopt xtopology xtpr || x || x || x || x
|-
| avx || x || x || x || x
|-
| avx2 || x || x || x || x
|-
| fma || x || x || x || x
|-
| adx || x || x || x || x
|-
| sse4_2 || x || x || x || x
|}
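Whether a given node actually exposes the flags listed above can be verified by reading /proc/cpuinfo on that node. Below is a minimal sketch in Python (Linux-specific; the flags checked at the bottom are only an example selection, not a complete list):

<syntaxhighlight lang="python">
# Sketch: report which CPU feature flags from the table above are present
# on the node where this script runs, by parsing /proc/cpuinfo (Linux only).
# The flags checked below are an example selection, not an exhaustive list.

def cpu_flags() -> set:
    """Return the feature-flag set of the first CPU listed in /proc/cpuinfo."""
    with open("/proc/cpuinfo") as f:
        for line in f:
            if line.startswith("flags"):
                return set(line.split(":", 1)[1].split())
    return set()

if __name__ == "__main__":
    wanted = ["avx2", "avx512f", "avx512_vnni", "avx512_bitalg"]
    have = cpu_flags()
    for flag in wanted:
        print(f"{flag:>14}: {'present' if flag in have else 'absent'}")
</syntaxhighlight>

On a gen6 node, for example, this should report avx512f as present but avx512_vnni and avx512_bitalg as absent, matching the table.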


== Infrastructure nodes ==
* 2 management nodes
* 2 Lustre MDS (metadata servers)
* 4 Lustre OSS (object storage servers)
* dual-socket, quad-core (Intel Xeon E5345, 2.33 GHz)
* pairwise failover

== Storage ==

* [http://wiki.whamcloud.com/ Lustre] parallel file system for /home and /sandbox
* ≈600 TB total
* 160–250 GB local disk per compute node
[[Image:HPC Infiniband-blue.png|thumb|right|200px|]]

== Interconnect ==

* InfiniBand – used for parallel communication (MPI) and Lustre storage traffic
* Gigabit Ethernet – used for general node access and management

== Power ==

* Power consumption at typical load: ≈125 kW

[[Category:HPC|Hardware]]