ARROW Cluster: Difference between revisions

From TRACC Wiki
(9 intermediate revisions by the same user not shown)
Line 1:
== Introduction To ARROW ==
- TRACC has now combined the hardware from the Phoenix and Zephyr clusters into the ARROW cluster. This consolidation allows efficient administration of TRACC cluster services with limited staff. To avoid the problems of load balancing, the different types of hardware nodes on the ARROW cluster are partitioned and available in queues. When new hardware is installed to expand cluster resources, it will be made available via a new queue. The documentation at [[Using the Clusters]] describes procedures for using ARROW.
+ TRACC has now combined the hardware from the Phoenix and Zephyr clusters into the ARROW cluster. This consolidation allows efficient administration of TRACC cluster services with limited staff. To avoid the problems of load balancing, the different types of hardware nodes on the ARROW cluster are partitioned and available in queues. When new hardware is installed to expand cluster resources, it will be made available via a new queue. The documentation at [[Using the Cluster]] describes procedures for using ARROW.
<p>ARROW is arranged such that there is a single set of login nodes, a single file system, and a single user home directory that serve all of the nodes in all of the queues.
== ARROW Queues==
- There are currently five queues that are available with some restrictions about who can use them as described below.
+ There are currently several queues available, some with restrictions on who can use them, as described below. Also be aware that in some queues all nodes have the same characteristics (RAM, etc.), while other queues contain nodes with different characteristics. Jobs that use those mixed queues must therefore specify the names of the nodes to be used.
- * batch ('''default queue''', with 94 nodes, each node with 16 floating point cores available for general use)
- ** 92 nodes have 32 GB of RAM
- ** 2 nodes (nodes 1 and 2) with 128GB
- ** 2 nodes (nodes 3 and 4) with 64GB
- * nhtsa (with 12 nodes, each with 28 cores and 64 GB of RAM, only available to the NHTSA project)
- * arrow (one new EPYC server with 64 cores, for testing by TRACC staff or by special permission of the TRACC Director)
- * virtual (This queue is only available for testing and is considered under construction. Please do not use for now.)
- * test (This queue is only available for testing, and only with permission of the TRACC Director. The nodes as currently configured are not very powerful but have large amounts of RAM.)
+ * batch queue (default queue)
+ ** 95 nodes numbered n005 through n099
+ ** 2 x AMD Opteron 6276
+ ** 16 floating point cores per node
+ ** 32GB of RAM per node
+ ** available for general use
+ * batch128 queue
+ ** 2 nodes numbered n001 and n002
+ ** Same design as batch queue
+ ** 128GB of RAM per node
+ ** available for general use
+ * batch64 queue
+ ** 2 nodes numbered n003 and n004
+ ** Same design as batch queue
+ ** 64GB of RAM per node
+ ** available for general use
+ * nhtsa queue
+ ** 12 nodes numbered p001 through p012
+ ** 2 x Intel Xeon E5-2690 v4
+ ** 28 floating point cores per node
+ ** 64GB of RAM per node
+ ** only available to the NHTSA project
+ * arrow queue
+ ** 5 nodes numbered a001 through a005 (more on order)
+ ** 1 x AMD EPYC 7702P
+ ** 64 floating point cores per node
+ ** 256GB of RAM per node, 512GB on nodes a001 through a003
+ ** available for general use
+ * virtual queue
+ ** 5 nodes numbered v001 through v005
+ ** For internal testing and validation only
+ ** Minimal virtual hardware, not capable of running engineering applications
+ * test queue
+ ** 4 nodes numbered t001 through t004
+ ** 1 x Intel Xeon CPU E5-2620
+ ** 6 floating point cores per node
+ ** 64GB of RAM on t001 and t003, 32GB on t002 and t004
+ ** Available upon request, but usually reserved for testing and parallel software development

Revision as of 21:24, July 6, 2021

Introduction To ARROW

TRACC has now combined the hardware from the Phoenix and Zephyr clusters into the ARROW cluster. This consolidation allows efficient administration of TRACC cluster services with limited staff. To avoid the problems of load balancing, the different types of hardware nodes on the ARROW cluster are partitioned and available in queues. When new hardware is installed to expand cluster resources, it will be made available via a new queue. The documentation at Using the Cluster describes procedures for using ARROW.

ARROW is arranged such that there is a single set of login nodes, a single file system, and a single user home directory that serve all of the nodes in all of the queues.

ARROW Queues

There are currently several queues available, some with restrictions on who can use them, as described below. Also be aware that in some queues all nodes have the same characteristics (RAM, etc.), while other queues contain nodes with different characteristics. Jobs that use those mixed queues must therefore specify the names of the nodes to be used.


  • batch queue (default queue)
    • 95 nodes numbered n005 through n099
    • 2 x AMD Opteron 6276
    • 16 floating point cores per node
    • 32GB of RAM per node
    • available for general use


  • batch128 queue
    • 2 nodes numbered n001 and n002
    • Same design as batch queue
    • 128GB of RAM per node
    • available for general use


  • batch64 queue
    • 2 nodes numbered n003 and n004
    • Same design as batch queue
    • 64GB of RAM per node
    • available for general use


  • nhtsa queue
    • 12 nodes numbered p001 through p012
    • 2 x Intel Xeon E5-2690 v4
    • 28 floating point cores per node
    • 64GB of RAM per node
    • only available to the NHTSA project


  • arrow queue
    • 5 nodes numbered a001 through a005 (more on order)
    • 1 x AMD EPYC 7702P
    • 64 floating point cores per node
    • 256GB of RAM per node, 512GB on nodes a001 through a003
    • available for general use


  • virtual queue
    • 5 nodes numbered v001 through v005
    • For internal testing and validation only
    • Minimal virtual hardware, not capable of running engineering applications


  • test queue
    • 4 nodes numbered t001 through t004
    • 1 x Intel Xeon CPU E5-2620
    • 6 floating point cores per node
    • 64GB of RAM on t001 and t003, 32GB on t002 and t004
    • Available upon request, but usually reserved for testing and parallel software development
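
Because the arrow and test queues mix nodes with different amounts of RAM, it can help to work out which node names meet a job's memory requirement before submitting. The following is a minimal Python sketch that encodes the queue list above as data and filters it. The names `Node`, `node_names`, `build_inventory`, and `select_nodes` are hypothetical helpers invented for this illustration and are not part of any TRACC tooling. The sketch assumes that arrow nodes a004 and a005 carry 256GB, and it omits the virtual queue because no core or RAM figures are listed for it. How node names are passed to the scheduler is not covered here; see Using the Cluster for that.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Node:
    """One compute node, described by the figures in the queue list above."""
    name: str
    queue: str
    cores: int    # floating point cores per node
    ram_gb: int   # RAM per node in GB


def node_names(prefix: str, first: int, last: int) -> List[str]:
    """Expand a numbered range such as n005 through n099 into node names."""
    return [f"{prefix}{i:03d}" for i in range(first, last + 1)]


def build_inventory() -> List[Node]:
    """Transcribe the queue list into a flat list of nodes (illustrative only)."""
    nodes: List[Node] = []
    nodes += [Node(n, "batch", 16, 32) for n in node_names("n", 5, 99)]
    nodes += [Node(n, "batch128", 16, 128) for n in node_names("n", 1, 2)]
    nodes += [Node(n, "batch64", 16, 64) for n in node_names("n", 3, 4)]
    nodes += [Node(n, "nhtsa", 28, 64) for n in node_names("p", 1, 12)]
    # Assumption: a001-a003 have 512GB and the remaining arrow nodes have 256GB.
    nodes += [Node(n, "arrow", 64, 512) for n in node_names("a", 1, 3)]
    nodes += [Node(n, "arrow", 64, 256) for n in node_names("a", 4, 5)]
    # The test queue alternates between 64GB and 32GB nodes.
    nodes += [Node(n, "test", 6, 64) for n in ("t001", "t003")]
    nodes += [Node(n, "test", 6, 32) for n in ("t002", "t004")]
    return nodes


def select_nodes(inventory: List[Node], queue: str, min_ram_gb: int = 0) -> List[str]:
    """Return the names of nodes in `queue` with at least `min_ram_gb` of RAM."""
    return [n.name for n in inventory
            if n.queue == queue and n.ram_gb >= min_ram_gb]


if __name__ == "__main__":
    inventory = build_inventory()
    print(select_nodes(inventory, "arrow", min_ram_gb=512))  # ['a001', 'a002', 'a003']
    print(select_nodes(inventory, "test", min_ram_gb=64))    # ['t001', 't003']
```

Keeping the inventory as plain data means that when a new queue is added, only `build_inventory` needs to change. The figures should be checked against the current queue list before relying on them.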