Introduction To ARROW
TRACC has combined the original hardware from the Phoenix and Zephyr clusters into the ARROW cluster. This consolidation allows efficient administration of TRACC cluster services. To avoid load-balancing problems across the mixed hardware, the different types of hardware nodes on the ARROW cluster are partitioned into separate queues. When new hardware is installed to expand cluster resources, it will be made available via a new queue.
ARROW is arranged such that there is a single set of 4 login nodes, a single file system, and a single user home directory serving all of the nodes in all of the queues. Further, every node on ARROW is logically assigned to one of two scheduling systems. The first is Torque together with the Maui scheduler (called simply Torque in the rest of this presentation), and the second is SLURM.
Current Job Submission System

By default, we are currently using Torque as our queuing system. Jobs can be submitted from any of the login nodes. Once a job starts, the nodes assigned to that user are generally accessible by additional ssh sessions from any other node in the system. For example, if you submit a job from login1, you can go to login2 and open an ssh session to a node that the scheduler handed out. Think of it as a global resource allocation: you get access to a few nodes that you can use however you wish until the job time expires. This holds for both interactive and batch sessions; they behave the same way. Any node assigned to a user is fully allocated to that user, and a job can only request full nodes; no other user can share a node that has been handed out. The queues are used to request a particular CPU type for the job. That covers the essentials of Torque and Maui.
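To make this concrete, here is a minimal sketch of a Torque batch script. The queue name, core count, walltime, and program below are hypothetical placeholders rather than actual ARROW settings; substitute the queue for the CPU type you need and the ppn value that matches that node type.

    #!/bin/bash
    #PBS -N example_job            # job name
    #PBS -q some_queue             # hypothetical queue name; queues select the CPU type
    #PBS -l nodes=2:ppn=16         # whole nodes only; ppn = cores per node in this queue
    #PBS -l walltime=04:00:00      # the nodes remain yours until this limit expires
    #PBS -j oe                     # merge stdout and stderr into one output file

    cd $PBS_O_WORKDIR              # start in the directory the job was submitted from
    cat $PBS_NODEFILE              # list the nodes the scheduler handed out
    mpirun -np 32 ./my_program     # hypothetical MPI launch across both nodes (2 x 16 cores)

Submit the script with qsub from any login node. Once the job is running, qstat can show which nodes were assigned, and you can then ssh to them from any other node, e.g. from login2 after submitting on login1:

    qsub job.pbs                   # submit from any login node
    qstat -n -u $USER              # -n lists the nodes assigned to each of your jobs
    ssh <node-name>                # open an additional session on one of your assigned nodes

An interactive session works the same way; for example, qsub -I -q some_queue -l nodes=1:ppn=16,walltime=02:00:00 hands you a shell on a full node until the walltime expires.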