HPC/Submitting and Managing Jobs/Queues and Policies
== Introduction ==
Carbon has one main queue and one debug queue, both defined as [http://en.wikipedia.org/wiki/TORQUE_Resource_Manager Torque] queues.
Once a job has been submitted, job routing decisions are made by the [http://www.clusterresources.com/products/moab-cluster-suite/workload-manager.php Moab scheduler].
In this framework, short jobs are accommodated by a daily reserved node and by ''backfill'' scheduling,
i.e. "waving forward" small jobs while one or more big jobs wait for full resources to become available.
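The backfill idea can be illustrated with a deliberately simplified model. The sketch below is not Moab's algorithm: it assumes a single waiting big job whose reservation starts after a known number of seconds, treats only the cores that stay idle until then as available, and uses a hypothetical helper <code>can_backfill</code> with made-up numbers.

```shell
#!/usr/bin/env bash
# Toy model of a backfill decision (a conservative simplification, NOT Moab's
# actual implementation). All numbers below are hypothetical.
#
# A waiting big job holds a reservation that starts in $1 seconds.
# Until then, $2 cores sit idle. A small job needing $3 cores for $4 seconds
# of walltime may be moved forward only if it fits in those idle cores AND
# is guaranteed to finish before the reservation begins.
can_backfill() {
  local shadow_s=$1 idle_cores=$2 job_cores=$3 job_walltime_s=$4
  if (( job_cores <= idle_cores && job_walltime_s <= shadow_s )); then
    echo yes
  else
    echo no
  fi
}

# Example: 16 cores are idle for the next 2 hours (7200 s) until a big job starts.
can_backfill 7200 16 8 3600     # yes: 8 cores for 1 h fits the gap
can_backfill 7200 16 8 14400    # no: 4 h would delay the big job
can_backfill 7200 16 32 600     # no: not enough idle cores
```

The practical consequence is that a realistic walltime request, rather than a generously padded one, gives a job more chances to be waved forward.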
== Default queue "batch" ==
The main queue on Carbon is <code>batch</code>. It is the default, so it need not be specified to <code>qsub</code> or in the job script.
The following defaults and limits apply:
 resources_'''default.nodes = 1:ppn=8'''
 resources_'''default.walltime = 00:15:00''' # 15 min
 resources_'''max.walltime = 240:00:00''' # 10 days
 max_user_queuable = 2000
For appropriate <code>ppn</code> values, see [[HPC/Hardware Details]].
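Since the default walltime is only 15 minutes, most production jobs in <code>batch</code> should request a walltime explicitly. A minimal job script sketch follows; the job name, the 4-hour walltime, and the single 8-core node are placeholders, so check <code>ppn</code> against [[HPC/Hardware Details]].

```shell
#!/usr/bin/env bash
# Minimal job script for the default "batch" queue (no -q option needed).
# Resource values are examples only; adjust them to your job.
#PBS -l nodes=1:ppn=8
#PBS -l walltime=04:00:00
#PBS -N example_job
#PBS -j oe

# Torque starts jobs in $HOME; change to the directory qsub was called from.
# Fall back to the current directory so the script also runs outside Torque.
workdir="${PBS_O_WORKDIR:-$PWD}"
cd "$workdir" || exit 1

echo "Job ${PBS_JOBID:-(not running under Torque)} on ${HOSTNAME:-unknown} in $workdir"
# Replace this line with the real workload.
```

Submitted with <code>qsub job.sh</code> (file name hypothetical), the <code>#PBS</code> lines take effect; the same requests can also be given on the command line, e.g. <code>qsub -l nodes=1:ppn=8,walltime=04:00:00 job.sh</code>.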
In addition, the Moab scheduler applies various per-user limits.
The simplest are hard and soft limits on the number of concurrent jobs and on the number of CPU cores in use (about 60% of the whole machine),
designed to prevent a single user from monopolizing the cluster while still permitting use of otherwise idle resources.
A more advanced parameter is a cutoff on the queued jobs ''considered for scheduling,'' based on the total number of cores they request (MAXIPROC).
This ensures fair job turnover between users without restricting throughput for large numbers of "small" jobs.
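The effect of a cap on the cores of jobs ''considered for scheduling'' can be sketched with a toy model. This illustrates the idea behind MAXIPROC under stated assumptions and is not Moab's implementation: jobs are examined strictly in queue order, and once the next job would exceed the cap, it and all later jobs wait. The helper <code>considered_jobs</code> and the cap of 64 cores are hypothetical, not Carbon's actual settings.

```shell
#!/usr/bin/env bash
# Simplified model of a per-user cap on cores requested by queued jobs that are
# "considered for scheduling". Assumptions: jobs are examined strictly in queue
# order, and the first job that would push the running total over the cap,
# plus every job after it, waits. Cap and job sizes are hypothetical.
considered_jobs() {
  local cap=$1; shift
  local total=0 cores
  for cores in "$@"; do              # each argument = cores requested by one job
    (( total + cores > cap )) && break
    (( total += cores ))
    printf '%s ' "$cores"
  done
  echo "(total ${total} of ${cap} cores)"
}

# One user has queued several small jobs followed by larger ones.
considered_jobs 64 8 8 8 8 32 8 8    # prints: 8 8 8 8 32 (total 64 of 64 cores)
```

Jobs left out are not rejected; in this model they become eligible as earlier jobs start and free room under the cap, while a user with many small jobs stays well below it.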
<!--
MAXJOB metaphor only: This concept is similar to a line of people waiting outside a building vs. those permitted to wait inside the lobby.
-->
See also:
* [http://www.adaptivecomputing.com/resources/docs/torque/4.1queueconfig.php Torque queue configuration].
* [http://www.adaptivecomputing.com/resources/docs/mwm/6.2throttlingpolicies.php Moab Usage Limits/Throttling Policies].
== Queue "debug" ==
For testing job processing and your job environment, use <code>qsub '''-q debug'''</code> on the command line, or the following in a job script:
 #PBS '''-q debug'''
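A short debug job is a convenient way to check what a job actually sees at run time. The sketch below requests one node with 4 cores for 5 minutes, well inside the debug limits; the resource values are examples. It reads the Torque environment variables <code>PBS_JOBID</code>, <code>PBS_QUEUE</code>, <code>PBS_O_WORKDIR</code> and <code>PBS_NODEFILE</code>, and prints placeholder values when run outside Torque.

```shell
#!/usr/bin/env bash
#PBS -q debug
#PBS -l nodes=1:ppn=4
#PBS -l walltime=00:05:00
#PBS -j oe

# Quick sanity check of the job environment before submitting a long run.
# Outside Torque the PBS_* variables are unset, so defaults are printed instead.
echo "Job ID:     ${PBS_JOBID:-unset}"
echo "Queue:      ${PBS_QUEUE:-unset}"
echo "Submit dir: ${PBS_O_WORKDIR:-unset}"
echo "Host:       ${HOSTNAME:-unknown}"

# $PBS_NODEFILE lists one line per allocated core; count the lines if it exists.
if [[ -n "${PBS_NODEFILE:-}" && -r "$PBS_NODEFILE" ]]; then
  ncores=$(wc -l < "$PBS_NODEFILE")
else
  ncores=0
fi
echo "Cores in node file: $ncores"
```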
The debug queue accepts jobs under the following conditions:
 resources_'''default.nodes = 1:ppn=4'''
 resources_'''max.nodes = 2:ppn=8'''
 resources_'''default.walltime = 00:15:00'''
 resources_'''max.walltime = 01:00:00'''
 max_user_queuable = 3
 max_user_run = 2
In other words, each user may have up to 3 debug jobs queued and 2 running, and each job is limited to:
 nodes ≤ 2
 ppn ≤ 8
 walltime ≤ 1:00:00 # 1 hour
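These per-job limits can be checked before submitting. The helpers <code>walltime_to_seconds</code> and <code>fits_debug</code> below are hypothetical, not Torque tools; the sketch assumes walltimes written as HH:MM:SS, MM:SS or plain seconds, and checks only the node, ppn and walltime limits.

```shell
#!/usr/bin/env bash
# Pre-submission check against the debug queue limits:
# nodes <= 2, ppn <= 8, walltime <= 1:00:00.
# walltime_to_seconds and fits_debug are hypothetical helpers, not Torque tools.

# Convert HH:MM:SS, MM:SS or SS to seconds.
walltime_to_seconds() {
  local IFS=:
  local -a parts=($1)
  local total=0 p
  for p in "${parts[@]}"; do
    (( total = total * 60 + 10#$p ))   # 10# avoids octal parsing of "08", "09"
  done
  echo "$total"
}

fits_debug() {
  local nodes=$1 ppn=$2 walltime=$3 secs
  secs=$(walltime_to_seconds "$walltime")
  if (( nodes >= 1 && nodes <= 2 && ppn >= 1 && ppn <= 8 && secs <= 3600 )); then
    echo ok
  else
    echo "too large for debug"
  fi
}

fits_debug 2 8 01:00:00   # ok (exactly at the limits)
fits_debug 1 4 00:15:00   # ok (the queue defaults)
fits_debug 4 8 00:10:00   # too large for debug (nodes > 2)
fits_debug 1 8 02:00:00   # too large for debug (walltime > 1 hour)
```

The same conversion applies to the <code>batch</code> queue, whose maximum walltime of 240:00:00 is 864000 seconds, i.e. 10 days.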
Latest revision as of 17:07, April 9, 2024