Using the cluster

Zephyr In Production

TRACC has installed a new cluster computer called Zephyr. The old cluster has been renamed Phoenix and is still available. The following information describes the differences between using Zephyr and Phoenix. Note that the old commands without the term Phoenix will still work, but we prefer that you start using the new form. The Zephyr commands require the inclusion of the term Zephyr.

Zephyr Outstanding Issues

Data Backup

We currently do full and incremental backups of data on Phoenix's GPFS file system. The data is backed up to a robotic tape unit directly connected to Phoenix. We intend to provide an InfiniBand connection between Phoenix and Zephyr in the near future, which will allow us to back up some or all of the data residing on Zephyr's Lustre file system onto tape. While Zephyr provides a measure of redundancy using RAID controllers, users should be aware that until data backup on Zephyr is implemented, there is some risk that data could be lost if a catastrophic event occurs on the Lustre file system.

Transferring Files Between the Two Clusters

As described above under Data Backup, we plan to provide an InfiniBand connection between the two clusters in the near future. We will then provide a mechanism to mount Phoenix-based GPFS files (read-only) on Zephyr and, similarly, to mount Zephyr-based Lustre files (read-only) on Phoenix. Until that feature is implemented, users can still transfer files between the two systems using sftp (see Transferring files from Phoenix to Zephyr below).

Ls-Dyna

The qsub-dyna-gui tool is not fully functional; do not use it to submit LS-Dyna jobs until it is updated. In the meantime, please use qsub-dynatui.py, and ensure the lstc/base module is loaded ("module load lstc/base") before executing it.
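
A typical submission session would therefore be:

module load lstc/base
qsub-dynatui.py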

LS-Opt does not report the success/failure status of submitted jobs and therefore does not work properly. We are working on resolving this issue.

Ls-Dyna Licensing

LS-Dyna licensing has changed. We are now operating a commercial license that is not shared with students. Academic users need to purchase an Educational License from LSTC for use on the Zephyr cluster. If you are an academic institution and wish to obtain an Educational License for use on TRACC, please let us know so that we can coordinate the efforts with LSTC.

TRANSIMS Studio

This section is still under construction.

How to Connect: Summary

You'll need an SSH client to connect to the cluster.

To Login to Zephyr

hostname: login.zephyr.tracc.anl.gov 
username: your ANL domain username (without the ANL prefix) 
password: your ANL domain password

To Login to Phoenix

hostname: login.phoenix.tracc.anl.gov
username: your ANL domain username (without the ANL prefix)
password: your ANL domain password

IMPORTANT: In order for your home directory to be properly set up on Zephyr, make your initial connection to the cluster using PuTTY on Windows, or ssh from a UNIX/Linux workstation with the command ssh username@login.zephyr.tracc.anl.gov. After that, NoMachine may be used to establish a graphical desktop environment.

How to Connect: Details

SSH

  • Access to the TRACC cluster requires use of an SSH client. Free SSH clients are available for Windows, and are included with MacOS, Linux, and most varieties of UNIX.
  • Alternatively, the NoMachine NX client also uses SSH. For Zephyr you will have to select the GNOME DESKTOP.
  • If you're using a graphical application on the cluster, you'll need to enable X11 forwarding in your client. For Windows clients, this will usually involve checking a checkbox; for MacOS and Linux, you can do this:
ssh -X username@login.zephyr.tracc.anl.gov
or
ssh -X username@login.phoenix.tracc.anl.gov

Access restricted to authorized hosts

  • Access to the cluster is currently restricted to pre-approved hosts. Please contact us with the hostnames of the systems from which you would like to access the cluster. Users with Argonne Windows domain accounts can connect from the VPN; please contact us for the details.
  • Under Windows, you can find out your host name with the command "ipconfig". Note that IP addresses beginning with "10." or "192.168." are private addresses, and are not accessible outside your own network. If you have such an address, you can find out your public address at (for example) http://whatismyip.com.

login.zephyr.tracc.anl.gov or login.phoenix.tracc.anl.gov

  • To access the cluster, ssh to "login.zephyr.tracc.anl.gov" or to "login.phoenix.tracc.anl.gov". These names refer to all of the cluster's login hosts (two on Zephyr and three on Phoenix). Users are placed onto the real hosts in round-robin fashion, and as a result may end up on a different host from one login to the next. Because the real hosts and their hostnames may change without notice, we suggest that you use the alias rather than the real hostnames to log onto the cluster.

Passwords

  • Please create an account profile at myPassword so that you can recover your password if you forget it.
  • The traditional commands for changing Unix passwords, passwd and yppasswd, don't work on the cluster, and have been disabled.
  • If your password has expired, or if you've forgotten it and can't reset it using myPassword,
  1. Contact the Argonne Help Desk at help@anl.gov or 630-252-9999. The Help Desk will give us a temporary, one-time password.
  2. Call us to obtain this password. We cannot email it to you.
  3. Go to https://credentials.anl.gov/ and choose a new password.
  • If you attempt to log in to the cluster using the temporary password, the whole process may need to be restarted.
  • TRACC cannot recover or reset your password if you forget it or it expires. You must go through the Argonne Help Desk.

SSH public key authentication

  • We allow SSH public key authentication. Keys used for remote access must be protected with a passphrase. If your keys are found to be unprotected by a passphrase, your cluster account will be revoked.

Transferring Files

  • Files can be transferred to and from either cluster using a client which supports the SFTP protocol, or with scp. Free SFTP clients are available for Windows, and are included with MacOS, Linux, and most varieties of UNIX. Be sure your client is using port 22.
  • Some users have also reported success with cwrsync, a prepackaged version of Cygwin and rsync.
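
For example, to copy a local input file to your jobs directory on Zephyr with scp (the file name here is hypothetical):

scp input.dat username@login.zephyr.tracc.anl.gov:~/jobs/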

Windows text files

Note that text files created under Windows may cause problems on the cluster. This is because Windows and Linux have different conventions for representing a newline in ASCII text files. Some Windows text editors will allow you to save a file in the UNIX/Linux format before transferring the file to the cluster. Alternatively, you can use the Linux command dos2unix to remove the CR characters that Windows uses at the ends of lines:

dos2unix somefile

dos2unix will have no effect on a Linux text file, so it's safe to use on any text file.

You can determine whether a file is in Windows or Linux format with the command

cat -v somefile

Windows text files will have a ^M at the end of each line.
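
On most Linux systems, the file utility will also flag Windows line endings, which gives another quick check:

file somefile

A Windows-format file will typically be reported as "ASCII text, with CRLF line terminators".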

Transferring files from Phoenix to Zephyr

An easy and efficient way to transfer files is using the tool rsync over ssh.

The following example transfers the hpl_benchmark directory along with its content from the Phoenix cluster to the Zephyr cluster.

  1. Login to login.zephyr.tracc.anl.gov
  2. rsync -avze ssh login.phoenix.tracc.anl.gov:~/jobs/hpl_benchmark ~/jobs/

For more complicated transfers, e.g. for collaborative groups, please contact us.


The following example shows how to transfer a project file called BridgeModel.cad. First login to Zephyr:

ssh userid@login.zephyr.tracc.anl.gov

Then sftp to one of:

   login1.phoenix.tracc.anl.gov
   login2.phoenix.tracc.anl.gov
   login3.phoenix.tracc.anl.gov 


To establish the connection:

sftp jdoe@login1.phoenix.tracc.anl.gov
Connecting to login1.phoenix.tracc.anl.gov...
jdoe@login1.phoenix.tracc.anl.gov's password:

Once connected, change to desired directory:
sftp> cd project
sftp> ls
BridgeModel.cad
sftp> get BridgeModel.cad
Fetching /mnt/gpfs/home/jdoe/project/BridgeModel.cad to BridgeModel.cad
/mnt/gpfs/home/jdoe/project/BridgeModel.cad 100% 100KB 100.0KB/s 00:00
sftp> quit

[jdoe@login1 ~]$ ls -l BridgeModel.cad
-rw-r----- 1 jdoe jdoe 102400 Sep 24 20:30 BridgeModel.cad

Setting Up Your Environment

Shells

  • Your account will be set up with the bash shell, unless you asked for a different shell when you requested your account. You can find out your current shell with the command echo $SHELL.
  • The commands chsh and ypchsh don't work on the cluster, and have been disabled.
  • On Zephyr, a jobs directory is created in each user's home directory to encourage users to organize and run their jobs from there.

Accessing application software with modules

  • modules documentation is available with the command man module.
  • To set up your environment so that an application can be used, use the command
module load <module>

at the command line or in your shell's configuration files.

  • You can undo the effect of a module load with
module unload <module>
  • It's generally not a good idea to load modules for more than one version of the same application. Instead, swap the modules:
module swap some-application/stable some-application/beta
  • Your .bashrc or .cshrc contains sample commands for loading the most common modules. Please see the instructions in those files, or in /etc/skel/.bashrc or /etc/skel/.cshrc, which will be the latest versions of the default configuration files.
  • Most module names take one of these two forms:
<application type>/<application>/<version>
<application>/<version>

for example

compiler/pathscale/3.1-147
fluent/6.3.26

Omitting the version number will load the latest stable version of the module.

To see what modules are currently loaded in your environment, do

module list

To see what modules are available, do

module avail
  • Modules with "-alpha" or "-beta" appended to the name may not work, because either the software or the module may still be broken.
  • To use modules in shell scripts, including Torque jobs scripts, see below.

File permissions

By default, you will be in your own Unix group, and your files will be readable and writable only by you (that is, the default umask is 027). If you want files to be readable by everyone working on your project, you will need to

  • Have your project leader request that we create a Unix group for the project. They also need to let us know who should be in the group.
  • Change the group ownership of the files you want to be group-readable:
chgrp <project-group> <filename>
  • Make the file group-readable, and, if user-executable, group-executable:
chmod g+rX <filename>
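
As a concrete sketch, assuming a project group named bridge-grp (a hypothetical name):

chgrp bridge-grp results.dat
chmod g+rX results.dat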

If you want all your new files to be readable by everyone in your group, you'll need to

  • Ask us to change your default group.

Note that a change in your primary group will not affect pre-existing files. To change the group ownership of all your files, do this:

chgrp -R <new-group> ~
chmod -R g+rX ~
chgrp -R $USER ~/.ssh
chmod -R g-rx ~/.ssh

The last two commands are necessary for SSH security; you will lock yourself out of the cluster if you omit them.

Login Nodes vs. Compute Nodes

The cluster consists of login nodes and compute nodes, along with a few other nodes for administration and serving files. The login nodes are shared, and at any time a login node might be in use by 10 or 20 users. For that reason, CPU- and memory-intensive jobs, as well as any code that runs a risk of crashing a machine, should be run only on the compute nodes.

Compute nodes can be used both for batch jobs and for interactive jobs. See below for how to use a node interactively, and how to run GUI applications on a node.

The login nodes are named

login1 and login2 on Zephyr
and
L01, L02, L03 on Phoenix


and the compute nodes are named

n001, n002, ..., n091, n092 on Zephyr
and
n001, n002, ..., n127, n128 on Phoenix

Files & Storage

  • Each node in the cluster provides access to local disk storage and to a globally accessible filesystem mounted as /mnt/lustre on Zephyr and as /mnt/gpfs on Phoenix.
  • Your home directory, the directory you are placed in when you log in, is /home/<your userid> on Phoenix and /mnt/lustre/home/<user> on Zephyr. Your home directory is accessible from all the login nodes and compute nodes. You should use this space for storing files you want to keep long term, such as source code, scripts, input data sets, etc.

Collaborative groups

A collaborative group can be set up on the TRACC cluster as an easy way for the group to create and manage a shared directory tree of files and programs. A TRACC systems administrator will create the base directory for the directory structure shared by the group. The group base directory will have permissions set to populate the group file structure with directories and files that can be read by all group members, yet still retain some security against accidental modification or deletion of files not owned by a group member working in a group directory.

An example base group directory has the following permissions set:

drwsrws--T    3   joe-user     joes-group       32768 Oct 17 17:28 tfhrc 

Note that the setuid and setgid flags, “s,” appear in the directory permissions in the position where the execute flag, “x,” would normally appear, allowing the directory owner and group members to list (“ls”) the directory and change to it (“cd”). The setuid flag is ignored by Linux. The setgid flag causes new files and subdirectories created within the directory to inherit its group ID rather than the group ID of the user who created them. This inheritance allows users to create files and subdirectories that carry the group ID of the parent directory, without first changing their group ID with the “newgrp” command.

In short, the setgid flag allows group members to work in the group directory without doing anything special when creating files; they will be automatically readable by other group members.

The group directory also has the sticky bit set, indicated by a capital T in the others execute flag position. When the sticky bit is set on a directory, and the group write permission flag is set, group members may create files in the directory, including new subdirectories, and modify and delete the files that they own, but they will not be able to change or delete the files owned by other group members.

In short, the sticky bit set on a group directory allows users to work safely in collaboration: they can create and change their own files, and read and copy the files of others, but they cannot change or delete the files of other group members.

When a user creates a subdirectory in the group directory, it inherits the group ID, but its permissions are assigned based on the user's mask (umask). The default umask on the TRACC cluster sets read and execute permissions for group members and no permissions for others; it does not set the group write permission. When a group member creates a subdirectory in the group directory, other group members will be able to change into that directory, read the files in it, and copy them into another directory, but they will not be able to create files in a subdirectory they do not own unless the owner sets the group write permission on it. In that case, it is good practice for the directory owner to also set the sticky bit on the directory, allowing other group members to add files to the directory while preventing them from accidentally changing or deleting files belonging to other group members (i.e. files that they do not own).

In short, to allow other group members to create and modify their own files in a new group subdirectory, but not the files of others, do:

chmod   g+w  subdirectory.name
chmod   +t   subdirectory.name
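
Assuming the default umask and setgid inheritance from the parent group directory, you can confirm the result with ls -ld; the permissions should then look like drwxrws--T:

ls -ld subdirectory.name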

Group membership and the group directory provide a relatively easy way for a collaborative group to share data and programs.

Backups

We have not yet begun backing up home directories on Zephyr.

For Phoenix:

  • Your home directory is backed up nightly, and backups are retained 30 days. Backups may take some time to restore.
  • The local and global scratch spaces are not backed up.

Compiling Your Code

  • For Phoenix, Intel, GCC, HP, PGI, and MPICH are installed.
  • For Zephyr, Intel, OpenMPI and MPICH are installed. GCC and Open64 compilers are available as well.
  • To begin using any of these packages, add the appropriate module command to your .bashrc or .cshrc file, or load the module in your current environment.
  • You may want to look at the Zephyr and Phoenix Cluster Configurations when you need to optimize your code for our hardware.
  • The best-tested compiler-MPI combination on Phoenix is Intel-Intel. Use this module command to set up your environment to use the latest version of each:
module load compiler/intel mpi/intel

This command will need to go in either your shell's configuration files or your Torque job script.

The directory /doc/pbs/mpitest on the cluster contains small C and Fortran programs you can use to test your compiler and MPI toolchain.
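
As a minimal sketch, you might compile one of those test programs with the Intel toolchain like this (the source file name here is hypothetical; check the directory for the actual names):

module load compiler/intel mpi/intel
cp /doc/pbs/mpitest/mpitest.c .
mpicc -o mpitest mpitest.c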

Running Jobs

Please do not run CPU- or memory-intensive tasks on the login nodes; always run them on the compute nodes. Resource-intensive processes running on login nodes may be killed at any time, without warning or notification.

Getting started

Jobs are submitted to the cluster using Torque and scheduled by Maui. The official Torque documentation is available at Cluster Resources and as man pages, which you can read on the cluster with the 'man' command; for example,

man qsub

The official Maui documentation is also available at Cluster Resources. There are no man pages, but see below for basic usage of the most useful Maui commands.

To use Torque and Maui, you'll need to make sure the base module is loaded in your shell's setup files:

module load base

Feel free to contact us if you need help with Torque or Maui, as the documentation can be difficult to follow, and unfortunately is occasionally incorrect. Also note that other documentation you might find regarding Torque and Maui may not apply to the TRACC cluster because of differences in how we've configured the software.

Torque's view of the cluster

In the terminology used in the Torque and Maui documentation:

  • Phoenix consists of 128 nodes, each with 2 quad core CPUs, of which 124 nodes contain 7971 MB of memory, and 2 contain 32195 MB.
  • Zephyr consists of 92 nodes, each with two 16-core CPUs, of which 88 nodes contain 32 GB of RAM, 2 contain 64 GB of RAM, and 2 contain 128 GB of RAM.

For a full description of the Zephyr and Phoenix configurations see Zephyr and Phoenix Cluster Configurations.

Submitting jobs

Job scripts

The usual way of running a job with Torque is to submit a shell script. Simple example job scripts are available on the cluster in /doc/torque/examples. For most commercial applications we have wrapper scripts to automatically generate job scripts; see our application notes, or contact your sponsor for more info, then skip ahead to information about qsub.

A simple script to run an MPI job would be

#!/bin/bash

# PBS directives must precede all executable commands

cd $PBS_O_WORKDIR

## HPMPI
# mpirun -hostfile $PBS_NODEFILE ./mpi-application

## Intel MPI
# mpirun ./mpi-application

If you need to load a module in your shell script, do it like this, for scripts written in /bin/sh or /bin/bash:

. /etc/profile.d/modules.sh
module load <some-module>

or like this, if your script is written in /bin/csh or /bin/tcsh:

source /etc/profile.d/modules.csh
module load <some-module>

or like this, for Python:

import sys
sys.path.append("/soft/modules/current/init")

import envmodule
envmodule.module("load <some-module>")

or like this, for Perl:

use lib "/soft/modules/current/init";
use envmodule;
module("load <some-module");

For other languages, you can usually achieve the same effect by loading the module in your shell's configuration files (.bashrc or .cshrc).

Torque variables

  • Torque sets several variables which you can use in your job scripts. See the qsub man page for the complete list.
  • All Torque variables begin with "PBS", because Torque is derived from OpenPBS (the precursor of PBS Pro).
  • The most important Torque variables to be aware of are
PBS_O_WORKDIR

which is the working directory at the time of job submission, and

PBS_NODEFILE

which contains a list of the compute nodes which Torque has assigned your job, in the form

n001
n001
n001
n001
n001
n001
n001
n001

with each node listed once for each core.
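
In a job script, this file can be used, for example, to count the cores assigned to your job or to list the distinct nodes. A minimal sketch:

# number of cores assigned to the job (one line per core)
NPROCS=$(wc -l < $PBS_NODEFILE)
# the distinct nodes
sort -u $PBS_NODEFILE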

qsub usage

The qsub command is used to submit jobs to the cluster. qsub's behavior can be customized with command line options or script directives. Note that all script directives must precede all executable commands, so it's best to put all directives at the top of the script file.

For example, on Phoenix:

  • To use qsub to submit a 32-core job with a maximum walltime of 10 hours (meaning that the job will be killed after 10 hours have elapsed):
qsub -l nodes=4:ppn=8,walltime=10:00:00 jobscript

This can be specified in the jobscript with

#PBS -l nodes=4:ppn=8
#PBS -l walltime=10:00:00

in which case the job would be submitted with

qsub jobscript
  • The default resource request if one isn't specified on the command line or in the jobscript is
nodes=1:ppn=1,walltime=5:00

meaning 1 core on 1 node, with a walltime of 5 minutes.

For example, on Zephyr:

  • For scientific applications that rely heavily on floating point operations, such as STAR-CCM+, LS-DYNA, OpenFOAM, and university CFD research codes, users should specify 16 cores per node (ppn=16), not 32, since each node has a total of 16 floating point units (FPUs). Specifying 32 cores per node (ppn=32) will not provide substantial performance gains, but will use up commercial licenses.
  • To use qsub to submit a 64-core job with a maximum walltime of 10 hours (meaning that the job will be killed after 10 hours have elapsed):
qsub -l nodes=2:ppn=32,walltime=10:00:00 jobscript

This can be specified in the jobscript with

#PBS -l nodes=2:ppn=32
#PBS -l walltime=10:00:00

in which case the job would be submitted with

qsub jobscript


  • The default resource request if one isn't specified on the command line or in the jobscript is
nodes=1:ppn=1,walltime=5:00

meaning 1 core on 1 node, with a walltime of 5 minutes.

Initial directory

Torque by default will run your script from your home directory. You can specify a different initial directory with

qsub -d ...

For example, to start in your current directory:

qsub -d `pwd`

Large memory jobs

For Phoenix:

Two compute nodes have 32 GB of RAM. To use them, submit your job to the thirtytwogb queue:

qsub -q thirtytwogb ...

Jobs on the 32 GB nodes are limited to 72 hours, and to 1 node.

Swap has been disabled on these nodes.


For Zephyr:

Two compute nodes have 64 GB of RAM and two have 128 GB of RAM. To use them, submit your job to either the batch64GB or batch128GB queue. As an example:

qsub -q batch64GB ...

Jobs on the 64 GB and 128 GB nodes are limited to 72 hours, and to 2 nodes.

Swap has been disabled on these nodes.
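
As a fuller sketch, combining the queue with an explicit resource request (16 cores on one node for 24 hours, following the floating-point guidance above):

qsub -q batch64GB -l nodes=1:ppn=16,walltime=24:00:00 jobscript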

Short jobs

On weekdays from 7am CDT to 7pm CDT, two nodes on Phoenix are set aside for jobs with walltimes of 2 hours or less. Maui will automatically schedule short jobs to these nodes. No such policy is currently in place for Zephyr.
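
For example, a Phoenix job submitted like this sketch would qualify for those nodes:

qsub -l nodes=1:ppn=8,walltime=2:00:00 jobscript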

Usage policies

Maximum walltime

Jobs which exceed these walltime limits will be rejected by Torque.

On Zephyr:
batch queue: 168 hours
batch64GB queue: 72 hours
batch128GB queue: 72 hours
On Phoenix:
quadcore queue: 168 hours
thirtytwogb queue: 72 hours

Quotas

Jobs which exceed the following quotas will be blocked by Maui. Blocked jobs can be seen with the command

showq -b

Queue quotas

The batch64GB and batch128GB queues on Zephyr and the thirtytwogb queue on Phoenix have a limit of 1 job per user. This can be overridden for a particular job by making the job preemptible using the Maui QOS feature.

Quality of Service (QOS)

In order to override the cluster's usage quotas, you can submit your jobs with a preemptible QOS. The caveat is that a preemptible job may be killed at any time, regardless of its wallclock limit, if doing so would enable another user's ordinary, non-preemptible job to run (your own non-preemptible jobs will not preempt your preemptible jobs).

To make a job preemptible, you can submit it with a QOS (quality of service) flag. For example:

qsub -W x=QOS:preemptible ... <jobscript>

You can make a submitted job (whether queued or running) preemptible with the Maui command setqos:

setqos preemptible <jobid>

And you can make a job non-preemptible by setting its QOS to DEFAULT:

setqos DEFAULT <jobid>

You can check a job's QOS with the command checkjob.

Note that because of a bug in Torque, if you want to specify a node access policy and a QOS, you should not specify the QOS to Torque, but should instead submit the job without a QOS, then set the QOS using setqos.

How the scheduler determines which job to run next

Maui runs queued jobs in priority order (blocked jobs are not considered). Priority is a function of

  • your recent cluster usage (the less usage, the higher the priority)
  • the number of nodes requested (the more nodes, the higher the priority)
  • a combination of the walltime requested (the less time, the higher the priority) and how long the job has been waiting to run

This last factor is referred to as the "expansion factor" in the Maui documentation, and is equal to

1 + (time since submission) / (requested walltime)

which, all else being equal, is designed to cause jobs to wait on average only as long as their requested walltime.
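
For example, a job that requested 10 hours of walltime and has been waiting 5 hours has an expansion factor of 1 + 5/10 = 1.5.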

Maui also uses "backfill" to schedule short and small jobs when doing so would not interfere with higher priority jobs.

To find out how the priority of your job was calculated, use the Maui command diagnose, which will show something like this:

$ diagnose -p
diagnosing job priority information (partition: ALL)

Job                    PRIORITY*     FS( User)  Serv(XFctr)   Res( Proc)
             Weights   --------       1(    1)     1(    1)     1(    1)

46                          113     6.9( -9.0)   1.2(  1.6)  91.9(120.0)
47                           77     9.5( -9.0)   1.6(  1.5)  88.9( 84.0)

Percent Contribution   --------     8.0(  8.0)   1.4(  1.4)  90.6( 90.6)

FS refers to the fairshare factor. For both jobs, the user's usage is above the target usage, making the fairshare factor negative.

The exact weight given to each factor is subject to change as we attempt to find a balance between high utilization and fairness to all users.

The diagnose command will also tell you how much you and other users have been using the cluster over the past week.

$ diagnose -f
FairShare Information

Depth: 7 intervals   Interval Length: 1:00:00:00   Decay Rate: 0.75

FS Policy: UTILIZEDPS
System FS Settings:  Target Usage: 0.00    Flags: 0

FSInterval        %     Target       0       1       2       3       4       5       6
FSWeight       ------- -------  1.0000  0.7500  0.5625  0.4219  0.3164  0.2373  0.1780
TotalUsage      100.00 ------- 23779.0  4358.0  6114.8 10451.7 10853.7  6915.1  7861.7

USER
-------------
bvannemreddy      0.00   5.00  ------- ------- ------- ------- ------- ------- -------
ac.smiyawak*     16.74   5.00    18.66   32.48    0.75   16.23    0.71   17.28   27.11
bernard           0.09   5.00  ------- ------- -------    0.80    0.00    0.02    0.00
ac.abarsan        2.17   5.00     2.08 ------- ------- ------- -------   17.27    8.36

The asterisk indicates that the user is above their fairshare target, and that their jobs will be given a negative fairshare weight.

More qsub options

Receiving mail from Torque

You can have Torque automatically email you when a job begins, ends, or aborts.

qsub -m <mail-options>

The <mail-options> argument consists of either the single character 'n', or one or more of 'a', 'b', 'e'.

n No mail will be sent.
a Mail will be sent when the job aborts.
b Mail will be sent when the job begins.
e Mail will be sent when the job ends.
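
For example, to receive mail when a job begins and ends:

qsub -m be jobscript

or, in the jobscript:

#PBS -m be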

Redirecting job output

Torque will write the standard output of your jobscript to <jobname>.o<jobnumber>, and the standard error to <jobname>.e<jobnumber>. To write them both to the standard output file, submit the job with

qsub -joe ...

or in your jobscript use

#PBS -joe

To write standard output to a different file, submit the job with

qsub -o <some-file>

or in your jobscript:

#PBS -o <some-file>

These can be combined; so, for example, submitting a job with

qsub -joe -o logfile

will cause both standard output and standard error to be written to the file 'logfile' in the directory in which the job is running.

You can see the output file using the command qpeek:

qpeek <jobid>

qpeek has several useful options:

$ qpeek -?
qpeek:  Peek into a job's output spool files

 Usage:  qpeek [options] JOBID

 Options:
   -c      Show all of the output file ("cat", default)
   -h      Show only the beginning of the output file ("head")
   -t      Show only the end of the output file ("tail")
   -f      Show only the end of the file and keep listening ("tail -f")
   -#      Show only # lines of output
   -e      Show the stderr file of the job
   -o      Show the stdout file of the job (default)
   -?      Display this help message

Using fewer than 8 cores per node on Phoenix and fewer than 32 cores per node on Zephyr

For Phoenix

  • To request and use fewer than 8 cores per node, specify the number of cores you need with the ppn node property:
qsub -q quadcore -l nodes=<number of nodes>:ppn=<cores per node> ...

For example, to use 4 cores on each of 8 nodes, do

qsub -q quadcore -l nodes=8:ppn=4 ...
  • By default, the scheduler will assign multiple jobs to a node as long as the node has resources available, and as long as the jobs are owned by the same user. This behavior can be prevented like this:
qsub -W x=NACCESSPOLICY:SINGLEJOB ...

so the previous command would be

qsub -q quadcore -l nodes=8:ppn=4 -W x=NACCESSPOLICY:SINGLEJOB ...

(It would probably make more sense to have SINGLEJOB as the default, but Maui for some reason does not allow that policy to be overridden.)

For Zephyr

  • To request and use fewer than 32 cores per node, specify the number of cores you need with the ppn node property:
qsub -q batch -l nodes=<number of nodes>:ppn=<cores per node> ...

For example, to use 4 cores on each of 8 nodes, do

qsub -q batch -l nodes=8:ppn=4 ...
  • By default, the scheduler will assign multiple jobs to a node as long as the node has resources available, and as long as the jobs are owned by the same user. This behavior can be prevented like this:
qsub -W x=NACCESSPOLICY:SINGLEJOB ...

so the previous command would be

qsub -q batch -l nodes=8:ppn=4 -W x=NACCESSPOLICY:SINGLEJOB ...

(It would probably make more sense to have SINGLEJOB as the default, but Maui for some reason does not allow that policy to be overridden.)

Forcing jobs to run sequentially

  • To have a job start only after another job has finished, use the depend job attribute:
qsub -W depend=afterany:<some job-id> ...

For example,

qsub -W depend=afterany:1198 ...

Many other dependency relationships are possible; please see the qsub man page.
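
A minimal sketch of chaining two jobs, capturing the job id that qsub prints (the script names are hypothetical):

JOBID=$(qsub first-jobscript)
qsub -W depend=afterany:$JOBID second-jobscript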

Passing variables to a jobscript

By default, no environment variables are directly exported from the submission environment to the execution environment in which a job runs. This means that you'll usually need to load any modules the job requires in your job script, or in your shell configuration files, as described above, or use one of the following techniques to pass variables to a job.

  • To pass all environment variables:
export name=abc; export model=neon
qsub -V ...
  • To pass only specified variables:
qsub -v name=abc,model=neon ...

or

export name=abc; export model=neon
qsub -v name,model ...
  • You can also specify in a job script which variables Torque should import into the execution environment from the submission environment. To import all environment variables:
#PBS -V

or to import only specified variables:

#PBS -v name,model

Note that you cannot assign values to the variables this way.

Job and cluster info

  • To see a list of jobs which are running, queued, and blocked, use the Maui command showq:
showq [-r | -i | -b ]

For example,

$ showq -r
          JobName  S Par  Effic  XFactor  Q      User    Group    MHost Procs   Remaining            StartTime

            67174  R DEF 100.00      0.0 pr      joe     joe  n101-ib     1    00:23:32  Wed Apr 28 14:05:12
            67233+ R DEF 182.60      1.1 pr      jan     jan  n082-ib     8    14:53:30  Thu Apr 29 13:37:49
            66942- R DEF  49.71      0.3 pr      lin     lin  n056-ib    64    18:59:14  Tue Apr 27 08:41:35

The "+" indicates that the job was backfilled; the "—" that the job is preemptible.

showq lists queued (Idle) jobs in descending priority order. showq may show a job as running for up to a minute before it actually starts. To see if it's actually running in that first minute, use Torque's qstat:

qstat -a
  • To see how many cores are currently free for a job of a given walltime in each queue, use qbf:
qbf -d <walltime>
  • To get more detailed statistics about current cluster usage, use our command qsum:
qsum
  • Use pbstop for a graphical display of cluster usage:
pbstop [-?]
  • pestat will give node-by-node cluster information:
pestat [-?]
  • To get a lot of info from Torque about a running or queued job:
qstat -f <job-id>
  • To diagnose problems with a job, you'll need to use both the Torque command tracejob and the Maui command checkjob, and possibly the Maui command diagnose:
tracejob <job-id>
checkjob [-v] <job-id>
diagnose -j <job-id>
  • If you want to find out when Maui thinks your job will run, use showstart:
showstart <job-id>
$ showstart 56
job 56 requires 32 procs for 1:40:00
Earliest start in          1:32:48 on Wed Nov 25 18:02:37
Earliest completion in     3:12:48 on Wed Nov 25 19:42:37
  • If you want to find out how small and how short you need to make a job for it to run immediately in a particular queue, use the Maui command showbf:
showbf [-f <node-feature>] [-n nodecount] [-d walltime]

where <node-feature> is one of quadcore, eightgb, or thirtytwogb. ("bf" is for backfill, the algorithm used to schedule short and small jobs without interfering with higher priority jobs.) Maui unfortunately doesn't take into account all of our scheduling policies, so showbf's core counts may be too high; nor does it know about our queue walltime limits.

  • We have our own utility, qbf, which summarizes the output of showbf for each queue:
$ qbf
Queue       Cores  Time
quadcore      112  5:49:09
quadcore       40  3:00:00:00
thirtytwogb     8  1:00:00:00

This means that a 40 core job of any duration will run immediately, but a job requesting more than 40 cores will run immediately only if it requests less than 5 hours, 49 minutes. qbf is, however, subject to the same caveat as showbf.

  • To get a list of the nodes on which jobs are running (use -s to sort the list):
jobnodes [-s] <jobid> ...
$ jobnodes -s 45124 45873 
n008,n015
  • To get a list of the nodes on which your jobs are running (use -s to sort the list):
mynodes [-s]
$ mynodes -s
n010,n097
  • To see if your application is still running on your nodes:
$ pdsh -w $(mynodes) ps
n046:   PID TTY          TIME CMD
n046: 13056 ?        00:00:00 bash
n046: 13717 ?        00:00:00 bash
n046: 13721 ?        00:00:00 sshd
n046: 13722 ?        00:00:00 ps

For all of these commands, only the numerical portion of the job-id needs to be specified.

qsub-stat

qsub-stat is available only on Phoenix and requires the ls-dyna/base module to be loaded.

Most of the commands for tracking your jobs and checking cluster status can be accessed via the graphical application qsub-stat. Please refer to the section on NoMachine to learn how to run graphical sessions on the TRACC Cluster. qsub-stat has three tabs. The first one shows the percentage usage of the two queues (8GB and 32GB) and the disk space usage. Hovering the mouse cursor over a meter shows more detailed information on usage of the queues and the disk space.

[Image: Resources tab]

In the Terminal tab, the majority of the status commands explained previously can be invoked from a mouse menu. Press and hold the right or left mouse button to access the list of commands.

[Image: Terminal tab]

The available commands are:

-------------------------------------------
- Cluster Usage (showq):
  - showq
  - showq -r (running)
  - showq -b (blocked)
- Cluster Usage (qsum)
- Cluster Usage (qstat)
- Available Resources (qbf)
-------------------------------------------
- LS-DYNA Usage (qdyna)
- Madymo Usage (license server report)
- Fluent Usage (license server report)
-------------------------------------------
- Current Login Node activity (htop)
- Node activity (pbstop)
- Node activity (pestat)
-------------------------------------------
- See My Jobs
-------------------------------------------

Current usage of the cluster can also be viewed in the Job Status tab. When you press and hold the right or left mouse button on a job, you will see a menu of commands that you can apply to it (some of them will work only on your own jobs). The available commands are:

-------------------------------------------
- Show additional info:
  - qstat -f 
  - checkjob
  - tracejob
  - diagnose -j
  - showstart
-------------------------------------------
- open standard output
- statistics on master node (htop)
- ssh to master node
-------------------------------------------
- open work directory in browser
- open work directory in terminal
-------------------------------------------
- delete job (qdel)
-------------------------------------------
[Image: Job Status tab]

The htop command shows live statistics on the master node of the current job.

[Image: Htop]

Deleting a job

To delete a job, use Torque's qdel or Maui's canceljob, which simply calls qdel itself:

qdel <job-id>
canceljob <job-id>

Torque will delete the job immediately with either command, but Maui may not know about the deletion for up to a minute, during which time the job will continue to show up in the output of showq.

If either Torque or Maui tells you it can't delete the job, please send us an email and we'll delete it for you.

Only the numerical portion of the job-id needs to be specified.

Interactive jobs

You can get interactive access to a compute node with Torque by using the -I switch to qsub:

$ qsub -I ...
qsub: waiting for job nnnn.host1 to start
qsub: job nnnn.host1 ready
$ cd $PBS_O_WORKDIR

If you want to run a GUI application on the compute node, use the -X switch:

qsub -I -X ...

For this to work, you will need to either use the NX client to connect to the cluster, or enable trusted X11 forwarding when you ssh to the cluster. Windows SSH clients will allow you to enable forwarding in the preferences for the connection. Under Linux and MacOS, you would do this:

$ ssh -Y login.tracc.anl.gov

Also see below for an alternative means of directly accessing compute nodes.

Using GUI applications on a compute node

If you want to use a GUI application on a compute node, you will need to enable trusted X11 forwarding when you ssh to the cluster, and again when you ssh to the compute node (as described above). Windows SSH clients will allow you to enable forwarding in the preferences for the connection. Under Linux and MacOS, you would do this:

$ ssh -Y login.tracc.anl.gov

After logging in to a login node, you would then do this to access the compute node:

$ ssh -Y n001

If you're using the NX client to connect to the cluster, you won't need to ssh to a login node, but you will still need to ssh to a compute node.

Accessing a compute node directly

The easiest way to do this is by having Torque give you interactive access to a node. But if you want to start a process running on a compute node and have it continue to run after you log out, use the method described here.

  • You can access any compute node which is running one of your jobs with the command
ssh <compute-node>

You can use the command

qstat -n1 <job-id>

to find out which nodes are running <job-id>.

  • If you want Torque to assign you nodes, but don't need it to run a job on your behalf, you can do (for example)
echo sleep 24h | qsub -l select=1,walltime=24:00:00

and then ssh to the node.
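
Having been assigned a node (n042 here is hypothetical), you could then start a task that survives logout like this:

ssh n042
nohup ./long-task > long-task.log 2>&1 &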

Using licensed software

For some of our software we have a limited number of licenses. To ensure that there will be sufficient licenses available when Maui schedules your job, you will need to submit your job under an application-specific account. Any application wrapper scripts we have will take care of this for you, but if you need to specify the account manually, you would do it like this:


  • LS-DYNA:
qsub -A dyna ...

Debugging a job

Some things to keep in mind or to try when your job doesn't run as expected:

  • Jobs don't inherit any environment variables from the submission environment. This means that if your job requires that PATH, LD_LIBRARY_PATH, MPI_HOME, or any other variable be set, you'll need to set it in the job script (directly or with a module) or in your shell configuration files.
  • Try running your job interactively on a compute node, like this:
$ qsub -I ...
qsub: waiting for job nnnn.host1 to start
qsub: job nnnn.host1 ready
$ cd $PBS_O_WORKDIR

Now run your job script or start up a debugger with your program, just as you would on a login node.

  • Try some simple jobs, and look at the standard output and standard error files (STDIN.o<job-id> and STDIN.e<job-id>):
echo | qsub
echo env | qsub
  • If you want to see the output of your job as it's being produced, use qpeek:
qpeek <jobid>

Undeliverable job output

If Torque cannot copy the files containing the output of your job to the job's working directory, the files will eventually be copied to /mnt/gpfs/undelivered. They will be deleted after 10 days.

Requeuing of jobs

If your job dies due to a problem with the cluster, it will not automatically be requeued. If you would like it to be, do

qsub -r y ...

Reserving the cluster

Users cannot directly reserve the cluster, but if you think you might have a need for a reservation, please contact us as far in advance as possible.

Graphical Applications using NoMachine NX

To provide for remote use of graphical applications on the cluster, we've installed NoMachine NX on the login hosts. To use it,

  1. Download and install the free NX Client. (Windows users should install the optional font packages.) The NX Client uses SSH behind the scenes; see How to Connect, above.
  2. After installing the client, you'll need to install a public key from one of the login nodes:
    1. Copy /usr/NX/share/keys/default.id_dsa.key from one of the login nodes to your local machine (it doesn't matter where); see the example command after this list.
    2. In the NoMachine client, choose Configure..., then Key..., then Import. Choose the default.id_dsa.key you just downloaded. Choose Save.
  3. When you log in the first time, you'll see a window with a number of configuration options. Under "Desktop", choose "Unix", either "KDE" or "GNOME", and the appropriate type of connection you have to the Internet. (Users on the Argonne network should use "LAN".)
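
For step 2.1, a minimal sketch using scp from your local machine:

scp username@login.zephyr.tracc.anl.gov:/usr/NX/share/keys/default.id_dsa.key .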

You should be able to use interactive GUIs with NX even over a DSL or cable connection. Please contact us if that's not the case.

Terminating and Disconnecting

When you quit an NX session, you can either terminate it or disconnect from it. Terminating it will close all your running programs. If you disconnect, you can reconnect later to the same session.

If you close the NX client window, you'll be given a choice of terminating or disconnecting.

Reconnecting

If you want to reconnect to an NX session, you'll have to log in to the same login node on which it was started. To do this, instead of using 'login.tracc.anl.gov' as the server host, specify the login node you want to connect to on either Zephyr or Phoenix.

On Zephyr we currently have:
login1.zephyr.tracc.anl.gov
login2.zephyr.tracc.anl.gov
On Phoenix we currently have:
login1.phoenix.tracc.anl.gov
login2.phoenix.tracc.anl.gov
login3.phoenix.tracc.anl.gov

Problems

  • You may see the error message "Error activating XKB configuration" when you start a new session. This can be ignored, but if you want to get rid of it, then: from the Applications menu, go to Preferences -> Keyboard, then the Layouts tab, and click Reset to defaults. This should only need to be done once.
  • If the NX client stops working for you, log in with SSH, then try moving the .nx directory in your home directory out of the way:
cd; mv .nx .nx.bak
  • If you get errors similar to this:
Warning: Cannot convert string "-*-courier-medium-r-*-*-*-120-*-*-*-*-iso8859-*" to type FontStruct

you need to install the optional font packages for Windows from NoMachine.

Graphical Applications using X

Xming is a convenient and easy-to-install X server for Windows. However, if you're doing pre- or post-processing that requires sending a lot of graphics over the network, we recommend NoMachine NX because its compression of the graphics yields a significant speed advantage over X.

Cron and At Jobs

Users can run cron and at jobs on the login nodes. Since we make no guarantee that any particular login node will be up at any time, you should consider installing your crontab on all the login nodes, with the jobs actually enabled on only one of them.
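
One way to do this (a sketch; the script path is hypothetical) is to install the same crontab on every login node but guard each entry with a hostname test so that it runs on only one of them:

0 2 * * * [ "$(hostname -s)" = "login1" ] && $HOME/bin/nightly-task.sh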

Email on the Cluster

  • No client email services are supported on the TRACC cluster.
  • Email sent by Torque will automatically be forwarded to your canonical email address. For those with regular ANL domain accounts, this is the email address which HR lists for you. For those with ac. collaborator accounts, this is the email address which the MyPassword system lists for you.
  • .forward files are ignored.
  • If we cannot contact you via your canonical email address, your account will be locked.

Mailing Lists

  • tracc-users@anl.gov is used by TRACC to communicate announcements of interest to all TRACC users. All TRACC cluster users with active accounts are automatically subscribed. You cannot unsubscribe from tracc-users.
  • You can have the wiki server automatically notify you of any updates to any wiki page by opening the page for editing, selecting the "Watch this page" box, then saving the page.

Application Notes

The license servers for all applications have been configured to allow use of only one core on the login nodes. Parallel jobs must be run on the compute nodes using Torque. Multicore jobs on the login nodes may be killed without notice.

Software Requests

  • If there's any free open-source software you'd like installed on the cluster, please contact us with the details: what it does, where it can be obtained, and what version you'd like. We'll try to get it installed as soon as possible.
  • If you'd like to use a commercial application on the cluster, please contact us.
