HPC/Submitting and Managing Jobs/Example Job Script

From CNM Wiki
Revision as of 23:41, May 23, 2011 by Stern (talk | contribs) (append from parent)

Example job file

Here is a sample job script for an MPI application in the default user environment (OpenMPI over the InfiniBand interconnect):

#!/bin/bash

##  Basics: Number of nodes, processors per node (ppn), and walltime (hhh:mm:ss)
#PBS -l nodes=5:ppn=8
#PBS -l walltime=0:10:00
#PBS -N job_name
#PBS -A account

## File names for stdout and stderr.  If not set here, the defaults
## are <JOBNAME>.o<JOBNUM> and <JOBNAME>.e<JOBNUM>
#PBS -o job.out
#PBS -e job.err

## send mail at begin, end, abort, or never (b, e, a, n)
#PBS -m ea

# change into the directory from which qsub was run
cd "$PBS_O_WORKDIR"

# count allocated cores (one line per core in the node file)
nprocs=$(wc -l < "$PBS_NODEFILE")

# start MPI job over default interconnect
mpirun -machinefile $PBS_NODEFILE -np $nprocs \
        programname
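
The core count in the script above is the number of lines in $PBS_NODEFILE. The following is a minimal sketch of that counting step, run against a stand-in file so that it works outside a job. The host names n001 and n002 and the 2 x 3 layout are made up for illustration, and the node count is an extra that the job script itself does not compute.

```shell
# Stand-in for $PBS_NODEFILE: 2 hypothetical nodes with 3 cores each,
# one line per allocated core.
nodefile=$(mktemp)
printf '%s\n' n001 n001 n001 n002 n002 n002 > "$nodefile"

# Same counting as in the job script: one line per core.
nprocs=$(wc -l < "$nodefile")

# Distinct host names give the number of nodes.
nnodes=$(sort -u "$nodefile" | wc -l)

echo "cores:" $nprocs "nodes:" $nnodes
rm -f "$nodefile"
```

Inside a real job, replace the stand-in file with "$PBS_NODEFILE".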
  • If your program reads from standard input or takes options or arguments, adapt one of the following forms:
mpirun -machinefile $PBS_NODEFILE -np $nprocs \
       programname  < run.in
mpirun -machinefile $PBS_NODEFILE -np $nprocs \
       programname  -options arguments < run.in
mpirun -machinefile $PBS_NODEFILE -np $nprocs \
       programname < run.in > run.out 2> run.err
In the last form, anything after programname is optional. If you redirect stdout or stderr explicitly as shown (>, 2>), the job-wide files job.out and job.err declared earlier will stay empty, or will contain only output from your shell startup files (which should be silent) and from the rest of your job script.
  • InfiniBand (OpenIB) is the default, fast interconnect for MPI jobs. This default is configured through the environment variable $OMPI_MCA_btl.
  • To select Ethernet transport (e.g. for embarrassingly parallel jobs), specify an -mca option:
mpirun -machinefile $PBS_NODEFILE -np $nprocs \
        -mca btl self,tcp \
        programname
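
Because the default transport comes from $OMPI_MCA_btl, a job script can presumably also switch transports by setting that variable before calling mpirun. The sketch below assumes standard Open MPI behavior, where an environment variable named OMPI_MCA_<parameter> sets the corresponding MCA parameter and a -mca option on the command line takes precedence over it. The mpirun line is left commented out so that the snippet runs without MPI installed.

```shell
# Select TCP (Ethernet) transport for every mpirun call that follows
# in this job script; equivalent in effect to "-mca btl self,tcp".
export OMPI_MCA_btl=self,tcp

# Show what mpirun will inherit from the environment.
echo "btl components: $OMPI_MCA_btl"

# mpirun -machinefile "$PBS_NODEFILE" -np "$nprocs" programname
```

The -mca form shown earlier affects a single mpirun call, while the exported variable applies to the rest of the script.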


The account parameter

The argument to the -A option is in most cases your CNM proposal number, written as follows:

cnm123
(3 digits) for proposals below 1000
cnm01234
(5 digits, 0-padded) for proposals from 1000 onwards.
user
(the actual string "user", not your user name) for a limited personal startup allocation
staff
for discretionary access by staff.
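
The padding rule for proposal numbers can be sketched as a small shell function. The name account_for_proposal is hypothetical and not a CNM tool. The sketch also assumes that proposals below 100 are zero-padded to 3 digits, which the list above suggests but does not state outright.

```shell
# Hypothetical helper: turn a proposal number into the account string
# expected by "#PBS -A", following the padding rules listed above.
account_for_proposal() {
    num=$1
    # Accept digits only.
    case $num in
        '' | *[!0-9]*)
            echo "not a proposal number: $num" >&2
            return 1
            ;;
    esac
    # Drop leading zeros so that the value is not read as octal.
    while [ "${#num}" -gt 1 ] && [ "${num#0}" != "$num" ]; do
        num=${num#0}
    done
    if [ "$num" -lt 1000 ]; then
        printf 'cnm%03d\n' "$num"    # 3 digits below 1000
    else
        printf 'cnm%05d\n' "$num"    # 5 digits, 0-padded, from 1000 onwards
    fi
}

account_for_proposal 123     # prints cnm123
account_for_proposal 1234    # prints cnm01234
```

The fixed strings user and staff are used as they are and need no formatting.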

You can check your account balance in hours as follows:

mybalance -h
gbalance -u $USER -h


Advanced node selection

You can refine the node selection (normally done via the PBS resource -l nodes=…) to control your node and core allocations more precisely. You may need to do so for the following reasons:

  • select specific node hardware generations,
  • permit shared vs. exclusive node access,
  • vary PPN across the nodes of a job,
  • accommodate multithreading (OpenMP).

See HPC/Submitting Jobs/Advanced node selection for these topics.