HPC/Directories

From CNM Wiki
Revision as of 20:33, September 1, 2016

Overview

Here is a summary of key directories related to Carbon, and environment variables used to access them:

Environment variable  Typical value         Shared across nodes  Notes
$HOME or ~ (tilde)    /home/joe             yes                  home sweet home
$SANDBOX              /sandbox/joe          yes                  short-term storage extension, not backed up
$TMPDIR               /tmp/12345.mds01....  no                   job-specific scratch
$PBS_O_WORKDIR                              (yes)                the directory qsub was run in; typically used with "cd $PBS_O_WORKDIR" as the first line in a job script

Details by function

Home directory

$HOME
~

Users' home directories are kept on a Lustre* file system and are backed up nightly. The home directory can be reached in standard Unix fashion using either the environment variable or the tilde sign. The tilde works in most shells, but generally not in application programs, especially not in those written in Fortran.
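A minimal sketch of the difference: the shell expands an unquoted leading ~ before the command runs, while a quoted ~ (or one that a program reads from an input file) stays literal. The file name input.txt and the keyword datadir are hypothetical; the idea is to let the shell write the full $HOME path into any file an application reads.

```shell
#!/bin/bash
# Unquoted: the shell expands ~ to your home directory before echo runs.
echo ~/run1        # prints e.g. /home/joe/run1

# Quoted: the shell leaves ~ alone, so the program receives a literal "~".
echo "~/run1"      # prints ~/run1

# For application input files, let the shell substitute $HOME while
# writing the file (an unquoted EOF enables expansion in the here-document).
# "input.txt" and "datadir" are hypothetical names.
cat > input.txt <<EOF
datadir = $HOME/run1
EOF
```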

The total volume of your files in $HOME is subject to a limit, called a quota, which is set at 0.5 TB for most users. This is a soft limit, which can be exceeded by about 10% for a grace period of one week.
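To see where you stand relative to the quota, something along these lines can be used. du works on any file system but knows nothing about quotas; lfs quota is the Lustre client's quota report. The function name show_usage is illustrative, and the mount point /home is an assumption based on the typical $HOME value in the overview table; adjust it if your home directory is mounted elsewhere.

```shell
#!/bin/bash
# Report disk usage for a directory (default: your home directory).
show_usage() {
    local dir="${1:-$HOME}"
    # du sums the sizes of the files themselves.
    du -sh "$dir"

    # On Lustre, "lfs quota" shows usage together with the soft limit,
    # hard limit and any running grace period.
    # Assumption: home directories live on the Lustre mount /home.
    if command -v lfs >/dev/null 2>&1; then
        lfs quota -u "${USER:-$(id -un)}" /home
    fi
}
```

Running show_usage without arguments reports on $HOME.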

Sandbox - global scratch

$SANDBOX

This environment variable points to a user-specific directory on the shared Lustre* file system. This directory is not backed up.

Use this directory for short-lived files that need to be shared among multiple nodes, can get large, or change often.

  • Hard quota is 3 TB.
  • Soft quota is 10 GB.
  • The grace period for overflowing the soft limit is 2 weeks.
    If usage is still above the soft limit beyond the grace time, the file system will appear to you as full. To recover, delete files.
  • Files will be removed once they get older than 4 weeks.
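Since files are removed after 4 weeks, it can help to list what is approaching that age and copy anything worth keeping to $HOME. A minimal sketch with find, assuming the cleanup is based on file modification time (this page does not say which timestamp is used). The function name and the 21-day default are illustrative choices, not site settings.

```shell
#!/bin/bash
# List files in $SANDBOX that have not been modified for more than N days
# (default 21, i.e. about a week before the 4-week removal).
# Assumption: the automatic cleanup looks at modification time (mtime).
sandbox_old_files() {
    local days="${1:-21}"
    # -mtime +N matches files last modified more than N full days ago.
    find "${SANDBOX:?SANDBOX is not set}" -type f -mtime +"$days"
}
```

For example, sandbox_old_files 25 lists files not modified for more than 25 days.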

Local scratch space

$TMPDIR

This variable and the directory it refers to are provided by the queueing system for all processes that execute a job. The directory:

  • resides on local disk on each node,
  • is named the same on each node,
  • is not shared across nodes,
  • is shared by the job's processes on the same node (as many as given in "ppn=…"); in other words, the name is specific to the PBS job, not to a Unix process ID,
  • typically provides about 100 GB of space,
  • will be wiped upon job exit on each node.
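Because the roughly 100 GB is per node and only typical, a job can check the free space in $TMPDIR before writing large temporary files. A minimal sketch using the POSIX output format of df; the function name tmpdir_free_gb is illustrative.

```shell
#!/bin/bash
# Print the free space in $TMPDIR in whole gigabytes (GiB).
# df -P keeps each file system on one line; -k reports 1K blocks,
# and column 4 of the data line is "Available".
tmpdir_free_gb() {
    df -Pk "${TMPDIR:-/tmp}" | awk 'NR == 2 { print int($4 / 1048576) }'
}
```

In a job script, a line such as [ "$(tmpdir_free_gb)" -ge 50 ] || echo "less than 50 GB free in $TMPDIR" >&2 would then warn before a large run; the 50 GB threshold is only an example.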

The environment variable TMPDIR is not passed on to processes on other nodes either. Either communicate its value within your program, or have mpirun/mpiexec export it:

OpenMPI
mpirun … \
       -x TMPDIR \
       [-x OTHERVAR] \
       …
Intel MPI
mpiexec.hydra … \
       -genvlist TMPDIR[,OTHERVAR] \
       …

Use in job files

You can use $TMPDIR in one or more of the following ways:

  • direct your application to store its temporary files there, which is typically done by command line switches or an environment variable such as:
export FOO_SCRATCH=$TMPDIR
  • actually run your application there:
cd $TMPDIR
In this case, make sure you either copy your input files there or specify full paths to files under $HOME or $PBS_O_WORKDIR.
  • copy files back upon job termination:
#PBS -W stageout=$TMPDIR/foo.ext@localhost:$PBS_O_WORKDIR
#PBS -W stageout=$TMPDIR/*.bar@localhost:$PBS_O_WORKDIR
You may specify several of these lines and use wildcards to specify source files on the compute nodes. In contrast to explicit trailing "cp" commands in the job script, this copy will be executed even if a job overruns its walltime. See the qsub manual for further information.
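Putting these pieces together, a job script might look like the following sketch. The resource requests, the file names input.dat, result.out and *.out, the program my_app, and the override variable APP are all hypothetical placeholders; FOO_SCRATCH is the illustrative variable from the list above. The work is wrapped in a function that only runs when PBS_JOBID is set (as PBS does for batch jobs), so the file can also be sourced and tried out outside a job.

```shell
#!/bin/bash
#PBS -l nodes=1:ppn=8
#PBS -l walltime=1:00:00
# Copy results back to the submission directory, even on walltime overrun.
#PBS -W stageout=$TMPDIR/*.out@localhost:$PBS_O_WORKDIR

INPUT=input.dat    # hypothetical input file in the submission directory

run_job() {
    # Hypothetical program; set APP to override it.
    local app="${APP:-$PBS_O_WORKDIR/my_app}"

    cd "$TMPDIR" || return 1                   # run in node-local scratch
    cp "$PBS_O_WORKDIR/$INPUT" . || return 1   # stage the input in
    export FOO_SCRATCH=$TMPDIR                 # illustrative scratch variable
    "$app" "$INPUT" > result.out               # matches the stageout pattern
}

# PBS sets PBS_JOBID for batch jobs; do nothing when run outside a job.
if [ -n "${PBS_JOBID:-}" ]; then
    run_job
fi
```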



(*) Lustre is a parallel file system that allows concurrent and coherent file access at high data rates.