HPC/Directories
Latest revision as of 19:50, July 18, 2022
Overview
Here is a summary of key directories related to Carbon, and the environment variables used to access them:
Environment variable | Typical value | Shared across nodes? | Purge schedule | Purpose |
---|---|---|---|---|
$HOME (same as ~ in shells) | /home/joe | yes | See CNM data retention policy | Your main configuration files and data. |
$SANDBOX | /sandbox/joe | yes | 6 weeks | Scratch space for transient job data, not backed up |
$TMPDIR (on login nodes) | /tmp | no | 6 weeks | General Unix scratch space |
$TMPDIR (during jobs) | /tmp/12345.sched1.... | no | at end of job | Job-specific scratch space, provided empty on job start. |
$PBS_O_WORKDIR | (directory where qsub was run) | yes | (same as parent file system) | Typically used as cd $PBS_O_WORKDIR as the first line in a job script |
Home directory
$HOME
~ (tilde character)
Your home directory can be referred to in standard Unix fashion, as shown above, by either the environment variable or the tilde sign. The tilde works in most shells, but generally not in application programs, especially not those written in Fortran.
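The difference between shells and application programs can be seen in the shell itself: an unquoted tilde is expanded by the shell, whereas a quoted one is passed on literally, which is what a program gets when it reads ~ from an input file. A minimal sketch, with variable names that are purely illustrative:

```shell
expanded=~          # unquoted: the shell replaces ~ with the home directory
literal="~"         # quoted: stays a literal tilde, as in a program's input file

echo "unquoted: $expanded"
echo "quoted:   $literal"
echo "HOME:     $HOME"
```

When a path has to be handed to an application, $HOME (expanded by the shell before the program starts) or a full path is the safer choice.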
- Files are backed up nightly.
- Your total file volume in $HOME is subject to a soft quota, generally 0.5 TB.
- You may exceed the soft limit by about 10% during a grace period of one week. You will see an over-quota notice upon login.
- If your usage remains above the soft limit beyond the grace time, the file system will appear (to you) as being full. To recover, delete files.
Your files in $HOME are subject to CNM's Data Retention Policy, which specifies that all your files may be deleted from our servers as early as 30 days after your last active proposal has expired. At that time, your access to Carbon and its SSH gateway will be revoked.
Global scratch space
$SANDBOX
This environment variable points to a user-specific directory that is shared across nodes, like the home directory.
Use this directory for short-lived files that need to be shared among multiple nodes and that may be large, numerous, or frequently changing. To accommodate this use, usage policies are stricter than for /home:
- Files are not backed up.
- Hard quotas are 3 TB in volume and 2 million in file count.
- Soft quotas are 10 GB and 10,000 files.
- The grace period for overflowing a soft limit is 3 weeks.
- Files will be deleted automatically once they are older than 3 months.
These limits are subject to change.
The limits are aimed at keeping the space available for the intended use, typically for files of unusual size (F.O.U.S.) or for small files of unusual count.
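To see where you stand relative to these limits, standard Unix tools are usually enough. The following is a sketch only: the helper names count_files and list_old_files are hypothetical, and it assumes that file age is judged by modification time, which this page does not state.

```shell
#!/bin/bash
# Hypothetical helpers; not part of Carbon's software environment.

# Count regular files below a directory (compare with the file-count quota).
count_files() {
    find "$1" -type f | wc -l
}

# List regular files last modified more than N days ago.
# Assumption: age is measured by modification time; find counts whole days.
list_old_files() {
    find "$1" -type f -mtime +"$2"
}
```

For example, count_files "$SANDBOX" reports your current file count, list_old_files "$SANDBOX" 60 shows files that have not been modified for more than 60 days, and du -sh "$SANDBOX" gives the total volume.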
Local scratch space
$TMPDIR
This variable and the directory it refers to are provided by the queueing system for all processes that execute a job. The directory:
- resides on local disk on each node,
- is named the same on each node,
- is not shared across nodes,
- is shared for processes on the same node (as many as given in "ppn=…"), in other words, the name is PBS job-specific, but not Unix PID-specific,
- typically provides about 100 GB of space,
- will be wiped upon job exit on each node.
The environment variable TMPDIR is not shared across nodes either. Either pass it along within your program or have mpirun/mpiexec export it:
- OpenMPI
mpirun … \
    -x TMPDIR \
    [-x OTHERVAR] \
    …
- Intel MPI
mpiexec.hydra … \
    -genvlist TMPDIR[,OTHERVAR] \
    …
Use in job files
You can use $TMPDIR in one or more of the following ways:
- direct your application to store its temporary files there, which is typically done by command line switches or an environment variable such as:
export FOO_SCRATCH=$TMPDIR
- actually run your application there:
cd $TMPDIR
- In this case, make sure you either copy your input files there or specify full paths to $HOME or $PBS_O_WORKDIR.
- copy files back upon job termination:
#PBS -W stageout=$TMPDIR/foo.ext@localhost:$PBS_O_WORKDIR
#PBS -W stageout=$TMPDIR/*.bar@localhost:$PBS_O_WORKDIR/subdir
- The string on the left of the @ character names the source files in the compute node file system.
- The string on the right gives the destination host and directory, also as seen from the compute node. This means localhost refers to the primary compute node (rank 0 in MPI parlance). $PBS_O_WORKDIR by default stems from qsub's current directory on the submission node; the path is valid on the compute node as well because Carbon's user file systems are mounted under the same paths on login nodes and compute nodes.
- You may give the stageout=… directive multiple times, as shown.
- In contrast to explicit trailing "cp" commands in the job script, this copy will be executed even if a job overruns its walltime. See the qsub manual for further information.
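The three uses above can be combined in one job script. The following is a sketch under stated assumptions, not a recommended template: the resource request, the file names input.dat and result.out, the variable FOO_SCRATCH, and the helper run_in_scratch are all illustrative, and sort merely stands in for a real application.

```shell
#!/bin/bash
#PBS -l nodes=1:ppn=8
#PBS -l walltime=1:00:00
#PBS -W stageout=$TMPDIR/result.out@localhost:$PBS_O_WORKDIR

# Hypothetical helper: copy the input into the scratch directory and run there.
#   $1 = directory holding input.dat, $2 = scratch directory
run_in_scratch() {
    cp "$1/input.dat" "$2/" || return 1
    (
        cd "$2" || exit 1
        export FOO_SCRATCH=$2           # tell the application where to put temporary files
        sort input.dat > result.out     # stand-in for the real application
    )
}

# PBS_JOBID is set by the queueing system, so this branch only runs inside a job.
if [ -n "${PBS_JOBID:-}" ]; then
    run_in_scratch "$PBS_O_WORKDIR" "$TMPDIR"
fi
```

Because the result is returned by the stageout directive rather than by a trailing cp command, it should still be copied back if the job runs into its walltime limit.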