HPC/Applications/g09
Latest revision as of 23:13, March 6, 2014
Introduction
Gaussian is an electronic structure program used by chemists, chemical engineers, biochemists, physicists and others for research in established and emerging areas of chemical interest.
Starting from the basic laws of quantum mechanics, Gaussian predicts the energies, molecular structures, and vibrational frequencies of molecular systems, along with numerous molecular properties derived from these basic computation types. It can be used to study molecules and reactions under a wide range of conditions, including both stable species and compounds which are difficult or impossible to observe experimentally such as short-lived intermediates and transition structures. This article introduces several of its new and enhanced features.
Version note
The g09 versions on Carbon are 64-bit versions with full support for shared memory and Linda parallelization.
$ module -l avail g09
- Package -----------------------------+- Versions -+- Last mod. ------
/opt/soft/modulefiles:
g09/A.02.x86_64-1                                    2010/04/29 21:07:18
g09/A.02.x86_64-2                                    2011/08/05 19:56:04
g09/C.01.x86_64-1                                    2012/01/16 20:05:38
g09/D.01.x86_64-1                                    2013/08/02 19:19:22
The last version listed is currently the default, loaded by:
module load g09
License status
Our license agreement with Gaussian, Inc. restricts the use of Gaussian applications to Argonne employees.
Job script
- Inspect the job script template (after having loaded the g09 module) at:
$G09_HOME/g09.job
- This script shows how to use a preprocessing script
g09preprocess
Usage
Copy the template script and use it in one of two ways:
- edit for each job as needed, or
- adapt the script for basic needs of several jobs, and individualize it by the PBS job name.
The job name can be up to 15 characters long and should not contain unusual characters. Set it, along with your project name and the walltime limit, on the qsub command line:
qsub -N foo [-l walltime=hhh:mm:ss] g09.job
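The naming rule above can be checked before submission. The following is a minimal sketch, not part of the tooling on Carbon: the function name `submit_g09` is hypothetical, the set of "usual" characters (letters, digits, underscore, dot, hyphen) is an assumption, and the qsub command is only printed, not executed.

```shell
#!/bin/sh
# Hypothetical helper: validate a PBS job name, then print the qsub command.
# Assumption: "usual" characters are letters, digits, '_', '.', and '-'.
submit_g09() {
    name=$1
    walltime=$2
    # The job name can be up to 15 characters long (see the text above).
    if [ "${#name}" -gt 15 ]; then
        echo "job name too long: $name" >&2
        return 1
    fi
    case $name in
        ''|*[!A-Za-z0-9_.-]*)
            echo "job name is empty or has unusual characters: $name" >&2
            return 1
            ;;
    esac
    # Dry run: print the command instead of submitting it.
    echo "qsub -N $name -l walltime=$walltime g09.job"
}
```

Calling `submit_g09 test0420 12:00:00` prints the command line for a job named test0420; replace the `echo` with the command itself once the checks suit your needs.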
Operation
Upon job execution, the script reads the input file, and the g09preprocess script interprets and modifies your input for use under PBS on Carbon, as follows:
- Link0 commands %NProcShared=ppn and %LindaWorkers=n123,n124,n125,... are inserted automatically.
- Checkpoint files named in %chk directives are identified and copied into the compute node's job-specific $TMPDIR. The %chk specification is changed to include $TMPDIR. Note that this will be echoed in the g09 output as follows:
%chk=/tmp/12345.sched1.carboncluster/test0420.chk
- Chk-file processing is performed only for files given by bare name in the current directory; to skip processing, specify an explicit relative or absolute path, such as %chk=./name.chk
- At the end of the job, chk-files in $TMPDIR are moved to $PBS_O_WORKDIR.
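The Link0 insertion described above can be pictured with a short shell sketch. This is not the actual g09preprocess script, whose source is not shown here. The function name `insert_link0` is hypothetical, and the sketch assumes a PBS-style node file (as in $PBS_NODEFILE) that lists each allocated node once per core. Stripping the domain from node names is also an assumption, and checkpoint-file handling is omitted.

```shell
#!/bin/sh
# Hypothetical sketch of the Link0 insertion; NOT the real g09preprocess.
# Usage: insert_link0 NODEFILE INPUT.com
insert_link0() {
    nodefile=$1
    input=$2
    # Unique node names in order of first appearance, domain stripped,
    # joined by commas.
    workers=$(awk '{ sub(/\..*/, ""); if (!seen[$0]++) print }' "$nodefile" \
        | paste -s -d, -)
    # Cores per node = number of node-file entries for the first node.
    first=$(head -n 1 "$nodefile")
    ppn=$(grep -cxF -- "$first" "$nodefile")
    # Emit the two Link0 commands, followed by the unmodified input.
    printf '%%NProcShared=%s\n' "$ppn"
    printf '%%LindaWorkers=%s\n' "$workers"
    cat "$input"
}
```

In a job script this would be used like `insert_link0 "$PBS_NODEFILE" test0420.com`, with the result piped into g09.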
Parsing script
Much of the output from Gaussian is available in plaintext format in the log file. Many users write their own grep-like utilities to extract such information. I (stern) wrote my own, which is available alongside each Gaussian module.
gauss-parse
For information on usage, run:
gauss-parse --help
gauss-parse --man
To give an idea of scope, here is the beginning of the help output:
Usage:
    gauss-parse [options] [gaussian-output-file ...]
Options:
    --energy    Output energies (default).
    --spectrum  Output eigenvalues.
    --diis      Print DIIS errors.
    --geometry  Output geometries (into separate files unless "-join" is
                given).
    --charge    Output Mulliken charges. When --Spin is used, output spin
                densities instead.
    --force     Output forces.
    --freq      Output frequencies.
    --archive   Expand archive entry.
    --last      Print output for last frame only.
For energy, geometry, or force modes only:
…
The script grew from my own needs and is not meant to be the be-all and end-all. Contact me if you encounter a bug, need additional functionality, or simply find it useful.
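For comparison, the kind of grep-like utility mentioned above can be as small as the following sketch. It is not gauss-parse and does not reproduce its output format. It assumes the log contains Gaussian's usual `SCF Done:` lines, the function name `scf_energies` is hypothetical, and the energy value in the comment is made up for illustration.

```shell
#!/bin/sh
# Hypothetical minimal extractor; NOT gauss-parse.
# Usage: scf_energies FILE.out
scf_energies() {
    # A matching line is assumed to look like (illustrative values):
    #  SCF Done:  E(RHF) =  -76.0098706218     A.U. after   10 cycles
    # Print the method label (field 3) and the energy (field 5).
    awk '/SCF Done:/ { print $3, $5 }' "$1"
}
```

For anything beyond such one-liners (geometries, forces, frequencies), the options listed in the help output above are the better route.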
Notes on Linda
G09 is parallelized in two complementary ways:
- on a single node across its processors using shared memory, and
- across different nodes via a subset of the Linda parallelization language – http://gaussian.com/g_tech/g_ur/m_linda.htm
Incomplete node use
g09/Linda may decide not to use the entirety of the nodes made available to it by the g09preprocess stage in the PBS job file.
This section describes how to determine when this is the case and what to do about it.
Consider the following input:
%mem=20mw
#p MP2/6-311G(2df,p) force symm=loose MaxDisk=250000000 test
…
This is run in the job file through g09preprocess:
#PBS -l nodes=2:ppn=8
…
g09preprocess test0420.com | g09 > test0420.out
which produces the input that g09 will see, inserting the node names allocated by PBS:
%NProcShared=8
%LindaWorkers=n994,n992
%mem=20mw
#p MP2/6-311G(2df,p) force symm=loose MaxDisk=250000000 test
…
Gaussian will echo its input line by line in its log file:
Entering Gaussian System, Link 0=g09
Initial command:
/opt/soft/g09-D.01.x86_64-1/g09/l1.exe "/tmp/341710.sched1.carboncluster/Gau-32350.inp" -scrdir="/tmp/341710.sched1.carboncluster/"
Entering Link 1 = /opt/soft/g09-D.01.x86_64-1/g09/l1.exe PID= 32351.
Copyright (c) 1988,1990,1992,1993,1995,1998,2003,2009,2013,
Gaussian, Inc. All Rights Reserved.
…
Cite this work as:
Gaussian 09, Revision D.01,
…
******************************************
Gaussian 09: EM64L-G09RevD.01 24-Apr-2013
1-Aug-2013
******************************************
%NProcShared=8
Will use up to 8 processors via shared memory.
%LindaWorkers=n994,n992
%mem=20mw
SetLPE: input flags="-v -opt "Tsnet.Node.lindarsharg: ssh" "
SetLPE: new flags="-v -opt "Tsnet.Node.lindarsharg: ssh" -nodelist 'n994.carboncluster n992.carboncluster'"
Will use up to 2 processors via Linda.
--------------------------------------------------------------------
#p MP2/6-311G force …
--------------------------------------------------------------------
- The last line will be specific to your Gaussian route section (http://www.gaussian.com/g_tech/g_ur/m_input.htm). So far, so good.
- I highly recommend using the #p ("profuse") flag to better track progress through Gaussian's link stages.
- The term processors is used inconsistently here. For shared memory, it means cores, but for Linda it means nodes.
- You can track the use of Linda by:
grep exel file.out
(Enter /opt/soft/g09-D.01.x86_64-1/g09/linda-exe/l302.exel)
(Enter /opt/soft/g09-D.01.x86_64-1/g09/linda-exe/l401.exel)
…
- g09 may decide to not use Linda after all. A note like this will appear in the log file:
PrsmSu: requested number of processors reduced to: 6 ShMem 1 Linda.
- In this case, decide whether it would be better to resubmit the job than to leave nodes allocated by PBS idle. g09's decision may differ between Linda links; only the longest-running link is relevant when you evaluate resubmission.
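The log checks described above can be combined into one small function. This is a sketch only: the function name `check_linda_use` is hypothetical, and the two search strings are taken from the sample log lines in this section, so they may need adjusting for other Gaussian revisions.

```shell
#!/bin/sh
# Report the parallelism a Gaussian log announces and flag any reductions.
# Usage: check_linda_use FILE.out
check_linda_use() {
    log=$1
    # Announcements made while the Link0 commands are echoed.
    grep 'Will use up to' "$log" || true
    # Reductions decided later by individual links. grep finding nothing
    # is the good case here.
    if grep 'processors reduced to' "$log"; then
        echo "WARNING: g09 reduced the requested parallelism" >&2
        return 1
    fi
    return 0
}
```

A non-zero return status means at least one link ran with fewer cores or nodes than requested. Which link that was, and how long it ran, still has to be read from the log.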
Error messages
When running jobs on more than one node (i.e., using Linda), you may see messages like these in a job's error stream:
eval server 0 on n019.carboncluster has dropped it's connection.
eval server 0 on n019.carboncluster has dropped it's connection.
subprocess pid = 11505 has exited. status = 0x0000, id = 0, state = 17. command was /opt/soft/g09-D.01.x86_64-1/g09/linda8.2/opteron …
died after signing in successfully
eval server 0 on n019.carboncluster has dropped it's connection.
…
- These messages can be ignored, according to http://umbc.rnet.missouri.edu/resources/How2RunGAUSSIAN.html :
These messages are not indicative of a problem. They indicate that the Linda work in a Gaussian link is finished, and that Gaussian is continuing with a new link.
- The environment variable GAUSS_FLAGS (as set in the g09 modulefile) is now configured to include the flag -v, to better trace Linda operations on worker nodes. Additional lines like the following will show up in a job's error stream:
ntsnet: using executable file /opt/soft/g09-D.01.x86_64-1/g09/linda-exe/l302.exel
ntsnet: trying to schedule 1 worker
ntsnet: scheduled a total of 1 worker
ntsnet: starting master process on n994.carboncluster
ntsnet: starting 1 worker on n992.carboncluster
- This example shows how one link within a job of #PBS -l nodes=2:ppn=8 gets executed under Linda.
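Since these Linda messages can crowd a job's error stream, it may help to filter them out and look at what remains. The following sketch assumes the error stream has been saved to a file. The function name `filter_linda_noise` is hypothetical, and the patterns are taken only from the sample messages in this section, so they are likely incomplete.

```shell
#!/bin/sh
# Print the lines of an error-stream file that do NOT match the Linda
# messages shown above. The patterns may need extending.
# Usage: filter_linda_noise FILE.err
filter_linda_noise() {
    # grep -v exits with status 1 when every line was filtered out;
    # treat that as success and only pass on real errors (status 2).
    grep -v \
        -e "has dropped it's connection" \
        -e '^ *subprocess pid = .* has exited' \
        -e 'died after signing in successfully' \
        -e '^ *ntsnet: ' \
        "$1" || [ $? -eq 1 ]
}
```

Note that the sketch drops every "subprocess ... has exited" line regardless of the reported status, so inspect the unfiltered file if a job fails.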