HPC/FAQ: Difference between revisions

From CNM Wiki

Revision as of 17:42, September 5, 2014

To report a general problem

Sending a support request

Send mail to [email protected].

  • Choose an appropriate subject line. (Do not merely hit "Reply" on a previous unrelated message.)
  • Be as specific as possible – make sure your request includes the following:
    • The command you were trying to run or the menu item you chose.
    • The exact error message you get. Copy & Paste the message text, or take a screenshot and attach it to your mail.
    • For PBS jobs: The job number and the working directory of the job.
  • Review your subject line before you submit the message.

Receiving support

When you get our support response:

  • Read it carefully.
  • Follow all instructions and answer all questions.
  • For account- and password-related issues: it may take several hours for changes to take effect. If your initial attempt fails, wait at least that long before retrying.

Additional considerations

To help us diagnose a problem:

"I cannot log in" or "My password does not work"

Check host names

Make sure you connect to the correct host name: mega.cnm.anl.gov for the SSH gateway, or carbon.cnm.anl.gov when connecting from an onsite work computer or over VPN. The previous name for the latter, clogin.cnm.anl.gov, continues to work. See HPC/Network Access.
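
As a minimal sketch of the host choice described above: the function name carbon_host and the labels "onsite", "vpn", and "offsite" are illustrative assumptions, not something installed on Carbon. Only the host names come from this page.

```shell
# Hypothetical helper: pick the login host for your network location.
carbon_host() {
    case "$1" in
        onsite|vpn) echo "carbon.cnm.anl.gov" ;;  # onsite work computer or VPN
        *)          echo "mega.cnm.anl.gov" ;;    # SSH gateway for all other cases
    esac
}

# Print (not run) the ssh command you would use from offsite.
echo "ssh ${USER:-username}@$(carbon_host offsite)"
```

Replace the placeholder username with your Argonne username if $USER differs from it on your local machine.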

Check your eligibility

To access Carbon as a CNM user, a number of items are required, most of which are subject to an expiration date.

  • Review and update your User Registration details
    If you are not a US citizen, you will require a current visa to access Argonne computers, as if you were visiting in person. Your computer account will be disabled upon expiration of certain registration items. This may unfortunately happen in the middle of your proposal's lifetime, and you may suddenly find that you can no longer access mega.
    After you have reviewed the form, contact the CNM User Office to have your Argonne account reinstated.
  • You must be a participant in at least one User Proposal.
    Compute jobs may be run while a proposal (specifically, a user work authorization) is active. For at least 30 days thereafter, users are entitled to data access only, following CNM's Data Retention Policy.
    To review dates for your proposal, ask your PI to search their email archive for mails with "Work Approval Received" or "Proposal Expiration" in the subject.
    It may take several hours to set up or renew user access under a given proposal on mega.
  • Access to mega requires that the User Work Submittal for a proposal contain your badge number.
    If your badge number was left empty at the original submission of the UWS (typically when you are a newly registered user), ask the CNM User Office or your Scientific Contact to augment and resubmit the form.

Verify your password

Visit https://credentials.anl.gov/ and verify that your username and password are correct.

Request a password reset

  • To have your password reset, email the CNM User Office, at [email protected].
  • When you connect to mega while your temporary password is still in place, mega will ask for a new password. You can safely change your password at this point.
  • You can also change your password at https://credentials.anl.gov/. However, a change made there will take a few hours to become active on mega.

Review instructions

Request help for general network or connectivity issues

Contact the NST IT help desk at [email protected].

Be as specific as possible in the body of your message. Include at least the following:

  • The command you are trying to run or the menu item you choose.
  • The hostname you're trying to reach.
  • The username you use to connect.
  • The exact error message you get. Copy & Paste the message text, or take a screenshot and attach it to your mail.
  • The software name and version you use to connect (e.g. SSH, VNC, or a browser).
  • Your operating system and version.

Choose an appropriate subject line before you submit the message.


I'd like to use application X

Check if the application is already available on Carbon

Either:

module avail
module -l avail 2>&1 | less
The second form gives you browsable output.
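
To search for one application instead of browsing the whole list, the output can be filtered with grep. This is a sketch only: find_module is a hypothetical helper name, and the 2>&1 redirection is carried over from the command above on the assumption that the module list is written to standard error.

```shell
# Hypothetical helper: case-insensitive search of the module list.
find_module() {
    # Keep 2>&1 so that the list reaches the pipe, as in the command above.
    module avail 2>&1 | grep -i -- "$1"
}

# Usage on a Carbon login node (shown as a comment, not executed here):
#   find_module espresso
```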

If you cannot find the application on Carbon

  1. Describe the problem you are trying to solve – it may well be that we can suggest an alternative solution.
  2. Provide one or more URLs relevant to software you have in mind – be specific.

How do I run application X?

ls $QUANTUM_ESPRESSO_HOME/bin

How do I use application X?

Read the package's documentation, using one or more of the following:

  • Inspect the package's $NAME_HOME/share or $NAME_HOME/doc directory on Carbon (see module conventions).
  • Browse the package's web page, generally mentioned in the module help text or the application catalog entry.
  • Consult a package's man pages. Few packages have them. Man page files are generally installed under $NAME_HOME/man or $NAME_HOME/share/man and if so, will be made available automatically to the man command.
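
The directory checks above can be sketched as a small shell function. It assumes that loading a module exports a variable named after the package with the suffix _HOME, as in the $NAME_HOME convention; pkg_docs is a hypothetical name used for illustration.

```shell
# Hypothetical helper: list the documentation directories a package provides.
# Argument: the package prefix, e.g. QUANTUM_ESPRESSO for $QUANTUM_ESPRESSO_HOME.
pkg_docs() {
    # printenv fails if the variable is not exported (module not loaded).
    home=$(printenv "${1}_HOME") || { echo "${1}_HOME is not set" >&2; return 1; }
    for d in share doc man share/man; do
        [ -d "$home/$d" ] && echo "$home/$d"
    done
    return 0
}

# Usage after loading the module (shown as a comment, not executed here):
#   pkg_docs QUANTUM_ESPRESSO
```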

What's my account balance?

Simple answer: mybalance

To find out how many core-hours you have available, the simplest command to run is:

mybalance -h
Project  Machines Balance    
-------- -------- ---------- 
user     ANY         993.26
cnm34567 ANY       158760.93
cnm31234 ANY      -148893.62

The table lists all the Projects you have access to (for use with the qsub -A argument) and their balances. Machines lists all systems that can book jobs against your allocations; Carbon is currently the only machine that can do so. Balance is your account balance in core-hours, as selected by the -h option. This is the most useful and recommended unit. Without -h, you get core-seconds, which are integers but rather unwieldy numbers.

  • The "user" project provides you with a small initial startup allocation of typically 1000 core-hours.
  • When a Balance is reported as negative, that account typically has a CreditLimit assigned, which permits the balance to dip below zero. These details, however, are not shown by mybalance.
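
If you do end up with a value in core-seconds, dividing by 3600 gives core-hours. A minimal sketch; to_core_hours is a hypothetical helper, not a command on Carbon:

```shell
# Convert core-seconds (mybalance without -h) to core-hours.
to_core_hours() {
    awk -v s="$1" 'BEGIN { printf "%.2f\n", s / 3600 }'
}

to_core_hours 3600000   # prints 1000.00, the size of a typical startup allocation
```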

Complete answer: gbalance

To get allocation details for accounts that have CreditLimits, run the gbalance command. Pass -u username or -p projectname to select your allocations:

gbalance -h -u $USER
Type the literal string $USER; the shell replaces it with your actual username.

The output looks like:

Id  Name     Amount     Reserved Balance    CreditLimit Available
--- -------- ---------- -------- ---------- ----------- --------- 
100 cnm31234 -148893.62     0.00 -148893.62   150000.00   1106.38
217 kpelzer      993.26     0.00     993.26        0.00    993.26 
123 cnm34567  166440.93  7680.00  158760.93        0.00 158760.93 

The most relevant column for you is Available. The units, given the -h option, are again core-hours.

The columns and their meanings are:

Id
An internal number for the account.
Name
The project name (for use with qsub -A or #PBS -A).
Amount
Sum of all transactions fully booked to the project account; does not include running jobs or credits. Deposits are allocated by the User Office and implemented by the Carbon administrator.
Reserved
Amounts held in reserve by all running jobs using this account. The reserve ensures that a job does not cause an overdraft when it finishes and its actual use is booked. The quantity is calculated as walltime * number of cores blocked. When a job terminates, the charge for the time actually used is subtracted from Amount, and the unused quantity is re-added to Amount.
Balance
Available for new jobs; may go negative if CreditLimits are in place.
Balance = Amount - Reserved
CreditLimit
Amount by which Balance may go negative; assigned by the Carbon administrator.
Available
Relevant quantity for new jobs. Must be positive for a new job to start, and large enough to Reserve the entire job.
Available = Balance + CreditLimit
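
The two formulas can be checked against the sample gbalance output above. The following is a sketch with hypothetical helper names (available, reservation); the job size in the last line is an invented example, not taken from any real job.

```shell
# Arguments: Amount Reserved CreditLimit, all in core-hours.
available() {
    awk -v a="$1" -v r="$2" -v c="$3" 'BEGIN {
        balance = a - r                 # Balance = Amount - Reserved
        printf "%.2f\n", balance + c    # Available = Balance + CreditLimit
    }'
}

# Reserve needed by a new job: walltime (hours) * number of cores.
reservation() {
    awk -v w="$1" -v n="$2" 'BEGIN { printf "%.2f\n", w * n }'
}

available -148893.62 0.00 150000.00   # row cnm31234: prints 1106.38
available 166440.93 7680.00 0.00      # row cnm34567: prints 158760.93
reservation 24 64                     # hypothetical 24 h job on 64 cores: prints 1536.00
```

In this example the hypothetical job would need a reserve of 1536.00 core-hours, which exceeds the 1106.38 Available for cnm31234, so it could not start against that project but would fit under cnm34567.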