HPC/FAQ

From CNM Wiki

Revision as of 19:06, January 16, 2015

Access and proposals

CNM's HPC Carbon facility may be accessed by one of two routes: either under a proposal within the CNM User Program or by being an employee of the CNM's hosting institution, Argonne's NST division.

Your starting point to access Carbon under the user program is to become a CNM user. The following is a summary of the main points as they pertain to Carbon.

Getting started on a proposal

To get a proposal started once it has been approved, action is required by the proposal spokesperson (usually the Principal Investigator, PI). This applies to each proposal, regardless of any previous proposals with us. At this stage, we collect and update information about each proposal for reporting purposes to our funding agencies, and ensure that safety and training requirements are met by all participating users.

To get started as spokesperson or delegate

  • Locate the email that we sent to you with subject User Proposal Status Notification, and follow the instructions therein. Some steps are needed only once because they pertain to your institution. Your key action, for each proposal, will be to fill in the Safety and Data form – follow the User Work Submittal (UWS) link given in the email.
  • Remind participating users to review their individual access requirements.
  • Please be patient. Processing typically takes at least one business day, as both our User Office staff and your Scientific Contact need to attend to your submission in person.

The User Office will notify the spokesperson once work on a proposal may begin.

To get started as participating user

  • Gently remind your spokesperson/PI to do the above, or ask this person to nominate you via an email to the CNM User Office.
  • Enter, review, or update your User Registration.

We will notify you once your account is ready and added to the respective proposals.

Proposal Troubleshooting

Email the CNM User Office or phone us if you have any questions or concerns, such as:

  • about proposals,
  • about users, or
  • when a response to any of your submissions or previous communications is taking longer than a few business days.

Only our User Office staff is able to review all aspects of your proposal or your user access requirements, and determine any steps that have yet to be taken or need to be refreshed.


Include in your message:

  • Proposal number(s),
  • Name of the Principal Investigator (PI),
  • Names and badge numbers of participants.

Adding users to a proposal

Most users are nominated to work on a proposal when it is first submitted. Additional users can be authorized to work under a proposal at any time after acceptance. Actions are needed from both the user and the proposal spokesperson.

Actions required by the User

1. Enter or review your registration as facility user and make updates as necessary.

Actions required by the PI or spokesperson

2. Confirm that the user has entered or reviewed their registration, and determine their badge number(s).
3. Have the user added to the proposal. How to do this depends on whether the proposal has already started or not (see above).
  • For proposals that have already started, submit the users' names and badge numbers to the proposal's Scientific Contact at the CNM.
  • If the proposal hasn't started yet, add the users yourself when you initially fill in the User Work Submittal (UWS) as part of the regular proposal startup process. The UWS can and must be submitted by the spokesperson, once only, to start a proposal. To make changes after your submission, ask your Scientific Contact at the CNM.
Your Scientific Contact and the link to the UWS for each proposal are shown in our email to you with the subject containing User Proposal Status Notification.


User access requirements

A number of items are required from an individual person to log into Carbon, most of which are subject to an expiration date and require periodic renewal.

  • Review and update your User Registration.
    Your Argonne computer account will be disabled upon expiration of certain registration items. Notably, if you are not a US citizen, you will require a current US visa or related work permit to access Argonne computers, just as if you were to visit in person. This will very likely happen in the middle of your proposal's lifetime. For some of these expirations, no notice will be sent to you (for varying reasons), and you will suddenly find that you can no longer access mega.
    After you have updated your user registration, contact the CNM User Office to have your Argonne account reinstated.
  • Renew your User Courses.
    For remote users, the ESH223 course "Cybersecurity Annual Education and Awareness" is the one most likely in need of renewal.
  • You must be a participant in at least one active or recently expired User Proposal.
    To review dates for your proposals, ask your PI to search their email archive for mails with "Work Approval Received" or "Proposal Expiration" in the subject.
    • You may run compute jobs under a given proposal while it is active, i.e., within the dates stated in the Work Approval (UWA) notice.
    • For up to 30 days after your last active proposal has expired you may access only the mega SSH gateway and login nodes, according to CNM's Data Retention Policy.
    • While you still have access, transfer to your institution's computers any files from your Carbon work that you wish to retain. The CNM cannot be expected or held responsible to keep your data beyond your access window.
  • For users to access the mega SSH gateway, their badge numbers must be filled in on the User Work Submittal (UWS) form.
    The badge number may have been left empty in the UWS submission, which necessarily happens for very newly registered users. Ask the CNM User Office or your Scientific Contact to augment and resubmit the UWS form.

Login issues

If you find yourself saying "I cannot log in" or "My password does not work", work through the following sections:

Review host names

Make sure you connect to the correct host name, which is mega.cnm.anl.gov for the SSH gateway and carbon.cnm.anl.gov when connecting from an onsite work computer or over VPN. The previous name for the latter, clogin.cnm.anl.gov, will continue to work. See HPC/Network Access.

Verify your password

Visit https://credentials.anl.gov/ and verify that your username and password are correct.

Review your access requirements

See the section "User access requirements" above.

Request a password reset

  • To have your password reset, email the CNM User Office.
  • When you connect to mega while your temporary password is still in place, mega will ask for a new password. You can safely change your password at this point.
  • You can also change your password at https://credentials.anl.gov/. However, a change there will take a few hours to become active on mega.

Review network configuration

Practical hints

  • After you have been added to a user proposal, wait at least an hour before trying to access mega, preferably until the next morning.
    Updates of your status need to be propagated through a handful of systems, each of which is updated about hourly, so it may take several cycles for your status change to reach mega.
  • Set yourself calendar entries about one year into the future to remind yourself to renew any of your user registration or training requirements.

Mailing lists

Announcements about Carbon are made on the cnm-hpc-announce mailing list, hosted at Argonne. These list pages and the archive are, unfortunately, only accessible from onsite or via VPN.

Applications

I'd like to use application X

Check if the application is already available on Carbon

Either:

module avail
module -l avail 2>&1 | less
The second form gives you browsable output.
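To narrow the listing to a single package, the output can be filtered by name. The following is a minimal sketch, not a documented Carbon tool: the function name find_module is hypothetical, and it assumes that the module command writes its listing to standard error, as the 2>&1 redirection above suggests.

```shell
# Hypothetical helper: case-insensitive filter over a module listing on stdin.
find_module() {
    grep -i -- "$1"
}

# Intended use on Carbon (assumes the listing goes to stderr, hence 2>&1):
#   module -l avail 2>&1 | find_module espresso
```

An empty result only means that no module name matched the search string; the package may still be installed under a different name.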
If you cannot find the application on Carbon
  • Submit a support request.
    • Provide one or more URLs relevant to software you have in mind – be specific.
    • Describe the problem you are trying to solve – it may well be that we can suggest an alternative solution.
    • Give the extent of your planned use.
If you see the application on Carbon but you cannot access it
  • Existing license agreements may cover only a subset of users (typically Argonne employees).
  • If you feel you are eligible, submit a support request.
If a version newer than the installed one on Carbon is available

How do I run application X?

  • Customize your shell environment to load the application module.
  • Learn about module conventions on Carbon.
  • To determine the names of a package's executable scripts and binaries, load the application module (if you have not yet done so in your shell setup), then inspect the module's $NAME_HOME/bin directory. For instance, for the Quantum-ESPRESSO package:
module load quantum_espresso
ls $QUANTUM_ESPRESSO_HOME/bin

How do I use application X?

Read the package's documentation, using one or more of the following:

  • Inspect the package's $NAME_HOME/share or $NAME_HOME/doc directory on Carbon (see module conventions).
  • Browse the package's web page, generally mentioned in the module help text or the application catalog entry.
  • Consult a package's man pages. Few packages have them. Man page files are generally installed under $NAME_HOME/man or $NAME_HOME/share/man and if so, will be made available automatically to the man command.

What's my account balance?

Simple answer: mybalance

To find out how many core-hours you have available, the simplest command to run is:

mybalance -h
Project  Machines Balance    
-------- -------- ---------- 
user     ANY         993.26
cnm34567 ANY       158760.93
cnm31234 ANY      -148893.62

The table gives all the Projects you have access to (for use with the qsub -A argument), and their balance. Machines lists all systems that can book jobs against your allocations. Carbon is currently the only machine that can do so. Balance is your account balance, in core-hours, as selected by the -h command option. This is the most useful and recommended unit. Without -h, you get core-seconds, which are integers but rather more unwieldy numbers.

  • The "user" project provides you with a small initial startup allocation of typically 1000 core-hours.
  • When a Balance is reported as negative, that account typically has a CreditLimit assigned, which permits the balance to dip below zero. These details, however, are not shown by mybalance.
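To spot projects whose balance has gone negative, the table can be filtered with awk. This is a sketch under the assumption that the output has exactly the three columns shown above (Project, Machines, Balance) preceded by a header line and a dashed separator; the function name negative_projects is hypothetical.

```shell
# Hypothetical helper: read mybalance-style output on stdin and print the
# names of projects whose Balance column is negative.
negative_projects() {
    # Skip the header and the dashed separator (the first two lines).
    # Assumed columns: $1 = Project, $2 = Machines, $3 = Balance.
    awk 'NR > 2 && NF >= 3 && ($3 + 0) < 0 { print $1 }'
}

# Intended use on Carbon:
#   mybalance -h | negative_projects
```

For the sample table above this would print cnm31234. A negative balance does not by itself mean the project is exhausted, since a CreditLimit may apply.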

Complete answer: gbalance

To get allocation details for accounts that have CreditLimits, run the gbalance command. Pass -u username or -p projectname to select your allocations:

gbalance -h -u $USER
Type the literal string $USER; the shell will fill in your actual username.

The output looks like:

Id  Name     Amount     Reserved Balance    CreditLimit Available
--- -------- ---------- -------- ---------- ----------- --------- 
100 cnm31234 -148893.62     0.00 -148893.62   150000.00   1106.38
217 kpelzer      993.26     0.00     993.26        0.00    993.26 
123 cnm34567  166440.93  7680.00  158760.93        0.00 158760.93 

The most relevant column for you is Available. The units, given the -h option, are again core-hours.

The columns and their meanings are:

Id
An internal number for the account.
Name
The project name (for use with qsub -A or #PBS -A).
Amount
Amount for transactions completely on the books for the project account; does not include running jobs or credits. Deposits are allocated by the User Office and implemented by the Carbon administrator.
Reserved
Amounts held in reserve by all running jobs using this account. The reserve ensures that a job does not cause an overdraft when it finishes and its actual use is booked. The quantity is calculated as walltime * number of cores blocked. When a job terminates, the charge according to the actual time used will be subtracted from Amount, and the unused quantities will be re-added to Amount.
Balance
Available for new jobs; may go negative if CreditLimits are in place.
Balance = Amount - Reserved
CreditLimit
Amount by which Balance may go negative; assigned by the Carbon administrator.
Available
Relevant quantity for new jobs. Must be positive for a new job to start, and large enough to Reserve the entire job.
Available = Balance + CreditLimit
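The two formulas above can be checked against the sample gbalance rows. The following is a sketch only: the function names available, reservation, and job_fits are hypothetical and not part of mybalance or gbalance, and the reservation assumes the walltime is given in hours so that the result is in core-hours.

```shell
# Hypothetical helper: Available = (Amount - Reserved) + CreditLimit.
available() {
    awk -v a="$1" -v r="$2" -v c="$3" 'BEGIN { printf "%.2f\n", a - r + c }'
}

# Hypothetical helper: core-hours reserved = walltime in hours * cores.
reservation() {
    awk -v h="$1" -v n="$2" 'BEGIN { printf "%.2f\n", h * n }'
}

# Hypothetical check: succeeds if the reservation fits within Available.
job_fits() {
    awk -v avail="$1" -v need="$2" 'BEGIN { exit !(avail + 0 >= need + 0) }'
}

available -148893.62 0.00 150000.00   # row cnm31234: prints 1106.38
available 166440.93 7680.00 0.00      # row cnm34567: prints 158760.93
```

As a usage example, a hypothetical 48-hour job on 16 cores would reserve 768 core-hours (reservation 48 16), which fits within the 1106.38 core-hours available to cnm31234, so job_fits 1106.38 768 succeeds.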

My question is not answered here

See HPC/Support.