HPC/FAQ: Difference between revisions
Revision as of 19:51, October 25, 2017
Access and proposals
The HPC Carbon cluster may be accessed, like any tool at the CNM, by one of two routes:
- Facility Users, which includes all of Argonne staff, are eligible for CNM's User Access Program.
- As a secondary route, employees of Argonne's NST division are eligible for discretionary access.
Your hosting institution must have a legal signed User Facility Agreement with Argonne in place before any work under an eventual proposal may be performed. Look up your institution under existing Agreements, and get one started right away if not shown — legal wheels turn slowly and can delay your work.
Under the User Program, your starting point is to become a CNM user. To submit a new proposal, follow the instructions at the CNM's Call for Proposals. After a proposal has been successfully reviewed and been granted an allocation, the contacting author will get notified and must then review and complete certain details such as the names of participating users. At the end of the process you or your users will be able to log in using their existing or newly created individual Argonne computer user account.
The following is a summary of the main points as they pertain to Carbon.
Getting started on a proposal
To get a proposal started once it has been granted, action is required for each proposal by the proposal spokesperson (usually the Principal Investigator, PI), regardless of any previous proposals with us. At this stage, we collect and update information about each proposal for reporting purposes to our funding agencies, and ensure that safety and training requirements are met by all participating users.
To get started as spokesperson or delegate
- Locate the email that we sent to you with subject User Proposal Status Notification, and follow the instructions therein. Some steps are needed only once because they pertain to your institution. Your key action, for each proposal, will be to fill in the Safety and Data form – follow the User Work Submittal (UWS) link given in the email.
- Direct all users who you expect to access Carbon, possibly including yourself, to follow the instructions in section #Ongoing user access requirements.
- Please be patient. Processing typically takes at least one business day, as both our User Office staff and your Scientific Contact need to attend to your submission in person.
The User Office will notify the spokesperson or delegate once work on a proposal may begin. This is what we call the User Work Authorization (UWA), which is typically valid for one year for regular proposals.
To get started as participating user
For new proposals:
- Gently remind your spokesperson/PI to submit the UWS as shown above, or ask this person to nominate you via an email to the CNM User Office.
- Enter, review, or update your User Registration. We will notify you once your account is ready and added to the respective proposals.
For proposals already in progress, see the next section.
Adding users to a proposal
Most users are nominated to work on a proposal when it is first submitted. Additional users can be authorized to work under a proposal at any time after acceptance. Actions are needed from both the user and the proposal spokesperson.
Actions required by the User
- 1. Enter or review your registration as facility user and update as necessary.
- 2. Follow up in a day or a week as specified below.
Actions required by the PI or spokesperson
- 3. Confirm that the user has entered or reviewed their registration, and determine their badge number(s).
- 4. Have the user added to the proposal. How to do this depends on whether the proposal has already started or not (see above).
- For proposals that have already started, submit the users' names and badge numbers to the proposal's Scientific Contact at the CNM.
- If the proposal hasn't started yet, add the users yourself when you initially fill in the User Work Submittal (UWS) as part of the regular proposal startup process. The UWS can and must be submitted by the spokesperson, once only, to start a proposal. To make changes after your submission, ask your Scientific Contact at the CNM.
- Your Scientific Contact and the link to the UWS for each proposal are shown in our email to you with the subject containing User Proposal Status Notification.
Actions required by the Scientific Contact
- 5. Add users in the section "Participating Personnel" in the UWS-SciCon form. Ensure badge numbers are present and correct for users to be working on Carbon and on mega.
- Some users' badge numbers can and may have been left empty in earlier UWS submissions, which happens by necessity for very newly registered users.
Follow-up actions required by the User
- 6. Contact our CNM User Office and ask to review your registration and initiate the next steps.
- 7. Complete required training courses as instructed by the User Office.
- 8. Have your Argonne computer account name and password ready.
- New users
- You will need to personalize both your password and user name, in this order.
- When all your registration requirements are met the User Office will assign and send you an initial password. Change it on first login following the accompanying instructions.
- Your initial user name will be of the form b123456. To personalize this name follow the instructions that you will receive by email (usually later in the day) after you chose your password.
- Existing users
- Check for and if necessary renew your expired password, see below.
- 9. Await confirmation from us that your account has been enabled or re-enabled on Carbon (if so needed) and has been added to one or more proposals.
- If you do not receive a notification within two days, contact our User Office and describe your issue.
Ongoing user access requirements
For you to log into Carbon and its mega SSH gateway, a number of criteria must be met, most of which are subject to expiration dates and require action from you for renewal.
- Review and update your User Registration.
- This registration itself requires renewal at least every 2 years.
- Your Argonne computer account will be disabled upon expiration of this registration or any of its prerequisite items. Notably, if you are not a US citizen, you will require a current US visa or related work permit to access Argonne computers, just as if you were to visit in person. Such an expiration will likely occur in the middle of your proposal's lifetime. For some of these expirations, no notice will be sent to you (for varying reasons), and you will suddenly find that you can no longer access mega.
- After you have updated your user registration, contact the CNM User Office to have your Argonne account reinstated.
- If you changed your affiliation, check to see if your new institution has a legal User Facility Agreement with Argonne in place and request one if not. Be advised that the process may take several weeks to complete.
- Review and renew your User Courses.
- For remote users, the ESH223 course "Cybersecurity Annual Education and Awareness" is the one most likely in need of renewal.
- You must be a participant in at least one active or recently expired User Proposal.
- To review the dates for your proposals, ask your PI to search their email archive for mails with "Work Approval Received" or "Proposal Expiration" in the subject.
- You may run compute jobs under a given proposal while it is active, i.e., within the dates stated in the User Work Authorization (UWA) notice.
- For up to 30 days after your last active proposal has expired you may access only the mega and login nodes, according to CNM's Data Retention Policy.
- While you still have access, offload from Carbon all your files that you need to keep. The CNM cannot be expected or held responsible to store your data beyond your access window.
- Review and update your password if you have or previously had an Argonne computer account.
- If your account has an expired password, we consider it dormant and will not notify you about new proposals where you are listed as a participant.
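The proposal-date rules in the list above can be sketched as a small date calculation. This is only an illustration: the function name and the three labels are hypothetical, it assumes every listed proposal has already started, that the 30-day window counts inclusively from the end date, and that all other requirements (registration, training, password) are met.

```python
from datetime import date, timedelta

GRACE = timedelta(days=30)  # post-expiration window stated in the text


def access_level(today, uwa_end_dates):
    """Classify access from the end dates of a user's proposals.

    Returns "compute", "data-only", or "none"; these labels are
    hypothetical and exist only for this illustration.
    """
    last_end = max(uwa_end_dates)
    if today <= last_end:
        return "compute"      # at least one proposal is still active
    if today <= last_end + GRACE:
        return "data-only"    # mega and login nodes only: offload files now
    return "none"


# Made-up end dates for two proposals.
ends = [date(2017, 9, 30), date(2018, 3, 31)]
print(access_level(date(2018, 4, 15), ends))
```

With these made-up dates the last proposal ended 15 days earlier, so the sketch reports the data-only window.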
Practical hints
- Set yourself calendar entries about one year into the future to remind yourself to renew any of your user registration or training requirements.
- After you were added to a user proposal, wait at least an hour before trying to access mega, preferably until the next morning.
- Updates of your status need to be propagated through a handful of systems, each being done about hourly, so it may take several cycles for your status change to reach mega.
Proposal troubleshooting
Email the CNM User Office or call us if you have any questions or concerns, such as:
- about proposals,
- about users, or
- if a response to any of your submissions or previous communications takes longer than a few business days.
Only our User Office staff is able to review all aspects of your proposal or your user access requirements, and determine any steps that have yet to be taken or need to be refreshed.
Include in your message:
- Proposal number(s),
- Name of the Principal Investigator (PI),
- Names and badge numbers of participants.
Login issues
When you ask "I cannot log in" or "My password does not work", consider the following sections:
Review your access requirements
See section #Ongoing user access requirements above.
Verify or change your password
- Visit https://credentials.anl.gov/ and verify that your username and password are correct.
- Visit https://mypassword.anl.gov/ to change a known or reset an unknown password. For a reset to succeed, you must have previously chosen your security questions. You may not succeed if your account is disabled as such, in which case revisit the section #Ongoing user access requirements.
- You may also change an expired password by logging in to our SSH gateway host mega. It will accept your old password, and will ask for a new one.
- If you cannot use our automated systems, request a password reset by email to the CNM User Office.
- Wait for about an hour. It may take a while for changes to propagate through a number of systems until they reach mega.
- Connect to mega with the new password. The system may ask you to choose a new password at this point. Choose wisely and remember it.
Review host names
Connect to the correct host names:
- When connecting from outside Argonne:
- mega.cnm.anl.gov – the SSH "tunnel setup" connection.
- clogin – one or more SSH "payload" connections. This is not a hostname as such but an alias that you should have set up in your ssh configuration. The alias represents a connection to localhost (your machine), but at a different port than regular ssh.
- When onsite or using VPN:
- carbon.cnm.anl.gov. The previous name clogin. … will continue to work.
To learn more, read HPC/Network Access.
Review network configuration
- Read again HPC/Network Access, and follow the instructions for your platform.
Mailing lists
Announcements about Carbon are made on the cnm-hpc-announce mailing list, hosted at Argonne. These list pages and the archive are, unfortunately, only accessible from onsite or via VPN.
- To update your email address on the list, simply unsubscribe as shown in the next item, then re-subscribe by sending a blank message to cnm-hpc-announce-join@lists.anl.gov.
- To unsubscribe from the mailing list, send a blank message to cnm-hpc-announce-leave@lists.anl.gov and follow up on the confirmation notice.
- See the GNU Mailman documentation for background.
Applications
I'd like to use application X
- Check if the application is already available on Carbon
Either:
- Browse the Application Catalog, or
- View the catalog on the Carbon command line:
module avail
module -l avail 2>&1 | less
- The second form gives you browsable output.
- If you cannot find the application on Carbon
- Submit a support request.
- Provide one or more URLs relevant to software you have in mind – be specific.
- Describe the problem you are trying to solve – it may well be that we can suggest an alternative solution.
- Give the extent of your planned use.
- If you see the application on Carbon but you cannot access it
- Existing license agreements may cover only a subset of users (typically Argonne employees).
- If you feel you are eligible, submit a support request.
- If a version newer than the installed one on Carbon is available
- Submit a support request.
- Include a URL to information about the new version.
How do I run application X?
- Customize your shell environment to load the application module.
- Learn about module conventions on Carbon.
- To determine the names of a package's executable scripts and binaries, load the application module (if you have not yet done so in your shell setup), then inspect the module's $NAME_HOME/bin directory. For instance, for the Quantum-ESPRESSO package:
module load quantum_espresso
ls $QUANTUM_ESPRESSO_HOME/bin
- Learn how to submit and manage jobs.
How do I use application X?
Read the package's documentation, using one or more of the following:
- Inspect the package's $NAME_HOME/share or $NAME_HOME/doc directory on Carbon (see module conventions).
- Browse the package's web page, generally mentioned in the module help text or the application catalog entry.
- Consult a package's man pages. Few packages have them. Man page files are generally installed under $NAME_HOME/man or $NAME_HOME/share/man and if so, will be made available automatically to the man command.
What's my account balance?
Simple answer: mybalance
To find out how many core-hours you have available, the simplest command to run is:
mybalance -h
Project  Machines Balance
-------- -------- ----------
user     ANY          993.26
cnm34567 ANY       158760.93
cnm31234 ANY      -148893.62
The table gives all the Projects you have access to (for use with the qsub -A argument), and their balance.
Machines lists all systems that can book jobs against your allocations. Carbon is currently the only machine that can do so.
Balance is your account balance, in core-hours, as selected by the -h command option. This is the most useful and recommended unit.
Without -h, you get core-seconds, which are integers but rather more unwieldy numbers.
- The "user" project provides you with a small initial startup allocation of typically 1000 core-hours.
- When a Balance is reported as negative, that account typically has a CreditLimit assigned, which permits the balance to dip below zero. These details, however, are not shown by mybalance.
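If you want to post-process this output in a script, the following minimal sketch parses text laid out like the sample above and converts core-seconds to core-hours. The three-column layout is assumed from the sample only, and the function names are hypothetical; nothing here is part of mybalance itself.

```python
# Sample text copied from the `mybalance -h` output shown above.
SAMPLE = """\
Project  Machines Balance
-------- -------- ----------
user     ANY          993.26
cnm34567 ANY       158760.93
cnm31234 ANY      -148893.62
"""


def parse_mybalance(text):
    """Return {project: balance} from mybalance-style text (assumed layout)."""
    balances = {}
    for line in text.splitlines()[2:]:  # skip header and separator rows
        fields = line.split()
        if len(fields) == 3:
            project, _machines, balance = fields
            balances[project] = float(balance)
    return balances


def core_seconds_to_hours(core_seconds):
    """Convert the unit printed without -h into core-hours."""
    return core_seconds / 3600


balances = parse_mybalance(SAMPLE)
positive = [name for name, value in balances.items() if value > 0]
print(positive)
```

A negative balance does not by itself mean the project is unusable, since a CreditLimit may apply; this sketch cannot tell, because that column is not part of the output.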
Complete answer: gbalance
To get allocation details for accounts that have CreditLimits, run the gbalance command. Pass -u username or -p projectname to select your allocations:
gbalance -h -u $USER
- Use the literal string $USER, which makes the shell fill in your actual username.
The output looks like:
Id  Name     Amount     Reserved Balance    CreditLimit Available
--- -------- ---------- -------- ---------- ----------- ---------
100 cnm31234 -148893.62     0.00 -148893.62   150000.00   1106.38
217 kpelzer      993.26     0.00     993.26        0.00    993.26
123 cnm34567  166440.93  7680.00  158760.93        0.00 158760.93
The most relevant column for you is Available. The units, given the -h option, are again core-hours.
The columns and their meanings are:
- Id
- An internal number for the account.
- Name
- The project name (for use with qsub -A or #PBS -A).
- Amount
- Amount for transactions completely on the books for the project account; does not include running jobs or credits. Deposits are allocated by the User Office and implemented by the Carbon administrator.
- Reserved
- Amounts held in reserve by all running jobs using this account. The reserve ensures that a job does not cause an overdraft when it finishes and when its actual use will be booked. The quantity is calculated by walltime * number of cores blocked. When a job terminates, the charge according to the actual time used will be subtracted from Amount, and the unused quantities will be re-added to Available.
- Balance
- Available for new jobs; may go negative if CreditLimits are in place.
Balance = Amount - Reserved
- CreditLimit
- Amount by which Balance may go negative; assigned by the Carbon administrator.
- Available
- Relevant quantity for new jobs. Must be positive for a new job to start, and large enough to Reserve the entire job.
Available = Balance + CreditLimit
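The two formulas and the reservation rule above can be checked against the sample gbalance rows with a short sketch. The class and function names are hypothetical, and the start check is a simplification of what the text describes, not the scheduler's actual logic.

```python
from dataclasses import dataclass


@dataclass
class Account:
    name: str
    amount: float        # core-hours fully booked on the account
    reserved: float      # core-hours held by running jobs
    credit_limit: float  # how far Balance may go below zero

    @property
    def balance(self):
        return self.amount - self.reserved          # Balance = Amount - Reserved

    @property
    def available(self):
        return self.balance + self.credit_limit     # Available = Balance + CreditLimit


def reservation(walltime_hours, cores):
    """Core-hours reserved for a job: walltime * number of cores blocked."""
    return walltime_hours * cores


def can_start(account, walltime_hours, cores):
    """Simplified check: Available must be positive and cover the reservation."""
    needed = reservation(walltime_hours, cores)
    return account.available > 0 and account.available >= needed


# Values taken from the sample gbalance output above.
acct = Account("cnm31234", -148893.62, 0.00, 150000.00)
print(round(acct.available, 2))      # matches the Available column: 1106.38
print(can_start(acct, 24, 32))       # needs 768 core-hours
print(can_start(acct, 48, 32))       # needs 1536 core-hours
```

In this sketch the 24-hour, 32-core job fits into the 1106.38 available core-hours, while the 48-hour job does not.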
Allocation expiration policy – Or: Why did my account balance suddenly drop?
The compute time provided by Carbon's processors is a perishable resource. Hence, your allocations are time-restricted in a use-it-or-lose-it manner. This is done to encourage consistent use of the machine throughout allocation cycles.
Because of resource contention, especially near the end of a cycle, it will be increasingly impractical and eventually physically impossible to use up a large remaining allocation within a short time. You would need to claim a large fraction of Carbon's nodes during a relatively short time window, which is unlikely to be possible because jobs from other users will be running as well.
The expiration schedule is as follows:
- Your allocation is provided in three equal-sized installments.
- All installments are active from the beginning.
- Installments expire in a staggered fashion, currently after 4, 8, and 12 months, respectively. A diagram might illuminate this:
        Proposal                Proposal
        start                   expiration
        |-------|-------|-------|------> Time
        0       4       8       12       (months)
        |
 Installment
        |
     (1)|########....................      KEY
        |
     (2)|################............      . Installment is inactive
        |
     (3)|########################....      # Installment is active
        |
- Your jobs will, sensibly, be booked against the installments that expire the earliest.
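The schedule above can be illustrated with a minimal sketch. It assumes that an installment counts as expired once the elapsed time reaches its expiration month, that unused parts of an expired installment are simply lost, and it ignores CreditLimits; the function names and the 30000 core-hour figure are made up for the example.

```python
def active_installments(total, months_elapsed, expiries=(4, 8, 12)):
    """Return [(expiry_month, core_hours)] for installments not yet expired."""
    size = total / len(expiries)  # equal-sized installments
    return [(month, size) for month in expiries if months_elapsed < month]


def book(installments, charge):
    """Charge core-hours against the earliest-expiring installments first."""
    remaining = []
    for expiry, left in sorted(installments):
        used = min(left, charge)
        charge -= used
        remaining.append((expiry, left - used))
    if charge > 0:
        raise ValueError("charge exceeds the active allocation")
    return remaining


# Hypothetical 30000 core-hour allocation, five months into the proposal.
active = active_installments(30000, months_elapsed=5)
print(active)                # the 4-month installment has already expired
print(book(active, 12000))   # drains the 8-month installment first
```

Here the first installment has already lapsed at month five, and a 12000 core-hour charge empties the installment expiring at month 8 before touching the one expiring at month 12.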
My question is not answered here
See HPC/Support.