HPC/FAQ: Difference between revisions
Revision as of 17:12, October 8, 2020
Access and proposals
The HPC Carbon cluster may be accessed, like any tool at the CNM, by one of two routes:
- Facility Users, which includes all of Argonne staff, are eligible under CNM's User Access Program.
- As a secondary route, employees of Argonne's NST division are eligible for discretionary access.
The following is a summary of the main points as they pertain to Carbon.
User Registration
Your hosting institution must have a legal signed User Facility Agreement with Argonne in place before any work under an eventual proposal may be performed. Look up your institution under existing Agreements. If your institution is not shown, get started with establishing a user agreement as soon as possible. Legal wheels turn slowly and could delay your work.
Under the User Program, your starting point is to become a CNM user.
Proposal Overview
- To submit a new proposal, follow the instructions at the CNM's Call for Proposals.
- If a proposal has been successfully reviewed and been granted an allocation, the contacting author will get notified.
- The contacting author must respond, most importantly to confirm the names of participating users, as well as various characteristics of the proposal that we need for reporting to our funding agencies.
- User accounts will be created, activated or reactivated as needed, with ongoing input and responses from users.
- At the end of the process, you as PI and your users will be able to log in using your existing or newly created individual Argonne computer accounts.
Account Types
CNM Users access our computer systems by several different accounts:
- User Registration Account
- The user name is your badge number, which is all numeric and has no leading zeroes.
- The account is created for all CNM facility users, including Principal Investigators (PIs), when you register at User Registration Portal.
- Your badge number will be assigned early on in the process, and you will be asked for a password.
- Use this account (your badge number and its associated password) to log in later to update your user registration.
- If you are a Proposal spokesperson (PI or delegate so designated), also use this account to manage proposals at our CNM Proposal Submission Portal.
- Finally, use this account to take any initial training that you may be asked to complete prior to beginning work, and prior to being assigned an Argonne computer account.
- Argonne Computer Account
- The user name for this account is initially the letter b followed by your badge number (without leading zeroes), and later will be personalized to a more human-friendly user name.
- Use this regular Argonne computer account to perform hands-on work at CNM's Carbon High Performance Computing system, where you will typically run Linux terminal commands. This account's user name never begins with a number.
- You may use your Argonne Computer account once you have it to log in to the CNM Proposal Submission Portal.
For managing either account, including password resets, please visit https://mypassword.anl.gov or contact the CNM User Office. When contacting the User Office, clarify which account you are referring to, for example by stating which task you are having difficulty with.
Getting started on a proposal
To start working on a proposal once it has been granted, action is required for each proposal by the proposal spokesperson (usually the Principal Investigator, PI), regardless of any previous proposals with us. At this stage, we collect and update information about each proposal for reporting purposes to our funding agencies, and ensure that safety and training requirements are met by all participating users.
To get started as spokesperson or delegate
- Locate the email to you with subject User Proposal Status Notification, sent by the CNM User Office.
- If you are delayed in starting your proposal, you may need to expand your mailbox search time frame, to begin the search at possibly several months before present.
- You may need to inspect the "Junk" or "Spam" folder of your mail application.
- Follow the instructions in that email.
- Certain steps are needed only once because they pertain to your institution, notably filing a User Agreement.
- Before work on a new proposal may begin, we may ask you to submit User Activity Highlights that are due for previous proposals upon their conclusion.
- Fill in the Safety and Data form, i.e., follow the User Work Submittal (UWS) link that is given in our notification email.
- Only the spokesperson may fill in the UWS form, or the PI may designate a delegate. Changes after the initial UWS submission by you (the PI or delegate) can only be made by your Scientific Contact at the CNM – see below.
- The UWS is due for each proposal, as it collects certain metadata that we need, on a per-proposal-basis, for usage reporting to our funding agencies.
- Confirm or add users in section "Personnel Participating In User Proposal" of the UWS form. Enter the badge numbers for all participating users, as far as they are known at submission time.
- Direct all users who you expect to access Carbon, possibly including yourself, to follow the instructions in section #Ongoing user access requirements.
- Be patient. UWS Processing typically takes at least one business day, as both our User Office staff and your Scientific Contact need to attend to your submission in person.
The User Office will notify the spokesperson or delegate once work on a proposal may begin. This is what we call the User Work Authorization (UWA), which is typically valid for one year for regular proposals.
To get started as participating user
For new proposals or users new to CNM:
- Gently remind your spokesperson/PI to submit the UWS as shown above, or ask this person to nominate you via an email to the CNM User Office.
- Continue at section #Actions required by the User.
For proposals already in progress, see the next section.
Adding users to a proposal
Most users are nominated to work on a proposal when it is first submitted. Additional users can be authorized to work under a proposal at any time after acceptance. Actions are needed from both the user and the proposal spokesperson.
Actions required by the User
- 1. Enter or review your registration as facility user and update as necessary.
- 2. Follow up in a day or a week as specified at #Follow-up actions required by the User.
Actions required by the PI or spokesperson
- 3. Confirm that the user has entered or reviewed their registration, and determine their badge number(s).
- 4. Add the user to the proposal.
- If the proposal has not yet started: See the section above #To get started as spokesperson or delegate.
- If the proposal has already started: Email the users' names and badge numbers to the proposal's Scientific Contact at the CNM and ask for the UWS to be augmented.
Actions required by the Scientific Contact
- 5. Open the UWS-SciCon form in the CNM Proposal Dashboard on the "Inside CNM" web page.
- Add users under section "Participating Personnel".
- Enter users' badge numbers which may have been left empty in earlier UWS submissions, as happens by necessity for very newly registered users.
- Again, review that badge numbers are present and are correct for all relevant participants that are to work on Carbon, or at least need to use mega.
Follow-up actions required by the User
- 6. Contact our CNM User Office and ask to review your registration and to initiate the next steps.
- To work onsite at CNM or remotely on Carbon, you will need an Argonne computer account, which is different from the account you used to enter your User Registration details.
- If you are new to Argonne, our User Office will be in contact with you to set up this account. Since the Argonne account is for more general use than your registration account, it will involve formal approvals, cyber security training, etc. and typically takes about one week to complete.
- 7. Complete or refresh training courses, as requested by the User Office.
- Course requirements are triggered at various levels. There are courses for being a visitor at Argonne, a user at CNM, and for work performed under proposals, be it onsite or remotely.
- 8. Have your Argonne computer account finalized and ready.
- New users: See section #Finalize your Argonne computer account.
- Existing users: Check for and if necessary renew your expired password, see below.
Carbon account and access
- When all prerequisites are met, your Argonne account will be enabled or (if so needed) re-enabled for use on Carbon.
- 9. Await confirmation from us that your account has been enabled on Carbon and been added to one or more proposals.
- 10. Review and follow the instructions on HPC/Network Access, i.e., how to connect to the cluster.
If you do not receive an expected notification within two days, contact our User Office and describe your issue.
Finalize your Argonne computer account
After your registration as a new Argonne Facility User has been processed, an Argonne computer account will be created for you.
Its user name will be in the form b123456, and our User Office will have communicated to you an initial password.
To begin work on Carbon and possibly other CNM instruments, personalize both your password and your user name, in this order.
Change your password
- Change your password as instructed when your account was created.
- See also section #Verify or change your password below.
Change your user name
- Request a user name change.
- The link above will open a text form in your email application.
- For your "From" address, be sure to use the mail account that corresponds to the email address that you entered at User Registration – Messages pertaining to your Argonne activities sent from an unrelated email account may be disregarded, for obvious reasons.
- In the form itself, fill in your current name details and preferences for your new user name.
- If you have a user name at your institution and you are happy with it, supply that name as the first of three name preferences. This will make it simpler to configure your network connections to the cluster.
- Your preferred user name choices should have:
- letters in all lowercase, at most 8;
- at most 2 numerals, preferably none;
- no punctuation, including none of . - _ and of course none of this silliness: @,;:%&+*=()[]{}\|/<>#'"`~!$^
- Await an emailed ticket number from Argonne's service desk.
- Ticket numbers are assigned upon manual review, typically within an hour or so during business hours.
- If you did not get a ticket number after such reasonable time, re-send the request.
- Proceed with #Carbon account and access.
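The user name guidelines listed above can be pre-checked before you send the request. The following is a minimal sketch, not an Argonne tool: the function name check_username is hypothetical, the limit of 8 is read here as "at most 8 letters", and the requirement of at least one letter is an added assumption. The service desk's manual review remains the final word.

```shell
#!/bin/sh
# Hypothetical helper (not an Argonne tool): pre-check a user name preference
# against the guidelines above. Assumption: "at most 8" counts letters only.

check_username() (
    LC_ALL=C; export LC_ALL        # ASCII-only character ranges
    name=$1

    # Anything other than lowercase letters and numerals counts as punctuation.
    case $name in
        '' | *[!a-z0-9]*)
            echo "rejected: use only lowercase letters and numerals"
            return 1 ;;
    esac

    # Count letters and numerals separately.
    letters=$(( $(printf '%s' "$name" | tr -cd 'a-z' | wc -c) ))
    digits=$(( $(printf '%s' "$name" | tr -cd '0-9' | wc -c) ))

    if [ "$letters" -lt 1 ] || [ "$letters" -gt 8 ]; then
        echo "rejected: use 1 to 8 lowercase letters"
        return 1
    fi
    if [ "$digits" -gt 2 ]; then
        echo "rejected: use at most 2 numerals"
        return 1
    fi
    echo "ok"
)

# Example calls (the names are placeholders):
#   check_username jsmith     # accepted
#   check_username j.smith    # rejected because of the period
```

The function body runs in a subshell, so its working variables and the locale setting do not leak into your interactive shell.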
Proposal troubleshooting
Email the CNM User Office with any questions or concerns, such as:
- about proposals,
- about users, or
- if you find that a response to any of your submissions or previous communications is taking longer than a few business days.
Only our User Office staff is able to review all aspects of your proposal or your user access requirements, and determine any steps that have yet to be taken or need to be refreshed.
Include in your message:
- Proposal number(s),
- Name of the Principal Investigator (PI),
- Names and badge numbers of participants.
Ongoing user access requirements
For you to log into Carbon and its SSH gateway, a number of criteria must be met, most of which are subject to expiration dates and require action from you for renewal. You will NOT get notified (for varying reasons) for some of these expirations. Different duration terms may cause expirations to happen right in the middle of your proposal's lifecycle and can cause immediate inconvenience.
Depending on which criterion expired, one or both of the following happens:
- access under your account to mega will be revoked,
- your Argonne computer account as such will be disabled.
To recover, do one or more of the following:
User Registration
- Review and update your User Registration.
- For returning visits, use your badge number to log in.
- The registration itself requires renewal at least every 2 years. It also expires when any of its prerequisite items expires.
- If you are not a US citizen, note especially the entries regarding your US visa or related work permit. Current status is required to access Argonne computers, just as if you were to visit in person.
- After you have updated your user registration, contact the CNM User Office to have your Argonne account created or reinstated.
Legal User Agreement
- If you changed your affiliation, check to see if your new institution has a legal User Facility Agreement with Argonne in place and request one if not.
- Be advised that the process may take several weeks to percolate through legal and administrative channels.
Training
- Review and renew your User Courses.
- For remote users, the ESH223 course "Cybersecurity Annual Education and Awareness" is the one most likely to need renewal.
Password
- Review and update the password of your Argonne computer account if you now have or previously had such an account.
- If your account's password expired, we consider the account dormant and will not notify you about new proposals where you are listed as a participant.
Active proposal
You must be a participant in at least one active or recently expired User Proposal. You may run compute jobs under a given proposal while it is active, i.e., within the dates stated in the proposal's User Work Authorization (UWA).
- To review the dates for your proposals, ask your PI to search their email archive for subjects "Work Approval Received" or "Proposal Expiration".
Data-only access for inactive proposals
After your last active proposal on Carbon has expired you may still access, for up to 30 days, the SSH gateway and Carbon's login nodes. Past that time window, your access will be revoked and your data may be deleted, following CNM's Data Retention Policy.
- While you still have access, offload from Carbon all your files and data that you may wish to keep.
- The CNM cannot be expected to store your data beyond your access window, nor be held responsible for doing so.
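Offloading can be done with the scp command over your existing ssh setup. The sketch below is illustrative only: it assumes that an ssh host alias named carbon is already configured as described under "Review host names", the function name offload_from_carbon is hypothetical, and the directory names in the example are placeholders.

```shell
#!/bin/sh
# Sketch only: assumes an ssh host alias "carbon" is already configured and
# that the tunnel connection to the gateway is open when connecting from offsite.

offload_from_carbon() {
    remote_dir=$1     # directory on Carbon, relative to your home directory
    local_dir=$2      # destination directory on your own machine

    # -r copies the directory recursively, -p preserves modification times.
    mkdir -p "$local_dir" &&
    scp -r -p "carbon:$remote_dir" "$local_dir/"
}

# Example (placeholder names):
#   offload_from_carbon projects/run42 "$HOME/carbon-backup"
```

For large data sets it is worth starting well before the 30-day window closes, since an interrupted scp transfer has to be restarted.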
Practical hints
- Set yourself calendar entries about one year into the future to remind yourself to renew any of your user registration or training requirements.
- After you have been added to a user proposal, wait at least an hour before trying to access our SSH gateway, preferably until the next morning.
- Updates of your status need to be propagated through a handful of systems, each of which refreshes about hourly, so it may take several cycles for your status change to reach mega.
Login issues
When you ask "I cannot log in" or "My password does not work", consider the following sections:
Review your access requirements
See section #Ongoing user access requirements above.
Verify or change your password
To verify, change, or reset your password, do one of the following:
- Visit https://servicenow.anl.gov/pr?id=credentials .
- Complete the Password Reset Enrollment at this service if you have not already done so.
- You cannot reset your password if your account has been disabled as such, in which case revisit the section #Ongoing user access requirements.
- Visit our older account profile service, being phased out in early 2021: https://mypassword.anl.gov/.
- You must have previously completed an Argonne Account Profile at this service.
- Profiles from this service were not transferred to our current account service above.
- Log in to our SSH gateway host mega, but only if you have previously used it.
- You will be asked to update an expired password upon logging in.
- At least one of your CNM proposals must still be active.
- You cannot log in if your account is disabled as such.
- Error messages for login failures will not disclose the specific reason, as is security practice. In other words, you cannot tell if merely your password was wrong or your account is disabled.
- If you cannot use our automated systems, request a password reset from the CNM User Office.
You can connect to mega normally after a password update. If you find that you cannot log in yet, retry after an hour or two – it can take a while for passwords or other account changes to fully propagate to various systems.
Password lockouts
After trying incorrect passwords several times, your account may be temporarily locked out. To recover, try one of the following:
- Wait approximately 30 min.
- Unlock at https://mypassword.anl.gov/ if you have set up your Account Profile there.
Review host names
Connect to the correct host names:
- When connecting from outside Argonne, at least two ssh sessions are required.
- mega.cnm.anl.gov – an ssh "tunnel setup" connection.
- carbon – one or more "payload" connections for the ssh, scp, or sftp commands. Here, carbon is not a hostname but an entry in your ssh configuration that you must make (called a host alias or "profile", depending on the ssh application). The entry stands for a connection to localhost (your machine), at a port number forwarded by the preceding tunnel setup connection.
- When onsite (for any user) or using VPN (for Argonne staff):
- carbon.cnm.anl.gov. The previous name was clogin. …
To learn more, read HPC/Network Access.
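The two-session pattern for offsite connections can be sketched with generic OpenSSH local port forwarding. This is not the official Carbon configuration: the local port 2222, the forwarding target carbon.cnm.anl.gov port 22, and the function names are assumptions made for illustration, and the sketch addresses localhost directly instead of defining the carbon host alias. The instructions on HPC/Network Access remain authoritative.

```shell
#!/bin/sh
# Generic OpenSSH sketch, NOT the official Carbon setup.
# Assumptions: local port 2222 is free on your machine, and the gateway
# forwards to carbon.cnm.anl.gov on the standard ssh port 22.

LOCAL_PORT=2222

# Session 1: the "tunnel setup" connection to the gateway. Keep it running.
# -N runs no remote command; -L forwards a local port through the gateway.
carbon_tunnel() {
    ssh -N -L "$LOCAL_PORT:carbon.cnm.anl.gov:22" "$1@mega.cnm.anl.gov"
}

# Session 2: a "payload" connection to localhost at the forwarded port.
carbon_login() {
    ssh -p "$LOCAL_PORT" "$1@localhost"
}

# Usage, each in its own terminal window (replace jdoe with your user name):
#   carbon_tunnel jdoe
#   carbon_login jdoe
```

A host alias in your ssh configuration serves the same purpose as carbon_login and also works with scp and sftp, which is why the documented setup prefers it.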
Review network configuration
- Read again HPC/Network Access, and follow the instructions for your platform.
My home directory is read-only
Mailing lists
Announcements about Carbon are made on the cnm-hpc-announce mailing list, hosted at Argonne. The mailing list home page and its archive are, unfortunately, only accessible from onsite or (for authorized users) over VPN.
- To unsubscribe from the mailing list, do one of the following:
- Open the "unsubscribe" link at the bottom of a recent message that you received from the mailing list.
- Send a blank message to cnm-hpc-announce-leave@lists.anl.gov and follow up on the confirmation notice.
- To subscribe to the mailing list, send a blank message to cnm-hpc-announce-join@lists.anl.gov and follow up on the confirmation notice.
- To change your email address on the list, simply do both of the above, in order.
- Hints
- When sending commands by email, be sure to have the relevant email account selected in the "From" line of the compose window in your email application.
- Inspect the Junk mail folder of your email application if you do not receive a confirmation message for subscribing or unsubscribing within a minute or so.
- See the GNU Mailman documentation for background.
Applications
I'd like to use application X
- Check if the application is already available on Carbon
Either:
- Browse the Application Catalog, or
- View the catalog on the Carbon command line:
module avail
module -l avail 2>&1 | less
- The second form gives you browsable output.
- If you cannot find the application on Carbon
- Submit a support request.
- Provide one or more URLs relevant to software you have in mind – be specific.
- Describe the problem you are trying to solve – it may well be that we can suggest an alternative solution.
- Give the extent of your planned use.
- If you see the application on Carbon but you cannot access it
- Existing license agreements may cover only a subset of users (typically Argonne employees).
- If you feel you are eligible, submit a support request.
- If a version newer than the installed one on Carbon is available
- Submit a support request.
- Include a URL to information about the new version.
How do I run application X?
- Customize your shell environment to load the application module.
- Learn about module conventions on Carbon.
- To determine the names of a package's executable scripts and binaries, load the application module (if you have not yet done so in your shell setup), then inspect the module's $NAME_HOME/bin directory. For instance, for the Quantum-ESPRESSO package:
module load quantum_espresso
ls $QUANTUM_ESPRESSO_HOME/bin
- Learn how to submit and manage jobs.
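The $NAME_HOME lookup above can be wrapped in a small helper. This is a sketch under an assumption: it generalizes from the single example given (module quantum_espresso, variable QUANTUM_ESPRESSO_HOME) that the variable name is the upper-cased module name plus _HOME. The function names are hypothetical, and the module conventions page is the reference for the actual naming rules.

```shell
#!/bin/sh
# Sketch: assumes the home variable is the upper-cased module name + "_HOME",
# generalized from quantum_espresso -> QUANTUM_ESPRESSO_HOME.

module_home_var() {
    base=${1%%/*}     # drop a trailing "/version" part, if any
    name=$(printf '%s' "$base" | LC_ALL=C tr 'a-z' 'A-Z' | LC_ALL=C tr -c 'A-Z0-9' '_')
    printf '%s_HOME\n' "$name"
}

# List a package's executables. The module must already be loaded, since
# "module load" is what sets the home variable in your shell.
list_module_bin() {
    var=$(module_home_var "$1")
    eval "home=\${$var:-}"      # safe: var contains only A-Z, 0-9 and _
    if [ -z "$home" ]; then
        echo "$var is not set; run 'module load $1' first" >&2
        return 1
    fi
    ls "$home/bin"
}

# On Carbon:
#   module load quantum_espresso
#   list_module_bin quantum_espresso
```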
How do I use application X?
Read the package's documentation, using one or more of the following:
- Inspect the package's $NAME_HOME/share or $NAME_HOME/doc directory on Carbon (see module conventions).
- Browse the package's web page, generally mentioned in the module help text or the application catalog entry.
- Consult a package's man pages. Few packages have them. Man page files are generally installed under $NAME_HOME/man or $NAME_HOME/share/man and if so, will be made available automatically to the man command.
What's my account balance?
Simple answer: mybalance
To find out how many core-hours you have available, the simplest command to run is:
mybalance -h
Project  Machines    Balance
-------- -------- ----------
user     ANY          993.26
cnm34567 ANY       158760.93
cnm31234 ANY      -148893.62
The table gives all the Projects you have access to (for use with the qsub -A argument), and their balance.
Machines lists all systems that can book jobs against your allocations. Carbon is currently the only machine that can do so.
Balance is your account balance, in core-hours, as selected by the -h command option. This is the most useful and recommended unit. Without -h, you get core-seconds, which are integers but rather more unwieldy numbers.
- The "user" project provides you with a small initial startup allocation of typically 1000 core-hours.
- When a Balance is reported as negative, that account typically has a CreditLimit assigned, which permits the balance to dip below zero. These details, however, are not shown by mybalance.
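If you have several projects, the mybalance table can be filtered with standard tools to pick a project name for qsub -A. The following sketch assumes the three-column layout with two header lines shown above; the function name richest_project is hypothetical. Because mybalance does not show CreditLimits, a project with a negative balance is never selected here even though it may still have core-hours available.

```shell
#!/bin/sh
# Sketch: reads "mybalance -h" output on stdin and prints the project with
# the largest balance. Assumes two header lines and three columns per row.

richest_project() {
    awk 'NR > 2 && NF == 3 {
             v = $3 + 0                       # force numeric comparison
             if (!seen || v > max) { max = v; name = $1; seen = 1 }
         }
         END { if (seen) print name }'
}

# On Carbon:
#   mybalance -h | richest_project
```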
Complete answer: gbalance
To get allocation details for accounts that have CreditLimits, run the gbalance command. Pass -u username or -p projectname to select your allocations:
gbalance -h -u $USER
- Use the literal string $USER, which makes the shell fill in your actual username.
The output looks like:
Id  Name         Amount Reserved    Balance CreditLimit Available
--- -------- ---------- -------- ---------- ----------- ---------
100 cnm31234 -148893.62     0.00 -148893.62   150000.00   1106.38
217 kpelzer      993.26     0.00     993.26        0.00    993.26
123 cnm34567  166440.93  7680.00  158760.93        0.00 158760.93
The most relevant column for you is Available. The units, given the -h option, are again core-hours.
The columns and their meanings are:
- Id
- an internal number for the account.
- Name
- The project name (for use with qsub -A or #PBS -A).
- Amount
- Amount for transactions completely on the books for the project account; does not include running jobs or credits. Deposits are allocated by the User Office and implemented by the Carbon administrator.
- Reserved
- Amounts held in reserve by all running jobs using this account. The reserve ensures that a job does not cause an overdraft when it finishes and when its actual use will be booked. The quantity is calculated by walltime * number of cores blocked. When a job terminates, the charge according to the actual time used will be subtracted from Amount, and the unused quantities will be re-added to Available.
- Balance
- Available for new jobs; may go negative if CreditLimits are in place.
Balance = Amount - Reserved
- CreditLimit
- Amount by which Balance may go negative; assigned by the Carbon administrator.
- Available
- Relevant quantity for new jobs. Must be positive for a new job to start, and large enough to Reserve the entire job.
Available = Balance + CreditLimit
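As a worked example, the two formulas above can be checked against the sample gbalance rows, and combined with the reservation rule (walltime * number of cores) to estimate whether a new job could start. This is a sketch: the function names are hypothetical, the job sizes in the usage notes are made-up values, and the scheduler's own accounting is what actually decides.

```shell
#!/bin/sh
# Sketch of the gbalance arithmetic, all quantities in core-hours.

# Available = (Amount - Reserved) + CreditLimit
available() {      # args: amount reserved creditlimit
    awk -v a="$1" -v r="$2" -v c="$3" 'BEGIN { printf "%.2f\n", (a - r) + c }'
}

# A new job needs Available to be positive and to cover walltime * cores.
can_start() {      # args: available walltime_hours cores
    awk -v av="$1" -v w="$2" -v n="$3" 'BEGIN { exit !(av > 0 && av >= w * n) }'
}

# Rows from the sample output above:
#   available -148893.62 0.00 150000.00    # prints 1106.38
#   available 166440.93 7680.00 0.00       # prints 158760.93
# Made-up job sizes against 1106.38 available core-hours:
#   can_start 1106.38 24 32    # succeeds: 768 core-hours needed
#   can_start 1106.38 24 64    # fails: 1536 core-hours needed
```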
Allocation expiration policy – Or: Why did my account balance suddenly drop?
The compute time physically available from Carbon's processors is a perishable resource. Hence, your allocations are time-restricted in a use-it-or-lose-it manner. This is done to encourage consistent use of the machine throughout allocation cycles.
Because of resource contention, especially near the end of a cycle, it will be increasingly impractical and eventually physically impossible to use up a large remaining allocation within a short time. You would need to claim a large fraction of Carbon's nodes during a relatively short time window, which is unlikely to be possible because jobs from other users will be running as well.
The expiration schedule is as follows:
- Your allocation is provided in three equal-sized installments.
- All installments are active from the beginning.
- Installments expire in a staggered fashion, currently after 4, 8, and 12 months, respectively. A diagram might illuminate this:
          Proposal                Proposal
          start                   expiration
             |-------|-------|-------|------> Time
             0       4       8       12      (months)
             |
 Installment |
          (1)|########....................   KEY
             |
          (2)|################............   . Installment is inactive
             |
          (3)|########################....   # Installment is active
             |
- Your jobs will, sensibly, be booked against the installments that expire the earliest.
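The staggered schedule can be turned into a quick estimate of how much of an allocation is still usable at a given time. The sketch below makes two assumptions that the schedule above does not state: time is counted in whole months since the proposal start, and an installment counts as expired from its expiration month onward. The function names are hypothetical, and the result is an upper bound because any core-hours already used are not subtracted.

```shell
#!/bin/sh
# Sketch of the 4/8/12-month expiration schedule for three equal installments.
# Assumption: an installment is treated as expired from its month onward.

active_installments() {      # arg: whole months since proposal start
    if   [ "$1" -lt 4 ];  then echo 3
    elif [ "$1" -lt 8 ];  then echo 2
    elif [ "$1" -lt 12 ]; then echo 1
    else                       echo 0
    fi
}

# Upper bound on core-hours still usable (integer arithmetic, usage ignored).
max_usable() {               # args: total_allocation months
    echo $(( $1 / 3 * $(active_installments "$2") ))
}

# Example with a made-up allocation of 150000 core-hours:
#   max_usable 150000 0     # prints 150000
#   max_usable 150000 5     # prints 100000
#   max_usable 150000 12    # prints 0
```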
My question is not answered here
See HPC/Support.