HPC/Support: Difference between revisions

From CNM Wiki
 
(41 intermediate revisions by 2 users not shown)
Line 1: Line 1:
== Request Support ==
== Review the FAQ list ==
For questions regarding the Carbon HPC system, first consult '''[[HPC/FAQ | Frequently Asked Questions]]''' to find most answers.
For frequently asked questions regarding the Carbon HPC system, review '''[[HPC/FAQ | Frequently Asked Questions]]''' to find answers.


=== Compose a support request Email ===
== Compose a support request for CNM HPC ==
Compose an Email to our help desk.
If your question is not answered in the FAQ: <!-- , choose one of the following to get help: -->
* [https://argonne.servicenowservices.com/sp?id=sc_cat_item&sys_id=4e37eb5b13199f0016ff33228144b036 '''Submit a support request''']  – Log in with your regular Argonne username and password (same as on Carbon).
* Choose a suitable Request Type and provide the details as shown in the next section.
<!-- * [mailto:help@anl.gov?subject=CNM%20HPC%20issue Send an email to help@anl.gov] -->


Provide the following pieces of information:
== Required information ==
* Be as ''specific as possible'' in the body of your message. Include the following:
<!-- *: Do not merely hit "Reply" on a previous unrelated message - doing so might cause your request to get overlooked in a previous thread. -->
** The '''command''' you were trying to run or the menu item you chose.
Be as ''specific as possible'' in your free-form description or form input fields.
** The '''exact error message''' you get. Copy & Paste the message text, or take a screenshot and attach it to your mail.
** The path and '''file name''' of files you were working on, if applicable.
**: It is generally not necessary to include in your message copies of files that already reside on Carbon.
* For PBS jobs, include:
** The '''job number'''.
** The '''working directory''' of the job.
** We may ask you to prepare a '''test directory''' on Carbon, usually at the same directory level as your troubled runs.
**: Ideally, errors are reproducible. To diagnose subtle errors (where your application runs but does not fail outright), we will need a test directory which contains ''only'' files required by PBS or the application:
*** the PBS job file,
*** input files,
*** any local data files,
*** if available, a file with the ''expected'' (correct) output, with a name different from the one generated by the job, e.g. a file with an added extension <code>.ref</code>.
* For remote access issues, include the following:
** The '''hostname''' you ran the command on (generally shown in the command prompt).
** The '''hostname''' you're trying to ''reach.''
** The '''username''' you use to connect.
** The '''software''' name and version you use to connect (e.g. SSH, VNC, or a browser).
** Your '''operating system''' and version.
* Review your '''subject line''' before you submit the message.


=== For all issues – local and remote ===
Provide the following:
* The '''command''' you were trying to run or the menu item you chose in a graphical user interface
* The <font color="red">'''exact error message'''</font> you get. Copy & Paste the message text, or take a screenshot and attach it to your mail.
* The '''directory path''' and '''file name''' you were working on, if applicable.
*: It is generally not necessary to include in your message copies of files that already reside on Carbon.
* Did the same command or action '''work previously''', if ever?
* Which '''changes''', if any, did you make to your configuration or input files since the last time this worked?
* Fill in an appropriate email '''Subject''' or form '''Title'''.
*: It is best to do this ''after'' you have typed and reviewed the main text of your request.


If your question is not answered in the FAQ, send your support request to the NST IT help desk:
=== For issues with PBS jobs ===
In addition to the [[#For all issues|preceding section]], for questions concerning PBS jobs, provide:
* the '''working directory''' of the job
* the name of the '''job file''' used
* the '''job number''' (for queued or running jobs)


=== For remote access ===
When you cannot log in to mega or a ''Carbon'' login node, provide:
* The details in section [[#For all issues]],
plus the following:
* The '''hostname''' you ran the command on. It is often shown in the command prompt. (If the prompt shows <code>localhost</code> it is not specific enough – ask your local support person for help.)
* The '''hostname''' you're trying to ''reach.''
* The '''username''' you use to connect.
* The '''software''' name and version you use to connect (e.g. SSH, VNC, or a browser).
* Your '''operating system''' and version.


=== Receiving support ===
If you use ssh or scp, repeat your command in verbose mode by supplying the "-v" option twice:
<source lang="bash">
scp -v -v [… other options and arguments …]
ssh -v -v [… other options and arguments …]
</source>
: Copy and Paste the possibly multi-page output into your request.
 
<!--
=== Review ===
* Make sure you included the ''error message'' that prompted your request.
* Review and if needed edit the '''Subject''' to match your question. <!--  (A simple reply to an unrelated system announcement may get ignored.)
* Submit.
//-->
 
=== Test directory ===
We may ask you to prepare a ''test directory'', which should help to:
* Reproduce your problem.
* Diagnose your problem, especially when errors are subtle or difficult to reproduce, such as where your application runs but does not fail outright.
* Confirm resolution.
 
Prepare the test directory so that it contains ''only'' files to run your application, either interactively or in a single PBS job, and possibly a file with correct output.
; Preferred location:
* In or under a ''sibling'' directory of one of your failed runs (as opposed to a ''child'' directory).
; Include:
* PBS job file, if applicable,
* Run-specific data and script files (files named in input data or hardcoded in the code, possibly as symbolic links),
* If available, a file with the ''expected'' (correct) output, but with a file name different from the one generated by the job, such as by adding <code>.ref</code>.
 
In the course of diagnostics, we will suitably duplicate your directory as needed.
 
== Receiving support ==
When you get our support response:
* Carefully read it.
Line 39: Line 75:
* For account- and password-related issues: it may take several hours for changes to take effect. If your initial attempt fails, wait at least that long before retrying.


=== Additional considerations ===
== Additional considerations ==
To help us diagnose a problem:
* Read [http://www.chiark.greenend.org.uk/~sgtatham/bugs.html How to Report Bugs Effectively], by [http://www.chiark.greenend.org.uk/~sgtatham/ Simon Tatham], programmer.<!-- of [http://www.chiark.greenend.org.uk/~sgtatham/putty/ PuTTY] fame. -->
* For deeper reading, consider [http://www.catb.org/~esr/faqs/smart-questions.html How To Ask Questions The Smart Way], by [http://en.wikipedia.org/wiki/Eric_S._Raymond Eric S. Raymond], open source pioneer.
<!-- * Also, the [http://mywiki.wooledge.org/XyProblem X-Y Problem] (mostly for programming-related tasks). -->

Latest revision as of 16:57, January 5, 2022

Review the FAQ list

For frequently asked questions regarding the Carbon HPC system, review Frequently Asked Questions to find answers.

Compose a support request for CNM HPC

If your question is not answered in the FAQ:

  • Submit a support request – Log in with your regular Argonne username and password (same as on Carbon).
  • Choose a suitable Request Type and provide the details as shown in the next section.

Required information

Be as specific as possible in your free-form description or form input fields.

For all issues – local and remote

Provide the following:

  • The command you were trying to run or the menu item you chose in a graphical user interface
  • The exact error message you get. Copy & Paste the message text, or take a screenshot and attach it to your mail.
  • The directory path and file name you were working on, if applicable.
    It is generally not necessary to include in your message copies of files that already reside on Carbon.
  • Did the same command or action work previously, if ever?
  • Which changes, if any, did you make to your configuration or input files since the last time this worked?
  • Fill in an appropriate email Subject or form Title.
    It is best to do this after you have typed and reviewed the main text of your request.

For issues with PBS jobs

In addition to the preceding section, for questions concerning PBS jobs, provide:

  • the working directory of the job
  • the name of the job file used
  • the job number (for queued or running jobs)
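
The three PBS details above can be gathered in one step. What follows is only a minimal sketch, not a tool provided on Carbon: the function name pbs_request_details and its arguments are hypothetical. It assumes the standard PBS environment variables PBS_O_WORKDIR (the directory the job was submitted from) and PBS_JOBID, which are set inside a running job; outside a job it falls back to the current directory and to a job number passed by hand.

```shell
# Hypothetical helper: print the PBS details a support request asks for.
# Usage: pbs_request_details JOBFILE [JOBNUMBER]
pbs_request_details() (
    job_file=$1
    # Inside a job PBS sets PBS_JOBID; otherwise use the second argument.
    job_number=${PBS_JOBID:-${2:-unknown}}
    # Inside a job PBS sets PBS_O_WORKDIR to the submission directory;
    # otherwise assume the function is called from the job's directory.
    work_dir=${PBS_O_WORKDIR:-$PWD}
    printf 'Working directory: %s\n' "$work_dir"
    printf 'Job file:          %s\n' "$job_file"
    printf 'Job number:        %s\n' "$job_number"
)
```

Called from the directory of the troubled run with the job file name and job number as arguments, it prints three lines that can be pasted into the request.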

For remote access

When you cannot log in to mega or a Carbon login node, provide:

  • The details in section For all issues,

plus the following:

  • The hostname you ran the command on. It is often shown in the command prompt. (If the prompt shows localhost it is not specific enough – ask your local support person for help.)
  • The hostname you're trying to reach.
  • The username you use to connect.
  • The software name and version you use to connect (e.g. SSH, VNC, or a browser).
  • Your operating system and version.
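
On a Unix-like client, several of the items above can be printed with standard commands. This is a sketch under assumptions: the function name client_details is hypothetical, uname -sr reports the kernel name and release (which only approximates the operating system version), and the local username may differ from the one you use to connect. The hostname you are trying to reach still has to be added by hand.

```shell
# Hypothetical helper: print client-side details for a remote-access request.
client_details() (
    printf 'Local hostname:   %s\n' "$(uname -n)"
    printf 'Local username:   %s\n' "$(id -un)"
    printf 'Operating system: %s\n' "$(uname -sr)"
    if command -v ssh >/dev/null 2>&1; then
        # ssh prints its version on standard error, hence the redirection.
        printf 'SSH client:       %s\n' "$(ssh -V 2>&1)"
    fi
)
```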

If you use ssh or scp, repeat your command in verbose mode by supplying the "-v" option twice:

scp -v -v [… other options and arguments …]
ssh -v -v [… other options and arguments …]
Copy and Paste the possibly multi-page output into your request.
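
Because the verbose output can run to several pages, it may be easier to save it to a file than to copy it from the terminal. The sketch below relies on ssh and scp writing their -v debug messages to standard error; the function name save_debug_log and the log file name are hypothetical.

```shell
# Hypothetical helper: run a command and save its standard error to a log file.
# Usage: save_debug_log LOGFILE COMMAND [ARGS...]
save_debug_log() (
    log=$1
    shift
    # Only standard error is redirected, so regular output and an
    # interactive session still appear in the terminal as usual.
    # The function returns the exit status of the command itself.
    "$@" 2>"$log"
)
```

For example, save_debug_log ssh-debug.log ssh -v -v [… other options and arguments …] leaves the debug messages in ssh-debug.log, ready to paste or attach.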


Test directory

We may ask you to prepare a test directory, which should help to:

  • Reproduce your problem.
  • Diagnose your problem, especially when errors are subtle or difficult to reproduce, such as where your application runs but does not fail outright.
  • Confirm resolution.

Prepare the test directory so that it contains only files to run your application, either interactively or in a single PBS job, and possibly a file with correct output.

Preferred location
  • In or under a sibling directory of one of your failed runs (as opposed to a child directory).
Include
  • PBS job file, if applicable,
  • Run-specific data and script files (files named in input data or hardcoded in the code, possibly as symbolic links),
  • If available, a file with the expected (correct) output, but with a file name different from the one generated by the job, such as by adding .ref.

In the course of diagnostics, we will suitably duplicate your directory as needed.
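
The preparation described above could be scripted roughly as follows. This is a sketch under assumptions, not a prescribed procedure: the function names make_test_dir and add_reference and the .test directory suffix are hypothetical, and the files to copy are named explicitly so that nothing else ends up in the test directory.

```shell
# Hypothetical helper: create a test directory next to (not inside) a failed run.
# Usage: make_test_dir RUN_DIR FILE...
# FILE names are relative to RUN_DIR; the path of the new directory is printed.
make_test_dir() (
    run_dir=${1%/}
    shift
    # Sibling directory: same parent as the run, with ".test" appended
    # (an arbitrary naming choice).
    test_dir="$(dirname "$run_dir")/$(basename "$run_dir").test"
    mkdir -p "$test_dir" || exit 1
    for f in "$@"; do
        # -p preserves the timestamps and permissions of the originals.
        cp -p "$run_dir/$f" "$test_dir/" || exit 1
    done
    printf '%s\n' "$test_dir"
)

# Hypothetical helper: store known-good output under a different name (".ref" added).
# Usage: add_reference TEST_DIR GOOD_OUTPUT_FILE
add_reference() (
    cp -p "$2" "$1/$(basename "$2").ref"
)
```

Listing each file by name keeps the directory limited to what PBS or the application needs; the .ref copy cannot be overwritten by the job because its name differs from the generated output.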

Receiving support

When you get our support response:

  • Carefully read it.
  • Follow all instructions and answer all questions.
  • For account- and password-related issues: it may take several hours for changes to take effect. If your initial attempt fails, wait at least that long before retrying.

Additional considerations

To help us diagnose a problem: