HPC/Applications/lumerical: Difference between revisions
Revision as of 22:13, April 24, 2019
Introduction
Lumerical has two main components:
- fdtd-solutions (also known as CAD): The GUI. We have licensed 4 concurrent seats. Access is for Argonne staff only. To request access, submit a support request.
- fdtd-engine (also known as FDTD): The compute engine, called either by the GUI or in a PBS job file. We have 17 concurrent seats. Access is granted to all CNM users.
Manual and Knowledge Base
Running the fdtd-solutions GUI (CAD)
Using the fdtd-solutions GUI application requires access to a Lumerical license service that runs on Carbon. There are three ways to run the application with the proper license access:
- Run on Carbon and display on your machine over X11 forwarding
- Run on Carbon in a VNC virtual desktop and display that desktop on your machine.
- Run on your desktop directly and use license port forwarding.
X11
Run "fdtd-solutions" as an X11 application on a Carbon login node:
fdtd-solutions &
- Requires an X11 server application on your desktop machine.
- Requires X11 forwarding under SSH.
- Disadvantage: May run sluggishly.
VNC
Run "fdtd-solutions" on Carbon in a VNC virtual desktop.
- Requires a VNC client on your side.
- Advantage: uses compression and hence can be faster than X11.
- Disadvantage: Limited desktop environment.
Native
Run the "FDTD Solutions" application on your desktop, and configure it to obtain the Lumerical license from Carbon.
- Advantage: Native graphics speed.
- Caveat: When running outside of CNM (which is possible), you will need an active network connection to NST.
To get started:
- You will need to have the FDTD-Solutions software installed on your desktop or laptop. Submit an IT support request to have the software installed on an Argonne-owned machine.
- Configure license access as shown below.
Running compute jobs within fdtd-engine (FDTD)
To run a job (typically in parallel), construct your model and task within the fdtd-solutions application and save it as an .fsp file. Then copy and customize the following job template, entering your account and file names:
$LUMERICAL_HOME/sample.job
Note: fdtd-engine does not support checkpointing. Select your #PBS -l walltime parameter generously.
Memory issues
- Problem
When running an fdtd-engine compute job in parallel, the memory use of the MPI master process can increase dramatically during the final stage (Data Collection). For example, during the calculation ("cruising") stage each MPI process may need only 2-4 GB, but in the collection stage the master process (MPI rank 0) may require 10 times as much or more. The other MPI processes evidently remain idle but running during the collection stage and continue to hold on to their memory, which could amount to a valuable 10–15 GB. The increased memory demand may cause the node to go into swap, even on Carbon's "bigmem" nodes, which have 48 GB RAM each. TORQUE (PBS) detects swap use and tolerates it at first, but kills the job after a few minutes. In that case, the following error appears in the standard error stream:
=>> PBS: job killed: swap rate due to memory oversubscription is too high
mpirun: abort is already in progress...hit ctrl-c again to forcibly terminate
mpirun: killing job...
The standard output stream may reach 99% or 100%, then stop:
0% complete. Max time remaining: 16 mins, 4 secs. Auto Shutoff: 1
1% complete. Max time remaining: 15 mins, 4 secs. Auto Shutoff: 1
…
98% complete. Max time remaining: 20 secs. Auto Shutoff: 6.21184e-05
99% complete. Max time remaining: 9 secs. Auto Shutoff: 5.5162e-05
100% complete. Max time remaining: 0 secs. Auto Shutoff: 4.51006e-05
- A successful run prints several more lines with data collection notes, ending with "Simulation completed successfully".
- Workarounds
- Try to collect less data.
Chris K. from Lumerical Support wrote:
For example, all field monitors allow you to choose which E/H/P fields to collect. If you only care about power transmission through a monitor, you can disable all the E/H/P fields and just collect the 'net power'. You can also control the number of frequency points, and you can enable spatial downsampling, and obviously just make the monitors smaller.
- Have the master process run on a node of its own, as shown at HPC/Submitting and Managing Jobs/Advanced node selection#Different PPN by node. You will trade away compute capacity for memory. Caveat: Try to keep the total number of cores highly divisible. Example:
- A typical request for 64 cores:
#PBS -l nodes=8:ppn=8
- Improved request:
#PBS -l nodes=1:ppn=1:bigmem+1:ppn=7+7:ppn=8
#PBS -l naccesspolicy=SINGLEJOB -n
Note that the + sign is a field delimiter in the "nodes" specification.
This specification requests the same total number of cores, but splits the load of the original first node over two nodes: one with a single core and a second with the remaining 7 cores, followed by as many 8-core nodes as needed.
nodes = ( 1:ppn=1 ) + ( 1:ppn=7 ) + ( 7:ppn=8 ) = 1 + 7 + 7 * 8 = 64 cores.
Rank 0 will have the entire RAM on the first node available, and is the only rank likely to need "bigmem". The other ranks are modest in memory needs and unlikely to face contention. Usually, Moab will ensure that all ranks run on the same node generation, in this case gen2 (see HPC/Submitting and Managing Jobs/Advanced node selection#Hardware).
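When adapting the request to a different job size, it can help to re-check the core total before submitting. The following is a minimal sketch of such a check, not part of TORQUE or Carbon's tooling; the function name count_cores is hypothetical. It assumes every "+"-delimited field begins with a numeric node count and treats a missing ppn as 1.

```shell
# Sum the cores requested by a TORQUE "nodes" specification.
# Each "+"-delimited field is expected to look like COUNT[:ppn=N][:property...].
count_cores() {
    spec=$1
    total=0
    old_ifs=$IFS
    IFS='+'                         # split the specification into fields
    for field in $spec; do
        count=${field%%:*}          # node count: text before the first ":"
        ppn=1                       # assumed default when ppn is not given
        IFS=':'                     # split one field into its components
        for part in $field; do
            case $part in
                ppn=*) ppn=${part#ppn=} ;;
            esac
        done
        IFS='+'
        total=$((total + count * ppn))
    done
    IFS=$old_ifs
    echo "$total"
}

count_cores "8:ppn=8"
count_cores "1:ppn=1:bigmem+1:ppn=7+7:ppn=8"
```

Both calls should print the same total, confirming that the improved request keeps the core count of the original one. Pass only the part after "nodes=".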
Running Optimization Jobs
- Prepare your optimization project as needed, save it as an *.fsp file, and if needed copy it onto Carbon.
- Open fdtd-solutions on Carbon.
Resource Configuration
- Click Configure resources under the Simulation menu entry.
- Remove all but the "localhost" entry.
- Push Add.
- Double-click the Name column of the new resource entry, enter "Carbon", and press enter.
- Push Edit.
- Enter the advanced options as shown:
Job launching: Custom
mpiexec engine: /opt/apps/lumerical/8.16.931-1/bin/mpiexec-as-job (Copy the directory components from the "FDTD engine" setting below.)
Extra mpiexec command line options: -l nodes=1:ppn=8 -l walltime=2:00:00 --
Suppress any default mpiexec options: yes
Bypass mpi on localhost: no
FDTD engine: (leave default)
Extra FDTD … options: (leave empty)
Create log for all processes: no
- Push OK. The window will close.
Testing the Configuration
- In the top-level Resource Configuration window you may want to push "Run tests". The test may at first report "MPICH tests completed successfully", but the status is likely to change to "Timeout" shortly afterwards.
- Duplicate the "Carbon" resource about 5-8 times.
- Push Save.
You are ready to run the optimization.
Running the optimization
- You'll get windows like the following. There will be several volleys, each producing a "swarm" point in the figure-of-merit trend plot.
- Hints
- You may go back and edit the Advanced options of a resource, but you must remove all other previously cloned entries, and re-clone the newly edited resource.
- When you provide qsub options under "Extra mpiexec command line options", such as to allow for a walltime longer than the default 1 hour, be sure to append "--".
Remote license access
To use Native access (see the "Native" section above), your desktop copy of Lumerical must be configured to connect to Carbon's license server. The steps required depend on the network location of your computer, as follows:
When inside CNM or using VPN
Note: This section does not apply to running fdtd-solutions or fdtd-engine on Carbon itself.
On your desktop or laptop, do the following:
- Verify that you are inside CNM's network
- Open the link http://carbonmgmt/ in your browser, exactly as shown (using an unqualified hostname, i.e., without dots).
- If you get a status page mentioning "Carbon", your configuration is suitable. Otherwise, if you get an error message similar to "server cannot be found":
- Open your computer's "Network" configuration settings. – You may need to have administrator privileges to do so. Request assistance if needed.
- Locate the tab or section for DNS configuration. On Mac, push the "Advanced…" button first.
- Add cnm.anl.gov as a DNS Search Domain and apply the settings.
- Retry the link above.
- Explanation: There are two reasons for the steps above:
- Reconcile our division domain name "nst.anl.gov" with previous Carbon and Lumerical license configurations made for "cnm.anl.gov".
- Accommodate license access from both inside and outside of the Carbon cluster.
- Start the License Configure or simply the main FDTD Solutions application on your machine.
- Note: You may see a window with title "Getting Started" containing a notice titled "Blocked by Outdated Software" from Argonne's Cyber Security Program Office. You may safely close the window. The notice appears because the window is a web page rendered by a browser engine that is compiled into FDTD and is often outdated.
- If you started FDTD Solutions, locate the Configure License menu item, either in the application or Help menu.
- Choose the tab "Floating" in the Configure License window.
- Activate the checkbox: Configure redundant servers.
- Set the Server entries to cmgmt3, cmgmt4, and csched1, exactly as shown, without a domain name part.
- If you used the License Configure app, close it, and start the main FDTD Solutions application.
- In FDTD, choose About FDTD Solutions.
- Compare the License Server string to the entry shown below.
Web blocking
- Problem
When you choose the About Lumerical menu item, an Argonne network blocking page may appear instead of the proper license information if you use an older (or even not-so-old) version of FDTD. This is due to Lumerical using a built-in web browser engine that may be considered outdated by Argonne's network security configuration.
- Workaround
- Close the About window and work normally. Normal operation should not be affected by the blocking.
When outside CNM
- Close any connection to Carbon's login nodes, and Mega.
- Revisit the port forwarding configuration for your SSH client (one-time only).
- Mac and Linux users: Add to ~/.ssh/config, in the section for mega:
Host mega
…
# Lumerical
LocalForward 27011 carbonmgmt:27011
LocalForward 27014 carbonmgmt:27014
- Windows users: See HPC/Network Access/PuTTY Configuration/Accessing Carbon licenses remotely
- Re-open the connection to Mega.
- Install or upgrade the Lumerical application
- Open Lumerical as usual. When asked for the license information, enter the host IP address as follows:
- Start Lumerical. To inspect the license setting, choose About Lumerical from the application menu.
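For a one-off session on Mac or Linux, the same forwarding can also be requested on the ssh command line instead of in ~/.ssh/config. This is a sketch only: it assumes the Host alias mega from the configuration step above resolves on your machine, and the function name forward_lumerical_ports is hypothetical.

```shell
# One-off equivalent of the two LocalForward lines for the Lumerical license ports.
# Each -L option has the form local_port:remote_host:remote_port.
forward_lumerical_ports() {
    ssh -L 27011:carbonmgmt:27011 \
        -L 27014:carbonmgmt:27014 \
        mega
}
```

Call forward_lumerical_ports and keep the resulting session open while Lumerical runs. Use either this or the LocalForward entries, not both, since a second attempt to bind the same local port may fail.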
Advanced: Licensing entry in .ini files
- Each application from Lumerical, Inc. stores the license setting in its own *.ini file, which on Mac and Linux platforms is located in your ~/.config/Lumerical/ directory. The file names are:
FDTD Solutions.ini
INTERCONNECT.ini
Lumerical DEVICE.ini
MODE Solutions.ini
- The license entry is typically at the end of each *.ini file and should read one of the following, mirroring the previous two sections:
…
[license]
type=flex
flexserver\host=27011@cmgmt3:27011@cmgmt4:27011@csched1
or
…
[license]
type=flex
flexserver\host=[email protected]
- You may edit the files manually in a text editor while the respective application is not running. (Otherwise your edits will be overwritten when the application closes and saves state.)
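Before editing, it can be useful to view the current entry without opening the whole file. The following is a minimal sketch for Mac and Linux; the function name show_license is hypothetical, and it assumes the section header is written exactly as [license] on a line of its own.

```shell
# Print the [license] section of a Lumerical .ini file.
# awk switches printing on at the "[license]" header and off at the next "[...]" header.
show_license() {
    awk '/^\[/ { in_section = ($0 == "[license]") } in_section' "$1"
}
```

For example, show_license "$HOME/.config/Lumerical/FDTD Solutions.ini" prints the header and the type and flexserver\host lines if they are present. Keep the quotes, since some of the file names contain spaces.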
Legacy mechanism - Retired July 2013
This was the license mechanism [1] in use before version 8.6. No license tokens for this mechanism remain active.