HPC/Applications/python: Difference between revisions

From CNM Wiki
(80 intermediate revisions by the same user not shown)
{{TOC_right}}
{{TOC_right}}
Within recent years, entire Python software suites became popular,
There are many Python installations on Carbon, each normally prepared for access by a <code>module load  ''modulename''</code> shell command.
usually based on the [https://en.wikipedia.org/wiki/Conda_(package_manager) Conda package manager].
Modules are grouped and named according to the type and scope of the Python installation provided.
The suites typically contain:
Each module gives you access to a Python interpreter and typically
* A Python interpreter.
one or more Python-native packages for <source lang="python" inline>import</source>,
* The conda/pip package management systems.
and to supporting software.
* A wide-ranging set of Python-native packages, extensions, and add-on executables, often including:
 
** IDLE – Python's Integrated Development and Learning Environment
<!-- To select a particular Python installation, use <code>module load ''modulename''</code>. -->
** IPython and Jupyter – Interactive computation system
As with other modules, each <code>''modulename''</code> will look like a Unix file name that has one or more directory components.
** Cython – Compiler for a superset of Python and C/C++
You can (and normally should) use abbreviated module names, with recommended name forms shown below.
* Underlying libraries and supporting binaries, e.g.:
For abbreviated names, the <code>module</code> shell command
** Tcl – Interpreter for the Tool Command Language
will select the most suitable module from the modules that would complete your abbreviated name.
** BLAS/LAPACK – Linear algebra libraries
This will usually be the module with the highest version number or an administrator-designated default.
** MPI – Runtime environment for parallel applications using Message Passing Interface, providing the <code>mpirun</code> or <code>mpiexec</code> commands, plus related binaries and libraries.


<div style="background: #fdd">
What follows are the types of Python installations and their module naming convention.
'''Caveat''': An MPI Runtime within a Python suite makes parallel computing easily accessible under Python,
For the current full list of these modules, run the shell command <code>module avail python</code>.
but this means a "mere python module" can clash with other MPI implementations used on Carbon.
Please contact [mailto:[email protected]?Subject=Python%20on%20Carbon [email protected]] for help to choose an approach.


Avoid loading modules at the same time for both an MPI implementation and a Python suite, especially in your dot-files.
== Python standard distributions ==
Doing so may lead to non-obvious failures with either the bundled parallel Python applications or (worse) unrelated compiled applications on Carbon.
; Contains:
</div>
: A Python interpreter and solely its standard packages (which vary by version), as distributed by the Python Software Foundation at [http://www.python.org python.org].


== Interpreter-only releases from the Python Software Foundation ==
; Module nomenclature:
; Module nomenclature:
: <code>'''python'''/''pyMajor.pyMinor''/''compilerName-compMajor.compMinor''/''pyMajor.pyMinor.pyPatch-moduleBuild''</code>
: <code>'''python'''/''pyMajor.pyMinor''/''compilerName-compMajor.compMinor''/''pyMajor.pyMinor.pyPatch-moduleBuild''</code>
<!-- '''<font color="green">python</font>/<font color="green">''pyMajor.pyMinor''</font>'''/<font color="#c80">''compilerName-compMajor.compMinor''</font>/<font color="#888">''pyMajor.pyMinor.pyPatch''-''carbonBuild''</font> -->
; Examples of full module names:
: <code>python/2.7/gcc-4.1/2.7.3-1</code>
: <code>python/2.7/gcc-4.4/2.7.11-1</code>
: <code>python/3.5/gcc-4.4/3.5.1-1</code>
; Recommended module abbreviations:
: <code>python/2.7</code>
: <code>python/3.5</code>
; When to use:
: Use one of these modules as a base for installing one or a few Python-native packages yourself, as long as the dependencies are not too onerous and performance is not critical. For more complex requirements, choose a [[#Python suites with vendor-distributed package set|vendor-distributed]] or [[#Python suites with customized package set|Carbon-customized Python suite]] as shown below.
{{caution|Packages that were installed under one Python version (or module) usually are not accessible through another version, especially packages that contain binaries like shared libraries and executables.}}
== Python bundles within the OS ==
; Contains:
; Contains:
:* The Python interpreter only, from the [http://www.python.org python.org] sources.
: The Python interpreter that comes with the operating system. This particular installation is required for many system internals and is the one expected by system-provided add-on packages (installed via rpm or yum). Fortunately, system functionality is not affected (except for possible resource exhaustion) when you choose a different Python installation for user application tasks.
<!-- '''<font color="green">python</font>/<font color="green">''pyMajor.pyMinor''</font>'''/<font color="#c80">''compilerName-compMajor.compMinor''</font>/<font color="#888">''pyMajor.pyMinor.pyPatch''-''carbonBuild''</font> -->


== Python interpreters and packages as bundled with the OS ==
; Module nomenclature:
; Module nomenclature:
: <code>'''python-''osname'''''/''pyMajor.pyMinor''/''compilerName-compMajor.compMinor''/''pyMajor.pyMinor.pyPatch''</code>
: <code>'''python-''osname'''''/''pyMajor.pyMinor''/''compilerName-compMajor.compMinor''/''pyMajor.pyMinor.pyPatch''</code>
; Examples of full module names:
: <code>python-centos/2.4/gcc-4.1/2.4.3</code>
: <code>python-centos/2.6/gcc-4.4/2.6.6</code>
: <code>python-centos/2.7/gcc-4.8/2.7.5</code>
; Recommended module abbreviations:
: <code>(none)</code>
; When to use:
: You normally do not need to load these modules. They are included here for visibility in the module lineup, and as a prerequisite for some older modules which provided Python-native packages independently (outside a package manager), and which are usually tied to the version of the Python interpreter under which they were installed.
== Python suites with vendor-distributed package set ==
; Contains:
; Contains:
:* The Python interpreter that comes with the operating system, which is required for many system internals and for system-provided add-on packages (installed via rpm or yum).
: Python software suite with the '''package selection as distributed by the vendor.'''
These modules normally do not need to be loaded.
They are included here for visibility in the module system,
and as a prerequisite for (usually older) modules which provide Python packages outside of Conda or Pip.
Such packages are usually tied to the version of the Python interpreter under which they were installed.


== Vendor-distributed suites ==
; Module nomenclature:
; Module nomenclature:
: <code>'''python-''distributor'''''/''pyMajor.pyMinor''/''distMajor''/''distributor_defined_version[…]-moduleBuild''</code>
: <code>'''python-''distributor'''''/''pyMajor.pyMinor''/''distMajor''/''distributor_defined_version[…]-moduleBuild''</code>
; Examples of full module names:
: <code>python-intel/2.7/2018/2.7.14-2018.1.023-1</code>
: <code>python-intel/3.5/2017/3.5.3-2017.3.052-1</code>
: <code>python-intel/3.6/2018/3.6.3-2018.1.023-1</code>
: <code>python-anaconda/2.7/4/2.7.11-4.0.0-2</code>
; Recommended module abbreviations:
: <code>python-intel/2.7</code>
: <code>python-intel/3.6</code>
: <code style="color:#999;">python-anaconda/2.7</code> <span style="color:#999;">(deprecated)</span>
: <code style="color:#999;">python-anaconda/3.5</code> <span style="color:#999;">(deprecated)</span>
; When to use:
: Use one of these modules for general scientific computing projects, or as a ready-made base for installing your own packages when one of the [[#Python standard distributions]] proves insufficient.
Python-based software suites with a broad scope became popular in recent years.
<!-- with a much broader scope than a mere interpreter and a few native packages -->
They typically contain:
* A Python interpreter.
* The [https://en.wikipedia.org/wiki/Conda_(package_manager) Conda]/[https://pip.pypa.io/en/latest/ pip] package management systems (see [https://packaging.python.org/guides/tool-recommendations/ more context]).
* A wide-ranging set of Python-native packages, extensions, and add-on executables, often including:
** [http://www.numpy.org NumPy]/[https://www.scipy.org/ SciPy] – Ecosystem for scientific computing in Python
** [https://docs.python.org/3/library/idle.html IDLE] – Python's Integrated Development and Learning Environment
** [https://ipython.org IPython] and [https://jupyter.org/ Jupyter] – Interactive computation system
** [http://cython.org Cython] – Compiler for a superset of Python and C/C++
* ''All'' libraries and supporting binaries required by the above, e.g.:
** [http://www.netlib.org/blas/ BLAS]/[http://www.netlib.org/lapack/ LAPACK] – Linear algebra libraries
** [http://www.tcl.tk/ Tcl] – Interpreter for the Tool Command Language
** [http://mpi-forum.org/docs/ MPI] – Runtime environment for parallel applications using Message Passing Interface, including the <code>mpirun</code> or <code>mpiexec</code> launch commands.
The last item can be problematic because libraries and supporting binaries included in a Python suite can interfere with other software modules on Carbon.
For access by users, HPC software typically leverages conventional Unix environment variables
like <code>PATH</code>,  <code>LD_LIBRARY_PATH</code>, and  <code>PYTHONPATH</code>.
These environment variables are interpreted front-to-back, which implies priorities
and can easily lead to resources from one module overshadowing those from another when they use the same name.
In particular, a Python suite with an included MPI runtime makes parallel computing easily accessible under Python,
but this means a "mere python module" can clash with other MPI implementations used on Carbon.
Loading a module for a Python suite that contains MPI and a separate MPI module at the same time can be done but then requires more detail for the MPI launcher in job scripts.
Without that, non-obvious failures will result for either bundled parallel applications within the Python suite or (worse) unrelated compiled applications on Carbon.
{{caution|Avoid loading modules at the same time for both a Python suite with an included MPI and a separate MPI implementation, especially in your dot-files (.bashrc, .modules-el*).}}
Various workarounds for MPI clashes exist:
* Altering the module load order.
* Launching the non-python MPI variant with qualified paths like <code>$FOO_HOME/bin/mpirun</code>.
* If MPI is not needed on the Python side, use a custom virtual environment that has MPI excluded (described in the next section).
== Python suites with customized package set ==
<!-- == Custom Conda environments == -->
; Contains:
; Contains:
:* Python software suite with the '''package selection as distributed by the vendor.'''
: [https://wiki.python.org/moin/IntermediatesGuide#Multiple_isolated_Python_environments Python virtual environment] with a '''package selection customized for Carbon.'''


== Suites with customized package selection ==
<!-- == Custom Conda environments == -->
; Module nomenclature:
; Module nomenclature:
: <code>'''python-env-''distributor'''''/''pyMajor.pyMinor''/''distMajor''/''pyVersion[…]-moduleBuild''</code>
: <code>'''python-env-''distributor'''''/''pyMajor.pyMinor''/''distMajor''/''pyVersion[…]-moduleBuild''</code>
:: A default selection of packages deemed useful for jobs on Carbon.
:: Python with a default selection of packages deemed generally useful on Carbon.
: <code>'''python-env-''distributor''-''purpose'''''/''pyMajor.pyMinor''/''distMajor''/''pyVersion[…]-moduleBuild''</code>
: <code>'''python-env-''distributor''-''purpose'''''/''pyMajor.pyMinor''/''distMajor''/''pyVersion[…]-moduleBuild''</code>
:: Same, with alternative package selections. (TBD)
:: Same, with alternative package selections for specific uses, created on request.
<!-- # or?
<!-- # or?
  '''python-env-''distributor''-Carbon'''/''pyMajor.pyMinor''/''distMajor''/''pyVersion[…]-moduleBuild''
  '''python-env-''distributor''-Carbon'''/''pyMajor.pyMinor''/''distMajor''/''pyVersion[…]-moduleBuild''
-->
-->
; Contains:
 
:* Python software suite with the '''package selection customized for Carbon.'''
; Examples of full module names:
:: This is usually implemented as a [https://conda.io/docs/user-guide/tasks/manage-environments.html Conda "environment"] derived from a vendor-distributed Conda suite.
: <code>python-env-intel/2.7/2018/2.7-01</code>
: <code>python-env-intel/3.5/2017/3.5-01</code>
: <code>python-env-intel/3.6/2018/3.6-01</code>
: <code style="color:#999;">python-env-anaconda/2.7/4/2.7.11-09</code> <span style="color:#999;">(deprecated)</span>
; Recommended module abbreviations:
: <code>python-env-intel/2.7</code>
: <code>python-env-intel/3.6</code>
: <code style="color:#999;">python-env-anaconda/2.7</code> <span style="color:#999;">(deprecated)</span>
: <code style="color:#999;">python-env-anaconda/3.5</code> <span style="color:#999;">(deprecated)</span>
 
; When to use:
: Use one of these modules for projects in domains related to nanoscience on Carbon, like atomistic or photonic modeling.
 
 
Packages from nanoscience-related domains are typically not included in [[#Python suites with vendor-distributed package set|vendor-distributed suites]].
On Carbon, a set of virtual environments is routinely provided for each vendor distribution,
with a growing list of such packages added on and ready to use.
<!-- The virtual environments are derived from one of the vendor-distributed Conda suites described in the previous section. -->
 
To request a custom environment or that packages be added to an existing environment, please contact [mailto:[email protected]?Subject=Python%20virtual%20environment [email protected]].

Latest revision as of 19:24, January 24, 2018

There are many Python installations on Carbon, each normally prepared for access by a module load modulename shell command. Modules are grouped and named according to the type and scope of the Python installation provided. Each module gives you access to a Python interpreter and typically one or more Python-native packages for import, and to supporting software.

As with other modules, each modulename will look like a Unix file name that has one or more directory components. You can (and normally should) use abbreviated module names, with recommended name forms shown below. For abbreviated names, the module shell command will select the most suitable module from the modules that would complete your abbreviated name. This will usually be the module with the highest version number or an administrator-designated default.
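
As a rough illustration of how an abbreviated name might be completed, the sketch below picks, among the full names that extend the abbreviation by whole path components, the one that sorts highest when runs of digits are compared numerically. It is only a model of the behavior described above, not the actual logic of the module command. The function names are hypothetical, the list of available modules is copied from the examples on this page, and administrator-designated defaults are not modeled.

```python
import re

# Full module names taken from the examples on this page.
AVAILABLE = [
    "python/2.7/gcc-4.1/2.7.3-1",
    "python/2.7/gcc-4.4/2.7.11-1",
    "python/3.5/gcc-4.4/3.5.1-1",
    "python-intel/2.7/2018/2.7.14-2018.1.023-1",
    "python-intel/3.6/2018/3.6.3-2018.1.023-1",
]


def version_key(name):
    """Sort key: digit runs compare as integers, other text as strings."""
    return [
        (1, int(tok), "") if tok.isdigit() else (0, 0, tok)
        for tok in re.findall(r"\d+|\D+", name)
    ]


def resolve(abbrev, available):
    """Return the highest-sorting full name completing abbrev, or None."""
    matches = [
        name for name in available
        if name == abbrev or name.startswith(abbrev + "/")
    ]
    return max(matches, key=version_key) if matches else None


print(resolve("python/2.7", AVAILABLE))    # python/2.7/gcc-4.4/2.7.11-1
print(resolve("python-intel", AVAILABLE))  # python-intel/3.6/2018/3.6.3-2018.1.023-1
```

Matching on whole path components keeps the abbreviation python from also selecting python-intel modules, and the numeric comparison ranks 2.7.11 above 2.7.3, which a plain string sort would not.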

The sections below describe the types of Python installations and their module naming conventions. For the current full list of these modules, run the shell command module avail python. Please contact [email protected] for help choosing an approach.

Python standard distributions

Contains
A Python interpreter and solely its standard packages (which vary by version), as distributed by the Python Software Foundation at python.org.
Module nomenclature
python/pyMajor.pyMinor/compilerName-compMajor.compMinor/pyMajor.pyMinor.pyPatch-moduleBuild
Examples of full module names
python/2.7/gcc-4.1/2.7.3-1
python/2.7/gcc-4.4/2.7.11-1
python/3.5/gcc-4.4/3.5.1-1
Recommended module abbreviations
python/2.7
python/3.5
When to use
Use one of these modules as a base for installing one or a few Python-native packages yourself, as long as the dependencies are not too onerous and performance is not critical. For more complex requirements, choose a vendor-distributed or Carbon-customized Python suite as shown below.

Caution: Packages that were installed under one Python version (or module) usually are not accessible through another version, especially packages that contain binaries like shared libraries and executables.
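
The module nomenclature for the standard distributions can be unpacked mechanically, which may help when scripting around module names. The sketch below is a hypothetical helper, not part of any Carbon tooling. The field names follow the nomenclature line in this section, and the assumption that every version field is numeric and the compiler name alphabetic is inferred only from the examples listed here.

```python
import re
from typing import NamedTuple, Optional


class PythonModuleName(NamedTuple):
    py_major: int
    py_minor: int
    compiler_name: str
    comp_major: int
    comp_minor: int
    py_patch: int
    module_build: int


_PATTERN = re.compile(
    r"^python/(\d+)\.(\d+)"          # python/pyMajor.pyMinor
    r"/([A-Za-z]+)-(\d+)\.(\d+)"     # compilerName-compMajor.compMinor
    r"/(\d+)\.(\d+)\.(\d+)-(\d+)$"   # pyMajor.pyMinor.pyPatch-moduleBuild
)


def parse_module_name(name: str) -> Optional[PythonModuleName]:
    """Split a full standard-distribution module name into its fields."""
    match = _PATTERN.match(name)
    if match is None:
        return None
    major, minor, comp, c_major, c_minor, major2, minor2, patch, build = match.groups()
    # The interpreter version appears twice in the name and must agree.
    if (major, minor) != (major2, minor2):
        return None
    return PythonModuleName(
        int(major), int(minor), comp, int(c_major), int(c_minor),
        int(patch), int(build),
    )


info = parse_module_name("python/2.7/gcc-4.4/2.7.11-1")
print(info.compiler_name, info.py_patch, info.module_build)  # gcc 11 1
```

Names from the other module families on this page, such as python-centos/2.6/gcc-4.4/2.6.6, deliberately do not match this pattern and yield None.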

Python bundles within the OS

Contains
The Python interpreter that comes with the operating system. This particular installation is required for many system internals and is the one expected by system-provided add-on packages (installed via rpm or yum). Fortunately, system functionality is not affected (except for possible resource exhaustion) when you choose a different Python installation for user application tasks.
Module nomenclature
python-osname/pyMajor.pyMinor/compilerName-compMajor.compMinor/pyMajor.pyMinor.pyPatch
Examples of full module names
python-centos/2.4/gcc-4.1/2.4.3
python-centos/2.6/gcc-4.4/2.6.6
python-centos/2.7/gcc-4.8/2.7.5
Recommended module abbreviations
(none)
When to use
You normally do not need to load these modules. They are included here for visibility in the module lineup, and as a prerequisite for some older modules which provided Python-native packages independently (outside a package manager), and which are usually tied to the version of the Python interpreter under which they were installed.

Python suites with vendor-distributed package set

Contains
Python software suite with the package selection as distributed by the vendor.
Module nomenclature
python-distributor/pyMajor.pyMinor/distMajor/distributor_defined_version[…]-moduleBuild
Examples of full module names
python-intel/2.7/2018/2.7.14-2018.1.023-1
python-intel/3.5/2017/3.5.3-2017.3.052-1
python-intel/3.6/2018/3.6.3-2018.1.023-1
python-anaconda/2.7/4/2.7.11-4.0.0-2
Recommended module abbreviations
python-intel/2.7
python-intel/3.6
python-anaconda/2.7 (deprecated)
python-anaconda/3.5 (deprecated)
When to use
Use one of these modules for general scientific computing projects, or as a ready-made base for installing your own packages when one of the Python standard distributions proves insufficient.


Python-based software suites with a broad scope became popular in recent years. They typically contain:

  • A Python interpreter.
  • The Conda/pip package management systems (see more context).
  • A wide-ranging set of Python-native packages, extensions, and add-on executables, often including:
    • NumPy/SciPy – Ecosystem for scientific computing in Python
    • IDLE – Python's Integrated Development and Learning Environment
    • IPython and Jupyter – Interactive computation system
    • Cython – Compiler for a superset of Python and C/C++
  • All libraries and supporting binaries required by the above, e.g.:
    • BLAS/LAPACK – Linear algebra libraries
    • Tcl – Interpreter for the Tool Command Language
    • MPI – Runtime environment for parallel applications using Message Passing Interface, including the mpirun or mpiexec launch commands.

The last item can be problematic because libraries and supporting binaries included in a Python suite can interfere with other software modules on Carbon. For access by users, HPC software typically leverages conventional Unix environment variables like PATH, LD_LIBRARY_PATH, and PYTHONPATH. These environment variables are interpreted front-to-back, which implies priorities and can easily lead to resources from one module overshadowing those from another when they use the same name.
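
The front-to-back rule can be demonstrated without touching any real module. The following sketch is an illustration only: it creates two temporary directories that stand in for the bin directories of a Python suite and of a separate MPI module, puts a dummy executable named mpirun in each, and asks Python's shutil.which to search them in both orders. The directory roles and the dummy files are assumptions made for the demonstration, and it presumes the temporary directory allows executable files.

```python
import os
import shutil
import tempfile


def make_fake_tool(directory, name):
    """Create a dummy executable file called name inside directory."""
    path = os.path.join(directory, name)
    with open(path, "w") as handle:
        handle.write("#!/bin/sh\n")
    os.chmod(path, 0o755)
    return path


with tempfile.TemporaryDirectory() as suite_bin, \
        tempfile.TemporaryDirectory() as mpi_bin:
    make_fake_tool(suite_bin, "mpirun")
    make_fake_tool(mpi_bin, "mpirun")

    # Same two directories, listed in opposite orders.
    suite_first = os.pathsep.join([suite_bin, mpi_bin])
    mpi_first = os.pathsep.join([mpi_bin, suite_bin])

    found_a = shutil.which("mpirun", path=suite_first)
    found_b = shutil.which("mpirun", path=mpi_first)

    suite_wins = found_a is not None and os.path.dirname(found_a) == suite_bin
    mpi_wins = found_b is not None and os.path.dirname(found_b) == mpi_bin

print("suite listed first -> suite's mpirun chosen:", suite_wins)
print("MPI module listed first -> its mpirun chosen:", mpi_wins)
```

In both cases the copy in the directory listed first is the one found, and the other copy is silently shadowed.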

In particular, a Python suite with an included MPI runtime makes parallel computing easily accessible under Python, but it also means that a "mere Python module" can clash with other MPI implementations used on Carbon. Loading a module for a Python suite that contains MPI together with a separate MPI module is possible, but job scripts must then identify the intended MPI launcher explicitly. Otherwise, non-obvious failures can result, either for parallel applications bundled within the Python suite or (worse) for unrelated compiled applications on Carbon.

Caution: Avoid loading modules at the same time for both a Python suite with an included MPI and a separate MPI implementation, especially in your dot-files (.bashrc, .modules-el*).

Various workarounds for MPI clashes exist:

  • Altering the module load order.
  • Launching the non-python MPI variant with qualified paths like $FOO_HOME/bin/mpirun.
  • If MPI is not needed on the Python side, use a custom virtual environment that has MPI excluded (described in the next section).
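
A job script can also check for a clash before launching anything. The sketch below is a hypothetical diagnostic, not an existing Carbon utility: providers lists every executable of a given name along a search path, and qualified_launcher builds an explicit path in the style of $FOO_HOME/bin/mpirun, where FOO_HOME is the placeholder used above and the bin/mpirun layout is assumed. The demonstration uses throwaway directories instead of real module trees.

```python
import os
import tempfile


def providers(command, search_path):
    """List every executable file named command along search_path, in order."""
    hits = []
    for directory in search_path.split(os.pathsep):
        candidate = os.path.join(directory, command)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            hits.append(candidate)
    return hits


def qualified_launcher(home_variable, environ, command="mpirun"):
    """Build an explicit launcher path such as $FOO_HOME/bin/mpirun."""
    return os.path.join(environ[home_variable], "bin", command)


# Two throwaway directories standing in for a Python suite and an MPI module.
with tempfile.TemporaryDirectory() as suite_home, \
        tempfile.TemporaryDirectory() as foo_home:
    for home in (suite_home, foo_home):
        os.mkdir(os.path.join(home, "bin"))
        launcher = os.path.join(home, "bin", "mpirun")
        with open(launcher, "w") as handle:
            handle.write("#!/bin/sh\n")
        os.chmod(launcher, 0o755)

    search_path = os.pathsep.join(
        os.path.join(home, "bin") for home in (suite_home, foo_home)
    )
    found = providers("mpirun", search_path)
    explicit = qualified_launcher("FOO_HOME", {"FOO_HOME": foo_home})
    clash = len(found) > 1
    explicit_is_second = clash and explicit == found[1]

print("more than one mpirun on the search path:", clash)
print("qualified path selects the shadowed launcher:", explicit_is_second)
```

In a real job script the search path would come from os.environ["PATH"] and the home variable from whatever the MPI module actually exports, which this sketch does not assume.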

Python suites with customized package set

Contains
Python virtual environment with a package selection customized for Carbon.
Module nomenclature
python-env-distributor/pyMajor.pyMinor/distMajor/pyVersion[…]-moduleBuild
Python with a default selection of packages deemed generally useful on Carbon.
python-env-distributor-purpose/pyMajor.pyMinor/distMajor/pyVersion[…]-moduleBuild
Same, with alternative package selections for specific uses, created on request.
Examples of full module names
python-env-intel/2.7/2018/2.7-01
python-env-intel/3.5/2017/3.5-01
python-env-intel/3.6/2018/3.6-01
python-env-anaconda/2.7/4/2.7.11-09 (deprecated)
Recommended module abbreviations
python-env-intel/2.7
python-env-intel/3.6
python-env-anaconda/2.7 (deprecated)
python-env-anaconda/3.5 (deprecated)
When to use
Use one of these modules for projects in domains related to nanoscience on Carbon, like atomistic or photonic modeling.


Packages from nanoscience-related domains are typically not included in vendor-distributed suites. On Carbon, a set of virtual environments is routinely provided for each vendor distribution, with a growing list of such packages added on and ready to use.
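
Whether a given package is already present in the loaded environment can be checked from Python itself before asking for an addition. This is a minimal sketch using the standard library's importlib.util.find_spec; the helper name is hypothetical and the second package name is a made-up placeholder, not a package known to exist on Carbon.

```python
import importlib.util


def available(package_names):
    """Map each top-level package name to whether it can be imported here."""
    return {
        name: importlib.util.find_spec(name) is not None
        for name in package_names
    }


# "json" ships with Python; the second name is a made-up placeholder.
report = available(["json", "hypothetical_nanoscience_pkg"])
for name, present in report.items():
    print(name, "available" if present else "missing")
```

The check looks up each name without importing it, so it is cheap to run at the top of a job script. It is meant for top-level package names only.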

To request a custom environment or that packages be added to an existing environment, please contact [email protected].