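This article describes three options for users to configure a custom Python environment to be used with the compute cluster. These options are:
- an Apptainer 1 container,
- a standalone Conda-Forge 2 installation,
- a Python virtual environment.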
Using a containerized Python is recommended if no dependencies other than the Python interpreter and its related ecosystem are required. Note that custom containers do not provide any of the software available from the virtual application environments. If you are interested in more details than described in this article, we recommend the corresponding material from HPC Carpentry for Python 3 and the Python HPC Community 4.
Apptainer Container
Depending on the target hardware, different container images are recommended:
- For CPU workloads, the JupyterLab 5 containers are the best choice in terms of available options, e.g. frameworks like TensorFlow, SciPy or Spark, or languages like Julia or R. Please refer to the official documentation 6 to see which container is best suited for your application.
- For GPU workloads, it is preferable to use container images provided by either the AMD Infinity Hub 7 or the NVIDIA Registry 8, depending on the respective hardware.
A simple array example which uses NumPy, saved as numpy_array.py:
#!/usr/bin/env python
import numpy as np
array = np.array([
    [3, 7, 1],
    [10, 3, 2],
    [5, 6, 7]
])
print(array)
print()
# Sort the whole array
print(np.sort(array, axis=None))
# Sort along each row
print(np.sort(array, axis=1))
# Sort along each column
print(np.sort(array, axis=0))
Pull the JupyterLab data-science container from Docker Hub:
apptainer pull docker://jupyter/datascience-notebook:latest
Test the container interactively with the previous script:
apptainer exec datascience-notebook_latest.sif python numpy_array.py
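To confirm which interpreter actually runs inside the container, a small helper script can be executed the same way as the NumPy example. This is only a sketch: the file name check_env.py is hypothetical, and the script merely reports whether NumPy is importable, without importing it.

```python
#!/usr/bin/env python
"""Report which Python interpreter runs this script and whether NumPy is importable.

Hypothetical helper (check_env.py); run it inside the container, e.g.
    apptainer exec <image>.sif python check_env.py
"""
import importlib.util
import platform
import sys


def describe_interpreter() -> dict:
    """Collect basic facts about the running interpreter."""
    return {
        "executable": sys.executable,
        "version": platform.python_version(),
        # find_spec() returns None if the module cannot be found
        "numpy_available": importlib.util.find_spec("numpy") is not None,
    }


if __name__ == "__main__":
    for key, value in describe_interpreter().items():
        print(f"{key}: {value}")
```

If the printed executable lies inside the container image rather than on the host, the container's Python is being used.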
Refer to the container section in the Virgo User Guide for instructions on how to submit a container as a batch job on the compute cluster.
Conda-Forge
As an alternative to a container, you can set up a standalone Python installation using Conda-Forge, a community-led collection of packages for the conda package manager. Note that Anaconda and Miniconda should not be used on the cluster infrastructure due to the Anaconda licence!
At the end of 2020, Anaconda Inc. updated its terms of service 9. The change primarily targets commercial companies. Unfortunately, despite being a scientific institute, GSI does not qualify for a non-commercial licence. As a consequence:
Use of the default Anaconda package channels is not allowed!
The terms of service change does not apply to Conda-Forge, nor to other channels hosted on anaconda.org 10. It applies only to the default channel and other software hosted on repo.anaconda.com 11. We kindly ask you, and strongly recommend, to use Conda-Forge for your Python-based projects.
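The policy above boils down to a simple rule about channel names. The following hypothetical helper (not a conda feature) flags the default channel and any channel URL on repo.anaconda.com in a list of configured channels, for example as copied from the output of `conda config --show channels`:

```python
"""Flag conda channels that fall under the Anaconda terms of service.

Sketch only: the rule below (the "defaults" channel and anything served from
repo.anaconda.com) mirrors the policy described in this article; it is not an
official conda check.
"""
from typing import List

RESTRICTED_HOST = "repo.anaconda.com"


def restricted_channels(channels: List[str]) -> List[str]:
    """Return the channels from *channels* that must not be used."""
    flagged = []
    for channel in channels:
        name = channel.strip()
        if name == "defaults" or RESTRICTED_HOST in name:
            flagged.append(name)
    return flagged


if __name__ == "__main__":
    # Example input, e.g. copied from `conda config --show channels`
    configured = ["conda-forge", "defaults"]
    print(restricted_channels(configured))  # ['defaults']
```

An empty result means only permitted channels such as conda-forge are configured.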
Installation
Download the latest version of Miniforge 12 and perform the installation as shown below. If no prefix is passed to the installer (here via the APP_DIR variable), Miniforge is installed in your home directory by default:
# Download the installer...
curl -OL https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-Linux-x86_64.sh
# ...set a prefix environment variable
APP_DIR="$HOME/opt/conda"  # a "~" inside quotes would not be expanded by the shell
# ...run the installer in batch mode (-b), passing the prefix (-p)
/bin/sh Miniforge3-Linux-x86_64.sh -b -p "${APP_DIR}"
# ...load the installation into your shell environment
source ${APP_DIR}/etc/profile.d/conda.sh
The source command adds the conda executable to the $PATH environment variable. Use the conda command to manage the environment…
# ...activate the base environment; your shell prompt shows (base) as prefix
conda activate
# ...print information about Conda
conda info
# ...deactivate the environment again
conda deactivate
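Scripts or job wrappers sometimes need to know whether conda activate has been run. As a sketch, assuming the usual behaviour that conda activate exports CONDA_PREFIX (the environment path) and CONDA_DEFAULT_ENV (its name), a Python check could look like this; the helper name active_conda_env is hypothetical:

```python
"""Report whether a conda environment is active in the current shell.

Assumption: `conda activate` exports CONDA_PREFIX (path of the active
environment) and CONDA_DEFAULT_ENV (its name).
"""
import os
from typing import Mapping, Optional


def active_conda_env(environ: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the name of the active conda environment, or None."""
    if not environ.get("CONDA_PREFIX"):
        return None
    # Fall back to the prefix path if the name variable is missing
    return environ.get("CONDA_DEFAULT_ENV") or environ["CONDA_PREFIX"]


if __name__ == "__main__":
    env = active_conda_env()
    print(f"active conda environment: {env}" if env else "no conda environment active")
```

Passing the environment as a mapping keeps the function easy to test without touching the real shell.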
Usage
Search for and install Python packages…
# ...search for packages
conda search $package_name
# ...install a package with a specified version and build string...
conda install pytorch=2.0.0=cpu_py39he4d1dc0_0
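The argument to conda install above follows the pattern name=version=build. As a simplified sketch, the three fields can be split as shown below. conda's real match-spec syntax is richer (operators such as >= and ==, wildcards, channel prefixes), and the hypothetical parse_spec helper handles none of that:

```python
"""Split a conda package spec of the form name[=version[=build]].

Simplified sketch: operators (>=, ==), wildcards and channel prefixes
accepted by conda are not supported and raise ValueError or pass through.
"""
from typing import NamedTuple, Optional


class PackageSpec(NamedTuple):
    name: str
    version: Optional[str]
    build: Optional[str]


def parse_spec(spec: str) -> PackageSpec:
    """Parse 'name', 'name=version' or 'name=version=build'."""
    parts = spec.split("=")
    # Reject too many fields and empty fields (e.g. from 'name==1.0')
    if len(parts) > 3 or any(not part for part in parts):
        raise ValueError(f"unsupported spec: {spec!r}")
    version = parts[1] if len(parts) > 1 else None
    build = parts[2] if len(parts) > 2 else None
    return PackageSpec(parts[0], version, build)


if __name__ == "__main__":
    print(parse_spec("pytorch=2.0.0=cpu_py39he4d1dc0_0"))
    # PackageSpec(name='pytorch', version='2.0.0', build='cpu_py39he4d1dc0_0')
```

The available versions and build strings for a package are what conda search lists.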
Work with environments…
# create a new environment with basic options
conda create -n $env_name
# switch to the newly created environment
conda activate $env_name
# install packages within the environment
conda install $package_name
# list installed packages
conda list
# remove an environment
conda env remove -n $env_name
Virtual Environments
Python virtual environments are useful if you do not want to maintain a complete Python installation yourself, but rather a small subset of Python packages. The virtual application environments provide Python as a loadable Spack package:
# Load the Python interpreter...
spack load python target=$(spack arch -t)
# ...and create a virtual environment
python -m venv myEnv
- Activate the environment: source myEnv/bin/activate (deactivate it again with: deactivate)
- Install additional modules with: pip install $package_name
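From inside Python you can check whether the virtual environment is actually in use: within a venv, sys.prefix points to the environment directory while sys.base_prefix still points to the base interpreter (here, the Spack-provided Python). The sketch below combines this with a lookup of an installed package version; both helper names are hypothetical, and numpy is only an example:

```python
"""Check whether a virtual environment is active and query package versions."""
import sys
from importlib import metadata
from typing import Optional


def in_virtualenv() -> bool:
    """True if the interpreter runs inside a venv."""
    return sys.prefix != sys.base_prefix


def installed_version(distribution: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    print("virtual environment active:", in_virtualenv())
    print("environment prefix:", sys.prefix)
    # 'numpy' is only an example distribution name
    print("numpy version:", installed_version("numpy"))
```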
Note that python -m venv myEnv creates the environment relative to the current working directory. If you run it from your home directory, it ends up in $HOME/myEnv, i.e. in your Linux home directory (/u/$USER). This poses a problem when a job is launched via Slurm, since the home directory is not available on the batch farm.
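Since the home directory is not available on the batch farm, it helps to create the environment at an explicit, absolute path on a file system that jobs can reach. The standard-library venv module offers the same functionality as python -m venv; the following sketch (hypothetical create_env helper, target directory taken from the command line) could be run with the Spack-provided interpreter:

```python
"""Create a virtual environment at an explicit location outside $HOME.

Sketch: the target directory is passed on the command line; choose a path
that is visible from the batch farm.
"""
import sys
import venv
from pathlib import Path


def create_env(target: Path, with_pip: bool = True) -> Path:
    """Create a venv at *target* and return the path of its activate script."""
    target = target.expanduser().resolve()
    if target.exists():
        raise FileExistsError(f"{target} already exists")
    venv.create(target, with_pip=with_pip)
    return target / "bin" / "activate"


if __name__ == "__main__":
    if len(sys.argv) != 2:
        sys.exit(f"usage: {sys.argv[0]} TARGET_DIR")
    activate = create_env(Path(sys.argv[1]))
    print(f"created; activate with: source {activate}")
```

Refusing to overwrite an existing directory avoids clobbering an environment that running jobs may still use.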
If you decide to deploy Python virtual environments in a directory on Lustre, we kindly ask you to avoid having dozens of "big" frameworks (e.g. TensorFlow, PyTorch) installed at the same time. This affects the performance of the shared storage for all users. Please read the Lustre best practices section in the User Guide for more details. Consider using containers as described above instead.
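Before deploying a large framework into a shared directory, it can be useful to see how many files and bytes an environment actually contains. The following rough sketch (hypothetical tree_footprint helper) walks a directory tree with os.walk, skips symbolic links and reports the totals:

```python
"""Summarise how many files and bytes a directory tree (e.g. a venv) holds."""
import os
import sys
from pathlib import Path
from typing import Tuple


def tree_footprint(root: Path) -> Tuple[int, int]:
    """Return (number_of_files, total_bytes) below *root*."""
    n_files = 0
    n_bytes = 0
    # os.walk does not descend into symlinked directories by default
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue  # skip symbolic links; count regular files only
            n_files += 1
            n_bytes += os.path.getsize(path)
    return n_files, n_bytes


if __name__ == "__main__":
    if len(sys.argv) != 2:
        sys.exit(f"usage: {sys.argv[0]} DIRECTORY")
    files, size = tree_footprint(Path(sys.argv[1]))
    print(f"{files} files, {size / 2**20:.1f} MiB")
```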
Footnotes
1. Apptainer Website, https://apptainer.org/
2. Conda Forge Website, https://conda-forge.org
3. HPC Carpentry for Python, https://www.hpc-carpentry.org/hpc-python
4. Python for HPC: Community Materials, https://betterscientificsoftware.github.io/python-for-hpc/python-for-hpc
5. JupyterLab Docker Stack, GitHub, https://github.com/jupyter/docker-stacks
6. Selecting an Image, JupyterLab Docker Stack Documentation, https://jupyter-docker-stacks.readthedocs.io/en/latest/using/selecting.html
7. AMD Infinity Hub, https://www.amd.com/en/technologies/infinity-hub
8. NVIDIA Container Registry, https://catalog.ngc.nvidia.com/collections
9. Package Distribution and the anaconda.com Terms of Service, Conda-Forge Blog, https://conda-forge.org/blog/posts/2020-11-20-anaconda-tos
10. Anaconda Inc. Website, https://anaconda.org
11. Anaconda Packages, Anaconda Inc., https://repo.anaconda.com
12. Miniforge, GitHub, https://github.com/conda-forge/miniforge