Using Python on the Compute Cluster

Build your own Python environment tailored to your application

Container
Authors

Matteo Dessalvi

Victor Penso

Published

October 13, 2023

Modified

November 8, 2023

Abstract

This article illustrates several ways to use Python on the compute cluster, depending on the requirements of your application. The examples are very basic and only intended to give users the overall picture.

Keywords

Apptainer, Python

This article describes three options for users to configure a custom Python environment to be used with the compute cluster. These options are:

  1. Use of an Apptainer 1 Container
  2. Use of Conda-Forge 2
  3. Use of a Python Virtual Environment

Using a containerized Python is recommended if no dependencies other than the Python interpreter and its related ecosystem are required. Note that custom containers do not provide any software available from the virtual application environments. If you are interested in more detail than described in this article, we recommend the corresponding material from HPC Carpentry for Python 3 and the Python HPC Community 4.

Apptainer Container

Depending on the target hardware, different container images are recommended:

  • For CPU workloads JupyterLab 5 containers are the best choice in terms of options available, e.g. frameworks like TensorFlow, SciPy, Spark or languages like Julia or R. Please refer to the official documentation 6 to see which container is best suited for your application.
  • For GPU workloads it is preferable to use container images provided by either the AMD Infinity Hub 7 or the NVIDIA Registry 8 for the respective hardware.

A simple array example which uses NumPy:

#!/usr/bin/env python

import numpy as np

array = np.array([
    [3, 7, 1],
    [10, 3, 2],
    [5, 6, 7]
])
print(array)
print()

# Sort the whole array
print(np.sort(array, axis=None))

# Sort along each row
print(np.sort(array, axis=1))

# Sort along each column
print(np.sort(array, axis=0))
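To make the meaning of the three axis arguments concrete, the following sketch reproduces the same three sorts with plain Python lists. It is only an illustration of what the NumPy calls above compute, not how NumPy implements them; the helper names sort_flat, sort_rows and sort_columns are made up for this example.

```python
# Plain-Python equivalent of the three np.sort calls (illustration only).
rows = [
    [3, 7, 1],
    [10, 3, 2],
    [5, 6, 7],
]


def sort_flat(matrix):
    """Like np.sort(array, axis=None): flatten, then sort all values."""
    return sorted(value for row in matrix for value in row)


def sort_rows(matrix):
    """Like np.sort(array, axis=1): sort the values inside each row."""
    return [sorted(row) for row in matrix]


def sort_columns(matrix):
    """Like np.sort(array, axis=0): sort the values inside each column."""
    sorted_columns = [sorted(column) for column in zip(*matrix)]
    # Transpose back so the result keeps the original row/column layout.
    return [list(row) for row in zip(*sorted_columns)]


if __name__ == "__main__":
    print(sort_flat(rows))     # [1, 2, 3, 3, 5, 6, 7, 7, 10]
    print(sort_rows(rows))     # [[1, 3, 7], [2, 3, 10], [5, 6, 7]]
    print(sort_columns(rows))  # [[3, 3, 1], [5, 6, 2], [10, 7, 7]]
```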

Pull the JupyterLab data-science container from Docker Hub:

apptainer pull docker://jupyter/datascience-notebook:latest

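Test the container interactively with the previous script, saved as numpy_array.py. Note that apptainer pull names the image file after both the image and its tag:

apptainer exec datascience-notebook_latest.sif python numpy_array.py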

Refer to the container section in the Virgo User Guide for instructions on how to submit a container as a batch job on the compute cluster.
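When testing a container it can help to confirm which interpreter actually runs the script. The sketch below prints the Python version, the interpreter path and a hint whether the process runs inside a container. It assumes that Apptainer exports the APPTAINER_CONTAINER variable inside a running container (older Singularity releases use SINGULARITY_CONTAINER); the function name describe_runtime is hypothetical.

```python
import os
import platform
import sys


def describe_runtime(environ=None):
    """Summarise the interpreter and any container hint in the environment."""
    env = os.environ if environ is None else environ
    # Assumption: Apptainer sets APPTAINER_CONTAINER inside a container,
    # older Singularity releases set SINGULARITY_CONTAINER instead.
    image = env.get("APPTAINER_CONTAINER") or env.get("SINGULARITY_CONTAINER")
    return {
        "python_version": platform.python_version(),
        "executable": sys.executable,
        "in_container": image is not None,
        "container_image": image,
    }


if __name__ == "__main__":
    for key, value in describe_runtime().items():
        print(f"{key}: {value}")
```

Run it once directly on a login node and once through apptainer exec, in the same way as the NumPy script, and compare the reported interpreter paths.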

Conda-Forge

As an alternative to a container, it is possible to install a standalone Python distribution using Conda-Forge, a community-driven package channel for Conda. Note that Anaconda and Miniconda should not be used on the cluster infrastructure due to their licence terms!

Anaconda Inc. Licence Terms of Service

At the end of 2020 Anaconda Inc. updated its licence terms of service 9. The change is targeted primarily towards commercial companies. Unfortunately, despite being a scientific institute, GSI does not qualify for a non-commercial licence. As a consequence:

Use of the default Anaconda package channels is not allowed!

The terms of service change does not apply to Conda-Forge, nor to other channels hosted on anaconda.org 10. It applies only to the default channel and other software hosted on repo.anaconda.com 11. We kindly ask, and strongly recommend, that you use Conda-Forge for your Python-based projects.
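One way to check a channel configuration against this rule is sketched below. The rule itself (flag the defaults channel and anything served from repo.anaconda.com) follows the paragraph above. The helper names are hypothetical, and the exact JSON layout returned by conda config --show channels --json is an assumption that should be verified against your Conda version.

```python
import json
import shutil
import subprocess

# Channels affected by the terms of service, as described in the text above.
BLOCKED_NAME = "defaults"
BLOCKED_HOST = "repo.anaconda.com"


def blocked_channels(channels):
    """Return the entries of channels that must not be used on the cluster."""
    blocked = []
    for channel in channels:
        name = channel.strip().rstrip("/").lower()
        if name == BLOCKED_NAME or BLOCKED_HOST in name:
            blocked.append(channel)
    return blocked


def configured_channels():
    """Ask conda for its channel list; return [] if conda is not on PATH."""
    if shutil.which("conda") is None:
        return []
    # Assumption: the JSON output contains a top-level "channels" list.
    result = subprocess.run(
        ["conda", "config", "--show", "channels", "--json"],
        capture_output=True, text=True, check=True,
    )
    return json.loads(result.stdout).get("channels", [])


if __name__ == "__main__":
    sample = ["conda-forge", "defaults", "https://repo.anaconda.com/pkgs/main"]
    print(blocked_channels(sample))
    # ['defaults', 'https://repo.anaconda.com/pkgs/main']
```

With Conda loaded in the shell, blocked_channels(configured_channels()) should return an empty list for a Miniforge installation that only uses Conda-Forge.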

Installation

Download the latest version of Miniforge 12 and perform the installation, as shown below. The example stores the installation prefix in the APP_DIR variable and passes it to the installer with the -p option; if no prefix is specified, Miniforge is installed in your home directory by default:

# Download the installer...
curl -OL https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-Linux-x86_64.sh

# ...set a prefix variable (use $HOME, since a quoted ~ is not expanded by the shell)
APP_DIR="$HOME/opt/conda"
# ...run the installer in batch mode (-b), passing the prefix as option (-p)
/bin/sh Miniforge3-Linux-x86_64.sh -b -p "${APP_DIR}"

# ...load the installation into your shell environment
source "${APP_DIR}/etc/profile.d/conda.sh"

The source command makes the conda command available in your shell. Use the conda command to manage the environment…

# ...activate the base environment; the shell prompt then shows (base) as prefix
conda activate

# ...print information about Conda
conda info

# ...remove Conda from your shell environment
conda deactivate

Usage

Search and install Python packages…

# ...search for packages
conda search $package_name

# ...install a package with a specified version and build string...
conda install pytorch=2.0.0=cpu_py39he4d1dc0_0
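The package argument in the install command above has the form name=version=build. As a rough illustration of how such a spec breaks down, the sketch below splits it into its parts. It deliberately handles only this simple single-equals form; Conda's real match syntax is richer (for example == or version ranges), and the names PackageSpec and parse_spec are hypothetical.

```python
from typing import NamedTuple, Optional


class PackageSpec(NamedTuple):
    name: str
    version: Optional[str]
    build: Optional[str]


def parse_spec(spec: str) -> PackageSpec:
    """Split 'name[=version[=build]]' into its parts."""
    parts = spec.strip().split("=")
    # Reject empty fields (e.g. 'numpy==1.26') and specs with too many parts.
    if len(parts) > 3 or any(not part for part in parts):
        raise ValueError(f"unsupported package spec: {spec!r}")
    name = parts[0]
    version = parts[1] if len(parts) > 1 else None
    build = parts[2] if len(parts) > 2 else None
    return PackageSpec(name, version, build)


if __name__ == "__main__":
    print(parse_spec("pytorch=2.0.0=cpu_py39he4d1dc0_0"))
    print(parse_spec("numpy"))
```

Pinning the build string in addition to the version selects one specific build of the package, which here appears to be a CPU-only build for Python 3.9.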

Work with environments…

# create a new environment with basic options
conda create -n $env_name

# switch to the newly created environment
conda activate $env_name

# install packages within the environment
conda install $package_name

# list installed packages
conda list

# remove an environment
conda env remove -n $env_name

Virtual Environments

Python virtual environments are useful if you do not want to maintain a complete Python installation yourself, but rather a small subset of Python packages. The virtual application environments provide Python as a loadable Spack package:

# Load the Python interpreter...
spack load python target=$(spack arch -t)

# ...and create a virtual environment
python -m venv myEnv
  • Activate the environment: source myEnv/bin/activate (the environment can be deactivated simply with: deactivate)
  • Additional modules can be installed with pip install.

Note that python -m venv myEnv creates the environment relative to the current working directory. If you run it from your Linux home directory (/u/$USER), the environment ends up in $HOME/myEnv. This poses a problem when a job is launched via Slurm, since the home directory is not available on the batch farm.
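The target directory can also be chosen explicitly from Python with the standard-library venv module. The sketch below is a minimal example under stated assumptions: create_env and in_virtual_env are hypothetical helper names, and the temporary directory merely stands in for a location that the batch nodes can reach.

```python
import sys
import tempfile
import venv
from pathlib import Path


def create_env(base_dir, name="myEnv"):
    """Create a virtual environment at base_dir/name and return its path."""
    env_dir = Path(base_dir).expanduser().resolve() / name
    # with_pip=False keeps this sketch fast and offline; pass with_pip=True
    # if pip should be available inside the new environment.
    venv.create(env_dir, with_pip=False)
    return env_dir


def in_virtual_env():
    """True when the running interpreter belongs to a virtual environment."""
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    # Placeholder location; replace it with a directory visible to the jobs.
    with tempfile.TemporaryDirectory() as scratch:
        env_dir = create_env(scratch)
        print("created:", env_dir)
        print("pyvenv.cfg present:", (env_dir / "pyvenv.cfg").is_file())
    print("running inside a virtual environment:", in_virtual_env())
```

Calling in_virtual_env() at the start of a job script is a cheap way to notice that the environment was not activated before the job started.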

If you decide to use a directory on Lustre to deploy Python virtual environments, we kindly ask you to avoid having dozens of “big” frameworks installed (e.g. TensorFlow, PyTorch, etc.) at the same time. This would affect the performance of the shared storage for all users. Please read the Lustre best practices section in the User Guide for more details. Consider using containers as described above instead.
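To get a feeling for how much an environment adds to the shared storage, the following sketch counts the files and bytes below a directory. It assumes that file count and total size are useful indicators here, which this article does not state explicitly; the function name footprint is hypothetical, and the demo runs on a small temporary directory rather than a real environment.

```python
import os
import tempfile
from pathlib import Path


def footprint(directory):
    """Return (file_count, total_bytes) for everything below directory."""
    file_count = 0
    total_bytes = 0
    for root, _dirs, files in os.walk(directory):
        for filename in files:
            path = Path(root) / filename
            # Skip symlinks so linked files are not counted as copies.
            if path.is_symlink():
                continue
            file_count += 1
            total_bytes += path.stat().st_size
    return file_count, total_bytes


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as demo:
        (Path(demo) / "pkg").mkdir()
        (Path(demo) / "pkg" / "module.py").write_bytes(b"x = 1\n")
        (Path(demo) / "data.bin").write_bytes(b"\0" * 1024)
        print(footprint(demo))  # (2, 1030)
```

Pointing footprint at an environment directory before and after a pip install shows how many files a single framework contributes.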

Footnotes

  1. Apptainer Website
    https://apptainer.org/

  2. Conda Forge Website
    https://conda-forge.org

  3. HPC Carpentry for Python
    https://www.hpc-carpentry.org/hpc-python

  4. Python for HPC: Community Materials
    https://betterscientificsoftware.github.io/python-for-hpc/python-for-hpc

  5. JupyterLab Docker Stack, GitHub
    https://github.com/jupyter/docker-stacks

  6. Selecting an Image, JupyterLab Docker Stack Documentation
    https://jupyter-docker-stacks.readthedocs.io/en/latest/using/selecting.html

  7. AMD Infinity Hub
    https://www.amd.com/en/technologies/infinity-hub

  8. NVIDIA Container Registry
    https://catalog.ngc.nvidia.com/collections

  9. Package Distribution and the anaconda.com Terms of Service, Conda-Forge Blog
    https://conda-forge.org/blog/posts/2020-11-20-anaconda-tos

  10. Anaconda Inc. Website
    https://anaconda.org

  11. Anaconda Packages, Anaconda Inc.
    https://repo.anaconda.com

  12. Miniforge, GitHub
    https://github.com/conda-forge/miniforge