Environment

Modified: November 8, 2023

Abstract

This section gives a brief overview of commonly used Slurm commands with a selected set of command-line options. It details configuration options for a compute job's working directory, I/O redirection and mail notifications, among others.

Following is a list of commands available to users to interact with the workload management system:

Command   Description
sinfo     Information on cluster partitions and nodes
squeue    Overview of jobs and their states
scontrol  View configuration and states, (un-)suspend jobs
srun      Run an executable as a job (blocks until the job is scheduled)
salloc    Submit an interactive job (blocks until a prompt appears)
sbatch    Submit a job script for batch scheduling
scancel   Cancel a running or pending job

Help & Verbosity

Support documentation for all of the commands above is available via the following command-line options:

  • --usage for a brief overview, for example sinfo --usage
  • --help lists all options, for example srun --help

Complete documentation for all commands is available in the corresponding man pages 1, for example man sbatch. During execution of a command the option -v enables logging, and -vvv increases the verbosity of the log output.
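
For a quick look at how a command interacts with the scheduler, the verbosity options can be combined with any of the commands above; a minimal sketch (the partition name debug is taken from the examples in this section):

# print a short usage summary
» sbatch --usage

# run a job with maximum log verbosity to trace the scheduler interaction
» srun -vvv -p debug -- hostname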

Arguments & Options

In general it is recommended to use a double dash (--) to separate command options for Slurm from the arguments and options of the application launched by the user:

                   ┌─── SLURM resource allocation with options

┌──────────────────┤
sbatch -p debug -N 1 -- root.exe -b /path/to/macro.C
                        ├──────────────────────────┘

                        └─── User application with options
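
The same separator works with the other submission commands; a minimal sketch using srun, where hostname -f stands in for an arbitrary user application with its own option:

» srun -p debug -N 1 -- hostname -f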

Environment Variables

Most of the options to the salloc, srun, and sbatch commands can be set by so-called input environment variables 2. These variables are read during command execution; thus, they are an alternative to command-line options and meta commands (described below). The following example uses the SALLOC_PARTITION environment variable instead of the command option --partition (note that you can use the export command 3 to mark an environment variable as visible to any newly forked child processes):

» SALLOC_PARTITION=debug salloc env | grep SLURM
salloc: Granted job allocation 118
SLURM_WORKING_DIR=/lustre/hpc/vpenso
SLURM_JOB_NODELIST=lxbk0595
SLURM_JOB_PARTITION=debug

As you can see in the example above, Slurm defines another set of output environment variables available during runtime to be consumed by the user applications. These include detailed information about the allocated resources, for example SLURM_CPUS_ON_NODE providing the number of usable CPUs for the application on a particular node. Make sure to read the manual pages for a complete list of input and output environment variables.
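
As a further illustration, the following sketch uses the input environment variable SBATCH_PARTITION (the counterpart of SALLOC_PARTITION for the sbatch command) together with the output variable SLURM_CPUS_ON_NODE mentioned above; the --wrap option turns the quoted command into a minimal batch script:

# submit to the debug partition without using the --partition option
» SBATCH_PARTITION=debug sbatch --wrap 'echo $SLURM_CPUS_ON_NODE'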

Meta Commands

Meta commands are a convenient way to add Slurm configuration options reproducibly to compute jobs submitted with the sbatch command. They allow adding configuration options to the header of a batch script. Slurm interprets comments with the prefix #SBATCH as command-line options:

#!/bin/bash

#SBATCH -J sleep
#SBATCH -o sleep-%j.log

function puts() {
  echo \[$(date +%Y/%m/%dT%H:%M:%S)\] "$@"
}

puts START $(whoami)@$(hostname):$(pwd) \
        $SLURM_CLUSTER_NAME:$SLURM_JOB_PARTITION\
        $SLURM_JOB_NAME-$SLURM_JOB_ID \
        $SLURM_JOB_CPUS_PER_NODE:$SLURM_TASKS_PER_NODE $SLURM_MEM_PER_CPU

sec=${1:-30}
puts Sleep for $sec seconds
sleep $sec

# Propagate the exit status of the last command
state=$?
puts EXIT $state
exit $state

The example above sets the job name sleep and defines the path for I/O redirection. Submit the script and inspect the resulting job as follows:

# submit the job script above
export SLURM_WORKING_DIR=$LUSTRE_HOME
sbatch -- $LUSTRE_HOME/sleep.sh

# list only jobs with the sleep label
squeue -n sleep

# check the path to the output stream
scontrol show job $(squeue -ho %A -n sleep) | grep StdOut
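
Jobs can also be cancelled by name with the scancel command from the table above, for example:

# cancel all running or pending jobs named sleep
» scancel -n sleep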

Working Directory

The working directory 4 is an absolute path in the file system directory tree used to execute your application program during the runtime of your job. Typically you want a working directory on the cluster's shared storage. If not explicitly specified, jobs inherit the submit directory as their working directory.

Your network home-directory is only available on the submit node and not accessible from any compute node.

# change into a directory on the cluster file system
» cd /lustre/$(id -g -n)/$USER && pwd
/lustre/hpc/vpenso

# execute a job and print working directory
» srun -- pwd
/lustre/hpc/vpenso

Not all working groups follow this naming convention for the directory structure; adjust your path accordingly. Executing a job in a non-existing directory falls back to /tmp/:

# executing a job in the user's home-directory
» srun -- pwd
slurmstepd: error: couldn't chdir to `/u/vpenso': []
/tmp

Specify a working directory with a command option:

Option                       Description
-D <path>, --chdir=<path>    Set the working directory of the job; the default is the current working directory.

» export LUSTRE_HOME=/lustre/$(id -g -n)/$USER && echo $LUSTRE_HOME
/lustre/hpc/vpenso

# ...determine your current working directory with the `pwd` command
» srun -D $LUSTRE_HOME -- pwd
/lustre/hpc/vpenso

Variable           Description
SLURM_SUBMIT_DIR   Directory from which a job has been submitted.
SLURM_WORKING_DIR  Directory where the job will be executed.

Within your job's runtime environment, the two Slurm output environment variables above are available to read the directory configuration:

# set the working directory with a Slurm input
# environment variable
» export SLURM_WORKING_DIR=$LUSTRE_HOME

# the job is submitted from the user's home-directory,
# which is not available on the compute nodes
» srun --pty \
        -- /bin/bash -c 'env | grep "SLURM_[A-Z]*_DIR"'
SLURM_SUBMIT_DIR=/u/vpenso
SLURM_WORKING_DIR=/lustre/hpc/vpenso
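
The working directory can also be fixed in the header of a batch script with a meta command; a minimal sketch, assuming the Lustre path used in the examples above:

#!/bin/bash
# set the working directory for the job (example path)
#SBATCH -D /lustre/hpc/vpenso

pwd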

I/O Redirection

Option         Description
-o, --output=  Relative or absolute path to stdout.
-e, --error=   Relative or absolute path to stderr, sent to stdout by default.
-i, --input=   Relative or absolute path to stdin.

[…] --output='%j.log' --error='%j.err.log' []

The path format string may contain the following specifiers:

Specifier  Description
%j or %J   Current jobid or jobid.stepid.
%N, %n     Hostname, and node identifier (numerical) relative to the current job.
%A, %a     Array master job allocation number, and array ID (index) number.
%t         Task identifier (rank) relative to the current job.
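
Combined with the meta commands described above, a minimal sketch of a batch script header using these options together with the %j specifier (the file names are only examples):

#!/bin/bash
# one stdout and one stderr file per job, named by job ID
#SBATCH --output=sleep-%j.log
#SBATCH --error=sleep-%j.err.log

sleep 30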

Mail Notification

Receive notifications by mail:

Option        Description
--mail-type=  Notify on a selection of the following events: BEGIN, END, FAIL, REQUEUE, ALL.
--mail-user=  Set the receiver's email address.

[…] --mail-type=FAIL --mail-type=REQUEUE --mail-user=v.penso@gsi.de []
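
The same options can be placed in a batch script header; a short sketch using the address from the example above, with the events given as a comma-separated list:

#SBATCH --mail-type=FAIL,REQUEUE
#SBATCH --mail-user=v.penso@gsi.de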

Job Name

A job name is part of the metadata associated with a job. It allows users to identify a compute job by information other than the unique job identification number (JOBID). It is very convenient to use job names to distinguish between different applications running at the same time.

Option                        Description
-J <name>, --job-name=<name>  Name of the job (24 characters maximum).

Multiple jobs can share a common name. The job name can also be set by an input environment variable:

Variable         Description
SLURM_JOB_NAME   Interpreted by the srun command.
SBATCH_JOB_NAME  Interpreted by the sbatch command.
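
A brief sketch of both variables in use, referring to the sleep.sh script from the meta-command example above:

# equivalent to srun --job-name=hostname-test ... (hostname-test is just an illustrative label)
» SLURM_JOB_NAME=hostname-test srun -- hostname

# equivalent to the #SBATCH -J sleep meta command in the script
» SBATCH_JOB_NAME=sleep sbatch -- $LUSTRE_HOME/sleep.sh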

In addition, each compute job can have a comment string attached, for example to distinguish individual runs of a repeated execution of the same application with different configurations.

Option                Description
--comment '<string>'  Attach a comment to the job.
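
For example, repeated submissions of the sleep script from above could be labelled run-1, run-2, and so on, matching the squeue listing below:

» sbatch --comment 'run-1' -- $LUSTRE_HOME/sleep.sh
» sbatch --comment 'run-2' -- $LUSTRE_HOME/sleep.sh
» sbatch --comment 'run-3' -- $LUSTRE_HOME/sleep.sh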

The squeue command supports the -n option to limit the list to job allocations with a given name, for example:

# print job ID, name, and comment
» squeue -o '%25A %10j %10k' -n sleep
JOBID                     NAME       COMMENT
1908264                   sleep      run-3
1908262                   sleep      run-1
1908263                   sleep      run-2

Footnotes

  1. Man Page, Wikipedia
    https://en.wikipedia.org/wiki/Man_page

  2. Environment, Bash Reference Manual, GNU Project
    https://www.gnu.org/software/bash/manual/bash.html#Environment

  3. Bourne Shell Builtins, Bash Reference Manual, GNU Project
    https://www.gnu.org/software/bash/manual/bash.html#Bourne-Shell-Builtins

  4. Working Directory, Wikipedia
    https://en.wikipedia.org/wiki/Working_directory