Environment
This section gives a brief overview of commonly used Slurm commands with a selected set of command-line options. It details configuration options for a compute job's working directory, I/O redirection, and mail notifications, among others.
The following commands are available to users to interact with the workload management system:
Command | Description |
---|---|
sinfo | Information on cluster partitions and nodes |
squeue | Overview of jobs and their states |
scontrol | View configuration and states, (un-)suspend jobs |
srun | Run an executable as a job (blocks until the job is scheduled) |
salloc | Submit an interactive job (blocks until the prompt appears) |
sbatch | Submit a job script for batch scheduling |
scancel | Cancel a running or pending job |
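For instance, a first look at the partitions and at your own jobs can be taken directly from the login node (a minimal illustration; partition names and output differ per cluster):
# summary of partitions with aggregated node states
» sinfo -s
# list only your own pending and running jobs
» squeue -u $USER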
Help & Verbosity
Support documentation for all commands above is available with the following command options:
- --usage for a brief overview, for example sinfo --usage
- --help lists all options, for example srun --help
Complete documentation for all commands is available in the corresponding man pages 1, for example man sbatch.
During execution of a command, the option -v enables verbose logging, and -vvv further increases the verbosity of the log output.
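A minimal illustration, using hostname simply as a stand-in application:
# print verbose log messages while the job is launched
» srun -vvv hostname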
Arguments & Options
In general it is recommended to use a double dash (--) to separate options passed to Slurm from the arguments and options of the application launched by the user:
          ┌─── Slurm resource allocation with options
          │
┌─────────┴────────┐
sbatch -p debug -N 1 -- root.exe -b /path/to/macro.C
                        └────────────┬─────────────┘
                                     └─── User application with options
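The same separator works for the other submission commands; a minimal sketch with srun, reusing the illustrative application and macro path from the diagram above:
# options before the double dash go to Slurm,
# everything after it belongs to the user application
» srun -p debug -N 1 -- root.exe -b /path/to/macro.C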
Environment Variables
Most of the options to the salloc, srun, and sbatch commands can be set by so-called input environment variables 2. These variables are read during command execution; thus, they are an alternative to command-line options and meta-commands.
The following example uses the SALLOC_PARTITION environment variable instead of the command option --partition (note that you can use the export 3 command to make an environment variable visible to newly forked child processes):
» SALLOC_PARTITION=debug salloc env | grep SLURM
salloc: Granted job allocation 118
SLURM_WORKING_DIR=/lustre/hpc/vpenso
SLURM_JOB_NODELIST=lxbk0595
SLURM_JOB_PARTITION=debug
As the example above shows, Slurm defines another set of output environment variables that are available during runtime and can be consumed by the user application. These include detailed information about the allocated resources; for example, SLURM_CPUS_ON_NODE provides the number of usable CPUs for the application on a particular node. Make sure to read the manual pages for a complete list of input and output environment variables.
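As a minimal sketch, such an output variable can be inspected from within a job like this:
# print the number of usable CPUs available to the application
» srun -- /bin/bash -c 'echo $SLURM_CPUS_ON_NODE'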
Option | Description |
---|---|
--export | Identifies which environment variables from the submission environment are propagated to the launched application. |
Recommended use of the --export option to set environment variables (see the sketch after this list):
sbatch --export=ALL,VAR=val #...
- By default (option not set) all environment variables are propagated.
- --export=<environment_variables> will only export the variables defined by the option. Note that this prevents the required propagation of environment variables used to launch containers.
- --export=ALL,<environment_variables>: if "ALL" is specified, then all user environment variables will be loaded and will take precedence over any explicitly given environment variables.
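A minimal sketch, assuming a hypothetical job script job.sh that reads a configuration path from a variable APP_CONFIG (both names are placeholders):
# propagate the full submission environment plus one additional variable
» sbatch --export=ALL,APP_CONFIG=/path/to/config -- job.sh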
Meta Commands
Meta commands allow users to set Slurm configuration options in the header of a batch-script. These commands are ignored by the shell and interpreted as comments:
- All meta-commands need to be placed at the top, before any other shell command.
- Prefix a meta-command with #SBATCH, followed by a Slurm option with its argument.
- Multiple options require one meta-command per option.
Consider the following example:
sleep.sh
#!/bin/bash
#SBATCH -J sleep
#SBATCH -o sleep-%j.log
# print a timestamped log line
function puts() {
echo \[$(date +%Y/%m/%dT%H:%M:%S)\] "$@"
}
puts START $(whoami)@$(hostname):$(pwd) \
$SLURM_CLUSTER_NAME:$SLURM_JOB_PARTITION \
$SLURM_JOB_NAME-$SLURM_JOB_ID \
$SLURM_JOB_CPUS_PER_NODE:$SLURM_TASKS_PER_NODE $SLURM_MEM_PER_CPU
sec=${1:-30}
puts Sleep for $sec seconds
sleep $sec
# Capture and propagate the exit status of the last command
state=$?
puts EXIT $state
exit $state
The meta-commands above set the job name sleep and define the file name for I/O redirection:
# submit the job script above
export SLURM_WORKING_DIR=$LUSTRE_HOME
sbatch -- $LUSTRE_HOME/sleep.sh
# list only jobs with the sleep label
squeue -n sleep
# check the path to the output stream
scontrol show job $(squeue -ho %A -n sleep) | grep StdOut
Slurm does not interpolate shell environment variables in meta-commands. It is therefore not possible to use #SBATCH -D $LUSTRE_HOME to configure the working directory. Use a Slurm input environment variable instead, for example SLURM_WORKING_DIR as shown above.
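To make the distinction explicit, a minimal sketch contrasting the two approaches (paths as in the examples above):
# does NOT work: $LUSTRE_HOME is not expanded in a meta-command
#SBATCH -D $LUSTRE_HOME
# works: the variable is expanded by the shell before submission
» export SLURM_WORKING_DIR=$LUSTRE_HOME
» sbatch -- $LUSTRE_HOME/sleep.sh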
Working Directory
The working directory 4 is an absolute path in the file system directory tree used to execute your application program during the runtime of your job. Typically you want to have a working directory on the cluster shared storage. Jobs inherit the submit directory as working directory by default, if it is not explicitly specified.
Your network home-directory is only available on the submit node and not accessible from any compute node.
# change into a directory on the cluster file system
» cd /lustre/$(id -g -n)/$USER && pwd
/lustre/hpc/vpenso
# execute a job and print working directory
» srun -- pwd
/lustre/hpc/vpenso
Not all working groups follow this naming convention for the directory structure; adjust your path accordingly. Executing a job in a non-existing directory falls back to /tmp/:
# executing a job in the users home-directory
» srun -- pwd
slurmstepd: error: couldn't chdir to `/u/vpenso': […]
/tmp
Specify a working directory by command option:
Option | Description |
---|---|
-D <path>, --chdir=<path> | Sets the working directory for jobs; otherwise the default is the current working directory. |
» export LUSTRE_HOME=/lustre/$(id -g -n)/$USER && echo $LUSTRE_HOME
/lustre/hpc/vpenso
# ...determine the job's working directory with the `pwd` command
» srun -D $LUSTRE_HOME -- pwd
/lustre/hpc/vpenso
Variable | Description |
---|---|
SLURM_SUBMIT_DIR | Directory from which a job has been submitted. |
SLURM_WORKING_DIR | Directory where the job will be executed. |
Within your job runtime environment the two Slurm output environment variables above are available to read the directory configuration:
# set the working directory with a Slurm input
# environment variable
» export SLURM_WORKING_DIR=$LUSTRE_HOME
# the job is submitted from the user home-directory
# not available on the compute nodes
» srun --pty \
-- /bin/bash -c 'env | grep "SLURM_[A-Z]*_DIR"'
SLURM_SUBMIT_DIR=/u/vpenso
SLURM_WORKING_DIR=/lustre/hpc/vpenso
I/O Redirection
Option | Description |
---|---|
-o, --output= | Relative or absolute path to stdout. |
-e, --error= | Relative or absolute path to stderr, sent to stdout by default. |
-i, --input= | Relative or absolute path to stdin. |
[…] --output='%j.log' --error='%j.err.log' […]
The path format string may contain the following specifiers:
Specifier | Description |
---|---|
%j or %J | Current jobid or jobid.stepid. |
%N, %n | Hostname, and node identifier (numerical) relative to the current job. |
%A, %a | Array master job allocation number, array ID (index) number. |
%t | Task identifier (rank) relative to the current job. |
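Putting the specifiers to use, a minimal sketch that writes separate per-job output and error files (the file name patterns are just illustrative):
# stdout and stderr of each submission end up in files named by job ID
» sbatch --output='%j.out.log' --error='%j.err.log' -- sleep.sh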
Mail Notification
Receive notifications by mail with the following options:
Option | Description |
---|---|
--mail-type= | Notify at a selection of the following events: BEGIN,END,FAIL,REQUEUE,ALL. |
--mail-user= | Set the receiver's email address. |
[…] --mail-type=FAIL --mail-type=REQUEUE --mail-user=v.penso@gsi.de […]
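A complete submission of the sleep.sh script with mail notification could then look as follows (the address above serves as a placeholder):
# send a mail if the job fails or gets requeued
» sbatch --mail-type=FAIL,REQUEUE --mail-user=v.penso@gsi.de -- sleep.sh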
Job Name
A job name is part of the metadata associated with a job. It allows users to identify a compute job by information other than the unique job identification number (JOBID). It is very convenient to use job names to distinguish between different applications running at the same time.
Option | Description |
---|---|
-J <name>, --job-name=<name> | Name of the job (24 characters maximum). |
Multiple jobs can share a common name. The job name can be set by an input environment variable:
Variable | Description |
---|---|
SLURM_JOB_NAME | Interpreted by the srun command. |
SBATCH_JOB_NAME | Interpreted by the sbatch command. |
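A minimal sketch of setting the name via the input environment variables (the name nap is arbitrary); keep in mind that command-line options take precedence over these variables:
# name a batch job via the environment instead of -J
» SBATCH_JOB_NAME=nap sbatch -- sleep.sh
# the equivalent for an interactive run with srun
» SLURM_JOB_NAME=nap srun -- hostname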
In addition, each compute job can have a comment string attached, to further distinguish individual runs of repeated executions of the same application with different configurations.
Option | Description |
---|---|
--comment '<string>' | Attach a comment to the job. |
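For example, repeated submissions of the sleep.sh script can be tagged with distinct comments (the strings are illustrative and match the squeue output shown below):
# submit the same script several times, each run with its own comment
» sbatch --comment 'run-1' -- sleep.sh
» sbatch --comment 'run-2' -- sleep.sh
» sbatch --comment 'run-3' -- sleep.sh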
The squeue command supports the -n option to limit the list to job allocations with a given name, for example:
# print job ID, name, and comment
» squeue -o '%25A %10j %10k' -n sleep
JOBID NAME COMMENT
1908264 sleep run-3
1908262 sleep run-1
1908263 sleep run-2
Footnotes
1. Man Page, Wikipedia: https://en.wikipedia.org/wiki/Man_page
2. Environment, Bash Reference Manual, GNU Project: https://www.gnu.org/software/bash/manual/bash.html#Environment
3. Bourne Shell Builtins, Bash Reference Manual, GNU Project: https://www.gnu.org/software/bash/manual/bash.html#Bourne-Shell-Builtins
4. Working Directory, Wikipedia: https://en.wikipedia.org/wiki/Working_directory