Environment

Modified: November 8, 2023

Abstract

This section gives a brief overview of commonly used Slurm commands with a selected set of command-line options. It details configuration options for a compute job's working directory, I/O redirection and mail notifications, among others.

Following is a list of commands available to users to interact with the workload management system:

Command   Description
sinfo     Information on cluster partitions and nodes
squeue    Overview of jobs and their states
scontrol  View configuration and states, (un-)suspend jobs
srun      Run an executable as a job (blocks until the job is scheduled)
salloc    Submit an interactive job (blocks until a prompt appears)
sbatch    Submit a job script for batch scheduling
scancel   Cancel a running or pending job

Help & Verbosity

Support documentation for all of the commands above is available via the following command-line options:

  • --usage for a brief overview, for example sinfo --usage
  • --help lists all options, for example srun --help

Complete documentation for all commands is available in the corresponding man pages 1, for example man sbatch. During execution of a command the option -v enables logging, and -vvv increases the verbosity of the log output.
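
For a quick look at how a command interacts with the scheduler, the verbosity options can be combined with any of the commands above; a minimal sketch (the partition name debug is taken from the examples in this section):

# print a short usage summary
» sbatch --usage

# run a job with maximum log verbosity to trace the scheduler interaction
» srun -vvv -p debug -- hostname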

Arguments & Options

In general it is recommended to use a double dash (--) to separate command options for Slurm from the arguments and options of the application launched by the user:

                   ┌─── SLURM resource allocation with options

┌──────────────────┤
sbatch -p debug -N 1 -- root.exe -b /path/to/macro.C
                        ├──────────────────────────┘

                        └─── User application with options
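
The same separator works with the other submission commands; a minimal sketch using srun, where hostname -f stands in for an arbitrary user application with its own option:

» srun -p debug -N 1 -- hostname -f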

Environment Variables

Most of the options to the salloc, srun, and sbatch commands can be set by so-called input environment variables 2. These variables are read during command execution; thus, they are an alternative to command-line options and meta commands (described below). The following example uses the SALLOC_PARTITION environment variable instead of the command option --partition (note that you can use the export command 3 to mark an environment variable as visible to any newly forked child processes):

» SALLOC_PARTITION=debug salloc env | grep SLURM
salloc: Granted job allocation 118
SLURM_WORKING_DIR=/lustre/hpc/vpenso
SLURM_JOB_NODELIST=lxbk0595
SLURM_JOB_PARTITION=debug

As you can see in the example above, Slurm defines another set of output environment variables available during runtime to be consumed by the user applications. These include detailed information about the allocated resources, for example SLURM_CPUS_ON_NODE providing the number of usable CPUs for the application on a particular node. Make sure to read the manual pages for a complete list of input and output environment variables.
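
As a further illustration, the following sketch uses the input environment variable SBATCH_PARTITION (the counterpart of SALLOC_PARTITION for the sbatch command) together with the output variable SLURM_CPUS_ON_NODE mentioned above; the --wrap option turns the quoted command into a minimal batch script:

# submit to the debug partition without using the --partition option
» SBATCH_PARTITION=debug sbatch --wrap 'echo $SLURM_CPUS_ON_NODE'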

Meta Commands

Meta commands are a convenient way to add Slurm configuration options reproducibly to compute jobs submitted with the sbatch command. They allow adding configuration options to the header of a batch script. Slurm interprets comments with the prefix #SBATCH as command-line options:

#!/bin/bash

#SBATCH -J sleep
#SBATCH -o sleep-%j.log

function puts() {
  echo \[$(date +%Y/%m/%dT%H:%M:%S)\] "$@"
}

puts START $(whoami)@$(hostname):$(pwd) \
        $SLURM_CLUSTER_NAME:$SLURM_JOB_PARTITION\
        $SLURM_JOB_NAME-$SLURM_JOB_ID \
        $SLURM_JOB_CPUS_PER_NODE:$SLURM_TASKS_PER_NODE $SLURM_MEM_PER_CPU

sec=${1:-30}
puts Sleep for $sec seconds
sleep $sec

# Propagate the exit status of the last command
state=$?
puts EXIT $state
exit $state

The example above sets the job name sleep and defines the path for I/O redirection. Submit the script and inspect the resulting job as follows:

# submit the job script above
export SLURM_WORKING_DIR=$LUSTRE_HOME
sbatch -- $LUSTRE_HOME/sleep.sh

# list only jobs with the sleep label
squeue -n sleep

# check the path to the output stream
scontrol show job $(squeue -ho %A -n sleep) | grep StdOut
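
Jobs can also be cancelled by name with the scancel command from the table above, for example:

# cancel all running or pending jobs named sleep
» scancel -n sleep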

Working Directory

The working directory 4 is an absolute path in the file system directory tree used to execute your application program during the runtime of your job. Typically you want a working directory on the cluster's shared storage. If not explicitly specified, jobs inherit the submit directory as their working directory.

Your network home-directory is only available on the submit node and not accessible from any compute node.

# change into a directory on the cluster file system
» cd /lustre/$(id -g -n)/$USER && pwd
/lustre/hpc/vpenso

# execute a job and print working directory
» srun -- pwd
/lustre/hpc/vpenso

Not all working groups follow this naming convention for the directory structure; adjust your path accordingly. Executing a job in a non-existing directory falls back to /tmp/:

# executing a job in the user's home-directory
» srun -- pwd
slurmstepd: error: couldn't chdir to `/u/vpenso': []
/tmp

Specify a working directory with a command option:

Option                       Description
-D <path>, --chdir=<path>    Set the working directory of the job; the default is the current working directory.

» export LUSTRE_HOME=/lustre/$(id -g -n)/$USER && echo $LUSTRE_HOME
/lustre/hpc/vpenso

# ...determine your current working directory with the `pwd` command
» srun -D $LUSTRE_HOME -- pwd
/lustre/hpc/vpenso

Variable           Description
SLURM_SUBMIT_DIR   Directory from which a job has been submitted.
SLURM_WORKING_DIR  Directory where the job will be executed.

Within your job's runtime environment, the two Slurm output environment variables above are available to read the directory configuration:

# set the working directory with a Slurm input
# environment variable
» export SLURM_WORKING_DIR=$LUSTRE_HOME

# the job is submitted from the user's home-directory,
# which is not available on the compute nodes
» srun --pty \
        -- /bin/bash -c 'env | grep "SLURM_[A-Z]*_DIR"'
SLURM_SUBMIT_DIR=/u/vpenso
SLURM_WORKING_DIR=/lustre/hpc/vpenso
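
The working directory can also be fixed in the header of a batch script with a meta command; a minimal sketch, assuming the Lustre path used in the examples above:

#!/bin/bash
# set the working directory for the job (example path)
#SBATCH -D /lustre/hpc/vpenso

pwd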

I/O Redirection

Option         Description
-o, --output=  Relative or absolute path to stdout.
-e, --error=   Relative or absolute path to stderr, sent to stdout by default.
-i, --input=   Relative or absolute path to stdin.

[…] --output='%j.log' --error='%j.err.log' []

The path format string may contain the following specifiers:

Specifier  Description
%j or %J   Current jobid or jobid.stepid.
%N, %n     Hostname, and node identifier (numerical) relative to the current job.
%A, %a     Array master job allocation number, and array ID (index) number.
%t         Task identifier (rank) relative to the current job.
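
Combined with the meta commands described above, a minimal sketch of a batch script header using these options together with the %j specifier (the file names are only examples):

#!/bin/bash
# one stdout and one stderr file per job, named by job ID
#SBATCH --output=sleep-%j.log
#SBATCH --error=sleep-%j.err.log

sleep 30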

Mail Notification

Receive notifications by mail:

Option        Description
--mail-type=  Notify on a selection of the following events: BEGIN, END, FAIL, REQUEUE, ALL.
--mail-user=  Set the receiver's email address.

[…] --mail-type=FAIL --mail-type=REQUEUE --mail-user=v.penso@gsi.de []
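
The same options can be placed in a batch script header; a short sketch using the address from the example above, with the events given as a comma-separated list:

#SBATCH --mail-type=FAIL,REQUEUE
#SBATCH --mail-user=v.penso@gsi.de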

Job Name

A job name is part of the metadata associated with a job. It allows users to identify a compute job by information other than the unique job identification number (JOBID). It is very convenient to use job names to distinguish between different applications running at the same time.

Option                        Description
-J <name>, --job-name=<name>  Name of the job (24 characters maximum).

Multiple jobs can share a common name. The job name can also be set by an input environment variable:

Variable         Description
SLURM_JOB_NAME   Interpreted by the srun command.
SBATCH_JOB_NAME  Interpreted by the sbatch command.
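
A brief sketch of both variables in use, referring to the sleep.sh script from the meta-command example above:

# equivalent to srun --job-name=hostname-test ... (hostname-test is just an illustrative label)
» SLURM_JOB_NAME=hostname-test srun -- hostname

# equivalent to the #SBATCH -J sleep meta command in the script
» SBATCH_JOB_NAME=sleep sbatch -- $LUSTRE_HOME/sleep.sh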

In addition, each compute job can have a comment string attached, for example to distinguish individual runs of a repeated execution of the same application with different configurations.

Option                Description
--comment '<string>'  Attach a comment to the job.
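
For example, repeated submissions of the sleep script from above could be labelled run-1, run-2, and so on, matching the squeue listing below:

» sbatch --comment 'run-1' -- $LUSTRE_HOME/sleep.sh
» sbatch --comment 'run-2' -- $LUSTRE_HOME/sleep.sh
» sbatch --comment 'run-3' -- $LUSTRE_HOME/sleep.sh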

The squeue command supports the -n option to limit the list to job allocations with a given name, for example:

# print job ID, name, and comment
» squeue -o '%25A %10j %10k' -n sleep
JOBID                     NAME       COMMENT
1908264                   sleep      run-3
1908262                   sleep      run-1
1908263                   sleep      run-2

Footnotes

  1. Man Page, Wikipedia
    https://en.wikipedia.org/wiki/Man_page

  2. Environment, Bash Reference Manual, GNU Project
    https://www.gnu.org/software/bash/manual/bash.html#Environment

  3. Bourne Shell Builtins, Bash Reference Manual, GNU Project
    https://www.gnu.org/software/bash/manual/bash.html#Bourne-Shell-Builtins

  4. Working Directory, Wikipedia
    https://en.wikipedia.org/wiki/Working_directory