Resource Allocation
The following sections describe how users request resources like CPUs, memory, or GPUs to be allocated for a compute job. Furthermore, the differences between interactive jobs and batch jobs are explained.
Allocations
Users request the allocation of computing resources on behalf of their associated accounts using the salloc [1], srun [2], or sbatch [3] commands:
Command | Interactive | Blocking | Description |
---|---|---|---|
salloc | yes | yes | Allocates resources and launches a shell. |
srun | yes | yes | Allocates resources and starts an application. |
sbatch | no | no | Queues an application for later execution. |
A resource allocation specifies a set of resources, e.g. nodes, CPUs, RAM, etc., possibly with some set of constraints, e.g. number of processors per node, maximum runtime and so on. All three commands accept the same set of parameters for resource allocation.
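For illustration, a minimal sketch of commonly used allocation options; the partition name debug follows the examples below, and the remaining values are placeholders:
# request 2 nodes with 8 tasks per node, 2 GB of RAM per CPU and a
# maximum runtime of 30 minutes (all values are placeholders)
» salloc --partition=debug --nodes=2 --ntasks-per-node=8 \
         --mem-per-cpu=2G --time=00:30:00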
A significant difference is that salloc and srun are interactive and blocking. This means that both are linked to your terminal session, and hence bound to your connection to the submit node. Output is directed to the interactive shell running in your terminal session. Losing the connection to the submit node might kill the job.
The sbatch command, in contrast, transfers complete control to the cluster controller and allows you to disconnect from the submit node.
Interactive
The following example uses salloc to request a set of default resources from one of the partitions. The command blocks until resources are allocated:
# start an interactive command interpreter
» salloc --partition=debug
salloc: Granted job allocation 2964352
salloc launches an interactive shell after resources have been granted by the cluster controller:
# execute an arbitrary command
» cat /proc/cpuinfo | grep 'model name' | sort | uniq
model name : AMD EPYC 7551 32-Core Processor
# investigate the job configuration
» scontrol show job $SLURM_JOB_ID | grep Time
RunTime=00:00:51 TimeLimit=00:05:00 TimeMin=N/A
SubmitTime=2020-08-27T07:23:14 EligibleTime=2020-08-27T07:23:14
AccrueTime=Unknown
StartTime=2020-08-27T07:23:14 EndTime=2020-08-27T07:28:14 Deadline=N/A
PreemptTime=None SuspendTime=None SecsPreSuspend=0
Use the exit command to stop the interactive shell and release all allocated resources:
» exit
exit
salloc: Relinquishing job allocation 2964352
srun within a job allocation launches the specified application in parallel across all allocated nodes. The execution environment is inherited by all launched processes:
» salloc --partition=debug --nodes=2 --chdir=/tmp -- bash
salloc: Granted job allocation 2964433
# run a command on the node hosting the interactive shell
» hostname
lxbk0595
# run a command on all allocated nodes (in parallel)
» srun hostname
lxbk0595
lxbk0596
# run another command
» srun uptime
07:37:04 up 56 days, 14:26, 27 users, load average: 0.28, 0.18, 0.15
07:37:04 up 56 days, 14:26, 4 users, load average: 0.03, 0.07, 0.13
# release the resources
» exit
exit
salloc: Relinquishing job allocation 2964433
Each invocation of srun within a job is known as a step. A (compute) job consists of one or more steps, each consisting of one or more tasks, each using one or more processors.
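A minimal sketch of how steps accumulate within an allocation (assuming job accounting via sacct is enabled on the cluster):
# within an allocation, each srun invocation creates a new step
» srun --ntasks=4 hostname
» srun --ntasks=2 uptime
# list the steps of the current job from the accounting database
» sacct --jobs=$SLURM_JOB_ID --format=JobID,JobName,NTasks,State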
Real Time
Using srun outside of a job allocation from salloc requests the resources specified by the command options and waits until they are allocated. As soon as resources are available, it automatically launches the specified application:
» srun --partition=debug --nodes=5 --chdir=/tmp -- hostname
lxbk0596
lxbk0597
lxbk0599
lxbk0598
lxbk0600
Once the specified application has finished, resources are relinquished automatically.
The command above is basically a short notation for:
» salloc --partition=debug --nodes=5 \
         --chdir=/tmp -- srun hostname
salloc: Granted job allocation 138
lxbk0596
lxbk0599
lxbk0597
lxbk0598
lxbk0600
salloc: Relinquishing job allocation 138
Batch Jobs
sbatch is used to submit a compute job to the scheduler queue for later execution. The command exits immediately after the controller has assigned a unique JOBID and the job has been queued by the scheduler. Batch jobs wait in the queue of pending jobs until resources become available.
The compute job is copied to a compute node as soon as resources have been granted by the scheduler. The job's application is typically launched by a batch script, with the resource allocation and resource constraints specified by meta-commands.
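The meta-commands are #SBATCH comment lines at the top of the batch script; a minimal sketch (partition and resource values are placeholders):
#!/usr/bin/env bash
#SBATCH --partition=debug
#SBATCH --nodes=1
#SBATCH --time=00:10:00
#SBATCH --job-name=example
# commands below run on the first allocated node
hostname
uptime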
A compute job submitted with sbatch has at least one implicit job step: the start of the executable provided as argument to the command.
The following is a simple example of a batch job executing a shell script:
# simple script executing a couple of commands
cat > $LUSTRE_HOME/sleep.sh <<EOF
#!/usr/bin/env bash
hostname ; uptime ; sleep 180 ; uname -a
EOF
# submit the script above, with the job name "sleep"
sbatch --output='%j.log' --chdir=$LUSTRE_HOME \
--job-name=sleep -- $LUSTRE_HOME/sleep.sh
Check the state of the job using the squeue command:
» squeue --format='%6A %8T %8N %9L %o' --name=sleep
JOBID STATE NODELIST TIME_LEFT COMMAND
18 RUNNING lxbk0596 1:57:40 /lustre/hpc/vpenso/sleep.sh
# read the stdout of the job
» cat $LUSTRE_HOME/$(squeue -ho %A -n sleep).log
lxbk0596
10:08:37 up 1 day, 2:05, 0 users, load average: 0.00, 0.01, 0.05
Attach to Job
From a submit node you may attach an interactive debugging shell to your running job with the following command:
srun --jobid <running-jobid> [-w <hostname>] -O --pty bash
Option | Description |
---|---|
--jobid=<jobid> | Initiate a job step under an already allocated job with the given ID. |
-O, --overcommit | The instantiated job step and task for the debugging shell do not demand additional resources from the existing allocation (which is usually already used up). |
--pty | Execute task zero in pseudo terminal mode. Implicitly sets --unbuffered. Implicitly sets --error and --output to /dev/null for all tasks except task zero, which may cause those tasks to exit immediately (e.g. shells will typically exit immediately in that situation). This option applies to step allocations. |
-w, --nodelist=<hostname> | Request a specific host. Useful if the job allocation spans multiple nodes. |
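A hypothetical usage example; the job ID and hostname below are placeholders taken from the examples above:
# attach a debugging shell to a running job
» srun --jobid=2964352 --overcommit --pty bash
# if the allocation spans multiple nodes, pick a specific host
» srun --jobid=2964352 --nodelist=lxbk0596 --overcommit --pty bash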
Footnotes
[1] salloc manual page, SchedMD, https://slurm.schedmd.com/salloc.html
[2] srun manual page, SchedMD, https://slurm.schedmd.com/srun.html
[3] sbatch manual page, SchedMD, https://slurm.schedmd.com/sbatch.html