Scheduler Queue

Modified

November 9, 2023

Abstract

The scheduler queue runs on the cluster controller and is one of the main components of the Slurm workload management system. Its primary goal is to use all resources available in the compute cluster efficiently, while distributing those resources among all users according to a fair-share algorithm.

Queue State

List jobs in the scheduler queue with the squeue [1] command. By default, squeue lists all jobs in the scheduler queue. Several command line options limit the length of the list, for example:

Option                                        Description
-u <user_list>, --user=<user_list>            List jobs from a comma-separated list of users.
-A <account_list>, --account=<account_list>   List jobs associated with a comma-separated list of Slurm accounts.

# list jobs for your Linux user name
squeue -u $USER

# similar using an environment variable
export SQUEUE_USERS=$USER
squeue

Variable         Description
SQUEUE_ACCOUNT   List jobs for a list of accounts.
SQUEUE_USERS     List jobs for a list of Linux users.
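
The account variable can be used in the same way as SQUEUE_USERS. As a minimal sketch, the helper below sets SQUEUE_ACCOUNT for a single squeue invocation instead of exporting it for the whole shell session. The function name squeue_account and the account names in the usage line are hypothetical and not part of Slurm. According to the squeue manual page, options given on the command line override these environment variables.

```shell
# Hypothetical helper (not part of Slurm): list jobs for the given
# account(s) by setting SQUEUE_ACCOUNT for one squeue call only.
#   $1    comma-separated list of Slurm accounts
#   rest  additional squeue options, passed through unchanged
squeue_account() {
    accounts=$1
    shift
    SQUEUE_ACCOUNT=$accounts squeue "$@"
}
```

For example, `squeue_account hpc,bio --start` would report expected start times for pending jobs of the (assumed) accounts hpc and bio.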

List jobs in order of expected start time with the option:

Option    Description
--start   Report the expected start time and resources to be allocated for pending jobs.

squeue --start

Job State

The following list presents the most common job states:

State       Abbr.   Description
pending     PD      Job is awaiting resource allocation.
running     R       Job currently has an allocation.
timeout     TO      Job terminated upon reaching its time limit.
cancelled   CA      Job was cancelled by the user or a system administrator.
failed      F       Job terminated with a non-zero exit code or another failure condition.
completed   CD      Job has terminated all processes on all nodes with an exit code of zero.

The output of squeue can be filtered and formatted to show a specific set of jobs in the system, for example:

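# list running jobs by start time, most recent first, and
# accumulate similar jobs with a counter in the first column
# (uniq ignores the first two fields: start time and time used)
# format: %S start time, %M time used, %P partition, %u user,
#         %g group, %T state, %l time limit
squeue \
    --states running \
    --format '%20S %11M %9P %8u %6g %10T %11l' \
    | sort -k 1 \
    | uniq -f 2 -c \
    | tac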

Pending

Reason          Description
BadConstraint   Job resource constraints cannot be satisfied.
Resources       Required resources are in use.
Priority        Resources are being reserved for higher-priority jobs.
Dependency      Job dependencies are not yet satisfied.
Reservation     Waiting for an advanced reservation.

The scheduler queue assigns each waiting job a reason code that identifies why the job has not started yet. A job may be waiting for more than one reason, in which case only one of those reasons is displayed. The table above lists the most common reasons but is not complete. Refer to the squeue manual page for a complete list of job reason codes.
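
To see which reasons dominate the queue, the reason codes can be counted. The following is a minimal sketch, assuming the %r format specifier (the reason a job is in its current state) and the --noheader option as described in the squeue manual page. The function name pending_reasons is hypothetical.

```shell
# Hypothetical helper (not part of Slurm): summarize pending jobs by
# reason code, most frequent reason first.
# Any arguments are passed through to squeue as additional options.
pending_reasons() {
    squeue --noheader --states pending --format '%r' "$@" \
        | sort \
        | uniq -c \
        | sort -k 1,1nr -k 2
}
```

For example, `pending_reasons -u "$USER"` restricts the summary to your own jobs.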

Priority

Many weighted factors are combined to calculate a priority for each job individually. The scheduler queue sorts jobs in priority order and allocates resources as efficiently as possible among the jobs with the highest priority. To optimize overall system utilization, jobs are scheduled depending on their resource requirements and user-defined resource limits. Produce a list of pending jobs in the same order considered for scheduling with the squeue command using the following option:

Option           Description
-P, --priority   List pending jobs sorted by priority.
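
The squeue manual page suggests combining --priority with a sort order and a state filter, for example --sort=-p,i --states=PD. The following is a minimal sketch of such a call. The wrapper name pending_by_priority and the chosen output columns (job ID, partition, user, integer priority, reason) are illustrative assumptions, not a recommendation from the Slurm documentation.

```shell
# Hypothetical wrapper (not part of Slurm): list pending jobs with the
# highest priority first, ties ordered by job ID.
# Any arguments are passed through to squeue as additional options.
pending_by_priority() {
    squeue --priority --sort=-p,i --states=PD \
        --format '%10i %9P %8u %10Q %r' "$@"
}
```

For example, `pending_by_priority -u "$USER"` shows where your own pending jobs rank.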

Footnotes

  1. squeue manual page, SchedMD
    https://slurm.schedmd.com/squeue.html