Scheduler Queue
The scheduler queue runs on the cluster controller and is one of the main components of the Slurm workload management system. Its primary goal is to use all resources available in the compute cluster efficiently while distributing them among all users according to a fair-share algorithm.
Queue State
List jobs in the scheduler queue with the squeue command [1]. By default, squeue lists all jobs in the scheduler queue. To limit the length of the list, several command-line options can be applied, for example:
Option | Description |
---|---|
-u <user_list>, --user=<user_list> | List jobs from a comma-separated list of users. |
-A <account_list>, --account=<account_list> | List jobs associated with Slurm accounts. |
# list jobs for your Linux user name
squeue -u $USER
# similar using an environment variable
export SQUEUE_USERS=$USER
squeue
Variable | Description |
---|---|
SQUEUE_ACCOUNT | List jobs for a list of accounts. |
SQUEUE_USERS | List jobs for a list of Linux users. |
List jobs in order of expected start time with the option:
Option | Description |
---|---|
--start | Report the expected start time and resources to be allocated for pending jobs. |
squeue --start
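Not every pending job receives a start-time estimate. As a minimal sketch of how the --start output might be summarized, the hypothetical helper start_estimates below counts jobs with and without an estimate. It reads lines from standard input, and both the function name and the assumed two-column layout (start time, job ID, with N/A marking a missing estimate) are chosen for illustration only.

```shell
# hypothetical helper: count pending jobs with and without a start estimate
# expects lines of "<start time> <job id>" on standard input,
# where a missing estimate is assumed to appear as N/A
start_estimates() {
    awk '
        $1 == "N/A" { unknown++ }
        $1 != "N/A" { known++ }
        END { printf "estimated=%d unknown=%d\n", known, unknown }
    '
}

# possible usage on a Slurm cluster (not executed here):
# squeue --start --noheader --format "%S %i" | start_estimates
```

Reading from standard input keeps the helper independent of the cluster, so it can be tried on saved squeue output as well.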
Job State
The following table lists the most common job states:
State | Abb. | Description |
---|---|---|
pending | PD | Job is awaiting resource allocation. |
running | R | Job currently has an allocation. |
timeout | TO | Job terminated upon reaching its time limit. |
canceled | CA | Job canceled by the user or system administrator. |
failed | F | Job terminated with non-zero exit code. |
completed | CD | Job has terminated all processes on all nodes with an exit code of zero. |
The squeue command is highly customizable and can limit its output to a specific set of jobs in the system, for example:
# list running jobs by start time and
# accumulate similar jobs with a counter in the first column
squeue \
  --states running \
  --format '%20S %11M %9P %8u %6g %10T %11l' \
  | sort -k 1 \
  | uniq -f 2 -c \
  | tac
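A related sketch, under the same approach of post-processing squeue output: the hypothetical helper count_by_user_state tallies jobs per user and state. The function name and the assumed input layout (user name, then job state, one job per line) are illustrative assumptions, not part of Slurm.

```shell
# hypothetical helper: count jobs per user and state
# expects lines of "<user> <state>" on standard input
count_by_user_state() {
    # tally each user/state pair, then sort by descending count
    # and alphabetically by user and state within equal counts
    awk '{ count[$1 " " $2]++ }
         END { for (key in count) print count[key], key }' \
        | LC_ALL=C sort -k 1,1nr -k 2
}

# possible usage on a Slurm cluster (not executed here):
# squeue --noheader --format "%u %T" | count_by_user_state
```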
Pending
Reason | Description |
---|---|
BadConstraint | Job resource constraints cannot be satisfied. |
Resources | Required resources are in use. |
Priority | Resources are being reserved for higher-priority jobs. |
Dependency | Job dependencies are not yet satisfied. |
Reservation | Waiting for an advanced reservation. |
The scheduler queue assigns job reason codes, which identify why a job is waiting for execution. A job may be waiting for more than one reason, in which case only one of those reasons is displayed. The table above lists the most common reasons and is not complete. Refer to the squeue manual page for a complete list of job reason codes.
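To make the reason codes easier to read in a terminal, the table above can be turned into a small lookup. The sketch below is illustrative only: the helper names explain_reason and annotate_reasons are hypothetical, and the input is assumed to be one job per line in the form job ID followed by reason code.

```shell
# hypothetical helper: map a reason code to the description from the table above
explain_reason() {
    case "$1" in
        BadConstraint) echo "Job resource constraints cannot be satisfied." ;;
        Resources)     echo "Required resources are in use." ;;
        Priority)      echo "Resources are being reserved for higher-priority jobs." ;;
        Dependency)    echo "Job dependencies are not yet satisfied." ;;
        Reservation)   echo "Waiting for an advanced reservation." ;;
        *)             echo "See the squeue manual page." ;;
    esac
}

# hypothetical helper: annotate lines of "<job id> <reason>" from standard input
annotate_reasons() {
    while read -r job_id reason; do
        printf '%s %s: %s\n' "$job_id" "$reason" "$(explain_reason "$reason")"
    done
}

# possible usage on a Slurm cluster (not executed here):
# squeue --states=PD --noheader --format "%i %r" | annotate_reasons
```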
Priority
Option | Description |
---|---|
-P, --priority | List pending jobs sorted by priority. |
Many factors are weighted in order to calculate a priority individually for each job. The scheduler queue sorts jobs in priority order and allocates resources as efficiently as possible among the jobs with the highest priority. To optimize overall system utilization, jobs are scheduled depending on their resource requirements and user-defined resource limits. Produce a list of pending jobs in the same order considered for scheduling with the squeue command using the --priority option listed above.
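The squeue manual page suggests combining --priority with a priority sort and a state filter to reproduce the scheduling order. A minimal sketch of that combination follows, wrapped in a hypothetical helper named pending_by_priority (the function name is an assumption and not part of Slurm):

```shell
# hypothetical wrapper: list pending jobs in the order considered for scheduling
pending_by_priority() {
    # --priority   take partition and job priority into account
    # --sort=-p,i  sort by descending priority, then by job ID
    # --states=PD  restrict the list to pending jobs
    # "$@"         pass through any extra squeue options
    squeue --priority --sort=-p,i --states=PD "$@"
}
```

On a cluster it could be called as pending_by_priority -u $USER to restrict the list to your own jobs.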
Footnotes
[1] squeue manual page, SchedMD, https://slurm.schedmd.com/squeue.html