Partitions

Modified

November 8, 2023

Abstract

Partitions group nodes with similar characteristics, such as resources, priorities or run-time limits.

Configuration

The sinfo command [1] lists partitions and their states:

Option             Description
-s, --summarize    List only a summary of the partition states, without node state details.
-o, --format       Specify the output columns to print; see the manual page for the available fields.

The following example shows the overall resource allocation per partition. The column “NODES(A/I/O/T)” gives the number of nodes in each state; the capital letters abbreviate Allocated, Idle, Other and Total:

>>> sinfo -s
PARTITION AVAIL  TIMELIMIT   NODES(A/I/O/T) NODELIST
debug        up      30:00         0/7/3/10 lxbk[0719-0722,1130-1135]
main*        up    8:00:00   254/129/57/440 lxbk[0724-1033,1136-1265]
grid         up 3-00:00:00   148/106/56/310 lxbk[0724-1033]
high_mem     up 7-00:00:00       24/3/19/46 lxbk[1034-1079]
gpu          up 7-00:00:00       14/3/33/50 lxbk[1080-1129]
long         up 7-00:00:00    197/89/56/342 lxbk[0717-0718,0824-1033,1136-1265]
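The NODES(A/I/O/T) column can also be post-processed in scripts. A minimal sketch, assuming bash and awk, that turns it into an idle/total count per partition; the function name idle_summary is hypothetical, and the demo feeds it two rows of the sample output above instead of calling sinfo:

```shell
# Summarize the NODES(A/I/O/T) column of `sinfo -s` as idle/total per partition.
# Reads `sinfo -s` text on stdin; the header line (NR == 1) is skipped.
idle_summary() {
  awk 'NR > 1 {
    part = $1
    sub(/\*$/, "", part)       # drop the default-partition marker "*"
    split($4, n, "/")          # n[1]=allocated n[2]=idle n[3]=other n[4]=total
    printf "%s idle=%d/%d\n", part, n[2], n[4]
  }'
}

# On a Slurm cluster: sinfo -s | idle_summary
# Here: two rows copied from the sample output above.
idle_summary <<'EOF'
PARTITION AVAIL  TIMELIMIT   NODES(A/I/O/T) NODELIST
debug        up      30:00         0/7/3/10 lxbk[0719-0722,1130-1135]
main*        up    8:00:00   254/129/57/440 lxbk[0724-1033,1136-1265]
EOF
```

The field position ($4) relies on the default `sinfo -s` column order; with a custom --format string the index would have to change.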

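Show the default run time together with the time limit of each partition. (The %20C field used here and in the next example is too narrow for the largest values, so the CPU totals of main, grid and long are cut off; in full they are 63040, 29760 and 53696, the sum of the allocated, idle and other counts. A wider field such as %25C avoids this.)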

>>> sinfo -o "%9P  %6g %11L %10l %5D %20C" 
PARTITION  GROUPS DEFAULTTIME TIMELIMIT  NODES CPUS(A/I/O/T)       
debug      all    5:00        30:00      10    0/1664/384/2048     
main*      all    2:00:00     8:00:00    440   23058/33838/6144/630
grid       all    1:00:00     3-00:00:00 310   10380/14004/5376/297
high_mem   all    1:00:00     7-00:00:00 46    2296/4616/4864/11776
gpu        all    2:00:00     7-00:00:00 50    1202/430/3168/4800  
long       all    2:00:00     7-00:00:00 342   19074/28574/6048/536
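DEFAULTTIME and TIMELIMIT use different layouts depending on their length (MM:SS, H:MM:SS, D-HH:MM:SS), which makes them awkward to compare in scripts. A minimal sketch, assuming bash, of a hypothetical helper slurm_to_seconds that normalizes the three layouts printed above to seconds:

```shell
# Convert a time limit as printed by sinfo (MM:SS, H:MM:SS or D-HH:MM:SS)
# to seconds. Only these three layouts are handled; values such as
# "infinite" and Slurm's other input notations are not.
slurm_to_seconds() {
  local t=$1 days=0 h=0 m s
  if [[ $t == *-* ]]; then
    days=${t%%-*}   # text before the dash: days
    t=${t#*-}       # text after the dash: HH:MM:SS
  fi
  local -a f
  IFS=: read -r -a f <<< "$t"
  case ${#f[@]} in
    2) m=${f[0]}; s=${f[1]} ;;
    3) h=${f[0]}; m=${f[1]}; s=${f[2]} ;;
    *) echo "unsupported time format: $1" >&2; return 1 ;;
  esac
  # 10# forces base 10 so that fields such as "08" are not read as octal
  echo $(( ((10#$days * 24 + 10#$h) * 60 + 10#$m) * 60 + 10#$s ))
}

# DEFAULTTIME and TIMELIMIT of the debug partition from the output above
slurm_to_seconds 5:00    # 300
slurm_to_seconds 30:00   # 1800
```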

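List the CPU configuration and memory per node. S:C:T stands for sockets, cores per socket and threads per core, and MEMORY is given in megabytes. Where the nodes of a partition differ, sinfo prints the minimum value followed by “+”: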

>>> sinfo -o "%9P %6g %4c %10z %8m %5D %20C"
PARTITION GROUPS CPUS S:C:T      MEMORY   NODES CPUS(A/I/O/T)       
debug     all    128+ 2:32+:2    257500+  10    0/1664/384/2048     
main*     all    96+  2:24+:2    191388+  440   23056/33840/6144/630
grid      all    96   2:24:2     191388   310   10378/14006/5376/297
high_mem  all    256  8:16:2     1031342  46    2296/4616/4864/11776
gpu       all    96   2:24:2     515451   50    1202/430/3168/4800  
long      all    96+  2:24+:2    191388+  342   19072/28576/6048/536
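The CPUS column is the product of the three S:C:T values, for example 8 × 16 × 2 = 256 for high_mem. A minimal sketch, assuming bash, of a hypothetical helper node_shape that reproduces this and also divides MEMORY by the CPU count to estimate megabytes per logical CPU; values with a “+” suffix are taken at their printed minimum:

```shell
# Derive logical CPUs per node from S:C:T (sockets:cores:threads) and the
# memory per logical CPU (integer megabytes) from the MEMORY column.
# A "+" suffix is stripped, i.e. the printed minimum value is used.
node_shape() {
  local sct=${1//+/} mem=${2//+/}
  local s c t
  IFS=: read -r s c t <<< "$sct"
  local cpus=$(( s * c * t ))
  echo "cpus=$cpus mem_per_cpu_mb=$(( mem / cpus ))"
}

node_shape 8:16:2 1031342    # high_mem row: cpus=256 mem_per_cpu_mb=4028
node_shape 2:24:2 191388     # grid row:     cpus=96 mem_per_cpu_mb=1993
```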

Print a detailed list of idle nodes, including their available resources:

sinfo -Nel -t idle
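The command above covers all partitions. A minimal sketch, assuming bash, of a hypothetical helper idle_nodes that restricts the list to one partition and prints only node names; it uses the documented sinfo options -N (node-oriented), -h (no header), -t (state filter), -p (partition) and -o "%N" (node name):

```shell
# List the idle nodes of one partition, one node name per line.
idle_nodes() {
  if [ -z "$1" ]; then
    echo "usage: idle_nodes PARTITION" >&2
    return 1
  fi
  sinfo -N -h -t idle -p "$1" -o "%N"
}

# Example on a Slurm cluster, counting idle nodes in the debug partition:
#   idle_nodes debug | wc -l
```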

An asterisk ‘*’ appended to a partition name marks the default partition. Jobs are sent to the default partition unless another partition is selected explicitly with the -p, --partition option.
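Scripts can look up the default partition from this marker. A minimal sketch, assuming bash and awk; the function name default_partition is hypothetical, while %P is the sinfo format field that prints the partition name with the trailing “*”:

```shell
# Print the name of the default partition, i.e. the one that sinfo marks
# with a trailing "*" in the %P field (-h suppresses the header line).
default_partition() {
  sinfo -h -o "%P" | awk '/\*$/ { sub(/\*$/, ""); print; exit }'
}

# Example on a Slurm cluster: default_partition
# (for the listing above this would print "main")
```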

It is recommended to test your application launch in the debug partition first. Its short time limit (30 minutes) typically allows resources to be allocated quickly, which avoids long waiting times in the scheduler queue.

Allocation

salloc, srun, and sbatch support the following option to select a partition. It is typically combined with other resource allocation options:

Option             Description
-p, --partition    Request a specific partition for the resource allocation.

For example, to request resources from the debug partition:

sbatch --partition=debug ...
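The partition can also be set inside a job script with an #SBATCH directive, which sbatch reads like the command-line option. A minimal sketch for a quick launch test; the job name, the 5-minute time limit (chosen to stay within the 30:00 debug limit) and the echo line are placeholders, not site recommendations:

```shell
#!/bin/bash
# Quick launch test in the debug partition.
# sbatch reads the #SBATCH lines before the first command; bash ignores them.
#SBATCH --partition=debug
#SBATCH --time=00:05:00
#SBATCH --ntasks=1
#SBATCH --job-name=launch-test

# Placeholder workload: replace with the application under test.
echo "job ${SLURM_JOB_ID:-(not under Slurm)} running on ${HOSTNAME}"
```

Submit the script by passing its file name to sbatch.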

Override the default partition with the following environment variables:

Variable            Description
SLURM_PARTITION     Interpreted by the srun command
SALLOC_PARTITION    Interpreted by the salloc command
SBATCH_PARTITION    Interpreted by the sbatch command
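A minimal sketch that makes debug the default for all three commands in the current shell session (placing the lines in a shell startup file would make this persistent, an assumption about your setup). For sbatch, the manual page states that command-line options override these variables, which in turn override #SBATCH directives in a batch script:

```shell
# Make "debug" the default partition for srun, salloc and sbatch
# in this shell session.
export SLURM_PARTITION=debug    # read by srun
export SALLOC_PARTITION=debug   # read by salloc
export SBATCH_PARTITION=debug   # read by sbatch

# From here on, `sbatch job.sh` targets debug, while an explicit
# `sbatch --partition=main job.sh` still takes precedence.
```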

Footnotes

  1. sinfo manual page, SchedMD
    https://slurm.schedmd.com/sinfo.html