Resource Constraints

Modified

November 9, 2023

Abstract

The hardware of each compute node dictates the resource limits of an application executed on it. These limits include, for example, the main memory and the number of processor cores. Furthermore, there are limits defined by the cluster controller, like runtime or the maximum allocatable memory per compute job.

Limits

Use the sinfo command to get an overview of resource limits for nodes in their corresponding partitions:

» sinfo -o "%9P  %6g %11L %10l %10m %5D %7X %5Y %7Z"
PARTITION  GROUPS DEFAULTTIME TIMELIMIT  MEMORY     NODES SOCKETS CORES THREADS
debug      all    5:00        30:00      257649     5     8       8     2
hpc_debug  all    5:00        30:00      257649     6     8       8     2
main*      all    2:00:00     8:00:00    257649     161   8       8     2
long       all    2:00:00     7-00:00:00 257649     38    8       8     2
grid       all    1:00:00     3-00:00:00 127653+    104   2+      8+    2
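
As an alternative, the scontrol command prints the full configuration of a single partition, including DefaultTime, MaxTime and memory limits. The sketch below assumes the debug partition from the listing above (output omitted):

# show the complete configuration of one partition
» scontrol show partition debug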

Resource constraints define the execution boundary of an application. By defining resource requirements like maximum runtime and allocatable memory, users ensure that they do not unintentionally consume more resources than planned.

This is particularly important for the runtime, since a software bug 1 or an issue with access to the input data, for instance on shared storage, can increase the execution time of your application tremendously. Keep in mind that runtime is one of the main factors accounted for by the cluster controller and is charged to your associated accounts for the fair-share priority calculation.

Runtime

Use the sinfo command to list runtime limits:

sinfo -o "%9P %6g %11L %10l %5D %20C"

The output will include the following columns:

Column       Description
TIMELIMIT    Maximum runtime of a job in a given partition.
DEFAULTTIME  Default runtime if not specified by the user.

If a user does not explicitly specify a runtime limit, the cluster controller will apply the default runtime defined for the partition the compute job is executed in.
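
This can be verified with the commands used throughout this document; the sketch below assumes the debug partition (5:00 default) and the sleep.sh script used in the examples further down:

# submit without an explicit time limit
» sbatch -p debug $LUSTRE_HOME/sleep.sh

# TimeLimit now shows the partition default (00:05:00 for debug)
» scontrol show job=$(squeue -ho %A -n sleep) | grep TimeLimit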

Specification

Command-line options to specify a maximum runtime:

Option      Description
-t, --time  Limit on the total runtime of the job allocation.

The time format to specify a runtime limit is the following:

minutes
minutes:seconds
hours:minutes:seconds
days-hours
days-hours:minutes
days-hours:minutes:seconds
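
For example, the following invocations all request the same 90-minute limit (a sketch; the partition is chosen arbitrarily from the listing above):

# equivalent ways to request a 90 minute runtime limit
» salloc -p main -t 90        # minutes
» salloc -p main -t 1:30:00   # hours:minutes:seconds
» salloc -p main -t 0-1:30    # days-hours:minutes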

If the requested time limit exceeds the partition's time limit, it will be rejected:

# time limit configuration of the debug partition
» sinfo -o "%11L %10l" -p debug
DEFAULTTIME TIMELIMIT 
5:00        30:00

# request a job with a time limit exceeding the partition's configuration
» salloc -p debug -t 02:00:00
salloc: error: Job submit/allocate failed: Requested time limit is invalid...

A runtime limit can be set by an environment variable:

Variable          Description
SBATCH_TIMELIMIT  Limit on the total runtime of the job allocation.

» SBATCH_TIMELIMIT=05:00 sbatch $LUSTRE_HOME/sleep.sh
» scontrol show job=$(squeue -ho %A -n sleep) | grep TimeLimit
   RunTime=00:00:43 TimeLimit=00:05:00 TimeMin=N/A

Limit Reached

When the time limit is reached, each task in each job step is sent SIGTERM followed by SIGKILL:

# submit a job with a 1 minute time limit
» sbatch -p debug -t 1 sleep.sh 360

# show the configuration of the job
» scontrol show job=$(squeue -ho %A -n sleep) | grep Time
   RunTime=00:00:53 TimeLimit=00:01:00 TimeMin=N/A
   SubmitTime=2019-08-06T11:24:00 EligibleTime=2019-08-06T11:24:00
   AccrueTime=2019-08-06T11:24:00
   StartTime=2019-08-06T11:24:01 EndTime=2019-08-06T11:25:01 Deadline=N/A
   PreemptTime=None SuspendTime=None SecsPreSuspend=0

# job is killed after reaching its limit
» cat $LUSTRE_HOME/$(squeue -ho %A -n sleep).log 
[2019/08/06T11:24:01] START vpenso@lxbk0595:/lustre/hpc/vpenso virgo:debug sleep-104 2:2 2048
[2019/08/06T11:24:01] Sleep for 360 seconds
slurmstepd: error: *** JOB 104 ON lxbk0595 CANCELLED AT 2019-08-06T11:25:21 DUE TO TIME LIMIT ***

» sacct -j 104                                           
       JobID    JobName  Partition    Account  AllocCPUS      State ExitCode 
------------ ---------- ---------- ---------- ---------- ---------- -------- 
104               sleep      debug        hpc          2    TIMEOUT      0:0 
104.batch         batch                   hpc          2  CANCELLED     0:15 
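
Since SIGTERM is delivered before SIGKILL, a job script can use the remaining grace period to clean up or write a checkpoint. The following is a minimal sketch (the application name and cleanup steps are placeholders; how much can be done depends on the configured delay between the two signals):

#!/bin/bash
# react to the SIGTERM sent when the time limit is reached
cleanup() {
    echo "caught SIGTERM, saving state before SIGKILL arrives"
    # ... copy partial results, remove scratch files, etc. ...
    exit 0
}
trap cleanup SIGTERM

# run the payload in the background and wait for it, so the
# trap is executed as soon as the signal is delivered
./my_application &
wait $!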

Memory

The job submission commands support the following options for users to specify the maximum amount of real memory. It is important to specify enough memory, since Slurm will not allow the application to use more than the requested amount of real memory. Jobs will be stopped by the out-of-memory handler if they use more than the requested memory.

Option         Description
--mem          Specify memory required per node.
--mem-per-cpu  Minimum memory required per allocated CPU.

Memory is specified in the format size[units] with the unit suffix [K|M|G|T]. If the final amount of memory requested by a job can’t be satisfied by any of the nodes configured in the partition, then the job will be rejected.
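
The same options can be used as directives in a batch script; a minimal sketch, with the application name as a placeholder (--mem and --mem-per-cpu are alternatives and should not be combined):

#!/bin/bash
#SBATCH --mem=8G              # request 8 GB of real memory per node
##SBATCH --mem-per-cpu=2G     # alternative: per allocated CPU (use only one of the two)

./my_application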

Memory/CPU Ratio

Global configuration for memory per CPU:

» scontrol show config | grep MemPer
DefMemPerCPU            = 2048
MaxMemPerCPU            = 4096

Requesting memory beyond DefMemPerCPU will automatically allocate additional CPUs to compensate. In the first example below, --mem=64G combined with MaxMemPerCPU=4096M yields at least 64G/4G = 16 cores (MinCPUsNode=16). In the second, --mem-per-cpu=8G exceeds MaxMemPerCPU and is therefore converted into two CPUs per task with 4G each:

» sbatch --ntasks=1 --mem=64G -- $LUSTRE_HOME/sleep.sh

» scontrol show job=$(squeue -ho %A -n sleep) | grep -e CPUs -e Memory
   NumNodes=1 NumCPUs=32 NumTasks=1 CPUs/Task=1 ReqB:S:C:T=0:0:*:*
   MinCPUsNode=16 MinMemoryNode=64G MinTmpDiskNode=0

» sbatch --ntasks=1 --mem-per-cpu=8G -- $LUSTRE_HOME/sleep.sh

» scontrol show job=$(squeue -ho %A -n sleep) | grep -e CPUs -e Memory
   NumNodes=1 NumCPUs=2 NumTasks=1 CPUs/Task=2 ReqB:S:C:T=0:0:*:*
   MinCPUsNode=1 MinMemoryCPU=4G MinTmpDiskNode=0

Out-of-Memory

Jobs allocating memory beyond their limits will be killed:

# request 128MB of RAM
» salloc -p debug --mem-per-cpu=128M

# allocate RAM above the requested limit
» srun -- stress -v -t 3 -m 1 --vm-hang 3 --vm-bytes 256M
...
stress: dbug: [57638] allocating 268435456 bytes ...
stress: dbug: [57638] touching bytes in strides of 4096 bytes ...
stress: FAIL: [57637] (415) <-- worker 57638 got signal 9
...
slurmstepd: error: Detected 1 oom-kill event(s) in step 106.0 cgroup. Some of
your processes may have been killed by the cgroup out-of-memory handler.
srun: error: lxbk0595: task 0: Out Of Memory
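
To right-size the memory request, the accounting database can be consulted for the peak memory usage of a finished job; a sketch using the job ID from the example above (output omitted):

# compare the requested memory with the peak resident set size
» sacct -j 106 -o JobID,State,ReqMem,MaxRSS,Elapsed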

Cores

A typical compute node is built with support for multiple CPU sockets 2 on its motherboard. Each socket hosts a Central Processing Unit (CPU) 3, which provides multiple cores 4. Each core can execute two threads in parallel. Details are described in Support for Multi-core/Multi-thread Architectures 5 in the Slurm documentation. AMD EPYC compute nodes are built with multi-chip modules (MCMs) 6, where each chiplet represents its own socket from the perspective of the cluster controller.

Print the number of sockets, cores and threads with sinfo. The -e, --exact option lists all available node configurations explicitly:

sinfo -e -o '%9P %4c %8z %8X %8Y %8Z %5D %N'
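
The topology of an individual node can also be confirmed from within an allocation, for example with lscpu (output omitted):

# print socket, core and thread counts of an allocated node
» srun -p debug -- lscpu | grep -E 'Socket|Core|Thread'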

Tasks

Within a compute job, a task represents a unit of work (your application instance) and the corresponding resources required to execute it. The number of tasks per job is configurable by the user. The following list contains a subset of options for the salloc, srun and sbatch commands to request a specific number of tasks:

Option                Description
-n, --ntasks=         Number of tasks to start (default=1)
--ntasks-per-node=    Number of tasks to invoke on each node (default=1)
--ntasks-per-socket=  Number of tasks to invoke on each socket
--ntasks-per-core=    Number of tasks to invoke on each core

Each task is distributed to only one node, but more than one task may be distributed to each node. The number of tasks distributed to a node is constrained by the number of CPUs allocated on the node and the number of CPUs per task.

From the perspective of Linux (the host operating system), each execution thread of a CPU core is represented as an individual CPU. Therefore it is important to evaluate the term CPU depending on its context: a physical CPU (hardware core) is not necessarily the same as a logical CPU.

The cluster controller is very flexible in the distribution of tasks to nodes depending on the command-line options provided by the user. For example, the following command executes a single job with 12 tasks:

» srun --partition=debug --ntasks=12 \
        -- hostname | sort | uniq -c 
     12 lxbk0595

A user can control how tasks are distributed over nodes, over the CPU sockets on a node, and the number of tasks per core (physical CPU). The following example starts a single job with four tasks distributed over two nodes:

» srun --partition=debug --ntasks-per-node=2 --ntasks=4 \
        -- hostname | sort | uniq -c
      2 lxbk0595
      2 lxbk0596
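
The same task layout can be requested non-interactively with sbatch directives; a minimal sketch (the payload is kept as hostname, and the sorting of the output is omitted):

#!/bin/bash
#SBATCH --partition=debug
#SBATCH --ntasks=4
#SBATCH --ntasks-per-node=2

# srun inherits the allocation and launches one process per task
srun -- hostname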

Logical CPUs

By default, a task allocates two logical CPUs (on a single physical CPU core). This is due to the support of modern CPUs for multi-threading 7. The locality of the memory hierarchy on modern compute hardware, as well as the limited capabilities of a second thread on a single core, makes it inefficient to execute two completely independent programs in parallel on the same core. Therefore users get the full performance of a physical CPU core, and have the option to execute two threads in parallel if desired.

# two logical CPUs by default
» srun --partition=debug --ntasks=1 \
        -- cat /proc/self/status | grep Cpus_allowed_list
Cpus_allowed_list:  1,65

# the same as specifically requesting two logical CPUs
» srun --partition=debug --ntasks=1 --cpus-per-task=2 \
        -- cat /proc/self/status | grep Cpus_allowed_list
Cpus_allowed_list:  1,65

Users can configure the number of logical CPUs per task with a command-line option:

Option                Description
-c, --cpus-per-task=  Number of CPUs per process (default=2)

For example, a single job with a single task requesting 4 logical CPUs, executed on two physical CPU cores:

» srun --partition=debug --ntasks=1 --cpus-per-task=4 \
        -- cat /proc/self/status | grep Cpus_allowed_list
Cpus_allowed_list:  1-2,65-66

Task Affinity

Task affinity is a mechanism for the user to control how tasks are distributed over the physical CPU cores. The command-line option --hint controls distribution methods for the allocation of logical CPUs:

» srun --hint=help
Application hint options:
    --hint=             Bind tasks according to application hints
        compute_bound   use all cores in each socket
        memory_bound    use only one core in each socket
        [no]multithread [don't] use extra threads with in-core multi-threading
        help            show this help message

The default corresponds to compute_bound, which allocates complete physical cores:

» srun -p debug -n 1 -c 1 --hint=compute_bound \
        -- cat /proc/self/status | grep Cpus_allowed_list
Cpus_allowed_list:      1,65

» srun -p debug -n 1 -c 2 --hint=compute_bound \
        -- cat /proc/self/status | grep Cpus_allowed_list
Cpus_allowed_list:      1-2,65-66

» srun -p debug -n 1 -c 3 --hint=compute_bound \
        -- cat /proc/self/status | grep Cpus_allowed_list
Cpus_allowed_list:      1-3,65-67

Users have the option to use --hint=multithread to explicitly control how execution threads are allocated:

# single task with a single logical CPU
» srun -p debug -n 1 -c 1 --hint=multithread \
        -- cat /proc/self/status | grep Cpus_allowed_list
Cpus_allowed_list:      1

# single task with two logical CPUs
» srun -p debug -n 1 -c 2 --hint=multithread \
        -- cat /proc/self/status | grep Cpus_allowed_list
Cpus_allowed_list:      1,65

# single task with three logical CPUs
» srun -p debug -n 1 -c 3 --hint=multithread \
        -- cat /proc/self/status | grep Cpus_allowed_list
Cpus_allowed_list:      1-2,65

# two tasks with a single logical CPU each
» srun -p debug -n 2 -c 1 --hint=multithread \
        -- cat /proc/self/status | grep Cpus_allowed_list
Cpus_allowed_list:  1
Cpus_allowed_list:  65

# three tasks with a single logical CPU each
» srun -p debug -n 3 -c 1 --hint=multithread \
        -- cat /proc/self/status | grep Cpus_allowed_list
Cpus_allowed_list:  65
Cpus_allowed_list:  1
Cpus_allowed_list:  8

Features

Nodes can have features assigned to them by the Slurm administrator. Users can specify which of these features are required by their job using the constraint option.

Print a list of available features with the sinfo command:

» date ; sinfo -o "%20N %f"
Fri Mar  5 08:48:33 CET 2021
NODELIST             AVAIL_FEATURES
lxbk[0553-0723]      amd,epic,7551
lxbk[0724-1033]      intel,xeon,gold6248r
lxbk[0501-0552]      intel,xeon,e52680

Use an argument to the salloc, srun or sbatch commands to request specific features:

Option            Description
-C, --constraint  Multiple constraints may be specified with the AND operator & or the OR operator |. (Further details are described in the corresponding manual pages.)

The following examples request specific CPU types:

# require a specific CPU type
» srun --constraint=intel \
       -- cat /proc/cpuinfo | grep 'model name' | sort | uniq
model name  : Intel(R) Xeon(R) Gold 6248R CPU @ 3.00GHz

» srun --constraint=amd \
       -- cat /proc/cpuinfo | grep 'model name' | sort | uniq
model name  : AMD EPYC 7551 32-Core Processor

# request nodes with multiple features (AND operator)
» srun --constraint='intel&e52680' \
       -- cat /proc/cpuinfo | grep 'model name' | sort | uniq
model name  : Intel(R) Xeon(R) CPU E5-2680 v4 @ 2.40GHz

When srun is executed from within salloc or sbatch, the constraint value can only contain a single feature name. None of the other operators are currently supported for job steps.
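
At the job level, however, the operators remain available, and constraints can also be set as a directive in a batch script. A minimal sketch accepting either CPU vendor (the application name is a placeholder):

#!/bin/bash
#SBATCH --constraint="intel|amd"

srun -- ./my_application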

Footnotes

  1. Software bug, Wikipedia
     https://en.wikipedia.org/wiki/Software_bug

  2. CPU socket, Wikipedia
     https://en.wikipedia.org/wiki/CPU_socket

  3. Central processing unit, Wikipedia
     https://en.wikipedia.org/wiki/Central_processing_unit

  4. Multi-core processor, Wikipedia
     https://en.wikipedia.org/wiki/Multi-core_processor

  5. Support for Multi-core/Multi-thread Architectures, SchedMD
     https://slurm.schedmd.com/mc_support.html

  6. Multi-chip module, Wikipedia
     https://en.wikipedia.org/wiki/Multi-chip_module

  7. CPU Management User and Administrator Guide, SchedMD
     https://slurm.schedmd.com/cpu_management.html