Accounts
Slurm is configured to collect accounting information for resource allocations. This information is a prerequisite for distributing the cluster resources among all users according to a fair-share algorithm. A Slurm account is comparable to a bank account in the sense that it holds a credit for a defined resource share, to be consumed by the account users over a given time period. Users are associated with an account by their Linux username. Management of the account users and of the resource distribution within the account is delegated to account coordinators. Users can be associated with multiple accounts and need to request access to account resources by contacting the corresponding account coordinator. Account users and sub-accounts can be configured with resource limits in order to prevent individual users from allocating resources beyond defined limits or shares within the account fair-share.
Default Account
Option | Description |
---|---|
-A, --account | Account name used for job allocation |
Users need to request resources on behalf of a specified account. Each user has a default account, which is used unless it is overridden by configuration. The following command lists all accounts associated with a Linux username, including the default account:
sacctmgr show user withassoc format=account,user,defaultaccount where user=$USER
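Before submitting, it can be useful to check that an account is really associated with the user. The following is a minimal sketch, assuming the account list is produced with the sacctmgr options --noheader and --parsable2 so that each line holds one account name. The helper name has_account and the account names alpha and beta are hypothetical.

```shell
# Return success if the account given as first argument appears in the
# list of account names read from standard input (one name per line).
# has_account is a hypothetical helper, not part of Slurm.
has_account() {
    grep -qxF -- "$1"
}

# On a cluster, the list would come from sacctmgr, for example:
#   sacctmgr --noheader --parsable2 show user withassoc \
#       format=account where user=$USER | has_account example_account
# Here a fixed list stands in for the sacctmgr output.
if printf '%s\n' alpha beta | has_account beta; then
    echo "beta is available"
fi
```

The -x and -F options make grep compare whole lines literally, so an account named alpha does not match alpha2.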
To specify the account for a single job, use the option -A with the account name as value. Alternatively, override the default account with the environment variables listed in the table below. Note that these variables take precedence over account settings defined with meta-commands (#SBATCH directives in a batch script).
Variable | Description |
---|---|
SLURM_ACCOUNT | Interpreted by srun command |
SALLOC_ACCOUNT | Interpreted by salloc command |
SBATCH_ACCOUNT | Interpreted by sbatch command |
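The precedence between the different ways of selecting an account can be modelled in a few lines. This is only an illustration of the order for sbatch (command-line option, then SBATCH_ACCOUNT, then a #SBATCH directive, then the default account), not how Slurm implements it. The function name effective_account and all account names are hypothetical.

```shell
# Model which account a job would be charged to, following the order:
# command-line option > SBATCH_ACCOUNT > #SBATCH directive > default account.
# effective_account and its four arguments are hypothetical.
effective_account() {
    cli_value="$1"        # value passed with -A/--account, may be empty
    env_value="$2"        # value of SBATCH_ACCOUNT, may be empty
    directive_value="$3"  # value of "#SBATCH --account=..." in the script, may be empty
    default_value="$4"    # default account of the user

    if [ -n "$cli_value" ]; then
        echo "$cli_value"
    elif [ -n "$env_value" ]; then
        echo "$env_value"
    elif [ -n "$directive_value" ]; then
        echo "$directive_value"
    else
        echo "$default_value"
    fi
}

# No -A option given, but SBATCH_ACCOUNT is set: the variable wins over
# the directive in the script.
effective_account "" "from_env" "from_script" "from_default"
```

With the arguments shown, the function prints from_env, which mirrors the note that the environment variables override meta-commands.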
Coordinators
Collaborations and experiments associated with GSI/FAIR can apply for a Slurm account. Print a list of all accounts with their corresponding coordinators using the sacctmgr command:
# list all accounts with their respective coordinators
sacctmgr list account format=acc%10,co%30 withco
# list users for a specific account
sacctmgr list account format=acc,us withas where account=$name
Coordinators can manage all jobs associated with their account using the scancel and scontrol [1] commands:
scontrol hold # Prevent a pending job from being started
scontrol release # Release a previously held job to begin execution
scontrol requeue # Requeue a batch job into pending state
scontrol show job # Print details of a specific job
scontrol update job # Change a job configuration
scontrol show step # Print details of a specific job step
scontrol update step # Change a job step configuration
scancel # Terminate jobs
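A coordinator who wants to hold every pending job of an account could combine squeue with scontrol hold. The sketch below only prints the commands so that they can be reviewed first. The helper name hold_commands, the account name example_account, and the job IDs are hypothetical.

```shell
# Print one "scontrol hold" command per job ID read from standard input.
# The commands are only printed, not executed, so the output can be
# reviewed first. hold_commands is a hypothetical helper name.
hold_commands() {
    while read -r job_id; do
        # skip empty lines
        if [ -n "$job_id" ]; then
            echo "scontrol hold $job_id"
        fi
    done
    return 0
}

# On a cluster the job IDs could come from squeue, for example:
#   squeue --noheader --account=example_account --states=PENDING --format=%i
# Here two made-up job IDs stand in for the squeue output.
printf '%s\n' 1001 1002 | hold_commands
```

Once the printed commands look right, the output can be piped to sh to execute them.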
The following examples show common uses of the sacctmgr [2] command for working with Slurm user accounts. The name= field must be the Linux account name of a particular user:
# list all associations for a given user name
sacctmgr show user withassoc format=account,user,defaultaccount where user=$user
# associate a user to an account
sacctmgr add user account=$account names=$user[,$user]
# modify a user association
sacctmgr modify user where user=$name set $key=$value
# remove a user's account association
sacctmgr delete user name=$name account=$account
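When several users are added at once, the names= value is a comma-separated list. A small sketch of building that command from separate arguments follows. The helper name add_users_command and the account and user names are hypothetical, and the command is printed rather than executed.

```shell
# Build the sacctmgr command that associates several users with one account.
# add_users_command is a hypothetical helper; it prints the command
# rather than running it.
add_users_command() {
    account="$1"
    shift
    # join the remaining arguments (user names) with commas
    names=$(IFS=,; echo "$*")
    echo "sacctmgr add user account=$account names=$names"
}

add_users_command example_account alice bob
```

The join relies on the shell rule that "$*" separates arguments with the first character of IFS, set here only inside the command substitution.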
An account coordinator can promote a user to coordinator:
# list coordinators for a given account
sacctmgr list account withcoordinator where account=$account
# add coordinator(s) to account
sacctmgr add coordinator account=$account names=$name,...
Account Limits
Limit | Description |
---|---|
GrpCPUMins | Hard limit of CPU minutes. |
GrpCPUs | Total count of CPUs able to be used. |
GrpJobs | Total number of jobs able to run at any given time. |
GrpMemory | Total amount of memory (MB) able to be used. |
GrpNodes | Total count of nodes able to be used at any given time. |
GrpSubmitJobs | Total number of jobs able to be submitted. |
GrpWall | Maximum aggregate wall clock time that running jobs can be allocated. |
Resource limits [3] allow finer-grained restriction of allocatable resources for a given association (typically for a user). Parent association limits are inherited by children unless dedicated limits have been set. Children can, however, have higher limits than their parents. Coordinators configure limits with the sacctmgr command:
# set the maximum number of jobs for a user
sacctmgr modify user $name set GrpJobs=1000
# clear a resource limit with a negative value
sacctmgr modify user $name set GrpCPUs=-1
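GrpCPUMins is expressed in CPU minutes, while budgets are often discussed in CPU hours. The following sketch converts one to the other and prints the matching sacctmgr command. The helper name cpu_hours_to_minutes, the user name example_user, and the budget of 5000 CPU hours are hypothetical.

```shell
# Convert a budget in CPU hours to CPU minutes, the unit of GrpCPUMins.
# cpu_hours_to_minutes is a hypothetical helper name.
cpu_hours_to_minutes() {
    echo $(( $1 * 60 ))
}

# Made-up budget of 5000 CPU hours for a made-up user.
budget_hours=5000
budget_minutes=$(cpu_hours_to_minutes "$budget_hours")

# Print the command for review instead of executing it.
echo "sacctmgr modify user example_user set GrpCPUMins=$budget_minutes"
```

For the values shown, 5000 CPU hours correspond to 300000 CPU minutes.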
Footnotes
[1] scontrol Manual Page, SchedMD, https://slurm.schedmd.com/scontrol.html
[2] sacctmgr Manual Page, SchedMD, https://slurm.schedmd.com/sacctmgr.html
[3] Resource Limits, SchedMD Slurm Documentation, https://slurm.schedmd.com/resource_limits.html
[4] Fairshare and Job Accounting, Harvard University, FAS Research Computing, https://docs.rc.fas.harvard.edu/kb/fairshare/
[5] sshare Manual Page, SchedMD, https://slurm.schedmd.com/sshare.html