Accounts

Modified: November 10, 2023

Abstract

Slurm is configured to collect accounting information for resource allocations. This information is a prerequisite for distributing the cluster resources among all users according to a fair-share algorithm. A Slurm account is comparable to a bank account in the sense that it holds a credit for a defined resource share, to be consumed by the account users over a given time period. Users are associated with an account by their Linux username. Management of the account users and the account-internal resource distribution is delegated to account coordinators. Users can be associated with multiple accounts and need to request access to account resources by contacting the corresponding account coordinator. Account users and sub-accounts can be configured with resource limits in order to prevent individual users from allocating resources beyond defined limits or shares within the account fair-share.

Default Account

Option         Description
-A, --account  Account name used for job allocation

Users need to request resources on behalf of a specified account. Each user has a defined default account, unless it is overwritten by a configuration. The following command lists all accounts associated with a Linux username, including the default account:

sacctmgr show user withassoc format=account,user,defaultaccount where user=$USER

In order to specify the account for a single job, use option -A with the account name as value. Alternatively, overwrite the default account with the environment variables listed in the table below. Note that these variables take precedence over account settings defined with meta-commands.

Variable        Description
SLURM_ACCOUNT   Interpreted by the srun command
SALLOC_ACCOUNT  Interpreted by the salloc command
SBATCH_ACCOUNT  Interpreted by the sbatch command
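
For example, to charge a single job to a specific account (the account name hpc below is hypothetical), pass the option directly or export the corresponding variable:

# submit a batch job on behalf of the account "hpc"
sbatch --account=hpc job.sh

# alternatively, overwrite the default account for all subsequent sbatch calls
export SBATCH_ACCOUNT=hpc
sbatch job.sh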

Coordinators

Collaborations and experiments associated with GSI/FAIR can apply for a Slurm account. Print a list of all accounts with their corresponding coordinators using the sacctmgr command:

# list all accounts with their respective coordinators
sacctmgr list account format=acc%10,co%30 withco

# list users for a specific account
sacctmgr list account format=acc,us withas where account=$name

Coordinators can manage all jobs associated with their account with the scancel and scontrol [1] commands:

scontrol hold        # Prevent a pending job from being started
scontrol release     # Release a previously held job to begin execution
scontrol requeue     # Requeue a batch job into pending state
scontrol show job    # Print details of a specific job
scontrol update job  # Change a job configuration
scontrol show step   # Print details of a specific job step
scontrol update step # Change a job step configuration
scancel              # Terminate jobs
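
For illustration, a typical coordinator workflow might look like the following (the job ID, user and account names are hypothetical):

# hold a pending job, inspect it, then release it
scontrol hold 12345
scontrol show job 12345
scontrol release 12345

# cancel all jobs of a specific user within the coordinated account
scancel --account=hpc --user=jdoe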

In the following, common uses of the sacctmgr [2] command to work with Slurm user accounts are shown. The name= field requires the Linux account name of a particular user:

# list all associations for a given user name
sacctmgr show user withassoc format=account,user,defaultaccount where user=$user

# associate a user with an account
sacctmgr add user account=$account names=$user[,$user]

# modify a user association
sacctmgr modify user where user=$user set $key=$value

# remove a user's account association
sacctmgr delete user name=$user account=$account
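
A concrete sequence, assuming a hypothetical user jdoe and account hpc, could look like:

# associate user "jdoe" with account "hpc"
sacctmgr add user account=hpc names=jdoe

# verify the new association
sacctmgr show user withassoc format=account,user,defaultaccount where user=jdoe

# remove the association again
sacctmgr delete user name=jdoe account=hpc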

An account coordinator can promote a user to be coordinator:

# list coordinators for a given account
sacctmgr list account withcoordinator where account=$account

# add coordinator(s) to account
sacctmgr add coordinator account=$account names=$name,...

Account Limits

Limit          Description
GrpCPUMins     Hard limit of CPU minutes.
GrpCPUs        Total count of CPUs able to be used.
GrpJobs        Total number of jobs able to run at any given time.
GrpMemory      Total amount of memory (MB) able to be used.
GrpNodes       Total count of nodes able to be used at any given time.
GrpSubmitJobs  Total number of jobs able to be submitted.
GrpWall        Maximum wall clock time for any job submitted.

Resource limits [3] allow a finer-grained restriction of allocatable resources for a given association (typically for a user). Parent association limits are inherited by children, unless dedicated limits have been set; children can, however, have higher limits than their parents. Coordinators configure limits with the sacctmgr command:

# set the maximum number of jobs for a user
sacctmgr modify user where user=$user set GrpJobs=1000

# clear a resource limit with a negative value
sacctmgr modify user where user=$user set GrpCPUs=-1
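
The configured limits of an association can be verified with a listing; a sketch, assuming standard sacctmgr association format fields:

# show current limits for a user association
sacctmgr show association where user=$user format=account,user,grpjobs,grpwall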

Fair-Share

Fairshare essentially defines the accumulated amount of resources a user can allocate on the cluster within a defined time frame. In other words, fairshare limits users to a configured fraction of the system, based on a score calculated from the accounted resource usage in the user's past. Based on the fairshare score, users are assigned a priority in order to approach a configured share equilibrium between all users of the system in the long run.

While Fairshare may seem complex and confusing, it is actually quite logical once you think about it. The scheduler needs some way to adjudicate who gets what resources. Different groups on the cluster have been granted different resources for various reasons. In order to serve the great variety of groups and needs on the cluster a method of fairly adjudicating job priority is required. This is the goal of Fairshare. Fairshare allows those users who have not fully used their resource grant to get higher priority for their jobs on the cluster, while making sure that those groups that have used more than their resource grant do not overuse the cluster. The cluster is a limited resource and Fairshare allows us to ensure everyone gets a fair opportunity to use it regardless of how big or small the group is. [4]

sshare [5] lists the shares of associations in a cluster:

sshare -u $USER                   # for a specific user
sshare -a -A $account             # all users of a specific account
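
The long listing additionally shows the raw shares and accumulated usage on which the score is based:

# long listing including normalized shares and usage
sshare -l -a -A $account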

The FairShare column shows the fairshare score for an account/user, which is a value between 1.0 and 0 interpreted as follows:

Value          Description
1.0            Unused
1.0 > v > 0.5  Under-utilization
0.5            Average utilization
0.5 > v > 0    Over-utilization
0              No share left

This value indicates how many resources an account/user has consumed in relation to its total granted share.
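
For reference, the classic multifactor priority plugin calculates this score as a sketch of the form

F = 2^(-U/S)

where U is the effective (decayed, normalized) usage of the association and S its normalized share: U = 0 yields F = 1.0 (unused), U = S yields F = 0.5 (average utilization), and usage far beyond the granted share drives F towards 0. The exact formula depends on the priority algorithm configured on the cluster, e.g. Fair Tree.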

Since the usage of the cluster varies, the scheduler does not stop accounts from using more than their granted share. Instead, the scheduler wants to fill idle cycles, so it will take whatever jobs it has available. Thus an account is essentially borrowing computing resource time in the future to use now. This will continue to drive down the account's fairshare score, but allows jobs for the account to still start. Eventually, another account with a higher fairshare score will start submitting jobs, and that account's jobs will have a higher priority because they have not used their granted share. Fairshare only recovers as an account reduces its workload to allow other accounts to run. The half-life helps to expedite this recovery.
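
The decay half-life is a site-wide configuration value (PriorityDecayHalfLife); it can be inspected with:

# show the configured half-life used to decay accounted usage
scontrol show config | grep -i PriorityDecayHalfLife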

Footnotes

  1. scontrol Manual Page, SchedMD
     https://slurm.schedmd.com/scontrol.html

  2. sacctmgr Manual Page, SchedMD
     https://slurm.schedmd.com/sacctmgr.html

  3. Resource Limits, SchedMD Slurm Documentation
     https://slurm.schedmd.com/resource_limits.html

  4. Fairshare and Job Accounting, Harvard University, FAS Research Computing
     https://docs.rc.fas.harvard.edu/kb/fairshare/

  5. sshare Manual Page, SchedMD
     https://slurm.schedmd.com/sshare.html