Glossary

Modified

November 14, 2023

Accounting Database

Users are associated with accounts, which define the resources available for job execution. User activity on the cluster, in particular resource allocations, is recorded per account. The records in the accounting database are used to calculate a fair-share score, which in turn determines the priorities assigned to all jobs.
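To make the link between recorded usage and job priority concrete, the sketch below implements the classic fair-share factor from Slurm's multifactor priority plugin, F = 2^(-U/S), where U is an account's normalized (decayed) usage and S its normalized share. This is an assumption for illustration: newer Slurm releases default to the Fair Tree algorithm, and the formula actually configured on this cluster is not stated here. The record format and the function names `decayed_usage` and `fair_share_factor` are hypothetical, and the exponential half-life decay is a simplification of Slurm's periodic usage decay.

```python
def decayed_usage(records, half_life_days, now_days):
    """Sum recorded usage, halving a record's weight every half_life_days.

    records: (day_recorded, cpu_seconds) pairs; a hypothetical record format.
    """
    return sum(
        cpu_seconds * 0.5 ** ((now_days - day) / half_life_days)
        for day, cpu_seconds in records
    )


def fair_share_factor(usage, total_usage, shares, total_shares):
    """Classic fair-share factor F = 2**(-U/S).

    U: the account's fraction of all (decayed) usage on the cluster.
    S: the account's fraction of all shares.
    F is 1.0 for an unused account, 0.5 when usage equals the share,
    and falls toward 0 as usage exceeds the share.
    """
    norm_usage = usage / total_usage if total_usage > 0 else 0.0
    norm_shares = shares / total_shares
    return 2 ** (-norm_usage / norm_shares)
```

An account that has consumed exactly its share gets F = 0.5; heavier users drop toward 0, so their jobs receive lower priority than those of accounts that have used less than their share.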

Batch Script

A batch script is a small program, typically written in a scripting language such as Bash. It is passed as a command-line argument to the sbatch command of the Slurm Workload Manager. Users should use it to specify Slurm configuration options via meta-commands (lines starting with #SBATCH) and to start a defined application program. A batch script is also commonly used to track the application program's behavior and to read and log information required for debugging, cf. first-issue in the example section.
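As an illustration of the three roles described above (meta-commands, starting the application, logging for debugging), the following Python sketch renders the text of a minimal Bash batch script. The helper `render_batch_script` and the application `./my_app` are hypothetical; `#SBATCH`, `srun`, and the `SLURM_JOB_ID` and `SLURM_JOB_NODELIST` environment variables are standard Slurm features, while the chosen options are only examples.

```python
import shlex


def render_batch_script(options, command, shell="/bin/bash"):
    """Return the text of a minimal Slurm batch script (hypothetical helper).

    options: sbatch long options without the leading dashes, e.g. {"time": "00:10:00"}.
    command: the application to start, as an argument list.
    """
    lines = [f"#!{shell}"]
    # Meta-commands: sbatch reads #SBATCH lines placed before the first command.
    lines += [f"#SBATCH --{name}={value}" for name, value in options.items()]
    # Record where and when the job ran; useful when debugging a failed job.
    lines += [
        'echo "job ${SLURM_JOB_ID} started on $(hostname) at $(date)"',
        'echo "allocated nodes: ${SLURM_JOB_NODELIST}"',
    ]
    # Start the application and pass its exit status on to Slurm.
    lines += [
        f"srun {shlex.join(command)}",
        "status=$?",
        'echo "application exited with status ${status}"',
        "exit ${status}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_batch_script(
        {"job-name": "example", "time": "00:10:00", "ntasks": 4},
        ["./my_app", "--input", "data.txt"],  # hypothetical application
    ))
```

Writing the output to a file such as `job.sh` and running `sbatch job.sh` would submit it. The script exits with the application's status so that Slurm records a failed run as failed rather than as completed.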

Cluster Controller

The cluster controller is the central entity that provides users and operators with access to all resources. It runs the job scheduler and handles all communication within the cluster.

Compute Cluster

A compute cluster is a collection of compute nodes integrated into a single system by a workload management system. The system accepts work requests from users as “jobs” and places these jobs in a pending area, the “queue”. Queued jobs are scheduled to the available resources according to sharing policies and efficiency constraints. The Slurm architecture section provides a basic overview of a workload management system.

Compute Job

A job is the basic unit of work that users submit to the workload management system. A job comprises the application a user wants to execute as well as the definition of a resource allocation. An application is an executable program, a set of commands, or a script.
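The two parts of a job, the application and the resource allocation, can be modelled as a small data structure. In this sketch the class names, fields, and default values are hypothetical; the method maps them onto real `sbatch` options, using `--wrap` to submit a plain command without writing a batch script first.

```python
import shlex
from dataclasses import dataclass, field


@dataclass
class ResourceAllocation:
    # Hypothetical defaults, chosen only for illustration.
    nodes: int = 1
    ntasks: int = 1
    cpus_per_task: int = 1
    mem_per_cpu_mb: int = 1024
    time_limit_min: int = 60


@dataclass
class ComputeJob:
    application: list[str]  # executable program plus its arguments
    resources: ResourceAllocation = field(default_factory=ResourceAllocation)

    def sbatch_argv(self):
        """Translate the job into an sbatch command line (argument list)."""
        r = self.resources
        return [
            "sbatch",
            f"--nodes={r.nodes}",
            f"--ntasks={r.ntasks}",
            f"--cpus-per-task={r.cpus_per_task}",
            f"--mem-per-cpu={r.mem_per_cpu_mb}M",
            f"--time={r.time_limit_min}",  # a bare number means minutes
            # --wrap makes sbatch generate a minimal batch script around the command.
            f"--wrap={shlex.join(self.application)}",
        ]
```

For example, `ComputeJob(["python3", "train.py"], ResourceAllocation(ntasks=4)).sbatch_argv()` yields an argument list ending in `--wrap=python3 train.py` that could be passed to `subprocess.run`.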

Compute Node

The individual computers that execute user jobs. The typical configuration of these nodes is detailed in the hardware section. The terms execution node and batch node are used synonymously.

Scheduler Queue

Jobs in the queue are sorted by priority by the resource scheduler component of the workload management system, with priorities calculated according to a fair-share algorithm. As soon as free resources become available, a “matching” job is selected from the queue and sent to the compute nodes.
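The selection step can be sketched as a loop over the queue in priority order that starts every job whose request still fits into the free resources. This is a deliberately simplified model, assuming CPUs are the only resource; the class and function names are hypothetical, and a production scheduler such as Slurm's backfill scheduler only lets a lower-priority job overtake a blocked one if doing so does not delay the blocked job's expected start.

```python
from dataclasses import dataclass


@dataclass
class PendingJob:
    name: str
    priority: float  # e.g. derived from the fair-share factor
    cpus: int        # requested resources (CPUs only in this simplified model)


def select_jobs(queue, free_cpus):
    """Return the names of the jobs to start now and the CPUs left over.

    Jobs are considered in descending priority order; each job whose
    request fits into the remaining free CPUs is started.
    """
    started = []
    for job in sorted(queue, key=lambda j: j.priority, reverse=True):
        if job.cpus <= free_cpus:
            started.append(job.name)
            free_cpus -= job.cpus
    return started, free_cpus


if __name__ == "__main__":
    queue = [PendingJob("a", 10, 8), PendingJob("b", 50, 32), PendingJob("c", 30, 16)]
    print(select_jobs(queue, free_cpus=40))  # (['b', 'a'], 0)
```

In the example, job `c` does not fit after `b` has started, so the smaller, lower-priority job `a` uses the remaining CPUs.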

Submit Node

Specific compute nodes that users can log in to. They are built to let users interact with the compute cluster and request resources from the cluster controller. From these nodes, users submit applications as compute jobs, and they can access all storage systems.
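Submission from a submit node can also be scripted. The sketch below calls `sbatch --parsable`, which prints only the job ID (followed by `;cluster` when a cluster name is reported), and parses that ID. The function names are hypothetical, and the code assumes the Slurm client commands are on the `PATH`, as they are on a submit node.

```python
import subprocess


def parse_job_id(sbatch_output):
    """Extract the job ID from `sbatch --parsable` output ("jobid" or "jobid;cluster")."""
    return int(sbatch_output.strip().split(";")[0])


def submit(script_path):
    """Submit a batch script with sbatch and return the new job's ID.

    Must run where the Slurm client commands are installed, e.g. a submit node.
    Raises subprocess.CalledProcessError if sbatch rejects the job.
    """
    result = subprocess.run(
        ["sbatch", "--parsable", script_path],
        capture_output=True, text=True, check=True,
    )
    return parse_job_id(result.stdout)
```

On a submit node, `submit("job.sh")` (with a hypothetical script name) returns the numeric job ID, which can then be passed to commands such as `squeue -j` or `scancel`.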