OpenMPI

Published

November 10, 2023

Modified

April 18, 2024

Abstract

This section introduces the basics of launching a program that uses the MPI (Message Passing Interface) parallel computing library. The example uses the Open MPI implementation of the MPI Standard.

Prerequisites

Use the spack command to load OpenMPI into your environment:

# list available OpenMPI versions
spack find mpi

# load a specific version
spack load openmpi@5.0.1%gcc@12.3.0 arch=linux-debian11-x86_64

# check if the compiler is available
mpicc --version

Process Manager

Slurm supports the Process Management Interface (PMI), specifically PMIx 1. PMI provides a common abstraction for HPC process managers, decoupling MPI libraries from the underlying process manager. The process manager has the following functions:

  • Start/stop of processes
  • Aggregation of the I/O channels std{in|out|err}
  • Environment and signal propagation
  • Central coordination point for the parallel processes

PMI is used by most MPI libraries to interact with any compliant process manager, e.g. Slurm, to fulfill the following roles:

  • Request the process manager to start processes on the nodes of a parallel machine
  • Propagate startup data via PMI out-of-band communication (see the example below)
  • Let processes use the out-of-band channel to set up MPI communication
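
One way to see this startup data in practice is to inspect the environment the process manager provides to each task. A minimal sketch, assuming the default launch mode is one of the PMIx plugins (the exact set of PMIX_* variables depends on the site configuration):

# launch two tasks and list the PMIx-related variables set by the process manager
srun -n 2 env | grep -i pmix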

Launch Modes

Slurm supports multiple modes to launch MPI processes:

  • Slurm launches the tasks and PMI initializes communication (default)
  • Slurm allocates resources, mpirun launches the tasks using Slurm (see the sketch below)
  • Slurm allocates resources, mpirun launches the tasks with a mechanism outside the control of Slurm (no CPU task binding, no task accounting)
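
The second mode is useful when an application is built around Open MPI's own launcher. A minimal sketch, assuming the loaded Open MPI build was compiled with Slurm support:

# let Slurm allocate the resources ...
salloc --nodes=2 --ntasks-per-node=2 bash

# ... and let mpirun start the tasks inside the allocation
mpirun $LUSTRE_HOME/bin/mpi-hello

# relinquish the allocation
exit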

List the supported MPI launch modes with:

srun --mpi=list

The launch mode can be selected using an environment variable or with a command-line option of the srun and sbatch commands:

# set the launch mode with an environment variable
export SLURM_MPI_TYPE=$mode
# set the launch mode with a command-line option
{srun|sbatch} --mpi=$mode ...
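
For example, to select a PMIx plugin explicitly for a single run of an MPI binary (the mode name must be one reported by srun --mpi=list; the program path is a placeholder):

# launch four tasks using the pmix_v2 plugin
srun --mpi=pmix_v2 -n 4 ./mpi-program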

Show the default mode for launching MPI applications by printing the Slurm system configuration with scontrol:

» scontrol show config | grep MpiDefault
MpiDefault              = pmix_v2

Example Program

The following C code exemplifies a basic “Hello World” MPI program:

// mpi-hello.c
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>
#include "mpi.h"

int main(int argc, char **argv)
{
    int rank;
    int size;
    char hostname[1024];
    // initialize the MPI execution environment
    MPI_Init(&argc, &argv);
    // determine the rank of this process and the total number of ranks
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    // identify the process by host name and PID
    pid_t pid = getpid();
    gethostname(hostname, sizeof(hostname));
    printf("Hello world %s.%d [%d/%d]\n", hostname, (int)pid, rank, size);
    // shut down the MPI environment
    MPI_Finalize();
    return 0;
}

Compile the program using mpicc:

mpicc $LUSTRE_HOME/src/mpi-hello.c -o $LUSTRE_HOME/bin/mpi-hello 

Execute the program with mpiexec, specifying the number of parallel processes with the option -n <numproc>:

# run the program with four parallel processes
mpiexec -n 4 $LUSTRE_HOME/bin/mpi-hello

Once you have verified that the program works as expected, you can continue to launch it on the resources of the compute cluster.

Slurm Job

Among the many options to specify the parallelism of a program, Slurm supports the following two:

Option                       Description
--nodes=<number>             Number of nodes to allocate
--ntasks-per-node=<ntasks>   Number of tasks per node

The following example requests resources using the salloc command, specifying a requirement of 2 nodes with 2 tasks per node:

>>> salloc --partition=debug --chdir=/tmp --nodes=2 --ntasks-per-node=2 bash
salloc: Pending job allocation 502707
salloc: job 502707 queued and waiting for resources
salloc: job 502707 has been allocated resources
salloc: Granted job allocation 502707

# print environment variables for the allocation
>>> env | egrep 'SLURM_(NTASKS|NNODES|NPROCS|NODELIST|DIST)'
SLURM_NTASKS=4
SLURM_NODELIST=lxbk[0724-0725]
SLURM_NPROCS=4
SLURM_NNODES=2
SLURM_NTASKS_PER_NODE=2

Once the system has granted the resources, start your MPI program using the srun command:

# run your MPI program
>>> srun $LUSTRE_HOME/bin/mpi-hello
Start Singularity container /cvmfs/vae.gsi.de/vae24/containers/vae24-user_container_20240418T1037.sif
Start Singularity container /cvmfs/vae.gsi.de/vae24/containers/vae24-user_container_20240418T1037.sif
Start Singularity container /cvmfs/vae.gsi.de/vae24/containers/vae24-user_container_20240418T1037.sif
Start Singularity container /cvmfs/vae.gsi.de/vae24/containers/vae24-user_container_20240418T1037.sif
Hello world lxbk0724.40528 [1/4]
Hello world lxbk0725.164848 [2/4]
Hello world lxbk0724.40529 [0/4]
Hello world lxbk0725.164849 [3/4]

# relinquish the job allocation
>>> exit
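
The same workflow can be submitted non-interactively with sbatch. A minimal batch script sketch, reusing the partition and resource options from the example above (the file name mpi-hello.sh is arbitrary):

#!/bin/bash
#SBATCH --partition=debug
#SBATCH --chdir=/tmp
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=2

# srun starts one MPI task per allocated slot using the default launch mode
srun $LUSTRE_HOME/bin/mpi-hello

Submit the script with sbatch mpi-hello.sh and inspect the job output file once the job completes.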

Footnotes

  1. Process Management Interface - Exascale, PMIx Community
    https://pmix.org/