Slurm
Slurm (originally an acronym for Simple Linux Utility for Resource Management) is a cluster management and job scheduling system used in high-performance computing (HPC) clusters. This manual covers the basic commands for submitting and managing jobs in our Slurm environment.
Basic Commands
- sinfo: Displays the status of cluster nodes and partitions
- sbatch: Submits a job script to run in batch mode
- srun: Runs an interactive job, or a job step inside an sbatch script
- squeue: Shows running and queued jobs
- scancel: Cancels a job
- sacct: Shows job history (accounting data)
Accounting
To launch jobs with Slurm (sbatch, salloc, srun) you must specify a billing account, also known as a Slurm account, with the -A or --account option. You can check which Slurm account your project is associated with like this:
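A minimal sketch of that check, using the same sacctmgr query that appears under Common pitfalls below. The wrapper name show_my_accounts is hypothetical; the command inside it is what you would type on a login node.

```shell
# show_my_accounts is a hypothetical helper name, not a Slurm command.
# It lists every Slurm account associated with the current user.
show_my_accounts() {
    sacctmgr show user "$USER" withassoc format=User,Account
}

# Usage on a login node:
#   show_my_accounts
```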
Example output
Common pitfalls
If the Slurm account name is longer than the column width, it is truncated and shown with a + sign at the end, like this:
[pruebas@srvlogin02 ~]$ sacctmgr show user $USER withassoc format=User,Account
User Account
---------- ----------
pruebas cuentadep+
To see the full account name, set a wider column for the account field, e.g. 40 characters (Account%40), and increase the width until the + sign no longer appears:
sacctmgr show user $USER withassoc format=User,Account%40
sacctmgr show user $USER withassoc format=User,Account%<number of characters>
Multiple Slurm Accounts
If you are part of more than one project, you will have one user login with access to multiple Slurm accounts. Take Bob as an example: he is a researcher working on two different projects, Project-Apples and Project-Oranges. Bob has a single user login (bob) with access to two Slurm accounts: project_apples and project_oranges. When Bob works on Project-Apples he should use the corresponding project_apples Slurm account, like this:
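As a sketch of Bob's case: the account names come from the example above, while the helper name submit_as and the script name my_job.sbatch are placeholders. Typed directly, the Project-Apples submission would be sbatch --account=project_apples my_job.sbatch.

```shell
# submit_as is a hypothetical helper: the first argument is the Slurm account,
# everything after it is passed to sbatch unchanged.
submit_as() {
    _acct="$1"
    shift
    sbatch --account="$_acct" "$@"
}

# Bob working on Project-Apples:
#   submit_as project_apples my_job.sbatch
# Bob working on Project-Oranges:
#   submit_as project_oranges my_job.sbatch
```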
When you launch jobs please make sure to assign them to the correct Slurm account.
Ease of use
If you only have one Slurm account we recommend that you add these lines at the end of your ~/.bashrc. Every time you log in this will export an environment variable that contains your account name, so that you can reference it easily. Just replace cuentadepruebas with your full Slurm account name.
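The exact lines are site-specific; as an assumption, a single export along these lines would do. The variable name MY_SLURM_ACCOUNT is hypothetical, and cuentadepruebas is the example account used in this guide.

```shell
# Append to ~/.bashrc. MY_SLURM_ACCOUNT is a hypothetical variable name;
# replace cuentadepruebas with your full Slurm account name.
export MY_SLURM_ACCOUNT="cuentadepruebas"
```

Open a new login shell (or run source ~/.bashrc) for the variable to take effect.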
Now you can launch jobs like this:
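For illustration, assuming the variable exported in ~/.bashrc is called MY_SLURM_ACCOUNT (a hypothetical name) and the job script is my_job.sbatch, the direct form would be sbatch -A "$MY_SLURM_ACCOUNT" my_job.sbatch. The sketch below wraps it in a small function that refuses to submit when the variable is missing.

```shell
# sbatch_acct is a hypothetical wrapper; MY_SLURM_ACCOUNT is an assumed
# variable name. If the variable is unset or empty, the :? expansion prints
# the message and no job is submitted.
sbatch_acct() {
    sbatch --account="${MY_SLURM_ACCOUNT:?export MY_SLURM_ACCOUNT in ~/.bashrc first}" "$@"
}

# Usage:
#   sbatch_acct my_job.sbatch
```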
Billing for Exclusive Jobs
If you submit jobs with the --exclusive flag, you will be billed for all node resources (all CPU cores and all GPUs), even if you specify a limited number of cores/GPUs.
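A worked example with made-up numbers may help. It assumes a 64-core node and that usage is counted in core-hours; neither figure comes from this guide, so check your site's actual node sizes and billing units.

```shell
# All values below are hypothetical, for illustration only.
node_cores=64        # cores on one node (assumed)
requested_cores=4    # cores the job actually asks for
hours=2              # job runtime in hours

shared=$((requested_cores * hours))
exclusive=$((node_cores * hours))

echo "without --exclusive: ${shared} core-hours"
echo "with --exclusive:    ${exclusive} core-hours"
```

With these numbers the exclusive job is charged 128 core-hours instead of 8.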
Launch Jobs
There are several ways to launch jobs in Slurm. Here we cover the two main approaches.
Interactive Jobs
Interactive mode is useful for testing, development, or working directly on a compute node.
Example 1. Simple interactive session: Requests default resources and opens a bash shell on the allocated compute node
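A sketch of Example 1, assuming the account is passed explicitly as required in the Accounting section. The helper name interactive_shell is hypothetical; the srun line inside it is the command itself.

```shell
# interactive_shell is a hypothetical helper; $1 is your Slurm account.
# --pty connects your terminal to the bash shell started on the compute node.
interactive_shell() {
    srun --account="$1" --pty bash
}

# Usage (cuentadepruebas is the example account from this guide):
#   interactive_shell cuentadepruebas
```

Type exit to leave the shell and release the allocation.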
Example 2. Specify resources: Requests 1 node, 4 tasks (processes, one core each by default), a 1-hour runtime and 4 GB of RAM
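One plausible spelling of Example 2, using standard srun options. The helper name interactive_4tasks is hypothetical, and the exact flags used on this cluster may differ.

```shell
# interactive_4tasks is a hypothetical helper; $1 is your Slurm account.
# Requests 1 node, 4 tasks, 1 hour of wall time and 4 GB of memory per node,
# then opens a bash shell on the allocated node.
interactive_4tasks() {
    srun --account="$1" --nodes=1 --ntasks=4 --time=01:00:00 --mem=4G --pty bash
}

# Usage (cuentadepruebas is the example account from this guide):
#   interactive_4tasks cuentadepruebas
```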
For more advanced customization please check the official Slurm documentation for srun.
Batch Jobs
Batch mode is ideal for production workloads or long-running jobs. All job instructions go into a script that is submitted with sbatch.
Example sbatch script (my_job.sbatch). Replace <slurm account> with your Slurm account:
#!/bin/bash
#SBATCH --job-name=my_job
#SBATCH --account=<slurm account>
#SBATCH --output=output_%j.out
#SBATCH --error=error_%j.err
#SBATCH --ntasks=4
#SBATCH --time=02:00:00
#SBATCH --mem=8G
# load modules (optional)
module load python/3.10
# run the program
python my_script.py
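To submit the script, pass it to sbatch. A minimal sketch, assuming the file is saved as my_job.sbatch in the current directory; the helper name submit_job is hypothetical and relies on sbatch's --parsable option so that only the job ID is printed.

```shell
# submit_job is a hypothetical helper. With --parsable, sbatch prints
# "jobid" (or "jobid;cluster") instead of "Submitted batch job jobid".
submit_job() {
    out=$(sbatch --parsable "$@") || return 1
    # Drop the optional ";cluster" suffix and print the bare job ID.
    echo "${out%%;*}"
}

# Usage:
#   job_id=$(submit_job my_job.sbatch)
```

With the directives above, %j expands to the job ID, so a job with the (made-up) ID 12345 would write to output_12345.out and error_12345.err.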
For more advanced customization please check the official Slurm documentation for sbatch.
Monitor Jobs
Display all running/queued jobs
Display only your jobs
Check job details
Cancel job
List job history
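The five tasks above map onto standard Slurm commands. A minimal sketch, with each command wrapped in a hypothetical one-line helper so the task it belongs to is explicit; the command inside each function body is what you would type directly, and the job ID argument is whatever sbatch or squeue reported.

```shell
# Helper names are hypothetical; the commands inside them are standard Slurm.

# Display all running/queued jobs
jobs_all()    { squeue; }

# Display only your jobs
jobs_mine()   { squeue -u "$USER"; }

# Check job details; $1 is the job ID
job_details() { scontrol show job "$1"; }

# Cancel job; $1 is the job ID
job_cancel()  { scancel "$1"; }

# List job history; by default sacct reports today's jobs, and
# --starttime=YYYY-MM-DD reaches further back
job_history() { sacct "$@"; }
```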
Additional Documentation
If you need help with something that is not covered in this guide, please check the official Slurm documentation.