Submitting Jobs on Spiedie


Table of Contents

  1. Using srun
    1. Running a simple program
    2. Interactive Session
  2. Using sbatch
    1. Writing a run script
      1. Directives
      2. Shell Commands
      3. Parallel Programs
    2. Submitting the run script
  3. Customizing resource allocation
    1. Using Spiedie-specific Directives or Features
    2. Increase memory allocation

Using srun

Both the srun and sbatch commands have similar capabilities and sets of parameters. srun is used to submit a job for execution in real time and blocks the terminal: you will not be able to issue other commands while the program executes, and you must keep the connection open while the program runs. It is generally recommended to use srun for quick test runs and simple workflows.

Running a simple program

To queue a quick test program use:

srun -n4 -N1 <program run command>

Here we have specified one node (-N1) and four tasks (cores) (-n4), and use defaults for the other parameters.
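
For instance, a quick way to confirm the allocation is to launch a standard command such as hostname with the same flags (purely an illustration):

srun -n4 -N1 hostname

The hostname of the allocated node is printed once per task, confirming that four tasks ran on a single node.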

Interactive Session

A common use case for srun is to open an interactive shell session on a cluster.

To start an interactive session, run:

srun -n1 --partition=<partition_name> --pty bash 

This grants direct access to a single node in the partition specified by partition_name once it becomes available.
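
As a concrete sketch, assuming the cluster has a partition named debug (substitute an actual Spiedie partition name):

srun -n1 --partition=debug --pty bash

When the shell opens you are working directly on a compute node; type exit to end the session and release the allocation.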

For more information on partitions, click here. For a list of all srun parameters, click here.

Using sbatch

The typical way to run a job on a cluster is to write a submission script and run the sbatch command. This allows for greater control and parameter passing. sbatch jobs are also queued by the scheduler and executed when resources become available, so you can log out without affecting the submitted job.

Writing a run script

*** Note: Take a look at example_run_script.sh for a complete run script ***

The very first line of the script must be the shebang, such as:

#!/bin/bash

Directives

We can include sbatch parameter directives after the shebang to specify the parameters we wish to pass to SLURM. sbatch accepts the same parameters as srun. For a list of all sbatch parameters, click here.

Some example run script directives would be,

#SBATCH --job-name=name_of_job
#SBATCH --output=output_file_name.txt
#SBATCH --error=error_file_name.log
#SBATCH --ntasks=1

The above directives specify the job name, the standard output and standard error file names, and the number of tasks. If output and error files are not specified, both standard output and standard error will be written to slurm-<job_id>.out by default.
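
To keep output files from different runs separate, SLURM filename patterns can be used in these directives; for example, %j expands to the job ID and %x to the job name. A minimal sketch:

#SBATCH --job-name=name_of_job
#SBATCH --output=%x_%j.out
#SBATCH --error=%x_%j.err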

Shell Commands

Finally, we simply append the shell command that instructs SLURM to run the desired program.

<program run command> 

For example, to run a Python script we would write:

python python_script.py

Parallel Programs

To run an MPI program, we would write:

module load OpenMPI 
srun MPI_program.mpi 

It is recommended to use srun inside the sbatch run script, as srun automatically launches the required number of processes.

module load OpenMPI loads the MPI runtime library prior to running the program. For more information on modules, click here.
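
Putting these pieces together, a complete run script might look like the following sketch (the job name, output files, task count, and program name are placeholders; adjust them to your workflow):

#!/bin/bash
#SBATCH --job-name=mpi_test
#SBATCH --output=mpi_test_output.txt
#SBATCH --error=mpi_test_error.log
#SBATCH --ntasks=4

# Load the MPI runtime before launching the program
module load OpenMPI

# srun launches one process per allocated task
srun ./MPI_program.mpi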

Submitting the run script

To submit a job using a run script, use:

sbatch --partition=partition_name run_script.sh 
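
sbatch prints the ID assigned to the job and returns immediately. As a quick usage sketch, the standard SLURM commands below can be used to monitor or cancel the submitted job:

# List your queued and running jobs
squeue -u $USER

# Cancel a job if needed (replace <job_id> with the ID reported by sbatch)
scancel <job_id>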

Customizing resource allocation

One of the important advantages of using Spiedie is the flexibility of hardware resources available to the user. You can tailor your resource request to suit the needs of your program.

Proper resource allocation also ensures your program runs as fast and efficiently as possible and does not halt unexpectedly due to hardware resource shortages, such as running out of memory. It is also best practice to request only the resources you need so that SLURM can schedule the entire cluster as efficiently as possible.

Using Spiedie-specific Directives or Features

One way to make sure your program runs properly is to run it on the correct partition, as stated above.

You can also use the feature (constraint) flag to help properly allocate resources. For more information on partitions, click here.

To make use of a feature, such as the KNL nodes, use:

srun -N1 -n40 -C knl ./program_to_run

The above command requests 40 cores for the program, which are only available on the Knights Landing nodes. The -C (constraint) flag ensures that the program only runs on KNL nodes.
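
The equivalent request can also be written as directives in an sbatch run script; a minimal sketch, reusing the knl feature name from the example above:

#SBATCH --ntasks=40
#SBATCH --constraint=knl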

Increase memory allocation

In order to increase the default memory allocation (2 GB), you can use the --mem flag for srun and sbatch to specify the memory needed per node.

For example:

srun -n1 --mem=4G ./program_to_run

The above job requests 4 GB of memory on the single default node.

You can also request memory per core using the --mem-per-cpu flag.
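
For example, the sketch below asks for four cores with 2 GB of memory each (the program name is a placeholder):

srun -n4 --mem-per-cpu=2G ./program_to_run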