This tutorial is designed to get you up and running on Spiedie as quickly as possible. You should be able to run simple programs on the Spiedie cluster by the end of this guide.
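Things covered in this guide:
- Logging in to Spiedie
- Transferring files to the cluster
- Running a program with srun
- Checking job status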
Requirements to complete the guide:
- A Spiedie user account and password
- Computer connected to the internet
- SSL VPN (Pulse) (if not connected to the school internet)
- Familiarity with programming and basic command-line experience
Log in
After acquiring your username and password, you should be able to log in to the Spiedie cluster following the steps listed here.
You will be logged in at your home directory. You can add modules and quickly prototype the code you wish to run here. You should not run any lengthy programs on the login node, as this may cause disruptions for other users. You should only run programs using srun and sbatch, which are explained in these tutorials.
Once logged in you can create a new directory by running
mkdir quick_start
This will create a new directory called quick_start in your Home directory. You can quickly verify by typing
ls
This lists all the files and directories in your home directory.
Transfer files to Cluster
Before we go further, download the Python script we will be running. There are various ways to transfer data to and from the cluster.
In this example we will use SCP to transfer the script from our local machine to the quick_start directory on the Spiedie server.
In your terminal or command prompt, go to the directory containing the downloaded Python script and run
scp quick_start.py username@spiedie.binghamton.edu:quick_start/
Replace username with your username and enter your password when prompted. This should place the quick_start.py file in the quick_start directory on Spiedie.
To verify the transfer, go back to your logged in session and run
cd quick_start
ls
You should see the python file listed on your screen.
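Copying in the other direction uses the same scp syntax with the source and destination swapped. The following is a minimal sketch meant to be run on your local machine. The helper name fetch_from_spiedie is hypothetical, and it assumes the requested file already exists in quick_start/ on the cluster.

```shell
# Hypothetical helper: copy one file from quick_start/ on Spiedie
# into the current local directory.
#   $1 = your Spiedie username
#   $2 = name of a file inside quick_start/ on the cluster
fetch_from_spiedie() {
    scp "${1}@spiedie.binghamton.edu:quick_start/${2}" .
}

# Example call (prompts for your password), left commented out:
# fetch_from_spiedie username quick_start.py
```

Defining the helper does not contact the cluster; nothing is transferred until you call it.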
Run the program
Since this is a small prototype program that we expect to finish quickly, we will run it interactively using srun. For larger programs we usually need to write a batch script and use sbatch. Running more complex jobs with srun and sbatch is covered here and here.
Go back to your logged in Spiedie session and make sure you are in the directory containing quick_start.py.
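For comparison, a batch submission of the same script might look like the sketch below. This is only an illustration and is not taken from the Spiedie documentation: the file name quick_start.sbatch, the job name, and the log file patterns are assumptions, and the partition and time limit reuse the quick partition's 10-minute cutoff described later in this guide. %j is SLURM's placeholder for the job ID.

```shell
# Write a minimal batch script (the file name is an illustrative choice).
cat > quick_start.sbatch <<'EOF'
#!/bin/bash
#SBATCH --job-name=quick_start
#SBATCH --partition=quick
#SBATCH --ntasks=1
#SBATCH --time=00:10:00
#SBATCH --output=quick_start_%j.log
#SBATCH --error=quick_start_%j_error.log

python3 quick_start.py
EOF

# Submit only where SLURM is installed, so the snippet is harmless elsewhere.
if command -v sbatch >/dev/null 2>&1; then
    sbatch quick_start.sbatch
else
    echo "sbatch not found; wrote quick_start.sbatch without submitting"
fi
```

Unlike srun, sbatch returns as soon as the job is queued, and the output goes to the files named in the #SBATCH lines.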
We will interactively run the quick_start.py on Spiedie by using the srun command. srun will submit our job to the SLURM queue to be allocated, and the output will be printed to the terminal.
You should familiarize yourself with the different partitions and compute capabilities of Spiedie, as some partitions are better suited to certain tasks than others.
You can get a quick overview of the cluster by running
sinfo
We’ll be using the quick partition as it is used mostly for rapid prototyping. You’ll notice quick has a time cutoff of 10 minutes, so jobs are automatically cleared after 10 minutes. This makes sure we don’t have to wait too long for allocation.
Before we ask for allocation on the cluster, we can check how busy the system is by running
squeue
This will list all jobs currently running and waiting to be allocated. You can learn more about how SLURM priorities work here
Let’s run the quick_start.py program. Run
srun --partition=quick python3 quick_start.py 1>quick_start.log 2>quick_start_error.log &
This will send your job to the SLURM daemon to be allocated and then run on a quick partition node. We have chosen to use the default parameters for srun, such as the number of nodes (1), the number of tasks (1), and the number of CPUs (1).
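If you prefer not to rely on the defaults, the same request can be written out explicitly. The sketch below is one way to do that, assuming the default values stated above; it prints the command and only launches it where srun is installed.

```shell
# Build the command with the defaults written out explicitly.
set -- srun --partition=quick --nodes=1 --ntasks=1 --cpus-per-task=1 \
    python3 quick_start.py

# Show the full command before running it.
echo "$@"

# Launch only where SLURM is installed, so the snippet is harmless elsewhere.
if command -v srun >/dev/null 2>&1; then
    "$@" 1>quick_start.log 2>quick_start_error.log &
else
    echo "srun not found; the command above was not run"
fi
```

Spelling out the values makes it easier to change one of them later, for example raising --cpus-per-task for a multithreaded program.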
We have directed the program's standard output to quick_start.log and its error output to quick_start_error.log. The trailing & runs the command in the background, so we retain control of the terminal.
You can see a full explanation of the shell command here
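The redirections in the srun line can be tried without SLURM. The sketch below uses a plain sh -c command as a hypothetical stand-in for the srun job, and the demo_out.log and demo_err.log file names are made up for illustration.

```shell
# Stand-in for the real job: writes one line to stdout and one to stderr.
# (Hypothetical placeholder for the srun command used in this guide.)
sh -c 'echo "result line"; echo "warning line" >&2' \
    1>demo_out.log 2>demo_err.log &

# $! holds the process ID of the most recent background command.
job_pid=$!

# The prompt returns immediately; wait blocks until the job finishes.
wait "$job_pid"

cat demo_out.log   # holds only what the command wrote to stdout
cat demo_err.log   # holds only what the command wrote to stderr
```

Here 1> redirects standard output, 2> redirects standard error, and & sends the command to the background.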
Checking job status
Since we retained control of the terminal, we can check the allocation status of the job by running
squeue -u username
Replace username with your username. You should see your job listed as either pending (PD) or running (R) and, once it is running, the node it has been allocated to.
You can also check the accounting records for your jobs using
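Rather than rerunning squeue by hand, you can poll it until your jobs have left the queue. This is a minimal sketch, assuming your Spiedie username matches the $USER variable on the login node; the 10-second interval is an arbitrary choice. Note that it waits for all of your jobs, not just the one submitted in this guide.

```shell
# Assumption: the login name on the cluster is available in $USER.
user_name=${USER:-username}

if command -v squeue >/dev/null 2>&1; then
    # --noheader drops the column titles, so empty output means no jobs remain.
    while [ -n "$(squeue -u "$user_name" --noheader)" ]; do
        sleep 10
    done
    echo "no jobs left in the queue for $user_name"
else
    echo "squeue not found; run this on the Spiedie login node"
fi
```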
sacct
Click here for a list of other basic SLURM commands.
Once the program has finished running, you can view the output by running
cat quick_start.log
And check for any error messages using
cat quick_start_error.log
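The two checks can be combined into one step that prints the error log only when it has content. This is a small sketch using the log file names from this guide; the status variable is added for illustration. A non-empty error log does not always mean the program failed, since warnings are also written there.

```shell
out_log=quick_start.log
err_log=quick_start_error.log

# -s is true only when the file exists and is not empty.
if [ -s "$err_log" ]; then
    echo "the error log is not empty:"
    cat "$err_log"
    status=has_stderr
elif [ -f "$out_log" ]; then
    echo "no error output; program output:"
    cat "$out_log"
    status=ok
else
    echo "no log files yet; the job may still be pending or running"
    status=missing
fi
```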