Introduction to the Ceres High-Performance Computing System Environment
This page was modified from Session 2 of the SCINet Geospatial Workshop 2020. Ceres HPC resources have been used to publish scientific papers.
Learning Goals:
- understand what an HPC system is and when to use one
- introduction to the USDA-ARS Ceres HPC system on SCINet
- access Ceres with Secure Shell at the command line and with the JupyterHub web interface
- create a simple Jupyter notebook
- execute basic linux commands
- run an interactive computing session on Ceres
- write and run a SLURM batch script on Ceres
Contents
High-Performance Computing (HPC) System Basics
Ceres HPC Login with Secure Shell (SSH)
Interactive Computing vs Batch Computing
Submitting a Compute Job with a SLURM Batch Script
The SCINet Website
The SCINet Website is the source of much of the material presented in this tutorial. Use the SCINet website to request SCINet accounts, access SCINet/HPC user guides, get computing help or other support, and find out about upcoming and previous computational training events.
High-Performance Computing (HPC) System Basics
What is an HPC System?
A High-Performance Computing (HPC) system provides a computational environment that can process data and perform complex computations at high speeds. Generally, HPC systems consist of 3 components:
- Compute nodes (servers) that provide a consistent environment across the system (similar OS, software, etc.).
- Data storage, generally a parallel file system, that supports high I/O throughput.
- A high-speed network that allows for efficient communication and data transfer across compute nodes.
For specific details about the Ceres system see: https://scinet.usda.gov/guide/ceres/#technical-overview
Why use an HPC System?
HPC systems can provide the compute infrastructure that allows researchers to improve, or make possible, computationally intensive analyses/workflows. However, developing analyses to run on HPC systems involves a non-trivial amount of overhead. Therefore, you should first evaluate whether SCINet is an appropriate avenue for your research. Typically, analyses that are well-suited for SCINet are:
- CPU intensive workloads
- high memory workloads
Additional considerations are:
- Are my analyses already optimized?
- Will I need to parallelize my analyses (typical for CPU intensive workloads)?
- Will I require more than a single node of compute power (i.e. distributed computing)?
HPC Terminology
SSH - Secure Shell is a network protocol that allows remote access over unsecured networks. We will use SSH to access the Ceres login node.
shell - a shell is what provides you an interface to the unix operating system. It’s where we type commands and run programs. The default shell on Ceres is called bash.
login/compute node - Nodes refer to the individual servers that compose an HPC system. The login node is the node/server that users are sent to when they SSH into the system. The compute nodes (typically the bulk of the HPC nodes) are designed for running computationally intensive workloads. There can be many different types of compute nodes within an HPC system (e.g. standard, high memory, GPU, etc.).
core/logical core - Cores (or CPUs) are the computational processing components within a computer. Most modern cores have hyperthreading, which allows a single core to process two tasks simultaneously. Logical cores therefore refers to the number of apparent (not physical) cores in a system; on most modern systems, a single core equates to two logical cores.
batch/interactive computing - Batch computing refers to workflows that require no user interaction once they are underway/submitted. Interactive computing typically involves processing commands/transactions one at a time and requires input from the user.
SLURM/job scheduler - HPC systems generally have software to allocate resources (nodes, cores, memory, etc.) to users in a fair and consistent manner. These systems are generally referred to as job schedulers. A common job scheduler used on HPC systems is SLURM.
USDA-ARS HPC System Details
The Computer Systems page of the SCINet website gives a brief summary of the USDA-ARS HPC systems.
The Ceres HPC System
The Ceres User Manual and Ceres Quick Start Guide contain most of the information you could want to know about the Ceres HPC.
System Configuration
The operating system running on Ceres is CentOS and the job scheduler is SLURM. See System Configuration in the Ceres User Manual for more.
Nodes
When you SSH into Ceres you are connecting to the login node. The login node should be used for navigating your directories, organizing your files, and running very minor scripts. All computing on Ceres should be done on compute nodes. DON’T RUN YOUR COMPUTE SCRIPTS OR INSTALL SOFTWARE ON THE LOGIN NODE AS IT SLOWS THE NODE DOWN FOR EVERYONE.
When you use JupyterHub to login to Ceres you are placed on a compute node, not a login node.
There are 5000+ compute cores (10,000 logical cores), 65 TB of total RAM, 250 TB of total local storage, and 4.7 PB of shared storage available on the Ceres HPC. See the Technical Overview in the Ceres User Manual for more.
Partitions/Queues
All Ceres users have access to the “community partitions”: short, medium, long, long60, mem, longmem, mem768, debug. Each partition has different capabilities (e.g. regular memory versus high memory nodes) and resource restrictions (e.g. time limits on jobs). The short partition is the default. See Partitions or Queues in the Ceres User Manual for more.
Some research groups have purchased their own nodes, which are placed on “priority partitions” that the research group has first priority to use. Other Ceres users have lower priority access to many of these partitions (the “-low” and “scavenger” partitions). However, the “-low” partitions have a compute time limit of 2 hours, and while you can run jobs for much longer on the “scavenger” partitions, you run the risk of having your job killed at any moment if a higher priority user needs the nodes.
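If you want a quick look at which partitions exist and their time limits before choosing one, here is a minimal sketch using commands covered later in this tutorial (the partition name is just an example from the list above):
sinfo #default output lists each partition with its time limit and node states
salloc -p debug #request an interactive session on a specific partition, e.g. debug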
Directory structure and data storage
There are 3 places to store your codes and data persistently on Ceres:
- home directory, /home/firstname.lastname/ - Everyone has a small private home directory. You are automatically put in this directory when you log in. Home directories are backed up. You shouldn’t run computations from your home directory.
- project directory, /lustre/project/your_project_name/ - You must apply for each project directory. This is where you run scripts from and where your datasets, codes, and results should live. Project directories are not backed up.
- keep directory, /KEEP/your_project_name/ - Each of your projects will also have a folder in the /KEEP directory. This folder is backed up nightly. Copy your most important project files to your /KEEP folder to ensure they are backed up. Compute jobs shouldn’t be run from /KEEP directories.
Temporarily share files locally with other users:
- project shared files directory, /lustre/project/shared_files/ - Everyone has access to the shared files folder in the project directory. You can share files with other users by creating a folder inside this directory and copying your files there, although there is a 5 GB limit. This is not a permanent storage location for your files.
See more about directories and data storage in Data Storage in the Quick Start Guide and in Quotas in Home and Project Directories in the Ceres User Manual.
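As a minimal sketch of how these locations fit together (the project name and file names are placeholders):
cd /lustre/project/your_project_name/ #run scripts and keep working datasets here
cp results/summary.csv /KEEP/your_project_name/ #copy important outputs to /KEEP so they are backed up nightly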
User Compute Limitations
There is currently no quota system for keeping track of how many jobs you run or how much compute time you’ve used on Ceres, like there is on university and national lab HPC systems. However, there is a fair share policy, which means that heavy users who run many jobs and take up a lot of compute time are downgraded in priority compared to other users.
The individual user compute limitations are:
- 400 cores per user (across all your running jobs)
- 1512 GB memory per user (across all your running jobs)
- 100 jobs per user running simultaneously
Software on Ceres
There is plenty of software on Ceres that you can access through the module system. See the Software Overview for more.
Users can also install their own software using the Anaconda package and environment management software module. See the Conda Guide for more.
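For example, here is a minimal sketch of creating a Conda environment of your own (the environment name and packages are placeholders; per the best practices below, do this from a compute node and keep environments in your home or /KEEP directories):
module load miniconda
conda create --prefix /KEEP/your_project_name/my_env python=3 numpy #hypothetical environment
source activate /KEEP/your_project_name/my_env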
If you don’t want to use Conda, Ceres is also set up for R and Perl users to download packages. See the Guide to Installing R, Perl, and Python Packages for more (although we recommend that Python coders use Conda).
Lastly, if none of the above methods of accessing software work for your particular software needs, you can request that the SCINet Virtual Research Support Core install software on the system for you. This is the method of last resort though because it takes a few weeks and requires an agency-level security review. See the Request Software page for more.
Getting data on/off
There are multiple ways of getting data on and off of the Ceres HPC system. See the SCINet File Transfer Guide for more.
- for data on the web, download directly to Ceres using tools like wget (see the example after this list)
- for data on your local machine the recommended method is Globus
- for large local data, you can:
- utilize the I2 connection at your ARS location or
- ship to VRSC and they will upload for you
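As a minimal sketch of the wget approach mentioned above (the URL is a placeholder), download directly into your project directory:
cd /lustre/project/your_project_name/
wget https://example.com/some_dataset.zip #hypothetical dataset URL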
User support from the Virtual Research Support Core (VRSC)
The VRSC is comprised of Iowa State University and ARS staff who manage the Ceres HPC system and provide support to users. See more on the SCINet VRSC Support page.
Ceres HPC Best Practices
- nothing serious should be run on the login node
- run compute jobs from project directories
- install software on a compute node (salloc)
- install software (including Conda environments) in your home or /KEEP directories
- use node local scratch space for faster i/o with large datasets (stage your data to /local/scratch, $TMPDIR; see the sketch after this list)
- use your /KEEP directory to ensure important data/codes are backed up
- for short heavy compute jobs (less than 2hrs) go for the brief-low partition (more cores per node and more memory)
- acknowledge SCINet in your pubs!
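As a minimal sketch of the scratch-space practice above (the file and script names are placeholders), a batch script can stage a large input to node-local scratch, process it there, and copy results back:
cp /lustre/project/your_project_name/big_input.nc $TMPDIR/ #stage data to node-local scratch
python3 process_data.py $TMPDIR/big_input.nc #hypothetical processing script reading from fast local storage
cp $TMPDIR/output.nc /lustre/project/your_project_name/results/ #copy results back before the job ends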
Other SCINet HPC Systems
There are two other HPC Systems coming to SCINet soon. Summaries of the systems will be posted to the SCINet website computing systems page.
Ceres HPC Login with Secure Shell (SSH)
First, let’s login to our SCINet (Ceres) accounts with SSH. You should have already successfully logged in this way at least once before today by following the instructions sent to you when your SCINet account was approved. The Quick Start Guide has instructions for SSH’ing to Ceres from Windows, Mac, and Linux Machines.
If you haven’t yet set up a config file for SSH’ing to Ceres (we won’t cover it but instructions are at the Quick Start Guide link above) then:
ssh -o TCPKeepAlive=yes -o ServerAliveInterval=20 -o ServerAliveCountMax=30 firstname.lastname@ceres.scinet.usda.gov
The keep-alive options are especially important for rural/satellite internet connections so that instantaneous breaks in service won’t terminate your connection to the HPC.
If you’ve set up your config file you can simply:
ssh whatever-you-named-the-host-in-your-config
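If you want a starting point for that config file, a minimal sketch of an entry in ~/.ssh/config on your local machine might look like this (“ceres” is just an example alias; see the Quick Start Guide for the authoritative version):
Host ceres
    HostName ceres.scinet.usda.gov
    User firstname.lastname
    TCPKeepAlive yes
    ServerAliveInterval 20
    ServerAliveCountMax 30
With an entry like this in place, ssh ceres is equivalent to the long command above.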
If you are not on an ARS VPN, you will be asked for a 6-digit Google Authenticator code. See the multi-factor authentication guide for help. After entering the Google code, you will be asked to enter your password.
If you are on an ARS VPN, you will skip the Google authentication and be asked only for your password.
After a successful login you will see a list of all your quotas and used space.
If you can’t successfully login to your account, contact scinet_vrsc@usda.gov for assistance.
To sign out of Ceres just close your terminal or type exit.
loading software from the module system
view available software on Ceres with:
module avail
load software (in this case conda) from the module system:
module load miniconda
see what software modules you have loaded with:
module list
unload software from the module system with:
module unload miniconda
There are also some SLURM-specific commands that are very useful.
See the bioinformatics workbook for more than what we cover here.
sinfo - see the status of all the nodes
salloc - “SLURM allocate”. Move onto a compute node in an interactive session. More in the next section.
squeue - view information on compute jobs that are running. More in the next section.
scancel - terminate a compute job that is running
sbatch - submit a batch script to run a compute job. More in the next section.
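For example (the job ID is a placeholder), a common pattern is to list your jobs and then cancel one by its ID:
squeue -u firstname.lastname #show your running and queued jobs
scancel 1234567 #cancel the job with this job ID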
Interactive Computing vs Batch Computing
Many geospatial researchers spend much of their time writing and debugging new scripts for each project they work on. This differs from other research communities who may be able to re-use a single script frequently for multiple projects or who can use a standard analysis process/workflow on many different input datasets.
A geospatial researcher may write and debug their scripts using small to medium size data until the script is functional and then modify the script to process big data only once. For this reason, geospatial researchers may more often use interactive computing sessions on an HPC.
Interactive Computing on Ceres
Interactive computing essentially means that you are working directly on a compute node as opposed to using the SLURM job scheduler to submit compute jobs in batches. JupyterHub allows us easy access to interactive computing on the Ceres HPC. Just login to Ceres through JupyterHub and start coding in a Jupyter notebook; you will automatically be placed in an interactive compute session.
But let’s learn how to open an interactive computing session from the command line. This is important when you log in with SSH or if you’ve logged in with JupyterHub but want to compute or install software on a different node than where your JupyterLab session is running.
Step 1: Open a terminal on Ceres
Since we are already in JupyterLab, use the launcher to open a terminal. We could also use Windows Powershell or Mac/Linux Terminal to SSH onto the Ceres login node instead.
Step 2: Start an Interactive Compute Session
The simplest way to start an interactive session is:
salloc
Issuing this command requests the SLURM job scheduler to allocate you a single hyper-threaded core (2 logical cores) with 6200 MB of memory on one of the compute nodes. The session will last for 2 days, but will time out after 1.5 hours of inactivity.
View your running compute sessions/jobs with:
squeue -u firstname.lastname
To exit the interactive session:
exit
For more control over your interactive session you can use the srun command instead of salloc, using the format:
srun --pty -p queuename -t hh:mm:ss -n cores -N nodes /bin/bash -l
for example:
srun --pty -p short -t 03:00:00 -n 4 -N 1 /bin/bash -l
will request the SLURM job scheduler to allocate you two hyper-threaded cores (4 logical cores) with 4 x 3100 MB of memory on one of the compute nodes in the short partition. The session will last for 3 hours, but will also time out after 1.5 hours of inactivity.
Batch Computing on Ceres
Batch computing involves writing and executing a batch script that the SLURM job scheduler will manage. This mode of computing is good for “set it and forget it” compute jobs such as running a climate model, executing a single script multiple times in a row, or executing a more complicated but fully functional workflow that you know you don’t have to debug. We’ll cover how to write and execute a batch script next.
Submitting a Compute Job with a SLURM Batch Script
Let’s practice by submitting a batch script.
First, create a simple python program:
cat > hello-world.py
print('Hello, world!')
then type Ctrl+D to exit.
View the file you just created:
cat hello-world.py
a serial job that runs a python script one time
Now create your batch script with nano or other text editor:
nano my-first-batch-script.sbatch
In the nano editor type:
#!/bin/bash
#SBATCH --job-name=HelloWorld
#SBATCH -p short #name of the partition (queue) you are submitting to
#SBATCH -N 1 #number of nodes in this job
#SBATCH -n 2 #number of cores/tasks in this job
#SBATCH -t 00:00:30 #time allocated for this job hours:mins:seconds
#SBATCH -o "stdout.%j.%N" # standard output, %j adds job number to output file name and %N adds the node name
#SBATCH -e "stderr.%j.%N" #optional, prints out standard error
module load python_3
echo "you are running python"
python3 --version
python3 hello-world.py
Exit the nano editor with Ctrl+O, Enter, Ctrl+X.
Submit your batch script with:
sbatch my-first-batch-script.sbatch
Check out the output of your compute job. It’s in the stdout file:
ls
cat stdout.######.ceres##-compute-##
Note: there are a ton of other SBATCH options you could add to your script. For example, you could receive an email when your job has completed (see the Ceres User Manual) and lots more (see the SLURM sbatch doc).
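For instance, a minimal sketch of the email option mentioned above (the address is a placeholder) adds two lines to the script header:
#SBATCH --mail-user=firstname.lastname@usda.gov #placeholder address
#SBATCH --mail-type=END,FAIL #email when the job finishes or fails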
Also Note: this is a serial job, meaning that it will run on a single compute core. The compute likely won’t be any faster than if you ran this type of job on your laptop. To run your hello-world code in parallel from a batch script (multiple times simultaneously on different cores) you would use openMP or MPI (see the Ceres User Manual) and your code would have to be in C or Fortran (not Python). For Python coders, there are much easier ways to run in parallel (using interactive mode as opposed to batch scripting), which we will cover in Session 3: Intro to Python and Dask.
a serial job that runs a python script five times
Let’s now run a script that will execute the same python code 5 times in a row.
First, delete all your stdout and stderr files so it’s easier to see which new files have been generated:
rm std*
Now modify your sbatch script using nano to look like this:
#!/bin/bash
#SBATCH --job-name=HelloWorld
#SBATCH -p short #name of the partition (queue) you are submitting to
#SBATCH -N 1 #number of nodes in this job
#SBATCH -n 2 #number of cores/tasks in this job
#SBATCH -t 00:00:30 #time allocated for this job hours:mins:seconds
#SBATCH -o "stdout.%j.%N" # standard output, %j adds job number to output file name and %N adds the node name
#SBATCH -e "stderr.%j.%N" #optional, prints out standard error
module load python_3
echo "you are running python"
python3 --version
for i in {1..5}
do
python3 hello-world.py
done
Look at a stdout file and you will see the python code ran 5 times.
Go ahead and delete your stdout and stderr files again.
a parallel job that runs a python script 10 times simultaneously on different cores
Let’s now run a script that will execute the same python code 10 times simultaneously. Modify your sbatch script to look like this:
#!/bin/bash
#SBATCH --job-name=HelloWorld
#SBATCH -p short #name of the partition (queue) you are submitting to
#SBATCH -N 1 #number of nodes in this job
#SBATCH -n 10 #number of cores/tasks in this job
#SBATCH --ntasks-per-core=1
#SBATCH -t 00:00:30 #time allocated for this job hours:mins:seconds
#SBATCH -o "stdout.%j.%N" # standard output, %j adds job number to output file name and %N adds the node name
#SBATCH -e "stderr.%j.%N" #optional, prints out standard error
#SBATCH --array=1-10 #job array index values
module load python_3
echo "you are running python"
python3 --version
python3 hello-world.py
You should see a stdout and stderr file for each job in the array (10 of each), and the jobs should have run on a variety of different nodes.
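To try it, resubmit the script and watch the array in the queue (commands as used earlier in this tutorial):
sbatch my-first-batch-script.sbatch
squeue -u firstname.lastname #each array task appears with an index, e.g. jobid_1 through jobid_10
ls stdout.*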