OverviewTeaching: 10 min
Exercises: 0 minQuestions
What is High Performance Computing?
Why should I be using High Performance Computing for my research?
Don’t I need to know how to program to use High Performance Computing?Objectives
Understand what an HPC is.
Understand how HPC can improve your research efficiency.
Understand the ‘barrier to entry’ is lower than you’d think.
High Performance Computing, known commonly as HPC or HPRC, is the name given to the use of computers with capabilities beyond the scope of standard desktop computers. The computers that qualify as HPC systems are typically seen as being more powerful than other systems; usually because they have more central processing units (CPUs), CPUs that operate at higher speeds, more memory, more storage, and faster connections with other computer systems. HPC systems are used when the resources of more standard computers, such as most dektops and laptops, are not enough to provide results in a timely fashion, if at all.
A Cluster of Computers?
Most frequently, HPC systems are an interconnected network or cluster or many CPU’s. Each CPU can perform many tasks simultaneously. By distributing "jobs" across many CPU’s, hundreds or thousands of tasks can be performed in parallel.
Working with HPC systems involves the use of a "UNIX shell" through a command line interface (CLI). A UNIX shell is a program which passes user instructions to other programs and returns results to the user or other programs. The user can pass the shell instructions interactively via typing them, or programatically by executing scripts. The most popular UNIX shell is Bash, the Bourne Again Shell, which we will be using to interact with the HPC. As HPC users we need an understanding of basic UNIX commands, which will be the initial focus of this lesson.
Although users can work interactively with HPC systems, it is more common (and better practice) to perform most tasks via submitting "jobs". A job consists of commands which request HPC resources and specify a task to be completed. Jobs are prioritised and placed in a queue by a "job scheduler" before being executed.
HPC systems are designed specifically to assist researchers with analytically demanding tasks. Users can perform many small tasks simultaneously, such as projecting a single model onto datasets from different years. This is an example of “serial programming”, where each task is entirely independent from the others. Alternatively, with the assistance of Message Passing Interfaces (MPI’s), users can combine multiple CPU resources to perform a single task more efficiently. This is an example of “parallel programming”, where each CPU is working on a portion of single task, before results are assembled in the correct order by an MPI. The benefits of using HPC systems for research often far outweigh the cost of learning to use a Shell and include:
Consider the case of Nelle Nemo, a marine biologist, who
has returned from a six-month survey of the
North Pacific Gyre,
where she has been sampling gelatinous marine life in the
Great Pacific Garbage Patch.
She collected 10,000 samples in all, and has run each sample through an assay machine
that measures the abundance of 300 different proteins.
The machine’s output for a single sample is
a file with one line for each protein.
For example, the file
NENE01812A.csv looks like this:
72,0.142961371327 265,0.452337146655 279,0.332503761597 25,0.557135549292 207,0.55632965303 107,0.96031076351 124,0.662827329632 193,0.814807235075 32,1.82402616061 99,0.7060230697 . . . . 200,1.13383446523 264,0.552846392611 268,0.0767025200665
Each line consists of two “fields”, separated by a comma (
The first field identifies the protein,
and the second field is a measure of the amount of protein in the sample.
Nelle needs to accomplish tasks such as the following:
stats.pythat she wrote, which produces a graph, and writes some statistics (these go in the directories
Suppose that each file takes about a minute to analyze on her desktop system and consuming all of the resources available, so no email or any other work while the program is running. With 10,000 files in all this means that it will take approximately 166 hours, or just under 7 days to completely process all the files.
Shifting this work to an HPC system will not only stand to speed up the processig of these files but the processing will importantly allow Nelle to continue to use her own computer for other work.
High Performance Computing (HPC) typically involves connecting remotely to a cluster of computers.
HPCs can be used to do work that would either be impossible or much slower in a Desktop environment.
Typical HPC workflows involve submitting “jobs” to a scheduler which queues/priortises the “jobs” of all users.
The standard method of interacting with HPC’s is via a Linux-based Command Line Interface called “The Shell”.