Overview
Teaching: 0 min
Exercises: 0 min
Questions
What is the syntax of UNIX commands?
How do I navigate the file system?
How do I transfer files to HPC?
How do I interact with files on the HPC?
Objectives
Be able to construct basic UNIX commands.
Be able to move files to and from the remote system.
Be able to traverse the HPC file system.
Be able to interact with your files.
Since we interact with the UNIX shell by typing commands, there are a few key shortcuts we can use to reduce the amount of typing we must do (and therefore minimise errors associated with human input of commands). In particular, “~” and “..” are useful when changing directories or accessing files. Additionally, ‘tab completion’ can help us ensure we are typing commands, directory paths and file paths correctly.
Common Shortcuts
Home Directory – the tilde symbol “~” represents your home directory, e.g. /homeN/XX/jcXXYYY
Current Directory – a single full-stop “.” represents your current working directory, i.e. the same directory returned by pwd
Parent Directory – a double full-stop “..” represents the parent of your current working directory
Root Directory – on its own, a single forward slash “/” represents the root directory, i.e. the top of the directory tree
New Directory – between two words, a forward slash “/” separates a directory from its contents, e.g. in “/homeN/XX/” the directory “XX” is contained within the directory “homeN”
Up & Down Arrows – use these keys to navigate through your command history
Tab Key – performs autocompletion of commands, directory paths & file paths
Escape Character – a single backslash “\” is used to ‘escape’ special characters, such as spaces, ampersands & apostrophes
Environment Variables – when you open your shell, a number of pre-saved variables, so-called ‘environment’ variables, are set for you. You can view them all by executing the command printenv, and access individual variables by prefacing their name with a “$”, e.g. $USER or $HOSTNAME.
Wildcards – the asterisk symbol “*” can be used to represent any string of characters, including an empty one; most commonly you can integrate a ‘*’ into other commands to create search patterns. For example, ls *zip will display all files and directories whose names end with “zip”, regardless of the leading characters.
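The shortcuts listed above can be tried out safely in a scratch directory. The following is a minimal sketch, not part of the original lesson material; the directory and file names (`my data`, `results`, `a.zip`, `b.zip`, `notes.txt`) are made up for illustration.

```shell
# Work in a throwaway directory so nothing in your real home is touched.
sandbox=$(mktemp -d)
cd "$sandbox"
mkdir "my data" results
touch a.zip b.zip notes.txt

# "." is the current directory and ".." is its parent.
cd results
here=$(cd . && pwd)       # still .../results
parent=$(cd .. && pwd)    # the sandbox directory itself

# A backslash escapes the space in the directory name "my data".
cd ../my\ data
escaped=$(pwd)

# "*" matches any run of characters, so *zip matches a.zip and b.zip only.
cd ..
zips=$(ls *zip)

echo "current:  $here"
echo "parent:   $parent"
echo "escaped:  $escaped"
echo "matches:  $zips"
echo "home (~): " ~

# Tidy up the throwaway directory.
cd /
rm -rf "$sandbox"
```

Note that notes.txt is not listed by the wildcard, because its name does not end in “zip”.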
Insert directory image here and annotate to explain ‘dir’ shortcuts.
Here use ‘cd’ and ‘mkdir’ as explainers to demonstrate the use of up/down, tab and escape Explain what ‘zero exit’ status is
Type the command whoami, then press the ENTER key to send the command to the shell.
The command’s output is the ID of the current user,
i.e.,
it shows us who the shell thinks we are:
$ whoami
johnsmith
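More specifically, when we type whoami the shell:
- finds a program called whoami,
- runs that program,
- displays that program’s output, then
- displays a new prompt to tell us that it’s ready for more commands.
Unknown commands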
bash: cd: TEST: No such file or directory
bash: mycommand: command not found
bash: pwd: -Y: invalid option
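Each of the messages above is accompanied by a non-zero exit status, while a command that succeeds reports zero. The sketch below is one way to see this, assuming that no program called `mycommand` is installed and that no directory called `TEST` exists in the scratch directory it creates; the exact non-zero values can differ between shells.

```shell
# Start in an empty scratch directory so that TEST really does not exist.
scratch=$(mktemp -d)
cd "$scratch"

# A command that works: exit status 0.
ok_status=0
ls / > /dev/null || ok_status=$?

# Changing into a directory that does not exist: non-zero status.
missing_dir_status=0
cd TEST 2> /dev/null || missing_dir_status=$?

# A command the shell cannot find: status 127 by convention.
unknown_cmd_status=0
mycommand 2> /dev/null || unknown_cmd_status=$?

echo "ls /      -> $ok_status"
echo "cd TEST   -> $missing_dir_status"
echo "mycommand -> $unknown_cmd_status"
```

At an interactive prompt the same information is available by running `echo $?` immediately after any command.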
Next,
let’s find out where we are by running a command called pwd
(which stands for “print working directory”).
At any moment,
our current working directory
is our current default directory,
i.e.,
the directory that the computer assumes we want to run commands in
unless we explicitly specify something else.
Here,
the computer’s response is /home/johnsmith,
which is johnsmith’s home directory:
$ pwd
/home/johnsmith
echo $HOME
echo $HOSTNAME
Using ls and using cd
ls has lots of other options. To find out what they are, we can type:
[remote]$ ls --help
Usage: ls [OPTION]... [FILE]...
List information about the FILEs (the current directory by default).
Sort entries alphabetically if none of -cftuvSUX nor --sort is specified.
Mandatory arguments to long options are mandatory for short options too.
-a, --all do not ignore entries starting with .
-A, --almost-all do not list implied . and ..
--author with -l, print the author of each file
-b, --escape print C-style escapes for nongraphic characters
--block-size=SIZE scale sizes by SIZE before printing them; e.g.,
'--block-size=M' prints sizes in units of
1,048,576 bytes; see SIZE format below
-B, --ignore-backups do not list implied entries ending with ~
-c with -lt: sort by, and show, ctime (time of last
modification of file status information);
with -l: show ctime and sort by name;
otherwise: sort by ctime, newest first
-C list entries by columns
--color[=WHEN] colorize the output; WHEN can be 'always' (default
if omitted), 'auto', or 'never'; more info below
-d, --directory list directories themselves, not their contents
-D, --dired generate output designed for Emacs' dired mode
-f do not sort, enable -aU, disable -ls --color
-F, --classify append indicator (one of */=>@|) to entries
--file-type likewise, except do not append '*'
--format=WORD across -x, commas -m, horizontal -x, long -l,
single-column -1, verbose -l, vertical -C
--full-time like -l --time-style=full-iso
-g like -l, but do not list owner
--group-directories-first
group directories before files;
can be augmented with a --sort option, but any
use of --sort=none (-U) disables grouping
-G, --no-group in a long listing, don't print group names
-h, --human-readable with -l and/or -s, print human readable sizes
(e.g., 1K 234M 2G)
--si likewise, but use powers of 1000 not 1024
-H, --dereference-command-line
follow symbolic links listed on the command line
--dereference-command-line-symlink-to-dir
follow each command line symbolic link
that points to a directory
--hide=PATTERN do not list implied entries matching shell PATTERN
(overridden by -a or -A)
--indicator-style=WORD append indicator with style WORD to entry names:
none (default), slash (-p),
file-type (--file-type), classify (-F)
-i, --inode print the index number of each file
-I, --ignore=PATTERN do not list implied entries matching shell PATTERN
-k, --kibibytes default to 1024-byte blocks for disk usage
-l use a long listing format
....
The SIZE argument is an integer and optional unit (example: 10K is 10*1024).
Units are K,M,G,T,P,E,Z,Y (powers of 1024) or KB,MB,... (powers of 1000).
Using color to distinguish file types is disabled both by default and
with --color=never. With --color=auto, ls emits color codes only when
standard output is connected to a terminal. The LS_COLORS environment
variable can change the settings. Use the dircolors command to set it.
Exit status:
0 if OK,
1 if minor problems (e.g., cannot access subdirectory),
2 if serious trouble (e.g., cannot access command-line argument).
GNU coreutils online help: <http://www.gnu.org/software/coreutils/>
Full documentation at: <http://www.gnu.org/software/coreutils/ls>
or available locally via: info '(coreutils) ls invocation'
Many bash commands, and programs that people have written that can be
run from within bash, support a --help flag to display more
information on how to use the commands or programs.
Parameters vs. Arguments
According to Wikipedia, the terms argument and parameter mean slightly different things. In practice, however, most people use them interchangeably to refer to the input term(s) given to a command. Consider the example below:
[remote]$ ls -lh .
ls is the command, -lh are the flags (also called options), and . (the current directory) is the argument.
Other Hidden Files
In addition to the hidden directories .. and ., you may also see a file called .bash_profile. This file usually contains shell configuration settings. You may also see other files and directories beginning with a dot (.). These are usually files and directories that are used to configure different programs on your computer. The prefix . is used to prevent these configuration files from cluttering the terminal when a standard ls command is used.
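To see the effect of the leading dot, the following sketch compares a plain `ls` with the `-a` and `-A` options from the help text earlier. It runs in a scratch directory, and the file names are placeholders chosen for this example.

```shell
demo=$(mktemp -d)
cd "$demo"
touch .bash_profile visible.txt

plain=$(ls)       # hidden entries are skipped
all=$(ls -a)      # includes ., .. and .bash_profile
almost=$(ls -A)   # includes .bash_profile but leaves out . and ..

echo "ls:    $plain"
echo "ls -a:" $all
echo "ls -A:" $almost
```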
To download our data files onto the HPC today, we are going to use two commands: wget and unzip.
To download files from the internet,
a widely used command-line tool is wget.
The syntax is relatively straightforward: wget https://some/link/to/a/file.tar.gz
[remote]$ wget https://github.com/amandamiotto/INTRO_HPC/raw/gh-pages/files/hpcCarpentry.zip
--2018-05-23 13:48:34-- https://github.com/amandamiotto/INTRO_HPC/raw/gh-pages/files/hpcCarpentry.zip
Resolving github.com... 192.30.255.113, 192.30.255.112
Connecting to github.com|192.30.255.113|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://raw.githubusercontent.com/amandamiotto/INTRO_HPC/gh-pages/files/hpcCarpentry.zip [following]
--2018-05-23 13:48:35-- https://raw.githubusercontent.com/amandamiotto/INTRO_HPC/gh-pages/files/hpcCarpentry.zip
Resolving raw.githubusercontent.com... 151.101.0.133, 151.101.64.133, 151.101.128.133, ...
Connecting to raw.githubusercontent.com|151.101.0.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 14419 (14K) [application/zip]
Saving to: `hpcCarpentry.zip.1'
100%[======================================>] 14,419 --.-K/s in 0.001s
2018-05-23 13:48:35 (13.5 MB/s) - `hpcCarpentry.zip.1' saved [14419/14419]
[remote]$ unzip hpcCarpentry.zip
Archive: hpcCarpentry.zip
creating: hpcCarpentry/
creating: hpcCarpentry/plots/
inflating: hpcCarpentry/plots/NENE01812A.pdf
creating: hpcCarpentry/results/
inflating: hpcCarpentry/results/max.txt
creating: hpcCarpentry/data/
inflating: hpcCarpentry/data/NENE01812A.csv
inflating: hpcCarpentry/data/NENE01843A.csv
inflating: hpcCarpentry/stats.py
inflating: test.pbs
Working with compressed files, using unzip and gunzip
The file we just downloaded is zipped (has the .zip extension). You can uncompress it with unzip filename.zip.
File decompression reference:
- .tar.gz - tar -xzvf archive-name.tar.gz
- .tar.bz2 - tar -xjvf archive-name.tar.bz2
- .zip - unzip archive-name.zip
- .rar - unrar x archive-name.rar
- .7z - 7z x archive-name.7z
However, sometimes we will want to compress files ourselves to make file transfers easier. The larger the file, the longer it will take to transfer. Moreover, we can compress a whole bunch of little files into one big file to make it easier on us (no one likes transferring 70000 little files!).
The two compression commands we’ll probably want to remember are the following:
- Compress a single file with Gzip - gzip filename
- Compress a lot of files/folders with Gzip - tar -czvf archive-name.tar.gz folder1 file2 folder3 etc
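The compression and decompression commands above can be checked with a small round trip. This is a sketch under the assumption that `tar` and `gzip` are installed; the `project` folder and its files are invented for the example, and the `v` (verbose) flag is left out to keep the output short.

```shell
work=$(mktemp -d)
cd "$work"
mkdir -p project/data
echo "sample,value" > project/data/readings.csv
echo "notes" > project/README.txt

# Bundle and compress the whole folder (c = create, z = gzip, f = archive file name).
tar -czf project.tar.gz project

# List the archive contents without unpacking (t = list).
listing=$(tar -tzf project.tar.gz)

# Unpack into a separate directory (x = extract, -C = directory to unpack into).
mkdir restored
tar -xzf project.tar.gz -C restored
restored_line=$(cat restored/project/data/readings.csv)

# gzip replaces a single file with file.gz; gunzip reverses that.
gzip project/README.txt
gz_exists=no
[ -f project/README.txt.gz ] && gz_exists=yes
gunzip project/README.txt.gz

echo "$listing"
echo "restored content: $restored_line"
```

Listing an archive with `-t` before extracting is a cheap way to check where the files will land.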
Now we have some files to explore
Orthogonality
The special names . and .. don’t belong to cd; they are interpreted the same way by every program. For example, if we are in /home/johnsmith/hpcCarpentry, the command ls .. will give us a listing of /home/johnsmith/. When the meanings of the parts are the same no matter how they’re combined, programmers say they are orthogonal. Orthogonal systems tend to be easier for people to learn because there are fewer special cases and exceptions to keep track of.
These, then, are the basic commands for navigating the filesystem on your computer:
pwd, ls and cd. Let’s explore some variations on those commands. What happens
if you type cd on its own, without giving a directory?
[remote]$ cd
How can you check what happened? pwd
gives us the answer!
[remote]$ pwd
/home/johnsmith
It turns out that cd
without an argument will return you to your home directory,
which is great if you’ve gotten lost in your own filesystem.
Let’s try returning to the data directory from before. Previously we moved
one directory at a time, but we can actually string together the list of directories
to move to data in one step:
[remote]$ cd hpcCarpentry/data
Check that we’ve moved to the right place by running pwd and ls -F.
If we want to move up one level from the data directory, we could use cd .. to do so. But
there is another way to move to any directory, regardless of your
current location.
So far, when specifying directory names, or even a directory path (as above),
we have been using relative paths. When you use a relative path with a command
like ls or cd, it tries to find that location from where we are,
rather than from the root of the file system.
However, it is possible to specify the absolute path to a directory by
including its entire path from the root directory, which is indicated by a
leading slash. The leading /
tells the computer to follow the path from
the root of the file system, so it always refers to exactly one directory,
no matter where we are when we run the command.
This allows us to move to our data directory from anywhere on
the filesystem (including from inside hpcCarpentry). To find the absolute path
we’re looking for, we can use pwd and then add the piece we need
to move to data.
[remote]$ pwd
/home/johnsmith/hpcCarpentry
[remote]$ cd /home/johnsmith/hpcCarpentry/data
Run pwd
and ls -F
to ensure that we’re in the directory we expect.
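The difference between relative and absolute paths can be sketched with a stand-in directory tree. Here `fake_home` is a hypothetical scratch location playing the role of /home/johnsmith; only the hpcCarpentry/data layout is taken from the lesson.

```shell
# Build a stand-in for the lesson's layout under a temporary "home".
fake_home=$(mktemp -d)
mkdir -p "$fake_home/hpcCarpentry/data"

cd "$fake_home"
cd hpcCarpentry/data                # relative path: resolved from where we are now
after_relative=$(pwd)

cd ..                               # relative again: up one level to hpcCarpentry
after_up=$(pwd)

cd /                                # move somewhere unrelated
cd "$fake_home/hpcCarpentry/data"   # absolute path: starts at /, works from anywhere
after_absolute=$(pwd)

echo "after relative cd: $after_relative"
echo "after cd ..:       $after_up"
echo "after absolute cd: $after_absolute"
```

The first and last lines of output are the same directory, reached once from a nearby location and once from the root.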
Knowing just this much about files and directories,
Nelle is ready to organize the files that the protein assay machine will create.
First, she creates a directory called north-pacific-gyre
(to remind herself where the data came from).
Inside that, she creates a directory called 2012-07-03,
which is the date she started processing the samples.
She used to use names like conference-paper and revised-results,
but she found them hard to understand after a couple of years.
(The final straw was when she found herself creating
a directory called revised-revised-results-3.)
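The two directories described above could be created as follows. This is only a sketch: it works in a scratch directory rather than in Nelle's real home, and the `-p` shortcut is an addition that the text does not mention.

```shell
base=$(mktemp -d)
cd "$base"

# Step by step, as in the text: the outer directory first, then the dated one inside it.
mkdir north-pacific-gyre
mkdir north-pacific-gyre/2012-07-03

# Equivalent single step: -p creates any missing parent directories
# and does not complain if the directories already exist.
mkdir -p north-pacific-gyre/2012-07-03

# -F appends / to directory names in the listing.
contents=$(ls -F north-pacific-gyre)
echo "$contents"
```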
Absolute vs Relative Paths
Starting from /Users/amanda/data/, which of the following commands could Amanda use to navigate to her home directory, which is /Users/amanda?
1. cd .
2. cd /
3. cd /home/amanda
4. cd ../..
5. cd ~
6. cd home
7. cd ~/data/..
8. cd
9. cd ..
Solution
1. No: . stands for the current directory.
2. No: / stands for the root directory.
3. No: Amanda’s home directory is /Users/amanda.
4. No: this goes up two levels, i.e. ends in /Users.
5. Yes: ~ stands for the user’s home directory, in this case /Users/amanda.
6. No: this would navigate into a directory home in the current directory if it exists.
7. Yes: unnecessarily complicated, but correct.
8. Yes: shortcut to go back to the user’s home directory.
9. Yes: goes up one level.
ls Reading Comprehension
Assuming a directory structure as in the above Figure (File System for Challenge Questions), if pwd displays /Users/backup, and -r tells ls to display things in reverse order, what command will display:
pnas_sub/ pnas_final/ original/
1. ls pwd
2. ls -r -F
3. ls -r -F /Users/backup
4. Either #2 or #3 above, but not #1.
Solution
1. No: pwd is not the name of a directory.
2. Yes: ls without directory argument lists files and directories in the current directory.
3. Yes: uses the absolute path explicitly.
4. Correct: see explanations above.
Exploring More ls Arguments
What does the command ls do when used with the -l and -h arguments?
Some of its output is about properties that we do not cover in this lesson (such as file permissions and ownership), but the rest should be useful nevertheless.
Solution
The -l argument makes ls use a long listing format, showing not only the file/directory names but also additional information such as the file size and the time of its last modification. The -h argument makes the file size “human readable”, i.e. display something like 5.3K instead of 5369.
Listing Recursively and By Time
The command ls -R lists the contents of directories recursively, i.e., lists their sub-directories, sub-sub-directories, and so on in alphabetical order at each level. The command ls -t lists things by time of last change, with most recently changed files or directories first. In what order does ls -R -t display things? Hint: ls -l uses a long listing format to view timestamps.
Solution
The directories are listed alphabetically at each level; the files/directories in each directory are sorted by time of last change.
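The effect of -t on its own is easy to reproduce with files whose timestamps are set by hand. In this sketch the file names and dates are invented, and `touch -t` (format YYYYMMDDhhmm) is used only to make the ordering predictable.

```shell
demo=$(mktemp -d)
cd "$demo"

# touch -t sets an explicit modification time instead of "now".
touch -t 202001010000 older.txt
touch -t 202201010000 newer.txt
touch -t 202101010000 middle.txt

by_name=$(ls)                   # alphabetical
by_time=$(ls -t)                # most recently changed first
by_time_reversed=$(ls -t -r)    # -r reverses the order: oldest first

echo "ls:      " $by_name
echo "ls -t:   " $by_time
echo "ls -t -r:" $by_time_reversed
```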
It is often necessary to move data from your local computer to the remote system and vice versa. There are many ways to do this and we will look at two here: scp and sftp.
scp from your local computer to the remote system
The most basic command line tool for moving files around is secure copy or scp.
scp behaves similarly to ssh but with one additional input, the name of the file to be copied. If we were in the shell on our local computer, the file we wanted to move was in our current directory, named “fileToMove”, and Nelle wanted to move it to her home directory on cedar.computecanada.ca, then the command would be
[local]$ scp fileToMove nelle@cedar.computecanada.ca:
You should expect to be prompted for a password, so be prepared to provide it.
Once the transfer is complete you should be able to use ssh
to log in to the remote system and see your file in your home directory.
[remote]$ ls
...
fileToMove
...
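The same pattern works in both directions. The sketch below only builds and prints the scp command lines rather than running them, because they need a real account; the user name, host and file name are the ones from the example above, and `results` is a made-up directory name.

```shell
# Values from the example above; substitute your own account and host.
remote_user="nelle"
remote_host="cedar.computecanada.ca"
file="fileToMove"

# Upload: local file first, then user@host: (nothing after the colon means the home directory).
upload_cmd="scp $file $remote_user@$remote_host:"

# Download: the same pieces in the opposite order; "." is the current local directory.
download_cmd="scp $remote_user@$remote_host:$file ."

# Whole directory: -r copies recursively.
upload_dir_cmd="scp -r results $remote_user@$remote_host:"

echo "$upload_cmd"
echo "$download_cmd"
echo "$upload_dir_cmd"
```

Run from the [local]$ prompt, each printed line is a complete command. The source always comes first and the destination second.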
Working with compressed files, using unzip and gunzip
A gzipped file has the .gz extension. You can uncompress it with gunzip filename.gz. The file decompression reference and the compression commands given earlier in this lesson apply to transferred files as well.
Key Points
scp (The Secure Copy Program) is a standard way to securely transfer data to remote HPC systems.
File ownership is an important component of a shared computing space and can be controlled with chgrp and chown.
Scripts are mostly just lists of commands from the command line in the order they are to be performed.