Automation & scripting in bash for absolute beginners
Background
What are Unix shells?
A Unix shell is a command line interpreter: the user enters commands as text, either interactively in the command line or in a script, and the shell passes them to the operating system.
Bash
Bash (Bourne Again SHell), released in 1989, is part of the GNU Project and is the default Unix shell on many systems (macOS changed its default to zsh in 2019).
Other shells
Prior to Bash, the default was the Bourne shell (sh).
Another popular shell is zsh. It is largely compatible with Bash and extends its capabilities.
Another shell in the same family is the KornShell (ksh).
All these shells are quite similar. The C shell (csh) however was modeled on the C programming language.
Bash is the most common shell and the one which makes the most sense to learn as a first Unix shell.
Why use a shell?
While automating GUI operations is really difficult, it is easy to rerun a script (a file with a number of commands). Unix shells thus allow the creation of reproducible workflows and the automation of repetitive tasks.
They are powerful tools for launching programs, modifying files, searching text, and combining commands.
They also make it possible to work on remote machines and HPC systems.
How we will use Bash today
Bash is a Unix shell. You thus need a Unix or Unix-like operating system.
We will connect to a remote HPC system via SSH (secure shell). HPC systems almost always run Linux.
Those on Linux or macOS can alternatively use Bash directly on their machine. On macOS, the default is now zsh (you can check this by typing echo $SHELL in Terminal), but zsh is largely compatible with Bash for everyday commands, so it is fine to use it instead. If you really want to use Bash, simply launch it by typing bash in Terminal.
Connecting to a remote HPC system via SSH
Usernames and password
We will give you a link to an etherpad during the workshop. Add your name next to a free username to claim it.
We will also give you the password for our training cluster. When prompted, enter it.
Linux and MacOS users
Linux users: open the terminal emulator of your choice.
MacOS users: open “Terminal”.
Then type:
ssh userxx@bashworkshop.c3.ca # Replace userxx by your username (e.g. user09)
Windows users
We suggest using the free version of MobaXterm.
MobaXterm comes with a terminal emulator and a graphical interface for SSH sessions.
Open MobaXterm, click on “Session”, then “SSH”, and fill in the Remote host name and your username. Here is a live demo.
Bash: the basics
The prompt
In command-line interfaces, a command prompt is a sequence of characters indicating that the interpreter is ready to accept input. It can also provide some information (e.g. time, error types, username and hostname).
The Bash prompt is customizable. By default, it often shows the username and the hostname, and it typically ends with $.
Help on commands
Man pages:
man <command>
Man pages open in a pager (usually less). Navigate down with the space bar and up with the b key. Quit the pager with the q key.
Help pages:
<command> --help
Inspect commands:
command -V <command>
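As a quick illustration of the last form, here is a minimal sketch. The output for external programs includes a path that differs between systems, so only the built-in case is shown with its output:

```shell
# cd is built into the shell itself
command -V cd
# cd is a shell builtin

# cat is an external program: the output gives its location,
# which varies from one system to another
command -V cat
```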
Examples of commands
- Print working directory: pwd
- Change directory: cd
- Print: echo
- Print content of a file: cat
- List: ls
- Copy: cp
- Move or rename: mv
- Create a new directory: mkdir
- Create a new file: touch
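The commands above can be tried together in a short session. This is only a sketch: the directory and file names are made up for the example, and it assumes the mktemp utility is available to create a throwaway directory so that nothing important is touched.

```shell
# Create an empty temporary directory and move into it
# (assumption: mktemp is installed, as it is on most Linux and macOS systems)
workdir=$(mktemp -d)
cd "$workdir"

mkdir notes                    # create a directory
cd notes                       # move into it
touch todo.txt                 # create an empty file
echo "Learn Bash" > todo.txt   # write a line into the file (> redirects the output of echo)
cp todo.txt backup.txt         # copy the file
mv backup.txt old_todo.txt     # rename the copy
ls                             # lists old_todo.txt and todo.txt
cat todo.txt                   # prints: Learn Bash
pwd                            # prints the path of the notes directory
```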
Keybindings
Clear the terminal (same as the command clear) with C-l (this means: press the Ctrl and L keys at the same time).
Navigate command history with C-p and C-n (or up and down arrows).
You can auto-complete commands by pressing the tab key.
Bash scripting: the basics
Instead of typing commands one at a time directly in a terminal, you can write them down, one per line, in a text file called a script.
They will be run in the order in which they are written when you execute the script.
This is a great way to automate tasks: to rerun this sequence of commands, you simply have to rerun the script.
File name
Shell scripts, including Bash scripts, are usually given the extension sh (e.g. my_script.sh).
You can store scripts anywhere, but a common practice is to store them in a ~/bin directory.
Syntax
Shebang
Scripts can be written for any interpreter (e.g. Bash, Python, R). To tell the system which one to use, put a shebang (#!) followed by the path of the interpreter on the first line of the script.
To use Bash, start your scripts with:
#!/bin/bash
You may also encounter this notation:
#!/usr/bin/env bash
If you are curious, you can read the answers to this Stack Overflow question for the differences between the two.
Comments
Anything to the right of # is ignored by the interpreter and is for human consumption only.
# You can write full-line comments
pwd # You can also write comments after commands
Executing scripts
There are two ways to execute a script:
bash my_script.sh
./my_script.sh # The dot represents the current directory
In the latter case, you need to make sure that your script is executable by first running:
chmod u+x my_script.sh # This makes the script executable by the user (i.e. you)
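To see both methods side by side, here is a small sketch. The file name hello.sh is hypothetical, and the example assumes mktemp is available and that Bash is installed at /bin/bash.

```shell
cd "$(mktemp -d)"    # work in a throwaway directory

# Write a two-line script into hello.sh (hypothetical name)
printf '%s\n' '#!/bin/bash' 'echo "Hello from a script"' > hello.sh

bash hello.sh        # works right away: bash reads the file
# Hello from a script

chmod u+x hello.sh   # add the execute permission for the user
./hello.sh           # now the file can be run directly
# Hello from a script
```

With the first method, the shebang is not needed since you name the interpreter yourself. With the second, the system reads the shebang to find out which interpreter to use.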
Our first script
Open a text editor (e.g. nano) and type:
#!/bin/bash
echo "This is our first script."
Save and close the file.
Now run the script with one, then the other method.
What does this script do?
Variables
Declaring variables
You can declare a variable (i.e. a name that holds a value) with the = sign. Do not put spaces around the = sign.
variable=Test
Quotes
Let’s experiment with quotes:
variable=This string is the value of the variable # Oops... without quotes, Bash tries to run "string" as a command
variable="This string is the value of the variable"
variable='This string is the value of the variable'
variable='This string's the value of the variable' # Oops...
bash: -c: line 1: unexpected EOF while looking for matching `''
bash: -c: line 2: syntax error: unexpected end of file
variable="This string's the value of the variable"
variable="This string is the value of the variable called "variable"" # Oops...
variable="This string is the value of the variable called \"variable\"" # \ is the escape character
Expanding a variable’s value
To expand a variable (i.e. to access its value), you need to prefix its name with $:
variable=Test
echo variable
variable
Mmmm… not really what we want!
variable=Test
echo $variable
Test
variable=Test; echo "$variable"
Test
!! Single quotes don’t expand variables.
variable=Test; echo '$variable'
$variable
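Double quotes matter as soon as a value contains spaces. The sketch below uses a made-up variable named greeting to compare the three forms in Bash:

```shell
greeting="Hello     world"   # five spaces between the two words

echo $greeting     # unquoted: Bash splits the value into two separate words
# Hello world

echo "$greeting"   # double quotes: the value is expanded and kept intact
# Hello     world

echo '$greeting'   # single quotes: no expansion at all
# $greeting
```

Quoting expansions with double quotes is a safe default, in particular for file names that may contain spaces.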
Passing variables to a Bash script
Create a script called name.sh with the following content:
#!/bin/bash
echo "My name is $1." # $1 refers to the first argument passed to the script
You can now pass an argument to this script with:
bash name.sh Marie
My name is Marie.
You can pass several arguments to a script. Copy name.sh to name2.sh and edit name2.sh to look like the following:
#!/bin/bash
echo "My name is $1 and I am $2 years old."
bash name2.sh Marie 43
My name is Marie and I am 43 years old.
A script can also accept any number of arguments. The special parameter $@ expands to all of them:
#!/bin/bash
echo "$@"
bash script.sh argument1 argument2 argument3 argument4
argument1 argument2 argument3 argument4
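A few related parameters can be inspected in the same way. The sketch below wraps the commands in a function with the hypothetical name show_args, only so that it can be pasted straight into a terminal: inside a function, $1, $@, and $# (the number of arguments) behave as they do inside a script.

```shell
# Hypothetical helper, for illustration only
show_args() {
    echo "Number of arguments: $#"
    echo "First argument: $1"
    echo "All arguments:" "$@"
}

show_args Marie 43 "Vancouver Island"
# Number of arguments: 3
# First argument: Marie
# All arguments: Marie 43 Vancouver Island
```

Note that the quoted "Vancouver Island" counts as a single argument.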
Brace expansion
echo {1..5}
1 2 3 4 5
echo {01..10}
01 02 03 04 05 06 07 08 09 10
echo {1..5}.txt
1.txt 2.txt 3.txt 4.txt 5.txt
echo {r..v}
r s t u v
echo {file1,file2}.sh
file1.sh file2.sh
!! Make sure not to add a space after the comma.
touch {file1,file2}.sh
touch file{3..6}.sh
echo {list,of,strings}
list of strings
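Brace expansions can be combined with each other and with any command that accepts several arguments. A short sketch, using made-up names and assuming mktemp is available for a throwaway directory:

```shell
#!/bin/bash
echo chapter{1..3}.md
# chapter1.md chapter2.md chapter3.md

echo {a,b}{1,2}       # two expansions side by side give every combination
# a1 a2 b1 b2

cd "$(mktemp -d)"     # work in a throwaway directory
mkdir -p thesis/{src,data,results}   # creates thesis/src, thesis/data, and thesis/results
ls thesis
```

The -p option of mkdir creates the parent directory (thesis) if it does not exist yet.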
Wildcards
Wildcards are a powerful way to apply a command to all the elements sharing a common pattern.
For instance, we can delete all the files we created earlier (file1.sh, file2.sh, etc.) with a single command:
rm file*.sh
Be careful: rm is irreversible. Deleted files do not go to the trash: they are gone.
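Because deletion cannot be undone, it can help to preview what a pattern matches before using it with rm. The sketch below uses made-up file names in a throwaway directory (assuming mktemp is available) and also shows the ? wildcard, which matches exactly one character:

```shell
cd "$(mktemp -d)"     # work in a throwaway directory
touch file1.sh file2.sh file30.sh notes.txt

echo file*.sh         # * matches any number of characters
# file1.sh file2.sh file30.sh

echo file?.sh         # ? matches exactly one character
# file1.sh file2.sh

echo rm file*.sh      # preview: echo prints the command instead of running it
# rm file1.sh file2.sh file30.sh
```

Once the preview looks right, remove the leading echo to run the actual command.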
Loops
To apply a set of commands to each element of a list, you can use a for loop. The general structure is as follows:
for <variable> in <list>
do
    <statement1>
    <statement2>
    ...
done
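As a minimal sketch of this structure (the variable names and values are made up), the list can be written out by hand or generated with a brace expansion:

```shell
#!/bin/bash
# Loop over a list written out by hand
for fruit in apple banana cherry
do
    echo "I like $fruit"
done
# I like apple
# I like banana
# I like cherry

# Loop over a list generated by brace expansion
for i in {1..3}
do
    echo "Chapter $i"
done
# Chapter 1
# Chapter 2
# Chapter 3
```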
Let’s create the script names.sh:
#!/bin/bash
for name in "$@"
do
    echo "$name"
done
Now let’s run it with a list of arguments:
bash names.sh Patrick Paul Marie Alex
Patrick
Paul
Marie
Alex
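Whether $@ is quoted makes a difference as soon as an argument contains a space. The sketch below uses two hypothetical functions (arguments behave the same way in a function as in a script) to compare the two forms in Bash:

```shell
# Hypothetical helpers, for illustration only
print_quoted() {
    for name in "$@"
    do
        echo "$name"
    done
}

print_unquoted() {
    for name in $@
    do
        echo "$name"
    done
}

print_quoted Patrick "Marie Curie"
# Patrick
# Marie Curie

print_unquoted Patrick "Marie Curie"
# Patrick
# Marie
# Curie
```

With the quoted form, each argument stays in one piece. Without quotes, Bash splits "Marie Curie" into two words.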
Compare the outputs of the following 2 scripts:
- script1.sh:
#!/bin/bash
echo $@
- script2.sh:
#!/bin/bash
for i in $@
do
    echo $i
done
How do you explain the difference between running:
bash script1.sh arg1 arg2 arg3
and running:
bash script2.sh arg1 arg2 arg3
Let’s put it all together to automate some task
This is a rather silly example, but bear with me and let’s imagine that it actually makes sense (of course, you don’t write that many thesis chapters so you would probably never automate these tasks…)
So… let’s imagine that each time you write a thesis chapter, you do the same things:
- you create a directory with the name of the chapter,
- you create a number of subdirectories (for your source code, your manuscript, your data, and your results),
- you create a Python script in the source code directory,
- you create a markdown document in your manuscript directory,
- you put the whole thing under version control with Git,
- you create a .gitignore file in which you put the data subdirectory.
Write a script that would do all this, then test the script.
Give it a try on your own before looking at the solution below…
Solution
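Here is what the script looks like (let’s call it chapter.sh):
#!/bin/bash
# $1 is the chapter name, passed as the first argument
mkdir "$1"
cd "$1" || exit 1 # Stop here if the directory could not be entered
mkdir src data results ms
touch "src/$1.py" "ms/$1.md"
git init
echo data/ > .gitignore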
You then run the script:
bash chapter.sh chapter1
You can verify that all the files and directories got created with:
tree chapter1
chapter1/
├── data
├── ms
│   └── chapter1.md
├── results
└── src
    └── chapter1.py
and:
ls -aF chapter1
./ ../ data/ .git/ .gitignore ms/ results/ src/
You can also verify the content of your .gitignore file with:
cat chapter1/.gitignore
data/
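If you want to go one step further, here is a sketch of a more defensive variant. It is only one possible approach: the function name make_chapter is hypothetical, and the if test and return statement go slightly beyond what this workshop covers. It refuses to run without a chapter name and uses brace expansion to create the subdirectories, so it never needs to change directory.

```shell
#!/bin/bash
# Hypothetical, more defensive variant of chapter.sh, written as a function
make_chapter() {
    # Stop with a usage message if no chapter name was given
    if [ -z "$1" ]; then
        echo "Usage: make_chapter <chapter_name>" >&2
        return 1
    fi

    # Create the chapter directory and its four subdirectories in one go
    mkdir -p "$1"/{src,data,results,ms}

    # Create the empty Python script and markdown document
    touch "$1/src/$1.py" "$1/ms/$1.md"

    # Ignore the data subdirectory, then put the chapter under version control
    echo data/ > "$1/.gitignore"
    git init --quiet "$1"
}
```

Usage, from the directory in which the chapter should be created:

```shell
make_chapter chapter2
```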
Resources
One very useful (although very dense) resource is the Bash manual.
You can also get information on Bash from within Bash with:
info bash
and:
man bash
There are also countless resources online. Don’t forget to search the web for anything you don’t know how to do: you will almost certainly find the answer on Stack Overflow or another Stack Exchange site.