Taking Git one step further: collaborating through GitHub


Abstract

Git is a version control tool: it keeps a record of the history of your files. Not only is this much cleaner than keeping (often messy) collections of file versions, it also makes navigating this history and identifying the changes between versions particularly easy. Using online remotes (for instance through repository hosting services such as GitHub, GitLab, or Bitbucket) unleashes Git's full potential by providing:

  • an online backup of your projects' history

  • a powerful system to collaborate on projects

  • an easy way to make your open-source projects available to the community (and conversely, an easy way to contribute to others' open-source projects)

In this workshop, we will go over the full workflow of creating a project, hosting it on GitHub, and collaborating on it.

Workshop requirements

1 - Git
You can download Git here if you are on Windows and here if you use macOS or Linux.

2 - GitHub account
A free GitHub account.

Before we start: setting up a GitHub account

If you haven't done so yet, create a free GitHub account.

Each time you push to a repo on GitHub over HTTPS (we will see what "pushing" means later today), you will be asked to enter your GitHub user name and your credentials (GitHub no longer accepts account passwords for Git operations, so this means a personal access token). This quickly gets tiring.

An easy way to avoid this is to set up SSH for your account. Here is the GitHub help file to do so. Try to follow these instructions and I will help you if you run into issues. If you cannot set this up today, it is not a big deal for the workshop: you will simply have to enter your credentials a number of times. But ultimately, you will probably want to set this up.
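
As a rough idea of what the SSH setup involves, here is a minimal sketch that generates a key pair. The scratch directory, the e-mail address, and the empty passphrase are assumptions made so that the example runs unattended. For a real key, follow the GitHub help file, use the default location in ~/.ssh, and choose a passphrase.

```shell
# Generate an Ed25519 key pair in a scratch directory (placeholder path,
# so that no existing key in ~/.ssh gets overwritten).
keydir="$(mktemp -d)"
ssh-keygen -q -t ed25519 -C "you@example.com" -N "" -f "$keydir/id_ed25519"

# The .pub file is the public half: this is the part you register on GitHub.
cat "$keydir/id_ed25519.pub"

# Once a real key is registered with your account, you can test the connection:
# ssh -T git@github.com
```

The private half (the file without the .pub extension) never leaves your machine.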




Workshop scenario

We are a team of virologists starting a new research project on COVID-19 and we decided to use GitHub as our collaboration tool.

Please note that we could have chosen GitLab or Bitbucket, which offer similar functionality. You can apply the workflow we will use today to these services with little or no change.



Project setup

Our project

Let's imagine a realistic scenario:

Most researchers aren't starting anything yet (so most of you, please simply watch for now), while others are starting various things independently.

You want to have all the project files in one place. So those who are starting to work on the project, please create a directory with the name of your choosing (it doesn't have to be the same name for everyone). You can create it anywhere suitable on your computer: the location does not matter either.

  • One of you, please start a script in the language of your choice (any language is fine as long as your script is a text file: it can be a Python file, an R file, etc.). Just write a couple of lines of code.

  • Another one please start writing a bogus draft manuscript in a .txt file.

  • Someone else please add a few images (.jpg or .png) to your project folder (if you don't want to use your pictures, you can download a few images from the web).

  • Finally, someone please add a data file (quickly create a very short bogus Excel file).

At this point, the work is totally uncoordinated.

Starting version control

One of you will start the coordination work. Let's say for instance that it is the person who started the draft manuscript (everybody else, you are welcome to gather around the person doing that work or you can simply follow from your seat).

That person will start version control with Git on their project, thus turning it into a repository.

cd /path/to/project
git init

A .git directory was created in the project directory.

If you don't see it, make sure that your file manager is set so that you can see hidden files and directories.

You can also see that our directory is now a repository by running:

git status
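
To try these two commands without touching a real project, the steps above can be sketched in a throwaway directory. The path and the file content are placeholders, not part of the workshop material.

```shell
set -e
# Hypothetical project directory holding a draft manuscript.
project="$(mktemp -d)/covid-project"
mkdir -p "$project"
cd "$project"
echo "Draft manuscript" > manuscript.txt

git init -q
ls -a          # .git now appears among the hidden entries
git status     # reports manuscript.txt as an untracked file
```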

Setting up a remote on GitHub

Before we can do this, we need to have at least one commit.

Let's add the draft manuscript. As this is the only file at this point, you can run:

git add .

This stages the file to be committed.

What do you get now when you run git status?

Then commit your staged file with:

git commit -m "Initial commit with draft manuscript"

What do you get now when you run git status?
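
If you want to replay the staging and committing steps in isolation, here is a minimal sketch in a scratch repository. The user name and e-mail are placeholder values, set locally only so that the commit succeeds on a machine without a global Git identity.

```shell
set -e
project="$(mktemp -d)"
cd "$project"
git init -q
git config user.name "Workshop User"       # placeholder identity
git config user.email "user@example.com"
echo "Draft manuscript" > manuscript.txt

git add .
git status --short     # "A  manuscript.txt": staged, not yet committed
git commit -q -m "Initial commit with draft manuscript"
git status --short     # prints nothing: the working tree is clean
git log --oneline      # the history now holds one commit
```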

But our repo does not have any remote: git remote -v does not return anything.

The project, though under version control, only resides on your computer. If we want to collaborate with others, we need to have a version on the web.

Go to your GitHub webpage, select the Repositories tab, then click the green New button.

Enter the name of your repo, avoiding spaces. It can be the name you had on your computer (it would be sensible and make things less confusing), but it doesn't have to be.

You can make your repository public or private. In a real scenario, our researchers would probably go with the private option as their research could be sensitive. If you want to develop open source projects, of course, you want to make them public.

Here, we will go with the public option because, while free accounts allow private repositories, not all team options are available on private repositories for free accounts.

Now, you can copy the SSH address of your repo from its GitHub page (or the HTTPS address if you have not set up SSH) and add it as a remote for your project:

git remote add origin git@github.com:<your-gh-user-name>/<your-repo-name>.git

Now, your project has a remote called "origin": git remote -v returns your repo on GitHub.

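What happens if you run git push?

This is because your current branch ("master") is not associated with any branch on the remote. You need to tell Git where to push "master". (If your Git is configured with a different default branch name, such as "main", use that name instead of "master" in the commands below.)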

You could run:

git push origin master

This tells Git to push "master" to "origin". But you would have to do this each time you want to push to "origin" from "master". A nicer way is to set the upstream for "master" when you push for the first time. This is done by adding the flag --set-upstream (or -u for short):

git push --set-upstream origin master

From now on, git push will be enough to push to your remote called "origin" (when you are on the branch "master").
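
The effect of --set-upstream can be checked without a GitHub account. In the sketch below, a bare repository on the local disk stands in for the GitHub remote (an assumption made for the sake of a runnable example), and the identity values are placeholders.

```shell
set -e
work="$(mktemp -d)"

# A bare repository on disk plays the role of the GitHub repo.
git init -q --bare "$work/remote.git"
git -C "$work/remote.git" symbolic-ref HEAD refs/heads/master

git init -q "$work/project"
cd "$work/project"
git symbolic-ref HEAD refs/heads/master    # make sure the branch is named "master"
git config user.name "Workshop User"       # placeholder identity
git config user.email "user@example.com"
echo "Draft manuscript" > manuscript.txt
git add .
git commit -q -m "Initial commit with draft manuscript"

git remote add origin "$work/remote.git"
git remote -v                              # origin now points at the stand-in remote
git push -q --set-upstream origin master

# The upstream is recorded, so a plain "git push" is enough from now on.
git push -q
git rev-parse --abbrev-ref 'master@{upstream}'   # origin/master
```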

If you were working alone on this project, you would be set. But we want to collaborate as a team on it.

Collaborating through GitHub

Inviting collaborators to a GitHub repo

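Go to the Settings tab of your repo, then the Manage access section on the left-hand side. Finally, click Invite a collaborator. (GitHub changes its interface regularly, so these items may be named differently on your screen.)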

Cloning the repo

Now that the project is on the web, all other team members can clone it on their machine to start collaborating on it.

cd /place/where/you/want/to/have/your/project
git clone git@github.com:<user>/<repo>.git <name>

<name> is optional: use it only if you want the directory on your machine to have a different name from the repo.
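
Here is a small sketch of what cloning gives you. As an assumption for the sake of a runnable example, a local bare repository replaces the GitHub address, and covid-project is a hypothetical value for <name>.

```shell
set -e
work="$(mktemp -d)"

# Build a stand-in remote that already contains one commit.
git init -q --bare "$work/remote.git"
git -C "$work/remote.git" symbolic-ref HEAD refs/heads/master
git init -q "$work/seed"
cd "$work/seed"
git symbolic-ref HEAD refs/heads/master
git config user.name "Workshop User"       # placeholder identity
git config user.email "user@example.com"
echo "Draft manuscript" > manuscript.txt
git add .
git commit -q -m "Initial commit with draft manuscript"
git push -q "$work/remote.git" master

# What a team member would do: clone into a directory of their choosing.
cd "$work"
git clone -q "$work/remote.git" covid-project
cd covid-project
ls             # the project files are there
git remote -v  # "origin" is already configured
git branch -vv # master already tracks origin/master
```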

Pushing changes

Those who had started to work on some file(s) now have two directories: their original one and the newly cloned repo. Bring them together by copying the content of one into the other (you can copy your file(s) into the newly cloned repo, or copy the entire content of the repo, including the hidden .git directory, into your previous directory).

Then, you will want to push those files of yours to the remote so that everybody in the team can get a copy.

You don't need to set the remote: cloning a repo from GitHub automatically does this for you. So once you have staged your files with git add and committed them with git commit, all you need to do is to run:

git push

Pulling changes

Now, everybody can pull those new files to their computer:

git pull

From now on, whenever someone wants to make their local work available to everybody, they can push it to the remote and whenever someone wants to update their local repo, adding to it everybody else's changes, they can pull those changes.
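
The whole round trip can be rehearsed on a single machine. The sketch below makes a few assumptions: a local bare repository stands in for GitHub, two clones named alice and bob play two team members, and clone_as is a hypothetical helper written for this example.

```shell
set -e
work="$(mktemp -d)"
git init -q --bare "$work/remote.git"
git -C "$work/remote.git" symbolic-ref HEAD refs/heads/master

# Hypothetical helper: clone the remote and set a placeholder identity.
clone_as() {
    git clone -q "$work/remote.git" "$work/$1" 2>/dev/null
    git -C "$work/$1" config user.name "$1"
    git -C "$work/$1" config user.email "$1@example.com"
}

# alice publishes the first commit.
clone_as alice
cd "$work/alice"
git symbolic-ref HEAD refs/heads/master
echo "Draft manuscript" > manuscript.txt
git add .
git commit -q -m "Initial commit with draft manuscript"
git push -q --set-upstream origin master

# bob clones, adds his script, commits, and pushes.
clone_as bob
cd "$work/bob"
echo 'print("hello")' > analysis.py
git add analysis.py
git commit -q -m "Add analysis script"
git push -q

# alice pulls and receives bob's file.
cd "$work/alice"
git pull -q
ls
```

Note that bob stages and commits before pushing: git push only transfers commits, not uncommitted files.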

This is all well and good as long as everybody works on something different. Now, what happens if several people are working on the same file?

Resolving conflicts

Working on the same file is no problem at all as long as different sections of the file are being edited. But if the same section is changed by different people, this creates a conflict.

Ideally, you want to avoid conflicts with a good team workflow. But if they arise, there are great tools to help you deal with them.

You can run:

git mergetool

Or you can use one of many GUI applications developed to make Git more friendly.

The lucky people who use Emacs will have access to an amazing tool: Emacs Ediff mode.

Whatever tool you use, conflicts will look like some variation of this:

<<<<<<< HEAD (current change)
One possible version of this section of the file
=======
Another possible version of the same section of the file
>>>>>>> some other version (incoming change)

You will jump from conflict to conflict within a file and you will have to decide which version you want to keep for each of them. You can also keep your entire version or "their" entire version of a file in one go (you then stage the file with git add and commit to conclude the merge):

git checkout --ours <file>
git checkout --theirs <file>

Let's create a conflict and see what that looks like.
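
One way to provoke a conflict on a single machine is sketched below. It rests on a few assumptions: a local bare repository stands in for GitHub, two clones named alice and bob play two team members, and the file content is made up. The --no-rebase flag is spelled out so that git pull merges regardless of your local configuration.

```shell
set -e
work="$(mktemp -d)"
git init -q --bare "$work/remote.git"
git -C "$work/remote.git" symbolic-ref HEAD refs/heads/master

# alice publishes the first version of the manuscript.
git clone -q "$work/remote.git" "$work/alice" 2>/dev/null
cd "$work/alice"
git symbolic-ref HEAD refs/heads/master
git config user.name "alice"
git config user.email "alice@example.com"
echo "Title: draft" > manuscript.txt
git add .
git commit -q -m "Initial commit"
git push -q --set-upstream origin master

# bob clones, edits the title, and pushes first.
git clone -q "$work/remote.git" "$work/bob"
cd "$work/bob"
git config user.name "bob"
git config user.email "bob@example.com"
echo "Title: Bob's version" > manuscript.txt
git commit -q -am "Bob edits the title"
git push -q

# alice edits the same line without having pulled bob's change.
cd "$work/alice"
echo "Title: Alice's version" > manuscript.txt
git commit -q -am "Alice edits the title"

# The pull has to merge two edits of the same line and stops with a conflict.
git pull --no-rebase || true
cat manuscript.txt     # the file now contains the conflict markers

# Keep alice's side ("ours"), mark the file as resolved, conclude the merge.
git checkout --ours manuscript.txt
git add manuscript.txt
git commit -q --no-edit
git push -q
```

In a real project, you would usually edit the conflicted section by hand or with a merge tool rather than discard one side entirely.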

Branches

What if you want to experiment with something in the project and you don't want to mess it all up?

Branches are a great way to play with project files in a safe way. If you don't like the result, you can simply get rid of the branch. If you like it, you can merge it into master.

# show all branches (current branch marked with *)
git branch

# create a new branch called <name>
git branch <name>

# checkout branch <name>
git checkout <name>

# a better option (since it is easy to create a new branch
# and forget to switch to it) is to run
git checkout -b <name>
# this creates a branch called <name> and switches to it

# delete branch <name>
git branch -d <name>
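
Putting these commands together, a full experiment-then-merge cycle could look like the following sketch. The branch name experiment, the file content, and the identity values are placeholders chosen for the example.

```shell
set -e
repo="$(mktemp -d)"
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master    # make sure the branch is named "master"
git config user.name "Workshop User"       # placeholder identity
git config user.email "user@example.com"
echo "Draft manuscript" > manuscript.txt
git add .
git commit -q -m "Initial commit"

# Experiment on a separate branch: master is left untouched.
git checkout -q -b experiment
echo "A risky new paragraph" >> manuscript.txt
git commit -q -am "Try a new paragraph"

# Happy with the result: bring it into master, then clean up.
git checkout -q master
git merge -q experiment
git branch -d experiment
git branch                                 # only master is left
```

Had the experiment been a failure, you would skip the merge and delete the branch with git branch -D experiment (capital D, since -d refuses to delete unmerged work).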
