Collaborating through Git

Example using GitHub as a remote


Abstract

Git is a powerful version control system that lets you record, access, and restore the history of projects.

Once remotes are set up on the internet or another network, Git is also a mighty collaboration tool.

In this workshop, we will use the popular online Git repository hosting site GitHub to practice a collaboration workflow typical of many research teams.

Software requirements

1 - Properly configured Git

You can download Git here if you are on Windows and here if you use macOS or Linux.

These minimum configurations should be set properly:
- your user name
- your email address
- your preferred text editor
- the end of line formatting matching your operating system
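
As a rough sketch, these four settings can be set with git config. The name, email address, and editor below are placeholders, and the commands run in a scratch repository so that your real settings stay untouched. To apply the settings to all your repositories, you would add the --global flag to each git config command.

```shell
# Scratch repository so that the demo does not alter your real settings.
cd "$(mktemp -d)"
git init -q

git config user.name "Your Name"           # placeholder name
git config user.email "you@example.com"    # placeholder address
git config core.editor "nano"              # any editor you like
# End of line handling: "input" is the usual choice on macOS/Linux,
# "true" is the usual choice on Windows.
git config core.autocrlf input

# Review what was set for this repository.
git config --list --local
```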

2 - GitHub account

A free GitHub account.

(Optional) If you don't want to enter your credentials all the time, set up SSH for your account.

Prerequisites

Basic knowledge of Git:

- familiarity with the concept of staging area,
- experience with staging and committing.

Git is a great tool for version control. But how can it be used to collaborate on projects?

Remotes

To collaborate, you need a syncing hub

Git projects are local and self-contained: except for a few configuration files, everything lives in the root of your project. To work jointly with your collaborators, however, everybody needs access to the project.

Git has a great solution for this: remotes.

What are remotes, really?

Remotes are copies of a project that reside outside of it and are connected to it so that data can be synced back and forth. "Outside" can be anywhere, including on an external drive, or even on the same machine. If you want your remotes to serve as backups, you want them outside your machine. And if you want your remotes to allow for collaboration, you want them on a network your collaborators have access to. One option, of course, is the internet.

A project can have several remotes. An address (or a path if they are local) specifies their location.
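
To make this concrete, here is a small sketch in which both remotes live on the same machine and are identified by a path. The names backup, shared, and project are made up for the illustration, and everything happens in a temporary directory.

```shell
sandbox="$(mktemp -d)"
cd "$sandbox"

# Two bare repositories standing in for two remote locations.
git init -q --bare "$sandbox/backup.git"
git init -q --bare "$sandbox/shared.git"

# A project with two remotes, each identified by a name and a path.
git init -q project
cd project
git remote add backup "$sandbox/backup.git"
git remote add shared "$sandbox/shared.git"

# List the remotes with their locations.
git remote -v
```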

A number of online Git repository managers have become popular remote hosting sites. These include GitHub, GitLab, and Bitbucket. Today, we will use GitHub.

Collaborating on projects through Git and GitHub

There are multiple scenarios:

  • You created a project on your machine, put it under version control, and now you want others to contribute to it.
  • You want to contribute to a project started by others:
      - You do not have write access to the GitHub repository.
      - You were granted write access to it.

Let's go over these scenarios.

You create a project and want others to contribute to it

Let's quickly create a project:

$ cd /location/of/new/project
$ mkdir myproject
$ cd myproject
$ echo "This is our great project" > README

This is the content of our project:

$ ls -a
.  ..  README

Then, let's put it under version control with Git:

$ git init

You can see that this is now a Git repository:

$ ls -a
.  ..  .git  README

Let's create a first commit:

$ git add README
$ git commit -m "Initial commit: add README"

Now, you need to create a remote on GitHub.

First, you need to create a new GitHub repository.

Creating an empty repository on GitHub

Go to https://github.com, log in, and go to your home page (https://github.com/<user>).

From there, select the Repositories tab, then click the green New button.

Enter the name you want for your repo, without spaces. It can be the same name you have for your project on your computer (it would be sensible and make things less confusing), but it doesn't have to be.

You can make your repository public or private. Choose the private option if your research contains sensitive data or you do not want to share your project with the world. If you want to develop open source projects, of course, you want to make them public.

You now have an empty repository on GitHub, but it is not yet connected to your local repository.

Adding the new GitHub repo as a remote

Click on the green Code drop-down button, select SSH (if you have set up SSH for your GitHub account) or HTTPS (if you haven't), and copy the address.

Then, go back to your command line, cd inside your project if you aren't already there and add your remote.

You add a remote with:

git remote add <remote-name> <remote-address>

<remote-name> is only a convenience name that will identify that remote. You can choose any name, but since Git automatically calls the remote origin when you clone a repo, it is common practice to use origin as the name for the first remote.

<remote-address> is the address of your remote in the HTTPS form or (if you have set up SSH for your GitHub account) the SSH form.

Example (using an SSH address):

git remote add origin git@github.com:<user>/<repo>.git

In our case:

$ git remote add origin git@github.com:<user>/myproject.git

Example (using an HTTPS address):

git remote add origin https://github.com/<user>/<repo>.git

In our case:

$ git remote add origin https://github.com/<user>/myproject.git

(Type: git remote add origin, then paste the address you have just copied on GitHub).

Finally, if you want to grant your collaborators write access to the project, you need to add them to it. You don't have to give them write access: we will see later how one can contribute to a project without it. But if you are involved in a serious collaboration with others on a project, you might want to facilitate the process by letting them edit the project directly.

Inviting collaborators to a GitHub repo

  • Go to your GitHub project page
  • Click on the Settings tab
  • Click on the Manage access section on the left-hand side (you will be prompted for your GitHub password)
  • Click on the green Invite a collaborator button
  • Invite your collaborators using their GitHub user name, their email address, or their full name

Getting information on remotes

To list remotes, run:

git remote

To list the remotes with their addresses:

git remote -v

You can see that your local project now has a remote called origin and that it has the address of your GitHub repo.

To get yet more information about a particular remote, you can run:

git remote show <remote-name>

For instance, to inspect your new remote, run:

git remote show origin

Managing remotes

You rename a remote with:

git remote rename <old-remote-name> <new-remote-name>

And you delete a remote with:

git remote remove <remote-name>

You can change the url of the remote with:

git remote set-url <remote-name> <new-url> [<old-url>]
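
Here is a short sketch that strings these commands together in a throwaway repository. The remote name hub and the two paths are hypothetical. Git does not check that an address exists when you add or change a remote, so no real repository is needed.

```shell
sandbox="$(mktemp -d)"
cd "$sandbox"
git init -q myproject
cd myproject

git remote add origin "$sandbox/old-location.git"    # hypothetical path
git remote rename origin hub                         # origin is now called hub
git remote set-url hub "$sandbox/new-location.git"   # point hub somewhere else
git remote -v                                        # hub shows the new address

git remote remove hub                                # delete the remote
git remote                                           # prints nothing
```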

Working with remotes

Downloading data from the remote

If you collaborate on your project through the GitHub remote, you will have to download data added by your teammates to keep your local project up to date.

To download new data from the remote, you have 2 options: git fetch and git pull.

Fetching changes

Fetching downloads the data from your remote that you don't already have in your local version of the project.

git fetch <remote-name>

The branches on the remote are now accessible locally as <remote-name>/<branch>. You can inspect them or you can merge them into your local branches.
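
The following sketch shows this without going online. It assumes a local bare repository (hub.git) playing the role of the GitHub remote and two made-up collaborators, alice and bob, all in a temporary directory. The demo identity and file contents are placeholders.

```shell
sandbox="$(mktemp -d)"
cd "$sandbox"
# Placeholder identity for the demo commits.
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL="demo@example.com"

# A bare repository plays the role of the GitHub remote.
git init -q --bare hub.git
git -C hub.git symbolic-ref HEAD refs/heads/master   # its default branch is master

# alice creates the project and pushes it.
git init -q alice
cd alice
echo "This is our great project" > README
git add README
git commit -q -m "Initial commit: add README"
git branch -M master        # make sure the branch is called master
git remote add origin "$sandbox/hub.git"
git push -q origin master

# bob clones the project, then alice pushes one more commit.
cd "$sandbox"
git clone -q hub.git bob
cd alice
echo "More details" >> README
git commit -q -am "Expand README"
git push -q origin master

# bob fetches: origin/master moves forward, his own master does not.
cd "$sandbox/bob"
git fetch origin
git log --oneline master..origin/master   # commits bob does not have yet
git merge origin/master                   # bring them into his branch
```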

Example: To fetch from your new GitHub remote, you would run:

git fetch origin

Pulling changes

Pulling does 2 things: it fetches the data (as we just saw) and it then merges the fetched changes into your current local branch.

git pull <remote-name> <branch>

Example

git pull origin master

If your branch is already tracking a remote branch (see below), then you simply need to run:

git pull
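
As an illustration, the sketch below pulls in a sandbox rather than from GitHub. The bare repository hub.git stands in for the remote, and alice and bob are made-up collaborators. All names and file contents are assumptions.

```shell
sandbox="$(mktemp -d)"
cd "$sandbox"
# Placeholder identity for the demo commits.
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL="demo@example.com"

# A seed project with one commit on master.
git init -q seed
cd seed
echo "This is our great project" > README
git add README
git commit -q -m "Initial commit: add README"
git branch -M master
cd "$sandbox"

# hub.git stands in for the remote; alice and bob each clone it.
git clone -q --bare seed hub.git
git clone -q hub.git alice
git clone -q hub.git bob

# alice pushes a new commit.
cd alice
echo "More details" >> README
git commit -q -am "Expand README"
git push -q origin master

# bob pulls: fetch and merge in one step.
cd "$sandbox/bob"
git pull origin master
git log --oneline
```

Since cloning sets bob's master to track origin/master, a plain git pull would do the same here.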

Now, how do you upload data to the remote?

Pushing to a remote

Uploading data to the remote is called pushing and is done with:

git push <remote-name> <branch-name>

To push your branch master to the remote origin:

git push origin master

You can also set your local branch to track the remote (upstream) branch with the -u flag:

git push -u origin master

From now on, all you have to run when you are on master is:

git push

Git knows that your local master branch tracks the master branch on origin (its upstream branch).
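
If you want to check the tracking relationship yourself, here is a sketch in a sandbox. The bare repository hub.git stands in for the GitHub remote, and the identity and file contents are placeholders.

```shell
sandbox="$(mktemp -d)"
cd "$sandbox"
# Placeholder identity for the demo commits.
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL="demo@example.com"

git init -q --bare hub.git      # stand-in for the GitHub repository
git init -q myproject
cd myproject
echo "This is our great project" > README
git add README
git commit -q -m "Initial commit: add README"
git branch -M master            # make sure the branch is called master
git remote add origin "$sandbox/hub.git"

# First push: -u records origin/master as the upstream of master.
git push -u origin master

# Ask Git which branch master tracks.
git rev-parse --abbrev-ref "master@{upstream}"   # prints origin/master
git branch -vv                                   # also shows [origin/master]

# Later pushes from master need no arguments.
echo "More details" >> README
git commit -q -am "Expand README"
git push
```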

You want to contribute to a project created by someone else

When you want to contribute to someone's project, you are in one of two scenarios: either you have write access to the project or you don't.

Read access only

If you do not have write access to the remote, you cannot push to it and you need to submit a pull request (PR).

Let's try it using this project.

Setup

Here is how to set things up:

  1. Fork the project.
  2. Clone your fork on your machine (this will automatically set your fork as a remote to your new local project and that remote is automatically called origin).
  3. Add a second remote, this one pointing to the initial project. Usually, people call that remote upstream.

Fork the repo

First, go to GitHub and fork the project by clicking on the Fork button in the top right corner.

Clone your fork

Then, navigate to the directory in which you want to clone the project and clone your fork:

$ cd /location/of/new/project

There are 2 ways to clone a project. If you have set up SSH for your account, the command is:

git clone git@github.com:<user>/<repo>.git

Here:

$ git clone git@github.com:<user>/git_practice.git

If you haven't set up SSH for your account, use the HTTPS address and enter your GitHub user name and a personal access token when prompted (GitHub no longer accepts account passwords for Git operations over HTTPS). The general command looks like this:

git clone https://github.com/<user>/<repo>.git

Here:

$ git clone https://github.com/<user>/git_practice.git

Note that, if you want to give your copy of the project a different name, you can clone it with either of:

git clone git@github.com:<user>/<repo>.git <name>

git clone https://github.com/<user>/<repo>.git <name>
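
To try this without a network connection, here is a sketch that clones a local bare repository into a directory with a different name. The repository content and the name my_copy are made up. It also shows that cloning sets up the origin remote for you.

```shell
sandbox="$(mktemp -d)"
cd "$sandbox"
# Placeholder identity for the demo commit.
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL="demo@example.com"

# A small project, then a bare copy standing in for the repo on GitHub.
git init -q seed
cd seed
echo "Practice project" > README
git add README
git commit -q -m "Initial commit"
cd "$sandbox"
git clone -q --bare seed git_practice.git

# Clone it under a different name.
git clone git_practice.git my_copy
cd my_copy
git remote -v    # origin points to the repository that was cloned
```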

Add the initial project as upstream

We already saw how to add a remote:

$ git remote add upstream git@github.com:razoumov/git_practice.git

  • Make sure to use 'upstream' and not 'origin' for the name of this remote
  • Make sure to use the address of the initial project and not your fork

From there on, you can:

  • Pull from upstream (the repo to which you do not have write access and to which you want to contribute). This allows you to keep your fork up-to-date.
  • Push to and pull from origin (this is your fork, to which you have read and write access).
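
The whole setup can be rehearsed locally. In this sketch, two bare repositories stand in for the initial project (upstream.git) and for your fork on GitHub (fork.git). The paths and the file content are assumptions, and copying the bare repository replaces the click on the Fork button.

```shell
sandbox="$(mktemp -d)"
cd "$sandbox"
# Placeholder identity for the demo commit.
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL="demo@example.com"

# The initial project, with one commit on master.
git init -q seed
cd seed
echo "Practice project" > README
git add README
git commit -q -m "Initial commit"
git branch -M master
cd "$sandbox"
git clone -q --bare seed upstream.git

# 1. "Fork" the project (stand-in for the Fork button on GitHub).
git clone -q --bare upstream.git fork.git

# 2. Clone your fork: it becomes the remote called origin.
git clone -q fork.git git_practice
cd git_practice

# 3. Add the initial project as a second remote called upstream.
git remote add upstream "$sandbox/upstream.git"
git remote -v

git pull upstream master    # keep your copy up to date
git push origin master      # update your fork
```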

Pull request

You are now ready to submit pull requests.

Here is the workflow:

  1. Pull from upstream to make sure that your contributions are made on an up-to-date version of the project
  2. Create and checkout a new branch
  3. Make and commit your changes on that branch
  4. Push that branch to your fork (i.e. origin — remember that you do not have write access on upstream)
  5. Go to the original project's GitHub page and open a pull request from your fork. Note that after you have pushed your branch to origin, GitHub will automatically offer to open one for you.
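
Steps 1 to 4 can be sketched on the command line as follows. This runs in a local sandbox, with bare repositories standing in for the original project and your fork. The branch name fix-readme and the change itself are hypothetical. Step 5 happens on the GitHub website.

```shell
sandbox="$(mktemp -d)"
cd "$sandbox"
# Placeholder identity for the demo commits.
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL="demo@example.com"

# Sandbox: upstream.git is the original project, fork.git your fork.
git init -q seed
cd seed
echo "Practice project" > README
git add README
git commit -q -m "Initial commit"
git branch -M master
cd "$sandbox"
git clone -q --bare seed upstream.git
git clone -q --bare upstream.git fork.git
git clone -q fork.git git_practice
cd git_practice
git remote add upstream "$sandbox/upstream.git"

git pull upstream master             # 1. start from an up-to-date master
git checkout -b fix-readme           # 2. create and check out a new branch
echo "A contribution" >> README      # 3. make a change...
git commit -q -am "Improve README"   #    ...and commit it
git push -u origin fix-readme        # 4. push the branch to your fork
```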

The maintainer of the original project may accept or decline the PR. They may also make comments and ask you to make changes. If so, make new changes and push additional commits to that branch.

Once the PR is merged by the maintainer, you can delete the branch on your fork and pull from upstream to update your local copy with the recently accepted changes.
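
Here is a sketch of that cleanup, again in a local sandbox. The maintainer's merge is simulated with a separate clone called maintainer, since the real merge happens on GitHub. The branch name fix-readme, the paths, and the file contents are all assumptions.

```shell
sandbox="$(mktemp -d)"
cd "$sandbox"
# Placeholder identity for the demo commits.
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL="demo@example.com"

# Sandbox: upstream.git is the original project, fork.git your fork.
git init -q seed
cd seed
echo "Practice project" > README
git add README
git commit -q -m "Initial commit"
git branch -M master
cd "$sandbox"
git clone -q --bare seed upstream.git
git clone -q --bare upstream.git fork.git
git clone -q fork.git git_practice
cd git_practice
git remote add upstream "$sandbox/upstream.git"

# Your contribution, pushed to the fork on its own branch.
git checkout -q -b fix-readme
echo "A contribution" >> README
git commit -q -am "Improve README"
git push -q -u origin fix-readme

# Stand-in for the maintainer merging the PR on GitHub.
cd "$sandbox"
git clone -q upstream.git maintainer
cd maintainer
git pull -q --no-rebase --no-edit "$sandbox/fork.git" fix-readme
git push -q origin master

# Back in your clone: catch up and clean up.
cd "$sandbox/git_practice"
git checkout -q master
git pull upstream master              # bring in the merged changes
git push origin --delete fix-readme   # delete the branch on your fork
git branch -d fix-readme              # delete the local branch
git push origin master                # optional: update your fork's master
```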

Let's try this with our new project.

Read/write access

If you have write access to the project, you can clone the project directly, without forking it, and push changes to it.

This is a very simple setup: the copy on GitHub is the central copy—the one allowing various team members to work jointly on the same project. You now have a copy of it (as well as its entire history) on your machine and you push and pull freely to/from it. Your collaborators have their own copies on their own machines and can also freely push/pull to the same remote.
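
One practical detail of this setup, sketched below in a local sandbox: if a teammate has pushed since your last pull, Git rejects your push until you have pulled their work. The collaborators alice and bob, the file names, and the bare repository hub.git (standing in for GitHub) are made up for the illustration.

```shell
sandbox="$(mktemp -d)"
cd "$sandbox"
# Placeholder identity for the demo commits.
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL="demo@example.com"

# The central copy (hub.git) and two collaborators who cloned it.
git init -q seed
cd seed
echo "This is our great project" > README
git add README
git commit -q -m "Initial commit: add README"
git branch -M master
cd "$sandbox"
git clone -q --bare seed hub.git
git clone -q hub.git alice
git clone -q hub.git bob

# alice pushes first.
cd alice
echo "Alice's notes" > alice.txt
git add alice.txt
git commit -q -m "Add Alice's notes"
git push -q origin master

# bob committed in the meantime: his push is rejected.
cd "$sandbox/bob"
echo "Bob's notes" > bob.txt
git add bob.txt
git commit -q -m "Add Bob's notes"
if ! git push origin master; then
    echo "Push rejected: the remote has commits that bob does not have yet"
fi

# bob pulls (merging alice's work), then pushes successfully.
git pull --no-rebase --no-edit origin master
git push origin master
```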
