Introduction to using git locally

Wednesday 14th January 2009

Why git?

Why bother using git? This is something of a loaded question. The first question it raises is: why use source control at all? There are various reasons, but the simplest is to keep hold of old versions, in case you need some piece of code that you've deleted, or you've introduced a bug and need to revert a change.

Using a distributed version control system (DVCS), which git is, allows us to do certain things more easily than with more traditional centralised version control systems, such as CVS or Subversion. In particular, a DVCS makes it painless to create branches with your own changes, and merge in other people's changes, picking and choosing what changes you want to include.

Finally, we have the question of: why git specifically? One reason might be popularity - there are plenty of users of git, so getting help shouldn't be too hard. (Of course, there are plenty of other popular DVCSs around.) Speed is another factor - what can take minutes on certain version control systems can take mere seconds using git. Providing some proof for this paragraph is way beyond the scope of this article, but there is a plethora of articles online on the topic.

This guide aims to show just some of the commands that git lets you use - there are certainly far more than are listed here. It also assumes that you're only going to be working locally. Much of git's usefulness becomes apparent when you start working with other people, but even for one-person jobs using git can be tremendously helpful.

Of course, any corrections or suggestions are welcome! So, let's get going.

Getting started

For any git command, git <command> is the form to type. Older versions of git also accepted git-<command>, and the man pages still go by that name, so the most helpful reference for each command is man git-<command> (or, equivalently, git help <command>).

To start, we probably want to create a git repository. This is easy enough - in the directory you want to track, just run the command:

git init

This will create a repository containing a single branch, master. Anything in this directory is said to be the working tree.

To get things into the git repository, we first need to move them into the index. The index can be thought of as the staging area for the next commit. This can be done using:

git add <my-file> <some-other-file>

This will add <my-file> and <some-other-file> to the index. If you instead specify some directory in the list of things to add, it will add everything in that directory. Since a single dot represents the current directory, we can add everything in the working tree by typing (assuming you're in the top-level directory of the working tree):

git add .

Once you're happy with the state of the index, you can commit your changes by typing

git commit

Change the commit message as you see fit, and your changes should enter the repository. Note that the first line of the message will be used as a summary for that commit.

If you use the -a flag when committing i.e.

git commit -a

git will stage every change to the files it already tracks and then commit. Say we've already committed a file foo. We then modify foo and add a new file bar. When we use the command git commit -a, git will commit the new version of foo, but not bar, since bar has never been added.
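
The foo and bar scenario can be tried out with a sketch along these lines. It assumes a throwaway directory made with mktemp and a placeholder name and email address, neither of which comes from the original article; the -m and -q flags are only there to keep the run non-interactive and quiet.

```shell
set -e
repo=$(mktemp -d)                       # throwaway sandbox (assumption)
cd "$repo"
git init -q
git config user.name "Example Author"   # placeholder identity
git config user.email "author@example.com"

echo "first version" > foo
git add foo
git commit -q -m "Add foo"

echo "second version" > foo             # modify a tracked file
echo "brand new" > bar                  # create a file git has never seen

git commit -q -a -m "Update foo"        # stages tracked changes only

committed=$(git ls-tree --name-only HEAD)   # files recorded in the latest commit
untracked=$(git ls-files --others)          # files git is not tracking
echo "committed: $committed"
echo "untracked: $untracked"
```

If bar should be part of the commit as well, it needs an explicit git add first.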

If you want to remove <my-file> from the index, use the command:

git rm --cached <my-file>

This will only remove the file from the index, so that it won't be part of the next commit. If you actually want to delete the file as well as remove it from the index, just drop the --cached flag:

git rm <my-file>
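
To see the difference between the two forms of git rm, here is a small sketch. The sandbox directory, identity and file names (keep-me, delete-me) are hypothetical and chosen purely for illustration.

```shell
set -e
repo=$(mktemp -d)                       # throwaway sandbox (assumption)
cd "$repo"
git init -q
git config user.name "Example Author"   # placeholder identity
git config user.email "author@example.com"

echo "one" > keep-me
echo "two" > delete-me
git add keep-me delete-me
git commit -q -m "Add two files"

git rm -q --cached keep-me   # drop from the index, leave the file on disk
git rm -q delete-me          # drop from the index and delete the file

ls                           # only keep-me is still on disk
git status --short           # both removals are staged for the next commit
```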

Limbo

Note that git differs from some other version control systems in the way it treats the working directory, index and commits.

First of all, the stuff in the working tree that isn't in the index (the stuff we're not going to commit), but isn't explicitly ignored, is said to be in limbo. Exactly what is in limbo, and what's actually in the index, can be confusing if you're not clear on what's actually going on when you add and commit files.

Let's try an example. Say we have a file, foo, which has already been committed. Let's label the current content of foo as foo.1 (note that this is not what git calls it - this is just to make this explanation a little easier). We then make some changes to foo. We'll call the updated content foo.2. If we run the command:

git commit

git will say there's nothing to commit, since we haven't actually added anything to the index. We could use git commit -a to add and commit the change in one go, but let's say we just add it using:

git add foo

Now, foo.2 is in the index, but hasn't been committed. We might then make further changes to foo, meaning the content is now foo.3. Without any calls to git add, we call

git commit

and git updates foo in the repository. Note, however, that this committed foo.2 to the repository, not foo.3, since we never actually added foo.3 to the index. The changes between foo.2 and foo.3 are still in limbo. If we wanted to commit the latest revision of foo to the repository, we would have to have added foo again just before committing.
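
The foo.1, foo.2, foo.3 walkthrough can be reproduced with the following sketch. It assumes a scratch directory and a placeholder identity, and it stores the labels themselves as the file content so that it is easy to see which version ended up where.

```shell
set -e
repo=$(mktemp -d)                       # throwaway sandbox (assumption)
cd "$repo"
git init -q
git config user.name "Example Author"   # placeholder identity
git config user.email "author@example.com"

echo "foo.1" > foo
git add foo
git commit -q -m "Commit foo.1"

echo "foo.2" > foo
git add foo                  # the index now holds foo.2
echo "foo.3" > foo           # a further edit that is never added
git commit -q -m "Commit whatever is in the index"

git show HEAD:foo            # the content recorded in the commit
cat foo                      # the content in the working tree
git --no-pager diff --stat   # what is still in limbo
```

Running git add foo once more before the final commit would have recorded foo.3 instead.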

Commits

Each time you successfully run a commit command, git creates a new commit for whatever branch you're working in (more on branches later). Each commit is uniquely identifiable by a SHA1 hash. When referencing individual commits, you can use this SHA1 hash. In fact, you don't even need to use the entire hash - just enough characters to uniquely identify that commit.

The latest commit of a branch is known as the head of that branch, and HEAD refers to the head of whichever branch you currently have checked out.

Appending a caret to the identifier of a commit will refer to the commit's parent. For instance, HEAD^ refers to the commit immediately before the latest one.
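
As a rough illustration of these identifiers, the sketch below makes two commits and then looks them up by full hash, abbreviated hash and HEAD^. The sandbox, identity, file name and commit messages are all made up for the example; git rev-parse is simply used here to print what a name resolves to.

```shell
set -e
repo=$(mktemp -d)                       # throwaway sandbox (assumption)
cd "$repo"
git init -q
git config user.name "Example Author"   # placeholder identity
git config user.email "author@example.com"

echo "a" > file
git add file
git commit -q -m "First"
echo "b" > file
git commit -q -a -m "Second"

full=$(git rev-parse HEAD)            # full hash of the latest commit
short=$(git rev-parse --short HEAD)   # an abbreviation git considers unique
parent=$(git rev-parse HEAD^)         # hash of the parent commit

echo "HEAD is $full (abbreviated: $short)"
echo "HEAD^ is $parent"
git --no-pager log -1 --format=%s "$short"   # summary line of the latest commit
git --no-pager log -1 --format=%s HEAD^      # summary line of its parent
```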

Looking back

To find out what's happened between commits, you can use git's diff tool. Just entering:

git diff

will let us look at the difference between the index and the working tree. If we want to see the differences between what we've got now in the working tree, and some other commit, simply enter:

git diff <my-commit>

So, to compare the working tree to the last commit, we enter the command:

git diff HEAD

If we want to use the index rather than the working directory, we can use the --cached flag. For instance:

git diff --cached

will let us look at changes made between the last commit, i.e. HEAD, and the index. If we name a particular commit, we can compare the index to that commit like so:

git diff --cached <my-commit>

If we want the difference between two specific commits, we just name them both:

git diff <commit-one> <commit-two>

This will show the changes made between <commit-one> and <commit-two>.
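
To make the three comparisons concrete, this sketch puts a different version of foo in the last commit, the index and the working tree, then runs each form of git diff. The sandbox, identity and the v1/v2/v3 contents are assumptions for illustration, and --no-pager is only there so the output prints straight to the terminal.

```shell
set -e
repo=$(mktemp -d)                       # throwaway sandbox (assumption)
cd "$repo"
git init -q
git config user.name "Example Author"   # placeholder identity
git config user.email "author@example.com"

echo "v1" > foo
git add foo
git commit -q -m "v1"        # HEAD holds v1
echo "v2" > foo
git add foo                  # the index holds v2
echo "v3" > foo              # the working tree holds v3

echo "--- index vs working tree"
git --no-pager diff
echo "--- HEAD vs index"
git --no-pager diff --cached
echo "--- HEAD vs working tree"
git --no-pager diff HEAD
```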

To view the various commits of the current branch, use:

git log

If you prefer a graphical view of what's going on, there are various GUIs you can use - qgit provides a GUI using the Qt toolkit.

You can also view the current status of the repository using:

git status

Fixing mistakes

Being human, we all make mistakes. Therefore, git has a few handy commands to help us get rid of them. First of all, let's say you've just made a commit, but forgot to add a file that should have been committed. All you have to do is update the index to whatever you want to be in the commit, and then issue the command

git commit --amend

Without the --amend flag, git would commit the index as a separate commit. With the flag, it simply updates the latest commit.
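
A forgotten file can be folded into the previous commit roughly like this. The file names, sandbox and identity are hypothetical; passing -m alongside --amend just supplies the message up front so that no editor opens.

```shell
set -e
repo=$(mktemp -d)                       # throwaway sandbox (assumption)
cd "$repo"
git init -q
git config user.name "Example Author"   # placeholder identity
git config user.email "author@example.com"

echo "code" > main.txt
echo "docs" > notes.txt
git add main.txt
git commit -q -m "Add main and notes"   # oops: notes.txt was never added

git add notes.txt                       # update the index to what we meant
git commit -q --amend -m "Add main and notes"

git rev-list --count HEAD               # still a single commit
git ls-tree --name-only HEAD            # now lists both files
```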

Branches

One of the most important features of git is the ability to branch. By default, a git repository has just one branch, known as master. You can see this by typing:

git branch

This shows all the current branches of a git repository. When we create another branch, git treats it independently of all the other branches. When we commit to one branch, only that branch is updated.

So, for instance, we might have the master branch as the stable version of the program under development. If you want to add a caching feature to your application, you might create a new branch called caching, based on the stable version. This means that you can make changes to the caching branch, potentially breaking the application, while the stable branch, master, stays unaffected.

This could be handy, say, when somebody finds a bug. If you want to issue a quick fix, you can change back to the master branch, fix the bug, commit the fix, and then go back to the unstable caching branch and carry on hacking.

So, enough chatter - how do we actually create branches? Simple. Just issue the command:

git branch <my-branch>

This will create a new branch called <my-branch>. It will be based on whatever is currently checked out. So, if you're currently working on the master branch, git branch <my-branch> will create a new branch based on the HEAD of master.

If you type git branch, you can see the list of branches in the repository, with the currently checked out branch having an asterisk next to it. Despite creating the new branch, you'll still be in the same branch as before. To change to the new branch, use:

git checkout <my-branch>

Now, any changes you commit will be committed to <my-branch>, rather than master (or whatever branch you happened to be in before).

To both create and change to a branch in a single command, use:

git checkout -b <my-branch>
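
The master and caching example might look like this in practice. The sketch assumes a scratch repository and a placeholder identity, and it uses git symbolic-ref to make sure the first branch really is called master, since some installations are configured with a different default name.

```shell
set -e
repo=$(mktemp -d)                         # throwaway sandbox (assumption)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # ensure the initial branch is named master
git config user.name "Example Author"     # placeholder identity
git config user.email "author@example.com"

echo "stable" > app
git add app
git commit -q -m "Stable version"

git branch caching           # create the branch; we stay on master
git checkout -q caching      # now switch to it
echo "cache" > cache
git add cache
git commit -q -m "Start caching"

git checkout -q master       # back to the stable branch
git branch                   # lists both branches, asterisk next to master
ls                           # cache is absent here; it only exists on caching
```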

Checking out is not limited to branches - we can check out specific commits too. For instance, if we have a commit identified by f107dd02d45cbc88a539d52d6829eb7237023441, we can check it out using:

git checkout f107

This assumes that no other commits start with the characters f107 - otherwise, the identifier would be ambiguous. The working tree now contains the contents of that commit. Note, however, that since we haven't checked out a branch, we're not actually on any branch (git calls this a detached HEAD). We can still look around, and even commit, but any new commits won't belong to a branch and are easy to lose once we check out something else. If we want to keep working from this point, we should create a branch here, using the same command as before:

git branch <my-branch>

We don't have to check out the specific commit that we want to branch from - we can just specify the commit when branching, like so:

git branch <my-branch> <commit-to-branch-from>

So, if we want to create a branch called baz from the commit before last, we would use:

git branch baz HEAD^
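
Here is a short sketch of branching from an earlier commit. It makes three commits with made-up messages in a scratch repository (placeholder identity again) and then creates baz from HEAD^, so that baz points at the second of the three.

```shell
set -e
repo=$(mktemp -d)                       # throwaway sandbox (assumption)
cd "$repo"
git init -q
git config user.name "Example Author"   # placeholder identity
git config user.email "author@example.com"

echo "1" > f
git add f
git commit -q -m "First"
echo "2" > f
git commit -q -a -m "Second"
echo "3" > f
git commit -q -a -m "Third"

git branch baz HEAD^                      # branch from the parent of the latest commit
git --no-pager log -1 --format=%s baz     # summary line of the commit baz points at
```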

If we want to rename a branch, we simply use the -m flag:

git branch -m <old-name> <new-name>

We might also decide we no longer need a branch since we've merged its changes into some other branch (more on merging later). In that case, we use the -d flag:

git branch -d <my-branch>

If the changes of <my-branch> have not been fully merged, git will abort the deletion of that branch. If you really want to delete that branch and lose its commits, then use the -D flag:

git branch -D <my-branch>
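
The difference between -d and -D shows up as soon as a branch holds commits that exist nowhere else. In this sketch the branch name experiment, the file names and the sandbox are hypothetical; the if statement is only there to report which of the two deletions succeeded.

```shell
set -e
repo=$(mktemp -d)                         # throwaway sandbox (assumption)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # ensure the initial branch is named master
git config user.name "Example Author"     # placeholder identity
git config user.email "author@example.com"

echo "base" > f
git add f
git commit -q -m "Base"

git checkout -q -b experiment
echo "wip" > g
git add g
git commit -q -m "Unmerged work"
git checkout -q master

refused=no
if git branch -d experiment 2>/dev/null; then
    echo "deleted with -d"
else
    refused=yes
    echo "-d refused: experiment has commits that are not merged"
    git branch -D experiment              # force the deletion anyway
fi
git branch                                # only master is left
```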

When identifying commits, if a commit is the HEAD of a branch, you simply need to use the name of the given branch. Say we want to see how the caching branch differs from the master branch - we can use:

git diff master caching

Grabbing changes from another branch

Reusing our earlier example, let's say we have a master branch, and we have created a caching branch. After we've made some bug fixes to the master branch, we'll probably want those changes to be reflected in the caching branch too. We could think of the caching branch as a series of patches on top of a particular commit in the master branch. In this case, we want to modify the caching branch so it now applies those patches to the HEAD of master. This is exactly what git rebase will do.

In this case, with caching checked out, the command we want to execute is:

git rebase master

git will then undo all the changes made in the caching branch, and use HEAD of master as our new starting point. It will then try to reapply all of the changes made in the caching branch. If git has any problems, it will stop and tell you what the problem is and how to fix it. In general, you can rebase onto any commit like so:

git rebase <my-commit>
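
The rebase described above can be sketched as follows, assuming a scratch repository in which master gains a bug fix after caching was branched off. File names, commit messages and the identity are invented for the example.

```shell
set -e
repo=$(mktemp -d)                         # throwaway sandbox (assumption)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # ensure the initial branch is named master
git config user.name "Example Author"     # placeholder identity
git config user.email "author@example.com"

echo "base" > app
git add app
git commit -q -m "Base"

git checkout -q -b caching
echo "cache" > cache
git add cache
git commit -q -m "Add caching"

git checkout -q master
echo "fix" > fix
git add fix
git commit -q -m "Fix bug on master"

git checkout -q caching
git rebase -q master                      # replay caching's commits on top of master
git --no-pager log --format=%s            # caching's commit now sits above the fix
```

Afterwards the caching branch contains the bug fix as well as its own work, while master itself is untouched.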

But what about when we don't want to rebase, but want to grab all the changes from some other branch? For instance, if we want to merge the changes of caching into master, it doesn't make much sense to rebase master on caching. Instead, we merge. With master checked out, the command we want is:

git merge caching

Here caching is the branch that contains the changes we want to bring in. You may also see this written as git pull . caching, where the dot indicates the repository in the current directory. That does the same job by way of git-pull, although recent versions of git may ask you to choose between merging and rebasing first if the two branches have diverged.
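
Bringing caching into master might look like the sketch below. It calls git merge directly, which is the merge step that git pull . caching finishes with; the -m flag is an addition to keep the run non-interactive, and the sandbox, identity and file names are assumptions.

```shell
set -e
repo=$(mktemp -d)                         # throwaway sandbox (assumption)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # ensure the initial branch is named master
git config user.name "Example Author"     # placeholder identity
git config user.email "author@example.com"

echo "base" > app
git add app
git commit -q -m "Base"

git checkout -q -b caching
echo "cache" > cache
git add cache
git commit -q -m "Add caching"

git checkout -q master
echo "fix" > fix
git add fix
git commit -q -m "Fix bug on master"

git merge -q -m "Merge caching into master" caching
ls                                        # app, cache and fix are all present
git --no-pager log --format=%s -1         # the merge commit is the new tip
```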

In some cases, you don't want to grab all of the changes from a particular branch, just a single commit. In this case, you can use git cherry-pick. Say some commit <my-commit> in another branch fixed a bug that you have in your branch, but you don't want to pull in all the other changes from the other branch. In this case, you could use the command:

git cherry-pick <my-commit>
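
As an illustration, the sketch below builds a branch called other (a hypothetical name) holding an unfinished feature followed by a bug fix, then copies only the fix onto master. The sandbox, identity and file names are likewise assumptions.

```shell
set -e
repo=$(mktemp -d)                         # throwaway sandbox (assumption)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # ensure the initial branch is named master
git config user.name "Example Author"     # placeholder identity
git config user.email "author@example.com"

echo "base" > app
git add app
git commit -q -m "Base"

git checkout -q -b other
echo "feature" > feature
git add feature
git commit -q -m "Unfinished feature"
echo "fix" > fix
git add fix
git commit -q -m "Fix bug"
fix_commit=$(git rev-parse HEAD)          # remember which commit holds the fix

git checkout -q master
git cherry-pick "$fix_commit"             # apply just that one commit
ls                                        # fix is here, feature is not
```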

Stashing

Let's say you're working on the master branch. You're halfway through implementing a new feature, when you find a one line bug. You could fix the bug and finish implementing the new feature, and commit them together. However, this tends to be a bad idea - for instance, what if you revert this feature for some reason? Now you've lost your bug fix too. Generally, we want to keep each commit small and focused, so we want two commits here - one for the bug fix, and one for the new feature.

But how do we write and commit the bug fix without also committing our half-baked feature implementation? The easiest way is to use git-stash. Simply call:

git stash

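This will put your changes in both the working directory and the index into the stash, and reset the working directory and index to HEAD. By default the stash is labelled with a message generated by git. If you want to give it a more descriptive message of your own, then use this command instead (newer versions of git also spell it git stash push -m <message>):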

git stash save <message>

To take a look at what you currently have stashed away, use:

git stash list

To reapply the changes stored in the stash, type:

git stash apply

That will use the latest stash - if you want to use a different stash, then you want:

git stash apply <stash-name>

To remove stashes, you have a few choices. To reapply the latest stash and remove it from the list in one go:

git stash pop

The following will remove all stashes:

git stash clear

Finally, if you want to remove a particular stash, try:

git stash drop <stash-name>
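
Putting the pieces together, the bug fix scenario from the start of this section could go roughly like this. The sketch assumes a scratch repository with two hypothetical files, app (which has the bug) and feature (which holds the unfinished work), plus a placeholder identity.

```shell
set -e
repo=$(mktemp -d)                       # throwaway sandbox (assumption)
cd "$repo"
git init -q
git config user.name "Example Author"   # placeholder identity
git config user.email "author@example.com"

echo "buggy line" > app
echo "feature: todo" > feature
git add app feature
git commit -q -m "Initial version"

echo "feature: half done" > feature     # unfinished work in progress
git stash -q                            # tuck it away; the tree matches HEAD again

echo "fixed line" > app
git commit -q -a -m "Fix one line bug"  # the commit contains only the fix

git stash pop -q                        # bring the unfinished work back
git status --short                      # feature shows up as modified again
```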