Getting Started With Git
Okay, let’s assume you’ve used version control systems before and perhaps you’ve even used Git. Great! However, you’re still a bit confused about how it all works and what you should be doing. Groovy! That makes you just like every other Git user at some point in their lives. This is a primer (or refresher) on Git.
On the other hand, if you have a complete mastery of when rebasing makes sense and under what circumstances a forced push is useful then TL;DR and move on. Have a nice day. Better yet, send me a pull request on how this could be made better.
Installation
It won’t come as much of a surprise to learn that in order to use Git you’ll need to have it installed on your computer. Let’s get that out of the way first.
Windows
There are several approaches to installing Git on Windows. My personal preference is to use Git for Windows as it provides a pretty decent BASH implementation, a decent Git GUI (if you’re in to that kind of thing), and shell integration. If you prefer to spend your time in PowerShell you’ll probably prefer posh-git. Of course, you can also just install from the Git download page.
OS X
Congratulations! You already have Git. However, the version you have is likely pretty outdated. You can download the latest from the Git download page or, if you prefer, use a tool like Homebrew to install it and stay up to date.
Linux
Congratulations! The architect of your operating system is also the guy that created Git so you’re off to a great start. You probably already have Git, but if not then it’s certainly easy to find thanks to your distribution’s package manager (apt-get
, rpm
, or whatever). Using a distribution that doesn’t have a package manager or you’re not familiar with how to install software on your Linux computer? Well, you’re probably used to reading HOWTO documents already so… do that.
BeOS/MS-DOS/Minux/OS2/NetWare
Seriously? I mean, cool. Good for you, buddy.
Git GUI’s
I found this quote regarding Git recently:
“The problem with Git is that it’s so ancient that we have to use the command line or Terminal if you’re a Mac user in order to access it, typing in snippets of code like ‘90s hackers. This can be a difficult proposition for modern computer users. That’s where GitHub comes in.”
Typing simple commands on a (gasp) command line is difficult? If that’s the case for you then perhaps you should visit your guidance counselor and talk about an exciting career in TV and VCR repair. Seriously, learning Git means learning Git which is not a pretty graphical interface. Are those available? Sure! They’re out there. They can be nice to use, too. Do yourself a huge favor, however, and learn the tool then learn the other goodies. Oh, and the notion that GitHub is just a graphical interface for Git? Just ignore that.
Setup
Git uses dotfiles (literally, files with names that begin with a full stop a.k.a., “period” or “dot”) to store configuration values. The ones you’ll generally be concerned with are .gitconfig
and .gitignore_global
which are usually in your home folder (or in %USERPROFILE%
on Windows) and .gitignore
which is generally found in each repository. Open these up and poke around. They’re in the INI file format and are quite easy to change.
Repositories
A Git repository is nothing more than a folder that has been initialized using git init
and therefore contains a .git
folder in it that contains the goodies Git needs to keep track of things. It isn’t magical. It’s merely a folder that has been initialized for Git. Go ahead and poke around in there. It’s pretty interesting stuff!
Think of your .git
folder as Git’s database. In that database is everything Git knows about the files, branches, and history of everything. The folder that holds the .git
sub-folder? That’s a single checkout from your Git database that’s been put in your working folder. If you come from a centralized (or client-server) version control system like CVS or Perforce you can think of the .git
folder as the VCS server and the folder that contains the .git
folder as your workspace.
Modifying, Staging, and Committing
Okay, so a version control system is all about keeping track of changes, right? Well, with Git there are three states that a file can be in when it comes to Git; modified, staged, and committed. Committed just means that the file hasn’t changed. Modified means that the file has changed, but those changes have not been committed. Staged means that the file has been changed (which includes moves, deletes, and new files) and has been marked in the index to go as a part of your next commit.
HEAD
You’re going to run across the term HEAD
fairly often when using Git. HEAD
is simply a pointer to the last commit you made. You can learn more here.
Staging
To stage changes that you’ve made to your working copy you use commands like git add <filename>
, git add *
, and git rm <filename>
. This adds the change(s) to Git’s index which means that they are staged, but not committed.
Add
To add staged change use git add <filename>
, git add *
, git add .
git add -A
, or whatever variant gets the job done for you.
Remove
To remove a staged change use git rm <filename>
, git rm *`, or whatever variant gets the job done for you.
Viewing Status and History
To see the current state of your working copy according to Git, just type git status
. You’ll get output something like this:
# On branch master
# Changes to be committed:
# (use "git reset HEAD <file>..." to unstage)
#
#modified: hello.py
#
# Changes not staged for commit:
# (use "git add <file>..." to update what will be committed)
# (use "git checkout -- <file>..." to discard changes in working directory)
#
#modified: main.py
#
# Untracked files:
# (use "git add <file>..." to include in what will be committed)
#
#hello.pyc
It’s usually a very good idea to take a look at the status before issuing a commit. That way you can make sure something doesn’t sneak in. For example, without a good .gitignore
file set up, things like debug binaries, unencrypted passwords, temp folders, and all sorts of other junk will creep in to your repository.
One thing you won’t see with git status
, however, is project history. To see that, you’ll want to use git log
. That command will let you view and filter the history of commits.
Committing
To commit your staged changes simply use the git commit
command. At this point, HEAD
now points to that commit as well. If you see a message like nothing to commit, working directory clean
then that means all your staged changes have been committed. Oh, and do yourself and everyone else a favor and add a message to each of your commit messages. All you need to do is use the git commit -m "<your message>"
syntax.
Branching
A branch essentially assigns a name to a particular commit and every ancestor of that commit. It’s a very tiny bit of data that acts as a sort of waypoint in history. It is not a completely new copy of every file. That would be silly and terribly inefficient (and exactly what a lot of version control systems do). Branches are cheap.
There are a lot of reasons to create a branch, but generally speaking you create a new branch in order to isolate changes. What you get is sort of a virtual copy of everything that you can change independently. Later on you can decide to keep those changes separate, merge everything back into a single version, or scrap it altogether. You can also do what most people do on large projects and just abandon your branch leaving it to wither away and die cold and alone. On second thought, don’t be that person.
Checkout
When you want to switch branches you simply issue the git checkout <branch-name>
command. Git will do what’s necessary to update your working folder with all the right stuff. If you have uncommitted changes, Git will complain and tell you to do something with them. At that point you can commit, revert, or stash your changes.
Branch and Immediately Checkout
Now that you know about both the branch
and the checkout
commands, you can use a handy shortcut for creating and immediately switching to a new branch: git checkout -b <branch-name>
. Bingo! You’re ready to go straight to work.
Stashing
It happened again. You’ve been cranking away on some code when something else comes up and you need to switch to a different branch. You don’t want to commit what you have because it’s half-baked. You just want to set it aside so you can switch gears and then come back to it later. The answer is the git stash
command.
Stashing puts your modified, tracked files and staged changes and saves them so you can resurrect them at any time. Stashes are stored in a list and when you want to bring your latest stash back you simply issue the git stash apply
command. If you have multiple stashes, you can pick which ones you want to apply. You can also unapply a stash, delete a stash, and so forth, but we’ll skip those features here. You can learn more about stashing right here.
Merging
First, let’s assume the following branch history:
C---D---E experiment
/
A---B---F---G master
Where:
- master is the default branch that Git created for us when we initialized the repository
- experiment is a branch that we created in order to try out an idea we had
When you perform the git merge <some-other-branch-name>
command you’re telling Git that you want to join two branch histories together. Git does this by “replaying” the changes that occurred on one branch after it diverged on top of another branch. Once complete Git will record the result in a new commit. The ancestry of each commit is preserved and the result would look like this:
C---D---E experiment
/ \
A---B---F---G---H master
Now, if the same file changed in the same place in both branches, we might end up with a merge conflict. Unless Git can auto-resolve the conflict (and often it can), it will refuse to stitch the branch histories until you, the human, tell Git what to do. Generally speaking, a three-way merge tool such as P4Merge is going to be your best friend. It will show you the file as it exists in both branches as well as the base which is simply how the file was originally before each branches changes. From there you can figure out what to keep, what to throw away, and what to change. Spend some time getting to know your merge tool before you really need it!
Previewing Changes
It’s usually a good idea to have a peek at what’s different between two branches before you merge. To do that simply execute the git diff <source-branch-name> <target-branch-name>
command. I won’t go in to how to read the diff command output (see here for that), but I will tell you that setting up a diff tool is a really good idea. I recommend P4Merge for that task. Here’s what I have in my .gitconfig
file for that:
[merge]
tool = p4merge
[mergetool "p4merge"]
path = "C:/Program Files/Perforce/p4merge.exe"
keepBackup = false
trustExitCode = false
[diff]
tool = p4merge
[push]
default = simple
Fast Forward Merges
When it comes to merges with Git, one thing we should talk about is how Git records the history. For the purposes of this discussion, we’ll simplify things and assume that there are two methods; “fast forward” and “not fast forward”. Here’s the difference:
First, let’s assume the following branch history:
C---D---E experiment
/
A---B-------- master
Where:
- master is the default branch that Git created for us when we initialized the repository
- experiment is a branch that we created in order to try out an idea we had
When you perform the git merge <some-other-branch-name>
command you’re telling Git that you want to join two branch histories together. If the source branch has not changed since the branch was started, by default Git will attempt to record the merge using the “fast forward” which results in linear branch history that looks like this:
A--B--C--D--E--F
However, if you perform the git merge -no-ff <some-other-branch-name>
command (note the -no-ff
argument), then Git will not attempt to record the merge with the “fast forward” which results in an extra commit and a branch history that looks like this:
C---D---E experiment
/ \
A---B-----------F master
The commit at point “F” in the diagram above is the merge commit. So, in short, “not fast forward” merging keeps the notion of explicit branches by adding the extra commit to the history. Alternatively, “fast forward” records the changesets in a linear fashion and in a sense ignores the fact that a branch ever existed. Whether to use fast foward merges or not can be quite a hot topic amoung Git nerds. Some people say, “Hey, if the source never changed while the branch was being worked on then why should I clutter my history with that information?” Others will argue that having full history is worth the price of a little extra noise.
Local and Remote
Up to this point, we’ve only been talking about things happening on your computer. However, Git is a distributed version control system and that allows multiple people to work on the code at the same time. Unlike a centralized VCS that usually involves a server that everyone connects to and is the only place that has all of the information about branches and so-forth, a distributed system is peer-to-peer and everyone has their own complete copy of the repository.
So what does that really mean? Well, it means that while there’s always a local repository, there can be remote repositories as well. In other words:
- local is a version of the code that you have on your computer
- remote is a version of the code hosted out on the network (or Internet) somewhere
GitHub, BitBucket, Etc.
It’s easy to get confused about the role a service such as GitHub provides. This is particularly true if your experiences with version control have been with centralized systems like Team Foundation Server, Subversion, or CVS. At their core, all these services provide is a hosted copy of the repository, identical to the one you have on your computer, that people can sync with. You could accomplish the same basic thing with your own computer as long as everyone can connect to it from their computers. Sure, most of these services have lots of other goodies, but at their very core they’re merely a commonly-accessible clone of your repository.
Local Branches and Remote Branches
Here’s a little something that trips a lot of folks up, but it’s really pretty simple. Your Git repository has, for all intents and purposes, two kinds of branches: local branches and remote tracking branches. Local branches are what we’ve talked about so far. Nothing new there. Remote tracking branches, however, are a bit different.
Origin
Remote tracking branches are named with a simple convention:
[remotename]/[branch-name]
By default, Git will set the remote name to origin
when you clone a repository. Assuming you stick with Git’s defaults (and you probably should for now), you’ll often see origin/master
as the “root” of a remote. Oh, you can also have more than one remote repository defined in your local repository, but we won’t get in to that here.
The most important thing to remember about a remote tracking branch is that you can’t modify it directly. It seems obvious when you think about it, right? So what good is it? Well, what you usually do with a remote tracking branch is:
- Fetch and optionally merge the changes other people have pushed to the same remote tracking branch
- Push your local changes to the remote tracking branch
- Create new branches based on the remote tracking branch
Checkout
Folks often get a little confused when it comes to remotes and checkout. That’s generally because Git does some fancy footwork that sort of hides what’s really happening. For example, if you perform a git checkout <branch-name>
and you don’t have a local branch by the name you specify, but there is a remote branch with the same name then Git will assume you want a new local branch tracking that remote tracking branch. Handy! Just remember you can’t modify a remote tracking branch directly and what really happened is that you created a new local branch that tracks the remote.
Oh, and don’t forget to git fetch
before you issue your checkout command. Speaking of which…
Fetching
Fetching updates your local Git database with the latest remote information. That includes new remote branches, changes to existing remote branches, and all the data necessary to bring your local working copy up to date with remote changes if you choose to do so. Fetching is safe to do when you darn well please and won’t screw up anything you’ve got going on in your working folder.
Pushing
When you’ve committed changes and you want to send those changes to the remote repository, you’re going to use git push <remote-name> <branch-name>
. Remember, if you’ve created a new branch it won’t be available to anyone else unless you’ve pushed it.
Force Pushing
Do not force push. Unless you mean “force push” in a Star Wars context. In that case, knock yourself out. Nerd.
Merging
Before we go on, let’s use this as our example branch diagram representing local and remote:
C---D---E local
/
A---B---F---G remote
Just like when you merge a branch locally, you’re telling Git that you want to join two branch histories together. Just as before, the ancestry of each commit is preserved and the result would look like this:
C---D---E local
/ \
A---B---F---G---H remote
Pulling
Pulling is a shortcut that’s the equivalent of running git fetch
and git merge
back-to-back. Many folks will tell you to avoid it, however. Why? Because you’re skipping the opportunity to see exactly what’s about to happen beforehand.
Rebasing
git rebase
or git pull --rebase
When you rebase, Git will take the commits that exist in your local branch and re-apply them on top of the remote branch. This re-writes the ancestors of your local commits. The effect is this:
C---D---E local
/
A---B---F---G remote
In general, you can (optionally) rebase local changes before you push them for the first time in order to keep the history clean, but you probably never want to rebase anything you’ve already pushed somewhere. In other words, there’s nothing wrong with this:
$ git checkout -b newfeature --track origin/newfeature
$ touch README.md
$ git add README.md
$ git commit -am "Added a blank README file."
$ vim README.md
$ git commit -am "Added some basic layout to the README file."
$ vim README.md
$ git commit -am "Fixed a typo."
$ git rebase
$ git push
Just remember after that git push
that you’ve made history public and changing that history is a no-no.
Re-Writing History
When you rebase you are re-writing Git’s history. Useful to be sure, but once that history is pushed to remotes that other people are using then your days of re-writing history are over. Why? Because you’re not re-writing your history, you’re re-writing everyone’s history and that can lead to some pretty serious problems for people trying to do their own merging and pushing.
- You are 100% sure you know what you are doing.
- You get explicit consent to do so from everyone working on that remote.
Being loose and careless about this will get you a very bad reputation, and if you do it the consequences could range from getting kicked off the team to getting fired to getting hellbanned from touching a computer for the rest of your life.
Pull Requests
A pull request is how you ask an upstream project to pull changes into their repository. Most often this means you’re asking a project that you don’t have permission to push to to incorporate changes that you’ve made to a clone of their repository into their repository. This means that you need to first push your changes to your public repository and then request the upstream project to pull those changes into theirs. Simple!
Forking
Let’s get one thing out of the way here… forking is a [GitHub](http://www.github.com) thing and not a Git thing. Got it? Groovy. Now, that you understand that, let’s talk about why forks are nice.
A fork is really nothing more than a shortcut. It’s the same thing as cloning someone else’s repository to your computer, making changes, then creating a GitHub public repository, making it a remote for your local repository, and establishing tracking between your local branch and your remote tracking branch. Forking just saves you a lot of extra work when all you probably want to do is help somebody out by fixing a bug, adding a feature, or writing some documentation.
Summary
So there you have it! While this certainly isn’t the comprehensive guide to Git, it covers the essentials. If you have further questions, here are some great resources: