
progit


$ tree
.
├── objects
│   ├── repos
│   │   └── [...]
│   └── trees
│       └── [...]
├── p4gf_config
├── repos
│   └── Talkhouse
│       └── p4gf_config
└── users
    └── p4gf_usermap

498 directories, 287 files

The objects directory is used internally by Git Fusion to map Perforce objects to Git and vice versa;
you won't need to touch anything in there. There's a global p4gf_config file in this directory, as
well as one for each repository – these are the configuration files that determine how Git Fusion
behaves. Let’s take a look at the file in the root:

[repo-creation]
charset = utf8

[git-to-perforce]
change-owner = author
enable-git-branch-creation = yes
enable-swarm-reviews = yes
enable-git-merge-commits = yes
enable-git-submodules = yes
preflight-commit = none
ignore-author-permissions = no
read-permission-check = none
git-merge-avoidance-after-change-num = 12107

[perforce-to-git]
http-url = none
ssh-url = none

[@features]
imports = False
chunked-push = False
matrix2 = False
parallel-push = False

[authentication]
email-case-sensitivity = no


We won’t go into the meanings of these flags here, but note that this is just an INI-formatted text
file, much like Git uses for configuration. This file specifies the global options, which can then be
overridden by repository-specific configuration files, like repos/Talkhouse/p4gf_config. If you open
this file, you’ll see a [@repo] section with some settings that are different from the global defaults.
You’ll also see sections that look like this:

[Talkhouse-master]
git-branch-name = master
view = //depot/Talkhouse/main-dev/... ...

This is a mapping between a Perforce branch and a Git branch. The section can be named whatever
you like, so long as the name is unique. git-branch-name lets you convert a depot path that would be
cumbersome under Git to a more friendly name. The view setting controls how Perforce files are
mapped into the Git repository, using the standard view mapping syntax. More than one mapping
can be specified, like in this example:

[multi-project-mapping]
git-branch-name = master
view = //depot/project1/main/... project1/...
  //depot/project2/mainline/... project2/...

This way, if your normal workspace mapping includes changes in the structure of the directories,
you can replicate that with a Git repository.

The last file we’ll discuss is users/p4gf_usermap, which maps Perforce users to Git users, and which
you may not even need. When converting from a Perforce changeset to a Git commit, Git Fusion’s
default behavior is to look up the Perforce user, and use the email address and full name stored
there for the author/committer field in Git. When converting the other way, the default is to look up
the Perforce user with the email address stored in the Git commit’s author field, and submit the
changeset as that user (with permissions applying). In most cases, this behavior will do just fine,
but consider the following mapping file:

john john@example.com "John Doe"
john johnny@appleseed.net "John Doe"
bob employeeX@example.com "Anon X. Mouse"
joe employeeY@example.com "Anon Y. Mouse"

Each line is of the format <user> <email> "<full name>", and creates a single user mapping. The first
two lines map two distinct email addresses to the same Perforce user account. This is useful if
you’ve created Git commits under several different email addresses (or change email addresses),
but want them to be mapped to the same Perforce user. When creating a Git commit from a
Perforce changeset, the first line matching the Perforce user is used for Git authorship information.

The last two lines mask Bob and Joe’s actual names and email addresses from the Git commits that
are created. This is nice if you want to open-source an internal project, but don’t want to publish
your employee directory to the entire world. Note that the email addresses and full names should
be unique, unless you want all the Git commits to be attributed to a single fictional author.

Workflow

Perforce Git Fusion is a two-way bridge between Perforce and Git version control. Let’s have a look
at how it feels to work from the Git side. We’ll assume we’ve mapped in the “Jam” project using a
configuration file as shown above, which we can clone like this:

$ git clone https://10.0.1.254/Jam
Cloning into 'Jam'...
Username for 'https://10.0.1.254': john
Password for 'https://john@10.0.1.254':
remote: Counting objects: 2070, done.
remote: Compressing objects: 100% (1704/1704), done.
Receiving objects: 100% (2070/2070), 1.21 MiB | 0 bytes/s, done.
remote: Total 2070 (delta 1242), reused 0 (delta 0)
Resolving deltas: 100% (1242/1242), done.
Checking connectivity... done.
$ git branch -a
* master
  remotes/origin/HEAD -> origin/master
  remotes/origin/master
  remotes/origin/rel2.1
$ git log --oneline --decorate --graph --all
* 0a38c33 (origin/rel2.1) Create Jam 2.1 release branch.
| * d254865 (HEAD, origin/master, origin/HEAD, master) Upgrade to latest metrowerks on Beos -- the Intel one.
| * bd2f54a Put in fix for jam's NT handle leak.
| * c0f29e7 Fix URL in a jam doc
| * cc644ac Radstone's lynx port.
[...]

The first time you do this, it may take some time. What’s happening is that Git Fusion is converting
all the applicable changesets in the Perforce history into Git commits. This happens locally on the
server, so it’s relatively fast, but if you have a lot of history, it can still take some time. Subsequent
fetches do incremental conversion, so it’ll feel more like Git’s native speed.

As you can see, our repository looks exactly like any other Git repository you might work with.
There are three branches, and Git has helpfully created a local master branch that tracks
origin/master. Let’s do a bit of work, and create a couple of new commits:


# ...
$ git log --oneline --decorate --graph --all
* cfd46ab (HEAD, master) Add documentation for new feature
* a730d77 Whitespace
* d254865 (origin/master, origin/HEAD) Upgrade to latest metrowerks on Beos -- the Intel one.
* bd2f54a Put in fix for jam's NT handle leak.
[...]

We have two new commits. Now let’s check if anyone else has been working:

$ git fetch
remote: Counting objects: 5, done.
remote: Compressing objects: 100% (3/3), done.
remote: Total 3 (delta 2), reused 0 (delta 0)
Unpacking objects: 100% (3/3), done.
From https://10.0.1.254/Jam
  d254865..6afeb15 master -> origin/master
$ git log --oneline --decorate --graph --all
* 6afeb15 (origin/master, origin/HEAD) Update copyright
| * cfd46ab (HEAD, master) Add documentation for new feature
| * a730d77 Whitespace
|/
* d254865 Upgrade to latest metrowerks on Beos -- the Intel one.
* bd2f54a Put in fix for jam's NT handle leak.
[...]

It looks like someone was! You wouldn’t know it from this view, but the 6afeb15 commit was
actually created using a Perforce client. It just looks like another commit from Git’s point of view,
which is exactly the point. Let’s see how the Perforce server deals with a merge commit:


$ git merge origin/master
Auto-merging README
Merge made by the 'recursive' strategy.
 README | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
$ git push
Counting objects: 9, done.
Delta compression using up to 8 threads.
Compressing objects: 100% (9/9), done.
Writing objects: 100% (9/9), 917 bytes | 0 bytes/s, done.
Total 9 (delta 6), reused 0 (delta 0)
remote: Perforce: 100% (3/3) Loading commit tree into memory...
remote: Perforce: 100% (5/5) Finding child commits...
remote: Perforce: Running git fast-export...
remote: Perforce: 100% (3/3) Checking commits...
remote: Processing will continue even if connection is closed.
remote: Perforce: 100% (3/3) Copying changelists...
remote: Perforce: Submitting new Git commit objects to Perforce: 4
To https://10.0.1.254/Jam
  6afeb15..89cba2b master -> master

Git thinks it worked. Let’s take a look at the history of the README file from Perforce’s point of view,
using the revision graph feature of p4v:

Figure 147. Perforce revision graph resulting from Git push.
If you’ve never seen this view before, it may seem confusing, but it shows the same concepts as a
graphical viewer for Git history. We’re looking at the history of the README file, so the directory tree
at top left only shows that file as it surfaces in various branches. At top right, we have a visual
graph of how different revisions of the file are related, and the big-picture view of this graph is at
bottom right. The rest of the view is given to the details view for the selected revision (2 in this
case).

One thing to notice is that the graph looks exactly like the one in Git’s history. Perforce didn’t have a
named branch to store the 1 and 2 commits, so it made an “anonymous” branch in the .git-fusion
directory to hold it. This will also happen for named Git branches that don’t correspond to a named
Perforce branch (and you can later map them to a Perforce branch using the configuration file).
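For instance, using the same section syntax shown earlier for repos/Talkhouse/p4gf_config, such a later mapping might look like this (the branch name and depot path here are hypothetical, for illustration only):

```ini
; Hypothetical mapping added to a repository's p4gf_config,
; giving a previously anonymous Git branch a Perforce home.
[feature-x]
git-branch-name = feature-x
view = //depot/Talkhouse/feature-x/... ...
```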

Most of this happens behind the scenes, but the end result is that one person on a team can be using
Git, another can be using Perforce, and neither of them will know about the other’s choice.

Git-Fusion Summary

If you have (or can get) access to your Perforce server, Git Fusion is a great way to make Git and
Perforce talk to each other. There’s a bit of configuration involved, but the learning curve isn’t very
steep. This is one of the few sections in this chapter where cautions about using Git’s full power will
not appear. That’s not to say that Perforce will be happy with everything you throw at it – if you try
to rewrite history that’s already been pushed, Git Fusion will reject it – but Git Fusion tries very
hard to feel native. You can even use Git submodules (though they’ll look strange to Perforce users),
and merge branches (this will be recorded as an integration on the Perforce side).

If you can’t convince the administrator of your server to set up Git Fusion, there is still a way to use
these tools together.

Git-p4

Git-p4 is a two-way bridge between Git and Perforce. It runs entirely inside your Git repository, so
you won’t need any kind of access to the Perforce server (other than user credentials, of course).
Git-p4 isn’t as flexible or complete a solution as Git Fusion, but it does allow you to do most of what
you’d want to do without being invasive to the server environment.

 You’ll need the p4 tool somewhere in your PATH to work with git-p4. As of this
writing, it is freely available at http://www.perforce.com/downloads/Perforce/20-
User.

Setting Up

For example purposes, we’ll be running the Perforce server from the Git Fusion OVA as shown
above, but we’ll bypass the Git Fusion server and go directly to the Perforce version control.

In order to use the p4 command-line client (which git-p4 depends on), you’ll need to set a couple of
environment variables:

$ export P4PORT=10.0.1.254:1666
$ export P4USER=john
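
Before cloning, you can verify that these settings actually reach the server; p4 info prints the connection details (a sketch; the output depends on your server):

```shell
$ p4 info
```

If this fails, check P4PORT and your network before suspecting git-p4 itself.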

Getting Started

As with anything in Git, the first command is to clone:


$ git p4 clone //depot/www/live www-shallow
Importing from //depot/www/live into www-shallow
Initialized empty Git repository in /private/tmp/www-shallow/.git/
Doing initial import of //depot/www/live/ from revision #head into refs/remotes/p4/master

This creates what in Git terms is a “shallow” clone; only the very latest Perforce revision is
imported into Git. Remember, Perforce isn't designed to give every revision to every user. This is
enough to use Git as a Perforce client, but for other purposes it's not enough.

Once it's finished, we have a fully-functional Git repository:

$ cd www-shallow
$ git log --oneline --all --graph --decorate
* 70eaf78 (HEAD, p4/master, p4/HEAD, master) Initial import of //depot/www/live/ from the state at revision #head

Note how there's a “p4” remote for the Perforce server, but everything else looks like a standard
clone. That's a bit misleading, though; there isn't actually a remote there.

$ git remote -v

No remotes exist in this repository at all. Git-p4 has created some refs to represent the state of the
server, and they look like remote refs to git log, but they’re not managed by Git itself, and you can’t
push to them.

Workflow

Okay, let’s do some work. Let’s assume you’ve made some progress on a very important feature,
and you’re ready to show it to the rest of your team.

$ git log --oneline --all --graph --decorate
* 018467c (HEAD, master) Change page title
* c0fb617 Update link
* 70eaf78 (p4/master, p4/HEAD) Initial import of //depot/www/live/ from the state at revision #head

We’ve made two new commits that we’re ready to submit to the Perforce server. Let’s check if
anyone else was working today:


$ git p4 sync
Performing incremental import into refs/remotes/p4/master git branch
Depot paths: //depot/www/live/
Import destination: refs/remotes/p4/master
Importing revision 12142 (100%)
$ git log --oneline --all --graph --decorate
* 75cd059 (p4/master, p4/HEAD) Update copyright
| * 018467c (HEAD, master) Change page title
| * c0fb617 Update link
|/
* 70eaf78 Initial import of //depot/www/live/ from the state at revision #head

Looks like they were, and master and p4/master have diverged. Perforce’s branching system is
nothing like Git’s, so submitting merge commits doesn’t make any sense. Git-p4 recommends that
you rebase your commits, and even comes with a shortcut to do so:

$ git p4 rebase
Performing incremental import into refs/remotes/p4/master git branch
Depot paths: //depot/www/live/
No changes to import!
Rebasing the current branch onto remotes/p4/master
First, rewinding head to replay your work on top of it...
Applying: Update link
Applying: Change page title
 index.html | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

You can probably tell from the output, but git p4 rebase is a shortcut for git p4 sync followed by
git rebase p4/master. It’s a bit smarter than that, especially when working with multiple branches,
but this is a good approximation.
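
In other words, for the simple single-branch case, a rough manual equivalent of the shortcut would be:

```shell
$ git p4 sync
$ git rebase p4/master
```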

Now our history is linear again, and we’re ready to contribute our changes back to Perforce. The
git p4 submit command will try to create a new Perforce revision for every Git commit between
p4/master and master. Running it drops us into our favorite editor, and the contents of the file look
something like this:


# A Perforce Change Specification.
#
#  Change:      The change number. 'new' on a new changelist.
#  Date:        The date this specification was last modified.
#  Client:      The client on which the changelist was created. Read-only.
#  User:        The user who created the changelist.
#  Status:      Either 'pending' or 'submitted'. Read-only.
#  Type:        Either 'public' or 'restricted'. Default is 'public'.
#  Description: Comments about the changelist. Required.
#  Jobs:        What opened jobs are to be closed by this changelist.
#               You may delete jobs from this list. (New changelists only.)
#  Files:       What opened files from the default changelist are to be added
#               to this changelist. You may delete files from this list.
#               (New changelists only.)
Change: new

Client: john_bens-mbp_8487

User: john

Status: new

Description:
  Update link

Files:
  //depot/www/live/index.html # edit

######## git author ben@straub.cc does not match your p4 account.
######## Use option --preserve-user to modify authorship.
######## Variable git-p4.skipUserNameCheck hides this message.
######## everything below this line is just the diff #######
--- //depot/www/live/index.html 2014-08-31 18:26:05.000000000 0000
+++ /Users/ben/john_bens-mbp_8487/john_bens-mbp_8487/depot/www/live/index.html 2014-08-31 18:26:05.000000000 0000
@@ -60,7 +60,7 @@
 </td>
 <td valign=top>
 Source and documentation for
-<a href="http://www.perforce.com/jam/jam.html">
+<a href="jam.html">
 Jam/MR</a>,
 a software build tool.
 </td>

This is mostly the same content you’d see by running p4 submit, except the stuff at the end which
git-p4 has helpfully included. Git-p4 tries to honor your Git and Perforce settings individually when
it has to provide a name for a commit or changeset, but in some cases you want to override it. For
example, if the Git commit you’re importing was written by a contributor who doesn’t have a
Perforce user account, you may still want the resulting changeset to look like they wrote it (and not
you).

Git-p4 has helpfully imported the message from the Git commit as the content for this Perforce
changeset, so all we have to do is save and quit, twice (once for each commit). The resulting shell
output will look something like this:

$ git p4 submit
Perforce checkout for depot path //depot/www/live/ located at /Users/ben/john_bens-mbp_8487/john_bens-mbp_8487/depot/www/live/
Synchronizing p4 checkout...
... - file(s) up-to-date.
Applying dbac45b Update link
//depot/www/live/index.html#4 - opened for edit
Change 12143 created with 1 open file(s).
Submitting change 12143.
Locking 1 files ...
edit //depot/www/live/index.html#5
Change 12143 submitted.
Applying 905ec6a Change page title
//depot/www/live/index.html#5 - opened for edit
Change 12144 created with 1 open file(s).
Submitting change 12144.
Locking 1 files ...
edit //depot/www/live/index.html#6
Change 12144 submitted.
All commits applied!
Performing incremental import into refs/remotes/p4/master git branch
Depot paths: //depot/www/live/
Import destination: refs/remotes/p4/master
Importing revision 12144 (100%)
Rebasing the current branch onto remotes/p4/master
First, rewinding head to replay your work on top of it...
$ git log --oneline --all --graph --decorate
* 775a46f (HEAD, p4/master, p4/HEAD, master) Change page title
* 05f1ade Update link
* 75cd059 Update copyright
* 70eaf78 Initial import of //depot/www/live/ from the state at revision #head

The result is as though we just did a git push, which is the closest analogy to what actually did
happen.

Note that during this process every Git commit is turned into a Perforce changeset; if you want to
squash them down into a single changeset, you can do that with an interactive rebase before
running git p4 submit. Also note that the SHA-1 hashes of all the commits that were submitted as
changesets have changed; this is because git-p4 adds a line to the end of each commit it converts:


$ git log -1
commit 775a46f630d8b46535fc9983cf3ebe6b9aa53145
Author: John Doe <john@example.com>
Date: Sun Aug 31 10:31:44 2014 -0800

  Change page title

  [git-p4: depot-paths = "//depot/www/live/": change = 12144]
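
As noted above, an interactive rebase against p4/master before submitting is how you collapse several Git commits into a single Perforce changeset; a sketch:

```shell
$ git rebase -i p4/master   # mark commits as squash/fixup in the editor
$ git p4 submit
```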

What happens if you try to submit a merge commit? Let’s give it a try. Here’s the situation we’ve
gotten ourselves into:

$ git log --oneline --all --graph --decorate
* 3be6fd8 (HEAD, master) Correct email address
* 1dcbf21 Merge remote-tracking branch 'p4/master'
|\
| * c4689fc (p4/master, p4/HEAD) Grammar fix
* | cbacd0a Table borders: yes please
* | b4959b6 Trademark
|/
* 775a46f Change page title
* 05f1ade Update link
* 75cd059 Update copyright
* 70eaf78 Initial import of //depot/www/live/ from the state at revision #head

The Git and Perforce history diverge after 775a46f. The Git side has two commits, then a merge
commit with the Perforce head, then another commit. We’re going to try to submit these on top of a
single changeset on the Perforce side. Let’s see what would happen if we tried to submit now:

$ git p4 submit -n
Perforce checkout for depot path //depot/www/live/ located at /Users/ben/john_bens-mbp_8487/john_bens-mbp_8487/depot/www/live/
Would synchronize p4 checkout in /Users/ben/john_bens-mbp_8487/john_bens-mbp_8487/depot/www/live/
Would apply
  b4959b6 Trademark
  cbacd0a Table borders: yes please
  3be6fd8 Correct email address

The -n flag is short for --dry-run, which tries to report what would happen if the submit command
were run for real. In this case, it looks like we’d be creating three Perforce changesets, which
correspond to the three non-merge commits that don’t yet exist on the Perforce server. That sounds
like exactly what we want, let’s see how it turns out:


$ git p4 submit
[…]
$ git log --oneline --all --graph --decorate
* dadbd89 (HEAD, p4/master, p4/HEAD, master) Correct email address
* 1b79a80 Table borders: yes please
* 0097235 Trademark
* c4689fc Grammar fix
* 775a46f Change page title
* 05f1ade Update link
* 75cd059 Update copyright
* 70eaf78 Initial import of //depot/www/live/ from the state at revision #head

Our history became linear, just as though we had rebased before submitting (which is in fact
exactly what happened). This means you can be free to create, work on, throw away, and merge
branches on the Git side without fear that your history will somehow become incompatible with
Perforce. If you can rebase it, you can contribute it to a Perforce server.

Branching

If your Perforce project has multiple branches, you’re not out of luck; git-p4 can handle that in a
way that makes it feel like Git. Let’s say your Perforce depot is laid out like this:

//depot
└── project
    ├── main
    └── dev

And let’s say you have a dev branch, which has a view spec that looks like this:

//depot/project/main/... //depot/project/dev/...

Git-p4 can automatically detect that situation and do the right thing:


$ git p4 clone --detect-branches //depot/project@all
Importing from //depot/project@all into project
Initialized empty Git repository in /private/tmp/project/.git/
Importing revision 20 (50%)
  Importing new branch project/dev

  Resuming with change 20
Importing revision 22 (100%)
Updated branches: main dev
$ cd project; git log --oneline --all --graph --decorate
* eae77ae (HEAD, p4/master, p4/HEAD, master) main
| * 10d55fb (p4/project/dev) dev
| * a43cfae Populate //depot/project/main/... //depot/project/dev/....
|/
* 2b83451 Project init

Note the “@all” specifier in the depot path; that tells git-p4 to clone not just the latest changeset for
that subtree, but all changesets that have ever touched those paths. This is closer to Git’s concept of
a clone, but if you’re working on a project with a long history, it could take a while.

The --detect-branches flag tells git-p4 to use Perforce’s branch specs to map the branches to Git refs.
If these mappings aren’t present on the Perforce server (which is a perfectly valid way to use
Perforce), you can tell git-p4 what the branch mappings are, and you get the same result:

$ git init project
Initialized empty Git repository in /tmp/project/.git/
$ cd project
$ git config git-p4.branchList main:dev
$ git p4 clone --detect-branches //depot/project@all .

Setting the git-p4.branchList configuration variable to main:dev tells git-p4 that “main” and “dev”
are both branches, and the second one is a child of the first one.
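
git-p4.branchList is a multi-valued setting, so deeper hierarchies can be described by adding one entry per parent:child relationship; for example, if main also had a release child branch (hypothetical here):

```shell
$ git config git-p4.branchList main:dev
$ git config --add git-p4.branchList main:release
```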

If we now git checkout -b dev p4/project/dev and make some commits, git-p4 is smart enough to
target the right branch when we do git p4 submit. Unfortunately, git-p4 can’t mix shallow clones
and multiple branches; if you have a huge project and want to work on more than one branch,
you’ll have to git p4 clone once for each branch you want to submit to.

For creating or integrating branches, you’ll have to use a Perforce client. Git-p4 can only sync and
submit to existing branches, and it can only do it one linear changeset at a time. If you merge two
branches in Git and try to submit the new changeset, all that will be recorded is a bunch of file
changes; the metadata about which branches are involved in the integration will be lost.

Git and Perforce Summary

Git-p4 makes it possible to use a Git workflow with a Perforce server, and it’s pretty good at it.
However, it’s important to remember that Perforce is in charge of the source, and you’re only using
Git to work locally. Just be really careful about sharing Git commits; if you have a remote that other
people use, don't push any commits that haven't already been submitted to the Perforce server.

If you want to freely mix the use of Perforce and Git as clients for source control, and you can
convince the server administrator to install it, Git Fusion makes using Git a first-class version-
control client for a Perforce server.

Git and TFS

Git is becoming popular with Windows developers, and if you’re writing code on Windows, there’s
a good chance you’re using Microsoft’s Team Foundation Server (TFS). TFS is a collaboration suite
that includes defect and work-item tracking, process support for Scrum and others, code review,
and version control. There's a bit of confusion ahead: TFS is the server, which supports controlling
source code using both Git and its own custom VCS, dubbed TFVC (Team Foundation Version
Control). Git support is a somewhat new feature for TFS (shipping with the
2013 version), so all of the tools that predate that refer to the version-control portion as “TFS”, even
though they’re mostly working with TFVC.

If you find yourself on a team that’s using TFVC but you’d rather use Git as your version-control
client, there’s a project for you.

Which Tool

In fact, there are two: git-tf and git-tfs.

Git-tfs (found at https://github.com/git-tfs/git-tfs) is a .NET project, and (as of this writing) it only
runs on Windows. To work with Git repositories, it uses the .NET bindings for libgit2, a library-
oriented implementation of Git which is highly performant and allows a lot of flexibility with the
guts of a Git repository. Libgit2 is not a complete implementation of Git, so to cover the difference
git-tfs will call the command-line Git client for some operations; as a result, there are no artificial
limits on what it can do with Git repositories. Its support of TFVC features is very mature, since it
uses the Visual Studio assemblies for operations with servers. This does mean you’ll need access to
those assemblies, which means you need to install a recent version of Visual Studio (any edition
since version 2010, including Express since version 2012), or the Visual Studio SDK.

 Git-tf is End-of-Life (EOL) and will not receive any updates. It is also no longer
supported by Microsoft.

Git-tf (whose home is at https://archive.codeplex.com/?p=gittf) is a Java project, and as such runs on
any computer with a Java runtime environment. It interfaces with Git repositories through JGit (a
JVM implementation of Git), which means it has virtually no limitations in terms of Git functions.
However, its support for TFVC is limited as compared to git-tfs – it does not support branches, for
instance.

So each tool has pros and cons, and there are plenty of situations that favor one over the other.
We’ll cover the basic usage of both of them in this book.


 You’ll need access to a TFVC-based repository to follow along with these
instructions. These aren’t as plentiful in the wild as Git or Subversion repositories,
so you may need to create one of your own. Codeplex
(https://archive.codeplex.com/) or Visual Studio Online
(https://visualstudio.microsoft.com) are both good choices for this.

Getting Started: git-tf

The first thing you do, just as with any Git project, is clone. Here's what that looks like with git-tf:

$ git tf clone https://tfs.codeplex.com:443/tfs/TFS13 $/myproject/Main project_git

The first argument is the URL of a TFVC collection, the second is of the form $/project/branch, and
the third is the path to the local Git repository that is to be created (this last one is optional). Git-tf
can only work with one branch at a time; if you want to make checkins on a different TFVC branch,
you’ll have to make a new clone from that branch.

This creates a fully functional Git repository:

$ cd project_git
$ git log --all --oneline --decorate
512e75a (HEAD, tag: TFS_C35190, origin_tfs/tfs, master) Checkin message

This is called a shallow clone, meaning that only the latest changeset has been downloaded. TFVC
isn’t designed for each client to have a full copy of the history, so git-tf defaults to only getting the
latest version, which is much faster.

If you have some time, it’s probably worth it to clone the entire project history, using the --deep
option:

$ git tf clone https://tfs.codeplex.com:443/tfs/TFS13 $/myproject/Main \
  project_git --deep
Username: domain\user
Password:
Connecting to TFS...
Cloning $/myproject into /tmp/project_git: 100%, done.
Cloned 4 changesets. Cloned last changeset 35190 as d44b17a
$ cd project_git
$ git log --all --oneline --decorate
d44b17a (HEAD, tag: TFS_C35190, origin_tfs/tfs, master) Goodbye
126aa7b (tag: TFS_C35189)
8f77431 (tag: TFS_C35178) FIRST
0745a25 (tag: TFS_C35177) Created team project folder $/tfvctest via the \
  Team Project Creation Wizard

Notice the tags with names like TFS_C35189; this is a feature that helps you know which Git commits
are associated with TFVC changesets. This is a nice way to represent it, since you can see with a
simple log command which of your commits is associated with a snapshot that also exists in TFVC.
They aren’t necessary (and in fact you can turn them off with git config git-tf.tag false) – git-tf
keeps the real commit-changeset mappings in the .git/git-tf file.

Getting Started: git-tfs

Git-tfs cloning behaves a bit differently. Observe:

PS> git tfs clone --with-branches \
  https://username.visualstudio.com/DefaultCollection \
  $/project/Trunk project_git
Initialized empty Git repository in C:/Users/ben/project_git/.git/
C15 = b75da1aba1ffb359d00e85c52acb261e4586b0c9
C16 = c403405f4989d73a2c3c119e79021cb2104ce44a
Tfs branches found:
- $/tfvc-test/featureA
The name of the local branch will be : featureA
C17 = d202b53f67bde32171d5078968c644e562f1c439
C18 = 44cd729d8df868a8be20438fdeeefb961958b674

Notice the --with-branches flag. Git-tfs is capable of mapping TFVC branches to Git branches, and
this flag tells it to set up a local Git branch for every TFVC branch. This is highly recommended if
you’ve ever branched or merged in TFS, but it won’t work with a server older than TFS 2010 –
before that release, “branches” were just folders, so git-tfs can’t tell them from regular folders.

Let’s take a look at the resulting Git repository:

PS> git log --oneline --graph --decorate --all
* 44cd729 (tfs/featureA, featureA) Goodbye
* d202b53 Branched from $/tfvc-test/Trunk
* c403405 (HEAD, tfs/default, master) Hello
* b75da1a New project
PS> git log -1
commit c403405f4989d73a2c3c119e79021cb2104ce44a
Author: Ben Straub <ben@straub.cc>
Date: Fri Aug 1 03:41:59 2014 +0000

  Hello

  git-tfs-id: [https://username.visualstudio.com/DefaultCollection]$/myproject/Trunk;C16

There are two local branches, master and featureA, which represent the initial starting point of the
clone (Trunk in TFVC) and a child branch (featureA in TFVC). You can also see that the tfs “remote”
has a couple of refs too: default and featureA, which represent TFVC branches. Git-tfs maps the
branch you cloned from to tfs/default, and others get their own names.


Another thing to notice is the git-tfs-id: lines in the commit messages. Instead of tags, git-tfs uses
these markers to relate TFVC changesets to Git commits. This has the implication that your Git
commits will have a different SHA-1 hash before and after they have been pushed to TFVC.

Git-tf[s] Workflow

 Regardless of which tool you’re using, you should set a couple of Git configuration
values to avoid running into issues.

$ git config --local core.ignorecase true
$ git config --local core.autocrlf false

The obvious next thing you’re going to want to do is work on the project. TFVC and TFS have
several features that may add complexity to your workflow:

1. Feature branches that aren’t represented in TFVC add a bit of complexity. This has to do with
the very different ways that TFVC and Git represent branches.

2. Be aware that TFVC allows users to “checkout” files from the server, locking them so nobody
else can edit them. This obviously won’t stop you from editing them in your local repository, but
it could get in the way when it comes time to push your changes up to the TFVC server.

3. TFS has the concept of “gated” checkins, where a TFS build-test cycle has to complete
successfully before the checkin is allowed. This uses the “shelve” function in TFVC, which we
don’t cover in detail here. You can fake this in a manual fashion with git-tf, and git-tfs provides
the checkintool command which is gate-aware.

In the interest of brevity, what we’ll cover here is the happy path, which sidesteps or avoids most of
these issues.

Workflow: git-tf

Let’s say you’ve done some work, made a couple of Git commits on master, and you’re ready to
share your progress on the TFVC server. Here’s our Git repository:

$ git log --oneline --graph --decorate --all
* 4178a82 (HEAD, master) update code
* 9df2ae3 update readme
* d44b17a (tag: TFS_C35190, origin_tfs/tfs) Goodbye
* 126aa7b (tag: TFS_C35189)
* 8f77431 (tag: TFS_C35178) FIRST
* 0745a25 (tag: TFS_C35177) Created team project folder $/tfvctest via the \
  Team Project Creation Wizard

We want to take the snapshot that’s in the 4178a82 commit and push it up to the TFVC server. First
things first: let’s see if any of our teammates did anything since we last connected:

$ git tf fetch
Username: domain\user
Password:
Connecting to TFS...
Fetching $/myproject at latest changeset: 100%, done.
Downloaded changeset 35320 as commit 8ef06a8. Updated FETCH_HEAD.
$ git log --oneline --graph --decorate --all
* 8ef06a8 (tag: TFS_C35320, origin_tfs/tfs) just some text
| * 4178a82 (HEAD, master) update code
| * 9df2ae3 update readme
|/
* d44b17a (tag: TFS_C35190) Goodbye
* 126aa7b (tag: TFS_C35189)
* 8f77431 (tag: TFS_C35178) FIRST
* 0745a25 (tag: TFS_C35177) Created team project folder $/tfvctest via the \
  Team Project Creation Wizard

Looks like someone else is working, too, and now we have divergent history. This is where Git
shines, but we have two choices of how to proceed:

1. Making a merge commit feels natural as a Git user (after all, that’s what git pull does), and git-
tf can do this for you with a simple git tf pull. Be aware, however, that TFVC doesn’t think this
way, and if you push merge commits your history will start to look different on both sides,
which can be confusing. However, if you plan on submitting all of your changes as one
changeset, this is probably the easiest choice.

2. Rebasing makes our commit history linear, which means we have the option of converting each
of our Git commits into a TFVC changeset. Since this leaves the most options open, we
recommend you do it this way; git-tf even makes it easy for you with git tf pull --rebase.

The choice is yours. For this example, we’ll be rebasing:

$ git rebase FETCH_HEAD
First, rewinding head to replay your work on top of it...
Applying: update readme
Applying: update code
$ git log --oneline --graph --decorate --all
* 5a0e25e (HEAD, master) update code
* 6eb3eb5 update readme
* 8ef06a8 (tag: TFS_C35320, origin_tfs/tfs) just some text
* d44b17a (tag: TFS_C35190) Goodbye
* 126aa7b (tag: TFS_C35189)
* 8f77431 (tag: TFS_C35178) FIRST
* 0745a25 (tag: TFS_C35177) Created team project folder $/tfvctest via the \
  Team Project Creation Wizard

Now we’re ready to make a checkin to the TFVC server. Git-tf gives you the choice of making a
single changeset that represents all the changes since the last one (--shallow, which is the default)
and creating a new changeset for each Git commit (--deep). For this example, we’ll just create one
changeset:

$ git tf checkin -m 'Updating readme and code'
Username: domain\user
Password:
Connecting to TFS...
Checking in to $/myproject: 100%, done.
Checked commit 5a0e25e in as changeset 35348
$ git log --oneline --graph --decorate --all
* 5a0e25e (HEAD, tag: TFS_C35348, origin_tfs/tfs, master) update code
* 6eb3eb5 update readme
* 8ef06a8 (tag: TFS_C35320) just some text
* d44b17a (tag: TFS_C35190) Goodbye
* 126aa7b (tag: TFS_C35189)
* 8f77431 (tag: TFS_C35178) FIRST
* 0745a25 (tag: TFS_C35177) Created team project folder $/tfvctest via the \
  Team Project Creation Wizard

There’s a new TFS_C35348 tag, indicating that TFVC is storing the exact same snapshot as the 5a0e25e
commit. It’s important to note that not every Git commit needs to have an exact counterpart in
TFVC; the 6eb3eb5 commit, for example, doesn’t exist anywhere on the server.

That’s the main workflow. There are a couple of other considerations you’ll want to keep in mind:

• There is no branching. Git-tf can only create Git repositories from one TFVC branch at a time.
• Collaborate using either TFVC or Git, but not both. Different git-tf clones of the same TFVC
  repository may have different commit SHA-1 hashes, which will cause no end of headaches.
• If your team’s workflow includes collaborating in Git and syncing periodically with TFVC, only
  connect to TFVC with one of the Git repositories.

Workflow: git-tfs

Let’s walk through the same scenario using git-tfs. Here are the new commits we’ve made to the
master branch in our Git repository:

PS> git log --oneline --graph --all --decorate
* c3bd3ae (HEAD, master) update code
* d85e5a2 update readme
| * 44cd729 (tfs/featureA, featureA) Goodbye
| * d202b53 Branched from $/tfvc-test/Trunk
|/
* c403405 (tfs/default) Hello
* b75da1a New project

Now let’s see if anyone else has done work while we were hacking away:

PS> git tfs fetch
C19 = aea74a0313de0a391940c999e51c5c15c381d91d
PS> git log --all --oneline --graph --decorate
* aea74a0 (tfs/default) update documentation
| * c3bd3ae (HEAD, master) update code
| * d85e5a2 update readme
|/
| * 44cd729 (tfs/featureA, featureA) Goodbye
| * d202b53 Branched from $/tfvc-test/Trunk
|/
* c403405 Hello
* b75da1a New project

Yes, it turns out our coworker has added a new TFVC changeset, which shows up as the new aea74a0
commit, and the tfs/default remote branch has moved.

As with git-tf, we have two fundamental options for how to resolve this divergent history:

1. Rebase to preserve a linear history.
2. Merge to preserve what actually happened.

In this case, we’re going to do a “deep” checkin, where every Git commit becomes a TFVC changeset,
so we want to rebase.

PS> git rebase tfs/default
First, rewinding head to replay your work on top of it...
Applying: update readme
Applying: update code
PS> git log --all --oneline --graph --decorate
* 10a75ac (HEAD, master) update code
* 5cec4ab update readme
* aea74a0 (tfs/default) update documentation
| * 44cd729 (tfs/featureA, featureA) Goodbye
| * d202b53 Branched from $/tfvc-test/Trunk
|/
* c403405 Hello
* b75da1a New project

Now we’re ready to complete our contribution by checking in our code to the TFVC server. We’ll use
the rcheckin command here to create a TFVC changeset for each Git commit in the path from HEAD
to the first tfs remote branch found (the checkin command would only create one changeset, sort of
like squashing Git commits).

PS> git tfs rcheckin
Working with tfs remote: default
Fetching changes from TFS to minimize possibility of late conflict...
Starting checkin of 5cec4ab4 'update readme'
 add README.md
C20 = 71a5ddce274c19f8fdc322b4f165d93d89121017
Done with 5cec4ab4b213c354341f66c80cd650ab98dcf1ed, rebasing tail onto new TFS-
commit...
Rebase done successfully.
Starting checkin of b1bf0f99 'update code'
 edit .git\tfs\default\workspace\ConsoleApplication1/ConsoleApplication1/Program.cs
C21 = ff04e7c35dfbe6a8f94e782bf5e0031cee8d103b
Done with b1bf0f9977b2d48bad611ed4a03d3738df05ea5d, rebasing tail onto new TFS-
commit...
Rebase done successfully.
No more to rcheckin.
PS> git log --all --oneline --graph --decorate
* ff04e7c (HEAD, tfs/default, master) update code
* 71a5ddc update readme
* aea74a0 update documentation
| * 44cd729 (tfs/featureA, featureA) Goodbye
| * d202b53 Branched from $/tfvc-test/Trunk
|/
* c403405 Hello
* b75da1a New project

Notice how after every successful checkin to the TFVC server, git-tfs is rebasing the remaining work
onto what it just did. That’s because it’s adding the git-tfs-id field to the bottom of the commit
messages, which changes the SHA-1 hashes. This is exactly as designed, and there’s nothing to
worry about, but you should be aware that it’s happening, especially if you’re sharing Git commits
with others.

TFS has many features that integrate with its version control system, such as work items,
designated reviewers, gated checkins, and so on. It can be cumbersome to work with these features
using only a command-line tool, but fortunately git-tfs lets you launch a graphical checkin tool very
easily:

PS> git tfs checkintool
PS> git tfs ct

It looks a bit like this:

Figure 148. The git-tfs checkin tool.

This will look familiar to TFS users, as it’s the same dialog that’s launched from within Visual
Studio.

Git-tfs also lets you control TFVC branches from your Git repository. As an example, let’s create one:

PS> git tfs branch $/tfvc-test/featureBee
The name of the local branch will be : featureBee
C26 = 1d54865c397608c004a2cadce7296f5edc22a7e5
PS> git log --oneline --graph --decorate --all
* 1d54865 (tfs/featureBee) Creation branch $/myproject/featureBee
* ff04e7c (HEAD, tfs/default, master) update code
* 71a5ddc update readme
* aea74a0 update documentation
| * 44cd729 (tfs/featureA, featureA) Goodbye
| * d202b53 Branched from $/tfvc-test/Trunk
|/
* c403405 Hello
* b75da1a New project

Creating a branch in TFVC means adding a changeset where that branch now exists, and this is
projected as a Git commit. Note also that git-tfs created the tfs/featureBee remote branch, but HEAD
is still pointing to master. If you want to work on the newly-minted branch, you’ll want to base your
new commits on the 1d54865 commit, perhaps by creating a topic branch from that commit.
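For example, you might do that like so (a sketch: the scratch repository below merely stands in for your real clone, and its single commit plays the role of 1d54865; the topic-branch name is arbitrary):

```shell
# Build a throwaway repository whose one commit stands in for the
# TFVC branch-creation commit, then start a topic branch there.
repo=$(mktemp -d)
cd "$repo"
git init -q
git -c user.name=Demo -c user.email=demo@example.com \
    commit -q --allow-empty -m 'stand-in for the branch-creation commit'
base=$(git rev-parse HEAD)          # plays the role of 1d54865
git checkout -q -b featureBee-work "$base"
git symbolic-ref --short HEAD       # confirm we are on the topic branch
```

In a real conversion you would simply run `git checkout -b featureBee-work 1d54865` inside your existing clone.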

Git and TFS Summary

Git-tf and Git-tfs are both great tools for interfacing with a TFVC server. They allow you to use the
power of Git locally, avoid constantly having to round-trip to the central TFVC server, and make
your life as a developer much easier, without forcing your entire team to migrate to Git. If you’re
working on Windows (which is likely if your team is using TFS), you’ll probably want to use git-tfs,
since its feature set is more complete, but if you’re working on another platform, you’ll be using git-
tf, which is more limited. As with most of the tools in this chapter, you should choose one of these
version-control systems to be canonical, and use the other one in a subordinate fashion – either Git
or TFVC should be the center of collaboration, but not both.

Migrating to Git

If you have an existing codebase in another VCS but you’ve decided to start using Git, you must
migrate your project one way or another. This section goes over some importers for common
systems, and then demonstrates how to develop your own custom importer. You’ll learn how to
import data from several of the bigger professionally used SCM systems, because they make up the
majority of users who are switching, and because high-quality tools for them are easy to come by.

Subversion

If you read the previous section about using git svn, you can easily use those instructions to git svn
clone a repository; then, stop using the Subversion server, push to a new Git server, and start using
that. If you want the history, you can accomplish that as quickly as you can pull the data out of the
Subversion server (which may take a while).

However, the import isn’t perfect; and because it will take so long, you may as well do it right. The
first problem is the author information. In Subversion, each person committing has a user on the
system who is recorded in the commit information. The examples in the previous section show
schacon in some places, such as the blame output and the git svn log. If you want to map this to
better Git author data, you need a mapping from the Subversion users to the Git authors. Create a
file called users.txt that has this mapping in a format like this:

schacon = Scott Chacon <schacon@geemail.com>
selse = Someo Nelse <selse@geemail.com>

To get a list of the author names that SVN uses, you can run this:

$ svn log --xml --quiet | grep author | sort -u | \
  perl -pe 's/.*>(.*?)<.*/$1 = /'

That generates the log output in XML format, then keeps only the lines with author information,
discards duplicates, and strips out the XML tags. Obviously this only works on a machine with grep,
sort, and perl installed. Then, redirect that output into your users.txt file so you can add the
equivalent Git user data next to each entry.

 If you’re trying this on a Windows machine, this is the point where you’ll run into
trouble. Microsoft have provided some good advice and samples at
https://docs.microsoft.com/en-us/azure/devops/repos/git/perform-migration-from-
svn-to-git.

You can provide this file to git svn to help it map the author data more accurately. You can also tell
git svn not to include the metadata that Subversion normally imports, by passing --no-metadata to
the clone or init command. The metadata includes a git-svn-id inside each commit message that
Git will generate during import. This can bloat your Git log and might make it a bit unclear.

 You need to keep the metadata when you want to mirror commits made in the Git
repository back into the original SVN repository. If you don’t want the
synchronization in your commit log, feel free to omit the --no-metadata parameter.

This makes your import command look like this:

$ git svn clone http://my-project.googlecode.com/svn/ \
  --authors-file=users.txt --no-metadata --prefix "" -s my_project
$ cd my_project

Now you should have a nicer Subversion import in your my_project directory. Instead of commits
that look like this

commit 37efa680e8473b615de980fa935944215428a35a
Author: schacon <schacon@4c93b258-373f-11de-be05-5f7a86268029>
Date: Sun May 3 00:12:22 2009 +0000

  fixed install - go to trunk

  git-svn-id: https://my-project.googlecode.com/svn/trunk@94 4c93b258-373f-11de-
  be05-5f7a86268029

they look like this:

commit 03a8785f44c8ea5cdb0e8834b7c8e6c469be2ff2
Author: Scott Chacon <[email protected]>
Date: Sun May 3 00:12:22 2009 +0000

  fixed install - go to trunk

Not only does the Author field look a lot better, but the git-svn-id is no longer there, either.

You should also do a bit of post-import cleanup. For one thing, you should clean up the weird
references that git svn set up. First you’ll move the tags so they’re actual tags rather than strange
remote branches, and then you’ll move the rest of the branches so they’re local.

To move the tags to be proper Git tags, run:

$ for t in $(git for-each-ref --format='%(refname:short)' refs/remotes/tags); do git
tag ${t/tags\//} $t && git branch -D -r $t; done

This takes the references that were remote branches that started with refs/remotes/tags/ and
makes them real (lightweight) tags.

Next, move the rest of the references under refs/remotes to be local branches:

$ for b in $(git for-each-ref --format='%(refname:short)' refs/remotes); do git branch
$b refs/remotes/$b && git branch -D -r $b; done

It may happen that you’ll see some extra branches suffixed by @xxx (where xxx is a number), while
in Subversion you only see one branch. This is actually a Subversion feature called “peg-revisions”,
which Git simply has no syntactical counterpart for. Hence, git svn adds the svn version number to
the branch name in the same way as you would have written it in svn to address the peg-revision of
that branch. If you no longer care about the peg-revisions, simply remove them:

$ for p in $(git for-each-ref --format='%(refname:short)' | grep @); do git branch -D
$p; done

Now all the old branches are real Git branches and all the old tags are real Git tags.

There’s one last thing to clean up. Unfortunately, git svn creates an extra branch named trunk,
which maps to Subversion’s default branch, but the trunk ref points to the same place as master.
Since master is more idiomatically Git, here’s how to remove the extra branch:

$ git branch -d trunk

The last thing to do is add your new Git server as a remote and push to it. Here is an example of
adding your server as a remote:

$ git remote add origin git@my-git-server:myrepository.git

Because you want all your branches and tags to go up, you can now run this:

$ git push origin --all
$ git push origin --tags

All your branches and tags should be on your new Git server in a nice, clean import.

Mercurial

Since Mercurial and Git have fairly similar models for representing versions, and since Git is a bit
more flexible, converting a repository from Mercurial to Git is fairly straightforward, using a tool
called "hg-fast-export", which you’ll need a copy of:

$ git clone https://github.com/frej/fast-export.git

The first step in the conversion is to get a full clone of the Mercurial repository you want to convert:

$ hg clone <remote repo URL> /tmp/hg-repo

The next step is to create an author mapping file. Mercurial is a bit more forgiving than Git for what
it will put in the author field for changesets, so this is a good time to clean house. Generating this is
a one-line command in a bash shell:

$ cd /tmp/hg-repo
$ hg log | grep user: | sort | uniq | sed 's/user: *//' > ../authors

This will take a few seconds, depending on how long your project’s history is, and afterwards the
/tmp/authors file will look something like this:

bob
bob@localhost
bob <bob@company.com>
bob jones <bob <AT> company <DOT> com>
Bob Jones <bob@company.com>
Joe Smith <joe@company.com>

In this example, the same person (Bob) has created changesets under four different names, one of
which actually looks correct, and one of which would be completely invalid for a Git commit. Hg-
fast-export lets us fix this by turning each line into a rule: "<input>"="<output>", mapping an <input>
to an <output>. Inside the <input> and <output> strings, all escape sequences understood by the
python string_escape encoding are supported. If the author mapping file does not contain a
matching <input>, that author will be sent on to Git unmodified. If all the usernames look fine, we
won’t need this file at all. In this example, we want our file to look like this:

"bob"="Bob Jones <bob@company.com>"
"bob@localhost"="Bob Jones <bob@company.com>"
"bob <bob@company.com>"="Bob Jones <bob@company.com>"
"bob jones <bob <AT> company <DOT> com>"="Bob Jones <bob@company.com>"

The same kind of mapping file can be used to rename branches and tags when the Mercurial name
is not allowed by Git.

The next step is to create our new Git repository, and run the export script:

$ git init /tmp/converted
$ cd /tmp/converted
$ /tmp/fast-export/hg-fast-export.sh -r /tmp/hg-repo -A /tmp/authors

The -r flag tells hg-fast-export where to find the Mercurial repository we want to convert, and the
-A flag tells it where to find the author-mapping file (branch and tag mapping files are specified by
the -B and -T flags respectively). The script parses Mercurial changesets and converts them into a
script for Git’s "fast-import" feature (which we’ll discuss in detail a bit later on). This takes a bit
(though it’s much faster than it would be over the network), and the output is fairly verbose:

$ /tmp/fast-export/hg-fast-export.sh -r /tmp/hg-repo -A /tmp/authors
Loaded 4 authors
master: Exporting full revision 1/22208 with 13/0/0 added/changed/removed files
master: Exporting simple delta revision 2/22208 with 1/1/0 added/changed/removed files
master: Exporting simple delta revision 3/22208 with 0/1/0 added/changed/removed files
[…]
master: Exporting simple delta revision 22206/22208 with 0/4/0 added/changed/removed files
master: Exporting simple delta revision 22207/22208 with 0/2/0 added/changed/removed files
master: Exporting thorough delta revision 22208/22208 with 3/213/0 added/changed/removed files
Exporting tag [0.4c] at [hg r9] [git :10]
Exporting tag [0.4d] at [hg r16] [git :17]
[…]
Exporting tag [3.1-rc] at [hg r21926] [git :21927]
Exporting tag [3.1] at [hg r21973] [git :21974]
Issued 22315 commands
git-fast-import statistics:
---------------------------------------------------------------------
Alloc'd objects:     120000
Total objects:       115032 (    208171 duplicates                  )
      blobs  :        40504 (    205320 duplicates      26117 deltas of      39602 attempts)
      trees  :        52320 (      2851 duplicates      47467 deltas of      47599 attempts)
      commits:        22208 (         0 duplicates          0 deltas of          0 attempts)
      tags   :            0 (         0 duplicates          0 deltas of          0 attempts)
Total branches:         109 (         2 loads     )
      marks:        1048576 (     22208 unique    )
      atoms:           1952
Memory total:          7860 KiB
       pools:          2235 KiB
     objects:          5625 KiB
---------------------------------------------------------------------
pack_report: getpagesize()            =       4096
pack_report: core.packedGitWindowSize = 1073741824
pack_report: core.packedGitLimit      = 8589934592
pack_report: pack_used_ctr            =      90430
pack_report: pack_mmap_calls          =      46771
pack_report: pack_open_windows        =          1 /          1
pack_report: pack_mapped              =  340852700 /  340852700
---------------------------------------------------------------------

$ git shortlog -sn
   369  Bob Jones
   365  Joe Smith

That’s pretty much all there is to it. All of the Mercurial tags have been converted to Git tags, and
Mercurial branches and bookmarks have been converted to Git branches. Now you’re ready to
push the repository up to its new server-side home:

$ git remote add origin git@my-git-server:myrepository.git
$ git push origin --all

Bazaar

Bazaar is a DVCS tool much like Git, and as a result it’s pretty straightforward to convert a Bazaar
repository into a Git one. To accomplish this, you’ll need to install the bzr-fastimport plugin.

Getting the bzr-fastimport plugin

The procedure for installing the fastimport plugin is different on UNIX-like operating systems and
on Windows. In the first case, the simplest option is to install the bzr-fastimport package, which
will install all the required dependencies.

For example, with Debian and derived distributions, you would do the following:

$ sudo apt-get install bzr-fastimport

With RHEL, you would do the following:

$ sudo yum install bzr-fastimport

With Fedora, since release 22, the new package manager is dnf:

$ sudo dnf install bzr-fastimport

If the package is not available, you may install it as a plugin:

$ mkdir --parents ~/.bazaar/plugins               # creates the necessary folders for the plugins
$ cd ~/.bazaar/plugins
$ bzr branch lp:bzr-fastimport fastimport         # imports the fastimport plugin
$ cd fastimport
$ sudo python setup.py install --record=files.txt # installs the plugin

For this plugin to work, you’ll also need the fastimport Python module. You can check whether it is
present or not and install it with the following commands:

$ python -c "import fastimport"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
ImportError: No module named fastimport
$ pip install fastimport

If it is not available, you can download it from https://pypi.python.org/pypi/fastimport/.

In the second case (on Windows), bzr-fastimport is automatically installed with the standalone
version and the default installation (leave all the checkboxes checked), so in this case you have
nothing more to do.

At this point, the way you import a Bazaar repository differs depending on whether you have a
single branch or are working with a repository that has several branches.

Project with a single branch

Now cd into the directory that contains your Bazaar repository and initialize the Git repository:

$ cd /path/to/the/bzr/repository
$ git init

Now, you can simply export your Bazaar repository and convert it into a Git repository using the
following command:

$ bzr fast-export --plain . | git fast-import

Depending on the size of the project, building your Git repository will take anywhere from a few
seconds to a few minutes.

Case of a project with a main branch and a working branch

You can also import a Bazaar repository that contains branches. Let us suppose that you have two
branches: one represents the main branch (myProject.trunk), the other one is the working branch
(myProject.work).

$ ls
myProject.trunk myProject.work

Create the Git repository and cd into it:

$ git init git-repo
$ cd git-repo

Pull the master branch into Git:

$ bzr fast-export --export-marks=../marks.bzr ../myProject.trunk | \
git fast-import --export-marks=../marks.git

Pull the working branch into Git:

$ bzr fast-export --marks=../marks.bzr --git-branch=work ../myProject.work | \
git fast-import --import-marks=../marks.git --export-marks=../marks.git

Now git branch shows you the master branch as well as the work branch. Check the logs to make
sure they’re complete and get rid of the marks.bzr and marks.git files.

Synchronizing the staging area

Whatever the number of branches you had and the import method you used, your staging area is
not synchronized with HEAD, and with the import of several branches, your working directory is not
synchronized either. This situation is easily solved by the following command:

$ git reset --hard HEAD

Ignoring the files that were ignored with .bzrignore

Now let’s have a look at the files to ignore. The first thing to do is to rename .bzrignore to
.gitignore. If the .bzrignore file contains one or several lines starting with "!!" or "RE:", you’ll have
to modify it and perhaps create several .gitignore files in order to ignore exactly the same files that
Bazaar was ignoring.

Finally, you will have to create a commit that contains this modification for the migration:

$ git mv .bzrignore .gitignore
$ # modify .gitignore if needed
$ git commit -am 'Migration from Bazaar to Git'

Sending your repository to the server

Here we are! Now you can push the repository onto its new home server:

$ git remote add origin git@my-git-server:mygitrepository.git
$ git push origin --all
$ git push origin --tags

Your Git repository is ready to use.

Perforce

The next system you’ll look at importing from is Perforce. As we discussed above, there are two
ways to let Git and Perforce talk to each other: git-p4 and Perforce Git Fusion.

Perforce Git Fusion

Git Fusion makes this process fairly painless. Just configure your project settings, user mappings,
and branches using a configuration file (as discussed in Git Fusion), and clone the repository. Git
Fusion leaves you with what looks like a native Git repository, which is then ready to push to a
native Git host if you desire. You could even use Perforce as your Git host if you like.

Git-p4

Git-p4 can also act as an import tool. As an example, we’ll import the Jam project from the Perforce
Public Depot. To set up your client, you must export the P4PORT environment variable to point to
the Perforce depot:

$ export P4PORT=public.perforce.com:1666

 In order to follow along, you’ll need a Perforce depot to connect with. We’ll be
using the public depot at public.perforce.com for our examples, but you can use
any depot you have access to.

Run the git p4 clone command to import the Jam project from the Perforce server, supplying the
depot and project path and the path into which you want to import the project:

$ git-p4 clone //guest/perforce_software/jam@all p4import
Importing from //guest/perforce_software/jam@all into p4import
Initialized empty Git repository in /private/tmp/p4import/.git/
Import destination: refs/remotes/p4/master
Importing revision 9957 (100%)

This particular project has only one branch, but if you have branches that are configured with
branch views (or just a set of directories), you can use the --detect-branches flag to git p4 clone to
import all the project’s branches as well. See Branching for a bit more detail on this.

At this point you’re almost done. If you go to the p4import directory and run git log, you can see
your imported work:

$ git log -2
commit e5da1c909e5db3036475419f6379f2c73710c4e6
Author: giles <giles@giles@perforce.com>
Date: Wed Feb 8 03:13:27 2012 -0800

  Correction to line 355; change </UL> to </OL>.

  [git-p4: depot-paths = "//public/jam/src/": change = 8068]

commit aa21359a0a135dda85c50a7f7cf249e4f7b8fd98
Author: kwirth <kwirth@perforce.com>
Date: Tue Jul 7 01:35:51 2009 -0800

  Fix spelling error on Jam doc page (cummulative -> cumulative).

  [git-p4: depot-paths = "//public/jam/src/": change = 7304]

You can see that git-p4 has left an identifier in each commit message. It’s fine to keep that identifier
there, in case you need to reference the Perforce change number later. However, if you’d like to
remove the identifier, now is the time to do so – before you start doing work on the new repository.
You can use git filter-branch to remove the identifier strings en masse:

$ git filter-branch --msg-filter 'sed -e "/^\[git-p4:/d"'
Rewrite e5da1c909e5db3036475419f6379f2c73710c4e6 (125/125)
Ref 'refs/heads/master' was rewritten
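The sed expression used here deletes (the d command) every message line that begins with [git-p4:. You can check it on its own against a fabricated message modeled on the output above:

```shell
# Lines starting with "[git-p4:" are dropped; everything else
# passes through untouched.
printf '%s\n' \
  'Fix spelling error on Jam doc page' \
  '' \
  '[git-p4: depot-paths = "//public/jam/src/": change = 7304]' \
  | sed -e '/^\[git-p4:/d'
```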

If you run git log, you can see that all the SHA-1 checksums for the commits have changed, but the
git-p4 strings are no longer in the commit messages:

$ git log -2
commit b17341801ed838d97f7800a54a6f9b95750839b7
Author: giles <giles@giles@perforce.com>
Date: Wed Feb 8 03:13:27 2012 -0800

  Correction to line 355; change </UL> to </OL>.

commit 3e68c2e26cd89cb983eb52c024ecdfba1d6b3fff
Author: kwirth <kwirth@perforce.com>
Date: Tue Jul 7 01:35:51 2009 -0800

  Fix spelling error on Jam doc page (cummulative -> cumulative).

Your import is ready to push up to your new Git server.

TFS

If your team is converting their source control from TFVC to Git, you’ll want the highest-fidelity
conversion you can get. This means that, while we covered both git-tfs and git-tf for the interop
section, we’ll only be covering git-tfs for this part, because git-tfs supports branches, and this is
prohibitively difficult using git-tf.

 This is a one-way conversion. The resulting Git repository won’t be able to connect
with the original TFVC project.

The first thing to do is map usernames. TFVC is fairly liberal with what goes into the author field for
changesets, but Git wants a human-readable name and email address. You can get this information
from the tf command-line client, like so:

PS> tf history $/myproject -recursive > AUTHORS_TMP

This grabs all of the changesets in the history of the project and puts them in the AUTHORS_TMP file,
which we will then process to extract the data of the User column (the second one). Open the file and
find the characters at which that column starts and ends, then replace the parameters 11-20 of the
cut command in the following command line with the values you found:

PS> cat AUTHORS_TMP | cut -b 11-20 | tail -n+3 | sort | uniq > AUTHORS

The cut command keeps only the characters between 11 and 20 from each line. The tail command
skips the first two lines, which are field headers and ASCII-art underlines. The result of all of this is
piped to sort and uniq to eliminate duplicates, and saved to a file named AUTHORS. The next step is
manual; in order for git-tfs to make effective use of this file, each line must be in this format:

DOMAIN\username = User Name <email@address.com>

The portion on the left is the “User” field from TFVC, and the portion on the right side of the equals
sign is the user name that will be used for Git commits.
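To see the mechanics of that pipeline, here it is run against a few fabricated lines of tf history output (the column positions, domain, and user names are all invented):

```shell
# Extract the User column (bytes 11-20), skip the two header lines,
# then sort and de-duplicate - the same steps as the command above.
users=$(printf '%s\n' \
  'Changeset User       Date       Comment' \
  '--------- ---------- ---------- -------' \
  '16        DOM\alice  2014-08-01 Hello' \
  '15        DOM\bob    2014-07-30 Fix' \
  '14        DOM\alice  2014-07-29 Init' \
  | cut -b 11-20 | tail -n+3 | sort | uniq)
printf '%s\n' "$users"
```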

Once you have this file, the next thing to do is make a full clone of the TFVC project you’re
interested in:

PS> git tfs clone --with-branches --authors=AUTHORS
https://username.visualstudio.com/DefaultCollection $/project/Trunk project_git

Next you’ll want to clean the git-tfs-id sections from the bottom of the commit messages. The
following command will do that:

PS> git filter-branch -f --msg-filter 'sed "s/^git-tfs-id:.*$//g"' '--' --all

That uses the sed command from the Git-bash environment to replace any line starting with “git-tfs-
id:” with emptiness, which Git will then ignore.
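The substitution can be checked on its own with a fabricated message that reuses the git-tfs-id line shown earlier:

```shell
# The s command blanks any line beginning with "git-tfs-id:"; Git
# then ignores the resulting trailing empty lines in the message.
printf '%s\n' \
  'Hello' \
  '' \
  'git-tfs-id: [https://username.visualstudio.com/DefaultCollection]$/myproject/Trunk;C16' \
  | sed 's/^git-tfs-id:.*$//g'
```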

Once that’s all done, you’re ready to add a new remote, push all your branches up, and have your
team start working from Git.

A Custom Importer

If your system isn’t one of the above, you should look for an importer online – quality importers are
available for many other systems, including CVS, Clear Case, Visual Source Safe, even a directory of
archives. If none of these tools works for you, you have a more obscure tool, or you otherwise need
a more custom importing process, you should use git fast-import. This command reads simple
instructions from stdin to write specific Git data. It’s much easier to create Git objects this way than
to run the raw Git commands or try to write the raw objects (see Git Internals for more
information). This way, you can write an import script that reads the necessary information out of
the system you’re importing from and prints straightforward instructions to stdout. You can then
run this program and pipe its output through git fast-import.

To quickly demonstrate, you’ll write a simple importer. Suppose you work in a directory named
current, you back up your project by occasionally copying that directory into a time-stamped
back_YYYY_MM_DD backup directory, and you want to import all of this into Git. Your directory
structure looks like this:

$ ls /opt/import_from
back_2014_01_02
back_2014_01_04
back_2014_01_14
back_2014_02_03
current

In order to import a directory into Git, you need to review how Git stores its data. As you may
remember, Git is fundamentally a linked list of commit objects that point to a snapshot of content.
All you have to do is tell fast-import what the content snapshots are, what commit data points to
them, and the order they go in. Your strategy will be to go through the snapshots one at a time and
create commits with the contents of each directory, linking each commit back to the previous one.

As we did in An Example Git-Enforced Policy, we’ll write this in Ruby, because it’s what we
generally work with and it tends to be easy to read. You can write this example pretty easily in
anything you’re familiar with – it just needs to print the appropriate information to stdout. And, if
you are running on Windows, this means you’ll need to take special care not to introduce carriage
returns at the end of your lines – git fast-import is very particular about wanting just line feeds (LF),
not the carriage return line feeds (CRLF) that Windows uses.

To begin, you’ll change into the target directory and identify every subdirectory, each of which is a
snapshot that you want to import as a commit. You’ll change into each subdirectory and print the
commands necessary to export it. Your basic main loop looks like this:

last_mark = nil

# loop through the directories
Dir.chdir(ARGV[0]) do
  Dir.glob("*").each do |dir|
    next if File.file?(dir)

    # move into the target directory
    Dir.chdir(dir) do
      last_mark = print_export(dir, last_mark)
    end
  end
end

You run print_export inside each directory, which takes the directory name and the mark of the
previous snapshot and returns the mark of this one; that way, you can link them properly.
“Mark” is the fast-import term for an identifier you give to a commit; as you create commits, you
give each one a mark that you can use to link to it from other commits. So, the first thing to do in
your print_export method is generate a mark from the directory name:

mark = convert_dir_to_mark(dir)

You’ll do this by creating an array of directories and using the index value as the mark, because a
mark must be an integer. Your method looks like this:

$marks = []
def convert_dir_to_mark(dir)
  if !$marks.include?(dir)
    $marks << dir
  end
  ($marks.index(dir) + 1).to_s
end
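
To see how the marks accumulate, here is a small standalone run of the method (the directory names are hypothetical; this snippet is not part of the importer script itself):

```ruby
# Marks are 1-based indexes into a global array of directory names.
$marks = []
def convert_dir_to_mark(dir)
  if !$marks.include?(dir)
    $marks << dir
  end
  ($marks.index(dir) + 1).to_s
end

puts convert_dir_to_mark('back_2014_01_02')  # "1" -- first directory seen
puts convert_dir_to_mark('back_2014_01_04')  # "2" -- a new directory gets the next mark
puts convert_dir_to_mark('back_2014_01_02')  # "1" -- an already-seen directory keeps its mark
```

Because the method only appends a directory the first time it sees it, asking for the same directory twice always yields the same mark.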

Now that you have an integer identifier for your commit, you need a date for the commit
metadata. Because the date is expressed in the name of the directory, you’ll parse it out. The next
line in your print_export method is:

date = convert_dir_to_date(dir)

where convert_dir_to_date is defined as:

def convert_dir_to_date(dir)
  if dir == 'current'
    return Time.now().to_i
  else
    dir = dir.gsub('back_', '')
    (year, month, day) = dir.split('_')
    return Time.local(year, month, day).to_i
  end
end
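
As a quick sanity check (the exact integer depends on your local time zone, so none is hard-coded here), the method turns a backup directory name into the Unix timestamp for midnight on that date:

```ruby
def convert_dir_to_date(dir)
  if dir == 'current'
    return Time.now().to_i
  else
    dir = dir.gsub('back_', '')
    (year, month, day) = dir.split('_')
    return Time.local(year, month, day).to_i
  end
end

# Midnight, January 2, 2014, in the local time zone:
puts convert_dir_to_date('back_2014_01_02') == Time.local(2014, 1, 2).to_i  # true
```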

That returns an integer value for the date of each directory. The last piece of meta-information you
need for each commit is the committer data, which you hardcode in a global variable:

$author = 'John Doe <[email protected]>'

Now you’re ready to begin printing out the commit data for your importer. The initial information
states that you’re defining a commit object and what branch it’s on, followed by the mark you’ve
generated, the committer information and commit message, and then the previous commit, if any.
The code looks like this:

# print the import information
puts 'commit refs/heads/master'
puts 'mark :' + mark
puts "committer #{$author} #{date} -0700"
export_data('imported from ' + dir)
puts 'from :' + last_mark if last_mark

You hardcode the time zone (-0700) because doing so is easy. If you’re importing from another
system, you must specify the time zone as an offset. The commit message must be expressed in a
special format:

data (size)\n(contents)

The format consists of the word data, the size of the data to be read, a newline, and finally the data.
Because you need to use the same format to specify the file contents later, you create a helper
method, export_data:

def export_data(string)
  print "data #{string.size}\n#{string}"
end
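
For instance, the commit message imported from back_2014_01_02 is 29 characters long, so the helper emits it as a data 29 record — exactly the form you’ll see in the sample output further on. A standalone check:

```ruby
def export_data(string)
  print "data #{string.size}\n#{string}"
end

# "imported from back_2014_01_02" is 29 characters,
# so this prints "data 29", a newline, and then the message itself
# (with no trailing newline -- the next command continues on the same line).
export_data('imported from back_2014_01_02')
```

That missing trailing newline is also why, in the raw stream, the next command (deleteall or from) appears fused onto the end of the message.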

All that’s left is to specify the file contents for each snapshot. This is easy, because you have each
one in a directory – you can print out the deleteall command followed by the contents of each file
in the directory. Git will then record each snapshot appropriately:

puts 'deleteall'
Dir.glob("**/*").each do |file|
  next if !File.file?(file)
  inline_data(file)
end

Note: Because many systems think of their revisions as changes from one commit to another, fast-
import can also take commands with each commit to specify which files have been added,
removed, or modified and what the new contents are. You could calculate the differences between
snapshots and provide only this data, but doing so is more complex – you may as well give Git all
the data and let it figure it out. If this is better suited to your data, check the fast-import man page
for details about how to provide your data in this manner.

The format for listing the new file contents or specifying a modified file with the new contents is as
follows:

M 644 inline path/to/file
data (size)
(file contents)

Here, 644 is the mode (if you have executable files, you need to detect and specify 755 instead), and
inline says you’ll list the contents immediately after this line. Your inline_data method looks like
this:

def inline_data(file, code = 'M', mode = '644')
  content = File.read(file)
  puts "#{code} #{mode} inline #{file}"
  export_data(content)
end

You reuse the export_data method you defined earlier, because it’s the same as the way you
specified your commit message data.

The last thing you need to do is to return the current mark so it can be passed to the next iteration:

return mark

If you are running on Windows, you’ll need to add one extra step. As mentioned
before, Windows uses CRLF for newline characters while git fast-import expects
only LF. To get around this problem and make git fast-import happy, you need to
tell Ruby to use LF instead of CRLF:

$stdout.binmode

That’s it. Here’s the script in its entirety:

#!/usr/bin/env ruby

$stdout.binmode
$author = "John Doe <[email protected]>"

$marks = []
def convert_dir_to_mark(dir)
  if !$marks.include?(dir)
    $marks << dir
  end
  ($marks.index(dir) + 1).to_s
end

def convert_dir_to_date(dir)
  if dir == 'current'
    return Time.now().to_i
  else
    dir = dir.gsub('back_', '')
    (year, month, day) = dir.split('_')
    return Time.local(year, month, day).to_i
  end
end

def export_data(string)
  print "data #{string.size}\n#{string}"
end

def inline_data(file, code = 'M', mode = '644')
  content = File.read(file)
  puts "#{code} #{mode} inline #{file}"
  export_data(content)
end

def print_export(dir, last_mark)
  date = convert_dir_to_date(dir)
  mark = convert_dir_to_mark(dir)

  puts 'commit refs/heads/master'
  puts "mark :#{mark}"
  puts "committer #{$author} #{date} -0700"
  export_data("imported from #{dir}")
  puts "from :#{last_mark}" if last_mark

  puts 'deleteall'
  Dir.glob("**/*").each do |file|
    next if !File.file?(file)
    inline_data(file)
  end

  mark
end

# Loop through the directories
last_mark = nil
Dir.chdir(ARGV[0]) do
  Dir.glob("*").each do |dir|
    next if File.file?(dir)

    # move into the target directory
    Dir.chdir(dir) do
      last_mark = print_export(dir, last_mark)
    end
  end
end

If you run this script, you’ll get content that looks something like this:

$ ruby import.rb /opt/import_from
commit refs/heads/master
mark :1
committer John Doe <[email protected]> 1388649600 -0700
data 29
imported from back_2014_01_02deleteall
M 644 inline README.md
data 28
# Hello

This is my readme.
commit refs/heads/master
mark :2
committer John Doe <[email protected]> 1388822400 -0700
data 29
imported from back_2014_01_04from :1
deleteall
M 644 inline main.rb
data 34
#!/bin/env ruby

puts "Hey there"
M 644 inline README.md
(...)

To run the importer, pipe this output through git fast-import while in the Git directory you want to
import into. You can create a new directory and then run git init in it for a starting point, and
then run your script:

$ git init
Initialized empty Git repository in /opt/import_to/.git/
$ ruby import.rb /opt/import_from | git fast-import
git-fast-import statistics:
---------------------------------------------------------------------
Alloc'd objects:       5000
Total objects:           13 (         6 duplicates                  )
      blobs  :            5 (         4 duplicates          3 deltas of          5 attempts)
      trees  :            4 (         1 duplicates          0 deltas of          4 attempts)
      commits:            4 (         1 duplicates          0 deltas of          0 attempts)
      tags   :            0 (         0 duplicates          0 deltas of          0 attempts)
Total branches:           1 (         1 loads     )
      marks:           1024 (         5 unique    )
      atoms:              2
Memory total:          2344 KiB
       pools:          2110 KiB
     objects:           234 KiB
---------------------------------------------------------------------
pack_report: getpagesize()            =       4096
pack_report: core.packedGitWindowSize = 1073741824
pack_report: core.packedGitLimit      = 8589934592
pack_report: pack_used_ctr            =         10
pack_report: pack_mmap_calls          =          5
pack_report: pack_open_windows        =          2 /          2
pack_report: pack_mapped              =       1457 /       1457
---------------------------------------------------------------------

As you can see, when it completes successfully, it gives you a bunch of statistics about what it
accomplished. In this case, you imported 13 objects total for 4 commits into 1 branch. Now, you can
run git log to see your new history:

$ git log -2
commit 3caa046d4aac682a55867132ccdfbe0d3fdee498
Author: John Doe <[email protected]>
Date: Tue Jul 29 19:39:04 2014 -0700

  imported from current

commit 4afc2b945d0d3c8cd00556fbe2e8224569dc9def
Author: John Doe <[email protected]>
Date: Mon Feb 3 01:00:00 2014 -0700

  imported from back_2014_02_03

There you go – a nice, clean Git repository. It’s important to note that nothing is checked out – you
don’t have any files in your working directory at first. To get them, you must reset your branch to
where master is now:

$ ls
$ git reset --hard master
HEAD is now at 3caa046 imported from current
$ ls
README.md main.rb

You can do a lot more with the fast-import tool – handle different modes, binary data, multiple
branches and merging, tags, progress indicators, and more. A number of examples of more
complex scenarios are available in the contrib/fast-import directory of the Git source code.

Summary

You should feel comfortable using Git as a client for other version-control systems, or importing
nearly any existing repository into Git without losing data. In the next chapter, we’ll cover the raw
internals of Git so you can craft every single byte, if need be.

Git Internals

You may have skipped to this chapter from a much earlier chapter, or you may have gotten here
after sequentially reading the entire book up to this point — in either case, this is where we’ll go
over the inner workings and implementation of Git. We found that understanding this information
was fundamentally important to appreciating how useful and powerful Git is, but others have
argued to us that it can be confusing and unnecessarily complex for beginners. Thus, we’ve made
this discussion the last chapter in the book so you could read it early or later in your learning
process. We leave it up to you to decide.

Now that you’re here, let’s get started. First, if it isn’t yet clear, Git is fundamentally a content-
addressable filesystem with a VCS user interface written on top of it. You’ll learn more about what
this means in a bit.

In the early days of Git (mostly pre 1.5), the user interface was much more complex because it
emphasized this filesystem rather than a polished VCS. In the last few years, the UI has been refined
until it’s as clean and easy to use as any system out there; however, the stereotype lingers about the
early Git UI that was complex and difficult to learn.

The content-addressable filesystem layer is amazingly cool, so we’ll cover that first in this chapter;
then, you’ll learn about the transport mechanisms and the repository maintenance tasks that you
may eventually have to deal with.

Plumbing and Porcelain

This book covers primarily how to use Git with 30 or so subcommands such as checkout, branch,
remote, and so on. But because Git was initially a toolkit for a version control system rather than a
full user-friendly VCS, it has a number of subcommands that do low-level work and were designed
to be chained together UNIX-style or called from scripts. These commands are generally referred to
as Git’s “plumbing” commands, while the more user-friendly commands are called “porcelain”
commands.

As you will have noticed by now, this book’s first nine chapters deal almost exclusively with
porcelain commands. But in this chapter, you’ll be dealing mostly with the lower-level plumbing
commands, because they give you access to the inner workings of Git, and help demonstrate how
and why Git does what it does. Many of these commands aren’t meant to be used manually on the
command line, but rather to be used as building blocks for new tools and custom scripts.

When you run git init in a new or existing directory, Git creates the .git directory, which is where
almost everything that Git stores and manipulates is located. If you want to back up or clone your
repository, copying this single directory elsewhere gives you nearly everything you need. This
entire chapter basically deals with what you can see in this directory. Here’s what a newly-
initialized .git directory typically looks like:

$ ls -F1
config
description
HEAD
hooks/
info/
objects/
refs/

Depending on your version of Git, you may see some additional content there, but this is a fresh git
init repository — it’s what you see by default. The description file is used only by the GitWeb
program, so don’t worry about it. The config file contains your project-specific configuration
options, and the info directory keeps a global exclude file for ignored patterns that you don’t want
to track in a .gitignore file. The hooks directory contains your client- or server-side hook scripts,
which are discussed in detail in Git Hooks.

This leaves four important entries: the HEAD and (yet to be created) index files, and the objects and
refs directories. These are the core parts of Git. The objects directory stores all the content for your
database, the refs directory stores pointers into commit objects in that data (branches, tags,
remotes and more), the HEAD file points to the branch you currently have checked out, and the index
file is where Git stores your staging area information. You’ll now look at each of these sections in
detail to see how Git operates.

Git Objects

Git is a content-addressable filesystem. Great. What does that mean? It means that at the core of Git
is a simple key-value data store. What this means is that you can insert any kind of content into a
Git repository, for which Git will hand you back a unique key you can use later to retrieve that
content.

As a demonstration, let’s look at the plumbing command git hash-object, which takes some data,
stores it in your .git/objects directory (the object database), and gives you back the unique key that
now refers to that data object.

First, you initialize a new Git repository and verify that there is (predictably) nothing in the objects
directory:

$ git init test
Initialized empty Git repository in /tmp/test/.git/
$ cd test
$ find .git/objects
.git/objects
.git/objects/info
.git/objects/pack
$ find .git/objects -type f

Git has initialized the objects directory and created pack and info subdirectories in it, but there are
no regular files. Now, let’s use git hash-object to create a new data object and manually store it in
your new Git database:

$ echo 'test content' | git hash-object -w --stdin
d670460b4b4aece5915caf5c68d12f560a9fe3e4

In its simplest form, git hash-object would take the content you handed to it and merely return the
unique key that would be used to store it in your Git database. The -w option then tells the command
to not simply return the key, but to write that object to the database. Finally, the --stdin option tells
git hash-object to get the content to be processed from stdin; otherwise, the command would
expect a filename argument at the end of the command containing the content to be used.

The output from the above command is a 40-character checksum hash. This is the SHA-1 hash — a
checksum of the content you’re storing plus a header, which you’ll learn about in a bit. Now you
can see how Git has stored your data:

$ find .git/objects -type f
.git/objects/d6/70460b4b4aece5915caf5c68d12f560a9fe3e4

If you again examine your objects directory, you can see that it now contains a file for that new
content. This is how Git stores the content initially — as a single file per piece of content, named
with the SHA-1 checksum of the content and its header. The subdirectory is named with the first 2
characters of the SHA-1, and the filename is the remaining 38 characters.
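
You can reproduce that key yourself: the SHA-1 is computed over a header of the form blob <size>, a null byte, and then the content. A minimal sketch in Ruby:

```ruby
require 'digest/sha1'

content = "test content\n"                # what `echo 'test content'` pipes in
header  = "blob #{content.bytesize}\0"    # object type, byte size, and a null byte
puts Digest::SHA1.hexdigest(header + content)
# => d670460b4b4aece5915caf5c68d12f560a9fe3e4
```

The result matches the key git hash-object printed above, and the first two characters (d6) name the subdirectory the object lands in.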

Once you have content in your object database, you can examine that content with the git cat-file
command. This command is sort of a Swiss army knife for inspecting Git objects. Passing -p to cat-
file instructs the command to first figure out the type of content, then display it appropriately:

$ git cat-file -p d670460b4b4aece5915caf5c68d12f560a9fe3e4
test content

Now, you can add content to Git and pull it back out again. You can also do this with content in files.
For example, you can do some simple version control on a file. First, create a new file and save its
contents in your database:

$ echo 'version 1' > test.txt
$ git hash-object -w test.txt
83baae61804e65cc73a7201a7252750c76066a30

Then, write some new content to the file, and save it again:

$ echo 'version 2' > test.txt
$ git hash-object -w test.txt
1f7a7a472abf3dd9643fd615f6da379c4acb3e3a

Your object database now contains both versions of this new file (as well as the first content you
stored there):

$ find .git/objects -type f
.git/objects/1f/7a7a472abf3dd9643fd615f6da379c4acb3e3a
.git/objects/83/baae61804e65cc73a7201a7252750c76066a30
.git/objects/d6/70460b4b4aece5915caf5c68d12f560a9fe3e4

At this point, you can delete your local copy of that test.txt file, then use Git to retrieve, from the
object database, either the first version you saved:

$ git cat-file -p 83baae61804e65cc73a7201a7252750c76066a30 > test.txt
$ cat test.txt
version 1

or the second version:

$ git cat-file -p 1f7a7a472abf3dd9643fd615f6da379c4acb3e3a > test.txt
$ cat test.txt
version 2

But remembering the SHA-1 key for each version of your file isn’t practical; plus, you aren’t storing
the filename in your system — just the content. This object type is called a blob. You can have Git tell
you the object type of any object in Git, given its SHA-1 key, with git cat-file -t:

$ git cat-file -t 1f7a7a472abf3dd9643fd615f6da379c4acb3e3a
blob

Tree Objects

The next type of Git object we’ll examine is the tree, which solves the problem of storing the
filename and also allows you to store a group of files together. Git stores content in a manner
similar to a UNIX filesystem, but a bit simplified. All the content is stored as tree and blob objects,
with trees corresponding to UNIX directory entries and blobs corresponding more or less to inodes
or file contents. A single tree object contains one or more entries, each of which is the SHA-1 hash
of a blob or subtree with its associated mode, type, and filename. For example, the most recent tree
in a project may look something like this:

$ git cat-file -p master^{tree}
100644 blob a906cb2a4a904a152e80877d4088654daad0c859      README
100644 blob 8f94139338f9404f26296befa88755fc2598c289      Rakefile
040000 tree 99f1a6d12cb4b6f19c8655fca46c3ecf317074e0      lib

The master^{tree} syntax specifies the tree object that is pointed to by the last commit on your
master branch. Notice that the lib subdirectory isn’t a blob but a pointer to another tree:

$ git cat-file -p 99f1a6d12cb4b6f19c8655fca46c3ecf317074e0
100644 blob 47c6340d6459e05787f644c2447d2595f5d3a54b      simplegit.rb

Depending on what shell you use, you may encounter errors when using the
master^{tree} syntax.

In CMD on Windows, the ^ character is used for escaping, so you have to double it
to avoid this: git cat-file -p master^^{tree}. When using PowerShell, parameters
using {} characters have to be quoted to avoid the parameter being parsed
incorrectly: git cat-file -p 'master^{tree}'.

If you’re using ZSH, the ^ character is used for globbing, so you have to enclose the
whole expression in quotes: git cat-file -p "master^{tree}".

Conceptually, the data that Git is storing looks something like this:

Figure 149. Simple version of the Git data model.

You can fairly easily create your own tree. Git normally creates a tree by taking the state of your
staging area or index and writing a series of tree objects from it. So, to create a tree object, you first
have to set up an index by staging some files. To create an index with a single entry — the first
version of your test.txt file — you can use the plumbing command git update-index. You use this
command to artificially add the earlier version of the test.txt file to a new staging area. You must
pass it the --add option because the file doesn’t yet exist in your staging area (you don’t even have a
staging area set up yet) and --cacheinfo because the file you’re adding isn’t in your directory but is
in your database. Then, you specify the mode, SHA-1, and filename:

$ git update-index --add --cacheinfo 100644 \
  83baae61804e65cc73a7201a7252750c76066a30 test.txt

In this case, you’re specifying a mode of 100644, which means it’s a normal file. Other options are
100755, which means it’s an executable file; and 120000, which specifies a symbolic link. The mode is
taken from normal UNIX modes but is much less flexible — these three modes are the only ones
that are valid for files (blobs) in Git (although other modes are used for directories and
submodules).

Now, you can use git write-tree to write the staging area out to a tree object. No -w option is
needed — calling this command automatically creates a tree object from the state of the index if
that tree doesn’t yet exist:

$ git write-tree
d8329fc1cc938780ffdd9f94e0d364e0ea74f579
$ git cat-file -p d8329fc1cc938780ffdd9f94e0d364e0ea74f579
100644 blob 83baae61804e65cc73a7201a7252750c76066a30      test.txt

You can also verify that this is a tree object using the same git cat-file command you saw earlier:

$ git cat-file -t d8329fc1cc938780ffdd9f94e0d364e0ea74f579
tree

You’ll now create a new tree with the second version of test.txt and a new file as well:

$ echo 'new file' > new.txt
$ git update-index --add --cacheinfo 100644 \
  1f7a7a472abf3dd9643fd615f6da379c4acb3e3a test.txt
$ git update-index --add new.txt

Your staging area now has the new version of test.txt as well as the new file new.txt. Write out
that tree (recording the state of the staging area or index to a tree object) and see what it looks like:

$ git write-tree
0155eb4229851634a0f03eb265b69f5a2d56f341
$ git cat-file -p 0155eb4229851634a0f03eb265b69f5a2d56f341
100644 blob fa49b077972391ad58037050f2a75f74e3671e92      new.txt
100644 blob 1f7a7a472abf3dd9643fd615f6da379c4acb3e3a      test.txt

Notice that this tree has both file entries and also that the test.txt SHA-1 is the “version 2” SHA-1
from earlier (1f7a7a). Just for fun, you’ll add the first tree as a subdirectory into this one. You can
read trees into your staging area by calling git read-tree. In this case, you can read an existing tree
into your staging area as a subtree by using the --prefix option with this command:

$ git read-tree --prefix=bak d8329fc1cc938780ffdd9f94e0d364e0ea74f579
$ git write-tree
3c4e9cd789d88d8d89c1073707c3585e41b0e614
$ git cat-file -p 3c4e9cd789d88d8d89c1073707c3585e41b0e614
040000 tree d8329fc1cc938780ffdd9f94e0d364e0ea74f579      bak
100644 blob fa49b077972391ad58037050f2a75f74e3671e92      new.txt
100644 blob 1f7a7a472abf3dd9643fd615f6da379c4acb3e3a      test.txt

If you created a working directory from the new tree you just wrote, you would get the two files in
the top level of the working directory and a subdirectory named bak that contained the first version
of the test.txt file. You can think of the data that Git contains for these structures as being like this:

Figure 150. The content structure of your current Git data.

Commit Objects

If you’ve done all of the above, you now have three trees that represent the different snapshots of
your project that you want to track, but the earlier problem remains: you must remember all three
SHA-1 values in order to recall the snapshots. You also don’t have any information about who saved
the snapshots, when they were saved, or why they were saved. This is the basic information that
the commit object stores for you.

To create a commit object, you call commit-tree and specify a single tree SHA-1 and which commit
objects, if any, directly preceded it. Start with the first tree you wrote:

$ echo 'First commit' | git commit-tree d8329f
fdf4fc3344e67ab068f836878b6c4951e3b15f3d

You will get a different hash value because of different creation time and author data, so replace the
commit and tag hashes with your own checksums as you follow along in this chapter. Now you can
look at your new commit object with git cat-file:

$ git cat-file -p fdf4fc3
tree d8329fc1cc938780ffdd9f94e0d364e0ea74f579
author Scott Chacon <[email protected]> 1243040974 -0700
committer Scott Chacon <[email protected]> 1243040974 -0700
First commit

The format for a commit object is simple: it specifies the top-level tree for the snapshot of the
project at that point; the parent commits if any (the commit object described above does not have
any parents); the author/committer information (which uses your user.name and user.email
configuration settings and a timestamp); a blank line, and then the commit message.

Next, you’ll write the other two commit objects, each referencing the commit that came directly
before it:

$ echo 'Second commit' | git commit-tree 0155eb -p fdf4fc3
cac0cab538b970a37ea1e769cbbde608743bc96d
$ echo 'Third commit' | git commit-tree 3c4e9c -p cac0cab
1a410efbd13591db07496601ebc7a059dd55cfe9

Each of the three commit objects points to one of the three snapshot trees you created. Oddly
enough, you have a real Git history now that you can view with the git log command, if you run it
on the last commit SHA-1:
