
Archiving Changed Files in Git

Intro – Back to the Command Line

Believe it or not, my favorite operating system was always VMS (later, OpenVMS), partly because of the clear and elegant online HELP facility, and partly due to the clear and elegant DCL command language.  I’ve used various flavors of Unix/Linux, including AIX, but while I got my work done, somehow I never quite got the hang of the MAN page format.  HELP in VMS always provides numerous examples and I found it much easier to wrap my head around them than the arcane lists of parameters favored on the *ix platforms.

In my current work, we’re using git for source control.  Git is a distributed version control system (DVCS) that is used on a wide variety of proprietary and open-source projects including the massive Linux kernel.  For more on why we like Git, please see the article “Why Git is better than X” (where X is any non-DVCS system). is a very nice hosted git service with many features — issue tracking and wiki among them — that make for a very nice set of software development tools.

Most people using git on Windows use the freely-available msysgit.  It has some GUI tools, but is mostly a command-line implementation, and that command line is the same bash shell that is used on most Linux systems.  Thus, I find myself back in the world of command-line interfaces for much of my day-to-day work, and I have to say that I am finding a lot of power and utility in the command line version.  And unlike back in my VMS days, I can find help in a lot more places than from MAN pages: with Google, I can simply search for a string, and immediately have several examples to use to learn a particular feature or function.

By the way, to learn more about git, try the Pro Git book (free online but please donate to the author if you like and use it).  You can also refer to these instructions for getting started with git and

Files in the Cloud

This week we moved all of our graphics to the cloud — storage on RackSpace’s Cloud Files service and content delivery via a content delivery network (CDN).  We put all the images in a separate git repository (or ‘repo’ as the kids say), and can check in (‘commit’) versions as we add and change images.  Thus, we can deploy new images to cloud storage without changing the web code, and make the images available on the CDN.  A side benefit is that at scale, all image serving is kept off of the main web servers, leaving more performance capacity for the actual web traffic.

As I was setting this up, I quickly discovered the git archive command, which creates a nice zip or tar archive of files in your repository.  However, I wasn’t able to create an archive containing only the changes — what files were new or changed since the last deployment?  Right now, we have only a few hundred images and I could easily deploy the whole set each time, but that seemed both wasteful and risky — in a production system, I only like to change what needs to be changed, sort of a ‘minimally-invasive’ deployment strategy for software.

So after many Google searches, and much experimentation on the command line, I came up with the following process:

Initial Deployment

  • Commit your images to git. When you are ready to deploy, create a tag (I use tag names like “FirstDeployToCDN” but use what you like). Create an archive of the entire repository based on the tag:

$ git archive --format tar -o FirstDeployToCDN.tar FirstDeployToCDN

This says to do the following:

  • git archive --format tar: Create an archive using the ‘tar’ format
  • -o FirstDeployToCDN.tar: Call the archive FirstDeployToCDN.tar
  • FirstDeployToCDN: Create the archive based on what is in the FirstDeployToCDN tag

In other words, this command creates a file called FirstDeployToCDN.tar containing all the files in FirstDeployToCDN tag.
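
The tagging step itself is not shown above, so here is a minimal sketch of the whole initial deployment. It builds a throwaway repository purely so the commands can run anywhere. The file name logo.png and the "Example" identity are made up for illustration, and in a real image repo you would only need the tag and archive commands.

```shell
# Throwaway repository so this sketch is self-contained (illustrative only).
repo="$(mktemp -d)"
cd "$repo" || exit 1
git init -q .
echo "fake image" > logo.png
git add logo.png
git -c user.name="Example" -c user.email="example@example.com" commit -q -m "Add logo"

# Create a lightweight tag at the current commit, then archive that tag.
git tag FirstDeployToCDN
git archive --format tar -o FirstDeployToCDN.tar FirstDeployToCDN

# Confirm what went into the archive.
tar -tf FirstDeployToCDN.tar
```

Listing the archive with tar -tf before deploying is a cheap sanity check that the tag points at what you expect.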

Tip: if you want the archive name to contain a datestamp, you can use a command line trick to insert a formatted date string.  The ‘date’ command can be used with a format string, and you can insert those results in another command using the back-tick delimiters “`” (backwards apostrophe), to wit:

$ git archive --format tar -o FirstDeployToCDN-`date +%Y-%m-%dT%H%M%Z`.tar FirstDeployToCDN

This would create a file called something like FirstDeployToCDN-2010-08-28T1114EDT.tar, which is a mouthful, but also contains info about when it was created.
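
As a small sketch of the same trick, the $(...) form of command substitution does the same job as back-ticks and is easier to read and nest. The helper name archive_name is hypothetical, just something I made up for illustration; it is not part of git.

```shell
# Hypothetical helper: build a date-stamped archive name for a tag.
# $(...) is equivalent to the back-tick form of command substitution.
archive_name() {
    printf '%s-%s.tar\n' "$1" "$(date +%Y-%m-%dT%H%M%Z)"
}

# Prints a name of the form TAG-YYYY-MM-DDTHHMMZONE.tar
archive_name FirstDeployToCDN
```

You could then write something like git archive --format tar -o "$(archive_name FirstDeployToCDN)" FirstDeployToCDN, which keeps the naming rule in one place.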

  • Anyway, once you have your archive file, you can back it up, or extract the files to your server or staging area, then copy the files to the CDN/Cloud storage location.  For the RackSpace Cloud Files service, I find FireUploader to be very effective and efficient.
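
For the extract step, a minimal sketch might look like the following. It fabricates a tiny archive in a temporary directory so it can run anywhere; the paths (images/logo.png, the staging directory) are made up for illustration, and in practice the archive would be the one produced by git archive.

```shell
# Throwaway demo files so the commands below can run anywhere.
work="$(mktemp -d)"
mkdir -p "$work/src/images" "$work/staging"
echo "fake image" > "$work/src/images/logo.png"
tar -cf "$work/FirstDeployToCDN.tar" -C "$work/src" images

# List the archive contents before touching the staging area.
tar -tf "$work/FirstDeployToCDN.tar"

# Extract into the staging directory; -C changes into it first.
tar -xf "$work/FirstDeployToCDN.tar" -C "$work/staging"
```

Extracting into a separate staging directory, rather than straight onto the server, gives you a chance to inspect the files before uploading them.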

Uploading Changes (the point of this article)

  • Make changes to, add, and remove images, committing as you go.  When you are ready to deploy an incremental set of images, you need to commit everything, adding a tag if you wish.  Remember, you want the changes since the last deployment (i.e., since that last tag, in our case FirstDeployToCDN).  The current state of committed files is represented in git by “HEAD”, so here’s the command I use to pull out the changes since the FirstDeployToCDN tag:

$ git archive --format tar -o cdn-diff-FirstDeployToCDN-`date +%Y-%m-%dT%H%M%Z`.tar HEAD `git diff FirstDeployToCDN --name-only`

So what’s this command all about?  Let’s break it down:

  • git archive --format tar: Ok, we’re creating a tar-format archive from git
  • -o cdn-diff-FirstDeployToCDN-`date +%Y-%m-%dT%H%M%Z`.tar: The output file will be called ‘cdn-diff-’ plus the name of our prior tag, plus the current timestamp including timezone, plus the ‘.tar’ extension.
  • HEAD: this says we’re pulling files out of the current latest commit in the current branch, i.e., the latest updates
  • `git diff FirstDeployToCDN --name-only`: this is a bit more command-line trickery, delimited with back-ticks (backwards apostrophes, ` not ‘).  This says, “Git, please give me a list of all changes between the current repository state (aka HEAD) and the set of files represented by the git tag FirstDeployToCDN”, that is, the changed and added files.  That list is, in turn, passed to the git archive command as a list of files to put in the archive.
  • So now you have a new file, called something like cdn-diff-FirstDeployToCDN-2010-08-28T1215EDT.tar.  It contains all the changed and added files since the FirstDeployToCDN tag, as of August 28 at 12:15 EDT.
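
One wrinkle worth sketching: git diff --name-only also lists files that were deleted since the tag, and git archive will generally refuse a path that no longer exists in HEAD. A hedged variant of the command filters the list with git diff's --diff-filter option. The function name archive_changes is hypothetical, and the sketch assumes file names contain no spaces.

```shell
# Hypothetical helper: archive files added or changed between a tag and HEAD.
# Usage: archive_changes TAG OUTPUT.tar
archive_changes() {
    tag="$1"
    out="$2"
    # --diff-filter=ACMR keeps Added, Copied, Modified and Renamed paths and
    # drops deleted ones, which are no longer present in HEAD.
    files="$(git diff --name-only --diff-filter=ACMR "$tag" HEAD)"
    if [ -z "$files" ]; then
        echo "No added or changed files since $tag" >&2
        return 1
    fi
    # Unquoted on purpose: the list is split on whitespace, so this sketch
    # assumes file names contain no spaces.
    git archive --format tar -o "$out" HEAD $files
}
```

Run from inside the image repository, archive_changes FirstDeployToCDN cdn-diff.tar would produce the incremental archive. Deleted images still have to be removed from cloud storage by hand, since a tar file cannot express a deletion.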

Final Notes

Of course this is a lot of typing, so I’ve created a script to handle most of this.  Please note that the script needs to be run from within your CDN repository, and assumes you have committed everything so that you want a diff between the specified tag and HEAD.  No warranty implied, as is, use at your own risk, YMMV, comments and suggestions welcome.

#!/bin/bash
# Creates a dated tar file archive containing all changes between HEAD and
# a specified git tag.
# If this code works, it is Copyright 2010 Carl Leubsdorf, Jr., ► Problem Solved.
# If not, author is unknown.

# Require the tag name as the first argument.
if [[ -n $1 ]]; then
    TAG=$1
else
    echo "usage: $0 TAG-NAME"
    exit 1
fi

# Build the archive name once so the echoed name matches the file created.
ARCHIVE="cdn-diff-$TAG-`date +%Y-%m-%dT%H%M%Z`.tar"

echo "Creating archive file for changes since tag $TAG"
echo "Archive Name: $ARCHIVE"

# List the changed files, then archive them from HEAD.
git diff $TAG --name-only
git archive --format tar -o "$ARCHIVE" HEAD `git diff $TAG --name-only`


BlackBerry 9700 + OS 6 = More to love

Some time ago, I posted on Tumblr about my love/hate relationship with my new BlackBerry 9700. I loved the phone (as a telephone — for actually making calls), the email, calendar, and contacts integration, and the form factor (fits into a pocket easily; durable).

I was not, however, enamored of two things, to wit:

The browser is just terrible. To be fixed in version 6.0 of the OS, but we shall see.
The Blackberry App Store experience is pretty horrendous.

I’ve recently installed several different leaked versions of BlackBerry OS 6 — the same OS that will come on the new BB 9780 as well as showing up on the new Torch touchscreen BB. OS 6 brings a tremendously-improved, WebKit based browser, as well as a snappier, smoother interface and updated applications.

My first few experiences with OS 6 were a bit flaky — SocialScope links didn’t work, for example, due to an OS bug with context menus — but even with that the other improvements made the 9700 feel like a brand new device. [Note: SocialScope is a fantastic Twitter/FaceBook/FourSquare client for BB. It’s still in beta but I have some invites available as of this writing if anyone’s interested.]

But with the most recent release, the context menu bug is gone, and the result is an even greater improvement: faster browsing, fewer random errors, and the memory leaks and battery drain issues of earlier releases seem to be gone.

I should caveat that updating your BB device OS, particularly with a leaked, unofficial version, is not for the faint of heart. It can brick the device. It can cause you to lose all of your data. Strange and horrible things may occur. So, it makes sense to take some precautions before trying an update:

  • Check out the forum postings on the update. You can read others’ experiences with the update and decide if the benefits outweigh the risks. The version I’m using has some bugs reported, but most posters say it is working fine and a distinct improvement over the 380 release.
  • Carefully review the information on updating an OS
  • Use AppLoader or Desktop Manager to back up everything ahead of the update
  • Download and install the new OS — scan it for malware of course! — using links from a somewhat trusted source such as CrackBerry
  • Be prepared to reinstall the OS in case your BB gets bricked or downgrade to a prior OS if you don’t like the new version.

So, what about those things I ‘hated’ about my BlackBerry? Well, the browser is fast, loads pages very well, and can zoom into text almost as well as the iOS browser. And the BB AppStore, while not nearly as slick as iTunes and the on-device iOS purchase experience, is relatively painless: one login and password to download and purchase apps, and a choice of carrier or PayPal billing.

All in all, I’m pretty satisfied with the 9700+OS 6 combo — enough to stick with the BB for a while. As an aside, I had a chance to look at and briefly use a BB Torch (9800). Nice big screen but a bit too thick and heavy compared with the 9700.

For comprehensive information on the latest unofficial (and official) BB OS releases, visit the CrackBerry OS Superpage.

KVS Availability Tool

The KVS Availability Tool is used to search for flight and seat availability by class, award/upgrade status, route, and date.  It is absolutely invaluable for finding award and upgrade inventory, just the thing to use up those extra frequent flyer miles.  We prefer the Platinum edition which grants access to almost all features for a 'contribution' of $15 for two months (or US$60/year).  The "Diamond" level adds round-the-world fares and routing rules.


We use Skype for instant messaging, cheap voice calls including very reasonable rates for international calls, and video conferencing.  I've long been a fan of face-to-face communication as I think you lose something from email-only interactions.  And while phone calls are a step up from text-only media such as email and IM, there's no replacement for the visual cues and context that video can provide. In-person meetings are better still, but the logistics and costs of meeting frequently in the same physical location can be prohibitive for geographically-distributed teams.

In the past, I've used Tandberg video units, which are easy to use, relatively simple to set up and function very well, but their cost is extremely high: list price almost USD 3k for the least expensive unit!  With Skype and a good camera like the Microsoft LifeCam HD Cinema (about $50 with noise-cancelling microphone and true 720p HD quality), you can have a great video-conferencing setup for less than 5% of the cost of the cheapest Tandberg.

WordPress – A great tool for building sites

This site is built using WordPress. For most companies, a corporate site can become a distraction from their core business — particularly for Internet companies that have a ‘real’ product to build. Using WordPress for content sites means no time spent coding and no wasting of valuable product and technology team focus. And the wide variety of themes, templates, and plugins for WordPress, with the rich ecosystem of developers and creative agencies that has sprung up to support them, make it a great choice for content sites of all types, sizes, and audiences.