
The Journey for Large Files on GitHub

Picture of me on the Iron Throne at GDC 2012

I am a big fan of video games, so I jumped at the chance to work the GitHub booth at the Game Developers Conference (GDC) in 2012. It was my first time representing the company to a community I’m an outsider to. I planned on spending my time explaining the merits of social coding, while sneaking out during breaks to check out some sessions about video game development.

However, the second most frequent question I was asked was, “How can I work with large files?” This got me wondering: is there something I can do to get these developers using GitHub?

As a programmer who builds web applications, it’s all too easy to take Git for granted. A Git repository tracks every change of every file for the lifetime of a project. I can go to any line of code and track down who made a specific change, when, and why. I can link it to a pull request, which contains the proposal, review, and tweaks around a change to the application.

However, not all developers are able to use Git because of technical and workflow limitations. Some projects work with large binary files: audio, 3D models, high-res graphics, and so on. Yet working with those files on GitHub was a pretty terrible user experience. If you attempted to push a file over 100MB, you got blocked with this error message:

remote: warning: Large files detected.  
remote: error: File big-file is 123.00 MB; this exceeds GitHub's file size limit of 100 MB

This means that you’ll have to run some gnarly terminal commands to scrub the file from your history before you can sync your work again. While there are legit technical reasons for the limit, that doesn’t take away from the fact that GitHub was getting in the way of their work.

Over the next three years, my team participated in a larger effort to research and build the Git Large File Storage (Git LFS) tool to solve this problem.

It turns out that GitHub’s own “Iron Man”, Scott Chacon, had already experimented with this in a project called Git Media. It uses internal hooks inside Git to intercept access to large files, storing them in various cloud storage services (like S3), instead of the Git repository. It’s an interesting prototype, but the game dev community was not using it. Surely we could easily release this as an official GitHub product. I told a colleague at GDC that we’d be back next year with something. Ha!

I pitched this idea at an internal mini summit for the GitHub Product team in late 2012, hoping to convince others to join me in building this. Unfortunately, there were too many important projects to tackle. I needed another way to convince the company that we should be pursuing this.

Around the same time, GitHub hired its first User Experience Researcher: Chrissie Brodigan. She gave a talk about what she does, and how she’s going to turn all of us into UX researchers. This sounded insane to me. I get all the feedback I need from emails and Twitter! So, I emailed this crazy woman to see if she could help with a new feature we were about to launch.

Interviews turned to actionable feedback, which resulted in some important adjustments to the feature before its launch. With the success of this small research project behind us, we turned towards a more ambitious question: how can we enable developers to work with large files on GitHub?

We interviewed a diverse set of candidates (according to industry, use case, and location) about their large file use. A common set of themes emerged, and were published to the rest of the company in OctoStudy 8. These interviews informed a list of suggestions and aspirations, which directly influenced the design of what would eventually become Git LFS.

Being involved in these two UXR reports turned out to be the best experience I’ve had at GitHub. I’m thrilled to see Chrissie’s growing team move on to more challenging and important studies for the company. Read more about that in her post, New Year, New User Journeys.

This UXR report was enough to get approval for the project. Scott Barron joined me on the new Storage Team in late 2013. Finally, it was time to get to work!

In addition to slinging code, we had to coordinate with other internal teams at GitHub. This was a unique product launch that touched many parts of the company. We set up a weekly video chat to keep interested parties in the loop. The attendees varied wildly from week to week, as each team’s involvement began and ended.

  • The Billing Team, in the midst of their own large refactoring, implemented the crucial bits to our payment system that we needed.
  • The Creative Team produced awesome graphics and videos to help promote the project.
  • The Communications Team produced this amazing ad for LFS.
  • The Docs Team keeps the Git LFS documentation updated on help.github.com.
  • The Legal and Outreach teams helped nail down the open source licensing, code of conduct, and CLA tracker.
  • The Marketing Team found a suitable place to launch, and helped with the external website and messaging.
  • The Native Team defined what Git LFS needs to integrate with GitHub Desktop and other similar Git tools.
  • The Security and Infrastructure teams reviewed the architecture of the backend systems, making sure we’re doing things responsibly.
  • The Support Team reviewed the product, looking for common support scenarios that we would have to handle.
  • The Training Team was another valuable source of user feedback, and produced a great video and presentation about Git LFS.

I also met Saeed Noursalehi, a PM for Visual Studio Online at Microsoft, through both our companies’ involvement in the libgit2 project. They too were concerned with the same large file problem, and provided extremely valuable feedback on our early ideas and API based on their own observations.

We announced the Git LFS open source project at the Git Merge conference in April 2015. Pitching a tool that challenges Git’s decentralized model to a room full of Git enthusiasts and experts was intimidating. Overall, it went pretty well. No one threw tomatoes or called me names.

Coincidentally, John Garcia from Atlassian announced Git LOB, their solution for large files in Git, immediately after my presentation. The core ideas behind the projects were very similar, but their version wasn’t yet ready for public consumption.

Like most launches though, this was just the beginning. While the client was released, we were keeping the actual server component behind an Early Access program. Our UXR team used this to collect valuable feedback from a controlled group of beta users.

For six months, we refined specs, pushed bug fixes, and even redesigned the server API. Most importantly, the newly open sourced project saw outside contributors for the first time.

Steve Streeting, lead developer of Git LOB at Atlassian, reached out to us soon after Git LFS launched. We both agreed it made sense to put our weight behind a single solution, instead of competing head to head, fragmenting the community, and duplicating even more work. What really impressed me was his willingness to jump in, on a competitor’s site especially, and make major improvements to Git LFS.

The project also saw contributions from Andy Neff and Stephen Gelman, who focused on packaging Git LFS for Linux. They started with scripts to build packages for their respective distros, which evolved into internal tools that let me build and test Linux packages for release from my MacBook Pro.

Then in October 2015, I shared a stage with Saeed and Steve at GitHub Universe to announce Git LFS v1.0, and its availability on GitHub.com. By the end of the year, Git LFS was supported by two other Git hosts, with support announced for a fourth soon.

Git LFS has launched; its Epstein drives are running. This year we will be using thrusters to adjust the project’s trajectory, making constant small tweaks: fixing bugs and edge cases as they are discovered, writing good documentation to help new users transition or start new projects with LFS, and documenting or automating the processes that run the project.

Most of all, I’m just excited about doing this in the open. I look forward to helping new and experienced people contribute to the project. Find us at the Git LFS repository or our chat room. If you’re interested in working on LFS or similar features with us at GitHub, let me know!

Izzy Kane, first human worthy of the Shiar Imperial Guard, boots up for the first time

(images taken from Avengers Vol. 5, #1 and #5 by Marvel Comics)

Go interfaces communicate intent

Interfaces are one of my favorite features of Go. When used properly in arguments, they tell you what a function is going to do with your object.

```go
// from io
func Copy(dst Writer, src Reader) (written int64, err error)
```

Right away, you know Copy() is going to call dst.Write() and src.Read().
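
For example, any concrete types that satisfy those interfaces will do. Here’s a quick sketch (my own example, not from the io docs) that copies from a strings.Reader into a bytes.Buffer:

```go
package main

import (
  "bytes"
  "fmt"
  "io"
  "strings"
)

func main() {
  var dst bytes.Buffer                // *bytes.Buffer implements io.Writer
  src := strings.NewReader("intent!") // implements io.Reader

  // Copy only asks for Read and Write, so that's all it will call.
  if _, err := io.Copy(&dst, src); err != nil {
    fmt.Println("copy failed:", err)
    return
  }
  fmt.Println(dst.String()) // prints "intent!"
}
```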

Interfaces in return types tell you what you can and should do with the object.

```go
// from os/exec
func (c *Cmd) StdoutPipe() (io.ReadCloser, error)
```

It’s unclear what type of object StdoutPipe() is returning, but I do know that I can read it. Since it also implements io.Closer, I know that I should probably close it somewhere.

This brings up a good rule of thumb when designing Go APIs. Prefer an io.Reader over an io.ReadCloser for arguments. Let the calling code handle its own resource cleanup. Simple enough. So what breaks this rule? Oh, my dumb passthrough package.

Here’s the intended way to use it:

```go
func main() {
  fakeResWriter := pseudoCodeForExample()
  res, _ := http.Get("SOMETHING")
  passthrough.Pass(res, fakeResWriter, 200)
}
```

However, on a first glance without any knowledge of how the passthrough package works, you may be inclined to close the body manually.

```go
func main() {
  fakeResWriter := pseudoCodeForExample()
  res, _ := http.Get("SOMETHING") // hopefully you're not ignoring this possible error :)

  // close body manually
  defer res.Body.Close()

  // passthrough also closes it???
  passthrough.Pass(res, fakeResWriter, 200)
}
```

Now, you’re closing the Body twice. That’s not great.

Resource management is very important, so we commonly review code to ensure everything is handled properly. Helper functions that try to do too much like passthrough have caused us enough issues that I’ve rethought how I design Go packages. Don’t get in the way of idiomatic Go code.
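
If I were writing passthrough again, the helper would take a plain io.Reader and leave cleanup to the caller. Here’s a minimal sketch of that shape (hypothetical code, not the actual passthrough API):

```go
package main

import (
  "io"
  "net/http"
)

// pass only asks for an io.Reader, so it never closes anything it doesn't own.
func pass(w http.ResponseWriter, body io.Reader, status int) error {
  w.WriteHeader(status)
  _, err := io.Copy(w, body)
  return err
}

func handler(w http.ResponseWriter, r *http.Request) {
  res, err := http.Get("https://example.com/something")
  if err != nil {
    http.Error(w, err.Error(), http.StatusBadGateway)
    return
  }
  // The caller owns the response body, so the caller closes it.
  defer res.Body.Close()

  pass(w, res.Body, http.StatusOK)
}

func main() {
  http.HandleFunc("/", handler)
  http.ListenAndServe(":8080", nil)
}
```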

Weather Lights

I spoke at the GitHub Patchwork event in Boulder last month. My son Nathan tagged along to get his first taste of the GitHub Flow. I don’t necessarily want him to be a programmer, but I do push him to learn a little to augment his interest in meteorology and astronomy.

The night was a success. He made it through the tutorial with only one complaint: the Patchwork credit went to my wife, who had created a GitHub login that night.

Since then, I’ve been looking for a project to continue his progress. I settled on a weather light, which consists of a Ruby script that changes the color of a Philips Hue bulb. If you’re already an experienced coder, jump straight to the source at github.com/technoweenie/weatherhue.

NOTE: This post has been updated to match the latest version of the weatherhue script. You can still read the original blog post if you really want.


Unfortunately, there’s one hefty requirement: You need a Philips Hue light kit, which consists of a Hue bridge and a few lights. Once you have the kit, you’ll have to use the Hue API to create a user and figure out the ID of your light.

Next, you need to set up an account for the Weather2 API. There are a lot of weather services out there, but this one is free, supports JSON responses, and also gives simple forecasts. They allow 500 requests a day. If you set this script to run every 5 minutes, you’ll only use 288 requests.

After you’re done, you should have five values. Write these down somewhere.

  • HUE_API - The address of your Hue bridge on your local network.
  • HUE_USER - The username you set up with the Hue API.
  • HUE_LIGHT - The ID of the Hue light. Probably 1-3.
  • WEATHER2_TOKEN - Your token for the Weather2 API.
  • WEATHER2_QUERY - The latitude and longitude of your house. For example, Pikes Peak is at “38.8417832,-105.0438213”.

Finally, you need Ruby with the faraday and dotenv gems. If you’re on a Ruby version lower than 1.9, you’ll also want the json gem.

Writing the script

I’m going to describe the process I used to write the weatherhue.rb script. Because of the way Ruby loads a script, that’s not necessarily the order the code appears in. If you look at the file, you’ll see four sections:

  1. Lines requiring ruby gems.
  2. A few defined helper functions.
  3. A list of temperatures and their HSL values.
  4. Running code that gets the temperature and sets the light.

You’ll likely find yourself bouncing around as you write the various sections.

Step 1: Get the temperature

The first thing the script needs is the temperature. There are two ways to get it: through an argument to the script (useful for testing), or from the Weather2 API. This snippet pulls the current temperature from the API’s forecast results.

```ruby
if temp = ARGV[0]
  # Get the temperature from the first argument.
  temp = temp.to_i
else
  # Get the temperature from the weather2 api
  url = "http://www.myweather2.com/developer/forecast.ashx?uac=#{ENV["WEATHER2_TOKEN"]}&temp_unit=f&output=json&query=#{ENV["WEATHER2_QUERY"]}"
  res = Faraday.get(url)
  if res.status != 200
    puts res.status
    puts res.body
    exit
  end

  data = JSON.parse(res.body)
  temp = data["weather"]["curren_weather"][0]["temp"].to_i
end
```

Step 2: Choose a color based on the temperature

I wanted the color to match color ranges on local news forecasts.

Our initial attempt used color math to calculate the color between set values in 5 degree increments. This required us to specify 25 colors between -20 and 100 degrees. When we did that, we noticed a pattern:

  1. The saturation and brightness values didn’t change much.
  2. The hue value started high and eventually went down to zero.

My son saw this, and suggested that we simply calculate the hue for a temperature, leaving the saturation and brightness values the same. So then I talked him through a simple algorithm based on some math concepts he’d learned.

First, we set an upper and lower bound that we wanted to track. We decided to track from -20 to 100. The Hue light takes values from 0 to 65535.

```ruby
HUE = {
  -20 => 60_000, # a deep purple
  100 => 0,      # bright red
}
```

The #hue_for_temp method finds the mapped temperatures on either side of the actual temperature, which gives it a hue range to work within. It then uses a ratio to pick the hue for the temperature within that range.

For example:

```ruby
temp = 50
full_range = 120        # 100 - -20
temp_range = 70         # 50 - -20
temp_perc = temp_range / full_range.to_f

full_hue_range = 60_000 # HUE[-20] - HUE[100]
hue_range = full_hue_range * temp_perc
min_hue = 60_000        # the hue mapped to the lower temperature bound
hue = min_hue - hue_range
```

Adding entries to the HUE map lets us pin hue values for any temperature we want, too. While checking the output colors, my son wanted 50 to be green for “hoodie weather.” That makes 60 a really light yellow and 70 orange, meaning we can leave the light jackets at home. This is the set of mapped temperatures that we ended up with:

```ruby
HUE = {
  -20 => 60_000,
  50  => 25_500,
  100 => 0,
}
```

Step 3: Set the light color

Now that we have the HSL values for the temperature, it’s time to set the Philips Hue light. First, create a state object for the light:

```ruby
state = {
  :on => true,
  :hue => hue_for_temp(temp),
  :sat => 255,
  :bri => 200,
  # performs a smooth transition to the new color for 1 second
  :transitiontime => 10,
}
```

A simple HTTP PUT call will change the color.

```ruby
hueapi = Faraday.new ENV["HUE_API"]
hueapi.put "/api/#{ENV["HUE_USER"]}/lights/#{ENV["HUE_LIGHT"]}/state", state.to_json
```

Step 4: Schedule the script

If you don’t want to set the environment variables each time, you can create a .env file in the root of the application.

```
WEATHER2_TOKEN=MONKEY
WEATHER2_QUERY=38.8417832,-105.0438213
HUE_API=
HUE_USER=technoweenie
HUE_LIGHT=1
```

You can then run the script with dotenv:

$ dotenv ruby weatherhue.rb 75

A crontab can be used to run this every 5 minutes. Run crontab -e to add a new entry:

# note: put tabs between the `*` values
*/5 * * * * cd /path/to/script; dotenv ruby weatherhue.rb

Confirm the crontab with crontab -l.

Bonus Round

  1. Update the script to use the forecast for the day, and not the current temperature.
  2. Set a schedule so the light is only on in the mornings, when you actually care what the temperature will be.

I hope you enjoyed this little tutorial. I’d love to hear about any experiences working with it! Send pictures or notes either to the GitHub issue for this post, or to my email address.

Key/value logs in Go

I shipped GitHub’s first user-facing Go app a month ago: the Releases API upload endpoint. It’s a really simple, low traffic service to dip our toes in the Go waters. Before I could even think about shipping it though, I had to answer these questions:

  • How can I deploy a Go app?
  • Will it be fast enough?
  • Will I have any visibility into it?

The first two questions are simple enough. I worked with some Ops people on getting Go support in our Boxen and Puppet recipes. Considering how much time this app would spend in network requests, I knew that raw execution speed wasn’t going to be a factor. To help answer question 3, I wrote grohl, a combination logging, error reporting, and metrics library.

import "github.com/technoweenie/grohl"

A few months ago, we started using the scrolls Ruby gem for logging on GitHub.com. It’s a simple logger that writes out key/value logs:

app=myapp deploy=production fn=trap signal=TERM at=exit status=0

Logs are then indexed, giving us the ability to search them for the first time. The next thing we did was add a unique X-GitHub-Request-Id header to every API request. That same id is passed down to internal systems, exception reporters, and auditors, so we can use it to trace user problems across the entire system.
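
The same tracing trick works in any Go HTTP service. Here’s a rough sketch (not GitHub’s actual middleware) of a wrapper that reuses an incoming request id or generates one, so log lines and downstream calls can share it:

```go
package middleware

import (
  "crypto/rand"
  "encoding/hex"
  "net/http"
)

const requestIDHeader = "X-GitHub-Request-Id"

// WithRequestID reuses the incoming request id, or generates a new one,
// and echoes it back on the response so callers can report it.
func WithRequestID(next http.Handler) http.Handler {
  return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
    id := r.Header.Get(requestIDHeader)
    if id == "" {
      buf := make([]byte, 16)
      rand.Read(buf) // crypto/rand; errors are exceedingly rare
      id = hex.EncodeToString(buf)
      r.Header.Set(requestIDHeader, id)
    }
    w.Header().Set(requestIDHeader, id)
    next.ServeHTTP(w, r)
  })
}
```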

I knew my Go app had to be tied into the same systems to give me visibility: our exception tracker, statsd to record metrics into Graphite, and our log index. I wrote grohl to be the single source of truth for the app. Its default behavior is to just log everything, with the expectation that something downstream will process the lines. Relevant lines are indexed, metrics are graphed, and exceptions are reported.

At GitHub, we’re not quite there yet. So, grohl exposes both an error reporting interface, and a statter interface (designed to work with g2s). Maybe you want to push metrics directly to statsd, or you want to push errors to a custom HTTP endpoint. It’s also nice that I can double check my app’s metrics and error reporting without having to spin up external services. They just show up in the development log like anything else.
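
To make the log format itself concrete, here’s a tiny sketch of a key=value formatter — a hypothetical helper for illustration, not grohl’s actual API:

```go
package main

import (
  "fmt"
  "sort"
  "strings"
)

// logfmt renders a map as a single key=value line, roughly the style that
// scrolls and grohl emit. Hypothetical helper, for illustration only.
func logfmt(data map[string]interface{}) string {
  keys := make([]string, 0, len(data))
  for k := range data {
    keys = append(keys, k)
  }
  sort.Strings(keys) // deterministic output for the example

  pairs := make([]string, 0, len(keys))
  for _, k := range keys {
    pairs = append(pairs, fmt.Sprintf("%s=%v", k, data[k]))
  }
  return strings.Join(pairs, " ")
}

func main() {
  fmt.Println(logfmt(map[string]interface{}{
    "app":    "myapp",
    "deploy": "production",
    "fn":     "trap",
    "signal": "TERM",
    "at":     "exit",
    "status": 0,
  }))
  // prints: app=myapp at=exit deploy=production fn=trap signal=TERM status=0
}
```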

Comments are on reddit.

One HTTP Handler to rule them all

Justinas Stankevičius wrote a post about writing HTTP middleware in Go. Having seen how Rack changed the Ruby web framework landscape, I’m glad Go has simple HTTP server interfaces baked in.

GitHub itself runs as a set of about 15 Rack middleware (depending on the exact environment that it boots in). They are set up in a nice declarative format:

```ruby
# GitHub app middleware pipeline
use InvalidCookieDropper
use Rack::ContentTypeCleaner
use Rails::Rack::Static unless %w[staging production].include?(Rails.env)

# Enable Rack middleware for capturing (or generating) request id's
use Rack::RequestId
```

However, Rack actually assembles these middleware by nesting each one inside the next, so every request runs through a deep nested call stack that gets exposed in any stack traces:

```
lib/rack/request_id.rb:20:in `call'
lib/rack/content_type_cleaner.rb:11:in `call'
lib/rack/invalid_cookie_dropper.rb:24:in `call'
lib/github/timer.rb:47:in `block in call'
```

go-httppipe uses an approach that simply loops through a slice of http.Handler objects, and returns after one of them calls WriteHeader().

```go
// the handler names here are hypothetical; httppipe calls each handler
// in order until one of them writes a response
pipe := httppipe.New(
  requestIDHandler,
  invalidCookieDropper,
  appHandler,
)

http.Handle("/", pipe)
```
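
The core idea is small enough to sketch. This isn’t go-httppipe’s actual code, just an illustration of the approach: wrap the ResponseWriter so the pipeline can tell when a handler has responded.

```go
package pipeline

import "net/http"

// recorder wraps a ResponseWriter and remembers whether a handler has
// started writing a response.
type recorder struct {
  http.ResponseWriter
  wrote bool
}

func (r *recorder) WriteHeader(status int) {
  r.wrote = true
  r.ResponseWriter.WriteHeader(status)
}

func (r *recorder) Write(b []byte) (int, error) {
  r.wrote = true
  return r.ResponseWriter.Write(b)
}

// Pipe runs each handler in order, stopping as soon as one responds.
type Pipe struct {
  Handlers []http.Handler
}

func (p *Pipe) ServeHTTP(w http.ResponseWriter, r *http.Request) {
  rec := &recorder{ResponseWriter: w}
  for _, h := range p.Handlers {
    h.ServeHTTP(rec, r)
    if rec.wrote {
      return
    }
  }
  http.NotFound(w, r)
}
```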

This is how http.StripPrefix currently wraps another handler:

```go
func StripPrefix(prefix string, h Handler) Handler {
  if prefix == "" {
    return h
  }
  return HandlerFunc(func(w ResponseWriter, r *Request) {
    if p := strings.TrimPrefix(r.URL.Path, prefix); len(p) < len(r.URL.Path) {
      r.URL.Path = p
      h.ServeHTTP(w, r)
    } else {
      NotFound(w, r)
    }
  })
}
```

It could be rewritten like this:

```go
type StripPrefixHandler struct {
  Prefix string
}

func (h *StripPrefixHandler) ServeHTTP(w ResponseWriter, r *Request) {
  if h.Prefix == "" {
    return
  }

  p := strings.TrimPrefix(r.URL.Path, h.Prefix)
  if len(p) < len(r.URL.Path) {
    r.URL.Path = p
  } else {
    NotFound(w, r)
  }
}

func StripPrefix(prefix string) Handler {
  return &StripPrefixHandler{prefix}
}
```

Notice that we don’t have to worry about passing the response writer and request to the inner handler anymore.

Embedding Structs in Go

I’ve been toying with Go off and on for the last few months. I’m finally at a point where I’m using it in a real project at GitHub, so I’ve been exploring it in more detail. Yesterday I saw some duplicated code that could benefit from class inheritance. This isn’t Ruby, so I eventually figured out that Go calls this “embedding.” This is something I missed from my first run through the Effective Go book.

Let’s start with a basic struct that serves as the super class.

```go
type SuperStruct struct {
  PublicField  string
  privateField string
}

func (s *SuperStruct) Foo() {
  fmt.Println(s.PublicField, s.privateField)
}
```

It’s easy to tell what Foo() will do:

```go
func main() {
  sup := &SuperStruct{"public", "private"}
  // prints "public private\n"
  sup.Foo()
}
```

What happens when we embed SuperStruct into SubStruct?

```go
type SubStruct struct {
  CustomField string
  // Notice that we don't bother naming the embedded struct field.
  *SuperStruct
}
```

At this point, SuperStruct’s two fields (PublicField and privateField) and method (Foo()) are available in SubStruct. SubStruct is initialized a little differently though.

```go
func main() {
  sup := &SuperStruct{"public", "private"}
  sub := &SubStruct{"custom", sup}

  // you can also initialize with specific field names:
  sub = &SubStruct{CustomField: "custom", SuperStruct: sup}
  _ = sub // unused in this snippet; keeps the example compiling
}
```

From here, we can access the SuperStruct fields and methods as if they were defined in SubStruct.

```go
func main() {
  sup := &SuperStruct{"public", "private"}
  sub := &SubStruct{"custom", sup}
  // prints "public private\n"
  sub.Foo()
}
```

We can also access the inner SuperStruct if needed. You’d normally do this if you wanted to override a behavior of an embedded method.

```go
func (s *SubStruct) Foo() {
  fmt.Println(s.CustomField, s.PublicField)
}

func main() {
  sup := &SuperStruct{"public", "private"}
  sub := &SubStruct{"custom", sup}
  // prints "custom public\n"
  sub.Foo()
}
```
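
One last wrinkle, sketched as a variation on the override above (my own addition, building on the structs already defined): the embedded struct stays reachable through its type name, so an override can still delegate to the original method.

```go
func (s *SubStruct) Foo() {
  fmt.Println(s.CustomField)
  // The embedded field is addressable by its type name, so the override
  // can still call SuperStruct's implementation.
  s.SuperStruct.Foo()
}
```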