02 Jan 2016
I am a big fan of video games, so I jumped at the chance to work the GitHub
booth at the Game Developers Conference (GDC) in 2012. It was my first time
representing the company to a community where I'm an outsider. I planned on
spending my time explaining the merits of social coding, while sneaking out
during breaks to check out some sessions about video game development.
However, the second most frequent question I was asked was, “how can I work with
large files?”. This got me wondering: is there something that I can do to get
these developers using GitHub?
As a programmer who builds web applications, it's all too easy to take Git for
granted. A Git repository tracks every change of every file for the lifetime
of a project. I can go to any line of code and track down who made a specific
change, when, and why. I can link it to a pull request, which contains the
proposal, review, and tweaks around a change to the application.
However, not all developers are able to use Git because of technical and
workflow limitations. Some projects work with large binary files, such as audio,
3D models, and high-res graphics. Yet working with those files on GitHub was a
pretty terrible user experience. If you attempt to push a file over 100 MB, you
get blocked with this error message:
```
remote: warning: Large files detected.
remote: error: File big-file is 123.00 MB; this exceeds GitHub's file size limit of 100 MB
```
This means that you’ll have to run some gnarly terminal commands to fix something
before being able to sync your work. While there are legit technical reasons for
this, it doesn’t take away from the fact that GitHub was getting in the way of
their work.
Over the next three years, my team participated in a larger effort to research
and build the Git Large File Storage (Git LFS) tool to solve this problem.

It turns out that GitHub’s own “Iron Man”, Scott Chacon, had already
experimented with this in a project called Git Media. It used internal hooks
inside Git to intercept access to large files, storing them in various cloud
storage services (like S3) instead of in the Git repository. It was an interesting
prototype, but the game dev community was not using it. Surely we could easily
release this as an official GitHub product. I told a colleague at GDC that we’d
be back next year with something. Ha!
I pitched this idea at an internal mini summit for the GitHub Product team in
late 2012, hoping to convince others to join me in building this. Unfortunately,
there were too many important projects to tackle. I needed another way to
convince the company that we should be pursuing this.
Around the same time, GitHub hired its first User Experience Researcher: Chrissie
Brodigan. She gave a talk about what she does, and how she’s going to turn all
of us into UX researchers. This sounded insane to me. I get all the
feedback I need from emails and Twitter! So, I emailed this crazy woman to see
if she could help with a new feature we were about to launch.
Interviews turned to actionable feedback, which resulted in some important adjustments to the feature before its launch. With the success of this small
research project behind us, we turned towards a more ambitious question: how can
we enable developers to work with large files on GitHub?
We interviewed a diverse set of candidates (according to industry, use case, and
location) about their large file use. A common set of themes emerged and was
published to the rest of the company in OctoStudy 8. These interviews informed
a list of suggestions and aspirations, which directly influenced the design of
what would eventually become Git LFS.
Being involved in these two UXR reports turned out to be the best experience
I’ve had at GitHub. I’m thrilled to see Chrissie’s growing team move on to more
challenging and important studies for the company. Read more about that in her
post, New Year, New User Journeys.

This UXR report was enough to get approval for the project. Scott Barron joined
me on the new Storage Team in late 2013. Finally, it was time to get to work!
In addition to slinging code, we had to coordinate with other internal teams at
GitHub. This was a unique product launch that touched so many parts of the
company. We set up a weekly video chat to keep interested parties in the loop.
The attendees of each meeting varied wildly, as a team’s involvement would begin
and end.
- The Billing Team, in the midst of their own large refactoring, implemented
the crucial bits to our payment system that we needed.
- The Creative Team produced awesome graphics and videos to help promote the
project.
- The Communications Team produced this amazing ad for LFS.
- The Docs Team kept the Git LFS documentation updated on help.github.com.
- The Legal and Outreach teams helped nail down the open source licensing, code
of conduct, and CLA tracker.
- The Marketing Team found a suitable place to launch, and helped with the
external website and messaging.
- The Native Team defined what Git LFS needs to integrate with GitHub Desktop
and other similar Git tools.
- The Security and Infrastructure teams reviewed the architecture of the backend
systems, making sure we were doing things responsibly.
- The Support Team reviewed the product, looking for common support scenarios
that we would have to handle.
- The Training Team was another valuable source of user feedback, and produced
a great video and presentation about Git LFS.
I also met Saeed Noursalehi, a PM for Visual Studio Online at Microsoft, through
both our companies’ involvement in the libgit2
project. They too were concerned with the same large file problem, and provided
extremely valuable feedback on our early ideas and API based on their own
observations.

We announced the Git LFS open source project at the Git Merge conference in
April 2015. Pitching a tool that challenges Git’s decentralized model to a room
full of Git enthusiasts and experts was intimidating. Overall, it went pretty
well. No one threw tomatoes or called me names.
Coincidentally, John Garcia from Atlassian announced Git LOB, their solution
for large files in Git, immediately after my presentation. The core ideas behind
the projects were very similar, but their version wasn’t yet ready for public
consumption.
Like most launches though, this was just the beginning. While the client was
released, we were keeping the actual server component behind an Early Access
program. Our UXR team used this to collect valuable feedback from a controlled
group of beta users.
For six months, we refined specs, pushed bug fixes, and even redesigned the
server API. Most importantly, the newly open sourced project saw outside
contributors for the first time.
Steve Streeting, lead developer of Git LOB at Atlassian, reached out to us soon
after Git LFS launched. We both agreed it made sense to put our weight behind a
single solution, instead of competing head to head, fragmenting the community,
and duplicating even more work. What really impressed me was his willingness to
jump in, on a competitor's platform no less, and make major improvements to Git
LFS.
The project also saw contributions from Andy Neff and Stephen Gelman, who focused
on packaging Git LFS for Linux. They started with scripts to build packages for
their respective distros, which evolved into internal tools for building and
testing Linux packages for each release, right from my MacBook Pro.
Then in October 2015, I shared a stage with Saeed and Steve at GitHub Universe to
announce Git LFS v1.0, and its availability on GitHub.com. By the end of the
year, Git LFS was supported by two other Git hosts, with support announced
for a fourth soon.

Git LFS has launched; its Epstein drives are running. This year we will be using
thrusters to adjust the project's trajectory, making constant small tweaks:
fixing bugs and edge cases as they are discovered, writing good documentation
to help new users transition to or start new projects with LFS, and documenting
or automating the processes that run the project.
Most of all, I’m just excited about doing this in the open. I look forward
to helping new and experienced people contribute to the project. Find us at the
Git LFS repository or our chat
room. If you’re interested in working on LFS
or similar features with us at GitHub, let me know!

(images taken from Avengers Vol. 5, #1 and #5 by Marvel Comics)
04 Sep 2014
Interfaces are one of my favorite features of Go. When used properly in
arguments, they tell you what a function is going to do with your object.
```go
// from io
func Copy(dst Writer, src Reader) (written int64, err error)
```
Right away, you know `Copy()` is going to call `dst.Write()` and `src.Read()`.
Interfaces in return types tell you what you can and should do with the object.
```go
// from os/exec
func (c *Cmd) StdoutPipe() (io.ReadCloser, error)
```
It's unclear what type of object `StdoutPipe()` is returning, but I do know that
I can read it. Since it also implements `io.Closer`, I know that I should
probably close it somewhere.
This brings up a good rule of thumb when designing Go APIs. Prefer an
`io.Reader` over an `io.ReadCloser` for arguments. Let the calling code handle
its own resource cleanup. Simple enough. So what breaks this rule? Oh, my
dumb `passthrough` package.
Here’s the intended way to use it:
```go
func main() {
	fakeResWriter := pseudoCodeForExample()
	res, _ := http.Get("SOMETHING")
	passthrough.Pass(res, fakeResWriter, 200)
}
```
However, at first glance, without any knowledge of how the `passthrough`
package works, you may be inclined to close the body manually.
```go
func main() {
	fakeResWriter := pseudoCodeForExample()
	res, _ := http.Get("SOMETHING")
	// hopefully you're not ignoring this possible error :)

	// close body manually
	defer res.Body.Close()

	// passthrough also closes it???
	passthrough.Pass(res, fakeResWriter, 200)
}
```
Now, you’re closing the Body twice. That’s not great.
Resource management is very important, so we commonly review code to ensure
everything is handled properly. Helper functions that try to do too much, like
`passthrough`, have caused us enough issues that I've rethought how I design
Go packages. Don't get in the way of idiomatic Go code.
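For comparison, here's a minimal sketch of the kind of API that stays out of the
way. The `Pass` signature below is hypothetical, not the real passthrough
package; the point is that accepting an `io.Reader` leaves the `Close()` call
with the code that opened the body.

```go
package main

import (
	"io"
	"net/http"
)

// Pass is a hypothetical reader-based variant: it only reads from body, so the
// caller keeps ownership of the response body and its cleanup.
func Pass(w http.ResponseWriter, body io.Reader, status int) error {
	w.WriteHeader(status)
	_, err := io.Copy(w, body)
	return err
}

func handler(w http.ResponseWriter, r *http.Request) {
	res, err := http.Get("SOMETHING")
	if err != nil {
		http.Error(w, err.Error(), http.StatusBadGateway)
		return
	}
	// No ambiguity: we opened the body, so we close it.
	defer res.Body.Close()

	Pass(w, res.Body, http.StatusOK)
}

func main() {
	http.HandleFunc("/", handler)
	http.ListenAndServe(":8080", nil)
}
```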
01 Sep 2014
I spoke at the GitHub Patchwork event in Boulder last month. My son Nathan
tagged along to get his first taste of the GitHub
Flow. I don’t
necessarily want him to be a programmer, but I do push him to learn a little to
augment his interest in meteorology and astronomy.
The night was a success. He made it through the tutorial with only one complaint:
the Patchwork credit went to my wife, who
had created a GitHub login that night.
Since then, I’ve been looking for a project to continue his progress. I settled
on a weather light, which consists of a ruby script that changes the color of a
Philips Hue bulb. If you’re already an experienced coder,
jump straight to the source at github.com/technoweenie/weatherhue.

NOTE: This post has been updated to match the latest version of the weatherhue
script. You can still read the original blog post
if you really want.
Requirements
Unfortunately, there’s one hefty requirement: You need a Philips Hue light kit,
which consists of a Hue bridge and a few lights. Once you have the kit, you’ll
have to use the Hue API to
create a user and figure
out the ID of your light.
Next, you need to set up an account for the Weather2 API.
There are a lot of services out there, but this one is free, supports JSON
responses, and also gives simple forecasts. They allow 500 requests a day. If
you set this script to run every 5 minutes, you’ll only use 288 requests.
After you’re done, you should have five values. Write these down somewhere.
- `HUE_API` - The address of your Hue bridge. Probably something like "http://10.0.0.1".
- `HUE_USER` - The username you set up with the Hue API.
- `HUE_LIGHT` - The ID of the Hue light. Probably 1-3.
- `WEATHER2_TOKEN` - Your token for the Weather2 API.
- `WEATHER2_QUERY` - The latitude and longitude of your house. For example,
  Pikes Peak is at "38.8417832,-105.0438213".
Finally, you need ruby, with the following gems: `faraday` and `dotenv`.
If you're on a version of ruby lower than 1.9, you'll also want the `json` gem.
Writing the script
I'm going to describe the process I used to write the `weatherhue.rb` script.
Due to the way ruby runs, the process isn't necessarily in the same order the
code is written. If you look at the file, you'll see 4 sections:
- Lines requiring ruby gems.
- A few defined helper functions.
- A list of temperatures and their HSL values.
- Running code that gets the temperature and sets the light.
You’ll likely find yourself bouncing around as you write the various sections.
Step 1: Get the temperature
The first thing the script needs is the temperature. There are two ways to get
it: through an argument in the script (useful for testing), or a Weather API.
This is a simple script that pulls the current temperature from the API forecast
results.
```ruby
if temp = ARGV[0]
  # Get the temperature from the first argument.
  temp = temp.to_i
else
  # Get the temperature from the weather2 api
  url = "http://www.myweather2.com/developer/forecast.ashx?uac=#{ENV["WEATHER2_TOKEN"]}&temp_unit=f&output=json&query=#{ENV["WEATHER2_QUERY"]}"
  res = Faraday.get(url)
  if res.status != 200
    puts res.status
    puts res.body
    exit
  end

  data = JSON.parse(res.body)
  temp = data["weather"]["curren_weather"][0]["temp"].to_i
end
```
Step 2: Choose a color based on the temperature
I wanted the color to match color ranges on local news forecasts.

Our initial attempt used color math to calculate the color between set values
in 5-degree increments. This required us to specify 25 colors between -20 and
100 degrees. When we did that, we noticed a pattern:
- The saturation and brightness values didn’t change much.
- The hue value started high and eventually went down to zero.
My son saw this, and suggested that we simply calculate the hue for a
temperature, leaving the saturation and brightness values the same. So then
I talked him through a simple algorithm based on some math concepts he’d
learned.
First, we set an upper and lower bound that we wanted to track. We decided to
track from -20 to 100. The Hue light takes values from 0 to 65535.
```ruby
HUE = {
  -20 => 60_000, # a deep purple
  100 => 0,      # bright red
}
```
The `#hue_for_temp` method finds the color range starting at the highest mapped
temperature below the actual temperature. It then uses a ratio to get the hue
within that range of hues.
For example:
```ruby
temp = 40
min_hue = 60_000 # HUE[-20], the hue at the lower bound
full_range = 120 # 100 - -20
temp_range = 60  # 40 - -20
temp_perc = temp_range / full_range.to_f

full_hue_range = 60_000 # HUE[-20] - HUE[100]
hue_range = full_hue_range * temp_perc
hue = min_hue - hue_range # => 30_000
```
The `#hue_for_temp` method lets us set hue values for any temperature we want,
too. While checking the output colors, my son wanted to set 50 to green for
"hoodie weather." This means that 60 is a really light yellow. 70 is orange,
meaning we can leave off any light jackets. This is the set of mapped
temperatures that we ended with:
```ruby
HUE = {
  -20 => 60_000,
  50  => 25_500,
  100 => 0,
}
```
Step 3: Set the light color
Now that we have the HSL values for the temperature, it's time to set the
Philips Hue light. First, create a state object for the light:
```ruby
state = {
  :on => true,
  :hue => hue_for_temp(temp),
  :sat => 255,
  :bri => 200,
  # performs a smooth transition to the new color for 1 second
  :transitiontime => 10,
}
```
A simple HTTP PUT call will change the color.
```ruby
hueapi = Faraday.new ENV["HUE_API"]
hueapi.put "/api/#{ENV["HUE_USER"]}/lights/#{ENV["HUE_LIGHT"]}/state", state.to_json
```
Step 4: Schedule the script
If you don't want to set the environment variables each time, you can create a
`.env` file in the root of the application:
```
WEATHER2_TOKEN=MONKEY
WEATHER2_QUERY=38.8417832,-105.0438213
HUE_API=http://192.168.1.50
HUE_USER=technoweenie
HUE_LIGHT=1
```
You can then run the script with dotenv:
```
$ dotenv ruby weatherhue.rb 75
```
A crontab can be used to run this every 5 minutes. Run `crontab -e` to add
a new entry:

```
# note: put tabs between the `*` values
*/5 * * * * cd /path/to/script; dotenv ruby weatherhue.rb
```

Confirm the crontab with `crontab -l`.
Bonus Round
- Update the script to use the forecast for the day, and not the current
temperature.
- Set a schedule that only keeps the light on in the mornings, when you
actually care what the temperature will be.
I hope you enjoyed this little tutorial. I’d love to hear any experiences from
working with it! Send me pictures or emails either to the GitHub issue for
this post,
or my email address.
02 Nov 2013
I shipped GitHub’s first user-facing Go app a month ago: the Releases API
upload endpoint. It’s a really simple, low traffic service to dip
our toes in the Go waters. Before I could even think about shipping it though,
I had to answer these questions:
- How can I deploy a Go app?
- Will it be fast enough?
- Will I have any visibility into it?
The first two questions are simple enough. I worked with some Ops people on
getting Go support in our Boxen and Puppet recipes. Considering how much time
this app would spend in network requests, I knew that raw execution speed wasn’t
going to be a factor. To help answer question 3, I wrote grohl, a
combination logging, error reporting, and metrics library.
import "github.com/technoweenie/grohl"
A few months ago, we started using the scrolls Ruby gem for logging on
GitHub.com. It’s a simple logger that writes out key/value logs:
```
app=myapp deploy=production fn=trap signal=TERM at=exit status=0
```
Logs are then indexed, giving us the ability to search logs for the first time.
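To make the format concrete, here's a tiny sketch of producing those key/value
lines from Go. It only illustrates the shape of the output; it isn't grohl's
actual API.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// logfmt renders a map as "key=value" pairs, the same shape as the log line above.
func logfmt(data map[string]string) string {
	keys := make([]string, 0, len(data))
	for k := range data {
		keys = append(keys, k)
	}
	sort.Strings(keys) // stable ordering keeps the lines easy to index and diff

	parts := make([]string, 0, len(keys))
	for _, k := range keys {
		parts = append(parts, fmt.Sprintf("%s=%s", k, data[k]))
	}
	return strings.Join(parts, " ")
}

func main() {
	fmt.Println(logfmt(map[string]string{
		"app": "myapp", "deploy": "production", "fn": "trap",
		"signal": "TERM", "at": "exit", "status": "0",
	}))
	// app=myapp at=exit deploy=production fn=trap signal=TERM status=0
}
```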
The next thing we did was add a unique `X-GitHub-Request-Id` header to every
API request. This same request id is sent down to internal systems, exception
reporters, and auditors. We can use it to trace user problems across the
entire system.
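As a rough sketch of that idea (not GitHub's actual implementation; only the
header name comes from above), a small piece of middleware can reuse or
generate the id and make it available to logs and downstream calls:

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"log"
	"net/http"
)

// withRequestID is a hypothetical wrapper: it reuses an incoming
// X-GitHub-Request-Id or generates one, then surfaces it to logs and to
// whatever the wrapped handler calls next.
func withRequestID(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		id := r.Header.Get("X-GitHub-Request-Id")
		if id == "" {
			buf := make([]byte, 16)
			rand.Read(buf) // error ignored for brevity in this sketch
			id = hex.EncodeToString(buf)
		}
		r.Header.Set("X-GitHub-Request-Id", id)
		w.Header().Set("X-GitHub-Request-Id", id)
		log.Printf("request_id=%s method=%s path=%s", id, r.Method, r.URL.Path)
		next.ServeHTTP(w, r)
	})
}

func main() {
	ok := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.Write([]byte("ok"))
	})
	http.ListenAndServe(":8080", withRequestID(ok))
}
```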
I knew my Go app had to be tied into the same systems to give me visibility:
our exception tracker, statsd to record metrics into Graphite, and
our log index. I wrote grohl to be the single source of truth for the app. Its
default behavior is to just log everything, with the expectation that something
would process them. Relevant lines are indexed, metrics are graphed, and
exceptions are reported.
At GitHub, we’re not quite there yet. So, grohl exposes both an error reporting
interface, and a statter interface (designed to work with g2s).
Maybe you want to push metrics directly to statsd, or you want to push errors
to a custom HTTP endpoint. It’s also nice that I can double check
my app’s metrics and error reporting without having to spin up external services.
They just show up in the development log like anything else.
Comments are on reddit.
21 Oct 2013
Justinas Stankevičius wrote a post about writing HTTP middleware
in Go. Having seen how Rack changed the Ruby web framework landscape, I’m glad
Go has simple HTTP server interfaces baked in.
GitHub itself runs as a set of about 15 Rack middleware (depending on the exact
environment that it boots in). They're set up in a nice declarative format:
```ruby
# GitHub app middleware pipeline
use InvalidCookieDropper
use Rack::ContentTypeCleaner
use Rails::Rack::Static unless %w[staging production].include?(Rails.env)

# Enable Rack middleware for capturing (or generating) request id's
use Rack::RequestId
```
However, Rack actually assembles the objects like this:
```ruby
InvalidCookieDropper.new(
  Rack::ContentTypeCleaner.new(
    Rack::RequestId.new(app)
  )
)
```
This wraps every request in a nested call stack, which gets exposed in any
stack traces:
```
lib/rack/request_id.rb:20:in `call'
lib/rack/content_type_cleaner.rb:11:in `call'
lib/rack/invalid_cookie_dropper.rb:24:in `call'
lib/github/timer.rb:47:in `block in call'
```
go-httppipe uses an approach that simply loops through a slice of `http.Handler`
objects, and returns after one of them calls `WriteHeader()`.
```go
pipe := httppipe.New(
	invalidcookiedropper.New(),
	contenttypecleaner.New(),
	requestid.New(),
	myapp.New(),
)

http.Handle("/", pipe)
```
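For context, here's a rough sketch of how that kind of pipe can be built. It
isn't go-httppipe's actual source, just an illustration of checking for
`WriteHeader()`:

```go
package main

import "net/http"

// trackingWriter records whether a handler has started a response.
type trackingWriter struct {
	http.ResponseWriter
	wrote bool
}

func (w *trackingWriter) WriteHeader(status int) {
	w.wrote = true
	w.ResponseWriter.WriteHeader(status)
}

func (w *trackingWriter) Write(b []byte) (int, error) {
	w.wrote = true // a Write without WriteHeader implies a 200, so it also ends the pipe
	return w.ResponseWriter.Write(b)
}

// pipe runs each handler in order until one of them writes a response.
type pipe struct {
	handlers []http.Handler
}

func (p *pipe) ServeHTTP(w http.ResponseWriter, r *http.Request) {
	tw := &trackingWriter{ResponseWriter: w}
	for _, h := range p.handlers {
		h.ServeHTTP(tw, r)
		if tw.wrote {
			return
		}
	}
}

func main() {
	// The handlers here would be real middleware; this just wires up the pipe.
	http.Handle("/", &pipe{handlers: []http.Handler{http.NotFoundHandler()}})
	http.ListenAndServe(":8080", nil)
}
```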
This is how `http.StripPrefix` currently wraps another handler:
```go
func StripPrefix(prefix string, h Handler) Handler {
	if prefix == "" {
		return h
	}
	return HandlerFunc(func(w ResponseWriter, r *Request) {
		if p := strings.TrimPrefix(r.URL.Path, prefix); len(p) < len(r.URL.Path) {
			r.URL.Path = p
			h.ServeHTTP(w, r)
		} else {
			NotFound(w, r)
		}
	})
}
```
It could be rewritten like this:
```go
type StripPrefixHandler struct {
	Prefix string
}

func (h *StripPrefixHandler) ServeHTTP(w ResponseWriter, r *Request) {
	if h.Prefix == "" {
		return
	}

	p := strings.TrimPrefix(r.URL.Path, h.Prefix)
	if len(p) < len(r.URL.Path) {
		r.URL.Path = p
	} else {
		NotFound(w, r)
	}
}

func StripPrefix(prefix string) Handler {
	return &StripPrefixHandler{prefix}
}
```
Notice that we don’t have to worry about passing the response writer and request
to the inner handler anymore.
29 Aug 2013
I’ve been toying with Go off and on for the last few months. I’m finally at a
point where I’m using it in a real project at GitHub, so I’ve been exploring it
in more detail. Yesterday I saw some duplicated code that could benefit from
class inheritance. This isn’t Ruby, so I eventually figured out that Go calls
this “embedding.” This is something I missed from my first run through the
Effective Go book.
Let’s start with a basic struct that serves as the super class.
```go
type SuperStruct struct {
	PublicField  string
	privateField string
}

func (s *SuperStruct) Foo() {
	fmt.Println(s.PublicField, s.privateField)
}
```
It's easy to tell what `Foo()` will do:
```go
func main() {
	sup := &SuperStruct{"public", "private"}
	sup.Foo()
	// prints "public private\n"
}
```
What happens when we embed `SuperStruct` into `SubStruct`?
```go
type SubStruct struct {
	CustomField string

	// Notice that we don't bother naming the embedded struct field.
	*SuperStruct
}
```
At this point, `SuperStruct`'s two fields (`PublicField` and `privateField`) and
method (`Foo()`) are available in `SubStruct`. `SubStruct` is initialized a
little differently though.
```go
func main() {
	sup := &SuperStruct{"public", "private"}
	sub := &SubStruct{"custom", sup}

	// you can also initialize with specific field names:
	sub = &SubStruct{CustomField: "custom", SuperStruct: sup}
	_ = sub // only needed to keep this snippet compiling
}
```
From here, we can access the `SuperStruct` fields and methods as if they were
defined in `SubStruct`.
```go
func main() {
	sup := &SuperStruct{"public", "private"}
	sub := &SubStruct{"custom", sup}
	sub.Foo()
	// prints "public private\n"
}
```
We can also access the inner `SuperStruct` if needed. You'd normally do this
if you wanted to override a behavior of an embedded method.
```go
func (s *SubStruct) Foo() {
	fmt.Println(s.CustomField, s.PublicField)
}

func main() {
	sup := &SuperStruct{"public", "private"}
	sub := &SubStruct{"custom", sup}
	sub.Foo()
	// prints "custom public\n"
}
```
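If you still need the embedded version of the method after overriding it, the
unnamed field is addressable by its type name:

```go
func main() {
	sup := &SuperStruct{"public", "private"}
	sub := &SubStruct{"custom", sup}

	sub.Foo()             // the override: prints "custom public\n"
	sub.SuperStruct.Foo() // the embedded original: prints "public private\n"
}
```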