Where's Waldo: Track user locations with Node.js and Redis

Where’s Waldo is my little node.js/Redis project to keep track of users in an app. Say hi!

hi waldo!

Tracking hits on every request can get costly, and I didn’t want to hold up the more important server processes with this. So, it felt like a good fit for a quick asynchronous web server. Node.js and Redis fit the bill perfectly.

Here’s a sample from a development build of my Tender Support product. You can probably tell where I’m going with this…

[Screenshot: sample]

If you can’t tell: you’ll be able to see who is reading the same discussion that you’re currently on.

If you want to play along at home, install Node.js, download the source, and fire it up!

First, you track a user’s location. Each curl call below returns some JSON. The line I’m showing under each call, though, is the output logged by the example node.js script.

curl "http://127.0.0.1:3456/waldo/track?location=home&name=rick"
TRACK rick => home

Now, you can locate that user:

curl "http://127.0.0.1:3456/waldo/locate?name=rick"
LOCATE rick => home

You probably won’t be doing this that much, though. Let’s list the users in “home” after adding a few more users:

curl "http://127.0.0.1:3456/waldo/track?location=home&name=bob"
TRACK bob => home
curl "http://127.0.0.1:3456/waldo/track?location=home&name=fred"
TRACK fred => home
curl "http://127.0.0.1:3456/waldo/list?location=home"
LIST home => bob, fred, rick
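If you'd rather read code than curl calls, here is a rough sketch of how those three endpoints could be routed. It is not the actual Waldo server: the `handle` function, the in-memory `Map` standing in for Redis, and the JSON response shapes are all assumptions made for illustration.

```javascript
// Hypothetical sketch: routes the three Waldo URLs against an in-memory Map
// instead of Redis. The JSON response shapes are assumed, not taken from Waldo.
const locations = new Map(); // user name -> location

function handle(rawUrl) {
  const url = new URL(rawUrl, 'http://127.0.0.1:3456');
  const name = url.searchParams.get('name');
  const location = url.searchParams.get('location');

  switch (url.pathname) {
    case '/waldo/track':
      locations.set(name, location);
      return { name, location };
    case '/waldo/locate':
      return { name, location: locations.get(name) ?? null };
    case '/waldo/list': {
      // Collect every user whose current location matches, sorted by name.
      const users = [...locations]
        .filter(([, where]) => where === location)
        .map(([user]) => user)
        .sort();
      return { location, users };
    }
    default:
      return { error: 'not found' };
  }
}

handle('/waldo/track?location=home&name=rick');
handle('/waldo/track?location=home&name=bob');
console.log(JSON.stringify(handle('/waldo/list?location=home')));
// prints {"location":"home","users":["bob","rick"]}
```

A real server would wrap something like this in an HTTP listener and swap the `Map` for Redis calls.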

How’s all this work? Each track call stores two Redis keys: waldo:USER and waldo:LOCATION:USER. From these, we can see where a user is, and which users are in a location. In Redis commands, the above might look something like this:

# tracking rick at home
SET waldo:rick home
SET waldo:home:rick 1

# locate rick
GET waldo:rick # returns home

# list users at home
KEYS waldo:home:* # returns "waldo:home:rick"

# track rick at desk
DEL waldo:home:rick
SET waldo:rick desk
SET waldo:desk:rick 1
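The same sequence can be sketched as a small JavaScript helper that builds the commands for one track call. The `trackCommands` name and its arguments are hypothetical; the sketch also assumes the caller has already looked up the user's previous location (a GET on waldo:USER) so it knows which key to delete.

```javascript
// Hypothetical helper: builds the Redis commands for one track() call.
// Key names follow the waldo:USER and waldo:LOCATION:USER scheme above.
function trackCommands(user, newLocation, oldLocation) {
  const commands = [];
  if (oldLocation && oldLocation !== newLocation) {
    // Drop the membership key for the place the user just left.
    commands.push(['DEL', `waldo:${oldLocation}:${user}`]);
  }
  commands.push(['SET', `waldo:${user}`, newLocation]);
  commands.push(['SET', `waldo:${newLocation}:${user}`, '1']);
  return commands;
}

for (const command of trackCommands('rick', 'desk', 'home')) {
  console.log(command.join(' '));
}
// prints:
// DEL waldo:home:rick
// SET waldo:rick desk
// SET waldo:desk:rick 1
```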

Why didn’t I use one of the nicer Redis data types like a list or a set? Because Redis expires whole keys, not individual members of a list or set, and I want each user’s entry to expire on its own. In 5 minutes, the waldo:rick and waldo:home:rick keys are dropped. This keeps the location lists from growing out of hand.
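To see why per-key expiry keeps the lists small, here is a toy in-memory model of it. In real Redis the EXPIRE command does this work; the `ExpiringKeys` class and its injected clock are hypothetical and exist only so the example can skip ahead in time without waiting.

```javascript
// Hypothetical in-memory stand-in for Redis key expiry. `now` is injected so
// the example can move time forward without actually waiting.
class ExpiringKeys {
  constructor(now = Date.now) {
    this.now = now;
    this.entries = new Map(); // key -> { value, expiresAt }
  }

  set(key, value, ttlSeconds) {
    this.entries.set(key, { value, expiresAt: this.now() + ttlSeconds * 1000 });
  }

  // Returns every live key starting with `prefix`, dropping expired ones.
  keys(prefix) {
    const live = [];
    for (const [key, entry] of this.entries) {
      if (entry.expiresAt <= this.now()) this.entries.delete(key);
      else if (key.startsWith(prefix)) live.push(key);
    }
    return live;
  }
}

let clock = 0;
const store = new ExpiringKeys(() => clock);
store.set('waldo:home:rick', '1', 300); // 5 minutes
clock = 4 * 60 * 1000;
store.set('waldo:home:bob', '1', 300);
clock = 6 * 60 * 1000; // rick's key is now past its TTL
console.log(store.keys('waldo:home:')); // prints [ 'waldo:home:bob' ]
```

A user who stops sending track calls simply falls out of the list, with no cleanup job needed.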

This isn’t used in production just yet, and I can see one big problem right off the bat: the API is easily hackable. I don’t know what someone would gain from it, but anyone could plug in their own users and locations and skew the results. I’ll probably add some kind of token-based authentication to make sure that only confirmed sites can update Waldo.
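One possible shape for that token check, sketched with Node's built-in crypto module. Nothing here is part of Waldo: the shared secret, the `sign` and `isAuthentic` names, and the idea of signing the name/location pair are all assumptions for illustration.

```javascript
const crypto = require('crypto');

// Hypothetical scheme: each trusted site shares a secret with Waldo and signs
// the name/location pair. Nothing here is part of Waldo itself.
function sign(secret, name, location) {
  return crypto
    .createHmac('sha256', secret)
    .update(JSON.stringify([name, location]))
    .digest('hex');
}

function isAuthentic(secret, name, location, token) {
  const expected = Buffer.from(sign(secret, name, location), 'hex');
  const given = Buffer.from(String(token), 'hex');
  // timingSafeEqual throws on length mismatch, so check lengths first.
  return given.length === expected.length &&
    crypto.timingSafeEqual(given, expected);
}

const secret = 'example-shared-secret';
const token = sign(secret, 'rick', 'home');
console.log(isAuthentic(secret, 'rick', 'home', token)); // prints true
console.log(isAuthentic(secret, 'rick', 'desk', token)); // prints false
```

This only proves that the request came from someone holding the secret. It does nothing about replayed requests, which would need a timestamp or nonce in the signed data.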

While working on Waldo, I came across a different implementation of a similar problem: Luke Melia’s “Who’s Online?” lib. It uses Redis sets to track user IDs, and set unions to determine which of your friends are online. That’s another very cool use of Redis.
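The set-based idea can be roughed out with plain JavaScript Sets. This is not Luke's code: the per-minute buckets and the sample IDs are assumptions, and in Redis the two steps would map onto the SUNION and SINTER commands.

```javascript
// Hypothetical data: one set of active user IDs per minute (assumed layout).
const activeByMinute = [
  new Set([1, 2]),
  new Set([2, 5]),
  new Set([7]),
];
const myFriends = new Set([2, 7, 9]);

// Union of the recent buckets = everyone seen lately (SUNION in Redis).
const online = new Set(activeByMinute.flatMap((bucket) => [...bucket]));

// Keep only the friends who appear in that union (SINTER in Redis).
const friendsOnline = [...myFriends].filter((id) => online.has(id));

console.log(friendsOnline); // prints [ 2, 7 ]
```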

0 comments | posted 03 Feb 10:59

Node.js For My Tiny Ruby Brain: Keeping Promises

I’ve been hacking on node.js for a week now. I won’t go into why I think it’s awesome; you probably already know (thanks to bloggers like Simon Willison).

My second raw “hello world” speed test went something like this:

# node.js on freenode
spoob: technoweenie; seriously, you should look up how fast nodejs is... :)
technoweenie: yea i was getting about 5k r/s, pretty impressive
spoob: you should be getting around 20k r/s?
technoweenie: really?
technoweenie: oh wait i only ran 5k requests

My first:

# twitter
technoweenie: sample node.js server is *extremely* slow, am i missing something? i'm just trying the example app on nodejs.org  
technoweenie: oh i see, the demo app sets a 2s timeout, haha  
lifo: classic
technoweenie: hey that's a great way to start off a new web framework, simulate rails cgi speeds

(sources: 1, 2, 3, 4)

With that out of the way, I started hacking on Where’s Waldo (taking a detour to bang out a quick test framework along the way). Where’s Waldo is a throw-away prototype using Redis for tracking locations of users. Here’s my first pass at the library and its tests:

// Github: technoweenie/wheres-waldo
// SHA: fa925fe483dac9a02e374971fe392c7e00f1e5d1
// http://github.com/technoweenie/wheres-waldo/blob/fa925fe483dac9a02e374971fe392c7e00f1e5d1/lib/index.js
// this source code has been modified to fit my blog post
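function WheresWaldo(redis, prefix) {
  // keep the Redis client and key prefix around for the methods below
  this.redis  = redis
  this.prefix = prefix

  // store the user's location, and mark the user as present in that location
  this.track = function(user, location, ttl) {
    this.redis.set(this.prefix + ":" + user, location).wait()
    this.redis.set(this.prefix + ":" + location + ":" + user, user).wait()
  }

  // look up the location stored for a user
  this.locate = function(user) {
    return this.redis.get(this.prefix + ":" + user).wait()
  }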

  this.list = function(location) {
    var locationKey = this.prefix + ":" + location
    var users = this.redis.keys(locationKey + ":*").wait()
    // return all users that aren't blank strings
    return _.reduce(users, [], function(users, user) {
      if(user && user.length > 0)
        users.push(user.substr(locationKey.length+1, user.length))
      return users;
    })
  }
}

// http://github.com/technoweenie/wheres-waldo/blob/fa925fe483dac9a02e374971fe392c7e00f1e5d1/test/waldo_test.js
describe("tracking a user")
  before(function() {
    this.waldo = whereswaldo.create(redis, 'tracking');
    this.waldo.track('bob', 'gym')
  })

  it("tracks a user's location", function() {
    assert.equal('gym', this.waldo.locate('bob'))
  })

  it("lists the user in that location", function() {
    assert.equal('bob', this.waldo.list('gym')[0])
  })

One of the goals of Node.js is to never introduce a blocking API. There aren’t a lot of libraries yet, but the ones that exist are fully asynchronous. Even a super-fast database like Redis has an async node.js wrapper.

A simple Redis GET command doesn’t return a value; it returns a Promise. A Promise is a really basic event emitter with just two events: success and error. Ideally, you’d take this promise, listen for the success and error events, and move on to the next request. When that Redis query comes back, it emits the success event with the result, and any callbacks are run.
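To make the "two events" idea concrete, here is a stripped-down stand-in for that kind of promise. It is a sketch, not node's actual implementation: the `MiniPromise` class and the `fakeGet` function are hypothetical, and only the method names (addCallback, addErrback, emitSuccess, emitError) are borrowed from the code in this post.

```javascript
// Hypothetical stand-in for the old process.Promise: two events, success and
// error, each with a list of listeners. Not node's actual implementation.
class MiniPromise {
  constructor() {
    this.callbacks = [];
    this.errbacks = [];
  }
  addCallback(fn) { this.callbacks.push(fn); return this; }
  addErrback(fn) { this.errbacks.push(fn); return this; }
  emitSuccess(...args) { this.callbacks.forEach((fn) => fn(...args)); }
  emitError(...args) { this.errbacks.forEach((fn) => fn(...args)); }
}

// A fake async GET: hands back the promise first, delivers the value later.
function fakeGet(value) {
  const promise = new MiniPromise();
  setTimeout(() => promise.emitSuccess(value), 0);
  return promise;
}

fakeGet('home')
  .addCallback((location) => console.log('rick is at', location))
  .addErrback((err) => console.error('lookup failed', err));
console.log('request sent, moving on');
// prints "request sent, moving on", then "rick is at home"
```

The caller gets the promise back immediately and keeps going; the callback only runs once the value arrives.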

If you look at my locate() method, you’ll see that I called Promise#wait so that I didn’t have to worry about that yet. It’s a convenient tactic for node.js newbies, but I would not recommend that you continue to do this. I started off with a familiar synchronous lib that I could test. Once my tests were green, I was free to experiment with these wild new promise objects.

function WheresWaldo(redis, prefix) {
  this.locate = function(user) {
    return this.redis.get(this.prefix + ":" + user)
  }
// ...

// related test
it("tracks a user's location", function() {
  assert.equal('gym', this.waldo.locate('bob').wait())
})

See, promises are easy! To make that locate() method asynchronous, I simply returned the same promise that the Redis client’s get method returns. Basically, I moved the wait() call from the library to the test.

function WheresWaldo(redis, prefix) {
  this.list = function(location) {
    var locationKey = this.prefix + ":" + location,
            promise = new process.Promise();
    this.redis.keys(locationKey + ":*") 
      .addCallback(function(keys) {
        var users = _.reduce(keys, [], function(users, key) {
          if(key && key.length > 0)
            users.push(key.substr(locationKey.length+1, key.length))
          return users;
        })
        promise.emitSuccess(users);
      })
      .addErrback(function() {
        promise.emitError();
      })
    return promise;
  }
}

The list() method was a bit more complicated. This time, WheresWaldo creates its own promise object to return. It adds its own callbacks to the promise from the Redis client’s keys() method. From that success callback, it filters the keys array as desired, and emits the success event of its promise.

The final gotcha was handling multiple fire-and-forget queries. The track() method sets two Redis values. Personally, I don’t have a preference which order they run in. There were three options that I could see:

  1. Call wait() on the first one before firing the second one. Synchronous calls are bad!
  2. Nest the second call in the success callback of the first call’s promise. Nesting is ugly!
  3. Fire both queries and let Redis do its job. Good, but I want to return just one Promise.

I went with #3, and wrote a Promise Group class (though I believe this functionality may be on its way to node.js soon?)

The Promise Group takes an array of promise objects, and returns its own promise object that emits success when the group’s promises are all finished. This means that I can expect a single promise object from the track() method, and add callbacks as necessary.
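A promise group along those lines might look like the sketch below. It is not the class from the Waldo repo: `makePromise` is a hypothetical minimal promise with the same method names used in this post, and failing the whole group on the first member error is an assumption, since the description above only covers success.

```javascript
// Minimal promise stand-in (hypothetical; same method names as the post).
function makePromise() {
  const callbacks = [], errbacks = [];
  return {
    addCallback(fn) { callbacks.push(fn); return this; },
    addErrback(fn) { errbacks.push(fn); return this; },
    emitSuccess(value) { callbacks.forEach((fn) => fn(value)); },
    emitError(err) { errbacks.forEach((fn) => fn(err)); },
  };
}

// Sketch of a promise group: succeeds once every member has succeeded,
// fails on the first member error (the error handling is an assumption).
function promiseGroup(promises) {
  const group = makePromise();
  const results = new Array(promises.length);
  let pending = promises.length;
  let settled = false;
  promises.forEach((promise, i) => {
    promise
      .addCallback((value) => {
        results[i] = value; // keep results in the order the promises were given
        pending -= 1;
        if (pending === 0 && !settled) { settled = true; group.emitSuccess(results); }
      })
      .addErrback((err) => {
        if (!settled) { settled = true; group.emitError(err); }
      });
  });
  return group;
}

const setUser = makePromise(), setLocation = makePromise();
promiseGroup([setUser, setLocation])
  .addCallback((results) => console.log('tracked:', results));
setLocation.emitSuccess('OK'); // order of completion doesn't matter
setUser.emitSuccess('OK');     // prints tracked: [ 'OK', 'OK' ]
```

With something like this, track() can fire both SET calls at once and still hand a single promise back to the caller.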

That’s it for the basics of working with the asynchronous node.js APIs. As you may have noticed, I like to work in an iterative, test-driven fashion. I don’t feel comfortable writing a lot of code without tests, so it was really helpful for me to start off with horrible synchronous calls and passing tests, and work my way up from there.

0 comments | posted 15 Jan 09:09

A note on the Github/Twitter Proxy

In my last post, I made a quick note about how the Friendly ORM had some issues in Postgres. I made a few quick hacks (all in the interest of finishing this up and launching it yesterday). A few hours later, James Golick managed to fix the issues in a special postgres branch. I updated the Github/Twitter proxy with a vendored version of Friendly. If you have to upgrade though, you’ll have to wipe the database. Your steps on Heroku would look like:

Don’t worry, only caching info is stored on the database.

For normal code updates, you only need to perform the first two steps above.

Just out of curiosity, is anyone hosting their own? If you don’t want to mess with the server stuff, you can always just use http://gh-twitter.com without entering your token. Once Github gives in to my demands, you’ll be able to just use gh-twitter without any worries.

1 comment | posted 03 Jan 09:59