IsLOSTOnYet no longer a ghost town

I pushed a little update last night that started filling the IsLOSTOnYet stream with a flurry of updates. Where did all this come from? search.twitter.com, courtesy of the twitter gem.

The first stage of the code looked something like this:


Twitter::Search.new("lost OR kate OR sayid OR jack").each do |s|
  ...
end

This worked well, but it brought in a lot of false positives. I got tweets about lost car keys, friends named Kate who aren’t fugitives rescued from a plane crash, etc. So, I added a simple algorithm for displaying only relevant tweets. Giles calls it the automated brain.

  • A list of main keywords is defined: %w(kate sayid #lost). This list is used to generate the twitter search query. Normal words are worth one point, and #hashtags are worth two.
  • A list of secondary keywords is defined: %w(tv season island episode tonight). Each of these is worth one point, but only counts if a main keyword is also in the post.

A tweet is shown only if its total score is greater than one.
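The first bullet says the main keywords drive the search query. Here is a minimal sketch of that step, assuming the keywords are simply joined with OR as in the first-stage query above. `build_query` is a hypothetical helper for illustration, not part of the twitter gem and not necessarily how the real code does it.

```ruby
main_keywords = %w(kate sayid #lost)

# Hypothetical helper: join the keywords into a single OR query string.
def build_query(keywords)
  keywords.join(" OR ")
end

puts build_query(main_keywords)
# prints: kate OR sayid OR #lost
```

The resulting string could then be handed to the search call in place of the hard-coded query.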

Here’s how it looks (roughly):


def valid_search_result?(main_keywords, secondary_keywords)
  # No main keywords means no filtering: accept everything.
  return true if main_keywords.nil?

  downcased_body = body.downcase
  score = score_from(downcased_body, main_keywords)
  # Secondary keywords only count once a main keyword has matched.
  return false if score.zero?

  score += score_from(downcased_body, secondary_keywords)
  score > 1
end

def score_from(downcased_body, words)
  return 0 if words.nil?

  score = 0
  words.each do |key|
    this_score = key.start_with?("#") ? 2 : 1 # hash keywords worth 2 points
    # Escape the keyword so any regex metacharacters in it match literally.
    score += this_score if downcased_body =~ %r{(^|\W)#{Regexp.escape(key)}($|\W)}
  end
  score
end
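To see how the scoring plays out, here is a self-contained sketch that runs the same idea over a few made-up tweets. The sample tweets and the names `relevant?` and `keyword_score` are assumptions for illustration; the real methods live on the post model and read `body` from it.

```ruby
MAIN_KEYWORDS      = %w(kate sayid #lost)
SECONDARY_KEYWORDS = %w(tv season island episode tonight)

# Sum the points for every keyword that appears as a whole word in the text.
def keyword_score(text, words)
  words.sum do |key|
    points = key.start_with?("#") ? 2 : 1 # hashtags are worth 2 points
    text.match?(/(^|\W)#{Regexp.escape(key)}($|\W)/) ? points : 0
  end
end

# A tweet is relevant if it matches a main keyword and scores more than 1.
def relevant?(body)
  text  = body.downcase
  score = keyword_score(text, MAIN_KEYWORDS)
  return false if score.zero?

  score + keyword_score(text, SECONDARY_KEYWORDS) > 1
end

samples = [
  "I lost my car keys again",      # false: "lost" is not "#lost", no main match
  "Kate is on the island tonight", # true:  kate (1) + island (1) + tonight (1)
  "My friend Kate is visiting",    # false: kate (1) alone is not > 1
  "#LOST is back!",                # true:  hashtag is worth 2
  "What a great tv season"         # false: secondary words need a main match
]

samples.each { |tweet| puts "#{relevant?(tweet)}  #{tweet}" }
```

Note that a plain main keyword on its own never passes the threshold, which is what weeds out the friends named Kate.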

It’s very basic, but it filters out a lot of the crap that the search was returning. While it did let a few false positives in, it also managed to pick up tweets like this. Here’s the actual implementation if you’re curious…
