IsLOSTOnYet no longer a ghost town
I pushed a little update last night that started filling the IsLOSTOnYet stream with a flurry of updates. Where did all this come from? search.twitter.com, courtesy of the twitter gem.
The first stage of the code looked something like this:
Twitter::Search.new("lost OR kate OR sayid OR jack").each do |s|
  ...
end
This worked well, but it brought in a lot of false positives. I got tweets about lost car keys, friends named Kate that aren’t fugitives rescued from a plane crash, etc. So, I added a simple algorithm for only displaying relevant tweets. Giles calls it the automated brain.
- A list of main keywords is defined: %w(kate sayid #lost). This is used to generate the Twitter search query. Normal words are worth one point, and #hashtags are worth two.
- A list of secondary keywords is defined: %w(tv season island episode tonight). These words are worth one point each, but only if a main keyword is in the post.
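As a rough illustration of the first bullet, the main keyword list can be joined into a single OR query of the same shape as the hand-written search string from the first snippet. This is only a sketch: build_query is a hypothetical helper name, not something from the actual implementation, and the resulting string is assumed to be passed to Twitter::Search.new as before.

```ruby
# Hypothetical helper (not from the real code): join the main keywords
# into one OR query, like the hand-written string in the first snippet.
def build_query(main_keywords)
  main_keywords.join(" OR ")
end

main_keywords = %w(kate sayid #lost)
query = build_query(main_keywords)
puts query # kate OR sayid OR #lost
```

Keeping the query and the scoring driven by the same list means adding a character name only takes one change.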
Here’s how the scoring looks (roughly):
# body is assumed to be the post's text, provided by the surrounding class.
def valid_search_result?(main_keywords, secondary_keywords)
  # With no main keywords there is nothing to filter on, so accept everything.
  return true if main_keywords.nil?
  downcased_body = body.downcase
  score = score_from(downcased_body, main_keywords)
  # Secondary keywords only count once a main keyword has matched.
  return false if score.zero?
  score += score_from(downcased_body, secondary_keywords)
  score > 1
end

def score_from(downcased_body, words)
  return 0 if words.nil?
  score = 0
  words.each do |key|
    this_score = key =~ /^#/ ? 2 : 1 # hash keywords worth 2 points
    # Escape the keyword so any regex metacharacters in it match literally.
    score += this_score if downcased_body =~ %r{(^|\W)#{Regexp.escape(key)}($|\W)}
  end
  score
end
It’s very basic, but it filters out a lot of the crap that the search was returning. While it did let a few false positives in, it also managed to pick up tweets like this. Here’s the actual implementation if you’re curious…
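To see how the scoring plays out on a few posts, here is a minimal self-contained sketch. The Tweet struct and the sample texts are made up for illustration (the real code lives on whatever class holds a search result), and the keyword is escaped with Regexp.escape as a precaution, so treat this as an approximation of the rough version above rather than the actual implementation.

```ruby
# Hypothetical stand-in for the object that holds a search result;
# only the body attribute matters for scoring.
Tweet = Struct.new(:body) do
  def valid_search_result?(main_keywords, secondary_keywords)
    return true if main_keywords.nil?
    downcased_body = body.downcase
    score = score_from(downcased_body, main_keywords)
    # Secondary keywords only count once a main keyword has matched.
    return false if score.zero?
    score += score_from(downcased_body, secondary_keywords)
    score > 1
  end

  def score_from(downcased_body, words)
    return 0 if words.nil?
    words.sum do |key|
      points = key.start_with?("#") ? 2 : 1 # hashtags are worth 2 points
      downcased_body.match?(/(^|\W)#{Regexp.escape(key)}($|\W)/) ? points : 0
    end
  end
end

main_keywords = %w(kate sayid #lost)
secondary_keywords = %w(tv season island episode tonight)

# Made-up sample posts, with the score each one should get.
samples = [
  "I lost my car keys again",   # no main keyword: 0, rejected
  "Lunch with Kate today",      # one main keyword: 1, rejected
  "Kate on the island tonight", # 1 main + 2 secondary: 3, accepted
  "Can't wait for #LOST"        # hashtag alone: 2, accepted
]

results = samples.map do |text|
  Tweet.new(text).valid_search_result?(main_keywords, secondary_keywords)
end

samples.zip(results).each { |text, ok| puts "#{ok ? 'keep' : 'drop'}: #{text}" }
```

Note that a plain "lost" never scores on its own here, since only the #lost hashtag is in the main list; that is what keeps the car keys out.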
