JAN 02

I saw Damon Cortesi’s Twitter Stats script last night, and decided to make a Ruby version. This was before he released his code, so it’s reverse-engineered rather than ported. I’ll take a look later tonight to see how much the logic differs.

Edit: This code is rather inelegant, and I’ve since replaced the clunky CSV files with an SQLite3 database. You can find the new and improved scripts here. The following should still work, and I’m leaving it here for posterity’s sake.

tweet.rb

First up, I wrote a quick Tweet class that scrapes all of my tweets from my public Twitter page, fetching older pages as it goes.

require 'date'      # DateTime
require 'hpricot'
require 'open-uri'

class Tweet
  def initialize(user)
    @user_url = "http://twitter.com/#{user}"

    @doc = Hpricot(open(@user_url))
    @page = 1

    @tweets = [current_tweet]
    @tweets += page_to_tweets
  end

  def current_tweet
    tweet,time = @doc/'div.desc'/'p'
    tweet = tweet.inner_html
    time = DateTime.parse(time.at('abbr')['title'])

    [tweet, time]
  end

  def page_to_tweets
    (@doc/'div.tab'/'tr.hentry').map do |tweet|
      tweet,time = tweet/'span'
      tweet = tweet.inner_html.strip  # drop surrounding whitespace and newlines
      time = DateTime.parse(time.at('abbr')['title'])

      [tweet, time]
    end
  end

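  def older?
    # The last pagination link reads "Older" when another page exists;
    # a single-page profile may have no pagination links at all.
    link = (@doc/'div.tab'/'div.pagination'/'a').last
    link && link.inner_text =~ /Older/
  end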

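  # Returns the next [tweet, time] pair, fetching the next older page
  # when the current one is used up; nil when there are no more tweets.
  def succ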
    if @tweets.empty?
      return nil unless older?

      @page += 1
      @doc = Hpricot(open("#{@user_url}?page=#{@page}"))
      @tweets = page_to_tweets
    end

    @tweets.shift
  end
end
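
The succ method above is a small buffered pager: it hands out tweets from the current page and only fetches the next page once its buffer runs dry. Here is a minimal, network-free sketch of the same pattern. PagedFeed is a hypothetical stand-in whose pages come from an in-memory array rather than from Hpricot and Twitter. One deliberate difference: it uses while instead of if, so an empty page is skipped instead of ending the iteration early.

```ruby
# Hypothetical stand-in for the Tweet class: pages are plain arrays
# rather than scraped HTML, so the buffering logic runs offline.
class PagedFeed
  def initialize(pages)
    @pages = pages                     # e.g. [[t1, t2], [t3], ...], newest first
    @page = 0                          # index of the page currently buffered
    @buffer = @pages.fetch(0, []).dup  # dup so shifting never mutates the input
  end

  # Mirrors Tweet#older?: is there another page after the current one?
  def older?
    @page + 1 < @pages.size
  end

  # Return the next item, refilling the buffer from the next page
  # whenever it is empty; nil once every page has been consumed.
  def succ
    while @buffer.empty?
      return nil unless older?
      @page += 1
      @buffer = @pages[@page].dup
    end
    @buffer.shift
  end
end

feed = PagedFeed.new([%w[a b], [], %w[c]])
items = []
while (item = feed.succ)
  items << item
end
p items # => ["a", "b", "c"]
```

The caller's while loop is the same one download_to_csv.rb uses with Tweet#succ.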

download_to_csv.rb

Next, a quick script to download the tweets into a CSV file. This is actually a bit over-engineered, as it only downloads tweets newer than the most recent CSV file, which is named after the timestamp of the latest tweet it contains. It takes the Twitter username as its only command-line argument.

#!/usr/bin/env ruby

require 'date'
require 'fastercsv'
require File.join(File.dirname(__FILE__), 'tweet')  # tweet.rb sits next to this script

base_path = File.dirname(__FILE__)

csv_files = Dir["#{base_path}/*.csv"].sort_by do |filename|
  DateTime.parse(File.basename(filename, '.csv'))
end

last_update = DateTime.parse(File.basename(csv_files.last, '.csv')) unless csv_files.empty?

abort "usage: #{$0} USERNAME" if ARGV.empty?
tweets = Tweet.new(ARGV.shift)
current_update_time = tweets.current_tweet.last

if last_update.nil? or current_update_time > last_update
  FasterCSV.open(File.join(base_path, "#{current_update_time.to_s}.csv"), 'w') do |csv|
    while t = tweets.succ
      tweet,time = t

      break if last_update and time <= last_update

      csv << [tweet, time.to_s]
    end
  end
end
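
The incremental part of download_to_csv.rb boils down to two steps: recover the newest timestamp from the CSV filenames, then keep only tweets strictly newer than it. Because Twitter lists tweets newest first, it can stop at the first old one. A self-contained sketch of that logic, using hypothetical helper names last_update_from and fresh_tweets and made-up sample data:

```ruby
require 'date'

# Filenames look like "<DateTime#to_s>.csv", as written by download_to_csv.rb.
# Returns the newest timestamp, or nil when there are no CSV files yet.
def last_update_from(filenames)
  filenames.map { |f| DateTime.parse(File.basename(f, '.csv')) }.max
end

# tweets: [text, DateTime] pairs, newest first (the order Twitter lists them).
# Keeps tweets until the first one that is not newer than last_update.
def fresh_tweets(tweets, last_update)
  return tweets if last_update.nil?
  tweets.take_while { |_text, time| time > last_update }
end

files = ['/tmp/2008-01-01T09:00:00+00:00.csv', '/tmp/2007-12-31T18:30:00+00:00.csv']
last = last_update_from(files)

tweets = [
  ['happy new year', DateTime.parse('2008-01-02T08:00:00+00:00')],
  ['morning',        DateTime.parse('2008-01-01T10:00:00+00:00')],
  ['already saved',  DateTime.parse('2008-01-01T09:00:00+00:00')]
]
p fresh_tweets(tweets, last).map(&:first) # => ["happy new year", "morning"]
```

Using max instead of sorting and taking the last element gives the same answer as the script's sort_by, just in one pass.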

generate_graphs.rb

And last, a script that tallies the tweets in the CSV files per month and per hour for a given year, then prints Google Chart URLs for the graphs.

#!/usr/bin/env ruby

require 'date'
require 'fastercsv'
require 'gchart'

base_path = File.dirname(__FILE__)
year = 2007  # only tweets from this year are counted

month_data = Array.new(12, 0)
hour_data = Array.new(24, 0)
reply_data = Hash.new(0)

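Dir["#{base_path}/*.csv"].each do |filename|
  FasterCSV.foreach(filename) do |row|
    tweet = row.first
    time = DateTime.parse(row.last)
    next unless time.year == year

    month_data[time.month - 1] += 1
    # Shift UTC timestamps by -8 hours (presumably to US Pacific; ignores daylight saving)
    hour_data[(time.hour - 8) % 24] += 1
    # Count @replies, which Twitter rendered as @<a href="/user">user</a>
    # (reply_data is tallied here but not graphed below)
    reply_data[$1] += 1 if tweet =~ /@<a href="\/([^"]+)">\1<\/a>/
  end
end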

puts GChart.line(
  :title => 'Tweets per Hour',
  :data => hour_data,
  :width => 400,
  :height => 300,
  :extras => { 'chxt' => 'x,y', 'chxl' => "0:|#{(0..23).to_a.join('|')}|1:|#{hour_data.min}|#{hour_data.max}" }
).to_url

puts GChart.bar(
  :title => 'Tweets per Month',
  :data => month_data,
  :width => 400,
  :height => 300,
  :extras => { 'chxt' => 'x,y', 'chxl' => "0:|#{Date::ABBR_MONTHNAMES.compact.join('|')}|1:|#{month_data.min}|#{month_data.max}" },
  :orientation => :vertical
).to_url
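
The loop in generate_graphs.rb does three tallies per tweet: by month, by hour shifted by a fixed -8 hours (presumably UTC to US Pacific, ignoring daylight saving), and by @reply target, detected from the link markup Twitter used at the time. The script fills reply_data but never graphs it. A self-contained sketch of the same tallying on made-up rows, with a hypothetical tally helper and a top-ten list as one possible use of reply_data:

```ruby
require 'date'

# Same pattern as generate_graphs.rb: "@" followed by a link whose
# href path and link text are the same username.
REPLY_RE = %r{@<a href="/([^"]+)">\1</a>}

# rows: [tweet_html, time_string] pairs, as stored in the CSV files.
# utc_offset: hours added to each UTC timestamp (assumed to be fixed).
def tally(rows, year, utc_offset = -8)
  months  = Array.new(12, 0)
  hours   = Array.new(24, 0)
  replies = Hash.new(0)

  rows.each do |tweet, time_str|
    time = DateTime.parse(time_str)
    next unless time.year == year

    months[time.month - 1] += 1
    hours[(time.hour + utc_offset) % 24] += 1
    replies[$1] += 1 if tweet =~ REPLY_RE
  end

  [months, hours, replies]
end

rows = [
  ['@<a href="/alice">alice</a> nice script', '2007-06-01T03:15:00+00:00'],
  ['@<a href="/alice">alice</a> thanks',      '2007-06-02T20:00:00+00:00'],
  ['plain tweet',                             '2007-12-31T23:59:00+00:00'],
  ['outside the year',                        '2008-01-01T00:00:00+00:00']
]
months, hours, replies = tally(rows, 2007)

# Most-replied-to users, highest count first.
top = replies.sort_by { |_user, count| -count }.first(10)
p top # => [["alice", 2]]
```

Ruby's % always returns a non-negative result for a positive divisor, so 03:15 UTC lands in hour 19 rather than at a negative index.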

Example graphs
