2009-01-16

Twitter Followers/Friends from the CLI

I got curious about Twitter and had two questions:
  1. Who am I following that's not following me back? (i.e. can Martin Roesch hear me? The answer is no, he can't)
  2. Who is following me that I'm not following back?
Since I was already familiar with the Twitter API, I threw together some quick and ugly command-line foo, and @digitaljestin wanted to know how I did it. This is REALLY ugly and could use a lot of refinement.

I'll probably write a quick stand-alone newLISP or PHP tool for this over the weekend. In the meantime, here's how I did it on the CLI.

First, Twitter will only hand you 100 friends or followers per request. If I were going to automate this, I would read the followers_count and friends_count elements from http://twitter.com/users/show/username.xml to figure out how many "pages" I needed to fetch.

If you have 203 followers, you need three requests for follower info: pages of 100, 100 and 3. The same goes for friends (those whom you follow). I had more than 200 but fewer than 300 of each, so I made three of each request.
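The page count is just a ceiling division by the 100-per-page limit. Here's a minimal offline sketch of that automation. The XML excerpt and its numbers are made up and only shaped like a users/show response. The `pages_needed` helper is a hypothetical name. The sketch also assumes that page=1 returns the same data as the unpaged URL. It prints the URLs instead of downloading anything.

```shell
#!/bin/sh
# Hypothetical helper: how many 100-entry pages cover $1 entries?
# Integer ceiling division: (n + 99) / 100.
pages_needed() {
    echo $(( ($1 + 99) / 100 ))
}

# Made-up excerpt shaped like users/show/username.xml (values hypothetical).
show_xml='  <followers_count>203</followers_count>
  <friends_count>250</friends_count>'

# Same split-on-angle-brackets trick as the screen_name extraction.
followers=$(echo "$show_xml" | grep "<followers_count>" | awk -F'[<>]' '{print $3}')
friends=$(echo "$show_xml" | grep "<friends_count>" | awk -F'[<>]' '{print $3}')

# List the follower page URLs that would need fetching (nothing is downloaded).
i=1
while [ "$i" -le "$(pages_needed "$followers")" ]; do
    echo "http://twitter.com/statuses/followers.xml?page=$i"
    i=$((i + 1))
done
echo "friends pages: $(pages_needed "$friends")"
```

With 203 followers, this lists three follower URLs. With 250 friends, it reports three friends pages, which matches the three-of-each requests made by hand.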

I'm only interested in the screen_name element of each user in the XML. I'm doing a lot of cheap grep | awk crap here. grep keeps only the <screen_name> lines, and awk splits each one on < and > to pull out the bare name, so the result is a plain list of screen names without any markup.

# -qO- makes wget print the page to stdout (quietly) instead of saving it to a file
$ wget -qO- http://user:password@twitter.com/statuses/followers.xml \
| grep "<screen_name>" | awk -F'[<>]' '{print $3}' > followers.txt
$ wget -qO- http://user:password@twitter.com/statuses/followers.xml\?page=2 \
| grep "<screen_name>" | awk -F'[<>]' '{print $3}' >> followers.txt
$ wget -qO- http://user:password@twitter.com/statuses/followers.xml\?page=3 \
| grep "<screen_name>" | awk -F'[<>]' '{print $3}' >> followers.txt

$ wget -qO- http://user:password@twitter.com/statuses/friends.xml \
| grep "<screen_name>" | awk -F'[<>]' '{print $3}' > friends.txt
$ wget -qO- http://user:password@twitter.com/statuses/friends.xml\?page=2 \
| grep "<screen_name>" | awk -F'[<>]' '{print $3}' >> friends.txt
$ wget -qO- http://user:password@twitter.com/statuses/friends.xml\?page=3 \
| grep "<screen_name>" | awk -F'[<>]' '{print $3}' >> friends.txt
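The six commands above differ only in the list name and the page number, so a loop can fold them together. Here's a rough sketch. `extract_names`, `fetch_all`, `TW_USER` and `TW_PASS` are all hypothetical names. The fetch part assumes the 2009-era endpoint with basic auth still answers, which it may not. wget's -qO- option sends the page to stdout so it can be piped. Only the extraction step is exercised here, on a made-up record.

```shell
#!/bin/sh
# Pull bare screen names out of Twitter-style XML read on stdin.
# Splitting "  <screen_name>name</screen_name>" on < and > gives
# $1 = indentation, $2 = "screen_name", $3 = the name, $4 = "/screen_name".
# The "<" in the grep pattern also keeps <in_reply_to_screen_name> lines out.
extract_names() {
    grep "<screen_name>" | awk -F'[<>]' '{print $3}'
}

# Hypothetical loop replacing the six wget lines: fetch pages 1..$2 of
# statuses/$1.xml and write the names to $1.txt.
# TW_USER and TW_PASS are placeholder credentials.
fetch_all() {
    kind=$1
    pages=$2
    : > "$kind.txt"    # start with an empty output file
    i=1
    while [ "$i" -le "$pages" ]; do
        wget -qO- "http://$TW_USER:$TW_PASS@twitter.com/statuses/$kind.xml?page=$i" \
            | extract_names >> "$kind.txt"
        i=$((i + 1))
    done
}

# Offline check of the extraction step with a made-up record.
printf '%s\n' '<user>' '  <screen_name>example_user</screen_name>' '</user>' | extract_names
```

If the endpoint still answered, `fetch_all followers 3` and `fetch_all friends 3` would stand in for the six commands above.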

Then, I just sorted them:
$ sort friends.txt > friends-sort.txt
$ sort followers.txt > followers-sort.txt

With diff, it's easy to tell who isn't following you back and whom you aren't following back.
A < marks a line that appears only in the first file: people you follow who don't follow you back. A > marks a line that appears only in the second file: people following you whom you don't follow back. Keeping only the lines that start with < or > strips out diff's change commands (the line-number bits like 3c4) and the --- separators. Some versions of diff have their own, varying options for this, but letting grep do the filtering should work on more platforms. The final sort groups all the < lines ahead of the > lines.
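To see what the grep is filtering out, here's a small offline demo with two made-up, already-sorted lists (alice, bob, carol and dave are hypothetical names). It runs in a scratch directory so it can't clobber real friends or followers files. Anchoring the pattern with ^ keeps only lines that begin with < or >.

```shell
#!/bin/sh
# Work in a scratch directory so real list files aren't touched.
tmp=$(mktemp -d)
printf '%s\n' alice bob carol > "$tmp/friends-sort.txt"
printf '%s\n' bob dave > "$tmp/followers-sort.txt"

# Raw output: line-number change commands are mixed in with the < and > lines.
diff "$tmp/friends-sort.txt" "$tmp/followers-sort.txt"

# Filtered output: only "< alice", "< carol" and "> dave" remain.
diff "$tmp/friends-sort.txt" "$tmp/followers-sort.txt" | grep '^[<>]'
```

diff exits with status 1 whenever the files differ. That's harmless here, but it matters if you wrap this in a script that uses set -e.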

$ diff friends-sort.txt followers-sort.txt | grep "^[<>]" | sort
[excerpt]
< H_i_R
< Hak5
< KCWeather
< Scobleizer
< Veronica
< bacontwits
< beseKUre
< brightkite
< datalossdb
< hackadaydotcom
< ihacked
< ihackstuff
< kingpin_
< milw0rm
< mroesch
< obsessable
< om
< packetlife
< pauldotcom
< schneier
< textfiles
< wilw
< window
------------------ (split added by ax0n)
> BlackHatUSA
> Computersaurus
> HacClearwater
> HackersAlerts
> HackerspacesBot
> SOURCEBoston
> SecuritySatan
> quine
> reverz
> rsreese
> secureideas
> securitypro2009
> stopthemanga
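A different tool from the one I used, but worth knowing: comm compares two sorted files and prints three columns. These are lines only in the first file, lines only in the second, and lines in both. Its -1, -2 and -3 flags hide those columns, so each of the two questions becomes one command with no grep filtering. Here's a sketch with made-up names. It assumes both files were sorted under the same locale, because comm expects the same ordering that sort produced.

```shell
#!/bin/sh
# Scratch directory and hypothetical screen names.
tmp=$(mktemp -d)
printf '%s\n' alice bob carol | sort > "$tmp/friends-sort.txt"
printf '%s\n' bob dave | sort > "$tmp/followers-sort.txt"

# Question 1: who I follow that doesn't follow me back
# (-2 hides lines only in followers, -3 hides lines in both).
comm -23 "$tmp/friends-sort.txt" "$tmp/followers-sort.txt"

# Question 2: who follows me that I'm not following back
# (-1 hides lines only in friends, -3 hides lines in both).
comm -13 "$tmp/friends-sort.txt" "$tmp/followers-sort.txt"
```

The first command prints alice and carol, and the second prints dave. That's the same split as the < and > groups in the diff output.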
