HiR Information Report: Twitter Followers/Friends from the CLI

I started getting curious on Twitter. I had two questions:

Who am I following that's not following me back? (i.e. can Martin Roesch hear me? The answer is no, he can't)
Who is following me that I'm not following back?

Already familiar enough with the Twitter API, I threw together some quick and ugly command-line foo, and @digitaljestin wanted to know how I did it. This is REALLY ugly, and could use a lot of refinement.

That said, I'll probably program a quick stand-alone newLISP or PHP tool for this over the weekend. Regardless, here's how I did it on the CLI.

First, Twitter will only hand you 100 friends or followers per request. If I were going to automate this, I would poll the followers_count and friends_count attributes from http://twitter.com/users/show/username.xml to figure out how many "pages" I needed to fetch.

If you have 203 followers, you will have to do three requests for follower info. Same with friends (those whom you follow). I had over 200 (but fewer than 300) of each, so I did three requests for each.
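
The page arithmetic is just a ceiling division by 100. Here is a minimal sketch of it, assuming the count sits on its own line in the users/show XML (the same assumption the grep approach in this post makes). The helper names xml_field and pages_needed are made up for illustration, and the XML snippet is canned rather than fetched:

```shell
#!/bin/sh
# Hypothetical helpers; the names are illustrative, not part of any Twitter tooling.

# Print the text of the first <TAG>...</TAG> element read from stdin,
# assuming the whole element sits on a single line.
xml_field() {
    grep "<$1>" | head -n 1 | awk -F'[<>]' '{print $3}'
}

# Ceiling division: how many 100-entry pages are needed to cover COUNT entries.
pages_needed() {
    echo $(( ($1 + 99) / 100 ))
}

# Example with a canned snippet instead of a live request:
count=$(printf '<user>\n  <followers_count>203</followers_count>\n</user>\n' | xml_field followers_count)
pages_needed "$count"    # prints 3
```

Adding 99 before dividing rounds up, so 200 entries still need only two pages while 201 need three.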

I'm only interested in the screen_name attribute within the XML of each. Note that I'm doing a lot of cheap grep | awk crap here, so it just builds lists of screen names without any markup.

$ wget -q -O - "http://user:password@twitter.com/statuses/followers.xml" \
| grep "<screen_name>" | awk -F'[<>]' '{print $3}' > followers.txt

$ wget -q -O - "http://user:password@twitter.com/statuses/followers.xml?page=2" \
| grep "<screen_name>" | awk -F'[<>]' '{print $3}' >> followers.txt

$ wget -q -O - "http://user:password@twitter.com/statuses/followers.xml?page=3" \
| grep "<screen_name>" | awk -F'[<>]' '{print $3}' >> followers.txt

$ wget -q -O - "http://user:password@twitter.com/statuses/friends.xml" \
| grep "<screen_name>" | awk -F'[<>]' '{print $3}' > friends.txt

$ wget -q -O - "http://user:password@twitter.com/statuses/friends.xml?page=2" \
| grep "<screen_name>" | awk -F'[<>]' '{print $3}' >> friends.txt

$ wget -q -O - "http://user:password@twitter.com/statuses/friends.xml?page=3" \
| grep "<screen_name>" | awk -F'[<>]' '{print $3}' >> friends.txt
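
Six nearly identical commands are begging for a loop. Here is a rough sketch of one, under the same assumptions as the commands above (basic auth in the URL, 100 names per page, one XML element per line). The function name fetch_names is made up, and the page count is passed in by hand rather than looked up:

```shell
#!/bin/sh
# fetch_names KIND PAGES   (hypothetical helper, name is illustrative)
#   KIND  is "followers" or "friends"
#   PAGES is how many 100-entry pages to request
# Prints one screen name per line on stdout.
fetch_names() {
    kind=$1
    pages=$2
    page=1
    while [ "$page" -le "$pages" ]; do
        # -q silences wget's progress output, -O - sends the XML to stdout.
        wget -q -O - "http://user:password@twitter.com/statuses/$kind.xml?page=$page" \
            | grep "<screen_name>" \
            | awk -F'[<>]' '{print $3}'   # fields: indentation, tag name, value
        page=$((page + 1))
    done
}
```

Usage would look like `fetch_names followers 3 > followers.txt` followed by `fetch_names friends 3 > friends.txt`.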

Then, I just sorted them:
$ sort friends.txt > friends-sort.txt
$ sort followers.txt > followers-sort.txt

Using diff, it's easy to tell who is not following you, and who you aren't following.
The < marks lines that appear only in the first file (people you follow who aren't following you). The > marks lines that appear only in the second file (people following you whom you don't follow). Grepping for only the lines that start with < or > avoids all the patch-file line offset stuff. Some diffs have options to do this, but the syntax varies, so letting grep filter it should work across more platforms.

$ diff friends-sort.txt followers-sort.txt | grep "^[<>]" | sort
[excerpt]

< H_i_R
< Hak5
< KCWeather
< Scobleizer
< Veronica
< bacontwits
< beseKUre
< brightkite
< datalossdb
< hackadaydotcom
< ihacked
< ihackstuff
< kingpin_
< milw0rm
< mroesch
< obsessable
< om
< packetlife
< pauldotcom
< schneier
< textfiles
< wilw
< window
------------------ (split added by ax0n)
> BlackHatUSA
> Computersaurus
> HacClearwater
> HackersAlerts
> HackerspacesBot
> SOURCEBoston
> SecuritySatan
> quine
> reverz
> rsreese
> secureideas
> securitypro2009
> stopthemanga
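
For what it's worth, comm can split the two lists directly, with no need to grep for diff markers. A sketch, assuming one screen name per line in each file; only_in_first is a made-up function name, and mktemp is assumed to be available:

```shell
#!/bin/sh
# only_in_first FILE_A FILE_B   (hypothetical helper, name is illustrative)
# Prints the lines that appear in FILE_A but not in FILE_B.
# comm needs both inputs sorted in the same collation order, so the
# inputs are sorted into temp files under LC_ALL=C before comparing.
only_in_first() {
    a=$(mktemp)
    b=$(mktemp)
    LC_ALL=C sort -u "$1" > "$a"
    LC_ALL=C sort -u "$2" > "$b"
    LC_ALL=C comm -23 "$a" "$b"   # -23 hides "only in second" and "in both"
    rm -f "$a" "$b"
}
```

`only_in_first friends.txt followers.txt` would list the people who aren't following back, and swapping the arguments would list the followers not being followed.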

7 comments:

don Lucio, Sat Jan 17, 09:17:00 AM CST
Here is one way to do it with newLISP. We cannot just use the 'get-url' function because it does not handle the "user:password" part, but using the 'exec' function we can use any shell command.

(exec "wget -q http://user:password@twitter.com/statuses/followers.xml")

(xml-type-tags nil nil nil nil)
(set 'followers (xml-parse (read-file "followers.xml") (+ 1 2 4 8 16)))
(set 'indices (ref-all 'screen_names followers))
(set 'screen_names (map (fn (item) (last (followers (chop item)))) indices))

Instead of screen_name you can plug in something else, e.g. location.
don Lucio, Sat Jan 17, 09:21:00 AM CST
Correction, instead of:

(ref-all 'screen_names followers)

it should be:

(ref-all 'screen_name followers)
Anonymous, Sat Jan 17, 09:53:00 AM CST
You should be able to build an "Authorization: Basic " header using base64-enc user:pass. I got that working last night. Let me see if I can find out how. I did have get-url working just fine, though.
Anonymous, Sat Jan 17, 10:38:00 AM CST
Like so:
#!/usr/bin/newlisp
(set 'user "ax0n" 'pass "NotMyPassword")
(set 'hedr (append "Authorization: Basic " (base64-enc (append user ":" pass)) "\r\n\r\n"))
(print hedr)
(set 'xml (get-url "http://twitter.com/statuses/followers.xml" 5000 hedr))
(print xml)
(exit)
don Lucio, Sat Jan 17, 12:13:00 PM CST
Yes, thanks! This is the right way to do it, and it is platform independent.
don Lucio, Sat Jan 17, 12:27:00 PM CST
... and here is a better method to collect info from the xml:

(find-all "<screen_name>(.*)</screen_name>" xml $1)
Anonymous, Thu Feb 12, 09:15:00 AM CST
There is a collection of newLISP routines for posting, viewing and deleting Twitter messages here: http://www.newlisp.org/syntax.cgi?code/twitter.txt

2009-01-16
