I started getting curious on twitter. I had two questions:
- Who am I following that's not following me back? (i.e. can Martin Roesch hear me? The answer is no, he can't)
- Who is following me that I'm not following back?
I'll probably write a quick stand-alone newLISP or PHP tool for this over the weekend. Regardless, here's how I did it on the CLI.
First, Twitter will only hand you 100 friends or followers per request. If I were going to automate this, I would poll the followers_count and friends_count attributes from http://twitter.com/users/show/username.xml to figure out how many "pages" I needed to fetch.
If you have 203 followers, you will have to make three requests for follower info. The same goes for friends (those whom you follow). I had more than 200 but fewer than 300 of each, so I made three requests for each.
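The page arithmetic is just a ceiling division. Here is a minimal sketch in shell; pages_needed is a made-up helper name, and the page size of 100 is taken from the limit described above.

```shell
# Number of pages needed to list COUNT users at PER_PAGE users per page.
# pages_needed is a hypothetical helper name used only for this sketch.
pages_needed() {
    count=$1
    per_page=${2:-100}   # default page size: the 100-user limit described above
    # Integer ceiling division: round up whenever there is a partial page.
    echo $(( (count + per_page - 1) / per_page ))
}

pages_needed 203   # prints 3
```

With exactly 200 followers this gives 2, so no empty third page is requested.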
I'm only interested in the screen_name attribute within the XML of each. Note that I'm doing a lot of cheap grep | awk crap here, so it just builds lists of screen names without any markup.
$ wget -q -O - http://user:password@twitter.com/statuses/followers.xml \
| grep "<screen_name>" | awk -F'[<>]' '{print $3}' > followers.txt
$ wget -q -O - http://user:password@twitter.com/statuses/followers.xml\?page=2 \
| grep "<screen_name>" | awk -F'[<>]' '{print $3}' >> followers.txt
$ wget -q -O - http://user:password@twitter.com/statuses/followers.xml\?page=3 \
| grep "<screen_name>" | awk -F'[<>]' '{print $3}' >> followers.txt
$ wget -q -O - http://user:password@twitter.com/statuses/friends.xml \
| grep "<screen_name>" | awk -F'[<>]' '{print $3}' > friends.txt
$ wget -q -O - http://user:password@twitter.com/statuses/friends.xml\?page=2 \
| grep "<screen_name>" | awk -F'[<>]' '{print $3}' >> friends.txt
$ wget -q -O - http://user:password@twitter.com/statuses/friends.xml\?page=3 \
| grep "<screen_name>" | awk -F'[<>]' '{print $3}' >> friends.txt
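The repeated commands above could be folded into a loop. This is only a sketch: extract_names and fetch_all are made-up helper names, and it assumes that asking for page=1 returns the same thing as asking with no page parameter, which I have not verified.

```shell
# Print the text of every <screen_name> element found on stdin,
# assuming each element sits on its own line as in the XML above.
extract_names() {
    grep '<screen_name>' | awk -F'[<>]' '{print $3}'
}

# Fetch pages 1..PAGES of a listing ("followers" or "friends") and
# write all screen names to OUT. Hypothetical helper; assumes page=1
# is accepted for the first page.
fetch_all() {
    kind=$1
    pages=$2
    out=$3
    : > "$out"                      # start with an empty output file
    page=1
    while [ "$page" -le "$pages" ]; do
        # -q silences progress output, -O - writes the document to stdout
        wget -q -O - "http://user:password@twitter.com/statuses/$kind.xml?page=$page" \
            | extract_names >> "$out"
        page=$((page + 1))
    done
}
```

Usage would look like `fetch_all followers 3 followers.txt` followed by `fetch_all friends 3 friends.txt`.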
Then, I just sorted them:
$ sort friends.txt > friends-sort.txt
$ sort followers.txt > followers-sort.txt
Using diff, it's easy to tell who is not following you, and who you aren't following.
The < marks lines that appear only in the first file (people you follow who don't follow you). The > marks lines that appear only in the second file (people following you whom you don't follow). Grepping for only the lines that start with < or > avoids all the patch-file line offset stuff. Some diffs have their own options for this, but letting grep filter it should work across more platforms.
$ diff friends-sort.txt followers-sort.txt | grep "^[<>]" | sort
[excerpt]
< H_i_R
< Hak5
< KCWeather
< Scobleizer
< Veronica
< bacontwits
< beseKUre
< brightkite
< datalossdb
< hackadaydotcom
< ihacked
< ihackstuff
< kingpin_
< milw0rm
< mroesch
< obsessable
< om
< packetlife
< pauldotcom
< schneier
< textfiles
< wilw
< window
------------------ (split added by ax0n)
> BlackHatUSA
> Computersaurus
> HacClearwater
> HackersAlerts
> HackerspacesBot
> SOURCEBoston
> SecuritySatan
> quine
> reverz
> rsreese
> secureideas
> securitypro2009
> stopthemanga
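Another way to get the same two lists, without filtering diff output, is comm, which compares two sorted files line by line. The sketch below uses a scratch directory and made-up names (alice, bob, carol, dave) so it can be run without touching the real lists.

```shell
# Scratch directory with two small, made-up name lists.
dir=$(mktemp -d)
printf '%s\n' alice bob carol | sort > "$dir/friends-sort.txt"
printf '%s\n' bob carol dave  | sort > "$dir/followers-sort.txt"

# -23 suppresses columns 2 and 3, leaving lines unique to the first file:
# people I follow who do not follow me back.
comm -23 "$dir/friends-sort.txt" "$dir/followers-sort.txt" > "$dir/not-following-me.txt"

# -13 suppresses columns 1 and 3, leaving lines unique to the second file:
# people following me whom I do not follow back.
comm -13 "$dir/friends-sort.txt" "$dir/followers-sort.txt" > "$dir/not-followed-by-me.txt"

cat "$dir/not-following-me.txt"     # prints alice
cat "$dir/not-followed-by-me.txt"   # prints dave
```

comm expects both inputs to be sorted in the same collation order, which is why the sort step matters here too.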
Here is one way to do it with newLISP. We cannot just use the 'get-url' function because it does not handle the "user:password" part, but using the 'exec' function we can use any shell command.
(exec "wget -q http://user:password@twitter.com/statuses/followers.xml")
(xml-type-tags nil nil nil nil)
(set 'followers (xml-parse (read-file "followers.xml") (+ 1 2 4 8 16)))
(set 'indices (ref-all 'screen_names followers))
(set 'screen_names (map (fn (item) (last (followers (chop item)))) indices))
Instead of screen_name, you can plug in something else, e.g. location.
Correction, instead of:
(ref-all 'screen_names followers)
it should be:
(ref-all 'screen_name followers)
You should be able to build an "Authorization: Basic " header using base64-enc user:pass. I got that working last night. Let me see if I can find out how. I did have get-url working just fine, though.
Like so:
#!/usr/bin/newlisp
(set 'user "ax0n" 'pass "NotMyPassword")
(set 'hedr (append "Authorization: Basic " (base64-enc (append user ":" pass)) "\r\n\r\n"))
(print hedr)
(set 'xml (get-url "http://twitter.com/statuses/followers.xml" 5000 hedr))
(print xml)
(exit)
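For comparison, the same Basic header can be built on the command line. This is a rough sketch that assumes a base64 utility is installed (as in GNU coreutils); basic_auth_header is a made-up helper name.

```shell
# Build an HTTP Basic "Authorization" header line from a user name and password.
# basic_auth_header is a hypothetical helper name used only for this sketch.
basic_auth_header() {
    # Encode "user:password"; strip newlines in case base64 wraps long output.
    token=$(printf '%s' "$1:$2" | base64 | tr -d '\n')
    printf 'Authorization: Basic %s\n' "$token"
}

basic_auth_header user password
# prints: Authorization: Basic dXNlcjpwYXNzd29yZA==
```

The result can be handed to wget with its --header option instead of putting user:password in the URL.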
Yes, thanks! This is the right way to do it, and it is platform independent.
... and here is a better method to collect info from the xml:
(find-all "<screen_name>(.*)</screen_name>" xml $1)
There is a collection of newLISP routines for posting, viewing and deleting Twitter messages here: http://www.newlisp.org/syntax.cgi?code/twitter.txt