2017-09-13

Building a honeypot army with Pi, EC2 and MHN

I've been thinking about honeypots a lot over the last year. I have a bunch of Raspberry Pis, Atheros-powered boards, old Linksys routers and other devices kicking around my home office that would make good honeypots, and my thoughts always wander back to wishing I could easily consolidate the logs from an army of diverse honeypots in a central location for reporting. That's what initially drew me to MHN (Modern Honey Network) -- despite development being pretty much stalled for the past two years, save for minor fixes here and there.

It's kind of a shame, really, the state of blight attained by the once-vibrant intel-sharing honeypot-loving ecosystem. There are a couple of active projects (telnetlogger and cowrie among my favorites) but hpfriends vanished, and most of the popular honeypot projects haven't seen but a fistful of commits in years. The STIX/TAXII ecosystem always seems like it's in a state of flux for anyone outside of FS-ISAC. All the while, there seems to be this undertone of something big coming with regards to open sharing of threat information among the incident response crowd.

For all their warts, some honeypots and analysis tools still work pretty well. I started putting together a presentation for SecKC shortly after beginning my tinkering with MHN, and those slides can be found here. This post is more of a walk-through of setting up and managing MHN and honeypots than the presentation was.

Setting up MHN

The instructions on the MHN github page work well. I installed MHN on a vanilla EC2 Ubuntu AMI on a t2.micro (free tier) instance without any trouble, but as we added more honeypots, a bump up to t2.small was needed to keep it from running out of memory on a daily basis, and I'm still not sure how well that will scale. A few members of the SecKC community have set up almost 50 honeypots* and pointed them at this instance of MHN. It's logging, at times, more than 100,000 hits per day, and the UI is getting kind of sluggish. MongoDB runs out of memory and requires a restart a few times per week, but I just use a cron script to check it every 5 minutes and nudge the services if needed, as sketched below. This is a trade-off I'm willing to make. On-demand pricing for t2.small costs about $17 per month, and moving up to t2.medium or r3.large (where it probably belongs, due to MongoDB being a pig) sets me up for a monthly Amazon bill of $35-$124, which I don't feel like shelling out of my own pocket.
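
The watchdog itself is nothing fancy. Here's a minimal sketch of the sort of thing I mean -- the service names are assumptions (Ubuntu installs Mongo as mongodb or mongod depending on which package you used), so check your install first:

#!/bin/sh
# Watchdog sketch, run from cron every 5 minutes, e.g.:
# */5 * * * * root /usr/local/bin/check-mongo.sh
if ! pgrep mongod > /dev/null; then
    service mongodb restart
    # MHN's workers are left with stale connections when Mongo dies,
    # so nudge them as well
    supervisorctl restart all
fi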

*(With MHN, each honeypot software package counts as a "sensor," even if you're running two, three or more on the same instance -- it's more like 30 distinct VPSes, Raspberry Pis and cloud compute nodes.)

When you add users to the web UI, they are all admins, and admins can disable other user accounts. This is kind of a pain, so I recommend installing the sqlite3 package on your MHN instance if you end up adding more users; I'll cover the MHN databases toward the end. Additionally, I think all users can edit deploy scripts and delete honeypots even if they're not admins. Be careful who you give access to the MHN console.

I'd recommend disabling the feed of data back to ThreatStream/Anomali (as found in the MHN setup page) by running "sudo /opt/mhn/scripts/disable_collector.sh" after you get it installed. In fact, I also removed /etc/supervisor/conf.d/mhn-collector.conf entirely, then ran "sudo supervisorctl reload" to update the configuration.

Setting up TLS with Let's Encrypt

Since there's a login and password on MHN, it's probably best to set up HTTPS. The MHN server is Ubuntu 16.04 with nginx, so I followed this DigitalOcean guide, though I left the firewall steps alone, since we're relying on Amazon's AWS security groups for traffic control; just make sure both :80 and :443 are open to everywhere. I added the static location for the ACME challenge, and a redirect to HTTPS, at the beginning of the default nginx config file, /etc/nginx/sites-enabled/default, like so:

server {
    listen       80;
    server_name  mhn.h-i-r.net;

    location ^~ /.well-known/acme-challenge/ {
        default_type "text/plain";
        root /var/www/letsencrypt;
    }
    location / {
        return 301 https://mhn.h-i-r.net$request_uri;
    }
}
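
With that in place, issuing the initial certificate is a one-liner. A sketch, assuming the webroot layout above (the DigitalOcean guide covers the details):

sudo mkdir -p /var/www/letsencrypt
sudo certbot certonly --webroot -w /var/www/letsencrypt -d mhn.h-i-r.net
sudo service nginx reload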


Make sure you add the appropriate crontab entry for certbot, and include the renew-hook script as in the DigitalOcean how-to guide. Mine looks like this:

20 3 * * * certbot renew --noninteractive --renew-hook /root/letsencrypt.sh
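
I won't reproduce my renew-hook script verbatim, but the gist of it is just reloading whatever holds the old certs open. A sketch -- the supervisord program name for Mnemosyne is an assumption, so check /etc/supervisor/conf.d/ on your install:

#!/bin/sh
# /root/letsencrypt.sh -- run by certbot after each successful renewal
service nginx reload
# Mnemosyne (below) reads the same certs via symlinks
supervisorctl restart mnemosyne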

Mnemosyne WebAPI

Mnemosyne, MHN's data normalizer (more on it in the HPFeeds section below), ships with a WebAPI that runs on port 8181. It requires TLS to work properly; if you don't have certs, it won't even try to start. You don't really need to worry about this unless you want to run custom reports against Mnemosyne outside of MongoDB, such as with some Python scripts. You should probably use your Let's Encrypt certs for this. Symlinks will keep these duplicates up-to-date as certbot rotates your certificates:

sudo ln -s /etc/letsencrypt/live/mhn.h-i-r.net/privkey.pem /opt/mnemosyne/server.key
sudo ln -s /etc/letsencrypt/live/mhn.h-i-r.net/cert.pem /opt/mnemosyne/server.crt

If you're using this feature from outside of the Amazon VPC MHN resides in, adjust your Amazon security groups to allow access to this service from your address, but I would not recommend exposing this port publicly. There is some good API documentation here.
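
A quick smoke test from a host the security group allows looks something like this. The route and parameter are hypothetical -- I'm going from memory, so verify them against the API documentation:

# Hypothetical query; confirm the route and parameters in the API docs
curl -sk "https://mhn.h-i-r.net:8181/sessions?source_ip=192.168.1.94"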

Deploy scripts

Deploy scripts are an easy way to take a bare-bones OS (like a fresh Raspbian Jessie Lite installation, or an Ubuntu VM) and add a honeypot to it. When you select a deploy script, you see a deploy command (usually in the form of wget ... -O deploy.sh ; sudo bash deploy.sh ...) that performs all of the actions needed to install the honeypot in question onto the operating system noted. Most target Ubuntu, but there are a few specific to Raspberry Pi and CentOS. You can edit the scripts, names and notes for each of these to make local tweaks. MHN ships with a number of useful honeypot deploy scripts. Note that I've adjusted and renamed a few in this screenshot, so they won't perfectly match a fresh MHN install:


This isn't what drew me to MHN, but it's a feature I've come to love, and since the scripts are written in shell (and sometimes a bit of Python), they're something I'm actually comfortable editing. Since some honeypot projects went dormant before MHN did, their deploy scripts still work fine. A few just need tweaks for minor changes in the host OS (e.g. Ubuntu 16.04 LTS switched to systemd init). In cases like Cowrie, though, the honeypot has evolved far beyond what it looked like when the deploy scripts were created.

I've been adjusting a few of these and submitting pull requests, but you could also use my MHN fork for the time being, or simply pick and choose some of the deploy scripts out of my fork, and manually update your MHN instances that way.

Here's a quick animation of a Conpot (SCADA honeypot) deployment on EC2. It ran in just a few minutes. Most of the scripts take a little longer on a Raspberry Pi -- particularly the old 600MHz single-core Model B.

HPFeeds

HPFeeds is the magic that ties the honeynet together. It's a lightweight, authenticated publish/subscribe protocol that runs on port 10000 by default. On the back end, MHN uses its fork of a framework called Mnemosyne, which normalizes and stores honeypot data while providing a web API for reporting; I'll cover the database part shortly. Each deploy script calls a registration routine (registration.sh) that gets a UUID and a secret key from MHN. The UUID is the honeypot's unique identifier, and the secret key is the authentication. Each honeypot publishes attack details to the HPFeeds broker that MHN installs, and each honeypot program has at least one channel it can publish to. For example, all Conpot instances publish to the "conpot.events" channel. Some honeypots have multiple channels, like one for events and one for captured malware. As the honeypot is installed, deploy.sh plugs these variables into the configuration so the honeypot knows where to publish the details. Here's an example of the clause it added to conpot.cfg when I deployed it to a Pi in my lab:

[hpfriends]
enabled = True
host = mhn.h-i-r.net
port = 10000
ident = cb53aad4-7f08-1337-beef-0ad36352028b
secret = eteKJ6frn9bwQ1Hs
channels = ["conpot.events", ]
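
For the curious, registration boils down to a single POST with the deploy key. The sketch below is a paraphrase from memory of what registration.sh does, not the literal script, and the field names are assumptions:

# Paraphrased registration sketch -- read registration.sh for the real details
curl -s -X POST -H "Content-Type: application/json" \
    -d '{"name": "pi-conpot", "hostname": "pi.lab", "deploy_key": "<your key>", "honeypot": "conpot"}' \
    https://mhn.h-i-r.net/api/sensor/
# The JSON response carries the UUID and secret that land in conpot.cfg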


Just as a honeypot can publish to a channel, it's possible to set up a subscription to these channels. If you connect to HPFeeds with a tool like hpfeeds-client from the Python hpfeeds package, you can watch in near-real-time as attacks happen. Assuming you have network connectivity, a valid identifier, a secret key and permission to subscribe to a channel, you can run the client from anywhere, and it'll pull the same feed that MHN is getting. In the below example, I'm only querying dionaea.connections and cowrie.sessions. No dionaea connections showed up in the few seconds I had the client running, however.



To add a feed client user, I had to copy the "add_user.py" script from the hpfeeds git repository to /opt/mhn/env/bin on the MHN server, then run it inside the virtualenv. The arguments are [identifier] [secret] [publish feeds] [subscribe feeds]. Since we don't need to publish with this tool, just subscribe, leave the publish list empty (in quotes) and pass the subscriptions for your new account as a quoted, comma-separated list, like so:

$ cd /opt/mhn
$ source env/bin/activate
$ python env/bin/add_user.py test s3cr3t "" "wordpot.events,amun.events,cowrie.sessions,shockpot.events,dionaea.connections,snort.alerts"

Add as many feeds to the subscription as you like. These can be found in the hpfeeds fields of your deployed honeypots. This is the list of channels that the internal mnemosyne feed is subscribed to, which is likely all the feeds generated by all the honeypots MHN knows about:
  • amun.events
  • beeswarm.hive
  • beeswarn.feeder
  • conpot.events
  • cowrie.sessions
  • cuckoo.analysis
  • dionaea.capture
  • dionaea.connections
  • elastichoney.events
  • glastopf.events
  • glastopf.files
  • kippo.sessions
  • mwbinary.dionaea.sensorunique
  • p0f.events
  • shockpot.events
  • snort.alerts
  • suricata.events
  • thug.events
  • thug.files
  • wordpot.events
To access the HPFeeds stream, you can use an HPFeeds client library or the reference client, written in Python. An example command to pull the channels we set up earlier:

hpfeeds-client -i test -s s3cr3t -c wordpot.events -c amun.events -c cowrie.sessions -c shockpot.events -c dionaea.connections -c snort.alerts subscribe

The output should look like this as sessions start rolling in (it can take a while if your honeypots sit idle a lot):
[feedcli] connected to @hp2
[feedcli] publish to cowrie.sessions by 4fe5c378-7ec7-11e7-b58e-abc36352928a: {"peerIP": "171.243.14.183", "commands": [], "loggedin": null, "version": "SSH-2.0-Granados-1.0", "ttylog": null, "urls": [], "hostIP": "redacted", "peerPort": 60332, "session": "abb74fda35ef", "startTime": "2017-09-14T12:13:30.060296Z", "hostPort": 22, "credentials": [], "endTime": "2017-09-14T12:13:30.524580Z", "unknownCommands": []}
[feedcli] publish to cowrie.sessions by f9b8c9d8-7ede-11e7-b58e-abc36352928a: {"peerIP": "171.243.14.183", "commands": [], "loggedin": null, "version": "SSH-2.0-Granados-1.0", "ttylog": null, "urls": [], "hostIP": "redacted", "peerPort": 60349, "session": "23ee29fca4f0", "startTime": "2017-09-14T12:13:32.161036Z", "hostPort": 22, "credentials": [], "endTime": "2017-09-14T12:13:32.857748Z", "unknownCommands": []}


You can do quite a bit with just the HPFeeds data stream: funnel it into a database with some shell scripts, for instance, or run some post-processing on the HPFeeds log with awk, as in the sketch below.
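
For example, given the cowrie.sessions lines above, a top-attackers report is one pipeline away. A sketch, assuming you've been teeing the client's output to a file called hpfeeds.log (the filename is hypothetical):

# Count the most active peer IPs seen in cowrie sessions
grep 'publish to cowrie.sessions' hpfeeds.log \
  | grep -o '"peerIP": "[0-9.]*"' \
  | awk -F'"' '{print $4}' \
  | sort | uniq -c | sort -rn | head -20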

Choosing a honeypot package

I have done a lot of testing over the past two months. Some of the deploy scripts needed work (see above), but a lot of packages still run fine. Here are the ones I recommend. Note that Cowrie and Kippo deployments will move your real SSH server to port 2222, so be prepared for that (see the note below).
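
In practice, that just means remembering to manage those boxes on the alternate port once the deploy script has run:

# The real sshd listens on 2222 after a Cowrie/Kippo deploy; the honeypot owns :22
ssh -p 2222 pi@your-honeypot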

For Raspberry Pi, the most recent Raspbian Jessie Lite image is probably your best bet, though the full Raspbian/PIXEL installation (from an image or from NOOBS) also works fine on the Pi 2 and Pi 3. Most of my testing was done with the old-school Pi Model B and low-capacity SD cards (2 and 4GB). On the Pi, Cowrie, Dionaea and ShockPot are my favorites. WordPot also runs well, but you need to choose between it and ShockPot if you want something versatile on HTTP, and WordPot uses more resources. Conpot installs and runs properly, but it seemingly reports every single HTTP connection as an attack, which is noisy and counter-intuitive in my opinion. Kippo works, but Cowrie does it better. Amun works, but it has a lot of overlap with Dionaea, which seems to provide more detail about attacks in the logs.

On Ubuntu 16.04 LTS, my picks are Cowrie, Amun and WordPot. If you have a lot of RAM to spare, add Snort to the instance as well, because it will report all kinds of things that trigger an IDS signature from the EmergingThreats rule repository. Snort takes some manual configuration changes after install. ShockPot works, but as with the Pi, you can really only run one HTTP honeypot. ConPot works, but it's noisy. Kippo works, but Cowrie is better. Dionaea is broken for 16.04, hence my recommendation of Amun. I've spun up a bunch of Ubuntu honeypots with Cowrie, Amun, WordPot and Snort all running on the same instance in EC2 and Google Compute Engine.

Confirmed broken on Ubuntu 16.04: Suricata and Glastopf. I haven't invested much time in fixing these, but Glastopf seems to rely on an ancient version of PHP that's (thankfully) missing from the package repository, and I'm not 100% sure what Suricata's deal is yet, but it looks like something that can be fixed in the config file without much trouble.

The SecKC crowd had questions about the safety of deploying honeypots. For the most part, the ones I discuss above are unlikely to lead to full shell access, but you can't be too sure. I had one attacker running tons of curl/wget commands from a Kippo honeypot last year, fetching ads, and essentially using my honeypot as a click fraud drone. Here was my advice from the presentation at SecKC:



Databases

I've found the MHN web front-end to be a bit lacking aside from the most basic of reports, and as I'd mentioned earlier, there's a problem when you add users through the web UI.

MHN uses two database servers: SQLite and MongoDB. The majority of the MHN metadata is in SQLite. Use the ".tables" and ".schema [tablename]" commands to explore MHN's SQLite database if you get curious; the user and roles_users tables are the most useful. All of the deploy scripts are stored in SQLite as well: they're imported from the shell scripts in the git repository upon installation, and the scripts in /opt/mhn/scripts/deploy_* are never referenced again.
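
For example, to poke around:

$ sudo sqlite3 /opt/mhn/server/mhn.db
sqlite> .tables
sqlite> .schema roles_users
sqlite> .exit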

If you want to revoke admin privileges from all but the user account you set up when installing MHN, I recommend doing this:

$ sudo sqlite3 /opt/mhn/server/mhn.db
sqlite> update roles_users set role_id=2 where user_id > 1;
sqlite> .exit

Most of the attack data, reporting and statistics are held in MongoDB. You can run the "mongo" command-line tool and start running queries directly. Mongo has "collections" instead of tables, and an interesting query syntax. There are a few databases, but the most useful are hpfeeds, where the subscription information is held, and the grand-daddy of them all, mnemosyne, which stores the normalized honeypot data. In reality, the Mnemosyne WebAPI might be easier for some to use, but I'm a fan of getting messy in the database. By default, the Mongo shell only displays 20 records at a time (type "it" to continue), but you can raise the default maximum by creating a file called ".mongorc.js" in your home directory and adding this line to it:

DBQuery.shellBatchSize = 100

Here are a couple of useful example queries I've come up with.

See the most recent attacks:
> use mnemosyne
> db.session.find().sort( { timestamp : -1 } );

Find records for IP addresses that start with "192.168"
> use mnemosyne
> db.session.find({source_ip : /^192\.168.*/ })

Delete all records of a specific IP address (e.g. yours) from the records:
> use mnemosyne
> db.session.remove({source_ip : "192.168.1.94"})

Delete all records older than 7 days, in the event Mongo is getting sluggish (change 7 * 24 to n * 24 for the number of days you want to keep):
> use mnemosyne
> db.session.remove( { timestamp: { $lt: (new Date((new Date()).getTime() - ( 7 * 24 * 60 * 60 * 1000 ) ) ) } } )

Get a list of IP addresses based on the number of honeypots they've connected to in the last day (I've limited this to SSH):
> use mnemosyne
> db.session.aggregate([
    { $match:
        { timestamp: { $gte: (new Date((new Date()).getTime() - ( 1 * 24 * 60 * 60 * 1000 ) ) ) },
        protocol: "ssh" }
    },
    { $group: { _id: "$source_ip", honeypots: { $addToSet: "$identifier"    } } },
    { $unwind: "$honeypots" },
    { $group: { _id: "$_id", honeypotcount: { $sum: 1 } } },
    { $sort : { honeypotcount : -1} },
    { $limit : 50 }
]);

Fun stuff

Brandon from SecKC and I collaborated on this fun dashboard, based on Rob Scanlon's Tron Legacy Encom Boardroom visualization. Brandon did all the code tweaks and wrote the middleware to get data into it. I just played project manager once the MHN firehose was ready. It really is a sight to behold. The "satellites" are approximate geo-locations for the deployed honeypots. Push-pins on the globe are attackers. The center pane is a live feed of attacks, with usernames and passwords where applicable. The right pane is a top attacker list and a map (if you click on the attacker) with more details. The chart at the bottom is the last 24 hours of attacks. Some of the panels won't display data unless you're logged in to avoid sharing the honeypot IPs publicly -- we're working on tweaking the back-end so that it can work well, safely, without authentication.