I've been playing with a bunch of malware lately. Most security researchers have run across obfuscated JavaScript, and we all have our favorite ways of defeating it. I'd written about one way to decode this sort of thing back in 2011.
Why bother decoding this stuff? Because encoded within this mess is
another URL. Depending on the source of the obfuscated code it may be a
link to a page full of exploit payloads or something similarly sinister.
Unwrapping the layers of malware allows researchers to find out where
the bad guys are actually hosting their stuff, and helps us identify
providers who are willing to help with or at least turn a blind eye to
cybercrime operations.
Lately, I've become more determined to handle javascript de-obfuscation outside the browser. Over the last year or so, I've been experimenting with a bunch of techniques to make sense of blocks of code that look like this:
I think it goes without saying that this looks like a monumental pain in the ass. The truth is, it's not as bad as it appears, but I didn't say it was going to be easy.
First, the basics. In JavaScript, FromCharCode() turns a decimal number between 0 and 255 into its ASCII character counterpart. You've all seen the ASCII table, right? Same thing.
I found a way to use printf in most shells to create a similar behavior, and I called this function "chr". So let's start really simple. Here's a file containing a message encoded with CharCodes, and a quick way to decode it. It simply reads each number in (by replacing , with a space and using a for loop, then prints each character one at a time.
This is the basis for the rest of what we're about to do. Let's break down that obfuscated javascript:
The yellow highlighted area is a bunch of numbers separated by the letter w. This is stored in an array labeled "f". If you look after the yellow, you can see that the code uses the letter w to split it.
The blue highlighted area is the loop that handles decoding the numbers into characters. This is obfuscated code, so by definition they've made it a bit confusing, but the end result is that they keep appending each character to the end of the "s" variable.
The bulk of the conversion of number to charcode (ascii decimal number of the character to be rendered) is here: (w[j]*1+41)
This gets us to the essence of the article: You'll need to use some brain power. The first thing I do is turn the data block into a string of comma-separated numbers so they're easier to work with, and I put these in their own file, like so. You can do this inside a text editor with search/replace, or on the CLI if you like.
Then you need to figure out the math. In our example code, what is w? what is j? You'll probably run into a bunch of bizarre variable reassignment when trying to make sense of obfuscated code like this. Looking above, you can see between the yellow and blue blocks, w=f. So w is now a copy of that array full of numbers. Inside the blue block of code, you can see that j=i. So, w[j] points to the current number in the array. But this number isn't the CharCode. There's still the "*1+41" part to deal with. Manually, we can see what's going on here. This is a very simple algorithm. The first 3 numbers are -32,-32,64.
-32*1+41 = -32+41= 9 - CharCode 9 is a tab.
64*1+41 = 64+41 = 105 - CharCode 105 is a lowercase "i"
Doing this manually would suck. The algorithm here is obvious. The CharCode is the number, plus 41. That's it. Let's decode it with my script:
So what happened here? This is the source of my relatively simple script.
At the heart of the above script is a nifty function I found in the Bash FAQ. And then, I used a pair of sed expressions to replace the "i" and "c" placeholders within the loop that handles the output. We call on the "bc" command-line calculator to work all the math magic. tr -d "\r" fixes some broken newlines found in some of the samples I decoded. Let's see it in action:#!/bin/sh # Obfuscated JS Decoder # Ax0n - 2013-03-22 # ax0n@h-i-r.net # if [ -z "$1" ] then echo "$0 codefile algorithm" echo "c = placeholder for each number" echo "i = iterator" exit 1 fi chr() { # http://mywiki.wooledge.org/BashFAQ/071 # Turns a charcode into the ASCII byte printf \\$(printf '%03o' $1) } count=0 file=$1 shift algo="$*" for code in `cat $file | tr "," " " `; do chr `echo "$algo" | \ sed -e s/"i"/"$count"/g -e s/"c"/"$code"/g | \ bc | cut -f1 -d\.` | tr -d "\r" count=`expr $count + 1` done echo echo "----- done ----"
Now, let's look at the one in the video. It has a much more complicated algorithm than simply adding 41 to each digit before converting it to the ASCII byte! It's hard to read in the video, so here it is:
Fortunately, this one already separates the codes with commas, so we can pretty much just copy and paste the numbers into a text file for decoding. As you can see on the 3rd-to-last line, the algorithm it uses for each character is this:
((w[j]*1+e(x+3)+11))
Let's tear into it, shall we?
w[j] starts out just like the last example. We can see they set j=i and w is a copy of the array, so for each iteration, this is the digit. We can replace this with "c" in our algorithm expression.
That leaves us to figure out what e and x are. Also like the last example, e is set to "eval" at the end of the second line. We'll ignore it. Let's look for x= in the code.
There it is! j% - and remember, j = i, so it's the iterator. % is a mathematical modulus operator. -- that is, it divides two numbers and the output is the remainder. Example: 17 divided by 4 is 4 with a remainder of 1.
$ echo "17 % 4" | bc
1
(x+3) becomes essentially, (i % +3) and +3 just means "positive 3" in this context. We don't even need the +.
((w[j]*1+e(x+3)+11)) becomes ((c * 1 + ( i % 3 ) + 11)) or simply "c + ( i % 3 ) + 11"
You can see these two examples plus two more here (the password is "infected"):
http://stuff.h-i-r.net/bhlog.zip