Cracking large hash dumps is easier than ever...

I've had an interesting few weeks running a CTF event for some close friends. I did it on a very low-powered AWS EC2 instance which did the job admirably - and for the princely sum of zero dollars.

I ran a password cracking challenge as part of this CTF. A large hashset, the origin of which was hidden from my friends, was provided and people had to crack as many as they could. I didn't want them to find out the source and find a list of cracked hashes for that set. If anyone did by accident then that's fine I guess... but more on that later. I'll confirm to everyone now that it came from the Battlefield Heroes forum leak from about 5 years ago. The hashes were unsalted MD5, which makes them easy and fast to crack.

I decided to take part in the challenge as well to set a benchmark and I'd like to talk a bit about the approaches that I took and what it taught me about password cracking. There are no pictures in this one and it's text heavy. And no, this isn't new work. Ars Technica famously did it a few years ago to prove that any old idiot with an internet connection and sufficient time can crack passwords. And they were right. A few of my friends entered who had little experience with this kind of thing and did quite well. I had great success indeed, finishing with 86.3% of the total hashes (548,686) cracked.

Sadly, I'm not organised enough to have thought about taking screenshots before I started doing this. I'm only writing about it afterwards because I thought it was interesting to show how to approach this kind of thing. But I can hopefully reconstruct my memories of the techniques, apps, and options used and show how easy it is to do this kind of mass cracking whenever there's a new hash dump from a large corporate website. I'm not documenting anything new here, but I do like to write words a lot.

The first place to go to do something like this is always Kali Linux. I have it installed on a partition on my new i5 laptop. Not a great bit of kit for cracking, but we'll see how we get on. I downloaded the hashset and set myself up a work folder on my hard drive. Next up I updated everything (OS and apps) to the latest versions - a lot of cracking utilities have very active development and any optimisation is worth checking for.

Finally, we'll need some wordlists to base our password guessing on. These wordlists, as we shall see, will be mangled and modified in all sorts of ways by automated and user-customisable rule sets. I have gone through the default rulesets of John The Ripper and Hashcat - the two main utilities we will use - to compress some shorter rule sets into one large rule set to save my typing fingers. I haven't done that with wordlists, as it is sometimes useful to keep these in organised themes for testing different sections of the English language as required. Wordlists being targeted to themes can be a good idea to mimic human behaviour (local sports teams, terminology from the company if it's a corporate network, etc).

Wordlists initially downloaded:

phpbb.txt
rockyou.txt
MySpace.txt - these three are all wordlists containing real-world passwords from other previous hacks - a great resource.
English.txt - a standard English dictionary - about 1.5 million words
+ John the Ripper's small default password.lst - there's barely anything in this but it's worth running for a few edge cases. I may check how many of its entries are duplicated elsewhere and stop using it in the future.

Here's a great source for some wordlists to get you started:
https://github.com/danielmiessler/SecLists/tree/master/Passwords

Where to start?

The easiest place to start with something like this is a simple wordlist attack. This will run our hashes against hashes of the contents of our wordlists. I usually run these with rules included right from the start. Most rule sets and applications will test the original word as well as the rule modifications, so you save a bit of time here.
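
To make the idea concrete, here's a minimal Python sketch of a wordlist attack against unsalted MD5. It's an illustration only, not what Hashcat does internally: the handful of toy mangling rules and the function names (md5_hex, mangle, wordlist_attack) are assumptions of mine, and real rule sets are far larger.

```python
import hashlib


def md5_hex(candidate: str) -> str:
    """Return the hex MD5 digest of a candidate password."""
    return hashlib.md5(candidate.encode("utf-8")).hexdigest()


def mangle(word: str):
    """Yield the original word plus a few toy rule-style variants.

    These rules are illustrative assumptions, not a real Hashcat rule set.
    """
    yield word
    yield word.capitalize()
    yield word.upper()
    for suffix in ("1", "123", "!"):
        yield word + suffix


def wordlist_attack(target_hashes, words):
    """Hash every mangled candidate and look it up in the target set.

    Returns (cracked, remaining): a dict of digest -> plaintext and the
    set of digests that were not found.
    """
    remaining = set(target_hashes)
    cracked = {}
    for word in words:
        for candidate in mangle(word):
            digest = md5_hex(candidate)
            if digest in remaining:
                cracked[digest] = candidate
                remaining.discard(digest)
    return cracked, remaining
```

Because the original word is the first thing mangle yields, a run with rules also covers the plain wordlist, which is why a separate rule-free pass isn't needed.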

I basically ran all 4 of those wordlists against every rule that comes with Hashcat (sticking to one tool for now). I made sure to use the --remove option so that found hashes were removed from the source file. This made things easier to work with. I also ran with -D to specify that I want to use both my CPU and integrated GPU for cracking.

This actually took a fair amount of time, lots of command line work and lots of pushing the up arrow and changing a command by a very small amount. In the future it may be worth trying to make some kind of Python-based system for this kind of work.
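
Something like the following is roughly what I have in mind: a small Python sketch that builds one Hashcat command per wordlist and rule file combination, instead of editing the command by hand each time. The file names are hypothetical placeholders and the function name build_commands is my own; the flags are the standard Hashcat ones (-m 0 for raw MD5, -a 0 for a straight wordlist attack, -r for a rule file, --remove to drop cracked hashes from the hash file).

```python
import itertools
import shlex


def build_commands(hash_file, wordlists, rule_files):
    """Return one hashcat argument list per (wordlist, rule file) pair."""
    commands = []
    for wordlist, rule in itertools.product(wordlists, rule_files):
        commands.append([
            "hashcat",
            "-m", "0",      # hash mode 0: raw MD5
            "-a", "0",      # attack mode 0: straight (wordlist)
            "--remove",     # remove cracked hashes from the hash file
            hash_file,
            wordlist,
            "-r", rule,     # apply this rule file to every word
        ])
    return commands


if __name__ == "__main__":
    # Hypothetical file names, purely for illustration.
    for cmd in build_commands(
        "hashes.txt",
        ["rockyou.txt", "phpbb.txt"],
        ["first.rule", "second.rule"],
    ):
        print(shlex.join(cmd))
```

Printing the commands rather than running them keeps the sketch harmless; each list could be handed to subprocess.run once the paths are real.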

I also fed in some found password lists, given the obvious issue of password reuse. I took at least two or three dumps' worth of passwords from hashes.org. As we'll see later on, places like this are a goldmine. These lists accounted for most of the cracks from the wordlist-based attack runs. One interesting trick I found early on was to feed the list of found passwords back into Hashcat to have them manipulated by the rules. This gets a surprisingly large number of extra solves. Humans work pretty similarly most of the time.
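
The feedback trick can be sketched in a few lines of Python. This is a toy model under stated assumptions: a single made-up rule (append one digit) stands in for a full rule set, and the names append_digit and crack_with_feedback are mine. Each round, only the passwords found in the previous round are mangled again.

```python
import hashlib


def md5_hex(candidate: str) -> str:
    """Return the hex MD5 digest of a candidate password."""
    return hashlib.md5(candidate.encode("utf-8")).hexdigest()


def append_digit(word: str):
    """Toy rule (an assumption): append a single digit to the word."""
    for digit in "0123456789":
        yield word + digit


def crack_with_feedback(target_hashes, seed_words, rule=append_digit, max_rounds=5):
    """Apply the rule to the seeds, then re-apply it to whatever was found."""
    remaining = set(target_hashes)
    found = {}
    frontier = list(seed_words)
    for _ in range(max_rounds):
        newly_found = []
        for word in frontier:
            for candidate in rule(word):
                digest = md5_hex(candidate)
                if digest in remaining:
                    remaining.discard(digest)
                    found[digest] = candidate
                    newly_found.append(candidate)
        if not newly_found:
            break  # nothing new this round, so further rounds won't help
        frontier = newly_found  # found passwords become the next wordlist
    return found, remaining
```

With the seed "dragon", the first round can find "dragon1" and the second round, working from "dragon1", can find "dragon12", a password the rule could never have reached from the original list in one pass.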

After running through all of these lists and words I basically ended up with just over 50% of the passwords cracked. It took me about two hours. Not a bad start at all.

Crack out the cycles: Brute force attacks

My shitty laptop is not the first thing I'd use to brute force a password. Its slow speed and crappy GPU are not conducive to the kind of mathematical work required to calculate a hash for each password candidate (remember: these are one-way functions, so the only way to test a guess is to hash it and compare). So I need to limit my brute forcing to small keyspaces - it gave me an ETA of a year or so to test the entire 1-8 character ASCII keyspace. Not helpful. 7 characters, however, said only three days. Better. And 7 characters with only lower and upper case took even less time. This was worth doing and cracked quite a few thousand hashes.
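
The arithmetic behind those ETAs is easy to reproduce. Here's a small Python sketch, assuming 95 printable ASCII characters and a hash rate of 200 million MD5 guesses per second. That rate is a guess of mine, picked only because it lands in the same region as the ETAs quoted above; it's not a measured benchmark from my laptop.

```python
PRINTABLE_ASCII = 95      # printable ASCII characters, including space
MIXED_CASE = 52           # a-z plus A-Z
SECONDS_PER_DAY = 86_400
ASSUMED_RATE = 200_000_000  # hashes per second: an assumption, not a benchmark


def keyspace(charset_size: int, min_len: int, max_len: int) -> int:
    """Number of candidates of every length from min_len to max_len."""
    return sum(charset_size ** n for n in range(min_len, max_len + 1))


def eta_days(candidates: int, hashes_per_second: float) -> float:
    """Days needed to exhaust a keyspace at a constant hash rate."""
    return candidates / hashes_per_second / SECONDS_PER_DAY


if __name__ == "__main__":
    runs = {
        "1-8 chars, full ASCII": keyspace(PRINTABLE_ASCII, 1, 8),
        "7 chars, full ASCII": keyspace(PRINTABLE_ASCII, 7, 7),
        "7 chars, mixed case": keyspace(MIXED_CASE, 7, 7),
    }
    for label, size in runs.items():
        print(f"{label}: {size:,} candidates, "
              f"{eta_days(size, ASSUMED_RATE):.2f} days")
```

The point is the exponent: each extra character multiplies the work by the size of the character set, so going from 7 to 8 full-ASCII characters is the difference between days and about a year at this assumed rate.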

I also ran some numerical masks, from 0-10,000,000 which didn't take long to run. Some people pick the silliest passwords. I basically dicked about brute forcing several lengths of characters, including testing the entire ASCII keyspace from 1-6 characters, and more selective tests from 7-9 characters, plus the longer all-numeric tests.
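
For a sense of why the numeric runs were so quick, here's a short Python sketch that works out the size of a Hashcat-style mask. It only understands the built-in character sets ?l, ?u, ?d, ?s and ?a (26, 26, 10, 33 and 95 characters respectively); custom charsets and anything else are deliberately left out, and the function name mask_keyspace is my own.

```python
# Sizes of the built-in mask character sets handled by this sketch.
BUILTIN_CHARSETS = {
    "l": 26,  # ?l lowercase letters
    "u": 26,  # ?u uppercase letters
    "d": 10,  # ?d digits
    "s": 33,  # ?s printable ASCII symbols, including space
    "a": 95,  # ?a all of the above
}


def mask_keyspace(mask: str) -> int:
    """Number of candidates a mask generates (literal characters count as 1)."""
    total = 1
    i = 0
    while i < len(mask):
        if mask[i] == "?":
            key = mask[i + 1] if i + 1 < len(mask) else ""
            if key not in BUILTIN_CHARSETS:
                raise ValueError(f"unsupported mask token at position {i}")
            total *= BUILTIN_CHARSETS[key]
            i += 2
        else:
            i += 1  # a literal character does not widen the keyspace
    return total


def numeric_masks(max_digits: int):
    """All-digit masks from 1 up to max_digits characters long."""
    return ["?d" * n for n in range(1, max_digits + 1)]


if __name__ == "__main__":
    for mask in numeric_masks(7):
        print(mask, mask_keyspace(mask))
```

Every all-digit password up to 7 characters long adds up to about 11 million candidates in total, which is nothing next to even a 6 character full-ASCII run.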

After this method I think I was at about 70-75% through the set. It took about 5 hours. Ridiculous. 

Specialist attacks: PRINCE & Markov chains

An interesting method which I broke out after I started to get diminishing returns was to use the John the Ripper mode PRINCE. This uses an algorithm developed to try to generate passwords in the same way that humans do. It uses tricks such as appending numbers, combining words (actually a cracking attack in its own right) and other things that mimic our password-picking behaviour.
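
The word-combining part of this can be sketched very simply in Python. To be clear, this is a heavily simplified illustration of chaining words from a single list, not the PRINCE algorithm itself: the real thing is much smarter about the order in which it tries chains. The function name chain_candidates and its parameters are my own.

```python
import itertools


def chain_candidates(words, max_elems=2, min_len=1, max_len=16):
    """Yield every chain of 1..max_elems words whose total length fits.

    A toy stand-in for PRINCE-style chaining; it does no clever ordering.
    """
    unique = sorted(set(words))
    for n in range(1, max_elems + 1):
        for combo in itertools.product(unique, repeat=n):
            candidate = "".join(combo)
            if min_len <= len(candidate) <= max_len:
                yield candidate


if __name__ == "__main__":
    for candidate in chain_candidates(["love", "2016", "kitty"], max_elems=2):
        print(candidate)
```

Note how a list containing both words and numbers produces things like a word followed by a year without any rule being written for it, which is the appeal of this style of attack. The output also grows very quickly with the size of the list, which is why a run has no natural end.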

I used all of the wordlists I had up to this point as the basis for this attack, basically running it for 30-60 minutes or so (it technically has no limit to the run time) on each of the lists that I had. This was a tremendous success and the word I'd use to describe the cracked passwords coming in is a flow. A steady flow at a really fast rate. I easily broke through 80% with this method, finally reaching about 82-83% by the time I gave up. Took about 6-7 hours in total.

One thing I didn't get a chance to use was a Markov-chain attack. Mostly because I couldn't get it to work as there's a key executable used to analyse the wordlists which didn't seem to be available either with the main Hashcat package or from the internet. If anyone can help with this then please let me know. Apparently it's an interesting attack mode and can help with reducing a brute force keyspace down to characters that make sense for humans to be using in certain positions. So maybe not an exhaustive search but a great shortcut to get started with.
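
Since I never got the real thing running, here's only a rough Python sketch of the underlying idea as I understand it: look at which characters people actually use in each position, then brute force only the most common ones. This is plain per-position frequency counting, which is simpler than a true Markov chain (it ignores the previous character), and all of the function names are my own.

```python
import itertools
from collections import Counter


def position_stats(words, length):
    """Count how often each character appears in each position."""
    counters = [Counter() for _ in range(length)]
    for word in words:
        if len(word) == length:
            for i, ch in enumerate(word):
                counters[i][ch] += 1
    return counters


def reduced_charsets(counters, top_n):
    """Keep only the top_n most common characters for every position."""
    return ["".join(ch for ch, _ in c.most_common(top_n)) for c in counters]


def candidates(charsets):
    """Brute force the reduced keyspace, likeliest characters first."""
    for combo in itertools.product(*charsets):
        yield "".join(combo)


if __name__ == "__main__":
    training = ["abc", "abd", "xbc"]  # stand-in for a found-password list
    charsets = reduced_charsets(position_stats(training, 3), top_n=2)
    print(charsets)
    print(list(candidates(charsets)))
```

Lowering top_n shrinks the keyspace dramatically at the cost of missing passwords that use an unusual character somewhere, which is the trade-off described above: not exhaustive, but a good shortcut.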

Final mop-up

I wasn't satisfied. I had the itch. I knew I'd probably done enough to beat everyone else but the satisfaction of seeing the passwords flow in... I was addicted.

Therefore the last few hours were spent wringing every last mutation from every last wordlist and rule that I had at my disposal, plus some reuse of the old attacks with updated wordlists every step of the way. This plus a few extra passes of some other attacks got me to a final count of 86.3%.

Conclusion

So, what have I found out doing this? Certainly if I was doing a CTF and I was given a list of hashes to crack I'd start by running large lists of found real-world passwords as the absolute first task. Given what we know about how many people reuse passwords it's entirely possible that you could knock off a huge amount of them almost straight away. I don't consider that cheating. I consider it being very clever.

The second thing to consider is that, again given how people reuse passwords, found password lists are a real danger: they are the easiest way to get started with using those credentials for nefarious means. Or with confirming a new leak is authentic, if you need to.

Thirdly, this was piss easy. Password cracking is so simple children could do it. It takes a little bit of reading and some rational thought. That's all. So, no more unsalted MD5, please. Humans can't be trusted to use or remember a password that's complicated enough.

For a final hurrah I downloaded a list of 2 billion found passwords from multiple leaks (including the Battlefield one), all collated together with duplicate removal AND sorted so that passwords found more often are higher in the list. A great resource which can be found here. Running that to test the list resulted in a final count of something tiny like 7,600 passwords remaining to be cracked. Hashes.org has an updated total, and it's in the very high 90s for total percentage cracked. I definitely think that list is the best place to start with absolutely any hack of a dumped credential, given how similar humans can be when making up passwords, coupled with the reuse problem.
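
The way a list like that is put together is simple enough to sketch in Python: merge several found-password lists, remove duplicates, and sort so that the passwords seen most often come first. This is my own small illustration with a made-up function name, and it holds everything in memory, so a list with billions of entries would need a different approach.

```python
from collections import Counter


def merge_by_frequency(password_lists):
    """Merge lists into one, deduplicated and sorted most common first."""
    counts = Counter()
    for passwords in password_lists:
        counts.update(passwords)
    # most_common() returns (password, count) pairs, highest count first
    return [password for password, _ in counts.most_common()]


if __name__ == "__main__":
    leaks = [
        ["123456", "password"],
        ["password", "qwerty"],
        ["password", "123456"],
    ]
    print(merge_by_frequency(leaks))
```

Ordering by frequency matters because the most reused passwords get tested first, so the bulk of the easy cracks arrive almost immediately.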

And that's that. A fun challenge and a nice way to learn a good cracking methodology. Which I'll probably never, ever need.
