I've been slowly getting into Infosec in a big way for a year or so now, and I've been blogging
for a few months as I learn about stuff and have shitty opinions about stuff.
The latest thing I've been getting into is reverse engineering and binary
exploitation, and I wanted to talk about it.
I got into this annoyingly complex world via a game. I can't quite believe it but I basically taught myself to do this by playing Microcorruption, an excellent simulation of a hardware lock controlled by a small CPU. You can see the memory and the registers (areas where the CPU stores data that it's working on at the time) in real time and figure out how to exploit bugs in the code by putting in different inputs for usernames and passwords and hijacking the execution flow.
I'm not a coder at all - don't have the brain for algorithms. But being able to progress at a steady rate while learning tricks in a logical order really helped me to pick up the concepts, and I was pwning vulns in style... up until it got a bit too hard and I couldn't figure out how heaps work.
If you have the time you should play Microcorruption before you read the rest of this blog. It'll help if this is new to you.
So... we're gonna have a look at some shitty code on a Linux system and end with a way of gaining root privileges. Other people have written proper tutorials about what buffer overflows are and how they work, and you can find information about how CPUs work as well - so I won't be doing that. I'll assume a bit of knowledge, and I think that's fair for most people coming into the infosec world.
Crashing Into The Buffers
I will, however, detail my learning process and show enough to understand what's going on. And the first thing to do is look at the shitty code in question:
We'll save this as "part1.c" and compile it later on. You can see a buffer of 80 bytes and a function - read() - that copies up to 400 bytes into the buffer from stdin (the 0 is the file descriptor for stdin), so we can potentially overflow the buffer with a really long string and overwrite other things in memory. Not the most useful chunk of code, but good for a tutorial.
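The listing itself isn't reproduced here, so here's a minimal sketch that matches the description. Treat it as an assumption rather than the exact original: the 80-byte buffer, the read() call with 400, and the function name vuln() come from the text, while the buffer name and the shape of main() are guesses.

```c
/* part1.c - hypothetical reconstruction, deliberately vulnerable.
 * Only for a throwaway lab machine. */
#include <unistd.h>

void vuln(void)
{
    char buf[80];   /* 80 bytes of stack space (the name is an assumption) */

    /* fd 0 is stdin. read() will accept up to 400 bytes,
     * which is 320 more than buf can hold. */
    read(0, buf, 400);
}

int main(void)
{
    vuln();
    return 0;
}
```

A recent gcc may warn about the oversized read() when compiling this. That's the compiler spotting the same bug we're about to abuse.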
A prerequisite for this tutorial is understanding a little bit about how programs execute in memory and how a processor does stuff. And this really is a barebones explanation. It's a book all by itself for some people. Not me. I don't care.
CPU Registers
Firstly we'll look at registers. These are basically little storage boxes for CPUs to hold information that they are working on absolutely right now. Things like variables are read from memory into registers, or pointers to them are read into registers, and then the CPU performs some action on them. Then the results are usually written back to memory somewhere. Maybe. Not always.
Stacks
The next thing to understand is how compiled code uses memory to store stuff it needs. The simplest way to do that (and there are others) is to use something called a stack. Stacks contain things such as a function's working variables and calling parameters (that actually varies for 32 vs 64 bit, as 64 bit apps read most parameters out of registers when functions are called). A function will set up an area of memory (a "frame") when it is called that is big enough for what it needs to do, and then release it when it's done. Say you have a bit of code with two functions, a() and b(). a() will run, do its thing, and call b(). b() will do its own thing and use some variables of its own. Then it will exit, program execution will return to a(), and a() will complete and return to whatever called it. The stack would look (with a few simplifications) something like:
Stacks use a "last in, first out" system for storing and retrieving data. Stacks grow in size from high addresses to low addresses, and that's good because you don't always know how much space you'll need when you're merrily pushing stuff on there. It's better to go backwards into (mostly) nothing than forwards into something. After b() has finished the stack will shrink back up again, from lower addresses to higher addresses.
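If you want to see the direction of growth for yourself, here's a minimal sketch of the a() and b() example. It assumes gcc or clang (for the noinline attribute) and a typical x86-64 or ARM Linux box where the stack grows downwards. The variable names are mine, and the addresses printed will differ on your machine.

```c
#include <stdint.h>
#include <stdio.h>

/* b() reports where one of its own locals lives. noinline keeps it a real
 * call with its own frame instead of being folded into a(). */
__attribute__((noinline)) void b(uintptr_t *where)
{
    volatile char local_b = 'b';
    *where = (uintptr_t)&local_b;   /* only the number is kept, never dereferenced */
}

/* a() compares the address of its own local with the one reported by b().
 * Returns 1 if b()'s local sits at a lower address than a()'s. */
__attribute__((noinline)) int a(void)
{
    volatile char local_a = 'a';
    uintptr_t addr_a = (uintptr_t)&local_a;
    uintptr_t addr_b = 0;

    b(&addr_b);

    printf("local in a(): %#lx\n", (unsigned long)addr_a);
    printf("local in b(): %#lx\n", (unsigned long)addr_b);
    return addr_b < addr_a;
}
```

Call a() from a main() of your own. On the platforms assumed above the second address should come out lower than the first, because b()'s frame was pushed after a()'s.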
That return address is the most important thing. It holds the address to go back to after the current function has finished - essentially the address of the instruction after a call <function name>. Functions will use bits of data and each function called makes the stack grow backwards. The return address needs to be stored somewhere (a function may start a chain of many others before the code gets back to the original one that called it), so it gets put on the stack to be retrieved again later.
Here's what our stack might look like for the above code - remember, stacks grow backwards and this is simplified:
If we could overwrite that return address we could make vuln() return to a place of our choosing. We have lots of options. The first one is to return to our own input. What if we could enter data into the buffer that was actually valid code? Then we could return to it and execute it!
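To make that concrete without crashing anything, here's a toy model of the idea in C. It is not the real stack: the struct, the 0x401136 "return address" and the function name are all made up for illustration. It also assumes the simplified layout of buffer, then saved frame pointer, then return address; a real compiler may add padding, so the true distance has to be found with a debugger.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ORIGINAL_RETURN 0x401136ULL   /* hypothetical address back in main() */

/* Simplified picture of vuln()'s frame, lowest address first. */
struct fake_frame {
    char     buf[80];          /* offset 0  */
    uint64_t saved_rbp;        /* offset 80 */
    uint64_t return_address;   /* offset 88 */
};

/* Copies input_len bytes of 'A' into the start of the frame, the way an
 * unchecked read() into buf would, and returns what is left in the
 * return address slot afterwards. */
uint64_t simulate_overflow(size_t input_len)
{
    struct fake_frame frame;
    unsigned char input[sizeof frame];

    frame.saved_rbp = 0;
    frame.return_address = ORIGINAL_RETURN;
    memset(input, 'A', sizeof input);

    if (input_len > sizeof frame)
        input_len = sizeof frame;        /* stay inside the model */
    memcpy(&frame, input, input_len);    /* starts at buf, runs upwards */

    return frame.return_address;
}
```

With 80 bytes the return address survives. With 96 bytes it comes back as 0x4141414141414141, which is eight 'A' characters. In the real program that's the value the CPU would try to jump to when vuln() returns.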
Not so fast!
Yeah, not that simple. After this kind of "buffer overflow" attack was developed, CPU manufacturers added the ability to mark certain areas of memory as non-executable - known as NX (no-execute). We're gonna turn this off when we compile the above code, but it's really easy to beat. However, we will also need to turn off two other important protections.
Firstly we have ASLR (address space layout randomisation). This randomises the addresses of things like the stack and important shared library functions. If you knew those addresses you could target them, and we'll come to that later on. But ASLR makes it harder to find out what those addresses are. It is beatable under the right conditions, but for this tutorial we'll turn it off. Enter the following:
echo 0 | sudo tee /proc/sys/kernel/randomize_va_space
That won't survive a reboot - probably a good thing - but you can switch that 0 for a 2 to turn it back on.
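If you'd rather confirm the setting from code, here's a small C sketch that reads the same file. The function name aslr_mode is made up for this example. The values are the kernel's: 0 means off, 1 randomises the stack, mmap region and shared libraries, and 2 also randomises the heap.

```c
#include <stdio.h>

/* Returns the current ASLR mode (0, 1 or 2) as reported by the kernel,
 * or -1 if the file can't be read (for example, not on Linux). */
int aslr_mode(void)
{
    FILE *f = fopen("/proc/sys/kernel/randomize_va_space", "r");
    int mode = -1;

    if (f == NULL)
        return -1;
    if (fscanf(f, "%d", &mode) != 1)
        mode = -1;
    fclose(f);
    return mode;
}
```

After running the echo command above, aslr_mode() should return 0.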
We'll also want to turn off the protection that makes this entire tutorial somewhat pointless: stack canaries. Named for their mining heritage, these canaries are randomly generated values that are placed after all of the function's local variables and before the return pointer. This value is checked before the function returns, and if it doesn't match the original the process will exit with an error.
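Here's a toy model of that check in C. It is only a sketch of the idea, not what gcc actually emits: the struct layout, the function name and the secret passed in are all assumptions made for illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simplified frame with a canary sitting between the locals and the
 * return address. */
struct guarded_frame {
    char     buf[80];
    uint64_t canary;
    uint64_t return_address;
};

/* Writes input_len bytes of 'A' into the frame starting at buf, then does
 * what the compiler-inserted check does before returning: compare the
 * canary slot with the secret. Returns 1 if intact, 0 if smashed. */
int canary_intact_after_copy(size_t input_len, uint64_t secret)
{
    struct guarded_frame frame;
    unsigned char input[sizeof frame];

    frame.canary = secret;
    frame.return_address = 0x401136ULL;   /* hypothetical */
    memset(input, 'A', sizeof input);

    if (input_len > sizeof frame)
        input_len = sizeof frame;          /* stay inside the model */
    memcpy(&frame, input, input_len);

    return frame.canary == secret;
}
```

An 80-byte input leaves the canary alone. Anything long enough to reach the return address has to go through the canary first, so the check fails, unless the attacker's bytes happen to equal the secret.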
And they're unbeatable. Sort of. Another bug in the same code (such as a format string vulnerability) could lead to an information leak which gives you the stack canary value, or allows you to use other functions in memory to overwrite the return pointer without overwriting the canary. The canary seeding/checking code is added by the compiler (gcc usually) and is enabled by default these days. But for CTF challenges it is usually turned off, and it's still worth learning how to do buffer overflows so you can understand other concepts. Let's compile the code now that we have ASLR turned off. We'll also turn off NX and stack canaries:
gcc part1.c -o part1 -fno-stack-protector -z execstack
Finally, I'll introduce checksec here - a useful script which will show you which protections are in place on a binary. A version of it comes with pwntools, something we'll learn more about in the next blog. Ignore the protections you haven't heard of for now. I certainly do. But we can see that there's no NX and no stack canaries. Nice.
And that'll do it for the introduction. In part 1 proper we'll install some useful tools and investigate our new binary.
Part 1: Link