Introduction
I’m writing a series of tutorials on x86 assembly for C programmers who are already familiar with many of the basics of programming and computing. The assembly tutorials available online just aren’t doing it for me, and I need something organized the way I think, on the topics I’m interested in, presented in a way that makes comprehensive understanding easy. I’ll do the work, go find the answers, and then drop everything here for you to enjoy.
Please note I do not claim to be an expert on the assembly language.
My interest in assembly is for optimizing C applications and for developing exploits for vulnerabilities in common applications, not for writing applications in assembly from scratch. I’m not interested in “good” examples of assembly; I’m interested in real examples. This will affect the assembly we look at. More specifically, I write the code in C, compile it with gcc, and whatever comes out is what we’ll be dissecting.
For the purposes of these tutorials, we’ll be looking at 32-bit x86 assembly. Everything is compiled, built, and disassembled on the latest stable release of Ubuntu.
References
The Art of Assembly is an excellent reference, and if you need clarification of any of the topics discussed, I recommend checking it out. Chapter six covers all of the instructions, how they work, and what specifically they do.
Thanks To:
Bushmills from ##asm on irc.freenode.net for taking the time to explain to a noob why the first 7 lines of assembly were what they were.
The Code
Let’s take a look at a simple C application and its disassembled assembly code.
gcc one.c -o one
#include <stdio.h>
int main (int argc, char * argv [])
{
int i;
argc++;
for (i = 0; i < 10; i++)
printf("%d\n", i);
return 0;
}
Disassembled counterpart (for main):
objdump -d one -M intel
080483c4 <main>:
80483c4: 8d 4c 24 04 lea ecx,[esp+0x4]
80483c8: 83 e4 f0 and esp,0xfffffff0
80483cb: ff 71 fc push DWORD PTR [ecx-0x4]
80483ce: 55 push ebp
80483cf: 89 e5 mov ebp,esp
80483d1: 51 push ecx
80483d2: 83 ec 24 sub esp,0x24
80483d5: 83 01 01 add DWORD PTR [ecx],0x1
80483d8: c7 45 f8 00 00 00 00 mov DWORD PTR [ebp-0x8],0x0
80483df: eb 17 jmp 80483f8
80483e1: 8b 45 f8 mov eax,DWORD PTR [ebp-0x8]
80483e4: 89 44 24 04 mov DWORD PTR [esp+0x4],eax
80483e8: c7 04 24 d0 84 04 08 mov DWORD PTR [esp],0x80484d0
80483ef: e8 04 ff ff ff call 80482f8
80483f4: 83 45 f8 01 add DWORD PTR [ebp-0x8],0x1
80483f8: 83 7d f8 09 cmp DWORD PTR [ebp-0x8],0x9
80483fc: 7e e3 jle 80483e1
80483fe: b8 00 00 00 00 mov eax,0x0
8048403: 83 c4 24 add esp,0x24
8048406: 59 pop ecx
8048407: 5d pop ebp
8048408: 8d 61 fc lea esp,[ecx-0x4]
804840b: c3 ret
804840c: 90 nop
804840d: 90 nop
804840e: 90 nop
804840f: 90 nop
This is a list of the instructions used above. We'll explain what each of these instructions does as we come across them later:
- lea - Load Effective Address
- and - logical AND
- push - PUSH data on to the stack
- mov - MOVe (copy) data between registers, memory, and immediate (constant) values
- sub - SUBtract
- jmp - JuMP
- call - CALL another subfunction
- add - ADDition
- cmp - CoMPare
- pop - POP data off the stack
- ret - Return control to the parent function
You'll notice we left off jle. jle means "jump if less than or equal to", and it is a conditional variant of the jmp instruction. You can find all the variations in any assembly reference.
Now let's take a look at the registers used:
- esp - Stack Pointer (for the top of the stack).
- ecx - Counter (used for other purposes described later)
- ebp - Base Pointer
- eax - Accumulator Register (Arithmetic Operations)
If you don't understand exactly what all these registers are, we'll describe them later, and you will see how they are used.
Some Background:
First, some vocabulary:
- Stack: This is, surprise, an implementation of the data structure known as the stack. We use this stack to keep track of information about the program during the course of its running.
- Register: Think of registers as our variables: small, fast storage locations built into the processor. When a register holds a memory address, it acts like a pointer, and we dereference it by putting it in [].
- Instruction: An instruction is an operation we want to run on the processor.
- Operand: Quite simply, an operand is an argument to an instruction.
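- Word: Wikipedia defines a word as the natural unit of data used by a particular processor design. We're using a 32-bit operating system, so for our purposes a word is 32 bits, or 4 bytes. Be aware, though, that x86 assembly syntax keeps the older 16-bit meaning: WORD PTR refers to 16 bits, and DWORD (double word) PTR refers to 32 bits.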
The x86 Stack and esp
The x86 stack is a LIFO mechanism we use to store information, LIFO being Last In, First Out. push puts data on the stack, pop takes data off the stack. push and pop manipulate data relative to esp, which is the stack pointer.
The stack grows down, meaning we start at higher memory addresses, and as the stack grows, we end up at lower memory addresses. esp is often referred to as pointing to the top of the stack, but in diagrams the top of the stack is drawn at the bottom (because we put higher addresses at the top and lower addresses at the bottom).
push decrements esp before storing a value on the stack, not after, so esp always points to the last element added to the stack.
This may be a bit confusing now, but by the end of the first 7 instructions, you should have a good handle on it.
When we call a function, the stack typically looks like this:
------------------
| argument 1     |
------------------
| argument 0     |
------------------
| return address | <- esp is here
------------------
This is how the function inherits the stack. In most simple tutorials, a few more instructions are executed at the beginning of the function to give us a stack like this:
------------------------
| argument 1           |
------------------------
| argument 0           |
------------------------
| return address       |
------------------------
| original ebp         | <- ebp points here
------------------------
| stack data variables | <- esp is here
------------------------
Aligning the Stack
This stack, as you will see, is nothing more than a region of memory addressed relative to esp, and esp is the only way we can identify where we are in the stack. If we change esp directly, we change our location in the stack without using push or pop.
We want the stack to be "aligned", meaning we want esp (and our stack variables) to start at an address ending in 0 in hexadecimal, which is the same as saying the address is evenly divisible by 16; think of it whichever way is easiest for you. Alignment can speed up some memory operations, but more importantly, SSE instructions (which work on 128 bits at once) may require 16-byte-aligned memory operands, so improperly aligned variables can lead to some spectacular failures.
It all has to do with memory segmentation. (Edit: Not really; see jldugger's comment. Processor design is what matters here. I'm going to go ahead and file this under not too terribly important to understand. Keep reading, you'll be fine, I promise.) If you're really interested, do some reading. For now, just know we want our stack to be properly aligned, and that's part of what gcc is doing in the first seven instructions.
The Assembly
We're going to go instruction by instruction, explaining what's happening, and looking at the stack, along with where our registers are, each step along the way.
The state of the stack when we enter main() is shown below.
As we go through the first seven instructions, each instruction and its description are followed by a picture of the stack after that instruction executes.
------------------
0x80 | char * argv[] |
------------------
0x7c | int argc |
------------------
0x78 | ret addr | <- esp points here
------------------
0x74 | |
------------------
0x70 | |
------------------
0x6c | |
------------------
0x68 | |
------------------
0x64 | |
------------------
~ ~ ~ ~
------------------
0x40 | |
------------------
lea ecx,[esp+0x4]
This is the Load Effective Address instruction. Syntax of lea: lea dest, source. It computes the address described by the source operand and stores that address in the destination register; it does not read the memory at that address. For us, it loads the address esp+0x4 into ecx, so ecx now holds 0x7c, the address of argc, the slot just above the return address in our diagram. Our stack now looks like this:
------------------
0x80 | char * argv[] |
------------------
0x7c | int argc | <- ecx points here now
------------------
0x78 | ret addr | <- esp points here
------------------
0x74 | |
------------------
0x70 | |
------------------
0x6c | |
------------------
0x68 | |
------------------
0x64 | |
------------------
~ ~ ~ ~
------------------
0x40 | |
------------------
and esp,0xfffffff0
This is the logical AND instruction. Syntax of and: and dest, source. It performs a bitwise AND between the destination and the source, and saves the result in the destination. If you're not familiar with bitwise operations, you should probably take some time to familiarize yourself with them immediately. This is where we align the stack: ANDing esp with 0xfffffff0 clears its lowest four bits, rounding esp down to the nearest multiple of 16 (0x78 becomes 0x70). Now our stack looks like this:
------------------
0x80 | char * argv[] |
------------------
0x7c | int argc | <- ecx points here
------------------
0x78 | ret addr |
------------------
0x74 | |
------------------
0x70 | | <- esp points here now
------------------
0x6c | |
------------------
0x68 | |
------------------
0x64 | |
------------------
~ ~ ~ ~
------------------
0x40 | |
------------------
push DWORD PTR [ecx-0x4]
Push "pushes" an item onto the stack. Syntax of push: push data. Let's break down what we are pushing on the stack. The brackets mean we are referring to the contents of the memory pointed to by ecx-0x4. Since ecx is 0x7c, ecx-0x4 is 0x78, and [ecx-0x4] is the value stored there: the return address. DWORD PTR means we are referring to a 32-bit value; WORD PTR is 16 bits, and BYTE PTR is 8 bits. The processor knows ecx is a 32-bit register, but because we are pushing the value stored in memory at ecx-0x4, the instruction needs to say how many bits, starting at that address, to push. Once this is completed, the stack will look like this:
------------------
0x80 | char * argv[] |
------------------
0x7c | int argc | <- ecx points here
------------------
0x78 | ret addr |
------------------
0x74 | |
------------------
0x70 | |
------------------
0x6c | ret addr | <- esp points here now
------------------
0x68 | |
------------------
0x64 | |
------------------
~ ~ ~ ~
------------------
0x40 | |
------------------
push ebp
We're pushing ebp onto the stack. We do this so that at the end of the function, we can restore ebp to its original value.
------------------
0x80 | char * argv[] |
------------------
0x7c | int argc | <- ecx points here
------------------
0x78 | ret addr |
------------------
0x74 | |
------------------
0x70 | |
------------------
0x6c | ret addr |
------------------
0x68 | original ebp | <- esp points here now
------------------
0x64 | |
------------------
~ ~ ~ ~
------------------
0x40 | |
------------------
mov ebp,esp
Mov copies the value of one operand into another. Think of mov as "dest := source". Syntax of mov: mov dest, source. Here, we are copying the value of the esp register into the ebp register. If you understand the purpose of the ebp register, you know we use it to refer to variables on the stack. In our C application, int i; is a stack variable. (Variables on the heap are generally variables for which we dynamically allocate memory, but for now this isn't important.) Know that on our stack we are going to have room for the integer i. We need a way to refer to this place on the stack consistently, even as esp moves, and to do this we use the ebp register. This register points to the base of this function's stack frame, so if we want to refer to integer i, we use an offset relative to ebp. As we continue through the instructions, you will see [ebp-0x8], which refers to integer i on the stack.
------------------
0x80 | char * argv[] |
------------------
0x7c | int argc | <- ecx points here
------------------
0x78 | ret addr |
------------------
0x74 | |
------------------
0x70 | |
------------------
0x6c | ret addr |
------------------
0x68 | original ebp | <- esp and ebp point here
------------------
0x64 | |
------------------
~ ~ ~ ~
------------------
0x40 | |
------------------
push ecx
Now we're pushing ecx onto the stack. The reason we are doing this can be found in the instructions at 0x8048406 and 0x8048408: we will use this saved ecx to return esp to its original state before executing the ret instruction at the end of this function.
------------------
0x80 | char * argv[] |
------------------
0x7c | int argc | <- ecx points here
------------------
0x78 | ret addr |
------------------
0x74 | |
------------------
0x70 | |
------------------
0x6c | ret addr |
------------------
0x68 | original ebp | <- ebp points here
------------------
0x64 | 0x7c | <- esp points here now
------------------
~ ~ ~ ~
------------------
0x40 | |
------------------
sub esp,0x24
Sub is short for subtract, and it subtracts the value on the right from the value on the left. Syntax of sub: sub dest, source. Think of it like this: "dest -= source". Now we subtract 0x24 from esp. This gives us room for our stack variables. We only have one stack variable, and we definitely do not need 9 words (0x24 = 36 bytes) of space on the stack to make room for an integer, which under normal circumstances should be just 4 bytes, or one word, in size. Part of this space is used later for the arguments we pass to printf ([esp] and [esp+0x4]), and the amount leaves esp on a 16-byte boundary (0x40 in our diagram), so the stack stays aligned. Beyond that, this code was not compiled with any optimization flags, and this is simply how gcc pieced everything together.
------------------
0x80 | char * argv[] |
------------------
0x7c | int argc | <- ecx points here
------------------
0x78 | ret addr |
------------------
0x74 | |
------------------
0x70 | |
------------------
0x6c | ret addr |
------------------
0x68 | original ebp | <- ebp points here
------------------
0x64 | 0x7c |
------------------
~ ~ ~ ~
------------------
0x40 | | <- esp points here now
------------------
The first seven instructions are the most confusing, and things become much simpler from here. Hopefully you have become familiar with the workings of the stack. I'm going to omit stack pictures from the remainder of this tutorial.
add DWORD PTR [ecx],0x1
The add instruction works like the sub instruction, except instead of subtraction we are working with addition.
Because of the brackets, we are not adding 1 to ecx, but to the memory pointed to by ecx. If you remember from our stack, ecx points to the first argument passed to main, int argc. And if you remember from our C code, after declaring int i, we incremented argc. Well, here's the assembly instruction for that line of code.
DWORD PTR because an int is 4 bytes here (int argc).
mov DWORD PTR [ebp-0x8],0x0
Now we are entering our for loop. The first thing our for loop does is set int i equal to 0. Well, we know int i is a stack variable. We also know the common convention is to refer to stack variables as an offset from ebp. So guess where int i is on the stack? That's right, it's at ebp-0x8. Here we are setting int i equal to 0, the first part of our for loop.
80483df: eb 17 jmp 80483f8
The jmp instruction is used to "JuMP" from one place in the code to another. I included the two bytes which form this instruction because I wanted to point something out. While we see "jmp 80483f8", which makes this instruction look absolute, it's actually relative: we are jumping 0x17 bytes ahead. 0xdf + 0x02 + 0x17 = 0xf8. Why add the 0x02? Because this jmp instruction is two bytes long, and the offset is counted from the address of the instruction that follows the jmp.
We're going to do some skipping around now. Instead of following the assembly from first instruction to last, I'd instead like to go through the assembly in the order the instructions will be executed.
80483f8: 83 7d f8 09 cmp DWORD PTR [ebp-0x8],0x9
The CoMPare instruction subtracts its second operand from its first, throws the result away, and sets the x86 flags register (EFLAGS) accordingly. Yes, there's an x86 flags register. No, we aren't that concerned with it right now. Just know that the cmp instruction sets flags which correspond to a comparison between its two operands.
80483fc: 7e e3 jle 80483e1
The Jump if Less than or Equal instruction jumps only if the flags register has the appropriate flags set, indicating that the previous cmp found its first operand to be less than or equal to its second, treating both as signed integers.
We're beginning to see exactly how our for loop executes on the processor. After setting the initial value, we jump immediately down to the comparison, which corresponds to the "i < 10" part of our for () statement. gcc actually emitted it as "i <= 9". If this condition holds true, we jump back up to the beginning of the loop body.
80483e1: 8b 45 f8 mov eax,DWORD PTR [ebp-0x8]
eax is a general-purpose register we haven't really used until now. Here, we are setting it equal to [ebp-0x8], or int i.
80483e4: 89 44 24 04 mov DWORD PTR [esp+0x4],eax
We are preparing to call the function printf, which here is called with two arguments. Remember, the first argument sits at the top of the stack ([esp], the lowest address), and each later argument sits further from the top, at a higher address. We are now positioning arguments on the stack without using push: int i is the second argument in our printf() call, so we place it at [esp+0x4].
80483e8: c7 04 24 d0 84 04 08 mov DWORD PTR [esp],0x80484d0
Here we are moving the value 0x80484d0 into the memory esp points to. We're placing this value on the stack without altering esp. You're probably wondering what is at memory address 0x80484d0. It's these 4 bytes:
0x25 0x64 0x0a 0x00
The C String equivalent would be "%d\n". I hope it looks familiar, because it's the first argument to our printf call.
80483ef: e8 04 ff ff ff call 80482f8
And here we go ahead and make the printf call. The call instruction does a few things. For simplicity's sake, we will say it pushes the address of the next instruction onto the stack (the return address for the called function), and then begins executing the instruction at the specified location.
We don't really need to worry too much about this now. Just know that 0x80483f4 just got pushed onto the stack, and the next instruction that will be executed is at 0x80482f8. When the procedure we call returns, its ret instruction will pop the return address off the stack, meaning the stack should be just as we left it before the call instruction.
80483f4: 83 45 f8 01 add DWORD PTR [ebp-0x8],0x1
After the printf(), and before we do our next comparison, we need to increment int i. This is where that tiny piece of magic happens.
After this add instruction, we're back to our cmp instruction. We've already covered this, so let's skip ahead to the remaining six instructions, starting at the memory location 0x80483fe.
80483fe: b8 00 00 00 00 mov eax,0x0
8048403: 83 c4 24 add esp,0x24
8048406: 59 pop ecx
8048407: 5d pop ebp
8048408: 8d 61 fc lea esp,[ecx-0x4]
804840b: c3 ret
You should be able to understand what's going on now in these last six lines. If not, here's a quick synopsis to help you on your way:
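- 80483fe: Zero out eax... eax := 0. eax holds a function's return value, so this is our return 0.
- 8048403: Return esp to its position before we made room for stack variables. Need more help? Look at the instruction at 0x80483d2.
- 8048406: Get ecx back off the stack; it once again holds 0x7c in our diagrams, the address of argc.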
- 8048407: Set ebp to its original value before we entered the procedure. We're returning to our parent function, and it probably wants to know where its stack variables are.
- 8048408: Set esp back to its original value when we entered main(): ecx-0x4 is the slot holding the original return address.
- 804840b: Return, which will pop the return address off the stack, and the next instruction executed will now be at that return address.
Take another look at the assembly instructions. You should now understand all the basics of what is happening at the processor.
In the next tutorial, we'll take a better look at what exactly is happening, with a little less abstraction and a little more detail.
Great tutorial :) I feel like I could just about jump in and start writing.
In the synopsis for the instruction at 80483df, perhaps the line:
0xdf = 0x02…
should read:
0xdf + 0x02
?
I stared at that for a couple of minutes before working it out.
October 23rd, 2009 at 12:11
You are correct, thanks. Correction has been made.
October 23rd, 2009 at 12:36
Two requests:
1) Give this series its own special tag on your blog so it’s easy to access the whole thing when you’re done ;)
2) some ideas of simple projects a beginner could do that are actually doable
a) in assembly
b) by someone with the knowledge level this is aimed at.
I’m bookmarking your site.
October 23rd, 2009 at 15:27
gcc -S -masm=intel -o one.asm one.c
For writing your own code, use the compiler instead of the object analyzer. You get jump targets, which is damn handy.
Also stack alignment has more to do with processor design than memory segmentation.
October 23rd, 2009 at 16:54
I didn’t know about this feature for gcc, and this is awesome. Thanks!
October 23rd, 2009 at 17:13
gcc -O0 -g -c one.c
objdump -C -M intel -S one.o
This gives an even better result (asm code with source code side by side).
October 29th, 2009 at 18:49
Is there a way to make the output 16-bit instead of 32-bit? For example, I want bx instead of ebx.
April 6th, 2010 at 02:32
BX is simply the 16 least significant bits of the EBX register, so you can use BX to access the low 16 bits of EBX (assuming a current 32-bit Intel CPU).
This url may provide some enlightenment: http://www.arl.wustl.edu/~lockwood/class/cs306/books/artofasm/Chapter_4/CH04-1.html#HEADING1-42
A short is 16 bits, so make sure your printf string is adjusted (you’ll need %hd) http://www.opengroup.org/onlinepubs/000095399/functions/printf.html
So if you want to print out BX, you’ll need to push BX onto the stack, followed by a pointer to a string “%hd”, and then call printf. Because BX is only 16 bits and, assuming a 32-bit platform, you’re pushing BX into a 32-bit stack slot, you may want to mov a 0x0 into that stack space first, or otherwise ensure that it is empty, to account for whatever may already be in place of the other 16 bits. I’m not positive that would be necessary; you would have to experiment.
You may have better luck creating a simple C app that does all this and then running “gcc source.c -S” to see what you get.
April 13th, 2010 at 12:14
Just a heads up, you define “Word” at the top as 32 bits, but then go on to explain that a DWORD is also 32 bits. The reason for this is that while a “word” generally speaking fits the definition you gave, in x86 assembly in particular it’s usually used to refer to 2 bytes, even though we’ve moved on to 32 and 64 bit processors. This is why a DWORD (double word) is the same size as the processor’s actual word size.
January 6th, 2012 at 17:34