This post is not a class on assembly. It is about a tool I use and hope others will find useful. An understanding of x86 assembly will help.
What is the RAVM, and why create it?
Learning how programs work at the assembly level is crucial to gaining a holistic understanding of modern-day computing. While studying Computer Science at the United States Military Academy, I was introduced to a fantastic piece of in-house developed software: the MARC and MARASM (available publicly here). The MARC is a virtual 16-bit CPU programmed in Ada. When paired with the MARASM, an assembler for the MARC, cadets can write, assemble, and run assembly programs with a simple toolchain.
The MARC is a perfect example of using simple applications geared towards education to teach concepts, not features. Students trying to learn new concepts need tools that just work. I wanted to borrow the concepts of the MARC and create a piece of software which could be used as a stepping stone towards x86 assembly. More specifically, I wanted:
- A more comprehensive, but not complicated, instruction set which more closely mimicked an x86 instruction set.
- 32-bit, little-endian words.
- A way to help students visualize what was happening in memory while their programs were running.
- A code base programmed in C, making it more accessible for expansion and hacking by others.
With these goals in mind, I created the RAVM, the Rainbowsandpwnies Assembler and Virtual Machine. The RAVM comes with three parts: assembler, disassembler, and virtual machine. Here’s how you can grab a copy of the RAVM in Ubuntu:
sudo apt-get install git build-essential libncurses5-dev
git clone git://github.com/endeav0r/ravm.git
cd ravm
make
An example
The RAVM comes with a few example assembly programs, but let’s start with our own. We will create a function that adds two numbers together and returns the result. We will then call our function to add together 5 and 7, and then stop.
main :
mov r0, 7
push r0
mov r0, 5
push r0
call sum ; sum(5, 7)
add rsp, 0x8 ; this is the cdecl call convention
hlt
sum :
push rbp
mov rbp, rsp
push r1 ; callee saves registers r1-r7
mov r1, rbp ; place second argument in r1
add r1, 0xc
mov r1, [r1]
mov r0, rbp ; place first argument in r0
add r0, 0x8
mov r0, [r0]
add r0, r1 ; perform the addition
pop r1 ; restore saved registers
pop rbp
ret ; return
As of this writing, push and pop only accept registers. The instruction set is still being expanded.
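To see why sum finds its arguments at rbp + 0x8 and rbp + 0xc, here is a rough Python sketch of the stack while the program above runs. It is not RAVM code. It assumes 4-byte words, an x86-style push (decrement rsp, then store), and a 512-byte memory with rsp starting on the last word; RETURN_ADDRESS and the starting value of r1 are made-up placeholders.

```python
WORD = 4                 # 32-bit words
RETURN_ADDRESS = 0x1A    # placeholder: the real value depends on the assembled image


def run_sum(a, b, memory_size=512):
    """Replay main and sum from the listing on a toy stack (a sketch, not the RAVM)."""
    memory = {}  # address -> 32-bit word
    regs = {"r0": 0, "r1": 0xAAAA, "rbp": 0, "rsp": memory_size - WORD}

    def push(value):
        regs["rsp"] -= WORD
        memory[regs["rsp"]] = value

    def pop():
        value = memory[regs["rsp"]]
        regs["rsp"] += WORD
        return value

    # main: push the arguments right to left, then call
    push(b)                       # mov r0, 7 / push r0
    push(a)                       # mov r0, 5 / push r0
    push(RETURN_ADDRESS)          # what `call sum` leaves on the stack

    # sum: prologue
    push(regs["rbp"])             # push rbp
    regs["rbp"] = regs["rsp"]     # mov rbp, rsp
    push(regs["r1"])              # push r1 (callee-saved)

    # What the frame looks like from rbp at this point
    frame = {
        "saved rbp": memory[regs["rbp"]],
        "return address": memory[regs["rbp"] + 0x4],
        "first argument": memory[regs["rbp"] + 0x8],
        "second argument": memory[regs["rbp"] + 0xC],
    }

    regs["r1"] = memory[regs["rbp"] + 0xC]               # second argument
    regs["r0"] = memory[regs["rbp"] + 0x8]               # first argument
    regs["r0"] = (regs["r0"] + regs["r1"]) & 0xFFFFFFFF  # add r0, r1

    # sum: epilogue, then the caller's cleanup
    regs["r1"] = pop()            # pop r1
    regs["rbp"] = pop()           # pop rbp
    pop()                         # ret consumes the return address
    regs["rsp"] += 2 * WORD       # add rsp, 0x8 (cdecl: caller removes the arguments)
    return regs, frame


regs, frame = run_sum(5, 7)
print(regs["r0"], frame)
```

After the call, rsp is back where it started and r1 holds its original value, which is the whole point of the prologue, the epilogue, and the caller's add rsp, 0x8.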
What we have here is a simple assembly program. Now let’s see where the RAVM really earns its money.
The vm that comes with RAVM features “godmode.” Godmode is, in my opinion, the best way to visualize a program in memory. Let’s take a look.
We can assemble and run the above program by saving the contents in sum.asm and running the following commands:
./assembler sum.bin sum.asm
./vm -i sum.bin -g
This will present us with the following screen:

On the left side of the screen are the addresses for all available memory locations (the VM is currently running with 512 bytes of memory). Starting at address 0, highlighted in cyan is the image loaded from sum.bin. Highlighted in green is the current instruction pointed to by our instruction pointer. The last word in memory, highlighted in red, shows the memory location pointed to by rsp, our stack pointer. In yellow is the user cursor, movable by the arrow keys.
At the bottom of the screen, starting from the top-left of the bottom, we have the address of the cursor, the value of the instruction pointer, a disassembled description of the current instruction, and then the value of every general purpose register.
The user steps through the program one instruction at a time by pressing (or holding) the “s” key. Let’s step forward in our program until we are sitting on the add r0, r1 instruction.

We are now introduced to two new colors, blue and purple. Blue shows us the space occupied by the stack. Purple shows us the memory pointed to by the base pointer.
As the user continues to step through the program, he/she is simultaneously presented with the entire program laid out and color-coded in memory, the next instruction to execute, and the value of all registers. I have found that after an explanation of how a computer works, a real-time visual learning aid answers many questions.
Teaching security with the RAVM
My favorite example program in the RAVM is that of a basic strcpy buffer overflow, overwriting the return address to point back into the stack and execute attacker instructions. When the program executes as intended, it takes two strings, one as a password and one as simulated user input. A function is called, and the user input string is copied into a buffer with a strcpy. The two strings are then compared with strcmp, and if the two strings match, a value in memory holding 0xdeadbeef is zeroed to 0x00000000. If the two strings do not match, the memory location is not zeroed and the program terminates.
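The mechanics of that overflow can be sketched in a few lines of Python. This is a toy model, not the RAVM example itself. It assumes a 20-byte buffer sitting directly below the saved rbp and the return address, and little-endian words in memory; every address in it is a hypothetical placeholder.

```python
BUF_SIZE = 20          # assumed size of the local buffer
WORD = 4
BYTE_ORDER = "little"  # assumption, taken from the design goals in the post


def strcpy(memory, dst, src):
    """Copy src into memory at dst up to and including the first 0x00 byte.
    Like the C function, it never checks how much room dst has."""
    for offset, byte in enumerate(src):
        memory[dst + offset] = byte
        if byte == 0:
            return


def return_address_after_copy(user_input, original_return=0x0D):
    memory = bytearray(64)
    rbp = 32                  # hypothetical frame base
    buf = rbp - BUF_SIZE      # buffer occupies [rbp - 20, rbp)
    memory[rbp:rbp + WORD] = (0x3F0).to_bytes(WORD, BYTE_ORDER)  # saved rbp
    memory[rbp + WORD:rbp + 2 * WORD] = original_return.to_bytes(WORD, BYTE_ORDER)
    strcpy(memory, buf, user_input)
    return int.from_bytes(memory[rbp + WORD:rbp + 2 * WORD], BYTE_ORDER)


safe = b"password\x00"
# 20 bytes fill the buffer, 4 clobber the saved rbp, and the last bytes land on
# the return address. The target 0x000003d4 has two zero high bytes: the
# string terminator supplies one and the old return address already holds the other.
evil = b"A" * BUF_SIZE + b"B" * WORD + b"\xd4\x03\x00"

print(hex(return_address_after_copy(safe)))
print(hex(return_address_after_copy(evil)))
```

The short string leaves the return address alone. The long one replaces it with an address of the attacker's choosing, which is all "overwriting the return address" means.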
To assemble the buffer overflow example, run this command:
./assembler buffer_overflow.bin buffer_overflow.asm string.asm
Here’s a quick screenshot of RAVM godmode during the buffer overflow action, exploit in place, to get us started (vm memory size restored to 1024 bytes, the default):

The jump that is about to execute on the stack will jump the user onto the instructions which execute after a successful strcmp. The creation of this simple, one instruction exploit requires the use of all three tools: assembler, disassembler and vm. Let’s take a look at the first several instructions of buffer_overflow.bin as they appear from ./disassembler:
00000000 10000000020c MOV r0, 524 (0000020c)
00000006 3200 PUSH r0
00000008 3000000022 CALL 34 (0000002f)
0000000d 060800000004 ADD rsp, 4 (00000011)
00000013 410000000001 CMP r0, 1 (00000014)
00000019 2200000001 JE 1 (0000001f)
0000001e 80 HLT
0000001f 100000000234 MOV r0, 564 (00000253)
00000025 100100000000 MOV r1, 0 (00000025)
0000002b 130001 MOV [r0], r1
0000002e 80 HLT
0000002f 3209 PUSH rbp
00000031 110908 MOV rbp, rsp
00000034 0608ffffffec ADD rsp, -20 (00000020)
0000003a 110009 MOV r0, rbp
0000003d 060000000008 ADD r0, 8 (00000045)
00000043 120000 MOV r0, [r0]
00000046 3200 PUSH r0
00000048 110009 MOV r0, rbp
0000004b 0600ffffffec ADD r0, -20 (00000037)
00000051 3200 PUSH r0
00000053 30000000e9 CALL 233 (00000141)
00000058 060800000008 ADD rsp, 8 (00000060)
0000005e 110009 MOV r0, rbp
00000061 0600ffffffec ADD r0, -20 (0000004d)
00000067 3200 PUSH r0
00000069 100000000228 MOV r0, 552 (00000291)
Currently, every constant is followed, in parentheses, by the address it would resolve to if it were treated as an offset. That value only matters for some instructions, such as CALL and JE, and is noise for the rest. Work in progress.
We start by pushing the address of our user supplied string on the stack and making a function call to check it against the password. After some stack cleanup, we check the result for a 1, which indicates a successful match. On a successful match, we jump to the instructions at 0x0000001f to zero out memory. On an unsuccessful match, we simply halt the program.
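The first seven instructions of that listing are enough to reverse out a small piece of the encoding. The Python sketch below is not the RAVM disassembler. Its opcode table, operand layouts, and register numbering (8 for rsp, 9 for rbp) are all inferred from the hex column of the listing, and it reads immediates in the byte order that column prints them in.

```python
import struct

# opcode -> (mnemonic, operand layout); inferred from the listing, not from RAVM source
OPCODES = {
    0x06: ("ADD", "reg_imm"),
    0x10: ("MOV", "reg_imm"),
    0x22: ("JE", "rel"),
    0x30: ("CALL", "rel"),
    0x32: ("PUSH", "reg"),
    0x41: ("CMP", "reg_imm"),
    0x80: ("HLT", "none"),
}


def reg_name(number):
    # 8 and 9 appear in the listing as rsp and rbp; the rest is a guess
    return {8: "rsp", 9: "rbp"}.get(number, f"r{number}")


def disassemble(image):
    """Return a list of (address, text) pairs for the opcodes this sketch knows."""
    out = []
    pc = 0
    while pc < len(image):
        mnemonic, layout = OPCODES[image[pc]]
        if layout == "none":
            size, text = 1, mnemonic
        elif layout == "reg":
            size, text = 2, f"{mnemonic} {reg_name(image[pc + 1])}"
        elif layout == "reg_imm":
            size = 6
            (imm,) = struct.unpack_from(">i", image, pc + 2)
            text = f"{mnemonic} {reg_name(image[pc + 1])}, {imm}"
        else:  # "rel": a signed offset measured from the next instruction
            size = 5
            (rel,) = struct.unpack_from(">i", image, pc + 1)
            text = f"{mnemonic} {rel} -> {pc + size + rel:08x}"
        out.append((pc, text))
        pc += size
    return out


image = bytes.fromhex(
    "10000000020c" "3200" "3000000022" "060800000004"
    "410000000001" "2200000001" "80"
)
listing = disassemble(image)
for address, text in listing:
    print(f"{address:08x} {text}")
```

The branch targets it computes, 0x2f for the CALL and 0x1f for the JE, match the parenthesized values in the real disassembler output, which suggests both are relative to the address of the following instruction.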
The disassembler provides us with an easy way to see all of our instructions next to their assembled addresses (and the addresses they will hold in memory). The attacker can then run his program in memory and calculate the offset from where the stack will be located after a return address overflow to the instructions he needs executed. Finally, the attacker has to find some instruction he can write on the stack which will include no 0x00 bytes. A carefully crafted attacker-supplied string which returns back into the stack and executes a JMP instruction does the trick.
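A quick way to see why a jump from the stack back into the program works is to compute the offset and look at its bytes. In this Python sketch, the target 0x1f comes from the listing, while the stack address is a hypothetical placeholder. It also assumes that JMP has the same shape as CALL and JE (one opcode byte followed by a 32-bit offset relative to the next instruction) and that its opcode byte is non-zero.

```python
def rel32(from_addr, to_addr, insn_len=5):
    """Offset from the end of a jump at from_addr to to_addr, as an unsigned 32-bit value."""
    return (to_addr - (from_addr + insn_len)) & 0xFFFFFFFF


def has_null_byte(value):
    """True if any of the four bytes of value is 0x00 (byte order does not matter here)."""
    return 0 in value.to_bytes(4, "big")


TARGET = 0x1F            # start of the "zero out memory" code in the listing
STACK_JMP_ADDR = 0x3D4   # hypothetical address of the injected jump on the stack

forward = rel32(0x19, TARGET)             # the program's own short forward JE
backward = rel32(STACK_JMP_ADDR, TARGET)  # the attacker's jump from high memory to low

print(f"{forward:08x} null bytes: {has_null_byte(forward)}")
print(f"{backward:08x} null bytes: {has_null_byte(backward)}")
```

A short forward jump encodes as 00000001, which strcpy would truncate. A backward jump from the stack is a small negative number, so its upper bytes are all 0xff and the whole offset survives the copy.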
It’s an interesting exercise in creative thinking to manipulate a system, the RAVM, in ways unintended.
Conclusion
That’s what I use to teach concepts in low-level programming and assembly. I’m interested in any suggestions, criticisms and feedback people have. If this is something you would like to use, everything is available under the GPL license. Please let me know how it goes!
Take a look at the PEP8 virtual machine coming out of Pepperdine University.
http://code.google.com/p/pep8-1/
Link | June 30th, 2011 at 12:44
Curious, but why teach x86 when ARM introduces RISC as well, and is more likely to have more new assembly code written for it in the foreseeable future?
Link | June 30th, 2011 at 12:45
@Anonymous I don’t have time to download and run the PEP/8 right now, but from what I’ve read and can see it looks absolutely awesome. I will give it a try when I get time.
@ArtB The larger process is teaching security, with an end goal of introducing and understanding many of the exploitation techniques in use today. Many, but not all, of today’s exploits are written to take advantage of software running on x86 CPUs. Hence, x86 assembly is very relevant :).
Link | June 30th, 2011 at 13:18