Teaching Assembly with RAVM

June 29, 2011

This post is not a class on assembly. It is about a tool I use and hope others will find useful. An understanding of x86 assembly will help.

What is the RAVM, and why create it?

Learning how programs work at the assembly level is crucial to gaining a holistic understanding of modern-day computing. While studying Computer Science at the United States Military Academy, I was introduced to a fantastic piece of in-house developed software: the MARC and MARASM (available publicly here). The MARC is a virtual 16-bit CPU programmed in Ada. When paired with the MARASM, an assembler for the MARC, cadets can write, assemble, and run assembly programs with a simple toolchain.

The MARC is a perfect example of using simple applications geared towards education to teach concepts, not features. Students trying to learn new concepts need tools that just work. I wanted to borrow the concepts of the MARC and create a piece of software which could be used as a stepping stone towards x86 assembly. More specifically, I wanted:

  • A more comprehensive, but not complicated, instruction set which more closely mimicked an x86 instruction set.
  • 32-bit, little-endian words.
  • A way to help students visualize what was happening in memory while their programs were running.
  • A code base programmed in C, making it more accessible for expansion and hacking by others.

With these goals in mind, I created the RAVM, the Rainbowsandpwnies Assembler and Virtual Machine. The RAVM comes with three parts: assembler, disassembler, and virtual machine. Here’s how you can grab a copy of the RAVM in Ubuntu:

sudo apt-get install git build-essential libncurses5-dev
git clone git://github.com/endeav0r/ravm.git
cd ravm
make

An example

The RAVM comes with a few example assembly programs, but let’s start with our own. We will create a function that adds two numbers together and returns the result. We will then call our function to add together 5 and 7, and then stop.

main :
    mov r0, 7
    push r0
    mov r0, 5
    push r0
    call sum     ; sum(5, 7)
    add rsp, 0x8 ; this is the cdecl call convention
    hlt

sum :
    push rbp
    mov rbp, rsp

    push r1      ; callee saves registers r1-r7

    mov r1, rbp  ; place second argument in r1
    add r1, 0xc
    mov r1, [r1]

    mov r0, rbp  ; place first argument in r0
    add r0, 0x8
    mov r0, [r0]

    add r0, r1   ; perform the addition

    pop r1       ; restore saved registers
    pop rbp

    ret ; return

As of this writing, push and pop only accept registers. The instruction set is still being expanded.
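The offsets 0x8 and 0xc used in sum follow from the order in which words land on the stack. Here is a rough Python sketch (not RAVM code) that replays the pushes from the listing above. It assumes push decrements rsp by one 4-byte word and then stores, as on x86. The function name trace_sum_call and the starting stack address are made up for illustration; only the offsets relative to rbp matter.

```python
WORD = 4  # RAVM words are 32 bits (4 bytes)


def trace_sum_call(first_arg=5, second_arg=7, stack_top=512):
    """Replay the stack traffic of main calling sum(first_arg, second_arg).

    Assumption: push decrements rsp by one word, then stores (x86 style).
    stack_top is an arbitrary starting address chosen for this sketch.
    """
    memory = {}  # address -> value, a sparse stand-in for VM memory
    rsp = stack_top

    def push(value):
        nonlocal rsp
        rsp -= WORD
        memory[rsp] = value

    push(second_arg)        # main: mov r0, 7 / push r0
    push(first_arg)         # main: mov r0, 5 / push r0
    push("return address")  # main: call sum
    push("caller rbp")      # sum:  push rbp
    rbp = rsp               # sum:  mov rbp, rsp
    push("caller r1")       # sum:  push r1

    r1 = memory[rbp + 0xC]  # second argument
    r0 = memory[rbp + 0x8]  # first argument
    r0 = r0 + r1            # add r0, r1

    # Express every stack slot as an offset from rbp.
    layout = {addr - rbp: value for addr, value in memory.items()}
    return r0, layout


result, layout = trace_sum_call()
for offset in sorted(layout, reverse=True):
    print(f"rbp{offset:+#x}: {layout[offset]}")
print("r0 =", result)
```

The saved rbp sits at offset 0 and the return address at offset 4, which is why the first argument is found at rbp+0x8 and the second at rbp+0xc.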

What we have here is a simple assembly program. Now let’s see where the RAVM really earns its money.

The vm that comes with RAVM features “godmode.” Godmode is, in my opinion, the best way to visualize a program in memory. Let’s take a look.

We can assemble and run the above program by saving the contents in sum.asm and running the following commands:

./assembler sum.bin sum.asm
./vm -i sum.bin -g

This will present us with the following screen:

On the left side of the screen are the addresses for all available memory locations (the VM is currently running with 512 bytes of memory). Starting at address 0, highlighted in cyan is the image loaded from sum.bin. Highlighted in green is the current instruction pointed to by our instruction pointer. The last word in memory, highlighted in red, shows the memory location pointed to by rsp, our stack pointer. In yellow is the user cursor, movable by the arrow keys.

At the bottom of the screen, reading from the top-left of the status area, we have the address of the cursor, the value of the instruction pointer, a disassembly of the current instruction, and then the value of every general purpose register.

The user steps through the program by pressing (or holding) the “s” key. Let’s step forward in our program until we are sitting on the add r0, r1 instruction.

We are now introduced to two new colors, blue and purple. Blue shows us the space occupied by the stack. Purple shows us the memory pointed to by the base pointer.
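To summarize the color scheme, here is a small Python sketch of how an address could be mapped to a godmode color. This is not taken from the RAVM source. The function name highlight, its parameters, and especially the priority order when regions overlap are assumptions made for illustration.

```python
def highlight(addr, *, image_size, ip, ip_len, rsp, rbp, cursor, mem_size, word=4):
    """Return the color for one byte of VM memory, or None if not highlighted.

    Assumed priority (not confirmed by the RAVM source):
    cursor > current instruction > rsp word > rbp word > stack > image.
    """
    if addr == cursor:
        return "yellow"   # user cursor
    if ip <= addr < ip + ip_len:
        return "green"    # instruction at the instruction pointer
    if rsp <= addr < rsp + word:
        return "red"      # word pointed to by rsp
    if rbp <= addr < rbp + word:
        return "purple"   # word pointed to by rbp
    if rsp <= addr < mem_size:
        return "blue"     # rest of the stack, up to the top of memory
    if addr < image_size:
        return "cyan"     # loaded image
    return None


# Hypothetical machine state, chosen only to exercise every branch.
state = dict(image_size=0x40, ip=0x2A, ip_len=3, rsp=0x1E8, rbp=0x1EC,
             cursor=0x100, mem_size=512)
for address in (0x00, 0x2B, 0x1E8, 0x1ED, 0x1F4, 0x100, 0x80):
    print(hex(address), highlight(address, **state))
```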

As the user continues to step through the program, he/she is simultaneously presented with the entire program laid out and color-coded in memory, the next instruction to execute, and the value of all registers. I have found that after an explanation of how a computer works, a real-time visual learning aid answers many questions.

Teaching security with the RAVM

My favorite example program in the RAVM is a basic strcpy buffer overflow that overwrites the return address to point back into the stack and execute attacker-supplied instructions. When the program executes as intended, it takes two strings, one as a password and one as simulated user input. A function is called, and the user input string is copied into a buffer with strcpy. The two strings are then compared with strcmp, and if they match, a value in memory holding 0xdeadbeef is zeroed to 0x00000000. If the two strings do not match, the memory location is not zeroed and the program terminates.
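To make the mechanics concrete, here is a small Python model (not RAVM code) of the vulnerable function's stack frame. The 20-byte buffer size is an assumption taken from the ADD rsp, -20 in the example's disassembly, and 0x0d is the address that follows the CALL in that same listing. The names make_frame, unsafe_strcpy, and read_return_address are invented for this sketch, and the overwritten value 0x41424344 is arbitrary.

```python
BUF_SIZE = 20  # assumed from "ADD rsp, -20" in the example's disassembly
WORD = 4       # 32-bit words

# Frame layout, low to high addresses:
#   buffer (20 bytes) | saved rbp (4) | return address (4) | first argument (4)
RET_OFFSET = BUF_SIZE + WORD
FRAME_SIZE = BUF_SIZE + 3 * WORD


def make_frame(return_address):
    """Build a zeroed frame holding only the given return address."""
    frame = bytearray(FRAME_SIZE)
    frame[RET_OFFSET:RET_OFFSET + WORD] = return_address.to_bytes(WORD, "little")
    return frame


def unsafe_strcpy(frame, src):
    """Copy src into the buffer at offset 0 with no bounds check.

    Like strcpy, it copies up to and including the first 0x00 byte.
    """
    for i, byte in enumerate(src):
        frame[i] = byte
        if byte == 0:
            break


def read_return_address(frame):
    return int.from_bytes(frame[RET_OFFSET:RET_OFFSET + WORD], "little")


# A short input leaves the return address alone.
frame = make_frame(0x0D)
unsafe_strcpy(frame, b"password\x00")
print(hex(read_return_address(frame)))

# 20 bytes fill the buffer, 4 more cover the saved rbp, and the next 4
# replace the return address.
frame = make_frame(0x0D)
payload = b"A" * BUF_SIZE + b"B" * WORD + (0x41424344).to_bytes(WORD, "little") + b"\x00"
unsafe_strcpy(frame, payload)
print(hex(read_return_address(frame)))
```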

To assemble the buffer overflow example, run this command:

./assembler buffer_overflow.bin buffer_overflow.asm string.asm

Here’s a quick screenshot of RAVM godmode during the buffer overflow action, exploit in place, to get us started (vm memory size restored to 1024 bytes, the default):

The jump that is about to execute on the stack will jump the user onto the instructions which execute after a successful strcmp. The creation of this simple, one-instruction exploit requires the use of all three tools: assembler, disassembler and vm. Let’s take a look at the first several instructions of buffer_overflow.bin as they appear from ./disassembler:

00000000      10000000020c  MOV  r0, 524 (0000020c)
00000006              3200  PUSH r0
00000008        3000000022  CALL 34 (0000002f)
0000000d      060800000004  ADD  rsp, 4 (00000011)
00000013      410000000001  CMP  r0, 1 (00000014)
00000019        2200000001  JE   1 (0000001f)
0000001e                80  HLT
0000001f      100000000234  MOV  r0, 564 (00000253)
00000025      100100000000  MOV  r1, 0 (00000025)
0000002b            130001  MOV  [r0], r1
0000002e                80  HLT
0000002f              3209  PUSH rbp
00000031            110908  MOV  rbp, rsp
00000034      0608ffffffec  ADD  rsp, -20 (00000020)
0000003a            110009  MOV  r0, rbp
0000003d      060000000008  ADD  r0, 8 (00000045)
00000043            120000  MOV  r0, [r0]
00000046              3200  PUSH r0
00000048            110009  MOV  r0, rbp
0000004b      0600ffffffec  ADD  r0, -20 (00000037)
00000051              3200  PUSH r0
00000053        30000000e9  CALL 233 (00000141)
00000058      060800000008  ADD  rsp, 8 (00000060)
0000005e            110009  MOV  r0, rbp
00000061      0600ffffffec  ADD  r0, -20 (0000004d)
00000067              3200  PUSH r0
00000069      100000000228  MOV  r0, 552 (00000291)

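Currently, the disassembler prints every constant followed, in parentheses, by that constant interpreted as an offset. The parenthesized value only matters for instructions such as CALL and JE, where it gives the target address; for the other instructions it can be ignored. Work in progress.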
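Reading the listing above, the parenthesized numbers appear to follow a simple rule. For CALL and JE the constant is added to the address of the next instruction, and for everything else it is added to the address of the instruction itself. The Python sketch below reproduces several rows under that assumption. The rule is inferred from the output, not taken from the disassembler source, and the function name annotated_value is made up.

```python
MASK = 0xFFFFFFFF  # keep results within 32 bits


def annotated_value(address, length, constant, is_branch):
    """Recompute the parenthesized value the disassembler prints.

    address   -- address of the instruction
    length    -- size of the instruction in bytes
    constant  -- the signed constant operand
    is_branch -- True for CALL/JE, where the base is the next instruction
    """
    base = address + length if is_branch else address
    return (base + constant) & MASK


# Rows taken from the listing above: (address, length, constant, is_branch)
rows = [
    (0x08, 5, 34, True),     # CALL 34       -> 0000002f
    (0x19, 5, 1, True),      # JE   1        -> 0000001f
    (0x53, 5, 233, True),    # CALL 233      -> 00000141
    (0x34, 6, -20, False),   # ADD  rsp, -20 -> 00000020
    (0x1F, 6, 564, False),   # MOV  r0, 564  -> 00000253
]
for row in rows:
    print(f"{annotated_value(*row):08x}")
```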

We start by pushing the address of our user supplied string on the stack and making a function call to check it against the password. After some stack cleanup, we check the result for a 1, which indicates a successful match. On a successful match, we jump to the instructions at 0x0000001f to zero out memory. On an unsuccessful match, we simply halt the program.

The disassembler provides us with an easy way to see all of our instructions next to their assembled addresses (and the addresses they will hold in memory). The attacker can then run his program in memory and calculate the offset from where the stack will be located after a return address overflow to the instructions he needs executed. Finally, the attacker has to find some instruction he can write on the stack which will include no 0x00 bytes. A carefully crafted attacker-supplied string which returns back into the stack and executes a JMP instruction does the trick.
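The 0x00 restriction exists because a strcpy-style copy stops at the first zero byte, so anything after it never reaches the stack. A quick way to vet a candidate constant is to encode it as a 32-bit two's complement value and look for zero bytes. The Python sketch below does that. The function names are made up, and the offsets tried are hypothetical values, not the ones used in the actual example. The byte order is chosen to match the hex column of the disassembly, though it does not affect the check.

```python
def encode_offset(offset):
    """Encode a signed offset as 4 bytes of 32-bit two's complement."""
    return (offset & 0xFFFFFFFF).to_bytes(4, "big")


def is_null_free(payload):
    """True if no byte of the payload is 0x00."""
    return 0 not in payload


# Hypothetical offsets, for illustration only.
for offset in (0x22, -0x3C1, -0x100):
    encoded = encode_offset(offset)
    print(f"{offset:+#07x} -> {encoded.hex()} null-free: {is_null_free(encoded)}")
```

Small positive offsets always carry leading zero bytes, while a backward jump from the stack toward low code addresses has its upper bytes set to 0xff. A negative offset can still fail the check, as -0x100 does.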

It’s an interesting exercise in creative thinking to manipulate a system, the RAVM, in ways unintended.

Conclusion

That’s what I use to teach concepts in low-level programming and assembly. I’m interested in any suggestions, criticisms and feedback people have. If this is something you would like to use, everything is available under the GPL license. Please let me know how it goes!

posted in Assembly Tutorials, Projects by endeavormac


3 Comments to "Teaching Assembly with RAVM"

  1. Anonymous wrote:

    Take a look at the PEP8 virtual machine coming out of Pepperdine University.

    http://code.google.com/p/pep8-1/

  2. ArtB wrote:

    Curious, but why teach x86 when ARM introduces RISC as well, and is more likely to have more new assembly code written for it in the foreseeable future?

  3. endeavormac wrote:

    @Anonymous I don’t have time to download and run the PEP/8 right now, but from what I’ve read and can see it looks absolutely awesome. I will give it a try when I get time.

    @ArtB The larger process is teaching security, with an end goal of introducing and understanding many of the exploitation techniques in use today. Many, but not all, of today’s exploits are written to take advantage of software running on x86 CPUs. Hence, x86 assembly is very relevant :).
