rop_tools – Hack your disassembler

October 1, 2011

A couple of weeks ago I posted about rop_tools, a tool for quickly finding a variety of ROP gadgets in x86 ELF binaries. Well, I decided rop_tools needed an awesome scripting interface, and since then it has grown into a pretty powerful tool for scripting ELF disassembly. Hacking together a disassembler in an hour isn’t all that difficult anymore. (By “disassembler” I mean the front-end that presents the output, not the library that actually makes sense of the bytes. Yes, I know some of you are having brain aneurysms right now.) In fact, I wrote one tonight. Let’s take a look at the output first.

First, let’s take a look at how we call Lua scripts now:

rop_tools -l <script_name> [<lua_argument> ..]

And now let’s take a look at the sweet, sweet output of our disassembler, test1.lua:

This disassembles the function dowork in an executable compiled from the following source:

#include <stdio.h>

int dowork ()
{
    char buffer[16];
    int i;

    for (i = 0; i < 10; i++) {
        sprintf(buffer, "%d", i);
        printf("%s\n", buffer);
    }

    return 1;
}

int main (int argc, char * argv[])
{
    return dowork();
}

Let’s disassemble this ELF.

First, we’ll need to open that ELF file. Easy enough.

elf = elf_t.new(argv[1])

argv[1] is the first argument passed to our Lua program. (Lua tables can be indexed from 0, but the convention is to start at 1.) rop_tools sets argv for us.
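The 0-versus-1 convention comes up again later with symtab:symbol(i). Here’s a quick plain-Lua illustration, with no rop_tools involved, of what happens to a value stored at index 0:

```lua
-- Plain Lua: index 0 is an ordinary hash key, not part of the "array" part.
local t = { [0] = "zero", "one", "two" }

-- The length operator and ipairs only see the 1..n sequence.
print(#t)                    -- 2
local seen = 0
for i, v in ipairs(t) do
    seen = seen + 1
    print(i, v)              -- 1 one, then 2 two; index 0 is skipped
end

-- A numeric for loop is the way to walk a 0-based table explicitly.
for i = 0, #t do
    print(i, t[i])           -- 0 zero, 1 one, 2 two
end
```

This is why the symbol and relocation loops later in this post use numeric for loops starting at 0 rather than ipairs.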

On to elf_t. The underlying C library (which we won’t talk about much directly) automatically detects whether this is a 32- or 64-bit ELF and abstracts many of the differences away, so we can write one disassembler for all ELFs. Opening an ELF with elf_t.new() lets us ignore lots of details. For example, when we disassemble instructions we don’t have to worry about whether we’re disassembling 32- or 64-bit code, because elf_t handles that for us. We’ll see more later, but let’s move on.
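Telling a 32-bit ELF from a 64-bit one takes only a single byte. e_ident[EI_CLASS], at offset 4 of the file, is 1 (ELFCLASS32) or 2 (ELFCLASS64). Below is a minimal sketch that assumes the first bytes of the file are already in a string. elf_class is a hypothetical helper and is not part of rop_tools or its C library:

```lua
-- Hypothetical helper, NOT rop_tools's implementation: classify an ELF by
-- its identification bytes. e_ident[EI_CLASS] (file offset 4) is 1 for
-- ELFCLASS32 and 2 for ELFCLASS64.
local ELFCLASS32, ELFCLASS64 = 1, 2

local function elf_class (header)
    -- e_ident starts with the magic bytes 0x7f 'E' 'L' 'F'
    if #header < 5 or header:sub(1, 4) ~= "\127ELF" then
        return nil, "not an ELF file"
    end
    local class = header:byte(5)   -- Lua strings are 1-based: offset 4 is byte 5
    if class == ELFCLASS32 then
        return 32
    elseif class == ELFCLASS64 then
        return 64
    end
    return nil, "unknown ELF class " .. class
end

-- Reading the first bytes of a real file:
-- local f = assert(io.open("/bin/ls", "rb"))
-- print(elf_class(f:read(16)))
-- f:close()
```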

argv[2] holds the name of the function we want to disassemble. Let’s write some code to grab those instructions for us.

function disassemble_function (elf, function_name)
    local symtab = elf:section(".symtab")
    local symbol = symtab:symbol(function_name)
    local text = elf:section(".text")
    local disassembly = text:disassemble()
    local instructions = {}

    -- keep every instruction in [symbol:value(), symbol:value() + size)
    for i,instruction in pairs(disassembly) do
        if instruction["address"] >= symbol:value() and
           instruction["address"] <  symbol:value() + symbol:size():uint_t() then
            table.insert(instructions, instruction)
        end
    end
    return instructions
end

-- call it only after the function has been defined
instructions = disassemble_function(elf, argv[2])

Of course, for this to work the function symbol must not be stripped from our binary.

We’re beginning to see some of the awesomeness shine through. elf:section() lets us grab a section by name (or index!), and symbol sections let us grab symbols by name (or index) as well.

So what do we do here? We grab the symbol for our function from the symbol table “.symtab” and disassemble “.text”. Every instruction that falls within the bounds of our symbol goes into a new table of instructions, which we return. We don’t do much error checking here. As long as we pass the name of a valid function symbol, this returns the instructions for that function.
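Without the int_t/uint_t types, the filtering step is a half-open interval test, [value, value + size). Here’s a plain-Lua sketch using made-up instruction records: plain tables with an address field, standing in for rop_tools’s instructions. Sorting the result is my own addition, because pairs() makes no promise about iteration order:

```lua
-- Plain-Lua sketch of the filtering step, using ordinary numbers instead
-- of uint_t. The records below are invented stand-ins, not rop_tools objects.
local function instructions_in_symbol (disassembly, start, size)
    local result = {}
    for _, ins in pairs(disassembly) do
        -- half-open range: [start, start + size)
        if ins.address >= start and ins.address < start + size then
            table.insert(result, ins)
        end
    end
    -- pairs() has no defined order, so sort by address before printing
    table.sort(result, function (a, b) return a.address < b.address end)
    return result
end

local disassembly = {
    { address = 0x400500, mnemonic = "push" },
    { address = 0x400501, mnemonic = "mov" },
    { address = 0x400504, mnemonic = "ret" },
    { address = 0x400505, mnemonic = "nop" },   -- first byte past the symbol
}

local body = instructions_in_symbol(disassembly, 0x400500, 5)
for _, ins in ipairs(body) do
    print(string.format("%08x  %s", ins.address, ins.mnemonic))
end
```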

Expect this to be a one-liner in future revisions of rop_tools:

instructions = elf:section(".symtab"):symbol(function_name):disassemble()

There’s a funny line in there; let’s take a look:

           instruction["address"] <  symbol:value() + symbol:size():uint_t() then

uint_t(). Remember when we said elf_t was going to abstract much of the 32/64-bit pain away from us? Part of the magic happens in two new Lua “types” (really “objects”, or in Lua-speak, “metatables”): int_t and uint_t. They let us deal with 8-, 16-, 32- and 64-bit signed and unsigned integers in a very transparent way. Some things we can do with these types:

  • Create new integers with int_t.new(size, value), e.g. int_t.new(32, 1) or uint_t.new(64, 1)
  • Add, subtract, multiply, divide, and take the modulus
  • Compare against each other
  • Cast to uint_t or to a Lua number (integer): int_t.new(32, 1):uint_t():int()
  • Print in “%d” format: print(int_t.new(32, 1)) or some_string = "number one: " .. uint_t.new(8, 1):str()
  • Print in “%0?x” format, where the padding width ? depends on the integer’s size in bits: print(uint_t.new(32, 9000):strx())

Note that for correctness, some operations are forbidden. Examples include adding a uint_t to an int_t, or subtracting a larger (u|)int_t from a smaller one. These should fail and raise errors. Most dangerously, comparing a uint_t with an int_t currently doesn’t fail: it always returns false. This will probably be the first thing fixed after this writing, so that it always fails.
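To make these rules concrete, here’s a toy version built on a single Lua metatable. It is not rop_tools’s int_t/uint_t. The int and uint constructors, the value field and the 32-bit limit are assumptions for illustration only. Sharing one metatable matters. In Lua 5.1 and 5.2, == between two values with different __eq metamethods returns false without calling either one. Keeping both signednesses on one metatable is what lets a mixed comparison raise an error here:

```lua
-- Toy checked fixed-width integers, loosely modeled on the rules above.
-- NOT rop_tools's implementation: int/uint, the value field and the
-- 32-bit limit are assumptions for illustration.
local Fixed = {}
Fixed.__index = Fixed

local function new (bits, signed, value)
    assert(bits <= 32, "toy type: at most 32 bits, so Lua numbers stay exact")
    return setmetatable({ bits = bits, signed = signed, value = value }, Fixed)
end

local function uint (bits, value) return new(bits, false, value) end
local function int (bits, value) return new(bits, true, value) end

-- Every operation refuses to mix signed and unsigned operands.
local function check_same_sign (a, b)
    if a.signed ~= b.signed then
        error("cannot mix int and uint")
    end
end

function Fixed.__add (a, b)
    check_same_sign(a, b)
    local bits = math.max(a.bits, b.bits)      -- result takes the larger width
    local v = a.value + b.value
    if not a.signed then v = v % 2 ^ bits end  -- unsigned addition wraps around
    return new(bits, a.signed, v)
end

function Fixed.__sub (a, b)
    check_same_sign(a, b)
    if not a.signed and b.value > a.value then
        error("unsigned subtraction would underflow")
    end
    return new(math.max(a.bits, b.bits), a.signed, a.value - b.value)
end

-- All values share one metatable, so mixed comparisons reach these
-- metamethods and raise an error instead of quietly returning false.
function Fixed.__eq (a, b) check_same_sign(a, b); return a.value == b.value end
function Fixed.__lt (a, b) check_same_sign(a, b); return a.value < b.value end
function Fixed.__le (a, b) check_same_sign(a, b); return a.value <= b.value end

-- Zero-padded hex: one hex digit per 4 bits, e.g. 8 digits for 32 bits.
function Fixed:strx ()
    return string.format("%0" .. math.floor(self.bits / 4) .. "x", self.value)
end

print(uint(32, 9000):strx())                                    -- 00002328
print(pcall(function () return uint(32, 1) + int(32, 1) end))   -- false, then the error message
```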

What elf_t (and child objects such as section_t, symbol_t, or relocation_t) returns depends on what is defined in the ELF format. A few minor “helper” functions have been added, such as section:num(); there is no num field in the ELF spec. These helper functions normally return a native Lua type, such as a Number (pushed with lua_pushinteger).

Once things have stabilized, the API will be documented. For now, I just write my code and let the library check it. If I do something unsafe, it fails and I know something is wrong. If nothing fails, nothing unsafe happened.

It’s also worth noting that a returned value has the same signedness whether the ELF is 32- or 64-bit. Take sh_entsize: the ELF spec declares Elf32_Shdr.sh_entsize as an Elf32_Word (32-bit unsigned), but Elf64_Shdr.sh_entsize as an Elf64_Xword (64-bit unsigned). Why someone thought this was a good idea evades me (there are no ELF objects even close to 2^31-1 bytes in size). Either way, elf_t will simply return a 64-bit signed integer in an int_t for you.

Now, moving on. You’ll notice that our output highlights the addresses that local jumps land on. We’ll need to locate those jump targets first, so let’s write some code to do that.

-- jmp plus all 16 x86 conditional jumps (jcc), spelled the way the
-- disassembler names them
local JUMP_MNEMONICS = {
    jmp = true,
    jo  = true, jno = true, jb  = true, jae = true,
    jz  = true, jnz = true, jbe = true, ja  = true,
    js  = true, jns = true, jp  = true, jnp = true,
    jl  = true, jge = true, jle = true, jg  = true,
}

function is_jump (mnemonic)
    return JUMP_MNEMONICS[mnemonic] == true
end

jump_locations = {}
for i, instruction in pairs(instructions) do
    if is_jump(instruction["mnemonic"]) then
        table.insert(jump_locations, (instruction["address"]:int_t() +
                                      instruction["operands"][1]["lval"] +
                                      int_t.new(8, instruction["size"])):uint_t())
    end
end

You’ll notice we do quite a bit of sign swapping here. We also create a new 8-bit integer to hold the size of our instruction (let’s hope there aren’t any 128-byte instructions! Luckily, x86 caps instruction length at 15 bytes). What size will the resulting uint_t be? The size of the largest int_t involved. That is, for 32-bit ELFs we’ll be saving a 32-bit address, and for 64-bit ELFs a 64-bit address.
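Without the types, the “sign swapping” above comes down to three steps:

1. Reinterpret the encoded displacement as a two’s-complement signed value.
2. Add it to the address of the next instruction (address + size).
3. Wrap the result to the address width.

Here’s a plain-number sketch. to_signed and jump_target are hypothetical helpers, not rop_tools functions:

```lua
-- Plain-number sketch of the jump-target arithmetic (rop_tools does this
-- with int_t/uint_t; these helper names are hypothetical). Lua numbers are
-- exact only below 2^53, which is plenty for these example addresses.

-- Reinterpret an unsigned raw field of the given width as two's complement.
local function to_signed (raw, bits)
    if raw >= 2 ^ (bits - 1) then
        return raw - 2 ^ bits
    end
    return raw
end

-- Relative jumps are measured from the end of the jump instruction, and
-- the result wraps around to the address width.
local function jump_target (address, size, raw_disp, disp_bits, addr_bits)
    local target = address + size + to_signed(raw_disp, disp_bits)
    return target % 2 ^ addr_bits
end

-- A 2-byte short jmp (EB F0) at 0x400500: 0xF0 is -16 as a signed byte.
print(string.format("%x", jump_target(0x400500, 2, 0xF0, 8, 64)))  -- 4004f2
```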

Now it’s time to print out some instructions.

for _, instruction in ipairs(instructions) do
    -- is this address one of our jump locations?
    local address = TERM_COLOR_GREEN .. instruction["address"]:strx() ..
                    TERM_COLOR_DEFAULT
    for _, jump_location in ipairs(jump_locations) do
        if jump_location == instruction["address"] then
            address = TERM_COLOR_CYAN .. TERM_BOLD ..
                      instruction["address"]:strx() ..
                      TERM_NORMAL .. TERM_COLOR_DEFAULT
            break
        end
    end

Here we’re setting up the address part of our output. It’s green by default, or bold cyan if the address is the target of one of our local jumps.
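The nested loop rescans every jump location for every instruction. A hypothetical alternative is to build a set of targets once and do one lookup per instruction. One catch: Lua table keys compare by raw equality. If uint_t values are tables or userdata (an assumption), two equal addresses would become different keys. Keying on a hex string (in rop_tools, perhaps :strx()) avoids that. Plain numbers stand in for uint_t here:

```lua
-- Hypothetical alternative to the nested loop: index jump targets once,
-- then test each instruction address with a single table lookup.
-- Keys are hex strings so that equal addresses map to the same key even
-- if the address objects themselves are distinct.
local function hex_key (address)
    return string.format("%x", address)
end

local function make_address_set (addresses)
    local set = {}
    for _, address in ipairs(addresses) do
        set[hex_key(address)] = true
    end
    return set
end

local jump_set = make_address_set({ 0x400508, 0x400530 })

local function is_jump_target (address)
    return jump_set[hex_key(address)] == true
end

print(is_jump_target(0x400508))  -- true
print(is_jump_target(0x400509))  -- false
```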

Next we determine the text to output for call instructions.

    if instruction["mnemonic"] == "call" then
        instruction = TERM_COLOR_RED .. TERM_BOLD .. instruction["mnemonic"] ..
                      " " .. relative_offset_description(elf, instruction) ..
                      TERM_NORMAL .. TERM_COLOR_DEFAULT

Ahh… relative_offset_description, a real workhorse. Let’s take a look.

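function operand_abs (operand, address, size)
    -- relative operands are measured from the end of the instruction
    local absolute = address + uint_t.new(32, size)

    -- lval is a signed int_t: never mix it directly with the unsigned
    -- address; add or subtract its magnitude as a uint_t instead
    if operand["lval"]:int() < 0 then
        absolute = absolute - (operand["lval"] * int_t.new(8, -1)):uint_t()
    else
        absolute = absolute + operand["lval"]:uint_t()
    end

    return absolute
end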

function relative_offset_description (elf, instruction)
    local target_address
    local description = nil

    target_address = operand_abs(instruction["operands"][1],
                                 instruction["address"],
                                 instruction["size"])

local description will hold the description of this instruction’s relative offset, which we eventually return. local target_address is the target address of this instruction. We’ve created a function, operand_abs, to turn an instruction’s relative operand into an absolute address. The offset is relative to the end of the instruction, so we start from address + size. You’ll notice we add or subtract operand["lval"] in a weird way. An operand’s lval is an int_t, so it is signed, and we do not subtract signed integers from unsigned integers. This is here to protect you. Is it funkified? A bit. Is it going to save your butt in the long run? You bet your butt it is.
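Here’s the same add-or-subtract idea as operand_abs, using plain Lua numbers in a hypothetical operand_abs_plain. The explicit underflow check is an assumption. It stands in for the error a uint_t subtraction would raise:

```lua
-- Plain-number sketch of the operand_abs idea: every intermediate value
-- stays non-negative, because a negative offset becomes a subtraction of
-- its magnitude. The underflow check stands in for uint_t's errors.
local function operand_abs_plain (lval, address, size)
    local absolute = address + size          -- address of the next instruction
    if lval < 0 then
        local magnitude = -lval
        if magnitude > absolute then
            error("target would fall below address 0")
        end
        return absolute - magnitude
    end
    return absolute + lval
end

-- A 5-byte call at 0x400520 with offset -0x120 lands on 0x400405.
print(string.format("%x", operand_abs_plain(-0x120, 0x400520, 5)))  -- 400405
```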

At this point, target_address should hold the target address of our instruction, or where we are calling to.

Next, we’ll check all of the symbols in “.symtab” to see if there’s a valid function symbol for this call in there.

    local symtab = elf:section(".symtab")
    for i = 0,symtab:num()-1 do
        local symbol = symtab:symbol(i)
        if symbol:value() <= target_address and
           symbol:value() + symbol:size():uint_t() > target_address then
            description = symbol:name() ..
                          " ( " .. target_address:strx() .. " | " ..
                          tostring(target_address - symbol:value()) .. " )"
            break
        end
    end

Some quick notes about this code:

  • symtab:num() (section_t:num()) is one of those “convenience” functions. It doesn’t exist in the ELF spec, but it’s nice to have. It returns a native Lua type.
  • symtab:symbol(N) (section_t:symbol(N)) returns symbols by the index at which they appear in the ELF. That is, indices start at 0, not at 1 as is the Lua convention. We only venture so far from the spec, my friends.
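Here’s a plain-Lua sketch of the .symtab scan, using a made-up symbol table indexed from 0 like symtab:symbol(i). Index 0 holds the reserved undefined symbol (STN_UNDEF). It has size 0, so it never matches. The addresses are invented for illustration, and the output format only approximates relative_offset_description’s:

```lua
-- Plain-Lua sketch of the .symtab lookup. These records are made up and
-- stand in for symtab:symbol(i); like the ELF, indices start at 0.
local symbols = {
    [0] = { name = "",       value = 0,        size = 0 },    -- STN_UNDEF
    [1] = { name = "dowork", value = 0x400524, size = 0x3a },
    [2] = { name = "main",   value = 0x40055e, size = 0x1b },
}
local num_symbols = 3

local function describe (target)
    for i = 0, num_symbols - 1 do
        local s = symbols[i]
        -- half-open range [value, value + size); zero-sized symbols never match
        if s.value <= target and target < s.value + s.size then
            return string.format("%s ( %x | %d )", s.name, target, target - s.value)
        end
    end
    return string.format("(%x)", target)
end

print(describe(0x400530))  -- dowork ( 400530 | 12 )
print(describe(0x400600))  -- (400600)
```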

Nothing else too interesting to see here. If the target isn’t in “.symtab”, our next best bet is the PLT. Oh boy, this is going to be fun.

A little background first. When your ELF is dynamically linked against a library, two sections of the executable become important. They are the PLT, or Procedure Linkage Table, and the GOT, or Global Offset Table. (The closest PE equivalent is the Import Address Table, or IAT.) The PLT contains JMP instructions that jump through addresses held in the GOT. Each GOT entry initially points back into the PLT, at code that asks the dynamic linker to resolve the linked function. So the first call jumps back into the PLT. The dynamic linker then writes the function’s real address into the GOT and calls the function. Every subsequent call goes straight to the function, since its real address is now in the GOT. This is called lazy binding (or lazy linking), and it’s worth understanding before we move on.
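To make lazy binding concrete, here’s a toy model in plain Lua, with functions standing in for machine code. It only mirrors the overall shape of the mechanism:

1. The GOT slot starts out pointing at a resolver path.
2. The first call patches the slot.
3. Later calls go straight to the target.

The names are hypothetical, and none of this is how the dynamic linker is actually implemented:

```lua
-- Toy model of lazy binding; Lua functions stand in for code addresses.
local libc = {
    puts = function (s) return "puts: " .. s end,
}

local got = {}
local resolutions = 0

-- Stand-in for the PLT's "push relocation index; jump to the resolver" path.
local function make_resolver (name)
    return function (...)
        resolutions = resolutions + 1
        got[name] = libc[name]        -- patch the GOT slot with the real target
        return got[name](...)         -- then continue into the real function
    end
end

got.puts = make_resolver("puts")      -- initial GOT entry points back at the resolver path

-- A PLT entry is just an indirect jump through its GOT slot.
local function plt_puts (...)
    return got.puts(...)
end

print(plt_puts("first"))   -- resolves, then calls puts
print(plt_puts("second"))  -- goes straight to puts
print(resolutions)         -- 1
```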

    if description == nil then
        local plt = elf:section(".plt")
        -- check if target_address is in the ".plt"
        if target_address >= plt:address() and
           target_address < plt:address() + plt:size():uint_t() then
            local plt_jmp = plt:disassemble(target_address)
            local op_address = operand_abs(plt_jmp["operands"][1],
                                           target_address,
                                           plt_jmp["size"])
            -- find relocation for op_address
            local relplt
            if elf:section_exists(".rel.plt") then
                relplt = elf:section(".rel.plt")
            else
                relplt = elf:section(".rela.plt")
            end
            for i = 0,relplt:num()-1 do
                local relocation = relplt:relocation(i)
                if relocation:offset() == op_address then
                    description = relocation:name() .. "@PLT"
                    break
                end
            end
        end
    end

In sequence, we

  1. Make sure the target address is in the “.plt”
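  2. If it is, we disassemble the instruction at target_address, which is the first instruction of that PLT entry. (Yes, we can disassemble at specific loaded, runtime addresses, and elf_t does the automagic for us.) Then we get the absolute address that this instruction’s operand refers to. local op_address will be the address of the entry in the GOT, not the value held at that address. Computing it relative to the instruction matches x86-64, where the PLT jump is RIP-relative. 32-bit non-PIC PLT entries encode the GOT address as an absolute operand, so this step may need adjusting there.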
  3. We then grab the relocation table, making sure we get the right one. gcc uses .rel.plt for 32-bit ELFs and .rela.plt for 64-bit ELFs, following the ABIs: i386 uses REL relocations and x86-64 uses RELA. We can’t abstract away everything :(.
  4. We loop through the relocations, looking for one whose offset is our address in the GOT. If we find one, we grab the name of the linked function.
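On x86-64, a PLT entry starts with a 6-byte jmp [rip+disp32] (FF 25 followed by a 32-bit displacement), so steps 2 and 4 come down to the arithmetic below. This is a plain-number sketch: the addresses and relocation records are made up, and got_slot and plt_name are hypothetical helpers:

```lua
-- Plain-number sketch of steps 2 and 4 for an x86-64 PLT entry.
-- The relocation records stand in for relplt:relocation(i), indexed from 0.
local relocations = {
    [0] = { offset = 0x601018, name = "sprintf" },
    [1] = { offset = 0x601020, name = "puts" },
}
local num_relocations = 2

-- RIP-relative operands are measured from the end of the instruction.
local function got_slot (plt_entry, jmp_size, disp)
    return plt_entry + jmp_size + disp
end

local function plt_name (plt_entry, jmp_size, disp)
    local slot = got_slot(plt_entry, jmp_size, disp)
    for i = 0, num_relocations - 1 do
        if relocations[i].offset == slot then
            return relocations[i].name .. "@PLT"
        end
    end
    return nil   -- no relocation targets this GOT slot
end

-- jmp [rip+0x200c02] at 0x400410 reads the GOT slot at 0x601018.
print(plt_name(0x400410, 6, 0x200c02))   -- sprintf@PLT
```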

Now we just need to wrap up the results for relative_offset_description.

    if description ~= nil then
        return description
    else
        return "(" .. target_address:strx() .. ")"
    end
end

We return the description. If we couldn’t find a good description, we just return the target_address.

So that’s how we handle calls; the rest is fairly straightforward.

    elseif is_jump(instruction["mnemonic"]) then
        instruction = TERM_COLOR_CYAN .. TERM_BOLD .. instruction["mnemonic"] ..
                      " " .. relative_offset_description(elf, instruction) ..
                      TERM_NORMAL .. TERM_COLOR_DEFAULT
    elseif instruction["mnemonic"] == "ret" then
        instruction = TERM_COLOR_YELLOW .. TERM_BOLD ..
                      instruction["description"] ..
                      TERM_NORMAL .. TERM_COLOR_DEFAULT
    else
        instruction = instruction["description"]
    end

Here’s the code for the rest of our instructions. Remember that is_jump() function from earlier? Hopefully you do. Jump instructions use the same relative_offset_description function as calls.

Only one thing left to do, print out that awesome line of assembly for us.

    print(address .. "   " .. instruction)
end

And we’re done. If you want to know the values of the TERM variables, they are:

TERM_COLOR_RED     = "\027[31m"
TERM_COLOR_GREEN   = "\027[32m"
TERM_COLOR_YELLOW  = "\027[33m"
TERM_COLOR_BLUE    = "\027[34m"
TERM_COLOR_CYAN    = "\027[36m"
TERM_COLOR_DEFAULT = "\027[39m"
TERM_BOLD          = "\027[1m"
TERM_NORMAL        = "\027[22m"
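Each of those values is an ANSI SGR escape sequence. \027 is Lua’s decimal escape for ESC, so "\027[33m" selects a yellow foreground. Code 39 restores the default foreground, 1 turns on bold, and 22 returns to normal intensity. Here’s a small hypothetical helper (colorize is not part of rop_tools or test1.lua) that wraps text and then resets both attributes:

```lua
-- "\027" is ESC, so each value below is an ANSI SGR sequence: ESC [ <code> m.
local TERM_COLOR_YELLOW  = "\027[33m"
local TERM_COLOR_DEFAULT = "\027[39m"
local TERM_BOLD          = "\027[1m"
local TERM_NORMAL        = "\027[22m"

-- Wrap text in a color (optionally bold), then reset both attributes so
-- the rest of the line prints normally.
local function colorize (text, color, bold)
    if bold then
        return color .. TERM_BOLD .. text .. TERM_NORMAL .. TERM_COLOR_DEFAULT
    end
    return color .. text .. TERM_COLOR_DEFAULT
end

print(colorize("ret", TERM_COLOR_YELLOW, true))
```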

A link to the full source can be found here: https://github.com/endeav0r/rop_tools/blob/ec09b871175eda4533fb61e426b6d242730d7e89/tools/test1.lua

I wouldn’t suggest using rop_tools in its current state, as it’s changing rapidly. However, consider this a taste of what’s to come. Now that a good portion of the tear-apart-and-disassemble code is complete, it’s time to add some “brain.”

posted in rop_tools by endeavormac
