Hack The Box - ReRop

A write-up for the Hack The Box reverse engineering challenge "ReRop"
2024-10-06

Introduction

The zip archive downloaded from Hack The Box contains a single file: a Linux x86_64 executable named rerop. The file is not stripped, so we get quite a lot of help from symbol names. It also isn't position independent, so we know the various sections from the binary will be loaded at addresses determined at compile time and not run time. This will become important later.

The main function is extremely simple in appearance:

This leaves the check function as the most likely source of interest, both from its name and the absence of anything else interesting happening in main.

The check function

Looking into the check function, the Ghidra decompilation output is very misleading, and just shows the following:

void check(void)
{
  return;
}

This is a case where decompilation can't really do much, because the function's implementation can't really be expressed in C. Examining the disassembly is much more illuminating:

endbr64
lea %(rdi), %rsp ; Intel: LEA RSP, [RDI]
ret

This function does do something a lot more significant than just returning, it replaces the stack pointer with the address passed in as the first argument. Reflecting on the challenge name, this makes a lot of sense, as this looks a lot like a stack pivot to set up a ROP1 chain.

Return-oriented Programming

It seems reasonable to give a quick overview of return-oriented programming here, before moving on with this write-up, for the sake of completeness.

In return-oriented programming, either the stack is overwritten, or the stack pointer itself is altered, to have the return pointer of a stack frame point to a so-called gadget. These gadgets are usually desired to be a short sequence of instructions to achieve a specific purpose, e.g. popping a value from the stack into a register, followed by a ret instruction. Because the stack is controlled, this means the return pointer used by the ret from a gadget can also be controlled, giving a way of executing an arbitrary chain of instructions only using the code that already exists within an executable.

This technique can be useful to bypass the non-executable stack protection that prevents an attacker from simply writing a shellcode to the stack and overwriting the return pointer of the current stack frame to jump into that shellcode. Since return-oriented programming only relies on execution of code that is already part of the program being hijacked, and that code must be executable for the program to work, this technique provides an elegant bypass for one of the early mitigations created against turning stack-based buffer overflows into arbitrary code execution.

In the case of the rerop executable, the stack pointer is modified to pointer instead to some static data contained in the binary's .data section. We can analyze this data to establish the actual code execution by treating any pop instruction as consuming data from the "stack", and treating ret as special case of pop %rip.

Linearizing the ROP Chain

What we have in the stack, after the pivot to the address of the data symbol, is something like the following (truncated to match up with a functional block as shown later):

0x4c5100: 0x450ec7
0x4c5108: 0x000065
0x4c5110: 0x401eef
0x4c5118: 0x000000
0x4c5120: 0x409f1e
0x4c5128: 0x000001
0x4c5130: 0x458142
0x4c5138: 0x000000
0x4c5140: 0x41aab6
0x4c5148: 0x451fe0
0x4c5150: 0x450ec7
0x4c5158: 0x001198
0x4c5160: 0x452000

The structure of this is fairly simple, it's a list of return pointers interlaced with data that is consumed from the stack by the instructions referenced by the return address, before that sequence of instructions invokes another return. E.g. the instructions at 0x450ec7 (the first return pointer) are pop %rax; ret, so the first gadget sets rax equal to 0x65 (since that is the next entry on the stack after the return address).

What I'd like to see is the actual instructions executed at each return address, in sequence, including the register assignments using pop instructions. In order to achieve this, without having to manually reference every return address, I wrote a short python script to "linearize" the chain into a more comprehensible form. The program I came up with is below. It loads the binary using ELF class from pwntools to allow looking up data for symbols and from arbitrary addresses in the binary. It uses the same disassembly library (iced_x86) as I used in the PTRACE_NOPEEKING challenge.

#!/usr/bin/env python3

from iced_x86 import Decoder, Formatter, FormatterSyntax, Mnemonic, OpCodeOperandKind

from pwnlib.elf.elf import ELF
from pwnlib.util.packing import u64

def main():
    formatter = Formatter(FormatterSyntax.GAS)
    formatter.branch_leading_zeros = False
    formatter.space_after_operand_separator = True

    elf = ELF("rerop")

    # Initialize the stack pointer to the addres of `data`
    stack_pointer = elf.sym["data"]

    valid = True

    while valid:
        # Get the current return address from the stack pointer,
        return_addr = u64(elf.read(stack_pointer, 8))
        # Imitate a `ret` by consuming the data from the stack
        stack_pointer += 8

        # Read some data from the return address, 0x200 is arbitrary but proved plenty for this to work
        gadget = elf.read(return_addr, 0x200)

        decoder = Decoder(64, gadget, ip=return_addr)

        for instr in decoder:
            if instr.mnemonic == Mnemonic.INVALID:
                # Bail on encoutering the first invalid instruction
                # Might be nicer to bail on `syscall` when %rax = 0x3c (exit), but this works just fine
                valid = False
                break

            finstr = formatter.format(instr)

            if instr.mnemonic == Mnemonic.POP:
                # Get the data from the stack
                data = elf.read(stack_pointer, 8)
                # Imitate a `pop` by consuming data from the stack
                stack_pointer += 8

                # In theory we could have pops of various lengths, not just full 64 bit registers,
                # but in practice only pops of 64 bit length were used
                val = u64(data)

                print(f'0x{instr.ip:06x}: {finstr}\t# 0x{val:02x}')
            elif instr.mnemonic == Mnemonic.RET:
                print(f'0x{instr.ip:06x}: {finstr}\t\t# rsp = 0x{stack_pointer:02x}')
                break
            else:
                print(f'0x{instr.ip:06x}: {finstr}')


if __name__ == "__main__":
    main()

The basic implementation is as follows:

The output of this program is something like the following (truncated to show the same block from the stack data listed above):

0x450ec7: pop %rax	# 0x65
0x450ec8: ret		# rsp = 0x4c5110
0x401eef: pop %rdi	# 0x00
0x401ef0: ret		# rsp = 0x4c5120
0x409f1e: pop %rsi	# 0x01
0x409f1f: ret		# rsp = 0x4c5130
0x458142: pop %rdx	# 0x00
0x458143: ret		# rsp = 0x4c5140
0x41aab6: syscall
0x41aab8: ret		# rsp = 0x4c5148
0x451fe0: mov %rax, %rdi
0x451fe3: ret		# rsp = 0x4c5150
0x450ec7: pop %rax	# 0x1198
0x450ec8: ret		# rsp = 0x4c5160
0x452000: mov %rax, %rsi
0x452003: xor %rbx, %rbx
0x452006: test %rdi, %rdi
0x452009: cmovs %rsi, %rbx
0x45200d: add %rbx, %rsp
0x452010: ret		# rsp = 0x4c5168
...

This gives a good example of how the program flow works. By using pop instructions, the values of each register can be modified by using the data on the stack. Comparing this with the raw stack data, you can see the addresses on the left correlate to the return addresses in the stack data, as well as the comments indicating register assignments matching up with the data on the stack immediately following the return pointer to that gadget.

This particular block calls the ptrace2 system call, with the request PTRACE_TRACEME. The value of rax is the system call number, and there is a good reference for the register values for specific system calls at https://x64.syscall.sh/.

There is another really interesting thing about this code, in that effectively performs a conditional jump, even though there are no jmp/jcc instructions present. After the system call, its return value is stored in rdi, and an offset of 0x1198 is stored in rax. The code at 0x452000 zeroes the value in rbx, tests the value in rdi (the return value of ptrace) and if the sign bit is set, sets the value of rbx to be the offset. The instruction add %rbx, %rsp is then either add $0, %rsp (a no-op) or addr $0x1198, %rsp, which is effectively a jump, given the next instructions are determined by the value on the stack. The last ret has a comment listing its unmodified stack pointer address - this is the next gadget address if the condition didn't match. If the sign bit was set and cmovs set the value of rbx to the offset 0x1198, then the next gadget address would be 0x4c5168 + 0x1198 = 0x4c6300. The instructions executed on this branch can be found by finding the ret whose rsp value is 0x4c6300.

This address has the following breakdown in the program output:

0x450ec7: pop %rax	# 0x706f4e6d31335b1b
0x450ec8: ret		# rsp = 0x4c6310
0x458142: pop %rdx	# 0x4c57e8
0x458143: ret		# rsp = 0x4c6320
0x419ad8: mov %rax, (%rdx)
0x419adb: ret		# rsp = 0x4c6328
0x450ec7: pop %rax	# 0xa6d305b1b65
0x450ec8: ret		# rsp = 0x4c6338
0x458142: pop %rdx	# 0x4c57f0
0x458143: ret		# rsp = 0x4c6348
0x419ad8: mov %rax, (%rdx)
0x419adb: ret		# rsp = 0x4c6350
0x450ec7: pop %rax	# 0x01
0x450ec8: ret		# rsp = 0x4c6360
0x401eef: pop %rdi	# 0x01
0x401ef0: ret		# rsp = 0x4c6370
0x409f1e: pop %rsi	# 0x4c57e8
0x409f1f: ret		# rsp = 0x4c6380
0x458142: pop %rdx	# 0x0e
0x458143: ret		# rsp = 0x4c6390
0x41aab6: syscall
0x41aab8: ret		# rsp = 0x4c6398
0x450ec7: pop %rax	# 0x3c
0x450ec8: ret		# rsp = 0x4c63a8
0x401eef: pop %rdi	# 0x01
0x401ef0: ret		# rsp = 0x4c63b8
0x41aab6: syscall
0x41aab8: ret		# rsp = 0x4c63c0

This stores a string at 0x4c57e8, then calls the system call write to write the string to standard output (file descriptor 1), and finally calls the system call exit with status 1 (failure). In summary it is pretty close to the following:

// ANSI control sequences to set colour to red before text, reset after
// Prints the text "Nope" in red to the terminal
puts("\x1b[31mNope\x1b[0m");
exit(1);

This makes it pretty clear that any jump to this address is going to be an indicator of invalid input or state.

Finding the Flag

Now we understand the basic operation of the ROP chain, including the potential for conditional jumps, we need to figure out what the correct flag input should be. Thankfully, the process applied here is very repetitive, so not every instruction needs to be read and understood!

The basic flag validation block to process a single character looks like the following:

0x401eef: pop %rdi	# 0x4c7820
0x401ef0: ret		# rsp = 0x4c5188
0x450ec7: pop %rax	# 0x19
0x450ec8: ret		# rsp = 0x4c5198
0x451ff0: add %rax, %rdi
0x451ff3: ret		# rsp = 0x4c51a0
0x451fe8: mov %rdi, %rax
0x451feb: ret		# rsp = 0x4c51a8
0x45202f: movzbq (%rax), %rax
0x452033: ret		# rsp = 0x4c51b0
0x451fe0: mov %rax, %rdi
0x451fe3: ret		# rsp = 0x4c51b8
0x450ec7: pop %rax	# 0x19
0x450ec8: ret		# rsp = 0x4c51c8
0x451ff0: add %rax, %rdi
0x451ff3: ret		# rsp = 0x4c51d0
0x450ec7: pop %rax	# 0x05
0x450ec8: ret		# rsp = 0x4c51e0
0x451ff8: xor %rax, %rdi
0x451ffb: ret		# rsp = 0x4c51e8
0x450ec7: pop %rax	# 0x6e
0x450ec8: ret		# rsp = 0x4c51f8
0x451fec: sub %rax, %rdi
0x451fef: ret		# rsp = 0x4c5200
0x452011: mov $1, %esi
0x452016: test %rdi, %rdi
0x452019: cmovne %rsi, %rdx
0x45201d: ret		# rsp = 0x4c5208

First, it loads the address of buf (the user input) into rdi, then it adds 0x19 to the address, then it loads a single byte from the address into rax, and moves it into rdi. Next, it adds 0x19 to the character value, xors it with 0x05, then subtracts 0x6e. Finally, it tests if the value after processing is zero, and if not it sets rdx to 1.

This pattern repeats with different character indexes and subtraction values until the final test of validity is just to ensure that rdx is zero.

By extracting the subtraction value for each offset, the flag input can be generated. The following python script contains all extracted offsets and the value in the final subtraction, then applies the inverse of the transformation to find each character's correct value (note the mapping of 0x19 to 0x6e as demonstrated in the example):

#!/usr/bin/env python3

from operator import xor

sub_vals = {
        0x00: 0x4d,
        0x01: 0x50,
        0x02: 0x41,
        0x03: 0x7b,
        0x04: 0x5e,
        0x05: 0x3c,
        0x06: 0x6a,
        0x07: 0x5e,
        0x08: 0x62,
        0x09: 0x65,
        0x0a: 0x3b,
        0x0b: 0x5b,
        0x0d: 0x64,
        0x0c: 0x6e,
        0x0e: 0x73,
        0x0f: 0x4a,
        0x10: 0x81,
        0x11: 0x75,
        0x12: 0x67,
        0x13: 0x6f,
        0x14: 0x67,
        0x15: 0x71,
        0x16: 0x43,
        0x17: 0x6c,
        0x18: 0x72,
        0x19: 0x6e,
        0x1a: 0x48,
        0x1b: 0x74,
        0x1c: 0x9c,
}

flag_bytes = [0]*0x1d

for i in range(0x1d):
    flag_bytes[i] = xor(sub_vals[i], 0x05) - i

flag = ''.join([chr(x) for x in flag_bytes])
print(flag)

Running this will output the correct flag input.