This particular binary was what I spent most of my time on. HackIM used an adaptive scoring engine this year, and this challenge ended up being worth 497 points out of a possible 500. That makes it a pretty high-value challenge, with only 11 solves total; I came in 6th on it.


Challenge text:

I opened this in GHIDRA but it crashed. halp pls

Binary here


Basic info:

obfusc8_much: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/, for GNU/Linux 3.2.0, stripped


Giving it a run:


Tried other inputs; no change in output. Running it under ltrace, we can find a couple of notable things:

ltrace ./obfusc8_much 
[pid 27946] [0x7f9cf593eec9] SYS_brk(0) = 0x17d8000
[pid 27946] [0x7f9cf59327de] SYS_access("/etc/", 00) = -2
[pid 27946] [0x7f9cf593fe27] SYS_access("/etc/", 04) = -2
[pid 27946] [0x7f9cf593fcdd] SYS_openat(0xffffff9c, 0x7f9cf5944428, 0x80000, 0) = 3
[pid 27946] [0x7f9cf593fc43] SYS_fstat(3, 0x7ffd94620e20) = 0
[pid 27946] [0x7f9cf593ff43] SYS_mmap(0, 0x100f7, 1, 2) = 0x7f9cf5b39000
[pid 27946] [0x7f9cf593fed7] SYS_close(3) = 0
[pid 27946] [0x7f9cf593b139] SYS_access("/etc/", 00) = -2
[pid 27946] [0x7f9cf593fcdd] SYS_openat(0xffffff9c, 0x7f9cf5b4cdd0, 0x80000, 0) = 3
[pid 27946] [0x7f9cf593fda4] SYS_read(3, "\177ELF\002\001\001\003", 832) = 832
[pid 27946] [0x7f9cf593fc43] SYS_fstat(3, 0x7ffd94620e80) = 0
[pid 27946] [0x7f9cf593ff43] SYS_mmap(0, 8192, 3, 34) = 0x7f9cf5b37000
[pid 27946] [0x7f9cf593ff43] SYS_mmap(0, 0x3f0ae0, 5, 2050) = 0x7f9cf5532000
[pid 27946] [0x7f9cf593fff7] SYS_mprotect(0x7f9cf5719000, 2097152, 0) = 0
[pid 27946] [0x7f9cf593ff43] SYS_mmap(0x7f9cf5919000, 0x6000, 3, 2066) = 0x7f9cf5919000
[pid 27946] [0x7f9cf593ff43] SYS_mmap(0x7f9cf591f000, 0x3ae0, 3, 50) = 0x7f9cf591f000
[pid 27946] [0x7f9cf593fed7] SYS_close(3) = 0
[pid 27946] [0x7f9cf5924024] SYS_arch_prctl(4098, 0x7f9cf5b384c0, 0x7f9cf5b38e10, 0x7f9cf5b37998) = 0
[pid 27946] [0x7f9cf593fff7] SYS_mprotect(0x7f9cf5919000, 16384, 1) = 0
[pid 27946] [0x7f9cf593fff7] SYS_mprotect(0x7c2000, 4096, 1) = 0
[pid 27946] [0x7f9cf593fff7] SYS_mprotect(0x7f9cf5b4a000, 4096, 1) = 0
[pid 27946] [0x7f9cf593ffd7] SYS_munmap(0x7f9cf5b39000, 65783) = 0
[pid 27946] [0x4005c9] memset(0x7ffd946217a0, '\0', 71) = 0x7ffd946217a0
[pid 27946] [0x4005de] __isoc99_scanf(0x5c2b74, 0x7ffd946217a0, 71, 0 <unfinished ...>
[pid 27946] [0x7f9cf56417c3] SYS_fstat(0, 0x7ffd945fbbb0) = 0
[pid 27946] [0x7f9cf56484b9] SYS_brk(0) = 0x17d8000
[pid 27946] [0x7f9cf56484b9] SYS_brk(0x17f9000) = 0x17f9000
[pid 27946] [0x7f9cf5642081] SYS_read(0test
, "test\n", 1024) = 5
[pid 27946] [0x4005de] <... __isoc99_scanf resumed> ) = 1
[pid 27946] [0x5c2add] printf("%d", 0 <unfinished ...>
[pid 27946] [0x7f9cf56417c3] SYS_fstat(1, 0x7ffd945f8530) = 0
[pid 27946] [0x5c2add] <... printf resumed> ) = 1
[pid 27946] [0x7f9cf5642154] SYS_write(1, "0", 10) = 1
[pid 27946] [0x7f9cf5642217] SYS_lseek(0, -1, 1) = -29
[pid 27946] [0x7f9cf5616e06] SYS_exit_group(0 <no return ...>
[pid 27946] [0xffffffffffffffff] +++ exited (status 0) +++



  • 0x4005c9 == memset'ing a buffer (getting it ready for use)
  • 0x4005de == scanf'ing into that buffer (you can tell by the reuse of the same memory address as memset)
  • 0x5c2add == printing our failure. Note it's a %d, so we probably should not expect anything other than an int at that point.


At this point, I decided to open it up in radare2 (r2 -A ./obfusc8_much). After seeking over to main, I was greeted with a slew of identified variables. For those unaware, one of the analysis passes in radare2 (and most major disassemblers) identifies variables on the stack. It does this by emulating execution (in r2's case, using an intermediate language called ESIL) and watching for stack-based reads and writes. When it discovers one, it marks the stack offset at that location as a possible variable. It's not uncommon to see upwards of 20 of these for longer functions. For the main function here, however, r2 discovered 729 local variables (afi~locals).


In fact, the visual control flow graph for this function is so large that IDA Pro refused to render it at all. Radare2, on the other hand, puts on its "no fear" hat and boldly builds the graph for you. Surprisingly, after about 30 seconds, it does pop up the graph and you can somewhat navigate around. This turns out to be unimportant, however, as the graph is far too large to actually see or understand. This is because we keep running into things called opaque predicates.


Without getting too far into the details of opaque predicates, the concept is that, to the compiler (and disassembler), a branch looks like it could go either way, when in reality it will always go the same direction. Think of it like when your boss asks you to come in on the weekend. Technically, it looks like you have the option not to come in, but really it isn't conditional at all; the outcome was determined ahead of time.


Here's an example that repeats throughout the binary:

│       ╭─> 0x004007b5      b001            mov al, 1
│       ╎   0x004007b7      84c0            test al, al
│      ╭──< 0x004007b9      0f8536000000    jne 0x4007f5    ;[3]
│     ╭───< 0x004007bf      e900000000      jmp 0x4007c4    ;[4]


In this case, the binary is moving the static value 1 into the register al. It is then testing whether al is zero. Well, of course it's not; we just moved 1 in there! Finally, it conditionally jumps to 0x4007f5, a jump that will always be taken. This type of predicate is actually pretty easy for disassemblers to optimize away, since the value is used immediately after it is set. However, there are other tricks that are far more complicated for compilers and disassemblers to identify and optimize. For more information on the predicates used in this binary, see the author's writeup (link below).
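Since the pattern above repeats verbatim, it can be located mechanically. As a minimal sketch (assuming the raw .text bytes have been read into a buffer), here is a stdlib-only scan for the mov al,1 / test al,al / jne rel32 byte sequence that also computes each jne's target, using the exact bytes from the listing above:

```python
import re
import struct

def find_predicate_jumps(blob, base=0):
    # b0 01       mov al, 1
    # 84 c0       test al, al
    # 0f 85 ....  jne rel32 (4-byte little-endian displacement)
    pat = re.compile(rb'\xb0\x01\x84\xc0\x0f\x85(....)', re.DOTALL)
    for m in pat.finditer(blob):
        jne_addr = base + m.start() + 4               # offset of the 0f 85
        rel = struct.unpack('<i', m.group(1))[0]      # signed displacement
        yield jne_addr, jne_addr + 6 + rel            # target = next insn + rel

# The bytes of the snippet at 0x4007b5 above:
blob = bytes.fromhex('b00184c00f8536000000e900000000')
found = list(find_predicate_jumps(blob, 0x4007b5))
print([(hex(a), hex(t)) for a, t in found])  # jne at 0x4007b9 -> 0x4007f5
```

This recovers the same (jne address, target) pairs that the objdump pipeline further down collects, without needing a disassembler at all.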


At this point, I briefly attempted a basic run of angr through the application with the hope that magic would ensue. Of course it did not, as angr hit the huge state-explosion problem and (surprisingly slowly) filled up my RAM. I will return to this in a second, but first I took a brief detour.


In my frustration with the predicates, I decided to try to simply stamp them out. While this effort didn't actually have a reasonable effect, it was nice to see just how easy mass binary editing can be with radare2. To do this, I first used objdump, as it was the easiest way to find the locations I wanted to kill.


objdump -M intel -d ./obfusc8_much | grep -A 1 "test   al,al" | grep jne | cut -f 1 -d ":" | tr -d ' ' | tr '\n' ','


Since the jmp target addresses change, it's super handy to use radare2's assembly patching. With this, you write the assembly you want stamped in, and r2 assembles it and writes the corresponding bytes for you. With that in mind, patching 3600 locations simply becomes about 18 lines of Python.
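One small glue step sits between the objdump pipeline and the patch script: the pipeline emits comma-separated bare-hex addresses, while the script loads a JSON list of integers. A sketch of that conversion (the sample addresses beyond 0x4007b9 are made up for illustration):

```python
import json

# Hypothetical sample of the objdump|grep|cut|tr pipeline's output,
# including the trailing comma it leaves behind.
raw = "4007b9,40081c,400888,"

# objdump prints addresses without the 0x prefix, so parse base 16
# and drop the empty trailing field.
addrs = [int(a, 16) for a in raw.split(',') if a]

with open('addrs_to_fix.json', 'w') as f:
    json.dump(addrs, f)

print(addrs)
```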


#!/usr/bin/env python

import r2pipe
import json
import shutil

with open('addrs_to_fix.json', 'r') as f:
    addrs_to_fix = json.load(f)

shutil.copy('./dist', './dist.dork')
# open the copy in write mode so 'wa' can actually patch bytes
r2 = r2pipe.open('./dist.dork', flags=['-w'])

for addr in addrs_to_fix:
    print('Fixing ' + hex(addr))
    old_inst = r2.cmdj('pij 1 @ {}'.format(addr))
    old_jmp = old_inst[0]['jump']
    r2.cmd('wa jmp {} @ {}'.format(old_jmp, addr))


Unfortunately, as stated up front, that didn't actually help angr at all, since angr is smart enough not to take the wrong branch in the first place. It did change the control flow graph that radare2 created, but not enough to really help me understand the flow any better.


I also checked for instruction counting leaks. When people write software, they often want it to be efficient and fast. This, as it turns out, can be a serious security flaw, as it can allow an iterative approach to breaking the security. This type of fast-fail behavior has previously been abused in login systems to identify the correct password character by character. To check this, I used a core Linux tool called perf. If you're unfamiliar with perf, it exposes performance tracking information such as instruction counts, branches taken, cache hits/misses, etc. It comes in handy for these types of attacks since it does not impose any significant performance cost on the running binary and, unlike valgrind, does not need to modify the binary at all.


$ sudo perf stat -e instructions:u ./obfusc8_much 
Performance counter stats for './obfusc8_much':

181,631 instructions:u

0.504198802 seconds time elapsed

$ sudo perf stat -e instructions:u ./obfusc8_much
Performance counter stats for './obfusc8_much':

181,952 instructions:u

2.496059931 seconds time elapsed


Note here that 'test' as an input caused 181,631 instructions to be run, while 'hest' caused 181,952. This is a bit of a dirty trick, since we happen to know the flag format and that the flag should start with 'hackim19{'. During the CTF, I actually switched back to working with angr at this point, though I should have followed this path further. After the CTF ended, I wrote up a dynamic solver using perf as above. It gets you close, and with some modifications to the script you could likely figure out the real flag; however, instruction counting in this manner isn't perfect.
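The iterative recovery loop can be sketched as a toy simulation. Here fake_count stands in for actually running the binary under perf stat -e instructions:u (which is noisy and needs multiple samples per guess); the hardcoded prefix and the per-character instruction cost are invented purely for the simulation:

```python
import string

SECRET = 'hackim19{'   # hypothetical: stands in for the unknown flag prefix

def fake_count(guess):
    # Simulate a fast-fail comparison: each correct leading character
    # costs extra instructions before the check bails out. The real
    # attack would shell out to perf and parse the instruction count.
    matched = 0
    for a, b in zip(guess, SECRET):
        if a != b:
            break
        matched += 1
    return 100000 + 300 * matched

def recover(length):
    # Greedily extend the known prefix with whichever printable
    # character executes the most instructions.
    known = ''
    for _ in range(length):
        known += max(string.printable[:-6],
                     key=lambda c: fake_count(known + c))
    return known

print(recover(9))
```

Against the simulated oracle this walks out the prefix character by character; against the real binary, the counts in the perf runs above suggest the same greedy loop works, just with more noise handling.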


Circling back to angr, I decided I needed to provide angr more information for it to be able to find the solution. The main issue with how this program executes is that the 'wrong' answer path still has a bunch of branches after it. To find a way around this, I wanted to see how the binary stores its success-or-failure variable. In the ltrace run above, we discovered our print statement at 0x5c2add. Let's look just before it to identify what variable it uses.


[0x005c2add]> pdi 20 @ 0x5c2add-0x31
0x005c2aac 8bb5b8adfdff mov esi, dword [rbp - 0x25248]
0x005c2ab2 8916 mov dword [rsi], edx
0x005c2ab4 488b85b8adfdff mov rax, qword [rbp - 0x25248]
0x005c2abb 8b08 mov ecx, dword [rax]
0x005c2abd 894dec mov dword [rbp - 0x14], ecx
0x005c2ac0 837dec00 cmp dword [rbp - 0x14], 0
0x005c2ac4 0f94c0 sete al
0x005c2ac7 2401 and al, 1
0x005c2ac9 0fb6f0 movzx esi, al
0x005c2acc 48bf772b5c0000000000 movabs rdi, 0x5c2b77
0x005c2ad6 b000 mov al, 0
0x005c2ad8 e893d9e3ff call sym.imp.printf
0x005c2add 31f6 xor esi, esi
0x005c2adf 8985bcacfdff mov dword [rbp - 0x25344], eax
0x005c2ae5 89f0 mov eax, esi
0x005c2ae7 4889ec mov rsp, rbp
0x005c2aea 5d pop rbp
0x005c2aeb c3 ret


Looking back from the printf statement, we can see the variable is being loaded from rbp-0x14. Let's validate this by looking for where this variable is being accessed from.


[0x00400590]> pdi 0x10000| grep -C 3 -i '\[rbp - 0x14\]'
0x004005a1 c745fc00000000 mov dword [rbp - 4], 0
0x004005a8 897df8 mov dword [rbp - 8], edi
0x004005ab 488975f0 mov qword [rbp - 0x10], rsi
0x004005af c745ec00000000 mov dword [rbp - 0x14], 0
0x004005b6 4889c7 mov rdi, rax
0x004005b9 89ce mov esi, ecx
0x004005bb ba47000000 mov edx, 0x47
0x0040a760 488b85f8f1ffff mov rax, qword [rbp - 0xe08]
0x0040a767 833800 cmp dword [rax], 0
0x0040a76a 0f84d0330000 je 0x40db40
0x0040a770 8b45ec mov eax, dword [rbp - 0x14]
0x0040a773 4889e1 mov rcx, rsp
0x0040a776 4883c1f0 add rcx, 0xfffffffffffffff0
0x0040a77a 4889cc mov rsp, rcx
0x0040db2c 418911 mov dword [r9], edx
0x0040db2f 488b85b8f1ffff mov rax, qword [rbp - 0xe48]
0x0040db36 8b08 mov ecx, dword [rax]
0x0040db38 894dec mov dword [rbp - 0x14], ecx
0x0040db3b e96a340000 jmp 0x410faa
0x0040db40 8b45ec mov eax, dword [rbp - 0x14]
0x0040db43 4889e1 mov rcx, rsp
0x0040db46 4883c1f0 add rcx, 0xfffffffffffffff0
0x0040db4a 4889cc mov rsp, rcx
0x00410f9c 890a mov dword [rdx], ecx
0x00410f9e 488b85d8edffff mov rax, qword [rbp - 0x1228]
0x00410fa5 8b08 mov ecx, dword [rax]
0x00410fa7 894dec mov dword [rbp - 0x14], ecx
0x00410faa 0fbe45a1 movsx eax, byte [rbp - 0x5f]
0x00410fae 89c1 mov ecx, eax
0x00410fb0 0fafc9 imul ecx, ecx


We can see there are a bunch of places that end up setting that variable, all using the same pattern of comparing against a location and then writing the result. Now that we know where it lives, we can tell angr to explicitly avoid any path that sets that variable to failure. This lets us avoid the state explosion we were seeing before.
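Separating the writers of that flag from mere readers is a one-liner worth of regex. A small sketch, run against three lines copied from the listing above (the real version would feed in a full saved pdi dump):

```python
import re

# Sample lines taken verbatim from the disassembly above: two writes
# to [rbp - 0x14] and one read of it.
dump = """\
0x0040db38 894dec mov dword [rbp - 0x14], ecx
0x0040db40 8b45ec mov eax, dword [rbp - 0x14]
0x00410fa7 894dec mov dword [rbp - 0x14], ecx
"""

# Match only instructions whose *destination* is [rbp - 0x14];
# reads like "mov eax, dword [rbp - 0x14]" don't fit the pattern.
setters = [int(m.group(1), 16)
           for m in re.finditer(
               r'(0x[0-9a-f]+) \S+ mov dword \[rbp - 0x14\], ', dump)]

print([hex(a) for a in setters])
```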


In angr, while you normally feed instruction addresses into the find and avoid parameters of explore, you can also provide a callable. When you do, that function will be fed individual state objects and must return a boolean indicating whether the state should be considered found/avoided. In our case, it looks something like this:


def avoid(s):
    global failed_addr

    # Haven't found the stack address yet
    if failed_addr is None:
        return False

    # Avoid states we cannot satisfy
    if not s.satisfiable(extra_constraints=(s.mem[failed_addr].dword.resolved == 0,)):
        print("Pruning state at: {}".format(s.ip))
        return True

    return False


The 'failed_addr' we discover dynamically during execution. Strictly speaking, I believe that address will be static across angr runs, but for my own sanity I discover it at run time. The meat of it is that s.satisfiable call, where I ask angr whether the failed variable can still equal zero. If the variable has been set, that is no longer possible and avoid returns True.


That was the major hurdle to overcome here. One other thing: you may want to keep your memory use down, as it can still creep up and crash the run by exhausting RAM. I've found two nice features for this: Spiller and auto_drop.


Spiller is an exploration technique that keeps only a given number of states in memory at any point in time. If you accumulate too many states, it will automatically 'spill' them to disk and drop them from memory. This gives you a tradeoff between in-memory states (faster) and states on disk (slower, but you probably have more disk space).


auto_drop is a nice feature of the SimulationManager that will, as the name implies, automatically drop paths in the given stashes. I have found that memory management in the angr/z3/python combination works better if you drop paths early rather than later. I have no hard evidence why, so take that as opinion. Regardless, I found that setting this to auto-drop the avoided states (of which there will be many) helped alleviate memory consumption.


Finally, since I always end up worrying about blowing out my memory, especially if I leave angr running in the background for a while, I created my own exploration technique, now merged into angr, called MemoryWatcher. MemoryWatcher lets you set the minimum amount of free memory your system can have before angr stops itself; by default, it triggers at 95% of your system's memory being used. This way, even if the other memory use mitigations aren't enough, MemoryWatcher can stop execution before you run out. Note that it is by no means a guarantee that it will stop in time. Specifically, it performs its check at the beginning of each simgr step, so if a single step ends up consuming a significant amount of memory, it's quite possible MemoryWatcher will fail to catch it. But hey, it's free and you get what you pay for...
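The core check is simple to sketch. This Linux-only, stdlib-only toy shows the free-memory floor test; the real MemoryWatcher lives in angr's exploration_techniques and (I believe) uses psutil plus the simgr step hook rather than these sysconf calls:

```python
import os

def free_memory_bytes():
    # Linux-only: available physical pages times the page size
    return os.sysconf('SC_AVPHYS_PAGES') * os.sysconf('SC_PAGE_SIZE')

def should_stop(min_free=512 * 1024 * 1024):
    # Stop exploring once free memory drops below the floor; an
    # exploration technique would run this at the start of each step.
    return free_memory_bytes() < min_free

print(should_stop(min_free=0))  # a zero-byte floor never triggers
```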


When running my solution, it took a few hours on my i7. My angr solution was certainly not the fastest, and given the flag, it was not the intended one; it seems the author assumed a dynamic approach would be taken. Oh well.


My solution:

Author's writeup: