PoliCTF 2015 -- Reversemeplz

Details: Written by Michael Bann

Category: Reversing Points: 200 Solves: 108 Description:

Last month I was trying to simplify an algorithm.. and I found how to mess up a source really really bad. And then this challenge is born. Maybe is really simple or maybe is so hard that all of you will give up. Good luck!

reversemeplz

Another binary reversing problem. This one is a dynamically linked 32-bit ELF with symbols stripped. A look at strings does not give us any useful information. The binary itself, when run, appears to wait for input, perform some sort of check, then tell you if it's correct or not. An ltrace indicates that there's no useful function calls that will help us better understand what's going on. Lets start by looking at the location of the gets call.

function sub_8048390 {
    ebp = esp & 0xfffffff0;
    esi = ebp + 0xffffff68;
    gets(esi);
    eax = sub_8048801();
    esp = ((esp & 0xfffffff0) - 0x118) + 0x10;
    if (eax != 0x0) {
            stack[2048] = eax;
            strcpy(ebp + 0xfffffee8, "The flag is flag{");
            strcat(stack[2048], ebp + 0xfffffee8);
            strcat(stack[2048], ebp + 0xfffffee8);
    }
    else {
            esp = esp - 0xc;
            stack[2048] = 0x80489d0;
    }
    puts(stack[2048]);
    ecx = stack[2047];
    return 0x0;
}

This function (sub_8048390) looks to be the main entry point of the application. It's purpose is fairly strait forward. Get info with the "gets" call, do validation via "sub_8048801()" function, then tell the person they're right or wrong. Let's take a look at the validation function next.

function sub_8048801 {
    ebp = esp;
    ebx = *(ebp + 0x8);
    *(ebp + 0xffffffb8) = rep intrinsic_movsd(*(ebp + 0xffffffb8), *0x8048980, 0x0);
    edi = 0x0;
    esi = 0x0;
    do {
            if (*(int8_t *)(ebx + esi) <= 0x60) {
                    *(int8_t *)(ebx + esi) = sub_8048519(*(int8_t *)(ebx + 0x1) & 0x1);
            }
            if (*(int8_t *)(ebx + esi) > 0x7a) {
                    *(int8_t *)(ebx + esi) = sub_8048519(sign_extend_32(*(int8_t *)(ebx + 0x1) & 0x2));
            }
            eax = sign_extend_32(*(int8_t *)(ebx + esi));
            eax = sub_8048519(eax);
            *(int8_t *)(ebp + esi + 0xffffffa8) = eax;
            if ((eax > 0xcc) && (0x1 != 0xcf)) {
                    edi = 0x1;
            }
            esi = esi + 0x1;
    } while (esi != 0xf);
    eax = 0x0;
    edi = edi - 0x1;
    COND = edi != 0x0;
    if (COND) goto loc_8048880;

loc_8048877:
    eax = 0x0;
    return eax;

loc_8048880:
    eax = eax + 0x1;
    if ((*(int8_t *)(ebp + eax + 0xffffffa8) & 0xff) - (*(int8_t *)(ebp + eax + 0xffffffa7) & 0xff) == *(ebp + eax * 0x4 + 0xffffffb4)) goto loc_804887b;
    goto loc_8048877;

loc_804887b:
    if (eax == 0xe) goto loc_8048895;
    goto loc_8048880;

loc_8048895:
    if (*(int8_t *)(ebx + 0xf) == 0x0) goto loc_80488ae;

loc_804889b:
    edx = *(int8_t *)(ebp + 0xffffffa9);
    eax = *(int8_t *)(ebp + 0xffffffa8);
    do {
            ecx = edx + 0xffffffff;
            if (eax == 0xcc) {
                break;
            }
            eax = eax ^ edx;
            edx = ecx;
    } while (true);
    goto loc_8048877;

loc_80488ae:
    eax = sub_8048519(0x0);
    if (eax == 0x0) {
            eax = *(int8_t *)ebx & 0xff;
            eax = sub_8048519(eax);
            eax = (eax == 0x62 ? 0x1 : 0x0) & 0xff;
    }
    else {
            eax = 0x0;
    }
    return eax;
}

This one is a bit more hairy. The first part we can see references to checks against less than 0x60 and greater than 0x7a. A quick validation with python (or however you want to do it), shows that this maps to "a"->"z". Basically, it would appear this executable is looking for input to be lowercase ascii. The real validation checks start at "loc_8048880".

I found these checks easier to understand in assembly:

08048881         movzx      edx, byte [ss:ebp+eax+0xffffffa8]
08048886         movzx      ecx, byte [ss:ebp+eax+0xffffffa7]
0804888b         sub        edx, ecx
0804888d         cmp        edx, dword [ss:ebp+eax*4+0xffffffb4]
08048891         je         0x804887b

So what we're doing here is getting two entries in an array. Finding the difference between them, then comparing it against the index in another array. All of these are based off of eax (which is clearly our array index due to it starting at 0 and incrementing). Further, the ecx value is 1 before edx. What this means is we're basically walking down the input, subtracting two characters (call it x, and x+1), and then checking against an array of what the difference should be.

I took a pause at this point to validate my hypothesis. Using input of "AA" gives me two values. If I change it to "AB", only one of the values changes. If I change it to "BA", it changes the other way. What we're seeing is that the mapping of input characters to their new x' characters is a deterministic process, and not reliant on anything around it. This is good because it allows us to map out our input->array output. This was fairly strait forward utilizing the power of gdb python bindings.

import string

lookup = {}

for c in string.ascii_lowercase:
        gdb.execute("r <<< {0}{1}".format(c,c))
        out = gdb.execute("p/c $edx",True,True).split("'")[1]
        lookup[c] = out

That gives us:

lookup = {'v': 'i', 'g': 't', 'c': 'p', 'd': 'q', 'r': 'e', 'm': 'z', 'a': 'n', 'y': 'l', 'f': 's', 'b': 'o', 'e': 'r', 'j': 'w', 'z': 'm', 'p': 'c', 'k': 'x', 'l': 'y', 'q': 'd', 'o': 'b', 't': 'g', 'u': 'h', 's': 'f', 'h': 'u', 'i': 'v', 'x': 'k', 'n': 'a', 'w': 'j'}

Cool. We have our dictionary to work with now. The next question is what are the expected differences. Because this is simply acting as an array, we can dump those values directly using gdb.

0xffffcc90:     -1      17      -11     3
0xffffcca0:     -8      5       14      -3
0xffffccb0:     1       6       -11     6
0xffffccc0:     -8      -10     0       -134549504

So we have our input->input' mapping. We know what the difference is supposed to be. Let's set about doing it.

The first challenge of this is to enumerate what are the possible combinations that make a given difference. Obviously, there will be more than one since 'a'-'b' and 'b'-'c' mathematically will be equal to -1. I created a python script to loop through every two character possibility for every desired value.

c = []
desiredVals = [-1,17,-11,3,-8,5,14,-3,1,6,-11,6,-8,-10]
for desiredVal in desiredVals:
        options = []
        for key1 in lookup.keys():
                for key2 in lookup.keys():
                        if ord(lookup[key2]) - ord(lookup[key1]) == desiredVal:
                                options.append([key1,key2])
        c.append(options)

We now know every valid combination for each of those desired values. The trick is to find the right one. Again, there are many options here. My initial thought was to use the z3 theorem prover (or similar), but I was having difficulty formulating the syntax. This really is a satisfiability challenge, so z3 should be able to do it just fine. As usual, I defaulted to python. What follows is the script that will solve this, but with the caveat that you need to add each char value in one or two at a time. This is important because we get into an exponential combination problem and it quickly becomes unmanageable. But, as noted before, this process is linear and doesn't take into account other parts of the input, so we can solve this one part at a time.

[ ''.join(c0 + [c1[1]] + [c2[1]] + [c3[1]] + [c4[1]] + [c5[1]] + [c6[1]] + [c7[1]] + [c8[1]] + [c9[1]] + [c10[1]] + [c11[1]] + [c12[1]] + [c13[1]])
        for c0 in chars[0]
        for c1 in chars[1]
        for c2 in chars[2]
        for c3 in chars[3]
        for c4 in chars[4]
        for c5 in chars[5]
        for c6 in chars[6]
        for c7 in chars[7]
        for c8 in chars[8]
        for c9 in chars[9]
        for c10 in chars[10]
        for c11 in chars[11]
        for c12 in chars[12]
        for c13 in chars[13]
        if
        c1[0] == c0[1] and
        c2[0] == c1[1] and
        c3[0] == c2[1] and
        c4[0] == c3[1] and
        c5[0] == c4[1] and
        c6[0] == c5[1] and
        c7[0] == c6[1] and
        c8[0] == c7[1] and
        c9[0] == c8[1] and
        c10[0] == c9[1] and
        c11[0] == c10[1] and
        c12[0] == c11[1] and
        c13[0] == c12[1]
]

It isn't terribly elegant, but it worked. It uses a very powerful and fast technique in python called a "list comprehension". In the first line, I'm basically telling python, "I'd like to take one character from each of these arrays." The next bunch of "for" commands is me explaining to python where those values I just mentioned come from. Finally, the magic happens in the "if" section, where I explicitly tell python that the 0th and 1st characters MUST match up for this to be valid.

At length 7 i got ['onetwoth', 'qpgvyqvj', 'pofuxpui', 'srixasxl', 'rqhwzrwk', 'tsjybtym']. My guess at this point is "onetwoth" is the right direction. So I went back and hard coded that in to the array so that python won't have to check for it. Similarly, I added 1 or 2 more characters at a time, simplifying as I went to end up with the following:

onetwotheflagyo

Enter it in to the executable for validation:

$ ./challenge 
onetwotheflagyo
The flag is flag{onetwotheflagyo}