Tokyo Western's CTF 2017 offered a neat simple cryptography challenge that was written in python. For the challenge, we are given the python source code that was used to encrypt, and the output of the encryption. The challenge is to determine what the original input was that produced the output we're given. While the proper solution here is likely to actually understand and reverse the algorithm, I took this as an opportunity to enable my tool pySym to solve this challenge.

Disclaimer

I should note up front, I did not solve this challenge during the contest. Rather, I was looking for challenges to help guide the direction of development for my pySym tool. One selling point of this challenge was it's simplicity. It was simply stated in python without excessive library calls or other complexities. This allowed me to focus my development efforts on a subset of python that would likely be relevant for many challenges, while at the same time helping me validate my output.

 

PySym Overview

Briefly stated, pySym is a python source code symbolic execution engine. It is a toy. I started working on it a year or so ago off and on. My goals were simply to use the concept as a means to further my own understanding of the intricacies of symbolic execution as well as python. In doing so, I have learned a ton on both sides. I modeled pySym after Shellphish's awesome tool called angr. If you are familiar with angr, you will note many similarities with pySym. That said, angr moves very fast, and I certainly cannot keep up with their pace. Hopefully, I have modeled it close enough to how their tool works that you can intuitively understand what to do.

 

I encourage anyone reading this to install pySym, either through Docker (likely the best option), or pypi/git repo. Just be warned, the tool is still a toy. I do not have time to make it as amazing as angr. That said, I hope to prove some of it's usefulness here, and will continue to add on as I find time to do so. Given the staggering scope of writing a symbolic execution engine for python, where I can I will be using CTF challenges to help guide my development efforts. If you have things you are working on with pySym, drop me a line in the github issue tracker or email me through this website. I cannot promise I will complete your requests, but I would value understanding how you might use this and priorities you would see.

 

As they say, now on to the show!

 

The Challenge

Given the brevity of the challenge, I will post the source here with links at the bottom. The first part of the challenge was the source, as follows:

 

#!/usr/bin/python2

import sys
import random

key = sys.argv[1]
flag = '**CENSORED**'

assert len(key) == 13
assert max([ord(char) for char in key]) < 128
assert max([ord(char) for char in flag]) < 128

message = flag + "|" + key

encrypted = chr(random.randint(0, 128))

for i in range(0, len(message)):
  encrypted += chr((ord(message[i]) + ord(key[i % len(key)]) + ord(encrypted[i])) % 128)

print(encrypted.encode('hex'))

 

We were also given what was outputted from running that challenge with the unknown input.

 

7c153a474b6a2d3f7d3f7328703e6c2d243a083e2e773c45547748667c1511333f4f745e

 

Note that there are two unknowns here. The key and the flag. The algorithm itself is fairly strait forward.  It performs a repeated key shifting approach with the message and the key. Again, I am discussing the pySym solution, rather than the actual crypto answer.

 

First, we need to know the lengths of key and flag. We are told via the assertion that the length of the key is 13 bytes. We are not directly told the length of the flag. However, notice that the length of the output must be proportional to the length of the input, as there is no padding or variable lengths. Thus, if we take the length of the encrypted output hex unencoded as 36, subtract 13 from the key we know, and then subtract 2 more (one for the "|" character, and one for the first encrypted char), we see that the flag must be 21 characters long.

 

Why is the length here important? As it turns out, in symbolic execution, symbolic length is difficult. For reasons I will not go into here, I have chosen to keep pySym utilizing concrete lengths. This means that, wherever possible, you should determine appropriate lengths prior to attempting execution. If you are not certain the lengths, you could have pySym attempt different lengths separately. Given we can calculate it in this case, we will simply use the known length.

 

Remember I mentioned above about the scope of pySym being immense. Well, an effect of this is that I will not always have a symbolic function definition for the functions or interactions you're using. pySym will attempt to warn you whenever this happens so you know this is the case. In the above example, as of writing I still have not had time to implement the max function. However, one nice benefit of working on the python source itself, is that you can re-write the functions using other functions that will have the same end meaning, but avoid problematic calls. In this case, we can re-write the max statement as follows:

 

assert max([ord(char) for char in key]) < 128

# This can be re-written as
for char in key:
    assert ord(char) < 128

 

Note that while it's not 100% the same, functionally it's close enough for pySym to understand what you mean. Similarly, the end of the script prints out the value in hex. This is also something that is not currently implemented. However, we do not need it to be implemented, as we can simply hex decode our given output to set constraints equivalently. Thus, I just commented that out.

 

The end modified source code is:

 

#!/usr/bin/python2

key = pyState.String(13) # Symbolic String of length 13
flag = pyState.String(21) # Symbolic String of length 21

assert len(key) == 13

#assert max([ord(char) for char in key]) < 128

for char in key:
    assert ord(char) < 128

#assert max([ord(char) for char in flag]) < 128
for char in flag:
    assert ord(char) < 128

message = flag + "|" + key

encrypted = chr(random.randint(0, 128))

for i in range(0, len(message)):
  encrypted += chr((ord(message[i]) + ord(key[i % len(key)]) + ord(encrypted[i])) % 128)

#print(encrypted.encode('hex'))

 

Note that I didn't have to change much in this example. This is the source code that pySym will now execute through.

 

Running pySym

If you are familiar with angr, much of this will seem very familiar, if not dated. The first step is to import pySym and open up our modified source as a project:

 

In [1]: import pySym

In [2]: proj = pySym.Project("./cipher_mod.py")

 

A project file for now is simply a convenience. It will open the file you pointed to, read in the source, parse the python source to AST, and then be able to setup your paths and path groups. Speaking of, let's grab a vanilla path group.

 

In [3]: pg = proj.factory.path_group()

 

Your path group will manage performing steps in execution for you. For this example in specific, there are no points where we need to state split, and we'd like to make it to the end. To do this, let's just tell the path group to explore the entire thing for us.

 

In [4]: pg.explore()
resolveObject: Could not resolve object named pyState
resolveObject: Could not resolve object named pyState
resolveObject: Could not resolve object named random

 

For now, you will see error messages like the ones above. This isn't actually an error and you can safely ignore it. Yeah, I should fix that at some point too. Total execution time for me is generally under 4 minutes. If it takes substantially longer, go ahead and kill it and try again. Sometimes the solver just chooses a bad direction and gets stuck.

 

During and after exploration, you should check the status of your paths. They can end up completed, deadended, errored, active, or found. Here's what ours looks like now:

 

In [7]: pg
Out[7]: <PathGroup with 1 completed>

 

So we have one completed path through. We can extract a copy of the working state to play around with.

 

In [8]: s = pg.completed[0].state.copy()

 

The interface for pySym allows you to pull out the variables and interact with them as if they were defined within your scope. So wee what I mean by this, we will first pull out the encrypted variable. Then, we will set it to the expected output that we were given.

 

In [9]: encrypted = s.getVar('encrypted')

In [10]: encrypted
Out[10]: <pySym.pyObjectManager.String.String at 0x7f71a69f5710>

In [11]: target = "7c153a474b6a2d3f7d3f7328703e6c2d243a083e2e773c45547748667c1511333f4f745e"

In [12]: from binascii import unhexlify

In [13]: target = unhexlify(target)

In [14]: target = target.decode('cp1252')

In [15]: encrypted.setTo(target)

 

At this point, we have executed through this python script, and informed pySym what our output should be. We can now pull out the flag value and ask pySym to print us what it should be.

 

In [16]: flag = s.getVar('flag')

In [18]: print(flag)
TWCTF{Crypto-is-fun!}

 

From a user perspective, that's all that needs to be done. Your Mileage May Vary with this, but it has effectively proven its ability to solve a minor crypto challenge. I created a quick asciinema cast of this in action: cast

 

Files: cipher.py, cipher_mod.py, encrypted.txt