ShmooCon Kaizen 2018: This XOR That

This challenge was interesting in that the solution speaks to a core principle in cryptography. The challenge presented you with two, random looking character strings, and asked you to determine which was XOR encrypted data, and which was just random noise.

Here's an example:

$ nc challenges.kaizen-ctf.com 10001
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-  THIS XOR THAT  -+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+                                                                               +
+   You will be presented with  20 prompts, each asking one simple question:    +
+   Is number 1 the encrypted ciphertext, or is number 2?                       +
+                                                                               +
+   Get ready to play "This XOR That!"                                          +
+                                                                               +
+   *****Note****                                                               +
+        ALL the ciphertexts are quotes from the english language               +
+                                                                               +
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-  INSTRUCTIONS   -+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+                                                                               +
+   You'll be given the questions in order - any wrong answers end the game     +
+   immediately! Enter your answer, followed by a newline character "\n"        +
+   to have it read by the input system                                         +
+                                                                               +
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-     EXAMPLE     -+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+                                                                               +
+ PROMPT:                                                                       +
+                                                                               +
+ ============================= Question No.   0 =============================  +
+ [1]-------------------------------------------------------------------------  +
+ oJyR1IGHkdSbktS3u7a7uNSXhp2EhJiRh9SAnJHUmZ2akM/UnYCH1ICRlZecnZqT1Iecm4GYkNjU  +
+ gJyRhpGSm4aR2NSWkdSGkZOVhpCRkNSVh9SV1JeGnZmdmpWY1JuSkpGah5Ha1NnUsdrUo9rUsJ2e  +
+ n4eAhpXU3MXNzMbd                                                              +
+ [2]-------------------------------------------------------------------------  +
+ zKR+P7EYPVIIglAD6USM8s0jnYhKo6GgKfaprqEnMp8ELuG2+P3kl6ci1QrmFYouMqR4ZrdvToqB  +
+ OsES1McbSdbqu4dSzGTDPKsKC8PoP+Ostp7qY2XXHcKg6YM0qGqG++NsraTD6/F/HdTBgUr/IUBl  +
+ 8GdUYPzD30ZDAVQz                                                              +
+                                                                               +
+ INPUT:                                                                        +
+ 1                                                                             +
+                                                                               +
+ RESPONSE:                                                                     +
+ [+] Correct!----------------------------------------------------------------  +
+ ...                                                                           +
+ [*] Incorrect!--------------------------------------------------------------  +
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Obviously, the two challenges presented are base64 encoded. Upon decoding them, they both look rather random as well:

In [1]: from base64 import b64decode

In [2]: one = "oJyR1IGHkdSbktS3u7a7uNSXhp2EhJiRh9SAnJHUmZ2akM/UnYCH1ICRlZecnZqT1Iecm4GYkNjUgJyRhpGSm4aR2NSWkdSGkZOVhpCRkNSVh9SV1JeGnZmdmpWY1JuSkpGah5Ha1NnUsdrUo9rUsJ2en4eAhpXU3MXNzMbd"

In [3]: two = "zKR+P7EYPVIIglAD6USM8s0jnYhKo6GgKfaprqEnMp8ELuG2+P3kl6ci1QrmFYouMqR4ZrdvToqBOsES1McbSdbqu4dSzGTDPKsKC8PoP+Ostp7qY2XXHcKg6YM0qGqG++NsraTD6/F/HdTBgUr/IUBl8GdUYPzD30ZDAVQz"

In [4]: b64decode(one)
Out[4]: '\xa0\x9c\x91\xd4\x81\x87\x91\xd4\x9b\x92\xd4\xb7\xbb\xb6\xbb\xb8\xd4\x97\x86\x9d\x84\x84\x98\x91\x87\xd4\x80\x9c\x91\xd4\x99\x9d\x9a\x90\xcf\xd4\x9d\x80\x87\xd4\x80\x91\x95\x97\x9c\x9d\x9a\x93\xd4\x87\x9c\x9b\x81\x98\x90\xd8\xd4\x80\x9c\x91\x86\x91\x92\x9b\x86\x91\xd8\xd4\x96\x91\xd4\x86\x91\x93\x95\x86\x90\x91\x90\xd4\x95\x87\xd4\x95\xd4\x97\x86\x9d\x99\x9d\x9a\x95\x98\xd4\x9b\x92\x92\x91\x9a\x87\x91\xda\xd4\xd9\xd4\xb1\xda\xd4\xa3\xda\xd4\xb0\x9d\x9e\x9f\x87\x80\x86\x95\xd4\xdc\xc5\xcd\xcc\xc6\xdd'

In [5]: b64decode(two)
Out[5]: '\xcc\xa4~?\xb1\x18=R\x08\x82P\x03\xe9D\x8c\xf2\xcd#\x9d\x88J\xa3\xa1\xa0)\xf6\xa9\xae\xa1\'2\x9f\x04.\xe1\xb6\xf8\xfd\xe4\x97\xa7"\xd5\n\xe6\x15\x8a.2\xa4xf\xb7oN\x8a\x81:\xc1\x12\xd4\xc7\x1bI\xd6\xea\xbb\x87R\xccd\xc3<\xab\n\x0b\xc3\xe8?\xe3\xac\xb6\x9e\xeace\xd7\x1d\xc2\xa0\xe9\x834\xa8j\x86\xfb\xe3l\xad\xa4\xc3\xeb\xf1\x7f\x1d\xd4\xc1\x81J\xff!@e\xf0gT`\xfc\xc3\xdfFC\x01T3'

Our goal is not to actually decrypt the messages. Our goal is to identify that there IS a message that has been encrypted. With this in mind, I didn't attempt to actually solve any of those encryptions. Instead, I chose to look for indications of XOR encryption at work.

So what signs are there of xor encrypted data? While XOR is a good primitive to use in encryption, it does not pass the "randomness" test. For example, a good cipher should cause the output to be indistinguishable from random data. If it is not, it can open the cipher to attacks based on statistical analysis of character frequencies. This concept is one of the core attacks against xor ciphers. To me, however, this is an implementation detail. XOR is a primitive operation, and should not be treated as a cipher in and of itself. For instance, encrypting a long message with a short repeated xor key is proven to be easily breakable. However, encrypting a long message with an equally long and truly random, never repeated, xor key has been proven to be impossible to crack (see One Time Pad).

To go about testing this, we only need to realize that xor in this case is being performed byte by byte. Thus, the truly random data should have an average of 1/256 for each value, with some standard deviation. The distribution should be fairly level across the values. However, it is likely that for the xor encrypted data, our distribution would be skewed significantly higher for some values. It's not important which values are skewed this way, it's only important that we notice they are skewed as it will identify which message was the encrypted one.

Let's check this. First, we'll use a small function to convert the string from above into a list of integers.

In [7]: def str_to_ints(s):
   ...:         return [ord(c) for c in s]

Next, we're tally up the totals. For this, I am using a list object, where the index of the list represents the value, and the value of that index represents the total number of times I have seen that value in the string.

In [9]: def calc_totals(a):
   ...:         total = [0]*256
   ...:         for i in a:
   ...:                 total[i] += 1
   ...:         return total

Giving it a run, we find the following:

In [10]: print(calc_totals(str_to_ints(one)))
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 6, 4, 2, 3, 1, 0, 2, 0, 3, 0, 0, 0, 0, 0, 0, 0, 2, 0, 3, 0, 1, 0, 6, 4, 3, 7, 0, 0, 3, 6, 1, 0, 0, 6, 10, 1, 7, 3, 2, 3, 3, 5, 0, 0, 0, 0, 0, 0, 5, 2, 2, 5, 5, 0, 1, 8, 1, 1, 8, 1, 5, 6, 2, 6, 1, 2, 2, 1, 3, 0, 0, 0, 2, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]

In [11]: print(calc_totals(str_to_ints(two)))
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 5, 0, 0, 0, 3, 3, 2, 2, 2, 2, 0, 5, 2, 4, 0, 0, 0, 0, 0, 0, 0, 0, 2, 3, 1, 5, 4, 2, 6, 2, 3, 0, 6, 1, 5, 1, 2, 7, 2, 2, 4, 4, 4, 2, 0, 2, 6, 2, 0, 0, 0, 0, 0, 0, 2, 2, 3, 5, 0, 1, 4, 1, 1, 1, 1, 3, 1, 2, 4, 3, 7, 5, 5, 1, 3, 1, 0, 0, 0, 4, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]

Notice that the count for one gets up to 10, while the max for two is 7. It just happens that the character that occurs 10 times in the first example is 'S'. Since we're not attempting to actually crack them, this is irrelevant. However, we can see that our guess for the first blob as being the encrypted one is correct (as stated in the description).

The rest of this challenge is simply plumbing to talk to the server and repeat the same challenge 20 times. My final for this was:

#!/usr/bin/env python

from pwn import *
from base64 import b64decode

def connect():
    global p
    p = remote("challenges.kaizen-ctf.com",10001)  
    p.recvuntil("%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%")

def get_chal():
    p.recvuntil("[1]")
    p.recvline()
    one = p.recvuntil("[2]",drop=True).strip()
    one = one.replace("\n","")

    p.recvline()

    two = ""
    while True:
        line = p.recvline(timeout=0.2)
        two += line
        if line == "":
            break

    two = two.replace("\n","")
    return (b64decode(one), b64decode(two))

def str_to_ints(s):
    return [ord(c) for c in s]

def calc_totals(a):
    total = [0]*256
    for i in a:
        total[i] += 1
    return total

connect()

for i in range(21):
    one, two = get_chal()

    one = str_to_ints(one)
    two = str_to_ints(two)

    one_total = calc_totals(one)
    #two_total = calc_totals(two)

    if any(i > 15 for i in one_total):
        p.sendline("1")
    else:
        p.sendline("2")

p.interactive()

[+] Correct!----------------------------------------------------------------

+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-     WINNER!     -+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ The flag is:                                                                  +
+ KAIZEN{X0R_EnCrypT!on_R3al1y_I$Nt_th@t_b@d}                                   +
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+