- Tags: Python
- Date: June 1, 2014
- Jump to comments
Update (November 1, 2017): Added Python 3 support.
A few months ago, I got first place in this Code Golf contest to create the weirdest obfuscated program that prints the string “Hello world!”. I decided to write up an explanation of how the hell it works. So, here’s the entry, in Python 2.7:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
(lambda _, __, ___, ____, _____, ______, _______, ________:
getattr(
__import__(True.__class__.__name__[_] + [].__class__.__name__[__]),
().__class__.__eq__.__class__.__name__[:__] +
().__iter__().__class__.__name__[_____:________]
)(
_, (lambda _, __, ___: _(_, __, ___))(
lambda _, __, ___:
chr(___ % __) + _(_, __, ___ // __) if ___ else
(lambda: _).func_code.co_lnotab,
_ << ________,
(((_____ << ____) + _) << ((___ << _____) - ___)) + (((((___ << __)
- _) << ___) + _) << ((_____ << ____) + (_ << _))) + (((_______ <<
__) - _) << (((((_ << ___) + _)) << ___) + (_ << _))) + (((_______
<< ___) + _) << ((_ << ______) + _)) + (((_______ << ____) - _) <<
((_______ << ___))) + (((_ << ____) - _) << ((((___ << __) + _) <<
__) - _)) - (_______ << ((((___ << __) - _) << __) + _)) + (_______
<< (((((_ << ___) + _)) << __))) - ((((((_ << ___) + _)) << __) +
_) << ((((___ << __) + _) << _))) + (((_______ << __) - _) <<
(((((_ << ___) + _)) << _))) + (((___ << ___) + _) << ((_____ <<
_))) + (_____ << ______) + (_ << ___)
)
)
)(
*(lambda _, __, ___: _(_, __, ___))(
(lambda _, __, ___:
[__(___[(lambda: _).func_code.co_nlocals])] +
_(_, __, ___[(lambda _: _).func_code.co_nlocals:]) if ___ else []
),
lambda _: _.func_code.co_argcount,
(
lambda _: _,
lambda _, __: _,
lambda _, __, ___: _,
lambda _, __, ___, ____: _,
lambda _, __, ___, ____, _____: _,
lambda _, __, ___, ____, _____, ______: _,
lambda _, __, ___, ____, _____, ______, _______: _,
lambda _, __, ___, ____, _____, ______, _______, ________: _
)
)
)
String literals weren’t allowed, but I set some other restrictions for fun: it
had to be a single expression (so no print
statement) with minimal builtin
usage and no integer literals.
Getting started
Since we can’t use print
, we can write to the stdout
file object:
But let’s use something lower-level:
os.write()
. We need
stdout
’s file descriptor, which is
1
(you can check with print sys.stdout.fileno()
).
We want a single expression, so we’ll use
__import__()
:
We also want to be able to obfuscate the write()
, so we’ll throw in a
getattr()
:
This is the starting point. Everything from now on will be obfuscating the three strings and the int.
Stringing together strings
"os"
and "write"
are fairly simple, so we’ll create them by joining parts
of the names of various built-in classes. There are many different ways to do
this, but I chose the following:
"o"
from the second letter ofbool
:True.__class__.__name__[1]
"s"
from the third letter oflist
:[].__class__.__name__[2]
"wr"
from the first two letters ofwrapper_descriptor
, an implementation detail in CPython found as the type of some builtin classes’ methods (more on that here):().__class__.__eq__.__class__.__name__[:2]
"ite"
from the sixth through eighth letters oftupleiterator
, the type of object returned by callingiter()
on a tuple:().__iter__().__class__.__name__[5:8]
We’re starting to make some progress!
1
2
3
4
5
getattr(
__import__(True.__class__.__name__[1] + [].__class__.__name__[2]),
().__class__.__eq__.__class__.__name__[:2] +
().__iter__().__class__.__name__[5:8]
)(1, "Hello world!\n")
"Hello world!\n"
is more complicated. We’re going to encode it as a big
integer, which will be formed of the ASCII code of each character multiplied by
256 to the power of the character’s index in the string. In other words, the
following sum:
where \(L\) is the length of the string and \(c_n\) is the ASCII code of the \(n\)th character in the string. To create the number:
Now we need the code to convert this number back into a string. We use a simple recursive algorithm:
Rewriting in one line with lambda
:
Now we use anonymous recursion to turn this into a single expression. This requires a combinator. Start with this:
Now we just substitute the two definitions into the expression, and we have our function:
Now we can stick this into our code from before, replacing some variable names
along the way (f
→ _
, n
→ __
):
1
2
3
4
5
6
7
8
9
10
getattr(
__import__(True.__class__.__name__[1] + [].__class__.__name__[2]),
().__class__.__eq__.__class__.__name__[:2] +
().__iter__().__class__.__name__[5:8]
)(
1, (lambda _, __: _(_, __))(
lambda _, __: chr(__ % 256) + _(_, __ // 256) if __ else "",
802616035175250124568770929992
)
)
Function internals
We’re left with a ""
in the body of our convert function (remember: no string
literals!), and a large number that we’ll have to hide somehow. Let’s start
with the empty string. We can make one on the fly by examining the internals of
some random function:
What we’re really doing here is looking at the
line number table
of the code
object contained within the function. Since it’s anonymous, there
are no line numbers, so the string is empty. Replace the 0
with _
to make
it more confusing (it doesn’t matter, since the function’s not being called),
and stick it in. We’ll also refactor out the 256
into an argument that gets
passed to our obfuscated convert()
along with the number. This requires
adding an argument to the combinator:
1
2
3
4
5
6
7
8
9
10
11
12
13
getattr(
__import__(True.__class__.__name__[1] + [].__class__.__name__[2]),
().__class__.__eq__.__class__.__name__[:2] +
().__iter__().__class__.__name__[5:8]
)(
1, (lambda _, __, ___: _(_, __, ___))(
lambda _, __, ___:
chr(___ % __) + _(_, __, ___ // __) if ___ else
(lambda: _).func_code.co_lnotab,
256,
802616035175250124568770929992
)
)
A detour
Let’s tackle a different problem for a bit. We want a way to obfuscate the
numbers in our code, but it’ll be cumbersome (and not particularly interesting)
to recreate them each time they’re used. If we can implement, say,
range(1, 9) == [1, 2, 3, 4, 5, 6, 7, 8]
, then we can wrap our current work in
a function that takes variables containing the numbers from 1 to 8, and replace
occurrences of integer literals in the body with these variables:
1
2
3
4
5
6
7
8
(lambda n1, n2, n3, n4, n5, n6, n7, n8:
getattr(
__import__(True.__class__.__name__[n1] + [].__class__.__name__[n2]),
...
)(
...
)
)(*range(1, 9))
Even though we need to form 256
and 802616035175250124568770929992
as well,
these can be created using arithmetic operations on these eight “fundamental”
numbers. The choice of 1–8 is arbitrary, but seems to be a good middle ground.
We can get the number of arguments a function takes via its code
object:
Build a tuple of functions with argcounts between 1 and 8:
1
2
3
4
5
6
7
8
9
10
funcs = (
lambda _: _,
lambda _, __: _,
lambda _, __, ___: _,
lambda _, __, ___, ____: _,
lambda _, __, ___, ____, _____: _,
lambda _, __, ___, ____, _____, ______: _,
lambda _, __, ___, ____, _____, ______, _______: _,
lambda _, __, ___, ____, _____, ______, _______, ________: _
)
Using a recursive algorithm, we can turn this into the output of range(1, 9)
:
As before, we convert this into lambda
form:
Then, into anonymous-recursive form:
For fun, we’ll factor out the argcount operation into an additional function argument, and obfuscate some variable names:
1
2
3
4
5
6
7
(lambda _, __, ___: _(_, __, ___))(
(lambda _, __, ___:
[__(___[0])] + _(_, __, ___[1:]) if ___ else []
),
lambda _: _.func_code.co_argcount,
funcs
)
There’s a new problem now: we still need a way to hide 0
and 1
. We can get
these by examining the number of local variables within arbitrary functions:
Even though the function bodies look the same, _
in the first function is not
an argument, nor is it defined in the function, so Python interprets it as a
global variable:
This happens regardless of whether _
is actually defined in the global scope.
Putting this into practice:
1
2
3
4
5
6
7
8
(lambda _, __, ___: _(_, __, ___))(
(lambda _, __, ___:
[__(___[(lambda: _).func_code.co_nlocals])] +
_(_, __, ___[(lambda _: _).func_code.co_nlocals:]) if ___ else []
),
lambda _: _.func_code.co_argcount,
funcs
)
Now we can substitute the value of funcs
in, and then using *
to pass the
resulting list of integers as eight separate variables, we get this:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
(lambda n1, n2, n3, n4, n5, n6, n7, n8:
getattr(
__import__(True.__class__.__name__[n1] + [].__class__.__name__[n2]),
().__class__.__eq__.__class__.__name__[:n2] +
().__iter__().__class__.__name__[n5:n8]
)(
n1, (lambda _, __, ___: _(_, __, ___))(
lambda _, __, ___:
chr(___ % __) + _(_, __, ___ // __) if ___ else
(lambda: _).func_code.co_lnotab,
256,
802616035175250124568770929992
)
)
)(
*(lambda _, __, ___: _(_, __, ___))(
(lambda _, __, ___:
[__(___[(lambda: _).func_code.co_nlocals])] +
_(_, __, ___[(lambda _: _).func_code.co_nlocals:]) if ___ else []
),
lambda _: _.func_code.co_argcount,
(
lambda _: _,
lambda _, __: _,
lambda _, __, ___: _,
lambda _, __, ___, ____: _,
lambda _, __, ___, ____, _____: _,
lambda _, __, ___, ____, _____, ______: _,
lambda _, __, ___, ____, _____, ______, _______: _,
lambda _, __, ___, ____, _____, ______, _______, ________: _
)
)
)
Shifting bits
Almost there! We’ll replace the n{1..8}
variables with _
, __
, ___
,
____
, etc., since it creates confusion with the variables used in our inner
functions. This doesn’t cause actual problems, since scoping rules mean the
right ones will be used. This is also one of the reasons why we refactored
256
out to where _
refers to 1
instead of our obfuscated convert()
function. It’s getting long, so I’ll paste only the first half:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
(lambda _, __, ___, ____, _____, ______, _______, ________:
getattr(
__import__(True.__class__.__name__[_] + [].__class__.__name__[__]),
().__class__.__eq__.__class__.__name__[:__] +
().__iter__().__class__.__name__[_____:________]
)(
_, (lambda _, __, ___: _(_, __, ___))(
lambda _, __, ___:
chr(___ % __) + _(_, __, ___ // __) if ___ else
(lambda: _).func_code.co_lnotab,
256,
802616035175250124568770929992
)
)
)
Only two more things are left. We’ll start with the easy one: 256
.
\(256 = 2^8\), so we can rewrite it as 1 << 8
(using a
left bit shift), or _ << ________
with our
obfuscated variables.
We’ll use the same idea with 802616035175250124568770929992
. A simple
divide-and-conquer algorithm can break it up into sums of numbers which are
themselves sums of numbers that are shifted together, and so on. For example,
if we had 112
, we could break it up into 96 + 16
and then
(3 << 5) + (2 << 3)
. I like using bit shifts because the <<
reminds me of
std::cout << "foo"
in C++, or
print
chevron
(print >>
) in Python, both of which are red herrings involving other ways of
doing I/O.
The number can be decomposed in a variety of ways; no one method is correct
(after all, we could just break it up into (1 << 0) + (1 << 0) + ...
, but
that’s not interesting). We should have some substantial amount of nesting, but
still use most of our numerical variables. Obviously, doing this by hand isn’t
fun, so we’ll come up with an algorithm. In pseudocode:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
func encode(num):
if num <= 8:
return "_" * num
else:
return "(" + convert(num) + ")"
func convert(num):
base = shift = 0
diff = num
span = ...
for test_base in range(span):
for test_shift in range(span):
test_diff = |num| - (test_base << test_shift)
if |test_diff| < |diff|:
diff = test_diff
base = test_base
shift = test_shift
encoded = "(" + encode(base) + " << " + encode(shift) + ")"
if diff == 0:
return encoded
else:
return encoded + " + " + convert(diff)
convert(802616035175250124568770929992)
The basic idea here is that we test various combinations of numbers in a
certain range until we come up with two numbers, base
and shift
,
such that base << shift
is as closest to num
as possible (i.e. we minimize
their absolute difference, diff
). We then use our divide-and-conquer
algorithm to break up best_base
and best_shift
, and then repeat the
procedure on diff
until it reaches zero, summing the terms along the way.
The argument to range()
, span
, represents the width of the search space.
This can’t be too large, or we’ll end getting num
as our base
and 0
as
our shift
(because diff
is zero), and since base
can’t be represented as
a single variable, it’ll repeat, recursing infinitely. If it’s too small, we’ll
end up with something like the (1 << 0) + (1 << 0) + ...
mentioned above. In
practice, we want span
to get smaller as the recursion depth increases.
Through trial and error, I found this equation to work well:
Translating the pseudocode into Python and making some tweaks (support for the
depth
argument, and some caveats involving negative numbers), we get this:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
from math import ceil, log
def encode(num, depth):
if num == 0:
return "_ - _"
if num <= 8:
return "_" * num
return "(" + convert(num, depth + 1) + ")"
def convert(num, depth=0):
result = ""
while num:
base = shift = 0
diff = num
span = int(ceil(log(abs(num), 1.5))) + (16 >> depth)
for test_base in xrange(span):
for test_shift in xrange(span):
test_diff = abs(num) - (test_base << test_shift)
if abs(test_diff) < abs(diff):
diff = test_diff
base = test_base
shift = test_shift
if result:
result += " + " if num > 0 else " - "
elif num < 0:
base = -base
if shift == 0:
result += encode(base, depth)
else:
result += "(%s << %s)" % (encode(base, depth),
encode(shift, depth))
num = diff if num > 0 else -diff
return result
Now, when we call convert(802616035175250124568770929992)
, we get a nice
decomposition:
Stick this in as a replacement for 802616035175250124568770929992
, and put
all the parts together:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
(lambda _, __, ___, ____, _____, ______, _______, ________:
getattr(
__import__(True.__class__.__name__[_] + [].__class__.__name__[__]),
().__class__.__eq__.__class__.__name__[:__] +
().__iter__().__class__.__name__[_____:________]
)(
_, (lambda _, __, ___: _(_, __, ___))(
lambda _, __, ___:
chr(___ % __) + _(_, __, ___ // __) if ___ else
(lambda: _).func_code.co_lnotab,
_ << ________,
(((_____ << ____) + _) << ((___ << _____) - ___)) + (((((___ << __)
- _) << ___) + _) << ((_____ << ____) + (_ << _))) + (((_______ <<
__) - _) << (((((_ << ___) + _)) << ___) + (_ << _))) + (((_______
<< ___) + _) << ((_ << ______) + _)) + (((_______ << ____) - _) <<
((_______ << ___))) + (((_ << ____) - _) << ((((___ << __) + _) <<
__) - _)) - (_______ << ((((___ << __) - _) << __) + _)) + (_______
<< (((((_ << ___) + _)) << __))) - ((((((_ << ___) + _)) << __) +
_) << ((((___ << __) + _) << _))) + (((_______ << __) - _) <<
(((((_ << ___) + _)) << _))) + (((___ << ___) + _) << ((_____ <<
_))) + (_____ << ______) + (_ << ___)
)
)
)(
*(lambda _, __, ___: _(_, __, ___))(
(lambda _, __, ___:
[__(___[(lambda: _).func_code.co_nlocals])] +
_(_, __, ___[(lambda _: _).func_code.co_nlocals:]) if ___ else []
),
lambda _: _.func_code.co_argcount,
(
lambda _: _,
lambda _, __: _,
lambda _, __, ___: _,
lambda _, __, ___, ____: _,
lambda _, __, ___, ____, _____: _,
lambda _, __, ___, ____, _____, ______: _,
lambda _, __, ___, ____, _____, ______, _______: _,
lambda _, __, ___, ____, _____, ______, _______, ________: _
)
)
)
And there you have it.
Addendum: Python 3 support
Since writing this post, several people have asked about Python 3 support. I didn’t think of it at the time, but as Python 3 continues to gain traction (and thank you for that!), this post is clearly long overdue for an update.
Fortunately, Python 3 (as of writing, 3.6) doesn’t require us to change much:
- The
func_code
function object attribute has been renamed to__code__
. Easy fix with a find-and-replace. - The
tupleiterator
type name has been changed totuple_iterator
. Since we use this to extract the substring"ite"
, we can get around this by changing our indexing in().__iter__().__class__.__name__
from[_____:________]
to[_:][_____:________]
. os.write()
requiresbytes
now instead of astr
, sochr(...)
needs to be changed tobytes([...])
.
Here is the full Python 3 version:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
(lambda _, __, ___, ____, _____, ______, _______, ________:
getattr(
__import__(True.__class__.__name__[_] + [].__class__.__name__[__]),
().__class__.__eq__.__class__.__name__[:__] +
().__iter__().__class__.__name__[_:][_____:________]
)(
_, (lambda _, __, ___: _(_, __, ___))(
lambda _, __, ___:
bytes([___ % __]) + _(_, __, ___ // __) if ___ else
(lambda: _).__code__.co_lnotab,
_ << ________,
(((_____ << ____) + _) << ((___ << _____) - ___)) + (((((___ << __)
- _) << ___) + _) << ((_____ << ____) + (_ << _))) + (((_______ <<
__) - _) << (((((_ << ___) + _)) << ___) + (_ << _))) + (((_______
<< ___) + _) << ((_ << ______) + _)) + (((_______ << ____) - _) <<
((_______ << ___))) + (((_ << ____) - _) << ((((___ << __) + _) <<
__) - _)) - (_______ << ((((___ << __) - _) << __) + _)) + (_______
<< (((((_ << ___) + _)) << __))) - ((((((_ << ___) + _)) << __) +
_) << ((((___ << __) + _) << _))) + (((_______ << __) - _) <<
(((((_ << ___) + _)) << _))) + (((___ << ___) + _) << ((_____ <<
_))) + (_____ << ______) + (_ << ___)
)
)
)(
*(lambda _, __, ___: _(_, __, ___))(
(lambda _, __, ___:
[__(___[(lambda: _).__code__.co_nlocals])] +
_(_, __, ___[(lambda _: _).__code__.co_nlocals:]) if ___ else []
),
lambda _: _.__code__.co_argcount,
(
lambda _: _,
lambda _, __: _,
lambda _, __, ___: _,
lambda _, __, ___, ____: _,
lambda _, __, ___, ____, _____: _,
lambda _, __, ___, ____, _____, ______: _,
lambda _, __, ___, ____, _____, ______, _______: _,
lambda _, __, ___, ____, _____, ______, _______, ________: _
)
)
)
Thank you for reading! I continue to be amazed by this post’s popularity.