Overview
When dealing with a lot of javascript, I noticed that there are a fair number of node_module packages that minify and obfuscate their packages. Minification makes sense for javascript you plan to deliver with a webpages and obfuscation makes sense to prevent scanners from easily being able to monkey patch, but I really don't know why this happens before the page is packaged up. In other words, IMHO, all javascript should exist in node_modules as some human readable peice of code (albiet TS/ES7 -> ES5), but human readable none-the-less.
Regardless of the javascript ecosystem, minification and obfuscation isn't really common practice in the Python ecosystem so I took a couple hours to look into methods to accomplish this.
Manual Minification
The Naiive Approach
Naturally, the developer can always generate their own minization of code. For example, lets use the following fibonacci code:
def fib(term):
if term <= 1:
return (term)
else:
return (fib(term-1) + fib(term-2))
# Change this value to adjust the number of terms in the sequence.
number_of_terms = 10
for i in range(number_of_terms):
print(fib(i))
Some obvious fixes would be to:
- Only use single byte labels.
- Only use single byte indentation.
- Remove superfluous newlines and spaces.
Manually minimized version may look like:
def f(t):
if t<=1:
return (t)
else:
return (f(t-1)+f(t-2))
n = 10
for i in range(n):
print(f(i))
What else can we do? Compression?
To compress this peice of code we can use the zlib modules. The thing to know about compression with zlib is that it'll create many unprintable characters. This means that what should be a single byte will become 4 bytes to account for the hex value (i.e \x00
instead of the single byte it is in memory). We can mitigate this slightly by base64 encoding the binary. This allows us to recover some of that wasted space. Base64 also happens to make the string significantly more portable if control characters could be an issue.
Lets assume we want to compress and encode our code above. We would perform that action with something like:
import zlib, base64
print(base64.b64encode(zlib.compress("""
def f(t):
if t<=1:
return (t)
else:
return (f(t-1)+f(t-2))
n = 10
for i in range(n):
print(f(i))
""")))
This would give us the result of:
eJxNjMEKgDAMQ+/9ihw3RHAexX2MYCsFqVLn/9vdzCXhkYR2FkhqeSGooK21RIJze90QnMDnw38W7bHkoducMxkqykRyORRq8M0OTtYPQ7ertZhoND85wR0k
That gobblyguuk is the code! Nice. Now all we need to do it write a small loader to make this blob of code execute:
Note: You can put line breaks into the base64 encoded string without breaking things.
import zlib, base64
exec(zlib.decompress(base64.b64decode("""eJxNjMEKgDAMQ+/9ihw3RHAexX2MYC
sFqVLn/9vdzCXhkYR2FkhqeSGooK21RIJze90QnMDnw38W7bHkoducMxkqykRyORRq8M0OT
tYPQ7ertZhoND85wR0k""")))
After running the code you should get:
0
1
1
2
3
5
8
13
21
34