De-obfuscate MACRO abuse in C
What exactly does the title of this blog mean? We are going to look at some ‘messy’ code that uses ‘unnecessary’ MACROS, try to understand what is going on, and then make it simpler.
For that purpose, I’m taking Jim Hague’s award-winning 1986 IOCCC entry. The International Obfuscated C Code Contest (IOCCC), as its name implies, is a contest that picks, from among its participants, the most obscure or complicated C code possible. It sounds a little tricky, but if you want to know more about the contest, please go to this link.
To get the most out of this blog, you should have some knowledge of the compilation process in C (especially what the preprocessor stage does) and of how MACROS are used in the C programming language.
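As a quick refresher on what the preprocessor does, here is a minimal sketch of token-replacing macros. The macro names (LPAREN, RPAREN, LBRACE, RBRACE, LOOP) and the function sum_to are made up for illustration and are not taken from the contest entry.

```c
/* Hypothetical macros: each one stands for a single token or keyword. */
#define LPAREN (
#define RPAREN )
#define LBRACE {
#define RBRACE }
#define LOOP   for

/* After preprocessing, this function reads:
   int sum_to(int n) { int i, total = 0;
                       for (i = 1; i <= n; i++) total += i;
                       return total; } */
int sum_to LPAREN int n RPAREN
LBRACE
    int i, total = 0;
    LOOP LPAREN i = 1; i <= n; i++ RPAREN
        total += i;
    return total;
RBRACE
```

Saving this as a file and running gcc -E on it should show the function with its ordinary punctuation restored, because the preprocessor only substitutes text and knows nothing about what the tokens mean.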
Now let’s see Jim Hague’s code:
If you want to see the original file, please go to this link. The code is really hard to read because of the messy indentation and the confusing names of the MACROS. We should try to arrange it a little better, but… does it even work? Well, let’s try to compile it.
It compiled!… but with some warnings, as expected, because that is the purpose of the contest: to confuse both the compiler and the humans.
Now let’s execute it.
Note that after starting the program, I typed ‘Hello, World’ (and other words) and pressed Return; the program then printed a series of ‘.’, ‘-’, spaces and ‘?’. What does it mean? It is Morse code! In this link, you can translate the Morse code. But note that the ‘?’ must be ignored: it marks a character the program could not translate.
‘Hello,’ => ‘.... . .-.. .-.. --- --..--’ in Morse code
So now we know what the program does: it is a Morse code translator. But let’s make the code simpler and easier to understand.
De-obfuscating or Untangling the code:
As you know, the preprocessor stage of the compilation process replaces tokens in the text; that is, it replaces each macro name with its definition. If we stop the compilation after the preprocessor stage, we can see clearly what is going on. Running the gcc -E command gives the following code:
Well, the preprocessor has replaced the macros, so some things are a little clearer, but the code is still messy, so I’ll try to tidy it up…
Now we can see some loops, a char array, functions and more, but it is still not easy to read because the names are easily confused. How do we fix it? By renaming the variables and functions in a readable way. In the code you can see:
_DIT, DAH_, DIT_ and _DIT_, which are variables of type char *, and _DAH_, which is an external char array.
__DIT and _DAH, which appear to be functions.
After changing the identifiers, this is what you get:
In case you are wondering, the compiler prints warnings every time I compile the code. Also, if you want to know, I’m using Emacs for editing.
Understanding the program:
char c is a long string of ASCII characters; it is the table in which the program looks up each character you type.
We have the function func_1, which looks like the putchar function: it prints characters to stdout one at a time, taking as its argument an integer that is the character’s ASCII code.
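A function like this can be sketched in a few lines. The version below is only an illustration, assuming a POSIX system where write() is available; the name put_one is made up, and the renamed func_1 in the listing may differ in its details.

```c
#include <unistd.h>

/* Hypothetical stand-in for func_1: sends one character to standard output.
   Takes an int, like putchar, and returns 1 on success or 0 on failure. */
static int put_one(int ch)
{
    char c = (char)ch;                 /* keep only the character's byte */
    return write(STDOUT_FILENO, &c, 1) == 1;
}
```

Calling put_one('-') and then put_one('\n') would print a dash followed by a new line.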
func_2 looks recursive: it calls itself again as long as its argument is a number that takes more than 2 bits to write. Its return value becomes part of the argument passed to the printing function. This is the function that converts to Morse code.
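The recursion can be sketched as follows. This is an illustrative reconstruction rather than the listing itself: the name encode is made up, the output goes into a caller-supplied buffer so the result is easy to inspect, and the numbering is an assumption in which a leading 1 bit marks the start of the code and every bit after it stands for one symbol (0 for a dot, 1 for a dash).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch: writes the dots and dashes for code into out
   (NUL-terminated) and returns how many symbols were written.
   The caller must supply a buffer large enough for the result. */
static size_t encode(unsigned code, char *out)
{
    size_t len = 0;

    assert(code >= 2);                   /* smallest valid code: marker bit plus one symbol */
    if (code > 3)                        /* more than 2 bits: handle the higher bits first */
        len = encode(code >> 1, out);
    out[len] = (code & 1) ? '-' : '.';   /* the lowest bit picks this symbol */
    out[len + 1] = '\0';
    return len + 1;
}
```

With this numbering, encode(5, buf) leaves ".-" in buf (the letter A) and encode(16, buf) leaves "...." (the letter H). Recursing before writing is what puts the symbols in the right order, since the highest bits have to come out first.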
main function: the first, or outer, loop creates a buffer and reads a line from the standard input with gets(). The loop also assigns an address to var_3, which will be used in the inner loop. Each time all the loops inside it have finished, it prints '\n', which means a new line.
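One caveat about this loop: gets() cannot limit how much it reads, and it was removed from the language in C11, which is likely one source of the warnings. A minimal sketch of a bounded replacement built on fgets() is shown below; the name read_line and the idea of passing the stream as a parameter are choices made for this example.

```c
#include <stdio.h>
#include <string.h>

/* Reads one line into buf (at most size - 1 characters) and drops the
   trailing newline, as gets() would. Returns 1 on success, 0 at end of input. */
static int read_line(FILE *in, char *buf, size_t size)
{
    if (size == 0 || fgets(buf, (int)size, in) == NULL)
        return 0;
    buf[strcspn(buf, "\n")] = '\0';   /* cut at the newline, if there is one */
    return 1;
}
```

The outer loop then becomes something like while (read_line(stdin, line, sizeof line)) { ... }, with a fixed-size line array in place of the allocated buffer.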
The second, or middle, loop iterates through the whole string that was just read. For each character, it checks whether the character is in char c: if so, it calls the conversion function func_2 and prints the result with func_1, as seen above; otherwise it prints '?'. Either way, it then adds a space.
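The lookup in this loop can be sketched like this. The table below is a simplified, letters-only stand-in written for this example, not the string from the listing: a letter's position plus 2 is taken to be its code, and the lowercase a entries are fillers for codes that no letter uses. The name code_for is also made up.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical letters-only table: position + 2 is the letter's code.
   Input is upper-cased before the search, so the lowercase fillers never match. */
static const char table[] = "ETIANMSURWDKGOHVFaLaPJBXCYZQ";

/* Returns the code for ch, or 0 when ch is not in the table
   (the case where the program prints '?'). */
static unsigned code_for(char ch)
{
    int up = toupper((unsigned char)ch);
    const char *hit;

    if (!isupper(up))                 /* digits, spaces, punctuation: not in this table */
        return 0;
    hit = strchr(table, up);
    return hit ? (unsigned)(hit - table) + 2 : 0;
}
```

Here code_for('h') gives 16 and code_for(' ') gives 0, so a space between words would be reported as '?'.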
Finally, it helps to understand a little about how Morse works: it is like binary code, so it can be written with 1 and 0. The program will mask this with
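To make the binary idea concrete, here is a small sketch that walks the bits of a code from left to right. It assumes a numbering in which the highest 1 bit is only a marker and each bit after it is one symbol, shown as '.' for 0 and '-' for 1. The name bits_to_morse is made up for this example.

```c
/* Hypothetical sketch: writes the dots and dashes for code into out (NUL-terminated).
   Example: 5 is 101 in binary; dropping the leading marker bit leaves 01, i.e. ".-". */
static void bits_to_morse(unsigned code, char *out)
{
    unsigned mask = 1;

    while (mask <= code / 2)          /* find the highest set bit (the marker) */
        mask <<= 1;
    for (mask >>= 1; mask != 0; mask >>= 1)
        *out++ = (code & mask) ? '-' : '.';   /* mask each remaining bit in turn */
    *out = '\0';
}
```

For instance, 15 is 1111 in binary, so bits_to_morse(15, buf) leaves "---" in buf, which is the letter O.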
As you can see, reading the code as given is an insane task, and Hague needed lots of creativity to write this kind of code. It was really hard to untangle, but personally, I think the code deserves a second, deeper look.
Thank you for reading the blog. Please use clear names for variables, indent as well as you can, write clear comments, and keep things simple and powerful.