Developers try to hide strings from a reverse engineer's view through compile-time encryption, packing, virtualization, and the like. In this post, I briefly describe a way to keep strings out of the .rdata segment of the executable: indirect initialization, which forces the individual characters and their initialization subroutines to live in the executable (.text) segment. After walking through the disassembly, I will post the source showing how to achieve this.
In the following screenshot, you’ll notice two strings displayed in the console, but only one of them is visible and commented in the disassembly.
The second string, IAmTotallyVisible, is automatically commented and visible to the reverse engineer without any effort. The first string, MessageBoxA (or rather, the pointer to its first character), is loaded into rdx from var_28. If there were no visible console output, the reverse engineer might take this for interleaved code and move on, which would deter them from continuing the search.
We’ll proceed as a reverse engineer normally would and figure out what the first string is and why it isn’t placed in the initialized data segment.
I dug through the disassembly and added some comments, which I explain after the screenshot.
Starting from the top, I renamed var_28 to Buffer, since it is passed to printf as some form of buffer. Buffer is initialized to 0. From there, we notice the offset of a global structure (it’s in the .data segment) being loaded into rdi, and as we move down we come across a rather interesting instruction.
call qword ptr [rdi]
This instruction tells us that rdi points to a function pointer. The loop calls that function, picks up the return value from eax (al for a single char), increments the counter, and continues until rbx is equal to 12 (0xC). If we go back up to off_140003040 and inspect its contents, we see the following…
It’s obvious now that this structure holds a series of function pointers, which we can assume return the characters of the first string displayed in the console.
This method of initialization forces the strings to be created at runtime, so they are invisible to anyone searching the disassembler's string table or following xrefs. How all of this is achieved is quite simple. I’ve written my own code generator that creates the structures and macros for quick usage.
First, we define a function pointer type for functions that take no parameters and return a character, like so:
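The original listing is not reproduced here, so the following is only a minimal sketch of what such a type could look like. The names GETCHAR_FN, GetChar_M, and CallGetter are hypothetical and chosen for illustration.

```c
/* Hypothetical name: pointer to a function that takes no
   parameters and returns a single character. */
typedef char (*GETCHAR_FN)(void);

/* Example getter: the character is an immediate operand in the
   function body rather than a byte in a string literal. */
char GetChar_M(void) { return 'M'; }

/* Calls through the pointer, which is the C-level counterpart of
   an indirect call such as `call qword ptr [rdi]`. */
char CallGetter(GETCHAR_FN fn) { return fn(); }
```

For example, `CallGetter(GetChar_M)` yields the character 'M'.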
Next, we define the getter functions, each of which returns one character of the string to be initialized.
The naming and order of these functions are arbitrary; you can use any naming convention you’d like. This is just for the sake of example. After we define our getter functions, we create a universal structure to hold our function pointers.
Now we create an array of function pointers to be iterated over and called.
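As a sketch of this step, the getters for MessageBoxA might look like the following. The names GetCh00 through GetCh11 are made up, and the twelfth getter returning the terminator is an assumption based on the 0xC count seen in the disassembly.

```c
/* One getter per position in "MessageBoxA"; names are illustrative. */
char GetCh00(void) { return 'M'; }
char GetCh01(void) { return 'e'; }
char GetCh02(void) { return 's'; }
char GetCh03(void) { return 's'; }
char GetCh04(void) { return 'a'; }
char GetCh05(void) { return 'g'; }
char GetCh06(void) { return 'e'; }
char GetCh07(void) { return 'B'; }
char GetCh08(void) { return 'o'; }
char GetCh09(void) { return 'x'; }
char GetCh10(void) { return 'A'; }
/* Assumed twelfth entry: the terminating null. */
char GetCh11(void) { return '\0'; }
```

Whether the characters really stay out of the data segments depends on the compiler and its settings, so it is worth checking the resulting binary.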
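A structure of this kind could be as small as a single member. In the sketch below, the member name GetChar follows the call mentioned later in the post, while HIDDEN_CHAR, GETCHAR_FN, GetM, and g_First are hypothetical names.

```c
typedef char (*GETCHAR_FN)(void);

/* Hypothetical wrapper: one entry per character of a hidden string. */
typedef struct HIDDEN_CHAR {
    GETCHAR_FN GetChar;   /* returns the character for this position */
} HIDDEN_CHAR;

static char GetM(void) { return 'M'; }

/* A single entry, initialized with the address of its getter. */
HIDDEN_CHAR g_First = { GetM };
```

Wrapping the pointer in a structure rather than using a bare pointer leaves room for a generator to add fields later without changing the calling code.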
The size 0xC is the number of entries in the array, and thus the number of function pointers called when it becomes time to iterate and assign the return values. MessageBoxA itself is 11 characters long, so the twelfth entry presumably accounts for the terminating null. The code generator, if you decide to write one, should calculate the length and set it automatically.
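The array described above might be laid out as follows. This is a sketch under the same assumptions as before: all names are hypothetical, the DEFINE_GETTER macro is my own shorthand, and the last entry is assumed to be the terminating null.

```c
typedef char (*GETCHAR_FN)(void);
typedef struct HIDDEN_CHAR { GETCHAR_FN GetChar; } HIDDEN_CHAR;

/* Shorthand for stamping out one getter per position. */
#define DEFINE_GETTER(name, ch) static char name(void) { return (ch); }

DEFINE_GETTER(Ch00, 'M')
DEFINE_GETTER(Ch01, 'e')
DEFINE_GETTER(Ch02, 's')
DEFINE_GETTER(Ch03, 's')
DEFINE_GETTER(Ch04, 'a')
DEFINE_GETTER(Ch05, 'g')
DEFINE_GETTER(Ch06, 'e')
DEFINE_GETTER(Ch07, 'B')
DEFINE_GETTER(Ch08, 'o')
DEFINE_GETTER(Ch09, 'x')
DEFINE_GETTER(Ch10, 'A')
DEFINE_GETTER(Ch11, '\0')

/* 11 characters plus the assumed terminator = 0xC entries. */
#define STR_MESSAGEBOXA_LEN 0xC

/* Non-const, so the table lands in writable data, matching the
   .data placement seen in the disassembly. */
HIDDEN_CHAR g_MessageBoxA[STR_MESSAGEBOXA_LEN] = {
    { Ch00 }, { Ch01 }, { Ch02 }, { Ch03 },
    { Ch04 }, { Ch05 }, { Ch06 }, { Ch07 },
    { Ch08 }, { Ch09 }, { Ch10 }, { Ch11 }
};
```

The table contains only addresses of functions, so a string search over the data segments finds nothing that resembles MessageBoxA.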
Finally, we create our universal initialization function that iterates until the counter reaches the desired length, calling GetChar() for each element and assigning it to the appropriate element in the buffer. We call the initialization function with the proper arguments, and you have a halfway-decent method of indirectly initializing strings and preventing dead-listings from populating the disassembly with string references.
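The initialization routine can be sketched in a few lines. InitializeHiddenString and GetChar are names used in this post; the parameter order (table, buffer, length) is inferred from the macro description, and the short "Hi" table is a made-up example to keep the listing small.

```c
#include <stddef.h>

typedef char (*GETCHAR_FN)(void);
typedef struct HIDDEN_CHAR { GETCHAR_FN GetChar; } HIDDEN_CHAR;

/* Calls each entry's getter in order and stores the result in the
   matching slot of the buffer. The buffer must hold `length` bytes. */
void InitializeHiddenString(const HIDDEN_CHAR *table, char *buffer, size_t length)
{
    size_t i;
    for (i = 0; i < length; ++i)
        buffer[i] = table[i].GetChar();
}

/* Hypothetical three-entry table spelling "Hi" plus a terminator. */
static char GetH(void)   { return 'H'; }
static char Geti(void)   { return 'i'; }
static char GetNul(void) { return '\0'; }

HIDDEN_CHAR g_Hi[3] = { { GetH }, { Geti }, { GetNul } };
```

Calling `InitializeHiddenString(g_Hi, buf, 3)` with a three-byte `buf` leaves the string "Hi" in the buffer.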
I’m positive there are many better solutions to this problem. However, I haven’t seen anyone actively attempt to keep strings out of the initialized data segment, and I wanted to spark some creativity in readers. When all is said and done and you’ve written a solid code generator, you can use the InitializeHiddenString routine to initialize any strings you wish to keep out of sight.
My generator emits macros with the structure name, the buffer, and the length already filled in, so I can call the function like so:
InitializeHiddenString( STR_MESSAGEBOXA_HDN );
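One way to make that call compile is to have the macro expand to all three arguments at once. The sketch below assumes that approach; apart from InitializeHiddenString, STR_MESSAGEBOXA_HDN, and GetChar, every name is hypothetical.

```c
#include <stddef.h>

typedef char (*GETCHAR_FN)(void);
typedef struct HIDDEN_CHAR { GETCHAR_FN GetChar; } HIDDEN_CHAR;

#define DEFINE_GETTER(name, ch) static char name(void) { return (ch); }

DEFINE_GETTER(MbaCh00, 'M')
DEFINE_GETTER(MbaCh01, 'e')
DEFINE_GETTER(MbaCh02, 's')
DEFINE_GETTER(MbaCh03, 's')
DEFINE_GETTER(MbaCh04, 'a')
DEFINE_GETTER(MbaCh05, 'g')
DEFINE_GETTER(MbaCh06, 'e')
DEFINE_GETTER(MbaCh07, 'B')
DEFINE_GETTER(MbaCh08, 'o')
DEFINE_GETTER(MbaCh09, 'x')
DEFINE_GETTER(MbaCh10, 'A')
DEFINE_GETTER(MbaCh11, '\0')   /* assumed terminator entry */

static HIDDEN_CHAR g_MessageBoxA_Table[0xC] = {
    { MbaCh00 }, { MbaCh01 }, { MbaCh02 }, { MbaCh03 },
    { MbaCh04 }, { MbaCh05 }, { MbaCh06 }, { MbaCh07 },
    { MbaCh08 }, { MbaCh09 }, { MbaCh10 }, { MbaCh11 }
};
static char g_MessageBoxA_Buffer[0xC];

/* Expands to the three arguments: structure name, buffer, length. */
#define STR_MESSAGEBOXA_HDN g_MessageBoxA_Table, g_MessageBoxA_Buffer, 0xC

void InitializeHiddenString(const HIDDEN_CHAR *table, char *buffer, size_t length)
{
    size_t i;
    for (i = 0; i < length; ++i)
        buffer[i] = table[i].GetChar();
}

/* Hypothetical helper: builds the string at runtime and returns it. */
const char *HiddenMessageBoxA(void)
{
    InitializeHiddenString( STR_MESSAGEBOXA_HDN );
    return g_MessageBoxA_Buffer;
}
```

The returned pointer can then be handed to printf or to whatever routine needs the name, and the literal never appears in the source or in a string table.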
NOTE: I’ve noticed in the disassembly that, with compiler optimizations favoring speed, the initialization functions are reused when characters repeat. For example, MessageBoxA in plain text becomes MesagBoxA when the reverse engineer rebuilds the string from the return values.
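The effect can be reproduced by hand. The sketch below does not rely on any optimizer; it simply shares one getter per distinct character, which is what the toolchain appears to do (possibly through identical-function folding, though that is my guess). All names are hypothetical, and the terminator entry is left out for brevity.

```c
#include <stddef.h>

typedef char (*GETCHAR_FN)(void);

static char GetM(void) { return 'M'; }
static char Gete(void) { return 'e'; }
static char Gets(void) { return 's'; }
static char Geta(void) { return 'a'; }
static char Getg(void) { return 'g'; }
static char GetB(void) { return 'B'; }
static char Geto(void) { return 'o'; }
static char Getx(void) { return 'x'; }
static char GetA(void) { return 'A'; }

/* "MessageBoxA": repeated characters point at the same getter. */
static GETCHAR_FN g_Table[11] = {
    GetM, Gete, Gets, Gets, Geta, Getg, Gete, GetB, Geto, Getx, GetA
};

/* Walks the table in order. `out` needs room for 12 bytes. */
void RebuildFromTable(char *out)
{
    size_t i;
    for (i = 0; i < 11; ++i)
        out[i] = g_Table[i]();
    out[11] = '\0';
}

/* Calls each distinct getter once, in first-seen order, as someone
   listing the unique functions might. `out` needs room for 12 bytes. */
void RebuildFromDistinct(char *out)
{
    size_t i, j, n = 0;
    for (i = 0; i < 11; ++i) {
        int seen = 0;
        for (j = 0; j < i; ++j) {
            if (g_Table[j] == g_Table[i]) { seen = 1; break; }
        }
        if (!seen)
            out[n++] = g_Table[i]();
    }
    out[n] = '\0';
}
```

Walking the table entry by entry still recovers MessageBoxA, while collecting only the distinct functions gives MesagBoxA.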
I don’t know why they’d waste time doing so when they can grab the strings at runtime after initialization, but … the more you know.
Any questions, comments, or ideas are welcome.
Thanks for taking the time to read!