Overview
Thanks for joining me in my newest series Applied Reverse Engineering. I decided to write this new series concurrently with the EPT series except I pushed out the first five for this one and haven’t started the other. Typical. Anyways, I have to give a little preface to the article and series as well as a disclaimer. This article is going to cover the basics of microarchitecture – namely the things that apply when reverse engineering something. We’ll cover a few pieces of the microarchitecture that will make learning assembly a little less confusing. This includes the general purpose registers, the processor state flag, the ISA, virtual memory, and a quick overview of the execution of a process on the Intel 64 architecture.
Here’s the disclaimer: I’m assuming you have programming experience in a compiled language such as Rust, C, C++, and so on. If you don’t, but are interested in following this series then I encourage you to take the time to learn the fundamentals of one of those languages. Understanding the high level constructs will help you identify them in a low level environment. We’re going to be working directly with assembly right from the start, so if you’re squeamish with details this may not be for you. There’s a lot to learn, so I’ve broken this series up into many parts in an order I feel appropriate for learning about the architecture and applying it to software reverse engineering.
All projects and examples given in this series (and article) are written to run on Windows 10 x64 (Version 1903 Build 18362). The architecture referenced is the Intel 64 architecture, though most everything still applies if you’re on AMD64. If you’re on a different architecture or operating system make sure to consult the proper specifications to learn about them. You can still take what you learn here and apply it to other systems. Be sure to consult the recommended reading section when confused, or looking for more information!
All that being said, let’s put the rubber on the road and get goin’.
High Level Introduction
If you’ve worked in a high level language and taken some form of computer systems course in a formal institution then you may be familiar with the compilation process and how executables actually run. However, if you haven’t done more than dabble with assembly or hear its name, we’re going to cover the process of a simple C program and how it executes on the processor. For the breakdown of the C program I’ve disabled all optimizations, enabled full debug information, and disabled some other settings. I’ll provide the link to the repo where all of the future projects for this series will be posted. You can pull the solutions and pop them in Visual Studio 2019, compile and follow along. We’re going to take a simple C program I wrote which calls a Windows API to get the computer name, implements a custom strlen function, and prints out the computer name and resulting name length. We’re not focused on the complexity of it – I want it to be as simple as possible so that when we break it down to the spooky assembly nobody runs screaming for the hills. (If you see SIMD instructions in assembly, it makes you want to do that sometimes.)
Let’s cover the compilation process briefly.
— The Compilation Process
The C compiler’s job is to preprocess the project, compile it, and link the executable. The whole process typically involves 4 major stages and uses a variety of tools – notably, the compiler, assembler, and linker. The first pass is preprocessing: include files, preprocessor directives (macros), and other conditional compilation instructions are handled. The second pass is compilation. It takes the output of the preprocessing phase and generates assembler source. It’s worth noting that some compilers use an integrated assembler which generates machine code directly rather than emitting assembly text and then invoking a separate assembler. Which leads us to the next part of the process: assembly. During this stage, an assembler translates assembly instructions into object code. The output is the instructions that are run directly on the target processor – all of which are (or should be) part of the ISA. The final part is linking. Now that we have object code, the pieces of the program have to be arranged to produce an executable that functions properly. The linker arranges the various parts of the object code so that functions can invoke functions in different regions. This stage also links the libraries used in a program so that the program can make use of those library functions. In the case of our C program, the linker uses kernel32.lib to resolve the call to GetComputerNameA.
Here’s a visual representation of the compilation process.
Image taken from stackoverflow.com
Now that we’ve covered the compilation process we’re going to take our C program, generate some assembly listings, and take a peek. If you find yourself interested in compilation/compilers, there’s more detailed reading in the recommended reading section.
— Compiled Binary Breakdown
The compiled binary we’re going to break down is the C program that was mentioned above. It’s nothing special, this is just to get a taste of what lies underneath the source of the C program.
The above is the C program. It gets our physical computer name, and prints it out along with our computer name length (which is just the number of bytes copied back into our name buffer.) Now let’s compile this, but with some changes to the project settings so we can generate an assembly listing. To make things easier to understand we’re going to disable all optimizations, and remove all debug information. To enable the generation of an assembly listing we’ll go to our Project Settings > C/C++ > Output Files and change the following:
This assembler output will be placed in our project directory. Let’s hit F7 and build this then open up the assembler output. The output shown below may be quite unfamiliar if you’re new to assembly. You may also recognize some instructions and mnemonics (just a representation to identify operations). Let’s take a look…
The listing that generates is a bit longer, so I went ahead and picked out the piece we wanted: the main function. Looking at this may seem like a foreign language, even with the various identifiers that were added. In the above assembly excerpt, every one of the lines is an instruction (excluding the comments and main PROC/ENDP.) In the x86 architecture there are hundreds of instructions and sometimes tens of variations of those instructions. Those blue keywords are what we refer to as mnemonics. Upon looking quickly we see groupings of instructions and some pretty generic mnemonics like mov (a data move operation), and call (a function invocation). Let’s look at some of the patterns. You’ll see a bunch of operations (each line is a single operation), some reusing the first operands of others. These operands are called registers, but more on that in a minute. Each one of the lines in this excerpt is a single operation to load, store, or modify data; or call a function. You don’t need to know what all of these mean or what their function is, we’ll cover that in due time. For now, just realize that the 9 line C program we wrote translated to over 25 lines in assembly that are then run directly on your processor, which performs billions of operations per second. Interesting, right?
What you just looked at was your (maybe) first glimpse of x64 assembly. In most reverse engineering projects you won’t have the luxury of having prenamed functions and references to strings. We’ll learn how to deal with that as well in this series. However, now that you’ve had your first taste of low level code, let’s learn about some of the fundamentals of the architecture that will help you more easily understand the excerpt above.
The Microarchitecture
You’ve probably heard the word microarchitecture tossed around before with varying degrees of understanding, but to more formally define it for this series a microarchitecture is all the digital logic that allows an instruction set to be executed. It’s the combination of memory, ALUs, registers, logic gates, and so on. When you combine all of these components you wind up with a processor – the digital unit responsible for performing basic arithmetic, input/output operations, and many others. In most processors you’ll find a register file, an ALU, some form of close to processor memory (a cache), and a control unit that decodes each instruction and decides what the rest of the hardware does with it, often assisted by a branch predictor. The component we need to cover first on the journey through the architecture is the register file.
If you recall a lot of the operands of those instructions are what are known as registers. Don’t know what I mean? After this section you will.
A quick side note, operands refer to the data being operated on. Some instructions have one, two, or three operands. They’re always referred to from left to right. Take line 19 in the assembly excerpt – xor eax, eax – the two operands are eax (operand 1) and eax (operand 2). Both of those operands also happen to be CPU registers.
Anyways, let’s keep moving so that more of this stuff starts making sense.
— The Register File
Every processor has to perform operations on data and that data usually has to be stored temporarily. This is the purpose of a processor’s register file. The register file is an array (or bank) of processor registers used to store information and subsequently operate on that information. If you’ve taken a computer systems course or read literature regarding system memory versus on chip memory then you know the latter is much faster. Typically, the processor will retrieve information from memory that is relevant to an instruction sequence and store it in a register to operate on that data. If it had to reach out to physical memory for each operation modern systems would be orders of magnitude slower. If you don’t know what a register is think of it as an empty slot with an identifier that’s stored in SRAM on your processor. Each slot is filled with data and various instructions can perform operations on that slot before writing it back to memory or storing it in another register.
For this series, we’re only concerned with the registers relevant to our target architecture (Intel 64). On the Intel 64 architecture the register file contains 16 general purpose registers, each register being 64-bits in size. There are various other registers worth noting, but not until later in this series. The sizes of these registers are usually referred to using terms such as word, doubleword, quadword, etc. A word on the Intel architecture is 16-bits, a doubleword is 32-bits and a quadword is 64-bits. Their sizes can be denoted in bytes as well, being 2, 4, and 8 bytes, respectively. To be thorough, there are two bytes in a word, commonly referred to as the high byte and the low byte. We’ll be referencing a lot of this terminology in the next subsection covering these general purpose registers.
— Register Fundamentals
In the previous subsection I mentioned 16 general purpose registers. These general purpose registers are used by the microarchitecture to perform basic data movement, control flow operations, string operations, and so on. You’ll encounter them every time you look at a dead-listing (static disassembly) of an object or debugger. If you recall, we looked at an excerpt of an assembly listing which performed quite a few operations for the simplicity of the application, but more importantly it referenced general purpose registers on almost every line.
Well, you know what a register file is, and that each slot (register) has an identifier. What are these identifiers? I put together a table of the general purpose registers, and if you’re unfamiliar and it looks more confusing than my explanation don’t worry – I’ll break it down as much as necessary.
The image displayed above is a table of the 64-bit general purpose registers and their layout. You might recognize some of the register names from our assembly excerpt. To explain this, there are 16 general purpose registers. Each register on the 64-bit architecture is 64-bits wide. However, on 32-bit architectures there were only 8 general purpose registers. Those registers were the low 32-bit sections of the 64-bit general purpose registers. For instance, in 32-bit architectures, RAX is reduced to a 32-bit general purpose register and becomes EAX. To maintain compatibility with 32-bit architectures the 32-bit general purpose registers were extended to 64-bits. In addition to this size extension the 64-bit architecture added 8 more general purpose registers – those being the general purpose registers R8 to R15. You can still, and will frequently, access the lower portions of registers. This can be confusing for a first timer, but back in the old days of 16-bit architectures there wasn’t an RAX, or EAX. It was just AX.
If you recall, the sizes of data types we’re concerned with go byte (8-bits), word (16-bits), doubleword (32-bits), and quadword (64-bits). AX in the example just mentioned is a register with the size of a word. In the 64-bit architecture we’re able to use these register names to access specific portions of the whole register. If we have an operation on EAX such as xor eax, 10000539h and only want to look at the low word of the register value following the xor we could use AX. Assuming EAX held 0 before the xor, AX would read 0539h.
Example:
    xor eax, 10000539h   ; xor eax with 0x10000539
    mov var, ax          ; var = 0x0539 (assuming eax was 0 before the xor)
On the other hand if we looked at RAX the value would be zero extended to 64-bits (which means bits 32 through 63 would be set to 0.) All this just means that the different sized portions of a general purpose register can be accessed using the names shown in the image. For the additional general purpose registers introduced in 64-bit architecture (R8-R15) you’ll use the register names shown for the R8 register, but substituting the number in the diagram for the target register number.
Note: Writes to the 8-bit and 16-bit portions of these 64-bit general purpose registers do not affect the respective upper bits. A store to the low word of one does not affect the upper 48 bits. Writes to the 32-bit portion are the exception: they zero the upper 32 bits, as described above.
You might have also noticed a register not mentioned, RIP. This register is referred to as the instruction pointer register. It contains the offset in the current code for the next instruction to be executed. It advances after each execution by the size of the instruction just executed (from one instruction boundary to the next). Some instructions can determine whether RIP will move forward or backward – these are branch instructions, and the ones that decide based on the status flags are called conditional instructions. We’ll cover them in the future, for now it’s just important to understand that RIP points to the next instruction and moves based on the type of instruction executed.
You can read a more technical description of the general purpose registers in the Intel SDM Vol 1. Chapter 3.4.1 or the AMD64 Architecture Programming Reference. To recap what’s important to know is that the processor uses them to temporarily store data to operate on, and we will frequently access these registers. Now that they’ve been covered, a special processor register needs to be addressed.
— Processor State Flag Register
Commonly called the EFLAGS register, in 64-bit mode it’s called the RFLAGS register. (If you come from ARM, the rough equivalent there is the current program status register, CPSR.) EFLAGS is a 32-bit register; RFLAGS extends it to 64 bits, with the upper 32 bits reserved. It contains a number of flags related to the state of the processor while executing the current program. Some flags are used to control where branching instructions go, and some are used to control OS related operations. A small group of status flags are affected by the results of arithmetic operations like addition, subtraction, and so on. I’m only going to cover the main status flags we’ll run into. The rest of the flags have definitions in their respective manuals, and I advise anyone looking to fully understand these topics to go through the recommended reading. Anyways, depicted below is a figure from the Intel SDM of the layout of the EFLAGS register.
The above is the layout of the EFLAGS register. We’re going to quickly cover the status flags (indicated by S) and we’ll conclude with a few examples of how these relate to code at a high level.
The Zero Flag (ZF – Bit 6)
The zero flag is a status flag that is only set if the result of an arithmetic operation is 0. There are certain conditional instructions (meaning they are based on the state of status flags) that will only be taken or perform an operation if the zero flag is set. We introduce all of these in the Accelerated Assembly part of this series.
To provide a high level example take the following code:
    int Integer0, Integer1;

    Integer0 = 510;
    Integer1 = 511;

    if ((Integer0 - Integer1) == 0)
        printf("ZF set, execute this block.\n");
    else
        printf("ZF not set, execute this block.\n");

We have two integers, one is set to 510 and one to 511. If we subtract the two we wind up with the obvious answer of -1. To start easing you into thinking about things in terms of assembly consider these two integers being stored in some general purpose register. For this exercise, they can’t be in the same register. Now, we know we have two registers one with 510 stored and one with 511. We’re going to perform a subtraction on them and then compare their result to 0. Since the result of the subtraction isn’t discarded we’re going to use it to determine which block of code to execute (the printf’s). In this instance the processor will execute the respective instructions to complete the operation, update the zero flag in the status register (here it ends up cleared, since -1 is not 0), and then execute some conditional instruction that will pick the appropriate block to execute based on the status of the zero flag!
Let me translate this to assembly to help you wrap your head around it.
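        mov rax, 510                      ; rax = 510
        mov rbx, 511                      ; rbx = 511
        sub rax, rbx                      ; rax = rax - rbx, updates the status flags
        jnz .zf_not_set                   ; jump if ZF == 0 (result was not zero)
        lea rcx, offset zf_set_string
        call printf
        jmp .end
    .zf_not_set:
        lea rcx, offset zf_not_set_string
        call printf
    .end:
        ret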
Alright, don’t run just yet. This is a lot simpler than it looks. If we recap the logic I walked through above you’ll remember we put 510 and 511 in a general purpose register – I chose RAX and RBX for simplicity. Then we perform a subtraction using the sub instruction. Now here’s what’s interesting, assuming you know very little if anything about assembly: some instructions will set the status flags based on their result, saving the need for some compare instruction like cmp. If the result of the sub instruction is 0 it will set ZF in the EFLAGS register. Neat! Now, the jnz instruction might be obvious to you – the mnemonic simply expands to jump if NOT zero. This means that if the zero flag is 0, or not set, the jump will be taken. The label .zf_not_set is what’s called the jump target. This means that if the ZF is not set (the result of the sub was not 0) the instruction pointer (RIP) will be set to the first instruction underneath our label .zf_not_set. Continuing execution, we load an offset to a string into rcx using lea, call our printf function, and then return from the function.
I realize there’s a number of things not yet covered here, but I’m hoping as I trickle information to you that when the details are covered later you’ll be able to draw back to our examples and clear up any confusion you may have! We haven’t covered the stack, or calling conventions (ex: lea rcx, offset string) but the article following this one does it in great detail and explains how various instructions affect the stack and how arguments are passed to functions on invocation.
Speaking of instructions and things unknown there’s a reference for all the instructions in the Intel 64 and IA-32 architectures. This manual includes a 60 page run down of the instruction format (you don’t have to read that if you don’t want), and an alphabetized reference of every instruction with its various forms, a description, pseudo-code, flags it affects (if any), and exceptions it can generate. This is your bible for this series. If we encounter an instruction you’re unfamiliar with I urge you to open the instruction manual and look it up, read the description and relevant information, and you’ll begin to develop an understanding of assembly where you’ll know which flags are affected like the back of your hand.
I’ll cover key instruction sequences once we get into the disassembly and debugging sections, so for any details I may leave out: consult the instruction manual.
If you haven’t noticed, learning assembly is very hands on and learn as you go. There’s no way to learn every possible instruction prior to working with it, so just know that if there’s things you don’t know there’s still things people who have been doing it for a decade don’t know. Before we switch gears to a whole other topic entirely let’s cover the last two flags that we’re going to be concerned with.
The Sign Flag (SF – Bit 7)
The sign flag is used in signed operations, and will be set equal to the most significant bit of the result – which happens to be the sign bit of a signed data type. 0 is positive and 1 is negative. I put together another example with the assembly translation below. Remember, your exercise is to start thinking of things in terms of assembly.
    int Integer0, Integer1;

    Integer0 = 1;
    Integer1 = 1000;

    if ((Integer0 - Integer1) < 0)
        printf("SF set, execute this block.\n");
    else
        printf("SF not set, execute this block.\n");

    return 0;
In this example we’re taking two signed integers and subtracting the larger one from the smaller integer to change the sign. I constructed the condition in the if statement to massage the compiler to place a jump that will be based on the sign flag. As in the previous example, if the SF flag is set that means that the result of this operation is negative (because 1 in the most significant bit of a signed integer indicates negative.) Below is the assembly translation – walk through it.
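        mov rax, 1                     ; rax = 1
        mov rbx, 1000                  ; rbx = 1000
        sub rax, rbx                   ; rax = 1 - 1000 = -999, SF is set
        jge .sf_not_set                ; jump if greater or equal (not taken here)
        lea rcx, sf_set_string
        call printf
        jmp .end
    .sf_not_set:
        lea rcx, sf_not_set_string
        call printf
    .end:
        xor eax, eax                   ; return value 0
        ret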
Much like the other translation we see two registers used, we subtract the larger value from the smaller which causes the SF flag to be set (because the result is negative), and then a jge is encountered. This instruction expands to jump if greater or equal. Strictly speaking, jge is taken when SF equals the overflow flag (OF); no signed overflow occurs here, so OF is 0 and the jump comes down to whether SF is clear. If the result is greater than or equal to zero we go to the target of the jge instruction, otherwise we execute the instructions directly following it and perform a jmp (unconditional jump – meaning always taken) to the return sequence.
Note: There are a few dozen jump instructions that are based on the status of different flags. These are referred to as 'jump if condition is met' instructions, or Jcc instructions for short. We'll be calling them Jcc instructions from here on.
The Carry Flag (CF – Bit 0)
The carry flag is set if the arithmetic operation generates a carry or borrow out of the most significant bit of the result. The flag indicates some overflow condition for unsigned integer operations, such as when you add 1 to the maximum supported value.
Here’s a challenge for you: write a C program that executes a block if an overflow has occurred, and generate an assembly listing to check and see if you accomplished this. To determine if the Jcc instruction generated is based on the carry flag you’ll have to consult the instruction manual.
Once you’ve done that, keep reading on to the next article of this series where we cover virtual memory, and the architectural details of memory addressing in the Intel 64 architecture.
Conclusion
In this article we’ve gone over the compilation process, albeit in very little detail, as well as learned where the registers actually come from and what those general purpose registers are. You had the chance to look at an assembly excerpt from a simple C program and see the madness that is assembly. Following the general purpose register discussion I detailed a few status flags in the EFLAGS register that will be important and consulted often during RE projects. I know that you may not be familiar with some of these concepts which is why I’ve introduced them. By the end of this section of the series teaching the fundamentals of the architecture and operating system we’re working under you’ll be able to take on a variety of projects and not be overwhelmed by the load of information poured on your head. You’ll be well equipped to deal with whatever comes at you.
Learning should be fun, but also informative, so in this series I will explain where and why things happen (much like discussing the register file before introducing registers) because I believe that knowing why things are the way they are can greatly increase understanding versus me just stating what is what with no explanation. I find that style of teaching or exposition annoying, I want the details and I want readers to have the details. If you’re new to this you don’t need to worry because the articles are only going to get longer and provide more detail than you most likely are willing to put up with. Like vegetables, the details may suck but they’re good for you. Read ’em.
I encourage you to create your own assembly listings for simple C/C++/Rust programs, and dig through them using the instruction manual and try to understand the logic. You don’t have to do anything too fancy just enough to get a taste for assembly. After all, assembly is the language you’re primarily going to be working with until we start developing tools to simplify our reverse engineering process.
Check the recommended reading section and then use the sidebar to navigate to the next part of this series!
14 thoughts on “Applied Reverse Engineering: Basic Architecture”
Hi, thanks for doing this series! One question so far. You said, “To make things easier to understand we’re going to disable all optimizations, and remove all debug information.” But you don’t tell how to do that. I’d like to be able compile the same code as you & get the same results. Thanks!
Thanks for reaching out about this. You’ll need to right click your project in VS as shown in this screenshot: https://i.imgur.com/e4VeDGz.png. Following that, under the C/C++ configuration tab, you’ll go to Optimization and set it to Disabled (/Od). You’ll also set the Favor Size or Speed field to Neither, and the Whole Program Optimization field to No. The result will look like this: https://i.imgur.com/Vj49iK7.png.
To remove all debug information you’ll need to navigate to the Linker configuration tab, click to drop it down, select Debugging and set Generate Debug Info to No. This is depicted as such: https://i.imgur.com/Fh1OU2B.png.
To get the assembler listing you’ll go to Output Files under C/C++ and change the Assembler Output option to Assembly Only listing.
Hope this cleared it up. I’ll add it to the article 🙂
Thanks! One more question. When installing Visual Studio 2019, which “Workload” should I select? Or should I just select an “individual component”?
That’s really up to you. I don’t remember what the options were, but I only installed the Windows SDK and C/C++ support and the various frameworks and universal C runtime.
Hi Daax.
This sentence is confusing: “Since their result isn’t discarded and used to determine which block of code to execute (the printf’s) we have to have some method of determining which block is to be executed”
You’re right. It was meant to be ‘the’ result. I’ve modified it to make more sense. Sometimes your brain turns to mush when writing, thanks for letting me know!
Brilliant writeup, now I have something nice to read and experiment on this weekend.
Nice read. One note, this link not works anymore : http://faculty.cs.niu.edu/~mcmahon/CS241/Notes/compile.html
Damn it. That’s a shame. I’ll fix that.
Thanks, and there’s a mistyping ‘massage’.
It’s a figure of speech, but it’s meant to be there. I meant it like “coercing the compiler to …” I thought it added a little more personality to an otherwise dry explanation :p