Applied Reverse Engineering: The Stack

26 thoughts on “Applied Reverse Engineering: The Stack”

Pingback: Applied Reverse Engineering Series - Reverse Engineering
dsoumil says:

August 21, 2019 at 20:01

Hi,
Great article, but I think your first two diagrams of the stack are incorrect.
When you do push 12 and push 4, in both diagrams the 4 should be below the 12, not above it, as the stack grows from higher to lower memory addresses and has a LIFO structure.

1. Daax Rynd says:
  
  August 21, 2019 at 20:36
  
  Yep, you’re right. I meant to put push 4 first and then push 12. What’s funny is I wrote it that way the first time and then changed it, since I was thinking about the stack view from my usual perspective in a debugger.
  
  Thanks for pointing that out. It’s been fixed.
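
A rough way to see the ordering being discussed is a toy model of the stack, sketched below. This is not code from the article: the `ToyStack` class and the starting address `0x1000` are made up for illustration, and the only behaviour modelled is that a 64-bit push decrements RSP by 8 before storing the value.

```python
# Toy model of an x86-64 stack: memory is a dict keyed by address,
# and RSP moves toward lower addresses on every push.
# The class name and the starting address are hypothetical.

class ToyStack:
    def __init__(self, top=0x1000):
        self.rsp = top
        self.memory = {}

    def push(self, value):
        self.rsp -= 8                    # a 64-bit push first decrements RSP by 8
        self.memory[self.rsp] = value    # then stores the value at the new RSP

    def pop(self):
        value = self.memory.pop(self.rsp)  # read the value at RSP
        self.rsp += 8                      # then increment RSP by 8
        return value


stack = ToyStack()
stack.push(12)
stack.push(4)

# Print from higher to lower addresses.
for address in sorted(stack.memory, reverse=True):
    print(f"{address:#06x}: {stack.memory[address]}")
```

With push 12 followed by push 4, the 4 lands at the lower address (0x0ff0), below the 12 (0x0ff8), and it is the first value to come back off on a pop.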
  
Smoerre says:

August 24, 2019 at 09:35

Very nice article. Can’t wait to read the next one

Khios says:

September 12, 2019 at 07:40

Very nice writing, I finally understand a bit more. I’m eager for the next parts! Especially the Game one, been a dream for years 😛

brian says:

October 7, 2019 at 00:31

Thanks for this article. I have a question though: you said the stack is required to be 16-byte aligned, so the stack pointer had to be moved by 64 (40h) with `sub rsp, 40`. We need 4*8 bytes plus 8 bytes for the return address. Why didn’t we move the stack pointer by 48 (30h) instead?

1. Daax Rynd says:
  
  October 22, 2019 at 18:22
  
  Hey, so the values are in hex. When doing the calculations I was using the wrong tab in the calculator and performing the ops on the hex value and made a mistake! It’s been fixed and I hope it makes sense now.
  
  Very sorry for the confusion, but thank you for bringing it to my attention.
  
  The function starts with two instructions:
  push rbp
  sub rsp, 60h
  
  Before the call, RSP was assumed to be 16-byte aligned, as the x64 ABI requires. Since the call instruction implicitly pushes the 8-byte return address, the stack is no longer 16-byte aligned upon entry to the do_math function. This is fine, as the stack is only required to be aligned prior to a call instruction (and some other instructions out of scope here). Anyways, the push rbp instruction saves the previous frame pointer onto the stack, which decrements the stack pointer (RSP) by another 8 bytes, so RSP is once again aligned on a 16-byte boundary. The sub rsp, 60h instruction then subtracts 96 (0x60) bytes from RSP, which keeps it aligned. At that point the stack holds, from higher to lower addresses:
  
  8 bytes for the return address, which was pushed by the call instruction that invoked this function.
  8 bytes for the saved RBP value, which was just pushed in the prologue.
  96 bytes allocated by sub rsp, 60h. Of these, 36 bytes hold the local variables, as determined by the function’s requirements. In this case, there are 8 integer variables (32 bytes) and a 4-byte character array.
  The remaining 60 of those 96 bytes are scratch space and padding. The total of 8 + 8 + 96 = 112 bytes is a multiple of 16, maintaining alignment.
  
  However, the question arises: why not allocate 64 bytes (0x40) or 80 bytes (0x50) instead of 96 (0x60)? Either would still be a multiple of 16 and sufficient to hold the local variables. The likely reason is the need for scratch space for the printf call in the middle of the function. The Microsoft x64 calling convention passes the first 4 arguments to functions in registers (RCX, RDX, R8, R9) and requires the caller to reserve 32 bytes of shadow space where the callee can save them. The extra 32 bytes (96 - 64 (0x40)) in the stack frame provide that space for the printf call, without needing to further modify RSP.
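
As a quick sanity check on the alignment steps above, here is a small sketch that walks RSP through the call and the two prologue instructions. The starting RSP value and the `trace_prologue` helper are made up for illustration; the only assumption is that RSP is 16-byte aligned right before the call.

```python
def trace_prologue(rsp, steps):
    """Apply each (label, size) step by subtracting size from RSP and
    record the new RSP together with whether it is 16-byte aligned."""
    trace = []
    for label, size in steps:
        rsp -= size
        trace.append((label, rsp, rsp % 16 == 0))
    return trace


RSP_BEFORE_CALL = 0x1000  # hypothetical; any 16-byte aligned value works

steps = [
    ("call do_math", 8),     # call pushes the 8-byte return address
    ("push rbp", 8),         # saves the previous frame pointer
    ("sub rsp, 60h", 0x60),  # allocates 96 bytes
]

trace = trace_prologue(RSP_BEFORE_CALL, steps)
for label, rsp, aligned in trace:
    print(f"{label:<14} rsp={rsp:#x} aligned={aligned}")

# Distance between the caller's RSP and RSP after the prologue.
total = RSP_BEFORE_CALL - trace[-1][1]
print(f"total bytes below the caller's RSP: {total} ({total:#x})")
```

The trace shows RSP misaligned right after the call, aligned again after push rbp, and still aligned after the sub. Note that the return address and the saved RBP sit above the 96 bytes that sub rsp, 60h allocates, so the total distance from the caller's RSP is 112 (0x70) bytes.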
  
brian says:

October 7, 2019 at 01:39

You showed that, because of the ABI’s 16-byte stack alignment requirement, the stack before the function is invoked would look like four 64-bit shadow values followed by 64 bits of padding, followed by the 64-bit return address as the last thing on the stack. The assembly listing for main looks like this:

mov qword ptr[rsp + 24], r8
mov qword ptr[rsp + 16], rdx
mov dword ptr[rsp + 8], ecx

The ecx register (which has the first parameter of main, right?) is being stored at the stack pointer + 8 bytes offset. This is the location where the padding is. Does this cause any problem, or is it a non-issue as long as there is enough shadow space to hold four parameters?

1. Daax Rynd says:
  
  October 22, 2019 at 18:39
  
  That snippet is indeed using the shadow store for those arguments. The shadow space is required; however, it doesn’t always need to be used. Does that answer your question?
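
For what it's worth, here is a small sketch of where those stores land. It assumes the standard Microsoft x64 layout, where RSP points at the return address on entry to main (before main adjusts RSP) and the four 8-byte home slots sit directly above it. Under that assumption, [rsp+8] is the home slot for the first argument rather than the padding. The `home_slot_offsets` helper is made up for illustration.

```python
# Microsoft x64 convention: the first four integer arguments
# are passed in these registers, in this order.
ARG_REGISTERS = ["rcx", "rdx", "r8", "r9"]


def home_slot_offsets(return_address_size=8, slot_size=8):
    """Offsets from RSP at callee entry, before the callee adjusts RSP.
    [rsp+0] holds the return address; the home slots sit just above it."""
    return {
        register: return_address_size + index * slot_size
        for index, register in enumerate(ARG_REGISTERS)
    }


offsets = home_slot_offsets()
print("[rsp+ 0] return address")
for register, offset in offsets.items():
    print(f"[rsp+{offset:>2}] home slot for {register}")
```

The offsets 8, 16 and 24 line up with the three mov instructions in the listing, which store ecx, rdx and r8 into their own home slots.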
  
Blocker says:

November 16, 2019 at 03:16

Hey Daax. Thanks for taking the time to provide these awesome articles.

Just want some clarification on the 16-byte boundary stuff. Does this only matter just before you call a function? I know you made a note in red that it does, but I just wanted to confirm. So for example, if in the middle of a function I push a register onto the stack (which would be 8 bytes), is it fine even though the stack is no longer aligned on a 16-byte boundary after this push?

Also, I’m a little confused as to why in your example you changed the sub instruction from “sub rsp, 40” to “sub rsp, 48”. Shouldn’t it be “sub rsp, 40”? As soon as you make the call to the function, it pushes the return address onto the top of the stack, which is another 8 bytes, making it 40 + 8 = 48 bytes allocated, which is a multiple of 16. If you use “sub rsp, 48” and the call instruction then adds another 8 bytes for the return address, it would be 56, which is not a multiple of 16.

Thanks.

1. Daax Rynd says:
  
  November 18, 2019 at 00:06
  
  Yep, looks like I didn’t read my diagram properly when the other user below mentioned his confusion. I’ll correct that, since how you understood it before is correct.
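
The arithmetic in the question can be checked with a small predicate, sketched below. The `aligned_after_sub` helper is hypothetical, and it assumes RSP was 16-byte aligned before the call, so the 8-byte return address, any 8-byte pushes, and the sub amount together must total a multiple of 16.

```python
def aligned_after_sub(sub_amount, pushes_before_sub=0):
    """Return True if RSP ends up 16-byte aligned after the sub.

    Assumes RSP was 16-byte aligned right before the call, so the bytes
    consumed are: 8 for the return address, 8 per push, plus sub_amount.
    """
    used = 8 + 8 * pushes_before_sub + sub_amount
    return used % 16 == 0


# Decimal amounts from the discussion: 40 works, 48 does not.
for amount in (40, 48):
    print(f"sub rsp, {amount}: aligned={aligned_after_sub(amount)}")
```

The same check also covers prologues that push a register first, for example `aligned_after_sub(0x60, pushes_before_sub=1)` for a push rbp followed by sub rsp, 60h.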
  
joemommadaddy says:

November 22, 2019 at 04:19

Hey Daax. Thanks for taking the time to provide these awesome articles for us. You have helped me understand more about my PC, which helps me move toward developing kernel drivers.

I was just wondering when you will be doing articles on Driver Development. I have searched around for it, but the series directory doesn’t bring me to any articles on driver development.

1. Daax Rynd says:
  
  December 5, 2019 at 18:54
  
  That’s coming in the future. It’s just there so I don’t forget what I plan to write 🙂
  
  For the time being, check out Windows Kernel Programming and Developing Windows NT Device Drivers. Great resources for driver development.
  
Pingback: Applied Reverse Engineering: Exceptions and Interrupts - Reverse Engineering
Pingback: Applied Reverse Engineering: Accelerated Assembly [P1] - Reverse Engineering
prithi says:

April 15, 2020 at 00:28

Great article!! Even though I have read about how the stack works in multiple places and books, your article was a great refresher and helped me discover a few new ideas. For instance, I did not understand opcodes, and you explained the significance of the near call (relative starting with E8, and absolute shown as call func_name).

Also a good reminder on the shadow space and padding, along with the 16-multiple rule for the caller allocating shadow space.

1. Dx says:
  
  April 15, 2020 at 00:44
  
  Thank you! I’m super glad you enjoyed it, and I greatly appreciate the feedback! I apologize for the length of some of them – they get quite thorough heh.
  
pr0Evo says:

August 8, 2020 at 04:27

Great article, but something I still don’t fully understand is why exactly 16-byte alignment was chosen, and why only before calls?

queser says:

May 12, 2021 at 04:57

Hey Dx. Thanks for taking the time to provide these awesome articles.
But I had a question.
I wrote the following code on the system and checked the assembly code:
#include <iostream>

int main(int a)
{
    std::cout << "Hello World!\n";
}

In the generated output, there is something I do not understand: the stack is not aligned on a 16-byte boundary.
Disassembly output:

00007FF7F6071040 mov dword ptr [rsp+8],ecx
00007FF7F6071044 sub rsp,38h
Can you guide me?

1. Dx says:
  
  June 23, 2021 at 17:21
  
  The stack is aligned in the function body, not on entry. The call instruction implicitly pushes the 8-byte return address onto the stack, and sub rsp, 38h then allocates 56 bytes, so 56 + 8 = 64 bytes in total and 64 % 16 = 0. The requirement for the stack pointer to be 16-byte aligned is due to the potential use of SIMD instructions (more info if you research the x86_64 ABI).
  
ztdf123 says:

November 8, 2021 at 04:17

Best article about stack ever! It helps a lot. THANK YOU!

Kim says:

February 12, 2022 at 11:01

The image following this line “following the function prolog.” cannot be loaded.

1. Dx says:
  
  February 16, 2022 at 10:04
  
  Let me fix this. Thanks for the heads up.
  
Pingback: Tersine Mühendislik Nedir? – HermxNotes
Pingback: My Site
Pingback: Reverse Engineering – Cybersecurity Society ATU

Original content here is published under these license terms:

License Type: Read Only

License Abstract: You may read the original content in the context in which it is published (at this web address). No other copying or use is permitted without written agreement from the author.

Daax

R&D @ Company, Inc.

Nick Peterson

Anti-Cheat Engineer @ Riot Games

Aidan Khoury

Anti-Cheat Engineer @ Riot Games

